mygcc overview

the customizable gcc compiler

introduction

Mygcc lets programmers add their own checks to gcc. These checks can take into account syntax, control flow, and data flow information.
Syntax information is expressed by syntax patterns. Control flow and data flow information are expressed as graph reachability properties.

User-defined checks are performed in addition to normal compilation, and may result in additional warning messages.

patterns

Atomic syntactic patterns are plain C code fragments that may contain pattern variables. Pattern variables are sometimes called meta-variables, to distinguish them from variables of the C language. A pattern variable is written as the `%' character followed by a one-letter variable name. For example, `%X = malloc (%Y)', `return %X', `%X = %X + 1', `%X++', `%X * 3', `sizeof (%X)', and `%X == 0' are atomic patterns representing C statements or expressions. Note that atomic statement patterns and atomic expression patterns are not treated differently.

A C fragment matches an atomic pattern if there is a substitution mapping the pattern variables to C sub-fragments that makes the pattern equal to the C fragment. This means that the same variable occurring twice in a pattern stands for the same sub-fragment. The one exception is the anonymous variable, written `%_', whose occurrences may stand for different sub-fragments. For instance, the code fragment `a[i] = a[i] + 1' matches the pattern `%X = %X + 1'. The fragment `a[i] = a[i - 1] + 1' does not match the pattern `%X = %X + 1', but it matches the pattern `%X = %Y + 1' and also the pattern `%_ = %_ + 1'.
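The matching rule can be seen on ordinary C code. The sketch below places the two fragments discussed above inside a small function and annotates each with the patterns it is expected to match. The function name `bump_elements' is hypothetical, and the comments only restate the rule given above; they are not output from mygcc.

```c
#include <stddef.h>

/* Hypothetical helper: the caller must pass i >= 1 so that a[i - 1]
   is a valid element. */
int bump_elements(int *a, size_t i)
{
    /* Matches `%X = %X + 1' with %X bound to a[i]. */
    a[i] = a[i] + 1;

    /* Does not match `%X = %X + 1', because a[i] and a[i - 1] differ.
       Matches `%X = %Y + 1' and `%_ = %_ + 1'. */
    a[i] = a[i - 1] + 1;

    return a[i];
}
```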

simple checks

Option `-ftree-check=PATTERN'

Raise a warning for every statement matching an atomic syntactic PATTERN in the source files being compiled.

Using `-ftree-check=PATTERN', one can search for dangerous or discouraged statements. For instance, the standard library call `gets ()' is discouraged because it does not limit the amount of input it reads, so an attacker can overrun the buffer passed to it. Such calls can be reported by supplying the option `-ftree-check="gets (%_)"'.

Note that patterns are tried only on the top level of a statement. Thus, the pattern `gets (%_)' would not match the statement `i = gets(buff)'. This latter form can be matched by the pattern `%_ = gets (%_)'. If you want to match both forms of calling `gets ()', you should use a disjunctive pattern, as described later on.
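The top-level rule applies to any call, not only `gets ()'. Since `gets ()' was removed from the C standard in C11, the sketch below uses `strcpy ()' as a stand-in to show the two statement shapes. The function `copy_twice' and the patterns `strcpy (%_, %_)' and `%_ = strcpy (%_, %_)' are illustrative assumptions modelled on the `gets ()' patterns above; whether they match in practice depends on how the statements appear in the form that mygcc inspects.

```c
#include <string.h>

/* Hypothetical helper: dst must have room for src plus the
   terminating NUL character. */
size_t copy_twice(char *dst, const char *src)
{
    char *end;

    /* Call used as a whole statement: the shape `strcpy (%_, %_)'. */
    strcpy(dst, src);

    /* Call on the right of an assignment: the shape
       `%_ = strcpy (%_, %_)'.  The pattern `strcpy (%_, %_)' alone
       would not be tried against the nested call. */
    end = strcpy(dst, src);

    return strlen(end);
}
```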

complex checks

Option `-ftree-checks=FILE'

Perform the series of user-defined checks defined in FILE over the source files being compiled.

More complex user-defined checks take into account not only syntax, but also control flow and data flow information. These control flow and data flow patterns are called "condates". A condate encodes an example of a programming error that should never occur.

GCC traverses the CFG and prints a warning for each instance of a condate it finds. Condates express reachability properties on the control-flow graph of each function, having the general form:

from S1 to S2 avoid E;

interpreted as "find some path in the CFG starting with a statement matching S1, finishing with a statement matching S2 while avoiding all edges matching E", where S1, S2 are statement patterns, and E is an edge pattern.

A statement pattern is either an atomic statement pattern, as described above, or a disjunction of atomic statement patterns separated by the `or' operator. For example the statement pattern `"gets (%_)" or "%_ = gets (%_)"' matches any call to `gets ()'.

An edge pattern may have one of the following forms:

All parts of a condate except the first part may be omitted. The default for the `to' pattern S2 is the `%_' pattern, matching anything. The default for the `avoid' pattern E is the null pattern, matching nothing. In particular, if only the `from' part is specified, meaning that there is a path from S1 to anywhere avoiding nothing, the compiler simply issues a warning for all statements matching pattern S1.

For example, the following check detects cases when a function exits with interrupts disabled:

from "disable_interrupts (%X)"
to "return" or "return %_"
avoid "enable_interrupts (%X)";
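To make the check concrete, here is a minimal C sketch of code it is meant to flag and code it is meant to accept. The definitions of `disable_interrupts' and `enable_interrupts' are hypothetical stand-ins (a counter replaces real hardware access), as are the two `update_' functions. The comments describe the expected behaviour of the condate above; they are not recorded compiler output.

```c
/* Hypothetical stand-ins: a counter models the interrupt state. */
int irq_depth = 0;
void disable_interrupts(int level) { irq_depth += level; }
void enable_interrupts(int level)  { irq_depth -= level; }

/* Expected to be flagged: the early return is reachable from
   disable_interrupts (level) without passing enable_interrupts (level). */
int update_leaky(int *slot, int value, int level)
{
    disable_interrupts(level);
    if (value < 0)
        return -1;              /* interrupts are still disabled here */
    *slot = value;
    enable_interrupts(level);
    return 0;
}

/* Expected to be accepted: every path to the return statement
   passes through enable_interrupts (level). */
int update_balanced(int *slot, int value, int level)
{
    int status = 0;

    disable_interrupts(level);
    if (value < 0)
        status = -1;
    else
        *slot = value;
    enable_interrupts(level);
    return status;
}
```

Because `%X' appears in both the `from' and the `avoid' patterns, the two calls must be made with the same argument for the path to count as safe.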

Here is a slightly more complex example involving all the three patterns, and looking for pointers obtained through `malloc ()' being dereferenced without being checked as non-null:

from "%X = malloc(%_)"
to "%_ = %X->%_" or "%X->%_ = %_"
avoid "%X = %_" or +"%X != 0B" or -"%X == 0B";

Note that `NULL' values are written as `0B' above because condates are checked on the Gimple form, where null pointers are printed as `0B'. To see the Gimple form, use `-fdump-tree-gimple', which generates it in file `NAME.c.004t.gimple'.
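The following C sketch shows one function that the malloc check is meant to flag and one it is meant to accept. The `struct node' type and both function names are hypothetical. The comments describe the intended reading of the condate; how each statement appears in the Gimple dump determines whether the patterns actually match, so inspecting that dump is advisable.

```c
#include <stdlib.h>

/* Hypothetical list node used only for illustration. */
struct node {
    int value;
    struct node *next;
};

/* Expected to be flagged: the store through n is reachable from the
   malloc call without crossing any null test on n. */
struct node *make_node_unchecked(int value)
{
    struct node *n;

    n = malloc(sizeof *n);
    n->value = value;           /* shape `%X->%_ = %_' */
    n->next = NULL;
    return n;
}

/* Expected to be accepted: the stores are reached only through the
   true branch of n != NULL, an edge listed in the `avoid' part. */
struct node *make_node_checked(int value)
{
    struct node *n;

    n = malloc(sizeof *n);
    if (n != NULL) {
        n->value = value;
        n->next = NULL;
    }
    return n;
}
```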

Pattern variables occurring in checks are of two kinds:

Using `-ftree-checks=FILE', one can check for a series of condates defined in the given FILE as described above. Comments may be added at any place within the file, preceded by a `#' character.

All the condates described so far are anonymous condates. When raising a warning on an anonymous condate, GCC refers to it by its index in FILE, and prints a generic warning message. An alternate syntax allows a condate to be given a name and a customized warning message:

condate NAME { CONDATE } warning("MESSAGE");

For example (a slightly extended version of) the malloc check above can be defined as a named condate as follows:

condate malloc_deref {
from "%X = malloc (%_)" # any malloc
to "%_ = %X->%_" or "%X->%_ = %_"
or "*%X = %_" or "%_ = *%X"
avoid "%X = %_" or +"%X != 0B" or -"%X == 0B"
} warning("unsafe dereference of X after malloc");

Named condates should be preferred over anonymous condates as they generate more explicit warnings.

verbosity

Option `-ftree-check-verbose'

Print to `stderr' more precise messages about the matched patterns for each warning generated by the tree-checker. This option is turned off by default.

Last update: 15/4/2007. Contact: mygcc@free.fr