Precompiled Headers for C++
Stephen Davies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This document is a discussion about the design and implementation of
precompiled headers for the C++ parser in Synopsis.

THE PROBLEM

Reading and parsing headers can be very time consuming, especially when you
consider the size of the STL headers, and the penalty involved in #including
them. Currently (v0.2) Synopsis' C++ parser will read all declarations, parse
them all, and convert them all to Python AST objects. It is then up to the
linker to strip unneccessary declarations.

SOLUTION

We cannot just ignore the #included files, since full type information is
needed to resolve non-local type references. For example, looking up types in
base classes, etc. Whilst it may seem possible to leave this up the linker,
the scoping rules are language specific leaving them out of the scope of the
Linker.

So the #include files must be processed, but must they be processed more than
once? The answer is, tentatively, no. It is conceivable that the C++ parser
store the AST generated by parsing the file the first time around, and from
then on just load the store version. There are many pitfalls that must be
catered for, such as #includes not in the global scope, and the effects of
macros on the contents of the #included file.

SCOPES

Consider the following case:

// File: vector
namespace std {
#include <vector.h>
};

Whilst g++ doesn't do this (vector.h includes vector, following it with a
using std::vector), it would be incorrect for Synopsis to ignore this
possibility.

Fortunately, the solution to this problem is somewhat simple -- just iterate
through the loaded AST and modify the names of all declarations by prepending
the current scope. The declarations must then be inserted into the current
scope, which may be a class body or even a function body.

It is possible that the user may place an #include statement in the middle of
any declaration or definition, however Synopsis may ignore this case.

MACROS

The problem:

// File:
#define String XString
#include <x11.h>
#undef String

There are two problems inherent here: How do we modify the AST to cater for
this, and how do we document it?

THE AST

At first it appeared a simple name substitution might suffice, but further
thought revealed that this would not be sufficient, since the macro may alter
the structure of the AST, eg: #define FOO class, or #define FOO volatile.
The easiest and most correct way to handle the situation seems to be the
following:

Make a list of all identifiers in the file, before preprocessing. Compare this
list to defined macros at the point of #inclusion. Store the intersection of
the two lists with the AST. If, at another point of #inclusion, the list or
its values are different, then the AST needs to be regerenated. This new AST
may also be stored, resulting in multiple ASTs for the same file, which means
some sort of index needs to be created to identify and select them.


DESIGN

The following changes will need to be made to the C++ Parser:

. Create a hierarchy of C++ objects that mirrors the Python AST and Type
hierarchies. This is needed since using Python types is inefficient,
especially when most of the generated AST will be discarded before the rest of
Synopsis uses what's left.
. Write a preprocessor that strips everything but #preprocessor directives,
such that #includes will still result in the correct macros being defined.
This must also recursively parse any #included files.
. Create a mapping to a local repository of these stripped include files. This
must be able to handle <system> includes, and -I options.

