

File format of .bbg file
------------------------

Old Format
----------

Derived from source code of gcc/gcov.c (CVS snapshot 20001120)

LUINT32 is a little-endian 32 bit unsigned integer
LUINT64 is a little-endian 64 bit unsigned integer

{
    LUINT32 num_blocks
    LUINT32 num_arcs
    {
        LUINT32 num_arcs_per_block
        {
            LUINT32 dest
            LUINT32 flags
                {
                    on_tree = (1<<0),
                    fake = (1<<1),
                    fall_through = (1<<2)
                }
        }
        [num_arcs_per_block]
    }
    [num_blocks]
    LUINT32 separator = 0x80000001
}[*]
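
As a rough illustration, the layout above can be read with a few lines
of Python. This is a sketch only: the names Reader and parse_old_bbg are
made up for this note, the separator value is simply the one listed
above, and num_arcs is read but not cross-checked against the per-block
counts.

```python
import struct

# Arc flag bits, as listed in the layout above.
ON_TREE = 1 << 0
FAKE = 1 << 1
FALL_THROUGH = 1 << 2
SEPARATOR = 0x80000001  # value as given in the layout above


class Reader:
    """Hypothetical helper: sequential little-endian reads from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def at_end(self):
        return self.pos >= len(self.data)

    def luint32(self):
        (value,) = struct.unpack_from("<I", self.data, self.pos)
        self.pos += 4
        return value


def parse_old_bbg(data):
    """Return one entry per function: a list of blocks, each a list of (dest, flags)."""
    reader = Reader(data)
    functions = []
    while not reader.at_end():
        num_blocks = reader.luint32()
        reader.luint32()  # num_arcs: read but not otherwise used in this sketch
        blocks = []
        for _ in range(num_blocks):
            arcs = []
            for _ in range(reader.luint32()):  # num_arcs_per_block
                dest = reader.luint32()
                flags = reader.luint32()
                arcs.append((dest, flags))
            blocks.append(arcs)
        if reader.luint32() != SEPARATOR:
            raise ValueError("missing function separator")
        functions.append(blocks)
    return functions
```

A truncated file makes struct.unpack_from raise struct.error, which is
probably good enough for a quick inspection tool.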

Read LUINT64 arc counts from the .da file in this order
(ignoring the first LUINT64, which is a count of the LUINT64s following):

foreach function in .bbg file
    foreach block in function->blocks
        foreach arc in block->outarcs where !arc->on_tree
            LUINT64 arc_count
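
The read order above could be sketched in Python as follows. The
function name read_old_da_counts and the nested-list shape of its
`functions` argument (function -> block -> list of (dest, flags) arcs)
are assumptions made for this example, not anything taken from gcov.

```python
import struct

ON_TREE = 1 << 0  # arc flag bit from the old .bbg layout


def read_old_da_counts(da_data, functions):
    """Map (function, block, arc) indices to the count read from the .da data.

    Only arcs without the on_tree flag have a stored count.
    """
    pos = 8  # skip the leading LUINT64, which counts the LUINT64s that follow
    counts = {}
    for f_index, blocks in enumerate(functions):
        for b_index, arcs in enumerate(blocks):
            for a_index, (dest, flags) in enumerate(arcs):
                if flags & ON_TREE:
                    continue
                (count,) = struct.unpack_from("<Q", da_data, pos)
                pos += 8
                counts[(f_index, b_index, a_index)] = count
    return counts
```

Counts for the on_tree arcs are not stored in the file, so they do not
appear in the returned dictionary.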



New Format
----------

This format is used in gcc 3.3 and apparently in SuSE's gcc 3.2.
The following is from gcc/gcc/gcov-io.h (cvs 20030615). Note
that unlike the old format the new format is reasonably well
documented, even if only in a C comment.

/* File format for coverage information
   Copyright (C) 1996, 1997, 1998, 2000, 2002,
   2003 Free Software Foundation, Inc.
   Contributed by Bob Manson <manson@cygnus.com>.
   Completely remangled by Nathan Sidwell <nathan@codesourcery.com>.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING.  If not, write to the Free
Software Foundation, 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.  */

/* As a special exception, if you link this library with other files,
   some of which are compiled with GCC, to produce an executable,
   this library does not by itself cause the resulting executable
   to be covered by the GNU General Public License.
   This exception does not however invalidate any other reasons why
   the executable file might be covered by the GNU General Public License.  */

/* Coverage information is held in two files.  A basic block graph
   file, which is generated by the compiler, and a counter file, which
   is generated by the program under test.  Both files use a similar
   structure.  We do not attempt to make these files backwards
   compatible with previous versions, as you only need coverage
   information when developing a program.  We do hold version
   information, so that mismatches can be detected, and we use a
   format that allows tools to skip information they do not understand
   or are not interested in.

   Numbers are recorded in big endian unsigned binary form.  Either in
   32 or 64 bits.  Strings are stored with a length count and NUL
   terminator, and 0 to 3 bytes of zero padding up to the next 4 byte
   boundary.  Zero length and NULL strings are simply stored as a
   length of zero (they have no trailing NUL or padding).

   	int32:  byte3 byte2 byte1 byte0
	int64:  byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0
	string: int32:0 | int32:length char* char:0 padding
	padding: | char:0 | char:0 char:0 | char:0 char:0 char:0
	item: int32 | int64 | string

   The basic format of the files is

   	file : int32:magic int32:version record*

   The magic ident is different for the bbg and the counter files.
   The version is the same for both files and is derived from gcc's
   version number.  Although the ident and version are formally 32 bit
   numbers, they are derived from 4 character ASCII strings.  The
   version number consists of the single character major version
   number, a two character minor version number (leading zero for
   versions less than 10), and a single character indicating the
   status of the release.  That will be 'e' experimental, 'p'
   prerelease and 'r' for release.  Because, by good fortune, these are
   in alphabetical order, string collating can be used to compare
   version strings, and because numbers are stored big endian, numeric
   comparison can be used when it is read as a 32 bit value.  Be aware
   that the 'e' designation will (naturally) be unstable and might be
   incompatible with itself.  For gcc 3.4 experimental, it would be
   '304e' (0x33303465).  When the major version reaches 10, the letters
   A-Z will be used.  Assuming minor increments releases every 6
   months, we have to make a major increment every 50 years.  Assuming
   major increments releases every 5 years, we're ok for the next 155
   years -- good enough for me.

   A record has a tag, length and variable amount of data.

   	record: header data
	header: int32:tag int32:length
	data: item*

   Records are not nested, but there is a record hierarchy.  Tag
   numbers reflect this hierarchy.  Tags are unique across bbg and da
   files.  Some record types have a varying amount of data.  The LENGTH
   is usually used to determine how much data.  The tag value is split
   into 4 8-bit fields, one for each of four possible levels.  The
   most significant is allocated first.  Unused levels are zero.
   Active levels are odd-valued, so that the LSB of the level is one.
   A sub-level incorporates the values of its superlevels.  This
   formatting allows you to determine the tag hierarchy, without
   understanding the tags themselves, and is similar to the standard
   section numbering used in technical documents.  Level values
   [1..3f] are used for common tags, values [41..9f] for the graph
   file and [a1..ff] for the counter file.

   The basic block graph file contains the following records
   	bbg:  unit function-graph*
	unit: header int32:checksum string:source
	function-graph: announce_function basic_blocks {arcs | lines}*
	announce_function: header int32:ident int32:checksum
		string:name string:source int32:lineno
	basic_block: header int32:flags*
	arcs: header int32:block_no arc*
	arc:  int32:dest_block int32:flags
        lines: header int32:block_no line*
               int32:0 string:NULL
	line:  int32:line_no | int32:0 string:filename

   The BASIC_BLOCK record holds per-bb flags.  The number of blocks
   can be inferred from its data length.  There is one ARCS record per
   basic block.  The number of arcs from a bb is implicit from the
   data length.  It enumerates the destination bb and per-arc flags.
   There is one LINES record per basic block, it enumerates the source
   lines which belong to that basic block.  Source file names are
   introduced by a line number of 0, following lines are from the new
   source file.  The initial source file for the function is NULL, but
   the current source file should be remembered from one LINES record
   to the next.  The end of a block is indicated by an empty filename
   - this does not reset the current source file.  Note there is no
   ordering of the ARCS and LINES records: they may be in any order,
   interleaved in any manner.  The current filename follows the order
   the LINES records are stored in the file, *not* the ordering of the
   blocks they are for.

   The data file contains the following records.
        da: {unit function-data* summary:object summary:program*}*
	unit: header int32:checksum
        function-data:	announce_function arc_counts
	announce_function: header int32:ident int32:checksum
	arc_counts: header int64:count*
	summary: int32:checksum {count-summary}GCOV_COUNTERS
	count-summary:	int32:num int32:runs int64:sum
			int64:max int64:sum_max

   The ANNOUNCE_FUNCTION record is the same as that in the BBG file,
   but without the source location.
   The ARC_COUNTS gives the counter values for those arcs that are
   instrumented.  The SUMMARY records give information about the whole
   object file and about the whole program.  The checksum is used for
   whole program summaries, and disambiguates different programs which
   include the same instrumented object file.  There may be several
   program summaries, each with a unique checksum.  The object
   summary's checksum is zero.  Note that the da file might contain
   information from several runs concatenated, or the data might be
   merged.

   This file is included by both the compiler, gcov tools and the
   runtime support library libgcov. IN_LIBGCOV and IN_GCOV are used to
   distinguish which case is which.  If IN_LIBGCOV is nonzero,
   libgcov is being built. If IN_GCOV is nonzero, the gcov tools are
   being built. Otherwise the compiler is being built. IN_GCOV may be
   positive or negative. If positive, we are compiling a tool that
   requires additional functions (see the code for knowledge of what
   those functions are).  */
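
The item encodings and the version word described in the comment can be
tried out with a short Python sketch. The function names are invented
for this note. Two readings of the comment are assumed: the string
length counts the characters only (not the NUL or the padding), and the
major-version letters start at 'A' for 10.

```python
import struct


def read_int32(data, pos):
    """Read a big-endian int32; return (value, new position)."""
    (value,) = struct.unpack_from(">I", data, pos)
    return value, pos + 4


def read_int64(data, pos):
    """Read a big-endian int64; return (value, new position)."""
    (value,) = struct.unpack_from(">Q", data, pos)
    return value, pos + 8


def read_string(data, pos):
    """Read a string item; return (text or None, new position)."""
    length, pos = read_int32(data, pos)
    if length == 0:
        return None, pos  # zero length and NULL strings: no NUL, no padding
    text = data[pos:pos + length].decode("ascii")
    # Characters plus the NUL, rounded up to the next 4 byte boundary.
    return text, pos + ((length + 4) & ~3)


def decode_version(value):
    """Split a version word such as 0x33303465 into (major, minor, status)."""
    text = struct.pack(">I", value).decode("ascii")
    major_char, minor_chars, status = text[0], text[1:3], text[3]
    if major_char.isdigit():
        major = int(major_char)
    else:
        major = ord(major_char) - ord("A") + 10  # assumption: 'A' means 10
    return major, int(minor_chars), status
```

For the example given in the comment, decode_version(0x33303465) yields
(3, 4, 'e').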

#define GCOV_TAG_FUNCTION	 ((gcov_unsigned_t)0x01000000)
#define GCOV_TAG_BLOCKS		 ((gcov_unsigned_t)0x01410000)
#define GCOV_TAG_ARCS		 ((gcov_unsigned_t)0x01430000)
#define GCOV_TAG_LINES		 ((gcov_unsigned_t)0x01450000)
#define GCOV_TAG_COUNTER_BASE 	 ((gcov_unsigned_t)0x01a10000)
#define GCOV_TAG_OBJECT_SUMMARY  ((gcov_unsigned_t)0xa1000000)
#define GCOV_TAG_PROGRAM_SUMMARY ((gcov_unsigned_t)0xa3000000)
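
The tag numbering scheme from the comment can be checked against these
values with a small Python sketch. The helper names (tag_levels,
is_subtag, tag_family) are made up for this note, and classifying a tag
by its last active level is one reading of the comment, not something
taken from the gcc sources.

```python
GCOV_TAG_FUNCTION = 0x01000000
GCOV_TAG_BLOCKS = 0x01410000
GCOV_TAG_ARCS = 0x01430000
GCOV_TAG_LINES = 0x01450000
GCOV_TAG_COUNTER_BASE = 0x01A10000
GCOV_TAG_OBJECT_SUMMARY = 0xA1000000
GCOV_TAG_PROGRAM_SUMMARY = 0xA3000000


def tag_levels(tag):
    """Return the active 8-bit level values, most significant first."""
    levels = []
    for shift in (24, 16, 8, 0):
        field = (tag >> shift) & 0xFF
        if field == 0:
            break  # unused levels are zero
        levels.append(field)
    return levels


def is_subtag(parent, child):
    """True if child sits below parent in the tag hierarchy."""
    p, c = tag_levels(parent), tag_levels(child)
    return len(c) > len(p) and c[:len(p)] == p


def tag_family(tag):
    """Classify a tag by the range its last active level falls in."""
    levels = tag_levels(tag)
    if not levels:
        return "invalid"
    last = levels[-1]
    if 0x01 <= last <= 0x3F:
        return "common"
    if 0x41 <= last <= 0x9F:
        return "graph"
    if 0xA1 <= last <= 0xFF:
        return "counter"
    return "unknown"
```

With these definitions GCOV_TAG_ARCS is a sub-tag of GCOV_TAG_FUNCTION
and falls in the graph file range, while both summary tags are top-level
tags in the counter file range.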


Experiment shows that a .bbg file at version '303p' is structured
slightly differently from the description above, and is (in BNF):

bbgfile ::= function-record*

function-record ::= function-chunk blocks-chunk arcs-chunk* lines-chunk*

function-chunk ::=  chunk-header(GCOV_TAG_FUNCTION) function-name checksum
function-name ::= string
checksum ::= u32

blocks-chunk ::= chunk-header(GCOV_TAG_BLOCKS) block-flags*
block-flags ::= u32
# one u32 of flags per block, so number of blocks is chunk-length/4

arcs-chunk ::= chunk-header(GCOV_TAG_ARCS) src-block arc*
arc ::= dest-block arc-flags
src-block ::= u32   	# block number of source of arc
dest-block ::= u32   	# block number of destination of arc
arc-flags ::= u32   	# appears to be the same 3 flags as in the old format (on_tree, fake, fall_through)

lines-chunk ::= chunk-header(GCOV_TAG_LINES)

chunk-header ::= chunk-tag chunk-length
chunk-tag ::= u32
chunk-length ::= u32

The only difference from the format described in the comment is that
the GCOV_TAG_FUNCTION chunk holds {name,checksum} instead of
{ident,checksum,name,source,lineno}.
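
A chunk walker for the BNF above might look like the following Python
sketch. All function names are invented for this note. It assumes
big-endian u32 values as stated in the comment, chunk lengths in bytes
(consistent with "number of blocks is chunk-length/4"), and a string
length that counts characters only. The BNF starts directly at the
first function chunk; if a file turns out to begin with the magic and
version words described in the comment, the caller would pass pos=8.

```python
import struct

GCOV_TAG_FUNCTION = 0x01000000
GCOV_TAG_BLOCKS = 0x01410000
GCOV_TAG_ARCS = 0x01430000


def read_string(data, pos):
    """Read a length-prefixed, NUL-terminated, 4-byte-padded string."""
    (length,) = struct.unpack_from(">I", data, pos)
    pos += 4
    if length == 0:
        return None, pos
    text = data[pos:pos + length].decode("ascii")
    return text, pos + ((length + 4) & ~3)


def walk_chunks(data, pos=0):
    """Yield (tag, payload) for each chunk, using the length to step over it."""
    while pos + 8 <= len(data):
        tag, length = struct.unpack_from(">II", data, pos)
        pos += 8
        yield tag, data[pos:pos + length]
        pos += length


def words(payload):
    """Decode a payload as a list of big-endian u32 values."""
    count = len(payload) // 4
    return list(struct.unpack_from(">%dI" % count, payload, 0))


def parse_bbg_functions(data, pos=0):
    """Collect name, checksum, block flags and arcs for each function."""
    functions = []
    for tag, payload in walk_chunks(data, pos):
        if tag == GCOV_TAG_FUNCTION:
            name, offset = read_string(payload, 0)
            (checksum,) = struct.unpack_from(">I", payload, offset)
            functions.append({"name": name, "checksum": checksum,
                              "block_flags": [], "arcs": {}})
        elif not functions:
            continue  # chunk before any function chunk: ignored in this sketch
        elif tag == GCOV_TAG_BLOCKS:
            functions[-1]["block_flags"] = words(payload)
        elif tag == GCOV_TAG_ARCS:
            values = words(payload)
            src, rest = values[0], values[1:]
            # The remaining words are (dest-block, arc-flags) pairs.
            functions[-1]["arcs"][src] = list(zip(rest[0::2], rest[1::2]))
        # Any other chunk (for example GCOV_TAG_LINES) is skipped via its length.
    return functions
```

Skipping unknown chunks by length is what the tag/length header is
meant to allow, so the lines-chunk can be left undecoded here.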

