C++ reflection: extract type inheritance information with code postprocessing

NOTE 1: In the following, “reflection” is meant in the rather narrow sense of the capability of a computer program to observe its own type structure (types, sizes, member layout, member function signatures, inheritance) at run-time.

NOTE 2: The following code is (C) Copyright 2010 Paolo Greppi, libpf.com; the author released version 0.1 on 20100917 with no warranties whatsoever; distribute freely and free of charge, citing this source.


Reflection is required to varying degrees if you want to serialize type instances (i.e. write them to disk or to a database), if you need signals/slots, if you have instances that can be dynamically modified at run-time, if you want to implement a better object factory / object broker, or if you need to provide an API with reflection capabilities.

Reflection is available in many modern programming languages, but it is absent from today’s C++ and will remain absent from tomorrow’s C++ as well, because it has been excluded from the upcoming C++0x standard. There are several approaches to work around this limitation, listed here in order of decreasing obtrusiveness:

Post-processing is to be considered the most orthodox approach, because it retains the centricity of the C++ source code.

In this post we’ll discuss a lightweight post-processing procedure that extracts just the inheritance structure from a type hierarchy and can handle generic types (templated classes), typedefs, and multi-file projects. It is mainly based on GCC-XML as the parser plus XQuery for XML processing, and has been tested on Kubuntu Lucid Lynx 10.04.

So here goes the recipe:

  1. Before you start, get the required programs: sudo apt-get install g++-4.3 gccxml libqt4-xmlpatterns xsltproc

  2. Generate the necessary extracts of the C++ AST in XML format by running the source files through gccxml (the headers will do in most cases, but if you have templates around, run it on the implementation files themselves):

    gccxml --gccxml-compiler g++-4.3 -I ../include headerfile.h -fxml=headerfile.xml -fxml-start=start,from,just,these,six,classes
    

    The --gccxml-compiler option is required because on Kubuntu Lucid the default compiler is g++ 4.4, but the current gccxml package is not very compatible with that version: better use an older one.

    The -fxml-start option can be used to reduce the size of the generated XML file by specifying a list of starting declarations (replace “start,from,just,these,six,classes” with the actual class names from your code!); incidentally, this also prevents gccxml from parsing header files that are not required, such as boost-1.44, which often causes it to crash!
    The generated XML files have the following structure:

    <?xml version="1.0"?>
    <GCC_XML cvs_revision="1.128">
      <Class id="_1" name="edgeBase" context="_3" abstract="1" mangled="8edgeBase" demangled="edgeBase" location="f0:471" file="f0" line="471" artificial="1" size="3360" align="32" members="_4 _5 _6 _7 _8 _9 _10 _11 _12 _13 _14 _15 _16 " bases="_17 _18 ">
        <Base type="_17" access="public" virtual="1" offset="300"/>
        <Base type="_18" access="public" virtual="0" offset="4"/>
      </Class>
      <Class id="_17" name="modelBaseInterface" context="_3" abstract="1" mangled="18modelBaseInterface" demangled="modelBaseInterface" location="f0:167" file="f0" line="167" artificial="1" size="960" align="32" members="_45 _46 _47 _48 _49 _50 _51 _52 _53 _54 _55 _56 _57 _44 _58 _59 _60 _61 _62 _63 _64 _65 _66 _67 _68 _69 _70 _71 _72 _73 _74 _75 _76 _77 _78 _79 _80 _81 _82 _83 _84 _85 _86 _87 _88 _89 _90 _91 _92 _93 _94 _95 _96 _97 _98 _99 _100 _101 _102 _103 _104 _105 _106 _107 _108 _109 _110 _111 _112 _113 _114 _115 _116 _117 _118 _119 _120 _121 _122 _123 _124 _125 _126 _127 _128 _129 " bases="_130 _131 ">
        <Base type="_130" access="public" virtual="0" offset="0"/>
        <Base type="_131" access="public" virtual="1" offset="112"/>
      </Class>
    
      <Typedef id="_144" name="Qdouble" type="_143" context="_3" location="f4:220" file="f4" line="220"/>
    
      <Class id="_143" name="Quantity<double>" context="_3" mangled="8QuantityIdE" demangled="Quantity<double>" location="f4:62" file="f4" line="62" artificial="1" size="832" align="32" members="_318 _319 _320 _321 _322 _323 _324 _325 _326 _327 _328 _329 _330 _331 _332 _333 _334 _335 _336 _337 _338 _339 _340 _341 _342 _343 _344 _345 _346 _347 _348 _349 _350 _351 _352 _353 _354 _355 _356 _357 _358 _359 _360 _361 _362 _363 _364 _365 _366 _367 _368 _369 _370 _371 _372 _373 _374 _375 _376 _377 _378 _379 _380 _381 " bases="_382 _181 ">
        <Base type="_382" access="public" virtual="0" offset="24"/>
        <Base type="_181" access="public" virtual="0" offset="0"/>
      </Class>
    
    <Namespace id="_3" name="::" mangled="_Z2::" demangled="::"/>
    
    ...
    </GCC_XML>
    

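    Here the _[0-9]+ patterns (_1, _17, _143, ...) are clearly unique ids for the various declarations; the context, type, members and bases attributes refer to other elements through these ids.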

  3. Post-process the XML file by running an XQuery script to extract the required information:

    xmlpatterns process_xml.xq -param fileToOpen=.xml -output ../bin/$1_processed.xml
    

    This is the process_xml.xq XQuery script (53 lines of code – LOC):

    declare variable $fileToOpen as xs:anyURI external;
    <reflection>
    {
    <types>
    {
    for $class in doc($fileToOpen)//Class[@context=//Namespace[@name='::']/@id]
    let $name := $class/@name
    return
    <type>{$name}</type>
    }
    </types>
    
    ,
    
    <typedefs>
    {
    for $typedef in doc($fileToOpen)//Typedef[@context=//Namespace[@name='::']/@id]
    let $tdname := $typedef/@name
    let $source := doc($fileToOpen)//Class[@id=string($typedef/@type)]
    return
    if ($source)
    then
    <typedef>{$tdname} {string($source/@name)}</typedef>
    else ()
    }
    </typedefs>
    
    ,
    <dependencies>
    {
    for $class in doc($fileToOpen)//Class[@context=//Namespace[@name='::']/@id]
    let $name := $class/@name
    return
    if (count($class/Base) > 0)
    then
    <dependency>{$name}{
    for $base in $class/Base
    let $basename := doc($fileToOpen)//Class[@id=string($base/@type)]
    return
    <base>{$basename/@name}</base>
    }
    </dependency>
    else ()
    }
    </dependencies>
    
    }
    </reflection>
    

    The post-processed XML is 1-2 orders of magnitude more compact than the one generated by gccxml and looks like this:

    <reflection>
      <types>
        <type name="edgeBase"/>
        <type name="vertexBase"/>
        <type name="modelBaseInterface"/>
    ...
      </types>
      <typedefs>
        <typedef name="Qdouble">Quantity&lt;double&gt;</typedef>
    ...
      </typedefs>
      <dependencies>
        <dependency name="edgeBase">
          <base name="modelBaseInterface"/>
          <base name="precedence"/>
        </dependency>
        <dependency name="vertexBase">
          <base name="modelBaseInterface"/>
          <base name="task"/>
        </dependency>
    ...
      </dependencies>
    </reflection>
    
  4. Finally, run an XSLT transformation to generate the C++ code:

    xsltproc ../scripts/generate_hierarchy.xslt ../bin/$1_processed.xml >> classes.cc
    

    This is the required generate_hierarchy.xslt XSLT (Extensible Stylesheet Language Transformations) code (17 LOC):

    <?xml version="1.0"?>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsch="http://www.w3.org/1999/XMLSchema" version="1.0">
      <xsl:output method="text"/>
      <xsl:template match="/reflection/types/type">
    insert_type_("<xsl:value-of select="@name"/>");</xsl:template>
      <xsl:template match="/reflection/typedefs/typedef">
    insert_synonym_("<xsl:value-of select="@name"/>", "<xsl:value-of select="text()"/>");</xsl:template>
      <xsl:template match="/reflection/dependencies/dependency">
        <xsl:variable name="classname" select="@name"/>
          <xsl:for-each select="base">
    insert_dependency_("<xsl:value-of select="@name"/>", "<xsl:value-of select="$classname"/>");</xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
    

    The generated C++ code looks like this:

    insert_type_("edgeBase");
    insert_type_("vertexBase");
    insert_type_("modelBaseInterface");
    ...
    insert_synonym_("Qdouble", "Quantity<double>");
    insert_synonym_("UOMarray", "UOMarrayGen<int>");
    ...
    insert_dependency_("modelBaseInterface", "edgeBase");
    insert_dependency_("modelBaseInterface", "vertexBase");
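
Looking back at step 3, the core of the XQuery script is a join: each Base element only carries an id, which has to be looked up among the Class elements to obtain a name. The following is a minimal C++11 sketch of that lookup on in-memory data, not part of the toolchain described here. The ClassRecord struct and the resolve_dependencies function are hypothetical names introduced for illustration, and unlike the XQuery script the sketch silently skips base ids that do not resolve to a known class.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one <Class> element:
// its id, its name, and the ids found in its <Base> children.
struct ClassRecord {
    std::string id;
    std::string name;
    std::vector<std::string> base_ids;
};

// Return (base name, derived name) pairs, i.e. the argument order
// used by the generated insert_dependency_ calls.
std::vector<std::pair<std::string, std::string>>
resolve_dependencies(const std::vector<ClassRecord>& classes) {
    // First pass: index class names by id.
    std::map<std::string, std::string> name_by_id;
    for (const ClassRecord& c : classes) name_by_id[c.id] = c.name;

    // Second pass: translate every base id into a name.
    std::vector<std::pair<std::string, std::string>> result;
    for (const ClassRecord& c : classes) {
        for (const std::string& base_id : c.base_ids) {
            auto it = name_by_id.find(base_id);
            // Assumption: bases that are not known classes are skipped.
            if (it != name_by_id.end()) result.emplace_back(it->second, c.name);
        }
    }
    return result;
}
```

Indexing the names once keeps the lookup cheap, whereas the XQuery script rescans the document for every Base element; for the file sizes discussed here either approach should be adequate.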
    

If you have a multi-file project, you need to purge the duplicates that occur when the same header file is included by several source files. Here is a quick shell fix, which prints the deduplicated lines to standard output:

```
grep insert_type classes.cc | sort | uniq
grep insert_synonym classes.cc | sort | uniq
grep insert_dependency classes.cc | sort | uniq
```
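
If you would rather do the clean-up in C++, the same idea can be sketched with std::set, which both sorts and drops duplicates. This is an illustrative C++11 sketch only: the function names unique_matching_lines and write_deduplicated are assumptions, the match is a plain substring search rather than a grep regular expression, and the ordering is byte-wise, which may differ from what sort produces under your locale.

```cpp
#include <iostream>
#include <set>
#include <sstream>
#include <string>

// Collect the distinct lines of `text` that contain `pattern`.
// std::set keeps them sorted (byte-wise) and unique.
std::set<std::string> unique_matching_lines(const std::string& text,
                                            const std::string& pattern) {
    std::istringstream in(text);
    std::set<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(pattern) != std::string::npos) lines.insert(line);
    }
    return lines;
}

// Write the three groups in the same order as the three grep commands.
void write_deduplicated(const std::string& text, std::ostream& out) {
    const char* patterns[] = {"insert_type", "insert_synonym", "insert_dependency"};
    for (const char* pattern : patterns) {
        for (const std::string& line : unique_matching_lines(text, pattern)) {
            out << line << '\n';
        }
    }
}
```

Calling write_deduplicated with the contents of classes.cc and std::cout as the output stream would print the cleaned-up list.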

Compared with its closest “competitor”, Root’s Reflex (which is also based on gccxml but uses Python scripts to extract the information and generate the C++ code), this procedure is much more limited in scope, but it consists of fewer than 80 LOC! For comparison:

```
cat root/cint/reflex/python/genreflex/*py | wc
```

reports 4643 LOC!

In a later post we’ll give away the C++ code that integrates with this automatically generated code to make the actual run-time reflection work.
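
In the meantime, here is one hypothetical shape that the receiving side could take, to make the generated calls less abstract. It is a minimal C++11 sketch under stated assumptions and not the author's code: only the names insert_type_, insert_synonym_ and insert_dependency_ and their argument order come from the generated file, while the containers, the helpers is_known_type, resolve and derives_from, and the register_generated function are invented for illustration.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical registry: function names match the generated code,
// everything else is an assumption.
namespace {
std::set<std::string> types_;                           // registered type names
std::map<std::string, std::string> synonyms_;           // typedef name -> underlying type
std::map<std::string, std::set<std::string>> derived_;  // base -> directly derived types
}  // namespace

void insert_type_(const std::string& name) { types_.insert(name); }

void insert_synonym_(const std::string& alias, const std::string& target) {
    synonyms_[alias] = target;
}

// Same argument order as the generated calls: base first, derived second.
void insert_dependency_(const std::string& base, const std::string& derived) {
    derived_[base].insert(derived);
}

bool is_known_type(const std::string& name) { return types_.count(name) > 0; }

// Map a typedef name to the type it stands for; other names pass through.
std::string resolve(const std::string& name) {
    auto it = synonyms_.find(name);
    return it == synonyms_.end() ? name : it->second;
}

// True if `derived` inherits from `base`, directly or through intermediates.
bool derives_from(const std::string& derived, const std::string& base) {
    const std::string target = resolve(derived);
    auto it = derived_.find(resolve(base));
    if (it == derived_.end()) return false;
    for (const std::string& child : it->second) {
        if (child == target || derives_from(target, child)) return true;
    }
    return false;
}

// Stand-in for the generated classes.cc, reusing a few of the lines shown above.
void register_generated() {
    insert_type_("edgeBase");
    insert_type_("vertexBase");
    insert_type_("modelBaseInterface");
    insert_synonym_("Qdouble", "Quantity<double>");
    insert_dependency_("modelBaseInterface", "edgeBase");
    insert_dependency_("modelBaseInterface", "vertexBase");
}
```

Since the generated file is just a sequence of statements, the body of register_generated could presumably be replaced by an #include of classes.cc; that is a guess about how the file is meant to be consumed. After calling register_generated once at start-up, derives_from("edgeBase", "modelBaseInterface") returns true and resolve("Qdouble") returns "Quantity<double>".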