The use of the code analysis library
OpenC++: modifications, improvements,
error corrections
Author: Andrey Karpov

Date: 12.01.2008


Abstract
This article may interest developers who use, or plan to use, the OpenC++ (OpenCxx) library. The
author describes his experience of improving the library and adapting it to special tasks.


Introduction
One often hears in forums that there are plenty of C++ syntax analyzers ("parsers"), many of them
free, or that one can take YACC, for example, and easily implement one's own analyzer. Don't
believe it: it is not that easy [1, 2]. This becomes especially clear once you remember that parsing
the syntax is not even half of the task. You also have to implement structures for storing the program
tree and semantic tables that hold information about the various objects and their scopes. This matters
most when developing specialized applications for processing and static analysis of C++ code. Such
applications need the whole program tree to be preserved, which only a few libraries can provide.
One of them is the open library OpenC++ (OpenCxx) [3], which is the subject of this article.

We would like to help developers master the OpenC++ library and to share our experience of
modernizing it and correcting some of its defects. The article is a collection of tips, each devoted
to correcting a defect or implementing an improvement.

The article is based on recollections of the changes made in the VivaCore library [4], which is based
on OpenC++. Of course, only a small part of these changes is discussed here; remembering and
describing all of them would be a difficult task. A description of how C language support was added
to OpenC++, for example, would take too much space. But you can always refer to the source code of
the VivaCore library for a lot of interesting information.

It remains to say that the OpenC++ library is, unfortunately, out of date and needs serious work
to support the modern C++ language standard. So if you are going to implement a modern
compiler, for example, you had better turn to GCC or to commercial libraries [5, 6]. But
OpenC++ remains a good and convenient tool for many developers of systems for specialized
processing and modification of program code. Many interesting solutions have been developed with
OpenC++, for example the OpenTS execution environment [7] for the T++ programming language
(developed at the Program Systems Institute of the RAS), the static code analyzer Viva64 [8], and the
Synopsis tool for generating documentation from source code [9].

The purpose of the article is to show by example how one can modify and improve the OpenC++ library
code. It describes 15 library modifications that either correct errors or add new functionality.
Each of them not only makes the OpenC++ library better but also gives an opportunity to study its
working principles more deeply. Let's get acquainted with them.


1. Skipping development environment keywords that do not affect
program processing
When developing a code analyzer for a specific development environment, you are likely to come across
its specific language constructs. These constructs are often directives for a particular compiler
and may be of no interest to you. But the OpenC++ library cannot process them, because they are not
part of the C++ language. In this case one of the simplest ways to ignore them is to add them
to the rw_table table with the Ignore key. For example:

static rw_table table[] = {

      ...

      { "__ptr32",              Ignore},

      { "__ptr64",              Ignore},

      { "__unaligned", Ignore},

      ...

};

When adding entries, keep in mind that the words in the rw_table table must be arranged in
alphabetical order. Be careful.
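
The ordering requirement is easier to remember once you see why it exists. The sketch below assumes that the table is looked up with a binary search, which is the usual reason for keeping a keyword table sorted. The names KeywordEntry, kKeywords and LookupKeyword and the token values are hypothetical and are not taken from OpenC++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical token codes, for illustration only.
enum { kNotKeyword = 0, kIgnore = 1, kInt = 2, kWhile = 3 };

struct KeywordEntry {
    const char *name;
    int         token;
};

// Must stay sorted by name (strcmp order); otherwise the search below
// silently misses entries.
static const KeywordEntry kKeywords[] = {
    { "__ptr32",     kIgnore },
    { "__ptr64",     kIgnore },
    { "__unaligned", kIgnore },
    { "int",         kInt    },
    { "while",       kWhile  },
};

// Binary search over the sorted table; returns kNotKeyword if absent.
int LookupKeyword(const char *word) {
    assert(word != NULL);
    std::ptrdiff_t low = 0;
    std::ptrdiff_t high = static_cast<std::ptrdiff_t>(
        sizeof(kKeywords) / sizeof(kKeywords[0])) - 1;
    while (low <= high) {
        const std::ptrdiff_t mid = low + (high - low) / 2;
        const int cmp = std::strcmp(word, kKeywords[mid].name);
        if (cmp == 0)
            return kKeywords[mid].token;
        if (cmp < 0)
            high = mid - 1;
        else
            low = mid + 1;
    }
    return kNotKeyword;
}
```

Note that with a byte-wise comparison the underscore sorts before the lowercase letters, so keywords such as "__ptr32" come before "int". An entry placed out of order would not cause a compile error; it would simply never be found.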


2. Addition of a new lexeme
If you want to add a keyword that must be processed, you need to create a new lexeme ("token").
Let's look at the example of adding a new keyword, "__w64". First, create an identifier for the new
lexeme (see the token-name.h file), for example like this:

enum {

   Identifier = 258,

   Constant = 262,

   ...

   W64 = 346, // New token name

   ...

};

Update the table "table" in the lex.cc file:

static rw_table table[] = {

      ...
      { "__w64",               W64 },

      ...

};

The next step is to create a class for the new lexeme, which we'll call LeafW64.

namespace Opencxx

{

class LeafW64 : public LeafReserved {

public:

    LeafW64(Token& t) : LeafReserved(t) {}

    LeafW64(char* str, ptrdiff_t len) :

       LeafReserved(str, len) {}

    ptrdiff_t What() { return W64; }

};

}

To create the object, we need to modify the optIntegralTypeOrClassSpec() function:

...

case UNSIGNED :

    flag = 'U';

    kw = new (GC) LeafUNSIGNED(tk);

    break;

case W64 : // NEW!

    flag = 'W';

    kw = new (GC) LeafW64(tk);

    break;

...

Note that since we have decided to treat "__w64" as a data type, we need the 'W' symbol to
encode this type. You can learn more about the type encoding mechanism in the Encoding.cc file.

When introducing a new type, we must remember to update functions such as
Parser::isTypeSpecifier() as well.

And the last important point is the modification of the Encoding::MakePtree function:

Ptree* Encoding::MakePtree(unsigned char*& encoded, Ptree* decl)

{

      ...

            case 'W' :

                  typespec = PtreeUtil::Snoc(typespec, w64_t);

                  break;

      ...

}

Of course, this is only an example, and adding other lexemes may take much more effort. A good way to
add a new lexeme correctly is to take an existing one that is close to it in meaning and then find and
examine all the places in the OpenC++ library where it is used.


3. Skipping complex development environment constructs that do not
affect program processing
We have already examined how to skip single keywords that are meaningless for our program but
impede code parsing. Unfortunately, things are sometimes more difficult. As a demonstration, let's take
the __pragma and __noop constructs, which you can see in the Visual C++ header files:

__forceinline DWORD HEAP_MAKE_TAG_FLAGS (

       DWORD TagBase, DWORD Tag )

{

    __pragma(warning(push)) __pragma(warning(disable : 4548)) do
{__noop(TagBase);} while((0,0) __pragma(warning(pop)) );

      return ((DWORD)((TagBase) + ((Tag) << 18)));

}

You can find descriptions of the __pragma and __noop constructs in MSDN. The following points are
important for our program: a) they are of no interest to us; b) they take parameters; c) they
impede code analysis.

First let's add the new lexemes as described above, but this time let's use the
InitializeOtherKeywords() function for this purpose:

static void InitializeOtherKeywords(bool recognizeOccExtensions)

{

    ...

    verify(Lex::RecordKeyword("__pragma", MSPRAGMA));

    verify(Lex::RecordKeyword("__noop", MS__NOOP));
    ...

}

The solution is to modify the Lex::ReadToken function so that when we come across a DECLSPEC,
MSPRAGMA or MS__NOOP lexeme, we skip it, and then skip all the lexemes that belong to the parameters
of __pragma and __noop. To skip all the unnecessary lexemes we use the SkipDeclspecToken() function,
as shown below.

ptrdiff_t Lex::ReadToken(char*& ptr, ptrdiff_t& len)

{

       ...

           else if(t == DECLSPEC){

                 SkipDeclspecToken();

                 continue;

           }

           else if(t == MSPRAGMA) { // NEW

                 SkipDeclspecToken();

                 continue;

           }

           else if(t == MS__NOOP) { //NEW

               SkipDeclspecToken();

               continue;

           }

     ...

}
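
What such a skipping routine has to do can be sketched independently of the library. The sketch assumes that the keyword is followed by a single parenthesized argument list which may contain nested parentheses. SkipParenthesizedArgs and the character-level scanning are illustrative assumptions and not the actual SkipDeclspecToken() code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index just past the balanced "(...)" group that starts at or
// after `pos` (leading blanks are skipped). Returns std::string::npos if
// there is no opening parenthesis or the group is never closed.
std::size_t SkipParenthesizedArgs(const std::string &text, std::size_t pos) {
    assert(pos <= text.size());
    while (pos < text.size() && (text[pos] == ' ' || text[pos] == '\t'))
        ++pos;
    if (pos == text.size() || text[pos] != '(')
        return std::string::npos;
    int depth = 0;
    for (; pos < text.size(); ++pos) {
        if (text[pos] == '(') {
            ++depth;
        } else if (text[pos] == ')') {
            if (--depth == 0)
                return pos + 1;
        }
    }
    return std::string::npos;  // unbalanced input
}
```

Counting nesting depth is what lets the routine step over arguments such as "(warning(disable : 4548))" as a whole instead of stopping at the first closing parenthesis.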


4. A function for expanding full file paths
In source code analysis tasks, a large share of the functionality deals with generating error
messages and navigating through source files. The inconvenience is that the file names returned by
functions such as Program::LineNumber() may be presented in different ways. Here are some examples:

C:\Program Files\MSVS 8\VC\atlmfc\include\afx.h

.\drawing.cpp

c:\src\wxwindows-2.4.2\samples\drawing\wx/defs.h

Boost\boost-1_33_1\boost/variant/recursive_variant.hpp

..\FieldEdit2\Src\amsEdit.cpp

..\..\..\src\base\ftbase.c

A path may be full or relative, and different delimiters may be used. All this makes such paths
inconvenient to process or to print in messages. That is why we offer an implementation of the
FixFileName() function, which brings paths to a uniform full form. The auxiliary function
GetInputFileDirectory() returns the path to the directory that contains the file being processed.

const string &GetInputFileDirectory() {

    static string oldInputFileName;

    static string fileDirectory;

    string dir;

    VivaConfiguration &cfg = VivaConfiguration::Instance();

    string inputFileName;

    cfg.GetInputFileName(inputFileName);

    if (oldInputFileName == inputFileName)

        return fileDirectory;

    oldInputFileName = inputFileName;

    filesystem::path inputFileNamePath(inputFileName,
                                       filesystem::native);

    fileDirectory = inputFileNamePath.branch_path().string();

    if (fileDirectory.empty()) {

        TCHAR curDir[MAX_PATH];

        if (GetCurrentDirectory(MAX_PATH, curDir) != 0) {

            fileDirectory = curDir;

        } else {

            assert(false);

        }

    }

    algorithm::replace_all(fileDirectory, "/", "\\");

    to_lower(fileDirectory);

    return fileDirectory;

}

typedef map<string, string> StrStrMap;

typedef StrStrMap::iterator StrStrMapIt;

void FixFileName(string &fileName) {

 static StrStrMap FileNamesMap;

 StrStrMapIt it = FileNamesMap.find(fileName);

 if (it != FileNamesMap.end()) {

     fileName = it->second;

     return;

 }

 string oldFileName = fileName;

 algorithm::replace_all(fileName, "/", "\\");

 algorithm::replace_all(fileName, "\\\\", "\\");

 filesystem::path tmpPath(fileName, filesystem::native);

 fileName = tmpPath.string();

 algorithm::replace_all(fileName, "/", "\\");

 to_lower(fileName);

 if (fileName.length() < 2) {

     assert(false);

     FileNamesMap.insert(make_pair(oldFileName, fileName));

     return;

 }



 if (fileName[0] == '.' && fileName[1] != '.') {

     const string &dir = GetInputFileDirectory();

     if (!dir.empty())

      fileName.replace(0, 1, dir);

     FileNamesMap.insert(make_pair(oldFileName, fileName));

     return;

 }

 if (isalpha(fileName[0]) && fileName[1] == ':' ) {

     FileNamesMap.insert(make_pair(oldFileName, fileName));

     return;

 }

 const string &dir = GetInputFileDirectory();

 if (dir.empty())

     fileName.insert(0, ".\\");

 else {

     fileName.insert(0, "\\");

     fileName.insert(0, dir);

 }

 FileNamesMap.insert(make_pair(oldFileName, fileName));

}
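
The delimiter and case unification step can be tried in isolation. The sketch below uses only the standard library instead of Boost and performs no file system access; NormalizeDelimiters is a hypothetical helper, not part of VivaCore, and it does not resolve "." or ".." components.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: converts '/' to '\', collapses doubled delimiters
// and lowercases the path, so that equal paths compare equal as strings.
std::string NormalizeDelimiters(const std::string &path) {
    std::string result;
    result.reserve(path.size());
    for (std::size_t i = 0; i < path.size(); ++i) {
        char c = path[i];
        if (c == '/')
            c = '\\';  // one delimiter style
        if (c == '\\' && !result.empty() &&
            result[result.size() - 1] == '\\')
            continue;  // collapse doubled delimiters
        result += static_cast<char>(
            std::tolower(static_cast<unsigned char>(c)));
    }
    assert(result.find('/') == std::string::npos);
    return result;
}
```

Lowercasing is reasonable only because Windows file names are case-insensitive; on a case-sensitive file system this step would have to be dropped.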


5. Getting values of numerical literals
A function that returns the value of a numerical literal may be useful in systems that build
documentation from code. For example, with its help one can find out that the argument of the
"void foo(a = 99)" function is 99 and use this for some purpose.

The GetLiteralType() function that we offer returns the type of a literal and, if the literal is an
integer, its value. GetLiteralType() was written to obtain the information that is needed most often
and does not support rarely used notations. But if you need to support UCNs, for example, or to get
values of type double, you can extend the functions given below yourself.

unsigned __int64 GetHexValue(unsigned char c) {

    if (c >= '0' && c <= '9')

     return c - '0';

    if (c >= 'a' && c <= 'f')

     return c - 'a' + 0x0a;

    if (c >= 'A' && c <= 'F')

     return c - 'A' + 0x0a;

    assert(false);

    return 0;

}

bool GetHex(const char *&from, size_t len,

                unsigned __int64 &retValue) {
    unsigned __int64 c, n = 0, overflow = 0;

    int digits_found = 0;

    const char *limit = from + len;

    while (from < limit)

    {

        c = *from;

        if (!isxdigit(c))

         break;

        from++;

        overflow |= n ^ (n << 4 >> 4);

        n = (n << 4) + GetHexValue(c);

        digits_found = 1;

    }

    if (!digits_found)

        return false;

    if (overflow) {

        assert(false);

    }

    retValue = n;

    return true;

}

bool GetOct(const char *&from, size_t len,

                  unsigned __int64 &retValue) {

    unsigned __int64 c, n = 0;

    bool overflow = false;

    const char *limit = from + len;

    while (from < limit)

    {

        c = *from;

        if (c < '0' || c > '7')
         break;

        from++;

        overflow |= static_cast<bool>(n ^ (n << 3 >> 3));

        n = (n << 3) + c - '0';

    }

    retValue = n;

    return true;

}

#define HOST_CHARSET_ASCII

bool GetEscape(const char *from, size_t len,

                   unsigned __int64 &retValue) {

    /* Values of \a \b \e \f \n \r \t \v respectively. */

    // HOST_CHARSET_ASCII

    static const char charconsts[] =

        {   7,   8, 27, 12, 10, 13,   9, 11 };

    // HOST_CHARSET_EBCDIC

    // static const uchar charconsts[] =

    //     { 47, 22, 39, 12, 21, 13,   5, 11 };

    unsigned char c;

    c = from[0];

    switch (c)

    {

        /* UCNs, hex escapes, and octal escapes

          are processed separately.    */

    case 'u': case 'U':

        // convert_ucn - not supported. Return: 65535.

        retValue = 0xFFFFui64;

        return true;

    case 'x': {

        const char *p = from + 1;
        return GetHex(p, len, retValue);

    }

    case '0':    case '1':   case '2':   case '3':

    case '4':    case '5':   case '6':   case '7': {

        const char *p = from + 1;

        return GetOct(p, len, retValue);

    }



    case '\\': case '\'': case '"': case '?':

        break;

    case 'a': c = charconsts[0]; break;

    case 'b': c = charconsts[1];    break;

    case 'f': c = charconsts[3];    break;

    case 'n': c = charconsts[4];    break;

    case 'r': c = charconsts[5];    break;

    case 't': c = charconsts[6];    break;

    case 'v': c = charconsts[7];    break;

    case 'e': case 'E': c = charconsts[2]; break;

    default:

        assert(false);

        return false;

    }

    retValue = c;

    return true;

}

// 'A', '\t', L'A', '\xFE'

static bool GetCharLiteral(const char *from,

                               size_t len,

                               unsigned __int64 &retValue) {

    if (len >= 3) {
        if (from[0] == '\'' && from[len - 1] == '\'') {

            unsigned char c = from[1];

            if (c == '\\') {

                verify(GetEscape(from + 2, len - 3, retValue));

            } else {

                retValue = c;

            }

            return true;

        }

    }

    if (len >= 4) {

        if (from[0] == 'L' &&

                from[1] == '\'' &&

                from[len - 1] == '\'') {

            unsigned char c = from[2];

            if (c == '\\') {

                verify(GetEscape(from + 3, len - 4, retValue));

            } else {

                retValue = c;

            }

            return true;

        }

    }

    return false;

}

// "string"

static bool GetStringLiteral(const char *from, size_t len) {

    if (len >= 2) {

        if (from[0] == '"' && from[len - 1] == '"')

            return true;
    }

    if (len >= 3) {

        if (from[0] == 'L' &&

           from[1] == '"' &&

           from[len - 1] == '"')

         return true;

    }

    return false;

}

bool IsRealLiteral(const char *from, size_t len) {

    if (len < 2)

        return false;

    bool isReal = false;

    bool digitFound = false;

    for (size_t i = 0; i != len; ++i) {

        unsigned char c = from[i];

        switch(c) {

         case 'x': return false;

         case 'X': return false;

         case 'f': isReal = true; break;

         case 'F': isReal = true; break;

         case '.': isReal = true; break;

         case 'e': isReal = true; break;

         case 'E': isReal = true; break;

         case 'l': break;

         case '-': break;

         case '+': break;

         case 'L': break;

         default:

           if (!isdigit(c))
             return false;

             digitFound = true;

        }

    }

    return isReal && digitFound;

}

SimpleType GetRealLiteral(const char *from, size_t len) {

    assert(len > 1);

    unsigned char rc1 = from[len - 1];

    if (is_digit(rc1) || rc1 == '.' ||

            rc1 == 'l' || rc1 == 'L' ||

            rc1 == 'e' || rc1 == 'E')

        return ST_DOUBLE;

    if (rc1 == 'f' || rc1 == 'F')

        return ST_FLOAT;

    assert(false);

    return ST_UNKNOWN;

}

bool GetBoolLiteral(const char *from, size_t len,

                         unsigned __int64 &retValue) {

    if (len == 4 && strncmp(from, "true", 4) == 0) {

        retValue = 1;

        return true;

    }

    if (len == 5 && strncmp(from, "false", 5) == 0) {

        retValue = 0;

        return true;

    }

    return false;

}

bool IsHexLiteral(const char *from, size_t len) {

    if (len < 3)

     return false;

    if (from[0] != '0')

     return false;

    if (from[1] != 'x' && from[1] != 'X')

     return false;

    return true;

}

SimpleType GetTypeBySufix(const char *from, size_t len) {

    assert(from != NULL);

    if (len == 0)

     return ST_INT;

    assert(!isdigit(*from));

    bool suffix_8 = false;

    bool suffix_16 = false;

    bool suffix_32 = false;

    bool suffix_64 = false;

    bool suffix_i = false;

    bool suffix_l = false;

    bool suffix_u = false;

    while (len != 0) {

     --len;

     const char c = *from++;

     switch(c) {

       case '8': suffix_8 = true; break;

       case '1':

         if (len == 0 || *from++ != '6') {

           assert(false);

           return ST_UNKNOWN;
         }

     --len;

     suffix_16 = true;

     break;

    case '3':

     if (len == 0 || *from++ != '2') {

         assert(false);

         return ST_UNKNOWN;

     }

     --len;

     suffix_32 = true;

     break;

    case '6':

     if (len == 0 || *from++ != '4') {

         assert(false);

         return ST_UNKNOWN;

     }

     --len;

     suffix_64 = true;

     break;

    case 'I':

    case 'i': suffix_i = true; break;

    case 'U':

    case 'u': suffix_u = true; break;

    case 'L':

    case 'l': suffix_l = true; break;

    default:

     assert(false);

     return ST_UNKNOWN;

     }

    }

    assert(suffix_8 + suffix_16 + suffix_32 + suffix_64 <= 1);



    if (suffix_8 || suffix_16)

        return ST_LESS_INT;



    if (suffix_32) {

        if (suffix_u)

         return ST_UINT;

        else

         return ST_INT;

    }

    if (suffix_64) {

        if (suffix_u)

         return ST_UINT64;

        else

         return ST_INT64;

    }

    if (suffix_l) {

        if (suffix_u)

         return ST_ULONG;

        else

         return ST_LONG;

    }

    if (suffix_u)

        return ST_UINT;

    assert(suffix_i);

    return ST_INT;

}

SimpleType GetHexLiteral(const char *from, size_t len,
                             unsigned __int64 &retValue) {

    assert(len >= 3);

    const char *p = from + 2;

    // p points past the "0x" prefix, so only len - 2 characters remain.

    if (!GetHex(p, len - 2, retValue)) {

        return ST_UNKNOWN;

    }

    ptrdiff_t newLen = len - (p - from);

    assert(newLen >= 0 && newLen < static_cast<ptrdiff_t>(len));

    return GetTypeBySufix(p, newLen);

}

bool IsOctLiteral(const char *from, size_t len) {

    if (len < 2)

        return false;

    if (from[0] != '0')

        return false;

    return true;

}

SimpleType GetOctLiteral(const char *from, size_t len,

                             unsigned __int64 &retValue) {

    assert(len >= 2);

    const char *p = from + 1;

    // p points past the leading '0', so only len - 1 characters remain.

    if (!GetOct(p, len - 1, retValue)) {

        return ST_UNKNOWN;

    }

    ptrdiff_t newLen = len - (p - from);

    assert(newLen >= 0 && newLen < static_cast<ptrdiff_t>(len));

    return GetTypeBySufix(p, newLen);

}

SimpleType GetDecLiteral(const char *from, size_t len,

                             unsigned __int64 &retValue) {
    assert(len >= 1);

    const char *limit = from + len;

    unsigned __int64 n = 0;

    while (from < limit) {

        const char c = *from;

        if (c < '0' || c > '9')

         break;

        from++;

        n = n * 10 + (c - '0');

    }

    ptrdiff_t newLen = limit - from;

    if (newLen == static_cast<ptrdiff_t>(len))

        return ST_UNKNOWN;

    retValue = n;

    assert(newLen >= 0 && newLen < static_cast<ptrdiff_t>(len));

    return GetTypeBySufix(from, newLen);

}

SimpleType GetLiteralType(const char *from, size_t len,

                                unsigned __int64 &retValue) {

    if (from == NULL || len == 0)

        return ST_UNKNOWN;

    retValue = 1;

    if (GetCharLiteral(from, len, retValue))

        return ST_LESS_INT;

    if (GetStringLiteral(from, len))

        return ST_POINTER;

    if (GetBoolLiteral(from, len, retValue))

        return ST_LESS_INT;

    if (IsRealLiteral(from, len))

      return GetRealLiteral(from, len);

    if (IsHexLiteral(from, len))

      return GetHexLiteral(from, len, retValue);

    if (IsOctLiteral(from, len))

      return GetOctLiteral(from, len, retValue);

    return GetDecLiteral(from, len, retValue);

}
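
The overflow check in GetHex(), "n ^ (n << 4 >> 4)", deserves a short explanation: shifting left by four bits discards the top four bits of n, so if shifting back does not restore the original value, those bits were set and the next digit would not fit. The sketch below isolates this check. It uses the standard std::uint64_t instead of unsigned __int64, and ParseHexDigits is a hypothetical name, not a VivaCore function.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Parses the leading hexadecimal digits of `text` into `value`.
// Sets `overflow` if the digits do not fit into 64 bits.
// Returns the number of digits consumed.
std::size_t ParseHexDigits(const std::string &text, std::uint64_t &value,
                           bool &overflow) {
    std::uint64_t n = 0;
    std::size_t i = 0;
    overflow = false;
    for (; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (!std::isxdigit(c))
            break;
        // If shifting left and back right changes n, its top four bits
        // are set and would be lost by the shift below.
        if (n != ((n << 4) >> 4))
            overflow = true;
        const int digit = std::isdigit(c) ? c - '0'
                                          : std::tolower(c) - 'a' + 10;
        assert(digit >= 0 && digit < 16);
        n = (n << 4) + static_cast<std::uint64_t>(digit);
    }
    value = n;
    return i;
}
```

Unlike GetHex(), which only asserts on overflow, this variant reports it to the caller, so that a documentation tool can decide for itself how to treat an oversized literal.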


6. Correcting the string literal processing function
We suggest modifying the Lex::ReadStrConst() function as shown below. This corrects two errors
related to the processing of split string literals. The first error occurs when processing strings
of the following kind:

const char *name = "Viva\

Core";

The second:

const wchar_t *str = L"begin"L"end".

The corrected function variant:

bool Lex::ReadStrConst(size_t top, bool isWcharStr)

{

      char c;

      for(;;){

            c = file->Get();

            if(c == '\\'){

                  c = file->Get();

                  // Support: "\"

                  if (c == '\r') {

                     c = file->Get();

                     if (c != '\n')

                        return false;

                  } else if(c == '\0')

                     return false;

            }

            else if(c == '"'){

                  size_t pos = file->GetCurPos() + 1;

                  ptrdiff_t nline = 0;

                  do{

                        c = file->Get();

                        if(c == '\n')

                              ++nline;

                  } while(is_blank(c) || c == '\n');

                  if (isWcharStr && c == 'L') {

                        //Support: L"123" L"456" L "789".

                        c = file->Get();

                        if(c == '"')

                              /* line_number += nline; */ ;

                        else{

                              file->Unget();

                              return false;

                        }

                  } else {

                        if(c == '"')

                              /* line_number += nline; */ ;

                        else{

                              token_len = ptrdiff_t(pos - top);

                              file->Rewind(pos);

                              return true;

                        }

                  }

            }

            else if(c == '\n' || c == '\0')

                  return false;

      }

}
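
The two cases this function has to handle, a backslash at the end of a line and adjacent literals, can be experimented with outside the lexer. The following is a simplified sketch that works on a std::string rather than on the lexer's input file. ScanStringLiteral is a hypothetical name, and, unlike the function above, the sketch accepts the L prefix before any continuation piece without checking that the first piece was a wide literal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Scans one string literal, or a run of adjacent literals, starting at the
// opening quote src[pos]. Returns the index just past the last closing
// quote, or std::string::npos for an unterminated literal.
std::size_t ScanStringLiteral(const std::string &src, std::size_t pos) {
    assert(pos < src.size() && src[pos] == '"');
    std::size_t i = pos + 1;
    while (i < src.size()) {
        const char c = src[i++];
        if (c == '\\') {
            if (i == src.size())
                return std::string::npos;
            // A backslash escapes the next character. Before a line break
            // this is a line continuation; "\r\n" counts as one break.
            if (src[i] == '\r' && i + 1 < src.size() && src[i + 1] == '\n')
                ++i;
            ++i;
        } else if (c == '\n') {
            return std::string::npos;  // raw line break inside a literal
        } else if (c == '"') {
            const std::size_t end = i;  // just past the closing quote
            // Look past blanks and line breaks for an adjacent literal.
            while (i < src.size() && (src[i] == ' ' || src[i] == '\t' ||
                                      src[i] == '\r' || src[i] == '\n'))
                ++i;
            if (i + 1 < src.size() && src[i] == 'L' && src[i + 1] == '"')
                ++i;  // wide-literal prefix of the next piece
            if (i == src.size() || src[i] != '"')
                return end;  // no continuation: the literal ends here
            ++i;  // step over the opening quote of the next piece
        }
    }
    return std::string::npos;
}
```

Remembering the position just past the closing quote before looking ahead plays the same role as the Rewind(pos) call in Lex::ReadStrConst(): if no adjacent literal follows, the blanks that were skipped must not become part of the token.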


7. Partial correction of the processing of expressions like
"bool r = a < 1 || b > (int) 2;"
OpenC++ has an error in the processing of some expressions, which are wrongly taken for templates.
For example, in the line "bool r = a < 1 || b > (int) 2;" the variable "a" is taken for a template
name, and a lot of trouble with syntactic analysis follows. A full correction of this error
requires major changes and has not been implemented yet. We offer a temporary solution that
eliminates most of the errors. The functions that should be added or modified are given below.

bool VivaParser::MaybeTypeNameOrClassTemplate(Token &token) {

    if (m_env == NULL) {

        return true;

    }

    const char *ptr = token.GetPtr();

    ptrdiff_t len = token.GetLen();

    Bind *bind;

    bool isType = m_env->LookupType(ptr, len, bind);

    return isType;

}

static bool isOperatorInTemplateArg(ptrdiff_t t) {

    return t == AssignOp || t == EqualOp || t == LogOrOp ||

              t == LogAndOp || t == IncOp || t == RelOp;

}

/*

    template.args : '<' any* '>'

    template.args must be followed by '(' or '::'

*/

bool Parser::isTemplateArgs()

{

        ptrdiff_t i = 0;
        ptrdiff_t t = lex->LookAhead(i++);

   if(t == '<'){

          ptrdiff_t n = 1;

          while(n > 0){

             ptrdiff_t u = lex->LookAhead(i++);

            /*

             TODO. :(

             Fixing: bool r = a < 1 || b > (int) 2;

             This does not correct all the cases, but it is better

             than nothing.

             Method: if an identifier is found next to the operator,

             this is obviously not a template, because only a type or

             a constant expression may appear inside the brackets.

             An example which still doesn't work:

             r = a < fooi() || 1 > (int) b;



             Unfortunately, the following expression is now processed

             incorrectly, but such cases are rarer than the corrected ones.

             template <int z>

             unsigned TFoo(unsigned a) {

             return a + z;

             }

             enum EEnum { EE1, EE2 };

             b = TFoo < EE1 && EE2 > (2);

             */



             ptrdiff_t next = lex->LookAhead(i);

             if (u == Identifier &&

                  isOperatorInTemplateArg(next))
                 return false;

            if (isOperatorInTemplateArg(u) &&

                 next == Identifier)

                return false;

            if(u == '<')

                 ++n;

            else if(u == '>')

                 --n;

            else if(u == '('){

                 ptrdiff_t m = 1;

                 while(m > 0){

                     ptrdiff_t v = lex->LookAhead(i++);

                     if(v == '(')

                         ++m;

                     else if(v == ')')

                         --m;

                     else if(v == '\0' || v == ';' || v == '}')

                         return false;

                 }

            }

            else if(u == '\0' || u == ';' || u == '}')

                 return false;

        }

        t = lex->LookAhead(i);

        return bool(t == Scope || t == '(');

    }

    return false;

}
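
The heuristic added to isTemplateArgs() can be reduced to a few lines over a simplified token stream. The token kinds and the LooksLikeTemplateArgs function below are hypothetical, and the sketch omits the nested parenthesis handling of the real function. It is meant only to show the rule "an identifier next to an operator means this is an expression".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified token kinds.
enum Tok { Ident, TypeKeyword, Number, Less, Greater, BinaryOp,
           LParen, RParen, Scope, Semicolon };

// Decides whether the '<' at toks[start] opens a template argument list.
bool LooksLikeTemplateArgs(const std::vector<Tok> &toks, std::size_t start) {
    assert(start < toks.size() && toks[start] == Less);
    int depth = 1;
    std::size_t i = start + 1;
    for (; i < toks.size() && depth > 0; ++i) {
        const Tok cur = toks[i];
        const bool hasNext = i + 1 < toks.size();
        // An identifier next to an operator such as || or == suggests an
        // ordinary expression rather than a template argument.
        if (hasNext && cur == Ident && toks[i + 1] == BinaryOp)
            return false;
        if (hasNext && cur == BinaryOp && toks[i + 1] == Ident)
            return false;
        if (cur == Less)
            ++depth;
        else if (cur == Greater)
            --depth;
        else if (cur == Semicolon)
            return false;  // statement ended before '>' was found
    }
    if (depth != 0 || i == toks.size())
        return false;
    // A template argument list must be followed by '(' or '::'.
    return toks[i] == LParen || toks[i] == Scope;
}
```

Applied to the tokens of "a < 1 || b > (int) 2;" the function returns false, because "||" is followed by the identifier "b". For "a < fooi() || 1 > (int) b;" it still answers true, which matches the limitation noted in the comment inside isTemplateArgs().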


8. Improved error correction
Unfortunately, the error correction mechanism in OpenC++ sometimes causes the program to crash.
The problem spots in OpenC++ are pieces of code similar to this:

if(!rDefinition(def)){

    if(!SyntaxError())

     return false;

    SkipTo('}');

    lex->GetToken(cp); // WARNING: may crash in some cases.

    body = PtreeUtil::List(new Leaf(op), 0, new Leaf(cp));

    return true;

}

You should pay attention to the places where errors are processed and correct them in the way shown
by the example of the Parser::rLinkageBody() and Parser::SyntaxError() functions. The general
idea of the corrections is that after an error occurs, the presence of the next lexeme should first be
checked with the CanLookAhead() function instead of extracting it immediately with
GetToken().

bool Parser::rLinkageBody(Ptree*& body)

{

      Token op, cp;

      Ptree* def;

      if(lex->GetToken(op) != '{')

           return false;

      body = 0;

      while(lex->LookAhead(0) != '}'){

           if(!rDefinition(def)){

                 if(!SyntaxError())

                       return false;                       // too many errors

                 if (lex->CanLookAhead(1)) {

                    SkipTo('}');

                    lex->GetToken(cp);

                    if (!lex->CanLookAhead(0))

                       return false;
                 } else {

                  return false;

             }

             body =

                  PtreeUtil::List(new (GC) Leaf(op), 0,

                                  new (GC) Leaf(cp));

             return true;                  // error recovery

         }

         body = PtreeUtil::Snoc(body, def);

    }

    lex->GetToken(cp);

    body = new (GC)

        PtreeBrace(new (GC) Leaf(op), body, new (GC) Leaf(cp));

    return true;

}

bool Parser::SyntaxError()

{

    syntaxErrors_ = true;

    Token t, t2;



    if (lex->CanLookAhead(0)) {

        lex->LookAhead(0, t);

    } else {

        lex->LookAhead(-1, t);

    }

    if (lex->CanLookAhead(1)) {

        lex->LookAhead(1, t2);

    } else {

        t2 = t;

    }
     SourceLocation location(GetSourceLocation(*this, t.ptr));

     string token(t2.ptr, t2.len);

     errorLog_.Report(ParseErrorMsg(location, token));

     return true;

}
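
The pattern "check before you read" can be shown on a toy token stream. The TokenStream class and the SkipPast function below are hypothetical stand-ins for Lex and Parser::SkipTo(); only the name CanLookAhead is borrowed from the text, and its real signature may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token stream; the real Lex class is more involved.
class TokenStream {
public:
    explicit TokenStream(const std::vector<int> &tokens)
        : tokens_(tokens), pos_(0) {}

    // True if a token exists `offset` positions ahead of the cursor.
    bool CanLookAhead(std::size_t offset) const {
        return pos_ + offset < tokens_.size();
    }

    int LookAhead(std::size_t offset) const {
        assert(CanLookAhead(offset));
        return tokens_[pos_ + offset];
    }

    int GetToken() {
        assert(CanLookAhead(0));
        return tokens_[pos_++];
    }

private:
    std::vector<int> tokens_;
    std::size_t pos_;
};

// Skips to the token `stop` and consumes it. Returns false, instead of
// reading past the end, if the input runs out first.
bool SkipPast(TokenStream &stream, int stop) {
    while (stream.CanLookAhead(0)) {
        if (stream.GetToken() == stop)
            return true;
    }
    return false;
}
```

Error recovery is exactly the situation in which the input is most likely to be truncated or malformed, so a recovery routine that reads unconditionally turns a reported syntax error into a crash.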


9. Update of the rTemplateDecl2 function
Without going into details, we suggest replacing the rTemplateDecl2() function with the variant given
below. This eliminates some errors that occur when working with template classes.

bool Parser::rTemplateDecl2(Ptree*& decl,
                            TemplateDeclKind &kind)
{
    Token tk;
    Ptree *args = 0;

    if(lex->GetToken(tk) != TEMPLATE)
        return false;

    if(lex->LookAhead(0) != '<') {
        if (lex->LookAhead(0) == CLASS) {
            // template instantiation
            decl = 0;
            kind = tdk_instantiation;
            return true;                // ignore TEMPLATE
        }
        decl = new (GC)
            PtreeTemplateDecl(new (GC) LeafReserved(tk));
    } else {
        decl = new (GC)
            PtreeTemplateDecl(new (GC) LeafReserved(tk));
        if(lex->GetToken(tk) != '<')
            return false;
        decl = PtreeUtil::Snoc(decl, new (GC) Leaf(tk));
        if(!rTempArgList(args))
            return false;
        if(lex->GetToken(tk) != '>')
            return false;
    }

    decl =
        PtreeUtil::Nconc(decl,
            PtreeUtil::List(args, new (GC) Leaf(tk)));

    // ignore nested TEMPLATE
    while (lex->LookAhead(0) == TEMPLATE) {
        lex->GetToken(tk);
        if(lex->LookAhead(0) != '<')
            break;
        lex->GetToken(tk);
        if(!rTempArgList(args))
            return false;
        if(lex->GetToken(tk) != '>')
            return false;
    }

    if (args == 0)
        // template < > declaration
        kind = tdk_specialization;
    else
        // template < ... > declaration
        kind = tdk_decl;
    return true;
}
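The three kinds the function distinguishes correspond to three shapes of source text. The following simplified sketch classifies a declaration by its first tokens only. It is an illustration under stated assumptions, not a reproduction of every branch above: TemplateKind and ClassifyTemplate() are hypothetical names, the input is assumed to be already split into string tokens, and anything the sketch does not recognize is reported as Unknown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical counterpart of tdk_instantiation / tdk_specialization / tdk_decl.
enum class TemplateKind { Instantiation, Specialization, Declaration, Unknown };

// Classifies a declaration by its leading tokens:
//   template class X<int>;       -> Instantiation (no '<' after "template")
//   template <> class X<int> {}  -> Specialization (empty argument list)
//   template <class T> class X   -> Declaration (non-empty argument list)
TemplateKind ClassifyTemplate(const std::vector<std::string>& tokens) {
    if (tokens.size() < 2 || tokens[0] != "template")
        return TemplateKind::Unknown;
    if (tokens[1] != "<") {
        return tokens[1] == "class" ? TemplateKind::Instantiation
                                    : TemplateKind::Unknown;
    }
    if (tokens.size() < 3)
        return TemplateKind::Unknown;   // truncated input
    return tokens[2] == ">" ? TemplateKind::Specialization
                            : TemplateKind::Declaration;
}
```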
10. Detection of Ptree position in the program text
In some cases it is necessary to know where in the program text the code is located from which a
particular Ptree object was built.

The function given below returns the addresses of the beginning and the end of the memory region
holding the program text from which the given Ptree object was created. Judging by the code, begin
(and presumably end) should be set to NULL before the first call.

void GetPtreePos(const Ptree *p, const char *&begin,
                 const char *&end) {
    if (p == NULL)
        return;
    if (p->IsLeaf()) {
        const char *pos = p->GetLeafPosition();
        if (begin == NULL) {
            begin = pos;
        } else {
            begin = min(begin, pos);
        }
        end = max(end, pos);
    }
    else {
        GetPtreePos(p->Car(), begin, end);
        GetPtreePos(p->Cdr(), begin, end);
    }
}
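The same traversal can be tried out on a miniature tree. In the sketch below Node is a hypothetical stand-in for a Ptree cons cell and GetSpan() is an illustrative name. Unlike the function above, which records leaf start positions, this variant assumes each leaf also knows its length, so end points one past the last character covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical miniature of a Ptree-like cons cell; not the OpenC++ class.
struct Node {
    const char* leaf_pos = nullptr;   // set for leaves only
    std::size_t leaf_len = 0;         // length of the leaf text
    const Node* car = nullptr;        // first element of a list node
    const Node* cdr = nullptr;        // rest of the list
    bool IsLeaf() const { return leaf_pos != nullptr; }
};

// Widens [begin, end) so that it covers every leaf below `node`.
// Both pointers are expected to be nullptr before the first call.
void GetSpan(const Node* node, const char*& begin, const char*& end) {
    if (node == nullptr)
        return;
    if (node->IsLeaf()) {
        const char* first = node->leaf_pos;
        const char* last = first + node->leaf_len;
        begin = (begin == nullptr) ? first : std::min(begin, first);
        end = (end == nullptr) ? last : std::max(end, last);
    } else {
        GetSpan(node->car, begin, end);
        GetSpan(node->cdr, begin, end);
    }
}
```

With the span in hand, the covered text can be copied out as std::string(begin, end), provided all leaves point into the same buffer.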


11. Support of const A (a) type definitions
The OpenC++ library doesn't support definitions of variables of the form "const A (a)". To correct this
defect, the following part of the code inside the Parser::rOtherDeclaration function should be changed:

if(!rDeclarators(decl, type_encode, false))
    return false;

It should be replaced with the following code:

if(!rDeclarators(decl, type_encode, false)) {
    // Support: const A (a);
    Lex::TokenIndex after_rDeclarators = lex->Save();
    lex->Restore(before_rDeclarators);
    if (lex->CanLookAhead(3) && lex->CanLookAhead(-2)) {
        ptrdiff_t c_2 = lex->LookAhead(-2);
        ptrdiff_t c_1 = lex->LookAhead(-1);
        ptrdiff_t c0 = lex->LookAhead(0);
        ptrdiff_t c1 = lex->LookAhead(1);
        ptrdiff_t c2 = lex->LookAhead(2);
        ptrdiff_t c3 = lex->LookAhead(3);
        if (c_2 == CONST && c_1 == Identifier &&
            c0 == '(' && c1 == Identifier && c2 == ')' &&
            (c3 == ';' || c3 == '='))
        {
            Lex::TokenContainer newEmptyContainer;
            ptrdiff_t pos = before_rDeclarators;
            lex->ReplaceTokens(pos + 2, pos + 3, newEmptyContainer);
            lex->ReplaceTokens(pos + 0, pos + 1, newEmptyContainer);
            lex->Restore(before_rDeclarators - 2);
            bool res = rDeclaration(statement);
            return res;
        }
    }
}

This code uses some auxiliary functions that are not discussed in this article. You can find them in
the VivaCore library.
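For reference, here is what the construction looks like in ordinary C++. The parentheses around the declarator are redundant, so "const A (a);" declares the same object as "const A a;", which is why dropping the two parenthesis tokens and parsing the declaration again, as the fix above appears to do, preserves the meaning. The names A and SumOfParenthesizedDeclarations() are made up for this illustration.

```cpp
#include <cassert>

struct A {
    A() : value(42) {}                 // user-provided, so "const A a;" is allowed
    explicit A(int v) : value(v) {}
    int value;
};

int SumOfParenthesizedDeclarations() {
    const A (a);           // same as: const A a;
    const A (b) = A(8);    // same as: const A b = A(8);
    return a.value + b.value;
}
```

Some compilers may warn about the redundant parentheses, but the code is valid.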


12. Support of definitions in classes of T (min)() { } type functions
Sometimes while programming one has to use workarounds to reach the desired result. For example,
the widely known "max" macro often causes trouble when a class defines a method such as "T max()
{return m;}". In this case one resorts to a trick and defines the method as "T (max)() {return
m;}". Unfortunately, OpenC++ doesn't understand such definitions inside classes. To correct this defect,
the Parser::isConstructorDecl() function should be changed in the following way:

bool Parser::isConstructorDecl()
{
    if(lex->LookAhead(0) != '(')
        return false;
    else{
        // Support: T (min)() { }
        if (lex->LookAhead(1) == Identifier &&
            lex->LookAhead(2) == ')' &&
            lex->LookAhead(3) == '(')
            return false;

        ptrdiff_t t = lex->LookAhead(1);
        if(t == '*' || t == '&' || t == '(')
            return false;               // declarator
        else if(t == CONST || t == VOLATILE)
            return true;                // constructor or declarator
        else if(isPtrToMember(1))
            return false;               // declarator (::*)
        else
            return true;                // maybe constructor
    }
}
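The trick relies on a preprocessor rule: a function-like macro is expanded only when its name is immediately followed by "(". Wrapping the name in parentheses breaks that pattern. A small sketch, with the macro defined locally and the class Range and function Widest() invented for the example:

```cpp
#include <cassert>

// Function-like macro similar to the one some platform headers define.
#define max(a, b) (((a) > (b)) ? (a) : (b))

class Range {
public:
    Range(int lo, int hi) : lo_(lo), hi_(hi) {}

    int (min)() const { return lo_; }
    // "max" is followed by ')', not '(', so the macro is not expanded here.
    int (max)() const { return hi_; }

private:
    int lo_;
    int hi_;
};

// The call site needs the same trick; the outer max(...) is the macro.
int Widest(const Range& r, int other) {
    return max((r.max)(), other);
}
```

Writing "r.max()" here would be treated as a macro invocation with the wrong number of arguments and would fail to compile.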


13. Processing of constructions "using" and "namespace" inside
functions
The OpenC++ library doesn't know that "using" and "namespace" constructions may appear inside
functions. This is easy to correct by modifying the Parser::rStatement() function:

bool Parser::rStatement(Ptree*& st)
{
...
    case USING :
        return rUsing(st);
    case NAMESPACE :
        if (lex->LookAhead(2) == '=')
            return rNamespaceAlias(st);
        return rExprStatement(st);
...
}
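The constructions in question look as follows in user code. Note that only a namespace alias may appear inside a function, not a namespace definition, which matches the check for '=' as the third token ("namespace", name, "="). The namespace and function names below are invented for the example.

```cpp
#include <cassert>
#include <string>

namespace company {
namespace text_tools {
inline std::string Shout(const std::string& s) { return s + "!"; }
inline int Length(const std::string& s) { return static_cast<int>(s.size()); }
}  // namespace text_tools
}  // namespace company

int Demo() {
    namespace tt = company::text_tools;   // namespace alias inside a function
    using tt::Shout;                      // using-declaration
    using namespace company;              // using-directive
    return tt::Length(Shout("ab")) + text_tools::Length("c");
}
```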


14. Making "this" a pointer
As is well known, "this" is a pointer in C++, but OpenC++ does not treat it as one. To fix this type
identification error, the Walker::TypeofThis() function should be corrected.

Replace the code

void Walker::TypeofThis(Ptree*, TypeInfo& t)
{
    t.Set(env->LookupThis());
}

with

void Walker::TypeofThis(Ptree*, TypeInfo& t)
{
    t.Set(env->LookupThis());
    t.Reference();
}
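The language rule itself can be checked with the compiler. The sketch below uses standard type traits (C++11) on a made-up class Counter to confirm that inside a member function "this" has the type "Counter*", and "const Counter*" in a const member function.

```cpp
#include <cassert>
#include <type_traits>

struct Counter {
    int value = 0;

    Counter* Self() {
        static_assert(std::is_pointer<decltype(this)>::value,
                      "this is a pointer");
        static_assert(std::is_same<decltype(this), Counter*>::value,
                      "this has type Counter* in a non-const member");
        return this;
    }

    const Counter* Self() const {
        static_assert(std::is_same<decltype(this), const Counter*>::value,
                      "this has type const Counter* in a const member");
        return this;
    }

    int Next() { return ++this->value; }   // members are reached through ->
};
```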


15. Optimization of LineNumber() function
We have already mentioned the Program::LineNumber() function when noting that it returns file names
in different formats, and we offered the FixFileName() function to correct that. But LineNumber()
has one more disadvantage: it works slowly. That's why we offer you an optimized variant of the
LineNumber() function.

/*
    LineNumber() returns the line number of the line
    pointed to by PTR.
*/
size_t Program::LineNumber(const char* ptr,
                           const char*& filename,
                           ptrdiff_t& filename_length,
                           const char *&beginLinePtr) const
{
    beginLinePtr = NULL;
    ptrdiff_t n;
    size_t len;
    size_t name;
    ptrdiff_t nline = 0;
    size_t pos = ptr - buf;
    size_t startPos = pos;

    if(pos > size){
        // error?
        assert(false);
        filename = defaultname.c_str();
        filename_length = defaultname.length();
        beginLinePtr = buf;
        return 0;
    }

    ptrdiff_t line_number = -1;
    filename_length = 0;

    while(pos > 0){
        if (pos == oldLineNumberPos) {
            line_number = oldLineNumber + nline;
            assert(!oldFileName.empty());
            filename = oldFileName.c_str();
            filename_length = oldFileName.length();
            assert(oldBeginLinePtr != NULL);
            if (beginLinePtr == NULL)
                beginLinePtr = oldBeginLinePtr;
            oldBeginLinePtr = beginLinePtr;
            oldLineNumber = line_number;
            oldLineNumberPos = startPos;
            return line_number;
        }

        switch(buf[--pos]) {
        case '\n' :
            if (beginLinePtr == NULL)
                beginLinePtr = &(buf[pos]) + 1;
            ++nline;
            break;
        case '#' :
            len = 0;
            n = ReadLineDirective(pos, -1, name, len);
            if(n >= 0){                 // unless #pragma
                if(line_number < 0) {
                    line_number = n + nline;
                }
                if(len > 0 && filename_length == 0){
                    filename = (char*)Read(name);
                    filename_length = len;
                }
            }
            if(line_number >= 0 && filename_length > 0) {
                oldLineNumberPos = pos;
                oldBeginLinePtr = beginLinePtr;
                oldLineNumber = line_number;
                oldFileName = std::string(filename,
                                          filename_length);
                return line_number;
            }
            break;
        }
    }

    if(filename_length == 0){
        filename = defaultname.c_str();
        filename_length = defaultname.length();
        oldFileName = std::string(filename,
                                  filename_length);
    }

    if (line_number < 0) {
        line_number = nline + 1;
        if (beginLinePtr == NULL)
            beginLinePtr = buf;
        oldBeginLinePtr = beginLinePtr;
        oldLineNumber = line_number;
        oldLineNumberPos = startPos;
    }

    return line_number;
}
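The speed-up comes from remembering the result of the previous query: the backward scan stops as soon as it reaches the cached position instead of always running to the start of the buffer. The sketch below isolates that idea. It is not the Program class: LineCounter and its members are hypothetical, and "#line" directives and file names are left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper illustrating the caching idea only.
class LineCounter {
public:
    explicit LineCounter(std::string text) : text_(std::move(text)) {}

    // Returns the 1-based line number of the character at offset `pos`.
    std::size_t LineAt(std::size_t pos) {
        assert(pos <= text_.size());
        std::size_t newlines = 0;
        std::size_t i = pos;
        // Walk backward counting newlines until the cached position
        // or the start of the buffer is reached.
        while (i > 0) {
            if (has_cache_ && i == cached_pos_)
                return Remember(pos, cached_line_ + newlines);
            --i;
            if (text_[i] == '\n')
                ++newlines;
            ++steps_;
        }
        return Remember(pos, newlines + 1);
    }

    // Number of characters examined so far; useful for measuring the gain.
    std::size_t steps() const { return steps_; }

private:
    std::size_t Remember(std::size_t pos, std::size_t line) {
        has_cache_ = true;
        cached_pos_ = pos;
        cached_line_ = line;
        return line;
    }

    std::string text_;
    bool has_cache_ = false;
    std::size_t cached_pos_ = 0;
    std::size_t cached_line_ = 0;
    std::size_t steps_ = 0;
};
```

The cache helps when queries move forward through the text, which is the typical pattern while walking a parse tree. A query located before the cached position still scans back to the beginning.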


16. Correction of the error occurring while analyzing "#line" directive
In some cases the Program::ReadLineDirective() function misfires and mistakes unrelated text for a
"#line" directive. The corrected variant of the function looks as follows:

ptrdiff_t Program::ReadLineDirective(size_t i,
    ptrdiff_t line_number,
    size_t& filename, size_t& filename_length) const
{
    char c;
    do{
        c = Ref(++i);
    } while(is_blank(c));

#if defined(_MSC_VER) || defined(IRIX_CC)
    if(i + 5 <= GetSize() &&
       strncmp(Read(i), "line ", 5) == 0) {
        i += 4;
        do{
            c = Ref(++i);
        }while(is_blank(c));
    } else {
        return -1;
    }
#endif

    if(is_digit(c)){                    /* # <line> <file> */
        unsigned num = c - '0';
        for(;;){
            c = Ref(++i);
            if(is_digit(c))
                num = num * 10 + c - '0';
            else
                break;
        }
        /* line_number'll be incremented soon */
        line_number = num - 1;
        if(is_blank(c)){
            do{
                c = Ref(++i);
            }while(is_blank(c));
            if(c == '"'){
                size_t fname_start = i;
                do{
                    c = Ref(++i);
                } while(c != '"');
                if(i > fname_start + 2){
                    filename = fname_start;
                    filename_length = i - fname_start + 1;
                }
            }
        }
    }
    return line_number;
}
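The essence of the correction is to accept a line only when it really has the shape of a line directive and to reject everything else that starts with "#". A standalone sketch of such a check is given below. LineDirective and ParseLineDirective() are hypothetical names, the function works on a single line held in a std::string rather than on the Program buffer, and it accepts both the "# 42" and the "#line 42" forms regardless of the compiler.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct LineDirective {
    bool valid = false;   // true only for a well-formed directive
    unsigned line = 0;    // line number given in the directive
    std::string file;     // empty when the directive names no file
};

// Parses `# 42 "file"` or `#line 42 "file"` at the start of `text`.
LineDirective ParseLineDirective(const std::string& text) {
    LineDirective result;
    std::size_t i = 0;
    auto is_blank = [](char c) { return c == ' ' || c == '\t'; };
    auto is_digit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };
    auto skip_blanks = [&] {
        while (i < text.size() && is_blank(text[i])) ++i;
    };

    skip_blanks();
    if (i >= text.size() || text[i] != '#')
        return result;
    ++i;
    skip_blanks();

    // Optional "line" keyword; it must be followed by a blank.
    if (text.compare(i, 4, "line") == 0) {
        i += 4;
        if (i >= text.size() || !is_blank(text[i]))
            return result;
        skip_blanks();
    }

    // A directive must continue with a number; "#pragma" etc. stop here.
    if (i >= text.size() || !is_digit(text[i]))
        return result;
    unsigned number = 0;
    while (i < text.size() && is_digit(text[i])) {
        number = number * 10 + static_cast<unsigned>(text[i] - '0');
        ++i;
    }
    result.valid = true;
    result.line = number;

    // Optional quoted file name.
    skip_blanks();
    if (i < text.size() && text[i] == '"') {
        const std::size_t close = text.find('"', i + 1);
        if (close != std::string::npos)
            result.file = text.substr(i + 1, close - i - 1);
    }
    return result;
}
```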


Conclusion
Of course, this article covers only a small part of the possible improvements. Still, we hope that they
will be useful to developers working with the OpenC++ library and will serve as examples of how the
library can be adapted to one's own tasks.

We'd like to remind you once more that the improvements shown in this article, along with many other
corrections, can be found in the code of the VivaCore library. For many tasks VivaCore may be more
convenient than OpenC++.

If you have questions or would like to add or comment on something, our Viva64.com [10] team is
always glad to hear from you. We are ready to discuss any questions that arise, give recommendations
and help you use the OpenC++ library or the VivaCore library. Write to us!


References
    1. Zuev E.A. The rare occupation. PC Magazine/Russian Edition. N 5(75), 1997.
        http://www.viva64.com/go.php?url=43.
    2. Margaret A. Ellis, Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison Wesley,
        1990.
    3. OpenC++ library. http://www.viva64.com/go.php?url=16.
    4. Andrey Karpov, Evgeniy Ryzhkov. The essence of the code analysis library VivaCore.
        http://www.viva64.com/art-2-2-449187005.html
    5. Semantic Designs site. http://www.viva64.com/go.php?url=19.
    6. Interstron Company. http://www.viva64.com/go.php?url=42.
    7. What is OpenTS? http://www.viva64.com/go.php?url=17.
    8. Evgeniy Ryzhkov. Viva64: what is it and for whom is it meant?
        http://www.viva64.com/art-1-2-903037923.html
    9. Synopsis: A Source-code Introspection Tool. http://www.viva64.com/go.php?url=18.
    10. OOO "Program Verification Systems" site. http://www.viva64.com.

Improvements to the OpenC++ code analysis library

  • 1. The use of the code analysis library OpenC++: modifications, improvements, error corrections Author: Andrey Karpov Date: 12.01.2008 Abstract The article may be interesting for developers who use or plan to use OpenC++ library (OpenCxx). The author tells about his experience of improving OpenC++ library and modifying the library for solving special tasks. Introduction One may often here in forums that there are a lot of C++ syntax analyzers ("parsers"), and many of them are free. Or that one may take YACC, for example, and realize his own analyzer easily. Don't believe, it is not so easy [1, 2]. One may understand it especially if one remembers that it is even not half a task to parse syntax. It is necessary to realize structures for storing the program tree and semantic tables containing information about different objects and their scopes. It is especially important while developing specialized applications related to the processing and static analysis of C++ code. It is necessary for their realization to save the whole program tree what may be provided by few libraries. One of them is open library OpenC++ (OpenCxx) [3] about which we'll speak in this article. We'd like to help developers in mastering OpenC++ library and share our experience of modernization and improvement of some defects. The article is a compilation of pieces of advice, each of which is devoted to correction of some defect or realization of improvement. The article is based on recollections about changes that were carried out in VivaCore library [4] based on OpenC++. Of course, only a small part of these changes is discussed here. It is a difficult task to remember and describe them all. And, for example, description of addition of C language support into OpenC++ library will take too much place. But you can always refer to original texts of VivaCore library and get a lot of interesting information. 
It remains to say that OpenC++ library is unfortunately out-of-date now and needs serious improvement for supporting the modern C++ language standard. That's why if you are going to realize a modern compiler for example, you'd better pay your attention to GCC or to commercial libraries [5, 6]. But OpenC++ still remains a good and convenient tool for many developers in the sphere of systems of specialized processing and modification of program code. With the use of OpenC++ many interesting solutions are developed, for example, execution environment OpenTS [7] for T++ programming language (development of Program systems Institution RAS), static code analyzer Viva64 [8] or Synopsis tool for preparing documentation on the original code [9]. The purpose of the article is to show by examples how one can modify and improve OpenC++ library code. The article describes 15 library modifications related to error correction or addition of new
  • 2. functionality. Each of them not only makes the OpenC++ library better but also gives an opportunity to study its working principles more deeply. Let's get acquainted with them. 1. Skipping development environment keywords that do not influence program processing While developing a code analyzer for a specific development environment, you are likely to come across its specific language constructs. These constructs are often hints for a particular compiler and may be of no interest to you. But the OpenC++ library cannot process them, as they are not part of the C++ language. In this case one of the simplest ways to ignore them is to add them to the rw_table table with the Ignore key. For example: static rw_table table[] = { ... { "__ptr32", Ignore}, { "__ptr64", Ignore}, { "__unaligned", Ignore}, ... }; When adding entries, keep in mind that the words in the rw_table table must be arranged in alphabetical order. Be careful. 2. Adding a new lexeme If you want to add a keyword that should be processed, you need to create a new lexeme ("token"). Let's look at the example of adding a new keyword "__w64". First create an identifier for the new lexeme (see the token-name.h file), for example in this way: enum { Identifier = 258, Constant = 262, ... W64 = 346, // New token name ... }; Update the table "table" in the lex.cc file: static rw_table table[] = { ...
  • 3. { "__w64", W64 }, ... }; The next step is to create a class for the new lexeme, which we'll call LeafW64. namespace Opencxx { class LeafW64 : public LeafReserved { public: LeafW64(Token& t) : LeafReserved(t) {} LeafW64(char* str, ptrdiff_t len) : LeafReserved(str, len) {} ptrdiff_t What() { return W64; } }; } To create an object we'll need to modify the optIntegralTypeOrClassSpec() function: ... case UNSIGNED : flag = 'U'; kw = new (GC) LeafUNSIGNED(tk); break; case W64 : // NEW! flag = 'W'; kw = new (GC) LeafW64(tk); break; ... Note that since we have decided to treat "__w64" as a data type, we need the 'W' symbol to encode this type. You can learn more about the type encoding mechanism in the Encoding.cc file. When introducing a new type, remember to update functions such as Parser::isTypeSpecifier() as well. The last important point is the modification of the Encoding::MakePtree function:
  • 4. Ptree* Encoding::MakePtree(unsigned char*& encoded, Ptree* decl) { ... case 'W' : typespec = PtreeUtil::Snoc(typespec, w64_t); break; ... } Of course, this is only an example, and adding other lexemes may take much more effort. A good way to add a new lexeme correctly is to take an existing one that is close to it in meaning and then find and examine all the places in the OpenC++ library where it is used. 3. Skipping complex development environment constructs that do not influence program processing We have already examined how to skip single keywords that are meaningless for our program but impede code parsing. Unfortunately, sometimes the situation is more difficult. As a demonstration, let's take the __pragma and __noop constructs that you may see in Visual C++ header files: __forceinline DWORD HEAP_MAKE_TAG_FLAGS ( DWORD TagBase, DWORD Tag ) { __pragma(warning(push)) __pragma(warning(disable : 4548)) do {__noop(TagBase);} while((0,0) __pragma(warning(pop)) ); return ((DWORD)((TagBase) + ((Tag) << 18))); } You can find descriptions of the __pragma and __noop constructs in MSDN. The following points are important for our program: a) they are of no interest to us; b) they have parameters; c) they impede code analysis. Let's first add new lexemes, as described before, but this time using the InitializeOtherKeywords() function: static void InitializeOtherKeywords(bool recognizeOccExtensions) { ... verify(Lex::RecordKeyword("__pragma", MSPRAGMA)); verify(Lex::RecordKeyword("__noop", MS__NOOP));
  • 5. ... } The solution consists in modifying the Lex::ReadToken function so that when we come across a DECLSPEC, MSPRAGMA or MS__NOOP lexeme we skip it, together with all the lexemes belonging to the __pragma and __noop parameters. To skip all the unnecessary lexemes we use the SkipDeclspecToken() function, as shown below. ptrdiff_t Lex::ReadToken(char*& ptr, ptrdiff_t& len) { ... else if(t == DECLSPEC){ SkipDeclspecToken(); continue; } else if(t == MSPRAGMA) { // NEW SkipDeclspecToken(); continue; } else if(t == MS__NOOP) { //NEW SkipDeclspecToken(); continue; } ... } 4. Expanding file paths to full form In source code analysis tasks, a large amount of functionality is related to creating error messages and navigating through source files. What is inconvenient is that the file names returned by functions such as Program::LineNumber() may be presented in different ways. Here are some examples: C:\Program Files\MSVS 8\VC\atlmfc\include\afx.h .\drawing.cpp c:\src\wxwindows-2.4.2\samples\drawing\wx/defs.h Boost\boost-1_33_1\boost/variant/recursive_variant.hpp
  • 6. ..\FieldEdit2\Src\amsEdit.cpp ..\..\..\src\base\ftbase.c The path may be full or relative, and different separators may be used. All this makes such paths inconvenient to process or to print in diagnostic messages. That's why we offer an implementation of the FixFileName() function, which brings paths to a uniform full form. An auxiliary function, GetInputFileDirectory(), returns the path to the directory where the file being processed is located. const string &GetInputFileDirectory() { static string oldInputFileName; static string fileDirectory; string dir; VivaConfiguration &cfg = VivaConfiguration::Instance(); string inputFileName; cfg.GetInputFileName(inputFileName); if (oldInputFileName == inputFileName) return fileDirectory; oldInputFileName = inputFileName; filesystem::path inputFileNamePath(inputFileName, filesystem::native); fileDirectory = inputFileNamePath.branch_path().string(); if (fileDirectory.empty()) { TCHAR curDir[MAX_PATH]; if (GetCurrentDirectory(MAX_PATH, curDir) != 0) { fileDirectory = curDir; } else { assert(false); } } algorithm::replace_all(fileDirectory, "/", "\\"); to_lower(fileDirectory); return fileDirectory; }
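The core of this normalization can be sketched without Boost or the Windows API. The helper below is only a minimal illustration and not the VivaCore implementation: the name NormalizeWindowsPath and the explicit baseDir parameter are assumptions made for the example. It unifies separators, converts the path to lower case and prefixes relative paths with the base directory; it does not collapse ".." segments.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper for illustration; not part of OpenC++ or VivaCore.
// baseDir is assumed to be an already normalized directory without a
// trailing backslash, e.g. "c:\\proj".
std::string NormalizeWindowsPath(std::string path, const std::string &baseDir) {
    assert(!baseDir.empty());
    // Use one separator style everywhere.
    std::replace(path.begin(), path.end(), '/', '\\');
    // Windows paths are case-insensitive, so compare them in lower case.
    std::transform(path.begin(), path.end(), path.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    // A path that starts with a drive letter is treated as already full.
    const bool hasDrive = path.size() >= 2 &&
                          std::isalpha(static_cast<unsigned char>(path[0])) &&
                          path[1] == ':';
    if (hasDrive)
        return path;
    // Drop a leading ".\" and resolve the rest against the base directory.
    if (path.size() >= 2 && path[0] == '.' && path[1] == '\\')
        path.erase(0, 2);
    return baseDir + "\\" + path;
}
```

Passing the base directory as a parameter keeps the sketch free of global state; the article's version obtains it from GetInputFileDirectory() and caches results in a map instead.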
  • 7. typedef map<string, string> StrStrMap; typedef StrStrMap::iterator StrStrMapIt; void FixFileName(string &fileName) { static StrStrMap FileNamesMap; StrStrMapIt it = FileNamesMap.find(fileName); if (it != FileNamesMap.end()) { fileName = it->second; return; } string oldFileName = fileName; algorithm::replace_all(fileName, "/", "\\"); algorithm::replace_all(fileName, "\\\\", "\\"); filesystem::path tmpPath(fileName, filesystem::native); fileName = tmpPath.string(); algorithm::replace_all(fileName, "/", "\\"); to_lower(fileName); if (fileName.length() < 2) { assert(false); FileNamesMap.insert(make_pair(oldFileName, fileName)); return; } if (fileName[0] == '.' && fileName[1] != '.') { const string &dir = GetInputFileDirectory(); if (!dir.empty()) fileName.replace(0, 1, dir); FileNamesMap.insert(make_pair(oldFileName, fileName)); return; } if (isalpha(fileName[0]) && fileName[1] == ':' ) {
  • 8. FileNamesMap.insert(make_pair(oldFileName, fileName)); return; } const string &dir = GetInputFileDirectory(); if (dir.empty()) fileName.insert(0, ".\\"); else { fileName.insert(0, "\\"); fileName.insert(0, dir); } FileNamesMap.insert(make_pair(oldFileName, fileName)); } 5. Getting values of numerical literals A function that returns the value of a numerical literal may be useful in systems that build documentation from code. For example, with its help one can find out that the default argument of the function "void foo(int a = 99)" is 99 and use this for some purpose. The GetLiteralType() function we offer returns the type of a literal and, if the literal is an integer, its value. GetLiteralType() was written to obtain the information needed most often and does not support rarely used forms of notation. If you need to support UCNs, for example, or to get values of the double type, you can extend the functions given below yourself.
  • 9. unsigned __int64 GetHexValue(unsigned char c) { if (c >= '0' && c <= '9') return c - '0'; if (c >= 'a' && c <= 'f') return c - 'a' + 0x0a; if (c >= 'A' && c <= 'F') return c - 'A' + 0x0a; assert(false); return 0; } bool GetHex(const char *&from, size_t len, unsigned __int64 &retValue) {
  • 10. unsigned __int64 c, n = 0, overflow = 0; int digits_found = 0; const char *limit = from + len; while (from < limit) { c = *from; if (!isxdigit(c)) break; from++; overflow |= n ^ (n << 4 >> 4); n = (n << 4) + GetHexValue(c); digits_found = 1; } if (!digits_found) return false; if (overflow) { assert(false); } retValue = n; return true; } bool GetOct(const char *&from, size_t len, unsigned __int64 &retValue) { unsigned __int64 c, n = 0; bool overflow = false; const char *limit = from + len; while (from < limit) { c = *from; if (c < '0' || c > '7')
  • 11. break; from++; overflow |= static_cast<bool>(n ^ (n << 3 >> 3)); n = (n << 3) + c - '0'; } retValue = n; return true; } #define HOST_CHARSET_ASCII bool GetEscape(const char *from, size_t len, unsigned __int64 &retValue) { /* Values of \a \b \e \f \n \r \t \v respectively. */ // HOST_CHARSET_ASCII static const char charconsts[] = { 7, 8, 27, 12, 10, 13, 9, 11 }; // HOST_CHARSET_EBCDIC //static const uchar charconsts[] = { 47, 22, 39, 12, 21, 13, 5, 11 }; unsigned char c; c = from[0]; switch (c) { /* UCNs, hex escapes, and octal escapes are processed separately. */ case 'u': case 'U': // convert_ucn - not supported. Return: 65535. retValue = 0xFFFFui64; return true; case 'x': { const char *p = from + 1;
  • 12. return GetHex(p, len, retValue); } case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': { const char *p = from + 1; return GetOct(p, len, retValue); } case '\\': case '\'': case '"': case '?': break; case 'a': c = charconsts[0]; break; case 'b': c = charconsts[1]; break; case 'f': c = charconsts[3]; break; case 'n': c = charconsts[4]; break; case 'r': c = charconsts[5]; break; case 't': c = charconsts[6]; break; case 'v': c = charconsts[7]; break; case 'e': case 'E': c = charconsts[2]; break; default: assert(false); return false; } retValue = c; return true; } // 'A', '\t', L'A', '\xFE' static bool GetCharLiteral(const char *from, size_t len, unsigned __int64 &retValue) { if (len >= 3) {
  • 13. if (from[0] == '\'' && from[len - 1] == '\'') { unsigned char c = from[1]; if (c == '\\') { verify(GetEscape(from + 2, len - 3, retValue)); } else { retValue = c; } return true; } } if (len >= 4) { if (from[0] == 'L' && from[1] == '\'' && from[len - 1] == '\'') { unsigned char c = from[2]; if (c == '\\') { verify(GetEscape(from + 3, len - 4, retValue)); } else { retValue = c; } return true; } } return false; } // "string" static bool GetStringLiteral(const char *from, size_t len) { if (len >= 2) { if (from[0] == '"' && from[len - 1] == '"') return true;
  • 14. } if (len >= 3) { if (from[0] == 'L' && from[1] == '"' && from[len - 1] == '"') return true; } return false; } bool IsRealLiteral(const char *from, size_t len) { if (len < 2) return false; bool isReal = false; bool digitFound = false; for (size_t i = 0; i != len; ++i) { unsigned char c = from[i]; switch(c) { case 'x': return false; case 'X': return false; case 'f': isReal = true; break; case 'F': isReal = true; break; case '.': isReal = true; break; case 'e': isReal = true; break; case 'E': isReal = true; break; case 'l': break; case '-': break; case '+': break; case 'L': break; default: if (!isdigit(c))
  • 15. return false; digitFound = true; } } return isReal && digitFound; } SimpleType GetRealLiteral(const char *from, size_t len) { assert(len > 1); unsigned char rc1 = from[len - 1]; if (is_digit(rc1) || rc1 == '.' || rc1 == 'l' || rc1 == 'L' || rc1 == 'e' || rc1 == 'E') return ST_DOUBLE; if (rc1 == 'f' || rc1 == 'F') return ST_FLOAT; assert(false); return ST_UNKNOWN; } bool GetBoolLiteral(const char *from, size_t len, unsigned __int64 &retValue) { if (len == 4 && strncmp(from, "true", 4) == 0) { retValue = 1; return true; } if (len == 5 && strncmp(from, "false", 5) == 0) { retValue = 0; return true; } return false; }
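The digit-accumulation loops of GetHex() and GetOct() above can be condensed into one portable sketch. The code below is an illustration under stated assumptions rather than the VivaCore code: the names DetectRadix and ParseIntegerLiteral are hypothetical, std::uint64_t replaces unsigned __int64, and overflow is reported by returning false instead of being detected with shifts. The unparsed tail (for example "ui64") is returned so that a suffix classifier such as GetTypeBySufix() could process it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical names for this sketch; not part of OpenC++ or VivaCore.
enum class Radix { Dec = 10, Oct = 8, Hex = 16 };

// Same prefix rules as IsHexLiteral()/IsOctLiteral(): "0x"/"0X" means hex,
// a leading '0' followed by anything means octal, the rest is decimal.
Radix DetectRadix(const std::string &lit) {
    if (lit.size() >= 3 && lit[0] == '0' && (lit[1] == 'x' || lit[1] == 'X'))
        return Radix::Hex;
    if (lit.size() >= 2 && lit[0] == '0')
        return Radix::Oct;
    return Radix::Dec;
}

// Parses the digits of an integer literal. Returns false if there are no
// digits or the value does not fit into 64 bits. On success `suffix`
// receives the unparsed tail of the literal.
bool ParseIntegerLiteral(const std::string &lit, std::uint64_t &value,
                         std::string &suffix) {
    const Radix radix = DetectRadix(lit);
    const std::uint64_t base = static_cast<std::uint64_t>(radix);
    std::size_t i = (radix == Radix::Hex) ? 2 : 0;  // skip the "0x" prefix
    const std::size_t firstDigit = i;
    std::uint64_t n = 0;
    for (; i < lit.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(lit[i]);
        std::uint64_t digit = 0;
        if (std::isdigit(c))
            digit = static_cast<std::uint64_t>(c - '0');
        else if (radix == Radix::Hex && std::isxdigit(c))
            digit = static_cast<std::uint64_t>(std::tolower(c) - 'a' + 10);
        else
            break;                      // start of the suffix
        if (digit >= base)
            break;                      // e.g. '8' inside an octal literal
        if (n > (UINT64_MAX - digit) / base)
            return false;               // the next step would overflow
        n = n * base + digit;
    }
    if (i == firstDigit)
        return false;                   // no digits at all
    value = n;
    suffix = lit.substr(i);
    return true;
}
```

Checking the bound before multiplying avoids relying on wrap-around, which makes the overflow case explicit for every radix, including the decimal loop.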
  • 16. bool IsHexLiteral(const char *from, size_t len) { if (len < 3) return false; if (from[0] != '0') return false; if (from[1] != 'x' && from[1] != 'X') return false; return true; } SimpleType GetTypeBySufix(const char *from, size_t len) { assert(from != NULL); if (len == 0) return ST_INT; assert(!isdigit(*from)); bool suffix_8 = false; bool suffix_16 = false; bool suffix_32 = false; bool suffix_64 = false; bool suffix_i = false; bool suffix_l = false; bool suffix_u = false; while (len != 0) { --len; const char c = *from++; switch(c) { case '8': suffix_8 = true; break; case '1': if (len == 0 || *from++ != '6') { assert(false); return ST_UNKNOWN;
  • 17. } --len; suffix_16 = true; break; case '3': if (len == 0 || *from++ != '2') { assert(false); return ST_UNKNOWN; } --len; suffix_32 = true; break; case '6': if (len == 0 || *from++ != '4') { assert(false); return ST_UNKNOWN; } --len; suffix_64 = true; break; case 'I': case 'i': suffix_i = true; break; case 'U': case 'u': suffix_u = true; break; case 'L': case 'l': suffix_l = true; break; default: assert(false); return ST_UNKNOWN; }
  • 18. } assert(suffix_8 + suffix_16 + suffix_32 + suffix_64 <= 1); if (suffix_8 || suffix_16) return ST_LESS_INT; if (suffix_32) { if (suffix_u) return ST_UINT; else return ST_INT; } if (suffix_64) { if (suffix_u) return ST_UINT64; else return ST_INT64; } if (suffix_l) { if (suffix_u) return ST_ULONG; else return ST_LONG; } if (suffix_u) return ST_UINT; assert(suffix_i); return ST_INT; } SimpleType GetHexLiteral(const char *from, size_t len,
  • 19. unsigned __int64 &retValue) { assert(len >= 3); const char *p = from + 2; if (!GetHex(p, len, retValue)) { return ST_UNKNOWN; } ptrdiff_t newLen = len - (p - from); assert(newLen >= 0 && newLen < static_cast<ptrdiff_t>(len)); return GetTypeBySufix(p, newLen); } bool IsOctLiteral(const char *from, size_t len) { if (len < 2) return false; if (from[0] != '0') return false; return true; } SimpleType GetOctLiteral(const char *from, size_t len, unsigned __int64 &retValue) { assert(len >= 2); const char *p = from + 1; if (!GetOct(p, len, retValue)) { return ST_UNKNOWN; } ptrdiff_t newLen = len - (p - from); assert(newLen >= 0 && newLen < static_cast<ptrdiff_t>(len)); return GetTypeBySufix(p, newLen); } SimpleType GetDecLiteral(const char *from, size_t len, unsigned __int64 &retValue) {
  • 20. assert(len >= 1); const char *limit = from + len; unsigned __int64 n = 0; while (from < limit) { const char c = *from; if (c < '0' || c > '9') break; from++; n = n * 10 + (c - '0'); } ptrdiff_t newLen = limit - from; if (newLen == static_cast<ptrdiff_t>(len)) return ST_UNKNOWN; retValue = n; assert(newLen >= 0 && newLen < static_cast<ptrdiff_t>(len)); return GetTypeBySufix(from, newLen); } SimpleType GetLiteralType(const char *from, size_t len, unsigned __int64 &retValue) { if (from == NULL || len == 0) return ST_UNKNOWN; retValue = 1; if (from == NULL || len == 0) return ST_UNKNOWN; if (GetCharLiteral(from, len, retValue)) return ST_LESS_INT; if (GetStringLiteral(from, len)) return ST_POINTER; if (GetBoolLiteral(from, len, retValue)) return ST_LESS_INT;
  • 21. if (IsRealLiteral(from, len)) return GetRealLiteral(from, len); if (IsHexLiteral(from, len)) return GetHexLiteral(from, len, retValue); if (IsOctLiteral(from, len)) return GetOctLiteral(from, len, retValue); return GetDecLiteral(from, len, retValue); } 6. Correction of the string literal processing function We suggest modifying the Lex::ReadStrConst() function as shown below. This corrects two errors related to the processing of string literals that are split into parts. The first error occurs when a string is continued on the next line with a backslash, as in: const char *name = "Viva\ Core"; (with a line break right after the backslash). The second error occurs with adjacent wide literals: const wchar_t *str = L"begin"L"end". The corrected variant of the function: bool Lex::ReadStrConst(size_t top, bool isWcharStr) { char c; for(;;){ c = file->Get(); if(c == '\\'){ c = file->Get(); /* Support: "\" */ if (c == '\r') { c = file->Get(); if (c != '\n') return false; } else if(c == '\0')
  • 22. return false; } else if(c == '"'){ size_t pos = file->GetCurPos() + 1; ptrdiff_t nline = 0; do{ c = file->Get(); if(c == '\n') ++nline; } while(is_blank(c) || c == '\n'); if (isWcharStr && c == 'L') { /* Support: L"123" L"456" L "789". */ c = file->Get(); if(c == '"') /* line_number += nline; */ ; else{ file->Unget(); return false; } } else { if(c == '"') /* line_number += nline; */ ; else{ token_len = ptrdiff_t(pos - top); file->Rewind(pos); return true; } } } else if(c == '\n' || c == '\0')
  • 23. return false; } } 7. Partial correction of the processing of expressions like "bool r = a < 1 || b > (int) 2;" OpenC++ has an error in the processing of some expressions, which it mistakes for templates. For example, in the statement "bool r = a < 1 || b > (int) 2;" the variable "a" is taken for a template name, and a lot of trouble with syntactic analysis follows. A full correction of this error requires major changes and has not been implemented yet. We offer a temporary solution that eliminates most such errors. The functions to be added or modified are given below. bool VivaParser::MaybeTypeNameOrClassTemplate(Token &token) { if (m_env == NULL) { return true; } const char *ptr = token.GetPtr(); ptrdiff_t len = token.GetLen(); Bind *bind; bool isType = m_env->LookupType(ptr, len, bind); return isType; } static bool isOperatorInTemplateArg(ptrdiff_t t) { return t == AssignOp || t == EqualOp || t == LogOrOp || t == LogAndOp || t == IncOp || t == RelOp; } /* template.args : '<' any* '>' template.args must be followed by '(' or '::' */ bool Parser::isTemplateArgs() { ptrdiff_t i = 0;
  • 24. ptrdiff_t t = lex->LookAhead(i++); if(t == '<'){ ptrdiff_t n = 1; while(n > 0){ ptrdiff_t u = lex->LookAhead(i++); /* TODO. :( Fixing: bool r = a < 1 || b > (int) 2; We'll correct not all the cases but it will be better anyway. Editing method. If an identifier is found near the operator, it is obviously not a template because only a type or a constant expression may stay inside the brackets. An example which doesn't work anyway: r = a < fooi() || 1 > (int) b; Unfortunately, the following expression is processed incorrectly now, but such cases are fewer than corrected ones. template <int z> unsigned TFoo(unsigned a) { return a + z; } enum EEnum { EE1, EE2 }; b = TFoo < EE1 && EE2 > (2); */ ptrdiff_t next = lex->LookAhead(i); if (u == Identifier && isOperatorInTemplateArg(next))
  • 25. return false; if (isOperatorInTemplateArg(u) && next == Identifier) return false; if(u == '<') ++n; else if(u == '>') --n; else if(u == '('){ ptrdiff_t m = 1; while(m > 0){ ptrdiff_t v = lex->LookAhead(i++); if(v == '(') ++m; else if(v == ')') --m; else if(v == '\0' || v == ';' || v == '}') return false; } } else if(u == '\0' || u == ';' || u == '}') return false; } t = lex->LookAhead(i); return bool(t == Scope || t == '('); } return false; }
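To see the heuristic in isolation, here is a reduced sketch that runs on a plain vector of tokens. It is a simplification made for illustration, not the OpenC++ code: the Tok enumeration and the function LooksLikeTemplateArgs are hypothetical, only "||" and "&&" are treated as operators, nested parentheses are not tracked, and only a following '(' is accepted (the real function also accepts '::').

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds for this sketch; OpenC++ uses its own codes.
enum Tok { Ident, Number, Less, Greater, LParen, RParen, LogOr, LogAnd, Semi, End };

static bool IsOperatorInTemplateArg(Tok t) {
    return t == LogOr || t == LogAnd;
}

// Returns true if the tokens starting at `pos` (expected to be '<') look
// like a template argument list that is followed by '('.
bool LooksLikeTemplateArgs(const std::vector<Tok> &toks, std::size_t pos) {
    // Reading past the end yields End, so the loop below always terminates.
    auto at = [&](std::size_t i) { return i < toks.size() ? toks[i] : End; };
    if (at(pos) != Less)
        return false;
    std::size_t i = pos + 1;
    int depth = 1;                       // nesting level of '<' ... '>'
    while (depth > 0) {
        const Tok u = at(i++);
        const Tok next = at(i);
        if (u == End || u == Semi)
            return false;                // statement ended before '>'
        // An identifier next to a logical operator suggests an expression.
        if (u == Ident && IsOperatorInTemplateArg(next))
            return false;
        if (IsOperatorInTemplateArg(u) && next == Ident)
            return false;
        if (u == Less)
            ++depth;
        else if (u == Greater)
            --depth;
    }
    return at(i) == LParen;
}
```

For the tokens of "< 1 || b > (int) 2;" the sketch stops at "|| b" and rejects the template interpretation, while "< 1 > (2)" is still accepted. As in the article's version, an expression such as "TFoo < EE1 && EE2 > (2)" is rejected too, which is the known cost of the heuristic.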
  • 26. 8. Improved error recovery Unfortunately, the error recovery mechanism in OpenC++ sometimes causes a program crash. The problem places in OpenC++ are code fragments similar to this one: if(!rDefinition(def)){ if(!SyntaxError()) return false; SkipTo('}'); lex->GetToken(cp); // WARNING: crash in the same case. body = PtreeUtil::List(new Leaf(op), 0, new Leaf(cp)); return true; } Pay attention to the places where errors are processed and correct them in the way shown by the example of the Parser::rLinkageBody() and Parser::SyntaxError() functions. The general idea of the corrections is that after an error occurs, the presence of the next lexeme should first be checked with the CanLookAhead() function instead of extracting it immediately with GetToken(). bool Parser::rLinkageBody(Ptree*& body) { Token op, cp; Ptree* def; if(lex->GetToken(op) != '{') return false; body = 0; while(lex->LookAhead(0) != '}'){ if(!rDefinition(def)){ if(!SyntaxError()) return false; // too many errors if (lex->CanLookAhead(1)) { SkipTo('}'); lex->GetToken(cp); if (!lex->CanLookAhead(0)) return false;
  • 27. } else { return false; } body = PtreeUtil::List(new (GC) Leaf(op), 0, new (GC) Leaf(cp)); return true; // error recovery } body = PtreeUtil::Snoc(body, def); } lex->GetToken(cp); body = new (GC) PtreeBrace(new (GC) Leaf(op), body, new (GC) Leaf(cp)); return true; } bool Parser::SyntaxError() { syntaxErrors_ = true; Token t, t2; if (lex->CanLookAhead(0)) { lex->LookAhead(0, t); } else { lex->LookAhead(-1, t); } if (lex->CanLookAhead(1)) { lex->LookAhead(1, t2); } else { t2 = t; }
  • 28. SourceLocation location(GetSourceLocation(*this, t.ptr)); string token(t2.ptr, t2.len); errorLog_.Report(ParseErrorMsg(location, token)); return true; } 9. Update of the rTemplateDecl2 function Without going into details, we suggest replacing the rTemplateDecl2() function with the variant given below. This eliminates some errors when working with template classes. bool Parser::rTemplateDecl2(Ptree*& decl, TemplateDeclKind &kind) { Token tk; Ptree *args = 0; if(lex->GetToken(tk) != TEMPLATE) return false; if(lex->LookAhead(0) != '<') { if (lex->LookAhead(0) == CLASS) { // template instantiation decl = 0; kind = tdk_instantiation; return true; // ignore TEMPLATE } decl = new (GC) PtreeTemplateDecl(new (GC) LeafReserved(tk)); } else { decl = new (GC) PtreeTemplateDecl(new (GC) LeafReserved(tk)); if(lex->GetToken(tk) != '<') return false;
  • 29. decl = PtreeUtil::Snoc(decl, new (GC) Leaf(tk)); if(!rTempArgList(args)) return false; if(lex->GetToken(tk) != '>') return false; } decl = PtreeUtil::Nconc(decl, PtreeUtil::List(args, new (GC) Leaf(tk))); // ignore nested TEMPLATE while (lex->LookAhead(0) == TEMPLATE) { lex->GetToken(tk); if(lex->LookAhead(0) != '<') break; lex->GetToken(tk); if(!rTempArgList(args)) return false; if(lex->GetToken(tk) != '>') return false; } if (args == 0) // template < > declaration kind = tdk_specialization; else // template < ... > declaration kind = tdk_decl; return true; }
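The three values of kind set by rTemplateDecl2() above correspond to three different source forms. The fragment below is an illustrative input, not code from OpenC++; the class name Box is made up for the example. Each comment names the TemplateDeclKind value the function would assign according to the listing: "template" followed by a parameter list, by an empty "<>", or directly by "class".

```cpp
#include <cassert>

// tdk_decl: "template < ... >" with a non-empty parameter list.
template <typename T>
class Box {
public:
    static int Kind() { return 0; }
};

// tdk_specialization: "template < >" with an empty parameter list.
template <>
class Box<char> {
public:
    static int Kind() { return 1; }
};

// tdk_instantiation: "template" followed directly by "class",
// that is, an explicit instantiation.
template class Box<int>;
```

A parser that handles all three forms must not expect '<' right after "template", which is what the first branch of the function takes care of.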
  • 30. 10. Detection of Ptree position in the program text In some cases it is necessary to know which part of the program text a particular Ptree object was built from. The function given below returns the addresses of the beginning and the end of the memory region holding the program text from which the given Ptree object was created. void GetPtreePos(const Ptree *p, const char *&begin, const char *&end) { if (p == NULL) return; if (p->IsLeaf()) { const char *pos = p->GetLeafPosition(); if (begin == NULL) { begin = pos; } else { begin = min(begin, pos); } end = max(end, pos); } else { GetPtreePos(p->Car(), begin, end); GetPtreePos(p->Cdr(), begin, end); } } 11. Support of definitions like "const A (a)" The OpenC++ library doesn't support definitions of variables of the form "const A (a)". To correct this defect, the following code inside the Parser::rOtherDeclaration function should be changed: if(!rDeclarators(decl, type_encode, false)) return false; It should be replaced with the following code: if(!rDeclarators(decl, type_encode, false)) { // Support: const A (a);
  • 31. Lex::TokenIndex after_rDeclarators = lex->Save(); lex->Restore(before_rDeclarators); if (lex->CanLookAhead(3) && lex->CanLookAhead(-2)) { ptrdiff_t c_2 = lex->LookAhead(-2); ptrdiff_t c_1 = lex->LookAhead(-1); ptrdiff_t c0 = lex->LookAhead(0); ptrdiff_t c1 = lex->LookAhead(1); ptrdiff_t c2 = lex->LookAhead(2); ptrdiff_t c3 = lex->LookAhead(3); if (c_2 == CONST && c_1 == Identifier && c0 == '(' && c1 == Identifier && c2 == ')' && (c3 == ';' || c3 == '=')) { Lex::TokenContainer newEmptyContainer; ptrdiff_t pos = before_rDeclarators; lex->ReplaceTokens(pos + 2, pos + 3, newEmptyContainer); lex->ReplaceTokens(pos + 0, pos + 1, newEmptyContainer); lex->Restore(before_rDeclarators - 2); bool res = rDeclaration(statement); return res; } } } This code uses some auxiliary functions that are not discussed in this article, but you can find them in the VivaCore library. 12. Support of in-class definitions of functions like "T (min)() { }" Sometimes one has to use workarounds to reach the desired result. For example, the widely known macro "max" often causes trouble when a class defines a method like "T max() {return m;}". In this case one resorts to a trick and defines the method as "T (max)() {return m;}". Unfortunately, OpenC++ doesn't understand such definitions inside classes. To correct this defect, the Parser::isConstructorDecl() function should be changed in the following way:
  • 32. bool Parser::isConstructorDecl() { if(lex->LookAhead(0) != '(') return false; else{ // Support: T (min)() { } if (lex->LookAhead(1) == Identifier && lex->LookAhead(2) == ')' && lex->LookAhead(3) == '(') return false; ptrdiff_t t = lex->LookAhead(1); if(t == '*' || t == '&' || t == '(') return false; // declarator else if(t == CONST || t == VOLATILE) return true; // constructor or declarator else if(isPtrToMember(1)) return false; // declarator (::*) else return true; // maybe constructor } } 13. Processing of constructions "using" and "namespace" inside functions OpenC++ library doesn't know that inside functions "using" and "namespace" constructions may be used. But one can easily correct it by modifying Parser::rStatement() function: bool Parser::rStatement(Ptree*& st) { ... case USING : return rUsing(st);
  • 33. case NAMESPACE : if (lex->LookAhead(2) == '=') return rNamespaceAlias(st); return rExprStatement(st); ... } 14. Making "this" a pointer As it is known "this" is a pointer. But it's not so in OpenC++. That's why we should correct Walker::TypeofThis() function to correct the error of type identification. Replace the code void Walker::TypeofThis(Ptree*, TypeInfo& t) { t.Set(env->LookupThis()); } with void Walker::TypeofThis(Ptree*, TypeInfo& t) { t.Set(env->LookupThis()); t.Reference(); } 15. Optimization of LineNumber() function We have already mentioned Program::LineNumber() function when saying that it returns file names in different formats. Then we offered FixFileName() function to correct this situation. But LineNumber() function has one more disadvantage related to its slow working speed. That's why we offer you an optimized variant of LineNumber() function. /* LineNumber() returns the line number of the line pointed to by PTR. */ size_t Program::LineNumber(const char* ptr, const char*& filename,
  • 34. ptrdiff_t& filename_length, const char *&beginLinePtr) const { beginLinePtr = NULL; ptrdiff_t n; size_t len; size_t name; ptrdiff_t nline = 0; size_t pos = ptr - buf; size_t startPos = pos; if(pos > size){ // error? assert(false); filename = defaultname.c_str(); filename_length = defaultname.length(); beginLinePtr = buf; return 0; } ptrdiff_t line_number = -1; filename_length = 0; while(pos > 0){ if (pos == oldLineNumberPos) { line_number = oldLineNumber + nline; assert(!oldFileName.empty()); filename = oldFileName.c_str(); filename_length = oldFileName.length(); assert(oldBeginLinePtr != NULL); if (beginLinePtr == NULL) beginLinePtr = oldBeginLinePtr; oldBeginLinePtr = beginLinePtr;
  • 35. oldLineNumber = line_number; oldLineNumberPos = startPos; return line_number; } switch(buf[--pos]) { case '\n' : if (beginLinePtr == NULL) beginLinePtr = &(buf[pos]) + 1; ++nline; break; case '#' : len = 0; n = ReadLineDirective(pos, -1, name, len); if(n >= 0){ // unless #pragma if(line_number < 0) { line_number = n + nline; } if(len > 0 && filename_length == 0){ filename = (char*)Read(name); filename_length = len; } } if(line_number >= 0 && filename_length > 0) { oldLineNumberPos = pos; oldBeginLinePtr = beginLinePtr; oldLineNumber = line_number; oldFileName = std::string(filename, filename_length); return line_number; }
  • 36. break; } } if(filename_length == 0){ filename = defaultname.c_str(); filename_length = defaultname.length(); oldFileName = std::string(filename, filename_length); } if (line_number < 0) { line_number = nline + 1; if (beginLinePtr == NULL) beginLinePtr = buf; oldBeginLinePtr = beginLinePtr; oldLineNumber = line_number; oldLineNumberPos = startPos; } return line_number; } 16. Correction of the error occurring while analyzing the "#line" directive In some cases the Program::ReadLineDirective() function malfunctions and takes irrelevant text for a "#line" directive. The corrected variant of the function looks as follows: ptrdiff_t Program::ReadLineDirective(size_t i, ptrdiff_t line_number, size_t& filename, size_t& filename_length) const { char c; do{ c = Ref(++i); } while(is_blank(c));
  • 37. #if defined(_MSC_VER) || defined(IRIX_CC) if(i + 5 <= GetSize() && strncmp(Read(i), "line ", 5) == 0) { i += 4; do{ c = Ref(++i); }while(is_blank(c)); } else { return -1; } #endif if(is_digit(c)){ /* # <line> <file> */ unsigned num = c - '0'; for(;;){ c = Ref(++i); if(is_digit(c)) num = num * 10 + c - '0'; else break; } /* line_number'll be incremented soon */ line_number = num - 1; if(is_blank(c)){ do{ c = Ref(++i); }while(is_blank(c)); if(c == '"'){ size_t fname_start = i; do{ c = Ref(++i);
  • 38. } while(c != '"'); if(i > fname_start + 2){ filename = fname_start; filename_length = i - fname_start + 1; } } } } return line_number; } Conclusion Of course, this article covers only a small part of the possible improvements. But we hope they will be useful to developers using the OpenC++ library and will serve as examples of how the library can be specialized for one's own tasks. We'd like to remind you once more that the improvements shown in this article, and many other corrections, can be found in the code of the VivaCore library, which may be more convenient than OpenC++ for many tasks. If you have questions or would like to add or comment on something, our Viva64.com [10] team is always glad to hear from you. We are ready to discuss any questions, give recommendations and help you use the OpenC++ library or the VivaCore library. Write to us! References 1. Zuev E.A. The rare occupation. PC Magazine/Russian Edition. N 5(75), 1997. http://www.viva64.com/go.php?url=43. 2. Margaret A. Ellis, Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison Wesley, 1990. 3. OpenC++ library. http://www.viva64.com/go.php?url=16. 4. Andrey Karpov, Evgeniy Ryzhkov. The essence of the code analysis library VivaCore. http://www.viva64.com/art-2-2-449187005.html 5. Semantic Designs site. http://www.viva64.com/go.php?url=19. 6. Interstron Company. http://www.viva64.com/go.php?url=42. 7. What is OpenTS? http://www.viva64.com/go.php?url=17. 8. Evgeniy Ryzhkov. Viva64: what is it and for whom is it meant? http://www.viva64.com/art-1-2-903037923.html 9. Synopsis: A Source-code Introspection Tool. http://www.viva64.com/go.php?url=18. 10. OOO "Program Verification Systems" site. http://www.viva64.com.