Our internal coding standard for C++ source files dictates that 7-bit US-ASCII should be used for file encoding.
This decision is based on the fact that the current C++ standard (2003) limits the characters that can be written directly in variable and type identifiers to ASCII letters, digits and the underscore. Although some compilers and the new (2011) C++ standard allow most Unicode code points in identifiers (basically whatever can be called a “letter” in the various scripts), the “same-glyph, different Unicode code-point syndrome” described here advises against that.
One could still allow non-ASCII characters in string constants and in comments, and this is tolerated by most modern compilers. But the decision was to be quite conservative in the current standard; in the future, as C++ 2011 is fully implemented, we might revise it.
The trouble is that non-ASCII characters sometimes sneak in, for example the euro sign €, the degree symbol ° and the en dash –, which looks so similar to the ASCII minus sign -.
Long story short, we needed a utility to detect non-ASCII characters in a collection of text (source) files. This utility is called checkAscii, and the C++ source code is:
/*
 @file checkAscii.cc
 @brief Detect non-ASCII characters in a text file
 @author (C) Copyright 2012 Paolo Greppi libpf.com
 @date 20120525
 @version 0.1
 no warranties whatsoever
 distribute freely and free of charge citing this:
Usage is as follows:
cat mySourceFile.cc | checkAscii
It will print this if non-ASCII characters are found (and return the number of non-ASCII characters found as its exit status):
Now checking file mySourceFile.h
line: 53 column: 26 nonascii °
line: 54 column: 27 nonascii €
or will print this (and return 0) if only ASCII characters are found:
Now checking file mySourceFile.h
We use it on large sets of files using bash and xargs as follows:
ls -1 include/*.h | xargs -d '\n' -n 1 checkAscii
ls -1 src/*.cc | xargs -d '\n' -n 1 checkAscii