A Lexical Analyzer for C++ Tokens

Lexical analysis is the first step in compiling code down to machine language. It breaks source code into a long list of pieces called tokens. A parser then uses this list to extract meaning from the order and arrangement of the tokens. Here is a small example of lexical analysis:


int main(void) {
    float myvar = 2.5;
    return 0;
}

List of tokens:

  1. int keyword
  2. main identifier
  3. (
  4. void keyword
  5. )
  6. {
  7. float keyword
  8. myvar identifier
  9. = operator
  10. 2.5 floating point constant
  11. ; end statement
  12. return keyword
  13. 0 integer constant
  14. ; end statement
  15. }

As you can see, the list of tokens gets long rather quickly. No syntax checking is done during lexical analysis; that happens later, in the parser.

I have written a basic lexical analyzer that breaks C++ code into tokens. The analyzer itself is also written in C++.