[ale] [OT] Writing a parser

Thu Mar 18 09:24:51 EST 2004

Oooh, me, me!!

Parsers generally take collections of elements of a language ("lexical
tokens") and then translate them in some way.  I think what you might be
trying to build is a lexer, which takes raw text (or other raw data) and
cuts it into tokens which you can feed to a parser. Output from a lexer
is supposed to be already correct, so you can concentrate on parsing it
later without worrying about lexical errors such as unbalanced
parentheses (if your language uses parentheses) or invalid tokens.

If the source you're looking at is generated, it's not the source.
Parsers and lexers generally involve a lot of coding drudgery, so the
smart way to build them is with code generators working from a language
specification. That specification is the real source. The canonical
lexer//parser generator tool set is lex(1) and yacc(1), called flex(1)
and bison(1) in the linux development tool set.  I learned to use them
first by taking the standard demo program for these tools, a calculator
program, and enhancing it. You can even find a couple of calculator 
examples in the gnu bison documentation
(http://www.delorie.com/gnu/docs/bison/bison.html#SEC_Top).
After I got that to work and understood  the concepts behind it, I was
able to use flex/bison to write  my own program, available
at http://tomshiro.org/coldread.  

One crucial piece of advice which I haven't seen covered elsewhere:

First, build a really stupid parser which simply recognizes the lexical
tokens as they go by and prints them out (this is easy in yacc). Then,
build your lexer and make sure it works the way you want. Write some
unit tests here so you can change your lexer and be sure it still
works.Finally, go back and replace the stubbed back-end with one which
does what you want it to.

This stuff is really cool and what seduced me into the sad pathetic life
I now live. Aho, Sethi and Ullman's _Compilers: Principles, Techniques,
and Tools (Addison-Wesley, 1986), aka the Dragon Book, is worth the
fabulous price you'll pay for it if you think this is interesting

-- CHS

On Thu, 2004-03-18 at 04:18, Kevin Krumwiede wrote:
> I am working on a program to capture data from a MUD (actually, TW2002).
>  I've looked at the source for a couple parsers, including one made
> specifically for that game, but because they are generated code I'm
> having a lot of difficulty understanding how they work.  
> 
> The gist of what I do understand is this (and somebody please tell me if
> I'm wrong): parsers generally take a string of text and return a numeric
> code signifying what pattern (if any) the text matches.  A program can
> then use that code to decide how to process the text.  I assume that
> sophisticated parsers take into account the preceding context of the
> text when evaluating its pattern.
> 
> I am completely lost when it comes to the input languages of parser
> generators.  Anyone know of a good tutorial?
> 
> Thanks,
> Krum
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://www.ale.org/mailman/listinfo/ale