[ale] [OT] Writing a parser

Charles Shapiro cshapiro at nubridges.com
Thu Mar 18 09:47:08 EST 2004


Oops. Unbalanced parens are a parse error. Invalid tokens are lexical
errors.

So if your language has "(" and ")", but "{" and "}" must be preceded by
a "\", the lexer would throw an error on a naked "}" but would pass "\}"
back to the parser. But the parser would have to check that each "(" has
a corresponding ")". Assuming, of course, that your language required
that the "()"s balance.

I better stop now.

--  CHS

On Thu, 2004-03-18 at 09:17, Charles Shapiro wrote:
> Oooh, me, me!!
> 
> Parsers generally take collections of elements of a language ("lexical
> tokens") and then translate them in some way.  I think what you might be
> trying to build is a lexer, which takes raw text (or other raw data) and
> cuts it into tokens which you can feed to a parser. Output from a lexer
> is supposed to be already correct, so you can concentrate on parsing it
> later without worrying about lexical errors such as unbalanced
> parentheses (if your language uses parentheses) or invalid tokens.
> 
> If the source you're looking at is generated, it's not the source.
> Parsers and lexers generally involve a lot of coding drudgery, so the
> smart way to build them is with code generators working from a language
> specification. That specification is the real source. The canonical
> lexer//parser generator tool set is lex(1) and yacc(1), called flex(1)
> and bison(1) in the linux development tool set.  I learned to use them
> first by taking the standard demo program for these tools, a calculator
> program, and enhancing it. You can even find a couple of calculator 
> examples in the gnu bison documentation
> (http://www.delorie.com/gnu/docs/bison/bison.html#SEC_Top).
> After I got that to work and understood  the concepts behind it, I was
> able to use flex/bison to write  my own program, available
> at http://tomshiro.org/coldread.  
> 
> One crucial piece of advice which I haven't seen covered elsewhere:
> 
> First, build a really stupid parser which simply recognizes the lexical
> tokens as they go by and prints them out (this is easy in yacc). Then,
> build your lexer and make sure it works the way you want. Write some
> unit tests here so you can change your lexer and be sure it still
> works.Finally, go back and replace the stubbed back-end with one which
> does what you want it to.
> 
> This stuff is really cool and what seduced me into the sad pathetic life
> I now live. Aho, Sethi and Ullman's _Compilers: Principles, Techniques,
> and Tools (Addison-Wesley, 1986), aka the Dragon Book, is worth the
> fabulous price you'll pay for it if you think this is interesting
> 
> -- CHS
> 
> On Thu, 2004-03-18 at 04:18, Kevin Krumwiede wrote:
> > I am working on a program to capture data from a MUD (actually, TW2002).
> >  I've looked at the source for a couple parsers, including one made
> > specifically for that game, but because they are generated code I'm
> > having a lot of difficulty understanding how they work.  
> > 
> > The gist of what I do understand is this (and somebody please tell me if
> > I'm wrong): parsers generally take a string of text and return a numeric
> > code signifying what pattern (if any) the text matches.  A program can
> > then use that code to decide how to process the text.  I assume that
> > sophisticated parsers take into account the preceding context of the
> > text when evaluating its pattern.
> > 
> > I am completely lost when it comes to the input languages of parser
> > generators.  Anyone know of a good tutorial?
> > 
> > Thanks,
> > Krum
> > _______________________________________________
> > Ale mailing list
> > Ale at ale.org
> > http://www.ale.org/mailman/listinfo/ale
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://www.ale.org/mailman/listinfo/ale



More information about the Ale mailing list