operator-precedence parsing

March 18, 2010 at 18:34:11
Specs: Win XP, P4/1GB
I am trying to design an expression parser for a simple language. I've read a number of compiler theory books and Googled a number of sites covering operator-precedence parsers. Most of it is cut and dried. Binary operators are easy to deal with. The problem is with overloaded operators, such as a unary minus. Authors seem to "leave this as an exercise to the reader". Code samples seem to take a very ad-hoc approach, such as "assume the minus is unary, then check if the previous token was a ) or variable name (in which case it's binary). Is a language-specific ad-hoc solution the only way?

The token at the front of the input stream must be exactly determined (binary subtract vs unary negate vs other uses such as "set difference"). Only then can the proper column in the shift-reduce table be selected, and the appropriate action be taken. Are ad-hoc solutions the only ones, or is there something more elegant and general?

I'm trying to avoid the heavy overhead of a lex/yacc (flex/bison) built parser, with all the "factor - term - expression" complexity, and just cobble together a nice simple handmade o-p parser. If I can come up with an elegant way of handling overloaded operators, the rest is easy.

See More: operator-precedence parsing

Report •

March 19, 2010 at 12:45:07
I should add that a similar problem arises when one operator is a superset of another. For instance, I have "a" on the operand stack, and the input stream starts with "--". This is ambiguous: -- could be a post decrement on "a", a pre decrement on "b" (ruled out because it is immediately preceded by an operand token), or it could be "binary subtract" followed by "unary negate". "a--b" should only be tokenized as the last case: "a - (-b)". So, is there something other than an ad-hoc way, without looking some indefinite number of tokens ahead, to tell what this operator(s) is/are? I suppose I could split the execution stream (possibly with a child process?) and follow both until one gives an error.

a--b... a shifted onto operand stack, leaving --b...
assume -- decrement operator
postfix? a-- is permissible. next token is an operand without a binary operator before it, so error, backtrack (abandon)
prefix? requires operator immediately before, so fails
binary subtract? eat one '-' as binary subtract, leaving -b...
next token is '-' but not '--': is either binary subtract or unary negate
binary doesn't work because an operator not ) or operand was immediately before
unary works... so far. now input is b...

Anyway, it gets rather ad-hoc to come up with all these rules and special cases to tell what '--' means, so I'm hoping there's something more elegant. Thanks!

Report •
Related Solutions

Ask Question