ambiguity is useful for error recovery/error detection. also, some languages hav...

haberman · on March 15, 2011

> I think parser-generators are unpopular because people would prefer to just write code, rather than compile something else to automatically generated code that is nigh on unreadable.

I agree that generating source code is annoying, which is why Gazelle does not do it. It takes a VM approach instead; the parser is either interpreted or JIT-ted.

> I think the popularity of regexes is due in part to the ease of which they can be embedded or used within the host language - either with syntactic sugar, or simply as a library.

That is exactly what Gazelle is trying to do.

> i'd really prefer a library I can load and manipulate the grammar from, over yet another syntax and compiler in my tool chain.

Gazelle always loads its grammars at runtime. There's a compiler also, but it just generates byte-code that the runtime loads. But you can run the compiler at run-time too if you want.

If you'd rather build a grammar programmatically than use a syntax meant for it, more power to you (Gazelle will support it). But that doesn't seem to match your regex case: people specify regexes with a special syntax, not by building up an expression tree manually. The latter seems like a lot of work to me, and such grammars will not be reusable from other languages, but that might not be important to you.

> (ps. (i'm saddened by the lack of left recursion support in gazelle))

What is a case where you would really miss it, that isn't addressed by a repetition operator (*) or an operator-precedence parser?
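
To make the operator-precedence alternative concrete, here is a minimal precedence-climbing sketch in Python. It is not Gazelle code; the `PRECEDENCE` table and the names `tokenize`, `parse_expr`, and `parse_atom` are made up for illustration. The `while` loop does the job that a left-recursive rule such as `expr -> expr op expr` would otherwise do, and it still produces left-associative trees.

```python
import re

# Hypothetical operator table: a higher number binds tighter.
# All operators here are treated as left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text):
    # Integers, the four operators, and parentheses; whitespace is skipped.
    return re.findall(r"\d+|[-+*/()]", text)


def parse_atom(tokens, pos):
    """Parse an integer or a parenthesized expression."""
    if tokens[pos] == "(":
        tree, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    return int(tokens[pos]), pos + 1


def parse_expr(tokens, pos=0, min_prec=1):
    """Return (tree, next_pos) using precedence climbing."""
    left, pos = parse_atom(tokens, pos)
    # This loop replaces left recursion: keep folding operators into `left`.
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # Requiring a strictly higher precedence on the right-hand side
        # is what makes each operator left-associative.
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos
```

With this sketch, `parse_expr(tokenize("1 - 2 - 3"))[0]` groups as `(1 - 2) - 3`, which is the result people usually want from a left-recursive grammar.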

_tef · on March 16, 2011

I would like to say: awesome!

And yes, most of my left-recursion fetish would be covered by an operator-precedence parser or a left-corner parser.

beza1e1 · on March 16, 2011

> I think parser-generators are unpopular because people would prefer to just write code, rather than compile something else to automatically generated code that is nigh on unreadable.

Also, a manual lexer/parser can introduce context when necessary. E.g. Python has significant whitespace. The lexer (with context knowledge of indentation width) can easily emit indent/dedent tokens, so the grammar is context free. With a tool like ANTLR you have to do weird stuff to parse Python.
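
A rough sketch of that indent/dedent trick in Python, assuming spaces-only indentation and treating each non-blank line as a single `LINE` token. The function name `indent_tokens` and the token kinds are invented here, and this is a simplification, not how CPython's tokenizer is actually written. The only context the lexer carries is a stack of indentation widths.

```python
def indent_tokens(source):
    """Yield (kind, value) tokens with INDENT/DEDENT derived from leading spaces."""
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            yield ("INDENT", width)
        while width < stack[-1]:
            # Shallower: close blocks until we are back at a known width.
            stack.pop()
            yield ("DEDENT", width)
        if width != stack[-1]:
            raise IndentationError(f"inconsistent dedent: {line!r}")
        yield ("LINE", line.strip())
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", 0)
```

The parser downstream then sees `INDENT` and `DEDENT` as ordinary bracket-like tokens, so its grammar can stay context-free.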

Programming languages older than 10 years or so are usually not context-free. For example, C needs context to parse "A*B", because the meaning depends on whether "A" is a type or a variable. Recent programming languages usually try to be LL(1), which is why the keywords "var", "val", and "def" have become so popular.
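
A toy illustration of the "A*B" problem in Python. The function `classify_statement` and the `typedef_names` set are hypothetical stand-ins for a real C parser's symbol table. The same four tokens come out as a declaration or an expression depending only on what has been declared earlier.

```python
def classify_statement(tokens, typedef_names):
    """Classify a statement of the form  NAME * NAME ;  as a C parser might.

    `typedef_names` stands in for the symbol table of known type names.
    """
    if len(tokens) != 4 or tokens[1] != "*" or tokens[3] != ";":
        raise ValueError("this sketch only handles 'NAME * NAME ;'")
    first, _, second, _ = tokens
    if first in typedef_names:
        # 'A' names a type, so 'A *B;' declares B as a pointer to A.
        return ("declaration", f"{second}: pointer to {first}")
    # Otherwise 'A' is an ordinary identifier and this is a multiplication.
    return ("expression", f"{first} * {second}")
```

A language that starts declarations with a keyword such as `var` avoids this lookup, because one token of lookahead is enough to tell the two cases apart.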