Parse Expressions
AnyText uses Parsing Expression Grammars (PEGs), introduced by Bryan Ford.
Parsing Expression Grammars
The basic idea is that each rule denotes a parse expression. If A and B are parse expressions, then:
A | B denotes the ordered choice. This means that the parser will not try to match B unless matching A fails. This is in stark contrast to context-free grammars, where the order of productions does not matter. In the literature, one therefore often finds / as notation to make the difference clearer, but AnyText decided to use the operator | for better compatibility with Xtext and Langium.
A B is the ordered sequence in which a parser would match A and then B, if matching A succeeded.
'a' is a parse expression for any character sequence a. AnyText will treat these literals as keywords or operators (by default, an alphabetic literal is a keyword, everything else is an operator).
A* is a parse expression that matches A arbitrarily many times or not at all.
A+ is a parse expression that matches A arbitrarily many times, but at least once.
A? is a parse expression that matches A optionally, i.e., once or not at all.
(A) is a parse expression.
&A is a positive lookahead. This means, the parser will see whether the non-terminal A can expand at the current position, but the parser will not change the current position.
!A is a negative lookahead. This is similar to a positive lookahead except that the parser will accept the position if matching A fails and will fail if matching A succeeds.
Thanks to unlimited lookahead, PEGs can recognize some languages that are not context-free. In DSLs, lookaheads are useful for excluding certain scenarios.
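The operators above can be illustrated with a few parser combinators. The following is a minimal Python sketch, not AnyText's implementation: every function name (lit, seq, choice, star, plus, opt, and_, not_) is an assumption made for this example, and a parser is modelled as a function from a text and a position to the position after the match, or None on failure. The last part encodes the well-known PEG for the language a^n b^n c^n, which is not context-free, to show what the lookaheads make possible.

```python
from typing import Callable, Optional

# A parser maps (text, position) to the position after the match, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    """'a': match the literal character sequence s."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def seq(*parts: Parser) -> Parser:
    """A B: match every part in order; fail as soon as one part fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        cur: Optional[int] = pos
        for part in parts:
            cur = part(text, cur)
            if cur is None:
                return None
        return cur
    return parse

def choice(*alternatives: Parser) -> Parser:
    """A | B: ordered choice; B is only tried if A fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alternative in alternatives:
            end = alternative(text, pos)
            if end is not None:
                return end
        return None
    return parse

def star(p: Parser) -> Parser:
    """A*: match p greedily, zero or more times."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            end = p(text, pos)
            if end is None or end == pos:  # stop on failure or when no input is consumed
                return pos
            pos = end
    return parse

def plus(p: Parser) -> Parser:
    """A+: one or more times."""
    return seq(p, star(p))

def opt(p: Parser) -> Parser:
    """A?: once or not at all."""
    return choice(p, lambda text, pos: pos)

def and_(p: Parser) -> Parser:
    """&A: succeed if p matches here, without consuming input."""
    return lambda text, pos: pos if p(text, pos) is not None else None

def not_(p: Parser) -> Parser:
    """!A: succeed if p does NOT match here, without consuming input."""
    return lambda text, pos: pos if p(text, pos) is None else None

def any_char(text: str, pos: int) -> Optional[int]:
    """Match any single character."""
    return pos + 1 if pos < len(text) else None

# Order matters: the first alternative that matches wins.
short_first = choice(lit("a"), lit("ab"))
long_first = choice(lit("ab"), lit("a"))

# a^n b^n c^n (n >= 1):
#   S <- &(A 'c') 'a'+ B !.
#   A <- 'a' A? 'b'
#   B <- 'b' B? 'c'
def A(text: str, pos: int) -> Optional[int]:
    return seq(lit("a"), opt(A), lit("b"))(text, pos)

def B(text: str, pos: int) -> Optional[int]:
    return seq(lit("b"), opt(B), lit("c"))(text, pos)

S = seq(and_(seq(A, lit("c"))), plus(lit("a")), B, not_(any_char))
```

On the input "ab", short_first stops after one character because 'a' already matched, while long_first consumes both. S accepts "aabbcc" and rejects "aabbc" and "aabbbcc": the positive lookahead checks that the counts of a and b agree before B checks that the counts of b and c agree.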
Assignments
To define how semantic models are created from parse trees, AnyText supports the following assignment expressions, where A is again a parse expression:
feature=A assigns the semantic value of A to the property feature of the current semantic element.
feature+=A adds the semantic value of A to the collection feature of the current semantic element.
feature?=A assigns the value true to the property feature of the current semantic element.
All of these assignments are executed for every rule application that is activated, i.e., that becomes part of the parse tree.
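The effect of the three assignment operators can be sketched in a few lines of Python. This is an illustration only: the semantic element is modelled as a plain dictionary, the helper assign is a hypothetical name, and the rule in the comment is a made-up example in Xtext-like notation, not taken from a real AnyText grammar.

```python
def assign(element: dict, feature: str, op: str, value=None) -> None:
    """Apply one assignment to the current semantic element (modelled as a dict)."""
    if op == "=":
        element[feature] = value                        # feature=A: set the property
    elif op == "+=":
        element.setdefault(feature, []).append(value)   # feature+=A: add to the collection
    elif op == "?=":
        element[feature] = True                         # feature?=A: only record that A matched
    else:
        raise ValueError(f"unknown assignment operator: {op}")

# Hypothetical rule, for illustration only:
#   Entity: abstract?='abstract' 'entity' name=ID (fields+=Field)*
# Parsing "abstract entity Person firstName lastName" would trigger:
entity: dict = {}
assign(entity, "abstract", "?=")
assign(entity, "name", "=", "Person")
assign(entity, "fields", "+=", "firstName")
assign(entity, "fields", "+=", "lastName")
```

After these four calls, entity holds abstract as True, name as "Person", and fields as a list with the two field names in parse order.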
If R is a class rule, then the following assignments are also supported:
feature=[R] assigns a reference to the rule R to the property feature of the current semantic element. AnyText looks up the identifier of R to determine the parse expression used to parse the reference.
feature=[R:A] assigns a reference to the rule R to the property feature of the current semantic element. The parse expression A is used to indicate the text for the reference.
feature+=[R] adds a reference to the rule R to the property feature of the current semantic element. AnyText looks up the identifier of R to determine the parse expression used to parse the reference.
feature+=[R:A] adds a reference to the rule R to the property feature of the current semantic element. The parse expression A is used to indicate the text for the reference.
Reference resolution is done in the semantic model. By default, AnyText searches the containment hierarchy for a model element with the given identifier. However, this behavior can, and in many cases must, be overridden in the generated code. It is also possible to customize how AnyText synthesizes text for a given element reference.
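One plausible reading of such a default lookup is sketched below in Python. The exact search order AnyText uses is not specified here, so this is an assumption: starting at the element that holds the reference, the sketch searches the contents of each enclosing container in turn and returns the first element whose name matches. The class Element and the functions find_in_subtree and resolve are hypothetical names for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Element:
    name: Optional[str] = None
    parent: Optional["Element"] = None
    children: List["Element"] = field(default_factory=list)

    def add(self, child: "Element") -> "Element":
        """Attach child to this element's containment and return it."""
        child.parent = self
        self.children.append(child)
        return child

def find_in_subtree(root: Element, identifier: str) -> Optional[Element]:
    """Depth-first search of root and everything it contains."""
    if root.name == identifier:
        return root
    for child in root.children:
        hit = find_in_subtree(child, identifier)
        if hit is not None:
            return hit
    return None

def resolve(context: Element, identifier: str) -> Optional[Element]:
    """Walk up from context; the nearest enclosing scope that contains a match wins."""
    scope: Optional[Element] = context
    while scope is not None:
        hit = find_in_subtree(scope, identifier)
        if hit is not None:
            return hit
        scope = scope.parent
    return None

# Example model: two packages that each contain a type called "Address".
model = Element("model")
billing = model.add(Element("billing"))
shipping = model.add(Element("shipping"))
billing_address = billing.add(Element("Address"))
shipping_address = shipping.add(Element("Address"))
invoice = billing.add(Element("Invoice"))
```

With this model, a reference to "Address" from inside invoice resolves to billing_address only once the search reaches the enclosing billing package, and a reference to "shipping" is found one level further up. A generic search like this knows nothing about qualified names, imports, or visibility rules, which suggests why a language-specific override is often needed.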
Identifiers
By default, AnyText treats assignments to features called name or id as identifiers. This can be customized with the dedicated command-line argument -i or --identifierNames of the code generator. You can specify multiple feature names separated by spaces. Note that if you specify this command-line parameter, the default values no longer apply.
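The replace-not-extend behavior of this option can be modelled with Python's argparse. This is a hypothetical stand-in, not the actual code generator: only the option names -i and --identifierNames and the defaults name and id come from the description above, and the helper identifier_names is an assumption for this sketch.

```python
import argparse
from typing import List

# Hypothetical stand-in for the code generator's argument handling.
parser = argparse.ArgumentParser(description="sketch of the identifier-name option")
parser.add_argument(
    "-i", "--identifierNames",
    nargs="+",                # one or more feature names, separated by spaces
    default=["name", "id"],   # used only when the option is absent
)

def identifier_names(argv: List[str]) -> List[str]:
    """Return the feature names that are treated as identifiers."""
    return parser.parse_args(argv).identifierNames

without_option = identifier_names([])
with_option = identifier_names(["-i", "key", "title"])
```

Here without_option is the default pair name and id, while with_option contains only key and title. To keep name as an identifier alongside custom names, it would have to be listed explicitly.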