4. Parsing in Xtext
• Xtext continuously parses documents in the editor
• Parsing needs to be fast
• Parsing needs good error recovery
• Xtext uses ANTLR 3.2
• LL(*) parser generator
5. LL-Parsing
• Traverse the document from beginning to end
• Each rule becomes a method
• Iterate:
1. Look ahead the minimum number of tokens to decide on the next path segment
2. Rewind and generate the AST along that path segment
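The loop above can be sketched as a hand-written recursive-descent parser for the Field/Method grammar used on the following slides. This is a minimal illustration, not the code ANTLR generates; the names `Parser`, `Node`, `peek` and `consume` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str   # "Field" or "Method"
    type: str
    name: str


class Parser:
    """Hypothetical hand-written parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, k):
        """Return the k-th upcoming token (1-based) without consuming it."""
        i = self.pos + k - 1
        return self.tokens[i] if i < len(self.tokens) else None

    def consume(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def members(self):
        # Rule Class (simplified): members+=Member*
        result = []
        while self.pos < len(self.tokens):
            result.append(self.member())
        return result

    def member(self):
        # Step 1: look ahead just far enough to choose an alternative.
        # Step 2: the position is unchanged, so the chosen rule starts
        # again at the first token and builds the AST node.
        if self.peek(3) == "(":
            return self.method()
        return self.field()

    def field(self):
        type_name = self.consume()
        name = self.consume()
        return Node("Field", type_name, name)

    def method(self):
        type_name = self.consume()
        name = self.consume()
        self.consume()  # '('
        self.consume()  # ')'
        return Node("Method", type_name, name)
```

Calling `Parser(["int", "foo", "double", "bar", "(", ")"]).members()` yields one Field node followed by one Method node.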
7. Lookahead: Syntax Only
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=[Type] name=ID;
Method:
return=[Type] name=ID '(' ')';
int foo
double bar()
8. Lookahead: Syntax Only!
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=[Type] name=ID;
Method:
return=[Type] name=ID '(' ')';
int foo
double bar()
(Annotation: to the parser, a type reference such as int is just an ID token, like a name.)
9. Lookahead: Common Tokens
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=[Type] name=ID;
Method:
return=[Type] name=ID '(' ')';
int foo
double bar()
(Annotation: both alternatives start with the common tokens ID ID, so the decision needs lookahead k=3.)
10. Lookahead: If-Cascade
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=[Type] name=ID;
Method:
return=[Type] name=ID '(' ')';
int foo
double bar()
Lookahead k=3
Decision cascade for Member:
  1st token ID? no: error
  2nd token ID? no: error
  3rd token '('? yes: Method, no: Field
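The if-cascade can be written down as nested conditions over the next three tokens. The following is a sketch under the assumption that tokens are plain strings and that an ID is anything Python accepts as an identifier; `la` and `predict_member` are names invented for this example.

```python
def la(tokens, pos, k):
    """Return the k-th lookahead token (1-based) or None at end of input."""
    i = pos + k - 1
    return tokens[i] if i < len(tokens) else None


def is_id(tok):
    # Assumption: an ID token is any string that is a valid identifier.
    return tok is not None and tok.isidentifier()


def predict_member(tokens, pos=0):
    """Decide between Field and Method by inspecting at most three tokens."""
    if is_id(la(tokens, pos, 1)):
        if is_id(la(tokens, pos, 2)):
            if la(tokens, pos, 3) == "(":
                return "Method"
            return "Field"
    raise SyntaxError(f"no viable alternative at token {pos}")
```

For the input `int foo double bar()`, the prediction at position 0 is Field and the prediction at position 2 is Method.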
15. Execution
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=[Type] name=ID;
Method:
return=[Type] name=ID '(' ')';
int foo
double bar()
Decision cascade for Member on the input int foo: ID, ID, no '(' follows.
It’s a Field!
18. Execution
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=[Type] name=ID;
Method:
return=[Type] name=ID '(' ')';
int foo
double bar()
(Animation step: the same decision cascade runs again for the next member, double bar().)
20. Execution
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=[Type] name=ID;
Method:
return=[Type] name=ID '(' ')';
int foo
double bar()
Decision cascade for Member on the input double bar(): ID, ID, then '('.
It’s a Method!
26. Simple Example
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=[Type]? name=ID;
Method:
return=[Type] name=ID '(' ')';
int foo
double bar()
baz
warning(200): ..InternalMyDsl.g:258:2:
Decision can match input such as "RULE_ID" using multiple
alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(201): ..InternalMyDsl.g:258:2: The following alternatives can
never be matched: 2
27. Simple Example
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=[Type]? name=ID;
Method:
return=[Type] name=ID '(' ')';
int foo
double bar()
baz
• The token sequence ID ID matches either
  • two fields without a type, or
  • one field with a type
➡ Ambiguity
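To make the ambiguity concrete, the sketch below enumerates every way a run of ID tokens can be read as a sequence of fields when the type is optional. It only models `Field: type=[Type]? name=ID` applied repeatedly; the function name `field_parses` is an assumption for this example.

```python
def field_parses(ids):
    """Return every reading of a run of ID tokens as Field*,
    where each field is a (type, name) pair and type may be None."""
    if not ids:
        return [[]]
    results = []
    # Reading 1: a field without a type consumes a single ID.
    for rest in field_parses(ids[1:]):
        results.append([(None, ids[0])] + rest)
    # Reading 2: a field with a type consumes two IDs.
    if len(ids) >= 2:
        for rest in field_parses(ids[2:]):
            results.append([(ids[0], ids[1])] + rest)
    return results
```

`field_parses(["int", "foo"])` returns two readings: two untyped fields, or one field `foo` of type `int`. With more than one valid reading, the parser generator has to pick one, which is what the warning reports.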
28. Solution I: Change Syntax
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
('val' | type=[Type]) name=ID;
• Add more keywords
• Change order
• etc.
• Really resolves the ambiguity
• Too many keywords make language verbose
• Unfortunately not always possible
(Annotation: 'val' is the new keyword.)
31. Downsides of Backtracking
• Can yield exponential parse time
• Suppresses all ambiguity warnings in the grammar
• Puts syntactic predicates on all first choices
• The language implementor should make these decisions explicitly
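The exponential case can be reproduced with a toy grammar that is not from these slides: `Nested: '(' Nested ')' | '(' Nested ']' | 'x'`. The sketch assumes naive backtracking without memoization; `parse_nested` and `count_calls` are hypothetical names. Every failed first alternative forces the inner rule to be parsed a second time, so the number of rule invocations doubles with each nesting level.

```python
def parse_nested(tokens, pos, counter):
    """Backtracking parser for the toy rule
    Nested: '(' Nested ')' | '(' Nested ']' | 'x'.
    Returns the position after the match, or None on failure."""
    counter[0] += 1  # count every rule invocation
    if pos >= len(tokens):
        return None
    if tokens[pos] == "x":
        return pos + 1
    if tokens[pos] == "(":
        # Alternative 1: '(' Nested ')'
        end = parse_nested(tokens, pos + 1, counter)
        if end is not None and end < len(tokens) and tokens[end] == ")":
            return end + 1
        # Alternative 2: rewind and parse the inner part again.
        end = parse_nested(tokens, pos + 1, counter)
        if end is not None and end < len(tokens) and tokens[end] == "]":
            return end + 1
    return None


def count_calls(depth):
    """Number of rule invocations needed for '(' * depth + 'x' + ']' * depth."""
    tokens = ["("] * depth + ["x"] + ["]"] * depth
    counter = [0]
    end = parse_nested(tokens, 0, counter)
    assert end == len(tokens)
    return counter[0]
```

In this model, `count_calls(d)` equals `2 ** (d + 1) - 1`, so ten nesting levels already cost 2047 invocations for an input of 21 tokens.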
32. Solution III: Syntactic Predicate
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
=>(type=[Type] name=ID)
| name=ID;
Method:
type=[Type] name=ID '(' ')';
int foo
double bar()
baz
• Start local backtracking
• “If the token sequence matches, go this way!”
(Annotation: => marks the syntactic predicate.)
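A syntactic predicate can be pictured as a speculative match that runs before the parser commits to an alternative. The sketch below models only the Field rule (methods are left out), treats tokens as plain strings, and uses invented names (`matches_typed_field`, `parse_field`, `parse_fields`); it is not the code Xtext generates.

```python
def is_id(tok):
    # Assumption: an ID token is any string that is a valid identifier.
    return tok.isidentifier()


def matches_typed_field(tokens, pos):
    """The predicate: do the next two tokens look like `type name`?
    Nothing is consumed here; this is a pure lookahead check."""
    return (pos + 1 < len(tokens)
            and is_id(tokens[pos])
            and is_id(tokens[pos + 1]))


def parse_field(tokens, pos):
    """Field: =>(type=[Type] name=ID) | name=ID"""
    if matches_typed_field(tokens, pos):
        # The predicate matched, so go this way and consume both tokens.
        return {"type": tokens[pos], "name": tokens[pos + 1]}, pos + 2
    if pos < len(tokens) and is_id(tokens[pos]):
        return {"type": None, "name": tokens[pos]}, pos + 1
    raise SyntaxError(f"expected a field at token {pos}")


def parse_fields(tokens):
    pos, fields = 0, []
    while pos < len(tokens):
        field, pos = parse_field(tokens, pos)
        fields.append(field)
    return fields
```

For `int foo baz`, the predicate matches `int foo` as a typed field and the remaining `baz` falls through to the untyped alternative.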
35. Solution III: Syntactic Predicate
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
name=ID
| =>(type=[Type] name=ID);
Method:
type=[Type] name=ID '(' ')';
int foo
double bar()
baz
Wrong order: the alternative carrying the syntactic predicate must come first.
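Why the order matters can be shown with a deliberately simplified model in which alternatives are tried strictly in the order they are written and the first one that fits wins. This is an assumption made for illustration, not a description of ANTLR's actual decision algorithm; `parse_all` and the two alternative lists are hypothetical.

```python
# Each alternative is (label, number of ID tokens it consumes).
PREDICATE_FIRST = [("typed", 2), ("untyped", 1)]
PREDICATE_LAST = [("untyped", 1), ("typed", 2)]


def parse_all(tokens, alternatives):
    """Read a run of ID tokens as fields, always taking the first
    alternative (in list order) that fits the upcoming tokens."""
    pos, chosen = 0, []
    while pos < len(tokens):
        for label, length in alternatives:
            window = tokens[pos:pos + length]
            if len(window) == length and all(t.isidentifier() for t in window):
                chosen.append(label)
                pos += length
                break
        else:
            raise SyntaxError(f"no alternative matches at token {pos}")
    return chosen
```

With `PREDICATE_FIRST`, the input `int foo baz` is read as a typed field followed by an untyped one. With `PREDICATE_LAST`, the untyped alternative always fits first, so the typed alternative is never taken in this model.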
36. Caveats
• Syntactic predicates (SPs) remove all warnings on the local decision
• SPs are executed in the order of alternatives
• Always write tests!
• Apply the SP to a minimum set of tokens only to limit lookahead
• Prefer first token set syntactic predicates
37. First Token Set Predicate
XExpressionOrSimpleConstructorCall returns XExpression:
->XbaseConstructorCall | XExpression
;
XbaseConstructorCall returns XConstructorCall:
{xbase::XConstructorCall}
'new' constructor=[types::JvmConstructor|QualifiedName]
(=> ... // further stuff with complicated lookahead
• “If you find ‘new’, go this way!”
• Shortens lookahead
• Improves performance and error recovery
(Annotations: ‘new’ is the first token; -> is the first token predicate.)
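In contrast to a full syntactic predicate, a first token set predicate decides on a single token. A minimal sketch, assuming string tokens and an invented function name `choose_alternative`:

```python
def choose_alternative(tokens, pos=0):
    """Model of `->XbaseConstructorCall | XExpression`:
    commit to the constructor call as soon as the first token is 'new'."""
    if pos < len(tokens) and tokens[pos] == "new":
        return "XbaseConstructorCall"
    return "XExpression"
```

Because only one token is inspected, the choice is made even when the rest of the constructor call is malformed. The error is then reported inside the constructor-call rule rather than as a failed decision, which is presumably where the better error recovery comes from.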
40. Detecting Ambiguities
// with Xbase
Model:
expr=XExpression;
XLiteral returns XExpression:
RGBLiteral |
XCollectionLiteral |
XClosure |
XBooleanLiteral |
XNumberLiteral |
XNullLiteral |
XStringLiteral |
XTypeLiteral;
RGBLiteral:
red=INT ':' green=INT ':' blue=INT;
255:10:10 // an RGB literal
warning(200): ..InternalXbaseExample.g:119:1:
Decision can match input such as "RULE_INT ':' RULE_INT ':'
RULE_INT" using multiple alternatives: 1, 5
As a result, alternative(s) 5 were disabled for that input
Semantic predicates were present but were hidden by actions.
45. Non-LL(*)
Class:
... members+=Member* ...;
Member:
Field | Method;
Field:
type=TypeRef name=ID;
Method:
return=TypeRef name=ID '(' ')';
TypeRef:
name=[Type] ('<' typeArg=TypeRef '>')?;
Map<String, Set<String>> foo
List<String> bar()
(Annotation: recursion in lookahead)
error(211): ..InternalMyDsl.g:119:1:
[fatal] rule ruleMember has non-LL(*) decision due to
recursive rule invocations reachable from alts 1,2.
Resolve by left-factoring or using syntactic predicates or
using backtrack=true option.
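The problem can be illustrated by measuring how far the parser has to look before it reaches the '(' that separates a Method from a Field. The sketch builds members with increasingly nested type arguments; `type_ref_tokens` and `lookahead_needed` are names invented for this example, and the type names `List` and `String` are placeholders.

```python
def type_ref_tokens(depth):
    """Tokens of a TypeRef nested `depth` levels deep,
    e.g. depth 2 gives: List < List < String > >"""
    if depth == 0:
        return ["String"]
    return ["List", "<"] + type_ref_tokens(depth - 1) + [">"]


def lookahead_needed(depth):
    """1-based position of the '(' that tells Method from Field."""
    member = type_ref_tokens(depth) + ["bar", "(", ")"]
    return member.index("(") + 1
```

The result grows as `3 * depth + 3`, so no fixed k is enough. Skipping over the type reference also means matching nested angle brackets, which requires the recursive rule itself; this appears to be what the error message means by recursive rule invocations in the lookahead.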