We combine transition-based dependency parsing with a high-performing but relatively underexplored machine learning technique, Robust Risk Minimization. During decoding, we judiciously prune the next parsing states using k-best ranking. Moreover, we apply a simple post-processing step to ensure robustness. We evaluate our approach on the CoNLL’09 shared task English data and improve transition-based dependency parsing accuracy, reaching a labeled attachment score of 89.28%. We also observe near-quadratic average running time in practice.
K-best, Locally Pruned, Transition-based Dependency Parsing Using Robust Risk Minimization
1. K-best, Locally-pruned, Transition-based Dependency Parsing Using Robust Risk Minimization
Jinho D. Choi
University of Colorado at Boulder
J. D. Power and Associates
September 9, 2009
2. Dependency Structure
• What is dependency?
- Syntactic or semantic relation between word-tokens
• Syntactic: NMOD (a beautiful woman)
• Semantic: LOC (places in this city), TMP (events in this year)
• Phrase structure vs. dependency structure
- Constituents vs. dependencies
[Figure: phrase-structure tree vs. dependency tree for "she bought a car" — constituents (NP, VP) vs. dependencies (SBJ: bought → she, OBJ: bought → car, DET: car → a)]
3. Dependency Graph
• For a sentence s = w_1 .. w_n, a dependency graph G_s = (V_s, E_s)
- V_s = {w_0 = root, w_1, ..., w_n}
- E_s = {(w_i, r, w_j) : w_i ≠ w_j, w_i ∈ V_s, w_j ∈ V_s − {w_0}, r ∈ R_s}
- R_s = the set of all dependency relations in s
• A well-formed dependency graph
- Unique root, single head, connected, acyclic → dependency tree (a minimal check is sketched below)
- Projective vs. non-projective
[Figure: projective dependency tree for "She bought a car" (O(n)) vs. non-projective tree for "She bought a car yesterday that was blue" (O(n²))]
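As a concrete companion to the definition above, here is a minimal Python sketch (my own, not from the slides) that checks the well-formedness conditions for a head assignment; the list encoding heads[j] = head of w_j is an assumption:

```python
def is_well_formed(heads):
    """heads[j] = index of the head of token j (1..n); heads[0] is unused (root).

    Checks the well-formedness conditions from the slide: unique root,
    single head per token, connectedness, and acyclicity.
    """
    n = len(heads) - 1
    # Single head: each token 1..n has exactly one head in 0..n and is not its own head.
    if any(not (0 <= heads[j] <= n) or heads[j] == j for j in range(1, n + 1)):
        return False
    # Unique root: exactly one token is attached directly to w0 (the artificial root).
    if sum(1 for j in range(1, n + 1) if heads[j] == 0) != 1:
        return False
    # Connected and acyclic: every token reaches w0 by following head links.
    for j in range(1, n + 1):
        seen, k = set(), j
        while k != 0:
            if k in seen:          # a cycle never reaches the root
                return False
            seen.add(k)
            k = heads[k]
    return True

# "She bought a car": heads[1]=2 (she<-bought), heads[2]=0 (root->bought),
#                     heads[3]=4 (a<-car),     heads[4]=2 (bought->car)
print(is_well_formed([0, 2, 0, 4, 2]))  # True
```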
4. Dependency Parsing Models
• Transition-based parsing model
- Transition: an operation that searches for a dependency
relation between each pair of words (e.g. Left-Arc, Shift, etc.)
- Greedy search that finds local optima (locally optimized transitions) → does better on short-distance dependencies
- Nivre’s algorithm (projective: O(n)), Covington’s algorithm (non-projective: O(n²))
• Graph-based parsing model
- Build a complete graph with directed/weighted edges and find
the tree with the highest score (sum of all weighted edges)
- Exhaustive search that finds the global optimum (maximum spanning tree) → does better on long-distance dependencies
- Eisner’s algorithm (projective: O(n²)), Edmonds’ algorithm (non-projective: O(n³))
5. Nivre’s List-based Algorithm
• Transition-based, non-projective dependency parsing algorithm
• λ1, λ2 : lists of partially processed tokens
• β : a list of remaining unprocessed tokens
• Initialization: (λ1, λ2, β, A) = ([0], [ ], [1, 2, ..., n], { })
  Termination: (λ1, λ2, β, A) = ([...], [...], [ ], {...})
• Deterministic shift vs. non-deterministic shift
(the four transitions are sketched in code below)
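Below is a minimal Python sketch of the four list-based transitions (Left-Arc, Right-Arc, No-Arc, Shift) acting on a configuration (λ1, λ2, β, A), following Nivre's published definitions as I understand them; the class and method names are illustrative, labels are omitted, and so is the classifier that chooses which transition to apply:

```python
class Configuration:
    """Parser state for Nivre's list-based algorithm: (lambda1, lambda2, beta, A)."""

    def __init__(self, n):
        self.l1 = [0]                       # lambda1: partially processed tokens (0 = root)
        self.l2 = []                        # lambda2: partially processed tokens
        self.beta = list(range(1, n + 1))   # remaining unprocessed tokens
        self.arcs = set()                   # A: set of (head, dependent) pairs

    def left_arc(self):
        # beta[0] becomes the head of the token on top of lambda1
        i, j = self.l1.pop(), self.beta[0]
        self.arcs.add((j, i))
        self.l2.insert(0, i)

    def right_arc(self):
        # the token on top of lambda1 becomes the head of beta[0]
        i, j = self.l1.pop(), self.beta[0]
        self.arcs.add((i, j))
        self.l2.insert(0, i)

    def no_arc(self):
        # move the top of lambda1 to lambda2 without adding an arc
        self.l2.insert(0, self.l1.pop())

    def shift(self):
        # move lambda2 and beta[0] back onto lambda1
        self.l1 += self.l2 + [self.beta.pop(0)]
        self.l2 = []

    def terminal(self):
        return not self.beta
```

The same object is reused after slide 25 to replay the walkthrough below.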
11. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she], λ2 = [ ], β = [bought, a, car], A = { }
• Initialize
• Shift : she
• Left-Arc : she ← bought
12. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root], λ2 = [she], β = [bought, a, car], A = { she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
13. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root], λ2 = [she], β = [bought, a, car], A = { she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
14. Nivre’s List-based Algorithm
root She bought a car
λ1 = [ ], λ2 = [she, root], β = [bought, a, car], A = { root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
15. Nivre’s List-based Algorithm
root She bought a car
λ1 = [ ], λ2 = [she, root], β = [bought, a, car], A = { root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
16. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she, bought], λ2 = [ ], β = [a, car], A = { root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
17. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she, bought], λ2 = [ ], β = [a, car], A = { root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
• Shift : a
18. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she, bought, a], λ2 = [ ], β = [car], A = { root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
• Shift : a
19. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she, bought, a], λ2 = [ ], β = [car], A = { root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
• Shift : a
• Left-Arc : a ← car
20. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she, bought], λ2 = [a], β = [car], A = { a ← car, root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
• Shift : a
• Left-Arc : a ← car
21. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she, bought], λ2 = [a], β = [car], A = { a ← car, root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
• Shift : a
• Left-Arc : a ← car
• Right-Arc : bought → car
22. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she], λ2 = [a, bought], β = [car], A = { bought → car, a ← car, root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
• Shift : a
• Left-Arc : a ← car
• Right-Arc : bought → car
23. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she], λ2 = [a, bought], β = [car], A = { bought → car, a ← car, root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
• Shift : a
• Left-Arc : a ← car
• Right-Arc : bought → car
• Shift : bought, a, car
24. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she, bought, a, car], λ2 = [ ], β = [ ], A = { bought → car, a ← car, root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
• Shift : a
• Left-Arc : a ← car
• Right-Arc : bought → car
• Shift : bought, a, car
25. Nivre’s List-based Algorithm
root She bought a car
λ1 = [root, she, bought, a, car], λ2 = [ ], β = [ ], A = { bought → car, a ← car, root → bought, she ← bought }
• Initialize
• Shift : she
• Left-Arc : she ← bought
• Right-Arc : root → bought
• Shift : root, she, bought
• Shift : a
• Left-Arc : a ← car
• Right-Arc : bought → car
• Shift : bought, a, car
• Terminate
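For reference, the transition sequence traced in slides 11-25 can be replayed with the Configuration sketch introduced after slide 5 (illustrative code, not the authors'):

```python
# Replay the trace for "She bought a car" (tokens 1..4, 0 = root).
c = Configuration(4)
c.shift()        # Shift : she
c.left_arc()     # Left-Arc : she <- bought
c.right_arc()    # Right-Arc : root -> bought
c.shift()        # Shift : root, she, bought
c.shift()        # Shift : a
c.left_arc()     # Left-Arc : a <- car
c.right_arc()    # Right-Arc : bought -> car
c.shift()        # Shift : bought, a, car
print(c.terminal(), sorted(c.arcs))
# True [(0, 2), (2, 1), (2, 4), (4, 3)]
# i.e. root->bought, bought->she, bought->car, car->a
```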
26. Robust Risk Minimization
• Linear binary classification algorithm
- Searches for a hyperplane h(x) = wᵀ·x − θ that separates two classes, -1 and 1, where class(x_i) = (h(x_i) < 0) ? -1 : 1.
- Finds ŵ and θ̂ that solve the RRM optimization problem (minimizing a robust estimate of the classification risk).
• Advantages
- Learns faster in the presence of irrelevant features (than the Perceptron).
- Deals with non-linearly separable data more flexibly.
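As a minimal, illustrative sketch of the decision rule above (prediction only; the RRM training objective is not reproduced here), assuming a sparse binary feature representation:

```python
def decision_value(w, theta, features):
    """h(x) = w·x - theta for a sparse binary feature set (w is a dict of weights)."""
    return sum(w.get(f, 0.0) for f in features) - theta

def classify(w, theta, features):
    # class(x) = -1 if h(x) < 0, else 1
    return -1 if decision_value(w, theta, features) < 0 else 1
```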
27. K-best, Locally-pruned Parsing
• RRM is a binary classification algorithm.
- One-against-all method using multiple binary classifiers, one per transition.
- What if more than one classifier predicts a transition?
• Pick the transition with the highest score.
• What if the highest scoring transition is not correct?
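A small sketch of this one-against-all scheme with k-best selection; the data layout (one weight vector and threshold per transition) and the names are my own assumptions, not the authors' implementation:

```python
def k_best_transitions(classifiers, features, k=2):
    """classifiers: {transition: (w, theta)}, one binary model per transition.

    Scores every candidate transition with its own classifier and keeps the
    k highest-scoring ones instead of committing to the single 1-best.
    """
    def score(w, theta):
        return sum(w.get(f, 0.0) for f in features) - theta

    ranked = sorted(((t, score(w, th)) for t, (w, th) in classifiers.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked[:k]
```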
28. K-best, Locally-pruned Parsing
• Predicting a wrong transition at any state can generate a completely different tree from the gold-standard one.
• It is better to use the k-best transitions instead of the 1-best.
- Derive several trees and pick the one with the highest score.
- score(tree) = Σ score(transition), summed over the transitions used to derive the tree
- Problem with the above equation (addressed yesterday)
  • A tree derived by a longer sequence of transitions wins.
  • Normalize the score by the total number of transitions:
    score(tree) = 1/|T| · Σ_{t ∈ T} score(t), where T is the set of transitions used to derive the tree
29. Post-processing
• The output of the transition-based parser is not guaranteed to be a tree; it may be a forest.
- It is possible that some tokens have not found their heads.
- For each such token, compare it against all other tokens and pick the one that gives the highest score as the head.
- For each such w_j:
  • Compare it against all w_i (i < j) and see which w_i gives the highest-scoring Right-Arc transition.
  • Compare it against all w_k (j < k) and see which w_k gives the highest-scoring Left-Arc transition.
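A rough sketch of this post-processing pass; the scoring callbacks stand in for the parser's Right-Arc and Left-Arc classifiers and are assumptions on my part:

```python
def attach_headless(n, has_head, score_right_arc, score_left_arc):
    """Assign a head to every token j (1..n) that the parser left without one.

    score_right_arc(i, j): score of making w_i (i < j) the head of w_j
    score_left_arc(j, k):  score of making w_k (j < k) the head of w_j
    Returns {j: chosen_head}.
    """
    heads = {}
    for j in range(1, n + 1):
        if has_head[j]:
            continue
        candidates = [(score_right_arc(i, j), i) for i in range(0, j)]
        candidates += [(score_left_arc(j, k), k) for k in range(j + 1, n + 1)]
        heads[j] = max(candidates)[1]   # candidate head with the highest arc score
    return heads
```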
30. Feature Space
• About 14 million features
• f: form, m: lemma, p: pos-tag, d: dependency label
• lm(w): left-most dependent, ln(w): left-nearest dependent
  rm(w): right-most dependent, rn(w): right-nearest dependent
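As an illustration of how such templates turn into features, here is a hypothetical extraction function (the actual templates behind the ~14 million features are not listed on the slide):

```python
def basic_features(l1_top, b0):
    """Example feature strings for the top of lambda1 and the front of beta.

    Each token is a dict with 'f' (form), 'm' (lemma), 'p' (POS tag); real
    templates would also draw on dependents such as lm(w), rn(w), etc.
    """
    return [
        "l1.p=" + l1_top["p"],
        "b0.p=" + b0["p"],
        "l1.m=" + l1_top["m"],
        "b0.m=" + b0["m"],
        "l1.p|b0.p=" + l1_top["p"] + "_" + b0["p"],   # conjoined template
    ]
```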
31. Evaluation
• Models
I. Greedy search using the highest scoring transition
II. Best search using all predicted transitions
III. II + using the upper bound of 1
IV. III + using the lower bound of -0.1
V. III + using the lower bound of -0.2
VI. V + using top 2 scoring transitions
VII. VI + post-processing
32. Evaluation
• Parsing accuracies (%)
Model                        I      II     III    IV     V      VI     VII
Labeled Attachment Score     87.88  87.96  88.08  88.62  88.87  88.87  89.28
Unlabeled Attachment Score   89.21  89.34  89.42  90.12  90.47  90.47  90.97
33. Evaluation
• Average number of transitions per sentence
[Chart: models I, II-III, IV, V, and VI-VII compared across sentence-length bins 1-10, 11-20, 21-30, 31-40, 41-50, and > 50 tokens; y-axis 0 to 1,500 transitions]
34. Summary and Conclusions
• Summary
- Transition-based, non-projective dependency parsing
- k-best, locally pruned dependency parsing
- Post-processing
- Robust Risk Minimization
• Conclusions
- It is possible to achieve higher parsing accuracy by considering k-best, locally pruned trees, while keeping near-quadratic running time in practice.
35. Future Work
• Parsing Algorithm
- Search transitions on both the left and right sides of β[0].
- Beam search.
- Normalize scores and use priors for transitions.
• Feature
- Cut off features whose frequency falls below a threshold.
- Predicate-argument structure from frameset files.
• Machine learning algorithm
- Apply different values for learning parameters.
- Compare with Perceptron, Support Vector Machine.