Learning to Grow Structured Visual Summaries for Document Collections
1. Learning to Grow Structured Visual Summaries for Document Collections
Daniil Mirylenka, Andrea Passerini
University of Trento, Italy
Machine learning seminar, Waikato University, 2013
4. Building the topic graph:
Overview
1. Map documents to Wikipedia articles
2. Retrieve the parent categories
3. Link categories to each other
4. Merge similar topics
5. Break cycles in the graph
5. Building the topic graph:
Mapping the documents to Wikipedia articles
“…we propose a method of summarizing collections of documents with concise topic hierarchies, and show how it can be applied to visualization and browsing of academic search results.”
⇓
“…we propose a method of summarizing collections of documents with concise [[Topic (linguistics)|topic]] [[Hierarchy|hierarchies]], and show how it can be applied to [[Visualization (computer graphics)|visualization]] and [[Web browser|browsing]] of [[List of academic databases and search engines|academic search]] results.”
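The mapping step above can be sketched as a minimal dictionary-based wikifier: known phrases are wrapped in `[[Article|anchor]]` wiki-link markup. The `ANCHORS` phrase-to-article table is an assumed illustration; real wikification systems disambiguate candidates statistically.

```python
# Hypothetical sketch of wikification: wrap known phrases in wiki-link markup.
# The ANCHORS mapping is an assumed toy dictionary, not the authors' system.
import re

ANCHORS = {  # assumed phrase -> Wikipedia article mapping
    "topic": "Topic (linguistics)",
    "hierarchies": "Hierarchy",
    "visualization": "Visualization (computer graphics)",
}

def wikify(text: str) -> str:
    """Wrap each known phrase in [[Article|anchor]] markup."""
    for phrase, article in ANCHORS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b",
                      f"[[{article}|{phrase}]]", text)
    return text

print(wikify("concise topic hierarchies"))
# -> concise [[Topic (linguistics)|topic]] [[Hierarchy|hierarchies]]
```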
10. Building the topic graph:
Example of an actual topic graph built from 100 abstracts
11–13. Summarizing the topic graph
Reflection
What is a summary?
- a set of nodes (topics).
What is a good summary?
- subjective ⇒ let’s learn from examples!
14–16. Summarizing the topic graph
The first attempt
Structured prediction:
    Ĝ_T = arg max_{G_T} F(G, G_T)
Problem: evaluation over C(|G|, T) subgraphs
- Example: a 300-node topic graph, a 10-node summary
  ⇒ 1 398 320 233 241 701 770 possible subgraphs
  (at 1 million graphs per second ⇒ ≈ 44 311 years)
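The count on the slide is the binomial coefficient C(300, 10), which can be checked directly:

```python
# Checking the slide's combinatorics: a 10-node summary chosen from a
# 300-node topic graph gives C(300, 10) candidate subgraphs.
import math

n_candidates = math.comb(300, 10)
print(n_candidates)            # 1398320233241701770
seconds = n_candidates / 1e6   # at 1 million graphs per second
years = seconds / (365.25 * 24 * 3600)
print(years)                   # roughly 44 thousand years
```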
17–19. Summarizing the topic graph
Key idea
Restriction: summaries should be nested:
    ∅ = G_0 ⊂ G_1 ⊂ · · · ⊂ G_T
Now we can build summaries sequentially:
    G_t = G_{t−1} ∪ {v_t}
Still a supervised learning problem
- training data: summary sequences (G, G_1, G_2, · · · , G_T)
- or topic sequences: (G, v_1, v_2, · · · , v_T)
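The nesting restriction turns the search over C(|G|, T) subgraphs into T greedy choices of one topic at a time. A minimal sketch, where `score` is a hypothetical stand-in for the learned scoring function:

```python
# Sketch of nested summary growth: G_t = G_{t-1} union {v_t}.
# `score` is an assumed placeholder for the learned scoring function.

def grow_summary(topics, T, score):
    """Greedily grow a nested summary, one topic per step."""
    summary = set()                      # G_0 = empty set
    for _ in range(T):
        candidates = [v for v in topics if v not in summary]
        v_t = max(candidates, key=lambda v: score(summary, v))
        summary.add(v_t)                 # G_t = G_{t-1} union {v_t}
    return summary

# toy usage: a dummy score that just prefers longer topic names
picked = grow_summary(["ai", "ml", "nlp", "graphs"], 2,
                      score=lambda s, v: len(v))
print(sorted(picked))                    # ['graphs', 'nlp']
```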
20–21. Learning to grow summaries as imitation learning
Imitation learning (racing analogy)
- destination: the finish line
- states: the sequence of positions along the track
- actions: the driver’s actions (steering, etc.)
- goal: copy the behaviour
[Figure: supervised training on expert trajectories yields a learned policy π̂_sup; borrowed from the presentation of Stephane Ross]
Our problem
- destination: the summary G_T
- states: intermediate summaries G_0, G_1, · · · , G_{T−1}
- actions: topics v_1, v_2, · · · , v_T added to the summaries
- goal: copy the behaviour
22–24. Learning to grow summaries
How can we do that?
Straightforward approach
- choose a classifier π : (G, G_{t−1}) → v_t
- train it on the ‘ground truth’ examples ((G, G_{t−1}), v_t)
- apply it sequentially to new graphs:
    ∅ = Ĝ_0 —π(G,·)→ Ĝ_1 —π(G,·)→ · · · —π(G,·)→ Ĝ_T
Will it work? No.
(it is unable to recover from its own mistakes)
25–27. Learning to grow summaries
DAgger (dataset aggregation)
S. Ross, G. J. Gordon, and D. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. Journal of Machine Learning Research - Proceedings Track, 15:627–635, 2011.
Idea: train on the states we are actually going to encounter
(the states generated by our own policy)
How can we do that? We haven’t trained the classifier yet!
We will do it iteratively (for i = 0, 1, …):
- train the classifier π_i on the dataset D_i
- generate trajectories using π_i
- add the new states to the dataset D_{i+1}
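The iteration described above can be sketched as a small loop. `train`, `rollout`, and `expert_action` are assumed helpers standing in for the classifier, the policy roll-out, and the expert labelling, not the authors' code:

```python
# Hedged sketch of the DAgger loop (Ross et al., 2011): train on an
# aggregated dataset, roll out the current policy, and label the visited
# states with expert actions. All three helpers are assumed placeholders.

def dagger(initial_dataset, n_iters, train, rollout, expert_action):
    dataset = list(initial_dataset)      # D_0: ground-truth pairs
    policy = train(dataset)              # pi_0
    for _ in range(n_iters):
        states = rollout(policy)         # states pi_i actually visits
        dataset += [(s, expert_action(s)) for s in states]  # build D_{i+1}
        policy = train(dataset)          # pi_{i+1}
    return policy
```

A toy instantiation (a lookup-table "classifier") is enough to exercise the loop end to end.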
28–29. Learning to grow summaries
Collecting the actions
DAgger (dataset aggregation)
- iterating, we collect states
- but we also need actions
“Let the expert steer”
Q: Which action is optimal?
A: The one that brings us closest to the optimal trajectory.
[Figure: DAgger collects new trajectories with steering from the expert; borrowed from the presentation of Stephane Ross]
30. Learning to grow summaries
Recap of the algorithm
The algorithm
- ‘ground truth’ dataset: (state, action) pairs
- train π on the ‘ground truth’ dataset
- apply π to the initial states to generate trajectories
- generate the expert’s actions for the visited states
- add the new state-action pairs to the dataset
- repeat
31. Learning to grow summaries
Training the classifier
Classifier:
    π : (G, G_{t−1}) → v_t
Scoring function:
    F(G, G_{t−1}, v_t) = ⟨w, Ψ(G, G_{t−1}, v_t)⟩
Prediction:
    v_t = arg max_v F(G, G_{t−1}, v)
Learning: SVMstruct
- ensures that optimal topics score best
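The prediction step is a linear score followed by an argmax over candidate topics. A minimal sketch, where the feature map `psi` is a hypothetical stand-in (the slides learn w with SVMstruct):

```python
# Sketch of the prediction rule: v_t = argmax_v <w, Psi(G, G_prev, v)>.
# `psi` is an assumed toy feature map, not the authors' feature set.
import numpy as np

def predict_topic(w, psi, G, G_prev, candidates):
    """Return the candidate topic with the highest linear score."""
    scores = [np.dot(w, psi(G, G_prev, v)) for v in candidates]
    return candidates[int(np.argmax(scores))]

# toy usage: a feature map whose first feature is the topic id itself
w = np.array([1.0, 0.0])
psi = lambda G, G_prev, v: np.array([float(v), 1.0])
print(predict_topic(w, psi, None, None, [3, 7, 5]))  # 7
```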
32. Learning to grow summaries
Providing the expert’s actions
Expert’s action
- brings us closest to the optimal trajectory
Technically: it minimizes a loss against the optimal summary:
    v_t = arg min_v Δ(G_{t−1} ∪ {v}, G_t^opt)
Loss functions
- treating graphs as plain topic sets leads to redundancy
- key: consider the similarity between topics
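The expert's choice can be sketched as an argmin over candidate additions. The symmetric-difference loss below is a simplified stand-in treating summaries as plain sets, which is exactly the case the slide warns leads to redundancy; the actual losses also account for topic similarity:

```python
# Sketch of the expert action: pick the topic whose addition minimizes a
# loss against the optimal summary G_t^opt. The set loss is a simplistic
# stand-in; the slides' losses also consider similarity between topics.

def expert_action(G_prev, candidates, G_opt, loss):
    """v_t = argmin_v loss(G_prev | {v}, G_opt)."""
    return min(candidates, key=lambda v: loss(G_prev | {v}, G_opt))

def symmetric_diff(A, B):
    """Toy loss: size of the symmetric difference of two topic sets."""
    return len(A ^ B)

print(expert_action({"a"}, ["b", "c"], {"a", "b"}, symmetric_diff))  # b
```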
33. Learning to grow summaries
Graph features
Some of the features:
document coverage
transitive document coverage
average and max. overlap between topics
average and max. parent-child overlap
the height of the graph
the number of connected components
...
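Two of the listed features can be sketched directly, under the assumption that each topic carries the set of documents it covers; the data layout here is illustrative, not the authors' representation:

```python
# Rough sketch of two listed graph features, assuming `docs_of` maps each
# topic to the set of documents it covers (an assumed toy representation).
from itertools import combinations

def document_coverage(summary, docs_of):
    """Fraction of all documents covered by the summary's topics."""
    covered = set().union(*(docs_of[t] for t in summary)) if summary else set()
    total = set().union(*docs_of.values())
    return len(covered) / len(total)

def max_topic_overlap(summary, docs_of):
    """Maximum Jaccard overlap between any two topics in the summary."""
    return max((len(docs_of[a] & docs_of[b]) / len(docs_of[a] | docs_of[b])
                for a, b in combinations(summary, 2)), default=0.0)

docs = {"ml": {1, 2}, "ai": {2, 3}, "db": {4}}
print(document_coverage(["ml", "ai"], docs))   # 0.75
print(max_topic_overlap(["ml", "ai"], docs))   # 1/3
```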