7. Lessons learned
• gathering data from PDFs works only for some kinds of data
• a lot of cleanup work, plus added complexity from distributed cleanup of the data
• future: more structured data as a starting point
8. What we want...
• clean citation data
• geographical data: author-affiliation links
• structured data
• ...