Making document search system slightly friendlier to the power user
1. Making document search system slightly friendlier to the power user
Judgements search case study
Michał Łopuszyński
2017.11.29, London, UK
Search Solutions 2017
2. saos.org.pl
• Before: judgements were scattered across many search systems
• Goal: unify access to Polish case law
• We provide unified search, a REST API, and a WCAG-compliant service
• Data volume: ~300k documents and growing
[Diagram: judgements from the Constitutional Tribunal, Supreme Court, Common Courts, and National Appeals Chamber pass through import and metadata extraction into http://saos.org.pl, which offers API, Search, and Analysis]
• ~3k daily visits
3. saos.org.pl
• Side-goal: provide some non-mainstream approaches to exploring document collections
• The analysis tool (the trender) – in production
• Creating maps of document collections – only in the lab
7. Maps of document collections – a caveat
• All low-dimensional "embeddings" are wrong
• Some are useful (perhaps)
The graph from Matti Lyra, PyData Berlin 2017, https://www.youtube.com/watch?v=UkmIljRIG_M
For t-SNE, see also https://distill.pub/2016/misread-tsne/
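One way to make this caveat concrete is to check how many of each point's nearest neighbours in the original space are still its nearest neighbours on the 2D map. The sketch below is an illustration only, not a method from the talk: the function names `knn` and `neighbour_overlap` are hypothetical, and the four 3D points are made-up toy data chosen so that a projection visibly destroys the neighbourhoods.

```python
import math


def knn(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean distance)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def neighbour_overlap(high, low, k):
    """Average fraction of k nearest neighbours shared by both spaces."""
    shared = sum(len(knn(high, i, k) & knn(low, i, k)) for i in range(len(high)))
    return shared / (k * len(high))


# Toy data (assumption): two pairs of points far apart along the z axis.
high = [(0, 0, 0), (0, 0, 10), (1, 0, 0), (1, 0, 10)]
# A "map" that simply drops z puts distant points on top of each other.
low = [(x, y) for x, y, _ in high]

perfect = neighbour_overlap(high, high, k=1)
distorted = neighbour_overlap(high, low, k=1)
```

A score of 1.0 means every neighbourhood survived; here the projected map scores 0.0, since each point's true nearest neighbour is replaced by a point that was far away in the original space.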
8. Maps of document collections – PCA vs t-SNE
• 2000 judgements from the National Appeal Chamber, common courts, Supreme Court, and Constitutional Tribunal visualised
[Figure: two panels, PCA and t-SNE]
M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
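A minimal sketch of how a PCA map of documents can be produced, using only the Python standard library. Several things here are assumptions rather than facts from the talk: the slides do not say how judgements were turned into vectors, so TF-IDF is used purely for illustration; the four tiny "documents" are invented placeholders; and the helper names (`tfidf_vectors`, `top_component`, `pca_2d`) are hypothetical. t-SNE is left out because a faithful implementation does not fit in a short example.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised documents into dense TF-IDF vectors."""
    vocab = sorted({tok for doc in docs for tok in doc})
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(len(docs) / doc_freq[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[tok] / len(doc) * idf[tok] for tok in vocab])
    return vectors


def top_component(rows, iters=200, seed=0):
    """Leading principal direction of centred rows, via power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        # Apply X^T X to v without building the covariance matrix.
        scores = [sum(r[j] * v[j] for j in range(dim)) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(vectors):
    """Project vectors onto their first two principal components."""
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    rows = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    coords = [[] for _ in rows]
    for _ in range(2):
        pc = top_component(rows)
        proj = [sum(r[j] * pc[j] for j in range(dim)) for r in rows]
        for point, p in zip(coords, proj):
            point.append(p)
        # Deflate: remove the found component before searching for the next.
        rows = [[r[j] - p * pc[j] for j in range(dim)] for r, p in zip(rows, proj)]
    return coords


# Invented placeholder texts (assumption), two per topic.
docs = [
    "pension recalculation military pension".split(),
    "pension granting pension increase".split(),
    "tender procurement appeal tender".split(),
    "procurement contract appeal tender".split(),
]
coords = pca_2d(tfidf_vectors(docs))
```

On this toy input the first coordinate separates the two pension texts from the two procurement texts, even though no topic label enters the computation. For a real collection of thousands of judgements one would normally rely on a numerical library rather than pure-Python loops.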
9. Maps of document collections – PCA vs t-SNE
• The previous picture coloured by issuing court (note, however, that the issuing court was not used directly in the map generation process)
[Figure: two panels, PCA and t-SNE; legend: National Appeal Chamber, common courts, Supreme Court, Constitutional Tribunal]
M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
10. Maps of document collections – t-SNE example
• 2000 judgements from common courts tagged with different keywords
[Figure: t-SNE map; legend: granting pensions, military pensions, increase/recalculation of pensions, pension, compensation, offence, agreement, personal rights]
M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
11. Maps of document collections – in the wild
• Demo by Andrej Karpathy – papers, t-SNE based
  http://cs.stanford.edu/people/karpathy/scholaroctopus/
• Paperscape – papers, based on citation networks
  http://paperscape.org
12. Acknowledgements
• The Team
  • Piotr Waglowski (the boss)
  • Data science team: Michał Jungiewicz, Michał Łopuszyński
  • Tech team: Łukasz Dumiszewski (tech lead), Aleksander Nowiński, Monika Maksymiuk, Krzysztof Mądry, Łukasz Pawełczak, Jan Pavtel
  • Network analysis team: Michał Bojanowski, Bartosz Chrol, Monika Pawluczuk
• The funding
  • Grant of the National Centre for Research and Development (PL), within the Social Innovations programme
13. Thank you for your attention!
Questions?
@lopusz
http://slideshare.net/lopusz