Making document search system
slightly friendlier to the power user.
Judgements search case study
Michał Łopuszyński
2017.11.29, London, UK
Search Solutions 2017
saos.org.pl
• Before: judgements were scattered across many search systems
• Goal: unify access to Polish case law
• We provide unified search, a REST API, and a WCAG-compliant service
• Data volume: ~300k documents and growing
• ~3k daily visits
[Diagram: Constitutional Tribunal, Supreme Court, Common Courts, National Appeals Chamber → import, metadata extraction → http://saos.org.pl → API, Search, Analysis]
• Side goal: provide some non-mainstream approaches to exploring document collections
• The analysis tool (the trender) – in production
• Creating maps of document collections – only in the lab
The trender
The trender – saos.org.pl/analysis
Maps of document collections
Maps of document collections – a caveat
• All low-dimensional "embeddings" are wrong
• Some are useful (perhaps)
The graph from Matti Lyra, PyData Berlin 2017, https://www.youtube.com/watch?v=UkmIljRIG_M
For t-SNE, see also https://distill.pub/2016/misread-tsne/
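
One way to make the caveat concrete is a minimal sketch with four points whose pairwise distances are all equal (a regular tetrahedron). No arrangement of four points in a plane can keep all six distances equal, so any 2D map of them has to distort something. The coordinates and the crude "drop one axis" projection below are illustrative assumptions, not material from the talk.

```python
import itertools
import math

# Four points with all pairwise distances equal (a regular tetrahedron).
# Illustrative coordinates, not data from the talk.
TETRAHEDRON = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def pairwise_distances(points):
    """Return the distances between all pairs of points, in a fixed order."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def drop_last_axis(points):
    """A crude 3D -> 2D 'embedding': discard the last coordinate."""
    return [p[:-1] for p in points]


original = pairwise_distances(TETRAHEDRON)
flattened = pairwise_distances(drop_last_axis(TETRAHEDRON))

print("3D distances:", [round(d, 3) for d in original])
print("2D distances:", [round(d, 3) for d in flattened])
```

In 3D all six distances coincide; after flattening, some pairs look closer than others even though nothing in the data justifies it. PCA and t-SNE make more careful trade-offs, but they face the same constraint.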
Maps of document collections – PCA vs t-SNE
[Figure: two panels, PCA and t-SNE]
• 2000 judgements from the National Appeals Chamber, common courts, the Supreme Court, and the Constitutional Tribunal visualised
M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
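
The PCA half of this comparison can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the slides do not say which text features were used, so the sketch uses simple TF-IDF vectors, and the four example texts are invented stand-ins for judgements. t-SNE is left out because it needs a library implementation (for example `sklearn.manifold.TSNE`).

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Turn whitespace-tokenised documents into dense TF-IDF rows."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        rows.append([counts[t] * (math.log(n_docs / doc_freq[t]) + 1.0) for t in vocab])
    return rows


def centre(rows):
    """Subtract the per-feature mean, as PCA requires."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def leading_direction(rows, iterations=200):
    """Power iteration for the top eigenvector of rows^T rows."""
    dim = len(rows[0])
    vec = [1.0 / (j + 1) for j in range(dim)]  # deterministic, non-uniform start
    for _ in range(iterations):
        scores = [sum(x * v for x, v in zip(row, vec)) for row in rows]
        new = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    return vec


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    residual = centre(rows)
    coords = [[] for _ in rows]
    for _ in range(2):
        vec = leading_direction(residual)
        scores = [sum(x * v for x, v in zip(row, vec)) for row in residual]
        for point, s in zip(coords, scores):
            point.append(s)
        # Deflate: remove the component just found before looking for the next.
        residual = [[x - s * v for x, v in zip(row, vec)]
                    for row, s in zip(residual, scores)]
    return [tuple(point) for point in coords]


# Toy stand-ins for judgement texts (invented for this sketch).
docs = [
    "pension granted to retired soldier",
    "pension recalculated for retired teacher",
    "offence committed theft sentence imposed",
    "offence of fraud sentence suspended",
]
coords = pca_2d(tfidf_rows(docs))
for doc, (x, y) in zip(docs, coords):
    print(f"{x:+.2f} {y:+.2f}  {doc}")
```

On these four toy texts the first coordinate should place the two pension texts on one side and the two offence texts on the other, because the texts share vocabulary only within a topic. Real judgements are far longer and noisier, so a production pipeline would need proper tokenisation and a sparse linear-algebra library.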
Maps of document collections – PCA vs t-SNE
• The previous picture coloured by issuing court (note, however, that the issuing court was not used directly in the map generation process)
[Figure: two panels, PCA and t-SNE; legend: National Appeals Chamber, common courts, Supreme Court, Constitutional Tribunal]
M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
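
Because the issuing court plays no part in building the map, it can serve as an independent check on the result. A minimal sketch of one such check follows: for each point, look up its nearest neighbour on the 2D map and see whether the two share a court. The function name, the coordinates, and the labels are hypothetical and invented for illustration; this is not a measure reported in the talk.

```python
import math


def neighbour_label_agreement(points, labels):
    """Fraction of points whose nearest neighbour on the map shares their label."""
    hits = 0
    for i, p in enumerate(points):
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        hits += labels[i] == labels[nearest]
    return hits / len(points)


# Toy 2D map coordinates and court labels (invented for this sketch).
points = [(0, 0), (1, 0), (0, 2), (10, 10), (11, 10), (10, 12)]
labels = ["Supreme Court"] * 3 + ["Constitutional Tribunal"] * 3

# Labels are used only here, after the coordinates already exist.
score = neighbour_label_agreement(points, labels)
print(f"nearest-neighbour label agreement: {score:.2f}")
```

On these toy points the two groups are far apart, so the score is 1.0. A score close to the share of the largest court would suggest the map carries little information about the issuing court.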
Maps of document collections – t-SNE example
• 2000 judgements from common courts tagged with different keywords
[Figure: t-SNE map; keyword labels: granting pensions, military pensions, increase/recalculation of pensions, pension, compensation, offence, agreement, personal rights]
M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
Maps of document collections – in the wild
• Demo by Andrej Karpathy – papers, t-SNE based
  http://cs.stanford.edu/people/karpathy/scholaroctopus/
• Paperscape – papers, based on citation networks
  http://paperscape.org
Acknowledgements
• The Team
  • Piotr Waglowski (the boss)
  • Data science team: Michał Jungiewicz, Michał Łopuszyński
  • Tech team: Łukasz Dumiszewski (tech lead), Aleksander Nowiński, Monika Maksymiuk, Krzysztof Mądry, Łukasz Pawełczak, Jan Pavtel
  • Network analysis team: Michał Bojanowski, Bartosz Chrol
  • Monika Pawluczuk
• The funding
  • Grant of the National Centre for Research and Development (PL), within the Social Innovations programme
Thank you for your attention!
Questions?
@lopusz
http://slideshare.net/lopusz
