WWW.SPAZIODATI.EU




                                       JSONpedia
                          Facilitating consumption of MediaWiki content.




                  Michele Mostarda <mostarda@spaziodati.eu>, TW: @micmos
Wednesday, 10 October 2012
What is JSONpedia?



“JSONpedia is a library and a web service
                     meant to read WikiText markup as JSON.”




             ‣ Initially conceived as a tool to produce data for training Machine Learning models.
             ‣ The REST service, inspired by Sweeble Crystalball, produces JSON, HTML and (coming soon) RDF data.
             ‣ Built on a context-dependent, event-based parser, designed to be more efficient than a regex-based parser (like the wikiparser) or a DOM-based parser (like Sweeble).
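To show what "event based" means in practice, here is a toy sketch. It is not JSONpedia's parser. It is a single-pass scanner that recognises only `== headings ==` and `[[links]]` and reports them through callbacks named after the parser events listed later in this deck. It never builds a tree and uses no regular expressions. `MiniHandler` and `MiniWikiScanner` are hypothetical names. A real WikiText parser also has to deal with templates, tables, tags and nesting.

```java
/** Hypothetical subset of the parser callbacks listed in this deck. */
interface MiniHandler {
    void section(String title, int level);
    void beginLink(String url);
    void endLink(String url);
    void text(String content);
}

/**
 * Toy single-pass scanner for == headings == and [[links]] only.
 * Events are emitted while reading; no tree is materialised.
 */
final class MiniWikiScanner {
    static void parse(String wikitext, MiniHandler h) {
        for (String line : wikitext.split("\n", -1)) {
            String t = line.trim();
            int level = countLeading(t, '=');
            // A heading needs equal runs of '=' on both sides and a non-empty body.
            if (level >= 1 && t.length() > 2 * level && countTrailing(t, '=') == level) {
                h.section(t.substring(level, t.length() - level).trim(), level);
            } else {
                scanInline(line, h);
            }
        }
    }

    private static void scanInline(String line, MiniHandler h) {
        StringBuilder buf = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            int close = line.startsWith("[[", i) ? line.indexOf("]]", i + 2) : -1;
            if (close >= 0) {
                flush(buf, h);
                String body = line.substring(i + 2, close);
                int bar = body.indexOf('|');            // [[target|label]]
                String url = bar >= 0 ? body.substring(0, bar) : body;
                String label = bar >= 0 ? body.substring(bar + 1) : body;
                h.beginLink(url);
                h.text(label);
                h.endLink(url);
                i = close + 2;
            } else {
                // Unterminated "[[" falls through here and is kept as plain text.
                buf.append(line.charAt(i++));
            }
        }
        flush(buf, h);
    }

    private static void flush(StringBuilder buf, MiniHandler h) {
        if (buf.length() > 0) {
            h.text(buf.toString());
            buf.setLength(0);
        }
    }

    private static int countLeading(String s, char c) {
        int n = 0;
        while (n < s.length() && s.charAt(n) == c) n++;
        return n;
    }

    private static int countTrailing(String s, char c) {
        int n = 0;
        while (n < s.length() && s.charAt(s.length() - 1 - n) == c) n++;
        return n;
    }
}
```

Unterminated markup such as `[[broken` is emitted as text instead of aborting the parse. This is one simple way to get the kind of error tolerance listed below.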


Differences with Sweeble




                     ‣ Lightweight event-based parser.
                     ‣ More tolerant of the syntax errors frequently found in WikiText pages.
                     ‣ Serializes to JSON output, which is easier to consume!
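Because the output is plain JSON, a parsed document can be written out with very little code. The sketch below assumes, hypothetically, that a document is held as nested `Map`/`List` values. It serializes them with a small hand-written `Json.write` helper. This is not JSONpedia's serializer, and in practice a JSON library would normally be used.

```java
import java.util.List;
import java.util.Map;

/**
 * Minimal JSON writer for trees of Map, List, String, Number, Boolean and null.
 * Hypothetical helper; numbers are assumed to be finite.
 */
final class Json {
    static String write(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object v, StringBuilder sb) {
        if (v == null) {
            sb.append("null");
        } else if (v instanceof String) {
            quote((String) v, sb);
        } else if (v instanceof Number || v instanceof Boolean) {
            sb.append(v);
        } else if (v instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                quote(String.valueOf(e.getKey()), sb); // JSON object keys are always strings
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (v instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) v) {
                if (!first) sb.append(',');
                first = false;
                write(item, sb);
            }
            sb.append(']');
        } else {
            throw new IllegalArgumentException("Unsupported type: " + v.getClass());
        }
    }

    private static void quote(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters become six-character hex escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        sb.append('"');
    }
}
```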




Differences with DBpedia




                  ‣ JSONpedia doesn't add any semantics to the extracted data.
                  ‣ JSONpedia could be integrated with the current DBpedia regex-based parser.
                  ‣ JSONpedia is not a competitor of DBpedia but rather a complement.




JSONpedia Internals




Architecture

  [Diagram] Input: WikiText → Parser → Processors: Structure, Validator,
            Extractor, Splitter, Linker (connected to DBpedia API / Freebase)
            → Output: JSON




WikiText Parser Events
    // Document bounding
    void beginDocument(URL document);
    void endDocument();

    // Error handling
    void parseWarning(String msg, ParserLocation location);
    void parseError(Exception e, ParserLocation location);

    // Tag handling
    void beginTag(String node, Attribute[] attributes);
    void endTag(String node);
    void inlineTag(String node, Attribute[] attributes);
    void commentTag(String comment);

    // Sections
    void section(String title, int level);

    // References
    void beginReference(String label);
    void endReference(String label);

    // Links
    void beginLink(String url);
    void endLink(String url);

    // Lists
    void beginList();
    void listItem();
    void endList();

    // Templates
    void beginTemplate(String name);
    void endTemplate(String name);

    // Tables
    void beginTable();
    void headCell(int row, int col);
    void bodyCell(int row, int col);
    void endTable();

    // Generic parameter
    void parameter(String param);

    // Parameter / text value
    void text(String content);
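The slide lists the callbacks but not the type that declares them. One common Java pattern for listener interfaces like this is to give every callback an empty default body (Java 8+), so that each processor overrides only the events it needs. That pattern is an assumption here, not something taken from JSONpedia. `WikiTextEvents` and `TemplateCounter` are hypothetical names. The sketch leaves out the callbacks whose parameter types (`URL`, `ParserLocation`, `Attribute[]`) would need extra definitions.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical listener modelled on part of the event list; all callbacks default to no-ops. */
interface WikiTextEvents {
    default void section(String title, int level) {}
    default void beginLink(String url) {}
    default void endLink(String url) {}
    default void beginList() {}
    default void listItem() {}
    default void endList() {}
    default void beginTemplate(String name) {}
    default void endTemplate(String name) {}
    default void beginTable() {}
    default void headCell(int row, int col) {}
    default void bodyCell(int row, int col) {}
    default void endTable() {}
    default void parameter(String param) {}
    default void text(String content) {}
}

/** Example processor: counts template invocations by name and ignores every other event. */
final class TemplateCounter implements WikiTextEvents {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void beginTemplate(String name) {
        counts.merge(name, 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return Collections.unmodifiableMap(counts);
    }
}
```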



WikiText Processors
                Processors receive the stream of events generated by the
                parser and perform data construction and transformation.

                ‣    Structure
                ‣    Extractors
                ‣    Linkers
                ‣    Splitters
                ‣    Validator
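Since every processor consumes the same event stream, the parser only needs to talk to one handler that forwards each event to all registered processors. The sketch below shows this fan-out for just two events. `Processor` and `BroadcastProcessor` are hypothetical names, not JSONpedia classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-event slice of the parser callbacks. */
interface Processor {
    void section(String title, int level);
    void text(String content);
}

/** Forwards every event, in arrival order, to each registered processor. */
final class BroadcastProcessor implements Processor {
    private final List<Processor> targets = new ArrayList<>();

    BroadcastProcessor add(Processor p) {
        targets.add(p);
        return this; // allows chaining: new BroadcastProcessor().add(a).add(b)
    }

    @Override
    public void section(String title, int level) {
        for (Processor p : targets) p.section(title, level);
    }

    @Override
    public void text(String content) {
        for (Processor p : targets) p.text(content);
    }
}
```

With this design the parser stays unaware of how many processors are attached. Adding an extractor or validator is one more `add(...)` call.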



Structure



                 The Structure Processor receives a stream of
                 WikiText parsing events and builds a one-to-one
                 JSON representation of the document DOM.
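Here is a minimal sketch of how such a builder could turn begin/end events into nested JSON-like nodes. It assumes `Map`/`List` values and a stack of open containers. The node keys (`@type`, `name`, `content`) and the `StructureBuilder` class are hypothetical and do not reproduce JSONpedia's actual output format. Sections are kept flat for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder: one JSON-like node per event, nested via a stack. */
final class StructureBuilder {
    private final List<Object> root = new ArrayList<>();
    private final Deque<List<Object>> stack = new ArrayDeque<>(); // open containers

    StructureBuilder() {
        stack.push(root);
    }

    void section(String title, int level) {
        Map<String, Object> node = node("section");
        node.put("title", title);
        node.put("level", level);
        stack.peek().add(node);
    }

    void beginTemplate(String name) { open("template", name); }
    void endTemplate(String name)   { close(); }
    void beginLink(String url)      { open("link", url); }
    void endLink(String url)        { close(); }
    void text(String content)       { stack.peek().add(content); }

    List<Object> result() {
        return root;
    }

    private void open(String type, String name) {
        Map<String, Object> node = node(type);
        node.put("name", name);
        List<Object> content = new ArrayList<>();
        node.put("content", content);
        stack.peek().add(node);
        stack.push(content); // subsequent events land inside this node
    }

    private void close() {
        // Tolerate stray end events: never pop the root container.
        if (stack.size() > 1) stack.pop();
    }

    private static Map<String, Object> node(String type) {
        Map<String, Object> n = new LinkedHashMap<>();
        n.put("@type", type);
        return n;
    }
}
```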




Extractors

                          Extractors are specific Processors that
                          collect a certain type of data from the
                          event stream: for example, the
                          SectionsExtractor collects the list of all
                          sections detected in the document.
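Here is a sketch of the SectionsExtractor idea. It assumes the extractor only needs the `section(String title, int level)` event and ignores all others. The class layout and the `titles()` helper are illustrative assumptions, not JSONpedia's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical extractor that records every section heading in document order. */
final class SectionsExtractor {
    /** One heading as reported by the section(title, level) event. */
    static final class Section {
        final String title;
        final int level;

        Section(String title, int level) {
            this.title = title;
            this.level = level;
        }
    }

    private final List<Section> sections = new ArrayList<>();

    /** The only event this extractor reacts to. */
    void section(String title, int level) {
        sections.add(new Section(title, level));
    }

    List<Section> sections() {
        return Collections.unmodifiableList(sections);
    }

    /** Convenience view: just the titles, in document order. */
    List<String> titles() {
        List<String> out = new ArrayList<>();
        for (Section s : sections) out.add(s.title);
        return out;
    }
}
```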



Linkers


                      A Linker is a Processor that links the
                      current document entity to information
                      acquired from external sources.
                      An example is the FreebaseLinker,
                      which connects an entity to its
                      counterpart in Freebase, if one exists.
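The sketch below does not show the Freebase lookup itself. It assumes, hypothetically, that the external source is reachable through a function mapping a page title to an optional external identifier. With that indirection the same Linker works with any backend, or with a fake one in tests. `EntityLinker` is a hypothetical name. For simplicity its `beginDocument` takes a page title, whereas the parser event receives a `URL`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical linker: maps each document's entity to an id in an external source. */
final class EntityLinker {
    private final Function<String, Optional<String>> lookup; // page title -> external id
    private final Map<String, String> links = new LinkedHashMap<>();

    EntityLinker(Function<String, Optional<String>> lookup) {
        this.lookup = lookup;
    }

    /** Called when a document starts; entities with no external match are skipped. */
    void beginDocument(String pageTitle) {
        lookup.apply(pageTitle).ifPresent(id -> links.put(pageTitle, id));
    }

    Map<String, String> links() {
        return Collections.unmodifiableMap(links);
    }
}
```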



Splitters


                          A Splitter is a Processor able to cut
                          subtrees out of the JSON document built by
                          the Structure processor. An example is the
                          TableSplitter, which extracts the JSON
                          structures representing the tables
                          declared in the document.
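Assume the Structure output is held as nested `Map`/`List` values, with each node carrying a type tag (here the hypothetical key `@type`). A table splitter then reduces to a recursive walk that returns every subtree tagged `table`. This is a sketch, not the TableSplitter implementation. It stops at the first table on each path, so nested tables stay inside their parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical splitter: cuts every table subtree out of a Map/List document tree. */
final class TableSplitter {
    static List<Map<?, ?>> split(Object tree) {
        List<Map<?, ?>> tables = new ArrayList<>();
        collect(tree, tables);
        return tables;
    }

    private static void collect(Object node, List<Map<?, ?>> tables) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if ("table".equals(map.get("@type"))) {
                tables.add(map); // keep the whole subtree; do not descend further
                return;
            }
            for (Object child : map.values()) collect(child, tables);
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) collect(child, tables);
        }
        // Strings, numbers, etc. are leaves and cannot contain tables.
    }
}
```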



Validator



                          A Validator is a Processor that checks
                          the data structures parsed from a
                          document.
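The slides do not say which checks the Validator performs. As one plausible example, offered here as an assumption, the sketch below verifies that begin/end events for templates, links, lists and tables are properly nested. At `endDocument()` it reports anything left unclosed. `NestingValidator` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical validator: checks that begin/end events are properly nested. */
final class NestingValidator {
    private final Deque<String> open = new ArrayDeque<>(); // constructs not yet closed
    private final List<String> errors = new ArrayList<>();

    void beginTemplate(String name) { open.push("template:" + name); }
    void endTemplate(String name)   { close("template:" + name); }
    void beginLink(String url)      { open.push("link:" + url); }
    void endLink(String url)        { close("link:" + url); }
    void beginList()                { open.push("list"); }
    void endList()                  { close("list"); }
    void beginTable()               { open.push("table"); }
    void endTable()                 { close("table"); }

    /** At the end of the document, anything still open is reported as unclosed. */
    void endDocument() {
        while (!open.isEmpty()) errors.add("unclosed " + open.pop());
    }

    List<String> errors() {
        return Collections.unmodifiableList(errors);
    }

    private void close(String expected) {
        if (expected.equals(open.peek())) {
            open.pop();
        } else {
            // Report the mismatch but leave the stack unchanged (no recovery attempted).
            errors.add("unexpected end of " + expected
                    + (open.isEmpty() ? "" : " while " + open.peek() + " is open"));
        }
    }
}
```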




Forthcoming Features

                     ‣ JSONpedia DB (based on MongoDB + ElasticSearch),
                       queryable online. JSONpedia dumps will also be available.
                     ‣ Online data model Exporter Tool (CSV).
                     ‣ RDF output.



Release



                          JSONpedia will be fully released
                          as open source by the end of the year.




Live Demo


                          http://bit.ly/jsonpedia
                                    or
        http://json.it.dbpedia.org/frontend/form.html








                                   Thanks!

                  Michele Mostarda <mostarda@spaziodati.eu>, TW: @micmos
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Introducing JSONpedia

  • 1. WWW.SPAZIODATI.EU. JSONpedia: Facilitating consumption of MediaWiki content. Michele Mostarda <mostarda@spaziodati.eu>, TW: @micmos. Wednesday 10 October 2012.
  • 3. “JSONpedia is a library and a web service meant to read WikiText markup as JSON.”
  • 4. Initially conceived as a tool to produce data for training machine learning models. ‣ The REST service, inspired by Sweeble Crystalball, produces JSON, HTML and (coming soon) RDF data. ‣ Built on a context-dependent, event-based parser, intended to perform better than a regex-based parser (like the wikiparser) or a DOM-based parser (like Sweeble).
  • 6. Differences with Sweeble: lightweight event-based parser. ‣ More tolerant of the syntax errors frequently found in WikiText pages. ‣ Serializes to JSON, which is easier to consume!
  • 8. Differences with DBpedia: JSONpedia doesn't add any semantics to the extracted data. ‣ JSONpedia could integrate the current DBpedia regex-based parser. ‣ JSONpedia is not a competitor of DBpedia but rather a complement.
  • 10. Architecture (diagram): Input (WikiText) → Parser → processors (Structure, Validator, Extractor, Splitter, Linker; the Linker draws on the DBpedia API / Freebase) → Output (JSON + …).
  • 11. WikiText Parser Events
      // Document bounding.
      void beginDocument(URL document);
      void endDocument();
      // Error handling.
      void parseWarning(String msg, ParserLocation location);
      void parseError(Exception e, ParserLocation location);
      // Tag handling.
      void beginTag(String node, Attribute[] attributes);
      void endTag(String node);
      void inlineTag(String node, Attribute[] attributes);
      void commentTag(String comment);
      // Sections
      void section(String title, int level);
      // References
      void beginReference(String label);
      void endReference(String label);
      // Links
      void beginLink(String url);
      void endLink(String url);
      // Lists
      void beginList();
      void listItem();
      void endList();
      // Templates
      void beginTemplate(String name);
      void endTemplate(String name);
      // Tables
      void beginTable();
      void headCell(int row, int col);
      void bodyCell(int row, int col);
      void endTable();
      // Generic parameter
      void parameter(String param);
      // Parameter / text value
      void text(String content);
  • 12. WikiText Processors: processors receive the stream of events generated by the parser and use it to build and transform data. ‣ Structure ‣ Extractors ‣ Linkers ‣ Splitters ‣ Validator
  • 13. Structure: the Structure processor receives the stream of WikiText parsing events and builds a one-to-one JSON representation of the document DOM.
  • 14. Extractors: Extractors are specialized processors that collect one type of data from the event stream; for example, the SectionsExtractor collects the list of all sections detected in the document.
  • 15. Linkers: a Linker is a processor that links the current document entity to information acquired from external sources. For example, the FreebaseLinker connects an entity to its counterpart in Freebase, if one exists.
  • 16. Splitters: a Splitter is a processor that cuts subtrees out of the JSON document built by the Structure processor. For example, the TableSplitter extracts the JSON structures representing the tables declared in the document.
  • 17. Validator: a Validator is a processor that checks the data structures parsed from a document.
  • 18. Forthcoming Features ‣ JSONpedia DB (based on MongoDB + Elasticsearch) will be queryable online, and JSONpedia dumps will also be available. ‣ Online data model Exporter Tool (CSV) ‣ RDF output.
  • 19. Release: JSONpedia will be fully released as open source by the end of the year.
  • 20. Live Demo http://bit.ly/jsonpedia or http://json.it.dbpedia.org/frontend/form.html
  • 21. WWW.SPAZIODATI.EU Thanks! Michele Mostarda <mostarda@spaziodati.eu>, TW: @micmos
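To make the event-based design of slide 11 concrete, here is a minimal sketch of a handler receiving parser callbacks. It assumes a hypothetical `WikiTextHandler` interface holding a subset of the slide's methods with no-op defaults; the interface name, the defaults, and the exact event sequence replayed for `== History ==\nSee [[Rome]].` are illustrative assumptions, not JSONpedia's actual API or parser output.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical subset of the slide 11 callbacks, with no-op defaults so a
// processor overrides only the events it cares about. Not JSONpedia's real API.
interface WikiTextHandler {
    default void beginDocument(URL document) {}
    default void endDocument() {}
    default void section(String title, int level) {}
    default void beginLink(String url) {}
    default void endLink(String url) {}
    default void text(String content) {}
}

// Records every event it receives as one line of text.
class LoggingHandler implements WikiTextHandler {
    final List<String> log = new ArrayList<>();

    @Override public void beginDocument(URL document) { log.add("beginDocument " + document); }
    @Override public void endDocument() { log.add("endDocument"); }
    @Override public void section(String title, int level) { log.add("section " + level + " " + title); }
    @Override public void beginLink(String url) { log.add("beginLink " + url); }
    @Override public void endLink(String url) { log.add("endLink " + url); }
    @Override public void text(String content) { log.add("text " + content); }
}

class EventDemo {
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Replays by hand the events a parser might emit for the WikiText
    // "== History ==\nSee [[Rome]]." (this exact sequence is an assumption).
    static void replay(WikiTextHandler h) {
        h.beginDocument(url("https://en.wikipedia.org/wiki/Example"));
        h.section("History", 2);
        h.text("See ");
        h.beginLink("Rome");
        h.text("Rome");
        h.endLink("Rome");
        h.text(".");
        h.endDocument();
    }

    public static void main(String[] args) {
        LoggingHandler h = new LoggingHandler();
        replay(h);
        h.log.forEach(System.out::println);
    }
}
```

Because the handler only reacts to callbacks, no document tree has to be held in memory. This is the property the deck contrasts with DOM-based parsers such as Sweeble.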
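Slide 12 says that processors receive the stream of events generated by the parser. One plausible way to feed several processors from a single parse is a broadcaster that forwards each event to every registered processor. The sketch below assumes that fan-out design and a hypothetical `WikiTextHandler` interface; the deck does not say how JSONpedia actually wires its processors together.

```java
import java.util.List;

// Hypothetical minimal event interface (a subset of the slide 11 callbacks).
interface WikiTextHandler {
    default void section(String title, int level) {}
    default void text(String content) {}
    default void endDocument() {}
}

// Forwards each event, in order, to every registered processor, so one
// parse can feed several processors at the same time.
class Broadcaster implements WikiTextHandler {
    private final List<WikiTextHandler> processors;

    Broadcaster(WikiTextHandler... processors) {
        this.processors = List.of(processors);
    }

    @Override public void section(String title, int level) {
        for (WikiTextHandler p : processors) p.section(title, level);
    }

    @Override public void text(String content) {
        for (WikiTextHandler p : processors) p.text(content);
    }

    @Override public void endDocument() {
        for (WikiTextHandler p : processors) p.endDocument();
    }
}

// Two toy processors consuming the same stream.
class SectionCounter implements WikiTextHandler {
    int sections;

    @Override public void section(String title, int level) { sections++; }
}

class TextLength implements WikiTextHandler {
    int chars;
    boolean done;

    @Override public void text(String content) { chars += content.length(); }
    @Override public void endDocument() { done = true; }
}

class BroadcastDemo {
    public static void main(String[] args) {
        SectionCounter counter = new SectionCounter();
        TextLength length = new TextLength();
        WikiTextHandler stream = new Broadcaster(counter, length);
        stream.section("History", 2);
        stream.text("First paragraph. ");
        stream.section("Geography", 2);
        stream.text("Second.");
        stream.endDocument();
        System.out.println(counter.sections + " sections, " + length.chars + " characters of text");
    }
}
```

The broadcaster is itself a handler, so a parser can be pointed at it without knowing how many processors sit behind it.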
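The Structure processor on slide 13 turns the event stream into a JSON representation of the document. A common way to do that with begin/end events is a stack of open nodes: `begin*` pushes, `end*` pops, and leaf events append to the node on top. This sketch assumes a hypothetical `WikiTextHandler` subset and a made-up node layout (`type`, `content`, `url`, `title`, `level`). It builds the tree from plain maps and lists and leaves serialization to a JSON library. Sections are stored as flat markers and are not nested by level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical subset of the parser events (see slide 11).
interface WikiTextHandler {
    default void section(String title, int level) {}
    default void beginLink(String url) {}
    default void endLink(String url) {}
    default void text(String content) {}
}

// Structure-style processor: builds a JSON-shaped tree of maps and lists.
// begin* events push a new node, end* events pop it, and leaf events
// (section, text) are appended to the node currently open.
class StructureBuilder implements WikiTextHandler {
    private final Map<String, Object> root = node("document");
    private final Deque<Map<String, Object>> open = new ArrayDeque<>();

    StructureBuilder() {
        open.push(root);
    }

    private static Map<String, Object> node(String type) {
        Map<String, Object> n = new LinkedHashMap<>();
        n.put("type", type);
        n.put("content", new ArrayList<Object>());
        return n;
    }

    @SuppressWarnings("unchecked")
    private void append(Object child) {
        ((List<Object>) open.peek().get("content")).add(child);
    }

    @Override public void section(String title, int level) {
        Map<String, Object> s = new LinkedHashMap<>();
        s.put("type", "section");
        s.put("title", title);
        s.put("level", level);
        append(s);
    }

    @Override public void beginLink(String url) {
        Map<String, Object> link = node("link");
        link.put("url", url);
        append(link);
        open.push(link);
    }

    @Override public void endLink(String url) {
        open.pop();
    }

    @Override public void text(String content) {
        append(content);
    }

    Map<String, Object> result() {
        return root;
    }
}

class StructureDemo {
    public static void main(String[] args) {
        StructureBuilder b = new StructureBuilder();
        b.section("History", 2);
        b.text("See ");
        b.beginLink("Rome");
        b.text("Rome");
        b.endLink("Rome");
        b.text(".");
        // Printed via Map.toString(); a JSON library would serialize the same tree.
        System.out.println(b.result());
    }
}
```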
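Slide 14 describes Extractors as processors that collect one type of data from the event stream, with the SectionsExtractor as its example. The sketch below is a hypothetical take on that idea, not JSONpedia's class. It overrides only `section(...)` from an assumed `WikiTextHandler` subset and records every title and level in document order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal event interface (a subset of the slide 11 callbacks).
interface WikiTextHandler {
    default void section(String title, int level) {}
    default void text(String content) {}
}

// Hypothetical SectionsExtractor: ignores every event except section()
// and records each title with its heading level, in document order.
class SectionsExtractor implements WikiTextHandler {
    static final class Section {
        final String title;
        final int level;

        Section(String title, int level) {
            this.title = title;
            this.level = level;
        }
    }

    private final List<Section> sections = new ArrayList<>();

    @Override public void section(String title, int level) {
        sections.add(new Section(title, level));
    }

    List<Section> sections() {
        return Collections.unmodifiableList(sections);
    }
}

class SectionsDemo {
    public static void main(String[] args) {
        SectionsExtractor ex = new SectionsExtractor();
        ex.section("History", 2);
        ex.text("Some body text.");   // ignored by this extractor
        ex.section("Ancient Rome", 3);
        ex.section("Geography", 2);
        for (SectionsExtractor.Section s : ex.sections()) {
            System.out.println(s.level + " " + s.title);
        }
    }
}
```

Because unrelated events fall through to the no-op defaults, each extractor stays small and focused on one kind of data.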
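A Linker (slide 15) connects the current document entity to data from an external source. The sketch below does not call any real Freebase or DBpedia service. It assumes a hypothetical `WikiTextHandler` subset, derives the entity name from the last segment of the page URL, and delegates the lookup to a pluggable resolver function. The in-memory index and the `example:rome` ID are placeholders standing in for a remote lookup.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical minimal event interface (a subset of the slide 11 callbacks).
interface WikiTextHandler {
    default void beginDocument(URL document) {}
    default void endDocument() {}
}

// Hypothetical linker: takes the entity name from the page URL and asks a
// pluggable resolver for the matching ID in an external source.
class ExternalLinker implements WikiTextHandler {
    private final Function<String, Optional<String>> resolver;
    private String entity;
    private Optional<String> link = Optional.empty();

    ExternalLinker(Function<String, Optional<String>> resolver) {
        this.resolver = resolver;
    }

    // Uses the last path segment as the entity name (no percent-decoding here).
    @Override public void beginDocument(URL document) {
        String path = document.getPath();
        entity = path.substring(path.lastIndexOf('/') + 1);
    }

    // Resolves once the document has been fully read; empty if there is no match.
    @Override public void endDocument() {
        link = entity == null ? Optional.empty() : resolver.apply(entity);
    }

    String entity() { return entity; }

    Optional<String> link() { return link; }
}

class LinkerDemo {
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for an external source; the ID is a placeholder.
        Map<String, String> index = Map.of("Rome", "example:rome");
        ExternalLinker linker = new ExternalLinker(name -> Optional.ofNullable(index.get(name)));
        linker.beginDocument(url("https://en.wikipedia.org/wiki/Rome"));
        linker.endDocument();
        System.out.println(linker.entity() + " -> " + linker.link().orElse("(no link)"));
    }
}
```

Injecting the resolver keeps the linker testable offline and lets the same processor target different sources.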
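A Splitter (slide 16) cuts subtrees out of the JSON document built by the Structure processor, as the TableSplitter does for tables. This sketch assumes the tree is made of plain maps and lists and that each node carries a `"type"` key; both are assumptions about the layout, not JSONpedia's documented format. It walks the tree recursively and returns every subtree whose type matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical splitter: walks a JSON-shaped tree (maps, lists, scalars) and
// cuts out every subtree whose "type" matches, e.g. "table" for tables.
class TypeSplitter {
    private final String type;

    TypeSplitter(String type) {
        this.type = type;
    }

    List<Map<?, ?>> split(Object node) {
        List<Map<?, ?>> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private void collect(Object node, List<Map<?, ?>> out) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if (type.equals(map.get("type"))) {
                out.add(map);   // keep the whole subtree...
                return;         // ...and do not descend into it
            }
            for (Object child : map.values()) collect(child, out);
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) collect(child, out);
        }
        // Strings and other scalars cannot contain subtrees.
    }
}

class SplitterDemo {
    public static void main(String[] args) {
        Map<String, Object> table = Map.of("type", "table",
                "rows", List.of(List.of("A", "B"), List.of("1", "2")));
        Map<String, Object> doc = Map.of("type", "document",
                "content", List.of("Intro text", Map.of("type", "section", "title", "Data"), table));
        List<Map<?, ?>> tables = new TypeSplitter("table").split(doc);
        System.out.println(tables.size() + " table(s): " + tables);
    }
}
```

Stopping at the first match means a table nested inside another table comes back as part of its parent. A splitter that needs every level would keep descending after adding the match.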
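Slide 17 defines a Validator only as a processor that checks the parsed data structures. As one illustrative check, chosen because the deck stresses tolerance of WikiText syntax errors, the sketch below verifies that begin/end events nest properly. It collects a message for each violation instead of failing fast. The hypothetical `WikiTextHandler` subset and this particular rule are assumptions; the actual checks JSONpedia performs are not described.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal event interface (a subset of the slide 11 callbacks).
interface WikiTextHandler {
    default void beginLink(String url) {}
    default void endLink(String url) {}
    default void beginTemplate(String name) {}
    default void endTemplate(String name) {}
    default void endDocument() {}
}

// Hypothetical validator: checks that begin/end events are properly nested
// and records a message for every violation instead of aborting.
class NestingValidator implements WikiTextHandler {
    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> errors = new ArrayList<>();

    private void begin(String kind, String name) {
        open.push(kind + ":" + name);
    }

    private void end(String kind, String name) {
        String expected = kind + ":" + name;
        if (open.isEmpty()) {
            errors.add("unexpected end of " + expected);
        } else if (!open.peek().equals(expected)) {
            // Leave the stack untouched; the open element is reported later.
            errors.add("expected end of " + open.peek() + " but got " + expected);
        } else {
            open.pop();
        }
    }

    @Override public void beginLink(String url) { begin("link", url); }
    @Override public void endLink(String url) { end("link", url); }
    @Override public void beginTemplate(String name) { begin("template", name); }
    @Override public void endTemplate(String name) { end("template", name); }

    // Anything still open when the document ends was never closed.
    @Override public void endDocument() {
        while (!open.isEmpty()) errors.add("unclosed " + open.pop());
    }

    List<String> errors() { return errors; }
}

class ValidatorDemo {
    public static void main(String[] args) {
        NestingValidator v = new NestingValidator();
        v.beginTemplate("Infobox city");
        v.beginLink("Rome");
        v.endLink("Rome");
        // The template is never closed: a typical WikiText syntax error.
        v.endDocument();
        v.errors().forEach(System.out::println);
    }
}
```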