3. “JSONpedia is a library and a web service
meant to read WikiText markup as JSON.”
Wednesday, 10 October 2012
4. ‣ Initially conceived as a tool to produce data to
train Machine Learning models.
‣ The REST service, inspired by Sweeble
Crystalball, produces JSON, HTML and
(coming soon) RDF data.
‣ Built on a context-dependent, event-based
parser to be more performant than a
regex-based parser (like the wikiparser) or a
DOM-based parser (like Sweeble).
6. ‣ Lightweight event-based parser.
‣ More tolerant of the frequent syntax errors
present in WikiText pages.
‣ Serializes to JSON output, which is easier
to consume!
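As an illustration of the event-based approach, here is a minimal sketch for a tiny WikiText subset (headings, links, plain text). The function name `parse_events` and the event names are assumptions made for this example; this is not the JSONpedia parser. Note how an unclosed link is emitted as text instead of aborting the parse.

```python
from typing import Iterator, Tuple

Event = Tuple[str, str]  # (event type, payload)


def parse_events(wikitext: str) -> Iterator[Event]:
    """Yield events for a tiny WikiText subset: headings, links, text."""
    for line in wikitext.splitlines():
        stripped = line.strip()
        # A heading looks like "== Title ==".
        if len(stripped) > 4 and stripped.startswith("==") and stripped.endswith("=="):
            yield ("section", stripped.strip("=").strip())
            continue
        pos = 0
        while pos < len(line):
            start = line.find("[[", pos)
            if start == -1:
                # No more links on this line: the rest is plain text.
                yield ("text", line[pos:])
                break
            end = line.find("]]", start)
            if end == -1:
                # Unclosed link: tolerate the error, emit the rest as text.
                yield ("text", line[pos:])
                break
            if start > pos:
                yield ("text", line[pos:start])
            # "[[Target|label]]" -> keep only the link target.
            yield ("link", line[start + 2:end].split("|", 1)[0])
            pos = end + 2


sample = "== History ==\nRome is in [[Italy|the country]].\nBroken [[link"
events = list(parse_events(sample))
```

Because events are produced one at a time, a consumer can process a page without ever holding a full document tree in memory.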
8. ‣ JSONpedia doesn't add any semantics to
the extracted data.
‣ JSONpedia could integrate the current
DBpedia regex-based parser.
‣ JSONpedia is not a competitor of DBpedia
but rather a complement.
12. WikiText Processors
Processors receive the stream of events generated by the
parser and perform data construction and transformation.
‣ Structure
‣ Extractors
‣ Linkers
‣ Splitters
‣ Validator
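The relationship between the parser and its Processors can be sketched as a simple dispatch loop. The class names `Processor`, `EventCounter` and `LinkCollector`, and the `(type, payload)` event shape, are hypothetical and chosen only for this example; the actual JSONpedia interfaces may look different.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str]  # (event type, payload)


class Processor:
    """Hypothetical base class: consumes events, exposes a result."""

    def handle(self, event: Event) -> None:
        raise NotImplementedError

    def result(self):
        raise NotImplementedError


class EventCounter(Processor):
    """Toy processor: counts events by type."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def handle(self, event: Event) -> None:
        kind, _ = event
        self.counts[kind] = self.counts.get(kind, 0) + 1

    def result(self) -> Dict[str, int]:
        return self.counts


class LinkCollector(Processor):
    """Toy processor: collects the targets of all link events."""

    def __init__(self) -> None:
        self.targets: List[str] = []

    def handle(self, event: Event) -> None:
        kind, value = event
        if kind == "link":
            self.targets.append(value)

    def result(self) -> List[str]:
        return self.targets


def run_processors(events: Iterable[Event], processors: List[Processor]) -> list:
    """Feed every event to every processor in a single pass."""
    for event in events:
        for processor in processors:
            processor.handle(event)
    return [processor.result() for processor in processors]


stream = [("section", "History"), ("link", "Italy"), ("text", "x"), ("link", "Rome")]
counts, links = run_processors(stream, [EventCounter(), LinkCollector()])
```

One pass over the stream serves all processors, so adding a processor does not require parsing the page again.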
13. Structure
The Structure Processor receives a stream of
WikiText parsing events and builds a one-to-one JSON
representation of the document DOM.
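A minimal sketch of how a stream of events could be turned into a nested JSON document, using a stack of open nodes. The `begin`/`end` event names and the `type`/`content`/`value` keys are assumptions for illustration, not the actual JSONpedia output schema.

```python
import json
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str]  # (event type, payload)


def build_structure(events: Iterable[Event]) -> Dict:
    """Build a nested, JSON-serializable tree from begin/end/leaf events."""
    root = {"type": "document", "content": []}
    stack = [root]  # stack[-1] is the node currently being filled
    for kind, value in events:
        if kind == "begin":
            node = {"type": value, "content": []}
            stack[-1]["content"].append(node)
            stack.append(node)
        elif kind == "end":
            # Ignore a stray "end" rather than popping the root.
            if len(stack) > 1:
                stack.pop()
        else:
            # Any other event becomes a leaf of the current node.
            stack[-1]["content"].append({"type": kind, "value": value})
    return root


stream = [
    ("begin", "template"),
    ("text", "Infobox"),
    ("end", "template"),
    ("text", "tail"),
]
document = build_structure(stream)
serialized = json.dumps(document)
```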
14. Extractors
Extractors are specific Processors that
collect a certain type of data from the
event stream. For example, the
SectionsExtractor collects the list of all
sections detected in the document
stream.
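A sketch of what an extractor in the spirit of the SectionsExtractor might look like. Only the name comes from the slide; the `handle` method and the `"section"` event type are assumptions made for this example.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (event type, payload)


class SectionsExtractor:
    """Illustrative extractor: keeps section titles, ignores everything else."""

    def __init__(self) -> None:
        self.sections: List[str] = []

    def handle(self, event: Event) -> None:
        kind, value = event
        if kind == "section":
            self.sections.append(value)


extractor = SectionsExtractor()
for event in [("section", "History"), ("text", "..."), ("section", "Geography")]:
    extractor.handle(event)
```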
15. Linkers
A Linker is a Processor that links the
current document entity to other
information acquired from external sources.
An example of a Linker is the FreebaseLinker,
which connects an entity to its equivalent
representation in Freebase, if any.
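The "if any" behaviour can be sketched with an injected lookup function. Everything here is hypothetical: `link_entity`, the `links` key, and the placeholder mapping that stands in for an external source. No real Freebase call or identifier is used.

```python
from typing import Callable, Dict, Optional


def link_entity(document: Dict, lookup: Callable[[str], Optional[str]]) -> Dict:
    """Return a copy of the document with an external link attached, if any."""
    linked = dict(document)
    external_id = lookup(document["title"])
    if external_id is not None:
        linked["links"] = {"external": external_id}
    return linked


# Placeholder mapping standing in for a real external source.
FAKE_SOURCE = {"Rome": "external:rome-placeholder"}

found = link_entity({"title": "Rome"}, FAKE_SOURCE.get)
missing = link_entity({"title": "Atlantis"}, FAKE_SOURCE.get)
```

Passing the lookup as a parameter keeps the linking logic independent of any particular external service.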
16. Splitters
A Splitter is a Processor able to cut
subtrees out of the JSON document built by the
Structure processor. An example of a
Splitter is the TableSplitter, which extracts
the JSON structures representing the
tables declared in the document.
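Cutting subtrees out of a JSON document can be sketched as a recursive walk. The function `split_by_type` and the `type`/`content` node layout are assumptions for this example, not the TableSplitter implementation.

```python
from typing import Dict, List


def split_by_type(node: Dict, wanted: str) -> List[Dict]:
    """Collect every outermost subtree whose 'type' equals `wanted`."""
    found: List[Dict] = []
    if not isinstance(node, dict):
        return found
    if node.get("type") == wanted:
        # Keep the whole subtree; do not descend into it further.
        found.append(node)
    else:
        for child in node.get("content", []):
            found.extend(split_by_type(child, wanted))
    return found


document = {
    "type": "document",
    "content": [
        {"type": "text", "value": "intro"},
        {"type": "table", "content": [{"type": "text", "value": "cell"}]},
        {"type": "section", "content": [{"type": "table", "content": []}]},
    ],
}
tables = split_by_type(document, "table")
```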
17. Validator
A Validator is a Processor that checks
the data structures parsed from a
document.
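The slide does not say which checks a Validator performs, so the rules below are invented for illustration only: every node must carry a string `type`, and `content`, when present, must be a list. The function name `validate` is likewise hypothetical.

```python
from typing import Dict, List


def validate(node: Dict, path: str = "document") -> List[str]:
    """Return a list of human-readable problems found in the tree."""
    if not isinstance(node, dict):
        return [f"{path}: node is not an object"]
    errors: List[str] = []
    if not isinstance(node.get("type"), str):
        errors.append(f"{path}: missing 'type'")
    content = node.get("content", [])
    if not isinstance(content, list):
        errors.append(f"{path}: 'content' is not a list")
        return errors
    # Recurse into children, extending the path so errors are locatable.
    for index, child in enumerate(content):
        errors.extend(validate(child, f"{path}.content[{index}]"))
    return errors


good = {"type": "document", "content": [{"type": "text", "value": "ok"}]}
bad = {"type": "document", "content": [{"value": "no type here"}]}
good_errors = validate(good)
bad_errors = validate(bad)
```

Returning a list of messages, rather than raising on the first problem, lets a caller report every issue found in a page at once.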
18. Forthcoming Features
‣ JSONpedia DB (based on MongoDB +
ElasticSearch) will be queryable online.
JSONpedia dumps will also be available.
‣ Online data model Exporter Tool (CSV)
‣ RDF output.
19. Release
JSONpedia will be fully released as
open source by the end of the year.
20. Live Demo
http://bit.ly/jsonpedia
or
http://json.it.dbpedia.org/frontend/form.html