Rapid Prototyping of a Semantic-Web-based Research Workbench
1. Rapid Prototyping of a Semantic-Web-based Research Workbench
Carsten Ullrich
Dept. of Computer Science and
Engineering, SJTU
2. Overview
• Project done with Totuba, Inc.
• Goal: develop a research workbench
– bibliography manager
– research network
– support while writing research papers
• Sorry, no new pure research results
• But: an overview of the state of the art in
existing Web services / Web data
4. Entity Extraction
The term "Web 2.0" is used to describe applications that
distinguish themselves from previous generations of software by
a number of principles. Existing work shows that Web 2.0
applications can be successfully exploited for technology-
enhanced learning. However, in-depth analyses of the
relationship between Web 2.0 technology on the one hand and
teaching and learning on the other hand are still rare.
5. Entity Extraction
Gur grez "Jro 2.0" vf hfrq gb qrfpevor nccyvpngvbaf gung
qvfgvathvfu gurzfryirf sebz cerivbhf trarengvbaf bs fbsgjner ol n
ahzore bs cevapvcyrf. Rkvfgvat jbex fubjf gung Jro 2.0
nccyvpngvbaf pna or fhpprffshyyl rkcybvgrq sbe grpuabybtl-
raunaprq yrneavat. Ubjrire, va-qrcgu nanylfrf bs gur
eryngvbafuvc orgjrra Jro 2.0 grpuabybtl ba gur bar unaq naq
grnpuvat naq yrneavat ba gur bgure unaq ner fgvyy ener.
6. Entity Extraction
Gur grez "Jro 2.0" vf hfrq gb qrfpevor nccyvpngvbaf gung
qvfgvathvfu gurzfryirf sebz cerivbhf trarengvbaf bs fbsgjner ol n
ahzore bs cevapvcyrf. Rkvfgvat jbex fubjf gung Jro 2.0
nccyvpngvbaf pna or fhpprffshyyl rkcybvgrq sbe grpuabybtl-
raunaprq yrneavat. Ubjrire, va-qrcgu nanylfrf bs gur
eryngvbafuvc orgjrra Jro 2.0 grpuabybtl ba gur bar unaq naq
grnpuvat naq yrneavat ba gur bgure unaq ner fgvyy ener.
OpenCalais
• Jro 2.0
• grpuabybtl-raunaprq yrneavat
7. OpenCalais
• Thomson Reuters company
• Web Service
• Extracts entities, facts,
events (about 100 types)
• Free for noncommercial and
commercial use
Entities
Anniversary, City, Company, Continent, Country, Currency, EmailAddress,
EntertainmentAwardEvent, Facility, FaxNumber, Holiday, IndustryTerm, MarketIndex,
MedicalCondition, MedicalTreatment, Movie, MusicAlbum, MusicGroup, NaturalFeature,
OperatingSystem, Organization, Person, PhoneNumber, Position, Product, ProgrammingLanguage,
ProvinceOrState, PublishedMedium, RadioProgram, RadioStation, Region, SportsEvent,
SportsGame, SportsLeague, Technology, TVShow, TVStation, URL
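To make the extraction step concrete, here is a minimal sketch of what "text in, typed entities out" looks like. It does not call OpenCalais; it is a toy dictionary lookup standing in for the service. The gazetteer, the function name `extract_entities`, and the types assigned to the two terms (picked from the list above) are assumptions for illustration only.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: surface form -> entity type.
# The type assignments are assumptions, not OpenCalais output.
GAZETTEER: Dict[str, str] = {
    "Web 2.0": "Technology",
    "technology-enhanced learning": "IndustryTerm",
}


def extract_entities(
    text: str, gazetteer: Dict[str, str] = GAZETTEER
) -> List[Tuple[str, str]]:
    """Return (surface form, type) pairs for gazetteer entries found in text."""
    # Rejoin words hyphenated across line breaks, then collapse whitespace.
    normalized = re.sub(r"-\s*\n\s*", "-", text)
    normalized = re.sub(r"\s+", " ", normalized)
    found = []
    for surface, entity_type in gazetteer.items():
        # Case-insensitive literal match of the surface form.
        if re.search(re.escape(surface), normalized, flags=re.IGNORECASE):
            found.append((surface, entity_type))
    return found


abstract = (
    'The term "Web 2.0" is used to describe applications that can be\n'
    "successfully exploited for technology-\n"
    "enhanced learning."
)
for surface, entity_type in extract_entities(abstract):
    print(f"{entity_type}: {surface}")
```

A real service replaces the fixed dictionary with statistical and linguistic models, which is exactly the part worth reusing rather than rebuilding.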
8. Semantifying
The term "Web 2.0" ...
OpenCalais
• Web 2.0
• technology-enhanced learning
DBpedia (others: YAGO, Freebase, UMBEL)
• http://dbpedia.org/resource/Web_2.0
• http://dbpedia.org/resource/Technology-Enhanced_Learning
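The step from an extracted label to a DBpedia resource can be sketched as follows. This is a naive mapping, assuming the resource name follows the Wikipedia title convention (spaces become underscores, first letter capitalized). The function name `dbpedia_candidate_uri` is hypothetical, and the result is only a candidate: it still has to be checked against DBpedia and disambiguated.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE_BASE = "http://dbpedia.org/resource/"


def dbpedia_candidate_uri(label: str) -> str:
    """Build a candidate DBpedia resource URI from an entity label."""
    # Collapse whitespace and join the words with underscores.
    title = "_".join(label.strip().split())
    # Wikipedia-style titles start with an uppercase letter.
    if title:
        title = title[0].upper() + title[1:]
    # Percent-encode anything that is not safe in a URI path segment.
    return DBPEDIA_RESOURCE_BASE + quote(title, safe="_.-(),'")


print(dbpedia_candidate_uri("Web 2.0"))
print(dbpedia_candidate_uri("technology-enhanced learning"))
```

For "Web 2.0" the candidate matches the URI listed above. For the second label the naive mapping yields a different capitalization than the URI on the slide, which illustrates why a lookup or redirect resolution step is needed in practice.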
11. Reuse
• Highly efficient entity extraction
• Enormous databases
– describe the entities
– link to related entities
• Give a high-level starting position to explore new
challenges
– how to put this data into use?
– context: what is relevant for user/current usage
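One way to start putting the linked data to use is to ask for an entity's label and the resources it links to. The sketch below only builds the SPARQL query text; sending it to an endpoint is left out. The function name `build_related_query` is hypothetical, and the choice of `rdfs:label` and `dbo:wikiPageWikiLink` as the properties of interest is an assumption about what "describe" and "related" should mean here.

```python
def build_related_query(resource_uri: str, limit: int = 20) -> str:
    """Return a SPARQL query for the English label and linked resources."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in resource_uri for ch in '<>" {}|\\^`'):
        raise ValueError(f"not a valid IRI: {resource_uri!r}")
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?label ?related WHERE {{
  <{resource_uri}> rdfs:label ?label ;
      dbo:wikiPageWikiLink ?related .
  FILTER (lang(?label) = "en")
}} LIMIT {limit}"""


print(build_related_query("http://dbpedia.org/resource/Web_2.0", limit=5))
```

Deciding which of the returned links matter for the current user and task is the open context problem named above; the query itself does not address it.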
12. Lessons Learned
• Reuse enables progress
– no duplication of work
– focus on problems relevant for you
• Having a landscape that encourages reuse
creates advantages for research / commercial
applications
• Problems
– mostly English only
– few Chinese services / programming libraries
• e.g., named entity extraction
16. Questions
• I have some:
– opinion mining
– information extraction
• Any toolkits available? RASCALLI?
• Contact me in case you find this
interesting
• ullrich_c@sjtu.edu.cn