1. Rewire the Net
Davide Eynard
eynard@elet.polimi.it
Dipartimento di Elettronica e Informazione
Politecnico di Milano
2007/05/30
Mobile, Context Aware Databases and Database Systems
2. Intro
The problem
Wrapping vs Mashup
Mashup tools and technologies
Open problems
Conclusions
p. 2 2007/05/30 Rewire the Net
3. The problem
[Figure: data sources labeled STRUCTURED, STRUCTURED, and UNSTRUCTURED]
4. What is a wrapper?
5. What is a wrapper?
[Diagram: a Content Provider wrapped to expose the Desired Interface]
6. Is a wrapper enough?
A wrapper takes a (usually unstructured) data
source and returns information in a desired
format
• All the uninteresting stuff is hidden within it
• From outside we see only the desired interface
What we want to do is work with this information
• aggregate/filter it
• use it as input for other services
• mash it!
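A minimal sketch of the idea, not taken from the talk: the class name HeadlineWrapper, the Headline record, the sample page, and the assumed layout (headlines as links inside <h2> tags) are all hypothetical. The parsing stays hidden inside the wrapper, and only the desired interface is visible from outside.

```python
import re
from dataclasses import dataclass


@dataclass
class Headline:
    title: str
    url: str


class HeadlineWrapper:
    """Hypothetical wrapper: hides the messy HTML, exposes clean records."""

    # Assumed page layout: each headline is a link inside an <h2> tag.
    _PATTERN = re.compile(r'<h2>\s*<a href="([^"]+)">([^<]+)</a>\s*</h2>')

    def __init__(self, html: str):
        self._html = html  # the uninteresting stuff stays private

    def headlines(self) -> list[Headline]:
        """The desired interface: a list of structured records."""
        return [Headline(title=title.strip(), url=url)
                for url, title in self._PATTERN.findall(self._html)]


# Made-up page content standing in for a real content provider.
SAMPLE_PAGE = """
<html><body><div class="ads">Buy now!</div>
<h2><a href="/news/1">Sudoku craze hits Milan</a></h2>
<p>filler</p>
<h2><a href="/news/2">New mashup editor released</a></h2>
</body></html>
"""

wrapper = HeadlineWrapper(SAMPLE_PAGE)
# Once the data is structured, filtering it is a one-liner.
mashup_news = [h for h in wrapper.headlines() if "mashup" in h.title.lower()]
print(mashup_news)
```

The filtered list could just as well be fed to another service, which is where wrapping ends and mashing begins.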
7. An example
... and now?
8. An example
Convert data structures
to LaTeX and generate
a Sudoku book in PDF
9. An example
Create a Web app
which delivers data
in a standard format
Create a Java app
that runs Sudokus
on your mobile
Create another app
that solves Sudokus!
10. What kind of mashup?
Imagination is your only limit
• and... uhm, well... ability
So, most of the mashups around belong to one of
the following families:
• mapping mashups
• video and photo mashups
• search and shopping mashups
• news mashups
17. Features
Source:
“Five Ways to Mix, Rip, and Mash Your Data”
Nick Gonzalez, March 2 2007
18. The architecture
[Diagram: a Client reaches the MASHUP SITE/SERVICE through an INTERFACE; the mashup in turn draws on several API/Content Providers]
19. The architecture
[Diagram: same architecture, with AJAX as the interface between the Client and the MASHUP SITE/SERVICE, which draws on several API/Content Providers]
20. AJAX
Asynchronous JavaScript and XML
It's a Web application model, rather than a
specific technology, and comprises several
different technologies:
• XHTML and CSS for presentation and styling
• The DOM API exposed by the browser for
dynamic display and interaction
• Asynchronous data exchange (typically XML)
• Browser-side scripting (typically JavaScript)
21. Protocols and standards
Web protocols
• SOAP (originally Simple Object Access Protocol)
− XML message format
− Message structure: an envelope with header and body parts
• REST (Representational State Transfer)
− Web-based communication using HTTP+XML
− Few operations: GET, POST, PUT, DELETE,
applicable to every resource
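As a rough illustration of REST's small, uniform set of operations, here is a toy in-memory sketch. ResourceStore, its handle method, and the /sudokus paths are hypothetical names invented for this example; no real HTTP is involved, and the status codes merely mimic the usual HTTP conventions.

```python
class ResourceStore:
    """Toy in-memory stand-in for a RESTful service (hypothetical)."""

    def __init__(self):
        self._resources = {}
        self._next_id = 1

    def handle(self, method, path, body=None):
        """Apply one of the four uniform operations to a resource path."""
        if method == "POST":  # create a new resource under a collection
            new_path = f"{path.rstrip('/')}/{self._next_id}"
            self._next_id += 1
            self._resources[new_path] = body
            return 201, new_path
        if method == "PUT":  # create or replace the resource at a known path
            self._resources[path] = body
            return 200, body
        if path not in self._resources:
            return 404, None
        if method == "GET":  # read the current representation
            return 200, self._resources[path]
        if method == "DELETE":  # remove the resource
            del self._resources[path]
            return 204, None
        return 405, None  # any other method is not allowed


store = ResourceStore()
status, location = store.handle("POST", "/sudokus", "<sudoku level='easy'/>")
print(status, location)
store.handle("PUT", location, "<sudoku level='hard'/>")
print(store.handle("GET", location))
print(store.handle("DELETE", location))
print(store.handle("GET", location))
```

The same four verbs work on any path, which is the point of the uniform interface: a client needs no service-specific operation names.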
Syndication formats
• RSS (v1.0 is RDF based, while 2.0 is not)
• Atom (more attention to metadata)
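Because a syndication feed is plain XML, consuming one takes very little code. The sketch below reads a made-up RSS 2.0 document with Python's standard xml.etree.ElementTree; the feed content and the read_items helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed standing in for a real one.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.org/1</link></item>
    <item><title>Second post</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in read_items(SAMPLE_RSS):
    print(title, link)
```

An RSS 1.0 or Atom feed would need namespace-aware lookups, since both place their elements in XML namespaces.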
22. Wrappers, spiders, scrapers
Wrapper is quite a general term used to describe
a particular architecture
Remember
this one?
A wrapper needs at least two other components
to accomplish its task
• A spider (or crawler), to follow links and
download web pages
• A scraper, to extract useful content from pages
full of uninteresting data
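The two roles can be sketched together in a few lines. Everything here is hypothetical: the PAGES dictionary stands in for real HTTP downloads, and the assumption that the useful content lives in <p class="fact"> elements is invented for the example. The spider follows links; the scraper keeps only the interesting text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical two-page site, keyed by URL, replacing real HTTP fetches.
PAGES = {
    "http://example.org/":
        '<a href="/a">A</a><p class="fact">Home fact</p><div>ads</div>',
    "http://example.org/a":
        '<a href="/">back</a><p class="fact">Fact on A</p>',
}


class PageParser(HTMLParser):
    """Collects links (for the spider) and facts (for the scraper)."""

    def __init__(self):
        super().__init__()
        self.links, self.facts = [], []
        self._in_fact = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        # Assumed markup: useful content sits in <p class="fact">.
        self._in_fact = tag == "p" and attrs.get("class") == "fact"

    def handle_endtag(self, tag):
        self._in_fact = False

    def handle_data(self, data):
        if self._in_fact:
            self.facts.append(data.strip())


def crawl(start_url):
    """Breadth-first spider: visit each known page once, scrape its facts."""
    seen, queue, facts = set(), [start_url], []
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        parser = PageParser()
        parser.feed(PAGES[url])
        facts.extend(parser.facts)
        queue.extend(urljoin(url, link) for link in parser.links)
    return facts


print(crawl("http://example.org/"))
```

Note how tightly the scraper is bound to the page markup: if the provider renames the class or restructures the page, the extraction silently returns nothing.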
25. Scrapers
However powerful, screen scraping is usually
considered an inelegant solution
• Lack of sophisticated, reusable screen-scraping
toolkits (most scrapers are created ad hoc):
difficult to program
• Unlike an API, scraping has no explicit
contract between content provider and content
consumer: difficult to update/maintain
26. Semantic Web and RDF
Hey, that's my job!
Content created for human consumption does not
make good content for automated machine
consumption
• Data becomes information when it conveys
meaning
XML by itself is not sufficient (too arbitrary).
RDF is quickly being adopted in a variety of
domains.
• possibility to query over it (RDQL, SPARQL)
• possibility to reason over it (Jena, RACER)
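To give a feel for what querying RDF means, here is a toy sketch of triple-pattern matching, the core idea behind languages such as SPARQL. It is not SPARQL and uses no RDF library: the triples, the ex: names, and the match and query helpers are all hypothetical.

```python
# Hypothetical data: each statement is a (subject, predicate, object) triple.
TRIPLES = [
    ("ex:map1", "rdf:type", "ex:MappingMashup"),
    ("ex:map1", "ex:uses", "ex:GeoAPI"),
    ("ex:news1", "rdf:type", "ex:NewsMashup"),
    ("ex:news1", "ex:uses", "ex:FeedAPI"),
]


def match(triples, pattern):
    """Yield variable bindings for one pattern; '?x' marks a variable."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several patterns, sharing variables between them."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute the variables bound so far, then match the rest.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# Which APIs are used by mapping mashups?
print(query(TRIPLES, [("?m", "rdf:type", "ex:MappingMashup"),
                      ("?m", "ex:uses", "?api")]))
```

Because every statement has the same three-part shape, the same two helpers can answer any question about the data, which is much harder with arbitrary XML structures.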
27. Challenges
Technical:
• data integration (what if the mapping is
incomplete?)
• data that need to be fixed/cleaned/converted
• robust standards, protocols, models and
toolkits (... and try to avoid scrapers)
Social:
• encouraging user contributions
• data pollution (lack of precision, gaming)
• tradeoff between the protection of intellectual
property and consumer privacy versus fair use
and free flow of information
28. Conclusions
Considering information as freely flowing on the
Internet, and creating “pipes” to redirect,
aggregate, and reuse it, is a great and powerful idea
We're still at the very beginning
User participation might offer new chances for
improvement
... and create new problems, of course!
29. Webography
Duane Merrill:
“Mashups: The new breed of Web app”
Tim O'Reilly: “Pipes and filters for the Internet”
Nick Gonzalez:
“Five Ways to Mix, Rip, and Mash Your Data”
Davide Eynard: “PowerBrowsing Projects”,
“SukaSudoku”
www.webmashup.com
30. That's All, Folks
Thank you!
Questions are welcome