Python on the Semantic Web

Santiago Coffey - PyCon 2009 - Buenos Aires

Global Internet traffic estimates (TB/month)

[Chart: estimated traffic per year, 1990 to 2008; vertical axis from 0 to 9,000,000 TB/month]

Source: Cisco Visual Networking Index Forecast. http://www.cisco.com/go/vni

Web 2.0 is the
second generation of
web development and design

What is the recipe for this new way
of building the web?

PATTERN #1 
INFORMATION
SHARING

1.0: SOURCE OF INFORMATION
2.0: SOURCE AND SINK OF INFORMATION

WEB SITES
WEB APPLICATIONS
WEB SERVICES
3.0 SEMANTIC WEB

USERS

PATTERN #2 
USER-CENTERED
DESIGN

WEB 1.0: RESOURCE-CENTERED
WEB 2.0: USER-CENTERED

PYTHON 3.0: BATTERIES INCLUDED
http, email, socket, xmlrpc, xml, cgi, ssl, smtpd, hashlib, urllib, smtplib, json, wsgiref, codecs, html

PYTHON MODULES: SWISS ARMY KNIFE
Babel, lxml, Paste, Beaker, boto, NumPy, nltk, WebHelpers, PIL, Twisted, PyYAML, FormEncode, SQLAlchemy, BeautifulSoup, feedparser, python-memcached, pycrypto

PATTERN #3 
MODEL-VIEW-
CONTROLLER
ARCHITECTURE

HTTP://WWW.E-XAMPLE.COM
DIRECTORIES OF STATIC FILES (HTML, JPG, GIF, JS)
WEB SITE: REPOSITORY OF STATIC PAGES

http://examp.le
BROWSER, WEB SERVER, ROUTE DISPATCHER, CONTROLLER, MODEL, VIEW, DATABASE
WEB APPLICATION: MULTI-LAYER, STATEFUL
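The layered request flow named on this slide (route dispatcher, controller, model, view) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `POSTS` data, the `/posts/<id>` URL scheme, and every function name are hypothetical, and no particular framework works exactly this way.

```python
# Hypothetical in-memory "database" standing in for the model layer.
POSTS = {1: "Hello, semantic web", 2: "Batteries included"}


def get_post(post_id):
    """Model: fetch data, knowing nothing about HTTP or HTML."""
    return POSTS.get(post_id)


def render_post(title):
    """View: turn data into markup, knowing nothing about storage."""
    return "<h1>%s</h1>" % title


def show_post(post_id):
    """Controller: glue the model and the view together."""
    title = get_post(int(post_id))
    if title is None:
        return 404, "<h1>Not found</h1>"
    return 200, render_post(title)


# Route table: the first path segment selects a controller.
ROUTES = {"posts": show_post}


def dispatch(path):
    """Route dispatcher: map a URL path such as '/posts/1' to a controller."""
    parts = path.strip("/").split("/")
    controller = ROUTES.get(parts[0])
    if controller is None or len(parts) != 2 or not parts[1].isdigit():
        return 404, "<h1>Not found</h1>"
    return controller(parts[1])
```

Calling `dispatch("/posts/1")` returns a status code and a rendered page; unlike a static site, the response is computed from data on every request.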

Python Web Frameworks
• Django
• Pylons
• TurboGears
• Google App Engine
• CherryPy
• Zope
• Plone
• MoinMoin
• Python Paste

Alternative DBMS
• memcachedb
• HBase + Thrift
• flat files (CSV)
• BerkeleyDB
• Map Reduce

• Key-value storage
• No joins, non-transactional
• Scalable, fault-tolerant
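As an illustration of the Map Reduce item above, here is a minimal single-process sketch of the map, shuffle, and reduce phases applied to word counting. The function names are hypothetical; a real deployment distributes these phases across machines, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(document):
    """Emit one (word, 1) pair per word, as a mapper would."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))
```

Because each mapper call and each per-key reduction is independent of the others, the work can in principle be split across many nodes.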

PATTERN #4 
PARTICIPATION &
COLLABORATION

WEB 1.0: INDIVIDUAL PUBLISHING
WEB 2.0: COLLABORATIVE EDITING

PATTERN #5 
SOCIAL
NETWORKING

1.0: NETWORK OF COMPUTERS
2.0: NETWORK OF FRIENDS

SIMPLE IDEA,
GREAT RESULTS

PATTERN #6 
SEARCH &
RECOMMENDATION
ENGINES

WEB 1.0: BROWSE EVERYTHING TO DISCOVER

WEB 2.0: ICEBERG THEORY
(ONLY 10% IS VISIBLE)

What does a semantic web do?

• Integration (aggregation)
• Definition of ontologies
• Semantic analysis
• Adding value (filtering,
association, recommendation, etc.)

What is the mission
of the semantic web?

What is the content of a given
HTML document about
(what does it mean)?

What is the relevant content
of a given
HTML document?

What language is a
given text written in?
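One cheap heuristic for this question is to count how many very common words of each candidate language appear in the text. The sketch below assumes tiny hand-picked word lists for two languages (both hypothetical and far too short for real use); it shows the idea, not a production detector.

```python
# Hypothetical, deliberately tiny stop-word lists per language code.
STOP_WORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that"},
    "es": {"el", "la", "de", "que", "y", "en", "los"},
}


def guess_language(text):
    """Return the language whose stop words appear most often, or None."""
    words = text.lower().split()
    scores = {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no hits at all there is no evidence for any language.
    return best if scores[best] > 0 else None
```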

Which words are equivalent
to one another? (stemming)
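To make the idea of stemming concrete, here is a deliberately naive suffix-stripping sketch for English. It is not the Porter algorithm or any published stemmer; the suffix list and the minimum stem length are assumptions chosen only to show how inflected forms can collapse to one key.

```python
# Hypothetical suffix list, checked in this order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

With this rule, "link", "links", "linked", and "linking" all map to the same stem, so they can be treated as equivalent when indexing or searching.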

Can tags be inferred
automatically from
a text?

How can common words such as
articles and prepositions
(stop words) be removed?
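The last two questions fit together: once stop words are filtered out, the most frequent remaining words are plausible tag candidates. The sketch below assumes a short hypothetical English stop list and a purely frequency-based ranking; real tagging systems typically use richer signals.

```python
import re
from collections import Counter

# Hypothetical stop list; real ones are much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to", "with"}


def remove_stop_words(text):
    """Return the lowercase words of `text` that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def suggest_tags(text, limit=3):
    """Return up to `limit` most frequent non-stop words as candidate tags."""
    counts = Counter(remove_stop_words(text))
    return [word for word, _ in counts.most_common(limit)]
```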

What is the syntactic structure
of a sentence or clause?
(part of speech)

How can the ambiguity between
the different meanings of the
same word be resolved?
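One classic approach to this kind of disambiguation compares the words around the ambiguous term with a short definition (gloss) of each sense and picks the sense with the largest overlap, in the spirit of the simplified Lesk algorithm. The sense inventory below is a hypothetical hand-written one for the word "bank"; a real system would draw glosses from a lexical database.

```python
# Hypothetical two-sense inventory for "bank" with hand-written glosses.
SENSES = {
    "bank/finance": "institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a river or lake",
}


def disambiguate(context, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())

    def overlap(sense):
        return len(context_words & set(senses[sense].split()))

    # Ties are resolved in favor of the sense listed first.
    return max(senses, key=overlap)
```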

Can context information
be added
efficiently?

How can the signal-to-noise ratio
of the information a user
consumes be improved?

How can bigrams be
identified in a text?
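A bigram is simply a pair of adjacent words, so a first answer needs nothing beyond the standard library. The sketch below assumes whitespace tokenization and treats raw frequency as the measure of interest; the function names are illustrative.

```python
from collections import Counter


def bigrams(text):
    """Return adjacent word pairs in reading order."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def top_bigrams(text, limit=1):
    """Return the `limit` most frequent bigrams with their counts."""
    return Counter(bigrams(text)).most_common(limit)
```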

How many categories, and which ones,
are appropriate for classifying content
in a given ontology?

WEB 1.0: TAXONOMY (HIERARCHICAL CLASSIFICATION)
WEB 2.0: FOLKSONOMY (COLLABORATIVE TAGGING)
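The difference between the two classification styles shows up clearly as data structures. In this sketch, with hypothetical users, sites, and category names, a taxonomy is a fixed tree in which each item has one place, while a folksonomy is an index that grows as users attach free-form tags.

```python
from collections import defaultdict

# Taxonomy: each item lives at exactly one place in a fixed hierarchy.
TAXONOMY = {"Programming": {"Languages": ["python.org"]}}

# Folksonomy: any user can attach any tag to any item.
tag_index = defaultdict(set)


def tag(user, item, label):
    """Record that `user` tagged `item` with `label`."""
    tag_index[label.lower()].add((user, item))


def items_tagged(label):
    """Return the items carrying `label`, regardless of who tagged them."""
    pairs = tag_index.get(label.lower(), ())
    return sorted({item for _, item in pairs})
```

For example, after `tag("ana", "python.org", "Python")` and `tag("bob", "djangoproject.com", "python")`, both sites are returned by `items_tagged("python")` without anyone having designed a category tree in advance.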

PATTERN #8 
COMMUNITY &
COLLECTIVE
INTELLIGENCE

WEB 1.0: BUILD PORTALS
WEB 2.0: BUILD COMMUNITIES

PATTERN #9 
INTER-OPERABILITY &
DATA PORTABILITY

RSS, Atom, JSON, XML, RDF, XML-RPC, Widgets

WEB CRAWLERS / SPIDERS / BOTS (HTML SCRAPING)
WEB SYNDICATION, WIDGETS, OPEN APIs, AGGREGATION

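Example: Atom feed
from feedparser import parse

# Adjacent string literals are joined only inside parentheses.
url = ('http://blog.ianbicking.org/'
       'feed/atom/')
feed = parse(url)
# Collect the title of every entry in the feed.
titles = [i['title'] for i in feed['entries']]
print('\n'.join(titles))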
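For readers who want to see what such a feed looks like underneath, the same title extraction can be sketched with the standard library alone. The sample feed below is a hypothetical two-entry document inlined so the code runs offline; it uses the Atom namespace but is otherwise made up.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical two-entry feed, inlined so the example runs offline.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""


def entry_titles(atom_xml):
    """Return the title of every <entry> in an Atom document."""
    root = ET.fromstring(atom_xml)
    return [
        entry.findtext(ATOM_NS + "title")
        for entry in root.iter(ATOM_NS + "entry")
    ]
```

A dedicated feed library remains the more practical choice for real feeds, since it also copes with RSS variants and malformed documents.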

Example: Twitter API
from urllib2 import urlopen  # Python 2; urllib.request.urlopen in Python 3
from simplejson import loads

url = ('http://twitter.com/statuses/'
       'user_timeline/JohnCLeese.json')
raw_data = urlopen(url).read()
# Keep only the text of each item in the decoded JSON response.
tweets = [i['text'] for i in loads(raw_data)]
print('\n'.join(tweets))

Example: GData API
from gdata.youtube.service import YouTubeService

url = ('http://gdata.youtube.com/feeds/'
       'api/users/MontyPython/uploads')
client = YouTubeService()
feed = client.GetYouTubeVideoFeed(url)
# Each feed entry exposes its title as an element with a text attribute.
titles = [i.title.text for i in feed.entry]
print('\n'.join(titles))

Example: HTML scraping
from urllib2 import urlopen
from lxml.html import fromstring

url = ...
html = urlopen(url).read()
root = fromstring(html)

• Demo with Microformats on LinkedIn
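The demo itself is not reproduced in the slides, so the following is only a sketch of what scraping a microformat can look like, using the standard library instead of lxml. It assumes a hypothetical HTML snippet marked up with hCard, where the class `fn` marks a formatted name; the parser class and helper names are illustrative.

```python
from html.parser import HTMLParser

# Hypothetical profile snippet using the hCard microformat.
SAMPLE_HTML = """
<div class="vcard"><span class="fn">Eric Idle</span></div>
<div class="vcard"><span class="fn">Terry Jones</span></div>
"""


class HCardNames(HTMLParser):
    """Collect the text of every element whose class list includes 'fn'."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_fn = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "fn" in classes:
            self._in_fn = True
            self.names.append("")

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current name,
        # so markup nested inside an "fn" element is not supported.
        self._in_fn = False

    def handle_data(self, data):
        if self._in_fn:
            self.names[-1] += data


def extract_names(html):
    """Return the formatted names found in an hCard-annotated page."""
    parser = HCardNames()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]
```

Because the class names carry agreed-upon meaning, the scraper does not need to know anything about the page layout around them.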

PATTERN #10 
RICH USER
EXPERIENCE

FORM-BASED INTERACTION
(WITH PAGE RELOAD)

CLIENT-SIDE SCRIPTING WITH
ASYNCHRONOUS INTERACTION
(e.g. AJAX)

PATTERN #11 
SEPARATION OF
CONTENT AND
PRESENTATION

CLUTTERED HTML

<HTML>

<HEAD>
<TITLE>My Homepage</TITLE>
</HEAD>
<BODY BGCOLOR="YELLOW">
<SCRIPT>window.status="Hello!"</SCRIPT>

<TABLE WIDTH="640">
<TR><TD>

<CENTER>
<MARQUEE>Welcome to my homepage!</MARQUEE>

<IMG SRC="animation.gif" WIDTH="180"
HEIGHT="31" BORDER="1">
</CENTER>

<P><FONT COLOR="blue">This is a <BR>
<FONT FACE="Arial"><U>fake link</U>!
</FONT></P>
</TD>

<TD ALIGN="RIGHT">
<P>Sign my <A HREF="/cgi-bin/sign"
TARGET="_parent">guestbook</A></P>
<BLINK>Under <B>construction</B>!</BLINK>
</TD></TR>
</TABLE>
</BODY>
</HTML>

STRUCTURE & CONTENT ≠ PRESENTATION

STRUCTURE & CONTENT (XHTML) + PRESENTATION (CSS) + CLIENT-SIDE SCRIPTING (JS)

PATTERN #12 
WEB AS A
PLATFORM
(UBIQUITY)


WEB

TV


YOUR SYSTEM IS THE PLATFORM
PROPRIETARY SPECIFICATIONS
PLATFORM DEPENDENCE
BROWSER WARS

THE WEB IS THE PLATFORM
WEB STANDARDS
DEVICE INDEPENDENCE
CROSS-PLATFORM DESIGN

OMNIPRESENCE (UBIQUITY)

THE WEB IS WATCHING YOU!

• A robot, though not exactly
an electro-mechanical one
• Sensors:
humans and machines
• Actuators:
humans and machines
• Environment: the world
• AI: web apps +
Semantic Web + ...

• It processes data and
responds to stimuli
• Is it autonomous?
• Is it conscious?
• It is fault-tolerant
• It is “intelligent”
• It learns
• It evolves

© John W Liberto

Sounds far-fetched, doesn't it?

But could we have imagined, 20 years ago,
a web like the one we know today?

Besides, there are already applications...

Summary: The 12 patterns
• Information sharing
• User-centered design
• MVC architecture
• Participation & collaboration
• Social networking
• Search & recommendation engines
• Folksonomy
• Community & collective intelligence
• Inter-operability & data portability
• Rich user experience
• Content ≠ presentation
• Web as a platform

Conclusions

• The web as AI (like a robot)
• Mission of the semantic web:
to add value to information
• Technology? No, patterns
Thank you!

santiago.coffey@popego.com
http://twitter.com/scoffey
http://scoffey.popego.com
