The S4 project & code 
OWNER: Raphael Bonaque 
PRESENTER: Juan Álvaro Muñoz Naranjo 
OAK Code Days 
16-18 October 2014
(Very) general overview 
— S4 stands for “Social Semantic Structured Search” 
— Goal: RDF-based keyword search engine in social and structured 
environments (currently Twitter) 
— Keywords to be searched are defined by RDF semantics 
— Results are ranked by proximity and position of the (extended) 
keywords within the documents and their comments 
— Example: searching for “animal” should return tweets 
containing “cat”, “dog”, or “eagle”, sorted by the ranking criteria 
— Keywords are currently taken from DBPedia
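The keyword extension described above can be sketched as follows. This is a minimal illustration, not the S4 code: the SUBCLASSES dictionary is hypothetical toy data standing in for RDF semantics, the names expand_keyword and matching_tweets are invented for the example, and ranking by proximity and position is left out.

```python
# Toy class hierarchy standing in for RDF semantics (hypothetical data,
# not taken from DBPedia): each key lists its direct subclasses.
SUBCLASSES = {
    "animal": ["mammal", "bird"],
    "mammal": ["cat", "dog"],
    "bird": ["eagle"],
}


def expand_keyword(keyword, subclasses=SUBCLASSES):
    """Return the keyword plus every term reachable through subclass links."""
    expanded = []
    stack = [keyword]
    seen = set()
    while stack:
        term = stack.pop()
        if term in seen:
            continue
        seen.add(term)
        expanded.append(term)
        stack.extend(subclasses.get(term, []))
    return expanded


def matching_tweets(keyword, tweets):
    """Keep the tweets that contain at least one expanded keyword."""
    terms = set(expand_keyword(keyword))
    return [t for t in tweets if terms & set(t.lower().split())]


tweets = ["My cat sleeps all day", "Nice weather today", "An eagle flew by"]
print(matching_tweets("animal", tweets))
```

Neither matching tweet contains the word “animal” itself; both are found only through the extended keywords.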
Programming language 
— Python 
History, versions 
— Recent project 
— Two branches: 
— Storage through serialization 
— Storage via PostgreSQL 
— No code is reused from or into other projects
Code size 
                 Lines of Python   Python scripts (.py)   Classes
Serialization    ~1240             12                     9
PostgreSQL       ~1480             9                      7
Contributors, users 
— Raphael Bonaque 
Code repository 
— https://gforge.inria.fr/scm/viewvc.php/?root=xrp 
Folder “postgres4”: version for the PostgreSQL DB. 
(permission needed) 
Papers 
— R. Bonaque, B. Cautis, F. Goasdoué, I. Manolescu. Toward 
Social, Structured and Semantic Search. SDSW’14, co-located 
with ISWC’14. 
— R. Bonaque, B. Cautis, F. Goasdoué, I. Manolescu. S4 
Structured Social and Semantic Search (working draft).
Overview of the software 
— Input: 
— User query 
— Twitter (static) database 
— RDF semantics 
— Output: 
— A ranked collection of tweets
Main modules 
1. Tweets retrieval 
• Uses the Twitter API through the Tweepy library 
• Compresses retrieved data 
• receiving.py: tweet retriever through Tweepy 
• archiving.py: data compression and management 
• secrets.py: API key (not in the repo) 
2. Semantics retrieval & storage 
• RDF semantics creation & storage 
from DBPedia 
• rdf_db.py: PostgreSQL I/O wrapper 
3. Tweets storage 
• Decompresses tweets 
• Parses tweets according to the RDF semantics 
• Stores parsed tweets in DB 
• twitter_database.py: (old version) object serialization 
• social_db.py: (new version) PostgreSQL I/O wrapper 
• archiving.py 
• config.py: database parameters (connection string, etc.) 
4. Search engine 
• Search algorithm 
• algorithm.py: interface for algorithms 
• baseline_algorithm.py: actual algorithm and entry point
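The split between an algorithm interface and a baseline implementation could look roughly like the sketch below. Only the file roles and the entry-point name top() come from the slides; the class names, the signature top(query, k), and the scoring rule (counting query-term occurrences) are assumptions made for illustration and do not reproduce the proximity and position ranking used by S4.

```python
from abc import ABC, abstractmethod


class Algorithm(ABC):
    """Hypothetical interface: every search algorithm returns the top-k tweets."""

    @abstractmethod
    def top(self, query, k):
        """Return the k best-ranked tweets for the query."""


class BaselineAlgorithm(Algorithm):
    """Placeholder ranking: counts query-term occurrences in each tweet."""

    def __init__(self, tweets):
        self.tweets = list(tweets)

    def top(self, query, k):
        terms = query.lower().split()
        scored = []
        for tweet in self.tweets:
            words = tweet.lower().split()
            score = sum(words.count(term) for term in terms)
            if score > 0:
                scored.append((score, tweet))
        # Highest score first; the sort is stable, so ties keep input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tweet for _, tweet in scored[:k]]


engine = BaselineAlgorithm(["dog dog park", "cat nap", "dog walk", "rain"])
print(engine.top("dog", 2))
```

Keeping the interface separate means an alternative to the baseline only has to provide its own top() method.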
Workflow 
[Diagram: Module 1 (receiving.py, archiving.py), Module 2 (rdf_db.py), Module 3 (social_db.py, archiving.py), and Module 4 (baseline_algorithm.py; entry point: baseline_algorithm.top()). Data flow labels: tweets, compressed tweets, uncompressed tweets, keywords from DB. The user query goes in and the top-k tweets come out. Part of the pipeline is marked "Offline".]
External software 
— Tweepy: Twitter API interface for Python 
http://github.com/tweepy/tweepy 
— Twitter_nlp: Tweet natural language processing for Python 
http://github.com/aritter/twitter_nlp 
— Psycopg: PostgreSQL adapter for Python 
http://initd.org/psycopg 
— Scipy: scientific calculations library for Python 
http://www.scipy.org 
— XZ: data compression tool 
http://tukaani.org/xz 
— Matplotlib (soon): plotting library for Python 
http://matplotlib.org
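As an illustration of the compression step, the sketch below round-trips a batch of tweets through the .xz format using Python's standard lzma module. The slides only state that XZ is used; whether the project calls the xz tool or a library, and how tweets are serialized, is not specified, so the JSON encoding and the function names here are assumptions.

```python
import json
import lzma


def compress_tweets(tweets):
    """Serialize a list of tweet dicts to JSON and compress it in .xz format."""
    raw = json.dumps(tweets).encode("utf-8")
    return lzma.compress(raw, format=lzma.FORMAT_XZ)


def decompress_tweets(blob):
    """Inverse of compress_tweets: decompress and parse the JSON payload."""
    return json.loads(lzma.decompress(blob).decode("utf-8"))


tweets = [{"id": 1, "text": "my cat"}, {"id": 2, "text": "an eagle"}]
blob = compress_tweets(tweets)
print(decompress_tweets(blob) == tweets)
```

Data written this way can also be read by the command-line xz tool, since both use the same container format.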
TODO 
— Implement execution scripts 
— Testing, benchmarking 
— Graph drawing 
— Optimization: query rewriting 
— Use of the “RDF loader into PostgreSQL” project 
— Alternatives to the baseline algorithm 
Known bugs 
— Tweepy crashes randomly, so Raphael wrote a wrapper 
that restarts it when needed
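A restart wrapper of the kind mentioned above might look like the following sketch. It is not the project's wrapper: run_with_restart, its parameters, and the simulated flaky_fetch function are all hypothetical, and no Tweepy call is involved.

```python
import time


def run_with_restart(task, max_restarts=3, delay_seconds=0.0):
    """Call task(); if it raises, wait and call it again, up to max_restarts times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            attempt += 1
            if attempt > max_restarts:
                raise  # give up and surface the last error
            time.sleep(delay_seconds)


# Stand-in for a flaky retrieval step: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated crash")
    return ["tweet"]


print(run_with_restart(flaky_fetch))
```

Capping the number of restarts keeps a permanent failure (for example, an invalid API key) from looping forever.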
Thank you!

 

S4

  • 1. The S4 project & code
      - Owner: Raphael Bonaque
      - Presenter: Juan Álvaro Muñoz Naranjo
      - OAK Code Days, 16-18 October 2014
  • 2. (Very) general overview
      - S4 stands for “Social Semantic Structured Search”
      - Goal: RDF-based keyword search engine in social and structured environments (currently Twitter)
      - Keywords to be searched are defined by RDF semantics
      - Results are ranked by proximity and position of the (extended) keywords within the documents and their comments
      - Examples: searching for “animal” should return tweets containing “cat”, “dog”, “eagle” sorted by the ranking criteria
      - Keywords are currently taken from DBPedia
  • 3. Programming language
      - Python (the code-size slide counts .py scripts)
    History, versions
      - Recent project
      - Two branches:
          - Storage through serialization
          - Storage via PostgreSQL
      - No code is reused from or into other projects
  • 4. Code size
      Branch          Lines    Python scripts (.py)    Classes
      Serialization   ~1240    12                      9
      PostgreSQL      ~1480    9                       7
    Contributors, users
      - Raphael Bonaque
  • 5. Code repository
      - https://gforge.inria.fr/scm/viewvc.php/?root=xrp
      - Folder “postgres4”: version for the PostgreSQL DB (permission needed)
    Papers
      - R. Bonaque, B. Cautis, F. Goasdoué, I. Manolescu. Toward Social, Structured and Semantic Search. SDSW’14, co-located with ISWC’14.
      - R. Bonaque, B. Cautis, F. Goasdoué, I. Manolescu. S4 Structured Social and Semantic Search (working draft).
  • 6. Overview of the software
      - Input:
          - User query
          - Twitter (static) database
          - RDF semantics
      - Output:
          - A ranked collection of tweets
  • 7. Main modules
      1. Tweets retrieval
          - Uses the Twitter API through the Tweepy library
          - Compresses the retrieved data
          - receiving.py: tweet retriever through Tweepy
          - archiving.py: data compression and management
          - secrets.py: API key (not in the repo)
      2. Semantics retrieval & storage
          - RDF semantics creation & storage from DBPedia
          - rdf_db.py: PostgreSQL I/O wrapper
      3. Tweets storage
          - Decompresses tweets
          - Parses tweets according to the RDF semantics
          - Stores parsed tweets in the DB
          - twitter_database.py: (old version) object serialization
          - social_db.py: (new version) PostgreSQL I/O wrapper
          - archiving.py
          - config.py: database parameters (connection string, etc.)
      4. Search engine
          - Search algorithm
          - algorithm.py: interface for algorithms
          - baseline_algorithm.py: actual algorithm and entry point
  • 8. Workflow (diagram; only the box contents and arrow labels survive in the transcript)
      - Module 1: receiving.py, archiving.py
      - Module 2: rdf_db.py
      - Module 3: social_db.py, archiving.py
      - Module 4: baseline_algorithm.py (entry point: baseline_algorithm.top())
      - Arrow labels: tweets, compressed tweets, uncompressed tweets, keywords from DB, User query, Top-k tweets
      - Region label: Offline
  • 9. External software
      - Tweepy: Twitter API interface for Python, http://github.com/tweepy/tweepy
      - Twitter_nlp: tweet natural language processing for Python, http://github.com/aritter/twitter_nlp
      - Psycopg: PostgreSQL adapter for Python, http://initd.org/psycopg
      - SciPy: scientific calculations library for Python, http://www.scipy.org
      - XZ: data compression tool, http://tukaani.org/xz
      - Matplotlib (soon): plotting library for Python, http://matplotlib.org
  • 10. TODO
      - Implement execution scripts
      - Testing, benchmarking
      - Graph drawing
      - Optimization: query rewriting
      - Use of the “RDF loader into PostgreSQL” project
      - Alternatives to the baseline algorithm
    Known bugs
      - Tweepy crashes randomly, so Raphael wrote a wrapper that restarts it when needed
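
The known bug listed above is handled by a wrapper that restarts the Twitter client whenever it crashes. The slides do not show that wrapper, so the following is only a minimal sketch of the general idea, not the project's code. The names run_with_restart, start_stream, max_restarts, delay_seconds, retry_on and sleep are all hypothetical, and no Tweepy call is used: start_stream stands in for whatever function in receiving.py starts the retrieval.

```python
import time
from typing import Callable, Optional, Tuple, Type


def run_with_restart(
    start_stream: Callable[[], None],
    max_restarts: Optional[int] = None,
    delay_seconds: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call start_stream() and restart it each time it raises.

    Returns the number of restarts once start_stream() returns normally.
    Re-raises the last error when max_restarts is exhausted.
    All names here are illustrative assumptions, not the S4 API.
    """
    restarts = 0
    while True:
        try:
            start_stream()
            return restarts
        except retry_on as error:
            # Give up once the restart budget (if any) is used up.
            if max_restarts is not None and restarts >= max_restarts:
                raise
            restarts += 1
            print(f"stream crashed ({error!r}); restart #{restarts}")
            # Pause before restarting so a persistent failure does not spin.
            sleep(delay_seconds)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_stream() -> None:
        # Simulated retriever: fails twice, then finishes cleanly.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("simulated crash")

    print(run_with_restart(flaky_stream, delay_seconds=0.0))
```

Because retry_on defaults to Exception, a KeyboardInterrupt still stops the loop, which keeps an offline retrieval job easy to cancel by hand. A real wrapper would probably narrow retry_on to the errors the client is known to raise and log to a file rather than print.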