DISIT Lab, Distributed Data Intelligence and Technologies 
Distributed Systems and Internet Technologies 
Department of Information Engineering (DINFO) 
http://www.disit.dinfo.unifi.it
Ontology Building vs Data Harvesting and Cleaning for Smart‐city Services
Pierfrancesco Bellini, Monica Benigni, 
Riccardo Billero, Paolo Nesi, Nadia Rauch 
Dipartimento di Ingegneria dell’Informazione, DINFO 
Università degli Studi di Firenze 
Via S. Marta 3, 50139, Firenze, Italy 
Tel: +39-055-4796567, fax: +39-055-4796363 
DISIT Lab 
http://www.disit.dinfo.unifi.it alias http://www.disit.org , paolo.nesi@unifi.it 
Proc. of the 20th International Conference on Distributed Multimedia Systems, 
Pittsburgh, USA, August 2014 
DISIT Lab (DINFO UNIFI), DMS 2014, USA, August 2014 1
Smart‐City axes
• Smart Health
• Smart Education
• Smart Mobility
• Smart Energy
• Smart Governmental
– Smart economy
– Smart people
– Smart environment
– Smart living
• Smart Telecommunication
• Cities produce a huge amount of data every day
– ‘Static’ data
• Road graph
• Bus/train graph
• Services
• ...
– Dynamic (real time) data
• Weather conditions
• Traffic conditions
• Pollution status
• Bus/train positions
• Parking status
• People flows
• ...
– Open/Private Data
Smart‐City 
• Main Aim 
– Provide a platform able to ingest and take advantage of a large
amount of the above data (big data):
• Exploit data integration and reasoning
• Deliver new services and applications to citizens
• Leverage the ongoing Semantic Web effort
• Problems & Challenges
– Data are provided in many different formats, through different
protocols and conventions, by many different institutions, and
at different times
– Data are typically not aligned (e.g., street names, dates,
geolocations, tags, …); that is, they are not semantically
interoperable
– The result is a big data problem: volume, velocity, variability,
variety, …
Smart City Paradigm
[Figure: block diagram of the Smart City paradigm. Recoverable labels: data sources (Sensors, Real Time Data, Social Data trends, eGov Data collection, Sensors control, Traffic control, Energy central, eHealth Agency, Telecom. Services, Social Media, Citizens Formation); Peripheral processors; Data Harvesting; Smart City Engine (Data Ingesting and mining, Data processing, Real Time Computing, Reasoning and Deduction, Interoperability); Data Acting processors; Data / info Rendering; Data / info Exploitation; Suggestions and Alarms; Profiled Services; Applications.]
Smart‐city Ontology
• The data models provided have been mapped into
the ontology, which covers different aspects:
– Administration
– Street‐guide
– Points of interest
– Local public transport
– Sensors
– Temporal aspects
– Metadata on the data
[Figure: ontology macroclasses (Administration, Street‐guide, Point of Interest, Local public transport, Sensors, Temporal, MetaData) with example relations between classes: PA hasPublicOffice OFFICE; ADMINISTRATIVEROAD ownerAuthority PA; SERVICE isInRoad ROAD; BUSSTOP isInRoad ROAD; CARPARK isInRoad ROAD; CARPARKSENSOR observeCarPark CARPARK; WEATHERREPORT refersTo PA; SENSOR measuredTime TIME; BUS hasExpectedTime TIME; BUSSTOPFORECAST atBusStop BUSSTOP.]
• Administration: the structure of the general public administrations
(Municipality, Province and Region); it also includes Resolutions
(ordinances issued by administrations, which may change the viability,
infrastructural works, the schedule for RTZ, etc.)
• Street‐guide: formed by entities such as Road, Node, RoadElement,
AdministrativeRoad, Milestone, StreetNumber, RoadLink,
Junction, Entry, EntryRule, Maneuver, …; it represents the entire
road system of the region, including the permitted maneuvers
and the rules of access to the limited traffic zones. Based on the OTN
(Ontology of Transportation Networks) vocabulary
• Points of Interest: includes all services and activities that may be
useful to citizens and that they may need to search for
and reach: commercial, public administration, cultural, …
• Local public transport: includes the data related to the major local
public transport companies, such as scheduled times, the rail graph,
and real‐time data on passages at bus stops, real‐time
position, ...
• Sensors: data provided by sensors. Currently, data are collected
from various sensors (parking status, weather, pollution) installed
along some streets of Florence and surrounding areas, and from
sensors installed in the main car parks of the region.
– Plus: car sharing, bike sharing, AVM, RTZ, etc.
• Temporal: introduces time‐related concepts (time intervals
and instants) into the ontology, so that a timeline can be associated with
the recorded events and forecasts can be made. It uses
time ontologies such as OWL‐Time.
• Metadata: models the additional information associated with the data:
– Descriptors of the data sets that produced the triples: data set ID, title, description,
purpose, location, administration, version, responsible, etc.
– Licensing information
– Process information: IDs of the processes adopted for ingestion, quality improvement,
mapping, indexing, …; date and time of ingestion, update, review, …
When a problem is detected, this information makes it possible to understand when and how the
problem was introduced
• Includes basic ontologies such as:
– DC: Dublin Core, standard metadata
– OTN: Ontology of Transportation Networks
– FOAF: for the description of the relations among people or groups
– vCard: for the description of people and organizations
– wgs84_pos: for latitude and longitude, GPS info
– OWL‐Time: reasoning on time, time intervals
– GoodRelations: models of commercial activities
• Ontology size: 84 Classes, 93 ObjectProperties, 103 DataProperties
• http://www.disit.org/5606
Data Engineering Architecture
[Figure: data engineering pipeline. Recoverable labels: Data source 1, Data source 2, …, Data source n and RT Data source n, each feeding ETL Transformations driven by a Process Scheduler (Phase I); HBase; Quality improvement (Phase II); Mapping Process with R2RML Model and ontologies, producing RDF triples (Phase III); indexing into the RDF Store + indexes (Phase IV); Reconciliation (Phase V); Validation Process (Phase VI); SPARQL endpoint serving the Applications Service Map and Linked Open Graph (Phase VII).]
Phase I ‐ Data Ingestion 
• Ingesting a wide range of OD/PD (open/private data): public and private data; static, quasi‐
static and/or dynamic real‐time data.
• For the case of Florence, we are addressing about 150 different data
sources of the 564 available, plus those of the region, the province, other
municipalities, …
• Using Pentaho Kettle for data integration (open source tool)
– using specific Kettle ETL transformation processes (one or more for each
data source)
– data are stored in HBase (big data NoSQL database)
• Static and semi‐static data include: points of interest, geo‐referenced
services, maps, accident statistics, etc.
– files in several formats (SHP, KML, CSV, ZIP, XML, etc.)
• Dynamic data are mainly data coming from sensors
– parking, weather conditions, pollution measures, bus position, etc.
– using Web Services.
Phase II ‐ Data Quality Improvement 
• Kinds of problems:
– Inconsistencies, incompleteness, …
• Problems with:
– CAPs (postal codes) vs locations
– Street names (e.g., separating names from numbers, normalizing when
possible)
– Dates and times: normalizing
– Telephone numbers: normalizing
– Web links and emails: normalizing
• Partial usage of:
– Certified and accepted tables and additional knowledge
• The enrichment process may need several versions:
– VIP names, GeoNames, etc.
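As a rough illustration of the normalizations listed above, the following Python sketch separates a street name from its number and normalizes a telephone number. It is not the Kettle transformation used in the project; the function names, the regular expression, and the default "+39" prefix are assumptions made for this example.

```python
import re


def split_street_address(raw):
    """Split a raw address into (street name, street number).

    The number is None when the address does not end with one.
    """
    text = " ".join(raw.split())  # collapse repeated whitespace
    # Assumed pattern: a trailing number, optionally followed by a letter
    # and/or a "/suffix" (e.g. "34/A").
    match = re.match(r"^(.*?)[,\s]+(\d+\s*[A-Za-z]?(?:/\s*[A-Za-z0-9]+)?)$", text)
    if match is None:
        return text, None
    name, number = match.groups()
    return name.strip(), number.replace(" ", "")


def normalize_phone(raw, default_prefix="+39"):
    """Keep only digits and '+', and add a country prefix if missing.

    The default prefix (Italy) is an assumption for this sketch.
    """
    digits = re.sub(r"[^\d+]", "", raw)
    if digits.startswith("00"):
        digits = "+" + digits[2:]
    if not digits.startswith("+"):
        digits = default_prefix + digits
    return digits


print(split_street_address("Via S. Marta 3"))
print(normalize_phone("+39-055-4796567"))
```

Addresses without a trailing number, such as "Via XXVII Aprile", are returned unchanged with a number of None, so that later phases can treat them as a separate case.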
Phase III ‐ Data mapping 
• Transforms the data from HBase into RDF triples
• Using the Karma data integration tool, a mapping model
from SQL to RDF was created on the basis of the
ontology
– Data to be mapped are first temporarily passed from HBase to
MySQL and then mapped using Karma (in batch mode)
• The mapped data, as triples, have to be uploaded to (and
indexed in) the RDF Store (OpenRDF Sesame with
OWLIM‐SE)
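To make the idea of mapping a relational row to RDF triples concrete, here is a minimal Python sketch that emits N-Triples lines for one service record. It does not use Karma and does not reproduce the project's mapping model: the namespace `http://example.org/smartcity/`, the `name` property, and the row layout are hypothetical; only the `isInRoad` relation is taken from the ontology slides.

```python
BASE = "http://example.org/smartcity/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Serialize a value as an N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_triples(row):
    """Map one (hypothetical) service row to a list of N-Triples lines."""
    subject = f"<{BASE}service/{row['id']}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{BASE}Service> .",
        f"{subject} <{BASE}name> {literal(row['name'])} .",
    ]
    # The link to the road graph exists only if the row already carries a road id;
    # otherwise it is left to the reconciliation phase.
    if row.get("road_id"):
        triples.append(f"{subject} <{BASE}isInRoad> <{BASE}road/{row['road_id']}> .")
    return triples


example_row = {"id": "42", "name": 'Bar "Centrale"', "road_id": "R001"}
for line in row_to_triples(example_row):
    print(line)
```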
Phase IV ‐ Indexing 
• Periodic task for reindexing: triples, text, space (GPS), dates, etc. 
• Indexing triples: ontologies, all RDF files for OD, RT triples (from ‐ 
to), reconciliation triples for OD, triples for enrichments, etc. 
• If you do not index, you cannot identify all missing reconciliations 
Phase V ‐ Data Reconciliation/alignment 
• After loading and indexing into the RDF store, a dataset may
be connected with the others if their entities refer to the same triples
– Missed connections strongly limit the usage of the knowledge base,
– e.g., when the services are not connected with the road graph.
• Goal: associate each Service with a Road and an Entity on the basis
of the street name, number and locality
• This is not easy, since the data come from different sources
• Examples:
– Typos;
– Missing street number, or street number replaced with "0" or "SNC";
– Municipalities not written with their official name (e.g. Vicchio/Vicchio del Mugello);
– Street names and street numbers with strange characters (‐, /, °, ?, Ang., ,);
– Road names with words in a different order (e.g. Via Petrarca Francesco, exchange of name and surname);
– Red street numbers (for shops);
– Presence/absence of proper names in the road name (e.g. via Camillo Benso di Cavour / via Cavour);
– Numbers wrongly written (e.g. 34/AB, 403D, 36INT.1);
– Roman numerals in the road name (e.g., via XXVII Aprile).
• Steps:
1. SPARQL Exact Match: match the strings as they are
2. SPARQL Enhanced Exact Match: make some substitutions (Via S. Marta → Via Santa Marta, ...)
3. Last Word Search: use only the last word of the street name
4. Use the Google Geocoding API
5. Remove ‘strange chars’ (‐, /, °, ?, Ang., ,) from the street number
6. Remove ‘strange chars’ from the street name
7. Rewrite wrong municipality names
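The first three steps above can be sketched as a cascade that stops at the first successful match. The sketch below works on an in-memory list of road names instead of issuing SPARQL queries against the RDF store, so it only illustrates the order of the steps; the substitution table and the rule of accepting a last-word match only when it is unambiguous are assumptions of this example.

```python
import re

# Hypothetical substitution table; the slide only gives "Via S. Marta -> Via Santa Marta".
SUBSTITUTIONS = [(r"\bs\.\s*", "santa ")]


def canonical(name):
    """Lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def enhanced(name):
    """Canonical form after applying the substitution table."""
    text = canonical(name)
    for pattern, replacement in SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text)
    return canonical(text)


def reconcile(service_street, road_names):
    """Return (matched road name, step used), or (None, None) if nothing matches."""
    # Step 1: exact match, strings as they are.
    if service_street in road_names:
        return service_street, "exact"
    # Step 2: enhanced exact match, after substitutions and case folding.
    index = {enhanced(road): road for road in road_names}
    key = enhanced(service_street)
    if key in index:
        return index[key], "enhanced"
    # Step 3: last word search; accepted only when exactly one road matches
    # (an assumption of this sketch, to avoid ambiguous links).
    words = key.split()
    if words:
        candidates = [r for r in road_names if enhanced(r).split()[-1] == words[-1]]
        if len(candidates) == 1:
            return candidates[0], "last_word"
    return None, None


ROADS = ["Via Santa Marta", "Via Camillo Benso di Cavour", "Via Francesco Petrarca"]
for street in ["Via Santa Marta", "VIA S. MARTA", "via Cavour", "Via Roma"]:
    print(street, "->", reconcile(street, ROADS))
```

Note that the last-word step would not resolve "Via Petrarca Francesco", where name and surname are exchanged; cases like this are left to the later steps (geocoding and character cleaning), which are not sketched here.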
Phase V ‐ Data Reconciliation/alignment 
Comparing different reconciliation approaches based on:
• the SILK link discovery language
• the SPARQL‐based reconciliation described above

Method                                                               Precision  Recall  F1
SPARQL‐based reconciliation                                          1.00       0.69    0.820
SPARQL‐based reconciliation + additional manual review               0.985      0.722   0.833
Link discovering ‐ Levenshtein                                       0.927      0.508   0.656
Link discovering ‐ Dice                                              0.968      0.674   0.794
Link discovering ‐ Jaccard                                           1.000      0.472   0.642
Link discovering + heuristics based on data knowledge + Levenshtein  0.925      0.714   0.806

Thus, automated reconciliation is possible and produces
acceptable results.
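The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python sketch below recomputes it from the precision and recall values reported in the table; the short method labels are abbreviations introduced for this example. Because the reported precision and recall are themselves rounded, the recomputed values can differ from the table in the third decimal.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method label, precision, recall, F1 as reported in the table)
RESULTS = [
    ("SPARQL", 1.00, 0.69, 0.820),
    ("SPARQL + manual review", 0.985, 0.722, 0.833),
    ("Link discovering, Levenshtein", 0.927, 0.508, 0.656),
    ("Link discovering, Dice", 0.968, 0.674, 0.794),
    ("Link discovering, Jaccard", 1.000, 0.472, 0.642),
    ("Link discovering + heuristics + Levenshtein", 0.925, 0.714, 0.806),
]

for method, precision, recall, reported in RESULTS:
    recomputed = f1_score(precision, recall)
    print(f"{method}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```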
Phase VI ‐ Validation 
• A set of queries is applied automatically to verify
consistency and completeness after each new
re‐indexing and new data integration
– I.e., regression testing of the KB
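The regression-testing idea can be sketched as a fixed set of named checks that is re-run after every update and reports only the failures. In the platform the checks are queries against the RDF store; in this Python sketch they are plain functions over a small in-memory list of triples, and the check names, the prefixed identifiers, and the sample data are all hypothetical.

```python
def services_without_road(triples):
    """Completeness check: every Service should have an isInRoad link."""
    services = {s for s, p, o in triples if p == "rdf:type" and o == "Service"}
    linked = {s for s, p, o in triples if p == "isInRoad"}
    return sorted(services - linked)


def dangling_road_links(triples):
    """Consistency check: isInRoad must point to a declared Road."""
    roads = {s for s, p, o in triples if p == "rdf:type" and o == "Road"}
    return sorted(s for s, p, o in triples if p == "isInRoad" and o not in roads)


CHECKS = {
    "services without road": services_without_road,
    "dangling road links": dangling_road_links,
}


def run_regression(triples):
    """Run every check and return only the failing ones with their offenders."""
    failures = {}
    for name, check in CHECKS.items():
        offenders = check(triples)
        if offenders:
            failures[name] = offenders
    return failures


# Hypothetical knowledge-base content after an update.
KB = [
    ("service:1", "rdf:type", "Service"),
    ("service:1", "isInRoad", "road:A"),
    ("service:2", "rdf:type", "Service"),
    ("road:A", "rdf:type", "Road"),
]
print(run_regression(KB))
```

An empty result means the update did not break any of the monitored properties; a non-empty one lists the entities to inspect.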
Phase VII ‐ Data access 
• Applications can access the data through the SPARQL
endpoint. Currently there are two applications:
– ServiceMap (http://servicemap.disit.org), a map‐
based application
– Linked Open Graph (http://log.disit.org), for browsing
data from SPARQL/Linked Data sources
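As an illustration of how an application might address a SPARQL endpoint, the Python sketch below builds the URL of a query sent via HTTP GET, with the query text in the `query` parameter as defined by the SPARQL Protocol. The endpoint address, the `sc:` namespace, and the query itself are hypothetical placeholders and do not describe the actual DISIT endpoint; the sketch only builds the URL and performs no network request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address

# Hypothetical query: services and the roads they are located in.
QUERY = """
PREFIX sc: <http://example.org/smartcity#>
SELECT ?service ?road WHERE {
  ?service a sc:Service ;
           sc:isInRoad ?road .
} LIMIT 10
"""


def build_query_url(endpoint, query):
    """Return the GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def extract_query(url):
    """Decode the query text back from a URL built by build_query_url."""
    return parse_qs(urlparse(url).query)["query"][0]


url = build_query_url(ENDPOINT, QUERY)
print(url)
```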
[Figure: screenshot of ServiceMap, http://servicemap.disit.org]
[Figure: screenshot of Linked Open Graph, http://log.disit.org]
Conclusions 
• Developed
– A Smart‐city Ontology as conceptual model for reasoning
– A platform of big data tools for smart‐city data ingestion and
semantic interoperability processes
– The assessment demonstrated that automated reconciliation is
possible
• Future/ongoing activities
– Improvement of data alignment and cleaning
– Definition of languages and tools for reasoning
• It will be used in the Sii‐Mobility project:
– Adding prediction algorithms
– Adding user‐generated information
– Adding more applications using the data
Thank you! 
http://www.disit.org/5606 
Paolo Nesi, paolo.nesi@unifi.it

Snap4City November 2019 Course: Smart City IOT Data Ingestion Interoperabilit...
 
Open Urban Platform for Smart City: Technical View
Open Urban Platform for Smart City: Technical View Open Urban Platform for Smart City: Technical View
Open Urban Platform for Smart City: Technical View
 
Big Data, open data, IOT
Big Data, open data, IOTBig Data, open data, IOT
Big Data, open data, IOT
 
Big Data, Open data, IOT
Big Data, Open data, IOTBig Data, Open data, IOT
Big Data, Open data, IOT
 
Open Urban Platform: Technical View 2018: Km4City
Open Urban Platform: Technical View 2018: Km4CityOpen Urban Platform: Technical View 2018: Km4City
Open Urban Platform: Technical View 2018: Km4City
 
Km4city: open flexible scalable city platform
Km4city: open flexible scalable city platformKm4city: open flexible scalable city platform
Km4city: open flexible scalable city platform
 
Overview on Smart City: Smart City for Beginners
Overview on Smart City: Smart City for BeginnersOverview on Smart City: Smart City for Beginners
Overview on Smart City: Smart City for Beginners
 
Snap4City November 2019 Course: Smart City IOT Data Analytics
Snap4City November 2019 Course: Smart City IOT Data AnalyticsSnap4City November 2019 Course: Smart City IOT Data Analytics
Snap4City November 2019 Course: Smart City IOT Data Analytics
 
Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...
Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...
Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...
 
Keynote: Making Smarter Tuscany and Florence with Km4City
Keynote: Making Smarter Tuscany and Florence with Km4CityKeynote: Making Smarter Tuscany and Florence with Km4City
Keynote: Making Smarter Tuscany and Florence with Km4City
 
DAI DATI INTELLIGENTI AI SERVIZI Smart City API Hackathon
DAI DATI INTELLIGENTI AI SERVIZI Smart City API HackathonDAI DATI INTELLIGENTI AI SERVIZI Smart City API Hackathon
DAI DATI INTELLIGENTI AI SERVIZI Smart City API Hackathon
 
Complexity of IOT/IOE Architectures for Smart Service Infrastructures Panel:...
Complexity of IOT/IOE Architectures for  Smart Service Infrastructures Panel:...Complexity of IOT/IOE Architectures for  Smart Service Infrastructures Panel:...
Complexity of IOT/IOE Architectures for Smart Service Infrastructures Panel:...
 
scalable Smart aNalytic APplication builder for sentient Cities Overview -- S...
scalable Smart aNalytic APplication builder for sentient Cities Overview -- S...scalable Smart aNalytic APplication builder for sentient Cities Overview -- S...
scalable Smart aNalytic APplication builder for sentient Cities Overview -- S...
 
Gli open data nella “città intelligente”
Gli open data nella “città intelligente”Gli open data nella “città intelligente”
Gli open data nella “città intelligente”
 
Km4City: una soluzione aperta per erogare servizi Smart City
Km4City: una soluzione aperta per erogare servizi Smart CityKm4City: una soluzione aperta per erogare servizi Smart City
Km4City: una soluzione aperta per erogare servizi Smart City
 
Twitter Vigilance: a Multi-User platform for Cross-Domain Twitter Data Analyt...
Twitter Vigilance: a Multi-User platform for Cross-Domain Twitter Data Analyt...Twitter Vigilance: a Multi-User platform for Cross-Domain Twitter Data Analyt...
Twitter Vigilance: a Multi-User platform for Cross-Domain Twitter Data Analyt...
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Ontology Building vs Data Harvesting and Cleaning for Smart-city Services

  • 1. DISIT Lab, Distributed Data Intelligence and Technologies; Distributed Systems and Internet Technologies; Department of Information Engineering (DINFO), http://www.disit.dinfo.unifi.it
    Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
    Pierfrancesco Bellini, Monica Benigni, Riccardo Billero, Paolo Nesi, Nadia Rauch
    Dipartimento di Ingegneria dell’Informazione, DINFO, Università degli Studi di Firenze, Via S. Marta 3, 50139, Firenze, Italy
    Tel: +39-055-4796567, fax: +39-055-4796363
    DISIT Lab, http://www.disit.dinfo.unifi.it alias http://www.disit.org, paolo.nesi@unifi.it
    Proc. of the 20th International Conference on Distributed Multimedia Systems, Pittsburgh, USA, August 2014
    DISIT Lab (DINFO UNIFI), DMS 2014, USA, August 2014
  • 2. Smart-City axes
    • Cities produce a huge amount of data every day
      – 'Static' data
        • Road graph
        • Bus/train graph
        • Services
        • ...
      – Dynamic (real-time) data
        • Weather conditions
        • Traffic conditions
        • Pollution status
        • Bus/train positions
        • Parking status
        • People flows
        • ...
      – Open/private data
    • Smart Health
    • Smart Education
    • Smart Mobility
    • Smart Energy
    • Smart Governmental
      – Smart economy
      – Smart people
      – Smart environment
      – Smart living
    • Smart Telecommunication
  • 3. Smart-City
    • Main aim
      – Provide a platform able to ingest and take advantage of a large amount of the above data (big data):
        • Exploit data integration and reasoning
        • Deliver new services and applications to citizens
      – Leverage the ongoing Semantic Web effort
    • Problems & challenges
      – Data are provided in many different formats and protocols, by many different institutions, with different conventions and at different times
      – Data are typically not aligned (e.g., street names, dates, geolocations, tags, ...); that is, they are not semantically interoperable
      – The result is a big data problem: volume, velocity, variability, variety, ...
  • 4. Smart City Paradigm
    [Figure: Smart City Paradigm. Labels in the diagram: Data Sensors; Data Harvesting; Peripheral processors; eGov Data collection; Sensors control; Traffic control; Energy central; eHealth Agency; Telecom. Services; Social Media; Smart City Engine; Data Ingesting and mining; Reasoning and Deduction; Real Time Computing; Data processing; Data Acting processors; Data / info Rendering; Data / info Exploitation; Suggestions and Alarms; Real Time Data; Social Data trends; Interoperability; Profiled Services; Citizens Formation; Applications.]
  • 5. Smart-city Ontology
    • The provided data model has been mapped into the ontology, which covers different aspects:
      – Administration
      – Street-guide
      – Points of interest
      – Local public transport
      – Sensors
      – Temporal aspects
      – Metadata on the data
    [Figure: macroclass diagram (Administration, Street-guide, Point of Interest, Local public transport, Sensors, Temporal). Classes shown: PA, OFFICE, ADMINISTRATIVEROAD, SERVICE, ROAD, BUSSTOP, CARPARK, CARPARKSENSOR, WEATHERREPORT, SENSOR, TIME, BUS, BUSSTOPFORECAST. Properties shown: hasPublicOffice, ownerAuthority, isInRoad, observeCarPark, refersTo, measuredTime, hasExpectedTime, atBusStop.]
  • 6. Smart-city Ontology
    • Administration: the structure of the general public administrations (Municipality, Province and Region). It also includes Resolutions, i.e. ordinances issued by administrations, which may change road traffic conditions, infrastructural works, the schedule for the RTZ, etc.
    • Street-guide: formed by entities such as Road, Node, RoadElement, AdministrativeRoad, Milestone, StreetNumber, RoadLink, Junction, Entry, EntryRule, Maneuver, ... It represents the entire road system of the region, including the permitted maneuvers and the rules of access to the limited traffic zones. It is based on the OTN (Ontology of Transportation Networks) vocabulary.
    • Points of Interest: includes all services and activities that may be useful to citizens and that they may need to search for and reach: commercial, public administration, cultural, ...
  • 7. Smart-city Ontology
    • Local public transport: includes data from the major local public transport companies, such as scheduled times, the rail graph, real-time passages at bus stops, real-time positions, ...
    • Sensors: data provided by sensors. Currently, data are collected from various sensors (parking status, weather, pollution) installed along some streets of Florence and surrounding areas, and from sensors installed in the main car parks of the region.
      – Plus: car sharing, bike sharing, AVM, RTZ, etc.
    • Temporal: introduces time-related concepts (time intervals and instants) into the ontology, so that a timeline can be associated with the recorded events and forecasts can be made. It uses time ontologies such as OWL-Time.
  • 8. Smart-city Ontology
    • Metadata: models the additional information associated with the data:
      – Descriptors of the data sets that produced the triples: data set ID, title, description, purpose, location, administration, version, responsible, etc.
      – Licensing information
      – Process information: IDs of the processes adopted for ingestion, quality improvement, mapping, indexing, ...; date and time of ingestion, update, review, ... When a problem is detected, this information makes it possible to understand when and how the problem was introduced.
    • Basic ontologies included:
      – DC: Dublin Core, standard metadata
      – OTN: Ontology for Transport Network
      – FOAF: description of the relations among people or groups
      – vCard: description of people and organizations
      – wgs84_pos: latitude and longitude, GPS info
      – OWL-Time: reasoning on time and time intervals
      – GoodRelations: models of commercial activities
  • 9. Smart-city Ontology: 84 classes, 93 object properties, 103 data properties. http://www.disit.org/5606
  • 10. Data Engineering Architecture
    [Figure: Data Engineering Architecture, organized in Phases I to VII. Labels in the diagram: Data source 1, Data source 2, ..., Data source n; RT Data source n; ETL Trans.; Process Scheduler; HBase; Quality improvement; Mapping Process; R2RML Model; RDF triples; indexing; ontologies; RDF Store + indexes; Reconciliation; Validation Process; SPARQL endpoint; Applications; Service Map; Linked Open Graph.]
  • 11. Phase I - Data Ingestion
    • Ingesting a wide range of OD/PD: public and private data; static, quasi-static and/or dynamic real-time data.
    • For the case of Florence, we are addressing about 150 different data sources of the 564 available, plus those of the region, the province, other municipalities, ...
    • Using Pentaho Kettle for data integration (open source tool)
      – using specific Kettle ETL transformation processes (one or more for each data source)
      – data are stored in HBase (big data NoSQL database)
    • Static and semi-static data include: points of interest, geo-referenced services, maps, accident statistics, etc.
      – files in several formats (SHP, KML, CSV, ZIP, XML, etc.)
    • Dynamic data are mainly data coming from sensors
      – parking, weather conditions, pollution measures, bus positions, etc.
      – collected using Web Services.
  • 12. Phase II - Data Quality Improvement
    • Kinds of problems:
      – Inconsistencies, incompleteness, ...
    • Problems on:
      – CAPs (postal codes) vs locations
      – Street names (e.g., separating names from numbers, normalizing when possible)
      – Dates and times: normalizing
      – Telephone numbers: normalizing
      – Web links and emails: normalizing
    • Partial usage of:
      – Certified and accepted tables and additional knowledge
    • The enrichment process may need several versions:
      – VIP names, GeoNames, etc.
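The normalization tasks listed above can be illustrated with a minimal Python sketch. It is not the Kettle transformations used by the authors: the function names, the regular expression, the default country code "39" and the accepted date formats are all assumptions chosen for illustration.

```python
import re
from datetime import datetime


def split_street(address):
    """Split an address into (street name, street number).

    Illustrative rule only: a trailing number with an optional letter
    is treated as the street number; otherwise the number is None.
    """
    m = re.match(r"^\s*(.*?)[\s,]+(\d+\s*[A-Za-z]?)\s*$", address)
    if m:
        return m.group(1).strip(), m.group(2).replace(" ", "").upper()
    return address.strip(), None


def normalize_phone(raw, country_code="39"):
    """Return the number as '+<digits>'; country_code is an assumed default."""
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("00"):
        digits = digits[2:]               # 0039... -> 39...
    elif not raw.strip().startswith("+"):
        digits = country_code + digits    # national number: add the prefix
    return "+" + digits


def normalize_date(raw, formats=("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d")):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(split_street("Via S. Marta 3"))
print(normalize_phone("055 4796567"))
print(normalize_date("27/04/2014"))
```

Values that cannot be normalized are returned as None rather than guessed, so they can be routed to a manual review step.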
  • 13. Phase III - Data mapping
    • Transforms the data from HBase into RDF triples
    • Using the Karma Data Integration tool, a mapping model from SQL to RDF was created on the basis of the ontology
      – The data to be mapped are first temporarily moved from HBase to MySQL and then mapped using Karma (in batch mode)
    • The mapped data, as triples, have to be uploaded to (and indexed in) the RDF Store (OpenRDF Sesame with OWLIM-SE)
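To make the idea of mapping tabular rows to RDF triples concrete, here is a minimal Python sketch that emits N-Triples lines. It does not use Karma or R2RML: the BASE and ONTO namespaces, the row fields (id, name, road_id) and the helper names are hypothetical. Only the Service class and the isInRoad property are names taken from the ontology slides.

```python
# Hypothetical namespaces, used only for illustration.
BASE = "http://example.org/resource/"
ONTO = "http://example.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(value):
    """Serialize a value as a plain N-Triples literal (escaping \\ and ")."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_triples(row):
    """Map one service row (a dict) to a list of N-Triples lines."""
    subject = f"<{BASE}service/{row['id']}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{ONTO}Service> .",
        f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .",
    ]
    # The link to the road graph is emitted only when it is known;
    # missing links are left to the reconciliation phase.
    if row.get("road_id"):
        triples.append(
            f"{subject} <{ONTO}isInRoad> <{BASE}road/{row['road_id']}> ."
        )
    return triples


for line in row_to_triples({"id": "42", "name": "Bar Centrale", "road_id": "RT048"}):
    print(line)
```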
  • 14. Phase IV - Indexing
    • Periodic task for reindexing: triples, text, space (GPS), dates, etc.
    • Indexing triples: ontologies, all RDF files for OD, RT triples (from - to), reconciliation triples for OD, triples for enrichments, etc.
    • Without indexing, it is not possible to identify all the missing reconciliations
    Phase V - Data Reconciliation/alignment
    • After loading and indexing into the RDF store, a dataset may be connected with the others if entities refer to the same triples
      – Missed connections strongly limit the usage of the knowledge base
      – e.g., the services are not connected with the road graph
    • The goal is to associate each Service with a Road and an Entity on the basis of the street name, number and locality
    • This is not easy, because the data come from different sources
  • 15. Phase V - Data Reconciliation/alignment
    • Examples:
      – Typos;
      – Missing street number, or replaced with "0" or "SNC";
      – Municipalities with no official name (e.g. Vicchio/Vicchio del Mugello);
      – Street names and street numbers with strange characters (-, /, °, ?, Ang., ,);
      – Road name with words in a different order (e.g. Via Petrarca Francesco, exchange of name and surname);
      – Red street numbers (for shops);
      – Presence/absence of proper names in the road name (e.g. via Camillo Benso di Cavour / via Cavour);
      – Number wrongly written (e.g. 34/AB, 403D, 36INT.1);
      – Roman numerals in the road name (e.g., via XXVII Aprile).
    • Steps:
      1. SPARQL Exact Match: match the strings as they are
      2. SPARQL Enhanced Exact Match: make some substitutions (Via S. Marta → Via Santa Marta, ...)
      3. Last Word Search: use only the last word of the street name
      4. Use the Google GeoCoding API
      5. Remove 'strange chars' (-, /, °, ?, Ang., ,) from the street number
      6. Remove 'strange chars' from the street name
      7. Rewrite wrong municipality names
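The first three steps above can be sketched as a cascade in which each step runs only if the previous one found nothing. This is an illustration under stated assumptions, not the authors' implementation: it matches against a small in-memory dictionary instead of issuing SPARQL queries, and the ROADS table, the SUBSTITUTIONS rules and the function names are hypothetical.

```python
import re

# Hypothetical road graph: normalized road name -> road identifier.
ROADS = {
    "VIA SANTA MARTA": "road/1",
    "VIA FRANCESCO PETRARCA": "road/2",
    "VIA CAMILLO BENSO DI CAVOUR": "road/3",
}

# Hypothetical substitution rules for the enhanced exact match.
# Simplification: "S." is always expanded to "SANTA".
SUBSTITUTIONS = [(r"\bS\.\s*", "SANTA ")]


def clean(name):
    """Collapse whitespace and upper-case the name."""
    return re.sub(r"\s+", " ", name).strip().upper()


def exact_match(name):
    return ROADS.get(clean(name))


def enhanced_exact_match(name):
    candidate = clean(name)
    for pattern, replacement in SUBSTITUTIONS:
        candidate = re.sub(pattern, replacement, candidate)
    return ROADS.get(clean(candidate))


def last_word_search(name):
    words = clean(name).split()
    if not words:
        return None
    hits = [rid for road, rid in ROADS.items() if road.split()[-1] == words[-1]]
    # Accept the match only when it is unambiguous.
    return hits[0] if len(hits) == 1 else None


def reconcile(name):
    """Return (road id, name of the step that matched) or (None, None)."""
    steps = (
        ("exact", exact_match),
        ("enhanced", enhanced_exact_match),
        ("last_word", last_word_search),
    )
    for step_name, step in steps:
        road_id = step(name)
        if road_id:
            return road_id, step_name
    return None, None


for street in ("Via Santa Marta", "via S. Marta", "via Cavour", "Via Petrarca Francesco"):
    print(street, "->", reconcile(street))
```

In this sketch "Via Petrarca Francesco" stays unmatched, because swapped name and surname defeat all three steps; cases like this are the ones that would fall through to the later steps in the list.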
  • 16. Phase V - Data Reconciliation/alignment
    Comparing different reconciliation approaches based on:
    • the SILK link discovery language
    • the SPARQL-based reconciliation described above

    Method                                                              Precision  Recall  F1
    SPARQL-based reconciliation                                         1.00       0.69    0.820
    SPARQL-based reconciliation + additional manual review              0.985      0.722   0.833
    Link discovery - Levenshtein                                        0.927      0.508   0.656
    Link discovery - Dice                                               0.968      0.674   0.794
    Link discovery - Jaccard                                            1.000      0.472   0.642
    Link discovery + heuristics based on data knowledge + Levenshtein   0.925      0.714   0.806

    Thus, automation of reconciliation is possible and produces acceptable results.
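For reference, the F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python sketch below recomputes it from the reported precision and recall; the RESULTS dictionary simply restates the table, and the function name is an assumption. Because the slide reports rounded precision and recall, a recomputed value can differ from the reported F1 in the last digits (for the first row, about 0.817 against the reported 0.820).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), restated from the table above.
RESULTS = {
    "SPARQL-based": (1.00, 0.69, 0.820),
    "SPARQL-based + manual review": (0.985, 0.722, 0.833),
    "Link discovery - Levenshtein": (0.927, 0.508, 0.656),
    "Link discovery - Dice": (0.968, 0.674, 0.794),
    "Link discovery - Jaccard": (1.000, 0.472, 0.642),
    "Link discovery + heuristics + Levenshtein": (0.925, 0.714, 0.806),
}

for method, (p, r, reported) in RESULTS.items():
    print(f"{method}: recomputed F1 = {f1_score(p, r):.3f}, reported = {reported:.3f}")
```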
  • 17. Phase VI - Validation
    • A set of queries is applied automatically to verify consistency and completeness after each new re-indexing and each new data integration
      – i.e., regression testing of the KB
    Phase VII - Data access
    • Applications can access the data using the SPARQL endpoint. Currently there are two applications:
      – ServiceMap (http://servicemap.disit.org), a map-based application
      – Linked Open Graph (http://log.disit.org), for browsing the data from SPARQL/Linked Data sources
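The validation phase can be pictured as a fixed set of checks that is re-run after every load, in the spirit of regression tests. The Python sketch below is an assumption-laden illustration: it runs over an in-memory list of (subject, predicate, object) tuples rather than SPARQL queries against the RDF store, and the two checks and all function names are hypothetical examples of consistency and completeness rules.

```python
RDF_TYPE = "rdf:type"


def services_without_road(triples):
    """Completeness check: every Service should be linked to a Road."""
    services = {s for s, p, o in triples if p == RDF_TYPE and o == "Service"}
    linked = {s for s, p, o in triples if p == "isInRoad"}
    return sorted(services - linked)


def subjects_without_type(triples):
    """Consistency check: every subject should declare a type."""
    typed = {s for s, p, o in triples if p == RDF_TYPE}
    subjects = {s for s, p, o in triples}
    return sorted(subjects - typed)


# Hypothetical check registry; a real deployment would hold SPARQL queries.
CHECKS = {
    "services_without_road": services_without_road,
    "subjects_without_type": subjects_without_type,
}


def validate(triples):
    """Run every check and report only those that found offenders."""
    report = {}
    for name, check in CHECKS.items():
        offenders = check(triples)
        if offenders:
            report[name] = offenders
    return report


kb = [
    ("service/1", RDF_TYPE, "Service"),
    ("service/1", "isInRoad", "road/1"),
    ("service/2", RDF_TYPE, "Service"),
    ("road/1", RDF_TYPE, "Road"),
]
print(validate(kb))
```

An empty report means that all checks passed; any entry lists the subjects that need attention before the new data are exposed to applications.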
  • 18. [Screenshot: ServiceMap] http://servicemap.disit.org
  • 19. [Screenshot: Linked Open Graph] http://log.disit.org
  • 20. Conclusions
    • Developed
      – A Smart-city Ontology as conceptual model for reasoning
      – A platform for smart-city data ingestion and semantic interoperability processes, as big data tools
      – An assessment demonstrating that automated reconciliation is possible
    • Future/ongoing activities
      – Improvement of data alignment and cleaning
      – Definition of languages and tools for reasoning
    • It will be used in the Sii-Mobility project:
      – Adding prediction algorithms
      – Adding user-generated information
      – Adding more applications using the data
  • 21. References
    • Caragliu, A., Del Bo, C., Nijkamp, P. (2009), Smart cities in Europe, 3rd Central European Conference in Regional Science – CERS, Kosice (SK), 7-9 October 2009.
    • Bellini P., Di Claudio M., Nesi P., Rauch N., "Tassonomy and Review of Big Data Solutions Navigation", Big Data Computing, Chapman and Hall/CRC, to be published 26 July 2013.
    • Vilajosana, I.; Llosa, J.; Martinez, B.; Domingo-Prieto, M.; Angles, A., "Bootstrapping smart cities through a self-sustainable model based on big data flows", Communications Magazine, IEEE, Vol.51, n.6, 2013.
    • Ontology of Transportation Networks, Deliverable A1-D4, Project REWERSE, 2005, http://rewerse.net/deliverables/m18/a1-d4.pdf
    • Pan, Feng, and Jerry R. Hobbs. "Temporal Aggregates in OWL-Time." In FLAIRS Conference, vol. 5, pp. 560-565. 2005.
    • Embley, David W., Douglas M. Campbell, Yuan S. Jiang, Stephen W. Liddle, Deryle W. Lonsdale, Y-K. Ng, and Randy D. Smith. "Conceptual-model-based data extraction from multiple-record Web pages." Data & Knowledge Engineering 31, no. 3 (1999): 227-251.
    • Auer, Sören, Jens Lehmann, and Sebastian Hellmann. "Linkedgeodata: Adding a spatial dimension to the web of data." In The Semantic Web - ISWC 2009, pp. 731-746. Springer Berlin Heidelberg, 2009.
    • Andrea Bellandi, Pierfrancesco Bellini, Antonio Cappuccio, Paolo Nesi, Gianni Pantaleo, Nadia Rauch, "Assisted Knowledge Base Generation, Management and Competence Retrieval", International Journal of Software Engineering and Knowledge Engineering, Vol.22, n.8, 2012.
    • Apache HBase: A Distributed Database for Large Datasets. The Apache Software Foundation, Los Angeles, CA. http://hbase.apache.org
    • Pentaho Data Integration, http://www.pentaho.com/product/data-integration
    • Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, Ruslan Velkov, "OWLIM: A family of scalable semantic repositories", Semantic Web Journal, Volume 2, Number 1, 2011.
    • S. Gupta, P. Szekely, C. Knoblock, A. Goel, M. Taheriyan, M. Muslea, "Karma: A System for Mapping Structured Sources into the Semantic Web", 9th Extended Semantic Web Conference (ESWC2012).
    • A. Ngomo, S. Auer, "LIMES: a time-efficient approach for large-scale link discovery on the web of data", Proc. of the 22nd Int. Joint Conf. on Artificial Intelligence, Vol.3, AAAI Press, 2011.
    • R. Isele, C. Bizer, "Active learning of expressive linkage rules using genetic programming", Web Semantics: Science, Services and Agents on the World Wide Web 23 (2013): pp. 2-15.
    • Powers, D.M.W. (February 27, 2011), "Evaluation: from precision, recall and F-Measure to ROC, informedness, markedness and correlation", Journal of Machine Learning Technologies 2 (1): 37–63.
  • 22. Thank you!
    http://www.disit.org/5606
    Paolo Nesi, paolo.nesi@unifi.it
    DISIT Lab, http://www.disit.dinfo.unifi.it alias http://www.disit.org