SlideShare una empresa de Scribd logo
1 de 17
Jarrar © 2013 1
Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Lecture Notes, Web Data Management
Birzeit University, Palestine
2013
Introduction to Data Integration
Jarrar © 2013 2
Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2014/01/web-data-management.html
Jarrar © 2013 3
Outline
Example from the government Domain
- Problem is in all domains
- Challenges of Data Integration:
- Heterogeneities in Database Schemas
- Name and Meaning Heterogeneities
- Heterogeneities in Structure and Type
- Heterogeneities in the rules and constraints
- Model Heterogeneities
Keywords: Data Integration, Registered data, domain, domain name system, web, distributed database,
database schema, Heterogeneities, Model Heterogeneities, Data model, Synonyms, Homonyms, Attribute,
Entity
Jarrar © 2013 4
Example from the government Domain
Consider all interactions with government agencies in order
to register a new business in Palestine.
Example: Establishing a new Radio Station.
Ministry of
Telecom
Ministry of
Information
Ministry of National
Economy
Chamber of
Commerce
Ministry of
Finance
Jarrar © 2013 5
Example from the government Domain
Consider when the business evolves or changes.
Example: Changing the address of the radio station.
– Address must be changed in 5 different databases.
Ministry of
Telecom
Ministry of
Information
Ministry of National
Economy
Chamber of
Commerce
Ministry of
Finance
Jarrar © 2013 6
Example from the government Domain
Consider the data registered about the same radio station in
the databases of different ministries and governmental
agencies:
ID Name Type City
R2563I Radio Al-Amal Radio Station Ramallah
B_ID Business Name Activity Type Province
LM1847 Al-Amal
Broadcast
Radio
Broadcasting
Ramallah
and Bireh
ID Company Name Company Type Location
182NS3 Broadcast Al-
Amal
Broadcasting
Station
Al-Balu’
Agency 1
Agency 2
Agency 3
. . .
Jarrar © 2013 7
Example from the government Domain
From our simple example one can point out to some
challenges in Data Integration:
– No agreed upon naming (name, business name, company name)
– No agreed upon meaning (Does ’Activity Type’ mean exactly the
same as ‘Company Type’?)
– Different Registered Data: Radio Al-Amal, Al-Amal Broadcast, ….
ID Name Type City
R2563I Radio Al-Amal Radio Station Ramallah
B_ID Business Name Activity Type Province
LM1847 Al-Amal
Broadcast
Radio
Broadcasting
Ramallah
and Bireh
ID Company Name Company Type Location
182NS3 Broadcast Al-
Amal
Broadcasting
Station
Al-Balu’
Agency 1
Agency 2
Agency 3
. . .
Jarrar © 2013 8
Problem is in all domains
Jarrar © 2013 9
Problem is in all domains
Problem is now even more challenging with the Web.
The Data Web envisions the web as a global world-wide
database.
This means that one can query distributed multiple databases
on the web as if he/she is querying a local database.
Jarrar © 2013 10
Challenges of Data Integration:
Heterogeneities in Database Schemas
One can distinguish between several heterogeneities
between different schemas:
– Name Heterogeneities (difference in used vocabulary).
– Meaning Heterogeneities (different meaning for the same attribute
in two schemas).
– Heterogeneities in the structure and type.
– Heterogeneities in the rules and constraints.
– Data Model Heterogeneities.
Jarrar © 2013 11
Name and Meaning Heterogeneities
Synonyms – Different names for the same concepts
– employee, clerk
– exam, course
– code, num
Homonyms – Same name for different concepts (different
meanings)
- City as City of birth in one schema,
- City as City of Residence in another schema
Saraly: Net Salary
Salary: Gross Salary
Section
Division
Synonyms
Homonyms
A specialized
division of a
large
organization
Jarrar © 2013 12
Heterogeneities in Structure and Type
The same concepts are represented with
different conceptual structures in two schemas:
– Attribute in one schema and derived value in another schema.
– Attribute in one schema and entity in another schema.
– Entity in one schema and relationship in another schema.
– Different abstraction levels for the same concept in two schemas:
e.g. two entities with homonym names related by an IS-A hierarchy
in two schemas.
Source: Carlo Batini
Jarrar © 2013 13
Heterogeneities in Structure
EXAMPLES:
PUBLISHERBOOKBOOK
PUBLISHER
EMPLOYEE
DEPARTMENT
PROJECT
EMPLOYEE
PROJECT
Source: Carlo Batini
Person
WOMANMAN
GENDER
Person
Jarrar © 2013 14
Heterogeneities in Type
Examples:
 In a single attribute (e.g., Numberic, Alphanumeric).
E.g., the attribute “gender”:
– Male/Female
– M/F
– 0/1
 Year has a four digit domain in one schema and two digit domain
in another schema
 Different currencies (Euros, US Dollars, etc.)
 Different measure systems (kilos vs. pounds,
centigrade vs. Fahrenheit.)
 Different granularities (grams, kilos, etc.)
Jarrar © 2013 15
Heterogeneities in the rules and constraints
EXAMPLES:
– Different cardinalities in the same relationships
– Key conflicts
Source: Carlo Batini
Jarrar © 2013 16
Model Heterogeneities
Model Heterogeneities occurs when different databases adheres to
different data models:
– Relational Data Model, XML, RDF, Object-Oriented, OWL, ...
Solution: Reduce Model Heterogeneity by using one data model.
Example: Convert the Relational Model to RDF graph model.
Jarrar © 2013 17
References and Acknowledgement
• Carlo Batini: Course on Data Integration. BZU IT Summer School
2011.
• Stefano Spaccapietra: Information Integration. Presentation at the IFIP
Academy. Porto Alegre. 2005.
• Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI
International, Artificial Intelligence Center. Menlo Park, USA. 2009.
Thanks to Anton Deik for helping me preparing this lecture

Más contenido relacionado

La actualidad más candente

Jarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageJarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageMustafa Jarrar
 
How Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesHow Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesDATAVERSITY
 
George thomas gtra2010
George thomas gtra2010George thomas gtra2010
George thomas gtra2010George Thomas
 
An introduction to Linked (Open) Data
An introduction to Linked (Open) DataAn introduction to Linked (Open) Data
An introduction to Linked (Open) DataAli Khalili
 
Arcomem training enrichment_advanced
Arcomem training enrichment_advancedArcomem training enrichment_advanced
Arcomem training enrichment_advancedarcomem
 
Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)arcomem
 
Semantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisSemantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisCraig Knoblock
 
Dbms Notes Lecture 4 : Data Models in DBMS
Dbms Notes Lecture 4 : Data Models in DBMSDbms Notes Lecture 4 : Data Models in DBMS
Dbms Notes Lecture 4 : Data Models in DBMSBIT Durg
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQLOpen Data Support
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale Bernadette Hyland-Wood
 
Big Linked Data - Creating Training Curricula
Big Linked Data - Creating Training CurriculaBig Linked Data - Creating Training Curricula
Big Linked Data - Creating Training CurriculaEUCLID project
 
A semantic based approach for information retrieval from html documents using...
A semantic based approach for information retrieval from html documents using...A semantic based approach for information retrieval from html documents using...
A semantic based approach for information retrieval from html documents using...csandit
 
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...cscpconf
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataMatthew Rowe
 

La actualidad más candente (20)

Gt ea2009
Gt ea2009Gt ea2009
Gt ea2009
 
Jarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageJarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query Language
 
How Semantics Solves Big Data Challenges
How Semantics Solves Big Data ChallengesHow Semantics Solves Big Data Challenges
How Semantics Solves Big Data Challenges
 
George thomas gtra2010
George thomas gtra2010George thomas gtra2010
George thomas gtra2010
 
An introduction to Linked (Open) Data
An introduction to Linked (Open) DataAn introduction to Linked (Open) Data
An introduction to Linked (Open) Data
 
Arcomem training enrichment_advanced
Arcomem training enrichment_advancedArcomem training enrichment_advanced
Arcomem training enrichment_advanced
 
Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)
 
Semantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisSemantics for Big Data Integration and Analysis
Semantics for Big Data Integration and Analysis
 
Dbms Notes Lecture 4 : Data Models in DBMS
Dbms Notes Lecture 4 : Data Models in DBMSDbms Notes Lecture 4 : Data Models in DBMS
Dbms Notes Lecture 4 : Data Models in DBMS
 
Quick Introduction to the Semantic Web, RDFa & Microformats
Quick Introduction to the Semantic Web, RDFa & MicroformatsQuick Introduction to the Semantic Web, RDFa & Microformats
Quick Introduction to the Semantic Web, RDFa & Microformats
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQL
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
 
Big Linked Data - Creating Training Curricula
Big Linked Data - Creating Training CurriculaBig Linked Data - Creating Training Curricula
Big Linked Data - Creating Training Curricula
 
A semantic based approach for information retrieval from html documents using...
A semantic based approach for information retrieval from html documents using...A semantic based approach for information retrieval from html documents using...
A semantic based approach for information retrieval from html documents using...
 
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked Data
 
Semantics and Web 3.0
Semantics and Web 3.0Semantics and Web 3.0
Semantics and Web 3.0
 
Data models
Data modelsData models
Data models
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
wedi
wediwedi
wedi
 

Similar a Jarrar: Introduction to data Integration

Jarrar: Introduction to Data Integration
Jarrar: Introduction to Data IntegrationJarrar: Introduction to Data Integration
Jarrar: Introduction to Data IntegrationMustafa Jarrar
 
Jarrar: Data Schema Integration
Jarrar: Data Schema IntegrationJarrar: Data Schema Integration
Jarrar: Data Schema IntegrationMustafa Jarrar
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Legal Technology - State Bar of CA - Solo/Small Firm Summit
Legal Technology - State Bar of CA - Solo/Small Firm SummitLegal Technology - State Bar of CA - Solo/Small Firm Summit
Legal Technology - State Bar of CA - Solo/Small Firm SummitRon Dolin
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFMustafa Jarrar
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebJarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebMustafa Jarrar
 
Jarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology EngineeringJarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology EngineeringMustafa Jarrar
 
IOT, Real World Things, & Linked data
IOT, Real World Things, & Linked dataIOT, Real World Things, & Linked data
IOT, Real World Things, & Linked datarobin fay
 
Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsMustafa Jarrar
 
Running head DATABASE PROJECT 1DATABASE PROJECT 1Database S.docx
Running head DATABASE PROJECT 1DATABASE PROJECT 1Database S.docxRunning head DATABASE PROJECT 1DATABASE PROJECT 1Database S.docx
Running head DATABASE PROJECT 1DATABASE PROJECT 1Database S.docxtodd271
 
Oracle Sql Developer Data Modeler 3 3 new features
Oracle Sql Developer Data Modeler 3 3 new featuresOracle Sql Developer Data Modeler 3 3 new features
Oracle Sql Developer Data Modeler 3 3 new featuresPhilip Stoyanov
 
ESWC2008 Identity OpenLink - On The Evolution of Terms
ESWC2008 Identity OpenLink - On The Evolution of TermsESWC2008 Identity OpenLink - On The Evolution of Terms
ESWC2008 Identity OpenLink - On The Evolution of Termsrumito
 
Questions On The And Football
Questions On The And FootballQuestions On The And Football
Questions On The And FootballAmanda Gray
 
17 define workforce records
17   define workforce records17   define workforce records
17 define workforce recordsmohamed refaei
 
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...Semantic Web Company
 
Role of metadata in transportation agency data programs
Role of metadata in transportation agency data programsRole of metadata in transportation agency data programs
Role of metadata in transportation agency data programsJoseph Busch
 
UNITII -MODELING and AGGRIGATION-sss.docx
UNITII -MODELING  and AGGRIGATION-sss.docxUNITII -MODELING  and AGGRIGATION-sss.docx
UNITII -MODELING and AGGRIGATION-sss.docxsharwarisolapur
 
Semantic Modeling for Information Federation
Semantic Modeling for Information FederationSemantic Modeling for Information Federation
Semantic Modeling for Information FederationCory Casanave
 

Similar a Jarrar: Introduction to data Integration (20)

Jarrar: Introduction to Data Integration
Jarrar: Introduction to Data IntegrationJarrar: Introduction to Data Integration
Jarrar: Introduction to Data Integration
 
Jarrar: Data Schema Integration
Jarrar: Data Schema IntegrationJarrar: Data Schema Integration
Jarrar: Data Schema Integration
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Legal Technology - State Bar of CA - Solo/Small Firm Summit
Legal Technology - State Bar of CA - Solo/Small Firm SummitLegal Technology - State Bar of CA - Solo/Small Firm Summit
Legal Technology - State Bar of CA - Solo/Small Firm Summit
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDF
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebJarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
 
Jarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology EngineeringJarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology Engineering
 
IOT, Real World Things, & Linked data
IOT, Real World Things, & Linked dataIOT, Real World Things, & Linked data
IOT, Real World Things, & Linked data
 
Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data Mashups
 
Running head DATABASE PROJECT 1DATABASE PROJECT 1Database S.docx
Running head DATABASE PROJECT 1DATABASE PROJECT 1Database S.docxRunning head DATABASE PROJECT 1DATABASE PROJECT 1Database S.docx
Running head DATABASE PROJECT 1DATABASE PROJECT 1Database S.docx
 
Oracle Sql Developer Data Modeler 3 3 new features
Oracle Sql Developer Data Modeler 3 3 new featuresOracle Sql Developer Data Modeler 3 3 new features
Oracle Sql Developer Data Modeler 3 3 new features
 
ESWC2008 Identity OpenLink - On The Evolution of Terms
ESWC2008 Identity OpenLink - On The Evolution of TermsESWC2008 Identity OpenLink - On The Evolution of Terms
ESWC2008 Identity OpenLink - On The Evolution of Terms
 
Questions On The And Football
Questions On The And FootballQuestions On The And Football
Questions On The And Football
 
17 define workforce records
17   define workforce records17   define workforce records
17 define workforce records
 
Ultra large scale systems to design interoperability
Ultra large scale systems to design interoperabilityUltra large scale systems to design interoperability
Ultra large scale systems to design interoperability
 
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
 
Role of metadata in transportation agency data programs
Role of metadata in transportation agency data programsRole of metadata in transportation agency data programs
Role of metadata in transportation agency data programs
 
UNITII -MODELING and AGGRIGATION-sss.docx
UNITII -MODELING  and AGGRIGATION-sss.docxUNITII -MODELING  and AGGRIGATION-sss.docx
UNITII -MODELING and AGGRIGATION-sss.docx
 
ER Diagram
ER DiagramER Diagram
ER Diagram
 
Semantic Modeling for Information Federation
Semantic Modeling for Information FederationSemantic Modeling for Information Federation
Semantic Modeling for Information Federation
 

Más de Mustafa Jarrar

Clustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment AnalysisClustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment AnalysisMustafa Jarrar
 
Classifying Processes and Basic Formal Ontology
Classifying Processes  and Basic Formal OntologyClassifying Processes  and Basic Formal Ontology
Classifying Processes and Basic Formal OntologyMustafa Jarrar
 
Discrete Mathematics Course Outline
Discrete Mathematics Course OutlineDiscrete Mathematics Course Outline
Discrete Mathematics Course OutlineMustafa Jarrar
 
Business Process Implementation
Business Process ImplementationBusiness Process Implementation
Business Process ImplementationMustafa Jarrar
 
Business Process Design and Re-engineering
Business Process Design and Re-engineeringBusiness Process Design and Re-engineering
Business Process Design and Re-engineeringMustafa Jarrar
 
BPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical ConstructsBPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical ConstructsMustafa Jarrar
 
BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs  BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs Mustafa Jarrar
 
Introduction to Business Process Management
Introduction to Business Process ManagementIntroduction to Business Process Management
Introduction to Business Process ManagementMustafa Jarrar
 
Customer Complaint Ontology
Customer Complaint Ontology Customer Complaint Ontology
Customer Complaint Ontology Mustafa Jarrar
 
Subset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion RulesSubset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion RulesMustafa Jarrar
 
Schema Modularization in ORM
Schema Modularization in ORMSchema Modularization in ORM
Schema Modularization in ORMMustafa Jarrar
 
On Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in PalestineOn Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in PalestineMustafa Jarrar
 
Lessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online CoursesLessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online CoursesMustafa Jarrar
 
Presentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-finalPresentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-finalMustafa Jarrar
 
Jarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 CallsJarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 CallsMustafa Jarrar
 
Habash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingHabash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingMustafa Jarrar
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 ProposalsRiestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 ProposalsMustafa Jarrar
 
Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020Mustafa Jarrar
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql ProjectMustafa Jarrar
 

Más de Mustafa Jarrar (20)

Clustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment AnalysisClustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment Analysis
 
Classifying Processes and Basic Formal Ontology
Classifying Processes  and Basic Formal OntologyClassifying Processes  and Basic Formal Ontology
Classifying Processes and Basic Formal Ontology
 
Discrete Mathematics Course Outline
Discrete Mathematics Course OutlineDiscrete Mathematics Course Outline
Discrete Mathematics Course Outline
 
Business Process Implementation
Business Process ImplementationBusiness Process Implementation
Business Process Implementation
 
Business Process Design and Re-engineering
Business Process Design and Re-engineeringBusiness Process Design and Re-engineering
Business Process Design and Re-engineering
 
BPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical ConstructsBPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical Constructs
 
BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs  BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs
 
Introduction to Business Process Management
Introduction to Business Process ManagementIntroduction to Business Process Management
Introduction to Business Process Management
 
Customer Complaint Ontology
Customer Complaint Ontology Customer Complaint Ontology
Customer Complaint Ontology
 
Subset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion RulesSubset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion Rules
 
Schema Modularization in ORM
Schema Modularization in ORMSchema Modularization in ORM
Schema Modularization in ORM
 
On Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in PalestineOn Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in Palestine
 
Lessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online CoursesLessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online Courses
 
Presentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-finalPresentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-final
 
Jarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 CallsJarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 Calls
 
Habash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingHabash: Arabic Natural Language Processing
Habash: Arabic Natural Language Processing
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 ProposalsRiestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
 
Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
 

Último

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Jarrar: Introduction to data Integration

  • 1. Jarrar © 2013 1 Dr. Mustafa Jarrar University of Birzeit mjarrar@birzeit.edu www.jarrar.info Lecture Notes, Web Data Management Birzeit University, Palestine 2013 Introduction to Data Integration
  • 2. Jarrar © 2013 2 Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2014/01/web-data-management.html
  • 3. Jarrar © 2013 3 Outline Example from the government Domain - Problem is in all domains - Challenges of Data Integration: - Heterogeneities in Database Schemas - Name and Meaning Heterogeneities - Heterogeneities in Structure and Type - Heterogeneities in the rules and constraints - Model Heterogeneities Keywords: Data Integration, Registered data, domain, domain name system, web, distributed database, database schema, Heterogeneities, Model Heterogeneities, Data model, Synonyms, Homonyms, Attribute, Entity
  • 4. Jarrar © 2013 4 Example from the government Domain Consider all interactions with government agencies in order to register a new business in Palestine. Example: Establishing a new Radio Station. Ministry of Telecom Ministry of Information Ministry of National Economy Chamber of Commerce Ministry of Finance
  • 5. Jarrar © 2013 5 Example from the government Domain Consider when the business evolves or changes. Example: Changing the address of the radio station. – Address must be changed in 5 different databases. Ministry of Telecom Ministry of Information Ministry of National Economy Chamber of Commerce Ministry of Finance
  • 6. Jarrar © 2013 6 Example from the government Domain Consider the data registered about the same radio station in the databases of different ministries and governmental agencies: ID Name Type City R2563I Radio Al-Amal Radio Station Ramallah B_ID Business Name Activity Type Province LM1847 Al-Amal Broadcast Radio Broadcasting Ramallah and Bireh ID Company Name Company Type Location 182NS3 Broadcast Al- Amal Broadcasting Station Al-Balu’ Agency 1 Agency 2 Agency 3 . . .
  • 7. Jarrar © 2013 7 Example from the government Domain From our simple example one can point out to some challenges in Data Integration: – No agreed upon naming (name, business name, company name) – No agreed upon meaning (Does ’Activity Type’ mean exactly the same as ‘Company Type’?) – Different Registered Data: Radio Al-Amal, Al-Amal Broadcast, …. ID Name Type City R2563I Radio Al-Amal Radio Station Ramallah B_ID Business Name Activity Type Province LM1847 Al-Amal Broadcast Radio Broadcasting Ramallah and Bireh ID Company Name Company Type Location 182NS3 Broadcast Al- Amal Broadcasting Station Al-Balu’ Agency 1 Agency 2 Agency 3 . . .
  • 8. Jarrar © 2013 8 Problem is in all domains
  • 9. Jarrar © 2013 9 Problem is in all domains Problem is now even more challenging with the Web. The Data Web envisions the web as a global world-wide database. This means that one can query distributed multiple databases on the web as if he/she is querying a local database.
  • 10. Jarrar © 2013 10 Challenges of Data Integration: Heterogeneities in Database Schemas One can distinguish between several heterogeneities between different schemas: – Name Heterogeneities (difference in used vocabulary). – Meaning Heterogeneities (different meaning for the same attribute in two schemas). – Heterogeneities in the structure and type. – Heterogeneities in the rules and constraints. – Data Model Heterogeneities.
  • 11. Jarrar © 2013 11 Name and Meaning Heterogeneities Synonyms – Different names for the same concepts – employee, clerk – exam, course – code, num Homonyms – Same name for different concepts (different meanings) - City as City of birth in one schema, - City as City of Residence in another schema Saraly: Net Salary Salary: Gross Salary Section Division Synonyms Homonyms A specialized division of a large organization
  • 12. Jarrar © 2013 12 Heterogeneities in Structure and Type The same concepts are represented with different conceptual structures in two schemas: – Attribute in one schema and derived value in another schema. – Attribute in one schema and entity in another schema. – Entity in one schema and relationship in another schema. – Different abstraction levels for the same concept in two schemas: e.g. two entities with homonym names related by an IS-A hierarchy in two schemas. Source: Carlo Batini
  • 13. Jarrar © 2013 13 Heterogeneities in Structure EXAMPLES: PUBLISHERBOOKBOOK PUBLISHER EMPLOYEE DEPARTMENT PROJECT EMPLOYEE PROJECT Source: Carlo Batini Person WOMANMAN GENDER Person
  • 14. Jarrar © 2013 14 Heterogeneities in Type Examples:  In a single attribute (e.g., Numberic, Alphanumeric). E.g., the attribute “gender”: – Male/Female – M/F – 0/1  Year has a four digit domain in one schema and two digit domain in another schema  Different currencies (Euros, US Dollars, etc.)  Different measure systems (kilos vs. pounds, centigrade vs. Fahrenheit.)  Different granularities (grams, kilos, etc.)
  • 15. Jarrar © 2013 15 Heterogeneities in the rules and constraints EXAMPLES: – Different cardinalities in the same relationships – Key conflicts Source: Carlo Batini
  • 16. Jarrar © 2013 16 Model Heterogeneities Model Heterogeneities occurs when different databases adheres to different data models: – Relational Data Model, XML, RDF, Object-Oriented, OWL, ... Solution: Reduce Model Heterogeneity by using one data model. Example: Convert the Relational Model to RDF graph model.
  • 17. Jarrar © 2013 17 References and Acknowledgement • Carlo Batini: Course on Data Integration. BZU IT Summer School 2011. • Stefano Spaccapietra: Information Integration. Presentation at the IFIP Academy. Porto Alegre. 2005. • Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI International, Artificial Intelligence Center. Menlo Park, USA. 2009. Thanks to Anton Deik for helping me preparing this lecture