Mustafa Jarrar
Lecture Notes, Web Data Management (MCOM7348)
University of Birzeit, Palestine
1st Semester, 2013

Introduction to Data Integration

Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Jarrar © 2013

Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2013/11/web-data-management.html

Example from the Government Domain
Consider all interactions with government agencies needed to
register a new business in Palestine.
Example: Establishing a new radio station.

Agencies involved:
–  Ministry of Telecom
–  Ministry of Information
–  Ministry of National Economy
–  Ministry of Finance
–  Chamber of Commerce

Example from the Government Domain
Consider what happens when the business evolves or changes.
Example: Changing the address of the radio station.
–  The address must be changed in 5 different databases, one at each
agency (Ministry of Telecom, Ministry of Information, Ministry of
National Economy, Ministry of Finance, Chamber of Commerce).

Example from the Government Domain
Consider the data registered about the same radio station in
the databases of different ministries and governmental
agencies:

Agency 1
B_ID    | Name          | Type          | City
R2563I  | Radio Al-Amal | Radio Station | Ramallah

Agency 2
ID      | Business Name     | Activity Type      | Province
LM1847  | Al-Amal Broadcast | Radio Broadcasting | Ramallah and Bireh

Agency 3
ID      | Company Name     | Company Type         | Location
182NS3  | Broadcast AlAmal | Broadcasting Station | Al-Balu’

...

Example from the Government Domain
Our simple example points to some challenges in Data Integration:
–  No agreed-upon naming (Name, Business Name, Company Name).
–  No agreed-upon meaning (does ‘Activity Type’ mean exactly the
same as ‘Company Type’?).
–  Different registered data: Radio Al-Amal, Al-Amal Broadcast, ...
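
The naming challenge above can be sketched in a few lines of Python. This is only an illustration, not part of the original lecture: the shared attribute names (`id`, `name`, `type`, `place`) and the `MAPPINGS` table are hypothetical, and the pairing of attribute names with agency numbers is our reading of the slide's tables.

```python
# Records for the same radio station, as registered by three agencies
# (values taken from the example; agency numbering is an assumption).
RECORDS = {
    "agency1": {"B_ID": "R2563I", "Name": "Radio Al-Amal",
                "Type": "Radio Station", "City": "Ramallah"},
    "agency2": {"ID": "LM1847", "Business Name": "Al-Amal Broadcast",
                "Activity Type": "Radio Broadcasting",
                "Province": "Ramallah and Bireh"},
    "agency3": {"ID": "182NS3", "Company Name": "Broadcast AlAmal",
                "Company Type": "Broadcasting Station",
                "Location": "Al-Balu’"},
}

# Hypothetical mapping from each agency's attribute names to shared names.
MAPPINGS = {
    "agency1": {"B_ID": "id", "Name": "name", "Type": "type", "City": "place"},
    "agency2": {"ID": "id", "Business Name": "name",
                "Activity Type": "type", "Province": "place"},
    "agency3": {"ID": "id", "Company Name": "name",
                "Company Type": "type", "Location": "place"},
}


def to_common(source, record):
    """Rename a record's attributes to the shared vocabulary."""
    mapping = MAPPINGS[source]
    return {mapping[attr]: value for attr, value in record.items()}


unified = [to_common(source, record) for source, record in RECORDS.items()]

# Renaming fixes the vocabulary only: the registered values still differ.
distinct_names = {record["name"] for record in unified}
print(len(distinct_names), "different names for one radio station")
```

Renaming attributes addresses only the first challenge. Whether "type" really means the same thing in all three sources, and which of the three registered names is correct, remain open after the mapping is applied.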
Problem is in all domains
The problem is even more challenging on the Web.
The Data Web envisions the Web as a global, worldwide
database.
This means that one can query multiple distributed databases
on the Web as if querying a local database.

Challenges of Data Integration:
Heterogeneities in Database Schemas
Several kinds of heterogeneity can be distinguished between
different schemas:
–  Name heterogeneities (differences in the vocabulary used).
–  Meaning heterogeneities (different meanings for the same attribute
in two schemas).
–  Heterogeneities in structure and type.
–  Heterogeneities in rules and constraints.
–  Data model heterogeneities.

Name and Meaning Heterogeneities
Synonyms – Different names for the same concept
–  employee, clerk
–  exam, course
–  code, num
–  Section, Division (both meaning “a specialized division of a
large organization”)

Homonyms – Same name for different concepts (different
meanings)
–  City as city of birth in one schema,
   City as city of residence in another schema
–  Salary as net salary in one schema,
   Salary as gross salary in another schema
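
A minimal Python sketch of how the synonym and homonym cases above might be handled when reading attribute names from two schemas. The schema labels `schema_a` and `schema_b`, the canonical names, and which schema uses which meaning are all assumptions made for illustration; the slide does not prescribe a resolution method.

```python
# Synonyms: different names for the same concept map to one canonical name.
SYNONYMS = {
    "clerk": "employee",
    "employee": "employee",
    "num": "code",
    "code": "code",
    "section": "division",
    "division": "division",
}

# Homonyms: the same name means different things depending on the schema,
# so the lookup key must include the (hypothetical) schema label.
HOMONYMS = {
    ("schema_a", "city"): "city_of_birth",
    ("schema_b", "city"): "city_of_residence",
    ("schema_a", "salary"): "net_salary",
    ("schema_b", "salary"): "gross_salary",
}


def resolve(schema, attribute):
    """Return an unambiguous name for an attribute of a given schema."""
    name = attribute.strip().lower()
    if (schema, name) in HOMONYMS:
        return HOMONYMS[(schema, name)]
    # Unknown names are passed through unchanged (lower-cased).
    return SYNONYMS.get(name, name)


print(resolve("schema_a", "City"))
print(resolve("schema_b", "City"))
print(resolve("schema_b", "Section"))
```

The design point is that synonyms can be resolved by name alone, whereas homonyms cannot: the name has to be qualified by its source before it can be interpreted.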
Heterogeneities in Structure and Type
Source: Carlo Batini

The same concepts are represented with
different conceptual structures in two schemas:
–  Attribute in one schema and derived value in another schema.
–  Attribute in one schema and entity in another schema.
–  Entity in one schema and relationship in another schema.
–  Different abstraction levels for the same concept in two schemas:
e.g. two entities with homonym names related by an IS-A hierarchy
in two schemas.

Heterogeneities in Structure
Source: Carlo Batini

EXAMPLES:
[Figure: example schema diagrams. Surviving labels: Person, MAN, WOMAN,
GENDER; EMPLOYEE, DEPARTMENT, PROJECT; BOOK, PUBLISHER.]
Heterogeneities in Type
Examples:
•  Different types for a single attribute (e.g., numeric vs. alphanumeric).
E.g., the attribute “gender”:
–  Male/Female
–  M/F
–  0/1
•  Year has a four-digit domain in one schema and a two-digit domain
in another schema.
•  Different currencies (Euros, US Dollars, etc.).
•  Different measurement systems (kilos vs. pounds,
Celsius vs. Fahrenheit).
•  Different granularities (grams, kilos, etc.).
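
The type conflicts listed above are usually resolved by converting every source value to one agreed representation. The following Python sketch illustrates this under stated assumptions: the target encoding (`"M"`/`"F"`), the meaning of the numeric gender codes, and the pivot year used to expand two-digit years are all hypothetical choices that a real integration project would have to take from each source's documentation.

```python
GENDER_WORDS = {"male": "M", "m": "M", "female": "F", "f": "F"}


def normalize_gender(value, numeric_codes=None):
    """Map Male/Female, M/F, or 0/1 onto the assumed target encoding M/F.

    numeric_codes must be supplied per source (e.g. {"0": "M", "1": "F"}),
    because the meaning of 0 and 1 cannot be guessed from the data alone.
    """
    text = str(value).strip().lower()
    if text in GENDER_WORDS:
        return GENDER_WORDS[text]
    if numeric_codes and text in numeric_codes:
        return numeric_codes[text]
    raise ValueError(f"unknown gender encoding: {value!r}")


def expand_year(year, pivot=50):
    """Expand a two-digit year; the pivot of 50 is an arbitrary assumption."""
    if year >= 100:
        return year  # already four digits
    return 1900 + year if year >= pivot else 2000 + year


def pounds_to_kilos(pounds):
    return pounds * 0.45359237


def fahrenheit_to_celsius(degrees_f):
    return (degrees_f - 32) * 5 / 9


def grams_to_kilos(grams):
    return grams / 1000


print(normalize_gender("Female"), normalize_gender(0, {"0": "M", "1": "F"}))
print(expand_year(13), expand_year(87))
print(grams_to_kilos(1500))
```

Currency conversion is left out on purpose: unlike unit conversion, it depends on an exchange rate and a date, which the source data would also need to provide.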
Heterogeneities in the rules and constraints
Source: Carlo Batini

EXAMPLES:
–  Different cardinalities in the same relationships
–  Key conflicts
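
As an illustration of a cardinality conflict, suppose one schema allows an employee to belong to several departments while another allows at most one. The scenario, the function name, and the sample data below are hypothetical and not taken from the lecture; the sketch only shows how data from the more permissive schema can be checked against the stricter rule.

```python
from collections import defaultdict


def violates_max_one(pairs):
    """Return the left-hand values that are linked to more than one
    right-hand value, i.e. that break an 'at most one' cardinality."""
    linked = defaultdict(set)
    for left, right in pairs:
        linked[left].add(right)
    return {left: sorted(rights)
            for left, rights in linked.items() if len(rights) > 1}


# (employee, department) pairs from the schema that allows many departments.
works_in = [("e1", "d1"), ("e1", "d2"), ("e2", "d1")]

# Employees that cannot be loaded into the schema allowing one department.
print(violates_max_one(works_in))
```

Records reported by such a check need a decision (pick one value, relax the constraint, or reject the record) before the two sources can be merged.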

Model Heterogeneities
Model heterogeneity occurs when different databases adhere to
different data models:
–  Relational Data Model, XML, RDF, Object-Oriented, OWL, ...

Solution: Reduce model heterogeneity by using one data model.
Example: Convert the relational model to the RDF graph model.
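
The conversion mentioned above can be sketched as follows: each row becomes a subject, each column a predicate, and each cell value a literal. This is a simplified illustration, not the method used in the course. The `http://example.org/` namespace, the table name `business`, and the lower-case column names are hypothetical, and column names are assumed to contain no spaces so that they form valid IRIs.

```python
BASE = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def escape_literal(value):
    """Escape backslashes and double quotes for a quoted literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(table, key_column, row):
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = f"<{BASE}{table}/{row[key_column]}>"
    triples = [(subject, RDF_TYPE, f"<{BASE}{table}>")]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject; NULLs yield no triple
        predicate = f"<{BASE}{table}#{column}>"
        triples.append((subject, predicate, f'"{escape_literal(value)}"'))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line in N-Triples style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


row = {"b_id": "R2563I", "name": "Radio Al-Amal",
       "type": "Radio Station", "city": "Ramallah"}
print(to_ntriples(row_to_triples("business", "b_id", row)))
```

Once every source is expressed as triples, the model heterogeneity is gone, but the name, meaning, and type heterogeneities discussed earlier still have to be reconciled within the shared model.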

References and Acknowledgement
•  Carlo Batini: Course on Data Integration. BZU IT Summer School
2011.
•  Stefano Spaccapietra: Information Integration. Presentation at the IFIP
Academy. Porto Alegre. 2005.
•  Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI
International, Artificial Intelligence Center. Menlo Park, USA. 2009.

Thanks to Anton Deik for helping me prepare this lecture.
