SAMSTAR: A Semi-Automated Lexical Method to Generate Star Schemas from an ERD
Authors
Ritu Khare, Il Yeol Song, Yuan An
PROBLEM
To develop a star schema, existing approaches analyze the attributes of the business entities of interest.
•Entities with numerical measure attributes are assumed to be the candidates for facts.
•Entities with non-numerical, descriptive attributes are assumed to be the candidates for dimensions.
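For contrast, the attribute-based heuristic used by existing approaches can be sketched in Python as shown below. This is a hypothetical illustration and is not code from any published approach. The `Entity` class, the `"numeric"`/`"text"` type labels, and the rule that a single numeric attribute is enough to make an entity a fact candidate are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An ERD entity whose attributes carry a coarse type label (hypothetical encoding)."""
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> "numeric" | "text"


def classify_by_attributes(entities):
    """Split entities into fact and dimension candidates by attribute type.

    Assumption: one numeric (measure) attribute is enough to make an entity a
    fact candidate; entities with only descriptive attributes become
    dimension candidates.
    """
    facts, dimensions = [], []
    for entity in entities:
        if any(kind == "numeric" for kind in entity.attributes.values()):
            facts.append(entity.name)
        else:
            dimensions.append(entity.name)
    return facts, dimensions


# Hypothetical example entities.
EXAMPLE = [
    Entity("Invoice", {"invoice_no": "text", "amount": "numeric"}),
    Entity("Customer", {"name": "text", "city": "text"}),
]
facts, dimensions = classify_by_attributes(EXAMPLE)
```

In this example, `Invoice` becomes a fact candidate and `Customer` becomes a dimension candidate. The rule never considers how entities are related to each other.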

2. Annotated Dimension Design Patterns (A_DDP)
We referred to four sources and instantiated the six classes of Dimension Design Patterns (DDP) to produce a list of 131 commonly used dimension entities. We refer to these entities as Annotated DDP (A_DDPs). They are entities used frequently in business processes; examples include account, activity, agent, aircraft and airport.
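The lexical matching against the A_DDP list can be sketched in Python as follows. This is a minimal sketch under stated assumptions. Only the five example entities named above are included, because the full list of 131 entities is not reproduced. `SYNONYMS` is a hypothetical table that stands in for WordNet, and `matches_addp` is an illustrative name.

```python
# Excerpt of the A_DDP list: only the five examples named in the text.
A_DDP_EXCERPT = {"account", "activity", "agent", "aircraft", "airport"}

# Hypothetical synonym table standing in for WordNet lookups (lower-case keys).
SYNONYMS = {"broker": ["agent", "factor"]}


def matches_addp(entity_name, synonyms, addp=A_DDP_EXCERPT):
    """Return the terms (the entity name or a synonym) that appear in the A_DDP list."""
    name = entity_name.lower()
    terms = {name} | {s.lower() for s in synonyms.get(name, [])}
    return sorted(terms & addp)


print(matches_addp("Broker", SYNONYMS))   # ['agent']
print(matches_addp("Invoice", SYNONYMS))  # []
```

If NLTK is installed and its WordNet corpus has been downloaded, the synonym table could be built from `nltk.corpus.wordnet.synsets(name)` and each synset's `lemma_names()`. Lemma names use underscores for multi-word terms, so they may need to be normalized before matching.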

We focus on the structure of the ERD. The novel features of SAMSTAR are:
(1) the use of the notion of Connection Topology Value (CTV) to identify the candidates for facts and dimensions, and
(2) the use of Annotated Dimension Design Patterns (A_DDP), as well as WordNet, to extend the list of dimensions.

ALGORITHM

[Fig. 2: SAMSTAR Overview. Panels: Quantitative Method (find facts and direct dimensions); Quantitative Method (find indirect dimensions), drawing on DDP, A_DDP and WordNet]

1. Connection Topology Value (CTV)

$$CTV(e) = 1 \cdot n + 0.8 \cdot \sum_{i=1}^{n} CTV(Node(i))$$

where Node(i) represents an entity having a direct M:1 relationship with e, n is the number of such entities, and 1 and 0.8 are the weights for direct and indirect relationships, respectively.
For Fig. 1, CTV is calculated in the following manner:

[Fig. 1: Calculation of CTV. Example ERD with entities A, B, C, D, E, F, G and H]

weight_d = 100%; weight_i = 80%

The CTV for each entity is:
CTV(H) = 1*0 + 0.8*0 = 0
CTV(F) = 0
CTV(G) = 0
CTV(E) = 1*1 + 0.8*CTV(H) = 1
CTV(B) = 1*1 + 0.8*CTV(E) = 1.8
CTV(C) = 1*2 + 0.8*(CTV(G) + CTV(F)) = 2
CTV(D) = 1*1 + 0.8*CTV(C) = 1 + 0.8*2 = 2.6
CTV(A) = 1*2 + 0.8*(CTV(B) + CTV(C)) = 2 + 0.8*(1.8 + 2) = 5.04
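A short Python sketch can check the CTV recursion against the Fig. 1 numbers. The figure itself is not reproduced, so the adjacency list below is inferred from the worked calculations. The function name `ctv_scores` is hypothetical. The default weights follow weight_d = 100% and weight_i = 80%.

```python
from functools import lru_cache

# Direct M:1 relationships of Fig. 1, inferred from the worked calculations:
# entity -> entities it has a direct M:1 relationship with.
FIG1_M1 = {
    "A": ["B", "C"],
    "B": ["E"],
    "C": ["G", "F"],
    "D": ["C"],
    "E": ["H"],
    "F": [],
    "G": [],
    "H": [],
}


def ctv_scores(m1_links, weight_d=1.0, weight_i=0.8):
    """CTV(e) = weight_d * n + weight_i * sum of CTV(Node(i)) over e's n direct M:1 targets.

    Assumes the M:1 graph is acyclic; a cycle would recurse until Python
    raises RecursionError.
    """

    @lru_cache(maxsize=None)  # each entity's CTV is computed once and reused
    def ctv(entity):
        targets = m1_links.get(entity, [])
        return weight_d * len(targets) + weight_i * sum(ctv(t) for t in targets)

    return {entity: ctv(entity) for entity in m1_links}


scores = ctv_scores(FIG1_M1)
print({entity: round(value, 2) for entity, value in scores.items()})
```

When rounded to two decimals, the scores match the hand calculation, for example 5.04 for A and 2.6 for D. Memoization becomes useful on large ERDs, where the same entity can be reached through many paths.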

www.ischool.drexel.edu


OUR SOLUTION

Existing approaches that classify entities by their attributes are qualitative in nature and focus only on the semantics of the ERD. SAMSTAR instead analyzes the structure of the input ERD in the following steps:

1. Pre-process the input ERD.
2. Store the entities and relationships.
3. Let the user choose weighting factors for direct and indirect relationships.
4. Calculate the CTV for all entities.
5. Calculate the threshold value, Th, for CTV.
6. Identify the entities whose CTV is higher than the threshold Th.
These are the candidates for fact tables.
7. Decide on and shortlist the fact entities.
8. For each fact entity, perform the following steps:
(i) Identify the entities having a direct M:1 link with the fact entity.
(ii) Identify the entities having an indirect M:1 link with the fact entity.
For these entities, look up synonyms of the entity names in WordNet, and
extract the terms that match the DDP entity list or the A_DDP list.
(iii) Combine the results of Steps 8(i) and 8(ii) to prepare a list of candidate
dimensions for the given fact.
(iv) Add the time dimension to this list.
9. Decide on the dimension entities.
10. Let the user post-process the star schemas.
11. Generate the star schema(s).
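Steps 4 to 8 can be sketched in Python as shown below. This is a hypothetical illustration and not the authors' implementation. The example ERD, the `ADDP` set and the `SYNONYMS` table (a stand-in for WordNet) are invented. Steps 1 to 3 and 9 to 11 are omitted. The poster does not say how Th is computed, so the mean CTV is used as an assumed default.

```python
from statistics import mean

# Hypothetical ERD: entity -> entities it has a direct M:1 relationship with.
ERD = {
    "Order-Product": ["Order", "Product"],
    "Order": ["Customer", "Store"],
    "Product": ["Supplier"],
    "Customer": ["Area"],
    "Store": ["Area"],
    "Supplier": [],
    "Area": [],
}
# Hypothetical stand-ins for the A_DDP list and for WordNet synonyms.
ADDP = {"customer", "store", "vendor"}
SYNONYMS = {"supplier": ["vendor"]}


def compute_ctv(m1_links, weight_d=1.0, weight_i=0.8):
    """Step 4: CTV for every entity (assumes an acyclic M:1 graph)."""
    memo = {}

    def ctv(entity):
        if entity not in memo:
            targets = m1_links.get(entity, [])
            memo[entity] = weight_d * len(targets) + weight_i * sum(ctv(t) for t in targets)
        return memo[entity]

    return {entity: ctv(entity) for entity in m1_links}


def fact_candidates(ctv, threshold=None):
    """Steps 5-6: entities whose CTV exceeds Th, highest CTV first.

    Defaulting Th to the mean CTV is an assumption made for this sketch.
    """
    th = mean(ctv.values()) if threshold is None else threshold
    return sorted((e for e, v in ctv.items() if v > th), key=lambda e: -ctv[e])


def indirect_targets(m1_links, fact):
    """Entities reachable from `fact` only through two or more M:1 hops."""
    direct = set(m1_links.get(fact, []))
    reached, stack = set(), list(direct)
    while stack:
        for nxt in m1_links.get(stack.pop(), []):
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached - direct - {fact}


def candidate_dimensions(m1_links, fact, addp, synonyms):
    """Step 8: direct dimensions plus lexically matched indirect ones, plus Time."""
    direct = set(m1_links.get(fact, []))                          # 8(i)
    lexical = {
        e for e in indirect_targets(m1_links, fact)               # 8(ii)
        if ({e.lower()} | set(synonyms.get(e.lower(), []))) & addp
    }
    return sorted(direct | lexical | {"Time"})                    # 8(iii) and 8(iv)


ctv = compute_ctv(ERD)
facts = fact_candidates(ctv)  # step 7 (shortlisting) is left to the user
dims = candidate_dimensions(ERD, "Order-Product", ADDP, SYNONYMS)
```

With these inputs, `facts` is `["Order-Product", "Order"]`. The candidate dimensions for `Order-Product` are Customer, Order, Product, Store, Supplier and Time. `Area` is dropped because neither its name nor any synonym matches the assumed list.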

[Fig. 3: Result of SAMSTAR. Panels: ER Diagram, SAMSTAR, Star Schema]
Schemas generated by SAMSTAR are similar to those produced by the manual steps in published case-study papers.
Given the same ER diagrams, user needs and business goals, the SAMSTAR-generated star schemas are a superset of the schemas generated manually in those papers.
This shows that our schemas are inclusive, covering the facts and dimensions of the manual designs and more, which gives the designer a helpful starting point for pruning the schema according to the business and user requirements.

CONTRIBUTION
•A universal method to generate star schema(s): generalized DDPs and WordNet are used to identify the dimensions of a fact table.
•Quantitative in nature: the method analyzes the structure of the input ERD.
•Identifies a set of fact candidates from a large and complex ERD.
•Automatable to a large extent: it simplifies the work of experienced designers and gives novices a smooth head start.

The paper "SAMSTAR: A Semi-Automated Lexical Method for generating Star Schemas using an Entity Relationship Diagram" was presented at the Data Warehousing and On-line Analytical Processing workshop (DOLAP 2007), held in Lisbon, Portugal, in November 2007.
