MC0088 Internal Assignment (SMU)
MC0088- DATA WAREHOUSING & DATA MINING 
Que.1 Differentiate between Data Mining and Data Warehousing? 
Ans: - 
Data Mining: - 
Data mining is a buzzword for a class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests.
The term is commonly misused to describe software that presents data in new ways. True data 
mining software doesn't just change the presentation, but actually discovers previously unknown 
relationships among the data. 
Data mining consists of many up-to-date techniques such as classification (decision trees, naïve Bayes classifier, k-nearest neighbor, and neural networks), clustering (k-means, hierarchical clustering, and density-based clustering), and association (one-dimensional, multidimensional, multilevel, and constraint-based association). Many years of practice show that data mining is a process, and its successful application requires data preprocessing (dimensionality reduction, cleaning, noise/outlier removal), post-processing (understandability, summary, presentation), a good understanding of the problem domain, and domain expertise.
Data Warehousing: - 
The construction of a data warehouse, which involves data cleaning and data integration, can be viewed as an important preprocessing step for data mining. Moreover, data warehouses provide online analytical processing (OLAP) tools for the interactive analysis of multidimensional data at varied granularities, which facilitates effective data mining. Furthermore, many data mining functions, such as classification, prediction, association, and clustering, can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction. Hence, the data warehouse has become an increasingly important platform for data analysis and online analytical processing, and it provides an effective platform for data mining. Therefore, prior to presenting a systematic coverage of data mining technology in the remainder of this book, we devote this unit to an overview of data warehouse technology. Such an overview is essential for understanding data mining technology.
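The idea of analyzing multidimensional data at varied granularities can be illustrated with an OLAP roll-up. The sketch below is a minimal plain-Python illustration, not how an OLAP server is implemented: it assumes a tiny, invented sales table with a location hierarchy (city to region) and a time dimension (quarter), and aggregates the sales amount at progressively coarser levels.

```python
from collections import defaultdict

# Hypothetical sales fact rows: (region, city, quarter, amount).
# The data are invented purely for illustration.
sales = [
    ("East", "Boston", "Q1", 120.0),
    ("East", "Boston", "Q2", 80.0),
    ("East", "New York", "Q1", 200.0),
    ("West", "Seattle", "Q1", 150.0),
    ("West", "Seattle", "Q2", 90.0),
]

def roll_up(rows, key_fn):
    """Sum the amount column over whatever dimensions key_fn keeps."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)

# Fine granularity: city x quarter.
by_city_quarter = roll_up(sales, lambda r: (r[1], r[2]))
# Roll up the location dimension from city to region.
by_region_quarter = roll_up(sales, lambda r: (r[0], r[2]))
# Roll up further by dropping the time dimension entirely.
by_region = roll_up(sales, lambda r: r[0])
```

Each call keeps fewer or coarser dimensions, which is the essence of moving up a concept hierarchy; drill-down is the same idea in the opposite direction.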
Data warehouses have been defined in many ways, making it difficult to formulate a rigorous 
definition. A data warehouse refers to a database that is maintained separately from an 
organization’s operational databases. Data warehouse systems allow for the integration of a variety 
of application systems. 
Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years.
Que.2 Describe the key features of a Data Warehouse? 
Ans: - 
According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process”.
Key features of a Data Warehouse 
1) Subject-oriented
2) Integrated
3) Time-variant
4) Nonvolatile
Subject-oriented: -
A data warehouse is organized around major subjects, such as customer, supplier, product, and 
sales. Rather than concentrating on the day-to-day operation and transaction processing of an 
organization, a data warehouse focuses on the modeling and analysis of data for decision makers. 
Hence, data warehouses typically provide a simple and concise view around particular subject 
issues by excluding data that are not useful in the decision support process. 
Integrated: - 
A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as 
relational databases, flat files, and online transaction records. Data cleaning and data integration 
techniques are applied to ensure consistency in naming conventions, encoding structures, attribute 
measures, and so on. 
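As an illustration of the consistency problems mentioned above, the following sketch merges two hypothetical customer sources that disagree on attribute names, gender encoding, and units of height. The source layouts, field names, and the `GENDER_CODES` table are all invented for the example.

```python
# Two hypothetical sources describing customers with different naming
# conventions, encodings, and measurement units (invented for illustration).
source_a = [{"cust_id": 1, "gender": "M", "height_cm": 180.0}]
source_b = [{"customerId": 2, "sex": "female", "height_in": 65.0}]

# Hypothetical lookup that unifies the two gender encodings.
GENDER_CODES = {"M": "male", "F": "female", "male": "male", "female": "female"}

def from_a(rec):
    """Map a source-A record onto the unified warehouse schema."""
    return {
        "customer_id": rec["cust_id"],
        "gender": GENDER_CODES[rec["gender"]],
        "height_cm": rec["height_cm"],
    }

def from_b(rec):
    """Map a source-B record, renaming fields and converting inches to cm."""
    return {
        "customer_id": rec["customerId"],
        "gender": GENDER_CODES[rec["sex"]],
        "height_cm": round(rec["height_in"] * 2.54, 1),
    }

integrated = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
```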
Time-variant: -
Data are stored to provide information from a historical perspective (e.g., the past 5 to 10 years). 
Every key structure in the data warehouse contains, either implicitly or explicitly, an element of 
time. 
Nonvolatile: - 
A data warehouse is always a physically separate store of data transformed from the application 
data found in the operational environment. Due to this separation, a data warehouse does not 
require transaction processing, recovery, and concurrency control mechanisms. It usually requires 
only two operations in data accessing: initial loading of data and access of data. 
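The time-variant and nonvolatile properties can be pictured with a toy store that exposes only the two operations named above, loading and access. This is a hypothetical sketch, assuming that every loaded batch is stamped with a snapshot date; the class name `SnapshotWarehouse` and its methods are illustrative and not part of any product.

```python
from datetime import date

class SnapshotWarehouse:
    """Hypothetical toy store with only two operations: load and query.

    Rows are never updated or deleted in place, and every row carries
    a time element (its snapshot date).
    """

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, rows):
        """Append a batch of rows, each stamped with the snapshot date."""
        for row in rows:
            self._rows.append({"snapshot": snapshot_date, **row})

    def query(self, predicate):
        """Return copies of matching rows so callers cannot modify the store."""
        return [dict(r) for r in self._rows if predicate(r)]

wh = SnapshotWarehouse()
wh.load(date(2023, 12, 31), [{"customer": "c1", "balance": 100}])
wh.load(date(2024, 12, 31), [{"customer": "c1", "balance": 250}])
history = wh.query(lambda r: r["customer"] == "c1")
```

Because new facts arrive as new dated snapshots rather than overwriting old rows, the store keeps a history by construction.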
The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases (examples include IBM Data Joiner and Informix DataBlade). When a query is posed at a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set.
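The query-driven (mediator) approach can be sketched as follows, assuming two hypothetical sites with different record layouts. Each wrapper plays the role of a local query processor, the `METADATA` dictionary stands in for the metadata dictionary, and the mediator integrates the partial results into one answer set. All names and data are invented for illustration.

```python
# Two hypothetical heterogeneous sites with different record layouts.
SITE_1 = [{"name": "Alice", "city": "Pune"}]      # dict records
SITE_2 = [("Bob", "Delhi"), ("Carol", "Pune")]    # (name, city) tuples

def wrapper_site1(city):
    """Local query processor for site 1."""
    return [r["name"] for r in SITE_1 if r["city"] == city]

def wrapper_site2(city):
    """Local query processor for site 2."""
    return [name for name, c in SITE_2 if c == city]

# Stand-in for the metadata dictionary: which local query serves each site.
METADATA = {"site1": wrapper_site1, "site2": wrapper_site2}

def mediator(city):
    """Translate a global 'customers in city' query into local queries,
    send them to every site, and integrate the results."""
    answer = set()
    for local_query in METADATA.values():
        answer.update(local_query(city))
    return sorted(answer)
```

Every global query is translated and re-run against all sites at query time, which can make this approach costly for frequent analytical queries.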
Que. 3 Differentiate between Data Integration and Transformation? 
Ans: - 
Data Integration: - 
Data integration is one of the steps of data preprocessing. It involves combining data residing in different sources and providing users with a unified view of these data, that is, merging data from multiple data stores (data sources). It appears in areas such as the following: -
1) Data Migration 
2) Data Synchronization 
3) ETL 
4) Business Intelligence 
5) Master Data Management 
Data Migration: - 
Data Migration is the process of transferring data from one system to another while changing the 
storage, database or application. 
Data Synchronization: - 
Data Synchronization is a process of establishing consistency among systems and subsequent 
continuous updates to maintain consistency. 
ETL: - 
ETL comes from data warehousing and stands for Extract-Transform-Load. ETL covers the process by which data are loaded from the source system into the data warehouse. 
Business Intelligence: - 
Business Intelligence (BI) is a set of tools supporting the transformation of raw data into useful 
information which can support decision making. 
Master Data Management: - 
Master Data Management (MDM) represents a set of tools and processes used by an enterprise to 
consistently manage their non-transactional data.
Data Transformation: -
Data transformation is the process of converting data from one format (e.g. a database file, XML 
document, or Excel sheet) to another. Because data often resides in different locations and formats 
across the enterprise, data transformation is necessary to ensure data from one application or 
database is intelligible to other applications and databases, a critical feature for application integration.
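A minimal example of format conversion, assuming a hypothetical CSV export with `id`, `product`, and `price` columns, uses Python's standard `csv` and `json` modules to turn the rows into a JSON array that another application could consume.

```python
import csv
import io
import json

# Hypothetical CSV export from a source application.
csv_text = "id,product,price\n1,Pen,1.50\n2,Notebook,3.25\n"

def csv_to_json(text):
    """Convert CSV text to a JSON array of objects, casting numeric fields."""
    reader = csv.DictReader(io.StringIO(text))
    records = [
        {"id": int(row["id"]), "product": row["product"], "price": float(row["price"])}
        for row in reader
    ]
    return json.dumps(records)

json_text = csv_to_json(csv_text)
```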
In a typical scenario where information needs to be shared, data is extracted from the source 
application or data warehouse, transformed into another format, and then loaded into the target 
location. Extraction, transformation, and loading (together known as ETL) are the central processes 
of data integration. Depending on the nature of the integration scenario, data may need to be 
merged, aggregated, enriched, summarized, or filtered. 
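The extract, transform, and load stages can be sketched as three small functions. This is a hypothetical example: the order rows, the `status` filter, and the per-region aggregation are invented to show filtering and aggregation during transformation, and a plain dictionary stands in for the target table.

```python
from collections import defaultdict

# Hypothetical source rows: (order_id, region, amount, status).
source_rows = [
    (1, "north", 100.0, "paid"),
    (2, "north", 50.0, "cancelled"),
    (3, "south", 70.0, "paid"),
    (4, "north", 30.0, "paid"),
]

def extract():
    """Extract: read rows from the (hypothetical) source system."""
    return list(source_rows)

def transform(rows):
    """Transform: filter out cancelled orders, normalize region names,
    and aggregate amounts per region."""
    totals = defaultdict(float)
    for _, region, amount, status in rows:
        if status == "paid":
            totals[region.upper()] += amount
    return dict(totals)

def load(summary, target):
    """Load: write the summarized rows into the target store."""
    target.update(summary)
    return target

warehouse_table = load(transform(extract()), {})
```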
The first step of data transformation is data mapping. Data mapping determines the relationship 
between the data elements of two applications and establishes instructions for how the data from 
the source application is transformed before it is loaded into the target application. In other words, 
data mapping produces the critical metadata that is needed before the actual data conversion takes 
place.
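Data mapping can be represented as metadata that tells the transformation step, for each target field, which source field to read and how to convert it. The sketch below assumes a hypothetical source record with fields `CUST_NO`, `NAME`, and `JOIN_DT` (day/month/year) and a hypothetical target schema with `customer_id`, `full_name`, and `joined` (ISO year-month-day).

```python
# Hypothetical mapping metadata: target field -> (source field, converter).
FIELD_MAP = {
    "customer_id": ("CUST_NO", int),
    "full_name": ("NAME", str.strip),
    # Reorder "dd/mm/yyyy" into "yyyy-mm-dd".
    "joined": ("JOIN_DT", lambda s: "-".join(reversed(s.split("/")))),
}

def apply_mapping(source_record, field_map):
    """Build a target record by following the mapping instructions."""
    return {
        target: convert(source_record[source])
        for target, (source, convert) in field_map.items()
    }

src = {"CUST_NO": "0042", "NAME": "  Asha Rao ", "JOIN_DT": "15/03/2021"}
target = apply_mapping(src, FIELD_MAP)
```

Keeping the mapping in a data structure rather than in code means the same `apply_mapping` function can serve many source-target pairs.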
Que. 4 Differentiate between database management systems (DBMS) and data mining? 
Ans: - 
Database Management System (DBMS) is the software that manages data on physical storage 
devices. 
Data Mining: - Data mining is the process of discovering relationships among data in the database. 
| Area | DBMS | Data mining |
|---|---|---|
| Task | Extraction of detailed and summary data | Knowledge discovery of hidden patterns and insights |
| Type of result | Information | Insight and prediction |
| Method | Deduction (ask the question, verify the data) | Induction (build the model, apply it to new data, get the result) |
| Example question | Who purchased mutual funds in the last 3 years? | Who will buy a mutual fund in the next 6 months and why? |
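To make the deduction side of the table concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `purchases` table, its columns, and the rows are hypothetical, and the three-year window is taken relative to a fixed reference date of 2024-06-01 so the example is reproducible. The answer is retrieved from stored facts, not predicted.

```python
import sqlite3

# In-memory database with an invented purchases table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, product TEXT, bought_on TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("Ravi", "mutual fund", "2023-05-10"),
        ("Meena", "mutual fund", "2019-01-20"),
        ("John", "fixed deposit", "2024-02-01"),
    ],
)

# Deduction: ask a precise question and verify it against the stored data.
# ISO date strings compare correctly as text.
cutoff = "2021-06-01"  # three years before the assumed reference date 2024-06-01
rows = conn.execute(
    "SELECT customer FROM purchases WHERE product = ? AND bought_on >= ?",
    ("mutual fund", cutoff),
).fetchall()
```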
Data mining is concerned with finding hidden relationships present in business data to allow businesses to make predictions about the future. It is the process of data-driven extraction of not-so-obvious but useful information from large databases.
The aim of data mining is to extract implicit, previously unknown, and potentially useful (or actionable) patterns from data. Data mining consists of many up-to-date techniques such as classification (decision trees, naïve Bayes classifier, k-nearest neighbor, and neural networks), clustering (k-means, hierarchical clustering, and density-based clustering), and association (one-dimensional, multidimensional, multilevel, and constraint-based association).
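As a small illustration of one of the classification techniques listed above, the following sketch implements k-nearest neighbor with Euclidean distance and majority voting. The training data (age and income in thousands, labeled by whether the customer bought a mutual fund) are invented, and k = 3 is an arbitrary choice. Unlike a database query, the result is a prediction for a customer who is not in the data, i.e., induction.

```python
import math
from collections import Counter

# Invented training data: (age, income in thousands) -> bought a mutual fund?
train = [
    ((25, 30), "no"),
    ((30, 40), "no"),
    ((45, 80), "yes"),
    ((50, 90), "yes"),
    ((40, 75), "yes"),
]

def knn_predict(x, data, k=3):
    """Classify x by majority vote among its k nearest training points,
    using Euclidean distance (math.dist, Python 3.8+)."""
    neighbours = sorted(data, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

prediction = knn_predict((42, 70), train)
```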
Data warehousing is defined as a process of centralized data management and retrieval.
A data warehouse is a relational database system designed to support very large databases (VLDB) at a significantly higher level of performance and manageability.
A data warehouse is an environment, not a product. It is an architectural construct of information that is hard to access or present in traditional operational data stores.
Que. 5 Differentiate between K-means and Hierarchical clustering? 
Ans: - 
K-means clustering 
The k-means algorithm assigns each point to the cluster whose center (also called centroid) is 
nearest. The center is the average of all the points in the cluster — that is, its coordinates are the 
arithmetic mean for each dimension separately over all the points in the cluster. 
Example: The data set has three dimensions and the cluster has two points: X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid Z = (z1, z2, z3) is given by $z_i = \frac{x_i + y_i}{2}$ for $i = 1, 2, 3$.
The algorithm steps are as under: -
1) Choose the number of clusters, k.
2) Randomly generate k clusters and determine the cluster centers, or directly generate k random points as cluster centers.
3) Assign each point to the nearest cluster center, where "nearest" is defined with respect to a chosen distance measure (for example, Euclidean distance).
4) Recompute the new cluster centers.
5) Repeat the two previous steps until some convergence criterion is met (usually that the assignment hasn't changed).
The main advantages of this algorithm are its simplicity and speed, which allow it to run on large datasets. Its disadvantage is that it does not necessarily yield the same result with each run, since the resulting clusters depend on the initial random assignments.
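The k-means steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: squared Euclidean distance, initial centers sampled directly from the data points, and a fixed `seed` so runs are repeatable (changing the seed may change the result, which is the disadvantage noted above). The sample points are invented; `mean` also reproduces the two-point centroid formula from the example.

```python
import random

def mean(points):
    """Centroid: coordinate-wise arithmetic mean of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, seed=0, max_iter=100):
    """Plain k-means with k randomly chosen data points as initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Converged: no point changed cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return centers, assignment

centroid = mean([(1, 2, 3), (3, 4, 5)])  # two 3-D points, as in the X, Y example
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, labels = k_means(points, k=2)
```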
Hierarchical clustering: - 
Hierarchical clustering creates a hierarchy of clusters which may be represented in a tree structure 
called a dendrogram. The root of the tree consists of a single cluster containing all observations, 
and the leaves correspond to individual observations. 
Algorithms for hierarchical clustering are generally either agglomerative, in which one starts at the 
leaves and successively merges clusters together; or divisive, in which one starts at the root and 
recursively splits the clusters. 
Any non-negative-valued function may be used as a measure of similarity between pairs of 
observations. The choice of which clusters to merge or split is determined by a linkage criterion, 
which is a function of the pairwise distances between observations. 
Cutting the tree at a given height gives a clustering at a selected precision. In the example dendrogram over six elements a to f, cutting after the second row yields clusters {a} {b c} {d e} {f}. Cutting after the third row yields clusters {a} {b c} {d e f}, which is a coarser clustering, with a smaller number of larger clusters.
This method builds the hierarchy from the individual elements by progressively merging clusters. In our example, we have six elements {a} {b} {c} {d} {e} and {f}. The first step is to determine which elements to merge into a cluster; usually, the two closest elements according to the chosen distance are merged first.
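The agglomerative procedure can be sketched as below. The one-dimensional positions for a to f are hypothetical, chosen so that the merge order reproduces the clusterings described above, and single linkage (distance between the closest members of two clusters) is used as one possible linkage criterion. `levels[i]` is the clustering after `i` merges.

```python
def single_linkage(c1, c2, pos):
    """Single-linkage distance: closest pair of members across two clusters."""
    return min(abs(pos[a] - pos[b]) for a in c1 for b in c2)

def agglomerate(pos):
    """Start from singleton clusters and repeatedly merge the two closest
    clusters, recording the clustering after every merge."""
    clusters = [frozenset([name]) for name in sorted(pos)]
    history = [list(clusters)]
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]], pos))
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
        history.append(list(clusters))
    return history

# Hypothetical 1-D positions for the six elements a..f.
positions = {"a": 0.0, "b": 4.0, "c": 4.5, "d": 8.0, "e": 8.6, "f": 10.0}
levels = agglomerate(positions)
```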
Que. 6 Differentiate between Web content mining and Web usage mining? 
Ans: - 
Web Content Mining: - 
Web content mining targets knowledge discovery in which the main objects are the traditional collections of text documents and, increasingly, collections of multimedia documents such as images, video, and audio, which are embedded in or linked to web pages. It is quite different from data mining, because web data are mainly semi-structured and/or unstructured, while data mining deals primarily with structured data. Web content mining also differs from text mining because of the semi-structured nature of the Web, while text mining focuses on unstructured texts. Web content mining thus requires creative applications of data mining and/or text mining techniques, as well as its own unique approaches. In recent years there has been a rapid expansion of activity in the web content mining area. This is not surprising, given the phenomenal growth of web content and the significant economic benefit of such mining. However, due to the heterogeneity and lack of structure of web data, automated discovery of targeted or unexpected knowledge still presents many challenging research problems. Web content mining can be approached from two points of view: 
1) Agent-based approach 
2) Database approach. 
The first approach aims at improving information finding and filtering.
The second approach aims at modeling the data on the Web in a more structured form, so that standard database querying mechanisms and data mining applications can be applied to analyze it.
Web Usage Mining: - 
Web usage mining focuses on techniques that can predict the behavior of users while they are interacting with the WWW. Web usage mining discovers user navigation patterns from web data: it tries to discover useful information from the secondary data derived from users' interactions while surfing the Web.
There are several research projects and commercial tools that analyze these patterns for different purposes. The resulting knowledge can be used for personalization, system improvement, site modification, business intelligence, and usage characterization. The only information left behind by many users visiting a website is the path through the pages they have accessed. Most web information retrieval tools use only the textual information and ignore the link information, which could be very valuable. In general, four main kinds of data mining techniques are applied in the web mining domain to discover user navigation patterns: 
1) Association Rule mining 
2) Sequential pattern 
3) Clustering 
4) Classification
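Two of the techniques listed above can be illustrated on clickstream paths. In this hypothetical sketch the sessions are invented: `frequent_transitions` gives a sequential-pattern flavour by counting consecutive page pairs, and `co_visited` gives an association flavour by counting unordered page pairs seen in the same session. Real systems would first reconstruct sessions from server logs and use algorithms such as Apriori; this only shows the counting idea.

```python
from collections import Counter
from itertools import combinations

# Invented clickstream sessions: ordered page paths per visit.
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "cart"],
    ["home", "blog"],
    ["home", "products", "checkout"],
]

def frequent_transitions(sessions, min_support):
    """Sequential flavour: consecutive page pairs (A -> B) that occur
    in at least min_support sessions (each pair counted once per session)."""
    counts = Counter()
    for path in sessions:
        counts.update(set(zip(path, path[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_support}

def co_visited(sessions, min_support):
    """Association flavour: unordered page pairs visited in the same session."""
    counts = Counter()
    for path in sessions:
        counts.update(combinations(sorted(set(path)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

For example, `frequent_transitions(sessions, 2)` keeps only the home-to-products and products-to-cart steps for this invented data.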

Más contenido relacionado

La actualidad más candente

Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CSThanveen
 
Top Data Mining Techniques and Their Applications
Top Data Mining Techniques and Their ApplicationsTop Data Mining Techniques and Their Applications
Top Data Mining Techniques and Their ApplicationsPromptCloud
 
Data Mining: What is Data Mining?
Data Mining: What is Data Mining?Data Mining: What is Data Mining?
Data Mining: What is Data Mining?Seerat Malik
 
Data Mining – analyse Bank Marketing Data Set by WEKA.
Data Mining – analyse Bank Marketing Data Set by WEKA.Data Mining – analyse Bank Marketing Data Set by WEKA.
Data Mining – analyse Bank Marketing Data Set by WEKA.Mateusz Brzoska
 
Importance of Data Mining
Importance of Data MiningImportance of Data Mining
Importance of Data MiningScottperrone
 
Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar reportmayurik19
 
Application areas of data mining
Application areas of data miningApplication areas of data mining
Application areas of data miningpriya jain
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and workAmr Abd El Latief
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining Sushil Kulkarni
 

La actualidad más candente (19)

Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CS
 
Top Data Mining Techniques and Their Applications
Top Data Mining Techniques and Their ApplicationsTop Data Mining Techniques and Their Applications
Top Data Mining Techniques and Their Applications
 
Data Mining
Data MiningData Mining
Data Mining
 
Data mining
Data mining Data mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 
Data Mining: What is Data Mining?
Data Mining: What is Data Mining?Data Mining: What is Data Mining?
Data Mining: What is Data Mining?
 
Data mining
Data miningData mining
Data mining
 
Data Mining – analyse Bank Marketing Data Set by WEKA.
Data Mining – analyse Bank Marketing Data Set by WEKA.Data Mining – analyse Bank Marketing Data Set by WEKA.
Data Mining – analyse Bank Marketing Data Set by WEKA.
 
Importance of Data Mining
Importance of Data MiningImportance of Data Mining
Importance of Data Mining
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar report
 
Data Mining
Data MiningData Mining
Data Mining
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Application areas of data mining
Application areas of data miningApplication areas of data mining
Application areas of data mining
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining
 

Similar a MC0088 Internal Assignment (SMU)

Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
Introduction-to-Databases.pptx
Introduction-to-Databases.pptxIntroduction-to-Databases.pptx
Introduction-to-Databases.pptxIvanDarrylLopez
 
Advances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyAdvances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyKate Campbell
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEditor IJCATR
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEditor IJCATR
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEditor IJCATR
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEditor IJCATR
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345AkhilSinghal21
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forAyushMeraki1
 
BVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxBVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxDrNilimaThakur
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingsumit621
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 

Similar a MC0088 Internal Assignment (SMU) (20)

MS-CIT Unit 9.pptx
MS-CIT Unit 9.pptxMS-CIT Unit 9.pptx
MS-CIT Unit 9.pptx
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Introduction-to-Databases.pptx
Introduction-to-Databases.pptxIntroduction-to-Databases.pptx
Introduction-to-Databases.pptx
 
Data Mining
Data MiningData Mining
Data Mining
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
DW 101
DW 101DW 101
DW 101
 
Advances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyAdvances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing Technology
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data Access
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data Access
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data Access
 
Enhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data AccessEnhancing Data Staging as a Mechanism for Fast Data Access
Enhancing Data Staging as a Mechanism for Fast Data Access
 
ijcatr04081001
ijcatr04081001ijcatr04081001
ijcatr04081001
 
Abstract
AbstractAbstract
Abstract
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining for
 
BVRM 402 IMS UNIT V
BVRM 402 IMS UNIT VBVRM 402 IMS UNIT V
BVRM 402 IMS UNIT V
 
BVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxBVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptx
 
Big data and oracle
Big data and oracleBig data and oracle
Big data and oracle
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 

Último

Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsManeerUddin
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 

Último (20)

Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 

MC0088 Internal Assignment (SMU)

  • 1. MC0088- DATA WAREHOUSING & DATA MINING Que.1 Differentiate between Data Mining and Data Warehousing? Ans: - Data Mining: - Data Mining: A hot buzzword for a class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data. Data mining consists of many up-to-date techniques such as classification (decision trees, native Bayes classifier, k-nearest neighbor, and neural networks), clustering (k-means, hierarchical clustering, and density-based clustering), association (one-dimensional, multidimensional, multilevel association, constraint-based association). Many years of practice show that data mining is a process, and its successful application requires data preprocessing (dimensionality reduction, cleaning, noise/outlier removal), post processing (understand ability, summary, presentation), good understanding of problem domains and domain expertise. Data Warehousing: - The construction of data warehouse, which involves data cleaning and data integration, can be viewed as an important preprocessing step for data mining. Moreover, data warehouses provide on-line analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitate effective data mining. Furthermore, many other data mining functions such as classification, prediction, association and clustering can be integrated with OLAP operation to enhance interactive mining of knowledge at multiple levels of abstraction. Hence, the data warehouse has become an increasingly important platform for data analysis and online analytical processing and will provide an effective platform for data mining. 
Therefore, prior to presenting a systematic coverage of data mining technology in the remainder of this book, we devote this unit to an overview of data warehouse technology. Such an overview is essential for understanding data mining technology. Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. A data warehouse refers to a database that is maintained separately from an organization’s operational databases. Data warehouse systems allow for the integration of a variety of application systems. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the concept itself has been around for years.
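To make the idea of analysing multidimensional data "of varied granularities" concrete, here is a minimal Python sketch of an OLAP-style roll-up over a tiny fact table. The table `FACTS`, its dimensions, and the helper `roll_up` are hypothetical illustrations, not part of any OLAP product; a real warehouse would run such aggregations in SQL or a dedicated OLAP engine.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales amount)
FACTS = [
    (2023, "Q1", "East", "TV", 120.0),
    (2023, "Q1", "West", "TV", 80.0),
    (2023, "Q2", "East", "Radio", 40.0),
    (2024, "Q1", "East", "TV", 150.0),
    (2024, "Q2", "West", "Radio", 60.0),
]

DIMENSIONS = ("year", "quarter", "region", "product")


def roll_up(facts, group_by):
    """Sum the sales measure over every dimension not listed in group_by.

    Grouping by fewer dimensions gives a coarser (rolled-up) view;
    grouping by more dimensions drills down to a finer granularity.
    """
    indexes = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[-1]  # the last column is the sales measure
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(FACTS, ["year"]))            # coarse: one total per year
    print(roll_up(FACTS, ["year", "region"]))  # finer: per year and region
```

Calling `roll_up(FACTS, ["year"])` gives yearly totals, while adding `"region"` drills down to a finer level; this is the kind of multilevel view on which mining functions can then operate.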
  • 2. Que.2 Describe the key features of a Data Warehouse? Ans: - According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process.” Key features of a Data Warehouse: 1) Subject-oriented 2) Integrated 3) Time-variant 4) Nonvolatile Subject-oriented: - A data warehouse is organized around major subjects, such as customer, supplier, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process. Integrated: - A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and online transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on. Time-variant: - Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time. Nonvolatile: - A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data. 
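The time-variant and nonvolatile properties can be illustrated with a small Python sketch, assuming a hypothetical `WarehouseTable` that exposes only a load operation and a read operation. Every record carries an explicit time element (`snapshot_date`), and nothing is updated in place, so history accumulates instead of being overwritten. This illustrates the properties only; it is not how any particular warehouse product stores data.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class SalesSnapshot:
    """One warehouse record; snapshot_date is its explicit time element."""
    snapshot_date: date
    customer: str       # subject-oriented: organized around the customer subject
    total_sales: float


@dataclass
class WarehouseTable:
    """Hypothetical append-only table: it offers loading and reading, no update."""
    _rows: list = field(default_factory=list)

    def load(self, rows):
        # Loading only appends; existing history is never modified or deleted.
        self._rows.extend(rows)

    def history(self, customer):
        # Reading returns one customer's records ordered by time.
        return sorted(
            (r for r in self._rows if r.customer == customer),
            key=lambda r: r.snapshot_date,
        )
```

Because records are frozen and the table has no update method, a later load adds a new snapshot next to the old ones, which is what lets analysts compare a customer's figures across years.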
The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases (examples include IBM DataJoiner and Informix DataBlade). When a query is posed at a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set.
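The query-driven (mediator) approach described above can be sketched in Python as follows. The sites, the `METADATA` dictionary, and the functions `local_query` and `mediated_query` are all hypothetical; in a real system the local query processors would be separate database engines reached over a network.

```python
# Hypothetical metadata dictionary: global attribute -> local column name, per site.
METADATA = {
    "site_a": {"customer": "cust_name", "city": "town"},
    "site_b": {"customer": "client", "city": "city"},
}

# Hypothetical data held by each heterogeneous source.
LOCAL_DATA = {
    "site_a": [{"cust_name": "Asha", "town": "Pune"}],
    "site_b": [{"client": "Ravi", "city": "Pune"}, {"client": "Meera", "city": "Delhi"}],
}


def local_query(site, column, value):
    """Stand-in for one site's local query processor."""
    return [row for row in LOCAL_DATA[site] if row[column] == value]


def mediated_query(attribute, value, select):
    """Translate a global query per site, run it locally, and integrate the results."""
    answer = []
    for site, mapping in METADATA.items():
        # Translate the global attribute names into this site's column names.
        rows = local_query(site, mapping[attribute], value)
        answer.extend(row[mapping[select]] for row in rows)
    return sorted(answer)  # the integrated global answer set
```

For example, `mediated_query("city", "Pune", "customer")` collects matching customers from both sites even though each site names its columns differently.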
  • 3. Que. 3 Differentiate between Data Integration and Transformation? Ans: - Data Integration: - Data integration is one of the steps of data preprocessing; it involves combining data residing in different sources and providing users with a unified view of these data. It merges data from multiple data stores (data sources) and covers the following areas: 1) Data Migration 2) Data Synchronization 3) ETL 4) Business Intelligence 5) Master Data Management Data Migration: - Data migration is the process of transferring data from one system to another, for example when changing the storage, database, or application. Data Synchronization: - Data synchronization is the process of establishing consistency among systems and then continuously updating them to maintain that consistency. ETL: - ETL comes from data warehousing and stands for Extract-Transform-Load. ETL covers how data are loaded from the source systems into the data warehouse. Business Intelligence: - Business Intelligence (BI) is a set of tools supporting the transformation of raw data into useful information that can support decision making. Master Data Management: - Master Data Management (MDM) represents a set of tools and processes used by an enterprise to consistently manage its non-transactional data.
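As a minimal illustration of providing users with a unified view, the Python sketch below merges two hypothetical source systems (`BILLING` and `CRM`) that identify the same customers under different key names. The source names and the merge rule (an outer merge on customer ID, with missing attributes set to `None`) are assumptions chosen for the example.

```python
# Hypothetical source systems holding different facts about the same customers.
BILLING = [{"cust_id": 1, "balance": 250.0}, {"cust_id": 2, "balance": 0.0}]
CRM = [{"id": 1, "name": "Asha"}, {"id": 3, "name": "Meera"}]


def _empty_customer():
    """Template for a unified record before any source has filled it in."""
    return {"name": None, "balance": None}


def unified_customer_view(billing, crm):
    """Outer-merge both sources on customer ID into one record per customer."""
    view = {}
    for row in billing:
        # Billing calls the key "cust_id".
        view.setdefault(row["cust_id"], _empty_customer())["balance"] = row["balance"]
    for row in crm:
        # CRM calls the same key "id"; reconciling such naming is part of integration.
        view.setdefault(row["id"], _empty_customer())["name"] = row["name"]
    return view
```

Customers known to only one source still appear, with the attributes that source could not supply left as `None`.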
  • 4. Data Transformation: - Data transformation is the process of converting data from one format (e.g., a database file, XML document, or Excel sheet) to another. Because data often reside in different locations and formats across the enterprise, data transformation is necessary to ensure that data from one application or database are intelligible to other applications and databases, a critical feature for application integration. In a typical scenario where information needs to be shared, data are extracted from the source application or data warehouse, transformed into another format, and then loaded into the target location. Extraction, transformation, and loading (together known as ETL) are the central processes of data integration. Depending on the nature of the integration scenario, data may need to be merged, aggregated, enriched, summarized, or filtered. The first step of data transformation is data mapping. Data mapping determines the relationship between the data elements of two applications and establishes instructions for how the data from the source application are transformed before they are loaded into the target application. In other words, data mapping produces the critical metadata that is needed before the actual data conversion takes place.
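The sketch below illustrates data mapping followed by transformation, assuming a hypothetical data map `DATA_MAP` that relates source CSV columns to target JSON fields and states how each value is converted. It uses only Python's standard `csv` and `json` modules; the column names and conversion rules are invented for illustration.

```python
import csv
import io
import json

# Hypothetical data map (the metadata): source column -> (target field, converter).
DATA_MAP = {
    "EMP_NO": ("employee_id", int),
    "FULL_NAME": ("name", str.strip),
    "SAL_INR": ("salary", float),
}


def transform(csv_text, data_map):
    """Extract rows from CSV text, apply the data map, and emit a JSON document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Rename each field and convert its value as the data map instructs.
        record = {target: convert(row[source])
                  for source, (target, convert) in data_map.items()}
        records.append(record)
    return json.dumps(records)


SOURCE = "EMP_NO,FULL_NAME,SAL_INR\n7, Asha Rao ,52000\n9,Ravi,48000.5\n"
```

Here `transform(SOURCE, DATA_MAP)` produces JSON records with `employee_id`, `name`, and `salary` fields; the data map is written first, and the conversion simply follows it.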
  • 5. Que. 4 Differentiate between database management systems (DBMS) and data mining? Ans: - Database Management System (DBMS): - A DBMS is the software that manages data on physical storage devices. Data Mining: - Data mining is the process of discovering relationships among data in the database.

| Area | DBMS | Data mining |
|---|---|---|
| Task | Extraction of detailed and summary data | Knowledge discovery of hidden patterns and insights |
| Type of result | Information | Insight and prediction |
| Method | Deduction (ask the question, verify the data) | Induction (build the model, apply it to new data, get the result) |
| Example question | Who purchased mutual funds in the last 3 years? | Who will buy a mutual fund in the next 6 months, and why? |

Data mining is concerned with finding hidden relationships present in business data to allow businesses to make predictions for future use. It is the process of data-driven extraction of not-so-obvious but useful information from large databases. The aim of data mining is to extract implicit, previously unknown, and potentially useful (or actionable) patterns from data. Data mining consists of many up-to-date techniques such as classification (decision trees, naïve Bayes classifier, k-nearest neighbor, and neural networks), clustering (k-means, hierarchical clustering, and density-based clustering), and association (one-dimensional, multidimensional, multilevel, and constraint-based association). Data warehousing is defined as a process of centralized data management and retrieval. A data warehouse is a relational database system designed to support very large databases (VLDB) at a significantly higher level of performance and manageability. A data warehouse is an environment, not a product: it is an architectural construct of information that is hard to access or present in traditional operational data stores.
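The contrast between deduction and induction in the table can be sketched in Python. `who_purchased` answers a precise question by filtering stored facts (DBMS-style), while `predict_buyer` induces an answer for a new customer from past examples using a 1-nearest-neighbour rule, one of the classification techniques named above. All data, the feature choice (age and income), and the function names are hypothetical, and a fixed cut-off date stands in for "the last 3 years".

```python
from datetime import date

# Hypothetical purchase history: (customer, product, purchase date).
PURCHASES = [
    ("Asha", "mutual fund", date(2023, 5, 1)),
    ("Ravi", "fixed deposit", date(2022, 8, 9)),
    ("Meera", "mutual fund", date(2019, 1, 15)),
]


def who_purchased(product, since):
    """Deduction: ask a precise question and retrieve the matching stored facts."""
    return sorted({c for c, p, d in PURCHASES if p == product and d >= since})


# Hypothetical training data: (age, income in lakhs) -> later bought a mutual fund?
TRAINING = [((25, 4.0), False), ((35, 12.0), True), ((45, 15.0), True), ((23, 3.5), False)]


def predict_buyer(features):
    """Induction: label a new customer like its nearest past example (1-NN)."""
    def distance(example):
        (age, income), _ = example
        return (age - features[0]) ** 2 + (income - features[1]) ** 2  # squared Euclidean
    return min(TRAINING, key=distance)[1]
```

The features are not scaled, so age dominates the distance here; a practical model would normalize features and use far more training data.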
  • 6. Que. 5 Differentiate between K-means and Hierarchical clustering? Ans: - K-means clustering: - The k-means algorithm assigns each point to the cluster whose center (also called the centroid) is nearest. The center is the average of all the points in the cluster; that is, its coordinates are the arithmetic mean for each dimension separately over all the points in the cluster. Example: The data set has three dimensions and the cluster has two points: X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid Z becomes Z = (z1, z2, z3), where z1 = (x1 + y1)/2, z2 = (x2 + y2)/2, and z3 = (x3 + y3)/2. The algorithm steps are as under: - 1) Choose the number of clusters, k. 2) Randomly generate k clusters and determine the cluster centers, or directly generate k random points as cluster centers. 3) Assign each point to the nearest cluster center, where "nearest" is defined with respect to a chosen distance measure (commonly Euclidean distance). 4) Recompute the new cluster centers. 5) Repeat the two previous steps until some convergence criterion is met (usually that the assignment hasn't changed). The main advantages of this algorithm are its simplicity and speed, which allow it to run on large datasets. Its disadvantage is that it does not yield the same result with each run, since the resulting clusters depend on the initial random assignments. Hierarchical clustering: - Hierarchical clustering creates a hierarchy of clusters, which may be represented in a tree structure called a dendrogram. The root of the tree consists of a single cluster containing all observations, and the leaves correspond to individual observations. Algorithms for hierarchical clustering are generally either agglomerative, in which one starts at the leaves and successively merges clusters together, or divisive, in which one starts at the root and recursively splits the clusters. Any non-negative-valued function may be used as a measure of similarity between pairs of observations. 
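The k-means steps listed above can be sketched in Python roughly as follows. This is a minimal version of the standard iterative procedure (often called Lloyd's algorithm), assuming Euclidean distance and initial centers drawn as k random data points; the function names and the `seed` and `max_iter` parameters are illustrative choices.

```python
import math
import random


def centroid(points):
    """Arithmetic mean of each dimension separately over the given points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def k_means(points, k, seed=0, max_iter=100):
    """Basic k-means with Euclidean distance; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 2: k random data points as initial centers
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign each point to the index of its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:  # step 5: stop once assignments are unchanged
            break
        assignment = new_assignment
        # Step 4: recompute each center as the centroid of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = centroid(members)
    return centers, assignment
```

`centroid([(1, 2, 3), (3, 4, 5)])` reproduces the two-point example, giving `(2.0, 3.0, 4.0)`. Changing `seed` changes the initial centers, which is why k-means can return different clusterings on different runs when the data are less well separated.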
The choice of which clusters to merge or split is determined by a linkage criterion, which is a function of the pairwise distances between observations. Cutting the tree at a given height gives a clustering at a selected precision. The agglomerative method builds the hierarchy from the individual elements by progressively merging clusters. Consider an example with six elements {a} {b} {c} {d} {e} and {f}; the first step is to determine which elements to merge into a cluster. If the merges proceed as {b c}, then {d e}, then {d e f}, cutting the dendrogram after the second row yields clusters {a} {b c} {d e} {f}. Cutting after the third row yields clusters {a} {b c} {d e f}, which is a coarser clustering, with a smaller number of larger clusters.
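The agglomerative procedure can be sketched in Python as below, assuming single linkage (the distance between the closest members of two clusters) as the linkage criterion and Euclidean distance between elements. The coordinates in `POINTS` are hypothetical, chosen so that the merges reproduce the six-element example: first {b c}, then {d e}, then {d e f}.

```python
import math

# Hypothetical coordinates for the six elements of the example.
POINTS = {"a": (0, 0), "b": (10, 0), "c": (11, 0), "d": (15, 0), "e": (16.5, 0), "f": (19, 0)}


def single_linkage(dist, a, b):
    """Single-linkage criterion: distance between the closest members of two clusters."""
    return min(dist[x][y] for x in a for y in b)


def agglomerative(points, linkage=single_linkage):
    """Merge the closest pair of clusters until one remains; return every level."""
    names = list(points)
    dist = {x: {y: math.dist(points[x], points[y]) for y in names} for x in names}
    clusters = [frozenset([n]) for n in names]  # leaves: one cluster per element
    levels = [list(clusters)]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(dist, clusters[pq[0]], clusters[pq[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        levels.append(list(clusters))
    return levels
```

In the result, `levels[2]` corresponds to cutting after the second row ({a} {b c} {d e} {f}) and `levels[3]` to cutting after the third row ({a} {b c} {d e f}). Swapping in a different `linkage` function (for example, complete linkage using `max`) can change the merge order.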
  • 7. Que. 6 Differentiate between Web content mining and Web usage mining? Ans: - Web Content Mining: - Web content mining targets knowledge discovery in which the main objects are the traditional collections of multimedia documents, such as images, video, and audio, which are embedded in or linked to web pages. It is quite different from data mining, because Web data are mainly semi-structured and/or unstructured, while data mining deals primarily with structured data. Web content mining is also different from text mining because of the semi-structured nature of the Web, while text mining focuses on unstructured texts. Web content mining thus requires creative applications of data mining and/or text mining techniques, as well as its own unique approaches. In the past few years, activity in the Web content mining area has expanded rapidly. This is not surprising, given the phenomenal growth of Web content and the significant economic benefit of such mining. However, due to the heterogeneity and lack of structure of Web data, automated discovery of targeted or unexpected knowledge still presents many challenging research problems. Web content mining can be approached from two points of view: 1) the agent-based approach and 2) the database approach. The first approach aims at improving information finding and filtering. The second approach aims at modeling the data on the Web in a more structured form, so that standard database querying mechanisms and data mining applications can be applied to analyze it. Web Usage Mining: - Web usage mining focuses on techniques that could predict the behavior of users while they are interacting with the WWW. Web usage mining discovers user navigation patterns from web data: it tries to discover useful information from the secondary data derived from the interactions of users while they surf the Web. 
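Because usage mining works on the secondary data left behind by user interactions, a typical first step is to turn a server log into navigation paths. The Python sketch below assumes a hypothetical log of `(user, timestamp, page)` entries and a 30-minute inactivity timeout for splitting sessions; the timeout value is a common heuristic, not something the text prescribes.

```python
from collections import Counter

# Hypothetical server log entries: (user, timestamp in seconds, page).
LOG = [
    ("u1", 0, "/home"), ("u1", 60, "/products"), ("u1", 120, "/cart"),
    ("u2", 10, "/home"), ("u2", 50, "/products"),
    ("u1", 5000, "/home"),  # long gap: starts a new session for u1
]

TIMEOUT = 30 * 60  # assumed inactivity threshold (30 minutes) between sessions


def sessions(log, timeout=TIMEOUT):
    """Split each user's clicks into sessions, i.e. navigation paths."""
    paths, last_seen, current = [], {}, {}
    for user, ts, page in sorted(log, key=lambda e: (e[0], e[1])):
        if user in last_seen and ts - last_seen[user] <= timeout:
            current[user].append(page)   # continue the user's open session
        else:
            current[user] = [page]       # start a new session
            paths.append(current[user])
        last_seen[user] = ts
    return paths


def page_transitions(paths):
    """Count page-to-page moves, a simple view of user navigation patterns."""
    return Counter((p, q) for path in paths for p, q in zip(path, path[1:]))
```

With this log, `sessions(LOG)` yields three paths, and the transition `/home -> /products` is counted twice, once for each user.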
There are several research projects and commercial tools that analyze these patterns for different purposes. The knowledge gained can be used for personalization, system improvement, site modification, business intelligence, and usage characterization. The only information left behind by many users visiting a Web site is the path through the pages they have accessed. Most Web information retrieval tools use only the textual information and ignore the link information, which could be very valuable. In general, four main kinds of data mining techniques are applied in the web mining domain to discover user navigation patterns: 1) Association rule mining 2) Sequential pattern mining 3) Clustering 4) Classification
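As a simplified illustration of the first technique, association rule mining, the Python sketch below computes support and confidence for rules between pairs of pages visited in the same session. It checks only single-page rules X -> Y with hypothetical thresholds, rather than running a full algorithm such as Apriori over larger itemsets; the session data are invented.

```python
from itertools import combinations

# Hypothetical sessions: the set of pages each visitor viewed.
SESSIONS = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/blog"},
    {"/products", "/cart"},
]


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for page rules meeting both thresholds."""
    n = len(sessions)

    def support(items):
        # Fraction of sessions that contain every page in items.
        return sum(1 for s in sessions if items <= s) / n

    pages = sorted(set().union(*sessions))
    rules = []
    for x, y in combinations(pages, 2):
        both = support({x, y})
        if both < min_support:
            continue  # the pair is too rare to yield a rule
        for lhs, rhs in ((x, y), (y, x)):
            confidence = both / support({lhs})  # P(rhs visited | lhs visited)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both, confidence))
    return rules
```

On this data, every visitor who reached `/cart` also viewed `/products` (confidence 1.0), which is the kind of pattern a site could use for personalization or page placement.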