Integrating Compression Technique for Data Mining
Dr. Manmohan Singh
Assistant Professor
ITM Universe, Vadodara, Gujarat, India
Presentation Outline
• Introduction
• Compression Technique
• Association Rule Mining
• Limitations of Apriori
• Literature Survey
• Problem Statement
• Proposed Work
• Implementation Environment
• Conclusion
• References
What Is Data Mining?
• Data mining helps users discover interesting and useful knowledge more easily.
• Data compression is one good way to reduce data size.
• Data preprocessing transforms the original database into a new data representation.
• The preprocessing step produces a new transaction database as its output.
• The figure shows data mining as a step in an iterative knowledge discovery process.
Why Data Mining?
• Data is scattered across networks, so it is difficult to find the relevant data. Data mining helps locate it.
• A business owner who wants to grow the business needs smart data, techniques, models, tools, etc.
• Data mining helps us obtain, use, and understand that data.
• There is a need to extract useful information from the data and to interpret it.
Applications
• Financial Data Analysis
• Retail Industry
• Telecommunication Industry
• Biological Data Analysis
• Other Scientific Applications
• Intrusion Detection
Issues
• Mining Methodology
• User Interaction
• Performance Issues
• Diverse Data Types Issues
Why Compression?
• It makes optimal use of limited storage space.
• It reduces the size of the data and improves I/O performance.
• Compression has also recently been applied to reading large scientific files in parallel file systems.
• Compression decreases bandwidth consumption on networks and reduces energy consumption in hardware.
• Compression has been used extensively in wireless networks.
Types of Compression Techniques
• Null Compression: Replaces a series of blank spaces with a compression code.
• Run-Length Compression: Expands on null compression by compressing any series of four or more repeating characters.
• Keyword Encoding: Creates a table with values that represent common sets of characters.
• Adaptive Huffman Coding: Assigns fewer bits to symbols that occur more frequently and more bits to symbols that appear less often.
• Lempel-Ziv Compression:
  - Building an indexed dictionary
  - Compressing a string of symbols
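Two of the techniques listed above can be sketched in a few lines of Python. This is a minimal illustration, not a production codec: `rle_encode` assumes the four-character run threshold described above (the `min_run` parameter is a hypothetical name), and the Lempel-Ziv part uses the LZ78 variant, one common member of the family, because the slide does not name a specific variant. Null compression can be viewed as the special case of run-length compression applied only to blank spaces.

```python
def rle_encode(text: str, min_run: int = 4) -> list:
    """Run-length encode `text`. Only runs of at least `min_run` identical
    characters become a (char, count) pair; shorter runs stay as literals."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1  # extend the current run
        run = j - i
        if run >= min_run:
            out.append((text[i], run))
        else:
            out.extend(text[i:j])  # too short to be worth encoding
        i = j
    return out


def rle_decode(tokens: list) -> str:
    """Invert rle_encode: expand (char, count) pairs, copy literals."""
    return "".join(t[0] * t[1] if isinstance(t, tuple) else t for t in tokens)


def lz78_encode(text: str) -> list:
    """LZ78: grow an indexed dictionary of phrases and emit (index, char)
    pairs, where index 0 stands for the empty phrase."""
    dictionary = {"": 0}
    out = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a phrase we already know
        else:
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # new dictionary entry
            phrase = ""
    if phrase:
        # flush a trailing phrase that was already in the dictionary
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out


def lz78_decode(pairs: list) -> str:
    """Rebuild the same dictionary while decoding each (index, char) pair."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


print(rle_encode("AAAAAABxyyyyz"))  # [('A', 6), 'B', 'x', ('y', 4), 'z']
print(lz78_encode("ABAABABA"))      # [(0, 'A'), (0, 'B'), (1, 'A'), (2, 'A'), (2, 'A')]
```

The decoder never needs the dictionary to be transmitted: it rebuilds exactly the same indexed entries from the pairs it reads, which is what makes the LZ family attractive for general-purpose compression.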
Association Rule Mining
• A method for discovering interesting relations between variables in large databases.
• It is intended to identify strong rules in databases using different measures of interestingness.
• Many algorithms have been proposed for finding strong associations in data sets.
• Among them, Apriori, developed in 1994, is the best-known association rule algorithm, but it has some major issues.
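The two most common measures of interestingness are support and confidence. A minimal sketch, assuming transactions are Python sets of item names; the function names and the thresholds `min_sup = 0.5` and `min_conf = 0.6` are illustrative assumptions, not values from this presentation:

```python
def support(itemset, transactions) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions) if transactions else 0.0


def confidence(antecedent, consequent, transactions) -> float:
    """conf(X -> Y) = support(X ∪ Y) / support(X)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


def is_strong(antecedent, consequent, transactions, min_sup=0.5, min_conf=0.6) -> bool:
    """A rule is 'strong' if it clears both user-chosen thresholds."""
    return (support(set(antecedent) | set(consequent), transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, baskets))       # 0.5
print(confidence({"bread"}, {"milk"}, baskets))  # 0.6666666666666666
print(is_strong({"bread"}, {"milk"}, baskets))   # True
```

Here the rule bread -> milk holds in two of the three baskets that contain bread, so it passes both illustrative thresholds.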
Limitations of Apriori
• It needs several passes (iterations) over the data.
• It has difficulty finding rarely occurring events.
• It works well only for small data sets.
• Holding a vast number of candidate sets is costly in time.
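The first and last limitations follow directly from how Apriori works: every level k needs a fresh pass over the database to count a new set of candidate k-itemsets. The following is a simplified textbook sketch, not an optimized implementation; the absolute threshold `min_count` and the `scans` counter are illustrative names added to make the repeated passes visible:

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Textbook level-wise Apriori. Returns ({itemset: count}, number_of_db_scans)."""
    db = [frozenset(t) for t in transactions]

    # Pass 1: count every single item
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    scans = 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune: every (k-1)-subset of a candidate must itself be frequent
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        if not candidates:
            break

        # Another full pass over the database just to count this level
        counts = {cand: 0 for cand in candidates}
        for t in db:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        scans += 1

        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent, scans


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq, scans = apriori(db, min_count=3)
print(len(freq), scans)  # 6 3
```

On this five-transaction toy database, three passes are needed just to confirm that the single 3-item candidate is infrequent; on real data the candidate sets, and the cost of checking them against every transaction, grow much faster.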
Literature Survey
Sr No | Reference Paper | Methodology Used | Future Work
1 | Integrating Compression and Execution in Column-Oriented Database Systems, by Daniel J. Abadi, Samuel R. Madden & Miguel C. Ferreira | Column-oriented database system architecture | NIL
2 | Integrating Online Compression to Accelerate Large-Scale Data Analytics Applications, by Tekin Bicer, Jian Yin, David Chiu, Gagan Agrawal & Karen Schuchardt | Chunk resource allocation, parallel compression engine | NIL
3 | Efficient Mining Frequent Itemsets Algorithms, by Marghny H. Mohamed & Mohammed M. Darwieesh | Count table, binary count table | Extend the algorithms to mine other kinds of patterns, such as the sequential pattern mining problem
4 | A Transaction Mapping Algorithm for Frequent Itemsets Mining, by Mingjun Song & Sanguthevar Rajasekaran | Transaction Mapping (TM) algorithm | Improve the implementation of the TM algorithm and make a fair comparison with FP-growth
5 | Compact Transaction Database for Efficient Frequent Pattern Mining, by Qian Wan & Aijun An | Compact tree structure called CT-tree | NIL
6 | A New Association Rules Mining Algorithm Based on Vector, by Xin Zhang, Pin Liao & Huiyong Wang | Association rule mining algorithm based on vectors | NIL
Problem Statement
• The surveyed approaches cannot both restore the data to its original state after compression and improve data mining performance.
• Maintaining the compressed database as new data arrives is an even bigger challenge.
• Checking candidate itemsets in the data mining step takes too much time.
• New data cannot be added to the data set at runtime.
Figure: Proposed flow. Original database -> sorted database -> sorted database split into Group1, Group2 and Group3 -> compressed dataset with merged groups -> compressed transaction dataset -> frequent itemsets generated by a simple Apriori algorithm -> association rules generated and dataset uncompressed.
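The flow above does not spell out how groups are formed or merged. Under the simplest assumption, that identical transactions (after sorting their items) are merged into one group, compression and lossless decompression can be sketched as follows. Keeping the original transaction IDs in each group is an assumption made here so that decompression restores the original order; the Group1/Group2/Group3 split is not modeled, items come back sorted and de-duplicated, and all function names are hypothetical. This is not the author's algorithm.

```python
from collections import defaultdict


def compress(db):
    """Merge identical transactions: each distinct (sorted) item list is stored
    once, together with the IDs (positions) of the transactions that had it."""
    groups = defaultdict(list)
    for tid, transaction in enumerate(db):
        groups[tuple(sorted(set(transaction)))].append(tid)
    return sorted(groups.items())  # deterministic order of groups


def decompress(compressed):
    """Put every transaction back at its original position."""
    size = sum(len(tids) for _, tids in compressed)
    db = [None] * size
    for items, tids in compressed:
        for tid in tids:
            db[tid] = list(items)
    return db


def support_count(itemset, compressed):
    """Count support on the compressed form: a matching group adds its size."""
    wanted = set(itemset)
    return sum(len(tids) for items, tids in compressed if wanted <= set(items))


db = [["milk", "bread"], ["bread", "milk"], ["butter"], ["bread", "milk"]]
packed = compress(db)
print(packed)                                         # [(('bread', 'milk'), [0, 1, 3]), (('butter',), [2])]
print(support_count({"bread"}, packed))               # 3
print(decompress(packed) == [sorted(t) for t in db])  # True
```

Support counting touches each distinct transaction once instead of once per occurrence, which is where a compressed representation can save time when many transactions repeat.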
Proposed Work
The main research objectives are the following:
(a) The compressed database can be decompressed to its original form.
(b) Reduce the processing time of association rule mining by using a quantification table.
(c) Reduce I/O time by using only the compressed database to do data mining.
(d) Allow incremental data mining.
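Objectives (c) and (d) can be illustrated together: if the compressed form stores each distinct transaction with an occurrence count, support can be computed without touching the original database, and a new batch can be folded in without rebuilding. The hypothetical `CompressedDB` class below is a sketch under that assumption. It does not implement the quantification table from objective (b), whose structure the slides do not describe, and because it keeps only counts it cannot restore the original transaction order.

```python
from collections import Counter


class CompressedDB:
    """Hypothetical compressed store: distinct sorted transaction -> occurrence count."""

    def __init__(self):
        self.groups = Counter()
        self.size = 0  # number of original transactions represented

    def add(self, transactions):
        """Fold a new batch in; existing groups are updated, not rebuilt."""
        for t in transactions:
            self.groups[tuple(sorted(set(t)))] += 1
            self.size += 1

    def support(self, itemset):
        """Relative support computed only from the compressed groups."""
        wanted = set(itemset)
        hits = sum(n for items, n in self.groups.items() if wanted <= set(items))
        return hits / self.size if self.size else 0.0


store = CompressedDB()
store.add([["a", "b"], ["b", "a"], ["c"]])
print(len(store.groups), store.support({"a"}))  # 2 0.6666666666666666
store.add([["a", "c"]])  # a later batch arrives at runtime
print(len(store.groups), store.support({"a"}))  # 3 0.75
```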
Implementation Environment
• Minimum Hardware Requirements:
1. 3 GHz Pentium PC
2. 512 MB main memory
3. Screen resolution between 800x600 and 1200x800
• Minimum Software Requirements:
1. Operating system: Microsoft Windows XP
2. Microsoft Visual Studio .NET (C#)
Conclusion
• The rapid growth of large data sets has become a point of concern, i.e., the time required for data preprocessing.
• Hence, the proposed algorithm can be beneficial when dealing with such large data, as it can also decompress the data after compression.
• It can also reduce I/O time by mining only the compressed database.
References
1. Xin Zhang, Pin Liao and Huiyong Wang, "A New Association Rules Mining Algorithm Based on Vector," 2009 Third International Conference on Genetic and Evolutionary Computing.
2. Qian Wan and Aijun An, "Compact Transaction Database for Efficient Frequent Pattern Mining," Department of Computer Science and Engineering, York University, Toronto, Ontario, M3J 1P3, Canada.
3. Jia-Yu Dai, Don-Lin Yang, Jungpin Wu and Ming-Chuan Hung, "An Efficient Data Mining Approach on Compressed Transactions," International Journal of Electrical and Computer Engineering, 3:2, 2008.
4. Wael Ahmad AlZoubi, Khairuddin Omar and Azuraliza Abu Bakar, "An Efficient Mining of Transactional Data Using Graph-Based Technique," 2011 3rd Conference on Data Mining and Optimization (DMO), 28-29 June 2011, Selangor, Malaysia.
5. Mingjun Song and Sanguthevar Rajasekaran, "A Transaction Mapping Algorithm for Frequent Itemsets Mining," IEEE Transactions on Knowledge and Data Engineering, October 2005.
6. Marghny H. Mohamed and Mohammed M. Darwieesh, "Efficient Mining Frequent Itemsets Algorithms," Springer-Verlag Berlin Heidelberg, 2013 (revised 7 March 2012, accepted 29 April 2013).
7. Fan Zhang, Yan Zhang and Jason Bakos, "GPApriori: GPU-Accelerated Frequent Itemset Mining," 2011 IEEE International Conference on Cluster Computing.
8. Tekin Bicer, Jian Yin, David Chiu, Gagan Agrawal and Karen Schuchardt, "Integrating Online Compression to Accelerate Large-Scale Data Analytics Applications," 2013 IEEE 27th International Symposium on Parallel & Distributed Processing.
9. Daniel J. Abadi, Samuel R. Madden and Miguel C. Ferreira, "Integrating Compression and Execution in Column-Oriented Database Systems," SIGMOD 2006, June 27-29, 2006, Chicago, Illinois, USA. ACM 1595932569/06/0006.
10. Shalini Dutt, Naveen Choudhary and Dharm Singh, "An Improved Apriori Algorithm Based on Matrix Data Structure," Global Journal of Computer Science and Technology: C Software & Data Engineering, Vol. 14, Issue 5, Version 1.0, 2014.
11. Wael A. AlZoubi, Azuraliza Abu Bakar and Khairuddin Omar, "Scalable and Efficient Method for Mining Association Rules," 2009 International Conference on Electrical Engineering and Informatics, 5-7 August 2009, Selangor, Malaysia.
12. Loan T.T. Nguyen, Bay Vo, Tzung-Pei Hong and Hoang Chi Thanh, "CAR-Miner: An Efficient Algorithm for Mining Class-Association Rules," Expert Systems with Applications, 40 (2013) 2305-2311, © 2012 Elsevier Ltd.
13. Mohammed Al-Maolegi and Bassam Arkok, "An Improved Apriori Algorithm for Association Rules," International Journal on Natural Language Computing (IJNLC), Vol. 3, No. 1, February 2014.
Any Queries?

INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 

Integrating compression technique for data mining

1. Dr. Manmohan Singh
Assistant Professor, ITM Universe, Vadodara, Gujarat, India
  • 2. Presentation Outline  Introduction  Compression Technique  Association Rule Mining  Limitation Of Apriori  Literature Survey  Problem Statement  Proposed Work  Implementation Enviroment  Conclusion  References
  • 3. What Is Data Mining  Data mining is used to help users discover interesting and useful knowledge more easily.  Data compression is one of good solutions to reduce data size.  Data pre-process transforms the original database into a new data representation.  It generates a new transaction database at the end of the data pre-process step.
  • 4. What Is Data Mining  The figure shows data mining as a step in an iterative knowledge discovery process.
  • 5. Why Data Mining?  Data is scattered over network. so it is difficult to find the actual data. Data mining helps to find that data.  A business man wants to grow up his business. For that he needs smart data, techniques ,models , tools etc.  Data mining helps how we get, use & understand that data. .  There is a need to extract useful information from the data and to interpret the data.
  • 6. Application  Financial Data Analysis  Retail Industry  Telecommunication Industry  Biological Data Analysis  Other Scientific Applications  Intrusion Detection
  • 7. Issues  Mining Methodology  User Interaction  Performance Issues  Diverse Data Types Issues
  • 8. Compression technique?  Make optimal use of limited storage space.  It reduces the size of the data and improves I/O performance.  Compression has also been recently applied for reading large scientific files in parallel file systems.  Compression decrease bandwidth consumption on networks, and reduce energy consumption in hardware.  Compression has been used extensively in wireless networks.
  • 9. Types Of Compression Techniques  Null Compression: Replaces a series of blank spaces with a compression code.  Run length Compression:- Expands on the null compression, by compressing a series of four repeating characters.  Keyword Encoding:- Creates a table with values that represent common sets of character.  Adaptive Huffman Coding:-Assign fewer bits to symbols that occur more frequently and more bits to symbols appear less often.  Lempel Ziv Compession:-  Building an indexed dictionary  Compressing a string of symbols
  • 10. Association Rule Mining  It is a method for discovering interesting relations between variables in large databases.  Intended to identify strong rules discovered in databases using different measures of interestingness.  Many Algorithms had been proposed for finding the strong association between the data sets.  In which Apriori was the most well known association rule algorithm which was developed in 1994, having some major issues.
  • 11. Limitations of Apriori  Needs several iterations for the scanning of the data.  Difficulties to find rarely occuring events.  Works for small set of data.  Costly wasting of time to hold a vast number of candidate sets.
12. Literature Survey
• Sr. No. 1
  Reference paper: "Integrating Compression and Execution in Column-Oriented Database Systems" by Daniel J. Abadi, Samuel R. Madden, and Miguel C. Ferreira
  Methodology used: column-oriented database system architecture
  Future work: none stated
• Sr. No. 2
  Reference paper: "Integrating Online Compression to Accelerate Large-Scale Data Analytics Application" by Tekin Bicer, Jian Yin, David Chiu, Gagan Agrawal, and Karen Schuchardt
  Methodology used: chunk resource allocation and a parallel compression engine
  Future work: none stated
• Sr. No. 3
  Reference paper: "Efficient Mining Frequent Itemsets Algorithms" by Marghny H. Mohamed and Mohammed M. Darwieesh
  Methodology used: count table and binary count table
  Future work: extend the algorithms to mine other kinds of patterns, such as sequential patterns
• Sr. No. 4
  Reference paper: "A Transaction Mapping Algorithm for Frequent Itemsets Mining" by Mingjun Song and Sanguthevar Rajasekaran
  Methodology used: Transaction Mapping (TM) algorithm
  Future work: improve the implementation of the TM algorithm and make a fair comparison with FP-growth
13. Literature Survey
• Sr. No. 5
  Reference paper: "Compact Transaction Database for Efficient Frequent Pattern Mining" by Qian Wan and Aijun An
  Methodology used: a compact tree structure called the CT-tree
  Future work: none stated
• Sr. No. 6
  Reference paper: "A New Association Rules Mining Algorithm Based on Vector" by Xin Zhang, Pin Liao, and Huiyong Wang
  Methodology used: association rule mining algorithm based on vectors
  Future work: none stated
  • 14. Problem Statement  They all lack the ability to decompress the data to their original state and improve the data mining performance..  It is even a bigger challenge to maintain the compressed database in the future  It spends too much time to check candidate itemsets in the data mining step.  Unable to enter the data set at runtime
15. Flowchart of the Proposed Approach
Original database
→ Sorted database
→ Sorted database split into Group 1, Group 2, and Group 3
→ Compress the dataset and generate merged groups
→ Compressed transaction dataset
→ Generate frequent itemsets with a simple Apriori algorithm
→ Generate association rules and the uncompressed dataset
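The flowchart can be sketched end to end in Python. Several details are assumptions, because the slides do not specify them: transactions are grouped by length (which lets the miner skip whole groups too short to contain a candidate), "merging" means collapsing identical sorted transactions into one row with a count, and the "simple Apriori" step adds each row's count instead of counting rows one by one. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def compress(transactions):
    """Sort each transaction, group by length (assumed), merge identical rows."""
    groups = defaultdict(Counter)
    for t in transactions:
        row = tuple(sorted(t))
        groups[len(row)][row] += 1
    return dict(groups)


def weighted_apriori(groups, min_count):
    """Apriori that reads only the compressed groups, weighting rows by count."""
    candidates = {frozenset([item]) for g in groups.values() for row in g for item in row}
    frequent, k = {}, 1
    while candidates:
        counts = Counter()
        for length, group in groups.items():
            if length < k:
                continue  # the whole group is too short to contain a k-itemset
            for row, n in group.items():
                row_set = frozenset(row)
                for c in candidates:
                    if c <= row_set:
                        counts[c] += n  # one merged row stands for n transactions
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join and prune exactly as in plain Apriori.
        candidates = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def decompress(groups):
    """Rebuild the original transactions (as sorted lists) from the groups."""
    return [list(r) for g in groups.values() for r, n in g.items() for _ in range(n)]


if __name__ == "__main__":
    db = [["b", "a"], ["a", "b"], ["a", "c"], ["a", "b", "c"], ["c"]]
    groups = compress(db)
    print(groups)
    print(weighted_apriori(groups, min_count=2))
    print(decompress(groups))
```

Association rules would then be derived from the returned counts using support and confidence; that final step is omitted to keep the sketch short.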
16. Proposed Work
The main research objectives are:
(a) The compressed database can be decompressed to its original form.
(b) Reduce the processing time of association rule mining by using a quantification table.
(c) Reduce I/O time by using only the compressed database for data mining.
(d) Allow incremental data mining.
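Objectives (a) and (d) can be illustrated together. The class below is a hypothetical sketch, not the author's design: new transactions are merged into the compressed groups as they arrive, with no rebuild, and a simple per-item count table is kept up to date so that frequent single items can be read without scanning the data. The slides do not define the quantification table, so this item-count table is only a simplified stand-in for it.

```python
from collections import Counter, defaultdict


class CompressedDB:
    """Hypothetical compressed transaction store that accepts data at runtime."""

    def __init__(self):
        self.groups = defaultdict(Counter)  # length -> {sorted row: count}
        self.item_counts = Counter()        # item -> number of transactions containing it
        self.size = 0                       # number of original transactions

    def add(self, transaction):
        """Merge one new transaction into the compressed store incrementally."""
        row = tuple(sorted(transaction))
        self.groups[len(row)][row] += 1
        self.item_counts.update(set(row))   # count each item once per transaction
        self.size += 1

    def extend(self, transactions):
        for t in transactions:
            self.add(t)

    def frequent_items(self, min_count):
        """Frequent 1-itemsets read straight from the count table, no data scan."""
        return {item: n for item, n in self.item_counts.items() if n >= min_count}

    def decompress(self):
        """Recover the original transactions (items sorted) from the store."""
        return [list(r) for g in self.groups.values() for r, n in g.items() for _ in range(n)]


if __name__ == "__main__":
    store = CompressedDB()
    store.extend([["milk", "bread"], ["bread", "milk"]])
    store.add(["beer"])  # new data entered at runtime
    print(store.frequent_items(min_count=2))
    print(store.decompress())
```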
  • 17. Implementation Enviroment  Minimum Hardware Requirement: 1. 3 GHZ Pentium PC Machine. 2. 512 Megabytes Main Memory 3. Screen Resolution needs to be between 800*600 & 1200*800.  Minimum Software Requirement: 1. Operating system microsoft windows XP. 2. Microsoft Visual Studio.net(C#).
  • 18. Conclusion  Rapid Increase of large data become a point of concern.  i.e, time required for data pre-process.  Hence, the proposed algorithm can be benificial while dealing with such large data.  As, it can decompressed the data also after compression.  It can also reduce the I/O time by using only compressed database.
19. References
1. Xin Zhang, Pin Liao, and Huiyong Wang, "A New Association Rules Mining Algorithm Based on Vector," 2009 Third International Conference on Genetic and Evolutionary Computing.
2. Qian Wan and Aijun An, "Compact Transaction Database for Efficient Frequent Pattern Mining," Department of Computer Science and Engineering, York University, Toronto, Ontario, M3J 1P3, Canada.
3. Jia-Yu Dai, Don-Lin Yang, Jungpin Wu, and Ming-Chuan Hung, "An Efficient Data Mining Approach on Compressed Transactions," International Journal of Electrical and Computer Engineering, 3:2, 2008.
20. References
4. Wael Ahmad AlZoubi, Khairuddin Omar, and Azuraliza Abu Bakar, "An Efficient Mining of Transactional Data Using Graph-Based Technique," 2011 3rd Conference on Data Mining and Optimization (DMO), 28-29 June 2011, Selangor, Malaysia.
5. Mingjun Song and Sanguthevar Rajasekaran, "A Transaction Mapping Algorithm for Frequent Itemsets Mining," IEEE Transactions on Knowledge and Data Engineering, October 2005.
6. Marghny H. Mohamed and Mohammed M. Darwieesh, "Efficient Mining Frequent Itemsets Algorithms," Springer-Verlag Berlin Heidelberg, 2013 (revised 7 March 2012; accepted 29 April 2013).
21. References
7. Fan Zhang, Yan Zhang, and Jason Bakos, "GPApriori: GPU-Accelerated Frequent Itemset Mining," 2011 IEEE International Conference on Cluster Computing.
8. Tekin Bicer, Jian Yin, David Chiu, Gagan Agrawal, and Karen Schuchardt, "Integrating Online Compression to Accelerate Large-Scale Data Analytics Application," 2013 IEEE 27th International Symposium on Parallel & Distributed Processing.
9. Daniel J. Abadi, Samuel R. Madden, and Miguel C. Ferreira, "Integrating Compression and Execution in Column-Oriented Database Systems," SIGMOD 2006, June 27-29, 2006, Chicago, Illinois, USA.
22. References
10. Shalini Dutt, Naveen Choudhary, and Dharm Singh, "An Improved Apriori Algorithm Based on Matrix Data Structure," Global Journal of Computer Science and Technology: C Software & Data Engineering, Vol. 14, Issue 5, Version 1.0, 2014.
11. Wael A. AlZoubi, Azuraliza Abu Bakar, and Khairuddin Omar, "Scalable and Efficient Method for Mining Association Rules," 2009 International Conference on Electrical Engineering and Informatics, 5-7 August 2009, Selangor, Malaysia.
12. Loan T. T. Nguyen, Bay Vo, Tzung-Pei Hong, and Hoang Chi Thanh, "CAR-Miner: An Efficient Algorithm for Mining Class-Association Rules," Expert Systems with Applications 40 (2013) 2305-2311.
23. References
13. Mohammed Al-Maolegi and Bassam Arkok, "An Improved Apriori Algorithm for Association Rules," International Journal on Natural Language Computing (IJNLC), Vol. 3, No. 1, February 2014.