Data mining With Big Data
 -xsandy
Outline
• Introduction
• What is Big Data?
• How Much Data Really Exists?
• Literature Review
• 4 Vs of Big Data
• Proposed System
• System Architecture
• Big Data Mining Framework
• Hadoop Framework
• Big Data Challenges and Solutions
• Conclusion
Introduction
Interesting Facts
• The volume of business data worldwide, across all companies, doubles every 1.2 years (previously every 1.5 years).
• Every day, 2.5 quintillion (2,500 quadrillion) bytes of data are produced, and more than 90 percent of all data was produced within the past two years.
• A typical person today processes more data in a day than a 16th-century individual did in an entire lifetime.
• In recent years, the cost of storage and processing power has dropped significantly.
• Bad data or poor data quality costs US businesses $600 billion annually.
• Facebook processes 10 TB of data every day; Twitter processes 7 TB.
• Google had over 3 million servers processing over 2 trillion searches per year in 2012 (only 22 million in 2000).
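As a rough worked example of the doubling figure on this slide: if a quantity doubles every T years, its yearly growth factor is 2^(1/T). The sketch below is only arithmetic on the slide's own numbers (1.2 and 1.5 years); the function name is illustrative and does not come from the original deck.

```python
def annual_growth_factor(doubling_time_years: float) -> float:
    """Yearly multiplier for a quantity that doubles every `doubling_time_years`."""
    return 2 ** (1 / doubling_time_years)


current = annual_growth_factor(1.2)   # doubling every 1.2 years
previous = annual_growth_factor(1.5)  # doubling every 1.5 years

print(f"Doubling every 1.2 years: x{current:.2f} per year")
print(f"Doubling every 1.5 years: x{previous:.2f} per year")

# The daily volume quoted on the slide, in two equivalent spellings
# (short-scale names: quadrillion = 10**15, quintillion = 10**18).
daily_in_quadrillions = 2500 * 10**15
daily_in_quintillions = 25 * 10**17  # 2.5 quintillion
print(daily_in_quadrillions == daily_in_quintillions)
```

Under these assumptions, a 1.2-year doubling time corresponds to growth of roughly 78 percent per year, against roughly 59 percent for a 1.5-year doubling time.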
What is Big Data?
“Big Data is the frontier of a firm's ability to
store, process, and access (SPA) all the
data it needs to operate effectively, make
decisions, reduce risks, and serve
customers.”
-- Forrester
“Big data is the data characterized by 3
attributes: volume, variety and velocity.”
-- IBM
Big Data is not about the size of the data,
it’s about the value within the data.
What is ...?
• Data Mining
‣ The computational process of discovering patterns in large data sets
• Big Data
‣ The term "Big Data" describes a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques.
• "Big Data" is similar to "small data", but bigger
• ...but bigger data requires different approaches:
‣ Techniques, tools and architecture
• ...with an aim to solve new problems
• ...or old problems in a better way
How much data exists?
• 2.5 quintillion bytes of data are created every day.
• IBM: 90 percent of the data in the world today was produced within the past two years.
• Forms of data?
• Examples: Boeing jets, scientific data, sensor data, Internet data
Literature Review
• Data has grown tremendously.
• This volume of data is beyond what existing software tools can manage.
• Exploring such large volumes of data and extracting useful information and knowledge is a challenge, and sometimes almost infeasible.
• Most people don’t know what to do with all the data they already have.
Giant Elephant
• Huge data with heterogeneous and diverse dimensionality
‣ represents a huge volume of data
• Autonomous sources with distributed and decentralized control
‣ a main characteristic of Big Data
• Complex and evolving relationships
4 Vs of Big Data
Volume
• Data quantity
Velocity
• Data speed
Variety
• Data types
Veracity
• Authenticity
Proposed System:
• Identifies relationships between different ideas
• Capable of handling huge volumes of data
• Uses distributed parallel computing with the help of Hadoop
• Provides a platform for processing data in different dimensions and summarizing the results
• The system architecture must be flexible enough that the components built on top of it, which express the various kinds of processing tasks, can tune it to run these different workloads efficiently.
• The system will process this data within reasonable cost and time limits.
Gap due to Lack of analysis
System Architecture:
Hadoop framework :
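The Hadoop slide is a diagram in the original deck. As a minimal illustration of the MapReduce programming model that Hadoop implements, the sketch below simulates the map, shuffle, and reduce phases of a word count in a single Python process. It does not use any Hadoop API: on a real cluster the map and reduce tasks run in parallel on different nodes, and jobs are typically written in Java or run through Hadoop Streaming. All function names and the sample input are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for lines of a large file.
sample = ["big data is big", "data mining with big data"]
counts = word_count(sample)
print(counts)
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, both phases can be spread across many machines, which is what makes the model suitable for large volumes of data.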
Big Data Mining Framework
• Big Data Mining Platform
• Big Data Semantics and Application Knowledge
I. Information Sharing and Data Privacy
II. Domain and Application Knowledge
• Big Data Mining Algorithms
I. Local Learning and Model Fusion for Multiple Information Sources
II. Mining from Sparse, Uncertain, and Incomplete Data
III. Mining Complex and Dynamic Data
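The deck does not say how local learning and model fusion would be carried out. One simple way to picture the idea, under that caveat, is that each information source computes a small summary of its own data, and only those summaries are combined centrally. In the sketch below the "model" is just a mean and a population variance, chosen because they can be fused exactly; the class name, function names, and sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class LocalSummary:
    """Sufficient statistics computed at one data source."""
    count: int
    total: float
    total_sq: float


def summarize(values: Iterable[float]) -> LocalSummary:
    """Local learning step: runs where the data lives, in a single pass."""
    count, total, total_sq = 0, 0.0, 0.0
    for value in values:
        count += 1
        total += value
        total_sq += value * value
    return LocalSummary(count, total, total_sq)


def fuse(summaries: List[LocalSummary]) -> Tuple[float, float]:
    """Fusion step: combine local summaries into a global mean and variance.

    Assumes at least one value exists across all sources.
    """
    n = sum(s.count for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, variance


# Hypothetical data held at two separate sites.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0]
mean, variance = fuse([summarize(site_a), summarize(site_b)])
print(mean, variance)
```

Only three numbers per site cross the network, so the raw records never leave their source, which also touches on the information-sharing and privacy item listed above.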
Big Data mining Framework
Challenges
Location of Big Data sources: Big Data is commonly stored in different locations.
Volume of Big Data: the size of the data grows continuously.
Hardware resources: RAM capacity
Privacy: medical reports, bank transactions
Having domain knowledge
Getting meaningful information
Solutions
Parallel computing programming
An efficient computing platform will not have centralized data storage; instead, storage will be distributed at large scale.
Restricting access to the data
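To make the distributed-storage point concrete, here is a minimal sketch of hash partitioning, one common way to spread records over several storage nodes without a central store. This is an illustrative technique and not how the deck's system or HDFS is specified to work: HDFS instead splits files into blocks and replicates them across DataNodes. The node count, record names, and function names are assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def node_for_key(key: str, num_nodes: int) -> int:
    """Pick a storage node for a record key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(keys: Iterable[str], num_nodes: int) -> Dict[int, List[str]]:
    """Assign every key to exactly one node."""
    placement: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        placement[node_for_key(key, num_nodes)].append(key)
    return dict(placement)


# Hypothetical record identifiers spread over three hypothetical nodes.
records = [f"record-{i}" for i in range(10)]
placement = partition(records, num_nodes=3)
for node, keys in sorted(placement.items()):
    print(node, keys)
```

A stable hash such as SHA-256 is used rather than Python's built-in `hash`, whose string hashing varies between interpreter runs, so that every machine computes the same placement for the same key.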
Advantages:
• Fast response
• Extracts useful information
• Predicts the required data from a large amount of data
• Presents better results in the form of visualizations
Conclusion
• We have entered an era of Big Data. Through better analysis of the large volumes of data that are becoming available, there is the potential for making faster advances in many scientific disciplines and for improving the profitability and success of many enterprises, using technologies such as Hadoop and Pig.
• The proposed system will be serviceable across a large variety of application domains, since these challenges are not cost-effective to address in the context of one domain alone.
• Furthermore, this system will provide transformative solutions and will naturally serve the next generation of industrial applications. We must support and encourage this proposed framework in addressing the technical challenges of unstructured data if we are to achieve the promised benefits of Big Data.
Editor's notes

  1. Sources: social networks, satellite data, geographical data, live streaming data
  2. According to IBM