Hype or necessity: Big data analysis and storage using Hadoop
Presented by: Swapnaja Tandale
BECSE (WIT, Solapur)

Introduction:
• Big data
• Why study big data?
• The Three V's
• Data analysis and storage
Big Data Technology:
• Hadoop
  ◦ HDFS
  ◦ MapReduce
Conclusion

Big data refers to datasets that grow so large that they are difficult to capture, store, manage, share, analyze, and visualize with typical database software tools.

Why study big data?
• Social media and networks
• Scientific instruments
• Mobile devices
• Sensor technology and networks
To analyze it: examining large amounts of different data types, or big data, in an effort to uncover hidden patterns, unknown correlations, and other useful information. In other words, big data is a goldmine.

Big data examples
• 5+ billion people on the Web by end of 2014
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014
• 12+ TBs of tweet data every day
• 10 billion people (1 PB)
• ? TBs of data every day

The Three V's
• Volume: the amount of data is big.
• Velocity:
  ◦ How fast data becomes available for analysis
  ◦ How fast you can use the data
• Variety:
  ◦ Structured
  ◦ Semi-structured
  ◦ Unstructured
Other V's: Veracity, Variability, Visualization, Value, ...

• Data volume
  ◦ 44x increase from 2009 to 2020
  ◦ From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially.

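The growth figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes growth at a constant yearly rate, which the slide does not state; the variable names are illustrative only.

```python
# Figures taken from the slide: 0.8 ZB in 2009 and 35 ZB in 2020.
start_zb, end_zb = 0.8, 35.0
years = 2020 - 2009

# Total growth over the whole period (the slide rounds this to 44x).
growth_factor = end_zb / start_zb

# Assumption: growth compounds at one constant rate every year.
annual_rate = growth_factor ** (1 / years) - 1

print(f"total growth: {growth_factor:.1f}x")
print(f"implied annual growth: {annual_rate:.0%}")
```

Under that assumption the volume grows by roughly 41% a year, and a fixed percentage increase per year is what exponential growth means.
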
Structured data
• A pre-defined schema is imposed on the data
• Highly patterned structure
• Usually stored in a relational database system
Examples:
• Numbers: 20, 3.14
• Strings: “Hello World”
• Dates: 08/04/2014
Roughly 20% of all data out there is structured.

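To make the idea of a pre-defined schema concrete, the sketch below stores the example values from this slide in a relational table using Python's built-in sqlite3 module. The table name and column names are hypothetical, and the date is kept as plain text because the slide does not say which date format it uses.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is declared before any data arrives (hypothetical names).
conn.execute(
    "CREATE TABLE samples (quantity INTEGER, ratio REAL, greeting TEXT, recorded_on TEXT)"
)

# Each row must supply exactly the columns the schema defines.
conn.execute(
    "INSERT INTO samples VALUES (?, ?, ?, ?)",
    (20, 3.14, "Hello World", "08/04/2014"),
)

row = conn.execute(
    "SELECT quantity, ratio, greeting, recorded_on FROM samples"
).fetchone()
print(row)
conn.close()
```

Because every row has the same columns, the data can be queried by column name without inspecting individual records first.
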
Semi-structured data
• Inconsistent structure
• Typically cannot be stored in tables or a database
• Information is often stored as self-describing (label/value) pairs
• No fixed data model
Examples:
• XML: Extensible Markup Language
• SGML: Standard Generalized Markup Language
• Logs: Catlogs, weblogs, graph logs
• Tweets

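The self-describing label/value idea can be illustrated with a small XML fragment. This is only a sketch: the log entries and their attribute names are invented for illustration, and the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical log records: each entry labels its own fields,
# and the two entries do not carry the same set of fields.
doc = """
<logs>
  <entry level="info" user="alice">login</entry>
  <entry level="error" code="500" path="/cart">server error</entry>
</logs>
"""

root = ET.fromstring(doc)

# Turn every entry into a dict of its label/value pairs plus its text.
records = [dict(entry.attrib, message=entry.text) for entry in root.findall("entry")]

for record in records:
    print(sorted(record))
```

The two records end up with different keys, which is why data like this does not fit a single fixed table layout without extra work.
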
Unstructured data
• Data does not reside in any particular form, i.e. rows and columns
• Opposite of structured data
Examples:
• Multimedia: videos, photos, audio, files
• Email, messages
• Presentations and reports
• Free-form text
• Word-processing documents
According to experts, 80-90% of the data in any organization is unstructured.

• The storage capacity of hard drives has increased massively over the years.
• Access speeds have not kept up.

Year   Storage capacity   Transfer speed   Time to read the full drive
1990   1,370 MB           4.4 MB/s         about 5 min
2010   1 TB               100 MB/s         more than 2.5 hours

• The problem and its solution: big data technology.

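The read times in the table follow from dividing capacity by transfer speed. The sketch below assumes the speeds are in megabytes per second and that 1 TB is taken as 1,000,000 MB; the function name is illustrative.

```python
def read_time_seconds(capacity_mb: float, speed_mb_per_s: float) -> float:
    """Time to read a whole drive sequentially at a constant transfer speed."""
    return capacity_mb / speed_mb_per_s

# 1990 drive: 1,370 MB at 4.4 MB/s.
t_1990 = read_time_seconds(1370, 4.4)

# 2010 drive: 1 TB (assumed 1,000,000 MB) at 100 MB/s.
t_2010 = read_time_seconds(1_000_000, 100)

print(f"1990: {t_1990 / 60:.1f} minutes")
print(f"2010: {t_2010 / 3600:.1f} hours")
```

Capacity grew by a factor of several hundred while speed grew by a factor of about twenty, so reading a full drive went from about five minutes to a little under three hours.
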
“Hadoop” to the rescue!
Apache Hadoop is a framework for running applications on large clusters built of commodity hardware.
The first problem, data loss when hardware fails, is commonly avoided through replication, which the Hadoop Distributed Filesystem (HDFS) takes care of.
The second problem, analyzing data that is spread across many machines, is solved by a simple programming model, MapReduce. Hadoop is a popular open-source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets.

HDFS
“Moving Computation is Cheaper than Moving Data”
• HDFS is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes).
• Redundant copies of the data are kept by the system so that, in the event of failure, another copy is available.
• Portability across heterogeneous hardware and software platforms.

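The effect of keeping redundant copies can be shown with a toy simulation. This is not how HDFS actually chooses where to put replicas; the round-robin placement, the node names, and the replication factor of 3 are all assumptions made for illustration.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes) so the chosen nodes are distinct.
    """
    placement = {}
    for block in range(num_blocks):
        first = block % len(nodes)
        placement[block] = [
            nodes[(first + i) % len(nodes)] for i in range(replication)
        ]
    return placement

def readable_blocks(placement, failed):
    """Blocks that still have at least one copy on a node that is up."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]

nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
placement = place_blocks(6, nodes)

survivors = readable_blocks(placement, failed={"node2"})
print(len(survivors), "of", len(placement), "blocks still readable")
```

With three copies of every block, losing a single node leaves all blocks readable in this sketch; a block only becomes unreadable when every node holding one of its copies has failed.
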
MapReduce
• MapReduce is a programming model and an associated implementation for processing and generating large data sets.
• Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines.
• MAP: a map function processes a key/value pair to generate a set of intermediate key/value pairs.
• REDUCE: a reduce function merges all intermediate values associated with the same intermediate key.

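The map and reduce functions described above can be sketched as a word count in plain Python. This runs in a single process and does not use the Hadoop API; the driver function run_mapreduce and the sample input are hypothetical and only mimic the grouping step that a real framework performs across machines.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)

def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # group values by intermediate key
    return dict(reduce_fn(k, v) for k, v in groups.items())

# Input records are (line number, line text) pairs.
lines = [(0, "big data is big"), (1, "data is a goldmine")]
counts = run_mapreduce(lines, map_fn, reduce_fn)
print(counts)
```

Each call to map_fn depends only on its own record, and each call to reduce_fn depends only on the values for one key, which is what allows a framework to run these calls in parallel on many machines.
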
Conclusion
Big data is a key to innovation and has high potential for value creation. There are huge opportunities, for example in healthcare, location-related data, retail, manufacturing, and social data. There are also challenges, for example in data volume, data quality, data capture, and data management, including privacy, security, and governance.

Thank you
