Big Data Analytics
& Trends
Presentation
by
Dr.K.Sreenivasa Rao
Dept. of CSE, VBIT
Content
1. What is Big Data?
2. Why Big Data?
3. Some Definitions
4. Types of Data: Structured, Unstructured & Semi-structured
5. The Data Landscape
6. Some Other Definitions
7. Characteristics of Big Data
8. Data Generation Points
9. Big Data Analytics
10. Example Scenario
11. Challenges of Big Data
12. Hadoop, History & Complementary Packages
13. Difference between Big Data & Data Science
14. Salary Trends in Hadoop/Big Data
What is Big data?
•Facebook generates 10TB daily
•Twitter generates 7TB of data Daily
•IBM claims 90% of today’s stored data was generated
in just the last two years.
Why Big Data ?
• Growth of Big Data is driven by
– Increase in storage capacities
– Increase in processing power
– Availability of data (different data types)
– Every day we create 2.5 quintillion bytes of data, i.e. about
2.5 million TB (1 quintillion bytes = 1 Exabyte = 1,000 Petabytes,
where 1 Petabyte = 1,000 TB); 90% of the data in the
world today has been created in the last two years
alone.
• FB generates 10TB daily
• Twitter generates 7TB of data Daily
• IBM claims 90% of today’s stored data was generated in
just the last two years.
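The unit conversion behind the "2.5 quintillion bytes" figure can be checked with a few lines of Python. This is only a sketch that assumes decimal (SI) units, as the slide does; the constant and variable names are illustrative.

```python
# Decimal (SI) units, as used on the slide: 1 PB = 1,000 TB, 1 EB = 1,000 PB.
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15
BYTES_PER_EB = 10**18  # one quintillion bytes

daily_bytes = 2.5 * 10**18  # 2.5 quintillion bytes per day (figure from the slide)

daily_eb = daily_bytes / BYTES_PER_EB
daily_pb = daily_bytes / BYTES_PER_PB
daily_tb = daily_bytes / BYTES_PER_TB

print(f"{daily_eb:g} EB = {daily_pb:,.0f} PB = {daily_tb:,.0f} TB per day")
```

Under these assumptions, 2.5 quintillion bytes works out to 2.5 EB, or 2.5 million TB per day.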
Some Definitions
• Big data is a "catch-all" term, related to the
power of using a lot of data to solve
problems. Big data is data that is large
and complex enough that it becomes
difficult to process using a single
computer.
• Big data is simply the large sets of data that
businesses and other parties put together to
serve specific goals and operations. Big data
can include many different kinds of data in
many different kinds of formats.
Some Definitions
• Big data is an evolving term that describes any
voluminous amount of structured,
semi-structured and unstructured data that
has the potential to be mined for information.
[Ref:
Strata + Hadoop World 2016: Hadoop and Spark in spotlight]
RDF-Resource Description Framework
Some Other Definitions
• Gartner defines Big Data as high volume, velocity and
variety information assets that demand cost-effective,
innovative forms of information processing for
enhanced insight and decision making.
• Big data is often characterized by 3Vs: the
extreme volume of data, the wide variety of data types
and the velocity at which the data must be processed.
Although big data doesn't equate to any specific
volume of data, the term is often used to describe
Terabytes, Petabytes and even Exabytes of data
captured over time.
Characteristics of Big data
Volume: (Data Quantity)
• Twitter generates about 80 MB per second.
• Facebook generates 10 TB of data per day.
• Black box data: a single flight generates nearly 10 TB of data
every half hour.
Velocity: (Data Speed) eBay analyzes 5 million transactions per day.
• Velocity refers to the speed at which big data must be
analyzed. Velocity is also meaningful as big data analysis expands
into fields like machine learning and artificial intelligence, where
analytical processes mimic perception by finding and using patterns
in the collected data.
Variety: (Data Types) Big data includes data from e-commerce sites,
health care, education, stock exchanges, banking, etc.
Varying in Time:
• [http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data]
• http://www.information-management.com/news/big-data-analytics/the-
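As a rough consistency check, the per-second Twitter figure above can be converted to a daily volume and compared with the "7 TB daily" figure quoted earlier. This is a sketch assuming decimal units (1 TB = 1,000,000 MB) and a constant rate over the whole day; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

twitter_mb_per_second = 80  # figure quoted on the slide
mb_per_day = twitter_mb_per_second * SECONDS_PER_DAY
tb_per_day = mb_per_day / 1_000_000  # decimal units: 1 TB = 1,000,000 MB

print(f"{mb_per_day:,} MB/day = {tb_per_day:.2f} TB/day")
```

At a steady 80 MB per second the daily total comes to just under 7 TB, so the two figures agree.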
Data generation Points Examples
Mobile Devices
Readers/Scanners
Science facilities
Microphones
Cameras
Social Media
Programs/ Software
Big Data Analytics
• Examining large amounts of data
• Appropriate information
• Identification of hidden patterns, unknown correlations
• Competitive advantage
• Better business decisions: Strategic and Operational
• Effective marketing, customer satisfaction, increased
revenue
Example Scenario
You need to go through articles,
pictures & videos, links to
Facebook & Twitter, etc.
Looking at pictures & reading articles,
watching videos, etc., you still have no clarity.
Such big data has to be sorted, filtered &
analyzed to produce useful information
for decision making.
Perhaps Facebook may help you better identify the best
gym equipment for your office.
Finally, analytics gives us useful insight or information
from big data.
Challenges of big data:
• Problem: To read 1 TB of data from a hard drive
• Sol1: 1 machine with 4 I/O channels of 100 MBps each
• 1 TB = 1024*1024 MB
• = 1,048,576 MB
• 1,048,576 MB / 100 MBps ≈ 10,486 seconds
• ≈ 174.8 minutes with 1 I/O channel
• 174.8/4
• ≈ 43.7 minutes with 4 I/O channels
• Sol2: If 10 machines are used for reading, it takes
43.7/10 ≈ 4.37 minutes to read 1 TB of data.
• i.e. to analyze big data, we first need to read it;
today the challenge is I/O speed, not storage
capacity.
• The challenge is to read/write data, not to store it.
• Hadoop is a framework to solve the above challenges.
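The read-time arithmetic above can be sketched in Python. The helper read_time_minutes is a hypothetical name introduced here, and the model is idealised: it assumes the data splits evenly across all channels and machines with no coordination overhead.

```python
def read_time_minutes(size_mb, mbps_per_channel, channels=1, machines=1):
    """Minutes to read size_mb when the work is split evenly across all
    channels of all machines (idealised: no overhead, perfect parallelism)."""
    total_throughput = mbps_per_channel * channels * machines
    return size_mb / total_throughput / 60


ONE_TB_MB = 1024 * 1024  # 1,048,576 MB, as on the slide

single_channel = read_time_minutes(ONE_TB_MB, 100)
one_machine = read_time_minutes(ONE_TB_MB, 100, channels=4)
ten_machines = read_time_minutes(ONE_TB_MB, 100, channels=4, machines=10)

print(f"1 machine, 1 channel:    {single_channel:.1f} min")
print(f"1 machine, 4 channels:   {one_machine:.1f} min")
print(f"10 machines, 4 channels: {ten_machines:.1f} min")
```

Real clusters fall short of this linear speed-up, but the sketch shows why spreading reads over many machines is the core idea.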
Hadoop
• Hadoop is an open source, Java-based programming framework that
supports processing of large datasets in a distributed computing
environment. It is an Apache project sponsored by the Apache
Software Foundation.
• It is designed to answer the question “How to process big data with
reasonable cost & time”.
• Definition 2:
• Apache Hadoop is a framework for distributed processing of large
datasets across clusters of commodity computers/hardware using a
simple programming model (MapReduce).
• Commodity hardware means many cheap machines rather than a small
number of high-cost, high-end servers or supercomputers.
• Who uses Hadoop?
• India's Aadhaar scheme uses Hadoop.
• Google: Hadoop's design was inspired by Google's published work on
its distributed file system (GFS) and MapReduce.
• Yahoo
• Facebook etc.
• History:
• It grew out of the Apache Nutch project started by Doug Cutting and
Mike Cafarella, and became a separate project in 2006.
• Yahoo was an early large-scale adopter and contributor from 2006.
• Now it is Apache Hadoop.
• Some public cloud services that offer Hadoop:
• AWS Elastic MapReduce
• Amazon EC2/S3
• Google Cloud Dataproc
Hadoop Components:
• 1. HDFS (Hadoop Distributed File System):
for storing data across thousands of servers
to achieve high bandwidth.
• 2. MapReduce: provides a programming model
for large-scale distributed processing
– mapping data & reducing it to a result.
• Hadoop is the popular open source
implementation of MapReduce, a powerful
tool designed for deep analysis and
transformation of very large data sets. 
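The "map, then reduce" idea can be illustrated with a single-process word count in plain Python. This is a toy sketch, not Hadoop's API: the function names map_phase, shuffle and reduce_phase are assumptions made for illustration, and a real Hadoop job would run these phases in parallel across a cluster with the input stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


documents = ["big data needs big storage", "hadoop processes big data"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call looks at one record and each reduce call looks at one key, both phases can be spread over many machines, which is what makes the model suit commodity clusters.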
Complementary software packages:
• The term Hadoop has come to refer not just to the base modules
above, but also to the collection of additional software packages that
can be installed on top of or alongside Hadoop, such as
• Apache Pig, 
• Apache Hive, 
• Apache HBase, 
• Apache Phoenix, 
• Apache Spark, 
• Apache ZooKeeper, 
• Cloudera Impala, 
• Apache Flume, 
• Apache Sqoop, 
• Apache Oozie, 
• Apache Storm.
• HBase: An open source, non-relational distributed database.
• Hive: A data warehouse that provides data summarization and querying.
• Pig: A high-level platform for creating programs that run on Hadoop.
• Apache Spark: A fast engine for big data processing, capable of
streaming & supporting SQL, machine learning and graph processing.
One survey says 80% of Hadoop projects are going to mature in
2016 & people are looking towards Apache Spark for their next
projects.
Types of tools used in Big Data
• Where is processing hosted?
– Distributed Servers / Cloud (e.g. Amazon EC2)
• Where is data stored?
– Distributed Storage (e.g. Amazon S3)
• What is the programming model?
– Distributed Processing (e.g. MapReduce)
• How is data stored & indexed?
– High-performance schema-free databases (e.g. MongoDB)
• What operations are performed on data?
– Analytic / Semantic Processing
Difference between Big data & Data Science.
• [http://www.kdnuggets.com/2015/07/data-science-big-data-different-beasts.html]
• Creating an artifact from ore requires tools, craftsmanship and science.
The same is true of big data and data science; here we present the
distinguishing factors between the ore and the artifact.
• Data Science looks to create models that capture the
underlying patterns of complex systems, and codify those models into
working applications. Big Data looks to collect and manage large
amounts of varied data to serve large-scale web applications and vast
sensor networks.
Although both offer the potential to produce value from data, the
fundamental difference between Data Science and Big Data can be
summarized in one statement:
- Collecting Does Not Mean Discovering
Investments in data-focused activities center around
tools instead of approaches. The engineering cart
gets put before the scientific horse, leaving an
organization with a big set of tools, and a small
amount of knowledge on how to convert data into
something useful.
So, Data Science is expertise in converting data into
useful information/products that answer the
always-changing demands of the market.
Salary Trends for Bigdata/hadoop
• Big Data Hadoop Salary Trends
• 1. Average Big Data salaries have increased by 9.3% in the last
12 months. The current salary range is between $119,250 and
$168,250.
• 2. A Hadoop developer making $120,000 will be evaluated by
competitor companies at $155,000. That's a 29% hike.
• 3.On average there is a new Big Data/Hadoop technology
released every 6 weeks. So make sure you stay updated.
• 4.The average salary for a Hadoop Developer in San Francisco,
CA, is $139,000.
• 5.A Senior Hadoop developer in San Francisco, CA can earn over
$178,000 on an average.
• 6.Hortonworks, Paxata, Bloomberg LP - are hiring top Big Data
Hadoop talent for the highest pay package.
• 7.The states with the most Hadoop Big Data jobs are California,
New York, New Jersey and Texas. - duh that was obvious :)
So, make sure, you stay updated
Future of Big Data
• $15 billion on software firms only specializing in
data management and analytics.
• This industry on its own is worth more than $100
billion and growing at almost 10% a year which is
roughly twice as fast as the software business as a
whole.
• In February 2012, the open source analyst firm
Wikibon released the first market forecast for Big
Data , listing $5.1B revenue in 2012 with growth to
$53.4B in 2017
• The McKinsey Global Institute estimates that data
volume is growing 40% per year, and will grow 44x
between 2009 and 2020.
• So, Data Science as a career goal will enrich
employability of the graduate in future market.
• Big data Market Forecast
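The growth figures quoted above can be put on a common footing by converting them to compound annual rates. The helpers cagr and compound below are hypothetical names, and the sketch assumes simple yearly compounding between the start and end years on the slide.

```python
def cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


def compound(rate, years):
    """Total growth multiple after compounding `rate` for `years`."""
    return (1 + rate) ** years


# Wikibon forecast from the slide: $5.1B (2012) growing to $53.4B (2017).
wikibon_rate = cagr(5.1, 53.4, 2017 - 2012)

# McKinsey figures from the slide: 40% per year and 44x over 2009-2020.
multiple_at_40pct = compound(0.40, 2020 - 2009)
rate_for_44x = cagr(1, 44, 2020 - 2009)

print(f"Wikibon implied annual growth: {wikibon_rate:.1%}")
print(f"40% per year for 11 years:     {multiple_at_40pct:.1f}x")
print(f"Annual rate needed for 44x:    {rate_for_44x:.1%}")
```

Under these assumptions, the Wikibon forecast implies roughly 60% growth a year, and 40% yearly growth over 11 years gives about 40x, close to the quoted 44x.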
References
• www.Slideshare.com
• www.wikipedia.com
• www.computereducation.org
• Strata + Hadoop World 2016: Hadoop and Spark in spotlight
• http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data
• http://www.information-management.com/news/big-data-analytics/the-top-5-trends-in-big-data-for-2017-10029956-1.html
• Books:
– Big Data by Viktor Mayer-Schonberger
Big Data Analytics & Trends Presentation

  • 1. Big Data Analytics & Trends Presentation by Dr.K.Sreenivasa Rao Dept. of CSE, VBIT
  • 2. Content 1. What is Big data ? 2. Why Big data ? 3. Some Definitions. 4. Types of data-Structured, Unstructured & Semi structured 5. The data Landscape 6. Some other definitions 7. Characteristics of big data 8. Data generation Points 9. Big Data analytics 10.Example Scenario 11.Challenges of Big data 12.Hadoop, History & Complementary Packages 13.Difference between Big data & Data Science. 14.Salary Trends in Hadoop/Big Data
  • 3. What is Big data? •Facebook generates 10TB daily •Twitter generates 7TB of data Daily •IBM claims 90% of today’s stored data was generated in just the last two years.
  • 4. Why Big Data ? • Growth of Big Data is needed because of – Increase of storage capacities – Increase of processing power – Availability of data(different data types) – Every day we create 2.5 Million TB[quintillion bytes(1 Quintillionbyte= 1 Exabyte=1000Petabytes where 1 Petabyte=1000 TB)] of data; 90% of the data in the world today has been created in the last two years alone. • FB generates 10TB daily • Twitter generates 7TB of data Daily • IBM claims 90% of today’s stored data was generated in just the last two years.
  • 5. Some Definitions • Big data is a "catch all" word, related to the power of using a lot of data to solve problems.. Big data is the data that is large enough and complex that it becomes difficult to process using a single computer... • Big data is simply the large sets of data that businesses and other parties put together to serve specific goals and operations. Big data can include many different kinds of data in many different kinds of formats.
  • 6. Some Definitions • Big data is an evolving term that describes any voluminous amount of structured, semi structured and unstructured data that has the potential to be mined for information. [Ref: Strata + Hadoop World 2016: Hadoop and Spark in spotlight]
  • 7.
  • 8.
  • 9.
  • 11.
  • 12. Some Other Definitions • Gartner defines Big Data as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. • Big data is often characterized by 3Vs: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed. Although big data doesn't equate to any specific volume of data, the term is often used to describe Terabytes, Petabytes and even Exabytes of data captured over time.
  • 13. Characteristics of Big data Volume: (Data Quantity) • Twitter generates about 80 MB per second. • Facebook generates 10 TB data per day. • Black box data: Single flight generates nearly 10 TB of data per every ½ an hour. • Twitter generates of about 80 MB every second. Velocity: (Data Speed) ebay analyzes 5 million transactions per day. • Finally, velocity refers to the speed at which big data must be analyzed. Velocity is also meaningful, as big data analysis expands into fields like machine learning and artificial intelligence, where analytical processes mimic perception by finding and using patterns in the collected data. Variety: (Data Types) Bigdata includes data from e-commerce sites, health care data, education, stock exchange, banking etc….. Varying in Time: • [http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data]
  • 15.
  • 16. Data Generation Points • Examples: • Mobile devices • Readers/scanners • Science facilities • Microphones • Cameras • Social media • Programs/software
  • 17. Big Data Analytics • Examining large amounts of data • Appropriate information • Identification of hidden patterns, unknown correlations • Competitive advantage • Better business decisions: strategic and operational • Effective marketing, customer satisfaction, increased revenue
  • 18. Example Scenario • You need to go through articles, pictures and videos, links to Facebook and Twitter, etc.
  • 19. Pictures & reading articles
  • 20. Watching videos, etc., and you still have no clarity.
  • 21. Such big data has to be sorted, filtered and analyzed to produce useful information for decision making.
  • 22. Perhaps Facebook may help you better identify the best gym equipment for your office. Finally, analytics gives us useful insight or information from big data.
  • 23. Challenges of Big Data • Problem: read 1 TB of data from a hard drive. • Solution 1: one machine with 4 I/O channels of 100 MBps each. • 1 TB = 1024 × 1024 MB = 1,048,576 MB • At 100 MBps: 1,048,576 / 100 ≈ 10,486 seconds ≈ 174.76 minutes with 1 I/O channel • 174.76 / 4 ≈ 43.7 minutes with 4 I/O channels • Solution 2: if 10 such machines read in parallel, it takes 43.7 / 10 ≈ 4.37 minutes to read 1 TB. • That is, to analyze big data we first need to read it; today's challenge is I/O speed, not storage capacity. • The challenge is to read and write data, not to store it. • Hadoop is a framework designed to address these challenges.
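The read-time arithmetic on this slide can be checked with a short script. This is a minimal sketch, assuming the slide's own figures (1 TB = 1024 × 1024 MB, 100 MB/s per I/O channel) and ideal, perfectly parallel channels and machines; the function name `read_time_minutes` is illustrative and not from the slides.

```python
def read_time_minutes(size_mb: float, mb_per_sec: float,
                      channels: int = 1, machines: int = 1) -> float:
    """Ideal time in minutes to read size_mb, assuming perfect parallelism."""
    total_throughput = mb_per_sec * channels * machines  # aggregate MB/s
    return size_mb / total_throughput / 60


ONE_TB_MB = 1024 * 1024  # 1,048,576 MB, as on the slide

one_channel = read_time_minutes(ONE_TB_MB, 100)
four_channels = read_time_minutes(ONE_TB_MB, 100, channels=4)
ten_machines = read_time_minutes(ONE_TB_MB, 100, channels=4, machines=10)

print(f"1 machine, 1 channel:    {one_channel:.2f} min")
print(f"1 machine, 4 channels:   {four_channels:.2f} min")
print(f"10 machines, 4 channels: {ten_machines:.2f} min")
```

Real clusters fall short of this ideal because of network transfer, coordination overhead and uneven data distribution, so the numbers are best read as lower bounds.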
  • 24. Hadoop • Hadoop is an open source, Java-based programming framework that supports processing of large datasets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation. • It is designed to answer the question "How can big data be processed at reasonable cost and in reasonable time?" • Definition 2: • Apache Hadoop is a framework for distributed processing of large datasets across clusters of commodity computers using a simple programming model (MapReduce). • Commodity hardware means many inexpensive machines rather than a small number of costly high-end servers or supercomputers. • Who uses Hadoop? • India's Aadhaar scheme • Yahoo • Facebook, etc. • Hadoop's design was itself inspired by Google's published papers on the Google File System and MapReduce.
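  • 25. • History: • Hadoop was created by Doug Cutting and Mike Cafarella and grew out of the Apache Nutch project around 2005. • Yahoo became a major early contributor from 2006, when Hadoop became its own Apache subproject. • It is now Apache Hadoop, a top-level Apache project. • Some public cloud services that offer Hadoop: • AWS Elastic MapReduce • Amazon EC2/S3 • Google Cloud Dataproc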
  • 26. Hadoop Components: • 1. HDFS (Hadoop Distributed File System): stores data across thousands of servers to achieve high bandwidth. • 2. MapReduce: provides a programming model for large-scale distributed processing: mapping data and reducing it to a result. • Hadoop is a popular open source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets.
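The "mapping data and reducing it to a result" idea can be illustrated with the classic word count. This is a minimal single-process sketch in plain Python, not Hadoop's actual Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions. In a real cluster the map and reduce steps run on many machines and the shuffle moves data across the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input standing in for files stored in HDFS.
lines = ["big data needs big storage", "hadoop processes big data"]
counts = word_count(lines)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across machines, which is the property Hadoop exploits.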
  • 27. Complementary software packages: • The term Hadoop has come to refer not just to the base modules above, but also to the collection of additional software packages that can be installed on top of or alongside Hadoop, such as: • Apache Pig • Apache Hive • Apache HBase • Apache Phoenix • Apache Spark • Apache ZooKeeper • Cloudera Impala • Apache Flume • Apache Sqoop • Apache Oozie • Apache Storm • HBase: an open source, non-relational distributed database. • Hive: a data warehouse that provides data summarization. • Pig: a high-level platform for creating programs that run on Hadoop. • Apache Spark: a fast engine for big data processing, capable of streaming and supporting SQL, machine learning and graph processing. One survey says that 80% of Hadoop projects were expected to mature in 2016 and that people are looking towards Apache Spark for their next projects.
  • 28. Types of tools used in Big Data • Where is processing hosted? – Distributed servers / cloud (e.g. Amazon EC2) • Where is data stored? – Distributed storage (e.g. Amazon S3) • What is the programming model? – Distributed processing (e.g. MapReduce) • How is data stored and indexed? – High-performance schema-free databases (e.g. MongoDB) • What operations are performed on data? – Analytic / semantic processing
  • 29. Difference between Big Data & Data Science • [http://www.kdnuggets.com/2015/07/data-science-big-data-different-beasts.html] • Creating an artifact from ore requires tools, craftsmanship and science. The same is true of big data and data science; here we present the distinguishing factors between the ore and the artifact. • Data Science looks to create models that capture the underlying patterns of complex systems, and codify those models into working applications. Big Data looks to collect and manage large amounts of varied data to serve large-scale web applications and vast sensor networks. Although both offer the potential to produce value from data, the fundamental difference between Data Science and Big Data can be summarized in one statement: Collecting Does Not Mean Discovering.
  • 30. Investments in data-focused activities center around tools instead of approaches. The engineering cart gets put before the scientific horse, leaving an organization with a big set of tools and little knowledge of how to convert data into something useful. So, Data Science is expertise in converting data into useful information and products that answer the always-changing demands of the market.
  • 31. Salary Trends for Big Data/Hadoop • Big Data Hadoop Salary Trends • 1. Average Big Data salaries have increased by 9.3% in the last 12 months. The current salary range is between $119,250 and $168,250. • 2. A Hadoop developer making $120,000 will be valued by competitor companies at $155,000. That's a 29% hike. • 3. On average, a new Big Data/Hadoop technology is released every 6 weeks, so make sure you stay updated. • 4. The average salary for a Hadoop developer in San Francisco, CA, is $139,000. • 5. A senior Hadoop developer in San Francisco, CA, can earn over $178,000 on average. • 6. Hortonworks, Paxata and Bloomberg LP are hiring top Big Data Hadoop talent at the highest pay packages. • 7. The states with the most Hadoop/Big Data jobs are California, New York, New Jersey and Texas (duh, that was obvious :)
  • 36. So make sure you stay updated.
  • 38. Future of Big Data • About $15 billion has been spent on software firms specializing only in data management and analytics. • This industry on its own is worth more than $100 billion and is growing at almost 10% a year, roughly twice as fast as the software business as a whole. • In February 2012, the open source analyst firm Wikibon released the first market forecast for Big Data, listing $5.1B revenue in 2012 with growth to $53.4B in 2017. • The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020.
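The growth figures on this slide can be related to each other with compound-growth arithmetic. This is a minimal sketch, assuming steady year-over-year growth; the helper names `cagr` and `growth_factor` are illustrative and not from the slides. Under that assumption the Wikibon forecast implies an annual growth rate of roughly 60%, and 40% annual growth sustained over the 11 years from 2009 to 2020 compounds to roughly 40x, the same order as the quoted 44x.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiplication after compounding `annual_rate` for `years`."""
    return (1 + annual_rate) ** years


# Wikibon forecast from the slide: $5.1B (2012) growing to $53.4B (2017).
wikibon_cagr = cagr(5.1, 53.4, 2017 - 2012)

# McKinsey estimate from the slide: 40% growth per year, 2009 to 2020.
mckinsey_factor = growth_factor(0.40, 2020 - 2009)

print(f"Implied Wikibon CAGR: {wikibon_cagr:.1%}")
print(f"40% per year over 11 years: about {mckinsey_factor:.1f}x")
```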
  • 39. • So, Data Science as a career goal will improve graduates' employability in the future job market. • Big Data Market Forecast
  • 40. References • www.Slideshare.com • www.wikipedia.com • www.computereducation.org • Strata + Hadoop World 2016: Hadoop and Spark in spotlight • http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data • http://www.information-management.com/news/big-data-analytics/the-top-5-trends-in-big-data-for-2017-10029956-1.html • Books: Big Data by Viktor Mayer-Schönberger