Big Data Analytics and Hadoop
Dr. M.V. Padmavati
Bhilai Institute of Technology, Durg
Data scientist is the most promising job of 2019
Big Data
What is Big data?
Key Challenges
• Capture & Store
• Search
• Sharing & Transfer
• Analysis
Big data is an all-encompassing term for any collection of data sets so large and
complex that it becomes difficult to process them using traditional data
processing applications
Domains with Large Datasets:
• Meteorology
• Complex physics simulations
• Biological and environmental
research
• Internet Search
Dimensions to Big Data
• Initially, three dimensions of big data were identified: Volume, Variety and Velocity.
• These are also called the characteristics of big data, or the 3 V's of Big Data.
• A 4th V, Veracity, was added later.
Volume (Scale): Data Volume
• Data volume was projected to grow 44-fold from 2009 to 2020, from 0.8 zettabytes (ZB) to 35 ZB; data volume is increasing exponentially.
• 1 TB = 1024 GB (binary convention; decimal SI units use factors of 1000)
• 1 PetaByte (PB) = 1024 TB (in decimal SI units, 1 PB = 1000^5 = 10^15 bytes)
• 1 ExaByte (EB) = 1024 PB (in decimal SI units, 1 EB = 1000^6 = 10^18 bytes)
• 1 ZettaByte (ZB) = 1024 EB
• 1 YottaByte (YB) = 1024 ZB
• Big Data is a collection of huge volumes of data.
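The unit ladder and the growth figure above are simple arithmetic, and the two unit conventions (powers of 1000 vs. powers of 1024) are easy to mix up. A minimal sketch with hypothetical helper names, assuming the slide's endpoints of 0.8 ZB in 2009 and 35 ZB in 2020 (11 years):

```python
# Hypothetical helpers for the Volume slide's unit and growth arithmetic.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_bytes(unit: str, binary: bool = False) -> int:
    """Bytes in one `unit`: powers of 1000 (SI) or powers of 1024 (binary)."""
    base = 1024 if binary else 1000
    return base ** (UNITS.index(unit) + 1)


def growth_factor(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` in `years`."""
    return (end / start) ** (1 / years) - 1


print(unit_bytes("PB") == 10**15)                                              # True
print(unit_bytes("PB", binary=True) == 1024 * unit_bytes("TB", binary=True))  # True
print(round(growth_factor(0.8, 35), 2))                   # 43.75, i.e. roughly 44x
print(f"{annual_growth_rate(0.8, 35, 2020 - 2009):.1%}")  # about 41% per year
```

A 44-fold increase over 11 years corresponds to data volume growing by roughly 41% every year, which is what "increasing exponentially" means in practice.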
Velocity (Speed)
• Data is generated fast and needs to be processed fast.
• Requires online (real-time) data analytics
• Late decisions mean missed opportunities
• Examples
 • E-promotions: based on your current location, your purchase history and what you like → send promotions right now for the store next to you
 • Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires an immediate reaction
Variety (Complexity)
• Text, numerical data, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
• Data can be either static or streaming.
Veracity (Uncertainty)
• Veracity refers to the trustworthiness of the data.
• It reflects the inconsistency and uncertainty present in the data.
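One way to make trustworthiness concrete is to check records before they are analysed. A minimal sketch, assuming records arrive as Python dictionaries with an `id` field; it flags two illustrative problems, missing values and records that share an id but disagree. The function name, fields and checks are hypothetical and are not a standard veracity measure.

```python
from typing import Any, Dict, List


def veracity_report(records: List[Dict[str, Any]], key: str = "id") -> Dict[str, list]:
    """Flag two simple veracity problems in a list of records.

    - "missing": records with at least one empty (None) field
    - "conflicting": keys that appear more than once with different values
    """
    missing = [r for r in records if any(v is None for v in r.values())]
    first_seen: Dict[Any, Dict[str, Any]] = {}
    conflicting: list = []
    for r in records:
        k = r[key]
        # Compare each later record with the first one seen for the same key.
        if k in first_seen and first_seen[k] != r and k not in conflicting:
            conflicting.append(k)
        first_seen.setdefault(k, r)
    return {"missing": missing, "conflicting": conflicting}


records = [
    {"id": 1, "city": "Durg", "age": 34},
    {"id": 2, "city": None, "age": 29},
    {"id": 1, "city": "Bhilai", "age": 34},  # same id, different city
]
report = veracity_report(records)
print(report["conflicting"])                 # [1]
print([r["id"] for r in report["missing"]])  # [2]
```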
4 Vs of Big Data
Types of Data: data is categorized as
1. Structured data
2. Semi-structured data
3. Unstructured data
Big Data consists mostly of unstructured data.
1. Structured Data:
• Fits neatly into a relational database (rows and columns with a fixed schema)
Types of Data
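Because structured data follows a fixed schema, it maps directly onto a relational table and can be queried with SQL. A minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `students` table and its rows are invented for illustration.

```python
import sqlite3

# Every row follows the same fixed schema, declared before any data is loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT, marks REAL)"
)
conn.executemany(
    "INSERT INTO students (roll_no, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82.5), (2, "Ravi", 74.0), (3, "Meena", 91.0)],
)

# Because the structure is known in advance, filtering and sorting are one query.
top = conn.execute(
    "SELECT name FROM students WHERE marks > ? ORDER BY marks DESC", (80,)
).fetchall()
conn.close()
print(top)  # [('Meena',), ('Asha',)]
```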
2. Unstructured Data
• Today more than 80% of the data generated is unstructured.
• Examples:
 • Satellite images, social media data, mobile data, photographs and video (including security, surveillance and traffic video)
 • Website content: content from any site that delivers unstructured content, such as YouTube, Flickr or Instagram.
Types of Data – Unstructured Data
• Semi-structured data has some organizational properties that make it easier to analyze.
• Examples of semi-structured data formats:
 • CSV (Comma-Separated Values)
 • XML (Extensible Markup Language)
 • JSON (JavaScript Object Notation)
Types of Data – Semi structured Data
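Semi-structured formats carry their own field names (keys or tags), but records need not share one fixed schema. A minimal sketch that reads the same kind of hypothetical social-media records from JSON and from XML using only Python's standard library (`json` and `xml.etree.ElementTree`); the field names and values are invented.

```python
import json
import xml.etree.ElementTree as ET

# JSON: the second record has no "tags" key, which is allowed.
json_text = """
[
  {"user": "asha", "likes": 12, "tags": ["hadoop", "spark"]},
  {"user": "ravi", "likes": 5}
]
"""
posts = json.loads(json_text)
# Optional fields are read with a default instead of assuming a fixed schema.
all_tags = [tag for post in posts for tag in post.get("tags", [])]

# XML: structure comes from nested, self-describing tags and attributes.
xml_text = """
<posts>
  <post user="asha"><likes>12</likes></post>
  <post user="ravi"><likes>5</likes></post>
</posts>
"""
root = ET.fromstring(xml_text.strip())
likes_by_user = {p.get("user"): int(p.findtext("likes")) for p in root.iter("post")}

print(all_tags)       # ['hadoop', 'spark']
print(likes_by_user)  # {'asha': 12, 'ravi': 5}
```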
Big data Analytics
What is Big data Analytics?
• “It is the art of finding patterns and insights in
large sets of data that allow you to make
better decisions or learn things you couldn’t
otherwise learn.”
• It makes use of statistics, AI, data mining, machine learning, pattern recognition, natural language processing, etc.
Reasons and benefits of Big Data analytics
• Timely: gain instant insights from diverse data sources
• Better analytics: improve business performance through real-time analytics
• Vast data: big data technologies manage huge amounts of data
• Insights: provide better insights with the help of unstructured and semi-structured data
• Decision making: mitigate risk and make smart decisions through proper risk analysis
Why Big data Analytics?
Types of Analytics
BIG DATA ANALYTICAL TOOLS
• Apache Hadoop
• Apache Spark
• Apache Storm
• Presto (Facebook)
• Hydra
• Google BigQuery
• Statwing
• Pentaho
• Apache Flink
• OpenRefine
• Kaggle
• Windows Azure
Applications of Big Data Analytics for Humanity
Big data in Healthcare
• Customer relationship management
• Electronic Health Record
Big data in Healthcare
• Big data reduces treatment costs, since there is less chance of unnecessary diagnostic tests.
• It helps predict epidemic outbreaks and decide what preventive measures to take.
• It helps avoid preventable diseases by detecting them in their early stages.
• Patients can be given evidence-based medicine, identified and prescribed after researching past medical results.
Big data in Insurance
• Analyzing and predicting customer behavior through data
derived from social media, GPS-enabled devices and CCTV
footage.
• In claims management, predictive analytics from big data has been used to offer faster service and fraud detection.
• Through massive data from digital channels and social media, real-time monitoring of claims throughout the claims cycle has been used to provide insights.
• SBI Life makes use of big data analytics.
Big data in Education
• The University of Tasmania, an Australian university with over 26,000 students, has deployed a learning management system that tracks, among other things, when a student logs onto the system, how much time is spent on different pages, and the student's overall progress over time.
• It is also used to measure teachers' effectiveness, to ensure a good experience for both students and teachers.
• Click patterns are also being used to detect boredom.
• Adaptive learning (customized learning): companies produce digital courses that use big-data-driven predictive analytics to determine what a learner is learning and which parts of a lesson plan suit that learner best at each point.
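Time spent on each page can be estimated from a simple click log. A minimal sketch, assuming each event is a hypothetical `(student_id, timestamp_in_seconds, page)` tuple and approximating time on a page as the gap until that student's next event; it is an illustration, not the University of Tasmania's actual system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int, str]  # (student_id, timestamp in seconds, page)


def time_per_page(events: List[Event]) -> Dict[str, Dict[str, int]]:
    """Estimate seconds each student spent on each page.

    Time on a page is the gap until the same student's next event; a
    student's last event has no end time, so it is not counted.
    """
    by_student: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for student, ts, page in events:
        by_student[student].append((ts, page))

    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for student, visits in by_student.items():
        visits.sort()  # order each student's events by time
        for (ts, page), (next_ts, _) in zip(visits, visits[1:]):
            totals[student][page] += next_ts - ts
    return {s: dict(pages) for s, pages in totals.items()}


events = [
    ("s1", 0, "home"), ("s1", 30, "lecture1"),
    ("s1", 630, "quiz1"), ("s1", 900, "logout"),
]
print(time_per_page(events))  # {'s1': {'home': 30, 'lecture1': 600, 'quiz1': 270}}
```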
Big data in Media and Entertainment
• The media and entertainment industry is facing new business models in the way it creates, markets and distributes content. This is driven by consumers' demand to access content anywhere, any time, on any device.
• Big Data provides actionable information about millions of individuals. Publishers now tailor advertisements and content to appeal to consumers, using insights gathered through various data-mining activities. Big Data applications benefit the media and entertainment industry by:
• Predicting what the audience wants
• Scheduling optimization
• Increasing acquisition and retention
• Ad targeting
• Content monetization and new product development
• Crime Prediction and Prevention
Police departments can leverage advanced, real-time analytics to provide
actionable intelligence that can be used to understand criminal behaviour,
identify crime/incident patterns, and uncover location-based threats.
• Weather Forecasting
The NOAA (National Oceanic and Atmospheric Administration) gathers data every minute of every day from land, sea and space-based sensors. Every day, NOAA uses Big Data to analyze and extract value from over 20 terabytes of data.
• Tax Compliance
Big Data Applications can be used by tax organizations to analyze both
unstructured and structured data from a variety of sources in order to identify
suspicious behavior and multiple identities. This would help in tax fraud
identification.
• Big Data Contributions to Transportation
Route planning: reduces users' waiting times.
Congestion management by predicting traffic conditions: using big data, real-time estimation of congestion and traffic patterns is now possible. For example, people use Google Maps to find the least congested routes.
Traffic safety: real-time processing of big data and predictive analysis can identify accident-prone areas, which helps reduce accidents and increase traffic safety.
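Route planning of this kind is commonly modelled as a shortest-path problem in which each road's weight is its current travel time, so fresh congestion data changes the chosen route. A minimal sketch using Dijkstra's algorithm, a standard textbook technique; it is not a description of how Google Maps works, and the road network and travel times are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, minutes)]


def fastest_route(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's shortest path where edge weights are current travel times."""
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, Optional[str]] = {start: None}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to `node` was already found
        for nxt, minutes in graph.get(node, []):
            nd = d + minutes
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return float("inf"), []
    path, cur = [], goal
    while cur is not None:  # walk predecessors back to the start
        path.append(cur)
        cur = prev[cur]
    return dist[goal], path[::-1]


roads = {"A": [("B", 5), ("C", 2)], "B": [("D", 4)],
         "C": [("B", 1), ("D", 10)], "D": []}
print(fastest_route(roads, "A", "D"))  # (7.0, ['A', 'C', 'B', 'D'])
roads["C"] = [("B", 8), ("D", 10)]     # congestion reported on road C -> B
print(fastest_route(roads, "A", "D"))  # (9.0, ['A', 'B', 'D'])
```

Re-running the search whenever live traffic updates change the edge weights is what lets the recommended route adapt to congestion.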
Big data in Various Other Fields
Why should I Learn Big Data Analytics?
Role of Mathematicians in Big data
• Data science is the marriage of statistics and computer science; it requires:
• Probability
• Statistics
• Distributed Optimization
• Algebra
• Calculus
How Physicists can use Big data
• Astrophysics
• Quantum Computing
• Electrical grid analytics
• Simulation of complex systems
• Internet of things
How Bio People can use Big data
• The human genome contains roughly 3 billion
DNA base pairs and about 20,000 genes.
• The genetic information acquired globally about patients and diseases will enable health-care providers to offer individual-specific, tailor-made medicines.
• Smart agriculture using IoT
• DNA-sequence data contain insights for the development of (a) superior, disease-resistant, high-yielding crop varieties that are resilient to climate change, and (b) drugs for cancer, HIV or new strains of influenza.
For Commerce People
• Supply chain analytics
• Retail Analytics
• Manufacturing analytics
• Bank Analytics
• HR Analytics
• Sales analytics
• Recommender systems
Apache Hadoop
HADOOP
APACHE HADOOP
• Hadoop is an open source framework developed by Doug Cutting in 2006
and is managed by the Apache Software Foundation
• The project was named Hadoop after Doug Cutting's son's yellow toy elephant.
• The framework, written in Java, allows storage and processing of large volumes of data on a cluster of commodity hardware.
• The Apache Hadoop project actively supports multiple projects intended to
extend Hadoop’s capabilities and make it easier to use.
Traditional Systems Vs Big data Systems
Traditional Systems
• Schema-on-write
• Use shared storage
• Costly proprietary hardware
• Bring data to the programs
Hadoop Data Systems
• Schema-on-read
• Use the Hadoop Distributed File System (HDFS)
• Local storage on commodity hardware
• Bring programs to the data
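The schema-on-write vs. schema-on-read contrast can be shown in a few lines. A minimal sketch, assuming raw comma-separated log lines are stored exactly as they arrive and a schema is applied only when a query reads them; the log contents, `SCHEMA` and `read_with_schema` are hypothetical and are not a Hadoop API.

```python
import csv
import io
from typing import Dict, Iterator

# Raw data is stored as-is; nothing is validated at load time.
RAW_LOG = """2024-01-05,durg,250
2024-01-05,bhilai,not_available
2024-01-06,durg,310
"""

# The schema is chosen by the reader, at query time.
SCHEMA = {"date": str, "city": str, "sales": int}


def read_with_schema(raw: str, schema: Dict[str, type]) -> Iterator[dict]:
    """Parse raw CSV lines with `schema`, skipping rows that do not fit it."""
    for row in csv.reader(io.StringIO(raw)):
        try:
            yield {name: cast(value)
                   for (name, cast), value in zip(schema.items(), row)}
        except ValueError:
            # A schema-on-write system would have rejected this row at load time.
            continue


rows = list(read_with_schema(RAW_LOG, SCHEMA))
durg_sales = sum(r["sales"] for r in rows if r["city"] == "durg")
print(len(rows), durg_sales)  # 2 560
```

The trade-off: loading is fast and nothing is lost, but every reader must cope with rows that do not fit the schema it applies.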
HADOOP ECOSYSTEM
HADOOP COMPONENTS
HDFS (Hadoop Distributed File System)
• It is the storage layer of Hadoop. It follows a master-slave architecture.
• The NameNode acts as the master and stores the file-system metadata: the namespace and which DataNodes hold each block.
• DataNodes act as slaves that store the actual data blocks on local disks; the data on different nodes can be processed there in parallel.
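How a file is split and replicated can be sketched as the bookkeeping the NameNode keeps. A minimal sketch, assuming the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the function and DataNode names are hypothetical, and the round-robin placement is a simplification, NOT HDFS's real rack-aware placement policy.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128  # HDFS default block size since Hadoop 2.x (dfs.blocksize)
REPLICATION = 3      # HDFS default replication factor (dfs.replication)


def plan_blocks(file_size_mb: float, datanodes: List[str],
                block_size_mb: int = BLOCK_SIZE_MB,
                replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Toy NameNode metadata: block id -> DataNodes holding a replica.

    Replicas of one block always go to distinct DataNodes; placement is a
    simple round-robin, not HDFS's rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


layout = plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"])
for block, nodes in layout.items():
    print(block, nodes)
# 0 ['dn1', 'dn2', 'dn3']
# 1 ['dn2', 'dn3', 'dn4']
# 2 ['dn3', 'dn4', 'dn1']
```

A 300 MB file becomes three blocks (128 + 128 + 44 MB). Because each block's three replicas sit on different nodes, losing any single DataNode still leaves two copies of every block, from which the lost replicas can be re-created.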
HADOOP COMPONENTS
MapReduce
• It is the data processing layer of Hadoop.
• It processes huge amounts of data in parallel by dividing a submitted job into a set of independent tasks.
• It runs in four phases: map, shuffle, sort and reduce.
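The four phases can be illustrated with the classic word-count example. This is a single-process Python simulation under simplifying assumptions: a real Hadoop job runs map and reduce tasks in parallel across the cluster, is usually written in Java (or as scripts through Hadoop Streaming), and the framework itself performs the shuffle and sort. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Shuffle: group values by key. Sort: order the groups by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Each map call depends only on its own line, so map tasks are independent.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return [reduce_phase(k, vs) for k, vs in shuffle_and_sort(mapped)]


print(word_count(["big data big", "data hadoop"]))
# [('big', 2), ('data', 2), ('hadoop', 1)]
```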
HADOOP COMPONENTS
HBase and Hive
• Hive and HBase are both used to store and query large data sets on top of Hadoop.
• RDBMS professionals like Apache Hive because they can simply map HDFS files to Hive tables and query the data with HiveQL, a SQL-like language.
• HBase is a NoSQL database that provides real-time read/write access to data, whereas Hive is not really a database but a MapReduce-based SQL engine that runs on top of Hadoop for batch processing of big data.
• Other NoSQL databases include MongoDB and Cassandra.
HADOOP COMPONENTS
Pig
• It is a high-level scripting platform.
• It enables writing complex data transformations in Hadoop using the Pig Latin language.
Sqoop
• It is a tool designed to transfer huge volumes of data between Hadoop and relational databases (RDBMS).
Mahout
• A library of scalable machine-learning algorithms, implemented on top of Apache Hadoop using the MapReduce paradigm.
HADOOP COMPONENTS
Flume
• It is a reliable system for collecting large amounts of log data from many different sources in real time.
Oozie
• It is a workflow scheduler system used to schedule Apache Hadoop jobs. It combines multiple jobs sequentially into one logical unit of work.
ZooKeeper
• ZooKeeper is a high-performance coordination service for distributed applications. It provides a centralized service for maintaining configuration information, naming, distributed synchronization and group services.
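Combining jobs into one unit of work comes down to running each job only after the jobs it depends on. Oozie itself defines workflows in XML as a directed acyclic graph (DAG) of actions; the sketch below is not Oozie's API and only illustrates the dependency ordering, using Python's standard `graphlib` module (Python 3.9+). The job names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each job maps to the jobs that must finish before it.
workflow = {
    "sqoop_import": set(),
    "flume_collect_logs": set(),
    "pig_clean": {"sqoop_import", "flume_collect_logs"},
    "hive_report": {"pig_clean"},
}


def run_workflow(graph, run_job):
    """Run every job once, only after all of its prerequisites have run."""
    for job in TopologicalSorter(graph).static_order():
        run_job(job)


executed = []
run_workflow(workflow, executed.append)
print(executed)  # the two independent jobs first, then pig_clean, then hive_report
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, which mirrors the requirement that a workflow be acyclic.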
HADOOP COMPONENTS
FEATURES OF HADOOP
• No expensive hardware is required
• Supports large clusters of 100 to 1000 nodes
• More computing power and storage capacity
• Parallel processing of data
• Distributed data
• Data replication
• Automatic failover management
• Data locality optimization
• Supports heterogeneous clusters
• Scalability
Bigdata and Hadoop with applications

Más contenido relacionado

La actualidad más candente

A Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven SocietyA Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven SocietyAnthonyOtuonye
 
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWUSING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWNellore Harilakshmi
 
JPJ1417 Data Mining With Big Data
JPJ1417   Data Mining With Big DataJPJ1417   Data Mining With Big Data
JPJ1417 Data Mining With Big Datachennaijp
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challengesfazail amin
 
elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdfAkuhuruf
 
Tools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl WintersTools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl WintersMelinda Thielbar
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyEditor IJCATR
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data AnalyticsUtkarsh Sharma
 
Presentation Big Data
Presentation Big DataPresentation Big Data
Presentation Big DataRené Kuipers
 
Real World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining ToolsReal World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining Toolsijsrd.com
 
Big Data Brown Bag
Big Data Brown BagBig Data Brown Bag
Big Data Brown Bagusmanqureshi
 
Transforming Research in Collaboration with Funding Agencies
Transforming Research in Collaboration with Funding AgenciesTransforming Research in Collaboration with Funding Agencies
Transforming Research in Collaboration with Funding AgenciesAmazon Web Services
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsBeth Plale
 
Computational intelligence for big data analytics bda 2013
Computational intelligence for big data analytics   bda 2013Computational intelligence for big data analytics   bda 2013
Computational intelligence for big data analytics bda 2013oj08
 

La actualidad más candente (20)

A Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven SocietyA Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven Society
 
Big data Mining
Big data MiningBig data Mining
Big data Mining
 
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWUSING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
 
JPJ1417 Data Mining With Big Data
JPJ1417   Data Mining With Big DataJPJ1417   Data Mining With Big Data
JPJ1417 Data Mining With Big Data
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challenges
 
elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdf
 
Tools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl WintersTools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl Winters
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Big data
Big dataBig data
Big data
 
Data mining on big data
Data mining on big dataData mining on big data
Data mining on big data
 
Big data
Big dataBig data
Big data
 
Big Data for Library Services (2017)
Big Data for Library Services (2017)Big Data for Library Services (2017)
Big Data for Library Services (2017)
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A Survey
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
 
Presentation Big Data
Presentation Big DataPresentation Big Data
Presentation Big Data
 
Real World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining ToolsReal World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining Tools
 
Big Data Brown Bag
Big Data Brown BagBig Data Brown Bag
Big Data Brown Bag
 
Transforming Research in Collaboration with Funding Agencies
Transforming Research in Collaboration with Funding AgenciesTransforming Research in Collaboration with Funding Agencies
Transforming Research in Collaboration with Funding Agencies
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure Commons
 
Computational intelligence for big data analytics bda 2013
Computational intelligence for big data analytics   bda 2013Computational intelligence for big data analytics   bda 2013
Computational intelligence for big data analytics bda 2013
 

Similar a Bigdata and Hadoop with applications

Similar a Bigdata and Hadoop with applications (20)

Big data
Big dataBig data
Big data
 
Data Mining With Big Data
Data Mining With Big DataData Mining With Big Data
Data Mining With Big Data
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptx
 
Trends in data analytics
Trends in data analyticsTrends in data analytics
Trends in data analytics
 
Bigdata
BigdataBigdata
Bigdata
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introduction
 
Big data
Big dataBig data
Big data
 
Unit-1 -2-3- BDA PIET 6 AIDS.pptx
Unit-1 -2-3- BDA PIET 6 AIDS.pptxUnit-1 -2-3- BDA PIET 6 AIDS.pptx
Unit-1 -2-3- BDA PIET 6 AIDS.pptx
 
Big data
Big dataBig data
Big data
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
 
Are you ready for BIG DATA?
Are you ready for BIG DATA?Are you ready for BIG DATA?
Are you ready for BIG DATA?
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
 
Understanding big data
Understanding big dataUnderstanding big data
Understanding big data
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Big data
Big dataBig data
Big data
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 
Big data
Big dataBig data
Big data
 

Más de Padma Metta

Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionPadma Metta
 
Machine learning and types
Machine learning and typesMachine learning and types
Machine learning and typesPadma Metta
 
Statistical computing 1
Statistical computing 1Statistical computing 1
Statistical computing 1Padma Metta
 
Statistical computing2
Statistical computing2Statistical computing2
Statistical computing2Padma Metta
 
Kernel density estimation (kde)
Kernel density estimation (kde)Kernel density estimation (kde)
Kernel density estimation (kde)Padma Metta
 
Writing a Research Paper
Writing a Research PaperWriting a Research Paper
Writing a Research PaperPadma Metta
 
Machine learning and decision trees
Machine learning and decision treesMachine learning and decision trees
Machine learning and decision treesPadma Metta
 
Machine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to HindiMachine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to HindiPadma Metta
 
HTML and ASP.NET
HTML and ASP.NETHTML and ASP.NET
HTML and ASP.NETPadma Metta
 

Más de Padma Metta (9)

Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Machine learning and types
Machine learning and typesMachine learning and types
Machine learning and types
 
Statistical computing 1
Statistical computing 1Statistical computing 1
Statistical computing 1
 
Statistical computing2
Statistical computing2Statistical computing2
Statistical computing2
 
Kernel density estimation (kde)
Kernel density estimation (kde)Kernel density estimation (kde)
Kernel density estimation (kde)
 
Writing a Research Paper
Writing a Research PaperWriting a Research Paper
Writing a Research Paper
 
Machine learning and decision trees
Machine learning and decision treesMachine learning and decision trees
Machine learning and decision trees
 
Machine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to HindiMachine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to Hindi
 
HTML and ASP.NET
HTML and ASP.NETHTML and ASP.NET
HTML and ASP.NET
 

Último

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Bigdata and Hadoop with applications

  • 1. Big Data Analytics and Hadoop
    Dr. M.V. Padmavati, Bhilai Institute of Technology, Durg
    Data scientist is the most promising job of 2019
  • 4. What is Big Data?
    Big data is an all-encompassing term for any collection of data sets so large and complex that they become difficult to process using traditional data processing applications.
    Key challenges:
    • Capture & Store
    • Search
    • Sharing & Transfer
    • Analysis
    Domains with large datasets:
    • Meteorology
    • Complex physics simulations
    • Biological and environmental research
    • Internet search
  • 5. Dimensions of Big Data
    • Originally, big data was described along three dimensions: Volume, Variety and Velocity.
    • These are also called the characteristics of big data, or the 3 Vs of big data.
    • A fourth V, Veracity, was added later.
  • 6. Volume (Scale): Data Volume
    • Data volume is increasing exponentially: a 44x increase was projected from 2009 to 2020, from 0.8 zettabytes to 35 ZB.
    • 1 TB = 1024 GB
    • 1 PB = 1024 TB (this 1024-based value is 2^50 bytes; in SI units a petabyte is 1000^5 = 10^15 bytes)
    • 1 EB = 1024 PB (1024-based: 2^60 bytes; SI: 1000^6 = 10^18 bytes)
    • 1 ZB = 1024 EB
    • 1 YB = 1024 ZB
    • Big data is a collection of huge volumes of data.
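The growth figure and the unit ladder on the Volume slide can be checked with a few lines of Python. This is a minimal sketch: the helper name `to_bytes` is hypothetical, and it contrasts the 1024-based steps the slide uses with SI decimal prefixes, where each step is 1000.

```python
# Units on the slide, in order; each step up multiplies by 1024 (binary convention).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit, base=1024):
    """Convert `value` in `unit` to bytes: base=1024 for binary steps, 1000 for SI prefixes."""
    return value * base ** UNITS.index(unit)


# 1 PB on the slide's 1024-based ladder is 2**50 bytes; the SI petabyte is 10**15 bytes.
print(to_bytes(1, "PB"))             # 1125899906842624
print(to_bytes(1, "PB", base=1000))  # 1000000000000000

# Growth factor quoted on the slide: 0.8 ZB (2009) to 35 ZB (2020).
growth = 35 / 0.8
print(f"{growth:.2f}x")  # 43.75x, i.e. roughly 44x
```

The roughly 2.4% gap between the two petabyte values is why storage vendors (SI) and operating systems (often 1024-based) can report different sizes for the same disk.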
  • 7. Velocity (Speed)
    • Data is being generated fast and needs to be processed fast.
    • This requires online data analytics.
    • Late decisions mean missed opportunities.
    • Examples:
      • E-promotions: based on your current location, your purchase history and what you like → send promotions right now for the store next to you.
      • Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires an immediate reaction.
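To make the healthcare-monitoring example on the Velocity slide concrete, here is a minimal sketch of online (streaming) processing in Python. Each reading is checked the moment it arrives instead of after the whole dataset has been stored. The heart-rate values, the 50 to 120 bpm "normal" range and the function name `monitor` are illustrative assumptions, not clinical thresholds or part of any real monitoring product.

```python
from typing import Iterable, Iterator, Tuple


def monitor(readings: Iterable[Tuple[int, float]],
            low: float = 50.0, high: float = 120.0) -> Iterator[Tuple[int, float]]:
    """Yield (timestamp, value) for each reading outside [low, high] as soon as it arrives.

    The 50-120 range is a hypothetical threshold chosen for illustration.
    """
    for ts, value in readings:
        if value < low or value > high:
            yield ts, value  # abnormal measurement: react immediately


# Simulated stream of (seconds, heart rate in bpm); a real system would read a sensor feed.
stream = iter([(0, 72.0), (1, 75.0), (2, 131.0), (3, 74.0), (4, 44.0)])
for ts, bpm in monitor(stream):
    print(f"ALERT at t={ts}s: {bpm} bpm")
```

Because `monitor` is a generator, it never needs the full stream in memory, which is the property that lets this style of check keep up with fast-arriving data.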
  • 8. Variety (Complexity)
    • Text, numerical data, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
    • Data can be either static or streaming.
  • 9. Veracity (Uncertainty)
    • Veracity refers to the trustworthiness of the data.
    • It also covers inconsistency and uncertainty in the data.
  • 10. 4 Vs of Big Data
  • 11. Types of Data
    Data is categorized as:
    1. Structured data
    2. Semi-structured data
    3. Unstructured data
    Big data consists mostly of unstructured data.
    1. Structured Data:
    • Loads neatly into a relational database.
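Structured data has a fixed schema of typed columns, which is why it loads directly into a relational table. The minimal sketch below uses Python's built-in `sqlite3` module as a stand-in for any relational database; the `patients` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory relational database: every row must fit the declared columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO patients (id, name, age) VALUES (?, ?, ?)",
    [(1, "Asha", 34), (2, "Ravi", 58), (3, "Meena", 41)],
)

# Because the structure is known in advance, a SQL query can filter and sort directly.
rows = conn.execute("SELECT name FROM patients WHERE age > 40 ORDER BY name").fetchall()
print(rows)  # [('Meena',), ('Ravi',)]
conn.close()
```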
  • 12. Types of Data: Unstructured Data
    2. Unstructured Data
    • Today more than 80% of the data generated is unstructured.
    • Examples:
      • Satellite images, social media data, mobile data
      • Photographs and video: this includes security, surveillance and traffic video
      • Website content: this comes from any site delivering unstructured content, like YouTube, Flickr or Instagram
  • 13. Types of Data: Semi-Structured Data
    • Semi-structured data has some organizational properties that make it easier to analyze.
    • Examples of semi-structured data formats:
      • CSV (Comma-Separated Values)
      • XML (Extensible Markup Language)
      • JSON (JavaScript Object Notation)
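All three semi-structured formats listed on the slide can be read with Python's standard library. The sketch below encodes the same hypothetical record as CSV, JSON and XML and parses each one. The field names (the CSV header row, the JSON keys, the XML tags) travel with the data, which is the organizational property that makes semi-structured data easier to analyze than free text.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in three semi-structured formats.
csv_text = "id,name,city\n1,Asha,Durg\n"
json_text = '{"id": 1, "name": "Asha", "city": "Durg"}'
xml_text = "<person><id>1</id><name>Asha</name><city>Durg</city></person>"

# CSV: the header row supplies the field names.
from_csv = next(csv.DictReader(io.StringIO(csv_text)))

# JSON: keys are embedded in every object.
from_json = json.loads(json_text)

# XML: each tag describes the value it wraps.
root = ET.fromstring(xml_text)
from_xml = {child.tag: child.text for child in root}


def normalize(record):
    """CSV and XML yield strings while JSON keeps numbers, so compare values as strings."""
    return {key: str(value) for key, value in record.items()}


print(normalize(from_csv) == normalize(from_json) == normalize(from_xml))  # True
```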
  • 15. What is Big Data Analytics?
    • "It is the art of finding patterns and insights in large sets of data that allow you to make better decisions or learn things you couldn't otherwise learn."
    • It makes use of statistics, AI, data mining, machine learning, pattern recognition, natural language processing, etc.
  • 16. Why Big Data Analytics?
    Reasons and benefits of big data analytics:
    • Timely: gain instant insights from diverse data sources.
    • Better analytics: improve business performance through real-time analytics.
    • Vast data: big data technologies manage huge amounts of data.
    • Insights: provide better insights with the help of unstructured and semi-structured data.
    • Decision making: help mitigate risk and make smart decisions through proper risk analysis.
  • 18. Big Data Analytical Tools
    • Apache Hadoop
    • Apache Spark
    • Apache Storm
    • Presto (Facebook)
    • Hydra
    • Google BigQuery
    • Statwing
    • Pentaho
    • Flink
    • OpenRefine
    • Kaggle
    • Windows Azure
  • 19. Applications of Big Data Analytics for Humanity
  • 20. Big Data in Healthcare (diagram: Customer relationship management, Electronic Health Record)
  • 21. Big Data in Healthcare
    • Big data reduces the cost of treatment, since there is less chance of having to perform unnecessary diagnoses.
    • It helps in predicting epidemic outbreaks and in deciding what preventive measures to take.
    • It helps avoid preventable diseases by detecting them at an early stage.
    • Patients can be given evidence-based medicine, identified and prescribed after researching past medical results.
  • 22. Big Data in Insurance
    • Analyzing and predicting customer behavior through data derived from social media, GPS-enabled devices and CCTV footage.
    • In claims management, predictive analytics on big data has been used to offer faster service and to detect fraud.
    • Massive data from digital channels and social media has been used to monitor claims in real time throughout the claims cycle and to provide insights.
    • SBI Life makes use of big data analytics.
  • 23. Big Data in Education
    • The University of Tasmania, an Australian university with over 26,000 students, has deployed a Learning Management System that tracks, among other things, when a student logs onto the system, how much time is spent on different pages, and the student's overall progress over time.
    • It is also used to measure teachers' effectiveness, to ensure a good experience for both students and teachers.
    • Click patterns are also used to detect boredom.
    • Adaptive learning (customized learning): enterprises produce digital courses that use big-data-driven predictive analytics to identify what a learner is learning and which parts of a lesson plan suit that learner best at each stage.
  • 24. Big Data in Media and Entertainment
    • The media and entertainment industry is facing new business models in the way it creates, markets and distributes content. This is driven by consumers' demand to access content anywhere, any time, on any device.
    • Big data provides actionable information about millions of individuals. Publishers now tailor advertisements and content to appeal to consumers, using insights gathered through various data-mining activities.
    • Big data applications benefit the media and entertainment industry by:
      • Predicting what the audience wants
      • Scheduling optimization
      • Increasing acquisition and retention
      • Ad targeting
      • Content monetization and new product development
  • 25. Big Data in Various Other Fields
    • Crime prediction and prevention: police departments can use advanced, real-time analytics to obtain actionable intelligence that helps them understand criminal behaviour, identify crime and incident patterns, and uncover location-based threats.
    • Weather forecasting: NOAA (the National Oceanic and Atmospheric Administration) gathers data every minute of every day from land-, sea- and space-based sensors. Every day, NOAA uses big data to analyze and extract value from over 20 terabytes of data.
    • Tax compliance: tax organizations can use big data applications to analyze both unstructured and structured data from a variety of sources to identify suspicious behavior and multiple identities, which helps identify tax fraud.
    • Transportation:
      • Route planning to reduce users' wait times.
      • Congestion management by predicting traffic conditions: using big data, real-time estimation of congestion and traffic patterns is now possible. For example, people use Google Maps to find the least congested routes.
      • Traffic safety: real-time processing of big data and predictive analysis can identify accident-prone areas, which helps reduce accidents and improve traffic safety.
  • 26. Why should I Learn Big Data Analytics?
  • 27. Role of Mathematicians in Big Data
    • Data science is the marriage of statistics and computer science. It needs:
      • Probability
      • Statistics
      • Distributed optimization
      • Algebra
      • Calculus
  • 28. How Physicists Can Use Big Data
    • Astrophysics
    • Quantum computing
    • Electrical grid analytics
    • Simulation of complex systems
    • Internet of Things
  • 29. How Biologists Can Use Big Data
    • The human genome contains roughly 3 billion DNA base pairs and about 20,000 genes.
    • Genetic information acquired globally about patients and diseases will enable health-care providers to offer individual-specific, tailor-made medicines.
    • Smart agriculture using IoT.
    • DNA-sequence data contain insights for developing (a) superior, disease-resistant, high-yielding crop varieties that are resilient to climate change, and (b) drugs for cancer, HIV or new strains of influenza.
  • 30. For Commerce People
    • Supply chain analytics
    • Retail analytics
    • Manufacturing analytics
    • Bank analytics
    • HR analytics
    • Sales analytics
    • Recommender systems
  • 32. Apache Hadoop
    • Hadoop is an open-source framework developed by Doug Cutting in 2006 and managed by the Apache Software Foundation.
    • The project was named Hadoop after the yellow toy elephant of Doug Cutting's son.
    • The framework is written in Java and allows the storage and processing of large volumes of data on a cluster of commodity hardware.
    • The Apache Hadoop project actively supports multiple related projects intended to extend Hadoop's capabilities and make it easier to use.
  • 33. Traditional Systems vs. Big Data Systems

    Traditional Systems                 | Hadoop Data Systems
    Schema-on-write                     | Schema-on-read
    Shared storage                      | Hadoop Distributed File System (HDFS)
    Cost of proprietary hardware        | Local storage on commodity hardware
    Brings data to the programs         | Brings programs to the data
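The first row of the comparison on slide 33 is easiest to see in code. In the minimal sketch below (plain Python; the function names and the `user,clicks` record layout are hypothetical), schema-on-write parses and validates every record before storing it, so one bad record rejects the load. Schema-on-read stores the raw lines unchanged and applies the schema only when a query reads them; here the query simply skips records that do not fit, which is one possible policy among several.

```python
from typing import List, Tuple

raw_lines = ["alice,3", "bob,seven", "carol,5"]


def load_schema_on_write(lines: List[str]) -> List[Tuple[str, int]]:
    """Traditional style: enforce the (user, int clicks) schema at load time."""
    table = []
    for line in lines:
        user, clicks = line.split(",")
        table.append((user, int(clicks)))  # raises ValueError for "seven"
    return table


def query_schema_on_read(lines: List[str]) -> int:
    """Hadoop style: raw lines are kept as-is; the schema is applied when the query runs."""
    total = 0
    for line in lines:
        user, clicks = line.split(",")
        if clicks.isdigit():  # records that don't fit the schema are skipped at read time
            total += int(clicks)
    return total


try:
    load_schema_on_write(raw_lines)
except ValueError as err:
    print("load rejected:", err)

print(query_schema_on_read(raw_lines))  # 8
```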
  • 36. Hadoop Components: HDFS (Hadoop Distributed File System)
    • HDFS is the storage layer of Hadoop. It follows a master-slave architecture.
    • The NameNode acts as the master and stores the file system metadata, including which DataNodes hold each block of a file.
    • DataNodes act as slaves: they store the actual data on their local disks, and computation on that data runs in parallel on the same nodes.
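A toy model of the NameNode/DataNode split may help. The Python sketch below is a simulation only, not the HDFS API: it cuts a file into fixed-size blocks and records, as NameNode-style metadata, which DataNodes hold each block's replicas. The 128 MB block size and replication factor of 3 match common HDFS defaults (`dfs.blocksize`, `dfs.replication`), but the round-robin placement and the node names `dn1` to `dn4` are simplifying assumptions; real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB: default HDFS block size in Hadoop 2.x and later
REPLICATION = 3                 # default HDFS replication factor


def place_blocks(file_size: int, datanodes: List[str]) -> Dict[int, List[str]]:
    """Toy NameNode metadata: block index -> DataNodes that hold a replica of that block.

    Assumes len(datanodes) >= REPLICATION so every replica lands on a different node.
    """
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division; last block may be partial
    metadata: Dict[int, List[str]] = {}
    for b in range(num_blocks):
        # Round-robin placement is a simplification; real HDFS placement is rack-aware.
        metadata[b] = [datanodes[(b + r) % len(datanodes)] for r in range(REPLICATION)]
    return metadata


nodes = ["dn1", "dn2", "dn3", "dn4"]
meta = place_blocks(300 * 1024 * 1024, nodes)  # a 300 MB file needs 3 blocks
for block, replicas in meta.items():
    print(block, replicas)
# 0 ['dn1', 'dn2', 'dn3']
# 1 ['dn2', 'dn3', 'dn4']
# 2 ['dn3', 'dn4', 'dn1']
```

With this layout, losing any single DataNode still leaves at least two replicas of every block.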
  • 37. Hadoop Components: MapReduce
    • MapReduce is the data processing layer of Hadoop.
    • It processes huge amounts of data in parallel by dividing the submitted job into a set of independent tasks.
    • A job runs in four phases: map, shuffle, sort and reduce.
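The four MapReduce phases can be traced with the classic word-count example. This is a single-process Python simulation of the MapReduce model, not Hadoop's Java API, and the function names are hypothetical: `map_phase` emits `(word, 1)` pairs, `shuffle_and_sort` groups the pairs by key and orders the keys, and `reduce_phase` sums each group. On a real cluster the map and reduce tasks would run in parallel on different nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Shuffle: group values by key; sort: order the keys before reducing."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


lines = ["big data needs big tools", "Hadoop processes big data"]
# On a cluster, each line (input split) would be handled by an independent map task.
mapped = [pair for line in lines for pair in map_phase(line)]
result = [reduce_phase(k, vs) for k, vs in shuffle_and_sort(mapped)]
print(result)
# [('big', 3), ('data', 2), ('hadoop', 1), ('needs', 1), ('processes', 1), ('tools', 1)]
```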
  • 38. Hadoop Components: HBase and Hive
    • Hive and HBase are both used to manage large datasets stored in Hadoop.
    • RDBMS professionals love Apache Hive because they can simply map HDFS files to Hive tables and query the data with SQL.
    • HBase is a NoSQL database used for real-time read/write access, whereas Hive is not really a database but a MapReduce-based SQL engine that runs on top of Hadoop.
    • In short, HBase is a database, and Hive is a SQL engine for batch processing of big data.
    • Other NoSQL databases include MongoDB and Cassandra.
  • 39. Hadoop Components: Pig, Sqoop and Mahout
    • Pig
      • A high-level scripting platform.
      • It lets you write complex data processing operations on Hadoop in the Pig Latin language.
    • Sqoop
      • A data transfer tool designed to move huge volumes of data between Hadoop and relational databases (RDBMS).
    • Mahout
      • A library of scalable machine-learning algorithms, implemented on top of Apache Hadoop using the MapReduce paradigm.
  • 40. Hadoop Components: Flume, Oozie and ZooKeeper
    • Flume
      • A reliable system for collecting large amounts of log data from many different sources in real time.
    • Oozie
      • A workflow scheduler system used to schedule Apache Hadoop jobs. It combines multiple jobs sequentially into one logical unit of work.
    • ZooKeeper
      • A high-performance coordination service for distributed applications. It provides a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services.
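Conceptually, an Oozie workflow is a directed acyclic graph (DAG) of actions in which a job starts only after the jobs it depends on have finished. The sketch below mimics that ordering with Python's standard `graphlib` module (Python 3.9+). It is not Oozie's API (real Oozie workflows are defined in XML), and the job names and the `run` function are hypothetical placeholders.

```python
from graphlib import TopologicalSorter

# job -> set of jobs that must finish first (hypothetical pipeline)
workflow = {
    "sqoop_import": set(),
    "pig_clean": {"sqoop_import"},
    "hive_report": {"pig_clean"},
    "mahout_train": {"pig_clean"},
}


def run(job: str) -> None:
    """Placeholder: a real scheduler would submit a Hadoop job here and wait for it."""
    print(f"running {job}")


# static_order() yields every job after all of its dependencies.
order = list(TopologicalSorter(workflow).static_order())
for job in order:
    run(job)
```

`hive_report` and `mahout_train` both depend only on `pig_clean`, so their relative order is not fixed; a scheduler could run those two in parallel.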
  • 42. Features of Hadoop
    • No expensive hardware is required
    • Supports large clusters of 100 to 1000 nodes
    • More computing power and storage
    • Parallel processing of data
    • Distributed data
    • Data replication
    • Automatic failover management
    • Data locality optimization
    • Supports heterogeneous clusters
    • Scalability