Big Data Analytics
Ms. Humera Shaziya
Department of Informatics
Nizam College
Outline
• Introduction to Big Data
• Characteristics of Big Data
▫ Volume
▫ Velocity
▫ Variety
• Challenges of Big Data
• Examples of Big Data
• Definition of Big Data Analytics
• Types of Analytics
• Applications of Big Data Analytics
• Recommendation System
Big Data
• Big Data is a huge volume of data that cannot be stored
and processed using the traditional approach within a
given time frame
• The definition of Big Data given by Gartner is: “Big data
is high-volume, high-velocity and/or high-variety
information assets that demand cost-effective,
innovative forms of information processing that enable
enhanced insight, decision making, and process
automation.”
• It refers to any dataset that cannot be analyzed using
popular, conventional tools and requires specialized
tools for analysis
• As a rule of thumb, datasets in the terabyte or petabyte
range are generally considered to be big data
Data
• Information in raw or unorganized form (such
as letters, numbers, or symbols) that refers to,
or represents, conditions, ideas, or objects. Data
is limitless and present everywhere in the
universe
• E.g., student details
• Data holds a lot of valuable information
• Organizations use data to gain insights
Characteristics of Big Data
• Volume: the amount of data that is
being generated
• Velocity: the speed at which this data
is generated
• Variety: the different types of data
that are being generated
3V’s of Big Data
• Volume: data quantity
• Velocity: data speed
• Variety: data types
Volume: How big does data need to be?
• Data is typically classified as big when its volume is in
terabytes, petabytes, exabytes, and so on
• Big Data refers to terabytes or petabytes of less-structured
data that require Hadoop and/or
non-relational databases for cost-effective,
efficient processing.
Data Measurement
• Bit
A bit is a value of either a 1 or 0 (on or off).
• Nibble
A Nibble is 4 bits.
• Byte
A Byte is 8 bits.
1 character, e.g. "a", is one byte.
• Kilobyte (KB)
A Kilobyte is 1,024 bytes.
2 or 3 paragraphs of text.
• Megabyte (MB)
A Megabyte is 1,048,576 bytes or 1,024 Kilobytes
873 pages of plaintext (1,200 characters)
4 books (200 pages or 240,000 characters)
Gigabyte (GB)
• A Gigabyte is 1,073,741,824 (2^30) bytes, 1,024
Megabytes, or 1,048,576 Kilobytes.
▫ 894,784 pages of plaintext (1,200 characters)
▫ 4,473 books (200 pages or 240,000 characters)
▫ 640 web pages (with 1.6MB average file size)
▫ 341 digital pictures (with 3MB average file size)
▫ 256 MP3 audio files (with 4MB average file size)
▫ 1 CD (650MB)
Terabyte (TB)
• A Terabyte is 1,099,511,627,776 (2^40) bytes, 1,024
Gigabytes, or 1,048,576 Megabytes.
▫ 916,259,689 pages of plaintext (1,200 characters)
▫ 4,581,298 books (200 pages or 240,000 characters)
▫ 655,360 web pages (with 1.6MB average file size)
▫ 349,525 digital pictures (with 3MB average file size)
▫ 262,144 MP3 audio files (with 4MB average file size)
▫ 1,613 CDs (650MB each)
▫ 233 DVDs (4.38GB each)
▫ 40 Blu-ray discs (25GB each)
Petabyte (PB)
• A Petabyte is 1,125,899,906,842,624 (2^50) bytes, 1,024
Terabytes, 1,048,576 Gigabytes, or 1,073,741,824
Megabytes.
▫ 938,249,922,368 pages of plaintext (1,200 characters)
▫ 4,691,249,611 books (200 pages or 240,000 characters)
▫ 671,088,640 web pages (with 1.6MB average file size)
▫ 357,913,941 digital pictures (with 3MB average file size)
▫ 268,435,456 MP3 audio files (with 4MB average file size)
▫ 1,651,910 CDs (650MB each)
▫ 239,400 DVDs (4.38GB each)
▫ 41,943 Blu-ray discs (25GB each)
Exabyte (EB), Zettabyte (ZB)
and Yottabyte (YB)
• Exabyte (EB)
▫ An Exabyte is 1,152,921,504,606,846,976 (2^60) bytes, 1,024
Petabytes, 1,048,576 Terabytes, 1,073,741,824 Gigabytes, or
1,099,511,627,776 Megabytes.
• Zettabyte (ZB)
▫ A Zettabyte is 1,180,591,620,717,411,303,424 (2^70) bytes, 1,024
Exabytes, 1,048,576 Petabytes, 1,073,741,824 Terabytes,
1,099,511,627,776 Gigabytes, or 1,125,899,906,842,624
Megabytes.
• Yottabyte (YB)
▫ A Yottabyte is 1,208,925,819,614,629,174,706,176 (2^80) bytes,
1,024 Zettabytes, 1,048,576 Exabytes, 1,073,741,824 Petabytes,
1,099,511,627,776 Terabytes, 1,125,899,906,842,624 Gigabytes,
or 1,152,921,504,606,846,976 Megabytes.
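Every figure on the preceding slides follows from powers of 1,024. The short Python sketch below recomputes a few of them. It uses the same binary convention and the slides' item sizes: 1,200 characters per page, 240,000 characters per book, and 4MB per MP3 file. The function names `unit_bytes` and `how_many` are illustrative only.

```python
# Binary (power-of-1,024) units, as used on the slides above.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_bytes(unit: str) -> int:
    """Bytes in one unit: KB = 2**10, MB = 2**20, ..., YB = 2**80."""
    return 2 ** (10 * (UNITS.index(unit) + 1))


def how_many(unit: str, item_bytes: int) -> int:
    """Whole items of size item_bytes that fit in one unit (rounded down)."""
    return unit_bytes(unit) // item_bytes


MB = unit_bytes("MB")
PAGE = 1_200      # one page of plaintext: 1,200 characters at 1 byte each
BOOK = 240_000    # one 200-page book
MP3 = 4 * MB      # average MP3 file size assumed on the slides

for unit in ["GB", "TB", "PB"]:
    print(f"1 {unit} = {unit_bytes(unit):,} bytes: "
          f"{how_many(unit, PAGE):,} pages, "
          f"{how_many(unit, BOOK):,} books, "
          f"{how_many(unit, MP3):,} MP3 files")
```

Integer floor division keeps the results exact. Storage vendors often use the decimal convention instead (1 GB = 10^9 bytes), which produces somewhat smaller counts.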
Velocity: Data generated every 60
seconds on the Internet
• 2+ million searches on Google
• 3+ million likes on Facebook
• 250,000 new photos uploaded to Facebook
• 3 million items shared on Facebook
• 56,000 photos uploaded to Instagram
• 430,000 tweets sent on Twitter
• 150+ million emails sent
• 2.7 million video views on YouTube
• 139,000 hours of video watched on YouTube
• 300 hours of video uploaded to YouTube
• 280,000 snaps sent on Snapchat
• 44 million messages processed on WhatsApp
• 486,000 photos shared on WhatsApp
• 70,000 video messages shared on WhatsApp
• 9,800 articles pinned on Pinterest
• 195,000 minutes of audio chat on WeChat
• 21 million messages sent on WeChat
• 100+ new domains registered
• 95,000 app downloads on Android
• 48,000 app downloads on iPhone
• 140+ submissions on Reddit
• 18,000 matches on Tinder
• 972,000 swipes on Tinder
• 69,500 hours of video watched on Netflix
• 26 new reviews posted on Yelp
• 120 new accounts on LinkedIn
• 39,300+ hours of music streamed on Spotify
• 14 new songs added to Spotify
Infographic covering statistics on what happens
on the internet every 60 seconds
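Velocity is easiest to grasp as a sustained rate. As a rough illustration, the sketch below converts a few of the per-minute figures listed above into per-second and per-day rates. Values marked "2+" or "150+" are taken at face value. The dictionary and function names are hypothetical.

```python
# A few per-minute figures from the list above (approximate infographic values).
PER_MINUTE = {
    "Google searches": 2_000_000,
    "tweets sent": 430_000,
    "WhatsApp messages": 44_000_000,
    "emails sent": 150_000_000,
}

MINUTES_PER_DAY = 60 * 24  # 1,440


def per_second(count_per_minute: int) -> float:
    """Average events per second if the per-minute rate is spread evenly."""
    return count_per_minute / 60


def per_day(count_per_minute: int) -> int:
    """Events per day if the per-minute rate held around the clock."""
    return count_per_minute * MINUTES_PER_DAY


for name, count in PER_MINUTE.items():
    print(f"{name}: about {per_second(count):,.0f} per second, "
          f"{per_day(count):,} per day")
```

These are averages. Real traffic is bursty, so a system that ingests such streams has to handle peaks well above these rates.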
Variety: Types of Data
There are three types:
• Structured: data that has a predefined format (schema)
associated with it. E.g., database tables, CSV files,
and spreadsheets (XLS).
• Semi-structured: data that does not follow a fixed
format but carries some organizing markers (such as the
header fields of an email). E.g., emails, log files, and
Word documents.
• Unstructured: data that does not have any predefined
format associated with it. E.g., image, audio, and
video files
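The difference between the three types becomes concrete when a program reads them. In this minimal Python sketch:

• A CSV string stands in for structured data.
• A JSON array of email-like records stands in for semi-structured data. JSON is our choice of example; it is not listed on the slide.
• A few raw bytes stand in for an image file.

All sample values are made up.

```python
import csv
import io
import json

# Structured: every row has the same fixed columns (a schema known in advance).
csv_text = "student_id,name,marks\n1,Asha,88\n2,Ravi,92\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: records carry their own field names, but fields vary per record.
json_text = (
    '[{"from": "a@example.com", "subject": "Hello"},'
    ' {"from": "b@example.com", "attachments": 2}]'
)
messages = json.loads(json_text)

# Unstructured: raw bytes with no fields to query; meaning must be extracted
# by further processing (e.g. image analysis).
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # the first bytes of a PNG file

print(rows[1]["name"], rows[1]["marks"])      # columns are known in advance
print([sorted(m) for m in messages])          # field names differ per record
print(len(image_bytes), "bytes, no schema")
```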
Challenges of Big Data
• There are two main challenges associated with big
data:
▫ How do we store and manage such huge data
efficiently?
▫ How do we process and extract valuable
information from this huge volume of data within
a given time frame?
• These two challenges led to the development of
Hadoop
Hadoop
• Hadoop is an open-source framework that
allows users to store and process big data in a
distributed environment across clusters of
computers using simple programming models. It
is designed to scale up from single servers to
thousands of machines, each offering local
computation and storage.
• Developed by Doug Cutting and managed by the
Apache Software Foundation
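To store big data across a cluster, Hadoop's storage layer (HDFS) splits each file into large fixed-size blocks. It then keeps several copies of every block on different machines, so losing one machine does not lose data. The sketch below illustrates the idea using the usual defaults: 128 MB blocks (`dfs.blocksize`) and three replicas (`dfs.replication`). The round-robin placement and the node names are simplifications for illustration only. Real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default block size in Hadoop 2.x/3.x
REPLICATION = 3      # HDFS default replication factor


def place_blocks(file_size_mb: int, nodes: list[str]) -> dict[int, list[str]]:
    """Split a file into blocks and assign each block to REPLICATION distinct nodes.

    Simplified round-robin placement for illustration only.
    """
    if len(nodes) < REPLICATION:
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(REPLICATION)]
        for block in range(num_blocks)
    }


layout = place_blocks(1_000, ["node1", "node2", "node3", "node4"])
for block, replicas in layout.items():
    print(f"block {block}: {replicas}")
# The last block holds only the remaining 104 MB; HDFS does not pad it.
print("raw storage used:", 1_000 * REPLICATION, "MB")
```

A 1,000 MB file becomes eight blocks. Any single node can fail without losing a block, at the cost of storing everything three times.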
Components of Hadoop
• Hadoop Distributed File System (HDFS): deals
with storage of big data
• MapReduce: deals with processing of big data
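MapReduce processes data in three stages:

• A map step emits key/value pairs.
• A shuffle step groups the pairs by key.
• A reduce step combines each group.

The classic illustration is counting words. The sketch below runs all three stages in a single Python process purely to show the data flow. It does not use Hadoop's Java API. On a real cluster, the map and reduce tasks would run in parallel on the machines holding the data.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable


def map_phase(line: str) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle and sort: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


lines = ["big data needs big tools", "Hadoop processes big data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```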
Analytics
• Analytics refers to the ability to collect and use
data to generate insights to inform fact-based
decision making
• Analytics uses sophisticated statistical
algorithms and computing power to explore,
analyze, and understand data. This generates
insights and uncovers hidden patterns that can
be used to make better decisions.
Big Data Analytics
• Big data analytics refers to analyzing the huge
datasets generated nowadays, which need to be
stored and analyzed
• When dealing with such huge data, conventional
tools are not enough to analyze and explore it
• To analyze this data, one needs specialized tools
designed to deal with such large
amounts of data
• This is how big data analytics came about
3 Broad Types of Analytics
• On the basis of industry
• On the basis of business function/ domain
analytics
• On the basis of insights offered
Industry Analytics
• Credit cards
• Insurance
• E-Commerce
• Travel
• Retail
• Telecom
• And so on
Business Function/Domain Analytics
• HR analytics
• Finance analytics
• Sales analytics
• Supply chain analytics
• Risk analytics
• And so on
Insights Analytics
• Descriptive analytics
• Predictive analytics
• Prescriptive analytics
Descriptive analytics
• Descriptive analytics uses information from
the past to describe what has happened, which
informs decisions made in the present about the
future.
• It refers to a set of techniques used to describe,
explore, or profile any kind of data
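A minimal example of descriptive analytics summarizes a year of past sales with Python's standard `statistics` module. The sales figures are hypothetical. Every number produced describes what already happened; none of them predicts anything.

```python
import statistics

# Hypothetical monthly sales (units sold, January to December).
monthly_sales = [120, 135, 128, 150, 162, 158, 171, 180, 176, 190, 205, 210]

summary = {
    "count": len(monthly_sales),
    "total": sum(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}
best_month = monthly_sales.index(max(monthly_sales)) + 1  # 1-based month number

for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
print("best month:", best_month)
```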
Predictive analytics
• Predictive analytics: it works by identifying
patterns and using statistics to make inferences
• Predictive analysis identifies past data patterns
and provides a list of likely outcomes for a given
situation. By studying recent and historical data,
predictive analysis presents you with a forecast
of what may happen in the future.
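One of the simplest ways to turn historical data into a forecast is to fit a straight-line trend with ordinary least squares and extend it forward. The sketch below does this in plain Python on hypothetical monthly sales. Real predictive analytics usually relies on richer models and checks forecasts against held-out data, so treat this only as an illustration of the idea.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: month number -> units sold.
months = [1, 2, 3, 4, 5, 6]
sales = [102, 108, 121, 129, 141, 149]

slope, intercept = fit_line(months, sales)
forecast = slope * 7 + intercept  # extend the trend one month ahead
print(f"trend: {slope:.2f} units/month, forecast for month 7: {forecast:.1f}")
```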
Prescriptive analytics
• Prescriptive analysis reveals actions that should
be taken and provides recommendations for
next steps, letting you answer your business
questions in a focused manner. It goes beyond
predictive data analytics, since it recommends
multiple courses of action with likely outcomes
for each decision.
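Prescriptive analytics goes one step beyond prediction. Given the likely outcomes of each possible action, it recommends what to do. A very small version of that idea scores each candidate action by its expected profit and recommends the highest. The actions, probabilities, and profits below are hypothetical. Expected value is also only one possible decision rule; real systems may use optimization or simulation and take risk into account.

```python
# Hypothetical candidate actions, each with predicted outcomes given as
# (probability, profit) pairs, e.g. produced by a predictive model.
ACTIONS = {
    "no promotion": [(1.0, 50_000)],
    "10% discount": [(0.6, 70_000), (0.4, 45_000)],
    "free shipping": [(0.5, 75_000), (0.5, 40_000)],
}


def expected_value(outcomes: list[tuple[float, float]]) -> float:
    """Probability-weighted average profit of one action."""
    return sum(p * value for p, value in outcomes)


def recommend(actions: dict[str, list[tuple[float, float]]]) -> tuple[str, dict[str, float]]:
    """Score every action by expected profit and recommend the best one."""
    scores = {name: expected_value(outcomes) for name, outcomes in actions.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = recommend(ACTIONS)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: expected profit {score:,.0f}")
print("recommended action:", best)
```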
Analytics Tools
• Open source
▫ R
▫ Hadoop with Mahout
▫ Weka
• Commercial
▫ SAS
▫ SPSS
Job titles on Big Data
• Big Data Architect – Analytics
▫ Focused on creating views on top of structured
and non-structured data and presenting that data
in a portal framework. Will initially focus on data
mining and data visualization using the latest in
open source data mining/data presentation
technology.... In addition, the team will begin to
pull in other sources of data such as BI, user
feedback and social to help us better understand
our customer.
Job titles on Big Data
• Big Data Analyst
▫ Help better understand, test and use vast volumes
of data. Support the business through advanced
analysis and design, maintenance, and
implementation of reports and databases. Design
and build scalable infrastructure and platforms to
collect and process very large amounts of
structured, unstructured and real-time data.
Analyze large volumes of data from disparate
types of sources and present findings to senior
management.
Job titles on Big Data
• Principal Engineer, Big Data
▫ Skills will be applied to solving problems
impacting millions of customers. Explores large
data volumes using state of the art tools and
techniques to find solutions to practical business
problems.
Applications of Big Data Analytics
• Big Data for financial services: Credit card companies,
retail banks, private wealth management advisories,
insurance firms, venture funds, and institutional
investment banks use big data for their financial
services. The problem common to all of them is the
massive amount of multi-structured data living in
multiple disparate systems, which big data technologies
can address. Big data is therefore used in a number of
ways, such as:
• Customer analytics
• Compliance analytics
• Fraud analytics
• Operational analytics
Applications of Big Data Analytics
• Big Data in communications: Gaining new subscribers,
retaining customers, and expanding within current subscriber
bases are top priorities for telecommunication service
providers. The solutions to these challenges lie in the ability to
combine and analyze the masses of customer generated data
and machine generated data that is being created every day.
• Big Data for Retail: Whether a brick-and-mortar store or an online e-tailer, the
key to staying in the game and being competitive is
understanding customers better in order to serve them. This
requires the ability to analyze all the disparate data sources
that companies deal with every day, including the weblogs,
customer transaction data, social media, store branded credit
card data, and loyalty program data.
Applications of Big Data Analytics
• Healthcare: As cost pressures tighten, the main
challenge for hospitals is to treat as many patients
as efficiently as they can while improving the
quality of care. Instrument and machine data is
increasingly used to track as well as optimize
patient flow, treatment, and equipment use in
hospitals. It has been estimated that a 1%
efficiency gain could yield more than $63 billion
in global health care savings.
Applications of Big Data Analytics
• Travel: Data analytics can optimize the
buying experience through mobile/web log
and social media data analysis. Travel sites
can gain insights into customers’ desires and
preferences. Products can be up-sold by
correlating current sales with subsequent
browsing, which increases browse-to-buy conversions
via customized packages and offers.
Personalized travel recommendations can also
be delivered by data analytics based on social
media data.
Applications of Big Data Analytics
• Gaming: Data analytics helps in collecting data to
optimize spend within as well as across games.
Game companies gain insight into users’ likes,
dislikes, and relationships.
• Energy Management: Many firms use data analytics
for energy management, including smart-grid
management, energy optimization, energy distribution,
and building automation in utility companies. The
application here is centered on controlling and
monitoring network devices, dispatching crews, and
managing service outages. Utilities gain the ability to
integrate millions of data points on network
performance, which lets engineers use analytics
to monitor the network.
Recommendation
system
Recommendation systems
• Recommendation systems are software tools or
techniques providing suggestions for items to be
of use to a user.
• The suggestions relate to various decision
making processes, such as ‘what items to buy’,
‘what music to listen to’, ‘what online news to read’,
etc.
Where is it used?
• Large e-commerce sites use these systems to
suggest other items a consumer may want to
purchase.
• Offer news articles to online newspaper readers,
based on a prediction of reader interests.
• Offer customers of an online retailer suggestions
about what they might like to buy, based on their
past purchases and/or product
searches.
Types of
Recommendation systems
• Content-Based System
• Collaborative Filtering System
• Hybrid Recommender system
Content-Based System
• A content-based recommender works with data
that the user provides, either explicitly (rating)
or implicitly (clicking on a link).
• Content-based systems examine properties of
the items recommended. For instance, if a
Netflix user has watched many cowboy movies,
then recommend a movie classified in the
database as having the “cowboy” genre.
Example of Content-Based
Recommendation System
The recommendation process is
performed in three steps:
1. Content Analyzer
2. Profile Learner
3. Filtering Component
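The three steps can be sketched in a few lines of Python, reusing the cowboy-movie example from the earlier slide. The catalogue and all function names are hypothetical. In this illustrative version:

• The content analyzer represents each movie as a set of genre keywords.
• The profile learner averages the keywords of the movies the user liked.
• The filtering component ranks unseen movies by cosine similarity to that profile.

Production systems typically use richer item features, such as TF-IDF weights over item descriptions.

```python
import math

# Hypothetical catalogue: each movie described by genre keywords.
CATALOGUE = {
    "Rio Bravo": {"cowboy", "western", "action"},
    "True Grit": {"cowboy", "western", "drama"},
    "Alien": {"sci-fi", "horror"},
    "The Martian": {"sci-fi", "drama"},
    "Unforgiven": {"cowboy", "western", "drama"},
}


def content_analyzer(catalogue: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Step 1: represent each item's content as a keyword -> weight vector."""
    return {title: {kw: 1.0 for kw in kws} for title, kws in catalogue.items()}


def profile_learner(vectors: dict[str, dict[str, float]], liked: list[str]) -> dict[str, float]:
    """Step 2: the user profile is the average vector of the items the user liked."""
    profile: dict[str, float] = {}
    for title in liked:
        for kw, weight in vectors[title].items():
            profile[kw] = profile.get(kw, 0.0) + weight / len(liked)
    return profile


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(w * b.get(kw, 0.0) for kw, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def filtering_component(vectors, profile, liked, k=2):
    """Step 3: rank items the user has not seen by similarity to the profile."""
    scores = {t: cosine(profile, v) for t, v in vectors.items() if t not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]


liked = ["Rio Bravo", "True Grit"]
vectors = content_analyzer(CATALOGUE)
profile = profile_learner(vectors, liked)
recommendations = filtering_component(vectors, profile, liked)
print(recommendations)
```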
Advantages of Content-Based
Recommendation System
• User Independence
• Transparency
• New Item
Collaborative Filtering
• Collaborative filtering is a popular
recommendation algorithm that bases its
predictions and recommendations on the ratings
or behavior of other users in the system.
• Collaborative filtering systems recommend
items based on similarity measures between
users and/or items.
• The items recommended to a user are those
preferred by similar users.
How Collaborative Filtering Systems
Work
• Asking a user to rate an item on a sliding scale.
• Asking a user to rank a collection of items from
favorite to least favorite.
• Asking a user to create a list of items that he/she
likes
• Observing the items that a user views in an
online store.
• Keeping a record of the items that a user
purchases online.
• Obtaining a list of items that a user has listened
to or watched on his/her computer.
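The six ways of gathering preferences listed above (three explicit, three implicit) all end up as entries in a user-item matrix that a collaborative filter can work with. The sketch below merges them into one score per user and item. The scoring rules, and in particular the implicit-signal weights in `IMPLICIT_WEIGHTS`, are hypothetical choices for illustration. Simply adding signals on different scales is crude; real systems normalize the signals or model implicit feedback separately.

```python
from collections import defaultdict

# Hypothetical weights for implicit signals; real systems tune or learn these.
IMPLICIT_WEIGHTS = {"view": 1.0, "listen": 2.0, "purchase": 3.0}


def build_preferences(explicit_ratings, ranked_lists, liked_lists, events):
    """Merge explicit and implicit feedback into one user -> {item: score} matrix."""
    prefs = defaultdict(lambda: defaultdict(float))
    for user, item, rating in explicit_ratings:       # rating on a sliding scale
        prefs[user][item] += rating
    for user, ranking in ranked_lists.items():         # favourite item first
        for position, item in enumerate(ranking):
            prefs[user][item] += len(ranking) - position
    for user, items in liked_lists.items():            # "items I like" list
        for item in items:
            prefs[user][item] += 1.0
    for user, item, action in events:                  # observed behaviour
        prefs[user][item] += IMPLICIT_WEIGHTS[action]
    return {user: dict(items) for user, items in prefs.items()}


prefs = build_preferences(
    explicit_ratings=[("alice", "A", 5)],
    ranked_lists={"bob": ["C", "A", "B"]},
    liked_lists={"alice": ["B"]},
    events=[("alice", "C", "view"), ("alice", "C", "purchase"), ("bob", "D", "listen")],
)
print(prefs)
```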
Collaborative Filtering system
Websites That Use Collaborative Filtering
Systems
• Amazon
• Facebook
• MySpace
• LinkedIn
• Twitter
Advantages of collaborative
Filtering recommender systems
• A notable advantage is that Collaborative
Filtering systems can produce personalized
recommendations, because they consider other
people’s experience and recommendations are
based on that experience.
• Another notable advantage is that the CF
recommender systems can suggest serendipitous
items by observing similar-minded people’s
behavior.
Hybrid Recommender system
• Hybrid recommendation systems combine
characteristics of both content-based and
collaborative recommender systems.
• Netflix is a good example of the use of hybrid
recommender systems.
• Netflix makes recommendations by comparing
the watching and searching habits of similar
users (collaborative filtering), as well as by
offering movies that share characteristics with
films a user has rated highly (content-based
filtering).
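One common way to build a hybrid is a weighted blend: compute a content-based score and a collaborative score for each item, then combine them with a tunable weight. The sketch below assumes both scores are already normalized to the range 0 to 1. The weight `alpha`, the items, and the scores are all hypothetical. This is not a description of how Netflix's system actually works.

```python
def hybrid_scores(
    content_scores: dict[str, float],
    cf_scores: dict[str, float],
    alpha: float = 0.5,
) -> dict[str, float]:
    """Weighted hybrid: alpha * content + (1 - alpha) * collaborative.

    An item scored by only one recommender keeps that recommender's score.
    """
    blended = {}
    for item in content_scores.keys() | cf_scores.keys():
        c = content_scores.get(item)
        f = cf_scores.get(item)
        if c is None:
            blended[item] = f
        elif f is None:
            blended[item] = c
        else:
            blended[item] = alpha * c + (1 - alpha) * f
    return blended


# Hypothetical normalized scores from the two recommenders.
content = {"Unforgiven": 0.91, "The Martian": 0.22, "New Release": 0.80}
collaborative = {"Unforgiven": 0.70, "The Martian": 0.95}

scores = hybrid_scores(content, collaborative, alpha=0.6)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Here "New Release" has no collaborative score because nobody has rated it yet, so it falls back to its content score. This is how a hybrid can inherit the content-based approach's ability to handle new items.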
Advantages of Recommendation
Systems
• Drive Traffic
• Provide Relevant Material
• Engage Customers
• Transform Shoppers to Clients
• Boost Number of Items per Order
• Offer Recommendations and Direction
Conclusion
Accordingly, with today's technological improvements and the
increasing quantity of data, we need methods and systems that
help people find the items that interest them with less effort,
in less time, and with greater accuracy. There are several ways
to reach these goals:
• Collaborative filtering (CF), which suggests items based on
the historical ratings of all users collectively
• Content-based filtering, which recommends items according
to the user's previous preferences
• Hybrid systems, which combine the two techniques above
These approaches each have advantages and disadvantages; this
work has focused mainly on the recommendation approaches
themselves. Although recommendation systems already help users
find their preferences, they still need further improvement.
Thank You

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Data Monetization
Data MonetizationData Monetization
Data Monetization
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data
Big dataBig data
Big data
 
Big Data
Big DataBig Data
Big Data
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Data Monetization
Data MonetizationData Monetization
Data Monetization
 
Big Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation SlideBig Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation Slide
 
Overview of Big data(ppt)
Overview of Big data(ppt)Overview of Big data(ppt)
Overview of Big data(ppt)
 
Data science big data and analytics
Data science big data and analyticsData science big data and analytics
Data science big data and analytics
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management
 
Big data
Big dataBig data
Big data
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Big data
Big dataBig data
Big data
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Determine Your Data Strategy
Determine Your Data StrategyDetermine Your Data Strategy
Determine Your Data Strategy
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 

Similar a Big Data Analytics

Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
ssuser0413ec
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
Manish Chopra
 

Similar a Big Data Analytics (20)

Big data ppt
Big data pptBig data ppt
Big data ppt
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
01-Introduction.pdf
01-Introduction.pdf01-Introduction.pdf
01-Introduction.pdf
 
A picture is worth a thousand words
A picture is worth a thousand wordsA picture is worth a thousand words
A picture is worth a thousand words
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introduction
 
BIg Data Overview
BIg Data OverviewBIg Data Overview
BIg Data Overview
 
Ds01 data science
Ds01   data scienceDs01   data science
Ds01 data science
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
 
Big Data - Module 1
Big Data - Module 1Big Data - Module 1
Big Data - Module 1
 
2. Business Data Analytics and Technology.pptx
2. Business Data Analytics and Technology.pptx2. Business Data Analytics and Technology.pptx
2. Business Data Analytics and Technology.pptx
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
Intro big data analytics
Intro big data analyticsIntro big data analytics
Intro big data analytics
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
ch2 DS.pptx
ch2 DS.pptxch2 DS.pptx
ch2 DS.pptx
 

Último

Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 

Último (20)

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai YoungDubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 

Big Data Analytics

  • 1. Big Data Analytics Ms.Humera Shaziya Department of Informatics Nizam College
  • 2. Outline • Introduction to Big Data • Characteristics of Big Data ▫ Volume ▫ Velocity ▫ Variety • Challenges of Big Data • Examples of Big Data • Definition of Big Data Analytics • Types of Analytics • Applications of Big Data Analytics • Recommendation System
  • 3. Big Data • Big Data is a huge volume of data that cannot be stored and processed using the traditional approach within a given time frame • The definition of Big Data, given by Gartner is, “Big data is high-volume, and high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation”. • It refers to any dataset which cannot be analyzed using popular and conventional tools and requires specialized tools for analysis • Any dataset in terabytes or petabytes is considered to be big data
  • 4. Data • Information in raw or unorganized form (such as alphabets, numbers, or symbols) that refer to, or represent, conditions, ideas, or objects. Data is limitless and present everywhere in the universe • Eg., Student details • Data holds lot of valuable information • Organizations use data to gain insights
  • 5. Characteristics of Big Data • Volume: it refers to the amount of data that is getting generated • Velocity: it refers to the speed at which this data is generated • Variety: it refers to the different types of data that is getting generated
  • 6. 3V’s of Big Data Volume • Data quantity Velocity • Data Speed Variety • Data Types
  • 7. Volume: How huge data needs to be? • To classify data to be big when its volume is in terabytes, petabytes, exabytes and so on • Big Data refers to terabytes or petabytes of less- structured data that require Hadoop and/or non-relational databases for cost-effective, efficient processing.
  • 8. Data Measurement • Bit A bit is a value of either a 1 or 0 (on or off). • Nibble A Nibble is 4 bits. • Byte A Byte is 8 bits. 1 character, e.g. "a", is one byte. • Kilobyte (KB) A Kilobyte is 1,024 bytes. 2 or 3 paragraphs of text. • Megabyte (MB) A Megabyte is 1,048,576 bytes or 1,024 Kilobytes 873 pages of plaintext (1,200 characters) 4 books (200 pages or 240,000 characters)
  • 9. Gigabyte (GB) • A Gigabyte is 1,073,741,824 (230) bytes. 1,024 Megabytes, or 1,048,576 Kilobytes. ▫ 894,784 pages of plaintext (1,200 characters) ▫ 4,473 books (200 pages or 240,000 characters) ▫ 640 web pages (with 1.6MB average file size) ▫ 341 digital pictures (with 3MB average file size) ▫ 256 MP3 audio files (with 4MB average file size) ▫ 1 650MB CD
  • 10. Terabyte (TB) • A Terabyte is 1,099,511,627,776 (240) bytes, 1,024 Gigabytes, or 1,048,576 Megabytes. ▫ 916,259,689 pages of plaintext (1,200 characters) ▫ 4,581,298 books (200 pages or 240,000 characters) ▫ 655,360 web pages (with 1.6MB average file size) ▫ 349,525 digital pictures (with 3MB average file size) ▫ 262,144 MP3 audio files (with 4MB average file size) ▫ 1,613 650MB CD's ▫ 233 4.38GB DVD's ▫ 40 25GB Blu-ray discs
  • 11. Petabyte (PB) • A Petabyte is 1,125,899,906,842,624 (250) bytes, 1,024 Terabytes, 1,048,576 Gigabytes, or 1,073,741,824 Megabytes. ▫ 938,249,922,368 pages of plaintext (1,200 characters) ▫ 4,691,249,611 books (200 pages or 240,000 characters) ▫ 671,088,640 web pages (with 1.6MB average file size) ▫ 357,913,941 digital pictures (with 3MB average file size) ▫ 268,435,456 MP3 audio files (with 4MB average file size) ▫ 1,651,910 650MB CD's ▫ 239,400 4.38GB DVD's ▫ 41,943 25GB Blu-ray discs
  • 12. Exabyte (EB), Zettabyte (ZB) and Yottabyte • Exabyte (EB) ▫ An Exabyte is 1,152,921,504,606,846,976 (260) bytes, 1,024 Petabytes, 1,048,576 Terabytes, 1,073,741,824 Gigabytes, or 1,099,511,627,776 Megabytes. • Zettabyte (ZB) ▫ A Zettabyte is 1,180,591,620,717,411,303,424 (270) bytes, 1,024 Exabytes, 1,048,576 Petabytes, 1,073,741,824 Terabytes, 1,099,511,627,776 Gigabytes, or 1,125,899,910,000,000 Megabytes. • Yottabyte (YB) ▫ A Yottabyte is 1,208,925,819,614,629,174,706,176 (280) bytes, 1,024 Zettabytes, 1,048,576 Exabytes, 1,073,741,824 Petabytes, 1,099,511,627,776 Terabytes, 1,125,899,910,000,000 Gigabytes, or 1,152,921,500,000,000,000 Megabytes.
  • 13. Velocity: Data generated every 60 seconds on the Internet • 2+ million searches on Google • 3+ million likes on Facebook • 250,000 new photos uploaded to Facebook • 3 million items shared on Facebook • 56,000 photos uploaded to Instagram • 430,000 tweets sent on Twitter • 150+ million emails sent
  • 14. Data generated in 60 secs on the Internet • 2.7 million video views on YouTube • 139,000 hours of video watched on YouTube • 300 hours of video uploaded to YouTube • 280,000 snaps sent on Snapchat • 44 million messages processed on WhatsApp • 486,000 photos shared on WhatsApp • 70,000 video messages shared on WhatsApp • 9,800 articles pinned on Pinterest
  • 15. Data generated in 60 secs on the Internet • 195,000 minutes of audio chat on WeChat • 21 million messages sent on WeChat • 100+ new domains registered • 95,000 app downloads on Android • 48,000 app downloads on iPhone • 140+ submissions on Reddit • 18,000 matches on Tinder • 972,000 swipes on Tinder
  • 16. Data generated in 60 secs on the Internet • 69,500 hours of video watched on Netflix • 26 new reviews posted on Yelp • 120 new accounts on LinkedIn • 39,300+ hours of music listened to on Spotify • 14 new songs added to Spotify
  • 17. Infographics covering the latest statistics on things that happen on the Internet every 60 seconds
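To put these per-minute figures in perspective, they can be rescaled to per-second and per-day rates. The sketch below does only that arithmetic; the four sample figures are copied from the slides, and the names `PER_MINUTE` and `rates` are illustrative.

```python
# A few per-minute figures taken from the slides above.
PER_MINUTE = {
    "Google searches": 2_000_000,
    "tweets sent": 430_000,
    "WhatsApp messages processed": 44_000_000,
    "hours of video uploaded to YouTube": 300,
}


def rates(per_minute: float) -> tuple[float, float]:
    """Return (per second, per day) for a rate given per minute."""
    return per_minute / 60, per_minute * 60 * 24


for event, count in PER_MINUTE.items():
    per_second, per_day = rates(count)
    print(f"{event}: ~{per_second:,.0f} per second, ~{per_day:,.0f} per day")
```

For example, 2 million Google searches a minute works out to roughly 33,000 searches every second and about 2.9 billion a day.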
  • 19. Variety: Types of Data There are three types of data: • Structured: data that follows a fixed, predefined format (schema). E.g., database tables, CSV files, and spreadsheets (XLS). • Semi-Structured: data that has some organizational markers, such as tags or labelled fields, but no rigid, fixed format. E.g., emails, log files, Word documents. • Unstructured: data that has no predefined format at all. E.g., image, audio, and video files.
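The difference between the three types shows up as soon as you try to read the data. In the minimal sketch below, with made-up sample values, the structured CSV can be queried by column name, the semi-structured email exposes labelled header fields but a free-text body, and the unstructured bytes (a stand-in for an image file) offer no fields at all; extracting meaning from them would need specialized processing such as image recognition.

```python
import csv
import io
from email import message_from_string

# Structured: every row follows the same columns (hypothetical student details).
csv_text = "id,name,marks\n1,Asha,82\n2,Ravi,74\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: labelled header fields, but the body is free text and
# different emails may carry different headers.
raw_email = (
    "From: asha@example.com\n"
    "Subject: Assignment\n"
    "\n"
    "Please find my assignment attached.\n"
)
msg = message_from_string(raw_email)

# Unstructured: raw bytes with no fields to query; this PNG signature plus two
# zero bytes is only a stand-in for real image data.
image_bytes = b"\x89PNG\r\n\x1a\n\x00\x00"

print(rows[0]["name"], rows[1]["marks"])           # query by column name
print(msg["Subject"], "|", msg.get_payload().strip())
print(len(image_bytes), "bytes, no schema to query")
```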
  • 21. Challenges of Big Data • There are two main challenges associated with big data ▫ How do we store and manage such huge volumes of data efficiently? ▫ How do we process this huge volume of data and extract valuable information from it within a given time frame? • These two challenges led to the development of Hadoop
  • 22. Hadoop • Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. • Developed by Doug Cutting and managed by the Apache Software Foundation
  • 23. Components of Hadoop • Hadoop Distributed File System (HDFS): deals with storage of big data • MapReduce: deals with processing of big data
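MapReduce splits processing into a map step that emits key/value pairs, a shuffle step that groups the pairs by key, and a reduce step that combines each group. The classic word-count example can be sketched in plain Python as below. This is a single-process simulation for illustration only: it does not use Hadoop's actual Java API, and in a real cluster the mappers and reducers run in parallel, typically close to where HDFS stores the input blocks.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group all values emitted for the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all counts for one word."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


# Each string stands in for an input split that HDFS would store on some node.
splits = ["big data needs big storage", "hadoop stores big data"]
print(word_count(splits))
```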
  • 24. Analytics • Analytics refers to the ability to collect and use data to generate insights that inform fact-based decision making • Analytics uses sophisticated statistical algorithms and computing power to explore, analyze, and understand data, to discover hidden patterns in it, and to turn those insights into better decisions.
  • 25. Big Data Analytics • Big data analytics refers to analyzing the huge datasets generated nowadays, which need to be stored and analyzed • Conventional tools are not enough to explore and analyze data at this scale • Analyzing such data requires specialized tools designed to handle very large amounts of data • This need for specialized tools is how big data analytics came about
  • 26. 3 Broad Types of Analytics • On the basis of industry • On the basis of business function/domain • On the basis of insights offered
  • 27. Industry Analytics • Credit cards • Insurance • E-Commerce • Travel • Retail • Telecom • And so on…
  • 28. Business Function/Domain Analytics • HR analytics • Finance analytics • Sales analytics • Supply chain analytics • Risk analytics • And so on…
  • 29. Insights Analytics • Descriptive analytics • Predictive analytics • Prescriptive analytics
  • 30. Descriptive analytics • Descriptive analytics summarizes information from the past to describe what has happened, so that decisions made in the present are better informed. • It refers to a set of techniques used to describe, explore, or profile any kind of data
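Profiling data with summary statistics is a typical descriptive technique. Below is a minimal sketch using Python's standard `statistics` module on a hypothetical series of monthly sales figures; `describe` is an illustrative helper name.

```python
import statistics

# Hypothetical units sold per month, months 1..8.
monthly_sales = [120, 135, 128, 150, 162, 158, 171, 180]


def describe(values: list[float]) -> dict[str, float]:
    """Profile a numeric series: what has happened so far."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


print(describe(monthly_sales))
```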
  • 31. Predictive analytics • Predictive analytics: it works by identifying patterns and using statistics to make inferences • Predictive analysis identifies past data patterns and provides a list of likely outcomes for a given situation. By studying recent and historical data, predictive analysis presents you with a forecast of what may happen in the future.
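As a simple illustration of using historical patterns to forecast, the sketch below fits a straight-line trend by ordinary least squares to hypothetical monthly sales and projects month 9. A linear trend is only one of many predictive models, and real work typically uses richer models validated on held-out data; `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


months = list(range(1, 9))                        # months 1..8 (hypothetical)
sales = [120, 135, 128, 150, 162, 158, 171, 180]
slope, intercept = fit_line(months, sales)
forecast = slope * 9 + intercept                  # projected sales for month 9
print(f"trend: {slope:.2f} units/month, month 9 forecast: {forecast:.1f}")
```

Here the fitted slope is 351/42, about 8.36 units per month, giving a forecast of about 188 units for month 9.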
  • 32. Prescriptive analytics • Prescriptive analysis reveals actions that should be taken and provides recommendations for next steps, letting you answer your business questions in a focused manner. It goes beyond predictive data analytics, since it recommends multiple courses of action with likely outcomes for each decision.
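Prescriptive analytics takes the predicted outcomes of each possible action and recommends what to do. One simple decision rule, sketched below, is to rank actions by expected value. The actions, probabilities and profits are entirely hypothetical, and real prescriptive systems often rely on optimization or simulation rather than a single expected-value comparison.

```python
# Hypothetical actions, each with predicted outcomes as (probability, profit).
ACTIONS = {
    "no promotion": [(0.6, 1000), (0.4, 800)],
    "10% discount": [(0.7, 1300), (0.3, 700)],
    "free shipping": [(0.5, 1500), (0.5, 600)],
}


def expected_value(outcomes: list[tuple[float, float]]) -> float:
    """Probability-weighted average of the outcome values."""
    return sum(p * value for p, value in outcomes)


def recommend(actions: dict[str, list[tuple[float, float]]]) -> list[tuple[str, float]]:
    """Rank actions by expected profit, best first."""
    ranked = [(name, expected_value(outcomes)) for name, outcomes in actions.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


for action, value in recommend(ACTIONS):
    print(f"{action}: expected profit {value:,.0f}")
```

With these numbers the expected profits are 1,120 for the discount, 1,050 for free shipping and 920 for no promotion, so the discount is recommended.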
  • 33. Analytics Tools • Open source ▫ R ▫ Hadoop with Mahout ▫ Weka • Commercial ▫ SAS ▫ SPSS
  • 34. Job titles on Big Data • Big Data Architect – Analytics ▫ Focused on creating views on top of structured and non-structured data and presenting that data in a portal framework. Will initially focus on data mining and data visualization using the latest in open source data mining/data presentation technology.... In addition, the team will begin to pull in other sources of data such as BI, user feedback and social to help us better understand our customer.
  • 35. Job titles on Big Data • Big Data Analyst ▫ Help better understand, test and use vast volumes of data. Support the business through advanced analysis and design, maintenance, and implementation of reports and databases. Design and build scalable infrastructure and platforms to collect and process very large amounts of structured, unstructured and real-time data. Analyze large volumes of data from disparate types of sources and present findings to senior management.
  • 36. Job titles on Big Data • Principal Engineer, Big Data ▫ Skills will be applied to solving problems impacting millions of customers. Explores large data volumes using state-of-the-art tools and techniques to find solutions to practical business problems.
  • 37. Applications of Big Data Analytics • Big Data for financial services: Credit card companies, retail banks, private wealth management advisories, insurance firms, venture funds, and institutional investment banks use big data for their financial services. The problem common to all of them is massive amounts of multi-structured data living in multiple disparate systems, which big data tools can address. Big data is therefore used in a number of ways, such as: • Customer analytics • Compliance analytics • Fraud analytics • Operational analytics
  • 38. Applications of Big Data Analytics • Big Data in communications: Gaining new subscribers, retaining customers, and expanding within current subscriber bases are top priorities for telecommunication service providers. The solutions to these challenges lie in the ability to combine and analyze the masses of customer-generated and machine-generated data being created every day. • Big Data for Retail: Whether a business is brick-and-mortar or an online e-tailer, the key to staying in the game and remaining competitive is understanding customers better in order to serve them. This requires the ability to analyze all the disparate data sources that companies deal with every day, including weblogs, customer transaction data, social media, store-branded credit card data, and loyalty program data.
  • 39. Applications of Big Data Analytics • Healthcare: As cost pressures tighten, the main challenge for hospitals is to treat as many patients as efficiently as they can while improving the quality of care. Instrument and machine data are increasingly used to track and optimize patient flow, treatment, and equipment use in hospitals. It is estimated that a 1% efficiency gain could yield more than $63 billion in global health care savings.
  • 40. Applications of Big Data Analytics • Travel: Data analytics can optimize the buying experience through analysis of mobile/web logs and social media data. Travel sites can gain insights into customers' desires and preferences. Products can be up-sold by correlating current sales with subsequent browsing, increasing browse-to-buy conversions via customized packages and offers. Data analytics can also deliver personalized travel recommendations based on social media data.
  • 41. Applications of Big Data Analytics • Gaming: Data analytics helps collect data to optimize spending within as well as across games. Game companies gain insight into users' likes, dislikes, and relationships. • Energy Management: Most firms use data analytics for energy management, including smart-grid management, energy optimization, energy distribution, and building automation in utility companies. The application here centers on controlling and monitoring network devices, dispatching crews, and managing service outages. Utilities can integrate millions of data points on network performance, letting engineers use analytics to monitor the network.
  • 44. Recommendation systems • Recommendation systems are software tools and techniques that provide suggestions for items likely to be of use to a user. • The suggestions relate to various decision-making processes, such as ‘what items to buy’, ‘what music to listen to’, ‘what online news to read’, etc.
  • 45. Where is it used? • Large e-commerce sites use recommendation systems to suggest other items a consumer may want to purchase. • Online newspapers offer readers news articles based on a prediction of their interests. • Online retailers offer customers suggestions about what they might like to buy, based on their past purchases and/or product searches.
  • 46. Types of Recommendation systems • Content-Based System • Collaborative Filtering System • Hybrid Recommender system
  • 47. Content-Based System • A content-based recommender works with data that the user provides, either explicitly (e.g., ratings) or implicitly (e.g., clicking on a link). • Content-based systems examine properties of the items recommended. For instance, if a Netflix user has watched many cowboy movies, the system recommends a movie classified in the database as having the “cowboy” genre.
  • 49. The recommendation process is performed in three steps: 1. Content Analyzer 2. Profile Learner 3. Filtering Component
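The three steps can be illustrated with the cowboy-movie example. This sketch assumes a tiny hypothetical catalogue described only by genre tags: the content analyzer turns each movie into a binary feature vector, the profile learner averages the vectors of movies the user watched, and the filtering component ranks unwatched movies by cosine similarity to that profile. The function names simply mirror the slide's step names; they are not taken from any library.

```python
import math

# Hypothetical catalogue: each movie is described only by genre tags.
MOVIES = {
    "Red River": {"cowboy", "drama"},
    "Rio Bravo": {"cowboy", "action"},
    "Unforgiven": {"cowboy", "drama"},
    "Alien": {"sci-fi", "horror"},
    "The Martian": {"sci-fi", "drama"},
}
FEATURES = sorted(set().union(*MOVIES.values()))


def content_analyzer(tags: set[str]) -> list[float]:
    """Step 1: represent an item as a binary vector over FEATURES."""
    return [1.0 if feature in tags else 0.0 for feature in FEATURES]


def profile_learner(watched: list[str]) -> list[float]:
    """Step 2: the user profile is the average vector of watched items."""
    vectors = [content_analyzer(MOVIES[m]) for m in watched]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtering_component(profile: list[float], watched: list[str], top_n: int = 2) -> list[str]:
    """Step 3: rank unwatched items by similarity to the profile."""
    scores = {m: cosine(profile, content_analyzer(tags))
              for m, tags in MOVIES.items() if m not in watched}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


watched = ["Red River", "Rio Bravo"]
print(filtering_component(profile_learner(watched), watched))
```

A user who watched Red River and Rio Bravo gets Unforgiven first, because it shares the “cowboy” genre, followed by The Martian, which only shares “drama”.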
  • 50. Advantages of Content-Based Recommendation System • User Independence • Transparency • New Item
  • 51. Collaborative Filtering • Collaborative filtering is a popular recommendation algorithm that bases its predictions and recommendations on the ratings or behavior of other users in the system. • Collaborative filtering systems recommend items based on similarity measures between users and/or items. • The items recommended to a user are those preferred by similar users.
  • 52. How a Collaborative Filtering System Works (explicit data collection) • Asking a user to rate an item on a sliding scale. • Asking a user to rank a collection of items from favorite to least favorite. • Asking a user to create a list of items that he/she likes.
  • 53. How a Collaborative Filtering System Works (implicit data collection) • Observing the items that a user views in an online store. • Keeping a record of the items that a user purchases online. • Obtaining a list of items that a user has listened to or watched on his/her computer.
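A minimal user-based collaborative filtering sketch: explicit ratings (as on slide 52) are stored per user, the similarity of two users is the cosine similarity of their ratings on co-rated items, and unseen items are scored by the similarity-weighted ratings of the k most similar users. The ratings, user names and helper functions are hypothetical, and this is only one common variant; item-based CF, Pearson correlation and matrix factorization are frequent alternatives. Implicit data (slide 53), such as purchases, could be encoded the same way, for instance as 1 for "bought".

```python
import math
from collections import defaultdict

# Hypothetical explicit ratings on a 1-5 scale; a missing key means "not rated".
RATINGS = {
    "asha": {"Book A": 5, "Book B": 4, "Book C": 1},
    "ravi": {"Book A": 4, "Book B": 5, "Book D": 4},
    "meena": {"Book A": 1, "Book C": 5, "Book E": 5},
}


def similarity(u: str, v: str) -> float:
    """Cosine similarity of two users' ratings on the items both have rated."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    if not common:
        return 0.0
    dot = sum(RATINGS[u][i] * RATINGS[v][i] for i in common)
    norm_u = math.sqrt(sum(RATINGS[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(RATINGS[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def nearest_users(user: str, k: int) -> list[str]:
    """The k users most similar to `user`."""
    others = [o for o in RATINGS if o != user]
    return sorted(others, key=lambda o: similarity(user, o), reverse=True)[:k]


def recommend(user: str, k: int = 2) -> list[str]:
    """Rank unseen items by the similarity-weighted ratings of the k nearest users."""
    scores: dict[str, float] = defaultdict(float)
    for other in nearest_users(user, k):
        weight = similarity(user, other)
        for item, rating in RATINGS[other].items():
            if item not in RATINGS[user]:
                scores[item] += weight * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("asha"))
```

Asha rates Books A and B much like Ravi does (similarity 40/41, about 0.98) and disagrees with Meena (10/26, about 0.38), so Ravi's Book D is ranked ahead of Meena's Book E.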
  • 55. Websites That Use Collaborative Filtering • Amazon • Facebook • MySpace • LinkedIn • Twitter
  • 57. Advantages of Collaborative Filtering Recommender Systems • A notable advantage is that collaborative filtering (CF) systems can produce personalized recommendations, because they take other people's experience into account and base their recommendations on it. • Another notable advantage is that CF recommender systems can suggest serendipitous items by observing the behavior of like-minded people.
  • 58. Hybrid Recommender system • Hybrid recommender systems combine characteristics of both content-based and collaborative filtering recommender systems. • Netflix is a good example of the use of hybrid recommender systems. • Netflix makes recommendations by comparing the watching and searching habits of similar users (collaborative filtering) and by offering movies that share characteristics with films a user has rated highly (content-based filtering).
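One simple way to build a hybrid is a weighted blend: normalize the scores from a content-based and a collaborative recommender to a common range and combine them with a weight `alpha`. The sketch below assumes two hypothetical score dictionaries for one user; it illustrates the weighted-hybrid idea only and does not describe how Netflix actually combines its signals. Other hybridization strategies include switching between recommenders or feeding one recommender's output into the other.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale an {item: score} dict to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def hybrid_rank(content_scores: dict[str, float],
                cf_scores: dict[str, float],
                alpha: float = 0.5) -> list[str]:
    """Weighted hybrid: alpha * content score + (1 - alpha) * CF score.
    An item missing from one recommender gets 0 from that recommender."""
    c, f = normalize(content_scores), normalize(cf_scores)
    items = c.keys() | f.keys()
    blended = {i: alpha * c.get(i, 0.0) + (1 - alpha) * f.get(i, 0.0) for i in items}
    return sorted(blended, key=blended.get, reverse=True)


# Hypothetical scores for the same user from two separate recommenders.
content = {"Unforgiven": 0.87, "The Martian": 0.29, "Alien": 0.0}
collab = {"The Martian": 4.5, "Alien": 3.0, "Heat": 4.0}
print(hybrid_rank(content, collab))
```

With `alpha=0.5`, The Martian, which scores reasonably on both, overtakes Unforgiven, which only the content-based recommender favors.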
  • 60. ADVANTAGES OF RECOMMENDATION SYSTEM • Drive Traffic • Provide Relevant Material • Engage Customers • Transform Shoppers to Clients • Boost Number of Items per Order • Offer Recommendations and Direction
  • 61. Conclusion With improving technology and an ever-increasing quantity of data, people need methods and systems that help them find items matching their interests with less effort, in less time, and with greater accuracy. Several approaches can be used to reach these goals: collaborative filtering (CF), which suggests items based on the past ratings of all users collectively; content-based filtering, which recommends items based on a user's own previous preferences; and hybrid systems, which combine these two techniques. Each approach has its own advantages and disadvantages; this presentation has focused mainly on the recommendation approaches themselves. Although recommendation systems already help users find their preferences considerably, they still need further improvement.