This document provides an overview of big data and big data analytics. It defines big data as large volumes of diverse data that require advanced techniques and technologies to capture, manage, and process within a tolerable time frame. The document outlines the characteristics of big data, including volume, velocity, and variety. It also discusses challenges of big data, examples of big data applications, and different types of analytics including descriptive, predictive, and prescriptive. Recommendation systems are introduced as a type of predictive analytics.
2. Outline
• Introduction to Big Data
• Characteristics of Big Data
▫ Volume
▫ Velocity
▫ Variety
• Challenges of Big Data
• Examples of Big Data
• Definition of Big Data Analytics
• Types of Analytics
• Applications of Big Data Analytics
• Recommendation System
3. Big Data
• Big Data is a huge volume of data that cannot be stored
and processed using the traditional approach within a
given time frame
• Gartner defines big data as follows: “Big data is
high-volume, high-velocity and/or high-variety
information assets that demand cost-effective,
innovative forms of information processing that enable
enhanced insight, decision making, and process
automation”.
• It refers to any dataset which cannot be analyzed using
popular and conventional tools and requires specialized
tools for analysis
• As a rule of thumb, datasets measured in terabytes or
petabytes are considered big data
4. Data
• Information in raw or unorganized form (such
as letters, numbers, or symbols) that refers to,
or represents, conditions, ideas, or objects. Data
is limitless and present everywhere in the
universe
• E.g., student details
• Data holds a lot of valuable information
• Organizations use data to gain insights
5. Characteristics of Big Data
• Volume: the amount of data that is being
generated
• Velocity: the speed at which this data is
generated
• Variety: the different types of data that are
being generated
6. 3V’s of Big Data
• Volume: data quantity
• Velocity: data speed
• Variety: data types
7. Volume: How Huge Does Data Need to Be?
• Data is classified as big when its volume is in
terabytes, petabytes, exabytes, and so on
• Big Data refers to terabytes or petabytes of less-
structured data that require Hadoop and/or
non-relational databases for cost-effective,
efficient processing.
8. Data Measurement
• Bit
A bit is a value of either a 1 or 0 (on or off).
• Nibble
A Nibble is 4 bits.
• Byte
A Byte is 8 bits.
1 character, e.g. "a", is one byte.
• Kilobyte (KB)
A Kilobyte is 1,024 bytes.
2 or 3 paragraphs of text.
• Megabyte (MB)
A Megabyte is 1,048,576 bytes or 1,024 Kilobytes
873 pages of plaintext (1,200 characters)
4 books (200 pages or 240,000 characters)
9. Gigabyte (GB)
• A Gigabyte is 1,073,741,824 (2^30) bytes, 1,024
Megabytes, or 1,048,576 Kilobytes.
▫ 894,784 pages of plaintext (1,200 characters)
▫ 4,473 books (200 pages or 240,000 characters)
▫ 640 web pages (with 1.6MB average file size)
▫ 341 digital pictures (with 3MB average file size)
▫ 256 MP3 audio files (with 4MB average file size)
▫ 1 650MB CD
10. Terabyte (TB)
• A Terabyte is 1,099,511,627,776 (2^40) bytes, 1,024
Gigabytes, or 1,048,576 Megabytes.
▫ 916,259,689 pages of plaintext (1,200 characters)
▫ 4,581,298 books (200 pages or 240,000 characters)
▫ 655,360 web pages (with 1.6MB average file size)
▫ 349,525 digital pictures (with 3MB average file size)
▫ 262,144 MP3 audio files (with 4MB average file size)
▫ 1,613 650MB CDs
▫ 233 4.38GB DVDs
▫ 40 25GB Blu-ray discs
11. Petabyte (PB)
• A Petabyte is 1,125,899,906,842,624 (2^50) bytes, 1,024
Terabytes, 1,048,576 Gigabytes, or 1,073,741,824
Megabytes.
▫ 938,249,922,368 pages of plaintext (1,200 characters)
▫ 4,691,249,611 books (200 pages or 240,000 characters)
▫ 671,088,640 web pages (with 1.6MB average file size)
▫ 357,913,941 digital pictures (with 3MB average file size)
▫ 268,435,456 MP3 audio files (with 4MB average file size)
▫ 1,651,910 650MB CDs
▫ 239,400 4.38GB DVDs
▫ 41,943 25GB Blu-ray discs
12. Exabyte (EB), Zettabyte (ZB), and Yottabyte (YB)
• Exabyte (EB)
▫ An Exabyte is 1,152,921,504,606,846,976 (2^60) bytes, 1,024
Petabytes, 1,048,576 Terabytes, 1,073,741,824 Gigabytes, or
1,099,511,627,776 Megabytes.
• Zettabyte (ZB)
▫ A Zettabyte is 1,180,591,620,717,411,303,424 (2^70) bytes, 1,024
Exabytes, 1,048,576 Petabytes, 1,073,741,824 Terabytes,
1,099,511,627,776 Gigabytes, or 1,125,899,906,842,624
Megabytes.
• Yottabyte (YB)
▫ A Yottabyte is 1,208,925,819,614,629,174,706,176 (2^80) bytes,
1,024 Zettabytes, 1,048,576 Exabytes, 1,073,741,824 Petabytes,
1,099,511,627,776 Terabytes, 1,125,899,906,842,624 Gigabytes,
or 1,152,921,504,606,846,976 Megabytes.
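The unit sizes and the "how many items fit" figures on the preceding slides follow from simple arithmetic. The sketch below reproduces a few of them, assuming binary units (each unit is 1,024 times the previous one) as the slides do. The function names are illustrative only.

```python
# Binary storage units as used on the slides: each unit is 1,024x the previous one.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_size(unit: str) -> int:
    """Return the number of bytes in one unit (1 KB = 2**10 bytes)."""
    return 2 ** (10 * (UNITS.index(unit) + 1))


def how_many_fit(unit: str, item_bytes: int) -> int:
    """Return how many whole items of item_bytes fit in one unit."""
    return unit_size(unit) // item_bytes


for u in UNITS:
    print(f"1 {u} = {unit_size(u):,} bytes")

# Pages of plaintext (1,200 characters each) in a gigabyte.
print(how_many_fit("GB", 1_200))
# Digital pictures (3 MB each) in a terabyte.
print(how_many_fit("TB", 3 * unit_size("MB")))
```

Python integers have arbitrary precision, so even the yottabyte value is computed exactly rather than rounded.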
13. Velocity: Data Generated Every 60 Seconds on the Internet
• 2+ million searches on Google
• 3+ million likes on Facebook
• 250,000 new photos uploaded on Facebook
• 3 million items shared on Facebook
• 56,000 photos uploaded on Instagram
• 430,000 tweets sent on Twitter
• 150+ million emails sent
14. Data Generated in 60 Seconds on the Internet
• 2.7 million video views on YouTube
• 139,000 hours of video watched on YouTube
• 300 hours of video uploaded on YouTube
• 280,000 snaps sent on Snapchat
• 44 million messages processed on WhatsApp
• 486,000 photos shared on WhatsApp
• 70,000 video messages shared on WhatsApp
• 9,800 articles pinned on Pinterest
15. Data Generated in 60 Seconds on the Internet
• 195,000 minutes of audio chat on WeChat
• 21 million messages sent on WeChat
• 100+ new domains registered
• 95,000 app downloads on Android
• 48,000 app downloads on iPhone
• 140+ submissions on Reddit
• 18,000 matches on Tinder
• 972,000 swipes on Tinder
16. Data Generated in 60 Seconds on the Internet
• 69,500 hours of video watched on Netflix
• 26 new reviews posted on Yelp
• 120 new accounts on LinkedIn
• 39,300+ hours of music listened to on Spotify
• 14 new songs added on Spotify
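To get a feel for velocity, the per-minute figures above can be converted into per-second and per-day rates. This is a minimal sketch using three of the figures from the slides. Values given as "2+ million" are treated as lower bounds, and the dictionary and function names are illustrative.

```python
# Per-minute figures taken from the slides ("2+ million" is used as a lower bound).
PER_MINUTE = {
    "Google searches": 2_000_000,
    "tweets": 430_000,
    "WhatsApp messages": 44_000_000,
}

SECONDS_PER_MINUTE = 60
MINUTES_PER_DAY = 60 * 24


def per_second(per_minute: int) -> float:
    """Average events per second for a given per-minute count."""
    return per_minute / SECONDS_PER_MINUTE


def per_day(per_minute: int) -> int:
    """Events per day if the per-minute rate held constant."""
    return per_minute * MINUTES_PER_DAY


for name, count in PER_MINUTE.items():
    print(f"{name}: {per_second(count):,.0f} per second, {per_day(count):,} per day")
```

The per-day values assume the rate stays constant over 24 hours, which is a simplification.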
19. Variety: Types of Data
There are three types of data:
• Structured: data that has a well-defined format
associated with it. E.g., database tables, CSV files,
and spreadsheets (XLS).
• Semi-structured: data that has some organizing
structure but no rigid format. E.g., emails, log
files, and Word documents.
• Unstructured: data that does not have any
format associated with it. E.g., image, audio, and
video files
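The difference between the three types shows up in how a program reads them. The sketch below uses made-up sample data (hypothetical student details, two hypothetical email records, and the 8-byte PNG file signature) and only Python's standard library.

```python
import csv
import io
import json

# Structured: every record follows the same fixed columns (hypothetical student details).
structured_csv = "id,name,marks\n1,Asha,82\n2,Ravi,75\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: fields are labeled, but records may carry different fields.
semi_structured = [
    '{"from": "a@example.com", "subject": "Hello", "attachments": ["notes.txt"]}',
    '{"from": "b@example.com", "subject": "Re: Hello"}',
]
emails = [json.loads(line) for line in semi_structured]

# Unstructured: raw bytes with no fields to query (here, the PNG file signature).
unstructured = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

print(rows[0]["name"], rows[0]["marks"])      # columns can be addressed by name
print([sorted(e.keys()) for e in emails])     # field sets differ per record
print(len(unstructured), "bytes")             # only size and raw content are known
```

Structured data can be queried by column directly. Semi-structured data needs a check for which fields exist. Unstructured data needs specialized processing (for example, image or audio analysis) before anything can be queried.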
21. Challenges of Big Data
• There are two main challenges associated with big
data
▫ How do we store and manage such a huge volume
of data efficiently?
▫ How do we process and extract valuable
information from this huge volume of data within
a given time frame?
• These two challenges led to the development of
Hadoop
22. Hadoop
• Hadoop is an open-source framework that
allows big data to be stored and processed in a
distributed environment across clusters of
computers using simple programming models. It
is designed to scale up from single servers to
thousands of machines, each offering local
computation and storage.
• Developed by Doug Cutting and managed by the
Apache Software Foundation
23. Components of Hadoop
• Hadoop Distributed File System (HDFS) : deals
with storage of big data
• MapReduce: deals with processing of big data
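The MapReduce programming model can be illustrated with the classic word-count example. The sketch below is a single-process imitation in plain Python and does not use the Hadoop API. In a real cluster the map and reduce steps would run in parallel on many machines over data stored in HDFS. All function names here are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across machines, which is what makes the model suitable for large datasets.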
24. Analytics
• Analytics refers to the ability to collect and use
data to generate insights to inform fact-based
decision making
• Analytics allows us to use sophisticated
statistical algorithms and computing power to
explore, analyze, and understand data, generate
insights from it, discover hidden patterns, and
use them to make better decisions.
25. Big Data Analytics
• It refers to the analysis of the huge datasets that
have emerged in recent years and that need to be
stored and analyzed
• When dealing with such huge data, conventional
tools are not enough to analyze and explore it
• To analyze this data, one needs specialized tools
designed to deal with such large amounts of data
• This is how big data analytics has come about
26. 3 Broad Types of Analytics
• On the basis of industry
• On the basis of business function/domain
analytics
• On the basis of insights offered
30. Descriptive analytics
• Descriptive analytics: it uses information from
the past to describe what has happened, which
informs decisions in the present and for the
future.
• It refers to a set of techniques used to describe,
explore, or profile any kind of data
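As a small illustration of describing or profiling data, the sketch below summarizes a made-up week of daily sales figures with Python's standard `statistics` module. The data and variable names are hypothetical.

```python
import statistics

# Hypothetical daily sales for one week (units sold).
daily_sales = [120, 135, 128, 150, 142, 160, 155]

# A simple descriptive profile: what happened, without predicting anything.
summary = {
    "count": len(daily_sales),
    "min": min(daily_sales),
    "max": max(daily_sales),
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "stdev": statistics.stdev(daily_sales),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```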
31. Predictive analytics
• Predictive analytics: it works by identifying
patterns and using statistics to make inferences
• Predictive analysis identifies past data patterns
and provides a list of likely outcomes for a given
situation. By studying recent and historical data,
predictive analysis presents you with a forecast
of what may happen in the future.
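One of the simplest ways to turn a historical pattern into a forecast is to fit a straight line by least squares and extend it. The sketch below does this on made-up monthly sales. It is only one of many predictive techniques, and the data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)

# Forecast for month 6 by extending the fitted trend.
forecast = slope * 6 + intercept
print(f"trend: {slope:.1f} per month, forecast for month 6: {forecast:.1f}")
```

The forecast is only as good as the assumption that the past trend continues, which is why predictive analytics presents likely outcomes rather than certainties.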
32. Prescriptive analytics
• Prescriptive analysis reveals actions that should
be taken and provides recommendations for
next steps, letting you answer your business
questions in a focused manner. It goes beyond
predictive data analytics, since it recommends
multiple courses of action with likely outcomes
for each decision.
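A prescriptive step can be sketched as ranking candidate actions by their expected outcome. The example below assumes that each action already comes with estimated probabilities and payoffs (for instance, from a predictive model). The actions and numbers are invented for illustration.

```python
# Hypothetical actions, each with (probability, payoff) pairs for its possible outcomes.
ACTIONS = {
    "discount": [(0.6, 500), (0.4, -100)],
    "loyalty_program": [(0.8, 300), (0.2, 0)],
    "do_nothing": [(1.0, 100)],
}


def expected_value(outcomes):
    """Probability-weighted average payoff of one action."""
    return sum(p * payoff for p, payoff in outcomes)


def rank_actions(actions):
    """Return action names ordered from best to worst expected payoff."""
    return sorted(actions, key=lambda a: expected_value(actions[a]), reverse=True)


for action in rank_actions(ACTIONS):
    print(f"{action}: expected payoff {expected_value(ACTIONS[action]):.0f}")
```

Returning the full ranking rather than a single answer matches the idea of recommending multiple courses of action together with their likely outcomes.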
34. Job titles on Big Data
• Big Data Architect – Analytics
▫ Focused on creating views on top of structured
and non-structured data and presenting that data
in a portal framework. Will initially focus on data
mining and data visualization using the latest in
open source data mining/data presentation
technology.... In addition, the team will begin to
pull in other sources of data such as BI, user
feedback and social to help us better understand
our customer.
35. Job titles on Big Data
• Big Data Analyst
▫ Help better understand, test and use vast volumes
of data. Support the business through advanced
analysis and design, maintenance, and
implementation of reports and databases. Design
and build scalable infrastructure and platforms to
collect and process very large amounts of
structured, unstructured and real-time data.
Analyze large volumes of data from disparate
types of sources and present findings to senior
management.
36. Job titles on Big Data
• Principal Engineer, Big Data
▫ Skills will be applied to solving problems
impacting millions of customers. Explores large
data volumes using state of the art tools and
techniques to find solutions to practical business
problems.
37. Applications of Big Data Analytics
• Big Data for financial services: Credit card companies,
retail banks, private wealth management advisories,
insurance firms, venture funds, and institutional
investment banks use big data for their financial
services. The common problem among them all is the
massive amount of multi-structured data living in
multiple disparate systems, which big data can help
solve. Big data is used in a number of ways, such as:
• Customer analytics
• Compliance analytics
• Fraud analytics
• Operational analytics
38. Applications of Big Data Analytics
• Big Data in communications: Gaining new subscribers,
retaining customers, and expanding within current subscriber
bases are top priorities for telecommunication service
providers. The solutions to these challenges lie in the ability to
combine and analyze the masses of customer-generated and
machine-generated data that are being created every day.
• Big Data for Retail: Whether brick-and-mortar or an online
e-tailer, the answer to staying in the game and being competitive
is understanding customers better in order to serve them. This
requires the ability to analyze all the disparate data sources
that companies deal with every day, including weblogs,
customer transaction data, social media, store-branded credit
card data, and loyalty program data.
39. Applications of Big Data Analytics
• Healthcare: As cost pressures tighten, the main
challenge for hospitals is to treat as many patients
as they can efficiently while improving the quality
of care. Instrument and machine data is increasingly
being used to track and optimize patient flow,
treatment, and equipment use in hospitals. It is
estimated that a 1% efficiency gain could yield
more than $63 billion in global health care
savings.
40. Applications of Big Data Analytics
• Travel: Data analytics can optimize the buying
experience through analysis of mobile/web log
and social media data. Travel sites can gain
insights into customers’ desires and preferences.
Products can be up-sold by correlating current
sales with subsequent browsing, increasing
browse-to-buy conversions via customized
packages and offers. Personalized travel
recommendations can also be delivered by data
analytics based on social media data.
41. Applications of Big Data Analytics
• Gaming: Data analytics helps in collecting data to
optimize spend within as well as across games.
Game companies gain insight into the likes, dislikes,
and relationships of their users.
• Energy Management: Many firms use data analytics
for energy management, including smart-grid
management, energy optimization, energy distribution,
and building automation in utility companies. The
application here is centered on controlling and
monitoring network devices, dispatching crews, and
managing service outages. Utilities gain the ability to
integrate millions of data points on network
performance, which lets engineers use analytics
to monitor the network.
44. Recommendation systems
• Recommendation systems are software tools or
techniques that suggest items likely to be of use
to a user.
• The suggestions relate to various decision-making
processes, such as ‘what items to buy’,
‘what music to listen to’, ‘what online news to read’,
etc.
45. Where is it used?
• Large e-commerce sites use recommendation
systems to suggest other items a consumer may
want to purchase.
• Online newspapers offer news articles to readers
based on a prediction of reader interests.
• Online retailers offer customers suggestions
about what they might like to buy based on their
past history of purchases and/or product
searches.
47. Content-Based System
• A content-based recommender works with data
that the user provides, either explicitly (rating)
or implicitly (clicking on a link).
• Content-based systems examine properties of
the items recommended. For instance, if a
Netflix user has watched many cowboy movies,
the system recommends a movie classified in the
database as having the “cowboy” genre.
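The cowboy-movie example can be sketched as follows: build a profile of the genres a user has watched, then score unwatched items by how well their genres match that profile. This is a minimal illustration with a made-up catalog, not how any particular service implements it. The titles, genres, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical catalog: each title is described by a set of genre tags.
CATALOG = {
    "Movie A": {"cowboy", "drama"},
    "Movie B": {"cowboy", "action"},
    "Movie C": {"romance", "comedy"},
    "Movie D": {"cowboy"},
    "Movie E": {"sci-fi", "action"},
}


def build_profile(watched):
    """Count how often each genre appears in the user's watch history."""
    profile = Counter()
    for title in watched:
        profile.update(CATALOG[title])
    return profile


def recommend(watched, top_n=2):
    """Rank unwatched titles by the summed profile weight of their genres."""
    profile = build_profile(watched)
    scores = {
        title: sum(profile[genre] for genre in genres)
        for title, genres in CATALOG.items()
        if title not in watched
    }
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked if scores[t] > 0][:top_n]


print(recommend(["Movie A", "Movie B"]))
```

Only the properties of the items and the user's own history are used here. No other users' data is involved, which is the key difference from collaborative filtering.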
51. Collaborative Filtering
• Collaborative filtering is a popular
recommendation algorithm that bases its
predictions and recommendations on the ratings
or behavior of other users in the system.
• Collaborative filtering systems recommend
items based on similarity measures between
users and/or items.
• The items recommended to a user are those
preferred by similar users.
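A user-based variant of collaborative filtering can be sketched in a few lines: measure how similar other users are to the target user, then score the items those users rated, weighted by similarity. The sketch below assumes cosine similarity as the similarity measure (the slides do not name one) and uses made-up ratings. All names are hypothetical.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
RATINGS = {
    "alice": {"item1": 5, "item2": 4, "item3": 1},
    "bob": {"item1": 5, "item2": 5, "item4": 4},
    "carol": {"item1": 1, "item3": 5, "item5": 4},
}


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, top_n=1):
    """Score items the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, their_ratings in RATINGS.items():
        if other == user:
            continue
        similarity = cosine(RATINGS[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in RATINGS[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


print(recommend("alice"))
```

In this data, bob's ratings resemble alice's much more than carol's do, so the item bob liked ranks above the item carol liked.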
52. How a Collaborative Filtering System Works: Explicit Data Collection
• Asking a user to rate an item on a sliding scale.
• Asking a user to rank a collection of items from
favorite to least favorite.
• Asking a user to create a list of items that he/she
likes
53. How a Collaborative Filtering System Works: Implicit Data Collection
• Observing the items that a user views in an
online store.
• Keeping a record of the items that a user
purchases online.
• Obtaining a list of items that a user has listened
to or watched on his/her computer.
57. Advantages of Collaborative Filtering Recommender Systems
• The notable advantage is that Collaborative
Filtering systems can produce personalized
recommendations, because they consider other
people’s experience and recommendations are
based on that experience.
• Another notable advantage is that the CF
recommender systems can suggest serendipitous
items by observing similar-minded people’s
behavior.
58. Hybrid Recommender system
• Hybrid recommendation systems combine
characteristics of both content-based and
collaborative recommender systems.
• Netflix is a good example of the use of hybrid
recommender systems.
• Netflix makes recommendations by comparing
the watching and searching habits of similar
users.
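One simple way to build a hybrid is to blend the scores produced by a content-based recommender and a collaborative one. The sketch below assumes both scores are already normalized to the range 0 to 1 and uses a weighted average. This is only one possible combination strategy and is not a description of how Netflix works. The scores and names are hypothetical.

```python
def hybrid_scores(content, collaborative, weight=0.5):
    """Blend two score dictionaries; weight is the share given to content scores."""
    items = set(content) | set(collaborative)
    return {
        item: weight * content.get(item, 0.0)
        + (1 - weight) * collaborative.get(item, 0.0)
        for item in items
    }


# Hypothetical normalized scores from the two recommenders.
content = {"Movie D": 1.0, "Movie E": 0.5}
collaborative = {"Movie E": 0.9, "Movie F": 0.6}

blended = hybrid_scores(content, collaborative, weight=0.5)
ranking = sorted(blended, key=lambda item: (-blended[item], item))
print(ranking)
```

An item that only one recommender knows about still receives a score, so each approach can cover gaps left by the other.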
60. ADVANTAGES OF RECOMMENDATION SYSTEMS
• Drive Traffic
• Provide Relevant Material
• Engage Customers
• Transform Shoppers to Clients
• Boost Number of Items per Order
• Offer Recommendations and Direction
61. Conclusion
With technology improving and the quantity of data
increasing, we need methods and systems that help people
find the items that interest them with less effort, in less
time, and with greater accuracy. Several approaches can be
used to reach these goals: collaborative filtering (CF),
which suggests items based on the collective rating history
of all users; content-based filtering, which recommends
items according to a user's previous preferences; and
hybrid systems, which combine the two techniques.
These approaches each have advantages and
disadvantages; this work has focused mostly on the
recommendation approaches themselves. Although
recommendation systems help users find what they prefer,
they still need further improvement.