2. CONTENTS
AN INTRODUCTION TO THE WORLD OF DATA
WHAT IS BIG DATA?
WHO CREATED BIG DATA?
WHEN WAS IT CREATED?
WHY BIG DATA?
WHERE ARE WE USING IT?
HOW TO USE IT?
THE 5 W’S AND 1 H OF BIG DATA
3. WHAT IS DATA?
Facts and statistics collected together for reference or analysis.
OR
The quantities, characters, or symbols on which operations
are performed by a computer, which may be stored and
transmitted in the form of electrical signals and recorded
on magnetic, optical, or mechanical recording media.
4. Types of data
Traditional (can be handled by an RDBMS)
Document
Finance
Stock record
Personal files
Modern (difficult to handle with an RDBMS)
Photographs
Audio & Video
3D Model
Simulation
Location Data
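As a minimal sketch of this split, assuming Python's built-in sqlite3 module and a hypothetical stock_record table (the table names, columns, and sample rows are illustrative, not from the slides): a traditional stock record fits typed columns that SQL can filter and aggregate, while a photograph can only be stored as an opaque BLOB whose content SQL cannot query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Traditional data: a stock record maps cleanly onto typed columns.
conn.execute(
    "CREATE TABLE stock_record (item TEXT, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO stock_record VALUES (?, ?, ?)",
    [("pen", 120, 0.5), ("notebook", 40, 2.0), ("stapler", 5, 7.5)],
)

# SQL can answer questions about the content directly.
total_value = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM stock_record"
).fetchone()[0]

# Modern data: a photograph is just bytes to the database.
conn.execute("CREATE TABLE photo (name TEXT, pixels BLOB)")
fake_photo = bytes(range(256)) * 4  # stand-in for real image bytes
conn.execute("INSERT INTO photo VALUES (?, ?)", ("holiday.jpg", fake_photo))

# The RDBMS can report the size of the blob, but not what the picture shows.
photo_size = conn.execute("SELECT LENGTH(pixels) FROM photo").fetchone()[0]

print(total_value)
print(photo_size)
```

Answering a question such as "which photos show a beach?" would need image analysis outside the database, which is the kind of gap big data tooling is meant to fill.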
5. What is BIG DATA?
Big data generally refers to data sets that traditional
software tools cannot capture, curate, manage, and process
within a tolerable elapsed time.
The size threshold is a constantly moving target, ranging
from a few terabytes to many petabytes of data, so big data
can also be described as a set of techniques required to
uncover large hidden values from data sets that are
diverse, complex, and of a massive scale.
6. Who created BIG DATA?
In his 2001 paper 3D Data Management: Controlling
Data Volume, Velocity and Variety, Doug Laney, an analyst at META Group
(later part of Gartner), defines three of what will come to be the
commonly accepted characteristics of Big Data.
In 2005, commentators announce that we are witnessing the
birth of “Web 2.0”, the user-generated web where the majority of
content is provided by users of services rather than by the service
providers themselves. The same year also sees the emergence of
Hadoop, an open source platform used to store and process big data.
7. Contd...
In 2008, the world’s servers process 9.57 zettabytes (9.57 trillion
gigabytes) of information, equivalent to 12 gigabytes of information
per person per day, according to the How Much Information? 2010
report. In International Production and Dissemination of Information,
it is estimated that 14.7 exabytes of new information are produced this
year.
In 2010, Eric Schmidt, then CEO of Google, tells a conference
that as much data is now being created every two days as was created
from the beginning of human civilization to the year 2003.
In 2014, mobile machines are on the rise: for the first time, more
people are using mobile devices than office or home computers to
access digital data. 88% of business executives surveyed by GE working
with Accenture report that big data analytics is a top priority for their
business.
8. Contd...
What this teaches us is that Big Data is not a new or
isolated phenomenon, but one that is part of a long
evolution of capturing and using data. Like other key
developments in data storage, data processing and the
Internet, Big Data is just a further step that will bring
change to the way we run business and society. At the same
time it will lay the foundations on which many evolutions
will be built.
9. The 3 V’s of big data
Volume
(amount of data)
Velocity
(speed of processing)
Variety
(range and source)
10. Volume
As the number of users increases day by day, the amount of
data they use increases along with it.
Organisation    Data processed (per day)
eBay            100 PB
Google          100 PB
Baidu           10-100 PB
NSA             29 PB
Spotify         600 PB
Facebook        100 PB
Twitter         64 PB
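To give a feel for these volumes, the sketch below converts a daily figure into a sustained rate. It assumes decimal units (1 PB = 10^15 bytes) and an even spread of the load over 24 hours; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PETABYTE = 10**15  # decimal petabyte, an assumption
GIGABYTE = 10**9


def sustained_rate_gb_per_s(petabytes_per_day: float) -> float:
    """Average rate in GB/s needed to process a daily volume given in PB."""
    return petabytes_per_day * PETABYTE / SECONDS_PER_DAY / GIGABYTE


rate = sustained_rate_gb_per_s(100)
print(f"100 PB/day is about {rate:.0f} GB/s")
```

Under these assumptions, 100 PB per day works out to more than a terabyte every second, sustained around the clock.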
11. Contd...
If these amounts of data are analysed, it becomes easier for
companies to understand their customers. However,
traditional data processing systems are not able to process
data at this scale, so we need a more capable data
processing concept, and that concept is BIG DATA.
12. Velocity
The amount of data uploaded or downloaded by
the users of some organisations exceeds the capacity
of their IT systems.
The data produced in the last 5
years makes up 90% of all the data produced
in the last 20 years.
At this speed, data processing can’t be done using
traditional RDBMS concepts.
13. Variety
Previously we dealt with only a few varieties of data, such as
documents, finance data, stock records, and personal files.
But nowadays we have to deal with many kinds of data,
such as videos, music, photographs, simulations, and
3D models.
15. Contd...
Analysis type — Whether the data is analyzed in real time or batched
for later analysis. Give careful consideration to choosing the analysis
type, since it affects several other decisions about products, tools,
hardware, data sources, and expected data frequency. A mix of both
types may be required by the use case:
Fraud detection: analysis must be done in real time or near real time.
Trend analysis for strategic business decisions: analysis can be in batch
mode.
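The difference between the two analysis types can be sketched in a few lines. This is a minimal illustration, not a production design: the Transaction type, the FRAUD_LIMIT threshold, and both function names are hypothetical. The real-time path inspects each event as it arrives, while the batch path works on a stored data set and aggregates it afterwards.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator

FRAUD_LIMIT = 5_000.0  # hypothetical threshold for a suspicious amount


@dataclass
class Transaction:
    account: str
    day: str
    amount: float


def flag_in_real_time(stream: Iterable[Transaction]) -> Iterator[Transaction]:
    """Real-time style: decide on each event as soon as it arrives."""
    for tx in stream:
        if tx.amount > FRAUD_LIMIT:
            yield tx


def daily_totals(batch: list[Transaction]) -> dict[str, float]:
    """Batch style: aggregate a stored data set for later trend analysis."""
    totals: dict[str, float] = defaultdict(float)
    for tx in batch:
        totals[tx.day] += tx.amount
    return dict(totals)


transactions = [
    Transaction("A", "2014-01-01", 120.0),
    Transaction("B", "2014-01-01", 9_800.0),
    Transaction("A", "2014-01-02", 40.0),
]

flagged = list(flag_in_real_time(transactions))
totals = daily_totals(transactions)
print([tx.account for tx in flagged])
print(totals)
```

A use case that needs both, such as a bank that blocks suspicious payments and also reports monthly trends, would run both paths over the same data.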
Processing methodology — The type of technique to be applied for
processing data (e.g., predictive, analytical, ad-hoc query, and
reporting). Business requirements determine the appropriate
processing methodology. A combination of techniques can be used.
The choice of processing methodology helps identify the appropriate
tools and techniques to be used in your big data solution.
16. Contd...
Content format — Format of incoming data — structured (RDBMS, for
example), unstructured (audio, video, and images, for example), or
semi-structured. Format determines how the incoming data needs to
be processed and is key to choosing tools and techniques and defining
a solution from a business perspective.
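As a rough sketch of why format drives the choice of tools, the snippet below reads one structured, one semi-structured, and one unstructured input using only Python's standard library. The sample payloads and field names are invented for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, as exported from a relational table.
structured = "id,name,balance\n1,Asha,250.0\n2,Ravi,90.5\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, but fields can vary per record.
semi_structured = '[{"id": 1, "tags": ["vip"]}, {"id": 2}]'
records = json.loads(semi_structured)
tag_counts = [len(r.get("tags", [])) for r in records]

# Unstructured: raw bytes with no schema; only generic facts such as
# the size are available without specialised decoding.
unstructured = b"\x89PNG\r\n\x1a\n" + bytes(32)  # stand-in for image bytes
size_in_bytes = len(unstructured)

print(rows[0]["name"], tag_counts, size_in_bytes)
```

Each format needs a different reader, and the unstructured input would need a further decoding step before any analysis of its content is possible.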
Data type — Type of data to be processed — transactional, historical,
master data, and others. Knowing the data type helps segregate the
data in storage.
Data frequency and size — How much data is expected and at what
frequency it arrives. Knowing frequency and size helps determine
the storage mechanism, storage format, and the necessary pre-
processing tools. Data frequency and size depend on data sources:
On demand, as with social media data
Continuous feed, real-time (weather data, transactional data)
Time series (time-based data)
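A back-of-the-envelope estimate shows how frequency and size feed into storage planning. The event rate, event size, and retention period below are made-up inputs, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TERABYTE = 10**12  # decimal terabyte, an assumption


def storage_needed_tb(events_per_second: float,
                      bytes_per_event: int,
                      retention_days: int) -> float:
    """Raw storage in TB for a continuous feed kept for retention_days."""
    bytes_per_day = events_per_second * bytes_per_event * SECONDS_PER_DAY
    return bytes_per_day * retention_days / TERABYTE


# Hypothetical continuous feed: 2,000 events/s of 500 bytes, kept 30 days.
estimate = storage_needed_tb(2_000, 500, 30)
print(f"{estimate:.2f} TB")
```

The figure covers raw data only; replication, indexes, and compression would change the real requirement.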
17. Contd...
Data source — Sources of data (where the data is generated) — web
and social media, machine-generated, human-generated, etc.
Identifying all the data sources helps determine the scope from a
business perspective. The figure shows the most widely used data
sources.
Data consumers — A list of all of the possible consumers of the
processed data: business processes, business users, enterprise
applications, individual people in various business roles, parts of the
process flows, and other data repositories or enterprise applications.
Hardware — The type of hardware on which the big data solution will
be implemented — commodity hardware or state of the art.
Understanding the limitations of hardware helps inform the choice of
big data solution.
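The classification dimensions from these slides can be collected into a single record when profiling a workload. The sketch below is one possible shape, assuming Python dataclasses and enums; the class names, field names, default values, and the fraud detection example are illustrative, not a standard schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class AnalysisType(Enum):
    REAL_TIME = "real time"
    BATCH = "batch"


class ContentFormat(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass
class WorkloadProfile:
    """Fields mirror the classification dimensions listed in the slides."""
    analysis_type: AnalysisType
    processing_methodology: str
    content_format: ContentFormat
    data_type: str
    data_frequency_and_size: str
    data_sources: list[str] = field(default_factory=list)
    data_consumers: list[str] = field(default_factory=list)
    hardware: str = "commodity"

    def needs_low_latency(self) -> bool:
        """Real-time analysis implies results are needed with little delay."""
        return self.analysis_type is AnalysisType.REAL_TIME


fraud_detection = WorkloadProfile(
    analysis_type=AnalysisType.REAL_TIME,
    processing_methodology="predictive",
    content_format=ContentFormat.STRUCTURED,
    data_type="transactional",
    data_frequency_and_size="continuous feed",
    data_sources=["machine-generated"],
    data_consumers=["business processes"],
)
print(fraud_detection.needs_low_latency())
```

Writing the profile down this way makes it easier to compare use cases side by side before choosing tools and hardware.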
19. Where we’re using it
Medicine
Social networking sites
Surveys
Science and research
Real estate
Retail
Banking
Internet of things
Government sectors
20. Conclusion
The availability of Big Data, low-cost commodity hardware, and
new information management and analytic software have produced a
unique moment in the history of data analysis. The convergence of
these trends means that we have the capabilities required to analyze
astonishing data sets quickly and cost-effectively for the first time in
history. These capabilities are neither theoretical nor trivial. They
represent a genuine leap forward and a clear opportunity to realize
enormous gains in terms of efficiency, productivity, revenue, and
profitability.
The Age of Big Data is here, and these are truly revolutionary times
if both business and technology professionals continue to work
together and deliver on the promise.