1. UNIT : II
Characteristics of Data
Composition: deals with the structure of data i.e. sources of
data, types of data, nature of data.
Condition: deals with state of data i.e.
Context: deals with generation of data, sensitivity of data.
2. Evolution of Big Data
In the 1970s: Data was essentially primitive and
structured.
In the 1980s and 1990s: Relational databases evolved,
ushering in the era of data-intensive applications.
From 2000 onward: The WWW and IoT have led to
structured, unstructured and multimedia data.
3. Big Data
How is Big Data defined?
It's anything beyond imagination.
Today's BIG may be tomorrow's NORMAL.
Terabytes, Petabytes or Zettabytes of data.
It is characterized by the 3 Vs.
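To make the scale of these units concrete, here is a minimal sketch that converts between them. It assumes decimal (SI) prefixes, where each step up is a factor of 1000; the names `UNITS`, `to_bytes` and `convert` are illustrative only.

```python
# Decimal (SI) storage units; each step up is a factor of 1000 (an assumption;
# binary prefixes would use 1024 instead).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a value from one storage unit to another."""
    return to_bytes(value, from_unit) / 1000 ** UNITS.index(to_unit)


if __name__ == "__main__":
    print("1 PB in TB:", convert(1, "PB", "TB"))
    print("1 ZB in PB:", convert(1, "ZB", "PB"))
```

One petabyte is a thousand terabytes, and one zettabyte is a million petabytes, which is why the threshold for "big" keeps moving.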
4. In 2001, industry analyst Doug Laney defined “Big Data” in terms of the three
V’s (3Vs): Volume, Velocity and Variety.
In 2012, Gartner updated this definition: “Big Data” is high-volume,
high-velocity and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced
insight and decision making.
Big data is an evolving term that describes any voluminous amount
of structured, semi-structured and unstructured data that has the
potential to be mined for information.
5. Challenges with Big Data
Capture
Storage
Curation
Search
Analysis
Transfer
Visualization
Privacy
6. Characteristics of Big Data
Big Data is defined by three characteristics.
Extremely large Volume of data
Extremely high Velocity of data
Extremely wide Variety of data
7.
8. Other characteristics of data which
are not definitional for Big Data
Veracity and Validity: deal with abnormality, accuracy and
correctness of data
Volatility: deals with how long data remains valid
Variability: deals with data flow, which can be highly inconsistent
9. Why Big Data?
More Data
More Accurate Analysis
More Confidence in
decision making
Impact in terms of enhancing
operational efficiency,
reducing cost & time,
innovating New products, new services,
Optimized offerings etc.
10. Are we only consumers, or also
information producers?
Consider one scenario :
11. 1. A text message inviting you to attend the party.
2. Use of a credit/debit card at the petrol pump.
3. The point-of-sale system at Archie's shop.
4. Photographs and posts on social networking
sites.
5. Likes and comments on your posts.
12. BI Versus Big Data
Business Intelligence (BI)
1. All of the enterprise's data is
housed in a central server
2. A typical database server
scales data vertically
3. BI data analyzed in an offline
mode
4. BI is about Structured Data
5. Move Data to code
Big Data
1. Data resides in a
distributed file system
2. Distributed file system
scales data Horizontally
3. Big Data analyzed in both
real time as well as
offline mode.
4. Big Data is about a variety
of data
5. Move Code to data
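The "move code to data" idea can be sketched as follows. This is only a single-machine imitation under stated assumptions: the `partitions` list stands in for data blocks held on separate nodes, and worker threads stand in for those nodes. The names `partitions`, `local_sum` and `distributed_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "nodes": each inner list is the partition of data held by one node.
partitions = [
    [3, 8, 1],
    [7, 2],
    [9, 4, 6, 5],
]


def local_sum(partition):
    """The 'code' that is shipped to a node and runs where the data lives."""
    return sum(partition)


def distributed_sum(parts):
    # Each worker thread stands in for a node processing its own partition.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial_results = list(pool.map(local_sum, parts))
    # Only the small partial results travel back to be combined.
    return sum(partial_results)


if __name__ == "__main__":
    print("Total:", distributed_sum(partitions))
```

The raw data never leaves its partition; only one number per partition is sent back. Adding more partitions (and nodes) is the horizontal scaling referred to above.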
13. Typical Data Warehouse Environment
ERP
(Enterprise Resource
Planning)
CRM
(Customer Relationship
Management)
Third party apps
Legacy System
Data
Warehouse
Reporting/
Dashboarding
OLAP
Ad hoc querying
Modeling
14. Typical Hadoop Environment
Web Logs
Images and Videos
Docs and PDFs
Social Media
HDFS
Operational System
Data Warehouse
Data Mart
ODS
(Operational Data Store)
Data Mart
Hadoop
MapReduce
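As a rough illustration of the MapReduce model named in this environment, here is a word count written in plain Python. It is a single-process sketch of the map, shuffle and reduce phases, not Hadoop's actual API; all function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data stack"]))
```

In a real cluster the map calls would run on the nodes holding each block of input, and the shuffle would move the grouped pairs across the network to the reducers.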
15. Functional Requirements of Big Data
(1) Collection
(2) Integration
(3) Analysis
(4) Actions / Decisions
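The four stages can be pictured as a small pipeline. The sketch below uses made-up sales figures and a made-up threshold purely for illustration; the function names and the data are assumptions, not part of the original slides.

```python
from statistics import mean


def collect():
    """(1) Collection: gather raw data from several sources (hypothetical values)."""
    return {"store_a": [120, 135, 150], "store_b": [80, 95]}


def integrate(sources):
    """(2) Integration: merge the sources into one uniform list of records."""
    return [(name, value) for name, values in sources.items() for value in values]


def analyze(records):
    """(3) Analysis: derive a summary figure from the integrated records."""
    return mean(value for _, value in records)


def decide(average, threshold=100):
    """(4) Actions / Decisions: turn the analysis result into an action."""
    return "increase stock" if average > threshold else "hold"


if __name__ == "__main__":
    average = analyze(integrate(collect()))
    print(average, "->", decide(average))
```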
16. Big Data Stack
The Big Data technical stack describes a layered
architecture.
It is a way to think about Big Data.
It deals with
– Storage
– Analytics
– Reporting
– Applications
Let's watch this video....
18. Big Data Stack
Layer 0 (Redundant Physical Infrastructure) :
Deals with hardware, network & so on.
Performance: How responsive do you need the system to be?
This depends on the performance of your machines; very fast infrastructure tends
to be very expensive.
Availability: Do you need a 100% uptime guarantee of
service? Highly available infrastructure is very expensive.
Scalability: How big does your infrastructure need to be?
How much disk space is needed?
Flexibility: How quickly can you add more resources to the
infrastructure?
Cost: What can you afford?
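The availability question is easier to weigh with numbers. This small sketch converts an uptime percentage into the downtime it allows per year, assuming a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent):
    """Hours of downtime per year permitted by a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for uptime in (99.0, 99.9, 99.99, 100.0):
        hours = downtime_hours_per_year(uptime)
        print(f"{uptime}% uptime allows about {hours:.2f} hours of downtime per year")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is one reason highly available infrastructure gets expensive quickly.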
19. Big Data Stack
Layer 1 (Security Infrastructure) :
Security and privacy requirements for big data are similar to the
requirements for conventional data environments.
Data Access: Data should be available only to authorized persons.
Application Access: Most APIs offer protection from
unauthorized usage or access.
Data Encryption: It is the most challenging aspect in a Big Data
environment.
Threat Detection: The inclusion of mobile devices and social
networks exponentially increases both the amount of data and
opportunities for security threats.
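One common way an API can reject unauthorized callers is to require each request to carry a signature made with a shared secret. The sketch below shows that idea with Python's standard `hmac` module. The secret, the request strings and the function names are hypothetical, and this is only one of several possible protection schemes.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-secret"  # hypothetical shared secret, for illustration only


def sign(message, key=SECRET_KEY):
    """Compute an HMAC-SHA256 signature for a request message (bytes)."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_authorized(message, signature, key=SECRET_KEY):
    """Accept the request only if its signature matches the expected one."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(message, key), signature)


if __name__ == "__main__":
    request = b"GET /reports"
    signature = sign(request)
    print(is_authorized(request, signature))        # signature matches the request
    print(is_authorized(b"GET /admin", signature))  # signature was made for another request
```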
20. Big Data Stack
Layer 2 (Operational Databases):
A Big Data environment needs a
fast and scalable database engine.
Using an RDBMS for Big Data is not a practical
solution.
Choose a proper database.
Your database must support ACID.
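To show what ACID support buys you, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module. SQLite is used only because it ships with Python, not because it is a Big Data engine; the `accounts` table and the `transfer` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    print(transfer(conn, "alice", "bob", 500), balances(conn))
    print(transfer(conn, "alice", "bob", 30), balances(conn))
```

The failed transfer leaves both balances untouched even though the first update had already run, which is the atomicity and consistency part of ACID.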
21. Big Data Stack
Layer 3 (Organizing Data Services and Tools):
Organizing Data Services and Tools capture, validate and assemble
various big data elements into contextually relevant collections.
Because Big Data is massive,
tools need to provide integration, translation, normalization and scale.
Technologies in this layer are as follows:
A Distributed File System
Serialization Service
Coordination Services
Extract, Transform and Load (ETL) Tools
Workflow Services
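The capture, validate and assemble steps, together with the ETL and serialization services listed above, can be sketched in a few lines. This example uses JSON as the serialization format and an in-memory dictionary as the target store; the sample records and all function names are made up for illustration.

```python
import json

# Hypothetical raw input: serialized records, one of them malformed.
RAW = [
    '{"user": " Alice ", "amount": "12.50"}',
    '{"user": "bob", "amount": "7"}',
    "not valid json",
]


def extract(lines):
    """Extract: deserialize each line, dropping records that fail validation."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(record):
    """Transform: normalize names and convert amounts to numbers."""
    return {"user": record["user"].strip().lower(), "amount": float(record["amount"])}


def load(records, store):
    """Load: assemble the cleaned records into per-user totals."""
    for record in records:
        store[record["user"]] = store.get(record["user"], 0.0) + record["amount"]
    return store


def run_etl(lines):
    return load((transform(record) for record in extract(lines)), {})


if __name__ == "__main__":
    print(run_etl(RAW))
```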
22. Big Data Stack
Layer 4 (Analytical data Warehouses):
Data Warehouses and Data Marts contain normalized data
gathered from a variety of sources and assembled to facilitate
analysis of the business.
They support the creation of reports and the visualization of disparate data
items.
23. Big Data Analytics:
It requires proper analytical tools.
This architecture lists three classes of tools.
Reporting and dashboards: these tools provide a
“user-friendly” representation of information.
Visualization:
Analytics and Advanced Analytics: