Unraveling Big Data

Copyright ©2013. MindMap IT Solution (P) Ltd. All rights reserved.
Our Goal for Today
1. Evolution of digital data over the decades
2. Why do we process data – and how?
3. How has all this changed in the last decade?
4. What is Big Data, and how do we handle it?
5. Who needs to understand Big Data?
6. What are the Big Data related opportunities?
7. Discussions and Q&A

Setting The Context

Bits, Bytes, and Beyond
Name | Value | Example
Bit | A single binary digit (0 or 1) | –
Byte | 8 bits | 1 character
Kilobyte | 1,024 (1K) bytes | About 150 words
Megabyte | 1K kilobytes | A small book
Gigabyte | 1K megabytes | 20 GB = All of Beethoven’s work
Terabyte | 1K gigabytes | 1,000 copies of the Encyclopedia Britannica
Petabyte | 1K terabytes | 500 billion pages of standard printed text
Exabyte | 1K petabytes | 5 EB = All words ever spoken by mankind
Zettabyte | 1K exabytes | 1 ZB = The entire planet’s digital content
Yottabyte | 1K zettabytes | 1 YB = Would take 11 trillion years to download!
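In the table each unit is 1K (1,024) of the unit below it. The Python sketch below applies that binary convention to turn a raw byte count into the largest fitting unit; the function name is illustrative only, and note that disk vendors and network specifications often use decimal multiples of 1,000 instead.

```python
# Units from the table, each 1,024 times the previous one (binary convention).
UNITS = ["Bytes", "Kilobytes", "Megabytes", "Gigabytes", "Terabytes",
         "Petabytes", "Exabytes", "Zettabytes", "Yottabytes"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1,024."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    # Anything that got this far is expressed in the largest unit.
    return f"{value:.2f} {UNITS[-1]}"


if __name__ == "__main__":
    print(human_readable(1536))
    print(human_readable(20 * 1024**3))
```

For instance, 1,536 bytes is reported as 1.50 Kilobytes, and 20 × 1024³ bytes as 20.00 Gigabytes.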

History of Data Storage Capacity
Year | Medium | Capacity
1956 | Hard drive from IBM | 5 MB
1963 | Audio tape | 663 KB
1970 | Floppy disk | 80 KB
1976 | Floppy disk | 110 KB
1981 | Floppy disk | 1.4 MB
1982 | CD | 700 MB
1995 | DVD | 4.7 GB
2003 | Blu-ray | 25 GB
Beyond these: multi-terabyte hard disks, the WWW, and the cloud.

Cost Per Gigabyte
Year | Cost / GB (US$)
1980 | $3,000,000
1990 | $8,000
2000 | $30
2010 | $0.08
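To put the cost table in perspective, the short Python sketch below computes how many times cheaper a gigabyte became in each decade, using only the four figures in the table (the helper name is hypothetical).

```python
# Cost per gigabyte in US dollars, taken from the table.
COST_PER_GB = {1980: 3_000_000, 1990: 8_000, 2000: 30, 2010: 0.08}


def decline_factors(costs: dict) -> dict:
    """Return how many times cheaper storage became between consecutive years."""
    years = sorted(costs)
    return {(a, b): costs[a] / costs[b] for a, b in zip(years, years[1:])}


if __name__ == "__main__":
    for (start, end), factor in decline_factors(COST_PER_GB).items():
        print(f"{start}-{end}: {factor:,.0f}x cheaper")
    overall = COST_PER_GB[1980] / COST_PER_GB[2010]
    print(f"1980-2010 overall: {overall:,.0f}x cheaper")
```

Each decade cut the cost per gigabyte by a factor of a few hundred (375x, about 267x, and 375x), which compounds to roughly a 37.5-million-fold drop over 30 years.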

Prior to the 80’s
• E-commerce did not exist.
• Data entry, storage, and processing were sequential processes, displaced in time.
• Data was processed by monolithic applications running on mainframes.
• Batch processing was the norm.
• Data processing was used in non-time-critical areas such as payroll and accounting.
• Only large enterprises and institutions could afford data processing.
• Data processing could only support long-term analysis and decision-making processes, such as planning.
Prior to the 80’s…

Data was largely STRUCTURED

Structured Data

Structured Data Cont...

Data Processing in the 80’s and Before
Data creation was a controlled process.
The rate of data creation was known and manageable.
Data creation and processing were co-located.

Database Systems of the 80’s and Prior
Navigational

Relational

In the 90’s
• Better connectivity allowed data to be collected from distributed, but finite, sources.
• Data was captured and stored online as it was created.
• Online Transaction Processing (OLTP) systems emerged.
• Data processing could now support operational decision making, since data capture and processing could be done in real time.

In the 90’s Cont...
• Data creation was still a controlled step, and data was structured.
• The volume of data generated was manageable.
• Data processing was still centralized.
• Relational databases ruled the world of data processing.

Then…

“INTERNET HAPPENED”
Changing the way we live in this world …

Internet Traffic Trends

Early Years of Internet

The Internet enabled e-commerce:
• B2B transactions
• B2C transactions

• Banking and Finance
• Travel and Hospitality
• Retail
• Health Care

Early Years of Internet Cont...
• The volume of online transactions increased rapidly.
• Database systems had to separate online processing from analysis to cope with the transaction volume.
• Data warehousing emerged.
• Distributed databases also made their appearance.

Early Years of Internet Cont...
• In the early days, the processed data was still structured, since it dealt with e-commerce transactions.
• The need was for systems that focused on transactions: validation and recording.
• Consequently, transaction and analysis systems had to be separated.
• ETL (Extract, Transform, Load) processes managed the conversion of data from one form to another (transaction → analysis).

In the New Millennium
Rapid adoption of the Internet.

Explosion of e-commerce, especially B2C.
The Internet enabled customers to seek out the best deal.
Businesses had to proactively entice customers:
• To consume their products and services.
• At the point of purchase.

Data processing moved from a supportive role to a “Business Critical” role.
• The nature of certain businesses changed completely.

Then Came SOCIAL NETWORKING and MOBILITY

Impact of Social Networking
The success of B2C business transactions now depends on the ability to analyze customers’ past and current behaviour in real time!
Social networking has become a source of valuable information for understanding customer choice and behaviour.

Social Networking = Unstructured data
Social Networking = Extremely large data generation rates
Social Networking = Highly distributed data

Unstructured and Distributed Data

Unstructured Data

Unstructured Data Cont...

Unstructured Data Cont...

Very High Data Creation Rates
Year | Data Estimate
2002 | 5 billion GB
2006 | 161 billion GB
2010 | 1,277 billion GB
2015 | 7,910 billion GB
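The growth in the table can be summarised as a compound annual growth rate (CAGR), the same measure used for the macro trends later in this deck. A minimal Python sketch, treating the 2015 figure (a projection at the time of writing) as if it were actual; the function names are illustrative:

```python
import math

# Estimated data created per year, in billions of GB (from the table;
# the 2015 value was a projection when this deck was written).
DATA_CREATED = {2002: 5, 2006: 161, 2010: 1277, 2015: 7910}


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


def doubling_time(rate: float) -> float:
    """Years needed to double in size at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    rate = cagr(DATA_CREATED[2002], DATA_CREATED[2015], 2015 - 2002)
    print(f"2002-2015 CAGR: {rate:.1%}")
    print(f"Doubling time: {doubling_time(rate):.2f} years")
    print(f"Doubling time at 60% CAGR: {doubling_time(0.60):.2f} years")
```

With these figures the 2002–2015 growth rate comes out at roughly 76% per year, so the amount of data created doubles about every 1.2 years; at a 60% CAGR, doubling takes just under 1.5 years.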
The Situation Today…

“Every two days now we create as much information as we did from the dawn of civilization up until 2003.”
– Eric Schmidt, Google

Structured Data constitutes only 5% of the
total “Data Deluge”.

Business Processes – Then and Now
Then | Now
Anticipate product / service need | Anticipate product / service need
Marketing | Marketing
Sales | Sales
Transaction | Transaction
Analysis | Analysis
Refinement | Refinement

Who Needs Rapid Data Analysis
Banking and Finance
Credit / Debit / ATM card transactions
• Collaboration between banks
• Fraud detection
• Real-time analysis of CCTV footage to detect and prevent ATM attacks

Credit / Loan approval
• Credit analysis based on credit history as well as social
network traces

B2C E-commerce Sites (Online Stores)

B2C – Product Comparison Sites

Data Analysis in Elections

The 2012 US elections
“Data-driven decision making played a huge role in creating a second term for the 44th President and will be one of the more closely studied elements of the 2012 cycle.”
– Time, Nov 10, 2012
Obama campaign headquarters, Chicago

Crime Investigation / Prevention / Surveillance
Processing of email / chat / phone call traces
• Accessed by Govt. agencies

Processing of Facebook / Twitter posts / Chats
• Sentiment analysis for crime prevention

Common to All These Situations…
• UNSTRUCTURED data.
• Very large data sets – dynamic and increasing rapidly by the minute.
o Terabytes of data (BIG DATA)
• Highly dispersed and distributed data generation.
• Impossible to move such data to a central location for processing.
• At the same time, it is critical to process the data and generate results in real time.

Characteristics of New Age Data Processing Systems
• Ability to handle unstructured data.
• Ability to handle rapidly increasing volumes of data.
• Ability to operate on distributed data sets.
• Scalable.
• Reliable / fault tolerant.
• Reasonable costs, both one-time and operational.
These requirements have led to increasing interest in Big Data and to the development of newer data storage and analysis techniques.
Growing Interest in Big Data

Conventional Database Systems
Relational

Conventional Database Systems Cont…

Data Models and Database Systems Over the Years

History of Data Models and Database Systems

MAP REDUCE,
COLUMNAR DATABASES
& NO-SQL DATABASES

How to Tackle Big Data – In Simple Words
1. Break down the problem into manageable chunks.
2. Spread the data and its processing over a number of nodes – typically cheap computers.
3. Manage the process to ensure that nothing gets lost.
4. Re-assemble the answer from the various parts to get your query answered.
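The four steps can be sketched in a few lines of Python. This is a single-machine illustration under simplifying assumptions: threads stand in for cluster nodes, summing numbers stands in for real analysis, and "ensure that nothing gets lost" is reduced to retrying a failed chunk. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: list, chunk_size: int) -> list:
    """Step 1: break the input into manageable chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk: list) -> int:
    """Work done on one node; a simple sum stands in for real analysis."""
    return sum(chunk)


def run_with_retries(func, chunk: list, attempts: int = 3):
    """Step 3: re-run a failed chunk so that no part of the input is lost."""
    for attempt in range(1, attempts + 1):
        try:
            return func(chunk)
        except Exception:
            if attempt == attempts:
                raise  # give up after the last attempt and surface the error


def tackle(data: list, chunk_size: int = 4, workers: int = 3) -> int:
    chunks = split_into_chunks(data, chunk_size)
    # Step 2: spread the chunks over several workers (threads stand in for nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: run_with_retries(process_chunk, c), chunks))
    # Step 4: re-assemble the partial answers into the final result.
    return sum(partials)


if __name__ == "__main__":
    print(tackle(list(range(1, 101))))
```

For example, tackle(list(range(1, 101))) splits the numbers 1 to 100 into chunks of four, sums each chunk in parallel, and re-assembles the partial sums into 5050. Real frameworks such as Hadoop also move the computation to the node that already holds the data, rather than shipping the data to the computation.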

Map – Reduce : Technique to Handle BIG DATA

Map – Reduce : Technique to Handle BIG DATA Cont...

The Map – Reduce Technique
Advantages:
• Can handle both structured and unstructured data.
• Can scale up with data size.
• Open source implementations are available: reasonable costs.
Drawbacks:
• Not very easy to set up and use.
• Raw Map-Reduce requires programming to set up.
• Basic Map-Reduce is suited largely to batch processing.
o (Real-time techniques have since been implemented to overcome this drawback.)

Hadoop
Based on the Map-Reduce distributed processing architecture.
A task is mapped to a set of servers for processing.
Results from the servers are then reduced to a single set.

Hadoop operates on the HDFS distributed file system.
- HDFS ensures data redundancy.
Hadoop has built-in task management functionality to ensure reliability.
Interfaces are available to other components, both open-source and commercial.
Highly scalable and cost-effective.
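The map and reduce steps can be illustrated with the classic word-count example. The Python sketch below runs all three phases (map, shuffle, reduce) in one process purely to show the data flow; it is not Hadoop's Java API, and on a real cluster each phase would run on many servers in parallel.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple]) -> dict:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list) -> dict:
    # Each document plays the role of an input split handled by its own mapper.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

Here word_count(["big data is big", "data is everywhere"]) returns {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}. Because each mapper only sees its own split and each reducer only sees one key's values, both phases can be spread across servers independently.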

HDFS
Hadoop Distributed File System

Goals (Ref: Hortonworks)
• Store petabytes of data.
• Keep per-node costs down to afford more nodes (scalability).
• Commodity x86 servers, open-source software.
• Support computation in each server.
• Handle failures: failures are treated like noise – inevitable.
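To make the redundancy goal concrete, the Python sketch below splits a file into fixed-size blocks and assigns each block to several distinct nodes, which is the basic idea behind HDFS replication. The 128 MB block size and replication factor of 3 are HDFS defaults in Hadoop 2.x and later (Hadoop 1.x used 64 MB blocks); the round-robin placement is a simplification, since real HDFS uses a rack-aware placement policy. Node names and helper functions are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size in Hadoop 2.x and later
REPLICATION = 3                 # HDFS default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list:
    """Return the sizes of the blocks that a file of file_size bytes is split into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION) -> dict:
    """Assign each block to `replication` distinct nodes.

    Simple round-robin placement for illustration; real HDFS is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + i) % len(nodes)] for i in range(replication)]
        for block_id in range(num_blocks)
    }


def all_blocks_survive(placement: dict, failed_node: str) -> bool:
    """True if every block still has a replica on some node other than failed_node."""
    return all(any(node != failed_node for node in replicas) for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a hypothetical 300 MB file
    placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
    for block_id, replicas in placement.items():
        print(block_id, blocks[block_id] // (1024 * 1024), "MB ->", replicas)
    print("Survives loss of node2:", all_blocks_survive(placement, "node2"))
```

With four nodes and a 300 MB file, the file becomes two full 128 MB blocks plus a 44 MB block, each stored on three different nodes, so losing any single node still leaves every block readable.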

HDFS Cont...

Big Data Analysis – The Big Picture!

Components Relevant to Hadoop
HBase
Database to store data and speed up queries.

Hive
Data warehouse implementation to support analytics, querying, and visualization.

HBase
HBase is a columnar, NoSQL database system.
HBase | RDBMS
Column oriented | Row oriented
Flexible schema; add columns on the fly | Fixed schema
Good with sparse (partially filled) tables | Not optimized for sparse tables
No query language | SQL
Wide tables | Narrow tables
Joins using Map-Reduce | Optimized for joins
Tight integration with Map-Reduce | Usually not integrated with Map-Reduce
Horizontal scalability – just add hardware | Hard to scale and size down
Good for semi-structured and structured data | Good only for structured data
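The contrast between a wide, sparse, column-family table and a fixed-schema row store can be sketched as a toy in-memory Python class. It loosely mirrors HBase's put, get, and scan operations and its "family:qualifier" column naming, but it is not the HBase client API and ignores versioning, persistence, and distribution; the class name and sample data are hypothetical.

```python
from collections import defaultdict


class ToyWideTable:
    """Toy model of a wide, sparse table with 'family:qualifier' column names.

    Only cells that are actually written are stored, so empty columns cost nothing.
    Column families are fixed when the table is created; qualifiers can be added
    on the fly. Versioning, persistence, and distribution are deliberately left out.
    """

    def __init__(self, families: list):
        self.families = set(families)
        self.rows = defaultdict(dict)  # row key -> {"family:qualifier": value}

    def put(self, row_key: str, column: str, value) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column] = value

    def get(self, row_key: str, column: str, default=None):
        return self.rows.get(row_key, {}).get(column, default)

    def scan(self, start_row: str, stop_row: str):
        """Yield (row key, cells) in sorted key order for start_row <= key < stop_row."""
        for key in sorted(self.rows):
            if start_row <= key < stop_row:
                yield key, dict(self.rows[key])


if __name__ == "__main__":
    table = ToyWideTable(["info", "activity"])
    table.put("user1", "info:name", "Asha")
    table.put("user2", "activity:last_login", "2013-01-05")
    table.put("user2", "activity:device", "mobile")  # new qualifier, no schema change
    for key, cells in table.scan("user1", "user9"):
        print(key, cells)
```

Row "user1" stores only the one cell written to it, and a new qualifier such as "activity:device" is added without any schema change, which is what keeps sparse, wide tables cheap.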

Hive
• Hadoop can get difficult to configure and use!
• Hive sits between Hadoop and the users of Hadoop.
• It provides a familiar, table-like environment for working with Hadoop.
• It allows data to be:
o Read from Hadoop / HDFS
o Written into Hadoop / HDFS
o Queried from Hadoop / HDFS using familiar SQL-like syntax
• In the background, Hive converts queries into Map-Reduce jobs.
• Hive is a data warehouse system for Hadoop.
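As a rough illustration of that conversion, a query such as SELECT route, COUNT(*), AVG(fare) FROM bookings GROUP BY route can be executed as a map step that emits the grouping column as the key and a reduce step that aggregates each group. The Python sketch below shows that translation on a hypothetical bookings table; it is a conceptual model, not the plan Hive actually generates.

```python
from collections import defaultdict

# Hypothetical rows of a "bookings" table: (route, fare).
BOOKINGS = [
    ("Pune-Goa", 800), ("Delhi-Jaipur", 450),
    ("Pune-Goa", 900), ("Delhi-Jaipur", 500), ("Pune-Goa", 850),
]


def map_rows(rows):
    """Map: emit (GROUP BY key, value to aggregate) for each row."""
    for route, fare in rows:
        yield route, fare


def shuffle_and_reduce(pairs):
    """Shuffle: group values by key. Reduce: compute COUNT(*) and AVG(fare) per group."""
    groups = defaultdict(list)
    for route, fare in pairs:
        groups[route].append(fare)
    return {route: (len(fares), sum(fares) / len(fares))
            for route, fares in sorted(groups.items())}


if __name__ == "__main__":
    for route, (count, avg_fare) in shuffle_and_reduce(map_rows(BOOKINGS)).items():
        print(route, count, avg_fare)
```

The user writes only the one-line query; generating, scheduling, and running the equivalent map and reduce steps across the cluster is the part Hive takes off the user's hands.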

HBase v/s Hive
HBase | Hive
Typically used for unstructured data and sparse tables. | Typically used as a data warehouse.
Allows low-latency random data access. | Main purpose is analysis and ad hoc querying.
Main purpose is continuous operations, such as accepting data feeds and committing them to HDFS. | Deals with structured data resulting from analysis of data stored in HDFS.

Pioneers of Big Data
Company | Scale
eBay | In excess of 2,500 computing cores
Yahoo | In excess of 4,000 nodes
Facebook | More than 23,000 nodes
Google | Unknown (24 PB of data/day)
LinkedIn | Unknown
Source: Slide by Ian Brown

Big Data Solution Suppliers
Informatica
EMC
Oracle
IBM
Microsoft
Teradata
Amazon
Cloudera
Apache
Google
Who Uses Big Data (2011)

Case Study : redBus.in
• redBus.in: Internet-based bus ticket booking
• Handles more than 10,000 routes
• Goals:
o To capture every event happening on their website and correlate them
o To identify whether booking failures were due to absence of supply or to server problems
o To understand which routes needed more buses
• Volume of data: 500 GB
• Expected response time: less than 1 minute
• Tool / service used: Google BigQuery

Case Study : Seagate
• Seagate has manufactured more than 2 billion hard drives.
• They maintain data comprising:
o Information related to the 2 billion hard drives
o Manufacturing information
o Supplier information
o Customer information
• 400 GB of data is added to the warehouse per day.
• Used Big Data techniques to analyze test data.
• Impact: overall improvement in quality due to sharp identification of process and supplier issues.
• Tools used: not known.

Case Study : Macy’s
• They want to prevent an overload of irrelevant promotions going to their customers.
• They are sending fewer, more focused messages to individual clients about products and special offerings that have a high likelihood of appealing to that person.
• They are combining point-of-sale information with
o online browsing behaviours
o response to emails
o social media activity
o and more …
• to get a 360-degree view of each customer.
• The result: fewer, more meaningful interactions with customers that drive greater loyalty, greater revenues, and lower churn.
Other Applications of Big Data
• Epidemic prediction
• Weather prediction
• Scientific experiments that generate very large amounts of data, such as Super Collider experiments
• Astronomy
• Search for extraterrestrial intelligence

Big Data Challenges
Hadoop and Big Data technologies are time-consuming to set up and use.
Building and running Hadoop jobs is non-trivial.
Running queries and analyzing results does not leverage existing skills.
Introducing Big Data in an organization requires specialized teams, along with the associated costs.

Who Should Know About Big Data
Decision Makers
To understand its capabilities and how to use it for Business gains.
Data Scientists

To be able to understand and apply the right techniques to solve
Big Data problems.
Big Data Applications Developers
To know the building blocks, and nuts and bolts of putting
together a Big Data processing system.
Big Data Analysts

IT Staff
Big Data Macro Trends
• Information generation is growing 2 times faster than storage capacity.
• Growth in data collection: 60% CAGR.
• Information management industry:
o Sized at $100 billion
o Growing at 10% CAGR
• Big Data sources are becoming more varied.
o Mobile phones, sensors, etc.
• Total Internet traffic will exceed 667 exabytes by 2013.
• Third-party data availability is on the rise.
• Hadoop is the fastest-growing Big Data technology: downloads have increased by more than 400% in the last two years.

Big Data Market Size Projection

Future of Big Data

Career Opportunities
Direct Opportunities
• Internet products and services companies
• Manufacturing companies
• Banking and finance
• Pharma
• Govt departments
Indirect Opportunities
• Handling outsourced Big Data analysis and development projects for the above organizations.

Structured v/s Unstructured Data
Unstructured Data
• Web server and search engine logs (“data exhaust”)
• Logs from other types of servers (e.g., telecom switches and gateways)
• Social media / gaming messages
• Multimedia – voice, video, images
• Sensor data / M2M communications
Structured Data
• Customer databases
• E-commerce / web commerce records
• Legacy BI / CRM / ERP systems
• Inventory and supply chain

Structured v/s Unstructured
Aspect | Structured | Unstructured
Representation | Discrete (rows and columns) | Binary large objects: less-defined boundaries, less easily addressable. Small discrete objects: information represented for a very specific purpose (e.g., an SMTP mail message).
Storage / Persistence | DBMS or file formats (e.g., VSAM) | Unmanaged file structure or content repository
Metadata Focus | Syntax (e.g., location and format) | Semantics (descriptive and other markup)
Integration Tools | ETL or ELT, enterprise information integration via BizTalk, and batch processing | Batch processing, manual data entry, custom solutions that involve a lot of code
Standards | SQL (and its multiple variations), ADO.NET, ODBC; many RDBMSs also support XML as another option | Open XML, SMTP, SMS, CSV, and Information and Content Exchange

Evolution of Data Transfer Rates
Medium | Transfer Rate
Modems | 56 Kilobits / second
T-1 line | 1.544 Megabits / second
Ethernet | 10 Megabits / second
Fast Ethernet (LAN) | 100 Megabits / second to 1 Gigabit / second
T-3 | 44.736 Megabits / second
Optical fibres | Up to 20 Gigabits / second (dedicated)
Next Internet backbone | 2.4 Gigabits / second
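These rates also show why the deck calls it impossible to move Big Data to a central location. The Python sketch below estimates the ideal time to transfer one terabyte at each listed rate, assuming decimal units (1 TB = 10^12 bytes) and ignoring protocol overhead and congestion; the helper name is hypothetical.

```python
# Link speeds from the table, in bits per second (decimal prefixes).
LINKS = {
    "Modem (56 Kbit/s)": 56e3,
    "T-1 (1.544 Mbit/s)": 1.544e6,
    "Ethernet (10 Mbit/s)": 10e6,
    "T-3 (44.736 Mbit/s)": 44.736e6,
    "Fast Ethernet (100 Mbit/s)": 100e6,
    "Ethernet LAN (1 Gbit/s)": 1e9,
    "Optical fibre (20 Gbit/s)": 20e9,
}


def transfer_seconds(num_bytes: float, bits_per_second: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and congestion."""
    return num_bytes * 8 / bits_per_second


if __name__ == "__main__":
    one_terabyte = 1e12  # decimal terabyte, in bytes
    for name, rate in LINKS.items():
        hours = transfer_seconds(one_terabyte, rate) / 3600
        print(f"{name:30s} {hours:12,.1f} hours")
```

Even under these ideal assumptions, a 56 Kbit/s modem would need about four and a half years to move a single terabyte, and a 1 Gbit/s LAN still needs over two hours, which is why Big Data systems bring the processing to the data instead.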

History of Analytics

Prior to the 80’s

E-commerce did not exist

Prior to the 80’s

Data entry, storage, and processing were sequential
and displaced in time

Prior to the 80’s

Data was processed by monolithic applications
running on mainframes
Batch processing was the norm

Prior to the 80’s

Data processing was used in non-time-critical areas
such as payroll, accounting

Prior to the 80’s

Only large enterprises and institutions could afford
the cost of processing data

Prior to the 80’s

Data processing could only support long term analysis
and decision making processes – such as planning

Hadoop – Value Adding Projects/Products
Hadoop

1. HBase

2. Cassandra
3. Mongo
4. CouchDB

Projects/Products Adding Value to Hadoop
HBase
The standard Hadoop database: an open-source, distributed, versioned, column-oriented store providing Bigtable-like capabilities over Hadoop.
HBase includes base classes for backing Hadoop MapReduce jobs; query predicate push-down; optimizations for real-time queries; a Thrift gateway and a RESTful web service to support XML, Protobuf, and binary data encoding; an extensible JRuby-based (JIRB) shell; and support for the Hadoop metrics subsystem. Like Hadoop, HBase is an Apache project, hosted at http://hbase.apache.org/

Cassandra
Apache Cassandra is a highly scalable, second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model. The Cassandra project lives at http://cassandra.apache.org/
A good example of using Cassandra together with Hadoop is the DataStax Brisk platform; learn more at http://www.datastax.com/
Projects/Products adding value to Hadoop Cont…

Mongo
An open-source, scalable, high-performance, schema-free, document-oriented database written in C++. The MongoDB project is hosted at http://www.mongodb.org/. To use Mongo and Hadoop together, check out https://github.com/mongodb/mongo-hadoop

CouchDB
Apache CouchDB is a document-oriented database supporting queries and indexing in a MapReduce fashion using JavaScript. CouchDB provides APIs that can be accessed via HTTP requests to support web applications. Learn more at http://couchdb.apache.org/

Big Data Applications: Additional Ideas
Balance Sheet Analysis
Manufacturing Data Analysis
Production Systems Diagnostics and Pattern Identification

Thank you for your attention!
Please ask questions, if any!

Future of Power: Big Data - Søren RavnFuture of Power: Big Data - Søren Ravn
Future of Power: Big Data - Søren Ravn
 
big-datagroup6-150317090053-conversion-gate01.pdf
big-datagroup6-150317090053-conversion-gate01.pdfbig-datagroup6-150317090053-conversion-gate01.pdf
big-datagroup6-150317090053-conversion-gate01.pdf
 
Big data basics
Big data basicsBig data basics
Big data basics
 
Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01
 

Último

APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Último (20)

APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

Big data by_mcal

  • 1. Unraveling Big Data
    Copyright ©2013. MindMap IT Solution (P) Ltd. All rights reserved.
  • 2. Our Goal for Today
    1. Evolution of digital data over the decades
    2. Why do we process data – and how?
    3. How has all this changed in the last decade?
    4. What is Big Data and how do we handle it?
    5. Who needs to understand Big Data?
    6. What are the Big Data related opportunities?
    7. Discussion and Q&A
  • 3. Setting the Context
  • 4. Bits, Bytes, and Beyond (name: value, example)
    o Bit: a single binary digit (0 or 1)
    o Byte: 8 bits (1 character)
    o Kilobyte: 1,024 (1K) bytes (about 150 words)
    o Megabyte: 1K kilobytes (a small book)
    o Gigabyte: 1K megabytes (20 GB = all of Beethoven’s work)
    o Terabyte: 1K gigabytes (1,000 copies of Encyclopedia Britannica)
    o Petabyte: 1K terabytes (500 billion pages of standard printed text)
    o Exabyte: 1K petabytes (5 EB = all words ever spoken by mankind)
    o Zettabyte: 1K exabytes (1 ZB = the entire planet’s digital content)
    o Yottabyte: 1K zettabytes (1 YB would take 11 trillion years to download!)
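Each step up the unit ladder above multiplies by 1,024. The minimal Python sketch below walks the ladder in both directions and uses the slide's binary (1,024-based) definitions. SI prefixes use 1,000 instead, and the IEC names KiB, MiB, and so on are the unambiguous binary forms. The function names are illustrative.

```python
# Unit names from the slide, each one 1,024 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the first unit where the value drops below 1,024."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {UNITS[-1]}"  # anything this large is reported in YB


def to_bytes(amount: float, unit: str) -> int:
    """Convert an amount in one of the named units back to bytes."""
    return int(amount * 1024 ** UNITS.index(unit))


print(human_readable(5 * 1024**2))              # prints 5.0 MB (IBM's 1956 drive)
print(human_readable(to_bytes(20, "GB")))       # prints 20.0 GB
```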
  • 5. History of Data Storage Capacity
    o 1956: Hard drive from IBM, 5 MB
    o 1963: Audio tape, 663 KB
    o 1970: Floppy disk, 80 KB
    o 1976: Floppy disk, 110 KB
    o 1981: Floppy disk, 1.4 MB
    o 1982: CD, 700 MB
    o 1995: DVD, 4.7 GB
    o 2003: Blu-ray, 25 GB
    o Today: multi-terabyte hard disks, the WWW, and the cloud
  • 6. Cost per Gigabyte (year: cost per GB)
    o 1980: $3,000,000
    o 1990: $8,000
    o 2000: $30
    o 2010: $0.08
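The table implies a steep compound decline. The short sketch below computes the average yearly price drop between any two listed years. It assumes a constant compound rate between those years, although real prices did not fall that smoothly.

```python
# Cost per GB in US dollars, as listed on the slide.
COST_PER_GB = {1980: 3_000_000.0, 1990: 8_000.0, 2000: 30.0, 2010: 0.08}


def annual_decline(start_year: int, end_year: int, prices: dict = COST_PER_GB) -> float:
    """Average fractional price drop per year, assuming a constant compound rate."""
    years = end_year - start_year
    ratio = prices[end_year] / prices[start_year]
    return 1 - ratio ** (1 / years)


for a, b in [(1980, 1990), (1990, 2000), (2000, 2010), (1980, 2010)]:
    print(f"{a}-{b}: prices fell about {annual_decline(a, b):.0%} per year")
```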
  • 7. Prior to the 80’s  E-commerce did not exist.  Data entry, storage, and processing were sequential processes – and displaced in time.  Data was processed on monolithic computers running on mainframes.  Batch processing was the norm.  Data processing was used in non-time-critical areas such as payroll and accounting.  Only large enterprises and institutions could afford data processing.  Data processing could only support long term analysis and decision making processes – such as planning. Copyright ©2013. MindMap IT Solution (P) Ltd. All right reserved. 7
  • 8. Prior to the 80’s… Data was largely STRUCTURED
  • 9. Structured Data
  • 10. Structured Data Cont...
  • 11. Data Processing in the 80’s and Before
    o Data creation was a controlled process.
    o The rate of data creation was known and manageable.
    o Data creation and processing were co-located.
  • 12. Database Systems of the 80’s and Prior: Navigational and Relational
  • 13. In the 90’s  Better connectivity allowed data to be collected from distributed, but finite sources.  Data created was directly captured and stored online.  Online Transaction Processing (OLTP) systems emerged.  Data processing could now support operational decision making since data capture and processing could be done real time. Copyright ©2013. MindMap IT Solution (P) Ltd. All right reserved. 13
  • 14. In the 90’s Cont...
    o Data creation was still a controlled step, and data was structured.
    o The volume of data generated was manageable.
    o Data processing was still centralized.
    o Relational databases ruled the world of data processing.
  • 15. Then… “INTERNET HAPPENED”, changing the way we live in this world…
  • 16. Internet Traffic Trends
  • 17. Early Years of Internet
    o Internet-enabled e-commerce: B2B transactions and B2C transactions
    o Banking and Finance, Travel and Hospitality, Retail, Health Care
  • 18. Early Years of Internet Cont...
    o The volume of online transactions increased rapidly.
    o To cope with the transaction volume, database systems had to separate online processing from analysis.
    o Data warehousing emerged.
    o Distributed databases also made their appearance.
  • 19. Early Years of Internet Cont...
    o In the early days, the processed data was still structured, since it dealt with e-commerce transactions.
    o The need was for systems focused on transactions: validation and recording.
    o Consequently, transaction and analysis systems had to be separated. ETL (Extract, Transform, Load) processes managed the conversion of data from one form to another (from transaction form to analysis form).
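The ETL step described above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the CSV export of transactions and its column names are hypothetical, and an in-memory SQLite table stands in for the analysis warehouse. The sketch extracts the rows, transforms them by validating amounts and aggregating per day and product, and loads the result.

```python
import csv
import io
import sqlite3

# Hypothetical transaction export; column names are illustrative only.
RAW = """txn_id,timestamp,product,amount
1,2013-03-01T10:15:00,book,12.50
2,2013-03-01T11:02:00,book,7.50
3,2013-03-01T11:40:00,dvd,20.00
4,2013-03-02T09:05:00,book,not-a-number
5,2013-03-02T09:30:00,dvd,15.00
"""


def extract(text):
    """Extract: read transaction rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate rows and total the amounts per (day, product)."""
    totals = {}
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip records that fail validation
        key = (row["timestamp"][:10], row["product"])  # keep only YYYY-MM-DD
        totals[key] = totals.get(key, 0.0) + amount
    return [(day, product, total) for (day, product), total in sorted(totals.items())]


def load(records, conn):
    """Load: write the aggregated records into an analysis table."""
    conn.execute("CREATE TABLE daily_sales (day TEXT, product TEXT, total REAL)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
rows = conn.execute("SELECT * FROM daily_sales ORDER BY day, product").fetchall()
print(rows)
```

The transaction side keeps every individual record, while the analysis table holds only the daily totals. This split is what the slide means by separating transaction systems from analysis systems.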
  • 20. In the New Millennium
    o Rapid adoption of the Internet.
    o Explosion of e-commerce, especially B2C.
    o The Internet enabled customers to seek out the best deal.
    o Businesses had to proactively entice customers:
      - to consume their products and services;
      - at the point of purchase.
    o Data processing moved from playing a supportive role to a “business-critical” role.
      - The nature of certain businesses changed completely.
  • 21. Then Came SOCIAL NETWORKING and MOBILITY
  • 22. Impact of Social Networking
    o The success of B2C business transactions now depends on the ability to analyze customers’ past and current behaviour in real time!
    o Social networking has become a source of valuable information for understanding customer choice and behaviour.
    o Social networking = unstructured data
    o Social networking = extremely high data generation rates
    o Social networking = highly distributed data
  • 23. Unstructured and Distributed Data
  • 24. Unstructured Data
  • 25. Unstructured Data Cont...
  • 26. Unstructured Data Cont...
  • 27. Very High Data Creation Rates (year: estimated data created)
    o 2002: 5 billion GB
    o 2006: 161 billion GB
    o 2010: 1,277 billion GB
    o 2015: 7,910 billion GB
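The growth between these estimates can be summarized as a compound annual growth rate (CAGR), (end / start)^(1 / years) − 1. The sketch below applies that formula to each pair of consecutive rows and assumes smooth compound growth between the estimates. With the same doubling-time helper, the 60% CAGR quoted later in the deck works out to doubling roughly every 1.5 years.

```python
import math

# Estimated data created per year, in billions of GB, as listed on the slide.
DATA_CREATED = {2002: 5, 2006: 161, 2010: 1277, 2015: 7910}


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + rate)


years = sorted(DATA_CREATED)
for start, end in zip(years, years[1:]):
    rate = cagr(DATA_CREATED[start], DATA_CREATED[end], end - start)
    print(f"{start}-{end}: {rate:.0%} per year, doubling every {doubling_time(rate):.1f} years")
```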
  • 28. The Situation Today…
    o “Every two days now we create as much information as we did from the dawn of civilization up until 2003.” (Eric Schmidt, Google)
    o Structured data constitutes only 5% of the total “data deluge”.
  • 29. Business Processes – Then and Now
    o Then: anticipate product / service need, marketing, sales, transaction, analysis, refinement
    o Now: anticipate product / service need, marketing, sales, transaction, analysis, refinement
  • 30. Who Needs Rapid Data Analysis: Banking and Finance
    o Credit / debit / ATM card transactions
      - Collaboration between banks
      - Fraud detection
      - Real-time analysis of CCTV footage to detect and prevent ATM attacks
    o Credit / loan approval
      - Credit analysis based on credit history as well as social network traces
  • 31. B2C e-commerce Sites (Online Stores)
  • 32. B2C – Product Comparison Sites
  • 33. Data Analysis in Elections
    o The last US election: data-driven decision making played a huge role in creating a second term for the 44th President and will be one of the more closely studied elements of the 2012 cycle. (Time: Nov 10, 2012)
    o Obama election head office, Chicago
  • 34. Crime Investigation / Prevention / Surveillance
    o Processing of email / chat / phone call traces
      - Accessed by government agencies
    o Processing of Facebook / Twitter posts and chats
      - Sentiment analysis for crime prevention
  • 35. Common to All These Situations…
    o UNSTRUCTURED data.
    o Very large data sets, dynamic and growing by the minute.
      - Terabytes of data (BIG DATA)
    o Highly dispersed and distributed data generation.
    o Impossible to move such data to a central location for processing.
    o At the same time, it is critical to process the data and generate results in real time.
  • 36. Characteristics of New Age Data Processing Systems
    o Ability to handle unstructured data.
    o Ability to handle rapidly increasing volumes of data.
    o Ability to operate on distributed data sets.
    o Scalable.
    o Reliable / fault tolerant.
    o Reasonable costs, both one-time and operational.
    These requirements have led to growing interest in BIG DATA and to the development of newer data storage and analysis techniques.
  • 37. Growing Interest in Big Data
  • 38. Conventional Database Systems: Relational
  • 39. Conventional Database Systems Cont…
  • 40. Data Models and Database Systems Over the Years
  • 41. History of Data Models and Database Systems: MAP REDUCE, COLUMNAR DATABASES & NO-SQL DATABASES
  • 42. How to Tackle Big Data – In Simple Words
    1. Break down the problem into manageable chunks.
    2. Spread the data and its processing over a number of nodes, typically cheap computers.
    3. Manage the process to ensure that nothing gets lost.
    4. Reassemble the answer from the various parts to get your query answered.
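The four steps above map directly onto code. The minimal sketch below computes an average over a list of numbers, with threads from Python's `concurrent.futures` standing in for the cheap nodes. A real Big Data system would use separate machines, for example a Hadoop cluster. The function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Step 1: break the input into manageable chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Work done on one 'node': a partial sum and a record count."""
    return sum(chunk), len(chunk)


def average_in_chunks(records, chunk_size=1000, workers=4):
    """Average `records` by splitting, spreading, checking, and reassembling."""
    chunks = split_into_chunks(records, chunk_size)
    # Step 2: spread the chunks over workers (threads stand in for cheap nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Step 3: make sure every chunk and every record was accounted for.
    count = sum(n for _, n in partials)
    if len(partials) != len(chunks) or count != len(records):
        raise RuntimeError("some chunks were lost")
    # Step 4: reassemble the partial answers into one result.
    total = sum(s for s, _ in partials)
    return total / count if count else 0.0


print(average_in_chunks(list(range(10_001)), chunk_size=999))  # prints 5000.0
```

Each worker returns a partial sum and a count, not a partial average. Sums and counts can be added in any order, which makes the final reassembly step trivial.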
  • 43. Map-Reduce: Technique to Handle BIG DATA
  • 44. Map-Reduce: Technique to Handle BIG DATA Cont...
  • 45. The Map-Reduce Technique
    Advantages:
    o Can handle both structured and unstructured data.
    o Can scale up with data size.
    o Open-source implementations are available, keeping costs reasonable.
    Drawbacks:
    o Not very easy to set up and use.
    o Raw Map-Reduce requires programming to set up.
    o Basic Map-Reduce is suitable largely for batch processing.
      - (Real-time techniques have since been implemented to overcome this drawback.)
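The classic word count shows what the "programming" behind raw Map-Reduce involves. The sketch below simulates the map, shuffle, and reduce phases in a single Python process. In Hadoop, the same phases run as distributed tasks, typically written in Java against Hadoop's MapReduce API. The function names here are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(text):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Shuffle: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    mapped = chain.from_iterable(map_phase(text) for text in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Big data is big", "data is everywhere"])
print(counts)  # prints {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Map and reduce calls depend only on their own inputs, so a framework can run them on many machines in parallel. The shuffle is the step that moves data between those machines.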
  • 46. Hadoop
    o Based on the Map-Reduce distributed processing architecture.
      - A task is mapped to a set of servers for processing.
      - Results from the servers are then reduced to a single set.
    o Hadoop operates on the HDFS distributed file system.
      - HDFS ensures data redundancy.
    o Hadoop has built-in task management functionality to ensure reliability.
    o Interfaces are available with other components, both open systems and commercial.
    o Highly scalable and cost effective.
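Hadoop's task management is far more involved than a few lines of code: it weighs data locality, runs speculative execution, and is handled by the JobTracker or YARN depending on the version. The core reliability idea is that a task that fails on one node is re-run on another. The sketch below illustrates only that idea. The node names and the simulated `failing_nodes` set are hypothetical, and they keep the example deterministic.

```python
def run_with_retries(tasks, nodes, failing_nodes, max_attempts=3):
    """Assign each task to a node, re-running it on another node if that node fails.

    `failing_nodes` simulates machines that crash, which keeps the example
    deterministic. Returns a dict mapping each task to the node that completed it.
    """
    completed = {}
    for i, task in enumerate(tasks):
        for attempt in range(max_attempts):
            node = nodes[(i + attempt) % len(nodes)]  # move to the next node on each retry
            if node not in failing_nodes:
                completed[task] = node
                break
            print(f"{task} failed on {node}; rescheduling")
        else:
            raise RuntimeError(f"{task} failed after {max_attempts} attempts")
    return completed


result = run_with_retries(["t0", "t1", "t2"], ["n1", "n2", "n3"], failing_nodes={"n2"})
print(result)  # prints {'t0': 'n1', 't1': 'n3', 't2': 'n3'}
```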
  • 47. HDFS (Hadoop Distributed File System)
    Goals (Ref: Hortonworks):
    o Store petabytes of data.
    o Keep per-node costs down to afford more nodes (scalability).
    o Commodity x86 servers, open-source software.
    o Support computation in each server.
    o Handle failures: failures are treated like noise, as inevitable.
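HDFS stores each file as large fixed-size blocks and keeps several replicas of every block on different DataNodes. That replication is how it treats failures as routine. The sketch below uses the HDFS defaults of 128 MB blocks (Hadoop 2.x; Hadoop 1.x used 64 MB) and a replication factor of 3. The real placement policy is rack-aware, so the round-robin placement here is a simplification, and the node names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB: default HDFS block size in Hadoop 2.x
REPLICATION = 3                 # default HDFS replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list:
    """Sizes of the blocks a file of `file_size` bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION) -> dict:
    """Put each block on `replication` distinct nodes (round-robin, not rack-aware)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict, dead_nodes: set) -> set:
    """Blocks that still have at least one live replica after `dead_nodes` fail."""
    return {b for b, replicas in placement.items() if any(n not in dead_nodes for n in replicas)}


blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(readable_blocks(placement, dead_nodes={"dn2", "dn3"}))  # prints {0, 1, 2}
```

Every block sits on three distinct nodes, so any two node failures leave at least one copy of each block readable.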
  • 48. HDFS Cont...
  • 49. Big Data Analysis – The Big Picture!
  • 50. Components Relevant to Hadoop
    o HBase: database to store data and speed up queries.
    o Hive: data warehouse implementation to support analytics, querying, and visualization.
  • 51. HBase
    HBase is a columnar, NoSQL database system.
    HBase | RDBMS
    o Column oriented | Row oriented
    o Flexible schema, add columns on the fly | Fixed schema
    o Good with sparse (partially filled) tables | Not optimized for sparse tables
    o No query language | SQL
    o Wide tables | Narrow tables
    o Joins using Map-Reduce | Optimized for joins
    o Tight integration with Map-Reduce | Not (usually) integrated with Map-Reduce
    o Horizontal scalability: just add hardware | Hard to scale and size down
    o Good for semi-structured and structured data | Good only for structured data
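Two rows of the comparison, "flexible schema" and "good with sparse tables", follow from how a column-family store keeps data. Only cells that were actually written are stored, and each cell is addressed by a row key plus a "family:qualifier" column name. The toy Python model below illustrates that idea. It is not the HBase client API, it omits HBase's timestamped cell versions, and the table contents are hypothetical.

```python
from collections import defaultdict


class SparseColumnTable:
    """Toy column-family table: only cells that actually exist are stored.

    Cells are addressed by (row key, "family:qualifier"), loosely mirroring
    HBase's data model. This is an illustration, not the HBase client API.
    """

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = defaultdict(dict)  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        # New qualifiers need no schema change: columns are added on the fly.
        self.rows[row_key][column] = value

    def get(self, row_key, column):
        """Return one cell's value, or None if that cell was never written."""
        return self.rows.get(row_key, {}).get(column)

    def stored_cells(self):
        return sum(len(columns) for columns in self.rows.values())


table = SparseColumnTable(["info", "clicks"])
table.put("user1", "info:name", "Asha")
table.put("user1", "clicks:2013-03-01", 4)
table.put("user2", "clicks:2013-03-02", 7)
# A row-oriented table would need 2 rows x 3 columns = 6 slots; only 3 cells are stored.
print(table.stored_cells(), table.get("user2", "info:name"))  # prints 3 None
```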
  • 52. Hive
    o Hadoop can be difficult to configure and use!
    o Hive sits between Hadoop and the users of Hadoop.
    o It provides a familiar, TABLE-like environment for working with Hadoop.
    o It allows data to be:
      - read from Hadoop / HDFS;
      - written into Hadoop / HDFS;
      - queried from Hadoop / HDFS using familiar SQL-like syntax.
    o In the background, Hive converts queries into MAP-REDUCE tasks.
    o Hive is a data warehouse system for Hadoop.
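To see how an SQL-like query becomes Map-Reduce work, consider a simple aggregation. In a `GROUP BY`, the grouping column becomes the map output key, and the aggregate function runs in the reduce step. The sketch below shows that general shape. The `sales` table and its columns are hypothetical, and Hive's real query plans are more involved.

```python
from collections import defaultdict

# Rows of a hypothetical `sales` table: (region, amount).
SALES = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]

# Equivalent HiveQL: SELECT region, SUM(amount) FROM sales GROUP BY region;


def mapper(row):
    """Map: emit the GROUP BY column as the key and the value to aggregate."""
    region, amount = row
    yield region, amount


def reducer(key, values):
    """Reduce: apply the aggregate function (SUM) to one group."""
    return key, sum(values)


def run_group_by(rows):
    groups = defaultdict(list)  # shuffle: collect the mapped values by key
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    return sorted(reducer(k, v) for k, v in groups.items())


result = run_group_by(SALES)
print(result)  # prints [('east', 3), ('north', 17), ('south', 6)]
```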
  • 53. HBase v/s Hive
    HBase:
    o Typically used for unstructured data and sparse tables.
    o Allows low-latency random data access.
    o Main purpose is continuous operations, such as accepting data feeds and committing them to HDFS.
    Hive:
    o Typically used as a data warehouse.
    o Main purpose is analysis and ad hoc querying.
    o Deals with structured data resulting from analysis of data stored in HDFS.
  • 54. Pioneers of Big Data
    o eBay: in excess of 2,500 computing cores
    o Yahoo: in excess of 4,000 nodes
    o Facebook: more than 23,000 nodes
    o Google: ?? (24 PB of data per day)
    o LinkedIn: ??
    Source: slide by Ian Brown
  • 55. Big Data Solution Suppliers: Informatica, EMC, Oracle, IBM, Microsoft, Teradata, Amazon, Cloudera, Apache, Google
  • 56. Who Uses Big Data (2011)
  • 57. Case Study: redBus.in
    o redBus.in: Internet-based bus ticket booking.
    o Handles more than 10,000 routes.
    o Goals:
      - to capture every event happening on the website and correlate them;
      - to identify whether booking failures were due to lack of supply or to server problems;
      - to understand which routes needed more buses.
    o Volume of data: 500 GB
    o Expected response time: less than 1 minute
    o Tool / service used: BigQuery from Google
  • 58. Case Study: Seagate
    o Seagate has manufactured more than 2 billion hard drives.
    o It maintains data comprising:
      - information related to the 2 billion hard drives;
      - manufacturing information;
      - supplier information;
      - customer information.
    o 400 GB of data is added to the warehouse per day.
    o Big Data techniques were used to analyze test data.
    o Impact: overall quality improvement due to sharp identification of process and supplier issues.
    o Tools used: not known
  • 59. Case Study: Macy’s
    o Macy’s wants to prevent an overload of irrelevant promotions going to its customers.
    o It sends fewer, more focused messages to individual customers about products and special offers that are highly likely to appeal to that person.
    o It combines point-of-sale information with:
      - online browsing behaviour;
      - responses to emails;
      - social media activity;
      - and more…
    o The goal is a 360-degree view of each customer.
    o The result: fewer, more meaningful interactions with customers that drive greater loyalty, greater revenue, and lower churn.
  • 60. Other Applications of Big Data
    o Epidemic prediction
    o Weather prediction
    o Scientific experiments that generate very large amounts of data, such as the Super Collider
    o Astronomy
    o Search for extraterrestrial intelligence
  • 61. Big Data Challenges
    o Hadoop and Big Data technologies are time consuming to set up and use.
    o Building and running Hadoop jobs is non-trivial.
    o Running and analyzing queries and results does not leverage existing skills.
    o Adoption requires special teams in an organization, along with the associated costs.
  • 62. Who Should Know About Big Data
    o Decision makers: to understand its capabilities and how to use it for business gains.
    o Data scientists: to understand and apply the right techniques to solve Big Data problems.
    o Big Data application developers: to know the building blocks, the nuts and bolts of putting together a Big Data processing system.
    o Big Data analysts
    o IT staff
  • 63. Big Data Macro Trends
    o Information generation is growing 2 times faster than storage capacity.
    o Growth in data collection: 60% CAGR.
    o Information management industry:
      - sized at $100 billion;
      - growing at 10% CAGR.
    o Big Data sources are becoming more varied.
      - Mobile phones, sensors, etc.
    o Total Internet traffic will exceed 667 exabytes by 2013.
    o Third-party data availability is on the rise.
    o Hadoop is the fastest-growing Big Data technology: downloads have increased more than 400% in the last two years.
  • 64. Big Data Market Size Projection
  • 65. Future of Big Data
  • 66. Career Opportunities
    Direct opportunities:
    o Internet products and services companies
    o Manufacturing companies
    o Banking and finance
    o Pharma
    o Government departments
    Indirect opportunities:
    o Handling outsourced Big Data analysis and development projects for the above organizations.
  • 67. Structured v/s Unstructured Data
    Structured data:
    o Customer databases
    o E-commerce / web commerce records
    o Legacy BI / CRM / ERP systems
    o Inventory and supply chain
    Unstructured data:
    o Web server and search engine logs (“data exhaust”)
    o Logs from other types of servers (e.g., telecom switches and gateways)
    o Social media / gaming messages
    o Multimedia: voice, video, images
    o Sensor data / M2M communications
  • 68. Structured v/s Unstructured
    o Representation. Structured: discrete (rows and columns). Unstructured: binary large objects with less-defined boundaries that are less easily addressable; small discrete objects, i.e., information represented for a very specific purpose (e.g., an SMTP mail message).
    o Storage / persistence. Structured: DBMS or file formats (e.g., VSAM). Unstructured: unmanaged file structure or content repository.
    o Metadata focus. Structured: syntax (e.g., location and format). Unstructured: semantics (descriptive and other markup).
    o Integration tools. Structured: ETL or ELT, Enterprise Information Integration via BizTalk, and batch processing. Unstructured: batch processing, manual data entry, and custom solutions that involve a lot of code.
    o Standards. Structured: SQL (and its multiple variations), ADO.NET, ODBC; many RDBMSs also support XML as another option. Unstructured: Open XML, SMTP, SMS, CSV, and Information and Content Exchange.
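A common first step with unstructured sources such as web server logs is to give them structure by extracting fields into records. The sketch below assumes the input follows the Apache "Common Log Format" and parses it with a regular expression. The sample lines are made up, and lines that do not match are dropped rather than guessed at.

```python
import re

# Regex for the Apache "Common Log Format" (assumed input format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Turn one raw log line into a structured record, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])  # "-" means no body
    return record


lines = [
    '10.0.0.1 - - [10/Nov/2012:13:55:36 +0530] "GET /search?q=bus HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Nov/2012:13:55:40 +0530] "POST /book HTTP/1.1" 500 -',
    'corrupted line without the expected fields',
]
records = [rec for rec in (parse_line(line) for line in lines) if rec is not None]
print(len(records), records[1]["status"])  # prints 2 500
```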
  • 69. Evolution of Data Transfer Rates (medium: transfer rate)
    o Modems: 56 kilobits/second
    o T-1 line: 1.544 megabits/second
    o Ethernet: 10 megabits/second
    o Fast Ethernet (LAN): 100 megabits/second
    o Gigabit Ethernet: 1 gigabit/second
    o T-3: 44.736 megabits/second
    o Optical fibres (dedicated): up to 20 gigabits/second
    o Next Internet backbone: 2.4 gigabits/second
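These link speeds explain why moving large data sets to a central location is often impractical. The sketch below estimates how long 1 TB would take to transfer over some of the listed links. It assumes the full nominal rate is available. The `efficiency` parameter is an illustrative knob for protocol overhead and does not correspond to any measured value.

```python
# A few link speeds from the slide, in bits per second.
LINK_BPS = {
    "56k modem": 56e3,
    "T-1 line": 1.544e6,
    "T-3 line": 44.736e6,
    "Fast Ethernet": 100e6,
    "Dedicated optical fibre (up to)": 20e9,
}

ONE_TB = 1024 ** 4  # bytes, using the slide's 1,024-based units


def transfer_seconds(num_bytes: float, bits_per_second: float, efficiency: float = 1.0) -> float:
    """Time to move `num_bytes` over a link; `efficiency` < 1 models protocol overhead."""
    return num_bytes * 8 / (bits_per_second * efficiency)


for name, bps in LINK_BPS.items():
    hours = transfer_seconds(ONE_TB, bps) / 3600
    print(f"1 TB over {name}: about {hours:,.1f} hours")
```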
  • 70. History of Analytics
  • 71. Prior to the 80’s: E-commerce did not exist
  • 72. Prior to the 80’s: Data entry, storage, and processing were sequential and displaced in time
  • 73. Prior to the 80’s Data was processed by monolithic applications running on mainframes Batch processing was the norm Copyright ©2013. MindMap IT Solution (P) Ltd. All right reserved. 73
  • 74. Prior to the 80’s Data processing was used in non-time-critical areas such as payroll, accounting Copyright ©2013. MindMap IT Solution (P) Ltd. All right reserved. 74
  • 75. Prior to the 80’s Only large enterprises and institutions could afford the cost of processing data Copyright ©2013. MindMap IT Solution (P) Ltd. All right reserved. 75
  • 76. Prior to the 80’s Data processing could only support long term analysis and decision making processes – such as planning Copyright ©2013. MindMap IT Solution (P) Ltd. All right reserved. 76
  • 77. Hadoop: Value-Adding Projects/Products
    1. HBase
    2. Cassandra
    3. Mongo
    4. CouchDB
  • 78. Projects/Products Adding Value to Hadoop
    HBase: The standard Hadoop database, an open-source, distributed, versioned, column-oriented store providing Bigtable-like capabilities over Hadoop. HBase includes base classes for backing Hadoop MapReduce jobs; query predicate push down; optimizations for real-time queries; a Thrift gateway and a RESTful web service supporting XML, Protobuf, and binary data encoding; an extensible JRuby-based (JIRB) shell; and support for the Hadoop metrics subsystem. Like Hadoop, HBase is an Apache project, hosted at http://hbase.apache.org/
    Cassandra: Apache Cassandra is a highly scalable, second-generation distributed database that brings together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. The Cassandra project lives at http://cassandra.apache.org/. A good example of using Cassandra together with Hadoop is the DataStax Brisk platform; learn more at http://www.datastax.com/
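The HBase description above (distributed, versioned, column-oriented, Bigtable-like) rests on a simple logical data model. Rows are sorted by key. Cells are addressed by a "family:qualifier" column name, and each cell keeps several timestamped versions. The Python sketch below is a toy, single-process illustration of that model, written under our own assumptions. `VersionedTable`, `put`, `get`, and `scan_rows` are hypothetical names, not the HBase client API. Nothing here is distributed or persisted.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class VersionedTable:
    """Toy in-memory model of a versioned, column-oriented table (NOT the HBase API)."""

    def __init__(self) -> None:
        # row key -> column ("family:qualifier") -> list of (timestamp, value), sorted by timestamp
        self._cells: Dict[str, Dict[str, List[Tuple[int, str]]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        """Store a new version of a cell; older versions are kept, not overwritten."""
        bisect.insort(self._cells[row][column], (ts, value))

    def get(self, row: str, column: str, as_of: Optional[int] = None) -> Optional[str]:
        """Return the newest value with timestamp <= as_of (or the newest overall), else None."""
        versions = self._cells.get(row, {}).get(column, [])
        for ts, value in reversed(versions):
            if as_of is None or ts <= as_of:
                return value
        return None

    def scan_rows(self, start: str, stop: str) -> List[str]:
        """Row keys in [start, stop) in sorted order (this toy sorts at scan time)."""
        return [row for row in sorted(self._cells) if start <= row < stop]


if __name__ == "__main__":
    table = VersionedTable()
    table.put("user#1", "info:city", "Pune", ts=1)
    table.put("user#1", "info:city", "Mumbai", ts=5)
    table.put("user#2", "info:city", "Delhi", ts=2)
    print(table.get("user#1", "info:city"))           # Mumbai
    print(table.get("user#1", "info:city", as_of=3))  # Pune
    print(table.scan_rows("user#1", "user#3"))        # ['user#1', 'user#2']
```

Keeping old versions instead of overwriting them is what makes "as of" reads possible. Keeping rows ordered by key is what makes range scans cheap in the real system.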
  • 79. Projects/Products Adding Value to Hadoop (cont.)
    Mongo: An open-source, scalable, high-performance, schema-free, document-oriented database written in C++. The MongoDB project is hosted at http://www.mongodb.org/. To use Mongo and Hadoop together, check out https://github.com/mongodb/mongo-hadoop
    CouchDB: Apache CouchDB is a document-oriented database supporting queries and indexing in a MapReduce fashion using JavaScript. CouchDB provides APIs that can be accessed via HTTP requests to support web applications. Learn more at http://couchdb.apache.org/
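CouchDB's "queries and indexing in a MapReduce fashion" means a view is defined by two functions:

- A map function that emits (key, value) pairs from each document.
- An optional reduce function that combines the values collected under each key.

In CouchDB itself these functions are written in JavaScript. The Python sketch below only mimics the same shape to show the idea. `run_view`, `orders_by_customer`, and the sample documents are hypothetical and are not CouchDB's API. Python's built-in `sum` stands in for a summing reducer, similar in spirit to CouchDB's built-in `_sum`.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Document = Dict[str, Any]
MapFn = Callable[[Document], Iterable[Tuple[Any, Any]]]
ReduceFn = Callable[[List[Any]], Any]


def run_view(docs: Iterable[Document], map_fn: MapFn, reduce_fn: ReduceFn) -> Dict[Any, Any]:
    """Map every document to (key, value) pairs, group values by key, then reduce each group."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # A view index is ordered by key, so return results in sorted key order.
    return {key: reduce_fn(groups[key]) for key in sorted(groups)}


def orders_by_customer(doc: Document) -> Iterator[Tuple[str, int]]:
    """Hypothetical map function: emit (customer, amount) for order documents only."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["amount"]


if __name__ == "__main__":
    # Schema-free documents: fields differ from one document to the next.
    docs = [
        {"_id": "1", "type": "order", "customer": "acme", "amount": 120},
        {"_id": "2", "type": "order", "customer": "globex", "amount": 75},
        {"_id": "3", "type": "order", "customer": "acme", "amount": 30},
        {"_id": "4", "type": "note", "text": "no customer or amount fields here"},
    ]
    print(run_view(docs, orders_by_customer, sum))  # {'acme': 150, 'globex': 75}
```

The map function simply skips documents that do not fit the pattern. This is how a schema-free store can still be indexed and queried consistently.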
  • 80. Big Data Applications: Additional Ideas
    Balance sheet analysis; manufacturing data analysis; production systems diagnostics and pattern identification
  • 81. Thank you for your attention! Please feel free to ask any questions.