BUMPER
Understanding Big Data
Class 1
Introduction to Big Data
Understanding Big Data
Business Applications of Big Data
Technologies for handling Big Data
Big Data Management Systems – Databases &
Warehouses
Analytics & Big Data
Class 1
Introduction to Big Data
Topic 1
Class 1
Introduction to Big Data
Understanding Big Data
What is Big Data?
Topic 1 – Understanding Big Data
Structuring & Elements
Application in Business & Careers
DATA
Personal
Computers
Facebook
Twitter
YouTube
Google
ATMs
Drop Box
Picasa
2002: 5 exabytes of online data
2009: 281 exabytes of online data (a 56-fold increase)
What is Big Data?
A pool of large datasets for which related information must be captured, stored, searched, shared, transferred, analysed, and visualised within an acceptable elapsed time.
Data = Information
Information = Insight
• Every second, consumers make 10,000 payment card transactions worldwide
• Every hour, Walmart handles more than 1 million customer transactions
• Every day, Twitter users post 500 million tweets
• Every day, Facebook users post 2.7 billion likes and comments
BIG DATA
• Is a new data challenge that requires leveraging existing systems differently
• Is classified in terms of:
Volume (terabytes, records, transactions)
Variety (internal, external, behavioural, and/or social)
Velocity (near-real-time or real-time assimilation)
• Is usually unstructured and qualitative in nature
Advantages of Studying Big Data:
• Understanding target customers
• Cutting down expenditures in healthcare
• Increasing operating margins in retail
• Increasing profits through improvements in operational efficiency
Industries that Benefit:
• Sports
• Science and Research
• Security and Law Enforcement
• Financial Trading
Departments that can Benefit:
• Procurement
• Product Development
• Manufacturing
• Distribution
• Marketing
• Price Management
• Merchandising
• Sales
• Store operations
• Human Resources
Flu Indications & Warnings
Massive Data Collection
Analyse Collected Data
Early Warnings for Flu Plague
Social Data from Networking Sites
reveals Behavioural Patterns
Use Big Data for Growth & Value Addition
RECAP
What is Big Data, its advantages and various
sources
BUMPER
Topic 1
Class 1 - Introduction to Big Data
Understanding Big Data
What is Big Data?
Class 1 - Introduction to Big Data
Structuring & Elements
Application in Business & Careers
How do I choose a book from the millions available on my favorite sites or stores?
How can I use the vast amount of data and information I come across?
How do I keep myself updated on events and news?
Which news articles should I read?
Sources of Data:
• Internal: organisational or enterprise data
• External: social data from the internet or government
Formats of Data:
• Structured Data
• Unstructured Data
• Semi-Structured Data
Structured Data
Features of Structured Data:
• Has a predefined format
• Resides in fixed fields within a record
• Has its attributes mapped
• Is used to report against predetermined data types
Sources of Structured Data:
• Relational databases
• Flat files in record format
• Multidimensional databases
• Legacy databases
Unstructured Data
Sources of Unstructured Data:
• Organisational Data
• Social Media
• Mobile Data
Challenges of Using Unstructured Data:
• Making sense of the data is difficult and time-consuming
• Combining and linking unstructured data to more structured information is difficult
• Costs rise because of wasted storage and the human resources needed
Semi-Structured Data
Sources of Semi-Structured Data:
• Database systems
• File systems like Web data and bibliographic data
• Data exchange formats like scientific data
Sl. No | Name | E-mail
1. | Sam Jacobs | smj@xyz.com
2. | First Name: David, Last Name: Brown | davidb@xyz.com
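In the table above, one row stores the name as a single value while the other splits it into First Name and Last Name, which is typical of semi-structured data. The following is a minimal Python sketch of handling both shapes, assuming the rows arrive as dictionaries; the `records` list and the `full_name` helper are illustrative names, not part of the course material.

```python
# Hypothetical records mirroring the table above: the "name" field is
# a plain string in one record and a nested mapping in the other.
records = [
    {"sl_no": 1, "name": "Sam Jacobs", "email": "smj@xyz.com"},
    {"sl_no": 2,
     "name": {"first_name": "David", "last_name": "Brown"},
     "email": "davidb@xyz.com"},
]


def full_name(record):
    """Return the name as one string, whatever shape it was stored in."""
    name = record["name"]
    if isinstance(name, dict):
        return f"{name['first_name']} {name['last_name']}"
    return name


for record in records:
    print(record["sl_no"], full_name(record), record["email"])
```

Structured data would not need the `isinstance` check, because every record would follow the same predefined format.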
Volume
Velocity
Variety
Big Data Application In
Business Analytics
What are the areas where
Big Data can be applied?
Transportation
Provides improved traffic
information and autonomous
features
Education
Through innovative approaches for
teachers to analyze students
Travel
Apply analytics to pricing,
inventory, and advertising to
improve customer experiences
Governments
To make informed decisions for
fraud management, discover
unknown threats, ensure security
of global supply chain
Healthcare
To establish clinical protocols that ensure the best health outcomes for patients
Careers in Big Data
BIG Career Opportunities
Major Big Data Hiring Companies:
Product companies, e.g., Oracle
Technology drivers, e.g., Google
Services companies, e.g., EMC
Data analytics companies, e.g., Splunk
The most common job titles in Big Data include:
Big Data Analyst
Big Data Scientist
Big Data Developer
Module 1 (common to both tracks): Introduction to Big Data
Big Data Analyst Certification Track
Module 2: Introduction to Analytics & R Programming
Module 3: Data Analysis Using R
Module 4: Advanced Analytics Using R
Module 5: Machine Learning Concepts
Module 6: Social Media, Mobile Analytics & Visualisation
Module 7: Industry Applications of Big Data Applications
Big Data Developer Certification Track
Module 2: Managing a Big Data Ecosystem
Module 3: Storing & Processing Data: HDFS & MapReduce
Module 4: Increasing Efficiency with Hadoop Tools
Module 5: Additional Hadoop Tools: ZooKeeper, Sqoop, Flume, YARN & Storm
Module 6: Leveraging NoSQL & Hadoop: Real Time, Security & Cloud
Module 7: Commercial Hadoop Distribution & Management Tools
Complete Project
Wrox Certified Big Data Analyst/Developer
Technical Skills Required for a Big Data Analyst:
• Handling and analysing massive data sets using MapReduce
• Hadoop and its components HBase and Hive
• SQL and NoSQL querying, using tools such as Impala, Hive, and Pig
• Analytical tools such as SAS, R, and Tableau
• Statistical techniques to implement text analytics solutions
• Data handling and manipulation techniques
• Generating client-ready dashboards, reports, and visualizations
Soft Skills Required:
• Strong written & verbal communication skills
• Analytical Ability
• Basic understanding of how a business works
Future of Big Data
RECAP
• What are the various types and structures of Big Data and the elements that form it
• What are the business applications of Big Data and the associated career opportunities
BUMPER
BIG DATA
Topic 2
Business Applications of Big Data
Class 1: Introduction to Big Data
Social Media
Topic 2
Business Applications of Big Data
Significance of Social Network Data
Financial Fraud & Big Data
Fraud Detection in Insurance
Use in Retail Industry
Significance of Social Network Data
What is Social Network Data?
What is Social Network Analysis?
What are the uses of Social Network
Data Analysis?
What is Sentiment Analysis?
DATA
Social Media
AGE
GENDER
LOCATION
Social Network Analysis (SNA)
Social Network DATA Analysis
Total Number of calls
Total Number of SMS
Structure of a Caller’s Social Network
Social Network Site
Social Network Analysis: a Big Data Problem
Social Network Analysis (SNA)
Business Intelligence
Marketing
Product Design & Development
Customer Relationship Management (CRM)
A: E, F
B: A, D
C: H, O
Group A
Group GH
Social Network Data Analysis
• Provides new contexts in which decisions are data driven, not opinion driven
• Lets organizations shift their goals towards maximizing the profitability of a customer's network
• Lets organizations identify highly connected customers
• Lets organizations lure highly connected customers with free trials and solicit their feedback
• Lets organizations encourage internal customers to become more active
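Identifying highly connected customers can be sketched by counting each customer's direct connections. This is only an illustration, assuming the network is available as pairs of customers who contact each other; the `contacts` data, the function names, and the `min_connections` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical "who contacts whom" pairs; the customer names are made up.
contacts = [
    ("asha", "ben"), ("asha", "chen"), ("asha", "dev"),
    ("ben", "chen"), ("dev", "eli"),
]


def connection_counts(pairs):
    """Count distinct direct connections per customer (the node's degree)."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {customer: len(links) for customer, links in neighbours.items()}


def highly_connected(pairs, min_connections):
    """Return customers with at least min_connections links, most connected first."""
    counts = connection_counts(pairs)
    chosen = [c for c, n in counts.items() if n >= min_connections]
    return sorted(chosen, key=lambda c: (-counts[c], c))


print(highly_connected(contacts, min_connections=2))
```

Counting direct connections is the simplest possible measure; a real analysis would likely weigh how influential each connection is as well.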
Social Data Analysis
Analyze Media Communication
System DATA
Product Development
and Offerings
Sentiment Analysis
Marketers
Business
Professionals
Followers
3,46,259
Followers
2,73,591
Likes
But is one of the most disliked airlines. Why?
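Follower and like counts alone do not show whether people speak well of a brand; sentiment analysis looks at what the posts actually say. The sketch below is a deliberately rough illustration using word lists. The word lists and sample posts are assumptions made up for this example, and production sentiment analysis relies on far richer models.

```python
# Tiny hand-made word lists; both lists and the sample posts are
# illustrative assumptions, not data from the course.
POSITIVE = {"great", "love", "friendly", "comfortable"}
NEGATIVE = {"delayed", "lost", "rude", "cancelled"}


def sentiment_score(post):
    """Positive-word count minus negative-word count for one post."""
    words = [w.strip(".,!?").lower() for w in post.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


posts = [
    "Flight delayed again and the staff were rude!",
    "They lost my bag.",
    "Great crew, friendly service.",
]

scores = [sentiment_score(p) for p in posts]
negative_share = sum(s < 0 for s in scores) / len(scores)
print(scores, f"{negative_share:.0%} of posts are negative")
```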
RECAP
What is social network data and analysis
What are its uses and values
BUMPER
Topic 2
Business Applications of Big Data
Significance of Social Network Data
Financial Fraud & Big Data
Fraud Detection in Insurance
Use in Retail Industry
BANK
Common Financial Frauds
Credit Card Frauds
Exchange or Return Policy Fraud
Personal Information Fraud
Prevent Frauds
• Understand customers' ordering patterns
• Watch out for red flags
Big Data
Analyzing a small data sample: cannot reveal the various patterns of fraud
Analyzing a large data sample: can reveal the various patterns of fraud
• Traditionally, the sample size could not be increased without huge investments in time and money
• Big Data techniques can overcome this challenge
Big Data analytics can…
• Run checks on all data to identify fraudulent transactions
• Identify new methods of fraud and add them to the set of fraud-prevention checks
• Avoid impeding customers with unnecessary policies and governance structures
Fraud Detection in Real Time
BIG DATA
live transactions sources of data
BIG DATA
Historical Data
Indicate fraud
patterns
Checks to prevent
real-time fraud
Real-time Analysis
BIG DATA
Create
comparisons
Drawing Maps &
Graphs
Decisions and
effective systems
BLOCK FRAUD
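The flow above, in which historical data indicates patterns that become checks on live transactions, can be sketched as follows. The rule used here (flag any amount more than three times the customer's historical average) is an assumption chosen for illustration, as are the data and the function names; it is not the method described in the course.

```python
from statistics import mean

# Hypothetical historical card transactions: customer id -> past amounts.
history = {
    "c1": [20.0, 35.0, 25.0],
    "c2": [400.0, 380.0, 420.0],
}


def build_profiles(past):
    """Derive a simple pattern from historical data: average spend per customer."""
    return {customer: mean(amounts) for customer, amounts in past.items()}


def is_suspicious(profiles, customer, amount, factor=3.0):
    """Flag a live transaction that is far above the customer's usual spend.

    Customers with no history are flagged so that a person can review them.
    """
    usual = profiles.get(customer)
    if usual is None:
        return True
    return amount > factor * usual


profiles = build_profiles(history)
live = [("c1", 30.0), ("c1", 500.0), ("c2", 500.0), ("c9", 10.0)]
for customer, amount in live:
    action = "BLOCK" if is_suspicious(profiles, customer, amount) else "allow"
    print(customer, amount, action)
```

The same amount (500.0) is blocked for one customer and allowed for another, which is the point of checking each transaction against that customer's own history.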
Topic 2
Business Applications of Big Data
The Significance of Social Network Data
Financial Fraud and Big Data
Fraud Detection in Insurance
Use of Big Data in the Retail Industry
Insurance Company
• Needs to improve its ability to make decisions in real time when processing a new claim, thereby reducing the claim cycle time
• Incurs a steady increase in the cost of litigation and fraudulent claims
• Has underwriters who lack the required data at the right time to make the necessary decisions, which further delays processing
BIG DATA
Social Media
Data
Note for
underwriter
Social Media Triggers to Identify Fraud
In the claim, a customer might indicate that his or her car was destroyed in a flood.
Documentation from the social media feed shows that the car was actually in another city on the day the flood occurred.
Such glaring discrepancies indicate FRAUD.
Insurance Frauds
• Have a huge cost implication for organizations
• Lead organizations to prefer Big Data analytics and other advanced technologies
• Reducing them has a positive impact on customers, because fraud losses are otherwise passed on as higher premiums
Big Data analytics
platform
Organizations are now able to analyze complex information
and accident scenarios in minutes rather than days or months
INSURANCE
• Insurers typically analyze only small samples of data
• The method relies on previously recorded fraud cases
• Every time a fraud based on a new technique occurs, insurance companies have to bear the consequences and the losses the first time
• The traditional method of identifying fraud works in independent silos
• It cannot handle information from different channels and different functions in an integrated way
Fraud Detection Methods
Statistical Models
Public
Data
Bank Statements
Legal Judgments
Criminal Records
Medical Bills
Social Network Analysis (SNA)
Big Data can be used to create visibility into blind spots
for businesses
SNA is an innovative and effective way to identify and
detect frauds
SNA tool uses a mix of analytical methods
• Statistical methods
• Pattern analysis
• Link analysis
When link analysis is used in fraud detection, it:
• Looks for clusters of data
• Examines how those data clusters are linked to other data clusters
• Integrates data sources such as public records into a model
• Lets the insurer rate claims
If the rating is high, it indicates that the claim may be fraudulent, for example because of:
• a known bad address
• a suspicious provider
• a vehicle that was involved in many accidents with multiple carriers
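Rating a claim from the kinds of links listed above could look like the sketch below. It is a simplified illustration: the one-point-per-red-flag rule, the reference sets, the `MAX_ACCIDENTS` cut-off, and all sample values are assumptions, not an insurer's actual model.

```python
# Hypothetical reference data an insurer might hold; all values are made up.
KNOWN_BAD_ADDRESSES = {"12 Elm Street"}
SUSPICIOUS_PROVIDERS = {"QuickFix Garage"}
MAX_ACCIDENTS = 3  # assumed cut-off for "many accidents"


def rate_claim(claim, accidents_by_vehicle):
    """Return a rating from 0 to 3: one point per red flag linked to the claim."""
    rating = 0
    if claim["address"] in KNOWN_BAD_ADDRESSES:
        rating += 1
    if claim["provider"] in SUSPICIOUS_PROVIDERS:
        rating += 1
    if accidents_by_vehicle.get(claim["vehicle"], 0) > MAX_ACCIDENTS:
        rating += 1
    return rating


accidents = {"KA-01-1234": 5, "MH-02-9999": 1}
claims = [
    {"id": 1, "address": "12 Elm Street", "provider": "QuickFix Garage",
     "vehicle": "KA-01-1234"},
    {"id": 2, "address": "7 Oak Avenue", "provider": "City Motors",
     "vehicle": "MH-02-9999"},
]

for claim in claims:
    print(claim["id"], rate_claim(claim, accidents))
```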
How fast does data arrive?
How much unneeded data is there when it arrives?
How deep should the analysis be to produce the most accurate results?
What type of user interface components need to be included on the SNA dashboard?
SNA method to detect fraud:
• Structured and unstructured data from various sources is fed into the ETL (Extract, Transform, and Load) tool
• This data is then transformed and loaded into a data warehouse
• The analytics team uses information from various sources to score the risk of fraud and rank its likelihood
• The information used can come from varied sources, such as prior beliefs, previous relationships, and the number of rejected claims
• Big Data technologies such as text mining, sentiment analysis, content categorization, and social network analysis are incorporated into the fraud detection and predictive modeling mechanism
• Depending on the score of a particular network, an alert is generated
• Investigators can use this information to research the fraudulent claim further
• Identified fraud issues are added to the case system
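The scoring and alerting steps can be sketched by linking claims that share a detail into networks and alerting on large ones. This is a hedged illustration only: using network size as the score, the `min_size` threshold, the choice of phone and address as linking keys, and the sample claims are all assumptions.

```python
from collections import defaultdict

# Hypothetical claims already loaded into the warehouse; values are made up.
claims = [
    {"id": "A", "phone": "555-0101", "address": "12 Elm Street"},
    {"id": "B", "phone": "555-0101", "address": "9 Pine Road"},
    {"id": "C", "phone": "555-0199", "address": "9 Pine Road"},
    {"id": "D", "phone": "555-0142", "address": "3 Lake View"},
]


def build_networks(records, keys=("phone", "address")):
    """Group claims into networks: claims sharing any key value are linked."""
    by_value = defaultdict(list)
    for rec in records:
        for key in keys:
            by_value[(key, rec[key])].append(rec["id"])

    neighbours = defaultdict(set)
    for ids in by_value.values():
        for a in ids:
            neighbours[a].update(i for i in ids if i != a)

    networks, seen = [], set()
    for rec in records:
        if rec["id"] in seen:
            continue
        # Walk outwards from this claim to collect everything linked to it.
        stack, members = [rec["id"]], set()
        while stack:
            current = stack.pop()
            if current in members:
                continue
            members.add(current)
            stack.extend(neighbours[current] - members)
        seen |= members
        networks.append(sorted(members))
    return networks


def alerts(networks, min_size=3):
    """Use network size as a stand-in score; alert on networks at or above min_size."""
    return [net for net in networks if len(net) >= min_size]


networks = build_networks(claims)
print(networks)
print(alerts(networks))
```

Claims A and C share nothing directly, yet they land in the same network through B, which is the kind of indirect link that analysing claims in isolation would miss.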
Predictive analysis rests on the idea that the earlier fraud is detected, the smaller the loss incurred by a business.
Fraud detection
BIG DATA
Text analytics Sentiment analysis
Predictive analytics
Predictive Analytics
Technology
Claims adjusters write lengthy reports while investigating a claim.
Clues are hidden in these reports that a claims adjuster would not notice.
A computing system based on business rules highlights clues to possible fraud.
The fraud detection system spots these discrepancies and flags the claim as fraudulent.
Customer Relationship
Management (CRM)
A Social CRM process works as follows:
• It uses the organization’s existing CRM to gather data from various social media platforms
• It uses a “listening” tool to extract data from social chatter, which acts as reference data for the existing data in the organization’s CRM
• The reference data, along with the information stored in the CRM, is fed into a case management system
• The case management system analyzes the information on the basis of the organization’s business rules and sends a response
• Investigators confirm the case management system’s response on a fraudulent claim
Class 1: Introduction to Big Data
The Significance of Social Network Data
Financial Fraud and Big Data
Fraud Detection in Insurance
Use of Big Data in Retail Industry
Use of Big Data in Retail Industry
BIG DATA
How many basic tees did we sell today?
What time of the year do we sell the most leggings?
What else has customer X bought?
What kind of coupons can we send to customer X?
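Questions like these come down to filtering and aggregating sales records. Below is a small sketch over an in-memory list, assuming each sale is a dictionary; the `sales` data, the dates, and the function names are hypothetical, and at Big Data scale the same logic would run on a distributed platform rather than a Python list.

```python
from collections import Counter

# Hypothetical sales records; every value is made up.
sales = [
    {"date": "2015-03-01", "customer": "X", "item": "basic tee", "qty": 2},
    {"date": "2015-03-01", "customer": "Y", "item": "basic tee", "qty": 1},
    {"date": "2015-03-01", "customer": "X", "item": "leggings", "qty": 1},
    {"date": "2015-11-20", "customer": "Z", "item": "leggings", "qty": 4},
]


def units_sold(records, item, date):
    """How many units of an item were sold on a given date?"""
    return sum(r["qty"] for r in records if r["item"] == item and r["date"] == date)


def best_month(records, item):
    """Which month (MM) accounts for the most units of an item?"""
    by_month = Counter()
    for r in records:
        if r["item"] == item:
            by_month[r["date"][5:7]] += r["qty"]
    return by_month.most_common(1)[0][0]


def items_bought(records, customer):
    """What has a given customer bought?"""
    return sorted({r["item"] for r in records if r["customer"] == customer})


print(units_sold(sales, "basic tee", "2015-03-01"))
print(best_month(sales, "leggings"))
print(items_bought(sales, "X"))
```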
In-store Sales Online Sales
Most Big Data is neither required nor useful:
• some information will have long-term strategic value
• some will be useful only for immediate and tactical use
• some data won’t be used for anything at all
Use of RFID Data in Retail
(Radio Frequency Identification)
An RFID tag is a small tag that carries a unique code identifying a product, much like a UPC code. The tag is placed on shipping pallets or product packages.
Beyond what a bar code offers, an RFID tag:
• Identifies a pallet as allotted to a precise and exclusive set of computer systems
• Helps find situations where an item has no units left in the store
• Reports the number of units of each item remaining in the store, and thereby raises an alarm when restocking is required
• Enables better tracking by distinguishing products that are out of stock from products that are available on the shelf
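The restocking alarm described above can be sketched as a count of unique tags per item compared against a threshold. The `REORDER_LEVEL` value, the tag ids, and the function names are assumptions for illustration, not part of any RFID standard or product.

```python
from collections import Counter

REORDER_LEVEL = 2  # assumed threshold below which restocking is requested


def units_on_shelf(tag_reads):
    """Count units per item from RFID reads given as (tag_id, item) pairs.

    Each tag is unique, so repeated reads of the same tag count once.
    """
    unique_tags = {tag_id: item for tag_id, item in tag_reads}
    return Counter(unique_tags.values())


def restock_alerts(catalogue, tag_reads, reorder_level=REORDER_LEVEL):
    """Return items that are out of stock or running low, with their counts."""
    counts = units_on_shelf(tag_reads)
    return {item: counts[item] for item in catalogue if counts[item] < reorder_level}


# Hypothetical reads from a shelf scanner; tag ids and items are made up.
reads = [("t1", "basic tee"), ("t2", "basic tee"), ("t2", "basic tee"),
         ("t3", "leggings")]
catalogue = ["basic tee", "leggings", "socks"]
print(restock_alerts(catalogue, reads))
```

An item with a count of zero is out of stock, while a low but non-zero count means it is still on the shelf and due for restocking.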
Use of RFID Data in Retail
• saves time
• reduces labor
• enhances the visibility of
products throughout the
production-delivery life cycle
• saves costs
RECAP
• What is the significance of social network data, financial fraud detection, fraud detection in insurance, and the use of Big Data in the retail industry
• What are the uses of RFID data in retail and its advantages
BUMPER
Topic 3
Class 1 - Introduction to Big Data
Technologies for Handling Big Data
Distribution & Computing for Big Data
Topic 3 – Technologies for Handling Big Data
Introducing Hadoop
Cloud Computing & In-Memory Technologies
for Big Data
DATA
PROCESSING
Analysed
Distributed & Parallel Computing
BIG DATA
HADOOP
CLOUD
In-Memory
Computing
Transmitter
Receiver
Hello?
I can’t hear
you…
Issues caused by Latency:
• Slowdown in system performance
• Data management
• Internal organisational communication
• External communication
Distributed and Parallel processing techniques process large amounts of data and also deal with latency.
Distributed System
A collection of independent computer systems that are connected via a network to accomplish a specific task.
Parallel System
A computer system that has
multiple processing units attached to it.
Parallel Computing Techniques
• Clusters or Grids
• Massively Parallel Processing (MPP)
• High-Performance Computing (HPC)
Public Cloud vs Private Cloud
Features of Hadoop:
• Works on multiple machines without sharing memory
• Distributes data over different servers
• Can track data stored on different servers
• Runs all available servers in parallel
• Keeps multiple copies of data
Hadoop Cluster
Gateway Node
Switch
Server 1 Server 2 Server 3 Server 4 Server 5
MapReduce
How does Hadoop work?
• Data of an organisation is loaded into the Hadoop software
• Data is divided into different pieces & sent to different servers
• Hadoop keeps track of the data by sending a job code to all the servers that store the relevant piece of data
• Each server applies the job code to the portion of data stored on it and returns results
Indexing Job
Hadoop Software
Server 1 Server 2 Server 3
Job Code 1 +
Processing Data
Job Code 2 +
Processing Data
Job Code 3 +
Processing Data
Result
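The split, process, and combine flow in the diagram can be imitated in plain Python. This sketch does not use Hadoop or its APIs: it simulates the servers inside one process, runs the pieces one after another rather than in parallel, and borrows the `user_id` and `city_name` field names from this topic's example. The records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical call records; all values are made up.
records = [
    {"user_id": 1, "city_name": "Pune"},
    {"user_id": 2, "city_name": "Delhi"},
    {"user_id": 3, "city_name": "Pune"},
    {"user_id": 4, "city_name": "Mumbai"},
    {"user_id": 5, "city_name": "Delhi"},
    {"user_id": 6, "city_name": "Pune"},
]


def split(data, servers):
    """Divide the data into one piece per (simulated) server."""
    return [data[i::servers] for i in range(servers)]


def job(piece):
    """The job code each server runs on its own piece: count calls per city."""
    return Counter(rec["city_name"] for rec in piece)


def run(data, servers=3):
    """Send the job to every piece and merge the partial results."""
    partial_results = [job(piece) for piece in split(data, servers)]
    total = Counter()
    for partial in partial_results:
        total += partial
    return dict(total)


print(run(records))
```

Because each piece is processed independently and the partial counts are simply added, the result does not depend on how many servers the data is split across.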
EXAMPLE:
• user_id
• user_name
• city_name
• service_provider_name
• call_time
RECAP
• Various aspects of distribution and computing for Big Data
• Hadoop as a technology for handling Big Data
BUMPER
Features of Cloud Computing:
• Scalability
• Elasticity
• Resource Pooling
• Self Service
• Low Costs
• Fault Tolerance
What are Cloud Deployment Models?
PRIVATE CLOUD
Categories of Cloud Services:
Other Amazon Web Services:
• Amazon Elastic MapReduce
• Amazon DynamoDB
• Amazon S3
• Amazon High-Performance Computing
• Amazon Redshift
Google Web Services:
• Google Compute Engine
• Google BigQuery
• Google Prediction API
Windows Azure
In-memory technology makes it possible for departments or business units to take the part of the organizational data that is relevant to their needs and process it locally.
RECAP
In this session we discussed cloud computing &
various in-memory technologies for handling Big Data.
BUMPER

Big data gaurav

Editor's notes

  1. Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.  
  2-6. Class 1 gives you an overview of Big Data. It introduces the concept and gives a broad overview of the business applications of Big Data. In addition, the module gives a broad understanding of the technology infrastructure that is required to store, handle, and manage Big Data. Finally, the module delves a little deeper into the Hadoop ecosystem as well as the MapReduce framework and explains how these popular frameworks support Big Data management. We’ll understand this in a lot more detail as we explore these topics.
  7. The first topic we will discuss today is what Big Data is, along with its advantages and sources.
  8-10. We will divide this topic into three broad parts: the history and evolution of Big Data; the structuring of Big Data and the elements that comprise it; and the application of Big Data in business analytics, together with the career opportunities associated with studying Big Data.
  11. If you think of the world around you, there is an enormous amount of data generated, captured, and transferred through various media—within seconds. This data may come from a personal computer, social networking sites, transaction or communication system of an organization, ATMs, and multiple other channels.
  12. Some reports have recorded that in 2002 there was an estimated 5 exabytes of online data in existence. Each exabyte is a massive 1,000,000 terabytes (TB). By 2009, that number had risen to 281 exabytes, roughly a 56-fold increase, and it has multiplied exponentially since 2009. This data is created in the form of posts, pictures, videos, and weather information.
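
The figures in this note can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 exabyte = 1,000,000 terabytes) and uses only the two estimates quoted above; the variable names are illustrative.

```python
# Decimal (SI) units are assumed: 1 exabyte = 1,000,000 terabytes.
TB_PER_EB = 1_000_000

# Estimated online data in exabytes, as quoted in the note above.
online_data_eb = {2002: 5, 2009: 281}

# Growth factor between the two estimates.
growth_factor = online_data_eb[2009] / online_data_eb[2002]

# The 2009 estimate expressed in terabytes.
data_2009_tb = online_data_eb[2009] * TB_PER_EB

print(f"Growth from 2002 to 2009: about {growth_factor:.1f} times")
print(f"2009 volume: {data_2009_tb:,} TB")
```

The ratio comes out slightly above 56, which is where the "56 times" figure on the slide comes from.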
  13. This accumulation results in a continuous generation of an enormous volume of data, which if analyzed intelligently, can be of immense value, as it can give us a variety of critical information to make smarter decisions. In other words, careful analysis can transform this data to information, and information to insight.
  14. The need to analyze and present this critical data in a systematic and comprehensive manner has led to the rise of a much-discussed term, which is also the pivot of this course: Big Data.
  15-17. Big Data is a pool of large-sized datasets to capture, store, search, share, transfer, analyze, and visualize related information or data within an acceptable elapsed time.
  18-19. Big Data assimilation is the process of examining large amounts of data to gain insight.
  20-23. As data continues to grow, it needs to be organized and made available so that it can be used as an information source. Earlier, due to a lack of access to data and of the means to process it, the potential of Big Data remained mostly untapped.
  24-26. There are three main factors to consider when talking about Big Data, so let's take a quick look at each of them.
      • It is a new kind of data. It is a challenge because it requires leveraging different systems differently.
      • It is classified in terms of Volume, Variety, and Velocity. Volume refers to the amount of data. Variety refers to its type: internal or external, behavioural or social. Velocity refers to how quickly it is assimilated, that is, how near to real time it is. We will look at these concepts in more detail in later classes.
      • Lastly, Big Data is largely unstructured and qualitative in nature, hence the name BIG data.
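
One way to make the Volume, Variety, and Velocity classification concrete is to record the three properties for each data source. The sketch below is illustrative only: the class names, the example sources and their figures, and the 60-second cut-off for "near real time" are all assumptions, not part of the course material.

```python
from dataclasses import dataclass
from enum import Enum


class Variety(Enum):
    """The types of data named in the note above."""
    INTERNAL = "internal"
    EXTERNAL = "external"
    BEHAVIOURAL = "behavioural"
    SOCIAL = "social"


@dataclass(frozen=True)
class DataSourceProfile:
    """Hypothetical profile describing one data source by its three Vs."""
    name: str
    volume_tb: float        # Volume: amount of data, in terabytes
    variety: Variety        # Variety: type of data
    latency_seconds: float  # Velocity: delay between generation and availability

    def is_near_real_time(self, threshold_seconds: float = 60.0) -> bool:
        # The 60-second threshold is an arbitrary assumption for illustration.
        return self.latency_seconds <= threshold_seconds


# Made-up example sources.
posts = DataSourceProfile("social media posts", 12.0, Variety.SOCIAL, 2.0)
ledger = DataSourceProfile("monthly ledger export", 0.5, Variety.INTERNAL, 86_400.0)

for source in (posts, ledger):
    print(source.name, source.variety.value, source.is_near_real_time())
```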
  27. Big Data is a new kind of challenge because, besides its enormous implications, its significance is constantly increasing with the growth in data. Today, Big Data can mean anything from a single terabyte to a petabyte or an exabyte of data.
  28. The systematic study of Big Data across sectors and geographies can lead to results such as: understanding target customers better; cutting expenditures in the healthcare sector; increasing operating margins in the retail sector; and saving several billions of dollars through improvements in operational efficiency.
  32. Across industries, data along with analytics can transform major business processes in various ways, such as: improving performance in sports by analyzing and tracking performance and behavior; improving science and research; improving security and law enforcement by enabling better monitoring; and improving financial trading by enabling more informed decisions.
  36. Across organizations, the right analysis of available data can transform major business processes in various ways like in …
  46. Google applied its massive data-collecting power to raise warnings of flu outbreaks approximately two weeks in advance of the existing public services. To do this, Google monitored millions of users’ health-tracking behaviors and followed a cluster of queries on themes such as flu symptoms, chest congestion, and buying a thermometer. Google analyzed the collected data and generated consolidated results that gave strong indications of flu levels across America.
  47. Besides the more obvious reference to volume, Big Data also gets its name from the many types and sources of data. Let’s look at some of the source types of data and their usage.   Think of social data from sources like Facebook or Twitter, and how much it can tell us about the people using them and their behavioral patterns. Or think of data like GPS output, which can track our movements across the globe; that’s machine data. There is also transactional data, generated when we order a new pair of shoes online or buy a pizza.
  48. The need for Big Data is evident. If leaders and economies want exemplary growth and wish to generate value for all their stakeholders, Big Data has to be embraced and used extensively.
  49. Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.  
  50. The first topic we will discuss today is what Big Data is, and what its advantages and sources are.
  51. Now we will look at the structuring and elements of Big Data.
  54. In your daily life, you may have come across questions like:
  56. Today, solutions to such questions can be found by computers. Recommendation systems can analyze and structure a large amount of data specifically for you, on the basis of what you searched, what you looked at, and for how long—thus scanning and presenting you with customized information as per your behavior and habits. This is called structuring of data. This is what goes into play when your favorite shopping site presents you with a fantastically picked set of recommendations when you log in. It is when technology is used to study and analyze the data to understand user behavior, requirements, and preferences to make personalized recommendations for every individual.
  57. Data that comes from multiple sources—such as databases, Enterprise Resource Planning (ERP) systems, weblogs, chat history, and GPS maps—varies in its format. However, different formats of data need to be made consistent and clear to be used for analysis.
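As a rough illustration of making differently formatted data consistent, the sketch below maps two sources onto one common record layout. It is a minimal sketch only: the sample CSV export, the sample JSON weblog, and all field names are invented for this example and are not taken from the course material.

```python
import csv
import io
import json

# Hypothetical raw inputs from two sources; the contents are invented.
CSV_EXPORT = "customer_id,amount,date\n101,25.50,2014-03-01\n102,13.00,2014-03-02\n"
JSON_WEBLOG = '[{"user": "101", "spend": "7.25", "ts": "2014-03-03T10:15:00"}]'


def from_csv(text):
    """Map rows of the CSV export onto the common schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": int(row["customer_id"]),
            "amount": float(row["amount"]),
            "date": row["date"],
        }


def from_json(text):
    """Map weblog events onto the same schema, renaming fields as needed."""
    for event in json.loads(text):
        yield {
            "customer_id": int(event["user"]),
            "amount": float(event["spend"]),
            "date": event["ts"][:10],  # keep only the YYYY-MM-DD part
        }


def normalize():
    """Combine both sources into one list of consistently shaped records."""
    return list(from_csv(CSV_EXPORT)) + list(from_json(JSON_WEBLOG))


records = normalize()
for record in records:
    print(record)
```

Once every record has the same keys and types, the analysis code no longer needs to know which source a record came from.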
  61. Data acquired from various sources can be categorized primarily into two types of sources: internal sources, such as organizational or enterprise data, which can be used to support the business operations of an organization; and external sources, such as social data from the Internet or the government, which can be analyzed to formulate policy and understand the market, the environment, or technology.
  62. Have a look at the table on your screen. You’ll see that sources can be internal or external, but they usually provide 3 kinds of data … It’s when all 3 kinds of data come together that we can actually visualize what Big Data is. You’ll note that unstructured data is typically larger in volume than structured and semi-structured data. Let’s take a closer look at each of these data types.
  63. Structured data can be defined as a set of data with a defined repeating pattern. This pattern makes it easier for any program to sort, read, and process it. Obviously, processing of structured data is much faster than the processing of data without specific repeating patterns.
  64. Let’s take a quick look at a sample of structured data, in which the attribute data for every customer is stored with individual data points in the defined fields. From this, let’s try to derive a few features of structured data. Structured data: is organized data in a predefined format; is data that resides in fixed fields within a record or file; is formatted data that has entities and their attributes mapped; and is used to query and report against predetermined data types.
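The features listed above can be sketched in a few lines of code. This is a minimal sketch under stated assumptions: the `Customer` fields and the three sample customers are hypothetical and are not the sample shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """One record: every customer has the same fixed, typed fields."""
    customer_id: int
    name: str
    city: str
    age: int


# Hypothetical sample data, invented for illustration.
CUSTOMERS = [
    Customer(1, "Asha", "Pune", 34),
    Customer(2, "Ben", "Leeds", 27),
    Customer(3, "Chen", "Pune", 41),
]


def customers_in(city):
    """Query against a predetermined field: names of customers in a city."""
    return [c.name for c in CUSTOMERS if c.city == city]


def average_age():
    """Report against a predetermined numeric field."""
    return sum(c.age for c in CUSTOMERS) / len(CUSTOMERS)


print(customers_in("Pune"))
print(average_age())
```

Because every record has the same fields, queries and reports can be written once and applied to all records without inspecting each one first.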
  68. Unstructured data is a set of data with a complex structure that might or might not have a repeating pattern. It: consists typically of metadata; comprises inconsistent data; and consists of data in different formats such as e-mails, text, audio, video, or image files.
  69. Some sources of unstructured data include: text internal to an organization (think of documents, logs, emails, etc.); data from social media; and mobile data.
  72. A fantastic example of the usage of unstructured data is in supermarkets, where unstructured visual information from CCTV footage (such as where customers halt, how they behave during a bottleneck, and how they navigate through a store) is combined with structured data from bill counters and products to arrive at a complete data-driven picture of customer behavior. This can be used to create a better shopping experience for the customer and, of course, to generate more sales for the store.
  73. About 80 percent of enterprise data consists of unstructured content. Unstructured systems typically have little or no predetermined form and give users wide scope to structure data as they choose. Unstructured data therefore becomes the weapon of choice for gaining considerable competitive corporate advantage, and for gaining a more holistic, complete picture of future prospects.
  76. The table on your screen shows the result of a survey conducted to ascertain the challenges associated with unstructured data. The survey reveals that the volume of data is the biggest challenge followed by the infrastructure requirement to manage this volume. Managing unstructured data is also difficult because it is not easy to identify it.
  77. Semi-structured data, also known as schema-less or self-describing data, refers to a form of structured data that contains tags or markup elements in order to separate semantic elements and generate hierarchies of records and fields in the given data. This type of data does not follow the formal structure of the data models used in relational databases.
  78. To be organized, semi-structured data should be fed electronically from database systems, file systems, and through data exchange formats including scientific data and XML or eXtensible Markup Language. XML enables data to have an elaborate and intricate structure that is significantly richer and comparatively complex.
  81. An example of semi-structured data is shown on your screen, which indicates that entities that belong to the same class can have different attributes even if they are grouped together. Now that we have examined the way data arrives and is presented, let us move on to the elements that characterize this data.
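To make the idea of semi-structured data concrete, here is a minimal sketch that parses a small XML document with Python's standard `xml.etree.ElementTree` module. The document, the tag names, and the values are hypothetical and are not the example from the slide; the point is only that two entities of the same class carry different attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: both entries are customers, but their fields differ.
XML_DOC = """
<customers>
  <customer id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </customer>
  <customer id="2">
    <name>Ben</name>
    <phone>555-0100</phone>
    <city>Leeds</city>
  </customer>
</customers>
"""


def attributes_by_customer(xml_text):
    """Return, for each customer id, the list of child tags it actually has."""
    root = ET.fromstring(xml_text.strip())
    return {
        customer.get("id"): [child.tag for child in customer]
        for customer in root.findall("customer")
    }


fields = attributes_by_customer(XML_DOC)
for customer_id, tags in fields.items():
    print(customer_id, tags)
```

The tags describe the data themselves, so a program can discover each record's fields while reading it instead of relying on a fixed table layout.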
  82. Big Data primarily consists of the following three elements: Volume, Velocity, and Variety. Let’s now take a more detailed look at each of these elements.
  83. Volume is the amount of data generated by organizations or individuals. Today, the volume of data is approaching exabytes, and some experts predict that it will reach zettabytes in the coming years. Think about the numbers: Google Inc. processes around 20 petabytes in a single day, while Twitter feeds generate around 80 MB per second!
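To put the two figures quoted above on the same scale, the sketch below converts between per-second and per-day rates. It assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes); that convention is an assumption of this example, since the slide does not say which one it uses.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Decimal (SI) units, assumed for this illustration.
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15


def per_day(bytes_per_second):
    """Convert a per-second byte rate into bytes per day."""
    return bytes_per_second * SECONDS_PER_DAY


def per_second(bytes_per_day):
    """Convert a per-day byte volume into bytes per second."""
    return bytes_per_day / SECONDS_PER_DAY


# 80 MB per second, expressed as terabytes per day.
twitter_tb_per_day = per_day(80 * MB) / TB

# 20 PB per day, expressed as gigabytes per second.
google_gb_per_second = per_second(20 * PB) / GB

print(f"80 MB/s is about {twitter_tb_per_day:.3f} TB per day")
print(f"20 PB/day is about {google_gb_per_second:.1f} GB per second")
```

Under these assumptions, 80 MB per second works out to roughly 6.9 TB per day, and 20 PB per day to roughly 231 GB per second.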
  84. Velocity describes the rate at which data is generated, captured, and shared. Enterprises can capitalize on data only if it is captured and shared in real-time.
  85. Existing systems such as CRM and ERP struggle with the speed of data, which accumulates continuously and cannot be attended to quickly. These systems are able to process data in batches every few hours; however, the time lag causes the data to lose its importance, and in the meantime new data is constantly being generated. eBay, for example, analyzes 5 million transactions per day in real time to address fraud arising from the usage of PayPal!
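The difference between waiting for a batch and reacting in real time can be sketched as follows. This is an illustrative sketch only: the rule (more than three transactions from one account within 60 seconds), the class name, and the sample data are all hypothetical, and this is not a description of how eBay or PayPal detect fraud. It also assumes transactions arrive in time order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # hypothetical sliding-window length
MAX_TXNS_PER_WINDOW = 3    # hypothetical threshold


class StreamingMonitor:
    """Checks each transaction as it arrives instead of waiting for a batch."""

    def __init__(self):
        self._recent = defaultdict(deque)  # account -> recent timestamps

    def observe(self, account, timestamp):
        """Record one transaction; return True if it should be flagged."""
        window = self._recent[account]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamp - window[0] >= WINDOW_SECONDS:
            window.popleft()
        return len(window) > MAX_TXNS_PER_WINDOW


def flag_stream(transactions):
    """Return the (account, timestamp) pairs flagged at arrival time."""
    monitor = StreamingMonitor()
    return [(acct, ts) for acct, ts in transactions if monitor.observe(acct, ts)]


# Invented sample: account "a" makes four transactions within 30 seconds.
SAMPLE = [("a", 0), ("a", 10), ("b", 15), ("a", 20), ("a", 30), ("a", 100)]
print(flag_stream(SAMPLE))
```

A batch system would see the same four transactions only hours later, when the batch runs; the streaming check raises the flag at the moment the fourth transaction arrives.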
  86. A pool of data from social, machine, and mobile sources continues to add new data types and varieties of data to traditional transactional data; thus, data is no longer organized in any predefined form and comprises new types of data, including weblog data, machine data, mobile data, sensor data, social data, and text data.
  87. In this section, we will look at the application of Big Data in business analytics and at the career prospects in Big Data.
  88. Now we will study in detail the application of Big Data in Business Analytics.
  89. Data, which is available in abundance, can be streamlined and exploited for growth and expansion in technology as well as businesses. When data is analyzed successfully, it can be the answer to an important question: how can businesses acquire more customers and gain business insight? The key lies in being able to source, link, understand, and analyze data.
  90. Take a look at this table highlighting different business areas that have benefited by using Big Data and their proportion.
  91. Let’s now take a quick look at businesses and industries that are affected by and benefit from Big Data Analytics. Sectors such as computer and electronic products and IT have experienced tremendous growth in sales, while sectors such as finance, insurance, and government have developed accurate assessment techniques.
  92. Big Data has transformed transportation by providing improved traffic information and autonomous features.
  93. Big Data has transformed the modern day education process through innovative approaches for teachers to analyze the students’ ability to comprehend and thus, impart education effectively in accordance with each student’s needs.
  94. The travel industry, too, is using Big Data to conduct business. Most airlines are working toward customer satisfaction by doing more to remember personal preferences. Such customization goes way beyond mileage-rewards-based loyalty programs. Airline companies also apply analytics to pricing, inventory, and advertising to improve customer experiences, which leads to more customer satisfaction and, hence, more business. A similar story can be seen in the hotel industry as well.
  95. The study and analysis of available data is allowing governments to make informed decisions for fraud management, discover unknown threats, ensure security of global supply chain by monitoring global cargo traffic, use budgets more judiciously, analyze risks, and lots more.
  96. In healthcare, physicians can make use of Big Data to determine the best clinical protocols that will ensure the best health outcome for patients.
  97. Now that you know that Big Data is really BIG in today’s world, you can well understand that so are the opportunities associated with it.
  98. Qualified and experienced Big Data professionals must have a blend of technical expertise, creative and analytical thinking, and communication skills, to be able to effectively collate, clean, analyze, and present information extracted from Big Data.
  99. Most jobs in Big Data are from companies that can be categorized into the following four broad buckets: 1. Big Data technology drivers, e.g., Google 2. Big Data product companies, e.g., Oracle 3. Big Data services companies, e.g., EMC 4. Big Data analytics companies, e.g., Splunk
  100. The flowchart should give you a fairly accurate idea of the step-by-step progress you can expect from a Big Data certification program, either as an analyst or as a developer.
  106. A Big Data analyst should possess the following technical skills: an understanding of Hadoop, Hive, and MapReduce; knowledge of natural language processing; knowledge of statistical analysis and analytical tools; and knowledge of conceptual and predictive modeling.
  114. Organisations look for professionals who possess good logical and analytical skills, good communication skills, and an affinity toward strategic business thinking. The preferred soft skills requirements for a big data professional are:
  117. Most organizations today consider data and information to be their most valuable and differentiated asset, second only to their employees. By analyzing this data effectively, organizations worldwide are now finding new ways to compete and emerge as leaders in their fields, to improve decision-making, and to enhance performance. At the same time, with the volume and variety of data increasing at an immense speed every day, the global phenomenon of using Big Data to gain business value and competitive advantage will only continue to grow.
  118. To sum it up: by analyzing data effectively, organizations worldwide are now finding new ways to compete and emerge as leaders in their fields, to improve decision-making, and to enhance performance. At the same time, with the volume and variety of data increasing at an immense speed every day, the global phenomenon of using Big Data to gain business value and competitive advantage will only continue to grow.
  119. In this class, we’ll look at the significance of social network data in the business context.   The previous class gave you a broad idea about “Big Data” and how it affects our lives. In a sense, the data is only as good as the insights provided by it.
  120. Human beings are social animals and cannot live in isolation. A human being gains knowledge, learns to communicate and think, work and play, by living in a social environment.
  121. Today, socialization is not restricted to meeting and communicating with others in person. The usage of mobile phones and the Internet has made communication across the globe fast and easy. These also make socialization and the sharing of information both affordable and easily accessible.
  122. Twitter, Facebook, and LinkedIn are currently some of the most popular social networking sites. These comprise the social media. This session analyzes the Big Data generated by social media and its implications on various industries.
  123. In this topic, we will understand: the significance of social network data; financial fraud and Big Data; fraud detection in insurance; and use in the retail industry.
  127. In this session, we’ll look at what Social Network Data is and what Social Network Analysis is. We will also address the uses of Social Network Data Analysis and, lastly, what Sentiment Analysis is.
  131. Social network data is the data generated when people socialize or communicate through social media.
  132. As you can see, on social networking sites, numerous people constantly add and update their comments, likes, preferences, sentiments, and feelings and thereby generate huge data. This huge data, when mined and analyzed, throws up collective views and trends with regard to the likes and dislikes, wants and preferences of a large population.
  133. This collective data can also be segregated and analyzed in terms of various groups of people, such as people belonging to various age groups, genders, and locations around the world. This information enables organizations to design and tailor products and services that people want. Such is the importance of social network data.
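Segregating collective data by group can be sketched with a simple count per group. The records, field names, and values below are hypothetical and invented for this illustration; real social network data would be far larger and messier.

```python
from collections import Counter, defaultdict

# Hypothetical, already-cleaned records of what users say they like.
POSTS = [
    {"age_group": "18-24", "location": "IN", "likes": "phones"},
    {"age_group": "18-24", "location": "US", "likes": "phones"},
    {"age_group": "18-24", "location": "IN", "likes": "shoes"},
    {"age_group": "35-44", "location": "US", "likes": "books"},
    {"age_group": "35-44", "location": "US", "likes": "books"},
    {"age_group": "35-44", "location": "IN", "likes": "phones"},
]


def top_preference_by(field, posts):
    """For each value of `field`, return the most frequently liked item."""
    counts = defaultdict(Counter)
    for post in posts:
        counts[post[field]][post["likes"]] += 1
    return {group: counter.most_common(1)[0][0] for group, counter in counts.items()}


print(top_preference_by("age_group", POSTS))
print(top_preference_by("location", POSTS))
```

The same records give different answers depending on how they are grouped, which is why segmenting by age group, gender, or location matters when tailoring products and services.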
  136. Have a look at this image
  137. Now let’s look at what Social Network Analysis is.
  138. Social Network Analysis or SNA refers to the analysis of the data generated in social networks. As the data used is massive, it leads to a Big Data situation.
  141. Let’s consider an example of a mobile network operator (MNO) to understand the value of social network data. The complete set of cell phone call and text message records captured by an MNO is a huge volume of data. Such data is routinely used for a variety of purposes, for example to tailor offers to the customer or to provide relevant services that the customer routinely uses.
  144. In this example, we will see how data analysis is going up a notch by looking into several degrees of association instead of just one. That’s how social network analysis can make a simple data source into a Big Data source.
  145. This figure represents a caller’s social network. It is possible to go as many layers deep as the analysis can handle to get the complete picture of a social network. The need to go deeper from customer to customer and call to call for several layers makes the volume of data massive. It also increases the difficulty in analyzing it, particularly when it comes to using traditional methods.   It works the same way within social networking sites. When analyzing a member of a social network, it isn’t that hard to identify how many connections a member has, how often messages are posted, and other standard metrics.
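Going several layers deep into a caller's network can be sketched as a breadth-first search over call records. This is a minimal sketch: the call list and the names are hypothetical, and breadth-first search is used here as one straightforward way to find contacts by degree of association, not as the method used by any particular operator.

```python
from collections import deque

# Hypothetical call records: each pair is one call between two subscribers.
CALLS = [
    ("ana", "bo"), ("ana", "cy"), ("bo", "dee"),
    ("cy", "dee"), ("dee", "eli"), ("eli", "fay"),
]


def build_graph(calls):
    """Turn call records into an undirected 'who has talked to whom' graph."""
    graph = {}
    for caller, callee in calls:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set()).add(caller)
    return graph


def contacts_by_degree(graph, start, max_depth):
    """Breadth-first search: map each degree (1, 2, ...) to the people found there."""
    seen = {start}
    layers = {}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested number of layers
        for neighbour in sorted(graph.get(person, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                layers.setdefault(depth + 1, set()).add(neighbour)
                queue.append((neighbour, depth + 1))
    return layers


graph = build_graph(CALLS)
print(contacts_by_degree(graph, "ana", 3))
```

Each extra layer requires looking up the call records of everyone found in the previous layer, which is why the volume of data to examine grows so quickly as the analysis goes deeper.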
  154. However, knowing how wide a member’s network is when it includes friends, friends of friends, and friends of friends of friends is a lot more work; it becomes a Big Data problem.
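To make the idea of going several layers deep concrete, here is a minimal sketch of a breadth-first walk over a calling network. The `calls` dictionary, the names in it, and the function `network_within` are all hypothetical and exist only for illustration; a real network would have millions of members and would not fit in a single in-memory dictionary.

```python
from collections import deque

def network_within(graph, start, max_depth):
    """Return {person: hops} for everyone within max_depth hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        depth = seen[person]
        if depth == max_depth:
            continue  # do not expand beyond the requested layer
        for contact in graph.get(person, ()):
            if contact not in seen:
                seen[contact] = depth + 1
                queue.append(contact)
    seen.pop(start)  # the member is not part of his or her own network
    return seen

# Hypothetical toy network: who calls whom
calls = {
    "ana": ["bob", "cy"],
    "bob": ["ana", "dee"],
    "cy": ["ana", "eve"],
    "dee": ["bob", "fay"],
    "eve": ["cy"],
    "fay": ["dee"],
}

for layers in (1, 2, 3):
    print(layers, len(network_within(calls, "ana", layers)))
```

Even in this toy case the network grows with every extra layer, which is why the data volume grows so quickly in practice.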
  155. What are the uses of Social Network Data Analysis?    
  156. By using social network data analysis, decision-making can be improved in the following areas: Business Intelligence, Marketing, and Product Design and Development. Let’s look at each of these in a little more detail.
  159. Let’s look in detail at how it helps in Business Intelligence. You can analyze data generated from social networks to get some high-value business insights.
  160. Social Customer Relationship Management (CRM) is a buzzword these days. This analysis is capable of changing the perspective with which organizations value their customers. Rather than considering a single customer’s value, now it is possible to evaluate the value of a customer’s overall network.
  163. Let’s consider the example of a mobile service provider which has a relatively low-value customer as a subscriber. The customer has a basic call plan, which does not generate any additional revenue. The customer is barely profitable. The service provider would traditionally have valued this customer on the basis of his or her individual account and hence may not have been too worried if the customer had wanted to leave.
  164. With social network analysis, however, it is possible to identify that the same customer can influence the people in his or her network who are heavy users and who have a wide network of friends. This may persuade the company to make an altogether different business decision and value the customer more.   This may also be because studies have shown that once a member of a calling circle leaves, others are most likely to follow the first and leave. Using social network analysis, it is possible to understand the potential value that the customers can influence, rather than only the revenue they directly generate. This gives a completely different perspective of how the customer needs to be handled.
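As a rough illustration of valuing a customer by the network he or she can influence, the sketch below compares a customer's own revenue with the revenue of that customer's direct contacts. The customer names, revenue figures, and the function `network_value` are made up for the example; a real model would weight influence rather than simply summing contacts' revenue.

```python
def network_value(customer, revenue, contacts):
    """Return (own revenue, total revenue of the customer's direct contacts)."""
    own = revenue[customer]
    influenced = sum(revenue[c] for c in contacts.get(customer, ()))
    return own, influenced

# Hypothetical monthly revenue per subscriber
revenue = {"low_user": 5.0, "heavy_a": 80.0, "heavy_b": 120.0}
# Hypothetical calling circle of the low-value subscriber
contacts = {"low_user": ["heavy_a", "heavy_b"]}

own, influenced = network_value("low_user", revenue, contacts)
print(own, influenced)
```

Seen only through the first number, the subscriber looks barely profitable; the second number is what the provider risks if the calling circle follows the subscriber out.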
  165. Law enforcement and anti-terrorism efforts also leverage social network analysis today. It is possible to recognize individuals who are connected, directly or indirectly, to known trouble groups or persons. Analysis of this type is often referred to as link analysis.
  168. So, from the above mentioned examples, we can infer the following business insights:   Social network data analysis can help provide new contexts in which decisions are data driven and not opinion driven. Big Data analysis allows organizations to shift goals from maximizing individual account profitability to maximizing the profitability of the customer’s network. Big Data helps organizations to identify highly connected customers and assists in when, where, and how to align and focus marketing efforts in building a better brand image.
  171. It enables organizations to lure highly connected customers with free trials and solicit their feedback for the betterment of their products and services. It assists organizations to encourage internal customers to become more active with feedback and opinions on the product or services
  173. Let’s look at how social network data analysis can improve decision-making in marketing.
  174. Today’s consumers have changed. They no longer read newspapers end-to-end. They fast-forward through TV commercials and junk unsolicited e-mail because they have many choices and new options that fit their digital lifestyle better. Consumers can now choose the marketing messages they wish to receive: when, where, and from whom.
  175. In today’s competitive scenario, marketers deliver what consumers want through relevant interactive communication across digital power channels: e-mail, mobile, social, and the Web.
  180. These channels, in turn, generate the social data required to provide insights into a target audience’s brand communication preferences, the tone of voice they use, their interests, the other brands they discuss, and plenty of other data that can help a brand tailor consumer communication to eke the most out of them.  
  185.  Social network analysis of this data has a widespread use in marketing in various interesting ways.
  186. Let’s look at how retail giant Walmart is using social media to understand its customers better. Walmart recently acquired a social media analytics company named Kosmix and created Walmart Labs, a division that analyzes media communication to understand retail trends. One of the key responsibilities of this division is to monitor public domain conversations and then position Walmart products accordingly.
  187. Affiliate marketing is a reward-based marketing structure in which an affiliated company uses its own marketing effort to bring in customers for another company and, in turn, is rewarded by the company that benefits. One example is the Brandlove app. Today, one would be hard-pressed to find a major brand that does not have a thriving affiliate program.
  188. Let’s now look at how social network data analysis can improve decision-making in product design & development.
  189. Millions of status updates, blog posts, photographs, and videos are shared every second.
  190. To be successful, organizations not only need to identify the information relevant to their company, products, and services but should be able to dissect, comprehend, and respond to the relevant information in real time and on a continuous basis.
  191. A system that is able to represent a sentiment as data with increased levels of accuracy provides the client a way to access information on a social platform. To measure sentiment more closely is of great value in designing a product or service. It is important for brands to be able to understand the demographic information they receive.
  194. Let’s now look at what sentiment analysis is.
  195. Sentiment analysis is defined as a computer programming technique to analyze human emotions, attitudes, and views across popular social networks including Facebook, Twitter, and blogs. The technique requires analytic skill as well as computing techniques.
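One very simple way to turn sentiment into data is a word-list (lexicon) approach, sketched below. The two word lists and the function names are assumptions made for this example, and real sentiment systems are considerably more sophisticated (they handle negation, sarcasm, and context), so treat this only as an illustration of the idea.

```python
import re

# Hypothetical, deliberately tiny word lists
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "terrible", "late", "rude"}

def sentiment_score(text):
    """Count positive words minus negative words in a post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for post in ("I love this airline, great crew",
             "Flight was late and staff were rude",
             "Boarding at gate 12"):
    print(label(post), "|", post)
```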
  196. By listening to what consumers want, by understanding where the gap in the offering is, and so on, organizations can make the right decisions in the direction of their product development and offerings. In this way, social network data can help organizations improve product development and services, also making sure consumers ultimately get the products and services they want.
  197. However, this technique is still evolving, and the full potential of sentiment analysis is yet to be explored by marketers and other business professionals.
  198. There’s also the issue of judgment. Think of a company relying purely on the number of likes and followers they have to estimate their popularity. Deeper studies could possibly show that most of the trends are negative – yet it may all go towards somehow creating a false social media impression about the company.
  199. American Airlines has been ranked one of the most disliked companies in the USA, but its raw social media numbers tell a different story. The airline has about 346,259 followers on Twitter and 273,591 ‘likes’ on Facebook. Deeper studies, however, show that online conversations about the company are negative, which is consistent with its ranking as one of the most disliked airlines. Hence sentiment and emotive data should be given more importance than the raw counts of “followers” and “likes”.
  200. Under this topic, we have looked at social network data and its analysis. We have addressed the uses of social network data analysis and how sentiment analysis is helpful in making better business decisions.
  201. In this class, we’ll look at the significance of social network data in the business context.   The previous class gave you a broad idea about “Big Data” and how it affects our lives. In a sense, the data is only as good as the insights provided by it; hence, it is important to understand how the data is actually.
  202. Now we will look at Financial Fraud and Big Data
  203. Frauds occur frequently in banks and other financial institutions. These financial institutions send educative e-mails and communication on how to prevent such frauds and not be a party to it.
  204. Financial frauds are even higher in the online retail sector. In such fraud cases, online retailers, such as Amazon, eBay, and Groupon, tend to incur huge expenses and losses.
  205. Following are the most common financial frauds that impact online retailers:
  • Credit Card Frauds: This is a widespread and frequent fraud. The online retailer does not see the user of the card, and hence cannot validate the ownership of the card. It is also quite likely that a stolen or even fake card was used in a transaction. Despite the several checks in the online transaction process, not all the loopholes in the system are plugged.
  • Exchange or Return Policy Fraud: Every online retailer has a policy on exchange and return, and this provides a strong area for fraudsters to function.
  • Personal Information Fraud: Here, the customer’s login information is stolen, and thereafter the fraudster logs in, completes the entire sale transaction, and then changes the delivery address to a different location.
  206. The most reliable way to prevent these frauds is to understand customers’ ordering patterns and keep a lookout for red flags.
  207. Big Data can be intelligently used not just to educate online retailers but also to manage and prevent fraud and losses in their business.
  208. Let’s look at how this is possible. Analyzing data to understand various patterns of fraud was one of the many preventive methods, but it worked only as long as the sample size was small. This size could not be increased because that required huge investments in time and money. With Big Data techniques, however, this challenge can now be overcome.
  209. Big Data analytics can …
  • Run a check on all the data to identify any fraudulent transactions.
  • Identify any new ways of committing fraud and keep adding them to a set of fraud-prevention checks.
  • Avoid impeding customers with unnecessary policies and governance structures.
  210. Fraud Detection in Real Time   To detect fraud in real time, Big Data uses a real-time comparison of live transactions with various sources of data to authenticate transactions online. For example, if there were to be a transaction online, Big Data would immediately enable comparison between the incoming IP address and the geo-data from the customer’s smart phone apps. A match would authenticate the transaction.
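The comparison described here can be sketched as a simple distance check. The sketch assumes the incoming IP address has already been resolved to a latitude and longitude by some geolocation service (not shown), and the 50 km tolerance and the function names are arbitrary choices for illustration, not a recommended setting.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def locations_match(ip_location, phone_location, max_km=50.0):
    """Treat the transaction as consistent if both sources agree within max_km."""
    return haversine_km(ip_location, phone_location) <= max_km

mumbai = (19.0760, 72.8777)
delhi = (28.6139, 77.2090)
print(locations_match(mumbai, mumbai))  # same place: consistent
print(locations_match(mumbai, delhi))   # far apart: worth a closer look
```

A mismatch does not prove fraud (the customer may be using a VPN or travelling), so in practice it would be one signal among several rather than a final verdict.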
  211. Big Data can also comb through historical data and indicate fraud patterns that are later used to create checks to prevent real-time fraud.
  212. Retailers use real-time analysis effectively by knowing when exactly the items were delivered to customers. High-value items have attached sensors that can transmit their location. When such items are delivered to customers, retailers process the streaming data from these sensors and thus prevent frauds.
  213. Visually Analyzing Frauds   Big Data can facilitate drawing maps and graphs that create comparisons, which are then used to make decisions and create effective systems that are accurately placed to block fraud. An analysis in graphical form, for example, can help identify the regions, customers, and products that have a higher fraud rate. Big Data can even show comparisons between products and regions, and so on, which alerts the retailer to where a greater probability of fraud exists.
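Behind any such map or graph sits a simple aggregation: the share of fraudulent orders per region, product, or customer. The sketch below computes that table from a handful of made-up orders; the field names and the function `fraud_rate_by` are assumptions for the example, and the charting step itself is left out.

```python
from collections import defaultdict

def fraud_rate_by(orders, key):
    """Return {group: fraction of orders flagged as fraud} for the given field."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for order in orders:
        totals[order[key]] += 1
        frauds[order[key]] += int(order["fraud"])
    return {group: frauds[group] / totals[group] for group in totals}

# Hypothetical order history with a fraud flag from past investigations
orders = [
    {"region": "north", "product": "phone", "fraud": True},
    {"region": "north", "product": "phone", "fraud": False},
    {"region": "south", "product": "phone", "fraud": False},
    {"region": "south", "product": "shoes", "fraud": False},
]

print(fraud_rate_by(orders, "region"))
print(fraud_rate_by(orders, "product"))
```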
  214. Let’s assume that an insurance company wants to improve its ability to make decisions in real time when processing a new claim, thereby reducing the claim cycle time. On the other hand, the company incurs a steady increase in the cost of litigation and fraudulent claims. The company has policies and procedures to help underwriters evaluate fraudulent claims; however, the underwriters do not have the required data at the right time to make the necessary decisions, further delaying the processing time.
  215. Within this context, the company implements a Big Data analytical platform, which uses data from social media to provide a real-time view. This enables a call center agent to diagnose the patterns of behaviors and the relationships among other claimants when the customer calls in for a claim for the first time, and leaves a note for the underwriters to go through.
  216. In some cases, social media could also provide great triggers to identify fraud; for example, a customer might indicate that his or her car was destroyed in a flood, but the documentation from the social media feed shows that the car was actually in another city on the day the flood occurred. These glaring discrepancies reflect fraud.
  217. Insurance frauds have a huge cost implication for an organization, which is why organizations prefer using Big Data analytics and other advanced technologies to handle this issue. Reducing fraud also has a positive impact on customers, because fraud losses are otherwise passed on to them as higher premiums.
  218. After implementing a Big Data analytics platform, organizations are able to analyze complex information and accident scenarios in minutes rather than days or months.
  219. Fraud Detection Methods   Traditionally, insurance companies have been using statistical models to identify fraudulent claims. These models have many limitations and can prevent fraud only to a certain extent. This section examines these limitations and how Big Data can overcome them.
  • Insurance companies typically use small samples of data to analyze, which leads to one or more frauds going undetected.
  • This method relies on previously recorded fraud cases; therefore, every time a fraud based on a new technique occurs, insurance companies have to bear the consequences and the losses the first time.
  • The traditional method of identifying frauds works in independent silos. It is not capable of handling various sources of information from different channels and different functions in an integrated way. Big Data analytics, on the other hand, can handle this kind of challenge.
  220. Public data like bank statements, legal judgments, criminal records and medical bills can provide useful means of predictive analysis in order to avoid frauds.   To get the most effective predictive value from such public data, business organizations integrate their internal data with third party data. This integration helps in investigating and restricting fraudulent activities.
  221. Social Network Analysis   Earlier, we learned about social network analysis (SNA) and how Big Data can be used to create visibility into blind spots for businesses. SNA is an innovative and effective way to identify and detect frauds.
  222. Consider an example. Assume in an accident, all people involved exchanged their addresses and phone numbers and have given them to the insurer. Among them, if the address given by one of the accident victims reveals several claims or the vehicle is identified to have been involved in other claims as well, this will automatically indicate chances of fraudulent claims. The ability to source this information can result in catching such fraudulent claims faster.
  223. The SNA tool uses a mix of analytical methods. This mixed approach includes statistical methods, pattern analysis, and link analysis to sift through large amounts of data and reveal relationships.
  224. When link analysis is used in fraud detection, one looks for clusters of data and how those data clusters are linked to other data clusters. As already mentioned, public records are among the data sources that can be integrated into a model. By integrating various data sources into a model in this way, the insurer can rate claims.
  225. A high rating indicates that the claim is likely to be fraudulent. This might be because of a known bad address, a suspicious provider, or a vehicle that was involved in many accidents with multiple carriers.
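The link analysis just described can be sketched as grouping claims that share an address or a vehicle, directly or through a chain of other claims. The claim records, field names, and the function `link_clusters` below are hypothetical; the grouping uses a standard union-find structure, which is one reasonable choice and not necessarily what a commercial SNA tool does internally.

```python
from collections import defaultdict

def link_clusters(claims):
    """Group claim ids that share an address or vehicle, directly or transitively."""
    parent = {claim["id"]: claim["id"] for claim in claims}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # (field, value) -> id of the first claim carrying that value
    for claim in claims:
        for field in ("address", "vehicle"):
            key = (field, claim[field])
            if key in first_seen:
                parent[find(claim["id"])] = find(first_seen[key])
            else:
                first_seen[key] = claim["id"]

    clusters = defaultdict(set)
    for claim_id in parent:
        clusters[find(claim_id)].add(claim_id)
    return list(clusters.values())

# Hypothetical claims: C1 and C2 share an address, C2 and C3 share a vehicle
claims = [
    {"id": "C1", "address": "12 Elm St", "vehicle": "V-100"},
    {"id": "C2", "address": "12 Elm St", "vehicle": "V-200"},
    {"id": "C3", "address": "9 Oak Ave", "vehicle": "V-200"},
    {"id": "C4", "address": "4 Pine Rd", "vehicle": "V-300"},
]

for cluster in link_clusters(claims):
    print(sorted(cluster))
```

A large cluster is not proof of fraud, but it is exactly the kind of linked group an investigator would want to rate more carefully.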
  226. Before implementing SNA, however, organizations should consider the following questions carefully: 1. How fast does data arrive?
  227. 2. How much unneeded data is there when it arrives?
  228. 3. How deep should the analysis go to produce the most accurate results?
  229. 4. What types of user interface components need to be included on the SNA dashboard?
  230. Next is the step-by-step SNA method to detect fraud:
  1. The data, both structured and unstructured, from various sources is fed into the ETL (Extract, Transform, and Load) tool. This data is then transformed and loaded into a data warehouse.
  2. The analytics team uses information from various sources, scores the risk of fraud, and ranks the likelihood of fraud. The information used can come from varied sources such as a prior belief or a previous relationship, the number of rejected claims, etc.
  3. Several Big Data technologies, including text mining, sentiment analysis, content categorization, and social network analysis, can be included in the fraud detection and predictive modeling mechanism.
  231. 4. Depending on the score of the particular network, an alert is generated.
  5. The investigators can then leverage this information and begin researching more on the fraudulent claim.
  6. Finally, issues of fraud that are identified are added into the case system.
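The scoring and alerting steps of this method can be sketched as a weighted checklist. The signal names, the weights, and the alert threshold below are invented for illustration; in practice they would come from the analytics team's models rather than being hand-picked.

```python
def risk_score(claim, weights):
    """Add up the weights of every risk signal present on the claim."""
    return sum(weight for signal, weight in weights.items() if claim.get(signal))

def triage(claims, weights, alert_threshold):
    """Return ids of claims that should raise an alert, highest score first."""
    scored = sorted(((risk_score(c, weights), c["id"]) for c in claims), reverse=True)
    return [claim_id for score, claim_id in scored if score >= alert_threshold]

# Hypothetical signals and weights
weights = {"known_bad_address": 3, "prior_rejected_claims": 2, "suspicious_provider": 2}

claims_to_score = [
    {"id": "A", "known_bad_address": True, "prior_rejected_claims": True},
    {"id": "B", "suspicious_provider": True},
    {"id": "C"},
]

print(triage(claims_to_score, weights, alert_threshold=3))
```

The claims returned by `triage` correspond to the alerts handed to investigators; confirmed cases would then be recorded in the case system.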
  232. Predictive Analysis   Predictive analysis works on the concept that the earlier a fraud is detected, the smaller the loss incurred by the business.
  233. Think about a situation where a customer raises a claim saying his car caught fire. But recorded statements indicate that most of the valuable items in the car had been removed prior to the fire. This could raise the suspicion that the car had been torched on purpose.
  234. Predictive analytics includes the use of text analytics and sentiment analysis to look at Big Data for fraud detection. Claim reports often run to multiple pages, which makes it hard to spot a scam by reading alone. Big Data analytics helps in sifting through unstructured data, which was not possible earlier, and helps in proactively detecting frauds. Predictive analytics technology is being used increasingly to spot potentially fraudulent claims and to speed up the payment of legitimate claims.
  235. Here’s how the predictive analytics technology works:
  • Claim adjusters write lengthy reports while investigating a claim.
  • Typically, clues are hidden in the reports that the claims adjuster would not have noticed.
  • The computing system, which is based on business rules, highlights these clues for possible fraud.
  • The fraud detection system can spot these discrepancies and flag the claim as fraudulent.
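A rule-based pass over an adjuster's report could look roughly like the sketch below. The phrases, the reasons attached to them, and the function `flag_report` are assumptions for illustration; real systems use far richer text analytics than plain phrase matching.

```python
# Hypothetical business rules: phrase to look for -> reason it is suspicious
RED_FLAG_PHRASES = {
    "valuables removed": "items removed before the loss",
    "no witnesses": "no independent witnesses",
    "recently increased coverage": "coverage raised shortly before claim",
}

def flag_report(report_text):
    """Return the sorted list of reasons triggered by phrases in the report."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase
    text = " ".join(report_text.lower().split())
    return sorted(reason for phrase, reason in RED_FLAG_PHRASES.items()
                  if phrase in text)

report = "Owner states valuables   removed the day before. No witnesses present."
print(flag_report(report))
print(flag_report("Routine windshield replacement."))
```

Any report that returns a non-empty list would be highlighted for a closer look, as in the car fire example where valuables had been removed before the fire.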
  236. Social Customer Relationship Management (CRM)   Social CRM enables effective fraud detection in the insurance sector. Social CRM is neither a platform nor a technology, but a process. It makes it critical that insurance companies link social media sites, such as Facebook and Twitter, to their CRM systems.
  237. When social media is integrated within an organization, it enables greater transparency with customers. Mutually beneficial transparency indicates that the company trusts its customers and vice-versa. This customer-centric ecosystem reinforces that increasingly the customer base is in control. This ecosystem can be beneficial to the business as well if the business is able to leverage the collective intelligence of its customer base.
  239. Today we will discuss the usage of Big Data in the retail industry.
  240. Big Data has huge potential for the retail industry as well, considering the immense number of transactions and the correlations among them.
  241. Seemingly simple questions are easy to answer when there is a single retail location and a small customer base: How many basic tees did we sell today? What time of the year do we sell most leggings? What else has customer X bought, and what kind of coupons can we send to customer X?
  242. But in larger systems, with millions of transactions being carried out daily, spread across multiple disconnected legacy systems and IT teams, it is impossible to see the full picture of the data.
  243.  Finding the link in the company’s sales, between in-store and online sales, can lead to deep insights into customer behavior and overall company health, but often this information is so hard to pull together that the issue goes unaddressed. Retail stores typically run on the legacy point of sale systems that batch updates daily, and often do not communicate with each other, let alone with the e-commerce site. For a marketing analyst, to try and understand the strength and health of their products or campaign, reconciling these systems and their different data can be an impossible task. While omni-channel retailing solutions do exist, they require both store managers and Web developers to learn entirely new systems, incurring huge costs in time and money for company-wide training and systems deployment. Further, accessing data in real time is not often possible, as systems hit scaling issues.
  244. Suppose you want to know if a particular item is in stock in another nearby store. This information is often not readily available and requires phone calls or other communication that adds further time to a transaction and potentially prevents an immediate sale from being made.
  245. As technology-driven retailers such as Walmart and Amazon grow bigger and wider, the task of tracking shipping and production also grows significantly. In these scenarios, Big Data proves to be of immense help. Data from innovative solutions like tagging is used for analysis. These tags can generate a lot of data, which can be analyzed to provide various solutions, some of which are discussed in the next section.
  246. But remember, the fact remains that most of the Big Data is just not required and not useful either. Within a Big Data feed, some information will have long-term strategic value, some will be useful only for immediate and tactical use, and some data won’t be used for anything at all. The key part of taming Big Data is to determine which pieces fall into which category.
  247. Use of RFID (Radio Frequency Identification) Data in Retail   An RFID tag is a small tag that carries a unique code to identify a product, much like a UPC code. The tag is placed on shipping pallets or product packages, as shown in the adjacent image.
  248. In addition to what a bar code does, an RFID tag:   Specifies the pallet as allotted to a precise and exclusive set of computer systems. Helps in finding situations where an item has no units left in the store. Specifies the number of units of each item remaining in the store, and thereby raises an alarm when restocking is required. Allows better tracking of products by differentiating between products that are out of stock and products that are available on the shelf. For example, if a product is unavailable on the shelf, that does not mean that it is unavailable everywhere. Using an RFID reader and a mobile computer, stock can be located in the warehouse and replenished immediately.
  249. In addition to these, use of RFID also saves time, reduces labor, enhances the visibility of products throughout the production-delivery life cycle, and saves costs.
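The restocking alarm described above can be sketched in a few lines of Python. This is only an illustrative sketch: the tag reads, the product codes, the two locations ("shelf" and "warehouse"), and the threshold are all hypothetical and are not taken from any real RFID system.

```python
from collections import Counter

# Hypothetical RFID reads: (tag_id, product_code, location).
reads = [
    ("T1", "SOAP", "shelf"),
    ("T2", "SOAP", "warehouse"),
    ("T3", "SOAP", "warehouse"),
    ("T4", "RICE", "shelf"),
    ("T5", "RICE", "shelf"),
    ("T6", "RICE", "shelf"),
]

RESTOCK_THRESHOLD = 2  # assumed minimum number of units on the shelf


def count_units(reads):
    """Count units per (product, location), counting each tag only once."""
    seen = set()
    counts = Counter()
    for tag_id, product, location in reads:
        if tag_id in seen:
            continue  # the same tag may be read more than once
        seen.add(tag_id)
        counts[(product, location)] += 1
    return counts


def restock_alerts(reads, threshold=RESTOCK_THRESHOLD):
    """Return products whose shelf count is below the threshold."""
    counts = count_units(reads)
    products = sorted({product for product, _ in counts})
    alerts = {}
    for product in products:
        on_shelf = counts[(product, "shelf")]
        in_warehouse = counts[(product, "warehouse")]
        if on_shelf < threshold:
            alerts[product] = {"on_shelf": on_shelf, "in_warehouse": in_warehouse}
    return alerts


alerts = restock_alerts(reads)
print(alerts)
```

Here the alert for SOAP also reports the warehouse count, which reflects the point made above: a product missing from the shelf may still be available elsewhere.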
  250. Under this topic, we have discussed in detail the uses of Big Data in the retail industry.
  251. Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.  
  252. In today’s topic we will look at the various technologies used for handling Big Data.
  253. Today we will further discuss how to make use of the enormous volume and variety of data at the required speed, with a suitable technology framework. So we will look at some of the major technologies related to Big Data that help store, process, and analyse the data and provide required business insights.
  254. Rapid changes in technology radically change the way data is produced, processed, analysed, and consumed. A huge increase in the amount of data being captured and analysed by organizations, as well as in the data on the Internet, has fuelled the need for huge data sources and efficient processing of that data.
  255. Some of the most popular areas of Big Data-related innovation include those in distributed and parallel computing, Hadoop, cloud for Big Data, and in-memory computing for Big Data. Of all the technologies, Hadoop is perhaps the most popular name identified with Big Data.
  256. Distributed computing is a method in which multiple computing resources are connected in a network and computing tasks are distributed across the resources, thereby increasing the computing power. For large workloads, distributed computing can be faster and more efficient than computing on a single machine, and hence is of immense value when it comes to processing a huge amount of data in a limited time.
  257. In parallel computing, the processing power of a standalone computer is enhanced by adding multiple processing units. These units carry out a complex task by breaking it up into sub-tasks and executing the individual sub-tasks simultaneously.
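The idea of breaking a task into sub-tasks and running them simultaneously can be sketched as follows. This is a minimal illustration, not a performance recipe: the function names and the sum-of-squares task are assumptions made for the example, and worker threads stand in for processing units. In CPython, threads do not speed up CPU-bound work; `ProcessPoolExecutor` offers the same interface for true multi-core execution.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Break the data into roughly equal chunks, one per sub-task."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """The sub-task carried out on each chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Run the sub-tasks on a pool of workers and combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


numbers = list(range(1, 101))
total = parallel_sum_of_squares(numbers)
print(total)
```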
  258. Today, markets and businesses are fiercely competitive. At the same time, the volume, variety, and velocity of available data have surged astronomically. To find an edge in the market, organizations feel a need to analyse all the data they can get hold of, and in a very short span of time. This obviously leads to the requirement of large storage and processing power.
  259. In spite of all the technological developments, there is a constant problem affecting the process of data collection: latency. Latency is the aggregate delay in the system caused by delays in individual tasks involving large amounts of data. If you use a wireless phone, you may have experienced latency first-hand in the form of a lag or delay in the transmissions between you and your caller.
  264. This delay leads to a slowdown in system performance, data management, and communication within an organization as well as with customers and other external stakeholders. Regular Big Data applications usually suffer from latency, and hence have lower levels of performance. This is a potential problem for businesses.
  268. In response to these problems, distributed and parallel processing techniques provide concrete solutions, not just for processing large amounts of data in a short span of time, but also for dealing with latency.
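A small worked example may help show why spreading tasks across several workers reduces the aggregate delay. It is a simplified sketch under stated assumptions: the per-task delays are hypothetical, tasks are independent, and network and coordination overheads are ignored. Run one after another, the delays add up; spread over several workers, the job finishes when the busiest worker finishes.

```python
def sequential_latency(delays):
    """One worker runs every task in turn, so the delays add up."""
    return sum(delays)


def parallel_latency(delays, workers):
    """Give each task to the least-loaded worker (longest tasks first).

    The job is finished when the busiest worker is finished.
    """
    loads = [0] * workers
    for delay in sorted(delays, reverse=True):
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += delay
    return max(loads)


delays = [4, 3, 3, 2, 2, 2]  # hypothetical per-task delays, in seconds

print(sequential_latency(delays))   # total delay on a single worker
print(parallel_latency(delays, 2))  # total delay with two workers
print(parallel_latency(delays, 4))  # total delay with four workers
```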
  271. A distributed system is a collection of independent computer systems that are connected via a network to accomplish a specific task. The connected computers are loosely coupled and can access data and resources that are remotely located.
  274. A parallel system is a computer system that has multiple processing units attached to it. These systems are tightly coupled and are usually employed to solve a single complex problem.
  275. Several servers are connected to form a network, so that the workload can be shared amongst them. A cluster equipped with the same type of commodity hardware is called a homogeneous cluster. A cluster equipped with a combination of different hardware is called a heterogeneous cluster. An organization can use the hardware components it has acquired over a period of time to form a cluster or grid. This method is usually cost-effective. Grids also offer cost-effective storage solutions, although the overall costs may be high.
  276. An MPP platform is a single machine that works like a grid. It handles storage, memory, and computing tasks. These capabilities are optimized by software written especially for the MPP platform. The platform is also optimized for scalability. MPP platforms are suitable for high value uses. EMC Greenplum and ParAccel are examples of MPP platforms.
  277. HPC environments offer very high performance and scalability. They use in-memory technology and are used for high-speed floating point processing. You will read more about in-memory technology in the following sections. HPC environments are ideal for specialty applications and custom application development. These environments are suitable for research or business organizations where high costs are acceptable because the results are very valuable, or the project is strategically important.
  278. A public cloud environment is a type of cluster or grid that can be accessed through the internet. The cloud owner or vendor develops a cluster, and then allows customers to use it for storage or computing tasks for a fee. Amazon EC2 is an example of a public cloud. A public cloud gives businesses the flexibility to buy computing power as per their needs.   In the case of a private cloud environment, an organization's cluster is private and accessible only through its own network. A private cloud is suitable for businesses that give high priority to data privacy. The cost of a private cloud is shared among the business units.
  282. In this session we will study Hadoop in detail, one of the most preferred technologies to handle Big Data.
  283. Hadoop is an open-source platform designed to work with huge volumes of structured and unstructured data, that is, Big Data. Working with such volumes of data needs deep analytical technology, which requires greater computational power.
  284. Let's look at some of the features of Hadoop:   It can work on a large number of machines that do not share any memory or disks. This solves the twin Big Data problems of efficient storage and access. Storage improves when the data is loaded on a Hadoop platform because Hadoop distributes data over different servers. Access improves because Hadoop can track the data stored on the different servers. Processing improves as Hadoop runs computing tasks by using all available processors working in parallel. Hadoop improves resilience by keeping multiple copies of data, which can be used in case of server failure.
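The distribution and resilience features described above can be illustrated with a toy simulation. This is not Hadoop's API or its actual placement policy (real HDFS placement is rack-aware): the function names, block names, and server names are hypothetical, and blocks are simply assigned round-robin. The default of three copies mirrors the usual HDFS replication default.

```python
def place_blocks(blocks, servers, replicas=3):
    """Assign each block to `replicas` distinct servers in round-robin order."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [
            servers[(i + r) % len(servers)] for r in range(replicas)
        ]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one copy on a live server."""
    return [
        block
        for block, holders in placement.items()
        if any(server not in failed for server in holders)
    ]


blocks = ["b0", "b1", "b2", "b3"]
servers = ["s0", "s1", "s2", "s3"]
placement = place_blocks(blocks, servers)

# With three copies of every block, losing two servers loses no data.
print(readable_blocks(placement, failed={"s0", "s1"}))
# Losing three servers does: block b0 was stored only on s0, s1, and s2.
print(readable_blocks(placement, failed={"s0", "s1", "s2"}))
```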
  289. So how does Hadoop use multiple computing resources to execute a task?   The Hadoop Distributed File System (HDFS) is a reliable, high-bandwidth, low-cost data storage cluster that facilitates management of related files across machines. The Hadoop MapReduce Engine is a high-performance parallel/distributed data-processing implementation of the MapReduce algorithm.
  291. Hadoop is designed to process large amounts of structured and unstructured data and is implemented on racks of commodity servers as a Hadoop cluster. Each server works independently on its task and returns its response. Servers can also be added to or removed from the cluster dynamically, because Hadoop is able to detect changes, including failures, adjust to those changes, and continue to operate without interruption.
  296. MapReduce is the programming model which allows mapping the tasks to different servers and reducing the responses to one result. Hadoop MapReduce is an implementation of the MapReduce algorithm developed and maintained by the Apache project. This algorithm provides the capabilities to break data into manageable chunks, process the data in parallel on the distributed cluster, and then make the data available for user consumption or additional processing.
  297. The map component of MapReduce distributes the programming problem or tasks across a large number of systems, and handles the placement of the tasks in a way that balances the load and manages recovery from failures. After the distributed computation is completed, another function, called Reduce, aggregates all the elements back together to provide a result.
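The two steps described above can be sketched on a single machine with a word count. This is a conceptual illustration only and does not use Hadoop's API: the function names and sample text are assumptions, and on a real cluster each chunk would be processed on a different server.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


# Hypothetical input, already broken into manageable chunks.
chunks = [
    "big data needs big storage",
    "hadoop stores big data",
    "data data everywhere",
]

partials = [map_chunk(chunk) for chunk in chunks]        # one result per chunk
result = reduce(reduce_counts, partials, Counter())      # a single combined result
print(result.most_common(2))
```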
  298. When Hadoop receives an indexing job, the data of an organization is first loaded into the Hadoop software, which then divides the data into different pieces and sends each piece of data to different servers. Hadoop keeps track of the data by sending a job code to all the servers that store the relevant piece of data. Each server then applies the job code to the portion of data stored on it and returns the results.
  302. This chart describes the process of job tracking in MapReduce.
  303. Let's look at an example to understand how Hadoop works. Consider the records of all telephone calls in a city. Suppose a researcher wants to know the number of college students who made calls at the time of a particular event. The indexing query would specify the relevant user information and the time of the event. Each server would search its collection of call records and return the ones that match the query. Hadoop would put together all these sets into one result. Let's suppose all records of telephone calls are stored in CSV format on the server. First, the data is loaded into Hadoop, and then the MapReduce programming model is used to process the data. Suppose there are five columns in the CSV file: user_id, user_name, city_name, service_provider_name, and call_time.
  305. To find the number of students who made calls at a particular time, each student is identified by the user_id. The final output is the total number of users who made calls during a particular time period, say, 9–10 pm. To get the final output, the data is passed line by line to each mapper. After the mapper job is complete, the Hadoop framework shuffles, sorts, and groups the data and sends it to the reducer, which gives the final output. The Hadoop platform also facilitates data storage on many machines. This facility allows a business to use multiple commodity servers and run Hadoop on each, instead of creating an integrated system.
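The call-record example above can be sketched on a single machine as a mapper, a shuffle step, and a reducer. This is only an illustration of the data flow, not Hadoop code: the sample rows, the HH:MM format of call_time, and the function names are assumptions made for the example. Only the five column names come from the description above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data using the five columns named above.
CALLS_CSV = """user_id,user_name,city_name,service_provider_name,call_time
u1,Asha,Pune,TelcoA,21:05
u2,Ravi,Pune,TelcoB,20:59
u1,Asha,Pune,TelcoA,21:40
u3,Meera,Pune,TelcoA,21:59
u4,John,Pune,TelcoB,22:00
"""


def mapper(row, start="21:00", end="22:00"):
    """Emit (user_id, 1) for a call made at or after start and before end."""
    # Zero-padded HH:MM strings compare correctly as plain text.
    if start <= row["call_time"] < end:
        yield row["user_id"], 1


def shuffle(pairs):
    """Sort the mapper output by key and group the values for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reducer(grouped):
    """Count the distinct users who made at least one call in the period."""
    return len(grouped)


rows = csv.DictReader(io.StringIO(CALLS_CSV))
pairs = [pair for row in rows for pair in mapper(row)]  # data passed line by line
grouped = shuffle(pairs)
users_who_called = reducer(grouped)
print(users_who_called)
```

Grouping by user_id before counting means that a user who made several calls in the period, such as u1 here, is counted once.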
  307. This topic deals with the various technologies used for handling Big Data.
  308. In this session we will understand cloud computing & various in-memory technologies for handling Big Data.
  309. Cloud-based application platforms make computing resources easily available to an application and let you pay for these resources according to what and how much you use. In the context of cloud computing, this feature is called elasticity: you can acquire and release computing resources dynamically at the touch of a button and pay only for what you use.
  310. In cloud computing, all data is gathered in data centers and then distributed to the end-users. Further, automatic backup and recovery of data are ensured for business continuity. The primary reason Cloud and Big Data analytics complement each other is that Cloud, like Big Data, uses distributed computing.
  311. Amazon and Google are two large companies that need massive capability to manage huge amounts of data to run their businesses. They need infrastructure and technologies that can support their applications at a huge scale. Think of the millions of Gmail messages that Google needs to process every minute, or even every second. Google has been able to optimize the Linux OS and its software environment to support e-mail efficiently. It is able to capture and leverage massive amounts of data about its mail users and search engine users to drive its business. Similarly, Amazon, with its IaaS data centres, is optimized to facilitate massive workloads and to offer services and support to innumerable centers. Both these companies now offer a range of cloud-based services for Big Data as well.
  312. Scalability - Even if organizations increase the processing power of their hardware, they may need to change the architecture, and may face issues with running the software on the new hardware. Cloud provides the solution to this. It provides scalability by using distributed computing.   Elasticity - Cloud solutions allow customers to use and pay for the exact amount of cloud service they need, depending on the requirement; for example, a business may expect more data during an in-store promotion, and hence can buy more processing power during such times.   Resource Pooling - Multiple organizations using similar computing resources do not need to invest in them individually. Cloud can offer these resources, and as these resources are used by many, the cost to the cloud comes down.   Self-service - Customers can access cloud services directly through a user interface that allows them to choose the services they want. This is automated and does not need human intervention.   Low Costs - Businesses do not need to make a large initial investment in computing resources to handle huge operations such as Big Data analytics. They can sign up for a cloud service and pay as they use. In the process, the cloud provider enjoys economies of scale. This benefits the customers.   Fault Tolerance - If a part of the cloud fails, the other parts can take over and give customers uninterrupted service.
  313. Scalability - Even if organizations increase the processing power of their hardware, they may need to change the architecture, and may face issues with running the software on the new hardware. Cloud provides the solution to this. It provides scalability by using distributed computing.   Elasticity - Cloud solutions allow customers to use and pay for the exact amount of cloud service they need, depending on the requirement; for example, a business may expect more data during an in-store promotion, and hence can buy more processing power during such times.   Resource Pooling - Multiple organizations using similar computing resources do not need to invest in them individually. Cloud can offer these resources, and as these resources are used by many, the cost to the cloud comes down.   Self-service - Customers can access cloud services directly through a user interface that allows them to choose the services they want. This is automated and does not need human intervention.   Low Costs - Businesses do not need to make a large initial investment in computing resources to handle huge operations such as Big Data analytics. They can sign up for a cloud service and pay as they use. In the process, the cloud provider enjoys economies of scale. This benefits the customers.   Fault Tolerance - If a part of the cloud fails, the other parts can take over and give customers uninterrupted service.
  314. Scalability - Even if organizations increase the processing power of their hardware, they may need to change the architecture, and may face issues with running the software on the new hardware. Cloud provides the solution to this. It provides scalability by using distributed computing.   Elasticity - Cloud solutions allow customers to use and pay for the exact amount of cloud service they need, depending on the requirement; for example, a business may expect more data during an in-store promotion, and hence can buy more processing power during such times.   Resource Pooling - Multiple organizations using similar computing resources do not need to invest in them individually. Cloud can offer these resources, and as these resources are used by many, the cost to the cloud comes down.   Self-service - Customers can access cloud services directly through a user interface that allows them to choose the services they want. This is automated and does not need human intervention.   Low Costs - Businesses do not need to make a large initial investment in computing resources to handle huge operations such as Big Data analytics. They can sign up for a cloud service and pay as they use. In the process, the cloud provider enjoys economies of scale. This benefits the customers.   Fault Tolerance - If a part of the cloud fails, the other parts can take over and give customers uninterrupted service.
  315. Scalability - Even if organizations increase the processing power of their hardware, they may need to change the architecture, and may face issues with running the software on the new hardware. Cloud provides the solution to this. It provides scalability by using distributed computing.   Elasticity - Cloud solutions allow customers to use and pay for the exact amount of cloud service they need, depending on the requirement; for example, a business may expect more data during an in-store promotion, and hence can buy more processing power during such times.   Resource Pooling - Multiple organizations using similar computing resources do not need to invest in them individually. Cloud can offer these resources, and as these resources are used by many, the cost to the cloud comes down.   Self-service - Customers can access cloud services directly through a user interface that allows them to choose the services they want. This is automated and does not need human intervention.   Low Costs - Businesses do not need to make a large initial investment in computing resources to handle huge operations such as Big Data analytics. They can sign up for a cloud service and pay as they use. In the process, the cloud provider enjoys economies of scale. This benefits the customers.   Fault Tolerance - If a part of the cloud fails, the other parts can take over and give customers uninterrupted service.
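The elasticity and pay-as-you-use points can be made concrete with a little arithmetic. The sketch below is illustrative only: the hourly demand figures and the price per unit-hour are assumed values, not taken from any real provider. It compares paying for peak capacity all day with paying only for what each hour needs.

```python
# Hypothetical demand, in compute units, for each of 24 hours.
# The 4-hour spike models the in-store promotion from the notes.
hourly_demand = [4] * 10 + [20] * 4 + [4] * 10

# Assumed price in cents per compute unit per hour (illustrative only).
PRICE_CENTS_PER_UNIT_HOUR = 10


def fixed_capacity_cost(demand, price):
    """Cost when enough capacity for the peak hour is paid for all day."""
    return max(demand) * len(demand) * price


def elastic_cost(demand, price):
    """Cost when capacity follows demand hour by hour."""
    return sum(demand) * price


fixed = fixed_capacity_cost(hourly_demand, PRICE_CENTS_PER_UNIT_HOUR)
elastic = elastic_cost(hourly_demand, PRICE_CENTS_PER_UNIT_HOUR)
print(f"fixed: {fixed} cents, elastic: {elastic} cents")
```

With these assumed numbers the elastic bill is a third of the fixed one, because the peak lasts only 4 of the 24 hours.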
  318. A public cloud is owned and operated by an organization, for use by other organizations and individuals. A public cloud offers a range of computing services. For each category of service, it specializes in a specific type of workload. By specializing, the cloud can customize hardware and software to optimize performance. Customization makes the computing process highly scalable; for example, a cloud can specialize in storing videos for live streaming on YouTube or Vimeo and optimize to handle a large volume of traffic. For businesses, the public cloud provides economical storage and is an efficient way to handle complex data analysis. However, because the infrastructure is shared and accessed over the Internet, security risks and latency can be higher.
  319. A private cloud is owned and operated by an organization for its own purposes. Besides employees, the organization's partners and customers may also use the private cloud. A private cloud is designed for one organization and incorporates the systems and processes of that organization, including its business rules, governance policies, and compliance checks. Tasks that must be done manually in the public cloud, because multiple customers give different specifications, can be automated in the private cloud. A private cloud is thus highly automated and is also protected by a firewall. This reduces latency and improves security, making it well suited to Big Data analytics.
  320. Apart from being used for Big Data analytics, the Cloud is used for several purposes such as storage, backup, and customer services. As more people use computers on the go, business tasks have shifted to laptops and mobile devices and subsequently to the cloud. Consumers may order a product from their home, and the store receives the order and sends instructions to the warehouse, which delivers the product. The store could be using the cloud to receive the order and send instructions, as well as to handle payments and track deliveries. These tasks can also be done without using cloud computing, but cloud computing lowers infrastructure costs and provides scalable content storage.
  321. Infrastructure as a Service (IaaS): Infrastructure refers to hardware, storage, and network, and IaaS provides these as a service. When you pay to save your holiday photographs on a cloud, you use a public IaaS. When an employee saves a work report on the organization's backup server, the employee uses a private IaaS. Examples of IaaS are virtual machines, load balancers, and network-attached storage. A business can save investments in physical infrastructure by using a public cloud IaaS. The business can choose the OS, and IaaS allows the business to create virtual machines with scalable storage and processing power.
  322. Platform as a Service (PaaS): PaaS provides a platform on which users write and run their applications. The platform refers to the OS together with a collection of middleware services and software development and deployment tools. Examples of PaaS are Windows Azure and Google App Engine (GAE). When an organization has a private cloud PaaS, programmers in the business unit can create and deploy applications for their needs. PaaS makes it easier to experiment with new applications.
  323. Software as a Service (SaaS): SaaS provides software that can be accessed from anywhere. Customers can use software on the cloud without buying and installing it on their own devices. These software applications are offered on monthly or yearly contracts. For SaaS to work, the infrastructure (IaaS) and the platform (PaaS) must be in place. An organization can maintain a custom-developed application in its private cloud and link it to Big Data stored in a public cloud. In such a hybrid cloud, the application can efficiently analyze the data by using the strengths of both private and public clouds.
  324. Among the many established and new cloud service providers, some offer resources specifically for Big Data analytics. Let's look at a few of these.
• Amazon - Amazon's IaaS, called Elastic Compute Cloud (Amazon EC2), grew out of the massive infrastructure of computing resources the company had built for its own business, much of which was underused. So Amazon decided to rent these resources out and earn revenue from them. The word "elastic" in the name is justified because these resources can be scaled hour by hour.
  325. In addition to Amazon EC2, Amazon Web Services offers the following services:
• Amazon Elastic MapReduce - A Web service that provides cost-effective processing of vast amounts of data by using Amazon EC2 and Amazon Simple Storage Service (Amazon S3).
• Amazon DynamoDB - A NoSQL database service that stores data items on Solid State Drives (SSDs) and replicates data for high availability and durability.
• Amazon Simple Storage Service (S3) - A Web interface for storing data over the Internet and for Web-scale computing.
• Amazon High-Performance Computing - A low-latency, high-bandwidth network with the computational capability to solve problems from educational and business fields.
• Amazon Redshift - A petabyte-scale data warehouse service for analysing data cost-effectively by using existing business intelligence tools.
  330. Now let's look more closely at what Google offers in terms of services designed for Big Data:
• Google Compute Engine - A secure and flexible virtual machine computing environment.
• Google BigQuery - A Data as a Service (DaaS) offering that searches large datasets at high speed on the basis of queries written in SQL format.
• Google Prediction API - Identifies patterns in data, stores the patterns, and improves them with every use.
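To illustrate what "queries in the SQL format" look like in practice, here is a minimal sketch of an aggregate query over a small sales table. It uses Python's built-in sqlite3 module as a local stand-in and does not call BigQuery or any Google API; the table name, columns, and rows are hypothetical.

```python
import sqlite3

# Local in-memory database used purely as a stand-in for a hosted
# query service; nothing here talks to BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount INTEGER)")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120), ("south", 80), ("north", 30), ("east", 50)],
)

# Total sales per store, largest first.
rows = conn.execute(
    "SELECT store, SUM(amount) AS total FROM sales "
    "GROUP BY store ORDER BY total DESC"
).fetchall()
print(rows)
conn.close()
```

A hosted service would accept a query of the same general shape, but would run it across much larger datasets on the provider's infrastructure.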
  333. Next, let's see what Windows Azure is all about. Building on Windows and SQL abstractions, Microsoft has produced a set of development tools, virtual machine support, management and media services, and mobile device services in a PaaS offering. For customers with deep expertise in .NET, SQL Server, and Windows, adopting the Azure-based PaaS is straightforward. To address the emerging need to integrate Big Data into Windows Azure solutions, Microsoft has also added Windows Azure HDInsight. HDInsight is built on the Hortonworks Data Platform (HDP), which, according to Microsoft, offers 100 percent compatibility with Apache Hadoop, and it supports connection with Microsoft Excel and other Business Intelligence tools. In addition, Azure HDInsight can also be deployed on Windows Server.
  334. Large organizations store data in a central warehouse, and all users have to access it from there, usually through the IT department. In-memory technology makes it possible for departments or business units to take the part of the organizational data that is relevant to their needs and process it locally. This reduces the workload on the central warehouse, and users no longer need the IT department in order to work with the data.
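The idea of a department pulling only its own slice of the warehouse and working on it in local memory can be sketched as follows. This is a toy illustration under stated assumptions: the "warehouse" is a plain Python list, and the record fields and helper function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for the central warehouse.
central_warehouse = [
    {"dept": "marketing", "region": "north", "revenue": 100},
    {"dept": "marketing", "region": "south", "revenue": 40},
    {"dept": "sales", "region": "north", "revenue": 70},
    {"dept": "marketing", "region": "north", "revenue": 25},
]


def load_department_slice(records, dept):
    """Copy only the records relevant to one department into memory."""
    return [dict(r) for r in records if r["dept"] == dept]


def revenue_by_region(records):
    """Aggregate the local slice without going back to the warehouse."""
    totals = defaultdict(int)
    for record in records:
        totals[record["region"]] += record["revenue"]
    return dict(totals)


# One read from the warehouse; all later analysis runs on the local copy.
marketing = load_department_slice(central_warehouse, "marketing")
summary = revenue_by_region(marketing)
print(summary)
```

After the single load step, every further query runs against the local copy, which is what takes load off the central warehouse.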
  338. In this session, we discussed cloud computing and in-memory technology for handling Big Data.