7. Class 1: Introduction to Big Data
Understanding Big Data
Business Applications of Big Data
Technologies for handling Big Data
Big Data Management Systems – Databases & Warehouses
Analytics & Big Data
18. What is Big Data?
A pool of large-sized datasets to capture, store, search, share, transfer, analyse, and visualise related information or data within an acceptable elapsed time.
24. • Every second, consumers make 10,000 payment card transactions worldwide
• Every hour, Walmart handles more than 1 million customer transactions
• Every day, Twitter's users post 500 million tweets
• Every day, Facebook users post 2.7 billion likes and comments
27. BIG DATA
Is a new data challenge that requires leveraging existing systems differently
Is classified in terms of:
Volume (terabytes, records, transactions)
Variety (internal, external, behavioural, and/or social)
Velocity (near or real-time assimilation)
Is usually unstructured and qualitative in nature
32. Advantages of Studying Big Data:
• Understanding the target customer
• Cutting down expenditures in healthcare
• Increasing operating margins in retail
• Profits from improvements in operational efficiency
46. Departments that can Benefit:
• Procurement
• Product Development
• Manufacturing
• Distribution
• Marketing
• Price Management
• Merchandising
• Sales
• Store Operations
• Human Resources
47. Flu Indications & Warnings: Massive Data Collection → Analyse Collected Data → Early Warnings for Flu Plague
54. Topic 1
Class 1 - Introduction to Big Data
Understanding Big Data
57. Class 1 - Introduction to Big Data
What is Big Data?
Structuring & Elements
Application in Business & Careers
59. How do I keep myself updated on events and news?
Which news articles should I read?
How do I choose a book from the millions available on my favorite sites or stores?
How can I use the vast amount of data and information I come across?
71. Features of Structured Data:
• Has a predefined format
• Resides in fixed fields within a record
• Has its attributes mapped
• Used to report against predetermined data types
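The features above can be illustrated with a small relational example. This is a sketch only: the table, column names, and sample row are invented for illustration, using Python's built-in sqlite3 module.

```python
import sqlite3

# Structured data: every record fits a predefined schema with fixed,
# typed fields. Table and column names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "  id INTEGER PRIMARY KEY,"   # each attribute is mapped to a fixed type
    "  name TEXT NOT NULL,"
    "  email TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Sam Jacobs', 'smj@xyz.com')")

# Reporting against predetermined data types is straightforward:
rows = conn.execute("SELECT name, email FROM customers").fetchall()
print(rows)  # [('Sam Jacobs', 'smj@xyz.com')]
```

Because every record occupies the same fixed fields, queries and reports can rely on the schema without inspecting each record.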
83. Challenges of Using Unstructured Data:
• Difficult and time-consuming to make sense of
• Difficult to combine and link unstructured data to more structured information
• Adds cost in terms of wasted storage and the human resources needed
88. Sources of Semi-Structured data:
• Database systems
• File systems like Web data and bibliographic data
• Data exchange formats like scientific data
89. Example (semi-structured records):
Sl. No | Name | E-mail
1 | Sam Jacobs | smj@xyz.com
2 | First Name: David, Last Name: Brown | davidb@xyz.com
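The table above shows the defining trait of semi-structured data: records share a broad shape, but individual fields can vary. A minimal JSON-style sketch (record contents invented to mirror the table):

```python
# Semi-structured data: record 1 stores the name as a single string,
# while record 2 splits it into first and last name. JSON-like
# structures accommodate both shapes in the same collection.
records = [
    {"sl_no": 1, "name": "Sam Jacobs", "email": "smj@xyz.com"},
    {"sl_no": 2,
     "name": {"first_name": "David", "last_name": "Brown"},
     "email": "davidb@xyz.com"},
]

def display_name(record):
    # A consumer must cope with both shapes when extracting a value.
    name = record["name"]
    if isinstance(name, dict):
        return f"{name['first_name']} {name['last_name']}"
    return name

print([display_name(r) for r in records])  # ['Sam Jacobs', 'David Brown']
```

This is exactly why semi-structured sources such as web data and data exchange formats need more flexible handling than a fixed relational schema.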
107. Major Big Data Hiring Companies:
Product companies, e.g., Oracle
Technology drivers, e.g., Google
Services companies, e.g., EMC
Data analytics companies, e.g., Splunk
110. The most common job titles in Big Data include:
Big Data Analyst
Big Data Scientist
Big Data Developer
116. Module 1: Introduction to Big Data

Big Data Analyst Certification Track:
Module 2: Introduction to Analytics & R Programming
Module 3: Data Analysis Using R
Module 4: Advanced Analytics Using R
Module 5: Machine Learning Concepts
Module 6: Social Media, Mobile Analytics & Visualisation
Module 7: Industry Applications of Big Data

Big Data Developer Certification Track:
Module 2: Managing a Big Data Ecosystem
Module 3: Storing & Processing Data: HDFS & MapReduce
Module 4: Increasing Efficiency with Hadoop Tools
Module 5: Additional Hadoop Tools: ZooKeeper, Sqoop, Flume, YARN & Storm
Module 6: Leveraging NoSQL & Hadoop: Real Time, Security & Cloud
Module 7: Commercial Hadoop Distribution & Management Tools

Complete Project → Wrox Certified Big Data Analyst/Developer
124. Technical Skills Required for a Big Data Analyst:
• Handling & analysing massive data sets using MapReduce
• Hadoop & its components HBase & Hive
• SQL and NoSQL query languages such as Impala, Hive and Pig
• Analytical tools such as SAS, R and Tableau
• Statistical techniques to implement text analytics solutions
• Data handling and manipulation techniques
• Generating client-ready dashboards, reports and visualisations
129. RECAP
What are the various types and structures of Big Data, and the elements that form it
What are the business applications of Big Data, and the career opportunities associated with them
140. Topic 2
Business Applications of Big Data
Significance of Social Network Data
Financial Fraud & Big Data
Fraud Detection in Insurance
Use in Retail Industry
144. Significance of Social Network Data
What is Social Network Data?
What is Social Network Analysis?
What are the uses of Social Network
Data Analysis?
What is Sentiment Analysis?
184. Social Network Data Analysis
• Provides new contexts in which decisions are data driven, not opinion driven
• Enables organizations to shift goals to maximizing the profitability of a customer's network
• Enables organizations to identify highly connected customers
186. Social Network Data Analysis
• Enables organizations to lure highly connected customers with free trials and solicit their feedback
• Enables organizations to encourage internal customers to become more active
225. Traditional fraud analysis:
• With a small data sample, analysts can understand the various patterns of fraud
• With a large data sample, traditional tools cannot make sense of the patterns
• The sample size could not be increased, as that required huge investments of time and money
• Big Data techniques can overcome this challenge
226. Big Data analytics can:
• Run checks on all data to identify fraudulent records
• Identify new methods of fraud and add them to the set of fraud-prevention checks
• Avoid impeding customers with unnecessary policies and governance structures
227. Fraud Detection in Real Time: Big Data combines live transactions with other sources of data.
231. Topic 2
Business Applications of Big Data
The Significance of Social Network Data
Financial Fraud and Big Data
Fraud Detection in Insurance
Use of Big Data in the Retail Industry
232. An Insurance Company:
• Incurs a steady increase in the cost of litigation and fraudulent claims
• Its underwriters do not have the required data at the right time to make the necessary decisions, further delaying processing
• Wants to improve its ability to make decisions in real time when processing a new claim, thereby reducing the claim cycle time
234. Social Media Triggers to Identify Fraud
• In the claim, a customer might indicate that his or her car was destroyed in a flood
• Documentation from the social media feed shows that the car was actually in another city on the day the flood occurred
Such glaring discrepancies reflect FRAUD.
235. Insurance Frauds
• Have a huge cost implication for organizations
• Organizations therefore prefer using Big Data analytics and other advanced technologies
• Reducing fraud has a positive impact on customers, since fraud losses are otherwise transferred to customers as higher premiums
237. Fraud Detection Methods: Statistical Models
• Typically analyze small samples of data
• Rely on previously recorded fraud cases, so every time a fraud based on a new technique occurs, insurance companies bear the consequences and losses for the first time
• The traditional method of identifying fraud works in independent silos
• It cannot handle sources of information from different channels and different functions in an integrated way
239. Social Network Analysis (SNA)
• Big Data can be used to create visibility into blind spots for businesses
• SNA is an innovative and effective way to identify and detect fraud
241. An SNA tool uses a mix of analytical methods:
• Statistical methods
• Pattern analysis
• Link analysis
242. When link analysis is used in fraud detection, it:
• Looks for clusters of data
• Examines how those data clusters are linked to other data clusters
• Integrates public records and various other data sources into a model
• Lets the insurer rate claims
243. When link analysis is used in fraud detection, a high rating indicates that the claim is likely fraudulent, for example:
• a known bad address
• a suspicious provider
• a vehicle involved in many accidents with multiple carriers
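The rating step can be sketched in a few lines. This is a toy illustration, not an actual SNA product's algorithm: the claims, shared-attribute linking rule, and red-flag counts are all invented.

```python
from collections import defaultdict

# Toy link analysis: claims that share an address or provider are linked,
# and a claim's rating grows with the red flags found in its cluster.
claims = [
    {"id": "C1", "address": "12 Elm St", "provider": "P9", "red_flags": 2},
    {"id": "C2", "address": "12 Elm St", "provider": "P1", "red_flags": 0},
    {"id": "C3", "address": "7 Oak Ave", "provider": "P2", "red_flags": 0},
]

# Link any two claims that share an attribute value (a crude cluster).
links = defaultdict(set)
for a in claims:
    for b in claims:
        if a["id"] != b["id"] and (
            a["address"] == b["address"] or a["provider"] == b["provider"]
        ):
            links[a["id"]].add(b["id"])

by_id = {c["id"]: c for c in claims}

def rating(claim_id):
    # Own red flags plus the red flags of every linked claim.
    own = by_id[claim_id]["red_flags"]
    linked = sum(by_id[l]["red_flags"] for l in links[claim_id])
    return own + linked

scores = {c["id"]: rating(c["id"]) for c in claims}
print(scores)  # C2 inherits C1's flags through the shared address
```

The point the slide makes survives even in this toy: a claim with no red flags of its own (C2) still rates high because it is linked to a suspicious claim.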
245. How much unrequired data is present when it arrives?
246. How deep should the analysis go before it yields the most accurate results?
247. What type of user interface
components need to be included
on the SNA dashboard?
248. The SNA method to detect fraud:
• Structured and unstructured data from various sources is fed into an ETL (Extract, Transform, and Load) tool
• The data is then transformed and loaded into a data warehouse
• The analytics team uses information from various sources to score the risk of fraud and rank the likelihood of fraud
• The information used can come from varied sources: prior belief, previous relationships, number of rejected claims, etc.
• Big Data technologies (text mining, sentiment analysis, content categorization, and social network analysis) are included in the fraud detection and predictive modeling mechanism
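The scoring-and-ranking step above can be sketched as a weighted sum over features drawn from those varied sources. The weights, feature names, and claims below are invented for illustration; a real system would derive them from the warehouse and the organization's business rules.

```python
# Hypothetical weights per fraud indicator (illustrative only).
WEIGHTS = {"prior_fraud": 3.0, "rejected_claims": 1.5, "negative_sentiment": 0.5}

claims = [
    {"id": "A", "prior_fraud": 1, "rejected_claims": 2, "negative_sentiment": 0},
    {"id": "B", "prior_fraud": 0, "rejected_claims": 0, "negative_sentiment": 1},
]

def fraud_score(claim):
    # Combine evidence from the different sources into one risk score.
    return sum(WEIGHTS[feature] * claim[feature] for feature in WEIGHTS)

# Rank claims by likelihood of fraud, most suspicious first.
ranked = sorted(claims, key=fraud_score, reverse=True)
print([(c["id"], fraud_score(c)) for c in ranked])  # [('A', 6.0), ('B', 0.5)]
```

The ranked list is what feeds the alerting step described on the next slide: claims above a threshold generate alerts for investigators.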
249. The SNA method to detect fraud:
• Depending on the score of a particular network, an alert is generated
• Investigators can leverage this information and begin researching the fraudulent claim further
• Identified fraud issues are added to the case system
250. Predictive analysis works on the principle that the earlier the fraud is detected, the smaller the loss incurred by a business.
253. Predictive Analytics Technology
• Claim adjusters write lengthy reports while investigating a claim
• Clues are hidden in these reports that the claims adjuster might not notice
• A computing system based on business rules highlights clues of possible fraud
• The fraud detection system spots these discrepancies and flags the claim as fraudulent
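A minimal sketch of such a business-rules system: scan an adjuster's report for phrases an insurer might treat as fraud clues. The rules and the report text are invented for illustration.

```python
import re

# Hypothetical clue phrases an insurer's rules engine might look for.
RULES = [
    r"no police report",
    r"cash only",
    r"prior claim",
]

report = (
    "Claimant requested cash only settlement. "
    "There was no police report filed at the scene."
)

# Highlight every rule that matches the report text.
clues = [rule for rule in RULES if re.search(rule, report, re.IGNORECASE)]
flagged = len(clues) >= 2  # flag the claim when several clues co-occur
print(clues, flagged)
```

Real systems add weighting, context, and text-mining on top of simple keyword rules, but the core idea is the same: surface the buried clues so a human investigator can act on them.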
256. The following briefly describes how a Social CRM process works:
• Uses the organization's existing CRM to gather data from various social media platforms
• Uses a "listening" tool to extract data from social chatter, which acts as reference data for the existing data in the organization's CRM
• The reference data, along with the information stored in the CRM, is fed into a case management system
• The case management system analyzes the information on the basis of the organization's business rules and sends a response
• The response from the case management system on a fraudulent claim is confirmed by investigators
257. Class 1: Introduction to Big Data
The Significance of Social Network Data
Financial Fraud and Big Data
Fraud Detection in Insurance
Use of Big Data in Retail Industry
259. Use of Big Data in the Retail Industry
How many basic tees did we sell today?
What time of the year do we sell the most leggings?
What else has customer X bought?
What kind of coupons can we send to customer X?
264. Most Big Data is simply not required, and not useful either:
• some information will have long-term strategic value
• some will be useful only for immediate and tactical use
• some data won't be used for anything at all
265. Use of RFID (Radio Frequency Identification) Data in Retail
An RFID tag is a small tag that carries a unique code to identify a product, much like a UPC code. The tag is placed on shipping pallets or product packages.
266. In addition to a bar code, an RFID tag:
• Identifies a pallet as allotted to a precise and exclusive set of computer systems
• Helps find situations where items have no units left in store
• Specifies the number of units of each item remaining in store, and thereby raises an alarm when restocking is required
• Enables better tracking of products by differentiating products that are out of stock from products that are available on the shelf
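The restocking alarm can be sketched as a per-item shelf count that RFID reads decrement, with an alert when units left reach a threshold. Item codes, counts, and the threshold are illustrative assumptions.

```python
# Shelf inventory keyed by product code (values invented for the sketch).
RESTOCK_THRESHOLD = 2
shelf_units = {"UPC-001": 3, "UPC-002": 5}

def record_sale(item, alerts):
    # Each sale (detected via an RFID read) decrements the shelf count;
    # an alert fires once the count reaches the restocking threshold.
    shelf_units[item] -= 1
    if shelf_units[item] <= RESTOCK_THRESHOLD:
        alerts.append(f"restock {item}: {shelf_units[item]} left")

alerts = []
for sold in ["UPC-001", "UPC-001", "UPC-002"]:
    record_sale(sold, alerts)

print(alerts)  # ['restock UPC-001: 2 left', 'restock UPC-001: 1 left']
```

Because the count is maintained per item, the system can also distinguish "out of stock" (zero units) from "available on the shelf", which is the tracking benefit the slide describes.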
267. Use of RFID Data in Retail
• saves time
• reduces labor
• enhances the visibility of
products throughout the
production-delivery life cycle
• saves costs
268. RECAP
What is the significance of Social Network Data, Financial Fraud & Big Data, and Fraud Detection in Insurance
What are the uses of Big Data in the retail industry, RFID data and its advantages
272. Topic 3
Class 1 - Introduction to Big Data
Technologies for Handling Big Data
273. Distribution & Computing for Big Data
Topic 3 – Technologies for Handling Big Data
Introducing Hadoop
Cloud Computing & In-Memory Technologies
for Big Data
308. Features of Hadoop:
• Works on multiple machines without sharing memory
• Distributes data over different servers
• Can track data stored on different servers
• Runs all available servers in parallel
• Keeps multiple copies of data
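The "distributes data over different servers" and "keeps multiple copies" features can be sketched as a block-placement routine. This is an illustration only: the round-robin policy, server names, and block IDs are invented (a replication factor of 3 is HDFS's default, but real Hadoop placement is rack-aware, not round-robin).

```python
# Assign each data block to several distinct servers so that losing one
# server still leaves copies of every block elsewhere.
REPLICATION = 3
SERVERS = ["server-1", "server-2", "server-3", "server-4"]

def place_blocks(blocks):
    """Assign each block to REPLICATION distinct servers, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [
            SERVERS[(i + r) % len(SERVERS)] for r in range(REPLICATION)
        ]
    return placement

placement = place_blocks(["blk_1", "blk_2"])
print(placement)
# blk_1 lands on server-1..3 and blk_2 on server-2..4, so any single
# server failure still leaves two copies of every block.
```

The placement map is also what lets the system "track data stored on different servers": given a block, it knows exactly which machines hold a copy.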
321. How does Hadoop work?
• Data of an organisation is loaded into the Hadoop software
• Data is divided into different pieces & sent to different servers
• Hadoop keeps track of the data by sending a job code
to all the servers that store the relevant piece of data
• Each server applies the job code to the portion of
data stored on it and returns results
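The four steps above can be sketched in plain Python, with list slices standing in for servers and a word count standing in for the job code. The data and the 3-server split are invented for illustration; real Hadoop does this across machines via HDFS and MapReduce.

```python
from collections import Counter

# Step 1: the organisation's data is "loaded" (here, a list of records).
data = ["big data big", "data hadoop", "hadoop big"]

def split(records, n_servers):
    """Step 2: divide the data into pieces, one per server."""
    return [records[i::n_servers] for i in range(n_servers)]

def job_code(piece):
    """Steps 3-4: each server applies the same job code to its portion."""
    counts = Counter()
    for record in piece:
        counts.update(record.split())
    return counts

# Each "server" processes only its own piece, in parallel in real Hadoop.
partial_results = [job_code(piece) for piece in split(data, 3)]
total = sum(partial_results, Counter())  # combine per-server results
print(total)  # Counter({'big': 3, 'data': 2, 'hadoop': 2})
```

The key property this models is that the job code travels to the data rather than the data travelling to one central machine; only the small per-server results are shipped back and combined.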
322. Indexing Job example: the Hadoop software sends Job Codes 1, 2 and 3, together with the data-processing task, to Servers 1, 2 and 3; each server processes its portion of the data and returns its result.
360. In-memory technology makes it possible for departments or business units to take the part of the organizational data that is relevant to their needs and process it locally.
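The idea can be sketched as taking only one department's slice of the organizational data into local memory and computing on it there. The dataset and department names below are invented for illustration.

```python
# A tiny stand-in for the organizational dataset.
org_data = [
    {"dept": "marketing", "spend": 120},
    {"dept": "sales", "spend": 90},
    {"dept": "marketing", "spend": 80},
]

def local_copy(department):
    """Take only the rows relevant to one department into memory."""
    return [row for row in org_data if row["dept"] == department]

marketing = local_copy("marketing")          # local, in-memory slice
total_spend = sum(row["spend"] for row in marketing)
print(total_spend)  # 200
```

Because the slice lives in the business unit's own memory, repeated queries against it avoid round trips to the central store, which is the speed advantage in-memory technologies offer.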
361. RECAP
In this session we discussed cloud computing &
various in-memory technologies for handling Big Data.
Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.
Class 1 gives you an overview of Big Data. It introduces the concept and gives a broad overview of the business applications of Big Data. In addition, the module gives a broad understanding of the technology infrastructure that is required to store, handle, and manage Big Data. Finally, the module delves a little deeper into the Hadoop ecosystem as well as the MapReduce framework and explains how these popular frameworks support Big Data management. We’ll understand this a lot more in detail as we explore these topics.
The first topic we will discuss today is what Big Data is, along with its advantages and sources.
We will divide Class 1 into four broad categories: the history & evolution of Big Data; the structuring of Big Data & the elements that comprise it; Big Data's application in business analytics; and lastly the career opportunities associated with studying Big Data.
If you think of the world around you, there is an enormous amount of data generated, captured, and transferred through various media—within seconds. This data may come from a personal computer, social networking sites, transaction or communication system of an organization, ATMs, and multiple other channels.
Some reports have recorded that in 2002, there was an estimated 5 exabytes of online data in existence. Each exabyte is a massive 1,000,000 terabytes (TB). By 2009, that number had risen to 281 exabytes, a roughly 56-fold increase, and it has multiplied exponentially since 2009. This data is created in the form of posts, pictures, videos, and weather information.
This accumulation results in a continuous generation of an enormous volume of data, which if analyzed intelligently, can be of immense value, as it can give us a variety of critical information to make smarter decisions. In other words, careful analysis can transform this data to information, and information to insight.
The need to analyze and offer this critical data in a systematic and comprehensive manner leads to the rise of a much discussed term … and the pivot of this course —Big Data.
Big Data is a pool of large-sized datasets to capture, store, search, share, transfer, analyze, and visualize related information or data within an acceptable elapsed time.
Big Data assimilation is the process of examining large amounts of data to gain insight.
As data continues to grow, it needs to be organized and made available so that it can be used as an information source. Earlier, due to a lack of access and the means to process data, the potential of Big Data remained mostly untapped.
There are 3 main factors to consider when talking about Big Data, so let's take a quick look at each of them.
First, it's a new kind of data, and a challenge, since it requires leveraging existing systems differently.
Second, it is classified in terms of Volume, Variety and Velocity. Volume refers to the amount of data, whereas Variety refers to the type: internal, external, behavioural or social. The third classification, Velocity, refers to its assimilation, that is, how near or real-time it is. We will look at these concepts in more detail in later classes.
Lastly, Big Data is largely unstructured and qualitative in nature.
Big Data is a new kind of challenge because besides its enormous implications, its significance is constantly increasing with the growth in data.
Today, Big Data can mean anything from a single terabyte to a petabyte or an Exabyte of data.
The systematic study of Big Data across sectors and geographies can lead to results such as:
Understanding target customers better
Cutting down of expenditures in the healthcare sector
Increase in operating margins for the retail sector
Several billions of dollars being saved by improvements in operational efficiency
Across industries, data combined with analytics can transform major business processes in ways such as:
Improving performance in sports by analyzing and tracking performance and behavior
Improving science and research
Improving security and law enforcement by enabling better monitoring
Improving financial trading by making more informed decisions
Across organizations, the right analysis of available data can transform major business processes in various ways like in …
Google applied its massive data-collecting power to raise warnings about flu outbreaks approximately two weeks ahead of the existing public health services. To do this, Google monitored millions of users' search behavior, following clusters of queries on themes such as flu symptoms, chest congestion, and thermometer purchases. Google analyzed this data and generated consolidated results that gave strong indications of flu levels across America.
Besides the more obvious reference to volume, Big Data is also so called because of the various types and sources of data. Let's look at some of these source types and their usage.
Think of social data from sources like Facebook or Twitter, and how much it can tell us about the people using them and their behavioral patterns. Or data like GPS outputs, which can track our movements across the globe; that's machine data. Or transactional data from when we order a new pair of shoes online, or when we buy a pizza.
The need for Big Data is evident. If leaders and economies want exemplary growth and wish to generate value for all their stakeholders, Big Data has to be embraced and used extensively.
Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.
The first topic we will discuss today is what Big Data is, along with its advantages and sources.
Now we will look at the structuring and elements of Big Data.
In your daily life, you may have come across questions like:
Today, computers can answer such questions. Recommendation systems can analyze and structure a large amount of data specifically for you, on the basis of what you searched for, what you looked at, and for how long, thus scanning and presenting customized information that matches your behavior and habits.
This is called structuring of data. It is what comes into play when your favorite shopping site presents you with a well-picked set of recommendations when you log in: technology is used to study and analyze data to understand user behavior, requirements, and preferences, and to make personalized recommendations for every individual.
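As a minimal sketch of this idea (not any particular site's actual algorithm), one simple approach recommends items that most often co-occur with what the user has already viewed:

```python
from collections import Counter

def recommend(history, all_sessions, top_n=3):
    """Suggest items that most often co-occur with items the user viewed.

    history: set of items the current user has viewed.
    all_sessions: list of item sets, one per past browsing session.
    """
    scores = Counter()
    for session in all_sessions:
        if history & session:           # this session shares an item with the user
            for item in session - history:
                scores[item] += 1       # count each co-occurring item
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical browsing sessions for illustration.
sessions = [{"shoes", "socks"}, {"shoes", "laces"}, {"shoes", "socks", "polish"}]
print(recommend({"shoes"}, sessions))   # "socks" co-occurs twice, so it ranks first
```

Real recommenders use far richer signals (dwell time, ratings, collaborative filtering), but the core step of aggregating behavior across many users is the same.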
Data that comes from multiple sources—such as databases, Enterprise Resource Planning (ERP) systems, weblogs, chat history, and GPS maps—varies in its format. However, different formats of data need to be made consistent and clear to be used for analysis.
Data acquired from various sources can be categorized primarily into the following types of sources:
Internal sources, such as organizational or enterprise data, which can be used to support the business operations of an organization.
And external sources, such as social data from the Internet or the government, which can be analyzed to formulate policy and to understand the market, the environment, or technology.
Have a look at the table on your screen. You’ll see that sources can be internal or external, but they usually provide 3 kinds of data …
It's when all three kinds of data come together that we can actually visualize what Big Data is. You'll note that unstructured data is typically larger in volume than structured and semi-structured data. Let's take a closer look at each of these data types.
Structured data can be defined as a set of data with a defined repeating pattern. This pattern makes it easier for any program to sort, read, and process it. Obviously, processing of structured data is much faster than the processing of data without specific repeating patterns.
Let's take a quick look at a sample of structured data, in which the attribute data for every customer is stored as individual data points in defined fields. From this, let's try to derive a few features of structured data. It:
Is organized data in a predefined format
Is data that resides in fixed fields within a record or file
Is formatted data that has entities and their attributes mapped
Is used to query and report against predetermined data types
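As a hypothetical illustration (the customer fields here are invented), the fixed, repeating fields of structured data are exactly what make querying and reporting straightforward:

```python
import csv
import io

# Hypothetical customer records: every row follows the same predefined fields.
raw = """customer_id,name,city,spend
101,Asha,Mumbai,2500
102,Ravi,Delhi,1200
103,Meera,Pune,3100
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Because the fields are fixed and typed consistently, a query is a simple filter:
big_spenders = [r["name"] for r in rows if int(r["spend"]) > 2000]
print(big_spenders)  # ['Asha', 'Meera']
```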
Unstructured data is a set of data with a complex structure that might or might not have a repeating pattern. It:
Typically consists of metadata
Comprises inconsistent data
Includes data in different formats, such as e-mails, text, audio, video, or image files
Some sources for unstructured data include:
Text internal to an organization: think of documents, logs, e-mails, etc.
Data from Social Media
And Mobile Data
A fantastic example of the use of unstructured data is in supermarkets, where unstructured visual information from CCTV footage (where customers halt, how they behave during a bottleneck, how they navigate through the store) is combined with structured data from billing counters and product records to arrive at a complete data-driven picture of customer behavior. This can be used to create a better shopping experience for the customer and, of course, to generate more sales for the store.
About 80 percent of enterprise data consists of unstructured content. Unstructured systems typically have little or no predetermined form and give users wide scope to structure data as they choose. This makes unstructured data a weapon of choice for gaining considerable competitive advantage, and for building a more holistic picture of future prospects.
The table on your screen shows the result of a survey conducted to ascertain the challenges associated with unstructured data. The survey reveals that the volume of data is the biggest challenge followed by the infrastructure requirement to manage this volume. Managing unstructured data is also difficult because it is not easy to identify it.
Semi-structured data, also described as schema-less or self-describing, refers to a form of structured data that contains tags or markup elements to separate semantic elements and generate hierarchies of records and fields within the data. This type of data does not follow the strict data models of relational databases.
To be organized, semi-structured data is typically fed electronically from database systems, file systems, and data exchange formats, including scientific data formats and XML, the eXtensible Markup Language. XML enables data to have an elaborate and intricate structure that is significantly richer, though comparatively complex.
An example of semi-structured data is shown on your screen. It indicates that entities belonging to the same class can have different attributes, even when they are grouped together.
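A hypothetical XML fragment can make this concrete: both records below are person elements of the same class, yet each carries a different set of child fields, which is exactly the semi-structured pattern described above:

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured data: both entities are <person>,
# but they do not share the same set of attributes.
doc = """
<people>
  <person><name>Asha</name><email>asha@example.com</email></person>
  <person><name>Ravi</name><phone>555-0100</phone><city>Delhi</city></person>
</people>
"""

# The markup tags are self-describing, so each record can be read
# without a fixed schema; the field set simply varies per record.
people = [{child.tag: child.text for child in person}
          for person in ET.fromstring(doc)]
for fields in people:
    print(fields)
```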
Now that we have examined the way data arrives and is presented, let us move on to the elements that characterize this data.
Big Data primarily consists of the following three elements:
Volume
Velocity
Variety
Lets now take a more detailed look at each of these elements.
Volume is the amount of data generated by organizations or individuals. Today, the volume of data is approaching exabytes. Some experts predict the volume of data to reach zettabytes in the coming years.
Think about the numbers: Google processes around 20 petabytes in a single day, while Twitter feeds generate around 80 MB per second!
Velocity describes the rate at which data is generated, captured, and shared. Enterprises can capitalize on data only if it is captured and shared in real-time.
Existing systems such as CRM and ERP struggle with the speed at which data accumulates; it adds up continuously and cannot be processed quickly. These systems process data in batches every few hours, but the time lag causes the data to lose its value while new data is constantly being generated. eBay, for example, analyzes 5 million transactions per day in real time to address fraud arising from the use of PayPal!
A pool of data from social, machine, and mobile sources continues to add new types and varieties of data to traditional transactional data. As a result, data is no longer organized in any predefined form; it now includes weblog data, machine data, mobile data, sensor data, social data, and text data.
In this section we will look at the application of Big Data in business analytics, as well as career prospects in Big Data.
Now we will study in detail the application of Big Data in Business Analytics.
Data, which is available in abundance, can be streamlined and exploited for growth and expansion in technology as well as businesses. When data is analyzed successfully, it can be the answer to an important question: how can businesses acquire more customers and gain business insight? The key lies in being able to source, link, understand, and analyze data.
Take a look at this table highlighting the different business areas that have benefited from using Big Data, and the proportion of each.
Let's now take a quick look at the businesses and industries that are affected by, and benefit from, Big Data analytics. Sectors such as computer and electronic products and IT have experienced tremendous growth in sales, while sectors such as finance, insurance, and government have developed more accurate assessment techniques.
Big Data has transformed transportation by providing improved traffic information and autonomous features.
Big Data has transformed the modern-day education process through innovative approaches that help teachers analyze each student's ability to comprehend, and thus impart education effectively according to each student's needs.
The travel industry, too, is using Big Data to conduct business. Most airlines are working toward customer satisfaction by doing more to remember personal preferences. Such customization goes way beyond mileage-rewards-based loyalty programs. Airline companies also apply analytics to pricing, inventory, and advertising to improve customer experiences, which leads to more customer satisfaction and, hence, more business. A similar story can be seen in the hotel industry as well.
The study and analysis of available data is allowing governments to make informed decisions for fraud management, discover unknown threats, ensure the security of the global supply chain by monitoring global cargo traffic, use budgets more judiciously, analyze risks, and much more.
In healthcare, physicians can make use of Big Data to determine the best clinical protocols that will ensure the best health outcome for patients.
Now that you know that Big Data is really BIG in today’s world, you can well understand that so are the opportunities associated with it.
Qualified and experienced Big Data professionals must have a blend of technical expertise, creative and analytical thinking, and communication skills, to be able to effectively collate, clean, analyze, and present information extracted from Big Data.
Most jobs in Big Data are from companies that can be categorized into the following four broad buckets:
1. Big Data technology drivers, e.g., Google
2. Big Data product companies, e.g., Oracle
3. Big Data services companies, e.g., EMC
4. Big Data analytics companies, e.g., Splunk
The flowchart should give you a fairly accurate idea of the step-by-step progress you can expect from a Big Data certification program, either as an analyst or as a developer.
A Big Data analyst should possess the following technical skills:
Understanding of Hadoop, Hive, and MapReduce
Knowledge of natural language processing
Knowledge of statistical analysis and analytical tools
Knowledge of conceptual and predictive modeling
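Hadoop itself is covered in later classes; as a rough illustration of the MapReduce idea behind it, here is a plain-Python sketch of the classic word count. A map step emits (word, 1) pairs, and a reduce step groups the pairs by word and sums each group:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # Shuffle: group values by key; reduce: sum each group.
    groups = defaultdict(int)
    for word, count in pairs:
        groups[word] += count
    return dict(groups)

lines = ["big data is big", "data is everywhere"]
pairs = [p for line in lines for p in map_phase(line)]
print(reduce_phase(pairs))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a real Hadoop cluster the map and reduce phases run in parallel across many machines, which is what makes the pattern scale to Big Data volumes.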
Organisations look for professionals who possess good logical and analytical skills, along with good communication skills and an affinity for strategic business thinking.
The preferred soft skills for a Big Data professional are:
Most organizations today consider data and information to be their most valuable and differentiating asset, second only to their employees.
By analyzing this data effectively, organizations worldwide are finding new ways to compete and emerge as leaders in their fields, to improve decision-making, and to enhance performance. At the same time, with the volume and variety of data increasing at an immense speed every day, the global phenomenon of using Big Data to gain business value and competitive advantage will only continue to grow.
In this class, we’ll look at the significance of social network data in the business context.
The previous class gave you a broad idea about “Big Data” and how it affects our lives. In a sense, the data is only as good as the insights provided by it.
Human beings are social animals and cannot live in isolation. A human being gains knowledge, learns to communicate and think, work and play, by living in a social environment.
Today, socialization is not restricted to meeting and communicating with others in person. The usage of mobile phones and the Internet has made communication across the globe fast and easy. These also make socialization and the sharing of information both affordable and easily accessible.
Twitter, Facebook, and LinkedIn are currently some of the most popular social networking sites; together, they comprise social media. This session analyzes the Big Data generated by social media and its implications for various industries.
In this topic we will understand:
- The significance of social network data
- Financial fraud and Big Data
- Fraud detection in insurance
- The use of Big Data in the retail industry
In this session, we'll look at what social network data is and what social network analysis is. We will also address the uses of social network data analysis and, lastly, what sentiment analysis is.
Social network data is the data generated when people socialize or communicate through social media.
As you can see, on social networking sites numerous people constantly add and update comments, likes, preferences, sentiments, and feelings, thereby generating huge amounts of data. When mined and analyzed, this data reveals collective views and trends regarding the likes and dislikes, wants, and preferences of a large population.
This collective data can also be segregated and analyzed in terms of various groups of people, such as people belonging to various age groups, genders, and locations around the world. This information enables organizations to design and tailor products and services that people want. Such is the importance of social network data.
Have a look at this image.
Now let’s look at what is Social Network Analysis?
Social Network Analysis or SNA refers to the analysis of the data generated in social networks. As the data used is massive, it leads to a Big Data situation.
Let’s consider an example of a mobile network operator to understand the value of social network data. The complete set of cell phone calls or text message records captured by an MNO is huge data. Such data is routinely used for a variety of purposes – to possibly tailor offers to the customer, or provide relevant services that the customer routinely uses.
In this example, we will see how data analysis goes up a notch by looking at several degrees of association instead of just one. That's how social network analysis can turn a simple data source into a Big Data source.
This figure represents a caller’s social network. It is possible to go as many layers deep as the analysis can handle to get the complete picture of a social network. The need to go deeper from customer to customer and call to call for several layers makes the volume of data massive. It also increases the difficulty in analyzing it, particularly when it comes to using traditional methods.
It works the same way within social networking sites. When analyzing a member of a social network, it isn’t that hard to identify how many connections a member has, how often messages are posted, and other standard metrics.
However, knowing how wide a network a member has once you include friends, friends of friends, and friends of friends of friends is a lot more work: it is a Big Data problem.
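The friends-of-friends expansion described above can be sketched as a breadth-first traversal. The graph and names below are hypothetical; real social graphs are far too large for a single machine, which is exactly what makes this a Big Data problem.

```python
from collections import deque

def network_reach(graph, member, max_depth):
    """Count how many people are reachable from `member` within
    `max_depth` hops (friends, friends of friends, ...)."""
    seen = {member}
    queue = deque([(member, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue
        for friend in graph.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return len(seen) - 1  # exclude the member themselves

# Hypothetical friendship graph
graph = {
    "ana": ["ben", "cara"],
    "ben": ["ana", "dev"],
    "cara": ["ana", "dev", "eli"],
    "dev": ["ben", "cara"],
    "eli": ["cara"],
}
print(network_reach(graph, "ana", 1))  # direct friends: 2
print(network_reach(graph, "ana", 2))  # friends of friends too: 4
```

Each extra layer multiplies the number of edges to visit, which is why going several layers deep across millions of members quickly outgrows traditional tools.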
What are the uses of Social Network Data Analysis?
By using social network data analysis, decision-making can be improved in the following areas:
Business Intelligence
Marketing
Product Design and Development
Let's look at each of these in a little more detail.
Let's look in detail at how it helps in Business Intelligence.
You can analyze data generated from social networks to get some high value business insights.
Social Customer Relationship Management (CRM) is a buzzword these days. This analysis is capable of changing the perspective with which organizations value their customers. Rather than considering a single customer’s value, now it is possible to evaluate the value of a customer’s overall network.
Let’s consider the example of a mobile service provider which has a relatively low-value customer as a subscriber. The customer has a basic call plan, which does not generate any additional revenue. The customer is barely profitable. The service provider would traditionally have valued this customer on the basis of his or her individual account and hence may not have been too worried if the customer had wanted to leave.
With social network analysis, however, it is possible to identify that the same customer can influence the people in his or her network who are heavy users and who have a wide network of friends. This may persuade the company to make an altogether different business decision and value the customer more.
This may also be because studies have shown that once a member of a calling circle leaves, others are likely to follow and leave as well. Using social network analysis, it is possible to understand the potential value that a customer can influence, rather than only the revenue he or she directly generates. This gives a completely different perspective on how the customer needs to be handled.
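The shift in valuation described above can be sketched by crediting a customer with a share of the revenue of the people they directly influence. The revenue figures and the 30% influence factor are illustrative assumptions, not an industry model.

```python
def network_value(customer, revenue, graph, influence=0.3):
    """Value a customer as their own revenue plus a discounted
    share of the revenue of everyone they directly influence."""
    direct = revenue[customer]
    influenced = sum(revenue[friend] for friend in graph.get(customer, []))
    return direct + influence * influenced

revenue = {"low_user": 10, "heavy_1": 200, "heavy_2": 150}
graph = {"low_user": ["heavy_1", "heavy_2"]}

# Individually, the customer looks barely profitable...
print(revenue["low_user"])                        # 10
# ...but their network value tells a different story.
print(network_value("low_user", revenue, graph))  # 115.0
```

Under this view, losing the "low-value" subscriber risks losing far more than their own account revenue.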
Law enforcement and anti-terrorism efforts also leverage social network analysis today. It is possible to recognize individuals who are connected, directly or indirectly, to known trouble groups or persons. Analysis of this type is often referred to as link analysis.
So, from the above mentioned examples, we can infer the following business insights:
Social network data analysis can help provide new contexts in which decisions are data driven and not opinion driven.
Big Data analysis allows organizations to shift goals from maximizing individual account profitability to maximizing the profitability of the customer’s network.
Big Data helps organizations to identify highly connected customers and assists in when, where, and how to align and focus marketing efforts in building a better brand image.
It enables organizations to lure highly connected customers with free trials and solicit their feedback for the betterment of their products and services.
It assists organizations in encouraging internal customers to become more active with feedback and opinions on the product or services.
Let’s look at how social network data analysis can improve decision-making in marketing.
Today's consumers have changed. They no longer read newspapers end-to-end; they fast-forward through TV commercials and junk unsolicited e-mail because they have many choices and new options that better fit their digital lifestyle. Consumers can now choose the marketing messages they wish to receive, and when, where, and from whom.
In today’s competitive scenario, marketers deliver what consumers want through relevant interactive communication across digital power channels: e-mail, mobile, social, and the Web.
These channels, in turn, generate the social data required to provide insights into a target audience's brand communication preferences, the tone of voice they use, their interests, the other brands they discuss, and plenty of other data that can help a brand tailor consumer communication to get the most out of it.
Social network analysis of this data has a widespread use in marketing in various interesting ways.
Let's look at how retail giant Walmart is using social media to understand its customers better. Walmart recently acquired a social media analytics company named Kosmix and created Walmart Labs, a division that analyzes media communication to understand retail trends. One of the key responsibilities of this division is to monitor public-domain conversations and then position Walmart products accordingly.
Affiliate marketing is a reward-based marketing structure in which an affiliated company uses its own marketing effort to bring in customers for another company and, in turn, is rewarded by the company that benefits. The Brandlove app is one example. Today, one would be hard-pressed to find a major brand that does not have a thriving affiliate program.
Let’s now look at how social network data analysis can improve decision-making in product design & development.
Millions of status updates, blog posts, photographs, and videos are shared every second.
To be successful, organizations not only need to identify the information relevant to their company, products, and services but should be able to dissect, comprehend, and respond to the relevant information in real time and on a continuous basis.
A system that can represent sentiment as data with a high level of accuracy gives the client a way to access information on a social platform. Measuring sentiment closely is of great value in designing a product or service. It is also important for brands to be able to understand the demographic information they receive.
Let's now look at what sentiment analysis is.
Sentiment analysis is defined as a computer programming technique to analyze human emotions, attitudes, and views across popular social networks including Facebook, Twitter, and blogs. The technique requires analytic skill as well as computing techniques.
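A toy illustration of the idea follows, using a hand-made word lexicon. The word lists are assumptions for this sketch; production systems use trained models over far richer features than word counts.

```python
# A minimal lexicon-based sentiment scorer. The lexicons below are
# illustrative only, not a real sentiment vocabulary.
POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
NEGATIVE = {"hate", "terrible", "awful", "disappointed", "broken"}

def sentiment(text):
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) \
          - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, great battery!"))     # positive
print(sentiment("Terrible service, very disappointed."))  # negative
```

Even this crude scorer hints at the two ingredients the definition names: analytic skill (choosing the signals) and computing technique (applying them at scale).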
By listening to what consumers want, by understanding where the gap in the offering is, and so on, organizations can make the right decisions in the direction of their product development and offerings. In this way, social network data can help organizations improve product development and services, also making sure consumers ultimately get the products and services they want.
However, this technique is still evolving, and the full potential of sentiment analysis is yet to be explored by marketers and other business professionals.
There is also the issue of judgment. Think of a company relying purely on the number of likes and followers it has to estimate its popularity. Deeper studies could show that most of the conversation trends are negative, yet the raw counts may go towards creating a false social media impression of the company.
American Airlines has been ranked one of the most disliked companies in the USA, but its social media presence seems to tell a different story. The airline has about 346,259 followers on Twitter and 273,591 likes on Facebook. Deeper studies, however, show that online conversations about the company are largely negative, consistent with its ranking as one of the most disliked airlines. Hence, sentiment and emotive data should be given more weight than the raw numbers of "followers" and "likes".
Under this topic, we have looked at social network data and its analysis. We have addressed the uses of social network data analysis and seen how sentiment analysis helps in making better business decisions.
In this class, we’ll look at the significance of social network data in the business context.
The previous class gave you a broad idea about "Big Data" and how it affects our lives. In a sense, the data is only as good as the insights it provides; hence, it is important to understand how the data is actually used.
Now we will look at Financial Fraud and Big Data
Frauds occur frequently in banks and other financial institutions. These institutions send educational e-mails and other communications to customers on how to prevent such frauds and avoid becoming a party to them.
Financial frauds are even more common in the online retail sector. In such fraud cases, online retailers, such as Amazon, eBay, and Groupon, tend to incur huge expenses and losses.
Following are the most common financial frauds that impact online retailers:
Credit Card Fraud: This is a widespread and frequent fraud. The online retailer does not see the user of the card and hence cannot validate ownership of the card. It is also quite likely that a stolen or even fake card is used in a transaction. Despite the several checks in the online transaction process, not all the loopholes in the system are plugged.
Exchange or Return Policy Fraud: Every online retailer has a policy on exchanges and returns, and this policy provides fertile ground for fraudsters.
Personal Information Fraud: Here, the customer's login information is stolen; the fraudster then logs in, completes an entire sale transaction, and changes the delivery address to a different location.
The only way to prevent these frauds is to understand customers' ordering patterns and keep a vigilant eye out for red flags.
Big Data can be intelligently used not just to educate online retailers but also to manage and prevent fraud and losses in their business.
Let's look at how this is possible.
Analyzing data to understand various patterns of fraud was one of the many preventive methods, but it worked only as long as the sample size was small. The sample size could not be increased because that required huge investments of time and money. With Big Data techniques, however, this challenge can now be overcome.
Big Data analytics can:
Run a check on all the data to identify fraudulent transactions.
Identify new methods of fraud and add them to the set of fraud-prevention checks.
Do all this without impeding customers with unnecessary policies and governance structures.
Fraud Detection in Real Time
To detect fraud in real time, Big Data uses a real-time comparison of live transactions with various sources of data to authenticate transactions online. For example, for an online transaction, Big Data would enable an immediate comparison between the incoming IP address and the geo-data from the customer's smartphone apps. A match would authenticate the transaction.
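The IP-versus-phone-location check above can be sketched as follows. The lookup table stands in for a real IP-geolocation service, and the addresses and regions are made up for illustration.

```python
# Stand-in for an IP-geolocation service (hypothetical data).
IP_REGION = {"203.0.113.7": "Mumbai", "198.51.100.9": "London"}

def authenticate(transaction):
    """Approve a transaction only when the region implied by its IP
    address matches the geo-data reported by the customer's phone."""
    ip_region = IP_REGION.get(transaction["ip"])
    return ip_region is not None and ip_region == transaction["phone_geo"]

txn_ok = {"ip": "203.0.113.7", "phone_geo": "Mumbai"}
txn_suspicious = {"ip": "198.51.100.9", "phone_geo": "Mumbai"}

print(authenticate(txn_ok))          # True  -> approve
print(authenticate(txn_suspicious))  # False -> flag for review
```

A real system would combine many such signals; the point is that the comparison happens while the transaction is still in flight.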
Big Data can also comb through historical data and indicate fraud patterns that are later used to create checks to prevent real-time fraud.
Retailers use real-time analysis effectively by knowing when exactly the items were delivered to customers. High-value items have attached sensors that can transmit their location. When such items are delivered to customers, retailers process the streaming data from these sensors and thus prevent frauds.
Visually Analyzing Frauds
Big Data can facilitate drawing maps and graphs that create comparisons, which are then used to make decisions and create effective systems that are accurately placed to block fraud. An analysis in the graphical form, for example, can help identify the regions, customers, and the products that have a higher fraud rate.
Big Data can even show comparisons between products and regions, and so on, which alerts the retailer on where a greater probability of fraud exists.
Let's assume that an insurance company wants to improve its ability to make decisions in real time when processing a new claim, thereby reducing claim cycle time. At the same time, the company incurs a steady increase in the cost of litigation and fraudulent claims. The company has policies and procedures to help underwriters evaluate fraudulent claims; however, the underwriters do not have the required data at the right time to make the necessary decisions, which further delays processing.
Within this context, the company implements a Big Data analytical platform, which uses data from social media to provide a real-time view. This enables a call center agent to diagnose the patterns of behaviors and the relationships among other claimants when the customer calls in for a claim for the first time, and leaves a note for the underwriters to go through.
In some cases, social media could also provide great triggers to identify fraud; for example, a customer might indicate that his or her car was destroyed in a flood, but the documentation from the social media feed shows that the car was actually in another city on the day the flood occurred. These glaring discrepancies reflect fraud.
Insurance frauds have a huge cost implication for an organization, which is why organizations prefer using Big Data analytics and other advanced technologies to handle this issue. This also has a positive impact on customers, as fraud losses would otherwise be transferred to customers as higher premiums.
After implementing a Big Data analytics platform, organizations are able to analyze complex information and accident scenarios in minutes rather than days or months.
Fraud Detection Methods
Traditionally, insurance companies have been using statistical models to identify fraudulent claims. These models have many limitations and can prevent fraud only to a certain extent. This section examines these limitations and how Big Data can overcome them.
Insurance companies typically analyze small samples of data, which leads to some frauds going undetected. This method relies on previously recorded fraud cases; therefore, every time a fraud based on a new technique occurs, insurance companies have to bear the consequences and the losses for the first time.
The traditional method of identifying frauds works in independent silos. It is not capable of handling various sources of information from different channels and different functions in an integrated way. Big Data analytics, on the other hand, can handle this kind of challenge.
Public data like bank statements, legal judgments, criminal records and medical bills can provide useful means of predictive analysis in order to avoid frauds.
To get the most effective predictive value from such public data, business organizations integrate their internal data with third party data. This integration helps in investigating and restricting fraudulent activities.
Social Network Analysis
Earlier, we learned about social network analysis (SNA) and how Big Data can be used to create visibility into blind spots for businesses. SNA is an innovative and effective way to identify and detect frauds.
Consider an example. Assume in an accident, all people involved exchanged their addresses and phone numbers and have given them to the insurer. Among them, if the address given by one of the accident victims reveals several claims or the vehicle is identified to have been involved in other claims as well, this will automatically indicate chances of fraudulent claims. The ability to source this information can result in catching such fraudulent claims faster.
The SNA tool uses a mix of analytical methods. This hybrid approach combines statistical methods, pattern analysis, and link analysis to sift through large amounts of data and reveal relationships.
When link analysis is used in fraud detection, one looks for clusters of data and how those data clusters are linked to other data clusters. As already mentioned, public records are various data sources that can be integrated into a model. Using this approach of integrating various data sources into a model, the insurer can rate claims.
If the rating is high, it indicates that the claim is likely fraudulent. This might be because of a known bad address, a suspicious provider, or a vehicle involved in many accidents with multiple carriers.
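A minimal sketch of this link-analysis idea: a claim's rating counts how many other claims it shares an address or a vehicle with. The data and the scoring scheme are hypothetical.

```python
from collections import defaultdict

# Hypothetical claims data; addresses and vehicle numbers are made up.
claims = [
    {"id": 1, "address": "12 Elm St", "vehicle": "KA-01-1234"},
    {"id": 2, "address": "12 Elm St", "vehicle": "KA-05-9999"},
    {"id": 3, "address": "12 Elm St", "vehicle": "KA-01-1234"},
    {"id": 4, "address": "7 Oak Ave", "vehicle": "MH-02-0001"},
]

def rate_claims(claims):
    """Score each claim by the number of OTHER claims it is linked to
    through a shared address or a shared vehicle."""
    by_address, by_vehicle = defaultdict(int), defaultdict(int)
    for c in claims:
        by_address[c["address"]] += 1
        by_vehicle[c["vehicle"]] += 1
    return {c["id"]: (by_address[c["address"]] - 1)
                    + (by_vehicle[c["vehicle"]] - 1)
            for c in claims}

print(rate_claims(claims))  # {1: 3, 2: 2, 3: 3, 4: 0}
```

Claims 1 and 3 score highest because they share both an address and a vehicle with other claims, the kind of cluster link analysis is designed to surface.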
Before implementing SNA, however, organizations should consider the following questions carefully:
1. How fast does the data arrive?
2. How much irrelevant data is there when it arrives?
3. How deep should the analysis go to produce accurate results?
4. What types of user interface components need to be included on the SNA dashboard?
Next is the step-by-step SNA method to detect fraud:
1. The data, both structured and unstructured, from various sources is fed into the ETL (Extract, Transform, and Load) tool. This data is then transformed and loaded into a data warehouse.
2. The analytics team uses information from various sources, scores the risk of fraud and ranks the likelihood of fraud. The information used can come from varied sources such as a prior belief or a previous relationship, the number of rejected claims etc.
3. Several Big Data technologies including text mining, sentiment analysis, content categorization, and social network analysis can be included into the fraud detection and predictive modeling mechanism.
4. Depending on the score of the particular network, an alert is generated.
5. The investigators can then leverage this information and begin researching more on the fraudulent claim.
6. Finally, issues of frauds that are identified are added into the case system.
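The six steps above can be sketched end to end as follows. The warehouse is in-memory, and the signals, scores, and alert threshold are illustrative assumptions rather than tuned values.

```python
def etl(raw_records):
    """Step 1: extract, transform, and load into a (here, in-memory)
    data warehouse."""
    return [r.strip().lower() for r in raw_records if r.strip()]

def score_fraud_risk(record, prior_rejections):
    """Steps 2-3: combine signals such as prior rejected claims and
    suspicious text (illustrative weights)."""
    score = 10 * prior_rejections.get(record, 0)
    if "total loss" in record:
        score += 5
    return score

def detect(raw_records, prior_rejections, threshold=10):
    """Steps 4-6: alert on high-scoring claims so investigators can
    research them and file confirmed cases."""
    warehouse = etl(raw_records)
    cases = []
    for record in warehouse:
        if score_fraud_risk(record, prior_rejections) >= threshold:
            cases.append(record)
    return cases

raw = ["Claim A: minor dent ", "Claim B: total loss, third claim this year"]
priors = {"claim b: total loss, third claim this year": 1}
print(detect(raw, priors))
```

In a real deployment, each function would be a separate system (an ETL tool, a scoring model, an alerting pipeline); the control flow is what the six steps describe.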
Predictive Analysis
Predictive analysis works on the principle that the earlier a fraud is detected, the smaller the loss incurred by the business.
Think about a situation where a customer raises a claim saying his car caught fire. But recorded statements indicate that most of the valuable items in the car had been removed prior to the fire. This could raise the suspicion that the car had been torched on purpose.
Predictive analytics includes the use of text analytics and sentiment analysis to examine Big Data for fraud detection. Claim reports run to multiple pages, making it hard to detect a scam easily by manual review. Big Data analytics helps in sifting through such unstructured data, which was not possible earlier, and helps in proactively detecting frauds.
Predictive analytics technology is being used increasingly to spot potentially fraudulent claims and to speed up the payment of legitimate claims.
Here’s how the predictive analytics technology works:
Claim adjusters write lengthy reports while investigating a claim. Typically, clues that the adjuster would not have noticed are hidden in these reports.
The computing system, which is based on business rules, highlights these clues as indicators of possible fraud.
The fraud detection system can then spot these discrepancies and flag the claim as fraudulent.
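The business-rules step can be sketched as a scan of the adjuster's report for red-flag phrases. The rule list here is a hand-written assumption; real systems derive such rules from historical fraud cases.

```python
import re

# Illustrative red-flag rules (assumptions for this sketch).
RED_FLAGS = [
    r"valuables? (were|was) removed",
    r"no (police|fire) report",
    r"recently increased (the )?coverage",
]

def flag_report(report):
    """Return which red-flag rules match an adjuster's report."""
    hits = [rule for rule in RED_FLAGS
            if re.search(rule, report, flags=re.IGNORECASE)]
    return {"suspicious": bool(hits), "matched_rules": hits}

report = ("Vehicle caught fire on the highway. Valuables were removed "
          "the day before. No police report was filed.")
result = flag_report(report)
print(result["suspicious"])          # True
print(len(result["matched_rules"]))  # 2
```

This mirrors the car-fire example above: the "valuables were removed" clue buried in a long report is exactly what a rules engine surfaces for the investigator.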
Social Customer Relationship Management (CRM)
Social CRM enables effective fraud detection in the insurance sector. Social CRM is neither a platform nor a technology but a process. This makes it critical that insurance companies link social media sites, such as Facebook and Twitter, to their CRM systems.
When social media is integrated within an organization, it enables greater transparency with customers. Mutually beneficial transparency indicates that the company trusts its customers and vice-versa. This customer-centric ecosystem reinforces that increasingly the customer base is in control. This ecosystem can be beneficial to the business as well if the business is able to leverage the collective intelligence of its customer base.
Today we will discuss the usage of Big Data in the retail industry.
Big Data has huge potential for the retail industry as well, considering the immense number of transactions and the correlations among them.
Seemingly simple questions are easy to answer when there is a single retail location and a small customer base: How many basic tees did we sell today?
What time of the year do we sell most leggings? What else has customer X bought, and what kind of coupons can we send to customer X?
But in larger systems, with millions of transactions being carried out daily, spread across multiple disconnected legacy systems and IT teams, it is impossible to see the full picture of the data.
Linking a company's in-store and online sales can lead to deep insights into customer behavior and overall company health, but often this information is so hard to pull together that the issue goes unaddressed. Retail stores typically run on legacy point-of-sale systems that batch updates daily and often do not communicate with each other, let alone with the e-commerce site.
For a marketing analyst trying to understand the strength and health of their products or campaigns, reconciling these systems and their differing data can be an impossible task. While omni-channel retailing solutions do exist, they require both store managers and Web developers to learn entirely new systems, incurring huge costs in time and money for company-wide training and systems deployment. Further, accessing data in real time is often not possible, as systems hit scaling issues.
Suppose you want to know whether a particular item is in stock in a nearby store. This information is often not readily available and requires phone calls or other communication, which adds further time to a transaction and potentially prevents an immediate sale from being made.
As retailers such as Walmart and Amazon grow bigger and more technology-driven, the work of tracking shipping and production also grows significantly. In these scenarios, Big Data proves to be of immense help. Data from innovative solutions such as tagging is used for analysis. These tags can generate a lot of data, which can be analyzed to provide various solutions, some of which are discussed in the next section.
But remember: much of Big Data is simply not required and not useful either. Within a Big Data feed, some information will have long-term strategic value, some will be useful only for immediate, tactical use, and some will not be used for anything at all. A key part of taming Big Data is determining which pieces fall into which category.
Use of RFID (Radio Frequency Identification) Data in Retail
An RFID tag is a small tag that carries a unique code to identify a product, similar to a UPC code. The tag is placed on shipping pallets or product packages.
In addition to a bar code, an RFID tag:
Specifies the pallet as allotted to a precise and exclusive set of computer systems.
Helps in finding situations where the items have no units left in store.
Specifies the number of units of each item remaining in the store, and thereby raises an alarm when restocking is required.
Allows better tracking of products by differentiating between products that are out of stock and products that are available on the shelf. For example, if a product is unavailable on the shelf, that does not mean it is unavailable altogether. Using an RFID reader and a mobile computer, stock can be located in the warehouse and replaced immediately.
In addition, the use of RFID saves time, reduces labor, enhances the visibility of products throughout the production-delivery life cycle, and reduces costs.
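The restocking alarm described above can be sketched by counting tag reads per product on the shelf. Mapping tag IDs straight to product codes, and the SKUs and thresholds used, are simplifying assumptions for this example.

```python
from collections import Counter

# Hypothetical RFID reads from a shelf scanner; each read is a product code.
shelf_reads = ["SKU-TEE", "SKU-TEE", "SKU-LEG", "SKU-TEE"]
REORDER_LEVEL = {"SKU-TEE": 2, "SKU-LEG": 3}

def restock_alerts(reads, reorder_level):
    """Return {sku: units_on_shelf} for every product whose shelf count
    has fallen below its reorder level."""
    on_shelf = Counter(reads)
    return {sku: on_shelf.get(sku, 0)
            for sku in reorder_level
            if on_shelf.get(sku, 0) < reorder_level[sku]}

print(restock_alerts(shelf_reads, REORDER_LEVEL))  # {'SKU-LEG': 1}
```

Streaming reads through a check like this is what lets the system raise the alarm the moment shelf stock dips, rather than at the next daily batch update.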
Under this topic, we have discussed in detail the uses of Big Data in the retail industry.
Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.
In today’s topic we will look at the various technologies used for handling Big Data.
Today we will further discuss how to make use of the enormous volume and variety of data at the required speed, with a suitable technology framework. So we will look at some of the major technologies related to Big Data that help store, process, and analyse the data and provide required business insights.
Rapid changes in technology radically change the way data is produced, processed, analysed, and consumed. A huge increase in the amount of data being captured and analysed by organizations, as well as on the Internet, has fuelled the need for huge data stores and efficient processing of that data.
Some of the most popular areas of Big Data-related innovation include those in distributed and parallel computing, Hadoop, cloud for Big Data, and in-memory computing for Big Data.
Of all the technologies, Hadoop is perhaps the most popular name identified with Big Data.
Distributed computing is a method in which multiple computing resources are connected in a network and computing tasks are distributed across the resources, thereby increasing the computing power. Distributed computing is faster and more efficient than traditional computing, and hence of immense value when it comes to processing a huge amount of data in a limited time.
Parallel computing enhances the processing power of even a standalone personal computer by adding multiple processing units. These units carry out a complex task by breaking it into sub-tasks and executing the individual sub-tasks simultaneously.
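The idea above can be sketched on a single machine with Python's standard multiprocessing module: a large summation is broken into sub-ranges, each sub-range is handed to a separate worker process, and the partial results are combined. The function names and chunking scheme here are illustrative, not part of any particular framework.

```python
# Minimal sketch of parallel computing: split one task into
# sub-tasks and run them simultaneously on multiple processes.
from multiprocessing import Pool

def partial_sum(bounds):
    """Compute the sum of integers in [start, end)."""
    start, end = bounds
    return sum(range(start, end))

def parallel_total(n, workers=4):
    # Break the task [0, n) into `workers` sub-tasks.
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)  # last chunk absorbs the remainder
    with Pool(workers) as pool:
        # Each worker handles one chunk; results are combined at the end.
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_total(1_000_000))  # same result as sum(range(1_000_000))
```

On a machine with several cores, the sub-tasks genuinely run at the same time, which is exactly the speed-up the transcript describes.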
Today markets and businesses are fiercely competitive. At the same time, the volume, variety, and velocity of data available has surged astronomically. To find an edge in the market, organizations feel a need to analyse all the data they can get hold of, and in a very short span of time. This obviously leads to the requirement of large storage and processing power.
In spite of all the technological developments, there is a constant problem affecting the process of data collection: latency. Latency is the aggregate delay in the system caused by delays in individual tasks involving large amounts of data. If you use a wireless phone, you may have experienced latency first-hand in the form of a lag or delay in the transmissions between you and your caller.
This delay leads to slowdown in system performance, data management, and communication within an organization as well as with customers and other external stakeholders. Regular Big Data applications usually suffer from the problem of latency, and hence have lower levels of performance. This is a potential problem for businesses.
As a response to all these problems, distributed and parallel processing techniques provide concrete solutions, not just for processing large amounts of data in a short span, but also for dealing with latency.
A collection of independent computer systems that are connected via a network to accomplish a specific task. The connected computers are loosely coupled and can access data and resources that are remotely located.
A computer system that has multiple processing units attached to it. These systems are tightly coupled and are usually employed to solve a single complex problem.
Several servers are connected to form a network, so that the workload can be shared amongst them. A cluster equipped with the same type of commodity hardware is called homogeneous cluster. A cluster equipped with a combination of different hardware is called heterogeneous cluster.
An organization can utilize hardware components acquired over a period of time to form a cluster or grid. This method is usually cost-effective. Grids also offer cost-effective storage solutions, although the overall operating costs may be high.
An MPP platform is a single machine that works like a grid. It handles storage, memory, and computing tasks. These capabilities are optimized by software written especially for the MPP platform. The platform is also optimized for scalability.
MPP platforms are suitable for high value uses. EMC Greenplum and ParAccel are examples of MPP platforms.
HPC environments offer very high performance and scalability. They use in-memory technology and are used for high-speed floating point processing. You will read more about in-memory technology in the following sections.
HPC environments are ideal for specialty applications and custom application development. These environments are suitable for research or business organizations where high costs are acceptable because the results are very valuable, or the project is strategically important.
A public cloud environment is a type of cluster or grid that can be accessed through the internet. The cloud owner or vendor develops a cluster, and then allows customers to use it for storage or computing tasks for a fee. Amazon EC2 is an example of a public cloud. A public cloud gives businesses the flexibility to buy computing power as per their needs.
In case of a private cloud environment, an organization’s cluster is private and accessible through its network. The private cluster is suitable for businesses that have high priority for data privacy. The cost of private clouds is shared among the business units.
In this session we will study Hadoop in detail, one of the most preferred technologies to handle Big Data.
Hadoop is an open-source platform designed to work with huge volumes of structured and unstructured data, that is, Big Data. Working with such volumes of data needs deep analytical technology, which requires greater computational power.
Let's look at some of the features of Hadoop:
It can work on a large number of machines that do not share any memory or disks. This solves the twin Big Data problems of efficient storage and access.
Storage improves when the data is loaded on a Hadoop platform because Hadoop distributes data over different servers.
Access improves because Hadoop can track the data stored on the different servers.
Processing improves as Hadoop runs computing tasks by using all available processors working in parallel.
Hadoop improves resilience by keeping multiple copies of data, which can be used in case of server failure.
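The resilience point above can be made concrete with a toy sketch: each data block is stored on several servers, so the failure of one server never loses the block. The server names, block IDs, and replication factor of 3 here are hypothetical, chosen only for illustration (HDFS also defaults to 3 replicas).

```python
# Toy illustration of replication: each block is placed on
# REPLICATION distinct servers, so one server failure is survivable.
import random

REPLICATION = 3
servers = {name: set() for name in ["s1", "s2", "s3", "s4", "s5"]}

def store(block_id):
    # Place the block on REPLICATION distinct servers, chosen at random.
    for name in random.sample(sorted(servers), REPLICATION):
        servers[name].add(block_id)

def readable(block_id, failed=()):
    # The block survives as long as one healthy server holds a copy.
    return any(block_id in blocks
               for name, blocks in servers.items() if name not in failed)

store("block-7")
print(readable("block-7", failed=("s1",)))  # True: at least 2 replicas remain
```

Because three of the five servers hold the block, losing any single server still leaves at least two live copies, which is why the lookup always succeeds.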
So how does Hadoop use multiple computing resources to execute a task?
The Hadoop Distributed File System (HDFS) is a reliable, high-bandwidth, low-cost data storage cluster that facilitates management of related files across machines.
The Hadoop MapReduce Engine is a high-performance parallel/distributed data-processing implementation of the MapReduce algorithm.
Hadoop is designed to process large amounts of structured and unstructured data and is implemented in racks of commodity servers as a Hadoop cluster. Each server works independently at its task and returns its response. Servers can also be added to or removed from the cluster dynamically, because Hadoop is able to detect changes, including failures, adjust to those changes, and continue to operate without interruption.
MapReduce is the programming model which allows mapping the tasks to different servers and reducing the responses to one result.
Hadoop MapReduce is an implementation of the MapReduce algorithm developed and maintained by the Apache project. This algorithm provides the capabilities to break data into manageable chunks, process the data in parallel on the distributed cluster, and then make the data available for user consumption or additional processing.
The map component of MapReduce distributes the programming problem or tasks across a large number of systems, and handles the placement of the tasks in a way that balances the load and manages recovery from failures. After the distributed computation is completed, another function called Reduce, aggregates all the elements back together to provide a result.
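The map, shuffle, and reduce phases just described can be illustrated with a single-machine word count, the classic MapReduce example. Map emits (key, value) pairs, a shuffle step groups values by key, and reduce aggregates each group. Hadoop performs the same three phases, but distributed across many servers; this sketch is only a local analogue, not Hadoop's actual API.

```python
# Toy, single-machine illustration of the MapReduce flow:
# map -> shuffle (group by key) -> reduce (aggregate each group).
from collections import defaultdict

def map_phase(records):
    # Emit one (word, 1) pair per word: a classic word count.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Group all values by key, as Hadoop's shuffle/sort step does.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each group of values into one result per key.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data needs big systems", "data everywhere"]
result = reduce_phase(shuffle_phase(map_phase(lines)))
print(result["big"], result["data"])  # 2 2
```

In real Hadoop, `map_phase` runs on many servers in parallel, each over its own piece of the data, and the framework performs the shuffle over the network before the reducers run.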
When Hadoop receives an indexing job, the data of an organization is first loaded into the Hadoop software, which then divides the data into different pieces and sends each piece of data to different servers. Hadoop keeps track of the data by sending a job code to all the servers that store the relevant piece of data. Each server then applies the job code to the portion of data stored on it and returns the results.
This chart describes the process of job tracking in MapReduce.
Let's look at an example to understand how Hadoop works.
Consider the records of all telephone calls in a city. Suppose a researcher wants to know the number of college students who made calls at the time of a particular event. The indexing query would specify the relevant user information and the time of the event. Each server would search its collection of call records and return the ones that match the query. Hadoop would put together all these sets into one result. Let's suppose all records of telephone calls are stored in CSV format on the server. First, the data is loaded in Hadoop, and then the MapReduce programming model is used to process the data.
Suppose there are five columns in the CSV file:
user_id
user_name
city_name
service_provider_name
and call_time
To find the number of students who made calls at a particular time, each student is identified by the user_id.
The final output is the total number of users who made calls during a particular time period, say, 9–10 pm. To get the final output, the data is passed line by line to each mapper. After completion of the mapper job, the Hadoop framework shuffles or sorts and groups the data and sends it to the reducer, which gives the final output. The Hadoop platform also facilitates data storage on many machines. This facility allows a business to use multiple commodity servers and run Hadoop on each, instead of creating an integrated system.
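The mapper-then-reducer flow for this call-record example can be sketched locally as follows. The CSV layout follows the five columns listed earlier; the "HH:MM" call_time format and the sample rows are assumptions made purely for illustration.

```python
# Single-machine sketch of the call-record example:
# mapper emits (user_id, 1) for calls in the 9-10 pm window,
# reducer counts the distinct users. (Illustrative data/format.)
import csv
import io

SAMPLE = """user_id,user_name,city_name,service_provider_name,call_time
101,Asha,Pune,AirTel,21:15
102,Ravi,Pune,AirTel,08:30
103,Meena,Pune,Vodafone,21:45
"""

def mapper(rows, start="21:00", end="22:00"):
    # Emit (user_id, 1) for every call made between 9 and 10 pm.
    # Fixed-width "HH:MM" strings compare correctly lexicographically.
    for row in rows:
        if start <= row["call_time"] < end:
            yield (row["user_id"], 1)

def reducer(pairs):
    # Count the distinct users who made a call in the window.
    return len({user_id for user_id, _ in pairs})

rows = csv.DictReader(io.StringIO(SAMPLE))
print(reducer(mapper(rows)))  # 2
```

In the Hadoop version, each server would run the mapper over its own shard of the call records, and the framework's shuffle/sort step would bring all pairs for each user together before the reducer produced the final count.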
Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.
This topic deals with the various technologies used for handling Big Data.
In this session we will understand cloud computing & various in-memory technologies for handling Big Data.
Cloud-based application platforms make computing resources easily available to an application and let you pay for these resources depending on what, and how much, you use. In the context of cloud computing, this feature is called elasticity: you can dynamically scale the computing resources you access at the touch of a button, and pay accordingly.
In cloud computing, all data is gathered in data centers and then distributed to the end-users. Further, automatic backups and recovery of data is also ensured for business continuity. The primary reason Cloud and Big Data analytics complement each other is because Cloud, like Big Data, uses distributed computing as well.
Amazon and Google are two large companies that need massive capability to manage huge amounts of data to run their businesses. They need infrastructure and technologies that can support their applications at a huge scale. Think of the millions of Gmail messages that Google needs to process every minute, or even every second. Google has been able to optimize the Linux OS and its software environment to handle e-mail efficiently. It is able to capture and leverage massive amounts of data about its mail users and search-engine users to drive its business. Similarly, Amazon's IaaS data centers are optimized to handle massive workloads and offer services and support to innumerable customers. Both companies now offer a range of cloud-based services for Big Data as well.
Scalability - Even if organizations increase the processing power of their hardware, they may need to change the architecture, and may face issues with running the software on the new hardware. Cloud provides the solution to this. It provides scalability by using distributed computing.
Elasticity - Cloud solutions allow customers to use and pay for the exact amount of cloud service they need, depending on the requirement; for example, a business may expect more data during an in-store promotion, and hence can buy more processing power during such times.
Resource Pooling - Multiple organizations using similar computing resources do not need to invest in them individually. Cloud can offer these resources, and as these resources are used by many, the cost to the cloud comes down.
Self-service - Customers can access cloud services directly through a user interface that allows them to choose the services they want. This is automated and does not need human intervention.
Low Costs - Businesses do not need to make a large initial investment in computing resources to handle huge operations such as Big Data analytics. They can sign up for a cloud service and pay as they use. In the process, the cloud provider enjoys economies of scale. This benefits the customers.
Fault Tolerance - If a part of the cloud fails, the other parts can take over and give customers uninterrupted service.
A public cloud is owned and operated by an organization, for use by other organizations and individuals. A public cloud offers a range of computing services. For each category of service, it specializes in a specific type of workload. By specializing, the cloud can customize hardware and software to optimize performance. Customization makes the computing process highly scalable; for example, a cloud can specialize in storing videos for live streaming on YouTube or Vimeo and optimize to handle a large volume of traffic.
For businesses, the public cloud provides economical storage solutions and is an efficient way to handle complex data analysis. However, these factors can sometimes increase security risks and latency.
A private cloud is owned and operated by an organization for its own purposes. Besides employees, the partners and customers of the organization can also use the private cloud. A private cloud is designed for one organization and incorporates the systems and processes of that organization, including its business rules, governance policies, and compliance checks. Things that must be done manually in the public cloud, because of the different specifications given by multiple customers, can be automated in the private cloud. This cloud is thus highly automated and also protected by a firewall. This reduces latency and improves security, making it ideal for Big Data analytics.
Apart from being used for Big Data analytics, the Cloud is used for several purposes such as storage, backup, and customer services. As more people use computers on the go, business tasks have shifted to laptops and mobile devices and subsequently to the cloud. Consumers may order a product from their home, and the store receives the order and sends instructions to the warehouse, which delivers the product. The store could be using the cloud to receive the order and send instructions, as well as to handle payments and track deliveries. These tasks can also be done without using cloud computing, but cloud computing lowers infrastructure costs and provides scalable content storage.
Infrastructure as a service
Infrastructure refers to hardware, storage, and network, and IaaS provides all three as a service. When you pay to save your holiday photographs on a cloud, you use a public IaaS; when an employee saves a work report on the organization's backup server, the employee uses a private IaaS. Examples of IaaS offerings are virtual machines, load balancers, and network-attached storage. By using a public cloud IaaS, a business avoids investing in physical infrastructure: it can choose the OS and create virtual machines with scalable storage and processing power.
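As a concrete sketch of requesting IaaS resources programmatically, the snippet below assembles the parameters for launching a virtual machine on AWS EC2 with the boto3 library. The AMI ID and instance type are placeholder assumptions, and the actual API call is shown only in a comment because it needs real AWS credentials.

```python
# Sketch: requesting a virtual machine from a public IaaS (AWS EC2 via boto3).
# The AMI ID and instance type below are illustrative placeholders.

def build_launch_params(ami_id, instance_type="t2.micro", count=1):
    """Assemble the parameters for an EC2 RunInstances request."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # pick a larger type for more power
        "MinCount": count,
        "MaxCount": count,
    }

def launch_vm(ec2_client, ami_id):
    """Launch a VM; requires valid AWS credentials and a real AMI ID."""
    return ec2_client.run_instances(**build_launch_params(ami_id))

# With credentials in place, the call would look like:
#   import boto3
#   launch_vm(boto3.client("ec2"), "ami-12345678")
print(build_launch_params("ami-12345678"))
```

Because the instance count and type are just request parameters, scaling up is a matter of changing arguments rather than buying hardware, which is the point of IaaS.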
Platform as a service
PaaS provides a platform on which users write and run their applications. The platform comprises the OS together with middleware services and software development and deployment tools. Examples of PaaS are Windows Azure and Google App Engine (GAE). When an organization has a private cloud PaaS, programmers in each business unit can create and deploy applications for their own needs, which makes it easier to experiment with new applications.
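To make the PaaS division of labour concrete, here is the kind of code a developer hands to a platform: a minimal WSGI application, the interface that Python PaaS runtimes such as Google App Engine's commonly expect. The platform supplies the OS, web server, and scaling; the developer supplies only the application. This is a local, standard-library-only sketch.

```python
# Sketch of app code handed to a PaaS: a minimal WSGI application.
# The platform provides the OS, web server, and scaling; the developer
# provides only this callable.

def application(environ, start_response):
    """Return a fixed plain-text response for any request."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from the platform"]

# Exercise the app directly, without any server:
captured = {}
def _start_response(status, headers):
    captured["status"] = status

body = application({}, _start_response)
print(captured["status"], body[0].decode())  # 200 OK Hello from the platform
```

Locally the same app could be served with the standard library's `wsgiref.simple_server`; on a PaaS, deployment replaces that step.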
Software as a service
SaaS provides software that can be accessed from anywhere. Customers use software running on the cloud without buying and installing it on their own devices; the applications are typically offered on monthly or yearly contracts. For SaaS to work, the underlying infrastructure (IaaS) and platform (PaaS) must be in place.
An organization can maintain a custom-developed application in its private cloud and link it to Big Data stored in a public cloud. In a hybrid cloud, the application can efficiently analyze the data by using the strengths of private and public clouds.
Among the many established and new cloud service providers, some offer resources specifically for Big Data analytics. Let's look at a few of these:
Amazon - Amazon's IaaS, called Elastic Compute Cloud (Amazon EC2), grew out of the massive computing infrastructure the company had built for its own business, much of which was underused. Amazon decided to rent out the spare capacity and earn revenue from it. The word "elastic" in the name is apt because these resources can be scaled up or down hour by hour.
In addition to Amazon EC2, Amazon Web Services offers the following services:
Amazon Elastic MapReduce: A Web service that provides cost-effective processing of vast amounts of data by using Amazon EC2 and Amazon Simple Storage Service (Amazon S3).
Amazon DynamoDB: A NoSQL database service that stores data items on Solid State Drives (SSDs) and replicates data with high availability and durability.
Amazon Simple Storage Service (S3): A Web interface for storing data over the Internet and for Web-scale computing.
Amazon High-Performance Computing: A low-latency network with high bandwidth and computational capability for solving problems from educational and business fields.
Amazon Redshift: A petabyte-scale data warehouse service for analysing data with existing business intelligence tools in a cost-effective manner.
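As one hedged illustration of using such services from code, the sketch below uploads a data file to Amazon S3 with boto3 under a date-partitioned key, a common layout for analytics data feeding services like Elastic MapReduce. The bucket and dataset names are invented, and the upload call itself requires valid AWS credentials, so only the key-building helper runs here.

```python
# Sketch: storing a data file in Amazon S3 (assumes boto3 and valid
# credentials; bucket and dataset names below are placeholders).
import datetime
import posixpath

def object_key(dataset, filename, when=None):
    """Build a date-partitioned S3 key, a common layout for analytics data."""
    when = when or datetime.date.today()
    return posixpath.join(dataset, f"{when:%Y/%m/%d}", filename)

def upload(s3_client, bucket, dataset, path):
    """Upload a local file to S3 under a date-partitioned key."""
    key = object_key(dataset, posixpath.basename(path))
    s3_client.upload_file(path, bucket, key)  # boto3's managed upload
    return key

# With credentials in place, the call would look like:
#   import boto3
#   upload(boto3.client("s3"), "my-analytics-bucket", "clickstream", "events.json")
print(object_key("clickstream", "events.json"))
```

Partitioning keys by date is a design choice, not a requirement of S3: it lets downstream processing jobs read only the days they need.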
Now let's look closer at what Google has to offer in terms of services designed for Big Data:
Google Compute Engine: A secure and flexible virtual machine computing environment.
Google BigQuery: A Data as a Service (DaaS) offering that searches large datasets at high speed using queries in SQL format.
Google Prediction API: A service that identifies patterns in data, stores them, and refines them with every use.
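Because BigQuery accepts SQL-style queries, the shape of a typical aggregation can be illustrated locally. The sketch below runs the query against an in-process SQLite table (the table and column names are invented); against BigQuery the same SQL would be submitted through the `google-cloud-bigquery` client instead.

```python
# Sketch: the kind of SQL aggregation BigQuery runs over huge datasets,
# demonstrated on an in-process SQLite table (table/column names invented).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (url TEXT, views INTEGER)")
conn.executemany("INSERT INTO pageviews VALUES (?, ?)",
                 [("/home", 120), ("/about", 30), ("/home", 80)])

# Against BigQuery, the same query string would be submitted via
# google.cloud.bigquery.Client().query(sql) instead of a local connection.
sql = "SELECT url, SUM(views) FROM pageviews GROUP BY url ORDER BY url"
rows = conn.execute(sql).fetchall()
print(rows)  # [('/about', 30), ('/home', 200)]
```

The value of BigQuery is not the SQL syntax, which is standard, but that such queries stay fast when the table holds billions of rows.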
Next, let's see what Windows Azure is all about.
On the basis of Windows and SQL abstractions, Microsoft has produced a PaaS offering comprising development tools, virtual machine support, management and media services, and mobile device services. For customers with deep expertise in .NET, SQL Server, and Windows, adopting the Azure-based PaaS is straightforward. To address the emerging requirement to integrate Big Data into Windows Azure solutions, Microsoft has also added Windows Azure HDInsight. Built on the Hortonworks Data Platform (HDP), which, according to Microsoft, offers 100 percent compatibility with Apache Hadoop, HDInsight supports connections with Microsoft Excel and other business intelligence tools. Azure HDInsight can also be deployed on Windows Server.
Large organizations store data in a central warehouse, and all users have to access it from there, usually through the IT department. In-memory technology makes it possible for departments or business units to take the part of the organizational data that is relevant to their needs and process it locally. This reduces the workload on the central warehouse, and users no longer need the IT department in order to work with the data.
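A toy sketch of this in-memory pattern: a department copies only its own slice of the central store into a local in-memory database and runs all further analysis there. SQLite's `:memory:` mode stands in for both the warehouse and the local copy; the table and region names are invented.

```python
# Toy sketch of the in-memory pattern: copy only the department's slice of
# the central store into local memory and analyse it there.
import sqlite3

# Stand-in for the central warehouse.
central = sqlite3.connect(":memory:")
central.execute("CREATE TABLE sales (region TEXT, amount REAL)")
central.executemany("INSERT INTO sales VALUES (?, ?)",
                    [("east", 100.0), ("west", 250.0), ("east", 50.0)])

# The east region pulls only its own rows into a local in-memory database.
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE sales (region TEXT, amount REAL)")
rows = central.execute("SELECT region, amount FROM sales WHERE region = 'east'")
local.executemany("INSERT INTO sales VALUES (?, ?)", rows.fetchall())

# All further analysis hits local memory, not the central warehouse.
total = local.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 150.0
```

The single extract query is the only load the warehouse sees; every subsequent aggregation runs against the local in-memory copy.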
In this session we discussed cloud computing and various in-memory technologies for handling Big Data.