What we will cover in the 60 mins
1. Demystify Technology Basics
2. Big Data Overview & Snapshot
3. Big Data Architecture : Deep Dive
4. Hadoop Overview
5. Clear Understanding of Data Science
6. Big Data Career Opportunities
7. Q & A
Apart from that we will also cover …
• An overview of the shift to Data Science Platforms
• The 3 critical components of a Data Science platform
• Industries that are most likely to get disrupted and shift to Data Science
• Characteristics of firms that get left behind the Data Science wave
• Factors that push an industry towards Data Science
• A brief overview of aspects of platform architecture beyond technology
Who am I?
• Mahesh Kumar CV is a Big Data entrepreneur
• Mahesh has about 14 years of experience in architecting and developing distributed and real-time data-driven systems
• Specialties: translating big data into action, Big Data trainings, product engineering services, and building Big Data CoEs & Big Data incubators
• Has written more than 60 blogs on Big Data & SAP Analytics
• Has worked in the past with IBM, Mindtree, CSC & Rolta
• Has conducted a couple of boot camps & workshops at different companies
Data Vs Information
• Data refers to a collection of numbers and characters; "data" is a relative term
• Data is raw: facts, figures, etc.
• Information is processed data
Structured Data Vs Unstructured Data
So where is this data getting generated?
• Social Networking and Media: 700 million Facebook users, 250 million Twitter users, 175+ million public blogs. Each Facebook update, Tweet, blog post and comment creates multiple new data points: structured, semi-structured and unstructured.
• Mobile Devices: 5 billion mobile phones in use worldwide. Each call, text and instant message is logged as data. Mobile devices, particularly smartphones and tablets, also make it easier to use social media.
• Internet Transactions: Billions of online purchases, stock trades and other transactions happen every day, including countless automated transactions. Each creates a number of data points collected by retailers, banks, credit cards, credit agencies and others.
• Networked Devices and Sensors: Electronic devices of all sorts, including servers and other IT hardware, smart energy meters and temperature sensors, all create semi-structured log data that record every action.
Build Vs Buy
HUMAN DRIVEN: EMAIL, WEB LOGS, DOCUMENTS, SOCIAL
MACHINE DRIVEN: SATELLITE IMAGES, BIO-INFORMATICS, M2M LOG FILES, SENSORS, VIDEO, AUDIO
BUSINESS DRIVEN: OLTP
ALL DATA TYPES: 1X, 10X, 100X
BIG DATA TODAY → BIG DATA TOMORROW
Defining Big Data
"Any amount of data that's too BIG to be handled by one computer" (John Rauser)
Why Big Data
• 12 TB of Tweets in a day
• 80% of the world's data is unstructured
• 30 billion pieces of content shared on Facebook every month
• Expected data in 2020: 35 ZB
• 5 million trade events per second
• 2.267 billion Internet users
• 4.7 billion searches on Google per day
• 5 billion people tweet, text, call and browse on mobile phones daily
• Walmart handles 1 million transactions per hour
• 255 million websites
Big Data Reference Architecture
• Structured Data Sources: legacy applications, ERP and CRM applications; external feeds
• Unstructured/Semi-structured Data Sources: web logs, application / network logs, social, chat transcripts, emails; instrumentation data / sensors, RFID, telematics, time and location data
• Data Integration (Batch / Near real-time): data extraction; data cleaning and transformation; change data capture for structured data; change data capture; real-time streaming/integration
• Data Repositories: ODS; data warehouse; DW appliances; data marts; MOLAP cube; in-memory databases; columnar databases; unstructured / semi-structured data; analytics; MDM
• End User Analytics: reports / dashboards; scorecards and metrics; events and alerts; data mining and exploration; predictive analytics; text analytics; visual exploration; mobile BI
Big Data Reference Architecture: SAP
[Diagram: the reference architecture layers (data sources, data integration, data repositories, Hadoop platform, analytics, end user analytics) annotated with SAP products]
• Products shown: SAP HANA, SAP BW, Sybase, Sybase IQ, SAP BO Data Services, SAP MDM, RDS / Rapid Marts, BO WebI / Crystal Reports, BO dashboard, BO Mobile, BO CMS, SAP Lumira, SAP Predictive Analysis, 3rd party tools
Big Data Reference Architecture: IBM
[Diagram: the reference architecture layers (data sources, data integration, data repositories, Hadoop platform, analytics sandbox, end user analytics including content analytics) annotated with IBM products]
• Products shown: InfoSphere Information Server, InfoSphere CDC, InfoSphere Streams, InfoSphere MDM, InfoSphere Data Explorer, BigInsights, BigInsights / Streams, BigInsights / NoSQL, BigInsights / HBase, PureData (Netezza, InfoSphere Warehouse, ISAS), Cognos Business Intelligence Enterprise, Cognos TM1, Cognos Mobile, SPSS, SPSS Premium, Content Analytics
Big Data Reference Architecture: ORACLE
[Diagram: the reference architecture layers (data sources, data integration, data repositories, Hadoop platform, analytics sandbox, end user analytics including real-time decision management) annotated with Oracle products]
• Products shown: Data Integrator, Oracle GoldenGate, Hadoop / GoldenGate, Silver Creek, Oracle MDM, Oracle / Exadata, Exadata EHCC / HBase, Big Data Appliance, Essbase / Hyperion, Exalytics, Exalytics + Oracle R Enterprise, Endeca, BI Publisher, OBI Foundation Suite, OBI Scorecard, OBI Mobile, Real-time Decisions
Big Data Reference Architecture: Informatica + EMC + SAS
[Diagram: the reference architecture layers (data sources, data integration, data repositories, Hadoop platform, analytics sandbox, end user analytics) annotated with Informatica, EMC and SAS products]
• Products shown: Informatica PowerCenter & Data Quality, Informatica PowerCenter Real-time edition, Informatica hParser / Hadoop Pwx, Informatica MDM, EMC Greenplum Database, EMC Greenplum UAP, EMC Greenplum HD, HBase, SAS BI, SAS Visual Analytics, SAS Visual BI, SAS OLAP Server, SAS Ent. Miner, SAS Text Miner, SAS Strategy Mgt, JMP Pro
Big Data Reference Architecture: Open Source Technologies
[Diagram: the reference architecture layers (data sources, data integration, data repositories, Hadoop platform, analytics sandbox, end user analytics) annotated with open source technologies, plus commercial products where the diagram names one]
• Technologies shown: Apache MapReduce, Pig, Apache Flume, Talend Data Integration & Data Quality, Talend MDM, Apache Hadoop, Apache HDFS + R, HBase, NoSQL, MySQL, Apache Hive, Apache Derby, R, Apache Mahout, Pentaho Business Analytics / BI, Pentaho Mobile BI, SAS OLAP Server, SAS Text Miner, commercial product
What is Hadoop
• It's a framework for large-scale data processing
• Inspired by Google's architecture
• A top-level Apache project; Hadoop is open source
• Written in Java, plus a few shell scripts
• An open-source software framework that supports data-intensive distributed applications
• Abstracts and facilitates the storage and processing of large and rapidly growing data sets
• Handles structured and non-structured data
• Simple programming models
2 key components of Core Hadoop: HDFS and MapReduce
Hadoop is used everywhere
• Yahoo! : More than 100,000 CPUs in ~20,000 computers running Hadoop; biggest cluster: 2000 nodes (2*4cpu boxes with 4TB disk each); used to support research for Ad Systems and Web Search
• AOL : Used for a variety of things ranging from statistics generation to running advanced algorithms for behavioral analysis and targeting; cluster size is 50 machines (Intel Xeon, dual processors, dual core, each with 16 GB RAM and an 800 GB hard disk), giving a total of 37 TB HDFS capacity
• Facebook : To store copies of internal log and dimension data sources and use them as a source for reporting/analytics and machine learning; 320-machine cluster with 2,560 cores and about 1.3 PB raw storage
• FOX Interactive Media : 3 x 20-machine clusters (8 cores/machine, 2 TB/machine storage); 10-machine cluster (8 cores/machine, 1 TB/machine storage); used for log analysis, data mining and machine learning
• NetSeer : Up to 1000 instances on Amazon EC2; data storage in Amazon S3; used for crawling, processing, serving and log analysis
• Powerset / Microsoft : Natural language search; up to 400 instances on Amazon EC2; data storage in Amazon S3
HDFS : High level architecture
• HDFS follows a master-slave architecture
• 2 major daemons in HDFS:
  • NameNode
  • DataNode
• Master : NameNode
  • Responsible for namespace and metadata
  • Namespace : file hierarchy
  • Metadata : ownership, permissions, block locations etc.
• Slave : DataNode
  • Responsible for storing actual data blocks
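The split of responsibilities above can be illustrated with a toy model. This is only a sketch, not the HDFS API: the class names mirror the daemons, but `put`, `get`, the 4-byte `BLOCK_SIZE` and the round-robin placement are assumptions made for illustration, and replication and the real client protocol are left out.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; deliberately tiny (assumption), real HDFS blocks are far larger


@dataclass
class DataNode:
    """Slave: stores the actual data blocks."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Master: keeps the namespace and metadata, never the file contents."""
    datanodes: list
    metadata: dict = field(default_factory=dict)  # path -> owner + block locations

    def put(self, path, data, owner):
        locations = []
        for offset in range(0, len(data), BLOCK_SIZE):
            index = offset // BLOCK_SIZE
            block_id = f"{path}#{index}"
            node = self.datanodes[index % len(self.datanodes)]  # round-robin placement
            node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            locations.append((block_id, node.name))
        self.metadata[path] = {"owner": owner, "blocks": locations}

    def get(self, path):
        by_name = {node.name: node for node in self.datanodes}
        locations = self.metadata[path]["blocks"]
        return b"".join(by_name[name].blocks[block_id] for block_id, name in locations)


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
namenode = NameNode([dn1, dn2])
namenode.put("/logs/a.txt", b"hello hadoop", owner="mahesh")
print(namenode.metadata["/logs/a.txt"])
print(namenode.get("/logs/a.txt"))
```

The point of the sketch is that the master only holds the file hierarchy, ownership and block locations, while the bytes live on the slaves.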
MapReduce : High Level Architecture
• MapReduce has a master-slave architecture too
• 2 daemon processes:
• Master : JobTracker
  • Responsible for dividing, scheduling and monitoring work
• Slave : TaskTracker
  • Responsible for actual processing
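The kind of work that gets divided and processed can be sketched with the classic word count. This is a single-process illustration of the map, shuffle and reduce steps, not Hadoop code: the function names are hypothetical, and on a real cluster the map and reduce tasks would run on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def run_job(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(run_job(["big data", "big deal"]))
```

Each input line can be mapped independently and each key can be reduced independently, which is what lets the work be split across machines.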
High Level View
Apache Hadoop Ecosystem
Disruptions
What's common to the following game-changing solutions?
1. Japanese dating app
2. Heart implants
3. MOOC
4. Sensor-equipped cows in the Netherlands
5. Google's autonomous car
At the core there is a deeply embedded DATA PRODUCT!
Created by DATA SCIENCE!
Conquer the world! Become a Data Scientist
The world around us is changing… Our lives are intimately surrounded by data products (an intimate fabric of our lives)
• How our health gets cared for
• How we learn
• How we fall in love
• How we do farming
• How we drive
How did the following players disrupt the marketplace?
• Amazon defeated Borders (Books)
• Netflix defeated Blockbuster (Video)
• iTunes defeated Tower Records (Music)
• Google defeated Yahoo (Search): PageRank algorithm
If Data Science is not integral, you are no longer in the game
Demystifying Data Science
(in simple, plain, everyday English)
In a Nutshell
• Data Science is the extraction of knowledge from data
• Data Science is the art of turning data into actions
• The ability to take data: to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it
• Data Science seeks to
  • Extract meaning from data
  • Create "Data Products"
  • Use all available data to tell a valuable story to non-practitioners
The future belongs to the companies and people that turn data into products
Data Science is everywhere
Data Science vs. BI
• Known Unknowns (BI)
• Unknown Unknowns (Data Science): lots of $-impacting patterns, unnoticed, waiting to be discovered!
“As is” state in most organizations
Data (Sales, Finance) → Reports (BO, Cognos, MSAS)
“As is” state with leading game changers
Data repository → Analytics cell + Modeling processes (Segment, Score, Text mine) → Insights
Move from Reports → Insightful Actions that Impact
What are the 4 core differences between Data Science & Dashboards?
Dashboards: Data repository → Dashboards
Data Science: Data repository (purchase habits) → ML process (collaborative filtering) → Signal (similar people discovery) → Actions (recommend a product) → Outcomes (improve cross sell)
ML + Signals + Actions = Game Changing Outcomes
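The chain from purchase habits to a recommendation can be sketched in a few lines. This is a minimal user-based collaborative filter under stated assumptions: the customers, products and the choice of Jaccard similarity are made up for illustration, and production recommenders use far richer data and models.

```python
def jaccard(a, b):
    """Similarity of two purchase sets: shared items / all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, user, top_n=1):
    """Signal: find similar people. Action: suggest what they bought."""
    mine = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        similarity = jaccard(mine, items)
        if similarity == 0:
            continue  # nothing in common, so no signal from this customer
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical purchase habits (the "data repository")
purchases = {
    "asha": {"phone", "case"},
    "ravi": {"phone", "case", "charger"},
    "meena": {"laptop", "mouse"},
}
print(recommend(purchases, "asha"))
```

A dashboard would stop at reporting who bought what; here the discovered similarity drives an action (the recommendation) aimed at an outcome (cross sell).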
What exactly is a model?
• Mathematically defining a real-world phenomenon
• Representative of the real world
• For example, a cross-sell model
What are 3 common things between predictive models and caricatures?
• It's an approximation, not perfection
• It's better than not having anything
• It gets the job done
REAL WORLD
ANALYTICAL MODEL
What's the Goal of Data Science?
Use data to discover signals (patterns) that cause changes that impact $.
Data Science Reference Architecture – Key components
• Hadoop, Hive, HANA, Infobright
• Clustering, Text mining
• Mobile, Digital
• Data Ingestion Pipeline
Machine Learning Reference Architecture
1. STORE (Hadoop, Hive, HANA, Cloudera, Splunk, Hortonworks)
2. SENSE (signal extraction: text mining, scoring models)
3. RESPOND (front-line actions through website, call centre)
Snapshot of Machine Learning Techniques
1. Segmentation
• Customer behavior segmentation
• Defect segmentation
• Employee segmentation model
• Supplier segmentation model
• “Chunking” groups
• Discovered by algorithm
2. Text mining
• Convert messy unstructured text into actionable signals
• Keyword frequencies
• Sentiment ratios
• Blogs
• Call center transcripts
• Emails
• Multi-channel sentiment analysis
3. Forecasting
• Predict CLTV
• Predict sales at a neighborhood outlet
• Predict salary based on experience, qualification, rating, market demand
• Identify drivers of behavior
• Weights processing
4. Visual Analytics
• Beyond line, bar, pie charts
• Geospatial modeling to see geo correlation
• Spread analysis
• Outlier detection
5. Scoring models
• Churn propensity
• Cross sell
• Attrition modeling in HR
• Risk scoring models in Banking
• Logistic regression
• Neural networks
• Decision trees
• Support Vector Machines
6. Optimisation
• Constraint modeling
• Maximize an outcome
• Maximize sales without cannibalizing sister brands
It's all about DETECTING PATTERNS!
1. Segmentation
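Segments "discovered by algorithm" can be illustrated with a bare-bones k-means. This is a sketch under assumptions: the customer tuples (visits per month, monthly spend) are invented, the first k points seed the centroids so the run is deterministic, and the features are left unscaled for brevity.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, k, iterations=10):
    centroids = [list(p) for p in points[:k]]  # deterministic seeding (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical customers: (visits per month, monthly spend)
customers = [(1, 100), (2, 120), (20, 900), (22, 950)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Nobody tells the algorithm which customers are "occasional" and which are "heavy" buyers; the two groups fall out of the data.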
2. Unstructured Text Mining
Real-world unstructured text mining in health care
Inputs: Doctors' transcripts, Lab diagnostics, Nurses' observations
• Step-1 : SPLIT. Split sentences into words/tokens
• Step-2 : FILTER. Filter “noise” words, e.g. I, the, is, was
• Step-3 : STEMMING. ‘Pulmonary’ = ‘pulmonar’; ‘Insomnia’ = ‘Sleep’ = ‘Sleeplessness’
• Step-4 : THEME EXTRACTION. Keyword extraction & theme generation
• Step-5 : THEME / KEYWORD ANALYSIS
Outputs: Cardiac watch list, Oncology watch list, Pulmonary watch list, Diabetic watch list, Schizophrenia watch list
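The five steps can be sketched end to end on a single note. Everything specific here is an assumption made for illustration: the stop-word list, the synonym map, the crude "drop a trailing y" stemmer and the theme table are toy stand-ins for the real clinical dictionaries such a system would need.

```python
import re
from collections import Counter

STOPWORDS = {"i", "the", "is", "was", "a", "of", "and", "has"}   # toy noise list
SYNONYMS = {"sleep": "insomnia", "sleeplessness": "insomnia"}    # toy synonym map
THEMES = {"pulmonar": "Pulmonary watch list", "cardiac": "Cardiac watch list"}


def split(text):
    """Step 1: split sentences into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_noise(tokens):
    """Step 2: drop noise words."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Step 3: map synonyms to one term, then crudely strip a trailing 'y'."""
    token = SYNONYMS.get(token, token)
    if len(token) > 4 and token.endswith("y"):
        return token[:-1]
    return token


def extract_keywords(text):
    """Step 4: keyword frequencies after splitting, filtering and stemming."""
    return Counter(stem(t) for t in filter_noise(split(text)))


def watch_lists(keywords):
    """Step 5: turn keywords into watch-list themes."""
    return sorted({THEMES[k] for k in keywords if k in THEMES})


note = ("Patient has pulmonary congestion. Pulmonary function is low "
        "and sleeplessness was reported.")
keywords = extract_keywords(note)
print(keywords.most_common(3))
print(watch_lists(keywords))
```

The output of the last step is the kind of signal that could route a patient onto a watch list.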
3. Scoring Models
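A scoring model such as churn propensity boils down to turning customer attributes into a number between 0 and 1. The sketch below uses the logistic function with hand-picked weights; the feature names, weights and intercept are purely hypothetical, whereas a real model would learn them from historical data.

```python
import math

# Hypothetical, hand-picked coefficients (a real model would be fitted to data)
WEIGHTS = {
    "complaints_last_90d": 0.8,
    "months_since_last_purchase": 0.3,
    "is_on_contract": -1.5,
}
INTERCEPT = -2.0


def churn_score(customer):
    """Logistic score: 1 / (1 + e^-z), where z is the weighted sum of features."""
    z = INTERCEPT + sum(w * customer.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


loyal = {"complaints_last_90d": 0, "months_since_last_purchase": 1, "is_on_contract": 1}
at_risk = {"complaints_last_90d": 3, "months_since_last_purchase": 6, "is_on_contract": 0}
print(round(churn_score(loyal), 3), round(churn_score(at_risk), 3))
```

Customers can then be ranked by score so that front-line actions target the highest-risk ones first.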
4. Forecasting !
5. Recommenders
Industries disrupted by Data Science
• Telecom: Infrastructure optimisation, network security
• Banking: Customer sentiment, multi-channel analysis
• Digital channel: Consumer engagement, recommendation engines
• Automotive: Autonomous cars, GM's OnStar
• Health care: Wearables
• Oil & Gas: Operations optimisation
• Retail: Digitisation
What factors are driving companies towards data science ?
• Competitive advantage in the market place ( get ahead fast using unique insights )
• Existential threat ( others are moving ahead fast and I need to catch up )
• Revenue enhancement ( Cross sell models, recommenders )
• Cost optimisation ( Operational efficiency )
Technology behind Data Science
Algorithms
Machine learning
Predictive
analytics
R
Why is Big Data HOT ?
Big Data jobs are Exploding!
Data Science jobs are Exploding!
Data Science Jobs exploding in India too !
Transform yourself to 21st Century Skills
The 6 Most Desired Skills in 2015
To summarize
3 key takeaways …
FAQ
FAQ-1: “I am confused between Hadoop and Data Science … What's the difference between Hadoop and Data Science?”
• Hadoop = Data Infrastructure layer
• Data Science = Sensing patterns from data to impact business outcome
FAQ-2 : “I have worked on SAP, Oracle, etc. How do I transition to becoming a Data Scientist?”
• Execute your first Data Science pilot
  • Step-1 : Learn R
  • Step-2 : Zero in on a business problem to solve
  • Step-3 : Set up the connector between R and your technology … Get access to data from your technology
  • Step-4 : Apply an analytical construct ( VEDA ML )
  • Step-5 : Discover the pattern which impacts the outcome
  • Step-6 : Present final results to the executive business team
• Explore setting up a Data Science project within your existing organisation
• Attend meetups to explore the outside world
FAQ-3: “Should I know probability and advanced statistics ?”
• Not really
• We are focussed on APPLICATION and not THEORY underpinning it
• We will teach you
• Business problem to solve
• How to execute the command on a platform
• What to look for in the output
• What happens within the black box can be seen later
FAQ-4: “This is a big shift for me … In your experience, how long does it take to make the transition from IT to Data Science?”
• We have seen people make the transition in anywhere from 4 weeks to about 6 months
• It depends upon the time + passion + drive you have
FAQ-5: “How are we going to prepare you for the data science
job market ?”
1. Mock preparatory sessions
2. Worksheets + Modelling Checklists + Data Science Playbooks
3. Live projects on clustering , scoring which can be put in resume
4. Our strategic tie-ups with Organisations looking for data science skills
5. Top 30 Practitioner generated Data Science questions
FAQ-6: “I am not an IT professional but a domain person. How
can I get started ?”
1. Option-1 : Focus on Industry use cases
2. Option-2 : Take basic introduction to data sciences
Big Data Resources
• datasciencecentral.com
• bigdatauniversity.com
• coursera.org
• Big Data Architecture
• Spotting Signals in Big Data
• Signal Extraction Methodology
• Advanced Visualization in Big Data
• Exploratory Data Analysis (EDA) : Quick Deep Dive
• Best practices in designing dashboards and scorecards
• Exploring Big Data Using Bivariate Analysis
• Where to start looking in Big Data using Univariate Analysis
• Big Data Platform & Applications
• Statistics Role in Data Science
• Applied Mathematics Role in Data Science
• Data-Scientist-playbook
• 5-disruption-data-products By Data Science
All The Best
Happy Hadooping & Dating with Data Science
Conquer the world !
Become a Data Scientist
Demystify big data  data science

Más contenido relacionado

La actualidad más candente

Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsRavi Teja
 
Big Data Landscape 2018
Big Data Landscape 2018Big Data Landscape 2018
Big Data Landscape 2018Leanne Hwee
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIMC Institute
 
Big data Presentation
Big data PresentationBig data Presentation
Big data PresentationAswadmehar
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Edureka!
 
AI & Big Data Analytics : Innovation trends and use cases
AI & Big Data Analytics : Innovation trends and use casesAI & Big Data Analytics : Innovation trends and use cases
AI & Big Data Analytics : Innovation trends and use casesSarvesh Kumar
 
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...Simplilearn
 
What is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsWhat is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsShilpaKrishna6
 
Data Science Applications | Data Science For Beginners | Data Science Trainin...
Data Science Applications | Data Science For Beginners | Data Science Trainin...Data Science Applications | Data Science For Beginners | Data Science Trainin...
Data Science Applications | Data Science For Beginners | Data Science Trainin...Edureka!
 
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIMAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIBig Data Week
 
Big Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsBig Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsChandan Rajah
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project LifecycleJason Geng
 
Big data course | big data training | big data classes
Big data course | big data training | big data classesBig data course | big data training | big data classes
Big data course | big data training | big data classesNaviWalker
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data ScienceActonRoy
 

La actualidad más candente (20)

Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data Analytics
 
Big Data Landscape 2018
Big Data Landscape 2018Big Data Landscape 2018
Big Data Landscape 2018
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data Science
 
Big Data analytics
Big Data analyticsBig Data analytics
Big Data analytics
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
 
Big data-analytics-ebook
Big data-analytics-ebookBig data-analytics-ebook
Big data-analytics-ebook
 
AI & Big Data Analytics : Innovation trends and use cases
AI & Big Data Analytics : Innovation trends and use casesAI & Big Data Analytics : Innovation trends and use cases
AI & Big Data Analytics : Innovation trends and use cases
 
Big data
Big dataBig data
Big data
 
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
What is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsWhat is big data ? | Big Data Applications
What is big data ? | Big Data Applications
 
Data Science Applications | Data Science For Beginners | Data Science Trainin...
Data Science Applications | Data Science For Beginners | Data Science Trainin...Data Science Applications | Data Science For Beginners | Data Science Trainin...
Data Science Applications | Data Science For Beginners | Data Science Trainin...
 
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIMAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
 
Big Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsBig Data Science: Intro and Benefits
Big Data Science: Intro and Benefits
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Big data course | big data training | big data classes
Big data course | big data training | big data classesBig data course | big data training | big data classes
Big data course | big data training | big data classes
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data Science
 

Destacado

Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...APNIC
 
Enterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise AgilityEnterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise AgilityNUS-ISS
 
Building Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom WhiteBuilding Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom WhiteThe Hive
 
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...Balancing Mobile UX & Security: An API Management Perspective Presentation fr...
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...CA API Management
 
Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and HadoopKai Zheng
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Building hadoop based big data environment
Building hadoop based big data environmentBuilding hadoop based big data environment
Building hadoop based big data environmentEvans Ye
 
Big Data: Opportunities, Strategy and Challenges
Big Data: Opportunities, Strategy and ChallengesBig Data: Opportunities, Strategy and Challenges
Big Data: Opportunities, Strategy and ChallengesGregg Barrett
 
Real time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing applicationReal time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing applicationLeMeniz Infotech
 
Big Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBig Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBlue Coat
 
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneOpen-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneInnovative Management Services
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview Hortonworks
 
"Big Data" in the Energy Industry
"Big Data" in the Energy Industry"Big Data" in the Energy Industry
"Big Data" in the Energy IndustryPaige Bailey
 

Destacado (17)

Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
 
Enterprise Approach towards Cost Savings and Enterprise Agility
Demystify big data data science

  • 2. What we will cover in the 60 mins: 1. Technology Basics 2. Big Data Overview & Snapshot 3. Big Data Architecture: Deep Dive 4. Hadoop Overview 5. Clear Understanding of Data Science 6. Big Data Career Opportunities 7. Q & A
  • 3. Apart from that, we will also cover: • An overview of the shift to Data Science platforms • The 3 critical components of a Data Science platform • Industries that are most likely to get disrupted and shift to Data Science • Characteristics of firms that get left behind by the Data Science wave • Factors that push an industry towards Data Science • A brief overview of aspects of platform architecture beyond technology
  • 4. Who am I? • Mahesh Kumar CV is a Big Data entrepreneur • Mahesh has about 14 years of experience in architecting and developing distributed and real-time data-driven systems • Specialties: translating big data into action, Big Data training, product engineering services, and building Big Data CoEs & Big Data incubators • Has written more than 60 blogs on Big Data & SAP Analytics • Previously worked with IBM, Mindtree, CSC & Rolta • Has conducted a couple of boot camps & workshops at different companies
  • 5. Data Vs Information • Data refers to a collection of numbers and characters, and is a relative term • Data is raw: facts, figures, etc. • Information is processed data
  • 6. Structured Data Vs Unstructured Data
  • 7.
  • 8. So where is this data getting generated? • Social Networking and Media: 700 million Facebook users, 250 million Twitter users, 175+ million public blogs. Each Facebook update, Tweet, blog post and comment creates multiple new data points: structured, semi-structured and unstructured. • Mobile Devices: 5 billion mobile phones in use worldwide. Each call, text and instant message is logged as data. Mobile devices, particularly smartphones and tablets, also make it easier to use social media. • Internet Transactions: Billions of online purchases, stock trades and other transactions happen every day, including countless automated transactions. Each creates a number of data points collected by retailers, banks, credit cards, credit agencies and others. • Networked Devices and Sensors: Electronic devices of all sorts, including servers and other IT hardware, smart energy meters and temperature sensors, all create semi-structured log data that record every action.
  • 9. Build Vs Buy. Diagram labels: Human driven: email, web logs, documents, social. Machine driven: satellite images, bioinformatics, M2M, log files, sensors, video, audio. Business driven: OLTP. All data types: 1X, 10X, 100X; Big Data today, Big Data tomorrow.
  • 10. Defining Big Data: "Any amount of data that's too BIG to be handled by one computer" (John Rauser)
  • 11. Why Big Data • 12 TB of Tweets in a day • 80% of the world's data is unstructured • 30 billion pieces of content shared on Facebook every month • Data expected in 2020: 35 ZB • 5 million trade events per second • 2,267 million Internet users • 4.7 billion searches on Google per day • 5 billion people tweet, text, call and browse on mobile phones daily • Walmart handles 1 million transactions per hour • 255 million websites
  • 12. Big Data Reference Architecture. Diagram labels: Structured Data Sources; Data Integration (Batch / Near real-time); Data Repositories; MDM; End User Analytics; Reports / Dashboards; Unstructured/Semi-structured Data Sources; Web logs, Application / Network log, Social, Chat transcripts, Emails; Legacy applications, ERP and CRM applications; Data Extraction; External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data; Real-time Streaming/Integration; Data Cleaning and Transformation; Change Data Capture for Structured Data; Change Data Capture; ODS; Analytics; Data Warehouse; DW Appliances; Data Marts; MOLAP Cube; In-memory Databases; Unstructured / Semi-structured data; Scorecards and Metrics; Events and Alerts; Data Mining and Exploration; Predictive Analytics; Text Analytics; Visual Exploration; Mobile BI; Columnar Databases
  • 13. Big Data Reference Architecture: SAP. Diagram labels: Columnar Databases; Structured Data Sources; Data Integration; Data Repositories; MDM; End User Analytics; Reports; Unstructured/Semi-structured Data Sources; Web logs, Application / Network log, Social, Chat transcripts, Emails; Legacy and ERP; Data Extraction, Transformation; External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data; Real-time Streaming / Integration; Data Quality; CDC for Structured Data; Change Data Capture; ODS; DW; DW Appliance; Data Marts; MOLAP Cube; In-memory Databases; Unstructured / Semi-structured; Scorecards / Metrics; Events / Alerts; Data Mining; Predictive Analytics; Text Analytics; HANA / BW / Sybase; SAP HANA; Dashboards; BO WebI / Crystal Reports; BO Dashboard; Data Exploration; Mobile BI; SAP HANA; Sybase IQ / HANA; BO Mobile; SAP HANA / Sybase; RDS / Rapid Marts; SAP BW; SAP Lumira; SAP Predictive Analysis; Analytics; Hadoop Platform; BO CMS; SAP HANA / SAP BW; SAP MDM; SAP BO Data Services; 3rd Party; 3rd Party; SAP HANA
  • 14. Big Data Reference Architecture: IBM. Diagram labels: Columnar Databases; Structured Data Sources; Data Integration; Data Repositories; MDM; End User Analytics; Reports; Unstructured/Semi-structured Data Sources; Web logs, Application / Network log, Social, Chat transcripts, Emails; Legacy Applications and ERP; Data Extraction; External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data; Real-time Streaming; Data Quality; CDC for Structured Data; CDC for Unstructured Data; Hadoop Platform; ODS; Data Warehouse; DW Appliance; Data Marts; MOLAP Cube; In-memory Databases; Semi / Unstructured; Scorecards / Metrics; Events / Alerts; Predictive Analytics; Text Analytics; Content Analytics; InfoSphere Information Server; Dashboards; Cognos Business Intelligence Enterprise; Visual Exploration; Mobile BI; Cognos TM1; Cognos Mobile; PureData (Netezza, InfoSphere Warehouse); Cognos TM1; InfoSphere Data Explorer; SPSS Premium; SPSS; Content Analytics; InfoSphere Streams; InfoSphere CDC; Analytics Sandbox; Big Insights / Streams; Big Insights; InfoSphere MDM; Big Insights / NoSQL; Big Insights / HBase; PureData (Netezza, InfoSphere Warehouse, ISAS)
  • 15. Big Data Reference Architecture: Oracle. Diagram labels: Columnar Databases; Structured Data Sources; Data Integration; Data Repositories; MDM; End User Analytics; Reports; Unstructured/Semi-structured Data Sources; Web logs, Application / Network log, Social, Chat transcripts, Emails; Legacy Applications and ERP; Data Extraction; External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data; Real-time Streaming; Data Quality; CDC for Structured Data; CDC for Unstructured Data; Hadoop Platform; ODS; Data Warehouse; DW Appliance; Data Marts; MOLAP Cube; In-memory Databases; Semi / Unstructured; Scorecards / Metrics; Real Time Decision Mgt.; Data Mining; Predictive Analytics; Text Analytics; Data Integrator; Exadata; Dashboards; BI Publisher; OBI Foundation Suite; Visual Exploration; Mobile BI; Exalytics; OBI Mobile; Oracle / Exadata; Oracle / Exadata; Essbase / Hyperion; Exalytics; OBI Scorecard; Exalytics + Oracle R Ent.; Endeca; Oracle Golden Gate; Analytics Sandbox; Exalytics; Hadoop / Golden Gate; Big Data Appliance; Oracle MDM; Big Data Appliance; Exadata EHCC / HBase; Silver Creek; Data Integrator / Golden Gate; Real-time Decisions
  • 16. Big Data Reference Architecture: Informatica + EMC + SAS. Diagram labels: Columnar Databases; Structured Data Sources; Data Integration; Data Repositories; MDM; End User Analytics; Reports; Unstructured/Semi-structured Data Sources; Web logs, Application / Network log, Social, Chat transcripts, Emails; Legacy Applications and ERP; Data Extraction; External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data; Real-time Streaming; Data Quality; CDC for Structured Data; CDC for Unstructured Data; Hadoop Platform; ODS; Data Warehouse; DW Appliance; Data Marts; MOLAP Cube; In-memory Databases; Semi / Unstructured; Scorecards / Metrics; Data Exploration; Predictive Analytics; Text Analytics; Informatica PowerCenter & Data Quality; EMC Greenplum; Dashboards; SAS BI; Visual Exploration; Mobile BI; SAS Visual Analytics; SAS BI; EMC Greenplum Database; EMC Greenplum; SAS OLAP Server; SAS Visual BI; SAS Ent. Miner; SAS Strategy Mgt; JMP Pro; SAS Text Miner; Informatica PowerCenter – Real-time edition; Analytics Sandbox; EMC Greenplum UAP; Informatica hParser / Hadoop Pwx; EMC Greenplum HD; EMC Greenplum HD; HBase; Informatica MDM
  • 17. Big Data Reference Architecture: Open Source Technologies. Diagram labels: Columnar Databases; Structured Data Sources; Data Integration; Data Repositories; MDM; End User Analytics; Reports; Unstructured/Semi-structured Data Sources; Web logs, Application / Network log, Social, Chat transcripts, Emails; Legacy Applications and ERP; Data Extraction; External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data; Real-time Streaming; Data Quality; CDC for Structured Data; CDC for Unstructured Data; Hadoop Platform; ODS; Data Warehouse; DW Appliance; Data Marts; MOLAP Cube; In-memory Databases; Semi / Unstructured; Scorecards / Metrics; Predictive Analytics; Text Analytics; Apache MapReduce, Pig; Talend Data Integration & Data Quality; Commercial Product; Dashboards; Visual Exploration; Mobile BI; Apache Derby; Pentaho Mobile BI; MySQL, Apache Hive; MySQL, Hive; SAS OLAP Server; R, Apache Mahout; SAS Text Miner; Apache Flume; Analytics Sandbox; Apache HDFS + R; Apache Hadoop; HBase, NoSQL; HBase; Talend MDM; Pentaho Business Analytics, BI
  • 18. What is Hadoop? • A framework for large-scale data processing • Inspired by Google’s architecture • A top-level Apache project; Hadoop is open source • Written in Java, plus a few shell scripts • Supports data-intensive distributed applications • Abstracts and facilitates the storage and processing of large and rapidly growing data sets • Handles structured and non-structured data • Uses simple programming models
  • 19. 2 key components of Core Hadoop
  • 20. Hadoop is used everywhere • Yahoo!: More than 100,000 CPUs in ~20,000 computers running Hadoop; biggest cluster: 2000 nodes (2*4cpu boxes with 4TB disk each); used to support research for Ad Systems and Web Search • AOL: Used for a variety of things ranging from statistics generation to running advanced algorithms for behavioral analysis and targeting; cluster size is 50 machines (Intel Xeon, dual processors, dual core, each with 16 GB RAM and 800 GB hard disk), giving a total of 37 TB HDFS capacity • Facebook: Stores copies of internal log and dimension data sources and uses them as a source for reporting/analytics and machine learning; 320-machine cluster with 2,560 cores and about 1.3 PB raw storage • FOX Interactive Media: 3 x 20-machine clusters (8 cores/machine, 2 TB/machine storage); 10-machine cluster (8 cores/machine, 1 TB/machine storage); used for log analysis, data mining and machine learning • NetSeer: Up to 1000 instances on Amazon EC2; data storage in Amazon S3; used for crawling, processing, serving and log analysis • Powerset / Microsoft: Natural language search; up to 400 instances on Amazon EC2; data storage in Amazon S3
  • 21. HDFS : High-level architecture • HDFS follows a master-slave architecture • 2 major daemons in HDFS: NameNode and DataNode • Master : NameNode • Responsible for namespace and metadata • Namespace : file hierarchy • Metadata : ownership, permissions, block locations, etc. • Slave : DataNode • Responsible for storing the actual data blocks
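The master/slave split on this slide can be sketched as a toy in-memory model. This is not the HDFS API: the `NameNode` and `DataNode` classes, the 4-byte block size, the file path and the round-robin block placement are all illustrative assumptions chosen to keep the example small.

```python
# Toy model of the HDFS master/slave split (illustrative only, not the real HDFS API).
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; deliberately tiny so the demo is readable (an assumption)


@dataclass
class DataNode:
    """Slave: stores the actual data blocks."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Master: keeps namespace and metadata, never the file contents."""
    datanodes: list
    namespace: dict = field(default_factory=dict)  # path -> metadata

    def write(self, path, owner, data):
        locations = []
        for offset in range(0, len(data), BLOCK_SIZE):
            index = offset // BLOCK_SIZE
            block_id = f"{path}#{index}"
            node = self.datanodes[index % len(self.datanodes)]  # round-robin placement
            node.store(block_id, data[offset:offset + BLOCK_SIZE])
            locations.append((block_id, node.name))
        # Metadata only: ownership and block locations.
        self.namespace[path] = {"owner": owner, "blocks": locations}

    def read(self, path):
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(by_name[node_name].read(block_id)
                        for block_id, node_name in self.namespace[path]["blocks"])


nodes = [DataNode("dn1"), DataNode("dn2")]
master = NameNode(nodes)
master.write("/logs/a.txt", owner="analyst", data=b"hello hdfs")
restored = master.read("/logs/a.txt")
```

The point of the sketch is the division of labour: the master only knows where blocks live, and the slaves only hold bytes.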
  • 22. MapReduce : High-level architecture • MapReduce has a master-slave architecture too • 2 daemon processes • Master : JobTracker • Responsible for dividing, scheduling and monitoring work • Slave : TaskTracker • Responsible for the actual processing
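To make the division of work concrete, here is a minimal single-process word count in the MapReduce style. It is a sketch, not Hadoop code: the function names `map_task`, `reduce_task` and `run_job`, the two-way split and the sample lines are assumptions for illustration.

```python
# Toy MapReduce word count (illustrative only, not the Hadoop API).
from collections import defaultdict


def map_task(split):
    """The kind of work a slave runs: emit (word, 1) for each word in its split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def reduce_task(word, counts):
    """Combine all values emitted for one key."""
    return word, sum(counts)


def run_job(lines, num_splits=2):
    """The kind of work the master coordinates: divide input, run tasks, collect results."""
    # Divide: cut the input into roughly equal splits.
    splits = [lines[i::num_splits] for i in range(num_splits)]
    # Map phase: one map task per split.
    mapped = [pair for split in splits for pair in map_task(split)]
    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for word, one in mapped:
        grouped[word].append(one)
    # Reduce phase: one reduce call per key.
    return dict(reduce_task(word, ones) for word, ones in grouped.items())


counts = run_job(["big data big ideas", "data science", "big data"])
```

In a real cluster the splits and map tasks run on different machines; here everything runs in one process so the flow is easy to follow.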
  • 29. Sensor-equipped cows in the Netherlands
  • 31. What's common to the following game-changing solutions ? 1. Japanese dating app 2. Sensor-equipped cows in the Netherlands 3. Google's autonomous car 4. MOOC 5. Heart implants
  • 32. At the core there is a deeply embedded DATA PRODUCT !
  • 33. Created by DATA SCIENCE ! Conquer the world ! Become a Data Scientist
  • 34. The world around us is changing… • How our health gets cared for • How we learn • How we fall in love • How we do farming • How we drive • Our lives are intimately surrounded by data products (an intimate fabric of our lives)
  • 35. How did the following players disrupt the marketplace ? • Amazon defeated Borders ( Books ) • Netflix defeated Blockbuster ( Video ) • iTunes defeated Tower Records ( Music ) • Google defeated Yahoo ( Search ) with the PageRank algorithm
  • 36. If Data Science is not integral you are no longer in the game
  • 37. Demystifying Data Science ( in simple, plain, everyday English )
  • 38. In a Nutshell • Data Science is the extraction of knowledge from data • Data Science is the art of turning data into actions • It is the ability to take data and understand it, process it, extract value from it, visualize it, and communicate it • Data Science seeks to • Extract meaning from data • Create "Data Products" • Use all available data to tell a valuable story to non-practitioners • The future belongs to the companies and people that turn data into products
  • 39. Data Science is everywhere
  • 40. Data Science vs. BI • Known Unknowns ( BI ) • Unknown Unknowns ( Data Science ) • Lots of $-impacting patterns, unnoticed, waiting to be discovered!
  • 41. “As is” state in most organizations Data ( Sales , Finance ) Reports ( BO, Cognos, MSAS )
  • 42. “As is” state with leading game changers • Data repository • Insights • Analytics cell + Modeling processes ( Segment, Score, Text mine ) • Move from Reports → Insightful Actions that Impact
  • 43. What are the 4 core differences between Data Science & Dashboards ? • Dashboards: Data repository, Dashboards • Data Science: Data repository (Purchase habits), Signal (Similar people discovery), ML process (Collaborative filtering), Actions (Recommend a product), Outcomes (Improve cross sell) • ML + Signals + Actions = Game Changing Outcomes
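The chain on this slide (purchase habits, similar people, collaborative filtering, recommendation) can be sketched in a few lines. This is a minimal user-based variant using Jaccard similarity, which is one common choice and not necessarily what the presenter had in mind; the customers, products and function names are made up for illustration.

```python
# User-based collaborative filtering on purchase sets (a minimal sketch with made-up data).

purchases = {  # data repository: purchase habits of hypothetical customers
    "asha": {"phone", "case", "charger"},
    "ravi": {"phone", "case", "headphones"},
    "meena": {"laptop", "mouse"},
}


def jaccard(a, b):
    """Similarity of two purchase sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(customer, data):
    """Signal to action: find the most similar other customer, suggest what they bought."""
    mine = data[customer]
    others = [name for name in data if name != customer]
    nearest = max(others, key=lambda name: jaccard(mine, data[name]))
    return sorted(data[nearest] - mine)


suggestions = recommend("asha", purchases)
```

Here "asha" and "ravi" share two of four distinct items, so "ravi" is the similar person and the recommended product is the one item "asha" has not bought yet.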
  • 44. What exactly is a model ? • A mathematical definition of a real-world phenomenon • Representative of the real world • For example, a cross-sell model
  • 45. What are 3 common things between predictive models and caricatures ? • It's an approximation, not perfection • It's better than not having anything • It gets the job done • REAL WORLD • ANALYTICAL MODEL
  • 46. What's the Goal of Data Science ? • Use data to discover signals (patterns) that cause changes that impact $.
  • 47. Data Science Reference Architecture – Key components • Hadoop • Hive • HANA • Infobright • Clustering • Text mining • Mobile • Digital • Data Ingestion Pipeline
  • 48. Machine Learning Reference Architecture • 1. STORE ( Hadoop, Hive, HANA, Cloudera, Splunk, Hortonworks ) • 2. SENSE ( signal extraction: text mining, scoring models ) • 3. RESPOND ( front-line actions through website, call centre )
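The three stages can be read as three small functions chained together. The sketch below is only an illustration of that flow: a Python list stands in for the storage layer, the "signal" is a simple complaint ratio, and the event fields, customer id, threshold and action text are all hypothetical.

```python
# STORE -> SENSE -> RESPOND as three plain functions (hypothetical data and threshold).

def store(repository, event):
    """STORE: append a raw event (a list stands in for Hadoop/Hive here)."""
    repository.append(event)
    return repository


def sense(repository, customer):
    """SENSE: extract a signal; here, the share of a customer's contacts that are complaints."""
    contacts = [e for e in repository if e["customer"] == customer]
    if not contacts:
        return 0.0
    return sum(e["type"] == "complaint" for e in contacts) / len(contacts)


def respond(score, threshold=0.5):
    """RESPOND: turn the signal into a front-line action."""
    return "route to retention desk" if score >= threshold else "no action"


repo = []
for kind in ["complaint", "purchase", "complaint"]:
    store(repo, {"customer": "c42", "type": kind})

score = sense(repo, "c42")
action = respond(score)
```

In practice each stage would be a separate system; keeping them as separate functions mirrors that boundary.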
  • 49. Snapshot of Machine Learning Techniques • 1. Segmentation: Customer behavior segmentation; Defect segmentation; Employee segmentation model; Supplier segmentation model; “Chunking” groups; Discovered by algorithm • 2. Text mining: Convert messy unstructured text into actionable signals; Keyword frequencies; Sentiment ratios; Blogs; Call center transcripts; Emails; Multi-channel sentiment analysis • 3. Forecasting: Predict CLTV; Predict sales at a neighborhood outlet; Predict salary based on experience, qualification, rating, market demand; Identify drivers of behavior; Weights processing • 4. Visual Analytics: Beyond line, bar, pie charts; Geospatial modeling to see geo correlation; Spread analysis; Outlier detection • 5. Scoring models: Churn propensity; Cross sell; Attrition modeling in HR; Risk scoring models in Banking; Logistic; Neural networks; Decision trees; Support Vector Machines • 6. Optimisation: Constraint modeling; Maximize an outcome; Maximize sales without cannibalizing sister brands
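As one concrete instance of the segmentation technique, where the groups are "discovered by algorithm", here is a tiny one-dimensional k-means. It is a sketch under stated assumptions: the spend figures, the starting centers and the fixed iteration count are invented for the demo, and real segmentation would use many more variables.

```python
# Tiny 1-D k-means: segments are discovered by the algorithm, not defined up front.

def kmeans_1d(values, centers, iterations=10):
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


monthly_spend = [10, 12, 11, 95, 100, 105]  # hypothetical customer spend
centers, segments = kmeans_1d(monthly_spend, centers=[0.0, 50.0])
```

With this data the algorithm settles on a low-spend and a high-spend group without being told where the boundary is.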
  • 50. It's all about DETECTING PATTERNS !
  • 53. Real-world unstructured text mining in health care • Doctors' transcripts • Step-1 : SPLIT: Split sentences into words/tokens • Step-2 : FILTER: Filter “noise” words, e.g. I, the, is, was • Step-3 : STEMMING: ‘Pulmonary’ = ‘pulmonar’; ‘Insomnia’ = ‘Sleep’ = ‘Sleeplessness’ • Step-4 : THEME EXTRACTION: Keyword extraction & theme generation • Step-5 : THEME / KEYWORD ANALYSIS • Lab diagnostics • Nurses' observations • Cardiac watch list • Oncology watch list • Pulmonary watch list • Diabetic watch list • Schizophrenia watch list
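The five steps can be strung together as a small pipeline. This is a sketch, not a clinical tool: the stop-word list, the lookup table used in place of a real stemmer, the watch-list mapping and the sample transcript are all hypothetical, and only the two normalisation examples come from the slide.

```python
# The five text-mining steps as a small pipeline (word lists and transcript are hypothetical).
import re
from collections import Counter

STOP_WORDS = {"i", "the", "is", "was", "a", "of", "and", "has"}
# Lookup table standing in for a real stemmer; entries follow the slide's examples.
NORMALISE = {"pulmonary": "pulmonar", "insomnia": "sleep", "sleeplessness": "sleep"}
WATCH_LISTS = {"pulmonar": "Pulmonary watch list", "cardiac": "Cardiac watch list"}


def extract_themes(transcript):
    tokens = re.findall(r"[a-z]+", transcript.lower())    # Step 1: split into tokens
    tokens = [t for t in tokens if t not in STOP_WORDS]    # Step 2: filter noise words
    tokens = [NORMALISE.get(t, t) for t in tokens]         # Step 3: stemming (lookup-based)
    themes = Counter(tokens)                               # Step 4: keyword/theme counts
    # Step 5: analysis, here simply mapping themes onto watch lists.
    flagged = sorted({WATCH_LISTS[t] for t in themes if t in WATCH_LISTS})
    return themes, flagged


themes, flagged = extract_themes(
    "Patient has insomnia and the pulmonary function is reduced. Sleeplessness was reported."
)
```

Because "insomnia" and "sleeplessness" collapse to the same token, the sleep theme is counted twice, which is the benefit the stemming step is meant to deliver.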
  • 57. Industries disrupted by Data Science • Telecom: Infrastructure optimisation, Network security • Banking: Customer sentiment, Multi-channel analysis • Digital channel: Consumer engagement, Recommendation engines • Automotive: Autonomous cars, GM's OnStar • Health care: Wearables • Oil & Gas: Operations optimisation • Retail: Digitisation
  • 58. What factors are driving companies towards data science ? • Competitive advantage in the market place ( get ahead fast using unique insights ) • Existential threat ( others are moving ahead fast and I need to catch up ) • Revenue enhancement ( Cross sell models, recommenders ) • Cost optimisation ( Operational efficiency )
  • 59. Technology behind Data Science • Algorithms • Machine learning • Predictive analytics • R
  • 60. Why is Big Data HOT ?
  • 61. Big Data jobs are Exploding!
  • 63. Data Science jobs are Exploding!
  • 64. Data Science Jobs exploding in India too !
  • 66. Transform yourself to 21st Century Skills
  • 67. The 6 Most Desired Skills in 2015
  • 68. To summarize: 3 key takeaways …
  • 69. FAQ
  • 70. FAQ-1: “I am confused between Hadoop and Data Science … What's the difference between Hadoop and Data Science?” • Hadoop = Data infrastructure layer • Data Science = Sensing patterns in data to impact business outcomes
  • 71. FAQ-2 : “I have worked on SAP, Oracle, etc. How do I transition to becoming a Data Scientist ?” • Execute your first Data Science pilot • Step-1 : Learn R • Step-2 : Zero in on a business problem to solve • Step-3 : Set up the R connector to your technology … Get access to data from your technology • Step-4 : Apply an analytical construct ( VEDA ML ) • Step-5 : Discover the pattern which impacts the outcome • Step-6 : Present final results to the executive business team • Explore setting up a Data Science project within your existing organisation • Attend meetups to explore the outside world
  • 72. FAQ-3: “Should I know probability and advanced statistics ?” • Not really • We are focussed on APPLICATION and not THEORY underpinning it • We will teach you • Business problem to solve • How to execute the command on a platform • What to look for in the output • What happens within the black box can be seen later
  • 73. FAQ-4: “This is a big shift for me … In your experience how long does it take to make the transition from IT to Data Science ?” • We have seen people make the transition in anywhere from 4 weeks to about 6 months • It depends on the time, passion and drive you have
  • 74. FAQ-5: “How are we going to prepare you for the data science job market ?” 1. Mock preparatory sessions 2. Worksheets + Modelling Checklists + Data Science Playbooks 3. Live projects on clustering and scoring that can go on your resume 4. Our strategic tie-ups with organisations looking for data science skills 5. Top 30 practitioner-generated Data Science questions
  • 75. FAQ-6: “I am not an IT professional but a domain person. How can I get started ?” 1. Option-1 : Focus on Industry use cases 2. Option-2 : Take basic introduction to data sciences
  • 76. Big Data Resources • datasciencecentral.com • bigdatauniversity.com • Courseera.com • Big Data Architecture • Spotting Signals in Big Data • Signal Extraction Methodology • Advanced Visualization in Big Data • Exploratory Data Analysis (EDA) : Quick Deep Dive • Best practices in designing dashboards and scorecards • Exploring Big Data Using Bivariate Analysis • Where to start looking in Big Data using Univariate Analysis • Big Data Platform & Applications • Statistics Role in Data Science • Applied Mathematics Role in Data Science • Data-Scientist-playbook • 5-disruption-data-products By Data Science
  • 77. All the best ! Happy Hadooping & dating with Data Science • Conquer the world ! Become a Data Scientist