Easier, Faster, Smarter
Data Science without the Scientist
Matt Schumpert
10.30.13

© 2013 Datameer, Inc. All rights reserved.
Agenda
Background
First principles
Mind-blowing fun fact
Current state & challenges
Suggestions for making life easier
Demo!

Me
Enterprise infrastructure software guy
Focused on abstraction and customers
Likes simplicity

A favorite example...
Buffered Web Services:
“When a buffered operation is invoked by a client, the method operation goes on a JMS queue and WebLogic
Server deals with it asynchronously by transparently creating a Message Driven Bean to consume the message.
As with Web Service reliable messaging, if WebLogic Server goes down while the method invocation is still in the
queue, it will be dealt with as soon as WebLogic Server is restarted. When a client invokes the buffered Web
Service, the client does not wait for a response from the invoke, and the execution of the client can continue”
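The buffering idea in the quote can be sketched in a few lines of Python. This is only an analogy under stated assumptions: `BufferedService`, `invoke`, and `drain` are hypothetical names, an in-memory `queue.Queue` stands in for the JMS queue, a single worker thread stands in for the Message Driven Bean, and nothing here is WebLogic API. Durability across a server restart is not modeled.

```python
import queue
import threading


class BufferedService:
    """Toy stand-in for a buffered operation: calls are queued, not run inline."""

    def __init__(self):
        self._queue = queue.Queue()  # plays the role of the JMS queue
        self.results = []            # work recorded by the consumer
        worker = threading.Thread(target=self._consume, daemon=True)
        worker.start()

    def invoke(self, payload):
        # Enqueue and return immediately; the caller does not wait for a response.
        self._queue.put(payload)

    def _consume(self):
        # Plays the role of the message-driven consumer.
        while True:
            payload = self._queue.get()
            self.results.append(payload.upper())
            self._queue.task_done()

    def drain(self):
        # Block until every queued call has been processed (demo only).
        self._queue.join()


service = BufferedService()
service.invoke("order-1")
service.invoke("order-2")
service.drain()
print(service.results)
```

The client-side call is a plain `put`, which is why the caller can continue at once; all of the real work happens on the consumer side.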

1. First Principles
First Principles from an Expert
Instrument everything
Invest in infrastructure
Put all your data in one place
Data first, questions later
Keep raw data forever
Let everyone party on the data
Produce tools to support the whole lifecycle
- Jeff Hammerbacher
2. Mind-boggling fun fact
190,000 unfilled data scientist jobs by 2018
- McKinsey
Signal-to-Noise Ratio is Dropping!
3. Current state + challenges
Hallmarks of Traditional Analytics
Esoteric skills
Long cycle times
Low transparency
Data & application silos
Mired in data prep
Sampling (guesstimation)
Expensive!
Extremely valuable work products
Current Recipe:
Pull historical data
Sample
Cleanse / Pre-process
Design / implement model
Train
Hand-code / Integrate
Deploy
Fine-Tune, rinse and repeat
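To make the recipe concrete, here is a minimal sketch of the same steps in plain Python. Everything in it is an assumption for illustration: the transaction rows are made up, the "model" is just a threshold between class means, the sampling is systematic (every second row) rather than random so the result is repeatable, and the function names are hypothetical.

```python
import statistics


def pull_historical_data():
    # Hypothetical (amount, is_fraud) rows; None marks a bad record.
    return [
        (12.0, 0), (15.5, 0), (980.0, 1), (14.0, 0), (None, 0),
        (1200.0, 1), (1020.0, 1), (9.5, 0), (10.0, 0), (13.3, 0),
    ]


def sample(rows, step=2):
    # Systematic sampling: keep every `step`-th row and ignore the rest.
    return rows[::step]


def cleanse(rows):
    # Drop rows with a missing amount.
    return [row for row in rows if row[0] is not None]


def train(rows):
    # "Model": a threshold halfway between the mean amounts of the two classes.
    fraud = [amount for amount, label in rows if label == 1]
    legit = [amount for amount, label in rows if label == 0]
    return (statistics.mean(fraud) + statistics.mean(legit)) / 2


def deploy(threshold):
    # Hand-coded scoring function that would be integrated into an application.
    return lambda amount: int(amount > threshold)


threshold = train(cleanse(sample(pull_historical_data())))
predict = deploy(threshold)
print(threshold, predict(1500.0), predict(20.0))
```

Even in this toy version, half of the data never reaches the model and every step is a separate hand-off, which is the point the recipe slide is making.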
Science != Everyday Decisions
There must be a better way!
Apply traditional tools to big data?
SAS: Expensive, Not Scalable, Siloed
R: Requires Coding, Retraining, Clunky Architecture
Mahout: Coding Required, Immature, Limited Support

And what about the rest of the (big data) story?
Big Data Analytics is NOT (just):
A sexy new visualization tool
Machine learning / Predictive analytics
Data science
Hadoop
The data warehousing movie replayed

Big Data Analytics IS:
A granular, complete and current understanding of your operations and customers
Answering questions at the speed of business
Relevancy in all customer interactions
Closed-loop decisioning that’s data-driven
Managing data through a lifecycle
The Big Data Analytics Lifecycle
Integrate: Create your hypothesis
Prepare and Analyze: Analyze
Visualize: Visualize
Deploy: Act on insight and measure ROI

A lesson from data warehousing / BI
traditional / schema-on-write: slow, static, complex
agile / schema-on-read: fast, dynamic, simple
Source: TDWI
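A rough sketch of what schema-on-read means in practice: raw records are stored untouched, and each question brings its own schema at read time. The event data, field names, and the `read_with_schema` helper below are all hypothetical, chosen only to illustrate the idea.

```python
import json

# Raw events kept exactly as they arrived; fields vary from record to record.
RAW_EVENTS = [
    '{"user": "a", "amount": "19.5", "country": "US"}',
    '{"user": "b", "amount": "5", "referrer": "ad"}',
    '{"user": "a", "amount": "3.25"}',
]


def read_with_schema(raw_lines, schema):
    """Parse raw records and project them onto `schema` at read time."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


# First question: spend per user. The schema is chosen now, not at load time.
totals = {}
for row in read_with_schema(RAW_EVENTS, {"user": str, "amount": float}):
    totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]

# A later question reuses the same raw data with a different schema.
countries = [row["country"] for row in read_with_schema(RAW_EVENTS, {"country": str})]

print(totals, countries)
```

With schema-on-write, the second question would first require changing the table definition and reloading; here it only requires a different dictionary.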
Don’t rebuild Rome... again!!

There must be a better way!
4. Making life easier
How (without an army):
Speak the language of the business
Generate (don’t write) code
Simplify data integration and preparation
Move the computation (analytics) to the data

Esoteric Language == Obscurity
K-Means
CART
Mutual Information
Matrix Factorization
Random Forest?
Logistic Regression
Support Vector Machine??

Algorithms can be straightforward!

Clustering
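As one illustration of how straightforward clustering can be, here is a minimal k-means sketch in plain Python. The `kmeans` function, the sample points, the fixed iteration count, and seeding the centroids with the first k points are assumptions made for this example; it is not taken from the talk or from any product.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared distance).
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, clusters


points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The whole algorithm is two alternating steps: assign each point to its nearest center, then move each center to the average of its points.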

Column Dependencies
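One common way to measure how strongly two columns depend on each other is mutual information, the term listed on the "Esoteric Language" slide. The sketch below is a generic textbook calculation on made-up columns; whether the tool shown in the talk computes column dependencies this way is an assumption, and the column names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long categorical columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (count / n) * math.log2((count / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), count in pxy.items()
    )


# Hypothetical customer columns.
plan = ["basic", "basic", "pro", "pro"]
churned = ["yes", "yes", "no", "no"]
region = ["east", "west", "east", "west"]

print(mutual_information(plan, churned))    # plan fully determines churn here
print(mutual_information(region, churned))  # region tells us nothing about churn
```

A score near zero means knowing one column says little about the other; a higher score flags a pair of columns worth a closer look.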

Decision Trees

Recommendations

Example:
Fraud Investigation
Sales Conversion
DEMO
Data Wrangling
DEMO
@Datameer

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

How to do Data Science Without the Scientist

  • 2. Data Science without the Scientist. Matt Schumpert, 10.30.13. © 2013 Datameer, Inc. All rights reserved.
  • 3. Agenda: Background; First principles; Mind-blowing fun fact; Current state & challenges; Suggestions for making life easier; Demo!
  • 4. Me: Enterprise infrastructure software guy; Focused on abstraction and customers; Likes simplicity
  • 5. A favorite example... Buffered Web Services: “When a buffered operation is invoked by a client, the method operation goes on a JMS queue and WebLogic Server deals with it asynchronously by transparently creating a Message Driven Bean to consume the message. As with Web Service reliable messaging, if WebLogic Server goes down while the method invocation is still in the queue, it will be dealt with as soon as WebLogic Server is restarted. When a client invokes the buffered Web Service, the client does not wait for a response from the invoke, and the execution of the client can continue”
  • 7. First Principles from an Expert: Instrument everything; Invest in infrastructure; Put all your data in one place; Data first, questions later; Keep raw data forever; Let everyone party on the data; Produce tools to support the whole lifecycle - Jeff Hammerbacher
  • 9. 190,000 unfilled data scientist jobs by 2018 - McKinsey
  • 11. 3. Current state + challenges
  • 12. Hallmarks of Traditional Analytics: Esoteric skills; Long cycle times; Low transparency; Data & application silos; Mired in data prep; Sampling (guesstimation); Expensive!; Extremely valuable work products
  • 13. Current Recipe: Pull historical data; Sample; Cleanse / Pre-process; Design / implement model; Train; Hand-code / Integrate; Deploy; Fine-Tune, rinse and repeat
  • 14. Science != Everyday Decisions
  • 15. There must be a better way!
  • 16. Apply traditional tools to big data? SAS, R, Mahout: Expensive; Not Scalable; Siloed; Requires Coding; Retraining; Clunky Architecture; Coding Required; Immature; Limited Support
  • 17. And what about the rest of the (big data) story?
  • 18. Big Data Analytics is NOT (just): A sexy new visualization tool; Machine learning / Predictive analytics; Data science; Hadoop; The data warehousing movie replayed
  • 19. Big Data Analytics IS: A granular, complete and current understanding of your operations and customers; Answering questions at the speed of business; Relevancy in all customer interactions; Closed-loop decisioning that’s data-driven; Managing data through a lifecycle
  • 20. The Big Data Analytics Lifecycle (diagram; labels: Integrate; Prepare and Analyze; Visualize; Deploy; Create your hypothesis; Analyze; Visualize; Act on insight and measure ROI)
  • 21. A lesson from data warehousing / BI. Traditional / schema-on-write: slow, static, complex. Agile / schema-on-read: fast, dynamic, simple. Source: TDWI
  • 22. Don’t rebuild Rome... again!!
  • 23. There must be a better way!
  • 24. 4. Making life easier
  • 25. How (without an army): Speak the language of the business; Generate (don’t write) code; Simplify data integration and preparation; Move the computation (analytics) to the data
  • 26. Esoteric Language == Obscurity: K-Means; CART; Mutual Information; Matrix Factorization; Random Forest?; Logistic Regression; Support Vector Machine??
  • 27. Algorithms can be straightforward!
  • 28. Clustering
  • 29. Column Dependencies
  • 30. Decision Trees
  • 31. Recommendations
  • 33. DEMO
  • 35. DEMO
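
The schema-on-write versus schema-on-read contrast on slide 21 (together with the "keep raw data forever" and "data first, questions later" principles on slide 7) can be illustrated with a small Python sketch. This is not from the talk: the sample events, the field names, and the `read_with_schema` helper are all hypothetical, and the point is only that the raw records are stored untouched while each question brings its own schema at read time.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "ann", "amount": "12.50", "channel": "web"}',
    '{"user": "bob", "amount": "7", "coupon": "FALL13"}',
    '{"user": "ann", "amount": "3.25"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps field name -> (cast function, default value).
    Fields missing from a record fall back to the default.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }


# Today's question: revenue per user.
revenue_schema = {"user": (str, None), "amount": (float, 0.0)}
revenue = {}
for row in read_with_schema(RAW_EVENTS, revenue_schema):
    revenue[row["user"]] = revenue.get(row["user"], 0.0) + row["amount"]

# A later question: which coupons were used. The raw data is not reloaded
# or migrated; only the schema applied while reading changes.
coupon_schema = {"user": (str, None), "coupon": (str, None)}
coupons = [
    row["coupon"]
    for row in read_with_schema(RAW_EVENTS, coupon_schema)
    if row["coupon"]
]

print(revenue)
print(coupons)
```

Under schema-on-write, the `coupon` field would have had to be anticipated in the table design before the first event was loaded; here it is simply picked up when someone first asks about it.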