SlideShare una empresa de Scribd logo
1 de 14
Descargar para leer sin conexión
1© Cloudera, Inc. All rights reserved.
Data Science in the Enterprise
Amr Awadallah (@awadallah)
Founder, Chief Technical Officer, Cloudera
2© Cloudera, Inc. All rights reserved.
Typical Data Science Workflow
Data Engineering Data Science (Exploratory) Production (Operational)
Data Wrangling
Visualization
and Analysis
Model Training
& Testing
Production
Data Pipelines Batch Scoring
Online Scoring
Serving
Data GovernanceGovernance
Processing
Acquisition
3© Cloudera, Inc. All rights reserved.
• Team: Data scientists and analysts
• Goal: Understand data, develop and improve models,
share insights
• Data: New and changing; often sampled
• Environment: Local machine, sandbox cluster
• Tools: R, Python, SAS/SPSS, SQL; notebooks; data
wrangling/discovery tools, …
• End State: Reports, dashboards, PDF, MS Office
• Team: Data engineers, developers, SREs
• Goal: Build and maintain applications, improve
model performance, manage models in production
• Data: Known data; full scale
• Environment: Production clusters
• Tools: Java/Scala, C++; IDEs; continuous
integration, source control, …
• End State: Online/production applications
Types of Data Science
Exploratory
(discover and quantify opportunities)
Operational
(deploy production systems)
4© Cloudera, Inc. All rights reserved.
Common Limitations
Access
Many times secured clusters are hard
for data science professionals to
connect either because they don’t
have the right permissions or
resources are to scarce to afford them
access. In addition popular
frameworks and libraries don’t read
Hadoop data formats out-of-the-box.
Scale
Notebook environments seldom
have large enough data storage for
medium, let alone big data. Data
scientists are often relegated to
sample data and constrained
when working on distributed
systems. Popular frameworks and
libraries don’t easily parallelize
across the cluster.
Developer Experience
Popular notebooks don’t work well
with access engines like Spark and
package deployment and
dependency management across
multiple software versions is often
hard to manage. Then once a model
is built there is no easy path from
model development to production
5© Cloudera, Inc. All rights reserved.
Management of Dependencies
6© Cloudera, Inc. All rights reserved.
Open Data Science in the Enterprise
IT
drive adoption while maintaining compliance
Data Scientist
explore, experiment, iterate
7© Cloudera, Inc. All rights reserved.
https://medium.com/@KevinSchmidtBiz/data-engineer-vs-data-scientist-vs-business-analyst-b68d201364bc
8© Cloudera, Inc. All rights reserved.
Introducing Cloudera Data Science Workbench
Self-service data science for the enterprise
Accelerates data science from
development to production with:
• Secure self-service environments
for data scientists to work against
Cloudera clusters
• Support for Python, R, and Scala,
plus project dependency isolation
for multiple library versions
• Workflow automation, version
control, collaboration and sharing
9© Cloudera, Inc. All rights reserved.
How does CDSW help?
Visualizeresults
ChangeandCompileSource
code
Retrainandredeploy
ExtensibleEngines
ConfigurableSessions
Trivialtotweakparameters
MultipleUsers
Roles/Governance
CDH
10© Cloudera, Inc. All rights reserved.
The Importance of an Open Ecosystem
Open Ecosystem Black Box
11© Cloudera, Inc. All rights reserved.
Demo
12© Cloudera, Inc. All rights reserved.
Key Benefits
How is Cloudera Data Science different?
Works with fully secured clusters
One tool for multiple standard languages (Python, R, Scala)
Multi-tenant Architecture
Common Platform
13© Cloudera, Inc. All rights reserved.
1
A conference for and by practicing data scientists!
Save the Date: July 20th at the Chapel, San Francisco
Wrangle is a 1 day, single track community event that hosts the best and
brightest in the Bay Area talking about the principles, practice, and
application of Data Science, across multiple data-rich industries. Join
Cloudera, Facebook, Netflix and more to discuss future trends, how they
can can be predicted, and most importantly—how can they be anticipated.
wrangleconf.com
#wrangleconf | Powered by Cloudera
14© Cloudera, Inc. All rights reserved.
Thank You
Amr Awadallah (@awadallah)

Más contenido relacionado

La actualidad más candente

13 2792 big-data_keynote_presentation_finalpass_05_d_v02
13 2792 big-data_keynote_presentation_finalpass_05_d_v0213 2792 big-data_keynote_presentation_finalpass_05_d_v02
13 2792 big-data_keynote_presentation_finalpass_05_d_v02
Erin Kerrigan
 

La actualidad más candente (20)

GITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP PresentationGITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP Presentation
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
 
Commercializing Alternative Data
Commercializing Alternative DataCommercializing Alternative Data
Commercializing Alternative Data
 
Next-Gen ML/AI Platform
Next-Gen ML/AI PlatformNext-Gen ML/AI Platform
Next-Gen ML/AI Platform
 
Big Data Roundtable. Why, how, where, which, and when to start doing Big Data
Big Data Roundtable. Why, how, where, which, and when to start doing Big DataBig Data Roundtable. Why, how, where, which, and when to start doing Big Data
Big Data Roundtable. Why, how, where, which, and when to start doing Big Data
 
Driving Digital Transformation through Service-Centric AIOps
Driving Digital Transformation through Service-Centric AIOpsDriving Digital Transformation through Service-Centric AIOps
Driving Digital Transformation through Service-Centric AIOps
 
AI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for SuccessAI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for Success
 
Big Data LDN 2017: Billions of Rows, the 5ws and H of Interpreting Fast and F...
Big Data LDN 2017: Billions of Rows, the 5ws and H of Interpreting Fast and F...Big Data LDN 2017: Billions of Rows, the 5ws and H of Interpreting Fast and F...
Big Data LDN 2017: Billions of Rows, the 5ws and H of Interpreting Fast and F...
 
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAPAlan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
 
About us tapsolutions
About us tapsolutionsAbout us tapsolutions
About us tapsolutions
 
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
The Hive Think Tank: Talk by Mohandas Pai - India at 2030, How Tech Entrepren...
 
13 2792 big-data_keynote_presentation_finalpass_05_d_v02
13 2792 big-data_keynote_presentation_finalpass_05_d_v0213 2792 big-data_keynote_presentation_finalpass_05_d_v02
13 2792 big-data_keynote_presentation_finalpass_05_d_v02
 
M2M Summit 2016 - IBM Presentation Cognitive Manufacturing
M2M Summit 2016 - IBM Presentation Cognitive ManufacturingM2M Summit 2016 - IBM Presentation Cognitive Manufacturing
M2M Summit 2016 - IBM Presentation Cognitive Manufacturing
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015
 
Empowering a Mobile Workforce: An Objective Comparison of Leading Mobile Devi...
Empowering a Mobile Workforce: An Objective Comparison of Leading Mobile Devi...Empowering a Mobile Workforce: An Objective Comparison of Leading Mobile Devi...
Empowering a Mobile Workforce: An Objective Comparison of Leading Mobile Devi...
 
Operationalizing Data Analytics
Operationalizing Data AnalyticsOperationalizing Data Analytics
Operationalizing Data Analytics
 
Big Data Analytics and Artifical Intelligence
Big Data Analytics and Artifical IntelligenceBig Data Analytics and Artifical Intelligence
Big Data Analytics and Artifical Intelligence
 
Using Machine Learning at Scale: A Gaming Industry Experience!
Using Machine Learning at Scale: A Gaming Industry Experience!Using Machine Learning at Scale: A Gaming Industry Experience!
Using Machine Learning at Scale: A Gaming Industry Experience!
 

Similar a Data Science in the Enterprise

Similar a Data Science in the Enterprise (20)

Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science Workbench
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science WorkbenchNOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Data Science in Enterprise
Data Science in EnterpriseData Science in Enterprise
Data Science in Enterprise
 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your Organization
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
 
Hadoop and Manufacturing
Hadoop and ManufacturingHadoop and Manufacturing
Hadoop and Manufacturing
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine Learning
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready state
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, Cloudera
 
Manoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning DevelopmentManoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning Development
 

Más de The Hive

The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive
 

Más de The Hive (20)

"Responsible AI", by Charlie Muirhead
"Responsible AI", by Charlie Muirhead"Responsible AI", by Charlie Muirhead
"Responsible AI", by Charlie Muirhead
 
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
Translating a Trillion Points of Data into Therapies, Diagnostics, and New In...
 
Digital Transformation; Digital Twins for Delivering Business Value in IIoT
Digital Transformation; Digital Twins for Delivering Business Value in IIoTDigital Transformation; Digital Twins for Delivering Business Value in IIoT
Digital Transformation; Digital Twins for Delivering Business Value in IIoT
 
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
Quantum Computing (IBM Q) - Hive Think Tank Event w/ Dr. Bob Sutor - 02.22.18
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
 
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
“ High Precision Analytics for Healthcare: Promises and Challenges” by Sriram...
 
Social Impact & Ethics of AI by Steve Omohundro
Social Impact & Ethics of AI by Steve OmohundroSocial Impact & Ethics of AI by Steve Omohundro
Social Impact & Ethics of AI by Steve Omohundro
 
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
The Hive Think Tank: Machine Learning Applications in Genomics by Prof. Jian ...
 
The Hive Think Tank: The Future Of Customer Support - AI Driven Automation
The Hive Think Tank: The Future Of Customer Support - AI Driven AutomationThe Hive Think Tank: The Future Of Customer Support - AI Driven Automation
The Hive Think Tank: The Future Of Customer Support - AI Driven Automation
 
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital ChangeThe Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
The Hive Think Tank: The Content Trap - Strategist's Guide to Digital Change
 
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra MalikDeep Visual Understanding from Deep Learning by Prof. Jitendra Malik
Deep Visual Understanding from Deep Learning by Prof. Jitendra Malik
 
The Hive Think Tank: Heron at Twitter
The Hive Think Tank: Heron at TwitterThe Hive Think Tank: Heron at Twitter
The Hive Think Tank: Heron at Twitter
 
The Hive Think Tank: Unpacking AI for Healthcare
The Hive Think Tank: Unpacking AI for Healthcare The Hive Think Tank: Unpacking AI for Healthcare
The Hive Think Tank: Unpacking AI for Healthcare
 
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
The Hive Think Tank: Translating IoT into Innovation at Every Level by Prith ...
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
 
The Hive Think Tank - Design Thinking by Bernie Roth, Professor at Stanford U...
The Hive Think Tank - Design Thinking by Bernie Roth, Professor at Stanford U...The Hive Think Tank - Design Thinking by Bernie Roth, Professor at Stanford U...
The Hive Think Tank - Design Thinking by Bernie Roth, Professor at Stanford U...
 
The Hive Think Tank: Machine Learning at Pinterest by Jure Leskovec
The Hive Think Tank: Machine Learning at Pinterest by Jure LeskovecThe Hive Think Tank: Machine Learning at Pinterest by Jure Leskovec
The Hive Think Tank: Machine Learning at Pinterest by Jure Leskovec
 
The Hive Think Tank: Sidechains by Adam Back, President of Blockstream
The Hive Think Tank: Sidechains by Adam Back, President of BlockstreamThe Hive Think Tank: Sidechains by Adam Back, President of Blockstream
The Hive Think Tank: Sidechains by Adam Back, President of Blockstream
 
The Hive Think Tank: Ceph + RocksDB by Sage Weil, Red Hat.
The Hive Think Tank: Ceph + RocksDB by Sage Weil, Red Hat.The Hive Think Tank: Ceph + RocksDB by Sage Weil, Red Hat.
The Hive Think Tank: Ceph + RocksDB by Sage Weil, Red Hat.
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDB
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Data Science in the Enterprise

  • 1. 1© Cloudera, Inc. All rights reserved. Data Science in the Enterprise Amr Awadallah (@awadallah) Founder, Chief Technical Officer, Cloudera
  • 2. 2© Cloudera, Inc. All rights reserved. Typical Data Science Workflow Data Engineering Data Science (Exploratory) Production (Operational) Data Wrangling Visualization and Analysis Model Training & Testing Production Data Pipelines Batch Scoring Online Scoring Serving Data GovernanceGovernance Processing Acquisition
  • 3. 3© Cloudera, Inc. All rights reserved. • Team: Data scientists and analysts • Goal: Understand data, develop and improve models, share insights • Data: New and changing; often sampled • Environment: Local machine, sandbox cluster • Tools: R, Python, SAS/SPSS, SQL; notebooks; data wrangling/discovery tools, … • End State: Reports, dashboards, PDF, MS Office • Team: Data engineers, developers, SREs • Goal: Build and maintain applications, improve model performance, manage models in production • Data: Known data; full scale • Environment: Production clusters • Tools: Java/Scala, C++; IDEs; continuous integration, source control, … • End State: Online/production applications Types of Data Science Exploratory (discover and quantify opportunities) Operational (deploy production systems)
  • 4. 4© Cloudera, Inc. All rights reserved. Common Limitations Access Many times secured clusters are hard for data science professionals to connect either because they don’t have the right permissions or resources are to scarce to afford them access. In addition popular frameworks and libraries don’t read Hadoop data formats out-of-the-box. Scale Notebook environments seldom have large enough data storage for medium, let alone big data. Data scientists are often relegated to sample data and constrained when working on distributed systems. Popular frameworks and libraries don’t easily parallelize across the cluster. Developer Experience Popular notebooks don’t work well with access engines like Spark and package deployment and dependency management across multiple software versions is often hard to manage. Then once a model is built there is no easy path from model development to production
  • 5. 5© Cloudera, Inc. All rights reserved. Management of Dependencies
  • 6. 6© Cloudera, Inc. All rights reserved. Open Data Science in the Enterprise IT drive adoption while maintaining compliance Data Scientist explore, experiment, iterate
  • 7. 7© Cloudera, Inc. All rights reserved. https://medium.com/@KevinSchmidtBiz/data-engineer-vs-data-scientist-vs-business-analyst-b68d201364bc
  • 8. 8© Cloudera, Inc. All rights reserved. Introducing Cloudera Data Science Workbench Self-service data science for the enterprise Accelerates data science from development to production with: • Secure self-service environments for data scientists to work against Cloudera clusters • Support for Python, R, and Scala, plus project dependency isolation for multiple library versions • Workflow automation, version control, collaboration and sharing
  • 9. 9© Cloudera, Inc. All rights reserved. How does CDSW help? Visualizeresults ChangeandCompileSource code Retrainandredeploy ExtensibleEngines ConfigurableSessions Trivialtotweakparameters MultipleUsers Roles/Governance CDH
  • 10. 10© Cloudera, Inc. All rights reserved. The Importance of an Open Ecosystem Open Ecosystem Black Box
  • 11. 11© Cloudera, Inc. All rights reserved. Demo
  • 12. 12© Cloudera, Inc. All rights reserved. Key Benefits How is Cloudera Data Science different? Works with fully secured clusters One tool for multiple standard languages (Python, R, Scala) Multi-tenant Architecture Common Platform
  • 13. 13© Cloudera, Inc. All rights reserved. 1 A conference for and by practicing data scientists! Save the Date: July 20th at the Chapel, San Francisco Wrangle is a 1 day, single track community event that hosts the best and brightest in the Bay Area talking about the principles, practice, and application of Data Science, across multiple data-rich industries. Join Cloudera, Facebook, Netflix and more to discuss future trends, how they can can be predicted, and most importantly—how can they be anticipated. wrangleconf.com #wrangleconf | Powered by Cloudera
  • 14. 14© Cloudera, Inc. All rights reserved. Thank You Amr Awadallah (@awadallah)