Delivering Agile Data Science on OpenShift
How to Create Instant Business Value
Audrey Reznik, Data Scientist
John Archer, Principal Energy Solution Architect
May 9th, 2019
Red Hat Summit May 2019 – Delivering agile data science solutions with OpenShift
MEET THE SPEAKERS
John Archer
Principal Energy Solution Architect
Red Hat since 2015
BEA Systems, BSI Consulting, DocuQuest, Andrews & Kurth, SilverStream, Petris and Oracle
Upstream Data Management, DoD, APIs, eCommerce, IoT, data science and blockchain
SPE, SEG, PPDM, HJUG, HDUG, HAL-PC, Energistics
Audrey Reznik
Data Scientist
Upstream Research Center
ExxonMobil since 2007
Chevron, Akamai, Entriq, Digital Medical Registrar, Spider Technologies, Ziff Davis
DATA SCIENCE TEAM PRESSURES
EXPLOSIVE GROWTH in data analytics teams and analytic tools
MULTIPLE TEAMS COMPETING for use of the same storage and computing resources
CONGESTION in busy analytic clusters, causing frustration and missed SLAs
EMERGING DATAOPS: agility and enablement gaps between data scientist developers and full-stack developers
What can you envision and share?
NEED: SHARE CODE (PRODUCT) WITH USERS
Jupyter Notebooks: a technology we could use to combine Python code, a GUI and documentation for sharing with customers.
The start of an Interactive Data Science environment.
Red Hat OpenShift PoC at ExxonMobil: could this new technology help us create a Reproducible & Interactive Data Science environment?
Prize: this would enable the team not only to obtain customer feedback quickly, but also to adopt Agile methodology easily, and therefore to deliver MVPs quickly.
Drawback: how does one avoid setup/configuration issues and reliably deploy the notebook?
PC Setup
• pip install required Anaconda libraries
• Jupyter Notebook, Python 3.x (load onto PC, or set up a server)
• Local admin access
• Access to latest source code
• OS?
• SQL Server
LOCAL PC VS OPENSHIFT PROJECT CONTAINERS
OpenShift: Container v2.0
• Jupyter Notebook, Python 3.x (image)
• Libraries: NumPy, Pandas, Matplotlib, IPyWidgets, SciPy, Lmfit, Seaborn, Plotly
• SQLite
• GIT, Image project and Code project, with a URL to the PoC code
Local PC Setup
• pip install required Anaconda libraries
• Jupyter Notebook, Python 3.x (load onto PC, or set up a server)
• Local admin access
• Access to latest source code
• OS?
• SQL Server
Reproducible Data Science environment that users interact with via Chrome.
Hardware Freedom & easier Reproduction!
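With a shared container image, the library list above is fixed once for everyone; on a local PC each machine has to be checked separately. A minimal Python sketch of such a check, assuming the standard import names for the listed libraries (note that Seaborn imports as `seaborn`); the helper `missing_packages` is hypothetical and not part of the original deck.

```python
import importlib.util

# Import names for the libraries listed for the v2.0 container image.
# Assumption: these are the usual top-level import names for each package.
REQUIRED = [
    "numpy", "pandas", "matplotlib", "ipywidgets", "scipy",
    "lmfit", "seaborn", "plotly", "sqlite3",
]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None (without importing the module) when a
    # top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Environment is missing:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Run inside the container image, the check should pass every time; run on a local PC, it shows exactly which setup steps are still outstanding.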
REPRODUCIBLE & INTERACTIVE SCIENTIFIC ENVIRONMENT
For a Data Scientist, the ability to rapidly deploy code and quickly obtain feedback from users is extremely valuable, and Agile! OpenShift facilitates these capabilities.
Agile:
1. Understand the Problem
2. Suggest Solutions; Deliver PoC
3. Refine the Problem
How to Deploy? GIT → Image project + Code project (OpenShift) → URL to PoC Code
Nexus: Image, Python (PyPI), Security
“Interactive” feedback! As a user, I want to provide frequent feedback!
DEPLOY SOURCE CODE WITH SOURCE TO IMAGE (S2I)
• Reusable Data Science applications: data location
• Reusable Data Science images: can they be re-consumed or modified for particular use cases?
• E.g. we have a base Python image that has been modified to provide TensorFlow and scikit-learn for Data Science projects.
• Reusable data access containers: SQL Server, Oracle, PI, SAP HANA.
S2I workflow: Developer code → Git Repository → BUILD APP (OpenShift, Source-to-Image (S2I) Builder Image) → BUILD IMAGE (OpenShift) → Image Registry → DEPLOY (OpenShift) → Application Container
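In OpenShift, the S2I flow is typically described by a `BuildConfig` that uses the Source strategy: it names the Git source, the builder image, and the image stream tag to push the result to. A minimal sketch that renders one as a Python dict (JSON is accepted by `oc create -f`), assuming the base Data Science image is available as an image stream tag; the names `ds-notebook-app`, `python-ds-base:3.6` and the Git URL are hypothetical placeholders, not values from the deck.

```python
import json


def s2i_build_config(app_name, git_uri, builder_tag, git_ref="master"):
    """Return a minimal OpenShift BuildConfig (Source/S2I strategy) as a dict."""
    return {
        "apiVersion": "build.openshift.io/v1",
        "kind": "BuildConfig",
        "metadata": {"name": app_name},
        "spec": {
            # Developer code: the Git repository S2I pulls the source from.
            "source": {"type": "Git", "git": {"uri": git_uri, "ref": git_ref}},
            # BUILD APP: the S2I builder image assembles source + runtime.
            "strategy": {
                "type": "Source",
                "sourceStrategy": {
                    "from": {"kind": "ImageStreamTag", "name": builder_tag}
                },
            },
            # BUILD IMAGE: the resulting application image goes to the registry.
            "output": {"to": {"kind": "ImageStreamTag", "name": f"{app_name}:latest"}},
            # Rebuild when the builder image or this BuildConfig changes.
            "triggers": [
                {"type": "ImageChange", "imageChange": {}},
                {"type": "ConfigChange"},
            ],
        },
    }


if __name__ == "__main__":
    bc = s2i_build_config(
        "ds-notebook-app",                                   # hypothetical name
        "https://git.example.com/team/ds-notebook-app.git",  # hypothetical repo
        "python-ds-base:3.6",                                # hypothetical builder tag
    )
    print(json.dumps(bc, indent=2))
```

Keeping the builder image separate from the application source is what makes the "reusable Data Science images" point work: several teams can point their BuildConfigs at the same base image.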
MATURING THE CI/CD PIPELINE
We are seeing an emerging notion of Data ScienceOps workflows. A production CI/CD pipeline on OpenShift (OS) is currently in progress.
Challenges we are experiencing include:
1. On-prem databases in different countries
2. Development/deployment in Jupyter notebooks
Pipeline: GIT → Jenkins (build, test, package) → Jenkins archives artifacts in Nexus → OS build image, deploy to TEST → OS build image, deploy to PROD
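The key property of the pipeline above is gating: nothing reaches PROD unless every earlier stage, including the TEST deployment, succeeded. A minimal Python sketch of that control flow only, assuming each stage is a callable that returns `True` on success; the stage names are hypothetical stand-ins for the Jenkins, Nexus and OpenShift steps, not real job names.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], str]:
    """Run stages in order and stop at the first failure.

    Returns (names of completed stages, name of the failed stage or "").
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            # Later stages (e.g. deploy-prod) never run after a failure.
            return completed, name
        completed.append(name)
    return completed, ""


if __name__ == "__main__":
    def ok() -> bool:
        return True

    # Hypothetical stage names mirroring the slide's flow.
    stages = [
        ("jenkins-build", ok),
        ("test", ok),
        ("package", ok),
        ("archive-to-nexus", ok),
        ("os-build-image-test", ok),
        ("deploy-test", ok),
        ("os-build-image-prod", ok),
        ("deploy-prod", ok),
    ]
    done, failed = run_pipeline(stages)
    print("completed:", done, "failed:", failed or "none")
```

In Jenkins this ordering is normally expressed as pipeline stages; the sketch only illustrates why a failing test stops the PROD deployment.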
MACHINE LEARNING ON OPENSHIFT
Figure 1. Liquid estimates. Marco De Mattia
Unique performance computing requirements for Artificial Intelligence, Machine Learning, Neural Networks and GPUs
Multiple Data Science images:
• TensorFlow
• PyTorch
• Scikit-learn
Testing a GPU (NVIDIA V100) cluster on OCP as an additional service to internal HPC.
Next steps: examine RAPIDS.AI to execute end-to-end data science pipelines on the GPU…
OPENSHIFT GPU PROOF OF CONCEPT (POC)
GPU PoC: read and analyze petrophysical data, and use ML algorithms to generate analyses/models on the GPU cluster. Vetted models can be pushed to Azure for deployment.
[Figure 2 components: Data Scientist, ML Algorithms (GIT Repo), Containers, GPUDB, on-prem Database(s), L4 Network, User, URL to ML App]
Figure 2. GPU POC workflow, Audrey Reznik
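On an OpenShift/Kubernetes GPU cluster, a containerized ML job usually gets a GPU by requesting the `nvidia.com/gpu` extended resource. A minimal sketch that builds such a Pod manifest as a Python dict, assuming the NVIDIA device plugin is installed on the GPU nodes so that this resource is advertised; the pod name, image and `train.py` entry point are hypothetical.

```python
import json


def gpu_training_pod(name, image, gpus=1, command=None):
    """Return a minimal Kubernetes Pod manifest (as a dict) that requests NVIDIA GPUs."""
    if not isinstance(gpus, int) or gpus < 1:
        raise ValueError("gpus must be a positive integer")
    container = {
        "name": name,
        "image": image,
        # nvidia.com/gpu is the extended resource advertised by the NVIDIA
        # device plugin; GPUs are requested as whole units under limits.
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
    }
    if command:
        container["command"] = list(command)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        # Run the training job once; do not restart it after it completes.
        "spec": {"restartPolicy": "Never", "containers": [container]},
    }


if __name__ == "__main__":
    pod = gpu_training_pod(
        "petrophys-train",                                # hypothetical pod name
        "registry.example.com/ds/tensorflow-gpu:latest",  # hypothetical image
        gpus=1,
        command=["python", "train.py"],                   # hypothetical entry point
    )
    print(json.dumps(pod, indent=2))
```

Because the scheduler only places the pod on a node with a free GPU, several teams can share the same GPU nodes without manual allocation.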
READY FOR ANY CLOUD – PRIVATE AND PUBLIC
DATA GRAVITY DRIVES THE LOCATION
• OpenShift on premises and in the public cloud (Azure) for Container as a Service (CaaS)
1. CaaS security enabled through AD groups created on premises and through DevOps practices
2. Self-service access to Data Science packages, with network, routing and DNS services
3. Storage can be self-service with PVCs, or extended with Ceph and OCP storage options
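Self-service storage usually means a data scientist creates a PersistentVolumeClaim in their own project and lets the cluster provision the volume. A minimal sketch that builds a PVC manifest as a Python dict; the claim name `notebook-data` and the storage class `ceph-rbd` are hypothetical, and when no class is given the claim typically falls back to the cluster's default StorageClass.

```python
import json


def pvc_manifest(name, size_gi, storage_class=None, access_mode="ReadWriteOnce"):
    """Return a minimal PersistentVolumeClaim manifest (as a dict)."""
    if size_gi <= 0:
        raise ValueError("size_gi must be positive")
    spec = {
        "accessModes": [access_mode],
        "resources": {"requests": {"storage": f"{size_gi}Gi"}},
    }
    # Without storageClassName, the cluster's default StorageClass (if any) applies.
    if storage_class:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # "notebook-data" and "ceph-rbd" are hypothetical names.
    print(json.dumps(pvc_manifest("notebook-data", 10, storage_class="ceph-rbd"), indent=2))
```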
Where does your application live? How do you access it? Is my application secure?
Resulting in enabled Data Science teams that:
• Perform more experiments
• Spend less time on plumbing
• Focus on delivering value to ExxonMobil
EXXONMOBIL DATA SCIENCE OS TIMELINE
• Dec 2017: Started with Data Virtualization for Calgary Optimization
• Feb 2018: Containerized JBoss Data Virtualization on OpenShift on premises
• Mar 2018: Spoke with Data Science teams: Python, MATLAB, Julia and R users
• Apr 2018: Introduced Graham Dumpleton’s JupyterHub container image
• Jul 2018: Built “Base” Data Science image (Python 3.x, AI libraries)
• Nov 2018: Built test OCP 3.10 cluster for NVIDIA V100 testing with TensorFlow and Keras
• Dec 2018: Delivered Data Science workshop on OpenShift to eight different data science teams
• Feb 2019: Data Science developers deliver faster and collaborate globally within 2 months
• Mar 2019: Successfully deployed ODH (Open Data Hub) supporting multiple notebook kernels and GPU
MOVING FORWARD: EXXONMOBIL DATA SCIENCE CAPABILITY TODAY
As a Data Scientist, all I care about is that, using OpenShift, I can now deploy a common Jupyter Notebook / Anaconda image (with all required libraries) in a matter of seconds.
That frees me (and other Data Scientists) to perform data science rather than worry about architecture and delivery mechanisms. Now that is Democratizing Data Science!
Selected OpenShift on premises and in the public cloud for Container as a Service (CaaS)
• OpenShift supports:
• One-click notebooks and JupyterHub/JupyterLab templates
• Self-service access to data & data science packages
• Nexus Repository for the Python, Java, R, PHP and .NET package managers
• A built-in security process for the Docker public repository, protecting against rooted containers and new CVE attacks
• NVIDIA GPU support, allowing these resources to be shared across multiple teams
Jupyter Notebook & select conda libraries image being used for Kearl Mining Optimization Studies
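Routing Python package installs through Nexus usually comes down to pointing pip's `index-url` at a Nexus PyPI repository, which Nexus Repository 3 serves under `/repository/<name>/simple`. A minimal sketch that renders such a `pip.conf`, assuming a hypothetical Nexus host `https://nexus.example.com` and a hypothetical proxy repository named `pypi-proxy`; the file could, for example, be baked into the base notebook image.

```python
import configparser
import io


def render_pip_conf(nexus_url: str, repo_name: str) -> str:
    """Render pip.conf text that sends package installs through a Nexus PyPI repository."""
    index_url = f"{nexus_url.rstrip('/')}/repository/{repo_name}/simple"
    cfg = configparser.ConfigParser()
    # pip reads index-url from the [global] section of pip.conf.
    cfg["global"] = {"index-url": index_url}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical Nexus host and repository name.
    print(render_pip_conf("https://nexus.example.com", "pypi-proxy"))
```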
DATA SCIENTIST DEVELOPERS' NEEDS
All Developers need
● Choice of architectures
● Choice of programming languages
● Choice of databases and persistence
● Choice of application services
● Choice of development tools
● Choice of build and deploy workflows
Data Science Additional Needs
● Access to GPUs and varied storage
● Access to Curated Data
● Automated ScienceOps pipelines
● Collaboration with the Business
● Access to specific data science languages and toolsets
They don’t want to have to worry about the infrastructure.
YOUR DIFFERENTIATION DEPENDS ON YOUR ABILITY TO DELIVER INTELLIGENT APPS FASTER
CONTAINERS, KUBERNETES, DEVOPS & DATAOPS ARE KEY INGREDIENTS
Innovation Culture, Cloud-native Applications, AI & Machine Learning, Internet of Things, Virtual GPU
OPENDATAHUB.IO ARCHITECTURE
• Deployment targets: Laptop, Datacenter, OpenStack, AWS, Microsoft Azure, Google Cloud
• CONTAINER HOST (RHEL/RHCOS)
• CONTAINER STORAGE (CEPH): S3 API Object Store, BLOCK, FILE
• Accelerators: GPU, FPGA
• CONTAINER ORCHESTRATION AND MANAGEMENT (OPENSHIFT)
• APPLICATION LIFE CYCLE MANAGEMENT (OPENSHIFT)
• DEVOPS WORKFLOW (CODE & DATA)
• API GATEWAY (3SCALE), SERVICE MESH (ISTIO), SERVERLESS
• PRIVATE MICRO SERVICES (CONTAINERIZED CUSTOM APPS)
• CONTAINER APPS
• PRE-DEFINED AI LIBRARY (BOTS | ANOMALY | CLASSIFICATION | SENTIMENT | …)
• AI TOOLCHAIN & WORKFLOW (JUPYTER, SUPERSET, …)
• Runtimes and services: PYTHON / FLASK, JAVA, JAVASCRIPT, ...; STREAMING (KAFKA - Strimzi); MSG BUS (AMQ); ANALYTICS (SPARK); ML (TENSORFLOW | …); MEMORY CACHE (JDG); DECISION (BxMS); HDFS | REDIS | SQL | NoSQL | GRAPHDB | TIMESERIES | ELASTIC | ...
• COMMON SERVICES: SERVICE CATALOG & SELF-SERVICE UI/CLI; IDENTITY/POLICY (ACCESS, PLACEMENT)/LINEAGE (CODE AND DATA); MANAGEMENT CONSOLE/INSIGHTS/AIOPS (PROMETHEUS | ELASTIC | …); FEDERATION
LEGEND: RH Core Platform; OpenShift ALM; Red Hat Middleware; Community & ISV Ecosystem; Technology Roadmap; Customer Content
MODERN DATA ANALYTICS PIPELINE
• Data generation: IoT Telemetry, G&G - Well Logs, Transactions, Production
• Ingest: NiFi, Kafka, MQTT
• Data science: Presto, Impala, SparkSQL, Notebooks
• Machine learning: TensorFlow, PyTorch, Keras, scikit-learn, AutoML*
• Stream processing: Kafka, MQTT, WebSockets
• Transform, merge, join: Hadoop, Spark, Pandas, Apache Arrow
• Data analytics: Spark, Hadoop
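The transform/merge/join stage is where streamed readings meet reference data such as well metadata. The core operation behind tools like Spark joins or Pandas `merge` plus `groupby` is an inner join followed by an aggregate; a toy plain-Python sketch of that idea follows. The field names `well_id`, `value` and `field`, and the sample values in the usage example, are hypothetical and not data from the deck.

```python
from collections import defaultdict
from statistics import mean


def join_and_aggregate(telemetry, wells):
    """Inner hash join of telemetry readings to well metadata, then mean value per well.

    telemetry: iterable of dicts with keys "well_id" and "value"
    wells: iterable of dicts with keys "well_id" and "field"
    Returns {well_id: {"field": ..., "mean_value": ...}} for wells present in both inputs.
    """
    # Build side: index the metadata table by the join key.
    by_id = {w["well_id"]: w for w in wells}
    # Probe side: keep only readings whose well exists, grouped by well.
    readings = defaultdict(list)
    for row in telemetry:
        if row["well_id"] in by_id:
            readings[row["well_id"]].append(row["value"])
    # Aggregate: one summary record per joined well.
    return {
        wid: {"field": by_id[wid]["field"], "mean_value": mean(vals)}
        for wid, vals in readings.items()
    }


if __name__ == "__main__":
    # Hypothetical toy inputs.
    telemetry = [
        {"well_id": "W1", "value": 10.0},
        {"well_id": "W1", "value": 14.0},
        {"well_id": "W2", "value": 7.0},
        {"well_id": "W9", "value": 1.0},  # no metadata: dropped by the inner join
    ]
    wells = [{"well_id": "W1", "field": "A"}, {"well_id": "W2", "field": "B"}]
    print(join_and_aggregate(telemetry, wells))
```

For the toy inputs this yields a mean of 12.0 for `W1` and 7.0 for `W2`; distributed engines apply the same build/probe idea across partitions.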
CONNECTING THE EDGE TO DATA SCIENTISTS
Architecture Tenets
• Highly scalable, flexible, elastic, microservice-based architecture
• Fully portable: on premises to any public cloud vendor
• Leverages the power and agility of open source software without lock-in
• Multitenant: CPU- and GPU-powered workloads
Users: Data Scientists, Data Managers, Citizen Data Scientists
Cognitive AI: Vision, Speech, Face, Audio, Video, Text
Data: Models, Curation, Prep, Quality, Publishing, Security
AI/ML/Data Science Pods
• Packages: Python, R, Jupyter.org, TensorFlow, Keras, Pandas, Bokeh, Dash, Prometheus, Grafana, SciPy, NumPy, SymPy, Julia, Spark, PySpark, Theano, Scikit, FaceDetect
AppDev & App Services and Persistence Pods
• App Dev: Node.js, .NET Core, Java, Python, PHP, Ruby, Rails, JavaScript, Perl
• App Services: Integration, BPM, Rules, Messaging, API, IoT, Microservices, Istio
• Persistence: MongoDB, MariaDB, MySQL, Postgres, Couchbase, Redis, MS-SQL, Oracle
SSO and Authentication: OIDC, SAML, OAuth, JWT, Kerberos
DevOps
Data access: REST, ODBC, JDBC, WS
IoT “Things”: REST, MQTT, WSS, Kafka
Deployment: On Premise, Public Cloud
Use Cases: Predictive Maintenance, Autonomous Operations, Supply Chain Improvements, Downstream Reliability
OSS DATA SCIENCE PROJECTS
● JupyterHub on OpenShift
○ Jupyter Notebook, JupyterHub, JupyterLab, OpenShift templates
● Kubeflow
○ Kubernetes project for TensorFlow, Seldon, JupyterHub/Lab, PyTorch, MPI Operator
● Opendatahub.io
○ Ceph, Spark, JupyterHub/Lab, TensorFlow
○ Simplified support for multiple kernels
○ GPU support
○ Resource management and instance culling
● radanalytics.io
○ OpenShift Spark
○ Oshinko - Apache Spark Cluster
○ Spark Operator
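Instance culling frees CPU, memory and GPUs held by notebook servers nobody is using. A simplified sketch of the decision rule only: stop any server whose last activity is older than a timeout. It assumes the last-activity timestamps are already available and is not the code of the JupyterHub idle-culler service; `servers_to_cull` and the user names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def servers_to_cull(last_activity, now, idle_timeout):
    """Return the names of servers idle for longer than idle_timeout, sorted.

    last_activity: mapping of server name -> timezone-aware datetime of last activity
    now: timezone-aware datetime used as the reference time
    idle_timeout: timedelta after which an idle server should be stopped
    """
    cutoff = now - idle_timeout
    return sorted(name for name, seen in last_activity.items() if seen < cutoff)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    activity = {
        "alice": now - timedelta(hours=2),     # hypothetical idle user
        "bob": now - timedelta(minutes=10),    # hypothetical active user
    }
    print(servers_to_cull(activity, now, timedelta(hours=1)))
```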
HOW CAN I GET STARTED?
● Join OpenShift Commons - ML SIG - https://commons.openshift.org/
● OpenShift Self Service Education - https://learn.openshift.com
● Install Minishift - https://docs.okd.io/latest/minishift/getting-started/installing.html
○ macOS - brew cask install minishift
○ Manual - https://github.com/minishift/minishift/releases
● Install the Jupyter and JupyterHub OpenShift templates
○ https://github.com/jupyter-on-openshift/jupyterhub-quickstart
● Review the OpenDataHub.io project
Delivering Agile Data Science solutions with OpenShift … and providing Business Value!
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 

Delivering Agile Data Science on Openshift - Red Hat Summit 2019

  • 1. Delivering Agile Data Science on Openshift: How to create Instant Business Value
    Audrey Reznik, Data Scientist
    John Archer, Principal Energy Solution Architect
    May 9th, 2019
  • 2. Red Hat Summit May 2019 – Delivering agile data science solutions with OpenShift
    MEET THE SPEAKERS
    John Archer, Principal Solution Energy Architect, Red Hat since 2015
      BEA Systems, BSI Consulting, DocuQuest, Andrews & Kurth, SilverStream, Petris and Oracle
      Upstream data management, DoD, APIs, eCommerce, IoT, data science and blockchain
      SPE, SEG, PPDM, HJUG, HDUG, HAL-PC, Energistics
    Audrey Reznik, Data Scientist, Upstream Research Center, ExxonMobil since 2007
      Chevron, Akamai, Entriq, Digital Medical Registrar, Spider Technologies, Ziff Davis
  • 3. DATA SCIENCE TEAM PRESSURES
    ● EXPLOSIVE GROWTH in data analytics teams and analytic tools
    ● MULTIPLE TEAMS COMPETING for use of the same storage and computing resources
    ● CONGESTION in busy analytic clusters, causing frustration and missed SLAs
    ● EMERGING DATAOPS: agility and enablement gaps between data scientist developers and full-stack developers
  • 4. What can you envision and share?
  • 5. NEED: SHARE CODE (PRODUCT) WITH USERS
    Jupyter Notebooks were a technology we could use to combine Python code, a GUI and documentation for sharing with customers: the start of an interactive data science environment.
    Red Hat OpenShift PoC at ExxonMobil: could this new technology help us create a reproducible and interactive data science environment?
    Prize: the team could not only obtain customer feedback quickly, but also readily adopt Agile methodology and therefore deliver MVPs quickly.
    Drawback: how does one avoid setup/configuration issues and reliably deploy the notebook?
    PC setup: pip install required Anaconda libraries; Jupyter Notebook, Python 3.x (load onto PC, or set up a server); local admin access; access to latest source code; OS?; SQL Server
  • 6. LOCAL PC VS OPENSHIFT PROJECT CONTAINERS
    OpenShift: Jupyter Notebook, Python 3.x (image) with libraries NumPy, Pandas, Matplotlib, IPyWidgets, SciPy, Lmfit, Seaborn, Plotly; SQLite container v2.0; Git; image project and code project; URL to PoC code
    Local PC setup: pip install required Anaconda libraries; Jupyter Notebook, Python 3.x (load onto PC, or set up a server); local admin access; access to latest source code; OS?; SQL Server
    Result: a reproducible data science environment that users interact with via Chrome. Hardware freedom and easier reproduction!
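One way to confirm that a notebook image really carries the libraries listed above is a small smoke test run at image build time or in the first notebook cell. This is a minimal sketch, not part of the presenters' setup; the import-name to distribution-name mapping is an assumption to adjust for your image.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional, Tuple

# Import name -> PyPI distribution name (assumed; None marks a stdlib module).
REQUIRED: Dict[str, Optional[str]] = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "ipywidgets": "ipywidgets",
    "scipy": "scipy",
    "lmfit": "lmfit",
    "seaborn": "seaborn",
    "plotly": "plotly",
}


def check_environment(
    required: Dict[str, Optional[str]] = REQUIRED,
) -> Tuple[Dict[str, str], List[str]]:
    """Return (found, missing); found maps module name -> version string."""
    found: Dict[str, str] = {}
    missing: List[str] = []
    for module_name, dist_name in required.items():
        try:
            import_module(module_name)
        except ImportError:
            missing.append(module_name)
            continue
        if dist_name is None:  # standard-library module, no distribution
            found[module_name] = "stdlib"
            continue
        try:
            found[module_name] = version(dist_name)
        except PackageNotFoundError:
            found[module_name] = "unknown"
    return found, missing
```

For a correctly built image, `check_environment()` should return an empty `missing` list, and the `found` versions give users a record of exactly which environment they ran against.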
  • 7. REPRODUCIBLE & INTERACTIVE SCIENTIFIC ENVIRONMENT
    For a data scientist, the ability to rapidly deploy code and quickly obtain feedback from users is extremely valuable, and Agile! OpenShift facilitates these capabilities.
    Agile loop: 1. Understand the problem; 2. Suggest solutions, deliver PoC; 3. Refine the problem
    How to deploy? Git code project; OpenShift image project; Nexus (image, Python/PyPI packages); security; URL to PoC code
    "As a user I want to provide frequent feedback!": "interactive" feedback
  • 8. DEPLOY SOURCE CODE WITH SOURCE-TO-IMAGE (S2I)
    • Reusable data science applications: data location
    • Reusable data science images: can they be re-consumed or modified for particular use cases?
    • E.g. we have a base Python image that has been extended with TensorFlow and scikit-learn for data science projects.
    • Reusable data access containers: SQL Server, Oracle, PI, SAP HANA.
    Flow: developer code in a Git repository → BUILD APP (OpenShift) with the Source-to-Image (S2I) builder image → BUILD IMAGE (OpenShift) → image registry → DEPLOY (OpenShift) → application container
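With S2I, the data scientist's repository only needs source code plus a dependency list; the builder image turns it into a runnable container. The sketch below is a hypothetical `wsgi.py` for such a repository, assuming the Python S2I builder serves a module-level WSGI callable named `application` through Gunicorn when `gunicorn` is listed in `requirements.txt` (check the README of the builder image version you use). The `predict` function is a placeholder, not a real model.

```python
"""wsgi.py: hypothetical minimal model service for an S2I Python build."""
import json


def predict(features):
    """Placeholder 'model': returns the mean of the numeric inputs."""
    values = [float(x) for x in features]
    return sum(values) / len(values) if values else None


def application(environ, start_response):
    """WSGI entry point: POST a JSON list of numbers, receive a JSON prediction."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST a JSON list of numbers\n"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    try:
        features = json.loads(body or b"[]")
        result, status = {"prediction": predict(features)}, "200 OK"
    except (ValueError, TypeError) as exc:  # bad JSON or non-numeric input
        result, status = {"error": str(exc)}, "400 Bad Request"
    payload = json.dumps(result).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

Keeping the entry point a plain WSGI callable means the same file can be tested locally without a server and then built unchanged by the builder image.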
  • 9. MATURING THE CI/CD PIPELINE
    We are seeing an emerging notion of Data ScienceOps workflows. A production CI/CD pipeline on OpenShift (OS) is in progress. Challenges we are experiencing include:
    1. On-premise databases in different countries
    2. Development/deployment in Jupyter notebooks
    Pipeline: Git → Jenkins build, test and package → Jenkins archives artifacts in Nexus → OS builds image and deploys to TEST → OS builds image and deploys to PROD
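The promotion flow above can be sketched as an ordered list of stages that stops at the first failure, so nothing reaches PROD without passing the earlier stages. This is a hypothetical model for illustration: the stage names follow the slide, while the callables stand in for real Jenkins, Nexus and `oc` steps.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: Sequence[Stage]) -> Tuple[List[str], str]:
    """Run stages in order; return (completed stage names, final status)."""
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return completed, f"failed at: {name}"
        completed.append(name)
    return completed, "deployed"


def make_stages(tests_pass: bool) -> List[Stage]:
    """Hypothetical stage list mirroring the slide; real steps would call out."""
    def ok() -> bool:
        return True

    return [
        ("checkout from Git", ok),
        ("Jenkins build", ok),
        ("test", lambda: tests_pass),
        ("package", ok),
        ("archive artifact in Nexus", ok),
        ("build image, deploy to TEST", ok),
        ("build image, deploy to PROD", ok),
    ]
```

`run_pipeline(make_stages(False))` stops at `"test"`, which is the property that matters most once notebooks move from development into production.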
  • 10. MACHINE LEARNING ON OPENSHIFT
    [Figure 1. Liquid estimates. Marco De Mattia]
    Unique performance computing requirements for artificial intelligence, machine learning, neural networks and GPUs.
    Multiple data science images: • TensorFlow • PyTorch • scikit-learn
    Testing a GPU (NVIDIA V100) cluster on OCP as an additional service to internal HPC.
    Next steps: examine RAPIDS.AI to execute end-to-end data science pipelines on the GPU…
  • 11. OPENSHIFT GPU PROOF OF CONCEPT (POC)
    GPU PoC: read and analyze petrophysical data, and use ML algorithms to generate analyses/models on the GPU cluster. Vetted models can be pushed to Azure for deployment.
    Workflow: data scientist; ML algorithms (Git repo); containers; GPUDB; on-premise database(s) on the L4 network; URL to the ML app for the user
    [Figure 2. GPU PoC workflow, Audrey Reznik]
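Analyzing petrophysical data usually starts with deriving features from raw logs before any model is trained. As a hypothetical example of such a step (not the PoC's actual pipeline), the sketch computes density porosity from a bulk-density log with the standard relation φ_D = (ρ_ma − ρ_b) / (ρ_ma − ρ_fl). The defaults assume a sandstone matrix (2.65 g/cc) and fresh-water fluid (1.0 g/cc).

```python
from typing import List, Optional, Sequence


def density_porosity(rho_b: Sequence[Optional[float]],
                     rho_ma: float = 2.65,
                     rho_fl: float = 1.0) -> List[Optional[float]]:
    """Density porosity per depth sample, clipped to [0, 1].

    rho_b:  bulk-density log in g/cc; None marks a missing sample.
    rho_ma: matrix density (assumed sandstone by default).
    rho_fl: pore-fluid density (assumed fresh water by default).
    """
    if rho_ma <= rho_fl:
        raise ValueError("matrix density must exceed fluid density")
    out: List[Optional[float]] = []
    for rb in rho_b:
        if rb is None:  # keep gaps so depths stay aligned with other logs
            out.append(None)
            continue
        phi = (rho_ma - rb) / (rho_ma - rho_fl)
        out.append(min(max(phi, 0.0), 1.0))  # clip physically impossible values
    return out
```

A bulk density of 2.32 g/cc gives a porosity of about 0.20 under these assumptions; on a GPU cluster the same arithmetic would typically run vectorized over whole logs.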
  • 12. READY FOR ANY CLOUD – PRIVATE AND PUBLIC: DATA GRAVITY DRIVES THE LOCATION
    • OpenShift on premise and in the public cloud (Azure) for Container as a Service (CaaS)
    1. CaaS security enabled through AD groups created on premise and DevOps practices
    2. Self-service access to data science packages with network, routing and DNS services
    3. Storage can be self-service with PVCs, or extended with Ceph and OCP storage options
    Where does your application live? How do you access it? Is my application secure?
    Resulting in enabled data science teams that: • perform more experiments • spend less time on plumbing • focus on delivering value to ExxonMobil
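Self-service storage with a PVC amounts to a data scientist submitting a small claim manifest. The helper below is a hypothetical sketch that builds one as a Python dict; Kubernetes and OpenShift accept JSON as well as YAML, so the printed output could be saved and applied with `oc apply -f`. The claim name, size and storage class are placeholders.

```python
import json
from typing import Optional


def pvc_manifest(name: str, size_gi: int, storage_class: Optional[str] = None,
                 access_mode: str = "ReadWriteOnce") -> dict:
    """Build a PersistentVolumeClaim manifest requesting size_gi GiB."""
    if size_gi <= 0:
        raise ValueError("size_gi must be positive")
    spec = {
        "accessModes": [access_mode],
        "resources": {"requests": {"storage": f"{size_gi}Gi"}},
    }
    if storage_class:  # e.g. a Ceph-backed class, if the cluster defines one
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    print(json.dumps(pvc_manifest("notebook-data", 10), indent=2))
```

Omitting `storageClassName` lets the cluster's default storage class satisfy the claim, which keeps the simple case fully self-service.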
  • 13. EXXONMOBIL DATA SCIENCE OS TIMELINE
    ● Dec 2017: Started with data virtualization for Calgary optimization
    ● Feb 2018: Containerized JBoss Data Virtualization on OpenShift on premise
    ● Mar 2018: Spoke with data science teams: Python, MATLAB, Julia and R users
    ● Apr 2018: Introduced Graham Dumpleton's JupyterHub container image
    ● Jul 2018: Built a "base" data science image: Python 3.x, AI libraries
    ● Nov 2018: Built a test OCP 3.10 cluster for NVIDIA V100 testing with TensorFlow and Keras
    ● Dec 2018: Delivered a data science workshop on OpenShift to eight different data science teams
    ● Feb 2019: Data science developers deliver faster and collaborate globally within 2 months
    ● Mar 2019: Successfully deployed ODH supporting multiple notebook kernels and GPUs
  • 14. MOVING FORWARD: EXXONMOBIL DATA SCIENCE CAPABILITY TODAY
    As a data scientist, all I care about is that with OpenShift I can now deploy a common Jupyter Notebook / Anaconda image (with all required libraries) in a matter of seconds, freeing myself (and other data scientists) to do data science rather than worry about architecture and delivery mechanisms. Now that is democratizing data science!
    Selected OpenShift on premises and in the public cloud for Container as a Service (CaaS). OpenShift supports:
    • One-click notebooks and JupyterHub/Lab templates
    • Self-service access to data and data science packages
    • A Nexus repository supporting Python, Java, R, PHP and .NET package managers
    • A built-in security process for Docker public repository images, protecting against rooted containers and new CVE attacks
    • NVIDIA GPU support, allowing these resources to be shared across multiple teams
    A Jupyter Notebook & select conda libraries image is being used for Kearl Mining Optimization Studies.
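Routing Python package installs through Nexus usually means pointing pip at the repository's PyPI proxy instead of the public index. The sketch below renders a `pip.conf` for that purpose; the host and repository name are placeholders, and it relies on the Nexus 3 convention of serving a PyPI proxy's simple index under `/repository/<repo-name>/simple`. Setting the `PIP_INDEX_URL` environment variable in the image is an equivalent alternative.

```python
import configparser
import io


def nexus_index_url(host: str, repo: str = "pypi-proxy") -> str:
    """Simple-index URL of a Nexus 3 PyPI proxy (host and repo are hypothetical)."""
    return f"https://{host}/repository/{repo}/simple"


def pip_conf(host: str, repo: str = "pypi-proxy") -> str:
    """Render pip.conf text; pip reads index-url from the [global] section."""
    cfg = configparser.ConfigParser()
    cfg["global"] = {"index-url": nexus_index_url(host, repo)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

Writing this file into the base data science image (for example as `/etc/pip.conf`) means every notebook derived from it resolves packages through the vetted internal repository.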
  • 15. DATA SCIENTIST DEVELOPERS' NEEDS
    All developers need:
    ● Choice of architectures
    ● Choice of programming languages
    ● Choice of databases and persistence
    ● Choice of application services
    ● Choice of development tools
    ● Choice of build and deploy workflows
    Data science additional needs:
    ● Access to GPUs and varied storage
    ● Access to curated data
    ● Automated ScienceOps pipelines
    ● Collaboration with the business
    ● Access to specific data science languages and toolsets
    They don't want to have to worry about the infrastructure.
  • 16. YOUR DIFFERENTIATION DEPENDS ON YOUR ABILITY TO DELIVER INTELLIGENT APPS FASTER
    CONTAINERS, KUBERNETES, DEVOPS & DATAOPS ARE KEY INGREDIENTS
    Innovation culture • Cloud-native applications • AI & machine learning • Internet of Things • Virtual GPU
  • 18. OPENDATAHUB.IO ARCHITECTURE
    Footprints: laptop, datacenter, OpenStack, AWS, Microsoft Azure, Google Cloud
    Container host (RHEL/RHCOS); container storage (Ceph): S3 API object store, block, file; GPU, FPGA
    Container orchestration and management (OpenShift); application life cycle management (OpenShift); DevOps workflow (code & data)
    Common services: service catalog & self-service UI/CLI; identity/policy (access, placement)/lineage (code and data); management console/insights/AIOps (Prometheus | Elastic | …); federation
    API gateway (3scale); service mesh (Istio); serverless
    Container apps: private microservices (containerized custom apps): Python/Flask, Java, JavaScript, ...; pre-defined AI library (bots | anomaly | classification | sentiment | …); AI toolchain & workflow (Jupyter, Superset, …)
    Streaming (Kafka - Strimzi); message bus (AMQ); analytics (Spark); ML (TensorFlow | …); memory cache (JDG) || decision (BxMS); HDFS | Redis | SQL | NoSQL | GraphDB | time series | Elastic | ...
    Legend: RH core platform; OpenShift ALM; Red Hat middleware; community & ISV ecosystem; technology roadmap; customer content
  • 19. MODERN DATA ANALYTICS PIPELINE
    ● Data generation: IoT telemetry, G&G well logs, transactions, production
    ● Ingest: NiFi, Kafka, MQTT
    ● Data science / machine learning: notebooks, TensorFlow, PyTorch, Keras, scikit-learn, AutoML*
    ● Stream processing: Kafka, MQTT, WebSockets
    ● Transform, merge, join: Hadoop, Spark, Pandas, Apache Arrow
    ● Data analytics: Presto, Impala, SparkSQL, Spark, Hadoop
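The "transform, merge, join" stage above is where streamed measurements meet reference data. The following dependency-free sketch illustrates that step as a hash join of production readings onto well metadata, followed by a per-field aggregate; in practice this would run in Spark or pandas, and the record layouts (`well_id`, `field`, `oil_bbl`) are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def join_and_aggregate(readings: Iterable[dict],
                       wells: Iterable[dict]) -> Tuple[Dict[str, float], List[dict]]:
    """Join readings to wells on well_id and total oil volume per field.

    Returns (totals per field, readings with no matching well).
    """
    wells_by_id = {w["well_id"]: w for w in wells}  # build side of the hash join
    totals: Dict[str, float] = defaultdict(float)
    unmatched: List[dict] = []
    for r in readings:                              # probe side
        well = wells_by_id.get(r["well_id"])
        if well is None:
            unmatched.append(r)                     # keep for data-quality review
            continue
        totals[well["field"]] += float(r["oil_bbl"])
    return dict(totals), unmatched
```

Returning unmatched readings instead of silently dropping them makes data-quality problems in the ingest stage visible to the data scientist.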
  • 20. CONNECTING THE EDGE TO DATA SCIENTISTS
    Architecture tenets: highly scalable, flexible, elastic, microservice-based architecture; fully portable, on premise to any public cloud vendor; leverages the power and agility of open source software without lock-in; multitenant, CPU- and GPU-powered workloads
    Users: data scientists, data managers, citizen data scientists
    Cognitive AI: vision, speech, face, audio, video, text
    Data: models, curation, prep, quality, publishing, security
    AI/ML/data science pods (packages): Python, R, Jupyter.org, TensorFlow, Keras, Pandas, Bokeh, Dash, Prometheus, Grafana, SciPy, NumPy, SymPy, Julia, Spark, PySpark, Theano, Scikit, FaceDetect
    AppDev & app services and persistence pods:
      Persistence: MongoDB, MariaDB, MySQL, Postgres, Couchbase, Redis, MS-SQL, Oracle
      SSO and authentication: OIDC, SAML, OAuth, JWT, Kerberos
      DevOps
      App dev: Node.js, .NET Core, Java, Python, PHP, Ruby, Rails, JavaScript, Perl
      App services: integration, BPM, rules, messaging, API, IoT, microservices, Istio
    Connectivity: REST, ODBC, JDBC, WS; IoT "things" via REST, MQTT, WSS, Kafka
    Deployment: on premise, public cloud
    Use cases: predictive maintenance, autonomous operations, supply chain improvements, downstream reliability
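When edge "things" publish over MQTT, data science services typically subscribe with topic filters rather than exact topic names. The function below is a sketch of the MQTT 3.1.1 matching rules ('+' matches exactly one level; a trailing '#' matches the parent level and any number of child levels; filters starting with a wildcard do not match '$'-prefixed system topics). The topic names in the usage are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a (valid) topic filter."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Filters beginning with a wildcard must not match system topics like $SYS/...
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # '#' is only valid as the last level; it also matches the parent level
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != t_levels[i]:
            return False  # literal level mismatch
    return len(f_levels) == len(t_levels)
```

For example, a pressure-monitoring model could subscribe to `site/+/pressure` and receive `site/well-17/pressure` without being sent `site/well-17/temp`.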
  • 21. OSS DATA SCIENCE PROJECTS
    ● JupyterHub on OpenShift
      ○ Jupyter Notebook, JupyterHub, JupyterLab, OpenShift templates
    ● Kubeflow
      ○ Kubernetes project for TensorFlow, Seldon, JupyterHub/Lab, PyTorch, MPI Operator
    ● Opendatahub.io
      ○ Ceph, Spark, JupyterHub/Lab, TensorFlow
      ○ Simplified multiple-kernel support
      ○ GPU support
      ○ Resource management and instance culling
    ● radanalytics.io
      ○ OpenShift Spark
      ○ Oshinko: Apache Spark cluster
      ○ Spark Operator
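Instance culling frees CPU and GPU resources held by notebook servers nobody is using. The sketch below shows only the core decision: select users whose last activity is older than a timeout. It is a hypothetical illustration; JupyterHub deployments typically delegate this to the separate jupyterhub-idle-culler service, which reads activity from the Hub's REST API. Skipping servers with no recorded activity is this sketch's own choice.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def servers_to_cull(last_activity: Dict[str, Optional[datetime]],
                    timeout: timedelta,
                    now: Optional[datetime] = None) -> List[str]:
    """Return users whose server has been idle longer than timeout, sorted."""
    if now is None:
        now = datetime.now(timezone.utc)
    idle = []
    for user, seen in last_activity.items():
        if seen is None:  # no recorded activity: leave it for an operator to decide
            continue
        if now - seen > timeout:
            idle.append(user)
    return sorted(idle)
```

A shorter timeout for GPU-backed notebook profiles than for CPU-only ones is one way to share scarce accelerators across teams.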
  • 22. HOW CAN I GET STARTED?
    ● Join the OpenShift Commons ML SIG: https://commons.openshift.org/
    ● OpenShift self-service education: https://learn.openshift.com
    ● Install Minishift: https://docs.okd.io/latest/minishift/getting-started/installing.html
      ○ macOS: brew cask install minishift
      ○ Manual: https://github.com/minishift/minishift/releases
    ● Install the Jupyter and JupyterHub OpenShift templates
      ○ https://github.com/jupyter-on-openshift/jupyterhub-quickstart
    ● Review the OpenDataHub.io project
  • 23. Delivering Agile Data Science solutions with OpenShift … and providing Business Value!