Driving Enterprise Adoption: Tragedies, Triumphs and our NEXT
Standard Bank South Africa
Big Data Journey: Lake
Timeline: 2016, 2017, 2018
• Setup POC Environment for Data Science Exploration
• Begin Data Lake Journey
• Get critical mass (Enterprise Data) ingested into Lake
• Create a multitenant Lake environment
• Start onboarding tenants (Data Science teams) onto Lake. Each team has a ‘sandbox’ environment to start prototyping models
• Setup Data Science Workbench on Lake
• Finalise Model Productionisation process
• Implement Real time Streaming capability
• Productionise first data science (Real time Machine Learning) model on Lake
• Establish Data Science Guild
• Establish Security Policies for Data access
• Integrate Lake metadata into Enterprise metadata repository
• Move to use case driven prioritization for DS model productionisation
Successes
• Security
• Getting critical mass of Enterprise Data into the Lake
• Establishment of our Data Science workbench
• Defining the model productionisation standards
Challenges
• Security
• Integrating with other systems (Kerberos)
• Open source Connectors (Kerberos)
• Proprietary Connectors (SAS/ SAP/ Actimise/ IBM)
• Skills gap
• Data Lake vs Hadoop and Strategy
• From no demand to oversubscription
Big Data Landscape
5 Pillars of Enterprise Security on Hadoop
Pillar | Intent | Tool
Administration | How do I set policy? | Ambari/Ranger
Authentication | Who am I? | Kerberos/LDAP
Authorization | What can I do? | Ranger/Centrify
Audit | What did I do? | Ranger/Centrify
Data Protection | How can I encrypt data? | Ranger KMS/SSL
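The Authorization and Audit pillars ("What can I do?" and "What did I do?") can be illustrated with a small sketch. This is not Ranger's or Centrify's API: the `Policy` and `Gatekeeper` classes, the path layout and the group names are all hypothetical, chosen only to show a policy check that records every decision.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: groups allowed to perform actions under a path prefix."""
    path_prefix: str
    groups: frozenset
    actions: frozenset


@dataclass
class Gatekeeper:
    """Hypothetical helper combining an authorization check with an audit trail."""
    policies: list
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user: str, user_groups: set, action: str, path: str) -> bool:
        # Authorization: at least one policy must match path, action and group.
        allowed = any(
            path.startswith(p.path_prefix)
            and action in p.actions
            and bool(user_groups & p.groups)
            for p in self.policies
        )
        # Audit: record every decision, whether it was allowed or denied.
        self.audit_log.append((user, action, path, allowed))
        return allowed
```

Denied requests are logged as well as granted ones, so the audit trail answers "What did I do?" for both outcomes.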
Metadata Strategy
Building a Lake and not a Swamp
[Architecture diagram. Enterprise Lake: Managed Queues, Managed Storage, CIA Data Services. Edge Node 1 and Edge Node 2: load balanced virtual machines on KVM (Active/Passive), each with Common OS Apps and Application Development/Test. Workbenches: Proprietary Data Science Workbench; Production Workbench. Repo (in DMZ), e.g. R: approved list of commonly used open source apps.]
Setting up multiple tenants
[Architecture diagram. Components: Edge Node 1, Edge Node (Virtual Machines), Data Services, Enterprise Lake (Managed Queues, Managed Storage). Self Service on the Edge node.]
Data Science Workbench
Timeline markers: 2017 Q1, 2018, Feb 2018
• Data Science team setup with own edge node
• Ability to source data onto Edge node: up to 10 TB, and copy into their own Dev folder (1 TB) on HDFS to run distributed
• R Studio + Jupyter Notebooks + SparkR: sufficient tooling available to build models
• R Studio Enterprise Server
• Ability to build streaming data pipelines
• Ability to install applications on the Edge Node
• Enable Power BI?
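The storage limits mentioned above (up to 10 TB on the edge node, 1 TB in a team's Dev folder on HDFS) could be checked before a copy is attempted. The following is a minimal sketch, not the bank's tooling: the function name `fits_quota` and the use of decimal terabytes are assumptions made for illustration.

```python
TB = 10**12  # assumption: decimal terabytes

EDGE_NODE_LIMIT = 10 * TB   # from the slide: up to 10 TB on the edge node
DEV_FOLDER_LIMIT = 1 * TB   # from the slide: 1 TB Dev folder on HDFS


def fits_quota(used_bytes: int, incoming_bytes: int, limit: int = DEV_FOLDER_LIMIT) -> bool:
    """Return True if adding incoming_bytes keeps usage within the limit."""
    if used_bytes < 0 or incoming_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return used_bytes + incoming_bytes <= limit
```

A team would call `fits_quota(current_usage, dataset_size)` before copying from the edge node into HDFS, and pass `limit=EDGE_NODE_LIMIT` when sourcing data onto the edge node itself.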
Private Cloud: Africa Regions
[Architecture diagram, Step 1: Near RT and Batch.
Sources: Internal: CDC; External: Streaming; Internal/Ext: file based.
Ingestion: HDF (Kafka/NiFi/Storm); EL: Abinitio/Spark.
South Africa: Data Lake: Data persisted in HDFS/HBase/Elastic/etc; Feature Extraction; Batch Model Training; Spark model; Exposed model (PMML); Continuous Integration; Model deployed/replaced; Model via API; Model Results via API.
In Country (Regional Systems; Regional Reservoir and Data Warehouse): Feature Extraction; Batch Model Training; Spark model; Exposed model (PMML); Continuous Integration.]
• Model Training happens off SA Lake
• Africa Regional data ingested into SA Lake for Data Science
• Results can be made available to Africa Region Systems via API
• Can accommodate Batch and near real-time data science models
Data Science Workbench and Tools
IT Engineering Data Science
Not many
standards
Roles and Responsibilities
Data Lake Interactions

Data Scientist (DSC)
• Business Requirements: idea
• Data: Data Source requirements
• Model: Design Predictive Model; Model Testing, Processing and Visualisation; Optimize Model (Model Optimization)
• Launch to Production: Trigger Model Productionization
• Monitor Production: Monitor Model Production

Data Engineer (DEV)
• Data: Data Source requirements; ETL development and productionization
• Launch to Production: Serialize Model; Move to Production Platform

Infra Engineer (OPS)
• Business Requirements: Setup Project for Data
• Existing Data: Subscribe to Existing Data
• New Data: New Data Ingestion Pipeline Deployment and Subscription
• Monitor Resources: Access Tools and Quota Resources
• Launch to Production: New Project Deployment (Existing Data, New Data Source)
• Monitor Production: Job Performances
• Monitor Production: Execution / Queues Capacity
Data Science to BDE ratio
[Chart: ratio of Data Science (DS) to BDE resources. Labels: No of Sources; Variety of data types; Number of Use cases; Complexity of use cases; % of workflow and automation; Data Science technical skill; Backlog of Prod; Velocity; Streaming.]
DRIVING PRINCIPLES
We are a community of like-minded professionals
and enthusiasts who share the common goal of
teaching, learning and shaping the future of Data
Science within Standard Bank Group.
Our focus is on building a local community of
practitioners that can effectively share knowledge,
best practices, and provide opportunities for
collaboration across business units and functional
areas. We seek to consolidate needs and
preferences for Data Science technologies across
individuals and teams, bringing a unified vision for
Data Science to our Big Data environments.
We share our thoughts and ideas. We work with
openness with the understanding that
advancement depends on collaboration and shared
learning.
2017 OFFICE HOLDERS
OBJECTIVES
• To advance Data Science principles across
lines of business, using a common practice
definition
• To provide guidance and direction to
practitioners
• Establish policies, standards and processes for
the application of Data Science use-cases on
shared production environments
• To socialise Data Science use-cases, success
stories, and stumbling blocks for shared
knowledge
• Provide professional education standards and
training pathways
MISSION STATEMENT
The Data Science Guild is a technical and data-savvy group
of practitioners discussing the application of artificial
intelligence and machine learning across the Standard
Bank Group.
We aim to guide, assist, and improve the development and
productionisation of machine learning algorithms and
statistical models on our Data Lake and Data Reservoir
environments.
FUNCTIONAL SCOPE
MEMBERSHIP PROPOSITION
Join the Data Science Guild in order to advance your
individual capabilities and to get exposure to an array of
use-cases and methodologies. Join one of the working
groups to directly contribute to shaping the practice of
Data Science within Standard Bank Group.
Get exposure to the developing curriculum of Data
Science training opportunities targeted to your
individual level of practice and expected business
application.
We run monthly meetings that include:
• Overview from working groups (education, tooling,
and productionisation standards)
• Demonstration(s) of use-cases from teams across
business units
• Connect sessions for practitioners to network across
teams.
The Data Science Guild is mandated by the Enterprise
Data Committee and forms part of the Data Community
of Practice. The Guild has the responsibility of
representing the Data Science professionals in the
Group and ensuring that they are equipped with the
education, tools and means by which Data Science
assets can be defined, controlled, used and communicated
for the benefit of the Group and its component business
entities.
TERMS OF REFERENCE
Data Science Guild
Service Offering | Supported Capabilities | Owned Toolsets
Business Data Science | Executive Education | Presentation materials / collateral
Technical Data Science | Knowledge Sharing | Code repository
Operational Data Science | Data Science Tools | Data Science Workbench
Operational Data Science | Data Science Model Productionisation | Defined Productionisation Standards
Operational Data Science | Education | Grad Training Programme
Chair: Michelle Gervais
Deputy Chair: Kristel Sampson
Membership: TBD
General Professional Development / Events: TBD
Working Group Lead, Productionisation: TBD
Working Group Lead, Training Programme: TBD
Data Lake
[Diagram: model lifecycle on the Data Lake. Steps: Productionise data; Model Dev; Model Test/Train; Model Serialisation; Prod Model Predict; Prod Model Train; Monitor Model Predict; Monitor Model Train. Swimlane legend: Data Scientist, Data Engineer, Both. Rule: if Model Predict decays, replace with Model Train.]
Deployment: Model Predict On and off the Lake
[Diagram: the model lifecycle split across the Data Lake and a Deploy Server, with two deployment options labelled Option 1 and Option 2 (Preferred). Steps: Productionise data; Model Dev; Model Test/Train; Model Serialisation; Prod Model Predict; Prod Model Train; Monitor Model Predict; Monitor Model Train. Swimlane legend: Data Scientist, Data Engineer, Both. Rule: if Model Predict decays, replace with Model Train. Considerations: scalability, volume of data, no of users, low latency.]
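The replacement rule in the diagram (if Model Predict decays, replace it with the output of Model Train) can be sketched as a simple monitoring check. The slides do not say how decay is measured, so the metric (mean of recent scores against a baseline), the `tolerance` value and the function names below are assumptions for illustration only.

```python
from statistics import mean


def has_decayed(recent_scores, baseline, tolerance=0.05):
    """True if the mean of recent scores fell more than `tolerance` below baseline.

    Assumption: higher scores are better (e.g. accuracy or AUC).
    """
    if not recent_scores:
        return False  # nothing observed yet, so no evidence of decay
    return mean(recent_scores) < baseline - tolerance


def select_model(predict_model, retrained_model, recent_scores, baseline, tolerance=0.05):
    """Keep the serving model unless it has decayed; then swap in the retrained one."""
    if has_decayed(recent_scores, baseline, tolerance):
        return retrained_model
    return predict_model
```

In this sketch the monitoring job owns the decision and the two models are passed in as opaque objects, so the same check applies whether prediction runs on the Lake or on a separate deploy server.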
[Diagram: Spark Serialisation; Deep Learning model serialisation; Python Serialisation; JSON]
Serialisation of different types of models needs to be
investigated. One of the main goals of model
serialization is the ability to embed a model into a
production system. Hence some of the available
serialization options, such as Python serialization,
may not be viable. Having one standard like JSON may
not be possible either, as it may not cater for complex
models. Model serialization requires unpacking and some
prototyping.
Model Serialization
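To make the trade-off concrete, here is a minimal sketch of JSON serialisation for the simplest case, a linear model. The function names and the JSON layout are hypothetical, not a standard and not the format used in the deck. A language-neutral payload like this can be embedded in any production system, but a complex model (a deep network or a fitted Spark pipeline) does not reduce to a few numbers in this way, which is the limitation the text points at.

```python
import json


def serialise_linear_model(weights, intercept):
    """Serialise a linear model to a JSON string (hypothetical layout)."""
    return json.dumps({"type": "linear", "weights": list(weights), "intercept": intercept})


def load_linear_model(payload):
    """Rebuild a predict function from the JSON string produced above."""
    spec = json.loads(payload)
    if spec.get("type") != "linear":
        raise ValueError("unsupported model type")
    weights, intercept = spec["weights"], spec["intercept"]

    def predict(features):
        if len(features) != len(weights):
            raise ValueError("feature count does not match the model")
        # Linear model: intercept plus the weighted sum of the features.
        return intercept + sum(w * x for w, x in zip(weights, features))

    return predict
```

The payload carries only parameters, never code, so the consuming system needs its own implementation of `predict` for each supported model type.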
What are the AI Use Cases?
One Size Fits All
The Data Services team is
building a suite of anomaly
detection models which will
solve for domains such as :
◂Software testing
◂Customer behavior monitoring
◂Trading patterns
◂Price formation
◂System performance (servers,
networks, OS, software)
◂Fraud detection
◂Customer support
Who Are You?
The Security team have
sponsored a facial recognition
engine to enhance security for
digital channels.
Leading research has lacked
an adequate corpus of African
faces, hence the need for a
custom solution.
The Price is Right!
Markets have become more
interconnected and data-driven.
Using AI, the Global Markets
team is becoming more
efficient and competitive in our
market-making, risk
management, pricing and
execution. We use our data to
understand the actions of
market participants and to
change the way we react to our
trading environments.
What are the AI Use Cases?
Shap! Eish! Hujambo!
Modern day research concerning sentiment
analysis and natural language processing
techniques has focused on the English
language. Our approach is to build models
based on vernacular languages in the regions
within which we operate.
Work Smarter, Not Harder
The Intelligence Automation
team have been automating a
number of business processes
including account origination.
The next generation of business
processes in the pipeline to be
automated will include
embedded artificial cognition.
Show Your Money Who’s Boss
Standard Bank is making your financial
management personal. Using the latest
machine learning techniques, we have
developed a prototype, commissioned by the
PBB SA Digital team, which produces an
accurate and customized forecast of a
customer’s upcoming transactions.
We Are All Connected
Using the power of distributed
computing, we are building
graph database capabilities to
connect our customer records
across independent databases
and systems.
The Nigeria Ecosystem model,
based on graph, is helping to
generate leads for CIB.
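The idea of connecting customer records across independent systems can be sketched without a graph database. The example below swaps in a plain in-memory union-find and links records that share an exact identifier. The record IDs, the identifier format and the exact-match rule are assumptions for illustration; real entity linking usually also needs fuzzy matching.

```python
from collections import defaultdict


def link_records(records):
    """Group record IDs that share at least one identifier.

    `records` is a list of (record_id, set_of_identifiers) pairs.
    Returns a sorted list of sorted groups of record IDs.
    """
    parent = {}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # identifier -> first record that carried it
    for record_id, identifiers in records:
        parent.setdefault(record_id, record_id)
        for identifier in identifiers:
            if identifier in first_seen:
                union(record_id, first_seen[identifier])
            else:
                first_seen[identifier] = record_id

    groups = defaultdict(set)
    for record_id in parent:
        groups[find(record_id)].add(record_id)
    return sorted(sorted(group) for group in groups.values())
```

Linking is transitive: a CRM record and a loans record with no identifier in common still land in one group if a third record shares an identifier with each.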
What’s our next?
Our Next: Model Productionisation
Python/Spark | Machine Learning | Analytical Data Science Workbench setup with Anaconda - Repo
R | Statistical modelling | Setup with SparkR, but update required to Python and request for sparklyr and other
TensorFlow (CPU) | Deep Learning | Setup Cluster, linked to HDFS, running on CPU not GPU
Graph | Entity Linking | Tactical: Neo4j installation per user on edge node
Our Next
• Cloud
• Microservices Architecture
• Self Service (Business units enabled)
• Data Tokenisation
Thanks!
Any questions?
You can find us at
◂ Kristel.Sampson@standardbank.co.za
◂ Zakeera.Mahomed@standardbank.co.za
Icons are editable shapes.
This means that you can:
● Resize them without losing
quality.
● Change fill color and opacity.
● Change line color, width and
style.
Isn’t that nice? :)
Examples:
25

Más contenido relacionado

La actualidad más candente

Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightDataWorks Summit/Hadoop Summit
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...DataWorks Summit
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningDataWorks Summit
 
Data Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
Data Science: Driving Smarter Finance and Workforce Decsions for the EnterpriseData Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
Data Science: Driving Smarter Finance and Workforce Decsions for the EnterpriseDataWorks Summit
 
Machine Learning Everywhere
Machine Learning EverywhereMachine Learning Everywhere
Machine Learning EverywhereDataWorks Summit
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicDataWorks Summit
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureDataWorks Summit
 
Hadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsHadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsDataWorks Summit
 
Data Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerData Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerDataWorks Summit
 
BI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at VerizonBI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at VerizonDataWorks Summit
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceDataWorks Summit
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureVinod Kumar Vavilapalli
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
Designing Data Pipelines for Automous and Trusted Analytics
Designing Data Pipelines for Automous and Trusted AnalyticsDesigning Data Pipelines for Automous and Trusted Analytics
Designing Data Pipelines for Automous and Trusted AnalyticsDataWorks Summit
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeDataWorks Summit
 

La actualidad más candente (20)

Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine Learning
 
Data Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
Data Science: Driving Smarter Finance and Workforce Decsions for the EnterpriseData Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
Data Science: Driving Smarter Finance and Workforce Decsions for the Enterprise
 
Machine Learning Everywhere
Machine Learning EverywhereMachine Learning Everywhere
Machine Learning Everywhere
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
A Mayo Clinic Big Data Implementation
A Mayo Clinic Big Data ImplementationA Mayo Clinic Big Data Implementation
A Mayo Clinic Big Data Implementation
 
Hadoop Trends
Hadoop TrendsHadoop Trends
Hadoop Trends
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
 
Hadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsHadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the experts
 
Data Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerData Science with Hadoop: A Primer
Data Science with Hadoop: A Primer
 
BI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at VerizonBI on Big Data with instant response times at Verizon
BI on Big Data with instant response times at Verizon
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
 
OpenPOWER Update
OpenPOWER UpdateOpenPOWER Update
OpenPOWER Update
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Designing Data Pipelines for Automous and Trusted Analytics
Designing Data Pipelines for Automous and Trusted AnalyticsDesigning Data Pipelines for Automous and Trusted Analytics
Designing Data Pipelines for Automous and Trusted Analytics
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application code
 

Similar a Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT

Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...Debraj GuhaThakurta
 
Big Data for Data Scientists - Info Session
Big Data for Data Scientists - Info SessionBig Data for Data Scientists - Info Session
Big Data for Data Scientists - Info SessionWeCloudData
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeDataWorks Summit
 
Virtual Sandbox for Data Scientists at Enterprise Scale
Virtual Sandbox for Data Scientists at Enterprise ScaleVirtual Sandbox for Data Scientists at Enterprise Scale
Virtual Sandbox for Data Scientists at Enterprise ScaleDenodo
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discoveryadamkraut
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchCloudera, Inc.
 
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...Cloudera, Inc.
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxGautamPopli1
 
Big Data & Open Source - Neil Jadhav
Big Data & Open Source - Neil JadhavBig Data & Open Source - Neil Jadhav
Big Data & Open Source - Neil JadhavSwapnil (Neil) Jadhav
 
Bay Area Azure Meetup - Ignite update session
Bay Area Azure Meetup - Ignite update sessionBay Area Azure Meetup - Ignite update session
Bay Area Azure Meetup - Ignite update sessionNills Franssens
 
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData Inc.
 
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine LearningAUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine LearningSandesh Rao
 
How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists CCG
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
Purdue-Data-Engineering (1).pdf
Purdue-Data-Engineering (1).pdfPurdue-Data-Engineering (1).pdf
Purdue-Data-Engineering (1).pdfSantoshMuduli1
 
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...Blackboard APAC
 

Similar a Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT (20)

Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...
 
Big Data for Data Scientists - Info Session
Big Data for Data Scientists - Info SessionBig Data for Data Scientists - Info Session
Big Data for Data Scientists - Info Session
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to Production
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie Mae
 
Virtual Sandbox for Data Scientists at Enterprise Scale
Virtual Sandbox for Data Scientists at Enterprise ScaleVirtual Sandbox for Data Scientists at Enterprise Scale
Virtual Sandbox for Data Scientists at Enterprise Scale
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science Workbench
 
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
Cloudera Academic Partnership: Teaching Hadoop to the Next Generation of Data...
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
 
Big Data & Open Source - Neil Jadhav
Big Data & Open Source - Neil JadhavBig Data & Open Source - Neil Jadhav
Big Data & Open Source - Neil Jadhav
 
Bay Area Azure Meetup - Ignite update session
Bay Area Azure Meetup - Ignite update sessionBay Area Azure Meetup - Ignite update session
Bay Area Azure Meetup - Ignite update session
 
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
 
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine LearningAUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
AUSOUG - NZOUG-GroundBreakers-Jun 2019 - AI and Machine Learning
 
How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists
 
Ramesh kutumbaka resume
Ramesh kutumbaka resumeRamesh kutumbaka resume
Ramesh kutumbaka resume
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
Purdue-Data-Engineering (1).pdf
Purdue-Data-Engineering (1).pdfPurdue-Data-Engineering (1).pdf
Purdue-Data-Engineering (1).pdf
 
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
Blackboard Learn Deployment: A Detailed Update of Managed Hosting and SaaS De...
 





Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT

  • 1. Driving Enterprise Adoption: Tragedies, Triumphs and our NEXT Standard Bank South Africa
  • 2. Big Data Journey: Lake (timeline: 2016, 2017, 2018)
      • Set up POC Environment for Data Science Exploration
      • Begin Data Lake Journey
      • Get critical mass (Enterprise Data) ingested into Lake
      • Create a multitenant Lake environment
      • Start onboarding tenants (Data Science teams) onto Lake. Each team has a ‘sandbox’ environment to start prototyping models
      • Set up Data Science Workbench on Lake
      • Finalise Model Productionisation process
      • Implement Real time Streaming capability
      • Productionise first data science (Real time Machine Learning) model on Lake
      • Establish Data Science Guild
      • Establish Security Policies for Data access
      • Integrate Lake metadata into Enterprise metadata repository
      • Move to use case driven prioritization for DS model productionisation
  • 3. Successes
      • Security
      • Getting critical mass of Enterprise Data into the Lake
      • Establishment of our Data Science workbench
      • Defining the model productionisation standards
  • 4. Challenges
      • Security
      • Integrating with other systems (Kerberos)
      • Open source Connectors (Kerberos)
      • Proprietary Connectors (SAS/SAP/Actimize/IBM)
      • Skills gap
      • Data Lake vs Hadoop and Strategy
      • From no demand to oversubscription
  • 6. 5 Pillars of Enterprise Security on Hadoop
      Pillar          | Intent                  | Tool
      Administration  | How do I set policy?    | Ambari/Ranger
      Authentication  | Who am I?               | Kerberos/LDAP
      Authorization   | What can I do?          | Ranger/Centrify
      Audit           | What did I do?          | Ranger/Centrify
      Data Protection | How can I encrypt data? | Ranger KMS/SSL
  • 8. Building a Lake and not a Swamp
  • 9. Setting up multiple tenants (architecture diagram). Diagram labels: Edge Node 1; Edge Node 2; CIA Data Services; Enterprise Lake; Proprietary; Data Science Workbench; Production Workbench; KVM (Active/Passive); Load Balanced Virtual Machines; Application Development and Test (on each edge node); Common OS and Apps (on each edge node); Repo (in DMZ), e.g. R; Managed Queues; Managed Storage; Approved list of commonly used open source apps
  • 10. Self Service on the Edge node (diagram labels: Edge Node 1; Data Services; Enterprise Lake; Virtual Machines; Managed Queues; Managed Storage; Data Science Workbench; timeline markers 2017, Q1 2018, Feb 2018)
      • Data Science team set up with own edge node
      • Ability to source data onto the Edge node: up to 10 TB, and copy into their own Dev folder (1 TB) on HDFS to run distributed
      • RStudio + Jupyter Notebooks + SparkR: sufficient tooling available to build models
      • RStudio Enterprise Server
      • Ability to build streaming data pipelines
      • Ability to install applications on the Edge Node
      • Enable Power BI?
  • 12. Private Cloud: Africa Regions (architecture diagram). Step 1: Near RT and Batch
      • Sources and ingestion: Internal: CDC; HDF: Kafka/NiFi/Storm; External: Streaming; Internal/Ext: file based; EL: Ab Initio/Spark
      • Locations: South Africa: Data Lake; In Country: Regional Reservoir and Data Warehouse; Regional Systems
      • Modelling flow (shown twice in the diagram): Feature Extraction; Batch Model Training; Spark model; Exposed model (PMML); Continuous Integration
      • Serving: Model Results via API; Model via API; Model deployed/replaced
      • Data persisted: HDFS/HBase/Elastic/etc.
      Key points:
      • Model Training happens off SA Lake
      • Africa Regional data ingested into SA Lake for Data Science
      • Results can be made available to Africa Region Systems via API
      • Can accommodate Batch and near real-time data science models
  • 13. Data Science Workbench and Tools (diagram labels: IT; Engineering; Data Science). Not many standards
  • 14. Roles and Responsibilities: Data Lake Interactions
      • Data Scientist (DSC)
          ◂ Business Requirements: idea
          ◂ Data: Data Source requirements
          ◂ Model Design: Predictive Model
          ◂ Model Testing: Processing and Visualisation
          ◂ Optimize Model: Model Optimization
          ◂ Launch to Production: Trigger Model Productionization
          ◂ Monitor Production: Monitor Model Production
      • Data Engineer (DEV)
          ◂ Data: Data Source requirements; ETL development and productionization
          ◂ Launch to Production: Serialize Model Production; Move to Production Platform
      • Infra Engineer (OPS)
          ◂ Business Requirements: Setup Project for Data
          ◂ Existing Data: Subscribe to Existing Data
          ◂ New Data: New Data Ingestion Pipeline Deployment and Subscription
          ◂ Monitor Resources: Access Tools and Quota Resources
          ◂ Launch to Production: New Project Deployment; Existing Data; New Data Source
          ◂ Monitor Production: Job Performances; Execution / Queues Capacity
  • 15. Data Science (DS) to BDE ratio (chart). Factors shown: No of Sources; Variety of data types; Number of Use cases; Complexity of use cases; % of workflow and automation; Data Science technical skill; Backlog of Prod; Velocity; Streaming
  • 16. Data Science Guild
      DRIVING PRINCIPLES
      We are a community of like-minded professionals and enthusiasts who share the common goal of teaching, learning and shaping the future of Data Science within Standard Bank Group. Our focus is on building a local community of practitioners that can effectively share knowledge and best practices, and provide opportunities for collaboration across business units and functional areas. We seek to consolidate needs and preferences for Data Science technologies across individuals and teams, bringing a unified vision for Data Science to our Big Data environments. We share our thoughts and ideas. We work with openness, with the understanding that advancement depends on collaboration and shared learning.
      OBJECTIVES
      • To advance Data Science principles across lines of business, using a common practice definition
      • To provide guidance and direction to practitioners
      • Establish policies, standards and processes for the application of Data Science use-cases on shared production environments
      • To socialise Data Science use-cases, success stories, and stumbling blocks for shared knowledge
      • Provide professional education standards and training pathways
      MISSION STATEMENT
      The Data Science Guild is a technical and data-savvy group of practitioners discussing the application of artificial intelligence and machine learning across the Standard Bank Group. We aim to guide, assist, and improve the development and productionisation of machine learning algorithms and statistical models on our Data Lake and Data Reservoir environments.
      MEMBERSHIP PROPOSITION
      Join the Data Science Guild in order to advance your individual capabilities and to get exposure to an array of use-cases and methodologies. Join one of the working groups to directly contribute to shaping the practice of Data Science within Standard Bank Group. Get exposure to the developing curriculum of Data Science training opportunities targeted to your individual level of practice and expected business application. We run monthly meetings that include:
      • Overview from working groups (education, tooling, and productionisation standards)
      • Demonstration(s) of use-cases from teams across business units
      • Connect sessions for practitioners to network across teams
      TERMS OF REFERENCE
      The Data Science Guild is mandated by the Enterprise Data Committee and forms part of the Data Community of Practice. The Guild has the responsibility of representing the Data Science professionals in the Group and ensuring that they are equipped with the education, tools and means by which Data Science assets can be defined, controlled, used and communicated for the benefit of the Group and its component business entities.
      FUNCTIONAL SCOPE
      Service Offering         | Supported Capabilities               | Owned Toolsets
      Business Data Science    | Executive Education                  | Presentation materials / collateral
      Technical Data Science   | Knowledge Sharing                    | Code repository
      Operational Data Science | Data Science Tools                   | Data Science Workbench
      Operational Data Science | Data Science Model Productionisation | Defined Productionisation Standards
      Education                | Grad Training Programme              |
      2017 OFFICE HOLDERS
      • Chair: Michelle Gervais
      • Deputy Chair: Kristel Sampson
      • Membership: TBD
      • General Professional Development / Events: TBD
      • Working Group Lead, Productionisation: TBD
      • Working Group Lead, Training Programme: TBD
  • 17. Deployment: Model Predict On and off the Lake (diagram with two options: Option 1, and Option 2 (Preferred); labels include Data Lake and Deploy Server). Both options show the same flow:
      • Data Scientist: Model Dev; Model Test/Train; Model Serialisation
      • Data Engineer: Prod Model Predict; Prod Model Train
      • Both: Monitor Model Predict; Monitor Model Train
      • If Model Predict decays, replace with Model Train
      • Productionise data
      • Considerations: scalability, volume of data, no of users, low latency
  • 18. Model Serialization (options shown: Spark serialisation; Deep Learning model serialisation; Python serialisation; JSON)
      • Serialisation of different types of models needs to be investigated.
      • One of the main goals of model serialization is to be able to embed a model into a production system. Hence some of the available serialization options, such as Python's, may not be viable.
      • Having one standard like JSON may not be possible either, as it may not cater for complex models.
      • Model serialization requires unpacking and some prototyping.
  • 19. What are the AI Use Cases?
      • One Size Fits All: The Data Services team is building a suite of anomaly detection models which will solve for domains such as:
          ◂ Software testing
          ◂ Customer behavior monitoring
          ◂ Trading patterns
          ◂ Price formation
          ◂ System performance (servers, networks, OS, software)
          ◂ Fraud detection
          ◂ Customer support
      • Who Are You? The Security team have sponsored a facial recognition engine to enhance security for digital channels. Leading research has excluded an adequate corpus of African faces, hence the need for a custom solution.
      • The Price is Right! Markets have become more interconnected and data-driven. Using AI, the Global Markets team is becoming more efficient and competitive in our market-making, risk management, pricing and execution. We use our data to understand the actions of market participants and to change the way we react to our trading environments.
  • 20. What are the AI Use Cases?
      • Shap! Eish! Hujambo! Modern day research on sentiment analysis and natural language processing techniques has focused on the English language. Our approach is to build models based on vernacular languages in the regions within which we operate.
      • Work Smarter, Not Harder: The Intelligence Automation team have been automating a number of business processes, including account origination. The next generation pipeline business processes to be automated will include embedded artificial cognition.
      • Show Your Money Who’s Boss: Standard Bank is making your financial management personal. Using the latest machine learning techniques, we have developed a prototype, commissioned by the PBB SA Digital team, which produces an accurate and customized forecast of a customer’s upcoming transactions.
      • We Are All Connected: Using the power of distributed computing, we are building graph database capabilities to connect our customer records across independent databases and systems. The Nigeria Ecosystem model, based on graph, is helping to generate leads for CIB.
  • 22. Our Next: Model Productionisation
      • Python/Spark (Machine Learning, Analytical): Data Science Workbench setup with Anaconda - Repo
      • R (Statistical modelling): Setup with SparkR, but update required to Python and request for sparklyr and other
      • TensorFlow (CPU) (Deep Learning): Setup Cluster, linked to HDFS, running on CPU not GPU
      • Graph (Entity Linking): Tactical, Neo4j installation per user on edge node
  • 23. Our Next
      • Cloud
      • Microservices Architecture
      • Self Service (Business units enabled)
      • Data Tokenisation
  • 24. Thanks! Any questions? You can find us at:
      ◂ Kristel.Sampson@standardbank.co.za
      ◂ Zakeera.Mahomed@standardbank.co.za
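
Slide 18's concern is that a serialized model should be embeddable in a production system, and that Python-specific formats may not travel well. A minimal sketch of the language-neutral alternative follows. The `LinearModel` class and the JSON layout are hypothetical, chosen only for illustration; the slides do not describe an actual format. The model here is just a list of numbers, so JSON is sufficient. As the slide warns, more complex models may not reduce to JSON this easily.

```python
import json


class LinearModel:
    """Hypothetical stand-in for a trained model: y = intercept + sum(w_i * x_i)."""

    def __init__(self, weights, intercept):
        self.weights = list(weights)
        self.intercept = float(intercept)

    def predict(self, features):
        return self.intercept + sum(w * x for w, x in zip(self.weights, features))


def to_json(model):
    # Only plain parameters are written, so any runtime with a JSON parser
    # (not just Python) could rebuild and score the model.
    return json.dumps(
        {"type": "linear", "weights": model.weights, "intercept": model.intercept}
    )


def from_json(payload):
    doc = json.loads(payload)
    if doc["type"] != "linear":
        # A single JSON layout does not automatically cover other model families.
        raise ValueError(f"unsupported model type: {doc['type']}")
    return LinearModel(doc["weights"], doc["intercept"])


model = LinearModel(weights=[0.5, -2.0], intercept=1.0)
restored = from_json(to_json(model))
print(restored.predict([2.0, 1.0]) == model.predict([2.0, 1.0]))
```

The explicit `type` field is a design choice for the sketch: it makes the loader fail loudly on a model family it cannot rebuild, which is the situation the slide anticipates for complex models.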

Editor's notes

  1. Have separate Git directories: Prod will only have access to the Prod Git folder, and Dev will only have access to the Dev Git directory. Black Duck to automate package refreshes.
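
The rule on slide 17, "If Model Predict decays, replace with Model Train", might be sketched as follows. Everything named here is an assumption for illustration. The slides do not say how decay is measured, so the sketch compares accuracy on recent labelled rows against a baseline recorded at deployment, with a hypothetical tolerance of 0.05. Models are represented as plain callables.

```python
from statistics import mean


def accuracy(model, labelled_rows):
    """Share of (features, label) rows where the model's output equals the label."""
    return mean(1.0 if model(x) == y else 0.0 for x, y in labelled_rows)


def has_decayed(baseline_accuracy, current_accuracy, tolerance=0.05):
    """Assumed decay test: accuracy fell more than `tolerance` below the baseline."""
    return current_accuracy < baseline_accuracy - tolerance


def select_serving_model(predict_model, train_model, baseline_accuracy,
                         recent_rows, tolerance=0.05):
    """Keep the serving model unless it has decayed; then swap in the retrained one."""
    current = accuracy(predict_model, recent_rows)
    if has_decayed(baseline_accuracy, current, tolerance):
        return train_model
    return predict_model


def old_model(amount):
    # Hypothetical model currently serving predictions ("Model Predict").
    return amount > 10


def new_model(amount):
    # Hypothetical freshly retrained model ("Model Train").
    return amount > 20


# Recent labelled data: the true boundary has drifted upwards.
recent = [(5, False), (15, False), (25, True), (30, True)]
serving = select_serving_model(old_model, new_model,
                               baseline_accuracy=0.95, recent_rows=recent)
print(serving.__name__)
```

In practice the monitoring metric, the tolerance and the source of labelled data would come from the productionisation standards the deck refers to; the sketch only shows the shape of the swap decision.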