AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Driving Machine Learning and Analytics Use Cases with AWS Storage
Rob Krugman
Chief Digital Officer
Broadridge Financial Systems
STG302
Mahendra Bairagi
AI/ML Specialist Solutions Architect
AWS
Agenda
• Typical AI development life cycle
• Overview of AWS machine learning portfolio
• Data needs for AI/ML workloads
• Storage options for AI/ML workloads
• Best practices: storage options for AI
• Broadridge solution
Related breakouts
Thursday November 29th
Breaking the Ice: Transform Cold Archival Data into Fresh Insights
4 – 5PM | Aria East, Level 1, Joshua 3
The Machine Learning Process
Business Problem → ML Problem Framing → Data Collection → Data Integration → Data Preparation → Data Visualization & Analysis → Feature Engineering → Model Training & Parameter Tuning → Model Evaluation → Are business goals met?
• Yes: Model Deployment → Predictions, with ongoing Monitoring & Debugging
• No: Data Augmentation and Feature Augmentation, then iterate
Integration: The data architecture
Build the data platform: Amazon S3, Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, AWS Glue
[Diagram: The Machine Learning Process]
Model training: Undifferentiated heavy lifting
• Set up and manage notebook environments
• Set up and manage training clusters
• Write data connectors
• Scale ML algorithms to large datasets
• Distribute ML training algorithms to multiple machines
• Secure model artifacts
[Diagram: The Machine Learning Process]
DevOps: Undifferentiated heavy lifting
• Set up and manage model inference clusters
• Manage and scale model inference APIs
• Monitor and debug model predictions
• Model versioning and performance tracking
• Automate new model version promotion to production (A/B testing)
[Diagram: The Machine Learning Process]
Simplify ML Processing
Collect → Store (data lake) → Build & train model → Inference
Key considerations: latency, throughput, cost
The Amazon Machine Learning Stack
• Application services: Amazon Rekognition, Amazon Transcribe, Amazon Translate, Amazon Polly, Amazon Comprehend, Amazon Lex
• Platform services: Amazon SageMaker
• Frameworks & interfaces: AWS Deep Learning AMIs; Apache MXNet, Caffe2, CNTK, PyTorch, TensorFlow, Chainer, Keras, Gluon
• Education: AWS DeepLens
Type of Data
COLLECT
• Records: applications (mobile apps, web apps, data centers connected via AWS Direct Connect) produce transactions, data structures, and database records
• Files: data transport and logging (migration with Snowball and import/export; logging with Amazon CloudWatch and AWS CloudTrail) produce files/objects, log files, and media files
• Streams: devices, sensors, and IoT platforms (AWS IoT) produce IoT events and data streams
What Is the Temperature of Your ML Data?
                Hot data     Warm data    Cold data
Volume          MB–GB        GB–TB        PB–EB
Item size       B–KB         KB–MB        KB–TB
Latency         µs, ms       ms, sec      min, hrs
Durability      Low–high     High         Very high
Request rate    Very high    High         Low
Cost/GB         $$–$         $–¢¢         ¢
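As a rough illustration of the latency row above, the sketch below assigns a temperature from a dataset's worst acceptable read latency. The thresholds (10 ms and 10 s) and the function name `classify_temperature` are hypothetical choices, not values from the talk; a real decision would also weigh volume, request rate, and cost.

```python
# Hypothetical cut-offs loosely based on the latency row of the table:
# hot = microseconds to milliseconds, warm = milliseconds to seconds,
# cold = minutes to hours. Tune these for your own workload.
HOT_MAX_LATENCY_S = 0.010   # assumption: 10 ms
WARM_MAX_LATENCY_S = 10.0   # assumption: 10 s


def classify_temperature(required_latency_s: float) -> str:
    """Return 'hot', 'warm', or 'cold' for a worst acceptable read latency in seconds."""
    if required_latency_s < 0:
        raise ValueError("latency must be non-negative")
    if required_latency_s <= HOT_MAX_LATENCY_S:
        return "hot"
    if required_latency_s <= WARM_MAX_LATENCY_S:
        return "warm"
    return "cold"


for latency in (0.0005, 0.5, 3600):
    print(latency, classify_temperature(latency))
```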
Storage options for ML and analytics workloads
Storage (hot to cold): Amazon Kinesis (streaming), Amazon DynamoDB / Amazon RDS, Amazon S3 (data)
Processing technologies, spanning fast to slow: Spark Streaming, AWS Lambda, KCL apps, native apps, Amazon Redshift, Hive, Spark, Presto, Amazon Athena
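For data that cools over time in Amazon S3, one common option is an S3 lifecycle rule that transitions objects to cheaper storage classes. The sketch below only builds the configuration dictionary; the prefix, the 30/365-day thresholds, and the helper name `build_lifecycle_config` are assumptions for illustration. Applying it requires boto3 and credentials, as noted in the trailing comment.

```python
import json


def build_lifecycle_config(prefix: str, warm_after_days: int = 30,
                           cold_after_days: int = 365) -> dict:
    """Build an S3 lifecycle configuration that tiers objects under `prefix`
    from S3 Standard to S3 Standard-IA and later to S3 Glacier."""
    if cold_after_days <= warm_after_days:
        raise ValueError("cold_after_days must be greater than warm_after_days")
    return {
        "Rules": [
            {
                "ID": "tier-" + (prefix.strip("/") or "all"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": cold_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle_config("training-data/")  # hypothetical prefix
print(json.dumps(config, indent=2))
# Applying it requires boto3 and credentials (bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-ml-bucket", LifecycleConfiguration=config)
```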
Best Practice 1: Make sure your data lake is Well-Architected!
5 Pillars of Well-Architected systems
• Operational excellence
• Security
• Reliability
• Performance efficiency
• Cost optimization
Best Practice 2: Use the right tool for the job
Data tier
• Relational: referential integrity with strong consistency, transactions, and hardened scale. Complex query support via SQL.
• Key-value: low-latency, key-based queries with high throughput and fast data ingestion. Simple query methods with filters.
• Document: indexing and storing of documents with support for query on any property. Simple queries with filters, projections, and aggregates.
• In-memory: microsecond latency, key-based queries, specialized data structures. Simple query methods with filters.
• Graph: creating and navigating relations between data easily and quickly. Easily express queries in terms of relations.
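To make the relational entry concrete, the sketch below uses SQLite from the Python standard library as a stand-in for a relational engine (it is not an AWS service). The hypothetical `accounts` and `trades` tables show referential integrity and an all-or-nothing transaction: the second batch references a missing account, so the whole batch is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE trades (id INTEGER PRIMARY KEY, "
    "account_id INTEGER NOT NULL REFERENCES accounts(id), qty INTEGER NOT NULL)"
)

with conn:  # commits on success
    conn.execute("INSERT INTO accounts (id, name) VALUES (1, 'alpha')")
    conn.execute("INSERT INTO trades (account_id, qty) VALUES (1, 100)")

try:
    with conn:  # one transaction: rolled back as a whole if any statement fails
        conn.execute("INSERT INTO trades (account_id, qty) VALUES (1, 50)")
        conn.execute("INSERT INTO trades (account_id, qty) VALUES (999, 10)")  # no account 999
except sqlite3.IntegrityError as exc:
    print("batch rolled back:", exc)

total_qty = conn.execute("SELECT SUM(qty) FROM trades").fetchone()[0]
print(total_qty)  # 100: the valid insert from the failed batch was rolled back too
```

A key-value or document store would typically trade these cross-record guarantees for the latency and ingestion characteristics listed above.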
Best Practice 3: Right tool for the right AI stack
Amazon SageMaker storage options:
• Use Pipe mode to stream training data from Amazon S3 into Amazon SageMaker where applicable
• Use Amazon EFS as external storage for Amazon SageMaker notebooks
Deep Learning AMIs storage options:
• Amazon Elastic Block Store (EBS), Amazon S3, and Amazon Elastic File System (EFS)
AI application services (Amazon Rekognition, Amazon Polly, Amazon Comprehend, etc.):
• Amazon S3
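A minimal sketch of requesting Pipe mode, assuming the SageMaker `CreateTrainingJob` API as exposed by boto3 (`create_training_job`). The helper `build_pipe_mode_training_request` only assembles the request; the job name, role ARN, image URI, bucket paths, and instance settings are placeholders, and Pipe mode only helps when the training image actually reads its input from the pipe.

```python
def build_pipe_mode_training_request(job_name: str, image_uri: str, role_arn: str,
                                     train_s3_uri: str, output_s3_uri: str) -> dict:
    """Assemble keyword arguments for SageMaker CreateTrainingJob using Pipe mode."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "Pipe",  # stream from S3 instead of copying to local disk
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                "InputMode": "Pipe",  # per-channel override, kept consistent here
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


request = build_pipe_mode_training_request(
    job_name="example-pipe-mode-job",
    image_uri="<training-image-uri>",  # placeholder: an image that supports Pipe mode
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-ml-bucket/train/",
    output_s3_uri="s3://example-ml-bucket/output/",
)
# With boto3 and credentials:
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```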
Best Practice 4: Storage options for ML @Edge
From the edge (IoT devices):
• For data: MQTT, Amazon Kinesis Data Streams
• For video: Amazon Kinesis Video Streams
• AWS SDK for Python (e.g., Amazon S3 through the Boto libraries)
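For the Kinesis Data Streams path, a device-side sketch with the AWS SDK for Python might look like the following. `send_sensor_reading`, the stream name, and the payload shape are hypothetical; the client is passed in so the function can be exercised without AWS, and on a device it would be `boto3.client("kinesis")`.

```python
import json
import time


def send_sensor_reading(kinesis_client, stream_name: str, device_id: str, reading: dict) -> dict:
    """Serialize one reading as JSON and write it to a Kinesis data stream."""
    payload = {"device_id": device_id, "timestamp": time.time(), "reading": reading}
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(payload).encode("utf-8"),
        # Using the device ID as the partition key keeps each device's
        # records on the same shard.
        PartitionKey=device_id,
    )


# Usage on a device with boto3 installed and credentials configured:
# import boto3
# kinesis = boto3.client("kinesis")
# send_sensor_reading(kinesis, "example-sensor-stream", "sensor-42", {"temp_c": 21.5})
```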
Why do organizations typically store and archive data?
REGULATION
Unfortunately, today's solutions reinforce regulation as the primary driver
ar·chi·val: where content and data go to be forgotten
Yet, for 70% of organizations, the monolithic model that characterized historic information management has been replaced by a desire to consume content capabilities as needed.*
*“Digitalizing” Core Business Processes, AIIM International, 2018
A challenge: Last January we challenged a group of AWS and Broadridge resources to answer a question …
… Can we create services that change the perception of what it means to store and archive information, in an effort to make it a value driver to an enterprise?
What we recognized in today’s solutions
• Solutions are cost prohibitive
  • Typically in-house, leveraging expensive hardware and software
• Information is captive
  • Minimal interfaces available
  • Typically customized to support point solutions
• Information is standardized
  • Data and content must adhere to a structure to be stored and leveraged
• Business use is minimal
  • Access via website
  • Support for e-discovery capabilities
Where we focused (personas)
1. Operations
  • Eliminate data silos
  • Support migration, interoperability, or conversion
  • Complete view
  • Learn
2. Legal & compliance
  • GDPR & California privacy
  • Regulatory overlap
  • Flexible taxonomies
  • Anomaly detection
3. Customer service representatives
  • Automation
  • Experience
  • Chatbots
  • Self service
Our Solution: Intelligent Information Management
A PaaS that turns archive data into live, usable, actionable information
Reimagining customer service through information management, AI, and chatbots
IIM – A Platform Focused on Information Assets
Interrogate stored information → Match data point with current trade information → Calculate differences → Send as text to user
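The four steps above can be sketched as a small pipeline. Everything here is hypothetical and not Broadridge's implementation: the in-memory `archive` and `prices` dictionaries stand in for the stored information and a live trade feed, and `to_text` only builds the message; actually sending it (for example through an SMS service) is out of scope.

```python
from dataclasses import dataclass


@dataclass
class ArchivedPosition:
    """A data point retrieved from stored (archived) information."""
    symbol: str
    quantity: float
    price: float  # price recorded when the position was archived


def interrogate(archive: dict, account_id: str, symbol: str) -> ArchivedPosition:
    # Step 1: interrogate stored information for this account and symbol.
    return archive[(account_id, symbol)]


def match_and_diff(position: ArchivedPosition, current_prices: dict) -> float:
    # Steps 2 and 3: match with current trade information, then calculate
    # the change in position value since the archived data point.
    current = current_prices[position.symbol]
    return (current - position.price) * position.quantity


def to_text(position: ArchivedPosition, change: float) -> str:
    # Step 4: render a short message suitable for a text or chatbot reply.
    direction = "up" if change >= 0 else "down"
    return f"Your {position.symbol} position is {direction} ${abs(change):,.2f} since your last statement."


archive = {("acct-1", "XYZ"): ArchivedPosition("XYZ", 100, 50.0)}  # hypothetical data
prices = {"XYZ": 52.5}
pos = interrogate(archive, "acct-1", "XYZ")
print(to_text(pos, match_and_diff(pos, prices)))
# Your XYZ position is up $250.00 since your last statement.
```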
Intelligent Information Management (IIM)
Users: client customers | clients | partners | internal customers. Users and systems will interface with the Broadridge API, UX, or chatbot to access information.
• Presentment tier (UX, API, chatbot): Broadridge presentment tier, including chatbots (voice and text), a mobile-first UX, and microservices APIs
• Abstraction tier (PaaS): Broadridge PaaS, a reusable, purpose-driven tier of information services (Basic and Premium)
• Services and data tier: 3rd-party services, cloud services, data lakes, Amazon Aurora, Amazon DynamoDB; architected as a commodities layer to avoid vendor lock-in
The abstraction layer
Abstraction tier (PaaS) services: E-Discovery, Traditional Search, Web Presentment, Compliant Archival, Actionable Insights, NLP Search, Risk & Fraud, Image Search, Recommendations, Regulatory Reporting, Pattern Recognition, Chatbot
Provided by a combination of Broadridge services and AWS services.
Thank you!
Rob Krugman
Chief Digital Officer
Broadridge Financial Systems
Mahendra Bairagi
AI/ML Specialist Solutions Architect
AWS