© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Arnoud Otte, Assistant Director Cloud & Data Architecture, Cambia Health Solutions
Rich Uhl, CTO / Founder, 1Strategy
Ujjwal Ratan, Solutions Architect, AWS
November 28, 2016
HLC301
Data Science and Healthcare: Running Large
Scale Analytics and Machine Learning on AWS
What to Expect from the Session
• Benefits from large-scale analytics with PHI - Arnoud
• Securing Amazon EMR & Elasticsearch - Rich
• Additional solution components for HIPAA compliance [demo] - Rich
• Reducing cost and improving quality of care with Amazon Machine Learning [demo] - Ujjwal
NOTE: This is a deep dive session on HOW rather than WHAT. We will show
implementation details.
• This session expects familiarity with:
• AWS services - EMR and S3
  • BDM401 - Deep Dive: Amazon EMR Best Practices & Design Patterns
  • BDA206 - Building Big Data Applications with the AWS Big Data Platform
• Encryption and distributed systems like Hadoop and Elasticsearch
Arnoud Otte
Assistant Director Cloud & Data Architecture
Arnoud.Otte@CambiaHealth.com
Cambia Health Solutions
Our Roots
Born from an inspired idea
Our Cause
Becoming catalysts
for transformation
Our Vision
Delivering a reimagined
health care experience
Requirements
HIPAA eligible
Scalable
Managed Service
Secure
Pay-as-we-go
Performance
Master Data
Management
Data Science
& Analytics
Architecture
[Architecture diagram. Components shown: Amazon CloudWatch, AWS CloudTrail, AWS IAM, Cambia Data Center, Amazon S3, Amazon DynamoDB, AWS Lambda, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift. Labeled areas: Data Lake, Metadata, Security, Data Science & Analytics (Amazon Redshift, Amazon EMR), Master Data Management (Amazon EMR).]
Master Data Management
Are these the same people?

Field         Source A       Source B
First Name    John           John
Last Name     Doe            Doe
DOB           1970-01-01     2016-11-28
Street        105 Main St    105 Main St
City          Portland       Portland
State         OR             OR
Answer: No. Father and son.

Field         Source A       Source B
First Name    Jillian        Jill
Last Name     Doe            Doe-Doe
SSN           123-45-6789    123-45-6789
Street        605 Oak Dr     105 Main Street
City          PDX            Portland
State         OR             Oregon
Answer: Yes. Married, changed name, and moved.

This is artificial data fabricated for illustration purposes only.
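The two comparisons show why plain field equality is not enough: identical names and addresses can belong to different people, while a shared strong identifier can link records whose names and addresses differ. A minimal Python sketch of that idea follows. The field names, the helper functions, and the rule order are hypothetical illustrations, not Cambia's Match and Merge logic.

```python
from typing import Dict

Record = Dict[str, str]


def agrees(a: Record, b: Record, field: str) -> bool:
    """True when both records carry the field and the values are equal."""
    return field in a and field in b and a[field] == b[field]


def conflicts(a: Record, b: Record, field: str) -> bool:
    """True when both records carry the field and the values differ."""
    return field in a and field in b and a[field] != b[field]


def same_person(a: Record, b: Record) -> bool:
    # Hypothetical rule 1: a conflicting date of birth rules out a match,
    # even when name and address are identical (father and son).
    if conflicts(a, b, "dob"):
        return False
    # Hypothetical rule 2: a shared SSN links the records even when the
    # name and address have changed (married, changed name, moved).
    if agrees(a, b, "ssn"):
        return True
    # Fallback: require name and date of birth to agree.
    return all(agrees(a, b, f) for f in ("first_name", "last_name", "dob"))


# The fabricated records from the slide.
john_a = {"first_name": "John", "last_name": "Doe", "dob": "1970-01-01",
          "street": "105 Main St", "city": "Portland", "state": "OR"}
john_b = {"first_name": "John", "last_name": "Doe", "dob": "2016-11-28",
          "street": "105 Main St", "city": "Portland", "state": "OR"}
jill_a = {"first_name": "Jillian", "last_name": "Doe", "ssn": "123-45-6789",
          "street": "605 Oak Dr", "city": "PDX", "state": "OR"}
jill_b = {"first_name": "Jill", "last_name": "Doe-Doe", "ssn": "123-45-6789",
          "street": "105 Main Street", "city": "Portland", "state": "Oregon"}

print(same_person(john_a, john_b))  # False
print(same_person(jill_a, jill_b))  # True
```

A production matcher would likely weigh many more fields and tolerate typos; the sketch only mirrors the two answers given on the slide.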
Master Data Management – Approach
Sources: Demographics, Laboratory, Pharmaceutics, Geography, Claims
Process: Cambia Match and Merge on Amazon EMR
Result: Composite record of best values
Master Data Management – Quality
                      Vendor    Cambia V1    Cambia V1.1
Match Correctness     98.50%    99.90%       99.99%
Match Completeness    98.80%    84.30%       98.10%
7,000+ records containing 1,600+ matches
Manually checked and confirmed in the real world
Master Data Management – Performance
Run time
Vendor         2160 minutes (36 hours)
Cambia V1      90 minutes
Cambia V1.1    40 minutes
17.7M records containing 1.8M matches
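To put the run times in proportion, the speedups can be worked out directly. This short Python sketch uses only the figures on the slide and assumes all three runs processed the same 17.7M records, which the slide implies but does not state.

```python
VENDOR_MINUTES = 2160          # 36 hours
CAMBIA_MINUTES = {"Cambia V1": 90, "Cambia V1.1": 40}
RECORDS = 17_700_000           # assumed identical for every run


def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_minutes / new_minutes


for name, minutes in CAMBIA_MINUTES.items():
    factor = speedup(VENDOR_MINUTES, minutes)
    per_minute = RECORDS / minutes
    print(f"{name}: {factor:.0f}x faster, {per_minute:,.0f} records/minute")
# Cambia V1: 24x faster, 196,667 records/minute
# Cambia V1.1: 54x faster, 442,500 records/minute
```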
Next Steps
• Scale in and out or up and down
• Build out healthcare data science models
• HIPAA compliant search on data
Services shown: Amazon Machine Learning, Amazon EMR, Amazon EC2
Security
Big Data
1Strategy.com | @1strategy_cloud | Booth #408
Rich Uhl
Founder & CTO
Rich@1Strategy.com
Definition of Terms
At Rest – when data is in a stored location
In Transit – when data is moved to and from storage
In Process – when data is held in temporary space during processing
Architecture
[Architecture diagram. Components shown: Amazon CloudWatch, AWS CloudTrail, AWS IAM, Cambia Data Center, Amazon S3, Amazon DynamoDB, AWS Lambda, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift. Labeled areas: Data Lake, Metadata, Security, Data Science & Analytics (Amazon Redshift, Amazon EMR), Master Data Management (Amazon EMR).]
Key Management
AWS KMS: Master Key, Encryption Keys, Exchanging Keys, Temporary Keys
Encryption at Rest
EMRFS on S3 – achieved via S3 client-side encryption with AWS KMS.
HDFS on EMR Cluster – achieved via Hadoop Distributed File System (HDFS) transparent data encryption, as described in the Apache docs.
Config file: encrypted
Encryption at Rest – EMRFS on S3
{
  "Sid": "DenyUnEncryptedObjectUploads",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::prd-datalake/*",
  "Condition": {
    "StringNotEquals": {
      "s3:x-amz-server-side-encryption": "AES256"
    }
  }
}
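The statement above is a fragment; a bucket policy also needs the surrounding `Version` and `Statement` wrapper. A minimal Python sketch that builds the complete document is shown here. The function name is hypothetical, and the bucket name is simply the one used on the slide.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str) -> dict:
    """Build a bucket policy that denies PutObject requests which do not
    ask for SSE-S3 (AES256) server-side encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnEncryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "AES256"
                    }
                },
            }
        ],
    }


policy_json = json.dumps(deny_unencrypted_uploads_policy("prd-datalake"), indent=2)
print(policy_json)
```

The resulting string is in the form that the boto3 S3 client call `put_bucket_policy(Bucket=..., Policy=policy_json)` accepts; the sketch stops short of calling AWS so it can run without credentials.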
Encryption at Rest – HDFS on EMR Cluster
Uses native Hadoop HDFS Transparent Data Encryption (DEK/EDEK)
Components: Data Encryption Key (DEK), Encrypted Data Encryption Key (EDEK), Hadoop KMS, Bootstrap Script
Encryption at Rest – HDFS on EMR Cluster (Bootstrap Script)
{
  "Classification": "hdfs-site",
  "Properties": {
    "dfs.encryption.key.provider.uri": "kms://…",
    "dfs.namenode.name.dir": "file:///…",
    "dfs.name.dir": "/mnt/encrypted/…",
    "dfs.data.dir": "/mnt/encrypted/…",
    "dfs.datanode.data.dir": "file:///…"
  }
}
Summary of Encryption at Rest
EMRFS on S3
HDFS on EMR Cluster
Encryption in Transit
EMRFS on S3, HDFS on EMR Cluster
<configuration>
<!-- Client certificate Store -->
<property>
<name>ssl.client.keystore.type</name>
<value>jks</value>
</property>
<property>
<name>ssl.client.keystore.location</name>
<value>/etc/emr/security/ssl/keystore.jks</value>
</property>
<property>
<name>ssl.client.keystore.password</name>
<value>changeit</value>
</property>
<!-- Client Trust Store -->
<property>
<name>ssl.client.truststore.type</name>
<value>jks</value>
</property>
<property>
<name>ssl.client.truststore.location</name>
<value>/etc/emr/security/ssl/truststore.jks</value>
</property>
<property>
<name>ssl.client.truststore.password</name>
<value>changeit</value>
</property>
<property>
<name>ssl.client.truststore.reload.interval</name>
<value>10000</value>
</property>
</configuration>
Three areas to address
1. Hadoop RPC - used by API clients of MapReduce
2. HDFS Data Transfer Protocol (DTP) - with HDFS transparent encryption enabled, this traffic is encrypted automatically
3. Hadoop MapReduce Shuffle - MapReduce shuffles and sorts the output of each map task to reducers on different nodes
Encryption in Transit - Cluster
Hadoop RPC - Hadoop RPC is used by API clients of MapReduce
[Diagram: RPC client, HDFS on EMR Cluster, EMRFS on S3]
<property>
<name>hadoop.security.service.user.name.key</name>
<value></value>
<description>
For those cases where the same RPC protocol is implemented by multiple
servers, this configuration is required for specifying the principal
name to use for the service when the client wishes to make an RPC call.
</description>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
<description>A comma-separated list of protection values for secured sasl
connections. Possible values are authentication, integrity and privacy.
authentication means authentication only and no integrity or privacy;
integrity implies authentication and integrity are enabled; and privacy
implies all of authentication, integrity and privacy are enabled.
hadoop.security.saslproperties.resolver.class can be used to override
the hadoop.rpc.protection for a connection at the server side.
</description>
</property>
Encryption in Transit - Cluster
HDFS Data Transfer Protocol (DTP) – with HDFS transparent encryption enabled, DTP traffic is encrypted automatically.
[Diagram: Data Encryption Key (DEK), Encrypted Data Encryption Key (EDEK), Hadoop KMS]
Encryption in Transit - Cluster
EMRFS
on S3
EMR
Cluster
<property>
<name>dfs.encrypt.data.transfer</name>
<value>true</value>
<description>
Whether or not actual block data that is read/written from/to HDFS should
be encrypted on the wire. This only needs to be set on the NN and DNs,
clients will deduce this automatically. It is possible to override this setting
per connection by specifying custom logic via dfs.trustedchannel.resolver.class.
</description>
</property>
<property>
<name>dfs.encrypt.data.transfer.algorithm</name>
<value></value>
<description>
This value may be set to either "3des" or "rc4". If nothing is set, then
the configured JCE default on the system is used (usually 3DES.) It is
widely believed that 3DES is more cryptographically secure, but RC4 is
substantially faster.
</description>
</property>
[Diagram: Data Encryption Key (DEK), Encrypted Data Encryption Key (EDEK), Hadoop KMS]
HDFS Data Transfer Protocol (DTP) encryption is configured on startup with a bootstrap script
Encryption in Transit - Cluster
Hadoop
Encrypted
Shuffle and Sort
Hadoop MapReduce Shuffle - In the shuffle phase, Hadoop MapReduce (MRv2) shuffles the output of
each map task to reducers on different nodes using HTTP by default.
EMR
Cluster
Encryption in Transit - Cluster
EMRFS
on S3
{
  "Classification": "mapred-site",
  "Properties": {
    "mapreduce.shuffle.ssl.enabled": "true",
    "mapred.local.dir": "/mnt/encrypted/mapred,/mnt1/encrypted/mapred",
    "mapreduce.cluster.local.dir": "/mnt/encrypted/mapred,/mnt1/encrypted/mapred",
    "mapreduce.application.classpath": "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,/usr/share/aws/emr/cloudwatch-sink/lib/*,/etc/emr/security/conf"
  }
}
Hadoop
Encrypted
Shuffle and Sort
Encryption in Transit - Cluster
EMRFS
on S3
EMR
Cluster
Encryption in Transit - Cluster
Spark block transfer service – This can be encrypted using SASL encryption in Spark 1.5.1 and later.
{
  "Classification": "spark-defaults",
  "Properties": {
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.network.sasl.serverAlwaysEncrypt": "true"
  }
}
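The in-transit settings from the preceding slides can be gathered into one EMR configuration list. The Python sketch below is a hedged consolidation rather than the presenters' exact configuration. It assumes `hadoop.rpc.protection` should be `privacy` (the slide only shows the default `authentication`, while its description says `privacy` enables all of authentication, integrity, and privacy), it assumes the `core-site` and `spark-defaults` classifications, and it adds `spark.authenticate`, which Spark requires before SASL encryption takes effect. The function name is hypothetical. Keystores, Kerberos, and the local-directory settings shown on the slides are left out.

```python
import json


def in_transit_configurations() -> list:
    """Return an EMR 'Configurations' list covering the in-transit settings."""
    return [
        # Hadoop RPC: 'privacy' adds encryption on top of authentication
        # and integrity (assumed target value, see lead-in).
        {"Classification": "core-site",
         "Properties": {"hadoop.rpc.protection": "privacy"}},
        # HDFS block data transfer between clients and DataNodes.
        {"Classification": "hdfs-site",
         "Properties": {"dfs.encrypt.data.transfer": "true"}},
        # MapReduce encrypted shuffle.
        {"Classification": "mapred-site",
         "Properties": {"mapreduce.shuffle.ssl.enabled": "true"}},
        # Spark block transfer service over SASL.
        {"Classification": "spark-defaults",
         "Properties": {
             "spark.authenticate": "true",
             "spark.authenticate.enableSaslEncryption": "true",
             "spark.network.sasl.serverAlwaysEncrypt": "true",
         }},
    ]


print(json.dumps(in_transit_configurations(), indent=2))
```

Written to a file, the printed JSON is in the shape accepted by `aws emr create-cluster --configurations file://...`.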
Encryption in Transit
Encryption in Process
Temporary
Space on EBS
Volumes
Temporary
Keys
Bootstrap Script
Encryption in Process
Bootstrap Script
function encrypt_disk() {
  local dev=$1
  local dir=$2
  local cryptname="crypt_${dir:1}"
  # Unmount the drive
  sudo umount "$dev"
  # Encrypt the drive
  sudo cryptsetup luksFormat -q --key-file "$PWD_FILE" "$dev"
  sudo cryptsetup luksOpen -q --key-file "$PWD_FILE" "$dev" "$cryptname"
  # Format the drive
  sudo mkfs -t xfs "/dev/mapper/$cryptname"
  sudo mount -o defaults,noatime,inode64 "/dev/mapper/$cryptname" "$dir"
  sudo rm -rf "$dir/lost+found"
  sudo mkdir -p "$dir/encrypted"
  sudo chown -R hadoop:hadoop "$dir"
  echo "/dev/mapper/$cryptname $dir xfs defaults,noatime,inode64 0 0" |
    sudo tee -a /etc/fstab
  echo "$cryptname $dev $PWD_FILE" | sudo tee -a /etc/crypttab
}
Summary of the EMR Encryption Process
At rest: EMRFS on S3, HDFS on EMR Cluster
In transit: RPC, Native DTP, Hadoop Encrypted Shuffle and Sort
In process: Temporary Space on EBS Volumes
EMR Updates
1Strategy blog links
amzn.to/2g0JJIN
September 21st, 2016
bit.ly/1strategy_emr
AWS EMR Encryption Documentation
EMR Updates and how they play into this
Temporary
Space on EBS
Volumes
Elasticsearch for Healthcare
Encryption and Authentication
Elasticsearch on EC2 Instances
Elasticsearch Encryption Process Summary
EMRFS on S3, Temporary Space on EBS Volumes, Elasticsearch on EC2 Instances
HIPAA is more than encryption
Auditing & custom tools:
• Audit script to show that only limited users have access to encrypted S3 data
• Show that S3 buckets are encrypted
• Show that S3 objects are encrypted
*Working with Cambia to open source these tools
bit.ly/1strategy_emr_code
Demo
Ujjwal Ratan
Solutions Architect, AWS
Ujjwalr@Amazon.com
Machine Learning inside Healthcare
Analyzing Medical Images
Prescription Compliance Prediction
Evidence Based & Precision Medicine
Text classification and mining
Medicare and Medicaid Fraud
Hospital Bed Utilization
Treatment Queries and Suggestions
Drug Discovery and Clinical Trials
Population Health
Vaccination and Immunization
Omics and Clinical Data Integration
Patient Outcomes
Patient Readmission
Prediction through risk
stratification
Real World Problem – Hospital Readmissions
• The Hospital Readmissions Reduction Program (HRRP) is part of the Affordable Care Act.
• The Centers for Medicare & Medicaid Services (CMS) is required to reduce payments to hospitals with excess readmissions.
• Not all readmissions can be prevented.
• Facilities with high readmission rates had their Medicare payments cut by 1% in 2013, which rose to 2% in 2014.
Source - www.ncbi.nlm.nih.gov/pmc/articles/PMC3558794
Our Focus
Utilizing AWS For Machine Learning (ML)
Continuum of Machine Learning Solutions
Amazon Machine Learning
• Limited ML options
  • Binary
  • Multiclass
  • Regression
• Simple to train
• Easy to evaluate
• Quick to deploy
Amazon EMR + Spark ML
• Comprehensive ML options
• Requires work to train
• No support for evaluation
• Additional work to deploy
• Scalable
• Customizable
Introducing Amazon Machine Learning (AML)
• Easy to use, managed machine learning
service built for developers
• Robust, powerful machine learning
technology based on Amazon’s internal
systems
• Use your data already stored in the
AWS cloud
• Models in production within seconds
Machine Learning
Proactive Prediction of Readmission
Inputs: Patient Demographics, Patient History, Admission Attributes, Other features
Output per patient: High Risk Patient, Moderate Risk Patient, or Low Risk Patient
AML Application for Predicting Readmissions
[Diagram with numbered steps 1 to 5. Components shown: CSV Files, Amazon S3, Amazon Redshift, Amazon Machine Learning, Amazon Cognito, S3 Static Website, users (Internet)]
Clinical Data Set
https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008
• 101,766 rows
• 10 years of clinical care
• 130 US hospitals
• 50+ attributes of diabetes patients and hospital outcomes
Ingesting Data into S3 - Staging
Table Name Table Type
admission_source.csv Master
admission_type.csv Master
discharge_disposition.csv Master
Diabetic_data.csv Transaction
aws s3 cp /tmp/foo/ s3://bucket/ --recursive
Schema in Redshift
-- Dimension tables (Dim1, Dim2, Dim3)
create table admission_type (
    admission_type_id INTEGER NOT NULL,
    description VARCHAR(100)
);
create table discharge_disposition (
    discharge_disposition_id INTEGER NOT NULL,
    description VARCHAR(500)
);
create table admission_source (
    admission_source_id INTEGER NOT NULL,
    description VARCHAR(500)
);
-- Fact table
create table diabetes_data (
    -- ~50 attributes
);
Data Load and Standardization
COPY <Redshift_Table_Name> FROM 's3://<file_path.csv>' CREDENTIALS
'aws_access_key_id=<>;aws_secret_access_key=<>' DELIMITER ',' IGNOREHEADER 1;
Data Load
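One COPY statement is needed per staged file, so the statements can be generated rather than typed by hand. The Python sketch below is an illustration under assumptions. The function name and the role ARN are hypothetical, the bucket name comes from the staging slide, and it uses the Redshift `IAM_ROLE` parameter in place of the `CREDENTIALS` string on the slide so that no access keys appear in SQL. That substitution is this sketch's choice, not the presenters'. Inputs are assumed to be trusted, since they are interpolated directly into the SQL text.

```python
def build_copy_statement(table: str, s3_path: str, iam_role_arn: str,
                         delimiter: str = ",", ignore_header: int = 1) -> str:
    """Return a Redshift COPY statement for one delimited file on S3."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"DELIMITER '{delimiter}' IGNOREHEADER {ignore_header};"
    )


# Hypothetical role ARN; table and file names follow the staging slide.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole"
MASTER_TABLES = ["admission_source", "admission_type", "discharge_disposition"]

for table in MASTER_TABLES:
    print(build_copy_statement(table, f"s3://bucket/{table}.csv", ROLE))
```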
• Update NULL values
• Change attribute values that do not comply with standard patterns.
  • ex: Phone = (206) XXX-XXXX
• Complete geographical data where possible
• Include timeline values if possible
• Group granular attributes into sets.
  • ex: Ages 0 to 20 as youth, 20 to 40 as adult, and so on.
Data Standardization
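Two of the standardization steps above, pattern normalization and grouping granular values, can be sketched in Python. The function names are hypothetical. The bucket boundaries are treated as half-open intervals (20 counts as adult), and the label for ages 40 and over is a placeholder because the slide only says "and so on". Missing values are mapped to an explicit marker.

```python
import re
from typing import Optional


def normalize_phone(raw: Optional[str]) -> Optional[str]:
    """Rewrite a 10-digit US phone number as '(206) XXX-XXXX' style text.
    Returns None when the input is missing or not 10 digits."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def age_group(age: Optional[int]) -> str:
    """Group an age into a coarse bucket; None becomes 'unknown'."""
    if age is None:
        return "unknown"
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 20:
        return "youth"
    if age < 40:
        return "adult"
    return "40_and_over"  # placeholder label, not from the slide


print(normalize_phone("206.555.0100"))  # (206) 555-0100
print(age_group(19), age_group(20), age_group(None))  # youth adult unknown
```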
Create AML Data Source with Redshift
CreateDataSourceFromRedshift API
Console
Real-time Predictions Using API
• Synchronous, low-latency, high-throughput prediction generation
• Request through service API or server or mobile SDKs
• Best for interactive applications that deal with individual data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> ml.predict(
...     ml_model_id='my_model',
...     predict_endpoint='example_endpoint',
...     record={'key1': 'value1', 'key2': 'value2'})
{
    'Prediction': {
        'predictedValue': 13.284348,
        'details': {
            'Algorithm': 'SGD',
            'PredictiveModelType': 'REGRESSION'
        }
    }
}
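For the readmission use case, a prediction has to be turned into one of the three risk groups named earlier in the deck. The Python sketch below shows one way that could look. It assumes a binary model whose response carries a single entry in `predictedScores` (the sample response above is for a regression model, so this shape is an assumption), and the function names and the 0.3 and 0.7 cut-offs are hypothetical, not values from the session.

```python
from typing import Any, Dict


def score_from_response(response: Dict[str, Any]) -> float:
    """Pull the single score out of an assumed binary-model response."""
    scores = response["Prediction"]["predictedScores"]
    # Assumption: the map holds exactly one label -> score entry.
    (score,) = scores.values()
    return float(score)


def risk_tier(score: float, moderate_at: float = 0.3, high_at: float = 0.7) -> str:
    """Map a score in [0, 1] to a risk group using hypothetical cut-offs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high_at:
        return "High Risk Patient"
    if score >= moderate_at:
        return "Moderate Risk Patient"
    return "Low Risk Patient"


# Fabricated response for illustration only.
sample = {"Prediction": {"predictedLabel": "1", "predictedScores": {"1": 0.82}}}
print(risk_tier(score_from_response(sample)))  # High Risk Patient
```

In practice the cut-offs would be chosen from the model's evaluation results rather than fixed in code.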
Application Website Hosted on S3
var machinelearning = new AWS.MachineLearning({apiVersion: '2014-12-12'});
var params = {
  MLModelId: '<AML Model ID>',
  PredictEndpoint: '<AML Model Real Time End Point>',
  Record: <Selected Attributes record set>
};
var request = machinelearning.predict(params);
Application calls the Predict() API using necessary parameters
Website hosting in S3 without web servers eliminates complexities of
scaling hardware based on traffic routed to your application.
bit.ly/aml_demo - Demo bit.ly/hcl301_blog - Blog
Expanded Architecture
Components shown: Corporate Data Center, Amazon S3 (Data Lake), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon EC2, Amazon RDS, Amazon QuickSight, users (Internet)
Inputs: DB schemas, CSV files, unstructured files
• Process unstructured and semi-structured data
• Make data suitable to act as an ML data source
• An ML model is created with Redshift as the data source
• EC2 as a frontend for the AML endpoint
• Batch predictions are generated and stored in S3
• An RDS schema acts as a source for QuickSight
• QuickSight generates BI reports on prediction data
Thank you!
Join us tonight at the Health Care happy hour
sponsored by Cambia Health Solutions,
8KMiles.com and AWS at:
Japonais restaurant in the Mirage
on Monday 11/28 from 6-8 PM
AWS and Cambia are co-presenting:
SEC305 – Scaling Security Resources for
Your First 10 Million Customers
Tuesday, Nov 29, 12:30 PM - 1:30 PM
Do you want to know
more about how to
secure health data?
Remember to complete
your evaluations!

Más contenido relacionado

La actualidad más candente

The New Normal - AWSome Day Zurich 112016
The New Normal - AWSome Day Zurich 112016The New Normal - AWSome Day Zurich 112016
The New Normal - AWSome Day Zurich 112016Amazon Web Services
 
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...Amazon Web Services
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...Amazon Web Services
 
Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017Amazon Web Services
 
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesAmazon Web Services
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWSAmazon Web Services
 
What's New with Big Data Analytics
What's New with Big Data AnalyticsWhat's New with Big Data Analytics
What's New with Big Data AnalyticsAmazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Structured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSStructured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSAmazon Web Services
 
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPTHow EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPTAmazon Web Services
 
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Amazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
February 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWSFebruary 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWSAmazon Web Services
 
Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016Amazon Web Services
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Amazon Web Services
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleAmazon Web Services
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataTreasure Data, Inc.
 

La actualidad más candente (20)

The New Normal - AWSome Day Zurich 112016
The New Normal - AWSome Day Zurich 112016The New Normal - AWSome Day Zurich 112016
The New Normal - AWSome Day Zurich 112016
 
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
 
Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017
 
Securing Your Big Data on AWS
Securing Your Big Data on AWSSecuring Your Big Data on AWS
Securing Your Big Data on AWS
 
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 
What's New with Big Data Analytics
What's New with Big Data AnalyticsWhat's New with Big Data Analytics
What's New with Big Data Analytics
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Structured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSStructured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWS
 
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPTHow EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
 
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
February 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWSFebruary 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWS
 
Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_data
 

Destacado

Splunk for Enterprise Security featuring User Behavior Analytics
Splunk for Enterprise Security featuring User Behavior AnalyticsSplunk for Enterprise Security featuring User Behavior Analytics
Splunk for Enterprise Security featuring User Behavior AnalyticsSplunk
 
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...Amazon Web Services
 
AWS re:Invent 2016: Case Study: Data-Heavy Healthcare: UPMCe’s Transformative...
AWS re:Invent 2016: Case Study: Data-Heavy Healthcare: UPMCe’s Transformative...AWS re:Invent 2016: Case Study: Data-Heavy Healthcare: UPMCe’s Transformative...
AWS re:Invent 2016: Case Study: Data-Heavy Healthcare: UPMCe’s Transformative...Amazon Web Services
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisAnton Chuvakin
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
AWS re:Invent 2016: Security Automation: Spend Less Time Securing Your Applic...
AWS re:Invent 2016: Security Automation: Spend Less Time Securing Your Applic...AWS re:Invent 2016: Security Automation: Spend Less Time Securing Your Applic...
AWS re:Invent 2016: Security Automation: Spend Less Time Securing Your Applic...Amazon Web Services
 
Cloud Connect 2013- Lock Stock and x Smoking EC2's
Cloud Connect 2013- Lock Stock and x Smoking EC2'sCloud Connect 2013- Lock Stock and x Smoking EC2's
Cloud Connect 2013- Lock Stock and x Smoking EC2'sHarish Ganesan
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)Lynn Cherny
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)Amazon Web Services
 
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...Amazon Web Services
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...Amazon Web Services
 
AWS re:Invent 2016: Datapipe Open Source: Image Development Pipeline (ARC319)
AWS re:Invent 2016: Datapipe Open Source:  Image Development Pipeline (ARC319)AWS re:Invent 2016: Datapipe Open Source:  Image Development Pipeline (ARC319)
AWS re:Invent 2016: Datapipe Open Source: Image Development Pipeline (ARC319)Amazon Web Services
 
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016 | HLC301 | Data Science and Healthcare: Running Large Scale Analytics and Machine Learning on AWS

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Arnoud Otte, Assistant Director Cloud & Data Architecture, Cambia Health Solutions Rich Uhl, CTO / Founder, 1Strategy Ujjwal Ratan, Solutions Architect, AWS November 28, 2016 HLC301 Data Science and Healthcare: Running Large Scale Analytics and Machine Learning on AWS
  • 2. What to Expect from the Session • Benefits from large-scale analytics with PHI - Arnoud • Securing Amazon EMR & Elasticsearch - Rich • Additional solution components for HIPAA compliance [demo] - Rich • Reducing cost and improving quality of care with Amazon Machine Learning [demo] - Ujjwal NOTE: This is a deep dive session on HOW rather than WHAT. We will show implementation details. • This session expects familiarity with: • AWS services - EMR and S3 (related sessions: BDM401 - Deep Dive: Amazon EMR Best Practices & Design Patterns; BDA206 - Building Big Data Applications with the AWS Big Data Platform) • Encryption and distributed systems like Hadoop and Elasticsearch
  • 3. Arnoud Otte Assistant Director Cloud & Data Architecture Arnoud.Otte@CambiaHealth.com
  • 4. Cambia Health Solutions. Our Roots: Born from an inspired idea. Our Cause: Becoming catalysts for transformation. Our Vision: Delivering a reimagined health care experience.
  • 6. Architecture (diagram). Components shown: Cambia Data Center, Amazon S3, Amazon DynamoDB, AWS Lambda, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon CloudWatch, AWS CloudTrail, AWS IAM. Areas shown: Data Lake, Metadata, Security, Data Science & Analytics (Amazon EMR), Master Data Management (Amazon EMR).
  • 7. Master Data Management: Are these the same people?
      Example 1     Source A        Source B
      First Name    John            John
      Last Name     Doe             Doe
      DOB           1970-01-01      2016-11-28
      Street        105 Main St     105 Main St
      City          Portland        Portland
      State         OR              OR
      Answer: No. Father and son.

      Example 2     Source A        Source B
      First Name    Jillian         Jill
      Last Name     Doe             Doe-Doe
      SSN           123-45-6789     123-45-6789
      Street        605 Oak Dr      105 Main Street
      City          PDX             Portland
      State         OR              Oregon
      Answer: Yes. Married, changed name, and moved.

      This is artificial data fabricated for illustration purposes only.
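The two examples on slide 7 can be expressed as a small rule-based matcher. This is only a minimal sketch to illustrate why naive field comparison fails; the `Person` type, the `same_person` function, and the three rules are hypothetical and are not Cambia's Match and Merge logic, which the slides do not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    first_name: str
    last_name: str
    street: str
    city: str
    state: str
    dob: Optional[str] = None   # ISO date string, if the source provides one
    ssn: Optional[str] = None   # strong identifier, if the source provides one


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def same_person(a: Person, b: Person) -> bool:
    """Hypothetical rules: a shared strong identifier wins, then DOB plus name."""
    # Rule 1: if both records carry an SSN, it decides the match.
    if a.ssn and b.ssn:
        return a.ssn == b.ssn
    # Rule 2: if both records carry a DOB, it must agree along with the name.
    if a.dob and b.dob:
        return (a.dob == b.dob
                and _norm(a.first_name) == _norm(b.first_name)
                and _norm(a.last_name) == _norm(b.last_name))
    # Rule 3: not enough evidence to merge.
    return False


# Artificial records copied from the slide.
john_a = Person("John", "Doe", "105 Main St", "Portland", "OR", dob="1970-01-01")
john_b = Person("John", "Doe", "105 Main St", "Portland", "OR", dob="2016-11-28")
jill_a = Person("Jillian", "Doe", "605 Oak Dr", "PDX", "OR", ssn="123-45-6789")
jill_b = Person("Jill", "Doe-Doe", "105 Main Street", "Portland", "Oregon", ssn="123-45-6789")

print(same_person(john_a, john_b))  # False: same name and address, different DOB
print(same_person(jill_a, jill_b))  # True: different name and address, same SSN
```

Name and address alone would merge the first pair and split the second, which is the opposite of the correct answer in both cases.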
  • 8. Master Data Management – Approach (diagram). Source domains: Demographics, Laboratory, Pharmaceutics, Geography, Claims. Processing: Cambia Match and Merge on Amazon EMR. Result: composite record of best values.
  • 9. Master Data Management – Quality. Match Correctness: Vendor 98.50%, Cambia V1 99.90%, Cambia V1.1 99.99%. Match Completeness: Vendor 98.80%, Cambia V1 84.30%, Cambia V1.1 98.10%. Test set: 7,000+ records containing 1,600+ matches, manually checked and confirmed in the real world.
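The slide does not define its two quality metrics. One plausible reading, assumed here and not confirmed by the source, is that match correctness is the share of reported matches that are real (precision) and match completeness is the share of real matches that were found (recall). A minimal sketch under that assumption, with hypothetical record IDs:

```python
from typing import FrozenSet, Set, Tuple


def pair(a: int, b: int) -> FrozenSet[int]:
    """Order-insensitive match pair, so (1, 2) and (2, 1) count as the same match."""
    return frozenset((a, b))


def match_quality(predicted: Set[FrozenSet[int]],
                  actual: Set[FrozenSet[int]]) -> Tuple[float, float]:
    """Return (correctness, completeness) under the precision/recall assumption."""
    true_matches = len(predicted & actual)
    correctness = true_matches / len(predicted) if predicted else 1.0
    completeness = true_matches / len(actual) if actual else 1.0
    return correctness, completeness


# Hypothetical example: four verified matches, three reported, two of them correct.
actual = {pair(1, 2), pair(3, 4), pair(5, 6), pair(7, 8)}
predicted = {pair(2, 1), pair(3, 4), pair(9, 10)}

correctness, completeness = match_quality(predicted, actual)
print(f"correctness={correctness:.3f} completeness={completeness:.3f}")
```

Under this reading, the slide's numbers would mean Cambia V1 reported very few false matches but missed more true matches than the vendor, and V1.1 closed most of that gap.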
  • 10. Master Data Management – Performance. Run time: Vendor 2160 minutes (36 hours), Cambia V1 90 minutes, Cambia V1.1 40 minutes. Data set: 17.7M records containing 1.8M matches.
  • 11. Next Steps • Scale in and out or up and down • Build out healthcare data science models • HIPAA compliant search on data. Services shown: Amazon Machine Learning, Amazon EMR, Amazon EC2.
  • 12. Security, Big Data. 1Strategy.com | @1strategy_cloud | Booth #408. Rich Uhl, Founder & CTO, Rich@1Strategy.com
  • 13. Definition of Terms. At Rest – when data is in a stored location. In Transit – when data is moved to and from storage. In Process – when data is in temporary space for processing state.
  • 14. Architecture (diagram). Components shown: Cambia Data Center, Amazon S3, Amazon DynamoDB, AWS Lambda, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon CloudWatch, AWS CloudTrail, AWS IAM. Areas shown: Data Lake, Metadata, Security, Data Science & Analytics (Amazon EMR), Master Data Management (Amazon EMR).
  • 15. Key Management (diagram): AWS KMS, Encryption Keys, Exchanging Keys, Temporary Keys, Master Key.
  • 17. Encryption at Rest. EMRFS on S3: achieved via S3 client-side encryption with AWS KMS. HDFS on EMR Cluster: achieved via Hadoop Distributed File System (HDFS) transparent data encryption, as described in the Apache docs. Diagram label: Config File Encrypted.
  • 18. Encryption at Rest – EMRFS on S3
      {
        "Sid": "DenyUnEncryptedObjectUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::prd-datalake/*",
        "Condition": {
          "StringNotEquals": {
            "s3:x-amz-server-side-encryption": "AES256"
          }
        }
      }
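Slide 18 shows a single policy statement; a bucket policy needs it wrapped in a document with a `Version` and a `Statement` list. The sketch below builds that document in plain Python. The function name is hypothetical, and the `2012-10-17` version string is the standard policy language version rather than something shown on the slide.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str, algorithm: str = "AES256") -> dict:
    """Build a bucket policy that denies PutObject unless the SSE header matches."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnEncryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": algorithm
                    }
                },
            }
        ],
    }


# "prd-datalake" is the bucket name used on the slide.
policy = deny_unencrypted_uploads_policy("prd-datalake")
print(json.dumps(policy, indent=2))
```

The resulting JSON string could then be attached with the AWS CLI or an SDK call such as boto3's `put_bucket_policy(Bucket=..., Policy=json.dumps(policy))`; that step is left out so the sketch runs without AWS credentials.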
  • 19. Encryption at Rest – HDFS on EMR Cluster. Uses native Hadoop HDFS Transparent Data Encryption (DEK/EDEK): a Data Encryption Key (DEK) and an Encrypted Data Encryption Key (EDEK), with keys served by Hadoop KMS. Set up by a bootstrap script.
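The DEK/EDEK pair on slide 19 follows the general envelope-encryption pattern: each file gets its own data key, and only an encrypted copy of that key is stored next to the data. The sketch below illustrates the pattern only. It is not Hadoop's implementation, every name in it is hypothetical, and the cipher is a toy stand-in built from SHA-256 that must not be used to protect real data.

```python
import hashlib
import secrets


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (SHA-256 in counter mode). Illustration only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt_file(zone_key: bytes, plaintext: bytes) -> dict:
    """Encrypt data with a fresh DEK, then wrap the DEK with the zone key (EDEK)."""
    dek = secrets.token_bytes(32)            # per-file data encryption key
    key_nonce = secrets.token_bytes(16)
    data_nonce = secrets.token_bytes(16)
    return {
        "edek": _toy_cipher(zone_key, key_nonce, dek),   # stored with file metadata
        "key_nonce": key_nonce,
        "data_nonce": data_nonce,
        "ciphertext": _toy_cipher(dek, data_nonce, plaintext),
    }


def decrypt_file(zone_key: bytes, record: dict) -> bytes:
    """Unwrap the EDEK with the zone key, then decrypt the data with the DEK."""
    dek = _toy_cipher(zone_key, record["key_nonce"], record["edek"])
    return _toy_cipher(dek, record["data_nonce"], record["ciphertext"])


zone_key = secrets.token_bytes(32)   # in HDFS this key stays inside the key server
record = encrypt_file(zone_key, b"artificial patient record")
print(decrypt_file(zone_key, record))
```

The design point is that the storage layer only ever holds the ciphertext and the EDEK; reading the data requires a round trip to whoever holds the zone key.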
  • 20. Encryption at Rest – HDFS on EMR Cluster (Bootstrap Script)
      {
        "Classification": "hdfs-site",
        "Properties": {
          "dfs.encryption.key.provider.uri": "kms://…",
          "dfs.namenode.name.dir": "file:///…",
          "dfs.name.dir": "/mnt/encrypted/…",
          "dfs.data.dir": "/mnt/encrypted/…",
          "dfs.datanode.data.dir": "file:///…"
        }
      }
  • 21. Summary of Encryption at Rest: EMRFS on S3, HDFS on EMR Cluster
  • 23. Encryption in Transit: HDFS on EMR Cluster, EMRFS on S3
  • 24. Encryption in Transit – EMRFS on S3, HDFS on EMR Cluster
      <configuration>
        <!-- Client certificate store -->
        <property>
          <name>ssl.client.keystore.type</name>
          <value>jks</value>
        </property>
        <property>
          <name>ssl.client.keystore.location</name>
          <value>/etc/emr/security/ssl/keystore.jks</value>
        </property>
        <property>
          <name>ssl.client.keystore.password</name>
          <value>changeit</value>
        </property>
        <!-- Client trust store -->
        <property>
          <name>ssl.client.truststore.type</name>
          <value>jks</value>
        </property>
        <property>
          <name>ssl.client.truststore.location</name>
          <value>/etc/emr/security/ssl/truststore.jks</value>
        </property>
        <property>
          <name>ssl.client.truststore.password</name>
          <value>changeit</value>
        </property>
        <property>
          <name>ssl.client.truststore.reload.interval</name>
          <value>10000</value>
        </property>
      </configuration>
  • 25. Encryption in Transit - Cluster (HDFS on EMR Cluster). Three areas to address: 1. Hadoop RPC - used by API clients of MapReduce. 2. HDFS DTP (Data Transfer Protocol) - with HDFS transparent encryption enabled, this traffic is automatically encrypted. 3. Hadoop MapReduce Shuffle - MapReduce shuffles and sorts the output of each map task to reducers on different nodes.
  • 26. Encryption in Transit - Cluster. Hadoop RPC - used by API clients of MapReduce. Diagram: RPC client, EMR Cluster, EMRFS on S3.
  • 27. Encryption in Transit - Cluster (RPC client)
      <property>
        <name>hadoop.security.service.user.name.key</name>
        <value></value>
        <description>
          For those cases where the same RPC protocol is implemented by multiple
          servers, this configuration is required for specifying the principal
          name to use for the service when the client wishes to make an RPC call.
        </description>
      </property>
      <property>
        <name>hadoop.rpc.protection</name>
        <value>authentication</value>
        <description>
          A comma-separated list of protection values for secured sasl
          connections. Possible values are authentication, integrity and privacy.
          authentication means authentication only and no integrity or privacy;
          integrity implies authentication and integrity are enabled; and privacy
          implies all of authentication, integrity and privacy are enabled.
          hadoop.security.saslproperties.resolver.class can be used to override
          the hadoop.rpc.protection for a connection at the server side.
        </description>
      </property>
  • 28. Encryption in Transit - Cluster. HDFS Data Transfer Protocol (DTP) – with HDFS transparent encryption enabled, this traffic is encrypted automatically. Diagram: Data Encryption Key (DEK), Encrypted Data Encryption Key (EDEK), Hadoop KMS, EMR Cluster, EMRFS on S3.
  • 29. Encryption in Transit - Cluster. Hadoop Data Transfer Protocol (DTP) is configured on startup with a bootstrap script. Diagram: Data Encryption Key (DEK), Encrypted Data Encryption Key (EDEK), Hadoop KMS.
      <property>
        <name>dfs.encrypt.data.transfer</name>
        <value>true</value>
        <description>
          Whether or not actual block data that is read/written from/to HDFS
          should be encrypted on the wire. This only needs to be set on the NN
          and DNs, clients will deduce this automatically. It is possible to
          override this setting per connection by specifying custom logic via
          dfs.trustedchannel.resolver.class.
        </description>
      </property>
      <property>
        <name>dfs.encrypt.data.transfer.algorithm</name>
        <value></value>
        <description>
          This value may be set to either "3des" or "rc4". If nothing is set,
          then the configured JCE default on the system is used (usually 3DES.)
          It is widely believed that 3DES is more cryptographically secure, but
          RC4 is substantially faster.
        </description>
      </property>
  • 30. Encryption in Transit - Cluster: Hadoop Encrypted Shuffle and Sort. Hadoop MapReduce Shuffle - In the shuffle phase, Hadoop MapReduce (MRv2) shuffles the output of each map task to reducers on different nodes using HTTP by default. Diagram: EMR Cluster, EMRFS on S3.
  • 31. Encryption in Transit - Cluster: Hadoop Encrypted Shuffle and Sort
      {
        "Classification": "mapred-site",
        "Properties": {
          "mapreduce.shuffle.ssl.enabled": "true",
          "mapred.local.dir": "/mnt/encrypted/mapred,/mnt1/encrypted/mapred",
          "mapreduce.cluster.local.dir": "/mnt/encrypted/mapred,/mnt1/encrypted/mapred",
          "mapreduce.application.classpath": "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,\n $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,\n /usr/lib/hadoop-lzo/lib/*,\n /usr/share/aws/emr/emrfs/conf,\n /usr/share/aws/emr/emrfs/lib/*,\n /usr/share/aws/emr/emrfs/auxlib/*,\n /usr/share/aws/emr/lib/*,\n /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,\n /usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,\n /usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,\n /usr/share/aws/emr/cloudwatch-sink/lib/*,\n /etc/emr/security/conf"
        }
      }
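The three in-transit areas from slide 25 map onto three Hadoop properties that appear on slides 27, 29, and 31. A minimal sketch that collects them into one EMR-style configuration list follows. The function name is hypothetical, and choosing `privacy` for `hadoop.rpc.protection` is an assumption: the slide shows the default value `authentication`, while the property description quoted on that slide says `privacy` is the level that turns on all three protections.

```python
import json

ALLOWED_RPC_PROTECTION = {"authentication", "integrity", "privacy"}


def in_transit_configurations(rpc_protection: str = "privacy") -> list:
    """Build configuration classifications for RPC, DTP, and shuffle encryption."""
    # hadoop.rpc.protection accepts a comma-separated list of the allowed values.
    values = [v.strip() for v in rpc_protection.split(",")]
    if any(v not in ALLOWED_RPC_PROTECTION for v in values):
        raise ValueError(f"unsupported hadoop.rpc.protection value: {rpc_protection!r}")
    return [
        # 1. Hadoop RPC
        {"Classification": "core-site",
         "Properties": {"hadoop.rpc.protection": ",".join(values)}},
        # 2. HDFS Data Transfer Protocol
        {"Classification": "hdfs-site",
         "Properties": {"dfs.encrypt.data.transfer": "true"}},
        # 3. MapReduce encrypted shuffle
        {"Classification": "mapred-site",
         "Properties": {"mapreduce.shuffle.ssl.enabled": "true"}},
    ]


print(json.dumps(in_transit_configurations(), indent=2))
```

The printed JSON has the same shape as the classification fragments on the slides, so it could be merged with the other properties shown there (local directories, classpath, key provider URI) before a cluster is created.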
  • 32. Encryption in Transit - Cluster. Spark block transfer service – This can be encrypted using SASL encryption in Spark 1.5.1 and later. Diagram: EMR Cluster, EMRFS on S3.
  • 33. Encryption in Transit
      {
        "Classification": "spark-env",
        "Properties": {
          "spark.authenticate.enableSaslEncryption": "true",
          "spark.network.sasl.serverAlwaysEncrypt": "true"
        }
      }
  • 36. Encryption in Process – Temporary Space on EBS Volumes (Bootstrap Script)
      function encrypt_disk() {
        local dev=$1
        local dir=$2
        local cryptname="crypt_${dir:1}"

        # Unmount the drive
        sudo umount "$dev"

        # Encrypt the drive
        sudo cryptsetup luksFormat -q --key-file "$PWD_FILE" "$dev"
        sudo cryptsetup luksOpen -q --key-file "$PWD_FILE" "$dev" "$cryptname"

        # Format the drive
        sudo mkfs -t xfs "/dev/mapper/$cryptname"
        sudo mount -o defaults,noatime,inode64 "/dev/mapper/$cryptname" "$dir"
        sudo rm -rf "$dir/lost+found"
        sudo mkdir -p "$dir/encrypted"
        sudo chown -R hadoop:hadoop "$dir"
        echo "/dev/mapper/$cryptname $dir xfs defaults,noatime,inode64 0 0" | sudo tee -a /etc/fstab
        echo "$cryptname $dev $PWD_FILE" | sudo tee -a /etc/crypttab
      }
  • 37. Summary of the EMR Encryption Process: HDFS on EMR Cluster, EMRFS on S3, Temporary Space on EBS Volumes, RPC, Hadoop Encrypted Shuffle and Sort, Native DTP
  • 38. EMR Updates 1Strategy blog links amzn.to/2g0JJIN September 21st, 2016 bit.ly/1strategy_emr AWS EMR Encryption Documentation
  • 39. EMR Updates and how they play into this
  • 40.
  • 41. ElasticSearch for HealthCare: Encryption and Authentication. Diagram: Temporary Space on EBS Volumes, ElasticSearch on EC2 Instances.
  • 42. ElasticSearch Encryption Process Summary: EMRFS on S3, Temporary Space on EBS Volumes, ElasticSearch on EC2 Instances
  • 43. HIPAA is more than encryption Auditing & custom tools: • Audit script to show limited users have access to encrypted S3 data • S3 Buckets are encrypted • Show S3 Objects are encrypted *Working with Cambia to open source these tools bit.ly/1strategy_emr_code
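One of the audit checks listed on slide 43, showing that S3 objects are encrypted, might look like the sketch below. This is not the 1Strategy tool, whose code the slides do not show. It assumes a boto3-style S3 client is passed in, relies on the `ServerSideEncryption` field that `head_object` returns for server-side-encrypted objects, and so does not detect client-side encryption. `FakeS3` is an in-memory stand-in so the sketch runs without AWS credentials.

```python
from typing import Dict, Iterator, List


def find_unencrypted_objects(s3_client, bucket: str, prefix: str = "") -> List[str]:
    """Return keys whose metadata reports no server-side encryption."""
    unencrypted = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            head = s3_client.head_object(Bucket=bucket, Key=obj["Key"])
            if "ServerSideEncryption" not in head:
                unencrypted.append(obj["Key"])
    return unencrypted


class FakeS3:
    """Hypothetical stand-in that mimics the two client calls used above."""

    def __init__(self, objects: Dict[str, dict]):
        self._objects = objects          # key -> head_object-style metadata

    def get_paginator(self, operation: str) -> "FakeS3":
        assert operation == "list_objects_v2"
        return self

    def paginate(self, Bucket: str, Prefix: str = "") -> Iterator[dict]:
        keys = sorted(k for k in self._objects if k.startswith(Prefix))
        yield {"Contents": [{"Key": k} for k in keys]}

    def head_object(self, Bucket: str, Key: str) -> dict:
        return self._objects[Key]


fake = FakeS3({
    "claims/2016/part-0000.csv": {"ServerSideEncryption": "AES256"},
    "claims/2016/part-0001.csv": {},                     # uploaded without SSE
    "labs/results.csv": {"ServerSideEncryption": "aws:kms"},
})
print(find_unencrypted_objects(fake, "prd-datalake"))
```

Against a real account, a client created with `boto3.client("s3")` could be passed in place of `FakeS3`; the function issues one `head_object` call per key, so large buckets would need batching or an inventory report instead.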
  • 44. Demo
  • 45. Ujjwal Ratan Solutions Architect, AWS Ujjwalr@Amazon.com
  • 46. Machine Learning inside Healthcare • Analyzing Medical Images • Prescription Compliance Prediction • Evidence Based & Precision Medicine • Text classification and mining • Medicare and Medicaid Fraud • Hospital Bed Utilization • Treatment Queries and Suggestions • Drug Discovery and Clinical Trials • Population Health • Vaccination and Immunization • Omics and Clinical Data Integration • Patient Outcomes • Patient Readmission Prediction through risk stratification
  • 47. Real World Problem – Hospital Readmissions • The Hospital Readmission Reduction Program (HRRP) is part of the Affordable Care Act. • The Centers for Medicare & Medicaid Services (CMS) is required to reduce payments to hospitals with excess readmissions. • Not all readmissions can be prevented. • Facilities with high readmission rates had their Medicare payment cut by 1% in 2013, which rose to 2% in 2014. Source - www.ncbi.nlm.nih.gov/pmc/articles/PMC3558794
  • 48. Utilizing AWS for Machine Learning (ML): Continuum of Machine Learning Solutions. Amazon Machine Learning (our focus): • Limited ML options (Binary, Multiclass, Regression) • Simple to train • Easy to evaluate • Quick to deploy. Amazon EMR + Spark ML: • Comprehensive ML options • Requires work to train • No support for evaluation • Additional work to deploy • Scalable • Customizable
  • 49. Introducing Amazon Machine Learning (AML) • Easy to use, managed machine learning service built for developers • Robust, powerful machine learning technology based on Amazon’s internal systems • Use your data already stored in the AWS cloud • Models in production within seconds
  • 50. Proactive Prediction of Readmission. Inputs: Patient Demographics, Patient History, Admission Attributes, Other features. Processing: Machine Learning. Outputs: High Risk Patient, Moderate Risk Patient, Low Risk Patient.
  • 51. AML Application for Predicting Readmissions (diagram with five numbered steps). Components shown: CSV Files, Amazon S3, Amazon Redshift, Amazon Machine Learning, Amazon Cognito, S3 Static Website, Internet, users.
  • 52. Clinical Data Set https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008 • 101,766 rows • 10 years of clinical care • 130 US hospitals • 50+ attributes of diabetes patients and hospital outcomes
  • 53. Ingesting Data into S3 - Staging
      Table Name                    Table Type
      admission_source.csv          Master
      admission_type.csv            Master
      discharge_disposition.csv     Master
      Diabetic_data.csv             Transaction

      aws s3 cp /tmp/foo/ s3://bucket/ --recursive
  • 54. Schema in Redshift (diagram labels: Fact, Dim1, Dim2, Dim3)
      create table admission_type (
        admission_type_id INTEGER NOT NULL,
        description VARCHAR(100)
      );
      create table discharge_disposition (
        discharge_disposition_id INTEGER NOT NULL,
        description VARCHAR(500)
      );
      create table admission_source (
        admission_source_id INTEGER NOT NULL,
        description VARCHAR(500)
      );
      create table diabetes_data (
        -- ~50 attributes
      );
  • 55. Data Load and Standardization
      Data Load:
      COPY <Redshift_Table_Name>
      FROM 's3://<file_path.csv>'
      CREDENTIALS 'aws_access_key_id=<>;aws_secret_access_key=<>'
      DELIMITER ','
      IGNOREHEADER 1;

      Data Standardization: • Update NULL values • Change attribute values which do not comply with standard patterns (ex: Phone = (206) XXX-XXXX) • Complete geographical data where possible • Include timeline values if possible • Group granular attributes in sets (ex: ages 0 to 20 as youth, 20 to 40 as adult, and so on)
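Two of the standardization steps on slide 55, grouping ages into sets and forcing phone numbers into one pattern, can be sketched as small pure functions. The function names are hypothetical. The slide leaves the interval boundaries ambiguous (20 appears in both groups) and does not name the groups above 40, so the half-open intervals and the `older_adult` label below are assumptions.

```python
import re
from typing import Optional


def age_group(age: int) -> str:
    """Map an age in years to a coarse group using half-open intervals."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 20:
        return "youth"        # 0 to 20, from the slide
    if age < 40:
        return "adult"        # 20 to 40, from the slide
    return "older_adult"      # hypothetical label; the slide only says "and so on"


def normalize_phone(raw: str) -> Optional[str]:
    """Rewrite a US phone number as (AAA) BBB-CCCC, or return None if it does not fit."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]   # drop a leading country code
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


print(age_group(19), age_group(20), age_group(63))
print(normalize_phone("206.555.0100"))
print(normalize_phone("555-0100"))
```

Returning `None` for values that cannot be normalized keeps them visible for the NULL-handling step instead of silently passing malformed strings to the model.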
  • 56. Create AML Data Source with Redshift CreateDataSourceFromRedshift API Console
  • 57. Real-time Predictions Using API • Synchronous, low-latency, high-throughput prediction generation • Request through service API or server or mobile SDKs • Best for interactive applications that deal with individual data records
      >>> import boto
      >>> ml = boto.connect_machinelearning()
      >>> ml.predict(
      ...     ml_model_id='my_model',
      ...     predict_endpoint='example_endpoint',
      ...     record={'key1': 'value1', 'key2': 'value2'})
      {
        'Prediction': {
          'predictedValue': 13.284348,
          'details': {
            'Algorithm': 'SGD',
            'PredictiveModelType': 'REGRESSION'
          }
        }
      }
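A real-time prediction like the one on slide 57 can be turned into the high, moderate, and low risk groups from slide 50 with a thin wrapper. The sketch below is an illustration under several assumptions: the function names and the 0.3 and 0.7 thresholds are hypothetical, the client is expected to expose a boto3-style `predict(MLModelId=..., Record=..., PredictEndpoint=...)` call, and the response is read through the `predictedValue` field shown in the slide's sample output. `StubMLClient` stands in for the service so the code runs offline.

```python
from typing import Any, Dict


def risk_tier(score: float, moderate_at: float = 0.3, high_at: float = 0.7) -> str:
    """Bucket a numeric score into the three risk groups (thresholds are hypothetical)."""
    if score >= high_at:
        return "High Risk"
    if score >= moderate_at:
        return "Moderate Risk"
    return "Low Risk"


def predict_readmission_risk(ml_client, model_id: str, endpoint: str,
                             patient: Dict[str, Any]) -> str:
    """Call the real-time endpoint for one patient record and return a risk group."""
    # The service expects every record value as a string.
    record = {name: str(value) for name, value in patient.items()}
    response = ml_client.predict(MLModelId=model_id, Record=record,
                                 PredictEndpoint=endpoint)
    return risk_tier(float(response["Prediction"]["predictedValue"]))


class StubMLClient:
    """Hypothetical offline stand-in that returns a fixed predicted value."""

    def __init__(self, predicted_value: float):
        self._value = predicted_value

    def predict(self, MLModelId: str, Record: Dict[str, str],
                PredictEndpoint: str) -> dict:
        assert all(isinstance(v, str) for v in Record.values())
        return {"Prediction": {"predictedValue": self._value,
                               "details": {"PredictiveModelType": "REGRESSION"}}}


patient = {"age_group": "adult", "time_in_hospital": 4, "num_medications": 12}
print(predict_readmission_risk(StubMLClient(0.82), "my_model", "example_endpoint", patient))
```

Passing the client in as a parameter keeps the thresholding logic testable on its own; in a deployed application the stub would be replaced by a client such as `boto3.client("machinelearning")`.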
  • 58.
  • 59. Application Website Hosted on S3. The application calls the Predict() API using the necessary parameters. Website hosting in S3 without web servers eliminates the complexities of scaling hardware based on traffic routed to your application. bit.ly/aml_demo - Demo. bit.ly/hcl301_blog - Blog.
      var machinelearning = new AWS.MachineLearning({apiVersion: '2014-12-12'});
      var params = {
        MLModelId: '<AML Model ID>',
        PredictEndpoint: '<AML Model Real Time End Point>',
        Record: <Selected Attributes record set>
      };
      var request = machinelearning.predict(params);
  • 60. Expanded Architecture (diagram). Components shown: Corporate Data Center, Data Lake (Amazon S3), Amazon Redshift, Amazon Machine Learning, Amazon EC2, Amazon EMR, Amazon S3, Amazon RDS, Amazon QuickSight, Internet, users. Inputs: DB Schemas, CSV Files, Unstructured files. Annotations: • Make data suitable to act as an ML data source • An ML model is created with Redshift as the data source • EC2 as a frontend for the AML endpoint • Process unstructured and semi-structured data • Batch predictions generated and stored in S3 • An RDS schema acts as a source for QuickSight • QuickSight generates BI reports on prediction data
  • 62. Join us tonight at the Health Care happy hour sponsored by Cambia Health Solutions, 8KMiles.com and AWS at Japonais restaurant in the Mirage on Monday 11/28 from 6-8 PM. Do you want to know more about how to secure health data? AWS and Cambia are co-presenting SEC305 – Scaling Security Resources for Your First 10 Million Customers, Tuesday, Nov 29, 12:30 PM - 1:30 PM.