© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Paul Macey
Specialist Solution Architect, Big Data and Analytics
AWS Public Sector
November 2019
Serverless ETL on AWS
Deep dive webinar
Agenda
Serverless ETL (SETL) vs Traditional ETL
Establishing a repeatable data workflow
SETL components & pipeline
Use Cases
SETL with Amazon Athena
Demo
SETL & data lake integration
Demo
Data security & governance
Wrap up
Outcomes for this session
Understand how to use serverless technologies to perform ETL
Learn how SETL can be integrated into an existing data pipeline or data lake
Serverless vs Traditional ETL
Compared across the five AWS Well-Architected pillars:
• Operational Excellence
• Security
• Reliability
• Performance Efficiency
• Cost Optimization
Establishing a repeatable data workflow
Data sources → Transport → Process & Transform → Persist & Store → Secure and Deliver
Operate & Monitor
Data Lake
SETL
SETL Components
• Process initiator
• Workflow coordination
• ETL engine
• Storage
SETL Components – AWS Lambda
• Process initiator: Amazon EventBridge, AWS Lambda event
• Workflow coordination (optional): AWS Step Functions
• ETL engine: AWS Lambda
• Storage: Amazon S3, AWS database services
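The optional workflow-coordination layer can be sketched with a small Python helper that builds an Amazon States Language (ASL) definition chaining three Lambda tasks. The step names, ARNs, region and retry settings are illustrative assumptions, not details from the webinar; the resulting JSON is the kind of string that boto3's Step Functions `create_state_machine(name=..., definition=..., roleArn=...)` call accepts.

```python
import json

STEPS = ("Extract", "Transform", "Load")  # hypothetical step names


def build_etl_state_machine(function_arns: dict) -> str:
    """Return an Amazon States Language (ASL) definition as a JSON string.

    function_arns maps each name in STEPS to a Lambda function ARN
    (placeholders here, supplied by the caller).
    """
    states = {}
    for i, step in enumerate(STEPS):
        state = {
            "Type": "Task",
            "Resource": function_arns[step],
            # Retry a failed task up to 3 times, doubling the wait each time.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
        }
        if i + 1 < len(STEPS):
            state["Next"] = STEPS[i + 1]  # hand the output to the next step
        else:
            state["End"] = True  # the final step ends the execution
        states[step] = state
    return json.dumps(
        {"Comment": "Hypothetical SETL pipeline", "StartAt": STEPS[0], "States": states},
        indent=2,
    )


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    arns = {s: f"arn:aws:lambda:eu-west-2:123456789012:function:setl-{s.lower()}" for s in STEPS}
    print(build_etl_state_machine(arns))
```

Keeping each stage as its own Lambda function lets Step Functions retry a single stage without rerunning the whole pipeline.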
ETL using open source libraries and AWS Lambda:
• Arrays and matrices - NumPy
• Data manipulation - pandas
• Machine learning - scikit-learn
• Natural language processing - NLTK
• Geospatial - GeoPandas
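A minimal sketch of the Lambda ETL engine, assuming the function is triggered by an S3 `ObjectCreated` event: it reads a CSV object, converts it to JSON Lines and writes the result under a hypothetical `curated/` prefix. To stay dependency-free it uses the standard `csv` and `json` modules rather than the libraries listed above; with pandas packaged (for example in a Lambda layer), the transform step could instead be `pd.read_csv(...)` followed by `DataFrame.to_json(orient="records", lines=True)`.

```python
import csv
import io
import json
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "curated/"  # hypothetical output prefix


def csv_to_jsonl(csv_text: str) -> str:
    """Transform step: convert CSV text (with a header row) to JSON Lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(json.dumps(row) + "\n" for row in reader)


def handler(event, context, s3=None):
    """Lambda entry point for an S3 ObjectCreated event.

    The s3 parameter lets tests inject a stub; in Lambda it defaults to a
    boto3 S3 client (boto3 is included in the AWS Lambda Python runtime).
    """
    if s3 is None:
        import boto3  # imported lazily so the transform can be tested offline
        s3 = boto3.client("s3")
    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # event keys are URL-encoded
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        # e.g. "staging/sales 2019.csv" -> "curated/sales 2019.jsonl"
        stem = key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        out_key = f"{OUTPUT_PREFIX}{stem}.jsonl"
        s3.put_object(Bucket=bucket, Key=out_key, Body=csv_to_jsonl(body).encode("utf-8"))
        written.append(out_key)
    return {"written": written}
```

If the trigger and the output share a bucket, scope the trigger to an input prefix (for example `staging/`) so the function's own output does not invoke it again.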
SETL Components – Amazon Athena
• Process initiator: Amazon EventBridge, AWS Lambda event
• Workflow coordination (optional): AWS Step Functions
• ETL engine: Amazon Athena
• Storage: Amazon S3, AWS database services
ETL using Amazon Athena (SQL-based):
• Geospatial functions
• Window functions
• JSON parsing
• Lambda expressions and functions
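As an illustration of those SQL features, here is a Presto query of the kind Athena runs, wrapped in a helper that submits it through boto3's `start_query_execution`. The database, table and column names (`staging.device_events`, `payload`, `samples`, `lon`, `lat`) are hypothetical, and the sketch assumes the workgroup already defines an S3 query-result location.

```python
# Presto SQL for Athena; every table and column name here is hypothetical.
FEATURE_QUERY = """
SELECT
    device_id,
    event_time,
    -- JSON parsing: extract a scalar from a JSON string column
    json_extract_scalar(payload, '$.reading') AS reading,
    -- Windowing: number each device's events from newest to oldest
    row_number() OVER (PARTITION BY device_id ORDER BY event_time DESC) AS recency_rank,
    -- Lambda expression: multiply every element of an array column
    transform(samples, x -> x * 10) AS scaled_samples,
    -- Geospatial: planar distance (in degrees for lon/lat) from a reference point
    ST_Distance(ST_Point(lon, lat), ST_Point(-0.1278, 51.5074)) AS dist_from_ref
FROM staging.device_events
"""


def start_query(athena, sql: str, database: str, workgroup: str) -> str:
    """Submit a query with a boto3 Athena client and return its execution ID.

    No ResultConfiguration is passed, so the workgroup's output location is used.
    """
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )
    return response["QueryExecutionId"]
```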
Pop quiz
https://prestodb.io/
Use Cases
Data masking / hashing
Aggregation
Reporting
Time series
Data prep for DS/ML
Row-by-row ML
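For the data masking / hashing use case, a minimal sketch of two common techniques: a keyed hash (HMAC-SHA256) that replaces an identifier with a consistent token, and partial masking that keeps only the last few characters. The column names (`email`, `phone`) and the key are hypothetical; in practice the key would be held in a secrets store rather than in code.

```python
import hashlib
import hmac


def pseudonymise(value: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) as a 64-character hex token.

    The same value and key always give the same token, so hashed columns can
    still be joined; unlike a plain hash, tokens cannot be reproduced by anyone
    who lacks the key, which blocks simple dictionary look-ups.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters (all of them if too short)."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def protect_row(row: dict, key: bytes, hash_cols=("email",), mask_cols=("phone",)) -> dict:
    """Return a copy of row with hashed and masked columns; other columns pass through."""
    out = dict(row)
    for col in hash_cols:
        if out.get(col) is not None:
            out[col] = pseudonymise(out[col], key)
    for col in mask_cols:
        if out.get(col) is not None:
            out[col] = mask(out[col])
    return out
```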
SETL Pricing
AWS Lambda
• Requests: first 1M requests per month free, then $0.20 per 1M requests
• Duration: first 400,000 GB-seconds per month free (up to 3.2M seconds of compute time at 128 MB of memory), then $0.0000166667 per GB-second
• The duration price depends on the amount of memory allocated to the function.
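The Lambda duration charge is memory (GB) times duration (seconds) times invocations, less the free tier. A rough monthly estimator using the prices on this slide, under stated simplifications: the full free tier is available, billed duration is not rounded up, and 1 GB = 1024 MB.

```python
# Prices as shown on the slide (November 2019); check current pricing before use.
REQUEST_PRICE_PER_MILLION = 0.20
FREE_REQUESTS = 1_000_000
GB_SECOND_PRICE = 0.0000166667
FREE_GB_SECONDS = 400_000


def lambda_monthly_cost(invocations: int, memory_mb: int, avg_duration_s: float) -> float:
    """Estimate one month's Lambda request + duration charges in USD.

    Assumptions: free tier applied in full, no rounding of billed duration,
    memory converted to GB as memory_mb / 1024.
    """
    gb_seconds = invocations * (memory_mb / 1024) * avg_duration_s
    request_cost = max(invocations - FREE_REQUESTS, 0) / 1_000_000 * REQUEST_PRICE_PER_MILLION
    duration_cost = max(gb_seconds - FREE_GB_SECONDS, 0) * GB_SECOND_PRICE
    return round(request_cost + duration_cost, 2)
```

For example, 3M invocations a month at 512 MB and 1 s each use 1.5M GB-seconds; after the free tier that is about $18.33 of duration plus $0.40 of requests, so the function returns 18.73.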
Amazon Athena
• S3: standard S3 rates for storage, requests, and data transfer
• Athena: $5.00 per TB of data scanned
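Athena's charge scales with bytes scanned, not with query runtime. A sketch of the per-query arithmetic, assuming the rounding described on the Athena pricing page (up to the nearest megabyte, 10 MB minimum per query) and binary units (1 TB = 1024^4 bytes), which the slide does not specify.

```python
import math

PRICE_PER_TB = 5.00       # USD, as on the slide
MB = 1024 ** 2            # assumption: binary megabyte
TB = 1024 ** 4            # assumption: binary terabyte
MIN_BILLED_BYTES = 10 * MB


def athena_query_cost(bytes_scanned: int) -> float:
    """Cost in USD of a single query, rounding up to whole MB with a 10 MB floor."""
    billed = max(math.ceil(bytes_scanned / MB) * MB, MIN_BILLED_BYTES)
    return billed / TB * PRICE_PER_TB
```

Because the bill follows bytes scanned, compressing data, partitioning it and storing it in a columnar format such as Parquet reduce cost as well as query time.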
Demo data flow
• Amazon S3: data lake
• AWS Glue: Data Catalogue
• Amazon Athena
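In this flow Athena reads the S3 data through the table definitions held in the AWS Glue Data Catalogue. Athena queries run asynchronously, so a script or Lambda function that drives them usually polls for completion. A sketch using boto3's `get_query_execution`; the poll interval and timeout are arbitrary assumptions.

```python
import time


def wait_for_query(athena, query_execution_id: str,
                   poll_seconds: float = 1.0, timeout_seconds: float = 300.0) -> dict:
    """Poll a boto3 Athena client until the query finishes.

    Returns the QueryExecution dict on success; raises RuntimeError if the
    query fails or is cancelled, TimeoutError if it is still running at the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_execution_id)["QueryExecution"]
        state = execution["Status"]["State"]  # QUEUED, RUNNING, SUCCEEDED, FAILED or CANCELLED
        if state == "SUCCEEDED":
            return execution
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {query_execution_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Athena query {query_execution_id} still {state}")
        time.sleep(poll_seconds)
```

On success, the returned dict's `ResultConfiguration.OutputLocation` holds the S3 path of the query's result file.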
SETL with Amazon Athena
Demo
SETL & data lake integration (Athena Slingshot)
Start small → Establish a repeatable workflow → Deliver benefits → Improve and iterate → Repeat
SETL & data lake integration (Athena Slingshot)
• Amazon Athena workgroups
• Amazon S3 query destinations
• Saved tables & queries
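These building blocks map onto Athena API calls. A sketch with boto3's `create_work_group` and `create_named_query`: the workgroup fixes the S3 query destination and a per-query scan limit, and the saved query is stored in that workgroup. The names, S3 paths and the 10 GiB limit are hypothetical.

```python
def create_setl_workgroup(athena, name: str, output_location: str,
                          max_bytes_per_query: int = 10 * 1024**3) -> None:
    """Create an Athena workgroup whose settings apply to every query run in it."""
    athena.create_work_group(
        Name=name,
        Configuration={
            # Query results for this workgroup are written under this S3 prefix.
            "ResultConfiguration": {"OutputLocation": output_location},
            # Override client-side settings so users cannot redirect results.
            "EnforceWorkGroupConfiguration": True,
            "PublishCloudWatchMetricsEnabled": True,
            # Cancel any query that scans more than this many bytes.
            "BytesScannedCutoffPerQuery": max_bytes_per_query,
        },
        Description="Hypothetical SETL workgroup",
    )


def save_query(athena, name: str, database: str, sql: str, workgroup: str) -> str:
    """Store a reusable (named) query in the workgroup and return its ID."""
    response = athena.create_named_query(
        Name=name, Database=database, QueryString=sql, WorkGroup=workgroup
    )
    return response["NamedQueryId"]
```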
Amazon Athena Slingshot data flow
Staging (Amazon S3) → SETL (Amazon Athena) → Curated (Amazon S3) → Gold (Amazon S3)
Data Catalogue: AWS Glue
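One way to express the Staging-to-Curated hop with Athena alone is a CREATE TABLE AS SELECT (CTAS) statement, which writes the query output to S3 in a chosen format and registers the new table in the data catalogue. A sketch of a helper that builds such a statement; the table names and S3 paths are hypothetical.

```python
def ctas_sql(target_table: str, s3_location: str, select_sql: str, partition_cols=()) -> str:
    """Build an Athena CTAS statement that writes Parquet files to s3_location.

    Arguments are inlined without escaping, so use only trusted values.
    """
    props = ["format = 'PARQUET'", f"external_location = '{s3_location}'"]
    if partition_cols:
        # Athena expects partition columns to be the last columns in the SELECT.
        cols = ", ".join(f"'{c}'" for c in partition_cols)
        props.append(f"partitioned_by = ARRAY[{cols}]")
    with_clause = ",\n    ".join(props)
    return f"CREATE TABLE {target_table}\nWITH (\n    {with_clause}\n)\nAS\n{select_sql}"


if __name__ == "__main__":
    print(ctas_sql(
        "curated.sales",
        "s3://example-lake/curated/sales/",
        "SELECT id, amount, sale_date FROM staging.sales WHERE amount IS NOT NULL",
        partition_cols=["sale_date"],
    ))
```

The target S3 location should be empty before the CTAS runs; the same pattern could then promote Curated tables to Gold.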
SETL & data lake integration (Athena Slingshot)
Demo
Using SETL in the real world
Data security & governance
Data access control & security
Data usage controls
Encrypted output
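Two of these controls can be sketched in Python: an Athena `ResultConfiguration` that encrypts query output (SSE-KMS when a key ARN is given, otherwise SSE-S3), and an IAM policy document that only allows query execution in one workgroup. Region, account ID, workgroup and key ARN are placeholders; a real deployment would also need S3 and AWS Glue Data Catalog permissions, which the AWS Glue references in this deck cover.

```python
import json


def encrypted_result_configuration(output_location: str, kms_key_arn: str = None) -> dict:
    """Athena ResultConfiguration that encrypts query results at rest."""
    if kms_key_arn:
        encryption = {"EncryptionOption": "SSE_KMS", "KmsKey": kms_key_arn}
    else:
        encryption = {"EncryptionOption": "SSE_S3"}  # S3-managed keys
    return {"OutputLocation": output_location, "EncryptionConfiguration": encryption}


def workgroup_user_policy(region: str, account_id: str, workgroup: str) -> str:
    """IAM policy (JSON) allowing query actions in a single Athena workgroup only."""
    arn = f"arn:aws:athena:{region}:{account_id}:workgroup/{workgroup}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                ],
                "Resource": arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)
```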
Wrap up
Understand how to use serverless technologies to perform ETL
• Components
• Pipelines
Learn how SETL can be integrated into an existing data pipeline or data lake
• Athena Slingshot
• Data security and governance
AWS Glue References
Restrict access to your AWS Glue Data Catalog
https://aws.amazon.com/blogs/big-data/restrict-access-to-your-aws-glue-data-catalog-with-resource-level-iam-permissions-and-resource-based-policies/
Fine-grained access to AWS Glue resources
https://docs.aws.amazon.com/athena/latest/ug/fine-grained-access-to-glue-resources.html
Amazon Athena References
Amazon Athena JDBC & ODBC connectivity
https://docs.aws.amazon.com/athena/latest/ug/athena-bi-tools-jdbc-odbc.html
Athena workgroup policies
https://docs.aws.amazon.com/athena/latest/ug/workgroups-iam-policy.html
Athena workgroup policy examples
https://docs.aws.amazon.com/athena/latest/ug/example-policies-workgroup.html
Presto functions in Athena
https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html
Working with Query Results and Output Files
https://docs.aws.amazon.com/athena/latest/ug/querying.html
AWS Pricing
AWS Pricing Calculator
https://calculator.aws
AWS Glue
https://aws.amazon.com/glue/pricing/
Amazon Athena
https://aws.amazon.com/athena/pricing/
AWS Lambda
https://aws.amazon.com/lambda/pricing/
Amazon EventBridge
https://aws.amazon.com/eventbridge/pricing/
AWS References
AWS Data Flywheel
https://pages.awscloud.com/apac-data-flywheel.html
https://resources.awscloud.com/aws-data-analytics-machinelearning/data-flywheel-e-book
AWS Lake Formation
https://aws.amazon.com/blogs/aws/aws-lake-formation-now-generally-available/
AWS Well-Architected Framework
https://aws.amazon.com/architecture/well-architected/
Python Library References
https://numpy.org/
https://pandas.pydata.org/
https://scikit-learn.org/stable/index.html
http://geopandas.org/index.html
https://www.nltk.org/
Accelerated Data Lake
Available today @ GitHub
https://github.com/aws-samples/accelerated-data-lake
Includes
• Data lake pipeline (CloudFormation)
• Instructions
• Data configuration, security and metadata templates
Delivery
• Professional services
• AWS partners
Thank you