Build Data Lakes and Analytics on AWS: Patterns and Best Practices

In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon Simple Storage Service (Amazon S3), AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon Machine Learning (Amazon ML) services work together to build a successful data lake for various roles, including data scientists and business users.

Speakers: Craig Roach, Principal Solutions Architect, AWS; David Scott, Director Network and Asset Intelligence, Transport for NSW

  1. Public Sector Summit, Canberra, ACT
  2. Building Data Lakes and Analytics on AWS: Patterns and Best Practices. Craig Roach, Principal Solutions Architect, AWS; David Scott, Director Network and Asset Intelligence, Transport for NSW. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  4. Clinical data lake: reference architecture. Components: AWS Glue Crawler, AWS Glue Data Catalog, Amazon API Gateway, AWS DataSync, Amazon Kinesis, Amazon QuickSight, AWS Glue, Amazon S3, Amazon EMR, Amazon Athena, Amazon Redshift Spectrum, Amazon Comprehend Medical, AWS Step Functions, Amazon SageMaker model inference.
  6. Objectives: Insights, Data Sources.
  7. Ingest and catalogue the first data source: discharges. Components: hospital discharge system, HL7, AWS DataSync, Amazon S3 (Raw, Staging), Validate schema, Amazon DynamoDB, Amazon Elasticsearch Service, AWS Glue Crawler, AWS Glue Data Catalog. Object schemas: Hive metastore compatible. Object lineage: searchable via Kibana. Object metadata table.
  8. Add the second source: admissions. Components: ISV-managed admission service (SaaS), FHIR, Amazon API Gateway, FHIR data capture AWS Lambda function, Amazon DynamoDB FHIR database, AWS DataSync, Amazon S3 (Raw, Staging), FHIR JSON, AWS Glue Crawler, AWS Glue Data Catalog.
  9. Then add the third source: real-time systems. Components: hospital real-time systems, CSV (Kafka, Fluentd, Flume, Log4j), Amazon Kinesis Data Streams, Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose, Amazon S3 (Raw, Staging), AWS Glue Crawler, AWS Glue Data Catalog, Amazon API Gateway, AWS DataSync.
  10. Data staging complete: let's do some analysis. Staged formats: HL7, FHIR JSON, CSV. Components: AWS Glue Crawler, AWS Glue Data Catalog, Amazon S3, Amazon API Gateway, AWS DataSync, Amazon Kinesis.
  11. First-pass discovery with SQL and visualisation. Components: AWS Glue Crawler, AWS Glue Data Catalog, Amazon S3, Amazon API Gateway, AWS DataSync, Amazon Kinesis, Amazon Athena (JDBC, ODBC).
  12. Optimise performance and cost-efficiency. AWS Glue PySpark ETL jobs produce a curated zone in Amazon S3: Apache Parquet, partitioned, ISO elements. Amazon Athena (JDBC, ODBC) and Amazon QuickSight, or any JDBC/ODBC visualisation or reporting product, serve business reporting. Also shown: AWS Glue Crawler, AWS Glue Data Catalog, Amazon API Gateway, AWS DataSync, Amazon Kinesis.
  13. Accelerate queries with Amazon Redshift Spectrum. Amazon Redshift Spectrum and Amazon Athena query the output of AWS Glue PySpark ETL jobs in Amazon S3. Amazon QuickSight, or any JDBC/ODBC visualisation or reporting product, serves business reporting. Also shown: AWS Glue Crawler, AWS Glue Data Catalog, Amazon API Gateway, AWS DataSync, Amazon Kinesis.
  14. Enrich with Apache Spark on Amazon EMR. Amazon EMR accesses Amazon S3 through EMRFS for data science. Also shown: AWS Glue Crawler, AWS Glue Data Catalog, Amazon API Gateway, AWS DataSync, Amazon Kinesis, Amazon QuickSight, AWS Glue, Amazon Athena, Amazon Redshift Spectrum.
  16. Amazon Comprehend Medical
      Unstructured clinical text: Pt is 40yo mother, highschool teacher HPI : Sleeping trouble on present dosage of Clonidine. Severe Rash on face and leg, slightly itchy Meds : Vyvanse 50 mgs po at breakfast daily, Clonidine 0.2 mgs -- 1 and 1 / 2 tabs po qhs Lungs : clear Heart : Regular rhythm Skin : Mild erythematous eruption to hairline
      NLP analysis with Amazon Comprehend, output (JSON):
      {"Id": 1, "Score": 0.32382732629776, "Text": "highschool teacher", "Category": "PROTECTED_HEALTH_INFORMATION", "Type": "PROFESSION"}
      {"Id": 12, "Score": 0.8420739769935608, "Text": "Sleeping trouble", "Category": "MEDICAL_CONDITION", "Type": "DX_NAME", "Traits": [ { "Name": "SYMPTOM", "Score": 0.6971826553344727} ]}
      • Amazon Comprehend Medical uses deep learning technology to accurately analyze text.
      • Our models are constantly trained with new data across multiple domains to improve accuracy.
  17. Structure and encode clinical case notes: analyse unstructured clinical case notes with Amazon Comprehend Medical, then encode them using SNOMED and ICD-10. Also shown: AWS Glue Crawler, AWS Glue Data Catalog, Amazon API Gateway, AWS DataSync, Amazon Kinesis, Amazon QuickSight, AWS Glue, Amazon S3, Amazon EMR, Amazon Athena, Amazon Redshift Spectrum.
  19. Experiment with Amazon SageMaker notebooks and query tools. Clinical anomaly detection: feature engineering and training (Amazon SageMaker model training). Also shown: AWS Glue Crawler, AWS Glue Data Catalog, Amazon API Gateway, AWS DataSync, Amazon Kinesis, Amazon QuickSight, AWS Glue, Amazon S3, Amazon EMR, Amazon Athena, Amazon Redshift Spectrum, Amazon Comprehend Medical.
  20. Close the loop: feed ML inferences back into the data lake. Clinical anomaly detection inference workflow with AWS Step Functions and Amazon SageMaker model inference. Also shown: AWS Glue Crawler, AWS Glue Data Catalog, Amazon API Gateway, AWS DataSync, Amazon Kinesis, Amazon QuickSight, AWS Glue, Amazon S3, Amazon EMR, Amazon Athena, Amazon Redshift Spectrum, Amazon Comprehend Medical.
  22. Some pro tips:
      • Set achievable objectives: three insights, three data sources, and three sprints.
      • Build a data catalogue that traces data lineage and provenance and discovers schemas automatically.
      • Use Apache Parquet transforms to optimise query performance and cost-efficiency.
      • Use the EMRFS connector to process data securely and directly in the Amazon S3 data lake, rather than copying it into cluster storage first.
      • Use Amazon Machine Learning application services, such as Amazon Comprehend Medical, to detect structure in unstructured text and media.
      • Use Amazon SageMaker to augment the data lake with inferences from models trained on the data lake itself.
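Slide 7 pairs ingestion with a "Validate schema" step and an object metadata table. The following stdlib-only sketch shows one possible form of that check for HL7 v2 discharge messages. It does not come from the presentation. It assumes segments separated by carriage returns, treats MSH, PID and PV1 as required (an assumption), and expects an ADT^A03 (discharge) message type. It also builds a hypothetical metadata item whose field names are invented for illustration.

```python
import hashlib
from datetime import datetime, timezone

# Assumption: segments a discharge message must carry before it is staged.
REQUIRED_SEGMENTS = {"MSH", "PID", "PV1"}


def validate_discharge(message: str) -> list:
    """Return validation errors for an HL7 v2 message (empty list = valid)."""
    if len(message) < 4 or not message.startswith("MSH"):
        return ["message must start with an MSH segment"]
    sep = message[3]  # MSH-1 is the field separator itself, usually '|'
    # HL7 v2 separates segments with carriage returns; tolerate '\n' as well.
    lines = [s for s in message.replace("\n", "\r").split("\r") if s.strip()]
    seg_ids = {line.split(sep, 1)[0] for line in lines}
    errors = [f"missing segment {s}" for s in sorted(REQUIRED_SEGMENTS - seg_ids)]
    msh = lines[0].split(sep)
    # MSH-9 (message type) sits at index 8 because MSH-1 is the separator.
    msg_type = msh[8] if len(msh) > 8 else ""
    if not msg_type.startswith("ADT^A03"):
        errors.append(f"expected ADT^A03 (discharge), got {msg_type!r}")
    return errors


def metadata_record(key: str, message: str, errors: list) -> dict:
    """Hypothetical item for an object metadata table (field names assumed)."""
    return {
        "object_key": key,
        "sha256": hashlib.sha256(message.encode("utf-8")).hexdigest(),
        "source": "hospital-discharge-system",
        "format": "HL7v2",
        # Assumption: only valid messages are promoted out of the raw area.
        "zone": "staging" if not errors else "raw",
        "validation_errors": errors,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


SAMPLE = "\r".join([
    "MSH|^~\\&|DISCH|HOSP|LAKE|AWS|20190101120000||ADT^A03|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE",
    "PV1|1|I|WARD1^101^A",
])

if __name__ == "__main__":
    errs = validate_discharge(SAMPLE)
    print(metadata_record("raw/discharges/MSG0001.hl7", SAMPLE, errs))
```

The content hash gives lineage tooling a stable identity for each object, regardless of where it is later copied.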
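Slide 8 places a FHIR data capture Lambda function behind Amazon API Gateway. The minimal sketch below covers only that function's validation and keying logic. It assumes Lambda proxy integration, where the request body arrives as a JSON string, and accepts only Encounter and Patient resources (an assumption). The S3 key layout is hypothetical. The DynamoDB and S3 writes are left as a comment rather than guessed at.

```python
import json
import re
from datetime import datetime, timezone

# Assumption: the admission feed sends these FHIR resource types.
ACCEPTED_TYPES = {"Encounter", "Patient"}
# FHIR logical ids: 1-64 characters from A-Z, a-z, 0-9, '-' and '.'.
FHIR_ID = re.compile(r"[A-Za-z0-9\-.]{1,64}")


def build_s3_key(resource: dict, received: datetime) -> str:
    """Hypothetical raw-area key: raw/fhir/<type>/<yyyy-mm-dd>/<id>.json"""
    return (f"raw/fhir/{resource['resourceType'].lower()}/"
            f"{received:%Y-%m-%d}/{resource['id']}.json")


def _response(status: int, body: dict) -> dict:
    # Response shape expected by API Gateway Lambda proxy integration.
    return {"statusCode": status, "body": json.dumps(body)}


def handler(event, context=None):
    """Validate a FHIR JSON resource posted through API Gateway."""
    try:
        resource = json.loads(event.get("body") or "")
    except json.JSONDecodeError:
        return _response(400, {"error": "body is not valid JSON"})
    if not isinstance(resource, dict):
        return _response(422, {"error": "expected a single FHIR resource"})
    rtype = resource.get("resourceType")
    if not isinstance(rtype, str) or rtype not in ACCEPTED_TYPES:
        return _response(422, {"error": f"unsupported resourceType {rtype!r}"})
    # Validating the id also keeps path characters such as '/' out of the key.
    if not FHIR_ID.fullmatch(str(resource.get("id", ""))):
        return _response(422, {"error": "missing or invalid id"})
    key = build_s3_key(resource, datetime.now(timezone.utc))
    # A deployed function would write the resource to the DynamoDB FHIR
    # table and the raw JSON to S3 under `key` here (for example via boto3).
    return _response(201, {"s3_key": key})
```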
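Slide 9 streams CSV from real-time systems through Kinesis Data Streams into Kinesis Data Firehose. When Firehose data transformation is enabled, a Lambda function receives base64-encoded records. It must return each recordId with a result of Ok, Dropped or ProcessingFailed. The sketch converts each CSV line to a JSON line. The five-column layout is a hypothetical assumption, and the presentation does not say that a transformation function is used.

```python
import base64
import csv
import json

# Hypothetical column layout for the real-time CSV feed.
COLUMNS = ["device_id", "ward", "metric", "value", "recorded_at"]


def csv_line_to_json(line: str) -> str:
    """Convert one CSV line into a newline-terminated JSON object."""
    fields = next(csv.reader([line.strip()]), [])
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    row["value"] = float(row["value"])  # raises ValueError if not numeric
    return json.dumps(row) + "\n"


def handler(event, context=None):
    """Kinesis Data Firehose data-transformation Lambda handler."""
    output = []
    for record in event["records"]:
        try:
            line = base64.b64decode(record["data"]).decode("utf-8")
            payload = csv_line_to_json(line)
        except ValueError:
            # Return the original data unchanged and mark the record failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(payload.encode("utf-8")).decode("ascii")})
    return {"records": output}
```

Emitting newline-delimited JSON keeps one record per line, which suits Glue crawlers and Athena when the delivered files are catalogued.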
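Slide 11 runs first-pass discovery in Amazon Athena. To keep the example runnable offline, the sketch swaps in Python's built-in sqlite3 as a stand-in engine. It only illustrates the kind of profiling query a first pass might run: row counts, missing values, date range and distinct values. The discharges table, its columns and its rows are hypothetical. In the architecture shown, Athena would run such SQL against tables in the AWS Glue Data Catalog.

```python
import sqlite3

# Standard SQL aggregates; a similar statement should also run in Athena.
PROFILE_SQL = """
SELECT COUNT(*)                                                AS row_count,
       SUM(CASE WHEN discharge_date IS NULL THEN 1 ELSE 0 END) AS missing_discharge,
       MIN(admit_date)                                         AS first_admit,
       MAX(admit_date)                                         AS last_admit,
       COUNT(DISTINCT ward)                                    AS wards
FROM discharges
"""


def profile(conn: sqlite3.Connection) -> dict:
    """Run the first-pass profiling query and return one dict of metrics."""
    cur = conn.execute(PROFILE_SQL)
    names = [col[0] for col in cur.description]
    return dict(zip(names, cur.fetchone()))


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for a catalogued table (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE discharges "
                 "(patient_id TEXT, ward TEXT, admit_date TEXT, discharge_date TEXT)")
    conn.executemany("INSERT INTO discharges VALUES (?, ?, ?, ?)", [
        ("p1", "ICU", "2019-01-03", "2019-01-09"),
        ("p2", "ED", "2019-01-05", None),
        ("p3", "ICU", "2019-02-11", "2019-02-12"),
    ])
    return conn


if __name__ == "__main__":
    print(profile(demo_connection()))
```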
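Slide 12 lists partitioned Apache Parquet with ISO elements as the curated format. Writing Parquet needs Spark or a library such as pyarrow, so the sketch covers only the partitioning half. It builds Hive-style key=value prefixes from ISO 8601 dates; Glue crawlers and Athena recognise such prefixes as partition columns. The curated/ prefix and the dt column name are assumptions. The prune function shows why a filter on a partition column limits how much data a query reads.

```python
from datetime import date, timedelta


def partition_prefix(table: str, day: date) -> str:
    """Hive-style partition prefix using an ISO 8601 date value."""
    return f"curated/{table}/dt={day.isoformat()}/"


def prefixes_for_range(table: str, start: date, end: date) -> list:
    """Every daily partition prefix between start and end, inclusive."""
    return [partition_prefix(table, start + timedelta(days=n))
            for n in range((end - start).days + 1)]


def prune(keys: list, start: date, end: date) -> list:
    """Keep only object keys whose dt= partition falls within [start, end]."""
    selected = []
    for key in keys:
        for part in key.split("/"):
            if part.startswith("dt="):
                if start <= date.fromisoformat(part[3:]) <= end:
                    selected.append(key)
                break
    return selected


if __name__ == "__main__":
    year = prefixes_for_range("discharges", date(2019, 1, 1), date(2019, 12, 31))
    keys = [p + "part-00000.parquet" for p in year]
    week = prune(keys, date(2019, 5, 1), date(2019, 5, 7))
    print(f"{len(week)} of {len(keys)} objects need to be read")
```

Run as a script, it reports that a one-week query touches 7 of 365 daily partitions. The other 358 are skipped without being opened.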
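Slide 16 shows Comprehend Medical output as a list of entities with Category, Type, Score and optional Traits; in the API response these sit under an Entities key. Before loading such output into the data lake, you might separate protected health information from clinical entities. The sketch uses the two entities shown on the slide. The 0.5 score threshold, the row layout and the naive masking helper are assumptions for illustration.

```python
# Entities copied from the slide's example output.
SLIDE_ENTITIES = [
    {"Id": 1, "Score": 0.32382732629776, "Text": "highschool teacher",
     "Category": "PROTECTED_HEALTH_INFORMATION", "Type": "PROFESSION"},
    {"Id": 12, "Score": 0.8420739769935608, "Text": "Sleeping trouble",
     "Category": "MEDICAL_CONDITION", "Type": "DX_NAME",
     "Traits": [{"Name": "SYMPTOM", "Score": 0.6971826553344727}]},
]

PHI = "PROTECTED_HEALTH_INFORMATION"


def split_entities(entities: list, min_score: float = 0.5):
    """Return (phi, clinical) rows flattened for a tabular data lake."""
    phi, clinical = [], []
    for e in entities:
        row = {
            "text": e["Text"],
            "category": e["Category"],
            "type": e["Type"],
            "score": round(e["Score"], 3),
            "traits": [t["Name"] for t in e.get("Traits", [])],
        }
        if e["Category"] == PHI:
            # Keep PHI regardless of score so it is masked conservatively.
            phi.append(row)
        elif e["Score"] >= min_score:
            clinical.append(row)
    return phi, clinical


def mask_phi(text: str, phi_rows: list) -> str:
    """Naive masking by string replacement (hypothetical helper)."""
    for row in phi_rows:
        text = text.replace(row["text"], "[" + row["type"] + "]")
    return text


if __name__ == "__main__":
    phi_rows, clinical_rows = split_entities(SLIDE_ENTITIES)
    print(mask_phi("Pt is 40yo mother, highschool teacher", phi_rows))
    print(clinical_rows)
```

PHI is kept whatever its score because the slide's PROFESSION entity scores only 0.32. Applying the clinical threshold to PHI would leave that entity unmasked. String replacement is a simplification; the real API also returns BeginOffset and EndOffset for each entity, which are a safer basis for masking.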
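Slide 19 trains a clinical anomaly detection model in Amazon SageMaker but does not name the algorithm. The sketch is a swapped-in, stdlib-only stand-in, not the presenters' model and not a SageMaker algorithm. It engineers one feature, length of stay, and flags outliers with the Iglewicz-Hoaglin modified z-score. That score uses the median and median absolute deviation (MAD), so a single extreme stay does not inflate the spread and hide itself. The feature choice and the 3.5 threshold are assumptions.

```python
from datetime import date
from statistics import median


def length_of_stay(admit: str, discharge: str) -> int:
    """Engineered feature: whole days between ISO 8601 admit/discharge dates."""
    return (date.fromisoformat(discharge) - date.fromisoformat(admit)).days


def modified_z_scores(values: list) -> list:
    """Iglewicz-Hoaglin modified z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # MAD is zero when most values equal the median; the score is
        # undefined, so this simplified sketch reports nothing as anomalous.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(stays: dict, threshold: float = 3.5) -> list:
    """Return ids whose length of stay has |modified z| above threshold."""
    ids = list(stays)
    scores = modified_z_scores([stays[i] for i in ids])
    return [i for i, z in zip(ids, scores) if abs(z) > threshold]


if __name__ == "__main__":
    # Hypothetical lengths of stay in days, keyed by patient id.
    stays = {"p1": 3, "p2": 4, "p3": 2, "p4": 5, "p5": 3, "p6": 4, "p7": 3, "p8": 30}
    print(flag_anomalies(stays))
```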
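Slide 20 closes the loop by writing model inferences back into the data lake. The sketch shows only the final step, under stated assumptions. Inference results arrive as patient_id and score pairs and are joined back onto source rows. The merged rows are written as newline-delimited JSON under a hypothetical model=/dt= partition, where a Glue crawler can catalogue them. The AWS Step Functions orchestration and the SageMaker inference call are out of scope.

```python
import json
from datetime import date


def merge_inferences(records: list, inferences: list, model_version: str) -> list:
    """Attach model scores to source records by patient_id (copies rows)."""
    scores = {i["patient_id"]: i["score"] for i in inferences}
    merged = []
    for record in records:
        row = dict(record)  # do not mutate the source rows
        row["anomaly_score"] = scores.get(record["patient_id"])  # None if unscored
        row["model_version"] = model_version
        merged.append(row)
    return merged


def to_ndjson(rows: list) -> str:
    """One JSON object per line, the layout Athena's JSON support expects."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in rows)


def inference_key(run_date: date, model_version: str) -> str:
    """Hypothetical curated-zone key, partitioned by model and run date."""
    return (f"curated/inferences/model={model_version}/"
            f"dt={run_date.isoformat()}/part-00000.json")
```

Partitioning by model version lets analysts compare runs side by side in SQL, and old scores are not overwritten when a model is retrained.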
