SlideShare una empresa de Scribd logo
1 de 12
Descargar para leer sin conexión
MIDWEST | CHICAGO
Analytics Pipeline using Glue/Athena
and Visualization
Piyali Kamra| 2020 | Sr Principal Architect, Morningstar.
Agenda
Terminologies
Classifier
Crawler
Data catalog
Parquet data format
Athena
Architecture
Demo
Metrics Size/Execution time
Non partitioned data
Partitioned Data
Raw CSV data
SQL Queries
Sample Visualization.
Terminologies
Crawlers crawl data sets and
populate the Glue Data
Catalog
Glue Data Catalog - Meta
data informationabout the
schema , locationof data.
ReadbyGlue ETL. Created
bythe Crawler
A GLUE job basically
consist ofbusiness logic
that performs ETL work.
Glue is a managed
service that uses apache
spark.The boilerplate
code is in python/ scala.
Parquet is an open
source
fileformat available
to any projectin the
Hadoop ecosystem.
Apache Parquet is
designed for efficient
as well as performant
flatcolumnar
storage format of
data compared to
row-based files like
CSV or TSV files.
Athena is a serverless
interactivequery
servicebased on the
presto processing
engine .Can run
analytic queries on
largedata sets from
S3 buckets in csv /
parquet /JSON
formats.
Setup for S3 Source buckets
1. Source raw data s3 bucket.
2. Partitionedsource rawdatas3
bucket.
Screen Shot 2020-06-08 at 12.02.0
Setup the Crawler and Data Catalog
Setup the Glue ETL Job
Parquet format data in target
s3 bucket after the glue job is
run
Demo
Recorded video
Import new client - partitioned use case
Run crawler - data catalog data update , schema update
Run Glue Job - update spark code to include partition
Check new partition in parquet S3 bucket.
Athena query by updating partition.
Visualization Options.
Metrics – Athena backed by csv data
Metrics – Athena backed by non partitionedparquet data
Metrics – Athena backed by partitionedparquet data

Más contenido relacionado

La actualidad más candente

Presto Summit 2018 - 03 - Starburst CBO
Presto Summit 2018  - 03 - Starburst CBOPresto Summit 2018  - 03 - Starburst CBO
Presto Summit 2018 - 03 - Starburst CBOkbajda
 
Big Data Landscape 2019
Big Data Landscape 2019Big Data Landscape 2019
Big Data Landscape 2019QAware GmbH
 
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...Dataconomy Media
 
Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.Alina Vilk
 
Big Data Landscape 2019
Big Data Landscape 2019Big Data Landscape 2019
Big Data Landscape 2019QAware GmbH
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...UA DevOps Conference
 
Big Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to ActionBig Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to ActionMurtaza Doctor
 
HUG France - Paris - Data Engineer's Toolkit
HUG France - Paris - Data Engineer's ToolkitHUG France - Paris - Data Engineer's Toolkit
HUG France - Paris - Data Engineer's ToolkitSynaltic Group
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeDatabricks
 
Collecting Endpoint Security Logs Through Big Data Technology - Dedi Dwianto
Collecting Endpoint Security Logs Through Big Data Technology - Dedi DwiantoCollecting Endpoint Security Logs Through Big Data Technology - Dedi Dwianto
Collecting Endpoint Security Logs Through Big Data Technology - Dedi Dwiantoidsecconf
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Spark Summit
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Databricks
 
Intro elasticsearch taswarbhatti
Intro elasticsearch taswarbhattiIntro elasticsearch taswarbhatti
Intro elasticsearch taswarbhattiTaswar Bhatti
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaObjectRocket
 
Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceTao Feng
 
Presto for apps deck varada prestoconf
Presto for apps deck varada prestoconfPresto for apps deck varada prestoconf
Presto for apps deck varada prestoconfOri Reshef
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...Databricks
 

La actualidad más candente (20)

Presto Summit 2018 - 03 - Starburst CBO
Presto Summit 2018  - 03 - Starburst CBOPresto Summit 2018  - 03 - Starburst CBO
Presto Summit 2018 - 03 - Starburst CBO
 
Big Data Landscape 2019
Big Data Landscape 2019Big Data Landscape 2019
Big Data Landscape 2019
 
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
 
Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.Alexander Pavlenko, Java Software Engineer, DataArt.
Alexander Pavlenko, Java Software Engineer, DataArt.
 
Big Data Landscape 2019
Big Data Landscape 2019Big Data Landscape 2019
Big Data Landscape 2019
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
 
Big Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to ActionBig Data at Tube: Events to Insights to Action
Big Data at Tube: Events to Insights to Action
 
HUG France - Paris - Data Engineer's Toolkit
HUG France - Paris - Data Engineer's ToolkitHUG France - Paris - Data Engineer's Toolkit
HUG France - Paris - Data Engineer's Toolkit
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
 
Collecting Endpoint Security Logs Through Big Data Technology - Dedi Dwianto
Collecting Endpoint Security Logs Through Big Data Technology - Dedi DwiantoCollecting Endpoint Security Logs Through Big Data Technology - Dedi Dwianto
Collecting Endpoint Security Logs Through Big Data Technology - Dedi Dwianto
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!
 
Intro elasticsearch taswarbhatti
Intro elasticsearch taswarbhattiIntro elasticsearch taswarbhatti
Intro elasticsearch taswarbhatti
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conference
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
 
Presto for apps deck varada prestoconf
Presto for apps deck varada prestoconfPresto for apps deck varada prestoconf
Presto for apps deck varada prestoconf
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than...
 

Similar a Piyali Kamra - Analytics and Data Visualization pipeline backed by AWS Glue & Athena

Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0Databricks
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileRoy Kim
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of MetadataJim Dowling
 
Tordatasci meetup-precima-retail-analytics-201901
Tordatasci meetup-precima-retail-analytics-201901Tordatasci meetup-precima-retail-analytics-201901
Tordatasci meetup-precima-retail-analytics-201901WeCloudData
 
The Open Data Lake Platform Brief - Data Sheets | Whitepaper
The Open Data Lake Platform Brief - Data Sheets | WhitepaperThe Open Data Lake Platform Brief - Data Sheets | Whitepaper
The Open Data Lake Platform Brief - Data Sheets | WhitepaperVasu S
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSjavier ramirez
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Jason Dai
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)Amazon Web Services Korea
 
Azure BI Cloud Architectural Guidelines.pdf
Azure BI Cloud Architectural Guidelines.pdfAzure BI Cloud Architectural Guidelines.pdf
Azure BI Cloud Architectural Guidelines.pdfpbonillo1
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricCambridge Semantics
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Paulo Gutierrez
 
Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Nathan Bijnens
 
Big Data Engineering for Machine Learning
Big Data Engineering for Machine LearningBig Data Engineering for Machine Learning
Big Data Engineering for Machine LearningVasu S
 
Presto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 BostonPresto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 Bostonkbajda
 
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdfHow Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdfAddend Analytics
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformEva Tse
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Rukmani Gopalan
 

Similar a Piyali Kamra - Analytics and Data Visualization pipeline backed by AWS Glue & Athena (20)

Os Lonergan
Os LonerganOs Lonergan
Os Lonergan
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI Mobile
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Tordatasci meetup-precima-retail-analytics-201901
Tordatasci meetup-precima-retail-analytics-201901Tordatasci meetup-precima-retail-analytics-201901
Tordatasci meetup-precima-retail-analytics-201901
 
The Open Data Lake Platform Brief - Data Sheets | Whitepaper
The Open Data Lake Platform Brief - Data Sheets | WhitepaperThe Open Data Lake Platform Brief - Data Sheets | Whitepaper
The Open Data Lake Platform Brief - Data Sheets | Whitepaper
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
 
Azure BI Cloud Architectural Guidelines.pdf
Azure BI Cloud Architectural Guidelines.pdfAzure BI Cloud Architectural Guidelines.pdf
Azure BI Cloud Architectural Guidelines.pdf
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
 
Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018
 
Big Data Engineering for Machine Learning
Big Data Engineering for Machine LearningBig Data Engineering for Machine Learning
Big Data Engineering for Machine Learning
 
Presto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 BostonPresto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 Boston
 
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdfHow Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
 

Más de AWS Chicago

AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS Chicago
 
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...AWS Chicago
 
WilliamCollins_Road-to-Transit-Gateway.pptx
WilliamCollins_Road-to-Transit-Gateway.pptxWilliamCollins_Road-to-Transit-Gateway.pptx
WilliamCollins_Road-to-Transit-Gateway.pptxAWS Chicago
 
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdfSuresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdfAWS Chicago
 
Streamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
Streamlined Entitlements with AWS Lake Formation - Anusha DwivedulaStreamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
Streamlined Entitlements with AWS Lake Formation - Anusha DwivedulaAWS Chicago
 
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptxSteve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptxAWS Chicago
 
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptx
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptxSaurabh_Shanbhag - Building_SaaS_on_AWS.pptx
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptxAWS Chicago
 
Sanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfSanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfAWS Chicago
 
Ross Stuart_Using ML to Solve Lifes Problems.pptx
Ross Stuart_Using ML to Solve Lifes Problems.pptxRoss Stuart_Using ML to Solve Lifes Problems.pptx
Ross Stuart_Using ML to Solve Lifes Problems.pptxAWS Chicago
 
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdfrobsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdfAWS Chicago
 
Sanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfSanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfAWS Chicago
 
Mohamed Wali_AWS Security Reference Architecture.pptx
Mohamed Wali_AWS Security Reference Architecture.pptxMohamed Wali_AWS Security Reference Architecture.pptx
Mohamed Wali_AWS Security Reference Architecture.pptxAWS Chicago
 
Nick-Walter-HOB_Migrating_Dinosaurs.pptx
Nick-Walter-HOB_Migrating_Dinosaurs.pptxNick-Walter-HOB_Migrating_Dinosaurs.pptx
Nick-Walter-HOB_Migrating_Dinosaurs.pptxAWS Chicago
 
Pat_Davies_AWSCostOptimization_Final.pdf
Pat_Davies_AWSCostOptimization_Final.pdfPat_Davies_AWSCostOptimization_Final.pdf
Pat_Davies_AWSCostOptimization_Final.pdfAWS Chicago
 
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...AWS Chicago
 
MichaelSoule-UsingJupyterNotebooks.pptx
MichaelSoule-UsingJupyterNotebooks.pptxMichaelSoule-UsingJupyterNotebooks.pptx
MichaelSoule-UsingJupyterNotebooks.pptxAWS Chicago
 
Michal Brygidyn_CloudHackingScenarios.pdf
Michal Brygidyn_CloudHackingScenarios.pdfMichal Brygidyn_CloudHackingScenarios.pdf
Michal Brygidyn_CloudHackingScenarios.pdfAWS Chicago
 
Kamil Kolodziejski_Structura-AWS.pptx
Kamil Kolodziejski_Structura-AWS.pptxKamil Kolodziejski_Structura-AWS.pptx
Kamil Kolodziejski_Structura-AWS.pptxAWS Chicago
 
John Merline AWS Certification FAQ.pptx
John Merline AWS Certification FAQ.pptxJohn Merline AWS Certification FAQ.pptx
John Merline AWS Certification FAQ.pptxAWS Chicago
 
JuliaFMorgado_Breaking_bad_habits.pptx
JuliaFMorgado_Breaking_bad_habits.pptxJuliaFMorgado_Breaking_bad_habits.pptx
JuliaFMorgado_Breaking_bad_habits.pptxAWS Chicago
 

Más de AWS Chicago (20)

AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user group
 
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
Chicago AWS Solutions Architect Mehdy Haghy recaps the new AI/ML releases and...
 
WilliamCollins_Road-to-Transit-Gateway.pptx
WilliamCollins_Road-to-Transit-Gateway.pptxWilliamCollins_Road-to-Transit-Gateway.pptx
WilliamCollins_Road-to-Transit-Gateway.pptx
 
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdfSuresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
Suresh Poopandi_Generative AI On AWS-MidWestCommunityDay-Final.pdf
 
Streamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
Streamlined Entitlements with AWS Lake Formation - Anusha DwivedulaStreamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
Streamlined Entitlements with AWS Lake Formation - Anusha Dwivedula
 
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptxSteve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
Steve Seaney_AWS Control Tower - 2023 Midwest Community Day - Final.pptx
 
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptx
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptxSaurabh_Shanbhag - Building_SaaS_on_AWS.pptx
Saurabh_Shanbhag - Building_SaaS_on_AWS.pptx
 
Sanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfSanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdf
 
Ross Stuart_Using ML to Solve Lifes Problems.pptx
Ross Stuart_Using ML to Solve Lifes Problems.pptxRoss Stuart_Using ML to Solve Lifes Problems.pptx
Ross Stuart_Using ML to Solve Lifes Problems.pptx
 
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdfrobsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
robsable_Enhancing DevOps Practices with CloudWatch APM FINAL.pdf
 
Sanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdfSanket_Nasre_Simplify Modernization.pdf
Sanket_Nasre_Simplify Modernization.pdf
 
Mohamed Wali_AWS Security Reference Architecture.pptx
Mohamed Wali_AWS Security Reference Architecture.pptxMohamed Wali_AWS Security Reference Architecture.pptx
Mohamed Wali_AWS Security Reference Architecture.pptx
 
Nick-Walter-HOB_Migrating_Dinosaurs.pptx
Nick-Walter-HOB_Migrating_Dinosaurs.pptxNick-Walter-HOB_Migrating_Dinosaurs.pptx
Nick-Walter-HOB_Migrating_Dinosaurs.pptx
 
Pat_Davies_AWSCostOptimization_Final.pdf
Pat_Davies_AWSCostOptimization_Final.pdfPat_Davies_AWSCostOptimization_Final.pdf
Pat_Davies_AWSCostOptimization_Final.pdf
 
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
MARK GAMBLE_ASC For Really Remote Edge Computing - AWS Community Day Chicago ...
 
MichaelSoule-UsingJupyterNotebooks.pptx
MichaelSoule-UsingJupyterNotebooks.pptxMichaelSoule-UsingJupyterNotebooks.pptx
MichaelSoule-UsingJupyterNotebooks.pptx
 
Michal Brygidyn_CloudHackingScenarios.pdf
Michal Brygidyn_CloudHackingScenarios.pdfMichal Brygidyn_CloudHackingScenarios.pdf
Michal Brygidyn_CloudHackingScenarios.pdf
 
Kamil Kolodziejski_Structura-AWS.pptx
Kamil Kolodziejski_Structura-AWS.pptxKamil Kolodziejski_Structura-AWS.pptx
Kamil Kolodziejski_Structura-AWS.pptx
 
John Merline AWS Certification FAQ.pptx
John Merline AWS Certification FAQ.pptxJohn Merline AWS Certification FAQ.pptx
John Merline AWS Certification FAQ.pptx
 
JuliaFMorgado_Breaking_bad_habits.pptx
JuliaFMorgado_Breaking_bad_habits.pptxJuliaFMorgado_Breaking_bad_habits.pptx
JuliaFMorgado_Breaking_bad_habits.pptx
 

Último

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Último (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Piyali Kamra - Analytics and Data Visualization pipeline backed by AWS Glue & Athena

  • 2. Analytics Pipeline using Glue/Athena and Visualization Piyali Kamra| 2020 | Sr Principal Architect, Morningstar.
  • 3. Agenda Terminologies Classifier Crawler Data catalog Parquet data format Athena Architecture Demo Metrics Size/Execution time Non partitioned data Partitioned Data Raw CSV data SQL Queries Sample Visualization.
  • 4. Terminologies Crawlers crawl data sets and populate the Glue Data Catalog Glue Data Catalog - Meta data informationabout the schema , locationof data. ReadbyGlue ETL. Created bythe Crawler A GLUE job basically consist ofbusiness logic that performs ETL work. Glue is a managed service that uses apache spark.The boilerplate code is in python/ scala. Parquet is an open source fileformat available to any projectin the Hadoop ecosystem. Apache Parquet is designed for efficient as well as performant flatcolumnar storage format of data compared to row-based files like CSV or TSV files. Athena is a serverless interactivequery servicebased on the presto processing engine .Can run analytic queries on largedata sets from S3 buckets in csv / parquet /JSON formats.
  • 5.
  • 6. Setup for S3 Source buckets 1. Source raw data s3 bucket. 2. Partitionedsource rawdatas3 bucket. Screen Shot 2020-06-08 at 12.02.0
  • 7. Setup the Crawler and Data Catalog
  • 8. Setup the Glue ETL Job Parquet format data in target s3 bucket after the glue job is run
  • 9. Demo Recorded video Import new client - partitioned use case Run crawler - data catalog data update , schema update Run Glue Job - update spark code to include partition Check new partition in parquet S3 bucket. Athena query by updating partition. Visualization Options.
  • 10. Metrics – Athena backed by csv data
  • 11. Metrics – Athena backed by non partitionedparquet data
  • 12. Metrics – Athena backed by partitionedparquet data