SlideShare a Scribd company logo
1 of 14
Download to read offline
Using Airflow to speed up
development of data intensive tools
Blaine Elliott
Data Engineer @ One Medical
Twitter: @blainee
Airflow Summit
July 10th, 2020
Purpose of this talk?
● To demonstrate how Airflow can help you build
new tools
● Inspire others to do the same
Who am I?
● Data Engineer @ One Medical
● Formerly @ LinkedIn, Chegg, MySpace
Intro...
Proprietary and ConfidentialOne Medical
● A tool to detect data anomalies
● The architecture of this tool
...also how the tool communicates with Airflow
● How Airflow decreased the cost to develop this tool
3
What are we going to cover in this talk?
Airflow Summit - Summer 2020
Proprietary and ConfidentialOne Medical
● At One Medical, we consume and create a lot of data
● We want to find bad data before it’s passed on to analysts
● We’re lazy engineers
4
Setting up the problem...
Airflow Summit - Summer 2020
Proprietary and ConfidentialOne Medical
● Needs to detect abnormal data
● Can scale to thousands of tables and columns
● Cost to develop the tool is minimized
5
Feature requirements for our Data Anomaly Detector(“DAD Tool”)
Airflow Summit - Summer 2020
Proprietary and ConfidentialOne Medical 6
● The ability to do statistical analysis
● Storage to persist data & test results
● UI/UX to manage the tool, create tests, & analyze results
● Database interoperability
(authentication, communication)
● The ability to run thousands of tests per day
● Must be secure
(must pass a security audit)
What is need to make this work?
Airflow Summit - Summer 2020
Proprietary and ConfidentialOne Medical
Airflow steps…
1. Create dynamic DAGs
2. Tell Airflow to run our DAGs
3. Process the DAGs
4. Send results to the DAD Tool
7
The Data Anomaly Detector(“DAD Tool”)
Airflow Summit - Summer 2020
Proprietary and ConfidentialOne Medical 8
Airflow Integration (4 steps)
Airflow Summit - Summer 2020
1. Send SQL to Airflow, as a text file on S3
2. Send request to Airflow to process DAGs
3. Process DAGs
4. Send test results to DAD Tool, as a pickled file on S3
Proprietary and ConfidentialOne Medical
1. User defines a test
Ex, all values in a time series must be within X σ’s of the mean.
2. User applies the test to a column
Ex, Using our new test, set threshold to 3-σ’s, use the table patients
w/the column systolic_blood_pressure for the most recent 90 days.
3. The DAD Tool + Airflow processes all the things
4. User analyzes results in the DAD Tool UI
9
Anatomy of a test
Airflow Summit - Summer 2020
Proprietary and ConfidentialOne Medical
● Needs to detect abnormal data
● Can scale to thousands of tables and columns
● Cost to develop the tool is minimized
10
Requirements Review
Airflow Summit - Summer 2020
Proprietary and ConfidentialOne Medical 11
● The complexity of Airflow is hidden from users
● Using Airflow for part of the backend processing of the DAD Tool
significantly decreased development time
● Because Airflow was already actively used at One Medical, desirable
features already available in Airflow could be made available to the
DAD Tool
● Time that would have been spent building features in Airflow were
repurposed to improve the DAD Tool
Conclusions
Airflow Summit - Summer 2020
Proprietary and ConfidentialOne Medical
● No need to manage database authentication
● Databases configured in Airflow are immediately available to the
DAD Tool
● Parallelism is managed by Airflow
● Throttling is managed by Airflow
● Since Airflow already passed our security audit, minimal effort was
needed to get approved to leverage Airflow in the DAD Tool
12
List of Airflow features that enable the DAD Tool
Airflow Summit - Summer 2020
Proprietary and ConfidentialOne Medical
Q. Why not use XCOM?
A. Using S3 (an any other object store) is stateful, fault tolerant and avoids
any limitations on how much data is being transferred.
Q. Is the DAD Tool open source?
A. Not currently but I am working towards that goal.
Answers to common questions
13
Airflow Summit - Summer 2020
Thank you
Blaine Elliott
Sr Data Engineer @ One Medical
Twitter: @blainee
Airflow Summit
July 10th, 2020

More Related Content

Similar to Using airflow for tools development

Multiple awr reports_parser
Multiple awr reports_parserMultiple awr reports_parser
Multiple awr reports_parserJacques Kostic
 
Universal test solutions customer testimonial 10192013-v2.2
Universal test solutions customer testimonial 10192013-v2.2Universal test solutions customer testimonial 10192013-v2.2
Universal test solutions customer testimonial 10192013-v2.2Universal Technology Solutions
 
Major Project Report on Designing an Android Application for Electrical Maint...
Major Project Report on Designing an Android Application for Electrical Maint...Major Project Report on Designing an Android Application for Electrical Maint...
Major Project Report on Designing an Android Application for Electrical Maint...Amit Kumar
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupJelena Zanko
 
Project Management Sample
Project Management SampleProject Management Sample
Project Management SampleRavi Nakulan
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsDataKitchen
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOpsChristopher Bergh
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsDenodo
 
Monitoring MongoDB Atlas with Datadog
Monitoring MongoDB Atlas with DatadogMonitoring MongoDB Atlas with Datadog
Monitoring MongoDB Atlas with DatadogMongoDB
 
Technology Primer: Monitor Microservices, Containers, Cloud Foundry and Node ...
Technology Primer: Monitor Microservices, Containers, Cloud Foundry and Node ...Technology Primer: Monitor Microservices, Containers, Cloud Foundry and Node ...
Technology Primer: Monitor Microservices, Containers, Cloud Foundry and Node ...CA Technologies
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
 
Splunk bangalore user group 2020-06-01
Splunk bangalore user group   2020-06-01Splunk bangalore user group   2020-06-01
Splunk bangalore user group 2020-06-01NiketNilay
 
Technology Primer: Hey IT—Your Big Data Infrastructure Can’t Sit in a Silo An...
Technology Primer: Hey IT—Your Big Data Infrastructure Can’t Sit in a Silo An...Technology Primer: Hey IT—Your Big Data Infrastructure Can’t Sit in a Silo An...
Technology Primer: Hey IT—Your Big Data Infrastructure Can’t Sit in a Silo An...CA Technologies
 

Similar to Using airflow for tools development (20)

Industrial IoT bootcamp
Industrial IoT bootcampIndustrial IoT bootcamp
Industrial IoT bootcamp
 
Multiple awr reports_parser
Multiple awr reports_parserMultiple awr reports_parser
Multiple awr reports_parser
 
Resume (1)
Resume (1)Resume (1)
Resume (1)
 
Resume (1)
Resume (1)Resume (1)
Resume (1)
 
Universal test solutions customer testimonial 10192013-v2.2
Universal test solutions customer testimonial 10192013-v2.2Universal test solutions customer testimonial 10192013-v2.2
Universal test solutions customer testimonial 10192013-v2.2
 
Major Project Report on Designing an Android Application for Electrical Maint...
Major Project Report on Designing an Android Application for Electrical Maint...Major Project Report on Designing an Android Application for Electrical Maint...
Major Project Report on Designing an Android Application for Electrical Maint...
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid Meetup
 
Project Management Sample
Project Management SampleProject Management Sample
Project Management Sample
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataops
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOps
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
Monitoring MongoDB Atlas with Datadog
Monitoring MongoDB Atlas with DatadogMonitoring MongoDB Atlas with Datadog
Monitoring MongoDB Atlas with Datadog
 
Technology Primer: Monitor Microservices, Containers, Cloud Foundry and Node ...
Technology Primer: Monitor Microservices, Containers, Cloud Foundry and Node ...Technology Primer: Monitor Microservices, Containers, Cloud Foundry and Node ...
Technology Primer: Monitor Microservices, Containers, Cloud Foundry and Node ...
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
 
Dagster @ R&S MNT
Dagster @ R&S MNTDagster @ R&S MNT
Dagster @ R&S MNT
 
Splunk bangalore user group 2020-06-01
Splunk bangalore user group   2020-06-01Splunk bangalore user group   2020-06-01
Splunk bangalore user group 2020-06-01
 
Technology Primer: Hey IT—Your Big Data Infrastructure Can’t Sit in a Silo An...
Technology Primer: Hey IT—Your Big Data Infrastructure Can’t Sit in a Silo An...Technology Primer: Hey IT—Your Big Data Infrastructure Can’t Sit in a Silo An...
Technology Primer: Hey IT—Your Big Data Infrastructure Can’t Sit in a Silo An...
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Using airflow for tools development

  • 1. Using Airflow to speed up development of data intensive tools Blaine Elliott Data Engineer @ One Medical Twitter: @blainee Airflow Summit July 10th, 2020
  • 2. Purpose of this talk? ● To demonstrate how Airflow can help you build new tools ● Inspire others to do the same Who am I? ● Data Engineer @ One Medical ● Formerly @ LinkedIn, Chegg, MySpace Intro...
  • 3. Proprietary and ConfidentialOne Medical ● A tool to detect data anomalies ● The architecture of this tool ...also how the tool communicates with Airflow ● How Airflow decreased the cost to develop this tool 3 What are we going to cover in this talk? Airflow Summit - Summer 2020
  • 4. Proprietary and ConfidentialOne Medical ● At One Medical, we consume and create a lot of data ● We want to find bad data before it’s passed on to analysts ● We’re lazy engineers 4 Setting up the problem... Airflow Summit - Summer 2020
  • 5. Proprietary and ConfidentialOne Medical ● Needs to detect abnormal data ● Can scale to thousands of tables and columns ● Cost to develop the tool is minimized 5 Feature requirements for our Data Anomaly Detector(“DAD Tool”) Airflow Summit - Summer 2020
  • 6. Proprietary and ConfidentialOne Medical 6 ● The ability to do statistical analysis ● Storage to persist data & test results ● UI/UX to manage the tool, create tests, & analyze results ● Database interoperability (authentication, communication) ● The ability to run thousands of tests per day ● Must be secure (must pass a security audit) What is need to make this work? Airflow Summit - Summer 2020
  • 7. Proprietary and ConfidentialOne Medical Airflow steps… 1. Create dynamic DAGs 2. Tell Airflow to run our DAGs 3. Process the DAGs 4. Send results to the DAD Tool 7 The Data Anomaly Detector(“DAD Tool”) Airflow Summit - Summer 2020
  • 8. Proprietary and ConfidentialOne Medical 8 Airflow Integration (4 steps) Airflow Summit - Summer 2020 1. Send SQL to Airflow, as a text file on S3 2. Send request to Airflow to process DAGs 3. Process DAGs 4. Send test results to DAD Tool, as a pickled file on S3
  • 9. Proprietary and ConfidentialOne Medical 1. User defines a test Ex, all values in a time series must be within X σ’s of the mean. 2. User applies the test to a column Ex, Using our new test, set threshold to 3-σ’s, use the table patients w/the column systolic_blood_pressure for the most recent 90 days. 3. The DAD Tool + Airflow processes all the things 4. User analyzes results in the DAD Tool UI 9 Anatomy of a test Airflow Summit - Summer 2020
  • 10. Proprietary and ConfidentialOne Medical ● Needs to detect abnormal data ● Can scale to thousands of tables and columns ● Cost to develop the tool is minimized 10 Requirements Review Airflow Summit - Summer 2020
  • 11. Proprietary and ConfidentialOne Medical 11 ● The complexity of Airflow is hidden from users ● Using Airflow for part of the backend processing of the DAD Tool significantly decreased development time ● Because Airflow was already actively used at One Medical, desirable features already available in Airflow could be made available to the DAD Tool ● Time that would have been spent building features in Airflow were repurposed to improve the DAD Tool Conclusions Airflow Summit - Summer 2020
  • 12. Proprietary and ConfidentialOne Medical ● No need to manage database authentication ● Databases configured in Airflow are immediately available to the DAD Tool ● Parallelism is managed by Airflow ● Throttling is managed by Airflow ● Since Airflow already passed our security audit, minimal effort was needed to get approved to leverage Airflow in the DAD Tool 12 List of Airflow features that enable the DAD Tool Airflow Summit - Summer 2020
  • 13. Proprietary and ConfidentialOne Medical Q. Why not use XCOM? A. Using S3 (an any other object store) is stateful, fault tolerant and avoids any limitations on how much data is being transferred. Q. Is the DAD Tool open source? A. Not currently but I am working towards that goal. Answers to common questions 13 Airflow Summit - Summer 2020
  • 14. Thank you Blaine Elliott Sr Data Engineer @ One Medical Twitter: @blainee Airflow Summit July 10th, 2020