Splunk Cloud and Splunk Enterprise 7.2 provide enhanced capabilities for data ingestion, visualization, and analytics powered by artificial intelligence and machine learning. New features include guided data onboarding, metrics search performance improvements, smart data tiering for cost optimization, and accessibility enhancements. These updates aim to empower more users and accelerate business value from machine learning.
Splunk Cloud and Splunk Enterprise 7.2 provide breakthrough performance, scale, and manageability. Key features include SmartStore for cost-effective data management, workload management to prioritize analytics workloads, and accessibility enhancements to enable more users. The release also expands AI/ML capabilities and delivers intuitive metrics visualization and search.
Splunk Cloud and Splunk Enterprise 7.2 provide enhanced capabilities for data ingestion, visualization, and analytics powered by artificial intelligence and machine learning. New features include guided data onboarding, metrics search performance improvements, workload management for prioritizing queries, and accessibility enhancements. The presentation highlights how these updates help users gain more insights from their machine data and empower more people to explore and analyze data.
What's New with the Latest Splunk Platform Release | Splunk
This presentation + demo provides an overview of Splunk Cloud and Splunk Enterprise version 7.2, and Splunk Machine Learning Toolkit 4.0 – the customer value proposition, supporting customer stories, and high-level technical details.
All the News in the Latest Platform Release | Splunk
This session and demo provides an overview of Splunk Cloud and Splunk Enterprise version 7.2 and Splunk Machine Learning Toolkit 4.0: value for the user, customer examples, and high-level technical details.
Splunk Discovery: Warsaw 2018 - Legacy SIEM to Splunk, How to Conquer Migrati... | Splunk
Presented at Splunk Discovery Warsaw 2018:
SIEM Replacement Methodology
Use Cases
Data Sources & Data Onboarding
Architecture
Third Party Integration
You Got This!
During the presentation, forward-looking statements were made regarding Splunk's plans and estimates that are subject to risks and uncertainties. Any information about Splunk's roadmap outlines general product direction but is subject to change without notice. Splunk undertakes no obligation to develop or include any described feature in a future release. The presentation demonstrated Splunk's IoT analytics capabilities for manufacturing including predictive maintenance, advanced monitoring, and self-service analytics.
The document discusses how Staples uses Splunk for operational support, application insights, and business intelligence across their infrastructure. Staples relies on Splunk for real-time visibility into the health of their Advantage website and business/operational analytics. Splunk provides comprehensive insights into Staples' infrastructure and helps map application performance to user experience. It has saved Staples numerous times by quickly detecting issues. Adoption of Splunk at Staples has grown organically as more teams see its benefits.
Splunk Webinar: IT Operations Demo for Troubleshooting & Dashboarding | Georg Knon
This document provides an overview of Splunk's IT operations software. It discusses the challenges facing IT operations, including siloed tools and reactive problem solving. It presents Splunk as a solution, with its ability to index and analyze machine data from any source in real-time. Key benefits highlighted include faster troubleshooting to reduce downtime, proactive monitoring to address issues before they become problems, and increased operational visibility across the IT environment. The document concludes with a demonstration of Splunk's IT service intelligence capabilities.
SplunkLive! Zurich 2018: Legacy SIEM to Splunk, How to Conquer Migration and ... | Splunk
This document provides an overview of best practices for migrating from a legacy SIEM to Splunk Enterprise Security. It discusses identifying high-value use cases to prioritize for migration. Proper data source onboarding using technologies like the Universal Forwarder and Technology Add-ons is also covered. The presentation recommends planning the target architecture and identifying any necessary third-party integrations. Some preparatory steps customers can take today to get ready for the replacement are also listed.
SplunkLive! Zurich 2018: Monitoring the End User Experience with Splunk | Splunk
This document discusses using Splunk to gain insights into end user experience and the factors that influence experience. Splunk provides a platform approach to monitor applications across the full technology stack from networks to databases. It can ingest data from various sources, including APM tools, and provide visibility into both instrumented and non-instrumented applications and environments. Splunk also offers predictive analytics capabilities and allows various stakeholders like operations and business teams to access and analyze data. The document demonstrates how Splunk can help organizations improve user experience, application performance, and collaboration between teams.
Monitoring End User Experiences with New Relic & Splunk | Abner Germanow
When your digital experience is your brand experience, understanding what your customers go through is critical. Troubleshooting and optimizing their experiences requires visibility into metrics, traces and logs. In this session, we'll demonstrate how to use the combined power of New Relic's real-user monitoring and application performance monitoring with Splunk to keep teams focused on identifying issues before customers tweet, fixing problems fast and knowing what to tackle next.
SplunkLive! Frankfurt 2018 - Legacy SIEM to Splunk, How to Conquer Migration ... | Splunk
Presented at SplunkLive! Frankfurt 2018:
Introduction
SIEM Migration Methodology
Use Cases
Datasources & Data Onboarding
ES Architecture
Third-Party Integrations
You Got This!
Thanks for coming out to the first PNW user group of 2023, and our first IN PERSON user group in a couple years!
Dan Hogland caught us up on the latest Enterprise Security updates, Melissa Riley brought the best strategies to leverage FREE Splunk Education (and the Academic Alliances program for all you universities who joined us!) and we welcomed new User Group leader Rob de Luna.
See you in a couple of months, in person in Seattle!
Splunk Data Onboarding Overview - Splunk Data Collection Architecture | Splunk
Splunk's Naman Joshi and Jon Harris presented the Splunk Data Onboarding overview at SplunkLive! Sydney. This presentation covers:
1. Splunk Data Collection Architecture
2. Apps and Technology Add-ons
3. Demos / Examples
4. Best Practices
5. Resources and Q&A
The very first in-person PNW Splunk user group in Seattle since before the pandemic! REI's Michael Bunner brought us a constellation of automation patterns, and Splunk's Rob de Luna walked us through Edge Processor. It was so great to see folks out and about and getting into deep discussion! Next in-person meeting in PDX. Sponsored by Arcus Data.
An overview of Splunk Enterprise 6.3. Presented by Splunk's Jim Viegas at GTRI's Splunk Tech Day, December 8, 2015.
Visit http://www.gtri.com/ for more information.
Covering off some of the latest announcements at Splunk's user conference (.conf), an Add-on created to Splunk config files and also the presentation delivered at .conf18 on SplDevOps!
Splunk4Rookies - Attendee - May 2023.pdf | djdhhdddhhd
This document discusses creating a dashboard in Splunk with four views to meet the needs of different teams at a company. The IT operations team needs a view showing successful and unsuccessful web server requests over time. The DevOps team needs views of the most common customer operating systems and web browsers experiencing failures. The security/fraud team needs to see website activity by geographic location. Instructions are provided to create searches and visualizations to populate these views on a dashboard for multiple use cases.
SplunkLive! Zurich 2018: Integrating Metrics and Logs | Splunk
This document discusses integrating metrics and logs in Splunk for enhanced troubleshooting and monitoring. It provides an overview of metrics and how they are defined, compared to events. Metrics support in Splunk allows for more efficient aggregation, storage, and analysis of time-series data. Example use cases mentioned include IT operations, application performance monitoring, and IoT. Pricing is still based on uncompressed data volume ingested, with each metrics measurement licensed at around 150 bytes.
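To make the licensing figure in the summary above concrete, here is a minimal arithmetic sketch. It assumes a flat 150 bytes per metrics measurement (the summary says "around 150 bytes") and a hypothetical ingestion rate; the function name and the rate are illustrative, not taken from the presentation.

```python
# Approximate licensed size of one metrics measurement, per the summary above.
BYTES_PER_MEASUREMENT = 150
SECONDS_PER_DAY = 86_400


def daily_license_bytes(measurements_per_second, bytes_per_measurement=BYTES_PER_MEASUREMENT):
    """Estimate the daily licensed volume in bytes for a steady measurement rate."""
    return measurements_per_second * SECONDS_PER_DAY * bytes_per_measurement


# Hypothetical workload: 1,000 measurements per second, sustained for a day.
estimate = daily_license_bytes(1_000)
print(f"{estimate / 1e9:.2f} GB/day")
```

Under these assumptions the estimate scales linearly with the measurement rate, so halving the collection frequency halves the licensed volume.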
Get an updated 2019 introduction to Sentinel, HashiCorp's policy as code framework. See demos of Sentinel policies inside Terraform, Consul, Nomad, and Vault, and learn about upcoming features.
SplunkLive! London - Splunk App for Stream & MINT Breakout | Splunk
The document discusses new features in Splunk's App for Stream and Splunk MINT. It introduces the Splunk App for Stream, which enables real-time insights into private, public and hybrid cloud infrastructures through efficient wire data capture. It also discusses Splunk for Mobile Intelligence (MINT), which provides mobile analytics capabilities. The document promotes these products as enhancing operational intelligence through efficient and cloud-ready wire data collection.
The document provides an overview of the Splunk data platform. It discusses how Splunk helps organizations overcome challenges in turning real-time data into action. Splunk provides a single platform to investigate, monitor, and take action on any type of machine data from any source. It enables multiple use cases across IT, security, and business domains. The document highlights some of Splunk's products, capabilities, and customer benefits.
SplunkLive! London 2017 - Happy Apps, Happy Users | Splunk
No matter what business you’re in, your web applications are front-and-center for your customers. Downtime, or even poor performance, not only creates a spike in costs but often translates into lost customers and revenue. You need immediate insight into the availability, performance and usage of your applications and the infrastructure they run on. In this session, you will first learn why you need to take a platform approach to full stack application management, whether your applications reside on-premises or in the cloud. Second, we will show you how you can use Splunk to monitor the usage and performance of your applications, and quickly troubleshoot faults by stepping through some of the most common issues our customers experience. Third, we’ll contrast what Splunk does relative to other APM tools you may already have deployed, and even show you how you can bring APM data into Splunk to gain more insight into application performance.
SFBA Splunk Usergroup meeting December 14, 2023 | Becky Burwell
The summary provides an overview of the key topics and announcements from the Splunk User Group meeting:
1. The meeting will start at 11:10 am PST with a welcome and announcements before speakers present.
2. Upcoming meeting dates and locations for 2023 are provided, including a virtual meeting in March 2023.
3. The presentation will cover writing documentation for Splunk, including administrator documentation, user documentation, and documenting known issues. Tips are provided about iterating on documentation.
Similar to SFBA Usergroup meeting November 2, 2022 (20)
The document discusses a Splunk User Group meeting where the CISO of Los Angeles discussed the importance of automation and intelligence to act on threats. It then provides an overview of threat intelligence and how Recorded Future collects and organizes data from various sources to understand the threat landscape. Finally, it describes how the Recorded Future integration with Splunk can help accelerate security workflows like investigation, automation, and strategic planning.
SFBA Splunk User Group Meeting February 2023 | Becky Burwell
This presentation provides an overview of Splunk apps and how to build Splunk addons. It discusses the different types of Splunk apps and addons, such as modular inputs, parsing configurations, and custom search commands. It also covers ways to build addons using the UCC framework or Addon Builder, as well as how to package and vet apps using CLI commands, APIs, and the packaging toolkit. Resources for learning app development are also provided.
SFBA Splunk Usergroup meeting December 2022 | Becky Burwell
This presentation discusses Splunk Ideas, a program that allows users to submit enhancement requests for Splunk products. It provides metrics on the number of ideas submitted, voted on, and implemented. The presentation outlines the lifecycle of an idea from submission to implementation. It also discusses upcoming improvements to Splunk Ideas including customer champions, newsletters, and better response rates.
SF Bay Area Splunk User Group Meeting October 5, 2022 | Becky Burwell
Andrew D'Auria, the Director of Sales Engineering at Anvilogic, gave a presentation on modernizing threat detection engineering. He discussed problems with the current detection engineering process, including that it is slow, results in noisy alerts, and lacks coordination across tools. D'Auria proposed using Anvilogic's platform to build detections based on MITRE ATT&CK techniques and scenarios, correlate events of interest without code, and measure detection program effectiveness to improve security operations. He provided examples of how Anvilogic helped a financial client improve detections and reduce alerts.
SFBA Splunk User Group Meeting August 10, 2022 | Becky Burwell
The document summarizes the agenda and presentations for the August SF Bay Area Splunk User Group meeting. Ryan O'Connor gave a presentation on Dashboard Studio and the Splunk UI. He discussed why to build with Dashboard Studio, how to quickly customize dashboards, reduce searches, and tips for building with Dashboard Studio. Rinita Datta then presented on driving customer success through self-service resources like the Adoption Boards, signing up for tech talks and newsletters, and finding guidance on Splunk Lantern.
Getting Started with Splunk Observability September 8, 2021 | Becky Burwell
This document provides an introduction to getting started with Splunk Observability, including setting up a Splunk Observability trial, installing integrations for Windows, Linux, and GCP, and collecting events and metrics from cloud and observability systems. It also references a workshop for further guidance and discusses plans to get the Gateway installation working and collecting more data.
Advanced Outlier Detection and Noise Reduction with Splunk & MLTK August 11, ... | Becky Burwell
This document provides an overview of advanced outlier detection and noise reduction techniques using Splunk and the Machine Learning Toolkit (MLTK). It discusses common ways to detect outliers including static thresholds, moving averages, density functions, and combining multiple methods. Ensemble learning and clustering algorithms are also introduced as ways to increase outlier detection accuracy.
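As a rough illustration of two of the methods named above (a static threshold and a moving average) and of combining them, here is a minimal sketch in plain Python. It does not use Splunk or the MLTK; the function names, window size, multiplier, and sample data are all assumptions made for the example.

```python
from statistics import mean, stdev


def static_threshold_outliers(values, upper):
    """Flag indices whose value exceeds a fixed upper bound."""
    return [i for i, v in enumerate(values) if v > upper]


def moving_average_outliers(values, window=5, k=3.0):
    """Flag indices that deviate from the trailing-window mean by more than k standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # the `window` points before index i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                flagged.append(i)
        elif abs(values[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


def combined_outliers(values, upper, window=5, k=3.0):
    """Keep only points flagged by both methods, trading recall for fewer noisy alerts."""
    by_threshold = set(static_threshold_outliers(values, upper))
    by_average = set(moving_average_outliers(values, window, k))
    return sorted(by_threshold & by_average)


# Hypothetical response times with one obvious spike at index 6.
samples = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
print(combined_outliers(samples, upper=40))
```

Requiring agreement between methods is only one way to combine them; taking the union instead would catch more anomalies at the cost of more noise.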
The Ipsos - AI - Monitor 2024 Report.pdf | Social Samosa
According to the Ipsos AI Monitor 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily lives in the past 3-5 years.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... | Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Build applications with generative AI on Google Cloud | Márton Kodok
We will explore experiences powered by Vertex AI Model Garden and learn more about integrating these generative AI APIs. We will see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use the API to:
- execute prompts in text and chat
- cover multimodal use cases with image prompts
- fine-tune and distill to improve knowledge domains
- run function calls with foundation models to optimize them for specific tasks
At the end of the session, developers will understand how to innovate with generative AI and develop apps in line with generative AI industry trends.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake | Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world, where data privacy and compliance are top priorities for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences. (3) They are context-aware, encoding a different set of transformations for different use cases. (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance.
End-to-end pipeline agility - Berlin Buzzwords 2024 | Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.