The daily hype is all around you: cloud, multicloud, and hybrid cloud are presented as the path to your digital future. The choices you have to make don't preclude the daily work of enhancing your customers' experience and delivering those applications with agility. With all this delivery and infrastructure, there is a lot of data to consider when engaging with any cloud experience. Regulatory and compliance pressures force us to store audit and observability data. Understanding the pitfalls around the collection, storage, and maintenance of our data can mean the difference between bankruptcy and success with our cloud strategy. Let us take you on a journey, looking closely at the decisions being made for delivering and monitoring those applications. Join us for an hour of power, where real customer experiences highlight the top three lessons learned as these organizations transitioned their data needs into cloud environments.
OpenShift Commons Dublin 2022 - 3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
Eric D. Schabell
Portfolio Architect Technical Director
@ericschabell
Roel Hodzelmans
Manager, Specialist Solution Architecture
@roelhodzelmans
Question #1
Does your company know its bandwidth and storage needs?
Something to think about...
De Persgroep
(Based on an audience question from the 'Power to Innovate' talk)
“Why don’t you go 100% to cloud for hosting your news group assets?” The prices the public cloud providers quoted for hosting were OK, but the bandwidth quotes were off the charts.
Shortly thereafter, during the Charlie Hebdo attacks in Paris, the French public overwhelmed all local news sites. The Walloon (French-speaking Belgium) sites hosted by De Persgroep received an extra 1.2 million unique visitors (810K BE, 450K NL).
What would that have meant to your bandwidth costs in the cloud?
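To put a rough number on that question, here is a back-of-the-envelope sketch. Only the 1.2 million extra visitors comes from the story above; pages per visit, page weight, and the per-GB egress price are purely illustrative assumptions, not any provider's actual rates.

```python
# Rough egress-cost estimate for the traffic spike described above.
# Only the 1.2M extra visitors comes from the story; pages per visit,
# page weight, and the egress price are illustrative assumptions.

extra_visitors = 1_200_000      # extra unique visitors during the spike
pages_per_visit = 5             # assumed pages viewed per visitor
page_weight_mb = 3.0            # assumed average page weight in MB
egress_price_per_gb = 0.09      # assumed cloud egress price in USD/GB

extra_gb = extra_visitors * pages_per_visit * page_weight_mb / 1024
cost_usd = extra_gb * egress_price_per_gb
print(f"Extra egress: {extra_gb:,.0f} GB -> roughly ${cost_usd:,.0f}")
```

Even under these modest assumptions, a single spike adds a four-figure egress bill on top of normal traffic, and that is before any retry storms or media-heavy pages.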
Question #2
Will your organization have observability across its cloud landscape?
Something to think about...
“It’s remarkable how common this situation is, where an organization is paying more for their observability data than they do for their production infrastructure.”
-- The Growth of Observability Data is out of Control
Data complexity
Experiment:
- Hello World app on a 4-node Kubernetes cluster with tracing, End User Metrics (EUM), logs, and metrics (containers/nodes)
- 30 days == +450 GB
Question #3
Do you know (who observes) the cost of observability metrics data?
Something to think about...
Dedicated FinOps
“By 2023, 80% of organizations using cloud services will establish a dedicated FinOps function to automate policy-driven observability and optimization of cloud resources to maximize value.”
-- Source: IDC, 2022
Bonus Question #4
Does your organization have a baseline of your cloud landscape?
Something to think about...
1. Determine your goals for migrating data to the cloud
2. Assess your current situation
3. Select the right cloud migration partner
4. Create your business case for the cloud
5. Select the type of cloud environment needed – public, private, hybrid, or hybrid-multicloud?
6. Determine the specific cloud components necessary
7. Choose the right cloud provider
8. Plan the cloud approach
9. Execute the migration
10. Establish observability of the production environment
Don’t forget the baseline!
3 pitfalls you should understand when dealing with customers and prospects looking for your strategic insights into cloud data.
Cloud and data
Observability is data
FinOps the crucial Ops
The rest of this session will cover these three pitfalls after first setting the stage with definitions and positioning for cloud and cloud data.
It started with this session back in 2018. Five attendees insisted that we share this talk with their CIOs/CTOs onsite, resulting in a two-week tour through the Midwest in the US. This led to a book in the Dummies series and several articles online.
But this topic included one key element that kept gnawing at us: data. Since 2018, data has quietly become a huge issue for organizations using the cloud in any form and at any size. It's not what you think: storage is easy to solve for, or accept in the pricing picture. It's more about fully understanding what data in the cloud can mean for you.
This talk evolved from the roots of hybrid multicloud and explores the modern day pitfalls solely based on data in the cloud… all kinds of data in the cloud.
(The Red Hat Summit 2018 top rated session recording (https://youtu.be/eACHhV_uxTE) available and online free ebook download (https://www.redhat.com/en/engage/multicloud-portability-dummies-s-201903060959?sc_cid=701f20000012pHcAAI).)
The first pitfall is understanding that cloud providers make their money on the transportation of data: water in the pipeline == $$$. This means we need to rethink our architecture and use of the data pipeline; it's not just about storage.
Ask the audience to share feedback from their own accounts, customers, and personal experiences.
Example of a company in Belgium that decided not to go 100% into the cloud and instead hosts its own sites in data centers: based on the cloud pricing models, the story here would have meant bankruptcy.
http://powertoinnovate.nl/presentaties-powertoinnovate/customer-case-de-persgroep.pdf
From the presentation, the numbers showcase the extra load running on the BE sites; with cloud pricing it would have meant bankruptcy. Note: nothing is running in containers… wow.
http://powertoinnovate.nl/presentaties-powertoinnovate/customer-case-de-persgroep.pdf
The second pitfall is understanding that observability and metrics collection are cloud data.
Ask the audience to share feedback from their own accounts, customers, and personal experiences.
And for what purpose? If these organizations could draw a straight line from more data to better outcomes (higher levels of availability, happier customers, faster remediation, more revenue), this tradeoff might make sense. But in many cases, this isn't true. Paying more for logging/metrics/tracing doesn't equate to a positive user experience. Consider how much data can be generated and shipped: $$$. You still need good people to turn data into action.
It’s remarkable how common this situation is, where an organization is paying more for their observability data (typically metrics, logs, traces, and sometimes events), than they do for their production infrastructure. -- The Growth of Observability Data is out of Control
The observability data explosion will cause plenty of issues, not to mention costs… do you dare to flip the switch on new data collection?
An experiment:
A Hello World application was deployed to a four-node Kubernetes cluster on GKE. Load was generated using the script that comes with the app.
Additional scripting was written to scrape the Prometheus endpoints and record the size of the data payloads.
Another script accepted Jaeger tracing spans and EUM beacons, recording the size of the data payloads.
Fluentd collected all the logs and concatenated them all into one flat file. Using the timestamps from the log file, one hour was extracted into a new file, which was then measured.
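The hourly log-measurement step can be sketched like this. The ISO-8601 timestamp prefix and the function name are assumptions for illustration, not the actual script used in the experiment.

```python
from datetime import datetime, timedelta

def hour_slice_size(path, start):
    """Return the byte size of log lines whose timestamp falls in [start, start + 1h)."""
    end = start + timedelta(hours=1)
    total = 0
    with open(path) as fh:
        for line in fh:
            try:
                # Assumes each line starts with an ISO-8601 timestamp,
                # e.g. "2022-06-01T12:00:00 ...".
                ts = datetime.fromisoformat(line[:19])
            except ValueError:
                continue  # skip lines without a leading timestamp
            if start <= ts < end:
                total += len(line.encode())
    return total
```

Dividing that one-hour slice size back into the monthly total is what turns a single measurement into the volume projections below.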
Observability Data Volume: Tracing
At a rate of 1 trace per second, over 24 hours per day and 30 days in a month, the total number of traces is 2.5 million. The average trace size was 66kB. Therefore, the total data size for traces was 161GB. Looks like my estimate of fitting inside 100GB has already been proved wrong.
While tracing can be sampled at the source, that would mean throwing away nearly 40% of the data to fit inside the original estimate of 100GB.
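The sampling trade-off follows directly from the arithmetic. This quick sketch mirrors the figures above; the total lands near, but not exactly on, the reported 161 GB, depending on how the average trace size was rounded.

```python
# How much head-based sampling is needed to fit monthly traces into a
# storage budget. Inputs mirror the experiment: 1 trace/sec, ~66 kB each.
traces_per_month = 60 * 60 * 24 * 30            # 2,592,000 traces
avg_trace_kb = 66
total_gb = traces_per_month * avg_trace_kb / 1024 ** 2
budget_gb = 100

keep_rate = min(1.0, budget_gb / total_gb)
print(f"{total_gb:.0f} GB of traces; keep {keep_rate:.0%}, drop {1 - keep_rate:.0%}")
```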
Observability Data Volume: EUM
Each back-end call is triggered by a user interaction at the browser, which produces an EUM beacon – conveniently making the number of beacons generated the same as the number of traces – 2.5 million. The average size of an End User Metrics (EUM) beacon is a lot smaller at 397 bytes, making our total data size for a month of EUM beacons 1 GB.
Observability Data Volume: Logs
For logs, especially when it comes to data volumes, your mileage may vary – depending on your app, configuration settings, etc. The application logs generate quite a bit at INFO level, though not nearly as much as some other real-world applications. From the experiment, the log file size for one hour was 5 MB, making the total log volume for one month 3.4 GB.
Observability Data Volume: Metrics
Metrics were collected – using Prometheus – from every container, each worker node, and from kube-state-metrics for the cluster, giving a total of 1.1 MB per sample period. With a sample every ten seconds, that’s 259,200 samples per month, which results in a total data volume of 285 GB.
Total Observability Data Volumes
The grand total across all datasets is 452 GB per month for a simple Hello World application running on a small Kubernetes cluster.
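Those per-dataset numbers can be rebuilt from the rates and average sizes quoted above. This sketch lands close to, not exactly on, the reported figures, because the quoted averages are rounded.

```python
# Monthly observability data volumes, rebuilt from the experiment's rates.
GB = 1024 ** 3
events = 60 * 60 * 24 * 30                  # one trace/beacon per second, 30 days

volumes_gb = {
    "traces":  events * 66 * 1024 / GB,                 # ~66 kB per trace
    "eum":     events * 397 / GB,                       # ~397 B per beacon
    "logs":    5 * 1024 ** 2 * 24 * 30 / GB,            # ~5 MB of logs per hour
    "metrics": (events / 10) * 1.1 * 1024 ** 2 / GB,    # ~1.1 MB per 10 s sample
}
total = sum(volumes_gb.values())
for name, gb in volumes_gb.items():
    print(f"{name:>7}: {gb:6.1f} GB")
print(f"  total: {total:6.1f} GB")
```

The striking part is the split: metrics and traces dominate, while EUM and logs barely register, so cost-reduction effort should start with the two big datasets.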
A note on data granularity: as you may or may not know, Instana collects all metrics at 1-second granularity. Doing this with Prometheus would devastatingly skew the experiment results, since Prometheus has none of the optimizations built into the Instana sensors and agents. Thus, the experiment used a 10-second sample rate for Prometheus metrics. The load generation script produces one request per second to the application back-end services.
(Source: The Hidden Cost of Data Observability)
Most companies default to 13 months of retention for all data. But in a modern cloud native architecture, where we deploy multiple times a day and a container is only around for a couple of hours, a huge amount of that observability data does not need to be retained for 13 months. One tactic for reducing your data footprint is setting the optimal retention period for each data type. For example, you might only need to keep observability data from your lab environment for two weeks if the environment is torn down and rebuilt on a bi-weekly basis anyway.
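The retention tactic is easy to reason about: steady-state storage footprint is daily ingest times retention days, per data type. A sketch with purely illustrative ingest rates (these are not the experiment's numbers):

```python
# Steady-state storage footprint = daily ingest (GB) x retention (days),
# summed per data type. The daily ingest rates here are illustrative only.
daily_gb = {"traces": 5.4, "eum": 0.03, "logs": 0.12, "metrics": 9.5}

def footprint(retention_days):
    """Total GB held at steady state for the given per-type retention."""
    return sum(daily_gb[k] * days for k, days in retention_days.items())

default = footprint({k: 13 * 30 for k in daily_gb})   # 13 months for everything
tuned = footprint({"traces": 14, "eum": 30, "logs": 30, "metrics": 90})
print(f"13-month default: {default:,.0f} GB; tuned: {tuned:,.0f} GB")
```

Trimming retention per type cuts the steady-state footprint by roughly 6x in this sketch, without dropping any recent, actionable data.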
Source -- The Growth of Observability Data is out of Control
The third pitfall is how crucial FinOps is going to be.
Ask the audience to share feedback from their own accounts, customers, and personal experiences.
A banking customer's OpSec team wanted to leverage the cloud provider's observability in the load balancers by using one load balancer label per application. However, the number of labels per load balancer was limited, so even though utilization was < 10%, the dev team had to provision more load balancers (one of the most expensive components in the cloud) to meet that expectation. After many escalations that was solved. However, they then ran into another issue: the load balancers supported only a limited number of context paths. So again they had to multiply the number of load balancers, without ever hitting the traffic limit. The alternative, a simple NGINX behind the cloud load balancer, was not permitted because of lifecycle management (LCM): nobody wanted to own LCM for that instance.
Who observes the cost of subpar architectural decisions? Auditing, monitoring, tracing: beautiful capabilities, highly necessary for proper observation of the health of the app and its ability to serve our customers in a timely and secure manner. But if each customer engagement for a purchase means N logs, then those will grow rapidly. Who owns that data strategy?
It looks like we’ll be kicking off with yet another buzzword in this industry: FinOps.
Many of us are re-architecting our apps and development to be cloud native and our ops teams to be platform providers, building an SRE organization to close the feedback loop between platform consumers and providers. We talk (hopefully) about data portability, exit strategies, and baselines. We talk about security, LCM, and utilization. And the last three should include not "just" the customer data - though that is extremely important - but also the auditing, tracing, and logging data strategies and architectures.
Netflix has already talked about this; how many have actually implemented a strategy?
Semi-related (they use auto-remediation, but it's still a huge amount of data): https://www.infoq.com/presentations/netflix-streaming-data-infrastructure/
Ask the audience to share feedback from their own accounts, customers, and personal experiences.
A thorough assessment of your current situation is imperative, as it will lay the foundation for many important decisions you’ll need to make. A deep understanding of what applications you need to migrate to the cloud, your current IT environment, and the present level of resources and costs will help you make informed choices.
A banking customer asked Red Hat to pilot a business case for public cloud, OCP, and OSP, looking at the cost of running containers vs. running VMs. Without a baseline you can't scope anything. A government agency running containers on OCP likewise had no baseline when asked. All decisions depend on this! BTW: that banking customer has now decided to go full cloud, just leaving the lights on in the traditional DC. Their own team says it's a bad idea, but they are going anyway.
What are the critical apps? What needs clustering? What can run in the cloud (certified) and what cannot?
https://www.thorntech.com/2016/07/10-steps-cloud-computing-migration/#execute