The document discusses the challenges of monitoring dynamic cloud applications where resources are constantly changing. Traditional monitoring of servers is not sufficient, as resources may not exist for long periods. Effective monitoring requires tracking how resources are provisioned and utilized over time, as well as both static and dynamic monitoring from the application to infrastructure layers. This allows visibility into how dynamic resources are working and being used.
45. Visibility Requires Full Stack, Static & Dynamic Monitoring
[Diagram: stack layers Browser, Mobile, Application & Microservices, Provisioning, Server OS, Server (Virtual), Hardware; monitoring labels Customer Experience Monitoring, Application Monitoring, Dynamic Cloud Monitoring, Infrastructure Monitoring, Dashboards]
• Top to bottom monitoring of entire application
• Static monitoring of how resources are used
• Dynamic monitoring of how resources are provisioned and utilized
It’s your big day.
The day of the new product launch.
Black Friday
Election Day.
The big game. The big event.
{c} Whatever day, it’s the busiest day of the year.
{c} The day of the year when your company either…makes it or breaks it…
You hold your breath…
Can you scale?
{c}Can you stay operational?
{c}Will you survive?
Your customers expect your product to be available. They expect your product to meet their needs.
They expect you to work…all the time…
A failure of your application is a disappointment to your customer.
A disappointment to your customer is an unhappy customer experience.
Your customers aren’t happy when you fail…
Unhappy customers don’t buy from you.
And, most importantly, unhappy customers tell other people.
They have a right to complain, and complain they do.
Maintaining application availability, properly scaled, is critical to keeping your customers happy…
…and keeping more customers coming to your applications.
But the problem is, many of us don’t pay proper attention to how our applications are performing.
We don’t know when there is a lurking problem. We don’t know when we need to intervene.
Hoping… and wishing… your service stays up is not a path to success…
Laugh at it, but more people do this than you might expect. There are many companies that simply wait for the next failure to occur, and hope the failure isn’t serious...
They deal with the problem when it occurs, rather than anticipate and plan for the problems ahead of time.
They do things that put their applications in jeopardy...
They do things that add risk to their applications, and ultimately lead to failure.
Let me give you a real life example…
This is an overheard conversation. A report, by an operations engineer.
I want you to see if you hear anything familiar in this conversation…
We were wondering how changing a setting on our MySQL database might impact our performance…
… but we were worried that the change may cause our production database to fail…
… Since we didn’t want to bring down production, we decided to make the change to our backup (replica) database instead…
… After all, it wasn’t being used for anything at the moment.
Of course, that’s when the random act of nature occurs…and we remember why we had a backup database.
We remember why the backup was needed...
This problem is the result of bad planning. It’s the result of bad decision making.
It’s the result of not understanding the stresses your application is under.
And it is the result of not having visibility into how changes to your system impact your system performance.
This, absolutely, was a true story.
Does this story sound familiar?
It is unfortunately not an uncommon story…
Availability issues come in all sizes and shapes. Some of them are big fat obvious ones like the last story…
Some are much more subtle… For example…
Imagine we are an e-commerce website. We’ve got a mobile app that can purchase items in our shop. {C} Bob uses his phone, buys something, and it takes 300ms. That’s great! {C} Sally logs in, buys something, but the database is slow. It takes much longer. She is not a happy customer.
Availability is not just whether a page responds, but how long it takes to respond.
The customer doesn’t care why a problem occurred, they don’t care why your app is slow. If it doesn’t meet their expectations at a time they expect, nothing else matters…
The problem is that when we typically look at our applications, we see something more like this.
On average, it worked pretty good…most of the time…
The details are where the problems lie. The details are where availability problems are born.
The real answer to how your application is doing is not a hope and a wish. It’s not an average. It’s in the details. It’s in the data.
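To make the point about averages concrete, here is a minimal sketch. The response times are made-up numbers, not measurements: nine purchases like Bob's and one like Sally's. The `percentile` helper is a simple nearest-rank calculation written for this example, not part of any monitoring product.

```python
import math
import statistics

# Hypothetical response times in milliseconds for ten purchases.
# Nine customers (like Bob) see about 300 ms; one (like Sally) hits a slow database.
response_times_ms = [300, 310, 290, 305, 295, 300, 315, 285, 300, 4000]


def percentile(values, pct):
    """Return the nearest-rank percentile of values, for pct between 0 and 100."""
    ordered = sorted(values)
    # Nearest rank: the smallest value with at least pct percent of the data at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


mean_ms = statistics.mean(response_times_ms)      # the "on average" view
median_ms = statistics.median(response_times_ms)  # the typical customer
p95_ms = percentile(response_times_ms, 95)        # the unhappy customer

print(f"mean={mean_ms} ms, median={median_ms} ms, p95={p95_ms} ms")
```

The average looks tolerable, the median looks great, and only the per-request detail shows that one customer waited four seconds.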
Modern application monitoring can’t be done by simply looking from the outside in. It can’t be done with averaged or sampled data. You must collect data from all areas of your application, and from all transactions. You must collect tons and tons of data.
---
In fact, you typically need to collect more monitoring data than data that is within your application. And it grows continuously, every day, every second. Everything that anyone does on your application, generates performance data.
If anybody is using your application, you must collect data about exactly how they are using it and how the infrastructure behind it works together. All of it is important.
All parts of your application, from your servers through your apps, to the business outcomes they represent.
All generate data that you must analyze together.
{C} This is a big data problem.
You must understand all parts of your application.
You must understand how all the parts work together.
You must understand the performance of every part of your application.
You must have visibility into your entire application, and its infrastructure.
If you don’t have visibility into your application, and access to the data you need at the time you need it, you’ll:
1) Waste time fire fighting…because you won’t know where the problem is…
2) Get stuck in meaningless finger pointing across teams…one team won’t trust another team that is telling them the problem is in their service…without data
3) Lose money…you don’t make money when your application is not available
4) Make customers unhappy…
5) Watch unhappy customers tell other people…
You also need the right data. You need to know how your application is performing, to answer questions as simple as, “Am I actually open for business?” But you also want to know how easy it is for your customers to make use of your application. What is their experience? And you need to know how your business is doing.
You need to monitor the right components…and you need to monitor the right data.
Success involves all three types of analytics. Is the software working? Is it meeting the customer’s needs? Is it meeting your business needs? All of these three things are interconnected.
But this isn’t enough. This is the old story. This is the “visibility keeps your application running” story. We all know that story.
The problem is this…
…the world itself is getting complicated. Our applications are getting complicated. The world isn’t a static world any longer.
It used to be your world was composed of simple, static data centers.
Data centers where your application ran normally…and all was well.
Your operations team was comfortable. They knew the resources they controlled, they created them, they managed them. All was simple and manageable.
But in the new world, our applications are much more dynamic. They are more sophisticated, and serve a more sophisticated customer.
Our static data centers simply don’t meet our needs anymore. We are outgrowing them.
In the new world, resources are created dynamically.
The cloud allows us to request and consume resources on demand.
The world of the operations team can no longer be as simple as tracking resources on a spreadsheet. The resources they are responsible for are dynamic and transient.
Their world has gotten a lot more complicated.
The dynamic cloud allows you to build better applications, faster. The way you’ve done things in the past won’t work in the future.
New Relic did an analysis about how our customers are making use of Docker.
The question we wanted to answer was, how long do Docker containers live? This diagram shows the answer to that question. The horizontal axis is the number of hours a Docker container has lived for, and the vertical axis is the number of containers in that time bucket. As you can see, there is a long tail, with some Docker containers running for well over a year. However, there is a huge number of Docker containers that run for less than one hour. In fact, if we zoom in on just that one hour time period…
we can see that the largest group of Docker containers actually runs for less than one minute! Over 11% of all Docker containers in the analysis run for less than 60 seconds.
This is some customer’s application or service, some business logic, that starts up, runs, and shuts down all within 60 seconds. This is very rapid. These are containers that are launched only for a specific business purpose and are terminated when that purpose is completed. This is what we mean by dynamic infrastructure.
***Interestingly, we did this analysis first three years ago, then updated it regularly. The analysis has remained the same, but the containers have become even more dynamic as time has gone on.
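If you want to ask the same question of your own environment, a lifetime histogram takes only a few lines. This is a minimal sketch, not the analysis New Relic ran: the lifetimes below are invented for illustration, and the bucket boundaries are an assumption. Real lifetimes would come from your own monitoring data.

```python
from collections import Counter

# Hypothetical container lifetimes in seconds (invented for illustration).
lifetimes_s = [12, 45, 58, 90, 600, 3000, 7200, 86_400, 40_000_000]


def bucket(lifetime_s):
    """Place a container lifetime into a coarse time bucket."""
    if lifetime_s < 60:
        return "under 1 minute"
    if lifetime_s < 3600:
        return "under 1 hour"
    if lifetime_s < 86_400:
        return "under 1 day"
    return "1 day or more"


histogram = Counter(bucket(s) for s in lifetimes_s)

# Share of containers that started, ran, and shut down within 60 seconds.
short_lived_share = histogram["under 1 minute"] / len(lifetimes_s)

for name, count in histogram.items():
    print(f"{name}: {count}")
print(f"short-lived share: {short_lived_share:.0%}")
```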
Building dynamic infrastructures in the cloud allows you to {c} scale your applications better. {c} It also allows you to make changes to your application faster and easier. {c} Both of these ultimately result in higher availability…
But only if you know what your application is actually doing…
This brings up an interesting concern. In a dynamic cloud, you have dynamic resources. Resources that are coming and going rapidly. Instances are starting and stopping. Containers are coming and going. And functions are executing and terminating.
If resources are coming and going so fast, how can you monitor them? How do you monitor a dynamic application in a dynamic cloud?
What is a dynamic application?
Dynamic applications allocate resources on demand.
They resize resources on demand.
The provisioning process is not an independent action performed by operations engineers.
The provisioning process is part of the application.
It’s dynamic.
Our applications are dynamic. This allows better scaling, and it allows higher availability in our complex applications.
How do you get visibility into a dynamic application?
Your application still has services to monitor
Your application likely still has servers to monitor
Your application still has an infrastructure
Your application still has user interfaces and connections
But what about provisioning?
How do you monitor the provisioning process of a dynamic application?
Given that resources are coming and going regularly, how do you monitor that?
How do you monitor components that are there one moment, but less than 60 seconds later, they are gone?
{c}
Remember the docker information…
It turns out that monitoring a dynamic application in a dynamic cloud is very different than monitoring traditional data center components.
You must of course still monitor each of the components themselves…each of the services and resources and components that make up your application. This gives you visibility into how the actual resources themselves are performing.
{c}
But you also must monitor the lifecycle of the cloud components. This is because it matters not only **that** a resource was used, it matters **when** that resource was used. Because just looking at the resources running right now is inadequate when trying to diagnose a problem from even a few minutes ago. The resources that were in use when the problem occurred are **not** the same resources in use now.
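One way to picture lifecycle monitoring is a log of when each resource started and stopped, which you can query for any moment in the past. The sketch below is only an illustration under assumptions: `LifecycleLog`, `ResourceRecord`, and the container names are hypothetical, and timestamps are plain numbers rather than real clock times.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceRecord:
    resource_id: str
    started_at: float
    stopped_at: Optional[float] = None  # None means the resource is still running


class LifecycleLog:
    """Remembers when each resource started and stopped (hypothetical helper)."""

    def __init__(self):
        self._records = {}

    def started(self, resource_id, at):
        self._records[resource_id] = ResourceRecord(resource_id, at)

    def stopped(self, resource_id, at):
        self._records[resource_id].stopped_at = at

    def active_at(self, at):
        """Return the ids of resources that were running at the given time."""
        return sorted(
            r.resource_id
            for r in self._records.values()
            if r.started_at <= at and (r.stopped_at is None or at < r.stopped_at)
        )


log = LifecycleLog()
log.started("container-a", at=0)
log.started("container-b", at=10)
log.stopped("container-b", at=50)
log.started("container-c", at=100)

in_use_during_problem = log.active_at(30)  # when the problem occurred
in_use_now = log.active_at(120)            # when you start investigating

print(in_use_during_problem)
print(in_use_now)
```

The two answers differ: the container that was running during the problem is already gone by the time anyone looks, which is why the start and stop times have to be recorded as they happen.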
So, in the old world, your operations team was comfortable. They knew the resources they controlled, they created them, they managed them, they even put them in spreadsheets. All was simple and manageable.
But in this new world, resources are created and destroyed dynamically. The world of the operations team can no longer be as simple as tracking resources on a spreadsheet. The resources they are responsible for are dynamic and transient.
The world is a lot more complicated.
In dynamic applications, the resources are constantly changing.
Monitoring a dynamic application requires tracking what resources are used…and when.
It means monitoring the provisioning process and how it is performing.
It means monitoring resource management processes, to make sure they are functioning as expected.
All of this is in addition to monitoring the resources statically.
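Monitoring the provisioning process itself can start as simply as timing every provisioning call and counting the failures. This is a rough sketch under assumptions: `ProvisioningMonitor` and `fake_provision` are hypothetical names invented here, and a real application would wrap whatever call it actually uses to request resources.

```python
import time


class ProvisioningMonitor:
    """Records how long each provisioning call takes and how many fail (hypothetical)."""

    def __init__(self):
        self.durations_s = []
        self.failures = 0

    def track(self, provision, *args, **kwargs):
        start = time.monotonic()
        try:
            return provision(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        finally:
            # Record the duration whether the call succeeded or failed.
            self.durations_s.append(time.monotonic() - start)


def fake_provision(name):
    """Stand-in for a real provisioning call; fails for the name 'bad'."""
    if name == "bad":
        raise RuntimeError("provisioning failed")
    return f"{name}-ready"


monitor = ProvisioningMonitor()
result = monitor.track(fake_provision, "web")
try:
    monitor.track(fake_provision, "bad")
except RuntimeError:
    pass  # the failure is already counted by the monitor

print(result, len(monitor.durations_s), monitor.failures)
```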
The world is a lot more complicated.
It used to be, long ago, that all it took to make sure an application was running was to look at the server. Did the amount of CPU or memory utilization change recently? If it did, there might be a problem.
A slight bump up in memory usage might be a memory leak.
A slight bump up in CPU usage might indicate a server or service problem.
Everything was static, everything was smooth. Everything was constant. A change indicated a problem.
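That old-world check is easy to write down: compare recent readings against a known baseline and flag any change. The sketch below assumes a 10% tolerance and made-up memory readings. Both are illustrative choices, not recommended thresholds.

```python
import statistics


def has_shifted(baseline, recent, tolerance=0.10):
    """Return True when the recent average differs from the baseline average
    by more than the given fraction of the baseline."""
    base = statistics.mean(baseline)
    return abs(statistics.mean(recent) - base) > tolerance * base


# Hypothetical memory readings in MB from a long-lived server.
memory_baseline_mb = [512, 510, 515, 511]
memory_steady_mb = [513, 511, 512, 512]
memory_bumped_mb = [590, 600, 610, 605]

steady_alert = has_shifted(memory_baseline_mb, memory_steady_mb)
bumped_alert = has_shifted(memory_baseline_mb, memory_bumped_mb)

print(steady_alert, bumped_alert)
```

A check like this only works when the server lives long enough to have a baseline, which is exactly the assumption a dynamic cloud breaks.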
But in this new world, resources are created and destroyed dynamically. Resources are transient.
Knowing what resources were being used when a problem occurred is just as important as knowing what the resource was doing.
A resource that gets away without being tracked can’t be used to determine the cause of a problem.
The world is a lot more complicated.
In the new complicated world…In order to monitor your dynamic d…you must…
Monitor all aspects of your application, top to bottom, every layer
Monitor all resources…every resource…no matter how long it is around for
And monitor how they are allocated, provisioned, and utilized…
You must understand how the dynamic infrastructure is changing how your application is performing…at every moment…of every day.
You must use dynamic instrumentation and dynamic monitoring.
Avoiding this is critical to every business.
Our customers demand modern applications.
{c} And modern applications require…modern instrumentation
Visibility into our applications…gives us the ability to innovate.
It gives you speed. It gives you flexibility.
By giving yourself the visibility into your applications that proper instrumentation can provide, you get confidence.
Confidence to develop…
Confidence to scale…
Visibility gives you confidence in your dynamic applications.
Visibility gives you confidence in the dynamic cloud.
No more are your applications in jeopardy.
No more is money being burned.
In the cloud, things are constantly in motion.
Tracking resources, watching how your application works, is substantially more complicated in the cloud.
Your world has gotten a lot more complicated.
Dynamic instrumentation gives you the visibility. Visibility gives you the confidence in the cloud.