2. Monitoring Engineer
Is there any way to streamline these repeatable tasks?
Chasing all these alerts is time-consuming.
I can see where the problems are, I just don’t have a way to fix them.
How am I going to hit my KPI of reducing alert counts and MTTR?
Service Owner
Why am I getting woken up at all hours by my monitoring team?
How am I going to hit my KPI of service availability and reliability?
How could I provide that team the access they need to troubleshoot before they call?
I have these scripts, what if they could just run them for me?
3. How do we make it easier for the first line of defense to take action?
How much time are your subject matter experts spending on tasks that can be automated?
How fast can we gather additional troubleshooting information or attempt a fix?
Monitoring solutions today know a lot about the health of your infrastructure, but lack the ability to do something about it.
4. Without RBA / With RBA (RBA: Runbook Automation)
Without RBA, 3 options:
1. Decipher the wiki (what does it mean? how old is it?)
2. Ad-hoc tool/script usage (where? what syntax?)
3. ESCALATE!
5. Can I see an example of automating a fix using Rundeck?
Of course!
1. Our application has two NGINX servers.
2. If these servers go down, the first troubleshooting step is always “Restart the Service”.
3. Using Datadog to track the service status, we can automate this procedure by firing a webhook to Rundeck.
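The webhook step above can be sketched in a few lines of Python. This is only an illustration of what the call to Rundeck might look like, not the integration itself: in the setup described on this slide, Datadog would send the request directly. The host name, token, API version, the alert field names (`transition`, `service`, `host`) and the helper functions are all hypothetical placeholders; the URL shape assumes Rundeck's webhook endpoint of the form `/api/<version>/webhook/<token>`.

```python
import json
import urllib.request

# Hypothetical values: replace with your own Rundeck host and webhook token.
RUNDECK_URL = "https://rundeck.example.com"
WEBHOOK_TOKEN = "REPLACE_ME"
API_VERSION = 45  # assumption: use the API version your Rundeck instance supports


def should_restart(alert: dict) -> bool:
    """Return True when the alert says the NGINX service check has just failed."""
    return (
        alert.get("transition") == "Triggered"  # assumed field name and value
        and alert.get("service") == "nginx"
    )


def build_webhook_request(alert: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Rundeck to run the restart job."""
    url = f"{RUNDECK_URL}/api/{API_VERSION}/webhook/{WEBHOOK_TOKEN}"
    body = json.dumps(
        {"host": alert["host"], "service": alert["service"]}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def handle_alert(alert: dict, send=urllib.request.urlopen):
    """Forward a qualifying alert to Rundeck; return the request sent, or None."""
    if not should_restart(alert):
        return None
    request = build_webhook_request(alert)
    send(request)  # `send` is injectable so the sketch can be tried offline
    return request
```

On the Rundeck side, the webhook would be tied to a job whose single step is the “Restart the Service” action, so the first troubleshooting step runs without anyone being paged.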
7. So why Datadog + Rundeck Automation?
Safely provide task execution to teams that don’t directly manage a service or infrastructure.
Reduce burden on Subject Matter Experts and allow them to focus on critical issues.
Automate the first line of defense tasks. If you have any “try this first every time” actions then it’s likely something that can be automated.
RUNDECK STREAMLINES REPEATABLE AUTOMATION TO TURN MONITORING INTO RESOLVING