This document provides an overview of the Kepner-Tregoe problem-solving method, which involves defining the problem, describing it in terms of what is and is not occurring, establishing possible causes, testing the most probable cause, and verifying the true cause. It includes examples of problems that were initially unsolvable but were resolved by properly applying the Kepner-Tregoe method, such as defining the problem statement more precisely, gathering all relevant resources, finding patterns in timing data, and thinking beyond the immediate fix. The key lessons are to follow the problem-solving process, let it guide the investigation, and consider non-obvious factors or causes.
Solving “unsolvable” Problems Using IS/IS NOT Problem Solving
1. IS/IS NOT – Solving “unsolvable” Problems
Speaker: Ric Browne, CloudFX Senior ITSM Consultant
21 November 2013
2. Welcome to Appendix C: Kepner and Tregoe
Defining the Problem - Problem Statement
Describing the Problem with regard to Identity, Location, Timing and Size - Problem Specification (IS/IS NOT in What, Where, When, Extent format)
Establishing Possible Causes - Identify Possible Causes using distinctions and changes
Testing the Most Probable Cause - Evaluate Possible Causes
Verifying the True Cause - Confirm True Cause (Facts, Observation, Research, Results)
3. Process Quotes
“Freedom is greatest when the ground rules are clear”
“Process is a wise man’s guide and a fool’s bible”
“I don’t believe in failure. It is not failure if you enjoyed the process”
Oprah Winfrey
“You can’t process me with an ordinary brain”
Charlie Sheen
4. Defining the Problem
Because the investigation is based on the definition of the problem, this definition has to state precisely which deviation(s) from the agreed service levels have occurred.
Often, during the definition of a problem, the most likely cause is already indicated. Take care not to jump to conclusions, which can guide the investigation in the wrong direction from the beginning.
In practice, problem definition is often a difficult task because of a complicated IT infrastructure and non-transparent agreements on service levels.
ITIL v3 Service Operation, page 201
5. Defining the Problem
Problem: “Production Server running Slow” – Ongoing intermittently for 3 weeks
Unsolvable: Analysis inconclusive (working on the wrong problem)
Real Problem Statement: “License Database response is slow over lunchtime” – Solved in 30 minutes
Key Takeaway: A well-formed problem statement is a problem half solved
6. Describing the Problem
The following aspects are used to describe the problem, i.e. what the problem IS:
Identity. Which part does not function well? What is the problem?
Location. Where does the problem occur?
Time. When did the problem start to occur? How frequently has the problem occurred?
Size. What is the size of the problem? How many parts are affected?
ITIL v3 Service Operation, page 201
7. Describing the Problem (cont.)
The ‘IS’ situation is determined by the answers to these questions. The next step is to investigate which similar parts in a similar environment are functioning properly. With this, an answer is formulated to the question ‘What COULD BE but IS NOT?’ (Which parts could be showing the same problem but do not?)
         | IS | IS NOT
WHAT     |    |
WHERE    |    |
WHEN     |    |
EXTENT   |    |
It is then possible to search effectively for relevant differences between the two situations. Furthermore, past changes that could be the cause of these differences can be identified.
ITIL v3 Service Operation, page 201
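The IS/IS NOT grid can be captured as simple data so that no dimension is skipped and every IS/IS NOT pair is turned into a search for a distinction. The sketch below is a hypothetical illustration in Python, not part of ITIL or the Kepner-Tregoe material: `build_specification` and `distinction_questions` are invented names, and the example facts (loosely based on the licence database case) are made up.

```python
# Hypothetical sketch: an IS/IS NOT specification held as a plain dictionary.
DIMENSIONS = ("WHAT", "WHERE", "WHEN", "EXTENT")


def build_specification(facts: dict[str, tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Return the grid in WHAT/WHERE/WHEN/EXTENT order; refuse an incomplete grid."""
    missing = [d for d in DIMENSIONS if d not in facts]
    if missing:
        raise ValueError(f"specification incomplete, missing: {missing}")
    return {d: facts[d] for d in DIMENSIONS}


def distinction_questions(spec: dict[str, tuple[str, str]]) -> list[str]:
    """Turn each (IS, IS NOT) pair into the question that surfaces a distinction."""
    return [
        f"{dim}: what is distinctive about '{is_}' compared with '{is_not}'?"
        for dim, (is_, is_not) in spec.items()
    ]


# Hypothetical facts, invented for illustration only.
spec = build_specification({
    "WHAT": ("License database response", "other databases on the same server"),
    "WHERE": ("production site", "disaster recovery site"),
    "WHEN": ("over lunchtime", "rest of the working day"),
    "EXTENT": ("every licence check", "non-licence queries"),
})
for question in distinction_questions(spec):
    print(question)
```

Refusing an incomplete grid is a deliberate choice: a missing IS NOT answer is exactly where the search for differences tends to stall.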
8. Describing the Problem (IS/IS NOT)
Problem: “Login takes 35 minutes” – Ongoing for 8 months, constant and increasing
Unsolvable: Not reported for 7 months; information not shared between all teams
Real Cause: Novell Client not up to date, causing contention with NT servers – Solved in 2 hours
Key Takeaway: Gather resources together and make the process visible
9. Establishing Possible Causes
The list of differences and changes mentioned above most likely holds the cause of the problem, so possible causes can be extracted from this list.
ITIL v3 Service Operation, page 201
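A minimal sketch of that extraction, assuming each distinction and each change is tagged with an area and each change has a date: only changes in an area that shows a distinction, made on or before the day the problem started, are kept as possible causes. The function `possible_causes`, the area tags and all dates and descriptions are hypothetical, loosely modelled on the NT4 migration example.

```python
from datetime import date


def possible_causes(distinctions, changes, problem_start):
    """Pair each distinction with changes in the same area made on or before
    the problem started (a later change cannot explain the onset)."""
    causes = []
    for area, distinction in distinctions.items():
        for change in changes:
            if change["area"] == area and change["date"] <= problem_start:
                causes.append(f"{change['description']} (distinction: {distinction})")
    return causes


# Hypothetical data, invented for illustration only.
distinctions = {"network": "affected servers are on the NEW VLAN"}
changes = [
    {"area": "network", "date": date(2013, 3, 1), "description": "servers re-addressed onto the NEW VLAN"},
    {"area": "network", "date": date(2013, 6, 1), "description": "core switch firmware upgrade"},
    {"area": "backup", "date": date(2013, 2, 1), "description": "new backup schedule"},
]
for cause in possible_causes(distinctions, changes, problem_start=date(2013, 4, 1)):
    print(cause)
```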
10. Establishing Possible Causes
Problem: “NT4 Servers unable to communicate from new LAN” – Ongoing for 10 weeks
Unsolvable: Stopped looking – migration project continued and analysis delayed
11. Network Diagram
[Network diagram: NEW VLAN (IP 176.3.7.0, gateway GW3 176.3.7.254), PROD VLAN (IP 125.55.158.0, gateway GW1 125.55.158.254) and OPS VLAN (IP 125.55.154.0, gateway GW2 125.55.154.254), with the VMS SERVER, NEW NET CORE BLOCK and NET CORE BLOCK]
12. Establishing Possible Causes
Real Cause: Older version of TIBCO licence is IP-sensitive – Solved in 3 days
Key Takeaway: Weird stuff happens – trust your instincts
13. Testing the Most Probable Cause
Each possible cause needs to be assessed to determine whether it could be the cause of all symptoms of the problem.
ITIL v3 Service Operation, page 201
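One way to express this test, sketched under the assumption that each possible cause is described by the set of observations it would produce if it were true: a cause survives only if it produces every IS observation and none of the IS NOT observations. `evaluate_causes` and the example observations are hypothetical, loosely inspired by the betting database example.

```python
def evaluate_causes(is_facts, is_not_facts, causes):
    """Keep a possible cause only if, were it true, it would produce every IS
    observation and none of the IS NOT observations."""
    survivors = []
    for name, would_produce in causes.items():
        explains_all = is_facts <= would_produce           # every IS fact covered
        contradicted = bool(is_not_facts & would_produce)  # predicts an IS NOT fact
        if explains_all and not contradicted:
            survivors.append(name)
    return survivors


# Hypothetical observations, invented for illustration only.
is_facts = {"betting database times out", "timeouts coincide with test jobs"}
is_not_facts = {"other databases on the host time out"}
causes = {
    "undersized database server": {"betting database times out",
                                   "other databases on the host time out"},
    "tenanted test database load": {"betting database times out",
                                    "timeouts coincide with test jobs"},
}
print(evaluate_causes(is_facts, is_not_facts, causes))  # ['tenanted test database load']
```

Checking the IS NOT side matters as much as the IS side: a cause that would also have broken the unaffected parts is ruled out even if it explains the symptoms.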
14. Testing the Most Probable Cause
Problem: “Online betting database response timing out” – Ongoing for 18 months, occurring randomly
Unsolvable: Only one possible cause considered and the data ignored – process hijacked for another agenda
16. Testing the Most Probable Cause
Real Cause: Tenanted test database slowing down response – Solved in 6 weeks
Key Takeaway: When timing is key, find the pattern, find the cause
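Incidents that look random often cluster once they are bucketed by time. A hypothetical Python sketch counts incidents per weekday and hour; the `timing_profile` function and the timestamps are invented for illustration, and real data would come from whatever monitoring or ticket records are available.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def timing_profile(timestamps):
    """Count incidents per (weekday, hour) bucket, busiest bucket first."""
    buckets = Counter((WEEKDAYS[ts.weekday()], ts.hour) for ts in timestamps)
    return buckets.most_common()


# Hypothetical timeout timestamps, invented for illustration only.
incidents = [
    datetime(2013, 1, 1, 2, 10),   # Tuesday
    datetime(2013, 1, 8, 2, 40),   # Tuesday
    datetime(2013, 1, 10, 14, 0),  # Thursday
    datetime(2013, 1, 15, 2, 5),   # Tuesday
]
for (day, hour), count in timing_profile(incidents):
    print(f"{day} {hour:02d}:00  {count}")
```

A bucket that stands out (here, Tuesdays around 02:00) becomes a WHEN fact for the IS/IS NOT grid and a clue to which workload or change runs at that time.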
17. Verifying the True Cause
The remaining possible causes have to be verified as being the source of the problem. This can only be done by proving this in one way or another, for example by implementing a change or replacing a part. Address the possible causes that can be verified quickly and simply first.
ITIL v3 Service Operation, page 201
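The "quickly and simply first" rule can be sketched as a sort followed by a loop that stops at the first confirmed cause. This assumes an effort estimate per candidate; `Candidate`, `verification_order`, `first_confirmed` and the example data are hypothetical, and the `verify` callback stands in for a real change or part replacement.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    name: str
    effort_hours: float  # hypothetical estimate of the time to prove or disprove it


def verification_order(candidates):
    """Quickest verifications first; the name breaks ties so the order is repeatable."""
    return sorted(candidates, key=lambda c: (c.effort_hours, c.name))


def first_confirmed(candidates, verify: Callable[[Candidate], bool]) -> Optional[str]:
    """Verify candidates in order and stop at the first one that is confirmed."""
    for candidate in verification_order(candidates):
        if verify(candidate):
            return candidate.name
    return None


# Hypothetical candidates, loosely inspired by the cheque printer example.
candidates = [
    Candidate("printer driver", 4.0),
    Candidate("print queue", 0.5),
    Candidate("signature file size", 1.0),
]
# Simulated verification result; in practice this is a change, test or part swap.
confirmed = first_confirmed(candidates, lambda c: c.name == "signature file size")
print(confirmed)  # signature file size
```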
18. Verifying the True Cause
Problem: “Capital City Mortgage Cheque Printers will not print” – Ongoing continuously for 3 days
Unsolvable: Limited knowledge of the legacy system; all causes exhausted and process of elimination completed
Real Cause: Signature file had inexplicably grown in size – Solved in 5 hours, verified in 2 hours
Key Takeaway: Think Beyond The Fix – avoid creating new problems