This document discusses the mathematics behind reducing complexity in large IT systems through partitioning. It defines two types of complexity: functional complexity, which increases exponentially with the number of functions in a system, and coordination complexity, which increases with the number of connections between systems. The key challenge is finding the optimal partition that minimizes both complexities. It introduces the concept of synergistic partitioning, which uses set theory and equivalence relations to determine the simplest possible partition before requirements are known. This allows partitioning very early in the project lifecycle to reduce overall complexity.
The Mathematics of IT Simplification
by
Roger Sessions
ObjectWatch, Inc.
1 April 2011
Version 1.03
Most Recent Version at
www.objectwatch.com/white_papers.htm#Math
Good fences make good neighbors.
- Robert Frost
Contents
Author Bio
Acknowledgements
Executive Summary
Introduction
Complexity
Functional Complexity
Coordination Complexity
Total Complexity
Partitions
Set Notation
Partitions and Complexity
Directed vs. Non-Directed Methodologies
Equivalence Relations
Driving Partitions with Equivalence Relations
Properties of Equivalence Driven Partitions
Uniqueness of Partition
Conservation of Structure
Synergy
The SIP Process
Wrap-up
Legal Notices
Author Bio
Roger Sessions is the CTO of ObjectWatch, a company he founded thirteen years ago. He has written
seven books including his most recent, Simple Architectures for Complex Enterprises, and dozens of
articles and white papers. He assists both public and private sector organizations in reducing IT
complexity by blending existing architectural methodologies and SIP (Simple Iterative Partitions). In
addition, Sessions provides architectural reviews and analysis. Sessions holds multiple patents in
software and architectural methodology. He is a Fellow of the International Association of Software
Architects (IASA), past Editor-in-Chief of the IASA Perspectives Journal, and a past Microsoft recognized
MVP in Enterprise Architecture. A frequent keynote speaker, Sessions has presented in countries around
the world on the topics of IT Complexity and Enterprise Architecture. Sessions has a Master’s Degree in
Computer Science from the University of Pennsylvania. He lives in Chappell Hill, Texas. His blog is
SimpleArchitectures.blogspot.com and his Twitter ID is @RSessions.
Comments on this white paper can be sent to emailID: roger domain: objectwatch.com
Acknowledgements
This white paper has benefited greatly from the careful reading of its first draft by some excellent
reviewers. These reviewers include:
Christer Berglund, Enterprise Architect, aRway AB
Aleks Buterman, Principal, SenseAgility Group
G. Hussain Chinoy, Partner, Enterprise Architect, Bespoke Systems
Dr. H. Nebi Gursoy
Jeffrey J. Kreutzer, MBA, MPM, PMP
Michael J. Krouze, CTO, Charter Solutions, Inc.
Patrick E. Lujan, Managing Member, Bennu Consulting, LLC
John Polgreen, Ph.D., Chief Architect, Architecting the Enterprise
Balaji Prasad, Technology Partner and Chief Architect, Cognizant Technology Solutions
Mark Reiners
David Renton, III
Anonymous
I appreciate all of their help.
The towel photos on the front cover are by EvelynGiggles, Flickr, and are licensed under Creative
Commons Attribution 2.0 Generic.
Executive Summary
Large IT systems frequently fail, at a huge cost to the U.S. and world economy. The reason for these
failures has to do with complexity. The more complex a system is, the more likely it is to fail. It is difficult
to figure out the requirements for complex systems. It is hard to design complex systems. And it is hard
to implement complex systems. At every step in the system life cycle, errors accumulate at an ever
increasing rate.
The solution to this problem is to break large complex systems up into small simple systems that can be
worked on independently. But in breaking up the large complex systems, great care must be taken to
minimize the overall system complexity. There are many ways to break up a system and the vast
majority of these make the problem worse, not better.
If we are going to compare approaches to reducing the complexity, it is useful to have a way of
measuring complexity. The problem of measuring complexity is orthogonal to the problem of reducing
complexity, in the same way that measuring heat is orthogonal to the problem of cooling, but our ability
to measure an attribute gives us more confidence in our ability to reduce that attribute.
To measure complexity, I consider two aspects of a system. The first is the amount of functionality. The
second is the number of other systems with which this system needs to coordinate. In both cases,
complexity increases exponentially. This indicates that an effective “breaking up” of a large system
needs to take into account both the size of the resulting smaller systems (I call this functional
complexity) and the amount of coordination that must be done between them (I call this coordination
complexity.) Mathematically, I describe the process of “breaking up” a system as partitioning that
system.
The problem with trying to simultaneously reduce both functional and coordination complexity is that
these two pull against each other. As you partition a large system into smaller and smaller systems, you
reduce the functional complexity but increase the coordination complexity. As you consolidate smaller
systems to reduce the coordination complexity, you increase the functional complexity. So the challenge
in reducing overall complexity is to find the perfect balance between these two forms of complexity.
What makes this challenge even more difficult is that one must determine how to split up the project
before one knows what functionality is needed in the system or what coordination will be required.
Existing approaches to this problem use decompositional design, in which a large system is split up into
smaller systems, and those smaller systems are then split into yet smaller systems. But decompositional
design is a relatively random process. The outcome is highly dependent on who is running the analysis
and what information they happen to have. It is highly unlikely that decompositional design will find the
simplest possible way of partitioning a system.
This paper suggests another way of partitioning a system called synergistic partitioning. Synergistic
partitioning is based on the mathematics of sets, equivalence relations, and partitions. This approach
has several advantages over traditional decompositional design:
It produces the simplest possible partition for a given system, in other words, it produces the
best possible solution.
It is directed, meaning that it always comes out with the same solution.
It can determine the optimal partitioning of a system with very little information about the
functionality of that system and no information about its coordination dependencies, so it can
be used very early in the system life cycle.
To someone who doesn’t understand the mathematics of synergistic partitioning, the process can seem
like magic. How can you figure out how many baskets you will need to sort your eggs if you don’t know
how many eggs you have, which eggs will end up in which baskets, or how the eggs are related to each
other? But synergistic partitioning is not magic. It is mathematics. This paper describes the mathematics
of synergistic partitioning.
The concept of synergistic partitioning is delivered with a methodology known as Simple Iterative
Partitions (SIP.) It is not necessary to understand the mathematics of synergistic partitioning to use SIP.
But if you are one of those who feel most comfortable with a practice when you understand the theory
behind the practice, this paper is for you.
If you are not interested in the theory, then just focus on the results: reduced complexity, reduced costs,
reduced failure rates, better deliverables, higher return on investment. If this seems like magic,
remember what Arthur C. Clarke said: “Any sufficiently advanced technology is indistinguishable from
magic.” The difference between magic and science is understanding.
Introduction
Big IT projects usually fail, and the bigger the project, the more likely the failure. Estimates on the cost of
these failures in the U.S. range from $60B in direct costs [1] to over $2 trillion when indirect costs are
included [2]. By any estimate, the cost is huge.
The typical approach to large IT projects is four phases, as shown in Figure 1. This approach is essentially
a waterfall approach; each phase must be largely completed before the next is begun. While the
waterfall approach is used less and less on smaller IT projects, on large multi-million dollar systems it is
still the norm.
Figure 1. Waterfall Approach to Large IT Projects
Notice that in Figure 1, the Business Case is shown as being much smaller than the remaining boxes. This
reflects the unfortunate reality that too little attention is usually paid to this critical phase of most
projects.
The problem with the waterfall approach when used to deliver large IT systems is that each phase is very
complex. The requirements for a $100 million system can easily be a foot thick. The chances that these
requirements will accurately reflect all of the needs of a large complex system are poor. Then these
most likely flawed requirements will be translated into a very large complex design. The chance that this
design will accurately encompass all of the requirements is also small. Then this flawed design will be
implemented, probably by hundreds of thousands of lines of code. The chance that this complex design
will accurately be implemented is also small. So we have poor business requirements specified by flawed
requirements translated into an even more flawed design delivered by an even more flawed
implementation. On top of this, the process takes so long that by the time the implementation is
delivered, the requirements and technology have probably changed. No wonder the failure rates are so
high!
I advocate another approach known as Simple Iterative Partitions (SIP). The SIP approach to delivering a
large IT system is to first split it up into smaller, much simpler subsystems as shown in Figure 2. These
systems are more amenable to fast, iterative, agile development approaches.
[1] CIOInsight, 6/10/2010, http://www.cioinsight.com/c/a/IT-Management/Why-IT-Projects-Fail-762340
[2] The IT Complexity Crisis; Danger and Opportunity. White Paper by Roger Sessions, available at http://objectwatch.com/white_papers.htm#ITComplexity
Figure 2. SIP Approach to Large IT
We are generally good at requirements gathering, design, and implementation of small IT systems. So if
we can find a way to partition the large complex project into small simple systems, we can reduce our
costs and improve our success rates. While SIP is not the only methodology to make use of
partitioning [3], it is the only methodology to approach partitioning from a rigorous, mathematical
perspective which yields the many important benefits that I will explore further in this paper.
What makes partitioning difficult is that it must be done very early in the project life cycle, before we
even know the requirements of the project. Why? Because the larger the project, the less likely the
requirements are to be accurate. So if we split the project before requirements gathering, then we can
do a much more accurate job of requirements gathering. But how can you split up a project when
you know nothing about the project?
SIP does this through a pre-design of the project as part of the partitioning phase. In this pre-design, we
gather some rudimentary information about the project, analyze that information in a methodical way,
and make highly accurate mathematical predictions about the best way to carve up the project.
This claim seems like magic. How can you effectively partition something before you know what that
something is? SIP claims to do this based on mathematical models. And that is the purpose of this white
paper: to describe the mathematical foundations and models that allow SIP to do its magic.
Complexity
SIP promises to find the best possible way to split up a project. In order to evaluate this claim, we need
to have a definition for “best.” For SIP, the best way to split up the project is the way that will give the
maximum return on investment (ROI), that is, the maximum value for the minimum cost. The best way
to maximize ROI is to reduce complexity. Complexity is defined as the attribute of a system that makes
that system difficult to use, understand, manage, and/or implement. [4]
[3] TOGAF® 9, for example, uses the term partitioning to describe different views of the enterprise.
Complexity, cost, and value are all intimately connected. Since the cost of implementing a system is
primarily determined by the complexity of that system [5], if we can reduce the complexity of a system
through effective partitioning, then we can reduce its cost. And since our ability to accurately specify
requirements is also impacted by complexity, reducing complexity also increases the likelihood that the
business will derive the expected value from the system. Understanding complexity is therefore critical.
We can’t reduce something unless we understand it.
Of course, reducing the complexity of a system does not guarantee that we can deliver it. There may be
technology, political, or business obstacles that complexity reduction does not address. But it is a
necessary part of the solution.
IT complexity at the architecture levels comes in two flavors: functional complexity and coordination
complexity. I’ll look at each of these in the next two sections. There are other types of complexity such
as implementation complexity, which considers the complexity of the code, and communications
complexity, which considers the difficulty in getting parties to communicate. These other types of
complexity are also important but fall outside of the scope of SIP.
Functional Complexity
Functional complexity refers to the complexity that arises as the functionality of a system increases. This
is shown in Figure 3.
[4] CUEC (Consortium for Untangling Enterprise Complexity) – Standard Definitions of Common Terms (www.cuec.info/standards/CUEC-Std-CommonTerminology-Latest.pdf)
[5] See, for example, The Impact of Size and Volatility on IT Project Performance by Chris Sauer, Andrew Gemino, and Blaize Horner Reich, Comms of the ACM Nov 07 and 2009 Chaos Report published by the Standish Group
Figure 3. Increasing Complexity with Increasing Functionality
Consider a system that has a set of functions, F. If you have a problem in the system, how many
possibilities are there for where that problem could reside? The more possibilities, the more complexity.
Let’s say you have a system with one function: F1. If a problem arises, there is only one place to look for
that problem: F1.
Let’s say you have a system with two functions: F1 and F2. If a problem arises, there are three places you
need to look. The problem could be in F1, the problem could be in F2, or the problem could be in both.
Let’s say you have a system with three functions: F1, F2, and F3. Now there are seven places the problem
could reside: F1, F2, F3, F1+F2, F1+F3, F2+F3, F1+F2+F3.
You can see that the complexity of solving the problem is increasing faster than the amount of
functionality in the system. Going from 2 to 3 functions is a 50% increase in functionality, but more than
a 100% increase in complexity.
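The counts of 1, 3, and 7 are the numbers of non-empty combinations of one, two, and three functions. A minimal Python sketch of that enumeration follows; the helper name problem_locations is an illustrative assumption, not something defined in this paper.

```python
from itertools import combinations


def problem_locations(functions):
    """Return every non-empty combination of functions in which a problem could reside."""
    return [
        combo
        for size in range(1, len(functions) + 1)
        for combo in combinations(functions, size)
    ]


for n in (1, 2, 3):
    names = [f"F{i}" for i in range(1, n + 1)]
    places = problem_locations(names)
    # Print the number of functions, the number of places to look, and the places.
    print(n, len(places), ["+".join(p) for p in places])
```

For n functions this enumeration yields 2^n - 1 combinations.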
In general, the relationship between functionality and complexity is an exponential one. That means
that as the functionality in the system increases by a factor of F, the complexity in the system increases
by a factor of C, where C > F. What is the relationship between C and F?
One supposition is made by Robert Glass in his book Facts and Fallacies of Software Engineering. He
says that every 25% increase in functionality results in a doubling of complexity. In other words, F=1.25;
C=2.
Glass may or may not be right about these exact numbers, but they seem a reasonable assumption. We
have many examples of how adding more “stuff” into a system exponentially increases the complexity
of that system. In gambling, adding one die to a system of dice increases the complexity of the system
six-fold. In software systems, adding one if statement to the scope of another if statement increases
the overall cyclomatic complexity of that region of code by a factor of two.
Assuming Glass’s numbers are correct, we can compare the complexity of an arbitrary system relative to
a system with a single function in it. We may not know how much complexity is in a system with only a
single function, but whatever that is, we will call it one Standard Complexity Unit (SCU).
We can calculate the complexity, in SCUs, of a system of N functions.
$SCU(N) = 10^{(\log(C)/\log(F)) \times \log(N)}$
Since log(2) and log(1.25) are both constants, their ratio can be combined into a single constant, 3.11. This equation simplifies to
$SCU(N) = 10^{3.11 \times \log(N)}$
This equation can be further simplified to $SCU(N) = N^{3.11}$
Since the constant 3.11 is derived from Glass, I refer to that number as Glass’s Constant. If you don’t
believe Glass had the right numbers, feel free to come up with a new constant based on your own
observations and name it after yourself. The precise value of the constant (3.11) is less important than
where it appears (the exponent.) For the purposes of this paper, I will use Glass’s Constant.
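The SCU transformation is easy to compute directly. The sketch below is an illustration only: the names GLASS_CONSTANT, scu, and scu_via_logs are assumptions introduced here, and the constant is derived from Glass's figures (F = 1.25, C = 2) rather than rounded to 3.11.

```python
import math

# Glass's figures as used in the text: a 25% increase in functionality (F = 1.25)
# doubles complexity (C = 2).
F = 1.25
C = 2.0
GLASS_CONSTANT = math.log10(C) / math.log10(F)  # roughly 3.106


def scu(n, k=GLASS_CONSTANT):
    """Complexity, in Standard Complexity Units, of n functions: n ** k."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return float(n) ** k


def scu_via_logs(n, k=GLASS_CONSTANT):
    """The unsimplified form, 10 ** (k * log10(n)), valid for n >= 1."""
    return 10 ** (k * math.log10(n))


for n in (1, 2, 5, 10, 20, 25):
    # Both forms should agree after rounding to whole SCUs.
    print(n, round(scu(n)), round(scu_via_logs(n)))
```

Passing a different value for k is one way to try out a constant based on your own observations.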
In this white paper I will not be using large values for N, so rather than go through the arithmetic to
determine the SCU for a given value of N, you can look it up in Table 1 which gives the SCU
transformations for the numbers 1-25.
 N  SCU(N)    N  SCU(N)    N  SCU(N)    N  SCU(N)    N  SCU(N)
 1       1    6     261   11   1,717   16   5,500   21  12,799
 2       9    7     422   12   2,250   17   6,639   22  14,789
 3      30    8     639   13   2,886   18   7,929   23  16,979
 4      74    9     921   14   3,632   19   9,379   24  19,379
 5     148   10   1,277   15   4,501   20  10,999   25  21,999
Table 1. SCU Transformations [6] for 1-25
Using Table 1, you can calculate the relative complexity of a system of 20 functions compared to a
system of 25 functions. A system of 20 functions has, according to Table 1, 10,999 SCUs. A system of 25
functions has, according to Table 1, 21,999 SCUs. So we calculate relative complexity as follows:
20 functions = 10,999 SCUs
25 functions = 21,999 SCUs
Difference = 11,000 SCUs
Increase = 11,000/10,999 X 100 = 100% increase
Notice that the above calculation agrees with Glass’s prediction, that a 25% increase in functionality
results in a 100% increase in complexity. No surprise here, since it was Glass’s constant I used to
calculate this table.
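The agreement can also be checked symbolically, since the constant is defined so that 1.25 raised to it equals 2. A short worked check, using the exact ratio log(2)/log(1.25) rather than the rounded 3.11:

```latex
\frac{SCU(25)}{SCU(20)}
  = \left(\frac{25}{20}\right)^{\log 2 / \log 1.25}
  = 1.25^{\log 2 / \log 1.25}
  = 10^{\log 1.25 \cdot (\log 2 / \log 1.25)}
  = 10^{\log 2}
  = 2
```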
[6] The numbers in this table are calculated using a more precise Glass’s constant, 3.1062837. The numbers you get with 3.11 will be somewhat off from these, especially for larger values of N.
Coordination Complexity
The complexity of a system can increase not only because of the amount of functionality it contains, but
also because of how that functionality needs to coordinate with other systems, either internal or external.
In a service-oriented architecture (SOA), this dependency will probably be expressed with messages. But messages
aren’t the only expression of dependencies. Dependencies could be expressed as shared data in a
database, function invocations, or shared transactions, to give a few examples.
Figure 4 shows the increasing complexity of a system with increasing coordination dependencies.
Figure 4. Increasing Complexity with Increasing Coordination Dependencies
If we assume that the amount of complexity added by increasing the dependencies by one is roughly
equal to the amount of complexity added by increasing the number of functions by one, then we can
use similar logic to the previous section to calculate the number of standard complexity units (SCUs) a
system has by virtue of its dependencies:
$C(M) = 10^{3.11 \times \log(M)}$, where M is the number of system dependencies.
This simplifies to $C(M) = M^{3.11}$
This arithmetic is exactly the same as the SCU transformation shown in Table 1. So you can use the same
table to calculate the complexity (in SCUs) of a system with M dependencies. The three systems in Figure
4 thus have SCU values of 0, 9, and 422, which are the SCU transformations of 0, 2, and 7.
Of course, it is unlikely that the amount of complexity one adds with an additional coordination
dependency is exactly the same as the amount one adds with an additional function. Both of these
constants are just educated guesses. As before, the value of the constant (3.11) is less important than
the location of the constant (the exponent.) I won’t be calculating absolute complexity; I will be using
these numbers to compare the complexity in two or more systems. Since we don’t know how much
complexity is in an SCU, knowing that System A has 1000 SCUs tells us little about its absolute
complexity. On the other hand, knowing that System B has twice the complexity of System A tells us a
great deal about the relative complexity of the two systems.
Total Complexity
The complexity of a system has two components: the complexity due to the amount of
functionality and the complexity due to the number of coordination dependencies. Assuming these two
are independent of each other, the total complexity of the system is the sum of the two. So the
complexity of a system with N functions and M coordination dependencies is given by
$C(M, N) = M^{3.11} + N^{3.11}$
Of course, feel free to use Table 1 rather than go through the arithmetic.
Many people have suggested to me that the addition of the two types of complexity is too simplistic.
They feel that the two should be multiplied instead. They may well be right. However I have chosen to
assume independence (and thus, addition) because it is a more conservative assumption. If we instead
assume the two should be multiplied, then the measurements later in this paper would be even more
dramatic. Multiplication magnifies, not lessens, my conclusions.
One last thing. Systems rarely exist in isolation. In a real IT architecture, we have many systems working
together. So if we want to know the full complexity of the system of systems, we need to sum the
complexity of each of the individual systems. This is straightforward, now that we know how to calculate
the complexity of any one of the systems.
A good example of a “system of systems” is a service-oriented architecture (SOA). In this architecture,
there are a number of services. Each service has complexity due to the number of functions it
implements and complexity due to the number of other services on which it has dependencies. Each of
these can be calculated using the SCU transform (or Table 1) and the complexity of the SOA as a whole is
the sum of the complexity of each of the services.
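The additive model for a single system and for a system of systems can be sketched as follows. The function names and the three example services are hypothetical, chosen only to illustrate the arithmetic, and the constant is the more precise value derived from Glass's figures.

```python
import math

GLASS_CONSTANT = math.log10(2) / math.log10(1.25)  # roughly 3.11


def scu(n):
    """SCU transformation of a count of functions or dependencies."""
    return float(n) ** GLASS_CONSTANT


def system_complexity(functions, dependencies):
    """Total complexity of one system: functional plus coordination complexity."""
    return scu(functions) + scu(dependencies)


def architecture_complexity(services):
    """Complexity of a system of systems: the sum over its individual systems."""
    return sum(system_complexity(f, d) for f, d in services)


# Hypothetical SOA of three services, each given as (functions, dependencies).
services = [(4, 1), (6, 2), (3, 0)]
for functions, dependencies in services:
    print(functions, dependencies, round(system_complexity(functions, dependencies)))
print("architecture:", round(architecture_complexity(services)))
```

Swapping the addition in system_complexity for a multiplication would be one way to explore the alternative assumption mentioned above.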
Partitions
Much of the discussion of simplification will center on partitioning. I’ll start by describing the formal
mathematical concept of a partition, since that is how I will be using the term in this white paper.
A set of elements can be divided into subsets. The collection of subsets is a set of subsets. A partition is a
set of subsets such that each of the elements of the original set is now in exactly one of the subsets. It is
incorrect to use the word partition to refer to one of the subsets. The word partition refers to the set of
subsets.
For a non-trivial number of elements, there are a number of possible partitions. For example, Figure 5
shows four ways of partitioning 8 elements. One of the partitions contains 1 subset, two contain 2
subsets, and one contains 8 subsets.
Figure 5. Four Ways of Partitioning Eight Elements
This brings up an interesting question: how many ways are there of partitioning N elements? The answer
to this is known as the Nth Bell Number. I won’t try to explain how to calculate the Nth Bell Number [7]. I
will only point out that for non-trivial numbers, the Nth Bell Number is very large. Table 2 gives the Nth
Bell Number for integers (N) between 0 and 19. The Bell Number of 10, for example, is more than
100,000 and by the time N reaches 13, the Bell Number exceeds 27 million. This means that if you are
looking at all of the possible ways of partitioning 13 elements, you have more than 27 million options to
consider.
 N   Bell Number     N         Bell Number
 0             1    10             115,975
 1             1    11             678,570
 2             2    12           4,213,597
 3             5    13          27,644,437
 4            15    14         190,899,322
 5            52    15       1,382,958,545
 6           203    16      10,480,142,147
 7           877    17      82,864,869,804
 8         4,140    18     682,076,806,159
 9        21,147    19   5,832,742,205,057
Table 2. Bell Numbers for N from 0-19
You can see some of the practical problems this causes in IT. Let’s say, for example, we want to
implement a relatively meager 19 functions in an SOA. This means that we need to partition the 19
functions into some number of services. Which is the simplest way to do this? If we are going to
exhaustively check each of the possibilities, we have more than $10^{12}$ possibilities to go through! Finding
a needle in a haystack is trivial by comparison.
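For readers who want to reproduce the Bell Numbers, one standard method is the Bell triangle. The sketch below is an illustration, not part of SIP, and the function name bell_numbers is an assumption introduced here.

```python
def bell_numbers(count):
    """Return the first `count` Bell numbers, B(0) .. B(count - 1), via the Bell triangle."""
    bells = []
    row = [1]
    for _ in range(count):
        # The first entry of each triangle row is the next Bell number.
        bells.append(row[0])
        # The next row starts with the last entry of the current row; each further
        # entry is the previous entry in the new row plus the entry above it.
        next_row = [row[-1]]
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return bells


for n, bell in enumerate(bell_numbers(20)):
    print(n, f"{bell:,}")
```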
[7] For those interested in exploring the Bell Number, see http://en.wikipedia.org/wiki/Bell_number.
Set Notation
Partitioning Theory is part of Set Theory, so I will often be using set notation in the discussions. Just to
be sure we are clear on set notation, let me review how we write sets.
A = {a, b, c} is to be read A is the set of elements a, b, and c. It is worth noting that order in sets is not
relevant, so if
A = {a, b, c} and
B = {b, c, a}
then A = B.
The elements of a set can themselves be sets, as in
A = {{a, b}, {c}} which should be read
A is a set containing two sets, the first is the set of elements a and b, and the second is the set with only
c in it.
We do not allow duplicates in sets, so this is invalid: A = {a, b, a}.
The same element can be in two sets of a set of sets, so this is valid: A = {{a, b, c}, {a}}.
If C is the entire collection of elements of interest then a set of sets P of those elements is a partition if
every element is in exactly one of the sets of P. So if
C = {a, b, c, d} and
A = {{a, b}, {c, d}} and
B = {{a}, {b, c, d}} and
D = {{a, b}, {b, c}} and
E = {{a, b}, {d}}
Then A and B are partitions of C. D is not a partition because b is in two sets of D. E is not a partition
because c is not in any of the sets of E.
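The partition test can be written as a small check. This is a sketch under two stated assumptions: the helper name is_partition is introduced here, and a list of Python sets stands in for a set of sets because Python sets cannot contain mutable sets. The rule that no subset may be empty comes from the standard mathematical definition rather than from the text above.

```python
def is_partition(collection, subsets):
    """True if every element of `collection` lies in exactly one non-empty subset."""
    seen = set()
    for subset in subsets:
        if not subset:
            return False  # empty subsets are not allowed (standard definition)
        if seen & subset:
            return False  # some element already appeared in an earlier subset
        seen |= subset
    # Every element must be covered, and nothing outside the collection may appear.
    return seen == set(collection)


C = {"a", "b", "c", "d"}
A = [{"a", "b"}, {"c", "d"}]
B = [{"a"}, {"b", "c", "d"}]
D = [{"a", "b"}, {"b", "c"}]
E = [{"a", "b"}, {"d"}]

for name, candidate in (("A", A), ("B", B), ("D", D), ("E", E)):
    print(name, is_partition(C, candidate))
```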
Partitions and Complexity
Earlier I suggested that the main goals of the SIP preplanning process are reducing cost and increasing
value. Both of these are influenced by complexity. Complexity adds cost to a system. And complexity
makes it difficult to deliver, or even understand, the system’s value requirements. Partitioning is an
important tool for reducing complexity. But it is also a tool that if not well understood can do more
harm than good.
Let’s say you are trying to figure out the best way to partition eight business functions, A-H, into
services. And let’s say that you are given the following dependencies:
A/C (I will use “/” to indicate “leaning on”, or dependent, so A is dependent on C)
A/F
A/G
G/F
F/B
B/E
B/D
B/H
D/H
E/D
I stated that we want to partition this the “best way.” Remember we defined best as the one with the
least overall complexity. Given these eight functions and their dependencies, you might want to take a
moment to try to determine the best possible partition. Perhaps the best possible partition is {{ABEF},
{CDGH}}. This partition is shown in Figure 6.
Figure 6. Partition {{ABEF}, {CDGH}}
The partition shown above can be simplified by eliminating any dependencies within a subset (service.)
It isn’t that these connections aren’t important; it’s just that they have already been accounted for in
the implementation calculations. Figure 7 shows this same partition with internal connections removed.
Figure 7. Partition {{ABEF}, {CDGH}} With Internal Connections Removed
Now we can calculate the complexity of the proposed partition. The complexity of the left subset is SCU
(4 functions) + SCU (5 dependencies) which equals 74 + 148 = 222. The complexity of the right subset is SCU (4
functions) + SCU (1 coordination dependency) which equals 74 + 1 = 75. The complexity of the entire
system is therefore 222 + 75 = 297 SCUs.
We know that the complexity of {{ABEF}, {CDGH}} is 297 SCUs. But is that the best we can do? Let’s
consider another partition: {{ACGF}, {BDEH}}. Figure 8 shows this partition with internal connections
removed.
Figure 8. Partition {{ACGF}, {BDEH}}
You can already see how much simpler this second partition is than the first. But let’s go through the
calculations.
Left: SCU (4) + SCU (1) = 74 + 1 = 75
Right: SCU (4) + SCU (0) = 74 + 0 = 74
Total = 75 + 74 = 149 SCUs
The difference in complexity between the two partitions is striking. The second is 149 SCUs. The first is
297 SCUs. The difference is 148 SCUs. Thus we have achieved a 50% reduction in complexity just by
shifting the partitions a bit. This equates to a 50% reduction in cost of the project and a 50% increase in
the likelihood we will deliver all of the value the business wants.
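The partition arithmetic can be reproduced mechanically. The following Python sketch is an illustration only. The names SCU, DEPENDENCIES, subset_complexity, and partition_complexity are invented here, the SCU table holds only the values quoted in this example, and the sketch assumes (as the worked numbers imply) that a dependency is charged to the subset that contains the dependent function.

```python
# SCU values quoted in the text for the counts that occur in this example.
SCU = {0: 0, 1: 1, 4: 74, 5: 148}

# "A/C" means A is dependent on C.
DEPENDENCIES = ["A/C", "A/F", "A/G", "G/F", "F/B",
                "B/E", "B/D", "B/H", "D/H", "E/D"]


def subset_complexity(subset, dependencies=DEPENDENCIES):
    """Functional SCUs plus coordination SCUs for one subset (service)."""
    members = set(subset)
    # Count only dependencies that leave the subset; internal ones are
    # already accounted for in the functional complexity.
    outgoing = sum(
        1
        for dep in dependencies
        if dep[0] in members and dep[2] not in members
    )
    return SCU[len(members)] + SCU[outgoing]


def partition_complexity(partition, dependencies=DEPENDENCIES):
    """Total complexity is the sum over the subsets of the partition."""
    return sum(subset_complexity(s, dependencies) for s in partition)


second = ["ACGF", "BDEH"]
print([subset_complexity(s) for s in second])  # [75, 74]
print(partition_complexity(second))            # 149
```

Calling partition_complexity(["ABEF", "CDGH"]) evaluates the first partition in the same way, so any candidate partition of the eight functions can be compared on equal terms.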
Of course, this example is a trivially small system. As systems grow in size, the difference in complexity
between different partition choices becomes much greater. For very large systems (those over $10M)
the choice of partition becomes critical to the success of the project.
Directed vs. Non-Directed Methodologies
You can see why the ability to choose the best (i.e., simplest) partition is important. There are two
typical approaches to the problem. The first is directed, or deterministic. The second is non-directed, or
non-deterministic.
Directed methodologies are those that assume that there are many possible outcomes but only one that
is best. The directed methodology focuses on leading us directly to that best possible outcome. The
childhood game of “hot/cold” is a directed methodology. Participants are trying to lead you directly to
your goal by telling you when you are hot (getting closer) or cold (getting further away.)
Non-directed methodologies are those that assume you aren’t looking for a specific goal, just some
general category of outcome. The game of poker is a non-directed methodology. It doesn’t matter how
you end up winning all the money, as long as you win it.
From the perspective of IT complexity reduction, the directed approach has advantages and
disadvantages. The advantage of the directed approach is that it will lead you to the simplest solution,
guaranteed, no questions asked. The disadvantage is that you need complete information to find that
solution. That is, you must know every function and every coordination dependency. However, this
degree of information doesn’t materialize until the very end of the design cycle or even later, long after
the partitioning needs to be done. So directed approaches, at least the common ones in use today, are
not helpful in the particular problem space we are considering, that is, IT complexity reduction.
The non-directed methodologies also have advantages and disadvantages. The non-directed
methodologies used in IT design are all some flavor of decompositional design. Decompositional design
works by starting with the larger system and breaking it down into smaller and smaller pieces. The
advantage of decompositional design is that you can use the methodology early on in the design cycle,
at a point when your knowledge is very incomplete. The disadvantage of decompositional design is that
it is more than non-directed, it is close to random. Any of the many partitions is a possible outcome
of a decompositional design, and there are a huge number of partitions (the Bell number, to be
precise.) Experienced designers may be expected to eliminate the worst of these, but the chances that
decompositional design, even in the hands of highly experienced personnel, will end up with the best
possible partition are, for non-trivial systems, extremely low. If we are using decompositional design to
find the needle in the haystack, the chances are high that all we will find is hay.
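To give a feel for the size of the haystack, the number of ways to partition a set of n functions is the n-th Bell number. The Python sketch below computes Bell numbers with the standard Bell triangle construction; the function name bell_numbers is chosen here for illustration.

```python
def bell_numbers(n_max):
    """Return [B(0), B(1), ..., B(n_max)] using the Bell triangle."""
    bells = [1]          # B(0) = 1: the empty set has exactly one partition
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        # Each further entry is its left neighbour plus the entry of the
        # previous row that sits above that neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        bells.append(new_row[0])
        row = new_row
    return bells


bells = bell_numbers(10)
print(bells[8])    # 4140 possible partitions of eight functions
print(bells[10])   # 115975 possible partitions of ten functions
```

Even the eight-function example has thousands of candidate partitions, and the count grows very quickly as functions are added.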
So we have a big problem in IT preplanning. We need to do our partitioning early on in the design cycle.
The only design methodologies that can be used early in the design cycle are non-directed
methodologies, particularly decompositional design. But decompositional design methodologies are
good at finding hay, very poor at finding needles. So they offer little hope of solving our problem,
namely, finding the least complex possible partition. What do we do?
The solution to this dilemma is to find another methodology, one that combines the benefits of both
directed and non-directed methodologies. Such a methodology could be used very early in the design
cycle (like non-directed methodologies) but still promise to lead us to the best possible partition (like the
directed methodologies.) But before I can introduce such a methodology, I need to give you some more
background.
Equivalence Relations
Equivalence relations are a mathematical concept that will be critical to the needle in the haystack
problem. In this section, I will describe equivalence relations and their relevance to partitions.
An equivalence relation E is any function that has these characteristics:
- It returns TRUE or FALSE (i.e., it is Boolean.)
- It takes two arguments, both of which are elements of some collection (i.e., it is binary.)
- E(a,a) is always TRUE (i.e., it is reflexive.)
- If E(a,b) is TRUE, then E(b,a) is TRUE (i.e., it is symmetric.)
- If E(a,b) is TRUE and E(b,c) is TRUE, then E(a,c) is TRUE (i.e., it is transitive.)
For example, consider the collection of people in this room and the function BirthYear that takes as
arguments two people in the room and returns TRUE if they are born in the same year and FALSE
otherwise. This is an equivalence relation, because:
- It is Boolean (it returns TRUE or FALSE.)
- It is binary (it compares two people in this room.)
- It is reflexive (everybody is born in the same year as themselves.)
- It is symmetric (if I am born in the same year as you, then you are born in the same year as me.)
- It is transitive (If I am born in the same year as you and you are born in the same year as Sally,
then I am born in the same year as Sally.)
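For a small collection, these properties can be checked by brute force. The Python sketch below is illustrative: the people and birth years are borrowed from the example used later in this paper, and the function names (same_birth_year, older_or_same, is_equivalence_relation) are invented here.

```python
from itertools import product

# People and birth years borrowed from a later example in this paper.
BIRTH_YEAR = {"Anne": 1990, "John": 1989, "Tom": 1990,
              "Jerry": 1989, "Lucy": 1970}


def same_birth_year(a, b):
    """The BirthYear relation: TRUE when a and b were born in the same year."""
    return BIRTH_YEAR[a] == BIRTH_YEAR[b]


def older_or_same(a, b):
    """A counterexample: reflexive and transitive, but not symmetric."""
    return BIRTH_YEAR[a] <= BIRTH_YEAR[b]


def is_equivalence_relation(elements, relation):
    """Brute-force check of reflexivity, symmetry, and transitivity."""
    elements = list(elements)
    reflexive = all(relation(a, a) for a in elements)
    symmetric = all(relation(b, a)
                    for a, b in product(elements, repeat=2)
                    if relation(a, b))
    transitive = all(relation(a, c)
                     for a, b, c in product(elements, repeat=3)
                     if relation(a, b) and relation(b, c))
    return reflexive and symmetric and transitive


print(is_equivalence_relation(BIRTH_YEAR, same_birth_year))  # True
print(is_equivalence_relation(BIRTH_YEAR, older_or_same))    # False
```

The second relation fails only the symmetry test, which is enough to disqualify it as an equivalence relation.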
Driving Partitions with Equivalence Relations
Equivalence relations are important because they are highly efficient at driving partitions. The general
algorithm for using an equivalence relation E to drive a partition P is as follows:
Initialization:
1. Say C is the collection of elements that are to be partitioned.
2. Say P is the set of sets that will (eventually) be the partition of C.
3. Initialize a set U (unassigned elements) to contain all the elements of C.
4. Initialize P to contain a single set which is the Null set, i.e., P = {{null}}
5. Take a random element out of U and put it in the single set of P, i.e., P = {{e1}}
Iteration (repeat until U is empty):
1. Take a random element out of U, say em.
2. Choose any set in P that has not yet been checked against em.
3. Choose any element in that set, say en.
4. If E(em, en) is TRUE, then add em to that set and start the next iteration.
5. If E(em, en) is FALSE, then check the next set in P (i.e., go to step 2).
6. If there are no more sets in P, then create a new set, add em to that set, and add the new set to
P.
This sounds confusing, but it becomes clear when you see an example. Let’s say we have five people we
want to partition whose names and birth years are as follows:
- Anne (1990)
- John (1989)
- Tom (1990)
- Jerry (1989)
- Lucy (1970)
Let’s assume that we want to partition them by birth year. Let’s follow the algorithm:
Initialization:
1. C = {Anne, John, Tom, Jerry, Lucy}
2. Initialize U = {Anne, John, Tom, Jerry, Lucy}
3. Initialize P = {{null}}
4. Take a random element out of U, say John, and assign John to the single set in P. So P now looks
like {{John}} and U now looks like {Anne, Tom, Jerry, Lucy}.
Now we iterate:
1. U is not null (it has four elements) so we are not done.
2. Choose a random element of U, say Lucy.
3. Choose any set in P, there is only one which is {John}.
4. Choose any element in that set, there is only one which is John.
5. If BirthYear(Lucy, John) is TRUE, then add Lucy to the set {John}.
6. BirthYear(Lucy, John) is FALSE, so we check the next set in P. But there are no more sets in P, so
we create a new set, add Lucy to it, and add that set to P. P now looks like {{John}, {Lucy}}
Let’s go through one more iteration:
1. U is not null (it now has three elements) so we are not done.
2. Choose a random element of U, say Jerry.
3. Choose any set in P, say {John}.
4. Choose any element in that set, there is only one which is John.
5. If BirthYear(Jerry, John) is TRUE, then add Jerry to that set. It is, so we add Jerry to that set and start
the next iteration. We now have P = {{John, Jerry}, {Lucy}}
As you continue this algorithm, you will eventually end up with the desired partition.
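The walk-through can also be expressed as a short program. The Python sketch below is one possible rendering of the algorithm, not a definitive implementation: the names drive_partition and birth_year are invented here, and the initialization step is folded into the main loop (the first element simply finds no existing set and starts one).

```python
import random

BIRTH_YEAR = {"Anne": 1990, "John": 1989, "Tom": 1990,
              "Jerry": 1989, "Lucy": 1970}


def birth_year(a, b):
    """The BirthYear equivalence relation from the text."""
    return BIRTH_YEAR[a] == BIRTH_YEAR[b]


def drive_partition(collection, relation, rng=random):
    """Partition `collection` into sets of mutually equivalent elements."""
    unassigned = list(collection)      # U: elements not yet placed
    partition = []                     # P: the growing list of sets
    while unassigned:
        # Take a random element em out of U.
        em = unassigned.pop(rng.randrange(len(unassigned)))
        for subset in partition:
            en = next(iter(subset))    # any one representative will do
            if relation(em, en):
                subset.add(em)
                break
        else:
            # No existing set accepted em, so it starts a new set.
            partition.append({em})
    return partition


result = drive_partition(BIRTH_YEAR, birth_year)
print(sorted(sorted(s) for s in result))
# [['Anne', 'Tom'], ['Jerry', 'John'], ['Lucy']]
```

Only one comparison per existing set is needed, because transitivity guarantees that an element equivalent to one member of a set is equivalent to all of them.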
Properties of Equivalence Driven Partitions
When we use an equivalence relation to drive a partition as described in the last section, that partition
has two properties that will be important when we apply these ideas to IT preplanning. These are:
- Uniqueness of Partition
- Conservation of Structure
Let’s look at each of these in turn.
Uniqueness of Partition
Uniqueness of partition means that for a given collection of elements and a given equivalence relation
there is only one possible outcome from the partitioning algorithm. This means that there is only one
possible partition that can be generated. Let’s go back to the problem of partitioning the room of people
using the BirthYear equivalence relation. As long as the collection of people we are partitioning doesn’t
change and the nature of the equivalence relation doesn’t change, there is only one possible outcome. It
doesn’t matter how many times we run the algorithm, we will always get the same result. And it doesn’t
matter who runs the algorithm or in what order we choose elements from the set.
This property is important in solving our needle in the haystack problem. We are trying to determine
which partition is the optimal, or simplest, partition out of the almost infinite set of possible partitions.
Going through all possible partitions one by one and measuring their complexity is logistically
impossible. But if we can show that the simplest partition is one that would be generated by running a
particular equivalence relation, then we have a practical way of finding our optimal partition.
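Uniqueness is easy to confirm for a small collection by trying every possible processing order. The Python sketch below does this for the five people in the BirthYear example; the function names are invented for illustration.

```python
from itertools import permutations

BIRTH_YEAR = {"Anne": 1990, "John": 1989, "Tom": 1990,
              "Jerry": 1989, "Lucy": 1970}


def same_year(a, b):
    return BIRTH_YEAR[a] == BIRTH_YEAR[b]


def partition_in_order(ordered_elements, relation):
    """Run the partitioning algorithm, taking elements in the given order."""
    partition = []
    for element in ordered_elements:
        for subset in partition:
            if relation(element, next(iter(subset))):
                subset.add(element)
                break
        else:
            partition.append({element})
    # Use a set of frozensets so that neither the order of the sets nor
    # the order within a set matters when results are compared.
    return frozenset(frozenset(s) for s in partition)


outcomes = {partition_in_order(order, same_year)
            for order in permutations(BIRTH_YEAR)}
print(len(outcomes))  # 1: all 120 processing orders give the same partition
```

The exhaustive check is only feasible for tiny collections, but it illustrates the property that the argument in the text relies on.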
Conservation of Structure
Conservation of structure means that if we expand the collection of elements, we will expand but we
will never invalidate our partition. So let’s say we have a collection C of elements that we partition using
equivalence relation E. If we now add more elements to C and continue the partition generating
algorithm, we may get more elements added to existing sets in P and/or we may get more sets added to
P with new elements. But we will never invalidate the work we have done to date. That is, none of the
following can ever occur:
- We will never find that we need to remove a set from P.
- We will never find that we need to split a set in P.
- We will never find that we need to move an element in one of the sets of P to another set in P.
Let’s consider an example. Suppose we have 1000 pieces of mail to sort by zip code. Assume that we
don’t know in advance which zip codes or how many of them we will be using for our sort. We can use
the equivalence relation SameZipCode and our partition generating algorithm to sort the mail.
Let’s say that after sorting the first 100 pieces of mail, we have 10 zip codes identified with an average of
10 pieces of mail in each set. As we continue sorting the remaining mail, only two things can happen: we
can find new zip codes and/or we can add new mail to an existing zip code. But we will never invalidate the
work we have done to date. Our work is conserved.
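The mail-sorting example can be sketched in Python as follows. The mail items and zip codes are arbitrary values chosen for illustration, and the names add_to_partition and same_zip_code are invented here. The check at the end confirms that every set built from the first batch survives intact inside exactly one set of the final partition.

```python
def add_to_partition(partition, new_elements, relation):
    """Continue the partitioning algorithm with additional elements."""
    for element in new_elements:
        for subset in partition:
            if relation(element, next(iter(subset))):
                subset.add(element)
                break
        else:
            partition.append({element})
    return partition


def same_zip_code(a, b):
    """SameZipCode: each piece of mail is a (label, zip code) pair."""
    return a[1] == b[1]


# Illustrative mail items; the labels and zip codes are arbitrary.
first_batch = [("m1", "78701"), ("m2", "10001"), ("m3", "78701")]
second_batch = [("m4", "10001"), ("m5", "94105"), ("m6", "78701")]

partition = add_to_partition([], first_batch, same_zip_code)
before = [frozenset(s) for s in partition]      # snapshot of the early work

add_to_partition(partition, second_batch, same_zip_code)

# Every early set is contained in exactly one later set:
# nothing was removed, split, or moved.
conserved = all(sum(old <= new for new in partition) == 1 for old in before)
print(conserved)          # True
print(len(partition))     # 3
```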
This brings up an interesting point. If you are sorting e Elements into s Sets of a partition, then how
many of e do you need to run through the partitioning algorithm before you have identified most of the
sets that will eventually make up P? In other words, how far do you need to go in the algorithm
before you have carved out the basic structure of P? The answer to this question is going to depend on
three things:
- The magnitude of s, the number of sets we will eventually end up with.
- The distribution of e into s.
- How much of P we need to find before we can say we have found its basic structure.
To give an example calculation, I set up a test where I assumed that we would be partitioning 1000
business functions randomly among 20 sets. I then checked to see how many business functions I would
need to run through the partitioning algorithm to have found 50%, 75%, 90%, and 100% of the sets. The
results are shown in Table 2.
Run#   50%   75%   90%   100%
1      10    23    45    83
2      14    40    48    68
3      13    22    65    93
4      12    18    28    31
5      16    32    41    111
6      12    22    29    47
7      11    23    35    83
8      14    22    41    51
9      15    27    46    63
10     18    29    45    81
Table 2. Runs to Identify Different Percentages of Partition
As Table 2 shows, even in the worst run I had identified 90% of the final sets after analyzing 65 of the
1000 business functions. The average number of iterations I had to run to find 90% of the sets was 42.3
with a standard deviation of 10.6. This tells us that roughly 95% of the time we will find 90% of the 20 sets with
no more than 42.3 + (2 x 10.6) ≈ 64 iterations. Putting this all together, by examining 6.4% of the 1000
business functions, we can predict with about 95% confidence that we have found at least 90% of the structure
of the partition. From then on, we are just filling in details.
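An experiment of this kind is straightforward to repeat. The Python sketch below is not the original test code; it is a minimal re-creation under the stated assumptions (1000 functions assigned uniformly at random to 20 sets, ten runs), and the function name elements_needed and the seed value are arbitrary choices made here. Because the assignment is random, the numbers it produces will differ from those in Table 2.

```python
import random
import statistics


def elements_needed(num_sets=20, num_elements=1000,
                    percentages=(50, 75, 90, 100), rng=random):
    """Count how many elements are examined before each percentage of the
    sets that occur in the collection has been seen at least once."""
    # Assign every business function to a set uniformly at random.
    assignment = [rng.randrange(num_sets) for _ in range(num_elements)]
    total_sets = len(set(assignment))
    # Integer ceiling of percentage * total_sets / 100.
    targets = [-(-p * total_sets // 100) for p in percentages]
    seen = set()
    needed = []
    for count, set_id in enumerate(assignment, start=1):
        seen.add(set_id)
        while len(needed) < len(targets) and len(seen) >= targets[len(needed)]:
            needed.append(count)
    return needed


rng = random.Random(2011)            # fixed seed so the run is repeatable
runs = [elements_needed(rng=rng) for _ in range(10)]
ninety = [run[2] for run in runs]
mean = statistics.mean(ninety)
spread = statistics.stdev(ninety)    # sample standard deviation
print(f"90% of sets: mean {mean:.1f}, mean + 2 sd {mean + 2 * spread:.1f}")
```

Applying statistics.mean and statistics.stdev to the 90% column of Table 2 gives the 42.3 and 10.6 quoted in the text.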
This is well and good, you may say, if you know in advance that we have 20 sets. What if we have no
idea how many sets we have?
It turns out that you can predict with remarkable accuracy when you have found 90% of the sets even
without any idea how many sets you have. You can do this by plotting how many sets have been
identified against the number of iterations. With one random sample, I got the plot shown in Figure 9.
Figure 9. Number of Sets Found vs. Iterations
As you can see in Figure 9, we rapidly find new sets early in the iteration loop. In the first five iterations
we find five sets. This is to be expected, since if the elements are randomly distributed, we would expect
almost every element of the early iterations to land in a new subset. But as more subsets are discovered
there are fewer left to discover, so for later iterations elements are more likely to land in existing
subsets. By the time we have found 15 of the subsets, there are only 5 left to be discovered. The next
iteration after the 15th subset is discovered has only a 5/20th chance of discovering yet another
undiscovered subset. And the next iteration after the 19th subset has only a 1/20th chance of finding the
next subset. It will take on average 20 iterations to go from the 19th subset to the 20th subset.
So by observing the shape of the curve, we can predict when we are plateauing out on the set discovery
opportunities. The particular curve shown in Figure 9 seems to flatten out at about 17 subsets, which
have taken us about 35 iterations to reach. So by the time we have completed 35 iterations, we can
predict that we have found someplace around 90% of the existing sets.
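The shape of this curve can also be derived directly. If elements fall uniformly at random into s sets and k sets have already been found, the next element discovers a new set with probability (s - k)/s, so the expected wait for the next discovery is s/(s - k) iterations. The Python sketch below sums these waits; it assumes the uniform distribution used in the test above, and the function name expected_iterations is invented here.

```python
from fractions import Fraction


def expected_iterations(target_sets, total_sets=20):
    """Expected number of elements examined before `target_sets` distinct
    sets have been found, when elements fall uniformly into `total_sets`."""
    # With k sets already found, the expected wait for the next discovery
    # is total_sets / (total_sets - k).
    return sum(Fraction(total_sets, total_sets - k)
               for k in range(target_sets))


print(float(expected_iterations(20) - expected_iterations(19)))  # 20.0
print(round(float(expected_iterations(17)), 1))   # 35.3
print(round(float(expected_iterations(18)), 1))   # 42.0
print(round(float(expected_iterations(20)), 1))   # 72.0
```

These expectations line up with the observations above: about 35 iterations to reach 17 sets, and an average of about 42 iterations to reach 90% of the 20 sets.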
Synergy
Let’s take a temporary break from partitions and equivalence relations to examine another concept that
is important in complexity reduction, that is, synergy. Synergy is a relationship that may or may not exist
between two business functions. When two business functions have synergy, we call them synergistic.
Synergy is of course a word that is used in many different contexts. I define synergy as follows:
Synergy: Two business functions have synergy with each other if and only if from the business
perspective neither is useful without the other.
So if we want to test two business functions A and B for synergy, we take them to the business and tell
them we can give them A or B but not both. If they are willing to take either without the other, then the
two functions are not synergistic. If they won’t take either without the other, then they are synergistic.
For example, consider these business functions:
- Deposit
- Withdraw
- Check Overdraft
- Check Balance
- Validate Credit Card
- Validate Debit Card
- Process Credit Charge
- Process Debit Charge
If you take these functions two by two to the business and ask which have mutual requirements, you are
likely to get these answers:
- Deposit and Withdraw are needed as a pair. The business has no interest in a system that can
deposit but not withdraw or can withdraw but not deposit. So these two are synergistic.
- Withdraw and Check Balance are both needed. The business won’t accept a system that can
withdraw but not show you the new balance and vice versa, so these two are synergistic.
- Process Credit Charge and Withdraw are not needed as a pair. Process Credit Charge is used to
process a credit card charge, and the business can see use for this regardless of whether that
same system can show them what their account balance is.
Now let’s say that we have three business functions, A, B, and C. Here are some observations we can
make about synergy.
1. It is Boolean, meaning that the statement “A and B are synergistic” is either TRUE or FALSE.
2. It is binary, meaning that the statement takes two arguments.
3. It is reflexive, meaning that any business function is synergistic with itself. You can’t
meaningfully ask the question, “Can I give you Deposit but not Deposit?”
4. It is symmetric, meaning that if A is synergistic with B, then B must be synergistic with A. So if you
ask the question “Can I give you only one of Deposit and Withdraw?” you will get the same
answer as “Can I give you only one of Withdraw and Deposit?”
5. It is transitive, meaning that if A and B are synergistic and B and C are synergistic, then A and C
must be synergistic. So once we know that Deposit and Withdraw are synergistic and we also
know that Withdraw and Check Balance are synergistic, then we can predict that Deposit and
Check Balance are also synergistic.
So we have a function that is Boolean, binary, reflexive, transitive, and symmetric. What is such a
function called? As I discussed earlier, such a function is called an equivalence relation. So synergy is not
just a function, it is an equivalence relation. And now that we know that synergy is an equivalence
relation, we know a great deal about it. In particular, we know these important facts:
- It can be used to drive a partition.
- That partition will be unique.
- That partition will have conservation of structure.
Now of course the business side doesn’t think of business functions as elements in a partition. It thinks
of business functions as steps in a particular business process. It thinks of Deposit and Withdraw, for
example, as part of the process of Managing An Account. Managing An Account is an example of a
cluster of functions that is often referred to as a capability. From the business perspective, a capability is
a unified business responsibility. From the mathematical perspective, a capability is one of the sets in
the synergy driven partition.
An element in the partition then corresponds to a discrete step in the business process. The synergistic
analysis ensures that elements of a given set are all working on a unified business responsibility, that is,
the responsibility of their respective capability. I will start using the term capability set to refer to one of
the synergy driven sets in the partition.
When we ask the business to analyze these eight functions, they are not going to take long to figure out
which are synergistically related (or, as they might put it, living in the same capability.) They are likely to
conclude that the functions fall into two synergistic groups:
Manage Cards
- Validate Credit Card
- Validate Debit Card
- Process Credit Charge
- Process Debit Charge
Manage Account
- Deposit
- Check Overdraft
- Withdraw
- Check Balance
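Because synergy is transitive, the business does not have to answer all 28 pairwise questions for eight functions. A handful of positive answers is enough to recover the capability sets. The Python sketch below merges functions with a small union-find structure. It is an illustration under stated assumptions: the first two answers come from the text, the other four are hypothetical answers supplied here, any pair that cannot be linked through positive answers is treated as not synergistic, and the name capability_sets is invented.

```python
def capability_sets(functions, synergistic_pairs):
    """Group functions into capability sets from pairwise synergy answers."""
    parent = {f: f for f in functions}

    def find(f):
        # Follow parent links to the representative of f's set,
        # shortening the path as we go.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in synergistic_pairs:
        parent[find(a)] = find(b)      # merge the two sets

    groups = {}
    for f in functions:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())


FUNCTIONS = ["Deposit", "Withdraw", "Check Overdraft", "Check Balance",
             "Validate Credit Card", "Validate Debit Card",
             "Process Credit Charge", "Process Debit Charge"]

# Answers of the form "these two are synergistic". The first two come from
# the text; the remaining four are hypothetical answers for illustration.
ANSWERS = [("Deposit", "Withdraw"),
           ("Withdraw", "Check Balance"),
           ("Withdraw", "Check Overdraft"),
           ("Validate Credit Card", "Process Credit Charge"),
           ("Validate Debit Card", "Process Debit Charge"),
           ("Process Credit Charge", "Process Debit Charge")]

for group in capability_sets(FUNCTIONS, ANSWERS):
    print(sorted(group))
```

With these six answers the sketch yields two groups of four functions each, and it places Deposit with Check Balance even though that pair was never asked about directly.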
Now let’s watch what happens when this project is turned over to IT for implementation. Four different
architects (John, Anne, Jim, Sally) examine the requirements. They each come back with their
recommendation.
John doesn’t like SOAs. He thinks the messaging is too complex. He recommends that all eight of the
functions be packaged together. If anything goes wrong, debugging is much easier, he says.
Anne really likes SOAs. She thinks everything should be a service. She recommends that each function
be a separate service. This will give the system the maximum flexibility, she says.
Jim believes in reuse. He thinks functions that are going to have similar implementations should be
packaged together. This will give the maximum possible opportunity to reuse code, he says.
Sally believes that the implementation packages should follow the business capabilities as defined by
the synergistic relationships. This will give the maximum possible alignment between the business and
IT, she says.
Who is right? Notice that all of these arguments are entirely reasonable. Let’s approach the problem
from a complexity perspective.
Since each function implements one step in a unified business process, it stands to reason that there will
be many mutual dependencies between implementations of members of a capability set. Consider the
functions Deposit, Withdraw, Check Balance and Check Overdraft. If we change the format of money
that Deposit stores in a database, we will need to change Withdraw to use the same storage format. If
we change Deposit’s understanding of an account ID, that understanding will need to be reflected in
Check Balance.
We will be facing similar issues with Validate Credit Card, Process Credit Charge, Validate Debit Card,
and Process Debit Charge. These are all functions related to processing credit and debit cards. Changes
to Process Credit Charge are likely to force changes in Validate Debit Card and vice versa.
On the other hand, changes in Validate Credit Card are not likely to force changes in Check Balance and
changes in Check Balance are not likely to force changes in Process Credit Charge. Graphically, we can map
these dependencies as shown in Figure 10.
Figure 10. Dependencies in Capability Sets
Now let’s look at each of the four architectural proposals. Figures 11-14 show the four different
packaging possibilities with dependencies internal to the package not shown.
John’s consolidated packaging is shown in Figure 11.
Figure 11. Monolithic Package
Anne’s extreme SOA packaging is shown in Figure 12.
Figure 12. Extreme SOA Package
Jim’s reuse intensive packaging is shown in Figure 13.
Figure 13. Reuse Driven Package
Sally’s synergy driven package is shown in Figure 14.
Figure 14. Synergy Driven Package
Now let’s do the complexity calculations on the four packages. Remember that the complexity of a
system of systems is equal to the sum of the complexities of the individual systems, and the complexity of an
individual system is equal to the sum of its implementation (functional) complexity and its coordination
complexity. Using Table 1, we get these values in standard complexity units (SCUs) for the four packages:
Monolithic Package
System  Functional Complexity   Coordination Complexity
A       8 Functions, 639 SCUs   0 Dependencies, 0 SCUs
Total Complexity: 639 SCUs
Extreme SOA Package
System  Functional Complexity   Coordination Complexity
A       1 Function, 1 SCU       3 Dependencies, 30 SCUs
B       1 Function, 1 SCU       3 Dependencies, 30 SCUs
C       1 Function, 1 SCU       3 Dependencies, 30 SCUs
D       1 Function, 1 SCU       3 Dependencies, 30 SCUs
E       1 Function, 1 SCU       3 Dependencies, 30 SCUs
F       1 Function, 1 SCU       3 Dependencies, 30 SCUs
G       1 Function, 1 SCU       3 Dependencies, 30 SCUs
H       1 Function, 1 SCU       3 Dependencies, 30 SCUs
Total Complexity: 248 SCUs
Reuse Package
System  Functional Complexity   Coordination Complexity
A       3 Functions, 30 SCUs    7 Dependencies, 422 SCUs
B       1 Function, 1 SCU       3 Dependencies, 30 SCUs
C       3 Functions, 30 SCUs    7 Dependencies, 422 SCUs
D       1 Function, 1 SCU       3 Dependencies, 30 SCUs
Total Complexity: 966 SCUs
Synergy Package
System  Functional Complexity   Coordination Complexity
A       4 Functions, 74 SCUs    0 Dependencies, 0 SCUs
B       4 Functions, 74 SCUs    0 Dependencies, 0 SCUs
Total Complexity: 148 SCUs
The implementation package that follows the synergy relationships is the least complex. In fact, it is
substantially less complex. It has only 60% of the complexity of the next simplest solution (the extreme
SOA) and only 15% of the complexity of the most complex solution (the reuse.)
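The totals and percentages can be checked with a few lines of Python. The sketch below works only from the function and dependency counts listed in the tables above and the SCU values quoted there; the names SCU, PACKAGES, and package_complexity are chosen here for illustration.

```python
# SCU values quoted in the tables above for the counts that occur there.
SCU = {0: 0, 1: 1, 3: 30, 4: 74, 7: 422, 8: 639}

# Each system is (number of functions, number of dependencies).
PACKAGES = {
    "Monolithic": [(8, 0)],
    "Extreme SOA": [(1, 3)] * 8,
    "Reuse": [(3, 7), (1, 3), (3, 7), (1, 3)],
    "Synergy": [(4, 0), (4, 0)],
}


def package_complexity(systems):
    """Sum functional and coordination SCUs over all systems in a package."""
    return sum(SCU[functions] + SCU[dependencies]
               for functions, dependencies in systems)


totals = {name: package_complexity(systems)
          for name, systems in PACKAGES.items()}
for name, total in totals.items():
    print(f"{name}: {total} SCUs")

print(round(100 * totals["Synergy"] / totals["Extreme SOA"]))  # 60
print(round(100 * totals["Synergy"] / totals["Reuse"]))        # 15
```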
This result cannot be over emphasized. The least complex implementation package is the one that
precisely reflects the synergy driven capability sets. Not only is it the least complex. It is by far the least
complex. That means it will be by far the least costly to implement and by far the most likely to succeed.
At this point, some readers will question the need for all of this math. Why, they will ask, do we need to
go through this complex synergy algorithm? Why don’t we just let the business tell us how they view
the capability sets and drive the technical architecture from those?
In fact, in most cases, this is exactly what we do. For most large systems, the majority of functions fall
into obvious sets. It is only when we run into questionable or politically motivated placements that we
need to revert to the formalism of synergy analysis. But we mustn’t underestimate the importance of
this. Even a few misplacements can introduce huge amounts of complexity.
Some other readers will question how mathematical this process really is. After all, they will point out,
the synergy decision is made by human beings and human beings are notoriously fickle. So what have
we gained?
The way we get around the fickleness of human beings is to sharpen the focus of the question as much
as possible. So instead of asking a fuzzy and potentially politically charged question, such as “Which
group should own credit card processing,” we keep our questions tightly focused and politically neutral:
“Is Deposit synergistic with Process Credit Charge?” No politics, no vagueness. If the business groups can’t
agree on the answer, we have a very simple question to escalate to the next highest level of expertise.
We must still deal with human beings. But their role is limited to answering pointed questions not much
more complicated than “are these two people born in the same year?”
The SIP Process
Now that we have some mathematical foundations, I can explain how SIP accomplishes its magic. Keep
in mind that it is not my goal in this white paper to explain the SIP process. I have done that elsewhere [8].
My goal here is to describe the mathematics of SIP.
Recall that most of the SIP activity occurs in the partitioning phase, or what I often call the preplanning
phase. The main goal of this phase is to identify the basic structure of the project partition. To do this we
generate a representative selection of business functions, say 10% of the expected total. We then use
synergy analysis on that selection to drive the partition. This partition will define the structure and
packaging of the technical solution.
Since synergy is an equivalence relation, the partition that is created is unique and can be validated.
Because the methodology is deterministic, there is only one way it can turn out regardless of who is
driving the process and what random selection of business functions happens to be picked for the
analysis.
The partition generated will be the simplest possible, because synergy limits the number of business
functions that will live together. Only those with synergy are allowed to cohabitate. And it also limits the
coordination complexity by grouping together those with the most likely dependencies. The fact that we
can limit the coordination dependencies without knowing what those dependencies are is part of what
appears to be magic.
Remember all partitions generated with equivalence relations have the property of conservation of
structure. This guarantees that the partition we identified with the small group of business functions will
not change as we discover more business functions. This is important because upwards of 90% of the
business functions and all of their dependencies are still unknown during the partitioning phase. They
won’t be discovered until the design phase or even later.
[8] See, for example, my book Simple Architectures for Complex Enterprises, Microsoft Press.
Even after implementation, the conservation of structure gives the system resiliency against future
changes. New functions may get added to the system in the future or existing functions may get
re-implemented, but the overall structure is highly stable.
Once SIP has identified how the project should be partitioned into smaller, simpler subprojects, each of
these subprojects can be carried out relatively independently by smaller, more focused teams that have
specialized expertise. This means that the requirements of the subprojects are more likely to accurately
reflect the needs of the business, the design of the subprojects is more likely to deliver on the
requirements, and the implementation of the functions is more likely to meet the specifications of the
design. These smaller projects can more successfully use agile development methodologies, if desired.
The main role for SIP in the design phase is to ensure that newly discovered business functions are
assigned to the proper subsystem based on synergy. This is critical, because if independent groups start
making their own decisions about where to place newly discovered business functions, the original
partition will quickly degrade. When that happens, all bets are off. Complexity will flood the system.
It is important that dependencies between subsystems are tracked as they are identified. These will
become part of the specifications to the implementers. These dependencies will also be used to define
an integration harness that will eventually unify the various subprojects.
As the subprojects move from design to implementation, SIP takes on another role. SIP needs to ensure
that the partition carved out at the business level is projected precisely onto the technical architecture.
Mathematically, the relationship between the business architecture and the technical architecture
should be an isomorphic projection. An isomorphism is a mapping between two sets of elements such
that every element in one set maps to one and only one element in the other set and vice versa. There
are two isomorphisms of interest to SIP. First, every boundary at the business level should map to a
boundary at the technical level. Second, every coordination dependency at the business level should
map to a coordination dependency at the technical level. Since the technical level will be driven by the
business level, I call this relationship an isomorphic projection (the business architecture projects onto
the technical architecture.)
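The two conditions can be stated as a simple check. The Python sketch below is a hypothetical illustration, not part of the SIP process itself: the capability names reuse the earlier banking example, the service names and the dependency between the two capabilities are invented, and the function name is_isomorphic_projection is chosen here.

```python
def is_isomorphic_projection(capabilities, services,
                             business_deps, technical_deps, projection):
    """Return True when `projection` maps business capabilities one-to-one
    onto technical services and carries every dependency across exactly."""
    targets = list(projection.values())
    boundaries_match = (
        set(projection) == set(capabilities)       # every capability mapped
        and len(set(targets)) == len(targets)      # no two share a service
        and set(targets) == set(services)          # no service left over
    )
    if not boundaries_match:
        return False
    # Projecting the business dependencies must reproduce the technical
    # dependencies, no more and no fewer.
    projected = {(projection[a], projection[b]) for a, b in business_deps}
    return projected == set(technical_deps)


# Hypothetical names throughout; "X depends on Y" is written (X, Y).
capabilities = ["Manage Account", "Manage Cards"]
projection = {"Manage Account": "account-service",
              "Manage Cards": "card-service"}
business_deps = [("Manage Cards", "Manage Account")]

aligned = is_isomorphic_projection(
    capabilities, ["account-service", "card-service"],
    business_deps, [("card-service", "account-service")], projection)

# A shared utility service has no counterpart at the business level.
drifted = is_isomorphic_projection(
    capabilities, ["account-service", "card-service", "shared-utils"],
    business_deps,
    [("card-service", "account-service"), ("card-service", "shared-utils")],
    projection)

print(aligned, drifted)  # True False
```

A check of this kind could be rerun whenever a service or dependency is added, which is one way to notice when the technical architecture starts to drift from the business partition.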
This isomorphic projection is critical if SIP is to be used to simplify IT systems. Since we don’t analyze IT
systems directly for synergy (we can’t, because IT folks are not qualified to make synergy decisions) we
get the IT simplification indirectly through the business synergy analysis and the isomorphic projection.
This isomorphic projection must be guided by those whose vision encompasses both the business and
technical architecture, a group commonly referred to as enterprise architects. This isomorphic projection
must not only be initiated, but diligently maintained throughout the system lifecycle, otherwise the
simplicity gained in the design phase will be eroded in the maintenance phase.
The net result of this isomorphic projection is an architectural style that is unique to SIP. The
contrast between this style and a traditional IT style (e.g. Federal Enterprise Architecture Framework [9])
is shown in Figure 15.
Figure 15. Comparing SIP IT Architecture to Traditional IT Architecture
I often refer to the SIP architecture as a snowman architecture, because it appears to be a collection of
snowmen holding hands. The snowman formation is a direct reflection of the isomorphic projection of
the business boundaries/dependencies onto the technical boundaries/dependencies. For more on the
snowman architecture, see my recent white paper with Nikos Salingaros [10].
The snowman architecture has a number of advantages over a traditional IT architecture:
- It is less complex.
- It is more amenable to agile development methodologies.
- It is easier to adapt as the business processes change.
- It is more efficient for cloud deployment.
The snowman architecture does have one feature that some will consider a disadvantage: it emphasizes
simplicity at the expense of reusability. This is in keeping with the SIP philosophy that simplicity is the
most important design goal. It isn’t that we eliminate the possibility of reuse. We still seek
opportunities for code reuse through shared libraries and, to a lesser extent, through shared services. It
is just that all reuse is accomplished within the tight constraints of isomorphic projection. When we are
forced to choose between simplicity and reuse, simplicity wins.
[9] http://en.wikipedia.org/wiki/Technical_Reference_Model#Technical_Reference_Model
[10] Urban and Enterprise Architectures: A Cross-Disciplinary Look at Complexity, by Roger Sessions and Nikos A.
Salingaros, available at http://objectwatch.com/white_papers.htm#SessionsSalingaros.
Wrap-up
We are not doing a good job building large complex IT systems. Many people feel that the design of
large complex IT systems is an art, not a science. If it is an art, it is a failing art. We are spending too
much to deliver too little too late. This will not change until we change our approaches to delivering
large complex IT systems.
We have tried many approaches to managing complexity. None have solved the problem. I don’t believe
the problem is solvable. We can’t manage complexity. We must get rid of it. And our best hope for
getting rid of complexity is partitioning. We must partition large complex systems into small simple
systems. And we must do it at the earliest possible phase of the system life cycle, before complexity has
had a chance to take over.
But partitioning alone is not the answer. There are many ways of partitioning systems, many of which
make complexity worse and very few of which are optimal. If we are looking for the optimal partitioning,
we need to understand something about the nature of complexity and the nature of partitioning. And
then we need reproducible, verifiable methodologies for leveraging the mathematics of partitions to
address the problem of complexity.
That is the focus of the methodology known as SIP (Simple Iterative Partitions). In this white paper, I
have discussed the mathematics behind SIP and IT simplification. The mathematical foundations are not
needed to apply the principles of simplification, but they are critical to understanding how simplification
works. And that is the first step in moving IT design from an art to a science. The artistic side of design
may help us appreciate complexity but only the scientific side of design can help us eliminate it.
Legal Notices
This white paper describes the mathematics of the methodology known as Simple Iterative Partitions
(SIP). SIP is protected by U.S. patent 7,756,735. Organizations interested in using SIP should contact the
author.
Simple Iterative Partitions is a trademark of ObjectWatch, Inc. ObjectWatch is a registered trademark of
ObjectWatch, Inc.
This white paper is copyright (c) 2011 by ObjectWatch, Inc. It may be freely redistributed as long as it is
redistributed in its entirety and not changed in any way.