Table of Contents
Abstract
1. Introduction
2. Initial Agile adoption
3. The post launch doubts and scaling up
4. Redefining the approach
4.1. Reflecting on code structure
4.2. Providing direction and rigorous checkpoints
4.3. Structuring the application architecture to promote good governance
4.4. Key measurement tool: testing
4.5. Documentation that is ‘good enough’
4.6. Delegation
5. Conclusions
6. References
7. Valtech Contact Details
Abstract
This experience report, by a project’s technical architect, details the adoption of Agile methods
across several teams after one high profile success. The organization had a long history of
waterfall development and a clearly defined remit for technical architects. Years of refinement
had led to a set of techniques which contradicted many of the ideals held by Agile practitioners.
The author’s challenge was to maintain agility and fulfill responsibilities inherited from waterfall
processes without reverting to the conventional practices that ultimately lead to the architect’s
ivory tower.
1. Introduction
The T-Mobile International Mobile Portals and Content Delivery Group had developed a set of
expectations around the responsibilities of a role which would be widely recognized as a
'Technical Architect'. The exact remit of a technical architect is subject to
much debate and differs widely between organizations and even projects. This report does not
aim to produce a definition for the technical architect, Agile or otherwise, but in the context of this
particular client engagement the role involved having: a wide remit over the implementations
delivered by several different teams; end to end technical responsibility; delivery of a consistently
efficient implementation that is fit for purpose.
In organizations where a waterfall process is in place it has been the author's experience that the
technical architect is unlikely to be involved in the hands on aspects of delivering solutions. Their
role is often very closely coupled with design documentation.
Often the delivery process has been structured to include quality gates where the deliverable for
the next stage is documented and then reviewed by the technical architect. The review of design
documentation is the primary tool available to the architect. This had been the case at T-Mobile.
The requirement to design, document and review everything upfront as a way to reduce risk is
one that is obviously eschewed by the Agile movement as being ineffective and providing only an
illusion of control.
In addition, making a documentation quality gate the mechanism by which the technical architect
manages the implementation paradoxically reduces their effectiveness by:
• Isolating them from the technical implementation for which they are supposedly
  responsible.
• Reducing their technical ability by taking them away from the technology that made them
  great candidates for the role in the first place.
• Supporting the fallacy that the technical architect is the all-knowing center of the technical
  universe.
Many technical architects on waterfall projects do an excellent job. The author's opinion is that
this is achieved in spite of rather than because of the design review quality gates. Personal
experience is that many technical architects feel disenfranchised because their key skills honed
through many years of education and hard project work are no longer put to good use.
Conversely, many developers perceive the technical architect as being somebody they rarely
interact with, who has little idea of how the software is being put together. In many teams the
author has observed the architect being regarded as, at best, a poorly informed individual
dabbling at the periphery and, at worst, a dangerous impediment to progress whose attention
must be avoided at all costs.
Technical architect is usually a logical career progression for developers. The organization's
need for an individual to fulfill the responsibilities of the architect has not diminished [1] even if
the tools employed in the past have sometimes failed to add value. The challenge is: how can the
responsibilities of a technical architect be fulfilled without introducing practices which reduce the
agility, and therefore the effectiveness, of the team? Valtech has demonstrated that by changing
the techniques and attitudes of the architect it is possible to meet this goal. This experience
report details practices employed by a technical architect and his team across a body of work
consisting of several projects. These practices are evaluated in retrospect to measure their
effectiveness.
2. Initial Agile adoption
In the summer of 2007 a marketing initiative for a new mobile portal was proposed. The adage
'necessity is the mother of invention' applied to this project. The high profile and reduced
time-scales (twelve weeks rather than six months from initiation to go-live) meant that light-weight
technologies and Agile practices had to be used rather than the incumbent document-centric
waterfall processes. This was very much a tactical Agile adoption. Failure was not an option and
the focus was on effective delivery rather than best practice. The project was a high profile
success in a very short time scale. The delivery date was met, to the hour. The team had proved
that a number of Agile techniques were highly beneficial and instilled confidence throughout the
group that more comprehensive adoption was not only possible but desirable.
3. The post launch doubts and scaling up
After the initial euphoria of launch there were doubts expressed by senior members of the
management team. There were concerns that not all of the old practices should have been
discarded. Of particular concern was the lack of accessible documentation to allow maintenance
of the platform, especially if the development team was cycled. This resulted in pressure to revert
to some of the original document centric processes.
A victim of their own success, the team was now required to deliver more features to the same
high levels of quality in the same aggressive time-scales. Given increasing scope and fixed
delivery dates the only solution was to increase headcount. The team was grown and
reorganized into two separate groups with different functional responsibilities in the same
platform. The technical architect retained his position across the team.
The two groups were seated in different locations. Other than an hour long weekly team meeting
there was increasingly little interaction between the two groups. This led to the creation of silos
where developers in one group knew very little about what was happening in the other.
The new organization began to deliver quickly and was generally viewed as a success but
internally several disturbing issues were surfacing:
• Code style was diverging. It was very easy to see which team had produced any one
  piece of code as the styles were radically different.
• Common implementation patterns that had been very well understood were not being
  applied consistently. This led to a number of situations where the application, which had
  previously been predictable, was now behaving in an unpredictable and inconsistent fashion.
• Technical debt was increasing. A key development principle was that the code base
  should be under constant rationalization to remove duplication and redundancy and
  increase reuse and consistency. This principle allowed the team to build fast, prove or
  disprove a feature and then refactor to pay back the technical debt. A technical backlog
  was maintained and tasks were regularly executed from this backlog. The net effect
  should have been a reduction in lines of source but analysis showed that as quickly as
  code was being refined, new code was being created. Code was still being produced
  quickly but the debt was not being repaid. The team was moving from a proactive
  refactoring regime into constant firefighting mode with decreasing feature delivery.
• During this period the architect was made less effective by two critical factors:
  o The architect was on the critical path for code delivery on one of the projects with
    the same expected ideal engineering hours capacity as any developer.
  o The architect's remit was well understood but the mechanisms by which that remit
    would be achieved were not. This was one of the traits of the conventional
    ivory tower architect: responsibility with no clear mechanism of control.
4. Redefining the approach
The technical architect, who had been aware of these issues for several months but had been
unable to correct them, not least because of his own development commitments, determined that
the situation required immediate and fundamental intervention. At this point external events
required the teams to be reorganized into a number of different projects.
The technical architect took this opportunity to reorganize technical governance. The new regime
would allow scaling of development capacity through delegation and empowerment. To ensure
quality and consistency the regime would gather and analyse empirical evidence. The architect
determined that it was impossible to maintain the commitments of being a full time developer and
fulfill the technical architect's remit.
New techniques minimized the architect's isolation from the implementation, without the
unachievable requirement that the architect write key code modules.
The following sections describe some of the main techniques employed in this new technical
governance regime which kept the architect 'out of the ivory tower'.
4.1. Reflecting on code structure
During the initial, problematic, scaling up of the team, development patterns and priorities had
been communicated. These were not always followed. In the new approach, after communicating
the intent, the realization was evaluated for compliance. This took the form of detailed code
reviews during the second iteration of the new projects as the body of code began to increase.
On a clean workstation, using only the instructions on the wiki, the architect built a development
environment. This included configuring the Eclipse IDE and Maven as well as checking out code
and setting up development application server instances. This was an essential first test of the
stability and accessibility of the code base.
The architect used a combination of Eclipse's powerful code navigation tools and the acceptance
and unit tests to traverse the application. After identifying the classes involved in particular user
goals, a UML tool reverse engineered the code into a set of class diagrams. The architect used
Eclipse and the UML tool to determine the associations between the implementation classes and
their tests. The architect then examined the implementation of the unit tests and made brief
passes of the code to determine the responsibilities of each class. This exercise indicated
whether standard patterns and agreed libraries were in use. Importantly it articulated the class
cohesion and structure.
Issues were identified around encapsulation and cohesion. Anti-patterns in test classes which
indicated issues in the implementation were noted and verified. The exercise produced a list of
issues to be corrected. The architect annotated the code in several places with FIXME and
TODO and finally produced a UML class diagram with notes showing the class structure in use.
This was added to the wiki. The exercise formed the basis for several improvement points at the
next retrospective and allowed the architect to provide positive feedback on the implementation
based on real evidence.
This exercise had several positive outcomes:
• The architect's confidence that the team was following the correct and consistent set of
  patterns was firmly established. The architect also gained valuable familiarity with the
  code base. The issues that were spotted were easy to correct at this stage of the project.
  If left uncorrected they might well have spawned a large number of similar features which
  would have increased the technical debt.
• The developers' confidence was boosted. They were now sure that they were interpreting
  the development guidelines correctly and had been publicly credited as such. The next
  retrospective recorded that the developers regarded the code review as being one of the
  positive features of the sprint. The architect going through the code and leaving TODO
  annotations increased the sense of common code ownership.
• It made everybody more aware that the source was not an opaque artefact whose
  functionality was the only facet that would be observed.
One problem as the code base increased was choosing which part of the application to inspect.
One technique that proved effective was simply to conduct an exercise in the retrospective where
each developer named their most complex or cleverest code module. These modules became
candidates for detailed inspection.
This mechanism of code review did require a significant investment in time. These reviews were
only conducted a handful of times over several months. The prohibitive cost of these detailed
code reviews meant that more commonly developers were invited to use a whiteboard to talk the
architect and a number of their peers through the interactions and class structure of a particular
section of the code. The objective was much the same as the detailed code inspection but also
served to educate a wider audience. Whilst it was of comparable in cost to the project in man
hours the cost to any one individual, critically the architect was reduced. E.g. a white-board
session would require two hours preparation by a developer and then one only hour for
attendance from the architect, two other developers and the developer presenting. The cost is six
hours to the project but only one hour is taken from the architect’s diary. A more effective but
costlier code inspection might easily cost six hours of the architect's time.
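The arithmetic behind this comparison can be made explicit. The sketch below is purely illustrative: the class and method names are hypothetical, and the figures are the ones quoted above (two hours of preparation, a one hour session with four attendees including the presenter, and six hours of the architect's time for a detailed inspection).

```java
// Illustrative arithmetic only. Names are hypothetical; the figures come from the text.
final class ReviewCost {

    /** Total project hours for a white-board session: preparation plus one seat-hour per attendee. */
    static int whiteboardProjectHours(int preparationHours, int sessionHours, int attendees) {
        return preparationHours + sessionHours * attendees;
    }

    public static void main(String[] args) {
        int attendees = 4; // architect, two other developers and the presenting developer
        int projectHours = whiteboardProjectHours(2, 1, attendees);
        int architectHoursWhiteboard = 1; // the architect only attends the session
        int architectHoursInspection = 6; // figure quoted for a detailed code inspection

        System.out.println("White-board session, cost to project: " + projectHours + "h");
        System.out.println("White-board session, cost to architect: " + architectHoursWhiteboard + "h");
        System.out.println("Detailed inspection, cost to architect: " + architectHoursInspection + "h");
    }
}
```

The project pays roughly the same either way; what changes is whose diary the hours come from.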
White-board sessions were less useful than code inspection. They did not bring the architect and
other developers into close contact with the actual code. They often led to a level of abstraction
(consciously or not) being introduced by the presenter in order to communicate with the
audience. Occasionally it appeared to reduce the sense of common code ownership as one
individual became the recognized expert.
4.2. Providing direction and rigorous checkpoints
The architect created a Development Principles wiki page which was then presented to the team
in an interactive session. The principles were deliberately not generic points that could be
universally applied on any project. Instead, these principles were derived from the best working
practices used on the T-Mobile portal application. They were very specific and easy to apply. The
principles covered a wide range of topics from TDD and patterns for concurrency to policies
regarding the team’s attitude to broken builds. They also contained sections that related to
common functional requirements such as error handling.
Reviews and retrospectives supported the view that these principles had a positive effect,
encouraging a consistent approach.
In keeping with the approach of gathering empirical evidence over reliance on documentation,
the Development Principles were supported by an audit driven by an Architecture Checklist.
Like the Development Principles, the Architecture Checklist was developed specifically for this suite
of applications. The temptation to try and make a generic tool which could be widely reused was
resisted.
The audit of the system using the checklist was performed by the architect and technical lead
paired at a workstation. The code was checked out clean and the IDE and test framework were
used to check various points. In previous projects the author had experienced audit exercises
driven by a review of design documentation followed by an interview. This required overly time
consuming preparation, was stressful and less effective than inspecting the code and running
tests.
Since the checklist was developed for this application suite most points were pertinent. Each
check was phrased as a question where a given response would sometimes indicate that a more
detailed section was applicable. Not every question was answerable by a simple yes or no;
instead where appropriate the reviewers recorded a written answer. This formed part of the
documentation and most importantly stimulated deeper inspection.
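One way to picture such a checklist is as a small data structure in which a given response opens up a more detailed section and every answer can carry free text. The following is a minimal sketch under stated assumptions: the class names and the example questions are invented for illustration and are not taken from the checklist used on the project.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of an audit checklist. Class names and example questions
// are invented for illustration; they are not the project's actual checklist.
enum Response { YES, NO, WRITTEN }

final class Check {
    final String question;
    final Response opensDetailOn;                 // response that makes the detailed section applicable, or null
    final List<Check> detail = new ArrayList<>(); // the more detailed section
    Response response;                            // null until the reviewers answer
    String notes = "";                            // written answer recorded by the reviewers

    Check(String question, Response opensDetailOn) {
        this.question = question;
        this.opensDetailOn = opensDetailOn;
    }

    Check withDetail(Check followUp) {
        detail.add(followUp);
        return this;
    }

    void record(Response response, String notes) {
        this.response = response;
        this.notes = notes;
    }

    /** The detailed section applies only when the recorded response matches the trigger. */
    List<Check> applicableDetail() {
        if (opensDetailOn != null && opensDetailOn == response) {
            return detail;
        }
        return Collections.emptyList();
    }
}

final class ChecklistDemo {
    public static void main(String[] args) {
        Check integration = new Check("Does the module integrate with an external system?", Response.YES)
                .withDetail(new Check("How are failures reported to the operators?", null));

        integration.record(Response.YES, "Yes, one external system.");
        for (Check followUp : integration.applicableDetail()) {
            followUp.record(Response.WRITTEN, "Written answer recorded by the reviewers.");
            System.out.println(followUp.question + " -> " + followUp.notes);
        }
    }
}
```

Keeping the written answers alongside the questions is what lets a completed checklist double as documentation.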
The audit exposed the architect and technical lead (who may or may not have been a senior
developer depending on the project) to the code. It was not a foolproof tool and obviously did not
detect all errors. It did expose issues that would have been show-stoppers later in the application
lifecycle. E.g. an integration module was found to report a network connectivity error to the
operators in exactly the same way as an unexpected response from an external system. These
errors required different escalation routes (the former to the network team, the latter to the owner
of the integrated system). This was stipulated by the Development Principles but had been
missed during development. The audit picked up these sorts of issues which previously had only
been uncovered in UAT or production.
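The integration example can be sketched in code. This is a hypothetical illustration, not the project's implementation: the enum, the exception class and the reporter are invented names, and it assumes that connectivity problems surface as `java.io.IOException` while an unexpected reply is signalled by a dedicated exception type.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical sketch: all type names are invented for illustration.
enum EscalationRoute { NETWORK_TEAM, INTEGRATED_SYSTEM_OWNER }

/** Raised when the external system answers, but not in the way the caller expects. */
class UnexpectedResponseException extends Exception {
    UnexpectedResponseException(String message) {
        super(message);
    }
}

final class IntegrationErrorReporter {

    /** Maps a failure to the group that should be alerted, so the two cases are never reported alike. */
    static EscalationRoute routeFor(Exception failure) {
        if (failure instanceof IOException) {
            // Assumption: connectivity problems surface as IOException (e.g. ConnectException).
            return EscalationRoute.NETWORK_TEAM;
        }
        if (failure instanceof UnexpectedResponseException) {
            return EscalationRoute.INTEGRATED_SYSTEM_OWNER;
        }
        // Refuse to guess: an unclassified failure should be classified explicitly.
        throw new IllegalArgumentException("Unclassified failure: " + failure);
    }

    public static void main(String[] args) {
        System.out.println(routeFor(new ConnectException("Connection refused")));
        System.out.println(routeFor(new UnexpectedResponseException("Unknown status field")));
    }
}
```

A unit test asserting the route for each failure type would turn this principle into something the audit can check by running the tests rather than by reading documentation.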
The cost of the audit was high. It was expensive to develop and maintain and required significant
input from project members whose time was heavily in demand. Scheduling was difficult and
audits were often delayed, which increased risk to the project. The audits were enormously
beneficial and fully justified their high cost. They provided valuable empirical evidence and
reduced the architect's isolation from the implementation.
4.3. Structuring the application architecture to promote good governance
It was found that restructuring the application architecture was a contentious but effective tool to
improve technical governance.
The application was designed from inception with a clear modular structure with loose coupling.
Events had demonstrated that it was still possible to build components that violated
encapsulation by the corruption of shared services or simply by consuming all the CPU or
memory allocated.
When the new projects were initiated the assumption was that they would all be extending the
existing application. The architect determined that this made the technical governance more
difficult. Towards the end of the first iteration he proposed a departure from this architecture. The
monolith would be replaced by several discrete applications. Where the same code was required in
more than one platform this was moved into shared libraries (distributed and controlled via
Maven) which individual applications could branch if required.
This move had a significant positive effect. The teams were decoupled in the same way as their
applications. The silos that had been in effect previously were recreated but with clean interfaces
which could be easily policed. Developers now had freedom to innovate rather than a license to
interfere and disrupt.
Although their deployment workload had been increased, the operations team was supportive
because they were given better performance testing guarantees.
One of the failures late in 2008 had been caused by a presentation layer module consuming
unacceptable levels of CPU capacity. Since all modules ran in the same container it took over a
week of repeated tests to ascertain that the complex integration modules, obvious candidates for
extreme CPU use, were not at fault. The updated architecture allowed each component to be
load tested in isolation. This meant that issues were identified with fewer test cycles.
Simplification and encapsulation of the implementation directly led to more effective architectural
governance without imposing onerous processes. The initial emotional response to this
change was that it would be very costly. In retrospect, even though it was initiated in the second
iteration, it required only an additional ten days of development time. That effort was saved
many times over just by removing the requirement to regression test.
4.4. Key measurement tool: testing
The technical architect had always placed a high value on a test first strategy and the adoption of
TDD at T-Mobile was the subject of an Agile 2008 Experience Report [3]. All projects had a
reasonable unit test coverage (60% lowest to 85% highest). Unit tests were written by and were
mostly for the benefit of the developers.
A second class of tests, labelled acceptance tests, were closely aligned with the user goals of the
system and were intended to be developed in conjunction with the proxies for the business
stakeholders (organizational issues precluded the direct involvement of the stakeholders
themselves).
On some projects resource constraints meant that the acceptance tests were often created by
the developers without the involvement of other participants. Whilst these tests still had
significant value, an opportunity for the architect and proxy stakeholders to verify that the system
was fit for purpose as they understood it was missed. As the tests were developed solely by
engineers the technical complexity of the code inhibited comprehension by proxy stakeholders.
To mitigate the above issues the architect initiated the development of a new class of
tests, referred to as use case tests, which ran against the application fully deployed. These tests
were supported by a simple framework which aimed to make the tests themselves resemble the
language of the interface specifications, i.e. the tests were expressed in a language that the
stakeholder proxies could understand.
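The report does not describe the framework itself, so the following is only a sketch of the general idea: a small fluent layer that lets a test read like a sentence from a specification. Every name here (`PortalClient`, `UseCase`, the method names and the `/home` path) is hypothetical, and an in-memory stand-in replaces the deployed application so that the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the project's real framework and API are not described in this report.
interface PortalClient {
    int get(String path); // returns an HTTP-style status code
}

final class UseCase {
    private final String name;
    private final PortalClient client;
    private int lastStatus;

    private UseCase(String name, PortalClient client) {
        this.name = name;
        this.client = client;
    }

    static UseCase named(String name, PortalClient client) {
        return new UseCase(name, client);
    }

    UseCase whenSubscriberRequests(String path) {
        lastStatus = client.get(path);
        return this;
    }

    UseCase thenResponseStatusIs(int expected) {
        if (lastStatus != expected) {
            throw new AssertionError(name + ": expected status " + expected + " but was " + lastStatus);
        }
        return this;
    }
}

final class UseCaseDemo {
    public static void main(String[] args) {
        // In-memory stand-in; the tests described above ran against the fully deployed application.
        Map<String, Integer> routes = new HashMap<>();
        routes.put("/home", 200);
        PortalClient fake = path -> routes.getOrDefault(path, 404);

        UseCase.named("Subscriber views the home page", fake)
               .whenSubscriberRequests("/home")
               .thenResponseStatusIs(200);
        System.out.println("Use case passed");
    }
}
```

The design point is that the test body carries no plumbing: a stakeholder proxy can read the three chained calls without knowing how the request is made.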
These tests became a powerful tool for measuring completeness against functional
requirements. The construction of this test suite highlighted several areas where the
implementation had diverged from the published API. These tests also provided a seed into the
creation of load tests (using JMeter).
The technical architect helped construct and run these tests rather than relying on a report from
others. This practice was prioritized as another mechanism to reduce the distance between the
architect and the implementation [1].
Definition of load testing profiles (i.e. agreeing what constituted 100% load), detailed review and
coordination of load test execution were key responsibilities of the architect. The cost of these
activities was extremely high but entirely justified by the direct exposure it gave the architect to
the non-functional aspects of the system. These aspects are fundamental in delivering the
architectural remit. It was found that the architect and technical lead would be forced to
concentrate solely on load testing for long periods. This came at the cost of ignoring the
demands of the other projects during these times. It was a significant issue if these load tests
occurred at the same time as other critical activities (such as retrospectives or sprint planning) for
other projects.
JMeter and use case test execution gave the technical architect a high level of confidence that
the applications were fit for purpose based on empirical evidence and real experience rather than
documentation.
4.5. Documentation that is 'good enough'
In keeping with Agile values, the technical architect was determined not to expend valuable effort
producing documentation with no clear purpose when that effort could be better used to bring
delivery closer. At the same time the architect wanted to ensure that where documentation was
genuinely required it was fit for purpose.
One project delivered a set of web services. A comprehensive, example based, API specification
was produced. This document was as formal as any document delivered by waterfall projects at
the client. This document was identified as being critical to success and therefore justified its high
cost in man hours to write and review.
Previously a design document for each module had been mandated. This rule was discarded.
Instead, key areas were identified by the architect or the team as being important or complex
enough to justify some form of design review and capture. White board sessions were led by the
appropriate developer.
These were captured using digital cameras and uploaded to the wiki along with a brief summary
of any conclusions or activities to complete. Where an area was identified as particularly
important the architect had formal UML diagram production added to the sprint backlog. This
ensured key documentation was completed, its cost was visible and that cost was not absorbed
into the development activity. These documentation tasks had specific goals, e.g. communicate
the lifecycle of an object through a state diagram such that the design can be verified against use
cases. The UML diagrams were held in a single, source controlled, highly accessible UML
repository.
Given the client's history of a document centric process the new approach to documentation was
always going to be contentious. This was especially true when the development process had
some sort of interaction with the wider organization. A security audit was performed several
months into one project's life by an external group who had been informed that they were dealing
with an Agile team. Due to some inter-programme communication issues, they were only
supplied with a couple of PowerPoint slides. This met all their preconceptions of Agile.
The auditors were surprised when the architect was able to
supply on demand a number of succinct and appropriate documents. These were generated from
the UML repository or copied from the wiki but were exported into a company standard document
repository to comply with versioning and accessibility rules. This demonstrated to the security
auditors that the project was as rigorous as any of its waterfall peers.
4.6. Delegation
There is always going to be a point where it is impossible to achieve any more development
throughput without adding more people to the equation. It is the familiar pattern of horizontal
scalability always outperforming vertical scalability at some point.
Agile projects empower developers. Empowerment requires delegation. To be able to delegate
tasks you need to have a team which is fit for purpose [4]. As part of the client's rigorous
selection procedure the architect performed a technical assessment of all new joiners with a
development remit. In the course of his career the architect had observed many interviews being
concluded using emotional rather than empirical methods.
The technical architect developed a set of case study driven interviews customized for the
T-Mobile project. This meant that the candidate could be exercised using the project's working
practices and technologies. Interviewees were expected to run white board design sessions
based on common problems the project faced or use TDD using Maven and Eclipse. The
interviews were designed to give candidates a vehicle to demonstrate their abilities rather than
trying to trip them up.
It was the architect's opinion that this was a critical factor in assembling a strong team whose
practical ability had been proven before they started the project. This enabled a high degree of
immediate delegation and empowerment. The technical architect was keen to allow developers
to take the lead in producing the solution with minimal guidance. This allowed best practices to
be developed by individuals and then adopted across the teams.
Ivory tower architects often concentrate on technology rather than people. This stops them
delegating and therefore impedes the development scalability.
5. Conclusions
• An architect must reduce their isolation from the implementation by being closely involved
  with high value technical activities such as load testing and code reviews.
• An architect can become isolated from parts of the system if they cannot find time to
  cover all areas because they are attempting to also be a full time developer.
• Relying solely on documentation as a tool for technical governance is not an effective
  strategy but there are documents (which may be on the wiki or in the UML repository)
  which are essential to the architect.
• Learning to employ ‘soft’ skills becomes as important as technical acumen because
  without excellent communication and effective delegation you cannot scale up.
• Automated, well written tests are the architect’s best mechanism to gather empirical
  evidence of compliance with governance and fitness for purpose.
• The best techniques cannot be applied every time because of cost. Choose the most
  important areas for the high cost activities and use less effective but cheaper methods for
  others.
• Activities that made the developers aware that the architect was observing the source
  increased the perception of common code ownership and invigorated developers to
  maintain high levels of code quality.
6. References
[1] V. Hazrati, The Shiny New Agile Architect, 2008
[2] J. McGovern, A Practical Guide to Enterprise Architecture, 2003
[3] A. Rendell, Pragmatic and effective Test Driven Development, 2008
[4] S. McConnell, Rapid Development, 1996