IT Architecture Automatic Verification
A Network Evidence-based Approach
António Alegria
Instituto Superior Técnico, Universidade Técnica de Lisboa
PT Comunicações
Lisbon, Portugal
antonio-m-alegria@telecom.pt

André Vasconcelos
Center for Organizational Design and Engineering (CODE)
INESC Inovação and Instituto Superior Técnico
Instituto Superior Técnico
andre.vasconcelos@ist.utl.pt
Abstract - Ensuring constant synchronicity between the IT Architecture (ITA) and the actual Information Systems (IS) without the help of automatic tools is an intractable task, especially given the rapid evolution, growing complexity and distributed nature of modern ISs. We propose an automatic AS-IS ITA verification methodology and framework based on deep passive network traffic analysis and logical inference rules, with the goal of inferring relevant facts about the actual ITA. The resulting knowledge is described according to a conceptual model designed for this purpose. We also propose an organization-independent mapping relationship between that ITA network evidence model and an ISA modeling framework (CEO Framework), at the technology (ITA) level, realized through a set of logical deduction rules. These rules formally define the conditions that must hold between the inferred evidence and a high-level ITA model (both represented in this inference system) in order to declare that model factual and in line with reality. The automatic execution of these rules realizes the verification process, which reports all the significant detected discrepancies as its result. The proposed concepts and methodology are implemented in a prototype applied to a case study in the leading Portuguese Telecom operator. The proposed solution was shown to be capable of successfully verifying the case study’s ITA model as well as discovering new, undocumented information through logical inference.

Keywords - Network Traffic; Passive Monitoring; Deep Packet Analysis; Logical Inference; Information Systems; IT Architecture

I. INTRODUCTION

A. Motivation

Modern phenomena such as globalization, the merging of business and IT, the emergence of new technologies and the introduction of new business models and regulations occur at an ever increasing pace, demanding a swift adaptation of modern organizations and their Information Systems (ISs). Enterprise Architecture (EA), of which the Information System Architecture (ISA) is an integral part, is considered a critical instrument to address this need [1].

Enterprise Architecture – and the ISA and its IT Architecture (ITA) layer – is an ongoing process that should be kept in sync with developments both in the external business environment and inside the organization, including both its strategy and operational processes [2].

According to Vasconcelos [3], the construction and maintenance of an ISA is fundamental to the proper development of technology’s full potential in supporting business requirements. Without an ISA it is impossible to plan, analyze, discuss, decide, build (successfully) – and also measure and control – what cannot be specified or represented.

The ISA architectural level should be the map that drives a methodical, orderly and business-oriented technological growth in organizations. Considering the ISA as the set of design artifacts relevant for the description of Information Systems, so that it is possible to produce it in accordance with the requirements and maintain it during its lifetime [4], it becomes clear that its importance depends on its consistency with the actual reality of the ISs it describes. As such, it is essential to the usefulness of the ISA that its construction, maintenance and evolution are kept in sync with the evolution of the IS.

B. Problem Statement

Ensuring constant synchronicity between the ISA and the actual IS without the help of automatic tools is an intractable task, especially given the rapid evolution, growing complexity and distribution of modern Information Systems [5]. Common approaches to ISA planning ([6], [7]) specify an AS-IS ISA and ITA elicitation step but do not propose any automatic methods or approaches to help assure the actual correctness of the resulting documentation. It is therefore necessary to establish a process encompassing continuous IS monitoring and automatic ISA verification in order to detect discrepancies and improve the ISA’s maintenance and evolution. Furthermore, this process needs to be integrated into a holistic IS and ISA planning and construction process.

This research argues that it is possible to take advantage of the fact that most currently relevant Information Systems “live” and “cooperate” in networked environments and communicate through several, predominantly IP-based, application protocols which we can capture and analyze, in real time, through technology common to network security experts.

Therefore, the main question framing this research is: “How to automatically verify if an IT Architecture model specifies the actual reality of production Information Systems, through the analysis of passively captured network traffic generated and consumed by these systems?”.
C. Research Goals and Scope

The central theme of this paper is the possibility of matching an ISA model and the network traffic actually produced by the involved ISs. We explore and establish a relationship between conventional ISA concepts (at the Technology or IT Architecture level) and the information that can be automatically inferred by correlating all the facts obtained through network traffic capture, deep inspection and analysis. We apply this relationship with the goal of investigating the practicality of automatically verifying the reality of an ITA model by confronting it with information inferred from actual network traffic generated by production ISs.

The scope of this work is restricted to the technology part of the ISA – the ITA. Nevertheless, this subject was always approached as an integral part of the ISA and the EA. Furthermore, we restricted ourselves to passive TCP/IP network traffic capture and analysis as the data source. In addition, the monitored network traffic is assumed to be unencrypted; otherwise, the encryption keys would need to be made available.

II. STATE OF THE ART RESEARCH

As part of this research we studied the area of Information Systems and Information Systems Architecture modeling, from which we analyzed and compared four different modeling frameworks that include an IT architecture level. Moreover, we investigated several techniques used in the automated and online discovery of information about networked Information Systems and their actual behavior and low-level technology (IT) architecture.

A. Information Systems Architecture Modeling Frameworks

Based on the state of the art review on the subject of ISA modeling we analyzed and compared four different frameworks: CEO Framework (CEOF2007) [3], ArchiMate [8], RM-ODP [9] and TOGAF [7]. Given our research goal of establishing a relationship between an ITA model and the network traffic generated by the targeted ISs, this analysis focused on the technology level and involved criteria related to: support for proper alignment between the ITA and other ISA architecture layers; level of support for several important ITA concepts related to IT services and their network interfaces; and the level of support for modeling notation and formal specification.

Our assessment of all frameworks’ features, virtues and limitations, as well as our collaboration with CODE¹, led us to decide on the CEOF2007 as the modeling language and framework to set the domain of discourse when referring to ISA and its modeling.

¹ CODE – Center for Organizational Design and Engineering, INESC INOV

Firstly, the CEOF2007 ensures proper alignment between the ITA level and all other ISA architecture levels and, because it formally defines its primitives and concepts and their actual notation as a UML profile, it is easy to extend and to offer different views on its models, according to the stakeholder. In addition, this rigorous specification of its metamodel based on UML makes it easy to extend and manipulate.

The CEOF2007 is comprehensive and flexible with regard to the modeling of infrastructure, software components, execution environments and IT services. However, this framework does not specify a way to properly model network connections, IT services’ network interfaces and how the ISs interact and expose their services through these interfaces. Moreover, the CEOF2007 does not define attributes relating to the actual low-level naming of architectural components (e.g. infrastructure nodes’ host names, software components and IT Services’ low-level names). Since the ISA model is a high-level document, the names given to its components are commonly descriptive and diverge from the actual concrete names used at a technological level.

In accordance with the established evaluation principles and in comparison with ArchiMate, TOGAF and RM-ODP, we consider CEOF2007 to be the most appropriate to the purposes of this research, provided that some of the previously described limitations are addressed.

B. Traffic Analysis in Enterprise Networks

In order to collect evidence about the actual ITA from an Enterprise Network, we analyzed and compared different automatic and online network and application discovery techniques: Agent-based (including log analysis), Active Analysis with remote access Credentials, Active Network Probing and Passive Network Monitoring [10], [11].

In terms of cost and impact of deployment and operation on current production infrastructure, passive monitoring methods have been shown to require less effort, cost and time-to-deployment. Passive techniques are, by definition, completely transparent with regard to the network traffic and other systems’ resources and, therefore, are the least intrusive approach to enterprise IS monitoring.

Despite not offering the highest level of detail, passive approaches can become nearly ubiquitous and are able to observe the actual interactions between Information Systems. Passive approaches offer a broad, real-time and up-to-date view of the overall usage profile of IT services and the actual relationships between Information Systems while also being able to capture and analyze actual application-layer information.

In conclusion, due to its ease of deployment, broadness of coverage, real-time capabilities and the possibility of tapping into actual application-level interactions and data, we consider passive monitoring to have the greatest potential and usefulness in large organizations. Therefore, one of our main aims is to push passive monitoring as far as possible in the ITA verification domain.

III. A SOLUTION FOR AUTOMATIC ITA VERIFICATION

We approach the problem stated in section I.B by identifying and addressing the importance of integrating and framing the solution into an existing ISA planning process. We chose to extend the process proposed by Vasconcelos, Sousa & Tribolet in [12] by introducing an explicit monitoring and verification step which, continuously and over the ISA’s
lifecycle, checks if the current (AS-IS) ISA is actually consistent with what is observed in the current Information Systems’ interactions over an organization’s network. Figure 1 presents the resulting process, with our extensions drawn in a lighter gray tone.

Figure 1. Proposed ISA Planning, Building and Maintenance Process (modeled using CEOF2007 UML profile [13])

Considering the continuous and cyclic nature of ISs and their ISA’s development [4], and the necessity to cover two distinct verification situations – verifying the AS-IS ISA and the TO-BE ISA, following its implementation – we also introduced another higher-level loop. This cycle takes advantage of the resulting ISs – after the TO-BE ISA’s implementation – as inputs to the AS-IS ISA updating and maintenance process. This way, we can verify in a single, unified verification step both the reality of the AS-IS ISA model and whether the implicit and explicit expectations in the TO-BE ISA model were accomplished in its implementation.

Figure 2. Simplified model of the Automatic ITA Monitoring and Verification Process (modeled using CEOF2007 UML profile [13])

Nevertheless, the main contribution and focus of this research is the actual ISA verification step («Verify ISA’s Reality»), shown in further detail in Figure 2. This step applies a set of comprehensive verification rules that map between an ISA modeling language (such as the CEOF2007) and the manifestations of the ISs and their ITA evidenced in their generated network traffic. These manifestations can be detected and inferred through deep passive traffic analysis as well as logical inference techniques.

These rules effectively define the conditions that must hold in order to consider an ISA consistent with the actual reality inferred from the network traffic generated by the organization’s ISs.

The inputs of this verification process are the actual ISs, their expected AS-IS ISA model and the verification rules, which map between the domains of the ISA modeling language and an abstraction over the network traffic generated by the targeted ISs. That abstraction is defined by a conceptual model named Netfacts, described in section III.B. We tested this process using an extended CEOF2007 as the ISA modeling language (described in section III.C).

Section III.A describes the «Monitor Network Traffic» step which leverages state of the art deep packet inspection techniques and logical inference to passively and continuously capture and analyze an organization’s network traffic in order to detect and infer evidence of the actual ISs’ technology architecture.

Section III.D describes the «Mapping Rules» that realize the «Verify If Rules Hold» step.

The verification process’ results address the outcome of each test, allowing the architect to assess discrepancies between the AS-IS ISA model and the deployed ISs. This output can be used as a resource in the AS-IS ISA updating and maintenance process, if any discrepancies are detected.
Nonetheless, this update process is out of the scope of this research.

A. Network Traffic Monitoring

At present, ISs predominantly cooperate and interoperate through TCP/IP networks [14]. The authors argue that if we are able to observe the network traffic that involves these interactions and successfully interpret it, it is possible to establish a sort of digital ethnography method whereby we are able to reconstruct the ITA automatically and in real time.

By taking into account the structure of network traffic that realizes the ISs’ interactions and their supplied IT services’ usage, we systematize passive network analysis techniques in increasingly detailed layers. Raw network traffic constantly captured through passive monitoring can be subject to inspection and analysis in three levels: Sub-Application-layer Inspection; Superficial Inspection of Application-layer Content; and Deep Interpretation of Application-layer Content.

These levels constitute layers of traffic analysis in which information inferred in one layer can be used as the basis for the analysis accomplished in the upper layers.

1) Sub-Application-layer Inspection: All the packets that constitute a network flow contain, in their TCP/IP headers, information that characterizes the communication at a basic but crucial level. By inspecting and analyzing the configuration flags and data carried in the IP and TCP headers it is possible to infer significant information about the participating systems.

These kinds of techniques enable inferring information about addresses (network and transport) and operating systems at both ends of network flows. Furthermore, they make it possible to reassemble and correlate network flows, allowing proper examination of their content and temporal and spatial analysis of communications between systems [15], [16], [17], [18].

Figure 3. Conceptual (Entity-Relationship) Model of actual ITA evidence detected and inferred from Network Traffic

In sum, this layer offers the possibility of inferring the relationship graph between infrastructure elements in a computationally efficient way, without the need to inspect all traffic but only its network and transport-layer headers.

2) Superficial Inspection of Application-layer Content: Having the application-layer content stripped and reconstructed in the previous level, this level analyzes application-layer content by confronting it with a set of patterns (e.g. regular expressions) that, if matched, identify the application-layer protocol in use. It is possible to discover other kinds of information explicitly announced by the protocol, such as what software components support each end of communication flows. These patterns are commonly referred to as signatures or fingerprints because they serve as unique identifiers for these network traffic elements [19].

In sum, these techniques add to the previous level’s infrastructure relationship graph information about application-layer protocols (levels 5 to 7 of the OSI model [20]) and software components at each end of communication flows.

3) Deep Application-layer Interpretation: Although it is possible to develop a set of signatures that can classify and extract diverse information carried in network traffic, these kinds of techniques are practically restricted to identifying protocols and are not as flexible or useful in the extraction and inference of other information carried in network traffic’s application-layer payload, due to the inherent limitations of regular expressions. According to Noam Chomsky [21] it is impossible to describe a higher-level language with a lower-level one. Regular expressions define regular languages – the lowest level of Chomsky’s language hierarchy – while most
languages and protocols used at an application level of communication are mostly context-free languages (e.g. XML, SOAP and SQL are context-free languages).

Therefore, in order to gain insight into what the ISs are actually doing on the network, there needs to be a higher-level analysis whereby, after the application-layer traffic payload is extracted and reassembled in the first level and classified in the second level, it is possible to forward it to specialized interpreters, one per application-layer protocol. These interpreters are capable of decoding and understanding conversations between ISs while they use each other’s IT services, in the same way applications decode the traffic sent between them.

Exactly what information can be inferred at this level is extremely dependent on the specific application-layer protocols and the information explicitly declared in the interactions between ISs. Nevertheless, it is possible in many cases to extract and infer concrete names of infrastructure nodes, software components and IT services and operations, including used parameters, as well as user login names and low-level information entities (e.g. database schema, remote file system structure) [22], [23], [24].

B. Netfacts Model

In order to manipulate and map all the evidence inferred through the previously described techniques to a higher-level ISA model we need to establish a conceptual model that defines and relates all inferred facts. We propose a generic model that frames and relates all the different kinds of ITA evidence we can infer from the previously described passive network traffic analysis methods. This model, named Netfacts, is designed with the main goal of being generic and independent from any ISA modeling framework and any traffic analysis technique or tool.

The main purpose of the Netfacts model (presented in Figure 3) is to be a reference framework to describe, store and manipulate all the facts pertaining to the manifestations of ISs on their generated network traffic and to enable mapping this low-level evidence to any ISA modeling framework, at a technology (ITA) level.

Netfacts is a conceptual model that specifies simple entities describing facts about network communications between information systems. These communications are embodied in «Network Flows», the central and mediating entity that represents a single coherent communication session between two TCP/IP endpoints. Communications between infrastructure nodes («Network Host») are made through «Network Flows» over a set of «Application Layer Protocols». On each end of a «Network Flow» it is possible to detect the existence of participating «Software Components» as well as the utilization of IT «Services» and «Operations», including the specific «Operation Parameters» that were used.

This model supports the indication of what service types are supported or supplied by a «Service» or «Software Component», in line with TOGAF’s service taxonomy defined in its Technical Reference Model (TRM) [7].

Netfacts is described and specified in detail in [25], including its entities, attributes and associations.

C. Extending the CEO Framework

The assessment made of the CEOF2007 (section II.A) identified some key limitations which prevented it from being effectively mapped to the actual ITA evidence inferred from passive network analysis, described through the Netfacts model (section III.B). We addressed these limitations by extending the CEOF2007 metamodel (at the M2 level) through the addition of some new primitives that reify and formalize concepts such as the «Operating System», the «Network Connection» and «Network Service Ports», which formally specify IT Services’ network interfaces. Furthermore, we added new attributes to existing primitives, namely the «concrete name» and «version». The full specification and detailed rationale of these extensions to the CEOF2007 are available in [26].

From this point on we refer to the extended framework as CEOF2007+.

D. Mapping Rules Between Netfacts and CEOF2007+

In order to define a mapping association between the Netfacts model and the CEOF2007+ meta-model, we propose a comprehensive set of mapping rules specified using a subset of first-order logic (Horn clauses, as used in Prolog [27]). These rules prescribe the set of criteria used to check if an ITA model is factually aligned with the reality described by the facts structured according to the Netfacts model. Verifying if these rules hold establishes the actual ITA verification process.

The following two formulas show a small example of such rules, assuming the predicates defining the domain of discourse have been previously defined (e.g. Name(x, n) means that n is a concrete name of x).

MapIPBToSwComponent(x, y) ≡
    ∀x ∃i ∃n ∃v ∃t (IIP(x) ∧ BaseNetConnIp(x, i) ∧ Name(x, n) ∧ Version(x, v) ∧ ServiceType(x, t)
    → ∃s (SwComponent(y) ∧ IpSwComponent(i, y) ∧ Name(y, n) ∧ Version(y, v) ∧ ServiceType(y, t)))    (1)

MapIIBItSvcUsageToNetFlow(x, y, z) ≡
    ∀x ∀y ∃i ∃w (IAPB(x) ∧ ItSvc(y) ∧ Uses(x, y) ∧ BaseNetConnIp(x, i) ∧ NSP(y, w)
    → ∃z (NetFlow(z) ∧ MapNSPToNetFlow(w, z) ∧ SrcIp(z, i)))    (2)

Rule (1) defines a mapping between an «IT Platform Block» (CEOF2007+) and a «Software Component» (Netfacts) through direct matching between analogous attributes. It reads as: “If there is an «IT Platform Block» whose supporting «IT Infrastructure Block» maps to a «Network Host» through one of its «Network Connections» then there must be a «Software Component» with matching attributes detected in that «Network Host» for at least one «Network Flow»”.

Rule (2) defines the mapping between an «IT Service» usage relationship and its corresponding «Network Flow». It reads as: “If there is an «IT Service» (ITS) used by any «IT Block» (ITB) then there must be at least one detected «Network Flow» whose source «Network Host» maps to any of the ITB’s «Network Connections» and whose destination «Network Host», «Transport Port» and «Application Layer Protocol» all map to at least one of the ITS’s «Network Service Ports»”.

All the implemented rules and their domain of discourse specification are available in [28].
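As an illustration only, the reading of rule (2) given above can be sketched in Python. This is a minimal sketch under stated assumptions, not the rule set of [28]: the class names `NetFlow` and `ServicePort`, the helper `nsp_matches_flow`, the function `verify_service_usage` and all sample addresses are hypothetical, and the rules themselves are specified as Horn clauses rather than Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetFlow:
    """Hypothetical Netfacts-style evidence: one observed network flow."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class ServicePort:
    """Hypothetical model-side «Network Service Port» of an «IT Service»."""
    ip: str
    port: int
    protocol: str


def nsp_matches_flow(nsp: ServicePort, flow: NetFlow) -> bool:
    """Analogue of MapNSPToNetFlow(w, z): destination host, port and protocol agree."""
    return (nsp.ip, nsp.port, nsp.protocol) == (flow.dst_ip, flow.dst_port, flow.protocol)


def verify_service_usage(block_ips, service_ports, flows):
    """Return the flows that evidence 'block uses service' (empty list: no evidence).

    block_ips     -- IP addresses of the IT Block's network connections
    service_ports -- ServicePort objects declared for the IT Service
    flows         -- NetFlow facts inferred from captured traffic
    """
    return [
        flow
        for flow in flows
        if flow.src_ip in block_ips
        and any(nsp_matches_flow(nsp, flow) for nsp in service_ports)
    ]


# Illustrative data only (addresses, ports and protocols are made up).
flows = [
    NetFlow("10.0.0.5", "10.0.1.20", 1521, "TNS"),
    NetFlow("10.0.0.9", "10.0.1.20", 80, "HTTP"),
]
billing_block_ips = {"10.0.0.5"}
customer_db_ports = [ServicePort("10.0.1.20", 1521, "TNS")]

evidence = verify_service_usage(billing_block_ips, customer_db_ports, flows)
print("pass" if evidence else "no evidence", evidence)
```

Returning the matching flows rather than a bare boolean keeps the supporting evidence available to the architect. How an empty result should be reported (as a failure or as an inconclusive test) is a policy choice this sketch leaves open.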
IV. TECHNICAL IMPLEMENTATION OF THE PROPOSED SOLUTION

In order to validate the previously described proposal we developed a proof-of-concept tool. This tool implements the ITA monitoring and verification process detailed in section III and presented in Figure 2. This prototype is generic enough to be applied in any organization as long as all the restrictions listed in section I.C are satisfied.

The prototype’s architecture is made of two main components with distinct concerns and responsibilities and shows how to structure an actual ITA model verification tool:

• Network Traffic Monitoring and Analysis engine (NTMA) – implements the «Monitor Network Traffic» process and is responsible for passively analyzing (previously captured) network traffic and producing facts relating to evidence about the actual ITA, described in accordance with the Netfacts model («Netfacts Instantiation»).

• ITA Inference and Verification Engine (IIVE) – responsible for manipulating all the facts inferred by the NTMA, enabling their exploration and the discovery of new information as well as the execution of the ITA verification tests («Verify If Rules Hold»). These tests simply apply the mapping and verification rules explained in section III.D in order to reach a conclusion about the reality of the given ITA model.

These two components are described in fuller detail in the following sections.

A. Network Traffic Monitoring and Analysis engine (NTMA)

The NTMA is composed of four main independent traffic analyzers that, together, infer evidence and produce the facts described by the Netfacts model. These analyzers operate at different levels of the traffic analysis hierarchy defined in section III.A.

1) Sub-Application-layer Inspection: At the Sub-Application-layer Inspection level, the NTMA has two subcomponents, built on top of Open Source tools, that manage and coordinate their execution and parse and interpret the generated output in order to produce information conforming to the Netfacts model.

One of these subcomponents uses IPAudit 1.0 [17] to infer and identify all «Network Flows», «Network Hosts» and used «Transport Ports», including statistics about temporal (beginning and end timestamps) and size (number of bytes and packets sent and received) dimensions of «Network Flows».

The other subcomponent uses p0f 2.0.8 [18] updated with signatures from the PRADS project² to infer and identify «Operating Systems» used by each end of the «Network Flows». We also developed a simple mechanism to classify the operating system’s family (e.g. Windows, UNIX, and Mac OS).

² PRADS – http://gamelinux.github.com/prads

2) Superficial Application-layer Content Inspection: At the Superficial Application-layer Content Inspection level, the NTMA has one subcomponent based on two variations of the PADS 1.2 [19] software to infer and identify «Application Layer Protocols» used in each «Network Flow» and «Software Components» participating on each end of these flows.

The two variations of PADS are each responsible for analyzing different segments of network flows: traffic coming from the flow's source and from the flow's destination. Both use different sets of signatures whose format was extended in this research to support the explicit specification of «Software Components»' service types, according to TOGAF’s TRM service taxonomy [7].

In the case study’s research context (see section V) several important application-layer protocol signatures were developed, such as signatures for Tuxedo, Tibco Rendezvous, SOAP, HTTP, Oracle Database (TNS) and Microsoft SQL Server (TDS). These are all common protocols making up a considerable part of traffic in corporate networks.

3) Deep Interpretation of Application-layer Content: The deep interpretation component is responsible for analyzing traffic at the highest level of the traffic analysis hierarchy. This component is made of:

• one capture and Superficial Application-layer Content Inspection-level traffic analyzer – preprocesses and classifies network traffic and forwards it to specialized interpreters, depending on the detected application-layer protocol;

• three deep interpretation components, each specialized in different application-layer protocol stacks: HTTP/SOAP, SQL and Oracle TNS.

The HTTP/SOAP interpreter uses an HTTP parsing library [29] and, if it detects that the HTTP message’s body is SOAP, it forwards it to another specialized SOAP envelope interpreter. This interpreter infers what IT services and operations are being called, with which parameters (name and type) and by whom. On the other hand, the HTTP part of this component is able to infer information such as concrete names associated with server «Network Hosts» as well as client and server «Software Components» involved in these interactions.

The SQL interpreter parses SQL queries, inferring information about the used databases’ schema such as used tables and columns as well as database and data services’ concrete names, effectively discovering data pertaining to the information architecture.

The TNS interpreter specializes in parsing TNS service request messages used by clients of a data service provided by a database hosted on the Oracle Database DBMS [30], [31]. By analyzing these messages, this component is able to infer information about the data services’ concrete names, user names, concrete names associated with the client and server «Network Hosts» as well as «Software Components» at both ends of the communication flow.
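To make the HTTP/SOAP interpretation step more concrete, the following Python sketch shows one way such an interpreter could work on an already reassembled HTTP request. It is a minimal sketch under stated assumptions and not the prototype's code: it uses only the Python standard library rather than the HTTP parsing library of [29], the function name `interpret_http_soap` and the keys of the returned dictionary are hypothetical, and it extracts parameter names only (not their types).

```python
import xml.etree.ElementTree as ET

# Namespaces of the SOAP 1.1 and SOAP 1.2 envelopes.
SOAP_NAMESPACES = {
    "http://schemas.xmlsoap.org/soap/envelope/",
    "http://www.w3.org/2003/05/soap-envelope",
}


def split_tag(tag):
    """Split an ElementTree tag '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def interpret_http_soap(raw_request: bytes) -> dict:
    """Infer host, client software, SOAP operation and parameter names."""
    head, _, body = raw_request.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")[1:]  # skip request line
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    facts = {
        "server_name": headers.get("host"),
        "client_software": headers.get("user-agent"),
        "operation": None,
        "parameters": [],
    }
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return facts  # body is empty or not XML: HTTP-level facts only

    namespace, local = split_tag(root.tag)
    if local != "Envelope" or namespace not in SOAP_NAMESPACES:
        return facts  # XML, but not a SOAP envelope

    soap_body = root.find("{%s}Body" % namespace)
    if soap_body is None or len(soap_body) == 0:
        return facts
    call = soap_body[0]  # first child of Body is taken as the invoked operation
    facts["operation"] = split_tag(call.tag)[1]
    facts["parameters"] = [split_tag(param.tag)[1] for param in call]
    return facts


# Made-up request for illustration; host and operation names are hypothetical.
request = (
    b"POST /services/customer HTTP/1.1\r\n"
    b"Host: crm.example.internal\r\n"
    b"User-Agent: ExampleClient/1.0\r\n"
    b"Content-Type: text/xml\r\n"
    b"\r\n"
    b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    b'<soap:Body><GetCustomer xmlns="urn:example"><customerId>42</customerId>'
    b"</GetCustomer></soap:Body></soap:Envelope>"
)
facts = interpret_http_soap(request)
print(facts)
```

Treating the first child of the SOAP Body as the operation is a simplification that fits RPC-style messages; document-style services may need the SOAPAction header or the service description to name the operation.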
4) Integrating all Traffic Analyzers: After all captured network traffic is properly analyzed and distilled, all inferred facts about the actual ITA (conforming to the Netfacts model) are integrated into the same «Netfacts Instantiation» knowledge base (see ) by correlating them by network flow (IP addresses and transport ports) and temporal approximation. Afterwards, this data is converted to Prolog facts and written into an output file to be read by our ITA Inference and Verification Engine.

B. ITA Inference and Verification Engine (IIVE)

The ITA Inference and Verification Engine (IIVE) is one of our prototype’s main components and is responsible for the manipulation of all Netfacts-conforming facts inferred and produced by the NTMA and for the automatic verification of an ITA model by checking if the mapping rules hold between that model and those facts.

IIVE was developed with the Logtalk object-oriented logic programming language and runtime environment [32] supported by the SWI-Prolog implementation [33]. The choice of language is justified by Prolog’s automatic inference features and semantic proximity to first-order logic (used in the mapping and verification rules specification) and by Logtalk’s object orientation, which allows an easier handling of the architecture primitives that usually compose an ISA modeling framework such as the CEOF2007+, mapping its meta-model to a class hierarchy that can be instantiated to describe the actual ITA model.

This component’s architecture is loosely inspired by classic Expert Systems ([34], [27], [35]), being composed of four main subcomponents: Inference Engine; Working Storage; Knowledge Base and User Interface.

Logtalk classes. These facts serve as the domain for all mapping rules in the Knowledge Base.

3) Knowledge Base: The Knowledge Base is made up of all the mapping and verification rules mentioned in section III.D, which encode all the domain knowledge that establishes a mapping relationship between an ITA model in a particular ISA modeling language (CEOF2007+ in this case) and the evidence of the actual ITA’s reality inferred through the passive capture and analysis of network traffic generated by the organization’s ISs (Netfacts model). By checking if these rules hold, for a given working storage, we are in fact verifying if the ITA’s model is consistent with the evidence inferred from actual network traffic, generated by current production systems. This verification is done automatically by the Inference Engine.

4) User Interface: The IIVE’s user interface component is based on the command line interface supplied by the Logtalk/Prolog environment in use. This command line supplies a way to query the Working Storage and apply the knowledge in the Knowledge Base to the whole ITA model, executing the verification test suite. Furthermore, we implemented the generation of verification reports that describe the verification process including each test’s description (e.g. what attributes and/or relationships are being checked), results (e.g. pass, fail or unknown) and examples of facts used to reach a conclusion about a specific verification step. These reports allow the architect to easily view all the detected discrepancies as well as all the confirmed architecture elements and those that were not possible to confirm or refute. Additionally, the architect is informed of the facts that served as evidence to reach a conclusion about a particular architecture element.
1) Inference Engine: The Inference Engine is the IIVE’s
“brain”, supplying the mechanisms to execute all the rules that
compose the knowledge base and enabling the exploration of
all the facts that compose the ITA description (CEOF2007+) as
well as all the facts that serve as evidence for the real ITA’s
manifestations on the captured network traffic (Netfacts) and
inferred through passive capture and analysis (NTMA).
The used inference engine is based on the Logtalk/SWI-
Prolog runtime environment which, by itself, offers a simple,
but sufficiently capable, inference engine for this prototype’s
purposes, especially taking into account the semantic proximity
between Prolog/Logtalk code and the first-order-logic
specification of the knowledge incorporated in the mapping
rules, and the Prolog execution model which, through
backtracking and logical unification, is able to efficiently and
automatically check if the mapping rules hold for a given
working storage and also discover new knowledge. In addition,
this inference engine is capable of logically inferring new,
undocumented parts of the ITA from observed facts related
other parts of that model.
2) Working Storage: The working storage is composed of
all the facts describing the problem-domain state and is Figure 4. IT Service Architecture of the studied IS ecosystem (correct
populated by all the Netfacts-conforming facts generated by the model in black; introduced errors in gray)
NTMA as well as the ITA model description in CEOF2007+
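To make the interplay between the Working Storage, the Knowledge Base and the three-valued test results (pass, fail, unknown) more concrete, the following is a minimal sketch in Python. It is not the prototype's code, which is written in Logtalk/Prolog and is not reproduced in this paper. All names (`Flow`, `FLOWS`, `SERVICE_PORTS`, `BLOCK_HOSTS`, `BLOCK_SOFTWARE`, `check_activity`, `check_component`), the addresses and ports, and the exact pass/fail/unknown semantics are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    """One observed network flow (hypothetical, simplified Netfacts-style fact)."""
    flow_id: str
    src_host: str
    dst_host: str
    dst_port: int
    client_software: Optional[str] = None  # None when no signature matched


# Working storage: observed facts plus the documented model (all values invented).
FLOWS = [
    Flow("flow_1", "10.0.0.5", "10.0.0.9", 7001),
    Flow("flow_2", "10.0.0.5", "10.0.0.7", 1521, client_software="oracle_client"),
]
SERVICE_PORTS = {  # IT Service -> (host, port) endpoints supporting it
    "Distributed Transaction Processing": {("10.0.0.9", 7001)},
    "Work Order Notification": {("10.0.0.8", 8080)},
}
BLOCK_HOSTS = {"sirel_logic": {"10.0.0.5"}}        # IT Block -> hosts it runs on
BLOCK_SOFTWARE = {"sirel_logic": {"sirel_app"}}    # IT Block -> expected client software


def outbound_flows(block):
    """All observed flows originating from hosts that support the block."""
    return [f for f in FLOWS if f.src_host in BLOCK_HOSTS.get(block, set())]


def flows_to_service(block, service):
    """Flows from the block's hosts that reach an endpoint of the service."""
    endpoints = SERVICE_PORTS.get(service, set())
    return [f for f in outbound_flows(block) if (f.dst_host, f.dst_port) in endpoints]


def check_activity(block, service):
    """Rule 1 (assumed semantics): is there traffic from the block to the service?"""
    if flows_to_service(block, service):
        return "PASS"
    # The block's hosts were observed, yet never reached the service: contradiction.
    if outbound_flows(block):
        return "FAIL"
    return "UNKN"  # no evidence either way


def check_component(block, service):
    """Rule 2 (assumed semantics): is the client software the one the model expects?"""
    seen = {f.client_software for f in flows_to_service(block, service)
            if f.client_software is not None}
    if not seen:
        return "UNKN"  # no client-side software was identified on matching flows
    return "PASS" if seen & BLOCK_SOFTWARE.get(block, set()) else "FAIL"


if __name__ == "__main__":
    for service in SERVICE_PORTS:
        print(service,
              check_activity("sirel_logic", service),
              check_component("sirel_logic", service))
```

In this sketch, a rule returns an unknown result whenever the working storage holds no evidence that could confirm or refute the modeled element, which mirrors the distinction the verification reports draw between detected discrepancies and elements that could not be confirmed or refuted.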
V. PORTUGAL TELECOM IT ARCHITECTURE VERIFICATION CASE STUDY

The concepts and processes described in section III and materialized in the prototype described in section IV were applied and evaluated in a case study of the IT Architecture of a significant subset of the information systems supporting the sales function of the leading Portuguese telecommunication organization – Portugal Telecom (PT) Comunicações³.

³ PT Comunicações – http://www.telecom.pt

We took advantage of the existing network monitoring infrastructure in the Pulso monitoring platform [36] – developed in-house at PT Comunicações – in order to capture raw network traffic from different points in the corporate network. Despite this, all raw captured traffic is processed and analyzed by our prototype, which was developed and used in total separation from this platform.

Next, we briefly describe the studied ISs, whose IT Service architecture is presented in Figure 4:

• Sales Force Automation Portal (SFA) – web-based sales portal following a classic 2-tier architecture, including load balanced web frontends and a data backend failover cluster supporting the portal through a supplied data service. All non-hardware components are based on Microsoft technologies such as the .Net Framework 2.0 and IIS 6.0 on the web frontends and SQL Server 2005 on the data backend.
• Order Entry System (SIREL) – manages order entry for just-sold products. Its architecture is based on a failover cluster of HP-UX servers supporting an «Order Entry Management» data service realized by a database over an Oracle Database DBMS.
• Service Framework (FWS) – middleware system that supplies several integration Web Services used to access IT Services offered by other systems. Its architecture is based on a simple 2-tier pattern including load balanced application frontends and a failover cluster data backend supporting the frontend applications through a supplied data service. As with SFA, all non-hardware components are based on Microsoft technologies such as .Net Framework 1.1 and IIS 6.0 on the frontends and SQL Server 2000 on the data backend.
• Tuxedo – distributed transaction processing middleware system. Its architecture is based on a failover cluster of HP-UX servers supporting a «Distributed Transaction Processing» IT Service realized by the Tuxedo software component.

The information systems ecosystem composed of the above systems is characterized by their externally supplied services and their interrelations, realized by these services' usage. Figure 4 documents these relationships (lighter gray elements represent errors purposefully introduced in the model).

The concrete application of the proof-of-concept prototype to the case study was executed in two distinct situations, corresponding to slightly different ITA models:

1. Correct model of the described ecosystem, actually describing the reality of the production ISs (black model of Figure 4, without the shaded elements);
2. Incorrect model of the described ecosystem, whereby several known mistakes were introduced in the correct model. These errors aren't limited to those shown in Figure 4 (in gray) and encompass the detailed architectures of all shown «IT Blocks», which are not presented in finer detail in this paper due to space limitations.

These different situations allowed us to assess the capability of our proposal to positively verify a correct model and to detect errors in an incorrect model, therefore accomplishing its task of automatically verifying the reality and actuality of an ITA model.

VI. RESULTS

This section reports the results of applying the developed proof-of-concept prototype to the previously described case study (section V) and our assessment of the outcome. This analysis is broken into three parts: verifying the correct model, verifying the incorrect model and new ITA information discovery.

A. Verifying the Correct Model

Figure 5 displays a brief extract of the verification of the SIREL Order Entry «IT Logic Block». The first test positively confirms a usage relationship between SIREL and the “Distributed Transaction Processing” «IT Service». The second test is unable to confirm or deny that it is this specific «IT Logic Block» that realizes this usage.

    it_logic_block: sirel_logic(Order Entry Logic)
    -------------------------------------------------------------------------------------------
    Testing any outbound network activity toward any «NetworkServicePort» supporting the «IT Service» "Distributed Transaction Processing":
      [PASS] Found matching network activity from «NetworkHost» "144.64.193.35" in «NetworkFlow»: flow_ead8bcbf6f44c2ad208d89e41fdcee7c1
    Testing «IT Service» "Distributed Transaction Processing" usage through any of its «NetworkServicePorts» by a source «SoftwareComponent» matching this «IT Block» or any supporting «IT Application Block» or «IT Platform Block»:
      [UNKN] No matching «SoftwareComponent» detected in a valid outbound connection to the «IT Service» «NetworkServicePort»! Check full test results for details.

Figure 5. Brief extract of the verification of SIREL Order Entry «IT Logic Block» (correct model)

From the analysis of the produced verification report we reached the following conclusions, organized by type of architecture element:

1) IT Infrastructure Block: All «Servers» were positively identified by at least one of their «Network Connections» and/or concrete names. In the case of failover clusters, only the active (master) servers were detected, as expected in a typical production environment.

2) IT Platform Blocks and IT Application Blocks: All «Operating Systems» were positively identified.

All other «IT Platform Blocks» were positively identified with the exception of the .Net Framework 2.0 used in the SFA
frontends and the SQL Server 2005 in the SFA backend. After exploring all the facts generated by the NTMA (through the IIVE's user interface) we came to the conclusion that ASP.NET 2.0 and other .Net Framework 2.0 components were detected but, since these components' concrete names were not preconfigured in the ITA, the Inference Engine was not able to match them to the .Net Framework 2.0. In the case of SQL Server 2005, the missing detection was caused by the fact that the developed TDS (SQL Server's protocol) signature could only detect earlier versions of the DBMS.

All verifiable «IT Application Blocks» (those with defined concrete names) were positively identified, including all databases («IT Data Block»).

3) IT Service: All «IT Services» were positively identified through their attributes (e.g. concrete name and service type), resulting in the discovery or confirmation of at least one «Network Service Port» for each of those services.

All «IT Services»' realization relationships were positively verified with the exception of one data service realized by a database supported by SQL Server 2005 because, as previously explained, there was no signature identifying this particular version of the SQL Server software component or its application-layer protocol (TDS).

All «IT Services»' utilization relationships were positively identified in terms of detecting network traffic corresponding to the service's usage. However, the client-side «Software Components» were not detected for the matching network flows and thus could not be verified.

4) Main Problems: In the cases where we could not positively verify an architecture component or relationship, it was due to the lack of a particular application-layer protocol signature (e.g. SQL Server 2005) or to a concrete name disparity between what is specified in the ITA model and what was inferred from the network traffic (e.g. .Net Framework 2.0 vs. ASP.NET 2.0). Nevertheless, it would only take a small effort (tweaking existing signatures) to improve our developed signatures and interpreters to take these cases into account. The larger the signature base, the higher the usefulness of the developed tool. We consider, however, that we managed to achieve a significant coverage of application-layer protocols' and software components' signatures. Improving string handling and matching would also help in matching concrete names specified in the ITA model with those inferred from the network.

B. Verifying the Incorrect Model

Assessing the results produced by the prototype when verifying the incorrect ITA model, we reached the conclusion that all introduced errors were reported, allowing the architect to fix the model. The purposefully introduced errors were diverse, including changing operating systems and software components and adding «IT Services» realized by the wrong «IT Block», as well as changing existing service usage relationships or introducing new, previously non-existing ones (Figure 4).

In all cases, none of these architecture elements were positively verified. In the majority of reported cases (two thirds), they were explicitly reported as errors. However, there were a few cases where the prototype could not find evidence to support or refute those parts of the model, declaring them as unprovable and undetermined. In either situation, the prototype raised a “red flag” for every artificially introduced mistake in the ITA model, therefore prompting the architect to further investigate the matter, possibly through the prototype's user interface and inference engine.

Figure 6 displays a brief extract of the verification of the SIREL Order Entry «IT Logic Block», where it detects the introduced error – the logic block's usage of the “Work Order Notification” «IT Service».

    it_logic_block: sirel_logic(Order Entry Logic)
    -------------------------------------------------------------------------------------------
    Testing any outbound network activity toward any «NetworkServicePort» supporting the «IT Service» "Work Order Notification":
      [UNKN] No matching activity found. Checking details:
        * Testing any outbound network activity:
          [PASS] Found outbound activity from «NetworkHost» "144.64.193.35" in «NetworkFlow»: flow_eb5d703c6917532de7f7b4140e1ea5d159
        * Testing any outbound network activity towards any «NetworkConnection» supporting the used «IT Service»:
          [FAIL] No outbound activity to any of the «NetworkConnections» supporting the «IT Service» was detected!
    Testing «IT Service» "Work Order Notification" usage through any of its «NetworkServicePort»:
      [FAIL] «IT Service» is not used by this «IT Block».

Figure 6. Brief extract of the verification of SIREL Order Entry «IT Logic Block» (incorrect model)

C. New Information Discovery

In addition to verifying the actuality and reality of ITA models according to the mapping rules mentioned in section III.D, the developed tool is able to explore and discover undocumented ITA-related information, through exploration of and inference over the facts generated by the NTMA, taking advantage of the inference engine in the IIVE and a few simple inference rules.

Through these mechanisms we were able to discover 50 undocumented Web Services, Databases and Data Services used or realized by components in the studied IS ecosystem, as well as parts of several Databases' schemas (e.g. column and table names) and the login names of users accessing these services.

VII. CONCLUSION AND FUTURE WORK

A. Main Contributions

Many of the techniques employed here have existed for some time in the security and system management area. However, this research's main contribution is the proposal and demonstration of the application of passive network traffic monitoring and analysis as a way to infer relevant information about the actual status of the ISs, and the use of this information in automatically verifying an ITA model by trying to match it with facts inferred as evidence from the network traffic generated by those ISs. This contribution is held together by a set of other smaller contributions:
• Systematization of passive network traffic analysis techniques as an inexpensive source of real-time information about the actual state of the ISs and their ITA;
• Conceptual model of information that can be automatically discovered through passive network traffic analysis techniques. We named this model Netfacts;
• CEOF2007 extension enabling the mapping of its high-level meta-model to the network traffic generated by ISs (described through the Netfacts model). We named this extended framework CEOF2007+. The resulting framework was applied in a real-world case study in a major Portuguese telecom company – PT Comunicações;
• Mapping between network traffic (generated by ISs) and an ISA modeling language through the specification of first-order-logic rules that specify the restrictions and associations between the Netfacts model and CEOF2007+;
• Automatic ITA monitoring and verification process, according to the actual state of the IS. This process was integrated and harmonized into a holistic ISA planning, building and maintenance process, resulting in an extension to the ISA planning process proposed by Vasconcelos [12]. This extension establishes a continuous ISA planning, verification and construction cycle;
• Development and actual deployment of a proof-of-concept prototype that encompasses all the research work hereby described, allowing the validation of this work by testing it in a real-world case study in a large enterprise.

B. Limitations

In spite of these research contributions, we identified some unaddressed limitations:

• IS and ISA Planning, Building and Maintenance Process – the proposed extension to the full ISA planning process introduced in Figure 1 needs to be tested and validated (in case studies) in order to assess its theoretical merits.
• Detection of some important software components – in spite of developing a considerable number of new application protocol and software component signatures, some of these architecture elements (e.g. SQL Server 2005) couldn't be classified by our prototype. Nevertheless, the proposed passive network traffic analysis framework (section III.A) can be easily improved over time by continuously adding new signatures and improving the handling of subtle differences in concrete names and versions, without much added effort. Supporting SQL Server 2005, for example, would only require a small tweak to the existing TDS signature.
• Expert System Features – a major part of our proposal's architecture was inspired by rule-based Expert Systems. Despite this, some useful features ([35], [27]) weren't implemented because they were not considered essential for this research's purpose. The most obvious are the ability to interactively integrate the user's knowledge at runtime (when needed) and to automatically explain the conclusions reached by the inference engine (answering the questions “Why?” and “How?”).
• Model-oriented User Interface – integration of all the concepts, techniques and tools hereby proposed in a graphical ISA modeling environment.

C. Future Work

Considering all contributions, this research is not an end in itself and serves as another stepping stone for further research. We consider the following themes to be important for future work.

ITA Automatic Discovery – a subject that serves as an important incentive for persisting on this research path, and as an ambitious goal worth chasing, is the Automatic Discovery of the ITA based on capturing and analyzing the ISs' and their ITA's manifestations in their generated network traffic. Although the present research is still far from reaching this goal, we consider that it serves as a firm first step that leads the way toward it. The proposed passive network traffic analysis methodology and the usage of logical inference techniques are contributions we believe can be leveraged in this new stage.

Complex Relationships between Information Systems – nowadays, and especially with the growing importance of SOA, most ISs communicate and relate with each other through middleware systems and asynchronous messaging over ESBs. In cases like these, the relationships aren't directly mirrored in the observed network traffic, so there needs to be a better way to infer them, such as what is proposed in [16], [15].

Extend Automatic Verification Process to other ISA architecture levels, such as the Information Architecture or the Application Architecture levels.

Apply to Other ISA Modeling Frameworks, such as ArchiMate or RM-ODP.

Use other data sources (e.g. active network probing, agents and log analysis) in order to complement the automatic and runtime ISs and ITA information discovery capabilities.

REFERENCES

[1]. Land, Martin Op 't, et al. Chapter 2: Overview. Enterprise Architecture (The Enterprise Engineering Series): Creating Value by Informed Governance. s.l. : Springer, 2008.
[2]. Land, Martin Op 't, et al. Chapter 5: Processes Involved in EA. Enterprise Architecture (The Enterprise Engineering Series): Creating Value by Informed Governance. s.l. : Springer, 2008.
[3]. Information System Architecture Metrics: an Enterprise Engineering Evaluation Approach. Vasconcelos, A., Sousa, P. and Tribolet, J. 1, s.l. : Academic Conferences Limited, June 2007, Electronic Journal of Information Systems Evaluation, Vol. 10, pp. 91-122. Available online at www.ejise.com.
[4]. Enterprise Architecture: The Issue of the Century. Zachman, J. March 1997, Database Programming and Design, pp. 1-13.
[5]. Brett, Charles. Automated Application Discovery: The Enterprise Architect's Auto-Aide. s.l. : Forrester Research Inc., 2007. White Paper. http://www.forrester.com/Research/Document/Excerpt/0,7211,44251,00.html.
[6]. Spewak, S. and Hill, S. Enterprise Architecture Planning: Developing a Blueprint for Data, Applications and Technology. s.l. : Wiley, 1992. ISBN-13: 978-0471599852.
[7]. Open Group. The Open Group Architectural Framework (TOGAF) - Version 9 Enterprise Edition. 9th Edition. s.l. : Van Haren Publishing, 2009. ISBN 9789087532307.
[8]. Lankhorst, M. & the ArchiMate team. ArchiMate Language Primer. Enschede : Telematica Instituut, 2004. Available at https://doc.telin.nl/dsweb/Get/Document-43839. ArchiMate/D1.1.6a.
[9]. ISO/IEC. ISO/IEC 19793:2008: Information technology - Open distributed processing - Use of UML for ODP system specification. s.l. : Multiple, 2008. Standard. ISO/IEC 19793:2008.
[10]. Drogseth, Dennis. Planning for CMDB Design and Adoption: An Industry Colloquium. [Online] September 1, 2005. [Cited: January 15, 2008.] http://www.enterprisemanagement.com/research/asset.php?id=225.
[11]. Garbani, Jean-Pierre and Mendel, Thomas. The Forrester Wave: Application Mapping For The CMDB, Q1 2006. s.l. : Forrester Research, Inc., 2006. White Paper. http://www.forrester.com/rb/Research/wave%26trade%3B_application_mapping_for_cmdb%2C_q1_2006/q/id/36891/t/2.
[12]. Enterprise Architecture Analysis: An Information System Evaluation Approach. Vasconcelos, André, Sousa, Pedro and Tribolet, José. 2, Ulm : Germany Informatics Society, December 2008, Enterprise Modelling and Information Systems Architectures, Vol. 3, pp. 31-53.
[13]. Vasconcelos, André. CEO Framework UML Profile v1.2. Lisbon : CEP, INESC-INOV, 2006. Technical Report.
[14]. Laudon, Kenneth C. and Laudon, Jane P. Management Information Systems: Managing the Digital Firm. 10th Edition. s.l. : Prentice Hall, 2006. ISBN-13: 978-8120334687.
[15]. Mining Semantic Relations using NetFlow. Caracas, A, et al. Salvador, Bahia, Brasil : IEEE, April 7, 2008, Third IEEE/IFIP International Workshop on Business-driven IT Management (BDIM 2008), pp. 110-111. ISBN: 978-1-4244-2191-6.
[16]. Relationship Discovery with NetFlow to Enable Business-Driven IT Management. Kind, A, Gantenbein, D and Etoh, H. s.l. : IEEE, 2006. IEEE / IFIP International Workshop on Business-Driven IT Management (BDIM 2006). pp. 63-70. ISBN: 1-4244-0176-3.
[17]. Rifkin, J. IPAudit Web Site. [Online] July 12, 2005. [Cited: July 20, 2009.] http://ipaudit.sourceforge.net.
[18]. Zalewski, M. p0f 2 README. [Online] 2006. [Cited: July 20, 2009.] http://lcamtuf.coredump.cx/p0f/README.
[19]. Shelton, M. About Passive Asset Detection System (PADS). [Online] June 18, 2005. [Cited: July 20, 2009.] http://passive.sourceforge.net/about.php.
[20]. International Telecommunication Union (ITU). Open Systems Interconnection - Basic Reference Model. s.l. : International Telecommunication Union (ITU), 1994. Standard. Recommendation X.200 (07/94).
[21]. Three models for the description of language. Chomsky, Noam. 3, September 1956, IRE Transactions on Information Theory, Vol. 2, pp. 113-124. DOI: 10.1109/TIT.1956.1056813.
[22]. Oracle. Oracle Real User Experience Insight - An Oracle White Paper. [Online] March 2008. [Cited: July 20, 2009.] http://www.oracle.com/technology/products/oem/pdf/twp_user_insight.pdf.
[23]. Secerno. The SynoptiQ Engine: The Power Behind Secerno DataWall. [Online] 2009. [Cited: July 20, 2009.] http://www.secerno.com/?pg=our-approach&sub=powerful-analysis.
[24]. Netwitness Corporation. Netwitness Investigator. [Online] Netwitness Corporation, 2009. [Cited: July 20, 2009.] http://www.netwitness.com/products/investigator.aspx.
[25]. Alegria, António. Netfacts Model Specification. Lisbon : PT Comunicações and INESC INOV, 2009. Technical Report. Available at https://fenix.ist.utl.pt/homepage/ist153841/technical-reports/netfacts-model-specification. INESC Tec. Rep. Reference Number 5718.
[26]. Alegria, António. CEO Framework: Technology Architecture Extensions. Lisbon : PT Comunicações and INESC INOV, 2009. Technical Report. Available at https://fenix.ist.utl.pt/homepage/ist153841/technical-reports/ceo-framework-technology-architecture-extensions. INESC Tec. Rep. Reference Number 5719.
[27]. Bratko, Ivan. PROLOG Programming for Artificial Intelligence. 3rd Edition. s.l. : Addison Wesley, 2001. ISBN-13: 978-0201403756.
[28]. Alegria, António. CEO Framework Netfacts Mapping and Verification Rules. Lisbon : PT Comunicações and INESC INOV, 2009. Technical Report. Available at https://fenix.ist.utl.pt/homepage/ist153841/technical-reports/ceo-framework---netfacts-mapping-and-verification-rules. INESC Tec. Rep. Reference Number 5720.
[29]. Internet Programming with Ruby writers. WEBrick - an HTTP server toolkit. [Online] August 14, 2003. [Cited: August 4, 2009.] http://www.webrick.org.
[30]. Litchfield, David. The Oracle Hacker's Handbook: Hacking and Defending Oracle. s.l. : Wiley, 2007. ISBN-13: 978-0470080221.
[31]. Oracle. Oracle Database Net Services Reference - 10g Release 2 (10.2). [Online] 2005. [Cited: July 15, 2009.] http://download.oracle.com/docs/cd/B19306_01/network.102/b14213.pdf. Part Number B14213-01.
[32]. Moura, Paulo. Logtalk - Design of an Object-Oriented Logic Programming Language. Departamento de Informática, Universidade da Beira Interior. Covilhã : s.n., 2003. PhD Thesis.
[33]. Wielemaker, Jan. SWI-Prolog. [Online] [Cited: August 4, 2009.] http://www.swi-prolog.org.
[34]. Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach. 2nd Edition. s.l. : Prentice Hall, 2002. ISBN-13: 978-0137903955.
[35]. Merritt, Dennis. Building Expert Systems in Prolog. s.l. : Springer, 1989. ISBN-13: 978-0387970165.
[36]. Uma experiência open-source para "tomar o pulso" e "ter o pulso" sobre a função de sistemas e tecnologias de informação. Alegria, J., Ramalho, R. and Carvalho, T. Lisbon : CAPSI, 2004.