Execution Models for Processors and Instructions
Florian Brandner
COMPSYS, LIP, ENS de Lyon
UMR 5668 CNRS – ENS de Lyon – UCB Lyon – INRIA
Email: florian.brandner@ens-lyon.fr

Viktor Pavlu and Andreas Krall
Institute of Computer Languages
Vienna University of Technology
Email: {vpavlu,andi}@complang.tuwien.ac.at




978-1-4244-8971-8/10/$26.00 © 2010 IEEE

Abstract—Modeling the execution of a processor and its instructions is a challenging problem, in particular in the presence of long pipelines, parallelism, and out-of-order execution. A naive approach based on finite state automata inevitably leads to an explosion in the number of states and is thus only applicable to simple, minimalistic processors.

During their execution, instructions may only proceed forward through the processor's datapath towards the end of the pipeline. The state of later pipeline stages is thus independent of potential hazards in preceding stages. This also applies to data hazards, i.e., we may observe data by-passing from a later stage to an earlier one, but not in the other direction.

Based on this observation, we explore the use of a series of parallel finite automata to model the execution states of the processor's resources individually. The automaton model captures state updates of the individual resources along with the movement of instructions through the pipeline. A highly flexible synchronization scheme built into the automata enables an elegant modeling of parallel computations, pipelining, and even out-of-order execution. An interesting property of our approach is the ability to model a subset of a given processor using a sub-automaton of the full execution model.

I. INTRODUCTION

Formal models of the dynamic behavior of processors and instructions during their execution find applications in a wide range of development, design, validation, and verification tasks. For example, cycle-accurate instruction set simulators rely on an execution model in order to efficiently emulate the processor's behavior during the execution of a program. During the simulation, pipeline effects as well as resource, control, and data hazards need to be detected and faithfully modeled in order to guarantee correctness. The validation and verification of a processor poses similar requirements: here the processor designer seeks a proof that the behavior of the processor actually reflects the properties stated by a formal specification. In addition, processor models can help to build software development and analysis tools, such as worst-case execution time (WCET) analysis tools, code profiling tools, code certification tools, and compilers.

Naive approaches based on finite state automata (FSA) inevitably lead to a rapid growth in the number of states, in particular when multiple instructions are executed in parallel, as is the case for pipelined, superscalar, or explicitly parallel processors such as VLIW machines. Stalls due to various kinds of data hazards, long memory latencies, shared resources, etc. further increase the number of states, rendering this approach impractical for many application scenarios. For example, the Pentium 4 allows up to 128 instructions to be in-flight simultaneously [1].

We thus propose a partitioning of the resources of the processor's datapath, where each resource is assigned a dedicated sub-automaton that is (largely) independent of the automata of other resources. This reduces the overall number of automaton states, but requires a coordinated procedure in order to model state transitions among the individual automata. However, for most pipelined processors this partitioning can be derived readily from the pipeline organization based on the following observation: during execution, the state of a resource depends only on the current instruction it is executing, its successors in the datapath where the current instruction will proceed after finishing its computations at the resource, as well as the next instruction that the resource will receive from one of its predecessors in the datapath. A similar interaction pattern can be observed for data by-passing, where register values are forwarded from instructions that are executed late in the pipeline to instructions in early pipeline stages. It is thus possible to model the state of the processor by a series of parallel finite automata (PFA) [2], which are synchronized in order to perform state updates according to the datapath and the processor's pipeline structure.

Furthermore, our approach allows us to derive execution models of individual instructions or groups of instructions using sub-automata that include only those resources that are required for the execution of the considered instructions. This is particularly interesting for analysis and verification tasks, where the reduced size of the sub-automata is expected to be beneficial.

The rest of this paper is organized as follows. First, we present previous work on execution models in Section II, followed by an introduction to the basic concepts of parallel finite automata in Section III. Section IV introduces our execution model for processors based on parallel finite automata. Next, we present how execution models for a subset of the processor's instruction set can be derived, down to the level of individual instructions. Finally, we conclude by briefly discussing the merits of our new approach in Section VI.

II. RELATED WORK

Execution models for processors are often developed in combination with processor description languages (PDL) [3]. Lisa [4] models pipeline effects using extended reservation tables called L-charts that contain operations and the activation
relations between them to support dynamic scheduling for simple in-order pipelined processors. MADL uses extended finite state machines with tokens [5]. The token manager, however, is implemented in C/C++, so the formal model remains incomplete. The approach presented in this paper was also designed in conjunction with a PDL [6], [7]. The language relies on hypergraphs [8] in order to specify the hardware structure of a processor. Due to the close relationship between hypergraphs and PFAs, it is straightforward to derive an execution model from a processor specification.

Furthermore, processor models are used in software and hardware verification tools, analysis tools, and instruction set simulators. Reshadi and Dutt use Reduced Colored Petri Nets to formally model the pipeline of processors. They are able to synthesize efficient instruction set simulators [9] for a wide range of processor architectures. Thesing [10] uses finite state machines in order to capture the execution state of a processor for worst-case execution time (WCET) analysis. Synchronous Transition Systems are used by Damm and Pnueli [11] to verify machines with out-of-order execution by showing that they reach the same final state as purely sequential machines.

III. PARALLEL FINITE AUTOMATA

Parallel finite automata (PFA) are related to traditional finite state automata, but allow multiple nodes of the automaton to be active simultaneously. Stotts and Pugh showed that PFAs are equivalent to finite state automata (FSA) [2], i.e., it is possible to construct a regular FSA from any given PFA. However, this transformation may lead to an exponential growth in the number of states. PFAs thus allow a more compact representation of certain classes of regular languages.

Definition. A parallel finite automaton is defined by a tuple M = (N, Q, Σ, µ, q0, F), where N is a set of nodes, Q ⊆ P(N) is a finite set of states, Σ a finite input alphabet, µ : P(N) × (Σ ∪ {λ}) → P(N) is a transition function, q0 ∈ Q a start state, and F ⊆ N a set of final nodes.

Very similar to traditional automata, the execution of a PFA is described by repeated transitions from one state to another. The difference is that multiple nodes can be active in each state before and after every transition. Consequently, the transition function µ provides an extended semantics covering multiple nodes of the automaton. Transitions with multiple target nodes are termed parallel transitions, while those having multiple source nodes are termed synchronization transitions.

A transition can be applied, or is enabled, if all its source nodes are active in the current state and the current input symbol matches the transition's label. The new state after the execution of a transition is derived by first deactivating all source nodes and then activating all its target nodes. At the same time the current input symbol is consumed, unless the label is the special symbol 'λ'; in that case the current input symbol is not consumed, but the transition is nevertheless still considered enabled. The automaton accepts its input when all input symbols have been consumed and a final state has been reached, i.e., when all final nodes in F are active. A formal definition of PFAs and their execution is presented in [2].

A PFA can be represented as a labeled directed hypergraph [8] H = (V, E), consisting of a set of vertices V and hyperedges e ∈ E ⊆ P(V) × P(V) × (Σ ∪ {λ}). The automaton's nodes are represented by vertices of the graph, i.e., V = N, and the transitions in µ correspond to labeled hyperedges.

The language accepted by a PFA can be represented using an extended form of regular expressions, with the common operators ?, ∗, +, and |. An additional operator, ∥, represents the interleaving of the languages specified by the respective sub-expressions.

Fig. 1. A parallel finite automaton accepting a∗(b∗ ∥ c); see Example 1. (The hypergraph has start node n1 with a self-loop labeled 'a' and a parallel λ-hyperedge leading to n2 and n3; n2 carries a self-loop labeled 'b', and an edge labeled 'c' leads from n3 to n4.)

Example 1. Consider the PFA depicted in Figure 1. The node n1 is the start node, while nodes n2 and n4 are final nodes. The figure shows a hypergraph representation, where the hyperedge labeled 'λ' is a parallel transition that does not consume any symbol. The two sub-automata represented by n2 as well as n3 and n4 may proceed independently. The PFA thus accepts the language a∗(b∗ ∥ c), where the sub-expressions b∗ and c are processed independently of each other.

The automaton rejects the word abb, because the non-final node n3 is activated after consuming the input symbol 'a' but never deactivated thereafter. Final node n4 is never activated and thus no final state is reached. The word abbc is, however, accepted. In terms of the automaton's execution this word is equivalent to any other interleaving, e.g., abcb or acbb.
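The acceptance semantics just described (transitions enabled by active source nodes, deactivation of sources followed by activation of targets, and non-consuming λ-transitions) can be sketched in a few lines of Python. The encoding below of the PFA from Example 1 is our own illustration, not code from [2]; node names and the transition list format are assumptions of this sketch.

```python
LAMBDA = None  # label of transitions that do not consume an input symbol

def make_pfa():
    """The PFA of Example 1: accepts the interleaving language a*(b* || c)."""
    n1, n2, n3, n4 = "n1", "n2", "n3", "n4"
    transitions = [
        (frozenset({n1}), "a", frozenset({n1})),         # self-loop consuming 'a'
        (frozenset({n1}), LAMBDA, frozenset({n2, n3})),  # parallel transition
        (frozenset({n2}), "b", frozenset({n2})),         # b* sub-automaton
        (frozenset({n3}), "c", frozenset({n4})),         # c sub-automaton
    ]
    return transitions, frozenset({n1}), frozenset({n2, n4})

def accepts(transitions, start, finals, word):
    """Nondeterministic search over PFA states, i.e., sets of active nodes."""
    seen = set()

    def run(active, pos):
        if (active, pos) in seen:       # avoid re-exploration and lambda cycles
            return False
        seen.add((active, pos))
        # accept when all input is consumed and every final node is active
        if pos == len(word) and finals <= active:
            return True
        for sources, label, targets in transitions:
            if not sources <= active:
                continue                # enabled only if all source nodes active
            successor = (active - sources) | targets
            if label is LAMBDA:         # lambda: input symbol is not consumed
                if run(successor, pos):
                    return True
            elif pos < len(word) and word[pos] == label:
                if run(successor, pos + 1):
                    return True
        return False

    return run(start, 0)
```

Running this on the words of Example 1 reproduces the discussion above: abbc, abcb, and acbb are accepted as equivalent interleavings, while abb is rejected because n4 is never activated.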
IV. PROCESSOR EXECUTION MODELING

Initially, let us assume that the processor for which an execution model is to be developed is composed of a set of resources, each of which is able to process one instruction at a time. Furthermore, assume that instructions traverse a subset of the resources during their execution; note that it is possible that one instruction uses several resources at the same time. In the following, we show how parallel automata can be used to model the execution state of the processor, its resources, and its instructions.

A. Modeling Resources

A resource is modeled using four kinds of nodes in the automaton: (1) a ready node, (2) an accept node, (3) a complete node, and (4) two synchronization nodes. A local view of the nodes and transitions of a resource is shown in Figure 2; the interaction with other automata of the execution model is indicated by the gray transitions.

Fig. 2. Nodes and transitions representing the state of a resource in the parallel automaton. (The figure shows the nodes rdy, acpt, and cmplt, the synchronization nodes synci and synco with their cycle start/cycle end transitions, the local transitions ρ and ω, and the gray enter and leave transitions.)

The synchronization nodes separate transitions logically belonging to one execution cycle from those of the next cycle. The synchronization node at the center of Figure 2 serves as a token that ensures that only a single action is performed by a resource on every cycle. The activation of the second synchronization node, on the other hand, indicates that the resource has performed its action for the current cycle and is ready to proceed to the next cycle. The gray transitions leading to and from those nodes represent a synchronization among all the resources of the processor.

The other nodes represent the current status of the resource itself. The ready node is initially active and indicates that the resource is currently ready to accept instructions for execution. The node is deactivated whenever an instruction is accepted for execution by the resource, and may only be reactivated when the currently executing instruction has completed. When the resource is idle, a transition may immediately re-activate the ready node, as indicated by the corresponding transition in Figure 2. If an instruction has been accepted, it proceeds by performing the computation represented by the transition labeled ρ leading from the accept node to the complete node. Finally, when an instruction is blocked, i.e., it cannot proceed to another resource due to a hazard, it has to stall. This is represented by the transition ω, leading from the complete node back to itself. Note that all these resource-local transitions disable the synci node and in turn enable synco.

B. Instruction Execution

The sub-automata of the individual processor resources are chained together using inter-resource transitions, as indicated by the enter and leave transitions in Figure 2. These transitions model the movement of instructions through the datapath and in addition perform arbitration of the involved resources.

Figure 3 depicts an inter-resource transition modeling an instruction that, after completing its execution at resource a, moves on to resources b and c, which then process the instruction in parallel. The transition is enabled only when the complete node of the source resource is active and, at the same time, both ready nodes of the targets are active too. When the transition is executed, the source resource becomes ready, and at the same time the target resources leave the ready node and transition into the accept nodes. This effectively models how an instruction advances through the datapath from resource to resource. Furthermore, mutually exclusive use of resources is guaranteed, since the ready nodes of the target resources are properly deactivated when an instruction is accepted for execution.

Fig. 3. Nodes and a transition modeling a connection among multiple resources of the processor. (The transition τ leads from cmplta and the ready nodes rdyb and rdyc to rdya and the accept nodes acptb and acptc.)

C. Cycle Synchronization

Every resource is allowed to perform a single resource-local transition per cycle, since all resource-local transitions require the synci node to be active. The synci node thus logically marks the beginning of a cycle. Similarly, all resource-local transitions activate the synco node, which logically marks the end of a cycle in the view of a resource. Once all synco nodes of all resources are active, a global synchronization transition is enabled, which re-activates all synci nodes and signals the transition from one execution cycle to the next.

Furthermore, advanced arbitration schemes can be realized using the synchronization nodes by keeping the synci nodes of certain resources inactive and selectively enabling them when appropriate. This can be particularly valuable in pipelined processors where resources late in the pipeline control the behavior of other resources that appear earlier in the pipeline, as is the case for data by-passing and other forms of data and control hazards.
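As a minimal illustration of the token discipline of Sections IV-A to IV-C, the following Python sketch pushes a single instruction through a two-resource datapath A → B using only set operations on active nodes. The node names, the five-cycle schedule, and in particular the assumption that an inter-resource move consumes the cycle tokens of all involved resources are our own illustrative choices; the model above does not fix these details.

```python
def simulate(n_cycles=5):
    """Push one instruction through a two-resource datapath A -> B.

    Nodes follow Fig. 2: <r>.rdy, <r>.acpt, <r>.cmplt plus the cycle
    tokens <r>.synci / <r>.synco; every resource performs exactly one
    action per cycle before the global synchronization transition fires.
    """
    active = {"A.rdy", "A.synci", "B.rdy", "B.synci"}
    pending = 1  # one instruction waiting to enter the datapath at A

    def fire(sources, targets):
        # a transition is enabled iff all of its source nodes are active
        if sources <= active:
            active.difference_update(sources)
            active.update(targets)
            return True
        return False

    history = []
    for _ in range(n_cycles):
        # inter-resource transition (cf. Fig. 3): a completed instruction
        # moves from A to B; assumed to consume both resources' cycle tokens
        fire({"A.cmplt", "B.rdy", "A.synci", "B.synci"},
             {"A.rdy", "B.acpt", "A.synco", "B.synco"})
        # 'leave': an instruction exits the datapath after completing at B
        fire({"B.cmplt", "B.synci"}, {"B.rdy", "B.synco"})
        # 'enter': a waiting instruction is accepted by resource A
        if pending and fire({"A.rdy", "A.synci"}, {"A.acpt", "A.synco"}):
            pending -= 1
        for r in ("A", "B"):
            # rho: perform the computation, accept node -> complete node
            fire({f"{r}.acpt", f"{r}.synci"}, {f"{r}.cmplt", f"{r}.synco"})
            # omega: stall while blocked (complete node loops back to itself)
            fire({f"{r}.cmplt", f"{r}.synci"}, {f"{r}.cmplt", f"{r}.synco"})
            # idle: simply re-activate the ready node
            fire({f"{r}.rdy", f"{r}.synci"}, {f"{r}.rdy", f"{r}.synco"})
        # global synchronization: all synco tokens present -> next cycle
        assert fire({"A.synco", "B.synco"}, {"A.synci", "B.synci"})
        history.append(sorted(n for n in active if "sync" not in n))
    return history
```

Running simulate() shows the instruction occupying A.acpt, then A.cmplt, moving to B.acpt via the inter-resource transition, completing at B, and finally leaving, one action per resource per cycle, exactly as the synci/synco tokens prescribe.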
D. Transition Guards

The model presented so far is already able to express datapaths with structural hazards. In the presence of data or control dependencies, the transitions have to be augmented with guard functions. Similarly, dynamic scheduling in modern superscalar processors and shared resources might require additional guard functions. The guard functions inspect variables that are associated with the sub-automata of individual resources. These variables represent control decisions, data, or input operands processed by the instructions currently executed by the respective resources. The structure of the guard functions is highly dependent on the application scenario an execution model is applied to. However, our model generally simplifies the guard functions, since the synchronization nodes and transitions provide a consistent view of the currently active resources and instructions.

V. INSTRUCTION-LEVEL MODELS

An interesting property of the approach presented in the previous section is that an instruction is represented as a set of active nodes in the automaton throughout its execution. The movement of an instruction through the datapath is mirrored by the deactivation and activation of nodes by the corresponding transitions.

It is thus possible to deduce a sub-automaton consisting of only those nodes that are potentially activated during the execution of an instruction. The result is an execution model of the given instruction, which is considerably smaller than the full processor model. This idea can also be extended to pairs of instructions or more general groups of instructions, such as all arithmetic instructions or all floating-point instructions of a processor.

VI. DISCUSSION AND CONCLUSION

PFAs offer a number of advantages for the modeling of processors. The processor itself is inherently parallel, so modeling with PFAs is quite natural and easy to understand. PFA-based processor models are very visual and can be organized to match common graphical representations of processors such as pipeline diagrams. The movement of instructions through the pipeline is explicitly visible, which simplifies the development and debugging of PFA models.

A major advantage of PFAs over FSAs is the decoupling of states and nodes. PFAs thus consist of fewer nodes compared to an equivalent regular FSA modeling the same processor. Still, the PFA implicitly encodes the same number of states; the expressive power does not change, since PFAs and FSAs are equivalent. The implicit encoding of all possible processor states also simplifies the construction of PFA models, i.e., it is no longer required to pre-compute all possible states to construct the automaton.

A. Application Scenarios

The presented approach was initially intended as a flexible and generic execution model for instruction set simulators. Interpreting simulators can immediately reuse the PFA by associating its transitions with simulation functions that update the emulated processor state. In comparison to interpreters based on FSAs, this will most likely induce a performance hit, since multiple transitions are required in order to simulate a single execution cycle. However, in the context of compiling simulators, which are orders of magnitude faster than interpreting simulators, this additional information has the potential to unveil optimization opportunities, e.g., by eliminating unused code via tracing [12].

In addition to instruction set simulation, PFA-based execution models might also prove valuable for the development of verification and analysis tools. With PFAs it is possible to derive sub-automata of individual instructions, pairs of instructions, or groups of instructions. These sub-automata are much smaller than the original automaton of the processor's full execution model. This opens the possibility for formal verification methods at the instruction level to proceed in two steps. The analysis first operates on the smaller instruction-level sub-automaton in order to detect possible fault conditions, such as data loss or race conditions. During the second phase, the partial results obtained during the first phase are applied to the full execution model and further extended to cover the state of the complete processor. This reduces the number of processor states that have to be examined during the analysis considerably.

REFERENCES

[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann, 2006.
[2] P. D. Stotts and W. Pugh, "Parallel finite automata for modeling concurrent software systems," J. Syst. Softw., vol. 27, no. 1, pp. 27–43, 1994.
[3] P. Mishra and N. Dutt, Processor Description Languages. Morgan Kaufmann Publishers Inc., 2008, vol. 1.
[4] A. Hoffmann, H. Meyr, and R. Leupers, Architecture Exploration for Embedded Processors with Lisa. Kluwer Academic Publishers, 2002.
[5] W. Qin and S. Malik, "Flexible and formal modeling of microprocessors with application to retargetable simulation," in DATE '03: Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, 2003, pp. 556–561.
[6] F. Brandner, D. Ebner, and A. Krall, "Compiler generation from structural architecture descriptions," in CASES '07: Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. ACM, 2007, pp. 13–22.
[7] F. Brandner, "Compiler backend generation from structural processor models," Ph.D. dissertation, Institut für Computersprachen, Technische Universität Wien, 2009.
[8] J. A. Bondy and U. S. R. Murty, Graph Theory, Graduate Texts in Mathematics, vol. 244. Springer, 2007.
[9] M. Reshadi and N. Dutt, "Generic pipelined processor modeling and high performance cycle-accurate simulator generation," in DATE '05: Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, 2005, pp. 786–791.
[10] S. Thesing, "Safe and precise WCET determination by abstract interpretation of pipeline models," Ph.D. dissertation, Naturwissenschaftlich-Technische Fakultät der Universität des Saarlandes, 2004.
[11] W. Damm and A. Pnueli, "Verifying out-of-order executions," in Proceedings of the IFIP WG 10.5 International Conference on Correct Hardware Design and Verification Methods. Chapman & Hall, Ltd., 1997, pp. 23–47.
[12] F. Brandner, "Precise simulation of interrupts using a rollback mechanism," in SCOPES '09: Proceedings of the 12th International Workshop on Software and Compilers for Embedded Systems. ACM, 2009, pp. 71–80.

 
Unit2-Part2-MultithreadAlgos.pptx.pdf
Unit2-Part2-MultithreadAlgos.pptx.pdfUnit2-Part2-MultithreadAlgos.pptx.pdf
Unit2-Part2-MultithreadAlgos.pptx.pdf
 
Pipelining and vector processing
Pipelining and vector processingPipelining and vector processing
Pipelining and vector processing
 
On the modeling of
On the modeling ofOn the modeling of
On the modeling of
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
 
D017311724
D017311724D017311724
D017311724
 
Model Based Software Timing Analysis Using Sequence Diagram for Commercial Ap...
Model Based Software Timing Analysis Using Sequence Diagram for Commercial Ap...Model Based Software Timing Analysis Using Sequence Diagram for Commercial Ap...
Model Based Software Timing Analysis Using Sequence Diagram for Commercial Ap...
 
Operationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsOperationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML Models
 
Reactive Applications with Apache Pulsar and Spring Boot
Reactive Applications with Apache Pulsar and Spring BootReactive Applications with Apache Pulsar and Spring Boot
Reactive Applications with Apache Pulsar and Spring Boot
 
1844 1849
1844 18491844 1849
1844 1849
 
1844 1849
1844 18491844 1849
1844 1849
 
program flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architectureprogram flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architecture
 
Bh36352357
Bh36352357Bh36352357
Bh36352357
 
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
 
Reflective and Refractive Variables: A Model for Effective and Maintainable A...
Reflective and Refractive Variables: A Model for Effective and Maintainable A...Reflective and Refractive Variables: A Model for Effective and Maintainable A...
Reflective and Refractive Variables: A Model for Effective and Maintainable A...
 
UML2SAN: Toward A New Software Performance Engineering Approach
UML2SAN: Toward A New Software Performance Engineering ApproachUML2SAN: Toward A New Software Performance Engineering Approach
UML2SAN: Toward A New Software Performance Engineering Approach
 

Más de srimoorthi (20)

69
6969
69
 
68
6868
68
 
63
6363
63
 
62
6262
62
 
61
6161
61
 
60
6060
60
 
59
5959
59
 
57
5757
57
 
56
5656
56
 
50
5050
50
 
55
5555
55
 
52
5252
52
 
53
5353
53
 
51
5151
51
 
49
4949
49
 
45
4545
45
 
44
4444
44
 
43
4343
43
 
42
4242
42
 
41
4141
41
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

19

Based on this observation, we explore the use of a series of parallel finite automata to model the execution states of the processor's resources individually. The automaton model captures state updates of the individual resources along with the movement of instructions through the pipeline. A highly flexible synchronization scheme built into the automata enables an elegant modeling of parallel computations, pipelining, and even out-of-order execution. An interesting property of our approach is the ability to model a subset of a given processor using a sub-automaton of the full execution model.

I. INTRODUCTION

Formal models of the dynamic behavior of processors and instructions during their execution find applications in a wide range of development, design, validation, and verification tasks. For example, cycle-accurate instruction set simulators rely on an execution model in order to efficiently emulate the processor's behavior during the execution of a program. During the simulation, pipeline effects as well as resource, control, and data hazards need to be detected and faithfully modeled in order to guarantee correctness. The validation and verification of a processor poses similar requirements. Here, the processor designer seeks a proof that the behavior of the processor actually reflects the properties stated by a formal specification. In addition, processor models can help to build software development and analysis tools, such as worst-case execution time (WCET) analysis tools, code profiling tools, code certification tools, and compilers.

Naive approaches based on finite state automata (FSA) inevitably lead to a rapid growth in the number of states, in particular when multiple instructions are executed in parallel, as is the case for pipelined, superscalar, or explicitly parallel processors such as VLIW machines. Stalls due to various kinds of data hazards, long memory latencies, shared resources, etc. further increase the number of states, rendering this approach impractical for many application scenarios. For example, the Pentium 4 allows up to 128 instructions to be in-flight simultaneously [1].

We thus propose a partitioning of the resources of the processor's datapath, where each resource is assigned a dedicated sub-automaton that is (largely) independent of the automata of other resources. This reduces the overall number of automaton states, but requires a coordinated procedure in order to model state transitions among the individual automata. However, for most pipelined processors this partitioning can be derived readily from the pipeline organization based on the following observation: during execution, the state of a resource depends only on the current instruction it is executing, its successors in the datapath where the current instruction will proceed after finishing its computations at the resource, and the next instruction that the resource will receive from one of its predecessors in the datapath. A similar interaction pattern can be observed for data by-passing, where register values are forwarded from instructions that are executed late in the pipeline to instructions in early pipeline stages. It is thus possible to model the state of the processor by a series of parallel finite automata (PFA) [2], which are synchronized in order to perform state updates according to the datapath and the processor's pipeline structure.

Furthermore, our approach allows execution models of individual instructions or groups of instructions to be derived using sub-automata that include only those resources that are required for the execution of the considered instructions. This is particularly interesting for analysis and verification tasks, where the reduced size of the sub-automata is expected to be beneficial.

The rest of this paper is organized as follows. First, we present previous work on execution models in Section II, followed by an introduction to the basic concepts of parallel finite automata in Section III. Section IV introduces our execution model for processors based on parallel finite automata. Next, we present how execution models for a subset of the processor's instruction set can be derived, down to the level of individual instructions. Finally, we conclude by briefly discussing the merits of our new approach in Section VI.

II. RELATED WORK

Execution models for processors are often developed in combination with processor description languages (PDLs) [3]. Lisa [4] models pipeline effects using extended reservation tables called L-charts that contain operations and the activation

978-1-4244-8971-8/10 $26.00 © 2010 IEEE
relations between them to support dynamic scheduling for simple in-order pipelined processors. MADL uses extended finite state machines with tokens [5]. The token manager, however, is implemented in C/C++, so the formal model remains incomplete. The approach presented in this paper was also designed in conjunction with a PDL [6], [7]. The language relies on hypergraphs [8] in order to specify the hardware structure of a processor. Due to the close relationship between hypergraphs and PFAs, it is straightforward to derive an execution model from a processor specification.

Furthermore, processor models are used in software and hardware verification tools, analysis tools, and instruction set simulators. Reshadi and Dutt use Reduced Colored Petri Nets to formally model the pipeline of processors. They are able to synthesize efficient instruction set simulators [9] for a wide range of processor architectures. Thesing [10] uses finite state machines in order to capture the execution state of a processor for worst-case execution time (WCET) analysis. Synchronous Transition Systems are used by Damm and Pnueli [11] to verify machines with out-of-order execution by showing that they reach the same final state as purely sequential machines.

III. PARALLEL FINITE AUTOMATA

Parallel finite automata (PFA) are related to traditional finite state automata, but allow multiple nodes of the automaton to be active simultaneously. Stotts and Pugh showed that PFAs are equivalent to finite state automata (FSA) [2], i.e., it is possible to construct a regular FSA from any given PFA. However, this transformation may lead to an exponential growth in the number of states. PFAs thus allow a more compact representation of certain classes of regular languages.

Definition. A parallel finite automaton is defined by a tuple M = (N, Q, Σ, µ, q0, F), where N is a set of nodes, Q ⊆ P(N) is a finite set of states, Σ a finite input alphabet, µ : P(N) × (Σ ∪ {λ}) → P(N) is a transition function, q0 ∈ Q a start state, and F ⊆ N a set of final nodes.

Very similar to traditional automata, the execution of a PFA is described by repeated transitions from one state to another. The difference is that multiple nodes can be active in each state before and after every transition. Consequently, the transition function µ provides an extended semantics covering multiple nodes of the automaton. Transitions with multiple target nodes are termed parallel transitions, while those having multiple source nodes are termed synchronization transitions. A transition can be applied, or is enabled, if all its source nodes are active in the current state and the current input symbol matches the transition's label. The new state after the execution of a transition is derived by first deactivating all source nodes and then activating all its target nodes. At the same time, the current input symbol is consumed, unless the label represents the special symbol 'λ'. In that case the current input symbol is not consumed; nevertheless, the transition is still considered enabled. The automaton accepts its input when all the input symbols have been consumed and a final state has been reached, i.e., when all final nodes in F are active. A formal definition of PFAs and their execution is presented in [2].

A PFA can be represented as a labeled directed hypergraph [8] H = (V, E), consisting of a set of vertices V and hyperedges e ∈ E ⊆ P(V) × P(V) × (Σ ∪ {λ}). The automaton's nodes are represented by vertices of the graph, i.e., V = N, and the transitions in µ correspond to labeled hyperedges.

The language accepted by a PFA can be represented using an extended form of regular expressions, with the common operators ?, ∗, +, and |. An additional operator, ∥, represents the interleaving of the languages specified by the respective sub-expressions.

Fig. 1. A parallel finite automaton accepting a∗(b∗ ∥ c) – see Example 1.

Example 1. Consider the PFA depicted in Figure 1. The node n1 is the start node, while nodes n2 and n4 are final nodes. The figure shows a hypergraph representation, where the hyperedge labeled 'λ' is a parallel transition that does not consume any symbol. The two sub-automata represented by n2 as well as n3 and n4 may proceed independently. The PFA thus accepts the language a∗(b∗ ∥ c), where the sub-expressions b∗ and c are processed independently from each other.

The automaton rejects the word abb, because the non-final node n3 is activated after consuming the input symbol 'a' but never deactivated thereafter. Final node n4 is never activated and thus no final state is reached. The word abbc is, however, accepted. In terms of the automaton's execution this word is equivalent to any other interleaving, e.g., abcb or acbb.

IV. PROCESSOR EXECUTION MODELING

Initially, let us assume that the processor for which an execution model is to be developed is composed of a set of resources, each of which is able to process one instruction at a time. Furthermore, assume that instructions traverse a subset of the resources during their execution; note that it is possible that one instruction uses several resources at the same time. In the following, we show how parallel automata can be used to model the execution state of the processor, its resources, and its instructions.

A. Modeling Resources

A resource is modeled using four kinds of nodes in the automaton: (1) a ready node, (2) an accept node, (3) a complete node, and (4) two synchronization nodes. A local
view of the nodes and transitions of a resource is shown in Figure 2; the interaction with other automata of the execution model is indicated by the gray transitions.

Fig. 2. Nodes and transitions representing the state of a resource in the parallel automaton.

The synchronization nodes separate transitions logically belonging to one execution cycle from those of the next cycle. The synchronization node at the center of Figure 2 serves as a token that ensures that only a single action is performed by a resource on every cycle. The activation of the second synchronization node, on the other hand, indicates that the resource has performed its action for the current cycle and is ready to proceed to the next cycle. The gray transitions leading to and from those nodes represent a synchronization among all the resources of the processor.

The other nodes represent the current status of the resource itself. The ready node is initially active and indicates that the resource is currently ready to accept instructions for execution. The node is deactivated whenever an instruction is accepted for execution by the resource, and may only be reactivated when the currently executing instruction has completed. When the resource is idle, a transition may immediately re-activate the ready node, as indicated by transition τ. If an instruction has been accepted, it proceeds by performing the computation represented by the transition labeled ρ leading from the accept node to the complete node. Finally, when an instruction is blocked, i.e., it cannot proceed to another resource due to a hazard, it has to stall. This is represented by transition ω, leading from the complete node back to itself. Note that all these resource-local transitions disable the synci node and in turn enable synco.

B. Instruction Execution

The sub-automata of the individual processor resources are chained together using inter-resource transitions, as indicated by the enter and leave transitions in Figure 2. These transitions model the movement of instructions through the datapath and in addition perform arbitration of the involved resources.

Figure 3 depicts an inter-resource transition modeling an instruction that, after completing its execution at resource a, moves on to resources b and c, which then process the instruction in parallel. The transition is enabled only when the complete node of the source resource is active and at the same time both ready nodes of the targets are active too. When the transition is executed, the source resource becomes ready and at the same time the target resources leave the ready node and transition into the accept nodes. This effectively models how an instruction advances through the datapath from resource to resource. Furthermore, mutually exclusive use of resources is guaranteed, since the ready nodes of the target resources are properly deactivated when an instruction is accepted for execution.

Fig. 3. Nodes and transitions modeling a connection among multiple resources of the processor.

C. Cycle Synchronization

Every resource is allowed to perform a single resource-local transition per cycle, since all resource-local transitions require the synci node to be active. The synci node thus logically marks the beginning of a cycle. Similarly, all resource-local transitions activate the synco node, which logically marks the end of a cycle in the view of a resource. Once all synco nodes of all resources are active, a global synchronization transition is enabled, which re-activates all synci nodes and signals the transition from one execution cycle to the next.

Furthermore, advanced arbitration schemes can be realized using the synchronization nodes by keeping the synci nodes of certain resources inactive and selectively enabling them when appropriate. This can be particularly valuable in pipelined processors, where resources late in the pipeline control the behavior of other resources that appear earlier in the pipeline, as for example for data by-passing and other forms of data and control hazards.
D. Transition Guards

The model presented so far is already able to express datapaths with structural hazards. In the presence of data or control dependencies, the transitions have to be augmented with guard functions. Similarly, dynamic scheduling in modern superscalar processors and shared resources might require additional guard functions. The guard functions inspect variables that are associated with the sub-automata of individual resources. These variables represent control decisions, data, or input operands processed by the instructions currently executed by the respective resources. The structure of the guard functions is highly dependent on the application scenario an execution model is applied to. However, our model generally simplifies the guard functions, since the synchronization nodes and transitions provide a consistent view of the currently active resources and instructions.

V. INSTRUCTION-LEVEL MODELS

An interesting property of the approach presented in the previous section is that an instruction is represented as a set of active nodes in the automaton throughout its execution. The movement of an instruction through the datapath is mirrored by the deactivation and activation of nodes by corresponding transitions.

It is thus possible to deduce a sub-automaton consisting of only those nodes that are potentially activated during the execution of an instruction. The result is an execution model of the given instruction, which is considerably smaller than the full processor model. This idea can also be extended to pairs of instructions or more general groups of instructions, such as all arithmetic instructions or all floating-point instructions of a processor.

VI. DISCUSSION AND CONCLUSION

PFAs offer a number of advantages for the modeling of processors. The processor itself is inherently parallel, so the modeling with PFAs is quite natural and easy to understand. PFA-based processor models are very visual and can be organized to match common graphical representations of processors such as pipeline diagrams. The movement of instructions through the pipeline is explicitly visible, which simplifies the development and debugging of PFA models.

A major advantage of PFAs over FSAs is the decoupling of states and nodes. PFAs thus consist of fewer nodes compared to an equivalent regular FSA modeling the same processor. Still, the PFA implicitly encodes the same number of states. The expressive power does not change, since PFAs and FSAs are equivalent. The implicit encoding of all possible processor states also simplifies the construction of PFA models, i.e., it is no longer required to pre-compute all possible states to construct the automaton.

A. Application Scenarios

The presented approach was initially intended as a flexible and generic execution model for instruction set simulators. Interpreting simulators can immediately reuse the PFA by associating its transitions with simulation functions that update the emulated processor state. In comparison to interpreters based on FSAs, this will most likely induce a performance hit, since multiple transitions are required in order to simulate a single execution cycle. However, in the context of compiling simulators, which are orders of magnitude faster than interpreting simulators, this additional information has the potential to unveil optimization opportunities, e.g., by eliminating unused code via tracing [12].

In addition to instruction set simulation, PFA-based execution models might also prove valuable for the development of verification and analysis tools. With PFAs it is possible to derive sub-automata of individual instructions, pairs of instructions, or groups of instructions. These sub-automata are much smaller than the original automaton of the processor's full execution model. This opens the possibility for formal verification methods at the instruction level to proceed in two steps. The analysis first operates on the smaller instruction-level sub-automaton in order to detect possible fault conditions, such as data loss or race conditions. During the second phase, the partial results obtained during the first phase are applied to the full execution model and further extended to cover the state of the complete processor. This reduces the number of processor states that have to be examined during the analysis considerably.

REFERENCES

[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann, 2006.
[2] P. D. Stotts and W. Pugh, "Parallel finite automata for modeling concurrent software systems," J. Syst. Softw., vol. 27, no. 1, pp. 27–43, 1994.
[3] P. Mishra and N. Dutt, Processor Description Languages. Morgan Kaufmann Publishers Inc., 2008, vol. 1.
[4] A. Hoffmann, H. Meyr, and R. Leupers, Architecture Exploration for Embedded Processors with Lisa. Kluwer Academic Publishers, 2002.
[5] W. Qin and S. Malik, "Flexible and formal modeling of microprocessors with application to retargetable simulation," in DATE '03: Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, 2003, pp. 556–561.
[6] F. Brandner, D. Ebner, and A. Krall, "Compiler generation from structural architecture descriptions," in CASES '07: Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. ACM, 2007, pp. 13–22.
[7] F. Brandner, "Compiler backend generation from structural processor models," Ph.D. dissertation, Institut für Computersprachen, Technische Universität Wien, 2009.
[8] J. A. Bondy and U. S. R. Murty, Graph Theory, ser. Graduate Texts in Mathematics, vol. 244. Springer, 2007.
[9] M. Reshadi and N. Dutt, "Generic pipelined processor modeling and high performance cycle-accurate simulator generation," in DATE '05: Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, 2005, pp. 786–791.
[10] S. Thesing, "Safe and precise WCET determination by abstract interpretation of pipeline models," Ph.D. dissertation, Naturwissenschaftlich-Technische Fakultät der Universität des Saarlandes, 2004.
[11] W. Damm and A. Pnueli, "Verifying out-of-order executions," in Proceedings of the IFIP WG 10.5 International Conference on Correct Hardware Design and Verification Methods. Chapman & Hall, Ltd., 1997, pp. 23–47.
[12] F. Brandner, "Precise simulation of interrupts using a rollback mechanism," in SCOPES '09: Proceedings of the 12th International Workshop on Software and Compilers for Embedded Systems. ACM, 2009, pp. 71–80.
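As a closing illustration of the instruction-level models of Section V, the sub-automaton of an instruction can be obtained by restricting the full model to the nodes of the resources the instruction actually visits. The sketch below is our own hypothetical encoding, not code from the paper; the node names mirror those of Section IV:

```python
# Hypothetical sketch: derive an instruction-level sub-automaton by keeping
# only transitions whose nodes all belong to the resources an instruction uses.

def res_nodes(r):
    return {f"{r}.{k}" for k in ("rdy", "acpt", "cmplt", "synci", "synco")}

def sub_automaton(transitions, kept_nodes):
    # A transition survives only if every source and target node is kept.
    return [(s, t) for s, t in transitions if (s | t) <= kept_nodes]

# Full model fragment: resource a hands off to either b or c (e.g., an ALU
# and a memory unit), each with its own compute (rho) transition.
full = [
    ({"a.cmplt", "b.rdy"}, {"a.rdy", "b.acpt"}),      # handoff a -> b
    ({"a.cmplt", "c.rdy"}, {"a.rdy", "c.acpt"}),      # handoff a -> c
    ({"b.synci", "b.acpt"}, {"b.synco", "b.cmplt"}),  # rho at b
    ({"c.synci", "c.acpt"}, {"c.synco", "c.cmplt"}),  # rho at c
]

# An instruction that only ever visits a and b yields a smaller model:
small = sub_automaton(full, res_nodes("a") | res_nodes("b"))
assert len(small) == 2  # every transition touching resource c is dropped
```

The restricted automaton is what the two-phase verification scheme of Section VI-A would analyze first, before extending partial results to the full execution model.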