7. CHRONOS
• OVERVIEW OF THE TOOL
• CHRONOS: A TIMING ANALYZER FOR EMBEDDED SOFTWARE
XIANFENG LI, YUN LIANG, TULIKA MITRA AND ABHIK ROYCHOUDHURY
SCIENCE OF COMPUTER PROGRAMMING, VOLUME 69, DECEMBER 2007.
• SCALABLE LIGHT-WEIGHT INFEASIBLE PATH DETECTION
• WITHIN AN ITERATION
• ACROSS LOOP ITERATIONS
• NOVEL MICRO-ARCHITECTURAL MODELING
• OUT-OF-ORDER PIPELINES
• BRANCH PREDICTION
• I-CACHE AND ITS INTERACTION WITH OTHER FEATURES
• D-CACHE WITH NOVEL MODELING
• UNIFIED MULTI-LEVEL CACHE AND CODE/DATA LAYOUT
IEEE ISORC 2018 Keynote 7
8. A VIEW OF TIMING ANALYSIS
• System-level analysis: efficient, scales to large designs
• Program-level analysis: a bit more expensive, but accurate
System-level and program-level techniques are somewhat disjoint.
Motivation: artifacts other than WCET bounds
10. CACHES: WHY ARE THEY NEEDED?
(Diagram: CPU, cache, DRAM hierarchy.)
• Caches are used to bridge the performance gap between the CPU and DRAM
• Caches have a significant impact on performance
• Issues such as cache thrashing may erode the performance gain due to caches
11. CACHE THRASHING: WHY IS IT BAD?
Cache thrashing occurs when a frequently used cache line is repeatedly replaced
by another frequently used cache line, resulting in many cache misses.
while (true) {
    if (x > 5) {
        // m1 accessed
    } else {
        // m2 accessed
    }
    // m3 accessed
}
(Diagram: a two-set cache; m1 and m2 both map to Set 1, m3 maps to Set 2.)
m1 and m2 conflict in the cache and may lead to thrashing; the access to m3
results in a cache hit after the first iteration.
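The thrashing behavior above can be reproduced with a small simulation (a Python sketch; the two-set direct-mapped cache and the block-to-set mapping follow the slide's diagram, and the access trace is an assumed interleaving of the loop's iterations):

```python
def simulate(accesses, num_sets, set_of):
    """Count hits/misses for a direct-mapped cache (one block per set)."""
    sets = [None] * num_sets
    hits = misses = 0
    for m in accesses:
        s = set_of[m]
        if sets[s] == m:
            hits += 1
        else:
            misses += 1
            sets[s] = m               # evict whatever was in this set
    return hits, misses

set_of = {"m1": 0, "m2": 0, "m3": 1}  # m1 and m2 conflict; m3 is alone
# Alternate iterations take the two branches: m1, m3, m2, m3, ...
trace = ["m1", "m3", "m2", "m3", "m1", "m3", "m2", "m3"]
hits, misses = simulate(trace, 2, set_of)
# m1 and m2 keep evicting each other (all four of their accesses miss),
# while m3 hits on every iteration after the first: (hits, misses) == (3, 5)
```

Swapping the mapping so m1 and m2 land in different sets makes every access after the first iteration a hit, which is the performance gap thrashing causes.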
14. STATIC ANALYSIS
Static analysis takes the program and the cache configuration as inputs and
produces a classification of each memory block:
• always hit (AH)
• persistent (PS)
• always miss (AM)
• not classified (NC)
In the running example, {m1, m2} map to cache set 1 and {m3} maps to cache set 2.
15. IMPRECISION IN ABSTRACT INTERPRETATION
(Diagram: along paths p1 and p2, abstract LRU cache sets hold block b at
different ages, with cache states C1 and C2; at the merge point they are joined
into a single cache state C3.)
In the joined cache state, the position of block b no longer tells us whether
it came from path p1 or path p2. The joined cache state loses information about
paths p1 and p2.
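The information loss at the join can be made concrete with a minimal sketch of the must-analysis join for one abstract LRU cache set (the age-based encoding, with 0 as the youngest position, and the per-path states are assumptions chosen to mirror the diagram):

```python
def must_join(s1, s2):
    """Must-analysis join for one abstract LRU cache set.
    s1, s2 map memory block -> age (0 = youngest).
    A block survives the join only if it is present on BOTH paths,
    and it keeps its maximal (oldest) age, the pessimistic choice."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

# Path p1: 'a' youngest, 'b' older.  Path p2: 'b' youngest, 'x' older.
p1 = {"a": 0, "b": 1}
p2 = {"b": 0, "x": 1}
joined = must_join(p1, p2)
# Only 'b' survives, at its oldest age 1; whether it arrived via p1 or p2
# is no longer recoverable from the joined state: joined == {"b": 1}
```

This is exactly the path insensitivity the slide describes: precision is traded for a single state per program point, which is what keeps the analysis scalable.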
16. MODEL CHECKING ALONE?
• A PATH-SENSITIVE SEARCH
• PATH-SENSITIVE SEARCH IS EXPENSIVE: PATH EXPLOSION
• WORSE WHEN COMBINED WITH POSSIBLE CACHE STATES
(Diagram: paths p1 and p2 with cache states C1 and C2 kept separate.)
17. MODEL CHECKING ALONE?
• A PATH-SENSITIVE SEARCH
• PATH-SENSITIVE SEARCH IS EXPENSIVE: PATH EXPLOSION
• WORSE WHEN COMBINED WITH POSSIBLE CACHE STATES
(Diagram: along paths p1 and p2, separate abstract LRU cache sets are kept, so
block b can be distinguished per path; with many branches the number of cache
states explodes.)
State Explosion
18. CACHE ANALYSIS
(Diagram: micro-architectural modeling of the program (cache analysis by
abstract interpretation, pipeline analysis, branch predictor modeling)
produces the WCET of basic blocks; path analysis contributes infeasible path
constraints and loop bounds; IPET combines these constraints. The analysis
outcome is then refined by symbolic execution until either all properties are
checked or a timeout occurs.)
• Refinement by the model checker can be terminated at any point
• Model checker refinement steps are inherently parallel
• Each model checker refinement step checks a lightweight assertion property
19. REFINEMENT (INTER-CORE)
(Diagram: in isolation, the access to memory block m inside the task, between
start and exit, is a cache hit. A conflicting task accesses m1 and m2, which
map to the same cache set as m, turning the hit into a predicted cache miss.
However, the two accesses lie on branches x < y and x == y, which are
infeasible together, so the predicted miss is spurious.)
20. REFINEMENT (INTER-CORE)
(Diagram: the conflicting task is instrumented with a conflict counter C_m that
is incremented before each access to m1 and m2. The property assert(C_m <= 1)
is checked at the exit of the conflicting task. Since the branches x < y and
x == y cannot both be taken, the assertion is verified, and the access to m
remains a cache hit.)
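The instrumentation on this slide can be sketched as follows (a Python sketch; the concrete branch structure and a 2-way cache set, which gives the bound C_m <= 1, are assumptions taken from the slide's example):

```python
def conflicting_task(x, y):
    """Sketch of the instrumented conflicting task.
    C_m counts conflicts to memory block m inside this task."""
    C_m = 0
    if x < y:
        C_m += 1      # access to m1, which conflicts with m
    if x == y:        # infeasible together with x < y
        C_m += 1      # access to m2, which conflicts with m
    # m stays cached if at most (associativity - 1) conflicts occur;
    # with a 2-way set the checked property is C_m <= 1.
    assert C_m <= 1, "m may be evicted: cache hit not guaranteed"
    return C_m

# For every concrete (x, y) at most one branch fires, so the assertion
# holds on all paths, which is what the model checker verifies.
```

A model checker proves the assertion over all inputs at once; running the sketch on sample inputs only illustrates why no path can reach C_m == 2.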
21. REFINEMENT (WHY IT WORKS)
(Diagram: only the accesses that conflict with m increment C_m. Large parts of
the program, such as path 2 through m', do not affect the value of C_m at all.
The model checker therefore only needs the program slice relevant to the
property assert(C_m <= 0), and avoids searching the entire state space.)
22. EXTENSION USING SYMBOLIC EXECUTION
(Diagram: the instrumented conflicting task is explored symbolically, with C_m
incremented before the accesses to m1 and m2 and assert(C_m <= 1) checked at
the exit. The constraint solver examines the branch conditions along each path:
under the path condition x < y, the branch x == y yields x < y ∧ x = y, which
is unsatisfiable, so the second increment of C_m is never reached. The
assertion is satisfied and exploration of the infeasible path is aborted.)
25. A GENERIC FRAMEWORK
• THREE DIFFERENT ARCHITECTURAL/APPLICATION SETTINGS
(Diagram: (i) intra-task cache conflicts within a single cache, for WCET
analysis in a single core; (ii) inter-task conflicts between a high-priority
and a low-priority task sharing a cache, for Cache Related Preemption Delay
analysis; (iii) inter-core conflicts between tasks on Core 1 and Core 2, each
with a private L1 cache and a shared L2 cache, for WCET analysis in a
multi-core.)
27. THE TALK
(Diagram: advances in functionality checking, driven by constraint solving and
symbolic execution, enable Timing Analysis++: analysis of multi-cores, and
tests apart from bounds.)
28. TEST GENERATION
Goal: develop a test generation framework that aims to report all cache
performance issues that may exist in some program execution.
The test generator takes the program and the cache configuration as inputs and
reports the unique cache performance issues; each issue is reported with a
symbolic formula on the inputs that reaches it.
29. DIFFERENT FROM PROFILING!
Profiling: takes the program, a cache configuration, and test inputs, and
reports performance issues; no guarantees of completeness.
Test generator: takes the program and a cache configuration, and produces test
inputs, performance issues, and a symbolic formula characterizing each issue.
30. KEY IDEA
We reduce the problem of testing cache performance to an equivalent
functionality testing problem.
Stage I: static analysis and instrumentation transform program P into P', with
non-functional properties encoded as assertions; this reduces the search space
for exploration.
Stage II: dynamic exploration explores the reduced search space of P' and
generates test cases.
31. IDENTIFYING THRASHING SCENARIOS
From the classification of memory blocks, extract, for each cache set, the
blocks classified always miss (AM) or not classified (NC): these are the blocks
potentially involved in cache thrashing. The result is a set of cache thrashing
scenarios: {{m1, m2}} in the running example, assuming a direct-mapped cache.
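The extraction step can be sketched as follows (a Python sketch; the helper name, the classification, and the set mapping follow the running example, and the associativity threshold is an assumption about when a set can thrash):

```python
def thrashing_scenarios(classification, set_of, assoc=1):
    """Group AM/NC blocks by cache set; any set holding more blocks
    than its associativity is a candidate thrashing scenario."""
    by_set = {}
    for block, cls in classification.items():
        if cls in ("AM", "NC"):              # only unpredictable blocks
            by_set.setdefault(set_of[block], set()).add(block)
    return [blocks for blocks in by_set.values() if len(blocks) > assoc]

# Running example: m1 and m2 are NC and share set 1; m3 always hits.
classification = {"m1": "NC", "m2": "NC", "m3": "AH"}
set_of = {"m1": 0, "m2": 0, "m3": 1}
scenarios = thrashing_scenarios(classification, set_of)
# Direct-mapped cache (assoc = 1): scenarios == [{"m1", "m2"}]
```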
32. INSTRUMENTATION
Encode each thrashing scenario as an assertion at the appropriate program
location. Thrashing sets: {{m1, m2}}.
33. GENERATING ASSERTIONS
An assertion captures the property that all memory blocks in a thrashing
scenario are evicted at least once between two consecutive accesses.
Let Cm denote the number of unique cache conflicts to block m between two
accesses. The condition for m to stay in the cache is
Cm ≤ associativity of cache − 1.
For the thrashing set {{m1, m2}} on a direct-mapped cache (associativity 1):
assert(Cm1 ≤ 0 ∨ Cm2 ≤ 0)
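Producing the assertion text for a scenario is mechanical; a Python sketch (the helper name and the C-style output format are illustrative assumptions):

```python
def thrashing_assertion(scenario, associativity=1):
    """Build the assertion text for one thrashing scenario.
    The scenario is refuted if at least one block's conflict count
    stays within associativity - 1, i.e. that block stays cached."""
    bound = associativity - 1
    clauses = [f"C_{m} <= {bound}" for m in sorted(scenario)]
    return "assert(" + " || ".join(clauses) + ")"

# Direct-mapped cache (associativity 1): the per-block bound is 0.
text = thrashing_assertion({"m1", "m2"})
# -> "assert(C_m1 <= 0 || C_m2 <= 0)"
```

Note the assertion is the negation of the thrashing condition: a violation of the assertion is exactly a witness that every block in the scenario gets evicted.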
34. DYNAMIC EXPLORATION
Exploration is performed to verify the validity of the instrumented assertions.
The instrumented program and its assertions are explored (validate / deviate),
and each violation is reported as a pair <Θ, Φ>, where Θ is a thrashing
scenario and Φ is a symbolic formula on the inputs that leads to Θ.
35. EXPLORATION USING A GREEDY STRATEGY
Use the Control Dependence Graph (CDG) and the set of still-unchecked
assertions to pick the next path to explore: the path with the maximum number
of unchecked assertions.
36. TEST GENERATION
Results are generated in the format <Θ, Φ>, where Θ is a thrashing scenario and
Φ is a symbolic formula on the inputs that leads to Θ. Any input that satisfies
Φ will lead to the cache thrashing scenario Θ.
37. TEST GENERATION RECAP
(Diagram: cache analysis by abstract interpretation produces the cache hit-miss
classification (CHMC) of the program: always hit (AH), persistent (PS), always
miss (AM), not classified (NC). Instrumentation automatically adds assertions
to the program. Symbolic execution explores paths leading to the assertions and
reports violated assertions as a test suite, until either the time budget is
exhausted or all instrumented assertions have been violated.)
39. EVALUATION
Assertion Coverage = (unique assertions checked × 100) / (unique assertions instrumented)
Thrashing Potential = (unique assertions violated × 100) / (unique assertions instrumented)
100% coverage implies that all unique assertions have been checked at least once.
Thrashing potential indicates how prone a program is to thrashing under a given
cache configuration.
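The two metrics can be computed directly (the counts in the usage line are made-up numbers, purely for illustration):

```python
def coverage_metrics(instrumented, checked, violated):
    """Assertion coverage and thrashing potential, as percentages of
    the unique assertions instrumented into the program."""
    coverage = 100.0 * checked / instrumented
    potential = 100.0 * violated / instrumented
    return coverage, potential

# e.g. 40 assertions instrumented, all 40 checked, 6 violated:
cov, pot = coverage_metrics(40, 40, 6)
# -> (100.0, 15.0): full coverage, 15% thrashing potential
```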
40. OBSERVATION
o PROGRAMS WITH FEWER INPUT-DEPENDENT PATHS WERE EXPLORED FASTER
o FOR MOST EXPERIMENTS, ONLY A SMALL FRACTION OF INSTRUMENTED ASSERTIONS WERE VIOLATED
o APPLICATIONS INCLUDE:
o PROVIDING INPUTS TO SYSTEM-LEVEL ANALYSIS?
o REWRITING THE PROGRAM
o CHOOSING A CACHE CONFIGURATION FOR AN APPLICATION
o CACHE LOCKING STRATEGIES
41. NOT PROFILING OR TESTING
Testing functionality (symbolic execution): sound and complete; partitions the
input space.
Profiling: not sound or complete; requires manual effort; may have false
positives.
Testing performance (this work): automated; no false positives.
43. A VIEW OF TIMING ANALYSIS
• System-level analysis: efficient, scales to large designs
• Program-level analysis: a bit more expensive, but accurate
System-level and program-level techniques are somewhat disjoint.
Motivation: artifacts other than WCET bounds
44. CACHE SIDE CHANNELS
classified input (key): key can be 0 or 1
load a[key]
load a[1]
load a[2]
load a[2]
Key = 0: the final load a[2] is a MISS, since a[2] is replaced by a[0].
(Diagram: cache holding a[0], a[1], a[2].)
Side-channel leaks
45. CACHE SIDE CHANNELS
classified input (key): key can be 0 or 1
load a[key]
load a[1]
load a[2]
load a[2]
Key = 1: the final load a[2] is a HIT, since a[2] is not replaced.
(Diagram: cache holding a[1], a[2].)
Side-channel leaks
46. CACHE SIDE CHANNELS
classified input (key): key can be 0 or 1
The same access sequence ends in a MISS on the final load a[2] when key = 0,
and in a HIT when key = 1. Observing the hit/miss behavior of a single access
therefore leaks the classified key.
47. ANALYZING CACHE SIDE CHANNELS
• Symbolically track memory addresses
• Expose non-functional behavior (cache misses) as functionality
• Get inputs which exhibit specific cache-miss scenarios
classified input (key): key can be 0 or 1
The addresses of the loads are tracked symbolically: the access a[key] has the
symbolic address a[key] ∧ (key = 0 ∨ key = 1), while the accesses a[1], a[2],
and a[2] are concrete.
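The leak can be demonstrated by replaying the accesses for each key value (a Python sketch; the 2-set direct-mapped cache with a[i] mapping to set i mod 2, and the exact access order, are assumptions chosen so that a[0] conflicts with a[2], matching the example's hit/miss outcomes):

```python
def final_load_hits(key):
    """Replay the side-channel example for a concrete key value and
    report whether the final, attacker-observable load of a[2] hits."""
    sets = [None, None]               # 2-set direct-mapped cache

    def load(i):                      # returns True on a cache hit
        s = i % 2                     # a[0] and a[2] share set 0
        hit = (sets[s] == i)
        sets[s] = i
        return hit

    load(2)            # a[2] brought into the cache
    load(key)          # classified access: a[0] evicts a[2]; a[1] does not
    load(1)
    return load(2)     # observable hit/miss of the final load

# The observable outcome depends on the secret: a cache side channel.
# final_load_hits(0) -> False (MISS); final_load_hits(1) -> True (HIT)
```

Replacing the concrete `key` with a symbolic value, as the slide describes, turns this replay into a query for inputs that produce each distinct hit/miss outcome.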
49. A VIEW OF TIMING ANALYSIS
• System-level analysis: efficient, scales to large designs
• Program-level analysis: a bit more expensive, but accurate
System-level and program-level techniques are somewhat disjoint.
Motivation: artifacts other than WCET bounds (tests, attack scenarios)
50.
Advances in functionality checking, driven by constraint solving and symbolic
execution, enable Timing Analysis++: analysis of multi-cores, tests apart from
bounds, and attack scenarios.
When WCET analysis tools were developed in the real-time systems community,
constraint solvers were not mature. Additional applications and analyses can be
developed by leveraging constraint solving and symbolic execution.
Speaker notes
Since a program runs on particular hardware, WCET analysis of a program consists of two crucial steps: micro-architectural modeling, which analyzes the timing behavior of individual hardware components (e.g., cache, pipeline, branch prediction), and path analysis, which finds the longest feasible path in the program. In prior work, it has been argued that abstract interpretation for micro-architectural modeling combined with Integer Linear Programming for path analysis is a scalable technique for deriving the WCET. An alternative solution is to use model checking. Let us first review why model checking alone creates difficulties in performing these analyses.
Let’s see carefully the source of imprecision in abstract interpretation.
Assume two different cache states c1 and c2 along path p1 and p2. At the control flow merge point (light), we obtain a single cache state c3 by abstract join operation.
For a more concrete example, assume two LRU cache sets along path p1 and p2 (as shown, highlight), both containing a memory block “b” at different position of the cache set. If we perform a must join operation, we get this cache set (as shown, light). For the time being, just ignore how this “must join” operation is performed. Important thing to note here is that we have a single joined cache state where memory block “b” cannot be distinguished whether it appears from path “p1” or path “p2” ---- a path insensitive analysis.
However, this imprecision is also what makes the analysis scalable.
What happens if we would have used model checking for cache analysis?
It will not perform the join operation (light)
It splits the states at the merge point and performs a path-sensitive flow analysis
No doubt that this approach is more precise
As we go through the example again, we can observe that memory block “b” can be distinguished depending on which path it appears (p1 or p2, light)
This approach, however, does not work in practice. As we can see clearly that the approach would generate exponential number of cache states, in the presence of many branch operations.
What do we do?
We apply abstract interpretation first to perform cache analysis and refine the cache analysis outcome by repeated model checking steps
Abstract interpretation based cache analysis has been shown to be very effective when integrated with other micro-architectural features. It is important to note that we do not change the soundness of the AI-based cache analysis --- the refined result is simply more precise than what abstract interpretation alone provides.
Assume when the task is run in isolation, m results in a cache hit, as it is in the cache before being accessed. However, when the task is run with a conflicting task (either high priority or running in a different core), we may get a cache miss through analysis. This cache miss could be generated by at least two conflicting memory blocks m1 and m2 in the conflicting task. If m1 and m2 are accessed in an infeasible path of the conflicting task, we get a spurious cache miss.
The problem is solved similarly as in the intra-task case. Here, we run the model checker refinement on the conflicting task. As m1 and m2 are two conflicting memory blocks, the conflict count is incremented before accessing them. The property is checked at the end of the conflicting task, as we are verifying the conflicts generated by the entire conflicting task. We check a property to ensure that “m” would remain a cache hit even if run with the conflicting task. If m1 and m2 are accessed in an infeasible path, as shown, this property will be satisfied and we can say that “m” will not be evicted from the cache and will remain a cache hit.
An important feature of our refinement step is that it is much simpler than a full-fledged cache analysis through model checking. Note that we check an assertion property involving the variable C_m. There could be many parts of the program (as shown) which do not affect the value of C_m, as not all memory blocks conflict with m. The model checker can work only on the slice relevant to the assertion condition, and can therefore avoid searching the entire state space.
- In the inter-core conflict refinement, we choose one of the three tasks (statemate/nsichneu/compress) to generate conflicts in the shared cache. We run different tasks on the other core (light) and we measure the average improvement of WCET over all the tasks (light).
This figure gives the key result of this paper. We show the improvement in all three variants of cache analysis with respect to time when benchmark statemate is used.
The vertical cut (light) at the 100th second shows that we can terminate the refinement after 100 seconds and get the respective improvements (light). However, if we run the refinement phase for 150 seconds (light), we get better improvements.
The vertical portion (light) shows the effect of eliminating unnecessary model checker calls.
The flat portion (light) shows that all possible refinements have been either validated or violated by model checker refinements.
We instantiate our framework to refine three different varieties of cache conflicts
intra-task, which is most common and is generated among different memory blocks accessed inside a single task
inter-task, which is generated by a high priority task when it preempts a low priority task. The delay generated due to the additional cache misses for preemption is known as cache related preemption delay
inter-core, which is generated in the shared cache in multi-cores. For example, (highlight the tasks) these two tasks may replace the memory blocks of each other in the shared L2 cache (light).
Our model checker refinement steps eliminate spurious cache conflicts. Let us see through an example how these spurious cache conflicts appear.
So if key is 0, we observe that this last access was a cache miss, as a[2] will be replaced by a[0].
So if key is 1, we observe that this last access was a cache hit, as a[2] will not be replaced by a[0].
To summarise, the hit/miss characteristic of the last access was sufficient to determine whether the key was 0 or 1.
In order to find such information leaks, we perform a concolic execution. That is, we execute the program with concrete inputs, but treat all sensitive inputs as uninstantiated symbols. This way, we track all memory-related operations and identify the memory addresses that depend on sensitive inputs. In this case, the only such access is a[key], whose symbolic address can be specified as shown.
The number of observation equivalence classes quantifies the vulnerability of a program w.r.t. the attack model. The graph shows the number of observation equivalence classes explored by our checker over time. For both attacks (cache size set to 8KB), the checker finishes exploring all equivalence classes. The figure clearly shows that the trace-based attack (i.e., observing the sequence of cache hits and misses) is more powerful than the timing-based attack (i.e., observing merely the number of cache misses).