This document describes statistical techniques for analyzing rare circuit failure events caused by process variations in high-dimensional variation spaces. It introduces the subset simulation (SUS) and scaled-sigma sampling (SSS) methods. SUS estimates the rare failure probability by calculating conditional probabilities in subsets of the variation space. SSS scales the sampling distribution to make rare failures less rare, allowing more efficient estimation. Both methods were tested on 45nm CMOS circuits with hundreds of random variables and shown to accurately estimate failure rates, outperforming existing techniques challenged by high dimensions.
1. ECE Department, Carnegie Mellon University, Pittsburgh, PA, USA
Student: Shupeng Sun; Advisor: Xin Li
Fast Statistical Analysis of Rare Circuit Failure Events in High-Dimensional Variation Space
Motivation
Existing Approaches
Challenge: High-Dimensionality
45nm Intel® CoreTM
i7 Processor
Simplified SRAM Architecture
cell
cell
cell
SA
cell
cell
cell
SA
cell
cell
cell
SA
Address
Decoder
Process Variation
Designed Fabricated
Fabricated
BL
CELL<0>
CELL<63>
SA
OUT
BL_
CLK CLK_
CLK
CLK_
CLK_
CLK_
CLK
CLK
D Q
SRAM DFF
Subset Simulation (SUS): Continuous Performance Metric
x : process variation
Ω : interested failure region
PF : interested failure rate
Ω1
0
Ω4
Ω2
Ω3
x1
x2
Ω4 = Ω
0
Ω
x1
x2
Example
4
Pr( )
10
F
P
−
= ∈Ω
=
x
Applying SUS
P1=Pr(x ∈ Ω1)=0.1
P2=Pr(x ∈ Ω2|x ∈ Ω1)=0.1
P3=Pr(x ∈ Ω3|x ∈ Ω2)=0.1
P4=Pr(x ∈ Ω4|x ∈ Ω3)=0.1
Estimate P1, P2, P3, P4
Estimate PF
Phase 1: draw random samples from PDF f(x) and estimate P1 = Pr(x ∈ Ω1)
x1, x2 are generally modeled as independent Normal
random variables
Ω1
0 x1
x2
1
SUS G
G Y
N
P
N N
=
+
NY: # of yellow points NG : # of green points
Phase 2: draw random samples from f(x | x ∈ Ω1) and estimate P2 = Pr(x ∈ Ω2 | x ∈ Ω1)
f(x | x ∈ Ω1) is unknown in advance. Modified Metropolis (MM) algorithm is applied
to generate random samples that follow f(x | x ∈ Ω1)
2
SUS R
G R
N
P
N N
=
+
NG : # of green points
NR : # of red points
0 x1
x2
Ω1
Ω1
0 x1
x2
0 x1
x2 Ω2
MM
samples created in Phase 1
• 45nm CMOS technology
• 200 independent random variables
• The delay from CLK→Q is performance of interest
• 100 95% confidence intervals (CIs) are calculated
from 100 runs with 5500 simulations in each run
• The “golden” failure rate is 4.8×10−6 that is
estimated by MC with 5 million simulations
1 2 1
1 1
2 1
Pr( ) Pr( ) Pr( )
K K
K K
F k k k
k k
P P
−
−
= =
Ω ⊇ Ω ⊇ ⊇ Ω ⊇ Ω = Ω
= ∈Ω
= ∈Ω ∈Ω ∈Ω=
∏ ∏
x x x x
Pk
P1
0 20 40 60 80 100
-8
-6
-4
-2
(a) MNIS
log
10
(P
F
)
0 20 40 60 80 100
-8
-6
-4
-2
(b) SUS
log
10
(P
F
)
Scaled-Sigma Sampling (SSS): Discrete Performance Metric
Idea: it is much easier to estimate PG than PF if s is large enough
f(x) : original probability density function PF : failure rate by sampling f(x)
g(x) : scaled probability density function PG : failure rate by sampling g(x)
σf : σ of f(x) σg : σ of g(x)
[1] C. Bishop, Pattern Recognition and Machine Learning. Prentice Hall, 2007.
[2] M. Qazi, et al., “Loop flattening & spherical sampling: highly efficient model reduction techniques for
SRAM yield analysis,” IEEE DATE, 2008, pp. 801-806.
[3] A. Singhee and R. Rutenbar, “Statistical blockade: very fast statistical simulation and modeling of rare
circuit events, and its application to memory design,” IEEE TCAD, vol. 28, no. 8, pp. 1176-1189, Aug. 2009.
[4] C. Gu and J. Roychowdhury, “An efficient, fully nonlinear, variability aware non-Monte-Carlo yield
estimation procedure with applications to SRAM cells and ring oscillators,” IEEE ASP-DAC, pp. 754-761,
2008.
[5] S. Sun, et al., “Fast statistical analysis of rare circuit failure events via scaled-sigma sampling for high-
dimensional variation space,” IEEE ICCAD, 2013, pp. 478-485.
[6] S. Sun and X. Li, “Fast statistical analysis of rare circuit failure events via subset simulation in high-
dimensional variation space,” IEEE ICCAD, 2014, accepted.
Proposed SSS
2
log log
G
P s
s
γ
α β
= + ⋅ +
s
PG
s1 s2 s3
s
PG
s1 s2 s3
s=1
PF
scaled failure rate
σ
x
f(x)
0
Ω
x
g(x)
0
Ω
σg = s·σf
scaling factor
( ) ( )
* *
1 1 1 1 1 1
1
1
min 1,
else
a
b
a
x r f x f x
x
x
≤
=
r1 is drawn from Uniform(0,1)
( ) ( )
* *
2 2 2 2 2 2
2
2
min 1,
else
a
b
a
x r f x f x
x
x
≤
=
r2 is drawn from Uniform(0,1)
( )
*
1 2
,
b b
x x
=
x
x2
x1
( )
1 2
,
a a a
x x
=
x
Ω1
0
x2
x1
a
x
0
1
a
x
( )
1 1 1
a
q x x
−
*
1
x
f1(x1)
x1
x2
a
x
0
2
a
x
( )
2 2 2
a
q x x
−
*
2
x
f2(x2)
x2
x1
a
x
Ω1
0
b
x
MM
*
1
* *
1
a
b ∉Ω
=
∈Ω
x x
x
x x
Phase 3: similar to Phase 2
Scaling: a large number of replicated components are integrated on the chip
Challenge: all the components need to function correctly under large process variations
Yield requirement: each component must be extremely robust under process variations
• The failure event of each component must be rare
Time to market: fast statistical tools are highly desired to analyze the rare failure event
Idea: find K subsets {Ωk; k = 1, 2, ···, K} in x-space, and estimate the rare failure rate PF
by the conditional probabilities {Pk; k = 1, 2, ···, K}
Brute-force Monte Carlo (MC) [1]
• Pros: no dimensionality issue
• Cons: not efficient
Importance sampling (IS) [2]: bias the sampling distribution
• Pros: efficient in low-D space
• Cons: difficult to find an appropriate biased sampling distribution in high-D space
Statistical blockade [3]: classifier based
• Pros: efficient in low-D space
• Cons: expensive to construct an accurate classifier in high-D space
Deterministic approach [4]: integrate the failure region in the variation space
• Pros: efficient in low-D space
• Cons: expensive to accurately describe the failure region in high-D space
In the past, rare failure event analysis was mainly focused on SRAM bit cell ← low-D
Recently, rare failure event analysis in high-D becomes more and more important
1. Dynamic SRAM bit cell stability related to peripherals: many transistors from multiple
SRAM bit cells and their peripheral circuits must be considered
2. Rare failure event analysis for non-SRAM circuits: e.g., DFF
Question: how to estimate these conditional probabilities {Pk; k = 1, 2, ···, K}?
Experimental results
DFF
BL
CELL<0>
CELL<63>
SA
OUT
BL_
CLK CLK_
CLK
CLK_
CLK_
CLK_
CLK
CLK
D Q
• 45nm CMOS technology
• 384 independent random variables
• The output of SA is performance of interest
• 100 95% confidence intervals (CIs) are calculated
from 100 runs with 6000 simulations in each run
• The “golden” failure rate is 1.1×10−6 that is
estimated by MC with 1 billion simulations
SRAM
0 20 40 60 80 100
-10
-8
-6
-4
-2
(a) MNIS
log
10
(P
F
)
0 20 40 60 80 100
-10
-8
-6
-4
-2
(b) SSS
log
10
(P
F
)
The 95% CIs (blue bars) of the DFF example for:
(a) MNIS [2] and (b) SUS. The red line represents
the “golden” failure rate.
The 95% CIs (blue bars) of the SRAM example for:
(a) MNIS [2] and (b) SSS. The red line represents
the “golden” failure rate.
( ) ( ) ( )
1 1 2 2
f f x f x
= ⋅
x