Benefits of OLA Integration into Nano-Technology SoC Design Environments
2002 First Annual OLA Developer’s Conference
February 11-12, 2002
San Jose, California
Timothy J. Ehrler
Senior Principal Methodology Engineer
SoC Methodology Development, Design Technology Group
Philips Semiconductors
8372 S. River Parkway, Tempe AZ 85284
tim.ehrler@philips.com
Abstract

As technologies progress to the sub-100nm level, increased chip densities are allowing greater functionality to be combined onto a single die. Increasingly complex designs are evolving from what had previously been sets of ASIC chips into a highly integrated system on a chip (SoC). This added complexity is reflected not only in that of the design itself, but also in the demands placed upon the EDA tools and methodologies necessary to implement such designs.

Critical to the SoC design cycle is the convergence to sufficiently accurate timing and power. Most EDA methodologies rely on tool-specific, proprietary characterization data views, or that of a “de-facto” standard format. Calculation algorithms differ, as do critical signal integrity (SI) analysis capabilities, with designers encountering inconsistent, divergent results, often among different tools from the same vendor. The necessary exchange of large volumes of timing information among tools, with the associated storage and export/import time costs, further impacts design cycle times. Multiple passes through design processes magnify these impacts.

The integration of Open Library Architecture (OLA) libraries within nano-technology design environments can positively impact SoC design cycle times. Consistent calculation of desired information across a standard application programming interface (API) ensures analysis convergence among tools, eliminates data exchange processing and storage requirements, and significantly reduces iterations through design process steps.

1. Technology Advancements
Semiconductor technology has been advancing at least as rapidly as the rate predicted by Moore’s Law. As transistor sizes have decreased, so too have associated cell sizes, with increased device operating frequencies and decreased cell delays which are becoming more susceptible to IR drop and more dependent upon output loading and input slew rates than with previous technologies. At the same time, timing has become increasingly affected by interconnect related issues such as cross-coupling, wire inductance, and signal noise.

As technology progresses to, and even exceeds, the sub-100nm, or “nano-technology”, level, the capability exists to implement a complete functional system on a single chip. Whereas previous technologies had necessitated the implementation of a total system “solution” to be distributed across a number of advanced ASIC chips, current technologies now allow, and indeed encourage, the complete implementation within a single “system-on-chip” (SoC).

2. Design Flow Complexity
In order to realize the implementation of such expansive designs, however, a new paradigm has emerged which focuses on integrating previously developed and validated complex blocks of logic and/or intellectual property (IP), cores, and memories. The high levels of integration associated with this paradigm are dramatically increasing the interconnect to cell delay ratio, requiring more accurate timing calculation methodologies based upon the emerging deep sub-micron (DSM) interconnect issues.

3. Technology & Design Information
In order to address these technology and design issues, many more tools are being injected into traditional design flows, most of which analyze, generate, or depend upon, concise timing and/or power information to arrive at optimal design solutions. Worse still, much of this information is exchanged among tools by formatting and exporting to mass storage from one tool, followed by importing from storage, parsing, and interpreting that data within another tool.
Although the format or content of traditional representations of the characterized information required by a particular tool may be well defined, the interpretation of that data, calculation algorithms involved, and accuracy of such calculations may differ significantly among tools. The resulting inconsistent, and oftentimes correspondingly inaccurate, timing information substantially contributes to increased design cycles which rely on consistent and accurate timing to accomplish solution design objectives [6].

4. Traditional Design Flow
In order to illustrate the major issues facing timing closure driven design flows, we’ll first review a typical design flow using traditional library formats. This flow, restricted to only a relevant subset for this discussion, is illustrated in Figure 1. The simplistic assumption herein is that the user’s design flow may encompass a variety of tools from multiple tool vendors, including the foundry-provided delay calculator required for sign-off. This also implies, perhaps in the extreme, that each tool, or type thereof, requires its own library, the format of which may be industry standard, “de-facto” standard, or proprietary, and may not be common to other tools within the flow.

Of particular note within the timing sections of the design flow, shown within the shaded areas, is the inclusion of a foundry or semiconductor vendor supplied delay calculation tool. This tool generates a timing back-annotation SDF file using (perhaps) proprietary timing calculation algorithms specific to the supported technology. In addition to providing delay and constraint timing, it may also provide a slew rate, or ramp time, report. Scripts or other tools may process this report, or it may be directly imported by the static timing analysis and/or synthesis and optimization tools, using such information as constraints for further analysis. This becomes much more critical to timing closure within later physical design phases since design performance becomes increasingly impacted by slight changes to the design itself, where slew rates may become more consequential than delay times.

Although functional simulation and formal verification process steps are included within the illustrated flow, they are not relevant to the initial timing closure discussions, but their presence within the flow will be touched on when discussing an OLA based design flow.

Figure 1. Traditional Design Flow
[Flow diagram. Stage labels: RTL; Synthesis, Optimization, Scan Insertion; Delay Calculation (Tech. WL); Static Timing Analysis; Functional Simulation; Formal Verification; Import Library; Floor Planning; Design Database; Wireload Extraction; Custom Wireloads; Delay Calculation (Custom WL); Static Timing Analysis; Place & Route, Clock Tree, Pad Ring; Parasitics Extraction; SPEF; Delay Calculation (Parasitics); Static Timing Analysis; Formal Verification. Data labels: LIB; Netlist; SDF; Slew Report.]

4.1 Pre-Route Timing Closure
Preliminary timing closure is usually performed after the initial RTL-to-gates synthesis process in order to arrive at a sufficiently practical implementation of the design solution within given performance specifications. This phase may also require closure for gross power consumption, which may or may not be arrived at using additional analysis tools. Interconnect timing is estimated using the technology library’s wireload tables, which can be detrimental to the closure cycle since such models are statistical by
nature, and cannot reflect the varying interconnect characteristics among IP, cores, and random logic.

Although the iterations through this process of timing calculation, static timing analysis, and logic optimization may not be as numerous as when performed in later physical design phases, an especially high performance design may require a significant number of iterations when implemented with low-speed/low-power, i.e. low-performance, technology libraries. The greater the disparity between the design performance objectives and the performance of the implementation technology, the more iterations must occur in order to achieve initial closure. As shown, however, the cost of each iteration cycle is the generation and back-annotation of SDF and slew rate information files, along with the associated processing, I/O, and storage resource costs.

If there are any discrepancies between the timing view from which the SDF file has been generated and that of the consuming tool, considerable efforts are required to modify the SDF to conform to those views demanded by the latter. Given the significant size and content of this timing information file for SoC designs, conversion tool limits may well be exceeded by the complexity of the task.

4.2 Floor Plan Timing Closure
Secondary timing closure may be performed after initial floor planning but prior to final placement and routing of the design. At this point in the flow, custom wireload models may be derived from the floor plan in order to make a more meaningful estimate of interconnect timing. Iterations through this phase can assist in reaching gross placement timing, but can be very deceptive: the derived custom wireloads, although targeted at this particular implementation only, are still statistical, and still cannot accurately account for the varying types of interconnect among the blocks and gates.

In addition to the costly overhead of SDF processing, there are also the costs, though not nearly as severe, of processing the custom wireloads. Design changes resulting from the timing analysis warrant corresponding changes to the design database. This, in turn, requires the extraction and generation of a netlist file for those tools not having direct access to the database, with the associated time and storage costs.

4.3 Post-Route Timing Closure
The most critical phase of design implementation is the final timing closure after placement, routing, clock tree synthesis, and I/O pad ring processing steps have been completed. At this point in the design cycle, the design has been completely implemented at the physical level, and all information required to achieve power and timing closure is available to the respective tools.

Of particular relevance to this discussion is the timing closure iteration cycle. Included within this process is the major overhead of parasitics extraction, with the associated I/O, storage, and processing costs, all of which can be tremendous. At this stage, extracting the parasitics and generating the SPEF file can take tens of hours of processing time and multiple Gbytes of storage space. Conversely, importing that information can take even longer since the contents must be parsed and processed in a manner dictated by the consuming tool, and may require a correspondingly large amount of memory to do so. In addition, SDF generation can take many hours and consume many hundreds of Mbytes of storage, with the same impact of importing and processing that information by its consuming tool.

5. Timing Closure Impediments
Because of the methodologies employed within traditional design flows, and the deficiencies that can be attributed to the representation, organization, exchange, and processing of characterization and design information, the efforts involved in achieving timing closure with large SoC designs can be immense, requiring significant resource commitments in terms of compute facilities, mass storage, personnel, and design time. The major issues contributing to ineffective timing closure include timing calculation methods, interconnect analysis, view consistency, and information exchange. The limitations and restrictions caused by these issues result in additional iterations within stages of the design cycle, oscillating around design performance targets as the designer attempts to converge on sufficiently accurate timing.

5.1 Timing Calculation Methods
Each tool within a design flow will usually contain its own timing engine, based upon the supporting library views containing pertinent characterization data. The algorithms of these engines are sufficiently different that the timing obtained from one tool may be inconsistent with that of another, and may be calculated with varying levels of accuracy among them. Methods and calculations regarding the derating and/or scaling of this timing will differ, as will the capability to perform instance-specific versus global PVT point processing to account for IR drop and thermal effects.
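
The divergence described in Section 5.1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two derating formulas, the coefficients `k_v` and `k_t`, the nominal point, and the helper `instance_pvt` are hypothetical and do not come from any particular tool or library. It shows only that two plausible derating methods applied to the same characterized delay disagree, and that an instance-specific PVT point (global corner adjusted for local IR drop and heating) yields a different delay than the global corner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PVT:
    process: float      # process scaling factor (1.0 = nominal)
    voltage: float      # supply voltage in volts
    temperature: float  # junction temperature in degrees C


# Hypothetical nominal characterization point.
NOMINAL = PVT(process=1.0, voltage=1.2, temperature=25.0)


def derate_linear(delay_ns, pvt, k_v=-0.8, k_t=0.0012):
    """Hypothetical tool A: one linear correction around the nominal point."""
    dv = pvt.voltage - NOMINAL.voltage
    dt = pvt.temperature - NOMINAL.temperature
    return delay_ns * pvt.process * (1.0 + k_v * dv + k_t * dt)


def derate_multiplicative(delay_ns, pvt, k_v=-0.8, k_t=0.0012):
    """Hypothetical tool B: separate voltage and temperature factors, multiplied."""
    dv = pvt.voltage - NOMINAL.voltage
    dt = pvt.temperature - NOMINAL.temperature
    return delay_ns * pvt.process * (1.0 + k_v * dv) * (1.0 + k_t * dt)


def instance_pvt(global_pvt, ir_drop_v, self_heating_c):
    """Adjust a global corner for one instance's local IR drop and heating."""
    return PVT(global_pvt.process,
               global_pvt.voltage - ir_drop_v,
               global_pvt.temperature + self_heating_c)


corner = PVT(process=1.1, voltage=1.08, temperature=125.0)
local = instance_pvt(corner, ir_drop_v=0.05, self_heating_c=10.0)
base = 0.100  # characterized cell delay in ns at the nominal point

a_global = derate_linear(base, corner)          # tool A, global corner
b_global = derate_multiplicative(base, corner)  # tool B, same data, same corner
a_local = derate_linear(base, local)            # tool A, instance-specific point

print(f"tool A global: {a_global:.5f} ns")
print(f"tool B global: {b_global:.5f} ns")
print(f"tool A local:  {a_local:.5f} ns")
```

Even with identical input data, the two methods return different delays at the same corner, and a tool limited to global PVT processing misses the additional slowdown seen at the instance-specific point.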
5.2 Interconnect Analysis
In addition to differing timing calculation methods, each tool may have its own interconnect analysis algorithms as well. Different methods of network reduction may be employed, loads may be calculated as lumped or effective, and network driving waveforms and their subsequent propagation may or may not be implemented or supported, and most probably differ among tools.

Signal integrity issues, such as cross-coupling effects and noise propagation, may or may not be implemented, or may be implemented sufficiently differently as to appear conflicting among tools.

5.3 Timing View Consistency
Tools and library views are inherently coupled, resulting in inconsistent, and oftentimes conflicting, timing representation among the many library views consumed within the design flow, depending on the capabilities and purposes of the tools involved. Timing may be conditional within one library, unconditional in another, and omitted entirely in a third. Complementary constraints may be described differently, such as a SETUPHOLD window in one and separate SETUP and HOLD timing in another. Interpretation and support for timing constructs may differ, such as REMOVAL being treated as HOLD or ignored altogether.

5.4 Timing Information Exchange
With SDF files being used as the most common method of exchanging timing information among tools, insufficient, inconsistent, and inaccurate information is presented to the consuming tools. One of the most consequential deficiencies of the format is the absence of slew rate information, which becomes more critical to analysis and design tools for DSM SoC designs. Lacking this information, a tool may derive inaccurate ramp times, default to an incorrect value, or simply assume a 0.0 value, all of which will severely affect tools that rely on skew information for critical paths or structures, such as clock trees.

There can be significant differences between the timing view defined within the library from which the SDF is generated and that of the consuming tool, resulting in unsuccessful back-annotation or, even worse, default or erroneous timing. An SDF generation tool may merge interconnect timing with path timing rather than specifying them separately, preventing consuming tools from properly performing their function. Calculated negative timing may or may not be generated in the SDF, and consuming tools may or may not accept it. Support of multiple versions of the SDF format, 2.1 and 3.0, may require that both flavors be generated to satisfy the consuming tool requirements, which can convey inconsistent information to the various tools.

Some tools may support specific constructs, such as REMOVAL timing, but others may translate them to another, such as HOLD constraints, while still others may ignore them completely. Although an SDF can represent a triplet of timing as well as a single timing point, SDF generators may well only support one, while consuming tools may only support the other. If a triplet representation is required, but the generation is at a single point, multiple generations must be performed to obtain the corresponding points and the results merged into a single SDF file, with the associated I/O, processing, and storage overhead costs this imposes.

Aside from the above issues, significant overhead costs of exchanging timing information among tools in this manner are involved. The generation of the information requires formatting, I/O processing, and storage resources, while the consumer requires I/O, parsing, and processing resources. With large SoC designs, this can well take tens of hours and hundreds of Mbytes of storage with each tool.

6. Open Library Architecture (OLA)
The creation of the Delay Calculation Language (DCL) based Delay Calculation System (DCS) by IBM introduced the concept of embedding timing calculation algorithms within a technology library. The application would “converse” with the library through a standard set of application programming interfaces (API) to request particular timing information rather than accessing and interpreting raw timing information from a library and then calculating the desired result. Later enhancements to include power calculation capabilities resulted in the IEEE 1481-1999 standard for Delay and Power Calculation System (DPCS) [1].

Subsequent extensions to the system to include graph-based functional descriptions, vector based timing and power arithmetic models, and cell and pin properties and attributes from Accellera’s Advanced Library Format (ALF) standard [2][3] further expanded its capabilities. The resulting SI2 Open Library Architecture (OLA) standard was further improved upon to include more concise APIs for interconnect parasitics, with later additional APIs developed to address signal integrity issues such as cross-coupling, noise propagation, parasitic analysis, and physical characteristics for floor planning, placement, routing, etc. [4][5].
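
The difference between each tool interpreting raw characterization data and a library that embeds the calculation can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the load/delay table, the two interpretation strategies, and the `EmbeddedLibrary` class are all hypothetical and are not part of DCL, DPCS, or OLA.

```python
from bisect import bisect_right

# Hypothetical characterization data: cell delay (ns) versus output load (pF).
LOADS = [0.01, 0.05, 0.10]
DELAYS = [0.08, 0.15, 0.26]


def tool_a_delay(load):
    """Hypothetical tool A: linear interpolation between table points."""
    i = min(max(bisect_right(LOADS, load) - 1, 0), len(LOADS) - 2)
    frac = (load - LOADS[i]) / (LOADS[i + 1] - LOADS[i])
    return DELAYS[i] + frac * (DELAYS[i + 1] - DELAYS[i])


def tool_b_delay(load):
    """Hypothetical tool B: uses the nearest characterized point as-is."""
    nearest = min(range(len(LOADS)), key=lambda i: abs(LOADS[i] - load))
    return DELAYS[nearest]


class EmbeddedLibrary:
    """Stand-in for a library that owns both the data and the algorithm."""

    def delay(self, load):
        # One algorithm, chosen once by the library author (here: interpolation).
        return tool_a_delay(load)


load = 0.07
raw_results = (tool_a_delay(load), tool_b_delay(load))  # each tool interprets the table
lib = EmbeddedLibrary()
api_results = (lib.delay(load), lib.delay(load))        # each tool asks the library

print("raw table, two tools:", raw_results)
print("embedded algorithm:  ", api_results)
```

With the raw table, the two tools report different delays for the same load; when both request the value from the library, the answers are identical by construction.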
6.1 OLA Concept
The purpose of OLA is to provide a single method by which information required by an application is consistently and accurately calculated. It replaces the traditional method of parsing and interpreting characterization information from varying view formats and calculating the desired results using application-specific algorithms with a compiled library from which the desired information can be programmatically requested, calculated, then returned as shown in Figure 2.

Figure 2. OLA Concept
[Diagram labels: Tool (three instances), DPCS, OLA LIB.]

The concept of OLA, and of DPCS in general, is that an application dynamically links the OLA library at runtime, and “converses” with the library through a standard, defined set of application programming interfaces (API) to obtain such information as is needed by that application. The application initiates the request for information, and the library responds to the requests, returning the requested information. It does so by using the information provided through the API, using internally cached information, using library characterization information, and/or requesting additional information from the application, then calculating the requested information and returning it to the application. At any time during this “conversation”, additional information may be requested until all required information has been collected, the results calculated, and then returned to the requestor.

A very simplistic example of this interaction can be shown for an application, such as a static timing analysis tool, requiring timing within a flip-flop cell ‘dff’ from the rising clock ‘ck’ to falling output ‘q’.

Figure 3. Timing Example
[Diagram: flip-flop cell ‘dff’ with pins ‘d’, ‘q’, and ‘ck’.]

    APP: request timing from OLA
    OLA: get passed timing path
    OLA: request PVT from APP
    APP: return PVT
    OLA: get passed ‘ck’ slew
    OLA: request ‘q’ load from APP
    APP: return ‘q’ load
    OLA: calculate timing (early/late delay/slew)
    OLA: return timing
    APP: use requested timing information

6.2 OLA Benefits
By embedding the algorithmic calculations within the library itself, consistent results are always obtained for use by the requesting application. Slew rate information is calculated in conjunction with delay timing, providing those tools additional information not otherwise directly available through SDF timing information exchange. Since network reduction, parasitic analysis, cross-talk, and noise propagation methodologies are embedded as well, interconnect timing calculations between cells are as consistent as those within cells. Providing this consistent and additional information, as well as eliminating annotation failures due to timing view inconsistencies and conflicting/ambiguous annotation information interpretation, significantly reduces timing closure iteration cycles.

The generation of multiple SDF files at different PVT points, and the overhead costs of merging them into an acceptable form for consuming tools, are eliminated since instance-specific timing calculations are supported. Incremental timing is easily performed on demand, again eliminating the requirement for SDF generation and annotation to account for incremental design changes.

Because timing information and algorithms are compiled into the library instead of being made available in a readable format, intellectual property content can be hidden from the user. This protects the vendor’s IP, allows for the implementation of internal timing within the IP, and also prevents local “hacking” of library information by users.

In addition to providing a consistent calculation methodology, functional expressions, such as those specified for conditional timing and functional behavior, are available in a graph-based form. This removes from each application the requirement to parse and interpret expressions, again eliminating inconsistent interpretation of library information among tools. It also provides consistent functional information such that synthesis, formal verification, and simulation tools can use the library as well, eliminating even more views from the design flow.
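
The request/callback "conversation" listed in Section 6.1 for the ‘dff’ example can be sketched as follows. This is a minimal illustration, not the DPCS or OLA API: the class names `Application` and `Library`, the methods `get_pvt`, `get_load`, and `request_timing`, the `derate` field, and the linear delay and slew formulas are all assumptions invented for this sketch. Only the order of the exchange follows the listing.

```python
class Application:
    """Hypothetical timing tool; it answers the library's callbacks."""

    def __init__(self, pvt, loads):
        self.pvt = pvt      # e.g. {"derate": 1.2}, an assumed PVT summary
        self.loads = loads  # output pin name -> load in pF
        self.log = []       # transcript of the conversation

    def get_pvt(self):
        self.log.append("APP: return PVT")
        return self.pvt

    def get_load(self, pin):
        self.log.append(f"APP: return '{pin}' load")
        return self.loads[pin]


class Library:
    """Hypothetical compiled library: owns the data and the algorithm."""

    def request_timing(self, app, cell, from_pin, to_pin, input_slew):
        log = app.log
        log.append("OLA: get passed timing path")
        log.append("OLA: request PVT from APP")
        pvt = app.get_pvt()
        log.append(f"OLA: get passed '{from_pin}' slew")
        log.append(f"OLA: request '{to_pin}' load from APP")
        load = app.get_load(to_pin)
        log.append("OLA: calculate timing (early/late delay/slew)")
        # Made-up linear model standing in for the library's embedded algorithm.
        late_delay = pvt["derate"] * (0.10 + 0.5 * input_slew + 2.0 * load)
        late_slew = pvt["derate"] * (0.05 + 1.5 * load)
        result = {
            "early_delay": 0.8 * late_delay,
            "late_delay": late_delay,
            "early_slew": 0.8 * late_slew,
            "late_slew": late_slew,
        }
        log.append("OLA: return timing")
        return result


app = Application(pvt={"derate": 1.2}, loads={"q": 0.02})
app.log.append("APP: request timing from OLA")
timing = Library().request_timing(app, "dff", "ck", "q", input_slew=0.1)
app.log.append("APP: use requested timing information")

print("\n".join(app.log))
print(timing)
```

The point of the design is that the application supplies only context (PVT, slew, load) when asked, while the calculation itself stays inside the library, so every tool that links the same library obtains the same delay and slew values.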
6.3 Design Flow Usage
The most productive usage of OLA libraries within the design flow involves those stages relating to timing closure. By replacing the separate typical static timing analysis sub-flows involving the foundry-supplied delay calculator (Figure 4) with one interfacing with the OLA library (Figure 5), a more concise, consistent, and accurate timing analysis can be performed. This eliminates the need for a stand-alone delay calculator since the timing algorithms contained therein are now included within the library itself, and allows slew as well as timing information to be provided to the analysis tool.

Figure 4. Typical Timing Analysis
[Diagram labels: Delay Calculation, LIB; Netlist, SDF, Slew Report; Static Timing Analysis, LIB.]

Figure 5. OLA Timing Analysis
[Diagram labels: Netlist; Static Timing Analysis; DPCS; OLA LIB.]

Notably missing from this sub-flow is the SDF file, the usage of which for timing back-annotation is no longer required. The reduction in the number of required library view formats to that of OLA only, eliminating perhaps inconsistent and inaccurate timing views from the analysis sub-flow, promotes faster timing convergence as well.

The combination of compatible timing views, consistent timing calculations, and the elimination of incomplete timing information exchange through the intermediate SDF file greatly reduces the number of iterations required to converge upon a timing solution.

7. OLA Based Design Flow
An equivalent design flow which integrates OLA libraries, replacing the stand-alone foundry-supplied delay calculator and SDF back-annotation file, is shown in Figure 6. The relative simplicity of this flow with respect to the previous typical one is immediately apparent from the simplified timing closure stages, as well as the notable reduction in the number of required library views for the various tools included within the flow.

Figure 6. OLA Based Design Flow
[Flow diagram. Stage labels: RTL; Synthesis, Optimization, Scan Insertion; Netlist; Static Timing Analysis; Functional Simulation; Formal Verification; Design Database; Floor Planning; Wireload Extraction; Custom Wireloads; Static Timing Analysis; Place & Route, Clock Tree, Pad Ring; Parasitics Extraction; SPEF; Netlist; Static Timing Analysis; Formal Verification. Shared element: Delay and Power Calculation System (OLA) with OLA LIB.]

7.1 Timing Closure Improvements
The consistent and accurate timing calculation algorithms embedded within the OLA library allow faster convergence to a reliable timing solution. This capability is available for pre-route, floor plan, and post-route stages of timing closure, all using consistent timing calculation methods and algorithms embedded therein.

Iterations within the timing closure stages of the design flow are significantly reduced primarily due to this combination of accurate and consistent timing calculation. The single timing engine within the library itself provides consistent information to the application, complete with slew times, for both early and late timing. Tool-specific algorithms are avoided, as are the commonplace incompatibilities among differing timing views usually present within the associated libraries. Back-annotation of [in]complete timing information using SDF files is also avoided since such information need not be exchanged among
tools, but rather is calculated and provided as needed by the application.

Instance-specific PVT-related timing is provided for consideration of IR drop and thermal effects, as is the capability to provide incremental timing as opposed to requiring generation and exchange of complete block or design timing information.

Interconnect timing calculations, with the associated parasitics network reduction and waveform propagation algorithms, are part of the library timing engine, and provide consistent results to all applications. Signal integrity issues such as cross-talk can be implemented therein, as can the inclusion of inductance for RLC rather than RC based timing.

7.2 Extended Integration
In addition to the elimination of the delay calculation tool and SDF file, note the further elimination of many of the tool-specific library views. Since OLA libraries provide information and associated algorithms in a standard accessible method, and provide for more than just timing and power analysis tools, OLA-compliant tools other than those intended strictly for static timing analysis can be integrated into the design flow as well, further reducing the need for the various formats of tool-specific views previously required. Such tools include synthesis, scan insertion, optimization, functional simulation, formal verification, and many others.

An extremely aggressive integration of OLA-compliant tools and libraries, utilized wherever possible within a complete industry design flow [7], can dramatically reduce the number of required library views, as shown in Figure 7, yielding corresponding improvements within the design flow.

Design Process                 Tools  Standard /   Total    OLA Replaceable /  Total    Format
                                      Proprietary  Formats  Deleteable         Formats  Reduction
                                      Formats
RTL Development/Analysis       5      3/0          3        2/0                2        33%
Design Synthesis               7      4/6          10       4/1                6        40%
Logic/Timing Verification      17     5/11         16       6/5                6        63%
Partitioning & Floor Planning  11     3/9          12       5/0                8        33%
Layout & Chip Finishing        21     4/15         19       6/2                12       37%

Figure 7. Library View Requirement Reduction

8. Conclusion
The capability of providing consistent and accurate timing information at all levels of the design process, from pre-route through post-route, can dramatically reduce, if not eliminate, iterations within timing closure stages, converging on a design solution which meets performance objectives much faster and more easily than with traditional approaches.

Interconnect analysis, with due consideration of signal integrity issues, can be calculated in a consistently accurate manner, allowing faster timing convergence once the physical implementation of a design is realized. In addition, instance-specific PVT-based timing provides for increased accuracy where IR drop and thermal effects may manifest themselves, and the capability to provide incremental timing on demand eliminates the need for further iteration cycles.

Above all, the elimination of SDF file based timing information exchange requirements among tools, with the incurred compatibility, resource, and time costs, greatly improves design development productivity.

In conclusion, the integration of OLA libraries within a design flow, in conjunction with appropriate OLA-compliant tools, can significantly improve design efforts by reducing timing closure time through the use of more accurate and consistent timing calculation methods, which directly contributes to reduced design cycle time.

References
[1] Design Automation Standards Committee of the IEEE Computer Society, “IEEE Standard for Integrated Circuit (IC) Delay and Power Calculation System”, IEEE 1481-1999, 26 June 1999.
[2] Accellera, “Advanced Library Format (ALF) for ASIC Technology, Cells, and Blocks”, revision 2.0, 14 December 2000.
[3] IEEE P1603, “A standard for an Advanced Library Format (ALF) describing Integrated Circuit (IC) technology, cells, and blocks”, revision draft 2, 12 November 2001.
[4] Silicon Integration Initiative, “Specification for the Open Library Architecture (OLA)”, revision 1.7.04, 3 January 2002.
[5] J. Abraham, S. Churiwala, “Flexible Model for Delay and Power”, Silicon Integration Initiative, 1998.
[6] T. Tessier, C. Buhlman, “Timing Closure of a 870Kgate + 3 Mbit Ram, 0.2u-12mm Die in a 1312 Pin Package IC”, SNUG 2001.
[7] T. Ehrler, “Multiple Design Flows: Reducing Support Requirements with OLA”, Custom Integrated Circuits Conference 2001, ALF/OLA Panel Discussion, 6-9 May 2001.