10 Steps in Telecom Thermal Management
Kamal Karimanal
2. Scope
Thermal management of air cooled telecom equipment
Meant to be a high level checklist/overview of the complete thermal management process
• Specific details are beyond the scope of this 30 minute presentation
• Stand by for detailed articles/webinars on each of the sub topics
© Cielution LLC
3. Air Cooled Telecom Hardware Thermal management Flow
Requirements Gathering
Component Layout Sanity Check
Air Moving Strategy, Fan/blower Selection
Heat Sink Parts Selection
Thermal interface material Selection
Detailed Thermal Simulation & Optimization
Hardware & PM level Controls
Heat Sink / TIM Test
Prototype Test
Document
4. Requirements Gathering
Previous generation product of the same family?
• Learn from history
• Thermal data on individual modules?
What is changing?
• Smaller?
• More power?
• Reduced air flow?
Total power
Power per card
List of most critical components and their power
Spec temperature limits
5. Requirements Gathering – Round Up the Usual Suspects
ASICs & FPGAs
Optical transceivers (SFP (10 Gb/s), SFP+ (16 Gb/s), XFP, QSFP (4x10, 4x28 Gb/s), etc.)
• Low case temperature limits: 70 C to 85 C
• High internal and external thermal resistance
Memory
• Low temperature limit
Power electronics (inductors, converters)
• Board heating risk
Caution!
• The first-cut power estimate is usually too high: every component is rated at its maximum power, which builds in too much margin of safety.
• Sometimes this padding effect can be corrected by comparing the sum of all component powers against the power supply rating.
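The padding check described above can be sketched in a few lines. This is a minimal illustration, not the presenter's procedure: the component powers, the supply rating, and the choice to scale every component down uniformly are all assumptions made for the example.

```python
def padding_scale_factor(component_powers_w, supply_rating_w):
    """Ratio of summed component powers to the power supply rating.

    A value above 1.0 suggests the component list is padded, since the
    card cannot dissipate more than the supply can deliver.
    """
    return sum(component_powers_w) / supply_rating_w


def rescale_to_supply(component_powers_w, supply_rating_w):
    """Scale every power down uniformly (an assumption) when the sum exceeds the rating."""
    factor = padding_scale_factor(component_powers_w, supply_rating_w)
    if factor <= 1.0:
        return list(component_powers_w)
    return [p / factor for p in component_powers_w]


# Hypothetical first-cut list (W) and supply rating (W); neither is from the slides.
first_cut = [150.0, 150.0, 60.0, 40.0]
supply_rating = 320.0
print(padding_scale_factor(first_cut, supply_rating))      # 1.25
print(sum(rescale_to_supply(first_cut, supply_rating)))
```

In practice the correction would more likely be applied component by component with the hardware team; uniform scaling is only the simplest starting point.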
6. Example Components Table
Card Name: CielCard
Component Name | Component Power (W) | # of Components | Total Power (W) | Temperature Spec (Deg. C) | Spec Method | Component Model
Main Board
ASIC #1 | 120 | 2 | 240 | 115 | Tj | 2R (Rjc=0.1, Rjb=3.5)
ASIC #2 | 140 | 2 | 280 | 105 | Tj | 2R (Rjc=0.12, Rjb=2.5)
Memory Stack | 20 | 10 | 200 | 80 | Tc | Lumped Block
Inductors | 12.5 | 8 | 100 | 125 | Tc | Lumped Block
Mezzanine Board
SFP+ | 1 | 48 | 48 | 85 | Tc | Semi Detailed
PHY | 40 | 1 | 40 | 110 | Tj | 2R (Rjc=0.2, Rjb=6.0)
MAC | 20 | 1 | 20 | 110 | Tj | 2R (Rjc=0.18, Rjb=5.0)
Total card power (W): 928
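As a cross-check on the table, the totals can be rolled up programmatically. The sketch below uses the per-part powers and quantities from the example table; the tuple layout and function names are illustrative choices, not part of the original material.

```python
# (board, component, power per part in W, quantity), taken from the example table.
COMPONENTS = [
    ("Main Board", "ASIC #1", 120.0, 2),
    ("Main Board", "ASIC #2", 140.0, 2),
    ("Main Board", "Memory Stack", 20.0, 10),
    ("Main Board", "Inductors", 12.5, 8),
    ("Mezzanine Board", "SFP+", 1.0, 48),
    ("Mezzanine Board", "PHY", 40.0, 1),
    ("Mezzanine Board", "MAC", 20.0, 1),
]


def power_by_board(components):
    """Sum power x quantity for each board."""
    totals = {}
    for board, _name, power_w, quantity in components:
        totals[board] = totals.get(board, 0.0) + power_w * quantity
    return totals


def card_power(components):
    """Total card power across all boards."""
    return sum(power_by_board(components).values())


print(power_by_board(COMPONENTS))  # {'Main Board': 820.0, 'Mezzanine Board': 108.0}
print(card_power(COMPONENTS))      # 928.0
```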
7. Flow Rate Requirement Estimation
Lane Calculation Method
• Assume even distribution of air flow (not really true, but sufficient at this stage).
Image: http://en.wikipedia.org/wiki/File:Processor_board_cray-2_hg.jpg
8. Component Layout Sanity Check
Inlet: 55 C air at 6000 ft elevation
Tmax (outlet) = 95 C
Lane | Share of card width | Heat load
Lane I | 35 % | 300 W
Lane II | 25 % | 300 W
Lane III | 40 % | 600 W
9. Component Layout Sanity Check
In this example…
• We need to remove 50 % of the heat with 40 % of the air flow,
o i.e. assuming even distribution of flow across the board.
Not really a red flag…
• But it calls for flexibility to accept layout change suggestions after detailed analysis.
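The lane check can be turned into a quick calculation. The sketch below assumes the flow splits in proportion to lane width, that the total flow is sized for the card-average 40 C air temperature rise (95 C outlet limit minus 55 C inlet), and that the 300 W, 300 W, and 600 W loads belong to Lanes I, II, and III in that order, which is how the example reads but is not stated explicitly.

```python
def lane_air_rise(lane_power_w, lane_width_fraction, mean_rise_c):
    """Air temperature rise per lane when flow splits in proportion to lane width.

    Each lane's rise equals the card-average rise scaled by
    (share of heat) / (share of flow).
    """
    total_power = sum(lane_power_w)
    rises = []
    for power, width in zip(lane_power_w, lane_width_fraction):
        heat_share = power / total_power
        rises.append(mean_rise_c * heat_share / width)
    return rises


lane_power_w = [300.0, 300.0, 600.0]   # Lane I, II, III (ordering assumed)
lane_width = [0.35, 0.25, 0.40]        # fraction of card width, hence of flow
mean_rise_c = 95.0 - 55.0              # outlet limit minus inlet air temperature

rises = lane_air_rise(lane_power_w, lane_width, mean_rise_c)
for name, rise in zip(["Lane I", "Lane II", "Lane III"], rises):
    print(f"{name}: air temperature rise {rise:.1f} C")
```

A lane whose rise exceeds the card average is a candidate for layout changes or for a larger share of the flow once detailed analysis is available.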
11. Air Moving Strategy: Flow Requirement
Using the density of air for 6000 ft elevation, we get the following flow rate requirements:
• Total 66 CFM.
• If there are 10 such cards in the chassis, total flow needed = 660 CFM
• Number of fans needed on tray = 660 / actual flow per fan
• What is the actually realizable flow rate?
$$P = \dot{m}\, C_p\, \Delta T = \rho\, Q\, C_p\, \Delta T$$
$$Q = \frac{P}{\rho\, C_p\, \Delta T} = \frac{1200}{0.96 \times 1005 \times 40} = 0.031\ \mathrm{m^3/sec} \approx 66\ \mathrm{CFM}$$
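The flow requirement arithmetic can be reproduced as follows. The inputs (1200 W, 0.96 kg/m^3, 1005 J/kg-K, 40 C rise, 10 cards) come from the slides; the per-fan realizable flow of 75 CFM is a made-up placeholder, since the realizable flow is exactly the open question raised above.

```python
import math

M3_PER_S_TO_CFM = 2118.88  # 1 m^3/s expressed in cubic feet per minute


def required_flow_m3s(power_w, density_kg_m3, cp_j_kgk, delta_t_c):
    """Volumetric flow needed to carry power_w with an air temperature rise of delta_t_c."""
    return power_w / (density_kg_m3 * cp_j_kgk * delta_t_c)


def fans_needed(total_cfm, realizable_cfm_per_fan):
    """Whole number of parallel fans needed to deliver total_cfm."""
    return math.ceil(total_cfm / realizable_cfm_per_fan)


card_q = required_flow_m3s(1200.0, 0.96, 1005.0, 40.0)
card_cfm = card_q * M3_PER_S_TO_CFM
print(round(card_q, 3), round(card_cfm))   # 0.031 66

chassis_cfm = 10 * 66
# 75 CFM per fan at the operating point is a hypothetical value, not from the slides.
print(fans_needed(chassis_cfm, 75.0))      # 9
```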
12. Fan Selection Procedure
Create a simplified CFD model of the system.
Impose varying, uniformly distributed air flow rates up to and above the range of expected operating flow rates.
Post-process to determine the pressure drop.
Plot the air flow vs DP curve.
Select a candidate fan and plot cumulative volumetric flow rate vs DP on the same plot.
• Cumulative volumetric flow rate values are determined by multiplying the single fan curve flow rate by the number of fans in the fan tray (fans in parallel).
The flow rate at the intersection of the two curves should be higher than the estimated flow requirement determined by the lane calculation.
[Figure: simplified system model with flow rate Q, inlet pressure Pin, and outlet pressure Pout]
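The intersection step of the procedure can be sketched numerically. Both curves below are hypothetical tabulated data (flow in CFM, pressure in inches of water), and piecewise-linear interpolation with bisection is just one simple way to find the crossing; it assumes the fan curve falls and the system curve rises with flow.

```python
def interpolate(points, x):
    """Piecewise-linear interpolation through (x, y) points sorted by x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the tabulated range")


def operating_flow(fan_curve, n_fans, system_curve, tol=1e-6):
    """Flow at which n parallel fans deliver exactly the system pressure drop."""
    tray_curve = [(q * n_fans, dp) for q, dp in fan_curve]  # fans in parallel
    lo = max(tray_curve[0][0], system_curve[0][0])
    hi = min(tray_curve[-1][0], system_curve[-1][0])

    def surplus(q):
        return interpolate(tray_curve, q) - interpolate(system_curve, q)

    if surplus(lo) < 0 or surplus(hi) > 0:
        raise ValueError("curves do not cross in the tabulated range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surplus(mid) > 0:
            lo = mid   # fans still have pressure to spare, flow can rise
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical single-fan curve and system impedance curve.
fan_curve = [(0.0, 1.0), (50.0, 0.8), (100.0, 0.4), (150.0, 0.0)]
system_curve = [(0.0, 0.0), (300.0, 0.1), (600.0, 0.4), (900.0, 0.9)]
q_op = operating_flow(fan_curve, 8, system_curve)
print(round(q_op, 1), q_op >= 660.0)   # 675.0 True
```

With these made-up curves, eight fans cross the system curve at about 675 CFM, just above the 660 CFM requirement.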
15. Thermal Resistance Stack up Calculation
Heat Loss Through Package Top
Heat Loss Through The PCB
Most heat flows through the heat sink with effective heat sinking
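The split between the package-top path and the PCB path can be illustrated with a two-resistor network. This is a simplified sketch: it assumes both paths reject heat to air at the same temperature, and the case-to-air (0.3 C/W) and board-to-air (10 C/W) resistances are hypothetical. The power, Rjc, and Rjb values are those listed for ASIC #1 in the example components table.

```python
def two_resistor_split(power_w, r_jc, r_ca, r_jb, r_ba, t_air_c):
    """Junction temperature and fraction of heat leaving through the package top.

    Top path:   junction -> case -> heat sink -> air  (r_jc + r_ca)
    Board path: junction -> board -> air              (r_jb + r_ba)
    """
    r_top = r_jc + r_ca
    r_board = r_jb + r_ba
    r_parallel = r_top * r_board / (r_top + r_board)
    t_junction = t_air_c + power_w * r_parallel
    top_fraction = r_board / (r_top + r_board)
    return t_junction, top_fraction


# r_ca and r_ba are hypothetical; 55 C is the inlet air temperature used in the example.
tj, top_fraction = two_resistor_split(120.0, 0.1, 0.3, 3.5, 10.0, 55.0)
print(f"Tj = {tj:.1f} C, heat through top = {top_fraction:.0%}")
```

With a low-resistance heat sink path, nearly all of the heat leaves through the package top, which is why the stack-up calculation often neglects the board path.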
16. Heat Sink Selection
DTtotal = DTjc + DTth_interface + DTca
DTtotal = Total allowed junction temperature rise over the air temperature at the heat sink inlet
• Downstream components are allowed a lower DTtotal due to elevated inlet air temperature.
• For example, for a Tj limit of 115 C…
o The upstream component sees 55 C inlet air. Its total allowed junction temperature rise is 60 C.
o The last component in the lane sees a 78 C inlet temperature, so its allowed rise is only 37 C. The downstream component will therefore demand more heat sink surface area than the upstream component, even if the two are identical.
Assuming all heat flows through the heat sink…
• DTjc = Rjc * Power
• DTth_interface = Rth_interface * Power
• DTca = Rca_hs * Power
• Look for a heat sink whose resistance does not exceed the Rca_hs allowed by this budget.
[Figure: these 2 identical sets of ASICs may need different heat sinks. Image: http://en.wikipedia.org/wiki/Cisco_12000]
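Rearranging the budget above gives the largest heat sink resistance that still meets the junction limit. The sketch uses the 115 C limit and the 55 C and 78 C inlet temperatures from the example, with the 120 W and Rjc = 0.1 of ASIC #1; the interface resistance of 0.05 C/W is a hypothetical value.

```python
def required_heat_sink_resistance(tj_limit_c, inlet_air_c, power_w, r_jc, r_interface):
    """Largest allowable case-to-air resistance (C/W), assuming all heat goes through the sink.

    A result at or below zero means no heat sink can meet the limit at this power.
    """
    dt_total = tj_limit_c - inlet_air_c
    return dt_total / power_w - r_jc - r_interface


# r_interface = 0.05 C/W is an assumed value for illustration.
upstream = required_heat_sink_resistance(115.0, 55.0, 120.0, 0.1, 0.05)
downstream = required_heat_sink_resistance(115.0, 78.0, 120.0, 0.1, 0.05)
print(f"upstream Rca_hs <= {upstream:.3f} C/W")
print(f"downstream Rca_hs <= {downstream:.3f} C/W")
```

Under these assumptions the downstream part needs a heat sink with less than half the resistance of the upstream one, even though the two components are identical.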
18. TIM Classification
TIM type | Mounting pressure (PSI) | Thermal resistance range (K-sq.cm/W) | Comments | Application
Grease | 20-100 | 0.03-0.5 | Pump out, dry out, "messy" | High power components with mechanical heat sink mounting (>50 W)
Phase change materials | 20-100 | 0.05-0.5 | Pad form, easier to apply, lower susceptibility to pump out, no dry out |
Compressible metallic interconnects (indium heat-spring) | 100 | 0.03 | Highly sensitive to mounting pressure. Very smooth surface finish and planarity expected. | High power, mechanical mounting.
Elastomeric pads | ~100 | 0.7-5 | Gap filling for multi component heat sinks | Low power components with mechanical mounting (~10 W or less)
Adhesive tape | N/A | ~5 | | Low power components without mechanical mounting. Lightweight heat sinks
Thermally conducting adhesives | N/A | ~1 | Cure necessary, no need for mechanical mounting |
19. Impact of Thermal Interface in Thermal Budget
Power 100 W, package area ~16 sq. cm. TIM implications:
• Grease/PCM with Rth = 0.3 C-cm2/W at a mounting pressure of 50 PSI
o DT across TIM II = 1.86 C
o Sounds OK, but mounting load = 128 pounds
• Reducing the mounting area on the heat sink to 1 sq. in (6.5 cm2)
o DT across TIM II = 4.7 C
o High DT, but a mounting load of 50 pounds is probably safe.
• Ideally, the mounting load would be reduced without reducing the contact area.
o From that standpoint, PC material, whose thermal resistance is less sensitive to mounting pressure, is desirable.
• It is also important to ensure the surface finish and flatness required to achieve the thermal impedance published by the vendor.
[Figure: Rth (C-sq.cm/W, 0.1 to 1.0) vs mounting pressure (PSI, 20 to 80)]
• Total Rth = 0.45 C/W
• DT = 45 C for 100 W
• OK for an upstream component
• May be cutting it close if the heat sink sees air preheated by 10 to 15 C.
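The trade-off between TIM temperature drop and mounting load follows from two one-line formulas: DT = Power x Rth / Area and Load = Pressure x Area. The sketch below uses the slide's inputs (100 W, 0.3 C-cm2/W, 50 PSI) with contact areas of 16 cm2 and 6.5 cm2; because the slide's package area is only approximate ("~16 sq. cm"), the results land close to, but not exactly on, the quoted figures.

```python
CM2_PER_IN2 = 6.4516  # square centimetres per square inch


def tim_temperature_rise(power_w, r_area_c_cm2_per_w, contact_area_cm2):
    """Temperature drop across the TIM for an area-specific resistance."""
    return power_w * r_area_c_cm2_per_w / contact_area_cm2


def mounting_load_lb(pressure_psi, contact_area_cm2):
    """Clamping force in pounds needed to reach pressure_psi over the contact area."""
    return pressure_psi * contact_area_cm2 / CM2_PER_IN2


for area_cm2 in (16.0, 6.5):
    dt = tim_temperature_rise(100.0, 0.3, area_cm2)
    load = mounting_load_lb(50.0, area_cm2)
    print(f"{area_cm2} cm2: DT = {dt:.2f} C, load = {load:.0f} lb")
```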
21. Tasks Accomplished by Detailed Thermal Simulation
System/Sub System Level Characterization
Effect of localized temperature and air flow distribution
Sensitivity to parameters
• It is a valuable exercise to understand sensitivity to design parameters or random variabilities.
Heat sink optimization
Component temperature prediction for screening designs prior to prototype and testing.
Weight and pressure loss reduction using reduced fin density on upstream components
Higher fin density for downstream components
Cu, Al, or VC heat sink?
Fiber Channel component cooling
22. System Impedance Characterization Using Simulation
Why?
• System impedance for fan selection is an early stage activity.
• Prototypes may not be available at this stage
[Figure: system design and its system impedance model; system impedance curve plotted together with the effective fan curve. Image: http://en.wikipedia.org/wiki/Ciena_Optical_Multiservice_Edge_6500]
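Once simulation has produced a handful of flow and pressure drop points, they are often condensed into a single curve for comparison against fan data. The sketch below assumes a purely quadratic law, DP = k * Q^2, which is a common approximation rather than something stated in the slides, and fits k by least squares to hypothetical simulation results.

```python
def fit_quadratic_impedance(flows, pressure_drops):
    """Least-squares coefficient k for DP = k * Q**2 (curve forced through the origin)."""
    numerator = sum(dp * q ** 2 for q, dp in zip(flows, pressure_drops))
    denominator = sum(q ** 4 for q in flows)
    return numerator / denominator


def system_pressure_drop(k, flow):
    """Pressure drop predicted by the fitted quadratic law."""
    return k * flow ** 2


# Hypothetical simulation output: flow in CFM, pressure drop in inches of water.
flows = [200.0, 400.0, 600.0, 800.0]
drops = [0.04, 0.16, 0.36, 0.64]
k = fit_quadratic_impedance(flows, drops)
print(k, system_pressure_drop(k, 660.0))
```

If the simulated points deviate noticeably from a quadratic, a tabulated curve with interpolation is the safer representation.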
24. VC? Cu or Al for Heat sink Base?
One of the ASIC BGAs represented in detail with a power map (non-uniform heat distribution) on the die
[Figure: simulation results for VC base, Cu base, and Al base]
25. Fiber Channel Transceiver Cooling
Transceiver cooling challenges
• Low thermal margin (Tcase_max = 70 to 85 C)
• There may be very little headroom for a heat sink due to mezzanine board and double sided configurations
• An overhanging heat sink arrangement may be necessary to create fin area.
[Figure: limited headroom above the transceiver component; extended heat sink base to create additional fin area]
Images: http://de.wikipedia.org/wiki/Gigabit_Interface_Converter, http://en.wikipedia.org/wiki/Avaya_ERS_5600_Series, http://electronicdesign.com/boards/designing-sfp-cage-cooling-systems
26. Fiber Channel Cooling – Air Flow Limit
Air Temperature = 75C
Consider reducing fin density to allow more air through SFP heat sink
28. Hardware and PM level Controls
Many thermal challenges can be overcome with layout changes that are consistent with performance requirements.
• Requires Thermal - Mechanical - Hardware collaboration
Alternative packaging for the overheating IC
• Copper lidded BGA vs MC encapsulation
Transceiver components have commercial and industrial specs (the industrial spec allows a higher case temperature)
• http://www.cisco.com/c/en/us/products/collateral/interfaces-modules/gigabit-ethernet-gbic-sfp-modules/product_data_sheet0900aecd8033f885.html
Some component powers may be temporary spikes
• Design for temporary operation at that power (transient model), rather than extended periods.
Components may also be allowed to operate at a higher Tj for short periods.
Fan operating voltage
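For a temporary power spike, a first estimate of how long the component can run before reaching its limit can come from a single lumped thermal resistance and capacitance. This is a deliberately crude stand-in for the transient model mentioned above; every number in the example (resistance, heat capacity, powers, limit) is hypothetical.

```python
import math


def lumped_temperature(t_s, power_w, r_c_per_w, c_j_per_c, t_air_c, t_start_c):
    """First-order (single RC) response to a power step applied at t = 0."""
    t_steady = t_air_c + power_w * r_c_per_w
    tau = r_c_per_w * c_j_per_c
    return t_steady + (t_start_c - t_steady) * math.exp(-t_s / tau)


def time_to_limit(limit_c, power_w, r_c_per_w, c_j_per_c, t_air_c, t_start_c):
    """Seconds until the limit is reached, or None if it is never reached."""
    t_steady = t_air_c + power_w * r_c_per_w
    if t_steady <= limit_c:
        return None
    if t_start_c >= limit_c:
        return 0.0
    tau = r_c_per_w * c_j_per_c
    return tau * math.log((t_steady - t_start_c) / (t_steady - limit_c))


# Hypothetical case: 40 W nominal (95 C steady state) spiking to 60 W, limit 110 C.
r, c, t_air = 1.0, 100.0, 55.0
t_start = t_air + 40.0 * r
print(time_to_limit(110.0, 60.0, r, c, t_air, t_start))   # about 138.6 s
```

If the spike is shorter than this time, the steady-state design point can be the nominal power rather than the peak.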
31. Why Test TIM and Heat Sink
TIM test
• Bond Line Thickness (BLT) is a function of mounting pressure
o Mounting pressure is not uniform, hence BLT is non-uniform.
o Thermal impedance is a function of BLT
• Pump out and voids cannot be modeled. They need to be tested on multiple samples.
• Performance variation due to surface flatness and finish characteristics
Heat sink test
• Vapor chamber base behavior can vary significantly from part to part and can be a function of heat concentration.
• Need to understand variability, % rejects, etc.
o A compromised VC base is worse than a simple metal base
o May need to test every part at manufacturer or at
Image: http://en.wikipedia.org/wiki/Heat_sink
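The dependence of thermal impedance on BLT can be illustrated with the usual bulk-plus-contact expression, R = BLT / k + R_contact. The conductivity and thicknesses below are hypothetical, and the expression captures none of the pump out, void, or flatness effects that the tests above are meant to reveal.

```python
def tim_area_resistance(blt_um, k_w_per_mk, contact_c_cm2_per_w=0.0):
    """Area-specific TIM resistance in C-cm2/W: bulk term BLT/k plus contact terms."""
    blt_m = blt_um * 1e-6
    bulk_c_m2_per_w = blt_m / k_w_per_mk
    return bulk_c_m2_per_w * 1e4 + contact_c_cm2_per_w   # 1 m2 = 1e4 cm2


# Hypothetical TIM with k = 3 W/m-K at two bond line thicknesses.
for blt_um in (50.0, 100.0):
    print(blt_um, "um ->", round(tim_area_resistance(blt_um, 3.0), 3), "C-cm2/W")
```

Doubling the bond line doubles the bulk term, which is why non-uniform mounting pressure shows up directly as non-uniform interface resistance.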
33. Prototype Testing
Differences between vendor measured fan curves and in-situ fan curves
Actual power generated.
Actual power distribution on the chip
Variabilities in actual component surface finish & flatness, and the associated non-uniformity in interface thermal resistance.
[Figure: system test setup with a pressure tap to ensure zero back pressure and a blower assist to eliminate back pressure]
35. Documentation
The next project usually starts from where the previous one ended.
Documentation should be accessible from a corporate-wide knowledge management system with appropriate access privileges.
The following details need to be captured in documentation.
• Component table containing:
o Powers, predicted and tested temperatures, and spec limit
o Heat sink part # along with a reference to the mechanical engineering documentation for the mounting design.
o TIM part #
• Potential risks along with recommendations.
o Example: “High Theta JC on component #. Any more dissipation than specified can cause the cooling solution to be insufficient. Recommend advanced packaging option for next generation ASIC”
• Test methods used. If it is possible to keep tested parts, the location where they are stored in the lab.
o Suggested improvements in test methodology for next time.
• Models Repository:
o Naming convention and location where all thermal models and calculations are stored
o Ideally fully solved data to avoid the need to re-run simulations.
o Keep only the final models for which results are reported. Clean up everything else!
o There may be some interesting findings in the sensitivity studies performed. Keep them in appropriately named folders.
o Suggested improvements in modeling methodology for next time.
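One way to keep the component table machine-readable is a small record type per component. The field names, part numbers, and temperatures below are hypothetical; only the list of items to capture comes from the checklist above.

```python
from dataclasses import dataclass


@dataclass
class ComponentRecord:
    name: str
    power_w: float
    spec_limit_c: float
    predicted_c: float
    tested_c: float
    heat_sink_part: str
    tim_part: str

    def margin_c(self):
        """Margin to the spec limit, based on the worse of predicted and tested."""
        return self.spec_limit_c - max(self.predicted_c, self.tested_c)

    def model_error_c(self):
        """Predicted minus tested temperature; useful for improving the next model."""
        return self.predicted_c - self.tested_c


# Hypothetical entry; part numbers and temperatures are placeholders.
record = ComponentRecord("ASIC #1", 120.0, 115.0, 104.0, 107.5, "HS-0001", "TIM-0001")
print(record.margin_c(), record.model_error_c())   # 7.5 -3.5
```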