New solutions for wireless infrastructure applications
1. New Solutions for Wireless Infrastructure Applications
May 2, 2012
Moshe Anschel
DSP System & Architecture Manager
Freescale
2. Agenda
• The wireless baseband market trends and requirements
• Freescale Approach: QorIQ Qonverge B4860 overview
• StarCore SC3900 Flexible Vector Processor architecture
3. Macro Base Station Challenges
Connectivity
• Coverage: Urban, highways and rural
• Spectral efficiency: Radio and network performance
• Multi-standard: Supports variety of users
• Reliability: Zero down time
Capacity
• Users: Hundreds of active users
• Throughputs: Over 1Gbps data rate
• Scalable/Modular: Sectors, antennas, users…
• Active Antenna, MIMO: Improved QoS
Lowering Costs
• Space: Miniaturization and consolidation of equipment
• Low Impact: Power & Cost
• Future Proof: Easy upgrades, SDR
• Complete solutions: Ease of development, faster time to market
[Figure labels: High Throughputs & Coverage; Multi Standard & SDR; Many Active Users; Cost, Energy Efficiency]
4. Industry Flagship for Performance, Power and Cost
B4860 delivers the highest performance in the industry through intelligent, balanced
integration with a focus on cost and power efficiency
• Optimal System Cost – industry-leading levels of integration, drastically reducing chip count and component cost
• Performance Optimized – offering a leap in performance with efficient, high-performance next generation of our field proven DSP & MPU cores as well as enhanced application specific accelerators
• Delivers on Scalability – a common architecture from femto to macro providing vertical and horizontal scalability; allows customers to leverage both software and hardware architectures
• Power Efficiency – SoC solution allows for intelligent load balancing and power management
5. Benefit of Intelligent Integration
[Figure: block diagrams comparing a 3 sector, 20 MHz LTE design with 5 major components against a 3 sector, 20 MHz LTE design on a single B4860 SoC. Diagram labels include CPRI, Antenna, Layer-1 DSP, Layer-2/3 Transport, Maint. & Control, Multicore MPU, sRIO Switch, Back Haul PHY (10 Gbps, 1 Gbps GE), I2C, UART, SPI, DDR and Flash.]
• 4X Cost Reduction
• 3X Power Reduction
6. QorIQ Qonverge B4860 – Block Diagram & Benefits
• Next generation, e6500 Dual-Thread
Power Architecture® cores offer
highest CoreMark/Watt with AltiVec
technology for dramatic L2
scheduling acceleration
• Next generation, SC3900
StarCore™ provides 2x DSP
performance compared to
competitive offerings
• Above 21GHz of Programmable
Performance
• Smart hardware acceleration for
Layer 1, 2, Control and Transport
allows for best in class performance,
power and cost
• Large scale SoC integration allows
for simpler programming models and
easier load balancing
• Integrated, Rich I/O including
backhaul & antenna interfaces
provides flexibility, interoperability
and reduces overall system cost
7. StarCore SC3900 - Flexible Vector Processors
• StarCore SC3850 DSP is used
in many base stations
powered by the MSC815x
family
• StarCore SC3900 is targeted
to handle future base station
requirements and challenges
• SC3900 architecture is
presented next
8. SC3900 Core & Clusters
StarCore SC3900 FVP Clusters
• Six SC3900 Cores
• Clustering two SC3900 under a 2MB, multi-banked L2 cache
• High bandwidth accelerator ports (up to 1Tbps per cluster)
• Hardware support for memory coherency between L1, L2 caches and the main memory
[Figure: cluster block diagram. Two SC3900 FVP Cores with 32K blocks, a High Speed Baseband Accelerators Interface, a 2MB 16-way Shared L2 Cache with 4 Banks, and the CoreNet Coherent Fabric.]
BDTI recently benchmarked the SC3900 core included in the Freescale B4860. Running at 1.2 GHz, the SC3900 core received a BDTIsimMark2000™ score of 37,460 – the highest speed score recorded. See www.BDTI.com for details
[Figure: bar chart of BDTI speed scores. Freescale SC3900 at 1.2GHz: 37,460 (BDTIsimMark2000™), marked "Highest BDTI Speed Score". Texas Instruments C66x at 1.5GHz: 20,030 (BDTImark2000™).]
9. SC3900 Optimized for Baseband L1 Processing
• SC3900 is optimized to efficiently handle Baseband PHY
Layer processing
• PHY layer processing can be divided into three
categories:
– Computation intensive DSP code (mainly MAC intensive)
– Data manipulation and less intensive DSP code
– Control code
• Each of the categories accounts for a non-negligible share of the
processing requirements
• There is no clear boundary between the categories
• SC3900 accelerates all types of Baseband L1 processing
10. Computation Intensive DSP Code Acceleration
• SC3900 provides Vector processor capability by
increasing the execution units and optimizing the
whole datapath accordingly
– Up to 32 MACs per cycle (4x versus SC3850)
– Optimized register file and memory throughput
• SC3900 optimized datapath leads to high MAC
utilization
• Performance:
– SC3900 is 3.5x-4x better than SC3850 in intensive DSP
code
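As an illustration of what "MAC intensive" code looks like, here is a minimal sketch of one complex FIR output in portable C. The types `cplx16` and `cplx_acc` and the function `cfir_output` are hypothetical names made up for this sketch; they are not part of any Freescale library, and the code says nothing about how an SC3900 compiler would schedule the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for this sketch: 16-bit complex samples, wide accumulators. */
typedef struct { int16_t re, im; } cplx16;
typedef struct { int64_t re, im; } cplx_acc;

/* One output of a complex FIR: y[n] = sum over k of h[k] * x[n - k].
 * x_newest points at x[n]; the caller guarantees that taps - 1 older
 * samples are stored immediately before it in memory. */
cplx_acc cfir_output(const cplx16 *x_newest, const cplx16 *h, size_t taps)
{
    cplx_acc acc = {0, 0};
    for (size_t k = 0; k < taps; ++k) {
        const cplx16 xs = *(x_newest - k);
        /* Each complex tap costs four real multiply-accumulates. */
        acc.re += (int64_t)h[k].re * xs.re - (int64_t)h[k].im * xs.im;
        acc.im += (int64_t)h[k].re * xs.im + (int64_t)h[k].im * xs.re;
    }
    return acc;
}
```

Since each complex tap needs four real MACs, a datapath that sustains 32 MACs per cycle could retire at most 8 such taps per cycle. That is an idealized upper bound from the arithmetic alone, not a measured SC3900 figure.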
11. L1 Processing - Data Manipulation Acceleration
• “Data manipulation” covers many different functions in Baseband Layer 1, for example:
– Data preparation before/after intensive kernels
• Ex: data re-ordering, matrix transpose, pack/unpack
– Less regular kernels or serial/cyclic kernels with low parallelism
• Ex: QR Decomposition, Interleaver, encoder.
• SC3900 architecture addresses “Data manipulation” by different means:
– Datapath flexibility: This is the “Flexible Vector Processor” essence
• Register file flexibility: Each unit can read/write any register
• Execution unit flexibility: Each unit can run different and independent instructions
– Rich and flexible instruction set
• Efficient instruction set with broad support for different data types and sizes
• New powerful data manipulation specific instructions
• Performance:
– SC3900 is 2x-3x better than SC3850 in “Data Manipulation”
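To make the "data preparation" examples concrete, the following sketch shows a matrix transpose and an I/Q pack/unpack in portable C. The function names and the choice of putting I in the upper 16 bits are assumptions for illustration only; the slides do not specify a data layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Transpose a rows x cols row-major matrix; dst must hold cols x rows elements
 * and must not overlap src. */
void transpose_i16(const int16_t *src, int16_t *dst, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            dst[c * rows + r] = src[r * cols + c];
}

/* Pack an I/Q pair into one 32-bit word: I in the upper half, Q in the lower
 * half (an assumed layout for this sketch). */
uint32_t pack_iq(int16_t i, int16_t q)
{
    return ((uint32_t)(uint16_t)i << 16) | (uint32_t)(uint16_t)q;
}

/* Inverse of pack_iq. Converting an out-of-range uint16_t back to int16_t is
 * implementation-defined before C23; mainstream compilers wrap as expected. */
void unpack_iq(uint32_t w, int16_t *i, int16_t *q)
{
    *i = (int16_t)(uint16_t)(w >> 16);
    *q = (int16_t)(uint16_t)(w & 0xFFFFu);
}
```

None of these loops contains a multiply, which is why raw MAC count alone does not speed them up; they depend on register access and memory access flexibility instead.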
12. Data Manipulation Acceleration - Flexible Datapath
• Unlike a traditional vector processor, the SC3900 datapath is
flexible:
– Flexible execution units:
• 4 independent units, each capable of 8-way SIMD
• Each unit can run different and independent instructions
– Flexible register files:
• Registers are not defined as long vectors hundreds of bits wide, but as scalars that
can be accessed by any execution unit (read and write)
[Figure: two register access models side by side. Traditional vector processor model: MAC units over register groups A0-A3, B0-B3, C0-C3, where execution unit #n can only read/write registers #n. SC3900 flexible model: MAC, ADD, SHIFT and CMP units, where every execution unit can read/write every register.]
13. L1 Processing - Control Code Efficiency
• One of the SC3900 goals is to improve control code efficiency
– L1 control functions are tightly integrated with the Arithmetic
intensive SW
– Useful for running scheduling functions that are control intensive
• Control code performance is affected by two main aspects:
– Core and Compiler efficiency in typical control code constructs
– Memory system efficiency
• Both have been addressed on the SC3900, e.g.:
– Ability to flatten decision trees using multiple predicates
– Full support for non-aligned memory access without penalty
– Larger, clustered 2MB L2 cache to keep the program close to the core
• Performance:
– SC3900 is up to 1.5x better than SC3850 in control processing
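The idea of flattening a decision tree with multiple predicates can be sketched in portable C. Both functions below are hypothetical examples written for this note; whether a given compiler maps the flat version to predicated instructions on the SC3900 is not stated in the slides.

```c
#include <stdint.h>

/* Branchy version: a nested decision tree that sorts v into one of four bins. */
int bin_branchy(int32_t v, int32_t t0, int32_t t1, int32_t t2)
{
    if (v < t1) {
        if (v < t0)
            return 0;
        return 1;
    }
    if (v < t2)
        return 2;
    return 3;
}

/* Flattened version: evaluate every predicate independently, then combine.
 * Gives the same result as bin_branchy when t0 <= t1 <= t2. */
int bin_flat(int32_t v, int32_t t0, int32_t t1, int32_t t2)
{
    const int p0 = (v >= t0);   /* each comparison yields 0 or 1 */
    const int p1 = (v >= t1);
    const int p2 = (v >= t2);
    return p0 + p1 + p2;
}
```

The flat form has no data-dependent branches, so the three comparisons have no dependencies on each other and can be evaluated in parallel when the hardware offers several predicates.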
14. Summary & Conclusion
• Three 20 MHz sectors of LTE base station in a
single SoC, supporting multiple standards and
multimode operation for macro base stations
• Complete baseband solution, integrates L1, L2,
Control and Transport baseband processing from
backhaul network to antenna Interface
• StarCore SC3900 is a key technology providing
the processing efficiency and flexibility for
PHY layer processing (computation intensive DSP code,
data manipulation and less intensive DSP code, and
control code) on the B4860 SoC
Editor's notes
DPAA - Any packet to any CPU to any accelerator or network interface without locks or semaphores
FFT, Matrix/vector mult, Complex FIR, Correlation
On the contrary, TI C66 is increasing only the execution units, causing a memory bandwidth bottleneck and register pressure. This leads to low utilization of the execution units and lower performance.