May 1, 2013
An Innovative Multicore System
Architecture for Wireless SoCs
Alon Yaakov
DSP Architecture Manager, CEVA
Multicore in Embedded System
Defining the Problem
• Control-plane
– Synchronization between cores
– Semaphores
– Message passing using mailbox mechanism
– Snooping mechanism
– Interrupt handling
• Data-plane
– Antenna processing
– Equalization
– Error correction
(The data plane will be the focus of today’s presentation)
Outline
• Multicore Challenges
• The CEVA-XC Solution
Multicore Challenges
1. Partitioning
> Task partitioning onto different chip resources
> Data partitioning onto different chip resources
2. Resource sharing
> Memories, buses, system I/Fs, peripherals, etc.
3. Scheduling
> Allocating tasks/data
4. Data sharing
> Transferring data between engines
[Figure: an application to be mapped onto DSP A, DSP B and DSP C and the CTC, MLD and FFT blocks]
Challenge 1: Task Partitioning
• Tasks
– Parts of an algorithm running in a sequential order
– A task must have a defined input and output data structure (packets)
[Figure: receive chain (antenna processing, equalization, error correction) drawn as task blocks: FFT, channel estimation, MLD, reordering, interleaver, CTC, concatenation & CRC checker; legend: task, data]
Challenge 1: Task Partitioning
HW Offloading
• Parts of the algorithm are better suited to HW acceleration
– Well-known algorithms that require little programmability
– Heavy computational effort
Challenge 1: Data Partitioning
• Several cores are used to process different
input data packets
• Suitable for homogeneous systems
• Shared memory is used for storing history data
– Core must wait for data to update before using it
→ Latency
• The entire program code is used by all cores
– Core suffers stall cycles if L1 memory is small
Challenge 1: Partitioning
OK, Now What?
• Efficient partitioning is dependent on the
hardware platform
• Building the optimal system depends on the
partitioning
• There is no single optimal solution
– Each approach has its merits
• Partitioning can be eased by starting with a
reference that can be used as a basis
Challenge 2: Resource Sharing
• Resource types
– DSP cores
– HW accelerators
– Memories
– Buses
– DMA
• Resource sharing creates contention
Challenge 2: Resource Sharing
Avoiding Contentions
• If possible, avoid contention by duplicating HW
– Multiple DMAs
– Duplicated HW accelerators
– Multilayer bus
– Partition memory into blocks, enabling concurrent access
• Throughput and latency requirements govern the minimum amount of hardware resources
Challenge 2: Resource Sharing
Arbitration
• When a simple set of known rules can be defined, a resource can be shared using a HW arbiter
• QoS
– Priority
– Bandwidth allocation (weight)
– Well-known algorithms (e.g. round robin)
• Arbitration is based on time sharing of resources
→ Scheduling
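As an illustration of the arbitration slide, the following is a minimal software sketch of a weighted round-robin arbiter for one shared resource. It is not taken from the presentation: the class name, the requester names and the policy (each requester may receive up to `weight` consecutive grants per round) are assumptions chosen only to show how weight-based bandwidth allocation and round robin can combine.

```python
from collections import deque


class WeightedRoundRobinArbiter:
    """Grants one shared resource to at most one requester per cycle.

    Hypothetical policy: requesters are visited in a fixed rotation and
    each may receive up to `weight` consecutive grants before the
    arbiter moves on to the next requester.
    """

    def __init__(self, weights):
        self.weights = dict(weights)
        self.order = deque(self.weights)          # rotation order
        self.credit = self.weights[self.order[0]]  # grants left for the head

    def grant(self, requests):
        """Return the requester granted this cycle, or None if nobody asks."""
        # n rotations bring us back to the starting requester with fresh
        # credit, so n + 1 checks are enough to visit everyone once.
        for _ in range(len(self.order) + 1):
            current = self.order[0]
            if current in requests and self.credit > 0:
                self.credit -= 1
                return current
            self.order.rotate(-1)
            self.credit = self.weights[self.order[0]]
        return None


arbiter = WeightedRoundRobinArbiter({"DSP A": 2, "DSP B": 1})
grants = [arbiter.grant({"DSP A", "DSP B"}) for _ in range(6)]
```

With both cores requesting on every cycle, DSP A receives two grants for each grant given to DSP B, which is the bandwidth split its weight asks for.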
Challenge 3: Scheduling
• How do we assign and schedule tasks to
cores?
Challenge 3: Scheduling
Static Scheduling
• Tasks are statically assigned to DSP cores
• Design phase includes task scheduling
– Data flow is fixed
– Suitable when the load on each task is fixed
[Figure: the task graph statically mapped onto DSP A, DSP B, DSP C and the FFT, MLD and CTC HW cores]
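Static scheduling can be pictured as a table fixed at design time. The sketch below is illustrative only: the task order and the task-to-resource mapping are assumptions made for the example (the slide's figure does not survive in this transcript), and `build_schedule` is a hypothetical helper.

```python
# Assumed processing order of the receive chain (illustrative only).
PIPELINE = [
    "FFT",
    "Ch estimation",
    "MLD",
    "Reorder",
    "Interleaver",
    "CTC",
    "Concatenation & CRC checker",
]

# Hypothetical design-time mapping of each task to one resource.
STATIC_ASSIGNMENT = {
    "FFT": "FFT HW core",
    "Ch estimation": "DSP A",
    "MLD": "MLD HW core",
    "Reorder": "DSP B",
    "Interleaver": "DSP B",
    "CTC": "CTC HW core",
    "Concatenation & CRC checker": "DSP C",
}


def build_schedule(pipeline, assignment):
    """Return, per resource, the tasks it runs in pipeline order."""
    schedule = {}
    for task in pipeline:
        schedule.setdefault(assignment[task], []).append(task)
    return schedule


schedule = build_schedule(PIPELINE, STATIC_ASSIGNMENT)
```

Because the table never changes at run time, the data flow between resources is also fixed, which is why this approach fits workloads where the load of each task is constant.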
Challenge 3: Scheduling
Dynamic Scheduling
[Figure: a MASTER (scheduler) dispatching the task graph to DSP A, DSP B, DSP C and the CTC, MLD and FFT blocks]
> A scheduler dynamically assigns tasks to cores
> The scheduler algorithm selects the best core to execute each task, based on:
> Processing capabilities
> Locality of data
> Load balance
> Suitable for complex designs with variable processing load and QoS
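The three selection criteria can be sketched as a small core-selection function. This is a hedged illustration, not the scheduler described in the talk: the `Core` record, the `pick_core` function and the tie-breaking policy (capability is mandatory, data locality is preferred, then the lowest load wins) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    capabilities: frozenset                 # kinds of task this core can run
    load: int = 0                           # pending work, arbitrary units
    local_buffers: set = field(default_factory=set)


def pick_core(cores, task_kind, input_buffer, cost):
    """Choose a core for a task and account for the added load.

    Hypothetical policy: only capable cores are candidates; a core that
    already holds the input buffer is preferred; remaining ties are
    broken by the lowest current load.
    """
    candidates = [c for c in cores if task_kind in c.capabilities]
    if not candidates:
        raise ValueError(f"no core can run task kind {task_kind!r}")
    best = min(
        candidates,
        key=lambda c: (input_buffer not in c.local_buffers, c.load),
    )
    best.load += cost
    return best


cores = [
    Core("DSP A", frozenset({"fft", "chest"}), load=5, local_buffers={"buf0"}),
    Core("DSP B", frozenset({"fft"}), load=0),
    Core("DSP C", frozenset({"ctc"})),
]
first = pick_core(cores, "fft", "buf0", cost=1)   # data is local to DSP A
second = pick_core(cores, "fft", "buf9", cost=1)  # no locality, lowest load wins
```

A real scheduler would weigh locality against load instead of ranking them strictly; the strict ordering here only keeps the example short.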
Challenge 4: Data Sharing
Memory Hierarchy
• Internal L1 memory
– Fast memory with no access penalty
– Small / medium size
– Dedicated per core
• External memory
– Can be on-chip (L2) or off-chip (e.g. DDR)
– Slow memory with access penalty
– Large size
– Shared among several cores
– Contentions
Challenge 4: Data Sharing
Using Cache
• When shared data is used, a cache system can be
used to reduce the stall count
– Statistically reduces memory stalls, but not
deterministic
• Used only for accessing narrow data width
– Cache should be used for control data
– Not recommended for vector DSP data flow
• Large caches
• Many stall cycles
→ How to share vector data?
Challenge 4: Data Sharing
Pre-Fetching Data
• A task cannot start until its preceding task completes
• If we know which task will be executed next, we can pre-fetch its input data
– With static scheduling, the data flow is known
– With dynamic scheduling, the scheduler must handle the data move prior to activating a task
[Figure: the task graph with a DMA transfer between consecutive tasks]
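Pre-fetching is commonly implemented with two buffers used in ping-pong fashion: while the core works on one buffer, the next input is moved into the other. The sketch below simulates that order of events sequentially. The function name and the event log are assumptions for illustration, and the plain assignment stands in for a DMA transfer that would run in parallel on real hardware.

```python
def run_with_prefetch(packets, process):
    """Process packets in order using two buffers (ping-pong).

    Returns the results plus a log of ("dma", i) and ("run", i) events
    showing that packet i + 1 is fetched before packet i is processed.
    """
    buffers = [None, None]
    results, log = [], []
    if not packets:
        return results, log

    buffers[0] = packets[0]          # initial fetch, nothing to overlap with
    log.append(("dma", 0))

    for i in range(len(packets)):
        current, following = i % 2, (i + 1) % 2
        if i + 1 < len(packets):
            # Stand-in for a DMA transfer started before the core begins
            # processing, so that it overlaps with the computation.
            buffers[following] = packets[i + 1]
            log.append(("dma", i + 1))
        results.append(process(buffers[current]))
        log.append(("run", i))
    return results, log


results, log = run_with_prefetch([1, 2, 3], lambda sample: sample * 2)
```

The fetch for packet i + 1 is always issued before the core starts on packet i, so the input of the next task is already in place when the current task completes.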
Challenge 4: Data Sharing
Pre-Fetching using DMA
• DMA transfer must wait for the following conditions:
– Source data is available
– Destination data can be written (i.e. allocated memory is free)
• DMA activation schemes
– Real-Time SW → Programmable, large MIPS overhead
– HW system events → Not programmable
– Queue manager → Programmable, no MIPS overhead
Challenge 4: Data Sharing
Pre-Fetching using DMA with Queue Manager
• A Queue is a list of tasks handled in a FIFO manner
• Each DMA queue contains all DMA tasks related to a data flow channel
• DMA tasks are pushed to the queue by:
– DSP software (i.e. static scheduling)
– The system scheduler (i.e. dynamic scheduling)
• Tasks are automatically activated using HW or SW events
– Source data is available & destination memory is free
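The queue behaviour described on this slide can be sketched as follows. This is a software model under stated assumptions, not the hardware queue manager itself: `DmaTask`, `DmaQueue`, `on_event` and the buffer names are hypothetical, and readiness is modelled as plain sets of buffer names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DmaTask:
    src: str   # name of the source buffer
    dst: str   # name of the destination buffer


class DmaQueue:
    """FIFO of DMA tasks for one data flow channel."""

    def __init__(self):
        self.tasks = deque()

    def push(self, task):
        """Called by DSP software or by a system scheduler."""
        self.tasks.append(task)

    def on_event(self, ready_sources, free_destinations):
        """Start queued transfers, in FIFO order, whose conditions hold.

        A task starts only when its source data is available and its
        destination memory is free. A blocked head blocks the queue.
        """
        ready = set(ready_sources)
        free = set(free_destinations)
        started = []
        while self.tasks:
            head = self.tasks[0]
            if head.src not in ready or head.dst not in free:
                break
            started.append(self.tasks.popleft())
            ready.discard(head.src)   # source data has been consumed
            free.discard(head.dst)    # destination is now occupied
        return started


queue = DmaQueue()
queue.push(DmaTask(src="fft_out", dst="chest_in"))
queue.push(DmaTask(src="chest_out", dst="mld_in"))
blocked = queue.on_event({"chest_out"}, {"chest_in", "mld_in"})
started = queue.on_event({"fft_out", "chest_out"}, {"chest_in", "mld_in"})
```

In the first call nothing starts, because the task at the head of the queue is still waiting for its source even though the second task is ready. That is the FIFO ordering the slide describes.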
Outline
• Multicore Challenges
• The CEVA-XC Solution
CEVA-XC4000 Multicore Solution
[Figure: block diagram; recoverable labels: optional cache controller, ACE]
MUST™ Multi-core System Technology
Overview
• Fully featured data cache
– Non-blocking, software operations, Write-Back &
Write-Through
• Advanced support for cache coherency
– Based on ARM’s leading AMBA-4 ACE™ technology
• Advanced system interconnect
– AXI-4 - easy system integration and high Quality of
Service (QoS)
– Multi-layer FIC (Fast Inter-Connect) - low latency, high
throughput master and slave ports
– Multi-level memory architecture using local TCMs and
hierarchy of caches
MUST™ Multicore System Technology
– Cont.
• Data Traffic Manager
– Automated data traffic management without DSP intervention
• Comprehensive software development support
– Advanced multicore debug and profiling
– Complete system emulation with real hardware
– Hardware abstraction layer (HAL) including drivers and system APIs
• Support for homogeneous and heterogeneous clusters of
multiple DSPs and CPUs
– Support for advanced resource management and sharing
– Flexible task scheduling for different system architectures: dynamic,
event based, data driven, etc.
Cache Coherency Support
> Allows multiple cores to use shared memory without any software intervention
> Superior performance compared with SW coherency
> Simplifies software development
> Easy SW partitioning and scaling from single core to multi-core
> External memory can be dynamically partitioned into shared and unique areas
> Minimizes system memory size
> Flexible memory allocation speeds up SW development
> Snooping is applied only to shared areas
Data Traffic Management
– Data Traffic Manager
– Based on Queue Manager and Buffer Manager Structures
• Queue Manager - Maintains multiple independent queues of “tasks”
• Buffer Manager - Autonomously tracks the data status of source and destination buffers
– Data transfers are automatically managed based on task status and on the load of the input and output data buffers
– Automatic data traffic management and DSP offloading
– Prioritized scheduling for guaranteed QoS
– Low latency packet transfers without software intervention
– Results in lower memory consumption and improved system
performance
Data Traffic Manager
– Allows sharing a resource among multiple cores via a shared queue
• Tasks are executed based on priority and buffer status
• Prevents starvation and deadlocks
– Allows a single core to work with multiple queues
• The core reads from and writes to its buffers (which can be local or external)
• All data transfers between cores and accelerators are performed automatically via the data traffic manager
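A shared queue that serves tasks by priority and buffer status can be sketched as below. This is only an illustration of the idea, not CEVA's Data Traffic Manager: `SharedTaskQueue`, its method names and the buffer names are hypothetical. The sketch also omits the starvation and deadlock prevention that the slide mentions.

```python
import heapq
import itertools


class SharedTaskQueue:
    """Lets several cores share one resource through a single queue."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()   # keeps FIFO order among equals

    def submit(self, core, priority, src_buf, dst_buf):
        """Queue a transfer; a lower number means a higher priority."""
        entry = (priority, next(self._sequence), core, src_buf, dst_buf)
        heapq.heappush(self._heap, entry)

    def next_task(self, full_buffers, empty_buffers):
        """Pop the highest-priority task whose buffers are ready.

        A task is ready when its source buffer is full and its
        destination buffer is empty. Tasks that are not ready stay queued.
        """
        skipped, chosen = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            _, _, core, src, dst = entry
            if src in full_buffers and dst in empty_buffers:
                chosen = (core, src, dst)
                break
            skipped.append(entry)
        for entry in skipped:
            heapq.heappush(self._heap, entry)
        return chosen


shared = SharedTaskQueue()
shared.submit("DSP A", priority=1, src_buf="a_out", dst_buf="fft_in")
shared.submit("DSP B", priority=0, src_buf="b_out", dst_buf="fft_in")
first_task = shared.next_task({"a_out"}, {"fft_in"})
second_task = shared.next_task({"a_out", "b_out"}, {"fft_in"})
```

DSP B has the higher priority, but its source buffer is not yet full at the first call, so the task from DSP A is served first.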
Dynamic Scheduling
– Dynamic scheduling in symmetric systems
– A clustered system based on homogeneous DSP cores
– Dynamic task allocation to DSP cores at run time
– Flexible choice of algorithms
based on system load
– Hardware abstraction using
task oriented APIs
– Shared external memories
– FIC interface for low-latency
high-bandwidth data accesses
• Commonly used in wireless
infrastructure applications
MUST™ Hardware Abstraction Layer
(HAL)
• MUST™ is assisted by user-friendly software support
– Abstracts the queues, buffers, DMA and caches
• The software package includes:
– Drivers and APIs
– Full system profiling
– Graphical interface via CEVA ToolBox
Multicore Modeling and Simulation
• Simulating any number of cores
– Support for symmetric and asymmetric configurations
• Support for ARM CADI (Cycle Accurate Debug Interface)
– Including connectivity to ARM’s RealView debugger
• Comprehensive multi-core simulation support
– Synchronization, system browsing, shared memories, interconnect, accelerator simulation, cross triggering, AXI / FIC I/F
– Support for user-defined
components
• ESL tools integration with
full debug capabilities
– Compliant with TLM 2.0
– Full support for Carbon
and Synopsys
Co-processor Portfolio for
Wireless Modems
– Wide range of co-processors offering power-efficient
DSP offloading at extreme processing rates
• A complete wireless platform addressing all major modem PHY
processing requirements
• Offering flexible hardware-software partitioning
• Customers can focus on differentiation via DSP software
– Unique automated data traffic management
between DSP memory and hardware accelerators
• Allows fully parallel processing support
• Based on data traffic manager
Co-processor Portfolio for Wireless
Modems – Cont.
• Optimized tightly coupled extensions (TCE)
– MLD – Maximum Likelihood MIMO detectors
• Supports up to four MIMO layers
• Achieves near ML performance
– De-spreader – 3G De-spreader units
• Supports all WCDMA and HSDPA channels
• Scalable up to 3GPP HSPA+ Rel-11
– DFT / FFT
• Supports multi radix DFTs
• Includes NCO correction
– Viterbi
• Programmable K and r values
• Supports tail biting
– LLR processing and HARQ combining
• Supports LTE de-rate matching
• Significantly reduces HARQ memory buffer sizes
Dramatically reduces
time-to-market
Putting It All Together
A Cluster of Four CEVA-XC DSPs
> Processor Level
> Fixed-point and Floating-point
Vector DSPs
> Running over 1GHz
> Data Cache
> Platform Level
> Complete set of tightly coupled co-processor units
> Automated DSP offloading using data traffic management
> System Level
> Full cache coherency support
> AMBA-4 and FIC system interfaces
> Software Development Support
> HAL using drivers and APIs
> Comprehensive system debug & profiling
Thank You

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Último (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

TRACK G: An Innovative Multicore System Architecture for Wireless SoCs / Alon Yaakov

  • 1. May 1, 2013 An Innovative Multicore System Architecture for Wireless SoCs Alon Yaakov DSP Architecture Manager, CEVA
  • 2. Multicore in Embedded System - Defining the Problem • Control-plane – Synchronization between cores – Semaphores – Message passing using mailbox mechanism – Snooping mechanism – Interrupt handling • Data-plane – Antenna Processing – Equalization – Error Correction [Callout: This will be the focus of today’s presentation]
  • 3. Outline • Multicore Challenges • The CEVA-XC Solution
  • 4. Multicore Challenges 1. Partitioning > Task partitioning onto different chip resources > Data partitioning onto different chip resources 2. Resource sharing > Memories, buses, system I/Fs, peripherals, etc. 3. Scheduling > Allocating tasks/data 4. Data sharing > Transferring data between engines [Figure: an application to be mapped onto DSP A, DSP B, DSP C and the FFT, MLD and CTC engines]
  • 5. Challenge 1: Task Partitioning • Tasks – Parts of an algorithm running in a sequential order – A task must have a defined input and output data structure (packets) [Figure: task graph spanning Antenna Processing, Equalization and Error Correction, with the tasks FFT, Ch estimation, MLD, Reordering, Interleaver, CTC, and Concatenation & CRC checker; legend: Task, Data]
  • 6. Challenge 1: Task Partitioning - HW Offloading • Parts of the algorithm are more suited for HW acceleration – Well-known algorithms that require little programmability – Heavy computational effort [Figure: task graph with the tasks FFT, Ch estimation, MLD, Reordering, Interleaver, CTC, and Concatenation & CRC checker]
  • 7. Challenge 1: Data Partitioning • Several cores are used to process different input data packets • Suitable for homogeneous systems • Shared memory is used for storing history data – A core must wait for the data to be updated before using it -> latency • The entire program code is used by all cores – A core suffers stall cycles if its L1 memory is small
  • 8. Challenge 1: Partitioning - OK, Now What? • Efficient partitioning is dependent on the hardware platform • Building the optimal system depends on the partitioning • There is no single optimal solution – Each approach has its merits • Partitioning can be eased by starting with a reference that can be used as a basis
  • 9. Challenge 2: Resource Sharing • Resource types – DSP cores – HW accelerators – Memories – Buses – DMA • Resource sharing creates contention [Figure label: Memory]
  • 10. Challenge 2: Resource Sharing - Avoiding Contention • If possible, avoid contention by duplicating HW – Multiple DMAs – Duplicated HW accelerators – Multilayer bus – Partition memory into blocks, enabling concurrent access • Throughput and latency govern the minimum amount of hardware resources [Figure: four memory blocks]
  • 11. Challenge 2: Resource Sharing - Arbitration • When a simple set of known rules can be defined, a resource can be shared using a HW arbiter • QoS – Priority – Bandwidth allocation (weight) – Well-known algorithms (round robin) • Arbitration is based on time sharing of resources -> scheduling [Figure labels: Memory, Arbiter]
  • 12. Challenge 3: Scheduling • How do we assign and schedule tasks to cores? [Figure: the application task graph (FFT, Ch estimation, MLD, Reordering, Interleaver, CTC, Concatenation & CRC checker) to be mapped onto DSP A, DSP B, DSP C and the FFT, MLD and CTC engines]
  • 13. Challenge 3: Scheduling - Static Scheduling • Tasks are statically assigned to DSP cores • Design phase includes task scheduling – Data flow is fixed – Suitable when the load on each task is fixed [Figure: task graph (FFT, Ch estimation, MLD, Reorder, Interleaver, CTC, Concatenation & CRC checker) mapped onto DSP A, DSP B, DSP C and the FFT, MLD and CTC HW cores]
  • 14. Challenge 3: Scheduling - Dynamic Scheduling > A scheduler dynamically assigns tasks to cores > Scheduler algorithm selects the best core to execute the task, based on: processing capabilities, locality of data, load balance > Suitable for complex designs with variable processing load and QoS [Figure: a MASTER (Scheduler) dispatching the task graph (FFT, Ch estimation, MLD, Reorder, Interleaver, CTC, Concatenation & CRC checker) to DSP A, DSP B, DSP C and the FFT, MLD and CTC engines]
  • 15. Challenge 4: Data Sharing - Memory Hierarchy • Internal L1 memory – Fast memory with no access penalty – Small / medium size – Dedicated per core • External memory – Can be on-chip (L2) or off-chip (e.g. DDR) – Slow memory with access penalty – Large size – Shared among several cores – Subject to contention
  • 16. Challenge 4: Data Sharing - Using Cache • When shared data is used, a cache system can be used to reduce the stall count – Statistically reduces memory stalls, but is not deterministic • Used only for accessing narrow data width – Cache should be used for control data – Not recommended for vector DSP data flow (large caches, many stall cycles) -> How to share vector data?
  • 17. Challenge 4: Data Sharing - Pre-Fetching Data • A task cannot start until its preceding task completes • If we can schedule the next task to be executed, we can pre-fetch its input data – Using static scheduling, the data flow is known – Using dynamic scheduling, the scheduler must handle data movement prior to activating a task [Figure: task graph (FFT, Ch estimation, MLD, Reordering, Interleaver, CTC, Concatenation & CRC checker) with a DMA transfer on each edge between tasks]
  • 18. Challenge 4: Data Sharing - Pre-Fetching using DMA • DMA transfer must wait for the following conditions: – Source data is available – Destination data can be written (i.e. allocated memory is free) • DMA activation schemes – Real-Time SW -> programmable, large MIPS overhead – HW system events -> not programmable – Queue manager -> programmable, no MIPS overhead
  • 19. Challenge 4: Data Sharing - Pre-Fetching using DMA with Queue Manager • A queue is a list of tasks handled in a FIFO manner • Each DMA queue contains all DMA tasks related to a data flow channel • DMA tasks are pushed to the queue by: – DSP software (i.e. static scheduling) – System scheduler (i.e. dynamic scheduling) • Tasks are automatically activated using HW or SW events – Source data is available & destination memory is free [Figure: DMA transfer between FFT and Ch estimation]
  • 20. Outline • Multicore Challenges • The CEVA-XC Solution
  • 21. CEVA-XC4000 Multicore Solution [Figure labels: Optional, Cache ctrl, ACE]
  • 22. MUST™ Multi-core System Technology Overview • Fully featured data cache – Non-blocking, software operations, Write-Back & Write-Through • Advanced support for cache coherency – Based on ARM’s leading AMBA-4 ACE™ technology • Advanced system interconnect – AXI-4 - easy system integration and high Quality of Service (QoS) – Multi-layer FIC (Fast Inter-Connect) - low latency, high throughput master and slave ports – Multi-level memory architecture using local TCMs and hierarchy of caches
  • 23. MUST™ Multicore System Technology – Cont. • Data Traffic Manager – Automated data traffic management without DSP intervention • Comprehensive software development support – Advanced multicore debug and profiling – Complete system emulation with real hardware – Hardware abstraction layer (HAL) including drivers and system APIs • Support for homogeneous and heterogeneous clusters of multiple DSPs and CPUs – Support for advanced resource management and sharing – Flexible task scheduling for different system architectures: dynamic, event based, data driven, etc.
  • 24. Cache Coherency Support > Allows multiple cores to use shared memory without any software intervention > Superior performance to SW coherency > Simplifies software development > Easy SW partitioning and scaling from single core to multi-core > External memory can be dynamically partitioned into shared and unique areas > Minimizes system memory size > Flexible memory allocation speeds up SW development > Snooping is only applied to shared areas
  • 25. Data Traffic Management – Data Traffic Manager – Based on Queue Manager and Buffer Manager structures • Queue Manager - Maintains multiple independent queues of “tasks” • Buffer Manager - Autonomously tracks data status of source and destination buffers – Data transfers are automatically managed based on task status and the load of the input and output data buffers – Automatic data traffic management and DSP offloading – Prioritized scheduling for guaranteed QoS – Low latency packet transfers without software intervention – Results in lower memory consumption and improved system performance
  • 26. Data Traffic Manager – Allows sharing a resource among multiple cores via a shared queue • Tasks are executed based on priority and buffer status • Prevents starvation and deadlocks – Allows a single core to work with multiple queues • The core reads from / writes to its buffers (which can be local or external) • All data transfers between cores and accelerators are performed automatically via the data traffic manager
  • 27. Dynamic Scheduling – Dynamic scheduling in symmetric systems – A clustered system based on homogeneous DSP cores – Dynamic task allocation to DSP cores at runtime – Flexible choice of algorithms based on system load – Hardware abstraction using task oriented APIs – Shared external memories – FIC interface for low-latency high-bandwidth data accesses • Commonly used in wireless infrastructure applications
  • 28. MUST™ Hardware Abstraction Layer (HAL) • MUST™ is assisted by user-friendly software support – Abstracts the queues, buffers, DMA and caches • The software package includes: – Drivers and APIs – Full system profiling – Graphical interface via CEVA ToolBox
  • 29. Multicore Modeling and Simulation • Simulating any number of cores – Support for symmetric and asymmetric configurations • Support for ARM CADI (Cycle Accurate Debug Interface) – Including connectivity to ARM’s RealView debugger • Comprehensive multi-core simulation support – Synchronization, system browsing, shared memories, interconnect, accelerator simulation, cross triggering, AXI / FIC I/F – Support for user-defined components • ESL tools integration with full debug capabilities – Compliant with TLM 2.0 – Full support for Carbon and Synopsys
  • 30. Co-processor Portfolio for Wireless Modems – Wide range of co-processors offering power-efficient DSP offloading at extreme processing rates • A complete wireless platform addressing all major modem PHY processing requirements • Offering flexible hardware-software partitioning • Customers can focus on differentiation via DSP software – Unique automated data traffic management between DSP memory and hardware accelerators • Allows fully parallel processing support • Based on data traffic manager
  • 31. Co-processor Portfolio for Wireless Modems – Cont. • Optimized tightly coupled extensions (TCE) – MLD – Maximum Likelihood MIMO detectors • Supports up to four MIMO layers • Achieves near ML performance – De-spreader – 3G De-spreader units • Supports all WCDMA and HSDPA channels • Scalable up to 3GPP HSPA+ Rel-11 – DFT / FFT • Supports multi radix DFTs • Includes NCO correction – Viterbi • Programmable K and r values • Supports tail biting – LLR processing and HARQ combining • Supports LTE de-rate matching • Significantly reduces HARQ memory buffer sizes [Callout: Dramatically reduces time-to-market]
  • 32. Putting It All Together - A Cluster of Four CEVA-XC DSPs > Processor Level > Fixed-point and Floating-point Vector DSPs > Running over 1GHz > Data Cache > Platform Level > Complete set of tightly coupled co-processor units > Automated DSP offloading using data traffic management > System Level > Full cache coherency support > AMBA-4 and FIC system interfaces > Software Development Support > HAL using drivers and APIs > Comprehensive system debug & profiling
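
The queue-manager activation rule from slides 18 and 19 (a DMA task at the head of a FIFO queue runs only once its source data is available and its destination memory is free) can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `Buffer`, `DmaTask`, `DmaQueue` and `on_event` are hypothetical and are not taken from the CEVA HAL, and the real mechanism is implemented in hardware.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    """Hypothetical buffer model: 'full' means it holds data not yet consumed."""
    name: str
    full: bool = False


@dataclass
class DmaTask:
    """One transfer on a data flow channel (source buffer -> destination buffer)."""
    src: Buffer
    dst: Buffer


class DmaQueue:
    """Software model of one DMA queue handled in FIFO order."""

    def __init__(self):
        self.tasks = deque()
        self.completed = []  # (source name, destination name) per finished transfer

    def push(self, task):
        # Tasks are pushed by DSP software or by a system scheduler.
        self.tasks.append(task)

    def on_event(self):
        """React to a HW or SW event; return how many transfers were performed.

        Only the head of the queue is examined, so a blocked head task
        also holds back the tasks queued behind it (FIFO semantics).
        """
        done = 0
        while self.tasks:
            head = self.tasks[0]
            ready = head.src.full and not head.dst.full
            if not ready:
                break
            self.tasks.popleft()
            head.src.full = False   # source buffer has been drained
            head.dst.full = True    # destination now holds the data
            self.completed.append((head.src.name, head.dst.name))
            done += 1
        return done


# Usage: move FFT output to the channel estimation input once it is produced.
fft_out = Buffer("fft_out")
chest_in = Buffer("chest_in")
queue = DmaQueue()
queue.push(DmaTask(fft_out, chest_in))

before = queue.on_event()   # source not yet available, nothing is transferred
fft_out.full = True         # the FFT stage signals that its output is ready
after = queue.on_event()    # the head task is now activated
```

In this model the DSP only pushes tasks and sets buffer status; the transfer itself is triggered by events, which is the property the slides summarize as "programmable, no MIPS overhead".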

Editor's notes

  1. History data, i.e. channel estimation (chest)
  2. History data, i.e. channel estimation (chest)
  3. Task scheduling must account for worst case execution time per task
  4. Cache coherency: when shared data resides in the cache of one core, that core is unaware of changes made to the data by other cores. This is best solved using MESI / ACE protocols.
  5. Tasks are pushed by the SW -> flexibility
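
Note 3 above (static scheduling must account for the worst-case execution time of each task) can be made concrete with a simple feasibility check: sum the worst-case times of the tasks assigned to each core and compare the total against the processing period. The sketch below is illustrative only. The task-to-core mapping, the microsecond figures and the function names are all hypothetical, and the check ignores data dependencies and DMA transfer time.

```python
def core_loads(assignment, wcet_us):
    """Sum the worst-case execution time (WCET) of the tasks mapped to each core."""
    loads = {}
    for task, core in assignment.items():
        loads[core] = loads.get(core, 0) + wcet_us[task]
    return loads


def overloaded_cores(assignment, wcet_us, period_us):
    """Return the cores whose worst-case load does not fit in one period."""
    loads = core_loads(assignment, wcet_us)
    return sorted(core for core, load in loads.items() if load > period_us)


# Hypothetical WCET figures in microseconds (not taken from the presentation).
wcet_us = {
    "ch_estimation": 400,
    "reordering": 150,
    "interleaver": 250,
    "concat_crc": 300,
}

# Hypothetical static assignment of software tasks to DSP cores.
assignment = {
    "ch_estimation": "DSP A",
    "reordering": "DSP B",
    "interleaver": "DSP B",
    "concat_crc": "DSP C",
}

loads = core_loads(assignment, wcet_us)
too_busy = overloaded_cores(assignment, wcet_us, period_us=350)
print(loads)
print(too_busy)
```

Because the check uses worst-case rather than average times, a schedule that passes it leaves idle cycles whenever tasks finish early; that is the cost of a fixed data flow compared with dynamic scheduling.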