Using Many-Core Processors to Improve the Performance of Space Computing Platforms

Faculty of Informatics
Chair of Computer Architectures

Fisnik Kraja
PhD Candidate

2011 IEEE Aerospace Conference, 5-12 March 2011, Big Sky, Montana

• Subject: New computing architecture for future satellites.

• Purpose: To introduce many-core and other COTS
technologies in the design process.

• Main points will be:
– State of the art of space applications and computing platforms
– Proposed system architecture
– Performance Estimations (Benchmarking)
– Discussions and conclusions

3/12/2011 2

• On-board computers offer minimal functionality.
• Constraints such as power, size, and heat.

• High-reliability requirements, because of radiation effects:
– Total Ionizing Dose (TID)
– Single Event Upset (SEU)
– Single Event Transient (SET)
– Single Event Latch-up (SEL)

• New space applications ask for improved on-board
processing abilities, in terms of
– high processing power and throughput
– without losing the required reliability.


• HRWS SAR
(High resolution wide swath synthetic aperture radar).
• Used to reduce the amount of data to be transmitted to ground
• Uses separate apertures to transmit and receive
• Uses multiple phase centers on receive
• Each panel represents an independent phase center
• 7 Panels are used, each consisting of 12 tiles


Parallelism of the algorithm:
• 7 independent panel-processing tasks
• 12 x 7 = 84 independent tile-processing tasks

Requirements:
1 Tera 16-bit fixed-point Ops/s (complex multiply and add)
Peak sample rate: 8 Gbps
Full antenna average raw data rate: 603.1 Gbps

It is impossible to fulfill these requirements
with currently available technology for space.
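
As a rough back-of-the-envelope sketch of what the figures above imply, the snippet below splits the stated 1 Tera Ops/s requirement evenly over the 7 panels and the 84 tiles. The even split is an assumption made only for illustration (the slides do not say how the load is distributed), and all names are hypothetical.

```python
# Figures taken from the slides.
PANELS = 7
TILES_PER_PANEL = 12
TOTAL_OPS_PER_S = 1e12  # 1 Tera 16-bit fixed-point Ops/s


def per_unit_load(total_ops_per_s, units):
    """Ops/s each unit must sustain if the load is split evenly (assumption)."""
    return total_ops_per_s / units


tiles = PANELS * TILES_PER_PANEL
ops_per_panel = per_unit_load(TOTAL_OPS_PER_S, PANELS)
ops_per_tile = per_unit_load(TOTAL_OPS_PER_S, tiles)

print(f"independent tile tasks: {tiles}")
print(f"per panel: {ops_per_panel / 1e9:.1f} GOps/s")
print(f"per tile:  {ops_per_tile / 1e9:.1f} GOps/s")
```

Under that assumption each independent tile task still needs on the order of ten GOps/s, which gives a feel for the processing power a single node would have to deliver.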

• To efficiently apply the upcoming many-core processors
and other COTS products to improve the on-board
processing power.

• Reliability of the system should be addressed by:
– traditional hardware techniques (TMR)
– software-implemented fault-tolerant techniques
• Thread/process/service replication

• This system should provide other important features:
– flexibility,
– scalability,
– portability.
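
The replication idea listed above can be illustrated with a minimal majority-vote sketch: the same computation is executed several times and the result reported by most replicas wins. This is only a hypothetical software analogue of TMR, not the mechanism of the proposed architecture; `majority_vote` and `run_replicated` are assumed names, and results are assumed to be hashable values.

```python
from collections import Counter


def majority_vote(results):
    """Return the value reported by a strict majority of replicas.

    Raises RuntimeError when no value has a strict majority.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among replica results")
    return value


def run_replicated(task, arg, replicas=3):
    """Run `task(arg)` once per replica and vote on the outcomes."""
    results = [task(arg) for _ in range(replicas)]
    return majority_vote(results)


# Two healthy replicas outvote one corrupted result.
print(majority_vote([42, 42, 17]))
print(run_replicated(lambda x: x * x, 9))
```

In a real system the replicas would run as separate threads, processes, or services on different cores; here they run sequentially to keep the sketch short.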


[Figure: proposed system architecture. Labels: I/O, RHPU, Memory (x3), Reliable Local Bus, Bus interfacing]


• A possible solution to the trade-off between performance and reliability is the
rotating consistency check: at any given time only some processes are replicated
and have their results checked for consistency, but over a longer period all of
them get verified.
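
The rotating consistency check described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation behind the slides: processes are modeled as plain functions, "replication" means running a function a second time, and the names `rotating_check_schedule` and `run_round` are hypothetical.

```python
def rotating_check_schedule(num_processes, checked_per_round):
    """Yield, round by round, the process indices to replicate and check.

    Assumes 0 < checked_per_round <= num_processes.
    """
    start = 0
    while True:
        yield [(start + i) % num_processes for i in range(checked_per_round)]
        start = (start + checked_per_round) % num_processes


def run_round(tasks, inputs, checked):
    """Run every task once; re-run only the checked ones and compare."""
    results = [task(x) for task, x in zip(tasks, inputs)]
    mismatches = [i for i in checked if tasks[i](inputs[i]) != results[i]]
    return results, mismatches


# Five processes, two of them verified per round.
schedule = rotating_check_schedule(num_processes=5, checked_per_round=2)
rounds = [next(schedule) for _ in range(3)]
covered = {i for r in rounds for i in r}
print(rounds)               # [[0, 1], [2, 3], [4, 0]]
print(sorted(covered))      # [0, 1, 2, 3, 4]

tasks = [lambda x: x * x] * 5
results, mismatches = run_round(tasks, [0, 1, 2, 3, 4], rounds[0])
print(results, mismatches)  # [0, 1, 4, 9, 16] []
```

Each round pays the replication cost for only two of the five processes, yet after three rounds every process has been verified at least once.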


Why SSCA#3?
• Computationally taxing
• Large block data transfers
• Stressful memory access patterns
• Scalable to mimic different problem sizes

1. The Synthetic Data Generation stage produces approximate raw SAR
data, similar to what would be obtained from a real SAR system.
2. The SAR Sensor Processing stage reconstructs a SAR image
using a wavefront spotlight SAR reconstruction method known as
2D Fourier Matched Filtering and Interpolation.


SDG: Synthetic SAR returns from a uniform grid of point reflectors
Kernel 1: Reconstructed SAR image


The symmetric SMA (UMA)
– 1 Nehalem CPU: Intel Core i7 CPU 920
– 2.67 GHz processor frequency
– 8 MB L3 Smart Cache
– 4 cores (8 threads with Hyper-Threading)
– 130 W power consumption
– 24 Gigabytes of DDR3 RAM
– 4.8 Giga Transfers/s QPI

The distributed SMA (NUMA)
– 2 Nehalem CPUs: Intel Xeon CPU X5670
– 2.93 GHz processor frequency
– 12 MB L3 Smart Cache
– 6 cores/CPU
– 95 W power consumption
– 36 (18x2) Gigabytes of DDR3 RAM
– 6.4 Giga Transfers/s QPI


UMA-SMA architectures offer flexibility, but they tend to have memory
bottlenecks.

NUMA-SMA architectures avoid bottleneck problems in memories, but require
manual/pinned allocation of memory for each thread.


[Figure: optimization steps. Labels: Sequential FFT; Multithreaded FFT; Parallelized Loops with OpenMP; Tiling Technique; Threaded FFT using OpenMP; GOMP_CPU_AFFINITY="0-11"; More Private Variables]


Most important optimizations:
• Thread Pinning (first touch policy of memory)
• Private Data (stack, local) / Shared Data (remote, cached, evicted)
• Scheduling
– Static for loops with regular workloads
– Dynamic for loops with irregular workloads
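
The scheduling rule of thumb above can be illustrated with a small simulation. It is a sketch under simplifying assumptions, not OpenMP itself: `static_schedule` hands each worker one contiguous block of iterations, roughly like a static schedule, while `dynamic_schedule` gives each iteration to whichever worker becomes free first, like a dynamic schedule with chunk size 1. The function names and the cost lists are hypothetical.

```python
import heapq


def static_schedule(costs, workers):
    """Per-worker load when iterations are split into contiguous blocks."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return [sum(costs[w * chunk:(w + 1) * chunk]) for w in range(workers)]


def dynamic_schedule(costs, workers):
    """Per-worker load when each iteration goes to the least-loaded worker."""
    loads = [0] * workers
    heapq.heapify(loads)
    for cost in costs:
        heapq.heappush(loads, heapq.heappop(loads) + cost)
    return sorted(loads)


regular = [1] * 8                      # every iteration costs the same
irregular = [8, 7, 6, 5, 4, 3, 2, 1]   # iteration cost varies strongly

# The slowest worker determines the loop's run time.
print(max(static_schedule(regular, 4)), max(dynamic_schedule(regular, 4)))      # 2 2
print(max(static_schedule(irregular, 4)), max(dynamic_schedule(irregular, 4)))  # 15 9
```

With regular work both strategies balance equally well, so the cheaper static assignment is preferable; with irregular work the dynamic assignment noticeably shortens the time of the slowest worker. The simulation ignores the run-time overhead that dynamic scheduling adds in practice.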

Outlook
• The SAR data generation and image formation are scalable to
  – 4 cores in UMA (Uniform Memory Access)
  – 12 cores in NUMA: 2 x [6 cores, 16 GB RAM]
• Speedup is almost linear in these SMA architectures
• This code is expected to scale to larger numbers of cores
• Further parallelization paradigms are planned:
  – MPI (Message Passing Interface) for clusters
  – CUDA for GPGPUs

By combining many-core processors and other COTS
products with radiation-hardened specific components
one can benefit from:
• A speedup by a factor of 10 to 100
• Improved reliability and robustness of the system.
• Efficient and faster application development via already familiar
programming models.
• Ability to port applications directly to the space environment.
• Minimization of the non-recurring development time and costs for
future missions.
• Efficient, flexible and portable software fault-tolerance
techniques that can be applied in the space environment.
• Portability to future advances in technology.


Thank you for your attention!

Fisnik Kraja

LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation
Technische Universität München

kraja@in.tum.de

