Faculty of Informatics
                   Chair of Computer Architectures

                            Fisnik Kraja
                           PhD Candidate




2011 IEEE Aerospace Conference, 5-12 March 2011, Big Sky, Montana
• Subject: New computing architecture for future satellites.

• Purpose: To introduce many-core and other COTS
  technologies in the design process.

• Main points will be:
     –   State of the art of space applications and computing platforms
     –   Proposed system architecture
     –   Performance Estimations (Benchmarking)
     –   Discussions and conclusions




3/12/2011                                                                2
• On-board computers offer minimal functionality.
• Constraints such as power, size, and heat

• High-reliability requirements, because of radiation effects:
     –   Total Ionizing Dose (TID)
     –   Single Event Upset (SEU)
     –   Single Event Transient (SET)
     –   Single Event Latch-up (SEL)

• New space applications ask for improved on-board
  processing abilities, in terms of
     – high processing power and throughput
     – without losing the required reliability.


• HRWS SAR
  (High resolution wide swath synthetic aperture radar).
   •   Used to reduce the amount of data to be transmitted to ground
   •   Uses separate apertures to transmit and receive
   •   Uses multiple phase centers on receive
   •   Each panel represents an independent phase center
   •   7 Panels are used, each consisting of 12 tiles




Parallelism of the algorithm:
     • 7 independent panel processing tasks
     • 12x7=84 independent tile processing tasks

Requirements:
     • 1 Tera 16-bit fixed-point Ops/s (complex multiply and add)
     • Peak sample rate: 8 Gbps
     • Full antenna average raw data rate: 603.1 Gbps

It is impossible to fulfill these requirements
with currently available technology for space.
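The tile-level parallelism and the "complex multiply and add" operation named above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the processing chain of the HRWS SAR instrument: the types `cplx16` and `cplx_acc`, the functions `cmac`, `process_tile`, and `process_all_tiles`, and the data layout (n samples per tile, stored contiguously) are all hypothetical. The OpenMP pragma is optional; without OpenMP support the loop simply runs sequentially.

```c
#include <assert.h>
#include <stdint.h>

enum { PANELS = 7, TILES_PER_PANEL = 12, TILES = PANELS * TILES_PER_PANEL };

/* 16-bit fixed-point complex sample (hypothetical layout). */
typedef struct { int16_t re, im; } cplx16;

/* Wide accumulator so that sums of 16-bit products cannot overflow in this
   sketch; a real design would define its own scaling and saturation rules. */
typedef struct { int64_t re, im; } cplx_acc;

/* acc += a * b for complex a and b: one "complex multiply and add". */
static void cmac(cplx_acc *acc, cplx16 a, cplx16 b)
{
    acc->re += (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    acc->im += (int64_t)a.re * b.im + (int64_t)a.im * b.re;
}

/* Placeholder for the per-tile work: accumulates n sample products. */
static cplx_acc process_tile(const cplx16 *x, const cplx16 *w, int n)
{
    cplx_acc acc = {0, 0};
    for (int i = 0; i < n; i++)
        cmac(&acc, x[i], w[i]);
    return acc;
}

/* The 84 tiles are independent, so the loop over them can run in parallel.
   x and w hold n samples per tile; out receives one result per tile.
   Returns the number of tiles processed. */
static int process_all_tiles(const cplx16 *x, const cplx16 *w, int n,
                             cplx_acc *out)
{
    assert(x && w && out && n > 0);
    #pragma omp parallel for schedule(static)
    for (int t = 0; t < TILES; t++)
        out[t] = process_tile(x + (long)t * n, w + (long)t * n, n);
    return TILES;
}
```

Because no tile reads another tile's data, the loop needs no synchronization beyond the implicit barrier at its end.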
• To efficiently apply the upcoming many-core processors
  and other COTS products to improve the on-board
  processing power.

• Reliability of the system should be addressed by:
      – traditional hardware techniques such as triple modular redundancy (TMR)
      – software-implemented fault-tolerance techniques
            • Thread/process/service replication

• This system should provide other important features:
      – flexibility
      – scalability
      – portability
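The voting step behind TMR can be illustrated with a bitwise majority vote over three redundant copies of a value. The sketch below is a software model for illustration only; the function names `tmr_vote` and `tmr_vote_buf` are hypothetical, and hardware TMR applies the same logic in voter circuits rather than in code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise majority vote over three redundant copies of a 32-bit word.
   Each output bit takes the value held by at least two of the inputs,
   so a single corrupted copy is outvoted. */
static uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}

/* Votes over three redundant buffers of n words and writes the result
   to out. Returns the number of words in which the copies disagreed. */
static size_t tmr_vote_buf(const uint32_t *a, const uint32_t *b,
                           const uint32_t *c, uint32_t *out, size_t n)
{
    assert(a && b && c && out);
    size_t disagreements = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = tmr_vote(a[i], b[i], c[i]);
        if (a[i] != out[i] || b[i] != out[i] || c[i] != out[i])
            disagreements++;
    }
    return disagreements;
}
```

The same voter can be applied to the outputs of replicated threads or processes, which is one way the software-implemented techniques listed above could reuse the idea.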
[Figure: block diagram with the labels I/O, RHPU, Memory (x3), Reliable Local Bus, and Bus interfacing]
•   A possible solution to the tradeoff between performance and reliability is the
    rotating consistency check: at any given time only some processes are replicated
    and their results checked for consistency, but over a longer period all of
    them get verified.
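One way to picture the rotating consistency check is a round-robin window that selects which processes are replicated in each round. The sketch below assumes `nproc` processes, of which `k` are checked per round; the window policy and the function names `is_checked` and `rounds_for_full_coverage` are assumptions made for illustration, since the slide does not specify how processes are selected.

```c
#include <assert.h>

/* Returns 1 if process `pid` is replicated and checked in round `round`,
   given `nproc` processes of which `k` are checked per round.
   The checked window advances by k positions each round (round-robin). */
static int is_checked(int pid, long round, int nproc, int k)
{
    assert(nproc > 0 && k > 0 && k <= nproc);
    assert(pid >= 0 && pid < nproc && round >= 0);
    int start = (int)((round * k) % nproc);
    int offset = (pid - start + nproc) % nproc;
    return offset < k;
}

/* Number of rounds after which every process has been checked at least once. */
static int rounds_for_full_coverage(int nproc, int k)
{
    assert(nproc > 0 && k > 0 && k <= nproc);
    return (nproc + k - 1) / k;   /* ceil(nproc / k) */
}
```

With 12 processes and 5 checked per round, for example, all processes are covered after 3 rounds, while only 5 of the 12 carry replication overhead at any time.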
Why SSCA#3?
      •      Computationally taxing
      •      Large block data transfers
      •      Stressful memory access patterns
      •      Scalable to mimic different problem sizes

 1.       The Synthetic Data Generation stage produces approximate raw SAR
          data, similar to what would be obtained from a real SAR system.
 2.       The SAR Sensor Processing stage reconstructs a SAR image
          using a wavefront spotlight SAR reconstruction method known as
          2D Fourier Matched Filtering and Interpolation.
SDG: Synthetic SAR returns from a uniform grid of point reflectors
Kernel 1: Reconstructed SAR image
The symmetric SMA (UMA)                         The distributed SMA (NUMA)
–   1 Nehalem CPU: Intel Core i7 CPU 920        –   2 Nehalem CPUs: Intel Xeon CPU X5670
–   2.67 GHz processor frequency                –   2.93 GHz processor frequency
–   8 MB L3 Smart Cache                         –   12 MB L3 Smart Cache
–   4 cores (8 threads with Hyper-Threading)    –   6 cores per CPU
–   130 W power consumption                     –   95 W power consumption
–   24 Gigabytes of DDR3 RAM                    –   36 (18x2) Gigabytes of DDR3 RAM
–   4.8 Giga Transfers/s QPI                    –   6.4 Giga Transfers/s QPI
UMA-SMA architectures offer flexibility, but they tend to have memory bottlenecks.

NUMA-SMA architectures avoid bottleneck problems in memories, but require
manual/pinned allocation of memory for each thread.
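The per-thread memory placement that NUMA requires is commonly obtained through first-touch initialization: under a first-touch policy (the Linux default), a page is placed on the NUMA node of the thread that first writes it. The sketch below assumes such a policy and assumes that threads are pinned to cores outside the code, for example through the GOMP_CPU_AFFINITY environment variable of GNU OpenMP. The function names `alloc_first_touch` and `scale_and_sum` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates n doubles and initializes them in a parallel loop, so that
   under a first-touch policy each page lands on the NUMA node of the
   thread that will later work on it. Returns NULL on allocation failure. */
static double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (!a)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = 0.0;
    return a;
}

/* Uses the same static schedule as the initialization, so each thread
   mostly accesses the pages it touched first (local memory). */
static double scale_and_sum(double *a, size_t n, double s)
{
    assert(a != NULL);
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        a[i] = a[i] * s + 1.0;
        sum += a[i];
    }
    return sum;
}
```

The placement only helps if the initialization loop and the compute loops use the same iteration-to-thread mapping, which is why both loops use `schedule(static)` here.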
[Figure labels: Sequential FFT; Multithreaded FFT; Parallelized Loops with OpenMP; Tiling Technique; Threaded FFT using OpenMP; GOMP_CPU_AFFINITY="0-11"; More Private Variables]
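The slide names a tiling technique without giving details. As a generic illustration of loop tiling, and not necessarily the variant used in this work, the sketch below transposes a matrix in small blocks so that both the source and the destination are accessed with good cache locality; a transpose of this kind often appears between the row and column passes of a 2D FFT. The tile edge `TILE` and the function name `transpose_tiled` are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { TILE = 32 };  /* hypothetical tile edge; tune to the cache size */

/* Out-of-place transpose of a rows x cols row-major matrix. The matrix is
   walked in TILE x TILE blocks; block rows are distributed over threads. */
static void transpose_tiled(const float *src, float *dst, int rows, int cols)
{
    assert(src && dst && src != dst && rows > 0 && cols > 0);
    #pragma omp parallel for schedule(static)
    for (int bi = 0; bi < rows; bi += TILE) {
        for (int bj = 0; bj < cols; bj += TILE) {
            /* Clip the block at the matrix border. */
            int imax = bi + TILE < rows ? bi + TILE : rows;
            int jmax = bj + TILE < cols ? bj + TILE : cols;
            for (int i = bi; i < imax; i++)
                for (int j = bj; j < jmax; j++)
                    dst[(size_t)j * rows + i] = src[(size_t)i * cols + j];
        }
    }
}
```

Each block row writes a disjoint set of destination columns, so the threads never write to the same element.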
Most important optimizations:
    • Thread pinning (first-touch memory policy)
    • Private data (stack, local) / shared data (remote, cached, evicted)
    • Scheduling
                Static for loops with regular workloads
                Dynamic for loops with irregular workloads
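The scheduling rule above maps directly onto the OpenMP `schedule` clause. The two loops below are a minimal sketch: the workloads are invented for illustration, the function names `sum_regular` and `sum_triangular` are hypothetical, and the chunk size of 16 is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Regular workload: every iteration costs the same, so a static schedule
   splits the range evenly among threads with minimal overhead. */
static long sum_regular(const int *v, int n)
{
    assert(v != NULL && n >= 0);
    long sum = 0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Irregular workload: iteration i performs i inner steps (a triangular
   loop), so a dynamic schedule hands out chunks as threads become free. */
static long sum_triangular(int n)
{
    assert(n >= 0);
    long sum = 0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:sum)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < i; j++)
            sum += 1;
    return sum;
}
```

With a static schedule the triangular loop would leave the threads holding the low indices idle while the others finish; a dynamic schedule trades some scheduling overhead for better balance.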
Outlook
    •   The SAR data generation and image formation are scalable to
            •   4 cores in UMA (Uniform Memory Access)
            •   12 cores in NUMA-2x[6Cores, 16GB RAM]
    •   Speedup is almost linear in these SMA architectures
    •   This code is expected to scale to larger numbers of cores
    •   Further parallelization paradigms are planned:
            • MPI (Message Passing Interface) for clusters
            • CUDA for GPGPUs
By combining many-core processors and other COTS
     products with specific radiation-hardened components,
     one can benefit from:
       •    A speedup by a factor of 10 to 100.
       •    Improved reliability and robustness of the system.
       •    Efficient and faster application development via already familiar
            programming models.
       •    Ability to port applications directly to the space environment.
       •    Minimization of the non-recurring development time and costs for
            future missions.
       •    Efficient, flexible and portable software fault-tolerance
            techniques that can be applied in the space environment.
       •    Portability to future advances in technology.
Thank you for your attention!




                                  Fisnik Kraja

            LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation
                         Technische Universität München

                               kraja@in.tum.de

Using Many-Core Processors to Improve the Performance of Space Computing Platforms