ScilabTEC 2015 - KIT

"Can programming of multi-core systems be easier, please? The ALMA Approach"
By Oliver Oey, Karlsruhe Institute of Technology (KIT), for ScilabTEC 2015


  1. FP7-ICT-2011-7-287733 – ScilabTEC – Oliver Oey – oliver.oey@kit.edu
     ALMA Project Overview: Simplifying programming for multi-cores (Oliver Oey)
  2. Outline
     • ALMA EU Project Overview
       • Project Overview
       • Motivation
       • Results
     • MatrixFrontend
       • Type inference
       • Loopify
       • Simplify
     • emmtrix Technologies
     • Summary
  3. ALMA Project ID Card
     • Three-year project: 01/09/2011 – 31/01/2015
     • Funded by FP7: 3.2 million euros
     • Official website: http://www.alma-project.eu/
     • Coordinators: Juergen Becker (KIT) and Timo Stripf (KIT)
     • Scientific Coordinator: Nikos Voros (TWG)
  4. Why do we need multi-core processors?
     • Until ~2005, increases in processor performance were driven by
       • Clock speed
       • Execution optimization
       • Cache
     • The power wall and the ILP wall led to multi-core processors
     • Parallelism must be exposed by the programmer
     (Source: http://www.gotw.ca/publications/concurrency-ddj.htm)
  5. Motivation
     End user perspective
     • Explore/develop algorithms
     • Use a simple, comfortable language (e.g. Matlab, Scilab, …)
     • Don't want to care about
       • data types
       • parallelism
     End result
     • Performance
     • Energy efficient
     • Cost efficient
     • Fast development time
     Target architecture perspective
     • Multi-Processor System-on-Chip
     • Parallel processor cores
       • Explicit parallel programming
       • Distributed memory model, e.g. MPI
     • Parallelism within the processor cores
       • Single Instruction Multiple Data (SIMD)
       • Very Long Instruction Word (VLIW)
     • Native data types
       • E.g. 32-bit integer
       • Floating-point performs inefficiently
     => Hide the complexity from the end user
  6. ALMA Development Flow (overview)
     [Diagram. Elements: Embedded application design; Multi-core hardware design; Translation to Scilab & annotations; Abstract hardware description (ADL); Structural hardware description; ALMA algorithm parallelization tools; Parameters for algorithm optimization; C-based code with parallel descriptions; KIT C-compiler; Recore C-compiler; Executable binary (for simulator and HW); Multi-core simulator; Feedback for optimization; Optimized application code on multi-core platform]
  7. Challenges for Compiling Scilab to MPSoCs
     • Scilab programming language
       • Sequential, imperative language
       • Dynamic typing (scalars, vectors, matrices)
       • End users typically use floating-point data types
       • Pointer-free, i.e. no memory aliasing problems
       • Natural parallelism within vector operations
     • MPSoC target architectures
       • Exploit coarse-grain parallelism (task-level)
       • Distributed memory
       • Exploit fine-grain parallelism (instruction-level)
  8. ALMA Target Architectures
     • Xentium® processing tile
       • Fixed-point DSP processing
       • 10-issue VLIW processor
       • SIMD capability
       • Streaming communication services
     • Multicore architectures
       • Distributed memory => no shared memory required
       • No floating-point unit => use fixed-point arithmetic
     • Example architecture: Recore X2014
  9. Application Test Cases – Telecommunications
     [Block diagram: IEEE 802.16e PHY layer in NT x NR MIMO configuration, with parts marked "ALMA 1st Increment" and "ALMA 2nd Increment"]
     • BS Rx (Rx 1 … Rx NR): Cyclic Prefix removal, FFT, Diversity Combiner, Channel Estimator, Equalizer, Symbol Deconstruction, Uplink Frame Deconstruction, Const. Demap, Deinterleaver, FEC Decoder, Derandomizer, SDU Generation, MAC-PHY I/F, Data SDUs
     • BS Tx (Tx 1 … Tx NT): Data SDUs, MAC-PHY I/F, PDU Generation, UL/DL Scheduler, UL/DL Frame Mapper, Downlink MAC/PHY Control, Randomizer, FEC Encoder, Interleaver, Constel. Mapping, Symbol Construction, Frame Construction, S-T Coding, IFFT + Cyclic Prefix, IFFT + Cyclic Prefix + Preamble
     • Speedup: ~2.8
  10. Application Test Cases – Image Processing
      • Scale Invariant Feature Transform (SIFT)
      • Speedup: ~1.8
  11. Development effort
      [Bar chart: working days (0 to 80) for Telecommunication and Image processing, manual vs. using automatic parallelization; reductions shown: -57% and -30%]
      • Development effort is reduced by more than 50% in some cases
  12. ALMA Workflow
      [Diagram. Elements: Development using Scilab; ALMA Parallelization Tools; Parallel C Code; Test platform (test PC, multi-core processor with four CPUs); Development Cycle I; Development Cycle II]
  13. ALMA Workflow (Details)
      [Diagram. Elements: Development with Scilab; Matrix Frontend; Sequential Static C Code; Parallelization; Parallel C Code; Development Cycle I; Development Cycle II]
  14. Outline (same as slide 2)
  15. Matrix Frontend
      • Scilab-to-C compiler
        • Parses Scilab code
        • Advanced type inference
        • High-level optimizations on Scilab code
        • Turns Scilab statements into loop nests
      • Generated C code
        • Optimized for parallelism extraction
        • Static memory allocation
        • Avoids pointers
  16. Requirements
      Source language
      • Support Scilab input language
        • Support a well-defined subset
        • Extend with annotations
          • for type inference
          • for parallelization
        • Annotated code should still be valid Scilab/Matlab code
      Target language
      • Generate ANSI C89 code
      • Polyhedral code
        • Large Static Control Parts
        • Avoid pointers
      • Static code
        • No dynamic memory allocation
        • Avoid run-time decisions
  17. Type Inference
      • Calculate types for expressions and variables
        "Type" = "Data Type" + "Shape"
      • Separated into 3 passes
        1. Shape Inference
        2. Data Type Inference
        3. Variable Inference
      Pipeline: Scilab → Type Inference → Loopify → Simplify → C Code Output → C Code
  18. Type Inference – Shape
      • Calculate the shape of each Scilab statement
            s = [1 2 3];    // s = 1x3
            for f = 1:10    // f = 1x1
                s = s + f   // s = 1x3
            end
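
The shape rule used on this slide (a 1x3 vector plus a 1x1 scalar stays 1x3) can be sketched in C as follows. This is only an illustration: `shape_t` and `infer_add_shape` are hypothetical names, not part of MatrixFrontend, and the rule shown (scalar expansion or identical shapes) is an assumption about how element-wise addition is typed.

```c
#include <stdbool.h>

/* Hypothetical shape descriptor: rows x cols, e.g. 1x3 (not a MatrixFrontend type). */
typedef struct {
    int rows;
    int cols;
} shape_t;

static bool shape_is_scalar(shape_t s) {
    return s.rows == 1 && s.cols == 1;
}

/* Infers the shape of an element-wise "lhs + rhs".
 * A scalar operand takes the shape of the other operand;
 * otherwise both shapes must match.
 * Returns false when the operand shapes are incompatible. */
bool infer_add_shape(shape_t lhs, shape_t rhs, shape_t *out) {
    if (shape_is_scalar(lhs)) { *out = rhs; return true; }
    if (shape_is_scalar(rhs)) { *out = lhs; return true; }
    if (lhs.rows == rhs.rows && lhs.cols == rhs.cols) { *out = lhs; return true; }
    return false;
}
```

For `s = s + f` with `s` of shape 1x3 and `f` of shape 1x1, the function reports 1x3, matching the comments in the Scilab snippet.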
  19. Type Inference – Growing Arrays
      • Support growing arrays
            a = 1;
            a(1,5) = 1;     // a = [1 0 0 0 1]
      • Maximum size must be known!
      • What happens if the matrix is indexed by a variable?
            a(1,b) = 1;     // Maximum value of b unknown
      • Two solutions:
        Solution 1:
            a = zeros(1,5);
            mfe_fixedsize(a);
            a = 1;
            a(1,b) = 1;
        Solution 2:
            a = 1;
            mfe_size(a, 1, 1:5);
            a(1,b) = 1;
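
Why the maximum size must be known becomes clearer when the growing array is mapped to static storage. The following C sketch assumes a row vector with a declared maximum of 5 columns; `row_vec_t`, `row_vec_set` and `A_MAX_COLS` are hypothetical names chosen for illustration, not the code MatrixFrontend generates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed static maximum, as would be declared by mfe_size(a, 1, 1:5). */
#define A_MAX_COLS 5

typedef struct {
    int32_t data[A_MAX_COLS]; /* static storage for the maximum size */
    int32_t cols;             /* current (dynamic) number of columns */
} row_vec_t;

/* Emulates a(1,b) = value with a 1-based index b.
 * Grows the vector and zero-fills the gap, as Scilab does;
 * rejects any index beyond the static maximum. */
bool row_vec_set(row_vec_t *a, int32_t b, int32_t value) {
    int32_t i;
    if (b < 1 || b > A_MAX_COLS) {
        return false;
    }
    for (i = a->cols; i < b; ++i) {
        a->data[i] = 0; /* zero-fill newly exposed elements */
    }
    if (b > a->cols) {
        a->cols = b;
    }
    a->data[b - 1] = value;
    return true;
}
```

Starting from `a = 1` (one column), a call with `b = 5` yields `[1 0 0 0 1]`, while `b = 6` is rejected because no storage was reserved for it.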
  20. Type Inference – Data Type
      • Scilab has data type conversion functions
        • double
        • int32, int16, int8
        • uint32, uint16, uint8
        • boolean
        • complex, real, imag
            a = uint8([255 256]);   // a = [255 0]
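
The wrap-around in the `uint8([255 256])` example can be reproduced in C. The helper below is a minimal sketch with a hypothetical name (`to_uint8_wrap`); it assumes the input fits into `int64_t`, because converting an out-of-range `double` directly to `uint8_t` is undefined in C.

```c
#include <stdint.h>

/* Converts a double to uint8 with wrap-around (modulo 256), matching the
 * slide's example uint8([255 256]) -> [255 0].
 * The intermediate int64_t keeps the conversion well defined for values
 * outside 0..255; inputs are assumed to fit into int64_t. */
uint8_t to_uint8_wrap(double x) {
    return (uint8_t)(int64_t)x;
}
```

With this helper, 255 stays 255 and 256 becomes 0.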
  21. Type Inference – Data Type (2)
      • Problem: the data type depends on run-time values
            sqrt(1)   => double
            sqrt(-1)  => complex double
            sqrt(a)   => ?
      • Scilab-conformant execution cannot be guaranteed
      • Solution
        • Generate a warning
        • Ask the end user to specify the data type
            real(sqrt(a))      => double
            sqrt(complex(a))   => complex double
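
The decision described on this slide can be sketched as a small C function: the result type of `sqrt` is known only when the argument is a compile-time constant. All names here (`dtype_t`, `arg_info_t`, `infer_sqrt_type`) are hypothetical and only illustrate the idea; the actual MatrixFrontend analysis is not shown in the slides.

```c
#include <stdbool.h>

typedef enum { TYPE_DOUBLE, TYPE_COMPLEX_DOUBLE, TYPE_UNKNOWN } dtype_t;

/* Hypothetical compile-time knowledge about a real-valued sqrt argument. */
typedef struct {
    bool is_constant; /* true if the value is known at compile time */
    double value;     /* only meaningful when is_constant is true */
} arg_info_t;

/* Result type of sqrt(arg) for a real argument.
 * TYPE_UNKNOWN means the compiler has to warn and ask the user to write
 * real(sqrt(a)) or sqrt(complex(a)) instead. */
dtype_t infer_sqrt_type(arg_info_t arg) {
    if (!arg.is_constant) {
        return TYPE_UNKNOWN;
    }
    return arg.value >= 0.0 ? TYPE_DOUBLE : TYPE_COMPLEX_DOUBLE;
}
```

For the three cases on the slide, `sqrt(1)` maps to `TYPE_DOUBLE`, `sqrt(-1)` to `TYPE_COMPLEX_DOUBLE`, and `sqrt(a)` with unknown `a` to `TYPE_UNKNOWN`.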
  22. Type Inference – Variable
      • Shape and data type inference operate on expressions
      • Assign shape/data type to variables
      • Data type
        • Limitation: the data type cannot change at run time
            a = 1;
            a = uint8(1);
        • The complex flag is "or"-connected
            a = 1;
            a = %i;         // => complex_double_t a;
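
The two rules for variable data types (no change of base type, complex flag combined with "or") can be sketched in C as a merge step applied at every assignment. `base_type_t`, `var_type_t` and `merge_var_type` are hypothetical names used only for this illustration.

```c
#include <stdbool.h>

typedef enum { BASE_DOUBLE, BASE_INT32, BASE_UINT8 } base_type_t;

/* Hypothetical variable type: base data type plus complex flag. */
typedef struct {
    base_type_t base;
    bool is_complex;
} var_type_t;

/* Merges the type of a newly assigned expression into a variable's type.
 * The complex flag is OR-ed over all assignments.
 * Returns false if the base type would change, which the slide lists
 * as an unsupported case. */
bool merge_var_type(var_type_t *var, var_type_t assigned) {
    if (var->base != assigned.base) {
        return false;
    }
    var->is_complex = var->is_complex || assigned.is_complex;
    return true;
}
```

Merging `a = 1` (real double) with `a = %i` (complex double) leaves `a` as a complex double, while merging with `a = uint8(1)` is rejected.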
  23. Type Inference – Variable (2)
      • Shape
        • The variable shape is the maximum of all dimensions
            a = zeros(1,3);
            a = zeros(4,1);     // => double a[4,3];
        • Limitation: the number of dimensions cannot change
            a = zeros(3,3);
            a = zeros(3,3,3);
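
The "maximum of all dimensions" rule can be written as a per-dimension merge. The C sketch below uses hypothetical names (`var_shape_t`, `merge_var_shape`, `MAX_DIMS`) and assumes at most three dimensions for brevity.

```c
#include <stdbool.h>

#define MAX_DIMS 3 /* assumed upper bound for this sketch */

/* Hypothetical static shape of a variable. */
typedef struct {
    int ndims;
    int dims[MAX_DIMS];
} var_shape_t;

/* Merges the shape of a newly assigned expression into a variable's
 * static shape by taking the per-dimension maximum.
 * Returns false if the number of dimensions differs, which the slide
 * lists as an unsupported case. */
bool merge_var_shape(var_shape_t *var, var_shape_t assigned) {
    int i;
    if (var->ndims != assigned.ndims) {
        return false;
    }
    for (i = 0; i < var->ndims; ++i) {
        if (assigned.dims[i] > var->dims[i]) {
            var->dims[i] = assigned.dims[i];
        }
    }
    return true;
}
```

Merging 1x3 with 4x1 gives a static shape of 4x3, as in the slide's example; merging 3x3 with 3x3x3 fails.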
  24. Loopify
      • Translates Matlab/Scilab variables into
        • Data
        • Dynamic size
        • Static (maximum) size
      • Translates Matlab/Scilab statements into
        • Loop nest
        • Size calculation
      Scilab:
            a = zeros(2,3);
      C code:
            int32_t a_data[3][2] = {{0}};
            int32_t a_size[2];
            const int32_t a_ssize[2] = {2, 3};
            for (v1 = 0; v1 < 3; ++v1) {
                for (v0 = 0; v0 < 2; ++v0) {
                    a_data[v1][v0] = 0;
                }
            }
            a_size[0] = 2;
            a_size[1] = 3;
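
As a further illustration of the loop-nest translation, the statement `s = s + f` from the shape-inference slide could be loopified roughly as follows. This is a hand-written sketch, not MatrixFrontend output: the function name `add_scalar` and the macros are hypothetical, the storage layout `data[column][row]` follows the slide's `a_data[3][2]` for a 2x3 matrix, and the loop bounds use the dynamic size as an assumption.

```c
#include <stdint.h>

/* Static (maximum) size of s: 1 row, 3 columns. */
#define S_ROWS 1
#define S_COLS 3

/* Loop nest for the Scilab statement "s = s + f" with a scalar f.
 * s_size holds the dynamic size as {rows, cols}; the shape of s does not
 * change, so no size calculation is needed after the loops. */
void add_scalar(int32_t s_data[S_COLS][S_ROWS], const int32_t s_size[2], int32_t f) {
    int32_t v0, v1;
    for (v1 = 0; v1 < s_size[1]; ++v1) {
        for (v0 = 0; v0 < s_size[0]; ++v0) {
            s_data[v1][v0] += f;
        }
    }
}
```

Every iteration is independent of the others, which is the kind of regular loop nest a later parallelization step can distribute across cores.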
  25. Simplify
      • Remove unnecessary "for" loops
      • Remove unnecessary variable dimensions
      • Remove size variables and statements for fixed-size variables
      Scilab:
            a = 1;
      C code (before Simplify):
            int32_t a_data[1][1] = {{0}};
            …
            for (v1 = 0; v1 < 1; ++v1) {
                for (v0 = 0; v0 < 1; ++v0) {
                    a_data[v1][v0] = 1;
                }
            }
      C code (after Simplify):
            int32_t a_data = 0;
            …
            a_data = 1;
  26. Results – Lines of Code
      [Bar chart: lines of code (0 to 5000) for the test cases SIFT, Magic, IFFT and Intracom; series: Scilab, C (after Simplify), C (before Simplify)]
  27. Start-up company: Solutions for a parallel world
      • To be founded out of KIT, based on results from ALMA
      • www.emmtrix.com
  28. Interactive Parallelization
      • Control parallelization through high-level decisions in a GUI
        => Control, traceability, usability
      • Automatic test generation
        => Reliability
  29. emmtrix Workflow Integration
      [Diagram. Elements: Development with Scilab; emmtrix Parallelization Solution; Parallel C Code; Verification; Iteration; Test platform (test PC, multicore processor with four CPUs)]
      • Integration into the Scilab workflow
      • Planned Xcos integration for model-based design
  30. Plans for emmtrix
      • Soon: release of MatrixFrontend for the Scilab community
        • Free to use
        • Converts Scilab code to C code
      • Product launch of emmtrix Parallel Studio (not final name) at Embedded World 2016 (February 2016)
        • Integration into workflow
        • Support for different hardware platforms
        • Support for model-based design
  31. Summary
      • ALMA toolchain
        • MatrixFrontend: converts Scilab code to C
        • Parallelization of the generated code
        • Reduces development effort for multi-core systems by 30-60%
      • emmtrix Technologies
        • Distribution of ALMA results
        • Free Scilab-to-C converter: MatrixFrontend
        • Interactive parallelization tool
  32. Thank you!
