JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

  1. www.modeliosoft.com Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis Andrey.Sadovykh@softeam.fr 1
  2. Outline: Introduction; Model-driven development; Big Data; JUNIPER; Case-study results; Conclusions
  3. SOFTEAM, a French IT services company and software vendor. A growing company: 25 years' experience, 850 experts, regular growth (17.5 M€ in 2005, 20 M€ in 2006, 23 M€ in 2008, 70 M€ in 2013). Locations: Paris, Rennes, Nantes, Sophia. Specialist in OO technologies, new architectures, and methodologies. Sectors: Banking, Defense, Telecom, …
  4. Modelio for Software and System Engineering: a UML editor with 20 years' history, covering CloudML, SysML, MARTE, code generation, documentation, and teamwork. Available as open source at Modelio.org.
  5. MODEL-DRIVEN DEVELOPMENT
  6. It is all about models … Starting with UML. Requirements: UML Use Cases. Architecture: UML Components and Classes. Design: Refined Classes or Domain Specific Language. Implementation: Code generation (Java, C++, Frameworks).
  7. Model = Code
  8. Typical example: Control system for a frigate. 800+ components, developed by 100+ engineers, 1M+ LOC. MDD fosters productivity and quality through code generation, component reuse, tracing, and automation.
  9. Curious DSL example: Ruby on Rails. Haml and the HTML it produces:
          %br{:clear => 'left'}    →  <br clear="left"/>
          %p.foo Hello             →  <p class="foo">Hello</p>
          %p#foo Hello             →  <p id="foo">Hello</p>
          .foo                     →  <div class="foo">...</div>
          #foo.bar                 →  <div id="foo" class="bar">...</div>
     Cucumber and Capybara:
          Feature: User can manually add movie
          Scenario: Add a movie
            Given I am on the RottenPotatoes home page
            When I follow "Add new movie"
            Then I should be on the Create New Movie page
            When I fill in "Title" with "Men In Black"
            And I should see "Men In Black"
  10. What do we get from MDD? Pros: Design once, deploy everywhere! Write your transformation once, transform anything! Cons: Transformations are hard to write… How to make sure they are CORRECT, i.e. is there any data/semantic loss?
  11. BIG DATA
  12. Volume, variety, velocity: (1) E-mails sent every second: 2.9 million; (2) Video uploaded to YouTube every minute: 25 hours; (3) Data processed by Google every day: 24 petabytes; (4) Tweets per day: 50 million; (5) Products ordered on Amazon per second: 73 items.
  13. Only 0.5% of data is analyzed. In 2012, 2,837 EB were generated and just 0.5% was actually analyzed. That still amounts to about 14 EB (14.185 million terabytes). Source: IDC & EMC
  14. SQL or Hadoop (inspired by Aaron Cordova). Do you have some data and a problem to solve? If yes: does the data fit in memory? If yes: tons of options, no database or Hadoop needed. If no: does the data fit on a single RAID array? If yes: is the problem solvable with SQL? If yes: use MySQL. If no: can you program? If yes: write a program. If no: dead end.
  15. SQL or Hadoop (continued; inspired by Aaron Cordova). When the data does not fit on a single RAID array, the flowchart asks: Have lots of money? Solvable with Oracle SQL? Do you have a PhD in parallel programming? Solvable using MapReduce? Can you program MapReduce jobs? Outcomes: buy a SAN and use Oracle; roll your own MPI solution; write MapReduce on Hadoop; dead end.
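Read as code, the two flowchart slides amount to a chain of yes/no questions. The sketch below is one plausible reading in Java, not a transcription of the original chart: the class, field, and method names are hypothetical, and the branch order for data that exceeds a single RAID array is an assumption, because the flattened slide text does not preserve which answer leads where.

```java
/** Sketch of the "SQL or Hadoop" flowchart; names and some edges are assumptions. */
final class SqlOrHadoop {

    /** Answers to the flowchart questions (hypothetical holder; all default to "no"). */
    static final class Answers {
        boolean fitsInMemory;
        boolean fitsOnSingleRaidArray;
        boolean solvableWithSql;
        boolean canProgram;
        boolean lotsOfMoney;
        boolean solvableWithOracleSql;
        boolean phdInParallelProgramming;
        boolean solvableWithMapReduce;
        boolean canProgramMapReduce;
    }

    static String advise(Answers a) {
        if (a.fitsInMemory) {
            return "Tons of options; no database or Hadoop needed";
        }
        if (a.fitsOnSingleRaidArray) {
            if (a.solvableWithSql) {
                return "Use MySQL";
            }
            return a.canProgram ? "Write a program" : "Dead end";
        }
        // Data exceeds a single RAID array (second slide; assumed branch order).
        if (a.lotsOfMoney && a.solvableWithOracleSql) {
            return "Buy a SAN, use Oracle";
        }
        if (a.phdInParallelProgramming) {
            return "Roll your own MPI solution";
        }
        if (a.solvableWithMapReduce && a.canProgramMapReduce) {
            return "Write MapReduce on Hadoop";
        }
        return "Dead end";
    }

    public static void main(String[] args) {
        Answers a = new Answers();
        a.solvableWithMapReduce = true;
        a.canProgramMapReduce = true;
        System.out.println(advise(a)); // prints "Write MapReduce on Hadoop"
    }
}
```

Most paths through the chart depend on what the team can program rather than on the data itself, which is the point the next slide makes about skills.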
  16. Challenges: Hadoop MapReduce is the major trend. Success relies on the programming skills of the personnel. Many problems are not solvable with Hadoop. Real-time? MPI for high-performance computing is an option when you have lots of money and a PhD.
  18. JUNIPER integrates Big Data technologies over MPI. [Diagram labels: documents, streams, and DBs; data processing stages 1 to N; business intelligence, analytical DBs, visualization. Data processing in JUNIPER: stages S1, S2, S3 connected over MPI; analytical DBs; FPGA-enabled nodes; Hadoop; HPC.]
  19. Modelling in Juniper. [Diagram labels: Models (high-level architecture: nodes, programs, streams…; real-time constraints); Code Generation; Java code (+ MPI initialization, communication, etc.); Reverse Engineering; Model Export; Schedulability Analysis Tool (in progress); Scheduling Advisor (in progress); Measurements & Advice; Configuration; Deployment Scripts (in progress).]
  20. Mapping Programming Model, UML and MARTE. [Diagram labels: JUNIPER Program, Channel, Cloud Node; columns for Programming Model, UML, and MARTE.]
  21. Modelling the application and real-time constraints. [Diagram labels: JUNIPER Programs; Big Data flow; real-time constraints (response time, bandwidth).]
  22. Modelling the hardware infrastructure at a high level. [Diagram labels: Cloud Node; CPU with 4 cores; Hard drive.]
  23. MPI code generation. [Diagram labels: JUNIPER Application Model; Code Generation.]
  24. PETAFUEL CASE STUDY
  25. Risk: $45 million in half a day
  26. MasterCard debit card approval within 4 seconds (petaFuel). [Diagram labels: Transactions Events Stream; Historical Data (30 GB); Basic Checks; Fraud Patterns Detection; Transaction Approval Decision.]
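The case-study slide names three stages (basic checks, fraud pattern detection, approval decision) and a 4-second budget. A minimal Java sketch of that shape follows, assuming the stages run in sequence and that a decision which overruns the budget is declined. Both assumptions, and every name and rule in the code, are hypothetical; the slides do not show petaFuel's actual logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class ApprovalPipelineSketch {

    enum Decision { APPROVED, DECLINED }

    /** Time budget taken from the slide: approval within 4 seconds. */
    static final long BUDGET_MILLIS = 4_000L;

    /**
     * Runs the checks in order. A failed check, or exceeding the time budget,
     * yields DECLINED (declining on overrun is an assumption of this sketch).
     */
    static <T> Decision decide(T transaction, List<Predicate<T>> checks) {
        long start = System.nanoTime();
        for (Predicate<T> check : checks) {
            if (!check.test(transaction)) {
                return Decision.DECLINED;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (elapsedMillis > BUDGET_MILLIS) {
                return Decision.DECLINED;
            }
        }
        return Decision.APPROVED;
    }

    public static void main(String[] args) {
        // Hypothetical rules; a transaction is reduced to an amount in cents.
        Predicate<Long> basicCheck = amount -> amount > 0;
        Predicate<Long> noFraudPattern = amount -> amount < 1_000_000L;
        List<Predicate<Long>> checks = Arrays.asList(basicCheck, noFraudPattern);

        System.out.println(decide(5_000L, checks)); // prints "APPROVED"
        System.out.println(decide(-1L, checks));    // prints "DECLINED"
    }
}
```

Checking the clock only after each stage cannot interrupt a stage that is itself too slow, which is one reason the deck treats response-time constraints as something to model and analyze rather than to test for at run time.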
  27. Juniper application model
  28. Deployment model
  29. MPI code generation (excerpt of generated code; "…" marks parts elided on the slide):
          public class EventProcessor {
              public static final int RANK = 1;

              public static IEventProcessor iEventProcessorImpl = new IEventProcessor() {
                  @Override
                  public void process(Event event) {
                      // Count events per key derived from the event timestamp.
                      String key = getKeyFromTimestamp(event.getTimestamp());
                      String value = keyValueStoreIKeyValueStore.find(key);
                      if (value == null) {
                          keyValueStoreIKeyValueStore.put(key, "1");
                      } else {
                          int count = Integer.parseInt(value);
                          keyValueStoreIKeyValueStore.put(key, "" + (count + 1));
                      }
                  }
                  …
              };
              …
              public static void main(final String[] args) {
                  MPI.Init(args);
                  …
                  MPI.Finalize();
              }
              …
          }
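The counting logic in the generated `process` method can be tried in isolation. The sketch below is a self-contained approximation, not JUNIPER output: `InMemoryStore` is a hypothetical stand-in for the key-value store on the slide, the per-minute bucketing in `getKeyFromTimestamp` is an assumption (the slide does not show that method), and the MPI calls are left out so the example runs without an MPI binding.

```java
import java.util.HashMap;
import java.util.Map;

final class EventCounterSketch {

    /** Hypothetical stand-in for the key-value store used on the slide. */
    static final class InMemoryStore {
        private final Map<String, String> data = new HashMap<>();

        String find(String key) {
            return data.get(key); // null when the key is absent
        }

        void put(String key, String value) {
            data.put(key, value);
        }
    }

    private final InMemoryStore store = new InMemoryStore();

    /** Assumption: events are bucketed per minute of a millisecond timestamp. */
    static String getKeyFromTimestamp(long timestampMillis) {
        return Long.toString(timestampMillis / 60_000L);
    }

    /** Same counting logic as the slide: the first event stores "1", later ones increment. */
    void process(long timestampMillis) {
        String key = getKeyFromTimestamp(timestampMillis);
        String value = store.find(key);
        if (value == null) {
            store.put(key, "1");
        } else {
            store.put(key, Integer.toString(Integer.parseInt(value) + 1));
        }
    }

    /** Returns the stored count for the bucket containing the timestamp, or null. */
    String countFor(long timestampMillis) {
        return store.find(getKeyFromTimestamp(timestampMillis));
    }

    public static void main(String[] args) {
        EventCounterSketch counter = new EventCounterSketch();
        counter.process(1_000L);
        counter.process(2_000L);
        counter.process(61_000L);
        System.out.println(counter.countFor(0L));      // prints "2"
        System.out.println(counter.countFor(60_000L)); // prints "1"
    }
}
```

Storing counts as strings mirrors the slide's `find`/`put` calls. A store with typed values would avoid the parse-and-format round trip on every event.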
  30. CONCLUSIONS
  31. Juniper trade-offs (each criterion: Hadoop vs. JUNIPER/MPI):
          Communication: HDFS file system (httpd) vs. HPC cluster interconnect (InfiniBand)
          Data flow: MapReduce vs. modeling + MPI comms
          Parallelization: automatic vs. manual, based on domain decomposition
          Response time guarantees: none vs. real-time for a single node
          Stages in multi-format: no vs. any (incl. Hadoop + FPGA)
          Hardware: commodity cluster vs. HP cluster
          Price: € vs. €€€
          Skills: + vs. ++++
          Customers: general audience vs. critical systems
  32. Prospects – more work. Work in progress: UML-based language (MPI communication, timing properties, deployment); petaFuel case study. Future work: modelling payload; integrating schedulability; running final evaluations; final release.
  33. Questions? Andrey Sadovykh, Marcos Almeida, SOFTEAM | ModelioSoft, {name.surname}@softeam.fr (for your questions). SOFTEAM R&D Web Site: http://rd.softeam.com; Modelio Web Site: http://www.modelio.org; JUNIPER Web Site: http://www.juniper-project.org