BigDataCamp 2011

Published in: Technology, Business

  1. BigDataCamp 2011, Chris K Wensel
  2. Concurrent, Inc.
     • Founded in Spring of 2008
     • Cascading core development
     • Support, Training, & OEM Licensing
  3. So What is Cascading?
  4. In a Nutshell
     [Diagram: Processing API, Integration API, Scheduler API, Physical Planner, Scheduler]
     Alternative Java API to MapReduce with built-in Processing Planner and Workload Scheduler
  5. On Many Platforms
     [Diagram: Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
     • Apache Hadoop
     • Amazon Elastic MapReduce
     • MapR
     • EMC/GreenPlum
     • and more**
  6. But How is Cascading Used?
  7. RazorFish/BestBuy
     [Diagram: Java (unit, regression, & integration testing), Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
     • E-Commerce visitor/customer behavior classification
     • Rule processing against proprietary logs
     • Backend system integration
  8. FlightCaster
     [Diagram: JVM Language/DSL (scripting, ad-hoc queries, etc.), Logical Planner, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
     • They predict flight delays 6 hrs in advance
     • Created own API/DSL in Clojure
     • Used to build predictive models
  9. Etsy
     [Diagram: JVM Language/DSL (scripting, ad-hoc queries, etc.), Logical Planner, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
     • Online retailer
     • Forked own API/DSL in JRuby
       • Cascading.JRuby, available on GitHub
  10. What
     • User behavior on site
     • Data driven site features
       • Taste Test
       • Facebook gift recommender
       • Suggested Shops
       • Top Query List
       • plus many more on the way
  11. BackType
     [Diagram: JVM Language/DSL (scripting, ad-hoc queries, etc.), Logical Planner, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
     • Marketing intelligence
     • Created Cascalog
       • an API/DSL in Clojure, available on GitHub
  12. Ion Flux
     [Diagram: Java (unit, regression, & integration testing), Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
     • Gene sequencing
  13. Who Else?
     http://concurrentinc.com/casestudies/
  14. How is Cascading Different?
  15. Pig/Hive
     [Diagram: Query Syntax, Extension API, Logical Planner, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
     Great for ad-hoc queries, but hard to operationalize
  16. Oozie/Azkaban
     [Diagram: Scheduler Syntax, Processing API, Integration API, Scheduler API, Physical Planner, Scheduler, Platform]
     • Great for gluing command line apps together
     • JVM scripting language + Cascading is less brittle and with more degrees of freedom
  17. But They are Complementary
     • No reason Oozie (or Talend) can’t be used to drive Cascading apps
     • No reason Cascading can’t drive raw MR/Pig/Hive processes (see Riffle)
  18. Architecture isn’t Innovation
     [Diagram labels: collection cleansing processing delivery event data signal info knowledge normalization scoring mining]
     The point of computing systems is to make data more valuable.
     Everything else is an implementation detail.
     Copyright Concurrent, Inc. 2011. All rights reserved.
  19. Cascading 2.0
     • Removed dependencies on Hadoop
     • Improved Processing Planner architecture
     • Improved integration APIs
  20. To Do
     • Support more platforms, including in-memory stream processing
     • Make Planner more intelligent and leverage more complex data flow topologies
     • Integrate with more systems and applications
  21. We are Hiring
     http://www.concurrentinc.com/careers/
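
Slide 4 mentions a built-in workload scheduler without showing what one does. As a rough illustration only, the plain-Java sketch below orders named steps so that each runs after the steps it depends on (Kahn's topological sort). It is not Cascading's API or implementation: the class `SchedulerSketch`, the method `executionOrder`, and the step names used afterwards are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: NOT Cascading's API or implementation.
final class SchedulerSketch {

    // deps maps each step name to the steps that must finish before it.
    // Returns one valid execution order, or throws if the steps form a cycle.
    static List<String> executionOrder(Map<String, List<String>> deps) {
        // Sorted maps keep the starting order deterministic for a given input.
        Map<String, Integer> pending = new TreeMap<>();          // unmet dependency count per step
        Map<String, List<String>> dependents = new TreeMap<>();  // upstream step -> downstream steps
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String upstream : e.getValue()) {
                pending.putIfAbsent(upstream, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(upstream, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Steps with no unmet dependencies can run immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String step = ready.remove();
            order.add(step);
            // Finishing a step may unblock the steps that were waiting on it.
            for (String next : dependents.getOrDefault(step, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }

        if (order.size() != pending.size()) {
            throw new IllegalArgumentException("dependency cycle detected");
        }
        return order;
    }
}
```

Borrowing the stage names from slide 18 as step names, `SchedulerSketch.executionOrder(Map.of("cleanse", List.of("collect"), "process", List.of("cleanse"), "deliver", List.of("process")))` returns the order collect, cleanse, process, deliver. A real scheduler would also have to handle parallel execution, failures, and retries, which this sketch leaves out.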
