Realtime streaming architecture in INFINARIO

Our experience with real-time analyses on a never-ending stream of user events. Discusses the Lambda and Kappa architectures, Apache Kafka, and our own approach.

Published in: Data & Analytics


  1. How to analyze billions of events in real time? Lambda architecture for real-time streaming analytics. Jozo.Kovac@infinario.com, Co-Founder & Product Manager
  2. Agenda
     • Goals & requirements
     • Design patterns for streaming analytics
       – General idea
       – Lambda
       – Kappa
     • INFINARIO backend
     • Discussion
  3. Example: Let's build a funnel fast
  4. Requirements
     • VELOCITY – Process a never-ending stream of “events” in real time
     • VARIETY AT SPEED – Analyses! Not just predefined reports
     • VOLUME – Be able to reprocess a stream; retain data
     • RELIABILITY – Never lose an event
     • AVAILABILITY – Avoid downtime
  5. DESIGN PATTERNS FOR REAL-TIME STREAMING ANALYTICS: LET'S LOOK OUTSIDE
  6. Real Time Streaming Architecture (diagram; layers in order)
     • Source Systems – Sources: Syslog, Machine Data, External Streams, Other
     • Data Collection – Flume / Custom: Agent A, Agent B, Agent N
     • Messaging System – Kafka: Topic A, Topic B, Topic N
     • Real Time Processing – Storm: Topology A, Topology B, Topology N
     • Storage – Search: Elastic Search / Solr; Low Latency: NoSQL, HBase; Historic: Hive / HDFS
     • Access – Web Services: REST API; Web Apps; Analytic Tools: R / Python; BI Tools; Alerting Systems
  7. Apache Kafka
     • Publish-subscribe messaging for real-time feeds
     • Retains data for a configurable period of time
     • Immutable message queue (events)
     • High throughput, low latency
  8. Lambda Architecture (diagram)
     • New Data – Data Stream
     • Batch Layer – All Data, Pre-compute Views
     • Speed Layer – Stream Processing, Real Time View
     • Serving Layer – Batch View, Batch View
     • Data Access – Query
     http://strataconf.com/big-data-conference-ca-2015/public/schedule/detail/38774
  9. Components for Lambda (diagram): batch layer components, speed layer components, serving layer components. http://lambda-architecture.net/
  10. Lambda pros & cons
     • Pros
       – Combines real-time & batch processing
       – Retains input data unchanged
       – Allows the data to be reprocessed
       – Stores intermediate stages
     • Cons
       – 2 apps in 2 languages that do the same thing
       – Implement, maintain & debug the code twice
       – Say goodbye to system-specific features
     http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
  11. Kappa Architecture
     • Diagram: Data Source, Data Stream, Stream Processing System (Job Version n, Job Version n + 1), Serving DB (Output table n, Output table n + 1), Data Access, Query
     1. Use Kafka, which retains the full log of data for reprocessing and allows multiple subscribers.
     2. Reprocessing: a new instance of the processing job processes the log from the start and outputs to a new table.
     3. When the second job has caught up, switch the application to read from the new table.
     4. Stop the old version of the job, and delete the old output table.
     http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
  12. Kappa pros & cons
     • Pros
       – Allows people to develop, test, debug, and operate their systems on top of a single processing framework
     • Cons
       – Needs 2x total storage (2 versions of results)
       – Requires a DB that handles high-volume writes
  13. INFINARIO Architecture (now) (diagram): EVENT API, EVENT STREAM, IN MEMORY PROCESSING (IMF™), PERSISTENT STORAGE (NoSQL), QUERIES, LOAD HISTORY AFTER RESTART
  14. IMF™
     • “In-Memory (event processing) Framework”
     • Collect, store and analyze events and players
     • Distributed & scalable
       – Built on NodeJS and C++
       – Nodes per CPU core & proportion of RAM
       – Provides API for analyses
  15. IMF Benchmarking (chart): time to calculate a funnel (s) versus number of events in the database (100,000 to 100,000,000), comparing BlinkBytes, Mongo, TokuMX, Postgres, MySQL and IMF. https://infinario.com/speedtest
  16. Our experience
     • Pros
       – It's lightning fast
       – Cheap reprocessing
       – No intermediate results
       – Easy life
       – Can process an already processed stream (“streaming”)
     • Cons
       – A code change or adding a new node requires an IMF reload
       – Reloads can take too long
       – PB of RAM in 2015 is a joke
  17. Reloads
     • NoSQL eats too many resources (CPU time)
     • Can potentially lose some events
     • Reload time (NoSQL to IMF) grows fast
     • Analyses are unavailable during reload
  18. INFINARIO is like this (diagram; layers in order)
     • Source Systems – Sources: SDKs, BULK, Frontend
     • Data Collection – Custom API: Agent A, Agent B, Agent N
     • Messaging System – (no component listed)
     • Real Time Processing – IMF: Topology A, Topology B, Topology N
     • Storage – Historic: NoSQL
     • Access – Web Services: REST API; Web Apps; Analytic Tools: R / Python; BI Tools; Alerting Systems
  19. Questions
     • Lambda?
     • Kappa?
     • Kafka?
     • Technologies for components?
  20. INFINARIO Architecture Updated (diagram): EVENT STREAM, KAFKA, RELOAD, IN MEMORY PROCESSING, PERSISTENT STORAGE, LOW LATENCY Access; two RAW DATA / HISTORY / VIEW groups; Ad hoc, DM, APP
  21. AngularJS developer wanted. Our designers work much faster than our frontend team. Could you help? Email us: jobs@infinario.com
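
The funnel example on slide 3 can be made concrete with a small sketch. It is not INFINARIO's IMF code; it only illustrates what "calculating a funnel" over an event stream means. The function name `funnel_counts`, the `(timestamp, user_id, event_name)` tuple layout, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count how many users reached each funnel step, in order.

    events: iterable of (timestamp, user_id, event_name) tuples (assumed layout).
    steps:  ordered list of event names that define the funnel.
    """
    progress = defaultdict(int)  # user_id -> number of funnel steps completed
    for _, user, name in sorted(events):  # replay events in time order
        done = progress[user]
        # A user only advances when the next expected step occurs.
        if done < len(steps) and name == steps[done]:
            progress[user] = done + 1

    counts = [0] * len(steps)
    for done in progress.values():
        for i in range(done):
            counts[i] += 1
    return counts


# Hypothetical sample stream.
events = [
    (1, "u1", "visit"), (2, "u2", "visit"), (3, "u1", "signup"),
    (4, "u3", "signup"), (5, "u1", "purchase"), (6, "u2", "signup"),
]
print(funnel_counts(events, ["visit", "signup", "purchase"]))  # [2, 2, 1]
```

User `u3` signs up without a prior visit and therefore never enters the funnel. Because the result is derived from raw events, changing the funnel definition only requires replaying the stream, which is the "analyses, not just predefined reports" requirement on slide 4.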
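
The four Kappa reprocessing steps on slide 11 can be sketched in memory as follows. This is a toy model under stated assumptions, not Kafka's API: `EventLog` stands in for a topic that retains the full log, `Job` stands in for one version of a stream-processing job with its own output table, and all names and sample events are hypothetical.

```python
class EventLog:
    """Append-only log that keeps every event (stand-in for a retained Kafka topic)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        return self._events[offset:]

    def __len__(self):
        return len(self._events)


class Job:
    """One version of a processing job, writing to its own output table."""

    def __init__(self, log, fn):
        self.log = log
        self.fn = fn        # maps an event to a (key, value) pair
        self.offset = 0     # each job tracks its own position in the log
        self.table = {}     # this job version's output table

    def poll(self):
        for event in self.log.read_from(self.offset):
            key, value = self.fn(event)
            self.table[key] = self.table.get(key, 0) + value
            self.offset += 1

    def caught_up(self):
        return self.offset == len(self.log)


# Step 1: the log retains all events and allows multiple subscribers.
log = EventLog()
for e in [{"user": "a", "amount": 5}, {"user": "b", "amount": 7}, {"user": "a", "amount": 1}]:
    log.append(e)

v1 = Job(log, lambda e: (e["user"], 1))  # version n: events per user
v1.poll()
serving = v1.table

# Step 2: a new job version reprocesses from the start into a new table.
v2 = Job(log, lambda e: (e["user"], e["amount"]))  # version n + 1: amount per user
v2.poll()

# Steps 3 and 4: once caught up, switch readers and drop the old job and table.
if v2.caught_up():
    serving = v2.table
    del v1

print(serving)  # {'a': 6, 'b': 7}
```

Both tables exist side by side until the switch, which is the "2x total storage" cost listed on slide 12.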