Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.
1 de 45

Building Apps with Distributed In-Memory Computing Using Apache Geode

3

Compartir

Descargar para leer sin conexión

Slides from the Meetup Monday March 7, 2016 just before the beginning of #GeodeSummit, where we cover an introduction of the technology and community that is Apache Geode, the in-memory data grid.

Libros relacionados

Gratis con una prueba de 30 días de Scribd

Ver todo

Audiolibros relacionados

Gratis con una prueba de 30 días de Scribd

Ver todo

Building Apps with Distributed In-Memory Computing Using Apache Geode

  1. 1. Building Apps with Distributed In-Memory Computing using Apache Geode Nitin Lamba @nlamba9 (incubating) William Markito @william_markito
  2. 2. Introduction (Nitin) • WHAT? Overview & history • WHY? Relevance & Differentiators • HOW? Features & Basic Concepts • SEE! Quick start Hands-on (William) • LEARN: Advanced Concepts - Persistence, f(x), PDX, … • SHOW: Demos (Docker, PDX) Resources Q & A Agenda 2
  3. 3. Introduction Nitin 3
  4. 4. From GEM to GEODE… 4
  5. 5. A distributed, memory-based data management platform for data oriented apps that need: • high performance, scalability, resiliency and continuous availability • fast access to critical data sets • location-aware distributed data processing • event-driven data architecture What is GEODE? 5
  6. 6. High-level Architecture 6 Powerful app development kit • APIs: Java & REST • Adapters: Redis, Lucene*, Spark*, … Multiple persistence options • Filesystem, RDBMS or HDFS* • Sync: read-through, write-through • Async: write-behind Durable <K,V> cache/ store • Data replicated or partitioned • Redundant storage in-memory/ disk • Flexible data retention policiesÎ ! Locator Server Server Server Server +"""" "  $ % % % && & % % % % % % % % && A Peer-2-Peer Distributed System REST ! * Experimental and waiting community feedback
  7. 7. • 1000+ systems in production (real customers) • Cutting edge use cases Incubating but ROCK solid… 7 <2000 2004 2008 2012 2016 Early drivers • Data Volumes • Margins/ transactions • IT maintenance costs • Elasticity needs Real-time needs • Real-time response • Time to market needs • Flexible Data Models • Persistent+In-memory Global Data • Visibility across DC • Fast Ingest • Device to enterprise • Uptime (always on) Open Source! • Apache Incubation • Gemfire > Geode • M1 release • 1st Geode Summit Financial Services US DoD Trade Clearing Travel Portal Online Gambling Telcos Manufacturing Auto Insurance Payroll processing Rail systems
  8. 8. …with both SCALE and SPEED, … 8 40K Transactions per second 3TB Data in-memory 17B Records in-memory 120K Concurrent users
  9. 9. … and impacting a LOT of people! 9 China Railway
 Corporation Indian Railways 19% 17% 36% of the world population
  10. 10. Built for PERFORMANCE… 10 Operationspersecond 0 200,000 400,000 600,000 800,000 YCSB Workloads AReads AUpdates BReads BUpdates CReads DInserts DReads FReads FUpdates Cassandra Geode
  11. 11. …and horizontal, consistent SCALABILITY! 11 Horizontal scaling for reads, consistent latency and CPU 0 4.5 9 13.5 18 Speedup 0 1.25 2.5 3.75 5 Server Hosts 2 4 6 8 10 speedup latency (ms) CPU % • Scaled from 256 clients and 2 servers to 1280 clients and 10 servers • Partitioned region with redundancy and 1K data size
  12. 12. • Minimize copying • Minimize contention points • Run user code in-process • Partitioning & parallelism • Avoid disk seeks • Automated benchmarks What makes it go FAST? 12
  13. 13. • Cache • Region • Member • Client Cache • Persistence • Functions • Events & Listeners • High Availability • Serialization Let’s talk about a few (basic) CONCEPTS… 13
  14. 14. • In-memory storage and management for your data • Configurable through XML, Java API or CLI • Collection of Region What is a CACHE? 14 Region Region Region Cache JVM
  15. 15. • Distributed java.util.Map on steroids (Key/Value) • Consistent API regardless of where or how data is stored • Observable (reactive) • Highly available, redundant on cache Member (s). What is a REGION? 15 Region Cache java.util.Map JVM Key Value K01 May K02 Tim
  16. 16. • Local, Replicated or Partitioned • In-memory or persistent • Redundant • LRU • Overflow Region: Types & Options 16 Region Cache java.util.Map JVM Key Value K01 May K02 Tim Region Cache java.util.Map JVM Key Value K01 May K02 Tim LOCAL LOCAL_HEAP_LRU LOCAL_OVERFLOW LOCAL_PERSISTENT LOCAL_PERSISTENT_OVERFLOW PARTITION PARTITION_HEAP_LRU PARTITION_OVERFLOW PARTITION_PERSISTENT PARTITION_PERSISTENT_OVERFLOW PARTITION_PROXY PARTITION_PROXY_REDUNDANT PARTITION_REDUNDANT PARTITION_REDUNDANT_HEAP_LRU PARTITION_REDUNDANT_OVERFLOW PARTITION_REDUNDANT_PERSISTENT PARTITION_REDUNDANT_PERSISTENT_OVERFLOW REPLICATE REPLICATE_HEAP_LRU REPLICATE_OVERFLOW REPLICATE_PERSISTENT REPLICATE_PERSISTENT_OVERFLOW REPLICATE_PROXY
  17. 17. • Durability • WAL for efficient writing • Consistent recovery • Compaction Persistent Regions 17 Modify k1->v5 Create k6->v6 Create k2->v2 Create k4->v4 Oplog2.crf Member 1 Modify k4->v7Oplog3.crf Put k4->v7 Region Cache java.util.Map JVM Key Value K01 May K02 Tim Region Cache java.util.Map JVM Key Value K01 May K02 Tim Server 1 Server N
  18. 18. • A process that has a connection to the system • A process that has created a cache • Embeddable within your application What is a MEMBER? 18 Client Locator Server
  19. 19. • A process connected to the Geode server(s) • Can have a local copy of the data • Run OQL queries on local data • Can be notified about events on the servers What is a CLIENT CACHE? 19 Application GemFire Server Region Region RegionClient Cache
  20. 20. • Clone & Build • • Start Services • Create & Monitor Region How to START? Easy as !! 20 git clone https://github.com/apache/incubator-geode cd incubator-geode
 ./gradlew build -Dskip.tests=true cd gemfire-assembly/build/install/apache-geode ./bin/gfsh gfsh> start locator --name=locator gfsh> start server --name=server gfsh> create region --name=myRegion —type=REPLICATE gfsh> start [pulse | jconsole] 1 2 3 ' 1 2 3
  21. 21. Hands On William 21
  22. 22. • Cache • Region • Member • Client Cache • Persistence • Functions • Events & Listeners • High Availability • Serialization More (advanced) CONCEPTS… 22
  23. 23. Persistence - Shared Nothing 23 Server 3Server 2Server 1
  24. 24. Persistence - Shared Nothing 24 Server 3Server 2Server 1 B1 B3 B2 B1 B3 B2 Primary Secondary
  25. 25. Persistence - Shared Nothing 25 Server 3Server 2Server 1 B1 B3 B2 B1 B3 B2 Primary Secondary
  26. 26. Persistence - Shared Nothing 26 Server 3Server 2Server 1 B1 B3 B2 B1 B3 B2 Primary Secondary
  27. 27. Persistence - Shared Nothing 27 Server 3Server 2Server 1 B1 B3 B2 B1 B3 B2 Primary Secondary
  28. 28. Persistence - Shared Nothing 28 Server 3Server 2Server 1 B1 B3 B2 B1 B3 B2 Primary Secondary B3 B2 Server 1 waits for others when it starts
  29. 29. Persistence - Shared Nothing 29 Server 3Server 2Server 1 B1 B3 B2 B1 B3 B2 Primary Secondary Fetches missed operations on restart
  30. 30. Persistence - Operational Logs 30 Create k1->v1 Create 
 k2->v2 Modify
 k1->v3 Create 
 k4->v4 Modify k1->v5 Create 
 k6->v6 Member 1 Put k6->v6 Oplog2.crf Oplog1.crf Append to operation log
  31. 31. Persistence - Operational Logs: Compaction 31 Create k1->v1 Create 
 k2->v2 Modify
 k1->v3 Create 
 k4->v4 Modify k1->v5 Create 
 k6->v6 Member 1 Put k6->v6 Oplog2.crf Oplog1.crf Append to operation log Copy live data forward
  32. 32. • Used for distributed concurrent processing 
 (Map/Reduce, stored procedure) • Highly available • Data oriented • Member oriented Functions 32 Submit (f1) f1 , f2 , … fn Execute
 Functions
  33. 33. Functions 33 Server Server FunctionService.onRegion.withFilter.execute ResultCollector.getResult Server Distributed System execute Server Server 6 1 result execute execute result result 2 5 3 4 3 4 Server Partitioned Region Data Store - X Partitioned Region Data Store - Y Partitioned Region Data Store - Z Partitioned Region Data Accessor Partitioned Region Data Accessor filter = Keys X, Y Client Region
  34. 34. • Register Interest • Individual Keys OR RegEx for Keys • Updates Local Copy • Examples: • region.registerInterest(“key-1”); • region1.registerInterestRegex(“[a-z]+“); • Continuous Query • Receive Notification when Query condition met on server • Example: • SELECT * FROM /tradeOrder t WHERE t.price > 100.00 Can be DURABLE Events & Notifications 34
  35. 35. • CacheWriter / CacheListener • AsyncEventListener (queue / batch) • Parallel or Serial • Conflation Listeners 35
  36. 36. High Availability 36
  37. 37. Fixed or Flexible schema? 37 id name age pet_id or { id : 1, name : “Fred”, age : 42, pet : { name : “Barney”, type : “dino” } }
  38. 38. Portable Data eXchange (PDX) 38 C#, C++, Java, JSON No IDL, no schemas, no hand-coding Schema evolution (Forward and Backward Compatible) * domain object classes not required | header | data | | pdx | length | dsid | typeid | fields | offsets |
  39. 39. Efficient for queries 39 { id : 1, name : “Fred”, age : 42, pet : { name : “Barney”, type : “dino” } } SELECT p.name FROM /Person p WHERE p.pet.type = “dino” single field deserialization
  40. 40. But HOW to serialize data? 40 Benchmark: https://github.com/eishay/jvm-serializers
  41. 41. Schema Evolution 41 Member A Member B Distributed Type Definitions v2v1 Application #1 Application #2 v2 objects preserve data from missing fields v1 objects use default values to fill in new fields PDX provides forwards and backwards compatibility, no code required
  42. 42. Demo (Docker, PDX, …) 42
  43. 43. Code • New features • Bug fixes • Writing tests Documentation • Wiki • Web site • User guide How to CONTRIBUTE? 43 Community • Join the mailing list • Ask or answer • Join our HipChat • Become a speaker • Finding bugs • Testing an RC/Beta
  44. 44. Website http://geode.incubator.apache.org/ JIRA https://issues.apache.org/jira/browse/GEODE Wiki cwiki.apache.org/confluence/display/GEODE GitHub https://github.com/apache/incubator-geode Mailing lists mail-archives.apache.org/mod_mbox/incubator-geode-dev/ Where to BEGIN? 44
  45. 45. 45 Thank you! http://geode.incubator.apache.org https://github.com/Pivotal-Open-Source-Hub

×