Aerospike for machine learning

Using Aerospike and Machine Learning
by Brian Bulkowski

  1. Using Aerospike and Machine Learning. Brian Bulkowski, CTO, Founder, @bbulkow
  2. What is Aerospike? (© 2016 Aerospike Inc. All rights reserved.)
     Large-scale DHT database (10B++ objects, 100T++, O(1) get / put) with queries, data structures, UDF, and fast clients, on Linux.
     ■ High availability clustering and rebalancing (proven five 9s, no load balancer)
     ■ Very high performance C code for reads and writes (2M++ TPS from Flash, 4M++ TPS from DRAM, per server)
     ■ KVS++ provides query, UDF, table/columns, aggregations, SQL
     ■ Direct attach storage; persistence through replication and Flash
     ■ Cloud-savvy: runs with EC2, GCE, and others; Docker, and more
     ■ Dual license: Open Source for devs, Enterprise for deployment
  3. Architecture Overview: Flash-based system of engagement
     Diagram labels: Legacy Database (Mainframe); XDR; Decisioning Engine; Data Warehouse / Data Lake; Legacy RDBMS; HDFS based; Business Transactions (web views, payments, mobile queries, recommendation, and more); High Performance NoSQL ("real-time big data", "decisioning").
     500 business transactions per sec × 5000 calculations per sec = 2.5 M database transactions per sec
  4. Example: Fraud Prevention
     Diagram labels: Credit card processing system; Fraud detection & protection app; Account behavior; Account statistics; Static data; Historical data; Rules (Rule 1, Rule 2, Rule 3, …). Rule 1: passed ✔. Rule 2: passed ✔. Rule 3: failed ✗.
     Challenge
     ■ Overall SLA 750 ms
     ■ Loss of business due to latency
     ■ Every credit card transaction requires hundreds of DB reads/writes
     Need to scale reliably
     ■ 10 → 100 TB
     ■ 10B → 100B objects
     ■ 200k → 1 million+ TPS
     Selected NoSQL
     ■ Built for Flash
     ■ Predictable low latency at high throughput
     ■ Immediate consistency, no data loss
     ■ Cross data center (XDR) support
     ■ 20-server cluster
     ■ Dell 730xd with 4 NVMe SSDs
  5. Aerospike vs Cassandra (2016)
     ■ 3-node cluster, Intel S3700 SSDs
     ■ Religiously followed all DataStax recommendations
     ■ Standard YCSB; includes instructions to reproduce for your workload
     ■ http://www.aerospike.com/blog/comparing-nosql-databases-aerospike-and-cassandra/
  6. Aerospike vs Cassandra (2016)
  7. Aerospike vs Cassandra (2016)
  8. Online Learning: Leveraging Aerospike to Power Real-time Analytics
  9. Nielsen Marketing Cloud Webinar
     Brent Keator, VP Infrastructure, Nielsen Marketing Cloud
     Kevin Lyons, Senior VP Data Science, Nielsen Marketing Cloud
     YouTube: Nielsen Marketing Cloud Aerospike Webinar 2016
     Aerospike: https://aerospike.com/webinars
  10. Models that build profitable marketing audiences at scale. Finding more of your best customers: high-income business professional
  11. The Modeling Process, simplified
  12. The Challenge
     2012: 30 - 40 models leveraging billions of events, creating 100 million+ scores
     2015: over 1000 models leveraging trillions of events, creating 150 billion+ scores / day
  13. The Opportunity. What we really need:
     A system that creates as many models as we want, when we want them, and that dynamically adapts in real time to changing conditions
     ▪ Automatically creates, validates, ships, and monitors models, with a capacity that scales to tens of thousands of models
  14. In other words, we simply need …
  15. Introducing Online Learning
     Batch (creation): given a complete data set, a batch model is created in its entirety, all at once
     Online Learning (evolution): online models evolve and adapt over time, in reaction to a changing environment, with each and every event
  16. Batch Models (Offline) vs. Online Learning
     Batch Models (Offline): large-scale data storage; large-scale data movement; painful data aggregation; lots of manual everything. Harder to build models, but easier to evaluate.
     Online Learning: limited data storage, mostly for monitoring; event-level data streams; light data aggregation; lots of automatic everything. Easier to build, but harder to evaluate (and support).
  17. Some Techno Mumbo Jumbo: stochastic gradient descent with L1 regularization
     ● Outperformed both L2 and Elastic Net
     ● Leverages small ("micro") batches
     ● Validates and monitors models in real time
     ● Alerts the team when models are not behaving
  18. Technical Solutions: How do we do it? (eXelate.com, @eXelate)
  19. Serving Layer
     eXpresso Serving Cluster: 10B+ events/day, 300+ nodes across 4 data centers
     eXtream Modeling Cluster: 160B models/day, 100+ nodes across 4 data centers
     JGroups Distributed Messaging
  20. Our Aerospike "Citrusleaf" Use-Cases
     Unique User DataStore: 53 servers across 4 data centers. Specs: Memory: 512GB; CPU: e5-2620v2 (dual-socket); Disk: Intel S3710 (13-15 1.2TB SSDs); Network: aggregated 10GB NICs; 2 namespaces.
     Online Learning (Models DataStore): 9 servers across 3 data centers. Specs: Memory: 32GB; CPU: e5-2620 (dual-socket); Disk: 1-240GB SSDs; Network: aggregated 1GB NICs; 1 namespace.
  21. The Online Learning Challenge
     Batch Models (Offline): batch; predefined ratio; predefined feature selection; one-time validation
     Online Learning: streaming; downsampling; automated feature selection; ongoing data cleaning; ongoing validation
  22. eXtream as a Framework for Online Learning. Why it works:
     ● All necessary data already exists in eXtream
     ● The cluster's processing resources can be better utilized
     ● eXtream addresses most performance / scalability requirements
     ● Scoring mechanism already exists
  23. Online Learning Flow
  24. Events Classification
     ● Labeling mechanism: customer-defined target audience
  25. Dataset Preparation
     ● Downsampling mechanism
     ● Burst tolerance
     ● Duplicate entries
  26. Features Selection
     ● Blacklist
     ● Whitelist
     ● Automatic tuning
  27. Model Validation
     ● Sliding window of recent events
     ● 60/40 not-converted/converted ratio
     ● Various accuracy metrics (lift, precision, recall, confusion matrix)
     ● Decide if the model is ready for making predictions
  28. Predictions Mechanism
     ● Two phases (Scoring, Re-code)
     ● Scale vs. accuracy tradeoff
  29. Scalability / Performance
     Thousands of concurrent models: training, validation, scoring
     High throughput: billions of training events per day
  30. Scalability / Performance: Why do we need it?
     ● Store the models in one common place
     ● Persistence
     ● Built-in replication
  31. Online Learning Datastore Replication
     Diagram labels: XDR Replication Map; Inter-DC Network; Bare-Metal Cloud; LVS/GSLB/XDR = HA
  32. Monitoring: Why do we need it?
     ● Thousands of models automatically created by users
     ● Some models won't converge
  33. Monitoring: Real Time
  34. Monitoring: Aggregation
  35. Monitoring: DS Bot
  36. Thank You! (eXelate.com, @eXelate)
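
Slide 17 names stochastic gradient descent with L1 regularization on small ("micro") batches but shows no code. The following is a minimal sketch of that idea, not the presenters' implementation: it assumes a logistic-regression model, applies the L1 penalty with a soft-thresholding (proximal) step, which is only one common way to do it, and trains on synthetic data. The class `OnlineL1Logistic`, the helper `make_stream`, and all hyperparameters are hypothetical.

```python
import math
import random


def soft_threshold(w, t):
    """Shrink w toward zero by t; the proximal step for an L1 penalty."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class OnlineL1Logistic:
    """Logistic regression updated one micro-batch at a time (hypothetical)."""

    def __init__(self, n_features, lr=0.1, l1=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.l1 = l1

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def partial_fit(self, batch):
        """batch: list of (features, label) pairs with label in {0, 1}."""
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in batch:
            err = self.predict_proba(x) - y  # log-loss gradient w.r.t. the score
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        n = len(batch)
        self.b -= self.lr * grad_b / n
        for j in range(len(self.w)):
            stepped = self.w[j] - self.lr * grad_w[j] / n
            # The L1 step pulls small weights to exactly zero (sparse model).
            self.w[j] = soft_threshold(stepped, self.lr * self.l1)


def make_stream(n_events, seed=0):
    """Synthetic events: only feature 0 carries signal; features 1-4 are noise."""
    rng = random.Random(seed)
    events = []
    for _ in range(n_events):
        x = [rng.gauss(0, 1) for _ in range(5)]
        y = 1 if rng.random() < sigmoid(3.0 * x[0]) else 0
        events.append((x, y))
    return events


model = OnlineL1Logistic(n_features=5)
stream = make_stream(4000)
for start in range(0, len(stream), 20):  # micro-batches of 20 events
    model.partial_fit(stream[start:start + 20])
```

Because each update touches only the current micro-batch, the model never needs the full event history, which matches the "limited data storage" point on slide 16.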
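
Slide 25 lists downsampling and duplicate entries as dataset-preparation concerns. A rough sketch of those two steps might look like the following. The function name `prepare_dataset`, the `(event_id, features, label)` tuple layout, and the use of the 60/40 not-converted/converted mix from slide 27 as the downsampling target are all assumptions made for illustration; burst tolerance is not covered here.

```python
import random


def prepare_dataset(events, neg_per_pos=1.5, seed=0):
    """Drop duplicate events, then downsample the negative class.

    events: iterable of (event_id, features, label), label in {0, 1}.
    neg_per_pos=1.5 corresponds to a 60/40 negative/positive mix.
    """
    seen = set()
    positives, negatives = [], []
    for event_id, features, label in events:
        if event_id in seen:  # duplicate entry: keep only the first copy
            continue
        seen.add(event_id)
        target = positives if label == 1 else negatives
        target.append((event_id, features, label))

    # Keep at most neg_per_pos negatives for every positive.
    keep = min(len(negatives), int(len(positives) * neg_per_pos))
    rng = random.Random(seed)
    dataset = positives + rng.sample(negatives, keep)
    rng.shuffle(dataset)
    return dataset


# 1000 synthetic events, 10% positive, plus 50 re-delivered duplicates.
raw = [(i, [float(i)], 1 if i % 10 == 0 else 0) for i in range(1000)]
raw += raw[:50]
prepared = prepare_dataset(raw)
n_pos = sum(1 for _, _, label in prepared if label == 1)
n_neg = len(prepared) - n_pos
```

In a streaming setting the `seen` set would need a bound (for example a time-limited cache), since it otherwise grows with every event.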
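
Slide 27 describes validating each model over a sliding window of recent events with lift, precision, recall, and a confusion matrix, and then deciding whether the model is ready to make predictions. The sketch below illustrates one way to do that. The class `SlidingValidator`, the definition of lift as precision divided by the base conversion rate, and the readiness thresholds are assumptions, not values taken from the slides.

```python
from collections import deque


class SlidingValidator:
    """Tracks accuracy metrics over the most recent events (hypothetical)."""

    def __init__(self, window=1000, threshold=0.5):
        # deque(maxlen=...) silently discards the oldest event when full.
        self.events = deque(maxlen=window)
        self.threshold = threshold

    def add(self, score, label):
        self.events.append((score >= self.threshold, label == 1))

    def metrics(self):
        tp = sum(1 for pred, actual in self.events if pred and actual)
        fp = sum(1 for pred, actual in self.events if pred and not actual)
        fn = sum(1 for pred, actual in self.events if not pred and actual)
        tn = sum(1 for pred, actual in self.events if not pred and not actual)
        total = tp + fp + fn + tn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        base_rate = (tp + fn) / total if total else 0.0
        # Lift: how much better than random targeting the predicted segment is.
        lift = precision / base_rate if base_rate else 0.0
        return {
            "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
            "precision": precision,
            "recall": recall,
            "lift": lift,
        }

    def is_ready(self, min_precision=0.6, min_lift=1.2):
        """Assumed readiness rule: both thresholds must be met."""
        m = self.metrics()
        return m["precision"] >= min_precision and m["lift"] >= min_lift


validator = SlidingValidator(window=5)
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.2, 0), (0.1, 1), (0.6, 1), (0.3, 0)]
for score, label in scored:
    validator.add(score, label)
result = validator.metrics()  # covers only the last 5 events
```

A model that fails `is_ready` for a sustained period is the kind of non-converging case that the monitoring slides (32 to 35) are meant to surface.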
