100 Million Events
    Eric Lubow
    @elubow
    elubow@simplereach.com




100 Million Events   Eric Lubow   @elubow
Overview
•   SimpleReach
•   100 Million Events
•   Finding Patterns in Your Data
•   What Mistakes?
•   Questions


Socially Intelligent



Size
•   100m events recorded per day and growing
•   500m pageviews per month and growing
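
To put the two figures above on a common scale, here is a rough back-of-the-envelope conversion to average per-second rates. It assumes traffic is spread evenly and that a month is 30 days; both are simplifications that are not in the slides, and real traffic will peak well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(count: int, days: int = 1) -> float:
    """Average rate per second for `count` items spread evenly over `days` days."""
    return count / (days * SECONDS_PER_DAY)


# 100m events per day (figure from the slide).
events_per_second = average_per_second(100_000_000)

# 500m pageviews per month; the 30-day month is an assumption.
pageviews_per_second = average_per_second(500_000_000, days=30)

print(f"events/s: {events_per_second:.0f}")        # events/s: 1157
print(f"pageviews/s: {pageviews_per_second:.0f}")  # pageviews/s: 193
```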




Right Tool For The Job




Why?
•   Heavier READ loads vs heavier write loads
•   Data relationships may be less important
•   Different aspects of a system have different requirements
•   Know your compromises




Cassandra
•   Large data volume ingestion
•   Really fast writes to many locations (eventual consistency)
•   Query by column groups within rows
•   Range queries in Hive (Slice predicate ranges)
•   Fault tolerant
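
The idea behind querying column groups and slice ranges can be sketched without a cluster: the columns in a row are kept sorted by name, so a range of names is a contiguous run that can be found by binary search. The sketch below is plain Python, not a Cassandra client; the row contents and the `slice_columns` helper are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def slice_columns(row, start, finish):
    """Return the (name, value) pairs whose names fall in [start, finish].

    `row` must be a list of (name, value) pairs sorted by name, which is
    how columns are ordered within a wide row.
    """
    names = [name for name, _ in row]
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return row[lo:hi]


# Hypothetical row: column names are integer timestamps, values are event types.
row = [(1000, "view"), (2000, "click"), (3000, "share"), (4000, "view")]

print(slice_columns(row, 2000, 3000))  # [(2000, 'click'), (3000, 'share')]
```

Because the slice is contiguous, the cost depends on the size of the range rather than on the width of the whole row.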




What Mistakes?
•   Manage how many servers?
•   Re-inventing the wheel (Helenus)
•   Composites Rock
•   Snapshots before drop keyspace
•   How many experts does it take to run a cluster?
•   You can tune Cassandra?!?

Server Management
•   Hand tools - AWS, csshx
•   Configuration Management
•   Monitoring and Alerting Tools
•   Performance
•   Security

[Image: Cluster SSH]




Helenus
•   Built Node.js driver for Cassandra
•   https://github.com/simplereach/helenus
•   CQL 2/3, Composite Column, Thrift Interface
•   Parallel querying (split up queries)
•   Fault tolerance and resilience
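
The "split up queries" bullet can be illustrated with a small sketch: break one large multi-key request into chunks, run the chunks concurrently, and merge the partial results. This is written in Python rather than Node.js and is not the Helenus API; `fetch_rows` is a hypothetical stand-in for a single query, and the chunk size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(keys, size):
    """Split `keys` into consecutive lists of at most `size` items."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def fetch_rows(keys):
    """Hypothetical stand-in for one multi-key query against the data store."""
    return {key: f"row-for-{key}" for key in keys}


def parallel_fetch(keys, chunk_size=2, max_workers=4):
    """Run one query per chunk concurrently and merge the partial results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(fetch_rows, chunked(keys, chunk_size)):
            results.update(partial)
    return results


rows = parallel_fetch(["a", "b", "c", "d", "e"])
print(sorted(rows))  # ['a', 'b', 'c', 'd', 'e']
```

Smaller chunks also limit how much work is lost when a single request fails and has to be retried.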


Data Patterns
•   Storage is cheap
•   Composites are WAY better than underscores
•   Beyond UTF8Type
•   Timestamps as LongType
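
One way to see why typed composites and numeric timestamps matter is to look at sort order, since columns are stored sorted by name. The sketch below uses Python tuples and integers as stand-ins for composite and long-typed column names; the names themselves are invented for illustration.

```python
# Underscore-joined names are compared as text, character by character,
# so the numeric part does not sort numerically.
string_names = ["page_9", "page_10", "page_100"]
print(sorted(string_names))     # ['page_10', 'page_100', 'page_9']

# A composite keeps each component typed, so the integer sorts as an integer.
composite_names = [("page", 9), ("page", 10), ("page", 100)]
print(sorted(composite_names))  # [('page', 9), ('page', 10), ('page', 100)]

# The same problem applies to timestamps stored as text instead of as longs.
text_timestamps = ["999", "1000"]
long_timestamps = [int(t) for t in text_timestamps]
print(sorted(text_timestamps))  # ['1000', '999']
print(sorted(long_timestamps))  # [999, 1000]
```

With text ordering, a range query over "pages 9 to 10" or "timestamps 999 to 1000" does not map to a contiguous run of columns; with typed components it does.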




Safety Mechanisms
•   Snapshots before dropping keyspaces
•   Authorization and authentication
•   (Limit) Direct access to the data store
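
As a sketch of the first bullet, the helper below builds a `nodetool snapshot` command with a descriptive tag, to be run on each node before a keyspace is dropped. It only constructs the command and does not execute it. The keyspace name, the tag format, and the fixed date are placeholders chosen for illustration; check the `nodetool` options against the Cassandra version you run.

```python
from datetime import datetime, timezone


def snapshot_command(keyspace, tag=None, now=None):
    """Build (but do not run) a nodetool command that snapshots `keyspace`."""
    now = now or datetime.now(timezone.utc)
    # Tag format is an arbitrary convention for this sketch.
    tag = tag or f"pre-drop-{keyspace}-{now:%Y%m%dT%H%M%SZ}"
    return ["nodetool", "snapshot", "-t", tag, keyspace]


# "events" is a hypothetical keyspace; the date is an arbitrary fixed value.
cmd = snapshot_command("events", now=datetime(2020, 1, 1, tzinfo=timezone.utc))
print(" ".join(cmd))  # nodetool snapshot -t pre-drop-events-20200101T000000Z events
```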




Expertise
•   What happens when you need help?
•   How do you become an expert?
•   What happens when you need more experts?




Tunables
•   Replication factor and read_repair_chance
•   Phi Convict and RPC timeout for AWS or DC separation
•   MAX_HEAP_SIZE and HEAP_NEWSIZE (Analytics vs Realtime)
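
The replication factor interacts with the consistency level chosen for reads and writes. A short arithmetic sketch, assuming a single data center: a quorum is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write only when the number of replicas written plus the number read exceeds the replication factor. The function names are illustrative, not part of any driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of `replication_factor` replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must share at least one replica with every write set."""
    return write_replicas + read_replicas > replication_factor


for rf in (1, 2, 3, 5):
    print(f"RF={rf} quorum={quorum(rf)}")
# RF=1 quorum=1
# RF=2 quorum=2
# RF=3 quorum=2
# RF=5 quorum=3

print(read_overlaps_write(3, quorum(3), quorum(3)))  # True: quorum writes and quorum reads
print(read_overlaps_write(3, 1, 1))                  # False: single-replica writes and reads
```

When the overlap does not hold, reads may return stale data until the replicas converge, which is where settings such as read_repair_chance come in.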




Future
•   Priam
•   Asgard
•   Curator
•   Work for             ?
•   Hastur



Summary
•   Learn from others' mistakes
•   Tuning and data patterns
•   It’s ok to re-invent the wheel
•   Applications for/with Cassandra




We’re Hiring




Questions are guaranteed in life.
Answers aren’t.

               Eric Lubow
               @elubow
               elubow@simplereach.com


               Thank you.

100m Events

  • 1. 100 Million Events Eric Lubow @elubow elubow@simplereach.com
  • 2. Overview 100 Million Events Eric Lubow @elubow
  • 3. Overview • SimpleReach 100 Million Events Eric Lubow @elubow
  • 4. Overview • SimpleReach • 100 Million Events 100 Million Events Eric Lubow @elubow
  • 5. Overview • SimpleReach • 100 Million Events • Finding Patterns in Your Data 100 Million Events Eric Lubow @elubow
  • 6. Overview • SimpleReach • 100 Million Events • Finding Patterns in Your Data • What Mistakes? 100 Million Events Eric Lubow @elubow
  • 7. Overview • SimpleReach • 100 Million Events • Finding Patterns in Your Data • What Mistakes? • Questions 100 Million Events Eric Lubow @elubow
  • 8. Socially Intelligent 100 Million Events Eric Lubow @elubow
  • 9. Size 100 Million Events Eric Lubow @elubow
  • 10. Size • 100m events recorded per day and growing 100 Million Events Eric Lubow @elubow
  • 11. Size • 100m events recorded per day and growing • 500m Pageviews per month and growing 100 Million Events Eric Lubow @elubow
  • 12. Right Tool For The Job 100 Million Events Eric Lubow @elubow
  • 13. Why? 100 Million Events Eric Lubow @elubow
  • 14. Why? • Heavier READ loads vs heavier write loads 100 Million Events Eric Lubow @elubow
  • 15. Why? • Heavier READ loads vs heavier write loads • Data relationships may be less important 100 Million Events Eric Lubow @elubow
  • 16. Why? • Heavier READ loads vs heavier write loads • Data relationships may be less important • Different aspects of a system have different requirements 100 Million Events Eric Lubow @elubow
  • 17. Why? • Heavier READ loads vs heavier write loads • Data relationships may be less important • Different aspects of a system have different requirements • Know your compromises 100 Million Events Eric Lubow @elubow
  • 18. Cassandra 100 Million Events Eric Lubow @elubow
  • 19. Cassandra • Large data volume ingestion 100 Million Events Eric Lubow @elubow
  • 20. Cassandra • Large data volume ingestion • Really fast writes to many locations (eventual consistency) 100 Million Events Eric Lubow @elubow
  • 21. Cassandra • Large data volume ingestion • Really fast writes to many locations (eventual consistency) • Query by column groups within rows 100 Million Events Eric Lubow @elubow
  • 22. Cassandra • Large data volume ingestion • Really fast writes to many locations (eventual consistency) • Query by column groups within rows • Range queries in Hive (Slice predicate ranges) 100 Million Events Eric Lubow @elubow
  • 23. Cassandra • Large data volume ingestion • Really fast writes to many locations (eventual consistency) • Query by column groups within rows • Range queries in Hive (Slice predicate ranges) • Fault tolerant 100 Million Events Eric Lubow @elubow
  • 24. What Mistakes? 100 Million Events Eric Lubow @elubow
  • 25. What Mistakes? • Manage how many servers? 100 Million Events Eric Lubow @elubow
  • 26. What Mistakes? • Manage how many servers? • Re-inventing the wheel (Helenus) 100 Million Events Eric Lubow @elubow
  • 27. What Mistakes? • Manage how many servers? • Re-inventing the wheel (Helenus) • Composites Rock 100 Million Events Eric Lubow @elubow
  • 28. What Mistakes? • Manage how many servers? • Re-inventing the wheel (Helenus) • Composites Rock • Snapshots before drop keyspace 100 Million Events Eric Lubow @elubow
  • 29. What Mistakes? • Manage how many servers? • Re-inventing the wheel (Helenus) • Composites Rock • Snapshots before drop keyspace • How many experts does it take to run a cluster? 100 Million Events Eric Lubow @elubow
  • 30. What Mistakes? • Manage how many servers? • Re-inventing the wheel (Helenus) • Composites Rock • Snapshots before drop keyspace • How many experts does it take to run a cluster? • You can tune Cassandra?!? 100 Million Events Eric Lubow @elubow
  • 31. Server Management Cluster SSH 100 Million Events Eric Lubow @elubow
  • 32. Server Management • Hand tools - AWS, csshx Cluster SSH 100 Million Events Eric Lubow @elubow
  • 33. Server Management • Hand tools - AWS, csshx • Configuration Management Cluster SSH 100 Million Events Eric Lubow @elubow
  • 34. Server Management • Hand tools - AWS, csshx • Configuration Management • Monitoring and Alerting Tools Cluster SSH 100 Million Events Eric Lubow @elubow
  • 35. Server Management • Hand tools - AWS, csshx • Configuration Management • Monitoring and Alerting Tools Cluster SSH • Performance 100 Million Events Eric Lubow @elubow
  • 36. Server Management • Hand tools - AWS, csshx • Configuration Management • Monitoring and Alerting Tools Cluster SSH • Performance • Security 100 Million Events Eric Lubow @elubow
  • 37-42. Helenus • Built Node.js driver for Cassandra • https://github.com/simplereach/helenus • CQL 2/3, Composite Column, Thrift Interface • Parallel querying (split up queries) • Fault tolerance and resilience
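The "Parallel querying (split up queries)" bullet can be illustrated with a generic fan-out sketch. This is not Helenus's API or implementation (Helenus is a Node.js driver; Python is used here only for illustration). `fetch_rows` is a hypothetical stand-in for one multi-key read, and the chunk size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(keys, size):
    """Split a list of keys into consecutive chunks of at most `size` items."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def fetch_rows(keys):
    """Hypothetical stand-in for one multi-key read against the data store."""
    return {key: f"row-for-{key}" for key in keys}


def parallel_fetch(keys, chunk_size=2, workers=4):
    """Issue one request per chunk concurrently and merge the partial results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so merging is deterministic.
        for partial in pool.map(fetch_rows, chunked(keys, chunk_size)):
            merged.update(partial)
    return merged


if __name__ == "__main__":
    print(parallel_fetch(["a", "b", "c", "d", "e"]))
```

Splitting one large multi-key read into smaller requests may keep any single request under its timeout and lets the chunks be served concurrently; the right chunk size depends on row width and cluster size.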
  • 43-47. Data Patterns • Storage is cheap • Composites are WAY better than underscores • Beyond UTF8Type • Timestamps as LongType
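A rough analogy for the "Composites are WAY better than underscores" and "Timestamps as LongType" bullets, assuming the problem being described is column names built by joining parts with "_". Python tuples stand in for composite columns here (they are only an analogy for component-wise comparison, not Cassandra's CompositeType), and `joined_name` and `encode_long` are hypothetical helpers.

```python
import struct


def joined_name(*parts):
    """Build an underscore-joined name, the pattern the slide argues against."""
    return "_".join(str(p) for p in parts)


def encode_long(value):
    """Encode an integer as an 8-byte big-endian signed long."""
    return struct.pack(">q", value)


# Joined names sort as text, so 10 lands before 9.
string_order = sorted([joined_name("post", 9), joined_name("post", 10)])

# Typed components compare one by one, so 9 sorts before 10.
tuple_order = sorted([("post", 9), ("post", 10)])

# A component containing "_" cannot be split back out reliably.
ambiguous = joined_name("my_post", 10).split("_")

# Timestamps stored as text sort lexically; as big-endian longs the byte
# order matches numeric order (for non-negative values).
text_ts = sorted(["999", "1000"])
long_ts = sorted([encode_long(999), encode_long(1000)])

if __name__ == "__main__":
    print(string_order, tuple_order, ambiguous, text_ts)
```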
  • 48-51. Safety Mechanisms • Snapshots before dropping keyspaces • Authorization and authentication • (Limit) Direct access to the data store
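One way to make "Snapshots before dropping keyspaces" hard to forget is to generate both steps together. The sketch below only builds the command lines and runs nothing; `drop_keyspace_plan` and the tag format are hypothetical, and the `nodetool snapshot -t` and `cqlsh -e` forms are assumptions that should be checked against the Cassandra version in use.

```python
import re
from datetime import datetime, timezone


def drop_keyspace_plan(keyspace, now=None):
    """Return the commands to run, in order: snapshot first, then drop."""
    # Reject anything that is not a plain identifier before building CQL.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", keyspace):
        raise ValueError(f"unexpected keyspace name: {keyspace!r}")
    now = now or datetime.now(timezone.utc)
    tag = f"pre-drop-{keyspace}-{now:%Y%m%dT%H%M%S}"
    return [
        # Snapshots are per node, so this step must run on every node.
        ["nodetool", "snapshot", "-t", tag, keyspace],
        # Only after the snapshots succeed should the drop be issued.
        ["cqlsh", "-e", f"DROP KEYSPACE {keyspace};"],
    ]


if __name__ == "__main__":
    for command in drop_keyspace_plan("events"):
        print(" ".join(command))
```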
  • 52-55. Expertise • What happens when you need help? • How do you become an expert? • What happens when you need more experts?
  • 56-59. Tunables • Replication factor and read_repair_chance • phi_convict_threshold and RPC timeout for AWS or DC separation • MAX_HEAP_SIZE and HEAP_NEWSIZE (Analytics vs Realtime)
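As a starting point for the MAX_HEAP_SIZE and HEAP_NEWSIZE bullet, the sketch below reproduces the default sizing rule described in the comments of `cassandra-env.sh` in many Cassandra releases: max heap is max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)), and the new generation is min(100 MB per core, 1/4 of the heap). Treat the constants as an assumption to verify against your version; `default_heap_sizes_mb` is a hypothetical helper, and the talk's point is that analytics and realtime nodes may each need values different from these defaults.

```python
def default_heap_sizes_mb(system_memory_mb, cpu_cores):
    """Return (MAX_HEAP_SIZE, HEAP_NEWSIZE) in MB using the default rule."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    max_heap = max(half_capped, quarter_capped)
    # Young generation: 100 MB per core, but never more than 1/4 of the heap.
    new_size = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_size


if __name__ == "__main__":
    for memory_mb, cores in [(2048, 2), (16384, 4), (65536, 16)]:
        heap, new = default_heap_sizes_mb(memory_mb, cores)
        print(f"{memory_mb} MB RAM, {cores} cores -> "
              f"MAX_HEAP_SIZE={heap}M HEAP_NEWSIZE={new}M")
```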
  • 60. Future • Priam • Asgard • Curator • Work for ? • Hastur
  • 61-65. Summary • Learn from others' mistakes • Tuning and data patterns • It's OK to re-invent the wheel • Applications for/with Cassandra
  • 66. We’re Hiring
  • 67. Questions are guaranteed in life. Answers aren’t. Eric Lubow @elubow elubow@simplereach.com Thank you.

Editor's notes

  7. SimpleReach is a social intelligence tool for content creators. We track every social action, on every major network, across the entire web in real time. That means every like, tweet, pin, stumble, and many more.