In-Flux Limiting for a Multi-Tenant Logging Service
Ambud Sharma & Suma Cherukuri
Cloud Platform Engineering @ Symantec
Overview
• Who are we?
• Architecture
• Streaming Pipeline
• Influx Issue
• Influx Limiting Design & Solution
• Conclusion
• Q & A
Who are we?
• Symantec’s internal cloud team
• Host applications that generate over $1B in revenue
• Team
– Logging as a Service (LaaS) – Elasticsearch/Kibana
– Metering as a Service (MaaS) – InfluxDB/Grafana
– Alerting as a Service (AaaS) – Hendrix
We are hiring!
Also check out Hendrix: https://github.com/Symantec/hendrix
Our Data
Logs
• Application and system log data from VMs and containers
• Used for troubleshooting
Metrics
• Application and system telemetry
• Used for Application Performance Monitoring
Log Event
{
  "message": "User logged in from 1.1.1.1",
  "@version": "1",
  "@timestamp": "2014-07-16T06:49:39.919Z",
  "host": "value",
  "path": "/opt/logstash/sample.log",
  "tenant_id": "291167ebed3221a006eb",
  "apikey": "06be8a-28ef-4568-8cb8-612",
  "string_boolean": "true",
  "host_ip": "192.168.99.01"
}
Metric Event
{
  "@version": "1",
  "@timestamp": "2014-07-16T06:49:39.919Z",
  "host": "host1.symantec.com",
  "tenant_id": "291167ebed3221a006ebf6",
  "apikey": "06be8a-28ef-4568-8cb8-618",
  "value": 0.65,
  "name": "cpu"
}
LMM Architecture
[Architecture diagram. Components: Customer Agents, Logstash, Kafka, Log Topology, Metrics Topology, Redis, Elasticsearch, InfluxDB, Users. Label: "Open to customers".]
Streaming Pipeline
• Validate events against the schema to optimize indexing
• Authenticate events to route data to the correct index
• Keep one index per tenant per day
Kafka → Validate → Auth → Index
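The three stages above can be sketched in plain Java. This is a minimal, hypothetical sketch, not the team's topology code: the required-field list, the in-memory `tenant_id` to `apikey` map (standing in for whatever credential store the real service uses), and the `<tenant_id>-yyyy.MM.dd` index naming are all assumptions.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PipelineSketch {
    // Hypothetical schema: fields every event must carry before it can be indexed.
    static final List<String> REQUIRED_FIELDS = List.of("@timestamp", "tenant_id", "apikey");

    // Hypothetical naming scheme for one index per tenant per day (days in UTC).
    static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC);

    // Stand-in for the real credential store: tenant_id -> apikey.
    private final Map<String, String> apiKeysByTenant;

    PipelineSketch(Map<String, String> apiKeysByTenant) {
        this.apiKeysByTenant = apiKeysByTenant;
    }

    /** Validate: required fields are present strings and @timestamp is ISO-8601. */
    static boolean validate(Map<String, Object> event) {
        for (String field : REQUIRED_FIELDS) {
            if (!(event.get(field) instanceof String)) {
                return false;
            }
        }
        try {
            Instant.parse((String) event.get("@timestamp"));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    /** Auth: the event's apikey must match the key registered for its tenant_id. */
    boolean authenticate(Map<String, Object> event) {
        String expected = apiKeysByTenant.get(event.get("tenant_id"));
        return expected != null && expected.equals(event.get("apikey"));
    }

    /** Index: derive the per-tenant, per-day index name from @timestamp. */
    static String indexName(Map<String, Object> event) {
        Instant ts = Instant.parse((String) event.get("@timestamp"));
        return event.get("tenant_id") + "-" + DAY.format(ts);
    }

    /** Runs Validate -> Auth -> Index; an empty result means the event was rejected. */
    Optional<String> route(Map<String, Object> event) {
        if (!validate(event) || !authenticate(event)) {
            return Optional.empty();
        }
        return Optional.of(indexName(event));
    }

    public static void main(String[] args) {
        PipelineSketch pipeline =
                new PipelineSketch(Map.of("291167ebed3221a006eb", "06be8a-28ef-4568-8cb8-612"));
        Map<String, Object> event = Map.of(
                "message", "User logged in from 1.1.1.1",
                "@timestamp", "2014-07-16T06:49:39.919Z",
                "tenant_id", "291167ebed3221a006eb",
                "apikey", "06be8a-28ef-4568-8cb8-612");
        System.out.println(pipeline.route(event).orElse("rejected"));
    }
}
```

Running `main` with the sample log event routes it to `291167ebed3221a006eb-2014.07.16`; an event with a wrong key or a missing field comes back empty and would not be indexed.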
Influx Issue
• You know your data store's performance limits (find the maximum events per second, EPS, from benchmarks and capacity planning)
• Tenants send a lot of data, and the ingestion rate is never uniform
• Ingestion spikes are bound to happen in a real-time streaming application
• Wouldn't it be great if you could normalize these spikes?
Influx Limiting
• Normalize the EPS curve using buffers
• Like a hydro dam, explicitly allocate EPS capacity to tenants
[Figure: EPS curve before and after influx limiting]
Design - Options
Approach 1
• Route to a separate Kafka topic
• No back-pressure in the primary queue
• Secondary queue is drained at a slower pace
• Events may appear out of order
Approach 2
• Controlled back-pressure in the primary queue
• Selectively reduce the ingestion rate for tenants
• Events always appear in order
Customer Requirements
• Customers want threshold quotas defined for them
• Thresholds are defined as policies: a maximum number of events (threshold) per window, with the window given as a duration in seconds
• Policies are saved in a data store
Tenant A
{
  "threshold": 100,
  "window": 90
}
Tenant B
{
  "threshold": 700,
  "window": 10
}
Tenant C
{
  "threshold": 900,
  "window": 1
}
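Each policy implies a long-run average rate of threshold / window events per second, which is the number to compare against the data store's benchmarked EPS limit. A minimal sketch; the `ThrottlePolicy` record and the `allocatedEps` helper are hypothetical names, not part of the described service.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PolicySketch {
    /** One tenant's quota: at most `threshold` events per `windowSeconds`-second window. */
    record ThrottlePolicy(long threshold, int windowSeconds) {
        ThrottlePolicy {
            if (threshold < 0 || windowSeconds <= 0) {
                throw new IllegalArgumentException("threshold must be >= 0 and window > 0");
            }
        }

        /** Long-run average rate the policy allows, in events per second. */
        double averageEps() {
            return (double) threshold / windowSeconds;
        }
    }

    /** Sum of average EPS across tenants, to compare with the store's benchmarked capacity. */
    static double allocatedEps(Map<String, ThrottlePolicy> policies) {
        return policies.values().stream().mapToDouble(ThrottlePolicy::averageEps).sum();
    }

    public static void main(String[] args) {
        Map<String, ThrottlePolicy> policies = new LinkedHashMap<>();
        policies.put("Tenant A", new ThrottlePolicy(100, 90));
        policies.put("Tenant B", new ThrottlePolicy(700, 10));
        policies.put("Tenant C", new ThrottlePolicy(900, 1));
        policies.forEach((tenant, p) -> System.out.printf("%s: %.2f EPS%n", tenant, p.averageEps()));
        System.out.printf("Total allocated: %.2f EPS%n", allocatedEps(policies));
    }
}
```

For the three tenants above this gives about 1.11 EPS for Tenant A, 70 EPS for Tenant B and 900 EPS for Tenant C, roughly 971.11 EPS in total. A tumbling window bounds only the average: a tenant can still send its whole threshold at the start of a window, so Tenant A may burst 100 events at once.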
Bolt Design
1. Track the event rate for each tenant over the policy window
2. If the count exceeds the threshold, throttle; otherwise allow the events
3. Reset the window when the time interval is complete (tumbling window)
Kafka → Validate → Auth → Throttle → Index
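The three bolt steps can be sketched as a per-tenant tumbling-window counter. This is a hypothetical sketch, not the deck's bolt: it uses plain Java instead of the Storm API, aligns windows to multiples of the window length, and resets lazily when a tenant's next event arrives in a new window (the scheduled-task pattern that follows resets eagerly instead). What "throttle" does with a rejected event (delay it, apply back-pressure, or reroute it) is left to the caller.

```java
import java.util.HashMap;
import java.util.Map;

public class TumblingWindowThrottle {
    /** Hypothetical policy shape: at most `threshold` events per `windowSeconds` window. */
    record Policy(long threshold, int windowSeconds) {
        Policy {
            if (windowSeconds <= 0) throw new IllegalArgumentException("window must be > 0");
        }
    }

    /** Per-tenant state: the window currently being counted and its event count. */
    private static final class Counter {
        long windowIndex = Long.MIN_VALUE;
        long count;
    }

    private final Map<String, Policy> policies;
    private final Map<String, Counter> counters = new HashMap<>();

    TumblingWindowThrottle(Map<String, Policy> policies) {
        this.policies = policies;
    }

    /** Returns true if the event may pass, false if the tenant should be throttled. */
    boolean allow(String tenant, long nowSeconds) {
        Policy policy = policies.get(tenant);
        if (policy == null) {
            return true; // assumption: tenants without a policy are not throttled
        }
        // Windows are aligned to multiples of the window length, e.g. [100, 110), [110, 120).
        long windowIndex = nowSeconds / policy.windowSeconds();
        Counter counter = counters.computeIfAbsent(tenant, t -> new Counter());
        if (counter.windowIndex != windowIndex) {
            // Step 3: the previous window is complete, so start counting afresh.
            counter.windowIndex = windowIndex;
            counter.count = 0;
        }
        counter.count++;                            // Step 1: track the tenant's event count
        return counter.count <= policy.threshold(); // Step 2: throttle once the threshold is exceeded
    }
}
```

With a policy of 3 events per 10 seconds, events at t = 100, 101 and 102 pass, further events before t = 110 are throttled, and the first event at t = 110 passes again because a new window has started.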
Scheduled-task design pattern
• A clock is maintained using the Storm tick tuple
• A tenant's counter is incremented when an event is received from it
• Counters are reset when the clock time modulo the tenant's throttle duration is 0
[Diagram: on each clock tick, ask "Is Time % Throttle Duration = 0?" for each slice of tenants sharing a throttle duration. If yes, reset the counters for each tenant in this slice; if no, there is nothing to reset. Legend: tenant throttle counter; tenant throttle duration (modulated).]
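The scheduled-task pattern can be sketched by grouping tenants into slices keyed by throttle duration, so each tick evaluates `clock % duration` once per distinct duration rather than once per tenant. A minimal sketch under stated assumptions: `onTick()` stands in for handling a Storm tick tuple (enabled through `Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS`, assumed here to be 1 second), events are assumed to be fields-grouped by `tenant_id` so each tenant's counter lives in a single bolt instance, and all class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class TickResetThrottle {
    /** Hypothetical policy shape: at most `threshold` events per `windowSeconds` window. */
    record Policy(long threshold, int windowSeconds) {
        Policy {
            if (windowSeconds <= 0) throw new IllegalArgumentException("window must be > 0");
        }
    }

    private final Map<String, Policy> policies;
    // Slices: throttle duration in seconds -> (tenant -> events counted in the current window).
    private final Map<Integer, Map<String, Long>> slices = new HashMap<>();
    private long clock; // seconds elapsed, advanced once per tick

    TickResetThrottle(Map<String, Policy> policies) {
        this.policies = policies;
        for (Map.Entry<String, Policy> e : policies.entrySet()) {
            slices.computeIfAbsent(e.getValue().windowSeconds(), w -> new HashMap<>())
                  .put(e.getKey(), 0L);
        }
    }

    /** Called once per tick: reset every slice whose duration divides the clock time. */
    void onTick() {
        clock++;
        for (Map.Entry<Integer, Map<String, Long>> slice : slices.entrySet()) {
            if (clock % slice.getKey() == 0) {
                // Reset counters for each tenant in this slice.
                slice.getValue().replaceAll((tenant, count) -> 0L);
            }
            // Otherwise there is nothing to reset for this slice.
        }
    }

    /** Called per event: increment the tenant's counter and compare it with the threshold. */
    boolean allow(String tenant) {
        Policy policy = policies.get(tenant);
        if (policy == null) {
            return true; // assumption: tenants without a policy are not throttled
        }
        long count = slices.get(policy.windowSeconds()).merge(tenant, 1L, Long::sum);
        return count <= policy.threshold();
    }
}
```

Because a Storm bolt task typically processes its tuples one at a time on a single executor thread, with tick tuples arriving in the same stream as events, the sketch uses no locking.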
Results
• Reduced EPS to Elasticsearch
• We can normalize the flow rate based on load
Conclusion
• Overview of real-time log and metric indexing
• Approaches to rate limiting in a real-time streaming application
• Design pattern to efficiently perform counting in Storm
That’s all folks!
Questions?
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

In-Flux Limiting for a Multi-Tenant Logging Service

  • 1. In-Flux Limiting for a Multi-Tenant Logging Service. Ambud Sharma & Suma Cherukuri, Cloud Platform Engineering @ Symantec
  • 2. Overview • Who are we? • Architecture • Streaming Pipeline • Influx Issue • Influx Limiting Design & Solution • Conclusion • Q & A
  • 3. Who are we? • Symantec’s internal cloud team • Host applications generating over $1B in revenue • Team: Logging as a Service (LaaS) on Elasticsearch/Kibana; Metering as a Service (MaaS) on InfluxDB/Grafana; Alerting as a Service (AaaS) with Hendrix. We are hiring! Also check out Hendrix: https://github.com/Symantec/hendrix
  • 4. Our Data
    • Logs: application and system log data from VMs and containers, used for troubleshooting
    • Metrics: application and system telemetry, used for application performance monitoring
    Log Event:
    {
      "message": "User logged in from 1.1.1.1",
      "@version": "1",
      "@timestamp": "2014-07-16T06:49:39.919Z",
      "host": "value",
      "path": "/opt/logstash/sample.log",
      "tenant_id": "291167ebed3221a006eb",
      "apikey": "06be8a-28ef-4568-8cb8-612",
      "string_boolean": "true",
      "host_ip": "192.168.99.01"
    }
    Metric Event:
    {
      "@version": "1",
      "@timestamp": "2014-07-16T06:49:39.919Z",
      "host": "host1.symantec.com",
      "tenant_id": "291167ebed3221a006ebf6",
      "apikey": "06be8a-28ef-4568-8cb8-618",
      "value": 0.65,
      "name": "cpu"
    }
  • 5. LMM Architecture (diagram components: Customer Agents, Logstash, Kafka, Log Topology, Metrics Topology, Redis, Elasticsearch, InfluxDB, Users; one boundary is labeled "Open to customers")
  • 6. Streaming Pipeline • Validate events against the schema to optimize indexing • Authenticate events to route data to the correct index • Keep one index per tenant per day (Diagram: Kafka → Validate → Auth → Index)
  • 7. Influx Issue • You know your data store's performance limits (find the sustainable EPS from benchmarks/capacity planning) • Tenants send a lot of data, and the ingestion rate is never constant • Ingestion spikes are bound to happen in a real-time streaming application • Wouldn’t it be great if you could normalize these spikes?
  • 8. Influx Limiting • Normalize the EPS curve using buffers • Like a hydroelectric dam, explicitly allocate EPS capacity to tenants (Figure: per-tenant EPS, Before and After)
  • 9. Design Options
    Approach 1: • Route to a separate Kafka topic • No back-pressure in the primary queue • The secondary queue is drained at a slower pace • Events may appear out of order
    Approach 2: • Controlled back-pressure in the primary queue • Selectively reduce the ingestion rate for tenants • Events always appear in order
  • 10. Customer Requirements • Customers want threshold quotas defined for them • Thresholds are defined as policies (window duration in seconds) • Policies are saved in a data store
    Tenant A: { "threshold": 100, "window": 90 }
    Tenant B: { "threshold": 700, "window": 10 }
    Tenant C: { "threshold": 900, "window": 1 }
  • 11. Bolt Design (pipeline: Kafka → Validate → Auth → Throttle → Index) 1. Track the event rate for each tenant over its policy window 2. If the threshold is exceeded, throttle; otherwise allow the events 3. Reset the window when the time interval is complete (tumbling window)
  • 12. Scheduled-task design pattern • The clock is maintained using the Storm tick tuple • A tenant’s counter is incremented when an event is received from it • Counters are reset when the clock time modulo the tenant’s throttle duration is 0 (Diagram: on each clock tick, check "Is Time % Throttle Duration = 0?"; if so, reset the counters for each tenant in that slice, otherwise there is nothing to reset)
  • 13. Results • Reduced EPS to Elasticsearch • We can normalize the flow rate based on load
  • 14. Conclusion • Overview of real-time log and metric indexing • Approaches to rate limiting in a real-time streaming application • Design pattern to efficiently perform counting in Storm. That’s all folks!
  • 15. Questions?
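The Validate and Auth steps on slide 6 can be sketched as plain functions. This is a minimal sketch under stated assumptions, not the team's Storm bolts. The required-field list is hypothetical. The in-memory `api_keys` dict stands in for the Redis lookup described in the speaker notes. The `index_name` format is only a hypothetical way to get one index per tenant per day.

```python
import json
from typing import Optional

# Hypothetical minimal schema; the real published schema is not shown in the slides.
REQUIRED_FIELDS = ("@timestamp", "tenant_id", "apikey")


def validate(raw: str) -> Optional[dict]:
    """Validate step: return the parsed event, or None to drop it."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON is dropped
    if not isinstance(event, dict):
        return None
    if any(field not in event for field in REQUIRED_FIELDS):
        return None  # missing required fields
    return event


def authenticate(event: dict, api_keys: dict[str, str]) -> bool:
    """Auth step: accept only events whose apikey is the current key for their tenant.

    `api_keys` is a stand-in (assumption) for the Redis store of tenant IDs and API keys.
    Revoking a key (deleting the tenant's entry) makes all of that tenant's events fail.
    """
    expected = api_keys.get(event["tenant_id"])
    return expected is not None and expected == event["apikey"]


def index_name(event: dict) -> str:
    """Hypothetical naming for one index per tenant per day, e.g. '<tenant_id>-2014-07-16'."""
    return f"{event['tenant_id']}-{event['@timestamp'][:10]}"
```

An event that fails either check is discarded before it reaches the Index step, which is what revoking an API key amounts to.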
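Slide 10's policies express a quota as a threshold over a window. Comparing them with the benchmarked capacity from slide 7 therefore means converting each policy to an average EPS. The sketch below does that arithmetic. The 10,000 EPS capacity is the illustrative "let's say" figure from the speaker notes, not a measured value. The helper names are hypothetical.

```python
CAPACITY_EPS = 10_000  # illustrative capacity only ("let's say 10000 inserts per second")


def average_eps(threshold: int, window_seconds: int) -> float:
    """Average events per second that a policy allows: threshold / window."""
    return threshold / window_seconds


def total_allocated_eps(policies: dict[str, tuple[int, int]]) -> float:
    """Sum of average EPS over all tenants' (threshold, window) policies."""
    return sum(average_eps(threshold, window) for threshold, window in policies.values())


if __name__ == "__main__":
    slide_policies = {"A": (100, 90), "B": (700, 10), "C": (900, 1)}  # slide 10
    total = total_allocated_eps(slide_policies)
    print(round(total, 2))  # 971.11
    print(total <= CAPACITY_EPS)  # True
    # Speaker-note examples: 300 events in 2 minutes vs 5000 events in 10 minutes
    print(average_eps(300, 120), round(average_eps(5000, 600), 2))  # 2.5 8.33
```

A tumbling-window policy bounds the average rate over each window, not the instantaneous rate. For example, Tenant A may send all 100 of its events in the first second of its 90-second window.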
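Slide 9 notes that Approach 1 may deliver events out of order. The toy simulation below shows why. It rests on a timing assumption added here for illustration: the slower secondary topology only catches up after the next window's in-quota events have already been indexed. Plain lists stand in for the Kafka topics, and `route_window` is a hypothetical helper.

```python
def route_window(events: list[tuple[str, int]], quota: dict[str, int]):
    """Approach 1 for one window: the first quota[tenant] events of each tenant go to the
    primary topic; the excess is diverted to the secondary (slowly drained) topic."""
    primary: list[tuple[str, int]] = []
    secondary: list[tuple[str, int]] = []
    counts: dict[str, int] = {}
    for tenant, seq in events:
        counts[tenant] = counts.get(tenant, 0) + 1
        target = primary if counts[tenant] <= quota[tenant] else secondary
        target.append((tenant, seq))
    return primary, secondary


if __name__ == "__main__":
    quota = {"A": 3}
    w1_primary, w1_overflow = route_window([("A", i) for i in range(1, 6)], quota)
    w2_primary, w2_overflow = route_window([("A", 6), ("A", 7)], quota)
    # Assumed timing: the secondary topology drains its backlog after window 2's primary events.
    indexed = w1_primary + w2_primary + w1_overflow + w2_overflow
    print([seq for _, seq in indexed])  # [1, 2, 3, 6, 7, 4, 5]
```

Under Approach 2, events 4 and 5 would instead stay in the primary queue, held back by back-pressure, so the index order would remain 1 through 7.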
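The three steps on slide 11 amount to a per-tenant tumbling-window counter. The sketch below makes a simplification: it checks the window lazily on each event by comparing `now // window`, whereas the talk's design resets counters from a Storm tick tuple (slide 12). `Policy` and `TumblingWindowThrottle` are hypothetical names. Returning `False` only signals "hold this event back"; it does not model how the bolt applies back-pressure to Kafka.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Influx-limiting policy: at most `threshold` events per `window` seconds."""
    threshold: int
    window: int


class TumblingWindowThrottle:
    """Per-tenant event counter with tumbling windows aligned to multiples of `window`."""

    def __init__(self, policies: dict[str, Policy]) -> None:
        self.policies = policies
        self.counts = {tenant: 0 for tenant in policies}
        self.window_ids = {tenant: 0 for tenant in policies}

    def allow(self, tenant: str, now: int) -> bool:
        """Return True to pass the event on to Index, False to hold it back (throttle)."""
        policy = self.policies[tenant]
        window_id = now // policy.window  # which tumbling window `now` falls in
        if window_id != self.window_ids[tenant]:
            # Step 3: the interval is complete, so start a new window with a fresh count.
            self.window_ids[tenant] = window_id
            self.counts[tenant] = 0
        if self.counts[tenant] >= policy.threshold:
            return False  # Step 2: threshold reached for this window, throttle
        self.counts[tenant] += 1  # Step 1: track events let through in this window
        return True


if __name__ == "__main__":
    throttle = TumblingWindowThrottle(
        {"A": Policy(100, 90), "B": Policy(700, 10), "C": Policy(900, 1)}  # slide 10
    )
    decisions = [throttle.allow("A", now=5) for _ in range(101)]
    print(decisions.count(True), decisions.count(False))  # 100 1
    print(throttle.allow("A", now=90))  # True: a new 90-second window has started
```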
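Slide 12's scheduled-task pattern can be sketched without Storm in three parts:

* Group tenants into slices by policy duration.
* Increment a tenant's counter on each of its events.
* On each clock tick, reset only the slices for which `clock % duration == 0`.

In Storm the tick would come from a tick tuple, enabled with the `topology.tick.tuple.freq.secs` setting. Here `on_tick` is simply called with the clock value, and the class name `SlicedCounterReset` is hypothetical.

```python
from collections import defaultdict


class SlicedCounterReset:
    """Tenants grouped ("sliced") by policy duration; counters reset per slice on clock ticks."""

    def __init__(self, durations: dict[str, int]) -> None:
        # duration in seconds -> tenants whose policy uses that duration
        self.slices: dict[int, set[str]] = defaultdict(set)
        for tenant, duration in durations.items():
            self.slices[duration].add(tenant)
        self.counters = {tenant: 0 for tenant in durations}

    def on_event(self, tenant: str) -> int:
        """Increment the tenant's counter and return it (the throttle compares it to the threshold)."""
        self.counters[tenant] += 1
        return self.counters[tenant]

    def on_tick(self, clock: int) -> list[str]:
        """Handle one clock tick (e.g. one per second); return the tenants whose counters were reset."""
        reset: list[str] = []
        for duration, tenants in self.slices.items():
            if clock % duration == 0:  # "Is Time % Throttle Duration = 0?"
                for tenant in tenants:
                    self.counters[tenant] = 0
                reset.extend(tenants)
            # otherwise: nothing to reset for this slice
        return sorted(reset)


if __name__ == "__main__":
    resetter = SlicedCounterReset({"A": 90, "B": 10, "C": 1})  # durations from slide 10
    for tenant in ("A", "A", "B"):
        resetter.on_event(tenant)
    print(resetter.on_tick(10))  # ['B', 'C']
    print(resetter.counters)  # {'A': 2, 'B': 0, 'C': 0}
```

Each tick checks every distinct duration once and touches only the counters of tenants whose slice is due. Because events and ticks both arrive through the same `on_event`/`on_tick` calls, no separate timer thread is needed, in line with the speaker notes' goal of avoiding multiple threads.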

Speaker notes

  1. Welcome to our talk, In-Flux Limiting for a Multi-Tenant Logging Service. (Introduce yourselves.) We are from Cloud Platform Engineering (CPE) at Symantec.
  2. Today we are here to talk about how we do event throttling, that is, rate limiting for real-time streaming. We will go over the architecture and the internal details of our streaming pipeline, explain the influx issue, walk through different approaches to solving the problem, and show you the results. We also want to cover an efficient pattern for counting in Storm. If you have any pressing questions, please feel free to stop us, but we would prefer to take questions at the end.
  3. We are part of Symantec’s internal cloud team, which hosts applications generating over $1B in revenue. Specifically, our team builds, owns and runs three primary services: Logging as a Service, Metering as a Service and Alerting as a Service. Side note: we are hiring, so if anyone is interested in joining the effort to build the biggest security data lake in the world, please stop by after the presentation. Another side note: we have open-sourced a project called Hendrix, which is our Alerting as a Service. Please feel free to check it out at github.com/Symantec/hendrix.
  4. Before we jump into the actual design and architecture of our system, let’s talk about the data we get and the problem we are solving. We offer logging and APM (application performance monitoring) as a service. Our customers are Symantec product teams. They send us application and system logs generated on VMs and containers, which they use to troubleshoot their applications; this is basically our own version of Splunk. On the metrics side, we get application and system telemetry, which the teams use for application performance monitoring. Here are our sample events. We accept data in JSON format: on the left is a log event and on the right is a metric event. If you look at the log event, you will notice two special fields: the tenant ID and the API key. So what are a tenant and an API key? Every customer, that is, every P&S team at Symantec, has one or more tenants; the concept of a tenant comes from our OpenStack cloud. For example, a team’s production app A can have one tenant and its production app B another. Every tenant is a unit of isolation for us. An API key is a token used to allow or revoke the flow of logs for a given tenant: if you want to stop a tenant from sending data, you revoke its API key, and we start discarding its events. We call this process event authentication.
  5. Now let’s get into our architecture. Customers run agents such as Flume, Logstash, collectd and StatsD, which send data to our Kafka cluster, exposed over load balancers. We then run a set of Storm topologies that write the data to the destination data stores: Elasticsearch (ES) for logs and InfluxDB for metrics. We use Kibana as the front end for ES and Grafana as the front end for InfluxDB so that customers can graph and query their data. Redis is where we store tenant IDs and API keys.
  6. Here’s what happens inside our streaming pipeline, that is, the Storm topology. As we showed earlier, events arrive in Kafka, and we use the Storm Kafka Spout to read them. We then validate these events against the format and schema specifications that we publish to our customers; for example, if an event is malformed JSON, we drop it. Next, we check whether the tenant ID and API key are valid. Lastly, we index the data into Elasticsearch or insert it into InfluxDB. Each of these stages runs in a separate Storm bolt.
  7. Now that you have a fair idea of our pipeline, let’s look at the influx issue. Influx means the arrival of a large quantity of something in a short time; in this case, events. When you write data to a data store like HBase, Cassandra or Elasticsearch, you provision capacity in the cluster: your cluster has X nodes, and together they can support, let’s say, 10,000 inserts per second. You can gauge this number by running benchmarks. For us, inserts per second can also be called events per second, or EPS. The EPS sent by our tenants is never constant; it fluctuates quite a lot, as you can see in the graph on the right, where each line represents the EPS from one tenant. Spikes are bound to happen in any real-time event processing system: when the load on an application increases, it generates a lot more logs and we get a spike. When a spike happens, we don’t have the provisioned capacity to index the additional influx of data it brings. So wouldn’t it be great if you could normalize these spikes? What we mean is having an almost flat EPS curve for every tenant.
  8. Let’s understand how we can limit the influx and normalize these spikes. Think of event streams as a river: if there’s a cloudburst (no pun intended), the river gets temporarily flooded, so much so that its banks overflow. To fix this problem we can build a dam that buffers the additional influx of water, and then control the rate at which the dam is drained. Since we use Kafka, we already have a buffer. However, we have no control over it, because it is governed by the back-pressure our Elasticsearch cluster creates, since the cluster can take only so many writes. So the purpose of this work was to apply controlled back-pressure to Kafka in our streaming pipelines, letting us decide quantitatively how many events flow through our pipeline into Elasticsearch. We want to do this on a tenant-by-tenant basis: as you can see in the diagrams, different tenants, shown in different colors, send different quantities of data. With a controlled system we can normalize and divide the capacity evenly among all of them, or deliberately make it uneven: if one customer has more need, we allocate more capacity to them than to others.
  9. How can we do this? We thought of two approaches. The first is a substream: if a tenant exceeds its allocated throughput capacity, we divert the excess event traffic to a separate queue, which we then drain at a slower pace. Technically, that means a separate Kafka topic and a separate Storm topology with a lower parallelism configuration. The other way of solving this problem is to pause the processing of events in the existing streaming pipeline for the tenant that is sending more data. Both approaches have pros and cons. With the first, you will see events out of order: some data appears right away because it flows through the main pipeline, while other data is delayed because it flows through the slower pipeline. With the second, you will always see events in order, but if the queue builds up too much back-pressure you may lose data. That is true for either approach if you share the Kafka cluster, because a given cluster’s disk space is limited.
  10. What do we do inside the bolt, and how do we track the event rate for tenants? To track event counts, we keep a hash table mapping each tenant ID to an integer counter, which we increment every time we see an event from that tenant. But our customers wanted policies that define event rates differently for every tenant. One wanted to be allowed to send 300 events in 2 minutes, while another wanted 5,000 in 10 minutes, which do not work out to the same EPS. So we had to come up with a way to track this for every tenant, and we found an interesting way of solving this problem without using multiple threads. What we built, logically, was a sort of merry-go-round on which every tenant is allowed to go round once. Here is how it’s set up: every tenant’s influx-limiting policy has two parts, 1. the number of events they would like to send and 2. the time duration for those events. We take the time duration and place tenants on this virtual merry-go-round based on their policy time duration.