Pinot
Kishore Gopalakrishna
Tuesday, August 18, 15
Agenda
• Pinot @ LinkedIn - Current
• Pinot - Architecture
• Pinot Operations
• Pinot @ LinkedIn - Future
WVMP (Who Viewed My Profile)
Slice and Dice Metrics
Pinot @ LinkedIn
Customers, Members, Internal tools
Pinot @ LinkedIn
• 100B documents
• 1B documents ingested per day
• 100M queries per day
• Tens of ms latency
• 30 tables in prod, 250 * 3 std app nodes
Key features
• SQL-like interface
• Columnar storage and indexing
• Real-time data load
(S)QL: Filters and Aggs
SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
'day' >= 15949 AND 'day' <= 15963 AND
paid = 'y' AND
action = 'stop'
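The integer bounds on the day column look like day numbers counted from the Unix epoch; that encoding is an assumption here, since the slide does not state it. Under that assumption, a minimal sketch of the conversion (the helper names `day_to_date` and `date_to_day` are hypothetical):

```python
from datetime import date, timedelta

# Assumption: 'day' stores the number of days since 1970-01-01.
EPOCH = date(1970, 1, 1)


def day_to_date(day_number: int) -> date:
    """Convert a days-since-epoch integer to a calendar date."""
    return EPOCH + timedelta(days=day_number)


def date_to_day(d: date) -> int:
    """Convert a calendar date back to a days-since-epoch integer."""
    return (d - EPOCH).days


print(day_to_date(15949))  # 2013-09-01
print(day_to_date(15963))  # 2013-09-15
```

Read this way, the filter covers a two-week window of events.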
(S)QL: Group By
SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
'day' >= 15949 AND 'day' <= 15963 AND
paid = 'y'
GROUP BY action
(S)QL: ORDER BY and LIMIT
SELECT *
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
entityId = 1000 AND
action = 'start'
ORDER BY creationTime DESC LIMIT 1
What's not supported
• JOIN: unpredictable performance
• NOT A SOURCE OF TRUTH
• Mutation
Pinot
• Data flow
• Query Execution
• How to use/operate
• Pinot @ LinkedIn - Future
Pinot Architecture
[Diagram components: Queries, Broker, Helix, Real-time servers, Historical servers, Kafka, Hadoop, Raw Data]
Pinot
• Pinot segments
Pinot Segment layout: Columnar storage
Pinot Segment layout: Sorted Forward Index
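The slide gives only the title, so the following is a simplified sketch of the general idea rather than Pinot's on-disk format. When a column is sorted, each distinct value occupies one contiguous run of document ids, so an equality filter reduces to a dictionary lookup that returns a range. The function names and sample data are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def build_sorted_index(sorted_values: List[int]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (dictionary, ranges): for each distinct value, the inclusive
    (start_doc_id, end_doc_id) run it occupies in the sorted column."""
    dictionary: List[int] = []
    ranges: List[Tuple[int, int]] = []
    for doc_id, value in enumerate(sorted_values):
        if dictionary and dictionary[-1] == value:
            start, _ = ranges[-1]
            ranges[-1] = (start, doc_id)  # extend the current run
        else:
            dictionary.append(value)      # first time this value is seen
            ranges.append((doc_id, doc_id))
    return dictionary, ranges


def lookup(dictionary: List[int], ranges: List[Tuple[int, int]], value: int) -> Optional[Tuple[int, int]]:
    """Binary-search the dictionary; return the doc id range, or None if absent."""
    i = bisect_left(dictionary, value)
    if i < len(dictionary) and dictionary[i] == value:
        return ranges[i]
    return None


# Hypothetical sorted entityId column.
entity_ids = [7, 7, 7, 42, 42, 121011, 121011, 121011, 121011]
dictionary, ranges = build_sorted_index(entity_ids)
print(lookup(dictionary, ranges, 121011))  # (5, 8)
```

Because the answer is a single range, no per-value posting list is needed for the sorted column.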
Pinot Segment layout: Other techniques
• Indexes: Inverted index, Bitmap, RoaringBitmap
• Compression: Dictionary Encoding, P4Delta
• Multi-valued columns, skip lists
• HyperLogLog for unique counts
• T-digest for percentiles and quantiles
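Two of the techniques listed above, dictionary encoding and inverted indexes, can be illustrated together. This is a toy sketch under stated assumptions: plain Python sets stand in for compressed bitmaps such as RoaringBitmap, and the column data and function names are made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Map each distinct value to a small integer id.
    Returns (sorted dictionary, forward index of ids per doc)."""
    dictionary = sorted(set(values))
    id_of = {value: i for i, value in enumerate(dictionary)}
    forward_index = [id_of[value] for value in values]
    return dictionary, forward_index


def build_inverted_index(forward_index: List[int]) -> Dict[int, Set[int]]:
    """Map each dictionary id to the set of doc ids containing it.
    A set stands in for a compressed bitmap here."""
    inverted: Dict[int, Set[int]] = {}
    for doc_id, dict_id in enumerate(forward_index):
        inverted.setdefault(dict_id, set()).add(doc_id)
    return inverted


# Hypothetical columns, aligned by doc id.
actions = ["start", "stop", "start", "start", "stop"]
paid = ["y", "y", "n", "y", "n"]

action_dict, action_fwd = dictionary_encode(actions)
paid_dict, paid_fwd = dictionary_encode(paid)
action_inv = build_inverted_index(action_fwd)
paid_inv = build_inverted_index(paid_fwd)

# WHERE action = 'stop' AND paid = 'y' becomes a set intersection.
matching = action_inv[action_dict.index("stop")] & paid_inv[paid_dict.index("y")]
print(sorted(matching))  # [1]
```

With real bitmaps the intersection is a bitwise AND, which is why conjunctive filters stay cheap.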

Data-aware pre-computation
Star-tree index
Pinot
• Query Execution
Pinot Query Execution: Distributed
[Diagram: Brokers, Helix, and Servers holding replicated segments S1, S2, S3]
1. Query
2. Fetch routing table from Helix
3. Scatter request
4. Process request and send response
5. Gather response
6. Return response
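The six steps can be sketched as a toy scatter-gather in Python. Everything here is an assumption for illustration: the in-memory `SERVER_SEGMENTS` and `SEGMENT_DOCS` maps stand in for Helix state and segment data, threads stand in for network calls, and the routing rule (first server that hosts a segment wins) is not Pinot's actual routing strategy.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical cluster state: which server hosts which segment replicas.
SERVER_SEGMENTS = {
    "server1": ["S1", "S2"],
    "server2": ["S2", "S3"],
    "server3": ["S3", "S1"],
}
# Hypothetical segment contents: the 'action' value of each document.
SEGMENT_DOCS = {
    "S1": ["start", "stop"],
    "S2": ["stop", "stop", "start"],
    "S3": ["start"],
}


def build_routing_table(server_segments: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Step 2: pick exactly one server per segment so no replica is counted twice."""
    routing: Dict[str, List[str]] = {}
    assigned = set()
    for server in sorted(server_segments):
        for segment in server_segments[server]:
            if segment not in assigned:
                assigned.add(segment)
                routing.setdefault(server, []).append(segment)
    return routing


def process_on_server(segments: List[str]) -> Counter:
    """Step 4: a server computes a partial count(*) grouped by action."""
    partial: Counter = Counter()
    for segment in segments:
        partial.update(SEGMENT_DOCS[segment])
    return partial


def broker_query() -> Dict[str, int]:
    """Steps 1-6 as seen from the broker."""
    routing = build_routing_table(SERVER_SEGMENTS)       # step 2
    with ThreadPoolExecutor() as pool:                   # step 3: scatter
        partials = list(pool.map(process_on_server, routing.values()))
    total: Counter = Counter()
    for partial in partials:                             # step 5: gather
        total.update(partial)
    return dict(total)                                   # step 6: return


print(broker_query())
```

The important property is that each segment is queried on exactly one replica, so merging the partial counts gives the same answer as a single-node scan.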
Pinot Query Execution: Single Node Architecture
[Diagram layers: Planner; Execution engine; Inverted index, Bitmap index; Column format]
Pinot Query Execution: Single Node Architecture
SELECT campaignId, sum(clicks)
FROM Table A
WHERE accountId = 121011 AND 'day' >= 15949
GROUP BY campaignId
[Diagram: Pinot Segments -> Data sources (accountId, day, campaignId, click) -> Filter Operator (matching doc ids) -> Projection Operator (campaignId, click tuples) -> Aggregation Group-by Operator -> Combine Operator]
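The operator chain in the diagram can be sketched as a set of small Python functions, one per operator. This is an illustrative model only: the segment representation (a dict of aligned column lists), the sample data, and the function names are assumptions, and the filter does a plain scan rather than using an index.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical columnar segment: one list per column, aligned by doc id.
Segment = Dict[str, List[int]]


def filter_operator(seg: Segment, account_id: int, min_day: int) -> Iterator[int]:
    """Yield the doc ids that match the WHERE clause."""
    for doc_id in range(len(seg["accountId"])):
        if seg["accountId"][doc_id] == account_id and seg["day"][doc_id] >= min_day:
            yield doc_id


def projection_operator(seg: Segment, doc_ids: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Read only the columns the query needs for the matching docs."""
    for doc_id in doc_ids:
        yield seg["campaignId"][doc_id], seg["clicks"][doc_id]


def group_by_operator(tuples: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Compute sum(clicks) per campaignId within one segment."""
    sums: Dict[int, int] = defaultdict(int)
    for campaign_id, clicks in tuples:
        sums[campaign_id] += clicks
    return dict(sums)


def combine_operator(partials: Iterable[Dict[int, int]]) -> Dict[int, int]:
    """Merge the per-segment results into one result for the node."""
    merged: Dict[int, int] = defaultdict(int)
    for partial in partials:
        for campaign_id, total in partial.items():
            merged[campaign_id] += total
    return dict(merged)


def run_query(segments: List[Segment], account_id: int, min_day: int) -> Dict[int, int]:
    partials = [
        group_by_operator(projection_operator(seg, filter_operator(seg, account_id, min_day)))
        for seg in segments
    ]
    return combine_operator(partials)


seg_a = {"accountId": [121011, 121011, 5], "day": [15949, 15940, 15950],
         "campaignId": [1, 1, 2], "clicks": [10, 99, 7]}
seg_b = {"accountId": [121011, 121011], "day": [15951, 15960],
         "campaignId": [1, 2], "clicks": [5, 3]}
print(run_query([seg_a, seg_b], 121011, 15949))  # {1: 15, 2: 3}
```

Each segment is processed independently up to the group-by step, which is what lets the combine step run the per-segment work in parallel.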
Pinot
• Operations
Cluster Management: Deployment
[Diagram components: Controller, Helix, Brokers, Servers]
• Brokers and Servers register themselves in Helix
• All servers start with no use-case-specific configuration
Onboarding a new use case
[Diagram: a Create Table command goes to the Controller; Brokers and Servers in Helix are tagged XLNT]
Create Table command: Tag = XLNT, TableName = XLNT_T1, Servers = 3, Brokers = 1
Segment Assignment
[Diagram: an Upload Segment request goes to the Controller; segments S1, S2, S3 are placed on the Servers via Helix; Brokers]
TableName = XLNT_T1, Copies = 2
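One way to picture segment assignment with two copies per segment is a least-loaded placement rule. This is a sketch under assumptions, not the strategy Pinot or Helix actually uses; the function name `assign_segment` and the server names are hypothetical.

```python
from typing import Dict, List


def assign_segment(assignment: Dict[str, List[str]], segment: str, copies: int) -> List[str]:
    """Place `copies` replicas of `segment` on the least-loaded servers.
    Illustrative rule only; the real assignment strategy may differ."""
    if copies > len(assignment):
        raise ValueError("not enough servers for the requested number of copies")
    # Sort by current load, then by name so ties break deterministically.
    targets = sorted(assignment, key=lambda s: (len(assignment[s]), s))[:copies]
    for server in targets:
        assignment[server].append(segment)
    return targets


# Three empty servers, three uploaded segments, Copies = 2 as on the slide.
servers: Dict[str, List[str]] = {"server1": [], "server2": [], "server3": []}
for seg in ["S1", "S2", "S3"]:
    assign_segment(servers, seg, copies=2)
print(servers)
```

With three servers and two copies, every server ends up with two segments and any single server can fail without losing a segment.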
Pinot - Fault tolerance/Elasticity
• AUTO recovery mode: automatically redistribute segments on failure or addition of nodes
• Custom mode: run in degraded mode until the node is restarted or replaced
Pinot vs Druid
Architecture
• Druid: Realtime + Offline, Realtime only
• Pinot: Realtime + Offline
• Why: with realtime only, consistency is hard and schema evolution/bootstrap is hard
Inverted index
• Druid: always on for all columns, fixed
• Pinot: configurable on a per-column basis
• Why: allows a trade-off between scanning and inverted index + scanning; more data can fit in a given memory size
Data organization
• Druid: N/A
• Pinot: sorts data
• Why: organizing data provides speed and better compression, and removes the need for an inverted index
Smart pre-materialization
• Druid: N/A
• Pinot: star-tree
• Why: allows a trade-off between latency and space
Query execution layer
• Druid: fixed plan
• Pinot: split into planning and execution
• Why: smart choices can be made at runtime based on metadata/query
Pinot - Future
• Documentation & tooling
• In progress - consistency among real-time replicas
• Improve cost to serve - leverage SSD, partial pre-materialization
• ThirdEye - Business Metrics Monitoring
Thank You

Pinot: Realtime Distributed OLAP datastore
