SlideShare una empresa de Scribd logo
1 de 8
FlumeBase Study

      Nov. 29, 2011
        Willis Gong
Big Data Engineering Team
       Hanborq Inc.
Application scenario
• Originating tier
    – Automatically reconfigured as fan out when pull a flow from a stream
    – agentBESink to forward event to FB’s ‘collectorSource’
• Flumebase:
    – Is actually a physical flume node created using flume node constructor
      FlumeNode(…)
    – Presents two type of logical nodes
        • Source adapting node
            – One node for each stream: reuse and de-multiplex stream into flow
            – Input: delimited, regex, avro
        • Output node
            – One node for each named ‘flow’
            – Emitting avro record
            – need manually re-route to appropriate sink
• FB can also use local file source
Flumebase Server
• Stream: to share event from same flume node; created by sql statement
  “CREATE STREAM …”; composed of 0+ flow
• Flow: each ‘select’ statement produce a flow
• rtsqlmultisink
    – Input side: reuse events from same collectorSource
    – Output side: no effect actually (should be manually replaced)
• rtsqlsink: wrap and drive flume event into flumebase flow pipeline
• rtsqlsource: emitting avro record produced by flumebase flow pipeline
• Flumebase flow pipeline: the main thread
    – Process operation from shell
    – Flow lifecycle management (create, deploy, event-feed, terminate)
Flumebase flow pipeline
• Flow:
   – Is a graph of flow elements
   – Take input from rtsqlsink and produce output to rtsqlsource
• Flow element:
   – Each carry out certain functionality in a sql query, like:
          • Project, aggregation, filter, join, etc.
   – drive by the pipeline: take event and produce output
   – Output varies depends on implementation
          • output to next phase queue, or
          • output as flow final result, or
          • Cache and output later (for aggregation)
The aggregation flow element
• Operates on ‘window’
    – Defined by a relative range of time
    – Further divided into smaller time-slot
      (customizable slot width)
         • Aggregation is firstly done per slot, then
           summarized on all slot when window finished
    – A event fall into a particular slot according
      to its timestamp
         • The timestamp is either specified column in
           the record or local sampling time
• Two thread:
    – main thread drive in-window event
    – eviction thread watches when to close a
      window
• Output one record containing results from
  all aggregation functions once a window is
  closed
Features
• Compared with ordinary sql
  – No primary index
     • Do not identify if record is duplicated
  – The window concept
• Compared with ordinary flume node
  – Flumebase logical nodes are with particular
    source & sink – rtsqlxxx
  – Flumebase logical node cannot be initiated by
    flume master – FB shell instead
Features
• SQL
  – CREATE STREAM stream_name (col_name data_type [, ...])
    FROM [LOCAL] {FILE | NODE | SOURCE} input_spec
    [EVENT FORMAT format_spec
    [PROPERTIES (key = val, …)]]
  – SELECT select_expr, select_expr ... FROM stream_reference
    [ JOIN stream_reference ON join_expr OVER range_expr, JOIN ... ]
    [ WHERE where_condition ]
    [ GROUP BY column_list ] [ OVER range_expr ] [ HAVING
    having_condition ]
    [ WINDOW window_name AS ( range_expr ), WINDOW ... ]
Possible issues
• Aggregation
    – Currently FB window is not timeline aligned
        • may need to be aligned with seconds or minutes or hours
    – FB do not to support distinct
• Deployment
    – Currently usage: flume deploy  FB start up  FB shell create stream / flow
       manually re-route FB output logic node
        • manually change sink for rtsqlsource
    – Better if FB stream / flow auto created by configuration from flume – better
      integration with flume
• Code maturity is in doubt
    – Seems to based on flume-0.9.3
    – Not work directly on cdhu1 & 2
    – According to github: few activities
        • No update within about half year
        • Very few issues and discussion; issues unresolved
        • One contributors – the author

Más contenido relacionado

La actualidad más candente

Register allocation
Register allocationRegister allocation
Register allocationJK Knowledge
 
127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on Linux127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on LinuxSam Bowne
 
Fluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at KubeconFluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at KubeconN Masahiro
 
CNIT 127 Ch 2: Stack overflows on Linux
CNIT 127 Ch 2: Stack overflows on LinuxCNIT 127 Ch 2: Stack overflows on Linux
CNIT 127 Ch 2: Stack overflows on LinuxSam Bowne
 
CNIT 127: Ch 8: Windows overflows (Part 2)
CNIT 127: Ch 8: Windows overflows (Part 2)CNIT 127: Ch 8: Windows overflows (Part 2)
CNIT 127: Ch 8: Windows overflows (Part 2)Sam Bowne
 
JS introduction
JS introductionJS introduction
JS introductionYi Tseng
 
CNIT 127: Ch 2: Stack Overflows in Linux
CNIT 127: Ch 2: Stack Overflows in LinuxCNIT 127: Ch 2: Stack Overflows in Linux
CNIT 127: Ch 2: Stack Overflows in LinuxSam Bowne
 
CNIT 127 Ch 2: Stack overflows on Linux
CNIT 127 Ch 2: Stack overflows on LinuxCNIT 127 Ch 2: Stack overflows on Linux
CNIT 127 Ch 2: Stack overflows on LinuxSam Bowne
 
CNIT 127 Ch 3: Shellcode
CNIT 127 Ch 3: ShellcodeCNIT 127 Ch 3: Shellcode
CNIT 127 Ch 3: ShellcodeSam Bowne
 
CNIT 127: Ch 3: Shellcode
CNIT 127: Ch 3: ShellcodeCNIT 127: Ch 3: Shellcode
CNIT 127: Ch 3: ShellcodeSam Bowne
 
CNIT 127 Ch 1: Before you Begin
CNIT 127 Ch 1: Before you BeginCNIT 127 Ch 1: Before you Begin
CNIT 127 Ch 1: Before you BeginSam Bowne
 
What's in the Box?: An Intro to HFM System Utilities
What's in the Box?: An Intro to HFM System Utilities What's in the Box?: An Intro to HFM System Utilities
What's in the Box?: An Intro to HFM System Utilities Alithya
 
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...Flink Forward
 
Linker and loader upload
Linker and loader   uploadLinker and loader   upload
Linker and loader uploadBin Yang
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Thomas Weise
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
Dynamic pricing of Lyft rides using streaming
Dynamic pricing of Lyft rides using streamingDynamic pricing of Lyft rides using streaming
Dynamic pricing of Lyft rides using streamingAmar Pai
 

La actualidad más candente (20)

Register allocation
Register allocationRegister allocation
Register allocation
 
127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on Linux127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on Linux
 
Fluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at KubeconFluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at Kubecon
 
CNIT 127 Ch 2: Stack overflows on Linux
CNIT 127 Ch 2: Stack overflows on LinuxCNIT 127 Ch 2: Stack overflows on Linux
CNIT 127 Ch 2: Stack overflows on Linux
 
CNIT 127: Ch 8: Windows overflows (Part 2)
CNIT 127: Ch 8: Windows overflows (Part 2)CNIT 127: Ch 8: Windows overflows (Part 2)
CNIT 127: Ch 8: Windows overflows (Part 2)
 
JS introduction
JS introductionJS introduction
JS introduction
 
CNIT 127: Ch 2: Stack Overflows in Linux
CNIT 127: Ch 2: Stack Overflows in LinuxCNIT 127: Ch 2: Stack Overflows in Linux
CNIT 127: Ch 2: Stack Overflows in Linux
 
CNIT 127 Ch 2: Stack overflows on Linux
CNIT 127 Ch 2: Stack overflows on LinuxCNIT 127 Ch 2: Stack overflows on Linux
CNIT 127 Ch 2: Stack overflows on Linux
 
CNIT 127 Ch 3: Shellcode
CNIT 127 Ch 3: ShellcodeCNIT 127 Ch 3: Shellcode
CNIT 127 Ch 3: Shellcode
 
Migration to Siebel IP17+
Migration to Siebel IP17+Migration to Siebel IP17+
Migration to Siebel IP17+
 
CNIT 127: Ch 3: Shellcode
CNIT 127: Ch 3: ShellcodeCNIT 127: Ch 3: Shellcode
CNIT 127: Ch 3: Shellcode
 
Compilation
CompilationCompilation
Compilation
 
CNIT 127 Ch 1: Before you Begin
CNIT 127 Ch 1: Before you BeginCNIT 127 Ch 1: Before you Begin
CNIT 127 Ch 1: Before you Begin
 
What's in the Box?: An Intro to HFM System Utilities
What's in the Box?: An Intro to HFM System Utilities What's in the Box?: An Intro to HFM System Utilities
What's in the Box?: An Intro to HFM System Utilities
 
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
Flink Forward Berlin 2017: Boris Lublinsky, Stavros Kontopoulos - Introducing...
 
Linker and loader upload
Linker and loader   uploadLinker and loader   upload
Linker and loader upload
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
 
SAP LVM Custom Instances
SAP LVM Custom InstancesSAP LVM Custom Instances
SAP LVM Custom Instances
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
Dynamic pricing of Lyft rides using streaming
Dynamic pricing of Lyft rides using streamingDynamic pricing of Lyft rides using streaming
Dynamic pricing of Lyft rides using streaming
 

Similar a FlumeBase Study

Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with FlumeRatnakar Pawar
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...Amazon Web Services
 
Symfony Components 2.0 on PHP 5.3
Symfony Components 2.0 on PHP 5.3Symfony Components 2.0 on PHP 5.3
Symfony Components 2.0 on PHP 5.3Fabien Potencier
 
Low Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in HadoopLow Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in HadoopInSemble
 
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsCloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsAltoros
 
Stream Processing with Apache Flink
Stream Processing with Apache FlinkStream Processing with Apache Flink
Stream Processing with Apache FlinkC4Media
 
Etienne chauchot spark structured streaming runner
Etienne chauchot spark structured streaming runnerEtienne chauchot spark structured streaming runner
Etienne chauchot spark structured streaming runnerEtienne Chauchot
 
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...Flink Forward
 
Advanced windows debugging
Advanced windows debuggingAdvanced windows debugging
Advanced windows debuggingchrisortman
 
-ImplementSubprogram-theory of programming
-ImplementSubprogram-theory of programming-ImplementSubprogram-theory of programming
-ImplementSubprogram-theory of programmingjavariagull777
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging振东 刘
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Pavel Chunyayev
 
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...HostedbyConfluent
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkRobert Metzger
 
Open Source Swift Under the Hood
Open Source Swift Under the HoodOpen Source Swift Under the Hood
Open Source Swift Under the HoodC4Media
 

Similar a FlumeBase Study (20)

Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with Flume
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
 
Symfony Components 2.0 on PHP 5.3
Symfony Components 2.0 on PHP 5.3Symfony Components 2.0 on PHP 5.3
Symfony Components 2.0 on PHP 5.3
 
Serverless design with Fn project
Serverless design with Fn projectServerless design with Fn project
Serverless design with Fn project
 
Low Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in HadoopLow Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in Hadoop
 
Apache Spark Components
Apache Spark ComponentsApache Spark Components
Apache Spark Components
 
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsCloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
 
Stream Processing with Apache Flink
Stream Processing with Apache FlinkStream Processing with Apache Flink
Stream Processing with Apache Flink
 
Etienne chauchot spark structured streaming runner
Etienne chauchot spark structured streaming runnerEtienne chauchot spark structured streaming runner
Etienne chauchot spark structured streaming runner
 
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
 
Advanced windows debugging
Advanced windows debuggingAdvanced windows debugging
Advanced windows debugging
 
-ImplementSubprogram-theory of programming
-ImplementSubprogram-theory of programming-ImplementSubprogram-theory of programming
-ImplementSubprogram-theory of programming
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015
 
Get to Know AtoM's Codebase
Get to Know AtoM's CodebaseGet to Know AtoM's Codebase
Get to Know AtoM's Codebase
 
Travis Wright - PS WF SMA SCSM SP
Travis Wright - PS WF SMA SCSM SPTravis Wright - PS WF SMA SCSM SP
Travis Wright - PS WF SMA SCSM SP
 
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
 
Open Source Swift Under the Hood
Open Source Swift Under the HoodOpen Source Swift Under the Hood
Open Source Swift Under the Hood
 

Más de Hanborq Inc.

Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraHanborq Inc.
 
Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHanborq Inc.
 
Hadoop大数据实践经验
Hadoop大数据实践经验Hadoop大数据实践经验
Hadoop大数据实践经验Hanborq Inc.
 
Flume and Flive Introduction
Flume and Flive IntroductionFlume and Flive Introduction
Flume and Flive IntroductionHanborq Inc.
 
Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and PipesHanborq Inc.
 
HBase Introduction
HBase IntroductionHBase Introduction
HBase IntroductionHanborq Inc.
 
Hadoop MapReduce Task Scheduler Introduction
Hadoop MapReduce Task Scheduler IntroductionHadoop MapReduce Task Scheduler Introduction
Hadoop MapReduce Task Scheduler IntroductionHanborq Inc.
 
Hadoop MapReduce Introduction and Deep Insight
Hadoop MapReduce Introduction and Deep InsightHadoop MapReduce Introduction and Deep Insight
Hadoop MapReduce Introduction and Deep InsightHanborq Inc.
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHanborq Inc.
 
How to Build Cloud Storage Service Systems
How to Build Cloud Storage Service SystemsHow to Build Cloud Storage Service Systems
How to Build Cloud Storage Service SystemsHanborq Inc.
 
Hanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Inc.
 

Más de Hanborq Inc. (12)

Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HA
 
Hadoop大数据实践经验
Hadoop大数据实践经验Hadoop大数据实践经验
Hadoop大数据实践经验
 
Flume and Flive Introduction
Flume and Flive IntroductionFlume and Flive Introduction
Flume and Flive Introduction
 
Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and Pipes
 
HBase Introduction
HBase IntroductionHBase Introduction
HBase Introduction
 
Hadoop Versioning
Hadoop VersioningHadoop Versioning
Hadoop Versioning
 
Hadoop MapReduce Task Scheduler Introduction
Hadoop MapReduce Task Scheduler IntroductionHadoop MapReduce Task Scheduler Introduction
Hadoop MapReduce Task Scheduler Introduction
 
Hadoop MapReduce Introduction and Deep Insight
Hadoop MapReduce Introduction and Deep InsightHadoop MapReduce Introduction and Deep Insight
Hadoop MapReduce Introduction and Deep Insight
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
 
How to Build Cloud Storage Service Systems
How to Build Cloud Storage Service SystemsHow to Build Cloud Storage Service Systems
How to Build Cloud Storage Service Systems
 
Hanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduce
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

FlumeBase Study

  • 1. FlumeBase Study Nov. 29, 2011 Willis Gong Big Data Engineering Team Hanborq Inc.
  • 2. Application scenario • Originating tier – Automatically reconfigured as fan out when pull a flow from a stream – agentBESink to forward event to FB’s ‘collectorSource’ • Flumebase: – Is actually a physical flume node created using flume node constructor FlumeNode(…) – Presents two type of logical nodes • Source adapting node – One node for each stream: reuse and de-multiplex stream into flow – Input: delimited, regex, avro • Output node – One node for each named ‘flow’ – Emitting avro record – need manually re-route to appropriate sink • FB can also use local file source
  • 3. Flumebase Server • Stream: to share event from same flume node; created by sql statement “CREATE STREAM …”; composed of 0+ flow • Flow: each ‘select’ statement produce a flow • rtsqlmultisink – Input side: reuse events from same collectorSource – Output side: no effect actually (should be manually replaced) • rtsqlsink: wrap and drive flume event into flumebase flow pipeline • rtsqlsource: emitting avro record produced by flumebase flow pipeline • Flumebase flow pipeline: the main thread – Process operation from shell – Flow lifecycle management (create, deploy, event-feed, terminate)
  • 4. Flumebase flow pipeline • Flow: – Is a graph of flow elements – Take input from rtsqlsink and produce output to rtsqlsource • Flow element: – Each carry out certain functionality in a sql query, like: • Project, aggregation, filter, join, etc. – drive by the pipeline: take event and produce output – Output varies depends on implementation • output to next phase queue, or • output as flow final result, or • Cache and output later (for aggregation)
  • 5. The aggregation flow element • Operates on ‘window’ – Defined by a relative range of time – Further divided into smaller time-slot (customizable slot width) • Aggregation is firstly done per slot, then summarized on all slot when window finished – A event fall into a particular slot according to its timestamp • The timestamp is either specified column in the record or local sampling time • Two thread: – main thread drive in-window event – eviction thread watches when to close a window • Output one record containing results from all aggregation functions once a window is closed
  • 6. Features • Compared with ordinary sql – No primary index • Do not identify if record is duplicated – The window concept • Compared with ordinary flume node – Flumebase logical nodes are with particular source & sink – rtsqlxxx – Flumebase logical node cannot be initiated by flume master – FB shell instead
  • 7. Features • SQL – CREATE STREAM stream_name (col_name data_type [, ...]) FROM [LOCAL] {FILE | NODE | SOURCE} input_spec [EVENT FORMAT format_spec [PROPERTIES (key = val, …)]] – SELECT select_expr, select_expr ... FROM stream_reference [ JOIN stream_reference ON join_expr OVER range_expr, JOIN ... ] [ WHERE where_condition ] [ GROUP BY column_list ] [ OVER range_expr ] [ HAVING having_condition ] [ WINDOW window_name AS ( range_expr ), WINDOW ... ]
  • 8. Possible issues • Aggregation – Currently FB window is not timeline aligned • may need to be aligned with seconds or minutes or hours – FB do not to support distinct • Deployment – Currently usage: flume deploy  FB start up  FB shell create stream / flow  manually re-route FB output logic node • manually change sink for rtsqlsource – Better if FB stream / flow auto created by configuration from flume – better integration with flume • Code maturity is in doubt – Seems to based on flume-0.9.3 – Not work directly on cdhu1 & 2 – According to github: few activities • No update within about half year • Very few issues and discussion; issues unresolved • One contributors – the author