SlideShare una empresa de Scribd logo
1 de 20
Flume Office Hours Community planning Jonathan Hsieh Cloudera HQ, 2/28/2011
Outline State of the world What’s new? Stories (Chime in!) What needs work? Prioritizing what is next. Q+A 3 Flume Office Hours, 2/28/2011
State of the world Flume Office Hours, 2/28/2011 4
Growing user and developer community  Github stats: Currently 295 watchers, 51 forks New Committers:  9/10: Eric Sammer (Cloudera) 1/11: Bruce Mitchener (Independent) User characteristics Most potential users seem to use adhoc scripts Most users are early adopters / startup devops Flume Office Hours, 2/28/2011 5
A short feature history 6/10: v0.9.0  Initial open source release 8/10: v0.9.1  Fixes for hangs  Initial compression features 10/10: v0.9.1+29 (CDH3b3, packages) Added kerberized HDFS support Flume cookbook Elastic Search / Cassandra Plugins Initial VoldemortPlugins 11/10: v0.9.2 Support for other compression codecs Avro RPC Improvements to tail and exec Robustness improvements Initial Hbase /MongoDBPlugin 2/11: v0.9.3 (CDH3b4, packages) Flume Node Windows support Initial JSON metrics support Multi-master functional Robustness improvements JRuby / AMQP Plugins S3/EC2 Blog Stories 4/11: v0.9.3+xxx (CDH3 Stable, packages) Excessive Duplication fixes Compression fixes ?/11: v0.9.4 Flume Office Hours, 2/28/2011 6
Whats new? Flume Office Hours, 2/28/2011 7
New features Flume node JSON metrics http://node:35862/node/reports Terser syntax { deco1 => { deco2 => sink } }  deco1 deco2 sink  Multiple collector sink support collector(30000) { [ escapedCustomDfs(“hdfs://nn1/path”,”prefix”,”format”), escapedCustomDfs(“hdfs://nn2/path”,”prefix”,”format”),   ] } Limited Multi-master support Windows support Flume Office Hours, 2/28/2011 8
Stories 9 Flume Office Hours, 2/28/2011
                : The Standard Use Case HDFS Flume Master Agent server Agent Collector server Agent server Agent server 10 Agent server Agent Collector server Agent server Agent server Agent server Agent Collector server Agent server Agent server Collector tier Agent tier Flume Office Hours, 2/28/2011
                       : Multi Datacenter 11 HDFS Collector tier Agent api Agent api Agent Collector api Agent api API server Agent api Agent Collector api Agent api Agent api Agent api Agent Collector api Agent api Agent api Agent api Agent api Agent Collector api Agent proc Agent api Processor server Agent Collector api Agent api Agent proc Agent api Agent Collector api Agent api Agent proc Flume Office Hours, 2/28/2011
                       : Multi Datacenter 12 HDFS Collector tier Agent api Agent api Agent Collector api Agent api API server Agent api Agent Collector api Agent api Agent api Agent api Agent Collector api Agent api Agent api Relay Agent api Agent api Agent Collector api Agent proc Agent api Processor server Agent Collector api Agent api Agent proc Agent api Agent Collector api Agent api Agent proc Flume Office Hours, 2/28/2011
             : Near Realtime Aggregator 13 HDFS DB Flume Agent Ad svr Collector Tracker  Agent Ad svr Agent Ad svr Agent Ad svr quick reports Hive job verify reports Flume Office Hours, 2/28/2011
An enterprise story 14 Kerberos HDFS Flume Collector tier Agent api Agent Collector api Agent api Win api API server Agent api Agent Collector api Agent api Linux api D D D D D D Agent api Agent Collector api Agent api Linux api Flume Office Hours, 2/28/2011 Active Directory  / LDAP
An emerging community story 15 HDFS HBase Incremental Search Idx Flume Agent Hive query Agent Agent Collector Fanout index hbase hdfs Agent svr Pig query Key lookup Range query Search query Faceted query Flume Office Hours, 2/28/2011
What needs work?What comes next? Flume Office Hours, 2/28/2011 16
Known issues Excessive event duplication (due to tail or e2e agent) Configuration translation problem in some cases Multi-master limited: doesn’t work with translations Flume Office Hours, 2/28/2011 17
What’s next? (proposals) Fix Excessive duplication issues. Apache Incubator (?) Log4j/Log4net/logback/etc… Fix Multi-master limitations. Security upgrades for node to node comms (TLS/SSL) Improved metrics / GUI / usability Integration with open source alerting/monitoring tools Integration with proprietary systems Version proofing RPCs / State storage Packaging friendly plug-in install Multi Datacenter Story Performance Increases Inline near-realtime analytics Puppet/Chef style config for nodes Lightweight Agent Masterless Agent Better S3 / AWS support Flume Office Hours, 2/28/2011 18
Q+A 19 Flume Office Hours, 2/28/2011
Flume office-hours-110228

Más contenido relacionado

La actualidad más candente

Docker at Spotify
Docker at SpotifyDocker at Spotify
Docker at SpotifyRohan Singh
 
Bareos - Open Source Data Protection, by Philipp Storz
Bareos - Open Source Data Protection, by Philipp StorzBareos - Open Source Data Protection, by Philipp Storz
Bareos - Open Source Data Protection, by Philipp StorzNETWAYS
 
Introduction to Subversion and Google Project Hosting
Introduction to Subversion and Google Project HostingIntroduction to Subversion and Google Project Hosting
Introduction to Subversion and Google Project HostingPhilip Johnson
 
Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners HubSpot
 
How to deploy PHP projects with docker
How to deploy PHP projects with dockerHow to deploy PHP projects with docker
How to deploy PHP projects with dockerRuoshi Ling
 
Docker Container As A Service - Mix-IT 2016
Docker Container As A Service - Mix-IT 2016Docker Container As A Service - Mix-IT 2016
Docker Container As A Service - Mix-IT 2016Patrick Chanezon
 
Democratizing Development - Scott Gress
Democratizing Development - Scott GressDemocratizing Development - Scott Gress
Democratizing Development - Scott GressDocker, Inc.
 
Docker serverless v1.0
Docker serverless v1.0Docker serverless v1.0
Docker serverless v1.0Thomas Chacko
 
Nextflow Camp 2019: nf-core tutorial (Updated Feb 2020)
Nextflow Camp 2019: nf-core tutorial (Updated Feb 2020)Nextflow Camp 2019: nf-core tutorial (Updated Feb 2020)
Nextflow Camp 2019: nf-core tutorial (Updated Feb 2020)Phil Ewels
 
Composer Power User Tips
Composer Power User TipsComposer Power User Tips
Composer Power User TipsTom Corrigan
 
Container orchestration from theory to practice
Container orchestration from theory to practiceContainer orchestration from theory to practice
Container orchestration from theory to practiceDocker, Inc.
 
Dockercon 16 Wrap-up (Docker for Mac and Win, Docker 1.12, Swarm Mode, etc.)
Dockercon 16 Wrap-up (Docker for Mac and Win, Docker 1.12, Swarm Mode, etc.)Dockercon 16 Wrap-up (Docker for Mac and Win, Docker 1.12, Swarm Mode, etc.)
Dockercon 16 Wrap-up (Docker for Mac and Win, Docker 1.12, Swarm Mode, etc.)Nils De Moor
 
Docker Security Deep Dive by Ying Li and David Lawrence
Docker Security Deep Dive by Ying Li and David LawrenceDocker Security Deep Dive by Ying Li and David Lawrence
Docker Security Deep Dive by Ying Li and David LawrenceDocker, Inc.
 
Docker for Java developers at JavaLand
Docker for Java developers at JavaLandDocker for Java developers at JavaLand
Docker for Java developers at JavaLandJohan Janssen
 
Docker Security workshop slides
Docker Security workshop slidesDocker Security workshop slides
Docker Security workshop slidesDocker, Inc.
 
Cloud infrastructure as code
Cloud infrastructure as codeCloud infrastructure as code
Cloud infrastructure as codeTomasz Cholewa
 
Browser Testing with Docker - Craig Huber
Browser Testing with Docker - Craig HuberBrowser Testing with Docker - Craig Huber
Browser Testing with Docker - Craig HuberDocker, Inc.
 
青云虚拟机部署私有Docker Registry
青云虚拟机部署私有Docker Registry青云虚拟机部署私有Docker Registry
青云虚拟机部署私有Docker RegistryZhichao Liang
 
fabric8 ... and Docker, Kubernetes & OpenShift
fabric8 ... and Docker, Kubernetes & OpenShiftfabric8 ... and Docker, Kubernetes & OpenShift
fabric8 ... and Docker, Kubernetes & OpenShiftroland.huss
 
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea LuzzardiWhat's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea LuzzardiMike Goelzer
 

La actualidad más candente (20)

Docker at Spotify
Docker at SpotifyDocker at Spotify
Docker at Spotify
 
Bareos - Open Source Data Protection, by Philipp Storz
Bareos - Open Source Data Protection, by Philipp StorzBareos - Open Source Data Protection, by Philipp Storz
Bareos - Open Source Data Protection, by Philipp Storz
 
Introduction to Subversion and Google Project Hosting
Introduction to Subversion and Google Project HostingIntroduction to Subversion and Google Project Hosting
Introduction to Subversion and Google Project Hosting
 
Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners
 
How to deploy PHP projects with docker
How to deploy PHP projects with dockerHow to deploy PHP projects with docker
How to deploy PHP projects with docker
 
Docker Container As A Service - Mix-IT 2016
Docker Container As A Service - Mix-IT 2016Docker Container As A Service - Mix-IT 2016
Docker Container As A Service - Mix-IT 2016
 
Democratizing Development - Scott Gress
Democratizing Development - Scott GressDemocratizing Development - Scott Gress
Democratizing Development - Scott Gress
 
Docker serverless v1.0
Docker serverless v1.0Docker serverless v1.0
Docker serverless v1.0
 
Nextflow Camp 2019: nf-core tutorial (Updated Feb 2020)
Nextflow Camp 2019: nf-core tutorial (Updated Feb 2020)Nextflow Camp 2019: nf-core tutorial (Updated Feb 2020)
Nextflow Camp 2019: nf-core tutorial (Updated Feb 2020)
 
Composer Power User Tips
Composer Power User TipsComposer Power User Tips
Composer Power User Tips
 
Container orchestration from theory to practice
Container orchestration from theory to practiceContainer orchestration from theory to practice
Container orchestration from theory to practice
 
Dockercon 16 Wrap-up (Docker for Mac and Win, Docker 1.12, Swarm Mode, etc.)
Dockercon 16 Wrap-up (Docker for Mac and Win, Docker 1.12, Swarm Mode, etc.)Dockercon 16 Wrap-up (Docker for Mac and Win, Docker 1.12, Swarm Mode, etc.)
Dockercon 16 Wrap-up (Docker for Mac and Win, Docker 1.12, Swarm Mode, etc.)
 
Docker Security Deep Dive by Ying Li and David Lawrence
Docker Security Deep Dive by Ying Li and David LawrenceDocker Security Deep Dive by Ying Li and David Lawrence
Docker Security Deep Dive by Ying Li and David Lawrence
 
Docker for Java developers at JavaLand
Docker for Java developers at JavaLandDocker for Java developers at JavaLand
Docker for Java developers at JavaLand
 
Docker Security workshop slides
Docker Security workshop slidesDocker Security workshop slides
Docker Security workshop slides
 
Cloud infrastructure as code
Cloud infrastructure as codeCloud infrastructure as code
Cloud infrastructure as code
 
Browser Testing with Docker - Craig Huber
Browser Testing with Docker - Craig HuberBrowser Testing with Docker - Craig Huber
Browser Testing with Docker - Craig Huber
 
青云虚拟机部署私有Docker Registry
青云虚拟机部署私有Docker Registry青云虚拟机部署私有Docker Registry
青云虚拟机部署私有Docker Registry
 
fabric8 ... and Docker, Kubernetes & OpenShift
fabric8 ... and Docker, Kubernetes & OpenShiftfabric8 ... and Docker, Kubernetes & OpenShift
fabric8 ... and Docker, Kubernetes & OpenShift
 
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea LuzzardiWhat's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
 

Destacado

Contributing to Impala
Contributing to ImpalaContributing to Impala
Contributing to ImpalaCloudera, Inc.
 
Linux Kernel Boot Process , SOSCON 2015, By Mario Cho
Linux Kernel Boot Process , SOSCON 2015, By Mario ChoLinux Kernel Boot Process , SOSCON 2015, By Mario Cho
Linux Kernel Boot Process , SOSCON 2015, By Mario ChoMario Cho
 
Mozilla - Anurag Phadke - Hadoop World 2010
Mozilla - Anurag Phadke - Hadoop World 2010Mozilla - Anurag Phadke - Hadoop World 2010
Mozilla - Anurag Phadke - Hadoop World 2010Cloudera, Inc.
 
Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010Cloudera, Inc.
 
Chicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBaseChicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBaseCloudera, Inc.
 

Destacado (6)

Contributing to Impala
Contributing to ImpalaContributing to Impala
Contributing to Impala
 
Linux Kernel Boot Process , SOSCON 2015, By Mario Cho
Linux Kernel Boot Process , SOSCON 2015, By Mario ChoLinux Kernel Boot Process , SOSCON 2015, By Mario Cho
Linux Kernel Boot Process , SOSCON 2015, By Mario Cho
 
Mozilla - Anurag Phadke - Hadoop World 2010
Mozilla - Anurag Phadke - Hadoop World 2010Mozilla - Anurag Phadke - Hadoop World 2010
Mozilla - Anurag Phadke - Hadoop World 2010
 
Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010
 
Chicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBaseChicago Data Summit: Geo-based Content Processing Using HBase
Chicago Data Summit: Geo-based Content Processing Using HBase
 
Cloudera 5.3 Update
Cloudera 5.3 UpdateCloudera 5.3 Update
Cloudera 5.3 Update
 

Similar a Flume office-hours-110228

File Transfers - Web Hosting Curriculum [5/10]
File Transfers - Web Hosting Curriculum [5/10] File Transfers - Web Hosting Curriculum [5/10]
File Transfers - Web Hosting Curriculum [5/10] Web Hosting for Students
 
2011 06-30-hadoop-summit v5
2011 06-30-hadoop-summit v52011 06-30-hadoop-summit v5
2011 06-30-hadoop-summit v5Samuel Rash
 
Bp106 Worst Practices Final
Bp106   Worst Practices FinalBp106   Worst Practices Final
Bp106 Worst Practices FinalBill Buchan
 
PHP Quality Assurance Workshop PHPBenelux
PHP Quality Assurance Workshop PHPBeneluxPHP Quality Assurance Workshop PHPBenelux
PHP Quality Assurance Workshop PHPBeneluxNick Belhomme
 
Managing Plone Projects with Perl and Subversion
Managing Plone Projects with Perl and SubversionManaging Plone Projects with Perl and Subversion
Managing Plone Projects with Perl and SubversionLuciano Rocha
 
Puppet / DevOps - EDGE Lviv
Puppet / DevOps - EDGE LvivPuppet / DevOps - EDGE Lviv
Puppet / DevOps - EDGE Lvivzenyk
 
Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11Cloudera, Inc.
 
Becoming a Plumber: Building Deployment Pipelines - All Day DevOps
Becoming a Plumber: Building Deployment Pipelines - All Day DevOpsBecoming a Plumber: Building Deployment Pipelines - All Day DevOps
Becoming a Plumber: Building Deployment Pipelines - All Day DevOpsDaniel Barker
 
Chicago Data Summit: Flume: An Introduction
Chicago Data Summit: Flume: An IntroductionChicago Data Summit: Flume: An Introduction
Chicago Data Summit: Flume: An IntroductionCloudera, Inc.
 
Uyuni: the solution to manage your Linux infrastructure (OpenFest 2020)
Uyuni: the solution to manage your Linux infrastructure (OpenFest 2020)Uyuni: the solution to manage your Linux infrastructure (OpenFest 2020)
Uyuni: the solution to manage your Linux infrastructure (OpenFest 2020)Uyuni Project
 
Evolution of HTTP - Miran Al Mehrab
Evolution of HTTP - Miran Al MehrabEvolution of HTTP - Miran Al Mehrab
Evolution of HTTP - Miran Al MehrabCefalo
 
XPDS16: Xen Project Weather Report 2016
XPDS16: Xen Project Weather Report 2016XPDS16: Xen Project Weather Report 2016
XPDS16: Xen Project Weather Report 2016The Linux Foundation
 
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...Vietnam Open Infrastructure User Group
 
Linux textbook notes - Graham Helton
Linux textbook notes - Graham HeltonLinux textbook notes - Graham Helton
Linux textbook notes - Graham HeltonGrahamHelton
 
Using Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsUsing Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsYoshitake Kobayashi
 
Introduction to linux at Introductory Bioinformatics Workshop
Introduction to linux at Introductory Bioinformatics WorkshopIntroduction to linux at Introductory Bioinformatics Workshop
Introduction to linux at Introductory Bioinformatics WorkshopSetor Amuzu
 
Scaleable PHP Applications in Kubernetes
Scaleable PHP Applications in KubernetesScaleable PHP Applications in Kubernetes
Scaleable PHP Applications in KubernetesRobert Lemke
 

Similar a Flume office-hours-110228 (20)

File Transfers - Web Hosting Curriculum [5/10]
File Transfers - Web Hosting Curriculum [5/10] File Transfers - Web Hosting Curriculum [5/10]
File Transfers - Web Hosting Curriculum [5/10]
 
2011 06-30-hadoop-summit v5
2011 06-30-hadoop-summit v52011 06-30-hadoop-summit v5
2011 06-30-hadoop-summit v5
 
Bp106 Worst Practices Final
Bp106   Worst Practices FinalBp106   Worst Practices Final
Bp106 Worst Practices Final
 
PHP Quality Assurance Workshop PHPBenelux
PHP Quality Assurance Workshop PHPBeneluxPHP Quality Assurance Workshop PHPBenelux
PHP Quality Assurance Workshop PHPBenelux
 
Managing Plone Projects with Perl and Subversion
Managing Plone Projects with Perl and SubversionManaging Plone Projects with Perl and Subversion
Managing Plone Projects with Perl and Subversion
 
Puppet / DevOps - EDGE Lviv
Puppet / DevOps - EDGE LvivPuppet / DevOps - EDGE Lviv
Puppet / DevOps - EDGE Lviv
 
Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11
 
Becoming a Plumber: Building Deployment Pipelines - All Day DevOps
Becoming a Plumber: Building Deployment Pipelines - All Day DevOpsBecoming a Plumber: Building Deployment Pipelines - All Day DevOps
Becoming a Plumber: Building Deployment Pipelines - All Day DevOps
 
Chicago Data Summit: Flume: An Introduction
Chicago Data Summit: Flume: An IntroductionChicago Data Summit: Flume: An Introduction
Chicago Data Summit: Flume: An Introduction
 
Uyuni: the solution to manage your Linux infrastructure (OpenFest 2020)
Uyuni: the solution to manage your Linux infrastructure (OpenFest 2020)Uyuni: the solution to manage your Linux infrastructure (OpenFest 2020)
Uyuni: the solution to manage your Linux infrastructure (OpenFest 2020)
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to Docker
 
Evolution of HTTP - Miran Al Mehrab
Evolution of HTTP - Miran Al MehrabEvolution of HTTP - Miran Al Mehrab
Evolution of HTTP - Miran Al Mehrab
 
XPDS16: Xen Project Weather Report 2016
XPDS16: Xen Project Weather Report 2016XPDS16: Xen Project Weather Report 2016
XPDS16: Xen Project Weather Report 2016
 
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
 
Linux textbook notes - Graham Helton
Linux textbook notes - Graham HeltonLinux textbook notes - Graham Helton
Linux textbook notes - Graham Helton
 
Linux training
Linux trainingLinux training
Linux training
 
Using Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsUsing Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure Systems
 
Introduction to linux at Introductory Bioinformatics Workshop
Introduction to linux at Introductory Bioinformatics WorkshopIntroduction to linux at Introductory Bioinformatics Workshop
Introduction to linux at Introductory Bioinformatics Workshop
 
BitTorrent on iOS
BitTorrent on iOSBitTorrent on iOS
BitTorrent on iOS
 
Scaleable PHP Applications in Kubernetes
Scaleable PHP Applications in KubernetesScaleable PHP Applications in Kubernetes
Scaleable PHP Applications in Kubernetes
 

Más de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Más de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Último (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Flume office-hours-110228

  • 1.
  • 2. Flume Office Hours Community planning Jonathan Hsieh Cloudera HQ, 2/28/2011
  • 3. Outline State of the world What’s new? Stories (Chime in!) What needs work? Prioritizing what is next. Q+A 3 Flume Office Hours, 2/28/2011
  • 4. State of the world Flume Office Hours, 2/28/2011 4
  • 5. Growing user and developer community Github stats: Currently 295 watchers, 51 forks New Committers: 9/10: Eric Sammer (Cloudera) 1/11: Bruce Mitchener (Independent) User characteristics Most potential users seem to use adhoc scripts Most users are early adopters / startup devops Flume Office Hours, 2/28/2011 5
  • 6. A short feature history 6/10: v0.9.0 Initial open source release 8/10: v0.9.1 Fixes for hangs Initial compression features 10/10: v0.9.1+29 (CDH3b3, packages) Added kerberized HDFS support Flume cookbook Elastic Search / Cassandra Plugins Initial VoldemortPlugins 11/10: v0.9.2 Support for other compression codecs Avro RPC Improvements to tail and exec Robustness improvements Initial Hbase /MongoDBPlugin 2/11: v0.9.3 (CDH3b4, packages) Flume Node Windows support Initial JSON metrics support Multi-master functional Robustness improvements JRuby / AMQP Plugins S3/EC2 Blog Stories 4/11: v0.9.3+xxx (CDH3 Stable, packages) Excessive Duplication fixes Compression fixes ?/11: v0.9.4 Flume Office Hours, 2/28/2011 6
  • 7. Whats new? Flume Office Hours, 2/28/2011 7
  • 8. New features Flume node JSON metrics http://node:35862/node/reports Terser syntax { deco1 => { deco2 => sink } } deco1 deco2 sink Multiple collector sink support collector(30000) { [ escapedCustomDfs(“hdfs://nn1/path”,”prefix”,”format”), escapedCustomDfs(“hdfs://nn2/path”,”prefix”,”format”), ] } Limited Multi-master support Windows support Flume Office Hours, 2/28/2011 8
  • 9. Stories 9 Flume Office Hours, 2/28/2011
  • 10. : The Standard Use Case HDFS Flume Master Agent server Agent Collector server Agent server Agent server 10 Agent server Agent Collector server Agent server Agent server Agent server Agent Collector server Agent server Agent server Collector tier Agent tier Flume Office Hours, 2/28/2011
  • 11. : Multi Datacenter 11 HDFS Collector tier Agent api Agent api Agent Collector api Agent api API server Agent api Agent Collector api Agent api Agent api Agent api Agent Collector api Agent api Agent api Agent api Agent api Agent Collector api Agent proc Agent api Processor server Agent Collector api Agent api Agent proc Agent api Agent Collector api Agent api Agent proc Flume Office Hours, 2/28/2011
  • 12. : Multi Datacenter 12 HDFS Collector tier Agent api Agent api Agent Collector api Agent api API server Agent api Agent Collector api Agent api Agent api Agent api Agent Collector api Agent api Agent api Relay Agent api Agent api Agent Collector api Agent proc Agent api Processor server Agent Collector api Agent api Agent proc Agent api Agent Collector api Agent api Agent proc Flume Office Hours, 2/28/2011
  • 13. : Near Realtime Aggregator 13 HDFS DB Flume Agent Ad svr Collector Tracker Agent Ad svr Agent Ad svr Agent Ad svr quick reports Hive job verify reports Flume Office Hours, 2/28/2011
  • 14. An enterprise story 14 Kerberos HDFS Flume Collector tier Agent api Agent Collector api Agent api Win api API server Agent api Agent Collector api Agent api Linux api D D D D D D Agent api Agent Collector api Agent api Linux api Flume Office Hours, 2/28/2011 Active Directory / LDAP
  • 15. An emerging community story 15 HDFS HBase Incremental Search Idx Flume Agent Hive query Agent Agent Collector Fanout index hbase hdfs Agent svr Pig query Key lookup Range query Search query Faceted query Flume Office Hours, 2/28/2011
  • 16. What needs work?What comes next? Flume Office Hours, 2/28/2011 16
  • 17. Known issues Excessive event duplication (due to tail or e2e agent) Configuration translation problem in some cases Multi-master limited: doesn’t work with translations Flume Office Hours, 2/28/2011 17
  • 18. What’s next? (proposals) Fix Excessive duplication issues. Apache Incubator (?) Log4j/Log4net/logback/etc… Fix Multi-master limitations. Security upgrades for node to node comms (TLS/SSL) Improved metrics / GUI / usability Integration with open source alerting/monitoring tools Integration with proprietary systems Version proofing RPCs / State storage Packaging friendly plug-in install Multi Datacenter Story Performance Increases Inline near-realtime analytics Puppet/Chef style config for nodes Lightweight Agent Masterless Agent Better S3 / AWS support Flume Office Hours, 2/28/2011 18
  • 19. Q+A 19 Flume Office Hours, 2/28/2011