SlideShare a Scribd company logo
1 of 52
Download to read offline
Fluentd ♥ MongoDB
                          Log Everything As JSON


                 Kazuki Ohta, CTO at Treasure Data, Inc.




Tuesday, July 17, 2012
Self-Introduction
           •       Kazuki Ohta
                   >     twitter: @kzk_mover
                   >     github: kzk

           •       Treasure Data, Inc.
                   >     Chief Technology Officer; Founder
                   >     Original Fluentd Author @frsyuki is another co-founder.

           •       Open-Source Enthusiast
                   >     KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc.
                   >     Fluentd rpm/deb package manager
                                                                              2
Tuesday, July 17, 2012
Logging? Why?




Tuesday, July 17, 2012
Figure 1: Common Logging Purposes




                                                  Analytics

                                                  Error Notification

                                                  Recommendation


                                                                   4
Tuesday, July 17, 2012
Figure 2: Types of Logs




                                           App Log

                                           Access Log
                                           (Apache, Rails, etc.)
                                           System Log
                                           (syslog etc.)
                                           Others
                                                                   5
Tuesday, July 17, 2012
From “Scaling Lessons learned at Dropbox”
                                                            6
Tuesday, July 17, 2012
Fragile for format change,
                         No type information,
                         No field name, etc.


                         From “Scaling Lessons learned at Dropbox”
                                                            6
Tuesday, July 17, 2012
About Fluentd




Tuesday, July 17, 2012
It's like syslogd, but uses JSON for log
                 messages


                                                            8
Tuesday, July 17, 2012
Logs in JSON? Why?

                     1. Machine-Readable
                     > machine is goint to be a main consumer of logs


                     2. Schema-Free
                     > you want to add/remove fields from logs at anytime



    Write Logs for Machines, use JSON
    http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
                                                                            9
Tuesday, July 17, 2012
Logs As TEXT


   Logs As JSON



                         + Field Name
                         + No Custom Parser
                         + Type Information
                         + Schema Free

                                       10
Tuesday, July 17, 2012
Logs As TEXT
         “2011-04-01 host1 myapp: cmessage size=12MB user=me”


   Logs As JSON
                         2011-04-01 myapp.message {
                             “on_host”: ”host1”,
                             ”combined”: true,
                             “size”: 12000000,     + Field Name
                             “user”: “me”          + No Custom Parser
                                                   + Type Information
                         }                         + Schema Free

                                                                 10
Tuesday, July 17, 2012
http://fluentd.org/




                                              11
Tuesday, July 17, 2012
•       Website
                   >     http://fluentd.org/

           •       Community
                   >     http://github.com/fluent
                   >     16 committers across
                         many organizations
                   >     web, game, enterprise

           •       Mailing list
                   >     Google groups

                                                   12
Tuesday, July 17, 2012
Fluentd Architecture




Tuesday, July 17, 2012
Fluentd: Log Format

                         Application



                          Fluentd




                          Storage


                                                14
Tuesday, July 17, 2012
Fluentd: Log Format

                         Application

                                       2012-02-04 01:33:51
                                       myapp.buylog {
                          Fluentd
                                           “user”: ”me”,
                                           “path”: “/buyItem”,
                                           “price”: 150,
                                           “referer”: “/landing”
                          Storage      }


                                                                   14
Tuesday, July 17, 2012
Fluentd: Log Format

                                                       time
                         Application                    tag
                                       2012-02-04 01:33:51
                                       myapp.buylog {
                          Fluentd
                                           “user”: ”me”,
                                           “path”: “/buyItem”,
                                           “price”: 150,
                                           “referer”: “/landing”
                          Storage      }
                                                    record

                                                                   14
Tuesday, July 17, 2012
Fluentd: Plugins

                             Application



                                           filter / buffer /
                              Fluentd
                                           routing




                              Storage


                                                              15
Tuesday, July 17, 2012
Fluentd: Plugins

                                       Application



                                                     filter / buffer /
                                        Fluentd
                                                     routing




                          SaaS          Storage            Fluentd

                         Plug-in        Plug-in           Plug-in
                                                                        15
Tuesday, July 17, 2012
Fluentd: Plugins

                                       Application



                                                     filter / buffer /
                                        Fluentd
                                                     routing




                          SaaS          Storage            Fluentd

                         Plug-in        Plug-in           Plug-in
                                                                        16
Tuesday, July 17, 2012
Fluentd: Plugins

            syslogd         Scribe     Application          File Plug-in

                                                     tail
           Plug-in Plug-in
                                                      filter / buffer /
                                        Fluentd
                                                      routing




                          SaaS          Storage                Fluentd

                         Plug-in        Plug-in               Plug-in
                                                                           16
Tuesday, July 17, 2012
•       Client libraries
                   > Ruby
                   > Perl             Application         Buffering

                   > PHP
                                            HTTP / TCP / UDS
                   > Python
                   > Java              Fluentd
                   > ...




                                                                17
Tuesday, July 17, 2012
•       Client libraries
                   > Ruby
                   > Perl               Application         Buffering

                   > PHP
                                              HTTP / TCP / UDS
                   > Python
                   > Java                Fluentd
                   > ...


            Fluent.open(“myapp”)
            Fluent.event(“login”, {“user”=>38})
            #=> 2012-02-04 04:56:01 myapp.login    {“user”:38}

                                                                  17
Tuesday, July 17, 2012
Typical Log Collection by `rsync`




               Burst of traffic
               rsync consumes
               all bandwidth



                                                             18
Tuesday, July 17, 2012
Typical Log Collection by `rsync`
                     App server              App server              App server

                   Application              Application            Application


               File File File ...          File File File ...     File File File ...


                                    File
               Burst of traffic                                 High latency
               rsync consumes                                   must wait for a day
               all bandwidth                 Log server         Hard to analyze
                                                                complex text parsers

                                                                                  18
Tuesday, July 17, 2012
Log Collection using Fluentd

                         Fluentd        Fluentd          Fluentd



                                                       Realtime!
                                   Fluentd   Fluentd




                                                                   19
Tuesday, July 17, 2012
Log Collection using Fluentd

                         Fluentd        Fluentd          Fluentd



                                                       Realtime!
                                   Fluentd   Fluentd



                                              Amazon     Ready to
                         Hadoop    Mongo
                                               S3 /
                          / Hive    DB
                                               EMR       Analyze!

                                                                    19
Tuesday, July 17, 2012
Fluentd Case Study
               Ruby on Rails              Ruby on Rails          Ruby on Rails


                         Fluentd              Fluentd               Fluentd




      ✓    127 RoR servers
      ✓    100,000 msgs/sec             Fluentd    Fluentd      routing
      ✓    120Mbps at peak
      ✓    1TB/day

                                      Hadoop            Mongo     User behavior
                           PV logs     / Hive            DB       logs

                                                                                 20
Tuesday, July 17, 2012
# read logs from a file         # forward other logs to servers
      <source>                        # (load-balancing + fail-over)
        type tail                     <match **>
        path /var/log/httpd.log         type forward
        format apache                   <server>
        tag apache.access                 host 192.168.0.11
      </source>                           weight 20
                                        </server>
      # save access logs to MongoDB     <server>
      <match apache.access>               host 192.168.0.12
        type mongo                        weight 60
        host 127.0.0.1                  </server>
      </match>                        </match>




Tuesday, July 17, 2012
Comparison




Tuesday, July 17, 2012
Scribe: log collector by
                               Facebook
                         Frontend servers

                                            Aggregator nodes
                             scribe
                                                scribe
                             scribe
                                                               Hadoop
                                                                HDFS
                             scribe
                                                scribe
                             scribe

                                                                        23
Tuesday, July 17, 2012
Scribe’s Pros & Cons
                • Pros.
                         • Fast (written in C++)
                • Cons.
                         • VERY HARD to install
                            • nightmare of boost, thrift, libhdfs, etc.
                         • Unstructured Logs
                            • parsing must be required before the analysis
                         • Hard to extend
                            • recompiling C++ programs are required
                         • No longer maintained

                                                                             24
Tuesday, July 17, 2012
Fluentd vs Scribe
                • Easy to install
                         • “gem install fluentd”
                         • Stable RPM and Deb packages
                           • http://packages.treasure-data.com/
                • Easy to write plugins
                         • you can use Ruby
                • Easy plugin distribution
                         • “gem search -rd fluent-plugin”


                                                                  25
Tuesday, July 17, 2012
Flume: distributed log collector by Cloudera

           Phisical
                                 Flume Master
          Topology

                         Flume      Flume       Flume




           Logical                                      Hadoop
          Topology                                       HDFS


                                                             26
Tuesday, July 17, 2012
Flume’s Pros & Cons
                • Pros.
                         • Central master server manages all nodes
                • Cons.
                         • Difficult to understand
                            • logical topologies, phisical servers and a
                              configuration of the logical/phisical mapping
                         • Difficult to configure
                            • replicated master servers, log servers and agents
                         • Big footprint
                            • 50,000 lines of Java

                                                                                  27
Tuesday, July 17, 2012
Fluentd vs Flume
                 • Easy to understand
                         • “syslogd that understands JSON”
                 • Easy to setup
                         • “sudo fluentd --setup && fluentd”
                 • Very small footprint
                         • small engine (3,000) lines + plugins
                         • small, but battle-tested!
                 • Easy to configure


                                                                  28
Tuesday, July 17, 2012
Fluentd           Scribe           Flume
          Installation          gem/rpm/deb          make          jar/rpm/deb

                                 3000 lines of    8000 lines of   50,000 lines of
          Footprint                 Ruby             C++              Java

          Plugin                    Ruby              N/A             Java

          Plugin distribution   RubyGems.org          N/A              N/A

          Master Server              No               No               Yes

          License               Apache License   Apache License   Apache License


                                                                                 29
Tuesday, July 17, 2012
Fluentd Plugin for




Tuesday, July 17, 2012
fluent-plugin-mongo
                • Included within rpm/deb by default!
                         • http://github.com/fluent/fluent-plugin-mongo
                • #1 plugin among 50+ Fluentd plugins
                         • Logs As JSON. WHY NOT Put Them Into Mongo??
                         • http://fluentd.org/plugin/
                • Supports most of the MongoDB features
                         • Authentication
                         • ReplicaSet
                         • Capped Collection

                                                                          31
Tuesday, July 17, 2012
• MongoDB Output Plugin
                     Application                           • Maintain JSON Structure
                                                           • Reliable Buffering
                                                           • Batch Insertion
                         Fluentd       Buffering           • Handle Broken Records
                                                             • Ruby Driver #82
                             Authentication


                         MongoDB              MongoDB               MongoDB    MongoDB
                                                                    MongoDB    MongoDB
                     Single Instance                                MongoDB    MongoDB
                    (Capped or Not)     MongoDB     MongoDB
                                                                          Sharding
                                              ReplicaSet

                                                                                     32
Tuesday, July 17, 2012
• MongoDB Output Plugin
                     Application                           • Maintain JSON Structure
                                                           • Reliable Buffering
                                                           • Batch Insertion
                         Fluentd       Buffering           • Handle Broken Records
                                                             • Ruby Driver #82
                             Authentication


                         MongoDB              MongoDB               MongoDB    MongoDB
                                                                    MongoDB    MongoDB
                     Single Instance                                MongoDB    MongoDB
                    (Capped or Not)     MongoDB     MongoDB
                                                                          Sharding
                                              ReplicaSet

                                                                                     32
Tuesday, July 17, 2012
ReplicaSet
                                          (Capped Collection)
             Single Instance
           (Capped Collection)                MongoDB

                    MongoDB          MongoDB        MongoDB


                         Authentication


                    Fluentd          Buffering
                                                        • MongoDB Input Plugin
                                                           • Tailing Capped Collections


                                                                                    33
Tuesday, July 17, 2012
ReplicaSet
                                          (Capped Collection)
             Single Instance
           (Capped Collection)                MongoDB

                    MongoDB          MongoDB        MongoDB


                         Authentication


                    Fluentd          Buffering
                                                        • MongoDB Input Plugin
                                                           • Tailing Capped Collections


                                                                                    33
Tuesday, July 17, 2012
Realtime Analytics with Fluentd + MongoDB

                          App                    App                 App


                         Fluentd             Fluentd                Fluentd




                             routing   Fluentd         Fluentd


          Nagios, Zabbix, etc.
                                            Mongo          query
                                                                   Charting
                         Alert               DB
                                                                              34
Tuesday, July 17, 2012
Realtime or Batch? No, BOTH!

                          App                          App                 App


                         Fluentd                   Fluentd                Fluentd




                             routing         Fluentd         Fluentd




        Hadoop                     Amazon         Mongo          query
                                                                         Charting
         / Hive                      S3            DB
             batch                 archive         realtime                         35
Tuesday, July 17, 2012
Intro of our company’s service: Treasure Data

                          App                    App                    App


                         Fluentd             Fluentd                   Fluentd




                             routing   Fluentd         Fluentd




      Treasure                              Mongo                Hadoop-based
        Data                                 DB                  Cloud Data Warehouse
             batch                           realtime                            36
Tuesday, July 17, 2012
Exercise: Apache Logs into MongoDB




Tuesday, July 17, 2012
Log File




                                    38
Tuesday, July 17, 2012
39
Tuesday, July 17, 2012
40
Tuesday, July 17, 2012
Conclusion
                • Log Everything as JSON
                         • Machine Readability
                         • Schema Freeness
                • MongoDB fits into Fluentd’s backend perfectly
                         • Both using JSON representation




                                                                  41
Tuesday, July 17, 2012

More Related Content

What's hot

Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservicespflueras
 
MySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialMySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialSveta Smirnova
 
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...Severalnines
 
Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability Mydbops
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performanceoysteing
 
OSMC 2022 | Ignite: Observability with Grafana & Prometheus for Kafka on Kube...
OSMC 2022 | Ignite: Observability with Grafana & Prometheus for Kafka on Kube...OSMC 2022 | Ignite: Observability with Grafana & Prometheus for Kafka on Kube...
OSMC 2022 | Ignite: Observability with Grafana & Prometheus for Kafka on Kube...NETWAYS
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayDataWorks Summit
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymenthyeongchae lee
 
Patroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companionPatroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companionAlexander Kukushkin
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSTomas Vondra
 
Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기NeoClova
 
Moving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco RepositoryMoving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco RepositoryJeff Potts
 
Avro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and HadoopAvro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and HadoopJean-Paul Azar
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZENorvald Ryeng
 
MySQL Performance Tuning: Top 10 Tips
MySQL Performance Tuning: Top 10 TipsMySQL Performance Tuning: Top 10 Tips
MySQL Performance Tuning: Top 10 TipsOSSCube
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsAlluxio, Inc.
 
PostgreSQLのgitレポジトリから見える2021年の開発状況(第30回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのgitレポジトリから見える2021年の開発状況(第30回PostgreSQLアンカンファレンス@オンライン 発表資料)PostgreSQLのgitレポジトリから見える2021年の開発状況(第30回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのgitレポジトリから見える2021年の開発状況(第30回PostgreSQLアンカンファレンス@オンライン 発表資料)NTT DATA Technology & Innovation
 

What's hot (20)

Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservices
 
MySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialMySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete Tutorial
 
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
 
Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
OSMC 2022 | Ignite: Observability with Grafana & Prometheus for Kafka on Kube...
OSMC 2022 | Ignite: Observability with Grafana & Prometheus for Kafka on Kube...OSMC 2022 | Ignite: Observability with Grafana & Prometheus for Kafka on Kube...
OSMC 2022 | Ignite: Observability with Grafana & Prometheus for Kafka on Kube...
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deployment
 
Patroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companionPatroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companion
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기
 
Moving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco RepositoryMoving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco Repository
 
Avro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and HadoopAvro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and Hadoop
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
 
MySQL Performance Tuning: Top 10 Tips
MySQL Performance Tuning: Top 10 TipsMySQL Performance Tuning: Top 10 Tips
MySQL Performance Tuning: Top 10 Tips
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 
PostgreSQLのgitレポジトリから見える2021年の開発状況(第30回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのgitレポジトリから見える2021年の開発状況(第30回PostgreSQLアンカンファレンス@オンライン 発表資料)PostgreSQLのgitレポジトリから見える2021年の開発状況(第30回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのgitレポジトリから見える2021年の開発状況(第30回PostgreSQLアンカンファレンス@オンライン 発表資料)
 

Similar to Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

Fluentd: the missing log collector
Fluentd: the missing log collectorFluentd: the missing log collector
Fluentd: the missing log collectortd_kiyoto
 
Symfony2 and MongoDB
Symfony2 and MongoDBSymfony2 and MongoDB
Symfony2 and MongoDBPablo Godel
 
Symfony2 y MongoDB - deSymfony 2012
Symfony2 y MongoDB - deSymfony 2012Symfony2 y MongoDB - deSymfony 2012
Symfony2 y MongoDB - deSymfony 2012Pablo Godel
 
oEmbed in Drupal
oEmbed in DrupaloEmbed in Drupal
oEmbed in DrupalPure Sign
 
Developing RESTful Web APIs with Python, Flask and MongoDB
Developing RESTful Web APIs with Python, Flask and MongoDBDeveloping RESTful Web APIs with Python, Flask and MongoDB
Developing RESTful Web APIs with Python, Flask and MongoDBNicola Iarocci
 
Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...
Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...
Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...Swiss Big Data User Group
 
Who Pulls the Strings?
Who Pulls the Strings?Who Pulls the Strings?
Who Pulls the Strings?Ronny Trommer
 
Multilingual solutions florian loretan
Multilingual solutions florian loretanMultilingual solutions florian loretan
Multilingual solutions florian loretandrupalconf
 
You rang, M’LOD? Google Refine in the world of LOD
You rang, M’LOD? Google Refine in the world of LODYou rang, M’LOD? Google Refine in the world of LOD
You rang, M’LOD? Google Refine in the world of LODMateja Verlic
 
Presentation mongodb public sector dbsig malaysia
Presentation mongodb public sector dbsig malaysiaPresentation mongodb public sector dbsig malaysia
Presentation mongodb public sector dbsig malaysiaSyahman Mohamad
 
Building businesspost.ie using Node.js
Building businesspost.ie using Node.jsBuilding businesspost.ie using Node.js
Building businesspost.ie using Node.jsRichard Rodger
 
Games for the Masses (QCon London 2012)
Games for the Masses (QCon London 2012)Games for the Masses (QCon London 2012)
Games for the Masses (QCon London 2012)Wooga
 
ORCID Outreach Meeting dev breakout session
ORCID Outreach Meeting dev breakout sessionORCID Outreach Meeting dev breakout session
ORCID Outreach Meeting dev breakout sessionGudmundur Thorisson
 
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
EDF2012   Chris Taggart - How the biggest Open Database of Companies was builtEDF2012   Chris Taggart - How the biggest Open Database of Companies was built
EDF2012 Chris Taggart - How the biggest Open Database of Companies was builtEuropean Data Forum
 
Osm techniques and developemnt
Osm techniques and developemntOsm techniques and developemnt
Osm techniques and developemntDongpo Deng
 
MongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF MeetupMongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF MeetupScott Hernandez
 
Games for the Masses - Wie DevOps die Entwicklung von Architektur verändert (...
Games for the Masses - Wie DevOps die Entwicklung von Architektur verändert (...Games for the Masses - Wie DevOps die Entwicklung von Architektur verändert (...
Games for the Masses - Wie DevOps die Entwicklung von Architektur verändert (...Wooga
 

Similar to Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012 (20)

Fluentd: the missing log collector
Fluentd: the missing log collectorFluentd: the missing log collector
Fluentd: the missing log collector
 
Symfony2 and MongoDB
Symfony2 and MongoDBSymfony2 and MongoDB
Symfony2 and MongoDB
 
Symfony2 y MongoDB - deSymfony 2012
Symfony2 y MongoDB - deSymfony 2012Symfony2 y MongoDB - deSymfony 2012
Symfony2 y MongoDB - deSymfony 2012
 
oEmbed in Drupal
oEmbed in DrupaloEmbed in Drupal
oEmbed in Drupal
 
Developing RESTful Web APIs with Python, Flask and MongoDB
Developing RESTful Web APIs with Python, Flask and MongoDBDeveloping RESTful Web APIs with Python, Flask and MongoDB
Developing RESTful Web APIs with Python, Flask and MongoDB
 
The Heron Mapping Client
The Heron Mapping ClientThe Heron Mapping Client
The Heron Mapping Client
 
Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...
Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...
Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...
 
Who Pulls the Strings?
Who Pulls the Strings?Who Pulls the Strings?
Who Pulls the Strings?
 
Multilingual solutions florian loretan
Multilingual solutions florian loretanMultilingual solutions florian loretan
Multilingual solutions florian loretan
 
Jenkins Evolutions
Jenkins EvolutionsJenkins Evolutions
Jenkins Evolutions
 
You rang, M’LOD? Google Refine in the world of LOD
You rang, M’LOD? Google Refine in the world of LODYou rang, M’LOD? Google Refine in the world of LOD
You rang, M’LOD? Google Refine in the world of LOD
 
Presentation mongodb public sector dbsig malaysia
Presentation mongodb public sector dbsig malaysiaPresentation mongodb public sector dbsig malaysia
Presentation mongodb public sector dbsig malaysia
 
Building businesspost.ie using Node.js
Building businesspost.ie using Node.jsBuilding businesspost.ie using Node.js
Building businesspost.ie using Node.js
 
Games for the Masses (QCon London 2012)
Games for the Masses (QCon London 2012)Games for the Masses (QCon London 2012)
Games for the Masses (QCon London 2012)
 
ORCID Outreach Meeting dev breakout session
ORCID Outreach Meeting dev breakout sessionORCID Outreach Meeting dev breakout session
ORCID Outreach Meeting dev breakout session
 
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
EDF2012   Chris Taggart - How the biggest Open Database of Companies was builtEDF2012   Chris Taggart - How the biggest Open Database of Companies was built
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
 
Osm techniques and developemnt
Osm techniques and developemntOsm techniques and developemnt
Osm techniques and developemnt
 
Orientação a objetos v2
Orientação a objetos v2Orientação a objetos v2
Orientação a objetos v2
 
MongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF MeetupMongoDB Aug2010 SF Meetup
MongoDB Aug2010 SF Meetup
 
Games for the Masses - Wie DevOps die Entwicklung von Architektur verändert (...
Games for the Masses - Wie DevOps die Entwicklung von Architektur verändert (...Games for the Masses - Wie DevOps die Entwicklung von Architektur verändert (...
Games for the Masses - Wie DevOps die Entwicklung von Architektur verändert (...
 

More from Treasure Data, Inc.

GDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersGDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersTreasure Data, Inc.
 
AR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketAR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketTreasure Data, Inc.
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data PlatformsTreasure Data, Inc.
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowTreasure Data, Inc.
 
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsBrand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsTreasure Data, Inc.
 
How to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataHow to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataTreasure Data, Inc.
 
Why Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataWhy Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataTreasure Data, Inc.
 
Connecting the Customer Data Dots
Connecting the Customer Data DotsConnecting the Customer Data Dots
Connecting the Customer Data DotsTreasure Data, Inc.
 
Harnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessHarnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessTreasure Data, Inc.
 
Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017Treasure Data, Inc.
 
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)Treasure Data, Inc.
 
Introduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of HivemallIntroduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of HivemallTreasure Data, Inc.
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataTreasure Data, Inc.
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...Treasure Data, Inc.
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to RedshiftTreasure Data, Inc.
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudTreasure Data, Inc.
 

More from Treasure Data, Inc. (20)

GDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersGDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for Marketers
 
AR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketAR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and Market
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data Platforms
 
Hands On: Javascript SDK
Hands On: Javascript SDKHands On: Javascript SDK
Hands On: Javascript SDK
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
 
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsBrand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
 
How to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataHow to Power Your Customer Experience with Data
How to Power Your Customer Experience with Data
 
Why Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataWhy Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without Data
 
Connecting the Customer Data Dots
Connecting the Customer Data DotsConnecting the Customer Data Dots
Connecting the Customer Data Dots
 
Harnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessHarnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company Success
 
Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017
 
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
 
Keynote - Fluentd meetup v14
Keynote - Fluentd meetup v14Keynote - Fluentd meetup v14
Keynote - Fluentd meetup v14
 
Introduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of HivemallIntroduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of Hivemall
 
Scalable Hadoop in the cloud
Scalable Hadoop in the cloudScalable Hadoop in the cloud
Scalable Hadoop in the cloud
 
Using Embulk at Treasure Data
Using Embulk at Treasure DataUsing Embulk at Treasure Data
Using Embulk at Treasure Data
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big Data
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to Redshift
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the Cloud
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

  • 1. Fluentd ♥ MongoDB Log Everything As JSON Kazuki Ohta, CTO at Treasure Data, Inc. Tuesday, July 17, 2012
  • 2. Self-Introduction • Kazuki Ohta > twitter: @kzk_mover > github: kzk • Treasure Data, Inc. > Chief Technology Officer; Founder > Original Fluentd Author @frsyuki is another co-founder. • Open-Source Enthusiast > KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc. > Fluentd rpm/deb package manager 2 Tuesday, July 17, 2012
  • 4. Figure 1: Common Logging Purposes Analytics Error Notification Recommendation 4 Tuesday, July 17, 2012
  • 5. Figure 2: Types of Logs App Log Access Log (Apache, Rails, etc.) System Log (syslog etc.) Others 5 Tuesday, July 17, 2012
  • 6. From “Scaling Lessons learned at Dropbox” 6 Tuesday, July 17, 2012
  • 7. Fragile for format change, No type information, No field name, etc. From “Scaling Lessons learned at Dropbox” 6 Tuesday, July 17, 2012
  • 9. It's like syslogd, but uses JSON for log messages 8 Tuesday, July 17, 2012
  • 10. Logs in JSON? Why? 1. Machine-Readable > machine is goint to be a main consumer of logs 2. Schema-Free > you want to add/remove fields from logs at anytime Write Logs for Machines, use JSON http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/ 9 Tuesday, July 17, 2012
  • 11. Logs As TEXT Logs As JSON + Field Name + No Custom Parser + Type Information + Schema Free 10 Tuesday, July 17, 2012
  • 12. Logs As TEXT “2011-04-01 host1 myapp: cmessage size=12MB user=me” Logs As JSON 2011-04-01 myapp.message { “on_host”: ”host1”, ”combined”: true, “size”: 12000000, + Field Name “user”: “me” + No Custom Parser + Type Information } + Schema Free 10 Tuesday, July 17, 2012
  • 13. http://fluentd.org/ 11 Tuesday, July 17, 2012
  • 14. Website > http://fluentd.org/ • Community > http://github.com/fluent > 16 committers across many organizations > web, game, enterprise • Mailing list > Google groups 12 Tuesday, July 17, 2012
  • 16. Fluentd: Log Format Application Fluentd Storage 14 Tuesday, July 17, 2012
  • 17. Fluentd: Log Format Application 2012-02-04 01:33:51 myapp.buylog { Fluentd “user”: ”me”, “path”: “/buyItem”, “price”: 150, “referer”: “/landing” Storage } 14 Tuesday, July 17, 2012
  • 18. Fluentd: Log Format time Application tag 2012-02-04 01:33:51 myapp.buylog { Fluentd “user”: ”me”, “path”: “/buyItem”, “price”: 150, “referer”: “/landing” Storage } record 14 Tuesday, July 17, 2012
  • 19. Fluentd: Plugins Application filter / buffer / Fluentd routing Storage 15 Tuesday, July 17, 2012
  • 20. Fluentd: Plugins Application filter / buffer / Fluentd routing SaaS Storage Fluentd Plug-in Plug-in Plug-in 15 Tuesday, July 17, 2012
  • 21. Fluentd: Plugins Application filter / buffer / Fluentd routing SaaS Storage Fluentd Plug-in Plug-in Plug-in 16 Tuesday, July 17, 2012
  • 22. Fluentd: Plugins syslogd Scribe Application File Plug-in tail Plug-in Plug-in filter / buffer / Fluentd routing SaaS Storage Fluentd Plug-in Plug-in Plug-in 16 Tuesday, July 17, 2012
  • 23. Client libraries > Ruby > Perl Application Buffering > PHP HTTP / TCP / UDS > Python > Java Fluentd > ... 17 Tuesday, July 17, 2012
  • 24. Client libraries > Ruby > Perl Application Buffering > PHP HTTP / TCP / UDS > Python > Java Fluentd > ... Fluent.open(“myapp”) Fluent.event(“login”, {“user”=>38}) #=> 2012-02-04 04:56:01 myapp.login {“user”:38} 17 Tuesday, July 17, 2012
  • 25. Typical Log Collection by `rsync` Burst of traffic rsync consumes all bandwidth 18 Tuesday, July 17, 2012
  • 26. Typical Log Collection by `rsync` App server App server App server Application Application Application File File File ... File File File ... File File File ... File Burst of traffic High latency rsync consumes must wait for a day all bandwidth Log server Hard to analyze complex text parsers 18 Tuesday, July 17, 2012
  • 27. Log Collection using Fluentd Fluentd Fluentd Fluentd Realtime! Fluentd Fluentd 19 Tuesday, July 17, 2012
  • 28. Log Collection using Fluentd Fluentd Fluentd Fluentd Realtime! Fluentd Fluentd Amazon Ready to Hadoop Mongo S3 / / Hive DB EMR Analyze! 19 Tuesday, July 17, 2012
  • 29. Fluentd Case Study Ruby on Rails Ruby on Rails Ruby on Rails Fluentd Fluentd Fluentd ✓ 127 RoR servers ✓ 100,000 msgs/sec Fluentd Fluentd routing ✓ 120Mbps at peak ✓ 1TB/day Hadoop Mongo User behavior PV logs / Hive DB logs 20 Tuesday, July 17, 2012
  • 30. # read logs from a file # forward other logs to servers <source> # (load-balancing + fail-over) type tail <match **> path /var/log/httpd.log type forward format apache <server> tag apache.access host 192.168.0.11 </source> weight 20 </server> # save access logs to MongoDB <server> <match apache.access> host 192.168.0.12 type mongo weight 60 host 127.0.0.1 </server> </match> </match> Tuesday, July 17, 2012
  • 32. Scribe: log collector by Facebook Frontend servers Aggregator nodes scribe scribe scribe Hadoop HDFS scribe scribe scribe 23 Tuesday, July 17, 2012
  • 33. Scribe’s Pros & Cons • Pros. • Fast (written in C++) • Cons. • VERY HARD to install • nightmare of boost, thrift, libhdfs, etc. • Unstructured Logs • parsing must be required before the analysis • Hard to extend • recompiling C++ programs are required • No longer maintained 24 Tuesday, July 17, 2012
  • 34. Fluentd vs Scribe • Easy to install • “gem install fluentd” • Stable RPM and Deb packages • http://packages.treasure-data.com/ • Easy to write plugins • you can use Ruby • Easy plugin distribution • “gem search -rd fluent-plugin” 25 Tuesday, July 17, 2012
  • 35. Flume: distributed log collector by Cloudera Phisical Flume Master Topology Flume Flume Flume Logical Hadoop Topology HDFS 26 Tuesday, July 17, 2012
  • 36. Flume’s Pros & Cons • Pros. • Central master server manages all nodes • Cons. • Difficult to understand • logical topologies, phisical servers and a configuration of the logical/phisical mapping • Difficult to configure • replicated master servers, log servers and agents • Big footprint • 50,000 lines of Java 27 Tuesday, July 17, 2012
  • 37. Fluentd vs Flume • Easy to understand • “syslogd that understands JSON” • Easy to setup • “sudo fluentd --setup && fluentd” • Very small footprint • small engine (3,000) lines + plugins • small, but battle-tested! • Easy to configure 28 Tuesday, July 17, 2012
  • 38. Fluentd Scribe Flume Installation gem/rpm/deb make jar/rpm/deb 3000 lines of 8000 lines of 50,000 lines of Footprint Ruby C++ Java Plugin Ruby N/A Java Plugin distribution RubyGems.org N/A N/A Master Server No No Yes License Apache License Apache License Apache License 29 Tuesday, July 17, 2012
  • 40. fluent-plugin-mongo • Included within rpm/deb by default! • http://github.com/fluent/fluent-plugin-mongo • #1 plugin among 50+ Fluentd plugins • Logs As JSON. WHY NOT Put Them Into Mongo?? • http://fluentd.org/plugin/ • Supports most of the MongoDB features • Authentication • ReplicaSet • Capped Collection 31 Tuesday, July 17, 2012
  • 41. • MongoDB Output Plugin Application • Maintain JSON Structure • Reliable Buffering • Batch Insertion Fluentd Buffering • Handle Broken Records • Ruby Driver #82 Authentication MongoDB MongoDB MongoDB MongoDB MongoDB MongoDB Single Instance MongoDB MongoDB (Capped or Not) MongoDB MongoDB Sharding ReplicaSet 32 Tuesday, July 17, 2012
  • 42. • MongoDB Output Plugin Application • Maintain JSON Structure • Reliable Buffering • Batch Insertion Fluentd Buffering • Handle Broken Records • Ruby Driver #82 Authentication MongoDB MongoDB MongoDB MongoDB MongoDB MongoDB Single Instance MongoDB MongoDB (Capped or Not) MongoDB MongoDB Sharding ReplicaSet 32 Tuesday, July 17, 2012
  • 43. ReplicaSet (Capped Collection) Single Instance (Capped Collection) MongoDB MongoDB MongoDB MongoDB Authentication Fluentd Buffering • MongoDB Input Plugin • Tailing Capped Collections 33 Tuesday, July 17, 2012
  • 44. ReplicaSet (Capped Collection) Single Instance (Capped Collection) MongoDB MongoDB MongoDB MongoDB Authentication Fluentd Buffering • MongoDB Input Plugin • Tailing Capped Collections 33 Tuesday, July 17, 2012
  • 45. Realtime Analytics with Fluentd + MongoDB App App App Fluentd Fluentd Fluentd routing Fluentd Fluentd Nagios, Zabbix, etc. Mongo query Charting Alert DB 34 Tuesday, July 17, 2012
  • 46. Realtime or Batch? No, BOTH! App App App Fluentd Fluentd Fluentd routing Fluentd Fluentd Hadoop Amazon Mongo query Charting / Hive S3 DB batch archive realtime 35 Tuesday, July 17, 2012
  • 47. Intro of our company’s service: Treasure Data App App App Fluentd Fluentd Fluentd routing Fluentd Fluentd Treasure Mongo Hadoop-based Data DB Cloud Data Warehouse batch realtime 36 Tuesday, July 17, 2012
  • 48. Exercise: Apache Logs into MongoDB Tuesday, July 17, 2012
  • 49. Log File 38 Tuesday, July 17, 2012
  • 52. Conclusion • Log Everything as JSON • Machine Readability • Schema Freeness • MongoDB fits into Fluentd’s backend perfectly • Both using JSON representation 41 Tuesday, July 17, 2012