collectd & PostgreSQL

       Mark Wong
 markwkm@postgresql.org
 mark.wong@myemma.com
         PDXPUG


    November 17, 2011
My Story



     • How did I get to collectd?
     • What is collectd?
     • Hacking collectd
     • Using collectd with Postgres
     • Visualizing the data




    markwkm (PDXPUG)            collectd & PostgreSQL   November 17, 2011   2 / 43
Brief background




     • Working at a little company called Emma http://myemma.com
     • Collect performance data from production systems




What did we have?



     • A database with over 1 million database objects
           • >500,000 tables
           • >1,000,000 indexes

     • Tables alone generate 11,000,000 data points per sample




What did we try?


  Only free things:
      • Cacti http://www.cacti.net/
      • Ganglia http://ganglia.info/
      • Munin http://munin-monitoring.org/
      • Reconnoiter https://labs.omniti.com/labs/reconnoiter
      • Zenoss http://community.zenoss.org/




What doesn’t work

  Dependency on RRDtool; can’t handle more than hundreds of thousands of
  metrics (Application Buffer-Cache Management for Performance: Running the
  World’s Largest MRTG by David Plonka, Archit Gupta and Dale Carder, LISA
  2007):
      • Cacti
      • Ganglia
      • Munin
      • Reconnoiter
      • Zenoss



Reconnoiter almost worked for us

  Pros:
      • Write your own SQL queries to collect data from Postgres
      • Used Postgres instead of RRDtool for storing data
      • JavaScript based on-the-fly charting
      • Support for integrating many other health and stats collection solutions
  Cons:
      • Data collection still couldn’t keep up; maybe needed more tuning
      • Faster hardware? (using VM’s)
      • More hardware? (scale out MQ processes)


Couldn’t bring myself to try anything else



      • Hands were tied, no resources available to help move forward.
      • Can we build something lightweight?
      • Played with collectd (http://collectd.org/) while evaluating
          Reconnoiter




What is collectd?



          collectd is a daemon which collects system performance
          statistics periodically and provides mechanisms to store the
          values in a variety of ways, for example in RRD files.

  http://collectd.org/




Does this look familiar?




  Note: RRDtool is an option, not a requirement
What is special about collectd?

  From their web site:
      • it’s written in C for performance and portability
      • includes optimizations and features to handle hundreds of
        thousands of data sets
      • PostgreSQL plugin enables querying the database
      • Can collect most operating system statistics (I say “most” because I
         don’t know if anything is missing)
      • Over 90 total plugins
         http://collectd.org/wiki/index.php/Table_of_Plugins


collectd data description

      • time - when the data was collected
      • interval - frequency of data collection
      • host - server hostname
      • plugin - collectd plugin used
      • plugin_instance - additional plugin information
      • type - type of data collected for set of values
      • type_instance - unique identifier of the metric
      • dsnames - names for the values collected
      • dstypes - type of data for values collected (e.g. counter, gauge, etc.)
      • values - array of values collected

PostgreSQL plugin configuration
  Define custom queries in collectd.conf:

  LoadPlugin postgresql
  <Plugin postgresql>
     <Query magic>
         Statement "SELECT magic FROM wizard;"
         <Result>
             Type gauge
             InstancePrefix "magic"
             ValuesFrom magic
         </Result>
     </Query>
  ...

. . . per database.

...
   <Database bar>
       Interval 60
       Service "service_name"
       Query backend # predefined
       Query magic_tickets
   </Database>
</Plugin>


Full details at
http://collectd.org/wiki/index.php/Plugin:PostgreSQL

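Putting the two halves of the slides together, a minimal end-to-end fragment might look like this (illustrative only; the Query and Database names are the slides' placeholders):

```
LoadPlugin postgresql
<Plugin postgresql>
    <Query magic>
        Statement "SELECT magic FROM wizard;"
        <Result>
            Type gauge
            InstancePrefix "magic"
            ValuesFrom magic
        </Result>
    </Query>
    <Database bar>
        Interval 60
        Service "service_name"
        Query backend # predefined
        Query magic
    </Database>
</Plugin>
```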
Hurdles



  More meta data:
      • Need a way to save schema, table, and index names; can’t differentiate
        stats between tables and indexes
      • Basic support of meta data in collectd but mostly unused
      • How to store data in something other than RRDtool




Wanted: additional meta data


  Hack the PostgreSQL plugin to create meta data for:
      • database - database name (maybe not needed, same as
         plugin_instance)
      • schemaname - schema name
      • tablename - table name
      • indexname - index name
      • metric - e.g. blks_hit, blks_read, seq_scan, etc.




Another database query for collecting a table statistic



  <Query table_stats>
      Statement "SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;"
  </Query>




Identify the data



  <Result>
      Type counter
      InstancePrefix "seq_scan"
      InstancesFrom "schemaname" "relname"
      ValuesFrom "seq_scan"
  </Result>




Meta data specific parameters


  <Database postgres>
      Host "localhost"
      Query table_stats
      SchemanameColumn 0
      TablenameColumn 1
  </Database>



  Note: The database name is set by what is specified in the <Database> tag, if
  it is not retrieved by the query.

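Assembled, the table-statistics collection from the last three slides reads as one configuration fragment. Note that SchemanameColumn and TablenameColumn come from the patched fork, not stock collectd:

```
<Query table_stats>
    Statement "SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;"
    <Result>
        Type counter
        InstancePrefix "seq_scan"
        InstancesFrom "schemaname" "relname"
        ValuesFrom "seq_scan"
    </Result>
</Query>

<Database postgres>
    Host "localhost"
    Query table_stats
    SchemanameColumn 0
    TablenameColumn 1
</Database>
```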
Example data

     • time: 2011-10-20 18:04:17-05
     • interval: 300
     • host: pong.int
     • plugin: postgresql
     • plugin_instance: sandbox
     • type: counter
     • type_instance: seq_scan-pg_catalog-pg_class
     • dsnames: {value}
     • dstypes: {counter}
     • values: {249873}

Example meta data



     • database: sandbox
     • schemaname: pg_catalog
     • tablename: pg_class
     • indexname:
     • metric: seq_scan




Now what?



  Hands were tied (I think I mentioned that earlier); open sourced work to date:

      • collectd forked with patches
        https://github.com/mwongatemma/collectd
      • YAMS https://github.com/myemma/yams




Yet Another Monitoring System




Switching hats and boosting code




  Used extracurricular time on equipment donated to the Postgres project by
  Sun, IBM, and HP to continue proving out the collectd changes.




How am I going to move the data?

  Options from available write plugins; guess which I used:
      • Carbon - Graphite’s storage API to Whisper
        http://collectd.org/wiki/index.php/Plugin:Carbon
      • CSV http://collectd.org/wiki/index.php/Plugin:CSV
      • Network - Send/Receive to other collectd daemons
        http://collectd.org/wiki/index.php/Plugin:Network
      • RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD
      • RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool
      • SysLog http://collectd.org/wiki/index.php/Plugin:SysLog
      • UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock
      • Write HTTP - PUTVAL (plain text), JSON
        http://collectd.org/wiki/index.php/Plugin:Write_HTTP

Process of elimination

  If RRDtool (written in C) can’t handle massive volumes of data, a Python
  RRD-like database probably can’t either:
       • Carbon
       • CSV
       • Network
       • RRDCacheD
       • RRDtool
       • SysLog
       • UnixSock
       • Write HTTP - PUTVAL (plain text), JSON

Process of elimination


  Writing to other collectd daemons or just locally doesn’t seem useful at the
  moment:
      • CSV
      • Network
      • SysLog
      • UnixSock
      • Write HTTP - PUTVAL (plain text), JSON




Process of elimination



  Let’s try CouchDB’s RESTful JSON API!
       • CSV
       • SysLog
       • Write HTTP - PUTVAL (plain text), JSON




Random: What Write HTTP PUTVAL data looks like
  Note: Each PUTVAL is a single line but is broken up into two lines to fit onto
  the slide.

  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets
      interval=10 1251533299:197141504:175136768
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops
      interval=10 1251533299:10765:12858
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time
      interval=10 1251533299:5:140
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged
      interval=10 1251533299:4658:29899


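A PUTVAL line splits cleanly on whitespace and colons; a small Python helper (hypothetical, not part of collectd) shows the structure. Real data may also carry "U" for unknown values; this sketch assumes plain integers:

```python
def parse_putval(line):
    """Split a collectd PUTVAL line into identifier, options,
    timestamp, and values."""
    parts = line.split()
    if parts[0] != "PUTVAL":
        raise ValueError("not a PUTVAL line")
    identifier = parts[1]                  # host/plugin-instance/type-instance
    options = dict(opt.split("=", 1) for opt in parts[2:-1])
    fields = parts[-1].split(":")          # epoch:value1[:value2...]
    return identifier, options, int(fields[0]), [int(v) for v in fields[1:]]

identifier, options, ts, values = parse_putval(
    "PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets"
    " interval=10 1251533299:197141504:175136768")
```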
Random: What the Write HTTP JSON data looks like
  Note: Write HTTP packs as much data as it can into a 4KB buffer.
   [ {
       "values": [197141504, 175136768],
       "dstypes": ["counter", "counter"],
       "dsnames": ["read", "write"],
       "time": 1251533299,
       "interval": 10,
       "host": "leeloo.lan.home.verplant.org",
       "plugin": "disk",
       "plugin_instance": "sda",
       "type": "disk_octets",
       "type_instance": ""
     }, ... ]
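The JSON record pairs dsnames, dstypes, and values positionally; a short Python helper (illustrative only) flattens one record into per-data-source tuples:

```python
def flatten(record):
    """Yield one (host, plugin, type, dsname, dstype, value) tuple
    per data source in a write_http JSON record."""
    for dsname, dstype, value in zip(record["dsnames"],
                                     record["dstypes"],
                                     record["values"]):
        yield (record["host"], record["plugin"], record["type"],
               dsname, dstype, value)

record = {
    "values": [197141504, 175136768],
    "dstypes": ["counter", "counter"],
    "dsnames": ["read", "write"],
    "time": 1251533299,
    "interval": 10,
    "host": "leeloo.lan.home.verplant.org",
    "plugin": "disk",
    "plugin_instance": "sda",
    "type": "disk_octets",
    "type_instance": "",
}
rows = list(flatten(record))
```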
I didn’t know anything about CouchDB at the time



     • Query interface not really suited for retrieving data to visualize
     • Insert performance not suited for millions of metrics of data over short
         intervals (can insert same data into Postgres several orders of
         magnitude faster)




Now where am I going to put the data?



  Hoping that using the Write HTTP is still a good choice:
      • Write an ETL
                •   Table partitioning logic; creation of partition tables
                •   Transform JSON data into INSERT statements
      • Use Postgres




Database design
                Table "collectd.value_list"
       Column      |           Type           | Modifiers
  -----------------+--------------------------+-----------
   time            | timestamp with time zone | not null
   interval        | integer                  | not null
   host            | character varying(64)    | not null
   plugin          | character varying(64)    | not null
   plugin_instance | character varying(64)    |
   type            | character varying(64)    | not null
   type_instance   | character varying(64)    |
   dsnames         | character varying(512)[] | not null
   dstypes         | character varying(8)[]   | not null
   values          | numeric[]                | not null
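The table shown above corresponds to DDL along these lines (a sketch; "interval" and "values" need quoting because they collide with SQL keywords):

```sql
CREATE SCHEMA collectd;

CREATE TABLE collectd.value_list (
    "time"          timestamp with time zone NOT NULL,
    "interval"      integer NOT NULL,
    host            character varying(64) NOT NULL,
    plugin          character varying(64) NOT NULL,
    plugin_instance character varying(64),
    type            character varying(64) NOT NULL,
    type_instance   character varying(64),
    dsnames         character varying(512)[] NOT NULL,
    dstypes         character varying(8)[]   NOT NULL,
    "values"        numeric[] NOT NULL
);
```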
Take advantage of partitioning




  At least table inheritance in Postgres’ case; partition data by plugin




Child table
               Table "collectd.vl_postgresql"
       Column      |           Type            | Modifiers
  -----------------+--------------------------+-----------
   ...
   database        | character varying(64)     | not null
   schemaname      | character varying(64)     |
   tablename       | character varying(64)     |
   indexname       | character varying(64)     |
   metric          | character varying(64)     | not null
  Check constraints:
       "vl_postgresql_plugin_check" CHECK (plugin::text =
                                            'postgresql'::text)
  Inherits: value_list
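A sketch of the matching child-table DDL, using the table-inheritance idiom Postgres offered at the time:

```sql
-- Inherits time, interval, host, plugin, ... from collectd.value_list;
-- adds the PostgreSQL-specific meta data columns.
CREATE TABLE collectd.vl_postgresql (
    database   character varying(64) NOT NULL,
    schemaname character varying(64),
    tablename  character varying(64),
    indexname  character varying(64),
    metric     character varying(64) NOT NULL,
    CONSTRAINT vl_postgresql_plugin_check
        CHECK (plugin::text = 'postgresql'::text)
) INHERITS (collectd.value_list);
```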
How much partitioning?


  Lots of straightforward options:
      • Date
      • Database
      • Schema
      • Table
      • Index
      • Metric




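These dimensions can be combined by nesting inheritance; for example, splitting the postgresql plugin's data further by date (an illustrative sketch, not from the slides):

```sql
-- One month of the postgresql plugin's data; the CHECK constraint lets
-- constraint exclusion skip this table for queries outside November 2011.
CREATE TABLE collectd.vl_postgresql_201111 (
    CHECK ("time" >= '2011-11-01' AND "time" < '2011-12-01')
) INHERITS (collectd.vl_postgresql);
```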
Back to the ETL


  Parameters set for fastest path to working prototype:
       • Keep using HTTP POST (Write HTTP plugin) for the HTTP protocol
         and JSON
       • Use Python for its built-in HTTP server and JSON parsing (Emma is
         primarily a Python shop)
      • Use SQLAlchemy/psycopg2




Back again to the ETL

   Python didn’t perform; the combination of JSON parsing, data transformation,
   and INSERT performance was still several orders of magnitude below
   acceptable levels. The replacement stack:
       • redis to queue data to transform
       • lighttpd for the HTTP interface
       • fastcgi C program to push things to redis
       • multi-threaded C program using libpq for Postgres API
                •   pop data out of redis
                •   table partitioning creation logic
                •   transform JSON data into INSERT statements


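Whatever the implementation language, the transform stage maps each JSON record onto the child table for its plugin and emits a parameterized INSERT. A simplified Python sketch (the production version was C, and this helper is hypothetical):

```python
import json

def to_insert(record):
    """Build a parameterized INSERT routing a write_http record to the
    child table for its plugin (e.g. collectd.vl_postgresql)."""
    table = "collectd.vl_%s" % record["plugin"]   # partition by plugin
    cols = ['"time"', '"interval"', "host", "plugin", "plugin_instance",
            "type", "type_instance", "dsnames", "dstypes", '"values"']
    placeholders = ", ".join(["%s"] * (len(cols) - 1))
    sql = "INSERT INTO %s (%s) VALUES (to_timestamp(%%s), %s)" % (
        table, ", ".join(cols), placeholders)
    params = (record["time"], record["interval"], record["host"],
              record["plugin"], record["plugin_instance"], record["type"],
              record["type_instance"], record["dsnames"], record["dstypes"],
              record["values"])
    return sql, params

payload = json.loads("""[{"values": [249873], "dstypes": ["counter"],
    "dsnames": ["value"], "time": 1319151857, "interval": 300,
    "host": "pong.int", "plugin": "postgresql",
    "plugin_instance": "sandbox", "type": "counter",
    "type_instance": "seq_scan-pg_catalog-pg_class"}]""")
sql, params = to_insert(payload[0])
```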
Success?




      • Table statistics for 1 million tables are collected in approximately
        12 minutes.
     • Is that acceptable?
     • Can we go faster?




If you don’t have millions of data points


  Easier ways to visualize the data:
       • RRDtool
       • RRDtool compatible front-ends
         http://collectd.org/wiki/index.php/List_of_front-ends
       • Graphite with the Carbon and Whisper combo
         http://graphite.wikidot.com/
       • Reconnoiter




         __      __
        /  \~~~/  \  . o O ( Thank you! )
  ,----(      oo    )
 /      \__      __/
/|          (\  |(
^ \   /___\ /\ |
   |__|    |__|-"




Acknowledgements

  Hayley Jane Wakenshaw

         __      __
        /  \~~~/  \
  ,----(      oo    )
 /      \__      __/
/|          (\  |(
^ \   /___\ /\ |
   |__|    |__|-"



License



  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. To view a copy of this license, (a) visit
  http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a
  letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco,
  California, 94105, USA.





Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

collectd & PostgreSQL

  • 1. collectd & PostgreSQL Mark Wong markwkm@postgresql.org mark.wong@myemma.com PDXPUG November 17, 2011
  • 2. My Story • How did I get to collectd? • What is collectd • Hacking collectd • Using collectd with Postgres • Visualizing the data markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 2 / 43
  • 3. Brief background • Working at a little company called Emma http://myemma.com • Collect performance data from production systems markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 3 / 43
  • 4. What did we have? • A database with over 1 million database objects • >500,000 tables • >1,000,000 indexes • Tables alone generate 11,000,000 data points per sample markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 4 / 43
  • 5. What did we try? Only free things: • Cacti http://www.cacti.net/ • Ganglia http://ganglia.info/ • Munin http://munin-monitoring.org/ • Reconnoiter https://labs.omniti.com/labs/reconnoiter • Zenoss http://community.zenoss.org/ markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 5 / 43
  • 6. What doesn’t work Dependency on RRDtool; can’t handle more than hundreds of thousands of metrics (Application Buffer-Cache Management for Performance: Running the World’s Largest MRTG by David Plonka, Archit Gupta and Dale Carder, LISA 2007): • Cacti • Ganglia • Munin • Reconnoiter • Zenoss markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 6 / 43
  • 7. Reconnoiter almost worked for us Pros: • Write your own SQL queries to collect data from Postgres • Used Postgres instead of RRDtool for storing data • JavaScript-based on-the-fly charting • Support for integrating many other health and stats collection solutions Cons: • Data collection still couldn’t keep up; maybe needed more tuning • Faster hardware? (using VM’s) • More hardware? (scale out MQ processes) markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 7 / 43
  • 8. Couldn’t bring myself to try anything else • Hands were tied; no resources were available to help move forward. • Can we build something lightweight? • Played with collectd (http://collectd.org/) while evaluating Reconnoiter markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 8 / 43
  • 9. What is collectd? collectd is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways, for example in RRD files. http://collectd.org/ markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 9 / 43
  • 10. Does this look familiar? Note: RRDtool is an option, not a requirement markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 10 / 43
  • 11. What is special about collectd? From their web site: • it’s written in C for performance and portability • includes optimizations and features to handle hundreds of thousands of data sets • PostgreSQL plugin enables querying the database • Can collect most operating systems statistics (I say “most” because I don’t know if anything is missing) • Over 90 total plugins http://collectd.org/wiki/index.php/Table_of_Plugins markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 11 / 43
  • 12. collectd data description • time - when the data was collected • interval - frequency of data collection • host - server hostname • plugin - collectd plugin used • plugin instance - additional plugin information • type - type of data collected for set of values • type instance - unique identifier of the metric • dsnames - names for the values collected • dstypes - type of data for values collected (e.g. counter, gauge, etc.) • values - array of values collected markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 12 / 43
  • 13. PostgreSQL plugin configuration Define custom queries in collectd.conf: LoadPlugin postgresql <Plugin postgresql> <Query magic> Statement "SELECT magic FROM wizard;" <Result> Type gauge InstancePrefix "magic" ValuesFrom magic </Result> </Query> ... markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 13 / 43
  • 14. . . . per database. ... <Database bar> Interval 60 Service "service_name" Query backend # predefined Query magic_tickets </Database> </Plugin> Full details at http://collectd.org/wiki/index.php/Plugin:PostgreSQL markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 14 / 43
  • 15. Hurdles More meta data: • Need a way to save schema, table, and index names; can’t differentiate stats between tables and indexes • Basic support of meta data in collectd but mostly unused • How to store data in something other than RRDtool markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 15 / 43
  • 16. Wanted: additional meta data Hack the PostgreSQL plugin to create meta data for: • database - database name (maybe not needed, same as plugin instance) • schemaname - schema name • tablename - table name • indexname - index name • metric - e.g. blks_hit, blks_read, seq_scan, etc. markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 16 / 43
  • 17. Another database query for collecting a table statistic <Query table_stats> SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables; </Query> markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 17 / 43
  • 18. Identify the data <Result> Type counter InstancePrefix "seq_scan" InstancesFrom "schemaname" "relname" ValuesFrom "seq_scan" </Result> markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 18 / 43
  • 19. Meta data specific parameters <Database postgres> Host "localhost" Query table_stats SchemanameColumn 0 TablenameColumn 1 </Database> Note: The database name is set by what is specified in the <Database> tag, if it is not retrieved by the query. markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 19 / 43
  • 20. Example data • time: 2011-10-20 18:04:17-05 • interval: 300 • host: pong.int • plugin: postgresql • plugin instance: sandbox • type: counter • type instance: seq_scan-pg_catalog-pg_class • dsnames: {value} • dstypes: {counter} • values: {249873} markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 20 / 43
  • 21. Example meta data • database: sandbox • schemaname: pg_catalog • tablename: pg_class • indexname: • metric: seq_scan markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 21 / 43
  • 22. Now what? Hands were tied (I think I mentioned that earlier); open-sourced the work to date: • collectd forked with patches https://github.com/mwongatemma/collectd • YAMS https://github.com/myemma/yams markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 22 / 43
  • 23. Yet Another Monitoring System markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 23 / 43
  • 24. Switching hats and boosting code Using extracurricular time and equipment donated to Postgres by Sun, IBM, and HP to continue proving out the collectd changes. markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 24 / 43
  • 25. How am I going to move the data? Options from available write plugins; guess which I used: • Carbon - Graphite’s storage API to Whisper http://collectd.org/wiki/index.php/Plugin:Carbon • CSV http://collectd.org/wiki/index.php/Plugin:CSV • Network - Send/Receive to other collectd daemons http://collectd.org/wiki/index.php/Plugin:Network • RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD • RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool • SysLog http://collectd.org/wiki/index.php/Plugin:SysLog • UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock • Write HTTP - PUTVAL (plain text), JSON http://collectd.org/wiki/index.php/Plugin:Write_HTTP markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 25 / 43
  • 26. Process of elimination If RRDtool (written in C) can’t handle massive volumes of data, a Python RRD-like database probably can’t either: • Carbon • CSV • Network • RRDCacheD • RRDtool • SysLog • UnixSock • Write HTTP - PUTVAL (plain text), JSON markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 26 / 43
  • 27. Process of elimination Writing to other collectd daemons or just locally doesn’t seem useful at the moment: • CSV • Network • SysLog • UnixSock • Write HTTP - PUTVAL (plain text), JSON markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 27 / 43
  • 28. Process of elimination Let’s try CouchDB’s RESTful JSON API! • CSV • SysLog • Write HTTP - PUTVAL (plain text), JSON markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 28 / 43
  • 29. Random: What Write HTTP PUTVAL data looks like Note: Each PUTVAL is a single line but is broken up into two lines to fit onto the slide. PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets interval=10 1251533299:197141504:175136768 PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops interval=10 1251533299:10765:12858 PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time interval=10 1251533299:5:140 PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged interval=10 1251533299:4658:29899 markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 29 / 43
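The PUTVAL lines above follow a fixed layout (identifier, interval option, time:value list), so parsing them is mechanical. A minimal Python sketch of such a parser, with the field layout taken from the examples above; the function name and the "U"-for-unknown handling are illustrative assumptions, not code from the deck:

```python
def parse_putval(line):
    """Split one PUTVAL line into its parts.  Layout, per the examples:
        PUTVAL host/plugin[-instance]/type[-instance] interval=N time:value[:value...]
    A value of "U" (unknown, in collectd's plain-text protocol) maps to None."""
    _putval, identifier, option, valuelist = line.split()
    host, plugin, type_ = identifier.split("/")
    interval = int(option.split("=", 1)[1])
    time, *values = valuelist.split(":")
    return {
        "host": host,
        "plugin": plugin,          # includes the plugin instance, e.g. "disk-sda"
        "type": type_,
        "interval": interval,
        "time": int(time),
        "values": [None if v == "U" else float(v) for v in values],
    }
```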
  • 30. Random: What the Write HTTP JSON data looks like Note: Write HTTP packs as much data as it can into a 4KB buffer. [ { "values": [197141504, 175136768], "dstypes": ["counter", "counter"], "dsnames": ["read", "write"], "time": 1251533299, "interval": 10, "host": "leeloo.lan.home.verplant.org", "plugin": "disk", "plugin_instance": "sda", "type": "disk_octets", "type_instance": "" }, ... ] markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 30 / 43
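Since each JSON record carries parallel dsnames/dstypes/values arrays, a common first step when consuming the payload is to zip them into one row per data source. A Python sketch assuming only the payload shape shown above; the flatten name and the row layout are illustrative:

```python
import json

def flatten(payload):
    """Expand one Write HTTP JSON payload (a string like the example
    above) into one row per data source by zipping dsnames/dstypes/values."""
    rows = []
    for rec in json.loads(payload):
        for dsname, dstype, value in zip(rec["dsnames"], rec["dstypes"],
                                         rec["values"]):
            rows.append({
                "time": rec["time"],
                "interval": rec["interval"],
                "host": rec["host"],
                "plugin": rec["plugin"],
                "plugin_instance": rec["plugin_instance"],
                "type": rec["type"],
                "type_instance": rec["type_instance"],
                "dsname": dsname,
                "dstype": dstype,
                "value": value,
            })
    return rows
```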
  • 31. I didn’t know anything about CouchDB at the time • Query interface not really suited for retrieving data to visualize • Insert performance not suited for millions of metrics over short intervals (can insert the same data into Postgres several orders of magnitude faster) markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 31 / 43
  • 32. Now where am I going to put the data? Hoping that using the Write HTTP is still a good choice: • Write an ETL • Table partitioning logic; creation of partition tables • Transform JSON data into INSERT statements • Use Postgres markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 32 / 43
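The "transform JSON data into INSERT statements" step amounts to mapping each decoded record onto the columns of the target table. A rough Python sketch of that transform, assuming psycopg2-style %s placeholders and a server-side to_timestamp() conversion for the epoch time; the column list mirrors the collectd.value_list table described in the deck, and partition routing is left out:

```python
def to_insert(rec):
    """Map one decoded Write HTTP record (a dict shaped like the JSON
    examples in the deck) onto the collectd.value_list columns.
    Returns (sql, params) for a psycopg2-style execute(); routing to
    child partition tables is deliberately left out of this sketch."""
    sql = ('INSERT INTO collectd.value_list '
           '(time, "interval", host, plugin, plugin_instance, '
           'type, type_instance, dsnames, dstypes, "values") '
           'VALUES (to_timestamp(%s), %s, %s, %s, %s, %s, %s, %s, %s, %s)')
    # "interval" and "values" are quoted because they collide with SQL
    # keywords; the epoch time is converted to a timestamp server-side.
    params = (rec["time"], rec["interval"], rec["host"], rec["plugin"],
              rec["plugin_instance"], rec["type"], rec["type_instance"],
              rec["dsnames"], rec["dstypes"], rec["values"])
    return sql, params
```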
  • 33. Database design Table "collectd.value_list" Column | Type | Modifiers -----------------+--------------------------+----------- time | timestamp with time zone | not null interval | integer | not null host | character varying(64) | not null plugin | character varying(64) | not null plugin_instance | character varying(64) | type | character varying(64) | not null type_instance | character varying(64) | dsnames | character varying(512)[] | not null dstypes | character varying(8)[] | not null values | numeric[] | not null markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 33 / 43
  • 34. Take advantage of partitioning At least table inheritance in Postgres’ case; partition data by plugin markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 34 / 43
  • 35. Child table Table "collectd.vl_postgresql" Column | Type | Modifiers -----------------+--------------------------+----------- ... database | character varying(64) | not null schemaname | character varying(64) | tablename | character varying(64) | indexname | character varying(64) | metric | character varying(64) | not null Check constraints: "vl_postgresql_plugin_check" CHECK (plugin::text = ’postgresql’::text) Inherits: value_list markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 35 / 43
  • 36. How much partitioning? Lots of straightforward options: • Date • Database • Schema • Table • Index • Metric markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 36 / 43
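Whichever combination is chosen, each partition in the inheritance scheme needs a child table with a matching CHECK constraint. A small Python sketch that generates per-plugin child-table DDL in the style of the vl_postgresql example; the naming convention and the single-constraint layout are assumptions, and plugin-specific extra columns are omitted:

```python
def child_table_ddl(plugin):
    """Generate DDL for a per-plugin child table that inherits from
    collectd.value_list, in the style of the vl_postgresql example;
    plugin-specific meta data columns are omitted from this sketch."""
    name = f"vl_{plugin}"
    return (f"CREATE TABLE collectd.{name} (\n"
            f"    CHECK (plugin = '{plugin}')\n"
            f") INHERITS (collectd.value_list);")
```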
  • 37. Back to the ETL Parameters set for fastest path to a working prototype: • Keep using HTTP POST (Write HTTP plugin) for the HTTP protocol and JSON • Use Python for its built-in HTTP server and JSON parsing (Emma is primarily a Python shop) • Use SQLAlchemy/psycopg2 markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 37 / 43
  • 38. Back again to the ETL Python didn’t perform; combination of JSON parsing, data transformation, and INSERT performance still several orders of magnitude below acceptable levels: • redis to queue data to transform • lighttpd for the HTTP interface • fastcgi C program to push things to redis • multi-threaded C program using libpq for Postgres API • pop data out of redis • table partitioning creation logic • transform JSON data into INSERT statements markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 38 / 43
  • 39. Success? • Table statistics for 1 million tables collected in approximately 12 minutes. • Is that acceptable? • Can we go faster? markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 39 / 43
  • 40. If you don’t have millions of data Easier ways to visualize the data: • RRDtool • RRDtool compatible front-ends http://collectd.org/wiki/index.php/List_of_front-ends • Graphite with the Carbon and Whisper combo http://graphite.wikidot.com/ • Reconnoiter markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 40 / 43
  • 41. Thank you! (ASCII-art cow) markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 41 / 43
  • 42. Acknowledgements ASCII-art cow by Hayley Jane Wakenshaw markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 42 / 43
  • 43. License This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA. markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 43 / 43