"PuppetDB: New Adventures in Higher-Order Automation" by
Deepak Giridharagopal, Director of Engineering, Puppet Labs.
Presentation Overview: PuppetDB gives users fast, robust, centralized storage for Puppet-produced data. The 1.0 version landed at PuppetConf 2012, and now we're one year older and one year wiser. It's been deployed in thousands of sites, people have written libraries and tools on top of it, and there's been plenty of activity in the past year. We've tightly integrated it into Puppet Enterprise. We've added new features like report storage, event querying, import/export, better HTTP endpoints, and unified querying. And though we've added features, we've also made PuppetDB faster and reduced its disk usage. This talk will cover what's happened in the PuppetDB world between PuppetConf 2012 and now. We'll go into the new features, talk about performance and correctness, and discuss lessons learned.
Speaker Bio: Deepak is Director of Engineering at Puppet Labs, one of the authors of PuppetDB, and a many-times-over PuppetConf veteran. Prior to joining Puppet Labs, he was Principal Engineer at Dell/MessageOne, using Puppet to manage thousands of production systems.
96. Tons and tons of high-quality
libraries
Web servers, concurrency
frameworks, databases, fast
parsing/lexing, clustering,
debugging, profiling, etc.
Friday, August 23, 13
97. Can ship an uberjar,
makes deployment
straightforward with
few moving pieces
108. AST-based API lets
users write their own
languages
ah, you’ve got to love
open source!
109. (Package[httpd] and country=fr)
or country=us
Package["mysql-server"]
and architecture=amd64
Erik Dalén, Spotify
https://github.com/dalen/puppet-puppetdbquery
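As a rough illustration of what "writing your own language" on top of an AST-based API can look like, here is a toy front end that turns terms such as `Package["mysql-server"] and architecture=amd64` into operator-first nested lists, the general shape PuppetDB queries take. This is not how puppet-puppetdbquery is implemented. The grammar (no parentheses, `or` binding loosest), the function names, and the exact field names used for the resource term are all assumptions made for the sketch.

```python
import json
import re

# Matches a resource term such as Package[httpd] or Package["mysql-server"].
RESOURCE = re.compile(r'^(\w+)\[(.+)\]$')


def parse_term(term):
    """Turn one term into an operator-first AST fragment."""
    term = term.strip()
    match = RESOURCE.match(term)
    if match:
        type_name = match.group(1)
        title = match.group(2).strip('"\'')
        # Assumed shape: nodes whose catalog contains the resource.
        # Field names ("name", "certname") vary between API versions.
        return ["in", "name",
                ["extract", "certname",
                 ["select-resources",
                  ["and", ["=", "type", type_name], ["=", "title", title]]]]]
    name, value = term.split("=", 1)
    return ["=", ["fact", name.strip()], value.strip()]


def parse(query):
    """Parse 'a and b or c' into nested lists; 'and' binds tighter than 'or'."""
    alternatives = []
    for alternative in query.split(" or "):
        terms = [parse_term(t) for t in alternative.split(" and ")]
        alternatives.append(terms[0] if len(terms) == 1 else ["and", *terms])
    return alternatives[0] if len(alternatives) == 1 else ["or", *alternatives]


if __name__ == "__main__":
    print(json.dumps(parse('Package["mysql-server"] and architecture=amd64')))
```

The point of the sketch is only that the front-end syntax is free to change while the tree it emits stays plain data that can be serialized as JSON.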
126. Thousands of deployments,
Hundreds of threads per install,
Zero deadlocks,
Zero bugs involving mutable state
companion Ruby code has
~10x the defect rate
127. All with a pretty tiny codebase
145. 20% faster storage
Improved memoization and caching,
eliminated double-serialization,
nuked superfluous indexes
146. Much faster terminus
Better caching and data
structures. For a catalog with
10k resources, serialization
time drops from ~80s to ~6s.
155. Subqueries
You can now correlate data from
resource queries with fact
queries with node queries.
"Give me the IP address of all machines with
the Nginx service configured"
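The quoted request can be sketched as a fact query that contains a resource subquery. The snippet below builds that query as nested lists and prints it as JSON. It assumes the Nginx service is modeled as a `Service[nginx]` resource and that the IP address lives in a fact named `ipaddress`; the helper name is made up for illustration, and the exact operators available depend on the PuppetDB API version in use.

```python
import json


def fact_for_nodes_with_resource(fact_name, resource_type, resource_title):
    """Build a fact query restricted to nodes that have a given resource."""
    # Inner query: resources of the given type and title.
    resources = ["select-resources",
                 ["and",
                  ["=", "type", resource_type],
                  ["=", "title", resource_title]]]
    # Outer query: the named fact, only for certnames found by the subquery.
    return ["and",
            ["=", "name", fact_name],
            ["in", "certname", ["extract", "certname", resources]]]


# Assumed modeling of "the Nginx service configured".
query = fact_for_nodes_with_resource("ipaddress", "Service", "nginx")

if __name__ == "__main__":
    print(json.dumps(query, indent=2))
```

The resulting JSON would be URL-encoded and passed as the query parameter of a facts request.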
156. Report storage
- Comes with a report processing plugin
- Store report-level metadata
- Can do queries on events that span reports
- Basis for PE's Event Inspector
158. Streaming queries
Stream results to clients on the
fly, as they come in from the
database.
Massively lower latency for first
response!
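The idea can be sketched with a generator that emits a JSON array piece by piece, so the first bytes can be sent before the last row has been read. This is only an illustration of the technique, not PuppetDB's implementation; `fake_cursor` is a hypothetical stand-in for a database cursor.

```python
import json


def stream_json_array(rows):
    """Yield a JSON array in chunks, one element per incoming row."""
    yield "["
    first = True
    for row in rows:
        if not first:
            yield ","
        yield json.dumps(row)
        first = False
    yield "]"


def fake_cursor(count=3):
    """Hypothetical stand-in for rows arriving lazily from a database."""
    for i in range(count):
        yield {"certname": "node%d.example.com" % i}


if __name__ == "__main__":
    for chunk in stream_json_array(fake_cursor()):
        # A real server would write each chunk to the client socket here.
        print(chunk, end="")
    print()
```

Because nothing is buffered beyond the current row, memory use stays flat and the client sees the opening bracket and first element as soon as the database produces them.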
164. We will be developing tools to replicate
data from one PuppetDB daemon to
another. This will help with HA and DR.
[Diagram labels: PuppetDB, Diff & Mirror, PuppetDB]
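The slides describe the mirroring tool only as planned work, so the following is a purely hypothetical sketch of what an out-of-band "diff and mirror" pass could look like. Two in-memory dictionaries keyed by certname stand in for the two PuppetDB daemons; the function names and the use of content digests are assumptions.

```python
import hashlib
import json


def digest(document):
    """Stable content hash of a JSON-serializable document."""
    encoded = json.dumps(document, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()


def diff(source, target):
    """Certnames whose data is missing from, or different in, the target."""
    return sorted(
        name for name, document in source.items()
        if name not in target or digest(target[name]) != digest(document)
    )


def mirror(source, target):
    """Copy every differing document from source to target; return what changed."""
    changed = diff(source, target)
    for name in changed:
        target[name] = source[name]
    return changed


if __name__ == "__main__":
    primary = {"web1": {"os": "Debian"}, "db1": {"os": "CentOS"}}
    replica = {"web1": {"os": "Debian"}}
    print(mirror(primary, replica))  # names copied on this pass
```

Running the pass repeatedly converges the target on the source, which is the eventual-consistency property an out-of-band tool relies on.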
165. By initially developing an out-of-band
mirroring tool, we can create more
interesting replication topologies:
[Diagram labels: PuppetDB, Diff & Mirror, PuppetDB, Diff & Mirror]
166. We can also later optimize the process to
lower latency, but preserve eventual
consistency:
[Diagram labels: PuppetDB, Diff & Mirror, PuppetDB, Direct MQ connection]
167. More flexible routing is coming, allowing
for soft failures and read/write splits:
[Diagram labels: Puppetmaster; PuppetDB; PuppetDB; Replication; Catalogs, Facts, Reports; Collection queries; Log error and continue]
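The "log error and continue" behavior in the diagram can be sketched as a soft-failure submit loop: a failing destination is logged and skipped rather than aborting the whole run. The routing feature is described as upcoming, so every name below is hypothetical, and plain callables stand in for real PuppetDB connections.

```python
import logging

logger = logging.getLogger("routing_sketch")


def submit_to_all(command, backends):
    """Send a command to every backend; log failures and keep going.

    backends maps a name to a callable that raises on failure.
    Returns the names of the backends that accepted the command.
    """
    accepted = []
    for name, send in backends.items():
        try:
            send(command)
            accepted.append(name)
        except Exception as exc:
            # Soft failure: record the problem, then continue with the rest.
            logger.error("submit to %s failed: %s", name, exc)
    return accepted


if __name__ == "__main__":
    logging.basicConfig(level=logging.ERROR)
    received = []

    def unreachable(command):
        raise ConnectionError("backend is down")

    backends = {"primary": received.append, "secondary": unreachable}
    print(submit_to_all({"command": "replace facts"}, backends))
```

A read/write split would follow the same pattern, with one table of destinations for catalog, fact, and report submissions and another for collection queries.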