4. Data!
Puppet generates a lot of it, in many
delicious flavors!
Persisted, ephemeral, machine local,
centralized, meticulously structured, totally
free-form, human readable, machine
optimized...
Monday, May 21, 12
5. Catalogs
“The Graph”
Containment edges, dependency edges,
classes, tags, resources, resource
parameters, metadata
6. file {"/tmp/foo": content => "This is a test"}
target: &id063 !ruby/object:Puppet::Resource
  catalog: *id001
  exported: false
  file: /etc/puppetlabs/puppet/manifests/site.pp
  line: 44
  parameters:
    !ruby/sym content: This is a test
    !ruby/sym backup: main
  reference: "File[/tmp/foo]"
  tags:
    - file
    - node
    - default
    - class
  title: /tmp/foo
  type: File
15. “There's a war out there, old
friend. A world war. And it's
not about who's got the most
bullets. It's about who controls
the information. What we see
and hear, how we work, what
we think... it's all about the
information!”
-- Sneakers
16. Storeconfigs
Centralized storage of the configuration of
all your nodes.
All resources, all parameters, all classes, all
tags, all stages...
Enables use of exported resources
17. class exporter {
  # Export a file resource carrying this node's IP address
  @@file {
    "/var/lib/puppet/nodes/$fqdn":
      content => "$ipaddress\n",
      tag     => "ip"
  }
}
node "export1.daysofwonder.com" {
  include exporter
}
node "export2.daysofwonder.com" {
  include exporter
}
node "collector.daysofwonder.com" {
  # Collect every exported file resource tagged "ip"
  File <<| tag == "ip" |>>
}
http://www.masterzen.fr/2009/03/08/all-about-puppet-storeconfigs/
18. public key distribution
monitoring checks
clustered services
master/slave replication
load balancers
shared filesystems
firewall rules
...
19. Query
Interrogation, investigation, correlation
Use Puppet-generated data in scripts or for
integration with other tools
21. Volume
Every node, on every puppet run, generates
data
We have customers generating over 750 GB of
data a day. Even storing a small subset of
that much information adds up...
23. Slow = :(
When data storage is slow, it makes baby
Deepak cry!
Slows down catalog compilation,
More quickly saturates a Puppetmaster,
Thrashes disk,
Bad news!
24. API
Current APIs are limited
Hard to get at the data, and performance
concerns discourage use.
We need better ways of searching, filtering,
and correlating data.
25. Paradox
Seemingly contradictory goals
We want to store as much data as we can,
and allow for better querying, but without
slowing stuff down or reducing reliability.
26. We need
An information clearinghouse
Something that evolves the Puppet Data
Library. A scalable, safe place to store the
information Puppet collects and generates.
This is a hard problem!
27. PuppetDB
Definitely Better!
30. PuppetDB is
Fast storage of current catalogs and current
facts,
100% compatible with storeconfigs and
inventory service,
REST APIs for resource, fact, and node
retrieval,
...and other things, even!
31. science
&
secret alien
technology!
32. [Architecture diagram. Labels: Puppetmaster (Compiler); Message Queue carrying "new catalog", "new facts", and "delete node" commands; Command Handler (Parsing, Transformation, Validation); storage for Storeconfigs, Catalogs, and Facts; REST Query handling for Puppet (SCF), "inventory query", and "interactive query"; Enterprise Domain; Console objects; CLI & Other Tools.]
75. Reliable!
We work very hard to persist everything we
accept
Acknowledgements with UUIDs,
Checksums,
Queueing,
Automatic retry and reconnect,
and the Dead Letter Office if all else fails!
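
The reliability ingredients listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PuppetDB's actual code: the names make_command and process_queue, the MAX_ATTEMPTS limit, the in-memory deque standing in for the message queue, and the choice of SHA-1 for the checksum are all hypothetical.

```python
import hashlib
import json
import uuid
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit, not a real PuppetDB setting


def make_command(name, payload):
    """Wrap a payload with a UUID (used as the acknowledgement) and a checksum."""
    body = json.dumps(payload, sort_keys=True)
    return {
        "id": str(uuid.uuid4()),
        "command": name,
        "payload": body,
        "checksum": hashlib.sha1(body.encode("utf-8")).hexdigest(),
        "attempts": 0,
    }


def process_queue(queue, handler, dead_letters):
    """Drain the queue; retry failed commands, park hopeless ones.

    Returns the list of command UUIDs that were successfully handled.
    """
    acked = []
    while queue:
        cmd = queue.popleft()
        cmd["attempts"] += 1
        try:
            # Verify the payload was not corrupted in transit.
            actual = hashlib.sha1(cmd["payload"].encode("utf-8")).hexdigest()
            if actual != cmd["checksum"]:
                raise ValueError("checksum mismatch")
            handler(cmd)
            acked.append(cmd["id"])
        except Exception:
            if cmd["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(cmd)  # the "Dead Letter Office"
            else:
                queue.append(cmd)  # re-queue for another attempt
    return acked
```

A caller would build commands with make_command("new facts", {...}), put them on the deque, and inspect dead_letters afterwards for anything that never succeeded.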
76. APIs!
We don’t cheat
Anything Puppet does with PuppetDB, you
can do too
Query your own resources, upload new fact
sets, create catalogs, inspect facts...all part
of the Puppet Data Library
78. curl
-H "Accept: application/json"
"http://puppetdb/facts/host.my.net"
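
The curl call for facts can be mirrored in a script. The sketch below assumes the same URL layout as the slide (http://puppetdb/facts/<node>) and a JSON response body; the function names build_facts_request and fetch_facts are hypothetical helpers, not part of any PuppetDB client library.

```python
import json
import urllib.request


def build_facts_request(node, base_url="http://puppetdb"):
    """Build a GET request for one node's facts, asking for JSON."""
    url = "{}/facts/{}".format(base_url, node)
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_facts(node, base_url="http://puppetdb"):
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urllib.request.urlopen(build_facts_request(node, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_facts("host.my.net") corresponds to the curl command on the slide.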
79. curl
-H "Accept: application/json"
"http://puppetdb/resources?query=..."
80. Transparent!
We care about operational visibility
Ships with a real-time dashboard,
Dozens of metrics and gauges,
Correlate-able logs,
Easy to integrate with monitoring systems
81. Speedy!
PuppetDB is much, *much* faster than the
previous storeconfigs and inventory services
At Puppet Labs, we’ve seen huge reductions
in compile times, resource collection times,
time to persist catalogs and facts, etc.
83. Posit:
Hosts are not
entirely unique
snowflakes
84. Therefore:
A resource often
exists across
multiple hosts
85. Feature:
Single-instance
resource storage
86. Resource dedupe
Compute unique hashes for resources
We quickly hash all the resources in a
catalog, and use bulk operations to compare
them to the hashes already stored.
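
A minimal sketch of the idea, under stated assumptions: resources are plain dictionaries, the hash is SHA-1 over a canonical JSON encoding, and a Python dict stands in for the database table of stored hashes. The names resource_hash and store_catalog_resources are hypothetical, and PuppetDB's real hashing scheme may differ.

```python
import hashlib
import json


def resource_hash(resource):
    """Hash a resource via a canonical (key-sorted) JSON encoding."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def store_catalog_resources(resources, stored):
    """Store only resources whose hash is not already known.

    `stored` maps hash -> resource and stands in for the database.
    Returns (all hashes in this catalog, number of newly stored resources).
    """
    by_hash = {resource_hash(r): r for r in resources}
    # One bulk set comparison instead of a lookup per resource.
    new_hashes = set(by_hash) - set(stored)
    for h in new_hashes:
        stored[h] = by_hash[h]
    return sorted(by_hash), len(new_hashes)
```

When two hosts share a resource, the second catalog stores only a reference to the existing hash, which is where the savings come from.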
87. Resource dedupe
Significant speed improvement!
Internal to Puppet Labs, we see ~83%
resource duplication; this number is
consistent with what we’ve seen in most
customer environments.
88. Posit:
Puppet runs
frequently, but
catalogs change
infrequently
89. Therefore:
We’ll often receive
the same catalog for
a host
90. Feature:
Single-instance
catalog storage
91. Catalog dedupe
Compute unique hashes for catalogs
We use a Merkle Tree approach (hash tree)
for quick comparisons.
Puppet Labs sees ~88% catalog duplication
Big savings!
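
A hash-tree comparison might look like the sketch below. It assumes a catalog is a dictionary with "resources" and "edges" lists and uses SHA-1 at every level; the layout of the tree and the name catalog_hash are illustrative guesses, not PuppetDB's actual scheme.

```python
import hashlib
import json


def _sha1(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def catalog_hash(catalog):
    """Two-level hash tree: leaves are resources and edges, root is the catalog.

    Leaf hashes are sorted so that ordering differences between two
    otherwise identical catalogs do not change the root hash.
    """
    resource_leaves = sorted(
        _sha1(json.dumps(r, sort_keys=True)) for r in catalog["resources"]
    )
    edge_leaves = sorted(
        _sha1(json.dumps(e, sort_keys=True)) for e in catalog["edges"]
    )
    resources_node = _sha1("".join(resource_leaves))
    edges_node = _sha1("".join(edge_leaves))
    return _sha1(resources_node + edges_node)
```

If the root hash of an incoming catalog matches the one already stored for the host, nothing needs to be rewritten.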
92. Posit:
You have more than
one core, though
storeconfigs is
single-threaded
93. Therefore:
Throughput is not
maximized
94. Feature:
Massively parallel
operation
95. Parallel
We can pat our heads and rub our tummies
at the same time
Database operations don’t block MQ
operations don’t block HTTP operations
don’t block hash computation operations
don’t block metric calculations don’t block...
Dozens of threads, zero locks
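
The decoupling of stages can be illustrated with a toy pipeline in which each stage runs in its own thread and hands work to the next through a queue. This is a sketch only: run_stage and run_pipeline are hypothetical names, and Python's queue.Queue locks internally, so the example shows stages not blocking one another rather than a lock-free design.

```python
import queue
import threading


def run_stage(inbox, outbox, work):
    """Pull items from inbox, apply work, push results to outbox."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: pass it along and shut down
            outbox.put(None)
            return
        outbox.put(work(item))


def run_pipeline(items, stages):
    """Run each stage function in its own thread, connected by queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(queues[i], queues[i + 1], fn))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(None)  # signal end of input
    results = []
    while True:
        out = queues[-1].get()
        if out is None:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

With one thread per stage and FIFO queues, a slow stage delays only the items behind it while the other stages keep working on whatever they already hold.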