36. Broken Service
service provider
hasstatus => true
/sbin/service $service status
/etc/init.d/$service status
/usr/bin/systemctl is-active $service
37. Bad/Missing Variable
$one = "1"
file {"lisaone":
  path => "/tmp/lisa$one",
  ensure => 'directory',
}
file {"lisa1":
  path => "/tmp/lisa1",
  ensure => 'file',
}
Info: Caching catalog for node1.example.com
Error: Failed to apply catalog: Cannot alias File[lisa1] to ["/tmp/lisa1"] at /etc/puppet/environments/production/manifests/site.pp:34; resource ["File", "/tmp/lisa1"] already declared at /etc/puppet/environments/production/manifests/site.pp:30
38. Bad/Missing Variable
lisa {'one':
  place => "/tmp/$LISA",
  type => "directory",
}
lisa {'two':
  place => "/tmp/$LISA",
  type => "file",
}
define lisa ($place,$type) {
  file {"$title":
    path => $place,
    ensure => $type,
  }
}
Info: Caching catalog for node1.example.com
Error: Failed to apply catalog: Cannot alias File[two] to ["/tmp"] at /etc/puppet/environments/production/modules/lisa/manifests/init.pp:5; resource ["File", "/tmp"] already declared at /etc/puppet/environments/production/modules/lisa/manifests/init.pp:5
42. Debug Script… just an example
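#!/bin/bash
# Snapshot disk, memory and open files into a temp log.
# Any arguments are used as a label, e.g. "before resolv.conf".
LOG=$(mktemp /tmp/puppet-debug.XXXXXX)
echo "Puppet Debug -- $* -- $(date)" | tee "$LOG"
echo "-- Disk --" | tee -a "$LOG"
df -h | tee -a "$LOG"    # free space
df -i | tee -a "$LOG"    # free inodes
echo "-- Mem --" | tee -a "$LOG"
free | tee -a "$LOG"
echo "-- Files --" | tee -a "$LOG"
# Open files of every running process named puppet
for proc in $(pgrep puppet)
do
  lsof -p "$proc" | tee -a "$LOG"
done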
Puppet Debug -- before resolv.conf -- Fri Oct 24 01:13:34 EDT 2014
-- Disk --
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
6.7G 2.5G 3.9G 39% /
tmpfs 246M 0 246M 0% /dev/shm
/dev/vda1 485M 80M 380M 18% /boot
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root
440640 79253 361387 18% /
tmpfs 62783 1 62782 1% /dev/shm
/dev/vda1 128016 50 127966 1% /boot
-- Mem --
total used free shared buffers cached
Mem: 502268 415488 86780 0 22176 172036
-/+ buffers/cache: 221276 280992
Swap: 835580 0 835580
-- Files --
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
puppet 2058 root cwd DIR 253,0 4096 14 /root
puppet 2058 root rtd DIR 253,0 4096 2 /
puppet 2058 root txt REG 253,0 10600 36617 /usr/bin/ruby
puppet 2058 root mem REG 253,0 156928 4134 /lib64/ld-2.12.so
puppet 2058 root mem REG 253,0 1926680 6282 /lib64/libc-2.12.so
43. Printing - Template
- scope.to_hash
- reject a few
- sort
- print, one per line
file { "/tmp/puppet-debug.txt":
  content => inline_template("<% vars = scope.to_hash.reject { |k,v| !( k.is_a?(String) && v.is_a?(String) ) }; vars.sort.each do |k,v| %><%= k %>=<%= v %>\n<% end %>"),
}
vars = scope.to_hash.reject { |k,v| !( k.is_a?(String) && v.is_a?(String) ) }
vars.sort.each do |k,v|
  k=v\n
end
44. Printing - Template
_timestamp=2014-10-23 22:29:52 -0700
architecture=x86_64
augeasversion=1.0.0
bios_release_date=01/01/2011
bios_vendor=Bochs
bios_version=Bochs
blockdevice_vda_size=8589934592
blockdevice_vda_vendor=6900
blockdevices=vda
caller_module_name=
clientcert=cookbook.example.com
clientnoop=false
clientversion=3.7.1
concat_basedir=/var/lib/puppet/concat
domain=example.com
environment=production
facterversion=2.2.0
filesystems=ext4,iso9660
fqdn=cookbook.example.com
gid=root
hardwareisa=x86_64
hardwaremodel=x86_64
hostname=cookbook
id=root
interfaces=eth0,lo
45. Scope
The scene:
● roles and profiles
● ntp server
class ntp {
  include ntp::server
}
class role::ntp {
  include ntp
}
46. Scope
The solution:
● fully scope everything
● remember scope
class ntp {
  include ntp::server
}
class role::ntp {
  include ::ntp
}
47. Summary
learn some networking
remember the REST api
read up on SSL / x509
use --trace
make a debug class
remember scope
I started using Puppet early on, back at 0.24.
I know a guy who crushes coke cans for a living. It's soda pressing.
How do we get from this to this?
These are the techniques I've used and what I've seen.
These errors can be broken into two groups; we'll talk about each separately.
There's a third category too. Hopefully you don't fall into this camp, but if you aren't running the latest stuff, you are missing out.
This group is network and certificate issues.
Sometimes there are dragons between you and the server.
Sometimes the masterport isn't 8140.
You can tell mtr to use port 8140 (mtr --tcp --port 8140).
We'll show each of these tools and why they are useful.
ping shows whether your client node can look up the puppet server; ping itself doesn't need to succeed for you to be OK. It resolves the name with gethostbyname, which is the same way Puppet looks up the host, so if this lookup fails, Puppet will fail too.
Check what your hostname is: hostname -f tells you whether the lookup of your fully qualified name is working.
Your nodes don't need to resolve properly. Only the master's name has to resolve, and resolving it locally (for example in /etc/hosts) is fine.
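The lookup check above can be scripted. This is a minimal sketch, assuming a glibc-style system where `getent hosts` goes through the same NSS path (/etc/hosts, then DNS, per /etc/nsswitch.conf) that gethostbyname-style lookups use; `check_resolves` is a hypothetical helper name, and `puppet` is simply the default server name.

```shell
#!/bin/bash
# Hypothetical helper: does NAME resolve through NSS (/etc/hosts and DNS,
# in the order set in /etc/nsswitch.conf)? ping and Puppet both rely on
# this kind of lookup, so a failure here means Puppet will fail too.
check_resolves() {
  local name="$1" answer
  if answer=$(getent hosts "$name"); then
    echo "$name resolves: $answer"
    return 0
  fi
  echo "$name does NOT resolve" >&2
  return 1
}

# Usage on the agent:
#   check_resolves puppet
#   hostname -f
```

Unlike ping, this never sends a packet, so a firewall that drops ICMP does not muddy the answer.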
mtr (My traceroute, originally Matt's traceroute) works differently from traceroute: it uses ICMP by default, while traceroute uses UDP.
netcat: the Swiss Army knife of network tools.
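When neither nc nor mtr is installed, bash can still try the masterport itself. This sketch swaps in bash's built-in `/dev/tcp/HOST/PORT` redirection (a bash feature, not a real file) for netcat; `port_open` is a hypothetical helper name and the 3-second timeout is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened.
port_open() {
  local host="$1" port="${2:-8140}"   # 8140 is Puppet's default masterport
  # bash -c '...' NAME HOST PORT: inside, $1 is the host and $2 the port.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' port_open "$host" "$port" 2>/dev/null
}

# Usage:
#   port_open puppet.example.com 8140 && echo "masterport reachable"
# The equivalent checks with the tools from the talk:
#   nc -zv puppet.example.com 8140
#   mtr --tcp --port 8140 puppet.example.com
```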
This is not an X.509 talk, but you should know how X.509 works.
Puppet uses the certname setting to know which certificate is its own, and by default it derives that name from the hostname.
Ask facter what the hostname is, and use puppet config print certname to show the value Puppet will actually use.
Basic Unix permissions ← I hear Antoine Dodson in my head when it turns out to be basic Unix permissions.
It's not working. Why can't you find that module? It's right there, you idiot! It turns out someone changed the permissions manually or messed something up with git.
It could be SELinux, but please don't shut that off.
It could be some other communication problem. To fix it, you need to know how the communication works, and Puppet has a built-in REST API.
REST: Representational State Transfer.
I say this in the book, but a lot of Puppet is just HTTPS traffic.
OK, everyone hold hands, this might get rough.
The node starts with a GET request to grab the CA certificate.
The server should respond with a certificate.
So how does the node construct this GET request?
What resources are available? ← next slide
Verify that you can download the CA certificate and your own certificate.
Check that it's the same one. I kid you not, some companies actually insert themselves into all HTTP traffic, so you might not be getting the CA you asked for.
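One way to check that the CA you received is the one you asked for is to compare fingerprints. This is a sketch using `openssl x509 -fingerprint -sha256`; `same_cert` and `cert_fingerprint` are hypothetical helper names. On the master, `puppet config print cacert` shows where the real CA certificate lives.

```shell
#!/bin/bash
# Print the SHA-256 fingerprint of a PEM certificate (just the hex part).
cert_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256 | cut -d= -f2
}

# Hypothetical helper: are two PEM files the same certificate?
# Returns 0 on match, 1 on mismatch, 2 if either file can't be parsed.
same_cert() {
  local a b
  a=$(cert_fingerprint "$1")
  b=$(cert_fingerprint "$2")
  if [ -z "$a" ] || [ -z "$b" ]; then
    echo "could not read a certificate" >&2
    return 2
  fi
  if [ "$a" = "$b" ]; then
    echo "match: $a"
  else
    echo "MISMATCH: $1 $a vs $2 $b" >&2
    return 1
  fi
}

# Usage: compare what came over the network with a copy taken off the master.
#   same_cert ca-downloaded.pem ca-from-master.pem
```

A mismatch here, with no configuration change on your side, points at something in the network path re-signing your traffic.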
Now that we have the CA certificate, we can use it with curl to try to download our node's certificate.
That should give you a certificate; if it doesn't, you've found your problem.
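The two downloads can be built from one URL pattern. This sketch assumes the Puppet 3.x REST layout `https://SERVER:PORT/ENVIRONMENT/INDIRECTION/KEY` and the `Accept: s` (plain text) header for certificates; `puppet_url` is a hypothetical helper, and `puppet`, `production` and `cookbook.example.com` are example values. Check the HTTP API docs for your Puppet version, since later versions changed the paths.

```shell
#!/bin/bash
# Hypothetical helper: build a Puppet 3.x-style REST URL.
puppet_url() {
  local server="$1" env="$2" indirection="$3" key="$4" port="${5:-8140}"
  printf 'https://%s:%s/%s/%s/%s\n' "$server" "$port" "$env" "$indirection" "$key"
}

# Step 1: fetch the CA certificate. Nothing is trusted yet, hence -k.
#   curl -s -k -H 'Accept: s' "$(puppet_url puppet production certificate ca)" > ca.pem
# Step 2: fetch this node's certificate, now verifying the server with ca.pem.
#   curl -s --cacert ca.pem -H 'Accept: s' \
#     "$(puppet_url puppet production certificate cookbook.example.com)"
```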
OK, so we ask for cookbook and we get it. Use openssl x509 to look at the certificate.
Check the validity dates.
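A sketch of the validity check with `openssl x509`: print subject, issuer and the notBefore/notAfter dates, then use `-checkend` to fail if the certificate expires within N seconds. `check_cert` is a hypothetical helper name; on Puppet 3 the agent's own certificate is usually under `$(puppet config print ssldir)/certs/`.

```shell
#!/bin/bash
# Hypothetical helper: show who a certificate is for and its validity dates,
# then fail if it expires within SECONDS (default 0 = already expired).
check_cert() {
  local cert="$1" seconds="${2:-0}"
  openssl x509 -in "$cert" -noout -subject -issuer -dates
  if openssl x509 -in "$cert" -noout -checkend "$seconds" >/dev/null; then
    echo "OK: not expiring within ${seconds}s"
  else
    echo "EXPIRED or expiring within ${seconds}s" >&2
    return 1
  fi
}

# Usage: warn a day ahead.
#   check_cert /var/lib/puppet/ssl/certs/cookbook.example.com.pem 86400
```

`-checkend` only looks at notAfter; a notBefore in the future (a clock problem on the node or CA) shows up in the printed dates instead.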
Like anyone can even know that. Well, there's more to it, but you will get a catalog back.
wget or curl will just grab whatever you ask for but hide what's happening; if you are having trouble, it's better to use an interactive client.
You're a sysadmin; you should know just enough about everything to get in trouble.
So this is an HTTP request, and these tools take care of the SSL part of the communication. gnutls-cli can also handle STARTTLS-style connections.
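To see every byte, you can write the HTTP request yourself and let `openssl s_client` or `gnutls-cli` handle only the TLS layer. This is a sketch: `rest_request` is a hypothetical helper, HTTP/1.0 is chosen so the server closes the connection after replying, and the `s` / `pson` Accept values assume the Puppet 3.x conventions (plain text for certificates, PSON for catalogs).

```shell
#!/bin/bash
# Hypothetical helper: print a raw HTTP/1.0 GET for a Puppet REST path.
# HTTP requires CRLF line endings and a blank line to end the headers.
rest_request() {
  local path="$1" accept="${2:-s}"
  printf 'GET %s HTTP/1.0\r\nAccept: %s\r\n\r\n' "$path" "$accept"
}

# Pipe it through TLS yourself:
#   rest_request /production/certificate/ca | openssl s_client -connect puppet:8140 -quiet
# Or open an interactive session and type the request by hand:
#   openssl s_client -connect puppet:8140 -CAfile ca.pem
#   gnutls-cli --port 8140 --x509cafile ca.pem puppet
```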
This is how I work; it's the first chapter of my book.
But in production we put Apache or nginx or whatever in front; in Apache, for instance, we use mod_proxy_balancer.
We can look at the URLs coming in and, based on the environment, send each request to a specific worker system that compiles the catalog for us.
When you are trying to diagnose a problem, it can be useful to just make a problem environment and send it to another worker.
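The routing decision itself is small: in the Puppet 3.x URL layout, the first path segment is the environment. This sketch only models that decision in shell; the real routing lives in the Apache or nginx configuration. The environment name `problem`, the backend `debug-worker:18140` and `balancer://puppetmasters` are all made-up examples.

```shell
#!/bin/bash
# Extract the environment: the first segment of /ENV/INDIRECTION/KEY.
env_from_path() {
  local path="${1#/}"              # drop the leading slash
  printf '%s\n' "${path%%/*}"      # keep everything before the next slash
}

# Hypothetical mapping from environment to backend.
worker_for_path() {
  case "$(env_from_path "$1")" in
    problem) echo "debug-worker:18140" ;;        # the one worker with debugging on
    *)       echo "balancer://puppetmasters" ;;  # everything else, load balanced
  esac
}
```

Because the split is by environment, a single misbehaving node can be moved to the debug worker with `--environment problem` without touching anyone else's runs.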
You need the ability to create arbitrary test branches, so you need git in your workflow.
Workflow is something I talk about heavily in my LISA talk. ← next slide: git
Making a branch per user, per ticket, etc. is very useful.
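A minimal sketch of turning one git branch into its own directory environment, assuming directory environments under an environments directory such as /etc/puppet/environments. `deploy_branch` is a hypothetical helper. It maps branch names to lowercase letters, digits and underscores because Puppet restricts environment names, so a branch like `ticket-1234` becomes the environment `ticket_1234`.

```shell
#!/bin/bash
# Hypothetical helper: clone BRANCH of REPO into ENVDIR/<env name> and print
# the environment name to use with --environment.
deploy_branch() {
  local repo="$1" branch="$2" envdir="$3" env
  # Anything other than letters, digits and '_' becomes '_', then lowercase.
  env=$(printf '%s' "$branch" | tr -c 'A-Za-z0-9_' '_' | tr '[:upper:]' '[:lower:]')
  git clone --quiet --branch "$branch" --single-branch "$repo" "$envdir/$env" || return 1
  echo "$env"
}

# Usage:
#   deploy_branch /srv/git/puppet.git ticket-1234 /etc/puppet/environments
#   puppet agent -t --environment ticket_1234
```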
OK, back to our problem worker.
You would not do this on a production server.
This will generate a lot of logs, so make sure you logrotate that stuff.
This is another way to run it: the --trace option is the one that will get you a ton of data.
But when you have a problem, that's the best way to find it.
Here's a good technique: copy the part you are having trouble with somewhere else and run puppet apply on it with --trace.
You'll get to see how the exec is actually running.
So what is the catalog? It's YAML or JSON.
YAML is easy to read.
JSON not so much, but jq works.
What can you do with the catalog? ← next slide
The catalog you compile should have the same classes as classes.txt if your last run was successful and you haven't changed the class list. Errors can come from the class list changing.
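Comparing the two class lists is a sorted-set difference. This sketch assumes both lists are plain text with one class per line (classes.txt already is; its location is shown by `puppet config print classfile`). How you extract the class list from a freshly compiled catalog depends on your Puppet version, so that step is left out. `class_diff` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: show classes that disappeared ("- ") or appeared ("+ ")
# between an old class list and a new one.
class_diff() {
  local old new
  old=$(mktemp) && new=$(mktemp) || return 2
  sort -u "$1" > "$old"
  sort -u "$2" > "$new"
  # comm -3 drops shared lines; lines only in the first file are unindented,
  # lines only in the second file start with a tab (empty first field).
  comm -3 "$old" "$new" | awk -F'\t' '{ if ($1 == "") print "+ " $2; else print "- " $1 }'
  rm -f "$old" "$new"
}

# Usage:
#   class_diff "$(puppet config print classfile)" new-classes.txt
```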
When the catalog fails to compile, debugging is usually a lot simpler:
either Puppet found too many of something (a duplicate resource) or it couldn't find something (a missing module).
Failing to apply is usually much harder to figure out.
The first issue with variables is the hardest one to find.
I'll show examples of the bad exec and the bad service.
So how do you fix a duplicate resource?
Separate the thing you need multiple times into its own class; that's the best option for things like httpd/apache.
Virtual resources work well for users, but they are sometimes confusing.
If Puppet can't find the module, maybe it's the modulepath; use puppet config print modulepath to show it. It could also be basic Unix permissions, so try going to that place as the puppet or pe-puppet user to make sure.
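"Going to that place as the puppet user" can be automated by walking the path one component at a time. This sketch reports the first component that is missing or cannot be entered, and then whether the final target is readable. `first_blocked` is a hypothetical helper; run it as the puppet (or pe-puppet) user, for example via sudo -u, so the checks use that user's permissions. `namei -l PATH` from util-linux prints the permissions of every component if you want the full picture.

```shell
#!/bin/bash
# Hypothetical helper: walk PATH from / and report the first blocking component.
first_blocked() {
  local target="$1" rest current="" part
  rest="${target#/}"
  while [ -n "$rest" ]; do
    part="${rest%%/*}"                  # next component
    case "$rest" in
      */*) rest="${rest#*/}" ;;         # more components follow
      *)   rest="" ;;                   # this was the last one
    esac
    [ -n "$part" ] || continue          # tolerate '//' in the path
    current="$current/$part"
    if [ ! -e "$current" ]; then
      echo "missing: $current"; return 1
    fi
    if [ -d "$current" ] && [ ! -x "$current" ]; then
      echo "cannot enter: $current"; return 1
    fi
  done
  if [ ! -r "$target" ]; then
    echo "cannot read: $target"; return 1
  fi
  echo "ok: $target"
}

# Usage (as the puppet user):
#   first_blocked /etc/puppet/environments/production/modules/lisa/manifests
```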
These problems are the stumpers, but after you fix them, they seem super simple.
Scripts that use environment variables:
when the user runs it, it works;
when you sudo puppet agent, it works too (unless env_reset is in effect, sure).
It's confusing, but you can run with --trace and see what's actually running.
puppet agent --trace will show how the script was run.
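To reproduce the "works in my shell, fails under Puppet" case, you can run the script with an almost empty environment. This sketch uses `env -i` to approximate what a daemon started from init or cron sees; it is an approximation for testing, not a description of exactly how Puppet builds an exec's environment. `run_minimal_env` is a hypothetical helper, and the PATH value is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: run a shell command with only PATH and HOME set.
run_minimal_env() {
  env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin HOME=/ sh -c "$1"
}

# Usage:
#   export JAVA_HOME=/opt/java
#   sh -c 'echo ${JAVA_HOME:-unset}'              # your shell: /opt/java
#   run_minimal_env 'echo ${JAVA_HOME:-unset}'    # minimal env: unset
```

If the script only works in the first case, it depends on something your login shell sets up, and that dependency needs to be made explicit.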
Puppet just runs these commands; if the return code is not zero, Puppet thinks the service is not running.
Puppet will then try to start the service.
Puppet does a restart when it gets a notify,
so if restart is broken, your runs will fail.
This one can stump you for a while.
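The hasstatus => true logic comes down to the exit code of the status command. This sketch mimics that decision so you can test an init script the way Puppet sees it: exit 0 means running, anything else means not running. (The LSB convention is 3 for "not running", but any non-zero code is treated the same.) `service_state` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: run a status command and judge it only by its exit code.
service_state() {
  local rc=0
  "$@" >/dev/null 2>&1 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo running
  else
    echo "stopped (exit $rc)"
  fi
}

# Usage with the commands from the Broken Service slide:
#   service_state /sbin/service ntpd status
#   service_state /usr/bin/systemctl is-active ntpd
```

A status command that prints "running" but exits non-zero will make Puppet try to start the service on every run.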
You have two resources with different names.
They point to the same thing, but only after variable substitution. Actually, the substitution isn't the main thing: the titles of the resources are different, and the check that they aren't managing the same file only happens after the catalog is compiled, when it is applied.
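The same collision can be spotted before a run by listing each title with its fully substituted path. This sketch reads `title<TAB>path` lines, strips trailing slashes (the Bad/Missing Variable slide shows `/tmp/` being reported as `/tmp`), and reports any path claimed twice. `dup_paths` is a hypothetical helper, and producing the input list from your own manifests is left to you.

```shell
#!/bin/bash
# Hypothetical helper: report paths that more than one title points at.
dup_paths() {
  awk -F'\t' '
    {
      p = $2
      # "/tmp/" and "/tmp" are the same file: strip trailing slashes.
      while (length(p) > 1 && substr(p, length(p)) == "/") p = substr(p, 1, length(p) - 1)
      if (p in owner) print "duplicate " p ": " owner[p] " and " $1
      else owner[p] = $1
    }'
}

# Usage, simulating the substitution from the slide:
#   one=1
#   printf 'lisaone\t/tmp/lisa%s\nlisa1\t/tmp/lisa1\n' "$one" | dup_paths
```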
This one is even worse than the last one :-(
The problem when you are debugging is that the error points at the defined type's definition. The best thing I've found at this point is to look in the catalog and figure out where the define was called from.
Anyone can do this; it's the echo "hello" thing.
You need to use chaining to ensure that things happen near each other.
So this is a technique I've used when I'm stumped:
make a class for yourself that installs a script that does some debugging.
We then execute that script in an exec that requires the class with the script.
So here's an example. Mine is actually a bit more verbose, but here's a start for y'all.
Explain the lsof and what's going on.
And the output: you can see what I'm looking for.
OK, let's break it down.
We use an inline template;
let's go through how that part works.
We take the scope object and translate it to a hash,
use Ruby's reject method to remove the entries that are not strings,
and place all of that into the vars variable.
Now take vars, sort it, go through each variable, and print them one per line.
So what does this look like? ← next slide
This is very useful for knowing what the value of everything is at this moment.
But when you are working with Puppet, you have to remember: even though you define something near something else, there's no guarantee they are executed near each other. ← next slide