2. DTrace and zones: Fraternal twins
• DTrace and zones were developed in parallel during
the development of Solaris 10
• DTrace integrated (September 2003) before zones
(early 2004)
• When zones integrated, the priority was enabling
DTrace in the global zone to meaningfully
instrument non-global zones
• DTrace in the non-global zone was hard — and a
lower priority than other work on both technologies
3. DTrace and zones: Basic functionality
• In 2006, Dan Price (with help from Adam Leventhal
and Jonathan Adams) added initial support for
DTrace in the non-global zone
• Allowed use of syscall provider, pid provider and (in
a deranged, broken way) the profile provider
• This was significant work: required modifications to
both the zones privilege model and the DTrace
privilege model
• For example, required an implicit predicate on
syscall and profile probes
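One way to picture the implicit predicate is as a zone check applied before any predicate the user wrote: a non-global consumer is handed a probe firing only if the firing thread belongs to the consumer's own zone. The following is a minimal userland sketch of that idea, not illumos source; `consumer_t`, `probe_visible()` and `GLOBAL_ZONE_ID` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not illumos source: zone 0 stands in for the global zone. */
#define	GLOBAL_ZONE_ID	0

typedef struct consumer {
	int c_zoneid;	/* zone in which the DTrace consumer was started */
} consumer_t;

/*
 * Implicit predicate: evaluated before any predicate the user wrote.
 * A global-zone consumer sees every firing; a non-global consumer sees
 * only firings caused by threads in its own zone.
 */
bool
probe_visible(const consumer_t *c, int firing_zoneid)
{
	assert(c != NULL);
	if (c->c_zoneid == GLOBAL_ZONE_ID)
		return (true);
	return (c->c_zoneid == firing_zoneid);
}
```

With a consumer in zone 7, `probe_visible()` accepts a firing from zone 7 and rejects one from zone 8 or from the global zone, which is the behaviour the user would otherwise have to request with an explicit predicate.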
4. DTrace and zones in SmartOS
• As the worldʼs heaviest user of zones, we at Joyent
ran into (and fixed) a number of annoying bugs:
• USDT probes from the non-global zone were not
properly being enabled in the global zone
(illumos#908)
• Tick and profile probes did not properly fire when
used in the non-global zone (illumos#1456)
• Fixing the latter required an extension of the DTrace
privilege model: introduced a notion of restricted
operation in which args could not be referenced
5. DTrace and zones in SmartOS
• Other (very) annoying issues still lurked:
• Inability to read “cpu” in the non-global zone
• Inability to read any fields from “curlwpsinfo”
and “curpsinfo”— especially “pr_dmodel”
• Inability to read the “fds[]” array
• Failure mode highly obnoxious:
[my-non-global-zone ~]# dtrace -n BEGIN'{trace(curpsinfo->pr_psargs)}'
dtrace: description 'BEGIN' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN): invalid kernel access in action #1 at DIF offset 44
6. Divide and conquer
• curlwpsinfo and curpsinfo are translators over the
current thread (“kthread_t”) and the current
process (“proc_t”), respectively
• Importantly, the state contained in oneʼs own
kthread_t and proc_t:
• Is safe to read while executing (threads cannot
disappear out from under themselves)
• Does not represent potential privilege escalation
• The failure can be fixed by simply allowing these loads
when one has privileges over the current process!
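The fix described above amounts to a range check on each load: if the address falls entirely within the current thread's own kthread_t or its own proc_t, and the consumer holds privileges over that process, the load is allowed. A minimal userland sketch follows; the struct layouts and the names `in_range()`, `can_load()` and `has_proc_priv` are hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's proc_t and kthread_t. */
typedef struct proc { int p_pid; char p_psargs[80]; } proc_t;
typedef struct kthread { int t_tid; proc_t *t_procp; } kthread_t;

/* True when [addr, addr + sz) lies entirely inside [base, base + len). */
bool
in_range(uintptr_t addr, size_t sz, uintptr_t base, size_t len)
{
	return (sz <= len && addr >= base && addr - base <= len - sz);
}

/*
 * Permit a load only if it falls within the current thread's own
 * kthread_t or its own proc_t. `has_proc_priv` models holding DTrace
 * privileges over the current process (an assumption of this sketch).
 */
bool
can_load(const kthread_t *cur, bool has_proc_priv, uintptr_t addr, size_t sz)
{
	assert(cur != NULL && cur->t_procp != NULL);
	if (!has_proc_priv)
		return (false);
	if (in_range(addr, sz, (uintptr_t)cur, sizeof (kthread_t)))
		return (true);
	return (in_range(addr, sz, (uintptr_t)cur->t_procp, sizeof (proc_t)));
}
```

In this model, a read of the current process's `p_psargs` is permitted, while a read of any other process's structure, or a read that runs past the end of one's own, is refused.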
7. fds[]: A magic bullet?
• Somehow, I convinced myself that the problem with
fds[] was the translator that translates the member
accesses into kernel accesses:
inline fileinfo_t fds[int fd] = xlate <fileinfo_t> (
    fd >= 0 && fd < curthread->t_procp->p_user.u_finfo.fi_nfiles ?
    curthread->t_procp->p_user.u_finfo.fi_list[fd].uf_file : NULL);
• If the problem was the static translators, the solution
must be dynamic translators — a(n in)famously
unimplemented feature of DTrace!
• After dtrace.conf(12), I realized that the expression
was orthogonal to the fact that the in-kernel
implementation must not allow privilege escalation
8. fds[]: No magic bullets
• Focussing on the implementation allows one to
consider the specifics of the fds[] case
• Helped by the fact that the fi_list implementation
uses memory retiring for scalability of file descriptor
lookups: the array is only freed upon process exit
• Assures that oneʼs own fi_list is always pointing
to memory that is (or was) an array of uf_entry_t
• Leaves the file_t itself, which can be freed during
probe context (specifically, by another thread in the
same process)
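Memory retiring can be illustrated with a small model: when the descriptor table grows, the old array is placed on a retired list instead of being freed, so a reader still holding the old pointer sees stale but valid uf_entry_t memory. This is a userland sketch under assumptions; `finfo_t`, `retired_t`, `fi_retired`, `finfo_grow()` and `finfo_exit()` are hypothetical names, and only `fi_list`, `fi_nfiles`, `uf_entry_t` and `uf_file` come from the slides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct file { int f_id; } file_t;		/* hypothetical */
typedef struct uf_entry { file_t *uf_file; } uf_entry_t;

typedef struct retired {
	uf_entry_t *r_list;
	struct retired *r_next;
} retired_t;

typedef struct finfo {
	uf_entry_t *fi_list;	/* current array */
	int fi_nfiles;		/* number of entries in fi_list */
	retired_t *fi_retired;	/* old arrays, kept until exit */
} finfo_t;

/* Grow the table; the old array is retired, not freed. */
int
finfo_grow(finfo_t *fip, int nfiles)
{
	assert(fip != NULL && nfiles > fip->fi_nfiles);
	uf_entry_t *nlist = calloc((size_t)nfiles, sizeof (uf_entry_t));
	retired_t *r = malloc(sizeof (retired_t));
	if (nlist == NULL || r == NULL) {
		free(nlist);
		free(r);
		return (-1);
	}
	if (fip->fi_nfiles > 0) {
		memcpy(nlist, fip->fi_list,
		    (size_t)fip->fi_nfiles * sizeof (uf_entry_t));
	}
	r->r_list = fip->fi_list;
	r->r_next = fip->fi_retired;
	fip->fi_retired = r;
	fip->fi_list = nlist;
	fip->fi_nfiles = nfiles;
	return (0);
}

/* Process exit: only now are the current and retired arrays freed. */
void
finfo_exit(finfo_t *fip)
{
	retired_t *r, *next;
	assert(fip != NULL);
	for (r = fip->fi_retired; r != NULL; r = next) {
		next = r->r_next;
		free(r->r_list);
		free(r);
	}
	free(fip->fi_list);
	memset(fip, 0, sizeof (*fip));
}
```

The trade-off in this model is some retained memory per process in exchange for lookups that never need to worry about the array disappearing underneath them.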
9. Dealing with file_t
• We can deal with this by forcing everyone out of
probe context after a file_t has been removed
from the uf_entry_t, but before it is freed
• This is done by issuing a dtrace_sync() — a
synchronous (empty) cross-call to all CPUs
• This is expensive, and required answering an
important question: just how hot is the closef()
path, anyway?
• By instrumenting our guinea pigs (er, production cloud),
we could answer this concisely: closef() is pretty
damned hot (> 5,000/second on some machines!)
10. Adding getf()
• To track when fds[] was active in the non-global
zone, we added a getf() subroutine (ht: ken)
• Allows us to issue the sync only when we have a
closef() from a non-global zone using fds[]
• Had to take the final step of cleaning up the path
output to strip off the zone path from the file name
(as a cleanliness issue, not a security issue)
• De-mo, de-mo, de-mo!
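The ordering on the close path, and the way getf() keeps the expensive sync off the common case, can be sketched as follows. This is a single-threaded userland model under assumptions: `proc_model_t`, `pm_fds_traced`, `model_sync()`, `model_getf()` and `model_close()` are hypothetical names, and where the real implementation records that fds[] is in use is not stated in the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define	NFDS	4

typedef struct file { int f_id; } file_t;		/* hypothetical */
typedef struct uf_entry { file_t *uf_file; } uf_entry_t;

typedef struct proc_model {
	uf_entry_t pm_fds[NFDS];
	bool pm_fds_traced;	/* set once getf() has run for this process */
} proc_model_t;

int sync_calls;		/* how many (expensive) syncs were issued */

/* Stand-in for dtrace_sync(): the real one cross-calls every CPU. */
void
model_sync(void)
{
	sync_calls++;
}

/* Stand-in for the getf() subroutine: look up a file and note the use. */
file_t *
model_getf(proc_model_t *p, int fd)
{
	assert(p != NULL);
	if (fd < 0 || fd >= NFDS)
		return (NULL);
	p->pm_fds_traced = true;
	return (p->pm_fds[fd].uf_file);
}

/* Close path: unlink the file, sync only if needed, then free it. */
void
model_close(proc_model_t *p, int fd)
{
	assert(p != NULL && fd >= 0 && fd < NFDS);
	file_t *fp = p->pm_fds[fd].uf_file;
	p->pm_fds[fd].uf_file = NULL;	/* no new lookup can find fp */
	if (p->pm_fds_traced)
		model_sync();		/* wait out probes still using fp */
	free(fp);
}
```

In this model, a process that has never called `model_getf()` closes files without incrementing `sync_calls` at all, which is why a close path running thousands of times per second can tolerate the scheme.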
11. sched and proc providers
• With fds[] done, focus turned to the only meaningful
impediment to DTrace in the non-global zone:
enabling the sched and proc providers
• Recall the restricted operation introduced for the
profile provider in the non-global zone...
• Used this to have limited (non-global) DTrace
privileges imply restricted operation for some SDT
providers
• Thanks to the curlwpsinfo/curpsinfo work,
these providers can be meaningfully used without
access to arguments
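Restricted operation can be modelled as a per-consumer flag that refuses argument reads while leaving state about one's own process readable. The sketch below is illustrative only: `probe_ctx_t`, `read_arg()` and `read_curpid()` are hypothetical names, and signalling a refused read by setting a fault flag and returning zero is an assumption of this model, not a statement about the in-kernel behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	NARGS	5

typedef struct probe_ctx {
	bool pc_restricted;		/* consumer has only limited privileges */
	uint64_t pc_args[NARGS];	/* arg0..arg4 as supplied by the provider */
	int pc_curpid;			/* stands in for curpsinfo->pr_pid */
	bool pc_fault;			/* set when an access is refused */
} probe_ctx_t;

/* Read argN; in restricted operation the read is refused and flagged. */
uint64_t
read_arg(probe_ctx_t *pc, int n)
{
	assert(pc != NULL && n >= 0 && n < NARGS);
	if (pc->pc_restricted) {
		pc->pc_fault = true;
		return (0);
	}
	return (pc->pc_args[n]);
}

/* State about one's own process stays readable in either mode. */
int
read_curpid(const probe_ctx_t *pc)
{
	assert(pc != NULL);
	return (pc->pc_curpid);
}
```

A restricted consumer in this model can still answer "which process fired this probe?" through `read_curpid()`, which is what makes argument-free sched and proc probes useful.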