SlideShare una empresa de Scribd logo
1 de 13
Descargar para leer sin conexión
DTrace in the
Non-global Zone
Bryan Cantrill
SVP Engineering, Joyent

@bcantrill
bryan@joyent.com
DTrace and zones: Fraternal twins

 • DTrace and zones were developed in parallel during
   development of Solaris 10

 • DTrace integrated (September 2003) before zones
   (early 2004)

 • When zones integrated, the priority was making
   DTrace in the global zone be able to meaningfully
   instrument non-global zones

 • DTrace in the non-global zone was hard — and a
   lower priority than other work on both technologies
DTrace and zones: Basic functionality

 • In 2006, Dan Price (with help from Adam Leventhal
   and Jonathan Adams) added initial support for
   DTrace in the non-global zone

 • Allowed use of syscall provider, pid provider and (in
   a deranged, broken way) the profile provider

 • This was significant work: required modifications to
   both the zones privilege model and the DTrace
   privilege model

 • For example, required an implicit predicate on
   syscall and profile probes
DTrace and zones in SmartOS

 • As the worldʼs heaviest user of zones, we at Joyent
  ran into (and fixed) a number of annoying bugs:

   • USDT probes from the non-global were not
     properly being enabled in the global zone
     (illumos#908)

   • Tick and profile probes did not properly fire when
     used in the non-global zone (illumos#1456)

 • Fixing the latter required an extension of the DTrace
  privilege model: introduced a notion of restricted
  operation in which args could not be referenced
DTrace and zones in SmartOS

 • Other (very) annoying issues still lurked:
   • Inability to read “cpu” in the non-global zone
   • Inability to read any fields from “curlwpsinfo”
     and “curpsinfo”— especially “pr_dmodel”

   • Inability to read the “fds[]” array
 • Failure mode highly obnoxious:
   [my-non-global-zone ~]# dtrace -n BEGIN'{trace(curpsinfo->pr_psargs)}'
   dtrace: description 'BEGIN' matched 1 probe
   dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN): invalid kernel
   access in action #1 at DIF offset 44
Divide and conquer

 • curlwpsinfo and curpsinfo both are translators
  over the current thread (“kthread_t”) and current
  process (“proc_t”)

 • Importantly, the state contained in oneʼs own
  kthread_t and proc_t:

   • Is safe to read while executing (threads cannot
     disappear out from under themselves)

   • Does not represent potential privilege escalation
 • This can be fixed by simply allowing the loads where
  one has privileges to the current process!
fds[]: A magic bullet?

 • Somehow, I convinced myself that the problem with
   fds[] was the translator that translates the member
   accesses into kernel accesses:
   inline fileinfo_t fds[int fd] = xlate (
      fd >= 0 && fd < t_procp->p_user.u_finfo.fi_nfiles ?
      curthread->t_procp->p_user.u_finfo.fi_list[fd].uf_file : NULL);


 • If the problem was the static translators, the solution
   must be dynamic translators — a(n in)famously
   unimplemented feature of DTrace!

 • After dtrace.conf(12), I realized that the expression
   was orthogonal to the fact that the in-kernel
   implementation must not allow privilege escalation
fds[]: No magic bullets

 • Focussing on the implementation, allows one to
   consider the specifics of the fds[] case

 • Helped by the fact that the fi_list implementation
   uses memory retiring for scalability of file descriptor
   lookups: the array is only freed upon process exit

 • Assures that oneʼs own fi_list is always pointing
   to memory that is (or was) an array of uf_entry_t

 • Leaves the file_t itself, which can be freed during
   probe context (specifically, by another thread in the
   same process)
Dealing with file_t

 • We can deal with this by forcing everyone out of
  probe context after a file_t has been removed
  from the uf_entry_t, but before being freed

 • This is done by issuing a dtrace_sync() — a
  synchronous (empty) cross-call to all CPUs

 • This is expensive, and required answering an
  important question: just how hot is the closef()
  path, anyway?

 • By instrumenting our guinea pigs production cloud,
  we could answer this concisely: closef() is pretty
  damned hot (> 5,000/second on some machines!)
Adding getf()

 • To track when fds[] was active in the non-global
  zone, we added a getf() subroutine (ht: ken)

 • Allows us to issue the sync only when we have a
  closef() from a non-global zone using fds[]

 • Had to take the final step of cleaning up the path
  output to strip off the zone path from the file name
  (as a cleanliness issue, not a security issue)

 • De-mo, de-mo, de-mo!
sched and proc providers

 • With fds[] done, focus turned the only meaningful
  impediment to DTrace in the non-global zone:
  enabling the sched and proc providers

 • Recall the restricted operation introduced for the
  profile provider in the non-global zone...

 • Used this to have limited (non-global) DTrace
  privileges imply restricted operation for some SDT
  providers

 • Thanks to the curlwpsinfo/curpsinfo work,
  these providers can be meaningfully used without
  access to arguments
Thank you.
FOR MORE INFORMATION VISIT

 www.joyent.com
           OR

www.smartos.org

Más contenido relacionado

La actualidad más candente

Real-time soultion
Real-time soultionReal-time soultion
Real-time soultionNylon
 
NetBSD and Linux for Embedded Systems
NetBSD and Linux for Embedded SystemsNetBSD and Linux for Embedded Systems
NetBSD and Linux for Embedded SystemsMahendra M
 
Xen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization OpportunitiesXen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization OpportunitiesThe Linux Foundation
 
Evoluation of Linux Container Virtualization
Evoluation of Linux Container VirtualizationEvoluation of Linux Container Virtualization
Evoluation of Linux Container VirtualizationImesh Gunaratne
 
CIF16/Scale14x: The latest from the Xen Project (Lars Kurth, Chairman of Xen ...
CIF16/Scale14x: The latest from the Xen Project (Lars Kurth, Chairman of Xen ...CIF16/Scale14x: The latest from the Xen Project (Lars Kurth, Chairman of Xen ...
CIF16/Scale14x: The latest from the Xen Project (Lars Kurth, Chairman of Xen ...The Linux Foundation
 
Running without a ZFS system pool
Running without a ZFS system poolRunning without a ZFS system pool
Running without a ZFS system poolBill Pijewski
 
Union FileSystem - A Building Blocks Of a Container
Union FileSystem - A Building Blocks Of a ContainerUnion FileSystem - A Building Blocks Of a Container
Union FileSystem - A Building Blocks Of a ContainerKnoldus Inc.
 
Linux Instrumentation
Linux InstrumentationLinux Instrumentation
Linux InstrumentationDarkStarSword
 
BACD July 2012 : The Xen Cloud Platform
BACD July 2012 : The Xen Cloud Platform BACD July 2012 : The Xen Cloud Platform
BACD July 2012 : The Xen Cloud Platform The Linux Foundation
 
Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Neeraj Shrimali
 
Scale14x: Are today's foss security practices robust enough in the cloud era ...
Scale14x: Are today's foss security practices robust enough in the cloud era ...Scale14x: Are today's foss security practices robust enough in the cloud era ...
Scale14x: Are today's foss security practices robust enough in the cloud era ...The Linux Foundation
 
S4 xen hypervisor_20080622
S4 xen hypervisor_20080622S4 xen hypervisor_20080622
S4 xen hypervisor_20080622Todd Deshane
 
High Availability with Novell Cluster Services for Novell Open Enterprise Ser...
High Availability with Novell Cluster Services for Novell Open Enterprise Ser...High Availability with Novell Cluster Services for Novell Open Enterprise Ser...
High Availability with Novell Cluster Services for Novell Open Enterprise Ser...Novell
 
Practical Tips for Novell Cluster Services
Practical Tips for Novell Cluster ServicesPractical Tips for Novell Cluster Services
Practical Tips for Novell Cluster ServicesNovell
 

La actualidad más candente (20)

Real-time soultion
Real-time soultionReal-time soultion
Real-time soultion
 
NetBSD and Linux for Embedded Systems
NetBSD and Linux for Embedded SystemsNetBSD and Linux for Embedded Systems
NetBSD and Linux for Embedded Systems
 
Plan 9: Not (Only) A Better UNIX
Plan 9: Not (Only) A Better UNIXPlan 9: Not (Only) A Better UNIX
Plan 9: Not (Only) A Better UNIX
 
Xen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization OpportunitiesXen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization Opportunities
 
Evoluation of Linux Container Virtualization
Evoluation of Linux Container VirtualizationEvoluation of Linux Container Virtualization
Evoluation of Linux Container Virtualization
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
CIF16/Scale14x: The latest from the Xen Project (Lars Kurth, Chairman of Xen ...
CIF16/Scale14x: The latest from the Xen Project (Lars Kurth, Chairman of Xen ...CIF16/Scale14x: The latest from the Xen Project (Lars Kurth, Chairman of Xen ...
CIF16/Scale14x: The latest from the Xen Project (Lars Kurth, Chairman of Xen ...
 
Understanding LXC & Docker
Understanding LXC & DockerUnderstanding LXC & Docker
Understanding LXC & Docker
 
Implement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVMImplement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVM
 
Running without a ZFS system pool
Running without a ZFS system poolRunning without a ZFS system pool
Running without a ZFS system pool
 
Union FileSystem - A Building Blocks Of a Container
Union FileSystem - A Building Blocks Of a ContainerUnion FileSystem - A Building Blocks Of a Container
Union FileSystem - A Building Blocks Of a Container
 
Linux Instrumentation
Linux InstrumentationLinux Instrumentation
Linux Instrumentation
 
BACD July 2012 : The Xen Cloud Platform
BACD July 2012 : The Xen Cloud Platform BACD July 2012 : The Xen Cloud Platform
BACD July 2012 : The Xen Cloud Platform
 
Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup.
 
Scale14x: Are today's foss security practices robust enough in the cloud era ...
Scale14x: Are today's foss security practices robust enough in the cloud era ...Scale14x: Are today's foss security practices robust enough in the cloud era ...
Scale14x: Are today's foss security practices robust enough in the cloud era ...
 
S4 xen hypervisor_20080622
S4 xen hypervisor_20080622S4 xen hypervisor_20080622
S4 xen hypervisor_20080622
 
Cl306
Cl306Cl306
Cl306
 
High Availability with Novell Cluster Services for Novell Open Enterprise Ser...
High Availability with Novell Cluster Services for Novell Open Enterprise Ser...High Availability with Novell Cluster Services for Novell Open Enterprise Ser...
High Availability with Novell Cluster Services for Novell Open Enterprise Ser...
 
Practical Tips for Novell Cluster Services
Practical Tips for Novell Cluster ServicesPractical Tips for Novell Cluster Services
Practical Tips for Novell Cluster Services
 

Similar a DTrace in the Non-global Zone

Solaris DTrace, An Introduction
Solaris DTrace, An IntroductionSolaris DTrace, An Introduction
Solaris DTrace, An Introductionsatyajit_t
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to LinuxBrendan Gregg
 
Bsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdBsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdScott Tsai
 
Rainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could ExpectRainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could ExpectPeter Hlavaty
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
DTrace Topics: Introduction
DTrace Topics: IntroductionDTrace Topics: Introduction
DTrace Topics: IntroductionBrendan Gregg
 
Performance analysis and troubleshooting using DTrace
Performance analysis and troubleshooting using DTracePerformance analysis and troubleshooting using DTrace
Performance analysis and troubleshooting using DTraceGraeme Jenkinson
 
UGIF 12 2010 - features11.70
UGIF 12 2010 - features11.70UGIF 12 2010 - features11.70
UGIF 12 2010 - features11.70UGIF
 
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7Nicolas Desachy
 
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022Stefano Stabellini
 
From printk to QEMU: Xen/Linux Kernel debugging
From printk to QEMU: Xen/Linux Kernel debuggingFrom printk to QEMU: Xen/Linux Kernel debugging
From printk to QEMU: Xen/Linux Kernel debuggingThe Linux Foundation
 
The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations Nicola Kabar
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf ToolsRaj Pandey
 
Вениамин Гвоздиков: Особенности использования DTrace
Вениамин Гвоздиков: Особенности использования DTrace Вениамин Гвоздиков: Особенности использования DTrace
Вениамин Гвоздиков: Особенности использования DTrace Yandex
 
Surge2014 talk - illumos State of the Community & POSIX Update
Surge2014 talk - illumos State of the Community & POSIX UpdateSurge2014 talk - illumos State of the Community & POSIX Update
Surge2014 talk - illumos State of the Community & POSIX UpdateGarrett D'Amore
 
Terraforming your Infrastructure on GCP
Terraforming your Infrastructure on GCPTerraforming your Infrastructure on GCP
Terraforming your Infrastructure on GCPSamuel Chow
 
Linuxtraining 130710022121-phpapp01
Linuxtraining 130710022121-phpapp01Linuxtraining 130710022121-phpapp01
Linuxtraining 130710022121-phpapp01Chander Pandey
 

Similar a DTrace in the Non-global Zone (20)

Solaris DTrace, An Introduction
Solaris DTrace, An IntroductionSolaris DTrace, An Introduction
Solaris DTrace, An Introduction
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to Linux
 
Bsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdBsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsd
 
Rainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could ExpectRainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could Expect
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
DTrace Topics: Introduction
DTrace Topics: IntroductionDTrace Topics: Introduction
DTrace Topics: Introduction
 
Performance analysis and troubleshooting using DTrace
Performance analysis and troubleshooting using DTracePerformance analysis and troubleshooting using DTrace
Performance analysis and troubleshooting using DTrace
 
Terraform training 🎒 - Basic
Terraform training 🎒 - BasicTerraform training 🎒 - Basic
Terraform training 🎒 - Basic
 
UGIF 12 2010 - features11.70
UGIF 12 2010 - features11.70UGIF 12 2010 - features11.70
UGIF 12 2010 - features11.70
 
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
 
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
 
From printk to QEMU: Xen/Linux Kernel debugging
From printk to QEMU: Xen/Linux Kernel debuggingFrom printk to QEMU: Xen/Linux Kernel debugging
From printk to QEMU: Xen/Linux Kernel debugging
 
The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf Tools
 
Вениамин Гвоздиков: Особенности использования DTrace
Вениамин Гвоздиков: Особенности использования DTrace Вениамин Гвоздиков: Особенности использования DTrace
Вениамин Гвоздиков: Особенности использования DTrace
 
A Peek into TFRT
A Peek into TFRTA Peek into TFRT
A Peek into TFRT
 
Surge2014 talk - illumos State of the Community & POSIX Update
Surge2014 talk - illumos State of the Community & POSIX UpdateSurge2014 talk - illumos State of the Community & POSIX Update
Surge2014 talk - illumos State of the Community & POSIX Update
 
Terraforming your Infrastructure on GCP
Terraforming your Infrastructure on GCPTerraforming your Infrastructure on GCP
Terraforming your Infrastructure on GCP
 
Linuxtraining 130710022121-phpapp01
Linuxtraining 130710022121-phpapp01Linuxtraining 130710022121-phpapp01
Linuxtraining 130710022121-phpapp01
 
A22 Introduction to DTrace by Kyle Hailey
A22 Introduction to DTrace by Kyle HaileyA22 Introduction to DTrace by Kyle Hailey
A22 Introduction to DTrace by Kyle Hailey
 

Más de bcantrill

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Presentbcantrill
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmakingbcantrill
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...bcantrill
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsbcantrill
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systemsbcantrill
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolutionbcantrill
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Agebcantrill
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesbcantrill
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Lawbcantrill
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineeringbcantrill
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemapsbcantrill
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarebcantrill
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?bcantrill
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the unionbcantrill
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsbcantrill
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after darkbcantrill
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadershipbcantrill
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathbcantrill
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondbcantrill
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindbcantrill
 

Más de bcantrill (20)

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Present
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmaking
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systems
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systems
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolution
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Age
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Law
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemaps
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system software
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the union
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systems
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after dark
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data path
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
 

Último

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

DTrace in the Non-global Zone

  • 1. DTrace in the Non-global Zone Bryan Cantrill SVP Engineering, Joyent @bcantrill bryan@joyent.com
  • 2. DTrace and zones: Fraternal twins • DTrace and zones were developed in parallel during development of Solaris 10 • DTrace integrated (September 2003) before zones (early 2004) • When zones integrated, the priority was making DTrace in the global zone be able to meaningfully instrument non-global zones • DTrace in the non-global zone was hard — and a lower priority than other work on both technologies
  • 3. DTrace and zones: Basic functionality • In 2006, Dan Price (with help from Adam Leventhal and Jonathan Adams) added initial support for DTrace in the non-global zone • Allowed use of syscall provider, pid provider and (in a deranged, broken way) the profile provider • This was significant work: required modifications to both the zones privilege model and the DTrace privilege model • For example, required an implicit predicate on syscall and profile probes
  • 4. DTrace and zones in SmartOS • As the worldʼs heaviest user of zones, we at Joyent ran into (and fixed) a number of annoying bugs: • USDT probes from the non-global were not properly being enabled in the global zone (illumos#908) • Tick and profile probes did not properly fire when used in the non-global zone (illumos#1456) • Fixing the latter required an extension of the DTrace privilege model: introduced a notion of restricted operation in which args could not be referenced
  • 5. DTrace and zones in SmartOS • Other (very) annoying issues still lurked: • Inability to read “cpu” in the non-global zone • Inability to read any fields from “curlwpsinfo” and “curpsinfo”— especially “pr_dmodel” • Inability to read the “fds[]” array • Failure mode highly obnoxious: [my-non-global-zone ~]# dtrace -n BEGIN'{trace(curpsinfo->pr_psargs)}' dtrace: description 'BEGIN' matched 1 probe dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN): invalid kernel access in action #1 at DIF offset 44
  • 6. Divide and conquer • curlwpsinfo and curpsinfo both are translators over the current thread (“kthread_t”) and current process (“proc_t”) • Importantly, the state contained in oneʼs own kthread_t and proc_t: • Is safe to read while executing (threads cannot disappear out from under themselves) • Does not represent potential privilege escalation • This can be fixed by simply allowing the loads where one has privileges to the current process!
  • 7. fds[]: A magic bullet? • Somehow, I convinced myself that the problem with fds[] was the translator that translates the member accesses into kernel accesses: inline fileinfo_t fds[int fd] = xlate ( fd >= 0 && fd < t_procp->p_user.u_finfo.fi_nfiles ? curthread->t_procp->p_user.u_finfo.fi_list[fd].uf_file : NULL); • If the problem was the static translators, the solution must be dynamic translators — a(n in)famously unimplemented feature of DTrace! • After dtrace.conf(12), I realized that the expression was orthogonal to the fact that the in-kernel implementation must not allow privilege escalation
  • 8. fds[]: No magic bullets • Focussing on the implementation, allows one to consider the specifics of the fds[] case • Helped by the fact that the fi_list implementation uses memory retiring for scalability of file descriptor lookups: the array is only freed upon process exit • Assures that oneʼs own fi_list is always pointing to memory that is (or was) an array of uf_entry_t • Leaves the file_t itself, which can be freed during probe context (specifically, by another thread in the same process)
  • 9. Dealing with file_t • We can deal with this by forcing everyone out of probe context after a file_t has been removed from the uf_entry_t, but before being freed • This is done by issuing a dtrace_sync() — a synchronous (empty) cross-call to all CPUs • This is expensive, and required answering an important question: just how hot is the closef() path, anyway? • By instrumenting our guinea pigs production cloud, we could answer this concisely: closef() is pretty damned hot (> 5,000/second on some machines!)
  • 10. Adding getf() • To track when fds[] was active in the non-global zone, we added a getf() subroutine (ht: ken) • Allows us to issue the sync only when we have a closef() from a non-global zone using fds[] • Had to take the final step of cleaning up the path output to strip off the zone path from the file name (as a cleanliness issue, not a security issue) • De-mo, de-mo, de-mo!
  • 11. sched and proc providers • With fds[] done, focus turned the only meaningful impediment to DTrace in the non-global zone: enabling the sched and proc providers • Recall the restricted operation introduced for the profile provider in the non-global zone... • Used this to have limited (non-global) DTrace privileges imply restricted operation for some SDT providers • Thanks to the curlwpsinfo/curpsinfo work, these providers can be meaningfully used without access to arguments
  • 12.
  • 13. Thank you. FOR MORE INFORMATION VISIT www.joyent.com OR www.smartos.org