XPDS14: MirageOS 2.0: branch consistency for Xen Stub Domains - Anil Madhavapeddy, University of Cambridge
1. MIRAGEOS 2.0: BRANCH CONSISTENCY FOR XEN STUB DOMAINS
Dave Scott, Citrix Systems (@mugofsoup)
Thomas Gazagnaire, University of Cambridge (@eriangazag)
Anil Madhavapeddy, University of Cambridge (@avsm)
http://openmirage.org
http://decks.openmirage.org/xendevsummit14/
2. INTRODUCING MIRAGE OS 2.0
These slides were written using Mirage on OSX:
They are hosted in a 938kB Xen unikernel written in statically
type-safe OCaml, including device drivers and network stack.
Their application logic is just a couple of source files, written
independently of any OS dependencies.
Running on an ARM CubieBoard2, and hosted on the cloud.
Binaries small enough to track the entire deployment in Git!
4. NEW FEATURES IN 2.0
Mirage OS 2.0 is an important step forward, supporting more, and
more diverse, backends with much greater modularity.
For information about the new components we cannot cover here,
see openmirage.org:
Xen/ARM, for running unikernels on embedded devices.
Irmin, Git-like distributed branchable storage.
OCaml-TLS, a from-scratch native OCaml TLS stack.
Vchan, for low-latency inter-VM communication.
Ctypes, modular C foreign function bindings.
5. THIS XEN DEV SUMMIT TALK
We focus on:
how we have been using Mirage to improve the core Xenstore toolstack with Irmin.
a performance and distribution future for Xenstore.
our plans for upstreaming the patches.
But first, some background...
6. IRMIN: MIRAGE 2.0 STORAGE
Irmin is our library database that follows the modular design
principles of MirageOS: https://github.com/mirage/irmin
Runs in both userspace and kernelspace
A key = value store (sound familiar?)
Git-style: commit, branch, merge
Preserves history by default
Backend support for in-memory, Git and HTTP/REST stores.
Mirage unikernels thus version control all their data, and have a
distributed provenance graph of all activities.
7. BASE CONCEPTS
OBJECT DAG (OR THE "BLOB STORE")
Append-only and easily distributed.
Provides stable serialisation of structured values.
Backend independent storage
memory or on-disk persistence
encryption or plaintext
Position- and architecture-independent pointers,
such as SHA1 checksums of blocks.
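To make the Object DAG concrete, here is a minimal sketch of an append-only, content-addressed blob store in plain OCaml. It is not Irmin's API: the `Blob_store` module is hypothetical, and the stdlib `Digest` module (MD5) stands in for the SHA1 checksums mentioned above. Each value is keyed by a hash of its bytes, so identical content is stored once and keys act as position-independent pointers.

```ocaml
(* A toy append-only, content-addressed blob store (hypothetical, not
   Irmin's API). Digest (MD5) stands in for SHA1 checksums. *)
module Blob_store = struct
  type t = (string, string) Hashtbl.t

  let create () : t = Hashtbl.create 16

  (* Store [content] under the hex digest of its bytes and return that
     key. Adding the same content twice is a no-op. *)
  let add (store : t) (content : string) : string =
    let key = Digest.to_hex (Digest.string content) in
    if not (Hashtbl.mem store key) then Hashtbl.add store key content;
    key

  (* Dereference a key. There is deliberately no remove: append-only. *)
  let find (store : t) (key : string) : string option =
    Hashtbl.find_opt store key
end

let () =
  let store = Blob_store.create () in
  let k = Blob_store.add store "/local/domain/1/name = vm1" in
  match Blob_store.find store k with
  | Some v -> print_endline (k ^ " -> " ^ v)
  | None -> print_endline "missing"
```

Because keys depend only on content, two hosts that store the same value compute the same key, which is what makes this layer easy to distribute.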
8. BASE CONCEPTS
HISTORY DAG (OR THE "GIT STORE")
Append-only and easily distributed.
Can be stored in the Object DAG store.
Keeps track of history.
Ordered audit log of all operations.
Useful for merge (3-way merge is easier than 2-way)
Snapshots and reverting operations for free.
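A 3-way merge needs a merge base: a common ancestor of the two heads in the History DAG. The sketch below assumes a hypothetical `commit` record holding only an id and its parent ids (real Git and Irmin commits also carry a tree and metadata) and computes the lowest common ancestors by intersecting ancestor sets.

```ocaml
(* A toy History DAG: commits point at their parents by id.
   Hypothetical types; real commits also carry a tree and metadata. *)
module S = Set.Make (String)

type commit = { id : string; parents : string list }

(* Every commit reachable from [id], including [id] itself. *)
let ancestors (history : (string, commit) Hashtbl.t) (id : string) : S.t =
  let rec visit seen id =
    if S.mem id seen then seen
    else
      let seen = S.add id seen in
      match Hashtbl.find_opt history id with
      | None -> seen
      | Some c -> List.fold_left visit seen c.parents
  in
  visit S.empty id

(* Lowest common ancestors of [a] and [b]: common ancestors that are not
   a proper ancestor of another common ancestor. These are the merge
   bases for a 3-way merge. *)
let merge_bases history a b =
  let common = S.inter (ancestors history a) (ancestors history b) in
  S.filter
    (fun c ->
      not
        (S.exists (fun d -> d <> c && S.mem c (ancestors history d)) common))
    common

(* Example history: c0 <- c1, then branches a1 and b1 fork from c1. *)
let history : (string, commit) Hashtbl.t = Hashtbl.create 8

let () =
  List.iter
    (fun c -> Hashtbl.replace history c.id c)
    [ { id = "c0"; parents = [] };
      { id = "c1"; parents = [ "c0" ] };
      { id = "a1"; parents = [ "c1" ] };
      { id = "b1"; parents = [ "c1" ] } ]
```

For two branches forked from the same commit this returns the fork point (`c1` above); criss-cross histories can have several merge bases, which is why the result is a set.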
10. IRMIN TOOLING
opam update && opam install irmin
Command-line frontend that uses:
storage: in-memory format or Git
network: custom format, Git or HTTP/REST
interface: JSON interface for storing content easily
OCaml library that supplies:
merge-friendly data structures
backend implementations (Git, HTTP/REST)
11. XENSTORE: VM METADATA
Xenstore is our configuration database that stores VM metadata in
directories (à la Plan 9).
Runs in either userspace or kernelspace (just like Mirage)
A key = value store (just like Irmin)
Logs history by default (just like Irmin...)
12. XENSTORE: VM METADATA
TRANSACTION_START is a branch; TRANSACTION_END is a merge.
The "original plan" in 2002 was for seamless distribution across
hosts/clusters/clouds. What happened? Unfortunately the
previous transaction implementations all suck.
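Treating a transaction as a branch can be sketched with immutable maps: TRANSACTION_START snapshots the root (free, since the map is immutable) and TRANSACTION_END 3-way merges the transaction's copy back against that snapshot. This is a simplified model, not the Irmin Xenstore code; `merge3`, `transaction_start` and `transaction_end` are hypothetical names, and a key conflicts only if both sides changed it to different values.

```ocaml
module M = Map.Make (String)

(* Three-way merge of key=value snapshots against their common origin.
   A key conflicts only if both sides changed it to different values.
   Returns [Error key] for the first conflicting key. *)
let merge3 ~origin ~ours ~theirs =
  let keys =
    List.sort_uniq compare
      (List.map fst (M.bindings origin @ M.bindings ours @ M.bindings theirs))
  in
  List.fold_left
    (fun acc k ->
      match acc with
      | Error _ -> acc
      | Ok m ->
          let o = M.find_opt k origin
          and a = M.find_opt k ours
          and b = M.find_opt k theirs in
          (* Keep [v] for this key; [None] means the key is deleted. *)
          let pick v = Ok (match v with None -> m | Some x -> M.add k x m) in
          if a = b then pick a (* same result on both sides *)
          else if a = o then pick b (* only theirs changed it *)
          else if b = o then pick a (* only ours changed it *)
          else Error k)
    (Ok M.empty) keys

(* TRANSACTION_START: the "branch" is just the current root. Maps are
   immutable, so this snapshot costs nothing. *)
type store = { mutable root : string M.t }

let transaction_start store = store.root

(* TRANSACTION_END: merge the transaction's copy back into the root. *)
let transaction_end store ~origin ~txn =
  match merge3 ~origin ~ours:txn ~theirs:store.root with
  | Ok merged ->
      store.root <- merged;
      true
  | Error _ -> false
```

Two transactions that start from the same snapshot and each create a different domain (say `/local/domain/1/name` and `/local/domain/2/name`) both commit without a retry, because neither touches a key the other changed.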
13. XENSTORE: CONFLICTS
Terrible performance impact: a transaction can involve 100 RPCs
to set it up (one per read/write op), only to be aborted and retried.
Longer-lived transactions are more likely to conflict than shorter
ones, and each conflict means repeating the whole long transaction.
Concurrent transactions can lead to live-lock:
Try starting lots of VMs in parallel!
Much time was spent removing transactions (from xend)
14. XENSTORE: CONFLICTS
Conflicts between Xenstore transactions are so
devastating that we try hard to avoid transactions
altogether. However, they aren't going away.
15. XENSTORE: CONFLICTS
Observe: typical Xenstore transactions (e.g. creating domains)
shouldn't conflict; the conflicts come from a flawed merging algorithm.
If we were managing domain configurations in Git, we
would simply merge or rebase and it would work.
Therefore the Irmin Xenstore simply does:
DB.View.merge_path ~origin db [] transaction >>= function
| `Ok () -> return true
| `Conflict msg ->
  (* if merge doesn't work, try rebase *)
  DB.View.rebase_path ~origin db [] transaction >>= function
  | `Ok () -> return true
  | `Conflict msg ->
    (* A true conflict: tell the client *)
    ...
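The merge-then-rebase fallback above can be modelled without Lwt or Irmin. In this sketch, which is hypothetical throughout, a transaction is recorded as a log of reads and writes; `merge` is a deliberately simple fast-forward that only succeeds if nothing else has committed, and `rebase` replays the log on the new head, failing only if a value the transaction read has since changed. Neither reproduces the actual semantics of `DB.View.merge_path` or `DB.View.rebase_path`; only the control flow of `commit` mirrors the slide.

```ocaml
module M = Map.Make (String)

(* Hypothetical transaction log: each read records the value it saw. *)
type op = Read of string * string option | Write of string * string

(* Apply the transaction's writes on top of [base]. *)
let apply ops base =
  List.fold_left
    (fun m -> function Read _ -> m | Write (k, v) -> M.add k v m)
    base ops

(* Stand-in merge: a fast-forward that only succeeds if nothing else
   has committed since the transaction started. *)
let merge ~origin ~head ops =
  if M.equal String.equal origin head then `Ok (apply ops head)
  else `Conflict "head has moved"

(* Stand-in rebase: replay the log on the new head, failing only if a
   read would now see a different value. *)
let rebase ~head ops =
  List.fold_left
    (fun acc op ->
      match (acc, op) with
      | `Conflict _, _ -> acc
      | `Ok m, Read (k, seen) ->
          if M.find_opt k m = seen then acc
          else `Conflict ("stale read of " ^ k)
      | `Ok m, Write (k, v) -> `Ok (M.add k v m))
    (`Ok head) ops

(* The fallback from the slide: merge, else rebase, else report. *)
let commit ~origin ~head ops =
  match merge ~origin ~head ops with
  | `Ok m -> Ok m
  | `Conflict _ -> (
      (* if merge doesn't work, try rebase *)
      match rebase ~head ops with
      | `Ok m -> Ok m
      | `Conflict msg -> Error msg (* a true conflict: tell the client *))
```

A transaction that read `x` and wrote `z` still commits after someone else added `y`, since its read is unchanged; one that read `y` as absent is a true conflict and is reported to the client.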
17. XENSTORE: TRANSACTIONS
Big transactions give you high-level intent
useful for debug and tracing
minimise merge commits (1 per transaction)
minimise backend I/O (1 op per commit)
a crash during a transaction can tell the client to "abort and retry"
Solving the performance problems with big
transactions in previous implementations greatly
improves the overall health of Xenstore.
18. XENSTORE: RELIABILITY
What happens if Xenstore crashes?
Rings full of partially read/written packets. No reconnection
protocol in common use.
a proposal exists on xen-devel, but it will be years before we can rely on it
Per-connection state in Xenstore:
watch registrations, pending watch events
If Xenstore is restarted, many of the rings will be broken
... you'll probably have to reboot the host
19. XENSTORE: RELIABILITY
Irmin to the rescue!
Data structure libraries built on top of Irmin, for example
mergeable queues. Use these for (e.g.) pending watch events.
We can persist partially read/written packets so that fragments can
be recovered across a restart
We can persist connection information (i.e. ring information
from an Introduce) and auto-reconnect on start
Added bonus: state is easy to introspect via xenstore-ls; you can
see each registered watch, queue, etc.
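A mergeable queue for pending watch events can be sketched as a 3-way merge of lists. The assumptions here are ours, not Irmin's: every element is unique (e.g. tagged with a sequence number), and each branch only pops from the front and pushes at the back. An ancestor element survives only if neither branch popped it; new elements from both branches are then appended, ours first. `merge_queue` is a hypothetical name.

```ocaml
(* 3-way merge for a FIFO queue, e.g. of pending watch events.
   Assumes (hypothetically) that elements are unique and that each
   branch only pops from the front and pushes at the back. *)
let merge_queue ~ancestor ~ours ~theirs =
  (* An ancestor element survives only if neither branch popped it. *)
  let kept =
    List.filter (fun e -> List.mem e ours && List.mem e theirs) ancestor
  in
  (* Anything absent from the ancestor was pushed by that branch. *)
  let pushed branch = List.filter (fun e -> not (List.mem e ancestor)) branch in
  (* Deterministic order: surviving elements, then ours, then theirs. *)
  kept @ pushed ours @ pushed theirs
```

With ancestor `[1; 2; 3]`, one branch that popped 1 and pushed 4 (`[2; 3; 4]`) and another that popped 1 and 2 and pushed 5 (`[3; 5]`), the merge is `[3; 4; 5]`. `List.mem` makes this quadratic, which is fine for a sketch but not for long queues.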
20. XENSTORE: TRACING
When a bug is reported, the normal procedure is:
stare at Xenstore logs for a very long time
slowly deduce the state at the time the bug manifested
(swearing and cursing is strictly optional)
With Irmin+Xenstore, one can simply:
git checkout the relevant revision
Inspect the state with ls
In the future: git bisect automation!
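Bisect automation reduces to a binary search over the history. This sketch assumes the snapshots are already in an array ordered oldest to newest, that the first is good and the last is bad, and that the bug persists once introduced; `bisect` and `is_bad` are hypothetical names, where `is_bad` would inspect a checked-out Xenstore state.

```ocaml
(* Binary search for the first bad snapshot in a linear history.
   Assumes: history is ordered oldest to newest, has at least two
   entries, history.(0) is good, the last entry is bad, and once a
   snapshot is bad every later one is bad too. *)
let bisect (history : 'a array) (is_bad : 'a -> bool) : int =
  (* Invariant: history.(lo) is good and history.(hi) is bad. *)
  let rec go lo hi =
    if hi - lo <= 1 then hi
    else
      let mid = lo + ((hi - lo) / 2) in
      if is_bad history.(mid) then go lo mid else go mid hi
  in
  go 0 (Array.length history - 1)
```

The search needs O(log n) calls to `is_bad`, so a history of 1,000 commits takes about 10 checks.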
22. XENSTORE: DATA STORAGE
Xenstore contains VM metadata (/vm) and domain metadata
(/local/domain)
But VM metadata is duplicated elsewhere and copied in/out
xl config files and the xapi database
(insert cloud toolstack here)
With current daemons, it is unwise to persist large data.
What if Xenstore could store and distribute this
data efficiently, and if application data could be
persisted reliably?
23. XENSTORE: THE DATA
Irmin to the rescue!
Check in VM metadata to Irmin
clone, pull and push to move between hosts
expose to host via FUSE, for Plan9 filesystem goodness
maybe one day even echo start > VM/uuid/ctl
FUSE code at
https://github.com/dsheets/profuse
VM data could be checked in to Irmin
very important for unikernels that have no native storage
24. XENSTORE: UPSTREAMING
An advanced prototype exists using Mirage libraries, but it doesn't yet
fully pass the unit test suite. Before upstreaming:
Write a fixed-size backend for block devices
Preserving history is a good default, but history does need to
be squashed from time to time.
Upstream patches:
switch to using opam to build Xenstore
reproducible builds via a custom Xen remote
allows using modern OCaml libraries (Lwt, Mirage, etc...)
In Xapi, delete the existing database and replace it with Xenstore 2.0
25. XENSTORE: CODE
Prototype+unit tests at:
(can build without Xen on MacOS X now)
https://github.com/mirage/ocaml-xenstore-server
opam init --comp=4.01.0
eval `opam config env`
opam pin irmin git://github.com/mirage/irmin
opam install xenstore irmin shared-memory-ring xen-evtchn io-page
git clone git://github.com/mirage/ocaml-xenstore-server
cd ocaml-xenstore-server
make
./main.native --enable-unix --path /tmp/test-socket --database /tmp/db&
./cli.native -path /tmp/test-socket write foo=bar
./cli.native -path /tmp/test-socket read foo
cd /tmp/db; git log
26. HTTP://OPENMIRAGE.ORG/
Featuring blog posts about Mirage OS 2.0 by:
Amir Chaudhry, Thomas Gazagnaire, David Kaloper, Thomas Leonard,
Jon Ludlam, Hannes Mehnert, Mindy Preston, Dave Scott, and Jeremy Yallop.
Mindy Preston and Jyotsna Prakash from OPW/GSoC will also be
talking about their projects in the community panel!
More Irmin+Xenstore posts with details:
Introduction to Irmin
Using Irmin to add fault-tolerance to Xenstore