VirtFS Ols2010
- 1. IBM Linux Technology Center
VirtFS
A virtualization aware File System pass-through
Venkateswararao Jujjuri (JV)
jvrao@us.ibm.com
Linux Symposium 2010
© 2010 IBM Corporation
- 2. IBM Linux Technology Center
Paravirtual Applications and System Services
Move virtualization intelligence up into system
services.
Being explored by research and academic communities
but largely ignored by the mainline.
Provides a hybrid environment that leverages security,
isolation, and performance.
Visibility into guest operations allows the hypervisor to offer
a variety of use cases.
Desktops, network sharing, file systems
Avoids a layer of indirection and boosts performance.
Adding this to the existing device virtualization takes
virtualization to the next level.
- 3. IBM Linux Technology Center
Paravirtual File Systems
Good target as an entry into paravirtual system
services.
Virtual storage in the form of virtual disks.
Can't be shared between multiple guests.
Redundant caching
Unnecessary indirection between FS and block
layer.
Using traditional distributed/network file systems over
virtualized network device.
Configuration, management and encapsulation
overheads.
Double caching.
Different semantics for different File Systems.
- 4. IBM Linux Technology Center
Use cases of Paravirtual File Systems
Replace virtual disk as the root filesystem.
Rapid cloning, Easy management, secure.
Sharing between host and guests.
Offer file system services to thin clients like LibraryOS.
Cloud computing
Secure window of host file system on the guest.
Different portions of the same file system shared
among different guests.
Knowledge about the guest activity enables the
hypervisor to offer services like de-dup, snapshots,
etc.
Better utilization of system resources.
- 5. IBM Linux Technology Center
VirtFS
Paravirtual file system pass-through between the KVM
host and guest.
Uses 9P Protocol between Client and Server.
9P2000.L protocol is being developed/defined as part of
this effort.
Server is part of QEMU and uses VirtIO transport.
File System is exported to the guest at the invocation of
QEMU.
Client is part of the Guest Kernel.
Mounted on the guest with the mount tag defined during
the QEMU invocation.
- 6. IBM Linux Technology Center
Plan 9 Overview
Plan 9 OS was developed at AT&T Bell Laboratories
(now Alcatel-Lucent).
Intention was to address Unix shortcomings
Seamless distributed system with integrated secure
network resource sharing.
Three core design principles
Single set of simple, well-defined interfaces to services.
Simple protocol to securely distribute the interfaces
across any network
Dynamic hierarchical structure to organize these
interfaces.
Unix pioneered the concept of treating devices like files,
Plan 9 took the metaphor further by using file
operations as the simple well-defined interfaces to all
system and application services.
- 7. IBM Linux Technology Center
9P Overview
9P represents the protocol/abstract interface used to
access resources under Plan 9.
Any transport can be used. The only requirement is it
should be a reliable, in-order transport.
Made it into Linux kernel 2.6.14 and received major changes
in 2.6.24.
Part of Linux mainline with VirtIO transport support.
9P2000.u extension
To accommodate POSIX, the protocol was extended with the
9P2000.u version during the Linux port.
Provided support for numeric uid/gid, extended
operations to support symlinks, links, special files etc.
Did not include full support for Linux operations.
- 8. IBM Linux Technology Center
9P2000.L Protocol extension
Aimed at addressing 9P2000.u protocol deficiencies
while keeping the core protocol elements intact.
New opcodes that match the Linux VFS API in a
complementary name space.
Linux native data formats (stat/permissions, etc)
Support of xattr, locking, quotas etc.
Legacy protocol versions and 9P2000.L co-exist, with no changes to
the existing operations.
Protocol version is negotiated during the initial handshake.
If server doesn't support that version, client falls back.
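The negotiation and fallback described above can be sketched roughly as follows. This is a simplified model, not the v9fs client code: the helper names (`pack_tversion`, `server_reply`, `negotiate`) are hypothetical, and the retry loop over a preference list is an assumption about how a client might fall back. The wire layout (size[4] type[1] tag[2] msize[4] version[s], little-endian) and the "unknown" reply follow the base 9P2000 version message.

```python
import struct

TVERSION = 100   # 9P message type for a version request
NOTAG = 0xFFFF   # tag value reserved for version messages


def pack_tversion(msize: int, version: str) -> bytes:
    """Encode a Tversion message: size[4] type[1] tag[2] msize[4] version[s]."""
    v = version.encode("utf-8")
    # A 9P string is a 2-byte length followed by the bytes themselves.
    body = struct.pack("<BHI", TVERSION, NOTAG, msize) + struct.pack("<H", len(v)) + v
    # The leading size field counts itself (4 bytes) plus the body.
    return struct.pack("<I", 4 + len(body)) + body


def server_reply(requested: str, supported: set) -> str:
    """Hypothetical server: echo the version if supported, else 'unknown'."""
    return requested if requested in supported else "unknown"


def negotiate(preferences: list, supported: set):
    """Try each client preference in order; return the first one accepted."""
    for version in preferences:
        if server_reply(version, supported) == version:
            return version
    return None  # no common version


client_prefs = ["9P2000.L", "9P2000.u", "9P2000"]
legacy_server = {"9P2000.u", "9P2000"}
chosen = negotiate(client_prefs, legacy_server)
```

With a server that only knows the legacy versions, the client in this sketch ends up on 9P2000.u instead of 9P2000.L.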
- 9. IBM Linux Technology Center
KVM and QEMU
Kernel based Virtual Machine - KVM
Is a full virtualization solution for Linux on x86 hardware
containing virtualization extensions (Intel VT-x / AMD-V).
A set of Linux kernel modules (kvm.ko, plus kvm-intel.ko or
kvm-amd.ko) offers a special process mode to user-space
processes.
Quick EMUlator – QEMU
Uses interfaces provided by KVM to offer full system
virtualization.
Emulates standard PC hardware such as IDE disk, VGA
graphics, PCI devices etc.
Any I/O requests a guest OS makes are intercepted and
routed to the user mode to be emulated by the QEMU
process.
- 10. IBM Linux Technology Center
VirtIO Transport
A paravirtual IO bus based on hypervisor neutral DMA
API.
Offers lockless ring queues between the guest and the
host to enable zero-copy bulk data transfer.
VirtIO PCI transport allows VirtFS to be implemented in
such a way that guest-driven I/O operations can be
zero-copy.
- 11. IBM Linux Technology Center
VirtFS Block Diagram
[Block diagram. Guest side: Apps on Guest, VFS Interface, VirtFS (v9fs) Client in the Guest Kernel. VirtIO Ring between guest and host. Host side: VirtFS Server (v9fs server in QEMU) in Host User Space, FS API, VFS Interface and File System in the HOST KERNEL, HARDWARE.]
- 12. IBM Linux Technology Center
VirtFS Implementation
KVM, QEMU, and VirtIO present an ideal platform for the
VirtFS server.
Two types of virtual devices
virtio-9p-pci, used to transport protocol messages and data
between host and the guest.
fsdev, used to define the exported file system characteristics,
such as fs type and security model.
Command line options
-fsdev local,id=myid,path=/share_path/,security_model=mapped
-device virtio-9p-pci,fsdev=myid,mount_tag=v_tag1
-virtfs local,path=/share_path/,security_model=passthrough,mnt_tag=v_tag2
On Client
mount -t 9p -o trans=virtio -o version=9p2000.L v_tag1 /mnt
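As a rough illustration of how the two devices fit together, the sketch below assembles the `-fsdev`/`-device` pair and the matching guest mount command as argument lists. The function names and the `fsdev_id` default are hypothetical; the option strings themselves are copied from the slide.

```python
SECURITY_MODELS = ("mapped", "passthrough")  # the two models named in the talk


def virtfs_args(path, mount_tag, security_model, fsdev_id="myid"):
    """Build the QEMU arguments that export `path` under `mount_tag`."""
    if security_model not in SECURITY_MODELS:
        raise ValueError(f"unknown security model: {security_model}")
    # fsdev describes the exported file system (type, path, security model).
    fsdev = f"local,id={fsdev_id},path={path},security_model={security_model}"
    # virtio-9p-pci is the transport device; it refers to the fsdev by id.
    device = f"virtio-9p-pci,fsdev={fsdev_id},mount_tag={mount_tag}"
    return ["-fsdev", fsdev, "-device", device]


def guest_mount_cmd(mount_tag, mount_point):
    """Build the guest-side mount command for the given mount tag."""
    return ["mount", "-t", "9p", "-o", "trans=virtio",
            "-o", "version=9p2000.L", mount_tag, mount_point]


host_args = virtfs_args("/share_path/", "v_tag1", "mapped")
guest_cmd = guest_mount_cmd("v_tag1", "/mnt")
```

The mount tag is the only piece of information the guest needs; the host path and security model stay on the QEMU side.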
- 13. IBM Linux Technology Center
Security
Two models of security enforcement
One with complete isolation of guest user domain from
that of the host.
Eliminates the need for root squashing
No setuid/setgid exposures.
Complete isolation enhances security.
Not very portable.
Other model shares user domains between host and the
guests.
Follows the traditional network file system model.
If not careful, it is susceptible to security holes.
Client based security enforcement.
Server makes sure that the client control never crosses
the exported portion.
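One way a server could keep client requests inside the exported portion is a path confinement check like the sketch below. This is an assumption about the general technique, not the QEMU implementation: `resolve_in_export` is a hypothetical helper, and the check is purely lexical, so symlinks inside the export would still need separate handling.

```python
import posixpath


def resolve_in_export(export_root: str, client_path: str) -> str:
    """Map a client-supplied path into the export, rejecting escapes."""
    root = posixpath.normpath(export_root)
    # Treat the client path as relative to the export root, then collapse
    # any "." and ".." components lexically.
    candidate = posixpath.normpath(posixpath.join(root, client_path.lstrip("/")))
    # If the common prefix is not the root itself, ".." walked out of it.
    if posixpath.commonpath([root, candidate]) != root:
        raise PermissionError(f"{client_path!r} escapes the export")
    return candidate


inside = resolve_in_export("/share_path/", "docs/../notes.txt")
```

A request such as `../etc/passwd` raises `PermissionError` in this sketch instead of reaching the host file system.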
- 14. IBM Linux Technology Center
Security Model - Mapped
VirtFS server intercepts and maps all file object create
and get/set attribute requests from client.
Files are created with VirtFS server's user credentials.
Client user credentials are stored in extended
attributes.
Extended user attributes are allowed for regular files
and directories only.
For special files, corresponding regular files are created
on file server and appropriate mode bits are added to
extended attributes.
This enhances security.
Guest user domain is completely isolated from host's.
Symlinks can't be followed locally on the file server.
File system will be VirtFS'ized.
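The mapped model can be illustrated with a small in-memory sketch. It does not touch a real file system: `HostFile`, `mapped_create`, and `mapped_getattr` are hypothetical names, the server uid/gid values are made up, and the attribute keys (`user.virtfs.uid` and so on) are an assumption about naming rather than something stated on the slide. Symlinks are left out.

```python
import stat
from dataclasses import dataclass, field

SERVER_UID, SERVER_GID = 500, 500  # hypothetical credentials of the VirtFS server


@dataclass
class HostFile:
    """What the host file system would record for one object."""
    uid: int
    gid: int
    mode: int
    xattrs: dict = field(default_factory=dict)


def mapped_create(guest_uid: int, guest_gid: int, guest_mode: int) -> HostFile:
    """Create the host object with server credentials; stash guest ones in xattrs."""
    if stat.S_ISDIR(guest_mode):
        host_mode = stat.S_IFDIR | 0o700
    else:
        # Special files (FIFOs, sockets, devices) become regular files on the host.
        host_mode = stat.S_IFREG | 0o600
    f = HostFile(SERVER_UID, SERVER_GID, host_mode)
    f.xattrs["user.virtfs.uid"] = guest_uid
    f.xattrs["user.virtfs.gid"] = guest_gid
    f.xattrs["user.virtfs.mode"] = guest_mode  # includes the file-type bits
    return f


def mapped_getattr(f: HostFile):
    """Return (uid, gid, mode) as the guest should see them."""
    return (f.xattrs["user.virtfs.uid"],
            f.xattrs["user.virtfs.gid"],
            f.xattrs["user.virtfs.mode"])


fifo = mapped_create(1000, 1000, stat.S_IFIFO | 0o644)
host_view = stat.filemode(fifo.mode)
guest_view = stat.filemode(mapped_getattr(fifo)[2])
```

Here the host sees a plain `-rw-------` file owned by the server, while the guest sees a `prw-r--r--` FIFO owned by its own user.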
- 15. IBM Linux Technology Center
Security Model – Mapped (Cont...)
On Host (ls -l output)
drwx------. 2 virfsuid virtfsgid 4096 2010-05-11 09:19 adir
-rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:36 afifo
-rw-------. 2 virfsuid virtfsgid 0 2010-05-11 09:19 afile
-rw-------. 2 virfsuid virtfsgid 0 2010-05-11 09:19 alink
-rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:57 asocket1
-rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:32 blkdev
-rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:33 chardev
-rw-------. 1 root root 6 2010-05-11 09:20 asymlink
On Guest (ls -l output)
drwxr-xr-x 2 guestuser guestuser 4096 2010-05-11 12:19 adir
prw-r--r-- 1 guestuser guestuser 0 2010-05-11 12:36 afifo
-rw-r--r-- 2 guestuser guestuser 0 2010-05-11 12:19 afile
-rw-r--r-- 2 guestuser guestuser 0 2010-05-11 12:19 alink
srwxr-xr-x 1 guestuser guestuser 0 2010-05-11 12:57 asocket1
brw-r--r-- 1 guestuser guestuser 0, 0 2010-05-11 12:32 blkdev
crw-r--r-- 1 guestuser guestuser 4, 5 2010-05-11 12:33 chardev
lrwxrwxrwx 1 root root 6 2010-05-11 12:20 asymlink -> afile
- 16. IBM Linux Technology Center
Security Model – Passthrough
All the requests are passed directly to underlying file
system without any interception.
File system objects on the fileserver will be created with
client-user's credentials.
Two methods to do this:
setuid/setgid during the creation.
chmod/chown immediately after creation.
All special files are created as-is.
Portable between NFS/CIFS.
Susceptible to security issues.
Client root can create files on the fileserver with root
privileges if fileserver is running as root.
Symlinks can be followed locally.
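The second method listed above (chmod/chown immediately after creation) might look like the sketch below. `create_passthrough` is a hypothetical helper, not the QEMU server code; changing ownership to an arbitrary client uid only works when the server has the privilege to do so, which is exactly why this model is exposed to a guest root user when the server runs as root.

```python
import os


def create_passthrough(path: str, client_uid: int, client_gid: int, mode: int) -> None:
    """Create a file, then hand it to the client's uid/gid on the host."""
    # O_EXCL makes creation fail if the name already exists.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
    try:
        # Ownership change right after creation (needs privilege for foreign ids).
        os.fchown(fd, client_uid, client_gid)
        # Reapply the mode because the process umask may have cleared bits.
        os.fchmod(fd, mode)
    finally:
        os.close(fd)
```

Between `os.open` and `os.fchown` the file briefly belongs to the server's own user, which is the main difference from the setuid/setgid-before-create method.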
- 17. IBM Linux Technology Center
Security Model – Passthrough (Cont...)
On Host
# grep 611 /etc/passwd
hostuser:x:611:611::/home/hostuser:/bin/bash
# ls -l
-rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 file1
-rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 link1
srwxrwxr-x. 1 hostuser hostuser 0 2010-05-12 18:27 mysock
lrwxrwxrwx. 1 hostuser hostuser 5 2010-05-12 18:25 symlink1 -> file1
On Guest
$ grep 611 /etc/passwd
guestuser:x:611:611::/home/guestuser:/bin/bash
$ ls -l
-rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 file1
-rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 link1
srwxrwxr-x 1 guestuser guestuser 0 2010-05-12 21:27 mysock
lrwxrwxrwx 1 guestuser guestuser 5 2010-05-12 21:25 symlink1 -> file1
- 18. IBM Linux Technology Center
ACL Implementation
Access Control Lists (ACLs) allow fine grained control.
No universal standards.
Linux offers POSIX ACLs, but they are not versatile/rich
enough to support NFSv4.
Rich ACL patch set for Linux is on the mailing list.
Strategy for VirtFS
Enforcement at client.
Support only one ACL model.
Start with POSIX ACLs
Help Rich ACLs make it into the mainline.
Convert to Rich ACLs once they are available in
mainline.
- 19. IBM Linux Technology Center
Where are we?
VirtFS server is in QEMU mainline.
Security model patchset has been accepted into QEMU
mainline as part of QEMU 0.13.
Several patches have made it into mainline Linux.
Fedora 13 and Ubuntu Lucid mount VirtFS (9P2000.u).
Making good progress on 9P2000.L. Implemented all
the VFS calls required to satisfy the Tuxera POSIX test
suite. These patches are either on the list or have already
been accepted.
A patchset to generalize the worker thread infrastructure in
QEMU is in mainline. Working on converting the current
single-threaded server into a multi-threaded one using that
infrastructure.
Working on POSIX ACLs and byte-range lock
implementation.
- 20. IBM Linux Technology Center
Performance Tests
Plain and simple: dd to compare.
Write
dd if=/dev/zero of=/mnt/fileX bs=<blocksize> count=<count>
Read
dd if=/mnt/fileX of=/dev/null bs=<blocksize> count=<count>
blocksize = 8k, 64k, 2M.
count = number of blocks needed for 8GB worth of I/O.
All tests are conducted on the guest.
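The block counts for each run follow directly from the block size. The sketch below works them out and fills in the dd templates, assuming "8GB" and the block sizes are meant in binary units (GiB, KiB, MiB); the function names are hypothetical.

```python
TOTAL_BYTES = 8 * 1024**3  # 8 GB of I/O per run, assumed to mean 8 GiB

# Block sizes from the slide, in bytes (binary units assumed).
BLOCK_SIZES = {"8k": 8 * 1024, "64k": 64 * 1024, "2M": 2 * 1024**2}


def dd_count(block_bytes: int) -> int:
    """Number of blocks needed to move TOTAL_BYTES at this block size."""
    return TOTAL_BYTES // block_bytes


def dd_write_cmd(path: str, label: str) -> str:
    """Fill in the write template for one block size."""
    return f"dd if=/dev/zero of={path} bs={label} count={dd_count(BLOCK_SIZES[label])}"


def dd_read_cmd(path: str, label: str) -> str:
    """Fill in the read template for one block size."""
    return f"dd if={path} of=/dev/null bs={label} count={dd_count(BLOCK_SIZES[label])}"


counts = {label: dd_count(size) for label, size in BLOCK_SIZES.items()}
# counts == {"8k": 1048576, "64k": 131072, "2M": 4096}
```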
- 21. IBM Linux Technology Center
Comparison with NFS and CIFS
[Charts: Sequential Read; Sequential Write]
- 22. IBM Linux Technology Center
Comparison with blockdev
[Charts: Sequential Read; Sequential Write]
- 23. IBM Linux Technology Center
Next Steps
Fully Linux-compliant, complete 9P2000.L protocol.
ACL implementation.
Page Cache sharing between host and guest(s)
dcache sharing between host and guest(s)
Interfacing with other filesystem APIs.
Enable consistent caching.
NFS and CIFS exportability.
Making it a rootfs for guests instead of using root
volumes
Ongoing stability, scalability, and performance
improvements.
- 24. IBM Linux Technology Center
Conclusions
Huge Potential for specialized filesystems in the
virtualization space.
Growth in the cloud space will be a major catalyst.
A lot of scope for innovation.
A step towards paravirtual system services.
Related Work
XenFS
Shared Folders(VMHGFS)
Lguest 9P support (virtio gateway to spfs)
Plan 9 Kernel KVM and Lguest Support (Sandia)
Foundation: Venti for storage, content-addressable back-end
for VMware (MIT)