
PyParis 2017 / Camisole : A secure online sandbox to grade student - Antoine Pietri


PyParis 2017
http://pyparis.org


  1. 1. Camisole A secure online sandbox to grade students
  2. 2. Context: Prologin
     ● French national programming contest for students under 20
     ● Online qualification with algorithmic exercises
     ● Thousands of applications every year
     ● C, C++, C#, Python, Haskell, OCaml, Java, PHP, …
     https://prologin.org
  3. 3. Problem: secure untrusted code evaluation
     (We are lazy and we want to grade our students without looking at their code.)
     Aimed at: teachers, programming contests, learning websites
     Design goals:
     ● Simple enough to be used by everyone (teachers, developers, tinkerers…)
     ● Fast and precise (overhead matters in programming contests)
     ● Secure enough to be used on public websites (or with malicious students)
     ● Abstract the languages in a modular way
  4. 4. HTTP/JSON interface
         $ curl camisole/run -d '{"lang": "python", "source": "print(42)"}'
         {
           "success": true,
           "tests": [
             {
               "exitcode": 0,
               "meta": { ... },
               "name": "test000",
               "stderr": "",
               "stdout": "42\n"
             }
           ]
         }
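The curl call on this slide can be reproduced from Python with the standard library alone. The following is a minimal sketch, not an official client: the URL (host and port) and the `Content-Type` header are assumptions, and `build_request` and `run` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: a camisole instance is reachable at this URL; adjust to your setup.
CAMISOLE_URL = "http://localhost:8080/run"


def build_request(lang, source, url=CAMISOLE_URL):
    """Build (but do not send) the POST request for a /run call."""
    payload = json.dumps({"lang": lang, "source": source}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        # Assumption: the slide's curl call does not set this header explicitly.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(lang, source, url=CAMISOLE_URL):
    """Send the request and decode the JSON response into a dict."""
    with urllib.request.urlopen(build_request(lang, source, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `run("python", "print(42)")` should return a dictionary shaped like the response shown on the slide.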
  5. 5. Limits and quotas
         {
           "lang": "ocaml",
           "source": "print_string \"Hello, world!\\n\"",
           "compile": { "wall-time": 10 },
           "execute": { "time": 2, "wall-time": 5, "processes": 1, "mem": 100000 }
         }
     ● User time
     ● Wall time
     ● Memory
     ● Stack size
     ● Number of processes/threads
     ● Size of files created
     ● Filesystem blocks
     ● Filesystem inodes
     ● … possibly more?
  6. 6. Test suite
     Statement: “Write a program that outputs twice its input.”
     Request:
         {
           "lang": "python",
           "source": "print(int(input()) * 2)",
           "tests": [{"name": "test_h2g2", "stdin": "42"},
                     {"name": "test_notfound", "stdin": "404"},
                     {"name": "test_leet", "stdin": "1337"},
                     {"name": "test_666", "stdin": "27972"}]
         }
     Response:
         {
           "success": true,
           "tests": [
             { "exitcode": 0, "meta": { ... }, "name": "test_h2g2", "stderr": "", "stdout": "84\n" },
             { "exitcode": 0, "meta": { ... }, "name": "test_notfound", "stderr": "", "stdout": "808\n" },
             { "exitcode": 0, "meta": { ... }, "name": "test_leet", "stderr": "", "stdout": "2674\n" },
             { "exitcode": 0, "meta": { ... }, "name": "test_666", "stderr": "", "stdout": "55944\n" }
           ]
         }
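Camisole only runs the tests. Deciding whether each output is correct is left to the calling application. A minimal grading sketch, assuming the response shape shown on this slide: `grade` is a hypothetical helper, and ignoring leading and trailing whitespace is a policy choice made here, not something the slides prescribe.

```python
def grade(response, expected):
    """Map each test name to True if it exited with code 0 and printed the expected output."""
    verdicts = {}
    for test in response.get("tests", []):
        name = test["name"]
        verdicts[name] = (
            name in expected
            and test["exitcode"] == 0
            # Assumption: surrounding whitespace is not significant.
            and test["stdout"].strip() == expected[name].strip()
        )
    return verdicts


# Response trimmed to two of the tests from the slide.
sample_response = {
    "success": True,
    "tests": [
        {"exitcode": 0, "meta": {}, "name": "test_h2g2", "stderr": "", "stdout": "84\n"},
        {"exitcode": 0, "meta": {}, "name": "test_leet", "stderr": "", "stdout": "2674\n"},
    ],
}
expected_outputs = {"test_h2g2": "84", "test_leet": "2675"}  # second value is deliberately wrong
verdicts = grade(sample_response, expected_outputs)
```

Here `verdicts` marks `test_h2g2` as passed and `test_leet` as failed, because the expected value for the latter was set wrong on purpose.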
  7. 7. Metadata
         {
           "success": true,
           "tests": [
             {
               "exitcode": 0,
               "meta": {
                 "cg-mem": 2408,
                 "csw-forced": 9,
                 "csw-voluntary": 2,
                 "exitcode": 0,
                 "exitsig": null,
                 "killed": false,
                 "max-rss": 6628,
                 "message": null,
                 "status": "OK",
                 "time": 0.009,
                 "time-wall": 0.028
               },
               "name": "test000",
               "stderr": "",
               "stdout": "42\n"
             }
           ]
         }
     ● Time
     ● Wall time
     ● Memory of the cgroup
     ● Context switches
     ● Exit code
     ● Signal received
     ● Killed or exited successfully
     ● Max resident set size
     ● … possibly more?
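The `meta` object can be aggregated across tests, for example to display the slowest run of a submission. This sketch uses only the fields visible on this slide. `summarize` is a hypothetical helper, and treating `"status": "OK"` plus `"killed": false` as success is an assumption.

```python
def summarize(response):
    """Aggregate per-test metadata into a few submission-level figures."""
    metas = [test["meta"] for test in response.get("tests", [])]
    return {
        # Assumption: a clean run has status "OK" and was not killed.
        "all_ok": all(m["status"] == "OK" and not m["killed"] for m in metas),
        "max_time": max((m["time"] for m in metas), default=0.0),
        "max_wall_time": max((m["time-wall"] for m in metas), default=0.0),
        "max_rss": max((m["max-rss"] for m in metas), default=0),
    }


# The single test from the slide, with only the fields used above.
slide_response = {
    "success": True,
    "tests": [
        {
            "exitcode": 0,
            "meta": {"killed": False, "max-rss": 6628, "status": "OK",
                     "time": 0.009, "time-wall": 0.028},
            "name": "test000",
            "stderr": "",
            "stdout": "42\n",
        }
    ],
}
summary = summarize(slide_response)
```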
  8. 8. Front-end integration: programming contest
  9. 9. Front-end integration: online course * (* not actually using camisole, but could… :-))
  10. 10. Architecture
      [Diagram. Labels: User application, HTTP/JSON API, Camisole, Isolation backend, Virtual machine, Sandbox, Untrusted program]
  11. 11. Isolation backend
      Solutions considered that don’t really work:
      ● ptrace
        ○ Overhead to monitor the system calls
        ○ Multiprocessing doesn’t work
        ○ Not multiplatform
        ○ A lot of things to handle
        ○ Runtimes can do weird things
      ● Docker
        ○ Overhead because overkill
        ○ Not precise enough
  12. 12. Isolation backend
      Backends:
      ● “Big brother” (chroot + setrlimit + memory watchdog + outside firewall)
        ○ Previous in-house solution
        ○ Isolation is very sloppy
      ● Isolate (https://github.com/ioi/isolate)
        ○ Resource limitation using cgroups
        ○ Isolation with namespaces
        ○ Lightweight FS isolation (chroot + mount --bind)
      ● Nsjail? (http://nsjail.com/)
        ○ Could be implemented as an alternate backend
        ○ You know how every time you do something, Google comes and does it 10x better?
  13. 13. Language module system
      Python 3.6 __init_subclass__ in action!
          from camisole.models import Lang, Program

          class Python(Lang, name='Python'):
              source_ext = '.py'
              interpreter = Program('python3')
              reference_source = r'print(42)'
      Load arbitrary language modules with:
          $ export CAMISOLEPATH=~/mylangs
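The `class Python(Lang, name='Python')` syntax works because Python 3.6 passes class keyword arguments to `__init_subclass__` on the base class. The sketch below illustrates that mechanism with a standalone toy `Lang` class. It is not camisole's actual implementation, and the `registry` attribute and `by_name` helper are hypothetical.

```python
class Lang:
    # Hypothetical registry: lowercase language name -> language class.
    registry = {}

    def __init_subclass__(cls, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Fall back to the class name when no explicit name is given.
        cls.name = name or cls.__name__
        Lang.registry[cls.name.lower()] = cls


class Python(Lang, name='Python'):
    source_ext = '.py'


class OCaml(Lang):
    source_ext = '.ml'


def by_name(name):
    """Look up a language class case-insensitively."""
    return Lang.registry[name.lower()]
```

Merely defining a subclass registers it, so a module dropped into a search path only needs to be imported for its language to become available.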
  14. 14. (Simple, except for Java.)
          import re
          import subprocess
          from pathlib import Path

          from camisole.models import Lang, Program

          RE_WRONG_FILENAME_ERROR = re.compile(r...,')
          PSVMAIN_SIGNATURE = 'public static void main('
          PSVMAIN_DESCRIPTOR = 'descriptor: ([Ljava/lang/String;)V'


          class Java(Lang):
              source_ext = '.java'
              compiled_ext = '.class'
              compiler = Program('javac', env={'LANG': 'C'}, version_opt='-version')
              interpreter = Program('java', version_opt='-version')
              # /usr/lib/jvm/java-8-openjdk/jre/lib/amd64/jvm.cfg links to
              # /etc/java-8-openjdk/amd64/jvm.cfg
              allowed_dirs = ['/etc/java-8-openjdk']
              # ensure we can parse the javac(1) stderr
              extra_binaries = {'disassembler': Program('javap', version_opt='-version')}
              reference_source = r'''
          class SomeClass {
              static int fortytwo() {
                  return 42;
              }
              static class Subclass {
                  // nested psvmain! wow!
                  public static void main(String args[]) {
                      System.out.println(SomeClass.fortytwo());
                  }
              }
          }
          '''

              def __init__(self, *args, **kwargs):
                  super().__init__(*args, **kwargs)
                  # use an illegal class name so that javac(1) will spit out the actual
                  # class name used in the source
                  self.class_name = '1337'
                  # we give priority to the public class, if any, so keep a flag if we
                  # found such a public class
                  self.found_public = False
                  try:
                      self.heapsize = self.opts['execute'].pop('mem')
                  except KeyError:
                      self.heapsize = None

              def compile_opt_out(self, output):
                  # javac has no output directive, file name is class name
                  return []

              async def compile(self):
                  # try to compile with default class name (Main)
                  retcode, info, binary = await super().compile()
                  if retcode != 0:
                      # error: public class name is not '1337' -- obviously, it's illegal,
                      # so find what it actually is
                      match = RE_WRONG_FILENAME_ERROR.search(info['stderr'])
                      if match:
                          self.found_public = True
                          self.class_name = match.group(1)
                          # retry with new name
                          retcode, info, binary = await super().compile()
                  return (retcode, info, binary)

              def source_filename(self):
                  return self.class_name + self.source_ext

              def execute_filename(self):
                  # return eg. Main.class
                  return self.class_name + self.compiled_ext

              def execute_command(self, output):
                  cmd = [self.interpreter.cmd]
                  # Use the memory limit as a maximum heap size
                  if self.heapsize is not None:
                      cmd.append(f'-Xmx{self.heapsize}k')
                  # foo/Bar.class is run with $ java -cp foo Bar
                  cmd += ['-cp', str(Path(self.filter_box_prefix(output)).parent),
                          self.class_name]
                  return cmd

              def find_class_having_main(self, classes):
                  for file in classes:
                      # run javap(1) with type signatures
                      try:
                          stdout = subprocess.check_output(
                              [self.extra_binaries['disassembler'].cmd, '-s', str(file)],
                              stderr=subprocess.DEVNULL, env=self.compiler.env)
                      except subprocess.SubprocessError:
                          continue
                      # iterate on lines to find p s v main() signature and then
                      # its descriptor on the line below; we don't rely on the type
                      # from the signature, because it could be String[], String... or
                      # some other syntax I'm not even aware of
                      lines = iter(stdout.decode().split('\n'))
                      for line in lines:
                          if line.lstrip().startswith(PSVMAIN_SIGNATURE):
                              if next(lines).lstrip() == PSVMAIN_DESCRIPTOR:
                                  return file.stem

              def read_compiled(self, path, isolator):
                  # in case of multiple or nested classes, multiple .class files are
                  # generated by javac
                  classes = list(isolator.path.glob('*.class'))
                  files = [(file.name, file.open('rb').read()) for file in classes]
                  if not self.found_public:
                      # the main() may be anywhere, so run javap(1) on all .class
                      new_class_name = self.find_class_having_main(classes)
                      if new_class_name:
                          self.class_name = new_class_name
                  return files

              def write_binary(self, path, binary):
                  # see read_compiled(), we need to write back all .class files
                  # but give only the main class name (execute_filename()) to java(1)
                  for file, data in binary:
                      with (path / file).open('wb') as c:
                          c.write(data)
                  return path / self.execute_filename()
  15. 15. Low-level API
      When simple single-file evaluation doesn’t suit your needs:
          opts = {'time': 5, 'mem': 5000}
          isolator = Isolator(opts, allowed_dirs=['/home'])
          async with isolator:
              await isolator.run(command, env=env, data=input())
          return (isolator.stdout, isolator.stderr)
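The `async with isolator:` usage implies an asynchronous context manager that sets up a working area, runs the command, and cleans up afterwards. The toy class below mimics only that shape, using `asyncio` subprocesses and a temporary directory. It provides no isolation whatsoever and is not camisole's `Isolator`. The `ToyIsolator` and `demo` names are hypothetical, and applying the `wall-time` option as a plain timeout is an assumption.

```python
import asyncio
import sys
import tempfile


class ToyIsolator:
    """NOT a sandbox: only illustrates the async context manager pattern."""

    def __init__(self, opts):
        self.opts = opts
        self.stdout = b""
        self.stderr = b""

    async def __aenter__(self):
        # Set up a scratch directory that lives as long as the `async with` block.
        self._tmp = tempfile.TemporaryDirectory()
        self.path = self._tmp.name
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self._tmp.cleanup()
        return False  # do not swallow exceptions

    async def run(self, command, data=None):
        proc = await asyncio.create_subprocess_exec(
            *command,
            cwd=self.path,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            # Assumption: "wall-time" (seconds) is enforced as a simple timeout.
            self.stdout, self.stderr = await asyncio.wait_for(
                proc.communicate(data), timeout=self.opts.get("wall-time"))
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()
            raise
        return proc.returncode


async def demo():
    isolator = ToyIsolator({"wall-time": 5})
    async with isolator:
        await isolator.run(
            [sys.executable, "-c", "print(int(input()) * 2)"], data=b"21\n")
    return isolator.stdout, isolator.stderr
```

The results stay readable on the object after the block exits, which matches the slide's `return (isolator.stdout, isolator.stderr)` placed after the `async with`.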
  16. 16. Deployment
      We autobuild an OVA (VirtualBox export) using packer.io:
      https://camisole.prologin.org/ova/camisole-latest.ova
      Importing it in VirtualBox and running the VM just works™ and gives you an HTTP server with all the built-in languages (Ada, C, Brainfuck, C#, C++, F#, Haskell, Java, Javascript, Lua, OCaml, Pascal, Perl, PHP, Python, Ruby, Rust, Scheme, VisualBasic).
      Great for non-tech-savvy people!
  17. 17. Conclusion
      ● Elegant API for a hard problem: good abstraction!
      ● Linux isolation is awesome
      ● Python 3.5 and 3.6 features are awesome (f-strings, __init_subclass__, async…)
      Will our simplicity-centered design make the project gain traction? :-)
      Full documentation: https://camisole.prologin.org
      Contribute! https://github.com/prologin/camisole
      Contact: #prologin @ irc.freenode.net
      antoine.pietri@prologin.org
      alexandre.macabies@prologin.org
