
TMPA-2017: Vellvm - Verifying the LLVM

TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow
Vellvm - Verifying the LLVM
Steve Zdancewic (Professor, University of Pennsylvania, USA)

For video follow the link: https://youtu.be/jDPAtUfnoBU

Would you like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa

Follow us:
https://www.linkedin.com/company/exactpro-systems-llc?trk=biz-companies-cym
https://twitter.com/exactpro



  1. Vellvm: Verifying the LLVM IR. Steve Zdancewic, University of Pennsylvania. TMPA 2017
  2. Collaborators • Jianzhou Zhao • Dmitri Garbuzov • William Mansky • Christine Rizkallah • Richard Zhang • Milo M.K. Martin • Santosh Nagarakatte • Gil Hur • Jeehoon Kang • Viktor Vafeiadis
  3. The Science of Deep Specification • Andrew Appel (Princeton) • Adam Chlipala (MIT) • Zhong Shao (Yale) • Benjamin Pierce (U. Penn.) • Stephanie Weirich (U. Penn.)
  4. The Need For High Assurance Software: Heartbleed, car hacking, Stuxnet, buffer overflow attacks
  5. Deep Specifications • Rich – expressive description • Formal – mathematical, machine-checked • 2-Sided – tested from both sides • Live – connected to real, executable code. Goal: Advance the reliability, safety, security, and cost-effectiveness of software (and hardware).
  6. The Coq Interactive Theorem Prover [Developed at INRIA] • Based on dependent type theory • Pure functional language + datatypes • Constructive proofs ⇒ executable code • Automation: tactics + inference ⇒ formalization tool of choice for DeepSpec team
  7. DeepSpec: Interconnections
  8. DeepSpec: Interconnections
  9. LLVM: Low-Level Virtual Machine
  10. LLVM Compiler Infrastructure [Lattner et al.] (diagram: Front Ends → LLVM: Typed SSA IR, Analysis, Optimizations/Transformations → Code Gen/JIT)
  11. Motivation: SoftBound/CETS • Buffer overflow vulnerabilities. • Detect spatial/temporal memory safety violations in legacy C code. • Implemented as an LLVM pass. • What about correctness? [Nagarakatte, et al. PLDI ’09, ISMM ’10] http://www.cis.upenn.edu/acg/softbound/
  12. Inspiration: CompCert [Xavier Leroy, INRIA Rocquencourt]. Optimizing C Compiler: proved correct end-to-end with machine-checked proof in Coq. (diagram: C language → CompCert Compiler → ISA) rich, formal, 2-sided, live
  13. Does Such Verification Work? Csmith tool [Yang et al. PLDI 2011]: random test-case generation of source programs, run against LLVM, {8 other C compilers} + CompCert. Bug counts shown: 79 bugs (25 critical); 202 bugs; 325 bugs in total.
  14. YES! Verification Works "The striking thing about our CompCert results is that the middle-end bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler optimizations within a proof framework, where safety checks are explicit and machine-checked, has tangible benefits for compiler users." – Regehr et al. 2011
  15. LLVM Compiler Infrastructure [Lattner et al.] (diagram: Front Ends → LLVM: Typed SSA IR, Analysis, Optimizations/Transformations → Code Gen/JIT)
  16. LLVM Compiler Infrastructure [Lattner et al.] (diagram: Front Ends → LLVM: Typed SSA IR, Analysis, Optimizations/Transformations → Code Gen/JIT)
  17. The Vellvm Project (diagram: Typed SSA IR, Analysis, Optimizations/Transformations) • Formal semantics • Facilities for creating simulation proofs • Implemented in Coq • Extract passes for use with LLVM compiler • Example: verified memory safety instrumentation [Zhao et al. POPL 2012, CPP 2012, PLDI 2013]
  18. Vellvm Framework (diagram). Compiler pipeline: C Source Code, LLVM IR, Transform, LLVM IR, Other Optimizations, Target. LLVM OCaml Bindings: Parser, Printer. Coq: Syntax, Operational Semantics, Memory Model, Type System and SSA, Proof Techniques & Metatheory. Extract.
  19. Vellvm Framework (diagram). Compiler pipeline: C Source Code, LLVM IR, Verified Transform, LLVM IR, Other Optimizations, Target. LLVM OCaml Bindings: Parser, Printer. Coq: Syntax, Operational Semantics, Memory Model, Type System and SSA, Proof Techniques & Metatheory. Extract.
  20. Plan • Tour of the LLVM IR • Vellvm infrastructure – Operational Semantics – SSA Metatheory + Proof Techniques • Case studies: – SoftBound memory safety – mem2reg • Conclusion
  21. LLVM IR by Example. Control-flow Graphs: + Labeled blocks. entry: r0 = ...; r1 = ...; r2 = ... | loop: r3 = ...; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100 | exit: r7 = ...; r8 = r1 x r2; r9 = r7 + r8
  22. LLVM IR by Example. Control-flow Graphs: + Labeled blocks + Binary Operations. entry: r0 = ...; r1 = ...; r2 = ... | loop: r3 = ...; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100 | exit: r7 = ...; r8 = r1 x r2; r9 = r7 + r8
  23. LLVM IR by Example. Control-flow Graphs: + Labeled blocks + Binary Operations + Branches/Return. entry: r0 = ...; r1 = ...; r2 = ...; br r0 loop exit | loop: r3 = ...; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100; br r6 loop exit | exit: r7 = ...; r8 = r1 x r2; r9 = r7 + r8; ret r9
  24. LLVM IR by Example. Control-flow Graphs: + Labeled blocks + Binary Operations + Branches/Return + Static Single Assignment (each variable assigned only once, statically). entry: r0 = ...; r1 = ...; r2 = ...; br r0 loop exit | loop: r3 = ...; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100; br r6 loop exit | exit: r7 = ...; r8 = r1 x r2; r9 = r7 + r8; ret r9
  25. LLVM IR by Example. Control-flow Graphs: + Labeled blocks + Binary Operations + Branches/Return + Static Single Assignment + φ nodes. entry: r0 = ...; r1 = ...; r2 = ...; br r0 loop exit | loop: r3 = φ[0;entry][r5;loop]; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100; br r6 loop exit | exit: r7 = φ[0;entry][r5;loop]; r8 = r1 x r2; r9 = r7 + r8; ret r9
  26. LLVM IR by Example. Control-flow Graphs: + Labeled blocks + Binary Operations + Branches/Return + Static Single Assignment + φ nodes (choose values based on predecessor blocks). entry: r0 = ...; r1 = ...; r2 = ...; br r0 loop exit | loop: r3 = φ[0;entry][r5;loop]; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100; br r6 loop exit | exit: r7 = φ[0;entry][r5;loop]; r8 = r1 x r2; r9 = r7 + r8; ret r9
  27. (Unoptimized) LLVM IR Code
      example.c:
        unsigned factorial(unsigned n) {
          unsigned acc = 1;
          while (n > 0) {
            acc = acc * n;
            n = n - 1;
          }
          return acc;
        }
      example.ll:
        define i32 @factorial(i32 %n) nounwind uwtable ssp {
        entry:
          %1 = alloca i32, align 4
          %acc = alloca i32, align 4
          store i32 %n, i32* %1, align 4
          store i32 1, i32* %acc, align 4
          br label %start
        start:                      ; preds = %entry, %else
          %3 = load i32* %1, align 4
          %4 = icmp ugt i32 %3, 0
          br i1 %4, label %then, label %else
        then:                       ; preds = %start
          %6 = load i32* %acc, align 4
          %7 = load i32* %1, align 4
          %8 = mul i32 %6, %7
          store i32 %8, i32* %acc, align 4
          %9 = load i32* %1, align 4
          %10 = sub i32 %9, 1
          store i32 %10, i32* %1, align 4
          br label %start
        else:                       ; preds = %start
          %12 = load i32* %acc, align 4
          ret i32 %12
        }
  28. Other Parts of the LLVM IR
      op ::= %uid | constant | undef   (Operands)
      bop ::= add | sub | mul | shl | …   (Operations)
      cmpop ::= eq | ne | slt | sle | …   (Comparison)
      insn ::=
        | %uid = alloca ty   (Stack Allocation)
        | %uid = load ty op1   (Load)
        | store ty op1, op2   (Store)
        | %uid = getelementptr ty op1 …   (Address Calculation)
        | %uid = call rt fun(…args…)   (Function Calls)
        | …
      phi ::= φ[op1;lbl1]...[opn;lbln]
      terminator ::= ret %ty op | br op label %lbl1, label %lbl2 | br label %lbl
  29. Plan • Tour of the LLVM IR • Vellvm infrastructure – Operational Semantics – SSA Metatheory + Proof Techniques • Case studies: – SoftBound memory safety – mem2reg • Conclusion
  30. LLVM IR Semantics. SSA CFG ≈ functional program [Appel 1998] + • Types & Memory Layout – structured, recursive types – type-directed projection – type casts • Effects – structured heap load/store – system calls (I/O) – nondeterminism. We know how to model this and prove properties about the models.
  31. LLVM’s memory model • Manipulate structured types. %ST = type {i10,[10 x i8*]}. %val = load %ST* %ptr … store %ST* %ptr, %new. High-level Representation: one i10 field followed by ten i8* fields.
  32. LLVM’s memory model • Manipulate structured types. • Semantics is given in terms of byte-oriented low-level memory. – padding & alignment – physical subtyping. %ST = type {i10,[10 x i8*]}. %val = load %ST* %ptr … store %ST* %ptr, %new. High-level Representation: one i10 field followed by ten i8* fields. Low-level Representation (byte offset: contents): 0: b(10, 136); 1: b(10, 2); 2: uninit; 3: uninit; 4: ptr(Blk32,0,0); 5: ptr(Blk32,0,1); 6: ptr(Blk32,0,2); 7: ptr(Blk32,0,3); 8: ptr(Blk32,8,0); 9: ptr(Blk32,8,1); 10: ptr(Blk32,8,2); 11: ptr(Blk32,8,3); 12: …
  33. Dynamic Physical Subtyping [Nita, et al. POPL ’08]. Block labels shown: Blk0, Blk1, Blk32. First memory image (byte offset: contents): 0: b(10, 136); 1: b(10, 2); 2: uninit; 3: uninit; 4: ptr(Blk32,0,0); 5: ptr(Blk32,0,1); 6: ptr(Blk32,0,2); 7: ptr(Blk32,0,3); 8: ptr(Blk32,8,0); 9: ptr(Blk32,8,1); 10: ptr(Blk32,8,2); 11: ptr(Blk32,8,3); 12: … Second memory image: 0: b(16, 1); 1: b(16, 0); 2 to 7: uninit; 8: ptr(Blk1,0,0); 9: ptr(Blk1,0,1); 10: ptr(Blk1,0,2); 11: ptr(Blk1,0,3); 12: … Results shown: load i16* ⇒ 1 ✓; load i16* ⇒ undef ✗ (i10)
  34. Sources of Undefined Behavior. Target-dependent Results (Nondeterminism): • Uninitialized variables: %v = add i32 %x, undef • Uninitialized memory: %ptr = alloca i32; %v = load (i32*) %ptr • Ill-typed memory usage. Fatal Errors (Stuck States): • Out-of-bounds accesses • Access dangling pointers • Free invalid pointers • Invalid indirect calls
  35. Sources of Undefined Behavior. Target-dependent Results (Nondeterminism): • Uninitialized variables: %v = add i32 %x, undef • Uninitialized memory: %ptr = alloca i32; %v = load (i32*) %ptr • Ill-typed memory usage. Stuck States: defined by a predicate on the program configuration. Stuck(f, σ) = BadFree(f, σ) ∨ BadLoad(f, σ) ∨ BadStore(f, σ) ∨ … ∨ …
  36. undef • What is the value of %y after running the following? %x = or i8 undef, 1; %y = xor i8 %x, %x • One plausible answer: 0 • Not LLVM’s semantics! (LLVM is more liberal to permit more aggressive optimizations)
  37. undef • Partially defined values are interpreted nondeterministically as sets of possible values. Program: %x = or i8 undef, 1; %y = xor i8 %x, %x. ⟦i8 undef⟧ = {0,…,255}; ⟦i8 1⟧ = {1}; ⟦%x⟧ = {a or b | a∈⟦i8 undef⟧, b∈⟦1⟧} = {1,3,5,…,255}; ⟦%y⟧ = {a xor b | a∈⟦%x⟧, b∈⟦%x⟧} = {0,2,4,…,254}
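The set-valued reading of undef on this slide can be checked by brute force. The following is a minimal Python sketch, not part of the talk or of Vellvm; the helper names `denote_undef` and `lift` are hypothetical and exist only for this illustration.

```python
# Set-valued evaluation of the slide's two i8 instructions.
# denote_undef and lift are hypothetical names used only in this sketch.

def denote_undef(bits=8):
    """Every value an undef of the given bit width may take."""
    return set(range(2 ** bits))

def lift(op, xs, ys):
    """Apply a binary operation to every pair drawn from two value sets."""
    return {op(a, b) for a in xs for b in ys}

# %x = or i8 undef, 1
x = lift(lambda a, b: a | b, denote_undef(), {1})
# %y = xor i8 %x, %x   (the two uses of %x are chosen independently)
y = lift(lambda a, b: a ^ b, x, x)

print(min(x), max(x), len(x))
print(min(y), max(y), len(y))
```

Because each use of %x is drawn independently from its set, %y ranges over all even bytes rather than being fixed at 0.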
  38. LLVM_ND Operational Semantics • Define a transition relation: f ⊢ σ1 ⟼ σ2 – f is the program – σ is the program state: pc, locals (δ), stack, heap • Nondeterministic – δ maps local %uids to sets. – Step relation is nondeterministic • Mostly straightforward (given the heap model) – One wrinkle: phi-nodes executed atomically
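Why φ-nodes need to execute atomically can be seen on a block whose φ-nodes read each other's results. The sketch below is an illustration in Python under assumed encodings (a φ-node is a pair of destination and a map from predecessor label to operand; all names are hypothetical), not the Vellvm definition.

```python
# Two phi nodes at the head of a loop block that swap a and b each iteration.
# The encoding (dest, {pred_label: operand}) is an assumption of this sketch.

def read(operand, env):
    """Operands are either variable names (str) or integer constants."""
    return env[operand] if isinstance(operand, str) else operand

def step_phis_atomic(phis, pred, env):
    """Read every incoming value from the old environment, then write all at once."""
    new_values = {dest: read(incoming[pred], env) for dest, incoming in phis}
    return {**env, **new_values}

def step_phis_sequential(phis, pred, env):
    """Incorrect variant: later phi nodes observe earlier phi results."""
    env = dict(env)
    for dest, incoming in phis:
        env[dest] = read(incoming[pred], env)
    return env

phis = [("a", {"entry": 1, "loop": "b"}),
        ("b", {"entry": 2, "loop": "a"})]

env0 = step_phis_atomic(phis, "entry", {})
atomic = step_phis_atomic(phis, "loop", env0)
sequential = step_phis_sequential(phis, "loop", env0)
print(atomic, sequential)
```

The atomic step swaps the two values, while the sequential variant loses one of them, which is the behavior a semantics for φ-nodes has to rule out.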
  39. Operational Semantics (design space: Small Step vs. Big Step; Nondeterministic vs. Deterministic): LLVM_ND
  40. Deterministic Refinement (Small Step vs. Big Step; Nondeterministic vs. Deterministic): LLVM_ND ∋ LLVM_D. Instantiate ‘undef’ with default value (0 or null) ⇒ deterministic.
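The ∋ relation between the nondeterministic and deterministic semantics can be illustrated on the earlier undef example: picking the default value 0 for undef yields one run that is a member of the set of nondeterministic outcomes. This is a small Python sketch with hypothetical function names, not the Coq development.

```python
# Deterministic refinement on: %x = or i8 undef, 1 ; %y = xor i8 %x, %x
# UNDEF_DEFAULT follows the slide (0 for integers); function names are hypothetical.

UNDEF_DEFAULT = 0

def run_deterministic():
    """Evaluate with undef instantiated to the default value."""
    x = UNDEF_DEFAULT | 1
    y = x ^ x
    return x, y

def run_nondeterministic():
    """Evaluate with undef ranging over all 8-bit values."""
    undef = set(range(256))
    xs = {a | 1 for a in undef}
    ys = {a ^ b for a in xs for b in xs}
    return xs, ys

x, y = run_deterministic()
xs, ys = run_nondeterministic()
print((x, y), x in xs, y in ys)
```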
  41. Big-step Deterministic Refinements (Small Step vs. Big Step; Nondeterministic vs. Deterministic): LLVM_ND ∋ LLVM_D ≈ LLVM_Interp. Bisimulation up to “observable events”: • external function calls
  42. Big-step Deterministic Refinements [Tristan, et al. POPL ’08, Tristan, et al. PLDI ’09] (Small Step vs. Big Step; Nondeterministic vs. Deterministic): LLVM_ND ∋ LLVM_D ≈ LLVM_Interp; LLVM_D ≿ LLVM*_DFn ≿ LLVM*_DB. Simulation up to “observable events”: • useful for encapsulating behavior of function calls • large step evaluation of basic blocks
  43. A Taste of Coq Formalization …
  44. Plan • Tour of the LLVM IR • Vellvm infrastructure – Operational Semantics – SSA Metatheory + Proof Techniques • Case studies: – SoftBound memory safety – mem2reg • Conclusion
  45. Reasoning About LLVM Code. How do we prove that a program transformation is correct with respect to the defined operational semantics? • Safety Invariants (preservation and progress) • Simulation techniques
  46. Key SSA Invariant. entry: r0 = ...; r1 = ...; r2 = ...; br r0 loop exit | loop: r3 = φ[0;entry][r5;loop]; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100; br r6 loop exit | exit: r7 = φ[0;entry][r5;loop]; r8 = r1 x r2; r9 = r7 + r8; ret r9. Annotations: definition of r2 (entry); uses of r2 (loop, exit).
  47. Key SSA Invariant. entry: r0 = ...; r1 = ...; r2 = ...; br r0 loop exit | loop: r3 = φ[0;entry][r5;loop]; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100; br r6 loop exit | exit: r7 = φ[0;entry][r5;loop]; r8 = r1 x r2; r9 = r7 + r8; ret r9. Annotations: definition of r2 (entry); uses of r2 (loop, exit). The definition of a variable must dominate its uses.
  48. Safety Properties • A well-formed program never accesses undefined variables: If ⊢ f and f ⊢ σ0 ⟼* σ then σ is not stuck. • Initialization: If ⊢ f then wf(f, σ0). • Preservation: If ⊢ f and f ⊢ σ ⟼ σ’ and wf(f, σ) then wf(f, σ’). • Progress: If ⊢ f and wf(f, σ) then f ⊢ σ ⟼ σ’. Notation: ⊢ f means program f is well formed; σ is a program state; f ⊢ σ ⟼* σ is evaluation of f.
  49. Safety Properties • A well-formed program never accesses undefined variables: If ⊢ f and f ⊢ σ0 ⟼* σ then σ is not stuck. • Initialization: If ⊢ f then wf(f, σ0). • Preservation: If ⊢ f and f ⊢ σ ⟼ σ’ and wf(f, σ) then wf(f, σ’). • Progress: If ⊢ f and wf(f, σ) then done(f,σ) or stuck(f,σ) or f ⊢ σ ⟼ σ’. Notation: ⊢ f means program f is well formed; σ is a program state; f ⊢ σ ⟼* σ is evaluation of f.
  50. Well-formed States. entry: r0 = ...; r1 = ...; r2 = ...; br r0 loop exit | loop: r3 = φ[0;entry][r5;loop]; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100; br r6 loop exit | exit: r7 = φ[0;entry][r5;loop]; r8 = r1 x r2; r9 = r7 + r8; ret r9. (The diagram marks the current pc.) State σ is: pc = program counter; δ = local values.
  51. Well-formed States. entry: r0 = ...; r1 = ...; r2 = ...; br r0 loop exit | loop: r3 = φ[0;entry][r5;loop]; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100; br r6 loop exit | exit: r7 = φ[0;entry][r5;loop]; r8 = r1 x r2; r9 = r7 + r8; ret r9. (The diagram marks the current pc.) State σ is: pc = program counter; δ = local values. sdom(f,pc) = variable defns. that strictly dominate pc.
  52. Well-formed States. entry: r0 = ...; r1 = ...; r2 = ...; br r0 loop exit | loop: r3 = φ[0;entry][r5;loop]; r4 = r1 x r2; r5 = r3 + r4; r6 = r5 ≥ 100; br r6 loop exit | exit: r7 = φ[0;entry][r5;loop]; r8 = r1 x r2; r9 = r7 + r8; ret r9. (The diagram marks the current pc.) State σ contains: pc = program counter; δ = local values. sdom(f,pc) = variable defns. that strictly dominate pc. wf(f,σ) = ∀r∊sdom(f,pc). ∃v. δ(r) = ⎣v⎦ “All variables in scope are initialized.”
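The definitions of sdom and wf can be made concrete on the running example. The following Python sketch computes block-level dominators with the standard iterative dataflow method and then checks wf for a given pc. Treating a pc as a (block, instruction index) pair, and all function and variable names, are assumptions of this sketch rather than Vellvm's definitions.

```python
# CFG and definitions of the running example (entry, loop, exit).
CFG = {"entry": ["loop", "exit"], "loop": ["loop", "exit"], "exit": []}
DEFS = {"entry": ["r0", "r1", "r2"],
        "loop": ["r3", "r4", "r5", "r6"],
        "exit": ["r7", "r8", "r9"]}

def dominators(cfg, entry):
    """Iterative dataflow: dom(b) = {b} ∪ ⋂ dom(p) over predecessors p of b."""
    preds = {b: [p for p in cfg if b in cfg[p]] for b in cfg}
    dom = {b: set(cfg) for b in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == entry:
                continue
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b], changed = new, True
    return dom

def sdom(pc, dom):
    """Variables defined strictly before pc on every path from the entry."""
    block, index = pc
    from_dominators = {r for b in dom[block] - {block} for r in DEFS[b]}
    earlier_in_block = set(DEFS[block][:index])
    return from_dominators | earlier_in_block

def wf(pc, delta, dom):
    """All variables in scope at pc have a value in the local environment delta."""
    return all(r in delta for r in sdom(pc, dom))

dom = dominators(CFG, "entry")
pc = ("exit", 1)  # about to execute r8 = r1 x r2
print(sorted(sdom(pc, dom)))
print(wf(pc, {"r0": 1, "r1": 2, "r2": 3, "r7": 0}, dom))
```

Note that r3 through r6 are not in scope in `exit`, because `loop` does not dominate `exit`; wf therefore places no requirement on them at that pc.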
  53. Generalizing Safety • Definition of wf: wf(f,(pc, δ)) = ∀r∊sdom(f,pc). ∃v. δ(r) = ⎣v⎦ • Generalize like this: wf(f,(pc, δ)) = P f (δ|sdom(f,pc)) where P : Program ⟶ Locals ⟶ Prop. Consider only variables in scope ⇒ P defined relative to the dominator tree of the CFG. • Methodology: for a given P prove three theorems: Initialization(P), Preservation(P), Progress(P)
  54. Instantiating • For usual safety: P_safety f δ = ∀r∊dom(δ). ∃v. δ(r) = ⎣v⎦ • For semantic properties: P_sem f δ = ∀r. f[r] = ⎣rhs⎦ ⇒ δ(r) = ⟦rhs⟧δ • Useful for verifying correctness of: – code motion, dead variable elimination, common expression elimination, etc.
  55. Plan • Tour of the LLVM IR • Vellvm infrastructure – Operational Semantics – SSA Metatheory + Proof Techniques • Case studies: – SoftBound memory safety – mem2reg • Conclusion
  56. SoftBound (diagram: C Source Code, LLVM IR, SoftBound, LLVM IR, Other Optimizations, Target) • Implemented as an LLVM pass. • Detect spatial/temporal memory safety violations in legacy C code. • Good test case: – Safety Critical ⇒ Proof cost warranted – Non-trivial Memory transformation
  57. SoftBound (diagram: C Source Code, LLVM IR, SoftBound, LLVM IR, Other Optimizations, Target). Before: %p = call malloc [10 x i8]; %q = gep %p, i32 0, i32 255; store i8 0, %q. After: %p = call malloc [10 x i8]; %p_base = gep %p, i32 0; %p_bound = gep %p, i32 0, i32 10; %q = gep %p, i32 0, i32 255; %q_base = %p_base; %q_bound = %p_bound; assert %q_base <= %q ∧ %q+1 < %q_bound; store i8 0, %q. • Maintain base and bound for all pointers • Propagate metadata on assignment • Check that a pointer is within its bounds when being accessed
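The three steps on this slide (maintain base and bound, propagate metadata, check on access) can be mimicked in a few lines of Python. This is a toy model under stated assumptions: the heap is a `bytearray`, pointers are integer offsets, and every name (`malloc`, `gep`, `checked_store`, `BoundsError`) is hypothetical. The upper check is written as `ptr + 1 <= bound` for a one-byte store, which is a choice made for this sketch; the slide writes its upper check with a strict `<`.

```python
# Toy model of SoftBound-style instrumentation. Not the real pass.

class BoundsError(Exception):
    """Raised when a checked access falls outside [base, bound)."""

def malloc(heap, size):
    """Allocate size bytes; return the pointer and its (base, bound) metadata."""
    base = len(heap)
    heap.extend(bytes(size))
    return base, (base, base + size)

def gep(ptr, meta, offset):
    """Pointer arithmetic: the derived pointer inherits the metadata."""
    return ptr + offset, meta

def checked_store(heap, ptr, meta, value):
    """Store one byte after checking the pointer against its bounds."""
    base, bound = meta
    if not (base <= ptr and ptr + 1 <= bound):
        raise BoundsError(f"store at {ptr} outside [{base}, {bound})")
    heap[ptr] = value

heap = bytearray()
p, p_meta = malloc(heap, 10)          # %p = call malloc [10 x i8]
ok, ok_meta = gep(p, p_meta, 9)       # last valid byte
checked_store(heap, ok, ok_meta, 7)

q, q_meta = gep(p, p_meta, 255)       # %q = gep %p, i32 0, i32 255
try:
    checked_store(heap, q, q_meta, 0) # store i8 0, %q
    violation_caught = False
except BoundsError:
    violation_caught = True
print(violation_caught, heap[9])
```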
  58. Disjoint Metadata • Maintain pointer bounds in a separate memory space. • Key Invariant: Metadata cannot be corrupted by bounds violation. (diagram labels: User memory; Disjoint metadata; %p, %p_base, %p_bound, %i1, %q, %q_base, %q_bound, %i6, %i3)
  59. Proving SoftBound Correct 1. Define SoftBound(f,σ) = (fs,σs) – Transformation pass implemented in Coq. 2. Define predicate: MemoryViolation(f,σ) 3. Construct a non-standard operational semantics: f ⊢ σ ⟼_SB σ’ – Builds in safety invariants “by construction”: f ⊢ σ ⟼*_SB σ’ ⇒ ¬MemoryViolation(f,σ’) 4. Show that the instrumented code simulates the “correct” code: SoftBound(f,σ) = (fs,σs) ⇒ [f ⊢ σ ⟼*_SB σ’] ≿ [fs ⊢ σs ⟼* σ’s]
  60. Lessons About SoftBound • Found several bugs in our C++ implementation – Interaction of undef, ‘null’, and metadata initialization. • Simulation proofs suggested a redesign of SoftBound’s handling of stack pointers. – Use a “shadow stack” – Simplify the design/implementation – Significantly more robust (e.g. varargs)
  61. Competitive Runtime Overhead (chart: runtime overhead, axis from 0% to 250%; series shown: Extracted). The performance of extracted SoftBound is competitive with the non-verified original.
  62. Plan • Tour of the LLVM IR • Vellvm infrastructure – Operational Semantics – SSA Metatheory + Proof Techniques • Case studies: – SoftBound memory safety – mem2reg • Conclusion
  63. mem2reg in LLVM (diagram): Front-ends w/o SSA construction → The LLVM IR w/o φ-nodes (• imperative variables → stack allocas • no φ-nodes • trivially in SSA form) → mem2reg (• Promote stack allocas to temporaries • Insert minimal φ-nodes) → The LLVM IR in the minimal SSA form → SSA-based optimizations → Backends
  64. mem2reg Example. C source: int x = 0; if (y > 0) x = 1; return x; The LLVM IR in the trivial SSA form: l1: %p = alloca i32; store 0, %p; %b = %y > 0; br %b, %l2, %l3 | l2: store 1, %p; br %l3 | l3: %x = load %p; ret %x
  65. mem2reg Example. C source: int x = 0; if (y > 0) x = 1; return x; The LLVM IR in the trivial SSA form: l1: %p = alloca i32; store 0, %p; %b = %y > 0; br %b, %l2, %l3 | l2: store 1, %p; br %l3 | l3: %x = load %p; ret %x. Minimal SSA after mem2reg: l1: %b = %y > 0; br %b, %l2, %l3 | l2: br %l3 | l3: %x = φ[1,%l2][0,%l1]; ret %x
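One way to get a feel for what "mem2reg preserves semantics" means is to run the slide's two programs side by side. The sketch below is a tiny Python interpreter for just the instructions used in this example; the tuple encoding of instructions and the names `BEFORE`, `AFTER`, and `run` are hypothetical. Agreement on a handful of inputs is only a test, not the simulation proof the talk describes.

```python
# Programs from the slide, encoded as label -> list of instruction tuples.
BEFORE = {
    "l1": [("alloca", "p"), ("store", 0, "p"), ("gt", "b", "y", 0),
           ("br", "b", "l2", "l3")],
    "l2": [("store", 1, "p"), ("jmp", "l3")],
    "l3": [("load", "x", "p"), ("ret", "x")],
}
AFTER = {
    "l1": [("gt", "b", "y", 0), ("br", "b", "l2", "l3")],
    "l2": [("jmp", "l3")],
    "l3": [("phi", "x", {"l2": 1, "l1": 0}), ("ret", "x")],
}

def run(program, y, entry="l1"):
    """Execute a program on input y and return the value passed to ret."""
    env, memory = {"y": y}, {}

    def val(operand):
        # Operands are variable names (str) or integer constants.
        return env[operand] if isinstance(operand, str) else operand

    label, pred = entry, None
    while True:
        for insn in program[label]:
            op = insn[0]
            if op == "alloca":            # fresh, uninitialized cell
                env[insn[1]] = len(memory)
                memory[env[insn[1]]] = None
            elif op == "store":           # ("store", value, pointer)
                memory[env[insn[2]]] = val(insn[1])
            elif op == "load":            # ("load", dest, pointer)
                env[insn[1]] = memory[env[insn[2]]]
            elif op == "gt":              # ("gt", dest, lhs, rhs)
                env[insn[1]] = val(insn[2]) > val(insn[3])
            elif op == "phi":             # choose by predecessor block
                env[insn[1]] = val(insn[2][pred])
            elif op == "br":              # conditional branch
                pred, label = label, (insn[2] if env[insn[1]] else insn[3])
                break
            elif op == "jmp":             # unconditional branch
                pred, label = label, insn[1]
                break
            elif op == "ret":
                return val(insn[1])

results = [(y, run(BEFORE, y), run(AFTER, y)) for y in range(-3, 4)]
print(results)
```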
  66. mem2reg Algorithm • Main operations – Phi placement (Lengauer-Tarjan algorithm) – Renaming of the variables – Removing loads/stores • Intermediate stage breaks SSA invariant – Defining semantics & well-formedness non-trivial
  67. vmem2reg Algorithm • Incremental algorithm • Pipeline of "micro transformations" – Preserves SSA semantics – Preserves well-formedness. See: [Aycock & Horspool 2002.] Pipeline stages shown: Find alloca, max φs, LAS/LAA, DSE, DAE, elim φ
  68. How to Establish Correctness? Pipeline stages shown: Find alloca, max φs, LAS/LAA, DSE, DAE, elim φ. Proof components shown: Aliasing Properties, subst, DIE. 1. Simple aliasing properties (e.g. to determine promotability) 2. Instantiate proof technique for – Substitution – Dead Instruction Elimination: P_DIE = …; Initialize(P_DIE), Preservation(P_DIE), Progress(P_DIE) 4. Put it all together to prove composition of “pipeline” correct.
  69. vmem2reg is Correct. Theorem: The vmem2reg algorithm preserves the semantics of the source program. Proof: Composition of simulation relations from the “mini” transformations, each built using instances of the sdom proof technique. (See Coq Vellvm development.) □
  70. Runtime overhead of verified mem2reg (chart: speedup over LLVM -O0, axis from 0% to 200%; benchmarks: sjeng, go, compress, ijpeg, gzip, vpr, mesa, art, ammp, equake, libquantum, lbm, milc, bzip2, parser, twolf, mcf, h264, Geo. mean; series: LLVM's mem2reg, Extracted mem2reg). Vmem2reg: 77%. LLVM’s mem2reg: 81% (LLVM’s mem2reg promotes allocas used by intrinsics).
  71. Plan • Tour of the LLVM IR • Vellvm infrastructure – Operational Semantics – SSA Metatheory + Proof Techniques • Case studies: – SoftBound memory safety – mem2reg • Conclusion
  72. Ongoing Work • Modular Semantics – Factor out memory model [CAV 15] – Linking/separate compilation • For: – more extensibility/robustness to changes – verifying more analyses and optimizations / program transformations – support for (relaxed) concurrency – better support for casts [PLDI 15] (diagram: LLVM SSA core IR; Memory Model / IO; Concurrency)
  73. • Deep Specifications – rich, formal, 2-sided, live • Layers of abstraction – Layer Calculus in CertiKOS [Shao et al.] – Good for proofs! – Bad for performance? – Implications for theory / proof engineering? • Compositional specification – Compositional CompCert [Stewart, et al. PLDI 15] • Software Engineering ⇒ Proof Engineering – Coq development methodology [CPDT: Chlipala] What engineering principles enable large-scale deep specifications?
  74. Conclusions • Proof techniques for verifying LLVM transformations • Verified: – SoftBound & vmem2reg – Similar performance to native implementations • Future: – Integration with other DeepSpec projects [http://www.cis.upenn.edu/~stevez/vellvm/]
