2. TSUNAMi
0 National Research
Foundation Project
0 2015 - 2020
0 Trustworthiness in
COTS-Integrated platforms
0 Trustworthy systems from
un-trusted components
http://www.comp.nus.edu.sg/~tsunami
Vulnerability
Discovery
Binary
Hardening
Verification
Data
Protection
Research Outputs
– Publications, Tools, Academic Collaboration, Exchanges,
Seminars & Workshops
Education – NUS
(New module)
Industry
Collaboration
Singapore Cyber-security
Consortium
Agency
Collaboration
- DSTA, etc
Enhancing Local
Capabilities
2
25th ASWEC Keynote Adelaide
3. 2016 event
3
A team of hackers won $2 million by building a
machine that could hack better than they could
DARPA Cyber Grand
Challenge
Automation of Security
~ detecting and fixing of
vulnerabilities
25th ASWEC Keynote Adelaide
4. In the rest of the talk
0 Capsule version
0 Executive summary
0 Slightly detailed version
0 A vision for trustworthy software
25th ASWEC Keynote Adelaide 4
Technology is driving us to distraction
5. Repair - Why?
25thASWECKeynoteAdelaide
5
Maintaining Legacy Software
Debugging Aid
Education, Grading in MooCs
Security Patches
Self-healing systems, Drones
Buggy
Program
Correctness
Criterion
Repair Patched
Program
6. Troubles with repair
25th ASWEC Keynote Adelaide 6
• Weak description of intended behavior / correctness criterion e.g. tests
• Possibility to use “Bugs as deviant behavior” philosophy
• Weak applicability of repair techniques e.g. only overflow errors
• Large search space of candidate patches for general-purpose repair tools.
• Patch suggestions and Interactive Repair
8. 8
1 int triangle(int a, int b, int c){
2 if (a <= 0 || b <= 0 || c <= 0)
3 return INVALID;
4 if (a == b && b == c)
5 return EQUILATERAL;
6 if (a == b || b != c) // bug!
7 return ISOSCELES;
8 return SCALENE;
9 }
Test id a b c oracle Pass
1 -1 -1 -1 INVALID pass
2 1 1 1 EQUILATERAL pass
3 2 2 3 ISOSCELES pass
4 2 3 2 ISOSCELES fail
5 3 2 2 ISOSCELES fail
6 2 3 4 SCALENES fail
Correct fix
(a == b || b == c || a== c)
Traverse all mutations of line 6, and check
Hard to generate correct fix since a==c never
appears elsewhere in the program.
OR
Generate the constraint
f(2,2,3)f(2,3,2) f(3,2,2)f(2,3,4)
and get the solution
f(a,b,c) = (a == b || b == c || a== c)
25th ASWEC Keynote Adelaide
9. Comparison
Syntactic Program Repair
1. Where to fix, which line?
2. Generate patches in the candidate
line
3. Validate the candidate patches
against correctness criterion.
Semantic Program Repair
1. Where to fix, which line(s)?
2. What values should be returned by
those lines, e.g. <inp ==1, ret== 0>
3. What are the expressions which
will return such values?
9
Syntax-based Schematic
for e in Search-space{
Validate e against Tests
}
Semantics-based Schematic
for t in Tests {
generate repair constraint Ψt
}
Synthesize e from ∧tΨt
25th ASWEC Keynote Adelaide
11. 11
1 i f ( hbtype == TLS1 HB REQUEST) {
2 . . .
3 memcpy (bp , pl , payload ) ;
4 . . .
5 }
(a) The buggy part of the Heartbleed-
vulnerable OpenSSL
1 i f ( hbtype == TLS1 HB REQUEST
2 && payload + 18 < s->s3->rrec.length) {
3 . . .
4 }
(b) A fix generated automatically
1 if (1 + 2 + payload + 16 > s->s3->rrec.length)
2 return 0;
3 . . .
4 i f ( hbtype == TLS1_HB_REQUEST) {
5 . . .
6 }
7 e l s e i f ( hbtype == TLS1_HB_RESPONSE) {
8 . . .
9 }
10 r e t u r n 0 ;
(c) The developer-provided repair
The Heartbleed Bug is a serious vulnerabilityin the popular OpenSSL
cryptographicsoftware library. This weakness allows stealing the
information protected, under normal conditions, by the SSL/TLS
encryption used to secure the Internet. SSL/TLS provides
communication security and privacy over the Internet for applications
such as web, email, instant messaging (IM) and some virtual private
networks (VPNs).
--- Source: heartbleed.com
25thASWECKeynoteAdelaide
12. Application in education
12
Use program repair in intelligent
tutoring systems to give the students’
individual attention.
Detailed Study in IIT-Kanpur, India by
Prof. Amey Karkare and team
(FSE17)
25th ASWEC Keynote Adelaide
13. In the rest of the talk
0 Capsule version
0 Executive summary
0 Slightly detailed version
0 A vision for trustworthy software
25th ASWEC Keynote Adelaide 13
14. Trustworthy Software:
Space of Problems
0 Fuzz Testing
0 Feed semi-random inputs to find hangs and crashes
0 Continuous fuzzing
0 Incrementally find new “problems” in software
0 Crash reproduction
0 Re-construct a reported crash, crashing input not included
due to privacy
0 Reaching nooks and corners
0 Localizing reported observable errors
0 Patching reported errors from input-output examples
1425th ASWEC Keynote Adelaide
15. Space of Techniques
Search
0 Random
0 Biased-random
0 Genetic (AFL Fuzzer)
0 Low set-up overhead
0 Fast, less accurate
0 Use objective function to
steer
Symbolic Execution
0 Dynamic Symbolic execution
0 Concolic Execution
0 Cluster paths based on symbolic
expressions of variables
0 High set-up overhead
0 Slow, more accurate
0 Use logical formula to steer
1525th ASWEC Keynote Adelaide
16. Search & SE: Interplay
(Random) Search
0 Trade-offs
0 Less systematic
0 Easy set-up, execute up to a
time budget
0 Enhance the effectiveness of
search, with symbolic execution
as inspiration
Symbolic Execution
0 Trade-offs
0 Systematic
0 More involved set-up, solver calls.
0 Explore capabilities of symbolic
execution beyond directed search
1625th ASWEC Keynote Adelaide
17. Symbolic Execution Tree
25th ASWEC Keynote Adelaide 17
int test_me(int Climb, int Up){
int sep, upward;
if (Climb > 0){
sep = Up;}
else {sep = add100(Up);}
if (sep > 150){
upward = 1;
} else {upward = 0;}
if (upward < 0){
abort;
} else return upward;
}
Climb > 0
Up > 150
Yes
1 < 0
Yes
Infeasible
Climb ==1,
Up == 200
1 < 0
No
Infeasible Climb ==1,
Up == 100
….
Yes YesNo No
18. Typical Use of Symbolic
Execution
25th ASWEC Keynote Adelaide 18
Program Analysis for Bug Finding
Concolic execution: supporting real executions
[Directed Automated Random Testing]
Symbolic execution tree construction e.g. KLEE
[Modeling system environment]
Grey-box fuzz testing for systematic path exploration
inspired by concolic execution
AFLFast
Random Testing for Bug Finding
19. AFLFast [CCS16]
25th ASWEC Keynote Adelaide 19
Schematic
• if (condition1)
• return // short path, frequented by many inputs
• else if (condition2)
• exit // short paths, frequented by many inputs
• else …. Use grey-box fuzzer which keeps track of path id for a test.
Estimate whether a discovered path is higb probability or not
Higher weightage to low probability paths discovered, to gravitate to
those -> discover new paths with minimal effort.
Integrated into distribution of AFL fuzzer within a year of
publication (CCS16), which is used on a daily basis by corporations
for finding vulnerabilities
20. Another use of Symbolic
Execution [CCS17]
25th ASWEC Keynote Adelaide 20
Reachability Analysis
Reachability of a location in the program
- Traverse the symbolic execution tree using search strategies
- Encode it as an optimization
problem inside the genetic search
of grey-box fuzzing AFLGo
22. Novel use of symbolic execution
25th ASWEC Keynote Adelaide 22
In the absence of formal specifications, analyze the buggy
program and its artifacts such as execution traces via various
heuristics to glean a specification about how it can pass tests
and what could have gone wrong!
Specification Inference (TODAY!)
(application: localization, synthesis, repair)
23. Automated Program Repair
0 Weak description of intended behavior / correctness criterion e.g. tests
0 Weak applicability of repair techniques e.g. only overflow errors
0 Large search space of candidate patches for general-purpose repair tools.
25th ASWEC Keynote Adelaide 23
24. Specification Inference
25th ASWEC Keynote Adelaide
Test input
Concrete
values
Expected output of program
Output:
Value-set or Constraint
Symbolic
execution
Program
Concrete Execution
24
ICSE13, FSE18
25. Example
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = down_sep; // bias= up_sep + 100
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
inhibit up_sep down_sep Observed
output
Expected
Output
Result
1 0 100 0 0 pass
1 11 110 0 1 fail
0 100 50 1 1 pass
1 -20 60 0 1 fail
0 0 10 0 0 pass
2525th ASWEC Keynote Adelaide
26. Example
26
1 int is_upward( int inhibit, int up_sep, int down_sep){
2 int bias;
3 if (inhibit)
4 bias = f(inhibit, up_sep, down_sep) // X
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
Inhibit == 1 up_sep == 11 down_sep == 110
Symbolic Execution
( pcj outj == expected_out(t) )
f(t) == X
j Paths
Repair constraint
( (X >110 1 ==1)
(X ≤ 110 0 == 1)
)
f(1,11,110) == X
25thASWECKeynote
Adelaide
26
27. What it should have been
1 int is_upward( int inhibit, int up_sep, int
down_sep){
2 int bias;
3 if (inhibit)
4 bias = f(inhibit, up_sep, down_sep)
5 else bias = up_sep ;
6 if (bias > down_sep)
7 return 1;
8 else return 0;
9 }
Inhibit
== 1
up_sep ==
11
down_sep
== 110
Symbolic Execution
f(1,11,110) > 110
25thASWECKeynote
Adelaide
27
28. Fix the suspect
0 Accumulated constraints
0 f(1,11, 110) > 110
0 f(1,0,100) ≤ 100
0 …
0 Find a f satisfying this constraint
0 By fixing the set of operators appearing in f
0 Candidate methods
0 Search over the space of expressions
0 Program synthesis with fixed set of operators
0 Can also be achieved by second-order constraint solving
0 Generated fix
0 f(inhibit,up_sep,down_sep) = up_sep + 100
25thASWECKeynote
Adelaide
28
29. Term = Var | Constant | Term + Term |
Term – Term | Constant * Term
(Second order) Synthesis
29
scanf(“%d”, &x);
for(i = 0; i <10; i++){
int t = (i,x);
if (t > 0) printf(“1”);
else printf(“0”);
}
P(5) “1110000000” expected “1111111000”
Buggy Program:
Sample Test:
Synthesis Specification: . i i output = expected Solve for directly
Term = Var | Constant | Term + Term |
Term – Term | Constant * Term
𝜌 0,5
> 0
𝜌 1,5
> 0
𝜌 1,5
> 0
𝜌 2,5
> 0
𝜌 2,5
> 0
𝜌 2,5
> 0
𝜌 2,5
> 0
Yes No
Yes No Yes No
𝑈𝑁𝑆𝐴𝑇
Yes
Term = Var | Constant | Term + Term |
Term – Term | Constant * Term
Existing component-based synthesis encodings provide inefficient unsatisfiability proofs,
because of using linear integer arithmetic encoding.
Designed a new encoding based on propositional logic for efficient unsatisfiability proofs.
25th ASWEC Keynote Adelaide
30. DirectFix [ICSE2015]
30
Tests Debugging DSE
Synthesis
Tests
MaxSMT solver
Conjure a function which
represents minimal change
to buggy program.
25th ASWEC Keynote Adelaide
31. Example
31
if (x > y)
if (x > z)
out =10;
else
out = 20;
else
out = 30;
return out;
if (x >= y)
if (x >= z)
out =10;
else
out = 20;
else
out = 30;
return out;
if (x > y)
if (x > z)
out =10;
else
out = 20;
else
out = 30;
return ((x==y)? ((x==z)?10: 20)): out);
SemFix
DirectFix
Test cases:
all possible
orderings of x,y,z
25th ASWEC Keynote Adelaide
#Pgm Equiv Same Loc Diff
SemFix 44 17% 46% 6.36
DirectFix 44 53% 95% 2.31
37. SemGraft Results
37
Program Commit Bug Angelix SemGraft
sed c35545a Handle empty match Correct Correct
seq f7d1c59 Wrong output Correct Correct
sed 7666fa1 Wrong output Incorrect Correct
sort d1ed3e6 Wrong output Incorrect Correct
seq d86d20b Don’t accepts 0 Incorrect Correct
sed 3a9365e Handle s/// Incorrect Correct
Program Commit Bug Angelix SemGraft
mkdir f7d1c59 Segmentation fault Incorrect Correct
mkfifo cdb1682 Segmentation fault Incorrect Correct
mknod cdb1682 Segmentation fault Incorrect Correct
copy f3653f0 Failed to copy a file Correct Correct
md5sum 739cf4e Segmentation fault Correct Correct
cut 6f374d7 Wrong output Incorrect Correct
GNU Coreutils
as reference
Linux Busybox
as reference
25th ASWEC Keynote Adelaide
38. Specification Inference [FSE18]
x = (… );
Test input
Concrete
values
Expected output of program
Output:
Value-set or Constraint
Symbolic execution
Program
Concrete Execution
38
𝜋1 𝜋2 𝜋 𝑛
Program
Tests or
Property
Environment
ModelingLibrary
Buggy
Program
Patched
Program
Tests or
Property
Repair
25th ASWEC Keynote Adelaide
39. Application in Library
Modeling
0 Modeling libraries for symbolic execution of application program.
0 Do not manually provide libraries for symbolic analysis.
0 Instead, they can be partially synthesized.
25th ASWEC Keynote Adelaide 39
void main (int argc , char * argv
[]) {
int a = atoi( argv [1]) ;
printf ("%dn", 16 / a);
}
void atoi_sketch ( char *arr []) {
int acc;
for (i = 0; i < strlen (arr); i++)
acc = (acc , arr[i]);
return acc;
}
Second order
reasoning
= xy. 10 x + y - 48
P(“4") “4";
P(“16") “1" Tests
Use this and symbolic execution to
find crashing inputs e.g. “0”
40. Application in education FSE17
40
Use program repair in intelligent tutoring systems to give the
students’ individual attention. Detailed Study in IIT-Kanpur,
(FSE17, and ongoing)
25th ASWEC Keynote Adelaide
The fact that student programs are often
significantly incorrect makes it difficult to
fix those programs.
P:8
F:2
P:5
F:5
P:6
F:4
P:9
F:1partial repair
P:10
F: 0
P: # of passing tests
F: # of failing tests
APR
(symbolic execution)
Test 1
Test 2
Test 3
Test 4
Test 5
Test 1
Test 2
Test 3
Test 4
Test 5
Test 1
Test 2
Test 3
Test 4
Test 5
+ if (true) {
S;
+ }
(search)
+ if (E) {
S;
+ }
41. Application in Education
25th ASWEC Keynote Adelaide 41
43 buggy student submissions from dataset
Across 8 unique problems
37 TA graders volunteered for study
Each TA gets all 43 submissions to grade
With repair hints for half the submissions
42. Acknowledgments
42
Discussions:
Umair Ahmed & Amey Karkare (IIT-K)
Marcel Boehme (Monash),
Cristian Cadar (Imperial)
Satish Chandra (Facebook),
Kenneth Cheong & Chia Yuan Cho (DSO),
Claire Le Goues (CMU),
Lars Grunske & Yannic Noller(Humboldt)
Sergey Mechtaev (University College London)
Martin Monperrus (KTH)
HDT Nguyen and Dawei Qi
Manh-Dung Nguyen & Van-Thuan Pham (NUS)
Michael Pradel (Darmstadt)
Mukul Prasad & Hiroaki Yoshida (Fujitsu),
Shin Hwei Tan (SUSTech)
Jooyong Yi (Innopolis)
Relevant papers:
http://www.comp.nus.edu.sg/~abhik/projects/Repair/index.html
http://www.comp.nus.edu.sg/~abhik/projects/Fuzz/
25th ASWEC Keynote Adelaide
43. JOINT R&D
Seed funding
(Industry-Academia pair)
Infrastructure sharing
TECHNOLOGY TALKS
Latest technologies and trends
Project showcases
WILD & CRAZY IDEAS
(WACI) DAY
Research ideas
Problem statements
SPECIAL INTEREST GROUPS
Knowledge and idea exchange
R&D partnership exploration
CYBERSECURITY
CAMP
Workshop, Industry talks
Hackathons
CYBERSECURITY
LEAN LAUNCHPAD
Business +
Technical Discussions
Engage via training
Engage via discussions
and advice
Engage via research
collaboration
SCYFI
Research Showcase
Research Presentations
Talks
SG Cyber-security Consortium
40 companies, 10 platinum member companies (>100M)
25th ASWEC Keynote Adelaide 43
44. Special Interest Groups (SIGs)
Future
possibilities
Threat Intelligence
and Incident Response
Led by
Data Protection
and Privacy
Led by
System and
Software Security
Led by
Mobile Security
Led by
Cybercrime and
Investigation
Led by
Led by
Cyber-Physical System
(CPS) and IoT Security
Regular meetings organized in-between major events
Discussions leading to formulation of grant calls
25th ASWEC Keynote Adelaide 44
45. Deployment
0 Grey-box Fuzzing
0 Enhanced and (more) systematic path coverage & directed-ness in grey-
box fuzzing.
0 AFLFast integrated into main AFL distribution after discussions.
0 Automated Program Repair
0 Infer specifications about repairing a program to meet correctness
requirements.
0 Angelix tool used for scalable program repair ~ 80 groups
0 Automated grading, and hints for programming assignments from partial
repairs
0 Moving forward
0 Fuzzing and repair tightly integrated for self-healing software.
0 Software can morph to respond to changes in environment.
0 Built-in Self Test (BIST) for autonomous software systems
0 Global as well as local usage via SG Cyber-security Consortium
25th ASWEC Keynote Adelaide 45
46. BIST for Software
Common in critical systems such as avionics, military and others,
supplemented by a repair.
Can autonomous software test and repair itself autonomously to cater
for corner cases? Can autonomous software repair itself subject to
changes in environment?
?