Similarity of Source Code in the Presence of Pervasive Modifications
Chaiyong Ragkhitwetsagul, Jens Krinke, David Clark
Centre for Research on Evolution, Search and Testing (CREST)
Dept. of Computer Science, UCL, London, UK
Similarity of Source Code in the Presence of Pervasive Modifications — C. Ragkhitwetsagul, J. Krinke, D. Clark — CREST, UCL, UK
Pervasive Modifications
/* ORIGINAL */
private static int partition(Comparable[] a, int lo, int hi) {
    int i = lo;
    int j = hi+1;
    Comparable v = a[lo];
    while (true) {
        while (less(a[++i], v)) {
            if (i == hi) break;
        }
        while (less(v, a[--j])) {
            if (j == lo) break;
        }
        if (i >= j) break;
        exch(a, i, j);
    }
    exch(a, lo, j);
    return j;
}

/* PERVASIVELY MODIFIED CODE */
private static int partition(int[] bob, int left, int right){
    int x = left;
    int y = right+1;
    for (;;) {
        while (less(bob[left],bob[--y]))
            if (y == left) break;
        while (less(bob[++x],bob[left]))
            if (x == right) break;
        if (x >= y) break;
        swap(bob, y, x);
    }
    swap(bob, y, left);
    return y;
}
From: https://www.princeton.edu/pr/pub/integrity/pages/plagiarism/
Pervasive Modifications
Changes affecting many locations in the whole method, file, or project
Examples: layout changes, identifier renaming, API changes, refactoring
Occur in code cloning, software plagiarism, and software evolution
Do not include (strong) code obfuscation
When source code is pervasively modified, which similarity detection techniques or tools get the most accurate results?
30 Similarity Analysers
Clone detectors: CCFinderX, iClones, Simian, NiCad, Deckard
Plagiarism detectors: JPlag, Plaggie, Sherlock, Sim
Compression: 7zncd, bzip2ncd, gzipncd, xz-ncd, icd, ncd
Others: diff, bsdiff, difflib, fuzzywuzzy, jellyfish, ngram, sklearn
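To make the compression-based and general textual categories more concrete, here is a minimal Python sketch of two such measures: a normalised compression distance (NCD) computed with bzip2, and a `difflib` sequence ratio. It is not the code of any analyser listed above. The function names, the 0 to 100 scaling, and the clamping are assumptions made for illustration, and the two fragments are shortened from the partition example.

```python
import bz2
import difflib


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance with bzip2 as the compressor."""
    cx = len(bz2.compress(x))
    cy = len(bz2.compress(y))
    cxy = len(bz2.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_similarity(a: str, b: str) -> float:
    """Map NCD onto a 0-100 similarity score (assumed scale, clamped)."""
    distance = ncd(a.encode("utf-8"), b.encode("utf-8"))
    return 100.0 * max(0.0, min(1.0, 1.0 - distance))


def difflib_similarity(a: str, b: str) -> float:
    """Character-level sequence ratio, scaled to 0-100, with autojunk disabled."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return 100.0 * matcher.ratio()


# Shortened fragments of the original and the pervasively modified method.
ORIGINAL = (
    "int i = lo;\n"
    "int j = hi+1;\n"
    "while (true) {\n"
    "    if (i >= j) break;\n"
    "    exch(a, i, j);\n"
    "}\n"
)
MODIFIED = (
    "int x = left;\n"
    "int y = right+1;\n"
    "for (;;) {\n"
    "    if (x >= y) break;\n"
    "    swap(bob, y, x);\n"
    "}\n"
)

if __name__ == "__main__":
    print("ncd:", round(ncd_similarity(ORIGINAL, MODIFIED), 1))
    print("difflib:", round(difflib_similarity(ORIGINAL, MODIFIED), 1))
```

Both functions treat the code as plain text, so renaming and layout changes lower the score directly. On inputs this short the compressor's fixed overhead dominates, so NCD values are only indicative.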
Test Data Generation
Original source: InfixConverter.java, SqrtAlgorithm.java, Hanoi.java, Queens.java, MagicSquare.java
Source obfuscator: ARTIFICE
Compiler: javac
Bytecode obfuscator: ProGuard
Decompilers: Krakatau, Procyon
Output: pervasively modified code, to be used in the detection phase
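The variant names that appear in the similarity report (for example `orig_no_krakatau` or `artifice_pg_procyon`) suggest that each stage of this pipeline is either applied or skipped. The Python sketch below enumerates the variants under that reading. The directory-style names and the helper `variants_for` are hypothetical and chosen only to mirror the report's labels.

```python
from itertools import product

# The five original Java files named on the slide (extension dropped).
SOURCE_FILES = ["InfixConverter", "SqrtAlgorithm", "Hanoi", "Queens", "MagicSquare"]

SOURCE_STAGES = ["orig", "artifice"]    # without / with the ARTIFICE source obfuscator
BYTECODE_STAGES = ["no", "pg"]          # without / with the ProGuard bytecode obfuscator
DECOMPILERS = ["krakatau", "procyon"]   # decompiler used to get back to source


def variants_for(name):
    """List the variant labels for one original file (assumed naming scheme)."""
    # Two source-level variants that never go through compilation.
    labels = [f"{name}/{stage}" for stage in SOURCE_STAGES]
    # Eight variants that are compiled, optionally obfuscated, then decompiled.
    labels += [
        f"{name}/{src}_{byte}_{dec}"
        for src, byte, dec in product(SOURCE_STAGES, BYTECODE_STAGES, DECOMPILERS)
    ]
    return labels


ALL_VARIANTS = [label for name in SOURCE_FILES for label in variants_for(name)]

if __name__ == "__main__":
    for label in variants_for("Hanoi"):
        print(label)
    print(len(ALL_VARIANTS), "files in total")
```

Under this assumption each original file yields ten variants, in the same order as the rows of the similarity report.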
Parameter Settings
Similarity Report
Columns list the same files in the same order as the rows: (1) InfC/orig, (2) InfC/artfc, (3) InfC/orig_no_krakatau, (4) InfC/orig_no_procyon, (5) InfC/orig_pg_krakatau, (6) InfC/orig_pg_procyon, (7) InfC/artfc_no_krakatau, (8) InfC/artfc_no_procyon, (9) InfC/artfc_pg_krakatau, (10) InfC/artfc_pg_procyon, (11) Sqrt/orig, (12) Sqrt/artfc, …, (n-1) Squr/artfc_pg_krakatau, (n) Squr/artfc_pg_procyon.

File                             1    2    3    4    5    6    7    8    9   10   11   12    …  n-1    n
InfConv/orig                   100   55   36   63   32   43   34   60   31   43   20   20    …   14   17
InfConv/artifice                55  100   35   54   33   39   37   56   32   39   19   30    …   14   17
InfConv/orig_no_krakatau        36   35  100   38   60   26   80   35   59   26   13   14    …   28   17
InfConv/orig_no_procyon         63   54   38  100   34   58   37   80   34   58   21   20    …   15   21
InfConv/orig_pg_krakatau        32   33   60   34  100   33   61   33   82   33   17   17    …   29   20
InfConv/orig_pg_procyon         43   39   26   58   33  100   26   59   33  100   19   20    …   14   21
InfConv/artifice_no_krakatau    34   37   80   37   61   26  100   36   59   26   14   14    …   28   17
InfConv/artifice_no_procyon     60   56   35   80   33   59   36  100   32   59   19   20    …   15   19
InfConv/artifice_pg_krakatau    31   32   59   34   82   33   59   32  100   33   15   16    …   28   17
InfConv/artifice_pg_procyon     43   39   26   58   33  100   26   59   33  100   19   20    …   14   21
Sqrt/orig                       20   19   13   21   17   19   14   19   15   19  100   32    …   14   16
Sqrt/artifice                   20   30   14   20   17   20   14   20   16   20   32  100    …   15   18
…                                …    …    …    …    …    …    …    …    …    …    …    …    …    …    …
Square/artifice_pg_krakatau     14   14   28   15   29   14   28   15   28   14   14   15    …  100   32
Square/artifice_pg_procyon      17   17   17   21   20   21   17   19   17   21   16   18    …   32  100
Similarity Threshold = 50
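Applying a threshold turns the similarity report into yes/no decisions that can be scored. The Python sketch below does this for a four-file excerpt of the report (the InfConv/orig, InfConv/artifice, Sqrt/orig, and Sqrt/artifice rows and columns). It assumes that a pair counts as similar when its score is at least the threshold, that two files are truly similar when they derive from the same original, and that every ordered pair, including a file with itself, is scored. These conventions are assumptions, not details stated on the slide.

```python
# Four-file excerpt of the similarity report.
FILES = ["InfConv/orig", "InfConv/artifice", "Sqrt/orig", "Sqrt/artifice"]
SIM = [
    [100, 55, 20, 20],
    [55, 100, 19, 30],
    [20, 19, 100, 32],
    [20, 30, 32, 100],
]


def same_origin(a, b):
    """Assumed ground truth: variants of the same original file are similar."""
    return a.split("/")[0] == b.split("/")[0]


def evaluate(sim, files, threshold):
    """Return (precision, recall, F1) for one threshold over all ordered pairs."""
    tp = fp = fn = 0
    for i, a in enumerate(files):
        for j, b in enumerate(files):
            predicted = sim[i][j] >= threshold
            actual = same_origin(a, b)
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    precision, recall, f1 = evaluate(SIM, FILES, 50)
    print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.4f}")
```

On this excerpt a threshold of 50 reports no false pairs but misses the Sqrt/orig and Sqrt/artifice pair, whose score is 32. Figures for the full report would differ.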
Best Threshold
[Figure: F-measure (y-axis, 0.00 to 0.90) against threshold value T (x-axis, 0 to 100). The best threshold is T = 31, with F-measure = 0.8282.]
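Finding the best threshold amounts to a sweep over candidate values of T, keeping the one with the highest F-measure. The Python sketch below runs that sweep on a four-file excerpt of the similarity report. It assumes integer thresholds from 0 to 100, a "score at least T" decision rule, and ties broken in favour of the smallest T. None of these choices is stated on the slide.

```python
# Four-file excerpt of the similarity report.
FILES = ["InfConv/orig", "InfConv/artifice", "Sqrt/orig", "Sqrt/artifice"]
SIM = [
    [100, 55, 20, 20],
    [55, 100, 19, 30],
    [20, 19, 100, 32],
    [20, 30, 32, 100],
]


def f1_at(threshold):
    """F-measure over all ordered pairs when scores >= threshold count as similar."""
    tp = fp = fn = 0
    for i, a in enumerate(FILES):
        for j, b in enumerate(FILES):
            predicted = SIM[i][j] >= threshold
            # Assumed ground truth: same original file means truly similar.
            actual = a.split("/")[0] == b.split("/")[0]
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(candidates=range(0, 101)):
    """Return the first candidate with the highest F-measure."""
    return max(candidates, key=f1_at)


if __name__ == "__main__":
    t = best_threshold()
    print("best T:", t, "F-measure:", round(f1_at(t), 4))
```

On this small excerpt the classes separate perfectly for a narrow band of thresholds, which is not true of the full data set, where the best F-measure shown in the figure stays below 1.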
Optimal Configuration
Best Threshold, Best Parameter Settings
Results
Tool         Settings    T   Accuracy  Precision  Recall  AUC     Prec@n  F1
ccfx         b=20,t=1    4   0.9640    0.9145     0.9040  0.9468  0.9040  0.9095
simjava      r=22        5   0.9568    0.8769     0.9120  0.9490  0.8840  0.8941
jplag-text   t=8         2   0.9408    0.8235     0.8960  0.9453  0.8440  0.8582
py-difflib   noautojunk  35  0.9392    0.8901     0.7940  0.9147  0.8080  0.8393
7zncd-BZip2  mx=1        39  0.9368    0.8977     0.7720  0.9419  0.8180  0.8301
ncd-bzlib    -           31  0.9336    0.8584     0.8000  0.9482  0.8200  0.8282
jplag-java   t=3         43  0.9160    0.7526     0.8640  0.9667  0.7860  0.8045
py-sklearn   -           33  0.8488    0.5894     0.8040  0.9146  0.6200  0.6802
[Figure: F1 score (axis from 0.5 to 1) for each of the 30 tools, grouped by category.
Clone detectors: ccfx, deckard, iclones, nicad, simian
Plagiarism detectors: jplag-java, jplag-text, plaggie, sherlock, simjava, simtext
Compression: 7zncd-BZip2, 7zncd-LZMA, 7zncd-LZMA2, 7zncd-Deflate, 7zncd-Deflate64, 7zncd-PPMd, bzip2ncd, gzipncd, icd, ncd-bzlib, ncd-zlib, xz-ncd
Others: bsdiff, diff, py-difflib, py-fuzzywuzzy, py-jellyfish, py-ngram, py-sklearn]
Highly specialised source code similarity detection techniques and tools can perform better than more general, textual similarity measures.
Normalisation by Decompilation
Normalisation: pervasively modified code → Compile (javac) → Decompile (Krakatau, Procyon) → normalised code
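The normalisation step can be pictured as two external commands run one after the other: compile the modified source, then decompile the resulting class files. The Python sketch below only assembles those commands. The `javac -d` invocation is standard, but the decompiler command line is deliberately left to a caller-supplied function, since the exact Krakatau and Procyon invocations are not given on the slide. The name `my-decompiler` and the directory layout are hypothetical placeholders.

```python
import subprocess
from pathlib import Path


def normalisation_steps(source_file, work_dir, decompile_argv):
    """Build the compile and decompile commands for one source file.

    decompile_argv is a caller-supplied function that takes the class
    directory and the output directory and returns the decompiler's argv.
    """
    classes_dir = Path(work_dir) / "classes"
    normalised_dir = Path(work_dir) / "normalised"
    compile_cmd = ["javac", "-d", str(classes_dir), str(source_file)]
    decompile_cmd = list(decompile_argv(classes_dir, normalised_dir))
    return [compile_cmd, decompile_cmd]


def run_steps(steps, work_dir):
    """Create the working directories and run each command in order."""
    (Path(work_dir) / "classes").mkdir(parents=True, exist_ok=True)
    (Path(work_dir) / "normalised").mkdir(parents=True, exist_ok=True)
    for cmd in steps:
        subprocess.run(cmd, check=True)


def example_decompiler(classes_dir, out_dir):
    """Hypothetical placeholder, not the command line of a real decompiler."""
    return ["my-decompiler", str(classes_dir), str(out_dir)]


if __name__ == "__main__":
    for cmd in normalisation_steps("Hanoi.java", "work", example_decompiler):
        print(" ".join(cmd))
```

Running every file, modified or not, through the same compiler and decompiler gives the similarity tools input with one consistent layout and naming style.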
Code Before Decompilation
Code After Decompilation
[Figure: F1 score (axis from 0.5 to 1) for each of the 30 tools on original (Orig.) and decompiled (Dec.) code, grouped into clone detectors, plagiarism detectors, compression-based tools, and others.]
Compilation and decompilation can be used as an effective normalisation method that greatly improves similarity detection on Java source code.
More info: http://crest.cs.ucl.ac.uk/resources/cloplag/
Similarity of Source Code in the Presence of Pervasive Modifications [SCAM'16]