Guide: Ms Sangeetha Jamal, Dept of Computer Science
Presented by: Merin Paul, MTech CS-IS S1


9/25/2012                                          1
Contents
• Introduction
• Types of Source-code Plagiarism
    - Textual Similarity
    - Functional Similarity
• Source Code Detection Algorithms
• Detecting Techniques
• Tools used for code-based plagiarism
• Conclusion


Introduction
• Plagiarism in source-code files occurs when source code is copied
  and edited without proper acknowledgment of the original author.

• Techniques for disguising plagiarism: lexical changes and
  structural changes.

• Lexical changes: changes that can be made to the source code
  without affecting the parsing of the program.


• Structural changes: changes made to the source code that affect
  the parsing of the code and involve program debugging.

• Reasons for code copying:
    - Code reuse.
    - Programmer limitations.
    - Coincidental implementation of the same logic.


TYPES OF SOURCE CODE PLAGIARISM
• Textual Similarity
• Functional Similarity




Textual Similarity
• Two source-code fragments are textually similar when they look
  alike based on their textual content.

• Textual content means the words, letters, variable names, etc.

• Three types: Type I, Type II, Type III.




Type I
• The copied code fragment is the same as the original, with no
  modification except to white space, comments and line layout.

  Original fragment:
        int a; // counter
        // count five times
        for(a = 0; a < 5; a++)
        {
            printf("a = %d", a); // print value of a
        }
        return 0;

  Copied fragment:
        int a;
        /* Increment a in a loop and print its value */
        for(a = 0; a < 5; a++){
        printf("a = %d", a);
        }
        return 0;
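
The Type I comparison above can be sketched in a few lines. This is a minimal illustration, not a production detector: the helper name `normalize_type1` is hypothetical, and the regular expressions assume C-style comments that never occur inside string literals. Removing all white space is deliberately crude (it also merges `int a` into `inta`), which is acceptable only because both fragments are treated the same way.

```python
import re

def normalize_type1(source):
    """Strip C-style comments and all white space (naive: ignores string literals)."""
    no_block = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", "", no_block)
    return re.sub(r"\s+", "", no_line)

original = '''
int a; // counter
// count five times
for(a = 0; a < 5; a++)
{
    printf("a = %d", a); // print value of a
}
return 0;
'''

copied = '''
int a;
/* Increment a in a loop and print its value */
for(a = 0; a < 5; a++){
printf("a = %d", a);
}
return 0;
'''

# Two fragments form a Type I pair when their normalized forms are identical.
is_type1_pair = normalize_type1(original) == normalize_type1(copied)
print(is_type1_pair)
```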




Type II
• Same as Type I, plus modifications to variable names, function
  names and other user-defined identifiers.

  Original fragment:
      if(a > b)
      {
              a = a - 1;
              b = b * a; // comment 1
      }
      else
      {
              b = a; // comment 2
              a = 0;
      }
  Copied fragment:
      if(m > n)
      {m=m - 5;
      n=n*m; //my comment 1
      }
      else
      {n=m; //my comment 2
      m=0;
      }
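
One way to make a Type II pair comparable is to rename identifiers consistently in order of first appearance. The sketch below is an assumption-laden illustration: `normalize_type2` and the tiny `C_KEYWORDS` set are hypothetical, comments are stripped naively, and numeric literals are also collapsed to `NUM` because the copied fragment above changes `1` to `5`.

```python
import re

# Deliberately tiny keyword subset, enough for the fragments on the slide (assumption).
C_KEYWORDS = {"if", "else", "for", "while", "return", "int"}

def normalize_type2(source):
    """Return the token list with identifiers renamed id1, id2, ... and numbers as NUM."""
    code = re.sub(r"/\*.*?\*/|//[^\n]*", "", source, flags=re.DOTALL)
    names = {}
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", code):
        if tok[0].isdigit():
            out.append("NUM")
        elif (tok[0].isalpha() or tok[0] == "_") and tok not in C_KEYWORDS:
            # The same identifier always maps to the same placeholder.
            out.append(names.setdefault(tok, "id%d" % (len(names) + 1)))
        else:
            out.append(tok)
    return out

original = """
if(a > b)
{
    a = a - 1;
    b = b * a; // comment 1
}
else
{
    b = a; // comment 2
    a = 0;
}
"""

copied = """
if(m > n)
{m=m - 5;
n=n*m; //my comment 1
}
else
{n=m; //my comment 2
m=0;
}
"""

print(normalize_type2(original) == normalize_type2(copied))
```

Because the renaming is positional, swapping the roles of two variables would still be caught, whereas a plain text comparison would miss even a single renamed variable.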


Type III
• The copied code fragment is produced by inserting or removing
  unnecessary statements.

  Original fragment:
            if(a > b)
            {
                a = a - 1;
                b = b * a;
            }
            else
            {
                b = a;
                a = 0;
            }
  Copied fragment:
            if(a > b)
            {
                a = a - 1;
                c = 0; // this statement is added
                b = b * a;
            }
            else
            {
                b = a;
                a = 0;
            }
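
Since a Type III copy differs by whole statements, a line-level diff is one simple way to expose it. The sketch below uses Python's standard `difflib.SequenceMatcher`; the helper names `statements`, `line_similarity` and `added_statements` are hypothetical, and treating one source line as one statement is an assumption that holds for the fragments above but not in general.

```python
import difflib

def statements(source):
    """Split into non-empty lines with // comments and indentation removed."""
    lines = (line.split("//")[0].strip() for line in source.splitlines())
    return [line for line in lines if line]

def line_similarity(a, b):
    """Similarity in [0, 1] based on matching lines."""
    return difflib.SequenceMatcher(None, statements(a), statements(b)).ratio()

def added_statements(a, b):
    """Lines present in b that have no counterpart in a."""
    sa, sb = statements(a), statements(b)
    added = []
    for tag, _i1, _i2, j1, j2 in difflib.SequenceMatcher(None, sa, sb).get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(sb[j1:j2])
    return added

original = """
if(a > b)
{
    a = a - 1;
    b = b * a;
}
else
{
    b = a;
    a = 0;
}
"""

copied = """
if(a > b)
{
    a = a - 1;
    c = 0; // this statement is added
    b = b * a;
}
else
{
    b = a;
    a = 0;
}
"""

print(added_statements(original, copied))
print(round(line_similarity(original, copied), 3))
```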
Functional Similarity
• Refers to code fragments that have the same semantics or
  functionality.

  Fragment 1:
      int i, j = 1;
      for(i = 1; i <= VALUE; i++)
          j = j * i;

  Fragment 2:
      int factorial(int n)
      {
          if(n == 0) return 1;
          else return factorial(n - 1)*n;
      }
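
The two fragments share no textual structure, so one rough way to compare them is by behaviour. The sketch below ports both fragments to Python (the function names are hypothetical) and checks that they agree on a set of sample inputs. Agreement on samples only suggests functional similarity; it does not prove equivalence.

```python
def factorial_iterative(value):
    """Python port of fragment 1: multiply j by 1..value."""
    j = 1
    for i in range(1, value + 1):
        j = j * i
    return j

def factorial_recursive(n):
    """Python port of fragment 2."""
    if n == 0:
        return 1
    return factorial_recursive(n - 1) * n

def agree_on(inputs, f, g):
    """True when f and g return the same result for every sample input."""
    return all(f(x) == g(x) for x in inputs)

print(agree_on(range(10), factorial_iterative, factorial_recursive))
```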


Source Code Detection Algorithms
• Text-based
• Token-based
• Parse tree-based
• PDG-based
• Metrics-based
• Hybrid approaches




• Text-based
    - Finds textual matches between two source codes.
    - Simple and fast.

• Token-based
    - Uses a lexer to convert the program into tokens.
    - Finds matches in the token sequences.
    - More robust to simple text replacements.



• Parse tree-based
    - Builds and compares parse trees.
    - A parse tree contains the complete information about the
      source code.
    - Tree comparison can normalize conditional statements.

• Program Dependency Graphs (PDGs)
    - Capture the actual flow of control in a program.
    - Allow higher-level equivalences to be located.
    - More complex to build and compare.
• Metrics-based
    - Captures 'scores' of code segments according to certain
      criteria.
    - Metrics are simple to calculate.
    - Can lead to false positives.

• Hybrid
    - Combination of two or more of the previous techniques.



Detecting Techniques
• Detection via Lexical Similarities
    - Lexical analysis takes source code and converts it into a
      stream of lexical tokens.
    - The source code undergoes a series of transformations.
    - Identifying reserved words, identifiers and numbers is
      beneficial for plagiarism detection.




  Fragment A:
    int[] A = {1,2,3,4};
    for(int i = 0; i < A.length; i++) {
        A[i] = A[i] + 1;
    }

  Fragment B:
    int[] B = {1, 2, 3, 4};
    for(int j = 0; j < B.length; j++) {
        B[j] = B[j] + 1;
    }




  Both fragments produce the same token stream:

    LITERAL_int LBRACK RBRACK IDENT ASSIGN
    LCURLY NUM_INT COMMA NUM_INT COMMA NUM_INT COMMA NUM_INT RCURLY SEMI
    LITERAL_for LPAREN LITERAL_int IDENT ASSIGN NUM_INT SEMI
    IDENT LT IDENT DOT IDENT SEMI IDENT INC RPAREN LCURLY
    IDENT LBRACK IDENT RBRACK ASSIGN IDENT LBRACK IDENT RBRACK PLUS NUM_INT SEMI
    RCURLY
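
A small regular-expression lexer is enough to reproduce this kind of token stream for the two Java fragments. This is a sketch under stated assumptions: `TOKEN_SPEC` covers only the symbols those fragments use, the token names follow the slide's naming where it gives one (`PLUS` is an assumed name for `+`), and characters that match no rule, including white space, are silently skipped.

```python
import re

# (token name, pattern) pairs; order matters: keywords before IDENT, "++" before "+".
TOKEN_SPEC = [
    ("LITERAL_int", r"\bint\b"),
    ("LITERAL_for", r"\bfor\b"),
    ("NUM_INT", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("INC", r"\+\+"),
    ("PLUS", r"\+"),
    ("ASSIGN", r"="),
    ("LT", r"<"),
    ("DOT", r"\."),
    ("COMMA", r","),
    ("SEMI", r";"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("LCURLY", r"\{"),
    ("RCURLY", r"\}"),
]
LEXER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(source):
    """Return the list of token names, discarding the actual lexemes."""
    return [match.lastgroup for match in LEXER.finditer(source)]

fragment_a = """
int[] A = {1,2,3,4};
for(int i = 0; i < A.length; i++) {
    A[i] = A[i] + 1;
}
"""

fragment_b = """
int[] B = {1, 2, 3, 4};
for(int j = 0; j < B.length; j++) {
    B[j] = B[j] + 1;
}
"""

print(" ".join(tokenize(fragment_a)))
print(tokenize(fragment_a) == tokenize(fragment_b))
```

Discarding the lexemes is what makes renamed identifiers and reformatted code invisible to the comparison; the cost is that unrelated code with the same shape also looks identical.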




Detection via Parse Tree Similarities
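
As a rough illustration of comparing parse trees, the sketch below uses Python's standard `ast` module, so the programs being compared are Python rather than C or Java; a real detector would use a parser for the submission language. The helper `tree_shape` is hypothetical. It lists the node types of the tree and ignores identifier names and constant values, so renamed variables do not change the result.

```python
import ast

def tree_shape(source):
    """Node type names of the parse tree, ignoring names and constant values."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

program_a = """
values = [1, 2, 3, 4]
for i in range(len(values)):
    values[i] = values[i] + 1
"""

# Same structure as program_a with every identifier renamed.
program_b = """
data = [1, 2, 3, 4]
for k in range(len(data)):
    data[k] = data[k] + 1
"""

# Same result computed with a different construct.
program_c = """
data = [1, 2, 3, 4]
data = [x + 1 for x in data]
"""

print(tree_shape(program_a) == tree_shape(program_b))
print(tree_shape(program_a) == tree_shape(program_c))
```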




Detection via Metrics
• Calculate and compare attribute counts.

• Programs with similar attribute counts are potentially similar
  programs.

• Counts of operators and operands are typically used to construct
  attribute counts.
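
The attribute counts described above can be sketched as follows. The choice of attributes (total and distinct operators and operands) and the names `OPERATORS`, `KEYWORDS` and `attribute_counts` are assumptions made for illustration; the operator and keyword sets cover only what the earlier C fragments use.

```python
import re

OPERATORS = {"=", "+", "-", "*", "/", "<", ">", "++", "--", "==", "!=", "<=", ">="}
KEYWORDS = {"if", "else", "for", "while", "return", "int"}  # counted as neither

def attribute_counts(source):
    """Count operators and operands (identifiers and numbers) in a fragment."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\+\+|--|[<>=!]=|[-+*/<>=]", source)
    operators = [t for t in tokens if t in OPERATORS]
    operands = [t for t in tokens if t not in OPERATORS and t not in KEYWORDS]
    return {
        "operators": len(operators),
        "operands": len(operands),
        "distinct_operators": len(set(operators)),
        "distinct_operands": len(set(operands)),
    }

original = "if(a > b) { a = a - 1; b = b * a; } else { b = a; a = 0; }"
copied = "if(m > n) {m=m - 5; n=n*m; } else {n=m; m=0; }"

print(attribute_counts(original))
print(attribute_counts(original) == attribute_counts(copied))
```

Identical counts only flag a pair for closer inspection: two unrelated programs can share the same counts, which is the source of the false positives noted for metrics-based detection.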




Tools used for code-based plagiarism
• JPlag
    - Finds similarities among multiple sets of source code files.
    - Operates in two phases.
    - First phase: all programs to be compared are parsed and
      converted into token strings.
    - Second phase: token strings are compared in pairs to determine
      the similarity of each pair.
    - Being token-based, it is more robust to simple text
      replacements. It supports Java, C#, C, C++ and natural
      language text.
• MOSS (Measure Of Software Similarity)
    - Developed in 1994 by Alex Aiken.
    - Analyzes code written in languages such as C, C++, Python,
      Visual Basic, JavaScript, FORTRAN, Lisp and Ada.
    - Provided as an Internet service: it is given a list of source
      files and reports the similarities found between them.
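
MOSS is based on the winnowing fingerprinting algorithm of Schleimer, Wilkerson and Aiken. The sketch below shows the core idea only and is not MOSS's implementation: the function names, the parameters `k=5` and `window=4`, the use of `zlib.crc32` as the k-gram hash, and the Jaccard overlap score are all assumptions chosen for illustration.

```python
import zlib

def kgram_hashes(text, k):
    """Hash every substring of length k (crc32 keeps runs reproducible)."""
    return [zlib.crc32(text[i:i + k].encode("utf-8")) for i in range(len(text) - k + 1)]

def winnow(text, k=5, window=4):
    """Select (hash, position) fingerprints: the minimum hash of each window."""
    normalized = "".join(text.split()).lower()
    hashes = kgram_hashes(normalized, k)
    fingerprints = set()
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        smallest = min(win)
        # On ties, take the rightmost occurrence of the minimum.
        offset = window - 1 - win[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints

def fingerprint_overlap(a, b, k=5, window=4):
    """Jaccard overlap of the selected hash values of two documents."""
    ha = {h for h, _ in winnow(a, k, window)}
    hb = {h for h, _ in winnow(b, k, window)}
    if not ha and not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)

doc1 = "int total = 0; for (i = 0; i < n; i++) total += values[i];"
doc2 = "// header added by the copier\n" + doc1

print(fingerprint_overlap(doc1, doc1))
print(fingerprint_overlap(doc1, doc2) > 0)
```

With these parameters, any shared substring of at least `window + k - 1` characters (after normalization) contributes at least one common fingerprint, which is the guarantee that makes the selected subset of hashes sufficient for matching.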

• YAP (Yet Another Plague)
    - Token-based system.
    - Works in two phases.
    - The first phase generates a token file for each submission.
    - The second phase compares pairs of token files using the token
      matching algorithm Running-Karp-Rabin Greedy-String-Tiling
      (RKRGST).
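
The tiling step of the second phase can be sketched as plain Greedy String Tiling. This simplified version omits the Running-Karp-Rabin hashing that speeds up the search, so it is quadratic or worse; the names `greedy_string_tiling` and `tile_similarity`, the default `min_match=3`, and the score 2 x coverage / (len(a) + len(b)) are assumptions for illustration.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length) covering non-overlapping common runs."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_len = min_match
        matches = []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                # Extend the match while tokens agree and are not already tiled.
                while (i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    matches, max_len = [(i, j, k)], k
                elif k == max_len:
                    matches.append((i, j, k))
        if not matches:
            return tiles
        for i, j, k in matches:
            # Skip a maximal match that overlaps a tile placed earlier in this round.
            if not any(marked_a[i:i + k]) and not any(marked_b[j:j + k]):
                marked_a[i:i + k] = [True] * k
                marked_b[j:j + k] = [True] * k
                tiles.append((i, j, k))

def tile_similarity(a, b, min_match=3):
    """Fraction of both sequences covered by tiles."""
    if not a and not b:
        return 0.0
    coverage = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2.0 * coverage / (len(a) + len(b))

print(greedy_string_tiling(list("abcdefgh"), list("xxabcdyyefgh")))
print(tile_similarity(list("ABCDEF"), list("DEFABC")))
```

The second call compares two sequences whose halves are swapped; because each tile is matched independently of its position, reordering blocks of code does not lower the score.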



Conclusion
• Plagiarism in programming assignments is an inevitable issue for
  most academics teaching programming.
• Plagiarism detection systems are built for only a few languages.
• Most detection software checks submissions against a repository
  held within an organization.
• As the number of digital copies goes up, the repository grows, and
  the plagiarism detection software should be able to handle its
  size.


• Most popular plagiarism detection algorithms convert programs into
  token strings and compare them using string matching.
• The tokens of each document are compared on a pair-wise basis to
  determine similar source-code segments between the files.
• String-matching systems are language-dependent: they handle only
  the programming languages supported by their parsers.

THANK U!!!


9/25/2012                30

Más contenido relacionado

La actualidad más candente

Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyDan Sullivan, Ph.D.
 
A novel approach based on topic
A novel approach based on topicA novel approach based on topic
A novel approach based on topiccsandit
 
A Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueA Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueINFOGAIN PUBLICATION
 
A hybrid model to detect malicious executables
A hybrid model to detect malicious executablesA hybrid model to detect malicious executables
A hybrid model to detect malicious executablesUltraUploader
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET Journal
 
Automatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsAutomatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsUltraUploader
 
Using Clone Detection to Identify Bugs in Concurrent Software
Using Clone Detection to Identify Bugs in Concurrent SoftwareUsing Clone Detection to Identify Bugs in Concurrent Software
Using Clone Detection to Identify Bugs in Concurrent SoftwareICSM 2010
 
Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsAdrian Paschke
 
Survey of universal authentication protocol for mobile communication
Survey of universal authentication protocol for mobile communicationSurvey of universal authentication protocol for mobile communication
Survey of universal authentication protocol for mobile communicationAhmad Sharifi
 
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...TELKOMNIKA JOURNAL
 
Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?Sebastiano Panichella
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
 
Named Entity Recognition For Hindi-English code-mixed Twitter Text
Named Entity Recognition For Hindi-English code-mixed Twitter Text Named Entity Recognition For Hindi-English code-mixed Twitter Text
Named Entity Recognition For Hindi-English code-mixed Twitter Text Amogh Kawle
 
Myanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Modelijtsrd
 
Email Data Cleaning
Email Data CleaningEmail Data Cleaning
Email Data Cleaningfeiwin
 
Extracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsExtracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsPreetha Chatterjee
 

La actualidad más candente (20)

Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
 
A novel approach based on topic
A novel approach based on topicA novel approach based on topic
A novel approach based on topic
 
A Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueA Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid Technique
 
A hybrid model to detect malicious executables
A hybrid model to detect malicious executablesA hybrid model to detect malicious executables
A hybrid model to detect malicious executables
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine Learning
 
Automatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsAutomatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulators
 
Using Clone Detection to Identify Bugs in Concurrent Software
Using Clone Detection to Identify Bugs in Concurrent SoftwareUsing Clone Detection to Identify Bugs in Concurrent Software
Using Clone Detection to Identify Bugs in Concurrent Software
 
H017445260
H017445260H017445260
H017445260
 
Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and Systems
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Survey of universal authentication protocol for mobile communication
Survey of universal authentication protocol for mobile communicationSurvey of universal authentication protocol for mobile communication
Survey of universal authentication protocol for mobile communication
 
Oop
OopOop
Oop
 
C++ programing lanuage
C++ programing lanuageC++ programing lanuage
C++ programing lanuage
 
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
Semi-Supervised Keyphrase Extraction on Scientific Article using Fact-based S...
 
Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
 
Named Entity Recognition For Hindi-English code-mixed Twitter Text
Named Entity Recognition For Hindi-English code-mixed Twitter Text Named Entity Recognition For Hindi-English code-mixed Twitter Text
Named Entity Recognition For Hindi-English code-mixed Twitter Text
 
Myanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Model
 
Email Data Cleaning
Email Data CleaningEmail Data Cleaning
Email Data Cleaning
 
Extracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related ChatsExtracting Archival-Quality Information from Software-Related Chats
Extracting Archival-Quality Information from Software-Related Chats
 

Similar a Plagiarism introduction

A Platform for Application Risk Intelligence
A Platform for Application Risk IntelligenceA Platform for Application Risk Intelligence
A Platform for Application Risk IntelligenceCheckmarx
 
Software engineering principles in system software design
Software engineering principles in system software designSoftware engineering principles in system software design
Software engineering principles in system software designTech_MX
 
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...IRJET Journal
 
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...IRJET Journal
 
Basics of c# by sabir
Basics of c# by sabirBasics of c# by sabir
Basics of c# by sabirSabir Ali
 
Compiler gate question key
Compiler gate question keyCompiler gate question key
Compiler gate question keyArthyR3
 
Aspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NETAspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NETWaqas Tariq
 
distributing computing
distributing computingdistributing computing
distributing computingnibiganesh
 
A JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 BerlinA JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 BerlinAlexander Klimetschek
 
STATICMOCK : A Mock Object Framework for Compiled Languages
STATICMOCK : A Mock Object Framework for Compiled Languages STATICMOCK : A Mock Object Framework for Compiled Languages
STATICMOCK : A Mock Object Framework for Compiled Languages ijseajournal
 
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT IAEME Publication
 
Put Your Hands in the Mud: What Technique, Why, and How
Put Your Hands in the Mud: What Technique, Why, and HowPut Your Hands in the Mud: What Technique, Why, and How
Put Your Hands in the Mud: What Technique, Why, and HowMassimiliano Di Penta
 
Euro python 2015 writing quality code
Euro python 2015   writing quality codeEuro python 2015   writing quality code
Euro python 2015 writing quality coderadek_j
 
Project_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_finalProject_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_finalJerin John
 
Tag Clouds for Object-Oriented Source Code Visualization
Tag Clouds for Object-Oriented Source Code VisualizationTag Clouds for Object-Oriented Source Code Visualization
Tag Clouds for Object-Oriented Source Code VisualizationRa'Fat Al-Msie'deen
 
A Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdfA Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdfKayla Smith
 

Similar a Plagiarism introduction (20)

A Platform for Application Risk Intelligence
A Platform for Application Risk IntelligenceA Platform for Application Risk Intelligence
A Platform for Application Risk Intelligence
 
Software engineering principles in system software design
Software engineering principles in system software designSoftware engineering principles in system software design
Software engineering principles in system software design
 
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
 
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
Study on Different Code-Clone Detection Techniques & Approaches to MitigateCo...
 
Basics of c# by sabir
Basics of c# by sabirBasics of c# by sabir
Basics of c# by sabir
 
7068458.ppt
7068458.ppt7068458.ppt
7068458.ppt
 
Compiler gate question key
Compiler gate question keyCompiler gate question key
Compiler gate question key
 
Aspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NETAspect Oriented Programming Through C#.NET
Aspect Oriented Programming Through C#.NET
 
C sharp
C sharpC sharp
C sharp
 
distributing computing
distributing computingdistributing computing
distributing computing
 
A JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 BerlinA JCR View of the World - adaptTo() 2012 Berlin
A JCR View of the World - adaptTo() 2012 Berlin
 
STATICMOCK : A Mock Object Framework for Compiled Languages
STATICMOCK : A Mock Object Framework for Compiled Languages STATICMOCK : A Mock Object Framework for Compiled Languages
STATICMOCK : A Mock Object Framework for Compiled Languages
 
Learning activity 3
Learning activity 3Learning activity 3
Learning activity 3
 
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT
 
Put Your Hands in the Mud: What Technique, Why, and How
Put Your Hands in the Mud: What Technique, Why, and HowPut Your Hands in the Mud: What Technique, Why, and How
Put Your Hands in the Mud: What Technique, Why, and How
 
Euro python 2015 writing quality code
Euro python 2015   writing quality codeEuro python 2015   writing quality code
Euro python 2015 writing quality code
 
Project_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_finalProject_Report (BARC-Jerin)_final
Project_Report (BARC-Jerin)_final
 
Tag Clouds for Object-Oriented Source Code Visualization
Tag Clouds for Object-Oriented Source Code VisualizationTag Clouds for Object-Oriented Source Code Visualization
Tag Clouds for Object-Oriented Source Code Visualization
 
Objective-C
Objective-CObjective-C
Objective-C
 
A Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdfA Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdf
 

Último

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 

Último (20)

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..

Plagiarism introduction

  • 1. Guide: Ms Sangeetha Jamal, Dept of Computer Science. Presented by: Merin Paul, MTech CS-IS S1. 9/25/2012
  • 2. Contents
        Introduction
        Types of Source-code Plagiarism
            Textual Similarity
            Functional Similarity
        Source Code Detection Algorithms
        Detecting Techniques
        Tools used for code-based plagiarism detection
        Conclusion
  • 3. Introduction
        Plagiarism in source-code files occurs when source code is copied and edited without proper acknowledgment of the original author.
        Techniques used to disguise copied code: lexical changes and structural changes.
        Lexical changes: changes that can be made to the source code without affecting the parsing of the program.
  • 4. Introduction (contd.)
        Structural changes: changes to the source code that affect how the code is parsed and that require the program to be debugged again.
        Reasons for code copying:
            Code reuse.
            Programmer limitations.
            Coincidence: two programmers independently implement the same logic.
  • 5. Types of Source Code Plagiarism
        Textual Similarity
        Functional Similarity
  • 6. Textual Similarity
        Two source-code files look similar based on their textual content.
        Textual content means the words, letters, variable names, etc.
        There are three types: Type I, Type II and Type III.
  • 7. Type I
        The copied code fragment is the same as the original except for changes to white space, comments and line layout.
        Original:
            int a; // counter
            // count five times
            for(a = 0; a < 5; a++)
            {
                printf("a = %d", a); // print value of a
            }
            return 0;
  • 8. Type I (contd.)
        Copy:
            int a;
            /* Loop increasing of a and print a value of it */
            for(a = 0; a < 5; a++){
                printf("a = %d", a);
            }
            return 0;
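The two Type I fragments above differ only in comments and layout, so removing both should leave identical text. The following is a minimal sketch of that idea, not a method taken from the slides. The function name `normalize_type1` is hypothetical, and the regular expressions assume that no string literal contains `//` or `/*`; a real tool would use a proper lexer.

```python
import re

def normalize_type1(code: str) -> str:
    """Remove C comments and all whitespace from a code fragment.

    Assumption: no string literal contains '//' or '/*'.
    """
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", "", code)                    # line comments
    return re.sub(r"\s+", "", code)                         # spaces, tabs, newlines

original = '''
int a; // counter
// count five times
for(a = 0; a < 5; a++)
{
    printf("a = %d", a); // print value of a
}
return 0;
'''

copy = '''
int a;
/* Loop increasing of a and print a value of it */
for(a = 0; a < 5; a++){
    printf("a = %d", a);
}
return 0;
'''

# The fragments are a Type I pair if they are equal after normalization.
print(normalize_type1(original) == normalize_type1(copy))
```

Deleting every whitespace character is deliberately crude: it also merges `int a` into `inta`, which is harmless when both sides are treated the same way but is one reason token-based comparison is usually preferred.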
  • 9. Type II
        Same as Type I, but also with modifications to variable names, function names and other user-defined identifiers.
        Original:
            if(a > b)
            {
                a = a - 1;
                b = b * a; // comment 1
            }
            else
            {
                b = a; // comment 2
                a = 0;
            }
  • 10. Type II (contd.)
        Copy:
            if(m > n)
            {
                m = m - 5;
                n = n * m; //my comment 1
            }
            else
            {
                n = m; //my comment 2
                m = 0;
            }
  • 11. Type III
        The copied code fragment is further modified by inserting or removing statements, for example by adding unnecessary ones.
        Original:
            if(a > b)
            {
                a = a - 1;
                b = b * a;
            }
            else
            {
                b = a;
                a = 0;
            }
  • 12. Type III (contd.)
        Copy:
            if(a > b)
            {
                a = a - 1;
                c = 0; // this statement is added
                b = b * a;
            }
            else
            {
                b = a;
                a = 0;
            }
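Because a Type III copy adds or removes statements, an exact match fails and a similarity score is needed instead. The sketch below is one possible way to get such a score, using the standard-library `difflib.SequenceMatcher` on token lists. The `tokens` helper and its regular expression are assumptions made for illustration; they split multi-character operators such as `<=` into single symbols.

```python
import re
from difflib import SequenceMatcher

def tokens(code: str) -> list[str]:
    """Very rough tokenizer: identifiers, integers, then single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

original = "if(a > b) { a = a - 1; b = b * a; } else { b = a; a = 0; }"
copy = "if(a > b) { a = a - 1; c = 0; b = b * a; } else { b = a; a = 0; }"

# ratio() is 2 * matched_tokens / total_tokens, so 1.0 means identical.
matcher = SequenceMatcher(None, tokens(original), tokens(copy), autojunk=False)
ratio = matcher.ratio()
print(round(ratio, 2))
```

The score for this pair is close to but below 1.0, which is the signature of a Type III copy: almost everything matches, except the inserted statement.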
  • 13. Functional Similarity
        Refers to code fragments that have the same semantics or functionality, even if their text differs.
        Fragment 1:
            int i, j = 1;
            for(i = 1; i <= VALUE; i++)
                j = j * i;
        Fragment 2:
            int factorial(int n)
            {
                if(n == 0)
                    return 1;
                else
                    return factorial(n - 1) * n;
            }
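The two factorial fragments share almost no text, so textual comparison would miss them. One simple, and only partial, way to gather evidence of functional similarity is to run both versions on the same inputs. The sketch below ports the two fragments to Python for that purpose; the helper `same_behaviour` is hypothetical, and agreement on sampled inputs is evidence rather than proof of equivalence.

```python
def factorial_iterative(value: int) -> int:
    """Port of fragment 1: multiply j by 1..value in a loop."""
    j = 1
    for i in range(1, value + 1):
        j = j * i
    return j

def factorial_recursive(n: int) -> int:
    """Port of fragment 2: recursive definition."""
    if n == 0:
        return 1
    return factorial_recursive(n - 1) * n

def same_behaviour(f, g, inputs) -> bool:
    """True if f and g return equal results for every sampled input."""
    return all(f(x) == g(x) for x in inputs)

print(same_behaviour(factorial_iterative, factorial_recursive, range(10)))
```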
  • 14. Source Code Detection Algorithms
        Text-based
        Token-based
        Parse tree-based
        PDG-based
        Metrics-based
        Hybrid approaches
  • 15. Source Code Detection Algorithms (contd.)
        Text-based: find textual matches between two source-code files. Simple and fast.
        Token-based: use a lexer to convert each program into tokens, then find matches in the token sequences. More robust to simple text replacements.
  • 16. Source Code Detection Algorithms (contd.)
        Parse trees: build and compare the parse trees of the programs. A parse tree contains the complete information about the source code, and tree comparison can normalize conditional statements.
        Program Dependency Graphs (PDGs): capture the flow of control and data in a program, which allows higher-level equivalences to be located. More complex to build and compare.
  • 17. Source Code Detection Algorithms (contd.)
        Metrics: capture 'scores' of code segments according to certain criteria. Metrics are simple to calculate but can lead to false positives.
        Hybrid: a combination of two or more of the previous techniques.
  • 18. Detecting Techniques: Detection via Lexical Similarities
        Lexical analysis takes source code and converts it into a stream of lexical tokens.
        The source code undergoes a series of transformations.
        Identifying reserved words, identifiers and numbers is useful for plagiarism detection.
  • 19. Detection via Lexical Similarities (contd.)
        Fragment 1:
            int[] A = {1,2,3,4};
            for(int i = 0; i < A.length; i++) {
                A[i] = A[i] + 1;
            }
        Fragment 2:
            int[] B = {1, 2, 3, 4};
            for(int j = 0; j < B.length; j++) {
                B[j] = B[j] + 1;
            }
  • 20. Detection via Lexical Similarities (contd.)
        Both fragments map to the same token stream, because identifier names and spacing are abstracted away:
            LITERAL_int LBRACK RBRACK IDENT ASSIGN LCURLY NUM_INT COMMA NUM_INT COMMA NUM_INT COMMA NUM_INT RCURLY SEMI
            LITERAL_for LPAREN LITERAL_int IDENT ASSIGN NUM_INT SEMI IDENT LT IDENT DOT IDENT SEMI IDENT INC RPAREN LCURLY
            IDENT LBRACK IDENT RBRACK ASSIGN IDENT LBRACK IDENT RBRACK PLUS NUM_INT SEMI
            RCURLY
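The mapping from source text to token names can be sketched with a small hand-written lexer. This is an illustration under stated assumptions, not the lexer used by any particular tool: the token names mirror those on the slide, `PLUS` and `UNKNOWN` are assumed names, and `KEYWORDS` lists only the two keywords the example needs.

```python
import re

KEYWORDS = {"int", "for"}  # only what the example needs
SYMBOLS = {
    "++": "INC", "[": "LBRACK", "]": "RBRACK", "{": "LCURLY", "}": "RCURLY",
    "(": "LPAREN", ")": "RPAREN", "=": "ASSIGN", ",": "COMMA", ";": "SEMI",
    "<": "LT", ".": "DOT", "+": "PLUS",
}
# Identifiers, integers, the two-character "++", then any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\+\+|\S")

def token_stream(code: str) -> list[str]:
    """Replace each lexeme by its token name, discarding names and spacing."""
    out = []
    for lexeme in TOKEN_RE.findall(code):
        if lexeme in KEYWORDS:
            out.append("LITERAL_" + lexeme)
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            out.append("IDENT")
        elif lexeme.isdigit():
            out.append("NUM_INT")
        else:
            out.append(SYMBOLS.get(lexeme, "UNKNOWN"))
    return out

frag_a = "int[] A = {1,2,3,4}; for(int i = 0; i < A.length; i++) { A[i] = A[i] + 1; }"
frag_b = "int[] B = {1, 2, 3, 4}; for(int j = 0; j < B.length; j++) { B[j] = B[j] + 1; }"

print(" ".join(token_stream(frag_a)))
print(token_stream(frag_a) == token_stream(frag_b))
```

Since every identifier becomes `IDENT`, renaming `A` to `B` or `i` to `j` has no effect on the stream, which is why token-based comparison also catches Type II copies.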
  • 21. Detection via Parse Tree Similarities
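Parse-tree comparison can be illustrated without writing a C or Java parser by using Python's own `ast` module as a stand-in. The sketch below is an assumption-laden illustration, not the technique of any named tool: it parses Python snippets, replaces every variable name with a placeholder, and compares the resulting tree dumps. The names `_Anonymize` and `tree_shape` are hypothetical.

```python
import ast

class _Anonymize(ast.NodeTransformer):
    """Replace every variable name with the same placeholder."""
    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

def tree_shape(source: str) -> str:
    """Dump the parse tree of `source` with variable names removed."""
    tree = _Anonymize().visit(ast.parse(source))
    return ast.dump(tree)

snippet_a = "total = 0\nfor x in items:\n    total = total + x\n"
snippet_b = "s = 0\nfor v in data:\n    s = s + v\n"          # renamed copy
snippet_c = "s = 0\nwhile data:\n    s = s + data.pop()\n"   # different structure

print(tree_shape(snippet_a) == tree_shape(snippet_b))
print(tree_shape(snippet_a) == tree_shape(snippet_c))
```

Exact equality of dumps is the strictest possible test. Practical systems compare subtrees or use tree edit distance so that partial matches are also found.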
  • 22. Detection via Metrics
        Calculate and compare attribute counts.
        Programs with similar attribute counts are potentially similar programs.
        Counts of operators and operands are typically used to construct attribute counts.
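A small sketch of attribute counting, assuming a simple convention in which identifiers and numbers are operands and everything else, including keywords, is an operator. The function `attribute_counts`, the keyword set and the choice of four counts are illustrative assumptions; the slides do not fix a particular metric.

```python
import re

KEYWORDS = {"if", "else", "for", "while", "return", "int"}  # partial list

def is_operand(tok: str) -> bool:
    """Identifiers and numbers are operands; keywords and symbols are not."""
    return (tok[0].isalnum() or tok[0] == "_") and tok not in KEYWORDS

def attribute_counts(code: str) -> tuple[int, int, int, int]:
    """Return (operators, distinct operators, operands, distinct operands)."""
    toks = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    operands = [t for t in toks if is_operand(t)]
    operators = [t for t in toks if not is_operand(t)]
    return (len(operators), len(set(operators)),
            len(operands), len(set(operands)))

original = "if(a > b) { a = a - 1; b = b * a; } else { b = a; a = 0; }"
renamed = "if(m > n) { m = m - 5; n = n * m; } else { n = m; m = 0; }"

print(attribute_counts(original))
print(attribute_counts(original) == attribute_counts(renamed))
```

Renaming identifiers leaves all four counts unchanged, so the Type II pair is flagged. The same property explains the false positives mentioned earlier: unrelated programs can share identical counts.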
  • 23. Tools used for code-based plagiarism detection: JPlag
        Finds similarities among multiple sets of source-code files.
        JPlag operates in two phases.
        First phase: all programs to be compared are parsed and converted into token strings.
        Second phase: the token strings are compared in pairs to determine the similarity of each pair.
        Because it compares token strings rather than raw text, it is more robust to simple text replacements.
        It supports Java, C#, C, C++ and natural language text.
  • 24. MOSS (Measure Of Software Similarity)
        MOSS was developed in 1994 by Alex Aiken.
        It analyzes code written in languages such as C, C++, Python, Visual Basic, JavaScript, FORTRAN, Lisp and Ada.
        It is provided as an internet service to which the user submits a list of source files to compare.
  • 25. YAP (Yet Another Plague)
        Token-based system.
        YAP works in two phases.
        The first phase generates a token file for each submission.
        The second phase compares pairs of token files using a token-matching algorithm, the Running-Karp-Rabin Greedy-String-Tiling algorithm (RKR-GST).
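The tiling idea behind the second phase can be sketched as follows. This is a simplified Greedy String Tiling without the Running-Karp-Rabin hashing speed-up, so it illustrates the matching rule rather than reproducing YAP's implementation. The `tokens` helper, the `min_match` default and the similarity formula (twice the covered length divided by the combined length) are assumptions made for illustration.

```python
import re

def tokens(code):
    """Very rough tokenizer: identifiers, integers, then single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length) covering common token runs."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match = min_match
        matches = []
        # Phase 1: find the longest runs of equal, still unmarked tokens.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_match:
                    matches, max_match = [(i, j, k)], k
                elif k == max_match:
                    matches.append((i, j, k))
        if not matches:
            break
        # Phase 2: turn maximal matches that do not overlap into tiles.
        for i, j, k in matches:
            if not any(marked_a[i:i + k]) and not any(marked_b[j:j + k]):
                marked_a[i:i + k] = [True] * k
                marked_b[j:j + k] = [True] * k
                tiles.append((i, j, k))
    return tiles

def similarity(a, b, min_match=3):
    """Fraction of both token lists covered by tiles (0.0 to 1.0)."""
    total = len(a) + len(b)
    covered = sum(k for _, _, k in greedy_string_tiling(a, b, min_match))
    return 2 * covered / total if total else 0.0

original = tokens("if(a > b) { a = a - 1; b = b * a; } else { b = a; a = 0; }")
copy = tokens("if(a > b) { a = a - 1; c = 0; b = b * a; } else { b = a; a = 0; }")

print(greedy_string_tiling(original, copy))
print(round(similarity(original, copy), 2))
```

The nested scan is quadratic or worse in the token count, which is acceptable for short fragments. The Karp-Rabin hashing in RKR-GST exists to avoid exactly this cost on full submissions.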
  • 26. Conclusion
        Plagiarism in programming assignments is an inevitable issue for most academics who teach programming.
        Plagiarism detection systems are built for only a few languages.
        Most detection software checks submissions against a repository maintained within an organization.
        As the number of digital copies grows, the repository must be large, and the plagiarism detection software must be able to handle it.
  • 27. Conclusion (contd.)
        Most popular plagiarism detection algorithms convert programs into token strings and use string matching to compare them.
        The tokens of each document are compared pairwise to find similar source-code segments between the files.
        String-matching systems are language-dependent: they handle only the programming languages supported by their parsers.