SlideShare una empresa de Scribd logo
1 de 3
Abstract - This paper introduces a computer
based plagiarism detection technique which
combines the functionality of substring
matching and keyword similarity to give more
accurate results. Also to make the algorithm
more efficient clustering is done in which
cluster of similar documents is created using
LCS (Longest Common Subsequence Method).
1. INTRODUCTION
Different people define plagiarism in different
ways. In layman language, plagiarism is an
unacknowledged, act of copying someone’s
work. Technically, as described by Wikipedia
[1], it is “wrongful appropriation" and "stealing
and publication" of another author’s "language,
thoughts, ideas, or expressions" and the
representation of them as one's own original
work”.
So, to detect plagiarism one can use either
manual or computer based techniques. As
manual techniques to detect plagiarism are
difficult to implement so there is a need to
develop computer based techniques which
would efficiently detect plagiarism.
So, this paper introduces an efficient technique
to detect plagiarism for text which uses:
1. Clustering using longest common substring
algorithm.
2. Substring Matching
3. Keyword Similarity
A. CLUSTERING
If all documents present are compared with the
reference document it would require a lot of time.
So to save time clustering based plagiarism
detection technique is used in which a cluster or
group of similar kind of documents is created
among which document is compared. These
similar documents are than compared by other
plagiarism detection techniques like substring
matching to get more efficient result of plagiarism.
B. SUBSTRING MATCHING
Here, in this technique, a pattern or a string is
compared with the document. The document is
divided using any indicator like ‘.’ , ’,’ , ’?’ etc.
It plays an important role in detecting
plagiarism in application source codes.
C. KEYWORD SIMILARITY
In this method of plagiarism detection, a
keyword is given and based on that the
similarity between the document is calculated.
2. PROPOSED WORK
Our main aim is to develop an efficient
algorithm to determine text based plagiarism.
We have developed a plagiarism detection
application in which clustering based on longest
common subsequence (LCS) [2], keyword
similarity [3] and substring matching [4]
algorithms are used. We have implemented it by
using C++ and Python as a programming
language.
Plagiarism Detection Technique
Vibhanshu
Manav Bagai
Siddharth Gupta
Department of Computer Engineering,
Zakir Hussain College of Engineering and Technology,
Aligarh.
A. FLOW OF THE SYSTEM
Fig. 1. Flow of the system.
Fig. 1, illustrates the general outline how the
whole algorithm works which start from
clustering which develop a cluster of similar
documents on which substring matching and at
last keyword similarity is applied to detect
plagiarism. Detailed outline of each process will
be given below.
B. CLUSTERING
The longest common subsequence (or LCS)
algorithm [4] finds the longest string between
two given strings that are common between the
two groups and in the same order in each string.
To add to the functionality and accuracy of
above algorithm clustering is proposed by us,
As there are many documents to be compared so
this may take a lot of time. So, to solve this
problem clustering is used. Here a cluster is
created which mainly contain those files which
are similar to the document to be compared by
Longest Common Subsequence Method.
Let the two documents compared be X and Y
where X is reference document and Y is
document to be compared having length m and
n respectively.
From longest common subsequence method we
will get the length of lowest common
subsequence, let it be lcs. To find the similarity
between the documents following method is
used:
R=lcs/m (1.1)
S= lcs/n (1.2)
Then we will find F which is equal to
F= ((1+(B*B))*R*S)/(R+B*B*S) (1.3)
Where, B= S/R
What is F, R,S AND B? *mention equation nos.
in the para all mention fig2*
The document which is having more value of F
is more similar. If documents are completely
same we get the value of F equal to 1. A cluster
is made for documents which are more similar
to reference document so that next step is
applied to only those document.
Fig. 2. xxxxxxx
C. SUBSTRING MATCHING AND
KEYWORD SIMILARITY
After getting the clusters of the similar
documents using LCS, they are no compared
using substring matching technique. In this
method, we will break the strings from both the
Clustering
Substring
Matching
Keyword
Similarity
Finding length of
longest common
substring
Perform Calculation
and find F
Compare F for
different
documents and
creating a cluster
among them.
documents into substring based on ‘.’, ‘!’, ‘?’.
Then, we will compare all the substrings from
both the documents (the reference and the
documents from the cluster). If the substrings
are found similar, we will increase the
plagiarism count. After this, we will apply
keyword similarity method, we will ask for a
keyword of the document and then using that
keyword we will find the sentences from both
the documents with that keyword. Then, we will
compare those sentences again and if found
same we will add them to the plagiarism set.
This is shown in the Algorithm [5] below:
Terms Used:
Suspected document - Q;
Reference document - D;
Sentences in Q - {q1,q2,…qn};
Sentences in reference document - {d1,
d2,…dn};
Plagiarism set - P=Null;
Input: Q.
For Q
Separate sentences Q= {q1,q2,….,qn};
For every q in Q,
Compare with reference
document D,
If (q==d)
Add sentence to plagiarized set
P,
Update result. P=P+q;
End if
End for
If (P==NULL)
Display “document is plagiarism free”
Else
Display,
set P as a plagiarized sentences.
Highlight all P in Q as plagiarism text.
End else
End for
3. CONCLUSION
In this paper we have proposed a new
plagiarism detection method which is a
combination of LCS algorithm, Substring
matching and Keyword similarity. This
algorithm is much more efficient than the
traditional substring based algorithm [6] as it
firstly forms the cluster of more similar
documents and then apply a hybrid algorithm of
substring matching and keyword similarity to
get more accurate results.
4. REFERENCES
[1] Font size :10pt, Font type: Times new
roman, First authors, title of research
reference,volume.
[2] G. Eason, B. Noble, and I.N. Sneddon, “On
certain integrals of Lipschitz-Hankel type
involving products of Bessel functions,” Phil.
Trans. Roy. Soc. London, vol. A247, pp. 529-
551, April 1955.
[3] Sangeetha Jamal, on “Plagiarism Detection
Techniques”, Cochin University of Science and
Technology, Cochin 682022 , 2010.
[4]Du Zou, Wei-jiang Long, Zhang Ling “A
Cluster Based Plagiarism Detection Technique”
for PAN , CLEF, 2010.
[5] Chow Kok Kent, Naomi Salim “Features
Based Text Similarity Detection”, Faculty of
Computer Science and Informatics System,
University Teknologi Malaysia, 81310 Skudai,
Johor Malaysia, Journal of Computing , vol 2,
issue 1, January 2010.
[6]Sudhir D. Salukhe, S. Z. Gawali, “A
Plagiarism Detection Technique Using
Reinforcement Learning”, International Journal
of Advanced Research in Computer Science and
Management Studies, vol. 1 ,issue 6, November
2013.

Más contenido relacionado

La actualidad más candente

Centrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
Centrality-Based Network Coder Placement For Peer-To-Peer Content DistributionCentrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
Centrality-Based Network Coder Placement For Peer-To-Peer Content DistributionIJCNCJournal
 
Enhancing security in cloud storage
Enhancing security in cloud storageEnhancing security in cloud storage
Enhancing security in cloud storageShivam Singh
 
Solutions crypto4e
Solutions crypto4eSolutions crypto4e
Solutions crypto4eJack Ndahiro
 
Analysis of rsa algorithm using gpu
Analysis of rsa algorithm using gpuAnalysis of rsa algorithm using gpu
Analysis of rsa algorithm using gpuIJNSA Journal
 
Analysis of Searchable Encryption
Analysis of Searchable EncryptionAnalysis of Searchable Encryption
Analysis of Searchable EncryptionNagendra Posani
 
Info mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copyInfo mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copySelva Raj
 
Info mimi-hop-by-hop authentication
Info mimi-hop-by-hop authenticationInfo mimi-hop-by-hop authentication
Info mimi-hop-by-hop authenticationSelva Raj
 
Towards Practical Homomorphic Encryption with Efficient Public key Generation
Towards Practical Homomorphic Encryption with Efficient Public key GenerationTowards Practical Homomorphic Encryption with Efficient Public key Generation
Towards Practical Homomorphic Encryption with Efficient Public key GenerationIDES Editor
 
Genetic Algorithm Based Cryptographic Approach using Karnatic Music
Genetic Algorithm Based Cryptographic Approach using  Karnatic  MusicGenetic Algorithm Based Cryptographic Approach using  Karnatic  Music
Genetic Algorithm Based Cryptographic Approach using Karnatic MusicIRJET Journal
 
Identity-Based Blind Signature Scheme with Message Recovery
Identity-Based Blind Signature Scheme with Message Recovery Identity-Based Blind Signature Scheme with Message Recovery
Identity-Based Blind Signature Scheme with Message Recovery IJECEIAES
 
Exploiting tls to disrupt privacy of web application's traffic
Exploiting tls to disrupt privacy of web application's trafficExploiting tls to disrupt privacy of web application's traffic
Exploiting tls to disrupt privacy of web application's trafficSandipan Biswas
 
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...IJRESJOURNAL
 

La actualidad más candente (18)

75227-144257-1-PB
75227-144257-1-PB75227-144257-1-PB
75227-144257-1-PB
 
Centrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
Centrality-Based Network Coder Placement For Peer-To-Peer Content DistributionCentrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
Centrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
 
Enhancing security in cloud storage
Enhancing security in cloud storageEnhancing security in cloud storage
Enhancing security in cloud storage
 
Distributed Hash Table
Distributed Hash TableDistributed Hash Table
Distributed Hash Table
 
Solutions crypto4e
Solutions crypto4eSolutions crypto4e
Solutions crypto4e
 
Analysis of rsa algorithm using gpu
Analysis of rsa algorithm using gpuAnalysis of rsa algorithm using gpu
Analysis of rsa algorithm using gpu
 
N sys s 32
N sys s 32N sys s 32
N sys s 32
 
15 82-87
15 82-8715 82-87
15 82-87
 
Analysis of Searchable Encryption
Analysis of Searchable EncryptionAnalysis of Searchable Encryption
Analysis of Searchable Encryption
 
poster
posterposter
poster
 
Info mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copyInfo mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copy
 
Info mimi-hop-by-hop authentication
Info mimi-hop-by-hop authenticationInfo mimi-hop-by-hop authentication
Info mimi-hop-by-hop authentication
 
Spatial approximate string search
Spatial approximate string searchSpatial approximate string search
Spatial approximate string search
 
Towards Practical Homomorphic Encryption with Efficient Public key Generation
Towards Practical Homomorphic Encryption with Efficient Public key GenerationTowards Practical Homomorphic Encryption with Efficient Public key Generation
Towards Practical Homomorphic Encryption with Efficient Public key Generation
 
Genetic Algorithm Based Cryptographic Approach using Karnatic Music
Genetic Algorithm Based Cryptographic Approach using  Karnatic  MusicGenetic Algorithm Based Cryptographic Approach using  Karnatic  Music
Genetic Algorithm Based Cryptographic Approach using Karnatic Music
 
Identity-Based Blind Signature Scheme with Message Recovery
Identity-Based Blind Signature Scheme with Message Recovery Identity-Based Blind Signature Scheme with Message Recovery
Identity-Based Blind Signature Scheme with Message Recovery
 
Exploiting tls to disrupt privacy of web application's traffic
Exploiting tls to disrupt privacy of web application's trafficExploiting tls to disrupt privacy of web application's traffic
Exploiting tls to disrupt privacy of web application's traffic
 
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
 

Destacado

матема
матемаматема
матемаVladaaaa
 
2 column paper
2 column paper2 column paper
2 column paperAksh Gupta
 
урок. кто такие звери
урок. кто такие звериурок. кто такие звери
урок. кто такие звериVladaaaa
 
Adverbialclauses 110413145624-phpapp01-130302024250-phpapp02
Adverbialclauses 110413145624-phpapp01-130302024250-phpapp02Adverbialclauses 110413145624-phpapp01-130302024250-phpapp02
Adverbialclauses 110413145624-phpapp01-130302024250-phpapp02Ogundipe Babajjide
 
Basics of Microsoft windows
Basics of Microsoft windows Basics of Microsoft windows
Basics of Microsoft windows vethics
 
Basic Html Knowledge for students
Basic Html Knowledge for studentsBasic Html Knowledge for students
Basic Html Knowledge for studentsvethics
 
засоби навчання, класифікація засобів навчання
засоби навчання, класифікація засобів навчаннязасоби навчання, класифікація засобів навчання
засоби навчання, класифікація засобів навчанняVladaaaa
 
Learn MS Powerpoint basics
Learn MS Powerpoint basicsLearn MS Powerpoint basics
Learn MS Powerpoint basicsvethics
 
Basics of Computer for Students
Basics of Computer for StudentsBasics of Computer for Students
Basics of Computer for Studentsvethics
 
Ms access basics ppt
Ms access basics ppt Ms access basics ppt
Ms access basics ppt vethics
 
learn about Structure of human eye in hindi
learn about Structure of human eye in hindi learn about Structure of human eye in hindi
learn about Structure of human eye in hindi vethics
 
Earthquake ppt in hindi
Earthquake ppt in hindi Earthquake ppt in hindi
Earthquake ppt in hindi vethics
 
ppt of our Solar system in hindi
ppt of our Solar system in hindippt of our Solar system in hindi
ppt of our Solar system in hindivethics
 
Ms excel ppt presentation
Ms excel ppt presentationMs excel ppt presentation
Ms excel ppt presentationvethics
 
Risalah 16022006171006
Risalah 16022006171006Risalah 16022006171006
Risalah 16022006171006Ivan Fauzillah
 

Destacado (16)

матема
матемаматема
матема
 
2 column paper
2 column paper2 column paper
2 column paper
 
урок. кто такие звери
урок. кто такие звериурок. кто такие звери
урок. кто такие звери
 
Adverbialclauses 110413145624-phpapp01-130302024250-phpapp02
Adverbialclauses 110413145624-phpapp01-130302024250-phpapp02Adverbialclauses 110413145624-phpapp01-130302024250-phpapp02
Adverbialclauses 110413145624-phpapp01-130302024250-phpapp02
 
Co315 part 1
Co315   part 1Co315   part 1
Co315 part 1
 
Basics of Microsoft windows
Basics of Microsoft windows Basics of Microsoft windows
Basics of Microsoft windows
 
Basic Html Knowledge for students
Basic Html Knowledge for studentsBasic Html Knowledge for students
Basic Html Knowledge for students
 
засоби навчання, класифікація засобів навчання
засоби навчання, класифікація засобів навчаннязасоби навчання, класифікація засобів навчання
засоби навчання, класифікація засобів навчання
 
Learn MS Powerpoint basics
Learn MS Powerpoint basicsLearn MS Powerpoint basics
Learn MS Powerpoint basics
 
Basics of Computer for Students
Basics of Computer for StudentsBasics of Computer for Students
Basics of Computer for Students
 
Ms access basics ppt
Ms access basics ppt Ms access basics ppt
Ms access basics ppt
 
learn about Structure of human eye in hindi
learn about Structure of human eye in hindi learn about Structure of human eye in hindi
learn about Structure of human eye in hindi
 
Earthquake ppt in hindi
Earthquake ppt in hindi Earthquake ppt in hindi
Earthquake ppt in hindi
 
ppt of our Solar system in hindi
ppt of our Solar system in hindippt of our Solar system in hindi
ppt of our Solar system in hindi
 
Ms excel ppt presentation
Ms excel ppt presentationMs excel ppt presentation
Ms excel ppt presentation
 
Risalah 16022006171006
Risalah 16022006171006Risalah 16022006171006
Risalah 16022006171006
 

Similar a 2 column paper

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
The Search of New Issues in the Detection of Near-duplicated Documents
The Search of New Issues in the Detection of Near-duplicated DocumentsThe Search of New Issues in the Detection of Near-duplicated Documents
The Search of New Issues in the Detection of Near-duplicated Documentsijceronline
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE ijdms
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And ClusteringDataminingTools Inc
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And ClusteringDatamining Tools
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONIJDKP
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureIOSR Journals
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...IJCSEA Journal
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
 

Similar a 2 column paper (20)

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
L0261075078
L0261075078L0261075078
L0261075078
 
L0261075078
L0261075078L0261075078
L0261075078
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
The Search of New Issues in the Detection of Near-duplicated Documents
The Search of New Issues in the Detection of Near-duplicated DocumentsThe Search of New Issues in the Detection of Near-duplicated Documents
The Search of New Issues in the Detection of Near-duplicated Documents
 
TEXT CLUSTERING.doc
TEXT CLUSTERING.docTEXT CLUSTERING.doc
TEXT CLUSTERING.doc
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
Ju3517011704
Ju3517011704Ju3517011704
Ju3517011704
 
F0431025031
F0431025031F0431025031
F0431025031
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity Measure
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
 
P13 corley
P13 corleyP13 corley
P13 corley
 
P33077080
P33077080P33077080
P33077080
 

Último

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Último (20)

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

2 column paper

  • 1. Abstract - This paper introduces a computer based plagiarism detection technique which combines the functionality of substring matching and keyword similarity to give more accurate results. Also to make the algorithm more efficient clustering is done in which cluster of similar documents is created using LCS (Longest Common Subsequence Method). 1. INTRODUCTION Different people define plagiarism in different ways. In layman language, plagiarism is an unacknowledged, act of copying someone’s work. Technically, as described by Wikipedia [1], it is “wrongful appropriation" and "stealing and publication" of another author’s "language, thoughts, ideas, or expressions" and the representation of them as one's own original work”. So, to detect plagiarism one can use either manual or computer based techniques. As manual techniques to detect plagiarism are difficult to implement so there is a need to develop computer based techniques which would efficiently detect plagiarism. So, this paper introduces an efficient technique to detect plagiarism for text which uses: 1. Clustering using longest common substring algorithm. 2. Substring Matching 3. Keyword Similarity A. CLUSTERING If all documents present are compared with the reference document it would require a lot of time. So to save time clustering based plagiarism detection technique is used in which a cluster or group of similar kind of documents is created among which document is compared. These similar documents are than compared by other plagiarism detection techniques like substring matching to get more efficient result of plagiarism. B. SUBSTRING MATCHING Here, in this technique, a pattern or a string is compared with the document. The document is divided using any indicator like ‘.’ , ’,’ , ’?’ etc. It plays an important role in detecting plagiarism in application source codes. C. KEYWORD SIMILARITY In this method of plagiarism detection, a keyword is given and based on that the similarity between the document is calculated. 2. PROPOSED WORK Our main aim is to develop an efficient algorithm to determine text based plagiarism. We have developed a plagiarism detection application in which clustering based on longest common subsequence (LCS) [2], keyword similarity [3] and substring matching [4] algorithms are used. We have implemented it by using C++ and Python as a programming language. Plagiarism Detection Technique Vibhanshu Manav Bagai Siddharth Gupta Department of Computer Engineering, Zakir Hussain College of Engineering and Technology, Aligarh.
  • 2. A. FLOW OF THE SYSTEM Fig. 1. Flow of the system. Fig. 1, illustrates the general outline how the whole algorithm works which start from clustering which develop a cluster of similar documents on which substring matching and at last keyword similarity is applied to detect plagiarism. Detailed outline of each process will be given below. B. CLUSTERING The longest common subsequence (or LCS) algorithm [4] finds the longest string between two given strings that are common between the two groups and in the same order in each string. To add to the functionality and accuracy of above algorithm clustering is proposed by us, As there are many documents to be compared so this may take a lot of time. So, to solve this problem clustering is used. Here a cluster is created which mainly contain those files which are similar to the document to be compared by Longest Common Subsequence Method. Let the two documents compared be X and Y where X is reference document and Y is document to be compared having length m and n respectively. From longest common subsequence method we will get the length of lowest common subsequence, let it be lcs. To find the similarity between the documents following method is used: R=lcs/m (1.1) S= lcs/n (1.2) Then we will find F which is equal to F= ((1+(B*B))*R*S)/(R+B*B*S) (1.3) Where, B= S/R What is F, R,S AND B? *mention equation nos. in the para all mention fig2* The document which is having more value of F is more similar. If documents are completely same we get the value of F equal to 1. A cluster is made for documents which are more similar to reference document so that next step is applied to only those document. Fig. 2. xxxxxxx C. SUBSTRING MATCHING AND KEYWORD SIMILARITY After getting the clusters of the similar documents using LCS, they are no compared using substring matching technique. In this method, we will break the strings from both the Clustering Substring Matching Keyword Similarity Finding length of longest common substring Perform Calculation and find F Compare F for different documents and creating a cluster among them.
  • 3. documents into substring based on ‘.’, ‘!’, ‘?’. Then, we will compare all the substrings from both the documents (the reference and the documents from the cluster). If the substrings are found similar, we will increase the plagiarism count. After this, we will apply keyword similarity method, we will ask for a keyword of the document and then using that keyword we will find the sentences from both the documents with that keyword. Then, we will compare those sentences again and if found same we will add them to the plagiarism set. This is shown in the Algorithm [5] below: Terms Used: Suspected document - Q; Reference document - D; Sentences in Q - {q1,q2,…qn}; Sentences in reference document - {d1, d2,…dn}; Plagiarism set - P=Null; Input: Q. For Q Separate sentences Q= {q1,q2,….,qn}; For every q in Q, Compare with reference document D, If (q==d) Add sentence to plagiarized set P, Update result. P=P+q; End if End for If (P==NULL) Display “document is plagiarism free” Else Display, set P as a plagiarized sentences. Highlight all P in Q as plagiarism text. End else End for 3. CONCLUSION In this paper we have proposed a new plagiarism detection method which is a combination of LCS algorithm, Substring matching and Keyword similarity. This algorithm is much more efficient than the traditional substring based algorithm [6] as it firstly forms the cluster of more similar documents and then apply a hybrid algorithm of substring matching and keyword similarity to get more accurate results. 4. REFERENCES [1] Font size :10pt, Font type: Times new roman, First authors, title of research reference,volume. [2] G. Eason, B. Noble, and I.N. Sneddon, “On certain integrals of Lipschitz-Hankel type involving products of Bessel functions,” Phil. Trans. Roy. Soc. London, vol. A247, pp. 529- 551, April 1955. [3] Sangeetha Jamal, on “Plagiarism Detection Techniques”, Cochin University of Science and Technology, Cochin 682022 , 2010. [4]Du Zou, Wei-jiang Long, Zhang Ling “A Cluster Based Plagiarism Detection Technique” for PAN , CLEF, 2010. [5] Chow Kok Kent, Naomi Salim “Features Based Text Similarity Detection”, Faculty of Computer Science and Informatics System, University Teknologi Malaysia, 81310 Skudai, Johor Malaysia, Journal of Computing , vol 2, issue 1, January 2010. [6]Sudhir D. Salukhe, S. Z. Gawali, “A Plagiarism Detection Technique Using Reinforcement Learning”, International Journal of Advanced Research in Computer Science and Management Studies, vol. 1 ,issue 6, November 2013.