IJSRD - International Journal for Scientific Research & Development| Vol. 3, Issue 10, 2015 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 408
Detection of Fraud Reviews for Products
Dr. S.K. Pathan¹, Janhavi Sankpal², Pooja Sankpal³, Kanchan Haral⁴
¹Professor, ²,³,⁴Student
¹,²,³,⁴Department of Computer Science, Smt. Kashibai Navale College of Engineering, Pune
Abstract: Most of us shop online these days and consult a product's
ratings or reviews before we download or buy it. Stores such as Amazon
and the Play Store offer a great number of products, but unfortunately
some of the reviews for those products are fraudulent. Such products
should be marked so that other users can recognize them. We compare
reviews from two sites to get a clearer picture, on the assumption that
taking data from multiple sites raises the probability of seeing
genuine reviews. We propose an Android application that takes reviews
of a single product from two different websites and analyzes them with
NLP to obtain a positive or negative rating. The user gives the system
two URLs, one from each site, for the same product. For each URL, the
reviews and comments are fetched separately and analyzed with NLP for a
positive or negative rating. The two ratings are then averaged to give
a final rating for the product. Because the volume of review data is
large, we use Hadoop's MapReduce. This makes it easier to decide
whether a product's reviews are fraudulent.
Key words: Primal Text Mining and Pre-processing, NLP,
Mapping Text Data
I. INTRODUCTION
This article is intended for people who download or buy products
online. Reviews of a product are a key input before they proceed with a
purchase, yet they have no means of being sure that the reviews are
authentic. We maintain a database of words against which comments are
mapped, so that we can conclude which comments are positive and which
are negative. The user can check the reviews of the same product on two
different sites by providing the product's URL on each site. After the
reviews are analysed and mapped against the database, a result is
generated. The result is reported as positive or negative, as a
percentage, and as a graph. NLP is used to process the reviews and
generate the results, and Hadoop's MapReduce can be used to map the
data against the database.
II. EXISTING WORK
Many sites simply provide rankings, ratings and reviews without
assuring their authenticity. There is a need to cross-check whether
posted reviews are genuine or were written only to bump up a product's
popularity.
There is some related work in this domain, such as web ranking spam
detection [1], online review spam detection [2] and mobile app
recommendation systems [3], but the problem of detecting ranking fraud
for mobile applications or other online products remains under-explored.
Currently, the most popular and widely used e-shopping sites let buyers
view reviews of a product they are interested in, and let people who
have already bought it share their experience by rating or reviewing
it. This gives future buyers a base of reference to consult before
buying online. Platforms like the Play Store likewise give users an
opportunity to study reviews before downloading an application.
However, while these facilities let us study other users' experiences,
they also raise questions about the authenticity of the reviews. In
this digital world it has become easy for anyone to rate any product
online.
Real-world observation suggests that each review is associated with a
latent topic. For instance, some reviews relate to the latent topic
“worth a try” while others relate to the latent topic “not so good”. At
the same time, different buyers have different personal preferences for
mobile applications, or prefer different sites for buying products, so
a product or an application may have its own topic distribution in its
historical review records. Plausibly, the topic distribution of the
reviews in a normal leading session of an application should be
consistent with the topic distribution in all historical review records
of that application, and the same applies to product reviews. The
likely reason is that review topics reflect users' personal usage
experiences and choices rather than the popularity of the application.
On the other hand, if the reviews of a leading session have been
manipulated, the two topic distributions will differ markedly. For
example, the leading session may contain more positive topics, such as
“worth a try”, “popular” and “good”.
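The comparison of the two topic distributions can be sketched as follows. This is only an illustration and not the method of any cited work: the topic labels, the counts, the choice of total variation distance as the measure of difference, and the threshold of 0.3 are all assumptions made for the example.

```python
from collections import Counter


def topic_distribution(topic_labels):
    """Turn a list of per-review topic labels into a probability distribution."""
    counts = Counter(topic_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    topics = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)


def looks_manipulated(session_labels, history_labels, threshold=0.3):
    """Flag a leading session whose topic mix departs from the historical mix.

    The threshold is a hypothetical value that would need tuning on real data.
    """
    distance = total_variation(
        topic_distribution(session_labels), topic_distribution(history_labels)
    )
    return distance > threshold, distance


# Hypothetical topic labels, one per review.
history = ["worth a try"] * 40 + ["not so good"] * 40 + ["good"] * 20
session = ["worth a try"] * 45 + ["popular"] * 30 + ["good"] * 25

flagged, distance = looks_manipulated(session, history)
print(flagged, round(distance, 2))
```

In this made-up data the leading session has lost every “not so good” review and gained a “popular” topic, so the distance is large and the session is flagged. How the topic label of each review is obtained is outside the scope of the sketch.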
III. DRAWBACKS OF THE EXISTING SYSTEM
The existing system cannot determine the authenticity of the reviews
posted for a product.
Buyers and users cannot tell whether the reviews given to a product are
authentic, and so they face a dilemma over whether the product is
really good or bad. The existing system does not detect fraudulent or
deceptive activity in reviews that may have been posted only to boost
the product's popularity.
For example, the developer or brand selling a product can rate it
themselves to manipulate consumers' perception. Rivals can likewise
post negative reviews to damage the product's reputation.
Table 1
After these search words are applied, both positive-keyword
classification and negative-keyword classification are used, and the
result is generated from the keywords in the database [10].
A. Primal Text Mining and Pre-processing:
Many software tools and algorithms, such as RapidMiner, WordNet, and
data collection and processing trees, have been developed to test for
common positive and negative words.
Fig. 1: A sample architecture [10]
The URLs of the same product on two different sites are taken as input
to the system. The reviews from those sites are scanned and mapped
against the database to obtain the result as positive and negative
percentages. The database holds the words used to judge whether a
comment or review is positive or negative. Based on these results, a
user can get a rough idea of whether the product (an application or any
other product) is worth buying or downloading.
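The step that turns the word matches of each site into percentages and then averages the two sites can be sketched as below. This is a minimal sketch under stated assumptions: the function names and the hit counts are hypothetical, and a simple unweighted mean is assumed because the text only says that the ratings are combined by an average.

```python
def positive_percentage(positive_hits, negative_hits):
    """Share of matched sentiment words that are positive, as a percentage."""
    total = positive_hits + negative_hits
    if total == 0:
        return None  # no sentiment words matched, so there is nothing to report
    return 100.0 * positive_hits / total


def combined_rating(site_a, site_b):
    """Average the positive percentages of two sites, skipping a site with no matches."""
    scores = [
        score
        for score in (positive_percentage(*site_a), positive_percentage(*site_b))
        if score is not None
    ]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Hypothetical (positive_hits, negative_hits) counts for the same product.
site_a = (80, 20)  # 80.0 % positive
site_b = (30, 20)  # 60.0 % positive

final = combined_rating(site_a, site_b)
print(f"positive: {final:.1f} %, negative: {100 - final:.1f} %")
```

An unweighted mean treats both sites equally even when one has far more reviews than the other; weighting by review count would be an alternative design.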
B. Mapping Text Data:
The final word set is pruned using the above-mentioned methods. The
results are displayed in Tables 1 and 2, which have separate columns
for positive and negative words.
Positive      Negative
Agree         Bad
Appreciate    Cannot
Beneficial    Damage
Comfort       Dangerous
Ease          Depression
Easier        Died
Enjoy         Difficult
Good          Error
Great         Hard
Greatest      Failure
Help          Impossible
Hope          Lack
Important     No
Thanks        Not
Superb        Poor
Yes           Weak
Table 2: Positive and Negative Classification of Words from Posts
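The way the word lists of Table 2 can be used to classify a post is sketched below. The word lists are taken from the table; the function names, the sample reviews and the rule of labelling a post by whichever list it matches more often are assumptions made for illustration.

```python
import re

# Word lists copied from Table 2, lowercased.
POSITIVE = {"agree", "appreciate", "beneficial", "comfort", "ease", "easier",
            "enjoy", "good", "great", "greatest", "help", "hope", "important",
            "thanks", "superb", "yes"}
NEGATIVE = {"bad", "cannot", "damage", "dangerous", "depression", "died",
            "difficult", "error", "hard", "failure", "impossible", "lack",
            "no", "not", "poor", "weak"}


def count_hits(review):
    """Return (positive_hits, negative_hits) for one review."""
    words = re.findall(r"[a-z]+", review.lower())
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return pos, neg


def label(review):
    """Label a review by whichever word list it matches more often."""
    pos, neg = count_hits(review)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


# Hypothetical sample reviews.
reviews = [
    "Great phone, thanks! Superb battery.",
    "Poor build and a weak speaker. Bad value.",
    "Arrived on Tuesday.",
]
labels = [label(r) for r in reviews]
print(labels)
```

A plain word count of this kind ignores negation: “not good” matches one word from each column and comes out as neutral.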
IV. PROPOSED SYSTEM
Our main aim is to help users judge the authenticity of product reviews
and application reviews. This gives the user a clearer view of the
quality and reliability of the product or application.
We therefore propose a system that performs the following tasks:
1) Collect all the information about a particular application or site,
such as comments or reviews about it.
2) Use NLP to process those comments and reviews.
3) Use the MapReduce algorithm for querying.
In this way we can detect whether the reviews are fraudulent or not.
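The MapReduce step can be illustrated with a small local simulation. The sketch below does not use Hadoop; it only mimics the map, shuffle and reduce phases in plain Python, and the lexicon, the function names and the sample reviews are hypothetical.

```python
from collections import defaultdict

# Tiny illustrative lexicon; a real run would load the full word database.
LEXICON = {"good": "positive", "great": "positive",
           "bad": "negative", "poor": "negative"}


def map_review(review):
    """Map step: emit a (polarity, 1) pair for every lexicon word in a review."""
    for word in review.lower().split():
        polarity = LEXICON.get(word.strip(".,!?"))
        if polarity is not None:
            yield polarity, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts for one polarity."""
    return key, sum(values)


reviews = ["Good camera, great screen.", "Bad battery.", "Good value but poor support."]
pairs = [pair for review in reviews for pair in map_review(review)]
totals = dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())
print(totals)
```

Because each review is mapped independently and the reduce step only sums counts per key, the same logic could be distributed over many machines, which is the reason for choosing MapReduce for large review sets.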
The system uses the following basic concepts: -
A. Natural Language Processing:
It is a field of computer science, artificial intelligence, and
computational linguistics concerned with the interactions between
computers and human languages.
B. Tokenization:
It transforms a stream of characters into a stream of
processing units called tokens.
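A minimal tokenizer can be sketched with a regular expression. The pattern and the decision to lowercase every token are assumptions for this example; other tokenization rules are equally possible.

```python
import re

# A word is a run of letters or digits, optionally followed by an
# apostrophe part such as the "'t" in "isn't".
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return [match.group(0).lower() for match in TOKEN_PATTERN.finditer(text)]


tokens = tokenize("Great phone, but the battery isn't good!")
print(tokens)
```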
C. Stop Words Filtering:
It consists of eliminating stop words, that is, very common words that
are filtered out before or after natural language processing.
For example, “is”, “the”, “at”, “who”, “which”.
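Stop word filtering can be sketched as a set lookup. The stop list below is a small illustrative one that contains the examples given above plus a few assumed additions; it is not a standard list.

```python
# Small illustrative stop list; it includes the examples given in the text.
STOP_WORDS = {"is", "the", "at", "who", "which", "a", "an", "and", "of", "to"}


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = ["the", "camera", "is", "great", "and", "the", "price", "is", "good"]
filtered = remove_stop_words(tokens)
print(filtered)
```

Words such as “no” and “not” appear in the negative column of Table 2, so a stop list used with that table should leave them in place.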
D. Stemming:
It is the process of reducing derived words to their root form. Related
words should map to the same stem.
For example, the words “drove”, “drives” and “driven” would fall under
their root word “drive”.
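The example above can be reproduced with a toy stemmer. This is a sketch under stated assumptions, not an established stemming algorithm: the suffix list and the table of irregular forms are hypothetical and cover only a few cases. Irregular forms such as “drove” cannot be reached by stripping a suffix, so the sketch looks them up in a table.

```python
# Irregular forms cannot be reached by stripping suffixes, so they are listed.
IRREGULAR = {"drove": "drive", "driven": "drive"}

# Suffixes tried in order; a deliberately short, illustrative list.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Reduce a word to a root form with a lookup table and suffix stripping."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Require a stem of at least three letters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [stem(w) for w in ["drove", "drives", "driven", "drive"]]
print(stems)
```

All four inputs map to “drive”, so after this step they would be counted as the same word when matched against the database.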
V. CONCLUSION
Thus, if the gathered data is processed and studied carefully, it will
lead to better judgment of products. Developers will be discouraged
from bringing applications to market just for the sake of money, and
people will easily get an idea of a product's quality and reliability.
Fraudulent reviews will be detected easily, and reviews will not be
taken into consideration unless they are genuine.
ACKNOWLEDGEMENTS
We take this opportunity to convey our sincere gratitude, respect and
deep regards to our project guide, Dr. S.K. Pathan, for his encouraging
guidance, monitoring and constant motivation throughout the writing of
this research paper.
The blessing, help and guidance given by him from time to time shall
carry us a long way on the journey of life on which we are about to
embark.
We are also indebted to Prof. Pramod Mali, our Project Coordinator, for
the valuable information and guidance he provided.
REFERENCES
[1] Online:
https://en.wikipedia.org/wiki/Information_retrieval
[2] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W.
Lauw, “Detecting product review spammers using
rating behaviours,” in Proc. 19th ACM Int. Conf.
Inform. Knowl. Manage., 2010, pp. 939–948
[3] A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly,
“Detecting spam web pages through content analysis,”
in Proc. 15th Int. Conf. World Wide Web, 2006, pp.
83–92.
[4] N. Spirin and J. Han, “Survey on web spam detection:
Principles and algorithms,” SIGKDD Explor.
Newslett., vol. 13, no. 2, pp. 50– 64, May 2012.
[5] G. Heinrich, “Parameter estimation for text analysis,”
Univ. Leipzig, Leipzig, Germany, Tech. Rep.,
http://faculty.cs.byu.edu/~ringger/CS601R/papers/Heinrich-GibbsLDA.pdf,
2008.
[6] N. Spirin and J. Han, “Survey on web spam detection:
Principles and algorithms,” SIGKDD Explor.
Newslett., vol. 13, no. 2, pp. 50– 64, May 2012.
[7] B. Zhou, J. Pei, and Z. Tang, “A spamicity approach to
web spam detection,” in Proc. SIAM Int. Conf. Data
Mining, 2008, pp. 277–288.
[8] Z. Wu, J. Wu, J. Cao, and D. Tao, “HySAD: A semi-
supervised hybrid shilling attack detector for
trustworthy product recommendation,” in Proc. 18th
ACM SIGKDD Int. Conf. Knowl. Discovery Data
Mining, 2012, pp. 985–993.
[9] S. Xie, G. Wang, S. Lin, and P. S. Yu, “Review spam
detection via temporal pattern discovery,” in Proc. 18th
ACM SIGKDD Int. Conf. Knowl. Discovery Data
Mining, 2012, pp. 823–831.
[10] Altug Akay, Andrei Dragomir and Björn Erik
Erlandsson, Senior Member, IEEE. “Network Based
Modeling and Intelligent Data Mining of Social Media
for Improving Care”, IEEE Journal of Biomedical and
Health Informatics, Vol. 19, No. 1, January 2015.
[11] H. Zhu, E. Chen, K. Yu, H. Cao, H. Xiong, and J. Tian,
“Mining personal context-aware preferences for
mobile users,” in Proc. IEEE 12th Int. Conf. Data
Mining, 2012, pp. 1212–1217

Más contenido relacionado

La actualidad más candente

Opinion Mining
Opinion MiningOpinion Mining
Opinion Mining
George Ang
 
Empirical analysis on iOS app Popularity
Empirical analysis on iOS app PopularityEmpirical analysis on iOS app Popularity
Empirical analysis on iOS app Popularity
Min-Hsueh Tsai
 

La actualidad más candente (12)

Mahendra nath
Mahendra nathMahendra nath
Mahendra nath
 
IRJET- Enhancing NLP Techniques for Fake Review Detection
IRJET- Enhancing NLP Techniques for Fake Review DetectionIRJET- Enhancing NLP Techniques for Fake Review Detection
IRJET- Enhancing NLP Techniques for Fake Review Detection
 
IRJET- E-Commerce Recommendation based on Users Rating Data
IRJET-  	  E-Commerce Recommendation based on Users Rating DataIRJET-  	  E-Commerce Recommendation based on Users Rating Data
IRJET- E-Commerce Recommendation based on Users Rating Data
 
Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...
 
UserZoom Webinar: How to Conduct Web Customer Experience Benchmarking
UserZoom Webinar: How to Conduct Web Customer Experience BenchmarkingUserZoom Webinar: How to Conduct Web Customer Experience Benchmarking
UserZoom Webinar: How to Conduct Web Customer Experience Benchmarking
 
Review on Opinion Targets and Opinion Words Extraction Techniques from Online...
Review on Opinion Targets and Opinion Words Extraction Techniques from Online...Review on Opinion Targets and Opinion Words Extraction Techniques from Online...
Review on Opinion Targets and Opinion Words Extraction Techniques from Online...
 
Brightfind world usability day 2016 full deck final
Brightfind world usability day 2016   full deck finalBrightfind world usability day 2016   full deck final
Brightfind world usability day 2016 full deck final
 
Opinion Mining
Opinion MiningOpinion Mining
Opinion Mining
 
User research to enhance the us postal service website
User research to enhance the us postal service websiteUser research to enhance the us postal service website
User research to enhance the us postal service website
 
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
 
Final Report for CUTGroup #17 - Ventra Mobile App
Final Report for CUTGroup #17 - Ventra Mobile AppFinal Report for CUTGroup #17 - Ventra Mobile App
Final Report for CUTGroup #17 - Ventra Mobile App
 
Empirical analysis on iOS app Popularity
Empirical analysis on iOS app PopularityEmpirical analysis on iOS app Popularity
Empirical analysis on iOS app Popularity
 

Similar a Detection of Fraud Reviews for a Product

iaetsd Co extracting opinion targets and opinion words from online reviews ba...
iaetsd Co extracting opinion targets and opinion words from online reviews ba...iaetsd Co extracting opinion targets and opinion words from online reviews ba...
iaetsd Co extracting opinion targets and opinion words from online reviews ba...
Iaetsd Iaetsd
 
Fake Product Review Monitoring System
Fake Product Review Monitoring SystemFake Product Review Monitoring System
Fake Product Review Monitoring System
ijtsrd
 
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWSTOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
ijistjournal
 
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWSTOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
ijistjournal
 

Similar a Detection of Fraud Reviews for a Product (20)

Product Quality Analysis based on online Reviews
Product Quality Analysis based on online ReviewsProduct Quality Analysis based on online Reviews
Product Quality Analysis based on online Reviews
 
iaetsd Co extracting opinion targets and opinion words from online reviews ba...
iaetsd Co extracting opinion targets and opinion words from online reviews ba...iaetsd Co extracting opinion targets and opinion words from online reviews ba...
iaetsd Co extracting opinion targets and opinion words from online reviews ba...
 
FAKE PRODUCT PAPER PRESENTATION.pptx
FAKE PRODUCT PAPER PRESENTATION.pptxFAKE PRODUCT PAPER PRESENTATION.pptx
FAKE PRODUCT PAPER PRESENTATION.pptx
 
Sentiment analysis on unstructured review
Sentiment analysis on unstructured reviewSentiment analysis on unstructured review
Sentiment analysis on unstructured review
 
Apps for good - Review It
Apps for good - Review ItApps for good - Review It
Apps for good - Review It
 
IRJET- Spotting and Removing Fake Product Review in Consumer Rating Reviews
IRJET- Spotting and Removing Fake Product Review in Consumer Rating ReviewsIRJET- Spotting and Removing Fake Product Review in Consumer Rating Reviews
IRJET- Spotting and Removing Fake Product Review in Consumer Rating Reviews
 
Fake Product Review Monitoring System
Fake Product Review Monitoring SystemFake Product Review Monitoring System
Fake Product Review Monitoring System
 
Sentiment analysis on_unstructured_review-1
Sentiment analysis on_unstructured_review-1Sentiment analysis on_unstructured_review-1
Sentiment analysis on_unstructured_review-1
 
A proposed Novel Approach for Sentiment Analysis and Opinion Mining
A proposed Novel Approach for Sentiment Analysis and Opinion MiningA proposed Novel Approach for Sentiment Analysis and Opinion Mining
A proposed Novel Approach for Sentiment Analysis and Opinion Mining
 
A proposed novel approach for sentiment analysis and opinion mining
A proposed novel approach for sentiment analysis and opinion miningA proposed novel approach for sentiment analysis and opinion mining
A proposed novel approach for sentiment analysis and opinion mining
 
SENTIMENT ANALYSIS ON PRODUCT FEATURES BASED ON LEXICON APPROACH USING NATURA...
SENTIMENT ANALYSIS ON PRODUCT FEATURES BASED ON LEXICON APPROACH USING NATURA...SENTIMENT ANALYSIS ON PRODUCT FEATURES BASED ON LEXICON APPROACH USING NATURA...
SENTIMENT ANALYSIS ON PRODUCT FEATURES BASED ON LEXICON APPROACH USING NATURA...
 
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRSSentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
 
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRSSentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
 
2
22
2
 
Automatic Recommendation of Trustworthy Users in Online Product Rating Sites
Automatic Recommendation of Trustworthy Users in Online Product Rating SitesAutomatic Recommendation of Trustworthy Users in Online Product Rating Sites
Automatic Recommendation of Trustworthy Users in Online Product Rating Sites
 
Guide: Conjoint Analysis
Guide: Conjoint AnalysisGuide: Conjoint Analysis
Guide: Conjoint Analysis
 
50120130406034 2
50120130406034 250120130406034 2
50120130406034 2
 
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWSTOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
 
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWSTOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
 
Ieee format 5th nccci_a study on factors influencing as a best practice for...
Ieee format 5th nccci_a study on factors influencing as  a  best practice for...Ieee format 5th nccci_a study on factors influencing as  a  best practice for...
Ieee format 5th nccci_a study on factors influencing as a best practice for...
 

Más de IJSRD

Más de IJSRD (20)

#IJSRD #Research Paper Publication
#IJSRD #Research Paper Publication#IJSRD #Research Paper Publication
#IJSRD #Research Paper Publication
 
Maintaining Data Confidentiality in Association Rule Mining in Distributed En...
Maintaining Data Confidentiality in Association Rule Mining in Distributed En...Maintaining Data Confidentiality in Association Rule Mining in Distributed En...
Maintaining Data Confidentiality in Association Rule Mining in Distributed En...
 
Performance and Emission characteristics of a Single Cylinder Four Stroke Die...
Performance and Emission characteristics of a Single Cylinder Four Stroke Die...Performance and Emission characteristics of a Single Cylinder Four Stroke Die...
Performance and Emission characteristics of a Single Cylinder Four Stroke Die...
 
Preclusion of High and Low Pressure In Boiler by Using LABVIEW
Preclusion of High and Low Pressure In Boiler by Using LABVIEWPreclusion of High and Low Pressure In Boiler by Using LABVIEW
Preclusion of High and Low Pressure In Boiler by Using LABVIEW
 
Prevention and Detection of Man in the Middle Attack on AODV Protocol
Prevention and Detection of Man in the Middle Attack on AODV ProtocolPrevention and Detection of Man in the Middle Attack on AODV Protocol
Prevention and Detection of Man in the Middle Attack on AODV Protocol
 
Comparative Analysis of PAPR Reduction Techniques in OFDM Using Precoding Tec...
Comparative Analysis of PAPR Reduction Techniques in OFDM Using Precoding Tec...Comparative Analysis of PAPR Reduction Techniques in OFDM Using Precoding Tec...
Comparative Analysis of PAPR Reduction Techniques in OFDM Using Precoding Tec...
 
Evaluation the Effect of Machining Parameters on MRR of Mild Steel
Evaluation the Effect of Machining Parameters on MRR of Mild SteelEvaluation the Effect of Machining Parameters on MRR of Mild Steel
Evaluation the Effect of Machining Parameters on MRR of Mild Steel
 
Filter unwanted messages from walls and blocking nonlegitimate user in osn
Filter unwanted messages from walls and blocking nonlegitimate user in osnFilter unwanted messages from walls and blocking nonlegitimate user in osn
Filter unwanted messages from walls and blocking nonlegitimate user in osn
 
Keystroke Dynamics Authentication with Project Management System
Keystroke Dynamics Authentication with Project Management SystemKeystroke Dynamics Authentication with Project Management System
Keystroke Dynamics Authentication with Project Management System
 
Diagnosing lungs cancer Using Neural Networks
Diagnosing lungs cancer Using Neural NetworksDiagnosing lungs cancer Using Neural Networks
Diagnosing lungs cancer Using Neural Networks
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion Mining
 
A Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFISA Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFIS
 
Experimental Investigation of Granulated Blast Furnace Slag ond Quarry Dust a...
Experimental Investigation of Granulated Blast Furnace Slag ond Quarry Dust a...Experimental Investigation of Granulated Blast Furnace Slag ond Quarry Dust a...
Experimental Investigation of Granulated Blast Furnace Slag ond Quarry Dust a...
 
Solving Fuzzy Matrix Games Defuzzificated by Trapezoidal Parabolic Fuzzy Numbers
Solving Fuzzy Matrix Games Defuzzificated by Trapezoidal Parabolic Fuzzy NumbersSolving Fuzzy Matrix Games Defuzzificated by Trapezoidal Parabolic Fuzzy Numbers
Solving Fuzzy Matrix Games Defuzzificated by Trapezoidal Parabolic Fuzzy Numbers
 
Study of Clustering of Data Base in Education Sector Using Data Mining
Study of Clustering of Data Base in Education Sector Using Data MiningStudy of Clustering of Data Base in Education Sector Using Data Mining
Study of Clustering of Data Base in Education Sector Using Data Mining
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
 
Investigation of Effect of Process Parameters on Maximum Temperature during F...
Investigation of Effect of Process Parameters on Maximum Temperature during F...Investigation of Effect of Process Parameters on Maximum Temperature during F...
Investigation of Effect of Process Parameters on Maximum Temperature during F...
 
Review Paper on Computer Aided Design & Analysis of Rotor Shaft of a Rotavator
Review Paper on Computer Aided Design & Analysis of Rotor Shaft of a RotavatorReview Paper on Computer Aided Design & Analysis of Rotor Shaft of a Rotavator
Review Paper on Computer Aided Design & Analysis of Rotor Shaft of a Rotavator
 
A Survey on Data Mining Techniques for Crime Hotspots Prediction
A Survey on Data Mining Techniques for Crime Hotspots PredictionA Survey on Data Mining Techniques for Crime Hotspots Prediction
A Survey on Data Mining Techniques for Crime Hotspots Prediction
 
Studies on Physico - Mechanical Properties of Chloroprene Rubber Vulcanizate ...
Studies on Physico - Mechanical Properties of Chloroprene Rubber Vulcanizate ...Studies on Physico - Mechanical Properties of Chloroprene Rubber Vulcanizate ...
Studies on Physico - Mechanical Properties of Chloroprene Rubber Vulcanizate ...
 

Último

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 

Último (20)

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 

Detection of Fraud Reviews for a Product

  • 1. IJSRD - International Journal for Scientific Research & Development| Vol. 3, Issue 10, 2015 | ISSN (online): 2321-0613 All rights reserved by www.ijsrd.com 408 Detection of Fraud Reviews for Products Dr. S.K. Pathan1 Janhavi Sankpal2 Pooja Sankpal3 Kanchan Haral4 1 Professor 2,3,4 Student 1,2,3,4 Department of Computer Science 1,2,3,4 Smt. Kashibai Navale College of Engineering, Pune Abstract— Most of us use e-shopping (Any product) these days and refer its rating or reviews before we download or buy that product. Amazon/Play store provide a great number of products but unfortunately few of those product reviews are fraud. Hence such products must be marked, so that they will be recognizable for rest of the users. Here we are comparing reviews from two sites so that we can get more clear idea. We can get higher probability of getting real reviews if we take data from multiple sites. We are proposing a system to develop an android application that will take reviews from two different websites for single product, and analyze them with NLP for positive or negative rating. In this, user will give two different URLs of two different sites for same product to the system as input. For every URL reviews and comments will be fetched separately and analyzed with NLP for positive negative rating. Then their rating will be combined together with average to give final rating for the product. As we are handling the big data here, we are using Hadoops map reduce. So it will be easier to decide which product reviews are fraud or not. Key words: Primal Text Mining and Pre-processing, NLP, Mapping Text Data I. INTRODUCTION This article is provided for the people who download or buy products online. Before they proceed with their purchase, review of that product is a key aspect. There is no means by which they can be sure that the given reviews are authentic. 
A database is provided against which the comments are mapped, so that we can conclude which comments are positive and which are negative. The user can check the reviews of the same product on two different sites by providing the product's URL from each site. After the reviews are analysed and mapped against the database, a result is generated. The result is reported as positive or negative, as a percentage, and as a graph. NLP is used to process the reviews and generate the results, and Hadoop's MapReduce can be used to map the data against the database.

II. EXISTING WORK
Many sites provide only rankings, ratings and reviews; they do not assure their authenticity. There is a need to cross-check whether the posted reviews are original or were written merely to bump up the product's popularity. Although there is some related work in this domain, such as web ranking spam detection [1], online review spam detection [2] and mobile App recommendation systems [3], the problem of detecting ranking fraud for mobile applications or other online products remains largely unexplored.

Currently, the most widely used online shopping sites let buyers view reviews of a product they are interested in, and let people who have already bought a product share their shopping experience and rate or review it. This gives future buyers a reference to consult before buying products online. Even on platforms like the Play Store, one can study the reviews before downloading an application. However, while these facilities let us learn from the experiences of other users, they also raise questions about the authenticity of the reviews. In this digital world it has become easy to rate any product online.
If we consider real-world observations, we find that each review is associated with a particular latent topic. For instance, some reviews might relate to the latent topic “worth a try”, while others might relate to the latent topic “not so good”. At the same time, different buyers have different personal preferences for mobile applications, or prefer different sites for buying products. Different products or applications may therefore have different topic distributions in their historical review records. Plausibly, the topic distribution of the reviews of a product or application in a normal leading session should be consistent with the topic distribution of all its historical review records. This is because review topics are based on users' personal usage experiences and choices, not on the popularity of the application. If, on the other hand, the reviews in a leading session have been manipulated, the two topic distributions will differ markedly. For example, the leading session may contain more positive topics, such as “worth a try”, “popular” and “good”.

III. DRAWBACKS OF THE EXISTING SYSTEM
The existing system cannot determine the authenticity of the reviews posted for a product. Buyers cannot tell whether the reviews given to a product are authentic, and so they face a dilemma over whether the product is really good or bad. The existing system does not detect fraudulent or deceptive reviews that may have been posted just to boost the product's popularity. For example, the developer or brand selling a product can rate it themselves to manipulate consumers' perception. Rivals can likewise post negative reviews to damage the product's reputation.
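The consistency check between the two topic distributions described under Existing Work can be made concrete with a distance measure. The sketch below is only an illustration under stated assumptions: it presumes that each review has already been assigned one topic label (the paper does not say how), it uses total variation distance as the comparison, and the names `topic_distribution`, `total_variation` and the 0.3 threshold are hypothetical choices, not taken from the paper.

```python
from collections import Counter


def topic_distribution(topics):
    """Return the relative frequency of each topic label in a list of labels."""
    counts = Counter(topics)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0.0 means the distributions are identical, 1.0 means they share no topic.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical topic labels, one per review (illustrative data only).
historical = ["worth a try"] * 40 + ["not so good"] * 40 + ["good"] * 20
leading_session = ["worth a try"] * 50 + ["popular"] * 30 + ["good"] * 20

distance = total_variation(
    topic_distribution(historical),
    topic_distribution(leading_session),
)

# Assumed cut-off for flagging a session; it would need tuning on real data.
SUSPICION_THRESHOLD = 0.3
print(f"distance = {distance:.2f}, suspicious = {distance > SUSPICION_THRESHOLD}")
```

A large distance only indicates that the leading session's topics differ from the historical record; whether that reflects manipulation would still need to be judged against real data.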
Table 1
After these search words are applied, both positive and negative keyword classification are used, and the result is generated from the keywords in the database [10].
  • 2. Detection of Fraud Reviews for Products (IJSRD/Vol. 3/Issue 10/2015/082)
All rights reserved by www.ijsrd.com 409

A. Primal Text Mining and Pre-processing:
Many software tools and algorithms, such as RapidMiner, WordNet, and data collection and processing trees, have been developed to test for common positive and negative words.

Fig. 1: A sample architecture [10]

The URLs from two different sites for the same product are taken as input to the system. The reviews from those sites are scanned and mapped against the database to produce a result in terms of positive and negative percentages. The database holds the words used in comments, so that the positivity or negativity of each comment or review can be checked. Based on these results, a user can get a rough idea of whether the product (an application or any other product) is worth buying or downloading.

B. Mapping Text Data:
The final word set is pruned using the above-mentioned methods, and the results are shown in Tables 1 and 2, which have separate columns for positive and negative words.

Positive      Negative
Agree         Bad
Appreciate    Cannot
Beneficial    Damage
Comfort       Dangerous
Ease          Depression
Easier        Died
Enjoy         Difficult
Good          Error
Great         Hard
Greatest      Failure
Help          Impossible
Hope          Lack
Important     No
Thanks        Not
Superb        Poor
Yes           Weak
Table 2: Positive and Negative Classification of Words from Posts

IV. PROPOSED SYSTEM
Our main aim is to show users how authentic the reviews of a product or application are. This gives the user a clearer view of the quality and reliability of the product or application. We therefore propose a system that performs the following tasks:
1) Collect all the information about a particular application or product from the sites, such as comments and reviews about it.
2) Use NLP to process those comments and reviews.
3) Use the MapReduce algorithm for querying.
Hence we can detect whether the reviews are fraudulent or not.
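The three tasks listed above could be prototyped roughly as follows. This is a minimal sketch, not the authors' implementation: it is plain Python that imitates the map and reduce phases rather than using the Hadoop API, the reviews are hard-coded instead of being fetched from URLs, the word lists are copied from Table 2, and the stop-word list, the function names and the sample reviews are assumptions made for illustration.

```python
import re
from collections import Counter

# Word lists taken from Table 2 (lower-cased).
POSITIVE = {"agree", "appreciate", "beneficial", "comfort", "ease", "easier",
            "enjoy", "good", "great", "greatest", "help", "hope", "important",
            "thanks", "superb", "yes"}
NEGATIVE = {"bad", "cannot", "damage", "dangerous", "depression", "died",
            "difficult", "error", "hard", "failure", "impossible", "lack",
            "no", "not", "poor", "weak"}
# Assumed minimal stop-word list; a real system would use a fuller one.
STOP_WORDS = {"is", "the", "at", "who", "which"}


def tokenize(text):
    """Split a review into lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def map_review(review):
    """Map step: emit (label, 1) for every database keyword found in one review."""
    for token in tokenize(review):
        if token in STOP_WORDS:
            continue
        if token in POSITIVE:
            yield ("positive", 1)
        elif token in NEGATIVE:
            yield ("negative", 1)


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per label."""
    totals = Counter()
    for label, count in pairs:
        totals[label] += count
    return totals


def positive_percentage(reviews):
    """Percentage of matched keywords that are positive, or None if none matched."""
    totals = reduce_counts(pair for review in reviews for pair in map_review(review))
    matched = totals["positive"] + totals["negative"]
    return 100.0 * totals["positive"] / matched if matched else None


def combined_rating(site_a_reviews, site_b_reviews):
    """Average the per-site positive percentages, skipping sites with no matches."""
    scores = [s for s in (positive_percentage(site_a_reviews),
                          positive_percentage(site_b_reviews)) if s is not None]
    return sum(scores) / len(scores) if scores else None


# Hypothetical reviews standing in for the text fetched from the two URLs.
site_a = ["Great phone, superb battery", "Good camera but poor screen"]
site_b = ["I enjoy using it", "Weak signal"]
print(positive_percentage(site_a), positive_percentage(site_b),
      combined_rating(site_a, site_b))
```

Counting keywords per label keeps the map step independent for each review, which is what would let the same logic be distributed across a cluster; the averaging step mirrors the combination of the two site ratings described in the abstract.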
The system uses the following basic concepts:

A. Natural Language Processing:
It is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human languages.

B. Tokenization:
It transforms a stream of characters into a stream of processing units called tokens.

C. Stop Words Filtering:
It consists of eliminating stop words. Stop words are words that are filtered out before or after natural language processing, for example “is”, “the”, “at”, “who” and “which”.

D. Stemming:
It is the process of reducing derived words to their root form. Related words should map to the same stem. For example, the words “drove”, “drives” and “driven” would fall under their root word “drive”.

V. CONCLUSION
If the gathered data is processed and studied carefully, it will lead to better judgment of products: developers will no longer bring their applications to market just for the sake of money, and people will easily get an idea of a product's quality and reliability. Fraudulent reviews will be detected easily, and reviews will not be taken into consideration unless they are genuine.

ACKNOWLEDGEMENTS
We take this chance to convey our sincere gratitude, respect and deep regards to our Project guide, Dr. S.K. Pathan, for his encouraging guidance, monitoring and constant motivation throughout the process of writing this research paper. The blessing, help and guidance given by him shall carry us a long way on the journey of life on which we are about to embark. We are also grateful to Prof. Pramod Mali, our Project Coordinator, for the valuable information and guidance provided by him.

REFERENCES
[1] Online: https://en.wikipedia.org/wiki/Information_retrieval
[2] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W. Lauw, “Detecting product review spammers using rating behaviours,” in Proc. 19th ACM Int. Conf. Inform. Knowl. Manage., 2010, pp. 939–948.
[3] A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly, “Detecting spam web pages through content analysis,” in Proc. 15th Int. Conf. World Wide Web, 2006, pp. 83–92.
[4] N. Spirin and J. Han, “Survey on web spam detection: Principles and algorithms,” SIGKDD Explor. Newslett., vol. 13, no. 2, pp. 50–64, May 2012.
  • 3. [5] G. Heinrich, “Parameter estimation for text analysis,” Univ. Leipzig, Leipzig, Germany, Tech. Rep., http://faculty.cs.byu.edu/~ringger/CS601R/papers/Heinrich-GibbsLDA.pdf, 2008.
[6] N. Spirin and J. Han, “Survey on web spam detection: Principles and algorithms,” SIGKDD Explor. Newslett., vol. 13, no. 2, pp. 50–64, May 2012.
[7] B. Zhou, J. Pei, and Z. Tang, “A spamicity approach to web spam detection,” in Proc. SIAM Int. Conf. Data Mining, 2008, pp. 277–288.
[8] Z. Wu, J. Wu, J. Cao, and D. Tao, “HySAD: A semi-supervised hybrid shilling attack detector for trustworthy product recommendation,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 985–993.
[9] S. Xie, G. Wang, S. Lin, and P. S. Yu, “Review spam detection via temporal pattern discovery,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 823–831.
[10] A. Akay, A. Dragomir, and B.-E. Erlandsson, “Network Based Modeling and Intelligent Data Mining of Social Media for Improving Care,” IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 1, January 2015.
[11] H. Zhu, E. Chen, K. Yu, H. Cao, H. Xiong, and J. Tian, “Mining personal context-aware preferences for mobile users,” in Proc. IEEE 12th Int. Conf. Data Mining, 2012, pp. 1212–1217.