SlideShare una empresa de Scribd logo
1 de 4
Descargar para leer sin conexión
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 6, Ver. I (Nov – Dec. 2015), PP 91-94
www.iosrjournals.org
DOI: 10.9790/0661-17619194 www.iosrjournals.org 91 | Page
Hadoop add-on API for Advanced Content Based Search &
Retrieval
1
Kshama Jain, 2
Aditya Kamble, 3
Siddhesh Palande, 4
Rahul Rao,
Prof. ShaileshHule
Department of Computer EngineeringPimpriChinchwad College of Engineering, Pune
Guided by: Prof. ShaileshHule
Abstract:Unstructured data like doc, pdf is lengthy to search and filter for desired information. We need to go
through every file manually for finding information. It is very time consuming and frustrating. We can use
features of big data management system like Hadoop to organize unstructured data dynamically and return
desired information. Hadoop provides features like Map Reduce, HDFS, HBase to filter data as per user input.
Finally we can develop Hadoop Addon for content search and filtering on unstructured data.
Index Terms:Hadoop, Map Reduce, HBase, Content BasedRetrieval.
I. Introduction
A. Hadoop
Hadoop is an open-source software framework written in Java for distributed storage and distributed
processing of very large data sets on computer clusters built from commodity hardware. Commodity computing,
or commodity cluster computing, is the use of large numbers of already-available computing components for
parallel computing, to get the greatest amount of useful computation at low cost. All the modules in Hadoop are
designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines)
are commonplace and thus should be automatically handled in software by the framework. The core of Apache
Hadoop consists of a storage part (Hadoop Distributed File System (HDFS)) and a processing part (Map
Reduce). Hadoop splits files into large blocks and distributesthem amongst the nodes in the cluster.
B. HDFS
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity
hardware. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides
high throughput access to application data and is suitable for applications that have large data sets.
C. MapReduce
A MapReduce program is composed of a Map() procedure(method) that performs filtering and sorting
and a Reduce() method that performs a summary operation. The MapReduce System run the various tasks in
parallel, managing all communications and data transfers between the various parts of the system.
D. HBase
HBase is a data model that is similar to Googles big table designed to provide quick random access to
huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase
on Hadoop File Systems, and ways to interact with HBase shell. It also describes how to connect to HBase using
java, and how to perform basic operations on HBase using java.
E. Current System
Tasks like assignments, taking notes from text books and reference books on particular topic, topics for
presentation need deep reading and need to go through every document manually just to find relevant content on
given topic. Currently present systems are only searching based on document title, author, size, and time but not
on content. So to do content based search on big data documents and large text data Haddop framework can be
used
F. Content Based Approach
Manually filtering from any kind of unstructured data like PDF is tedious and time consuming, we are
developing API for Hadoop finding relevant information from large sets and retrieving the same is main
concern. So using Hadoop Big Data management framework consist of HDFS, MapReduce, and HBase, we are
developing content based search on PDF documents to solve real life problem. So this is basic motivation for the
Hadoop add-on API for Advanced Content Based Search & Retrieval
DOI: 10.9790/0661-17619194 www.iosrjournals.org 92 | Page
project.
II. Proposed System
This system shall retrieve the required contents of files which are in an unstructured format containing
huge amount of Data like e-books in a digital library where the number of books are present in thousands. The
scope here is initially limited to PDF files which may be expanded to other unstructured formats like ePUB,
mobi. The input is provided to the system API in the form of search query which will be firstly filtered to find
the important expression as a query to the API which will return the important content to the user in the form of
paragraph or text highlighted using the power of distributed computing. The particular page or the Entire book
itself can be downloaded by the Library users if the content is satisfied else the search continues for finding
relevant content
Fig. 1. Data Flow Diagram
Algorithm 1: Content Based Search Algorithm
1) Take Input Query from User
2) Use the Suffix Stripping Algorithm to remove English Suffixes like ’ed’,’ing’,’ ’ ’s ’ to extract keywords
3) Match keywords with Documents’ metadata present in Hbase and filter the documents to be searched.
4) Apply Searching Algorithm based on map-reduce operations like keyword count,positions.
 Open the PDF
 Distribute the text using map operation.
 Combine the results with reduce operations.
5) Sort the documents which are returned by Searching w.r.t keyword count and relevance.
6) Depending on API methods return the result with
 Paragraph
 Whole Page
 Current and Previous Page
A. Document Metadata
Searching and retrieving information from data inside HDFS. Hadoop Map Reduce operations will be
used to perform key value pair generation and depend upon result information is searched.
a) Here, data is documents in PDF (Portable Document Format) format. These documents are stored on HDFS
Hadoop Distributed File System. These documents are considered as raw data. These documents can have
properties like
 Name Size Date
 Author
b) Document meta data is also maintain in HBase distributed database. It has attributes like
 Content Keywords
Hadoop add-on API for Advanced Content Based Search & Retrieval
DOI: 10.9790/0661-17619194 www.iosrjournals.org 93 | Page
 Index Keyword
 Document Keywords
This information will be useful to filter documents before performing content based searching operations.
Fig. 2. Activity Diagram
Algorithm 2: Meta-data Extraction Service Algorithm
Meta-data Extraction Service
1) Take the input pdf files from user
2) Check if corrupt
3) Upload the file into the Hadoop Distributed Filesystem
4) Run the Metadata Extracting Algorithm using the Map Reduce Engine.
5) Extract the Bibliography and Index Contents of each and every document (if present) and related keywords.
6) Store it in the HBase as metadata.
7) This metadata will be used by map reduce engine to search and retrieve data from relevant document inside
HDFS using keyword relative custom partitioning.
8) Return the result to the search API
9) Stop
Hadoop add-on API for Advanced Content Based Search & Retrieval
DOI: 10.9790/0661-17619194 www.iosrjournals.org 94 | Page
III. Applications
Content based search and retrieval on
1) Digital library books
2) Conference Papers
3) Private Authorities
4) Government Authorities
5) Unstructured text files
This API can be used to develop above functionality on platforms like
1) Web Application
2) Android Application
3) Standalone Application
4) Command Line Interface Application
IV. Future Scope
Extending our API further to read scanned text copies and retrieve the data from them would solve
further advanced issues. This API again can be extended further to work with Image processing so that it may
find a relevant information from the digital images for the end- users.
V. Conclusion
This API would be favorable for many large organizations like Large-scale Industries and Educational
institutes, Power-Grids, Air-lines, Government Offices and many others working with and producing Big-Data
to retrieve data in an efficient and a very cost effective way. Thus we can use this API in Hadoop to reduce
manual efforts and bring advance content based search and retrieval
References
[1] Lars George, ”HBase: The Definitive Guide”, 1st edition, O’Reilly Media, September 2011, ISBN 9781449396107
[2] Tom White, ”Hadoop: The Definitive Guide”, 1st edition, O’Reilly Media, June 2009, ISBN 9780596521974
[3] Apache Hadoop HDFS homepage http://hadoop.apache.org/hdfs/
[4] MehulNalinVora ,”Hadoop-HBase for Large-Scale Data”,InnovationLabs,PERC,ICCNT Conference
[5] YijunBei,ZhenLin,ChenZhao,Xiaojun Zhu ,”HBase System-based Distributed Framework for Searching Large Graph
Databases”,ICCNT Conference
[6] SeemaMaitrey,C.K.”Handling Big Data Efficiently by using Map Reduce Technique”,ICCICT
[7] Maitrey S, Jha. An Integrated Approach for CURE Clustering using Map-Reduce Technique. In Proceedings of Elsevier,ISBN 978-
81- 910691-6-3,2 nd August 2013.

Más contenido relacionado

La actualidad más candente

Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...IJECEIAES
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Cognizant
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on webcsandit
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...iosrjce
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methodspaperpublications3
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsCognizant
 
A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...ijcses
 
Intro to Hybrid Data Warehouse
Intro to Hybrid Data WarehouseIntro to Hybrid Data Warehouse
Intro to Hybrid Data WarehouseJonathan Bloom
 
Sparkr sigmod
Sparkr sigmodSparkr sigmod
Sparkr sigmodwaqasm86
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?IJCSIS Research Publications
 
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkLaxmi8
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemNilaNila16
 

La actualidad más candente (17)

Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
 
B1803040412
B1803040412B1803040412
B1803040412
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on web
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
Design and Implementation of SOA Enhanced Semantic Information Retrieval web ...
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Big data
Big dataBig data
Big data
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...
 
Intro to Hybrid Data Warehouse
Intro to Hybrid Data WarehouseIntro to Hybrid Data Warehouse
Intro to Hybrid Data Warehouse
 
Big data
Big dataBig data
Big data
 
Sparkr sigmod
Sparkr sigmodSparkr sigmod
Sparkr sigmod
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
 
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs Spark
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 

Destacado

Việc làm bán thời gian cho sinh viên.
Việc làm bán thời gian cho sinh viên.Việc làm bán thời gian cho sinh viên.
Việc làm bán thời gian cho sinh viên.Bất động sản 777
 
Kyandle_Migration-CultAnthro
Kyandle_Migration-CultAnthroKyandle_Migration-CultAnthro
Kyandle_Migration-CultAnthroKrystal Yandle
 
7 yếu tố thành công của một mô hình nhượng quyền
7 yếu tố thành công của một mô hình nhượng quyền7 yếu tố thành công của một mô hình nhượng quyền
7 yếu tố thành công của một mô hình nhượng quyềnPhi Van Nguyen
 
20140120 action learning
20140120 action learning20140120 action learning
20140120 action learninghumana12
 
Empowering Student Learning Through The Development Of A Social Media Communi...
Empowering Student Learning Through The Development Of A Social Media Communi...Empowering Student Learning Through The Development Of A Social Media Communi...
Empowering Student Learning Through The Development Of A Social Media Communi...Thomas Lancaster
 
Fiesta de la tirana, norte de chile.
Fiesta de la tirana, norte de chile.Fiesta de la tirana, norte de chile.
Fiesta de la tirana, norte de chile.jrtorresb
 

Destacado (9)

Apresum erev
Apresum erevApresum erev
Apresum erev
 
Việc làm bán thời gian cho sinh viên.
Việc làm bán thời gian cho sinh viên.Việc làm bán thời gian cho sinh viên.
Việc làm bán thời gian cho sinh viên.
 
Deafblindness
DeafblindnessDeafblindness
Deafblindness
 
Nih cert
Nih certNih cert
Nih cert
 
Kyandle_Migration-CultAnthro
Kyandle_Migration-CultAnthroKyandle_Migration-CultAnthro
Kyandle_Migration-CultAnthro
 
7 yếu tố thành công của một mô hình nhượng quyền
7 yếu tố thành công của một mô hình nhượng quyền7 yếu tố thành công của một mô hình nhượng quyền
7 yếu tố thành công của một mô hình nhượng quyền
 
20140120 action learning
20140120 action learning20140120 action learning
20140120 action learning
 
Empowering Student Learning Through The Development Of A Social Media Communi...
Empowering Student Learning Through The Development Of A Social Media Communi...Empowering Student Learning Through The Development Of A Social Media Communi...
Empowering Student Learning Through The Development Of A Social Media Communi...
 
Fiesta de la tirana, norte de chile.
Fiesta de la tirana, norte de chile.Fiesta de la tirana, norte de chile.
Fiesta de la tirana, norte de chile.
 

Similar a Hadoop add-on API for Advanced Content Based Search & Retrieval

OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...Puneet Kansal
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoopManoj Jangalva
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introductionsaisreealekhya
 
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...cscpconf
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED
 
Hadoop Based Data Discovery
Hadoop Based Data DiscoveryHadoop Based Data Discovery
Hadoop Based Data DiscoveryBenjamin Ashkar
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelEditor IJCATR
 

Similar a Hadoop add-on API for Advanced Content Based Search & Retrieval (20)

HDFS
HDFSHDFS
HDFS
 
G017143640
G017143640G017143640
G017143640
 
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ...
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Unit-3_BDA.ppt
Unit-3_BDA.pptUnit-3_BDA.ppt
Unit-3_BDA.ppt
 
Big data
Big dataBig data
Big data
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 
paper
paperpaper
paper
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43
 
Hadoop Based Data Discovery
Hadoop Based Data DiscoveryHadoop Based Data Discovery
Hadoop Based Data Discovery
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
 
hadoop resume
hadoop resumehadoop resume
hadoop resume
 

Más de iosrjce

An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...iosrjce
 
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?iosrjce
 
Childhood Factors that influence success in later life
Childhood Factors that influence success in later lifeChildhood Factors that influence success in later life
Childhood Factors that influence success in later lifeiosrjce
 
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...iosrjce
 
Customer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in DubaiCustomer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in Dubaiiosrjce
 
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...iosrjce
 
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model ApproachConsumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model Approachiosrjce
 
Student`S Approach towards Social Network Sites
Student`S Approach towards Social Network SitesStudent`S Approach towards Social Network Sites
Student`S Approach towards Social Network Sitesiosrjce
 
Broadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperativeBroadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperativeiosrjce
 
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...iosrjce
 
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...iosrjce
 
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on BangladeshConsumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladeshiosrjce
 
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...iosrjce
 
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...iosrjce
 
Media Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & ConsiderationMedia Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & Considerationiosrjce
 
Customer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative studyCustomer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative studyiosrjce
 
Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...iosrjce
 
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...iosrjce
 
Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...iosrjce
 
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...iosrjce
 

Más de iosrjce (20)

An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...An Examination of Effectuation Dimension as Financing Practice of Small and M...
An Examination of Effectuation Dimension as Financing Practice of Small and M...
 
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
 
Childhood Factors that influence success in later life
Childhood Factors that influence success in later lifeChildhood Factors that influence success in later life
Childhood Factors that influence success in later life
 
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
 
Customer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in DubaiCustomer’s Acceptance of Internet Banking in Dubai
Customer’s Acceptance of Internet Banking in Dubai
 
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
 
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model ApproachConsumer Perspectives on Brand Preference: A Choice Based Model Approach
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
 
Student`S Approach towards Social Network Sites
Student`S Approach towards Social Network SitesStudent`S Approach towards Social Network Sites
Student`S Approach towards Social Network Sites
 
Broadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperativeBroadcast Management in Nigeria: The systems approach as an imperative
Broadcast Management in Nigeria: The systems approach as an imperative
 
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
 
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
 
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on BangladeshConsumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
 
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
 
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
 
Media Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & ConsiderationMedia Innovations and its Impact on Brand awareness & Consideration
Media Innovations and its Impact on Brand awareness & Consideration
 
Customer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative studyCustomer experience in supermarkets and hypermarkets – A comparative study
Customer experience in supermarkets and hypermarkets – A comparative study
 
Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...Social Media and Small Businesses: A Combinational Strategic Approach under t...
Social Media and Small Businesses: A Combinational Strategic Approach under t...
 
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
 
Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...Implementation of Quality Management principles at Zimbabwe Open University (...
Implementation of Quality Management principles at Zimbabwe Open University (...
 
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
 

Último

The SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teamsThe SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teamsDILIPKUMARMONDAL6
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxVelmuruganTECE
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptxNikhil Raut
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 

Último (20)

The SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teamsThe SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teams
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptx
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptx
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 

Hadoop add-on API for Advanced Content Based Search & Retrieval

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 6, Ver. I (Nov – Dec. 2015), PP 91-94 www.iosrjournals.org DOI: 10.9790/0661-17619194 www.iosrjournals.org 91 | Page Hadoop add-on API for Advanced Content Based Search & Retrieval 1 Kshama Jain, 2 Aditya Kamble, 3 Siddhesh Palande, 4 Rahul Rao, Prof. ShaileshHule Department of Computer EngineeringPimpriChinchwad College of Engineering, Pune Guided by: Prof. ShaileshHule Abstract:Unstructured data like doc, pdf is lengthy to search and filter for desired information. We need to go through every file manually for finding information. It is very time consuming and frustrating. We can use features of big data management system like Hadoop to organize unstructured data dynamically and return desired information. Hadoop provides features like Map Reduce, HDFS, HBase to filter data as per user input. Finally we can develop Hadoop Addon for content search and filtering on unstructured data. Index Terms:Hadoop, Map Reduce, HBase, Content BasedRetrieval. I. Introduction A. Hadoop Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Commodity computing, or commodity cluster computing, is the use of large numbers of already-available computing components for parallel computing, to get the greatest amount of useful computation at low cost. All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are commonplace and thus should be automatically handled in software by the framework. The core of Apache Hadoop consists of a storage part (Hadoop Distributed File System (HDFS)) and a processing part (Map Reduce). Hadoop splits files into large blocks and distributesthem amongst the nodes in the cluster. B. HDFS The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. C. MapReduce A MapReduce program is composed of a Map() procedure(method) that performs filtering and sorting and a Reduce() method that performs a summary operation. The MapReduce System run the various tasks in parallel, managing all communications and data transfers between the various parts of the system. D. HBase HBase is a data model that is similar to Googles big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell. It also describes how to connect to HBase using java, and how to perform basic operations on HBase using java. E. Current System Tasks like assignments, taking notes from text books and reference books on particular topic, topics for presentation need deep reading and need to go through every document manually just to find relevant content on given topic. Currently present systems are only searching based on document title, author, size, and time but not on content. So to do content based search on big data documents and large text data Haddop framework can be used F. Content Based Approach Manually filtering from any kind of unstructured data like PDF is tedious and time consuming, we are developing API for Hadoop finding relevant information from large sets and retrieving the same is main concern. So using Hadoop Big Data management framework consist of HDFS, MapReduce, and HBase, we are developing content based search on PDF documents to solve real life problem. So this is basic motivation for the
  • 2. Hadoop add-on API for Advanced Content Based Search & Retrieval DOI: 10.9790/0661-17619194 www.iosrjournals.org 92 | Page project. II. Proposed System This system shall retrieve the required contents of files which are in an unstructured format containing huge amount of Data like e-books in a digital library where the number of books are present in thousands. The scope here is initially limited to PDF files which may be expanded to other unstructured formats like ePUB, mobi. The input is provided to the system API in the form of search query which will be firstly filtered to find the important expression as a query to the API which will return the important content to the user in the form of paragraph or text highlighted using the power of distributed computing. The particular page or the Entire book itself can be downloaded by the Library users if the content is satisfied else the search continues for finding relevant content Fig. 1. Data Flow Diagram Algorithm 1: Content Based Search Algorithm 1) Take Input Query from User 2) Use the Suffix Stripping Algorithm to remove English Suffixes like ’ed’,’ing’,’ ’ ’s ’ to extract keywords 3) Match keywords with Documents’ metadata present in Hbase and filter the documents to be searched. 4) Apply Searching Algorithm based on map-reduce operations like keyword count,positions.  Open the PDF  Distribute the text using map operation.  Combine the results with reduce operations. 5) Sort the documents which are returned by Searching w.r.t keyword count and relevance. 6) Depending on API methods return the result with  Paragraph  Whole Page  Current and Previous Page A. Document Metadata Searching and retrieving information from data inside HDFS. Hadoop Map Reduce operations will be used to perform key value pair generation and depend upon result information is searched. a) Here, data is documents in PDF (Portable Document Format) format. These documents are stored on HDFS Hadoop Distributed File System. These documents are considered as raw data. These documents can have properties like  Name Size Date  Author b) Document meta data is also maintain in HBase distributed database. It has attributes like  Content Keywords
  • 3. Hadoop add-on API for Advanced Content Based Search & Retrieval DOI: 10.9790/0661-17619194 www.iosrjournals.org 93 | Page  Index Keyword  Document Keywords This information will be useful to filter documents before performing content based searching operations. Fig. 2. Activity Diagram Algorithm 2: Meta-data Extraction Service Algorithm Meta-data Extraction Service 1) Take the input pdf files from user 2) Check if corrupt 3) Upload the file into the Hadoop Distributed Filesystem 4) Run the Metadata Extracting Algorithm using the Map Reduce Engine. 5) Extract the Bibliography and Index Contents of each and every document (if present) and related keywords. 6) Store it in the HBase as metadata. 7) This metadata will be used by map reduce engine to search and retrieve data from relevant document inside HDFS using keyword relative custom partitioning. 8) Return the result to the search API 9) Stop
  • 4. Hadoop add-on API for Advanced Content Based Search & Retrieval DOI: 10.9790/0661-17619194 www.iosrjournals.org 94 | Page III. Applications Content based search and retrieval on 1) Digital library books 2) Conference Papers 3) Private Authorities 4) Government Authorities 5) Unstructured text files This API can be used to develop above functionality on platforms like 1) Web Application 2) Android Application 3) Standalone Application 4) Command Line Interface Application IV. Future Scope Extending our API further to read scanned text copies and retrieve the data from them would solve further advanced issues. This API again can be extended further to work with Image processing so that it may find a relevant information from the digital images for the end- users. V. Conclusion This API would be favorable for many large organizations like Large-scale Industries and Educational institutes, Power-Grids, Air-lines, Government Offices and many others working with and producing Big-Data to retrieve data in an efficient and a very cost effective way. Thus we can use this API in Hadoop to reduce manual efforts and bring advance content based search and retrieval References [1] Lars George, ”HBase: The Definitive Guide”, 1st edition, O’Reilly Media, September 2011, ISBN 9781449396107 [2] Tom White, ”Hadoop: The Definitive Guide”, 1st edition, O’Reilly Media, June 2009, ISBN 9780596521974 [3] Apache Hadoop HDFS homepage http://hadoop.apache.org/hdfs/ [4] MehulNalinVora ,”Hadoop-HBase for Large-Scale Data”,InnovationLabs,PERC,ICCNT Conference [5] YijunBei,ZhenLin,ChenZhao,Xiaojun Zhu ,”HBase System-based Distributed Framework for Searching Large Graph Databases”,ICCNT Conference [6] SeemaMaitrey,C.K.”Handling Big Data Efficiently by using Map Reduce Technique”,ICCICT [7] Maitrey S, Jha. An Integrated Approach for CURE Clustering using Map-Reduce Technique. In Proceedings of Elsevier,ISBN 978- 81- 910691-6-3,2 nd August 2013.