SlideShare una empresa de Scribd logo
1 de 17
Descargar para leer sin conexión
Context 
based 
Web 
Indexing 
for 
Storage 
of 
Relevant 
Web 
Pages
ABSTRACT 
• A 
focused 
crawler 
downloads 
web 
pages 
that 
are 
relevant 
to 
a 
user 
specified 
topic. 
• This 
paper 
proposes 
a 
technique 
for 
indexing 
the 
keyword 
extracted 
from 
the 
web 
documents 
along 
with 
their 
contexts 
wherein 
it 
uses 
a 
height 
balanced 
binary 
search 
(AVL) 
tree, 
for 
indexing 
purpose 
to 
enhance 
the 
performance 
of 
the 
retrieval 
system.
INTRODUCTION 
• The 
basic 
aim 
is 
to 
select 
the 
best 
collecNon 
of 
informaNon 
according 
to 
users 
need. 
• 
The 
exisNng 
focused 
crawlers 
do 
not 
analyze 
the 
context 
of 
the 
keyword 
in 
the 
web 
page 
before 
they 
download 
it. 
• The 
Use 
Of 
AVL 
Tree
AVL 
Tree 
• In 
AVL 
(i.e. 
height 
balanced 
binary 
tree) 
tree 
[3], 
the 
height 
of 
a 
tree 
is 
defined 
as 
the 
length 
of 
the 
longest 
path 
from 
the 
root 
node 
of 
the 
tree 
to 
one 
of 
its 
leaf 
node. 
• 
And 
the 
balance 
factor 
(BF) 
is: 
(height 
of 
leU 
subtree 
– 
height 
of 
right 
subtree). 
To 
call 
the 
AVL 
as 
balanced 
the 
value 
of 
BF 
should 
be 
-­‐1, 
0 
or 
1. 
• This 
strategy 
makes 
the 
searching 
task 
faster 
and 
opNmized.
RELATED 
WORK 
• F. 
Silvestri, 
R.Perego 
and 
Orlando. 
proposed 
the 
reordering 
algorithm. 
• Oren 
Zamir 
and 
Oren 
Etzioni. 
proposed 
threshold 
based 
clustering 
algorithm. 
• C. 
Zhou, 
W. 
Ding 
and 
Na 
Yang. 
The 
paper 
introduces 
a 
double 
indexing 
mechanism 
for 
search 
engines 
based 
on 
campus 
Net. 
• N. 
Chauhan 
and 
A. 
K. 
Sharma. 
Proposed, 
the 
context 
driven 
focused 
crawler 
(CDFC) 
• P. 
Gupta 
and 
A. 
K. 
Sharma. 
Worked 
on 
context 
based 
indexing 
in 
search 
engines 
using 
ontology.
PROPOSED 
WORK 
• This 
paper 
proposes 
an 
algorithm 
for 
indexing 
the 
keyword 
extracted 
from 
the 
web 
documents 
along 
with 
their 
context. 
• The 
indexing 
technique 
uses 
a 
height 
balanced 
binary 
search 
(AVL) 
tree, 
in 
addiNon 
to 
improved 
performance 
in 
the 
retrieval 
of 
informaNon, 
this 
data 
structure 
is 
able 
to 
support 
dynamic 
indexing, 
which 
is 
especially 
important 
for 
environments 
where 
documents 
are 
changed 
frequently.
Architecture 
of 
context 
based 
indexing.
Context 
Based 
Retrieval 
Interface
Steps 
involved 
in 
the 
construcNon 
of 
the 
context 
based 
index 
using 
AVL. 
• Step1: 
Preprocess 
the 
crawled 
web 
documents 
and 
extract 
the 
keyword 
along 
with 
their 
frequency 
of 
occurrence. 
• Step2: 
Input 
the 
keywords 
to 
the 
context 
generator 
which 
extracts 
the 
mulNple 
contextual 
sense 
of 
the 
word. 
Context 
is 
being 
searched 
in 
the 
thesaurus 
(a 
dicNonary 
of 
words 
available 
on 
WWW 
from 
thesaurus.com, 
which 
contains 
the 
words 
as 
well 
their 
mulNple 
meanings). 
Step3: 
The 
keywords 
along 
with 
the 
context 
are 
indexed 
using 
the 
AVL 
tree. 
• Step4: 
Compare 
the 
entered 
keyword 
with 
the 
node’s 
keyword 
field 
of 
the 
AVL 
tree, 
unNl 
a 
similar 
word 
is 
found. 
• Step5: 
If 
search 
is 
not 
a 
success, 
create 
a 
node 
containing 
the 
following 
fields 
(LeUchild, 
keyword, 
rightchild, 
link) 
as 
shown 
in 
figure4.The 
link 
is 
pointer 
variable 
which 
points 
to 
the 
database 
where 
the 
context 
of 
keyword 
and 
the 
corresponding 
document_id 
is 
stored. 
Context 
is 
being 
searched 
in 
the 
thesaurus 
(a 
dicNonary 
of 
words 
available 
on 
WWW 
from 
thesaurus.com, 
which 
contains 
the 
words 
as 
well 
their 
mulNple 
meanings). 
Step6: 
Arrange 
the 
node 
in 
the 
AVL 
tree, 
according 
to 
the 
height 
BF.
Steps 
involved 
in 
the 
construcNon 
of 
the 
context 
based 
index 
using 
AVL. 
• Step7: 
Repeat 
step 
4, 
5 
and 
6 
unNl 
all 
the 
extracted 
keywords 
are 
arranged. 
• Step8: 
Now 
when 
the 
user 
fires 
the 
query 
with 
context 
explicitly 
specified, 
then 
the 
index 
is 
being 
searched, 
reducing 
its 
search 
Nme 
to 
half 
of 
the 
linear 
search. 
• Step9. 
Thus, 
AVL 
indexing 
technique 
provides 
a 
fast 
access 
to 
document 
context 
and 
structure.
Node 
structure.
Node 
structure. 
Create_BST() 
//iniNally 
the 
tree 
is 
empty. 
{ 
create 
new 
node 
containing 
the 
fields 
( 
leU 
child, 
keyword, 
rightchild, 
link). 
LeUchild 
value 
= 
NULL 
Rightchild 
value 
= 
NULL 
Link 
= 
address 
of 
database 
where 
the 
context 
and 
the 
corresponding 
document_id 
is 
stored 
Insert_node(); 
} 
Insert_node() 
{ 
Check, 
whether 
value 
in 
current 
node 
and 
a 
new 
keyword 
value 
are 
equal. 
If 
so, 
duplicate 
is 
found. 
Otherwise, 
if 
a 
new 
keyword 
value 
is 
less, 
than 
the 
root 
node's 
value: 
If 
a 
current 
node 
has 
no 
leU 
child, 
place 
for 
inserNon 
has 
been 
found; 
Otherwise, 
handle 
the 
leU 
child 
with 
the 
same 
algorithm. 
Compute_height(); 
if 
a 
new 
value 
is 
greater, 
than 
the 
root 
node's 
value: 
if 
a 
current 
node 
has 
no 
right 
child, 
place 
for 
inserNon 
has 
been 
found; 
otherwise, 
handle 
the 
right 
child 
with 
the 
same 
algorithm. 
Compute_height();
Node 
structure. 
The rearrangement of the node can eliminate the imbalance. 
Representation of keywords using binary search tree
Example
Node 
structure.
CONCLUSION 
• This 
paper 
proposes 
a 
technique 
for 
indexing 
the 
keyword 
extracted 
from 
the 
web 
documents 
along 
with 
their 
context. 
• The 
AVL 
tree 
based 
indexing 
technique, 
is 
able 
to 
support 
dynamic 
indexing 
and 
improves 
the 
performance 
in 
terms 
of 
accuracy 
and 
efficiency 
for 
retrieving 
more, 
relevant 
documents 
as 
per 
the 
user’s 
requirements 
since 
the 
context 
of 
the 
various 
keywords 
is 
also 
stored 
along 
with 
them. 
• Thus, 
the 
indexing 
technique 
provides 
a 
fast 
access 
to 
document 
context 
and 
structure 
along 
with 
an 
opNmized 
searching.
Thank 
You

Más contenido relacionado

La actualidad más candente

Document Classification and Clustering
Document Classification and ClusteringDocument Classification and Clustering
Document Classification and ClusteringAnkur Shrivastava
 
A Research Literature Search Engine With Abbreviation Recognition
A Research Literature Search Engine With Abbreviation RecognitionA Research Literature Search Engine With Abbreviation Recognition
A Research Literature Search Engine With Abbreviation RecognitionHector Lin
 
Binary search in ds
Binary search in dsBinary search in ds
Binary search in dschauhankapil
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchSujit Pal
 
Data-Applied: Technology Insights
Data-Applied: Technology InsightsData-Applied: Technology Insights
Data-Applied: Technology InsightsDataminingTools Inc
 
Reviewing basic concepts of relational database
Reviewing basic concepts of relational databaseReviewing basic concepts of relational database
Reviewing basic concepts of relational databaseHitesh Mohapatra
 
Susie search using services and information extraction
Susie search using services and information extractionSusie search using services and information extraction
Susie search using services and information extractionIEEEFINALYEARPROJECTS
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural networkINFOGAIN PUBLICATION
 
Query-porcessing-& Query optimization
Query-porcessing-& Query optimizationQuery-porcessing-& Query optimization
Query-porcessing-& Query optimizationSaranya Natarajan
 
Topic sensitive page rank(review)
Topic sensitive page rank(review)Topic sensitive page rank(review)
Topic sensitive page rank(review)hongs
 
Query evaluation and optimization
Query evaluation and optimizationQuery evaluation and optimization
Query evaluation and optimizationlavanya marichamy
 
Combination of Levenshtein Distance and Rabin-Karp to Improve the Accuracy of...
Combination of Levenshtein Distance and Rabin-Karp to Improve the Accuracy of...Combination of Levenshtein Distance and Rabin-Karp to Improve the Accuracy of...
Combination of Levenshtein Distance and Rabin-Karp to Improve the Accuracy of...Universitas Pembangunan Panca Budi
 
Vchunk join an efficient algorithm for edit similarity joins
Vchunk join an efficient algorithm for edit similarity joinsVchunk join an efficient algorithm for edit similarity joins
Vchunk join an efficient algorithm for edit similarity joinsVijay Koushik
 
Real Time Competitive Marketing Intelligence
Real Time Competitive Marketing IntelligenceReal Time Competitive Marketing Intelligence
Real Time Competitive Marketing Intelligencefeiwin
 
Classification of URLs
Classification of URLsClassification of URLs
Classification of URLsFANCY ARORA
 
Indexing in eXist database
Indexing in eXist database Indexing in eXist database
Indexing in eXist database redchilly
 
Supporting search as-you-type using sql in databases
Supporting search as-you-type using sql in databasesSupporting search as-you-type using sql in databases
Supporting search as-you-type using sql in databasesEcway Technologies
 
Author paper identification problem final presentation
Author  paper identification problem final presentationAuthor  paper identification problem final presentation
Author paper identification problem final presentationPooja Mishra
 
Clustering sentence level text using a novel fuzzy relational clustering algo...
Clustering sentence level text using a novel fuzzy relational clustering algo...Clustering sentence level text using a novel fuzzy relational clustering algo...
Clustering sentence level text using a novel fuzzy relational clustering algo...JPINFOTECH JAYAPRAKASH
 

La actualidad más candente (20)

Document Classification and Clustering
Document Classification and ClusteringDocument Classification and Clustering
Document Classification and Clustering
 
A Research Literature Search Engine With Abbreviation Recognition
A Research Literature Search Engine With Abbreviation RecognitionA Research Literature Search Engine With Abbreviation Recognition
A Research Literature Search Engine With Abbreviation Recognition
 
Binary search in ds
Binary search in dsBinary search in ds
Binary search in ds
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity Search
 
Data-Applied: Technology Insights
Data-Applied: Technology InsightsData-Applied: Technology Insights
Data-Applied: Technology Insights
 
Reviewing basic concepts of relational database
Reviewing basic concepts of relational databaseReviewing basic concepts of relational database
Reviewing basic concepts of relational database
 
Susie search using services and information extraction
Susie search using services and information extractionSusie search using services and information extraction
Susie search using services and information extraction
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network
 
Query-porcessing-& Query optimization
Query-porcessing-& Query optimizationQuery-porcessing-& Query optimization
Query-porcessing-& Query optimization
 
Topic sensitive page rank(review)
Topic sensitive page rank(review)Topic sensitive page rank(review)
Topic sensitive page rank(review)
 
Query evaluation and optimization
Query evaluation and optimizationQuery evaluation and optimization
Query evaluation and optimization
 
Combination of Levenshtein Distance and Rabin-Karp to Improve the Accuracy of...
Combination of Levenshtein Distance and Rabin-Karp to Improve the Accuracy of...Combination of Levenshtein Distance and Rabin-Karp to Improve the Accuracy of...
Combination of Levenshtein Distance and Rabin-Karp to Improve the Accuracy of...
 
Vchunk join an efficient algorithm for edit similarity joins
Vchunk join an efficient algorithm for edit similarity joinsVchunk join an efficient algorithm for edit similarity joins
Vchunk join an efficient algorithm for edit similarity joins
 
Real Time Competitive Marketing Intelligence
Real Time Competitive Marketing IntelligenceReal Time Competitive Marketing Intelligence
Real Time Competitive Marketing Intelligence
 
Classification of URLs
Classification of URLsClassification of URLs
Classification of URLs
 
Indexing in eXist database
Indexing in eXist database Indexing in eXist database
Indexing in eXist database
 
Supporting search as-you-type using sql in databases
Supporting search as-you-type using sql in databasesSupporting search as-you-type using sql in databases
Supporting search as-you-type using sql in databases
 
Author paper identification problem final presentation
Author  paper identification problem final presentationAuthor  paper identification problem final presentation
Author paper identification problem final presentation
 
Clustering sentence level text using a novel fuzzy relational clustering algo...
Clustering sentence level text using a novel fuzzy relational clustering algo...Clustering sentence level text using a novel fuzzy relational clustering algo...
Clustering sentence level text using a novel fuzzy relational clustering algo...
 

Destacado

Book Citation Index and NOVA's Ranking
Book Citation Index and NOVA's RankingBook Citation Index and NOVA's Ranking
Book Citation Index and NOVA's RankingNOVA
 
Best Practices For Delivering Virtual Classroom Training
Best Practices For Delivering Virtual Classroom TrainingBest Practices For Delivering Virtual Classroom Training
Best Practices For Delivering Virtual Classroom TrainingFareeza Marican
 
Assignment 1 determining employee motivation
Assignment 1 determining employee motivationAssignment 1 determining employee motivation
Assignment 1 determining employee motivationMoses Mbanje
 
Ratings made simple edited
Ratings made simple editedRatings made simple edited
Ratings made simple editedmediaplaylab
 
Video Paper Builder
Video Paper BuilderVideo Paper Builder
Video Paper Buildermediaplaylab
 
Desitara-optimized-seo-plan-for-desitara-by-Shyam-Swaraj
Desitara-optimized-seo-plan-for-desitara-by-Shyam-SwarajDesitara-optimized-seo-plan-for-desitara-by-Shyam-Swaraj
Desitara-optimized-seo-plan-for-desitara-by-Shyam-SwarajEventXP
 
Diacritice romanesti in ym 2
Diacritice romanesti in ym 2Diacritice romanesti in ym 2
Diacritice romanesti in ym 2dianaifrim
 
áLbum De FotografíAs
áLbum De FotografíAsáLbum De FotografíAs
áLbum De FotografíAsBAT007
 
Amitabh Leveraging Cable Networks In India
Amitabh Leveraging Cable Networks In IndiaAmitabh Leveraging Cable Networks In India
Amitabh Leveraging Cable Networks In Indiagunjan999906
 
Exposure Lecture 2014 - Tamil Language
Exposure Lecture 2014 - Tamil LanguageExposure Lecture 2014 - Tamil Language
Exposure Lecture 2014 - Tamil Languagemediaplaylab
 
Introduction to Service Design for Translink
Introduction to Service Design for TranslinkIntroduction to Service Design for Translink
Introduction to Service Design for TranslinkCathy Wang
 
A L BÚ M D E L P A R Q U E E C O LÓ G I C O
A L BÚ M  D E L  P A R Q U E  E C O LÓ G I C OA L BÚ M  D E L  P A R Q U E  E C O LÓ G I C O
A L BÚ M D E L P A R Q U E E C O LÓ G I C OBAT007
 
C:\Fakepath\Information Literacy Hw5
C:\Fakepath\Information Literacy  Hw5C:\Fakepath\Information Literacy  Hw5
C:\Fakepath\Information Literacy Hw5王耀慶
 
Allyssen
AllyssenAllyssen
AllyssenLooppa
 
Bluesky Concept Presentation
Bluesky Concept PresentationBluesky Concept Presentation
Bluesky Concept PresentationHuanYang
 

Destacado (20)

Book Citation Index and NOVA's Ranking
Book Citation Index and NOVA's RankingBook Citation Index and NOVA's Ranking
Book Citation Index and NOVA's Ranking
 
Google App indexing
Google App indexingGoogle App indexing
Google App indexing
 
Chunking
ChunkingChunking
Chunking
 
Best Practices For Delivering Virtual Classroom Training
Best Practices For Delivering Virtual Classroom TrainingBest Practices For Delivering Virtual Classroom Training
Best Practices For Delivering Virtual Classroom Training
 
Chunking Text
Chunking TextChunking Text
Chunking Text
 
Assignment 1 determining employee motivation
Assignment 1 determining employee motivationAssignment 1 determining employee motivation
Assignment 1 determining employee motivation
 
Ratings made simple edited
Ratings made simple editedRatings made simple edited
Ratings made simple edited
 
Video Paper Builder
Video Paper BuilderVideo Paper Builder
Video Paper Builder
 
Desitara-optimized-seo-plan-for-desitara-by-Shyam-Swaraj
Desitara-optimized-seo-plan-for-desitara-by-Shyam-SwarajDesitara-optimized-seo-plan-for-desitara-by-Shyam-Swaraj
Desitara-optimized-seo-plan-for-desitara-by-Shyam-Swaraj
 
Diacritice romanesti in ym 2
Diacritice romanesti in ym 2Diacritice romanesti in ym 2
Diacritice romanesti in ym 2
 
áLbum De FotografíAs
áLbum De FotografíAsáLbum De FotografíAs
áLbum De FotografíAs
 
Property Centric Overview
Property Centric OverviewProperty Centric Overview
Property Centric Overview
 
Amitabh Leveraging Cable Networks In India
Amitabh Leveraging Cable Networks In IndiaAmitabh Leveraging Cable Networks In India
Amitabh Leveraging Cable Networks In India
 
Exposure Lecture 2014 - Tamil Language
Exposure Lecture 2014 - Tamil LanguageExposure Lecture 2014 - Tamil Language
Exposure Lecture 2014 - Tamil Language
 
เสนอค่าย
เสนอค่ายเสนอค่าย
เสนอค่าย
 
Introduction to Service Design for Translink
Introduction to Service Design for TranslinkIntroduction to Service Design for Translink
Introduction to Service Design for Translink
 
A L BÚ M D E L P A R Q U E E C O LÓ G I C O
A L BÚ M  D E L  P A R Q U E  E C O LÓ G I C OA L BÚ M  D E L  P A R Q U E  E C O LÓ G I C O
A L BÚ M D E L P A R Q U E E C O LÓ G I C O
 
C:\Fakepath\Information Literacy Hw5
C:\Fakepath\Information Literacy  Hw5C:\Fakepath\Information Literacy  Hw5
C:\Fakepath\Information Literacy Hw5
 
Allyssen
AllyssenAllyssen
Allyssen
 
Bluesky Concept Presentation
Bluesky Concept PresentationBluesky Concept Presentation
Bluesky Concept Presentation
 

Similar a Context based Web Indexing for Storage of Relevant Web Pages

Context Based Web Indexing For Semantic Web
Context Based Web Indexing For Semantic WebContext Based Web Indexing For Semantic Web
Context Based Web Indexing For Semantic WebIOSR Journals
 
Comparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrievalComparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrievaleSAT Journals
 
Annotating search results from web databases-IEEE Transaction Paper 2013
Annotating search results from web databases-IEEE Transaction Paper 2013Annotating search results from web databases-IEEE Transaction Paper 2013
Annotating search results from web databases-IEEE Transaction Paper 2013Yadhu Kiran
 
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AIRed Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AILuigi Fugaro
 
Keyword Query Routing
Keyword Query RoutingKeyword Query Routing
Keyword Query RoutingSWAMI06
 
DEByE─Data Extraction By Example
DEByE─Data Extraction By ExampleDEByE─Data Extraction By Example
DEByE─Data Extraction By Examplelswing
 
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routingIEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routingIEEEFINALYEARSTUDENTPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
2014 IEEE JAVA DATA MINING PROJECT Keyword query routingIEEEMEMTECHSTUDENTSPROJECTS
 
score based ranking of documents
score based ranking of documentsscore based ranking of documents
score based ranking of documentsKriti Khanna
 
Getting started with Elasticsearch in .net
Getting started with Elasticsearch in .netGetting started with Elasticsearch in .net
Getting started with Elasticsearch in .netIsmaeel Enjreny
 
Getting Started With Elasticsearch In .NET
Getting Started With Elasticsearch In .NETGetting Started With Elasticsearch In .NET
Getting Started With Elasticsearch In .NETAhmed Abd Ellatif
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.docbutest
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.docbutest
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.docbutest
 
International Journal of Computational Engineering Research(IJCER)
 International Journal of Computational Engineering Research(IJCER)  International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) ijceronline
 

Similar a Context based Web Indexing for Storage of Relevant Web Pages (20)

Context Based Web Indexing For Semantic Web
Context Based Web Indexing For Semantic WebContext Based Web Indexing For Semantic Web
Context Based Web Indexing For Semantic Web
 
Comparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrievalComparative analysis of relative and exact search for web information retrieval
Comparative analysis of relative and exact search for web information retrieval
 
Elastic search
Elastic searchElastic search
Elastic search
 
Annotating search results from web databases-IEEE Transaction Paper 2013
Annotating search results from web databases-IEEE Transaction Paper 2013Annotating search results from web databases-IEEE Transaction Paper 2013
Annotating search results from web databases-IEEE Transaction Paper 2013
 
Keyword query routing
Keyword query routingKeyword query routing
Keyword query routing
 
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AIRed Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
Red Hat Summit Connect 2023 - Redis Enterprise, the engine of Generative AI
 
Keyword Query Routing
Keyword Query RoutingKeyword Query Routing
Keyword Query Routing
 
DEByE─Data Extraction By Example
DEByE─Data Extraction By ExampleDEByE─Data Extraction By Example
DEByE─Data Extraction By Example
 
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routingIEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
 
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
 
score based ranking of documents
score based ranking of documentsscore based ranking of documents
score based ranking of documents
 
Lucene basics
Lucene basicsLucene basics
Lucene basics
 
Getting started with Elasticsearch in .net
Getting started with Elasticsearch in .netGetting started with Elasticsearch in .net
Getting started with Elasticsearch in .net
 
Getting Started With Elasticsearch In .NET
Getting Started With Elasticsearch In .NETGetting Started With Elasticsearch In .NET
Getting Started With Elasticsearch In .NET
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.doc
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.doc
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.doc
 
Sub1522
Sub1522Sub1522
Sub1522
 
International Journal of Computational Engineering Research(IJCER)
 International Journal of Computational Engineering Research(IJCER)  International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 

Último

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxPooja Bhuva
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 

Último (20)

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 

Context based Web Indexing for Storage of Relevant Web Pages

  • 1. Context based Web Indexing for Storage of Relevant Web Pages
  • 2. ABSTRACT • A focused crawler downloads web pages that are relevant to a user specified topic. • This paper proposes a technique for indexing the keyword extracted from the web documents along with their contexts wherein it uses a height balanced binary search (AVL) tree, for indexing purpose to enhance the performance of the retrieval system.
  • 3. INTRODUCTION • The basic aim is to select the best collecNon of informaNon according to users need. • The exisNng focused crawlers do not analyze the context of the keyword in the web page before they download it. • The Use Of AVL Tree
  • 4. AVL Tree • In AVL (i.e. height balanced binary tree) tree [3], the height of a tree is defined as the length of the longest path from the root node of the tree to one of its leaf node. • And the balance factor (BF) is: (height of leU subtree – height of right subtree). To call the AVL as balanced the value of BF should be -­‐1, 0 or 1. • This strategy makes the searching task faster and opNmized.
  • 5. RELATED WORK • F. Silvestri, R.Perego and Orlando. proposed the reordering algorithm. • Oren Zamir and Oren Etzioni. proposed threshold based clustering algorithm. • C. Zhou, W. Ding and Na Yang. The paper introduces a double indexing mechanism for search engines based on campus Net. • N. Chauhan and A. K. Sharma. Proposed, the context driven focused crawler (CDFC) • P. Gupta and A. K. Sharma. Worked on context based indexing in search engines using ontology.
  • 6. PROPOSED WORK • This paper proposes an algorithm for indexing the keyword extracted from the web documents along with their context. • The indexing technique uses a height balanced binary search (AVL) tree, in addiNon to improved performance in the retrieval of informaNon, this data structure is able to support dynamic indexing, which is especially important for environments where documents are changed frequently.
  • 7. Architecture of context based indexing.
  • 9. Steps involved in the construcNon of the context based index using AVL. • Step1: Preprocess the crawled web documents and extract the keyword along with their frequency of occurrence. • Step2: Input the keywords to the context generator which extracts the mulNple contextual sense of the word. Context is being searched in the thesaurus (a dicNonary of words available on WWW from thesaurus.com, which contains the words as well their mulNple meanings). Step3: The keywords along with the context are indexed using the AVL tree. • Step4: Compare the entered keyword with the node’s keyword field of the AVL tree, unNl a similar word is found. • Step5: If search is not a success, create a node containing the following fields (LeUchild, keyword, rightchild, link) as shown in figure4.The link is pointer variable which points to the database where the context of keyword and the corresponding document_id is stored. Context is being searched in the thesaurus (a dicNonary of words available on WWW from thesaurus.com, which contains the words as well their mulNple meanings). Step6: Arrange the node in the AVL tree, according to the height BF.
  • 10. Steps involved in the construcNon of the context based index using AVL. • Step7: Repeat step 4, 5 and 6 unNl all the extracted keywords are arranged. • Step8: Now when the user fires the query with context explicitly specified, then the index is being searched, reducing its search Nme to half of the linear search. • Step9. Thus, AVL indexing technique provides a fast access to document context and structure.
  • 12. Node structure. Create_BST() //iniNally the tree is empty. { create new node containing the fields ( leU child, keyword, rightchild, link). LeUchild value = NULL Rightchild value = NULL Link = address of database where the context and the corresponding document_id is stored Insert_node(); } Insert_node() { Check, whether value in current node and a new keyword value are equal. If so, duplicate is found. Otherwise, if a new keyword value is less, than the root node's value: If a current node has no leU child, place for inserNon has been found; Otherwise, handle the leU child with the same algorithm. Compute_height(); if a new value is greater, than the root node's value: if a current node has no right child, place for inserNon has been found; otherwise, handle the right child with the same algorithm. Compute_height();
  • 13. Node structure. The rearrangement of the node can eliminate the imbalance. Representation of keywords using binary search tree
  • 16. CONCLUSION • This paper proposes a technique for indexing the keyword extracted from the web documents along with their context. • The AVL tree based indexing technique, is able to support dynamic indexing and improves the performance in terms of accuracy and efficiency for retrieving more, relevant documents as per the user’s requirements since the context of the various keywords is also stored along with them. • Thus, the indexing technique provides a fast access to document context and structure along with an opNmized searching.