In Search of a Semantic Book Search Engine on the Web: Are We There Yet?
By
Irfan Ullah and Shah Khusro
University of Peshawar, Pakistan
5th Computer Science On-line Conference 2016
ComputerScienceOnline
Conference2016
1
In this Presentation
• Abstract
• Introduction
• Survey of the Literature
• Extracting Structure & Indexing Books
• Searching and Ranking Books
• Book Recommendations
• Fine-grained Access to Information in Books
• Discussion and Analysis
• Conclusions
• References
Abstract
• Books – Valuable source of knowledge and learning
• Position
• Web Information Retrieval (IR) techniques for book retrieval
• Existing searching solutions treat books as plaintext collections
• Inaccurate and imprecise book search results
• Solution
• Books are different from web pages
• Structural semantics and logical connections in their content for
searching, ranking and recommendations
• Fine-grained access to information in books e.g. tables, figures
Introduction
• Web Information Retrieval
• Rich text collections with explicit hypertextual structure
• Used in searching and ranking web pages
• Books lack this explicit graph-like structure – Problem
• Books are well-organized and logically connected
• This presents an implicit graph-like structure that can be used in searching, ranking, and recommending books
• But it is visible to human readers only
• Problem – It needs to be machine-understandable and processable
Image source: http://talk.payloadz.com/wp-content/uploads/2013/10/Selling-Books-Online-660x320.jpg
Introduction
• Solution – Semantic Book Search Engine
• What is Required?
• A more in-depth and comprehensive book structure ontology
• Domain level ontologies to understand book contents in different
domains
• Connecting books in graph-like manner
• Why?
• Better searching, ranking, and recommendations
• Increase user satisfaction
• Promoting objectives of other stakeholders
Survey of the Literature
• Extracting Structure & Indexing Books
• Many Research Initiatives and Conferences
• INEX, ICDAR, and BooksOnline
• Indexing books’ valuable parts [2].
• Book layout analysis for extracting TOC [3] and other parts [8]
• Resurgence software for detecting different parts [4-6]
• Rule-based and SVM-based methods for extracting TOC [7]
• Detecting and parsing TOC pages and index pages [9] through classical methods [10, 11] and trailing-page-whitespace methods [9]
• Required
• Connecting book title with other parts
• Better book indexing, ranking and recommendations
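As a rough illustration of what a rule-based TOC detector might look like, the sketch below flags a page as a TOC page when most of its lines end in a page number preceded by dot leaders or wide spacing. This is not the method of [7] or [9]; the rule, the 0.5 threshold, and the names `parse_toc_line` and `looks_like_toc_page` are assumptions made for illustration only.

```python
import re

# Hypothetical rule (not taken from the cited works): a TOC entry is a title,
# then at least two dots/spaces acting as a leader, then a page number.
TOC_LINE = re.compile(r"^(?P<title>.+?)[\s.]{2,}(?P<page>\d+)\s*$")


def parse_toc_line(line):
    """Return (title, page) if the line looks like a TOC entry, else None."""
    match = TOC_LINE.match(line.strip())
    if not match:
        return None
    return match.group("title").strip(), int(match.group("page"))


def looks_like_toc_page(lines, threshold=0.5):
    """Guess whether a page is a TOC page from the share of TOC-like lines."""
    non_empty = [ln for ln in lines if ln.strip()]
    if not non_empty:
        return False
    hits = sum(1 for ln in non_empty if parse_toc_line(ln))
    return hits / len(non_empty) >= threshold
```

A real system would likely combine several such cues (layout, fonts, whitespace) or feed them to a classifier such as an SVM, as the surveyed work suggests.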
Survey of the Literature
• Searching and Ranking Books
• Ranking authors by expert finding to rank books [12]
• “Authors capture an important aspect of relevance” [12]
• Read books written by popular experts in the field
• No bag-of-words models
• Ranking by what is actually inside books [13]
• Thesaurus, reference works and ontologies
• Helping readers gain useful insights into the text and decide on the relevance of the book
Image source: www.vectastock.com
Survey of the Literature
• Searching and Ranking Books
• Digitized Books
• By combining and comparing scores for book headings, TOC and
book titles [2].
• Digitization Projects – Limited/No Ranking
• Project Gutenberg – sorting results
• Google Books – 100 (unknown) ranking signals [1]
• Google Patents [15, 16] – Not implemented yet
• Books could be connected through references [14] – Limited
• Need
• Using Semantic Web and Ontologies
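The idea of combining scores for book headings, TOC, and book titles [2] could be sketched as a weighted sum of per-field match scores. The weights, the simple term-overlap score, and the function names below are assumptions for illustration; the slides do not specify how [2] computes or combines the scores.

```python
import re

# Hypothetical field weights; the slides give no values.
FIELD_WEIGHTS = {"title": 3.0, "toc": 2.0, "headings": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_score(query_terms, text):
    """Fraction of query terms that occur in the field text."""
    if not query_terms:
        return 0.0
    tokens = set(tokenize(text))
    return sum(1 for t in query_terms if t in tokens) / len(query_terms)


def rank_books(query, books, weights=FIELD_WEIGHTS):
    """Rank books by the weighted sum of their per-field scores.

    books maps a book id to a dict of field name -> field text.
    """
    terms = tokenize(query)
    scored = []
    for book_id, fields in books.items():
        score = sum(
            weight * field_score(terms, fields.get(field, ""))
            for field, weight in weights.items()
        )
        scored.append((book_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With these assumed weights, a match in the title counts more than the same match in the TOC or in a section heading.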
Image source: prepa3.sems.udg.mx
Survey of the Literature
• Book Recommendations
• Available Recommenders
• BReK12 – readability levels of K-12 readers + book contents [21]
• BReT – helps K-12 teachers find relevant books for K-12 students [22]
• K3Rec – K-3 readers, their parents, and teachers [23]
• Using near and partial duplicates, citation analysis, and metadata
similarities [24].
• User modeling – information from Social Web [17].
• Book reviews [18, 19].
• Semantic Web and ontologies [25-27]
• Limited – Use only book descriptions, not the actual content
• Required
• True content-based semantic book recommender
Image source: bookshelvesofdoom.blogs.com
Survey of the Literature
• Fine-grained access to information in books
• Retrieving similar and related tables, figures, images, algorithms,
equations, quotations, and passages
• Augmenting tables with different data sources to restore the lost semantics [28].
• The same applies to figures and images
• CiteSeer – document, author, and table search
• Need
• Exploitation of book structural semantics and logical connections
Image source: 2.bp.blogspot.com
Discussion & Analysis
• Indexing books
• Multi-field inverted index should be used [29].
• Book search engine should be able to understand
• The nature of books, their contents, and user intentions
• E.g., for fiction and novels, readers may be interested in different strata, including the plot, the idea, and the composition of the work [30].
• Required
• Semantic indexing by exploiting book structural semantics
• Indexing fiction/novels, and
• Indexing books using metadata
• Book reviews
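To make the multi-field inverted index [29] more concrete, here is a minimal sketch that keeps a separate postings map per field, so that a query can be restricted to one structural part of a book. The field names, the data layout, and the functions `build_index` and `search` are assumptions for illustration, not the design described in [29].

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(books):
    """Build {field: {term: {book_id: term_count}}}.

    books maps a book id to a dict of field name -> field text.
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for book_id, fields in books.items():
        for field, text in fields.items():
            for term in tokenize(text):
                index[field][term][book_id] += 1
    return index


def search(index, query, field):
    """Return ids of books whose given field contains every query term."""
    field_postings = index.get(field, {})
    postings = [set(field_postings.get(term, {})) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()
```

Keeping one postings map per field is what lets a search engine treat a hit in a title, a TOC entry, or a review differently at ranking time.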
Discussion & Analysis
• Searching books
• Search Engine Results Page (SERP)
• Too many relevant and irrelevant results – Information Overload [31]
• Required – User Interface
• Provide more relevant results
• Robust, unambiguous, understandable, and relevant to the information need
• Present results in a manner that augments user understanding
Image source: davidpoulos.com
Discussion & Analysis
• Ranking and recommending books
• Using ontologies and the actual book contents
• Exploiting structural semantics and logical connections in book contents
• Problem
• Existing ontologies (JeromeDL and DocBook) cannot fully describe books
• Required
• Comprehensive book structure and several domain-level
ontologies
• Ontology Engineering and Ontology Learning [32], along with the involvement of domain experts
Discussion & Analysis
• Finding Related tables and figures
• Table extraction and searching
• Summarize, elaborate and compare tables
• Interpret tables accurately
• Structural and semantic characteristics of book tables in all possible layout variations
• Using online knowledge sources in annotating tables [28]
• Using ontologies in indexing, searching, and ranking tables
• Figure extraction and searching
• Relating figures using visual similarities and contextual clues
• To retrieve books that present images and figures on a certain
concept or topic
Conclusions
• Book Search and Retrieval
• Has been the focus of research initiatives and academic research
• Several retrieval methods have been proposed
• Several book ontologies have been developed for indexing,
ranking, and recommending books
• We are still miles away from the ideal system
• Need
• Further research initiatives for discovering book structural semantics and their use in searching, ranking, and recommending books
Conclusions
• Need – Semantic book search engine
• Treat books differently from other web documents
• Use their structural semantics and logical connections in
searching, ranking, and recommendations
• Comprehensive book structure ontology
• Domain-level ontologies
• To process book contents in different domains
• To create a graph-like structure of books to be used by PageRank
type algorithms
• To allow fine-grained access to information in books like tables,
figures, algorithms, equations, similar passages etc.
• To fulfill the information needs of readers and other stakeholders
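As a small illustration of the "PageRank type algorithms" mentioned above, the sketch below runs a basic power iteration over a directed graph of books. What an edge means (for example, one book referencing another) is an assumption here, as are the damping factor, the iteration count, and the function name `pagerank`; the slides do not prescribe a particular algorithm or graph construction.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    links maps a book to the list of books it points to (hypothetical
    edges, e.g. references between books). Returns {book: score}.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by books without outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

For example, with `links = {"A": ["C"], "B": ["C"], "C": []}`, book C ends up with the highest score because both other books point to it.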

In Search of a Semantic Book Search Engine: Are We There Yet?

  • 1. In Search of a Semantic Book Search Engine on the Web: Are We There Yet? By Irfan Ullah and Shah Khusro University of Peshawar, Pakistan 5th Computer Science On-line Conference 2016 ComputerScienceOnline Conference2016 1
  • 2. In this Presentation • Abstract • Introduction • Survey of the Literature • Extracting Structure & Indexing Books • Searching and Ranking Books • Book Recommendations • Fine-grained Access to Information in Books • Discussion and Analysis • Conclusions • References
  • 3. Abstract • Books – Valuable source of knowledge and learning • Position • Web Information Retrieval (IR) techniques for book retrieval • Existing searching solutions treat books as plaintext collections • Inaccurate and imprecise book search results • Solution • Books are different from web pages • Structural semantics and logical connections in their content for searching, ranking and recommendations • Fine-grained access to information in books e.g. tables, figures
  • 4. Introduction • Web Information Retrieval • Rich text collections with explicit hypertextual structure • Used in searching and ranking web pages • Books lack this graph-like structure – Problem • Books are well-organized and logically connected • Presenting a graph-like structure – can be used in searching, ranking, and recommending books • But visible to human readers only • Problem – Need to be machine understandable and processable • Image source: http://talk.payloadz.com/wp-content/uploads/2013/10/Selling-Books-Online-660x320.jpg
  • 5. Introduction • Solution – Semantic Book Search Engine • What is Required? • A more in-depth and comprehensive book structure ontology • Domain level ontologies to understand book contents in different domains • Connecting books in graph-like manner • Why? • Better searching, ranking, and recommendations • Increase user satisfaction • Promoting objectives of other stakeholders
  • 6. Survey of the Literature • Extracting Structure & Indexing Books • Many Research Initiatives and Conferences • INEX, ICDAR, and BooksOnline • Indexing books’ valuable parts [2]. • Book layout analysis for extracting TOC [3] and other parts [8] • Resurgence software for detecting different parts [4-6] • Rule-based and SVM-based methods extracting TOC [7] • Detecting and parsing TOC pages [9], index pages [9] through classical methods [10, 11] and using trailing page whitespace methods [9] • Required • Connecting book title with other parts • Better book indexing, ranking and recommendations
  • 7. Survey of the Literature • Searching and Ranking Books • Ranking authors by expert finding to rank books [12] • “Authors capture an important aspect of relevance [12]” • Read books written by popular experts in the field • No bags-of-words models • Ranking by what is actually inside books [13] • Thesaurus, reference works and ontologies • Helping readers gain useful insights into the text and decide on the relevance of the book • Image source: www.vectastock.com
  • 8. Survey of the Literature • Searching and Ranking Books • Digitized Books • By combining and comparing scores for book headings, TOC and book titles [2]. • Digitization Projects – Limited/No Ranking • Project Gutenberg – sorting results • Google Books – 100 (unknown) ranking signals [1] • Google Patents [15,16] – Not yet implemented • Books could be connected through references [14] – Limited • Need • Using Semantic Web and Ontologies • Image source: prepa3.sems.udg.mx
  • 9. Survey of the Literature • Book Recommendations • Available Recommenders • BReK12 – readability levels of K-12 readers + book contents [21] • BReT – K-12 teachers in finding relevant books for K-12 students [22] • K3Rec – K-3 readers, their parents, and teachers [23] • Using near and partial duplicates, citation analysis, and metadata similarities [24]. • User modeling – information from Social Web [17]. • Book reviews [18, 19]. • Semantic Web and ontologies [25-27] • Limited – Use only book descriptions, not the actual content • Required • True content-based semantic book recommender • Image source: bookshelvesofdoom.blogs.com
  • 10. Survey of the Literature • Fine-grained access to information in books • Retrieving similar and related tables, figures, images, algorithms, equations, quotations, and passages • Augmenting tables with different data sources to restore the lost semantics [28]. • The same is the case with figures and images • CiteSeer – document, author, and table search • Need • Exploitation of book structural semantics and logical connections • Image source: 2.bp.blogspot.com
  • 11. Discussion & Analysis • Indexing books • Multi-field inverted index should be used [29]. • Book search engine should be able to understand • The nature of books, their contents, and user intentions • E.g., for fiction and novels, readers may be interested in different strata including the plot, the idea, and the composition of the work [30]. • Required • Semantic indexing by exploiting book structural semantics • Indexing fiction/novels, and • Indexing books using metadata • Book reviews
  • 12. Discussion & Analysis • Searching books • Search Engine Results Page (SERP) • Too many relevant and irrelevant results – Information Overload [31] • Required – User Interface • Provide more relevant results • Robust, non-ambiguous, understandable and relevant to information need • Present results in a manner that augments user understanding • Image source: davidpoulos.com
  • 13. Discussion & Analysis • Ranking and recommending books • Using ontologies and the actual book contents • Exploiting structural semantics and logical connections in book contents • Problem • Existing ontologies (JeromeDL and DocBook) are limited in fully describing books • Required • Comprehensive book structure and several domain-level ontologies • Ontology Engineering and Ontology Learning [32] along with involving domain experts
  • 14. Discussion & Analysis • Finding Related tables and figures • Table extraction and searching • Summarize, elaborate and compare tables • Interpret tables accurately • Structure and semantic characteristics of book tables of all possible layout variations • Using online knowledge sources in annotating tables [28] • Using ontologies in indexing, searching, and ranking tables • Figure extraction and searching • Relating figures using visual similarities and contextual clues • To retrieve books that present images and figures on a certain concept or topic
  • 15. Conclusions • Book Search and Retrieval • Has been the focus of research initiatives and academic research • Several retrieval methods have been proposed • Several book ontologies have been developed for indexing, ranking, and recommending books • We are still miles away from the ideal system • Need • Further research initiatives for discovering book structural semantics and its use in searching, ranking, and recommending books
  • 16. Conclusions • Need – Semantic book search engine • Treat books differently from other web documents • Use their structural semantics and logical connections in searching, ranking, and recommendations • Comprehensive book structure ontology • Domain-level ontologies • To process book contents in different domains • To create a graph-like structure of books to be used by PageRank type algorithms • To allow fine-grained access to information in books like tables, figures, algorithms, equations, similar passages etc. • To fulfill the information needs of readers and other stakeholders
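
The multi-field inverted index recommended on slide 11, together with the idea on slide 8 of combining scores for headings, TOC and titles, can be illustrated with a minimal Python sketch. It is not the method of [2] or [29]: the field names, the weights in FIELD_WEIGHTS, the whitespace tokenizer, and the two sample books are all assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical field weights; the slides do not specify any values.
FIELD_WEIGHTS = {"title": 3.0, "toc": 2.0, "headings": 1.5, "body": 1.0}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,;:!?()[]\"'").lower() for word in text.split())
    return [t for t in tokens if t]


def build_index(books):
    """Build a mapping term -> field -> book_id -> term frequency."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for book_id, fields in books.items():
        for field, text in fields.items():
            for term in tokenize(text):
                index[term][field][book_id] += 1
    return index


def search(index, query, weights=FIELD_WEIGHTS):
    """Rank books by the weighted sum of per-field term frequencies."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for field, postings in index.get(term, {}).items():
            for book_id, tf in postings.items():
                scores[book_id] += weights.get(field, 1.0) * tf
    # Highest score first; ties broken by book id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data, not taken from the presentation.
books = {
    "b1": {
        "title": "Semantic Web Primer",
        "toc": "Ontologies RDF OWL",
        "body": "ontologies describe domains",
    },
    "b2": {
        "title": "Cooking Basics",
        "toc": "Soups Salads",
        "body": "semantic web is mentioned once",
    },
}
index = build_index(books)
results = search(index, "semantic ontologies")
```

Keeping a separate postings list per field is what lets a match in a title or TOC entry count for more than the same term buried in body text; a real system would likely replace raw term frequency with a stronger per-field scoring function.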

Editor's Notes

  1. Indexing books’ valuable parts, e.g., chapter, section, and subsection headings, table of contents (TOC), index pages, and book titles obtained from book metadata [2]. Title: the first line in the document, excluding the page number. TOC and index pages: found by looking for key terms, e.g., “table of contents”, “contents”, “page”, “index”, and for a large number of lines ending with digits. Failure: first 3000 characters and last 10 pages of the book [2]. What is required: further research for greater precision and accuracy in book structure detection and extraction. The book title can be connected with the TOC, chapters, sections, subsections, tables, images, figures, algorithms, procedures, mathematical equations, and different related concepts, resulting in a connected graph. This enables better search, ranking, and recommendations using contextual clues rather than simple bags-of-words models and ordinary ranking methods.
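
The title and TOC/index heuristics in this note can be sketched roughly as follows. This is an illustrative reading of the note, not the implementation from [2]: the function names, the regular expressions, the 0.5 ratio threshold, and the choice to require both a key term and many digit-terminated lines are assumptions.

```python
import re

# Key terms named in the note.
KEY_TERMS = ("table of contents", "contents", "page", "index")
# A line "ends with digits" if only whitespace follows the last digit run.
ENDS_WITH_DIGITS = re.compile(r"\d+\s*$")
# A line holding only a page number, optionally prefixed by "page" (assumed form).
PAGE_NUMBER_ONLY = re.compile(r"^\s*(page\s*)?\d+\s*$", re.IGNORECASE)


def guess_title(lines):
    """Return the first non-empty line that is not just a page number."""
    for line in lines:
        if line.strip() and not PAGE_NUMBER_ONLY.match(line):
            return line.strip()
    return None


def looks_like_toc_or_index(page_lines, min_ratio=0.5):
    """Flag a page that names a key term and has many lines ending in digits.

    min_ratio is a hypothetical threshold; the note only says "a large number".
    """
    non_empty = [line for line in page_lines if line.strip()]
    if not non_empty:
        return False
    has_key_term = any(
        term in line.lower() for line in non_empty for term in KEY_TERMS
    )
    digit_lines = sum(1 for line in non_empty if ENDS_WITH_DIGITS.search(line))
    return has_key_term and digit_lines / len(non_empty) >= min_ratio


# Made-up sample pages for illustration.
toc_page = [
    "Contents",
    "1 Introduction 1",
    "2 Survey of the Literature 15",
    "3 Discussion 42",
]
prose_page = [
    "Books are a valuable source of knowledge.",
    "They differ from web pages.",
]
title = guess_title(["12", "", "In Search of a Semantic Book", "By the authors"])
toc_detected = looks_like_toc_or_index(toc_page)
prose_detected = looks_like_toc_or_index(prose_page)
```

The prose page contains the key term "page" inside "web pages" but no digit-terminated lines, which shows why the sketch combines the two signals rather than relying on key terms alone.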