In Search of a Semantic Book Search Engine: Are We There Yet?
1. In Search of a Semantic Book Search Engine on the Web: Are We There Yet?
By
Irfan Ullah and Shah Khusro
University of Peshawar, Pakistan
5th Computer Science On-line Conference 2016
ComputerScienceOnline
Conference2016
1
2. In this Presentation
• Abstract
• Introduction
• Survey of the Literature
• Extracting Structure & Indexing Books
• Searching and Ranking Books
• Book Recommendations
• Fine-grained Access to Information in Books
• Discussion and Analysis
• Conclusions
• References
3. Abstract
• Books – Valuable source of knowledge and learning
• Position
• Web Information Retrieval (IR) techniques for book retrieval
• Existing searching solutions treat books as plaintext collections
• Inaccurate and imprecise book search results
• Solution
• Books are different from web pages
• Structural semantics and logical connections in their content for
searching, ranking and recommendations
• Fine-grained access to information in books e.g. tables, figures
4. Introduction
• Web Information Retrieval
• Rich text collections with explicit hypertextual structure
• Used in searching and ranking web pages
• Books lack this graph-like structure – Problem
• Books are well-organized and logically connected
• Presenting a graph-like structure – can be used in searching,
ranking, and recommending books
• But visible to human readers only
• Problem – Needs to be machine-understandable and machine-processable
5. Introduction
• Solution – Semantic Book Search Engine
• What is Required?
• A more in-depth and comprehensive book structure ontology
• Domain level ontologies to understand book contents in different
domains
• Connecting books in a graph-like manner
• Why?
• Better searching, ranking, and recommendations
• Increase user satisfaction
• Promoting objectives of other stakeholders
6. Survey of the Literature
• Extracting Structure & Indexing Books
• Many Research Initiatives and Conferences
• INEX, ICDAR, and BooksOnline
• Indexing books’ valuable parts [2].
• Book layout analysis for extracting TOC [3] and other parts [8]
• Resurgence software for detecting different parts [4-6]
• Rule-based and SVM-based methods extracting TOC [7]
• Detecting and parsing TOC pages [9] and index pages [9] through
classical methods [10, 11] and trailing-page-whitespace
methods [9]
• Required
• Connecting book title with other parts
• Better book indexing, ranking and recommendations
7. Survey of the Literature
• Searching and Ranking Books
• Ranking authors by expert finding to rank books [12]
• “Authors capture an important aspect of relevance [12]”
• Read books written by popular experts in the field
• No bags-of-words models
• Ranking by what is actually inside books [13]
• Thesaurus, reference works and ontologies
• Helping readers gain useful insights into the text and decide on
the relevance of the book
8. Survey of the Literature
• Searching and Ranking Books
• Digitized Books
• By combining and comparing scores for book headings, TOC and
book titles [2].
• Digitization Projects – Limited/No Ranking
• Project Gutenberg – sorting results
• Google Books – 100 (unknown) ranking signals [1]
• Google Patents [15,16] – Not implemented YET
• Books could be connected through references [14] – Limited
• Need
• Using Semantic Web and Ontologies
9. Survey of the Literature
• Book Recommendations
• Available Recommenders
• BReK12 – readability levels of K-12 readers + book contents [21]
• BReT – helps K-12 teachers find relevant books for K-12 students [22]
• K3Rec – K-3 readers, their parents, and teachers [23]
• Using near and partial duplicates, citation analysis, and metadata
similarities [24].
• User modeling – information from Social Web [17].
• Book reviews [18, 19].
• Semantic Web and ontologies [25-27]
• Limited – Use only book descriptions, not the actual content
• Required
• True content-based semantic book recommender
10. Survey of the Literature
• Fine-grained access to information in books
• Retrieving similar and related tables, figures, images, algorithms,
equations, quotations, and passages
• Augmenting tables with different data sources to restore the
lost semantics [28].
• Same is the case with figures and images
• CiteSeer – document, author, and table search
• Need
• Exploitation of book structural semantics and logical connections
11. Discussion & Analysis
• Indexing books
• Multi-field inverted index should be used [29].
• Book search engine should be able to understand
• The nature of books, their contents, and user intentions
• E.g., for fiction and novels, readers may be interested in different strata
including the plot, the idea, and the composition of the work [30].
• Required
• Semantic indexing by exploiting book structural semantics
• Indexing fiction/novels, and
• Indexing books using metadata
• Book reviews
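The multi-field inverted index mentioned on this slide can be illustrated with a minimal sketch. It keeps separate postings per field (title, TOC, heading, body) and combines per-field term counts with weights at query time. The field names, the weight values in `FIELD_WEIGHTS`, the tokenizer, and the `MultiFieldIndex` class are assumptions made for illustration; the slides do not specify them, and this is not the method of [29].

```python
from collections import defaultdict

# Hypothetical field weights; the presentation does not give any values.
FIELD_WEIGHTS = {"title": 3.0, "toc": 2.0, "heading": 1.5, "body": 1.0}
_PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (t.strip(_PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


class MultiFieldIndex:
    """Illustrative inverted index with one posting list per (term, field)."""

    def __init__(self):
        # term -> field -> book_id -> term frequency
        self.postings = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def add_book(self, book_id, fields):
        """Index a book; `fields` maps a field name to its text."""
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[term][field][book_id] += 1

    def search(self, query):
        """Return (book_id, score) pairs, highest weighted score first."""
        scores = defaultdict(float)
        for term in tokenize(query):
            for field, books in self.postings.get(term, {}).items():
                weight = FIELD_WEIGHTS.get(field, 1.0)
                for book_id, tf in books.items():
                    scores[book_id] += weight * tf
        # Sort by descending score, then by book id for a stable order.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Keeping fields separate lets a match in a title or heading count for more than the same match in running text, which a single plaintext index cannot express.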
12. Discussion & Analysis
• Searching books
• Search Engine Results Page (SERP)
• Too many relevant and irrelevant results – Information Overload [31]
• Required – User Interface
• Provide more relevant results
• Robust, non-ambiguous, understandable and relevant to information
need
• Present results in a manner that augments user understanding
13. Discussion & Analysis
• Ranking and recommending books
• Using ontologies and the actual book contents
• Exploiting structural semantics and logical connections in book
contents
• Problem
• Existing ontologies (JeromeDL and DocBook) are limited in fully
describing books
• Required
• Comprehensive book structure and several domain-level
ontologies
• Ontology Engineering and Ontology Learning [32] along with
involving domain experts
14. Discussion & Analysis
• Finding Related tables and figures
• Table extraction and searching
• Summarize, elaborate and compare tables
• Interpret tables accurately
• Structure and semantic characteristics of book tables of all possible layout
variations
• Using online knowledge sources in annotating tables [28]
• Using ontologies in indexing, searching, and ranking tables
• Figure extraction and searching
• Relating figures using visual similarities and contextual clues
• To retrieve books that present images and figures on a certain
concept or topic
15. Conclusions
• Book Search and Retrieval
• Has been the focus of research initiatives and academic research
• Several retrieval methods have been proposed
• Several book ontologies have been developed for indexing,
ranking, and recommending books
• We are still far from the ideal system
• Need
• Further research initiatives for discovering book structural
semantics and its use in searching, ranking, and recommending
books
16. Conclusions
• Need – Semantic book search engine
• Treat books differently from other web documents
• Use their structural semantics and logical connections in
searching, ranking, and recommendations
• Comprehensive book structure ontology
• Domain-level ontologies
• To process book contents in different domains
• To create a graph-like structure of books to be used by PageRank
type algorithms
• To allow fine-grained access to information in books like tables,
figures, algorithms, equations, similar passages etc.
• To fulfill the information needs of readers and other stakeholders
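The idea of feeding a graph of books to a PageRank-type algorithm can be sketched as follows. This is a plain power-iteration PageRank over a hypothetical "book A references book B" graph. The `pagerank` function, the damping factor of 0.85, the iteration count, and the even redistribution of rank from books with no outgoing references are assumptions for illustration, not details given in the presentation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `links` maps each book id to the list of book ids it references.
    Returns a dict of book id -> rank; ranks sum to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    if not nodes:
        return {}

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank among the books it references.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical example: books A and B both reference book C.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": []})
```

In this toy graph, book C ends up with the highest rank because both other books point to it.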
Indexing books’ valuable parts
e.g., chapter, section and subsection headings, table of contents (TOC), index pages and book titles that are obtained from book metadata [2].
Title: first line in the document except the page number
TOC and index pages: looking for key terms, e.g., “table of contents”, “contents”, “page”, “index”, and a large number of lines ending with digits.
Failure: first 3000 characters and last 10 pages of the book [2].
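The title and TOC/index heuristics described in these notes can be sketched roughly as below. The function names, the `min_digit_lines` threshold of 5, and the choice to require both a key term and enough digit-terminated lines are assumptions made for illustration; the notes do not say how the two signals are combined in [2].

```python
import re

# Key terms listed in the notes above.
KEY_TERMS = ("table of contents", "contents", "page", "index")


def guess_title(lines):
    """Return the first non-empty line that is not a bare page number."""
    for line in lines:
        text = line.strip()
        if text and not text.isdigit():
            return text
    return None


def looks_like_toc_or_index(page_lines, min_digit_lines=5):
    """Heuristic check for a TOC or index page.

    Assumption: a page qualifies when it mentions a key term AND at least
    `min_digit_lines` of its lines end with a digit (a page reference).
    """
    text = " ".join(page_lines).lower()
    has_key_term = any(term in text for term in KEY_TERMS)
    digit_lines = sum(1 for line in page_lines if re.search(r"\d\s*$", line))
    return has_key_term and digit_lines >= min_digit_lines


# Hypothetical page content for illustration.
toc_page = [
    "Contents",
    "1 Introduction 1",
    "2 Survey of the Literature 9",
    "3 Discussion 21",
    "4 Conclusions 30",
    "References 35",
]
is_toc = looks_like_toc_or_index(toc_page)
```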
What is Required:
Further research is needed for greater precision and accuracy in book structure detection and extraction
Book title can be connected with
TOC, chapters, sections, subsections, tables, images, figures, algorithms, procedures, mathematical equations and different related concepts.
Resulting in a connected graph
Better search, ranking, and recommendations using contextual clues than using simple bags-of-words models and ordinary ranking methods
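The connected graph described in these notes, where a book title is linked to its TOC, chapters, sections, tables, and figures, can be sketched as a small undirected graph. The `BookGraph` class, the `(kind, label)` node convention, and all labels in the example are hypothetical and used only for illustration.

```python
from collections import defaultdict, deque


class BookGraph:
    """Illustrative undirected graph linking a book title to its parts."""

    def __init__(self):
        self.edges = defaultdict(set)

    def connect(self, parent, child):
        """Link two nodes, e.g. a chapter and a table it contains."""
        self.edges[parent].add(child)
        self.edges[child].add(parent)

    def reachable(self, start):
        """Breadth-first search: every node connected to `start`."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self.edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen


# Nodes are (kind, label) tuples; all labels here are made up.
graph = BookGraph()
graph.connect(("title", "Book A"), ("chapter", "A: 1 Introduction"))
graph.connect(("chapter", "A: 1 Introduction"), ("table", "A: Table 1.1"))
graph.connect(("title", "Book B"), ("chapter", "B: 1 Basics"))

# Starting from a table, find the title of the book it belongs to.
context = graph.reachable(("table", "A: Table 1.1"))
titles = sorted(label for kind, label in context if kind == "title")
```

Walking the graph from a table or figure back to its chapter and title is one way the contextual clues mentioned above could be recovered for ranking.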