In Search of a Semantic Book Search Engine: Are We There Yet?

In Search of a Semantic Book Search Engine on the Web: Are We There Yet?
By Irfan Ullah and Shah Khusro
University of Peshawar, Pakistan
5th Computer Science On-line Conference 2016
ComputerScienceOnline
Conference2016

In this Presentation
• Abstract
• Introduction
• Survey of the Literature
• Extracting Structure & Indexing Books
• Searching and Ranking Books
• Book Recommendations
• Fine-grained Access to Information in Books
• Discussion and Analysis
• Conclusions
• References

Abstract
• Books – Valuable source of knowledge and learning
• Position
• Web Information Retrieval (IR) techniques for book retrieval
• Existing searching solutions treat books as plaintext collections
• Inaccurate and imprecise book search results
• Solution
• Books are different from web pages
• Use the structural semantics and logical connections in their content for
searching, ranking, and recommendations
• Fine-grained access to information in books, e.g., tables and figures

Introduction
• Web Information Retrieval
• Rich text collections with explicit hypertextual structure
• Used in searching and ranking web pages
• Books lack this graph-like structure – Problem
• Books are well-organized and logically connected
• Presenting a graph-like structure – can be used in searching,
ranking, and recommending books
• But visible to human readers only
• Problem – Needs to be machine-understandable and machine-processable

Introduction
• Solution – Semantic Book Search Engine
• What is Required?
• A more in-depth and comprehensive book structure ontology
• Domain level ontologies to understand book contents in different
domains
• Connecting books in a graph-like manner
• Why?
• Better searching, ranking, and recommendations
• Increase user satisfaction
• Promoting objectives of other stakeholders
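
As a rough illustration of the requirements above, the following sketch represents book structure as subject-predicate-object triples and connects books through the concepts their parts mention. The `ex:` vocabulary, the identifiers, and both helper functions are hypothetical and are not taken from the presentation or from any existing ontology.

```python
# Hypothetical structural predicates; a real book structure ontology would define many more.
STRUCTURAL = {"ex:hasChapter", "ex:hasSection", "ex:hasTable"}

# Toy data (assumed): two books whose parts mention the same concept.
TRIPLES = [
    ("book:1", "rdf:type", "ex:Book"),
    ("book:1", "ex:hasChapter", "book:1/ch1"),
    ("book:1/ch1", "ex:hasSection", "book:1/ch1/s1"),
    ("book:1/ch1/s1", "ex:hasTable", "book:1/ch1/s1/t1"),
    ("book:1/ch1/s1", "ex:mentionsConcept", "concept:Ontology"),
    ("book:2", "rdf:type", "ex:Book"),
    ("book:2", "ex:hasChapter", "book:2/ch1"),
    ("book:2/ch1", "ex:mentionsConcept", "concept:Ontology"),
]


def parts_of(root, triples):
    """Return every part reachable from `root` through structural predicates."""
    found, frontier = set(), [root]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p in STRUCTURAL and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def books_mentioning(concept, triples):
    """Return the books that mention `concept` directly or in any of their parts."""
    books = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Book"}
    result = set()
    for book in books:
        nodes = parts_of(book, triples) | {book}
        if any(s in nodes and p == "ex:mentionsConcept" and o == concept
               for s, p, o in triples):
            result.add(book)
    return result
```

In this toy data both books are linked through `concept:Ontology`, which is one possible way to obtain graph-like connections between books from their structure and content.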

Survey of the Literature
• Extracting Structure & Indexing Books
• Many Research Initiatives and Conferences
• INEX, ICDAR, and BooksOnline
• Indexing books’ valuable parts [2].
• Book layout analysis for extracting TOC [3] and other parts [8]
• Resurgence software for detecting different parts [4-6]
• Rule-based and SVM-based methods for extracting TOC [7]
• Detecting and parsing TOC pages [9] and index pages [9] through
classical methods [10, 11] and trailing page whitespace
methods [9]
• Required
• Connecting book title with other parts
• Better book indexing, ranking and recommendations
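
To make the idea of rule-based TOC detection concrete, here is a minimal sketch. It is not the method of any cited work; the regular expression, the function names, and the 0.5 threshold are assumptions chosen for illustration. It treats a line as a TOC entry when a title is followed by at least two spaces or dot leaders and a trailing page number.

```python
import re

# Assumed pattern: optional section number, a title, dot leaders or wide
# spacing, then a page number at the end of the line.
TOC_ENTRY = re.compile(
    r"^\s*(?:\d+(?:\.\d+)*\.?\s+)?(?P<title>\S.*?)[\s.]{2,}(?P<page>\d{1,4})\s*$"
)


def parse_toc_line(line):
    """Return (title, page) if the line looks like a TOC entry, else None."""
    match = TOC_ENTRY.match(line)
    if match is None:
        return None
    return match.group("title").strip(), int(match.group("page"))


def is_toc_page(lines, min_ratio=0.5):
    """Flag a page as a TOC page when enough non-empty lines parse as entries."""
    non_empty = [ln for ln in lines if ln.strip()]
    if not non_empty:
        return False
    hits = sum(1 for ln in non_empty if parse_toc_line(ln) is not None)
    return hits / len(non_empty) >= min_ratio
```

A real system would need many more rules (multi-line entries, Roman numerals, OCR noise), which is part of why learned classifiers such as SVMs are also used for this task.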

• Ranking authors by expert finding to rank books [12]
• “Authors capture an important aspect of relevance [12]”
• Read books written by popular experts in the field
• No bags-of-words models
• Ranking by what is actually inside books [13]
• Thesaurus, reference works and ontologies
• Helping readers gain useful insights into the text and decide on
the relevance of the book

• Digitized Books
• By combining and comparing scores for book headings, TOC and
book titles [2].
• Digitization Projects – Limited/No Ranking
• Project Gutenberg – sorting results
• Google Books – 100 (unknown) ranking signals [1]
• Google Patents [15, 16] – Not yet implemented
• Books could be connected through references [14] – Limited
• Need
• Using Semantic Web and Ontologies
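
The idea of combining scores for book titles, TOC, and headings can be sketched as a weighted sum of per-field scores. The term-frequency scoring function, the field names, and the weights below are assumptions for illustration only; they are not the scoring used in [2] or by any of the systems listed above.

```python
# Hypothetical field weights; in practice these would be tuned on relevance data.
WEIGHTS = {"title": 0.5, "toc": 0.3, "headings": 0.2}


def field_score(query_terms, field_text):
    """Fraction of the field's tokens that match any query term."""
    tokens = field_text.lower().split()
    if not tokens:
        return 0.0
    return sum(tokens.count(term) for term in query_terms) / len(tokens)


def book_score(query, book):
    """Weighted sum of the per-field scores for one book (a dict of field texts)."""
    terms = query.lower().split()
    return sum(
        weight * field_score(terms, book.get(field, ""))
        for field, weight in WEIGHTS.items()
    )


def rank_books(query, books):
    """Return book ids ordered from highest to lowest combined score."""
    return sorted(books, key=lambda book_id: book_score(query, books[book_id]), reverse=True)
```

For example, with `books = {"a": {"title": "semantic web"}, "b": {"toc": "semantic sauces"}}`, the query `"semantic"` ranks `"a"` first because a title match carries more weight than a TOC match under these assumed weights.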

• Book Recommendations
• Available Recommenders
• BReK12 – readability levels of K-12 readers + book contents [21]
• BReT – helps K-12 teachers find relevant books for K-12 students [22]
• K3Rec – K-3 readers, their parents, and teachers [23]
• Using near and partial duplicates, citation analysis, and metadata
similarities [24].
• User modeling – information from Social Web [17].
• Book reviews [18, 19].
• Semantic Web and ontologies [25-27]
• Limited – Use only book descriptions not the actual content
• Required
• True content-based semantic book recommender

• Fine-grained access to information in books
• Retrieving similar and related tables, figures, images, algorithms,
equations, quotations, and passages
• Augmenting tables with different data sources to restore the
lost semantics [28].
• Same is the case with figures and images
• CiteSeer – document, author, and table search
• Need
• Exploitation of book structural semantics and logical connections

Discussion & Analysis
• Indexing books
• Multi-field inverted index should be used [29].
• Book search engine should be able to understand
• The nature of books, their contents, and user intentions
• E.g., for fiction and novels, readers may be interested in different strata,
including the plot, the idea, and the composition of the work [30].
• Required
• Semantic indexing by exploiting book structural semantics
• Indexing fiction and novels
• Indexing books using metadata
• Book reviews
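
A multi-field inverted index keeps a separate term-to-books mapping per field, so a query can be restricted to, or weighted by, the part of the book where a term occurs. The sketch below is a minimal in-memory version; the class name, the field names, and the whitespace tokenization are assumptions, not the design described in [29].

```python
from collections import defaultdict


class MultiFieldIndex:
    """Toy inverted index with one postings table per field (e.g. title, toc)."""

    def __init__(self):
        # field -> term -> set of book ids
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, book_id, fields):
        """Index a book given a dict that maps field names to their text."""
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[field][term].add(book_id)

    def search(self, term, field):
        """Return the ids of books whose `field` contains `term`."""
        return set(self.postings.get(field, {}).get(term.lower(), set()))

    def fields_matching(self, term):
        """Return {field: book ids} for every field in which `term` occurs."""
        term = term.lower()
        return {
            field: set(terms[term])
            for field, terms in self.postings.items()
            if term in terms
        }
```

Usage: after `idx.add("b1", {"title": "Semantic Web Primer"})`, calling `idx.search("semantic", "title")` returns `{"b1"}`, while the same term searched in a field that was never indexed returns an empty set.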

• Searching books
• Search Engine Results Page (SERP)
• Too many relevant and irrelevant results – Information Overload [31]
• Required – User Interface
• Provide more relevant results
• Robust, unambiguous, understandable, and relevant to the information
need
• Present results in a manner that augments user understanding

• Ranking and recommending books
• Using ontologies and the actual book contents
• Exploiting structural semantics and logical connections in book
contents
• Problem
• Existing ontologies (JeromeDL and DocBook) cannot fully
describe books
• Required
• Comprehensive book structure and several domain-level
ontologies
• Ontology Engineering and Ontology Learning [32] along with
involving domain experts

• Finding related tables and figures
• Table extraction and searching
• Summarize, elaborate and compare tables
• Interpret tables accurately
• Structural and semantic characteristics of book tables in all possible layout
variations
• Using online knowledge sources in annotating tables [28]
• Using ontologies in indexing, searching, and ranking tables
• Figure extraction and searching
• Relating figures using visual similarities and contextual clues
• To retrieve books that present images and figures on a certain
concept or topic

Conclusions
• Book Search and Retrieval
• Has been the focus of research initiatives and academic research
• Several retrieval methods have been proposed
• Several book ontologies have been developed for indexing,
ranking, and recommending books
• We are still miles away from the ideal system
• Need
• Further research initiatives for discovering book structural
semantics and their use in searching, ranking, and recommending
books

Conclusions
• Need – Semantic book search engine
• Treat books differently from other web documents
• Use their structural semantics and logical connections in
searching, ranking, and recommendations
• Comprehensive book structure ontology
• Domain-level ontologies
• To process book contents in different domains
• To create a graph-like structure of books to be used by PageRank-type
algorithms
• To allow fine-grained access to information in books, such as tables,
figures, algorithms, equations, and similar passages
• To fulfill the information needs of readers and other stakeholders
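
To illustrate how a graph-like structure of books could feed a PageRank-type algorithm, the sketch below runs a plain power-iteration PageRank over a small book graph. The edges (for example, "book A references book C") and the parameter values are assumptions; the presentation does not prescribe a specific algorithm or link type.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over {book: [books it links to]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by books with no outgoing links is spread evenly over all books.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

With the hypothetical graph `{"A": ["C"], "B": ["C"], "C": []}`, book `"C"` receives the highest score because both other books link to it, and the scores sum to 1.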