SlideShare a Scribd company logo
1 of 27
Binxing Jiao et. al (SIGIR ’10)
Presenter : Lin, Yi-Jhen
Advisor: Dr. Koh. Jia-ling
Date: 2011/4/25
VISUAL SUMMARIZATION OF WEB PAGES
OUTLINE
• Introduction
• Internal image based visual summarization
• Visual Snippet
• External image based visual summarization
• Experimental results
• User study
• Conclusion
INTRODUCTION
• Search and re-finding tasks are among the most typical
applications on the Internet.
• Information search and re-finding tasks would become more
efficient if the web pages are visually summarized.
INTRODUCTION CONT.
Thumbnails
Sites with images
INTRODUCTION CONT.
• Difficulties of internal image based visual summarization
• For those web pages which have complex layout or rich content,
users are difficult to see clearly from the small thumbnails.
• Internal images are unavailable for a large amount of web pages
which do not contain useful images.
• We propose to mine images from the external, which is so-called
external image based visual summarization.
• We investigate the applicability of the different visual
summarization approaches to different kinds of web pages and for
different tasks including search and re-finding.
The different approaches including:
• Thumbnails
• Internal images
• External images
• Visual snippet
INTERNAL IMAGE BASED VISUAL
SUMMARIZATION
• The images contained in a web page are usually used to describe
the content of the page, which are so-called internal images and
can be directly used to summarize the web page.
• We need to detect the most dominant among all the images in a
web page for visually summarization.
Extract
features
Labeled
training data
Ranking-
SVM
Labeled as (𝑥𝑖,𝑗, 𝑦𝑖,𝑗)
𝑥𝑖,𝑗 : extracted feature vector of image i in page j
𝑦𝑖,𝑗 : labeled dominance
0 (useless) / 1 (important) / 2 (highly important)
VISUAL SNIPPET
• Visual snippet enriches internal dominant images by synthesizing
the salient text (e.g., title), dominant image, and logo into a single
image.
Internal
dominant image
selection and
processing
Logo
extraction
Title
extraction
Image
Composition
Detected by a trained model
• Image’s location in the page
• Size
• Name
• Surrounding text
Extracted from the html
source of web page
• The first 19 characters
of the title are cropped
EXTERNAL IMAGE BASED VISUAL
SUMMARIZATION
KEY PHRASE EXTRACTION
• We employ the KEX algorithm to extract the key phrases.
• In KEX, a model is trained by logistic regression, and can be used to
calculate the salience score of each candidate phrase.
• The top 4 key phrases are used as queries to the image search engine.
• For each key phrase, 30 results including images and corresponding
web pages are retrieved for further processing.
EXTERNAL IMAGE BASED VISUAL SUMMARIZATION CONT.
IMAGE RANKING AND FILTERING
• Rerank result images
• Based on the textual similarities between the web
pages containing the images and the target web
page.
• TFIDF --> cosine similarity
• Filter images
• Visual importance of result images is calculated
using VisualRank
• The images whose importance score is below a
threshold will be filtered out
EXTERNAL IMAGE BASED VISUAL SUMMARIZATION CONT.
IMAGE RANKING AND FILTERING CONT.
• VisualRank
• Extract local SIFT features for each image
• Similarity(img i, img j) : the # of local features shared
between them divided by the average # of local
features in img i, img j
• A graph is constructed with images as vertices and
visual similarities as weights on edges
• PageRank is applied on the graph and then the
resulted rank score vector is employed to represent
the importance of each image
EXTERNAL IMAGE BASED VISUAL SUMMARIZATION CONT.
EXPERIMENT SETUP
• 100 queries from KDD CUP 2005
• Use Bing Search API to retrieve top 100 search results for
each of queries.
• 8412 retrieved web pages was obtained
• Download all of internal images and retrieved external
images.
• Expert labelers annotate whether the internal images were
dominant and the external images were relevant to
summarize the corresponding web page.
EXPERIMENTAL RESULTS
EXPERIMENT SETUP CONT.
• DIP : Web pages with at least on internal image annotated as
dominant
• NDIP : without any dominant internal image annotated
• 41.8% of web pages are NDIPs in our data set.
• Labelers judge which summary is the best to represent the web
page.
• He/she can select “others”.
EXPERIMENTAL RESULTS
USEFULNESS OF EXTERNAL IMAGES
• External images can be considered as a useful source for web page
summarization.
• Web pages containing dominant images can be regarded as more
appropriate for visual summarization, which, hence, can also be
summarized well by a relevant image retrieved from the external.
• The external images are valuable complement to internal images
for visual summarization of web pages.
EXPERIMENTAL RESULTS CONT.
COMPARISON OF VARIOUS VISUAL
SUMMARIES
• Dominant Image and Visual Snippet as the best summarization
• Thumbnail as the best summarization
• External Image as the best summarization
EXPERIMENTAL RESULTS CONT.
COMPARISON OF VARIOUS VISUAL
SUMMARIES CONT.
Dominant Image and Visual Snippet as the best summarization
EXPERIMENTAL RESULTS CONT.
• Dominant image based
summarizations are the best
summarization for over a half
DIPs.
• For DIP, the visual snippet is
worse than the dominant image.
• Title information is misread
• Visual snippet is rough-and-
tumble.
COMPARISON OF VARIOUS VISUAL
SUMMARIES CONT.
Thumbnail as the best summarization
EXPERIMENTAL RESULTS CONT.
• categorize these pages into
• pages in small size
• pages with several images
• Pages with salient and clear
image/text
• Pages With logo
• Pages With simple page structure
• Pages in well-known web site
With salient and clear image/text
With logo
Well-known web site
COMPARISON OF VARIOUS VISUAL
SUMMARIES CONT.
External Image as the best summarization
EXPERIMENTAL RESULTS CONT.
• For NDIPs, external images with
tremendous relevance score can
be adopted as reliable
summarizations
• Otherwise, thumbnails are better,
especially for “pages with logo”
and “pages in well-known web
site”.
COMPARISON OF VARIOUS VISUAL
SUMMARIES CONT.
Summary – guidance of selecting summarization type
EXPERIMENTAL RESULTS CONT.
• For NDIP
• External images are useful, but suffer from reliability problem.
• Thumbnail are also good, especially for “pages with logo” and “pages in well-known
web site”.
• For DIP
• If the dominant images are contained and can be recognized in the thumbnail,
thumbnail is the best choice; otherwise dominant images are more reliable.
• If an exclusive dominant image is found and the title information agrees with the
content of the dominant image, visual snippet is better than a single dominant
image.
USER STUDY
• The goal of this study is to explore how different summarization
approaches support a understanding or a re-finding task.
• 51 participants
• Bachelor’s degree at least
• 20-35 years old
• 56% male, 44% female
• Heavy web users
• 11 web pages are chosen from a range of different categories
• 9 DIP / 2 NDIP
USER STUDY CONT.
• Phrase 1 -- understanding
• Each participant was given a web page and one of four visual
summaries
• Step 1 : guess the content of the web page based solely on the provided
visual summarization
• Step 2 : read the web page and to rate a score ranging from -1 to 3
• -1 : the summarization misleads the participant
• 0 : participant can not understanding the summarization
• 1 : the summarization can help users grasp a rough idea on what the
page is about
• 3 : the summarization can help users exactly understanding the
content of the web page
• 2 : the between of 1 and 3
USER STUDY CONT.
• Phrase 2 -- re-finding
• We explored how the various summarizations types would affect the
recall of the web pages visited in phrase 1.
• we specify a web page to be re-found, the participants were asked to
select an image, which can remind them of the web page.
• To evaluate the performance of each summarization type for this
phrase, we define the error rate of a summarization type (e(s)) as
I{ } : an indicator , 1 or 0
RESULTS AND DISCUSSION
USER STUDY CONT.
Understanding Task For DIP For NDIP
RESULTS AND DISCUSSION CONT.
USER STUDY CONT.
Re-finding Task
RESULTS AND DISCUSSION CONT.
USER STUDY CONT.
Re-finding Task
• The visual snippet performs better than other summarizations.
• The additional two components of the visual snippet :
Title and logo information are useful for identifying different visited web
pages.
RESULTS AND DISCUSSION CONT.
USER STUDY CONT.
Re-finding Task
Thumbnail is
good
Thumbnail worse
than visual snippet
External image based summarization is not suitable for re-finding task
CONCLUSION
• An external image based visual summarization is proposed, which selects
the representative images from the entire Internet.
• We present a careful study on comparing the various visual
summarization method for web pages.
• The various summarization approaches have respective advantages on
different types of web pages and for different tasks.
• Thumbnails are simple and effective for web pages with simple
structure or clear image and text with large size
• Internal images can summarize the web page well if it includes
dominant images
• External images are useful for those web pages without dominant
images
• Visual snippets are effective in helping users identifying the visited
web pages in the re-finding task

More Related Content

Similar to Visual Summarization of Web Pages

Measuring the quality of web search engines
Measuring the quality of web search enginesMeasuring the quality of web search engines
Measuring the quality of web search engines
Dirk Lewandowski
 
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
museums and the web
 
Emma.antunes
Emma.antunesEmma.antunes
Emma.antunes
NASAPMC
 
Moving Minds and Moving Code - Understanding, Exploring and Defining SMB Web...
Moving Minds and Moving Code - Understanding, Exploring and  Defining SMB Web...Moving Minds and Moving Code - Understanding, Exploring and  Defining SMB Web...
Moving Minds and Moving Code - Understanding, Exploring and Defining SMB Web...
IIBA Rochester NY
 
Group presentation
Group presentationGroup presentation
Group presentation
so3137
 
The Importance of Site Search for Ecommerce - Ed Hoffman
The Importance of Site Search for Ecommerce - Ed HoffmanThe Importance of Site Search for Ecommerce - Ed Hoffman
The Importance of Site Search for Ecommerce - Ed Hoffman
E-Commerce Brasil
 
Carl week 5 dont make me think part 2 pp
Carl week 5 dont make me think part 2 ppCarl week 5 dont make me think part 2 pp
Carl week 5 dont make me think part 2 pp
wendyr1974
 
Carl week 5 dont make me think part 2 pp
Carl week 5 dont make me think part 2 ppCarl week 5 dont make me think part 2 pp
Carl week 5 dont make me think part 2 pp
wendyr1974
 
4087 chapter 08 8ed part2
4087 chapter 08 8ed part24087 chapter 08 8ed part2
4087 chapter 08 8ed part2
winegron
 

Similar to Visual Summarization of Web Pages (20)

Measuring the quality of web search engines
Measuring the quality of web search enginesMeasuring the quality of web search engines
Measuring the quality of web search engines
 
Fast & Cheap UX Research
Fast & Cheap UX ResearchFast & Cheap UX Research
Fast & Cheap UX Research
 
Understanding Your Project Before You Start
Understanding Your Project Before You StartUnderstanding Your Project Before You Start
Understanding Your Project Before You Start
 
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
 
Search engine optimization
Search engine optimizationSearch engine optimization
Search engine optimization
 
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
Renee Anderson, Techniques for prioritizing, road-mapping, and staffing your ...
 
Emma.antunes
Emma.antunesEmma.antunes
Emma.antunes
 
Moving Minds and Moving Code - Understanding, Exploring and Defining SMB Web...
Moving Minds and Moving Code - Understanding, Exploring and  Defining SMB Web...Moving Minds and Moving Code - Understanding, Exploring and  Defining SMB Web...
Moving Minds and Moving Code - Understanding, Exploring and Defining SMB Web...
 
Group presentation
Group presentationGroup presentation
Group presentation
 
URANUS
URANUSURANUS
URANUS
 
The Importance of Site Search for Ecommerce - Ed Hoffman
The Importance of Site Search for Ecommerce - Ed HoffmanThe Importance of Site Search for Ecommerce - Ed Hoffman
The Importance of Site Search for Ecommerce - Ed Hoffman
 
Advanced Guide to Seo (Third Sector - Leeds Digital Festival 2016)
Advanced Guide to Seo (Third Sector - Leeds Digital Festival 2016)Advanced Guide to Seo (Third Sector - Leeds Digital Festival 2016)
Advanced Guide to Seo (Third Sector - Leeds Digital Festival 2016)
 
Ppt ch03
Ppt ch03Ppt ch03
Ppt ch03
 
Ppt ch03
Ppt ch03Ppt ch03
Ppt ch03
 
Week 8 & 10
Week 8 & 10Week 8 & 10
Week 8 & 10
 
9781111528705_PPT_ch03.ppt
9781111528705_PPT_ch03.ppt9781111528705_PPT_ch03.ppt
9781111528705_PPT_ch03.ppt
 
Basics of Interaction Design & Strategy - 6/12/15
Basics of Interaction Design & Strategy - 6/12/15Basics of Interaction Design & Strategy - 6/12/15
Basics of Interaction Design & Strategy - 6/12/15
 
Carl week 5 dont make me think part 2 pp
Carl week 5 dont make me think part 2 ppCarl week 5 dont make me think part 2 pp
Carl week 5 dont make me think part 2 pp
 
Carl week 5 dont make me think part 2 pp
Carl week 5 dont make me think part 2 ppCarl week 5 dont make me think part 2 pp
Carl week 5 dont make me think part 2 pp
 
4087 chapter 08 8ed part2
4087 chapter 08 8ed part24087 chapter 08 8ed part2
4087 chapter 08 8ed part2
 

More from YI-JHEN LIN

More from YI-JHEN LIN (6)

Adaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrievalAdaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrieval
 
BioSnowball_Automated Population of Wikis
BioSnowball_Automated Population of WikisBioSnowball_Automated Population of Wikis
BioSnowball_Automated Population of Wikis
 
Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval
Probabilistic Models of Novel Document Rankings for Faceted Topic RetrievalProbabilistic Models of Novel Document Rankings for Faceted Topic Retrieval
Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval
 
PQC_Personalized Query Classification
PQC_Personalized Query ClassificationPQC_Personalized Query Classification
PQC_Personalized Query Classification
 
Query Dependent Pseudo-Relevance Feedback based on Wikipedia
Query Dependent Pseudo-Relevance Feedback based on WikipediaQuery Dependent Pseudo-Relevance Feedback based on Wikipedia
Query Dependent Pseudo-Relevance Feedback based on Wikipedia
 
MrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label ClassificationMrKNN_Soft Relevance for Multi-label Classification
MrKNN_Soft Relevance for Multi-label Classification
 

Recently uploaded

怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 

Recently uploaded (20)

Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 

Visual Summarization of Web Pages

  • 1. Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES
  • 2. OUTLINE • Introduction • Internal image based visual summarization • Visual Snippet • External image based visual summarization • Experimental results • User study • Conclusion
  • 3. INTRODUCTION • Search and re-finding tasks are among the most typical applications on the Internet. • Information search and re-finding tasks would become more efficient if the web pages are visually summarized.
  • 5. INTRODUCTION CONT. • Difficulties of internal image based visual summarization • For those web pages which have complex layout or rich content, users are difficult to see clearly from the small thumbnails. • Internal images are unavailable for a large amount of web pages which do not contain useful images. • We propose to mine images from the external, which is so-called external image based visual summarization. • We investigate the applicability of the different visual summarization approaches to different kinds of web pages and for different tasks including search and re-finding. The different approaches including: • Thumbnails • Internal images • External images • Visual snippet
  • 6. INTERNAL IMAGE BASED VISUAL SUMMARIZATION • The images contained in a web page are usually used to describe the content of the page, which are so-called internal images and can be directly used to summarize the web page. • We need to detect the most dominant among all the images in a web page for visually summarization. Extract features Labeled training data Ranking- SVM Labeled as (𝑥𝑖,𝑗, 𝑦𝑖,𝑗) 𝑥𝑖,𝑗 : extracted feature vector of image i in page j 𝑦𝑖,𝑗 : labeled dominance 0 (useless) / 1 (important) / 2 (highly important)
  • 7. VISUAL SNIPPET • Visual snippet enriches internal dominant images by synthesizing the salient text (e.g., title), dominant image, and logo into a single image. Internal dominant image selection and processing Logo extraction Title extraction Image Composition Detected by a trained model • Image’s location in the page • Size • Name • Surrounding text Extracted from the html source of web page • The first 19 characters of the title are cropped
  • 8. EXTERNAL IMAGE BASED VISUAL SUMMARIZATION
  • 9. KEY PHRASE EXTRACTION • We employ the KEX algorithm to extract the key phrases. • In KEX, a model is trained by logistic regression, and can be used to calculate the salience score of each candidate phrase. • The top 4 key phrases are used as queries to the image search engine. • For each key phrase, 30 results including images and corresponding web pages are retrieved for further processing. EXTERNAL IMAGE BASED VISUAL SUMMARIZATION CONT.
  • 10. IMAGE RANKING AND FILTERING • Rerank result images • Based on the textual similarities between the web pages containing the images and the target web page. • TFIDF --> cosine similarity • Filter images • Visual importance of result images is calculated using VisualRank • The images whose importance score is below a threshold will be filtered out EXTERNAL IMAGE BASED VISUAL SUMMARIZATION CONT.
  • 11. IMAGE RANKING AND FILTERING CONT. • VisualRank • Extract local SIFT features for each image • Similarity(img i, img j) : the # of local features shared between them divided by the average # of local features in img i, img j • A graph is constructed with images as vertices and visual similarities as weights on edges • PageRank is applied on the graph and then the resulted rank score vector is employed to represent the importance of each image EXTERNAL IMAGE BASED VISUAL SUMMARIZATION CONT.
  • 12. EXPERIMENT SETUP • 100 queries from KDD CUP 2005 • Use Bing Search API to retrieve top 100 search results for each of queries. • 8412 retrieved web pages was obtained • Download all of internal images and retrieved external images. • Expert labelers annotate whether the internal images were dominant and the external images were relevant to summarize the corresponding web page. EXPERIMENTAL RESULTS
  • 13. EXPERIMENT SETUP CONT. • DIP : Web pages with at least on internal image annotated as dominant • NDIP : without any dominant internal image annotated • 41.8% of web pages are NDIPs in our data set. • Labelers judge which summary is the best to represent the web page. • He/she can select “others”. EXPERIMENTAL RESULTS
  • 14. USEFULNESS OF EXTERNAL IMAGES • External images can be considered as a useful source for web page summarization. • Web pages containing dominant images can be regarded as more appropriate for visual summarization, which, hence, can also be summarized well by a relevant image retrieved from the external. • The external images are valuable complement to internal images for visual summarization of web pages. EXPERIMENTAL RESULTS CONT.
  • 15. COMPARISON OF VARIOUS VISUAL SUMMARIES • Dominant Image and Visual Snippet as the best summarization • Thumbnail as the best summarization • External Image as the best summarization EXPERIMENTAL RESULTS CONT.
  • 16. COMPARISON OF VARIOUS VISUAL SUMMARIES CONT. Dominant Image and Visual Snippet as the best summarization EXPERIMENTAL RESULTS CONT. • Dominant image based summarizations are the best summarization for over a half DIPs. • For DIP, the visual snippet is worse than the dominant image. • Title information is misread • Visual snippet is rough-and- tumble.
  • 17. COMPARISON OF VARIOUS VISUAL SUMMARIES CONT. Thumbnail as the best summarization EXPERIMENTAL RESULTS CONT. • categorize these pages into • pages in small size • pages with several images • Pages with salient and clear image/text • Pages With logo • Pages With simple page structure • Pages in well-known web site With salient and clear image/text With logo Well-known web site
  • 18. COMPARISON OF VARIOUS VISUAL SUMMARIES CONT. External Image as the best summarization EXPERIMENTAL RESULTS CONT. • For NDIPs, external images with tremendous relevance score can be adopted as reliable summarizations • Otherwise, thumbnails are better, especially for “pages with logo” and “pages in well-known web site”.
  • 19. COMPARISON OF VARIOUS VISUAL SUMMARIES CONT. Summary – guidance of selecting summarization type EXPERIMENTAL RESULTS CONT. • For NDIP • External images are useful, but suffer from reliability problem. • Thumbnail are also good, especially for “pages with logo” and “pages in well-known web site”. • For DIP • If the dominant images are contained and can be recognized in the thumbnail, thumbnail is the best choice; otherwise dominant images are more reliable. • If an exclusive dominant image is found and the title information agrees with the content of the dominant image, visual snippet is better than a single dominant image.
  • 20. USER STUDY • The goal of this study is to explore how different summarization approaches support a understanding or a re-finding task. • 51 participants • Bachelor’s degree at least • 20-35 years old • 56% male, 44% female • Heavy web users • 11 web pages are chosen from a range of different categories • 9 DIP / 2 NDIP
  • 21. USER STUDY CONT. • Phrase 1 -- understanding • Each participant was given a web page and one of four visual summaries • Step 1 : guess the content of the web page based solely on the provided visual summarization • Step 2 : read the web page and to rate a score ranging from -1 to 3 • -1 : the summarization misleads the participant • 0 : participant can not understanding the summarization • 1 : the summarization can help users grasp a rough idea on what the page is about • 3 : the summarization can help users exactly understanding the content of the web page • 2 : the between of 1 and 3
  • 22. USER STUDY CONT. • Phrase 2 -- re-finding • We explored how the various summarizations types would affect the recall of the web pages visited in phrase 1. • we specify a web page to be re-found, the participants were asked to select an image, which can remind them of the web page. • To evaluate the performance of each summarization type for this phrase, we define the error rate of a summarization type (e(s)) as I{ } : an indicator , 1 or 0
  • 23. RESULTS AND DISCUSSION USER STUDY CONT. Understanding Task For DIP For NDIP
  • 24. RESULTS AND DISCUSSION CONT. USER STUDY CONT. Re-finding Task
  • 25. RESULTS AND DISCUSSION CONT. USER STUDY CONT. Re-finding Task • The visual snippet performs better than other summarizations. • The additional two components of the visual snippet : Title and logo information are useful for identifying different visited web pages.
  • 26. RESULTS AND DISCUSSION CONT. USER STUDY CONT. Re-finding Task Thumbnail is good Thumbnail worse than visual snippet External image based summarization is not suitable for re-finding task
  • 27. CONCLUSION • An external image based visual summarization is proposed, which selects the representative images from the entire Internet. • We present a careful study on comparing the various visual summarization method for web pages. • The various summarization approaches have respective advantages on different types of web pages and for different tasks. • Thumbnails are simple and effective for web pages with simple structure or clear image and text with large size • Internal images can summarize the web page well if it includes dominant images • External images are useful for those web pages without dominant images • Visual snippets are effective in helping users identifying the visited web pages in the re-finding task