Media Fragments Indexing using Social Media

Media Fragment Indexing
Using Social Media
Yunjia Li (1), Raphael Troncy (2), Mike Wald (1) and Gary Wills (1)
(1) School of Electronics and Computer Science,
University of Southampton, UK
(2) EURECOM, Sophia Antipolis, France

Agenda
• Media Fragments
• Media Fragment Indexing Framework
• Survey on Media Fragment URI Implementations on Video
Sharing Platforms
• Indexing Media Fragments Using Twitter
• Conclusions and Future Work

Media Fragment
• Denote a part of the content inside a multimedia resource
• Dimensions defined in the Media Fragments URI 1.0 specification
– Temporal dimension
http://example.org/test.mp4#t=3,7
– Spatial dimension (a rectangle area)
http://example.org/test.mp4#xywh=120,240,180,240
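The two dimensions above can be read out of a URI with a few lines of Python. This is a minimal sketch, not a full implementation of the specification: it assumes times are plain seconds and spatial values are pixels, and the function name `parse_media_fragment` is hypothetical.

```python
from urllib.parse import urlsplit


def parse_media_fragment(uri: str) -> dict:
    """Extract the t= and xywh= dimensions from the fragment of a URI.

    Assumption: times are plain seconds and xywh values are pixels;
    other forms allowed by the specification are not handled here.
    """
    result = {}
    fragment = urlsplit(uri).fragment
    for pair in fragment.split("&"):
        name, _, value = pair.partition("=")
        if name == "t":
            start, _, end = value.partition(",")
            # A missing start means "from the beginning"; a missing end
            # means "until the end of the media" (represented as None).
            result["t"] = (float(start) if start else 0.0,
                           float(end) if end else None)
        elif name == "xywh":
            x, y, w, h = (int(v) for v in value.split(","))
            result["xywh"] = (x, y, w, h)
    return result


temporal = parse_media_fragment("http://example.org/test.mp4#t=3,7")
spatial = parse_media_fragment("http://example.org/test.mp4#xywh=120,240,180,240")
```

Here `temporal` holds the start and end of the fragment in seconds, and `spatial` holds the rectangle as x, y, width and height.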

Current Situation
• Uploading, sharing and tagging multimedia is easy
• Searching for a complete multimedia resource on major search
engines is easy
• But searching for multimedia resources at a fine-grained level
on major search engines is difficult
– Availability of annotations: only a limited number of
annotations are linked to media fragments
– SEO problem:
• The landing page is not search-engine-friendly
• Everything is on the same page and the notion of
media fragment is not explicitly embedded in HTML

Media Fragment Indexing
Framework

Google’s Ajax Content Crawler
• The Crawler is designed to index Ajax content
• Replaces the token “#!” in URLs with “?_escaped_fragment_=”
*Diagram from
https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
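The URL rewriting performed by the crawler can be sketched as follows. This is an illustrative approximation of the scheme described above; the function name `to_crawler_url` and the example host are assumptions, and only a small set of special characters is percent-escaped.

```python
def to_crawler_url(pretty_url: str) -> str:
    """Map a hash-bang URL to the form a crawler would request.

    Sketch only: characters that would be ambiguous inside a query
    string (%, #, &, + and control characters or spaces) are escaped.
    """
    base, marker, fragment = pretty_url.partition("#!")
    if not marker:
        # No hash-bang token: nothing to rewrite.
        return pretty_url
    escaped = "".join(
        "%{:02X}".format(ord(ch)) if ch in "%#&+" or ord(ch) <= 0x20 else ch
        for ch in fragment
    )
    # Append to an existing query string if there is one.
    joiner = "&" if "?" in base else "?"
    return f"{base}{joiner}_escaped_fragment_={escaped}"


crawler_url = to_crawler_url("http://example.org/replay/1#!t=3,7")
```

For the media fragment `t=3,7` nothing needs escaping, so the fragment value is carried over unchanged into the query string.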

Key Ideas
• The fragment information must be included in the URL
– Syntax: W3C Media Fragments URI 1.0 specification
• Prepare two sets of pages for every media fragment
– original landing page for end-users
– a snapshot page for SEO
• Landing page keeps the original user interaction
– Highlight media fragments on opening
• SEO page
– ONLY includes annotations of the media fragment
– Embed rich snippet
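A snapshot page of the kind described above might be generated as in the sketch below. The exact markup is an assumption: it uses schema.org `VideoObject` Microdata with the `contentUrl` and `description` properties, and the function name `render_snapshot` is hypothetical.

```python
from html import escape


def render_snapshot(video_url: str, start: float, end: float, annotations: list) -> str:
    """Build an SEO snapshot page holding only the fragment's annotations.

    Assumption: one VideoObject item per media fragment, with each
    annotation emitted as a description property.
    """
    fragment_url = f"{video_url}#t={start:g},{end:g}"
    items = "\n".join(
        f'    <p itemprop="description">{escape(text)}</p>' for text in annotations
    )
    return (
        "<!DOCTYPE html>\n"
        "<html><body>\n"
        '  <div itemscope itemtype="http://schema.org/VideoObject">\n'
        f'    <link itemprop="contentUrl" href="{escape(fragment_url)}">\n'
        f"{items}\n"
        "  </div>\n"
        "</body></html>\n"
    )


page = render_snapshot("http://example.org/test.mp4", 3, 7, ["Terrace Theater"])
```

Annotation text is HTML-escaped before it is written into the page, since it may come from untrusted sources.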

The Solution
Diagram participants: Server and Crawler
Pages shown: landing page replay/1#!t=3,7 (contains “Terrace Theater” and “Linked Data”);
snapshot page Snapshot/1?_escaped_fragment_=t=3,7 (contains only “Terrace Theater”)
1: Submit the pretty URL replay/1#!t=3,7 to the crawler
2: The crawler asks the server for replay/1?_escaped_fragment_=t=3,7
3: The server redirects the request to the snapshot page it generates. The snapshot page only
contains annotations and Microdata for “#t=3,7”
4: The snapshot page is returned to the crawler with URL replay/1#!t=3,7
5: A user searches for the keyword “Terrace Theater”
6: Google includes replay/1#!t=3,7 in the search results
7: The user clicks the link and asks for the document at replay/1#!t=3,7
8: The server returns the landing page containing both “Terrace Theater” and “Linked Data”
9: The landing page highlights the media fragment by playing from 3s to 7s
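The server-side decision in steps 2, 3 and 8 could look roughly like the sketch below. It is a pure routing function rather than a real web server; the `/replay/<id>` and `/snapshot/<id>` path layout and the function name `route` are assumptions made for illustration.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def route(request_target: str) -> Tuple[int, str]:
    """Return (status, location) for a request path with optional query.

    Crawler requests carry the _escaped_fragment_ parameter and are
    redirected to the snapshot page; everything else gets the landing page.
    """
    parts = urlsplit(request_target)
    params = parse_qs(parts.query, keep_blank_values=True)
    if "_escaped_fragment_" in params and parts.path.startswith("/replay/"):
        video_id = parts.path[len("/replay/"):]
        # Keep the original query so the snapshot page knows the fragment.
        return 302, f"/snapshot/{video_id}?{parts.query}"
    # A browser never sends the fragment to the server, so the landing
    # page must highlight the fragment on the client side.
    return 200, parts.path


crawler_response = route("/replay/1?_escaped_fragment_=t=3,7")
browser_response = route("/replay/1")
```

The browser case receives no fragment information at all, which is why the landing page has to read `#!t=3,7` itself and start playback at the right time.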

Discussion
• The Media Fragment Indexing Framework solved the SEO
problem of media fragments
• The scalability of this method largely depends on whether
a large number of annotations are linked to media
fragments
• Where can media fragment annotations be found?
– Timed text, transcripts, speech recognition
– Manual annotations on each video sharing platform
– Social Media (Twitter)

Survey on Media Fragment
URI Implementation

Media Fragments and Social Media
• The deep-linking function
• A Media Fragment URL can be embedded in a Tweet
• The text of the Tweet is the annotation of the URL
• Get annotations by filtering Tweets that contain MF URIs

Filter Tweets by Media Fragment URIs
• Problem:
– Any URL in a Tweet is potentially an MF URI
– Too many false-positive cases
http://example.org/1234#t=23
http://example.org/1234?t=23
http://example.org/1234?track=23
– They could all be MF URIs and would need to be identified
manually
• Workaround:
– Identify platforms (partially-)implementing MF URI
– Only filter Tweets containing URLs from those domains
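The workaround amounts to a domain allow-list check on each URL. The sketch below assumes URLs have already been expanded from any shortener, and the domain set is an illustrative placeholder rather than the list used in the work; `from_candidate_platform` is a hypothetical name.

```python
from urllib.parse import urlsplit

# Assumption: illustrative domains only; the real list would come from
# identifying which platforms (partially) implement MF URIs.
CANDIDATE_DOMAINS = {
    "youtube.com", "youtu.be", "dailymotion.com",
    "vimeo.com", "vbox7.com", "viddler.com",
}


def from_candidate_platform(url: str) -> bool:
    """True if the URL's host is a candidate domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in CANDIDATE_DOMAINS)


keep = from_candidate_platform("http://www.youtube.com/watch?v=abc#t=23")
drop = from_candidate_platform("http://example.org/1234?t=23")
```

Matching on the full host with a leading dot avoids accepting look-alike hosts whose names merely end with a candidate domain.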

Survey Methodology
• Find a list of video sharing platforms
– http://en.wikipedia.org/wiki/List_of_video_hosting_services
– 59 websites are targeted in the survey
– Some of them have access restrictions
• Go through each website manually to see whether it
provides a deep-linking function, such as:
– A social sharing button that links to a certain time point
– A deep-linking option in the right-click menu

Survey Results (1)
• 9 websites partially implement MF URIs
– 56.com, Dailymotion, Hulu, Vbox7, Viddler, Vimeo,
Tudou, Youku and YouTube
• They use different syntaxes to encode the temporal dimension
– Most of them use the URI query, except YouTube & Vimeo
– Parameter names: “start”, “t”, “st”, etc.
– Only Hulu implements the end time
• Only YouTube partially implements the spatial dimension
– This is an external function implemented by Clickberry
https://clickberry.tv/video/6dafe30e-dcb8-44b8-8190-32be8249a297
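Because each platform encodes the start time differently, an indexer needs a normalisation step. The sketch below looks for the parameter names listed above in both the fragment and the query. The accepted value formats (plain seconds, or a form such as `1m30s`) are assumptions, as are the function names; end times are not handled.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Parameter names reported in the survey.
TIME_PARAMS = ("t", "start", "st")
# Assumption: values are plain seconds or an h/m/s form such as "1m30s".
_HMS = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s?)?$")


def to_seconds(value: str):
    """Convert a time value to seconds, or return None if unrecognised."""
    match = _HMS.match(value)
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def extract_start(url: str):
    """Return the start time in seconds encoded in a deep link, if any."""
    parts = urlsplit(url)
    for source in (parts.fragment, parts.query):
        params = parse_qs(source)
        for name in TIME_PARAMS:
            if name in params:
                seconds = to_seconds(params[name][0])
                if seconds is not None:
                    return seconds
    return None


start = extract_start("http://example.org/1234#t=23")
```

Once a start time is known, it can be rewritten into the standard temporal form, for example `#t=23`.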

Survey Results (2)
• Only 9 websites partially implement MF URIs, however:
– Those websites cover most videos shared on the
web
– eBizMBA report:
http://www.ebizmba.com/articles/video-websites
• Select filter keywords based on the survey results:
– Twitter is banned in China, so 56.com, Tudou and
Youku are ignored
– Hulu restricts access outside the U.S.
• Filter keywords: “YouTube”, “Dailymotion”, “Vbox7”,
“Vimeo” and “Viddler”

Indexing Media Fragments
Using Twitter

Twitter Media Fragment Indexer
• Collect Tweets filtered by the keywords
• Extract MF URIs in Tweets, parse the media fragment
information
• Use Media Fragment Indexing Framework to publish
Tweets as media fragment annotations
• Embed rich snippet in the snapshot pages
• Create sitemap for Google to crawl the snapshot pages
• When a user searches Google for keywords from a Tweet, the
result link leads to the video at the corresponding start time
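The sitemap step can be sketched with the standard library. The example below writes a minimal sitemap in the sitemaps.org format; listing the hash-bang URLs in it and the function name `build_sitemap` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a sitemap XML string with one <url> entry per distinct URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Deduplicate so that retweets of the same fragment appear only once.
    for url in sorted(set(urls)):
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap = build_sitemap([
    "http://example.org/replay/1#!t=3,7",
    "http://example.org/replay/1#!t=3,7",
    "http://example.org/replay/2#!t=10",
])
```

Duplicate URLs collapse into a single entry, so the sitemap size tracks the number of distinct media fragments rather than the number of Tweets.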

Indexing Results (1)
• Monitored the Twitter stream non-stop for 50 hours
• Filter phrase: “youtube, dailymotion, vimeo, vbox7, viddler”
• 5,779,858 Tweets examined, 5,269,742 contain URLs
• 32,754 Tweets contain MF URIs, 32,796 MF URIs in total
• Media Fragment URIs shared in each website:
Website       No. of MF URIs   %
YouTube       32,666           99.604
Dailymotion   101              0.308
Vbox7         0                0
Viddler       0                0
Vimeo         29               0.088
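The percentages in the table follow directly from the per-website counts, as this short check shows. The variable names are illustrative.

```python
# Counts of MF URIs per website, taken from the table above.
counts = {
    "YouTube": 32666,
    "Dailymotion": 101,
    "Vbox7": 0,
    "Viddler": 0,
    "Vimeo": 29,
}

total = sum(counts.values())
# Share of each website as a percentage, rounded to three decimals.
shares = {site: round(100 * n / total, 3) for site, n in counts.items()}
```

The counts add up to the 32,796 MF URIs reported in total, and the rounded shares match the table.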

Indexing Results (2)
• 13,088 distinct videos are found
• 17,854 distinct MF URIs for sitemap
– Many Tweets share the same video, but different
fragments
– Many retweets
– Some videos are not available in the UK
• 17,479 URLs (97.9%) in the sitemap have been indexed by
Google
• Only 775 URLs are indexed as VideoObject even though
rich snippets are embedded in all snapshot pages

Demo
• Search “Chris Eppstein”
• As a result, the landing page opens and the video
starts playing from the time indicated in the Tweet
containing the keywords “Chris Eppstein”

Conclusions and Future Work

Conclusions
• Introduced the Media Fragment Indexing Framework
• Proposed using social media to acquire more
annotations for media fragments
• Surveyed MF URI implementations on major video sharing
platforms
• Twitter Media Fragment Indexer
– Monitors the Tweet stream and automatically creates media
fragment annotations
– Indexes media fragments in Google
– YouTube is the most important domain for sharing media
fragments on Twitter

Future Work
• How valid are tweets as media fragment
annotations?
– much noisy and unrelated text
– many re-tweets
• Experiment on a larger scale (billions of tweets and
continuous monitoring)
• Expand the methodology to other media fragment
annotations, such as timed-text
• Extract named entities from tweets and further link media
fragments to the Linked Data Cloud
