CatchLive: Real-time Summarization of Live Streams with Stream Content and Interaction Data
1. 2022.04.26
CatchLive: Real-time Summarization of Live Streams with Stream
Content and Interaction Data
Saelyne Yang et al.
CHI ’22
Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
2. 2
Introduction
• Viewers who join in the middle want to understand content
• It is difficult to identify the important parts of the video
High-level overview of previous content
Different levels of detail
Minimal interruption to the current stream
• CatchLive provides a real-time summary of ongoing live streams by
utilizing the stream content and user interaction data
Overview of the stream, segmented into multiple sections
A summary of highlights for each section
Visual changes, chat dynamics
3. 3
Related Work
Video Summarization and Highlight Generation Techniques
• Machine learning techniques to create highlights and summaries using
visual/audio elements, or metadata
• Finding points of importance and confusion from the user interaction
data
Live Stream Summarization Techniques
• Leveraging user chat messages
• Selecting salient messages through upvoting
Improving Live Streaming Interaction
• Multi-modal tools to improve live stream interaction
• Sharing snapshots for visual arts
• Deciding scenes by voting in audience participation games
• Suggesting hints
4. 4
Formative Study
• 10 viewers who watch live streams at least once a week
• Think aloud study with 4 participants
• Watching instructional and entertainment streams after joining in
the middle
Current Practices and Challenges of Catching up
• Reading chat messages
• Chat messages are less helpful in entertainment streams (emotional reactions)
• Asking others in the chat
• Rewinding the video
• Possible to miss the current parts
• Follow asynchronously in instructional streams
5. 5
Formative Study
Needs for Catching up in Live Streams
• Overview of the stream
• A sequential overview of the stream
• Step-by-step (instructional stream)
• Overall flow of the stream
• Covering subtopics (entertainment stream)
• Different levels of detail depending on the content and the current
context
• Seek detailed information
• Important concept or terminology
• Highlight moments that other viewers enjoyed
• Catching up with minimal interruption to the current stream
• Enjoy real-time stream, not miss out on key events
6. 6
CatchLive Interface
Timeline Information of the Stream
• High-level sections
• Timeline with a representative snapshot and several keywords
• Overall structure of the previous content
7. 7
CatchLive Interface
• Segmenting the live stream into high-level sections
• Providing a summary of highlight moments
• Snapshot and transcript
• Showing more details in a chat format
Top-2 highlight moments
Chat messages of the highlight moments
8. 8
CatchLive Interface
Annotating and Sharing Moments from the Stream
• Taking a snapshot and sharing it in the chat
• The most recent sentence from the transcript is captured along with the screenshot
• “Likes” provide meaningful signals for summarization
9. 9
Algorithms
Online Segmentation Algorithms
• Score each candidate to find the section boundary
• Set the minimum and maximum section lengths to 5 and 20 min
• Transcript regions
• Spans that the Google Speech-to-Text API identifies as a unit
• Candidate boundary region
• Interval between the end of one transcript region and the beginning of the next
• Score only the breaks between transcript regions
• The candidate with the highest score is the final boundary
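The selection rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` type, the `pick_boundary` function, and the example scores are hypothetical, and the combined score of each candidate is assumed to be computed elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_LEN_S = 5 * 60   # minimum section length from the slide (5 min)
MAX_LEN_S = 20 * 60  # maximum section length from the slide (20 min)


@dataclass
class Candidate:
    time_s: float  # position of a break between two transcript regions
    score: float   # combined boundary score (assumed to be computed elsewhere)


def pick_boundary(section_start_s: float,
                  candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring candidate that keeps the section
    between the minimum and maximum length, or None if there is none."""
    eligible = [
        c for c in candidates
        if MIN_LEN_S <= c.time_s - section_start_s <= MAX_LEN_S
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.score)


candidates = [
    Candidate(time_s=120, score=0.9),   # too early: section would be under 5 min
    Candidate(time_s=420, score=0.4),
    Candidate(time_s=900, score=0.7),
    Candidate(time_s=1500, score=0.8),  # too late: section would exceed 20 min
]
best = pick_boundary(0, candidates)
print(best)
```

Filtering by length before taking the maximum keeps a high-scoring but too-early break from cutting a section short.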
10. 10
Online Segmentation Algorithms
Visual difference of scenes
• Compute the Structural Similarity Index (SSIM) between two frames
• Higher scores for more visual differences between two adjacent
transcript regions (A, B)
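To make the SSIM-based factor concrete, here is a simplified sketch. It computes a single global SSIM over two grayscale frames given as flat lists of pixel values, whereas practical implementations (for example `structural_similarity` in scikit-image) average SSIM over local windows. Turning similarity into a difference score as `1 - SSIM` is an assumption; the slide only says that more visual difference gives a higher score.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                dynamic_range: float = 255.0) -> float:
    """Single-window SSIM over two equally sized grayscale frames."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("frames must be non-empty and the same size")
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def visual_difference(frame_a: Sequence[float],
                      frame_b: Sequence[float]) -> float:
    """Assumed scoring: the less similar the frames, the higher the score."""
    return 1.0 - global_ssim(frame_a, frame_b)


same = visual_difference([0, 0, 255, 255], [0, 0, 255, 255])
inverted = visual_difference([0, 0, 255, 255], [255, 255, 0, 0])
print(same, inverted)
```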
Keywords from the transcript and the chat
• Exclude stopwords
• Extract the 10 most frequent words
• Extract from adjacent transcript regions (P, Q)
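A minimal sketch of the keyword factor, under stated assumptions: the stopword list is a tiny illustrative one, and comparing the two keyword sets with `1 - Jaccard overlap` is an assumption, since the slide does not say how the keywords of regions P and Q are turned into a score.

```python
import re
from collections import Counter
from typing import Set

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "is", "in", "it", "we", "now", "this"}


def top_keywords(text: str, k: int = 10) -> Set[str]:
    """Return the k most frequent non-stopword tokens in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {w for w, _ in counts.most_common(k)}


def keyword_change_score(text_p: str, text_q: str, k: int = 10) -> float:
    """Assumed scoring: 1 - Jaccard overlap of the two keyword sets,
    so a larger topic change between P and Q gives a higher score."""
    kp, kq = top_keywords(text_p, k), top_keywords(text_q, k)
    if not kp and not kq:
        return 0.0
    return 1.0 - len(kp & kq) / len(kp | kq)


region_p = "chop the onion and fry the onion"
region_q = "now we plate the pasta"
print(top_keywords(region_p), keyword_change_score(region_p, region_q))
```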
11. 11
Online Segmentation Algorithms
Transitional cues
• Ending cues (“that’s all”, “done”, “therefore”) to be close to the end
of a sentence
• Starting cues (“start”, “next”) to be close to the start
• Use 8 cues
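The positional idea behind the cues can be sketched as below. Only the five cues quoted on the slide are used, because the full list of eight is not given here. The scoring (1.0 when a cue sits exactly at the wanted edge of the sentence, falling off linearly with distance) and the function names are assumptions for illustration.

```python
from typing import Sequence

# Only the cues quoted on the slide; the remaining cues are not listed there.
ENDING_CUES = ("that's all", "done", "therefore")
STARTING_CUES = ("start", "next")


def cue_position_score(sentence: str, cues: Sequence[str], want_end: bool) -> float:
    """Return a value in [0, 1]: highest when a cue sits at the wanted
    edge of the sentence. Plain substring matching, so 'start' also
    matches 'started'."""
    text = sentence.lower().strip()
    if not text:
        return 0.0
    best = 0.0
    for cue in cues:
        idx = text.rfind(cue) if want_end else text.find(cue)
        if idx < 0:
            continue
        # Characters between the cue and the wanted edge of the sentence.
        distance = len(text) - (idx + len(cue)) if want_end else idx
        best = max(best, 1.0 - distance / len(text))
    return best


def transition_score(before: str, after: str) -> float:
    """Assumed combination: ending-cue score of the sentence before the
    break plus starting-cue score of the sentence after it."""
    return (cue_position_score(before, ENDING_CUES, want_end=True)
            + cue_position_score(after, STARTING_CUES, want_end=False))


print(transition_score("and we are done", "next we add salt"))
```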
12. 12
Online Segmentation Algorithms
Chat frequency
• Higher score if there are few chat messages
• s: starting time, e: ending time
Duration of a break
• Long break might indicate a shift to the next sentence
• di : duration of a candidate boundary region i
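The formulas for these two factors are not reproduced in the text above, so the following is only a sketch of the stated behaviour: fewer chat messages between s and e give a higher score, and longer breaks give a higher score. The specific forms (`1 / (1 + messages per minute)` and normalising each d_i by the longest break) are assumptions.

```python
from typing import List


def chat_quietness(chat_times_s: List[float], s: float, e: float) -> float:
    """Higher when fewer chat messages fall between s and e.
    Assumed form: 1 / (1 + messages per minute)."""
    if e <= s:
        raise ValueError("ending time must be after starting time")
    count = sum(1 for t in chat_times_s if s <= t <= e)
    per_minute = count * 60 / (e - s)
    return 1.0 / (1.0 + per_minute)


def break_scores(durations_s: List[float]) -> List[float]:
    """Score each candidate boundary region i by its duration d_i,
    normalised by the longest break (assumed normalisation)."""
    longest = max(durations_s, default=0.0)
    if longest <= 0:
        return [0.0 for _ in durations_s]
    return [d / longest for d in durations_s]


print(chat_quietness([10, 20, 30], s=0, e=60))
print(break_scores([1.0, 4.0, 2.0]))
```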
13. 13
Highlights Detection Algorithms
• Divide a section into one-minute intervals
• Higher scores for more chat messages
• A snapshot message counts 3x a plain chat message
• Number of likes gives weights to the message
• Number of viewers may fluctuate
• Divide the overall score by the log of the number of viewers
• a: weight of the message
• N: number of likes on the message
• M: number of viewers
• Use a peak detection algorithm to find highlight moments
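The scoring steps above can be sketched as follows. The exact formula is not reproduced in the text, so the way likes enter the score (`a * (1 + N)`), the guard that treats fewer than 2 viewers as 2 to avoid dividing by log(1) = 0, and the simple local-maximum peak detector are all assumptions. `Message`, `interval_scores`, and `find_peaks` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    time_s: float
    is_snapshot: bool = False
    likes: int = 0  # N on the slide


def interval_scores(messages: List[Message], start_s: float, end_s: float,
                    viewers_per_interval: List[int]) -> List[float]:
    """Score each one-minute interval of a section."""
    n_bins = math.ceil((end_s - start_s) / 60)
    if len(viewers_per_interval) != n_bins:
        raise ValueError("need one viewer count per one-minute interval")
    raw = [0.0] * n_bins
    for m in messages:
        if start_s <= m.time_s < end_s:
            a = 3.0 if m.is_snapshot else 1.0  # snapshot = 3x plain chat
            raw[int((m.time_s - start_s) // 60)] += a * (1 + m.likes)
    # Normalise by log(M); max(v, 2) is a guard against log(1) == 0.
    return [r / math.log(max(v, 2)) for r, v in zip(raw, viewers_per_interval)]


def find_peaks(scores: List[float]) -> List[int]:
    """Indices of positive scores strictly greater than both neighbours."""
    peaks = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s > 0 and s > left and s > right:
            peaks.append(i)
    return peaks


messages = [
    Message(time_s=10),
    Message(time_s=70, is_snapshot=True, likes=2),
    Message(time_s=80),
    Message(time_s=130),
]
scores = interval_scores(messages, 0, 180, [100, 100, 100])
peaks = find_peaks(scores)
print(scores, peaks)
```

Dividing by the log of the viewer count dampens the effect of audience size, so a burst of chat in a small audience is not drowned out by routine chatter in a large one.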
14. 14
Preliminary Evaluation
• Use seven publicly available live streams
• Ground-truth segments posted by a streamer and a viewer
• Threshold
• Low-threshold: the smaller of 3 min and 3% of the stream length
• High-threshold: the smaller of 5 min and 5% of the stream length
• Minimum and maximum lengths of a segment
• Standard: 5 min and 20 min
• Minmax: min. and max. length of the ground truth
• Weight of the five factors
• Base-coeff: uniform weight
• Optimal-coeff: optimal weight with the highest accuracy
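As an illustration of the two thresholds, the sketch below computes the tolerance for a stream and counts a ground-truth boundary as matched when some predicted boundary falls within that tolerance. The matching rule and the accuracy definition (matched ground-truth boundaries divided by all ground-truth boundaries) are assumptions, and the boundary times are made-up example values.

```python
from typing import List


def tolerance_s(stream_len_s: float, minutes: float, fraction: float) -> float:
    """Threshold in seconds: the smaller of a fixed duration and a
    fraction of the stream length."""
    return min(minutes * 60, fraction * stream_len_s)


def boundary_accuracy(predicted: List[float], truth: List[float],
                      tol_s: float) -> float:
    """Assumed metric: share of ground-truth boundaries that have a
    predicted boundary within tol_s seconds."""
    if not truth:
        return 0.0
    hits = sum(1 for t in truth if any(abs(p - t) <= tol_s for p in predicted))
    return hits / len(truth)


stream_len_s = 3600  # a one-hour stream
low = tolerance_s(stream_len_s, minutes=3, fraction=0.03)   # about 108 s
high = tolerance_s(stream_len_s, minutes=5, fraction=0.05)  # about 180 s

truth = [600, 1800, 3000]      # made-up ground-truth boundaries (seconds)
predicted = [650, 1950, 2990]  # made-up predicted boundaries (seconds)
print(boundary_accuracy(predicted, truth, low),
      boundary_accuracy(predicted, truth, high))
```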
15. 15
Preliminary Evaluation
Result
• Optimal-coeff
• Standard
• Low-threshold: 67.6%
• High-threshold: 75.9%
• Minmax
• Low-threshold: 78.6%
• High-threshold: 87.2%
• Base-coeff → optimal-coeff: accuracy increases by 25 percentage points on average
16. 16
User Evaluation 1
CatchLive for different genres of streams
Methodology
• Stream Information
• Stock: unstructured, audial
• Cooking: procedural, slow
• Game: visual, fast
• Major attributes
• Content type
• Unstructured/ Procedural
• Content format
• Visual/ Audial
• Stream pace
• Slow/Fast
• Online survey after watching videos
17. 17
User Evaluation 1
CatchLive for different genres of streams
Findings
How participants use CatchLive’s summary features
• Timeline helps viewers grasp the overview of the stream
• Which parts to watch
• Highlights help viewers
• Identify important moments
• Understand more about the stream
• Fill the void in live streams
• Timeline and highlights allow viewers to catch up with less
interruption compared to rewinding
18. 18
User Evaluation 1
CatchLive for different genres of streams
Findings
How catching up behavior differs across the stream characteristics
• Participants use the timeline in different ways
• Understand topics covered in the previous parts (stock)
• Cooking stream contains procedural knowledge
• Highlights were most useful in streams with visual content
• Game (3.88/5), Stock (2.56/5)
• Highlights allow users to quickly grasp what happened
• Summary features were least distracting in streams with a slow pace
• Cooking (3.87/5 for not distracting): less verbal information
• Game (3.25/5): participants felt the most interruption
• Percentage of voice: Stock 92.1%, Cooking 61.7%, Game 75.3%
19. 19
User Evaluation 2
Comparative Study Using the Game Stream
H1) Viewers with CatchLive will have a better understanding of the stream than viewers without it
H2) Viewers with CatchLive will be more engaged with the stream than viewers without it
Methodology
• Stream Information
• Gaming stream from the first user study
• Participants
• 17 for baseline group
• 16 for CatchLive group from the first user study
• Procedure
• Between-subjects
• Online survey after watching stream for about 25 min
• Understanding and engagement related questions
• Factual recall questions to measure the accuracy of understanding
20. 20
User Evaluation 2
Comparative Study Using the Game Stream
Findings
H1: better understanding
• Average understanding score was higher in CatchLive group but no significant difference
H2: more engaged
• CatchLive group was more active in the stream than the baseline group
(Q1) List the animals that the streamer met
(Q2) List the tools and what the streamer did with the tool
(Q3) List the keywords that were covered in the stream other than animals and tools