2022.04.26
CatchLive: Real-time Summarization of Live Streams with Stream
Content and Interaction Data
Saelyne Yang et al.
CHI ’22
Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
2
Introduction
• Viewers who join in the middle want to understand content
• It is difficult to know which parts of the video are important
High-level overview
of previous content
Different levels of
details
Minimal interruption
to the current stream
• CatchLive provides a real-time summary of ongoing live streams by
utilizing the stream content and user interaction data
Overview of the stream that
segments into multiple sections
A summary of highlights for each
section
visual changes chat dynamics
3
Related Work
Video Summarization and Highlight Generation Techniques
• Machine learning techniques to create highlights and summaries using
visual/audio elements, or metadata
• Finding points of importance and confusion from the user interaction
data
Live Stream Summarization Techniques
• Leveraging user chat messages
• Selecting salient messages through upvoting
Improving Live Streaming Interaction
• Multi-modal tools to improve live stream interaction
• Sharing snapshots for visual arts
• Deciding scenes by voting in audience participation games
• Suggesting hints
4
Formative Study
• 10 viewers who watch live streams at least once a week
• Think aloud study with 4 participants
• Watching instructional and entertainment streams after joining in
the middle
Current Practices and Challenges of Catching up
• Reading chat messages
• Chat messages are less helpful in entertainment streams →
emotional reactions
• Asking others in the chat
• Rewinding the video
• Possible to miss the current parts
• Follow asynchronously in instructional streams
5
Formative Study
Needs for Catching up in Live Streams
• Overview of the stream
• A sequential overview of the stream
• Step-by-step (instructional stream)
• Overall flow of the stream
• Covering subtopics (entertainment stream)
• Different levels of detail depending on the content and the current
context
• Seek detailed information
• Important concept or terminology
• Highlight moments that other viewers enjoyed
• Catching up with minimal interruption to the current stream
• Enjoy real-time stream, not miss out on key events
6
CatchLive Interface
Timeline Information of the Stream
• High-level sections
• Timeline with a representative snapshot and several keywords
• Overall structure of the previous content
7
CatchLive Interface
• Segmenting the live stream into high-level sections
• Providing a summary of highlight moments
• Snapshot and transcript
• Showing more details in a chat format
Top-2 highlight
moments
Chat messages
of the
highlight
moments
8
CatchLive Interface
Annotating and Sharing
Moments from the Stream
• Taking a snapshot and
sharing in the chat
• The most recent sentence
from the transcript is
captured along with the
screenshot
• “Likes” providing
meaningful signals for
summarization
9
Algorithms
Online Segmentation Algorithms
• Score to find section boundary from candidates
• Set the minimum and maximum section length to 5 and 20 min
• Transcript regions
• A span that the Google Speech-to-Text API identifies as a unit
• Candidate boundary region
• Interval between the end of one transcript region and the beginning of the next
• Score only the breaks between transcript regions
• Highest score is the final boundary
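The selection step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` structure, the factor names, and the weighted-sum scoring are assumptions; only the 5 and 20 minute limits and the "highest score wins" rule come from the slide.

```python
from dataclasses import dataclass

MIN_LEN_S = 5 * 60   # minimum section length (from the slide)
MAX_LEN_S = 20 * 60  # maximum section length (from the slide)


@dataclass
class Candidate:
    """A break between two transcript regions (hypothetical structure)."""
    start: float   # end time of the preceding transcript region, in seconds
    end: float     # start time of the following transcript region, in seconds
    factors: dict  # factor name -> score; names are illustrative


def pick_boundary(section_start, candidates, weights):
    """Return the highest-scoring candidate inside the allowed window, or None."""
    best, best_score = None, float("-inf")
    for c in candidates:
        length = c.start - section_start
        if not (MIN_LEN_S <= length <= MAX_LEN_S):
            continue  # the section would be too short or too long
        # Assumed combination rule: weighted sum of the factor scores.
        score = sum(weights[name] * value for name, value in c.factors.items())
        if score > best_score:
            best, best_score = c, score
    return best
```

Because only breaks between transcript regions are scored, the number of candidates per section stays small, which suits an online setting.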
10
Online Segmentation Algorithms
Visual difference of scenes
• Compute the Structural Similarity Index (SSIM) between two frames
• Higher scores for more visual differences between two adjacent
transcript regions (A, B)
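As a rough illustration of the visual factor, the sketch below computes a single-window (global) SSIM over two grayscale frames given as flat pixel lists, then maps it to a difference score. The `1 - SSIM` mapping and the choice of which frames represent regions A and B are assumptions; the slide only states that SSIM is computed between two frames. Production code would more likely use a windowed SSIM from an image library.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM over two equal-length grayscale pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and the same size")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def visual_difference(frame_a, frame_b):
    """Higher when the frames differ more; 1 - SSIM is an assumed mapping."""
    return 1.0 - global_ssim(frame_a, frame_b)
```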
Keywords from the transcript and the chat
• Exclude stopwords
• Take the 10 most frequent words as keywords
• Extract from adjacent transcript regions (P, Q)
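A minimal sketch of the keyword factor, assuming the bullets mean "keep the 10 most frequent non-stopword tokens of each region". The stopword list, the tokenizer, and the use of `1 - Jaccard overlap` as the score are all illustrative assumptions; the slide does not give the formula.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "it",
             "we", "i", "you", "this", "that", "now"}


def top_keywords(text, k=10):
    """Return the k most frequent non-stopword tokens in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return {word for word, _ in counts.most_common(k)}


def keyword_difference(text_p, text_q, k=10):
    """1 - Jaccard overlap of the keyword sets of regions P and Q (assumed)."""
    p, q = top_keywords(text_p, k), top_keywords(text_q, k)
    if not p and not q:
        return 0.0
    return 1.0 - len(p & q) / len(p | q)
```

Transcript and chat text for a region can simply be concatenated before calling `top_keywords`.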
11
Online Segmentation Algorithms
Transitional cues
• Ending cues (“that’s all”, “done”, “therefore”) are expected close to the end
of a sentence
• Starting cues (“start”, “next”) are expected close to the start
• Use 8 cues
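One way the cue factor might be scored is sketched below. Only the five cues listed on the slide are used (the remaining three are not given here), and the closeness measure, a linear falloff with word distance from the relevant edge of the sentence, is an assumption made for illustration.

```python
import re

ENDING_CUES = ("that's all", "done", "therefore")  # listed on the slide
STARTING_CUES = ("start", "next")                   # listed on the slide


def _words(sentence):
    # Normalize curly apostrophes so "that’s" matches "that's".
    return re.findall(r"[a-z']+", sentence.lower().replace("’", "'"))


def _closeness(words, cues, from_end):
    """Best closeness in [0, 1] of any cue to the chosen edge of the sentence."""
    best, n = 0.0, len(words)
    for cue in cues:
        cue_words = cue.split()
        m = len(cue_words)
        for i in range(n - m + 1):
            if words[i:i + m] == cue_words:
                dist = (n - (i + m)) if from_end else i  # words to the edge
                best = max(best, 1.0 - dist / n)
    return best


def cue_score(prev_sentence, next_sentence):
    """Average of the ending-cue and starting-cue closeness (assumed rule)."""
    ending = _closeness(_words(prev_sentence), ENDING_CUES, from_end=True)
    starting = _closeness(_words(next_sentence), STARTING_CUES, from_end=False)
    return (ending + starting) / 2
```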
12
Online Segmentation Algorithms
Chat frequency
• Higher score if there are few chat messages
• s: starting time, e: ending time
Duration of a break
• A long break might indicate a shift to the next sentence
• d_i: duration of candidate boundary region i
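The formulas for these two factors are not reproduced in the slide text, so the sketch below only illustrates their direction: fewer chat messages between s and e give a higher score, and a longer break d_i gives a higher score. The `1 / (1 + count)` form and the normalization by the longest break are assumptions.

```python
def chat_quietness(chat_times, s, e):
    """Higher when fewer chat messages fall in [s, e] (assumed form)."""
    count = sum(1 for t in chat_times if s <= t <= e)
    return 1.0 / (1.0 + count)


def break_scores(durations):
    """Scale each break duration d_i by the longest break (assumed)."""
    longest = max(durations, default=0.0)
    if longest <= 0:
        return [0.0 for _ in durations]
    return [d / longest for d in durations]
```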
13
Highlights Detection Algorithms
• Divide a section into one-minute intervals
• Higher scores for more chat messages
• A snapshot counts 3× as much as a plain chat message
• The number of likes adds weight to the message
• Number of viewers may fluctuate
• Divide the overall score by the log of the number of viewers
• a: weight of the message
• N: number of likes on the message
• M: number of viewers
• Use a peak detection algorithm to find highlight moments
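The scoring described above might look like the sketch below. The exact formula is not in the slide text, so the combination `a * (1 + N)` summed over messages and divided by `log(M)` is an assumption, as is the guard for very small audiences. The peak finder is a simple local-maximum stand-in, since the slide does not say which peak detection algorithm is used.

```python
import math

SNAPSHOT_WEIGHT = 3.0  # a snapshot counts 3x a plain chat message (slide)
PLAIN_WEIGHT = 1.0


def interval_score(messages, viewers):
    """Score one one-minute interval.

    messages: list of (is_snapshot, likes) pairs, i.e. (a, N) per message.
    viewers:  M, the number of viewers during the interval.
    """
    raw = sum(
        (SNAPSHOT_WEIGHT if is_snapshot else PLAIN_WEIGHT) * (1 + likes)
        for is_snapshot, likes in messages
    )
    # Assumed guard: treat M < 2 as 2 so the divisor is never zero.
    return raw / math.log(max(viewers, 2))


def find_peaks(scores):
    """Indices whose score is strictly greater than both neighbours."""
    return [
        i for i in range(1, len(scores) - 1)
        if scores[i - 1] < scores[i] > scores[i + 1]
    ]
```

Dividing by the log rather than the raw viewer count dampens the normalization, so a busy minute in a large stream is not penalized too heavily.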
14
Preliminary Evaluation
• Use seven publicly available live streams
• Ground truth segment posted by a streamer and a viewer
• Threshold
• Low-threshold: minimum between 3 min and 3% of the length
• High-threshold: minimum between 5 min and 5% of the length
• Minimum and maximum lengths of a segment
• Standard: 5 min and 20 min
• Minmax: min. and max. length of the ground truth
• Weight of the five factors
• Base-coeff: uniform weight
• Optimal-coeff: optimal weight with the highest accuracy
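The error thresholds can be made concrete with a short sketch. The threshold rule (the smaller of a fixed number of minutes and a percentage of the stream length) follows the slide; the accuracy definition, the share of ground-truth boundaries that have a predicted boundary within the threshold, is an assumption, since the slide does not spell it out.

```python
def tolerance_s(stream_len_s, minutes, fraction):
    """Error threshold in seconds: the smaller of the two limits."""
    return min(minutes * 60.0, fraction * stream_len_s)


def boundary_accuracy(predicted, truth, tol_s):
    """Share of ground-truth boundaries matched within tol_s (assumed metric)."""
    if not truth:
        return 0.0
    hits = sum(1 for t in truth if any(abs(p - t) <= tol_s for p in predicted))
    return hits / len(truth)


# Example: a one-hour stream.
low = tolerance_s(3600, minutes=3, fraction=0.03)   # about 108 s
high = tolerance_s(3600, minutes=5, fraction=0.05)  # about 180 s
```

For streams shorter than 100 minutes the percentage term is the binding limit, so the tolerance shrinks with the stream.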
15
Preliminary Evaluation
Result
• Optimal-coeff
• Standard
• Low-threshold
• 67.6%
• High-threshold
• 75.9%
• Minmax
• Low-threshold
• 78.6%
• High-threshold
• 87.2%
• Base-coeff → optimal-coeff: accuracy increases by 25 percentage points on average
16
User Evaluation 1
CatchLive for different genres of streams
Methodology
• Stream Information
• Stock: unstructured, audial
• Cooking: procedural, slow
• Game: visual, fast
• Major attributes
• Content type
• Unstructured/ Procedural
• Content format
• Visual/ Audial
• Stream pace
• Slow/Fast
• Online survey after watching videos
17
User Evaluation 1
CatchLive for different genres of streams
Findings
How participants use CatchLive’s summary features
• Timeline helps viewers grasp the overview of the stream
• Which parts to watch
• Highlights help viewers
• Identify important moments
• Understand more about the stream
• Fill the void in live streams
• Timeline and highlights allow viewers to catch up with less
interruption compared to rewinding
18
User Evaluation 1
CatchLive for different genres of streams
Findings
How catching up behavior differs across the stream characteristics
• Participants use the timeline in different ways
• Understand topics covered in the previous parts (stock)
• Cooking stream contains procedural knowledge
• Highlights were most useful in streams with visual content
• Game (3.88/5), Stock (2.56/5)
• Highlights allow users to quickly grasp what happened
• Summary features were least distracting in streams with slow pace
• Cooking (3.87/5 for not distracting) → less verbal info
• Game (3.25/5): participants felt the most interruption
• Percentage of voice: Stock 92.1%, Cooking 61.7%, Game 75.3%
19
User Evaluation 2
Comparative Study Using the Game Stream
H1) Viewers with CatchLive will have a better understanding of the stream than viewers without
it
H2) Viewers with CatchLive will be more engaged with the stream than viewers without it
Methodology
• Stream Information
• Gaming stream from the first user study
• Participants
• 17 for baseline group
• 16 for CatchLive group from the first user study
• Procedure
• Between-subject
• Online survey after watching stream for about 25 min
• Understanding and engagement related questions
• Factual recall questions to measure the accuracy of understanding
20
User Evaluation 2
Comparative Study Using the Game Stream
Findings
H1: better understanding
• Average understanding score was higher
in CatchLive group but no significant
difference
H2: more engaged
• CatchLive group was more active in the
stream than the baseline group
(Q1) List the animals that the streamer met
(Q2) List the tools and what the streamer
did with the tool
(Q3) List the keywords that were covered in
the stream other than animals and tools
21
Conclusion
Thank you

Editor's notes

  1. Error threshold
  2. Error threshold