Take-away TV: Recharging Work Commutes with Greedy and Predictive Preloading of TV Content

  1. Mining TV-on-demand Services. EPSRC project. Dmytro Karamshuk
  2. Large-scale study of BBC iPlayer (May 2013 – Jan 2014). Users: 32M/month (≈ 50% of the UK population of 64M). IP addresses: 20M/month. Sessions: 1.9 billion. 2 x INFOCOM’2015, ToN’2015, JSAC’2016
  3. Longitudinal View across ISPs. Mobile market is more dynamic than the fixed-line Internet market. [Figure panels: Fixed-line Internet market (5 representative providers); Mobile Internet market (5 representative providers)]
  4. Data caps decrease market share. All-you-can-eat data (M1, M5) vs. limited-cap data packages (M2 – M4). All-you-can-eat plans boost user consumption.
  5. Temporal Patterns in different ISPs. Fixed-line accesses (F1-F5) peak in the evening hours. Mobile users watch more during commutes. [Figure panels: fixed-line ISPs; mobile, limited data caps]
  6. There is a problem… Internet on trains in the UK is no good. A study shows that 23.2% of 3G packets and 37.2% of 4G packets failed on the major train routes.
  7. A useful insight: users watch across networks. Users complete watching across different sessions and networks. [Figure: per-user completion ratio, fixed-line ISPs vs. mobile ISPs]
  8. Speculative Content Pre-fetching: pre-fetch at home, watch during commutes.
  9. Speculative Content Pre-fetching: not very efficient… [Figure: per-user mobile savings with pre-fetching]
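
Slides 8 and 9 describe speculative pre-fetching only at a high level. The following is a minimal sketch of how per-user mobile savings might be computed. It assumes, which the slides do not state, that every item a user opens on a fixed-line network is fully pre-fetched at home, and that savings are the share of later mobile traffic spent on those items. The function name `mobile_savings` and the `(network, item, megabytes)` session layout are hypothetical.

```python
def mobile_savings(sessions):
    """Share of one user's mobile traffic that a home pre-fetch would have covered.

    sessions: chronological list of (network, item, megabytes) tuples,
              where network is "fixed" or "mobile" (hypothetical layout).
    """
    prefetched = set()      # items assumed to be downloaded at home
    mobile_total = 0.0      # all megabytes streamed over mobile
    mobile_saved = 0.0      # mobile megabytes for items already pre-fetched
    for network, item, megabytes in sessions:
        if network == "fixed":
            # Assumption: opening an item at home triggers a full pre-fetch.
            prefetched.add(item)
        elif network == "mobile":
            mobile_total += megabytes
            if item in prefetched:
                mobile_saved += megabytes
    return mobile_saved / mobile_total if mobile_total else 0.0


# Toy example: item "a" was started at home, item "b" was not.
example_sessions = [
    ("fixed", "a", 100),
    ("mobile", "a", 50),
    ("mobile", "b", 150),
]
print(mobile_savings(example_sessions))
```

Under this rule only content already started at home can be saved, which is one plausible reading of why slide 9 calls the approach "not very efficient".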
  10. Can we do better with predictive preloading?
  11. Towards Predicting User Preferences: Featured Content, Most Popular Content
  12. How important is UI guidance? For 20% of users, more than 60% of their accesses are from the Front Page.
  13. Content Types: 11 channels, 11 categories, 172 genres, and thousands of shows
  14. User Focus on Different Content Types. [Figure: share of users with all their sessions from 1, 2, 3 or more channels (out of 11), categories (out of 11), genres (out of 171) and shows (out of thousands)]
  15. Engineering Features (feature importance)
      User Preferences (total importance: 0.555): content category 0.038; content genre 0.063; category affinity 0.042; genre affinity 0.103; show affinity 0.179; channel affinity 0.043; content age 0.087
      UI Guidance (total importance: 0.292): featured content 0.061; featured position 0.061; content popularity rank 0.071; popularity position 0.008; featured probability 0.091
      Repeatedly Watched Content (total importance: 0.154): previously watched 0.066; completion ratio 0.081; probability of re-watching 0.007
  16. Supervised Learning Problem: for a given user U and an episode E, predict whether U will watch E. Binary classification problem: f(U,E) -> {0,1}. Random Forest: fast, good performance on high-dimensional data. Negative examples: randomly sampled from what users did not watch. Predictions: predict the probability, then rank all episodes by probability.
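
Slide 16 names three steps: label watched pairs as positives, sample negatives from what users did not watch, and rank episodes by predicted probability. A minimal sketch of the data preparation and ranking steps follows. It uses only the Python standard library, and the classifier is replaced by a stand-in callable `predict_proba(user, episode)`. In the talk this role is played by a Random Forest, which is not implemented here. All function names, the data layout, and the one-negative-per-positive ratio are assumptions made for illustration.

```python
import random


def build_training_pairs(watched, catalogue, negatives_per_positive=1, seed=0):
    """Return (user, episode, label) triples for binary classification.

    watched:   dict mapping user -> set of episodes the user watched (label 1)
    catalogue: iterable of all episodes available
    Negatives (label 0) are sampled at random from episodes the user did not watch.
    """
    rng = random.Random(seed)
    pairs = []
    for user, episodes in watched.items():
        unwatched = sorted(set(catalogue) - set(episodes))
        for episode in sorted(episodes):
            pairs.append((user, episode, 1))
        # Never ask for more negatives than there are unwatched episodes.
        n_neg = min(len(unwatched), negatives_per_positive * len(episodes))
        for episode in rng.sample(unwatched, n_neg):
            pairs.append((user, episode, 0))
    return pairs


def rank_episodes(user, candidates, predict_proba, k=10):
    """Return the k candidates with the highest predicted watch probability."""
    ranked = sorted(candidates, key=lambda e: predict_proba(user, e), reverse=True)
    return ranked[:k]


# Toy usage with a hand-written scorer standing in for a trained model.
watched = {"u1": {"e1", "e2"}, "u2": {"e3"}}
catalogue = ["e1", "e2", "e3", "e4", "e5"]
pairs = build_training_pairs(watched, catalogue)

toy_scores = {"e1": 0.9, "e2": 0.2, "e3": 0.7, "e4": 0.1, "e5": 0.4}
top3 = rank_episodes("u1", catalogue, lambda user, e: toy_scores[e], k=3)
print(len(pairs), top3)
```

Sampling negatives per user keeps the classes balanced, since the set of unwatched episodes is far larger than the set of watched ones.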
  17. Accuracy of Personalized Predictions. For 50% of users, there is an over 70% chance that what they watch falls within the Top-10 predictions.
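
The Top-10 accuracy figure on slide 17 can be read as a per-user hit rate. The sketch below is one possible way to compute it, assuming that "hit rate" means the fraction of a user's watched episodes that appear among the top-k ranked predictions. That definition, and the names `hit_rate_at_k` and `share_of_users_above`, are assumptions and are not taken from the slides.

```python
def hit_rate_at_k(ranked, watched, k=10):
    """Fraction of watched episodes that appear in the first k ranked predictions."""
    if not watched:
        return 0.0
    top = set(ranked[:k])
    return sum(1 for episode in watched if episode in top) / len(watched)


def share_of_users_above(hit_rates, threshold):
    """Share of users whose hit rate is strictly greater than the threshold."""
    if not hit_rates:
        return 0.0
    return sum(1 for rate in hit_rates.values() if rate > threshold) / len(hit_rates)


# Toy usage: two users with hypothetical rankings and watch histories.
rankings = {"u1": ["a", "b", "c", "d"], "u2": ["d", "c", "b", "a"]}
watched = {"u1": {"a", "b"}, "u2": {"a", "d"}}
rates = {u: hit_rate_at_k(rankings[u], watched[u], k=2) for u in rankings}
print(rates, share_of_users_above(rates, 0.7))
```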
  18. When do we do predictions? Front Pages are updated overnight…
  19. When do we do predictions? … and remain largely unchanged for 24h
  20. How much traffic can be saved? Predictive pre-fetching can potentially save nearly 71% of mobile usage.
  21. We made mobile users happy! How about the rest?
  22. Access Patterns. [Figure: average per-user number of sessions; correlation with Internet speed]
  23. Content Delivery for Home Broadband. Problem: how to handle peak load from 32M users? Install more distributed caches: may require significant investments. Any alternatives?
  24. Alternative: Peer-assisted Content Delivery. Ask users for assistance. An average of 5K users are online every second in the first day after release: 5K duplicates every second! [Diagram: content servers and users]
  25. Elegant Theoretical Model for very Complex Behavior. Around 88% of savings can be achieved. [Figure: Data Analysis vs. Theoretical Model; formula involving G, c and e]
  26. Why does it work? The top 5% of the content corpus accounts for 80% of traffic. Most accesses happen in the first day after release. Yes, it’s all about very popular content.
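
The concentration claim on slide 26 (a small share of items carrying most of the traffic) is straightforward to check against an access log. A minimal sketch, assuming a hypothetical mapping from item to traffic volume; the function name `top_share` and the toy numbers are illustrative and are not BBC iPlayer data.

```python
import math


def top_share(traffic_per_item, top_fraction=0.05):
    """Share of total traffic carried by the most popular top_fraction of items."""
    volumes = sorted(traffic_per_item.values(), reverse=True)
    total = sum(volumes)
    if total == 0:
        return 0.0
    # Always include at least one item, rounding the cut-off up.
    n_top = max(1, math.ceil(len(volumes) * top_fraction))
    return sum(volumes[:n_top]) / total


# Toy catalogue: one very popular item and three niche items.
toy_traffic = {"a": 70, "b": 10, "c": 10, "d": 10}
print(top_share(toy_traffic, top_fraction=0.5))
```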
  27. Dmytro Karamshuk, King’s College London. “True genius resides in the capacity for evaluation of uncertain, hazardous, and conflicting information” - Winston Churchill