Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16

Data Science in the Newsroom

  1. Data Science in the Newsroom
     Geetu Ambwani, Principal Data Scientist
     geetu.ambwani@huffingtonpost.com
  2. What is the Huffington Post?
     ● Founded: May 2005
     ● Ranking among digital-only news websites: 1
     ● Cross-platform monthly unique visitors: over 187 million
     ● Articles per day: over 500
     ● International editions: 15
     ● Bloggers: over 100,000
  3. News Industry - Trends
     HuffPost has consistently been an innovator in the digital publishing space.
     ● Massive blogging network: more than 100K bloggers across the globe
  4. News Industry - Trends
     HuffPost has consistently been an innovator in the digital publishing space.
     ● Google Site Rank
  5. News Industry - Trends
     HuffPost has consistently been an innovator in the digital publishing space.
     ● Biggest social publisher
  6. News Industry - Challenges
  7. How Can Data Help?
  8. Ad campaigns, International editions, Social media promotion, Editors, User experience, Blog moderators, Reporters, HuffPost Studio
  9. Content Lifecycle: Creation, Distribution, Consumption
  10. Content Creation: How Can Data Help?
      ● Tools to help surface and discover trends in different parts of the web
      ● Content enhancement with multimedia based on semantic matching (images, slideshows, videos)
      ● Optimizing headlines/images (RobinHood Platform)
  11. Content Gap: Production Versus Consumption
  12. Content Consumption: How Can Data Help?
      Know Your Audience
      ● User cohorts:
        ○ Social traffic versus front-page clickers consume different content
        ○ Desktop vs. mobile consumption
      ● Recommendations/personalization
      ● Can we use data to inform product design and interface?
        ○ Rearrange share buttons based on traffic origin (Facebook vs. Pinterest)
  13. Content Lifecycle: Creation, Distribution, Consumption
  14. Content Distribution: Can Data Help?
      ● People's attention is increasingly concentrated on social streams
        ○ Publishers get more traffic from social than from any other source
      ● Are distributed platforms the new home page?
        ○ Facebook Instant, Apple News, Snapchat Discover, Google AMP
        ○ Messenger bots
      ● You need to be where your audience is:
        ○ Identify the content mix that is maximally engaging on an external platform
        ○ Can we use data to seed these distribution networks? (Facebook HuffPost Pages, Snapchat Discover)
  15. Content Distribution: Can Data Help?
      ● HuffPost produces 1000 articles a day. Which of these do we promote?
      ● Article page views (PVs) follow a very skewed distribution of success
        ○ Only 1% of our articles get more than 100k PVs
      ● Content performs differently on different networks.
      ● Can we predict in advance which articles will get traction, so that we can:
          ■ Optimally seed multiple distribution channels (Facebook HP Pages, Snapchat Discover)
          ■ Target premium/high-value ads to maximize revenue
          ■ Populate recommendation widgets
  16. Content Distribution: Can Data Help?
      Challenges
      ● The histogram of traffic distribution is highly skewed.
      ● The very act of promoting something causes a bump in traffic.
      ● Data normalization: how long do we want to wait before predicting?
      ● Very imbalanced data set.
      Our Approach
      ● Random Forest classifier.
      ● Multiple success criteria.
      ● Historical examples of (+) and (-) articles, with downsampling.
      ● Different normalization thresholds.
      ● Feature engineering: traffic growth ratios; initial organic social traffic per minute; distinct referrers.
  17. Slackbot for the social promotion team
      ● 20% lift in PVs per predicted article
  18. ● 20% lift in PVs per predicted article
  19. Conclusion
      A data-driven newsroom today means:
      ● More than just keeping track of clicks and shares
      ● Using predictive analytics to drive product and content placement
      Machine learning will be a key driver for success with the advent of distributed content.
  20. Thanks! MachineLearning@HuffPost
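
The data preparation outlined on slides 15 and 16 (labeling historical articles against a page-view threshold, engineering early-traffic features, and downsampling the majority class) might look roughly like the sketch below. It is an illustration under stated assumptions, not the HuffPost implementation: the `EarlyTraffic` record, its field names, the helper functions, and all numbers in the example data are hypothetical. Only the 100k PV cutoff and the three feature families come from the slides.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

SUCCESS_THRESHOLD = 100_000  # the "more than 100k PVs" cutoff quoted on slide 15


@dataclass
class EarlyTraffic:
    """Hypothetical early-traffic snapshot for one article (field names are assumptions)."""
    minutes_observed: int        # length of the observation window, in minutes
    views_first_half: int        # page views in the first half of the window
    views_second_half: int       # page views in the second half of the window
    organic_social_views: int    # social views received before any promotion
    referrers: Sequence[str]     # referrer domain of each sampled view
    final_page_views: int        # eventual total, known only for historical articles


def label(t: EarlyTraffic, threshold: int = SUCCESS_THRESHOLD) -> int:
    """Return 1 for a historical article that ended above the threshold, else 0."""
    return int(t.final_page_views > threshold)


def build_features(t: EarlyTraffic) -> Dict[str, float]:
    """Compute one feature per family named on slide 16."""
    # max(..., 1) guards against division by zero for articles with no early traffic.
    return {
        "growth_ratio": t.views_second_half / max(t.views_first_half, 1),
        "organic_social_per_minute": t.organic_social_views / max(t.minutes_observed, 1),
        "distinct_referrers": float(len(set(t.referrers))),
    }


def downsample(
    rows: List[Dict[str, float]], labels: List[int], ratio: float = 1.0, seed: int = 0
) -> Tuple[List[Dict[str, float]], List[int]]:
    """Keep every positive and a random sample of `ratio` negatives per positive."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(negatives), int(ratio * len(positives)))
    kept = positives + rng.sample(negatives, n_keep)
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Invented data: one hit among 50 ordinary articles.
hit = EarlyTraffic(30, 200, 500, 90,
                   ["facebook.com", "twitter.com", "facebook.com", "google.com"], 250_000)
ordinary = [EarlyTraffic(30, 20, 18, 1, ["google.com"], 900 + i) for i in range(50)]
history = [hit] + ordinary

features = [build_features(t) for t in history]
labels = [label(t) for t in history]
balanced_rows, balanced_labels = downsample(features, labels)

print(features[0])
print(len(balanced_rows), sum(balanced_labels))  # 2 1
```

The balanced rows and labels could then be passed to a Random Forest classifier, as slide 16 names; one option would be scikit-learn's `RandomForestClassifier`, though the slides do not say which library was used. Using only organic social views as a feature reflects the slide's point that promotion itself inflates traffic.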
