Big Data at Tube: Events to Insights to Action

The presentation describes TubeMogul's events architecture and how those events are turned into insights, and the insights into the actions that drive its use cases.

  1. Big Data at Tube (Events → Insights → Actions)
     27th April 2016
     @John Trenkle (Chief Scientist), @Murtaza Doctor (Director of Engineering, RTB)
  2. Outline
     • Where do we fit?
     • What do we do?
     • Life of a Video Ad
     • RTB Architecture
     • Events Architecture
     • ML Perspective: Transactional → User-Oriented
     • Data → Models
     • Models → Action
     ©2016 TubeMogul Inc. All rights reserved.
  3. Busy Ad-Tech Landscape
  4. Where Does TubeMogul Fit?
  5. Scale: An Enterprise Software Company for Digital Branding
     ● Processed over 12.6 trillion ad auctions in 2015
     ● Serve over 55 billion auctions per day
     ● Served over 3 billion ad impressions on linear TV via our PTV solution
     ● Process bids in < 50 ms
     ● Serve bid responses in < 80 ms (including the network round-trip)
     ● Serve 5 PB of video traffic per month
  6. Example: Life of a Video Ad
  7. Technical Overview
     • Bidding layer: high volumes, low latency, small packets
     • Ad serving: large data sets, low latency, fast processing, large caches
     • Low-latency user database for user targeting and frequency capping
  8. Events Architecture
     ● Auctions (bids + non-bids)
     ● Win events (impressions)
     ● Columnar format (ORC)
     ● Data pipeline?
     ● Bad data?
     ● Scaling challenges
     ● Multiple downstream consumers
  9. Events Architecture
  10. Events Architecture: Takeaways
      ● Simplify and unify
      ● Focus on data validation at each step
      ● Automated recovery
      ● Leverage the messaging system for status and completion notifications
      ● Metrics and measurement for SLAs
  11. Machine Learning as a Consumer
      • Audience modeling calls for user-oriented data
      • Pivot RTB / analytics sources for model building
      • Many sources of truth need to be integrated
        • Ad interaction
      • Characterize users with a robust signature (UU-Code) rather than just a list of items
      • Facilitate rapid prototyping and model building
      • Maintain enriched information for exploratory analysis and visualization
        • Insights
        • Actionable intel
  12. Ad Calls to User Traces in Hive (on the Path to NoSQL)
      • Hive: RTB ad calls → RTB digest → user activity
      • NoSQL: RTB ad calls → user activity → Elasticsearch
  13. Token Embedding Models and Spark
      • DeepDist: http://deepdist.com/
      • Ref: Dean et al., "Large Scale Distributed Deep Networks", NIPS 2012, http://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
  14. Cascading for Signatures
      1. JOIN on tm_client
      2. Filter on average weight per vertical < 0.5
      [Flow diagram. Processes: Daily UUCode Creation, Get Truth Users by LAL Segment, Centroid Creation, Segment Creation, Attach SourceID, Create SourceID Lookup, Aggregated UUCode Creation, Wormhole, Segment Filter. Data sets: Daily Users Activities Prefixed, TM Client Daily Activity3, TM Client Digest3, TM Daily Converters (Convs), LAL segments from Mario, Daily UUCodes, Daily UUCodes with Source ID, Daily Truth Users for all LAL segments, LAL Landmarks, UUCode Model, UU Code, User Membership, User Membership Unfiltered, TMClientID/SourceID Lookup, ~650GB UDB Team Persistent Users Table.]
  15. Large-Scale Predictive Model Building
      1. Get truth users and their signatures (data warehouse of truth users)
      2. Create training data for each segment (ground truth)
      3. For each segment, perform training; check performance and log it in MySQL for tracking purposes
      4. Produce a model/weights file for each segment
      5. Aggregate and convert to a UUCode model (3-month aggregation)
      Also shown: segment information and a dashboard UI.
  16. Partners That Have Contributed to Our Ecosystem
      • Qubole
        • Long-time partners
        • Great for ad hoc queries and scheduled ETL
        • Dynamic scaling
      • Snowflake
        • Data warehouse; facilitates fraud analysis
      • SpotInst
        • Cost-effective spot instances in EMR
        • Robust provisioning
        • Dynamic scaling
      • Driven
        • Monitor, optimize and debug Hadoop flows
  17. Since Hive has been our primary datastore for a while…
      • Tips and tricks
        • ORC
        • MAPJOIN
        • Sorted, bucketed JOINs
        • TRANSFORM
        • HAVING
        • Hadoop Streaming
  18. Models → Action
      • Optimization
        • Surrogate measures of engagement: clicks, completions, conversions
      • Audience building for targeting
        • Demographic
        • Behavioral
      • Fraud detection
      • Cross-device syncing
      • Profiling / data mining / actionable intel
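Slide 5 (Scale) quotes volumes per day, per year and per month. Converting them to average per-second rates makes the sustained load easier to picture. The sketch below assumes a 365-day year, a 30-day month and decimal petabytes (1 PB = 10^15 bytes); the helper names are illustrative only.

```python
# Back-of-the-envelope rates derived from the figures on the Scale slide.
# Assumptions: a 365-day year, a 30-day month, decimal petabytes (1 PB = 10**15 bytes).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(count: float, days: float) -> float:
    """Average rate per second for `count` events spread evenly over `days` days."""
    return count / (days * SECONDS_PER_DAY)


def average_gbps(bytes_transferred: float, days: float) -> float:
    """Average bandwidth in gigabits per second for a volume spread over `days` days."""
    return bytes_transferred * 8 / (days * SECONDS_PER_DAY) / 1e9


daily_auction_rate = per_second(55e9, days=1)     # ~636,574 auctions/s
yearly_2015_rate = per_second(12.6e12, days=365)  # ~399,543 auctions/s
video_gbps = average_gbps(5e15, days=30)          # ~15.4 Gbit/s

print(f"{daily_auction_rate:,.0f} auctions/s at 55 billion/day")
print(f"{yearly_2015_rate:,.0f} auctions/s averaged over 2015")
print(f"{video_gbps:.1f} Gbit/s average video traffic")
```

These are averages, so the peak load is necessarily at least this high. The 2015 total works out to roughly 34.5 billion auctions per day, so the 55 billion per day figure describes a different period, presumably the rate at the time of the talk.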
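Slide 7 pairs the bidding layer with a low-latency user database used for targeting and frequency capping. As a minimal sketch of the frequency-capping check alone, the class below assumes a sliding-window policy (at most N impressions per user and campaign within a time window); the class and method names are hypothetical, and a plain in-memory dict stands in for the user database.

```python
from collections import defaultdict, deque


class FrequencyCapper:
    """Sliding-window cap: at most `max_impressions` per (user, campaign) within
    `window_seconds`. A dict stands in for the low-latency user database (assumption)."""

    def __init__(self, max_impressions: int, window_seconds: float):
        self.max_impressions = max_impressions
        self.window_seconds = window_seconds
        # (user_id, campaign_id) -> timestamps of served impressions, oldest first
        self._history = defaultdict(deque)

    def _recent(self, key, now: float) -> deque:
        """Drop impressions that have aged out of the window and return the rest."""
        history = self._history[key]
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        return history

    def can_bid(self, user_id: str, campaign_id: str, now: float) -> bool:
        """True if serving one more impression would stay within the cap."""
        return len(self._recent((user_id, campaign_id), now)) < self.max_impressions

    def record_impression(self, user_id: str, campaign_id: str, now: float) -> None:
        """Call on a win event, once the impression has actually been served."""
        self._recent((user_id, campaign_id), now).append(now)


capper = FrequencyCapper(max_impressions=2, window_seconds=3600)
capper.record_impression("user-1", "campaign-A", now=0)
capper.record_impression("user-1", "campaign-A", now=10)
print(capper.can_bid("user-1", "campaign-A", now=20))    # False: cap reached
print(capper.can_bid("user-1", "campaign-A", now=3601))  # True: first impression expired
```

The sketch assumes timestamps arrive in non-decreasing order per key, which lets each check prune old entries from the front of a deque in amortized constant time.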
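Slide 10 lists data validation at each step, automated recovery, status and completion messages over the messaging system, and metrics for SLAs. The sketch below shows one pipeline step that validates a batch of events, quarantines bad records and publishes a status message. The event schema, the `max_bad_fraction` threshold and the message fields are hypothetical, and `queue.Queue` stands in for whatever messaging system carries the status.

```python
import queue

# Hypothetical schema for an auction/win event; the real event layout is not in the slides.
REQUIRED_FIELDS = ("auction_id", "timestamp", "event_type")
VALID_EVENT_TYPES = {"bid", "non_bid", "win"}


def validate(record: dict) -> list[str]:
    """Return the problems found in one event record (an empty list means valid)."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if name not in record]
    event_type = record.get("event_type")
    if event_type is not None and event_type not in VALID_EVENT_TYPES:
        errors.append(f"unknown event_type {event_type!r}")
    return errors


def run_step(batch_id: str, records: list[dict], status_queue: queue.Queue,
             max_bad_fraction: float = 0.01):
    """Validate a batch, quarantine bad records and publish a status message."""
    good, quarantined = [], []
    for record in records:
        errors = validate(record)
        if errors:
            quarantined.append((record, errors))
        else:
            good.append(record)
    bad_fraction = len(quarantined) / len(records) if records else 0.0
    # A "failed" status lets an automated consumer retry or alert instead of
    # silently passing bad data downstream.
    status = "failed" if bad_fraction > max_bad_fraction else "complete"
    status_queue.put({"batch_id": batch_id, "status": status,
                      "good": len(good), "bad": len(quarantined)})
    return good, quarantined


statuses = queue.Queue()
batch = [
    {"auction_id": "a1", "timestamp": 1461715200, "event_type": "win"},
    {"auction_id": "a2", "event_type": "click"},
]
good, bad = run_step("2016-04-27T00", batch, statuses, max_bad_fraction=0.1)
print(statuses.get())
```

The good/bad counts in each message double as the per-step metrics an SLA dashboard could track.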
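Slides 11 and 12 describe pivoting transactional RTB ad calls into user-oriented traces. In Hive this pivot is a GROUP BY on the user key; the sketch below shows the same reshaping in memory so the before/after shapes are visible. The `AdCall` fields are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class AdCall(NamedTuple):
    """One transactional RTB ad-call record (hypothetical fields)."""
    user_id: str
    timestamp: int
    site_category: str


def to_user_traces(ad_calls: Iterable[AdCall]) -> dict[str, list[tuple[int, str]]]:
    """Pivot transactional ad calls into per-user, time-ordered activity traces."""
    by_user = defaultdict(list)
    for call in ad_calls:
        by_user[call.user_id].append((call.timestamp, call.site_category))
    # Sorting the (timestamp, category) tuples orders each trace by time.
    return {user: sorted(events) for user, events in by_user.items()}


calls = [
    AdCall("u1", 1005, "sports"),
    AdCall("u2", 1001, "news"),
    AdCall("u1", 1002, "autos"),
]
print(to_user_traces(calls))
```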
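Slide 14 opens with two steps: "JOIN on tm_client" and "Filter average weight per verticals < 0.5". The slide does not say how the average is grouped or which side of the threshold is kept, so the sketch below is one interpretation: it averages weights per (user, vertical) and drops pairs whose average falls below 0.5. The row layout and the `client_to_user` lookup are hypothetical.

```python
from collections import defaultdict


def join_on_tm_client(activities, client_to_user):
    """Inner join of (tm_client, vertical, weight) rows with a tm_client -> user_id
    lookup (hypothetical); rows whose tm_client has no match are dropped."""
    return [
        (client_to_user[tm_client], vertical, weight)
        for tm_client, vertical, weight in activities
        if tm_client in client_to_user
    ]


def filter_by_average_weight(rows, threshold=0.5):
    """Average the weight per (user, vertical); drop pairs whose average is below threshold."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user_id, vertical, weight in rows:
        totals[(user_id, vertical)] += weight
        counts[(user_id, vertical)] += 1
    averages = {key: totals[key] / counts[key] for key in totals}
    return {key: avg for key, avg in averages.items() if avg >= threshold}


activities = [
    ("c1", "sports", 1.0),
    ("c1", "sports", 0.5),
    ("c1", "news", 0.25),
    ("c2", "autos", 0.5),
    ("c3", "news", 0.75),  # no matching tm_client, dropped by the join
]
client_to_user = {"c1": "u1", "c2": "u2"}
print(filter_by_average_weight(join_on_tm_client(activities, client_to_user)))
```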
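Slide 11 argues for characterizing users with a robust signature (UU-Code) rather than an item list, and slide 14 shows a centroid-creation process for lookalike (LAL) segments. The slides do not define the UU-Code encoding, so the sketch below uses a sparse vertical-to-weight vector as a stand-in signature and scores candidates by cosine similarity to the centroid of a segment's truth users. This is a common lookalike technique, not necessarily the one TubeMogul used; all names are hypothetical.

```python
import math


def normalize(signature: dict[str, float]) -> dict[str, float]:
    """Scale a sparse signature to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in signature.values()))
    return {k: w / norm for k, w in signature.items()} if norm else dict(signature)


def centroid(signatures: list[dict[str, float]]) -> dict[str, float]:
    """Mean of the unit-normalized signatures of a segment's truth users."""
    if not signatures:
        raise ValueError("a segment needs at least one truth user")
    total: dict[str, float] = {}
    for signature in signatures:
        for key, weight in normalize(signature).items():
            total[key] = total.get(key, 0.0) + weight
    return {key: weight / len(signatures) for key, weight in total.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors."""
    a, b = normalize(a), normalize(b)
    return sum(w * b.get(k, 0.0) for k, w in a.items())


def rank_lookalikes(segment_centroid, candidates: dict[str, dict[str, float]]):
    """(score, user) pairs sorted from most to least similar to the centroid."""
    scored = [(cosine(sig, segment_centroid), user) for user, sig in candidates.items()]
    return sorted(scored, reverse=True)


truth_users = [{"sports": 1.0, "autos": 0.2}, {"sports": 0.8}]
candidates = {"u1": {"sports": 2.0}, "u2": {"news": 1.0}, "u3": {"autos": 1.0}}
for score, user in rank_lookalikes(centroid(truth_users), candidates):
    print(user, round(score, 3))
```

Normalizing each truth user before averaging keeps heavy users from dominating the centroid, which is one reason a compact signature can be more robust than a raw item list.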
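Slide 15 trains a model per segment, checks its performance and logs the result in MySQL for tracking. The slides do not name the model type, so the sketch below assumes plain logistic regression trained with SGD on sparse features, and uses an in-memory `sqlite3` database as a stand-in for MySQL. For brevity it logs training accuracy; a real tracking table would record held-out metrics. Table and function names are hypothetical.

```python
import math
import sqlite3


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, features: dict[str, float]) -> float:
    """Predicted probability of membership for one user's sparse features."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(k, 0.0) * v for k, v in features.items()))


def train_logistic(examples, epochs: int = 200, lr: float = 0.5):
    """Plain SGD logistic regression on (features, label) pairs, label in {0, 1}."""
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            grad = score((weights, bias), features) - label  # d(log loss)/dz
            bias -= lr * grad
            for k, v in features.items():
                weights[k] = weights.get(k, 0.0) - lr * grad * v
    return weights, bias


def accuracy(model, examples) -> float:
    return sum((score(model, x) >= 0.5) == (y == 1) for x, y in examples) / len(examples)


def train_all_segments(training_data, conn: sqlite3.Connection):
    """Train one model per segment and log its performance for tracking."""
    conn.execute("CREATE TABLE IF NOT EXISTS model_runs "
                 "(segment TEXT, n_examples INTEGER, train_accuracy REAL)")
    models = {}
    for segment, examples in training_data.items():
        model = train_logistic(examples)
        conn.execute("INSERT INTO model_runs VALUES (?, ?, ?)",
                     (segment, len(examples), accuracy(model, examples)))
        models[segment] = model
    conn.commit()
    return models


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL tracking database
data = {"sports_fans": [({"sports": 1.0}, 1), ({"news": 1.0}, 0),
                        ({"sports": 1.0, "autos": 1.0}, 1), ({"autos": 1.0}, 0)]}
models = train_all_segments(data, conn)
print(conn.execute("SELECT * FROM model_runs").fetchall())
```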
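Slide 17 lists TRANSFORM and Hadoop Streaming among the Hive tips. Both hand rows to an external script as tab-separated lines on stdin and read tab-separated lines back from stdout. The reducer below is a hypothetical example: the column layout and the "top vertical per user" logic are illustrative, and it assumes no NULL weights and that each user's rows arrive contiguously.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, TextIO


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit one tab-separated (user_id, top_vertical, total_weight) line per user.

    Assumes tab-separated (user_id, vertical, weight) input with all rows for a
    user contiguous (e.g. the query clusters by user_id) and no NULL weights.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for user_id, user_rows in groupby(rows, key=lambda row: row[0]):
        totals: dict[str, float] = {}
        for _, vertical, weight in user_rows:
            totals[vertical] = totals.get(vertical, 0.0) + float(weight)
        top_vertical = max(totals, key=totals.get)
        yield f"{user_id}\t{top_vertical}\t{totals[top_vertical]}"


def main(stream: TextIO = sys.stdin, out: TextIO = sys.stdout) -> None:
    """Entry point for Hive TRANSFORM / Hadoop Streaming: read stdin, write stdout."""
    for line in reduce_lines(stream):
        out.write(line + "\n")
```

To deploy it, the file would end with an `if __name__ == "__main__": main()` guard, be shipped with `ADD FILE`, and be invoked through `SELECT TRANSFORM(...) USING 'python reduce_verticals.py' AS (...)` (a hypothetical file name) over a subquery that uses `CLUSTER BY user_id`, so that each user's rows reach one script instance contiguously.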
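Slide 18 names clicks, completions and conversions as surrogate measures of engagement for optimization. Normalizing each count per impression makes line items of different sizes comparable. The sketch below (a hypothetical `DeliveryStats` structure with made-up counts) shows that different surrogates can rank the same line items differently, which is why the choice of measure matters.

```python
from dataclasses import dataclass


@dataclass
class DeliveryStats:
    """Aggregated delivery counts for one line item (hypothetical structure)."""
    impressions: int
    clicks: int
    completions: int
    conversions: int

    def rates(self) -> dict[str, float]:
        """Per-impression rates for the three surrogate engagement measures."""
        if self.impressions == 0:
            return {"ctr": 0.0, "completion_rate": 0.0, "conversion_rate": 0.0}
        return {
            "ctr": self.clicks / self.impressions,
            "completion_rate": self.completions / self.impressions,
            "conversion_rate": self.conversions / self.impressions,
        }


def rank_by(stats: dict[str, DeliveryStats], metric: str) -> list[str]:
    """Line items ordered from best to worst on the chosen surrogate measure."""
    return sorted(stats, key=lambda item: stats[item].rates()[metric], reverse=True)


stats = {
    "line-item-1": DeliveryStats(impressions=10_000, clicks=40, completions=7_200, conversions=3),
    "line-item-2": DeliveryStats(impressions=8_000, clicks=48, completions=4_800, conversions=2),
}
print(rank_by(stats, "ctr"))              # line-item-2 first
print(rank_by(stats, "completion_rate"))  # line-item-1 first
```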
