Data Platform and Services
Vipul Sharma and Eyal Reuveni
Agenda
• Eventbrite
• Data Products
• Data Platform
• Recommendations
• Questions
• A social event ticketing and discovery platform
• 50 millionth ticket sold
• Revenue doubled YOY
• 180 Employees in...
Data Products
Analytics
• Ad hoc queries by analysts
Fraud and Spam
Data Platform
Hadoop Cluster
• 30 persistent EC2 High-Memory Instances
• 30 TB disk with replication factor of 2, ext3 formatted
• CDH...
Infrastructure
• Search
   • Solr
   • Incremental updates towards event driven
• Recommendation/Graph
   • Hadoop
   • Native J...
Infrastructure
• Stream
   • RabbitMQ
   • Internal firehose (investigating Kafka)
• Offline
   • MapReduce
   • Streaming ...
Infrastructure - Sqoozie
• Workflow for MySQL imports to HDFS
   • Generate Sqoop commands
   • Run these imports in parall...
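The "generate Sqoop commands" step could look roughly like the sketch below. It is not Sqoozie's actual code, which the slides do not show. The table names, JDBC URL, and target directory are hypothetical, and credentials are left out. Only standard `sqoop import` options (`--connect`, `--table`, `--target-dir`, `--num-mappers`) are used, and the commands are printed rather than executed.

```python
import shlex


def build_sqoop_import(table, jdbc_url, target_root, num_mappers=4):
    """Return a `sqoop import` command line for one MySQL table."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", f"{target_root}/{table}",
        "--num-mappers", str(num_mappers),
    ]
    # Quote each argument so the line is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical table names and connection string, for illustration only.
TABLES = ["events", "orders", "users"]
JDBC_URL = "jdbc:mysql://db.example.com/example_db"

commands = [build_sqoop_import(t, JDBC_URL, "/data/mysql") for t in TABLES]
for command in commands:
    print(command)
```

Because each command is independent of the others, a list like this can be handed to a worker pool or a workflow scheduler to run the imports side by side.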
Infrastructure - Blammo
• Raw logs are imported to HDFS via Flume
• Almost real-time: 5 min latency
• Logs are key-val...
Recommendations
You will like to attend this event
Recommendation Engines
Interest Graph...
Why Interest?
Events are Social
Events are Interest
Dense Graph is Irrelevant
Interest...
How do we know your Interest?
• We ask you
• Based on your activity
   • Events Attended
   • Events Browsed
• Facebook Interes...
Model Based vs Clustering
Item-Item vs User-User
Building Social Graph is Clustering Step
Social Graph Recom...
Implicit Social Graph (diagram: nodes U1, E1, E4, U2, ...)
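One common reading of an implicit social graph is that two users are linked when they attended the same event, with the link weighted by how many events they share. The sketch below assumes that reading, which the transcript does not confirm. The attendance pairs are hypothetical, because the diagram's edges are not recoverable.

```python
from collections import defaultdict
from itertools import combinations


def implicit_social_graph(attendance):
    """Map each unordered user pair to the number of events they share."""
    attendees = defaultdict(set)
    for user, event in attendance:
        attendees[event].add(user)

    edge_weights = defaultdict(int)
    for users in attendees.values():
        # Every pair of users at the same event gains one unit of weight.
        for a, b in combinations(sorted(users), 2):
            edge_weights[(a, b)] += 1
    return dict(edge_weights)


# Hypothetical attendance records reusing the node labels from the slide.
attendance = [
    ("U1", "E1"), ("U2", "E1"),
    ("U1", "E4"), ("U2", "E4"), ("U3", "E4"),
    ("U4", "E2"), ("U5", "E2"),
]
graph = implicit_social_graph(attendance)
for pair, weight in sorted(graph.items()):
    print(pair, weight)
```

Users who never share an event get no edge at all, which keeps the graph far sparser than a complete user-by-user matrix.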
Mixed Social Graph (diagram: nodes U1, E1, U2, U3, ...)
15M * 260 * 260 = 1.014 Trillion Edges
4 Billion edges ranked
Each node is a feature vector representing a U...
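The product on this slide can be checked directly. The transcript does not say what 15M and 260 stand for, so the variable names below are deliberately neutral. The ranked share is derived here and is not a figure from the deck.

```python
# Factors as they appear on the slide; their meaning is not stated there.
nodes = 15_000_000
factor = 260

total_edges = nodes * factor * factor
ranked_edges = 4_000_000_000

# Fraction of the candidate edges that are actually ranked.
ranked_share = ranked_edges / total_edges

print(f"{total_edges:,} candidate edges")
print(f"{ranked_share:.2%} of them ranked")
```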
Feature Generation
• Mixed Features
• A series of map-reduce jobs
• Output on HDFS in flat files; Input to subsequent j...
(Diagram: nodes U1, U2, U3)
HBase
HBase
• Collect data from multiple MapReduce jobs
   • Stores entire social graph
   • Over one million writes per second
HBase
rowid     neighbors   events   featureX
2718282   101         3        0.3678795
HBase
rowid     314159:n   314159:e   314159:fx   161803:n   161803:e   161803:fx
2718282   31         1          0.3183     ...
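The column names in this second layout look like a second node's id joined to a short field name, so that one row holds one group of columns per neighbor. That reading is an assumption, as is the guess that `n`, `e`, and `fx` abbreviate the neighbors, events, and featureX columns of the previous table. The sketch models the row as a plain dictionary and does not call any HBase client. Only the values visible on the slide are used.

```python
def qualifier(neighbor_id, field):
    """Build a column qualifier such as '314159:fx'."""
    return f"{neighbor_id}:{field}"


def parse_qualifier(name):
    """Split '314159:fx' back into (314159, 'fx')."""
    neighbor_id, field = name.split(":", 1)
    return int(neighbor_id), field


def group_by_neighbor(row):
    """Regroup a flat wide row into one small dict per neighbor."""
    grouped = {}
    for name, value in row.items():
        neighbor_id, field = parse_qualifier(name)
        grouped.setdefault(neighbor_id, {})[field] = value
    return grouped


# Row 2718282 from the slide; the 161803 values are cut off in the transcript.
row_key = 2718282
row = {
    qualifier(314159, "n"): 31,
    qualifier(314159, "e"): 1,
    qualifier(314159, "fx"): 0.3183,
}
by_neighbor = group_by_neighbor(row)
print(row_key, by_neighbor)
```

A layout like this lets a single row read return everything stored about one node's neighborhood, at the cost of rows that grow with the number of neighbors.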
Tips & Tricks
• Distributed cache database
   • Sped up some MapReduce jobs by hours
   • Be sure to use counters!
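As an illustration of the counters tip, a Hadoop Streaming task can update a counter by writing a line of the form `reporter:counter:<group>,<counter>,<amount>` to stderr. The sketch below assumes a Streaming mapper written in Python, which the slides do not confirm. The group name, counter names, and tab-separated input format are hypothetical. In-memory buffers stand in for stdout and stderr so the example can run outside a cluster.

```python
import io
import sys


def counter_line(group, name, amount=1):
    """Format a Hadoop Streaming counter update."""
    return f"reporter:counter:{group},{name},{amount}"


def map_lines(lines, out=sys.stdout, err=sys.stderr):
    """Pass through tab-separated records and count good and bad lines."""
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            # Count the malformed record instead of failing the whole job.
            print(counter_line("example", "malformed_lines"), file=err)
            continue
        key, value = line.split("\t", 1)
        print(counter_line("example", "good_lines"), file=err)
        print(f"{key}\t{value}", file=out)


# Local dry run with in-memory streams in place of stdout and stderr.
sample_out, sample_err = io.StringIO(), io.StringIO()
map_lines(["a\t1\n", "broken\n"], out=sample_out, err=sample_err)
print(sample_out.getvalue(), end="")
print(sample_err.getvalue(), end="")
```

In a real Streaming job the same function would be called as `map_lines(sys.stdin)`, and the totals would appear alongside the job's built-in counters.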
Tips & Tricks
• Hive (ab)uses
   • Almost as many Hive jobs as custom ones
   • “flip join”
   • Statistical functions u...
Tips & Tricks
• Memory, memory, memory
• LZO, WAL
• Combiners are great until
• Shuffle and Sorting stage
• Hadoop ecos...
Questions?