Data Applications and Infrastructure at LinkedIn (Hadoop Summit 2010)
- 9. Open Source
  - Zoie – Real-time search indexing
  - Bobo – Faceted search
  - Decomposer – Very large matrix decomposition routines (now in Mahout)
  - Norbert – Partition-aware cluster management & RPC
  - Voldemort – Key/value storage
  - Kamikaze – Compression package
  - Sensei – Distributed search
  - Azkaban – Hadoop workflow
- 16. Data Deployment: How do you get your multi-billion-edge probabilistic relationship graph to the live website to serve queries?
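To see why even a multi-billion-edge graph covers only a tiny slice of all possible relationships, it helps to count member pairs, which grow quadratically. The sketch below assumes a hypothetical member count and edge count purely for illustration; neither figure comes from the talk.

```python
# Back-of-the-envelope: what fraction of all member pairs does a relationship graph store?
# NUM_MEMBERS and NUM_EDGES are hypothetical round numbers, not figures from the talk.

NUM_MEMBERS = 70_000_000        # assumed member count, for illustration only
NUM_EDGES = 5_000_000_000       # assumed "multi-billion" edge count


def possible_pairs(n: int) -> int:
    """Number of unordered member pairs: n choose 2 = n(n - 1) / 2."""
    return n * (n - 1) // 2


def coverage(edges: int, n: int) -> float:
    """Fraction of all possible pairs that actually appear as edges."""
    return edges / possible_pairs(n)


if __name__ == "__main__":
    print(f"possible pairs: {possible_pairs(NUM_MEMBERS):,}")
    print(f"coverage: {coverage(NUM_EDGES, NUM_MEMBERS):.2e}")
```

Under these assumed numbers there are 2,449,999,965,000,000 possible pairs (about 2.45 × 10^15), so 5 billion edges cover roughly 0.0002% of them.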
Editor's Notes
- Why LinkedIn cares about derived data; why it is hard
- Talk about what you can do
- If you get bad results, I claim you are in an unsuccessful test! Even so, this is still a small percentage of the quadrillions of possible relationships (pairwise is hard).
- What we learned
- Azkaban is a workflow scheduler. What is a workflow?
- Samurai rule. Logic is in jobs, not in the job descriptor. Jobs are independent. Work: visualization, polish.
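The Azkaban notes describe a workflow as a set of independent jobs plus the dependencies between them, with the logic kept in the jobs rather than in the job descriptor. A minimal sketch of that idea, using Python's standard `graphlib` (3.9+), might look like the following. The descriptor format (job name mapped to a list of dependency names) and the job names are hypothetical; this is not Azkaban's job-file format or API.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# A job is an independent unit of work. The descriptor only names each job and
# the jobs it depends on; all real logic lives inside the job functions.
Job = Callable[[], None]


def run_workflow(descriptors: Dict[str, List[str]], jobs: Dict[str, Job]) -> List[str]:
    """Run every job after all of its dependencies and return the order used.

    descriptors maps job name -> names of the jobs it depends on (hypothetical format).
    """
    for name, deps in descriptors.items():
        missing = [d for d in deps if d not in descriptors]
        if missing:
            raise ValueError(f"{name} depends on unknown jobs: {missing}")
    # static_order() yields dependencies first; raises graphlib.CycleError on a cycle.
    order = list(TopologicalSorter(descriptors).static_order())
    for name in order:
        jobs[name]()  # the scheduler passes no state between jobs
    return order


if __name__ == "__main__":
    log: List[str] = []
    # Hypothetical job names, for illustration only.
    jobs = {
        "extract": lambda: log.append("extract"),
        "join": lambda: log.append("join"),
        "score": lambda: log.append("score"),
        "push": lambda: log.append("push"),
    }
    descriptors = {
        "extract": [],
        "join": ["extract"],
        "score": ["join"],
        "push": ["score"],
    }
    print(run_workflow(descriptors, jobs))
```

Because the descriptors carry no logic, a job can be rerun, tested, or reordered on its own, and the scheduler only has to reason about the dependency graph.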