1. Cloud Computing at Yahoo!Lessons, Challenges and Futures Geoff Arnold May 12,2011
2. Abstract Yahoo operates a wide range of global and regional web services, from email to photo sharing, from sports and business news to ecommerce, from social networks to education. Our properties range in size from global behemoths like Yahoo Mail to hyper-local startups. The numbers are huge. Yahoo sites attract more than 680 million users; our Superbowl coverage generated 37 million clicks, and there were 1.2 billion page views of our Oscars site. And this scale carries through to our technology: we analyze 100 billion events every day, and we’re rolling out 135 websites on our new media technology platform this year. All of this is focused on one end: to deliver the premier digital experience to our users. And this leads to an interesting challenge: how can we apply all of our resources - computation, networking, user data, media, and science – to deliver a consistent and profitable experience, with agility and efficiency? For Yahoo, the answer has been to embrace cloud computing. In this talk, I’ll discuss what this really means for software developers within Yahoo, in terms of technologies and engineering practices, and how we intend to transform the development and delivery of Yahoo applications. 5/12/11 Yahoo! Presentation 2
3. Agenda Why this is different from the last 99 cloud computing presentations you’ve sat through Motivation Déjà vu What we’ve learned Where we’re going 5/12/11 Yahoo! Presentation 3
4. Why this is different from the last 99 cloud computing talks you’ve sat through No definition of cloud If you don’t know it by now…. I’m not selling anything Except maybe a job opportunity or two…. No cool new technologies It’s mostly about operational and systems refactoring More questions than answers Progress report 5/12/11 Yahoo! Presentation 4
6. OurMission Create a global, scalable platform built on science that enables rapidinnovation and delivery of personalized, monetizable experiences across devices. 5/12/11 6 Yahoo! Presentation
8. INTERNET EDGE MEDIA ADVERTISING DATA HWY PLATFORMS PIPELINES BT COKE KEYSTONE YELLOWSTONE MAIL/ANTI-SPAM GD STONE CONTENT AGILITY User Data Media & Files UPS / UDS / UDB / SHERPA MOBSTOR HADOOP ODS STORAGE MAIL FRONTPAGE SEARCH 8
9. Space Yahoo is big. You just won't believe how vastly, hugely, mind- bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space Yahoo. Douglas Adams, The Hitchhiker's Guide to the Galaxy
10. Unfortunately, I can’t tell you just how big What can I say? 680+M users 200PB of data, adding ~50TB of data per day 100B events a day captured, collected, transported and processed 43k Hadoop servers running 5M+Hadoop jobs every month Let’s just say that life is simpler for other folks… 5/12/11 Yahoo! Presentation 10
11. Why cloud – specifically “private cloud”? After all, many people think that “private cloud” is an oxymoron OH: “You should either be using a public cloud or operating one.” 5/12/11 Yahoo! Presentation 11
13. Why is agility #1? March 11, 2011 – the Japanese earthquake and tsunami Yahoo News spiked to 20.6 million unique visitors We served 371 million page views We added a “Donate Now” button which raised $7M Frequently need to spin up new sites with high traffic and short lifetime: Royal Wedding Canadian Election 5/12/11 Yahoo! Presentation 13
14. But why private cloud? Our business is digital media:“People want to be informed, they want to be entertained, they want to be educated, and they want to communicate. And that's what Yahoo is all about.” There are no public cloud operators that have the scale, geographic presence, and functionality that we would need So we have to do it ourselves 5/12/11 Yahoo! Presentation 14
16. Hasn’t Yahoo been here before? There have been many public presentations on Yahoo’s cloud ambitions and technology over the last three years Architectural details IETF submissions Open source plans So what happened? And is this talk going to be just another one in the series? 5/12/11 Yahoo! Presentation 16
17. Progress report In some areas, we’ve made great progress Grid Cloud storage Transforming complex middleware into hosted services OH: “Cloud apps are SOA apps” In others, we got ahead of ourselves And we’ve learned some useful lessons 5/12/11 Yahoo! Presentation 17
19. The cloud compute API story is a mess We have procedural primitives to manipulate instances AWS EC2, RackSpace, OpenStack We have “virtual data center” declarative schemes VMware vCloud, Orancle/Sun, Yahoo CSE We have PaaS models which abstract away instances and data centers And there’s no coherent way to tie them all together 5/12/11 Yahoo! Presentation 19
20. Consequences How do I describe alternative roll-out/roll-back policies for my VDC? It seems natural to describe – or specify – the semantics of my declarative API in terms of the primitives in my procedural API How can I deploy a multi-tier application in which one tier was created using a PaaS framework One or more! Who’s on first? Does the VDC deployment description control the PaaS system? Does the PaaS application configuration trigger the deployment of other tiers? 5/12/11 Yahoo! Presentation 20
21. Forget “long tail” – what about the head? Yahoo has a number of “mega-properties” Mail, Front Page, Sports, News, Flickr They all want to take advantage of the benefits of cloud Potential for huge gains in efficiency, predictabilty However because of their size, they’ve had to radically optimize their technology and operations They have stringent performance (latency) requirements and use highly customized technologies This means that they aren’t necessarily a good fit for generic, multi-tenant services
22. Cranking up the rate of change Historically, Yahoo properties have acquired new hardware through traditional committee-driven processes So there are long lead times, and plenty of time to provision systems like asset management, DNS, monitoring, access control, and so forth Many of these systems were not designed to support real-time updates And even if they were, they were rarely stressed The effects of introducing on-demand provisioning tend to cascade though many operational systems 5/12/11 Yahoo! Presentation 22
23. Security policies Pre-cloud security policies tend to assume a single point of responsibility: the property which “owns” the box and the software stack that’s running on it Threat analysis tends to focus on the box, and on mitigating consequences of attack With the cloud, there are many more attack vectors: The application VM on the physical box Any other VM running on the box The hypervisor managing the box The VM management system Creating new security policies takes time 5/12/11 Yahoo! Presentation 23
24. Invalidating traditional assumptions In a slowly-changing environment, it’s easy to assume that some things are constant For example, Yahoo has a flat IP network with fixed IP addresses per-box and per-rack So it’s reasonable to use an IP address as a key for many things It violates current policy, but there are many legacy systems… So when a system that uses IP addresses for app instance identifiers is dynamically [re]deployed in a cloud fabric, things break E.g. Reprocessing historic Apache log records 5/12/11 Yahoo! Presentation 24
25. Legacy workloads Most people agree that the future of cloud computing lies with PaaS systems which provide a programming, development and operational model that is optimized for the cloud CloudFoundry and OpenShift are promising developments However it is difficult to make the case that the benefits of cloud computing should only be available to new applications So we have IaaS fabrics on which people run traditional workloads Sometimes unchanged, more often with minor cloud adaptations
26. Legacy workloads at very large scale At Yahoo, we run into the problem that many of our legacy property workloads run on large fleets in many data centers around the world Their “cloud adoption strategy” involves deploying their current stack on a small number of cloud instances and running them in parallel with their existing fleet This makes it hard to adapt the application configuration to work well in the cloud It also means that the property’s existing operational practices have to be made to work with cloud instances This is particularly challenging for Edge configuration 5/12/11 Yahoo! Presentation 26
27. The reality of “on demand” “On demand” capacity presumes that each individual customer request is very small compared with the size of the resource pool Capacity management involves careful over-provisioning of the pool But we have many large properties… Today we overprovision to their planned size/QPS, which: Can waste a lot of resources Has long lead times We need to be able to capture the time dimension in our capacity planning and provisioning requests Define the envelope, allow on-demand within that envelope 5/12/11 Yahoo! Presentation 27
28. Integrating Platform as a Service We’ve already seen that PaaS presents us with a problem in terms of cloud compute APIs We’re working on a number of outstanding issues with PaaS: Accommodating PaaS in our existing software development and test methodologies, including Continuous Integration Deploying PaaS in different network security zones Managing the various kinds of access to the PaaS infrastructure Integrating various network and request handling mechanisms – throttling, traffic shaping, A/B testing, etc. – into PaaS frameworks 5/12/11 Yahoo! Presentation 28
29. Automation One of the key benefits of cloud computing is improved consistency and predictability through automation Scale affects automation in interesting ways Potential interactions between automated deployment and self-healing Very long-running automation, such as the initial population of a new storage farm in a new location Issues also arise when automation is designed to exploit an API that was created for a different use-case E.g. an API intended to support a portal GUI with a human operator, which has never been stressed 5/12/11 Yahoo! Presentation 29
31. Architectural Vision Software as a Service Knowledge as a Service Platform as a Service Infrastructure as a Service Hardware 5/12/11 Yahoo! Presentation 31
32.
33. Fundamental Cloud Services that can be assembled to build higher level services5/12/11 Yahoo! Presentation 32
38. Hosts the development and operation of Yahoo applications5/12/11 Yahoo! Presentation 33
39. Takeaways The Yahoo private cloud project is on track Many successes already, but much more work to do End-to-end architecture is a key element Open source is important: we’re committed to a collaborative approach Founder member of Open Networking Foundation Studying other open source projects No more premature announcements And of course we’re hiring…. ☺ 5/12/11 Yahoo! Presentation 34