8. 1. Experiment
2. Learn
3. Plan
All images sourced from iStockPhoto.com
Editor's Notes
1st... the way online advertising is bought and sold is fundamentally broken. In the typical process, a media buyer builds a media plan using ratings data from companies like Nielsen or Comscore. They then send request-for-proposal documents to publishers, who prepare proposal documents in response. Negotiation ensues and, at the end, a contract is signed. Once the media contract begins, it's difficult to change if you're not meeting your goals. So the process is very inefficient in both the preparation and the execution of the advertising campaign.

Now, a lot of people had this insight too, and there were many products trying to automate the media buying process. But at their core, they were automating a fundamentally broken process.
2nd... if you abstract the media buying system, it is a one-sided market. In fact, structurally, it is a commodity market. So the insight here is that the solution is to trade media not through the old system of "forward contracts", which have little flexibility, but rather to execute the trades in real time as a "spot market".
And the idea is to execute these trades programmatically, leveraging powerful machine learning algorithms. In this sort of system, we watch every ad impression available and make a buying decision instantly: whether to bid for the impression, how much to bid, and which ad to show. If a strategy isn't working, you can pause it within minutes. Starting a new campaign takes only a few minutes.

Only a few companies had this insight, and we were fortunate to be in the leading group.

OK - so those two insights were the hard bit. The easy bit was implementing that system... no, wait, other way around. Actually, it turns out that the implementation is very challenging. Because we're watching every ad impression in the market, and making decisions in real time, we have three very hard constraints:

1st... Very low latency: we have to make a high-quality decision on which ad to show and how much to pay in milliseconds.
2nd... Very high throughput: we have to make these very fast decisions over 7 million times every minute.
3rd... Very high volume: we see billions of ad impressions every single day. And we have to report on, analyse, and learn from all this data.
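To make that decision loop concrete, here is a minimal Python sketch: given an impression, decide whether to bid, at what price, and with which ad. The Strategy fields, the value model, and all numbers are invented for illustration; the real system makes this call in milliseconds using machine-learned models.

```python
# Illustrative sketch of a real-time bid decision - not the actual
# production algorithm. Field names and the value model are assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple, List

@dataclass
class Strategy:
    name: str           # campaign strategy identifier (hypothetical)
    ad_id: str          # creative to serve if we win the auction
    max_cpm: float      # price cap, dollars per thousand impressions
    paused: bool = False

def predicted_value(impression: dict, strategy: Strategy) -> float:
    # Stand-in for a machine-learned estimate of this impression's
    # value to the strategy (e.g. expected click or conversion value).
    return impression.get("quality_score", 0.0) * strategy.max_cpm

def decide_bid(impression: dict, strategies: List[Strategy]) -> Optional[Tuple[str, float]]:
    """Return (ad_id, bid_cpm) for the best active strategy, or None to pass."""
    best = None
    for s in strategies:
        if s.paused:                  # a paused strategy stops bidding within minutes
            continue
        value = predicted_value(impression, s)
        bid = min(value, s.max_cpm)   # never exceed the strategy's price cap
        if bid > 0 and (best is None or bid > best[1]):
            best = (s.ad_id, bid)
    return best

strategies = [Strategy("retail", "ad-42", max_cpm=2.5),
              Strategy("travel", "ad-77", max_cpm=1.2, paused=True)]
print(decide_bid({"quality_score": 0.5}, strategies))  # ('ad-42', 1.25)
```

The key property is that pausing a strategy takes effect on the very next impression, which is why a failing campaign can be stopped within minutes rather than renegotiating a contract.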
Hence the "Big Data" challenge:

In raw terms, we have over a petabyte of raw log data stored on Amazon Simple Storage Service (S3), and that is growing at 4 terabytes per day, or 130 terabytes per month. Compressed for storage, it comes down to around 100 TB.

When you're seeing billions of new events every day and processing terabytes per day, traditional database systems just don't cope. So, to handle this volume, we use Hadoop MapReduce jobs, all powered by Amazon Elastic MapReduce. At any given time, we might have 30-40 Hadoop nodes running various processing jobs, from report aggregations to machine learning algorithms.
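The kind of report aggregation we run as Hadoop jobs can be pictured as a miniature map/reduce in plain Python. The tab-separated log layout and field names below are assumptions for the example, not our actual schema; on the cluster, the framework shuffles the mapper's key/value pairs to reducers across many nodes.

```python
# Miniature map/reduce illustrating a report-aggregation job
# (spend per advertiser). Log format is invented for the example;
# costs are in micro-dollars so the sums stay exact integers.
from collections import defaultdict

def mapper(line: str):
    # Map phase: emit (advertiser, cost) for every impression event.
    event, advertiser, cost = line.rstrip("\n").split("\t")
    if event == "impression":
        yield advertiser, int(cost)

def reducer(pairs):
    # Reduce phase: sum the values grouped by key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

log_lines = [
    "impression\tacme\t2000",
    "click\tacme\t0",
    "impression\tglobex\t1000",
    "impression\tacme\t3000",
]
pairs = (pair for line in log_lines for pair in mapper(line))
print(reducer(pairs))  # {'acme': 5000, 'globex': 1000}
```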
At the time we started using Amazon Elastic MapReduce, we didn't have the CAPEX, time, or in-house skills to set up and maintain the 30-40 node Hadoop cluster required to run these sorts of processing jobs. So Amazon Elastic MapReduce really enabled us to quickly build the Big Data capability we required without a big up-front investment, which would easily have cost us several months and a couple of hundred thousand dollars. This accelerated our product time-to-market by months.
Another requirement is to do machine learning "at scale". Sometimes, we want to test a new algorithm. With Amazon Elastic MapReduce, we can run a one-off job on months of data (literally hundreds of terabytes) and test the new algorithm in a couple of hours. On a non-cloud Hadoop cluster, this sort of agile analytics would be cost-prohibitive and time-consuming. We can do this analysis in hours instead of weeks, so with Amazon Elastic MapReduce we can innovate quickly and continuously enhance our customer offerings.
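One way to picture such a one-off test is an offline replay: run the candidate algorithm over archived impressions and compare its outcome with a baseline. Everything below - the record layout, the two toy bidding rules, the win/spend metric - is a hypothetical sketch, not our production evaluation.

```python
# Hedged sketch of an offline replay for testing a new bidding
# algorithm against historical logs. Record format is invented.

def old_bid(impression):
    return 1.0                                  # incumbent: flat bid

def new_bid(impression):
    return 2.0 * impression["quality_score"]    # candidate algorithm

def replay(log, algorithm):
    """Count the auctions the algorithm would have won, paying the
    historical clearing price whenever its bid would have cleared."""
    wins, spend = 0, 0.0
    for imp in log:
        if algorithm(imp) >= imp["clearing_price"]:
            wins += 1
            spend += imp["clearing_price"]
    return wins, spend

log = [{"quality_score": 0.9, "clearing_price": 1.5},
       {"quality_score": 0.2, "clearing_price": 0.25},
       {"quality_score": 0.7, "clearing_price": 1.25}]
print(replay(log, old_bid))  # (1, 0.25) - baseline wins one cheap auction
print(replay(log, new_bid))  # (3, 3.0)  - candidate wins all three
```

On the cluster, the replay itself is just another MapReduce job over the archived logs, which is what makes testing against months of data tractable.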
Finally, some of the key lessons from our adoption of Amazon Web Services:

1) Experiment: It is fast and cheap to experiment, so just get started and iterate. When the experiment is over, just turn off the services.
2) Learn: Spend some time on the forums and reading the documentation to pick up tips and pointers for optimisation.
3) Plan: Just because it's "in the cloud" doesn't excuse you from having to architect a fault-tolerant solution and think about redundancy and single points of failure. Amazon just makes it easier to execute fault-tolerant solutions - you still have to do the thinking and planning. In any reasonably large, complicated distributed system, things are bound to go wrong: network connections time out, jobs fail to start, and machines occasionally die. Build things expecting failure and put in place the mechanisms to gracefully deal with these minor failures.

Thank you for your time today and the opportunity to share a bit about Brandscreen... our challenges with Big Data... and how we're solving those challenges with Amazon Web Services.
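A footnote on point 3: one of the simplest "build expecting failure" mechanisms is a retry wrapper with exponential backoff around anything that can fail transiently (a timed-out connection, a job that fails to start). This is a generic illustrative pattern, not Brandscreen's actual code.

```python
# Generic retry-with-exponential-backoff sketch for transient faults.
import time

def with_retries(operation, attempts=4, base_delay=0.5):
    """Run operation(), retrying transient failures with exponential
    backoff; re-raise once the attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise                         # out of retries: surface the fault
            time.sleep(base_delay * (2 ** attempt))  # back off, then try again

# Example: an operation that fails twice before succeeding,
# the way a flaky network call might.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("network timeout")
    return "report-data"

print(with_retries(flaky_fetch, base_delay=0.05))  # 'report-data' after two retries
```

The point is the same as in the talk: the failure handling is ordinary engineering work - the cloud makes it easier to execute, but you still have to design it in.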