20. Aggregate [Architecture diagram. Labels: Data Sources (Ad Serving data, Log Files, File Export, APIs, Internet, Client Provided Data); Talend Data Flow Manager; Direct Analytics Processing via EMR; Presentation Layer; Web Application Layer; ODBC; Edge Provisioning DB; OLAP Cache; Cloud Storage (S3, HBase/SDB); Elastic MapReduce]
24. Drive a personalized message. Example: a user recently purchased a home theater system and is now looking for sports games. Target ad (1.7 million per day).
25. We import Atlas transaction-level data. 24 servers, S3 file storage. Compress and upload 200+ GB of data per day (180 days = ½ trillion ICA records).
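The "compress" half of the compress-and-upload step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline the slides describe: the file name `ica_sample.log`, the tab-separated record layout, and the helper names `compress_log` and `demo` are all assumptions made for the example.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def compress_log(src: Path, dest_dir: Path) -> Path:
    """Gzip one log file into dest_dir and return the path of the .gz file."""
    dest = dest_dir / (src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        # Stream in chunks so large log files are never held fully in memory.
        shutil.copyfileobj(fin, fout)
    return dest


def demo():
    """Compress a small synthetic log and return (raw_bytes, compressed_bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        src = tmp_path / "ica_sample.log"  # hypothetical file name
        # Hypothetical record layout: cookie id, event, timestamp.
        src.write_text("cookie_id\tevent\tts\n" * 10000)
        packed = compress_log(src, tmp_path)
        return src.stat().st_size, packed.stat().st_size


if __name__ == "__main__":
    raw, packed = demo()
    print(f"raw={raw} bytes, compressed={packed} bytes")
```

The upload itself is left out so the sketch stays runnable offline. With boto3 it would typically be an `upload_file` call on an S3 client, with a bucket and key chosen by whoever runs the pipeline.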
26. We use EMR to process and segment. EMR, S3. 100-machine cluster created on demand (3.5 billion records, 71 million unique cookies a day).
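The "process and segment" idea can be illustrated with a tiny in-memory map/reduce sketch. It is not the Cascading job from the talk. The record layout `(cookie_id, event_type, category)`, the segment names, and the segmentation rule (borrowed from the home theater example earlier in the deck) are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical record layout: (cookie_id, event_type, category).


def map_record(record):
    """Map step: key each event by its cookie id."""
    cookie_id, event_type, category = record
    yield cookie_id, (event_type, category)


def reduce_cookie(cookie_id, events):
    """Reduce step: assign one segment per cookie from all of its events."""
    purchased = {cat for kind, cat in events if kind == "purchase"}
    viewed = {cat for kind, cat in events if kind == "view"}
    if "home_theater" in purchased and "sports_games" in viewed:
        return "home_theater_buyer_seeking_sports_games"
    return "unsegmented"


def segment(records):
    """Group mapped events by cookie (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return {cookie: reduce_cookie(cookie, events)
            for cookie, events in grouped.items()}


if __name__ == "__main__":
    sample = [
        ("c1", "purchase", "home_theater"),
        ("c1", "view", "sports_games"),
        ("c2", "view", "sports_games"),
    ]
    print(segment(sample))
```

On a real cluster the grouping by cookie is the shuffle that Hadoop performs between the map and reduce phases. Here a dictionary stands in for it.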
27. Process and Cost. This all happens in about 8 hours every day and is fully automated (previously 2+ days), and it increased ROAS (return on ad spend) by 500% (to $74).
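A rough back-of-the-envelope check of these figures, combined with the cluster numbers from the previous slide, is sketched below. It assumes the full 8 hours is spent processing the 3.5 billion daily records, spread evenly over 100 machines, and it treats "2+ days" as exactly 48 hours. The slides state none of this, so the results are order-of-magnitude estimates only.

```python
# Figures quoted in the slides.
RECORDS_PER_DAY = 3.5e9      # records processed per day
MACHINES = 100               # on-demand cluster size
RUN_HOURS = 8                # daily run time ("about 8 hours")
PREVIOUS_HOURS = 48          # assumption: "2+ days" taken as a 48-hour lower bound


def cluster_throughput(records, hours):
    """Average records per second across the whole cluster."""
    return records / (hours * 3600)


def per_machine_throughput(records, hours, machines):
    """Average records per second per machine, assuming an even spread."""
    return cluster_throughput(records, hours) / machines


def speedup(previous_hours, current_hours):
    """Ratio of the old run time to the new run time."""
    return previous_hours / current_hours


if __name__ == "__main__":
    print(round(cluster_throughput(RECORDS_PER_DAY, RUN_HOURS)))
    print(round(per_machine_throughput(RECORDS_PER_DAY, RUN_HOURS, MACHINES)))
    print(speedup(PREVIOUS_HOURS, RUN_HOURS))
```

Under those assumptions the cluster averages roughly 120,000 records per second, or about 1,200 per machine, and the daily run is at least 6 times faster than before.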
28. Why AWS. Efficient: Elastic infrastructure from AWS allows capacity to be provisioned as needed based on load, reducing cost and the risk of processing delays. Ease of integration: Amazon Elastic MapReduce with Cascading allows data processing in the cloud without any changes to the underlying algorithms. Flexible: Hadoop with Cascading is flexible enough to allow "agile" implementation and unit testing of sophisticated algorithms. Adaptable: Cascading simplifies the integration of Hadoop with external ad systems. Scalable: AWS infrastructure helps reliably store and process huge (petabyte-scale) data sets.