The document discusses extracting value from big data by using the most appropriate computing platform at each stage of analysis, whether NoSQL, MapReduce, or a cloud OS. It emphasizes balancing on-premise and cloud resources, and combining relational and non-relational data through techniques such as PolyBase, to gain insights from large, complex datasets. Choosing the right mix of platforms can provide customized experiences and drive significant business growth for companies like Amazon.
2. Big data in a Hybrid-Cloud world
Dr Michael Newberry
Windows Azure Lead, Microsoft UK
Michael.Newberry@Microsoft.com
4. Doggerland: Simon Fitch, Vince Gaffney and Ken Thomson
Image Source: drowned-landscapes.tumblr.com
Royal Society's Summer Science Blog (http://summer-science.tumblr.com/)
6. Big Data: VOLUME (size), VARIETY (structure), VELOCITY (speed).
7. Getting useful insights from awkward data sets using the most appropriate computing platform at each stage.
10. …Amazon (AMZN) calls this homegrown math "item-to-item collaborative filtering," and it's used this algorithm to heavily customize the browsing experience for returning customers…. Judging by Amazon's success, the recommendation system works. The company reported a 29% sales increase to $12.83 billion during its second fiscal quarter, up from $9.9 billion during the same time last year. A lot of that growth arguably has to do with the way Amazon has integrated recommendations into nearly every part of the purchasing process from product discovery to checkout.
http://tech.fortune.cnn.com/2012/07/30/amazon-5/
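To make "item-to-item collaborative filtering" concrete, here is a minimal sketch of the general idea, not Amazon's production algorithm: two items are similar when the same customers bought both (cosine similarity over buyer sets), and a customer is recommended the unowned items most similar to what they already own. The function names and the toy purchase data are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def item_similarities(purchases):
    """Cosine similarity between every pair of items that share at least one buyer.

    purchases maps a customer id to the set of item ids they bought (hypothetical schema).
    """
    buyers = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    sims = {}
    items = sorted(buyers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = len(buyers[a] & buyers[b])
            if common:  # skip pairs never bought together
                score = common / sqrt(len(buyers[a]) * len(buyers[b]))
                sims[(a, b)] = sims[(b, a)] = score
    return sims

def recommend(customer, purchases, sims, k=3):
    """Rank unowned items by summed similarity to the items the customer owns."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for (a, b), score in sims.items():
        if a in owned and b not in owned:
            scores[b] += score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

# Hypothetical toy data.
purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "mug"},
    "carol": {"book", "mug"},
    "dave": {"lamp", "desk"},
}
sims = item_similarities(purchases)
print(recommend("alice", purchases, sims))  # ['mug', 'desk']
```

The similarity table depends only on items, so it can be computed offline and reused for every returning customer. That property is what makes the item-to-item approach attractive at scale. The all-pairs loop here is only for readability.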
11. “In theory there is no difference between theory and practice;
in practice, there is”.
Yogi Berra, cited in Nassim Taleb, Antifragile.
25. Takeaways
1. “big data” can do some amazing stuff.
2. Don’t think “big data” so much as “data needing non-relational approaches”.
3. If your big data insights are probabilistic, which they often are, have a plan to deal with variance.
4. Pick the most appropriate platform, and think “and”, not “or”:
- Balance public cloud AND on-premise,
- Combine “big data” with RDBMS.
Speaker notes
A personal view from where I sit. Picking examples unlikely to be used by other speakers. Hype: Freeform Dynamics on The Register: http://www.freeformdynamics.com/fullarticle.asp?aid=1590
Big Data. This is a picture down the center aisle of a shipping container from one of Microsoft’s datacenters. We put about 1,800 computers inside one of these containers. Some of us had the privilege of working on the data storage and computational platform that powers Bing. We used 22 of these containers, spanning 40,000 machines, where we stored over 100 PB of data. This was three years ago, and now these servers are almost obsolete. Big data is in constant motion and growing at an incredible rate: 90% of the world’s data was generated in just the past two years. That's remarkable growth.
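The container figures in the note above are consistent with each other. Here is a rough back-of-envelope check, assuming decimal petabytes (1 PB = 1,000 TB) and data spread evenly across machines. It ignores how replicas are counted:

```latex
22 \times 1{,}800 = 39{,}600 \approx 40{,}000 \text{ machines}
\qquad
\frac{100\ \text{PB}}{40{,}000\ \text{machines}} = \frac{100{,}000\ \text{TB}}{40{,}000} = 2.5\ \text{TB per machine}
```

Because the note says "over 100 PB", 2.5 TB per machine is a lower bound under these assumptions.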
Don’t forget – other kinds of machine learning. Bridge into
Need for these tools is motivated by the data explosion: “Each mapper takes a line as input and breaks it into words. It then emits a key/value pair of the word and 1. Each reducer sums the counts for each word and emits a single key/value with the word and sum. As an optimization, the reducer is also used as a combiner on the map outputs. This reduces the amount of data sent across the network by combining each word into a single record.”
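The word-count job quoted above can be simulated in a few lines of plain Python. This is a minimal in-memory sketch, not Hadoop's API. Each inner list of lines stands in for one map task's input split, and the helper names are hypothetical. The reducer doubles as the combiner, pre-summing each map task's output before the simulated shuffle.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Break the input line into words and emit (word, 1) for each one.
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Sum the counts for one word; also reused below as the combiner.
    return word, sum(counts)

def group_by_key(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def word_count(splits):
    """Simulate one MapReduce job; each inner list of lines is one map task's input."""
    shuffled = []
    for split in splits:
        map_output = chain.from_iterable(mapper(line) for line in split)
        # Combiner: pre-sum within this map task so fewer records cross the "network".
        shuffled.extend(reducer(w, c) for w, c in group_by_key(map_output).items())
    # Shuffle and reduce: group every task's partial counts by word, then sum.
    return dict(reducer(w, c) for w, c in group_by_key(shuffled).items())

splits = [["the quick fox", "the lazy dog"], ["the fox"]]
print(word_count(splits))  # {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Reusing the reducer as a combiner is only safe because summation is associative and commutative. A reducer that, say, averaged values could not be reused this way without changing the result.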
Future of query processing. Pioneered in the Jim Gray Systems Lab by David DeWitt, PolyBase is a federated query processor in SQL Server 2012 Parallel Data Warehouse (PDW) that breaks from traditional query processing by joining structured data with unstructured data from Hadoop. Without manual intervention, the PolyBase query processor can accept a standard SQL query and combine tables from a relational source with tables from a Hadoop source directly through external tables. It also parallelizes import and export of data to and from Hadoop, giving PDW speed, simplicity, and responsiveness for these new types of queries.
1) Ability to issue standard T-SQL that joins relational data with unstructured data in Hadoop
2) PolyBase rapidly imports/exports data between Hadoop and PDW in parallel
3) PolyBase can query data in Hadoop directly without movement (with external tables)
4) Created in the Jim Gray Systems Lab by David DeWitt
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx
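PolyBase itself is driven by T-SQL over external tables. The Python sketch below only illustrates the logical operation such a query performs. It runs an inner hash join between a small relational table and semi-structured records parsed from text files, then applies a SUM ... GROUP BY. The table contents, the tab-separated clickstream format, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical relational table: (customer_id, region) rows from the warehouse.
customers = [(1, "UK"), (2, "US"), (3, "UK")]

# Hypothetical tab-separated clickstream lines, as they might sit in files on HDFS.
clickstream = "1\t/home\t30\n2\t/cart\t45\n1\t/checkout\t20\n3\t/home\t10\n4\t/home\t5\n"

def parse_clicks(text):
    """Turn raw lines into (customer_id, page, seconds) tuples, skipping blanks."""
    for line in text.splitlines():
        if line.strip():
            customer_id, page, seconds = line.split("\t")
            yield int(customer_id), page, int(seconds)

def seconds_by_region(customer_rows, click_rows):
    """Inner hash join on customer_id, then SUM(seconds) GROUP BY region."""
    region_of = dict(customer_rows)  # build side: the small relational table
    totals = defaultdict(int)
    for customer_id, _page, seconds in click_rows:  # probe side: streamed records
        region = region_of.get(customer_id)
        if region is not None:  # unmatched customers (e.g. id 4) drop out
            totals[region] += seconds
    return dict(totals)

print(seconds_by_region(customers, parse_clicks(clickstream)))  # {'UK': 60, 'US': 45}
```

Hashing the smaller relational side and streaming the larger file-based side is a common choice for this shape of join. It avoids loading the Hadoop data first, which mirrors the "query in place, without movement" point above.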
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000002102
As the game was prepared for release, however, 343 Industries was faced with an entirely new kind of challenge: to gain insight into player behavior and user preferences. To achieve this goal, Microsoft leadership asked 343 Industries to find a way to effectively mine user data. At the same time, the team was faced with another need: analyzing data during the five-week Halo 4 “Infinity Challenge” tournament and providing results each day to their tournament partner, Virgin Gaming. The Halo 4 Infinity Challenge, the largest free-to-enter online Halo tournament in the world, tracked a player’s personal score in the game’s multiplayer modes across a global leaderboard, giving players a chance to win more than 2,800 prizes. Virgin Gaming needed to use business intelligence (BI) data gathered during the event to update leaderboards on the tournament website.
“…the average length of a game and the specific game features that players use the most. By getting these insights, the Halo 4 team can make frequent updates to the game. ‘Based on the user preference data we’re getting from Hadoop, we’re able to update game maps and game modes on a week-to-week basis,’ says Vayman. ‘And the suggestions we get in the forums often find their way into the next week’s update. We can actually use this feedback to make changes and see if we attract new players. Hadoop and the forums are great tuning mechanisms for us.’”
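The two kinds of insight mentioned in the case study are simple aggregations once the game data is collected: a global leaderboard of personal scores, and the average length of a game. The sketch below is hypothetical. The record schema, the game modes, and the function names are assumptions, and it does not show 343 Industries' actual Hadoop jobs. Each game is counted once for length, and scores are totalled per player.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-game records: (game_id, game_mode, length_minutes, {player: score}).
games = [
    ("g1", "slayer", 12.0, {"p1": 25, "p2": 30}),
    ("g2", "ctf", 20.0, {"p1": 10, "p3": 40}),
    ("g3", "slayer", 9.0, {"p2": 15, "p3": 5}),
]

def average_length_by_mode(records):
    """Mean game length per mode, counting each game once."""
    lengths = defaultdict(list)
    for _game_id, mode, minutes, _scores in records:
        lengths[mode].append(minutes)
    return {mode: mean(values) for mode, values in lengths.items()}

def leaderboard(records, top_n=3):
    """Total score per player across all games, highest first (ties by player id)."""
    totals = defaultdict(int)
    for _game_id, _mode, _minutes, scores in records:
        for player, score in scores.items():
            totals[player] += score
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

print(average_length_by_mode(games))  # {'slayer': 10.5, 'ctf': 20.0}
print(leaderboard(games))             # [('p2', 45), ('p3', 45), ('p1', 35)]
```

Keying game length by game rather than by player row avoids counting a multiplayer game once per participant. That mistake is easy to make when the raw telemetry is one row per player.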