11. + SAMBA
“A few years back, a patch submission from coders at Microsoft would have been amazing to the point of unthinkable, but the battles are mostly over and times have changed.”
17. Online Business Application
Attract Individual Consumers:
- Provide interesting service
- Provide mobility
- Provide social
Monetize the Social:
- Improve individual experience
- Re-sell Aggregate Data (e.g., Advertisers)
Monetize Individual:
- Upsell service
- VIP
- Speed
- Extra Capabilities
18. Social Networking: The Business Problem
• 100s of millions of users
• Terabytes to petabytes of data
• Required (eventual) data consistency across users
19. Solution
• Shard/Partition user data across hundreds to thousands of SQL Databases
• Propagate data changes from one DB to other DBs using reliable, async Message Service
• Provide a caching layer for performance
• And also used for
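The shard-and-propagate approach above can be sketched in a few lines. This is a minimal illustration, not the architecture of any named customer: the shard count, the hash-based routing, the in-process queue standing in for the reliable message service, and the names `shard_for`, `post_status`, and `drain_outbox` are all assumptions made for the example.

```python
import hashlib
import queue

NUM_SHARDS = 8  # hypothetical; the slide speaks of hundreds to thousands

def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user id to a shard index with a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# One dict per shard stands in for one SQL database.
shards = [dict() for _ in range(NUM_SHARDS)]
# An in-process queue stands in for the reliable, async message service.
outbox = queue.Queue()

def _row(user_id):
    """Fetch (or create) the user's row on the shard that owns it."""
    return shards[shard_for(user_id)].setdefault(user_id, {"posts": [], "feed": []})

def post_status(author, followers, text):
    """Write to the author's shard, then enqueue one change message per follower."""
    _row(author)["posts"].append(text)
    for follower in followers:
        outbox.put({"to": follower, "from": author, "text": text})

def drain_outbox():
    """Apply queued changes to the followers' shards; returns messages applied."""
    applied = 0
    while not outbox.empty():
        msg = outbox.get()
        _row(msg["to"])["feed"].append((msg["from"], msg["text"]))
        applied += 1
    return applied
```

Between `post_status` and `drain_outbox` the followers' shards have not yet seen the change, which is the eventual consistency the business problem allows for.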
20. Many LARGE SCALE customers using similar patterns
• Patterns
• Sharding and reliable messaging
• Sharding and fan/out query layer
• Caching layer
• Customer Examples
• Social Networking: Facebook, MySpace, etc.
• Online electronic stores (cannot give names)
• Travel reservation systems (e.g. Choice International)
• MSN Casual Gaming
• etc.
21. • Require high availability
• Be able to scale out:
• Be able to quickly grow and change:
Move better support for these patterns into the Data Platform!
22. • NoSQL = operational and developer agility at low CapEx and OpEx!
• Low Cost
• Processing Paradigms
• Data Model Paradigms
• Ranges from devices, through OLTP Web 2.0 applications, to BigData Analytics
23. Data Model | Example Stores (apologies to the ones I did not list)
Simple Key-Value Pairs | Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
Wide Sparse Column Sets | HyperTable, Big Table, Cassandra, HBase, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
BLOBs | Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
JSON Documents | MongoDB, CouchBase, Riak, RavenDB
Graph | Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension
Objects and XML Documents | Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
Extended Relational | Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure
24. • You want:
• You can only get 2 of 3 (CAP Theorem)
• In Brave New World:
25. • Performance and Elastic Scale on Demand
• Automate management lifecycle (or fail)
• Simple deployment lifecycle
• No DB or OS Admin telling me what to do
26. • Code First and revise quickly
• Application-model first (before database)
• Flexible open data models
• You don’t know exactly what you are looking for
• Lower Pain of adoption and maintenance
• No DB or OS Admin telling me what to do
27. • Low CapEx, Low OpEx
• Built-in tunable High-Availability
• Data scale-out (Sharding)
• Processing scale-out (Map-Reduce, Fan-Out, tunable consistency)
• Flexible Data Models
• Integrate with BigData Analytics (e.g., Hadoop)
Many Relational Database Systems are incorporating these learnings!
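As a small illustration of the Map-Reduce style of processing scale-out named above, the sketch below counts events across partitions. The partitions, the event names, and the single-process execution are assumptions; a real framework such as Hadoop distributes the same three phases across machines.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical partitions of an event log, one list per shard.
PARTITIONS = [
    ["login", "post", "login"],
    ["post", "like", "like", "login"],
]

def map_phase(events):
    """Emit (key, 1) for every event in one partition."""
    return [(event, 1) for event in events]

def shuffle(mapped_partitions):
    """Group emitted values by key across all partitions."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values of each key."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(p) for p in PARTITIONS))
print(counts)
```

The map step runs independently per partition, which is what lets the processing scale out alongside the sharded data.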
28. • Provides Data Partitioning/Sharding at the Data Platform
• Enables applications to build elastic scale-out applications
• Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
• Auto-connect to right shard based on sharding key value
• Provides SPLIT resilient query mode
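The idea of routing by sharding key value and splitting a shard can be sketched with a range-based shard map. This is a conceptual model only, not the Data Platform's API: the `ShardMap` class, the shard names, and the non-negative integer keys are assumptions, and moving the rows that belong to the new range is left out.

```python
import bisect

class ShardMap:
    """Range-partitioned shard map: shard i holds keys in [lows[i], lows[i+1])."""

    def __init__(self):
        self.lows = [0]          # lower bound of each range, kept sorted
        self.names = ["shard0"]  # shard name per range (hypothetical names)

    def lookup(self, key):
        """Return the shard that owns `key` (keys are non-negative ints here)."""
        if key < 0:
            raise ValueError("key must be non-negative")
        return self.names[bisect.bisect_right(self.lows, key) - 1]

    def split(self, at):
        """Split the range containing `at` into two ranges at that boundary."""
        if at < 0 or at in self.lows:
            raise ValueError("invalid or existing boundary")
        pos = bisect.bisect_right(self.lows, at)
        self.lows.insert(pos, at)
        self.names.insert(pos, f"shard{len(self.names)}")
```

Because callers only ever ask `lookup` for the owner of a key, a split changes the map without changing application code, which is the point of connecting by key value rather than by shard name.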
29. • Flexible data is good, but:
• Procedural Scale-Out processing is good, but:
• Eventual Consistency is good, but:
• Simple Queries are good, but:
Many NoSQL Database Systems are starting to incorporate these learnings!
30. Online Business Application
Attract Individual Consumers:
- Provide interesting service
- Provide mobility
- Provide social
Monetize the Social:
- Improve individual experience
- Re-sell Aggregate Data (e.g., Advertisers)
Monetize Individual:
- Upsell service
- VIP
- Speed
- Extra Capabilities
31. [Diagram: SQL or NoSQL Store made up of several Primary Shards, each with Readable Replicas; arrows labeled Copy and Query]
• OLTP Workloads: Highly Available, High Scale, High Flexibility; mostly touching 1 to low number of shards
• Traditional OLAP Workloads: known schema; Data warehouse, “Star joins”
• Dynamic OLAP Workloads: 3Vs (Volume, Velocity, Variety); Exploratory; Scale-out queries, often using eventual consistent scale-out frameworks like Hadoop
33. http://www.windowsazure.com
Presentation | Speaker | Date and Time
Do We Have the Tools We Need to Navigate the New World of Data? | Dave Campbell | 2/29 9:00am PST
Onsite Interview * | Tim O’Reilly, Dave Campbell | 2/29 10:15am PST
Unleash Insights on All Data With Microsoft Big Data | Alexander Stojanovic | 2/29 11:30am PST
Office Hours (Q&A session) | Dave Campbell | 2/29 1:30pm PST
Hadoop + Javascript: What We Learned | Asad Khan | 2/29 2:20pm PST
Democratizing BI at Microsoft: 40,000 Users and Counting | Kirkland Barrett | 3/1 10:40am PST
Data Marketplaces For Your Extended Enterprise | Piyush Lumba | 3/1 2:20pm PST
34. • NoSQL and the Windows Azure Platform
http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
<choose from slides 3 – 10 as alternative intro pictures>
Timing: 1 minute
Key Points:
• Microsoft has changed as a company and become more open.
Script:
Microsoft has changed as a company and become more open. The old debate – black or white; open source or commercial software; us versus them – is simply no longer relevant. Today, many customers manage mixed IT environments. And they have told us that what matters today is maximizing their existing IT investments while having the freedom to choose new solutions that best support their business goals. To meet these customer needs, Microsoft is committed to openness.
Timing: 2 minutes
Key Points:
• We do not compete against open source as a category; we increasingly work collaboratively with this community.
• You may be surprised to learn what Microsoft is doing with open source. More and more, customers, partners and the industry understand that the work we are doing with open source is about helping customers and enabling a rich and robust ecosystem of developers and partners. The following slides will provide some great examples.
Script:
You may be surprised to learn what Microsoft is doing with open source. More and more, customers, partners and the industry understand that the work we are doing with open source is about helping customers and enabling a rich and robust ecosystem of developers and partners. We enable open source on our platforms. We recognize that if we’re going to use open source, then we also have to give back, especially if we want open source developers to continue to think of Windows and Windows Phone as platforms for them to develop on. For example, Windows Azure supports a wide range of development languages, including Java, PHP and Node.js, so that developers can build applications using any language, tool, or framework of their choice, including open source. Let’s review the following slides for some more detailed examples.
Timing: 2 minutes
Key Points:
• Device driver code contributions for Linux: enable better performance of Linux when virtualized with Hyper-V.
• CoApp: you are developing apps for Linux? Why not make them work on Windows and open up more opportunities for your app to get adopted?
• Windows Azure Virtual Machines enables customers to run both their existing Windows and Linux-based applications in the cloud. Compatible operating systems/images include CentOS, openSUSE, SUSE Linux Enterprise Server, Ubuntu, as well as Windows Server.
Script:
You may be surprised to learn what we are doing with Linux. We have learned a lot over the past decade. Embracing Linux on our platforms is a real business for us. For example, we work on a variety of interoperability initiatives with Linux vendors -- SUSE, Citrix, Red Hat, CentOS -- to provide support for Linux as a “first-class guest” on Hyper-V. Another great example is CoApp, which is an open-source package management system for Windows. The goal of the CoApp project is to create a community of developers dedicated to creating a set of tools and processes that enable other open source developers to create and maintain their open source products with Windows as a build target. Further, with Windows Azure Virtual Machines, customers can run both their existing Windows and Linux-based applications in the cloud. Compatible operating systems/images include CentOS, openSUSE, SUSE Linux Enterprise Server, Ubuntu, and Windows Server, further illustrating Microsoft’s commitment to openness for customers and partners.
FYI – Data sources and more information:
• Robert McMillan, Wired Enterprise (March 2012): http://www.wired.com/wiredenterprise/2012/03/mr-linux/. Note: the quote is accurate, but the broader article is all about Linus and Linux.
Timing: 2 minutes
Key Points:
• We’re committed to helping customers manage “big data”, working with the Apache Hadoop community to support Hadoop on Windows Server and Windows Azure.
• Our Big Data solution is also integrated into the Microsoft BI tools such as SQL Server Analysis Services, Reporting Services and even PowerPivot and Excel. This enables you to do BI on all your data, including data in Hadoop.
Script:
Just ten years ago, most business data was locked up behind big applications. We are now entering an era when unlocking this data and its potential to drive new knowledge and insights is becoming a key success factor for many ventures. To embrace this “Big Data revolution”, we’ve launched customer previews of Apache Hadoop-based solutions for Windows Server and Windows Azure, which enable Hadoop apps to be deployed in hours instead of days. The most recent customer previews are called Windows Azure HDInsight and Microsoft HDInsight for Windows Server. Both solutions embrace enterprise-ready Apache Hadoop to enable almost any user to begin viewing and truly analyzing Big Data, using such tools as Microsoft Excel, PowerPivot, and SQL Server Analysis Services. Regardless of the size or type of data, or where it’s stored, both HDInsight versions offer simple management via Microsoft System Center 2012, a shared codebase for platform consistency whether on Windows Server or Azure, and 100% compatibility with Hadoop. Customers such as Klout, Webtrends and the University of Dundee have been using the service to glean simple, actionable insights from complex data sets hosted in the cloud.
FYI – Data Sources and more information:
• “Opening Doors To Real Big Data Value: Hadoop On Windows Azure And Windows Server” (Oct 2012): http://blogs.technet.com/b/openness/archive/2012/10/24/opening-doors-to-real-big-data-value-hadoop-on-windows-azure-and-windows-server.aspx
• “Openness Customer Spotlight: Klout Uses Microsoft BI and Hadoop to Bolster Big Data Insights” (Sept 2012): http://blogs.technet.com/b/openness/archive/2012/09/07/klout-uses-microsoft-bi-and-hadoop-to-bolster-big-data-insights.aspx
• “Navigating the New World of Data” (Mar 2012): http://blogs.technet.com/b/openness/archive/2012/03/01/navigating-the-new-world-of-data.aspx
• Kurt Mackie, Redmondmag.com quote (Oct 2011): http://redmondmag.com/articles/2011/10/12/hadoop-efforts-announced-at-pass.aspx?admgarea=BDNA
Timing: 2 minutes
Key Points:
• Great Java experience on Windows Server and Windows Azure.
• Partners like Gigaspaces are taking advantage of Java support to provide services to customers with existing Java-based enterprise applications.
• Windows Azure plug-in for Eclipse with Java helps Eclipse users create and configure deployment packages of their Java applications for the Windows Azure cloud.
Script:
Customers and partners are taking advantage of the “first-class” Java experience on Windows Server and Windows Azure. For example, partners like Gigaspaces are now able to take advantage of Java support to provide services to customers with existing Java-based enterprise applications. Microsoft also continues to work on projects that foster interoperability with Java and Windows. For example, the Windows Azure SDK for Java includes a Windows Azure plug-in for Eclipse that provides templates and functionality that allow you to easily create, develop, test, and deploy Windows Azure applications using the Eclipse development environment. It is an Open Source project, whose source code is available under the Apache License 2.0 from the project’s site at http://sourceforge.net/projects/waplugin4ej/.
FYI – Data Sources and more information:
• Gigaspaces case study (Feb 2012): http://www.microsoft.com/casestudies/Windows-Azure/Gigaspaces/Solution-Provider-Streamlines-Java-Application-Deployment-in-the-Cloud/400000000081
Timing: 2 minutes
Key Points:
• Great example of how far the Linux experience has evolved over the past several years – from no PHP experience on Windows to PHP running extremely well and with high performance on both Windows and Linux.
• PHP releases now include support for both Windows and Linux.
Script:
Over the past several years, Microsoft and its partners have worked diligently with the PHP community to improve the experience PHP developers and users have on Windows Server and Windows Azure. Now the PHP community supports Windows right alongside Linux, including the recent release of PHP 5.4.0. René de Haas, CEO of a Dutch webhosting company called SoHosted, is a partner who has been instrumental in improving the PHP on Windows experience. According to René, “Between 2003 and 2012 we've seen the general opinion about Microsoft, Windows and PHP turn 180 degrees” due to the improvements made.
FYI – Data Sources and more information:
• “PHP 5.4 Available in Windows Azure Web Sites” (Nov 2012): http://blogs.technet.com/b/openness/archive/2012/11/27/php-5-4-available-in-windows-azure-web-sites.aspx
• “Evolution of PHP on Windows” (Mar 2012), including SoHosted interview: http://blogs.technet.com/b/openness/archive/2012/03/01/evolution-of-php-on-windows.aspx
Timing: 1 minute
Key Points:
• Firefox browser is well supported across cloud services (Office 365, SkyDrive, Bing, Skype).
• Microsoft created a Firefox plug-in for Windows Media Player.
• Mozilla has acknowledged how Microsoft’s commitment to HTML5 enables this support for Firefox and other modern browsers.
Script:
The Firefox browser is well supported across Microsoft’s cloud services like Office 365, SkyDrive, Bing, and Skype, and Microsoft has also created a Firefox plug-in for Windows Media Player. Those within the Mozilla community have acknowledged how Microsoft’s commitment to HTML5 enables this support for Firefox and other modern browsers.
FYI – Data Sources and more information:
• Blizzard quote reference: http://www.theregister.co.uk/2010/06/09/mozilla_man_on_apple_google_and_html5/
Timing: 1 minute
Key Points:
• Microsoft has worked with Drupal to improve interoperability, resulting in more choices for users.
Script:
Drupal is a popular open source content management system that powers many of the world's web sites. Microsoft has worked with Drupal to improve interoperability, resulting in more choices for users. The Screen Actors Guild recently migrated their Drupal site to Windows Azure. The SAG Awards, their biggest traffic day of the year, “went off with flying colors.”
FYI – Data Sources and more information:
• “Drupal + Windows Azure: A Winning Combination for SAG” (Feb 2012): http://blogs.technet.com/b/openness/archive/2012/02/29/drupal-windows-azure-a-winning-combination-for-sag.aspx
Timing: 2 minutes
Key Points:
• Node.js provides an end-to-end JavaScript experience for the development of a whole new class of real-time applications.
• With the work that we did to enable Node.js on Windows, not only did we support Windows, but the benchmarks for Linux also improved.
• Developers can also implement a Node.js application and deploy it to Windows Azure using Cloud9 IDE.
Script:
Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Microsoft’s support for Node.js on Windows Azure enables a new class of real-time applications. We also released the Windows Azure SDK for Node.js as open source, available on GitHub, and the Windows Azure Development Centers have great Node.js documentation, tutorials, samples and how-to guides to get you started with Node.js on Windows Azure. Also announced recently is support for Cloud9 IDE as a way to create Node.js applications and deploy to Windows Azure.
FYI – Data Sources and more information:
• Scott Fulton, ReadWriteWeb quote (Dec 2011): http://www.readwriteweb.com/cloud/2011/12/windows-azure-adds-nodejs-supp.php
Timing: 1 minute
Key Points:
• Patches have been submitted to Samba.
• Great example of how the relationship between an open source solution and Microsoft can evolve.
Script:
In late 2011, a patch to the Samba code was submitted that enables Linux clients to better interoperate with Microsoft Windows in mixed source environments. Contributed under GPL2+, the patch was an individual contribution made by Microsoft’s Stephen Zarkos (Open Source Technical Center team) in line with Samba policies in place at the time. Efforts also continue to move forward with Microsoft and the Samba team working together to support the SMB protocol. The comments by Chris Hertel of the Samba team reflect how the relationship between key open source solutions and Microsoft has been evolving in the past several years.
FYI – Data Sources and more information:
• “Driving Interoperability with the SMB Open Specifications” (Jun 2012): http://blogs.technet.com/b/openness/archive/2012/06/29/driving-interoperability-with-the-smb-open-specifications.aspx
Timing: 3 minutes
Key Points:
• The substantial growth of the Microsoft open source project community, Codeplex, which has tripled in size in the past two years, illustrates the momentum of Microsoft + Open Source.
• 9 of the top 10 most downloaded OSS projects run on Windows.
• In 2011 Microsoft launched WebMatrix -- a free, light-weight web development tool designed for quick website building and deployment. This tool puts open source tools at developers’ fingertips and these developers have downloaded more than one million open source web applications.
• Customers are benefitting from our work with open source solutions, including the more than 900 customers of the Microsoft-SUSE Alliance.
Script:
Our increased commitment to working with open source has sparked tremendous momentum and contributed to rapid growth of open source software on Windows – according to Sourceforge, 9 of the top 10 most downloaded OSS projects run on Windows today. (Side note: the complete project list is below; the only project that “isn’t supported on Windows” is the “Smart package of Microsoft's core fonts”, which doesn’t need to be supported because it obviously already runs on Windows.) Further, Codeplex, Microsoft’s open source project community, hosts more than 32,000 open source projects and has tripled its membership in just two years, from 300,000 members to more than 900,000 in 2012. Another great example is WebMatrix, a free, light-weight web development tool designed for quick website building and deployment. This tool puts open source tools at developers’ fingertips and these developers have downloaded more than one million open source web applications. Since its launch in 2011, there have been more than 1 million downloads.
And customers as well as developers are benefitting directly from these efforts, including the more than 900 customers of the Microsoft-SUSE Alliance, which delivers interoperability solutions that help customers to get more out of their mixed Windows and Linux environments.
FYI – Data Sources and more information:
• SUSE, Codeplex, and WebMatrix stats current as of Nov 2012
• Sourceforge top projects site (http://sourceforge.net/top/). “Most downloads over all time” as of Nov 25, 2012:
  1. VLC media player
  2. eMule
  3. Azureus / Vuze
  4. Ares Galaxy
  5. 7-Zip
  6. Smart package of Microsoft's core fonts (“not supported on Windows” by Sourceforge definition)
  7. FileZilla
  8. PortableApps.com: Portable Software/USB
  9. MinGW - Minimalist GNU for Windows
  10. Notepad++ Plugin Manager
<OPTIONAL SLIDE: Customize with local announcements as appropriate>
Timing: 1 minute
Key Points:
• MongoDB has been supported on Windows Azure for some time, but recently the setup, deployment, and development experience has been streamlined by the release of the MongoDB Installer for Windows Azure.
• In October, MongoLab released the preview of a MongoDB-as-a-Service offering through the Windows Azure Store. MongoLab is a full-featured MongoDB cloud database solution that completely automates the operational aspects of running MongoDB.
Script:
MongoDB is a very popular NoSQL database that is easy to learn if you have JavaScript (or Node.js) experience and is used in many high-volume web sites including Craigslist, FourSquare, Shutterfly, The New York Times, MTV, and others. People have been using MongoDB on Windows Azure for some time, but recently the setup, deployment, and development experience has been streamlined by the release of the MongoDB Installer for Windows Azure. It’s now easier than ever to get started with MongoDB on Windows Azure! Also, in October, MongoLab released the preview of a MongoDB-as-a-Service offering through the Windows Azure Store. MongoLab is a full-featured MongoDB cloud database solution that completely automates the operational aspects of running MongoDB. With the MongoLab cloud platform developers can deploy and manage highly-available databases for their applications and leverage automated backups, web-based tools, 24/7 monitoring, and expert support.
FYI – Data Sources and more information:
• For more detail on the MongoDB Installer for Windows Azure: http://blogs.msdn.com/b/interoperability/archive/2012/07/09/mongodb-installer-for-windows-azure.aspx
• For more detail on the MongoLab service: https://www.windowsazure.com/en-us/store/service/?name=mongolab
Timing: 3 minutes
Key Points:
• Windows Azure is an open and flexible cloud platform.
• Developers can build applications using any language, tool or framework – including open source languages such as PHP, Java, and Node.js, and other open source tools.
• Our June 2012 technical preview release brought support for Linux on Windows Azure Virtual Machines and further support for multiple frameworks and popular open source applications through Windows Azure Web Sites.
Script:
As part of our cloud platform, interoperability is a design-time requirement. Windows Azure is an open and flexible cloud platform that enables customers to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters. To do it right we know we’ve got to be open. Developers can build applications using any language, tool or framework – including open source languages such as PHP, Java, and Node.js, and other open source tools – which means they can utilize familiar open source skills on Microsoft's cloud platform. Currently, features and services in Windows Azure are exposed using open REST protocols. Windows Azure client libraries are available for multiple programming languages and are released under an open source license and hosted on GitHub. As Microsoft continues to provide incremental improvements to Windows Azure, we remain committed to working with developer communities. Other recent interoperability enhancements include: Eclipse Plugin for Java, MongoDB support, code configuration for hosting Solr/Lucene, and a Hadoop services preview. Also, our June 2012 technical preview brought support for Linux images on Windows Azure Virtual Machines and further support for multiple frameworks and popular open source applications through Windows Azure Web Sites (note: see appendix slides for more detail on Virtual Machines and Web Sites).
Timing: 1 minute
Key Points:
• Windows Azure Web Sites enable developers to quickly and easily deploy sites with support for multiple frameworks and popular open source applications to a highly scalable cloud environment.
Script:
Windows Azure Web Sites allows you to build highly scalable websites on Windows Azure. You can quickly and easily deploy sites to a highly scalable cloud environment that allows you to start small and scale as traffic grows. Windows Azure Web Sites uses the languages and open source apps of your choice and supports deployment with Git, FTP, and TFS. You can easily integrate other services like MySQL, SQL Database, Caching, CDN, and Storage.
In January 2011 Microsoft launched WebMatrix -- a free, light-weight web development tool designed for quick web site building and deployment. This tool puts open source tools at developers’ fingertips:
• Choose from a gallery of popular open source web applications to get a site up and running in a few clicks.
• Installs PHP & MySQL for necessary apps.
• Edit your code or database within WebMatrix.
• Utilizes NuGet to gain access to a community-driven gallery of ASP.NET “helpers” that give you small snippets of code to perform common tasks (bit.ly, Facebook integration, Twitter, etc.).
Example MSN Casual Gaming:
• ~2 Million users at launch
• ~86 Million service requests/day
• 135 Windows Azure Data Services Hosting VMs
• ca. 18K connections in Connection Pools; this could grow with traffic
• ca. 1200 SQL Azure requests/second spread across all partitions during peak load
• ~90% reads vs 10% writes (this varies per storage type)
• ~200 bytes of storage per user
• ~20% of database storage is currently used, but expect this to grow
• Sharded over 400 SQL Azure Databases
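To put the MSN Casual Gaming figures in perspective, the arithmetic below derives a few per-unit rates from them. The even spread of load across VMs and databases is an assumption made for the calculation; the notes do not say how evenly traffic was actually distributed.

```python
# Figures quoted in the MSN Casual Gaming example.
requests_per_day = 86_000_000
hosting_vms = 135
pooled_connections = 18_000
peak_sql_requests_per_sec = 1_200
databases = 400

# Assumes an even spread over time, VMs, and databases.
avg_requests_per_sec = requests_per_day / 86_400  # seconds per day
connections_per_vm = pooled_connections / hosting_vms
peak_requests_per_db = peak_sql_requests_per_sec / databases

print(f"average service requests/second: {avg_requests_per_sec:.0f}")
print(f"pooled connections per VM:       {connections_per_vm:.0f}")
print(f"peak SQL requests/second per DB: {peak_requests_per_db:.0f}")
```

Under that assumption, each of the 400 databases sees only a handful of requests per second at peak, which suggests the sharding was sized for growth rather than for the launch load.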
Note: Large companies invest resources in building these platforms instead of using existing relational platforms!
No DB or OS Admin telling me what to do!
Performance and Scale:
• Map/Reduce Patterns
• Eventual consistency (trade-off due to CAP)
• Sharding
• Caching
Automate management Lifecycle:
• Elastic Scale on demand (no need to pay for resources until needed)
• Automatic Fail-over
• Scalable Schema version rollout
• Perf troubleshooting
• Auto alerting
• Auto load balancing
• Auto resourcing (e.g., auto splits based on policies)
• Declarative policy-based management
Code First and revise quickly:
• Working software over comprehensive documentation
• Responding to change over following a plan
Application-model first (before database):
• Dictates the data model and queries
Flexible data models:
• No a priori modeling: Data first, schema later/Open Schema
• Key/Value stores
• Reduced impedance mismatch: JSON, XML, YAML
You don’t know exactly what you are looking for:
• Map/Reduce for ad hoc analysis
• Provide Search across all your data instead of just query
Lower Pain of adoption and maintenance:
• From code to deployment & “monetization” of data, services, apps and tenants
• Rich Services out of the Box
• Data and services mashup
• Easy troubleshooting of deployed apps
No DB or OS Admin telling me what to do
• Low CapEx, Low OpEx: SQL Azure and other Platform as a Service offerings
• Built-in High-Availability (tunable): SQL Azure has quorum based built-in replicas
• Data scale-out (Sharding): SQL Azure Federations
• Processing scale-out (Map-Reduce, Fan-Out, tunable consistency)
• Flexible Data Models: JSON (& XML) support; Sparse columns/Column sets
• Integrate with BigData Analytics (e.g., Hadoop)
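The "quorum based" and "tunable consistency" ideas can be made concrete with the general quorum rule used by replicated stores: with N replicas, a read quorum of R and a write quorum of W always share at least one replica when R + W > N. The sketch below states that rule; the function names are hypothetical and it does not describe SQL Azure's actual replication protocol.

```python
def quorums_overlap(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n

def majority(n):
    """Smallest replica count that forms a majority of n."""
    return n // 2 + 1

# Three replicas: majority reads and writes overlap, so reads see the latest write.
print(quorums_overlap(3, majority(3), majority(3)))
# Three replicas, R = W = 1: fast, but a read may miss the latest write.
print(quorums_overlap(3, 1, 1))
```

Tuning R and W trades latency against consistency: lowering either makes that operation cheaper, and once R + W drops to N or below the store can only promise eventual consistency.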
• SharePoint – BI, Enterprise Search, Enterprise Content Management, Collaboration
• Transform – ETL
• Clean – Data Quality, Augmentation
• Discover – Search, Meta-data, Classification, Information Catalog
• Infer – Recommendation Engines, Machine Learning
• Share – Publish, Collaborate
• Govern – Lineage & Impact Analysis, Master Data Management
• Marketplace – Private, Public, Bing Data, 3rd Party Data Sources, Models, Algorithms, APIs