3. 20%
Structured Data
80%
Unstructured Data
Click Stream
Social
Geolocation
Machine Data
Logs
Videos
Images
Text
Sensors
Big Data is about building
new analytic applications
based on
new types of data,
in order to
better serve your customers
and drive a better
competitive advantage
David McJannet, Hortonworks http://www.informationweek.com/big-data/news/big-data-analytics/big-data-a-practical-definition/240160412#disqus_thread
5. How do I optimize my
fleet based on weather
and traffic patterns?
What’s the social
sentiment for my
brand or products?
How do I better
predict future
outcomes?
6. GAIN COMPETITIVE ADVANTAGE BY MOVING FIRST AND FAST IN YOUR
INDUSTRY
Web app
optimization
Smart meter
monitoring
Equipment
monitoring
Advertising
analysis
Life sciences
research
Fraud
detection
Healthcare
outcomes
Weather
forecasting
Natural resource
exploration
Social network
analysis
Churn
analysis
Traffic flow
optimization
IT infrastructure
optimization
Legal
discovery
8. Discover data with Data Explorer
Combine with information from
other sources via Azure Marketplace
Refine with advanced analytics
Connecting
with the World’s Data
Immersive insights for all users
Insights on any data
Embedded insights with simplified
programming
Immersive Insight,
Wherever you are
Enterprise-ready Hadoop
Windows simplicity and
Manageability for Hadoop
Extend data warehouse with Hadoop
Scale & elasticity of the cloud
Open Big Data Platform
Any Data, Any Size, Anywhere
11. HDInsight Documentation and Tutorials
Programming Hive Book
http://blogs.msdn.com/cindygross
http://www.windowsazure.com/en-us/home/features/preview/
http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW
http://www.microsoft.com/en-us/download/details.aspx?id=36803
http://www.microsoft.com/en-us/download/details.aspx?id=38395
Editor's notes
ERP, SCM, CRM, and transactional web applications are classic examples of systems that process transactions. The highly structured data in these systems is typically stored in SQL databases.
Web 2.0 is about how people and things interact with each other or with your business. Web logs, user click streams, social interactions and feeds, and user-generated content are classic places to find interaction data.
Ambient data comes from the “Internet of Things”. Mary Meeker has predicted 10B connected devices by 2015. Sensors for heat, motion, and pressure, along with RFID and GPS chips in things such as mobile devices, ATMs, and even aircraft engines, are just some examples of “things” that output ambient signals.
There are multiple types of data: personal -> organizational -> public -> private. So we should NOT limit our thinking to just the data that flows through an organization. For example, the mortgage-related data you may have COULD benefit from being blended with external data found in Zillow. Moreover, the government has the Open Data Initiative, which means that more and more data is being made publicly available.
So, for the purposes of this presentation, let me set the context for what is meant by Big Data. When we refer to Big Data we are referring to data sets that have the following characteristics or attributes: (1) large data volume (size): >10TB; (2) high data velocity (growth): >25% YoY; (3) wide data variety (form): >15% unstructured data (text, e-mail, weblogs, video, images, sound).
As you can see from this graphic, we traditionally look at data through two lenses: structured data, which accounts for 20% of all data and is what enterprises historically focused on, and unstructured data, which accounts for approximately 80% of all data but which nobody much cared about until recently. It is often referred to as dark data and sits beneath the surface. Historically we did not have the tools to do much with unstructured data, so it was just noise.
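The three thresholds quoted in these notes can be written down as a small check. This is only an illustrative sketch in Python: the `DatasetProfile` type, its field names, and the sample figures are hypothetical, and treating the three characteristics as jointly required is an assumption, since the notes do not say whether one or all must hold.

```python
from dataclasses import dataclass

# Thresholds as quoted in the speaker notes.
MIN_VOLUME_TB = 10.0            # volume: more than 10 TB
MIN_GROWTH_YOY = 0.25           # velocity: more than 25% year-over-year growth
MIN_UNSTRUCTURED_SHARE = 0.15   # variety: more than 15% unstructured data


@dataclass
class DatasetProfile:
    """Hypothetical summary of a data set; the field names are illustrative."""
    volume_tb: float
    growth_yoy: float           # 0.25 means 25% growth per year
    unstructured_share: float   # 0.15 means 15% of the data is unstructured


def big_data_traits(profile):
    """Report which of the three characteristics the profile exceeds."""
    return {
        "volume": profile.volume_tb > MIN_VOLUME_TB,
        "velocity": profile.growth_yoy > MIN_GROWTH_YOY,
        "variety": profile.unstructured_share > MIN_UNSTRUCTURED_SHARE,
    }


def is_big_data(profile):
    """Assumption: all three characteristics must hold at the same time."""
    return all(big_data_traits(profile).values())


# Made-up example profiles, not measurements from any real system.
clickstream = DatasetProfile(volume_tb=40.0, growth_yoy=0.60, unstructured_share=0.85)
ledger = DatasetProfile(volume_tb=2.0, growth_yoy=0.05, unstructured_share=0.0)

print(is_big_data(clickstream))  # True
print(is_big_data(ledger))       # False
```

Keeping `big_data_traits` separate from `is_big_data` makes it easy to see which characteristic a data set misses, or to relax the all-three assumption.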
Today new types of questions are being asked to drive the business. These questions include:
Questions on social and web analytics, e.g. What is my brand and product sentiment? How effective is my online campaign? Who am I reaching? How can I optimize or target the correct audience?
Questions that require connecting to live data feeds, e.g. a large shipping company uses live weather feeds and traffic patterns to fine-tune its ship and truck routes, leading to improved delivery times and cost savings. Retailers analyze sales, pricing, economic, demographic, and live weather data to tailor product selections at particular stores and determine the timing of price markdowns.
Questions that require advanced analytics, e.g. financial firms use machine learning to build better fraud detection algorithms that go beyond simple business rules involving charge frequency and location to also include an individual’s customized buying patterns, ultimately leading to a better customer experience.
Organizations that are able to take advantage of Big Data to ask and answer these new types of questions will be able to more effectively differentiate and derive new value for the business, whether in the form of revenue growth, cost savings, or entirely new business models. Gartner asserts that “By 2015 businesses that build a modern information management system will outperform their peers financially by 20 percent.” McKinsey agrees, confirming that organizations that use data and business analytics to drive decision making are more productive and deliver higher return on equity than those that don’t.
There has never been a more exciting time in the world of data. We are seeing the convergence of significant trends that are fundamentally transforming the industry, and a new era of tech innovation in areas like social, mobile, advanced analytics, and machine learning. We’re seeing an explosion of data: there is an entirely new scale and scope to the kinds of data we are trying to gain insights from. There’s a lot of talk about this. Estimates are that the total amount of digital information in the world is increasing 10X every 5 years, with 85% of this data coming from new data types, e.g. sensors, RFID tags, and web logs. This presents a huge opportunity for businesses that tap into this new data to identify new opportunities and areas for innovation. However, having a platform that supports the data trend is only part of today’s challenge; you also need to make the data easier for people to access so that they can draw insights and make better decisions. Think about the user experience: with everything we are able to do on the Web, our experiences on social media sites, and how we’re discovering, sharing, and collaborating in new ways, user expectations of business and productivity applications are changing as well.
For customers with large or diverse datasets, Microsoft’s Big Data solution unleashes actionable insights from both structured and unstructured data. Unlike the competition, it offers insights to everyone through familiar tools such as Office and SharePoint, and the ability to unlock hidden value by connecting to publicly available data and services. In addition, it brings the simplicity of Windows to Hadoop through integration with key Microsoft components including Active Directory and System Center, and offers the elastic scale of the cloud through a Hadoop-based service on Windows Azure.
Any Data, Any Size, Anywhere: Microsoft enables customers to realize the benefits of Big Data through a modern data management platform that seamlessly handles data of all types (including structured, unstructured, and real-time data) and scale. We bring the simplicity of Windows to Hadoop, extend your data warehouse with Hadoop, and offer the elastic scale of the cloud to Big Data.
Enterprise-ready Hadoop based on HDP: Microsoft offers enterprise-ready Hadoop based on the Hortonworks Data Platform (HDP 1.1), which offers the most reliable, innovative, and trusted distribution available. Microsoft and Hortonworks together deliver this for Windows Server and Azure.
Bringing the Simplicity and Manageability of Windows to Hadoop: Microsoft accelerates deployment of Hadoop in the enterprise by simplifying setup and management, while enhancing ease of use and performance.
Seamlessly Extend your Data Warehouse with Hadoop: Microsoft enables customers to extend their enterprise data warehouses with Hadoop connectors for SQL Server and the Parallel Data Warehouse appliance. In addition, HDP 1.0 improves integration of Hadoop with relational data warehouses through HCatalog and Services Integration. This provides SQL-like language access to Hadoop so that customers can enrich their analysis by bringing insights from Hadoop environments into the enterprise data warehouse and BI systems.
You can also move data from a warehouse to Hadoop, which is useful when you need structured data from the database to enhance a mining model on Hadoop.
Seamless Scale and Elasticity of the Cloud: Unlike most competitors, Microsoft offers two options for deploying Hadoop on Windows: in the cloud or on premises.
Open Big Data Platform: Microsoft is committed to shipping an open Big Data platform. First, we have a strategic partnership with Hortonworks, the market-leading pioneer of Hadoop, with more Apache Hadoop committers than any other organization in the world. Second, through HDP 1.0, we offer a pure, 100% open source distribution of Apache Hadoop. There are no proprietary components. Unlike Oracle and other competitors, Microsoft is broadening access by giving back to the Apache Hadoop community. In tandem with Hortonworks, we are already submitting our first proposals to Apache projects, including new JavaScript libraries for Hadoop being developed by Microsoft, as well as the Hive ODBC Driver.
Connecting with the World’s Data: Unlike the competition, Microsoft Big Data enables customers to make breakthrough discoveries by boosting their data and models with publicly available data and services from several sources, including social media sites like Twitter and Facebook. Thanks to applications and mining algorithms on Windows Azure Marketplace, customers can uncover hidden patterns in their data.
Discover Data: Today, it is hard enough to find the right dataset within an organization, let alone outside it. A typical analyst spends too much time searching for the right data from thousands of sources, which adversely impacts productivity.
We will move from a world of search to one of discovery, where information is brought to users based on who they are and what they are working on.
Combine with the World’s Data: By combining the data you need across personal, corporate, community, and world data, you can derive much deeper insight into and understanding of your business.
Refine with External Data: We enable customers to convert their raw data into credible, consistent data by enriching it through enterprise information management capabilities and advanced analytics. SQL Server provides strong data transformation capabilities through SQL Server Integration Services (SSIS), data cleansing through SQL Server Data Quality Services (DQS), and data governance through SQL Server Master Data Services (MDS). For predictive analytics, we offer data mining tools in SQL Server Analysis Services (SSAS). Through Microsoft’s self-service tools as well as the Data Mining Add-ins, customers can access and mash up data from virtually any source, including data from the Windows Azure Marketplace, and continue to refine those data sets to create compelling analytical applications.
Immersive Insight, Wherever you are: Through integration with its familiar Office and business intelligence tools, Microsoft Big Data offers customers new tools to gain insight from all types of data, no matter the size or complexity. With Microsoft Big Data, data once treated as exhaust or low value can be mined for gold.
Immersive insights for all users: Through a new add-in for Excel, Microsoft enables analysts and business owners to interact with, and gain valuable insight from, Hadoop functions, all through the very familiar Excel interface. No other vendor offers this capability.
Immersive Insights from any data: Microsoft’s Big Data solution unlocks insights from structured and unstructured data through integration with existing Microsoft BI tools.
Driving Insights through simplified Programming: Microsoft provides insights to more developers through simplified programming.
Microsoft simplifies programming on Hadoop through integration with .NET and new JavaScript libraries that will make JavaScript a first-class language on Hadoop. The new JavaScript libraries will enable developers to write powerful MapReduce programs in JavaScript with fewer lines of code. Developers can also deploy their JavaScript code from a simple web browser on any device that supports HTML5.
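For readers unfamiliar with the MapReduce model mentioned here, the following is a minimal in-memory sketch of its three steps (map, group by key, reduce) using a word count. It is written in plain Python and does not use Hadoop or the JavaScript libraries described in these notes, whose API is not shown in this deck; all function names and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Made-up log lines for illustration only.
logs = ["GET /home", "GET /cart", "POST /cart"]
print(word_count(logs))
```

On a real cluster the mapper and reducer run in parallel across many machines and the framework handles the grouping; only the two small functions are written by the developer, which is where the "fewer lines of code" claim comes from.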
The world of data is changing in a big way, and customer expectations are changing accordingly. We offer the most complete and scalable portfolio of data storage capabilities for structured, unstructured, and streaming data, both on premises and in the cloud. Customers can unleash new value by discovering, enriching, and connecting to data, services, and people outside their organizations, and deliver new insights into Big Data for all users through familiar tools such as Office and SharePoint. Specifically, Microsoft’s Big Data solution offers the best end-to-end platform to manage any data, any size, anywhere, with our industry-leading database products SQL Server 2012 and the SQL Server Parallel Data Warehouse appliance, streaming data via SQL Server StreamInsight, and new capabilities such as our Hadoop-based distribution on Windows Azure and Windows Server for processing petabyte-scale unstructured data. New value is created by enriching your data with the world’s data through the industry’s first data marketplace, Azure Marketplace DataMarket. Unlock actionable insights for all users through familiar tools such as PowerPivot for Excel and Power View for SharePoint, delivered wherever you are and on any device.