19. Hadoop: The Definitive Guide by Tom White
SQL Server Sqoop http://bit.ly/rulsjX
JavaScript http://bit.ly/wdaTv6
Twitter https://twitter.com/#!/search/%23bigdata
Hive http://hive.apache.org
Excel to Hadoop via Hive ODBC http://tinyurl.com/7c4qjjj
Hadoop On Azure Videos http://tinyurl.com/6munnx2
Klout http://tinyurl.com/6qu9php
Microsoft Big Data http://microsoft.com/bigdata
Denny Lee http://dennyglee.com/category/bigdata/
Carl Nolan http://tinyurl.com/6wbfxy9
Cindy Gross http://tinyurl.com/SmallBitesBigData
Hadoop is part of NoSQL (Not Only SQL) and it's a bit wild. You explore in and with Hadoop. You learn new things. You test hypotheses on unstructured jungle data. You eliminate noise. Then you take the best lessons and share them with the world via a relational or multidimensional database.
Atomicity, consistency, isolation, durability (ACID) is used in relational databases to ensure immediate consistency. But what if eventual consistency is good enough? In stomps BASE: basically available, soft state, eventual consistency.
Scale up or scale out? Pay up front or pay as you go? Which IT skills do you utilize?
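The BASE idea can be sketched in a few lines of Python. This is a toy model, not the behavior of any particular NoSQL store: two replicas receive the same writes in different orders, and a last-writer-wins rule keyed on a timestamp makes both converge once all writes have propagated.

```python
# Toy sketch of eventual consistency (BASE). The keys, timestamps,
# and values below are made up for illustration.

def apply_writes(writes):
    """Apply (key, timestamp, value) writes, keeping the latest timestamp per key."""
    state = {}  # key -> (timestamp, value)
    for key, ts, value in writes:
        if key not in state or ts > state[key][0]:
            state[key] = (ts, value)
    return {k: v for k, (ts, v) in state.items()}

# The same three writes arrive at each replica in a different order.
writes = [("user:1", 1, "alice"), ("user:1", 3, "alice2"), ("user:1", 2, "bob")]
replica_a = apply_writes(writes)
replica_b = apply_writes(list(reversed(writes)))

# Both replicas converge to the write with the highest timestamp.
assert replica_a == replica_b == {"user:1": "alice2"}
```

ACID gives you that agreement on every read; BASE only promises it after the writes stop arriving, which is often good enough for exploration-style workloads.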
Hive is a data warehouse system that sits on top of Hadoop. HiveQL (HQL) generates (possibly multiple) MapReduce programs to execute the joins, filters, aggregates, etc. The language is very SQL-like, perhaps closest to MySQL's dialect, but still very familiar.
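The translation Hive performs can be illustrated with a tiny Python simulation of the map, shuffle, and reduce phases for one aggregate query. The sample rows are invented; Hive itself generates and runs the real jobs across the cluster for a query like `SELECT gender, COUNT(*) FROM gender2007 GROUP BY gender`.

```python
# Toy simulation of how a HiveQL GROUP BY aggregate becomes MapReduce.
from itertools import groupby
from operator import itemgetter

rows = [{"gender": "F"}, {"gender": "M"}, {"gender": "F"}]  # stand-in table

# Map phase: each mapper emits a (key, 1) pair per input row.
mapped = [(row["gender"], 1) for row in rows]

# Shuffle phase: intermediate pairs are sorted/grouped by key.
mapped.sort(key=itemgetter(0))

# Reduce phase: each reducer sums the counts for its key.
counts = {key: sum(v for _, v in group)
          for key, group in groupby(mapped, key=itemgetter(0))}

assert counts == {"F": 2, "M": 1}
```

A multi-way join or a query with several aggregates may compile to a chain of such jobs, which is why HiveQL can produce more than one MapReduce program per statement.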
Get your data from anywhere. There's a data explosion, and we can now use more of it than ever before. The HadoopOnAzure.com portal provides an easy interface to pull in data from sources including secure FTP, Amazon S3, Azure blob store, and the Azure Data Market. Use Sqoop to move data between Hadoop and SQL Server, PDW, or SQL Azure. The Hive ODBC driver lets you display Hive data in Excel or other applications.
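A Sqoop import of a SQL Server table into HDFS looks roughly like the command below. The server name, database, table, user, and target directory are all placeholders; adjust them for your environment.

```shell
# Sketch of a Sqoop import from SQL Server into HDFS (placeholder values).
sqoop import \
  --connect "jdbc:sqlserver://myserver:1433;databaseName=Sales" \
  --username hadoop_user -P \
  --table Orders \
  --target-dir /user/hadoop/orders \
  --num-mappers 4
```

`--num-mappers` controls how many parallel tasks split the import, which is the same scale-out idea the rest of Hadoop relies on.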
Many equate big data with MapReduce, and in particular with Hadoop. However, other applications such as streaming, machine learning, and PDW-type systems can also be described as big data solutions. Big data is unstructured, flows fast, has many formats, and/or has quickly changing formats. How big "big" is really depends on what is too big or complex for your environment (hardware, people, software, processes). Big data processing is done by scaling out on commodity (low-end enterprise) hardware.
Big data solutions match the right set of tools to the right set of problems (architectures are compositional, not monolithic). You need to select the appropriate combination of storage, analytics, and consumers.
For demo steps see: http://blogs.msdn.com/b/cindygross/archive/2012/05/07/load-data-from-the-azure-datamarket-to-hadoop-on-azure-small-bites-of-big-data.aspx
Big data is often described as problems that have one or more of the 3 (or 4) Vs: volume, velocity, variety, variability. Think about big data when you describe a problem with terms like: tame the chaos, reduce the complexity, explore, I don't know what I don't know, unknown unknowns, unstructured, changing quickly, too much for what my environment can handle now, unused data.
Volume = more data than the current environment can handle with vertical scaling; the need to make use of data that is currently too expensive to use
Velocity = a small decision window compared to the data change rate; ask how quickly you need to analyze and how quickly data arrives
Variety = many different formats that are expensive to integrate, probably from many data sources/feeds
Variability = many possible interpretations of the data
It’s not the hammer for every problem and it’s not the answer to every large store of data. It does not replace relational or multi-dimensional dbs, it’s a solution to a different sort of problem. It’s a new, specialized type of db for certain scenarios. It will feed other types of dbs.
Microsoft takes what is already there, makes it run on Windows, and offers the option of full control or simplification.
Hadoop in the cloud simplifies management.
Hadoop on Windows lets you reuse existing skills.
JavaScript opens up more hiring options.
The Hive ODBC Driver / Excel add-in lets you combine data, move data.
Sqoop moves data: a Linux-based version to/from SQL is available now, a Windows-based version soon.
Demo 2 – Mashup
1) Hive Pane
a. Excel, blank worksheet, Data
b. Use your HadoopOnAzure cluster
c. Object = Gender2007 or whatever table you pre-loaded in Hive (select * from gender2007 limit 200)
d. KEY POINT = pulled data from multiple files across many nodes and displayed via ODBC in a user-friendly format – not easy in the Hadoop world
2) PowerPivot
a. KEY POINTS = uses local memory, pulls data from multiple data sources (structured and unstructured), can be stored/scheduled in SharePoint, creates relationships to add value – MASHUP
b. Excel file DeviceAnalysisByRegion.xlsx (worksheet with region/country data, relationship defined between Gender2007 country and this country data); click on the PowerPivot tab and open a blank tab
c. Click on PowerPivot Window – show that each tab is data from a different source: hivesampletable (Hadoop/unstructured) and regions (could be anything/structured)
d. Click on diagram view – show relationships, rich value
e. PivotTable > PivotChart > New
f. Close the Hive query window
g. Values = count of platform, axis = platform, zoom to selection
h. Slicers Vertical = regions hierarchy
i. Region = North America, country = Canada == Windows Phone jokes
j. KEY = load to SharePoint, schedule refreshes, use for Power View
Expand your audience of decision makers by making BI easier with self-service and visualization.
Our products interact and work together, plus one company for questions/issues.
Use existing hardware and employees.
Expand options for hiring/training/re-training with familiar tools.
Familiar tools = less ramp-up time.
Cloud = elasticity, easy scale up/down, pay for what you use.
Easier to move data to/from HDFS.
It’s about separating the signal from the noise so you have insight to make decisions to take action. Discover, explore, gain insight.
Familiar tools, new tools, ease of use
Take action! All the exploring doesn't help if you don't do something. That something might be starting another round of exploring, but eventually DO SOMETHING!