Big Data World
With Azure HDInsight
About
▪ Koray Kocabaş
▪ Data Platform (SQL Server) MVP
▪ Yemeksepeti Business Intelligence
▪ Bahcesehir University Instructor
▪ @koraykocabas
▪ https://tr.linkedin.com/in/koraykocabas
▪ Blog: http://www.misjournal.com
▪ E-Mail: koraykocabas@outlook.com
Evolution of Data
Internet of Things
Web 2.0
ERP/CRM
• Clickstream
• Sensors / RFID / Devices
• Log Files
• Spatial & GPS Coordinates
• Social Media
• Mobile
• Advertising
• eCommerce
• Digital Marketing
• Search Marketing
• Recommendations
• Payables
• Payroll
• Inventory
• Contacts
• Deal Tracking
• Sales Pipeline
Gigabytes Terabytes Petabytes Exabytes
Big Data Utility Gap
70 % of data
generated by
customers
80 % of data
being stored
3 % being
prepared for
analysis
0.5 % being
analyzed
< 0.5 % being
operationalized
How Does this Work in Practice
• Obsessively collect data
• Keep it forever
• Put the data in one place
Store
Everything
• Cleanse, organize and manage your data
• Make the right tools available
• Use the resources wisely to compute, analyze and understand data
Analyze
Anything
• Use insights to iteratively improve your product
Build the Right
Thing
Big Data isn’t meaningful
Big Data is not just data
65 + Million Members
50 Countries
1000 + Devices Supported
~25 PB data warehouse in the cloud (Read 10%)
~550 Billion events daily
• 20 Million songs
• 24 Million Active Users
• 8 Million Daily Active Users
• 1 TB of Compressed Data Generated From Users Per Day
• 700 node Hadoop Cluster
Big Data is not just data
Cannes Lion 2014 - Grand Prix - Titanium : Honda 'Sound of Honda Ayrton Senna 1989'
2000+ sensors, 200 GB of data per race
Big Data is not just data
Boeing generates 20 TB of data per hour
~10 billion rows processed (daily), ~750 million rows in the result set (daily)
New E-Commerce Big Data Flow
Purchase
User
Product
Data Warehouse
Store it All
Overview (ETL to ELT)
Demand
Architecture
Data Loading
Data Preparation
Analytics
Validation
Problem: How can we track?
Web Analytics Companies
Google Analytics & Adobe Omniture
Problem 1: How can we collect data
Problem 2: How can we store data
Problem 3: How can we visualize data
Problem 4: How can we predict data
(Placeholder: put the Google Analytics / Adobe example here)
Azure vs Amazon
Collect Process Analyze Visualize Prediction
Store
Ready to Use
MOOC
Big Data Analytics, Implementing Big Data Analysis, Big Data Analytics with HDInsight, Big
Data and Business Analytics Immersion, Getting Started with Microsoft Azure Machine
Learning
Real World Big Data in Azure, Big Data on Amazon Web Services, Reporting with MongoDB,
Cloud Business Intelligence, HDInsight Deep Dive: Storm HBase and Hive, Data Science &
Hadoop Workflows at Scale With Scalding, SQL on Hadoop - Analyzing Big Data with Hive
Introduction to Big Data Analytics, Machine Learning with Big Data, Big Data Analytics for
Healthcare, Data Science at Scale, The Data Scientist's Toolbox, R Programming
Master Big Data and Hadoop Step by Step, Hadoop Essentials, Hadoop Starter Kit, Data
Analytics using Hadoop eco system, Big Data: How Data Analytics Is Transforming the World,
Applied Data Science with R, Hadoop Enterprise Integration
Data Science and Analytics in Context, Introduction to Big Data with Spark, Data Science and
Machine Learning Essentials, Machine Learning for Data Science and Analytics, Statistical
Thinking for Data Science and Analytics
OLTP vs Hadoop
Hadoop Ecosystem
One more cup of coffee
https://azure.microsoft.com/en-us/pricing/details/hdinsight/
https://azure.microsoft.com/en-us/pricing/calculator/#
Developed by Facebook. Later it was adopted in Apache as an open source project.
A data warehouse infrastructure built on top of Hadoop for providing data
summarization, query and analysis
Integration between Hadoop and BI and visualization
Provides a SQL-like language called HiveQL to query data
Supports indexes (CREATE INDEX) and partitioning
Often said not to support UPDATE (this isn't correct)
Hive provides Users, Groups, Roles. But it's not designed for high security.
Console (hive>), script, ODBC/JDBC, SQuirreL, HUE, Web Interface, etc.
Most popular Business Intelligence tools support Hive
Data Types
Primitive Data Types: int, bigint, float, double, boolean, decimal, string, timestamp, date, etc.
Complex Data Types: arrays, maps, structs
ARRAY<string>: workplace: istanbul, ankara
STRUCT<sex:string,age:int> : Female,25
MAP<string,int>: SOLR:92
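To make the three complex types concrete, here is a minimal Python sketch of how one text row could be split into an array, a struct and a map. It assumes Hive's default text delimiters (\001 between columns, \002 between collection items, \003 between a map key and its value); the column names workplace, person and scores and the function parse_row are hypothetical and only mirror the slide's examples. It is not Hive's own parser.

```python
# Assumed default Hive text-format delimiters (a table can override them).
FIELD_SEP = "\x01"    # between columns
ITEM_SEP = "\x02"     # between array elements, struct members, map entries
MAP_KEY_SEP = "\x03"  # between a map key and its value


def parse_row(line):
    """Parse one row of a hypothetical table with columns
    workplace ARRAY<string>, person STRUCT<sex:string,age:int>,
    scores MAP<string,int>."""
    workplace_raw, person_raw, scores_raw = line.split(FIELD_SEP)

    # ARRAY<string>: an ordered list of values
    workplace = workplace_raw.split(ITEM_SEP)

    # STRUCT<sex:string,age:int>: fixed, named members in declaration order
    sex, age = person_raw.split(ITEM_SEP)
    person = {"sex": sex, "age": int(age)}

    # MAP<string,int>: arbitrary key/value pairs
    scores = {}
    for entry in scores_raw.split(ITEM_SEP):
        key, value = entry.split(MAP_KEY_SEP)
        scores[key] = int(value)

    return {"workplace": workplace, "person": person, "scores": scores}


# Build a sample row from the slide's example values and parse it back.
row = FIELD_SEP.join([
    ITEM_SEP.join(["istanbul", "ankara"]),
    ITEM_SEP.join(["Female", "25"]),
    MAP_KEY_SEP.join(["SOLR", "92"]),
])
parsed = parse_row(row)
print(parsed)
```

The point of the sketch: an array is positional, a struct has a fixed set of named members, and a map has free-form keys.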
Hive | RDBMS
SQL interface | SQL interface
Focus on analytics | May focus on online or analytics
No transactions | Transactions usually supported
Partition adds, no random inserts | Random insert and update supported
Distributed processing via map/reduce | Distributed processing varies by vendor (if available)
Scales to hundreds of nodes | Seldom scales beyond 20 nodes
Built for commodity hardware | Often built on proprietary hardware (especially when scaling out)
Low cost per petabyte | What's a petabyte? :) (note: Are you sure?)
Hive Architecture
SQL on Hadoop Frameworks
• Apache Hive
• Impala
• Presto (Facebook)
• EMC/Pivotal HAWQ
• BigSQL by IBM
OLTP vs Hive
http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
Originally developed at Yahoo! (huge contributions from Hortonworks, Twitter)
A platform for analyzing large data sets, consisting of a high-level language for
expressing data analysis programs
Processing large semi-structured data sets using Hadoop MapReduce
Write complex MapReduce jobs using a simple script language (Pig Latin)
Pig provides built-in aggregation functions (AVG, COUNT, SUM, MAX, MIN, etc.)
Developers can write user-defined functions (UDFs)
Console (grunt), script, java, HUE (Hadoop User Experience by Cloudera)
Easy to use and efficient
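For readers new to the MapReduce model that Pig scripts compile down to, the following is a minimal single-process Python sketch of a word count. The function names map_phase, shuffle, reduce_phase and word_count are hypothetical, chosen for illustration; a real Hadoop job distributes these phases across nodes, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big Data is not just data"])
print(counts)
```

In Pig Latin the same job is a handful of statements (LOAD, FOREACH, GROUP, FOREACH with COUNT), which is the convenience the slide refers to.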
Data Types
Simple Data Types: int, float, double, chararray (UTF-8), bytearray
Complex Data Types: map (Key, Value), Tuple, Bag (list of tuples)
Commands
Loading: LOAD, STORE, DUMP
Filtering: FILTER, FOREACH, DISTINCT
Grouping: JOIN, GROUP, COGROUP, CROSS
Ordering: ORDER, LIMIT
Merging & Split: UNION, SPLIT
SQL Script | Pig Script
SELECT * FROM TABLE | A = LOAD 'DATA' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
SELECT col1+col2, col3 FROM TABLE | B = FOREACH A GENERATE col1+col2, col3;
SELECT col1+col2, col3 FROM TABLE WHERE col3>10 | C = FILTER B BY col3>10;
SELECT col1, col2, sum(col3) FROM X GROUP BY col1, col2 | D = GROUP A BY (col1, col2); E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
... HAVING sum(col3) > 5 | F = FILTER E BY $2 > 5;
... ORDER BY col1 | G = ORDER F BY $0;
SELECT DISTINCT col1 FROM TABLE | I = FOREACH A GENERATE col1; J = DISTINCT I;
SELECT col1, COUNT(DISTINCT col2) FROM TABLE GROUP BY col1 | K = GROUP A BY col1; L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
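The GROUP BY / HAVING / ORDER BY row is the one where Pig's step-by-step style differs most from SQL. As a rough illustration, the Python sketch below walks through the same stages (relations D, E, F, G) on a small in-memory data set. The sample rows, the function name group_having_order and the threshold parameter are assumptions made for this example; this is not how Pig executes the script.

```python
from collections import defaultdict


def group_having_order(rows, threshold):
    """Emulate: SELECT col1, col2, SUM(col3) ... GROUP BY col1, col2
    HAVING SUM(col3) > threshold ORDER BY col1."""
    # D = GROUP A BY (col1, col2);
    groups = defaultdict(list)
    for col1, col2, col3 in rows:
        groups[(col1, col2)].append(col3)

    # E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
    summed = [(c1, c2, sum(values)) for (c1, c2), values in groups.items()]

    # F = FILTER E BY $2 > threshold;   ($2 is the third field, the sum)
    filtered = [t for t in summed if t[2] > threshold]

    # G = ORDER F BY $0;                ($0 is the first field, col1)
    return sorted(filtered, key=lambda t: t[0])


# Hypothetical sample relation A with columns (col1, col2, col3).
rows = [(1, 1, 4), (1, 1, 3), (1, 2, 2), (2, 1, 9)]
result = group_having_order(rows, threshold=5)
print(result)
```

Each Pig statement names an intermediate relation, so HAVING becomes an ordinary FILTER on the already aggregated relation E.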
Methods of Creating Azure HDInsight (Azure Portal)
Methods of Creating Azure HDInsight (PowerShell)
Methods of Creating Azure HDInsight (.NET SDK)
Methods of Creating Azure HDInsight (SSIS)
Demo
• Create Hadoop Cluster (HDInsight)
• Create Database and Table (Hive)
• Data Load (Hive)
• Querying (Hive)
• Analyzing Breaking Bad Subtitles (Pig)
Case Study Klout
• Collect and normalize more than 12 billion signals a day
• Hive data warehouse of more than 1 trillion rows
• Klout acquired for $200 million by Lithium Technologies
Is it necessary to use HDInsight or Hadoop?
• Find the Major Problem
Thank you

Azure HDInsight

  • 1. Big Data World With Azure HDInsight
  • 2. About ▪ Koray Kocabaş ▪ Data Platform (SQL Server) MVP ▪ Yemeksepeti Business Intelligence ▪ Bahcesehir University Instructor ▪ @koraykocabas ▪ https://tr.linkedin.com/in/koraykocabas ▪ Blog: http://www.misjournal.com ▪ E-Mail: koraykocabas@outlook.com
  • 3. Evolution of Data Internet ofThings Web 2.0 ERP/CRM • Clickstream • Sensors / RFID / Devices • Log Files • Spatial & GPS Coordinates • Social Media • Mobile • Advertising • eCommerce • Digital Marketing • Search Marketing • Recommendations • Payables • Payroll • Inventory • Contacts • DealTracking • Sales Pipeline Gigabytes Terabytes Petabytes Exabytes
  • 4. Big Data Utility Gap 70 % of data generated by customers 80 % of data being stored 3 % being prepared for analysis 0.5 % begin analyzed < 0.5 % begin operationalized
  • 5. How Does this Work in Practice • Obsessively collect data • Keep it forever • Put the data in one place Store Everything • Cleanse, organize and manage your data • Make the right tools available • Use the resources wisely to compute, analyze and understand data Analyze Anything • Use insights to iteratively improve your product Build the Right Thing
  • 6. Big Data isn’t meaningful Big Data is not just data 65 + Million Members 50 Countries 1000 + Devices Supported ~25 PB Datawarehouse on Cloud (Read %10) ~550 Billion events daily
  • 7. • 20 Million songs • 24 Million Active Users • 8 Million Daily Active Users • 1TB of Compressed Data Generated From Users Per Day • 700 node Hadoop Cluster
  • 8. Big Data is not just data Cannes Lion 2014 - Grand Prix - Titanium : Honda 'Sound of Honda Ayrton Senna 1989' 2000 + sensors, 200 GB data per a race
  • 9. Big Data is not just data Boeing generates 20 TB data per hour
  • 10. ~10 Billions row processed (Daily) ~750 Millions row result set (Daily)
  • 11. New E-Commerce Big Data Flow Purchase User Product Data Warehouse Store it All
  • 12.
  • 13. Overview (ETL to ELT) Demand Architecture Data Loading Data Preparation Analytics Validation
  • 14. Problem: How can we track?
  • 16. Google Analytics & Adobe Omniture Problem 1: How can we collect data Problem 2: How can we store data Problem 3: How can we visualize data Problem 4: How can we predict data
  • 17. Buraya Google Analytics Adobe Örneği koy
  • 18. Azure vs Amazon Collect Process Analyze Visualize Prediction Store
  • 20. MOOC: Big Data Analytics, Implementing Big Data Analysis, Big Data Analytics with HDInsight, Big Data and Business Analytics Immersion, Getting Started with Microsoft Azure Machine Learning; Real World Big Data in Azure, Big Data on Amazon Web Services, Reporting with MongoDB, Cloud Business Intelligence, HDInsight Deep Dive: Storm HBase and Hive, Data Science & Hadoop Workflows at Scale With Scalding, SQL on Hadoop - Analyzing Big Data with Hive; Introduction to Big Data Analytics, Machine Learning with Big Data, Big Data Analytics for Healthcare, Data Science at Scale, The Data Scientist's Toolbox, R Programming; Master Big Data and Hadoop Step by Step, Hadoop Essentials, Hadoop Starter Kit, Data Analytics using Hadoop eco system, Big Data: How Data Analytics Is Transforming the World, Applied Data Science with R, Hadoop Enterprise Integration; Data Science and Analytics in Context, Introduction to Big Data with Spark, Data Science and Machine Learning Essentials, Machine Learning for Data Science and Analytics, Statistical Thinking for Data Science and Analytics
  • 23. One more cup of coffee https://azure.microsoft.com/en-us/pricing/details/hdinsight/ https://azure.microsoft.com/en-us/pricing/calculator/#
  • 24.
  • 25. Hive was developed by Facebook and later adopted by Apache as an open source project. It is a data warehouse infrastructure built on top of Hadoop for data summarization, query and analysis. It integrates Hadoop with BI and visualization tools. It provides an SQL-like language called HiveQL to query data. It supports CREATE INDEX and partitioning. Update is often listed as not supported (that isn’t correct). Hive provides users, groups and roles, but it’s not designed for high security. Interfaces: console (hive>), script, ODBC/JDBC, SQuirreL, HUE, web interface, etc. Most popular business intelligence tools support Hive.
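The partitioning mentioned on this slide stores each partition's rows in its own directory named `column=value`, so a query that filters on the partition column only has to read the matching directories. The Python sketch below imitates that layout to illustrate the idea. The table path `/warehouse/orders` and the column `dt` are hypothetical, and the two helper functions are illustrative only, not Hive APIs.

```python
def partition_path(table_dir, partition):
    """Build the directory for one partition, e.g. {'dt': '2015-06-01'}."""
    parts = [f"{col}={val}" for col, val in partition.items()]
    return "/".join([table_dir.rstrip("/")] + parts)


def prune(paths, column, value):
    """Keep only the partition directories that match column=value."""
    wanted = f"{column}={value}"
    return [p for p in paths if wanted in p.split("/")]


# Hypothetical table with three daily partitions.
all_paths = [
    partition_path("/warehouse/orders", {"dt": day})
    for day in ("2015-06-01", "2015-06-02", "2015-06-03")
]

# A filter such as WHERE dt = '2015-06-02' only needs one directory.
to_scan = prune(all_paths, "dt", "2015-06-02")
print(to_scan)
```

The point of the sketch is that pruning happens on directory names before any data is read, which is why choosing a partition column that queries actually filter on matters.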
  • 26. Data Types
      Primitive data types: int, bigint, float, double, boolean, decimal, string, timestamp, date, etc.
      Complex data types: arrays, maps, structs
        ARRAY<string>: workplace: istanbul, ankara
        STRUCT<sex:string,age:int>: Female,25
        MAP<string,int>: SOLR:92
      Hive vs. RDBMS
        Hive: SQL interface | RDBMS: SQL interface
        Hive: Focus on analytics | RDBMS: May focus on online or analytics
        Hive: No transactions | RDBMS: Transactions usually supported
        Hive: Partition adds, no random inserts | RDBMS: Random insert and update supported
        Hive: Distributed processing via map/reduce | RDBMS: Distributed processing varies by vendor (if available)
        Hive: Scales to hundreds of nodes | RDBMS: Seldom scales beyond 20 nodes
        Hive: Built for commodity hardware | RDBMS: Often built on proprietary hardware (especially when scaling out)
        Hive: Low cost per petabyte | RDBMS: What's a petabyte? :) (note: Are you sure?)
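To make the three complex data types on this slide concrete, here is a small Python sketch that parses one delimited text row into an array, a struct and a map, using the slide's own sample values. The delimiters (tab between fields, comma between collection items, colon between map key and value) and the names `SexAge`, `parse_row`, `sex_age` and `skills` are assumptions made for illustration. A real Hive table declares its delimiters in its row format.

```python
from dataclasses import dataclass


@dataclass
class SexAge:
    """Mirrors STRUCT<sex:string,age:int>."""
    sex: str
    age: int


def parse_row(line, field_sep="\t", item_sep=",", key_sep=":"):
    """Split one text row into (ARRAY<string>, STRUCT, MAP<string,int>)."""
    workplace_raw, sex_age_raw, skills_raw = line.split(field_sep)

    workplace = workplace_raw.split(item_sep)      # ARRAY<string>

    sex, age = sex_age_raw.split(item_sep)         # STRUCT<sex:string,age:int>
    sex_age = SexAge(sex, int(age))

    skills = {}                                    # MAP<string,int>
    for pair in skills_raw.split(item_sep):
        key, value = pair.split(key_sep)
        skills[key] = int(value)

    return workplace, sex_age, skills


workplace, sex_age, skills = parse_row("istanbul,ankara\tFemale,25\tSOLR:92")
print(workplace[0], sex_age.age, skills["SOLR"])
```

Element access in the sketch follows the same shape as in a query: an array is indexed by position, a struct by field name and a map by key.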
  • 27. Hive Architecture SQL on Hadoop Frameworks • Apache Hive • Impala • Presto (Facebook) • EMC/Pivotal HAWQ • BigSQL by IBM
  • 29.
  • 30. Pig was originally developed at Yahoo! (with huge contributions from Hortonworks and Twitter). It is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs. It processes large semi-structured data sets using Hadoop MapReduce. Complex MapReduce jobs can be written in a simple scripting language (Pig Latin). Pig provides a number of aggregation functions (AVG, COUNT, SUM, MAX, MIN, etc.). Developers can write their own UDFs. Interfaces: console (grunt), script, Java, HUE (Hadoop User Experience by Cloudera). Easy to use and efficient.
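Since the slide notes that developers can write UDFs, here is a minimal sketch of what a Python UDF for Pig might look like. The function `normalize` and its schema name are hypothetical. The `outputSchema` decorator is imported from Pig's `pig_util` module when it is available, and a stand-in is defined otherwise so the file can also be run and tested outside Pig.

```python
try:
    # Available when Pig runs the file as a CPython streaming UDF.
    from pig_util import outputSchema
except ImportError:
    # Stand-in so the function can be exercised without Pig installed.
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema("normalized:chararray")
def normalize(text):
    """Lower-case a chararray and collapse runs of whitespace."""
    if text is None:
        return None  # Pig passes nulls through as None
    return " ".join(text.lower().split())


print(normalize("  Big   DATA "))
```

In a Pig script, a file like this would typically be registered with something along the lines of `REGISTER 'udfs.py' USING streaming_python AS udfs;` and then called as `udfs.normalize(col)`. The file name and alias here are assumptions.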
  • 31. Data Types
      Simple data types: int, float, double, chararray (UTF-8), bytearray
      Complex data types: map (key, value), tuple, bag (list of tuples)
      Commands
        Loading: LOAD, STORE, DUMP
        Filtering: FILTER, FOREACH, DISTINCT
        Grouping: JOIN, GROUP, COGROUP, CROSS
        Ordering: ORDER, LIMIT
        Merging & Split: UNION, SPLIT
  • 32. SQL Script vs. Pig Script
      SQL: SELECT * FROM TABLE
      Pig: A = LOAD 'DATA' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
      SQL: SELECT col1+col2, col3 FROM TABLE
      Pig: B = FOREACH A GENERATE col1+col2, col3;
      SQL: SELECT col1+col2, col3 FROM TABLE WHERE col3>10
      Pig: C = FILTER B BY col3>10;
      SQL: SELECT col1, col2, sum(col3) FROM X GROUP BY col1, col2
      Pig: D = GROUP A BY (col1,col2); E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
      SQL: ... HAVING sum(col3) > 5
      Pig: F = FILTER E BY $2>5;
      SQL: ... ORDER BY col1
      Pig: G = ORDER F BY $0;
      SQL: SELECT DISTINCT col1 FROM TABLE
      Pig: I = FOREACH A GENERATE col1; J = DISTINCT I;
      SQL: SELECT col1, COUNT(DISTINCT col2) FROM TABLE GROUP BY col1
      Pig: K = GROUP A BY col1; L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
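The SQL-to-Pig mapping on this slide can be checked on a toy data set. The Python sketch below runs the GROUP BY / HAVING / ORDER BY query through the standard-library `sqlite3` module and then walks through the same steps as a Pig-style dataflow on plain tuples, reusing the relation names D, E, F and G from the slide. The sample rows are made up for illustration, and the Python steps only imitate what the Pig operators do.

```python
import sqlite3
from itertools import groupby

# Made-up rows of (col1, col2, col3).
rows = [(1, 10, 3), (1, 10, 4), (2, 20, 1), (2, 20, 2), (3, 30, 9)]

# SQL side: one declarative statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany("INSERT INTO X VALUES (?, ?, ?)", rows)
sql_result = conn.execute(
    "SELECT col1, col2, SUM(col3) FROM X "
    "GROUP BY col1, col2 HAVING SUM(col3) > 5 ORDER BY col1"
).fetchall()

# Pig side: the same logic as a sequence of named relations.
A = rows


def group_key(row):
    return (row[0], row[1])


# D = GROUP A BY (col1, col2);  each group carries a bag of its rows
D = [(k, list(bag)) for k, bag in groupby(sorted(A, key=group_key), key=group_key)]
# E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
E = [(*k, sum(r[2] for r in bag)) for k, bag in D]
# F = FILTER E BY $2 > 5;
F = [t for t in E if t[2] > 5]
# G = ORDER F BY $0;
G = sorted(F, key=lambda t: t[0])

print(sql_result)
print(G)
```

Both sides produce the same rows. The difference the slide is pointing at is stylistic: SQL states the result in one statement, while Pig names every intermediate relation, which makes long pipelines easier to inspect step by step.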
  • 33. Methods of Creating Azure HDInsight (Azure Portal)
  • 34. Methods of Creating Azure HDInsight (Powershell)
  • 35. Methods of Creating Azure HDInsight (.Net SDK)
  • 36. Methods of Creating Azure HDInsight (SSIS)
  • 37. Demo • Create Hadoop Cluster (HDInsight) • Create Database and Table (Hive) • Data Load (Hive) • Querying (Hive) • Analyzing Breaking Bad Subtitles (Pig)
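The slide does not say which analysis the Pig demo runs on the subtitles. A word-frequency count is a common first step, so the Python sketch below assumes that and mirrors the usual Pig pattern of tokenizing each line, grouping by word and counting. The SRT-style sample text is made up, and the function name `subtitle_words` is hypothetical.

```python
import re
from collections import Counter


def subtitle_words(srt_text):
    """Yield lower-cased words, skipping cue numbers and timestamp lines."""
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue  # blank line, cue index, or timing line
        yield from re.findall(r"[a-z']+", line.lower())


# Made-up sample in SRT layout (not real dialogue).
sample = """1
00:00:01,000 --> 00:00:03,000
The lab is ready.

2
00:00:04,000 --> 00:00:06,000
Is the lab clean?
"""

word_counts = Counter(subtitle_words(sample))
print(word_counts.most_common(3))
```

On a cluster the same logic would run in parallel over all subtitle files. The sketch only shows the per-line cleaning and the counting step.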
  • 38.
  • 39. Case Study: Klout • Collect and normalize more than 12 billion signals a day • Hive data warehouse of more than 1 trillion rows • Klout acquired for $200 million by Lithium Technologies
  • 40. Necessary to use HDInsight or Hadoop? • Find the Major Problem