Social Media Analytics using Azure Technologies
Koray Kocabaş
Social media are computer-mediated tools that allow people to create, share, or exchange information, ideas, and pictures or videos in virtual communities and networks. In short, social media is everything to your customers, and your company needs to listen to them in order to understand them, make custom offers, or improve loyalty. Azure Stream Analytics and HDInsight can help solve this problem. We'll focus on how to collect Twitter data using Stream Analytics, how to enrich and store that data using HDInsight, and what problems arise in sentiment analysis with Azure Machine Learning.

  1. Social Media Analytics using Azure Technologies
     Koray Kocabaş
  2. #sqlsatistanbul
     Sponsors: Media Sponsor, Main Sponsor, Swag Sponsor
  3. What do we need?
     A quick blog post, an update on LinkedIn, or a tweet on Twitter is all we need.
  4. Session Evaluations
     Evaluate sessions for a chance to win the raffle: http://spoke.at/sqlsat451
  5. About Me
     Koray Kocabaş
     Data Platform (SQL Server) MVP
     Yemeksepeti Business Intelligence
     Bahcesehir University Instructor
     @koraykocabas
     https://tr.linkedin.com/in/koraykocabas
     Blog: http://www.misjournal.com
     E-Mail: koraykocabas@outlook.com
  6. The Data Deluge
  7. What kinds of solutions use Big Data?
     • Clickstream analysis to find buying patterns
     • Sentiment analysis for text data
     • Fraud detection; forensic analysis
     • Machine learning
     • Healthcare research
     • Predictive maintenance
     Just dream it. Data is everywhere!
  8. Twitter launched in 2006
     • Monthly active users: ~316 million (August), ~320 million (October)
     • 80% of users are on mobile
     • Tweets per second: 6,000
     • Tweets per day: ~500 million
     • Tweets per year: ~200 billion
     • Twitter generates a lot of data (12 TB per day)
     • 90% of buyers trust peer recommendations
     • 55% of Twitter users are female
     • The average Twitter user has 27 followers
  9. Why is it so popular?
  10. Twitter Problems: event-based data, unstructured data, detailed event information, streaming, who is the influencer
      Dashboards for Tweets: TweetTracker, TweetArchivist, Radian6, Sysomos, TweetDeck, Hootsuite
  11. PROBLEMS...
  12. 1. Collect Twitter Data & Get Simple Information
      2. Data Enrichment
      3. Store Semi-Structured Data
      4. Analyze Semi-Structured Data
      5. Visualize Meaningful Results
  14. Collect Twitter Data & Get Simple Information
  16. Real-Time Analytics
      • Intake millions of events per second
      • Process data from connected devices/apps
      • Detect patterns and anomalies in streaming data
      • Transform, augment, correlate, temporal operations
      • No hardware (PaaS offering)
      • Up and running in a few clicks (and within minutes)
      • No performance tuning
      • Efficiently pay only for usage
      • Not paying for idle resources
      • Low startup costs
      • Scale from small to large when required
      • Only SQL queries needed (thousands of lines of code in other solutions, such as Apache Storm)
  17. Stream Analytics Query Language Functions
      • DML Statements: SELECT, FROM, WHERE, GROUP BY, HAVING, CASE, JOIN, UNION
      • Windowing Extensions: Tumbling Window, Hopping Window, Sliding Window, Duration
      • Aggregate Functions: SUM, COUNT, AVG, MIN, MAX
      • Scaling Functions: WITH, PARTITION BY
      • Date and Time Functions: DATENAME, DATEPART, DAY, MONTH, YEAR, DATETIMEFROMPARTS, DATEDIFF, DATEADD
      • String Functions: LEN, CONCAT, CHARINDEX, SUBSTRING
      • Statistical Functions: VAR, VARP, STDEV
  18. [Figure: event timeline with ticks at 0, 5, 10, 15, 20, 25, 30]
  19. Tumbling Windows: the count of tweets every 10 seconds
      [Figure: timeline from 0 to 30; window counts 4, 4, 5]
      SELECT Topic, Count(*) AS Count
      FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
      GROUP BY Topic, TumblingWindow(second,10)
  20. Hopping Windows: every 5 seconds, give me the count of tweets over 10 seconds, by topic
      [Figure: timeline from 0 to 30]
      SELECT Topic, Count(*) AS Count
      FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
      GROUP BY Topic, HoppingWindow(second,10,5)
  21. Sliding Windows: report a topic when its tweet count exceeds a threshold of 8 within a 5-second window
      [Figure: timeline from 0 to 30]
      SELECT Topic, Count(*) AS Count
      FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
      GROUP BY Topic, SlidingWindow(second,5)
      HAVING Count(*)>8
  22. Stream Analytics, Event Hub
  23. Data Enrichment
  24. Azure Machine Learning [diagram labels]
      • Data: local storage (upload data from PC…); cloud storage (Azure Storage, Azure Table, Hive, etc.)
      • Consumers: Excel, business apps
      • Stages: business problem, modeling, business value, deployment
      • Azure Marketplace (applications store)
      • Azure ML Gallery (community)
      • ML Web Services (REST API services)
      • ML Studio (web IDE)
      • Workspace: experiments, datasets, trained models, notebooks, access settings
      • Data, Model, API, Manage API
  26. Sentiment140 (formerly known as "Twitter Sentiment") allows you to discover the sentiment of a brand, product, or topic on Twitter.
      https://sites.google.com/site/miningtwitter/questions/sentiment/sentiment
      http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais
  27. SQL Server 2016 CTP 3.1
      Revolution R Open 3.2.2 for Revolution R Enterprise
      Revolution R Enterprise 7.5.0
      Revolution R Enterprise is able to deliver speeds 42 times faster than competing technology from SAS.
      Microsoft announced on January 23, 2015 that it had reached an agreement to purchase Revolution Analytics for an as yet undisclosed amount.
  28. The Klout Score is a number between 1 and 100 that represents your influence.
      Klout collects and normalizes more than 12 billion signals a day.
      Its Hive data warehouse holds more than 1 trillion rows.
      Klout was acquired by Lithium Technologies for $200 million.
  29. Store Semi-Structured Data
      Analyze Semi-Structured Data
  31. Developed by Facebook; later adopted by Apache as an open source project.
      A data warehouse infrastructure built on top of Hadoop for data summarization, query, and analysis.
      Integration between Hadoop and BI and visualization tools.
      Provides an SQL-like language called HiveQL to query data.
      Supports CREATE INDEX and partitioning.
      "UPDATE is not supported" (this isn't correct).
      Hive provides users, groups, and roles, but it is not designed for high security.
      Interfaces: console (hive>), script, ODBC/JDBC, SQuirreL, HUE, web interface, etc.
      Most popular business intelligence tools support Hive.
  32. Data Types
      • Primitive data types: int, bigint, float, double, boolean, decimal, string, timestamp, date, etc.
      • Complex data types: arrays, maps, structs
        ARRAY<string>: workplace: istanbul, ankara
        STRUCT<sex:string,age:int>: Female,25
        MAP<string,int>: SOLR:92
      Hive | RDBMS
      SQL interface | SQL interface
      Focus on analytics | May focus on online or analytics
      No transactions | Transactions usually supported
      Partition adds, no random inserts | Random insert and update supported
      Distributed processing via map/reduce | Distributed processing varies by vendor (if available)
      Scales to hundreds of nodes | Seldom scales beyond 20 nodes
      Built for commodity hardware | Often built on proprietary hardware (especially when scaling out)
      Low cost per petabyte | What's a petabyte? :) (note: Are you sure?)
  33. http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
  35. Originally developed at Yahoo! (huge contributions from Hortonworks, Twitter).
      A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.
      Processes large semi-structured data sets using Hadoop MapReduce.
      Write complex MapReduce jobs using a simple scripting language (Pig Latin).
      Pig provides a set of aggregation functions (AVG, COUNT, SUM, MAX, MIN, etc.).
      Developers can write UDFs.
      Interfaces: console (grunt), script, Java, HUE (Hadoop User Experience by Cloudera).
      Easy to use and efficient.
  36. Data Types
      • Simple data types: int, float, double, chararray (UTF-8), bytearray
      • Complex data types: map (key, value), tuple, bag (list of tuples)
      Commands
      • Loading: LOAD, STORE, DUMP
      • Filtering: FILTER, FOREACH, DISTINCT
      • Grouping: JOIN, GROUP, COGROUP, CROSS
      • Ordering: ORDER, LIMIT
      • Merging & splitting: UNION, SPLIT
      SQL script | Pig script
      SELECT * FROM TABLE | A = LOAD 'DATA' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
      SELECT col1+col2, col3 FROM TABLE | B = FOREACH A GENERATE col1+col2, col3;
      SELECT col1+col2, col3 FROM TABLE WHERE col3>10 | C = FILTER B BY col3>10;
      SELECT col1, col2, sum(col3) FROM X GROUP BY col1, col2 | D = GROUP A BY (col1,col2); E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
      ... HAVING sum(col3) > 5 | F = FILTER E BY $2>5;
      ... ORDER BY col1 | G = ORDER F BY $0;
      SELECT DISTINCT col1 FROM TABLE | I = FOREACH A GENERATE col1; J = DISTINCT I;
      SELECT col1, COUNT(DISTINCT col2) FROM TABLE GROUP BY col1 | K = GROUP A BY col1; L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
  37. Ohhh Finally Demo Time!
  38. Visualize Meaningful Results
  40. • Big Data Analytics, Implementing Big Data Analysis, Big Data Analytics with HDInsight, Big Data and Business Analytics Immersion, Getting Started with Microsoft Azure Machine Learning
      • Real World Big Data in Azure, Big Data on Amazon Web Services, Reporting with MongoDB, Cloud Business Intelligence, HDInsight Deep Dive: Storm HBase and Hive, Data Science & Hadoop Workflows at Scale With Scalding, SQL on Hadoop - Analyzing Big Data with Hive
      • Introduction to Big Data Analytics, Machine Learning with Big Data, Big Data Analytics for Healthcare, Data Science at Scale, The Data Scientist's Toolbox, R Programming
      • Master Big Data and Hadoop Step by Step, Hadoop Essentials, Hadoop Starter Kit, Data Analytics using Hadoop eco system, Big Data: How Data Analytics Is Transforming the World, Applied Data Science with R, Hadoop Enterprise Integration
      • Data Science and Analytics in Context, Introduction to Big Data with Spark, Data Science and Machine Learning Essentials, Machine Learning for Data Science and Analytics, Statistical Thinking for Data Science and Analytics
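
The three windowing queries on slides 19 to 21 can be mimicked offline to see how each window type groups events. What follows is a minimal Python sketch, not Stream Analytics itself. The function names and the sample timestamps are hypothetical (the timestamps were picked so the tumbling counts match the 4, 4, 5 shown on slide 19), and the sketch assumes each window excludes its start time and includes its end time. The sliding example uses a threshold of 3 instead of the slide's 8 so that the small sample produces output.

```python
from bisect import bisect_right
from collections import Counter


def tumbling_counts(timestamps, size):
    """Count events per non-overlapping window (end - size, end]."""
    counts = Counter()
    for t in timestamps:
        # Smallest multiple of `size` that is >= t (ceiling division).
        end = -(-t // size) * size
        counts[end] += 1
    return dict(sorted(counts.items()))


def hopping_counts(timestamps, size, hop):
    """Count events per overlapping window (end - size, end], one window every `hop`."""
    counts = Counter()
    for t in timestamps:
        end = -(-t // hop) * hop  # first window end at or after t
        while end - size < t:     # every later window that still covers t
            counts[end] += 1
            end += hop
    return dict(sorted(counts.items()))


def sliding_alerts(timestamps, size, threshold):
    """Return (time, count) pairs where the count in (time - size, time] exceeds threshold.

    Simplification: the window is evaluated only when an event arrives.
    """
    ts = sorted(timestamps)
    alerts = []
    for t in sorted(set(ts)):
        count = bisect_right(ts, t) - bisect_right(ts, t - size)
        if count > threshold:
            alerts.append((t, count))
    return alerts


# Hypothetical tweet arrival times in seconds, all for a single topic.
sample = [1, 3, 6, 9, 12, 14, 17, 19, 21, 22, 23, 24, 27]

print(tumbling_counts(sample, 10))    # {10: 4, 20: 4, 30: 5}
print(hopping_counts(sample, 10, 5))
print(sliding_alerts(sample, 5, 3))
```

Keys in the returned dictionaries are window end times. Grouping by topic, as the slide queries do, would amount to running the same functions once per topic.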
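
The SQL-to-Pig mapping on slide 36 may be easier to follow when the intermediate relations are made visible. Below is a minimal Python sketch of the GROUP, SUM, HAVING, and ORDER chain (relations D to G) and of the COUNT(DISTINCT) pattern (relations K and L). The sample rows and function names are hypothetical, and plain Python lists stand in for Pig relations; this illustrates the data flow only, not how Pig executes it.

```python
from collections import defaultdict

# Hypothetical rows standing in for relation A: (col1, col2, col3).
A = [(1, 1, 4), (1, 1, 3), (2, 1, 2), (2, 2, 6), (1, 2, 5)]


def group_sum_having_order(rows, min_sum):
    """SELECT col1, col2, sum(col3) ... GROUP BY col1, col2 HAVING sum(col3) > min_sum ORDER BY col1."""
    # D = GROUP A BY (col1, col2);
    groups = defaultdict(list)
    for col1, col2, col3 in rows:
        groups[(col1, col2)].append(col3)
    # E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
    e = [(c1, c2, sum(values)) for (c1, c2), values in groups.items()]
    # F = FILTER E BY $2 > min_sum;   (the HAVING clause)
    f = [row for row in e if row[2] > min_sum]
    # G = ORDER F BY $0;
    return sorted(f, key=lambda row: row[0])


def count_distinct(rows):
    """SELECT col1, COUNT(DISTINCT col2) ... GROUP BY col1."""
    # K = GROUP A BY col1;
    distinct_values = defaultdict(set)
    for col1, col2, _ in rows:
        # M = DISTINCT A.col2;   (a set keeps each col2 value once)
        distinct_values[col1].add(col2)
    # L = FOREACH K GENERATE FLATTEN(group), COUNT(M);
    return sorted((c1, len(values)) for c1, values in distinct_values.items())


print(group_sum_having_order(A, 5))  # [(1, 1, 7), (2, 2, 6)]
print(count_distinct(A))             # [(1, 2), (2, 2)]
```

Each Python statement is labeled with the Pig line it stands for, so the positional references such as $0 and $2 can be matched to tuple indexes.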