Analyzing Realtime News
Raffaele Lorusso – Marco Fusi
Milan, November 2015 #RateMe
CREATING THE NEWS
CONTEXT: This project was carried out during the 2015-2016 master's programme “Business Intelligence and Big Data Analytics” at Università di Milano-Bicocca.
IT achieves a substantial reduction in operating costs by modernizing its data architectures. The innovation includes implementing an Active Archive for cold data, offloading ETL processes, and enriching data.
BIG DATA: What are the technologies and the potential of Big Data?
TWITTER: Twitter as an example of new media and real-time news sharing.
NEWS LIFECYCLE: How news spreads on Twitter and other new media
[Timeline figure, built up over six slides: a single News item accompanied by a growing number of Tweets.]
CREATE: Twitter is an easy way to create and share news and opinions. It is a new flow of content and information that brings huge opportunities.
ANALYZE: The collected data can be analyzed statistically to extract quantitative and qualitative indicators that identify trends, correlations, flows, sentiment, and so on.
FOLLOW: Follow how the news evolves over time by analyzing it, placing it in its real-world context, and comparing it with the external events that can help generate and modify the news itself.
ARCHITECTURE: Main Components
ARCHITECTURE: The Lambda Architecture
[Diagram components: Data Sources, Batch Layer, Speed Layer, Machine Learning, Presentation Layer]
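To make the split between the batch layer and the speed layer concrete, here is a minimal Python sketch of a Lambda-style counter. It is an illustration under stated assumptions, not the pipeline built for this project: the class name `LambdaCounter`, its methods, and the topic names are all hypothetical, and the Machine Learning component of the diagram is not modelled.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture that counts tweets per topic (hypothetical names)."""

    def __init__(self):
        self.master = []             # append-only master dataset (data sources)
        self.batch_view = Counter()  # batch layer: recomputed from the full history
        self.speed_view = Counter()  # speed layer: counts since the last batch run

    def ingest(self, topic):
        """Store the raw event and update the real-time view incrementally."""
        self.master.append(topic)
        self.speed_view[topic] += 1

    def run_batch(self):
        """Rebuild the batch view from scratch, then reset the speed view."""
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, topic):
        """Presentation layer: merge the batch view and the speed view."""
        return self.batch_view[topic] + self.speed_view[topic]


pipeline = LambdaCounter()
for topic in ["hadoop", "spark", "hadoop"]:  # assumed sample topics
    pipeline.ingest(topic)
pipeline.run_batch()
pipeline.ingest("spark")                     # arrives after the batch run

print(pipeline.query("spark"))   # 2: one from the batch view, one from the speed view
print(pipeline.query("hadoop"))  # 2: both from the batch view
```

The point of the design is that queries always combine both views, so recent events are visible before the next batch recomputation absorbs them.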
Case Study: Big Data Ecosystem on Twitter
Big Data Ecosystem
[Diagram: the ecosystem split into two groups, Big Data Frontend and Big Data Backend.]
Big Data Ecosystem at a glance
[Infographic; the labels for the figures were not recoverable: 40k, 1 month, 100k, 28k, 170k, 1.2k, 30k]
Big Data Ecosystem
SENTIMENT ANALYSIS
From the text of a Tweet it is possible to compute a measure of the sentiment associated with it.
In this project we built two different models:
• CLUSTER THEN PREDICT (Big Data Backend, Big Data Frontend)
• DICTIONARY ALGORITHM (Big Data Backend)
SENTIMENT ANALYSIS
DICTIONARY ALGORITHM (Big Data Backend)
The idea of this model is to split a Tweet into single-word tokens and then assign a score to each word by looking it up in a dictionary table of positive and negative words, each with a numerical score.
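The dictionary approach can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary entries, the sample tweets, and the function names are hypothetical, and summing the word scores is an assumption, since the slide does not say how per-word scores are combined. A tweet with no dictionary word gets no score, which is why the share of scored tweets depends on the size of the dictionary.

```python
import re

# Hypothetical dictionary table: word -> numerical score (not the authors' dictionary).
SENTIMENT_DICTIONARY = {
    "good": 2, "great": 3, "love": 3,
    "bad": -2, "slow": -1, "hate": -3,
}

TOKEN_PATTERN = re.compile(r"[a-z']+")


def score_tweet(text, dictionary=SENTIMENT_DICTIONARY):
    """Sum the scores of known words; return None if no word is in the dictionary."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    scores = [dictionary[token] for token in tokens if token in dictionary]
    return sum(scores) if scores else None  # summing is an assumption


def coverage(tweets, dictionary=SENTIMENT_DICTIONARY):
    """Fraction of tweets that contain at least one dictionary word."""
    if not tweets:
        return 0.0
    scored = sum(1 for text in tweets if score_tweet(text, dictionary) is not None)
    return scored / len(tweets)


# Assumed sample tweets, for illustration only.
tweets = [
    "I love Spark, great release",
    "Hadoop job is slow and bad",
    "Reading about Kafka today",
]

for text in tweets:
    print(score_tweet(text), text)
print(f"coverage: {coverage(tweets):.0%}")
```

Here the third tweet contains no dictionary word and stays unscored, so two of the three sample tweets are covered.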
SENTIMENT ANALYSIS
CLUSTER THEN PREDICT (Big Data Frontend)
This model clusters Tweets that contain similar words and then applies a Random Forest algorithm to each cluster.
Reference: “Improved Twitter Sentiment prediction through Cluster then Predict Model”, International Journal of Computer Science and Network, August 2015
DASHBOARD (live demo)
CONCLUSIONS: Big Data Ecosystem - Architecture
• The Lambda Architecture seems a good approach because it balances the need for real-time analysis against batch computation.
• The Big Data ecosystem is made up of heterogeneous technologies, each of which solves just one part of the whole problem.
• Many technologies are easily interoperable and composable.
• There are many first movers in the Big Data market, but also consolidated players that are nowadays a must-have in a Big Data architecture.
CONCLUSIONS: Big Data Ecosystem – Data Analysis
• The most tweeted-about technologies are not always the ones with the largest market share.
• There seems to be no correlation between real Big Data events and tweet volumes.
• In this case study, the sentiment analysis produced by the cluster-then-predict model was worse than the one produced by the dictionary algorithm.
• The dictionary algorithm depends heavily on a good dictionary with many words: with the dictionary we used, only 42% of the tweets were scored.
• The analysis of senders and mentioned users highlighted that many influencers are closely connected to the technologies, or are even the official accounts of those technologies.
• 45% of the tweets were sent from official apps (Web platform, Android and iOS).
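The analysis of senders and mentioned users can be approximated by extracting `@mentions` from tweet text and counting them. The sketch below is a simplified illustration under assumptions: tweets are taken to be `(sender, text)` pairs, and the account names and helper functions are hypothetical, not those used in the project.

```python
import re
from collections import Counter

MENTION_PATTERN = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Yield (sender, mentioned_user) pairs from (sender, text) tuples."""
    for sender, text in tweets:
        for mentioned in MENTION_PATTERN.findall(text):
            yield sender.lower(), mentioned.lower()


def most_mentioned(tweets, top=3):
    """Rank accounts by how often other tweets mention them."""
    counts = Counter(mentioned for _, mentioned in mention_edges(tweets))
    return counts.most_common(top)


# Assumed sample data with made-up account names.
sample = [
    ("alice", "Trying out @example_tech with @alice_friend"),
    ("bob", "New release from @Example_Tech!"),
    ("carol", "No mentions here"),
]

print(list(mention_edges(sample)))
print(most_mentioned(sample, top=1))
```

Accounts that rank highly here and also belong to a technology's own team would correspond to the closely connected influencers described in the conclusions.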
Case Study: Data Science seminar @masterBIBDA
Milan, 19 November 2015 #RateMe
#RateMe Rules
Game: Rate this seminar
Players: Our speakers and YOU!
Objectives: Have fun!
#RateMe Example
• Tweet to @masterbibda
• Reference the keyword by using a hashtag: #datascientistprofiles
• Vote: alto – medio – basso (high – medium – low)
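The voting rules above amount to a small tweet format that can be parsed mechanically. The following Python sketch shows one way to do it; it is a hypothetical parser, not the one used in the live demo. Treating `#RateMe` as the event hashtag rather than a keyword, and requiring exactly one keyword and one vote per tweet, are assumptions.

```python
import re

# Italian vote words from the slide, mapped to English labels.
VOTES = {"alto": "high", "medio": "medium", "basso": "low"}
HASHTAG_PATTERN = re.compile(r"#(\w+)")
WORD_PATTERN = re.compile(r"[a-z]+")


def parse_vote(text, account="@masterbibda"):
    """Return (keyword, vote) for a valid #RateMe tweet, otherwise None."""
    lowered = text.lower()
    if account not in lowered:
        return None  # the tweet must be addressed to the seminar account
    # Assumption: #RateMe marks the event and is not itself a keyword.
    keywords = [tag for tag in HASHTAG_PATTERN.findall(lowered) if tag != "rateme"]
    votes = [word for word in WORD_PATTERN.findall(lowered) if word in VOTES]
    # Assumption: exactly one keyword and one vote make a valid rating.
    if len(keywords) != 1 or len(votes) != 1:
        return None
    return keywords[0], VOTES[votes[0]]


print(parse_vote("@masterbibda #datascientistprofiles alto #RateMe"))
print(parse_vote("#datascientistprofiles alto"))   # not addressed to the account
print(parse_vote("@masterbibda great talk!"))      # no keyword and no vote
```

Tweets that do not parse as votes can still feed the free-text analysis, since the deck states that every Tweet is analyzed.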
LET'S CREATE THE NEWS
and…
Feel free to Tweet your thoughts @masterbibda!
Every Tweet will be analyzed!
DASHBOARD (live demo)
[Timeline figure: a News item among many Tweets.]
Enjoy #RateMe
Raffaele Lorusso – Marco Fusi
Milan, November 2015
THANKS!
Analyzing Realtime News