This project was carried out during the 2015-2016 master's course “Business Intelligence and Big Data Analytics” at Università di Milano - Bicocca.
Authors: Marco Fusi @marco_fusi, Raffaele Lorusso @rlorusso76
CREATING THE NEWS
CONTEXT
#RateMe
IT can achieve a substantial reduction in operating costs by modernizing its Data Architecture. The innovation includes implementing an Active Archive for cold data, offloading ETL processes and enriching data.
BIG DATA What are the technologies and the potential of Big Data
TWITTER Twitter as an example of new media and real-time news sharing
TIMELINE
NEWS LIFECYCLE How news spreads on Twitter and other new media
[Figure: a news item on a timeline, followed by a spreading cascade of tweets]
[Figure: the same timeline, with the cascade of tweets grown further around the news item]
Twitter is an easy way to create and share news and opinions.
It is a new flow of content and information that comes with huge opportunities.
The collected data make it possible to run statistical analyses that extract quantitative and qualitative indicators in order to identify trends, correlations, flows, sentiment, …
CREATE
ANALYZE
FOLLOW
Follow how the news evolves over time by analyzing it, placing it in its real-world context, and comparing it with the external events that can contribute to generating and modifying the news itself.
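Following a news item over time starts with something as simple as counting tweets per time bucket. The minimal sketch below assumes tweets in the Twitter API v1.1 JSON shape, where `created_at` is a string such as `Wed Nov 18 20:19:24 +0000 2015`; the helper name `tweets_per_day` and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Format of the `created_at` field in Twitter API v1.1 tweet objects (assumed input shape).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_day(tweets):
    """Count tweets per calendar day to follow how a story evolves over time.

    `tweets` is an iterable of dicts with a `created_at` string (hypothetical helper).
    """
    days = Counter()
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        days[created.date()] += 1
    # Return (day, count) pairs in chronological order.
    return sorted(days.items())


sample = [
    {"created_at": "Wed Nov 18 20:19:24 +0000 2015"},
    {"created_at": "Thu Nov 19 08:00:00 +0000 2015"},
    {"created_at": "Thu Nov 19 09:30:00 +0000 2015"},
]
print(tweets_per_day(sample))
```

The resulting daily series can then be laid next to a calendar of external events to see whether peaks in volume line up with them.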
ARCHITECTURE Main Components
Big Data Ecosystem at a glance
[Figure: key figures over 1 month: 40k, 100k, 28k, 170k, 1.2k, 30k]
Big Data Ecosystem
SENTIMENT
ANALYSIS
From the text of each Tweet it is possible to compute a score that measures the sentiment it expresses.
In this project we built two different models.
• DICTIONARY ALGORITHM (BIG DATA BACKEND)
• CLUSTER THEN PREDICT (BIG DATA FRONTEND)
SENTIMENT
ANALYSIS
The idea behind this model is to split a Tweet into tokens (the single words) and then assign a score to each word by looking it up in a dictionary table that lists positive and negative words, each with a numerical score.
BIG DATA BACKEND
DICTIONARY ALGORITHM
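A minimal sketch of the dictionary approach described above. The dictionary entries and their scores are made up for illustration (the project's actual dictionary is not given), and summing the per-word scores is an assumption: averaging the matched scores would be an equally valid choice. Tweets containing no dictionary word are left unscored (`None`).

```python
import re

# Hypothetical excerpt of a sentiment dictionary: word -> numerical score.
SENTIMENT_DICTIONARY = {
    "love": 3, "great": 3, "fast": 2,
    "slow": -2, "awful": -3, "hate": -3,
}


def tokenize(tweet):
    """Split a tweet into lowercase word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def score_tweet(tweet, dictionary=SENTIMENT_DICTIONARY):
    """Sum the dictionary scores of the tweet's words.

    Returns None when no token appears in the dictionary (tweet left unscored).
    """
    scores = [dictionary[token] for token in tokenize(tweet) if token in dictionary]
    return sum(scores) if scores else None


print(score_tweet("I love Spark, it is great!"))  # 6
print(score_tweet("Kafka 0.9 released"))          # None
```

Because only words present in the table contribute, the share of tweets that receive a score depends directly on how large the dictionary is.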
SENTIMENT
ANALYSIS
This model first clusters Tweets that contain similar words and then applies a Random Forest algorithm to each cluster.
“Improved Twitter Sentiment prediction through Cluster then Predict Model”
International Journal of Computer Science and Network, August 2015
BIG DATA FRONTEND
CLUSTER THEN PREDICT
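The cluster-then-predict idea can be sketched without external libraries. This is a simplified illustration, not the model from the cited paper or from the project: tweets become bag-of-words count vectors, k-means groups them, and a separate classifier is fitted for each cluster. To keep the sketch dependency-free, a 1-nearest-neighbour classifier stands in for the Random Forest. The class name `ClusterThenPredict`, its parameters and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase word tokens with simple punctuation stripped."""
    tokens = (word.strip(".,!?#@").lower() for word in text.split())
    return [token for token in tokens if token]


def vectorize(tokens, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(vectors, k, max_iter=50, seed=0):
    """Plain k-means; returns (centroids, cluster index of each vector)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = None
    for _ in range(max_iter):
        new_assignments = [nearest(v, centroids) for v in vectors]
        if new_assignments == assignments:
            break  # converged: no vector changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


class ClusterThenPredict:
    """Cluster tweets, then fit one classifier per cluster (1-NN stands in for Random Forest)."""

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed

    def fit(self, texts, labels):
        self.vocab = sorted({token for text in texts for token in tokenize(text)})
        vectors = [vectorize(tokenize(text), self.vocab) for text in texts]
        self.centroids, assignments = kmeans(vectors, self.k, seed=self.seed)
        # One training set per cluster: each cluster gets its own model.
        self.training = {c: [] for c in range(self.k)}
        for vector, label, cluster in zip(vectors, labels, assignments):
            self.training[cluster].append((vector, label))
        return self

    def predict(self, text):
        vector = vectorize(tokenize(text), self.vocab)
        cluster = nearest(vector, self.centroids)
        # Fall back to all training tweets if the chosen cluster is empty.
        candidates = self.training[cluster] or [
            item for items in self.training.values() for item in items
        ]
        # 1-nearest-neighbour prediction inside the selected cluster.
        best = min(candidates, key=lambda item: math.dist(vector, item[0]))
        return best[1]
```

Usage: `ClusterThenPredict(k=2).fit(texts, labels).predict("great love")`. Replacing the 1-NN step with any classifier trained on `self.training[cluster]` keeps the same two-stage structure.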
CONCLUSIONS
• The «Lambda Architecture» seems a good approach because it balances the need for real-time analysis with batch computation
• The Big Data Ecosystem is composed of heterogeneous technologies, each of which solves just one part of the whole problem
• Many technologies are easily interoperable and composable
• There are many first movers in the Big Data market, but also established players that are nowadays a must-have in a Big Data architecture
Big Data Ecosystem - Architecture
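The batch/real-time trade-off behind the Lambda Architecture can be shown in miniature: a batch layer periodically recomputes a view from the whole master dataset, a speed layer incrementally updates a small view with the tweets that arrived after the last batch run, and queries merge the two. This is a conceptual single-process sketch, counting hashtag mentions as an example; the class and method names are hypothetical and do not correspond to any specific technology used in the project.

```python
from collections import Counter


class LambdaHashtagCounter:
    """Toy Lambda Architecture counting hashtag mentions (hypothetical example)."""

    def __init__(self):
        self.master_dataset = []         # append-only log of all raw tweets
        self.batch_view = Counter()      # recomputed from scratch by the batch layer
        self.realtime_view = Counter()   # incremental counts since the last batch run

    def ingest(self, tweet_text):
        # New data goes both to the master dataset and to the speed layer.
        self.master_dataset.append(tweet_text)
        self.realtime_view.update(self._hashtags(tweet_text))

    def run_batch(self):
        # Batch layer: full recomputation over all data; the speed layer is then
        # reset because its tweets are now covered by the batch view.
        self.batch_view = Counter()
        for text in self.master_dataset:
            self.batch_view.update(self._hashtags(text))
        self.realtime_view = Counter()

    def query(self, hashtag):
        # Serving layer: merge the batch view with the real-time view.
        key = hashtag.lower()
        return self.batch_view[key] + self.realtime_view[key]

    @staticmethod
    def _hashtags(text):
        tokens = (word.strip(".,!?:;").lower() for word in text.split())
        return [token for token in tokens if token.startswith("#") and len(token) > 1]
```

The batch view is accurate but only as fresh as the last batch run, while the speed layer fills the gap until the next one; that division of work is the trade-off the conclusion refers to.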
BIG DATA
CONCLUSIONS
• The most tweeted technologies are not always the ones with the largest market share
• There seems to be no correlation between real Big Data events and tweet volumes
• In this case study, sentiment analysis with the cluster then predict model performed worse than with the dictionary algorithm
• The dictionary algorithm depends heavily on having a good dictionary with many words: with the dictionary we used, only 42% of the tweets were scored
• The analysis of senders and mentioned users highlighted many influencers who are closely connected to the technologies, or are even the official accounts of those technologies
• 45% of the tweets were sent from the official apps for the Web platform, Android and iOS
Big Data Ecosystem – Data Analysis
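A share such as the 45% above can be computed from the `source` field of each tweet, which in Twitter API v1.1 tweet objects holds an HTML link whose text is the client name. The set of official client names below is an assumption, and the helper names are hypothetical.

```python
import re

# Assumed display names of the official Twitter clients (Web, Android, iOS).
OFFICIAL_APPS = {
    "Twitter Web Client",
    "Twitter for Android",
    "Twitter for iPhone",
    "Twitter for iPad",
}


def source_name(source_html):
    """Extract the client name from a tweet's `source` field (an HTML anchor)."""
    match = re.search(r">([^<]*)</a>", source_html)
    # Fall back to the raw value when the field is not an anchor.
    return match.group(1) if match else source_html


def official_app_share(tweets):
    """Fraction of tweets whose `source` client is one of OFFICIAL_APPS."""
    if not tweets:
        return 0.0
    official = sum(1 for t in tweets if source_name(t["source"]) in OFFICIAL_APPS)
    return official / len(tweets)
```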
Case Study: Data Science seminar @masterBIBDA
Milan, 19 November 2015 #RateMe