Mashable Social Media Day is one of the world's most important events celebrating the digital revolution, the dynamics and potential of social networks, and the impact they generate.
More than 40 leading digital experts will explain how to achieve successful results using the main social media tools.
Separating the wheat from the chaff: signals, noise and other stories in a big (data) world, by Luigi Curini
1. Luigi Curini - @Curini
VOICES from the Blogs & University of Milan
#Spoletta @ Mashable Social Media Day 30-Jun-2015
Separating the wheat from the chaff: Signal, Noise & Other stories in the Big (Data) World
http://voicesfromtheblogs.com
2. Big (or organic) data
They are:
๏ Big (in volume)
๏ Many (per unit of time)
๏ Unstructured (messy, not ready to be processed)
Sources:
๏ Administrative repositories
๏ Transaction data
๏ Social media & Social Network
10. “BIG Data” are today’s “data”
Answer: “there are Big Data & small data scientists”
A good piece of advice: change the data scientist, not the data!
12. Let us focus on big data coming from Social Media
13. Pros:
๏ geo-localized data
๏ retrospective analysis (capture opinions when they are expressed)
๏ real-time analysis (continuous monitoring and/or alerting)
๏ speed of data analysis (if you know how to do it)
๏ gathering of unsolicited opinions
๏ census-type analysis: analyze the entire population of texts, not just a sample
Cons:
๏ the population on social media is not necessarily representative of the demographic population
๏ you can’t ask questions, just listen to people: if people don’t discuss a topic, you don’t have the data
๏ textual analysis is hard: language evolves continuously and changes according to topic, media, etc.
14. Three simple ideas
“Romance should never begin with sentiment. It should begin with science and end with a settlement.”
Oscar Wilde, An Ideal Husband
15. How to analyze Social Media data (1)
NO: mentions, likes or retweets. Computers are good at this, but humans can do better!
Obama: 16.8M followers
Romney: 0.6M followers
Final result: Obama +4.0%!
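To make the point of the follower counts concrete, here is a minimal arithmetic sketch. It only reuses the three numbers on the slide; the variable names are illustrative, and treating follower share as a "predicted" vote share is exactly the naive reading the slide warns against.

```python
# Follower counts and final margin as quoted on the slide.
followers = {"Obama": 16.8e6, "Romney": 0.6e6}
actual_margin = 4.0  # percentage points in favour of Obama

# Naive reading: treat each candidate's share of followers as a vote share.
total = sum(followers.values())
follower_share = {name: 100 * n / total for name, n in followers.items()}
naive_margin = follower_share["Obama"] - follower_share["Romney"]

print(f"Follower share: Obama {follower_share['Obama']:.1f}%, "
      f"Romney {follower_share['Romney']:.1f}%")
print(f"Margin implied by followers: {naive_margin:.1f} points")
print(f"Actual margin: {actual_margin:.1f} points")
```

Counting followers would suggest a landslide of more than 90 points, against a real margin of 4: volume metrics say little about opinions.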
16. How to analyze Social Media data (2)
NO: ontological dictionaries, nor NLP rules
“This movie has good premises. Looks like it has a nice plot, an exceptional cast, first class actors and Stallone gives his best. But it sucks”
"Ibis redibis numquam peribis in bello" can be translated as “you will go, you will come back, you will not die in war", but also the opposite way: “you will go, you will not come back, you will die in war"
“ragazza stufa scappa di casa… i genitori muoiono di freddo” (in Italian "stufa" means both "fed up" and "stove": “fed-up girl runs away from home… her parents die of cold”)
“There is no favorable wind for the mariner who doesn’t know where to go” (Seneca)
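The movie review above shows why word lists struggle. The sketch below is a deliberately naive dictionary scorer, written only for illustration: the two tiny lexicons and the function name `lexicon_score` are hypothetical and are not taken from any real sentiment dictionary.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "nice", "exceptional", "best"}
NEGATIVE = {"sucks", "bad", "awful"}

def lexicon_score(text):
    """Return (#positive words - #negative words) found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos - neg

review = ("This movie has good premises. Looks like it has a nice plot, "
          "an exceptional cast, first class actors and Stallone gives his "
          "best. But it sucks")

score = lexicon_score(review)
print(score)  # positive score, although the reviewer's verdict is negative
```

Four positive words outvote the single word that carries the actual opinion, so the review is scored as favourable. A human reader does not make this mistake.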
17. How to analyze Social Media data (2)
NO: ontological dictionaries, nor NLP rules
Don't look at the data: look into the data
18. Switch to Supervised Techniques!
The advantages of human beings…
• Always in sync with linguistic expressions
[dictionaries are static]
• Completely language-independent
• Moreover…
19. Beyond sentiment… there is more information out there!!!
Opinions, reasons, attitudes, tones… see the colours!
20. How to analyze Social Media data (3)
NO: individual classification and later aggregation. Estimate directly the aggregated distribution of opinions!
We don’t care about the needle in the haystack... we care about the haystack! (G. King)
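One way to read "estimate directly the aggregated distribution" is through the identity P(S) = Σ_D P(S|D)·P(D), where S is a word profile and D an opinion category: P(S|D) is learned from a small hand-coded set, P(S) is observed on the whole corpus, and P(D) is solved for without labelling any single post. The sketch below is a toy two-category version under that assumption; it is not presented as the method VOICES uses, and every name, word and text in it is hypothetical.

```python
from collections import Counter

def profile(text, vocab):
    """Word-presence profile: one 0/1 flag per vocabulary word."""
    words = set(text.lower().split())
    return tuple(int(w in words) for w in vocab)

def profile_dist(texts, vocab):
    """Relative frequency of each word profile in a set of texts."""
    counts = Counter(profile(t, vocab) for t in texts)
    return {s: c / len(texts) for s, c in counts.items()}

def estimate_positive_share(labeled, unlabeled, vocab):
    """Least-squares solution of P(S) = p*P(S|pos) + (1-p)*P(S|neg) for p."""
    pos = profile_dist([t for t, y in labeled if y == "pos"], vocab)
    neg = profile_dist([t for t, y in labeled if y == "neg"], vocab)
    obs = profile_dist(unlabeled, vocab)
    num = den = 0.0
    for s in set(pos) | set(neg) | set(obs):
        a, b, p_s = pos.get(s, 0.0), neg.get(s, 0.0), obs.get(s, 0.0)
        num += (p_s - b) * (a - b)
        den += (a - b) ** 2
    if den == 0:
        raise ValueError("the two categories have identical word profiles")
    return min(1.0, max(0.0, num / den))  # keep the share within [0, 1]

# Hypothetical hand-coded training set and unlabeled corpus.
vocab = ["great", "love", "awful", "hate"]
labeled = [("great film", "pos"), ("love it", "pos"),
           ("awful film", "neg"), ("hate it", "neg")]
unlabeled = ["great story", "love this", "great cast", "awful plot"]

share = estimate_positive_share(labeled, unlabeled, vocab)
print(f"Estimated positive share: {share:.2f}")
```

No post in `unlabeled` is ever assigned a label; only the overall proportion is estimated, which is the "haystack" the slide refers to.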
26. See the large picture: the Moncler case study
www.voicesfromtheblogs.com | we capture the sentiment of the net
Monday, Nov. 3rd 2014, the day after the TV show Report aired a negative story on the Moncler company: online mentions of the brand (across Twitter, Facebook, Instagram, blogs, forums and other social channels) rose by about 450% compared to the average level.
That peak corresponded to a 22-point fall in social brand reputation in just 24 hours (positive sentiment dropped from 75% to 53%, and to 43% on Twitter alone).
The stock price fell as well, by 5%.
Was this due to the Social Media?
27. See the large picture: the Moncler case study
Obviously not: the negative trend was totally predictable and independent of the SM sentiment.
33. About us
VOICES from the Blogs was born in October 2010 as a scientific project to capture opinions expressed on the Web (social media, blogs, forums, web).
On 12/12/12 VOICES became a spin-off of the University of Milan, Italy, and started operations as an independent company.
Up to January 2015 VOICES has analyzed more than half a billion posts written in Italian, English, French, Spanish, German, Russian, Arabic, Portuguese, Chinese and Japanese.
In December 2014 VOICES was among the winners of the contest “Produrre Statistica ufficiale con i Big Data” (“Producing official statistics with Big Data”)
promoted by &
www.voicesfromtheblogs.com | we look into the data, not at the data
34. About us
Since March 2015 SWG has been a partner of VOICES.
Thanks to this partnership, the first integrated group in data science and business intelligence was born in Italy.
35. But remember…
Big Data is likely to contribute so long as the desired qualities of the data are not negatively correlated with the quantity of data.
In a nutshell…
Methods DO MATTER!
36. Thx!
For more information, analyses and
white papers about the project visit us at
http://voicesfromtheblogs.com
On Twitter: @blogsvoices