Abstract: It is rare to observe the rise of a new language within a population, and rarer still to see such a language adopted on a global scale. Since the emoji keyboard was introduced on iOS in 2011, the use of emojis in textual communication has grown steadily into a common vernacular on social media. As of April 2015, Instagram reported that nearly half of all text contained emojis and that, in some countries, over 60% of text contained emoji characters. For power users of social media, and for marketers looking for audiences on these platforms, it is increasingly important to capture emoji data and derive insight from its use, in order to better understand the intent or meaning that emojis carry in a conversation. Jeff Weintraub, VP of Technology at theAmplify, a creative Brandtech Influencer Service and a subsidiary of You & Mr Jones, the World's First Brandtech Group, will briefly summarize the data science behind learning emoji representations and present recent trends in emoji usage within the context of advertising and branded marketing campaigns on social media.
4. 4
//BigDataLA2017
Emoji Adoption - Instagram
October 2011
Emoji keyboard launches on iOS
10%
Instagram Comments contained emoji
(Nov 2011)
50%+
Instagram Comments contained
emoji (March 2015)
See https://engineering.instagram.com/emojineering-part-1-machine-learning-for-emoji-trendsmachine-learning-for-emoji-trends-7f5f9cb979ad
5. 5
Emoji Adoption - Instagram
2,666
Emojis in Unicode Standard as of
May 2017
-0.93
Correlation coefficient within respective
cohorts
See https://engineering.instagram.com/emojineering-part-1-machine-learning-for-emoji-trendsmachine-learning-for-emoji-trends-7f5f9cb979ad
9. 9
Emojineering
NLP Semantic Analysis
- N-gram Neural Network Language
Model (NNLM)
See Mikolov, et al. Efficient Estimation of Word Representations in Vector Space, 2013
Q = training complexity; the goal is to minimize Q so that the model can be trained efficiently on
more data
C is the maximum distance of the words from the current word.
V is the size of the vocabulary, i.e. the output layer dimensionality
- Trained with stochastic gradient descent
(SGD) and backpropagation
- Maximize classification of a word based
on another word in the same sentence.
Continuous Skip-gram Model
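For reference, the cited paper (Mikolov et al., 2013) gives the skip-gram training complexity as Q = C x (D + D x log2(V)). D is not defined on this slide; in the paper it is the dimensionality of the word vectors. The sketch below evaluates that expression. The function name and the example values are illustrative assumptions, not figures from the talk.

```python
import math

def skipgram_complexity(C: int, D: int, V: int) -> float:
    """Per-word training complexity Q = C * (D + D * log2(V)).

    C: maximum distance of the context words
    D: word-vector dimensionality (assumed symbol, from the cited paper)
    V: vocabulary size
    """
    return C * (D + D * math.log2(V))

# Illustrative values only: doubling C doubles Q, while V enters only through log2.
q_small = skipgram_complexity(C=5, D=300, V=3_000_000)
q_large = skipgram_complexity(C=10, D=300, V=3_000_000)
print(q_small, q_large)
```

Because V appears only inside the logarithm, the vocabulary can grow a great deal before it dominates the cost; C and D scale the cost linearly.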
10. 10
Emojineering
Skip-gram Model
- If we choose C = 5, for each training
word we will select randomly a number
R in range <1; C>, and then use R
words from history and R words from
the future of the current word as
correct labels.
See Mikolov, et al. Efficient Estimation of Word Representations in Vector Space, 2013
- Increasing the range improves quality of
the resulting word vectors, but it also
increases the computational complexity
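The window sampling described on this slide can be sketched as follows. This is a minimal illustration, not the reference word2vec implementation: the function name, the example sentence, and the optional seeded generator are assumptions made for the example.

```python
import random

def skipgram_pairs(tokens, C=5, rng=None):
    """Build (current word, context word) training pairs.

    For each position, draw R uniformly from 1..C, then take up to R
    tokens before and R tokens after the current word as labels.
    """
    rng = rng or random.Random()
    pairs = []
    for i, center in enumerate(tokens):
        R = rng.randint(1, C)              # inclusive on both ends
        lo = max(0, i - R)                 # clip the window at sentence start
        hi = min(len(tokens), i + R + 1)   # clip the window at sentence end
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical tokenized comment; the emoji is treated as an ordinary token.
tokens = ["i", "love", "my", "new", "ford", "gt", "😍"]
for pair in skipgram_pairs(tokens, C=2, rng=random.Random(0)):
    print(pair)
```

A larger C yields more pairs per word, which is one way to see why widening the range raises the computational cost.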
11. 11
Emojineering
Distributional Hypothesis
Words that occur in similar contexts tend
to have similar meanings (Harris, 1954;
Firth, 1957; Deerwester et al., 1990)
Training Accuracy
- 300-dimensional vectors; words and
emojis
- 3 million phrases
- 6B tokens
the, Ford, GT
cars, Ford, :)
See https://engineering.instagram.com/emojineering-part-1-machine-learning-for-emoji-trendsmachine-learning-for-emoji-trends-7f5f9cb979ad
13. 13
Emojineering
Distributional Hypothesis
Words that occur in similar contexts tend
to have similar meanings (Harris, 1954;
Firth, 1957; Deerwester et al., 1990)
100 Billion Words
Model contains 300-dimensional vectors
for 3 million words and phrases
the, Ford, GT
cars, Ford, :)
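Under the distributional hypothesis, tokens that appear in similar contexts end up with nearby vectors, so an emoji's meaning can be probed by looking at its nearest neighbors. The sketch below uses cosine similarity, a common choice for comparing word vectors. The three-dimensional vectors are made-up values for illustration only; a trained model would supply 300-dimensional vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, NOT taken from any trained model.
vectors = {
    "cars": [0.9, 0.1, 0.0],
    "🚗": [0.8, 0.2, 0.1],
    "pizza": [0.0, 0.1, 0.9],
}

def nearest(word, vectors):
    """Return the other token whose vector is most similar to `word`."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )

print(nearest("cars", vectors))
```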
3. Conversational Insight
14. 14
Conversational Insight - Entertainment Vertical
65.23%
of Emojis used were Top 10 Emojis
34.7%
of Emojis used were 😂 and 😍
30.01% of Emojis used were semantically
relevant to key words
15. 15
Conversational Insight - Retail Vertical
58.14%
of Emojis used were Top 10 Emojis
22.5%
of Emojis used were ❤ and 😍
11.78% of Emojis used were semantically
relevant to key words
16. 16
Conversational Insight - Beauty Vertical
71.22%
of Emojis used were Top 10 Emojis
37.8%
of Emojis used were 😂 and 😍
4% of Emojis used were semantically
relevant to key words
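A "Top 10" share like the ones reported for each vertical can be computed from raw emoji occurrences roughly as follows. This is a sketch under assumptions: the function name and the sample data are hypothetical, and the talk does not describe how its own figures were produced.

```python
from collections import Counter

def top_k_share(emojis, k=10):
    """Percentage of all emoji occurrences accounted for by the k most frequent emojis."""
    counts = Counter(emojis)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return 100.0 * top / total

# Hypothetical sample: one entry per emoji occurrence across a set of comments.
sample = ["😂"] * 5 + ["😍"] * 3 + ["❤"] * 2
print(top_k_share(sample, k=2))   # share held by the two most frequent emojis
print(top_k_share(sample, k=10))  # fewer than 10 distinct emojis, so all are counted
```

The same function with k=2 gives the kind of two-emoji share quoted alongside each Top 10 figure.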