Text or image classification with deep neural networks gives us a unique way to identify each trained image or word through what is known as an ‘embedding’. An embedding is a fixed-size vector that is learnt during the training of a neural network, but it is very difficult to make sense of these seemingly random values.
3. About Me
▪ Team Lead – Data Science
Bain and Company
▪ Speaker
-O’Reilly Strata conference
-GIDS
▪ Published Author
• Machine Learning using PySpark
• Learn PySpark
• Learn TensorFlow 2.0 : The easy way
• Machine Learning in Production (WIP)
▪ https://www.linkedin.com/in/pramodchahar/
11. Applicable to other domains
▪ Finance & Insurance
▪ E-Commerce/Retail
▪ Real Estate
12. Key Questions
▪ Which customer journeys are similar to each other?
▪ Which customer journeys indicate a broken vs. a seamless experience?
▪ What are the 4-5 major routes that customers take in order to convert?
15. Challenges
Number of columns = Number of unique categories
                 Specifications  Price  Features  Reviews  ..  …
Specifications   1               0      0         0        0   0
Price            0               1      0         0        0   0
Features         0               0      1         0        0   0
Similarity between Specifications and Price = 0
Similarity between Price and Features = 0
Similarity between Features and Specifications = 0
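To make the zero similarities concrete, here is a minimal sketch assuming the 0/1 encoding on this slide is ordinary one-hot encoding and that "similarity" means cosine similarity. The category list is truncated to four names, and the helper names `one_hot` and `cosine_similarity` are illustrative, not taken from the talk.

```python
import math

# Assumed category vocabulary; the slide elides the remaining columns.
categories = ["Specifications", "Price", "Features", "Reviews"]

def one_hot(category, vocabulary):
    """Return a vector with a 1 in the category's column and 0 elsewhere."""
    return [1.0 if name == category else 0.0 for name in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

encoded = {name: one_hot(name, categories) for name in categories}

# Any two distinct categories share no non-zero column, so the dot product is 0.
sim_spec_price = cosine_similarity(encoded["Specifications"], encoded["Price"])
sim_price_features = cosine_similarity(encoded["Price"], encoded["Features"])
print(sim_spec_price, sim_price_features)
```

Because every pair of distinct categories is equally dissimilar, this encoding cannot express that some page categories are more closely related than others.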
16. Gaps
• Sequence of events is ignored
Can we represent each of these page categories with a vector that captures the underlying semantics?
Using this vector, can we represent each user journey?
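One simple way to turn page-category vectors into a journey vector is to average them. The sketch below assumes that approach; the talk does not say which aggregation was used. The 3-dimensional vectors and the `journey_vector` helper are hypothetical. Note that plain averaging discards the order of pages within a journey, so any sequence information has to come from how the page vectors themselves were learnt.

```python
# Hypothetical 3-d page-category embeddings (made-up values for illustration).
page_vectors = {
    "Price": [0.4, 0.8, 0.1],
    "Specifications": [0.2, 0.1, 0.9],
    "Features": [0.2, 0.1, 0.8],
}

def journey_vector(journey, vectors):
    """Average the vectors of the pages visited in a journey (mean pooling)."""
    dims = len(next(iter(vectors.values())))
    totals = [0.0] * dims
    for page in journey:
        for i, value in enumerate(vectors[page]):
            totals[i] += value
    return [total / len(journey) for total in totals]

# A journey is an ordered list of page categories visited by one user.
journey = ["Specifications", "Features", "Price"]
vec = journey_vector(journey, page_vectors)
print(vec)
```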
18. Embeddings
“An embedding is a mapping of a discrete — categorical — variable to a vector of continuous numbers such that the vectors of similar entities are
closer to one another in vector space.”
king - man + woman = queen
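The analogy above can be illustrated with a toy example. The 2-dimensional vectors below are hand-picked (one axis loosely standing for "royalty", the other for "gender") so that the arithmetic works out exactly. Real embeddings are learnt, have many more dimensions, and only satisfy such relations approximately. The `analogy` helper is hypothetical.

```python
import math

# Hand-picked toy vectors, NOT learnt embeddings.
vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.2],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine_similarity(target, vectors[word]))

result = analogy("king", "man", "woman", vectors)
print(result)
```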
19. Category Similarity using Embeddings
Price            0.43  0.75  0.98  …  0.55  0.87
Specifications   0.23  0.10  0.33  …  0.45  0.20
Features         0.22  0.09  0.30  …  0.44  0.18
Similarity between Specifications and Price = -0.75
Similarity between Price and Features = -0.83
Similarity between Features and Specifications = 0.91
26. Sequence Based Embedding*
The earth is round and moves around the sun
• Context and Target Words
• Given a word, which are the neighboring words?
• Given the neighboring words, what's the target word?
*Window size
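The two questions on this slide correspond to what the word2vec literature calls skip-gram (predict the neighbours from a word) and CBOW (predict the word from its neighbours). The sketch below only builds the training examples for the slide's sentence, assuming a window size of 2. It does not train a model, and the function names are illustrative.

```python
def skipgram_pairs(tokens, window_size):
    """(target, context) pairs: given a word, which are the neighbouring words?"""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_examples(tokens, window_size):
    """(context words, target) examples: given the neighbours, what is the target?"""
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window_size):i]
        right = tokens[i + 1:i + window_size + 1]
        examples.append((left + right, target))
    return examples

tokens = "The earth is round and moves around the sun".lower().split()
window_size = 2  # assumed value; the slide only notes that a window size exists

pairs = skipgram_pairs(tokens, window_size)
examples = cbow_examples(tokens, window_size)

# Examples centred on the word "round" (index 3).
print([pair for pair in pairs if pair[0] == "round"])
print(examples[3])
```

The same construction applies to user journeys if each page category is treated as a "word" and each journey as a "sentence".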
34. Embedding Visualization
Categories related to services, warranty, and reviews are closer together
Categories related to test drive activities are closer together
Categories related to vehicle information are closer together
39. Advantages of Embeddings
• Finding nearest neighbours in the low-dimensional space
• Input features for machine learning prediction
• For understanding relations between categories
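As an illustration of the first advantage, the following sketch ranks categories by cosine similarity to a query category. The category names echo the visualization slide, but the 3-dimensional vectors are made up for the example and the `nearest_neighbours` helper is hypothetical.

```python
import math

# Made-up 3-d embeddings; real ones would come from a trained model.
embeddings = {
    "Services": [0.9, 0.1, 0.0],
    "Warranty": [0.8, 0.2, 0.1],
    "Review": [0.5, 0.5, 0.0],
    "Test Drive": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbours(query, vectors, k):
    """Return the k categories most similar to `query`, most similar first."""
    scored = [
        (name, cosine_similarity(vectors[query], vector))
        for name, vector in vectors.items()
        if name != query
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

neighbours = nearest_neighbours("Warranty", embeddings, k=3)
for name, score in neighbours:
    print(name, round(score, 3))
```

A linear scan like this is fine for a few hundred categories; larger vocabularies usually call for an approximate nearest-neighbour index.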