As part of IBM PartyCloud 2018 in Milan, the talk "A Journey into Data Science & AI" will present a case study on estimating Panelists Latent Affinities. I will show the components needed to develop an intelligent social agent able to classify entities and estimate latent affinities. The session will also cover good practices and common challenges faced by R&D organizations building Machine Learning products.
3. External Projects
● Co-author of the “Python Deep Learning” book and “The Professional Data Science Manifesto”
● Founder of the “Data Science Milan” community and the Machine Intelligence Hub network
4. Intelligence
● Capacity to learn from experience *
● Ability to adapt to different contexts *
● The use of metacognition to enhance learning *
* Cognitive Psychology 4th edition, Robert J Sternberg, Chapter 13
5. Social Intelligence
● Ability to get along with others *
● Knowledge of social matters *
● Insight into the moods and/or underlying personality traits of others *
* Cognitive Psychology 4th edition, Robert J Sternberg, Chapter 13
6. Artificial (Social) Intelligence
● The computational part of the ability to achieve (social)** goals in the world *
● The application of machine intelligence techniques to social phenomena ***
* Cognitive Psychology 4th edition, Robert J Sternberg, Chapter 13
** My own social re-interpretation
*** Artificial Social Intelligence, William Sims Bainbridge et al., Annual Review of Sociology, Vol. 20 (1994), pp. 407-436
7. Generative AI Technology
A generative algorithm that ensembles AI models and prior knowledge of the world to unify different data sources into a single population of synthetic users, representing an augmented view of U.S. consumers and their affinities.
8. Case Study: Panelists Latent Affinities
[Diagram: Anonymous Panelists and a Survey, linked to three entity groups: Influencers & Celebrities, Products & Brands, Media & Publishers]
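10. Partially Responded Survey
Q1: Are you inspired by Elon Musk?
Q2: Are you interested in the SpaceX mission?
Q3: Have you drunk Starbucks coffee in the last month?
Q4: Do you read the NY Times at least once a week?
Q5: Do you listen to Led Zeppelin?

| Panelist | Q1 | Q2  | Q3  | Q4  | Q5  |
|----------|----|-----|-----|-----|-----|
| 1        | ❌ | N/A | ✅  | N/A | ✅  |
| 2        | ❌ | N/A | N/A | ❌  | N/A |
| 3        | ✅ | ✅  | ❌  | N/A | ✅  |
| 4        | ✅ | N/A | ❌  | N/A | ✅  |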
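To make the survey usable by a model, the answers can be encoded as a numeric panelist × question matrix in which unanswered cells stay explicitly missing. The sketch below assumes an encoding of 1.0 for ✅, 0.0 for ❌ and NaN for N/A; the question keys and the helper `missing_questions` are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical short keys for the five survey questions, in column order.
QUESTIONS = ["elon_musk", "spacex", "starbucks", "ny_times", "led_zeppelin"]

# Assumed encoding: 1.0 = ✅ (yes), 0.0 = ❌ (no), NaN = N/A (not answered).
YES, NO, NA = 1.0, 0.0, np.nan
survey = np.array([
    [NO,  NA,  YES, NA,  YES],   # panelist 1
    [NO,  NA,  NA,  NO,  NA],    # panelist 2
    [YES, YES, NO,  NA,  YES],   # panelist 3
    [YES, NA,  NO,  NA,  YES],   # panelist 4
])

observed = ~np.isnan(survey)            # mask of answers we actually have
missing = np.argwhere(~observed)        # (panelist, question) pairs to estimate
answered_rate = np.nanmean(survey, axis=0)  # share of "yes" among answered cells


def missing_questions(survey, panelist):
    """Return the keys of the questions a panelist (0-based) left unanswered."""
    return [QUESTIONS[j] for j in np.flatnonzero(np.isnan(survey[panelist]))]
```

Keeping N/A as NaN, rather than silently filling it with 0, preserves the distinction between "answered no" and "did not answer", which is exactly the gap the latent-affinity estimation has to fill.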
11. Given a set of uncategorized entities and a set of anonymized users along with some observable affinities:
1. Which category does each entity belong to?
2. What are each user's latent affinities?
12. Social Agent Goals
1. Identify the category of each entity (e.g. influencer, product, media)
2. Learn representations of the entities (e.g. grouping them into shared topics such as sports, music genres, movie genres)
3. Learn how to map one entity to another (e.g. Elon Musk : people = SpaceX : technology)
4. Estimate latent affinities by reasoning over the available observations (e.g. if you are interested in rock music and 1970s culture, you are very likely to be a fan of Led Zeppelin)
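Goal 3 reads like an embedding analogy. One common way to answer "a : b = c : ?" is the vector-offset method: compute b − a + c and return the nearest remaining entity by cosine similarity. The sketch below assumes that technique and uses hypothetical, hand-picked 5-dimensional toy vectors chosen so the slide's example works; in a real system the vectors would be learned, for instance as item factors of a matrix factorization.

```python
import numpy as np

# Hypothetical toy embeddings, hand-picked for illustration only.
EMB = {
    "elon_musk":    np.array([1.0, 0.0, 1.0, 0.0, 0.0]),
    "spacex":       np.array([1.0, 0.0, 0.0, 1.0, 0.0]),
    "led_zeppelin": np.array([0.0, 1.0, 1.0, 0.0, 0.0]),
    "people":       np.array([0.0, 0.0, 1.0, 0.0, 1.0]),
    "technology":   np.array([0.0, 0.0, 0.0, 1.0, 1.0]),
    "music":        np.array([0.0, 1.0, 0.0, 0.0, 1.0]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy(a, b, c, emb=EMB):
    """Solve a : b = c : ? with the vector-offset method (b - a + c).

    The three query entities are excluded from the candidate answers.
    """
    query = emb[b] - emb[a] + emb[c]
    candidates = (k for k in emb if k not in {a, b, c})
    return max(candidates, key=lambda k: cosine(emb[k], query))
```

For example, `analogy("elon_musk", "people", "spacex")` returns `"technology"` with these toy vectors. Excluding the query entities matters because the nearest vector to b − a + c is often one of the inputs themselves.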
14. Recommender System for Affinities
● Build a user-item matrix
● Use implicit feedback to represent missing affinities
● Decompose it into the product of a user-topic matrix and a topic-item matrix
● Infer probability scores for the latent (unobserved) items
Alternating Least Squares algorithm
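The steps on this slide can be sketched with NumPy. This is a minimal, non-authoritative version of implicit-feedback ALS in the style of Hu, Koren and Volinsky (2008): observed affinities become preferences p = 1 with confidence c = 1 + αr, unobserved cells get p = 0 with confidence 1, and the matrix is factored into user-topic (X) and topic-item (Yᵀ) matrices by alternately solving weighted ridge regressions. The function names, the hyperparameters (`n_factors`, `reg`, `alpha`, `n_iters`) and the toy matrix are assumptions for illustration, and clipping predictions to [0, 1] is only a crude stand-in for calibrated probabilities.

```python
import numpy as np


def implicit_als(R, n_factors=2, reg=0.1, alpha=10.0, n_iters=15, seed=0):
    """Factor R (users x items) into X @ Y.T using implicit-feedback ALS.

    R holds non-negative interaction strengths; 0 means "not observed",
    which is treated as weak negative evidence (p = 0, confidence 1).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)       # binary preferences p_ui
    C = 1.0 + alpha * R             # confidence weights c_ui = 1 + alpha * r_ui
    X = rng.normal(scale=0.1, size=(n_users, n_factors))  # user-topic factors
    Y = rng.normal(scale=0.1, size=(n_items, n_factors))  # item-topic factors
    reg_eye = reg * np.eye(n_factors)

    for _ in range(n_iters):
        # Fix item factors and solve a weighted ridge regression per user.
        for u in range(n_users):
            YtCu = Y.T * C[u]               # equals Y^T diag(C[u])
            X[u] = np.linalg.solve(YtCu @ Y + reg_eye, YtCu @ P[u])
        # Fix user factors and solve a weighted ridge regression per item.
        for i in range(n_items):
            XtCi = X.T * C[:, i]            # equals X^T diag(C[:, i])
            Y[i] = np.linalg.solve(XtCi @ X + reg_eye, XtCi @ P[:, i])
    return X, Y


def affinity_scores(X, Y):
    """Predicted preferences, clipped to [0, 1] as rough affinity scores."""
    return np.clip(X @ Y.T, 0.0, 1.0)


# Hypothetical toy matrix: the ✅ answers of the survey example as 1,
# everything else (❌ and N/A) as 0.
R = np.array([[0, 0, 1, 0, 1],
              [0, 0, 0, 0, 0],
              [1, 1, 0, 0, 1],
              [1, 0, 0, 0, 1]], dtype=float)
X, Y = implicit_als(R)
scores = affinity_scores(X, Y)   # high scores on 0-cells suggest latent affinities
```

In production one would more likely rely on an existing implementation, such as Spark MLlib's ALS with implicit preferences enabled; the explicit loops here only make each alternating step visible.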
17. Development Workflow
1. Time-boxed research spikes followed by clearly defined feature stories
2. Notebooks used for analysis, entry points and presentation of results
3. Code developed as modules and functions in a proper IDE
4. Unit tests: replace “assertEqual” with uncertainty ranges
5. Versioning of code, priors, data, evaluation results and models
6. Git-flow branching model, pull requests, peer review
7. End-to-end testing using small datasets and Continuous Integration
First release plan focused on an MVP satisfying the acceptance criteria defined by early-adopter customers, rather than just a PoC
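Item 4 can be illustrated with Python's `unittest`: for a stochastic model, the test checks that an estimate falls within a tolerance of the expected value (`assertAlmostEqual` with `delta`) and that the reported interval is sane, instead of asserting exact equality. The estimator `estimate_affinity_rate`, its bootstrap interval and the simulated data are hypothetical, a sketch of the practice rather than the team's actual tests.

```python
import unittest

import numpy as np


def estimate_affinity_rate(responses, n_boot=1000, seed=None):
    """Hypothetical estimator: mean of binary affinity responses plus a
    95% percentile-bootstrap interval. Returns (estimate, (low, high))."""
    rng = np.random.default_rng(seed)
    responses = np.asarray(responses, dtype=float)
    boots = rng.choice(responses, size=(n_boot, responses.size),
                       replace=True).mean(axis=1)
    low, high = np.percentile(boots, [2.5, 97.5])
    return responses.mean(), (low, high)


class TestAffinityRate(unittest.TestCase):
    def test_rate_within_uncertainty_range(self):
        rng = np.random.default_rng(42)
        true_rate = 0.3
        responses = rng.random(2000) < true_rate   # simulated panelists
        estimate, (low, high) = estimate_affinity_rate(responses, seed=0)
        # Instead of assertEqual(estimate, 0.3), accept a tolerance band.
        self.assertAlmostEqual(estimate, true_rate, delta=0.05)
        # The interval should contain the point estimate and stay narrow.
        self.assertLessEqual(low, estimate)
        self.assertGreaterEqual(high, estimate)
        self.assertLess(high - low, 0.1)
```

Fixing the random seeds keeps the test reproducible in Continuous Integration, while the tolerance band keeps it from breaking on harmless numerical changes. Run it with `python -m unittest` on the module that contains it.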
19. The Crew
● ML Engineer
● ML Scientist
● Chief Scientist
● Mathematician
● Psychologist
● Data Engineer
● NLP Specialist
20. Challenges
1. Business integration (technical and cultural)
2. Lack of expertise in the market and a steep learning curve
3. Too many different tools and technologies