Automating Tinder with Eigenfaces and StanfordNLP
Justin Long | @crockpotveggies
Hours and hours of swiping
Not everyone is interested/attracted
Who wants to sink endless hours with these odds?
What’s the Problem, dude?
1 hr+ average / day
4+ matches average / week
Infinite time wasted
A Problem for Technology
Performing both daily engineering and research-based projects at 3 Tier Logic, I’ve learned how to quickly hack things in Scala and the JVM. I figured a “Tinder Bot” was do-able.
Resources. Spend less time developing, more time automating.
Scoping. What specific problems can be solved with technology?
Research. Use secondary research to figure out what has already been done in this field.
A Problem of Problems
Swiping.
•  Others have used swipe-all strategies
•  Creates bigger problem => more matches to filter
•  You might not think everyone is attractive
Interest.
•  Even after a match, not everyone is interested
•  Matches suddenly “go dark”
•  You simply don’t get along
Spammers.
•  Self-explanatory.
A Problem of Problems, solved.
Swiping.
•  Use some sort of machine learning/A.I. technique that can be “taught” who I find attractive
Interest.
•  Develop a chat bot that will hold a generic conversation for a couple of messages to filter out the uninterested
Spammers.
•  Set up detection rules to filter spammers
What tools are needed?
Scala
•  Type-safe, easy to hack, and a huge advantage with Java interoperability.
Eigenfaces
•  Battle-tested facial recognition technique since the 1980s.
•  Best algorithmic performance and easy to use on the Java Virtual Machine
StanfordNLP
•  Well-developed JVM-based NLP library compatible with Scala
Swiping.
Putting Eigenfaces to work.
What is Eigenfaces?
Need to do quick and dirty object or facial recognition?
Eigenfaces may be for you.
The essence of Eigenfaces
Eigenfaces is the name given to a set of eigenvectors when they are used for facial recognition.
A typical Eigenfaces calculation works as follows:
1.  Obtain a training set of faces and convert it to a pixel matrix
2.  Compute the mean image (the average pixel intensity at each position across all training images)
3.  Compute a differential matrix by subtracting the mean from each training image, pixel by pixel
4.  Compute the covariance matrix of the differential matrix
The essence of Eigenfaces
Your “average face” may look a little
uncanny…
5.  Calculate the eigenvectors of the covariance matrix
6.  Compute the Eigenfaces by multiplying the eigenvectors by the differential matrix, then normalizing them
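Steps 2 to 4 can be sketched in plain Scala. This is a minimal illustration, not the talk's `MatrixHelpers` code: the object name `EigenfaceSteps` and its method names are hypothetical, images are assumed to be flattened into equally sized arrays of grayscale intensities (one row per image), and the covariance is computed in a reduced image-by-image form, an assumption made here because it keeps the matrix small when there are far fewer images than pixels.

```scala
object EigenfaceSteps {
  /** Step 2: mean intensity of every pixel position across all images. */
  def meanImage(images: Array[Array[Double]]): Array[Double] = {
    val pixelCount = images.head.length
    Array.tabulate(pixelCount)(p => images.map(_(p)).sum / images.length)
  }

  /** Step 3: subtract the mean from each image, pixel by pixel. */
  def differences(images: Array[Array[Double]], mean: Array[Double]): Array[Array[Double]] =
    images.map(img => img.zip(mean).map { case (v, m) => v - m })

  /** Step 4 (reduced form): entry (a, b) is the dot product of difference images a and b. */
  def reducedCovariance(diffs: Array[Array[Double]]): Array[Array[Double]] =
    Array.tabulate(diffs.length, diffs.length) { (a, b) =>
      diffs(a).zip(diffs(b)).map { case (x, y) => x * y }.sum
    }
}
```

The eigenvectors in step 5 would then come from a linear algebra library applied to the matrix returned by `reducedCovariance`.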
Putting it together
def computeEigenFaces(pixelMatrix: Array[Array[Double]],
                      meanColumn: Array[Double]): DoubleMatrix2D = {
  // Subtract the mean image from every training image
  val diffMatrix = MatrixHelpers.computeDifferenceMatrixPixels(pixelMatrix, meanColumn)
  val covarianceMatrix = MatrixHelpers.computeCovarianceMatrix(pixelMatrix, diffMatrix)
  val eigenVectors = MatrixHelpers.computeEigenVectors(covarianceMatrix)
  // Delegate to the overload that multiplies and normalizes
  computeEigenFaces(eigenVectors, diffMatrix)
}
Multiplying eigenvectors/differential
(0 until rank).foreach { i =>
  var sumSquare = 0.0
  (0 until pixelCount).foreach { j =>
    // Accumulate pixel j of eigenface i from the difference images and eigenvector i
    (0 until imageCount).foreach { k =>
      eigenFaces(j)(i) += diffMatrix(j)(k) * eigenVectors.get(i, k)
    }
    sumSquare += eigenFaces(j)(i) * eigenFaces(j)(i)
  }
  // Scale eigenface i to unit length
  val norm = Math.sqrt(sumSquare)
  (0 until pixelCount).foreach { j =>
    eigenFaces(j)(i) /= norm
  }
}
Preprocessing is key
You need to preprocess your images!
Grayscale. Important for calculating pixel intensity values.
Normalization. Not all lighting conditions are equal.
Cropping. Very important to focus only on facial features.
Without preprocessing, you’re gonna have a bad time.
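The three preprocessing steps can be sketched with the JDK's `BufferedImage`. This is an illustration under assumptions rather than the talk's code: the object name `Preprocess`, the luminance weights (the common 0.299/0.587/0.114 formula), and min-max scaling as the normalization method are all choices made for this example.

```scala
import java.awt.image.BufferedImage

object Preprocess {
  /** Grayscale: one luminance value (0 to 255) per pixel, in row-major order. */
  def toGray(img: BufferedImage): Array[Double] = {
    val out = new Array[Double](img.getWidth * img.getHeight)
    for (y <- 0 until img.getHeight; x <- 0 until img.getWidth) {
      val rgb = img.getRGB(x, y)
      val r = (rgb >> 16) & 0xFF
      val g = (rgb >> 8) & 0xFF
      val b = rgb & 0xFF
      out(y * img.getWidth + x) = 0.299 * r + 0.587 * g + 0.114 * b
    }
    out
  }

  /** Normalization: stretch intensities to the range 0 to 1 to reduce lighting differences. */
  def normalize(pixels: Array[Double]): Array[Double] = {
    val lo = pixels.min
    val hi = pixels.max
    if (hi == lo) pixels.map(_ => 0.0) // flat image: nothing to stretch
    else pixels.map(p => (p - lo) / (hi - lo))
  }

  /** Cropping: keep only the rectangle assumed to contain the face. */
  def crop(img: BufferedImage, x: Int, y: Int, w: Int, h: Int): BufferedImage =
    img.getSubimage(x, y, w, h)
}
```

Locating the face rectangle to pass to `crop` is a separate problem that this sketch does not address.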
Scala Advantages
Interoperability is a win.
Compatibility with Java means we can use useful classes like
`BufferedImage` while keeping Scala’s simplicity*.
val meanImage = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY)

val raster = meanImage.getRaster()
*Scala is simpler for this particular situation, IMO
Uncanny Results
Averaging my selections proved interesting.
People I disliked smiled less and had rounder faces, while the opposite was true for those I found attractive.
Potential for Eigenfaces?
What else can we do with these great faces?
•  Subjects that can be read 2-dimensionally, from
same angle
•  Optical Character Recognition (OCR)
•  Image segmentation
•  http://www.cs.huji.ac.il/~yweiss/iccv99.pdf
It isn’t a Google Deep Dream, but it has potential…
Interest.
The marriage of StanfordNLP and Scala.
Play 2.
Rebuilt the entire Tinder interface in the Play Scala MVC framework for desktop.
Chat Bot.
The bot in the background is semi-intelligent and looks for uninitiated conversations.
Notifications.
Desktop browser notifications alert for new chats.
A Non-Typical Conversation
Before Natural Language Processing could be used to analyze replies of conversations, a structure was needed to map progress of conversations.
•  Analyze reply depth
•  Provide a path to next reply
•  Determine if notification was necessary
Scala Tree Structures
Trees track progress and replies of conversations.
Trees Codified
case class MessageTree(
  value: String,
  left: Option[MessageTree] = None,
  right: Option[MessageTree] = None
) {

  /** Walk to the child node in the given direction. */
  def walk(direction: Direction): Option[MessageTree] = {
    direction match {
      case Right => this.right
      case Left => this.left
    }
  }
}
Message trees are simple binary trees.
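Walking a whole conversation is then a matter of following one direction per reply. The sketch below is self-contained and hypothetical: the talk calls a `MessageTree.walkTree` helper but does not show it, so this `walkTree`, the `Direction` definitions, and the wrapping `TreeSketch` object are assumptions about how such a helper might look.

```scala
object TreeSketch {
  sealed trait Direction
  case object Left extends Direction
  case object Right extends Direction

  case class MessageTree(
    value: String,
    left: Option[MessageTree] = None,
    right: Option[MessageTree] = None
  ) {
    def walk(direction: Direction): Option[MessageTree] = direction match {
      case Left  => left
      case Right => right
    }
  }

  /** Follow one direction per reply; None means the conversation left the scripted tree. */
  def walkTree(root: MessageTree, directions: List[Direction]): Option[MessageTree] =
    directions.foldLeft(Option(root)) { (node, dir) => node.flatMap(_.walk(dir)) }
}
```

Returning `Option` lets the caller fall back to some other behavior when no scripted reply exists.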
Walking the Tree
FunMessages.messages.find(_.value == theTreeRoot) match {
  case None => createStopGap(m, true)
  case Some(tree) =>
    val sentiments = MessageUtil
      .assignSentimentDirection(MessageUtil.filterSenderMessages(userId, m.messages))
      .map(_._2)
    MessageTree.walkTree(tree, sentiments) match {
      case None => createStopGap(m, true)
      case Some(branch) =>
        new TinderApi(Some(xAuthToken))
          .sendMessage(m._id, branch.value).map { result => …
Note: pattern matching isn’t the only way to do this.
Sentiment analysis was the easy part.
•  Library already had trained models for sentiment
•  Split each match’s reply into sentences and score sentiment
•  Use score to determine reply direction
Ready for StanfordNLP
Sentiment of reply determined direction of tree.
import scala.collection.JavaConverters._

val pipeline = new StanfordCoreNLP(nlpProps)
val annotation = pipeline.process(message)
val sentiments: ListBuffer[Double] = ListBuffer()
// Score each sentence of the reply (the Java list is converted for the for-comprehension)
for (sentence <- annotation.get(classOf[CoreAnnotations.SentencesAnnotation]).asScala) {
  val tree = sentence.get(classOf[SentimentCoreAnnotations.AnnotatedTree])
  val sentiment = RNNCoreAnnotations.getPredictedClass(tree)
  val partText = sentence.toString
  sentiments += sentiment.toDouble
}
// Average the sentence scores; fall back to 2 (neutral) when there are none
val averageSentiment: Double = {
  if (sentiments.size > 0) sentiments.sum / sentiments.size
  else 2
}
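How the averaged score becomes a tree direction is not shown on the slides, so the following is only a sketch under assumptions: the CoreNLP sentiment model predicts classes 0 (very negative) to 4 (very positive) with 2 as neutral, and this example assumes that neutral-or-better replies go `Right` and everything else goes `Left`. The object and method names are hypothetical.

```scala
object SentimentDirection {
  sealed trait Direction
  case object Left extends Direction
  case object Right extends Direction

  /** Neutral class of the five-class sentiment scale (0 to 4). */
  val Neutral = 2.0

  /** Mean sentence score, or neutral when the reply produced no sentences. */
  def average(scores: Seq[Double]): Double =
    if (scores.nonEmpty) scores.sum / scores.size else Neutral

  /** Assumed rule: neutral or positive replies walk right, negative replies walk left. */
  def direction(avg: Double): Direction =
    if (avg >= Neutral) Right else Left
}
```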
Create Reply Trees
object FunMessages {

  def messages = List(
    MessageTree(
      value = "{name} are you a fan of avocados?",
      right = Some(MessageTree(
        value = "So if I asked you to have a guacamole party with me you'd do it?",
        right = …,
        left = …
      )) …
Now we have a list of generic replies to open
conversations.
Spammers.
Scala pattern matching FTW.
Number of photos.
Applicable for both spammers and matching, a profile with one or zero photos was not worth the time.
Length of bio.
An empty or short bio was a strong indicator of spammer presence.
Activity.
If they haven’t been active for a while, they probably won’t respond soon anyways ;)
General Rules of Selection
Integrate with Selection
if (rec.photos.size == 2 && rec.bio == "") dislikeUser("sparse photos, no bio")
else if (rec.photos.size == 1) dislikeUser("sparse photos")
else if (lastSeenAgo > (day * 3)) dislikeUser("hasn't been active for %s days".format((lastSeenAgo / day)))
else if (!photoCriteria(rec.photos)) dislikeUser("failed photo criteria")
// Note: String.matches succeeds only when the entire bio matches the pattern
else if (rec.bio.matches("no.{0,15}hookups")) likeUser("claiming friendship only")
else if (autoLike) likeUser("auto-liked")
else {
  recommendation.FacialRecommendation.makeComparison(user.user._id, rec._id, rec.photos) match {
    case Some(true) => likeUser("face matched positive recommendation criteria") …
Implementing the rules in code in SwipeTask.scala.
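The general rules also lend themselves to pattern matching with guards. The sketch below is hypothetical and is not taken from `SwipeTask.scala`: the `Profile` and `Decision` types are invented for illustration, the photo and three-day activity thresholds follow the rules described in this section, and the minimum bio length of 10 characters is an assumption.

```scala
object SelectionRules {
  final case class Profile(photoCount: Int, bio: String, daysSinceActive: Int)

  sealed trait Decision
  final case class Dislike(reason: String) extends Decision
  case object Continue extends Decision

  // Assumed cutoff for what counts as a "short" bio.
  private val MinBioLength = 10

  /** Apply the cheap screening rules in order; Continue means "run the later checks". */
  def screen(p: Profile): Decision = p match {
    case Profile(photos, _, _) if photos <= 1 =>
      Dislike("sparse photos")
    case Profile(_, bio, _) if bio.trim.length < MinBioLength =>
      Dislike("empty or short bio")
    case Profile(_, _, days) if days > 3 =>
      Dislike(s"hasn't been active for $days days")
    case _ =>
      Continue
  }
}
```

Because the cases are tried top to bottom, the cheapest and most decisive rules can be listed first.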
Structure.
Leveraging the Scala and Akka framework.
•  If you need concurrency for basic computational performance, use Futures
•  If you’re setting up a router firing to multiple workers, use Actors
•  If you need something to track state from outside messages, such as counting, use Actors
•  And futures are composable!
Notes about Actors
Use Actors for State, Futures for Concurrency
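As a small illustration of why composability matters, the sketch below runs independent computations as Futures and combines their results using only the Scala standard library. The functions `scoreFace`, `averageScore`, and `bothAverages` are hypothetical stand-ins for CPU-bound work, not functions from the project.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  /** Hypothetical stand-in for an expensive per-photo computation. */
  def scoreFace(photoId: Int): Future[Double] = Future { photoId / 10.0 }

  /** Score all photos concurrently, then combine the results into one average. */
  def averageScore(photoIds: List[Int]): Future[Double] =
    Future.sequence(photoIds.map(scoreFace)).map { scores =>
      if (scores.isEmpty) 0.0 else scores.sum / scores.size
    }

  /** Compose two independent futures; both are started before either is awaited. */
  def bothAverages(a: List[Int], b: List[Int]): Future[(Double, Double)] = {
    val fa = averageScore(a)
    val fb = averageScore(b)
    for (x <- fa; y <- fb) yield (x, y)
  }
}
```

A caller would typically keep chaining with `map` or `flatMap`; blocking with `Await.result` is usually reserved for tests or the outer edge of a program.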
Now that I’ve said that…
I cheated.
1.  Top-level bot service iterates through data, looks for tasks.
2.  Tasks are spawned in their appropriate actors
    1.  MessageReplyTask
    2.  SwipeTask
    3.  FacialCheckTask
3.  Tasks are then placed in a timed queue
Queue System with Actors
Concurrent and queued calculations were a must.
Found an advantage to following this anti-pattern, because I was able to throttle the amount of computation (and messaging) without overwhelming my local CPU and the Tinder API. In hindsight, it may have been better to make each Actor a worker.
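The throttling idea can be illustrated without Akka. The class below is a hypothetical sliding-window limiter, not the project's `BotThrottle`: it admits at most `maxTasks` per `windowMillis`, and the caller passes in the current time so the behavior is deterministic. It is not thread-safe, which inside an actor would be acceptable because an actor processes one message at a time.

```scala
object ThrottleSketch {
  /** Admits at most `maxTasks` tasks in any window of `windowMillis` milliseconds. */
  final class Throttle(maxTasks: Int, windowMillis: Long) {
    // Timestamps of tasks admitted within the current window.
    private var recent = Vector.empty[Long]

    /** Returns true and records the task if it may run now, false if it should wait. */
    def tryAcquire(nowMillis: Long): Boolean = {
      recent = recent.filter(t => nowMillis - t < windowMillis)
      if (recent.size < maxTasks) {
        recent = recent :+ nowMillis
        true
      } else false
    }
  }
}
```

With `new ThrottleSketch.Throttle(1, 2000)`, a task is admitted at most once every two seconds; rejected tasks would stay in the queue for a later attempt.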
The Bot Service
class TinderBot(taskWarningThreshold: Int, taskSleepThreshold: Int) extends Actor {
  // Throttler and supervisor watch all of the work
  val botThrottle = context.actorOf(Props(new BotThrottle(1 msgsPer (2 seconds), Some(self))), "BotThrottle")
  val botSupervisor = context.actorOf(Props(new BotSupervisor(self)), "BotSupervisor")
  def receive = {
    // send commands to the bot
    case BotCommand(command) => …
    // logic for handling queue state
    case QueueState(queueLength) => …
  }
}
Admittedly, a little heavyweight…
•  One key mistake is that I wasn’t storing state in the UpdatesTask actor; I was storing it elsewhere!
•  Akka is especially useful for creating timed micro-services like UpdatesTask
•  There are other ways to do this, too…
Easy Scala Services
Scala made it somewhat easy to create micro-services.
private class UpdatesTask extends Actor {
  def receive = {
    // On every scheduled tick, sync updates for each active session
    case "tick" =>
      TinderService.activeSessions.foreach { s => syncUpdates(s) }
  }
}
private val updateActor = Akka.system.actorOf(Props[UpdatesTask], name = "UpdatesTask")
// Send "tick" to the actor immediately and then every 40 seconds
private val updateService = {
  Akka.system.scheduler.schedule(0 seconds, 40 seconds, updateActor, "tick")
}
Final Product.
Putting everything together.
Dashboard
No automated dating is complete without a dashboard...
Messaging
Fully-featured messaging inbox.
Selections
Created my own flavor of the Tinder swipe screen.
Links
Github Project
•  https://github.com/crockpotveggies/tinderbox
Scala
•  http://www.scala-lang.org
Eigenfaces
•  https://en.wikipedia.org/wiki/Eigenface
StanfordNLP
•  http://nlp.stanford.edu/
Akka
•  http://akka.io/
Read More
crockpotveggies.com
Questions? Contact @crockpotveggies

Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Automating Tinder w/ Eigenfaces and StanfordNLP

  • 1. Automating Tinder with Eigenfaces and StanfordNLP Justin Long | @crockpotveggies
  • 2. What’s the Problem, dude? Hours and hours of swiping. Not everyone is interested/attracted. Who wants to sink endless hours with these odds? 1 hr+ average per day. 4+ matches average per week. Infinite time wasted.
  • 3. A Problem for Technology Performing both daily engineering and research-based projects at 3 Tier Logic, I’ve learned how to quickly hack things in Scala and the JVM. I figured a “Tinder Bot” was do-able. Resources. Spend less time developing, more time automating. Scoping. What specific problems can be solved with technology? Research. Use secondary research to figure out what has already been done in this field.
  • 4. A Problem of Problems Swiping. •  Others have used swipe-all strategies •  Creates bigger problem => more matches to filter •  You might not think everyone is attractive Interest. •  Even after a match, not everyone is interested •  Matches suddenly “go dark” •  You simply don’t get along Spammers. •  Self-explanatory.
  • 5. A Problem of Problems, solved. Swiping. •  Use some sort of machine learning/A.I. technique that can be “taught” who I find attractive Interest. •  Develop a chat bot that will hold a generic conversation for a couple messages to filter the uninterested Spammers. •  Set up detection rules to filter spammers
  • 6. What tools are needed? Scala •  Type-safe, easy to hack, and huge advantage with Java interchangeability. Eigenfaces •  Battle-tested facial recognition since 1980s. •  Best algorithmic performance and easy-to-use on Java Virtual Machine StanfordNLP •  Well-developed JVM-based NLP library compatible with Scala
  • 8. What is Eigenfaces? Need to do quick and dirty object or facial recognition? Eigenfaces may be for you.
  • 9. The essence of Eigenfaces Eigenfaces is the name given to a set of eigenvectors when they are used for facial recognition. A typical use for calculating Eigenfaces works as such: 1.  Obtain a training set of faces and convert to a pixel matrix 2.  Compute the mean image (which is an average of pixel intensity across each image). 3.  Compute a differential matrix by subtracting the mean from each training image, pixel by pixel 4.  Compute covariance matrix of the differential matrix
  • 10. The essence of Eigenfaces 5.  Calculate eigenvectors from the covariance matrix 6.  Compute Eigenfaces by multiplying the eigenvectors by the differential matrix, then normalizing the result Your “average face” may look a little uncanny…
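Steps 2 to 4 can be sketched in plain Scala. This is a minimal illustration, not the project's `MatrixHelpers` code: the object and method names are hypothetical, the layout `pixelMatrix(pixel)(image)` is assumed from the indexing used on the later slides, and the covariance is left unscaled and computed in its small image-by-image form (the usual Eigenfaces shortcut when there are far more pixels than images).

```scala
object EigenfaceSketch {
  /** Step 2: mean intensity of each pixel across all training images.
    * Assumed layout: pixelMatrix(j)(k) is pixel j of image k. */
  def meanColumn(pixelMatrix: Array[Array[Double]]): Array[Double] =
    pixelMatrix.map(row => row.sum / row.length)

  /** Step 3: subtract the mean from every image, pixel by pixel. */
  def differenceMatrix(pixelMatrix: Array[Array[Double]],
                       mean: Array[Double]): Array[Array[Double]] =
    pixelMatrix.zip(mean).map { case (row, m) => row.map(_ - m) }

  /** Step 4 (unscaled, small form): an imageCount x imageCount matrix whose
    * entry (a, b) is the dot product of difference images a and b. */
  def smallCovariance(diff: Array[Array[Double]]): Array[Array[Double]] = {
    val imageCount = diff.head.length
    Array.tabulate(imageCount, imageCount) { (a, b) =>
      diff.map(row => row(a) * row(b)).sum
    }
  }
}
```

The eigenvectors of this small matrix have one entry per training image, which is the shape the multiplication loop on slide 12 expects.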
  • 11. Putting it together
      def computeEigenFaces(pixelMatrix: Array[Array[Double]], meanColumn: Array[Double]): DoubleMatrix2D = {
        val diffMatrix = MatrixHelpers.computeDifferenceMatrixPixels(pixelMatrix, meanColumn)
        val covarianceMatrix = MatrixHelpers.computeCovarianceMatrix(pixelMatrix, diffMatrix)
        val eigenVectors = MatrixHelpers.computeEigenVectors(covarianceMatrix)
        computeEigenFaces(eigenVectors, diffMatrix)
      }
  • 12. Multiplying eigenvectors/differential
      (0 to (rank-1)).foreach { i =>
        var sumSquare = 0.0
        (0 to (pixelCount-1)).foreach { j =>
          (0 to (imageCount-1)).foreach { k =>
            eigenFaces(j)(i) += diffMatrix(j)(k) * eigenVectors.get(i,k)
          }
          sumSquare += eigenFaces(j)(i) * eigenFaces(j)(i)
        }
        val norm = Math.sqrt(sumSquare)
        (0 to (pixelCount-1)).foreach { j =>
          eigenFaces(j)(i) /= norm
        }
      }
  • 13. Preprocessing is key You need to preprocess your images! Grayscale. Important for calculating pixel intensity values. Normalization. Not all lighting conditions are equal. Cropping. Very important to focus only on facial features. Without preprocessing, you’re gonna have a bad time.
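The three preprocessing steps could look roughly like the following, using only `java.awt.image.BufferedImage` from the JDK. This is a sketch under assumptions: the object and method names are hypothetical, the crop box is assumed to come from elsewhere (for example a face detector), and min-max scaling is just one simple way to even out lighting, not necessarily what the project used.

```scala
import java.awt.image.BufferedImage

object PreprocessSketch {
  /** Cropping + grayscale: cut out the face box and redraw it as 8-bit gray.
    * The box (x, y, w, h) is an assumed input, not computed here. */
  def cropToGray(src: BufferedImage, x: Int, y: Int, w: Int, h: Int): BufferedImage = {
    val gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY)
    val g = gray.getGraphics
    g.drawImage(src.getSubimage(x, y, w, h), 0, 0, null)
    g.dispose()
    gray
  }

  /** Read the single gray band into a flat, row-major array of intensities. */
  def grayPixels(gray: BufferedImage): Array[Double] = {
    val w = gray.getWidth
    val h = gray.getHeight
    gray.getRaster.getSamples(0, 0, w, h, 0, new Array[Double](w * h))
  }

  /** Normalization: stretch intensities to [0, 1] so dark and bright photos
    * become comparable. Expects a non-empty array; a flat image maps to zeros. */
  def normalize(pixels: Array[Double]): Array[Double] = {
    val lo = pixels.min
    val hi = pixels.max
    if (hi == lo) pixels.map(_ => 0.0) else pixels.map(p => (p - lo) / (hi - lo))
  }
}
```

Each normalized array can then serve as one image's column of pixel values in the training matrix.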
  • 14. Scala Advantages Interoperability is a win. Compatibility with Java means we can use useful classes like `BufferedImage` while keeping Scala’s simplicity*.
      val meanImage = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY)
      val raster = meanImage.getRaster()
    *Scala is simpler for this particular situation, IMO
  • 15. Uncanny Results Averaging my selections proved interesting. People I disliked smiled less, had rounder faces, while the opposite was true for those who I found attractive.
  • 16. Potential for Eigenfaces? What else can we do with these great faces! •  Subjects that can be read 2-dimensionally, from same angle •  Optical Character Recognition (OCR) •  Image segmentation •  http://www.cs.huji.ac.il/~yweiss/iccv99.pdf It isn’t a Google Deep Dream, but it has potential…
  • 17. Interest. The marriage of StanfordNLP and Scala.
  • 18. Play 2. Rebuilt the entire Tinder interface in Play Scala MVC framework for desktop. Chat Bot. Bot in background is semi-intelligent and looks for uninitiated conversations. Notifications. Desktop browser notifications alert for new chats. A Non-Typical Conversation
  • 19. Before Natural Language Processing could be used to analyze replies of conversations, a structure was needed to map progress of conversations. •  Analyze reply depth •  Provide a path to next reply •  Determine if notification was necessary Scala Tree Structures Trees track progress and replies of conversations.
  • 20. Trees Codified Message trees are simple binary trees.
      case class MessageTree(
        value: String,
        left: Option[MessageTree] = None,
        right: Option[MessageTree] = None
      ) {

        /** Walk one step from this node in the given direction. */
        def walk(direction: Direction): Option[MessageTree] = {
          direction match {
            case Right => this.right
            case Left => this.left
          }
        }
      }
  • 21. Walking the Tree
      FunMessages.messages.find(_.value == theTreeRoot) match {
        case None => createStopGap(m, true)
        case Some(tree) =>
          val sentiments = MessageUtil
            .assignSentimentDirection(MessageUtil.filterSenderMessages(userId, m.messages))
            .map(_._2)
          MessageTree.walkTree(tree, sentiments) match {
            case None => createStopGap(m, true)
            case Some(branch) =>
              new TinderApi(Some(xAuthToken))
                .sendMessage(m._id, branch.value).map { result => …
    Note: pattern matching isn’t the only way to do this.
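The slides call `MessageTree.walkTree` without showing it. One plausible shape is a fold over the per-reply directions, as in the hypothetical sketch below. The `Direction` type, the mapping of `Right` to the right branch, and the `walkTree` body are all assumptions for illustration, not the project's code.

```scala
object TreeWalkSketch {
  sealed trait Direction
  case object Left extends Direction
  case object Right extends Direction

  case class MessageTree(
    value: String,
    left: Option[MessageTree] = None,
    right: Option[MessageTree] = None
  ) {
    /** One step down the tree (assumes Right maps to the right branch). */
    def walk(direction: Direction): Option[MessageTree] = direction match {
      case Left  => left
      case Right => right
    }
  }

  /** Follow one direction per reply received so far. Returns the node whose
    * message should be sent next, or None if the path runs off the tree. */
  def walkTree(root: MessageTree, directions: List[Direction]): Option[MessageTree] =
    directions.foldLeft(Option(root)) { (node, dir) => node.flatMap(_.walk(dir)) }
}
```

A `None` result corresponds to the `createStopGap` case on the slide: the scripted conversation has ended and a human needs to take over.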
  • 22. Ready for StanfordNLP Sentiment analysis was the easy part. •  Library already had trained models for sentiment •  Split each match’s reply into sentences and score sentiment •  Use score to determine reply direction Sentiment of reply determined direction of tree.
      val pipeline = new StanfordCoreNLP(nlpProps)
      val annotation = pipeline.process(message)
      val sentiments: ListBuffer[Double] = ListBuffer()
      // iterating the returned Java list in a for-loop presumably relies on a Java-to-Scala collection conversion being imported
      for (sentence <- annotation.get(classOf[CoreAnnotations.SentencesAnnotation])) {
        val tree = sentence.get(classOf[SentimentCoreAnnotations.AnnotatedTree])
        val sentiment = RNNCoreAnnotations.getPredictedClass(tree)
        val partText = sentence.toString
        sentiments += sentiment.toDouble
      }
      val averageSentiment: Double = {
        if (sentiments.size > 0) sentiments.sum / sentiments.size
        else 2.0 // no sentences: fall back to the neutral class
      }
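How the averaged score becomes a tree direction is not shown on the slides. A minimal sketch follows, assuming the standard five-class sentiment scale (0 = very negative through 4 = very positive, with 2 as neutral) and an assumed convention that neutral-or-better replies go `Right`. The names and the threshold are illustrative only.

```scala
object SentimentDirectionSketch {
  sealed trait Direction
  case object Left extends Direction
  case object Right extends Direction

  /** Neutral class on the 0..4 sentiment scale. */
  val Neutral = 2.0

  /** Average per-sentence classes; an empty reply counts as neutral. */
  def average(scores: Seq[Double]): Double =
    if (scores.nonEmpty) scores.sum / scores.size else Neutral

  /** Assumed convention: neutral or better continues Right, otherwise Left. */
  def direction(avg: Double): Direction =
    if (avg >= Neutral) Right else Left
}
```

Producing one direction per reply gives exactly the list that a tree walk needs as input.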
  • 23. Create Reply Trees Now we have a list of generic replies to open conversations.
      object FunMessages {

        def messages = List(
          MessageTree(
            value = "{name} are you a fan of avocados?",
            right = Some(MessageTree(
              value = "So if I asked you to have a guacamole party with me you'd do it?",
              right = …,
              left = …
            )) …
  • 25. Number of photos. Applicable for both spammers and matching, a profile with one or zero photos was not worth the time. Length of bio. An empty or short bio was a strong indicator of spammer presence. Activity. If they haven’t been active for a while, they probably won’t respond soon anyways ;) General Rules of Selection
  • 26. Integrate with Selection Implementing the rules in code in SwipeTask.scala.
      if(rec.photos.size==2 && rec.bio=="") dislikeUser("sparse photos, no bio")
      else if (rec.photos.size==1) dislikeUser("sparse photos")
      else if (lastSeenAgo > (day*3)) dislikeUser("hasn't been active for %s days".format((lastSeenAgo/day)))
      else if (!photoCriteria(rec.photos)) dislikeUser("failed photo criteria")
      else if (rec.bio.matches("no.{0,15}hookups")) likeUser("claiming friendship only")
      else if (autoLike) likeUser("auto-liked")
      else {
        recommendation.FacialRecommendation.makeComparison(user.user._id, rec._id, rec.photos) match {
          case Some(true) => likeUser("face matched positive recommendation criteria") …
  • 27. Structure. Leveraging the Scala and Akka framework.
  • 28. •  If you need concurrency for basic computational performance, use Futures •  If you’re setting up a router firing to multiple workers, you can use Actors •  If you need something to track state from outside messages, such as counting, use Actors •  And futures are composable! Notes about Actors Use Actors for State, Futures for Concurrency
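To make "futures are composable" concrete, here is a small sketch using only `scala.concurrent` from the standard library. The two scoring functions are hypothetical stand-ins for independent, expensive computations; nothing here comes from the project's code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Hypothetical stand-ins for two independent, expensive computations.
  def scoreFace(photoId: Int): Future[Double] = Future { photoId * 2.0 }
  def scoreBio(bio: String): Future[Double] = Future { bio.length.toDouble }

  /** Start both futures first so they run concurrently, then compose them. */
  def combined(photoId: Int, bio: String): Future[Double] = {
    val face = scoreFace(photoId)
    val text = scoreBio(bio)
    for { f <- face; t <- text } yield f + t
  }

  /** Blocking is only for demonstration; real code would keep composing. */
  def run(): Double = Await.result(combined(5, "hello"), 5.seconds)
}
```

No shared mutable state is involved, which is why a Future fits here; once something has to count or remember across messages, an Actor is the better home for it.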
  • 29. Now that I’ve said that… I cheated.
  • 30. Queue System with Actors Concurrent and queued calculations were a must. 1.  Top-level bot service iterates through data, looks for tasks. 2.  Tasks are spawned in their appropriate actors: MessageReplyTask, SwipeTask, FacialCheckTask. 3.  Tasks are then placed in a timed queue. Found an advantage to following this anti-pattern, because I was able to throttle the amount of computation (and messaging) without overwhelming my local CPU and the Tinder API. In hindsight, it may have been better to make each Actor a worker.
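The throttling idea behind a timed queue can be shown without Akka. The sketch below is a deliberately simplified, single-threaded stand-in (a hypothetical `TimedQueue` class, not the project's actor-based throttler): tasks pile up as they are discovered, and a periodic tick releases only a fixed number of them.

```scala
import scala.collection.mutable

object ThrottleSketch {
  /** Holds pending tasks and releases at most `perTick` of them per tick. */
  class TimedQueue[A](perTick: Int) {
    private val pending = mutable.Queue.empty[A]

    def enqueue(task: A): Unit = { pending += task }

    def length: Int = pending.length

    /** Meant to be called by a timer; returns the tasks allowed to run now,
      * oldest first. */
    def tick(): List[A] =
      List.fill(math.min(perTick, pending.length))(pending.dequeue())
  }
}
```

With `perTick = 1` and a tick every two seconds, this would approximate the "1 message per 2 seconds" rate configured on the bot service slide; the queue length is also the natural signal for warning or sleep thresholds.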
  • 31. The Bot Service Admittedly, a little heavyweight…
      class TinderBot(taskWarningThreshold: Int, taskSleepThreshold: Int) extends Actor {
        // Throttler and supervisor watch all of the work
        val botThrottle = context.actorOf(Props(new BotThrottle(1 msgsPer (2 seconds), Some(self))), "BotThrottle")
        val botSupervisor = context.actorOf(Props(new BotSupervisor(self)), "BotSupervisor")
        def receive = {
          // send commands to the bot
          case BotCommand(command) => …
          // logic for handling queue state
          case QueueState(queueLength) => …
        }
      }
  • 32. Easy Scala Services Scala made it somewhat easy to create micro-services.
      private class UpdatesTask extends Actor {
        def receive = {
          case "tick" =>
            TinderService.activeSessions.foreach { s => syncUpdates(s) }
        }
      }
      private val updateActor = Akka.system.actorOf(Props[UpdatesTask], name = "UpdatesTask")
      private val updateService = {
        Akka.system.scheduler.schedule(0 seconds, 40 seconds, updateActor, "tick")
      }
    •  One key mistake above is that I wasn’t storing state in the UpdatesTask actor; I was storing it elsewhere!
    •  Akka is especially useful for creating timed micro-services like the above
    •  There are other ways to do this, too…
  • 34. Dashboard No automated dating is complete without a dashboard...
  • 36. Selections Created my own flavor of the Tinder swipe screen.
  • 37. Links Github Project •  https://github.com/crockpotveggies/tinderbox Scala •  http://www.scala-lang.org Eigenfaces •  https://en.wikipedia.org/wiki/Eigenface StanfordNLP •  http://nlp.stanford.edu/ Akka •  http://akka.io/