3. Agenda
• An Optimization Problem
• Genetic Algorithm Overview
• Modeling Solr Parameters
• Fitness Function
4. sir can you help me… ????
"iam from indonesia want to build
search engine like a Google and i
want to build the system using
Genetic Algorithm but iam confused
what will i do first.
Thanks before."
5. Search Algorithm Parameters
/select?q=foo&defType=dismax
&qf=name^20+desc^10
&pf=name^10&ps=3&mm=2
&bf="ord(popularity)^0.05"
and many more
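A minimal sketch of how the tunable numbers above could be fed into a query string from code. The helper name `build_select_url` and its arguments are assumptions made for illustration; only the parameter names (q, defType, qf, pf, ps, mm, bf) come from the slide.

```python
from urllib.parse import urlencode


def build_select_url(query, qf_boosts, pf_boosts, ps, mm, bf):
    """Build a dismax /select URL from tunable numbers (hypothetical helper)."""
    params = {
        "q": query,
        "defType": "dismax",
        # Boosts are written as field^boost, separated by spaces.
        "qf": " ".join(f"{field}^{boost}" for field, boost in qf_boosts.items()),
        "pf": " ".join(f"{field}^{boost}" for field, boost in pf_boosts.items()),
        "ps": ps,
        "mm": mm,
        "bf": bf,
    }
    # urlencode percent-encodes "^" and parentheses and turns spaces into "+".
    return "/select?" + urlencode(params)


url = build_select_url(
    "foo",
    qf_boosts={"name": 20, "desc": 10},
    pf_boosts={"name": 10},
    ps=3,
    mm=2,
    bf="ord(popularity)^0.05",
)
print(url)
```

Every numeric argument here is a knob that an optimizer could vary.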
6. Where did those numbers come from?
I made them up…
shhhhhhh.
Then we tweaked them after testing.
7. An Optimization Problem
So, how do we know we have the best set of
numbers? Or even a good set?
We have an optimization problem.
9. Sample Data Set
[{
"name":"Red Lobster",
"description":"We deliver the freshest caught seafood every day."
},{
"name":"Joe's Crab Shack",
"description":"We serve delicious red crabs, rock crabs, large lobsters, and other delicious seafood. Our lobsters are our specialty."
}]
http://localhost:8983/solr/restaurantsCollection/select?q=red+lobster&defType=dismax&qf=name+description&indent=true&fl=name+description
10. Genetic Algorithms
• A tool for solving optimization problems
• Based on ideas from genetics, evolution,
and natural selection
• DEAP – Distributed Evolutionary
Algorithms in Python
11. Genetic Algorithms
• Define candidate solution encoding
• Define a fitness function
• Generate random solutions
• Select candidates for reproduction
• Use crossover and mutation to create a new
generation
• Repeat until a stopping criterion is met
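The steps above can be sketched in plain Python. This is not DEAP and not the talk's actual code: tournament selection, one-point crossover, Gaussian mutation, keeping the best candidate each generation, and the made-up `toy_fitness` target are all assumptions chosen to keep the example self-contained.

```python
import random

TARGET = [20.0, 10.0, 10.0, 3.0]  # made-up "ideal" parameter values


def toy_fitness(candidate):
    """Stand-in fitness: higher (closer to zero) means closer to TARGET."""
    return -sum((g - t) ** 2 for g, t in zip(candidate, TARGET))


def evolve(fitness, n_genes, low, high, pop_size=30, generations=40,
           mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Generate random candidate solutions (each is a list of floats).
    population = [[rng.uniform(low, high) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):  # stopping criterion: fixed generation count
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0][:]]  # carry the best candidate over unchanged
        while len(next_gen) < pop_size:
            # Select parents: best of three randomly drawn candidates.
            mother = max(rng.sample(population, 3), key=fitness)
            father = max(rng.sample(population, 3), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, n_genes - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: nudge some genes, clamped to the allowed range.
            child = [min(high, max(low, g + rng.gauss(0, 1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


best = evolve(toy_fitness, n_genes=4, low=0.0, high=30.0)
print(best, toy_fitness(best))
```

Because the best candidate is always carried over, the best fitness never gets worse from one generation to the next.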
19. Fitness Function
• Measure how well a candidate solution
solves the problem
• Should be very fast
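A minimal sketch of what such a function might look like, assuming a set of queries with known relevant documents is available. The `search` callable, the `DOCS` table, and `fake_search` are hypothetical stand-ins for a real search call, and precision at k is just one possible score. Results are cached per candidate because the algorithm tends to evaluate the same candidate repeatedly.

```python
def make_fitness(search, judgments, k=10):
    """Return a fitness function that averages precision at k over queries.

    search(params, query) -> ranked list of doc ids (hypothetical callable)
    judgments: {query: set of relevant doc ids}
    """
    cache = {}

    def fitness(candidate):
        key = tuple(candidate)
        if key not in cache:  # skip the expensive searches on repeat candidates
            total = 0.0
            for query, relevant in judgments.items():
                top = search(key, query)[:k]
                total += sum(1 for doc in top if doc in relevant) / k
            cache[key] = total / len(judgments)
        return cache[key]

    return fitness


# Made-up per-document match strengths: (name match, description match).
DOCS = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (0.5, 0.5)}


def fake_search(params, query):
    """Stand-in for a real search: rank docs by boosted match strength."""
    name_boost, desc_boost = params
    return sorted(DOCS, key=lambda d: -(name_boost * DOCS[d][0]
                                        + desc_boost * DOCS[d][1]))


fitness = make_fitness(fake_search, {"red lobster": {"a"}}, k=1)
print(fitness([20, 10]), fitness([1, 20]))
```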
20. Normalized Discounted Cumulative Gain
• Very relevant > relevant > not relevant
• Relevant results are more useful if they
appear earlier
• Scores should be comparable regardless of the query
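A short sketch of the calculation, assuming graded judgments of 2 (very relevant), 1 (relevant), and 0 (not relevant); the grade values are an assumption. As a simplification, the ideal ordering here is computed from the returned list only, not from every judged document.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: later ranks count for less."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the best possible ordering (0.0 to 1.0)."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


perfect = ndcg([2, 1, 0])   # best results first
reversed_ = ndcg([0, 1, 2])  # best results last
print(perfect, reversed_)
```

Dividing by the ideal DCG is what keeps the score between 0 and 1 no matter how many relevant documents a query has.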
22. Precision and Recall
Precision – Likelihood that a returned result was
relevant
Recall – Likelihood that a relevant result was
returned
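Both measures can be computed from a list of returned document ids and a set of known relevant ids. A minimal sketch; the function name and the example ids are made up.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 results returned, 2 of them relevant; 3 relevant documents exist.
precision, recall = precision_recall(["a", "b", "c", "d"], {"a", "b", "e"})
print(precision, recall)
```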
26. Resources
• DEAP - https://code.google.com/p/deap/
• My github repo for this example -
https://github.com/jstrassburg/evolving-search-relevancy
• @jstrassburg