2. History
~2005? Created "#trivia" for my wife
Uses Blitzed Trivia bot, "brainiac", and a 110k
question/answer DB
Winter 2012: got interested in NLP for a potential
project
Decided to tackle a "toy problem"
"Let's play trivia!"
4. Natural Language Processing (NLP)
Algorithms that can parse and process human
language
Major field of study related to AI, useful in
● Machine translation
● Grammar induction
● Information extraction
● Sentence understanding
5. Challenges & Advantages
Unlike Jeopardy
● Can answer question wrong and not get
penalized, try multiple times
● No puns or wordplay, straightforward
questions
Still ...
● Have to have a knowledge base - Google
● Have to be able to figure out the right
answer
7. Base Assumptions
"Google knows all" - no need to make a local
knowledge database
The right answer will be commonly seen,
exploit that repetition
8. Watson 1.0
~100 LoC, "an evening of futzing around"
"Strategy"
1. Read the question
2. Throw it at Google, get a result page
3. Find all the proper names (via NLTK) from
page titles, rank by frequency
4. Guess those sequentially
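The four steps above can be sketched roughly as follows. This is a minimal illustration, not the original ~100 lines: the search step is left out, the `titles` list is made-up sample data, and a simple regex for runs of capitalised words stands in for the NLTK-based proper-name detection (`NAME_RE` and `rank_candidates` are hypothetical names).

```python
import re
from collections import Counter

# Assumption: a crude stand-in for NLTK proper-name detection.
# It matches runs of capitalised words such as "Neil Armstrong".
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def rank_candidates(titles):
    """Return candidate names found in page titles, most frequent first."""
    counts = Counter()
    for title in titles:
        counts.update(m.group(0) for m in NAME_RE.finditer(title))
    return [name for name, _ in counts.most_common()]


# Hypothetical result-page titles for "Who was the first man on the Moon?"
titles = [
    "Neil Armstrong - Wikipedia",
    "Neil Armstrong: first man on the Moon",
    "Apollo 11 and Neil Armstrong",
]

# The bot would then guess these one after another.
guesses = rank_candidates(titles)
```

Since wrong guesses are not penalised, a noisy candidate list is tolerable as long as the right answer shows up near the top.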
10. Watson 2.0
Written a few days later
~300 LoC, "actually had to think this time"
Strategy
● Check a DB of cached questions and
answers (from observations), use similar
ones if possible
● Read question, throw at Google (or Bing)
● Figure out what kind of answer is expected,
extract matching text via NLTK and scoring
● If we get a hint, use it (as a regex)
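Two of these steps lend themselves to a short sketch: reusing a cached answer when a similar question has been seen before, and turning a hint into a regex. Everything here is an assumption for illustration: the cache is a plain dict, similarity is measured with the standard library's `difflib`, and hints are assumed to mask unrevealed letters with `_` (the real hint format is not described on the slide).

```python
import difflib
import re


def lookup_cached(question, cache, cutoff=0.8):
    """Return the answer of the most similar cached question, or None."""
    matches = difflib.get_close_matches(question, list(cache), n=1, cutoff=cutoff)
    return cache[matches[0]] if matches else None


def hint_to_regex(hint, mask="_"):
    """Turn a masked hint such as 'P_ri_' into a case-insensitive regex.

    Assumption: each mask character hides exactly one character.
    """
    pattern = "".join("." if ch == mask else re.escape(ch) for ch in hint)
    return re.compile(pattern, re.IGNORECASE)


def filter_by_hint(candidates, hint):
    """Keep only the candidates that fit the hint completely."""
    rx = hint_to_regex(hint)
    return [c for c in candidates if rx.fullmatch(c)]


# Hypothetical cache built from earlier rounds.
cache = {"What is the capital of France?": "Paris"}
cached = lookup_cached("What is the capital of France", cache)
fitting = filter_by_hint(["Paris", "Parks", "London"], "P_ri_")
```

Checking the cache first avoids a search-engine round trip, and the hint filter cheaply discards candidates of the wrong length or shape before guessing.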
11. Extracting Answers from Web Pages
Challenge
Web pages contain a lot of junk around the
answer
How do we find the answer in the sea of
words?
Simple strategy - extract proper names!
(The trivia DB often has proper names for
answers)
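Before any names can be pulled out, the markup and script junk has to go. A minimal sketch using only Python's standard `html.parser` is shown below; whether the bot cleaned pages this way is not stated, and `VisibleText` and `visible_text` are hypothetical names.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace-only runs and anything inside skipped elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the readable text of an HTML page as one string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up page used only for illustration.
page = (
    "<html><head><style>p{color:red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><p>Paris is the capital of <b>France</b>.</p></body></html>"
)
text = visible_text(page)
```

The cleaned text can then be handed to whatever proper-name extractor is in use.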
12. Where is Watson these days?
00:08 < brainiac> Congratulations to rogueclown who has won this round! What a brain!
00:08 < brainiac> Final scores:
00:08 < brainiac> rogueclown: 10
00:08 < brainiac> watson: 9
00:08 < brainiac> purge: 2
irc://coffee.ofdoom.org:6667/#trivia
13. Additional Ideas for Watson 2.0
New search engines
Bing, Ask, Wolfram Alpha
Prune knowledge base
Weed out useless “answers”
New/different named entity recognition engine
Experiment with scoring algorithms for guesses
14. Disappointments
Only a minor increase in my knowledge of NLP
I did not become an NLP maestro
No one else built a bot
Was hoping for a competition
15. Watson 3 .. sorta in the works
Ideas
Natural language interface to semantic web (e.g.
QuestIO, Quepy), SPARQL endpoints
Wolfram Alpha-like UI, research prototypes
available
Teach the bot what kind of answer to look for
Quantity, dates, names, etc
Probabilistic programming? Marry answers with
confidence
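Two of these ideas can be sketched in a few lines. The answer categories (quantity, dates, names) come from the slide; the cue patterns are illustrative guesses, and the "confidence" here is plain relative frequency, which is far short of real probabilistic programming. `ANSWER_TYPE_CUES`, `expected_answer_type` and `with_confidence` are hypothetical names.

```python
import re
from collections import Counter

# Assumption: hand-written question cues, one pattern per answer category.
ANSWER_TYPE_CUES = [
    ("quantity", re.compile(r"^\s*how (many|much|far|long|tall|old)\b", re.I)),
    ("date", re.compile(r"^\s*(when|what year|in what year|in which year)\b", re.I)),
    ("name", re.compile(r"^\s*(who|whom|whose)\b", re.I)),
]


def expected_answer_type(question):
    """Guess the kind of answer a question asks for, or 'unknown'."""
    for label, rx in ANSWER_TYPE_CUES:
        if rx.search(question):
            return label
    return "unknown"


def with_confidence(candidates):
    """Map each distinct candidate to its share of all observations."""
    counts = Counter(candidates)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.most_common()}


kind = expected_answer_type("How many moons does Mars have?")
scored = with_confidence(["Paris", "Paris", "Lyon", "Paris"])
```

A bot could use the predicted type to pick an extractor (numbers, dates, names) and the score to decide whether a guess is worth making at all.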