2. Motivation
• Twitter represents a rich flow of information
• Lack of an effective way to query Twitter
• Hard to monitor topics of interest in real time
3. Search Tweets Like a Professional
A real-time Twitter search engine that lets you search based on:
• Keywords
◦ Country
◦ Language
◦ Negative words
Demo (http://searchyourtweet.info:5000/input)
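The search options above map naturally onto an Elasticsearch `bool` query: keywords as a scored `match`, country and language as non-scoring filters, and negative words as `must_not` clauses. Below is a minimal sketch, not the demo's actual code. It assumes tweets are indexed with field names taken from Twitter's tweet JSON (`text`, `lang`, `place.country_code`) and that `lang` and `place.country_code` are mapped as `keyword` fields so `term` filters match exactly; the helper name `build_search_query` is hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_query(
    keywords: str,
    country: Optional[str] = None,
    language: Optional[str] = None,
    negative_words: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an Elasticsearch bool query for the search form (hypothetical helper)."""
    # Keywords are a scored full-text match against the tweet text.
    bool_query: Dict[str, Any] = {"must": [{"match": {"text": keywords}}]}

    # Country and language are exact-value filters (assumes keyword-mapped fields);
    # filter clauses restrict results without affecting relevance scores.
    filters = []
    if country:
        filters.append({"term": {"place.country_code": country}})
    if language:
        filters.append({"term": {"lang": language}})
    if filters:
        bool_query["filter"] = filters

    # Negative words exclude any tweet whose text matches them.
    if negative_words:
        bool_query["must_not"] = [{"match": {"text": w}} for w in negative_words]

    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    body = build_search_query("world cup", country="CA", language="en",
                              negative_words=["spam"])
    print(json.dumps(body, indent=2))
```

The returned dict is the JSON body of a `_search` request against the tweet index; optional filters are only added when the user fills them in.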
4. Keep an eye on your topic of interest
• Not just searching historical tweets
• Express your interest, and we will keep you updated on the newest events
• More technical details on this later
• Video (https://youtu.be/GdRmXNfukos)
6. Challenge
Connecting the backend data pipeline:
◦ How to connect Kafka with ElasticSearch?
◦ Tried the elasticsearch-river-kafka plugin: not successful
◦ Solution: use Logstash!
◦ Advantages:
◦ Easy to use
◦ Highly scalable
◦ Works with different data sources and destinations
An example of Logstash and a queue in a production environment
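Logstash's Elasticsearch output indexes events through Elasticsearch's `_bulk` API. To make the Kafka to ElasticSearch hop concrete, here is a stdlib-only sketch of what that step amounts to: it takes raw JSON message values (a plain list standing in for a Kafka topic) and builds a newline-delimited `_bulk` request body. In the actual pipeline Logstash does this work from configuration; the function name `build_bulk_body`, the index name `tweets`, and the use of the tweet's `id_str` as the document ID are assumptions for illustration.

```python
import json
from typing import Iterable, List


def build_bulk_body(messages: Iterable[bytes], index: str = "tweets") -> str:
    """Turn raw Kafka message values into an Elasticsearch _bulk request body."""
    lines: List[str] = []
    for raw in messages:
        try:
            tweet = json.loads(raw)
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip malformed messages instead of failing the whole batch
        if not isinstance(tweet, dict):
            continue  # only JSON objects can be indexed as documents
        action = {"index": {"_index": index}}
        if "id_str" in tweet:
            # Reusing the tweet ID makes re-delivered messages overwrite, not duplicate.
            action["index"]["_id"] = tweet["id_str"]
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(tweet))   # document source line
    # The _bulk API expects newline-delimited JSON that ends with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Batching many tweets into one request, rather than indexing them one by one, is a large part of why this hop scales.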
7. Challenge
Percolator:
◦ Use case: alerting and monitoring documents
◦ Think of it as "search in reverse"
◦ Users register queries in the percolator
◦ The percolator matches incoming documents against the registered queries
◦ How to design the percolator data pipeline?
◦ How to decouple the backend database from the frontend server?
◦ Use the publish/subscribe design pattern
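To illustrate "search in reverse", here is a toy in-memory percolator: instead of running one query over many stored documents, it stores many queries and runs each incoming document against all of them. It is a sketch of the idea only. `SimplePercolator` is a hypothetical class, queries are plain keyword sets that match when every term appears in the tweet, and the regex tokenizer is a crude stand-in for an Elasticsearch analyzer.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word tokens; a crude stand-in for an Elasticsearch analyzer."""
    return set(re.findall(r"\w+", text.lower()))


class SimplePercolator:
    """Toy 'search in reverse': store queries, then match documents against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, Set[str]] = {}

    def register(self, query_id: str, keywords: str) -> None:
        # A registered query matches when all of its terms occur in the document.
        self._queries[query_id] = tokenize(keywords)

    def percolate(self, document: str) -> List[str]:
        # Return the IDs of every registered query that this document satisfies.
        terms = tokenize(document)
        return sorted(qid for qid, q in self._queries.items() if q and q <= terms)


if __name__ == "__main__":
    p = SimplePercolator()
    p.register("nba", "NBA finals")
    p.register("rain", "rain toronto")
    print(p.percolate("Heavy rain in Toronto, NBA finals delayed"))
```

For comparison, in ElasticSearch 5.x and later the registered queries live in an index with a field of type `percolator`, and matching a tweet is an ordinary search with a `percolate` query such as `{"query": {"percolate": {"field": "query", "document": {"text": "..."}}}}` (assuming the percolator field is named `query`). Older versions used a dedicated `_percolate` endpoint instead.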
9. • query_controller will construct the percolator query based on the topic and pass it to the ElasticSearch percolator. query_controller will also open a Redis channel for this topic.
• query_controller will keep fetching the latest tweets from ElasticSearch every 5 s (current setting) and sending them to the percolator for matching.
• For each tweet, the percolator tells us whether it matches any registered query. Based on this information, query_controller pushes the tweet to the right Redis channel.
• On the frontend, the Flask server subscribes to the Redis channel and receives the percolator's updates.
• For this demo, to keep the frontend UI simple, all tweets are directed to the default Redis channel.
Data flow of the percolator
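The controller loop described above can be sketched in a single process. This is an illustration under stated assumptions, not the project's query_controller: `InMemoryBroker` is a hypothetical stand-in for Redis pub/sub (with redis-py the analogous calls are `publish(channel, message)` on the client and `subscribe(...)` on a `pubsub()` object), `fetch_latest` and `percolate` are hypothetical callables standing in for the ElasticSearch search and percolate requests, and the channel name `default` is assumed. The demo setting is read here as "every matched tweet goes to the default channel".

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

DEFAULT_CHANNEL = "default"  # hypothetical name of the demo's default channel


class InMemoryBroker:
    """Stand-in for Redis pub/sub: each channel keeps a list of subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # Like Redis PUBLISH, return the number of subscribers that got the message.
        subscribers = self._subscribers.get(channel, [])
        for callback in subscribers:
            callback(message)
        return len(subscribers)


def run_cycle(
    fetch_latest: Callable[[], Iterable[str]],
    percolate: Callable[[str], List[str]],
    broker: InMemoryBroker,
    route_all_to_default: bool = True,
) -> None:
    """One polling cycle: fetch new tweets, percolate each, publish the matches."""
    for tweet in fetch_latest():
        matched_topics = percolate(tweet)
        if not matched_topics:
            continue  # no registered query matched; no subscriber needs this tweet
        # Demo mode uses one shared channel; otherwise one channel per matched topic.
        channels = {DEFAULT_CHANNEL} if route_all_to_default else set(matched_topics)
        for channel in sorted(channels):
            broker.publish(channel, tweet)


def run_forever(fetch_latest, percolate, broker, interval_s: float = 5.0) -> None:
    """Poll every `interval_s` seconds (5 s mirrors the current setting above)."""
    while True:
        run_cycle(fetch_latest, percolate, broker)
        time.sleep(interval_s)
```

Because the controller only publishes and the Flask server only subscribes, neither side needs to know about the other; this is the decoupling the publish/subscribe pattern on slide 7 is meant to provide.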
10. Challenge
• Real-time updates on the frontend:
◦ How to keep pushing Redis messages from the Flask server to the client in real time (solved with a very hacky solution)
• Constructing the ElasticSearch query
• Fine-tuning ElasticSearch (not enough time to fine-tune the ElasticSearch mapping)
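The slide does not spell out the hacky solution. One common way to push Redis messages from Flask to a browser in real time is Server-Sent Events (SSE): the server keeps an HTTP response open with content type `text/event-stream` and writes one `data:` frame per message. The sketch below shows only the frame-formatting generator and is an alternative approach, not the author's. It consumes message dicts shaped like those yielded by redis-py's `PubSub.listen()` (keys `type` and `data`); here they can be supplied as a plain list so it runs without Redis.

```python
from typing import Any, Dict, Iterable, Iterator


def sse_stream(messages: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Convert pub/sub message dicts into Server-Sent Events frames."""
    for msg in messages:
        # Skip subscribe/unsubscribe confirmations; only forward real messages.
        if msg.get("type") != "message":
            continue
        data = msg["data"]
        if isinstance(data, bytes):
            data = data.decode("utf-8")
        # Each line of the payload needs its own "data:" prefix,
        # and a blank line terminates the event.
        frame = "".join(f"data: {line}\n" for line in str(data).split("\n"))
        yield frame + "\n"
```

In a Flask view the generator could be returned as `Response(sse_stream(pubsub.listen()), mimetype="text/event-stream")`, and the browser would consume it with `new EventSource(url)`, so no client-side polling is needed.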
11. About Me
M.Math, University of Waterloo
◦ Field: Statistics and Machine Learning
B.S., University of Toronto
◦ Field: Applied Mathematics
Data Scientist Intern, Neon Inc., San Francisco
Back-end Model Developer, MetricAid Inc., Toronto
Strong interest in Deep Learning:
◦ Convolutional Network, Recurrent Network
◦ Applying Deep Learning in NLP