Search Your Tweets
Search Like a Professional
Motivation
• Twitter represents a rich flow of information
• There is no effective way to query Twitter
• It is hard to monitor topics of interest in real time
Search Tweets Like a Professional
A real-time Twitter search engine that allows you to search based on:
• Keywords
◦ Country
◦ Language
◦ Negative words
Demo (http://searchyourtweet.info:5000/input)
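The four search inputs above map naturally onto an Elasticsearch bool query. The following is a minimal sketch, not the deck's actual code: the field names `text`, `country`, and `lang` are assumptions about the tweet mapping, and the `filter` clause assumes an Elasticsearch version that supports it inside `bool`.

```python
from typing import Any, Dict, List, Optional


def build_tweet_query(
    keywords: List[str],
    country: Optional[str] = None,
    language: Optional[str] = None,
    negative_words: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a bool query body from the search form inputs.

    Field names ("text", "country", "lang") are hypothetical.
    """
    bool_query: Dict[str, Any] = {
        # Every keyword must appear in the tweet text.
        "must": [{"match": {"text": kw}} for kw in keywords],
    }

    # Country and language narrow the result set without affecting scoring.
    filters = []
    if country:
        filters.append({"term": {"country": country}})
    if language:
        filters.append({"term": {"lang": language}})
    if filters:
        bool_query["filter"] = filters

    # Negative words exclude any tweet that contains them.
    if negative_words:
        bool_query["must_not"] = [{"match": {"text": w}} for w in negative_words]

    return {"query": {"bool": bool_query}}
```

Keeping country and language in `filter` rather than `must` is a design choice: they act as yes/no constraints, so they do not need to contribute to relevance scoring.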
Keep an Eye on Your Topics of Interest
• Not just searching historical tweets
• Express your interest, and we will keep you updated on the newest events
• More technical detail on this later
• Video (https://youtu.be/GdRmXNfukos)
Data pipeline
[Diagram labels: Query Controller, Backend Database, Percolator, Logic Layer, Frontend, Searching database, Data Backup, Pub/Sub, Publish, Matching query, Register query, searching]
Challenge
Connecting the backend data pipeline:
◦ How to connect Kafka with ElasticSearch?
  ◦ Tried the elasticsearch-river-kafka plugin; not successful
  ◦ Solution: use Logstash
  ◦ Advantages:
    ◦ Easy to use
    ◦ Highly scalable
    ◦ Works with different data sources and destinations
[Figure: an example of Logstash and a queue in a production environment]
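To make the Kafka-to-ElasticSearch hand-off concrete, here is a rough sketch of the kind of work a shipping stage between the two takes over: reading raw JSON messages from a queue and turning them into a request body in the Elasticsearch Bulk API format (one action line followed by one document line). This is an illustration only, not Logstash's implementation; the index name `tweets` and the choice to skip malformed messages are assumptions.

```python
import json
from typing import Iterable


def to_bulk_body(raw_messages: Iterable[str], index: str = "tweets") -> str:
    """Convert raw JSON messages into a newline-delimited bulk request body.

    The index name is hypothetical. Malformed messages are skipped here;
    a real pipeline would more likely log or dead-letter them.
    """
    lines = []
    for raw in raw_messages:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue
        # Action line: index this document into the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

Batching many documents into one request like this is what keeps indexing throughput high when tweets arrive continuously.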
Challenge
Percolator:
◦ Use case: alerting and monitoring documents
◦ Think of it as "search in reverse"
  ◦ Users register queries with the percolator
  ◦ The percolator matches incoming documents against the registered queries
◦ How to design the percolator data pipeline?
◦ How to decouple the backend database from the frontend server?
  ◦ Use the publish/subscribe design pattern
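The "search in reverse" idea can be illustrated with a toy in-memory percolator: queries are stored first, and each incoming document is tested against all of them. This is a conceptual sketch only; the class and helper names are hypothetical, and the real ElasticSearch percolator stores and evaluates full query DSL rather than Python callables.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Query = Callable[[Document], bool]


class ToyPercolator:
    """Stores queries up front and matches documents against them later."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, query_id: str, query: Query) -> None:
        # In normal search, documents are stored; here the queries are.
        self._queries[query_id] = query

    def percolate(self, doc: Document) -> List[str]:
        # Return the ids of every registered query the document satisfies.
        return [qid for qid, query in self._queries.items() if query(doc)]


def keyword_query(keyword: str) -> Query:
    """Hypothetical helper: match documents whose text contains the keyword."""
    needle = keyword.lower()
    return lambda doc: needle in doc.get("text", "").lower()
```

For example, after registering `keyword_query("kafka")` under the id `"q-kafka"`, percolating a document whose text mentions Kafka returns that id, which tells the caller which subscribers should be notified.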
Percolator Pipeline
[Diagram labels: Percolator, Query database, Twitter database, Controller, Pub/Sub, New incoming tweets, publish, subscribe, Open channel]
Data flow of the percolator
• query_controller constructs the percolator query from the user's input and passes it to the ElasticSearch percolator. query_controller also opens a Redis channel for this topic.
• query_controller fetches the latest tweets from ElasticSearch every 5 s (current setting) and sends them to the percolator for matching.
• For each tweet, the percolator reports whether it matches any registered query. query_controller pushes the tweet to the right Redis channel based on this information.
• In the frontend, the Flask server subscribes to the Redis channel and receives the percolator's updates.
• For this demo, to keep the frontend UI simple, all tweets are directed to the default Redis channel.
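The controller loop described in this data flow can be sketched as follows. This is a sketch under stated assumptions rather than the deck's query_controller: `fetch_latest`, `percolate`, and `publish` are hypothetical callables injected by the caller, and the channel names are made up. With the redis-py client, `publish` could be the client's `publish(channel, message)` method.

```python
import json
import time
from typing import Callable, Dict, Iterable, List

Tweet = Dict[str, str]


def route_tweets(
    tweets: Iterable[Tweet],
    percolate: Callable[[Tweet], List[str]],
    publish: Callable[[str, str], object],
    channel_for: Dict[str, str],
    default_channel: str = "default",
) -> int:
    """Publish each tweet to the channel of every query it matches."""
    published = 0
    for tweet in tweets:
        for query_id in percolate(tweet):
            # Fall back to the default channel when no topic channel is known.
            channel = channel_for.get(query_id, default_channel)
            publish(channel, json.dumps(tweet))
            published += 1
    return published


def poll_forever(
    fetch_latest: Callable[[], Iterable[Tweet]],
    percolate: Callable[[Tweet], List[str]],
    publish: Callable[[str, str], object],
    channel_for: Dict[str, str],
    interval_s: float = 5.0,
) -> None:
    """Repeat the fetch-match-publish cycle every interval_s seconds."""
    while True:
        route_tweets(fetch_latest(), percolate, publish, channel_for)
        time.sleep(interval_s)
```

Passing an empty `channel_for` mapping reproduces the demo behavior described above, where every matching tweet goes to the default channel.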
Challenge
• Real-time updates on the frontend:
◦ How to keep posting Redis messages from the Flask server to the client in real time (solved with a very hacky solution)
• Constructing the ElasticSearch query
• Fine-tuning ElasticSearch (not enough time to fine-tune the ElasticSearch mapping)
About Me
M.Math, University of Waterloo
◦ Field: Statistics and Machine Learning
B.S., University of Toronto
◦ Field: Applied Mathematics
Data Scientist Intern, Neon Inc., San Francisco
Back-end Model Developer, MetricAid Inc., Toronto
Strong interest in Deep Learning:
◦ Convolutional Network, Recurrent Network
◦ Applying Deep Learning in NLP
Questions?
Thank you!

Jinchao demo v3
