Project IDI PPT

  1. Project IDI
     David I Widjaja
  2. Steps
     - Data Extraction
     - Tagging
     - Correlation
     - Web Scraping
     - Comparison
     - Documentation
  3. Data Extraction
     - How do we get the data?
       - Input from a database
       - Manual input
     - Data type: topics made up of strings
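
A minimal Python sketch of the two input routes. It assumes the database is SQLite and stores topics in a hypothetical `topics` table with a `subject` column, and it treats manual input as pasted text with one topic per line. The table, column, and function names are illustrations, not part of the original project.

```python
import sqlite3


def load_topics_from_db(conn: sqlite3.Connection) -> list[str]:
    """Read topic sentences from a hypothetical `topics` table (column `subject`)."""
    rows = conn.execute("SELECT subject FROM topics WHERE subject IS NOT NULL")
    # Keep only non-empty strings, trimmed of surrounding whitespace.
    return [row[0].strip() for row in rows if row[0] and row[0].strip()]


def load_topics_manually(text: str) -> list[str]:
    """Treat each non-blank line of pasted text as one topic sentence."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Throwaway in-memory database so the example runs on its own.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE topics (subject TEXT)")
    conn.executemany(
        "INSERT INTO topics (subject) VALUES (?)",
        [("How to learn Python",), ("  Best laptop for students ",), (None,)],
    )
    print(load_topics_from_db(conn))
    print(load_topics_manually("First topic\n\nSecond topic\n"))
```

Both functions return a plain list of strings, which matches the slide's note that the data consists of topics made up of strings.
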
  4. Tagging
     Prerequisites:
     - Topic sentences (subject)
     - Dictionary (tags)
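
Once a dictionary exists, tagging can be sketched as matching the words of each topic sentence against the words clustered under each tag. This is a minimal sketch. It assumes the dictionary is held as a hypothetical `{tag: set of words}` mapping. The slide does not specify a matching rule, so whole-word matching on lower-cased text is an assumption.

```python
import re


def tag_sentence(sentence: str, dictionary: dict[str, set[str]]) -> list[str]:
    """Return, sorted, every tag that has at least one of its words in the sentence."""
    # Lower-case and keep only runs of letters, so punctuation such as "?" does not block a match.
    words = set(re.findall(r"[^\W\d_]+", sentence.lower()))
    return sorted(tag for tag, members in dictionary.items() if words & members)


def tag_all(sentences: list[str], dictionary: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map every topic sentence (subject) to its list of tags."""
    return {s: tag_sentence(s, dictionary) for s in sentences}


if __name__ == "__main__":
    # Hypothetical dictionary: tag -> words clustered under it.
    dictionary = {
        "computer": {"laptop", "notebook", "computer"},
        "programming": {"python", "java", "coding"},
    }
    print(tag_all(["Best laptop for Python?", "Holiday photos"], dictionary))
```

A sentence that matches no dictionary words gets an empty tag list. This makes gaps in the dictionary easy to spot.
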
  5. Dictionary
     How to create tags:
     1. Collect all topic sentences and split them on whitespace.
     2. Convert all words to lower case.
     3. Delete numeric values and duplicates.
     4. Sort the words alphabetically.
     5. Delete unnecessary words (e.g. "is", "the", "and").
     6. Search for synonyms and cluster them into a single tag.
     7. Translate words if necessary.
     8. Insert the tags into the main spreadsheet.
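
The dictionary steps above can be approximated in Python as follows. The sketch covers steps 1 to 6. It leaves out translation (step 7) and writing to the spreadsheet (step 8). The stop-word list and synonym map are hypothetical placeholders, because the slide leaves synonym clustering to manual judgment. Stripping punctuation is an addition that the slide does not mention.

```python
import string

# Hypothetical stop-word list for step 5; extend as needed.
STOP_WORDS = {"a", "an", "and", "for", "how", "in", "is", "of", "on", "the", "to"}

# Hypothetical synonym clusters for step 6: word -> tag representing its cluster.
SYNONYMS = {"notebook": "laptop", "computer": "laptop", "coding": "programming"}


def build_dictionary(topic_sentences, stop_words=STOP_WORDS, synonyms=SYNONYMS):
    """Return {tag: [words clustered under that tag]}, following steps 1-6."""
    # 1. Split every topic sentence on whitespace.
    words = [w for sentence in topic_sentences for w in sentence.split()]
    # 2. Lower-case; also strip surrounding punctuation (an addition, not on the slide).
    words = [w.lower().strip(string.punctuation) for w in words]
    # 3. Drop numeric tokens and empty leftovers; the set removes duplicates.
    unique = {w for w in words if w and not w.isnumeric()}
    # 4. Sort alphabetically, then 5. drop stop words.
    vocabulary = [w for w in sorted(unique) if w not in stop_words]
    # 6. Cluster synonyms: each word goes under its tag (itself if it has no synonym entry).
    tags = {}
    for word in vocabulary:
        tags.setdefault(synonyms.get(word, word), []).append(word)
    return tags


if __name__ == "__main__":
    sentences = ["Best laptop for coding", "Cheap notebook 2013", "Learn coding in Python"]
    for tag, members in build_dictionary(sentences).items():
        print(tag, members)
```

Because the vocabulary is sorted before clustering, the words listed under each tag are in alphabetical order, for example `laptop: ['laptop', 'notebook']`.
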
  6. Correlation
     A weighted graph map is used:
     - The more words associated with a tag, the bigger its bubble.
     - Lines get thicker as the number of relationships between topics grows.
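
A minimal sketch of the weights behind the graph map, assuming tagged topics are stored as a hypothetical `{topic: [tags]}` mapping. Here, bubble size is the number of topics that carry a tag, and line thickness is the number of topics two tags share. This is one reading of the slide, not a documented formula.

```python
from collections import Counter
from itertools import combinations


def weighted_graph(tagged_topics):
    """Return (node_weights, edge_weights) for a {topic: [tags]} mapping.

    node_weights[tag]        -> number of topics carrying the tag (bubble size)
    edge_weights[(tag, tag)] -> number of topics carrying both tags (line thickness)
    """
    node_weights = Counter()
    edge_weights = Counter()
    for tags in tagged_topics.values():
        unique_tags = sorted(set(tags))  # sorted so each pair has one canonical order
        node_weights.update(unique_tags)
        # Every unordered pair of tags sharing this topic strengthens their line by one.
        edge_weights.update(combinations(unique_tags, 2))
    return node_weights, edge_weights


if __name__ == "__main__":
    topics = {
        "t1": ["laptop", "programming"],
        "t2": ["laptop"],
        "t3": ["programming", "laptop", "python"],
    }
    nodes, edges = weighted_graph(topics)
    print(nodes)
    print(edges)
```

The two counters could then be passed to a plotting library. For example, after ordering them to match the graph's nodes and edges, they could serve as the `node_size` and `width` arguments of networkx's `draw` function.
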
  7. Web Scraping
     - Scrape other, similar websites.
     - Collect topic sentences to go in the subject columns, e.g.:
       - Article titles
       - Comments
       - Etc.
     - Copy them into the previous spreadsheet (the one with the previous tags).
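
When pages have been saved locally (the slides list DownThemAll for this), Python's standard `html.parser` can pull the titles out. This sketch assumes that each title sits in an `<h2 class="title">` element. That tag and class are hypothetical; in practice they would come from the pattern found with Inspect Element.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of every <tag class="cls"> element (hypothetical pattern)."""

    def __init__(self, tag="h2", cls="title"):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.inside = False
        self.buffer = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names, or no value at all.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.cls in classes:
            self.inside, self.buffer = True, []

    def handle_endtag(self, tag):
        if tag == self.tag and self.inside:
            # Collapse internal whitespace left over from nested tags and line breaks.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.titles.append(text)
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.buffer.append(data)


def extract_titles(html, tag="h2", cls="title"):
    parser = TitleExtractor(tag, cls)
    parser.feed(html)
    parser.close()
    return parser.titles


if __name__ == "__main__":
    page = '<h2 class="title"><a href="#">How to <b>learn</b> Python</a></h2><h2>Other</h2>'
    print(extract_titles(page))
```

The extracted strings can then be appended to the subject column of the existing spreadsheet, as the slide describes.
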
  8. Correlation
     Apply the same weighted graph map process as before.
  9. Comparison
     Compare the two weighted graph maps.
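
The slide does not say how the two maps are compared. One simple option, sketched here as an assumption, is to convert each map's weights into shares of its total and then subtract. Using shares makes a small dataset comparable with a large one. The functions accept any mapping from a tag (or tag pair) to a weight, so they work for bubble sizes and line weights alike.

```python
def compare_graphs(first, second):
    """Return {key: share_in_second - share_in_first} for two weight mappings."""
    total_a = sum(first.values()) or 1  # avoid division by zero on an empty map
    total_b = sum(second.values()) or 1
    keys = set(first) | set(second)  # keys missing from one map count as weight 0
    return {k: second.get(k, 0) / total_b - first.get(k, 0) / total_a for k in keys}


def biggest_changes(first, second, top=5):
    """List the `top` keys whose share changed most, largest absolute change first."""
    diff = compare_graphs(first, second)
    # Sort by absolute change, breaking ties by key text so the order is deterministic.
    return sorted(diff.items(), key=lambda kv: (-abs(kv[1]), str(kv[0])))[:top]


if __name__ == "__main__":
    original = {"laptop": 3, "programming": 1}
    scraped = {"laptop": 2, "python": 2}
    print(biggest_changes(original, scraped))
```

A positive value means the tag (or connection) is relatively more prominent in the second map. A negative value means it is more prominent in the first.
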
  10. Word Cloud
      Generate a word cloud using Python or an online tool.
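
A word cloud sizes each word according to how often it occurs. This standard-library sketch computes those frequencies and maps them linearly onto font sizes. The stop-word list and the 10 to 60 point range are hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical stop-word list; words in it are left out of the cloud.
STOP_WORDS = {"a", "an", "and", "for", "how", "in", "is", "of", "on", "the", "to"}


def word_frequencies(sentences, stop_words=STOP_WORDS):
    """Count lower-cased letter-only words across all sentences, skipping stop words."""
    words = (w for s in sentences for w in re.findall(r"[^\W\d_]+", s.lower()))
    return Counter(w for w in words if w not in stop_words)


def font_sizes(frequencies, min_size=10, max_size=60):
    """Scale each word's font size linearly between min_size and max_size by frequency."""
    if not frequencies:
        return {}
    low, high = min(frequencies.values()), max(frequencies.values())
    span = (high - low) or 1  # all-equal counts: every word gets min_size
    return {
        w: round(min_size + (f - low) * (max_size - min_size) / span)
        for w, f in frequencies.items()
    }


if __name__ == "__main__":
    freqs = word_frequencies(["Learn Python", "Python laptop", "Python coding"])
    print(font_sizes(freqs))
```

To render an actual image, the resulting `Counter` can be passed to the third-party `wordcloud` package, e.g. `WordCloud().generate_from_frequencies(freqs).to_file("cloud.png")`.
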
  11. Tools
      - Microsoft Excel 2013 (spreadsheet)
      - Mozilla Firefox (browser)
      - Inspect Element (searching for patterns)
      - DownThemAll (downloading HTML pages)
      - Total Commander (merging HTML files)
      - Notepad++ (cleansing data)
