SlideShare una empresa de Scribd logo
1 de 2
Descargar para leer sin conexión
Helen Bailey and Sands Fish, MIT Libraries 1
Text Analysis Methods for Digital Humanities
Workshop Exercise
MALLET
Pre-workshop, students should download and install the MALLET GUI on their laptops.
https://code.google.com/p/topic-modeling-tool/
They should also run the topic modeler on a sample text file with the default settings to make
sure it’s working correctly.
Helpful MALLET Resources
• MALLET GUI information
• Blog post on using the GUI and displaying output in Gephi
• Using MALLET on the command line
• Intro to Topic Modeling in general
• MALLET website
• Review of MALLET in Journal of Digital Humanities
• Using HO-LDA / Finding Number of Topics in Emergency Text classification
In-Class Exercise
1. Run MALLET on a known corpus (full-text examples used in the demo, all from the
Gutenberg Project: Adventures of Huckleberry Finn, Alice’s Adventures in Wonderland,
Andersen’s Fairy Tales, Grimm’s Fairy Tales, Life on the Mississippi, On the Origin of
Species, The Wizard of Oz).
2. Change the parameters to see how they impact the results. For example:
• Does preserving case matter?
• How does changing the number of iterations impact the results?
• What about changing the topic proportion threshold?
• How many topic words should you print? (What are you trying to discover? How
much info is useful?)
• What do the results tell you about this corpus? How could you use this to learn
about a corpus you weren’t familiar with?
3. MALLET implements the LDA algorithm, discuss its details a little. Hierarchicial Topic
Modeling as a juxtaposition (not available via MALLET)
Helen Bailey and Sands Fish, MIT Libraries 2
Stanford Named Entity Recognizer
Pre-workshop, students should download and install the SNER GUI on their laptops.
http://nlp.stanford.edu/software/CRF-NER.shtml#Download
They should also run SNER on a sample file using the default classifier (or, if that’s not
available, the first classifier in the classifier folder), to make sure it’s working correctly.
Helpful SNER Resources
• Basic GUI tutorial
In-Class Exercise
• Run SNER on a known corpus. Change the classifier to see if results differ.
• Save tagged file output and open. What do you then need to do with that to make it
useful?
• Difference between entity extraction and entity disambiguation.
• Do we want to have them run the output through a concordance program?
• What might you do with this data? How could it interact with other tools to tell the
narrative?
CLAVIN
• CLAVIN Tool by Berico Technologies
o Cartographic Location And Vicinity INdexer
• MIT Center for Civic Media open source CLAVIN Server for doing geo-parsing via HTTP
o Includes special "civic sauce" for determining the "aboutness" of a document,
narrowing down to the most likely place a document is talking about.
o According to Civic, this is the best quality geo-parsing service outside of Yahoo's
pay service.
• Uses ApacheNLP for location entity extraction under the hood.
Setup
1. Download source from https://github.com/sandsfish/CLAVIN-Server
2. Follow the instructions in the readme to build and setup the tool.
Evaluating Assumptions
● We’re providing sample text to work with. What do you already know about it? What do
you know from the data itself, and what information are you lacking?
● What characteristics of the sample data are likely contributing to the results you get from
these tools? (Lack of pre-processing, for example)
● Note how long it takes for these tools o run. Consider the size of the data set we’re
working with versus the size of possible data sets you may be interested in.

Más contenido relacionado

Similar a MALLET Topic Modeling & NER Tools

Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator ProgramGoDataDriven
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyYannick Pouliot
 
Data science and Hadoop
Data science and HadoopData science and Hadoop
Data science and HadoopDonald Miner
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Maintaining Large Scale Julia Ecosystems
Maintaining Large Scale Julia EcosystemsMaintaining Large Scale Julia Ecosystems
Maintaining Large Scale Julia EcosystemsChris Rackauckas
 
BESDUI: Benchmark for End-User Structured Data User Interfaces
BESDUI: Benchmark for End-User Structured Data User InterfacesBESDUI: Benchmark for End-User Structured Data User Interfaces
BESDUI: Benchmark for End-User Structured Data User InterfacesRoberto García
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science researchAnubhav Jain
 
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.pptsagarjsicg
 
Fake news detection
Fake news detection Fake news detection
Fake news detection shalushamil
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflowCharmi Chokshi
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignAnubhav Jain
 
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Maurice Nsabimana
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibilityc.titus.brown
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 
You and your code.pdf
You and your code.pdfYou and your code.pdf
You and your code.pdfTony Khánh
 

Similar a MALLET Topic Modeling & NER Tools (20)

Machine Learning & Apache Mahout
Machine Learning & Apache MahoutMachine Learning & Apache Mahout
Machine Learning & Apache Mahout
 
Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator Program
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
 
kaggle_meet_up
kaggle_meet_upkaggle_meet_up
kaggle_meet_up
 
Data science and Hadoop
Data science and HadoopData science and Hadoop
Data science and Hadoop
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Maintaining Large Scale Julia Ecosystems
Maintaining Large Scale Julia EcosystemsMaintaining Large Scale Julia Ecosystems
Maintaining Large Scale Julia Ecosystems
 
BESDUI: Benchmark for End-User Structured Data User Interfaces
BESDUI: Benchmark for End-User Structured Data User InterfacesBESDUI: Benchmark for End-User Structured Data User Interfaces
BESDUI: Benchmark for End-User Structured Data User Interfaces
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
 
Fake news detection
Fake news detection Fake news detection
Fake news detection
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflow
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
 
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
Deep learning for NLP
Deep learning for NLPDeep learning for NLP
Deep learning for NLP
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
 
You and your code.pdf
You and your code.pdfYou and your code.pdf
You and your code.pdf
 

Último

Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 

Último (20)

Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 

MALLET Topic Modeling & NER Tools

  • 1. Helen Bailey and Sands Fish, MIT Libraries 1 Text Analysis Methods for Digital Humanities Workshop Exercise MALLET Pre-workshop, students should download and install the MALLET GUI on their laptops. https://code.google.com/p/topic-modeling-tool/ They should also run the topic modeler on a sample text file with the default settings to make sure it’s working correctly. Helpful MALLET Resources • MALLET GUI information • Blog post on using the GUI and displaying output in Gephi • Using MALLET on the command line • Intro to Topic Modeling in general • MALLET website • Review of MALLET in Journal of Digital Humanities • Using HO-LDA / Finding Number of Topics in Emergency Text classification In-Class Exercise 1. Run MALLET on a known corpus (full-text examples used in the demo, all from the Gutenberg Project: Adventures of Huckleberry Finn, Alice’s Adventures in Wonderland, Andersen’s Fairy Tales, Grimm’s Fairy Tales, Life on the Mississippi, On the Origin of Species, The Wizard of Oz). 2. Change the parameters to see how they impact the results. For example: • Does preserving case matter? • How does changing the number of iterations impact the results? • What about changing the topic proportion threshold? • How many topic words should you print? (What are you trying to discover? How much info is useful?) • What do the results tell you about this corpus? How could you use this to learn about a corpus you weren’t familiar with? 3. MALLET implements the LDA algorithm, discuss its details a little. Hierarchicial Topic Modeling as a juxtaposition (not available via MALLET)
  • 2. Helen Bailey and Sands Fish, MIT Libraries 2 Stanford Named Entity Recognizer Pre-workshop, students should download and install the SNER GUI on their laptops. http://nlp.stanford.edu/software/CRF-NER.shtml#Download They should also run SNER on a sample file using the default classifier (or, if that’s not available, the first classifier in the classifier folder), to make sure it’s working correctly. Helpful SNER Resources • Basic GUI tutorial In-Class Exercise • Run SNER on a known corpus. Change the classifier to see if results differ. • Save tagged file output and open. What do you then need to do with that to make it useful? • Difference between entity extraction and entity disambiguation. • Do we want to have them run the output through a concordance program? • What might you do with this data? How could it interact with other tools to tell the narrative? CLAVIN • CLAVIN Tool by Berico Technologies o Cartographic Location And Vicinity INdexer • MIT Center for Civic Media open source CLAVIN Server for doing geo-parsing via HTTP o Includes special "civic sauce" for determining the "aboutness" of a document, narrowing down to the most likely place a document is talking about. o According to Civic, this is the best quality geo-parsing service outside of Yahoo's pay service. • Uses ApacheNLP for location entity extraction under the hood. Setup 1. Download source from https://github.com/sandsfish/CLAVIN-Server 2. Follow the instructions in the readme to build and setup the tool. Evaluating Assumptions ● We’re providing sample text to work with. What do you already know about it? What do you know from the data itself, and what information are you lacking? ● What characteristics of the sample data are likely contributing to the results you get from these tools? (Lack of pre-processing, for example) ● Note how long it takes for these tools o run. Consider the size of the data set we’re working with versus the size of possible data sets you may be interested in.