Laboratory for Knowledge Discovery in Databases Entity Extraction, Animal Disease-related Event Recognition and Classification from Web Presenter: Svitlana Volkova  Adviser: William H. Hsu Committee: Dr. Doina Caragea, Dr. Gurdip Singh  Supported by: K-State National Agricultural Biosecurity Center (NABC), US Department of Defense
Agenda: Background; Related Work; Framework for Epidemiological Analysis; Disease-related Document Classification; Domain-specific Entity Extraction (Sequence Labeling using Syntactic Features); Disease-related Event Recognition and Classification; Summary & Future Work. Thesis "Entity Extraction, Animal Disease-related Event Recognition and Classification from Web", July 30 2010
Importance of the Problem: animal disease outbreaks influence travel and trade; they can cause economic crises and political instability; diseases that are zoonotic in type can cause loss of life.
Animal Disease Monitoring Systems - Automated Web Services: Information retrieval system MedISys - http://medusa.jrc.it/medisys/homeedition/all/home.html; Pattern-based Understanding and Learning System (PULS) - http://sysdb.cs.helsinki.fi/puls/jrc/all; BioCaster - http://biocaster.nii.ac.jp/; HealthMap - http://healthmap.org/en; EpiSpider - http://www.epispider.org/
Limitations of the Existing Systems: No timeline visualization (BioCaster)
Problem Statement: Introduce the following features to the framework for epidemiological analysis: (1) classification of disease-related documents collected from different domains; (2) domain-specific entity extraction: animal disease names, viruses, disease serotypes; (3) automated animal disease-related event recognition and classification from unstructured web data.
Methodology: Suppose we have a document collection D with documents collected from different domains C: news, web pages, scientific papers, medical literature, e-mails. We classify documents into two classes: disease-related documents DR and disease non-related documents DNR. We extract a set of events E from every document di in DR for every domain cj. For every event ek in E we extract a set of domain-specific and domain-independent entities: disease, species, location, date, event status. We classify recognized events from E into either two classes (suspected or confirmed) or three classes (susceptible, infected or recovered).
Related Work: Approaches for text categorization: supervised, unsupervised and semi-supervised learning; feature representations: “bag-of-words”, term frequency, binary features, word bigrams; classification algorithms: lazy learners, decision trees, Naïve Bayes, Maximum Entropy. Entity extraction approaches: gazetteers, regular expressions, Hidden Markov Models and Conditional Random Fields; ontology-based biomedical entity extraction. Work on relation extraction for automated ontology construction. Animal disease-related event recognition methods.
Framework for Epidemiological Analysis. Main Functional Components: Data Collection (Document Relevance Classification) -> Data Sharing -> Search -> Data Analysis (Entity Extraction and Event Recognition) -> Visualization
Advantages of the Designed System
Phases of Data Processing
1. Data Collection (1): Crawl the web using the Heritrix crawler - http://crawler.archive.org/ with a set of seeds (ProMED-Mail, DEFRA etc.) and a set of terms (animal disease names from the ontology).
2. Data Sharing: Document relevance classification: Relevant / Non-relevant
3. Search: Lucene-based* ranking; query-based keyword search; search by animal disease name and/or location. *Lucene - http://lucene.apache.org
4. Data Analysis. Event example: “On 12 September 2007, a new foot-and-mouth disease outbreak was confirmed in Egham, Surrey”
5. Visualization: Map View: GoogleMaps API - http://code.google.com/apis/maps/; TimeLine View: SIMILE API - http://www.simile-widgets.org/timeline/
Disease-related Document Classification Binary Classification using Supervised Learning Feature Representations: “Bag-of-words”, TF, Bigrams Classifiers:  Naïve Bayes, MaxEntropy,  J48
Supervised Learning Framework: Crawled Documents DTrain -> Feature Representations R1 … Rn -> Learned Models M1 … Mk -> Classifier. New Documents DTest -> Classifier -> Disease Related - DR (passed on to the next phases) or Disease Non-related - DNR (eliminated from the index). Feature Representations: R1 - “bag-of-words” binary, |R1|=28908; R2 - “bag-of-words” term frequency, |R2|=28908; R3 - “bag-of-words” bigrams, |R3|=99108; R4 - noun and verb keywords represented as binary counts, |R4|=2; R5 - noun and verb keywords normalized frequency, |R5|=2.
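The difference between the binary, term-frequency and bigram representations (R1 to R3) can be illustrated with a minimal sketch. It is not the thesis implementation: the whitespace tokenizer, the function names and the two toy documents are assumptions made only for this example.

```python
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    return text.lower().split()

def ngrams(tokens, n):
    # Contiguous n-grams joined by a space; n=1 gives plain unigrams.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocab(docs, n=1):
    # Sorted list of every n-gram seen in the training documents.
    grams = set()
    for doc in docs:
        grams.update(ngrams(tokenize(doc), n))
    return sorted(grams)

def binary_features(doc, vocab, n=1):
    # R1-style vector: 1 if the term occurs in the document, else 0.
    present = set(ngrams(tokenize(doc), n))
    return [1 if term in present else 0 for term in vocab]

def tf_features(doc, vocab, n=1):
    # R2-style (n=1) or R3-style (n=2) vector of raw term counts.
    counts = Counter(ngrams(tokenize(doc), n))
    return [counts[term] for term in vocab]

# Toy documents (hypothetical, not from the crawled corpus).
docs = ["fmd outbreak confirmed", "fmd fmd suspected"]
unigram_vocab = build_vocab(docs, n=1)
bigram_vocab = build_vocab(docs, n=2)
r1 = binary_features(docs[1], unigram_vocab)
r2 = tf_features(docs[1], unigram_vocab)
r3 = tf_features(docs[1], bigram_vocab, n=2)
```

The vocabulary size plays the role of |R1|, |R2| and |R3| on the slide; the bigram vocabulary is typically much larger than the unigram one.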
Experiment A: Disease-related Document Classification. ~1500 crawled documents on foot-and-mouth disease (FMD) and Rift valley fever (RVF). Focused crawl terms: [foot and mouth disease, FMD, rift valley fever, RVF]. After labeling: 813 related and 752 non-related docs. Testing with 10-fold cross validation.
Classification Results: Precision, Recall, F-Measure, Area Under Curve. Panels: Simplified Binary Counts as Features; Simplified Noun and Verb Frequency as Features.
Classification Results: Accuracy. Panels: Comprehensive “Bag-of-words” Binary Features; Comprehensive “Bag-of-words”, unigrams, bigrams and term frequency features.
Summary (1): The “bag-of-words” representations give higher accuracy than the simplified keyword representations (R4, R5). Highest accuracies: Naïve Bayes (a generative model) with the comprehensive feature representation R3 using bigrams as features: 0.97; Maximum Entropy classifier using the unigram “bag-of-words” representation R2: 0.96; Maximum Entropy classifier using comprehensive binary counts as feature representation R1: 0.94. Normalized term frequency is much better than just binary features.
Summary (2)
Entity Extraction in the Domain of Veterinary Medicine (1) Ontology-based Entity Extraction Automated Ontology Construction
Domain Meta-data. Domain-independent knowledge: location hierarchy (names of countries, states, cities); time hierarchy (canonical dates). Domain-specific knowledge: medical ontology (diseases, serotypes, and viruses).
Manually-constructed Initial Ontology: |OINIT|=429, |OS|=581, |OA|=581, |OS+A|=605. Sources: 1. Disease names and fact sheets from Iowa State University Center for Food Security and Public Health (CFSPH); 2. World Organisation for Animal Health (OIE) Animal Disease Data; 3. Department for Environment, Food and Rural Affairs, UK (DEFRA); 4. Wikipedia.
Experiment B: Ontology-based Entity Extraction
100 manually labeled documents for entity extraction
Entity Extraction Results: ROC Curves
Entity Extraction Results: Learning Curves |OG|=754..1238 |OR|=772..1287
Entity Extraction in the Domain of Veterinary Medicine (2) Sequence Labeling using Syntactic Features  with Sliding Window
Syntactic Feature Extraction: POS tag (numeric word-level feature); Capitalization (binary word-level feature); Capitalization inside (binary word-level feature for identifying abbreviations); Position in the sentence (numeric document-level feature); Position in the document (numeric document-level feature); Frequency (numeric document-level feature).
Sequence Labeling Approach
An Example of Syntactic Feature Extraction: “Severe disease in dairy cattle caused by Salmonella Newport”. POS = [NNP, IN, NNS, VBN, …] = [2, 0, 2, 5, …]. Xi = [POSi, CAPi, ICAPi, SPOSi, DPOSi, FREQi]. Window w = 3 over the words wi-3 … wi+3:
Xi-3 = [2, 0, 0, 5, 5, 1] (cattle)
Xi-2 = [5, 0, 0, 6, 6, 1] (caused)
Xi-1 = [0, 0, 0, 7, 7, 1] (by)
Xi = [2, 1, 0, 8, 8, 1] (Salmonella)
Xi+1 = [2, 1, 0, 9, 9, 1] (Newport)
Xi+2 = [-1, -1, -1, -1, -1, -1]
Xi+3 = [-1, -1, -1, -1, -1, -1]
Fi = [Xi, Xi-1, Xi-2, Xi-3, Xi+1, Xi+2, Xi+3], w = 3, Class = {0, 1}
Fi = [2, 1, 0, 8, 8, 1, 0, 0, 0, 7, 7, 1, 5, 0, 0, 6, 6, 1, 2, 0, 0, 5, 5, 1, 2, 1, 0, 9, 9, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], Class = [1]
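The window construction in this example can be reproduced with a short sketch. The function name `window_features` and the `PAD` constant are hypothetical; the ordering (current word, then left context, then right context) and the -1 padding follow the slide. The list `X` holds only the five tokens shown on the slide.

```python
PAD = -1  # value used on the slide for positions outside the sentence

def window_features(X, i, w=3):
    # Concatenate the per-word vectors in the slide's order:
    # [X_i, X_{i-1}, ..., X_{i-w}, X_{i+1}, ..., X_{i+w}].
    n_feat = len(X[0])
    offsets = [0] + [-k for k in range(1, w + 1)] + [k for k in range(1, w + 1)]
    F = []
    for off in offsets:
        j = i + off
        if 0 <= j < len(X):
            F.extend(X[j])
        else:
            F.extend([PAD] * n_feat)  # pad missing neighbours
    return F

# Per-word vectors [POS, CAP, ICAP, SPOS, DPOS, FREQ] copied from the slide.
X = [
    [2, 0, 0, 5, 5, 1],  # cattle
    [5, 0, 0, 6, 6, 1],  # caused
    [0, 0, 0, 7, 7, 1],  # by
    [2, 1, 0, 8, 8, 1],  # Salmonella
    [2, 1, 0, 9, 9, 1],  # Newport
]
F_i = window_features(X, 3)  # features for "Salmonella"
```

With six features per word and w = 3, every example has 6 × 7 = 42 values regardless of where the word sits in the sentence.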
Experiment C: Sequence Labeling using Syntactic Features. 100 manually labeled documents from Experiment B; the number of disease names is more than 5 per document. Keep capitalization; remove stop words. 202977 examples in the dataset: 80% for training (approx. 160000 examples), 20% for testing (approx. 40000 examples). Results are averaged over 3 runs. We do not report accuracy because the data set is unbalanced (approx. 8570 positive examples vs. approx. 194430 negative examples).
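A quick calculation shows why accuracy is uninformative on this data set. The sketch below uses the approximate class counts from the slide; the helper `precision_recall` is a hypothetical name, and returning 0.0 for an undefined ratio is a convention chosen for this example.

```python
positives = 8570     # approx. positive examples (from the slide)
negatives = 194430   # approx. negative examples (from the slide)
total = positives + negatives

# A classifier that labels every token as "not a disease name"
# is right on every negative example and wrong on every positive one.
majority_accuracy = negatives / total

def precision_recall(tp, fp, fn):
    # Undefined ratios are reported as 0.0 (a convention for this sketch).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# The same trivial classifier finds no entities at all.
trivial = precision_recall(tp=0, fp=0, fn=positives)
```

The trivial classifier reaches roughly 96% accuracy while extracting nothing, which is why precision, recall and AUC are reported instead.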
Entity Extraction Results: Precision, Recall, AUC (1)
Entity Extraction Results: Precision, Recall, AUC (2)
Summary: BioCaster named entity recognition system: 200 news articles; F-score 76.9 for all named entity classes; SVM and feature window -2/+1 including surface word, orthography, biomedical prefixes/suffixes, lemma, head noun etc. DNA, RNA, cell type extraction: SVM and orthographic features; F-score 79.9 during the identification phase and 66.5 during the classification phase.
Disease-related Event Recognition and Classification (1) Sentence-based Event Recognition and Classification
Animal Disease-related Event Types
Event Recognition Methodology: Step 1. Entity recognition from raw text. Step 2. Classification of the sentences from which entities were extracted as event-related or not; event-related sentences are further classified as confirmed or suspected. Step 3. Combination of the entities within an event sentence into structured tuples, and aggregation of the tuples related to the same event into one comprehensive tuple.
Step 1. Entity Recognition: Locate and classify atomic elements into predefined categories: Disease names: “foot and mouth disease”, “rift valley fever”; viruses: “picornavirus”; serotypes: “Asia-1”; Species: “sheep”, “pigs”, “cattle” and “livestock”; Locations of events specified at different levels of geo-granularity: “United Kingdom”, “eastern provinces of Shandong and Jiangsu, China”; Dates in different formats: “last Tuesday”, “two months ago”.
Entity Recognition Tools: Animal Disease Extractor*: relies on a medical ontology, automatically enriched with synonyms and causative viruses. Species Extractor*: pattern matching on a stemmed dictionary of animal names from Wikipedia. Location Extractor: Stanford NER Tool** (uses conditional random fields); NGA GEOnet Names Database (GNS)*** for location disambiguation and retrieving latitude/longitude. Date/Time Extractor: set of regular expressions. *KDD KSU DSEx - http://fingolfin.user.cis.ksu.edu:8080/diseaseextractor/ **Stanford NER - http://nlp.stanford.edu/ner/index.shtml ***GNS - http://earth-info.nga.mil/gns/html/
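As an illustration of a regular-expression date extractor, here is a minimal sketch. The two patterns are assumptions chosen to match the absolute date formats that appear in the example sentences of this deck; they are not the extractor's actual rule set, and relative expressions such as “last Tuesday” are not covered.

```python
import re

# Hypothetical patterns for two absolute date formats.
DATE_PATTERNS = [
    # "9 Jun 2009", "12 September 2007"
    r"\b\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{4}\b",
    # "2/20/01", "02/28/2001"
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",
]
DATE_RE = re.compile("|".join(DATE_PATTERNS))

def extract_dates(text):
    # Return every non-overlapping date mention, left to right.
    return DATE_RE.findall(text)

sample = ("On 12 September 2007, a new foot-and-mouth disease outbreak "
          "was confirmed in Egham, Surrey")
dates = extract_dates(sample)
```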
Step 2. Event Sentence Classification. Constraint: true events should include a disease name together with a status verb from Google Sets* and WordNet** (eliminate event non-related sentences), e.g. “Foot and mouth disease is[V] a highly pathogenic animal disease”. Confirmed: status verbs such as “happened” and verb phrases such as “strike out”, e.g. “On 9 Jun 2009, the farm's owner reported[V] symptoms of FMD in more than 30 hogs”. Suspected: status verbs such as “catch” and verb phrases such as “be taken in”, e.g. “RVF is suspected[V] in Saudi Arabia in September 2000”. *GoogleSets - http://labs.google.com/sets **WordNet - http://wordnet.princeton.edu/
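The constraint above (a disease name plus a status verb) can be sketched as a simple rule-based classifier. The disease list and the two verb sets below are small hypothetical samples, not the lists derived from Google Sets and WordNet, and checking suspected verbs before confirmed ones is a design choice made for this example.

```python
import re

# Hypothetical sample lexicons for illustration only.
DISEASES = ["foot and mouth disease", "foot-and-mouth disease", "fmd",
            "rift valley fever", "rvf"]
CONFIRMED_VERBS = {"confirmed", "reported", "happened"}
SUSPECTED_VERBS = {"suspected", "catch"}

def classify_sentence(sentence):
    text = sentence.lower()
    words = set(re.findall(r"[a-z]+", text))
    has_disease = any(
        re.search(r"\b" + re.escape(name) + r"\b", text) for name in DISEASES
    )
    if not has_disease:
        return "not_event"
    # Suspected is checked first so that hedged reports are not over-promoted.
    if words & SUSPECTED_VERBS:
        return "suspected"
    if words & CONFIRMED_VERBS:
        return "confirmed"
    return "not_event"  # disease name without a status verb

label = classify_sentence(
    "On 9 Jun 2009, the farm's owner reported symptoms of FMD in more than 30 hogs"
)
```

A definitional sentence such as “Foot and mouth disease is a highly pathogenic animal disease” contains a disease name but no status verb, so it is filtered out.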
Step 3. Event Tuple Generation. Event attributes: disease, date, location, species, confirmation status. Event tuple: Eventi = <disease; date; location; species; status> = <FMD, 9 Jun 2009, Taoyuan, hog, confirmed>. Event tuple with missing attributes: Eventj = <FMD, ?, ?, ?, confirmed>
Event Recognition Workflow
Step 1: Entity Recognition. Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC]. Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC]. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP]. Subsequent testing confirmed FMD[DIS]. Agricultural authorities asked the farmer to strengthen immunization. The outbreak has not affected other farms. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
Step 2: Sentence Classification.
YES 1. Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC].
YES 2. Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC].
YES 3. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP].
YES 4. Subsequent testing confirmed FMD[DIS].
NO 5. Agricultural authorities asked the farmer to strengthen immunization.
NO 6. The outbreak has not affected other farms.
NO 7. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
Step 3a: Tuple Generation.
E1 = <Foot-and-mouth disease, ?, Taoyuan, hog, ?>
E2 = <Foot-and-mouth disease, ?, Taoyuan, hog, confirmed>
E3 = <FMD, 9 Jun 2009, ?, hog, reported>
E4 = <FMD, ?, ?, ?, confirmed>
Step 3b: Tuple Aggregation. E = <disease, date, location, species, status> = <Foot-and-mouth disease, 9 Jun 2009, Taoyuan, hog, confirmed>
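Step 3b can be sketched as a merge that fills each attribute with the first non-missing value. This is a simplified heuristic assumed for illustration (it presumes all four tuples describe the same event and depends on sentence order), not the aggregation procedure used in the thesis; `None` stands in for “?”.

```python
FIELDS = ("disease", "date", "location", "species", "status")

def aggregate(tuples):
    # For each attribute, keep the first value that is not missing (None).
    merged = []
    for column in zip(*tuples):
        value = next((v for v in column if v is not None), None)
        merged.append(value)
    return dict(zip(FIELDS, merged))

# Tuples E1..E4 from Step 3a, with "?" written as None.
E1 = ("Foot-and-mouth disease", None, "Taoyuan", "hog", None)
E2 = ("Foot-and-mouth disease", None, "Taoyuan", "hog", "confirmed")
E3 = ("FMD", "9 Jun 2009", None, "hog", "reported")
E4 = ("FMD", None, None, None, "confirmed")

event = aggregate([E1, E2, E3, E4])
```

On this input the merge reproduces the aggregated tuple of Step 3b. A fuller implementation would also need to decide when two tuples refer to different events and how to rank conflicting status values.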
Experiment D: Event Recognition and Classification. The First International Workshop on Web Science and Information Exchange in the Medical Web (MedEx 2010). ~100 event-related documents on foot-and-mouth disease (FMD) and Rift valley fever (RVF). Manually created 2 sets of summaries for 100 docs. DUCView Pyramid Scoring Tool*: score in [0..1]; relies on multiple summaries to assign significance weights to summarization content units (i.e., entities), which are used to compare automatically generated event tuples with entities from human summaries. Score_i = <w_d disease; w_t date; w_l location; w_s species; w_c status …>, subject to disease + status = 2
Event Score Distribution by Range. We interpret the Pyramid score values as an event extraction accuracy: # of unique contributing entities (TP); # of entities not in the summary (FP); # of extra contributing entities from summary (FN). Multiple summaries: majority voting for annotation.
Disease-related Event Recognition and Classification (2) Event Recognition and Classification in Predictive Epidemiology Domain
ENTITY EXTRACTION
Document 1, sentences s11, s12: The signs suggested the 27 pigs[SP] could be suffering from foot and mouth disease[DIS] in Anglesey, Wales[LOC]. It was reported on 02/18/01[DATE].
Document 2, sentence s21: The UK Ministry of Agriculture confirmed on 2/20/01[DATE] that 27 pigs[SP] found with vesicles in an abattoir near Brentwood, Essex[LOC] have FMD[DIS].
Document 3, sentence s31: Almost 2000 cattle[SP] are waiting to be slaughtered on 02/28/2001[DATE] since the resurgence of FMD[DIS] in Northumberland[LOC].
EVENT TUPLE GENERATION
e11 = [27 pigs, FMD, ?, Anglesey, Wales, “suggest”]
e12 = [?, ?, 02/18/01, ?, “report”]
e21 = [27 pigs, FMD, 2/20/01, Brentwood, Essex, “confirm”]
e31 = [2000 cattle, FMD, 2/28/01, Northumberland, “slaughter”]
EVENT TUPLE CLASSIFICATION: Susceptible, Infected, Recovered
EVENT TUPLE AGGREGATION
E1 = [27 pigs, FMD, 02/18/01, Anglesey, Wales, Susceptible]
E2 = [27 pigs, FMD, 2/20/01, Brentwood, Essex, Infected]
E3 = [2000 cattle, FMD, 2/28/01, Northumberland, Recovered]
The spread of the foot-and-mouth disease outbreak in the UK, 2001: 118 ProMED-Mail reports. Legend: yellow - susceptible; red - infected; green - recovered.
Summary: The accuracy of event recognition depends on the accuracy of the separate entity extractors. Event aggregation and deduplication require more comprehensive heuristics and additional knowledge, for example co-reference resolution. BioCaster: 950 disease-location pairs per month; reported results: 887/950 correct disease-location pairs and 0.934 precision. MedISys/PULS: 100 English-language documents with 156 events; reported results: 0.88 precision.
Conclusions, Contributions and Future Work Summary:  1. Disease-related Document Classification  2. Ontology-based Entity Extraction  3. Entity Extraction using Sequence Labeling  4. Event Recognition and Classification
Conclusions: Disease-related Document Classification: supervised framework; feature representations and classification algorithms. Ontology-based Domain-specific Entity Extraction: semantic relationship extraction approach; sequence labeling using syntactic patterns. Event Recognition and Classification: novel sentence-based approach.
Contributions:
Paper “Computational Knowledge and Information Management in Veterinary Epidemiology”, IEEE Intelligence and Security Informatics Conference (ISI'10), 23-26 May 2010, Vancouver, BC, Canada.
Paper “Animal Disease Event Recognition and Classification”, First International Workshop on Web Science and Information Exchange in the Medical Web (MedEx'10), WWW Conference, 26-30 April 2010, Raleigh, NC, USA.
Paper “Boosting Biomedical Entity Extraction by Using Syntactic Patterns for Semantic Relation Discovery” (to appear), 2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI'10), August 31 - September 3, York University, Toronto, Canada.
Poster “Named Entity Recognition and Tagging in the Domain of Epizootics”, Women in Machine Learning Workshop (WiML'09), 6-7 Dec 2009, Vancouver, Canada.
ACM Poster Presentation Competition “Automated Event Extraction and Named Entity Recognition in the Domain of Veterinary Medicine”, 2010 Grace Hopper Celebration of Women in Computing (GHC'10), September 28 - October 1, Atlanta, Georgia, USA.
Future Work: Domain-specific Entity Extraction: multilingual ontology construction using Wikipedia. Automated Ontology Construction: generalize for other named entities. Event Recognition and Classification: deeper syntactic analysis; co-reference resolution.
Acknowledgments: Faculty: Dr. William H. Hsu, Dr. Doina Caragea, Dr. Gurdip Singh. KDD Lab alumni: Tim Weninger (crawler deployment) and Jing Xia (rule-based event extraction). KDD Lab assistants: Information Extraction Team: John Drouhard, Landon Fowles, Swathi Bujuru; Spatial Data Mining Team: Wesam Elshamy, Andrew Berggren; Topic Detection & Tracking Team: Danny Jones, Srinivas Reddy. Fulbright Program supported by the US Department of State's Bureau of Educational and Cultural Affairs.

Developing data services: a tale from two Oregon universitiesDeveloping data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universities
 
Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK

Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK
Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK

Infrastructures Supporting Inter-disciplinary Research - Exemplars from the UK

 
Eysenbach: Infodemiology and Infoveillance
Eysenbach: Infodemiology and InfoveillanceEysenbach: Infodemiology and Infoveillance
Eysenbach: Infodemiology and Infoveillance
 

Más de Svitlana volkova

Más de Svitlana volkova (12)

EACL'12 Poster
EACL'12 PosterEACL'12 Poster
EACL'12 Poster
 
Multilingual Ner Using Wiki
Multilingual Ner Using WikiMultilingual Ner Using Wiki
Multilingual Ner Using Wiki
 
WiML Poster
WiML PosterWiML Poster
WiML Poster
 
Topics Modeling
Topics ModelingTopics Modeling
Topics Modeling
 
Project Proposal Topics Modeling (Ir)
Project Proposal    Topics Modeling (Ir)Project Proposal    Topics Modeling (Ir)
Project Proposal Topics Modeling (Ir)
 
Social Networks
Social NetworksSocial Networks
Social Networks
 
Methods Of Reliability Analysis
Methods Of Reliability AnalysisMethods Of Reliability Analysis
Methods Of Reliability Analysis
 
Ohio Project
Ohio ProjectOhio Project
Ohio Project
 
Ukraine Presentation
Ukraine PresentationUkraine Presentation
Ukraine Presentation
 
Ukraine Presentation at Kansas State University
Ukraine Presentation at Kansas State UniversityUkraine Presentation at Kansas State University
Ukraine Presentation at Kansas State University
 
Communicatons Fulbright
Communicatons FulbrightCommunicatons Fulbright
Communicatons Fulbright
 
Communications Ternopil
Communications TernopilCommunications Ternopil
Communications Ternopil
 

Último

Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 

Último (20)

Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 

MS Thesis Short

  • 1. Laboratory for Knowledge Discovery in Databases Entity Extraction, Animal Disease-related Event Recognition and Classification from Web Presenter: Svitlana Volkova Adviser: William H. Hsu Committee: Dr. Doina Caragea, Dr. Gurdip Singh Supported by: K-State National Agricultural Biosecurity Center (NABC), US Department of Defense
  • 2.
  • 3. Sequence Labeling using Syntactic Features; Disease-related Event Recognition and Classification; Summary & Future Work
  • 4. Importance of the Problem: animal disease outbreaks influence travel and trade; they cause economic crises and political instability; diseases that are zoonotic in type can cause loss of life. Thesis "Entity Extraction, Animal Disease-related Event Recognition and Classification from Web", July 30 2010
  • 5. Animal Disease Monitoring Systems - Automated Web Services: information retrieval system MedISys (http://medusa.jrc.it/medisys/homeedition/all/home.html); Pattern-based Understanding and Learning System, PULS (http://sysdb.cs.helsinki.fi/puls/jrc/all); BioCaster (http://biocaster.nii.ac.jp/); HealthMap (http://healthmap.org/en); EpiSpider (http://www.epispider.org/)
  • 6. Limitations of the Existing Systems: no timeline visualization (BioCaster)
  • 7. Problem Statement: Introduce the following features to the framework for epidemiological analysis: (1) classification of disease-related documents collected from different domains; (2) domain-specific entity extraction (animal disease names, viruses, disease serotypes); (3) automated animal disease-related event recognition and classification from unstructured web data.
  • 8. Methodology: Suppose we have a document collection D with documents collected from different domains C: news, web pages, scientific papers, medical literature, e-mails. We classify documents into two classes: disease-related documents (DR) and disease non-related documents (DNR). We extract a set of events E from every document di in DR for every domain cj. For every event ek in E we extract a set of domain-specific and domain-independent entities: disease, species, location, date, event status. We classify the recognized events from E into either two classes (suspected or confirmed) or three classes (susceptible, infected or recovered).
  • 9. Related Work: Approaches for text categorization: supervised, unsupervised and semi-supervised learning; different feature representations (“bag-of-words”, term frequency, binary features, word bigrams); classification algorithms (lazy learners, decision trees, Naïve Bayes, Maximum Entropy). Entity extraction approaches: gazetteers, regular expressions, Hidden Markov Models and Conditional Random Fields; ontology-based biomedical entity extraction. Relation extraction work for automated ontology construction. Animal disease-related event recognition methods.
  • 10. Framework for Epidemiological Analysis. Main functional components: Data Collection (Document Relevance Classification) -> Data Sharing -> Search -> Data Analysis (Entity Extraction and Event Recognition) -> Visualization
  • 11. Advantages of the Designed System
  • 12. Phases of Data Processing
  • 13. 1. Data Collection (1): Crawl the web using the Heritrix crawler (http://crawler.archive.org/) with a set of seeds (ProMED-Mail, DEFRA, etc.) and a set of terms (animal disease names from the ontology).
  • 14. 2. Data Sharing: document relevance classification (relevant vs. non-relevant)
  • 15. 3. Search: Lucene-based* ranking; query-based keyword search; search by animal disease name and/or location. *Lucene - http://lucene.apache.org
  • 16. 4. Data Analysis. Event example: “On 12 September 2007, a new foot-and-mouth disease outbreak was confirmed in Egham, Surrey”
  • 17. 5. Visualization: Map View using the GoogleMaps API (http://code.google.com/apis/maps/); TimeLine View using the SIMILE API (http://www.simile-widgets.org/timeline/)
  • 18. Disease-related Document Classification: binary classification using supervised learning. Feature representations: “bag-of-words”, TF, bigrams. Classifiers: Naïve Bayes, MaxEntropy, J48.
  • 19. Supervised Learning Framework: Crawled documents DTrain and new documents DTest are mapped to feature representations R1 … Rn; the learned models M1 … Mk drive a classifier that labels each document as Disease Related, DR (passed on to the next phases) or Disease Non-related, DNR (eliminated from the index). Feature representations: R1: “bag-of-words” binary, |R1|=28908; R2: “bag-of-words” term frequency, |R2|=28908; R3: “bag-of-words” bigrams, |R3|=99108; R4: noun and verb keywords represented as binary counts, |R4|=2; R5: noun and verb keywords as normalized frequency, |R5|=2.
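A minimal sketch of what the binary, term-frequency and bigram representations (R1 to R3) might look like for a single document. The tokenizer, the helper names and the two toy documents are assumptions made for illustration; the slides do not describe the actual preprocessing or toolkit.

```python
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the thesis may use a different tokenizer.
    return text.lower().split()

def binary_bow(tokens, vocab):
    # R1-style vector: 1 if the vocabulary term occurs in the document, else 0.
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]

def tf_bow(tokens, vocab):
    # R2-style vector: raw count of each vocabulary term in the document.
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def bigrams(tokens):
    # R3-style features: adjacent word pairs.
    return list(zip(tokens, tokens[1:]))

# Toy documents, invented for this example.
docs = ["FMD outbreak confirmed in cattle", "cattle market prices rise"]
tokenized = [tokenize(d) for d in docs]
vocab = sorted({t for toks in tokenized for t in toks})

print(vocab)
print(binary_bow(tokenized[0], vocab))
print(bigrams(tokenized[0]))
```

The vocabulary size plays the role of |R1| and |R2| on the slide; the number of distinct bigrams plays the role of |R3|.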
  • 20. Experiment A: Disease-related Document Classification. ~1500 crawled documents on foot-and-mouth disease (FMD) and Rift Valley fever (RVF). Focused crawl terms: [foot and mouth disease, FMD, rift valley fever, RVF]. After labeling: 813 related and 752 non-related documents. Testing with 10-fold cross-validation.
  • 21. Classification Results: Precision, Recall, F-Measure, Area Under Curve. Panels: simplified binary counts as features; simplified noun and verb frequency as features.
  • 22. Classification Results: Accuracy. Panels: comprehensive “bag-of-words” binary features; comprehensive “bag-of-words” unigram, bigram and term frequency features.
  • 23. Summary (1): “Bag-of-words” representation gives higher accuracy. The highest accuracies come from Naïve Bayes and Maximum Entropy: Naïve Bayes with the comprehensive feature representation R3 using bigrams as features (0.97); Maximum Entropy classifier using the unigram “bag-of-words” representation R2 (0.96); Maximum Entropy classifier using comprehensive binary counts as feature representation R1 (0.94). Normalized term frequency is much better than binary features alone.
  • 24. Summary (2)
  • 25. Entity Extraction in the Domain of Veterinary Medicine (1): Ontology-based Entity Extraction; Automated Ontology Construction
  • 26. Domain Meta-data. Domain-independent knowledge: location hierarchy (names of countries, states, cities); time hierarchy (canonical dates). Domain-specific knowledge: medical ontology (diseases, serotypes, and viruses).
  • 27.
  • 28.
  • 29. 100 manually labeled documents for entity extraction
  • 30. Entity Extraction Results: ROC Curves
  • 31. Entity Extraction Results: Learning Curves. |OG|=754..1238, |OR|=772..1287
  • 32. Entity Extraction in the Domain of Veterinary Medicine (2): Sequence Labeling using Syntactic Features with Sliding Window
  • 33. Syntactic Feature Extraction: POS tag (numeric word-level feature); Capitalization (binary word-level feature); Capitalization inside (binary word-level feature for identifying abbreviations); Position in the sentence (numeric document-level feature); Position in the document (numeric document-level feature); Frequency (numeric document-level feature).
  • 34. Sequence Labeling Approach
  • 35. An Example of Syntactic Feature Extraction: “Severe disease in dairy cattle caused by Salmonella Newport”
    POS = [NNP, IN, NNS, VBN, …] = [2, 0, 2, 5, …]
    Xi = [POSi, CAPi, ICAPi, SPOSi, DPOSi, FREQi]
    Window (w = 3) around wi = Salmonella: wi-3 = cattle, wi-2 = caused, wi-1 = by, wi+1 = Newport; wi+2 and wi+3 fall past the end of the sentence
    Xi-3 = [2, 0, 0, 5, 5, 1]
    Xi-2 = [5, 0, 0, 6, 6, 1]
    Xi-1 = [0, 0, 0, 7, 7, 1]
    Xi = [2, 1, 0, 8, 8, 1]
    Xi+1 = [2, 1, 0, 9, 9, 1]
    Xi+2 = [-1, -1, -1, -1, -1, -1]
    Xi+3 = [-1, -1, -1, -1, -1, -1]
    Fi = [Xi, Xi-1, Xi-2, Xi-3, Xi+1, Xi+2, Xi+3], w = 3, Class = {0, 1}
    Fi = [2, 1, 0, 8, 8, 1, 0, 0, 0, 7, 7, 1, 5, 0, 0, 6, 6, 1, 2, 0, 0, 5, 5, 1, 2, 1, 0, 9, 9, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], Class = [1]
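The window construction on this slide can be sketched as follows. The function name `window_features` and the choice to pad at list boundaries are assumptions; the per-token vectors and the concatenation order (current token, then left context, then right context) are taken from the slide.

```python
PAD = -1  # value used on the slide for positions outside the sentence

def window_features(vectors, i, w=3):
    """Concatenate X_i, X_{i-1}..X_{i-w}, X_{i+1}..X_{i+w}, padding missing positions."""
    dim = len(vectors[i])

    def at(j):
        # Positions outside the available vectors are replaced by a padding vector.
        return vectors[j] if 0 <= j < len(vectors) else [PAD] * dim

    order = [i] + [i - k for k in range(1, w + 1)] + [i + k for k in range(1, w + 1)]
    flat = []
    for j in order:
        flat.extend(at(j))
    return flat

# Per-token vectors [POS, CAP, ICAP, SPOS, DPOS, FREQ] for the last five words of the sentence.
tail = [
    [2, 0, 0, 5, 5, 1],  # cattle
    [5, 0, 0, 6, 6, 1],  # caused
    [0, 0, 0, 7, 7, 1],  # by
    [2, 1, 0, 8, 8, 1],  # Salmonella
    [2, 1, 0, 9, 9, 1],  # Newport
]
f_i = window_features(tail, 3)  # index 3 is "Salmonella"
print(len(f_i), f_i)
```

With six features per token and w = 3, each example has 7 × 6 = 42 values, which matches the length of Fi on the slide.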
  • 36. Experiment C: Sequence Labeling using Syntactic Features. 100 manually labeled documents from Experiment B; the number of disease names is more than 5 per document; keep capitalization; remove stop words. 202977 examples in the dataset: 80% for training (approx. 160000 examples), 20% for testing (approx. 40000 examples). Results are averaged over 3 runs. We do not report accuracy because the data set is unbalanced (approx. 8570 positive examples vs. approx. 194430 negative examples).
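To see why accuracy is uninformative at this class ratio, a short worked check using the approximate counts from the slide: a trivial classifier that labels every token as negative already scores about 0.958 accuracy while finding no disease names. The helper name is hypothetical.

```python
positives = 8570     # approximate count from the slide
negatives = 194430   # approximate count from the slide

def majority_baseline(pos, neg):
    """Accuracy and positive-class recall of a classifier that always predicts the larger class."""
    total = pos + neg
    accuracy = max(pos, neg) / total
    # If the negative class is larger, no positive example is ever predicted.
    recall_pos = 1.0 if pos > neg else 0.0
    return accuracy, recall_pos

accuracy, recall = majority_baseline(positives, negatives)
print(f"accuracy={accuracy:.3f}, positive recall={recall:.1f}")
```

Precision, recall and AUC on the positive class do not have this blind spot.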
  • 37. Entity Extraction Results: Precision, Recall, AUC (1)
  • 38. Entity Extraction Results: Precision, Recall, AUC (2)
  • 39. Summary: BioCaster named entity recognition system: 200 news articles; F-score of 76.9 for all named entity classes; SVM with a feature window of -2/+1 including surface word, orthography, biomedical prefixes/suffixes, lemma, head noun, etc. DNA, RNA, cell type extraction: SVM and orthographic features; F-score of 79.9 during the identification phase and 66.5 during the classification phase.
  • 40. Disease-related Event Recognition and Classification (1): Sentence-based Event Recognition and Classification
  • 41. Animal Disease-related Event Types
  • 42. Event Recognition Methodology: Step 1. Entity recognition from raw text. Step 2. Classification of the sentences from which entities are extracted as event-related or not; event-related sentences are further classified as confirmed or suspected. Step 3. Combination of the entities within an event sentence into structured tuples, and aggregation of the tuples related to the same event into one comprehensive tuple.
  • 43. Step 1. Entity Recognition: Locate and classify atomic elements into predefined categories: Disease names: “foot and mouth disease”, “rift valley fever”; viruses: “picornavirus”; serotypes: “Asia-1”. Species: “sheep”, “pigs”, “cattle” and “livestock”. Locations of events specified at different levels of geo-granularity: “United Kingdom”, “eastern provinces of Shandong and Jiangsu, China”. Dates in different formats: “last Tuesday”, “two months ago”.
  • 44. Entity Recognition Tools: Animal Disease Extractor*: relies on a medical ontology, automatically enriched with synonyms and causative viruses. Species Extractor*: pattern matching on a stemmed dictionary of animal names from Wikipedia. Location Extractor: Stanford NER Tool** (uses conditional random fields); NGA GEOnet Names Database (GNS)*** for location disambiguation and retrieving latitude/longitude. Date/Time Extractor: set of regular expressions. *KDD KSU DSEx - http://fingolfin.user.cis.ksu.edu:8080/diseaseextractor/ **Stanford NER - http://nlp.stanford.edu/ner/index.shtml ***GNS - http://earth-info.nga.mil/gns/html/
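The slides do not list the regular expressions used by the Date/Time Extractor. As a rough illustration, the sketch below covers two absolute date formats that appear in the example sentences of this deck ("9 Jun 2009", "2/20/01"); the pattern set and function name are assumptions, and relative expressions such as "last Tuesday" are not handled.

```python
import re

# Month names, abbreviated or full (assumed English-only input).
MONTHS = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|"
    r"Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
)

DATE_PATTERNS = [
    re.compile(r"\b\d{1,2} " + MONTHS + r" \d{4}\b"),    # e.g. "12 September 2007"
    re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b"),  # e.g. "02/28/2001" or "2/20/01"
]

def extract_dates(text):
    """Return all date strings matched by any pattern, in order of appearance."""
    hits = []
    for pattern in DATE_PATTERNS:
        hits.extend((m.start(), m.group()) for m in pattern.finditer(text))
    return [value for _, value in sorted(hits)]

print(extract_dates("On 9 Jun 2009, the farm's owner reported symptoms of FMD in more than 30 hogs."))
print(extract_dates("The UK Ministry of Agriculture confirmed on 2/20/01 that 27 pigs have FMD."))
```

A production extractor would also need to normalize the matches to canonical dates, as the time hierarchy on the Domain Meta-data slide implies.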
  • 45. Step 2. Event Sentence Classification. Constraint: true events should include a disease name together with a status verb from Google Sets* and WordNet** (this eliminates sentences that are not event-related). Example of an eliminated sentence: “Foot and mouth disease is[V] a highly pathogenic animal disease”. Confirmed status: verbs such as “happened” and verb phrases such as “strike out”; example: “On 9 Jun 2009, the farm's owner reported[V] symptoms of FMD in more than 30 hogs”. Suspected status: verbs such as “catch” and verb phrases such as “be taken in”; example: “RVF is suspected[V] in Saudi Arabia in September 2000”. *GoogleSets - http://labs.google.com/sets **WordNet - http://wordnet.princeton.edu/
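A minimal sketch of the constraint above: a sentence counts as an event sentence only if it contains both a disease name and a status verb. The three small word lists are hypothetical stand-ins for the ontology and for the verb sets expanded with Google Sets and WordNet, and checking suspected verbs before confirmed ones is a design assumption, not something the slide states.

```python
import re

# Hypothetical, tiny stand-ins for the real disease ontology and status-verb lists.
DISEASES = {"fmd", "rvf", "foot and mouth disease", "foot-and-mouth disease", "rift valley fever"}
CONFIRMED_VERBS = {"confirmed", "reported", "happened"}
SUSPECTED_VERBS = {"suspected", "suggested"}

def classify_sentence(sentence):
    """Return 'confirmed', 'suspected', or None when the sentence is not event-related."""
    text = sentence.lower()
    if not any(name in text for name in DISEASES):
        return None  # no disease name: cannot be an event sentence
    words = set(re.findall(r"[a-z]+", text))
    if words & SUSPECTED_VERBS:
        return "suspected"
    if words & CONFIRMED_VERBS:
        return "confirmed"
    return None  # disease name without a status verb, e.g. a definition

for s in [
    "Foot and mouth disease is a highly pathogenic animal disease",
    "On 9 Jun 2009, the farm's owner reported symptoms of FMD in more than 30 hogs",
    "RVF is suspected in Saudi Arabia in September 2000",
]:
    print(classify_sentence(s), "<-", s)
```

Plain substring and word matching will miss inflected or multi-word verb phrases such as "be taken in"; handling those would need lemmatization or phrase matching.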
  • 46. Step 3. Event Tuple Generation. Event attributes: disease, date, location, species, confirmation status. Event tuple: Eventi = <disease; date; location; species; status> = <FMD, 9 Jun 2009, Taoyuan, hog, confirmed>. Event tuple with missing attributes: Eventj = <FMD, ?, ?, ?, confirmed>
  • 47. Event Recognition Workflow
    Step 1: Entity Recognition. Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC]. Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC]. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP]. Subsequent testing confirmed FMD[DIS]. Agricultural authorities asked the farmer to strengthen immunization. The outbreak has not affected other farms. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
    Step 2: Sentence Classification.
    YES 1. Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC].
    YES 2. Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC].
    YES 3. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP].
    YES 4. Subsequent testing confirmed FMD[DIS].
    NO 5. Agricultural authorities asked the farmer to strengthen immunization.
    NO 6. The outbreak has not affected other farms.
    NO 7. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
    Step 3a: Tuple Generation.
    E1 = <Foot-and-mouth disease, ?, Taoyuan, hog, ?>
    E2 = <Foot-and-mouth disease, ?, Taoyuan, hog, confirmed>
    E3 = <FMD, 9 Jun 2009, ?, hog, reported>
    E4 = <FMD, ?, ?, ?, confirmed>
    Step 3b: Tuple Aggregation.
    E = <disease, date, location, species, status> = <Foot-and-mouth disease, 9 Jun 2009, Taoyuan, hog, confirmed>
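One way Step 3b could work is sketched below, assuming all four tuples are already known to describe the same event. Each attribute takes the first non-missing value, and the status takes the strongest value seen. The `Event` type, the `STATUS_RANK` ordering and the function name are assumptions for illustration; the slides do not specify the aggregation heuristics.

```python
from typing import NamedTuple, Optional

class Event(NamedTuple):
    disease: Optional[str] = None
    date: Optional[str] = None
    location: Optional[str] = None
    species: Optional[str] = None
    status: Optional[str] = None

# Assumed ordering: a confirmation outranks a plain report.
STATUS_RANK = {"reported": 1, "confirmed": 2}

def aggregate(events):
    """Merge tuples for one event: first known value per attribute, strongest status."""
    def first_known(field):
        return next((getattr(e, field) for e in events if getattr(e, field) is not None), None)

    statuses = [e.status for e in events if e.status is not None]
    status = max(statuses, key=lambda s: STATUS_RANK.get(s, 0)) if statuses else None
    return Event(first_known("disease"), first_known("date"),
                 first_known("location"), first_known("species"), status)

tuples = [
    Event("Foot-and-mouth disease", None, "Taoyuan", "hog", None),         # E1
    Event("Foot-and-mouth disease", None, "Taoyuan", "hog", "confirmed"),  # E2
    Event("FMD", "9 Jun 2009", None, "hog", "reported"),                   # E3
    Event("FMD", None, None, None, "confirmed"),                           # E4
]
print(aggregate(tuples))
```

Deciding which tuples belong to the same event (for example, that "FMD" and "Foot-and-mouth disease" co-refer) is the harder part and is not addressed by this sketch.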
  • 48. Experiment D: Event Recognition and Classification. The First International Workshop on Web Science and Information Exchange in the Medical Web (MedEx 2010). ~100 event-related documents on foot-and-mouth disease (FMD) and Rift Valley fever (RVF). Manually created 2 sets of summaries for 100 docs. DUCView Pyramid Scoring Tool*: Score [0..1]; relies on multiple summaries to assign significance weights to summarization content units (i.e., entities) in order to compare automatically generated event tuples with entities from human summaries. Scorei = <wd·disease; wt·date; wl·location; ws·species; wc·status; …>, subject to disease + status = 2
  • 49. Event Score Distribution by Range. We interpret the Pyramid score values as an event extraction accuracy: # of unique contributing entities (TP); # of entities not in the summary (FP); # of extra contributing entities from the summary (FN). Multiple summaries: majority voting for annotation.
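For readers unfamiliar with the TP/FP/FN reading, the sketch below computes the three counts for one extracted tuple against one human summary and derives the standard precision, recall and F1 from them. This is not the Pyramid scoring itself, which weights entities across multiple summaries; the two entity sets are invented for illustration.

```python
def compare(extracted, reference):
    """Count TP/FP/FN between extracted entities and summary entities, then derive P, R, F1."""
    tp = len(extracted & reference)   # entities found in both
    fp = len(extracted - reference)   # extracted entities not in the summary
    fn = len(reference - extracted)   # summary entities the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return tp, fp, fn, precision, recall, f1

# Illustrative (made-up) entity sets for a single event.
extracted = {"FMD", "9 Jun 2009", "Taoyuan", "hog", "confirmed"}
reference = {"FMD", "Taoyuan", "hog", "confirmed", "Taiwan"}

tp, fp, fn, precision, recall, f1 = compare(extracted, reference)
print(tp, fp, fn, round(precision, 2), round(recall, 2), round(f1, 2))
```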
  • 50. Disease-related Event Recognition and Classification (2): Event Recognition and Classification in Predictive Epidemiology Domain
  • 51. ENTITY EXTRACTION
    Document 1, sentences s11, s12: The signs suggested the 27 pigs[SP] could be suffering from foot and mouth disease[DIS] in Anglesey, Wales[LOC]. It was reported on 02/18/01[DATE].
    Document 2, sentence s21: The UK Ministry of Agriculture confirmed on 2/20/01[DATE] that 27 pigs[SP] found with vesicles in an abattoir near Brentwood, Essex[LOC] have FMD[DIS].
    Document 3, sentence s31: Almost 2000 cattle[SP] are waiting to be slaughtered on 02/28/2001[DATE] since the resurgence of FMD[DIS] in Northumberland[LOC].
    EVENT TUPLE GENERATION
    e11 = [27 pigs, FMD, ?, Anglesey, Wales, “suggest”]
    e12 = [?, ?, 02/18/01, ?, “report”]
    e21 = [27 pigs, FMD, 2/20/01, Brentwood, Essex, “confirm”]
    e31 = [2000 cattle, FMD, 2/28/01, Northumberland, “slaughter”]
    EVENT TUPLE CLASSIFICATION: Susceptible, Infected, Recovered
    EVENT TUPLE AGGREGATION
    E1 = [27 pigs, FMD, 02/18/01, Anglesey, Wales, Susceptible]
    E2 = [27 pigs, FMD, 2/20/01, Brentwood, Essex, Infected]
    E3 = [2000 cattle, FMD, 2/28/01, Northumberland, Recovered]
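The classification step in this example can be read as a lookup from the status verb of a tuple to one of the three classes. The mapping below is inferred only from the three tuples shown on the slide and is an assumption; the actual classifier may use larger verb lists or a learned model.

```python
# Verb-to-class mapping inferred from the slide's example (assumption, not the full rule set).
VERB_TO_STATE = {
    "suggest": "Susceptible",
    "confirm": "Infected",
    "slaughter": "Recovered",
}

def classify_tuple(event):
    """Replace the trailing status verb of an event tuple with its class label (None if unknown)."""
    *attributes, verb = event
    return [*attributes, VERB_TO_STATE.get(verb)]

e21 = ["27 pigs", "FMD", "2/20/01", "Brentwood, Essex", "confirm"]
e31 = ["2000 cattle", "FMD", "2/28/01", "Northumberland", "slaughter"]
print(classify_tuple(e21))
print(classify_tuple(e31))
```

Tuples such as e12, whose verb ("report") carries no class of its own, would be merged with a neighboring tuple during aggregation before or after this step.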
  • 52. The spread of the foot-and-mouth disease outbreak in the UK, 2001: 118 ProMED-Mail reports. Map legend: yellow - susceptible; red - infected; green - recovered.
  • 53. Summary: The accuracy of event recognition depends on the accuracy of the separate entity extraction steps. Event aggregation and deduplication require more comprehensive heuristics and additional knowledge, for example co-reference resolution. BioCaster: 950 disease-location pairs per month; reported results: 887/950 correct disease-location pairs, 0.934 precision. MedISys/PULS: 100 English-language documents with 156 events; reported results: 0.88 precision.
  • 54. Conclusions, Contributions and Future Work Summary: 1. Disease-related Document Classification 2. Ontology-based Entity Extraction 3. Entity Extraction using Sequence Labeling 4. Event Recognition and Classification
  • 55. Conclusions: Disease-related Document Classification: supervised framework; feature representations and classification algorithms. Ontology-based Domain-specific Entity Extraction: semantic relationship extraction approach; sequence labeling using syntactic patterns. Event Recognition and Classification: novel sentence-based approach.
  • 56. Contributions. Paper: “Computational Knowledge and Information Management in Veterinary Epidemiology”, IEEE Intelligence and Security Informatics Conference (ISI'10), 23-26 May 2010, Vancouver, BC, Canada. Paper: “Animal Disease Event Recognition and Classification”, First International Workshop on Web Science and Information Exchange in the Medical Web (MedEx'10), WWW Conference, 26-30 April 2010, Raleigh, NC, USA. Paper (to appear): “Boosting Biomedical Entity Extraction by Using Syntactic Patterns for Semantic Relation Discovery”, 2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI'10), August 31 - September 3, York University, Toronto, Canada. Poster: “Named Entity Recognition and Tagging in the Domain of Epizootics”, Women in Machine Learning Workshop (WiML'09), 6-7 Dec 2009, Vancouver, Canada. ACM Poster Presentation Competition: “Automated Event Extraction and Named Entity Recognition in the Domain of Veterinary Medicine”, 2010 Grace Hopper Celebration of Women in Computing (GHC'10), September 28 - October 1, Atlanta, Georgia, USA.
  • 57. Future Work. Domain-specific Entity Extraction: multilingual ontology construction using Wikipedia. Automated Ontology Construction: generalize to other named entities. Event Recognition and Classification: deeper syntactic analysis; co-reference resolution.
  • 58. Acknowledgments. Faculty: Dr. William H. Hsu, Dr. Doina Caragea, Dr. Gurdip Singh. KDD Lab alumni: Tim Weninger (crawler deployment) and Jing Xia (rule-based event extraction). KDD Lab assistants: Information Extraction Team: John Drouhard, Landon Fowles, Swathi Bujuru; Spatial Data Mining Team: Wesam Elshamy, Andrew Berggren; Topic Detection & Tracking Team: Danny Jones, Srinivas Reddy. Fulbright Program, supported by the US Department of State's Bureau of Educational and Cultural Affairs.
  • 59. Thank you! Svitlana Volkova, svitlana.volkova@gmail.com http://people.cis.ksu.edu/~svitlana