mlas06_nigam_tie_01.ppt

Machine Learning for Information Extraction: An Overview Kamal Nigam Google Pittsburgh With input, slides and suggestions from William Cohen, Andrew McCallum and Ion Muslea

Example: A Problem Genomics job Mt. Baker, the school district Baker Hostetler , the company Baker, a job opening

Job Openings: Category = Food Services Keyword = Baker Location = Continental U.S.

Extracting Job Openings from the Web Title: Ice Cream Guru Description: If you dream of cold creamy… Contact: [email_address] Category: Travel/Hospitality Function: Food Services

Potential Enabler of Faceted Search

Lots of Structured Information in Text

What is Information Extraction? ,[object Object]

What is Information Extraction? ,[object Object],[object Object]

What is Information Extraction? ,[object Object],[object Object],[object Object]

What is Information Extraction? ,[object Object],[object Object],[object Object],[object Object]

What is Information Extraction? ,[object Object],[object Object],[object Object],[object Object],[object Object]

IE Posed as a Machine Learning Task ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun prefix contents suffix … …

Good Features for Information Extraction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],begins-with-number begins-with-ordinal begins-with-punctuation begins-with-question-word begins-with-subject blank contains-alphanum contains-bracketed-number contains-http contains-non-space contains-number contains-pipe contains-question-mark contains-question-word ends-with-question-mark first-alpha-is-capitalized indented indented-1-to-4 indented-5-to-10 more-than-one-third-space only-punctuation prev-is-blank prev-begins-with-ordinal shorter-than-30 Creativity and Domain Knowledge Required!

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Good Features for Information Extraction Is Capitalized Is Mixed Caps Is All Caps Initial Cap Contains Digit All lowercase Is Initial Punctuation Period Comma Apostrophe Dash Preceded by HTML tag Character n-gram classifier says string is a person name (80% accurate) In stopword list (the, of, their, etc) In honorific list (Mr, Mrs, Dr, Sen, etc) In person suffix list (Jr, Sr, PhD, etc) In name particle list (de, la, van, der, etc) In Census lastname list; segmented by P(name) In Census firstname list; segmented by P(name) In locations lists (states, cities, countries) In company name list (“J. C. Penny”) In list of company suffixes (Inc, & Associates, Foundation) Creativity and Domain Knowledge Required!

IE History ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Landscape of ML Techniques for IE: Any of these models can be used to capture words, formatting or both. Classify Candidates Abraham Lincoln was born in Kentucky . Classifier which class? Sliding Window Abraham Lincoln was born in Kentucky. Classifier which class? Try alternate window sizes: Boundary Models Abraham Lincoln was born in Kentucky. Classifier which class? BEGIN END BEGIN END BEGIN Finite State Machines Abraham Lincoln was born in Kentucky. Most likely state sequence? Wrapper Induction Abraham Lincoln was born in Kentucky. Learn and apply pattern for a website PersonName

Sliding Windows & Boundary Detection

Information Extraction by Sliding Windows GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement

Information Extraction by Sliding Window GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement

Information Extraction with Sliding Windows [Freitag 97, 98; Soderland 97; Califf 98] 00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun w t-m w t-1 w t w t+n w t+n+1 w t+n+m prefix contents suffix … … ,[object Object],[object Object],[object Object],[object Object],[object Object],courseNumber(X) :- tokenLength(X,=,2), every(X, inTitle, false), some(X, A, <previousToken>, inTitle, true), some(X, B, <>. tripleton, true)

Rule-learning approaches to sliding-window classification: Summary ,[object Object],[object Object],[object Object]

IE by Boundary Detection GRAND CHALLENGES FOR MACHINE LEARNING Jaime Carbonell School of Computer Science Carnegie Mellon University 3:30 pm 7500 Wean Hall Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on. CMU UseNet Seminar Announcement

BWI: Learning to detect boundaries ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[Freitag & Kushmerick, AAAI 2000]

BWI: Learning to detect boundaries ,[object Object],[object Object],[object Object],[object Object]

BWI: Learning to detect boundaries Field F1 Person Name: 30% Location: 61% Start Time: 98%

Problems with Sliding Windows and Boundary Finders ,[object Object],[object Object],[object Object],[object Object]

Hidden Markov Models S t - 1 S t O t S t+1 O t +1 O t - 1 ... ... Finite state model Graphical model Parameters: for all states S={s 1 ,s 2 ,…} Start state probabilities: P(s t ) Transition probabilities: P(s t |s t-1 ) Observation (emission) probabilities: P(o t |s t ) Training: Maximize probability of training observations (w/ prior) HMMs are the standard sequence modeling tool in genomics, music, speech, NLP, … ... transitions observations o 1 o 2 o 3 o 4 o 5 o 6 o 7 o 8 Generates: State sequence Observation sequence Usually a multinomial over atomic, fixed alphabet

IE with Hidden Markov Models Yesterday Lawrence Saul spoke this example sentence. Yesterday Lawrence Saul spoke this example sentence. Person name: Lawrence Saul Given a sequence of observations: and a trained HMM: Find the most likely state sequence: (Viterbi) Any words said to be generated by the designated “person name” state extract as a person name:

Generative Extraction with HMMs ,[object Object],[object Object],[McCallum, Nigam, Seymore & Rennie ‘00]

HMM Example: “Nymble” Other examples of HMMs in IE: [Leek ’97; Freitag & McCallum ’99; Seymore et al. 99] Task: Named Entity Extraction Train on 450k words of news wire text. Case Language F1 . Mixed English 93% Upper English 91% Mixed Spanish 90% [Bikel, et al 97] Person Org Other (Five other name classes) start-of-sentence end-of-sentence Transition probabilities Observation probabilities P(s t | s t-1 , o t-1 ) P(o t | s t , s t-1 ) Back-off to: Back-off to: P(s t | s t-1 ) P(s t ) P(o t | s t , o t-1 ) P(o t | s t ) P(o t ) or Results:

Regrets from Atomic View of Tokens Would like richer representation of text: multiple overlapping features, whole chunks of text. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Problems with Richer Representation and a Generative Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Conditional Sequence Models ,[object Object],[object Object],[object Object],[object Object],[object Object]

Conditional Markov Models S t - 1 S t O t S t+1 O t +1 O t - 1 ... ... Generative (traditional HMM) ... transitions observations S t - 1 S t O t S t+1 O t +1 O t - 1 ... ... Conditional ... transitions observations Standard belief propagation: forward-backward procedure. Viterbi and Baum-Welch follow naturally. Maximum Entropy Markov Models [McCallum, Freitag & Pereira, 2000] MaxEnt POS Tagger [Ratnaparkhi, 1996] SNoW-based Markov Model [Punyakanok & Roth, 2000]

Exponential Form for “Next State” Function Capture dependency on s t-1 with |S| independent functions, P s t-1 (s t |o t ). Each state contains a “next-state classifier” that, given the next observation, produces a probability of the next state, P s t-1 (s t |o t ). s t-1 s t Recipe: - Labeled data is assigned to transitions. - Train each state’s exponential model by maximum entropy weight feature

From HMMs to MEMMs to CRFs HMM MEMM CRF S t-1 S t O t S t+1 O t+1 O t-1 S t-1 S t O t S t+1 O t+1 O t-1 S t-1 S t O t S t+1 O t+1 O t-1 ... ... ... ... ... ... (A special case of MEMMs and CRFs.) Conditional Random Fields (CRFs) [Lafferty, McCallum, Pereira ‘2001]

Conditional Random Fields (CRFs) S t S t+1 S t+2 O = O t , O t+1 , O t+2 , O t+3 , O t+4 S t+3 S t+4 Markov on s , conditional dependency on o. Hammersley-Clifford-Besag theorem stipulates that the CRF has this form—an exponential function of the cliques in the graph. Assuming that the dependency structure of the states is tree-shaped (linear chain is a trivial tree), inference can be done by dynamic programming in time O(|o| |S| 2 )—just like HMMs.

Training CRFs ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[Sha & Pereira 2002] & [Malouf 2002]

Sample IE Applications of CRFs ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Examples of Recent CRF Research ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Further Reading about CRFs ,[object Object],[object Object]

mlas06_nigam_tie_01.ppt

Recomendados

Recomendados

Más contenido relacionado

Destacado

Destacado (7)

Similar a mlas06_nigam_tie_01.ppt

Similar a mlas06_nigam_tie_01.ppt (20)

Más de butest

Más de butest (20)

mlas06_nigam_tie_01.ppt

Notas del editor