SlideShare una empresa de Scribd logo
1 de 17
Jarrar © 2014 1
Dr. Mustafa Jarrar
Sina Institute, Birzeit University
mjarrar@birzeit.edu
www.jarrar.info
Lecture Notes on Natural Language Processing,
Birzeit University, Palestine
2014
Artificial Intelligence
Introduction to
Natural Language Processing
Jarrar © 2014 2
Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2011/11/artificial-intelligence-fall-2011.html
Jarrar © 2014 3
Outline
 NLP Applications
 NLP and Intelligence
 Linguistics Levels of ambiguity
 Language Models
Keywords: Natural Language Processing ,NLP, NLP Applications, NLP and Intelligence,
Linguistics Levels of ambiguity, Language Models, Part of Speech Tagging,
‫الحاسوبية‬ ‫اللسانيات‬ ,‫اآللي‬ ‫اللغوي‬ ‫التحليل‬ ،‫اللغوي‬ ‫لغوية,الغموض‬ ‫تطبيقات‬ , ‫الطبيعية‬ ‫للغات‬ ‫اآللية‬ ‫المعالجة‬
Jarrar © 2014 4
Motivation
• Google, Microsoft, Yahoo,
• Job Seeking
• Google translate Systran powers Babelfish
• Myspace, Facebook, Blogspot
• Tools for “business intelligence”
• …..
 Most ideas stem from Academia, but big guys have (several)
strong NLP research labs (like Microsoft, Yahoo, AT&T, IBM, etc.)
Which NLP applications do you use every day?
(how much money these companies are making?)
Jarrar © 2014 5
Why Natural Language Processing?
• Huge amounts of data on the Internet, Intranets, desktops,
• We need applications for processing (understanding, retrieving,
translating, summarizing, …) this large amounts of texts.
• Modern applications contain many NLP components. Imagine your
address book without good NLP to smartly search your contacts!!!
Jarrar © 2014 6
NLP Applications
 Classifiers: classify a set of document into categories, (as spam filters)
 Information Retrieval: find relevant documents to a given query.
 Information Extraction: Extract useful information from resumes; discover
names of people and events they participate in, from a document.
 Machine Translation: translate text from one human language into another
 Question Answering: find answers to natural language questions in a text
collection or database…
 Summarization: Produce a readable summary, e.g., news about oil today.
 Sentiment Analysis, identify people opinion on a subjective.
 Speech Processing: book a hotel over the phone, TTS (for the blind)
 OCR: both print and handwritten.
 Spelling checkers, grammar checkers, auto-filling, ….. and more
Jarrar © 2014 7
Natural Language? and Intelligence?
• Artificial languages, like C# and Java
• Automatic processing of computer languages is easy! why?
• Natural Language, that people speak, like English, Arabic, …
• Automatic processing (analyzing, understanding, generating,…) of
natural languages is very difficult! why?
• Intelligence: Natural? and Artificial (AI).
• Computers are called intelligent if thy are able to process (analyze,
understand, learn,…) natural languages as humans do.
• Modern NLP algorithms are based on machine learning, especially
statistical machine learning.
Jarrar © 2014 8
NLP Current Motives
• Historically: peaks and valleys. Now is a peak, 20 years ago may
have been a valley.
• Security agencies are typically interested in NLP.
• Most big companies nowadays are interested in NLP
• The internet and mobile devices are important driving forces in NLP
research.
Jarrar © 2014 9
Computers Lack Knowledge!
kJfmmfj mmmvvv nnnffn333
Uj iheale eleee mnster vensi credur
Baboi oi cestnitze
Coovoel2^ ekk; ldsllk lkdf vnnjfj?
Fgmflmllk mlfm kfre xnnn!
• People have no trouble understanding language
 Common sense knowledge
 Reasoning capacity
 Experience
• Computers have
 No common sense knowledge
 No reasoning capacity
This is how computers “see” text in English.
Based on [1]
Jarrar © 2014 10
Linguistics Levels of Ambiguity/Analysis
Speech
Written language
– Phonology: sounds / letters / pronunciation
(two, too. ‫صائد‬ ،‫سائد‬)
– Morphology: the structure of words
(child – children, book - books; ‫كتاب‬-‫طفل‬ ،‫كتب‬-‫أكل‬ ،‫أطفال‬-‫يأكل‬ )
– Syntax: grammar, how these sequences are structured
I saw the man with the telescope ‫رأيته‬‫بالنظارة‬
– Semantics: meaning of the strings
(table as data structure, table as furniture. ‫جدول‬-‫جدول‬ ،‫مصفوفة‬-‫نهر‬ )
 Dealing with all of these levels of ambiguity make NLP difficult
Based on [1]
Jarrar © 2014 11
Issues in Syntax
Syntax does not deal with the meaning of a sentence, but it may help?!
“the dog ate my homework”
Who ate? dog
The important thing when we analyze a syntax is to identify the part of
speech (POS): Dog = noun ; ate = verb ; homework = noun
There are programs that do this automatically, called: Part of Speech
Taggers. (also called grammatical tagging)
Accuracy of English POS tagging: 95%.
Identify collocations
mother in law, hot dog
Compositional versus non-compositional collocates
Based on [1]
Jarrar © 2014 12
Issues in Syntax (Part of Speech Tagging)
Pars tree: John loves Mary
Helps a computer to automatically answer questions like -Who did what
and when?
Based on [1]
Assume input sentence S in natural language L. Assume you have rules
(grammar G) that describe syntactic regularities (patterns or structures). Given S
& G, find syntactic structure of S. Such a structure is called a Parse Tree
John loves Mary
Name(John)
NP(John)
Verb(x,y Loves(x,y)))
S(Loves(John, Mary)
Name(Mary)
NP(Mary)
VP(x Loves (x,Mary)
Jarrar © 2014 13
Issues in Syntax
Shallow Parsing:
An analysis of a sentence which identifies the constituents
(noun groups,verbs, verb groups, etc.), but does not specify their
internal structure, nor their role in the main sentence.
Example:
“John Loves Mary”
“John” “Loves Mary”
subject predicate
Identify basic structures as:
NP-[John] VP-[Loves Mary]
Based on [1]
Jarrar © 2014 14
More Issues in Syntax
Anaphora Resolution: resolving what a pronoun, or a noun phrase
refers to. “The dog entered my room. It scared me”
Preposition Attachment
I saw the man in the park with a telescope
‫رأيت‬‫با‬ ‫الجالس‬ ‫الرجل‬‫لنظارة‬
The son asked the father to drive him home
‫طلبت‬‫األم‬‫من‬‫البنت‬‫تصفيف‬‫شعرها‬
Based on [1]
Jarrar © 2014 15
Issues in Semantics
How to understand the meaning, specially that words are ambiguous and
polysemous (may have multiple meanings)
Buy this table? serve that table? sort the table?
‫الطاولة‬ ‫هذه‬ ‫رأيت‬ ‫هل‬.‫الطاولة‬ ‫هذه‬ ‫خدمت‬ ‫هل‬.
How to learn the meaning of words?
- From available dictionaries? WordNet?
- Applying statistical methods on annotated examples?
How to learn the meaning (word-sense disambiguation)?
Assume a (large) amount of annotated data = training
Assume a new text not annotated = test
Learn from previous experience (training) to classify new data (test)
Decision trees, memory based learning, neural networks
Jarrar © 2014 16
Language Models
Three approaches to Natural Language Processing (Language Models)
– Rule-based: using a predefined set of rules (knowledge)
– Statistical: using probabilities of what normally people write or say
– Hybrid models combine the two
Jarrar © 2014 17
Acknowledgement
Some of the slides in this lecture are based on the following resources ,
but with many additions and revision:
[1] Rada Mihalcea: Natural Language Processing, 2008
www.cs.odu.edu/~mukka/cs480f09/Lecturenotes/.../Intro1.ppt
[2] Markus Dickinson: Introduction to Natural Language Processing (NLP),
Linguistics 362 course, 2006
http://www9.georgetown.edu/faculty/mad87/06/362/syllabus.html

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Natural Language Processing for Games Research
Natural Language Processing for Games ResearchNatural Language Processing for Games Research
Natural Language Processing for Games Research
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.net
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 
Nlp
NlpNlp
Nlp
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vec
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing Crash Course
Natural Language Processing Crash CourseNatural Language Processing Crash Course
Natural Language Processing Crash Course
 

Destacado

Habash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingHabash: Arabic Natural Language Processing
Habash: Arabic Natural Language Processing
Mustafa Jarrar
 
Jarrar: Introduction to logic and Logic Agents
Jarrar: Introduction to logic and Logic Agents Jarrar: Introduction to logic and Logic Agents
Jarrar: Introduction to logic and Logic Agents
Mustafa Jarrar
 
[GL] 7 العلاقة بين اللغة و الفكر
[GL] 7  العلاقة بين اللغة و الفكر[GL] 7  العلاقة بين اللغة و الفكر
[GL] 7 العلاقة بين اللغة و الفكر
TimAisyah
 
[GL] 2 المدارس اللسانية المعاصرة + تطور الدراسات اللغوية
[GL] 2 المدارس اللسانية المعاصرة + تطور الدراسات اللغوية[GL] 2 المدارس اللسانية المعاصرة + تطور الدراسات اللغوية
[GL] 2 المدارس اللسانية المعاصرة + تطور الدراسات اللغوية
TimAisyah
 

Destacado (20)

Business Process Implementation
Business Process ImplementationBusiness Process Implementation
Business Process Implementation
 
Jarrar: Description Logic
Jarrar: Description LogicJarrar: Description Logic
Jarrar: Description Logic
 
Jarrar: Introduction to Information Retrieval
Jarrar: Introduction to Information RetrievalJarrar: Introduction to Information Retrieval
Jarrar: Introduction to Information Retrieval
 
Jarrar: Probabilistic Language Modeling - Introduction to N-grams
Jarrar: Probabilistic Language Modeling - Introduction to N-gramsJarrar: Probabilistic Language Modeling - Introduction to N-grams
Jarrar: Probabilistic Language Modeling - Introduction to N-grams
 
Habash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingHabash: Arabic Natural Language Processing
Habash: Arabic Natural Language Processing
 
Jarrar: Introduction to logic and Logic Agents
Jarrar: Introduction to logic and Logic Agents Jarrar: Introduction to logic and Logic Agents
Jarrar: Introduction to logic and Logic Agents
 
Jarrar: WordNet And Global WordNets
Jarrar: WordNet And Global WordNetsJarrar: WordNet And Global WordNets
Jarrar: WordNet And Global WordNets
 
Jarrar: Games
Jarrar: GamesJarrar: Games
Jarrar: Games
 
BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs  BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs
 
Business Process Design and Re-engineering
Business Process Design and Re-engineeringBusiness Process Design and Re-engineering
Business Process Design and Re-engineering
 
BPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical ConstructsBPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical Constructs
 
Introduction to Artificial Intelligence
Introduction to Artificial Intelligence Introduction to Artificial Intelligence
Introduction to Artificial Intelligence
 
Introduction to Business Process Management
Introduction to Business Process ManagementIntroduction to Business Process Management
Introduction to Business Process Management
 
أصل حرف الدال إلى ما يناسب
أصل حرف الدال إلى ما يناسبأصل حرف الدال إلى ما يناسب
أصل حرف الدال إلى ما يناسب
 
الممارسات الفضلى في تعليم اللغة العربية
الممارسات الفضلى في تعليم اللغة العربيةالممارسات الفضلى في تعليم اللغة العربية
الممارسات الفضلى في تعليم اللغة العربية
 
ICALL 2015
ICALL 2015ICALL 2015
ICALL 2015
 
Jarrar: First Order Logic
Jarrar: First Order LogicJarrar: First Order Logic
Jarrar: First Order Logic
 
[GL] 7 العلاقة بين اللغة و الفكر
[GL] 7  العلاقة بين اللغة و الفكر[GL] 7  العلاقة بين اللغة و الفكر
[GL] 7 العلاقة بين اللغة و الفكر
 
[GL] 2 المدارس اللسانية المعاصرة + تطور الدراسات اللغوية
[GL] 2 المدارس اللسانية المعاصرة + تطور الدراسات اللغوية[GL] 2 المدارس اللسانية المعاصرة + تطور الدراسات اللغوية
[GL] 2 المدارس اللسانية المعاصرة + تطور الدراسات اللغوية
 
Jarrar: Un-informed Search
Jarrar: Un-informed SearchJarrar: Un-informed Search
Jarrar: Un-informed Search
 

Similar a Jarrar: Introduction to Natural Language Processing

Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
SHIBDASDUTTA
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
Kuppusamy P
 
Intro 2 document
Intro 2 documentIntro 2 document
Intro 2 document
Uma Kant
 

Similar a Jarrar: Introduction to Natural Language Processing (20)

Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2
 
Natural Language Processing from Object Automation
Natural Language Processing from Object Automation Natural Language Processing from Object Automation
Natural Language Processing from Object Automation
 
Intro
IntroIntro
Intro
 
Intro
IntroIntro
Intro
 
L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
 
NLP todo
NLP todoNLP todo
NLP todo
 
Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...
 
5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf  5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Nlp (1)
Nlp (1)Nlp (1)
Nlp (1)
 
AI_08_NLP.pptx
AI_08_NLP.pptxAI_08_NLP.pptx
AI_08_NLP.pptx
 
5810 day 3 sept 20 2014
5810 day 3 sept 20 2014 5810 day 3 sept 20 2014
5810 day 3 sept 20 2014
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
Teaching Vocabulary Workshop
Teaching Vocabulary WorkshopTeaching Vocabulary Workshop
Teaching Vocabulary Workshop
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Intro 2 document
Intro 2 documentIntro 2 document
Intro 2 document
 
5810 day 3 sept 20 2014
5810 day 3 sept 20 2014 5810 day 3 sept 20 2014
5810 day 3 sept 20 2014
 
Langoslide.pptx
Langoslide.pptxLangoslide.pptx
Langoslide.pptx
 

Más de Mustafa Jarrar

Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
Mustafa Jarrar
 
Jarrar: Stepwise Methodologies for Developing Ontologies
Jarrar: Stepwise Methodologies for Developing OntologiesJarrar: Stepwise Methodologies for Developing Ontologies
Jarrar: Stepwise Methodologies for Developing Ontologies
Mustafa Jarrar
 

Más de Mustafa Jarrar (18)

Clustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment AnalysisClustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment Analysis
 
Classifying Processes and Basic Formal Ontology
Classifying Processes  and Basic Formal OntologyClassifying Processes  and Basic Formal Ontology
Classifying Processes and Basic Formal Ontology
 
Discrete Mathematics Course Outline
Discrete Mathematics Course OutlineDiscrete Mathematics Course Outline
Discrete Mathematics Course Outline
 
Customer Complaint Ontology
Customer Complaint Ontology Customer Complaint Ontology
Customer Complaint Ontology
 
Subset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion RulesSubset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion Rules
 
Schema Modularization in ORM
Schema Modularization in ORMSchema Modularization in ORM
Schema Modularization in ORM
 
On Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in PalestineOn Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in Palestine
 
Lessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online CoursesLessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online Courses
 
Presentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-finalPresentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-final
 
Jarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 CallsJarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 Calls
 
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 ProposalsRiestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
 
Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
 
Jarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology EngineeringJarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology Engineering
 
Jarrar: Stepwise Methodologies for Developing Ontologies
Jarrar: Stepwise Methodologies for Developing OntologiesJarrar: Stepwise Methodologies for Developing Ontologies
Jarrar: Stepwise Methodologies for Developing Ontologies
 
Jarrar: Ontology Modeling using OntoClean Methodology
Jarrar: Ontology Modeling using OntoClean MethodologyJarrar: Ontology Modeling using OntoClean Methodology
Jarrar: Ontology Modeling using OntoClean Methodology
 
Jarrar: Informed Search
Jarrar: Informed Search  Jarrar: Informed Search
Jarrar: Informed Search
 
Jarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageJarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query Language
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Jarrar: Introduction to Natural Language Processing

  • 1. Jarrar © 2014 1 Dr. Mustafa Jarrar Sina Institute, Birzeit University mjarrar@birzeit.edu www.jarrar.info Lecture Notes on Natural Language Processing, Birzeit University, Palestine 2014 Artificial Intelligence Introduction to Natural Language Processing
  • 2. Jarrar © 2014 2 Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2011/11/artificial-intelligence-fall-2011.html
  • 3. Jarrar © 2014 3 Outline  NLP Applications  NLP and Intelligence  Linguistics Levels of ambiguity  Language Models Keywords: Natural Language Processing ,NLP, NLP Applications, NLP and Intelligence, Linguistics Levels of ambiguity, Language Models, Part of Speech Tagging, ‫الحاسوبية‬ ‫اللسانيات‬ ,‫اآللي‬ ‫اللغوي‬ ‫التحليل‬ ،‫اللغوي‬ ‫لغوية,الغموض‬ ‫تطبيقات‬ , ‫الطبيعية‬ ‫للغات‬ ‫اآللية‬ ‫المعالجة‬
  • 4. Jarrar © 2014 4 Motivation • Google, Microsoft, Yahoo, • Job Seeking • Google translate Systran powers Babelfish • Myspace, Facebook, Blogspot • Tools for “business intelligence” • …..  Most ideas stem from Academia, but big guys have (several) strong NLP research labs (like Microsoft, Yahoo, AT&T, IBM, etc.) Which NLP applications do you use every day? (how much money these companies are making?)
  • 5. Jarrar © 2014 5 Why Natural Language Processing? • Huge amounts of data on the Internet, Intranets, desktops, • We need applications for processing (understanding, retrieving, translating, summarizing, …) this large amounts of texts. • Modern applications contain many NLP components. Imagine your address book without good NLP to smartly search your contacts!!!
  • 6. Jarrar © 2014 6 NLP Applications  Classifiers: classify a set of document into categories, (as spam filters)  Information Retrieval: find relevant documents to a given query.  Information Extraction: Extract useful information from resumes; discover names of people and events they participate in, from a document.  Machine Translation: translate text from one human language into another  Question Answering: find answers to natural language questions in a text collection or database…  Summarization: Produce a readable summary, e.g., news about oil today.  Sentiment Analysis, identify people opinion on a subjective.  Speech Processing: book a hotel over the phone, TTS (for the blind)  OCR: both print and handwritten.  Spelling checkers, grammar checkers, auto-filling, ….. and more
  • 7. Jarrar © 2014 7 Natural Language? and Intelligence? • Artificial languages, like C# and Java • Automatic processing of computer languages is easy! why? • Natural Language, that people speak, like English, Arabic, … • Automatic processing (analyzing, understanding, generating,…) of natural languages is very difficult! why? • Intelligence: Natural? and Artificial (AI). • Computers are called intelligent if thy are able to process (analyze, understand, learn,…) natural languages as humans do. • Modern NLP algorithms are based on machine learning, especially statistical machine learning.
  • 8. Jarrar © 2014 8 NLP Current Motives • Historically: peaks and valleys. Now is a peak, 20 years ago may have been a valley. • Security agencies are typically interested in NLP. • Most big companies nowadays are interested in NLP • The internet and mobile devices are important driving forces in NLP research.
  • 9. Jarrar © 2014 9 Computers Lack Knowledge! kJfmmfj mmmvvv nnnffn333 Uj iheale eleee mnster vensi credur Baboi oi cestnitze Coovoel2^ ekk; ldsllk lkdf vnnjfj? Fgmflmllk mlfm kfre xnnn! • People have no trouble understanding language  Common sense knowledge  Reasoning capacity  Experience • Computers have  No common sense knowledge  No reasoning capacity This is how computers “see” text in English. Based on [1]
  • 10. Jarrar © 2014 10 Linguistics Levels of Ambiguity/Analysis Speech Written language – Phonology: sounds / letters / pronunciation (two, too. ‫صائد‬ ،‫سائد‬) – Morphology: the structure of words (child – children, book - books; ‫كتاب‬-‫طفل‬ ،‫كتب‬-‫أكل‬ ،‫أطفال‬-‫يأكل‬ ) – Syntax: grammar, how these sequences are structured I saw the man with the telescope ‫رأيته‬‫بالنظارة‬ – Semantics: meaning of the strings (table as data structure, table as furniture. ‫جدول‬-‫جدول‬ ،‫مصفوفة‬-‫نهر‬ )  Dealing with all of these levels of ambiguity make NLP difficult Based on [1]
  • 11. Jarrar © 2014 11 Issues in Syntax Syntax does not deal with the meaning of a sentence, but it may help?! “the dog ate my homework” Who ate? dog The important thing when we analyze a syntax is to identify the part of speech (POS): Dog = noun ; ate = verb ; homework = noun There are programs that do this automatically, called: Part of Speech Taggers. (also called grammatical tagging) Accuracy of English POS tagging: 95%. Identify collocations mother in law, hot dog Compositional versus non-compositional collocates Based on [1]
  • 12. Jarrar © 2014 12 Issues in Syntax (Part of Speech Tagging) Pars tree: John loves Mary Helps a computer to automatically answer questions like -Who did what and when? Based on [1] Assume input sentence S in natural language L. Assume you have rules (grammar G) that describe syntactic regularities (patterns or structures). Given S & G, find syntactic structure of S. Such a structure is called a Parse Tree John loves Mary Name(John) NP(John) Verb(x,y Loves(x,y))) S(Loves(John, Mary) Name(Mary) NP(Mary) VP(x Loves (x,Mary)
  • 13. Jarrar © 2014 13 Issues in Syntax Shallow Parsing: An analysis of a sentence which identifies the constituents (noun groups,verbs, verb groups, etc.), but does not specify their internal structure, nor their role in the main sentence. Example: “John Loves Mary” “John” “Loves Mary” subject predicate Identify basic structures as: NP-[John] VP-[Loves Mary] Based on [1]
  • 14. Jarrar © 2014 14 More Issues in Syntax Anaphora Resolution: resolving what a pronoun, or a noun phrase refers to. “The dog entered my room. It scared me” Preposition Attachment I saw the man in the park with a telescope ‫رأيت‬‫با‬ ‫الجالس‬ ‫الرجل‬‫لنظارة‬ The son asked the father to drive him home ‫طلبت‬‫األم‬‫من‬‫البنت‬‫تصفيف‬‫شعرها‬ Based on [1]
  • 15. Jarrar © 2014 15 Issues in Semantics How to understand the meaning, specially that words are ambiguous and polysemous (may have multiple meanings) Buy this table? serve that table? sort the table? ‫الطاولة‬ ‫هذه‬ ‫رأيت‬ ‫هل‬.‫الطاولة‬ ‫هذه‬ ‫خدمت‬ ‫هل‬. How to learn the meaning of words? - From available dictionaries? WordNet? - Applying statistical methods on annotated examples? How to learn the meaning (word-sense disambiguation)? Assume a (large) amount of annotated data = training Assume a new text not annotated = test Learn from previous experience (training) to classify new data (test) Decision trees, memory based learning, neural networks
  • 16. Jarrar © 2014 16 Language Models Three approaches to Natural Language Processing (Language Models) – Rule-based: using a predefined set of rules (knowledge) – Statistical: using probabilities of what normally people write or say – Hybrid models combine the two
  • 17. Jarrar © 2014 17 Acknowledgement Some of the slides in this lecture are based on the following resources , but with many additions and revision: [1] Rada Mihalcea: Natural Language Processing, 2008 www.cs.odu.edu/~mukka/cs480f09/Lecturenotes/.../Intro1.ppt [2] Markus Dickinson: Introduction to Natural Language Processing (NLP), Linguistics 362 course, 2006 http://www9.georgetown.edu/faculty/mad87/06/362/syllabus.html