The main thesis here is this: (i) the data-driven approach to NLU is utterly fallacious; (ii) logical semantics has been seriously misguided; and (iii) logical semantics can be rectified, and here we suggest how this can be done and how to move forward.
3. Language is not Learnable: PART I
• natural language is an infinite object
• infinite objects are recursively defined
• recursive definitions are rules
• rules are not learnable from examples
∴ natural language is not learnable
6. Chomsky’s Infinity
How many sentences are people ready to understand (once they have attained linguistic competency)?
“I’m sorry but I don’t have your last sentence in my dictionary.”
8. We have the capacity to express (and interpret) an infinite number of thoughts, yet we can only ever be exposed to a tiny fraction of examples, which, in the end, are statistically insignificant.
9. NOAM CHOMSKY
“the notion of the probability of a sentence is an entirely useless one, under any known interpretation of this term.”
12. Recursion is the tool by which we can have a finite representation of infinite objects. But recursive definitions are rules; rules are not learnable from individual experiences; and thus infinite objects cannot be learned from observation (experience).
13. “I reject the contention that an important theoretical difference exists between formal and natural languages.”
RICHARD MONTAGUE
14. IMMANUEL KANT
“Everything in nature, in the inanimate as well as in the animate world, happens according to some rules, though we do not always know them.”
15. Language is not Learnable: PART II
The challenge in language understanding is related to uncovering all the missing text: text that is never explicitly stated, but is often implicitly assumed as shared background knowledge.
16. The Missing Text Phenomenon (MTP)
quantifier scope:
BBC has a reporter in every country ⇒ BBC has a [different] reporter in every country
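As a sketch, the two readings can be written out with typed quantifiers, in the notation the deck introduces later; the predicate name "has" is a placeholder of mine:

```latex
% surface reading: a single reporter covering all countries
(\exists r :: \mathit{reporter})(\forall c :: \mathit{country})\,\mathit{has}(\mathit{BBC}, r, c)
% enriched reading, with the missing text recovered: a (different) reporter per country
(\forall c :: \mathit{country})(\exists r :: \mathit{reporter})\,\mathit{has}(\mathit{BBC}, r, c)
```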
17. The Missing Text Phenomenon (MTP)
prepositional phrase attachments:
John had pizza with his kids ⇒ John had pizza along/together with his kids
John had pizza with pineapple ⇒ John had pizza with a pineapple topping
18. The Missing Text Phenomenon (MTP)
metonymy:
The corner table wants another beer ⇒ The person sitting at the corner table wants another beer
19. The Missing Text Phenomenon (MTP)
metaphor:
Don’t worry about Simon, he’s a rock ⇒ Don’t worry about Simon, he’s solid like a rock
23. The Missing Text Phenomenon (MTP)
HECTOR LEVESQUE
“You need to have background knowledge that is not expressed in the words of the sentence to be able to sort out what is going on … and it is precisely bringing this background knowledge to bear that we informally call thinking.”
25. Unlike formal languages (e.g., Java), in ordinary spoken languages (e.g., English, Spanish, etc.) we leave out implicitly assumed information by relying on our “common” background knowledge. Ordinary spoken language is thus highly (in fact, optimally) compressed.
26. 4 technical reasons why languages are not learnable
MISSING TEXT PHENOMENON (MTP)
1. NL is highly (in fact, optimally) compressed: the implicitly assumed shared knowledge is left out (the MTP)
2. learnability entails compressibility: learning requires finding redundancies in the data
3. NL is not compressible (it is already highly compressed) (from 1)
4. NL does not have redundancies and thus is not learnable (from 2 & 3)
QED
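A minimal sketch of the compressibility premise, using Python's zlib: redundant text compresses substantially, while an already-compressed string barely shrinks; this is the sense in which an optimally compressed language leaves a learner no redundancies to exploit:

```python
import zlib

# Redundant text compresses well: compression works by removing redundancies.
redundant = b"the cat sat on the mat. " * 100
once = zlib.compress(redundant)
print(len(redundant), "->", len(once))   # 2400 -> a few dozen bytes

# An already-compressed string has (almost) no redundancies left to remove.
twice = zlib.compress(once)
print(len(once), "->", len(twice))       # roughly the same size
```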
27. 4 technical reasons why languages are not learnable
FUNCTION WORDS
quantifiers: every, some, all, most
modals: must, could, should, can
prepositions: with, on, for, to, at
connectives: not, and, or, if
relative pronouns: that, which
In ML/data-driven approaches, function words are treated as stopwords and are typically ignored, since their probabilities are roughly equal in all contexts (they are statistically insignificant) and keeping them would disrupt the entire statistical model.
28. 4 technical reasons why languages are not learnable
FUNCTION WORDS
But ignoring function words is problematic: these words are what, in the end, determines (‘glues together’) the final meaning. Thus ML/data-driven models, while they can approximate text similarity, cannot account for true meanings.
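A minimal sketch of the problem, assuming a typical stopword list: once function words are dropped, sentences with very different meanings become indistinguishable bags of content words:

```python
# Hypothetical stopword list containing typical function words.
STOPWORDS = {"every", "some", "a", "the", "not", "is", "was", "if", "and", "or"}

def content_words(sentence: str) -> list[str]:
    """Bag of content words left after stopword removal."""
    return sorted(w for w in sentence.lower().split() if w not in STOPWORDS)

# Opposite quantifier scopes collapse to the same representation...
assert content_words("every student passed some exam") == \
       content_words("some student passed every exam")

# ...and so do a sentence and its negation.
assert content_words("the movie was good") == \
       content_words("the movie was not good")
```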
29. 4 technical reasons why languages are not learnable
STATISTICAL INSIGNIFICANCE
Besides function words, statistical insignificance can occur in situations where the distinguishing information is not even in the data. Antonyms/opposites (e.g., big/small, writing/reading) are known to occur in similar contexts with equal probabilities. Consider the classic Winograd pair in (1):
(1) The trophy did not fit in the suitcase because it was too big/small.
Here statistical analysis is useless, since the only difference in the preferred referent of ‘it’ is a function of the antonyms.
30. 4 technical reasons why languages are not learnable
STATISTICAL INSIGNIFICANCE
Clearly, it is neither psychologically nor computationally plausible that we need to see 40,000,000 examples just to learn how to resolve a reference such as ‘it’ in (1). (It would take a child a lifetime to learn how to resolve such references!)
31. 4 technical reasons why languages are not learnable
ACCOUNTING FOR INTENSIONS
Note that ‘2 * (4 + 3)’ may be equal to ‘14’ and to ‘7 + 7’ (by value only), but the three expressions are not the same objects: besides their value, they differ in many other attributes. Intension (with an ‘s’) is a complex and very involved subject, but for now we consider the simple notion of intension that precludes data-driven/quantitative approaches from being relevant to NLU: these models deal with extensions only and cannot account for intensions. Basically, data-driven/quantitative systems can deal with equality, but not sameness; sameness implies equality, but equality is much weaker.
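A minimal sketch of the equality/sameness distinction in Python: the three expressions from the slide evaluate to the same value (their extension) but have different parse trees (a crude stand-in for intension):

```python
import ast

exprs = ["2 * (4 + 3)", "14", "7 + 7"]

# Extension: all three denote the same value.
values = {eval(e) for e in exprs}
print(values)                  # {14} -- equal by value

# Intension (crudely approximated by syntactic structure): all three differ.
trees = {ast.dump(ast.parse(e, mode="eval")) for e in exprs}
print(len(trees))              # 3 -- three distinct objects
```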
34. “One can assume a theory of the world that is isomorphic to the way we talk about it… in this case, semantics becomes very nearly trivial.”
JERRY HOBBS
“Use language as a tool for uncovering the semiotic ontology of commonsense, since ordinary language is the best-known theory we have of everyday knowledge.”
JOHN A. BATEMAN
“We should investigate how our language functions and then answer the metaphysical questions.”
MICHAEL DUMMETT
35. How can we uncover all the implicitly assumed information that is never explicitly stated?
KANT: “We know any object only through predicates that we can say or think of it.”
SOMMERS: “Any object has a set of predicates that can ‘sensibly’ be applied to it.”
FODOR: “Being (and having) a concept is being locked to the property that the concept expresses.”
HEIDEGGER: “Only where the word for the thing has been found is the thing a thing… the word alone gives being to the thing.”
36. PART III
All of the above can be summarized as follows:
1. There’s a formal system that underlies all natural languages
2. In our linguistic communication, there seems to be an innate ontological
structure that we safely assume is common to all humans
3. We need to discover the nature of that ontological structure
4. We can use (reverse-engineer) language itself to discover the nature of
that ontological structure that underlies all natural languages
37. “Unfortunately, the dominant logic that won the day is ‘logic as a calculus’, an abstract symbol-manipulation system devoid of any content, and not ‘logic as a language’, a logic that has ontological content, the logic that was to be the lingua universalis.”
NINO B. COCCHIARELLA
38. But the mistakes made in logical semantics can be corrected, leading to a computationally formal system that underlies all natural languages. The main mishap in logical semantics was confusing predicates and types: predication was wrongly used to represent types in a strongly-typed ontology, types that correspond to all that we talk about in NL.
39. Types vs. Predicates
How can we explain that (1) and (2) convey, roughly, the same cognitive content?
(1) Julie is an articulate person ⇒ articulate(julie) ∧ person(julie)
(2) Julie is articulate ⇒ articulate(julie)
40. Types vs. Predicates
Although (p ∧ q) ⊨ p, p alone does not entail q; thus in (1) person(julie) is assumed to be true a priori. As we suggest below, distinguishing between logical and ontological concepts results in:
(1)/(2) ⇒ (∃julie :: person)(articulate(julie))
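A minimal sketch of the types-vs-predicates distinction, using Python type annotations as a stand-in for the ontology: julie :: Person is a presupposed type, not an asserted predicate, and only articulate(julie) is a genuine assertion. All names and attributes here are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Person:
    """An ontological type: membership is presupposed, never asserted."""
    name: str
    eloquent: bool = True   # illustrative attribute backing the predicate

def articulate(x: Person) -> bool:
    """A logical predicate: may be true or false of a Person."""
    return x.eloquent

# (exists julie :: person)(articulate(julie)):
# the annotation carries person(julie); the call asserts articulate(julie).
julie = Person("Julie")
print(articulate(julie))   # True -- the only assertion actually made
```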
42. Types vs. Predicates
We can later discuss how this hierarchy of ontological types (roughly, what Fred Sommers calls ‘The Language Tree’) might be discovered.
45. Adjective-Ordering Restrictions
Why is (a) more natural to say than (b)?
(a) Jon bought a beautiful red car
(b) Jon bought a red beautiful car
46. An Innate Ontological Structure?
Because we can always cast up (generalize); casting down, however, is undecidable.
48. Type Unification: Ambiguity in Nominal Modification
The objects a and Olga are associated with more than one type in the same scope: type unification is required.
52. Discovering the Missing Text: Metonymy
The unification of b is easy. The unification of oml, however, will introduce a salient relationship between oml and another object:
(b :: (Beer • Thing)) → (b :: Beer)
(oml :: (Omelet • Human)) → R(Omelet, Human)
eat(x :: Human, y :: Food) is the most salient relationship between Human and Food, and Omelet ⊑ Food.
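A minimal sketch of this unification step, assuming a toy subtype hierarchy and a hypothetical table of salient relations:

```python
# Toy subtype hierarchy: child -> parent (None marks the root).
ISA = {"Beer": "Drink", "Drink": "Food", "Omelet": "Food",
       "Food": "Thing", "Human": "Thing", "Thing": None}

# Hypothetical table of salient relations R(domain, range).
SALIENT = {("Human", "Food"): "eat"}

def subtype(a, b):
    """True iff a ⊑ b in the toy hierarchy."""
    while a is not None:
        if a == b:
            return True
        a = ISA[a]
    return False

def unify(a, b):
    """Unify two types: take the more specific type if one subsumes the
    other; otherwise look for a salient relation linking the two types."""
    if subtype(a, b):
        return a                          # (b :: (Beer • Thing)) -> (b :: Beer)
    if subtype(b, a):
        return b
    for (dom, rng), rel in SALIENT.items():
        if subtype(a, dom) and subtype(b, rng):
            return f"{rel}({a}, {b})"
        if subtype(b, dom) and subtype(a, rng):
            return f"{rel}({b}, {a})"
    return None

print(unify("Beer", "Thing"))    # Beer
print(unify("Omelet", "Human"))  # eat(Human, Omelet): the human eating the omelet
```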
57. Type Unification and Uncovering the Missing Text
Activities can be wise? Since wise applies to humans and not to activities, type unification uncovers the missing text:
[any person engaged in the activity of] exercising is wise
58. What ‘Paradox of the Ravens’?
The story goes like this: H1 (‘All ravens are black’) and its contrapositive H2 (‘All non-black things are non-ravens’) are logically equivalent, and thus whatever confirms H1 must (equally) confirm H2, and vice versa. But now seeing a red ball, or a pink elephant, or a white table, etc. will confirm H1, since all of these confirm the logically equivalent hypothesis H2, which is clearly counterintuitive (not sure that it’s paradoxical, though!).
63. What if we distinguish between types and predicates?
Now both equivalent hypotheses are equally confirmed and disconfirmed by the same observations, and the ‘Paradox of the Ravens’ is no more!
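As a sketch of the contrast (the untyped H1/H2 are the standard formulation; the typed version follows the deck's :: notation and is my reconstruction): with raven as an ontological type rather than a predicate, the quantifier ranges over ravens only, so a white table is simply outside the domain of quantification and confirms nothing:

```latex
% Untyped: logically equivalent, hence the paradox
H_1:\; \forall x\,(\mathit{raven}(x) \rightarrow \mathit{black}(x))
H_2:\; \forall x\,(\neg\mathit{black}(x) \rightarrow \neg\mathit{raven}(x))

% Typed: quantification is over the type raven itself, so only
% observations of ravens bear on the hypothesis
H_1':\; (\forall x :: \mathit{raven})(\mathit{black}(x))
```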
65. Lexical Disambiguation
In ‘promote the party’, ‘party’ is still ambiguous, because one can promote a political party as well as promote an event. In ‘cancel the party’, ‘party’ is not ambiguous, because the object of a cancellation can only be an Event.
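A minimal sketch of this disambiguation by verb signature; the sense inventory, verb signatures, and hierarchy below are all hypothetical:

```python
SENSES = {"party": ["PoliticalParty", "Event"]}          # two senses of 'party'
SIGNATURES = {"promote": ["Organization", "Event"],      # promote an org or an event
              "cancel": ["Event"]}                       # only an Event can be cancelled
ISA = {"PoliticalParty": "Organization", "Organization": "Thing",
       "Event": "Thing", "Thing": None}

def subtype(a, b):
    while a is not None:
        if a == b:
            return True
        a = ISA[a]
    return False

def object_senses(verb, noun):
    """Keep only the noun senses whose type fits the verb's object type."""
    return [s for s in SENSES[noun]
            if any(subtype(s, t) for t in SIGNATURES[verb])]

print(object_senses("promote", "party"))  # ['PoliticalParty', 'Event'] -- ambiguous
print(object_senses("cancel", "party"))   # ['Event'] -- disambiguated by type
```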
69. SUMMARY
1. Most of the challenges in the semantics of NL are about discovering
the missing text – text that is implicitly assumed as shared
background knowledge
2. By embedding ontological types in our predicates and performing
various type operations we can discover all the implicitly assumed
information
3. Logical semantics can be salvaged in a Logic as a Language – that is,
a logic with ontological content