The main thesis here is this: (i) the data-driven approach to NLU is utterly fallacious; (ii) logical semantics has been seriously misguided; and (iii) logical semantics can be rectified, and here we suggest how this can be done and how to move forward.
3. Language is not Learnable: PART I
• natural language is an infinite object
• infinite objects are recursively defined
• recursive definitions are rules
• rules are not learnable from examples
∴ natural language is not learnable
6. Chomsky’s Infinity
How many sentences are people ready to understand (once they have attained linguistic competency)?
“I’m sorry but I don’t have your last sentence in my dictionary.”
8. We have the capacity to express (and interpret) an infinite number of thoughts, yet we can only ever be exposed to a tiny fraction of examples, which, in the end, are statistically insignificant.
9. NOAM CHOMSKY
“the notion of the probability of a sentence is an entirely useless one, under any known interpretation of this term.”
12. Recursion is the tool by which we can have a finite representation of infinite objects. But recursive definitions are rules; rules are not learnable from individual experiences; and thus infinite objects cannot be learned from observation (experience).
13. “I reject the contention that an important theoretical difference exists between formal and natural languages.”
RICHARD MONTAGUE
14. IMMANUEL KANT
“Everything in nature, in the inanimate as well as in the animate world, happens according to some rules, though we do not always know them.”
15. Language is not Learnable: PART II
The challenge in language understanding is related to uncovering all the missing text: text that is never explicitly stated, but is often implicitly assumed as shared background knowledge.
16. The Missing Text Phenomenon (MTP)
quantifier scope:
BBC has a reporter in every country ⇒ BBC has a [different] reporter in every country
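As a sketch, the two readings can be written out with typed quantifiers, in the notation the deck introduces later; the predicate name "has" is a placeholder of mine:

```latex
% surface reading: a single reporter covering all countries
(\exists r :: \mathit{reporter})(\forall c :: \mathit{country})\,\mathit{has}(\mathit{BBC}, r, c)
% enriched reading, with the missing text recovered: a (different) reporter per country
(\forall c :: \mathit{country})(\exists r :: \mathit{reporter})\,\mathit{has}(\mathit{BBC}, r, c)
```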
17. The Missing Text Phenomenon (MTP)
prepositional phrase attachments:
John had pizza with his kids ⇒ John had pizza along/together with his kids
John had pizza with pineapple ⇒ John had pizza with a pineapple topping
18. The Missing Text Phenomenon (MTP)
metonymy:
The corner table wants another beer ⇒ The person sitting at the corner table wants another beer
19. The Missing Text Phenomenon (MTP)
metaphor:
Don’t worry about Simon, he’s a rock ⇒ Don’t worry about Simon, he’s solid like a rock
23. The Missing Text Phenomenon (MTP)
HECTOR LEVESQUE
“You need to have background knowledge that is not expressed in the words of the sentence to be able to sort out what is going on … and it is precisely bringing this background knowledge to bear that we informally call thinking.”
25. Unlike formal languages (e.g., Java), in ordinary spoken languages (e.g., English, Spanish, etc.) we leave out implicitly assumed information by relying on our “common” background knowledge. Ordinary spoken language is thus highly (in fact, optimally) compressed.
26. 4 technical reasons why languages are not learnable
MISSING TEXT PHENOMENON (MTP)
1. NL is highly (in fact, optimally) compressed: the implicitly assumed shared knowledge is left out (the MTP)
2. learnability entails compressibility: learning requires finding redundancies in the data
3. NL is not compressible (it is already highly compressed) (from 1)
4. NL does not have redundancies and thus is not learnable (from 2 & 3)
QED
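A minimal sketch of the compressibility premise, using Python's zlib: redundant text compresses substantially, while an already-compressed string barely shrinks; this is the sense in which an optimally compressed language leaves a learner no redundancies to exploit:

```python
import zlib

# Redundant text compresses well: compression works by removing redundancies.
redundant = b"the cat sat on the mat. " * 100
once = zlib.compress(redundant)
print(len(redundant), "->", len(once))   # 2400 -> a few dozen bytes

# An already-compressed string has (almost) no redundancies left to remove.
twice = zlib.compress(once)
print(len(once), "->", len(twice))       # roughly the same size
```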
27. 4 technical reasons why languages are not learnable
FUNCTION WORDS
quantifiers: every, some, all, most
modals: must, could, should, can
prepositions: with, on, for, to, at
connectives: not, and, or, if
relative pronouns: that, which
In ML/data-driven approaches, function words are treated as stopwords and are typically ignored, since their probabilities are roughly equal in all contexts (they are statistically insignificant) and keeping them would disrupt the entire statistical model.
28. 4 technical reasons why languages are not learnable
FUNCTION WORDS
But ignoring function words is problematic: these words are what, in the end, determines (‘glues together’) the final meaning. Thus ML/data-driven models, while they can approximate text similarity, cannot account for true meanings.
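A minimal sketch of the problem, assuming a typical stopword list: once function words are dropped, sentences with very different meanings become indistinguishable bags of content words:

```python
# Hypothetical stopword list containing typical function words.
STOPWORDS = {"every", "some", "a", "the", "not", "is", "was", "if", "and", "or"}

def content_words(sentence: str) -> list[str]:
    """Bag of content words left after stopword removal."""
    return sorted(w for w in sentence.lower().split() if w not in STOPWORDS)

# Opposite quantifier scopes collapse to the same representation...
assert content_words("every student passed some exam") == \
       content_words("some student passed every exam")

# ...and so do a sentence and its negation.
assert content_words("the movie was good") == \
       content_words("the movie was not good")
```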
29. 4 technical reasons why languages are not learnable
STATISTICAL INSIGNIFICANCE
Besides function words, statistical insignificance can occur in situations where the distinguishing information is not even in the data. Antonyms/opposites (e.g., big/small, writing/reading) are known to occur in similar contexts with equal probabilities. Consider the classic Winograd pair in (1):
(1) The trophy did not fit in the suitcase because it was too big/small.
Here statistical analysis is useless, since the only difference in the preferred referent of ‘it’ is a function of the antonyms.
30. 4 technical reasons why languages are not learnable
STATISTICAL INSIGNIFICANCE
Clearly, it is neither psychologically nor computationally plausible that we need to see 40,000,000 examples just to learn how to resolve a reference such as ‘it’ in (1). (It would take a child a lifetime to learn how to resolve such references!)
31. 4 technical reasons why languages are not learnable
ACCOUNTING FOR INTENSIONS
Note that ‘2 * (4 + 3)’ may be equal to ‘14’ and to ‘7 + 7’ (by value only), but the three expressions are not the same objects: besides their value, they differ in many other attributes. Intension (with an ‘s’) is a complex and very involved subject, but for now we consider the simple notion of intension that precludes data-driven/quantitative approaches from being relevant to NLU: these models deal with extensions only and cannot account for intensions. Basically, data-driven/quantitative systems can deal with equality, but not sameness; sameness implies equality, but equality is much weaker.
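A minimal sketch of the equality/sameness distinction in Python: the three expressions from the slide evaluate to the same value (their extension) but have different parse trees (a crude stand-in for intension):

```python
import ast

exprs = ["2 * (4 + 3)", "14", "7 + 7"]

# Extension: all three denote the same value.
values = {eval(e) for e in exprs}
print(values)                  # {14} -- equal by value

# Intension (crudely approximated by syntactic structure): all three differ.
trees = {ast.dump(ast.parse(e, mode="eval")) for e in exprs}
print(len(trees))              # 3 -- three distinct objects
```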
34. “One can assume a theory of the world that is isomorphic to the way we talk about it… in this case, semantics becomes very nearly trivial.”
JERRY HOBBS
“Use language as a tool for uncovering the semiotic ontology of commonsense, since ordinary language is the best-known theory we have of everyday knowledge.”
JOHN A. BATEMAN
“We should investigate how our language functions and then answer the metaphysical questions.”
MICHAEL DUMMETT
35. How can we uncover all the implicitly assumed information that is never explicitly stated?
KANT: “We know any object only through predicates that we can say or think of it.”
SOMMERS: “Any object has a set of predicates that can ‘sensibly’ be applied to it.”
FODOR: “Being (and having) a concept is being locked to the property that the concept expresses.”
HEIDEGGER: “Only where the word for the thing has been found is the thing a thing… the word alone gives being to the thing.”
36. PART III
All of the above can be summarized as follows:
1. There’s a formal system that underlies all natural languages
2. In our linguistic communication, there seems to be an innate ontological
structure that we safely assume is common to all humans
3. We need to discover the nature of that ontological structure
4. We can use (reverse-engineer) language itself to discover the nature of
that ontological structure that underlies all natural languages
37. “Unfortunately, the dominant logic that won the day is ‘logic as a calculus’, an abstract symbol-manipulation system devoid of any content, and not ‘logic as a language’, a logic that has ontological content, the logic that was to be the lingua universalis.”
NINO B. COCCHIARELLA
38. But the mistakes made in logical semantics can be corrected, leading to a computationally formal system that underlies all natural languages. The main mishap in logical semantics was confusing predicates and types: predication was wrongly used to represent types in a strongly-typed ontology, types that correspond to all that we talk about in NL.
39. Types vs. Predicates
How can we explain that (1) and (2) convey, roughly, the same cognitive content?
(1) Julie is an articulate person ⇒ articulate(julie) ∧ person(julie)
(2) Julie is articulate ⇒ articulate(julie)
40. Types vs. Predicates
Although (p ∧ q) ⊨ p, p alone does not entail q; thus in (1) person(julie) is assumed to be true a priori. As we suggest below, distinguishing between logical and ontological concepts results in:
(1)/(2) ⇒ (∃julie :: person)(articulate(julie))
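A minimal sketch of the types-vs-predicates distinction, using Python type annotations as a stand-in for the ontology: julie :: Person is a presupposed type, not an asserted predicate, and only articulate(julie) is a genuine assertion. All names and attributes here are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Person:
    """An ontological type: membership is presupposed, never asserted."""
    name: str
    eloquent: bool = True   # illustrative attribute backing the predicate

def articulate(x: Person) -> bool:
    """A logical predicate: may be true or false of a Person."""
    return x.eloquent

# (exists julie :: person)(articulate(julie)):
# the annotation carries person(julie); the call asserts articulate(julie).
julie = Person("Julie")
print(articulate(julie))   # True -- the only assertion actually made
```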
42. Types vs. Predicates
We can later discuss how this hierarchy of ontological types (roughly, what Fred Sommers calls ‘The Language Tree’) might be discovered.
45. Adjective-Ordering Restrictions
Why is (a) more natural to say than (b)?
(a) Jon bought a beautiful red car
(b) Jon bought a red beautiful car
46. An Innate Ontological Structure?
Because we can always cast up (generalize); casting down, however, is undecidable.
48. Type Unification: Ambiguity in Nominal Modification
The objects a and Olga are associated with more than one type in the same scope: type unification is required.
52. Discovering the Missing Text: Metonymy
The unification of b is easy. The unification of oml, however, will introduce a salient relationship between oml and another object:
(b :: (Beer • Thing)) → (b :: Beer)
(oml :: (Omelet • Human)) → R(Omelet, Human)
eat(x :: Human, y :: Food) is the most salient relationship between Human and Food, and Omelet ⊑ Food.
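A minimal sketch of this unification step, assuming a toy subtype hierarchy and a hypothetical table of salient relations:

```python
# Toy subtype hierarchy: child -> parent (None marks the root).
ISA = {"Beer": "Drink", "Drink": "Food", "Omelet": "Food",
       "Food": "Thing", "Human": "Thing", "Thing": None}

# Hypothetical table of salient relations R(domain, range).
SALIENT = {("Human", "Food"): "eat"}

def subtype(a, b):
    """True iff a ⊑ b in the toy hierarchy."""
    while a is not None:
        if a == b:
            return True
        a = ISA[a]
    return False

def unify(a, b):
    """Unify two types: take the more specific type if one subsumes the
    other; otherwise look for a salient relation linking the two types."""
    if subtype(a, b):
        return a                          # (b :: (Beer • Thing)) -> (b :: Beer)
    if subtype(b, a):
        return b
    for (dom, rng), rel in SALIENT.items():
        if subtype(a, dom) and subtype(b, rng):
            return f"{rel}({a}, {b})"
        if subtype(b, dom) and subtype(a, rng):
            return f"{rel}({b}, {a})"
    return None

print(unify("Beer", "Thing"))    # Beer
print(unify("Omelet", "Human"))  # eat(Human, Omelet): the human eating the omelet
```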
57. Type Unification and Uncovering the Missing Text
Activities can be wise? Since wise applies to humans and not to activities, type unification uncovers the missing text:
[any person engaged in the activity of] exercising is wise
58. What ‘Paradox of the Ravens’?
The story goes like this: H1 (‘All ravens are black’) and its contrapositive H2 (‘All non-black things are non-ravens’) are logically equivalent, and thus whatever confirms H1 must (equally) confirm H2, and vice versa. But now seeing a red ball, or a pink elephant, or a white table, etc. will confirm H1, since all of these confirm the logically equivalent hypothesis H2, which is clearly counterintuitive (not sure that it’s paradoxical, though!).
63. What if we distinguish between types and predicates?
Now both equivalent hypotheses are equally confirmed and disconfirmed by the same observations, and the ‘Paradox of the Ravens’ is no more!
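As a sketch of the contrast (the untyped H1/H2 are the standard formulation; the typed version follows the deck's :: notation and is my reconstruction): with raven as an ontological type rather than a predicate, the quantifier ranges over ravens only, so a white table is simply outside the domain of quantification and confirms nothing:

```latex
% Untyped: logically equivalent, hence the paradox
H_1:\; \forall x\,(\mathit{raven}(x) \rightarrow \mathit{black}(x))
H_2:\; \forall x\,(\neg\mathit{black}(x) \rightarrow \neg\mathit{raven}(x))

% Typed: quantification is over the type raven itself, so only
% observations of ravens bear on the hypothesis
H_1':\; (\forall x :: \mathit{raven})(\mathit{black}(x))
```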
65. Lexical Disambiguation
In ‘promote the party’, ‘party’ is still ambiguous, because one can promote a political party as well as promote an event. In ‘cancel the party’, ‘party’ is not ambiguous, because the object of a cancellation can only be an Event.
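A minimal sketch of this disambiguation by verb signature; the sense inventory, verb signatures, and hierarchy below are all hypothetical:

```python
SENSES = {"party": ["PoliticalParty", "Event"]}          # two senses of 'party'
SIGNATURES = {"promote": ["Organization", "Event"],      # promote an org or an event
              "cancel": ["Event"]}                       # only an Event can be cancelled
ISA = {"PoliticalParty": "Organization", "Organization": "Thing",
       "Event": "Thing", "Thing": None}

def subtype(a, b):
    while a is not None:
        if a == b:
            return True
        a = ISA[a]
    return False

def object_senses(verb, noun):
    """Keep only the noun senses whose type fits the verb's object type."""
    return [s for s in SENSES[noun]
            if any(subtype(s, t) for t in SIGNATURES[verb])]

print(object_senses("promote", "party"))  # ['PoliticalParty', 'Event'] -- ambiguous
print(object_senses("cancel", "party"))   # ['Event'] -- disambiguated by type
```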
69. SUMMARY
1. Most of the challenges in the semantics of NL are about discovering
the missing text – text that is implicitly assumed as shared
background knowledge
2. By embedding ontological types in our predicates and performing
various type operations we can discover all the implicitly assumed
information
3. Logical semantics can be salvaged in a Logic as a Language – that is,
a logic with ontological content