The document discusses using computational methods for information access tasks involving large amounts of complex data. It proposes representing text using three feature sets - content words, function words, and constructions - with constructions given the same ontological status as words. An experiment is described that represents sentences this way, builds a language model from news text, and classifies sentences for attitude with an SVM over the resulting feature vectors. The argument is that classifying attitude by patterns, rather than by words alone, is useful.
2. jussi karlgren
PhD in (computational) linguistics from Stockholm
senior researcher in information access at SICS, Stockholm
docent in language technology at Univ. of Helsinki
founding partner, Gavagai AB, Stockholm
3. • independent non-profit research institute
• about 100-200 researchers
• ... networks, distributed systems, programming tools, collaborative environments, information access, design, digital art...
4. • recent startup company
• about 7-8 employees
• extracts actionable intelligence from very large text streams
5. why use computational methods and machinery for information
access?
two reasons:
1 amount of data is overwhelming → reduce data complexity
let’s call these “simple” tasks
2 signal is weak and complex → peer closer into data
let’s call these “difficult” tasks
9. for the simple tasks the sensible thing to do is to
pound the text into small bits and count the various types of bit.
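The "pound into bits and count" recipe is a bag-of-words model; a minimal sketch (the tokenisation by whitespace is an invented simplification, real systems normalise punctuation and inflection):

```python
from collections import Counter

def bag_of_words(text):
    """Pound the text into small bits (tokens) and count each type of bit."""
    return Counter(text.lower().split())

# 'the' occurs twice, every other token once
counts = bag_of_words("the world changed after the attacks")
```

Search engines essentially index and weight counts like these; the talk's point is that such counts alone are too coarse for the "difficult" tasks.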
11. this works well up to a point.
for search engines.
but not for e.g. authorship attribution.
14. what is the next sensible thing to do?
try to organise the bits into piles first, generalising them,
or
try to see if the bits have relations to each other, building more complex structures.
either of which involves non-trivial decisions and results in a brittle, error-prone procedure.
17. why is parsing impractical?
1 new text
2 categories unfounded in data
3 dependencies not based on necessity or efficiency
21. what is in the signal?
“It is this, I think, that commentators mean when they say glibly
that the ‘world changed’ after Sept 11.”
words? or something more?
23. linguistics has an answer.
but that answer doesn’t help much in practical applications.
25. what is in the signal to begin with?
just words?
and a pattern?
28. „ah, but the words are in the clause, the pattern is only expressed
by the words that participate in it.”
29. so patterns do not exist when not in use?
do words exist outside their usage?
„To ask where a verbal operant is when a response is not in the course of
being emitted is like asking where one’s knee-jerk is when the physician is
not tapping the patellar tendon.”
B.F. Skinner, Verbal Behavior
32. we claim that patterns are part of the signal,
- not incidental to it,
- nor secondary to the terms in it.
this appears to be a contentious statement.
34. radical construction grammar (cxg)
1 syntax-lexicon continuum
2 form and function specified in unified model
3 structurally cohesive
(william croft, 2005)
35. 1. syntax-lexicon continuum
Construction type        Examples
complex and abstract     syntax: sbj be-tense verb-en by agent
complex and concrete     idiom: up-tense the ante
complex and bound        morphology: noun-s
atomic and abstract      category: adj, clause
atomic and concrete      lexicon: this, green
are all equal: constructions are the primitive elements
→ no parts of speech, no syntactic categories necessary
36. 2. form and function specified in unified model
→ no separate syntactic (or semantic) component necessary
37. 3. structurally cohesive
→ everything is constructions and nothing else; everything is
specific and nothing is universal
38. practically:
the pattern of an utterance is a feature with the same ontological status as the terms that occur in the utterance.
constructions and lexemes both have conceptual meaning.
constructions or patterns are present even without recourse to the words in them.
41. our claim:
to study pattern occurrences, no coupling between the features and the words carrying them is needed.
this is quite convenient.
which is good.
43. patterns, in various forms, have been used in language technology for some time:
linguistic string project (1965-1998), Naomi Sager et al,
leading to information extraction:
a large number of ad hoc pattern descriptions, closely based on data as observed in use
45. now turn to one example task: identification and analysis of
attitude
46. attitude analysis can be done on any text source
blogs: unfettered discourse, word of mouth, low publication threshold, no editorial control
but it’s new text — new processing practice necessary
48. a prototypical attitudinal expression
Expression          WHO       FEELS WHAT       ABOUT WHAT
I like sauerkraut   I         like             sauerkraut
Kissing is nice     ?         nice             kiss
                    someone   sentiment term   topic
is this picture true?
50. it is this, i think, that commentators mean when they say glibly
that the ‘world changed’ after sept 11.
president hafez al-assad has said that peace was a pressing need for
the region and the world at large and syria, considering peace a
strategic option would take steps towards peace.
mr cohen, beginning an eight-day european tour including a nato
defence ministers’ meeting in brussels today and tomorrow, said he
expected further international action soon, though not necessarily
military intervention.
the designers from house on fire do not like random play.
sauerkraut is damn good but kimchi is even better.
bertram powerboats have a deep v hull and handle well in choppy
sea.
m.a.k. halliday thought it natural to view syntax from a functional
perspective.
51. our claim:
attitude is not only lexical or lexicon is not only words & terms
“He blew me off” vs “He blew off”
“He has the best result, we cannot fail him” vs “This is the best
coffee, we cannot fail with it”
“Fifth Avenue”, “9/11”
53. we’ll hand code a number of sample constructions to test our claim that they might be useful to identify attitudinal expressions.
remember: to study patterns — we do not need to encode explicit linkage to words!
55. we represent each sentence using three separate sets of features:
I content words
F form words
K constructions
56. I features
content words – nouns, adjectives, verbs (including verbal uses of
participles), adverbs, abbreviations, numerals, interjections, and
negation
58. K : sentence structure
transitivity, predicate, relative, and object clauses, tense shift within
sentence
59. K : various adverbials
adverbials of location, time, manner, condition, quantity, clause
adverbial, clause initial adverbials
60. K : morphology of sentence constituents
present or past tense, adjectives in base, comparative, or superlative
form
61. K : word dependencies and categories
subordinate conjunctions, negations, prepositional post modifiers,
verb chains, quantifiers, particle verbs, prepositional phrases,
adjective modifiers
62. “It is this, I think, that commentators mean when they say glibly that the ‘world changed’ after Sept 11.”
I: be think commentator mean when say glibly world change sept 11
F: it this i that they that the after
K: AdvlTim, AdvlMan, ObjCls, PredCls, TRIn, TRtr, TRmix, TnsPres, TnsPast, TnsShift
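The three feature sets for this sentence can be held in a plain record; the feature values below are copied from the slide, while the container layout itself is only an illustration:

```python
# I (content words), F (form words), K (constructions) for one sentence.
# The split into three sets is from the talk; this dict is just a sketch.
features = {
    "I": {"be", "think", "commentator", "mean", "when", "say",
          "glibly", "world", "change", "sept", "11"},
    "F": {"it", "this", "i", "that", "they", "the", "after"},
    "K": {"AdvlTim", "AdvlMan", "ObjCls", "PredCls",
          "TRIn", "TRtr", "TRmix", "TnsPres", "TnsPast", "TnsShift"},
}

# the K features stand on their own: no link back to the carrying words
assert features["K"].isdisjoint(features["I"] | features["F"])
```

Note that the K set records that a tense shift or a predicate clause occurred at all, with no pointer to which words realised it, which is exactly the "no coupling" claim.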
63. in preliminary experiments with SVM feature selection we find that
several of the K features have high rank for categorisation, notably
TnsShift, TnsPast, TRmix, PredCls.
64. TnsShift
“Noam Chomsky said[past] that what makes human language unique is[present] recursive centre embedding”
“M.A.K. Halliday believed[past] that grammar, viewed functionally, is[present] natural”
→ saves us from acquiring and maintaining lists of verbs of utterance, pronouncement, and cognition.
67. our experiment:
1 represent sentences using three sets of features: I, F, K.
2 build a language representation using one year of newsprint: test for differences between KT, MD, GH.
3 test sets of attitudinal sentences: SEMEVAL, NTCIR 6 & 7, MPQA.
4 put test sentences in word space (random indexing, 2000 dims) with an added feature indicating attitude.
5 extract feature vector from word space and run through SVM.
6 test with five-fold cross-validation.
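Step 4's random indexing can be sketched as follows: each feature gets a sparse ternary index vector, and a sentence vector is the sum of the index vectors of its features. The 2000-dimensional space is from the slide; the number of non-zero entries per index vector and the per-feature seeding are assumptions made for this illustration:

```python
import random

DIM = 2000       # dimensionality, as in the experiment
NONZERO = 8      # number of +/-1 entries per index vector (an assumption)

def index_vector(feature):
    """Sparse ternary random index vector, deterministic per feature name."""
    rng = random.Random(feature)                 # seed on the feature name
    vec = [0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):  # a few random positions
        vec[pos] = rng.choice((-1, 1))
    return vec

def sentence_vector(features):
    """A sentence is the sum of the index vectors of its features."""
    vec = [0] * DIM
    for f in features:
        for i, x in enumerate(index_vector(f)):
            vec[i] += x
    return vec

v = sentence_vector(["TnsShift", "PredCls", "world", "change"])
```

Because constructions like TnsShift are features in their own right, they drop into this scheme exactly like words, with no extra machinery.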
74. F1
      NTCIR 6   NTCIR 7   MPQA   SEMEVAL
I     46.1      45.2      63.4   42.4
F     44.9      47.5      65.4   40.4
K     42.3      43.6      63.7   33.8
IF    45.9      47.4      67.3   41.4
IK    45.9      48.6      67.0   38.6
FK    46.1      47.9      68.0   37.5
IFK   47.5      48.6      69.2   41.8
precision range: approx 40 / approx 70 / approx 30
recall range: approx 55-65
K features often help and never really hurt.
(karlgren et al, ECIR 2010)
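The scores above are F1, the harmonic mean of precision and recall. As a sanity check, the quoted approximate precision (40) and recall (55-65) ranges do land in the mid-to-high 40s, consistent with the NTCIR rows:

```python
def f1(precision, recall):
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# precision approx 40, recall approx 60 gives F1 of 48,
# in line with the I/F/K rows for NTCIR in the table
score = f1(40, 60)  # 48.0
```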
75. SEMEVAL is different:
Discovered Boys Bring Shock, Joy (+45)
Iraq Car Bombings Kill 22 People, Wound more than 60 (−98)
76. 1 results tie with reported NTCIR and SEMEVAL results, not far
from best MPQA results.
2 combinations with K generally better than those without.
3 SEMEVAL data: much lower results, no surprise given
terseness.
4 background language model has some effect: Glasgow Herald
better precision; Korea Times better for recall for NTCIR data.
83. put sentences in word space (random indexing, 2000 dims) with
added feature for each trigram of structural terms
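Extracting one added feature per trigram of structural terms can be sketched as below (the K-term sequence is a hypothetical example, not data from the talk):

```python
def structural_trigrams(k_terms):
    """One added feature per consecutive triple of structural (K) terms."""
    return [tuple(k_terms[i:i + 3]) for i in range(len(k_terms) - 2)]

# hypothetical K-term sequence for one sentence:
feats = structural_trigrams(["AdvlTim", "ObjCls", "PredCls", "TnsShift"])
# two trigram features: (AdvlTim, ObjCls, PredCls), (ObjCls, PredCls, TnsShift)
```

Each trigram then gets its own index vector in the word space, just like a single term would.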
86. prove utility by better choice of task?
• sentiment and opinion identification
• quote identification
• novelty detection
• authorship attribution
• summarisation
• terminology mining
suggestions?
89. take home
1 constructional models have suitable granularity for “simple” and “difficult” tasks
2 constructional models provide a simple methodology to test the effect of language structure
3 constructional features are not subsidiary to word occurrence features
4 constructional analysis has a long history in language technology
5 constructional analysis has an opportunity to influence linguistics
94. to discuss
1 what constitutes a construction?
2 what is not a construction?
3 what sort of tasks can use constructions profitably?
4 what sort of abstractions do we want to use for describing
constructions productively?
5 how can we learn constructions automatically?