1) Gallese's research into mirror neurons found that observing another person's actions activates the same areas of the observer's brain that are involved in performing those actions. This suggests humans simulate others' intentional actions unconsciously through an "embodied simulation" mechanism.
2) Gallese argues this embodied simulation allows for direct, non-conceptual understanding of others' intentions. More cognitive elaboration is needed to understand complex social stimuli.
3) Sterelny argues folk psychology is an automated skill acquired through perceptual priming and a carefully engineered social learning environment, not innate conceptual knowledge. This challenges claims of a specialized "theory of mind module."
Theory of mind, mirror neurons, and the massive modularity hypothesis
"Are there other areas of human competence where one might hope
to develop a fruitful theory, analogous to generative grammar?
Although this is a very important question, there is very little that
can be said about it today. One might, for example, consider the
problem of how a person comes to acquire a certain concept of three
dimensional space, or an implicit 'theory of human action', in similar
terms. Such a study would begin with the attempt to characterize the
implicit theory that underlies actual performance and would then turn
to the question of how this theory develops under the given conditions
of time and access to data- that is, in what way the resulting system of
beliefs is determined by the interplay of available data, 'heuristic procedures',
and the innate schematism that restricts and conditions the form of the
acquired system. At the moment, this is nothing more than a sketch of a
program of research."
Noam Chomsky (1975) Linguistic contributions: Future1
Carl Mair
Introduction
Chomsky's 'sketch of a program of research' as outlined in the above quote has since
blossomed into the 'modularity hypothesis', a program comprising both a moderate and a
strong stance. 'Moderate modularity', as argued for by Fodor, breaks human cognitive
competences into 'input mechanisms' and 'central systems', and contends that the former are
hard-wired, encapsulated, inaccessible, domain-specific modules2. The 'central systems' on the
other hand, which are equivalent to the higher cognitive faculties, are serviced by a
domain-general intelligence. 'Strong' or 'massive modularity' differs from Fodor's account by
contending that the 'central systems' themselves are also Fodorean modules. This position has
found its most forceful proponents in a group of psychologists who base their arguments on
an appeal to evolutionary considerations3. The position prides itself in a certain scientific
'toughmindedness'4, and on its reliance on empirical data. One of the earliest and most
thoroughly researched 'modular' 'central systems' is the cognitive competence of
understanding other agents, or 'theory of mind'. This essay will engage in a critical discussion of
the so-called 'theory of mind module' by relying on two main approaches. The first approach
will be to examine the very recent neurological studies of Gallese5 et al into the 'mirror-
neurone' systems involved in intentional attunement. Gallese's argument that other agents'
intentional states are made understandable through autonomous and non-propositional
embodied simulation will be presented as an alternative to the existence of a content-rich
innate module, although its relationship to nativism will be shown to be more complex. The
second approach will follow Sterelny in seeking to undercut the logical arguments for a
theory of mind module, by presenting sophisticated exogenous scaffolding as a substitute for
rich conceptual nativism. A third, shorter section will draw on the work of Arbib and
Tomasello in order to shed some light on how these two approaches can be fruitfully
1
N. Chomsky Language and Mind (Harcourt Brace, 1975) p 73-74
2
J Fodor 'There and Back Again' in In Critical Condition (MIT, 1998) p 128
3
Cosmides and Tooby, but also Pinker
4
Fodor review of Pinker
5
V. Gallese Embodied simulation: From neurons to phenomenal experience
synthesised. The conclusion will then seek a reconciliation of these two approaches in order
to suggest a novel hypothesis of this crucial human cognitive competence.
Embodied simulation: mirror neurones and intentional attunement
It is important at the outset to distinguish Gallese's theory of embodied simulation from
Simulation Theory as argued for by Gordon and Goldman in the philosophy of mind.6
While the latter involves a willed cognitive effort aimed at interpreting agents' intentions,
Gallese conceives embodied simulation as an 'automatic, unconscious, and pre-reflexive
functional mechanism'7. Furthermore, in contrast to simulation theory, Gallese's model of
embodied simulation is not a product of a priori reasoning about the nature of the mind, but an
hypothesis which arose from detailed neurobiological studies into the brain. Before this
hypothesis can be explained, a brief introduction to mirror neuron systems is needed.
Gallese was among the original researchers who discovered the systems, and gives the
following account8:
About ten years ago a new class of pre-motor neurons discharging not
only when the monkey executes goal-related hand actions like grasping
objects, but also when observing other individuals (monkeys or humans)
executing similar actions, was discovered in the macaque monkey brain.
These neurons were called "mirror neurons".
Since then, numerous brain-scan experiments have found mirror system homologues in the
brains of human agents. The micro-patterning of neural activation in the observer has been
demonstrated to correspond to the activation patterns in the performing agent. Furthermore,
the mirroring is not just restricted to grasping, but is also found in facial and speech-related
mouth actions9, and in the expression of emotions10. The key requirement for mirror neuron
activation seems to be intentional agentive actions11:
Mirror neurons constitutively map an agentive relation; the mere observation
of an object not acted upon indeed does not evoke any response at all.
Moreover, only actions which belong to the motor 'repertoire' of the observer (or are closely
related) are mapped on the observer's motor systems. Experiments involving humans'
mirroring responses to monkeys, dogs and other humans have shown that though neural
mirroring occurs in response to human silent-speech (i.e. in 'repertoire' actions) and monkey
lip-smacking (i.e. 'closely-related'), there is no mirroring in response to dog barking. Put
another way, humans are perceptually tuned to salient actions by conspecifics.
Further experiments have also demonstrated that mirror neuron systems are predictive, in the
sense of mediating inferences about the goals of the behaviour of others. In another
experiment involving monkeys, the patterns of activation were compared between the full
observation of a grasping action, and one where the final stage of the action was hidden
behind an occluder. In the latter case, the majority of the neurons recorded in the first case
6
S. Guttenplan A Companion to the philosophy of Mind (Blackwell, 1999) p 561
7
V Gallese, see n 5 above, p 41
8
V Gallese, see n 5 above, p 32
9
'The observation of human silent speech activated the...premotor sector of Broca's region', the same area
responsible for performance. See V Gallese, n5 above, p 35
10
The experiments to date have been restricted to that of 'disgust'
11
V Gallese, see n5 above, p 35
still responded, suggesting that 'by simulating the action, the gap can be filled'12, and that
simulations are in effect models of intentional goal-directed actions.
The significance of these findings for gaining some traction on the problem of how humans
interpret each other's actions and intentions should be clear. But first to summarise some main
points. Human brains (and to a lesser extent, primate brains13) contain populations of mirror
neurons which respond to the observation of agentive actions. Whatever pattern of neural
activation occurs in the performer's brain is mirrored in the corresponding loci of the
observer's brain. This holds true for both transitive actions (like grasping, biting) and
non-transitive actions (like speech-like mouth actions and the expression of emotion). Furthermore,
these 'embodied simulations' of performance can be used by the observer to make inferences
about the goals of intentional behaviour, as shown by the occluder experiments. Gallese is
alert to the philosophical implications of his findings for 'theory of mind', and indeed
explicitly engages with the literature. In many ways, the model Gallese suggests is analogous
to that of embodied cognition14 in the representation debate, since what he stresses is the real-
time coupled nature of agents interacting in the world.
In short, Gallese's conclusions are as follows. The folk-psychological approach to
understanding other agents is solipsistic in that it assumes one agent understands another by
giving an objective account of her behaviour according to propositional attitudes, like belief,
desire etc. We can and do give such 'objective' descriptions of agents: when asked to
'recognize, discriminate, parameterize, or categorize the emotions or sensations
displayed by others, we exert our cognitive operations by adopting a third-person perspective,
aimed exactly at objectifying the content of our perceptions'.15 But real-time online understanding
of other agents in interactive encounters is non-conceptual, non-declarative and
non-propositional:16
...to perceive an action is equivalent to internally simulating it.
This enables the observer to use her own resources to penetrate
the world of the other by means of a direct, automatic, and
unconscious process of motor simulation. Such simulation
processes automatically establish a direct link between agent
and observer, in that both are mapped in a neutral fashion.
Gallese calls this automatically generated inter-agent link intentional attunement, and prefers
it to other epistemological approaches because 'it generates predictions about the intrinsic
functional nature of our social cognitive operations that cut across, and neither necessarily
depend on, nor are subordinate to any specific cognitive mind ontology, including that of Folk
Psychology'.17
Notwithstanding the explanatory successes of this model, it is clear that intentional
attunement does not give the whole story of our ability to understand other agents. As
Gallese concedes, some social stimuli (particularly emotions) can only be understood by the
'explicit cognitive elaboration of their contextual aspects and previous information.'18
12
V Gallese, see n5 above, p 33
13
Scientific american??
14
see Andy Clark
15
V Gallese, see n5 above, p 31
16
V Gallese, see n5 above, p 35
17
V Gallese, see n5 above, p 31
18
V Gallese, see n5 above, p 43
But these two mechanisms taken together do give the whole story. Embodied simulation and
intentional attunement is the experience-based, non-propositional mechanism which
scaffolds19 the 'propositional, more sophisticated mentalizing mechanism'20. Although Gallese
uses the term 'mechanism' in the singular, it is unlikely, by his own admission, that these
second tier sophisticated abilities would be restricted to any one specific region of the brain,
and would certainly be larger than a 'putative domain-specific Theory of Mind Module'21.
The important question now is to explain this secondary cognitively elaborate 'mechanism' for
understanding other agents. We now turn to Sterelny's account of folk psychology as an
'automated skill whose acquisition is scaffolded by downstream niche construction'22 for the
second half of this story.
Downstream epistemic engineering
Superficially, it may seem that Gallese's and Sterelny's accounts of folk psychology are
incompatible. Gallese's model presents the ability to understand other agents as a brain
endowment, universal and innate. Sterelny, as we will see shortly, views folk psychology as a
perceptually primed automatic skill, gained by learning. The first account is avowedly
nativist; the second anti-nativist. However, Gallese's model differs from the usual nativist
accounts of folk psychology in that it does not specify innate representational content. The
mechanism is innate; but automatic, pre-reflexive intentional attunement is more like
perception than it is like knowledge. It is of a very different character from a putative theory
of mind module, and it is not one that suffers much from Sterelny's critique.
Sterelny's account provides that folk psychology is 'scaffolded by perceptual tuning'23. But it
is also further scaffolded by an engineered learning environment which helps the acquisition
of the highly sophisticated and cognitively elaborate set of interpretive skills. Here there is an
obvious parallel with Gallese's two mechanisms. But there is also one obvious clash. While
Sterelny's account views the perceptual mechanisms as merely biased towards picking up
salient information in respect of agents' intentions, emotions etc., Gallese's model suggests
that in addition to this bias (i.e. a shared 'repertoire' between conspecifics), we also own those
intentions, emotions etc., in that our mirror systems actively simulate them. It seems that
Sterelny's account, like that of folk psychology which Gallese originally attacked, endorses a
kind of agentive solipsism. This is undeniable; it is only with something like a mirror system
that the kind of inter-subjectivity Gallese argues for becomes coherent. Though this is the
case, it has no bearing on the force of Sterelny's deeper point. For a start, the model he
develops is a hybrid between simulation theory and an account of how agents get the
information to guide the simulations.24 It will be recalled from the previous section that
simulation theory is really just a solipsistic account of embodied simulation, and one that
requires conscious cognitive effort as opposed to being unconscious and operationally
autonomous. However, in terms of the explanatory role that it plays in Sterelny's account
there is little difference between the two theories. The fact that intentional attunement sets up
a direct 'neural link' between agents does not mean that the strategic environment
19
Gallese suggests that the malfunctioning of this base mechanism may explain the failure of autistics to have a
fully functioning sophisticated mechanism. Experiments have shown that the mirror systems in autistics are
indeed impaired.
20
V Gallese, see n5 above, p 43
21
V Gallese, see n5 above, p 43
22
K. Sterelny Thought in a Hostile World (Oxford, 2002) p 220
23
K. Sterelny, see n 22 above, p 222
24
K. Sterelny, see n 22 above, p 217
miraculously becomes transparent. Agents still deceive, fake emotions, pretend to be other
than what they are. Even with embodied simulation, the strategic environment is still
translucent (though perhaps less so) rather than transparent. More information is required in
order to make accurate interpretive judgments.
What follows is Sterelny's hunch on how agents get that extra information, and thus how
Gallese's model acquires its second sophisticated tier. Furthermore, all of these arguments
undercut the position for the rich conceptual nativism of folk psychology.
Before Sterelny's account can be presented, a brief precis of the evolutionary psychology
position with respect to high order cognitive modules is required.
This position depends on three main arguments; two logical, and one 'just so' story. The
logical arguments are those of ‘poverty of the stimulus’ and the 'frame problem'. The
‘poverty of the stimulus’ argument was appropriated from Chomsky’s defence of his
innateness hypothesis. The argument runs: cognitive competence Z involves a multiplicity of
highly complex rules and parameters. A general-purpose learning mechanism could only
acquire Z after extensive tuition, and it would be slow. Since Z is acquired quickly and with
minimal tuition, a large amount of this cognitive structure must be endogenous. The second
logical argument, the ‘frame problem’, can be summarised like this. If the mind were an
homogenous domain-general learning mechanism, every cognitive ‘act’ would be
accompanied by a ‘combinatorial explosion’ as the possible inputs would be enormous, with
the result that cognition could not function. Since cognition does work, the possible inputs
accompanying a cognitive ‘act’ must be limited. Limitation of inputs means domain-specific
modules, not a single domain-general one. The third ‘just so’ story is a speculation on the
selection pressures that were extant in a hypothetical ancestral environment. Cosmides and
Tooby imagine a Pleistocene hunter-gatherer society and the kind of cognitive skills that they
might have needed, which would then have been modularized by the Baldwin effect. These
include: 'face recognition, friendship, child care, theory of mind, social-exchange, folk
biology, folk physics' etc25. Support for a theory of mind module generally follows the rough
contours of the above, though sometimes with the addition of developmental and dissociative
arguments from psychology as well.26
Though they will not be examined here, Sterelny devotes a fair amount of time to showing the
weaknesses in these last two arguments. However, the brunt of his thesis is addressed to the
classic Chomskyan arguments for innate structure, 'poverty of the stimulus' and the 'frame
problem'.
Sterelny's argument for the scaffolding of folk psychology on a carefully constructed learning
environment is essentially one of 'wealth of the stimulus'. While something like embodied
simulation could form the primary mechanism for intentional understanding, the cognitively
elaborate and context-sensitive second tier 'mechanism' is learnt in an environment of high-fi
signal. The term Sterelny uses for this careful structuring of the learning environment is
‘downstream epistemic engineering’: the fact that the way the Nth generation structures its
environment affects the way the (N+1)th generation interprets and perceives it, and so on to
the (N+N)th generation. Put another way, ‘We engineer the informational environment of our
downstream generation, thus making for more accurate and reliable acquisition of key
25
S. Mithen, The Prehistory of the Mind, (T&H, 1996), p 45
26
There is evidence that mindreading skills go through a maturation process much like language; and also that
mindreading skills dissociate from other high cognitive skills.
capacities’27. Furthermore, the character of these learning environments for interpretive skills
is determined by selection pressures28:
Selection for interpretive skills could lead to a different evolutionary trajectory:
selection on parents (and via group selection, on the band as a whole) for actions
which scaffold the development of interpretive capacities. Selection rebuilds
the epistemic environment to scaffold the development of those capacities.
Language would also be a crucial part of that learning environment in that it helps in the
identification of perceptually salient inputs29:
Labelling turns perceptual tasks into memory tasks…[it] makes aspects of the
world transparent by establishing a one-to-one correspondence between
sensory properties and functional ones.
By reducing the bewildering array of possible distinctions to only that set of salient inputs the
learner of interpretive skills would also avoid the computational 'frame problem'.
Before we go on to tie the threads of Sterelny's and Gallese's models together, it is useful to
summarise Sterelny's main points. Folk psychology is an acquired automatic skill. The
cognitively sophisticated and context-sensitive mechanisms of mind-reading are learned in a
'wealth of stimulus' environment that has been selected for its efficacy in making interpretive
capacities available to novices. Such exogenous scaffolding would make the skill of
mindreading immune to both 'poverty of the stimulus' and 'frame problem' critiques.
Furthermore, the learning process is aided by both perceptual tuning and internal simulation.
This essay has taken the liberty of replacing Sterelny's posited 'perceptual mechanisms' and
simulation theory with the single cognitive feature of embodied simulation, which explains both
features under a single model: actions/gestures/emotions are salient because they are part of
the observer's 'repertoire'; intentions and goal-directed behaviour are understood by the
automatic, unconscious process of intentional attunement.
Both Tomasello and Arbib have posited models which also explain higher human cognitive
faculties as a combination of innate brain capacities and human cultural history. We will
briefly review these before offering a conclusion on the matters discussed.
Tomasello and Arbib
The work of Tomasello attributes the singularity of human beings to their ability to
accumulate and modify cultural capital, according to the ratchet effect.30 He traces this ability
to our cognitive skill to learn imitatively. Furthermore, Tomasello explains this ability as
dependent on a cognitive capacity which seems almost identical to Gallese's description of
mirror neurons31:
Imitative learning does not just mean mimicking the surface structure
of poorly understood behaviour...it also means reproducing an instrumental
act understood intentionally, that is reproducing not just the behavioural
27
K Sterelny, The Evolution and Evolvability of Culture, p 25
28
K. Sterelny, Thought in a Hostile World, p 221
29
K. Sterelny, Thought…,p 154
30
The pattern of accumulation and modification of cultural artifacts through time.
31
M Tomasello The Human adaptation for culture
means but also the intentional end for which the behavioural means was
formulated. This requires some specially adapted skills of social cognition.
It seems plausible that mirror neuron systems fit the bill of the 'specially adapted skills of
social cognition' which allow agentive actions to be understood intentionally. Risking
over-simplification, Tomasello seems to be suggesting the following hierarchy of competences to
explain human singularity:
Mirror neuron systems which allow agents to understand actions intentionally
X) which scaffolds imitative learning
1.) which scaffolds the group's accumulation of skills and practices; construction of niche
2.) which further scaffolds the accumulation/modification of increasingly complex skills
and practices.
X is the underlying skill which is the pre-condition for the accumulation of cultural capital.
The cultural capital acquired at step 1 by the Nth generation is modified at step 2 by the
(N+1)th generation. The greater the number of repetitions of this process, that is, the greater
the number of turns of the ratchet (i.e. the greater the value of N), the richer, more refined and more
advanced are the types of skills available. More cognitively demanding skills require a large
amount of exogenous scaffolding, and thus are more likely to come about later on unless they
are a pre-condition for this process32. Tomasello suggests that 'these facts provide a sufficient
explanation for the existence of many of the most distinctive cognitive products that human
beings produce'33. It should also be noted that Tomasello's model of cultural skill acquisition
is probably recapitulated on the level of the maturing skill-acquiring individual34. Sterelny's
significant contribution to Tomasello's story is his application of this process of cultural
learning to understanding other agents, and in particular to deceptive agents.
The work of Arbib into the relationship between mirror systems and language has suggested
similar conclusions, although his sketch of the chronology of evolved skills links up where
Tomasello's starts. Summarised briefly, Arbib sees sophisticated cognitive skills as the end
point of the following chronology in stages (s):35
s1: Grasping
s2: Mirror system for grasping. Shared with common ancestor of humans and monkeys
s3: Simple imitation system for object-directed grasping. Shared with humans and chimps
s4: Complex imitation system for grasping. Hominid line only.
The final stage is claimed to 'involve little if any biological evolution, but instead to result
from historical evolution (historical change)' and is specific to Homo sapiens. It is clear that
this final stage is where Tomasello's and Sterelny's accounts of cultural evolution and
epistemic engineering enter the story.
The shared thesis between Sterelny, Tomasello and Arbib, in contrast to those who argue for
nativism with respect to cognitive capacities, is that many of these capacities are the product
of cultural learning scaffolded on general brain endowments.
32
perhaps such as language
33
M. Tomasello, see n 31 above, p 513
34
Although the repertoire of acquirable skills for the individual will obviously be composed of both self-directed
trial and error learning as well as culturally acquired skills
35
Arbib The Mirror System Hypothesis. Linking Language to Theory of Mind p2
Conclusion
In contrast to the position of the evolutionary psychologists, this essay has argued that
domain-general learning mechanisms are sufficient for acquiring high-level cognitive skills,
and in particular the ability to understand other agents. But this argument only goes through if
the brain is granted some specific endowments. The mirror systems whose existence was
discovered by the empirical studies of Gallese constitute such an endowment. This mechanism allows
agents to be perceptually tuned to interpretively salient behaviour of other agents; and
embodied simulation allows agents direct access to the intentions and goals of other agents.
However, this mechanism only makes other agents' intentions, emotions and sensations
translucent. The strategic environment is one where agents lie, fake emotions, and pretend to
be other than what they are. This mechanism thus needs to be supplemented by culturally
mediated learning which allows agents to interpret others. Mirror systems provide Homo
sapiens with the ability to engage in imitative learning, which allows the accumulation of
cultural capital, including sophisticated interpretive strategies. Sterelny's model of epistemic
engineering allows these skills to be exogenously scaffolded such that they can be acquired
with the requisite high fidelity. Agents may lie, cheat and fake their intentions, but cultural
learning gives interpreting agents the skills to deal with these strategies.