The Old Bailey Corpus
  Spoken English in the 18th and
         19th centuries
   The use of historical court records in
   the investigation of language change
           Digital History Seminar, 21 February 2012

Magnus Huber
Department of English
University of Giessen
Otto-Behaghel-Str. 10B
D-35394 Giessen, Germany
magnus.huber@anglistik.uni-giessen.de
Structure
1. Introduction
  1.1 Corpus linguistics, sociolinguistics and
      sociohistorical linguistics
  1.2 The Proceedings of the Old Bailey
  1.3 Turning the Proceedings into a linguistic corpus
2. How linguistically accurate is OBC?
  2.1   Comparison with alternative accounts
  2.2   Language event and its representation
  2.3   Internal consistency: negative contraction
  2.4   Sociolinguistic potential: relative clauses
3. Brief summary
1. Introduction
1.1 Corpus linguistics, sociolinguistics and
     sociohistorical linguistics
Definition of linguistic corpus
Generally speaking, a (usually large) collection of machine-readable
texts used as a database in linguistic analyses
Importance of spoken language
Spoken language precedes written language
Peter Trudgill (1974)
The social differentiation of English in Norwich
[Figure: bar chart, percentage of (ng):[n] by social class and sex (female, male); y-axis 0 to 100; classes MMC, LMC, UWC, MWC, LWC]
      MMC   middle middle class
      LMC   lower middle class
      UWC   upper working class
      MWC   middle working class
      LWC   lower working class
      Example: drinking, (ng):[n] = [drɪŋkɪn]
Historical linguistics: language change
ye > you in subject position
when ye come set it in sech rewle as ye seeme best (1465)
And thus in hast fare you hartely well (1545)
Sociohistorical linguistics
Gender-related change: ye > you
1.2 The Proceedings of the Old Bailey


•   Old Bailey = London's Central Criminal Court
•   meets 8 times/year, from 1830s 10 times/year
•   "Proceedings" published 1674-1913
•   start as a commercial enterprise: publishers
    send scribes into courtroom
•   proceedings taken down in shorthand
•   sold privately by publishers
•   City of London gains more and more control
    during 18th century
• 2100+ volumes
• ca. 200,000 trials
• ca. 134 million words
www.oldbaileyonline.org
Original computerized Proceedings (Sheffield)
<unit id="t17330510-1"><trial><info>
<identifier>t17330510-1</identifier>
<source>173305100002</source>
<header>Sarah Sanders, theft: specified place, 10 May 1733.</header>
<pfro>17330510</pfro><ntrial>2</ntrial>
<psession>17330404</psession><nsession>17330628</nsession>
</info>
<p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>,
was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold,
value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones,
value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift,
and 2 Ells of Holland, the Goods of
<person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim> </person>,
in his House</theft></off>, <cd>March 4</cd>.</p>
<p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me
very well recommended, but had not staid above ten Weeks before several [. . .]
Sociolinguistically useful XML-tags
in Sheffield Proceedings
• name
   <given>Sarah</given> <surname>Sanders</surname>
• year
   <identifier>t17180110-1</identifier>
• gender
   <defend gender="f">
• age
   <age>43</age>
• profession
   <deflabel>Servant</deflabel>
• origin
   <crimeloc>Tottenham</crimeloc>
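The tags listed above can be read out with a standard XML parser. The following is a minimal sketch, assuming a well-formed fragment modelled on the Sarah Sanders excerpt (the closing tags and the shortened text are added here for illustration) and assuming that the year is taken from characters 2 to 5 of the identifier. The function name `trial_metadata` is hypothetical and not part of the Sheffield or OBC tooling.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the excerpt shown in the slides.
# The closing </trial></unit> tags are added so the fragment parses.
SAMPLE = """<unit id="t17330510-1"><trial><info>
<identifier>t17330510-1</identifier></info>
<p>1. <person gender="f"><defend gender="f"><given>Sarah </given>
<surname>Sanders </surname></defend></person>, was indicted ...</p>
<p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>,
she came to me very well recommended ...</p>
</trial></unit>"""


def trial_metadata(xml_text):
    """Collect the sociolinguistically useful fields of one trial."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(".//identifier")
    defendant = root.find(".//defend")
    return {
        "id": identifier,
        # Assumption: identifiers look like t + YYYYMMDD + -n
        "year": int(identifier[1:5]),
        "gender": defendant.get("gender"),
        "given": defendant.findtext("given").strip(),
        "surname": defendant.findtext("surname").strip(),
        "profession": root.findtext(".//deflabel"),
    }


meta = trial_metadata(SAMPLE)
print(meta)
```

A real trial can contain several defendants and no `<deflabel>` at all, so a fuller version would iterate over all `<defend>` elements and tolerate missing fields.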
1.3 Turning the Proceedings
    into a linguistic corpus of
    early spoken English




<unit id="t17330510-1"><trial><info><identifier>t17330510-
1</identifier><source>173305100002</source><header>Sa
rah Sanders, theft: specified place, 10 May 1733.</header>
<pfro>17330510</pfro><ntrial>2</ntrial><psession>173304
04</psession><nsession>17330628</nsession></info>
<p>1. <person gender="f"><defend
gender="f"><given>Sarah </given><surname>Sanders
</surname></defend></person>, was indicted for
<off><theft type="specified place">stealing a Portugal Piece
of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set
with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle,
value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of
                <speech>
Holland, the Goods of <person gender="m"><victim
gender="m"><given>John </given><surname>Underwood
</surname></victim> </person>, in his House</theft></off>,
<cd>March 4</cd>.</p>
<p>John Underwood. The Prisoner was my
<deflabel>Servant</deflabel>, she came to me very well
recommended, but had not staid above ten Weeks before
several [. . .]
Tagging spoken language
• Need for automatic annotation
• Perl script identifying non-linguistic
  patterns indicating spoken language
  in the original proceedings
  – layout
  – metalinguistic information
• Linguistic markers indicating spoken
  language? > 1st + 2nd person prns
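The question of linguistic markers can be illustrated with a simple pronoun count. This is only a sketch of the idea, not the Perl script mentioned above: the pronoun list and the function name `pronoun_share` are assumptions made for illustration, and any cut-off value for "spoken" would have to be calibrated on real data.

```python
import re

# Assumed list of 1st and 2nd person pronoun forms (including older forms).
FIRST_SECOND_PERSON = {
    "i", "me", "my", "mine", "myself",
    "we", "us", "our", "ours", "ourselves",
    "you", "your", "yours", "yourself", "yourselves",
    "thou", "thee", "thy", "thine", "ye",
}


def pronoun_share(text):
    """Return the share of tokens that are 1st/2nd person pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(token in FIRST_SECOND_PERSON for token in tokens)
    return hits / len(tokens)


testimony = "The Prisoner was my Servant, she came to me very well recommended"
indictment = "was indicted for stealing a Portugal Piece of Gold"
print(pronoun_share(testimony), pronoun_share(indictment))
```

In the two sample strings, taken from the Sarah Sanders trial, the testimony has a non-zero share and the indictment has none, which is the kind of contrast such a marker would rely on.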
Automatic speech tagging
  e.g. "Q. – A."-sequences
    Q. <speech>Did you see him on Sunday night?</speech> - A.
    <speech>Yes, at Walworth, on Sunday night, the
    12th of January, at one o'clock - I am sure
    of that.</speech></p>
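The "Q. – A." layout pattern lends itself to a regular expression. The sketch below is written in Python rather than Perl and is not the project's script: the function name `tag_qa` and the exact pattern are assumptions. It treats "Q." and "A." as turn markers only at the start of the text or after a spaced hyphen, so that the hyphen inside the answer is left alone.

```python
import re

# A turn marker is "Q." or "A." at the start or after " - ".
TURN = re.compile(r"(?:^|\s-\s)([QA])\.\s+")


def tag_qa(text):
    """Wrap each question and answer in <speech> tags."""
    parts = TURN.split(text.strip())
    if len(parts) < 3:
        return text  # no Q./A. markers found, leave the text untouched
    prefix = parts[0]
    turns = [
        f"{label}. <speech>{utterance.strip()}</speech>"
        for label, utterance in zip(parts[1::2], parts[2::2])
    ]
    tagged = " - ".join(turns)
    return f"{prefix} - {tagged}" if prefix else tagged


raw = ("Q. Did you see him on Sunday night? - A. Yes, at Walworth, "
       "on Sunday night, the 12th of January, at one o'clock - "
       "I am sure of that.")
print(tag_qa(raw))
```

A pattern this simple would also fire on initials such as "A." in a name after a hyphen, which is one reason automatic tagging of this kind needs manual checking.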
Sociobiographical speech event annotation
The New Bailey Tag Assistant




Social data file
• XML format
• attributes of every speaker in OBC
• plus: scribe, printer, publisher

<xml>
  <document name="19100426">
    ...
    <speaker id="271">
      <sex>m</sex>
      <age></age>
      <given>Thomas</given>
      <surname>Tuckey</surname>
      <occupation>Warder</occupation>
      <occupation2></occupation2>
      <hiscolabel>Prison Guard</hiscolabel>
      <hiscocode>58930</hiscocode>
      <hiscolabel2></hiscolabel2>
      <hiscocode2></hiscocode2>
      <crimescene></crimescene>
      <birthplace></birthplace>
      <workplace>Wormwood Scrubs Prison</workplace>
      <placeofresidence></placeofresidence>
      <role>witness</role>
    </speaker>
    ...
  </document>
</xml>
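A speaker record of this shape can be loaded into a lookup table keyed by speaker id. This is a minimal sketch, assuming a reduced copy of the Thomas Tuckey record with `<document>` as the root element; the function name `load_speakers` is hypothetical. Empty elements such as `<age></age>` are mapped to `None` so that missing values are not mistaken for data.

```python
import xml.etree.ElementTree as ET

# Reduced stand-in for the social data file (one speaker, fewer fields).
SAMPLE = """<document name="19100426">
  <speaker id="271">
    <sex>m</sex>
    <age></age>
    <given>Thomas</given>
    <surname>Tuckey</surname>
    <occupation>Warder</occupation>
    <hiscolabel>Prison Guard</hiscolabel>
    <hiscocode>58930</hiscocode>
    <workplace>Wormwood Scrubs Prison</workplace>
    <role>witness</role>
  </speaker>
</document>"""


def load_speakers(xml_text):
    """Map speaker id -> dict of attributes, empty fields as None."""
    speakers = {}
    for speaker in ET.fromstring(xml_text).iter("speaker"):
        record = {
            child.tag: (child.text or "").strip() or None
            for child in speaker
        }
        speakers[speaker.get("id")] = record
    return speakers


speakers = load_speakers(SAMPLE)
print(speakers["271"])
```

With such a table, every tagged speech event that carries a speaker id can be filtered by sex, role or HISCO code.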
2. How linguistically accurate is OBC?
2.1. Comparison with alternative accounts, e.g.
     trial of John Ayliffe, 17591024-27, vs. alternative
     account The tryal at large of John Ayliffe

Proceedings (718 words)           Tryal (1290 words)
Thomas. I am clerk to Mr Jones,   Henry Thomas. I am clerk to Mr
a Stationer in the Temple.        Jones, a Stationer, in the Temple.
Hargrave. By Mr Ayliffe: I saw    Walter Hargrave. By Mr Ayliffe. – I
him seal and deliver it.          saw him sign, seal, and deliver it, as
                                  his act and deed.
./.                               John Fannen. I am not sure; but to
                                  the best of my remembrance, it was
                                  sometime the beginning of
                                  December last, at Mr Fox's house.
Proceedings (718 words)              Tryal (1290 words)
Hargrave. Because he said he         Walter Hargrave. The reason Mr
was not willing Mr Fox should        Ayliffe gave, was, that he would not
know of it?                          on any account have it come to Mr
                                     Fox's ears.
Thomas. I can't particularly say     Henry Thomas. I cannot positively
that; sometimes we leave a           say. – We sometimes leave out the
blank by the gentlemens desire,      conclusion by gentlemen's desire, in
perhaps they may add another         order that they may add a covenant,
covenant, or something of that       or some such thing, if it should be
sort, I can't recollect the reason   thought necessary; but I cannot
for that.                            particularly recollect the reason why
                                     the conclusion was omitted in this
                                     case.


2.2 Language event ↔ written representation

Letters
formulation → writing

Trial proceedings (e.g. Old Bailey Proceedings)
speech event → perception by scribe → shorthand script → expanding shorthand → proof reading → type setting
Gurney (1752)
Brachygraphy: or short-writing
'to take a Speech, or Sermon verbatim, as a Person talks in common' (p. 3)

Scribes
Thomas Gurney (1749-1770)
Joseph Gurney (1770-1782)
Recording linguistic details
• no distinction between inflected and
  uninflected auxiliaries
  [shorthand sign] = 'may' or 'mayst'
  [shorthand sign] = 'can' or 'canst'
  [shorthand sign] = 'should' or 'shouldst'
• dot placed on the top left of the noun phrase
  = allomorphs a and an
• auxiliary contractions
  [shorthand sign] 'you will' (you w-il) vs. [shorthand sign] 'you'll' (you-l)
  but │ 'it will' ~ 'twill' (│ = <t> and it)
2.3 Internal consistency:
     negative contraction
     e.g. do not > don't, need not > needn't, was not > wasn't
     N = 1,344,244
     [Figure: bar chart of NEG contraction in % (y-axis 0 to 18) for the periods 1732-1759, 1760-1789, 1790-1819, 1820-1849, 1850-1879, 1880-1913]
Negative contraction in the
OBC, 1732-1912 1. Lexeme?
AUX form    % contr.       N   AUX form % contr.        N
do not       28.9    189,776   is not     0.2      47,142
will not     27.7     17,302   must not   0.2       1,620
shall not    20.6      4,172   would not  0.2      52,123
cannot       13.3    106,005   had not    0.1      72,395
are not       3.2     11,552   has not    0.1       9,244
dare not      3.1        260   should not 0.1      20,192
need not      0.6      2,136   was not    0.1      64,574
did not       0.4    429,143   may not    0.0       1,271
does not      0.4      9,539   might not  0.0       2,404
have not      0.4     44,038   ought not  0.0       1,221
could not     0.2     85,361
Negative contraction in the
OBC, 1732-1912 2. Frequency?
Negative contraction in the
OBC, 1732-1912 3. Tense?
Explaining the absence of
negative contraction
• combination of phonology and genre
• n't is phonetically reduced, less salient than not
• do-don't [u - o(u)] vs. did-didn't [ɪ - ɪ]
  can-can't            vs. could-couldn't
  will-won't           vs. would-wouldn't
  shall-shan't         vs. should-shouldn't
• negative contraction is (near) absent where the
  context (e.g. change in the stem vowel in the
  negative) does not allow disambiguation
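The contrast between the four auxiliary pairs listed above can be read directly off the contraction percentages given in the table of auxiliary forms. The short sketch below only restates those published percentages and computes the gap in percentage points per pair; the variable and function names are chosen for illustration.

```python
# Contraction rates in % for the negated forms, taken from the table
# of auxiliary forms (e.g. "do" stands for do not / don't).
RATES = {
    "do": 28.9, "did": 0.4,
    "can": 13.3, "could": 0.2,
    "will": 27.7, "would": 0.2,
    "shall": 20.6, "should": 0.1,
}
PAIRS = [("do", "did"), ("can", "could"), ("will", "would"), ("shall", "should")]


def contraction_gap(pairs, rates):
    """Percentage-point gap between the two members of each pair."""
    return {f"{a}/{b}": round(rates[a] - rates[b], 1) for a, b in pairs}


gaps = contraction_gap(PAIRS, RATES)
for pair, gap in gaps.items():
    print(f"{pair}: {gap} percentage points")
```

In every pair the form whose negative differs audibly from the positive (don't, can't, won't, shan't) is recorded as contracted far more often than its counterpart.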
Hierarchy of perceptive difference
      between positive and negative
             contracted forms
                 V change   C change/   Score
                            addition
do-don('t)           1           1        2
will-won('t)         1           1        2
shall-shan('t)       0.5         1        1.5

can-can('t)          0.5         0        0.5
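The score column is the sum of the vowel-change and consonant-change values. As a small worked check, the sketch below recomputes the scores and compares their order with the contraction rates reported for the same four auxiliaries in the table of auxiliary forms; the names used are illustrative only.

```python
# (vowel change, consonant change/addition) as given in the hierarchy.
CHANGES = {
    "do": (1, 1),
    "will": (1, 1),
    "shall": (0.5, 1),
    "can": (0.5, 0),
}
# Contraction rates in % for do not, will not, shall not, cannot.
RATES = {"do": 28.9, "will": 27.7, "shall": 20.6, "can": 13.3}

scores = {aux: vowel + consonant for aux, (vowel, consonant) in CHANGES.items()}


def consistent(scores, rates):
    """True if a higher score never goes with a lower contraction rate."""
    return all(
        rates[a] > rates[b]
        for a in scores
        for b in scores
        if scores[a] > scores[b]
    )


print(scores, consistent(scores, RATES))
```

For these four forms the ranking by score and the ranking by contraction rate agree, which is what the hierarchy is meant to capture.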
2.4 Sociolinguistic potential: relative
     clauses
 • random extracts of speech events from OBC:
   20,000 words/decade (10,000 w. each for m + f)
 • 2500+ relative clauses, of which 1533 restrictive
         1720-1779      %   1780-1839      %   1840-1913      %       ∑      %
that           259   53.8         240   45.4         136   26.0     635   41.4
zero           107   22.2         118   22.3         201   38.4     426   27.8
which           70   14.6          97   18.3          92   17.6     259   16.9
who             38    7.9          69   13.0          89   17.0     196   12.8
whom             6    1.2           2    0.4           5    1.0      13    0.8
whose            1    0.2           3    0.6           0    0.0       4    0.3
∑              481                529                523           1533
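The percentage columns in the table above are column shares: each count divided by the total number of restrictive relative clauses in that period. A minimal sketch that recomputes them from the raw counts (the data structure and function name are illustrative):

```python
PERIODS = ["1720-1779", "1780-1839", "1840-1913"]
COUNTS = {
    "that":  [259, 240, 136],
    "zero":  [107, 118, 201],
    "which": [70, 97, 92],
    "who":   [38, 69, 89],
    "whom":  [6, 2, 5],
    "whose": [1, 3, 0],
}


def column_shares(counts):
    """Return period totals and per-relativizer shares in %."""
    totals = [sum(col) for col in zip(*counts.values())]
    shares = {
        rel: [round(100 * n / total, 1) for n, total in zip(row, totals)]
        for rel, row in counts.items()
    }
    return totals, shares


totals, shares = column_shares(COUNTS)
print(dict(zip(PERIODS, totals)))
print(shares["that"], shares["zero"])
```

The recomputed shares show the main trend at a glance: that falls from over half of all restrictive relativizers to about a quarter, while zero rises in the last period.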
Diagram 1 Distribution of that with regard to
          animacy of the head
[Stacked bar chart, 0% to 100%; underlying counts:]
             1720-1779   1780-1839   1840-1913
non-human       121         164         105
human           137          76          31
1720-1779 vs 1780-1839 p = 0.000
1720-1779 vs 1840-1913 p = 0.000
1780-1839 vs 1840-1913 p = 0.070
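The slides do not say which significance test produced these p-values. As a sketch under that caveat, the following assumes a Pearson chi-square test on a 2x2 table (one degree of freedom, no continuity correction) and uses only the standard library. Results computed this way need not match the reported values exactly; for the last comparison, for instance, this method gives a value slightly below the reported 0.070.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square and p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: non-human, human heads; columns: the two periods compared.
early_vs_mid = chi2_2x2(121, 164, 137, 76)   # 1720-1779 vs 1780-1839
early_vs_late = chi2_2x2(121, 105, 137, 31)  # 1720-1779 vs 1840-1913
mid_vs_late = chi2_2x2(164, 105, 76, 31)     # 1780-1839 vs 1840-1913

for label, (chi2, p) in [("early vs mid", early_vs_mid),
                         ("early vs late", early_vs_late),
                         ("mid vs late", mid_vs_late)]:
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.3f}")
```

Under this assumption the first two comparisons are clearly significant and the third is not at the 5% level, which agrees with the pattern reported in the diagram.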
Diagram 2 Distribution of that and pronominal
          relativizers with human heads
[Stacked bar chart, 0% to 100%; underlying counts:]
          1720-1779   1780-1839   1840-1913
PRN           49          72          93
that         137          76          31
1720-1779 vs 1780-1839: p = 0.000
1720-1779 vs 1840-1913: p = 0.000
1780-1839 vs 1840-1913: p = 0.000
Diagram 3 Relativizers by gender (excl. genitives)
[Stacked bar chart, 0% to 100%; underlying counts:]
          1720-1779      1780-1839      1840-1913
            f     m        f     m        f     m
PRN        43    71       56   112       66   119
zero       53    54       66    52      110    73
that      124   134      108   132       72    64
p          0.135          0.001          0.000
f 1720-1779 vs 1780-1839: p = 0.135   m 1720-1779 vs 1780-1839: p = 0.033
f 1720-1779 vs 1840-1913: p = 0.000   m 1720-1779 vs 1840-1913: p = 0.000
f 1780-1839 vs 1840-1913: p = 0.000   m 1780-1839 vs 1840-1913: p = 0.000
Diagram 4 Zero relativizer by gender (excl. genitives)
[Stacked bar chart, 0% to 100%; underlying counts:]
          1720-1779      1780-1839      1840-1913
            f     m        f     m        f     m
other     167   205      164   244      138   173
zero       53    54       66    52      110    73
f 1720-1779 vs 1780-1839: p = 0.268   m 1720-1779 vs 1780-1839: p = 0.326
f 1720-1779 vs 1840-1913: p = 0.000   m 1720-1779 vs 1840-1913: p = 0.022
f 1780-1839 vs 1840-1913: p = 0.000   m 1780-1839 vs 1840-1913: p = 0.001
Thank you




References
• Gurney, Thomas. 1752. Brachygraphy: or short-writing.
  2nd ed. London: [no publisher].
• Nevalainen, Terttu & Raumolin-Brunberg, Helena (eds).
  1996. Sociolinguistics and language history: studies
  based on the corpus of early English correspondence.
  Amsterdam: Rodopi.
• Trudgill, Peter. 1974. The Social Differentiation of
  English in Norwich. Cambridge: Cambridge University
  Press.
• van Leeuwen, Marco H.D., Ineke Maas and Andrew
  Miles. 2002. HISCO: Historical international standard
  classification of occupations. Leuven: Leuven University
  Press.

Seal of Good Local Governance (SGLG) 2024Final.pptx
 

Magnus Huber - The Old Bailey Corpus: Spoken English in the 18th and 19th Centuries

  • 1. The Old Bailey Corpus: Spoken English in the 18th and 19th centuries. The use of historical court records in the investigation of language change. Digital History Seminar, 21 February 2012. Magnus Huber, Department of English, University of Giessen, Otto-Behaghel-Str. 10B, D-35394 Giessen, Germany, magnus.huber@anglistik.uni-giessen.de
  • 2. Structure
      1. Introduction
         1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics
         1.2 The Proceedings of the Old Bailey
         1.3 Turning the Proceedings into a linguistic corpus
      2. How linguistically accurate is OBC?
         2.1 Comparison with alternative accounts
         2.2 Language event and its representation
         2.3 Internal consistency: negative contraction
         2.4 Sociolinguistic potential: relative clauses
      3. Brief summary
  • 3. 1. Introduction. 1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics
      • Definition of linguistic corpus: generally speaking, a (usually large) collection of machine-readable texts used as a database in linguistic analyses
      • Importance of spoken language: spoken language precedes written language
  • 4. Peter Trudgill (1974), The social differentiation of English in Norwich
      [Figure: percentage of (ng):[n] by social class and sex (female, male); y-axis 0-100; x-axis MMC, LMC, UWC, MWC, LWC]
      • MMC = middle middle class, LMC = lower middle class, UWC = upper working class, MWC = middle working class, LWC = lower working class
      • Example: drinking, (ng):[n] = [drɪnkɪn]
  • 5. Historical linguistics: language change. ye > you in subject position
      • "when ye come set it in sech rewle as ye seeme best" (1465)
      • "And thus in hast fare you hartely well" (1545)
  • 7. 1.2 The Proceedings of the Old Bailey
      • Old Bailey = London's Central Criminal Court
      • meets 8 times/year; from the 1830s, 10 times/year
      • "Proceedings" published 1674-1913
      • start as a commercial enterprise: publishers send scribes into the courtroom
      • proceedings taken down in shorthand
      • sold privately by publishers
      • City of London gains more and more control during the 18th century
  • 8. 2100+ volumes; ca. 200,000 trials; ca. 134 million words
  • 10. Original computerized Proceedings (Sheffield)
      <unit id="t17330510-1"><trial><info>
        <identifier>t17330510-1</identifier>
        <source>173305100002</source>
        <header>Sarah Sanders, theft: specified place, 10 May 1733.</header>
        <pfro>17330510</pfro><ntrial>2</ntrial>
        <psession>17330404</psession><nsession>17330628</nsession>
      </info>
      <p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim></person>, in his House</theft></off>, <cd>March 4</cd>.</p>
      <p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]
  • 12. Sociolinguistically useful XML-tags in Sheffield Proceedings
      • name: <given>Sarah</given> <surname>Sanders</surname>
      • year: <identifier>t17180110-1</identifier>
      • gender: <defend gender="f">
      • age: <age>43</age>
      • profession: <deflabel>Servant</deflabel>
      • origin: <crimeloc>Tottenham</crimeloc>
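The tags listed on this slide can be read with any XML parser. The following is a minimal Python sketch, not part of the original talk: the fragment is a shortened, made-well-formed version of the Sarah Sanders example, and the function name `defendant_profile` is hypothetical. It assumes the year is encoded in characters 2-5 of the trial identifier, as the identifiers on the slides suggest.

```python
import xml.etree.ElementTree as ET

# Shortened fragment modelled on the slide's example (assumption: real files
# are much longer and contain further tags such as <age> and <crimeloc>).
FRAGMENT = """<unit id="t17330510-1"><trial>
<info><identifier>t17330510-1</identifier></info>
<p>1. <person gender="f"><defend gender="f"><given>Sarah </given>
<surname>Sanders </surname></defend></person>, was indicted for theft.
The Prisoner was my <deflabel>Servant</deflabel>.</p>
</trial></unit>"""


def defendant_profile(xml_text):
    """Collect the sociolinguistic attributes of the first defendant."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(".//identifier", default="").strip()
    defend = root.find(".//defend")  # assumes at least one <defend> element
    return {
        "year": int(identifier[1:5]),  # "t17330510-1" -> 1733
        "given": defend.findtext("given", default="").strip(),
        "surname": defend.findtext("surname", default="").strip(),
        "gender": defend.get("gender"),
        "profession": root.findtext(".//deflabel", default="").strip(),
    }


profile = defendant_profile(FRAGMENT)
print(profile)
```

The same pattern extends to the other tags on the slide by adding further `findtext` lookups.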
  • 13. 1.3 Turning the Proceedings into a linguistic corpus of early spoken English
  • 14. <unit id="t17330510-1"><trial><info>
        <identifier>t17330510-1</identifier>
        <source>173305100002</source>
        <header>Sarah Sanders, theft: specified place, 10 May 1733.</header>
        <pfro>17330510</pfro><ntrial>2</ntrial>
        <psession>17330404</psession><nsession>17330628</nsession>
      </info>
      <p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim></person>, in his House</theft></off>, <cd>March 4</cd>.</p>
      <p>John Underwood. <speech>The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]
  • 15. Tagging spoken language
      • Need for automatic annotation
      • Perl script identifying non-linguistic patterns indicating spoken language in the original proceedings
         – layout
         – metalinguistic information
      • Linguistic markers indicating spoken language? > 1st + 2nd person pronouns
  • 16. Automatic speech tagging, e.g. "Q. – A."-sequences
      Q. <speech>Did you see him on Sunday night?</speech> - A. <speech>Yes, at Walworth, on Sunday night, the 12th of January, at one o'clock - I am sure of that.</speech></p>
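The "Q. – A." pattern above lends itself to a regular-expression sketch. The talk mentions a Perl script; the Python version below is only an illustration of the idea and not the author's code. The pattern `TURN` and the function `tag_speech` are hypothetical names, and the sketch assumes that a turn ends at the next " - A. ", at the next " Q. ", or at the end of the paragraph.

```python
import re

# A turn starts with "Q." or "A." and runs up to the next turn marker or to
# the end of the paragraph (assumption: markers are always followed by a space).
TURN = re.compile(r"\b(Q|A)\.\s+(.+?)(?=\s+-\s+A\.\s|\s+Q\.\s|\s*$)", re.DOTALL)


def tag_speech(paragraph):
    """Wrap the text of every question and answer in <speech> tags."""
    return TURN.sub(
        lambda m: f"{m.group(1)}. <speech>{m.group(2)}</speech>", paragraph
    )


sample = (
    "Q. Did you see him on Sunday night? - A. Yes, at Walworth, "
    "on Sunday night - I am sure of that."
)
tagged = tag_speech(sample)
print(tagged)
```

Note that the dash inside the answer ("night - I am sure") is not mistaken for a turn boundary, because a boundary requires the dash to be followed by "A.".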
  • 17. Sociobiographical speech event annotation: The New Bailey Tag Assistant
  • 18. Social data file
      • XML format
      • attributes of every speaker in OBC
      • plus: scribe, printer, publisher
      <xml>
        <document name="19100426">
          ...
          <speaker id="271">
            <sex>m</sex>
            <age></age>
            <given>Thomas</given>
            <surname>Tuckey</surname>
            <occupation>Warder</occupation>
            <occupation2></occupation2>
            <hiscolabel>Prison Guard</hiscolabel>
            <hiscocode>58930</hiscocode>
            <hiscolabel2></hiscolabel2>
            <hiscocode2></hiscocode2>
            <crimescene></crimescene>
            <birthplace></birthplace>
            <workplace>Wormwood Scrubs Prison</workplace>
            <placeofresidence></placeofresidence>
            <role>witness</role>
          </speaker>
          ...
        </document>
      </xml>
  • 19. 2. How linguistically accurate is OBC? 2.1 Comparison with alternative accounts, e.g. trial of John Ayliffe, 17591024-27, vs. the alternative account The tryal at large of John Ayliffe. Proceedings: 718 words; Tryal: 1290 words.
      • Proceedings: Thomas. I am clerk to Mr Jones, a Stationer in the Temple.
        Tryal: Henry Thomas. I am clerk to Mr Jones, a Stationer, in the Temple.
      • Proceedings: Hargrave. By Mr Ayliffe: I saw him seal and deliver it.
        Tryal: Walter Hargrave. By Mr Ayliffe. – I saw him sign, seal, and deliver it, as his act and deed.
      • Proceedings: ./.
        Tryal: John Fannen. I am not sure; but to the best of my remembrance, it was sometime the beginning of December last, at Mr Fox's house.
  • 20. Proceedings (718 words) vs. Tryal (1290 words), continued
      • Proceedings: Hargrave. Because he said he was not willing Mr Fox should know of it?
        Tryal: Walter Hargrave. The reason Mr Ayliffe gave, was, that he would not on any account have it come to Mr Fox's ears.
      • Proceedings: Thomas. I can't particularly say that; sometimes we leave a blank by the gentlemens desire, perhaps they may add another covenant, or something of that sort, I can't recollect the reason for that.
        Tryal: Henry Thomas. I cannot positively say. – We sometimes leave out the conclusion by gentlemen's desire, in order that they may add a covenant, or some such thing, if it should be thought necessary; but I cannot particularly recollect the reason why the conclusion was omitted in this case.
  • 21. 2.2 Language event ↔ written representation
      • Letters: formulation > writing
      • Trial proceedings (e.g. Old Bailey Proceedings): speech event > perception by scribe > shorthand script > expanding shorthand > proof reading > type setting
  • 22. Gurney (1752), Brachygraphy: or short-writing: 'to take a Speech, or Sermon verbatim, as a Person talks in common' (p. 3)
      • Scribes: Thomas Gurney (1749-1770), Joseph Gurney (1770-1782)
  • 23. Recording linguistic details
      • no distinction between inflected and uninflected auxiliaries: one shorthand sign each for 'may' or 'mayst', 'can' or 'canst', 'should' or 'shouldst' [shorthand signs not reproduced]
      • dot placed on the top left of the noun phrase = allomorphs a and an
      • auxiliary contractions: 'you will' (you w-il) vs. 'you'll' (you-l), but │ 'it will' ~ 'twill' (│ = <t> and it)
  • 24. 2.3 Internal consistency: negative contraction, e.g. do not > don't, need not > needn't, was not > wasn't (N = 1,344,244)
      [Figure: NEG contraction in %, y-axis 0-18, by period: 1732-1759, 1760-1789, 1790-1819, 1820-1849, 1850-1879, 1880-1913]
  • 25. Negative contraction in the OBC, 1732-1912. 1. Lexeme?
      AUX form      % contr.         N
      do not          28.9     189,776
      will not        27.7      17,302
      shall not       20.6       4,172
      cannot          13.3     106,005
      are not          3.2      11,552
      dare not         3.1         260
      need not         0.6       2,136
      did not          0.4     429,143
      does not         0.4       9,539
      have not         0.4      44,038
      could not        0.2      85,361
      is not           0.2      47,142
      must not         0.2       1,620
      would not        0.2      52,123
      had not          0.1      72,395
      has not          0.1       9,244
      should not       0.1      20,192
      was not          0.1      64,574
      may not          0.0       1,271
      might not        0.0       2,404
      ought not        0.0       1,221
  • 26. Negative contraction in the OBC, 1732-1912. 2. Frequency? (same table as on slide 25)
  • 27. Negative contraction in the OBC, 1732-1912. 3. Tense? (same table as on slide 25)
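The "% contr." figures in the table on slides 25-27 are the share of contracted forms among all negated tokens of an auxiliary. A minimal Python sketch of that calculation on raw text follows. The function name `contraction_rate` and the sample sentence are made up for illustration, and simple whole-word matching is an assumption; the talk does not say how the forms were retrieved from the corpus.

```python
import re


def contraction_rate(text, full, contracted):
    """Percentage of contracted forms among full + contracted tokens."""
    flags = re.IGNORECASE
    full_n = len(re.findall(rf"\b{re.escape(full)}\b", text, flags=flags))
    contr_n = len(re.findall(rf"\b{re.escape(contracted)}\b", text, flags=flags))
    total = full_n + contr_n
    # Return 0.0 when the auxiliary does not occur in negated form at all.
    return 100.0 * contr_n / total if total else 0.0


# Invented sample, not taken from the corpus.
sample = "I do not know him. I don't know. I did not see it. I do not recollect."
do_rate = contraction_rate(sample, "do not", "don't")
did_rate = contraction_rate(sample, "did not", "didn't")
print(f"do: {do_rate:.1f}%  did: {did_rate:.1f}%")
```

Run per auxiliary and per period, a count like this would yield figures of the kind shown in the table and in the chart on slide 24.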
  • 28. Explaining the absence of negative contraction
      • combination of phonology and genre
      • n't is phonetically reduced, less salient than not
      • do-don't [u - o(u)] vs. did-didn't [ɪ - ɪ]; can-can't vs. could-couldn't; will-won't vs. would-wouldn't; shall-shan't vs. should-shouldn't
      • negative contraction is (near) absent where the context (e.g. change in the stem vowel in the negative) does not allow disambiguation
  • 29. Hierarchy of perceptive difference between positive and negative contracted forms
      Pair               V change    C change/addition    Score
      do-don('t)            1                1              2
      will-won('t)          1                1              2
      shall-shan('t)        0.5              1              1.5
      can-can('t)           0.5              0              0.5
  • 30. 2.4 Sociolinguistic potential: relative clauses
      • random extracts of speech events from OBC: 20,000 words/decade (10,000 words each for m + f)
      • 2500+ relative clauses, of which 1533 restrictive
      Relativizer   1720-1779     %    1780-1839     %    1840-1913     %       ∑      %
      that             259      53.8      240      45.4      136      26.0     635   41.4
      zero             107      22.2      118      22.3      201      38.4     426   27.8
      which             70      14.6       97      18.3       92      17.6     259   16.9
      who               38       7.9       69      13.0       89      17.0     196   12.8
      whom               6       1.2        2       0.4        5       1.0      13    0.8
      whose              1       0.2        3       0.6        0       0.0       4    0.3
      ∑                481                529                523              1533
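The percentages on slide 30 are column shares: each relativizer count divided by the period total. As a small check of that arithmetic, here is a Python sketch using the counts from the slide; the names `COUNTS`, `column_totals` and `shares` are illustrative only.

```python
PERIODS = ["1720-1779", "1780-1839", "1840-1913"]

# Restrictive relative clauses per relativizer and period, from slide 30.
COUNTS = {
    "that": [259, 240, 136],
    "zero": [107, 118, 201],
    "which": [70, 97, 92],
    "who": [38, 69, 89],
    "whom": [6, 2, 5],
    "whose": [1, 3, 0],
}


def column_totals(counts):
    """Sum the counts of all relativizers within each period."""
    return [sum(column) for column in zip(*counts.values())]


def shares(counts):
    """Percentage of each relativizer within its period, to one decimal."""
    totals = column_totals(counts)
    return {
        relativizer: [round(100 * n / t, 1) for n, t in zip(row, totals)]
        for relativizer, row in counts.items()
    }


totals = column_totals(COUNTS)
percentages = shares(COUNTS)
for relativizer, row in percentages.items():
    print(relativizer, dict(zip(PERIODS, row)))
```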
  • 31. Diagram 1: Distribution of that with regard to animacy of the head (stacked bars, 0-100%)
                    1720-1779    1780-1839    1840-1913
      non-human        121          164          105
      human            137           76           31
      1720-1779 vs 1780-1839: p = 0.000; 1720-1779 vs 1840-1913: p = 0.000; 1780-1839 vs 1840-1913: p = 0.070
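The slides do not say which significance test produced the p-values. One common choice for comparing two periods on a two-way count is Pearson's chi-square test on a 2x2 table, sketched below in Python as an assumption rather than as the author's method. The function `chi_square_2x2` is a hypothetical helper; it applies no continuity correction and uses the fact that, with one degree of freedom, the p-value equals erfc(sqrt(statistic / 2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value) for one degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return stat, math.erfc(math.sqrt(stat / 2))


# Counts of "that" from Diagram 1: rows = non-human, human heads;
# columns = the two periods being compared.
comparisons = {
    "1720-1779 vs 1780-1839": [[121, 164], [137, 76]],
    "1780-1839 vs 1840-1913": [[164, 105], [76, 31]],
}
results = {label: chi_square_2x2(t) for label, t in comparisons.items()}
for label, (stat, p) in results.items():
    print(f"{label}: chi2 = {stat:.2f}, p = {p:.4f}")
```

Under these assumptions the first comparison comes out far below 0.001 and the second just under 0.07, close to the values on the slide. Small differences are to be expected if the original analysis used a different test or a continuity correction.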
  • 32. Diagram 2: Distribution of that and pronominal relativizers with human heads (stacked bars, 0-100%)
               1720-1779    1780-1839    1840-1913
      PRN          49           72           93
      that        137           76           31
      1720-1779 vs 1780-1839: p = 0.000; 1720-1779 vs 1840-1913: p = 0.000; 1780-1839 vs 1840-1913: p = 0.000
  • 33. Diagram 3: Relativizers by gender (excl. genitives) (stacked bars, 0-100%)
               1720-1779      1780-1839      1840-1913
                f     m        f     m        f     m
      PRN      43    71       56   112       66   119
      zero     53    54       66    52      110    73
      that    124   134      108   132       72    64
      p-values printed above the chart, in order: p = 0.135, p = 0.001, p = 0.000
      f: 1720-1779 vs 1780-1839: p = 0.135; 1720-1779 vs 1840-1913: p = 0.000; 1780-1839 vs 1840-1913: p = 0.000
      m: 1720-1779 vs 1780-1839: p = 0.033; 1720-1779 vs 1840-1913: p = 0.000; 1780-1839 vs 1840-1913: p = 0.000
  • 34. Diagram 4: Zero relativizer by gender (excl. genitives) (stacked bars, 0-100%)
               1720-1779      1780-1839      1840-1913
                f     m        f     m        f     m
      other   167   205      164   244      138   173
      zero     53    54       66    52      110    73
      f: 1720-1779 vs 1780-1839: p = 0.268; 1720-1779 vs 1840-1913: p = 0.000; 1780-1839 vs 1840-1913: p = 0.000
      m: 1720-1779 vs 1780-1839: p = 0.326; 1720-1779 vs 1840-1913: p = 0.022; 1780-1839 vs 1840-1913: p = 0.001
  • 35. Thank you
  • 36. References
      • Gurney, Thomas. 1752. Brachygraphy: or short-writing. 2nd ed. London: [no publisher].
      • Nevalainen, Terttu & Raumolin-Brunberg, Helena (eds). 1996. Sociolinguistics and language history: studies based on the corpus of early English correspondence. Amsterdam: Rodopi.
      • Trudgill, Peter. 1974. The Social Differentiation of English in Norwich. Cambridge: Cambridge University Press.
      • van Leeuwen, Marco H.D., Ineke Maas and Andrew Miles. 2002. HISCO: Historical international standard classification of occupations. Leuven: Leuven University Press.