Zakkiyyah Ansari Maria Fernandez  Christopher Hercules A Comparison of Three Corpora from Internet-based Registers
Introduction/Background Email has effectively replaced the letter in recent years as computer-mediated communication has become more frequent. Being neither quite a letter nor quite conversation, email has been described by some as “written speech” or “computer conversation” (Perez, Turney, & Montero, 2008). Such a classification creates a new register for corpus linguists to examine. The ENRON corpus, an example of this register, is the largest real email corpus (Yang, Zeng, & Chau, 2007). It contains e-mails written for a variety of purposes (not all of them professional), which makes it a very versatile corpus to work with: many forms of communication are represented, furthering the notion of a register between writing and conversation. Similarly, another common web-based register that corpus linguists can examine is online news. Slate is an online magazine that has never appeared in print. Its topics include arts, business, sports, news, and politics, although it is primarily politically focused. It is the property of the Microsoft Corporation, which contributed 4,694 articles published between 1996 and 2000 from its archives to the American National Corpus.
Introduction/Background (cont.) Lastly, there are blogs, perhaps one of the most common web-based registers in current use. Blogs are so varied in their uses and topics that it is almost impossible to talk about them without some context. For example, blog writers use their blogs for personal journals, politics, business, academic purposes, and literary criticism, to name a few (Schmidt, 2007).
Goals of the Project
Interpret common features found in the three registers to understand their level of formality or informality.
Disprove or confirm the intuition of whether certain registers are truly more formal or informal than others.
Describe and interpret the common linguistic features of each register by utilizing Biber’s 5 major dimensions of English.
Dimensional Discourse Analysis Dimension 1: Edited informational discourse vs. on-line informational discourse Dimension 2: Involved personal discourse vs. non-personal uninvolved discourse (Biber, 1998, p. 183)
Enron Email Example Hi.  This is ***, and now ***'s, friend ***.  I know that she spoke  with you about me, and I wanted to see if you might be interested in getting  together sometime.  You probably want to know a little about me.  I work at  Enron as an electricity trader, and I've lived in Houston for about three  years.  I've known Richard 9 years from our fraternity days together.  I'm 27  and I graduated from UT 4 1/2 years ago.  Of course, I would be interested in  hearing something about you as well. *** just told me that she had a good  friend in Houston that she thought I should meet.  Let me know what you  think, and we can go from there. ***Wow, I almost forgot.  Thank you.  I will contact her soon.  Thanks *** Monday Feb 4th
An Example Slate Article Excerpt The papers report that the federal judge who presided over the Paula Jones lawsuit yesterday ordered President Clinton to pay nearly $90,000 to Jones' legal team for giving false testimony about his relationship with Monica Lewinsky. The sum is less than the Jones lawyers asked for, but more than Clinton offered. Clinton and his lawyer said that they will pay the money without further legal protest. The WP and LAT play the story below the fold, the NYT reefers it, and USAT runs it on Page 5. This play, given the story's unprecedented content, signifies that editors have just plain had it with the whole topic and are sure that readers feel likewise. The WP , NYT , LAT and Wall Street Journal report that China has issued an arrest warrant for Li Hongzhi, the New York-based leader of the now-banned sect Falun Gong, charging him with the deaths of hundreds of his followers. The papers note that the warrant is much more political than legal, in that the U.S. Does not have an extradition treaty with China. The NYT reports that immediately after last summer's terrorist bombing of two American embassies in Africa, the government of Sudan detained two suspects, only to angrily release them after the U.S. conducted a cruise missile strike against an alleged chemical weapons facility in Sudan. Also, the paper says, Sudanese officials claim that the U.S. ignored their message that they had suspects in the case in custody. The NYT says that Sudan's notification has been confirmed by some American officials. – Slate Article 247_3303 from the American National Corpus
Word Count Result/Interpretation In the chart of average word count, there is an exceptional difference across the three registers' word counts. The register with the highest average word count is the Blog (4,615.5). One possible explanation for this difference is that blogs tend to be less concise and less informal. While writing, bloggers are self-conscious about their audience. Therefore, in order to attract readers, they must use attractive language that holds a reader's attention. In that process, the writing becomes more drawn out and detail oriented. The register with the second-highest average word count is the Enron emails. After analyzing random samples of the Enron emails, it can be stated that many of them are informal and involved. Thus, both the Blog and Enron Email registers can be characterized as On-line informational in Discourse Dimension 1. In Discourse Dimension 2, the Enron emails and the blogs can be generalized to exhibit features of Involved personal discourse.
Dimension 1 Analysis Positive features to be analyzed and interpreted: 1.) First Person Pronouns 2.) Second Person Pronouns 3.) It Occurrence 4.) Private Verbs
Dimension 1 Analysis (cont.) Negative features to be analyzed and interpreted: 1.) Noun Usage 2.) Word Length
Dimension 1 Analysis (cont.) According to Biber, a higher frequency of positive features in dimension 1 generally suggests more involved discourse. Conversely, a higher frequency of negative features suggests more informational production (Biber, 1998).
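The positive features listed for dimension 1 are all frequency counts, so a minimal sketch of how such a count might be produced can make the comparison concrete. The pronoun lists, the simple regular-expression tokenizer, and the per-1,000-word normalization below are assumptions made for illustration; the slides do not state which tagger or normalization basis the project used.

```python
import re

# Illustrative word lists (assumptions, not the project's actual tag sets).
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Contractions such as "I've" stay as a single token, so they are
    not counted as pronouns by this simple sketch.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def rate_per_1000(text, word_set):
    """Occurrences of any word in word_set per 1,000 tokens of text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_set)
    return 1000.0 * hits / len(tokens)


# A sentence adapted from the Enron e-mail example in this deck.
sample = ("I know that she spoke with you about me, and I wanted "
          "to see if you might be interested.")

print(round(rate_per_1000(sample, FIRST_PERSON), 1))
print(round(rate_per_1000(sample, SECOND_PERSON), 1))
```

Normalizing by text length keeps a long blog post from outscoring a short e-mail simply because it contains more words.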
First Person Interpretation/Analysis According to the Longman Grammar of Spoken and Written English, “first person pronouns function to refer to the speaker/writer” (Biber, 2007, p. 41). Furthermore, according to Ingrid Westin, first person pronouns are generally avoided in newspaper writing (2002). Thus, given the data, it is sensible to conclude that the infrequent use of first person pronouns in the Slate corpus is predominantly due to the fact that it is a source for news.
Second Person Result/Interpretation The use of the 2nd person in ENRON e-mails is not particularly surprising, given that e-mails are prone to less formality and are usually written as a dialogue between two people (with the exception of office memos, which include an audience of more than one person). However, even e-mails addressed to more than one person would still utilize the 2nd person. The same can be said for blogs, as blogs may be less formal than Slate magazine columns, for example. We attribute the lower 2nd person usage in blogs to the wide variety of blogs included in the corpus. Many may resemble Slate magazine columns and be more informational, while others may be more involved.
Results & Interpretation of it occurrence When we measured this feature, we anticipated that blogs and ENRON would show the involved feature of “pronoun it” from dimension 1. The data suggests the contrary: ENRON shows the lowest frequency of the pronoun it.
Results and Interpretation of it Occurrence With that said, Biber, Conrad, Johansson, Leech, & Finegan also state in the Longman Grammar of Spoken and Written English (2007, p. 235) that news registers have a very high count of both nouns and pronouns in general, much higher than conversational registers. Figure 4.1 on p. 235 of the Longman Grammar demonstrates this.
Private Verbs In Variation Across Speech and Writing, Biber states that private verbs are associated with expressing intellectual states or non-observable intellectual acts (1992, p. 242). In addition, he states, “private verbs (e.g. think and feel) are used for the overt expression of private attitudes, thoughts, and emotions.” Since the data lets us assume that at least some of the blogs in the corpus were written in a diary or journal format, the similar private verb counts for the blogs and ENRON e-mails are unsurprising. Slate, as a magazine, would have fewer of these, as it is focused on reporting news and covering politics.
Noun Usage Result/Interpretation The register with the highest noun usage is Slate (320.6), in comparison to Enron (245.5) and Blog (261.0). This suggests that Slate, being a magazine, involves more informational production. For instance, when reporting events, detailed descriptions are given. This example from Slate demonstrates this: In the 1970s, W. Glenn Campbell had a brilliant idea for reviving the backwater California think tank he ran: He would hire pre-eminent scholars who were being let go from their universities because they had reached the age of mandatory retirement. So in the 1970s, Campbell lured philosopher Sidney Hook, physicist Edward Teller, and Nobel Laureate economist Milton Friedman to the Hoover Institution at Stanford University.
Word Length Result/Interpretation The resulting high word length for the Slate corpus suggests that Slate falls under Dimension 1’s informational production (Biber et al., 1998). As is stated in the text Corpus Linguistics: Investigating Language Structure and Use, word length can indicate an “informational focus and a careful integration of information in a text” (1998, p. 149). The low word length counts for the blogs and ENRON e-mails suggest the opposite: an involved focus.
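As a rough illustration of the word length measure, the sketch below averages the number of letters per word. The function name and the two sample sentences (shortened from the Slate and Enron examples earlier in this deck) are chosen for illustration only, and the project's own tool may have defined a word differently.

```python
import re


def mean_word_length(text):
    """Average number of letters per word; punctuation and digits are ignored."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


# Shortened sentences adapted from the examples quoted in this deck.
slate_like = "The papers report that the federal judge presided over the lawsuit."
email_like = "Let me know what you think, and we can go from there."

print(round(mean_word_length(slate_like), 2))
print(round(mean_word_length(email_like), 2))
```

On these two sentences the news-style sample yields the longer average, which matches the direction of the pattern described above. Two sentences are of course no substitute for corpus-wide averages.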
Dimension 2 Analysis Positive features to be analyzed and interpreted: 1.) Third Person Pronouns 2.) Public Verbs
Dimension 2 Analysis (cont.) Higher frequency counts in the positive features of Dimension 2 suggest a more narrative discourse. Conversely, lower counts in the positive features of Dimension 2 (and higher counts in the negative features) suggest a more non-narrative discourse (Biber et al., 1998).
Third Person Result/Interpretation While Slate (22.6) and Blog (21.8) have close counts of 3rd person usage, the count for the ENRON e-mails corpus (12.1) is markedly lower. Generally, magazines make frequent use of the 3rd person because of reporting. Magazines indirectly address the audience, given that information in magazines is reported, following a formal format. Blogs, however, can be formal or informal depending on the type and topic. One explanation for such a close count between the Slate and Blog corpora is the type of blogs used in the sample. For example, here are two different blogs that represent formal language and informal language:
Informal: I normally would not post this type of recipe until summer when I would probably feature it as a main course for a poolside buffet; but I was so excited to try it I decided to think of it as a practice run. It all started with leftover pineapple and grew from there. Although I found many recipes for this particular rice dish, the one I decided to try came from Closet Cooking, a food blogger I follow from the West Coast. Formal: Scientists recently gleaned valuable information from emails sent by its employees in the 18 months prior to the company’s collapse. New Scientist reported that two researchers at the Florida Institute of Technology assessed 517,000 emails sent to approximately 15,000 employees at the now defunct energy company. (Courtesy Lindaraxa & Hoban)
Public Verbs The public verb (e.g. say) counts for ENRON and the blogs averaged out to 4.03 and 4.15 respectively. Slate had a slightly higher count with 7.13. We attribute this to the fact that Slate should have more positive narrative features, much like the American history articles mentioned in Biber, Conrad, and Reppen (1998) on pp. 159-160. Slate is concerned with reporting events and offering commentary on these events. As a result, there should be more past tense and perfect aspect verbs, more third-person referents, and more reported speech.
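To see how the two positive Dimension 2 features combine, the reported averages can be put side by side. The sketch below simply adds the third person and public verb averages given on these slides for each register. This raw sum is an illustrative simplification assumed here, not the standardized factor scoring used in Biber's multidimensional analysis.

```python
# Averages reported on the Third Person and Public Verbs slides.
DIM2_POSITIVE = {
    "third_person_pronouns": {"Slate": 22.6, "Blogs": 21.8, "ENRON": 12.1},
    "public_verbs": {"Slate": 7.13, "Blogs": 4.15, "ENRON": 4.03},
}


def narrative_totals(features):
    """Sum the positive-feature averages for each register (rough indicator only)."""
    totals = {}
    for counts in features.values():
        for register, value in counts.items():
            totals[register] = totals.get(register, 0.0) + value
    return totals


totals = narrative_totals(DIM2_POSITIVE)
# Order registers from most to least narrative according to the raw sums.
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)
```

A raw sum lets the more frequent feature (third person pronouns) dominate, which is one reason standardized scores are normally preferred for this kind of comparison.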
Dimension 3 Analysis Positive features to be analyzed and interpreted: 1.) wh- Relative Clauses 2.) Nominalizations
Dimension 3 Analysis (cont.) Negative features to be analyzed and interpreted: 1.) Time Adverbials 2.) Place Adverbials
Wh- Relative Clauses
Wh- Relative Clauses In the wh- relative clause average frequency counts, the data suggests that the Slate and Blogs corpora utilize more elaborated reference. The ENRON corpus contains significantly fewer uses of wh- relative clauses, which points toward situation-dependent reference. Because wh- relative clauses are a positive feature in dimension 3, a low average frequency count suggests situation-dependent reference, whereas a high average frequency count suggests elaborated reference.
Time Adverbials
Place Adverbials
Time/Place Adverbials In the time and place adverbial average frequency counts, the data suggests that the Blogs and ENRON corpora utilize more situation-dependent reference. The Slate corpus contains significantly fewer uses of time and place adverbials, which points toward elaborated reference. Because time and place adverbials are negative features in dimension 3, a low average frequency count suggests elaborated reference, whereas a high average frequency count suggests situation-dependent reference.
Nominalizations
Nominalizations In the nominalization average frequency counts, the data suggests that the Blogs and ENRON corpora utilize more elaborated reference. The Slate corpus contains significantly fewer uses of nominalization, which points toward situation-dependent reference. Because nominalizations are a positive feature in dimension 3, a high average frequency count suggests elaborated reference, whereas a low average frequency count suggests situation-dependent reference.
Interpretation and Analysis of Dimension 3 Although the ENRON and Blog corpora may seem to have a contradictory relationship with Dimension 3 (i.e., more elaborated reference according to the positive features and more situation-dependent reference according to the negative features, except in the case of wh- relative clauses), it is important to note that time and place adverbials are usually used for “text-external references to the physical context of the discourse” (Biber et al., 1998, p. 153). Moreover, wh- relative clauses are used to specify the identity of referents within a text in an explicit/elaborated manner (Biber et al., 1998).
Interpretation and Analysis of Dimension 3 (cont.) Given this information, it is not hard to consider the reasons behind such a discrepancy. 1.) The Slate and Blogs corpora are highly elaborated in wh- relative clauses because there must be constant reference to those who are being addressed, most likely due to the nature of weblogs and news writing. In a sense, this is because they are (most times) not formally addressed to a single individual and are thus more elaborated or explicit in their use of reference, as opposed to, say, an e-mail, which generally already has a clear and concise personal reference (i.e., the receiver of the e-mail). However, it is also important to note that, according to Ingrid Westin, the use of wh- relative clauses in news writing began to decrease in the 1970s (Westin, 2003). Up to this point there have been no significant statistical changes in the use of wh- relative clauses in news writing. Although only speculative, perhaps this resurgence of wh- relative clauses in Slate has to do with the nature of web-based news media and some of the more elaborated/involved tendencies that the register of online news is beginning to adopt. 2.) In terms of time and place adverbials, it also makes sense that Blogs and ENRON would contain more situation-dependent reference, as there would be more text-external references to physical contexts outside of the discourse. This would likely signal more involved production (see dimension 1 analysis) and thus would be less characteristic of Slate, given its generally informational nature.
Interpretation and Analysis of Dimension 3 (cont.) However, in terms of nominalizations, the data suggests findings that are quite contrary to intuition. Given the informational and professional nature of Slate, one could assume that nominalization would be more frequent in the Slate corpus than in the ENRON or Blog corpora. In a sense, this would relate the Slate corpus to things like AWL or other corpora that involve academic/professional language use, where nominalizations tend to be more frequent. The data suggests instead that there is significantly less use of nominalization in Slate than in the ENRON and Blog corpora. In fact, there is about half as much nominalization in Slate as in the other two corpora.
Interpretation and Analysis of Dimension 3 (cont.) One explanation for this comes from Maurizio Gotti’s text Investigating Specialized Discourse. In Gotti’s words, “one effect of nominalization is the simplification of syntactic structures within a sentence” (Gotti, 2008, p. 83). In e-mails and blogs, a simplification of syntactic structures would be extremely important given the often brief and concise nature of both registers. This, in part, can help explain why the Slate nominalization counts would be so low and the ENRON and Blog counts so high.
Interpretation and Analysis of Dimension 3 (cont.) Furthermore, according to Gotti, the increase in the use of nominalization is part of a “gradual tendency towards a loss of importance with the verb” (Gotti, 2008, p. 167.) In addition to news writing’s attempt to target a vast audience (and thus lowering the required “reading level” for a text – which could also help explain the low nominalization counts and low word length), news articles tend to be more active in order to convey important information in such a way that is digestible to the reader.   For instance, according to Westin “the increase of present tense verbs suggests an increased interest in topics of current relevance [i.e., news]” (Westin, 2003, p. 39.)
Interpretation and Analysis of Dimension 3 (cont.) Finally, one last way to look at the counterintuitively low counts in Slate’s use of nominalization is to consider the counterintuitively high counts of nominalization in the ENRON and Blog corpora. In the blog corpus, this could very well be due to the wide range of blogs contained within the corpus itself (i.e., some more formal or professional and some more informal or personal), so that the use of nominalization would be subject to all sorts of different criteria. In the ENRON corpus, however, one would expect nominalization to be less frequent given the (generally perceived) conversational nature of e-mails. According to Suzanne Eggins, conversational interactions generally contain very low use of nominalization (Eggins, 2004). With that said, perhaps this goes back to an e-mail being somewhere between conversation and a letter, making it the intermediary web-based register that it is. E-mails, then, are not specifically conversation, nor are blogs.
Dimension 4 Biber’s dimension 4 is described as the overt expression of argumentation. In other words, this dimension includes features that are associated with persuasive language. Infinitives, the highest feature associated with dimension 4, were very similar in all three registers. ENRON had an average count of 15.5, blogs had a count of 14.94, and Slate had the lowest count at 12.45. The next feature in dimension 4 is the occurrence of prediction modals. ENRON had an exceptionally high count of prediction modals with 13.04, while the blog average was 6.81. The average for Slate was again the lowest at 5.62.
Dimension 4 (cont.) Suasive verbs occurred at roughly the same frequencies, with ENRON having an average of 1.22, blogs having an average of 1.07, and Slate having an average of 1.05. Necessity modals occurred most frequently in blogs (with an average of 3.51), with ENRON and Slate having averages of 3.22 and 2.22. Possibility modals occurred more frequently in the ENRON emails, with an average of 8.77. The blogs and Slate had averages of 6.41 and 4.88 respectively.
Interpretation of Dimension 4 Results ENRON had a surprisingly high average of prediction modals and a slightly high average of possibility modals. This can be explained by a study by Carmen Frehner (2008), which found that prediction modals and possibility modals are most common in conversation; email features more closely reflect those of conversation or dialogue than do those of the other corpora we are examining. On all features in Dimension 4, Slate had the lowest averages. The reasons for this will be discussed in the conclusions.
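The Dimension 4 averages reported on the preceding slides can be collected into a small table to check the claim that Slate is lowest on every feature. The following sketch uses only the figures stated in this deck; the variable and function names are illustrative.

```python
# Dimension 4 averages as reported on the slides above.
DIM4 = {
    "infinitives":        {"ENRON": 15.5,  "Blogs": 14.94, "Slate": 12.45},
    "prediction modals":  {"ENRON": 13.04, "Blogs": 6.81,  "Slate": 5.62},
    "suasive verbs":      {"ENRON": 1.22,  "Blogs": 1.07,  "Slate": 1.05},
    "necessity modals":   {"ENRON": 3.22,  "Blogs": 3.51,  "Slate": 2.22},
    "possibility modals": {"ENRON": 8.77,  "Blogs": 6.41,  "Slate": 4.88},
}


def rank_registers(counts):
    """Return register names ordered from highest to lowest average."""
    return sorted(counts, key=counts.get, reverse=True)


for feature, counts in DIM4.items():
    print(f"{feature:20s} {' > '.join(rank_registers(counts))}")

# True only if Slate has the lowest average on every feature.
slate_always_lowest = all(rank_registers(c)[-1] == "Slate" for c in DIM4.values())
print(slate_always_lowest)
```

ENRON ranks first on every feature except necessity modals, where the blogs lead.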
Prediction  & Possibility Modals in Emails Frehner, 2008, p.71
Dimension 5 Dimension 5 is described as “impersonal versus non-impersonal style.”  This feature was formerly known as abstract versus non-abstract. Although not many features listed in dimension 5 were available to analyze, we were able to extract data on both by-passives and agentless passives. Agentless passives were most common in Slate at 8.18 while ENRON and the blogs had averages of 5.08 and 7.43. By-passives were also highest in Slate at 1.22, while ENRON and the blogs had similar counts of 0.68 and 0.87. We think that this may suggest that Slate has a slightly more abstract or impersonal style.
Other Features to Consider Overall Verb Usage Activity Verbs
Overall Verb Usage Result/Interpretation Although we could not find any research showing that high or low verb usage alone is significant, we do find that certain types of verbs occurred more or less frequently within the different corpora. As stated in our dimension 1 analysis, the similar private verb counts for the Blogs and ENRON corpora are expected, given that some of the blogs can be assumed to have been written in a diary/journal format. Furthermore, Slate, as a magazine, would have fewer private verbs, as it is focused on reporting news and covering politics. By contrast, the public verb (e.g. say) counts for ENRON and the blogs averaged out to 4.03 and 4.15 respectively. Slate had a slightly higher count with 7.13. We attribute this to the fact that Slate should have more positive narrative features, as mentioned in dimension 2. Moreover, Slate is concerned with reporting events and offering commentary on these events. As a result, there should be more past tense and perfect aspect verbs, more third-person referents, and more reported speech.
Activity Verbs Result/Interpretation According to Biber (2007) on p. 336 of the Longman Grammar, across semantic domains, activity verbs have a high frequency in conversation. The only place where they have a higher frequency is fiction. We attribute the high frequency of activity verbs in both blogs and ENRON e-mails to these corpora leaning toward the positive features of Biber’s dimension 1, which suggests more involved production. Blogs may be more informal, but this depends on the type of blog. E-mail, however, often functions much more like conversation.
Our Intuitions about the ENRON Corpus Generally, company emails are viewed as tools to use in the office: strictly professional tools for communicating with co-workers about work-related affairs. Therefore, the assumption would be that emails at a company such as Enron would tend toward more formal language, conciseness of language, frequent usage of 1st or 2nd person pronouns, and less usage of nouns. Reasons: Conciseness of Language - Specific tasks are explained within an email. Email is not used for informal conversation, so there is less reason for a high word count. Frequent 1st or 2nd person pronouns - Usually the employee is writing about an office task that they are engaged in or want others to take part in. Words such as “I” and “you” are commonly used. Less usage of nouns - Emails consist of directives, which request participation from another person. Therefore activity is more frequent, and the informational content associated with nouns is less frequent. In order to illustrate this, here is an example of a standard professional e-mail:
To: abc@wyz.com
CC: Accounts Payable
Subject: Request for copy of invoice
Dear ABC,
I'm LMN form the Accounts Payable department at GHI. Ltd. I understand that we have an invoice outstanding with your company since 07/01/2010. This email is to request you for a copy of the invoice, so that we can clear it for payment at the earliest. First of all, apologies for the delay in payment. The accounts team has been reshuffled and this case came to my notice just an hour ago and I am writing to you immediately. The invoice in question is invoice number 246849, for Mr.JKI who stayed at your hotel for a period of 4 days. That is, from 06/28/2010 to 07/01/2010. We cannot seem to locate the invoice, so I request you to email me a copy of the invoice, so that I can issue the payment right away. Please send it to the email address mentioned below and mark it for my attention. Once again, sincere apologies for the delay.
Thank you,
LMN,
Senior Executive
Accounts Payable,
GHI. Ltd
email: accountspayable@ghi.com
Courtesy: (Iyer, 2010)
Our Intuitions about Slate As a group, we had little to no familiarity with Slate Magazine’s content. However, in our early research, we uncovered a few comments regarding Slate’s “left-leaning, liberal bias.” Given this, our intuitions told us that there would be a tendency towards persuasion, which could in turn be reflected in an analysis of dimension 4 (i.e., overt expression of argumentation). We also thought that Slate, given its web-based nature, would reflect patterns seen in blogs, thus making it more informal.
Our Intuitions about Blogs Given the lack of metadata about the Blogs corpus, we could only rely on the raw data to inform us as to the content of the overall corpus. However, we did have a few preconceived notions about what the Blogs corpus might be like: 1.) They would score high in terms of involved production via dimension 1. 2.) They would score high in terms of narrative discourse via dimension 2. 3.) They would tend to more closely mirror the frequency counts of the ENRON corpus.
Conclusion
In addition to this, ENRON also had a tendency to reflect conversational discourse, as seen in our analysis of dimension 1, where ENRON tended to exhibit frequency counts suggesting that it was more involved.

Más contenido relacionado

Similar a Final presentation al_8760_ansari_fernandez_hercules

John Q. StudentProfessor StalbirdEnglish 1201.xxx27 February.docx
John Q. StudentProfessor StalbirdEnglish 1201.xxx27 February.docxJohn Q. StudentProfessor StalbirdEnglish 1201.xxx27 February.docx
John Q. StudentProfessor StalbirdEnglish 1201.xxx27 February.docx
vrickens
 
Assignment Writing Your Working Bibliography You should have be.docx
Assignment Writing Your Working Bibliography You should have be.docxAssignment Writing Your Working Bibliography You should have be.docx
Assignment Writing Your Working Bibliography You should have be.docx
ssuser562afc1
 
Varying Definitions of Online Communication and .docx
 Varying Definitions of Online Communication and  .docx Varying Definitions of Online Communication and  .docx
Varying Definitions of Online Communication and .docx
MARRY7
 
Instructions Please note the research subject will be the use .docx
Instructions Please note the research subject will be the use .docxInstructions Please note the research subject will be the use .docx
Instructions Please note the research subject will be the use .docx
lanagore871
 
Lets Get It Started!1. Review your thesis, your classmates f.docx
Lets Get It Started!1.  Review your thesis, your classmates f.docxLets Get It Started!1.  Review your thesis, your classmates f.docx
Lets Get It Started!1. Review your thesis, your classmates f.docx
jesssueann
 
Varying Definitions of Online Communication and
 Varying Definitions of Online Communication and   Varying Definitions of Online Communication and
Varying Definitions of Online Communication and
MikeEly930
 
Varying Definitions of Online Communication and .docx
 Varying Definitions of Online Communication and  .docx Varying Definitions of Online Communication and  .docx
Varying Definitions of Online Communication and .docx
gertrudebellgrove
 
Varying Definitions of Online Communication and
 Varying Definitions of Online Communication and   Varying Definitions of Online Communication and
Varying Definitions of Online Communication and
MoseStaton39
 

Similar a Final presentation al_8760_ansari_fernandez_hercules (20)

Advanced English grammar tips for dissertation writers.
Advanced English grammar tips for dissertation writers.Advanced English grammar tips for dissertation writers.
Advanced English grammar tips for dissertation writers.
 
Essay Questions Hunting Snake
Essay Questions Hunting SnakeEssay Questions Hunting Snake
Essay Questions Hunting Snake
 
Essay Compare And Contrast Topics.pdf
Essay Compare And Contrast Topics.pdfEssay Compare And Contrast Topics.pdf
Essay Compare And Contrast Topics.pdf
 
Apa style
Apa styleApa style
Apa style
 
Question And Answer Essay Format. Buy Short Essay with Questions and Answers ...
Final presentation al_8760_ansari_fernandez_hercules

  • 1. Zakkiyyah Ansari Maria Fernandez Christopher Hercules A Comparison of Three Corpora from Internet-based Registers
  • 2. Introduction/Background Email has effectively replaced the letter in recent years as computer-mediated communication has become more frequent. Being neither quite a letter nor quite conversation, email has been described by some as “written speech” or “computer conversation” (Perez, Turney, & Montero, 2008). Such a classification allows for a new register for corpus linguists to examine. The ENRON corpus, an example of this register, is the largest real email corpus (Yang, Zeng, & Chau, 2007). It contains e-mails that were used for a variety of purposes (not all of them professional), which makes it a very versatile corpus to work with: many forms of communication are represented, furthering the notion of a register between writing and conversation. Another common web-based register open to examination by corpus linguists is online news. Slate is an online magazine that has never appeared in print. Its topics include arts, business, sports, news, and politics, although it is primarily politically focused. It is the property of the Microsoft Corporation, which contributed 4,694 articles published between 1996 and 2000 from its archives to the American National Corpus.
  • 3. Introduction/Background (cont.) Lastly, there are blogs, perhaps one of the most frequent web-based registers in current use. Blogs are so varied in their uses and topics that it is almost impossible to talk about them without some context. For example, blog writers use their blogs as personal journals and for politics, business, academic purposes, and literary criticism, to name a few (Schmidt, 2007).
  • 4. Goals of the Project
  • 5. Interpret common features found in the three registers to understand their level of formality or informality.
  • 6. Confirm or disprove the intuition that certain registers are truly more formal or informal than others.
  • 8. Dimensional Discourse Analysis Dimension 1: Edited informational discourse vs. on-line informational discourse Dimension 2: Involved personal discourse vs. non-personal uninvolved discourse (Biber, 1998, p. 183)
  • 9. Enron Email Example Hi. This is ***, and now ***'s, friend ***. I know that she spoke with you about me, and I wanted to see if you might be interested in getting together sometime. You probably want to know a little about me. I work at Enron as an electricity trader, and I've lived in Houston for about three years. I've known Richard 9 years from our fraternity days together. I'm 27 and I graduated from UT 4 1/2 years ago. Of course, I would be interested in hearing something about you as well. *** just told me that she had a good friend in Houston that she thought I should meet. Let me know what you think, and we can go from there. ***Wow, I almost forgot. Thank you. I will contact her soon. Thanks *** Monday Feb 4th
  • 10. An Example Slate Article Excerpt The papers report that the federal judge who presided over the Paula Jones lawsuit yesterday ordered President Clinton to pay nearly $90,000 to Jones' legal team for giving false testimony about his relationship with Monica Lewinsky. The sum is less than the Jones lawyers asked for, but more than Clinton offered. Clinton and his lawyer said that they will pay the money without further legal protest. The WP and LAT play the story below the fold, the NYT reefers it, and USAT runs it on Page 5. This play, given the story's unprecedented content, signifies that editors have just plain had it with the whole topic and are sure that readers feel likewise. The WP , NYT , LAT and Wall Street Journal report that China has issued an arrest warrant for Li Hongzhi, the New York-based leader of the now-banned sect Falun Gong, charging him with the deaths of hundreds of his followers. The papers note that the warrant is much more political than legal, in that the U.S. Does not have an extradition treaty with China. The NYT reports that immediately after last summer's terrorist bombing of two American embassies in Africa, the government of Sudan detained two suspects, only to angrily release them after the U.S. conducted a cruise missile strike against an alleged chemical weapons facility in Sudan. Also, the paper says, Sudanese officials claim that the U.S. ignored their message that they had suspects in the case in custody. The NYT says that Sudan's notification has been confirmed by some American officials. – Slate Article 247_3303 from the American National Corpus
  • 12. Word Count Result/Interpretation The chart of average word counts shows an exceptional difference across the three registers. The register with the highest average word count is the Blog (4,615.5). One possible explanation is that blogs tend to be less concise and less informal. Blog writers are conscious of their audience while writing; to attract readers, they must use attractive language that holds a reader's attention, and in that process the writing becomes more drawn out and detail oriented. The register with the second-highest average word count is the Enron emails. Random samples of the Enron emails show that many of them are informal and involved. Thus, both the Blog and Enron email registers can be characterized as on-line informational on Discourse Dimension 1. On Discourse Dimension 2, the Enron emails and the blogs can be generalized to exhibit features of involved personal discourse.
  • 13. Dimension 1 Analysis Positive features to be analyzed and interpreted: 1.) First Person Pronouns 2.) Second Person Pronouns 3.) It Occurrence 4.) Private Verbs
  • 14. Dimension 1 Analysis (cont.) Negative features to be analyzed and interpreted: 1.) Noun Usage 2.) Word Length
  • 15. Dimension 1 Analysis (cont.) According to Biber, a higher frequency of positive features in Dimension 1 generally suggests more involved discourse. Conversely, a higher frequency of negative features suggests more informational production (Biber, 1998).
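The Dimension 1 features listed above can be approximated with simple word-list counts. The following is a minimal sketch, not the authors' actual procedure: the word lists are abbreviated and hypothetical, the per-1,000-word normalization is an assumption (the slides do not state how counts were normalized), and noun counts are omitted because they would require a part-of-speech tagger.

```python
import re

# Hypothetical, abbreviated word lists; the slides do not give the lists actually used.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours", "ourselves"}
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}
PRIVATE_VERBS = {"think", "feel", "know", "believe", "hope"}  # small illustrative subset


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (contractions such as "I've" are split)."""
    return re.findall(r"[a-z]+", text.lower())


def rate_per_1000(tokens, word_set):
    """Occurrences of any word in word_set per 1,000 tokens (assumed normalization)."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_set)
    return 1000.0 * hits / len(tokens)


def mean_word_length(tokens):
    """Average number of letters per token."""
    if not tokens:
        return 0.0
    return sum(len(token) for token in tokens) / len(tokens)


sample = tokenize("I think you know my plan")
print("first person :", round(rate_per_1000(sample, FIRST_PERSON), 1))
print("second person:", round(rate_per_1000(sample, SECOND_PERSON), 1))
print("private verbs:", round(rate_per_1000(sample, PRIVATE_VERBS), 1))
print("word length  :", round(mean_word_length(sample), 2))
```

Under this sketch, higher pronoun and private-verb rates would point toward involved production, while a higher mean word length would point toward informational production.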
  • 17. First Person Interpretation/Analysis According to the Longman Grammar of Spoken and Written English, “first person pronouns function to refer to the speaker/writer” (Biber, 2007, p. 41). Furthermore, according to Ingrid Westin (2002), first person pronouns are generally avoided in newspaper writing. Given the data, it is sensible to conclude that the infrequent use of first person pronouns in the Slate corpus is predominantly due to Slate being a news source.
  • 19. Second Person Result/Interpretation The use of the 2nd person in ENRON e-mails is not particularly surprising, given that e-mails tend to be less formal and are usually written as a dialogue between two people (with the exception of office memos, which address an audience of more than one person). Even e-mails addressed to more than one person would still use the 2nd person. The same can be said for blogs, which may be less formal than Slate magazine columns, for example. We attribute the lower 2nd person usage in blogs to the wide variety of blogs included in the corpus: many may resemble Slate magazine columns and be more informational, while others may be more involved.
  • 21. Results & Interpretation of it Occurrence When we measured this feature, we anticipated that the blogs and ENRON would show the involved feature “pronoun it” from Dimension 1. The data suggest the contrary: ENRON shows the lowest frequency of the pronoun it.
  • 22. Results and Interpretation of it Occurrence That said, Biber, Conrad, Johansson, Leech, & Finegan also state in the Longman Grammar of Spoken and Written English (2007, p. 235) that news registers have a very high count of both nouns and pronouns in general, much higher than conversational registers. Figure 4.1 on p. 235 of the Longman Grammar demonstrates this.
  • 23. Private Verbs In Variation Across Speech and Writing, Biber states that private verbs are associated with expressing intellectual states or non-observable intellectual acts (1992, p. 242). In addition, he states, “private verbs (e.g. think and feel) are used for the overt expression of private attitudes, thoughts, and emotions.” Since the data allow us to assume that at least some of the blogs in the corpus were written in a diary or journal format, the similar private verb counts for the blogs and the ENRON e-mails are unsurprising. Slate, as a magazine, would have fewer of these, as it is focused on reporting news and covering politics.
  • 25. Noun Usage Result/Interpretation The register with the highest noun usage is Slate (320.6), compared with Enron (245.5) and Blog (261.0). This suggests that Slate, being a magazine, involves more informational production. For instance, when events are reported, detailed descriptions are given. This example from Slate demonstrates this: In the 1970s, W. Glenn Campbell had a brilliant idea for reviving the backwater California think tank he ran: He would hire pre-eminent scholars who were being let go from their universities because they had reached the age of mandatory retirement. So in the 1970s, Campbell lured philosopher Sidney Hook, physicist Edward Teller, and Nobel Laureate economist Milton Friedman to the Hoover Institution at Stanford University.
  • 27. Word Length Result/Interpretation The high average word length for the Slate corpus suggests that Slate falls under Dimension 1’s informational production (Biber et al., 1998). As stated in Corpus Linguistics: Investigating Language Structure and Use, word length can indicate an “informational focus and a careful integration of information in a text” (1998, p. 149). The low word length counts for the blogs and ENRON e-mails suggest the opposite: an involved focus.
  • 28. Dimension 2 Analysis Positive features to be analyzed and interpreted: 1.) Third Person Pronouns 2.) Public Verbs
  • 29. Dimension 2 Analysis (cont.) Higher frequency counts in the positive features of Dimension 2 suggest a more narrative discourse. Conversely, lower counts in the positive features of Dimension 2 (and higher counts in the negative features) suggest a more non-narrative discourse. (Biber et al., 1998)
  • 31. Third Person Result/Interpretation While Slate (22.6) and Blog (21.8) have close counts of 3rd person usage, the count for the ENRON e-mails corpus (12.1) is significantly lower. Generally, magazines use the 3rd person frequently because of reporting: they address the audience indirectly, since information in magazines is reported following a formal format. Blogs, however, can be formal or informal depending on the type and topic. One explanation for such a close count between the Slate and Blog corpora is the type of blogs used in the sample. For example, here are two different blogs that represent formal language and informal language:
  • 32. Informal: I normally would not post this type of recipe until summer when I would probably feature it as a main course for a poolside buffet; but I was so excited to try it I decided to think of it as a practice run. It all started with leftover pineapple and grew from there. Although I found many recipes for this particular rice dish, the one I decided to try came from Closet Cooking, a food blogger I follow from the West Coast. Formal: Scientists recently gleaned valuable information from emails sent by its employees in the 18 months prior to the company’s collapse. New Scientist reported that two researchers at the Florida Institute of Technology assessed 517,000 emails sent to approximately 15,000 employees at the now defunct energy company. (Courtesy Lindaraxa & Hoban)
  • 33. Public Verbs The public verb (e.g. say) counts for ENRON and the blogs averaged 4.03 and 4.15 respectively. Slate had a higher count of 7.13. We attribute this to the expectation that Slate should have more positive narrative features, much like the American history articles mentioned in Biber, Conrad, and Reppen (1998, pp. 159-160). Slate is concerned with reporting events and offering commentary on them. As a result, there should be more past tense and perfect aspect verbs, more third-person referents, and more reported speech.
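To make the "more positive features, more narrative" reading concrete, the two Dimension 2 averages reported on the slides can be combined into a rough score. This is only an illustrative sketch and not something the authors computed: Biber's procedure standardizes each feature against a large reference corpus, whereas here the standardization is across just these three corpora, and the function names are hypothetical.

```python
from statistics import mean, stdev

# Average counts as reported on the Third Person and Public Verbs slides.
third_person = {"Slate": 22.6, "Blog": 21.8, "ENRON": 12.1}
public_verbs = {"Slate": 7.13, "Blog": 4.15, "ENRON": 4.03}


def z_scores(feature):
    """Standardize a {corpus: value} mapping (mean 0, sample standard deviation 1)."""
    values = list(feature.values())
    centre, spread = mean(values), stdev(values)
    return {name: (value - centre) / spread for name, value in feature.items()}


def dimension_score(positive, negative=()):
    """Add standardized positive features and subtract standardized negative ones."""
    score = {name: 0.0 for name in positive[0]}
    for feature in positive:
        for name, z in z_scores(feature).items():
            score[name] += z
    for feature in negative:
        for name, z in z_scores(feature).items():
            score[name] -= z
    return score


# Assumption: only the two positive features analyzed on the slides are used.
dim2 = dimension_score([third_person, public_verbs])
for name, value in sorted(dim2.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:+.2f}")
```

With only three data points per feature, the resulting ranking is a summary of the reported averages rather than a statistical finding.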
  • 34. Dimension 3 Analysis Positive features to be analyzed and interpreted: 1.) wh- Relative Clauses 2.) Nominalizations
  • 35. Dimension 3 Analysis (cont.) Negative features to be analyzed and interpreted: 1.) Time Adverbials 2.) Place Adverbials
  • 37. Wh- Relative Clauses The average frequency counts for wh- relative clauses suggest that the Slate and Blogs corpora use more elaborated reference, while the ENRON corpus, with significantly fewer wh- relative clauses, leans toward situation-dependent reference. Because wh- relative clauses are a positive feature in Dimension 3, a high average frequency count suggests elaborated reference, whereas a low count suggests situation-dependent reference.
  • 40. Time/Place Adverbials The average frequency counts for time and place adverbials suggest that the Blogs and ENRON corpora use more situation-dependent reference, while the Slate corpus, with significantly fewer time and place adverbials, leans toward elaborated reference. Because time and place adverbials are a negative feature in Dimension 3, a high average frequency count suggests situation-dependent reference, whereas a low count suggests elaborated reference.
  • 42. Nominalizations The average frequency counts for nominalizations suggest that the Blogs and ENRON corpora use more elaborated reference, while the Slate corpus, with significantly fewer nominalizations, leans toward situation-dependent reference. Because nominalizations are a positive feature in Dimension 3, a high average frequency count suggests elaborated reference, whereas a low count suggests situation-dependent reference.
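Nominalization counts of this kind are often approximated by suffix matching. The sketch below assumes the common operationalization of words ending in -tion, -ment, -ness, or -ity (and their plurals); the slides do not say how the authors identified nominalizations, and the per-1,000-word normalization is likewise an assumption. A suffix heuristic also produces false positives such as "city" or "moment".

```python
import re

# Assumed operationalization: suffix-based matching, including plural forms.
NOMINALIZATION = re.compile(r"(?:tions?|ments?|ness(?:es)?|it(?:y|ies))$")


def nominalization_rate(text):
    """Suffix-matched nominalizations per 1,000 word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if NOMINALIZATION.search(token))
    return 1000.0 * hits / len(tokens)


example = ("The organization reached an agreement about "
           "the responsibilities of management.")
print(round(nominalization_rate(example), 1))
```

A part-of-speech tagger would allow the count to be restricted to nouns, which removes some of the false positives at the cost of an extra dependency.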
  • 43. Interpretation and Analysis of Dimension 3 Although the ENRON and Blog corpora may seem to have a contradictory relationship with Dimension 3 (i.e., more elaborated reference according to the positive features and more situation-dependent reference according to the negative features, except in the case of wh- relative clauses), it is important to note that time and place adverbials are usually used for “text-external references to the physical context of the discourse” (Biber et al., 1998, p. 153). Moreover, wh- relative clauses are used to specify the identity of referents within a text in an explicit/elaborated manner (Biber et al., 1998).
  • 44. Interpretation and Analysis of Dimension 3 (cont.) Given this information, it is not hard to consider the reasons behind such a discrepancy. 1.) The Slate and Blogs corpora are highly elaborated in wh- relative clauses because there must be constant reference to those who are being addressed, most likely due to the nature of weblogs and news writing. In a sense, this is because they are (most times) not formally addressed to a single individual and are thus more elaborated or explicit in their use of reference, as opposed to, say, an e-mail, which generally already has a clear and concise personal reference (i.e., the receiver of the e-mail). However, it is also important to note that, according to Ingrid Westin, the use of wh- relative clauses in news writing began to decrease in the 1970s (Westin, 2003). Up to this point there have been no significant statistical changes in the use of wh- relative clauses in news writing. Although only speculative, perhaps this resurgence of wh- relative clauses in Slate has to do with the nature of web-based news media and some more elaborated/involved tendencies that the register of online news is beginning to adopt. 2.) In terms of time and place adverbials, it would also make sense that Blogs and ENRON contain more situation-dependent reference, as there would be more text-external references to physical contexts outside of the discourse. This would likely signal more involved production (see Dimension 1 analysis) and thus would be less related to Slate, given its generally informational nature.
  • 45. Interpretation and Analysis of Dimension 3 (cont.) However, in terms of nominalizations, the data suggest findings that are quite contrary to intuition. Given the informational and professional nature of Slate, one could assume that nominalization would be more frequent in the Slate corpus than in the ENRON or Blog corpora. In a sense, this would relate the Slate corpus to things like AWL or other corpora that involve academic/professional language use, where nominalizations tend to be more frequent. The data suggest instead that there is significantly less nominalization in Slate than in the ENRON and Blog corpora. In fact, there is about half as much nominalization in Slate as in the other two corpora.
  • 46. Interpretation and Analysis of Dimension 3 (cont.) One explanation for this comes from Maurizio Gotti’s text Investigating Specialized Discourse. In Gotti’s words, “one effect of nominalization is the simplification of syntactic structures within a sentence” (Gotti, 2008, p. 83). In e-mails and blogs, a simplification of syntactic structures would be extremely important given the often brief and concise nature of both registers. This can help explain, in part, why the Slate nominalization counts would be so low and the ENRON and Blog counts so high.
  • 47. Interpretation and Analysis of Dimension 3 (cont.) Furthermore, according to Gotti, the increase in the use of nominalization is part of a “gradual tendency towards a loss of importance with the verb” (Gotti, 2008, p. 167.) In addition to news writing’s attempt to target a vast audience (and thus lowering the required “reading level” for a text – which could also help explain the low nominalization counts and low word length), news articles tend to be more active in order to convey important information in such a way that is digestible to the reader. For instance, according to Westin “the increase of present tense verbs suggests an increased interest in topics of current relevance [i.e., news]” (Westin, 2003, p. 39.)
  • 48. Interpretation and Analysis of Dimension 3 (cont.) Finally, one last way to look at the counterintuitively low counts of nominalization in Slate is to consider the counterintuitively high counts of nominalization in the ENRON and Blog corpora. In the blog corpus, this could very well be due to the wide range of blogs contained within the corpus itself (i.e., some more formal or professional and some more informal or personal); the use of nominalization would then be subject to all sorts of different criteria. In the ENRON corpus, however, one would expect nominalization to be less frequent given the (generally perceived) conversational nature of e-mails. According to Suzanne Eggins, conversational interactions generally contain very low use of nominalization (Eggins, 2004). That said, perhaps this goes back to e-mail being somewhere between conversation and a letter, making it the intermediary web-based register that it is. E-mails, then, are not specifically conversation, nor are blogs.
  • 49. Dimension 4 Biber’s Dimension 4 is described as the overt expression of argumentation. In other words, this dimension includes features that are associated with persuasive language. Infinitives, the highest feature associated with Dimension 4, occurred at similar rates in all three registers: ENRON had an average count of 15.5, blogs had a count of 14.94, and Slate had the lowest count at 12.45. The next feature in Dimension 4 is the occurrence of prediction modals. ENRON had an exceptionally high count of prediction modals at 13.04, while the blog average was 6.81. The average for Slate was again the lowest at 5.62.
  • 50. Dimension 4 (cont.) Suasive verbs occurred at roughly the same frequencies, with ENRON having an average of 1.22, blogs an average of 1.07, and Slate an average of 1.05. Necessity modals occurred most frequently in blogs (with an average of 3.51), with ENRON and Slate having averages of 3.22 and 2.22 respectively. Possibility modals occurred more frequently in the ENRON emails, with an average of 8.77. The blogs and Slate had averages of 6.41 and 4.88 respectively.
  • 51. Interpretation of Dimension 4 Results ENRON had a surprisingly high average of prediction modals and a slightly high average of possibility modals. This can be explained by a study by Carmen Frehner (2008), which indicates that prediction modals and possibility modals are most common in conversation; email features reflect those of conversation or dialogue more closely than the other corpora we are examining. On all features in Dimension 4 except suasive verbs and necessity modals, where the margins are small, Slate still had the lowest averages. The reasons for this will be discussed in the conclusions.
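The modal counts reported for Dimension 4 can be approximated with word lists. This is a minimal sketch that assumes the modal classes as they are commonly grouped in Biber's framework (prediction: will, would, shall; possibility: can, may, might, could; necessity: must, should, ought) and a per-1,000-word normalization; neither detail is confirmed by the slides. Contracted forms such as 'll are ignored, and untagged matching will also count non-modal uses (for example "May" as a month).

```python
import re

# Assumed modal classes; contracted forms ('ll, 'd) are not handled in this sketch.
MODALS = {
    "prediction": {"will", "would", "shall"},
    "possibility": {"can", "may", "might", "could"},
    "necessity": {"must", "should", "ought"},
}


def modal_rates(text):
    """Return each modal class's frequency per 1,000 word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {name: 0.0 for name in MODALS}
    return {
        name: 1000.0 * sum(1 for token in tokens if token in words) / len(tokens)
        for name, words in MODALS.items()
    }


rates = modal_rates("We will meet tomorrow, and you should bring the report; it might rain.")
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}")
```

Running the same function over each corpus and averaging per text would yield figures comparable in form to the averages quoted above.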
  • 52. Prediction & Possibility Modals in Emails (Frehner, 2008, p. 71)
  • 53. Dimension 5 Dimension 5 is described as “impersonal versus non-impersonal style.” This dimension was formerly known as abstract versus non-abstract. Although few of the features listed under Dimension 5 were available to analyze, we were able to extract data on both by-passives and agentless passives. Agentless passives were most common in Slate at 8.18, while ENRON and the blogs had averages of 5.08 and 7.43 respectively. By-passives were also highest in Slate at 1.22, while ENRON and the blogs had similar counts of 0.68 and 0.87. We think this may suggest that Slate has a slightly more abstract or impersonal style.
  • 54. Other Features to Consider Overall Verb Usage Activity Verbs
  • 56. Overall Verb Usage Result/Interpretation Although we could not find any research showing that high or low verb usage alone is significant, we do find that certain types of verbs occurred more or less frequently within the different corpora. As stated in our Dimension 1 analysis, the private verb counts for the Blogs and ENRON corpora are expected, given that some of the blogs can be assumed to have been written in a diary/journal format. Furthermore, Slate, as a magazine, would have fewer private verbs, as it is focused on reporting news and covering politics. By contrast, the public verb (e.g. say) counts for ENRON and the blogs averaged 4.03 and 4.15 respectively, while Slate had a higher count of 7.13. We attribute this to the expectation that Slate should have more positive narrative features, as mentioned under Dimension 2. Moreover, Slate is concerned with reporting events and offering commentary on them. As a result, there should be more past tense and perfect aspect verbs, more third-person referents, and more reported speech.
  • 58. Activity Verbs Result/Interpretation According to Biber et al. (2007, p. 336) in the Longman Grammar, across semantic domains, activity verbs have a high frequency in conversation. The only register in which they have a higher frequency is fiction. We attribute the high frequency of activity verbs in both the blogs and the ENRON e-mails to these corpora falling more toward the positive features of Biber’s dimension 1, which suggests more involved production. Blogs may be more informal, but this depends on the type of blog. E-mail, however, often functions much more like conversation.
  • 59. Our Intuitions about the ENRON Corpus Generally, company emails are viewed as tools to use in the office: strictly professional tools for communicating with co-workers about work-related affairs. Therefore, the assumption would be that a company such as Enron would tend toward more formal language, concise language, frequent usage of 1st or 2nd person pronouns, and less usage of nouns. Reasons: Conciseness of language: specific tasks are explained within an email. Email is not used for informal conversation, so there is less reason for a high word count. Frequent 1st or 2nd person pronouns: usually the employee is writing about an office task that they are engaged in or want others to take part in. Words such as “I” and “you” are commonly used. Less usage of nouns: emails consist of directives, which request participation from another. Therefore, activity is more frequent and the text is less informational, a quality associated with nouns. To illustrate this, here is an example of a standard professional e-mail:
  • 60. To: abc@wyz.com CC: Accounts Payable Subject: Request for copy of invoice Dear ABC, I'm LMN form the Accounts Payable department at GHI. Ltd. I understand that we have an invoice outstanding with your company since 07/01/2010. This email is to request you for a copy of the invoice, so that we can clear it for payment at the earliest. First of all, apologies for the delay in payment. The accounts team has been reshuffled and this case came to my notice just an hour ago and I am writing to you immediately. The invoice in question is invoice number 246849, for Mr.JKI who stayed at your hotel for a period of 4 days. That is, from 06/28/2010 to 07/01/2010. We cannot seem to locate the invoice, so I request you to email me a copy of the invoice, so that I can issue the payment right away. Please send it to the email address mentioned below and mark it for my attention. Once again, sincere apologies for the delay. Thank you, LMN, Senior Executive Accounts Payable, GHI. Ltd email: accountspayable@ghi.com Courtesy: (Iyer, 2010)
  • 61. Our Intuitions about Slate As a group, we had little to no familiarity with Slate Magazine’s content. However, in our early research, we uncovered a few comments regarding Slate’s “left-leaning, liberal bias.” Given this, our intuitions told us that there would be a tendency towards persuasion, which could in turn be reflected in an analysis of dimension 4 (i.e., overt expression of argumentation). We also thought that Slate, given its web-based nature, would reflect patterns seen in blogs and would therefore be more informal.
  • 62. Our Intuitions about Blogs Given the lack of metadata about the Blogs corpus, we could only rely on the raw data to inform us as to the content of the overall corpus. However, we did have a few preconceived notions about what the Blogs corpus might be like: 1.) They would score high in terms of involved production via dimension 1. 2.) They would score high in terms of narrative discourse via dimension 2. 3.) They would tend to more closely mirror the frequency counts of the ENRON corpus.
  • 64. In addition to this, ENRON also had a tendency to reflect conversational discourse, as seen in our analysis of dimension 1: ENRON tended to exhibit frequency counts suggesting that it was more involved.
  • 66. References Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press. Biber, D., Conrad, S., Johansson, S., Leech, G., & Finegan, E. (2007). Longman grammar of spoken and written English. Essex, England: Pearson Education Limited. Biber, D. (1992). Variation across speech and writing. New York, NY: Cambridge University Press. Cho, T. (2010). Linguistic features of electronic mail in the workplace: A comparison with memoranda. Language@Internet. Retrieved March 26, 2011, from http://www.languageatinternet.de/articles/2010/2728 Eggins, S. (2004). An introduction to systemic functional linguistics. New York: Continuum International Publishing Group. Frehner, C. (2008). Email, SMS, MMS: The linguistic creativity of asynchronous discourse in the new media age. New York: Peter Lang. Gotti, M. (2008). Investigating specialized discourse. New York: Peter Lang. Hoban, J. (2009, June 24). Hidden patterns – Enron email predicted collapse. Retrieved March 26, 2011, from Xobni Blog: http://blog.xobni.com/2009/06/24/hidden-patterns-enron-email-predicted-collapse Iyer, S. (2010, August 13). Business email sample. Retrieved April 2011, from Buzzle.com: http://www.buzzle.com/articles/business-email-sample.html Lindaraxa. (2011, April 13). Lindaraxa: Thai pineapple fried rice with shrimp. Retrieved April 14, 2011, from http://lindaraxa.blogspot.com/2011/04/thai-pineapple-fried-rice-with-shrimp.html Perez Sabater, C., Turney, E., & Montero Fleta, B. (2008). Orality and literacy, formality and informality in email communication. Iberica: Revista de la Asociacion Europea de Lenguas para Fines Especificos/Journal of the European Association of Languages for Specific Purposes (AELFE), 15, 71-88.
  • 67. References Schmidt, J. (2007). Blogging practices: An analytical framework. Journal of Computer-Mediated Communication, 12(4), article 13. http://jcmc.indiana.edu/vol12/issue4/schmidt.html Westin, I. (2003). Language change in English newspaper editorials. New York: Editions Rodopi B.V. Yang, C. C., Zeng, D., & Chau, M. (Eds.). (2007). Proceedings from the Pacific Asia Workshop, PAISI ‘07: Intelligence and Security Informatics. Chengdu, China: Springer. Yoffe, E. (2011, April 14). Please take the gold watch. Please! Retrieved April 14, 2011, from Slate: http://www.slate.com/id/2291194/pagenum/all/