This document discusses quantifying stylistic variation in blogs to determine whether the corporate blog can be considered an emerging genre. It presents research analyzing over 16,000 posts drawn from 115 corporate blogs, 18 personal blogs, one political blog, newspaper editorial sections, and company press release sections. Linguistic features are tagged automatically, and a formality metric, the F-score, is used to measure variation across posts, authors, blogs, and blog types. Preliminary observations: blogs are characterized by dynamicity and fluidity; functional blog subtypes tend to be consistent; internal variation may correlate with functional complexity; and quantitative variation can serve as a measure of genre dynamicity and stability.
Variation in Corporate Blogs and Emerging Genres
1. Variation and “Genrefication” in Blogs
Cornelius Puschmann
AG 3: Syntactic Variation and Emerging Genres
DGfS 29, Siegen
28.02.2007
2. Thesis project
The corporate blog as an emerging genre of
computer-mediated communication
Focus
● survey of a new form of domain-specific publishing
● both linguistic and extra-linguistic aspects
Question: is the corporate blog a genre?
Research context
3. “A blog is a user-generated website where entries are made
in journal style and displayed in a reverse chronological order.
Blogs often provide commentary or news on a particular
subject, such as food, politics, or local news; some function
as more personal online diaries.” http://en.wikipedia.org/wiki/Blog
“Corporate blogging is the use of blogs to further
organizational goals” Debbie Weil, The Corporate Blogging Book
Blogs? Corporate Blogs?
5. Genre: “A class of communicative events with a shared set of
communicative purposes” (Swales)
Text typology: “Linguistic features, their co-occurrence
and relative distribution in a text” (Biber, paraphrase)
Assumption: genre is one factor determining text typology
Genre vs. Text Typology
6. My focus: differences in the relative distribution of features
=> quantitative variation
Quantitative variation is shaped by
- formal factors
- mode/channel
- register
- speaker
Quantifying stylistic variation
7. Quantifiable stylistic variation in blogs can occur
on several levels
(1) post, (2) author, (3) blog, (4) type of blog (corporate,..)
Assumption: By vertically and horizontally assessing the
degree of variation on these levels for an emerging genre
(e.g. the corporate blog), we should be able to observe its
degree of typological stability.
Assessing variation on multiple levels
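One way to make the multi-level comparison concrete is to group per-post scores at each level and compare the spread within each group. The sketch below is only an illustration: the record layout, the names `POSTS`, `LEVELS` and `variation_by_level`, the sample values, and the choice of population standard deviation as the spread measure are all assumptions, not taken from the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical per-post records: (blog_type, blog, author, f_score).
# The values are invented for illustration; they are not corpus results.
POSTS = [
    ("corporate", "blog_a", "author_1", 58.0),
    ("corporate", "blog_a", "author_1", 62.0),
    ("corporate", "blog_a", "author_2", 50.0),
    ("corporate", "blog_b", "author_3", 66.0),
    ("private", "blog_c", "author_4", 44.0),
    ("private", "blog_c", "author_4", 48.0),
]

# Position of each grouping key inside a record.
LEVELS = {"type": 0, "blog": 1, "author": 2}


def variation_by_level(posts, level):
    """Group post scores at one level and return {group: (mean, spread)}."""
    idx = LEVELS[level]
    groups = defaultdict(list)
    for record in posts:
        groups[record[idx]].append(record[3])
    # Population standard deviation as the spread measure (an assumption).
    return {name: (mean(scores), pstdev(scores))
            for name, scores in groups.items()}


if __name__ == "__main__":
    for level in LEVELS:
        print(level, variation_by_level(POSTS, level))
```

Comparing the spread within a blog type against the spread within its individual blogs and authors is one possible reading of "vertical" assessment; comparing groups at the same level would be the "horizontal" one.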
8. - web feeds (RSS and Atom formats) used to retrieve,
store and analyse language data
- TreeTagger used for automated part-of-speech tagging
- 134 blogs (115 corporate, 1 political*, 18 private)
- 3 press editorial sections (NYT, WashPo, LA Times)
- 5 press release sections (Microsoft, GM, Sun, Oracle, McD)
- 16,895 posts
- 4,041,133 tokens
The corpus
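The feed-based retrieval step can be sketched with the Python standard library. This is a minimal illustration, not the pipeline used for the corpus: the function name `extract_posts` and the sample feed are hypothetical, the feed is parsed from a string rather than fetched over the network, and only the title and body elements of RSS 2.0 and Atom are handled.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace; RSS 2.0 elements have none.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def extract_posts(feed_xml):
    """Return (title, text) pairs from an RSS 2.0 or Atom feed string."""
    root = ET.fromstring(feed_xml)
    posts = []
    # RSS 2.0: <rss><channel><item><title/><description/></item>...
    for item in root.iter("item"):
        posts.append((item.findtext("title", default=""),
                      item.findtext("description", default="")))
    # Atom: <feed><entry><title/><content/></entry>...
    for entry in root.iter(ATOM_NS + "entry"):
        posts.append((entry.findtext(ATOM_NS + "title", default=""),
                      entry.findtext(ATOM_NS + "content", default="")))
    return posts


# Hypothetical sample feed, invented for illustration.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example blog</title>
<item><title>First post</title><description>Hello world.</description></item>
<item><title>Second post</title><description>More text.</description></item>
</channel></rss>"""

if __name__ == "__main__":
    for title, text in extract_posts(SAMPLE_RSS):
        print(title, "|", text)
```

The extracted post text would then be passed to the tagger; that step is left out here because the slides do not show how it was wired up.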
9. F-score (Heylighen & Dewaele):
a metric to quantify the level of formality in a text,
where formality is specifically defined as the diametrical
opposite of contextuality
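formula:
F = 0.5 * ((N + ADJ + PRP + DET) - (PN + V + ADV + ITJ) + 100)
(N = nouns, ADJ = adjectives, PRP = prepositions, DET = determiners,
PN = pronouns, V = verbs, ADV = adverbs, ITJ = interjections;
each term is that category's frequency as a percentage of all words)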
Measuring formality via F-scores
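The formula can be worked through in a few lines of code. The sketch below assumes that each token has already been assigned one of the category labels used on the slide and that each term is the category's share of all tokens in percent; how the tagger's own tagset maps onto these eight categories is not shown in the slides, so the function (a hypothetical `f_score`) takes the category labels directly.

```python
from collections import Counter

# Category labels as abbreviated on the slide.
FORMAL = ("N", "ADJ", "PRP", "DET")      # added in the formula
CONTEXTUAL = ("PN", "V", "ADV", "ITJ")   # subtracted in the formula


def f_score(categories):
    """Compute the F-score from an iterable of per-token category labels.

    Tokens with any other label (e.g. conjunctions) still count toward
    the total, so they lower every percentage without entering the sum.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute an F-score for an empty text")

    def pct(label):
        # Frequency of one category as a percentage of all tokens.
        return 100.0 * counts[label] / total

    formal = sum(pct(label) for label in FORMAL)
    contextual = sum(pct(label) for label in CONTEXTUAL)
    return 0.5 * (formal - contextual + 100)


if __name__ == "__main__":
    # Two nouns, one verb, one pronoun: formal and contextual shares balance.
    print(f_score(["N", "N", "V", "PN"]))
```

Under this definition the score ranges from 0 (only contextual categories) to 100 (only formal categories), with 50 when the two groups balance.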
10. The Toshiba Portege R400 is a Windows Vista-inspired
signature mobile PC that incorporates innovative connectivity
and display technologies to provide timely access to e-mail and
appointments via Active Notifications and is built on Windows
SideShow™ technology. [...]
http://www.microsoft.com/presspass/press/2007/jan07/01-07CES2007PR.mspx
- high noun frequency
- high adjective frequency
- more nominal than verbal
- often relate complex information
- often describe future events/potentiality
Example: high F-score (press release)
11. OK, OK, I'm partly at fault here. But, hear me out. Last year at
Gnomedex I had my son demonstrate Second Life up on stage
while I was hosting a panel discussion. Someone from Linden
Labs (the folks who make Second Life), Beth Goza (she now
works at Microsoft), saw that, and told me and my son to knock
it off. People under 18 aren't allowed in Second Life. So, what
did I do? I just told Patrick never to go into Second Life and I
didn't go back into Second Life either. [...]
http://scobleizer.com/2007/02/18/second-life-has-my-credit-card-and-wont-let-go/
- high frequency of personal pronouns
- more verbal than nominal
- often describe past events, personal impressions, feelings
Example: low F-score (blog)
16. O1: Blogs are characterized by dynamicity and fluidity
O2: Functional blog subtypes tend to be consistent
O3: Internal variation may correlate with functional complexity
O4: Quantitative variation can be a measure of
a) genre dynamicity and
b) genre stability/fluidity
Observations