Only the abstract here is included in the proceedings of the WikiSym + OpenSym 2013 Conference (wsos2013). The full text is a work-in-progress
draft, revised based on blind-review comments and suggestions. Please contact the author for the latest citation for this research.
How does localization influence online visibility of user-
generated encyclopedias? A study on Chinese-language
Search Engine Result Pages (SERPs)
Han-Teng Liao
Oxford Internet Institute
University of Oxford
Oxford, United Kingdom
hanteng@gmail.com
ABSTRACT
Prior empirical and theoretical work has discussed the role that
dominant search engines play in information gatekeeping on the Web,
and there are reports on the high ranking of the Wikipedia website
in search engine result pages (SERPs).
However, little research has been conducted on non-Google search
engines and non-English versions of user-generated encyclopedias.
This paper proposes a method to quantify the “display” gatekeeping
differences of the SERP ranking and presents findings based on the
Chinese SERP data. Based on 2,500 mainly-Chinese-language
search queries, the data set includes the SERP outcome of four
Chinese-speaking regions (mainland China, Singapore, Hong Kong
and Taiwan) provided by three major search engines (Baidu, Google
and Yahoo), covering over 97% of the search engine
market in each region. The findings, analysed and visualized using
network analysis techniques, demonstrate the following: major
user-generated encyclopedias are among the most visible websites,
and localization factors matter (certain search engine variants
produce the most divergent outcomes, especially mainland Chinese ones).
The indicated strong effects of “network gatekeeping” by search
engines also suggest similar dynamics inside user-generated
encyclopedias.
Categories and Subject Descriptors
[Human-centered computing]: Collaborative and social
computing – Collaborative filtering, Wikis, Empirical studies in
collaborative and social computing
[Information Systems]: Web search engines – Collaborative
filtering, Page and site ranking
General Terms
Management, Performance, Design, Human Factors, Theory
Keywords
Geo-linguistic analysis, network analysis, Network gatekeeping,
Chinese Internet, Chinese characters, Localization, censorship.
1. INTRODUCTION
Using search engines is among the most popular online activities for
users in the US (Fallows, 2008) and mainland China (CIC, 2009;
CNNIC, 2009), and has been among the driving forces of the
fast-growing online advertising platform (Varian, 2007; SEMPO, 2011;
IDATE, 2011; PricewaterhouseCoopers, 2011). It has been
reported (with speculation as to why) that Google, the global leader
among search engines, has consistently favoured Wikipedia, the global
leader among user-generated encyclopedias, by showing relevant pages
frequently and prominently in the search engine result pages
(hereafter SERPs) (Charlton, 2012; Čuhalev, 2006; Gray, 2007;
Jones, 2007; Silverwood-Cope, 2012). Independent market
research by Nielsen Online and Hitwise Intelligence has
demonstrated that Wikipedia not only dominates the online visits
for encyclopedia content, but also does so mainly because of the
traffic directed by major Web search engines (Hopkins, 2009;
Nielsen Online, 2008). Even the Wikimedia Foundation
acknowledged this (Google drives traffic to Wikipedia), but
nonetheless argued that half of its readers did want to look for
Wikipedia content (Khanna, 2011). Thus, as major websites that
dominate traffic and user attention, Google and Wikipedia seem to
be central in guiding users where to look.
However, most of the findings and discussions are limited to or
predominantly focused on the English-language context (Battelle,
2005; Bermejo, 2009; Couvering, 2004, 2008; Dahlberg, 2005;
Hargittai, 2007; Segev, 2008), and little effort has been made to
understand whether such a phenomenon is specific to
Google/Wikipedia or can be found for other major search engines
and user-generated encyclopedias. In addition, the multi-lingual
internet and the rise of non-English users on the Web have multiple
implications for the “localization” effects on search engines.
Localization (hereafter L10n), a process of adapting computer
software or information systems for a group of users usually
defined by national boundaries or geo-linguistic profiles (Hussain
& Mohan, 2008; Liao, 2011; McKenna & Naftulin, 2000), is
expected to influence users’ information-seeking practices. Both
Google and Wikipedia provide localized content and interfaces
designed to serve different groups of users.
Because Google (or other general-purpose search engines),
Wikipedia (or other user-generated encyclopedias) and localization
are likely to present and thus frame the Web differently for different
groups of users, they effectively filter information for them. While
communication scholars can describe such filtering as gatekeeping,
the fact that Web users can directly or indirectly
participate in such information filtering processes has introduced
techniques and theories of “collaborative filtering” (Benkler, 2006;
Goldberg, Nichols, Oki, & Terry, 1992) and “network
gatekeeping” (Barzilai-Nahon, 2008). Indeed, while Google and
WikiSym '13 August 05 - 07 2013, Hong Kong, China
Copyright 2013 ACM 978-1-4503-1852-5/13/08 ...$15.00.
Wikipedia may concentrate Web traffic and command user
attention as major global websites, users’ contributions of web
content and links may also influence such filtering and gatekeeping
outcomes, as demonstrated by the case of the Google query
“Jew” (Bar-Ilan, 2006): some users organized to help
Wikipedia’s entry page for “Jew” rank higher in Google’s
English-language SERPs.
Thus, although both “collaborative filtering” (Benkler, 2006;
Goldberg et al., 1992) and “network gatekeeping” (Barzilai-Nahon,
2008) are indeed about filtering and keeping information, the
possibility of participation through user input makes them different
from the filtering and gatekeeping processes in traditional media.
Nonetheless, I argue that geographic and linguistic factors may
bound or limit such collaborative and networking possibilities and
thus re-introduce national and/or linguistic boundaries on
the Web. Indeed, as early as the early 2000s, researchers such as
Zittrain and Sunstein raised the issue of localized search
results filtering political content or fragmenting the public sphere
(Morris & Ogan, 2002; Sunstein, 2002). For SERPs, the question
of information control and linguistic boundaries remains, while the
“borders” of national frameworks have been reintroduced in many
aspects of technological and legal arrangements (University &
School, 2006). In particular, Google’s initial collaboration with (or
accommodation of) the Chinese government’s needs and its later exit
from mainland China have demonstrated the intricate political and
cultural dimensions of the “localization” of search engine services
(Vaughan & Zhang, 2007; Einhorn, 2010). Thus, the research gap on the
effects of localization on SERPs and non-English Wikipedias needs to be
filled, including the prominent cases of Chinese-language and
Arabic-language internet users, whose recent presence and
participation in the new internet world have also attracted much
attention (Dutta, Dutton, & Law, 2011). In particular, in order to
answer how search engines and/or user-generated encyclopedias
reintroduce or shape national or social boundaries, more empirical
work on L10n effects is needed (Aragón, Kaltenbrunner, Laniado, &
Volkovich, 2012; Bao et al., 2012; Hecht & Gergle, 2010; Liao, 2008,
2011; Luyt, Goh, & Lee, 2009; Massa & Scrinzi, 2012; Mazieres &
Huron, 2013; Petzold, Liao, Hartley, & Potts, 2012; Rogers &
Sendijarevic, 2012; Warncke-Wang, Uduwage, Dong, & Riedl,
2012). L10n is also briefly discussed as a contributing factor to the
“internationalization mechanisms” of “network
gatekeeping” (Barzilai-Nahon, 2008), holding the key for
researchers to understand the nationalization or internationalization
dynamics of the Web.
For the Chinese-language internet, there are many localized versions
provided by several major search engines, including examples such as
Yahoo China, Google Hong Kong, Google Taiwan, etc. I call them
search engine-locale variants (hereafter search engine variants).
Do different search engine variants guide users from various
Chinese-speaking regions to see the same websites regardless of
which search engine they choose? Or do they see divergent SERPs?
Prior empirical research has analysed SERPs
inside mainland China, with the latest research on 316 search query
phrases of “Internet event” collected in 2009, indicating that
Baidu Baike and Chinese Wikipedia indeed rank high among the
SERPs (Jiang & Akhtar, 2011). However, it focuses on (and thus is
limited to) simplified Chinese users in mainland China, and the
selected sample of search queries was based upon internet incidents
that are politically controversial in mainland China. This paper
contributes findings based on 2500 search queries in 2011, covering
not only more topics but also more Chinese-language search
engines across more regions such as Hong Kong, Taiwan and
Singapore. Before presenting the methods and findings, the next
section will first provide a theoretical framework that captures the
localization effects of search engines.
2. L10N OF SEARCH ENGINES
Observing how search engines categorise users is one of the
practical ways to examine the impact of search engines on national
and/or regional boundaries. As part of the industry practice in
internationalization/localization (i18n/L10n), search engines
provide different interfaces and services for different users, usually
categorized by their geo-linguistic identifiers, using language codes
such as zh-TW (Chinese in Taiwan), pt-BR (Portuguese in Brazil),
and en-IN (English in India) (DePalma, 2002; Dunne, 2006). These
identifiers in turn influence how content is aggregated, filtered and
prioritised for users who share the same or similar language
preferences. Online users and audiences are often partitioned
accordingly by search engine marketing tools such as Google
AdWords and Microsoft adCenter. Unlike the globalized TV
industry, where broadcasting and cable TV are still bound to
geography, these geo-linguistic codes are configurable. For
example, one can use the UK version of Google even when
not in the UK.
To conceptualize the localization effects of search engines, this
paper applies the “network gatekeeping” theory (Barzilai-Nahon,
2008) for the following reasons. First, localization was discussed
as a contributing factor to the “internationalization mechanisms” of
“network gatekeeping” (Barzilai-Nahon, 2008). Although the theory
comes mainly from information science, where it serves to better
understand information control in network settings, its
multidisciplinary aspects (Jucquois-Delpierre, 2007) can help
researchers understand how seemingly technical arrangements of
computer software or information systems can have enormous effects on
gatekeeping or controlling the flows and presentation of information.
Second, distinct from traditional gatekeeping theory that focuses on
the withholding or deletion of information, the network gatekeeping
theory not only conceptualizes localization as part of the
gatekeeping processes, but also emphasizes the “display” bases for
such processes: “Presenting information in a particular visual form
designed to catch the eye” (Barzilai-Nahon, 2008). Indeed, search
engines visually present the results. Thus, to understand the
localization effects of search engines, a data collection method
must consider not only the localization parameters but also the
visual display of search results.
I argue that locales in computing, sets of parameters that describe a
user’s language, region and other interface preferences, constitute
one of the most important online “situations” for online media. By
“situations” I use the definition used by medium theorists in the
tradition of media ecology: “situations as (social) information-
systems that set the patterns of access to information” (Meyrowitz,
1986, 1994). Note that as medium theorists focus on the medium rather
than on messages, the definition is particularly suitable for studying
search engines, because some major companies including Google have
resisted the idea that they are in the content or media industry by
insisting that they are information companies. For media and
communication scholars, the underlying question is less about
Google’s industrial identity than about how online media in
general can use locales to segment, fragment and integrate different
media markets and/or audiences by using different information
system settings. Thus, geographic and linguistic factors seem to
“set the patterns of access to information”, as geo-linguistic
situations are expected to determine which websites will be the
most visible and most consistently appearing ones in the SERPs.
2.1 A Straight-forward Visibility Test
Because users often browse SERPs from the top to the bottom,
various market research (Enquiro, 2007), social science research
(Bar-Ilan, 2006; Dunleavy, Margetts, Bastow, Pearce, & Tinkler,
2007; Margetts & Escher, 2006; Vaughan & Thelwall, 2004) and
industry practices (Slingshot SEO, 2011) have measured the level of
online visibility based on webometric data such as the positions in
SERPs (the higher up, the more visible) and/or the number of
incoming web links from other websites. These measurements provide
the foundations for keyword search advertising (Brettel & Spilker-
Attig, 2010; Chen, 2008; B. J. Jansen, Brown, & Resnick, 2007; B.
J. Jansen & Mullen, 2008; J. Jansen, 2011; Jung, 2008; Malaga,
2008; Spindler, 2010). For marketing purposes, it is imperative to
boost the ranking of a website for a target set of search terms (or
search keywords). For the purpose of this research, the focus shifts
to the medium role of search engines between users and webpages.
As shown in Figure 1, search engines play the gatekeeping role by
curating different sets of web pages for different groups of users
characterized by their respective search engine variants. This
functions as “network” gatekeeping because search engines often
provide different rankings based on both user data and the inter-
linking data among the web pages themselves.
Figure 1. Search engines as the “network gatekeeper”
between users and web pages
To account for the difference made by the ranking positions in
SERPs, this research proposes a method to quantify such “display”
gatekeeping differences (Barzilai-Nahon, 2008). Because different
SERP rankings suggest different levels of visibility, different scores
can be assigned. One way to do so is to use click-through rate
(hereafter CTR) data for SERPs.
Commonly used in online advertising, CTR measures the number of
clicks on a web link divided by the number of times it is shown to
users (i.e. clicks/impressions). For search engine marketing,
CTR indicates the probability of a listed web link being clicked.
Based on the arithmetic mean of the CTR for top-10 search results
from five different sources (Hearne, 2006; Jones, 2007; Young,
2011), I plotted the scatter chart in Figure 2 to show the relationship
between the SERP ranking and CTR. The top-ranking website is
expected to receive more than 30% of the traffic while the second
receives just a bit over 10%, and so on. The relationship between
the SERP ranking and CTR seems to follow a power function of the form
$y = ax^{b}$. A power regression analysis thus provides the
curve-fitting function $y = 0.2889x^{-1.078}$, with a high $R^2$ value
(0.9934), suggesting a close fit. For this research, the visibility
scores are therefore assigned according to the SERP ranking.
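As a minimal sketch of this scoring step, the fitted curve can be applied directly to a ranking position. The coefficients are those reported above; the function name visibility_score and the choice to reject ranks below 1 are illustrative assumptions rather than part of the study's own tooling.

```python
# Coefficients of the fitted power curve reported in the text.
A = 0.2889
B = -1.078


def visibility_score(rank: int) -> float:
    """Return the estimated CTR for a result at a 1-based SERP rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return A * rank ** B


# Scores for the ten positions of a single SERP.
top10_scores = [visibility_score(rank) for rank in range(1, 11)]
for rank, score in enumerate(top10_scores, start=1):
    print(f"rank {rank:2d}: {score:.4f}")
```

Under this curve the score falls monotonically with rank, so a website listed first contributes more than twice the score of one listed second.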
Figure 2. Click-through Rates depending on the ranking in
the Search Engine Results Page (SERP)
While it is impossible to exhaust the SERPs to identify patterns of
preferred websites, previous research has established
that the top-10 search results in the first SERP occupy a significant
proportion of users’ attention and actual clicks (Hearne, 2006;
Jones, 2007; Young, 2011). Based on such estimated CTR data,
different visibility scores can be assigned to websites
depending on their ranking in the SERP, as shown in Figure 2.
A high SERP ranking does not always guarantee users’ actual clicks.
Nonetheless, it is justified to use CTR as a proxy for visibility scores
for the purpose of this research: it is a best-effort attempt based on
various sources of industry data.
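The power regression behind the curve can be sketched as an ordinary least-squares fit on log-transformed values, since $y = ax^{b}$ becomes $\ln y = \ln a + b \ln x$. This is an assumption about how the regression was run (it is how spreadsheet power trend lines are commonly computed), and the industry CTR figures themselves are not reproduced here: the points below are synthetic, generated from the published curve only to check that the procedure recovers its coefficients.

```python
import math


def fit_power_curve(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx                      # slope in log-log space
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back from logs
    return a, b


# Synthetic points taken from the reported curve, not real CTR data.
ranks = list(range(1, 11))
synthetic_ctr = [0.2889 * rank ** -1.078 for rank in ranks]

a, b = fit_power_curve(ranks, synthetic_ctr)
print(f"a = {a:.4f}, b = {b:.3f}")
```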
2.2 Chinese Search Engine Markets
According to various survey, market and traffic reports from both
inside and outside mainland China (CIC, 2009; CNNIC, 2006,
2007; Nguyen, 2011; Russell, 2011; StatCounter, 2011), three
major search engines (Baidu, Google and Yahoo) dominate the
search engine markets across four regions (mainland China,
Singapore, Hong Kong, and Taiwan) and two Chinese script
preferences (simplified Chinese for mainland China and Singapore;
traditional Chinese for Hong Kong and Taiwan). Thus, nine search
engine variants can be derived from the combinations of search
engine providers and geo-linguistic preferences, which altogether
cover over 97% of the market:
• For mainland China (mostly simplified Chinese users):
zh-cn: Baidu, Google (simplified Chinese), Yahoo China
• For Singapore (mostly simplified Chinese users):
zh-sg: Google Singapore and Yahoo Singapore
• For Hong Kong (mostly traditional Chinese users):
zh-hk: Google Hong Kong and Yahoo Hong Kong
• For Taiwan (mostly traditional Chinese users):
zh-tw: Google Taiwan and Yahoo Taiwan
These variants are hereafter abbreviated as Baidu_CN, Google_CN,
Yahoo_CN, Google_SG, Yahoo_SG, Google_HK, Yahoo_HK,
Google_TW and Yahoo_TW. It is noted that Baidu continues to enjoy its
lead in mainland China with Google in second place, after Google
moved its mainland operations to Hong Kong (BBC, 2011). In
Hong Kong and Taiwan, around 2010 to 2011, Google
overtook Yahoo’s leading position, while maintaining its top
position in Singapore (StatCounter, 2011). With all these nine
variants, will the SERPs merge on a similar set of websites or
diverge? By answering this question, researchers can gain insights
into the converging and diverging effects of search engines for
Chinese-language users across these regions.
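The nine variants listed above can be written down as a small lookup structure, which makes the provider-by-locale derivation explicit. This is only an illustrative sketch: the dictionary names and the variant_label helper are assumptions introduced here, while the locale codes, providers and abbreviations follow the text.

```python
# Providers available in each locale, as listed in the text.
PROVIDERS_BY_LOCALE = {
    "zh-cn": ["Baidu", "Google", "Yahoo"],
    "zh-sg": ["Google", "Yahoo"],
    "zh-hk": ["Google", "Yahoo"],
    "zh-tw": ["Google", "Yahoo"],
}

# Predominant script preference per locale, as stated in the text.
SCRIPT_BY_LOCALE = {
    "zh-cn": "simplified",
    "zh-sg": "simplified",
    "zh-hk": "traditional",
    "zh-tw": "traditional",
}


def variant_label(provider: str, locale: str) -> str:
    """Build an abbreviation such as 'Google_HK' from provider and locale."""
    region = locale.split("-")[1].upper()
    return f"{provider}_{region}"


VARIANTS = [
    variant_label(provider, locale)
    for locale, providers in PROVIDERS_BY_LOCALE.items()
    for provider in providers
]
print(VARIANTS)
```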
[Figure 1 diagram elements: Users (often categorized by providers and geo-linguistic settings); Search Engines; Web pages]
[Figure 2 chart: x-axis, ranking in the Search Engine Results Page (1 to 10); y-axis, visibility scores (0% to 35%); series: weighted by CTR, unweighted; power trend line for the CTR-weighted series, $y = 0.2889x^{-1.078}$, $R^2 = 0.9934$]
2.3 Merging and diverging effects of SERPs
If the aforementioned market survey and traffic reports are correct,
search engine users from Taiwan mostly filter web pages through
the lens of the search engine variants Google_TW and Yahoo_TW.
Those from Hong Kong mostly use Google_HK and Yahoo_HK, and so on.
By conceptualizing search engines as a medium, the merging and
diverging patterns of SERPs will also indicate whether users from
these regions see similar websites when using different search
engine providers. Hence, the SERP data may indicate
which search engines may overcome offline boundaries across
these regions (if the SERPs converge on specific websites) and
which may reinforce them (if the SERPs diverge), thereby
contributing to the general question of media and globalization in
the case of search engines.
To do so, the proposed visibility tests, which quantify the
top-ranking websites, can be used as an indication of search engines
exercising their “display” gatekeeping power for certain websites.
Based on the quantified numbers of such display gatekeeping
power, the visibility patterns can be systematically examined
between (1) search engine variants and (2) visible websites.
Moreover, visibility scores can be further aggregated (i.e. summed)
over a selection of search queries, so as to better answer the different
research questions that guide such selection. Ideally, by exhausting
visibility scores for various localized versions of SERPs over a large
sample of search queries, researchers can better compare how
visible a website is across different search engine variants, thereby
paving the way for showing the merging and diverging patterns of
the SERPs.
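The aggregation described above can be sketched as a sum of rank-based scores per pair of search engine variant and website. The input format, the function names and the example.org-style placeholder websites are hypothetical; only the scoring curve and the idea of summing over queries come from the text.

```python
from collections import defaultdict

A, B = 0.2889, -1.078  # fitted curve reported in Section 2.1


def visibility_score(rank: int) -> float:
    """Estimated CTR for a 1-based SERP rank."""
    return A * rank ** B


def aggregate_visibility(serps):
    """Sum visibility scores per (variant, website).

    serps: iterable of (variant, query, ranked list of websites).
    Only the top-10 positions of each SERP are counted.
    """
    totals = defaultdict(float)
    for variant, _query, websites in serps:
        for rank, website in enumerate(websites[:10], start=1):
            totals[(variant, website)] += visibility_score(rank)
    return dict(totals)


# Hypothetical SERPs with placeholder website names.
sample_serps = [
    ("Google_TW", "query-1", ["site-a.example", "site-b.example"]),
    ("Google_TW", "query-2", ["site-b.example", "site-a.example"]),
    ("Baidu_CN", "query-1", ["site-b.example"]),
]
totals = aggregate_visibility(sample_serps)
for (variant, website), score in sorted(totals.items()):
    print(variant, website, round(score, 4))
```

Keeping the variant in the key preserves one score per cell, which is the shape needed for a variant-by-website table.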
It should be noted that, borrowing from the academic research on
webometric visibility and the industry practice of keyword
advertising, the proposed framework and method are general enough
for future study regardless of the providers and/or geo-linguistic
preferences of search engines. For example, how different, or
similar, are the SERPs provided by Yandex versus Google in
Turkey? How different, or similar, are the SERPs provided by
Google Hindi versus Google Urdu in India? The outcome of
visibility scores can be further visualized and analysed by various
network analysis techniques. Thus, this method can answer these
empirical questions, with results that can then be interpreted to
explore the cultural and political implications of such patterns.
To showcase how the integrated method works, I
choose to study the Chinese-language internet because its boundaries
have several historical, cultural and political complications. For
example, regions such as mainland China, Singapore, Hong Kong
and Taiwan have different practices in democracy, free speech,
human rights and Chinese scripts (Damm, 2007; Liao, 2009; Zhao
& Baldauf, 2007).
3. DATA COLLECTION
To identify how search engine variants influence the Chinese-
language SERPs, the top-10 results should provide enough
indication.
3.1 Search Queries
First, I have selected about 2500 search queries that are relevant to
Chinese cultural and political topics. As summarized in Table 1, the
selection includes all 990 entries in “The Cambridge Encyclopedia
of China” (The Cambridge encyclopedia of China, 1991), the top 10
search terms provided respectively by Baidu and Google (including
mainland China, Hong Kong and Taiwan variations) in various
categories since 2007, major popular cultural references, names of
notable people and some other culturally and politically “sensitive”
keywords. Although other selections or combinations are possible, this
selection aims to focus this research on the prominence of
user-generated encyclopedias across Chinese-speaking regions.
Table 1 Sources and numbers of search queries
Second, the sample keywords are transliterated into search queries
according to the respective Chinese orthographic preferences
(simplified Chinese for mainland China and Singapore; traditional
Chinese for Hong Kong and Taiwan), making this research the first of
its kind to compare SERPs across Chinese-language variants.
Third, the top-10 SERPs are collected for the nine search engine
variants that cover the four major Chinese-speaking regions of mainland
China, Singapore, Hong Kong and Taiwan. They are then parsed and
processed by the visibility tests, which give high-ranking
websites higher visibility scores.
3.2 Search Results
Around 22,000 web links are extracted from the SERPs based on
the outcome of 2500 search queries submitted across the nine
search engine variants in 2011. These 22,000 web links
correspond to around 25,000 unique domain names. The
outcome is then further consolidated manually, by checking IP
addresses, into over 16,000 websites (e.g. the website sohu.com
aggregates money.sohu.com and women.sohu.com). Finally, all education
and government websites are aggregated under their respective
second-level domain names, such as edu.tw, edu.cn, gov.cn and gov.hk.
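The consolidation above was done manually by checking IP addresses. As a rough stand-in for that manual step, and not a reproduction of it, a suffix rule can map host names onto an aggregated website name. The two suffix lists below are hypothetical rule sets built only from the examples named in the text.

```python
from urllib.parse import urlparse

# Hypothetical rule lists built from the examples given in the text.
AGGREGATED_SUFFIXES = ["edu.tw", "edu.cn", "gov.cn", "gov.hk"]
SITE_SUFFIXES = ["sohu.com"]


def consolidate(url: str) -> str:
    """Map a URL to its aggregated website name, or to its host name."""
    host = (urlparse(url).hostname or "").lower()
    for suffix in AGGREGATED_SUFFIXES + SITE_SUFFIXES:
        # Match the suffix itself or any sub-domain under it.
        if host == suffix or host.endswith("." + suffix):
            return suffix
    return host


print(consolidate("http://money.sohu.com/news"))      # sohu.com
print(consolidate("http://www.ntu.edu.tw/"))          # edu.tw
print(consolidate("http://zh.wikipedia.org/wiki/X"))  # zh.wikipedia.org
```

Hosts that match no rule are kept as they are, so language editions such as zh.wikipedia.org remain distinguishable.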
4. FINDINGS
To show how localization influences online visibility, the collected
data of visibility scores are unpacked and analysed as follows.
4.1 Concentrated visibility scores
Figure 3 shows the respective proportion distribution and
accumulative distribution of visibility scores for the top-100 most
visible websites. It is evident that nearly 80% of the visibility scores
are concentrated in the top-100 websites, and indeed three
user-generated encyclopedia websites ranked highest: (1) wikipedia.org,
(2) baidu.com and (3) hudong.com. For the website wikipedia.org,
Chinese Wikipedia (zh.wikipedia.org) is the most visible; for
baidu.com, Baidu Baike (baike.baidu.com) is the most visible.
Categories of Search Keywords                              Numbers
The Cambridge Encyclopedia of China                            990
Top 10 Search Terms (Google and Baidu)                         387
Best Film/Popular Music (China, Hong Kong, Taiwan)             364
Modern Concepts (shared with modern Japanese)                  171
Notable People                                                 476
  Nobel Prize Winners of Chinese origin                         11
  Major Chinese Politicians                                    187
  Rich People (China, Hong Kong, Taiwan)                        82
  100 Contemporary Intellectuals (China)                       100
  Major Fugitives From Taiwan                                   17
  Victims of White Terror in Taiwan                             79
Potentially Sensitive Terms                                    112
  Japanese AV porn stars                                        48
  Prosecuted and Sentenced Corrupted Chinese Officials          14
  Documented Filtered Words by Great Firewall                   50
Total                                                         2500
Figure 3. Concentrated visibility scores
Since the top-100 most visible websites account for more than 80%
of the visibility scores, strong concentration effects are found. Thus,
the following sub-section further examines these websites.
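The accumulative distribution in Figure 3 can be sketched as a running share of the total score over websites sorted from most to least visible. The function name and the five input scores below are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the data set.

```python
def cumulative_shares(scores):
    """Return the running share of total score, most visible first."""
    ordered = sorted(scores, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total score must be positive")
    shares = []
    running = 0.0
    for score in ordered:
        running += score
        shares.append(running / total)
    return shares


# Hypothetical aggregated scores for five websites.
shares = cumulative_shares([5, 50, 10, 30, 5])
print([round(share, 2) for share in shares])
```

Reading off the value at position n gives the share of all visibility held by the n most visible websites.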
4.2 Tabulating visibility scores
Table 2 tabulates the top-ranking websites and their respective
visibility scores for each search engine variant. Each cell shows
the visibility score that a search engine variant has contributed to a
particular website. For example, the first cell, 34.30, indicates how
much Baidu_CN has contributed to Chinese Wikipedia
(zh.wikipedia.org).
Table 2 Top-ranking websites: visibility scores
Note that the top three are all user-generated encyclopedias: Chinese
Wikipedia, Baidu Baike and Hudong Baike. For another example,
the official news website of Falun Gong (epochtimes.com, which is
ranked 18th) is completely blocked out from Baidu’s results (i.e.
the zero visibility score suggests that it never shows up in Baidu’s
SERPs). This is in direct contrast to, say, Yahoo_HK in the
third-to-last column, where it enjoys a visibility score higher than
all mainland-based websites, including the Chinese official media
outlet People’s Daily (people.com.cn, which is ranked 15th), suggesting
that the Falun Gong news website performs better even than People’s
Daily for Yahoo Hong Kong.
Therefore, Table 2 shows in detail which search engine variants
favour which websites by citing and showing them more often and
more prominently in SERPs, making them easier to find (at least
for the selection of the search queries). The top-ranking websites
include major China-based portals (e.g. baidu.com, sina.com.cn,
qq.com, sohu.com and 163.com), US-based websites (e.g.
youtube.com, facebook.com), mainland China-based news media
websites (e.g. people.com.cn, xinhuanet.com, ifeng.com) and the
aggregated category of mainland Chinese government websites
(i.e. gov.cn).
Table 2 orders the websites from the most visible one at the top row
to the least visible at the bottom row, while the order of search
engine variants is decided firstly by search engine providers (from
Baidu, Google to Yahoo) then secondly by region (from CN, HK,
SG to TW). It is relatively difficult, however, to see any pattern
right away from Table 2 as it is tabulated. In other words, although
each cell in the table shows the degree to which a
search engine variant favours a certain website in its SERPs, the
table as a whole fails to show clearly
which “group” of search engine variants favours which “set” of
websites.
To identify patterns of converging and diverging, I will use
blockmodeling analysis in the next subsection to study the visibility
scores in Table 2, each of which represents the strength of ties
between search engines and websites. To avoid arbitrary clustering
results produced by less-consequential websites collected in the
SERPs, only the top-100 most visible websites are considered for
analysis.
4.3 Clustering using blockmodeling analysis
Cluster analysis is commonly used in exploratory data mining to
find how different data points can be grouped based on some
statistical analysis of similarities and differences. To find how
“birds of a feather flock together” for the websites and search
engine variants at hand, various clustering techniques can be
applied, including agglomerative hierarchical clustering
analysis, which produces a family tree that details how the data points
can be grouped.
Nonetheless, this study chooses blockmodeling analysis (Doreian,
Batagelj, & Ferligoj, 2004) for the following reasons. First, a
blockmodel analysis produces a simplified outcome that is better
suited to the research question at hand: to identify the rough
patterns, without needing the specific details of which
website is closer to which. Second, as will be shown later, a
blockmodel analysis can greatly simplify a complex dataset to
provide a succinct summary of the overall structure. Third, as
researchers can and must design a blockmodel for data points to fit,
a blockmodel analysis is particularly useful for identifying converging
and diverging patterns. It also provides a systematic way to see
whether the data points fit the model. Fourth, a blockmodel can be
seen as a simplified network, and thus it can help to produce a
simplified visualization of network data. It should be noted that the
dataset can be seen as a two-mode network: different “nodes” of
search engine variants give different visibility scores to different
“nodes” of websites. It is thus equivalent to a network of visibility
scores, in which high visibility scores indicate a strong “relationship”.
It is a two-mode network because there are two types of nodes
(i.e. search engine variants and websites) and relationships
exist only between nodes of different types (i.e.
the visibility score contributed by one search engine variant to one
website).
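The two-mode network can be sketched as a list of weighted edges that always run from a search engine variant to a website. The scores below are the first three rows of Table 2, with columns in the table's order; the variable names and the strongest_variant helper are illustrative assumptions.

```python
# Column order of Table 2.
VARIANTS = [
    "Baidu_CN", "Google_CN", "Google_HK", "Google_SG", "Google_TW",
    "Yahoo_CN", "Yahoo_HK", "Yahoo_SG", "Yahoo_TW",
]

# First three rows of Table 2 (visibility scores per variant).
TABLE2_ROWS = {
    "zh.wikipedia.org": [34.30, 272.37, 611.39, 304.15, 586.50,
                         24.46, 833.95, 254.00, 721.01],
    "baike.baidu.com": [661.93, 410.28, 174.04, 433.81, 125.52,
                        72.44, 39.10, 508.05, 4.88],
    "hudong.com": [5.30, 107.93, 71.29, 107.92, 57.31,
                   267.17, 2.54, 168.23, 0.35],
}

# Weighted edges: (variant node, website node, visibility score).
edges = [
    (variant, website, weight)
    for website, weights in TABLE2_ROWS.items()
    for variant, weight in zip(VARIANTS, weights)
]


def strongest_variant(website: str) -> str:
    """Return the variant with the heaviest tie to the given website."""
    candidates = [(w, v) for v, site, w in edges if site == website]
    return max(candidates)[1]


for website in TABLE2_ROWS:
    print(website, "->", strongest_variant(website))
```

Because no edge connects two variants or two websites, the structure is two-mode by construction.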
4.3.1 A blockmodel design
Before detailing how the cluster outcome helps identify the
merging and diverging patterns systematically, it is necessary to
explain the basis on which I design the blockmodel in Table 3. To
build a blockmodel, researchers have to make design decisions on
g g g g
0%
10%
20%
30%
40%
50%
60%
70%
80%
0 20 40 60 80 100
Accumulative
Proportion
Ranking  Websites (Aggregated)  Baidu_CN  Google_CN  Google_HK  Google_SG  Google_TW  Yahoo_CN  Yahoo_HK  Yahoo_SG  Yahoo_TW
1        zh.wikipedia.org          34.30     272.37     611.39     304.15     586.50     24.46    833.95    254.00    721.01
2        baike.baidu.com          661.93     410.28     174.04     433.81     125.52     72.44     39.10    508.05      4.88
3        hudong.com                 5.30     107.93      71.29     107.92      57.31    267.17      2.54    168.23      0.35
4        baidu.com                385.80      51.36      13.29      53.21       9.93     20.52      7.17    102.80      1.65
5        sina.com.cn               59.18      76.85      21.69      69.33      16.63     41.70      2.04     35.29      0.68
6        knowledge.yahoo.com        0.10       0.03       0.29       0.00       0.36      0.00     93.46     20.33    140.07
7        edu.tw                     0.46       5.14      21.14       7.21      64.29      0.06     30.61     21.07    102.98
8        qq.com                    40.27      41.23      13.00      37.26      11.64     57.85      2.07     23.35      0.95
9        youtube.com                0.29       8.39      66.03       9.04      68.63      0.00     45.20      4.96     19.00
10       gov.cn                    25.46      38.94      20.30      32.29      15.61     43.03      5.29     34.84      3.57
11       sohu.com                  20.89      32.82      10.08      27.34       8.08     38.97      3.18     22.11      1.57
12       163.com                   25.59      34.68      10.78      31.51      10.00     32.31      2.52     14.56      0.87
13       facebook.com               0.29       1.93       8.96       2.26      19.00      0.00     88.33      8.31     33.61
14       youku.com                 42.04      29.12      10.32      19.34       8.41     36.38      1.03     15.31      0.64
15       people.com.cn             14.54      23.19      16.00      23.82      18.14     20.97     17.81     11.43     13.39
16       blog.sina.com.cn          21.73      28.47      15.41      26.79      13.95      9.75      4.27     33.78      2.53
17       xinhuanet.com             26.13      27.18      21.02      27.71      20.06     11.50      1.70     19.31      0.40
18       epochtimes.com             0.00       1.05      27.34       2.23      33.05      0.00     34.57      3.93     36.62
19       ifeng.com                 25.67      25.13      11.86      24.39       9.67     16.70      4.20     10.12      2.56
20       baike.soso.com            11.08       7.60       1.31       5.93       1.05     29.16      0.29     63.30      0.04
the “connection types” (e.g. “complete” versus “null”) and the
number of blocks. A block is said to be “complete” if all cells in
that block indicate a strong relationship, and “null” if all cells in
that block contain only a weak relationship or none at all.
Thus the three-by-three blockmodel in Table 3 assumes that the data
points will fit into nine blocks. For this study, the nine search engine
variants are divided into three groups, and the top-100 websites are
categorized into three sets of websites.
Table 3 Expected outcome of blockmodeling
 
The rationale behind this model is to identify converging and
diverging patterns. The second part of Table 3 shows how three
groups of search engine variants (Cluster A, B and C) may
converge or diverge on different sets of websites (Cluster X, Y and
Z). I assume that a middle ground of websites exists: a set of
websites that is visible for all search engine variants
(i.e. Cluster Y). That is, Cluster A, B and C converge on Cluster Y
with high visibility scores, indicated by the dark blocks containing
strong ties. To account for any deviation
from the “converging” middle ground, I expect two blocks of
low-visibility cells (i.e. weak or no relationship), represented by the
two white cells in Table 3: one at the top-left and another at the
bottom-right. Both blocks indicate patterns of divergence,
or lack of convergence. If all search engine variants
converge on the same top visible websites, then there should be no
patterns of divergence. Using this scenario of complete
convergence as the null hypothesis (no difference in visibility
patterns), I expect some evidence of diverging effects to reject the
null hypothesis. If there is a significant number of websites in the
low-visibility blocks (one at the upper-left and another at the
lower-right corner), then the diverging patterns are identified accordingly.
4.3.2 Patterns of merging and diverging
Using the blockmodeling function provided by a social network
analysis tool called Pajek, I simplify the 9 by 100 cells of strong
versus weak ties into the three-by-three blockmodel shown in
Table 4. For each cell, the color represents strong (dark) or weak
(white) ties, and these cells are roughly partitioned into
three-by-three blocks, thereby clustering the nine search engine
variants into three groups and the 100 most visible websites into
three sets. It is not a perfect match: 87 cells out of 900
(9.67%) do not match the designed blockmodel. Given the
space limitation, only the top-20 websites are shown in full.
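The matching step can be illustrated with a minimal Python sketch. It is not Pajek's procedure, which also searches for the partition itself; it only counts, for a fixed partition, how many cells disagree with the expected three-by-three block image. The function name `count_mismatches`, the toy websites and variants, and their tie values are hypothetical and serve only as illustration. The last lines reproduce the reported share of 87 mismatching cells out of 900.

```python
# Expected block image: rows are website clusters (X, Y, Z), columns are
# search engine clusters (A, B, C).
# 1 = "complete" (strong ties expected), 0 = "null" (weak ties expected).
IDEAL_IMAGE = {
    ("X", "A"): 0, ("X", "B"): 1, ("X", "C"): 1,
    ("Y", "A"): 1, ("Y", "B"): 1, ("Y", "C"): 1,
    ("Z", "A"): 1, ("Z", "B"): 1, ("Z", "C"): 0,
}


def count_mismatches(ties, site_cluster, variant_cluster, image=IDEAL_IMAGE):
    """Count cells whose observed tie (1 strong, 0 weak) differs from the image."""
    mismatches = 0
    for (site, variant), observed in ties.items():
        expected = image[(site_cluster[site], variant_cluster[variant])]
        if observed != expected:
            mismatches += 1
    return mismatches


# Toy, made-up input (NOT taken from the dataset): two websites, two variants.
site_cluster = {"site_x": "X", "site_z": "Z"}
variant_cluster = {"variant_a": "A", "variant_c": "C"}
ties = {
    ("site_x", "variant_a"): 0,  # null block, weak tie observed: match
    ("site_x", "variant_c"): 1,  # complete block, strong tie observed: match
    ("site_z", "variant_a"): 1,  # complete block, strong tie observed: match
    ("site_z", "variant_c"): 1,  # null block, strong tie observed: mismatch
}
toy_mismatches = count_mismatches(ties, site_cluster, variant_cluster)

# Reported fit: 87 mismatching cells out of 9 variants x 100 websites.
reported_rate = 87 / (9 * 100)
print(toy_mismatches, f"{reported_rate:.2%}")
```

How ties are dichotomized into strong and weak is not restated in this section, so the sketch takes already binarized ties as input.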
As shown in Table 4, of the top 100 websites, 39 are
categorized into the first cluster of websites (Cluster X), 13 into
Cluster Y and 48 into Cluster Z. Among the top-20 most visible
websites, the converging set of websites (Cluster Y) is thin
(only one website). This website (people.com.cn) belongs to
People’s Daily, the official Chinese party organ.
Table 4 Blockmodeling outcome

Blockmodel (rows: website Clusters X, Y, Z; columns: search engine Clusters A, B, C):
Cluster X  weak    strong  strong
Cluster Y  strong  strong  strong
Cluster Z  strong  strong  weak

Columns by cluster: Cluster A = Baidu_CN, Yahoo_CN, Google_CN; Cluster B = Yahoo_SG, Google_SG, Google_TW, Google_HK; Cluster C = Yahoo_HK, Yahoo_TW.

Ranking  Websites (Aggregated)  Baidu_CN  Yahoo_CN  Google_CN  Yahoo_SG  Google_SG  Google_TW  Google_HK  Yahoo_HK  Yahoo_TW
Cluster X
1        zh.wikipedia.org          34.30     24.46     272.37    254.00     304.15     586.50     611.39    833.95    721.01
6        knowledge.yahoo.com        0.10      0.00       0.03     20.33       0.00       0.36       0.29     93.46    140.07
7        edu.tw                     0.46      0.06       5.14     21.07       7.21      64.29      21.14     30.61    102.98
9        youtube.com                0.29      0.00       8.39      4.96       9.04      68.63      66.03     45.20     19.00
13       facebook.com               0.29      0.00       1.93      8.31       2.26      19.00       8.96     88.33     33.61
18       epochtimes.com             0.00      0.00       1.05      3.93       2.23      33.05      27.34     34.57     36.62
… and 33 other websites (39 websites in total for this block)
Cluster Y
15       people.com.cn             14.54     20.97      23.19     11.43      23.82      18.14      16.00     17.81     13.39
… and 12 other websites (13 websites in total for this block)
Cluster Z
2        baike.baidu.com          661.93     72.44     410.28    508.05     433.81     125.52     174.04     39.10      4.88
3        hudong.com                 5.30    267.17     107.93    168.23     107.92      57.31      71.29      2.54      0.35
4        baidu.com                385.80     20.52      51.36    102.80      53.21       9.93      13.29      7.17      1.65
5        sina.com.cn               59.18     41.70      76.85     35.29      69.33      16.63      21.69      2.04      0.68
8        qq.com                    40.27     57.85      41.23     23.35      37.26      11.64      13.00      2.07      0.95
10       gov.cn                    25.46     43.03      38.94     34.84      32.29      15.61      20.30      5.29      3.57
11       sohu.com                  20.89     38.97      32.82     22.11      27.34       8.08      10.08      3.18      1.57
12       163.com                   25.59     32.31      34.68     14.56      31.51      10.00      10.78      2.52      0.87
14       youku.com                 42.04     36.38      29.12     15.31      19.34       8.41      10.32      1.03      0.64
16       blog.sina.com.cn          21.73      9.75      28.47     33.78      26.79      13.95      15.41      4.27      2.53
17       xinhuanet.com             26.13     11.50      27.18     19.31      27.71      20.06      21.02      1.70      0.40
19       ifeng.com                 25.67     16.70      25.13     10.12      24.39       9.67      11.86      4.20      2.56
20       baike.soso.com            11.08     29.16       7.60     63.30       5.93       1.05       1.31      0.29      0.04
… and 35 other websites (48 websites in total for this block)
These blockmodeling findings also help identify the merging and
diverging patterns of search engine variants. Cluster A contains
Baidu_CN, Yahoo_CN and Google_CN; Cluster B contains
Google_HK, Google_SG, Google_TW and Yahoo_SG; Cluster C
contains Yahoo_HK and Yahoo_TW. The cluster outcome shown
in Table 5 indicates patterns of both merging and diverging,
determined by the choice of search engine variant. Of the three
groups of search engine variants, two deviate from the rest.
The first group (Cluster A) contains the
search engine variants designed for mainland China (Baidu_CN,
Yahoo_CN and Google_CN), and the second group (Cluster C)
contains Yahoo Search for Taiwan and Hong Kong (Yahoo_HK
and Yahoo_TW). Thus, while the search engine variants in Cluster
B produce converging results for the top-100 websites, with
“complete” connection types to all clusters of websites, those in
Cluster A and those in Cluster C lead to diverging SERPs.
Table 5 Clusters identified by blockmodeling
4.4 Visualizing and unpacking findings
To show the visibility scores in a more intuitive manner,
a network visualization graph of the top-800 most visible websites
is shown in Figure 4. I visualize the nine search engine variants
(shown as the text boxes at the periphery) and the 800 most visible
websites (shown as nodes in the middle). Thus, the two-mode
network is presented in a way that indicates the overall likelihood for
a given search engine variant to recommend a website shown in the
middle. Pointing only from one search engine variant node to
one website node, each arrow represents a total visibility score
Table 3, first part (connection types):
           Cluster A  Cluster B  Cluster C
Cluster X  null       complete   complete
Cluster Y  complete   complete   complete
Cluster Z  complete   complete   null

Table 3, second part (patterns):
           Cluster A   Cluster B   Cluster C
Cluster X
Cluster Y  converging  converging  converging
Cluster Z
Table 5, search engine variants by cluster:
Cluster A: Baidu_CN, Google_CN, Yahoo_CN
Cluster B: Google_HK, Google_SG, Google_TW, Yahoo_SG
Cluster C: Yahoo_HK, Yahoo_TW

Websites   #   Cluster A  Cluster B  Cluster C
Cluster X  39  null       complete   complete
Cluster Y  13  complete   complete   complete
Cluster Z  48  complete   complete   null
contributed by a search engine variant to a website, with its arrow
width proportional to the visibility score: wider arrows
indicate higher visibility scores. Similarly, the area of a node
is proportional to the sum of visibility scores a website receives from
all search engine variants, allowing easy comparison of which
websites are more visible.
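The visual encoding described above can be sketched in Python as follows. The scores are the top-three rows of the aggregated visibility table; the scaling constants `MAX_WIDTH` and `AREA_PER_SCORE` are hypothetical drawing parameters, since the actual values used for Figure 4 are not stated. Because node area (not radius) is proportional to the summed score, the radius grows with the square root of the sum.

```python
import math

VARIANTS = ["Baidu_CN", "Google_CN", "Google_HK", "Google_SG", "Google_TW",
            "Yahoo_CN", "Yahoo_HK", "Yahoo_SG", "Yahoo_TW"]

# Visibility scores of the three most visible websites (aggregated table).
SCORES = {
    "zh.wikipedia.org": [34.30, 272.37, 611.39, 304.15, 586.50,
                         24.46, 833.95, 254.00, 721.01],
    "baike.baidu.com": [661.93, 410.28, 174.04, 433.81, 125.52,
                        72.44, 39.10, 508.05, 4.88],
    "hudong.com": [5.30, 107.93, 71.29, 107.92, 57.31,
                   267.17, 2.54, 168.23, 0.35],
}

# Two-mode edge list: every edge runs from a variant to a website.
edges = [(variant, site, score)
         for site, row in SCORES.items()
         for variant, score in zip(VARIANTS, row)]

MAX_WIDTH = 10.0       # hypothetical width of the widest arrow
AREA_PER_SCORE = 1.0   # hypothetical area units per visibility point

# Arrow width is proportional to the visibility score of the edge.
max_score = max(score for _, _, score in edges)
edge_width = {(variant, site): MAX_WIDTH * (score / max_score)
              for variant, site, score in edges}

# Node area is proportional to the summed score, so radius ~ sqrt(sum).
node_total = {site: sum(row) for site, row in SCORES.items()}
node_radius = {site: math.sqrt(AREA_PER_SCORE * total / math.pi)
               for site, total in node_total.items()}

for site, total in node_total.items():
    print(f"{site}: total {total:.2f}, radius {node_radius[site]:.2f}")
```

The summed scores reproduce the aggregated ranking order of the three encyclopedias.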
Note that the visibility scores are distributed quite unevenly and
thus only the top 20 are marked with their respective ranking
numbers. User-generated encyclopedias are the most visible
websites (node 1: Chinese Wikipedia, node 2: Baidu Baike, node
3: Hudong). In addition, Chinese Wikipedia (1) is highly visible in
almost all variants except Yahoo_CN and Baidu_CN, while
Baidu Baike (2) is highly visible in Baidu_CN, Google_CN and
Google_SG, and moderately so in Google_HK.
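As a worked check of this reading, the short Python sketch below compares the two encyclopedias variant by variant, using the scores from the aggregated visibility table. The variable names are illustrative only, and a simple "higher score leads" comparison is an assumption made for the sake of the example rather than part of the method.

```python
VARIANTS = ["Baidu_CN", "Google_CN", "Google_HK", "Google_SG", "Google_TW",
            "Yahoo_CN", "Yahoo_HK", "Yahoo_SG", "Yahoo_TW"]

# Visibility scores per variant, in the column order of VARIANTS.
WIKIPEDIA = [34.30, 272.37, 611.39, 304.15, 586.50,
             24.46, 833.95, 254.00, 721.01]
BAIDU_BAIKE = [661.93, 410.28, 174.04, 433.81, 125.52,
               72.44, 39.10, 508.05, 4.88]

# For each variant, record which encyclopedia has the higher score.
leader = {}
for variant, wiki, baike in zip(VARIANTS, WIKIPEDIA, BAIDU_BAIKE):
    leader[variant] = "zh.wikipedia.org" if wiki > baike else "baike.baidu.com"

wikipedia_led = [v for v in VARIANTS if leader[v] == "zh.wikipedia.org"]
baike_led = [v for v in VARIANTS if leader[v] == "baike.baidu.com"]

print("Chinese Wikipedia leads in:", ", ".join(wikipedia_led))
print("Baidu Baike leads in:", ", ".join(baike_led))
```

In these rows, Chinese Wikipedia leads in the Hong Kong and Taiwan variants of both Google and Yahoo, while Baidu Baike leads in the mainland Chinese and Singaporean variants.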
Based on the previous clustering results, two red dashed lines are also
drawn in Figure 4, roughly indicating three areas. Positioned in the
middle are the search engine variants in Cluster B, because of their
converging patterns of strong ties with most websites. The two red
dashed lines also show the search engine variants in Cluster A to the
left and those in Cluster C to the right, indicating diverging effects
because of the presence of weak ties. This explains why Cluster A
and Cluster C are shown adjacent to Cluster B, but not adjacent to
each other. This visualization is thus consistent with the findings
shown in Table 5.
The findings can also be unpacked depending on the specific search
engine variant. Based on the same method, an additional 500
queries, the Chinese names of the Fortune 500 companies, are added to the
selection of 2,500 search queries, producing a second dataset in
2012 (Liao, 2013a). The following paragraphs unpack this second
dataset for two search engine variants in mainland China:
Google_CN (see Table 6) and Baidu_CN (see Table 7).
The results for the top-20 websites for each category of search
queries for Google_CN, shown in Table 6, indicate that Baidu.com
ranks at the top in almost all categories. Wikipedia.org is a close second
for Google_CN, supporting the general observation that search
engines favour user-generated encyclopedias. These
findings also provide some counter-evidence against the idea that
Google as a specific company favours Wikipedia as a website,
because Google_CN actually favours Baidu Baike more than
Chinese Wikipedia, as clearly shown in Table 6.
The findings for Baidu_CN in Table 7 show even more dominance
by Baidu Baike: it ranks first in all seven categories, and its
proportion of visibility scores is much more concentrated
than in the results for Google_CN (see Table 6). In
addition, when considering the ranking position of hudong.com, the
findings seem to confirm the unfair competition accusation made
by Hudong’s CEO against Baidu (Yang, 2011). Depending on the
type of search queries, Hudong.com is ranked by Google_CN from
3rd to 9th (see Table 6). In contrast, Hudong Baike is not even
among the top-20 for many categories of the sampled queries for
Baidu Search. Indeed, if Google’s SERP can serve as an
independent third party for the competition between Baidu Baike
Figure 4. Delineating the boundaries of geo-linguistic settings based on SERPs.
Ranking  Websites (Aggregated)
1 zh.wikipedia.org
2 baike.baidu.com
3 hudong.com
4 baidu.com
5 sina.com.cn
6 knowledge.yahoo.com
7 edu.tw
8 qq.com
9 youtube.com
10 gov.cn
11 sohu.com
12 163.com
13 facebook.com
14 youku.com
15 people.com.cn
16 blog.sina.com.cn
17 xinhuanet.com
18 epochtimes.com
19 ifeng.com
20 baike.soso.com
and Hudong, Google does not make Hudong almost invisible as
Baidu does.
Hence if users from mainland China use Google Search instead of
Baidu Search, then Chinese Wikipedia will become as visible
as Baidu Baike for them.
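The comparison between Tables 6 and 7 can be made concrete with a small Python sketch. The figures are read directly from the two tables: the share of the top-ranked website in each of the seven category columns, and the rank of hudong.com in each column (`None` where it does not appear in the top 20). The columns are indexed in table order rather than by category name, and the mean top share is only one possible summary of concentration, chosen here as an assumption for illustration.

```python
from statistics import mean

# Share (%) of the top-ranked website in each of the seven columns.
GOOGLE_CN_TOP_SHARE = [47.65, 25.08, 36.44, 37.28, 28.98, 27.89, 27.99]
BAIDU_CN_TOP_SHARE = [75.74, 64.17, 73.28, 81.56, 57.53, 69.54, 61.90]

# Rank of hudong.com in each column; None = not among the top 20.
HUDONG_RANK_GOOGLE_CN = [3, 7, 6, 3, 3, 3, 9]
HUDONG_RANK_BAIDU_CN = [3, None, None, None, 20, 18, None]

mean_google = mean(GOOGLE_CN_TOP_SHARE)
mean_baidu = mean(BAIDU_CN_TOP_SHARE)

# Number of columns in which hudong.com is missing from the top 20.
absent_in_baidu = sum(rank is None for rank in HUDONG_RANK_BAIDU_CN)
absent_in_google = sum(rank is None for rank in HUDONG_RANK_GOOGLE_CN)

print(f"Mean top share: Google_CN {mean_google:.2f}%, Baidu_CN {mean_baidu:.2f}%")
print(f"hudong.com rank range on Google_CN: "
      f"{min(HUDONG_RANK_GOOGLE_CN)} to {max(HUDONG_RANK_GOOGLE_CN)}")
print(f"Columns without hudong.com in the top 20: "
      f"Google_CN {absent_in_google}, Baidu_CN {absent_in_baidu}")
```

On these figures the top-ranked website takes roughly twice the share on Baidu_CN that it takes on Google_CN, and hudong.com is missing from the top 20 in four of the seven Baidu_CN columns but in none of the Google_CN columns.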
5. DISCUSSION
By systematically analysing the SERPs collected across four major
Chinese-speaking regions, this study shows that patterns of merging
and diverging do exist. This is achieved by calculating visibility scores
as the equivalent of “social ties” between search engine variants on
one hand and top-ranking websites on the other. Both the network
visualization and the blockmodeling outcomes show that
geo-linguistic factors do make Chinese-language SERPs diverge on
certain websites, while converging on others. In particular, of the
nine search engine variants, the first group that diverges from the
rest contains the search engine variants designed for mainland China
(Baidu_CN, Yahoo_CN and Google_CN). The second group
contains Yahoo Search for Taiwan and Hong Kong (Yahoo_HK
and Yahoo_TW).
The findings suggest that the major online boundary in the Chinese
Internet is drawn first along the line of regional difference, with all
mainland Chinese search engine settings sharing similar SERPs
among themselves, but not with the others to the same degree, as
shown in Figure 4. Another boundary is drawn for Yahoo Taiwan
and Yahoo Hong Kong at the other end. It is relatively easy to
explain the latter result because Yahoo Search by default
prioritizes local content, with other geo-linguistic variant options
listed for users in the web interface: e.g. “search the
traditional Chinese-character-written web pages” or “search the
global websites”.
In contrast, it is relatively difficult to provide purely technical
explanations of why the three mainland Chinese
settings share so little with the other settings in terms of the
corresponding SERPs. It is likely that many of the websites that
are absent from the SERPs in the three mainland Chinese settings
are those that are not politically welcome in mainland China. Note
that the first two columns in Table 4 represent Baidu_CN and
Yahoo_CN, both of which consistently have weak ties with several
of the top 100 websites. These two search engine variants are also
the only two that filter SERPs for users in mainland China.
Note also that the third column in Table 4 represents Google_CN.
While it is clustered with Baidu_CN and Yahoo_CN, it has more
strong ties with the top 100 websites, suggesting that it produces less
divergent results.
The findings seem to suggest that users from mainland China, if
using only Baidu_CN and Yahoo_CN, will have a substantial
number of otherwise highly visible websites overlooked or even
missing from their daily search experiences. These include
websites such as YouTube and Facebook that have been reported
as blocked in mainland China. They also include the websites
of government and education institutions in Taiwan and Hong
Kong: gov.tw, gov.hk, edu.tw and edu.hk. In other words, the
Table 6 Results for Google_CN
Ranking
1 baidu.com 47.65% baidu.com 25.08% baidu.com 36.44% baidu.com 37.28% wikipedia.org 28.98% baidu.com 27.89% mbalib.com 27.99%
2 wikipedia.org 25.36% wikipedia.org 12.94% wikipedia.org 15.33% wikipedia.org 24.13% baidu.com 26.82% wikipedia.org 25.14% baidu.com 16.67%
3 hudong.com 8.74% sina.com.cn 12.06% sina.com.cn 9.46% hudong.com 11.00% hudong.com 7.66% hudong.com 9.63% fortunechina.com 13.65%
4 sina.com.cn 2.58% qq.com 6.67% douban.com 5.00% mbalib.com 3.55% sina.com.cn 7.10% sina.com.cn 7.17% wikipedia.org 8.74%
5 ifeng.com 2.03% 163.com 6.01% qq.com 4.45% sina.com.cn 3.18% xinhuanet.com 4.81% sohu.com 3.59% qq.com 4.09%
6 artxun.com 1.33% sohu.com 5.86% hudong.com 3.60% people.com.cn 2.66% people.com.cn 4.03% people.com.cn 3.34% qkankan.com 3.79%
7 soso.com 1.30% hudong.com 4.27% sohu.com 3.33% qq.com 2.64% qq.com 3.39% xinhuanet.com 2.95% sina.com.cn 3.62%
8 zdic.net 1.13% youku.com 4.26% youku.com 3.14% hc360.com 1.61% ifeng.com 3.30% youku.com 2.74% ifeng.com 3.59%
9 tiexue.net 1.07% xinhuanet.com 3.29% 163.com 3.09% sohu.com 1.50% 163.com 2.93% qq.com 2.45% hudong.com 3.41%
10 cncn.com 1.06% ifeng.com 2.78% mtime.com 2.14% 163.com 1.46% sohu.com 2.30% iciba.com 1.94% gold678.com 3.20%
11 xinhuanet.com 1.04% douban.com 2.47% youtube.com 1.77% hexun.com 1.44% weibo.com 1.71% 163.com 1.90% 163.com 2.43%
12 artx.cn 1.03% people.com.cn 2.31% 1ting.com 1.63% ifeng.com 1.43% youtube.com 1.55% ifeng.com 1.72% ciipp.com 1.29%
13 people.com.cn 0.96% hexun.com 1.85% weibo.com 1.58% studa.net 1.26% boxun.com 1.25% 360doc.com 1.50% sohu.com 1.15%
14 youku.com 0.84% huanqiu.com 1.59% m1905.com 1.56% 3edu.net 1.05% hexun.com 0.78% youtube.com 1.45% egouz.com 1.12%
15 163.com 0.83% youtube.com 1.57% iqiyi.com 1.50% 39.net 1.04% renren.com 0.62% sogou.com 1.25% bitauto.com 1.11%
16 sohu.com 0.73% yahoo.com 1.51% sogou.com 1.39% edu.cn 1.02% edu.tw 0.60% tianya.cn 1.12% people.com.cn 0.96%
17 qq.com 0.63% gov.tw 1.45% tudou.com 1.35% jrj.com.cn 1.00% china.com.cn 0.58% laonanren.com 1.03% zol.com.cn 0.93%
18 edu.tw 0.60% iqiyi.com 1.43% ifeng.com 1.27% chinaacc.com 0.97% libertytimes.com.tw 0.55% hexun.com 0.89% hexun.com 0.88%
19 edu.cn 0.54% weibo.com 1.32% xiami.com 1.07% xinhuanet.com 0.95% twitter.com 0.53% soso.com 0.81% yup.cn 0.72%
20 5156edu.com 0.54% tudou.com 1.27% pptv.com 0.91% youku.com 0.83% yahoo.com 0.52% cfdd.org.cn 0.76% google.cn 0.66%
Categories of search queries: Fortune500; The Cambridge Encyclopedia of China; Top 10 Search Terms (Google and Baidu); Best Film/Popular Music (China, Hong Kong, Taiwan); Modern Concepts (shared with modern Japanese); Notable People; Potentially sensitive terms
Table 7 Results for Baidu_CN
Ranking
1 baidu.com 75.74% baidu.com 64.17% baidu.com 73.28% baidu.com 81.56% baidu.com 57.53% baidu.com 69.54% baidu.com 61.90%
2 wikipedia.org 6.20% youku.com 4.79% youku.com 6.66% wikipedia.org 2.41% wikipedia.org 7.48% wikipedia.org 5.30% mbalib.com 7.62%
3 hudong.com 1.98% sina.com.cn 4.59% iqiyi.com 2.57% sina.com.cn 2.16% qq.com 6.12% sina.com.cn 3.38% fortunechina.com 7.13%
4 sina.com.cn 1.94% qq.com 4.13% douban.com 2.30% qq.com 2.05% sina.com.cn 5.00% qq.com 3.23% sina.com.cn 3.20%
5 youku.com 1.86% sohu.com 3.05% tudou.com 1.91% youku.com 1.59% ifeng.com 2.82% youku.com 2.17% ifeng.com 2.27%
6 soso.com 1.64% iqiyi.com 2.73% sina.com.cn 1.65% xinhuanet.com 1.14% people.com.cn 2.52% sohu.com 1.73% fx678.com 1.91%
7 qq.com 1.61% 163.com 2.32% weibo.com 1.61% www.gov.cn 1.10% sohu.com 2.46% xinhuanet.com 1.68% zol.com.cn 1.73%
8 ifeng.com 1.18% tudou.com 1.91% qq.com 1.55% edu.cn 0.89% xinhuanet.com 2.31% 163.com 1.50% wikipedia.org 1.73%
9 douban.com 1.13% xinhuanet.com 1.53% xunlei.com 1.48% ifeng.com 0.80% 163.com 1.84% tianya.cn 1.47% qq.com 1.60%
10 tiexue.net 0.89% douban.com 1.28% mtime.com 1.07% sohu.com 0.78% soso.com 1.68% people.com.cn 1.40% bitauto.com 1.53%
11 weather.com.cn 0.88% ifeng.com 1.24% letv.com 0.78% people.com.cn 0.74% weibo.com 1.52% hexun.com 1.27% 163.com 1.38%
12 edu.cn 0.61% renren.com 1.19% m1905.com 0.78% douban.com 0.60% uname.cn 1.40% soso.com 1.17% qkankan.com 1.37%
13 xilu.com 0.59% letv.com 1.14% 163.com 0.73% 163.com 0.59% renren.com 1.36% douban.com 1.04% gongchang.com 1.05%
14 xinhuanet.com 0.58% weibo.com 0.97% verycd.com 0.68% rayli.com.cn 0.59% kaixin001.com 1.32% tudou.com 0.89% ticarefree.cn 1.04%
15 163.com 0.58% wikipedia.org 0.97% sohu.com 0.55% hao123.com 0.57% douban.com 0.97% bitauto.com 0.88% soso.com 0.86%
16 guoxue.com 0.57% zol.com.cn 0.93% 1ting.com 0.53% jrj.com.cn 0.50% youku.com 0.85% ifeng.com 0.73% yingjiesheng.com 0.83%
17 360buy.com 0.52% xunlei.com 0.80% pptv.com 0.50% huanqiu.com 0.49% 360buy.com 0.78% sensagent.com 0.70% autohome.com.cn 0.74%
18 qidian.com 0.51% taobao.com 0.80% ku6.com 0.48% iqiyi.com 0.48% www.gov.cn 0.73% hudong.com 0.66% xgo.com.cn 0.73%
19 tudou.com 0.51% huanqiu.com 0.74% yinyuetai.com 0.48% bankcomm.com 0.47% edu.cn 0.73% yangbihu.com 0.65% eastmoney.com 0.70%
20 sohu.com 0.50% 4399.com 0.71% wikipedia.org 0.42% chinaacc.com 0.46% hudong.com 0.58% tiexue.net 0.61% people.com.cn 0.68%
Categories of search queries: Fortune500; The Cambridge Encyclopedia of China; Top 10 Search Terms (Google and Baidu); Best Film/Popular Music (China, Hong Kong, Taiwan); Modern Concepts (shared with modern Japanese); Notable People; Potentially sensitive terms
SERPs of the three mainland Chinese variants seem to diverge from
these websites. In contrast, the websites of government and
education institutions in mainland China, gov.cn and edu.cn, are
still relatively visible for almost all other search engine variants
except for the by-default-local Yahoo_TW and Yahoo_HK. Thus,
the patterns of merging and diverging seem to reflect the cultural
political complications of the Chinese-language internet. While the
offline boundary between Hong Kong and Taiwan seems to be
overcome, that between mainland China and Hong Kong seems to
be reinforced. Although the SERP data may not perfectly reflect
what users actually read and click, it nonetheless indicates a general
probabilistic tendency substantiated by industry data.
6. CONCLUSION
The findings, visualized and analysed using network analysis
techniques, clearly indicate strong localization effects on the
gatekeeping function of search engines, based on data covering
over 97% of the search engine market in four Chinese-speaking
regions. The findings also show that major user-generated
encyclopedias such as Baidu Baike and Chinese Wikipedia do
dominate the SERPs with high rankings and visibility scores.
Because the geo-linguistic factors coincide with the different
cultural political situations of these Chinese-speaking regions,
different localization variants produce divergent outcomes of
high-ranking encyclopedias and other websites, thereby indicating strong
effects of “network gatekeeping” by search engines in exercising
the gatekeeping bases of “display” and “localization” (Barzilai-Nahon,
2008).
In addition, by examining the overall patterns of SERPs, I have
demonstrated the merging and diverging effects contributed by the
factors of search engine providers and regional and language
settings. Different combinations of such provider and geo-linguistic
information lead to different “search engine variants”. Nine major
search engine variants, covering four regions with
Chinese-speaking majority populations, are identified for the
Chinese-language internet. For a selected set of search queries covering
major Chinese cultural and political topics, I have found that the
SERPs converge on a specific type of website (i.e. user-generated
encyclopedias) and that some search engine variants converge more
on Baidu Baike while others converge more on Chinese Wikipedia. The merging
and diverging patterns are further analysed by both network
visualization and network analysis (blockmodeling analysis of
two-mode networks). Different patterns indicate that both
“nationalization” of a specific kind (i.e. mainland China) and
“trans-nationalization” (i.e. Hong Kong and Taiwan) can be
achieved by different gatekeeping options offered by various search
engine variants.
The results show that the SERPs are more likely to converge based
on similar geo-linguistic preferences. For example, the SERPs
diverge the most when users choose different Chinese characters
(i.e. simplified Chinese versus traditional Chinese). It is then
particularly intriguing that all Hong Kong variant results converge
more with Taiwanese variant ones and much less so with mainland
Chinese variants, even though Hong Kong is much closer to mainland
China geographically, politically and administratively. In addition,
Chinese Wikipedia is much more visible in these regions than in
mainland China. Though the findings here cannot
separate the geo-linguistic factors from the cultural political ones,
the converging and diverging patterns alone are important findings
for Chinese-internet research and Wikipedia research.
There are of course obvious limitations to the findings presented
above. First, the selection of search queries, while significantly larger
than in previous social scientific research on Chinese-language search
engines (Jiang & Akhtar, 2011), is still limited. Second, due to
limitation of space, this paper has not yet fully unpacked the
different findings for different categories of search queries. Third,
only standard Mandarin Chinese terms are used for this research,
overlooking other possibilities such as written Cantonese queries (Chau,
Fang, & Yang, 2007). Fourth, and not least, only the default setting for
each localized search engine is analysed.
While the dataset presented may be limited in the scope of selected
search queries, time and search engine variants, I have
demonstrated the usefulness and viability of examining the merging
and diverging patterns produced by the search engine variants, each
of which corresponds to a segment of the search engine market. For
instance, it can help online linguistics research by analysing
different SERP outcomes for regions that use a shared writing
system but with regional variants, such as the difference between
Egyptian Arabic and Maghrebi Arabic. As another example, these
geo-linguistic factors can be said to constitute one of the most
important online “situations” for online media, as defined by
medium theorists in the tradition of media ecology (Meyrowitz,
1986, 1994), because these factors set the patterns of access.
According to a statistical report by the Data Center of China
Internet, during the first half of 2010, the content produced by
amateur Chinese Internet users surpassed that produced by
professional websites (Liao, 2013b; Qiang, 2010). Thus
user-generated content by Chinese Internet users is expected to have
influenced user-generated encyclopedias directly and SERPs
indirectly. While this study has not yet addressed the relationship
among search engines, user-generated content and user-generated
encyclopedias, the findings here seem to suggest similar
geographic and linguistic dynamics. The clear outcome of
“network gatekeeping”, identified by Chinese search engine
variants and their respective preferred encyclopedias, may point to
a larger online context for Chinese Internet users across regions.
For future research, it will be useful to examine how geographic
and linguistic factors may influence the network gatekeeping
processes inside user-generated encyclopedias (Liao, 2009). It is
likely that they also exercise the gatekeeping bases of “display” and
“localization” as search engines do.
The overall method can be systematically extended to other
contexts. Various search engine variants can be chosen for research
on almost all the other languages in the world, including languages
with transnational adoption such as Arabic, Hindi, Tamil, English,
Spanish, Portuguese, etc. Researchers can thus further interpret the
merging and diverging SERP outcomes for research questions that
are relevant for global, transnational or inter-cultural
communications on one hand, and another set of questions for
human-computer interaction and information systems on the other.
Also, the focus on examining geo-linguistic factors as important
variables for understanding search engines can contribute to the
development of geo-linguistic analysis of the Web (Liao & Petzold,
2011; Petzold & Liao, 2011). It can also be adopted for market and
industry applications when geo-linguistic identifiers are central
(DePalma, 2002; Dunne, 2006).
In conclusion, the proposed method has potential for a wider
range of market and academic applications. The theoretical
implication may be extended to other websites or information
systems that produce or curate different outcomes based on the
geographic and linguistic preferences (or configurations) of users.
It highlights the role of geo-linguistic parameters as media “access
codes”, or set patterns of access to information as articulated by
medium theorists for TV research (Meyrowitz, 1986, 1994), or the
“network gatekeeping process” theorized by new information
science theory (Barzilai-Nahon, 2008). Localization has become
the new medium that has the (higher-level) messages of cultural
political integration, reintegration or fragmentation of users.
7. ACKNOWLEDGMENTS
This work was supported by the Taiwan National Science
Council’s Taiwan Merit Scholarships Program (NSC-095-SAF-I-
564-028-TMS) and supported in part by the Oxford Internet
Institute Scholarship. Special thanks to Ralph Schroeder, Bernie
Hogan, Scott Hales and Min Jiang for their advice and support.
8. REFERENCES
Aragón, P., Kaltenbrunner, A., Laniado, D., & Volkovich, Y.
(2012). Biographical Social Networks on Wikipedia - A cross-
cultural study of links that made history. In Proceedings of
WikiSym 2012. Retrieved from http://arxiv.org/abs/1204.3799
Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., & Gergle, D.
(2012). Omnipedia: bridging the Wikipedia language gap. In
Proceedings of the 2012 ACM annual conference on Human
Factors in Computing Systems (pp. 1075–1084). Retrieved from
http://dl.acm.org/citation.cfm?id=2208553
Bar‐Ilan, J. (2006). Web links and search engine ranking: The case
of Google and the query “jew.” Journal of the American Society
for Information Science and Technology, 57(12), 1581–1589.
doi:10.1002/asi.20404
Barzilai-Nahon, K. (2008). Toward a theory of network
gatekeeping: A framework for exploring information control.
Journal of the American Society for Information Science and
Technology, 59(9), 1493–1512. doi:10.1002/asi.20857
Battelle, J. (2005). The Search: How Google and Its Rivals Rewrote
the Rules of Business and Transformed Our Culture (First
Edition.). Portfolio Hardcover.
BBC. (2011, March 31). Google’s China exit “exaggerated.” BBC.
Retrieved from http://www.bbc.co.uk/news/business-12917322
Benkler, Y. (2006). The Wealth of Networks: How Social
Production Transforms Markets and Freedom. New Haven and
London: Yale University Press. Retrieved from
http://www.congo-education.net/wealth-of-networks/
Bermejo, F. (2009). Audience manufacture in historical
perspective: from broadcasting to Google. New Media &
Society, 11(1-2), 133–154. doi:10.1177/1461444808099579
Brettel, M., & Spilker-Attig, A. (2010). Online advertising
effectiveness: a cross-cultural comparison. Journal of Research
in Interactive Marketing, 4(3), 176–196.
doi:10.1108/17505931011070569
Charlton, G. (2012, February 13). Why Wikipedia is top on Google:
the SEO truth no-one wants to hear. Econsultancy: Digital
Marketers United. Retrieved from
http://econsultancy.com/blog/9009-why-wikipedia-is-top-on-
google-the-seo-truth-no-one-wants-to-
hear?utm_campaign=bloglikes&utm_medium=socialnetwork&
utm_source=facebook
Chau, M., Fang, X., & Yang, C. C. (2007). Web searching in
Chinese: A study of a search engine in Hong Kong. Journal of
the American Society for Information Science and Technology,
58(7), 1044–1054. doi:10.1002/asi.20592
Chen, J. (2008). Essays on auction mechanisms and resource
allocation in keyword advertising (The University of Texas at
Austin). ProQuest.
CIC. (2009). China Search Engine Market Report 2009. Beijing,
China: China IntelliConsulting Corporation. Retrieved from
http://tech.sina.com.cn/z/2009ssdc/index.shtml
CNNIC. (2006, September 16). Chinese Search Engine Market
Survey Report 2006. China Internet Network Information
Center. Retrieved November 19, 2011, from
http://xtlv.cn/html/Dir/2006/11/06/4216.htm
CNNIC. (2007, September 26). 2007 Survey Report on Search
Engine Market in China. China Internet Network Information
Center. Retrieved November 19, 2011, from
http://www.cnnic.cn/html/Dir/2007/10/10/4838.htm
CNNIC. (2009, March 5). China Search Engine Report 2008
Advertisers and Users Behavior Study. (中国搜索引擎市场广
告主与用户行为研究报告). Retrieved November 19, 2011,
from http://www.cnnic.cn/html/Dir/2009/03/05/5483.htm
Couvering, E. V. (2004). New Media? The Political Economy of
Internet Search Engines. Presented at the International
Association of Media & Communications Researchers, Porto
Alegre, Brazil. Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.
1900
Couvering, E. V. (2008). The History of the Internet Search Engine:
Navigational Media and the Traffic Commodity. In A. Spink &
M. Zimmer (Eds.), Web Search (Vol. 14, pp. 177–206). Berlin,
Heidelberg: Springer Berlin Heidelberg. Retrieved from
http://www.springerlink.com/content/xn75781g305j756h/
Čuhalev, J. (2006). Ranking of Wikipedia articles on search
engines for searches about its own articles (Seminar Task for
Internet Search Techniques and Business Intelligence class) (p.
7). Retrieved from
http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-
wikipedia-in-your-google-searches/
Dahlberg, L. (2005). The Corporate Colonization of Online
Attention and the Marginalization of Critical Communication?
Journal of Communication Inquiry, 29(2), 160–180.
doi:10.1177/0196859904272745
Damm, J. (2007). The Internet and the fragmentation of Chinese
society. Critical Asian Studies, 39, 273–294.
doi:10.1080/14672710701339485
DePalma, D. A. (2002). Internationalization and Localization. In
Business without borders: a strategic guide to global marketing.
New York: John Wiley and Sons.
Doreian, P., Batagelj, V., & Ferligoj, A. (2004). Generalized
blockmodeling of two-mode network data. Social Networks,
26(1), 29–53. doi:10.1016/j.socnet.2004.01.002
Dunleavy, P., Margetts, H., Bastow, S., Pearce, O., & Tinkler, J.
(2007). Government on the internet: progress in delivering
information and services online. UK: National Audit Office.
Retrieved from
http://www.nao.org.uk/publications/nao_reports/06-
07/0607529.pdf
Dunne, K. J. (2006). Perspectives on Localization. John Benjamins
Publishing Company.
Dutta, S., Dutton, W. H., & Law, G. (2011). The New Internet
World: A Global Perspective on Freedom of Expression,
Privacy, Trust and Security Online. SSRN eLibrary. Retrieved
from
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1810005
Einhorn, B. S., Bruce. (2010, November 11). How Baidu Won
China. BusinessWeek: Online Magazine. Retrieved from
http://www.businessweek.com/magazine/content/10_47/b4204
060242597_page_6.htm
Enquiro. (2007, June 15). Chinese Eye Tracking Study: Baidu Vs
Google. Retrieved July 9, 2009, from
http://searchengineland.com/chinese-eye-tracking-study-baidu-
vs-google-11477
Fallows, D. (2008). Search Engine Use. Pew Research Center’s
Internet & American Life Project. Retrieved November 19,
2011, from http://www.pewinternet.org/Reports/2008/Search-
Engine-Use.aspx
Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using
collaborative filtering to weave an information tapestry.
Commun. ACM, 35(12), 61–70. doi:10.1145/138859.138867
Gray, M. (2007, May). Google Love Affair with Wikipedia -
Graywolf’s SEO Blog. Graywolf’s SEO Blog. Retrieved
December 2, 2011, from http://www.wolf-
howl.com/google/google-love-affair-with-wikipedia/
Hargittai, E. (2007). The Social, Political, Economic, and Cultural
Dimensions of Search Engines: An Introduction. Journal of
Computer‐Mediated Communication, 12(3), 769–777.
Hearne, R. (2006, August 12). SERP Click Through Rate of Google
Search Results – AOL-data.tgz – Want to Know How Many
Clicks The #1 Google Position Gets? Red Cardinal. Retrieved
December 2, 2011, from http://www.redcardinal.ie/search-
engine-optimisation/12-08-2006/clickthrough-analysis-of-aol-
datatgz/
Hecht, B., & Gergle, D. (2010). The tower of Babel meets web 2.0:
user-generated content and its applications in a multilingual
context. In Proceedings of the 28th international conference on
Human factors in computing systems (pp. 291–300). Retrieved
from http://dl.acm.org/citation.cfm?id=1753370
Hopkins, H. (2009, January 23). Britannica 2.0: Wikipedia Gets
97% of Encyclopedia Visits. Hitwise Intelligence: Analyst
Weblog. Retrieved from http://weblogs.hitwise.com/us-heather-
hopkins/2009/01/britannica_20_wikipedia_gets_9.html
Hussain, S., & Mohan, R. (2008). Localization in Asia Pacific. In
F. Librero & P. B. Arinto (Eds.), Digital Review of Asia Pacific
2007/2008. Orbicom and the International Development
Research Centre (IDRC). Retrieved from
http://www.idrc.ca/openebooks/377-5/
IDATE. (2011). World Internet Usage & Markets. IDATE
Consulting and Research. Retrieved from
http://www.idate.org/en/Research-store/Collection/Market-
Data-Reports_23/World-Internet-Usage-Markets_584.html
Jansen, B. J., Brown, A., & Resnick, M. (2007). Factors relating to
the decision to click on a sponsored link. Decision Support
Systems, 44(1), 46–59. doi:10.1016/j.dss.2007.02.009
Jansen, B. J., & Mullen, T. (2008). Sponsored search: an overview
of the concept, history, and technology. Int. J. Electronic
Business, 6(2), 114–131.
Jansen, J. (2011). Understanding Sponsored Search: Core
Elements of Keyword Advertising. Cambridge University Press.
Jiang, M., & Akhtar, A. (2011). Peer into the Black Box of Chinese
Search Engines: A Comparative Study of Baidu, Google, and
Goso. Presented at the 9th Chinese Internet Research
Conference (CIRC 2011), Washington, D.C.: Institute for the
Study of Diplomacy. Georgetown University.
Jones, R. (2007, June 26). 96.6% of Wikipedia Pages Rank in
Google’s Top 10. The Google Cache: Search Engine Marketing,
SEO & PPC. Retrieved December 2, 2011, from
http://www.thegooglecache.com/white-hat-seo/966-of-
wikipedia-pages-rank-in-googles-top-10/
Jucquois-Delpierre, M. (2007). Fictional reality or real fiction: how
can one decide?: The strengths and weaknesses of information
science concepts and methods in the media world. Journal of
Information, Communication & Ethics in Society, 5(2/3), 235–
252. doi:10.1080/14616700306488
Jung, G. (2008). The Increasing Relevance of Online Marketing.
GRIN Verlag.
Khanna, A. (2011, October 26). Google drives traffic to Wikipedia,
but half of readers look for Wikipedia content — Wikimedia
blog. Wikimedia Foundation: Global blog. Official blog.
Retrieved from http://blog.wikimedia.org/2011/10/26/search-
and-wikipedia/
Liao, H.-T. (2008). A webometric comparison of Chinese
Wikipedia and Baidu Baike and its implications for
understanding the Chinese-speaking Internet. In 9th annual
Internet Research Conference: Rethinking Community,
Rethinking Place. Copenhagen.
Liao, H.-T. (2009). Conflict and Consensus in the Chinese version
of Wikipedia. IEEE Technology and Society Magazine, 28(2),
49–56. doi:10.1109/MTS.2009.932799
Liao, H.-T. (2011). Needing to Have a Voice: Linguistic Grouping
in the Digital Networked Environment (ISD Working Papers in
New Diplomacy). Washington, D.C.: Institute for the Study of
Diplomacy. Georgetown University. Retrieved from
http://isd.georgetown.edu/files/Needing%20to%20Have%20a
%20Voice.pdf
Liao, H.-T. (2013a). How does Chinese localization influence
online visibility? A study on Chinese-language Search Engine
Result Pages (SERPs). (Accepted). To be presented at the 11th
Annual Chinese Internet Research Conference (CIRC 2013),
Oxford, UK.
Liao, H.-T. (2013b). “Online Encyclopedia” (网上/网络百科全书),
“User Generated Content” (用户生成内容). In L. Cheng (Ed.),
The Internet in China: An Encyclopedic Handbook of
Online Business, Information Distribution, and Social
Connectivity. Berkshire Publishing.
Liao, H.-T., & Petzold, T. (2011). Analysing geo-linguistic
dynamics of the World Wide Web: The use of cartograms and
network analysis to understand linguistic development in
Wikipedia. Cultural Science, 3(2).
Luyt, B., Goh, D., & Lee, C. S. (2009). Searching locally: a
comparison of Yehey! and Google. Online Information Review,
33(3), 499–510.
Malaga, R. A. (2008). Worst practices in search engine
optimization. Commun. ACM, 51(12), 147–150.
doi:10.1145/1409360.1409388
Margetts, H. Z., & Escher, T. (2006). Governing from the Centre?
Comparing the Nodality of Digital Governments. SSRN
eLibrary. Retrieved from
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1755762
Massa, P., & Scrinzi, F. (2012). Manypedia: Comparing Language
Points of View of Wikipedia Communities. In Proceedings of
WikiSym 2012. Retrieved from
http://orga.wikisym.org/ws2012/bin/download/Main/Program/
p13wikisym2012.pdf
Mazieres, A., & Huron, S. (2013). Toward Google Borders.
Presented at the Web Science. Retrieved from
http://hal.inria.fr/hal-00805048
McKenna, M. G., & Naftulin, H. (2000). Challenges in the
multicultural HCI development environment. In CHI ’00
extended abstracts on Human factors in computing systems (pp.
362–362). New York, NY, USA: ACM.
doi:10.1145/633292.633509
Meyrowitz, J. (1986). No sense of place: The impact of electronic
media on social behavior. New York; Oxford: Oxford
University Press.
Meyrowitz, J. (1994). Medium theory. In D. Crowley & D. Mitchell
(Eds.), Communication Theory Today. Stanford University
Press.
Morris, M., & Ogan, C. (2002). The Internet as Mass Medium. In
D. McQuail (Ed.), McQuail’s reader in mass communication
theory (pp. 134–145). London: SAGE.
Nguyen, C. (2011, March). Search Engine Market share by country.
Chandler Nguyen Digital Marketing Blog. Retrieved December
1, 2011, from http://www.chandlernguyen.com/2011/03/search-
engine-market-share-by-country-mar-2011.html
Nielsen Online. (2008). Wikipedia U.S. Web Traffic Grows 8,000
Percent In Five Years, Driven By Search. New York: Nielsen
Online. Retrieved from
http://news.softpedia.com/news/Wikipedia-Traffic-Mostly-
from-Google-85703.shtml
Petzold, T., & Liao, H.-T. (2011). Geo-linguistic analysis of the
World Wide Web: The use of cartograms and network analysis
to understand linguistic development in Wikipedia. In D. Araya,
Y. Breindl, & T. J. Houghton (Eds.), Nexus: New Intersections
in Internet Research (pp. 55–75). New York: Peter Lang.
Petzold, T., Liao, H.-T., Hartley, J., & Potts, J. (2012). A world map
of knowledge in the making: Wikipedia’s inter-language linkage
as a dependency explorer of global knowledge accumulation.
Leonardo: Art, Science and Technology, 45(3), 284–284.
doi:10.1162/LEON_a_00376
PricewaterhouseCoopers. (2011). IAB Internet Advertising
Revenue Report. New York; DC: The Interactive Advertising
Bureau. Retrieved from http://www.iab.net/AdRevenueReport
Qiang, X. (2010, July 23). User-generated content online now
50.7% of total. China Daily. Beijing. Retrieved from
http://www.chinadaily.com.cn/business/2010-
07/23/content_11042851.htm
Rogers, R., & Sendijarevic, E. (2012). Neutral or National Point of
View? A Comparison of Srebrenica articles across Wikipedia’s
language versions. In Wikipedia Academy: Research and Free
Knowledge (#wpac2012). Berlin. Retrieved from
http://wikipedia-
academy.de/2012/w/images/8/89/3_Paper_Richard_Rogers_E
mina_Sendijarevic.pdf
Russell, J. (2011). Why Yahoo! –not Google– rules Taiwan’s
webspace. Asian Correspondent. Retrieved December 1, 2011,
from http://asiancorrespondent.com/55695/focus-on-taiwan-
where-yahoo-not-google-rules-the-countrys-webspace/
Segev, E. (2008). Search Engines and Power: A Politics of Online
(Mis-)Information. Webology, 5(2). Retrieved November 19, 2011, from
http://www.webology.org/2008/v5n2/a54.html
SEMPO. (2011). SEMPO State of Search Marketing Report 2011.
SEMPO Institute. Retrieved from
http://econsultancy.com/uk/reports/sempo-state-of-search
Silverwood-Cope, S. (2012, February 8). Wikipedia: Page one of
Google UK for 99% of searches. Intelligent Positioning Blog.
Retrieved from
http://www.intelligentpositioning.com/blog/2012/02/wikipedia
-page-one-of-google-uk-for-99-of-searches/
Slingshot SEO. (2011). Google & Bing Click-Through Rates
(White paper). Retrieved from
http://www.slingshotseo.com/resources/white-papers/google-
ctr-study/
Spindler, S. (2010). Online Marketing: How to Increase
International Sales with Search Engine Optimisation. GRIN
Verlag.
StatCounter. (2011). Top 5 Search Engines in China/Hong
Kong/Singapore/Taiwan from Nov 2010 to Nov 2011.
StatCounter Global Stats. Retrieved December 1, 2011, from
http://gs.statcounter.com/#search_engine-CN-monthly-
201011-201111
Sunstein, C. R. (2002). Fragmentation and Cybercascades. In
Republic.Com. Princeton University Press.
The Cambridge encyclopedia of China (2nd ed.). (1991).
Cambridge, England; New York: Cambridge University Press.
Goldsmith, J., & Wu, T. (2006). Who Controls the Internet?
Illusions of a Borderless World. Oxford University
Press.
Varian, H. R. (2007). The Economics of Internet Search. Presented
at the Angelo Costa lecture, Rome. Retrieved from
http://people.ischool.berkeley.edu/~hal/Papers/2007/costa-
lecture.pdf
Vaughan, L., & Thelwall, M. (2004). Search engine coverage bias:
evidence and possible causes. Information Processing &
Management, 40(4), 693–707.
Vaughan, L., & Zhang, Y. (2007). Equal Representation by Search
Engines? A Comparison of Websites across Countries and
Domains. Journal of Computer-Mediated Communication,
12(3). Retrieved from
http://jcmc.indiana.edu/vol12/issue3/vaughan.html
Warncke-Wang, M., Uduwage, A., Dong, Z., & Riedl, J. (2012). In
Search of the Ur-Wikipedia: Universality, Similarity, and
Translation in the Wikipedia Inter-language Link Network.
Retrieved from
http://www.grouplens.org/system/files/p3wikisym2012.pdf
Yang, Y. (2011, February 25). China’s “Wikipedia” Submits
Complaint about Baidu. Economic Observer News, 508, 28.
Young, R. D. (2011, August 10). Top Google Ranking Captures
18.2% of Clicks. Search Engine Watch (#SEW). Retrieved
December 2, 2011, from
http://searchenginewatch.com/article/2100616/Top-Google-
Ranking-Captures-18.2-of-Clicks-Study
Zhao, S., & Baldauf, R. B. J. (2007). Planning Chinese Characters:
Reaction, Evolution or Revolution? Springer.

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Último (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

[Wikisym2013] serp revised_apa_notice

Only the abstract here is included in the proceedings of the WikiSym + OpenSym 2013 Conference (wsos2013). The full text is a work-in-progress draft, revised based on blind-review comments and suggestions. Please contact the author for the latest citation for this research.

How does localization influence online visibility of user-generated encyclopedias? A study on Chinese-language Search Engine Result Pages (SERPs)

Han-Teng Liao
Oxford Internet Institute, University of Oxford, Oxford, United Kingdom
hanteng@gmail.com

ABSTRACT
Prior empirical and theoretical work has discussed the role that dominant search engines play in information gatekeeping on the Web, and there are reports on the high ranking of the Wikipedia website in search engine result pages (SERPs). However, little research has been conducted on non-Google search engines and non-English versions of user-generated encyclopedias. This paper proposes a method to quantify the “display” gatekeeping differences of SERP rankings and presents findings based on Chinese SERP data. Based on 2,500 mainly Chinese-language search queries, the data set includes the SERP outcomes of four Chinese-speaking regions (mainland China, Singapore, Hong Kong and Taiwan) provided by three major search engines (Baidu, Google and Yahoo), covering over 97% of the search engine market in each region. The findings, analysed and visualized using network analysis techniques, demonstrate the following: major user-generated encyclopedias are among the most visible websites, and localization factors matter (certain search engine variants produce the most divergent outcomes, especially the mainland Chinese ones). The strong effects of “network gatekeeping” by search engines indicated here also suggest similar dynamics inside user-generated encyclopedias.
Categories and Subject Descriptors
[Human-centered computing]: Collaborative and social computing – Collaborative filtering, Wikis, Empirical studies in collaborative and social computing
[Information Systems]: Web search engines – Collaborative filtering, Page and site ranking

General Terms
Management, Performance, Design, Human Factors, Theory

Keywords
Geo-linguistic analysis, network analysis, Network gatekeeping, Chinese Internet, Chinese characters, Localization, censorship.

1. INTRODUCTION
Using search engines is among the most popular online activities for users in the US (Fallows, 2008) and mainland China (CIC, 2009; CNNIC, 2009), and has been among the driving forces of the fast-growing online advertising platform (Varian, 2007; SEMPO, 2011; IDATE, 2011; PricewaterhouseCoopers, 2011). It has been reported (with speculation as to why) that Google, the global leader among search engines, has consistently favoured Wikipedia, the global leader among user-generated encyclopedias, by showing relevant pages frequently and prominently in the search engine result pages (hereafter SERP) (Charlton, 2012; Čuhalev, 2006; Gray, 2007; Jones, 2007; Silverwood-Cope, 2012). Independent market research by Nielsen Online and Hitwise Intelligence has demonstrated that Wikipedia not only dominates online visits for encyclopedia content, but also does so mainly because of the traffic directed by major Web search engines (Hopkins, 2009; Nielsen Online, 2008). Even the Wikimedia Foundation acknowledged that Google drives traffic to Wikipedia, but nonetheless argued that half of its readers did want to look for Wikipedia content (Khanna, 2011). Thus, as major websites that dominate traffic and user attention, Google and Wikipedia seem to be central in guiding users where to look.
However, most of the findings and discussions are limited to, or predominantly focused on, the English-language context (Battelle, 2005; Bermejo, 2009; Couvering, 2004, 2008; Dahlberg, 2005; Hargittai, 2007; Segev, 2008), and little effort has been made to understand whether such a phenomenon is specific to Google/Wikipedia or can be found for other major search engines and user-generated encyclopedias. In addition, the multilingual internet and the rise of non-English users on the Web have multiple implications for the “localization” effects on search engines. Localization (hereafter L10n), a process of adapting computer software or information systems for a group of users usually defined by national boundaries or geo-linguistic profiles (Hussain & Mohan, 2008; Liao, 2011; McKenna & Naftulin, 2000), is expected to influence users’ information-seeking practices. Both Google and Wikipedia provide localized content and interfaces designed to serve different groups of users.

Because Google (or other general-purpose search engines), Wikipedia (or other user-generated encyclopedias) and localization are likely to present and thus frame the Web differently for different groups of users, they effectively filter information for them. While such filtering can be described as gatekeeping by communication scholars, the fact that Web users can directly or indirectly participate in such information filtering processes has introduced techniques and theories of “collaborative filtering” (Benkler, 2006; Goldberg, Nichols, Oki, & Terry, 1992) and “network gatekeeping” (Barzilai-Nahon, 2008). Indeed, while Google and Wikipedia may concentrate Web traffic and command user attention as major global websites, users’ contribution of web content and links may also influence such filtering and gatekeeping outcomes, as demonstrated by the case of the Google query “Jew” (Bar Ilan, 2006): some users organized to help Wikipedia’s entry page for “Jew” rank higher in Google’s English-language SERPs. Thus, although both “collaborative filtering” (Benkler, 2006; Goldberg et al., 1992) and “network gatekeeping” (Barzilai-Nahon, 2008) are indeed about filtering and keeping information, the possibility of participation through user input makes them different from the filtering and gatekeeping processes in traditional media.

WikiSym '13, August 05 - 07 2013, Hong Kong, China. Copyright 2013 ACM 978-1-4503-1852-5/13/08 ...$15.00.

Nonetheless, I argue that geographic and linguistic factors may bound or limit such collaborative and networking possibilities, thus reintroducing national and/or linguistic boundaries on the Web. Indeed, as early as the early 2000s, researchers such as Zittrain and Sunstein raised the issue of localized search results filtering political content or fragmenting the public sphere (Morris & Ogan, 2002; Sunstein, 2002). For SERPs, the question of information control and linguistic boundaries remains, while the “borders” of national frameworks have been reintroduced in many aspects of technological and legal arrangements (University & School, 2006). In particular, Google’s initial collaboration with (or accommodation of) the Chinese government’s needs and its later exit from mainland China have demonstrated the intricate political and cultural dimensions of the “localization” of search engine services (Vaughan & Zhang, 2007; Einhorn, 2010). Thus, the research gap on the effects of localization on SERPs and non-English Wikipedia needs to be filled, including the prominent cases of Chinese-language and Arabic-language internet users, whose recent presence and participation in the new internet world have also attracted much attention (Dutta, Dutton, & Law, 2011).
In particular, in order to answer how search engines and/or user-generated encyclopedias reintroduce or shape national or social boundaries, more empirical work on L10n effects is needed (Aragón, Kaltenbrunner, Laniado, & Volkovich, 2012; Bao et al., 2012; Hecht & Gergle, 2010; Liao, 2008, 2011; Luyt, Goh, & Lee, 2009; Massa & Scrinzi, 2012; Mazieres & Huron, 2013; Petzold, Liao, Hartley, & Potts, 2012; Rogers & Sendijarevic, 2012; Warncke-Wang, Uduwage, Dong, & Riedl, 2012). L10n is also briefly discussed as a contributing factor to the “internationalization mechanisms” of “network gatekeeping” (Barzilai-Nahon, 2008), holding the key for researchers to understand the nationalization or internationalization dynamics of the Web.

For the Chinese-language internet, there are many localized versions provided by several major search engines, such as Yahoo China, Google Hong Kong and Google Taiwan. I call them search engine-locale variants (hereafter search engine variants). Do different search engine variants guide users from various Chinese-speaking regions to see the same websites regardless of which search engine they choose? Or do they see divergent SERPs? Prior empirical research has analysed SERPs inside mainland China, with the latest research on 316 search query phrases of “Internet event” collected in 2009 indicating that Baidu Baike and Chinese Wikipedia indeed rank high in the SERPs (Jiang & Akhtar, 2011). However, that research focuses on (and thus is limited to) simplified Chinese users in mainland China, and its sample of search queries was based on internet incidents that are politically controversial in mainland China. This paper contributes findings based on 2,500 search queries in 2011, covering not only more topics but also more Chinese-language search engines across more regions such as Hong Kong, Taiwan and Singapore.
Before presenting the methods and findings, the next section first provides a theoretical framework that captures the localization effects of search engines.

2. L10N OF SEARCH ENGINES
Observing how search engines categorise users is one of the practical ways to examine the impact of search engines on national and/or regional boundaries. As part of the industry practice of internationalization/localization (i18n/L10n), search engines provide different interfaces and services for different users, usually categorized by their geo-linguistic identifiers, using language codes such as zh-TW (Chinese in Taiwan), pt-BR (Portuguese in Brazil), and en-IN (English in India) (DePalma, 2002; Dunne, 2006). These identifiers in turn influence how content is aggregated, filtered and prioritised for users who share the same or similar language preferences. Online users and audiences are often partitioned accordingly by search engine marketing tools such as Google AdWords and Microsoft adCenter. Unlike the globalized TV industry, where broadcasting and cable TV are still bound to geography, these geo-linguistic codes are configurable. For example, one can use the UK version of Google even when not in the UK.

To conceptualize the localization effects of search engines, this paper applies the “network gatekeeping” theory (Barzilai-Nahon, 2008) for the following reasons. First, localization was discussed as a contributing factor to the “internationalization mechanisms” of “network gatekeeping” (Barzilai-Nahon, 2008). Although the theory comes mainly from information science and aims to better understand information control in network settings, its multidisciplinary aspects (Jucquois-Delpierre, 2007) can help researchers understand how seemingly technical arrangements of computer software or information systems can have enormous effects on gatekeeping, that is, on controlling the flows and presentation of information.
Second, distinct from traditional gatekeeping theory, which focuses on the withholding or deletion of information, the network gatekeeping theory not only conceptualizes localization as part of the gatekeeping processes, but also emphasizes the “display” bases for such processes: “Presenting information in a particular visual form designed to catch the eye” (Barzilai-Nahon, 2008). Indeed, search engines visually present the results. Thus, to understand the localization effects of search engines, a data collection method must consider not only the localization parameters but also the visual display of search results.

I argue that locales in computing, sets of parameters that describe a user’s language, region and other interface preferences, constitute one of the most important online “situations” for online media. By “situations” I use the definition used by medium theorists in the tradition of media ecology: “situations as (social) information-systems that set the patterns of access to information” (Meyrowitz, 1986, 1994). Note that as medium theorists focus on the medium rather than on messages, the definition is particularly suitable for studying search engines, because some major companies including Google have resisted the idea that they are in the content or media industry by insisting that they are information companies. For media and communication scholars, the underlying question is less about Google’s industrial identity and more about how online media in general can use locales to segment, fragment and integrate different media markets and/or audiences by using different information system settings. Thus, geographic and linguistic factors seem to “set the patterns of access to information”, as geo-linguistic situations are expected to determine which websites will be the most visible and most consistently appearing ones in the SERPs.
2.1 A Straight-forward Visibility Test
Because users often browse SERPs from top to bottom, various market research (Enquiro, 2007), social science research (Bar Ilan, 2006; Dunleavy, Margetts, Bastow, Pearce, & Tinkler, 2007; Margetts & Escher, 2006; Vaughan & Thelwall, 2004) and industry practices (Slingshot SEO, 2011) have measured the level of online visibility based on webometric data such as the position in SERPs (the higher the position, the more visible) and/or the number of incoming web links from other websites. These measurements provide the foundations for keyword search advertising (Brettel & Spilker-Attig, 2010; Chen, 2008; B. J. Jansen, Brown, & Resnick, 2007; B. J. Jansen & Mullen, 2008; J. Jansen, 2011; Jung, 2008; Malaga, 2008; Spindler, 2010). For marketing purposes, it is imperative to boost the ranking of a website for a target set of search terms (or search keywords).

For the purpose of this research, the focus shifts to the medium role of search engines between users and web pages. As shown in Figure 1, search engines play the gatekeeping role by curating different sets of web pages for different groups of users characterized by their respective search engine variants. This functions as “network” gatekeeping because search engines often provide different rankings based on both user data and the inter-linking data among the web pages themselves.

Figure 1. Search engines as the “network gatekeeper” between users and web pages

To account for the difference made by the ranking positions in SERPs, this research proposes a method to quantify such “display” gatekeeping differences (Barzilai-Nahon, 2008). Because different SERP rankings suggest different levels of visibility, different scores can be assigned. One way to do so is to use click-through rate (hereafter CTR) data for SERPs. Commonly used in online advertising, CTR measures the number of clicks on a web link divided by the number of times it is shown to users (i.e. clicks/impressions).
For search engine marketing, CTR indicates the probability of a listed web link being clicked. Based on the arithmetic mean of the CTR for top-10 search results from five different sources (Hearne, 2006; Jones, 2007; Young, 2011), I plotted the scatter chart in Figure 2 to show the relationship between SERP ranking and CTR. The top-ranking website is expected to receive more than 30% of the traffic, while the second receives just a bit over 10%, and so on. The relationship between SERP ranking and CTR seems to follow a power function of the form $y = ax^{b}$. Thus a power regression analysis is done to provide a curve-fitting function of $y = 0.2889x^{-1.078}$, with a high $R^2$ value (0.9934), suggesting a close fit. For this research, the visibility scores are therefore assigned according to the SERP ranking.

Figure 2. Click-through Rates depending on the ranking in the Search Engine Results Page (SERP)

While it is impossible to exhaust the SERPs to identify patterns of preferred websites, previous research has established that the top-10 search results in the first SERP occupy a significant proportion of users’ attention and actual clicks (Hearne, 2006; Jones, 2007; Young, 2011). Based on such estimated CTR data, different visibility scores can be assigned to websites depending on their ranking in the SERP, as shown in Figure 2. A high SERP ranking does not always guarantee users’ actual clicks. Nonetheless, it is justified to use CTR as a proxy for visibility scores for the purpose of this research: it is a best-effort attempt based on various sources of industry data.
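The rank-to-score assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's own code: the function names `visibility_score` and `fit_power_law` are hypothetical, the coefficients are the ones reported in the text, and the fitting routine is an ordinary least-squares fit on log-transformed values, which is one common way to perform a power regression and is assumed here rather than confirmed by the source.

```python
import math

# Coefficients of the fitted curve reported in the text: y = A * x**B.
A = 0.2889
B = -1.078


def visibility_score(rank: int) -> float:
    """Return the CTR-based visibility score for a 1-based SERP rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return A * rank ** B


def fit_power_law(ranks, ctrs):
    """Fit y = a * x**b by least squares on ln(y) = ln(a) + b * ln(x).

    Assumption: this log-log approach stands in for whatever power
    regression tool was actually used. Returns the pair (a, b).
    """
    lx = [math.log(x) for x in ranks]
    ly = [math.log(y) for y in ctrs]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    slope = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
             / sum((u - mean_x) ** 2 for u in lx))
    a = math.exp(mean_y - slope * mean_x)
    return a, slope


if __name__ == "__main__":
    # Scores assigned to the top-10 positions of one SERP.
    for rank in range(1, 11):
        print(rank, f"{visibility_score(rank):.4f}")
```

Because the scores depend only on rank, the same ten values apply to every query and every search engine variant; only the websites occupying those positions change.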
2.2 Chinese Search Engine Markets
According to various survey, market and traffic reports from both inside and outside mainland China (CIC, 2009; CNNIC, 2006, 2007; Nguyen, 2011; Russell, 2011; StatCounter, 2011), three major search engines (Baidu, Google and Yahoo) dominate the search engine markets across four regions (mainland China, Singapore, Hong Kong and Taiwan) and two Chinese script preferences (simplified Chinese for mainland China and Singapore; traditional Chinese for Hong Kong and Taiwan). Thus, nine search engine variants can be derived from the combinations of search engine providers and geo-linguistic preferences, which altogether cover over 97% of the market:
• For mainland China (mostly simplified Chinese users), zh-cn: Baidu, Google (simplified Chinese) and Yahoo China
• For Singapore (mostly simplified Chinese users), zh-sg: Google Singapore and Yahoo Singapore
• For Hong Kong (mostly traditional Chinese users), zh-hk: Google Hong Kong and Yahoo Hong Kong
• For Taiwan (mostly traditional Chinese users), zh-tw: Google Taiwan and Yahoo Taiwan
These variants are hereafter abbreviated as Baidu_CN, Google_CN, Yahoo_CN, Google_SG, Yahoo_SG, Google_HK, Yahoo_HK, Google_TW and Yahoo_TW.

It is noted that Baidu continues to enjoy its lead in mainland China, with Google in second place, after Google moved its mainland operations to Hong Kong (BBC, 2011). In Hong Kong and Taiwan, around 2010 to 2011, Google overtook Yahoo’s leading position, while maintaining its top position in Singapore (StatCounter, 2011). With all these nine variants, will the SERPs converge on a similar set of websites or diverge? By answering this question, researchers can gain insights into the converging and diverging effects of search engines for Chinese-language users across these regions.
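As a small illustration of how the nine variants follow from the provider and locale combinations listed above, the mapping can be written down as plain data. The structure and the names `PROVIDERS_BY_LOCALE`, `SCRIPT_BY_LOCALE` and `variant_labels` are assumptions made for this sketch; only the locale codes, providers, scripts and abbreviations come from the text.

```python
# Locale code -> search engine providers, as listed in the text.
PROVIDERS_BY_LOCALE = {
    "zh-cn": ["Baidu", "Google", "Yahoo"],
    "zh-sg": ["Google", "Yahoo"],
    "zh-hk": ["Google", "Yahoo"],
    "zh-tw": ["Google", "Yahoo"],
}

# Locale code -> predominant Chinese script, as described in the text.
SCRIPT_BY_LOCALE = {
    "zh-cn": "simplified",
    "zh-sg": "simplified",
    "zh-hk": "traditional",
    "zh-tw": "traditional",
}


def variant_labels():
    """Build labels such as 'Baidu_CN' from provider and region code."""
    labels = []
    for locale, providers in PROVIDERS_BY_LOCALE.items():
        region = locale.split("-")[1].upper()
        for provider in providers:
            labels.append(f"{provider}_{region}")
    return labels


if __name__ == "__main__":
    for label in variant_labels():
        print(label)
```

Keeping the script preference next to the locale makes it explicit which orthography a query would need to be written in before it is submitted to a given variant.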
[Figure 1 labels: Users (often categorized by providers and geo-linguistic settings); Search Engines; Web pages.]
[Figure 2 plot: x-axis, ranking in the Search Engine Results Page (1 to 10); y-axis, visibility scores (0% to 35%); series: weighted by CTR, unweighted; power trendline (weighted by CTR): $y = 0.2889x^{-1.078}$, $R^2 = 0.9934$.]
2.3 Merging and diverging effects of SERPs
If the aforementioned market survey and traffic reports are correct, search engine users from Taiwan mostly filter web pages through the lens of the search engine variants Google_TW and Yahoo_TW. Those from Hong Kong mostly use Google_HK and Yahoo_HK, and so on. By conceptualizing search engines as a medium, the merging and diverging patterns of SERPs will also indicate whether users from these regions will see similar websites when using different search engine providers. Hence, the SERP data may indicate which search engines may overcome offline boundaries across these regions (if the SERPs converge on specific websites) and which may reinforce them (if the SERPs diverge), thereby contributing to the general question of media and globalization in the case of search engines.

To do so, the proposed visibility tests, which quantify the top-ranking websites, can be used as an indication of search engines exercising their “display” gatekeeping power for certain websites. Based on the quantified display gatekeeping power, the visibility patterns can be systematically examined between (1) search engine variants and (2) visible websites. Moreover, visibility scores can be further aggregated (i.e. summed) over a selection of search queries, so as to better answer the different research questions that guide such a selection. Ideally, by exhausting visibility scores for various localized versions of SERPs over a large sample of search queries, researchers can better compare how visible a website is across different search engine variants, thereby paving the way for showing the merging and diverging patterns of the SERPs.

It should be noted that, borrowing from academic research on webometric visibility and the industry practice of keyword advertising, the proposed framework and method are general enough for future studies regardless of the providers and/or geo-linguistic preferences of search engines. For example:
How different, or similar, are the SERPs provided by Yandex versus Google in Turkey? How different, or similar, are the SERPs provided by Google Hindi versus Google Urdu in India? The outcome of visibility scores can be further visualized and analysed by various network analysis techniques. Thus, this method can answer these empirical questions, with results that can then be interpreted to explore the cultural and political implications of such patterns.

To showcase how the integrated method works, I choose to study the Chinese-language internet because its boundaries have several historical, cultural and political complications. For example, regions such as mainland China, Singapore, Hong Kong and Taiwan have different practices in democracy, free speech, human rights and Chinese scripts (Damm, 2007; Liao, 2009; Zhao & Baldauf, 2007).

3. DATA COLLECTION
To identify how search engine variants influence the Chinese-language SERPs, the top-10 results should provide enough indication.

3.1 Search Queries
First, I have selected about 2,500 search queries that are relevant to Chinese cultural and political topics. As summarized in Table 1, the selection includes all 990 entries in “The Cambridge Encyclopedia of China” (The Cambridge encyclopedia of China, 1991), the top 10 search terms provided respectively by Baidu and Google (including mainland China, Hong Kong and Taiwan variations) in various categories since 2007, major popular cultural references, notable people’s names and some other culturally and politically “sensitive” keywords. Although other selections or combinations are possible, this selection aims to focus the research on the prominence of user-generated encyclopedias across Chinese-speaking regions.
Table 1 Sources and numbers of search queries Second, the sample keywords are transliterated into search queries according to the respective Chinese orthographic preferences (simplified Chinese for mainland China and Singapore; traditional Chinese for Hong Kong and Taiwan), making this research first of its kind to compare SERPs across Chinese-language variants. Third, the top-10 SERPs are collected for the nine search engine variants that cover four major Chinese-speaking regions of China, Singapore, Hong Kong and Taiwan. Then they are parsed and processed by the visibility tests, weighting the high-ranking website with higher visibility scores. 3.2 Search Results Around 22,000 web links are extracted from the SERPs based on the outcome of 2500 search queries submitted across nine variations of search engines in 2011. These 22,000 web links correspond to around 25,000 unique domain names. Then the outcome is further consolidated manually by checking IP addresses to over 16,000 websites (e.g. the website of sohu.com aggregates money.sohu.com and women.sohu.com). Finally, all education and government websites are aggregated into respective top-level domain names, such as edu.tw, edu.cn, gov.cn and gov.hk. 4. FINDINGS To show how localization influences online visibility, the collected data of visibility scores are unpacked and analysed as follows. 4.1 Concentrated visibility scores Figure 3 shows the respective proportion distribution and accumulative distribution of visibility scores for the top-100 most visible websites. It is evident that near 80% of the visibility scores are concentrated over the top-100 websites, and indeed three user- generated encyclopedia websites ranked highest: (1)wikipedia.org, (2)baidu.com and (3)hudong.com. For the website wikipedia.org, Chinese Wikipedia (zh.wikipedia.org) is the most visible; for Baidu.com, Baidu Baike (baike.baidu.org) is the most visible. 
Categories of Search Keywords                              Numbers
The Cambridge Encyclopedia of China                            990
Top 10 Search Terms (Google and Baidu)                         387
Best Film/Popular Music (China, Hong Kong, Taiwan)             364
Modern Concepts (shared with modern Japanese)                  171
Notable People                                                 476
    Nobel Prize Winners of Chinese origin                       11
    Major Chinese Politicians                                  187
    Rich People (China, Hong Kong, Taiwan)                      82
    100 Contemporary Intellectuals (China)                     100
    Major Fugitives From Taiwan                                 17
    Victims of White Terror in Taiwan                           79
Potentially Sensitive Terms                                    112
    Japanese AV porn stars                                      48
    Prosecuted and Sentenced Corrupted Chinese Officials        14
    Documented Filtered Words by Great Firewall                 50
Total                                                         2500
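The parsing, consolidation and concentration steps described in Sections 3.2 and 4.1 can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the reciprocal-rank weight in `rank_weight`, the small `MANUAL_MAP` lookup (standing in for the manual IP-based consolidation) and all function names are assumptions made for illustration only.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Institutional suffixes that Section 3.2 aggregates into one "website" each.
INSTITUTIONAL_SUFFIXES = ("edu.tw", "edu.cn", "gov.cn", "gov.hk")

# Hypothetical stand-in for the manual consolidation done by checking IPs.
MANUAL_MAP = {"money.sohu.com": "sohu.com", "women.sohu.com": "sohu.com"}


def website_of(url):
    """Map a result URL to its aggregated website name."""
    host = urlparse(url).hostname or ""
    for suffix in INSTITUTIONAL_SUFFIXES:
        if host == suffix or host.endswith("." + suffix):
            return suffix
    return MANUAL_MAP.get(host, host)


def rank_weight(rank):
    """Assumed weighting: higher-ranked results count more (reciprocal rank)."""
    return 1.0 / rank


def visibility_scores(serps):
    """Sum rank weights per website over a list of ranked top-10 URL lists."""
    scores = defaultdict(float)
    for urls in serps:
        for rank, url in enumerate(urls[:10], start=1):
            scores[website_of(url)] += rank_weight(rank)
    return dict(scores)


def cumulative_share(scores, top_n):
    """Share of all visibility held by the top_n most visible websites."""
    total = sum(scores.values())
    top = sorted(scores.values(), reverse=True)[:top_n]
    return sum(top) / total if total else 0.0


# Two toy SERPs with made-up URLs.
serps = [
    ["http://money.sohu.com/a", "http://www.ntu.edu.tw/x"],
    ["http://women.sohu.com/b"],
]
scores = visibility_scores(serps)
share_top1 = cumulative_share(scores, 1)
```

In this toy input both sohu.com subdomains collapse into one website, so the single most visible website already holds most of the total score. The same `cumulative_share` call with `top_n=100` corresponds to the kind of concentration measure plotted in Figure 3.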
Figure 3. Concentrated visibility scores

Since the top-100 most visible websites account for nearly 80% of the visibility scores, strong concentration effects are found. The following sub-section therefore examines these websites further.

4.2 Tabulating visibility scores
Table 2 tabulates the top-ranking websites and their respective visibility scores for each search engine variant. Each cell shows the visibility score that a search engine variant has contributed to a particular website. For example, the first cell, 34.30, indicates how much Baidu_CN has contributed to Chinese Wikipedia (zh.wikipedia.org).

Table 2 Top-ranking websites: visibility scores

Note that the top three are all user-generated encyclopedias: Chinese Wikipedia, Baidu Baike and Hudong Baike. As another example, the news website associated with Falun Gong (epochtimes.com, ranked 18th) is completely absent from Baidu's results (i.e. the zero visibility score indicates that it never shows up in Baidu's SERPs). In direct contrast, for Yahoo_HK (third-last column) it enjoys a higher visibility score than the mainland-based websites, including the Chinese official media People's Daily (people.com.cn, ranked 15th), suggesting that the Falun Gong news website performs better than even People's Daily on Yahoo Hong Kong.

Table 2 therefore shows in detail which search engine variants favour which websites by citing and showing them more often and more prominently in SERPs, making them easier to find (at least for the selected search queries). The top-ranking websites include major China-based portals (e.g. baidu.com, sina.com.cn, qq.com, sohu.com and 163.com), US-based websites (e.g. youtube.com, facebook.com), mainland China-based news media websites (e.g. people.com.cn, xinhuanet.com, ifeng.com) and the aggregated category of mainland Chinese government websites (i.e. gov.cn).
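The reading of Table 2 above can be checked with a short Python sketch. The scores are copied from Table 2 for five websites; treating an empty cell as a zero score and ranking websites by their summed score are assumptions of this sketch, and the helper names `score` and `total_visibility` are hypothetical.

```python
ENGINES = ["Baidu_CN", "Google_CN", "Google_HK", "Google_SG", "Google_TW",
           "Yahoo_CN", "Yahoo_HK", "Yahoo_SG", "Yahoo_TW"]

# Visibility scores from Table 2 (website -> engine -> score).
TABLE2 = {
    "zh.wikipedia.org": dict(zip(ENGINES, [34.30, 272.37, 611.39, 304.15, 586.50,
                                           24.46, 833.95, 254.00, 721.01])),
    "baike.baidu.com": dict(zip(ENGINES, [661.93, 410.28, 174.04, 433.81, 125.52,
                                          72.44, 39.10, 508.05, 4.88])),
    "hudong.com": dict(zip(ENGINES, [5.30, 107.93, 71.29, 107.92, 57.31,
                                     267.17, 2.54, 168.23, 0.35])),
    "people.com.cn": dict(zip(ENGINES, [14.54, 23.19, 16.00, 23.82, 18.14,
                                        20.97, 17.81, 11.43, 13.39])),
    # Empty cells (Baidu_CN, Yahoo_CN) are simply left out here.
    "epochtimes.com": {"Google_CN": 1.05, "Google_HK": 27.34, "Google_SG": 2.23,
                       "Google_TW": 33.05, "Yahoo_HK": 34.57, "Yahoo_SG": 3.93,
                       "Yahoo_TW": 36.62},
}


def score(site, engine):
    """Score one engine variant contributes to one website (0.0 if absent)."""
    return TABLE2[site].get(engine, 0.0)


def total_visibility(site):
    """Sum of a website's scores over all nine search engine variants."""
    return sum(score(site, engine) for engine in ENGINES)


# Websites ordered from most to least visible by summed score.
ranked = sorted(TABLE2, key=total_visibility, reverse=True)

# The contrast discussed in the text: Epoch Times versus People's Daily.
epoch_on_baidu = score("epochtimes.com", "Baidu_CN")
epoch_beats_people_on_yahoo_hk = (
    score("epochtimes.com", "Yahoo_HK") > score("people.com.cn", "Yahoo_HK")
)
```

For these five rows the summed scores reproduce the order in which the websites appear in Table 2, which is consistent with a ranking by total visibility.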
Table 2 orders the websites from the most visible at the top row to the least visible at the bottom row, while the search engine variants are ordered first by provider (Baidu, Google, then Yahoo) and then by region (CN, HK, SG, then TW). It is relatively difficult, however, to see any pattern directly from Table 2 as tabulated. In other words, although each cell shows how strongly a search engine variant favours a certain website in its SERPs, the table as a whole does not show clearly which "group" of search engine variants favours which "set" of websites.

To identify patterns of convergence and divergence, I use blockmodeling analysis in the next subsection to study the visibility scores in Table 2, each of which represents the strength of the tie between a search engine variant and a website. To avoid arbitrary clustering results produced by less consequential websites collected in the SERPs, only the top-100 most visible websites are considered for analysis.

4.3 Clustering using blockmodeling analysis
Cluster analysis is commonly used in exploratory data mining to find how data points can be grouped based on statistical analysis of their similarities and differences. To find how “birds of a feather flock together” for the websites and search engine variants at hand, various clustering techniques can be applied, including agglomerative hierarchical clustering, which produces a family tree that details how the data points can be grouped. Nonetheless, this study chooses blockmodeling analysis (Doreian, Batagelj, & Ferligoj, 2004) for the following reasons.

First, a blockmodel analysis produces a simplified outcome that better suits the research question at hand: to identify the rough patterns, without the need to see in detail which website is closer to which.
Second, as will be shown later, a blockmodel analysis can greatly simplify a complex dataset and provide a succinct summary of its overall structure. Third, because researchers can and must design a blockmodel for the data points to fit, a blockmodel analysis is particularly useful for identifying converging and diverging patterns. It also provides a systematic way to see how well the data points fit the model. Fourth, a blockmodel can be seen as a simplified network, and thus it can help to produce a simplified visualization of network data.

It should be noted that the dataset can be seen as a two-mode network: different “nodes” of search engine variants give different visibility scores to different “nodes” of websites. It is thus equivalent to a network of visibility scores, in which high visibility scores indicate a strong “relationship”. It is a two-mode network because there are two types of nodes (i.e. search engine variants and websites) and ties exist only between nodes of different types (i.e. the visibility score contributed by one search engine variant to one website).

4.3.1 A blockmodel design
Before detailing how the cluster outcome helps identify the merging and diverging patterns systematically, it is necessary to explain the basis on which I design the blockmodel in Table 3.
[Figure 3 plot: accumulative proportion of visibility scores (axis ticks 0% to 80%) for the top-100 websites (axis ticks 0 to 100)]

Ranking | Websites (Aggregated) | Baidu_CN | Google_CN | Google_HK | Google_SG | Google_TW | Yahoo_CN | Yahoo_HK | Yahoo_SG | Yahoo_TW
1  | zh.wikipedia.org    | 34.30  | 272.37 | 611.39 | 304.15 | 586.50 | 24.46  | 833.95 | 254.00 | 721.01
2  | baike.baidu.com     | 661.93 | 410.28 | 174.04 | 433.81 | 125.52 | 72.44  | 39.10  | 508.05 | 4.88
3  | hudong.com          | 5.30   | 107.93 | 71.29  | 107.92 | 57.31  | 267.17 | 2.54   | 168.23 | 0.35
4  | baidu.com           | 385.80 | 51.36  | 13.29  | 53.21  | 9.93   | 20.52  | 7.17   | 102.80 | 1.65
5  | sina.com.cn         | 59.18  | 76.85  | 21.69  | 69.33  | 16.63  | 41.70  | 2.04   | 35.29  | 0.68
6  | knowledge.yahoo.com | 0.10   | 0.03   | 0.29   | 0.00   | 0.36   | 0.00   | 93.46  | 20.33  | 140.07
7  | edu.tw              | 0.46   | 5.14   | 21.14  | 7.21   | 64.29  | 0.06   | 30.61  | 21.07  | 102.98
8  | qq.com              | 40.27  | 41.23  | 13.00  | 37.26  | 11.64  | 57.85  | 2.07   | 23.35  | 0.95
9  | youtube.com         | 0.29   | 8.39   | 66.03  | 9.04   | 68.63  | 0.00   | 45.20  | 4.96   | 19.00
10 | gov.cn              | 25.46  | 38.94  | 20.30  | 32.29  | 15.61  | 43.03  | 5.29   | 34.84  | 3.57
11 | sohu.com            | 20.89  | 32.82  | 10.08  | 27.34  | 8.08   | 38.97  | 3.18   | 22.11  | 1.57
12 | 163.com             | 25.59  | 34.68  | 10.78  | 31.51  | 10.00  | 32.31  | 2.52   | 14.56  | 0.87
13 | facebook.com        | 0.29   | 1.93   | 8.96   | 2.26   | 19.00  | 0.00   | 88.33  | 8.31   | 33.61
14 | youku.com           | 42.04  | 29.12  | 10.32  | 19.34  | 8.41   | 36.38  | 1.03   | 15.31  | 0.64
15 | people.com.cn       | 14.54  | 23.19  | 16.00  | 23.82  | 18.14  | 20.97  | 17.81  | 11.43  | 13.39
16 | blog.sina.com.cn    | 21.73  | 28.47  | 15.41  | 26.79  | 13.95  | 9.75   | 4.27   | 33.78  | 2.53
17 | xinhuanet.com       | 26.13  | 27.18  | 21.02  | 27.71  | 20.06  | 11.50  | 1.70   | 19.31  | 0.40
18 | epochtimes.com      | 0.00   | 1.05   | 27.34  | 2.23   | 33.05  | 0.00   | 34.57  | 3.93   | 36.62
19 | ifeng.com           | 25.67  | 25.13  | 11.86  | 24.39  | 9.67   | 16.70  | 4.20   | 10.12  | 2.56
20 | baike.soso.com      | 11.08  | 7.60   | 1.31   | 5.93   | 1.05   | 29.16  | 0.29   | 63.30  | 0.04

To build a blockmodel, researchers have to make design decisions on
the “connection types” (e.g. “complete” versus “null”) and the number of blocks. A block is said to be “complete” if all cells in that block indicate a strong relationship, and “null” if all cells in that block contain only weak or no relationships. The three-by-three blockmodel in Table 3 thus assumes that the data points will fit into nine blocks. For this study, the nine search engine variants are divided into three groups, and the top-100 websites are categorized into three sets.

Table 3 Expected outcome of blockmodeling

The rationale behind this model is to identify converging and diverging patterns. The second part of Table 3 shows how three groups of search engine variants (Clusters A, B and C) may converge or diverge on different sets of websites (Clusters X, Y and Z). I assume that a middle ground of websites exists: there will be a set of websites that are visible for all search engine variants (i.e. Cluster Y). That is, Clusters A, B and C converge on Cluster Y, as indicated by the dark blocks containing strong ties (i.e. high visibility scores). To account for any deviation from this "converging" middle ground, I expect two blocks of low-visibility cells (i.e. weak or no relationships), represented by the two white blocks in Table 3: one at the top-left and another at the bottom-right. Both blocks indicate patterns of divergence, or lack of convergence.

For this study, if all search engine variants converge on the same top visible websites, then there should be no patterns of divergence. Using this scenario of complete convergence as the null hypothesis (no difference in visibility patterns), I expect to find enough evidence of diverging effects to reject it. If there is a significant number of websites in the low-visibility blocks (one at the upper-left and another at the lower-right corner), then the diverging patterns are identified accordingly.
4.3.2 Patterns of merging and diverging
Using the blockmodeling function provided by the social network analysis tool Pajek, the 9-by-100 cells of strong versus weak ties are simplified into the three-by-three blockmodel, as shown in Table 4. For each cell, the colour represents strong (dark) or weak (white) ties, and the cells are roughly partitioned into three-by-three blocks, thereby clustering the nine search engine variants into three groups and the 100 most visible websites into three sets. The match is not perfect: 87 cells out of 900 (9.67%) do not match the designed blockmodel. Given the space limitation, only the top-20 websites are shown in full.

As shown in Table 4, of the top-100 websites, 39 are categorized into the first cluster of websites (Cluster X), 13 into Cluster Y and 48 into Cluster Z. If we look at the top-20 most visible websites only, the converging set of websites (Cluster Y) is thin (only one website). This website (people.com.cn) belongs to People’s Daily, the official media organ of the Chinese Communist Party.
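How the fit between observed ties and the designed blockmodel can be counted is sketched below in Python. This is not Pajek's optimisation procedure, which also searches for the best partition; the sketch only counts mismatches for a given partition. The binarisation threshold `THRESHOLD` is a hypothetical value chosen for illustration, the three websites and three engine variants are a small excerpt of the Table 2 scores, and all names are assumptions.

```python
# Ideal image from Table 3: website clusters (X, Y, Z) by engine clusters
# (A, B, C). 1 = "complete" block, 0 = "null" block.
IMAGE = {
    "X": {"A": 0, "B": 1, "C": 1},
    "Y": {"A": 1, "B": 1, "C": 1},
    "Z": {"A": 1, "B": 1, "C": 0},
}

THRESHOLD = 10.0  # hypothetical cut-off between weak and strong ties


def binarize(scores, threshold=THRESHOLD):
    """Turn visibility scores into strong (1) / weak (0) ties."""
    return {site: {eng: int(val >= threshold) for eng, val in row.items()}
            for site, row in scores.items()}


def count_mismatches(ties, site_cluster, engine_cluster, image=IMAGE):
    """Count cells whose observed tie differs from the ideal block value."""
    errors = 0
    for site, row in ties.items():
        for engine, tie in row.items():
            expected = image[site_cluster[site]][engine_cluster[engine]]
            if tie != expected:
                errors += 1
    return errors


# Excerpt of Table 2: one website per cluster, one engine variant per cluster.
scores = {
    "zh.wikipedia.org": {"Baidu_CN": 34.30, "Google_HK": 611.39, "Yahoo_TW": 721.01},
    "people.com.cn": {"Baidu_CN": 14.54, "Google_HK": 16.00, "Yahoo_TW": 13.39},
    "baike.soso.com": {"Baidu_CN": 11.08, "Google_HK": 1.31, "Yahoo_TW": 0.04},
}
site_cluster = {"zh.wikipedia.org": "X", "people.com.cn": "Y", "baike.soso.com": "Z"}
engine_cluster = {"Baidu_CN": "A", "Google_HK": "B", "Yahoo_TW": "C"}

mismatches = count_mismatches(binarize(scores), site_cluster, engine_cluster)

# Error rate reported in the text: 87 mismatching cells out of 9 x 100.
reported_error_rate = round(100 * 87 / (9 * 100), 2)
```

With this assumed threshold, two of the nine excerpted cells deviate from the ideal image: Chinese Wikipedia has a strong tie with Baidu_CN where a null block is expected, and baike.soso.com has a weak tie with Google_HK where a complete block is expected. Such deviations are what the 87 mismatching cells represent at full scale.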
Table 4 Blockmodeling outcome

Ranking | Websites (Aggregated) | Baidu_CN | Yahoo_CN | Google_CN | Yahoo_SG | Google_SG | Google_TW | Google_HK | Yahoo_HK | Yahoo_TW

Cluster X (blockmodel: weak, strong, strong)
1  | zh.wikipedia.org    | 34.30  | 24.46  | 272.37 | 254.00 | 304.15 | 586.50 | 611.39 | 833.95 | 721.01
6  | knowledge.yahoo.com | 0.10   | 0.00   | 0.03   | 20.33  | 0.00   | 0.36   | 0.29   | 93.46  | 140.07
7  | edu.tw              | 0.46   | 0.06   | 5.14   | 21.07  | 7.21   | 64.29  | 21.14  | 30.61  | 102.98
9  | youtube.com         | 0.29   | 0.00   | 8.39   | 4.96   | 9.04   | 68.63  | 66.03  | 45.20  | 19.00
13 | facebook.com        | 0.29   | 0.00   | 1.93   | 8.31   | 2.26   | 19.00  | 8.96   | 88.33  | 33.61
18 | epochtimes.com      | 0.00   | 0.00   | 1.05   | 3.93   | 2.23   | 33.05  | 27.34  | 34.57  | 36.62
… and other 33 websites (the total number of websites is 39 for this block)

Cluster Y (blockmodel: strong, strong, strong)
15 | people.com.cn       | 14.54  | 20.97  | 23.19  | 11.43  | 23.82  | 18.14  | 16.00  | 17.81  | 13.39
… and other 12 websites (the total number of websites is 13 for this block)

Cluster Z (blockmodel: strong, strong, weak)
2  | baike.baidu.com     | 661.93 | 72.44  | 410.28 | 508.05 | 433.81 | 125.52 | 174.04 | 39.10  | 4.88
3  | hudong.com          | 5.30   | 267.17 | 107.93 | 168.23 | 107.92 | 57.31  | 71.29  | 2.54   | 0.35
4  | baidu.com           | 385.80 | 20.52  | 51.36  | 102.80 | 53.21  | 9.93   | 13.29  | 7.17   | 1.65
5  | sina.com.cn         | 59.18  | 41.70  | 76.85  | 35.29  | 69.33  | 16.63  | 21.69  | 2.04   | 0.68
8  | qq.com              | 40.27  | 57.85  | 41.23  | 23.35  | 37.26  | 11.64  | 13.00  | 2.07   | 0.95
10 | gov.cn              | 25.46  | 43.03  | 38.94  | 34.84  | 32.29  | 15.61  | 20.30  | 5.29   | 3.57
11 | sohu.com            | 20.89  | 38.97  | 32.82  | 22.11  | 27.34  | 8.08   | 10.08  | 3.18   | 1.57
12 | 163.com             | 25.59  | 32.31  | 34.68  | 14.56  | 31.51  | 10.00  | 10.78  | 2.52   | 0.87
14 | youku.com           | 42.04  | 36.38  | 29.12  | 15.31  | 19.34  | 8.41   | 10.32  | 1.03   | 0.64
16 | blog.sina.com.cn    | 21.73  | 9.75   | 28.47  | 33.78  | 26.79  | 13.95  | 15.41  | 4.27   | 2.53
17 | xinhuanet.com       | 26.13  | 11.50  | 27.18  | 19.31  | 27.71  | 20.06  | 21.02  | 1.70   | 0.40
19 | ifeng.com           | 25.67  | 16.70  | 25.13  | 10.12  | 24.39  | 9.67   | 11.86  | 4.20   | 2.56
20 | baike.soso.com      | 11.08  | 29.16  | 7.60   | 63.30  | 5.93   | 1.05   | 1.31   | 0.29   | 0.04
… and other 35 websites (the total number of websites is 48 for this block)

Legend: cell shading marks relatively strong versus weak ties; block shading marks strong versus weak blocks in the blockmodel.

These blockmodeling findings also help identify the merging and diverging patterns of search engine variants.
Cluster A contains Baidu_CN, Yahoo_CN and Google_CN; Cluster B contains Google_HK, Google_SG, Google_TW and Yahoo_SG; Cluster C contains Yahoo_HK and Yahoo_TW. The cluster outcome shown in Table 5 indicates both patterns of merging and diverging, determined by the choice of search engine variants. Of the three groups of search engine variants, two deviate from the rest. The first group (Cluster A) contains search engine variants designed for mainland China (Baidu_CN, Yahoo_CN and Google_CN), and the second group (Cluster C) contains the Yahoo Search for Taiwan and Hong Kong (Yahoo_HK and Yahoo_TW). Thus, while the search engine variants in Cluster B produce converging results for the top-100 websites, with “complete” connection types to all clusters of websites, those in Cluster A and those in Cluster C lead to diverging SERPs.

Table 5 Clusters identified by blockmodeling

Search engine variants | Cluster A: Baidu_CN, Google_CN, Yahoo_CN | Cluster B: Google_HK, Google_SG, Google_TW, Yahoo_SG | Cluster C: Yahoo_HK, Yahoo_TW
Websites  | #  | Cluster A | Cluster B | Cluster C
Cluster X | 39 | null      | complete  | complete
Cluster Y | 13 | complete  | complete  | complete
Cluster Z | 48 | complete  | complete  | null

Table 3 Expected outcome of blockmodeling (design described in Section 4.3.1)

          | Cluster A | Cluster B | Cluster C
Cluster X | null      | complete  | complete
Cluster Y | complete  | complete  | complete
Cluster Z | complete  | complete  | null

In the second part of Table 3, the Cluster Y row is marked "converging" for Clusters A, B and C.

4.4 Visualizing and unpacking findings
To show the visibility scores in a more intuitive manner, a network visualization graph of the top-800 most visible websites is shown in Figure 4. I visualize the nine search engine variants (shown as the text boxes at the periphery) and the 800 most visible websites (shown as nodes in the middle). The two-mode network is thus presented in a way that indicates the overall likelihood of a given search engine variant recommending a website shown in the middle. Pointing only from one search engine variant node to one website node, each arrow represents a total visibility score
contributed by a search engine variant to a website, with its arrow width proportional to the visibility score: wider arrows indicate higher visibility scores. Similarly, the area of a node is proportional to the sum of visibility scores a website receives from all search engine variants, allowing easy comparison of which websites are more visible. Note that the visibility scores are distributed quite unevenly, and thus only the top 20 are marked with their respective ranking numbers.

User-generated encyclopedias are the most visible websites (node 1: Chinese Wikipedia, node 2: Baidu Baike, node 3: Hudong). In addition, Chinese Wikipedia (1) is highly visible in almost all variants except Yahoo_CN and Baidu_CN, while Baidu Baike (2) is highly visible in Baidu_CN, Google_CN and Google_SG, and moderately so in Google_HK.

Based on the previous clustering results, two red dashed lines are also drawn in Figure 4, roughly indicating three areas. Positioned in the middle are the search engine variants in Cluster B, because of their converging pattern of strong ties with most websites. The two red dashed lines also place the search engine variants in Cluster A to the left and those in Cluster C to the right, indicating diverging effects because of the presence of weak ties. This explains why Cluster A and Cluster C are shown adjacent to Cluster B, but not adjacent to each other. The visualization is thus consistent with the findings shown in Table 5.

The findings can also be unpacked for specific search engine variants. Based on the same method, an additional 500 Chinese names of the Fortune 500 companies are added to the selection of 2,500 search queries, producing a second dataset in 2012 (Liao, 2013a). The following paragraphs unpack this second dataset for two search engine variants in mainland China: Google_CN (see Table 6) and Baidu_CN (see Table 7).

The results for the top-20 websites in each category of search queries for Google_CN, as shown in Table 6, show that baidu.com ranks at the top in almost all categories. Wikipedia.org is a close second for Google_CN, supporting the general observation that search engines favour user-generated encyclopedias. These findings also provide some counter-evidence against the idea that Google, as a specific company, favours Wikipedia as a website, because Google_CN actually favours Baidu Baike more than Chinese Wikipedia, as clearly shown in Table 6.

The findings for Baidu_CN in Table 7 show even more dominance by Baidu Baike: baidu.com tops all seven categories, and its share of visibility scores is much more concentrated than in the results of Google_CN (see Table 6). In addition, when the ranking position of hudong.com is considered, the findings seem to confirm the accusation of unfair competition made by Hudong’s CEO against Baidu (Yang, 2011). Depending on the type of search queries, hudong.com is ranked by Google_CN from 3rd to 9th (see Table 6).
In contrast, Hudong Baike is not even among the top 20 for many categories of the sampled queries on Baidu Search.

Figure 4. Delineating the boundaries of geo-linguistic settings based on SERPs.
Node labels (ranking, aggregated website): 1 zh.wikipedia.org; 2 baike.baidu.com; 3 hudong.com; 4 baidu.com; 5 sina.com.cn; 6 knowledge.yahoo.com; 7 edu.tw; 8 qq.com; 9 youtube.com; 10 gov.cn; 11 sohu.com; 12 163.com; 13 facebook.com; 14 youku.com; 15 people.com.cn; 16 blog.sina.com.cn; 17 xinhuanet.com; 18 epochtimes.com; 19 ifeng.com; 20 baike.soso.com.

Indeed, if Google’s SERPs can serve as an independent third party for the competition between Baidu Baike
and Hudong, Google does not make Hudong almost invisible as Baidu does. Hence, if users from mainland China use Google Search instead of Baidu Search, Chinese Wikipedia becomes comparably visible to Baidu Baike for them.

5. DISCUSSION
By systematically analysing the SERPs collected across four major Chinese-speaking regions, this study shows that patterns of merging and diverging do exist. This is achieved by calculating visibility scores as the equivalent of “social ties” between search engine variants on one hand and top-ranking websites on the other. Both the network visualization and the blockmodeling outcomes show that geo-linguistic factors make Chinese-language SERPs diverge on certain websites while converging on others.

In particular, of the nine search engine variants, the first group that diverges from the rest contains the search engine variants designed for mainland China (Baidu_CN, Yahoo_CN and Google_CN). The second group contains the Yahoo Search for Taiwan and Hong Kong (Yahoo_HK and Yahoo_TW). The findings suggest that the major online boundary in the Chinese Internet is drawn first along the line of regional difference, with all mainland Chinese search engine settings sharing similar SERPs among themselves, but not with the others to the same degree, as shown in Figure 4. Another boundary is drawn for Yahoo Taiwan and Yahoo Hong Kong at the other end.

It is relatively easy to explain the latter result, because Yahoo Search by default prioritizes local content, with other geo-linguistic options listed for users in the web interface: e.g. “search the traditional Chinese-character-written web pages” or “search the global websites”. In contrast, it is relatively difficult to provide purely technical explanations of why all three mainland Chinese settings share so little with the other settings in terms of the corresponding SERPs.
It is likely that many of the websites that are absent from the SERPs in the three mainland Chinese settings are those that are not politically welcome in mainland China. Note that the first two columns in Table 4 represent Baidu_CN and Yahoo_CN, both of which consistently have weak ties with several of the top-100 websites. These two search engine variants are also the only two that filter SERPs for users in mainland China. Note also that the third column in Table 4 represents Google_CN. While it is clustered with Baidu_CN and Yahoo_CN, it has more strong ties with the top-100 websites, suggesting that its results are less divergent.

The findings seem to suggest that users from mainland China, if using only Baidu_CN and Yahoo_CN, will have a substantial number of otherwise highly visible websites overlooked or even missing from their daily search experiences. These include websites such as YouTube and Facebook, which have been reported as blocked in mainland China. They also include the websites of government and education institutions in Taiwan and Hong Kong: gov.tw, gov.hk, edu.tw and edu.hk.

Table 6 Results for Google_CN

Ranking | The Cambridge Encyclopedia of China | Top 10 Search Terms (Google and Baidu) | Best Film/Popular Music (China, Hong Kong, Taiwan) | Modern Concepts (shared with modern Japanese) | Notable People | Potentially sensitive terms | Fortune500
1  | baidu.com 47.65%     | baidu.com 25.08%     | baidu.com 36.44%     | baidu.com 37.28%     | wikipedia.org 28.98%      | baidu.com 27.89%     | mbalib.com 27.99%
2  | wikipedia.org 25.36% | wikipedia.org 12.94% | wikipedia.org 15.33% | wikipedia.org 24.13% | baidu.com 26.82%          | wikipedia.org 25.14% | baidu.com 16.67%
3  | hudong.com 8.74%     | sina.com.cn 12.06%   | sina.com.cn 9.46%    | hudong.com 11.00%    | hudong.com 7.66%          | hudong.com 9.63%     | fortunechina.com 13.65%
4  | sina.com.cn 2.58%    | qq.com 6.67%         | douban.com 5.00%     | mbalib.com 3.55%     | sina.com.cn 7.10%         | sina.com.cn 7.17%    | wikipedia.org 8.74%
5  | ifeng.com 2.03%      | 163.com 6.01%        | qq.com 4.45%         | sina.com.cn 3.18%    | xinhuanet.com 4.81%       | sohu.com 3.59%       | qq.com 4.09%
6  | artxun.com 1.33%     | sohu.com 5.86%       | hudong.com 3.60%     | people.com.cn 2.66%  | people.com.cn 4.03%       | people.com.cn 3.34%  | qkankan.com 3.79%
7  | soso.com 1.30%       | hudong.com 4.27%     | sohu.com 3.33%       | qq.com 2.64%         | qq.com 3.39%              | xinhuanet.com 2.95%  | sina.com.cn 3.62%
8  | zdic.net 1.13%       | youku.com 4.26%      | youku.com 3.14%      | hc360.com 1.61%      | ifeng.com 3.30%           | youku.com 2.74%      | ifeng.com 3.59%
9  | tiexue.net 1.07%     | xinhuanet.com 3.29%  | 163.com 3.09%        | sohu.com 1.50%       | 163.com 2.93%             | qq.com 2.45%         | hudong.com 3.41%
10 | cncn.com 1.06%       | ifeng.com 2.78%      | mtime.com 2.14%      | 163.com 1.46%        | sohu.com 2.30%            | iciba.com 1.94%      | gold678.com 3.20%
11 | xinhuanet.com 1.04%  | douban.com 2.47%     | youtube.com 1.77%    | hexun.com 1.44%      | weibo.com 1.71%           | 163.com 1.90%        | 163.com 2.43%
12 | artx.cn 1.03%        | people.com.cn 2.31%  | 1ting.com 1.63%      | ifeng.com 1.43%      | youtube.com 1.55%         | ifeng.com 1.72%      | ciipp.com 1.29%
13 | people.com.cn 0.96%  | hexun.com 1.85%      | weibo.com 1.58%      | studa.net 1.26%      | boxun.com 1.25%           | 360doc.com 1.50%     | sohu.com 1.15%
14 | youku.com 0.84%      | huanqiu.com 1.59%    | m1905.com 1.56%      | 3edu.net 1.05%       | hexun.com 0.78%           | youtube.com 1.45%    | egouz.com 1.12%
15 | 163.com 0.83%        | youtube.com 1.57%    | iqiyi.com 1.50%      | 39.net 1.04%         | renren.com 0.62%          | sogou.com 1.25%      | bitauto.com 1.11%
16 | sohu.com 0.73%       | yahoo.com 1.51%      | sogou.com 1.39%      | edu.cn 1.02%         | edu.tw 0.60%              | tianya.cn 1.12%      | people.com.cn 0.96%
17 | qq.com 0.63%         | gov.tw 1.45%         | tudou.com 1.35%      | jrj.com.cn 1.00%     | china.com.cn 0.58%        | laonanren.com 1.03%  | zol.com.cn 0.93%
18 | edu.tw 0.60%         | iqiyi.com 1.43%      | ifeng.com 1.27%      | chinaacc.com 0.97%   | libertytimes.com.tw 0.55% | hexun.com 0.89%      | hexun.com 0.88%
19 | edu.cn 0.54%         | weibo.com 1.32%      | xiami.com 1.07%      | xinhuanet.com 0.95%  | twitter.com 0.53%         | soso.com 0.81%       | yup.cn 0.72%
20 | 5156edu.com 0.54%    | tudou.com 1.27%      | pptv.com 0.91%       | youku.com 0.83%      | yahoo.com 0.52%           | cfdd.org.cn 0.76%    | google.cn 0.66%

Table 7 Results for Baidu_CN

Ranking | The Cambridge Encyclopedia of China | Top 10 Search Terms (Google and Baidu) | Best Film/Popular Music (China, Hong Kong, Taiwan) | Modern Concepts (shared with modern Japanese) | Notable People | Potentially sensitive terms | Fortune500
1  | baidu.com 75.74%      | baidu.com 64.17%     | baidu.com 73.28%     | baidu.com 81.56%     | baidu.com 57.53%     | baidu.com 69.54%     | baidu.com 61.90%
2  | wikipedia.org 6.20%   | youku.com 4.79%      | youku.com 6.66%      | wikipedia.org 2.41%  | wikipedia.org 7.48%  | wikipedia.org 5.30%  | mbalib.com 7.62%
3  | hudong.com 1.98%      | sina.com.cn 4.59%    | iqiyi.com 2.57%      | sina.com.cn 2.16%    | qq.com 6.12%         | sina.com.cn 3.38%    | fortunechina.com 7.13%
4  | sina.com.cn 1.94%     | qq.com 4.13%         | douban.com 2.30%     | qq.com 2.05%         | sina.com.cn 5.00%    | qq.com 3.23%         | sina.com.cn 3.20%
5  | youku.com 1.86%       | sohu.com 3.05%       | tudou.com 1.91%      | youku.com 1.59%      | ifeng.com 2.82%      | youku.com 2.17%      | ifeng.com 2.27%
6  | soso.com 1.64%        | iqiyi.com 2.73%      | sina.com.cn 1.65%    | xinhuanet.com 1.14%  | people.com.cn 2.52%  | sohu.com 1.73%       | fx678.com 1.91%
7  | qq.com 1.61%          | 163.com 2.32%        | weibo.com 1.61%      | www.gov.cn 1.10%     | sohu.com 2.46%       | xinhuanet.com 1.68%  | zol.com.cn 1.73%
8  | ifeng.com 1.18%       | tudou.com 1.91%      | qq.com 1.55%         | edu.cn 0.89%         | xinhuanet.com 2.31%  | 163.com 1.50%        | wikipedia.org 1.73%
9  | douban.com 1.13%      | xinhuanet.com 1.53%  | xunlei.com 1.48%     | ifeng.com 0.80%      | 163.com 1.84%        | tianya.cn 1.47%      | qq.com 1.60%
10 | tiexue.net 0.89%      | douban.com 1.28%     | mtime.com 1.07%      | sohu.com 0.78%       | soso.com 1.68%       | people.com.cn 1.40%  | bitauto.com 1.53%
11 | weather.com.cn 0.88%  | ifeng.com 1.24%      | letv.com 0.78%       | people.com.cn 0.74%  | weibo.com 1.52%      | hexun.com 1.27%      | 163.com 1.38%
12 | edu.cn 0.61%          | renren.com 1.19%     | m1905.com 0.78%      | douban.com 0.60%     | uname.cn 1.40%       | soso.com 1.17%       | qkankan.com 1.37%
13 | xilu.com 0.59%        | letv.com 1.14%       | 163.com 0.73%        | 163.com 0.59%        | renren.com 1.36%     | douban.com 1.04%     | gongchang.com 1.05%
14 | xinhuanet.com 0.58%   | weibo.com 0.97%      | verycd.com 0.68%     | rayli.com.cn 0.59%   | kaixin001.com 1.32%  | tudou.com 0.89%      | ticarefree.cn 1.04%
15 | 163.com 0.58%         | wikipedia.org 0.97%  | sohu.com 0.55%       | hao123.com 0.57%     | douban.com 0.97%     | bitauto.com 0.88%    | soso.com 0.86%
16 | guoxue.com 0.57%      | zol.com.cn 0.93%     | 1ting.com 0.53%      | jrj.com.cn 0.50%     | youku.com 0.85%      | ifeng.com 0.73%      | yingjiesheng.com 0.83%
17 | 360buy.com 0.52%      | xunlei.com 0.80%     | pptv.com 0.50%       | huanqiu.com 0.49%    | 360buy.com 0.78%     | sensagent.com 0.70%  | autohome.com.cn 0.74%
18 | qidian.com 0.51%      | taobao.com 0.80%     | ku6.com 0.48%        | iqiyi.com 0.48%      | www.gov.cn 0.73%     | hudong.com 0.66%     | xgo.com.cn 0.73%
19 | tudou.com 0.51%       | huanqiu.com 0.74%    | yinyuetai.com 0.48%  | bankcomm.com 0.47%   | edu.cn 0.73%         | yangbihu.com 0.65%   | eastmoney.com 0.70%
20 | sohu.com 0.50%        | 4399.com 0.71%       | wikipedia.org 0.42%  | chinaacc.com 0.46%   | hudong.com 0.58%     | tiexue.net 0.61%     | people.com.cn 0.68%

In other words, the
SERPs of the three mainland Chinese variants seem to diverge from these websites. In contrast, the websites of government and education institutions in mainland China, gov.cn and edu.cn, are still relatively visible in almost all other search engine variants except the by-default-local Yahoo_TW and Yahoo_HK. Thus, the patterns of merging and diverging seem to reflect the cultural and political complications of the Chinese-language internet. While the offline boundary between Hong Kong and Taiwan seems to be overcome, that between mainland China and Hong Kong seems to be reinforced. Although the SERP data may not perfectly reflect what users actually read and click, it nonetheless indicates a general probabilistic tendency substantiated by industry data.

6. CONCLUSION
The findings, visualized and analysed using network analysis techniques, clearly indicate strong localization effects on the gatekeeping function of search engines, based on data covering over 97% of the search engine market in four Chinese-speaking regions. The findings also show that major user-generated encyclopedias such as Baidu Baike and Chinese Wikipedia dominate the SERPs with high rankings and visibility scores. Because the geo-linguistic factors coincide with the different cultural and political situations of these Chinese-speaking regions, different localization variants produce divergent outcomes of high-ranking encyclopedias and other websites, thereby indicating strong effects of “network gatekeeping” by search engines in exercising the gatekeeping bases of “display” and “localization” (Barzilai-Nahon, 2008).

In addition, by examining the overall patterns of SERPs, I have demonstrated the merging and diverging effects contributed by the factors of search engine provider and regional and language settings. Different combinations of such provider and geo-linguistic information lead to different “search engine variants”.
Nine major search engine variants, covering four regions with Chinese-speaking majority populations, are identified for the Chinese-language internet. For a selected set of search queries covering major Chinese cultural and political topics, I have found that the SERPs converge on a specific type of website (i.e. user-generated encyclopedias) and that some search engine variants converge more on Baidu Baike while others converge more on Chinese Wikipedia. The merging and diverging patterns are further analysed by both network visualization and network analysis (blockmodeling analysis of two-mode networks). The different patterns indicate that both “nationalization” of a specific kind (i.e. mainland China) and “trans-nationalization” (i.e. Hong Kong and Taiwan) can be achieved by the different gatekeeping options offered by various search engine variants.

The results show that the SERPs are more likely to converge when geo-linguistic preferences are similar. For example, the SERPs diverge the most when users choose different Chinese characters (i.e. simplified Chinese versus traditional Chinese). It is then particularly intriguing that all Hong Kong variant results converge more with the Taiwanese variant results and much less with the mainland Chinese ones, even though Hong Kong is much closer to mainland China geographically, politically and administratively. In addition, Chinese Wikipedia is much more visible in these regions than in mainland China. Though the findings here cannot separate the geo-linguistic factors from the cultural and political ones, the converging and diverging patterns alone are important findings for Chinese-internet research and Wikipedia research.

There are of course obvious limitations to the findings presented above. First, the selection of search queries, while significantly larger than in previous social scientific research on Chinese-language search engines (Jiang & Akhtar, 2011), is still limited.
Second, due to space limitations, this paper has not yet fully unpacked the different findings for different categories of search queries. Third, only standard Mandarin Chinese terms are used for this research, overlooking other possibilities such as written Cantonese queries (Chau, Fang, & Yang, 2007). Fourth, and not least, only the default setting for each localized search engine is analysed.

While the dataset presented may be limited in the scope of selected search queries, time and search engine variants, I have demonstrated the usefulness and viability of examining the merging and diverging patterns produced by the search engine variants, each of which corresponds to a segment of the search engine market. For instance, the method can help online linguistics research by analysing different SERP outcomes for regions that use a shared writing system with regional variants, such as the difference between Egyptian Arabic and Maghrebi Arabic. As another example, these geo-linguistic factors can be said to constitute one of the most important online “situations” for online media, as defined by medium theorists in the tradition of media ecology (Meyrowitz, 1986, 1994), because these factors set the patterns of access.

According to a statistical report by the Data Center of China Internet, during the first half of 2010 the content produced by amateur Chinese Internet users surpassed that produced by professional websites (Liao, 2013b; Qiang, 2010). Thus user-generated content by Chinese Internet users is expected to have influenced user-generated encyclopedias directly and SERPs indirectly. While this study has not yet addressed the relationship among search engines, user-generated content and user-generated encyclopedias, the findings here seem to suggest similar geographic and linguistic dynamics.
The clear outcome of “network gatekeeping”, identified by Chinese search engine variants and their respective preferred encyclopedias, may point to a larger online context for Chinese Internet users across regions. For future research, it will be useful to examine how geographic and linguistic factors may influence the network gatekeeping processes inside user-generated encyclopedias (Liao, 2009). It is likely that they also exercise the gatekeeping bases of “display” and “localization” as search engines do.
The overall method can be systematically extended to other contexts. Search engine variants can be chosen for research on almost all the other languages in the world, including languages with transnational adoption such as Arabic, Hindi, Tamil, English, Spanish and Portuguese. Researchers can thus further interpret the merging and diverging SERP outcomes for research questions that are relevant to global, transnational or inter-cultural communication on the one hand, and for questions about human-computer interaction and information systems on the other. Also, the focus on examining geo-linguistic factors as important variables for understanding search engines can contribute to the development of geo-linguistic analysis of the Web (Liao & Petzold, 2011; Petzold & Liao, 2011). It can also be adopted for market and industry applications when geo-linguistic identifiers are central (DePalma, 2002; Dunne, 2006).
In conclusion, the proposed method has the potential for a wider range of market and academic applications. The theoretical implication may be extended to other websites or information systems that produce or curate different outcomes based on the geographic and linguistic preferences (or configurations) of users. It highlights the role of geo-linguistic parameters as media “access codes”, or set patterns of access to information as articulated by medium theorists for TV research (Meyrowitz, 1986, 1994), or the
“network gatekeeping process” theorized by new information science theory (Barzilai-Nahon, 2008). Localization has become the new medium that has the (higher-level) messages of cultural political integration, reintegration or fragmentation of users.
7. ACKNOWLEDGMENTS
This work was supported by the Taiwan National Science Council’s Taiwan Merit Scholarships Program (NSC-095-SAF-I-564-028-TMS) and supported in part by the Oxford Internet Institute Scholarship. Special thanks to Ralph Schroeder, Bernie Hogan, Scott Hales and Min Jiang for their advice and support.
8. REFERENCES
Aragón, P., Kaltenbrunner, A., Laniado, D., & Volkovich, Y. (2012). Biographical Social Networks on Wikipedia - A cross-cultural study of links that made history. In Proceedings of WikiSym 2012. Retrieved from http://arxiv.org/abs/1204.3799
Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., & Gergle, D. (2012). Omnipedia: bridging the Wikipedia language gap. In Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems (pp. 1075–1084). Retrieved from http://dl.acm.org/citation.cfm?id=2208553
Bar-Ilan, J. (2006). Web links and search engine ranking: The case of Google and the query “jew.” Journal of the American Society for Information Science and Technology, 57(12), 1581–1589. doi:10.1002/asi.20404
Barzilai-Nahon, K. (2008). Toward a theory of network gatekeeping: A framework for exploring information control. Journal of the American Society for Information Science and Technology, 59(9), 1493–1512. doi:10.1002/asi.20857
Battelle, J. (2005). The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture (First Edition). Portfolio Hardcover.
BBC. (2011, March 31). Google’s China exit “exaggerated.” BBC. Retrieved from http://www.bbc.co.uk/news/business-12917322
Benkler, Y. (2006). The Wealth of Networks: How Social Production Transforms Markets and Freedom. New Haven and London: Yale University Press. Retrieved from http://www.congo-education.net/wealth-of-networks/
Bermejo, F. (2009). Audience manufacture in historical perspective: from broadcasting to Google. New Media & Society, 11(1-2), 133–154. doi:10.1177/1461444808099579
Brettel, M., & Spilker-Attig, A. (2010). Online advertising effectiveness: a cross-cultural comparison. Journal of Research in Interactive Marketing, 4(3), 176–196. doi:10.1108/17505931011070569
Charlton, G. (2012, February 13). Why Wikipedia is top on Google: the SEO truth no-one wants to hear. Econsultancy: Digital Marketers United. Retrieved from http://econsultancy.com/blog/9009-why-wikipedia-is-top-on-google-the-seo-truth-no-one-wants-to-hear?utm_campaign=bloglikes&utm_medium=socialnetwork&utm_source=facebook
Chau, M., Fang, X., & Yang, C. C. (2007). Web searching in Chinese: A study of a search engine in Hong Kong. Journal of the American Society for Information Science and Technology, 58(7), 1044–1054. doi:10.1002/asi.20592
Chen, J. (2008). Essays on auction mechanisms and resource allocation in keyword advertising (The University of Texas at Austin). ProQuest.
CIC. (2009). China Search Engine Market Report 2009. Beijing, China: China IntelliConsulting Corporation. Retrieved from http://tech.sina.com.cn/z/2009ssdc/index.shtml
CNNIC. (2006, September 16). Chinese Search Engine Market Survey Report 2006. China Internet Network Information Center. Retrieved November 19, 2011, from http://xtlv.cn/html/Dir/2006/11/06/4216.htm
CNNIC. (2007, September 26). 2007 Survey Report on Search Engine Market in China. China Internet Network Information Center. Retrieved November 19, 2011, from http://www.cnnic.cn/html/Dir/2007/10/10/4838.htm
CNNIC. (2009, March 5). China Search Engine Report 2008 Advertisers and Users Behavior Study. (中国搜索引擎市场广告主与用户行为研究报告). Retrieved November 19, 2011, from http://www.cnnic.cn/html/Dir/2009/03/05/5483.htm
Couvering, E. V. (2004). New Media? The Political Economy of Internet Search Engines. Presented at the International Association of Media & Communications Researchers, Porto Alegre, Brazil. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.1900
Couvering, E. V. (2008). The History of the Internet Search Engine: Navigational Media and the Traffic Commodity. In A. Spink & M. Zimmer (Eds.), Web Search (Vol. 14, pp. 177–206). Berlin, Heidelberg: Springer Berlin Heidelberg. Retrieved from http://www.springerlink.com/content/xn75781g305j756h/
Čuhalev, J. (2006). Ranking of Wikipedia articles on search engines for searches about its own articles (Seminar Task for Internet Search Techniques and Business Intelligence class) (p. 7). Retrieved from http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/
Dahlberg, L. (2005). The Corporate Colonization of Online Attention and the Marginalization of Critical Communication? Journal of Communication Inquiry, 29(2), 160–180. doi:10.1177/0196859904272745
Damm, J. (2007). The Internet and the fragmentation of Chinese society. Critical Asian Studies, 39, 273–294. doi:10.1080/14672710701339485
DePalma, D. A. (2002). Internationalization and Localization. In Business without borders: a strategic guide to global marketing. New York: John Wiley and Sons.
Doreian, P., Batagelj, V., & Ferligoj, A. (2004). Generalized blockmodeling of two-mode network data. Social Networks, 26(1), 29–53. doi:10.1016/j.socnet.2004.01.002
Dunleavy, P., Margetts, H., Bastow, S., Pearce, O., & Tinkler, J. (2007). Government on the internet: progress in delivering information and services online. UK: National Audit Office. Retrieved from http://www.nao.org.uk/publications/nao_reports/06-07/0607529.pdf
Dunne, K. J. (2006). Perspectives on Localization. John Benjamins Publishing Company.
Dutta, S., Dutton, W. H., & Law, G. (2011). The New Internet World: A Global Perspective on Freedom of Expression, Privacy, Trust and Security Online. SSRN eLibrary. Retrieved from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1810005
Einhorn, B. S., Bruce. (2010, November 11). How Baidu Won China. BusinessWeek: Online Magazine. Retrieved from http://www.businessweek.com/magazine/content/10_47/b4204060242597_page_6.htm
Enquiro. (2007, June 15). Chinese Eye Tracking Study: Baidu Vs Google. Retrieved July 9, 2009, from http://searchengineland.com/chinese-eye-tracking-study-baidu-vs-google-11477
Fallows, D. (2008). Search Engine Use. Pew Research Center’s Internet & American Life Project. Retrieved November 19, 2011, from http://www.pewinternet.org/Reports/2008/Search-Engine-Use.aspx
Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Commun. ACM, 35(12), 61–70. doi:10.1145/138859.138867
Gray, M. (2007, May). Google Love Affair with Wikipedia - Graywolf’s SEO Blog. Graywolf’s SEO Blog. Retrieved December 2, 2011, from http://www.wolf-howl.com/google/google-love-affair-with-wikipedia/
Hargittai, E. (2007). The Social, Political, Economic, and Cultural Dimensions of Search Engines: An Introduction. Journal of Computer-Mediated Communication, 12(3), 769–777.
Hearne, R. (2006, August 12). SERP Click Through Rate of Google Search Results – AOL-data.tgz – Want to Know How Many Clicks The #1 Google Position Gets? Red Cardinal. Retrieved December 2, 2011, from http://www.redcardinal.ie/search-engine-optimisation/12-08-2006/clickthrough-analysis-of-aol-datatgz/
Hecht, B., & Gergle, D. (2010). The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context. In Proceedings of the 28th international conference on Human factors in computing systems (pp. 291–300). Retrieved from http://dl.acm.org/citation.cfm?id=1753370
Hopkins, H. (2009, January 23). Britannica 2.0: Wikipedia Gets 97% of Encyclopedia Visits. Hitwise Intelligence: Analyst Weblog. Retrieved from http://weblogs.hitwise.com/us-heather-hopkins/2009/01/britannica_20_wikipedia_gets_9.html
Hussain, S., & Mohan, R. (2008). Localization in Asia Pacific. In F. Librero & P. B. Arinto (Eds.), Digital Review of Asia Pacific 2007/2008. Orbicom and the International Development Research Centre (IDRC). Retrieved from http://www.idrc.ca/openebooks/377-5/
IDATE. (2011). World Internet Usage & Markets. IDATE Consulting and Research. Retrieved from http://www.idate.org/en/Research-store/Collection/Market-Data-Reports_23/World-Internet-Usage-Markets_584.html
Jansen, B. J., Brown, A., & Resnick, M. (2007). Factors relating to the decision to click on a sponsored link. Decision Support Systems, 44(1), 46–59. doi:10.1016/j.dss.2007.02.009
Jansen, B. J., & Mullen, T. (2008). Sponsored search: an overview of the concept, history, and technology. Int. J. Electronic Business, 6(2), 114–131.
Jansen, J. (2011). Understanding Sponsored Search: Core Elements of Keyword Advertising. Cambridge University Press.
Jiang, M., & Akhtar, A. (2011). Peer into the Black Box of Chinese Search Engines: A Comparative Study of Baidu, Google, and Goso. Presented at the 9th Chinese Internet Research Conference (CIRC 2011), Washington, D.C.: Institute for the Study of Diplomacy. Georgetown University.
Jones, R. (2007, June 26). 96.6% of Wikipedia Pages Rank in Google’s Top 10. The Google Cache: Search Engine Marketing, SEO & PPC. Retrieved December 2, 2011, from http://www.thegooglecache.com/white-hat-seo/966-of-wikipedia-pages-rank-in-googles-top-10/
Jucquois-Delpierre, M. (2007). Fictional reality or real fiction: how can one decide?: The strengths and weaknesses of information science concepts and methods in the media world. Journal of Information, Communication & Ethics in Society, 5(2/3), 235–252. doi:10.1080/14616700306488
Jung, G. (2008). The Increasing Relevance of Online Marketing. GRIN Verlag.
Khanna, A. (2011, October 26). Google drives traffic to Wikipedia, but half of readers look for Wikipedia content - Wikimedia blog. Wikimedia Foundation: Global blog. Official blog. Retrieved from http://blog.wikimedia.org/2011/10/26/search-and-wikipedia/
Liao, H.-T. (2008). A webometric comparison of Chinese Wikipedia and Baidu Baike and its implications for understanding the Chinese-speaking Internet. In 9th annual Internet Research Conference: Rethinking Community, Rethinking Place. Copenhagen.
Liao, H.-T. (2009). Conflict and Consensus in the Chinese version of Wikipedia. IEEE Technology and Society Magazine, 28(2), 49–56. doi:10.1109/MTS.2009.932799
Liao, H.-T. (2011). Needing to Have a Voice: Linguistic Grouping in the Digital Networked Environment (ISD Working Papers in New Diplomacy). Washington, D.C.: Institute for the Study of Diplomacy. Georgetown University. Retrieved from http://isd.georgetown.edu/files/Needing%20to%20Have%20a%20Voice.pdf
Liao, H.-T. (2013a). How does Chinese localization influence online visibility? A study on Chinese-language Search Engine Result Pages (SERPs). (Accepted). To be presented at the 11th Annual Chinese Internet Research Conference (CIRC 2013), Oxford, UK.
Liao, H.-T. (2013b). “Online Encyclopedia” (网上/网络百科全书), “User Generated Content” (用户生成内容). In L. Cheng (Ed.), The Internet in China: An Encyclopedic Handbook of Online Business, Information Distribution, and Social Connectivity. Berkshire Publishing.
Liao, H.-T., & Petzold, T. (2011). Analysing geo-linguistic dynamics of the World Wide Web: The use of cartograms and network analysis to understand linguistic development in Wikipedia. Cultural Science, 3(2).
Luyt, B., Goh, D., & Lee, C. S. (2009). Searching locally: a comparison of Yehey! and Google. Online Information Review, 33(3), 499–510.
Malaga, R. A. (2008). Worst practices in search engine optimization. Commun. ACM, 51(12), 147–150. doi:10.1145/1409360.1409388
Margetts, H. Z., & Escher, T. (2006). Governing from the Centre? Comparing the Nodality of Digital Governments. SSRN eLibrary. Retrieved from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1755762
Massa, P., & Scrinzi, F. (2012). Manypedia: Comparing Language Points of View of Wikipedia Communities. In Proceedings of WikiSym 2012. Retrieved from http://orga.wikisym.org/ws2012/bin/download/Main/Program/p13wikisym2012.pdf
Mazieres, A., & Huron, S. (2013). Toward Google Borders. Presented at the Web Science. Retrieved from http://hal.inria.fr/hal-00805048
McKenna, M. G., & Naftulin, H. (2000). Challenges in the multicultural HCI development environment. In CHI ’00 extended abstracts on Human factors in computing systems (pp. 362–362). New York, NY, USA: ACM. doi:10.1145/633292.633509
Meyrowitz, J. (1986). No sense of place: the impact of electronic media on social behavior. New York; Oxford: Oxford University Press.
Meyrowitz, J. (1994). Medium theory. In D. Crowley & D. Mitchell (Eds.), Communication Theory Today. Stanford University Press.
Morris, M., & Ogan, C. (2002). The Internet as Mass Medium. In D. McQuail (Ed.), McQuail’s reader in mass communication theory (pp. 134–145). London: SAGE.
Nguyen, C. (2011, March). Search Engine Market share by country. Chandler Nguyen Digital Marketing Blog. Retrieved December 1, 2011, from http://www.chandlernguyen.com/2011/03/search-engine-market-share-by-country-mar-2011.html
Nielsen Online. (2008). Wikipedia U.S. Web Traffic Grows 8,000 Percent In Five Years, Driven By Search. New York: Nielsen Online. Retrieved from http://news.softpedia.com/news/Wikipedia-Traffic-Mostly-from-Google-85703.shtml
Petzold, T., & Liao, H.-T. (2011). Geo-linguistic analysis of the World Wide Web: The use of cartograms and network analysis to understand linguistic development in Wikipedia. In D. Araya, Y. Breindl, & T. J. Houghton (Eds.), Nexus: New Intersections in Internet Research (pp. 55–75). New York: Peter Lang.
Petzold, T., Liao, H.-T., Hartley, J., & Potts, J. (2012). A world map of knowledge in the making: Wikipedia’s inter-language linkage as a dependency explorer of global knowledge accumulation. Leonardo: Art, Science and Technology, 45(3), 284–284. doi:10.1162/LEON_a_00376
PricewaterhouseCoopers. (2011). IAB Internet Advertising Revenue Report. New York; DC: The Interactive Advertising Bureau. Retrieved from http://www.iab.net/AdRevenueReport
Qiang, X. (2010, July 23). User-generated content online now 50.7% of total. China Daily. Beijing. Retrieved from http://www.chinadaily.com.cn/business/2010-07/23/content_11042851.htm
Rogers, R., & Sendijarevic, E. (2012). Neutral or National Point of View? A Comparison of Srebrenica articles across Wikipedia’s language versions. In Wikipedia Academy: Research and Free Knowledge (#wpac2012). Berlin. Retrieved from http://wikipedia-academy.de/2012/w/images/8/89/3_Paper_Richard_Rogers_Emina_Sendijarevic.pdf
Russell, J. (2011). Why Yahoo! –not Google– rules Taiwan’s webspace. Asian Correspondent. Retrieved December 1, 2011, from http://asiancorrespondent.com/55695/focus-on-taiwan-where-yahoo-not-google-rules-the-countrys-webspace/
Segev, E. (2008). Search Engines and Power: A Politics of Online (Mis-) Information. Retrieved November 19, 2011, from http://www.webology.org/2008/v5n2/a54.html
SEMPO. (2011). SEMPO State of Search Marketing Report 2011. SEMPO Institute. Retrieved from http://econsultancy.com/uk/reports/sempo-state-of-search
Silverwood-Cope, S. (2012, February 8). Wikipedia: Page one of Google UK for 99% of searches. Intelligent Positioning Blog. Retrieved from http://www.intelligentpositioning.com/blog/2012/02/wikipedia-page-one-of-google-uk-for-99-of-searches/
Slingshot SEO. (2011). Google & Bing Click-Through Rates (White paper). Retrieved from http://www.slingshotseo.com/resources/white-papers/google-ctr-study/
Spindler, S. (2010). Online Marketing: How to Increase International Sales with Search Engine Optimisation. GRIN Verlag.
StatCounter. (2011). Top 5 Search Engines in China/Hong Kong/Singapore/Taiwan from Nov 2010 to Nov 2011. StatCounter Global Stats. Retrieved December 1, 2011, from http://gs.statcounter.com/#search_engine-CN-monthly-201011-201111
Sunstein, C. R. (2002). Fragmentation and Cybercascades. In Republic.Com. Princeton University Press.
The Cambridge encyclopedia of China. (1991) (2nd ed.). Cambridge [England]; New York: Cambridge University Press.
Goldsmith, J., & Wu, T. (2006). Who Controls the Internet? Illusions of a Borderless World. Oxford University Press.
Varian, H. R. (2007). The Economics of Internet Search. Presented at the Angelo Costa lecture, Rome. Retrieved from http://people.ischool.berkeley.edu/~hal/Papers/2007/costa-lecture.pdf
Vaughan, L., & Thelwall, M. (2004). Search engine coverage bias: evidence and possible causes. Information Processing & Management, 40(4), 693–707.
Vaughan, L., & Zhang, Y. (2007). Equal Representation by Search Engines? A Comparison of Websites across Countries and Domains. Journal of Computer-Mediated Communication, 12(3). Retrieved from http://jcmc.indiana.edu/vol12/issue3/vaughan.html
Warncke-Wang, M., Uduwage, A., Dong, Z., & Riedl, J. (2012). In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-language Link Network. Retrieved from http://www.grouplens.org/system/files/p3wikisym2012.pdf
Yang, Y. (2011, February 25). China’s “Wikipedia” Submits Complaint about Baidu. Economic Observer News, 508, 28.
Young, R. D. (2011, August 10). Top Google Ranking Captures 18.2% of Clicks. Search Engine Watch (#SEW). Retrieved December 2, 2011, from http://searchenginewatch.com/article/2100616/Top-Google-Ranking-Captures-18.2-of-Clicks-Study
Zhao, S., & Baldauf, R. B. J. (2007). Planning Chinese Characters: Reaction, Evolution or Revolution? Springer.