Risks of search engine dependency and
        its influence on data quality



  A thesis submitted for the European Master in Business Studies (EMBS)

                      by Ronan CHARDONNEAU



     Institut de Management de l'Université de Savoie d'Annecy (FR)

                   Università degli studi di Trento (IT)

                        Universität Kassel (GER)

                        Universidad de León (SP)
              Date of submission: 26 June 2009


                           Master Thesis
Acknowledgements

Sincere and grateful acknowledgements are due to:

       Mr Francesco VALOTTO, co-founder of Edexon (Italy), who gave me the
idea of studying the world of search engine optimization, which eventually led to the
present topic.

       Mr Roland ZIMMERMANN, from the University of Kassel (Germany), for
his help in structuring the thesis and for proofreading it.

       Mr Eugenio POZZOLINI, from the Advancia Business School (France), for
all his advice.

       Mr Andrea MOLINARI, from the University of Trento (Italy), who agreed
to supervise the thesis.

       Mr Charles KNIGHT, editor at AltSearchEngines (United States), for
promoting the thesis on his website.

       Mr Daniel Arias-Aranda, associate professor at the Universidad de Granada
(Spain), for his proofreading and feedback.

       Mr Charles NODOT, first-year student (France) in the European Master
in Business Studies, for his proofreading, feedback and corrections.




Contents


Acknowledgements

Contents

Table of figures

Foreword

Chapter 1: Introduction of the topic background
   1.1 Relevance of the subject
   1.2 Major terms
   1.3 Focus, goals and structure of the report
   1.4 Chapter 1: Key points

Chapter 2: Concept of data quality
   2.1 Data quality definition
   2.2 Data quality issues within businesses
   2.3 Origins of data quality issues: Garbage In Garbage Out
       2.3.1 Poor data quality content: the Wikipedia example
       2.3.2 Poor data quality content: the commercial example
       2.3.3 Metadata
       2.3.4 Findability
   2.4 Data quality solutions
       2.4.1 Learning how to use search tools
       2.4.2 Check out the information: the Triangle method
   2.5 Chapter 2: Key points

Chapter 3: Search engines dependency
   3.1 Search engine categories
       3.1.1 Commercial search engines
       3.1.2 Enterprise search engine (ESE)
   3.2 Search engine market
       3.2.1 Commercial search engine market
       3.2.2 Commercial search engine market: Consumer behavior
       3.2.3 Enterprise Search Engine market
       3.2.4 Enterprise Search Engine market: Consumer behavior
       3.2.5 The commercial search market repartition
       3.2.6 The commercial search engines in the world
       3.2.7 Commercial search engine leaders presentation
       3.2.8 Commercial search engine complexity
       3.2.9 Search engine market shares configuration
       3.2.10 Search engines competition
   3.3 Search engine dependency aspect
       3.3.1 Search engines dependency proves
       3.3.2 Types of search engines dependency
       3.3.3 Search engine loyalty
       3.3.4 Search engines dependency issues
       3.3.5 Privacy issues
       3.3.6 Search engine awareness
   3.4 Search engine dependency conclusion
   3.5 Chapter 3: Key points

Chapter 4: Risks of search engines dependency and its influence on data quality
   4.1 Search engine dependency and its influence on data quality: Issues
       4.1.1 Search Engine Optimization
       4.1.2 Commercial advertisement and perception
       4.1.3 Censorship
       4.1.4 Technological partnerships
       4.1.5 The Visible Web
       4.1.6 Invisible Web
   4.2 Search engine dependency and its influence on data quality: Solutions
       4.2.1 A deeper knowledge in search engine abilities
       4.2.2 Taking the best part of each search engine
       4.2.3 Technological evolution
   4.3 The future of Internet search
   4.5 Chapter 4: Key points

Chapter 5: The Google example
   5.1 Google presentation
       5.1.1 Google
       5.1.2 Google's success
       5.1.3 Google image
       5.1.4 Google dependency state
       5.1.5 Google added functionalities
       5.1.6 Google success is his weakness
   5.2 Google's disappearance consequences
       5.2.1 Google Search engine failure
       5.2.2 Google Gmail failure
       5.2.3 Google other services failure
       5.2.4 Google collateral damages
   5.3 Chapter 5: Key points

Conclusion and recommendations

List of literature




Table of figures

Figure 1: Internet Domain Survey Host Count January 1994 - January 2009
Figure 2: Do you use a personal blog?
Figure 3: How frequently do Internet users participate in the most popular activities?
Figure 4: Most used information source when people need help
Figure 5: How much of the information on the World Wide Web overall is generally reliable?
Figure 6: Enterprise findability goal
Figure 7: 1st and 2nd results for "data quality" are Wikipedia websites
Figure 8: A query made on Google images with the keyword "P5170009"
Figure 9: How well is findability understood in your organization?
Figure 10: How critical is findability to your Organization's Business Goals and Success?
Figure 11: Triangle method application
Figure 12: Ask search engine home page
Figure 13: Yahoo home page
Figure 14: An example of vertical search with Yahoo Images
Figure 15: A semantic search engine: Wolfram Alpha
Figure 16: Top 10 Worldwide Search December 2007
Figure 17: How Much Of The Information On the Internet Do You Think is Reliable and Accurate?
Figure 18: Enterprise search satisfaction
Figure 19: Influence of the consumer web on enterprise search tools
Figure 20: Success rate of finding the information with enterprise search tools
Figure 21: Worldwide Search by Region
Figure 22: Search engine leaders (>50%) per country personal estimation
Figure 23: Search engine market in the USA, source: Hitwise, February 2009
Figure 24: Google logo
Figure 25: Yahoo logo
Figure 26: Japanese search engine market, source: webcreate.ga-pro.com, May 2009
Figure 27: Chinese search engine market, source: China IntelliConsulting Corp., September 2008
Figure 28: Baidu logo
Figure 29: Bing logo
Figure 30: Korean search engine market, source: Koreanclick, July 2007
Figure 31: Naver logo
Figure 32: Yandex logo
Figure 33: Search engine market in Russia, source: LiveInternet.ru, December 31, 2008
Figure 34: Seznam logo
Figure 35: Search engine market in Czech Rep, source: navrcholu.cz, June 2008
Figure 36: Search engine market in Iceland, source: statice.is 2007
Figure 37: Leit.is logo
Figure 38: An example of a customized interface on iGoogle
Figure 39: Search engine market shares in 2007 for the Czech Republic
Figure 40: Google market shares in Europe in 2008, source: Comscore
Figure 41: Use of search engines in 2004 and 2005
Figure 42: Search engine dependency relevancy
Figure 43: Search users blame themselves not the technology
Figure 44: Search engine syntax examples
Figure 45: Use of advanced search functionalities in Canada
Figure 46: Do users know how to use Boolean operators?
Figure 47: Use of meta search engines
Figure 48: Use of specialized search engines
Figure 49: U.S. Advertising Market - Media Comparison – 2008 ($ Billions)
Figure 50: Internet Ad Revenues by Advertising Format - 2008 Annual Results
Figure 51: Search engine user behavior regarding results pages in the USA
Figure 52: An eye tracking study on several search engines
Figure 53: Differences between organic and sponsored results
Figure 54: Type of Search Result Selected
Figure 55: Results relevancy according to users by search engine in 2004
Figure 56: Attitudes towards search engines in India
Figure 57: Powered by Google logo
Figure 58: Powered by Yahoo logo
Figure 59: Estimation of the indexable web per search engine
Figure 60: Distribution of Public Web Sites By Country in 2002
Figure 61: Percentage of Web Sites Covered by Google in 2002
Figure 62: Google vertical search engines
Figure 63: Search engine search within website content comparison
Figure 64: Future of web 2.0
Figure 65: Search engines are not the Internet
Figure 66: Time and knowledge lag
Figure 67: Delicious bookmarks search
Figure 68: Home page of the Similicious website
Figure 69: Twitter real time information search engine
Figure 70: Kartoo search results presentation
Figure 71: 2008 Web trend map
Figure 72: 2007 Web trend map
Figure 73: Significant age-related differences in article discovery methods
Figure 74: Google domination in Europe
Figure 75: Google domination in Latin America
Figure 76: Google coverage representation of the visible web
Figure 77: Google search failure
Figure 78: Google bug analysis on January the 31st 2009
Figure 79: Google evolution traffic during the bug on January the 31st 2009
Figure 80: Google Gmail failure
Figure 81: Main use of Internet




Foreword

        A general trend of the early 21st century has been the use of the Internet
instead of TV as an information provider¹.

        There are today 1,596,270,108 Internet users in the world², and most
of them already have their habits: checking their e-mail, doing research,
finding information about goods and services, chatting online and reading the news³.

        Most of these activities can be carried out through a single
information exchange provider: the search engine.

        According to the main Internet traffic measurement services, search engines
are by far the most visited websites⁴.

        The main search engines now provide all kinds of services,
making the Internet very comfortable to use.

        However, using a single search engine every day conditions people to
process information in a certain way.

        Habits acquired at home may unfortunately carry over to work, and
vice versa.

        It is certainly comfortable to have a standard when dealing with computers.
For example, Microsoft Windows is the leading operating system on computers, with more than
90% of the market⁵. But is Microsoft the computer? The same question arises with
search engines: are they the Internet?




¹ Cogar, P. (ed.) (2007). TV vs. the Internet: Internet wins. [online]. Available from: http://www.bit-tech.net/news/2007/08/23/tv_vs_the_internet_internet_wins/1 [Accessed 17 June 2009]
² Internet World Stats. (2009). World Internet Usage Statistics News and World Population Stats. [online]. Available from: http://www.internetworldstats.com/stats.htm [Accessed 17 June 2009]
³ Malaysian Communications and Multimedia Commission. (2005). Household use of the Internet survey 2005. [online]. Available from: http://www.skmm.gov.my/facts_figures/stats/pdf/Household_use_internet_survey2005.pdf [Accessed 17 June 2009]
⁴ Alexa Web. (n.d). Alexa Top 500 Global Sites. [online]. Available from: http://www.alexa.com/topsites [Accessed 17 June 2009]
  Netcraft. (n.d). Most visited websites. [online]. Available from: http://toolbar.netcraft.com/stats/topsites [Accessed 17 June 2009]
⁵ One Stat. (2007). OneStat Website Statistics and website metrics - Press Room. [online]. Available from: http://www.onestat.com/html/aboutus_pressbox54-windows-vista-global-usage-share.html [Accessed 20 June 2009]
        "Risks of search engine dependency and its influence on data quality" was
written to understand the potential risks that search engine addiction poses
to businesses.

        Search engines such as Google are used by virtually all Internet users. According to
studies, Internet users are confident in, satisfied with and trusting of search engines⁶.
Unfortunately, the same studies show that users are also unaware and naïve.

        Search engines are designed to find information on the Internet.
Since information is the basis of any good decision, it is both important
and interesting for businesses to understand the consequences of
using them.




                                                                 Ronan CHARDONNEAU




⁶ Fallows, D. (2005). Search Engine users. [online]. Available from: http://www.pewinternet.org/~/media//Files/Reports/2005/PIP_Searchengine_users.pdf.pdf [Accessed 17 June 2009]
Chapter 1: Introduction of the topic
            background




           The Internet was created to share information and to communicate with
one another.

           It is hard to evaluate how big the Internet is: estimates vary widely
among companies, from 15 to some 30 billion Web pages⁷.

           The number of Internet hosts, used here as an indication of the number of
websites, is increasing every day and was estimated at more than 600,000,000⁸ for 2009,
with constant growth since the creation of the World Wide Web.




            Figure 1: Internet Domain Survey Host Count January 1994 - January 2009

           Websites are now used in diverse ways: they have become a standard for
companies (extension of their business activity, new advertising opportunities)
and are also a space for many individuals (the blog phenomenon).




⁷ Cf. Koch, P. / Koch, S. (2009). How big is the Internet?. [online]. Available from: http://www.pandia.com/sew/383-websize.html [Accessed 19 January 2009]
⁸ Internet Systems Consortium. (2009). The ISC Domain Survey Internet Systems Consortium. [online]. Available from: https://www.isc.org/solutions/survey [Accessed 17 June 2009]
Figure 2: Do you use a personal blog?⁹

        A study conducted in 29 countries shows that almost 25% of Internet users
under 34 years old use a blog, and this trend has been growing since 2003.

        The popularization of the Internet and the fact that anyone can create a
website for free have drastically increased the amount of content. The explosion of social
networks (Facebook, Hi5…), blogs (Wordpress, Blogger, Myspace…) and
microblogging (Twitter) is changing the nature and fabric of the World Wide Web:
from an Internet built by a few thousand individuals, we have moved to one made by
millions.¹⁰

        If we take into account that searching is, after e-mail, the most common
activity on the Internet¹¹ (Figure 3), we can understand that more sophisticated
tools are needed to find the right information on the Web.

       Figure 3: How frequently do Internet users participate in the most popular activities?

⁹ USC Annenberg School. (2008). The impact of the Internet. [online]. Available from: http://advertising.microsoft.com/sverige/WWDocs/User/sv-se/NewsAndEvents/Events/jeff_cole.ppt [Accessed 21 June 2009] p.7
¹⁰ Cf. UCL. (2008). Information behaviour of the researcher of the future. [online]. Available from: http://www.bl.uk/news/pdf/googlegen.pdf [Accessed 19 June 2009] p.16

        So far, websites can be reached in three ways:

       Direct access (for example, typing the URL into the address bar or
clicking on a bookmark);

       External links (reaching a website through a link on another website, as
found on most websites, catalogs and advertisements);

       Search engines.

        Using only the first two options, one cannot browse the Internet effectively.
It has also been argued that direct access is increasingly giving way to
search engines¹².

        A search engine is therefore indispensable for navigating the Web properly.




1.1 Relevance of the subject

        The Internet is increasingly becoming our main information provider. As studies
show:
        "More people turned to the internet than any other source of information and
support, including experts, family members, government agencies, or libraries"¹³.
        The Web is the primary source of information for many people, and its
recognition as such is increasing¹⁴.



¹¹ USC. (2008). Annual Internet Survey by the Center for the Digital Future. [online]. Available from: http://www.digitalcenter.org/pdf/2008-Digital-Future-Report-Final-Release.pdf [Accessed 21 June 2009] p.4
¹² Cf. Ohayon, O. (2008). Google, moteur de recherche ou moteur de navigation?. [online]. Available from: http://fr.techcrunch.com/2008/10/30/fr-google-moteur-de-recherche-ou-moteur-de-navigation/ [Accessed 17 June 2009]
¹³ Estabrook, L. / Witt, E. / Rainie, L. (2007). Information searches that solve problems. [online]. Available from: http://www.pewinternet.org/~/media//Files/Reports/2007/Pew_UI_LibrariesReport.pdf.pdf [Accessed 17 June 2009] p.5
¹⁴ Cole, J. I. / Suman, M. / Schramm, P. / Lunn, R. / Aquino, J. S. (2003). Surveying the Digital Future. [online]. Available from: http://www.digitalcenter.org/pdf/InternetReportYearThree.pdf [Accessed 17 June 2009]
           The number of Internet users is estimated at 1,463,632,361 (out of a world
population of 6,676,120,288), with a growth rate of 305.5% over 2000-2008¹⁵.
           The Internet is thus our main information provider, and its number of users is
increasing every day.
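
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below, in Python, derives the share of the world population that is online and the user base in 2000 implied by the quoted growth rate. The variable names are illustrative, and the 2000 baseline is a back-calculation under the stated assumption, not a figure taken from the cited source.

```python
# Figures quoted in the text (Internet World Stats, accessed June 2009).
internet_users = 1_463_632_361
world_population = 6_676_120_288
growth_2000_2008 = 3.055  # 305.5 % expressed as a fraction

# Share of the world population that uses the Internet.
penetration = internet_users / world_population

# Assumption: growth = (users_2008 - users_2000) / users_2000,
# so the 2000 baseline can be back-calculated from the growth rate.
implied_users_2000 = internet_users / (1 + growth_2000_2008)

print(f"Penetration: {penetration:.1%}")
print(f"Implied users in 2000: {implied_users_2000:,.0f}")
```

Under these assumptions, roughly one person in five was online, and the 2000 baseline comes out at about 361 million users.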
           This holds for businesses as well as for individuals. More and more
information is digitized, and it is becoming easier for companies to get data from the
Internet than to extract it the traditional way. For example, it is simpler to
access the Yellow Pages online and copy and paste some information than
to open the printed directory and type in the data you want to work on.
           The Internet is thus a place where the working environment intersects
with the private one.
           This information sharing has consequences (information overload,
accuracy issues, exposure of Internet users to a great deal of advertising). It is all the more
problematic because this is the first time that an information provider has
brought these two environments together to such an extent. This was not the case
with TV, radio or even newspapers.
           As we will see later, some companies rely solely on information;
finding quality websites is therefore critical for businesses.



1.2 Major terms

           In this thesis the following expressions will be used: search engine, search
engine dependency, data quality, Web 2.0 and following versions.

           A search engine is the most flexible technology created to
browse the Web. It is no more than a web application that
processes information: it does not create data, it only processes the information held
in its index.

¹⁵ Internet World Stats. (2009). INTERNET USAGE STATISTICS The Internet Big Picture. [online]. Available from: http://www.internetworldstats.com/stats.htm [Accessed 17 June 2009]

"A search engine is simply a means to ask information on the Web, a system for
organizing the data held on the Internet. A search engine can be metaphorically
compared to several activities: a miner panning for gold, a clerk looking for a
document in a cabinet…"¹⁶

           Search engine dependency is the tendency of Internet users to rely on a single search
engine when looking for information on the Internet. This dependency can arise
from different factors such as loyalty, patriotism or convenience.

           Data quality describes how well data serve their purpose. Data are of high quality "if they are fit for
their intended uses in operations, decision making and planning"17. Alternatively,
data are deemed of high quality if they correctly represent the real-world construct to
which they refer. These two views can often be in disagreement, even about the same
set of data used for the same purpose.18

           Web 2.0 and following versions are not the names of a specific software or
technology. As an example, Web 2.0 is an online movement that encourages users to
participate in the fresh, interactive nature of the Internet by using widely available,
less expensive, and more mature state-of-the-art technologies19.



1.3 Focus, goals and structure of the report


           The focus of this work is to show that there is a critical lack of knowledge
about how to use the Internet, both at home and within businesses, and that each
influences the other.

           This lack of knowledge arises from our overestimation of
technology, from commercial search engine strategies, a lack of awareness, a strong
addiction to search engines, and a lack of training within businesses and educational
institutions.

           This has critical consequences for business decision-making as well
as for day-to-day choices.




16
     Friedman, B. G. (2004). Web search savvy. p.19
17
     Juran, J. (1999). Juran’s quality handbook: Fifth edition. p.976
18
    Kaplan, I. (2008). Bad Data Can Cost You Big Time. [online]. Available from:
    http://www.federationofcredit.com/base/document/Newsletter/IKaplanSept08.html [Accessed 17
    June 2009]
19
   Meyerson, M./Scarborough, M. E. (2007). Mastering Online Marketing. P.223
           If those risks are real, it is very important to make them explicit by
showing concretely what they are, where they come from and how large
the information gap is between a search made by a search-engine-dependent user and
the most rational way of looking for the information.

       The structure of the report is as follows:

       The first part introduces the concept of data quality. What do we mean
by data quality? How can quality data be obtained on the Internet?

       The next part deals with the world of search engines and the
dependency that arises from them.

       Analyzing the world of search engines is important to understand why the
Internet is not as rational as one could think and who the actors of the
dependency are (search engines may not be the whole Internet, and search engines may differ
from one country to another).

       Once this analysis is made, the facts and figures regarding search
engine users' attitudes will be examined. This should lead us to the conclusion that
Internet users do not use a whole set of search tools but only a couple of them: the
dependency concept.

       Once the dependency concept is introduced, we will measure the risks of such
addiction on data quality.

       Google being the most used search engine in Europe, it will be used as a
concrete example in the last part.

       In the last part recommendations will be given for companies interested
in improving their information research system and reducing data quality issues
when looking for information on the Internet.




1.4 Chapter 1: Key points



     • The Internet is used to share information and to communicate;

     • The number of websites created increases every day;

     • Websites are used for diverse purposes (advertising, expressing
       personal opinions, running businesses…);

     • 25% of young Internet users (under 34 years old) have a personal
       website;

     • In a decade we went from an Internet built by a thousand individuals to
       one made by millions;

     • Search is the second most common activity on the Internet after e-mail;

     • Search engines are so far the only way to crawl the Internet properly;

     • The Internet is our main information provider;

     • On the Internet, flows of information from businesses are mixed up with
       those of individuals, which can lead to confusion;

     • Search engines are the origin of this confusion; it therefore seems critical to
       analyze how those technologies work and what the consequences
       of their use are.




Chapter 2: Concept of data quality




        A recent study in the United States showed20 that the Internet is the most
used source of information when people need help:




                 Figure 4: Most used information source when people need help

        This finding is all the more significant given that the World Wide
Web is now the largest resource of information21.

        The Internet thus has several strengths: it is the most used information system and
the biggest resource of information, and it is moreover the most global and accessible
one (free and mobile).22 The issue is how to use it wisely to get quality information.

        If we look at how Internet users perceive the
quality of information on the Internet, we can see that a high percentage of users
take the data quality issue into account. Most of them nevertheless agree that in general the
Internet is a reliable source of information23:




20
   cf. Estabrook, L./Witt, E./Rainie, L. (2007). Information searches that solve problems. [online].
Available from:
http://www.pewinternet.org/~/media//Files/Reports/2007/Pew_UI_LibrariesReport.pdf.pdf [Accessed
17 June 2009]
21
   Muñoz, C./Moraga, A./Piattini, M. (2008). Handbook of Research on Web Information Systems
Quality. p.286
22
    Albarran, A.B./Chan-Olmsted,S.M./Wirth,M.O. (2006). Handbook of media management and
economics. p471
23
       Pierce, J. (2008). The World Internet Project. [online]. Available from:
http://www.digitalcenter.org/WIP2009/WorldInternetProject-FinalRelease.pdf [Accessed 20 June 2009]
Figure 5: How much of the information on the World Wide Web overall is generally reliable?



2.1 Data quality definition

           “Data has quality if it satisfies the requirements of its intended use. It lacks
quality to the extent that it does not satisfy the requirement. In other words, data
quality depends as much on the intended use as it does on the data itself. To satisfy
the intended use, the data must be accurate, timely, relevant, complete, understood,
and trusted.”24

           Data quality is generally defined along six dimensions:
           Accuracy: The quality of being near to the true value25. Accuracy is the most
important dimension.26
           Timelessness: unaffected by time27.
           Relevant: the degree to which search results meet the requirements or
expectations implicit in the query28
           Complete: bring to a whole, with all the necessary parts or elements.
           Understood: perceive (an idea or situation) mentally.
24
   Olson, J. (2003). Data quality: The accuracy dimension. p.24
25
   Wordnet.princeton.edu. (2009). Accuracy definition. [online]. Available from:
wordnet.princeton.edu/perl/webwn [Accessed 17 June 2009]
26
   Olson, J. (2003). Data quality: The accuracy dimension. p.3
27
   Wordnet.princeton.edu. (2009). Timelessness definition. [online]. Available from:
wordnet.princeton.edu/perl/webwn [Accessed 17 June 2009]
28
   WhamTech. (n.d.). Glossary of less-than-usual terms used in the Web site. [online]. Available from:
www.whamtech.com/glossary.htm [Accessed 17 June 2009]
           Trusted: inclined to believe or confide readily.


           Each of those dimensions can be satisfied to a certain level of
tolerance. As previously said, everything depends on the intended use of the
information. For example, a database that is 70% accurate may have value for
some company departments (e.g. marketing, for estimations) because that 70% of
the data is exploitable.
           On the other hand it can be useless for others, e.g. an accounting
department releasing a balance sheet that is 70% accurate.
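
           The idea that the same accuracy level can be acceptable for one use and
unacceptable for another can be sketched in a few lines of Python. This is only an
illustration: the department thresholds below are hypothetical values chosen for the
example and do not come from any cited source.

```python
# Hypothetical minimum accuracy each department tolerates (illustrative values only).
REQUIRED_ACCURACY = {
    "marketing": 0.60,   # rough estimations tolerate some errors
    "accounting": 1.00,  # a balance sheet has to be exact
}


def accuracy(records, is_correct):
    """Share of records judged correct by the caller-supplied check."""
    records = list(records)
    if not records:
        return 0.0
    return sum(1 for record in records if is_correct(record)) / len(records)


def fit_for_use(measured_accuracy, department):
    """True if the measured accuracy meets the department's threshold."""
    return measured_accuracy >= REQUIRED_ACCURACY[department]


# Toy database: 7 correct records out of 10, i.e. 70% accuracy.
sample = [True] * 7 + [False] * 3
measured = accuracy(sample, lambda ok: ok)

print(fit_for_use(measured, "marketing"))   # True
print(fit_for_use(measured, "accounting"))  # False
```

The same measurement therefore leads to opposite decisions depending on the intended use,
which is the point made by the definition of data quality quoted above.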


           Data quality is a complex topic and some additional dimensions can be
included for the use of the data such as:

           Accessibility, Accuracy, Amount of data, Applicability, Attractiveness,
Availability, Believability, Completeness, Concise representation, Consistent
representation, Cost effectiveness, Customer support, Currency, Documentation,
Duplicates, Ease of operation, Expiration, Flexibility, Granularity, Interactive,
Internal consistency, Interpretability, Latency, Maintainable, Novelty, Objectivity,
Ontology, Organization, Price, Relevancy, Reliability, Reputation, Response time,
Security, Specialization, Source's information, Timeliness, Understandability,
Validity, Value-added.29



2.2 Data quality issues within businesses

           As we saw previously, accuracy is the most important dimension of data
quality. Data is at the heart of any good business or organization. Some companies,
such as financial ones, live entirely on information.

           The use of the Internet has increased the flow of information, and a company's
data are now used by other companies to make decisions such as purchasing and
selling.




29
 Muñoz, C./Moraga, A./Piattini, M. (2008). Handbook of Research on Web Information Systems
Quality. p.138
           So if company A provides poor-quality data that is afterwards reused
by company B, a vicious circle begins in which the flow of biased information never
stops.

           As Jack E. Olson mentions in his book “Data quality”:

           “Data is generated by more people, is used in the execution of more tasks by
more people, and is used in corporate decision making more than ever before.”30

           Data quality is critical.

           Even though databases are recognized as the most important asset, companies
tolerate enormous inaccuracies in their databases.

           According to the same author, this issue is present not only within businesses
but also in governmental organizations and educational systems:

          • Businesses and organizations are aware of data issues;

          • They all underestimate their consequences;

          • They have no idea of the cost linked to those issues;

          • They have no idea of the potential value of fixing the problem.

           In his book, Jack E. Olson also estimates the loss associated
with poor data quality at 15 to 25% of operating profit.
           Those losses are of different kinds: transaction rework costs, costs incurred in
implementing new systems, delays in delivering data to decision makers, lost
customers through poor service, lost production through supply chain problems.
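
           To give an order of magnitude, the 15 to 25% estimate can be applied to a
given operating profit. The sketch below is purely arithmetic; the function name and the
profit figure of 2,000,000 are hypothetical and used only for illustration.

```python
def data_quality_loss_range(operating_profit, low_pct=15, high_pct=25):
    """Return the (low, high) estimated loss for a given operating profit.

    The 15% and 25% defaults are the bounds quoted in the text; the profit
    passed in below is an invented figure.
    """
    return operating_profit * low_pct / 100, operating_profit * high_pct / 100


low, high = data_quality_loss_range(2_000_000)
print(f"Estimated loss: between {low:,.0f} and {high:,.0f}")
# prints "Estimated loss: between 300,000 and 500,000"
```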

           Those issues do not normally come from the data management system
(DMSs are designed to answer a specific request). The failure mainly comes from
its users.

           To avoid this, users need to be aware of three things:

           -   what the system's capabilities are;

           -   how to use it properly;

           -   how to interpret its results.


30
     Olson, J. (2003). Data quality: The accuracy dimension. p.5


        The main remedy for this issue is a long-term strategy in which
teams within the organization are trained in the concept of data quality
management.

        The concept of data quality is very relevant when dealing with search
engines. Most of the search engines we know as consumers are commercial search
engines. The main objective of a commercial company is to make
a profit, and many issues arise from this.

        According to a study entitled “Findability”31, most businesses (62%) agree
that finding information is critical; on the other hand, most of them do not
know how critical it is, owing to a general lack of
awareness. The study also shows that a strategy is often not defined (49%):




                              Figure 6: Enterprise findability goal

        Proper goals are not clearly expressed either. The study draws the same conclusions as
some authors on this topic32.

        As we saw, technology itself is not responsible for quality issues; rather, the use of
technology and the interpretation of the information retrieved are a
source of quality problems. These can be reduced by measures
such as:

–       Putting in place a better information research management strategy33, mainly
based on employee training. This means training employees not only in how to use
technologies but also in how to develop proactive and efficient behavior when doing

31
   cf. The Association for Enterprise and Content Management. (2008). Findability: The Art and
     Science of Making Content Easy to Find. [online]. Available from:
     http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009] p.22
32
   Olson, J. (2003). Data quality: The accuracy dimension. p.7-8
33
    Kehoe, M. (2009). Overview of the Enterprise Search Market. [online]. Available from:
http://www.ideaeng.com/tabId/98/itemId/181/Overview-of-the-Enterprise-Search-Market-2009.aspx
[Accessed 17 June 2009]

research. It means reconsidering the information process and participating in the
improvement of the whole information research system (cf. chapter 2.3.4). Computer
users expect too much from technology, waiting to be fed the most
rational solution, which is not yet on the market;

–       Implementing a more user-oriented research application. Studies show
that too many libraries did not invest enough in this field,
focusing on the size of their database rather than on how to retrieve the information34.
This is one of the reasons why people move from libraries to the Internet as an
information provider.




2.3 Origins of data quality issues: Garbage In Garbage Out

“On two occasions I have been asked,—"Pray, Mr. Babbage, if you put into the
machine wrong figures, will the right answers come out?" … I am not able rightly to
apprehend the kind of confusion of ideas that could provoke such a question.”

                                                                           — Charles Babbage

        As we previously saw, data quality issues with search engines do not come
from technology. They in fact come from:

–       The author of the content returned in the results: misspellings, no
concrete sources to support the claims, no adoption of standards, advertising;

–       The person who types in the request (cf. chapter 3.3.6.1: Search engine use
awareness).

        The next parts develop the first point in detail.



        2.3.1 Poor data quality content: the Wikipedia example

        Wikipedia is an easy example to illustrate the data quality issue with Internet
content and a good introduction to the following sections.

34
  Cf. UCL. (2008). Information behaviour of the researcher of the future. [online]. Available from:
http://www.bl.uk/news/pdf/googlegen.pdf [Accessed 19 June 2009] p.31

        Wikipedia is one of the greatest collaborative World Wide Web projects ever,
but it also has a few drawbacks. Those disadvantages mainly
arise from an absence of data quality standards. Here are some of them:

–        Anybody can contribute, and can do so
anonymously, so in theory a three-year-old child can write an article. According to Sara
Baase: “Accuracy and quality are impossible. Truth does not come from populist
free-for-alls. Some articles are biased and one sided”35;

–        Some articles without reliable sources can be validated by an administrator;
Internet users may then take the displayed information for granted;

–        The success of Wikipedia rests on word of mouth;

–        Wikipedia's popularity36 makes it rank first on Google for most requests.
It has a PageRank of 9 out of 10, which is almost the maximum
recognition Google can give to a website.




             Figure 7: 1st and 2nd results for "data quality" are Wikipedia websites




         2.3.2 Poor data quality content: the commercial example

         In a study entitled “Of course it’s true I saw it on the Internet!”37, aimed at
understanding how American students conduct searches, the following question was
asked: “List three major innovations developed by Microsoft over the past 10 years”.


35
   Baase, S. (2007). A gift of Fire. p352
36
   Baase, S. (2007). A gift of Fire. p351
37
   Graham, E. L./ Metaxas, P. T. (2003). Of course it’s true I saw it on the Internet!: Critical thinking
in the Internet. Available from: http://www.wellesley.edu/CS/pmetaxas/CriticalThinking.pdf
[Accessed 17 June 2009]
The survey was submitted to 180 college students in the United States during the
school year 2000-2001.

       In response, 63% used only one source of information:
Microsoft's website. But is a commercial website a reliable, neutral and trustworthy
source of information?

       One thing is certain: a company has no interest in criticizing itself on its own
website, so it is highly probable that it will tend to sell itself rather than
keep a neutral point of view.

       The commercial aspect of search engines will be taken up again and developed further
in the next chapters.




       2.3.3 Metadata

       Metadata is the key to understanding how search engines currently
work and how to perform a good search. Put simply,
metadata is data about data.

       As an example, a librarian archives books by assigning each of them
a reference.

       For example, the reference “AA1” corresponds to “Gone with the Wind”.

       Each web page on the Internet has several pieces of metadata, such as the “title” of the
page, the “keywords” associated with the page, the “description”, etc.
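
       To make the notion of page metadata concrete, the following minimal sketch
reads the title, keywords and description of a web page using only Python's standard
library. The page content is invented for illustration, and the names MetadataParser and
extract_metadata are hypothetical helpers, not part of any search engine.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            # Store e.g. "keywords" or "description" with its content value.
            self.metadata[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


# Hypothetical page, invented for this example.
PAGE = """<html><head>
<title>Gone with the Wind - library record AA1</title>
<meta name="keywords" content="novel, classic">
<meta name="description" content="Catalogue entry for the novel.">
</head><body>...</body></html>"""


def extract_metadata(html):
    """Return a dict of the metadata found in an HTML document."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


metadata = extract_metadata(PAGE)
for key, value in metadata.items():
    print(f"{key}: {value}")
```

A page whose author left these fields empty would yield an empty dictionary, which is
exactly the situation in which a keyword-based search tool has little to work with.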

       Metadata issues arise mainly because metadata do not represent all
the data. The best example is image search. Today, when
typing a request to look for pictures, we get a strange mix of a bit of
everything. The reasons in this case are a lack of metadata and an inappropriate use
of them.

       As an example, most Internet users upload pictures without giving
them a name, leaving just a number as an identifier. This represents an enormous amount of
data that is unusable.
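
       A simple check can flag such uninformative file names before pictures are
uploaded. The sketch below is an assumption-laden illustration: the regular expression
only covers a few naming schemes (for instance "P5170009", "IMG_0042" or a bare number)
and real cameras use many others.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern for camera-generated names: up to four letters followed by
# at least four digits (e.g. "P5170009", "IMG_0042"), or digits only.
DEFAULT_NAME = re.compile(r"(?:[A-Z]{1,4}[_-]?\d{4,}|\d+)", re.IGNORECASE)


def is_descriptive(filename):
    """Return False when the file name looks like a camera default."""
    stem = PurePosixPath(filename).stem  # file name without its extension
    return DEFAULT_NAME.fullmatch(stem) is None


for name in ["P5170009.jpg", "IMG_0042.JPG", "eiffel-tower-at-night.jpg"]:
    print(name, is_descriptive(name))
```

Only the last name in the loop carries words that a keyword-based image search could use.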


Figure 8: A query made on Google images with the keyword "P5170009"




        This introduces another issue: findability.




2.3.4 Findability


                              “Findability Precedes Usability

                               In the alphabet and on the Web

                            You can’t use what you can’t find”38

        Findability is the art and science of making content findable. The science is
library science; the art is language arts and the user interface design.39

        Findability is only partly understood by businesses and too often confused
with search.




38
   Morville, P. (2005). Ambient Findability. p.111
39
   The Association for Enterprise and Content Management. (2008). Findability: The Art and Science
of Making Content Easy to Find. [online]. Available from:
http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009] p.9
Figure 9: How well is findability understood in your organization?

         Findability is not only about searching but also about making
information findable.

         Most businesses agree on this point: findability is critical to the organization’s
business goals and success (62%).




     Figure 10: How critical is findability to your Organization's Business Goals and Success?

         However, as a study on findability shows40 and as we will see later in Chapter
3, findability is not well defined or implemented within companies, and this is
mainly due to a management failure.
         As Peter Morville describes it in his book “Ambient Findability”41,
findability defies classification. It flows across the borders between design,
engineering, and marketing. Everybody is responsible, and so we run the risk that
nobody is accountable.

         Findability is everyone's concern within a company. For example, when
designing the company website, different actors are involved: designers, engineers,
information architects, brand architects and the marketing department.

         Another example is that of a secretary or an archivist storing
documents. He or she has to think about how to make those materials easy to find
for everyone (by choosing the right metadata and the right technology); this includes

40
   Cf. The Association for Enterprise and Content Management. (2008). Findability: The Art and
Science of Making Content Easy to Find. [online]. Available from:
http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009]
41
   Morville, P. (2005). Ambient Findability. p.111
collaboration with all departments within a company. Otherwise those contents cannot
be found and are, in a sense, lost.

           Peter Morville proposes two solutions: cultivating cross-functional
collaboration and, on an individual level, learning to be proactive and efficient and to
go beyond one's job responsibilities.



2.4 Data quality solutions

                       "A problem well defined is a problem half-solved."

                                                                           –John Dewey

Data quality issues come from:

-          Garbage In Garbage Out;

-          No check of information accuracy.

Solutions are then easy to identify:

-          Learning how to use search tools;

-          Checking the information.


           2.4.1 Learning how to use search tools

           The main issue with Internet users is that they stick to the “Principle of
Least Effort” formulated by George Kingsley Zipf:

           “Each individual will adopt a course of action that will involve the
expenditure of the probably least average of his work (least effort).”42

           And according to Calvin Mooers, “people will not seek information that
makes their jobs harder (even if it may benefit the organization they work for)”.43

           Studies in fact show that users sacrifice information quality




42
     Case, D. O. (2007) Looking for information. p.151
43
     Morville, P. (2005). Ambient Findability. p.54
for accessibility44.

          Users thus care less about quality than about information that is easy to
access.

          This is mainly why the Google Advanced Search option is rarely used: people
associate “Advanced” with “complex”45, whereas Advanced Search should be the normal way to
search.




          2.4.2 Check out the information: the Triangle method


          Commonly used in the educational system, the triangle method consists in
locating three independent sources that point to the same answer in order to
obtain the most accurate information. This method does not distinguish
between quality websites and poor-quality ones, but it helps in checking the
information.
          Applying this concept can be more powerful than we might imagine. As an
example one can take a recent news event such as the riots in Tibet in 2008. According to
the news provided by the United Kingdom46 and Germany47, taken as representatives of Western
media, Tibetans were experiencing real chaos in March 2008.
          On the other hand, according to CCTV (China Central Television)48,
some information published by Western media was totally biased
and incoherent. And the evidence put forward by the Chinese
media actually supports this claim49. The inaccuracies came from the fact that

44
   Hirsh, S./Dinkelacker, J. (2004). Seeking Information in order to produce information: an empirical
study at Hewlett Packards Labs. p.816
45
   Olausson , A. M. (2007). Advanced Search: Is the name a problem?. [online]. Available from :
http://digital-lifestyles.info/2007/09/21/advanced-search-is-the-name-a-problem/ [Accessed 17 June
2009]
46
     BBC. (2008). Tibetans describe continuing unrest. [online]. Available from :
http://news.bbc.co.uk/2/hi/asia-pacific/7300312.stm [Accessed 17 June 2009]
47
   Berliner Morgenpost. (2008). China rüstet sich für « die entscheidende Schlacht ». [online].
Available from:
http://www.morgenpost.de/printarchiv/politik/article169230/China_ruestet_sich_fuer_die_entscheidende_Schlacht.html
[Accessed 17 June 2009]
48
   XinHua. (2008). Commentary : Facts about Tibet should not be distorted. [online]. Available from:
http://news.xinhuanet.com/english/2008-03/24/content_7847789.htm
 http://news.xinhuanet.com/english/2008-03/24/content_7847789_1.htm [Accessed 17 June 2009]
49
    Beijing Review. (2008). Dialogue: Media Coverage on Tibet. [online]. Available from:
http://www.bjreview.com.cn/special/txt/2008-03/22/content_107054.htm [Accessed 17 June 2009]
Western media did not know the Chinese and Tibetan cultures and
languages well enough and attached captions to images that were not accurate.
       In this situation looking at three independent sources is critical. Who
would have thought, for example, that Western media could be wrong?




                        Figure 11: Triangle method application

       Reliable sources are thus a necessary condition for data accuracy, but this
condition is not sufficient: you also need three independent and
reliable sources of information that point to the same answer.
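
       The triangle method can be sketched as a small function that accepts an answer
only when at least three independent sources agree on it. This is a simplified
illustration, not a formal definition of the method: the source names, publishers and
answers below are invented, and independence is approximated here by requiring distinct
publishers, which is an assumption of this sketch.

```python
from collections import defaultdict


def triangulate(reports, required=3):
    """Return the answers confirmed by at least `required` independent sources.

    `reports` is an iterable of (source, publisher, answer) tuples. Two sources
    sharing a publisher are not counted as independent of each other.
    """
    publishers_by_answer = defaultdict(set)
    for source, publisher, answer in reports:
        publishers_by_answer[answer].add(publisher)
    return {
        answer
        for answer, publishers in publishers_by_answer.items()
        if len(publishers) >= required
    }


# Invented reports about the same question (all names are hypothetical).
reports = [
    ("site-a.example", "Publisher A", "1969"),
    ("site-b.example", "Publisher B", "1969"),
    ("site-c.example", "Publisher C", "1969"),
    ("mirror-of-a.example", "Publisher A", "1968"),
    ("site-d.example", "Publisher D", "1968"),
]

print(triangulate(reports))  # {'1969'}
```

As noted above, agreement between sources says nothing about their reliability; the
sources fed to such a check still have to be chosen with care.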




2.5 Chapter 2: key points


     • The Internet is the most used, largest, most global and most accessible source of
       information;

     • The majority of Internet users consider the Internet a reliable source of
       information while being aware of quality issues;

     • Accuracy is the most important dimension of data quality, and in some cases a
       certain level of inaccuracy can be tolerated;

     • Some companies live entirely on information;

     • A company's data are used by other companies to make decisions;

     • Data quality issues affect all kinds of organizations;

     • The loss associated with poor data quality is estimated at 15 to 25% of
       operating profit;

     • In most cases the database management system is not the cause of data
       quality issues;

     • A majority of businesses do not have proper goals defined regarding the
       findability of their material within their research environment;

     • Cultivating cross-functional collaboration and proactive, efficient behavior within
       companies is the key to setting up good information retrieval systems;

     • Making content findable is the responsibility of everyone within a
       company;

     • People will not seek information that makes their jobs harder (even if it may
       benefit the organization they work for);

     • Users sacrifice information quality for accessibility;

     • People associate “Advanced” with “complex”, whereas Advanced Search is the right
       way to search;

     • Accuracy issues can be reduced by checking the information against three
       independent and reliable sources.



Chapter 3: Search engines dependency




       As previously seen, search is the second most popular activity on the
Internet and search engines are the most appropriate tool for it.
       Before introducing the search engine dependency concept it is
useful to know how the search engine market is structured.
       Even if Google is recognized as the leading brand in this field, its dominance
may not be worldwide.
       A strong effort has been made in this thesis to make it as global as possible.
Most publications in this area have been written considering the American and
European markets as a representative sample of the whole market.
       The rise of India and China in the technological world and the increase
of information on the Internet now allow us to get information about the Asian
market. Even if most new technologies come from the United States, it is
worthwhile to extend the study to Asia to get a more representative and
comprehensive panel.



3.1 Search engine categories


       Search engines can be divided into two categories:

      • Commercial search engines, available for free to the general public, mainly in
       exchange for the display of advertising;

      • Enterprise search engines for businesses. They are generally paid services,
       free of advertising and customized for a specific need.

       3.1.1 Commercial search engines


   Commercial search engines are divided into four categories:

      • Standard: the best-known search engines, such as www.google.com,
       www.bing.com and http://www.ask.com/. They look for any kind of
       information across the Internet and are characterized by a very light
       interface (mostly text-based applications):




Figure 12: Ask search engine home page

              • Portals: portals are a mix between standard search engines and
               directories. Unlike search engines, directories use human beings
               instead of robots to index website addresses. In theory (leaving aside
               the commercial aspect) directories should provide
               quality information rather than quantity.50 Portals are characterized by a
               lot of information on their home page, including the search engine
               function. The best-known portal is Yahoo.




                                     Figure 13: Yahoo home page



              • Specialized search engines: they belong to a subcategory of the first
               group and are also called vertical search engines. A vertical search engine
               searches the information sources of one industry or one kind.51 Specialized
               search engines crawl only a restricted area and not the entire
               web. For example, they may search only a specific website or only a
               specific kind of document (books, images, .pdf documents, videos…).

               Although specialized search engines are not a revolution in themselves (most
               of them are a filter applied to bigger search engines), they nevertheless find their




50
     Friedman, B. G. (2004). Web Search Savvy. p.21
51
     Wang, W. (2007). Integration and Innovation Orient to E-Society Volume 1. p.666
               place when standard search engines return too many results for a
               given request.




                  Figure 14: An example of vertical search with Yahoo Images



           • Semantic search engines: most search engines on the market are
            based on keywords and document popularity (for example Google's
            PageRank) without taking the actual content into account52. The idea
            behind semantic search is to understand the underlying meaning of the
            information. A recent example of such a search engine, "Wolfram Alpha",
            has just come onto the market. It is described as a "knowledge engine"53
            designed to answer your request directly rather than drive you to a
            website which may hold the answer. Semantic search engines belong to
            the Web 3.0 generation, where machines interpret the meaning of the data.54




                     Figure 15: A semantic search engine: Wolfram Alpha


52
   Priss, U./Corbett, D./Angelova, G. (2002). Conceptual structures. p.92
53
   Valentiner, Z. (2009). New search tool on the block: Wolfram Alpha. [online]. Available from :
http://www.mndaily.com/blogs/tech-corner/2009/05/20/new-search-tool-block-wolframalpha
[Accessed 17 June 2009]
54
   Cf. Sankar, K./Bouchard, S./Mancini, D. (2009). Enterprise Web 2.0 Fundamentals. P.161
3.1.2 Enterprise search engine (ESE)


           Enterprise search engines are dedicated to searching within a company's
environment: Internet, intranet, customer management systems, databases,
wikis and software applications.

           Their usefulness is clear when employees are
looking for information which is not public or want to get pertinent information
from within their own environment.

           Enterprise search engines have more or less the same technology and function
as commercial web search engines; they simply target a specific group rather than a
mass public audience55.




3.2 Search engine market


       3.2.1 Commercial search engine market


       The commercial search engine market is segmented as follows:




                         Figure 16: Top 10 Worldwide Search December 2007



55
     cf. The Association for Enterprise and Content Management. (2008). Findability: The Art and
       Science    of    Making     Content    Easy     to    Find.    [online].   Available   from:
       http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009]

          -       Google is the clear leader with more than 60%;

          -       Yahoo holds a comfortable second position with more than 10%;

          -       Three other major search engines share the 3rd, 4th and 5th places
          with market shares from 2.4% to 5%: Baidu, Microsoft and Naver;

          -       Some specialized search engines are present in the top 10.



          As mentioned above, in 2007 the top 10 search websites showed an interesting
market with the presence of:

          -       2 specialized search engines: eBay and Alibaba.com;

          -       4 Asian search engines: Baidu, NHN, Yandex and Alibaba.com.

          This clearly shows the presence of Asian technologies. Moreover, Baidu,
NHN and Yandex are nationally oriented, as we will see later in chapter 3.2.7.



          3.2.2 Commercial search engine market: Consumer behavior


          A study carried out in the United States shows that Internet searchers are confident
and satisfied and that they trust search engines 56. Some of those results are confirmed by a
Taiwanese study57:

•         92% are confident about their searching skills;

•         87% have a successful search experience;

•         68% believe that search engines are a fair and unbiased source of information;

•         44% of searchers say they regularly use a single search engine, 48% will use
just two or three, 7% will use more than three;



56
       Fallows,    D.    (2005).    Search      engine     users.   [online].  Available     from:
     http://www.pewinternet.org/~/media//Files/Reports/2005/PIP_Searchengine_users.pdf.pdf
     [Accessed 17 June 2009] p.2

57
    Insight Xplorer. (2006). 創 市 際 市 場 研 究 顧 問 . [online]. Available from:
http://www.insightxplorer.com/specialtopic/co_info_acquisition.html [Accessed 17 June 2009]
•          62% are not aware of a distinction between commercial and non-commercial
results.

Moreover, according to a study entitled "Surveying the Digital Future" 58:




     Figure 17: How Much Of The Information On the Internet Do You Think is Reliable and
                                        Accurate?

           A large majority of them consistently see the Internet as a reliable and
accurate source of information over time.

    • According to another study, 22% of Internet users have a search engine such as
     Google or Yahoo as their home page, a share that has doubled since 2005.59

    • Regarding the reliability and accuracy of search engines, 51% of users said in 2007
     that most or all of the information produced by search engines is reliable and
     accurate, down from 62% in 2006;

    • Internet users find a high degree of reliability and accuracy on their favorite web
     sites: 81% in 2005, 83% in 2006 and 83% in 2007;60



58
   UCLA Center for Communication Policy. (2004). Surveying the Digital Future. [online]. Available
from: http://www.digitalcenter.org/downloads/DigitalFutureReport-Year4-2004.pdf [Accessed 18 June
2009]. P.39
59
   Center for the Digital Future (2008). Annual Internet Survey by the Center for the Digital Future.
[online].                                       Available                                       from
http://www.digitalcenter.org/pdf/2009_Digital_Future_Project_Release_Highlights.pdf [Accessed 19
June 2009] p.4
60
   Center for the Digital Future (2008). Annual Internet Survey by the Center for the Digital Future.
[online].                                       Available                                       from
http://www.digitalcenter.org/pdf/2009_Digital_Future_Project_Release_Highlights.pdf [Accessed 19
June 2009] p.5
    • In 2007, 80% of Internet users considered that most or all of the information
     posted by well-known media such as the New York Times and CNN is reliable
     and accurate, compared with 77% in 2006.

        It seems that users of commercial search engines have a positive search experience.
Even if they recognize data quality issues, they do not seem to understand where
those problems come from.

        It would therefore be worthwhile to inform them better about the commercial
aspect of free search engines.



        3.2.3 Enterprise Search Engine market

        The Enterprise Search Engine market is far more confused and crowded 61
than the commercial one. Little information is available on it, but what we can say
is that the actors are different and that enterprise search engines are customized for a
specific use.

        In a book entitled "Practical Aspects of Knowledge Management", written
in 2008 by Takahira Yamaguchi, a ranking of the main actors in this field is given62:

        1st autonomy.com

        2nd Fastsearch.com

        3rd Endeca.com

        As we can see, those three companies were not listed in the commercial search
engine ranking. However, some commercial search engine firms are present on this
market, such as Google with Google Search Appliance and Microsoft with Microsoft
Search Server.




61
     Feldman, S. (2005). Desperately seeking search. [online]. Available from:
http://www.kmworld.com/Articles/Editorial/Feature/Desperately-seeking-search-9665.aspx [Accessed
17 June 2009]
62
   Yamaguchi, T. (2008). Practical aspects of Knowledge Management. p.41
3.2.4 Enterprise Search Engine market: Consumer behavior

        The enterprise search engine market has a different configuration from the
commercial one. However, the main protagonists such as Google and Microsoft are
still present63. In contrast to users of commercial web search engines, enterprise search
engine users are mostly disappointed by their search experience.




                        Figure 18: Enterprise search satisfaction

        It is quite striking to see that almost half of the respondents (49%) have a negative
image of searching for information with their enterprise search tools.

        The major reasons for this are:

–       The lack of training and consulting on those search tools within
organizations64;

–       The expectation of results as pertinent as those of commercial web
search engines.




              Figure 19: Influence of the consumer web on enterprise search tools
63
    Kehoe, M. (2009). 2009 Overview of the Enterprise Search Market. [online]. Available from:
     http://www.ideaeng.com/tabId/98/itemId/181/Overview-of-the-Enterprise-Search-Market-
     2009.aspx [Accessed 17 June 2009]
64
   cf. The Association for Enterprise and Content Management. (2008). Findability: The Art and
     Science    of    Making     Content     Easy    to    Find.    [online].   Available    from:
     http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009] p.36
         A vast majority (82%) agree that their experience of searching for
information on the consumer web influences their expectations regarding the
implementation of such technology within companies.

         As Ron Miller (cited in the following study) explains: « On the web,
search engines like Google have the advantage of searching the entire web.
Therefore, the likelihood of finding query matches is much greater than in the
enterprise where the number of possible right answers is much smaller, and could in
fact be found in just a single document. (Of course finding more results doesn’t
necessarily mean finding right ones, but that’s another issue altogether.) »

         It is then not surprising to see that most enterprise search engine users are not
successful in finding what they are looking for:




           Figure 20: Success rate of finding the information with enterprise search tools

         According to Ron Miller, the problem can then be stated as follows: "I don't
think the technology is failing us, I think it's the way we are using the
technologies," but he adds, "If I can't find my content, it doesn't exist."65

         This part clearly highlights that searchers within companies
confuse commercial search engines with enterprise search engines, directly associating
one with the other. It also shows the lack of training in those
technologies and thereby confirms the lack of technological literacy of Internet users.

         Moreover, it clearly defines what the market wants: simple and easy-to-use
applications.



65
     Miller, R. (2009). Unlock Power Enterprise Search. [online]. Available from:
     http://byronmiller.typepad.com/UnlockPowerEnterpriseSearch.pdf [Accessed 17 June 2009] p.5

3.2.5 The commercial search market repartition


       Regarding the distribution of search use on the Internet, we can see that the
"Europe + North America" block represents more than half of the market,
with 55%.

       The Asia-Pacific area is well represented, with one third of the market.




                         Figure 21: Worldwide Search by Region

       North American and Asian Internet users generate more or less the
same volume of searches, whereas it is in Europe and Latin America that Internet users
search the most per capita.

       This part will be developed further in chapter 5.1.4: Google dependency state.



               3.2.6 The commercial search engines in the world


       As mentioned in chapter 3.2.1, 6 out of 10 searches on the Internet are made
on Google.

       However, this does not mean that 60% of the population of each country in
the world uses Google.



Figure 22: Search engine leaders (>50%) per country, personal estimation66

        The world is not entirely covered by Google. There are seven other leaders:
Yahoo, Yandex (Mail.ru), Baidu, Microsoft, Naver, Seznam and Leit.is.

        Google leads in almost all of the American continent as well as in Europe,
Northern Africa, Southern Africa, Australia and India: in short, in almost all
countries which have strong links with Anglo-Saxon culture.

        The strong presence of Yandex in Russia and Eastern Europe (ex-Soviet countries)
could suggest a possible « boycott of American technologies » in
support of Russian technologies. The recent partnership between Yandex (the main
search engine in Russia) and the Firefox browser reinforces those suspicions67.




66
     Alexa Web. (n.d). Alexa Top 500 Global Sites. [online]. Available from:
http://www.alexa.com/topsites [Accessed 17 June 2009]
67
   cf. Houste, F. (2009). Russie: Yandex sera le moteur de recherche par défaut de Firefox. [online].
    Available from: http://www.search-engine-feng-shui.com/2009/01/russie-yandex-sera-le-moteur-
    de-recherche-par-defaut-de-firefox/ [Accessed 23 January 2009]
cf. Schwartz, B. (2009). Firefox Drops Google For Yandex In Russia, But Big Loser May Be Rambler.
[online]. Available from: http://searchengineland.com/firefox-drops-google-for-yandex-in-russia-but-
    big-loser-may-be-rambler-16107 [Accessed 23 January 2009]
          The same observation can be made in China. The recent advertisements
broadcast by Baidu 68 (the search engine leader in China) point in the same direction,
clearly showing the will to get rid of foreign search engines.69

          The Russian and Chinese cases contradict the concept put forward
in the book "Winners, Losers and Microsoft", which states that the best product
always wins70. The search engine market is therefore not a rational one.

          Information regarding the Caribbean and Central Africa is hard to find and
is not very relevant, given that the Internet is not yet well established there.

          On the other hand, the Pacific region is quite interesting because it
contains all the « Tigers » (Taiwan, Thailand...), which are all shown in red (Yahoo).



          In conclusion, the search engine world is divided into two parts:

          • The Google planet, which is composed of all the Anglo-Saxon countries as
           well as countries which have strong links with the United States or Great
           Britain. For the Czech Republic and Iceland, joining it seems to be only a
           matter of time71.

          • The Asian-Pacific regions: Asia is composed of many countries and therefore
           many cultures. Among them we can identify four players:

               o   Yandex (Mail.ru), which dominates all the ex-Soviet countries;

               o   Baidu, which has total control over China;

               o   Naver (NHN Corporation), a 100% South Korean product which is
                   the best example that search engines work by culture;


68
        Baidu.      (2006).     Baidu      advertisement.     [online].   Available     from:
http://www.youtube.com/watch?v=EPnmsFl__nU [Accessed 17 June 2009]
69
   cf. Einhorn, B. (2007). Baidu Thinks It Can Play in Japan. [online]. Available
    from:http://www.businessweek.com/globalbiz/content/feb2007/gb20070215_649662.htm?chan=gl
    obalbiz_asia_technology [Accessed 23 January 2009]
cf.   Grallet, G. (2009). Baidu, un autre Google s'éveille. [online]. Available from:
    http://www.lexpress.fr/actualite/high-tech/baidu-un-autre-google-s-eveille_734826.html [Accessed
    23 January 2009]
cf. Shijun, Z./Peng, N./Weifeng, X. (2006). 时尚中国—网动中国英. p.45
70
   Liebowitz, S. J./Margolis, S. (1999). Winners, Losers and Microsoft
71
  cf. Rafat, A. (2008). Czech Portal Seznam Could Fetch $900 Million; Google, Apax, Warburg and
    Others     in    Fray.    [online]   Available   from:   http://www.washingtonpost.com/wp-
    dyn/content/article/2008/08/15/AR2008081502517.html [Accessed 23 January 2009]
cf.    Mar Hauksson, K. (2007). Global search report 2007 [online]. Available from:
      http://www.e3internet.com/downloads/global-search-report-2007.pdf [Accessed 23 January 2009]
      p.8
               o   Yahoo, which is the leader in almost all the Asian "Tiger" countries.



           Since Yahoo is an American technology, how can we explain its domination in
Asia? The reason is mainly cultural: Yahoo is a shiny portal, and Asian Internet culture
judges the quality of a website by the number of animations on it72. Another
explanation could be the leading position of Yahoo in Japan, which can influence the
Tiger countries. Moreover, Japan has one of the highest rates of Internet
penetration per capita in the world73.



                   3.2.7 Commercial search engine leaders presentation

           Knowing the search engine leaders and the services they provide is critical
to understanding the search engine dependency concept. Here is a list of the main
commercial search engine actors:

           Google: 74 Created in 1998 in the United States. Physically present in 34
countries around the world.

           Services provided: News, Blogs, Images, Videos, Maps, Mail services,
Social networks, e-commerce, Online advertising…

           Languages supported: More than 65.




                    Figure 24: Google logo



                                                  Figure 23: Search engine market in the
                                                   USA, source: Hitwise, february 2009




72
     cf. Tobin, R./Hotchkiss, G./Lee, P. (2008). Chinese Search Engine Engagement. [online]. Available
      from : http://www.enquiroresearch.com/download-research-whitepapers.aspx [Accessed 17 June
      2009] p.28.
73
    Internet World Stats. (2009). Internet Usage in Asia. [online].                 Available   from:
    http://www.internetworldstats.com/stats3.htm [Accessed 17 June 2009]
74
   Miller, M. (2006). Googlepedia. p.11
        Yahoo ("Yet Another Hierarchical Officious Oracle"):75 Created in 1994 in the United
States, it started as a directory and later became an Internet portal. Physically present in
20 countries around the world.

        Services provided: News, Business directory, Maps, Videos, Images, Online
advertising, Mail services, Jobs, Questions/Answers…

        Languages supported: More than 20.




                   Figure 25: Yahoo Logo        Figure 26: Japanese search engine market,
                                                 source:webcreate.ga-pro.com, May 2009


        Baidu:76 Created in 2000 in China. Physically present in China and in Japan.

        Services provided 77 : News, Business directory, Maps, Music, Videos,
Images, Online advertising, Social networking…

        Languages supported: 2 (Chinese and Japanese).




                   Figure 28: Baidu logo



                                                    Figure 27:Chinese search engine market,
                                                 source:China IntelliConsulting Corp. sept 2008




75
       Yahoo       Inc.     (n.d.).   Company        Overview.    [online].   Available     from:
http://yhoo.client.shareholder.com/press/overview.cfm [Accessed 17 June 2009]
Yahoo Inc. (n.d.). Yahoo dans le monde. [online]. Available from:
http://world.yahoo.com/?c=fr [Accessed 17 June 2009]
76
  Shijun, Z./Peng, N./Weifeng, X. (2006). 时尚中国—网动中国英. p45
Baidu Japan Inc. (n.d.). Baidu (バイドゥ)会社情報 - 会社概要 . [online]. Available from :
http://www.baidu.jp/info/corp/data.html [Accessed 17 June 2009]
77
         Baidu     Inc.     (n.d.).    Baidu      products.     [online].   Available from :
http://ir.baidu.com/phoenix.zhtml?c=188488&p=irol-products [Accessed 17 June 2009]
         Microsoft: Microsoft's main search engine is named "Bing" (since June 2009).
Since search is not Microsoft's core activity, it is quite difficult to describe.
Internet users tend not to go to Bing directly but to reach it through other Microsoft
sites and services such as Hotmail. Microsoft is physically present all over the
world.

         Services provided: News, Social Networking, blogs, Mail services,
toolbar…

         Languages supported:78 41




         Figure 29: Bing logo




         Naver: 79 Created in 1999 in South Korea. Naver is an Internet Portal.
Physically present in South Korea, China, Japan and the United States.

         Services provided: News, e-commerce, Social Networking, blogs, real time
information, Books, Mail services, toolbar.

         Language supported: Korean.




              Figure 31: Naver logo



                                               Figure 30: Korean search engine market, source:
                                                           July 2007 Koreanclick



         Yandex:80 Created in 1997 in Russia. Yandex is physically present in Russia,
Ukraine and the United States.


78
        Microsoft.     (n.d.).     Préférences    Bing.       [online].      Available    from:
http://www.bing.com/settings.aspx?sh=2&FORM=WIWA [Accessed 17 June 2009]
79
   NHN Corporation. (n.d.). NHN Corporation. [online]. Available from : http://www.nhncorp.com/
[Accessed 17 June 2009]
         Services provided: News, e-commerce, Social Networking, blog search
engine, Maps, dictionary, Mail services, photos, website, videos, professional
network, online payment service, online advertising.

         Languages supported: Russian, Ukrainian and English.




                   Figure 32: Yandex logo
                                                          Figure 33:Search engine market in Russia, source:
                                                                 LiveInternet.ru:December 31, 2008


         Seznam81: Created in 1996 in the Czech Republic. Seznam is an Internet portal.
Physically present in the Czech Republic.

         Services provided: Search, Business directory, Images, Mail services, Online
advertising, e-commerce, News, Social Network, Jobs, Online Games.

         Language supported: Czech.




               Figure 34: Seznam logo


                                                        Figure 35: Search engine market in Czech Rep,
                                                               source: navrcholu.cz, June 2008




Leit.is:82 Leit.is is an Icelandic Internet portal created in 1999. It is physically
present in Iceland.

Services provided: Images, Music…


80
   Yandex inc. (2008). Russia’s largest internet search engine and a leading internet and technology
company. [online]. Available from: http://download.yandex.ru/company/mini_book_v19.pdf
[Accessed 17 June 2009]
81
    Seznam inc. (n.d.). Vize firmy | O společnosti Seznam.cz.[online]. Available from :
http://firma.seznam.cz/cz/vize-firmy.html [Accessed 17 June 2009]
82
   Leit.is. (n.d.). Leit.is - Um leit.is :: Um leit.is. [online] Available from: http://www.leit.is/umleit/
[Accessed 17 June 2009]
Languages supported: Icelandic and English.




           Figure 37: Leit.is logo



                                              Figure 36:Search engine market in
                                                Iceland, source: statice.is 2007



       As we can see, none of those search engine leaders is a simple search
engine anymore. All provide a range of services linked to their search
activity. Moreover, almost all of them have at least ten years of experience in the
search field (Baidu, created in 2000, has nine).
       The ones accumulating the largest market shares are those which operate
internationally.



               3.2.8 Commercial search engine complexity


       As previously mentioned, commercial search engines no longer provide only a
search experience. They are all moving toward a personalized interface with a set of
associated services. In fact they are turning into a personal desktop environment
where, by creating a simple free account, you can access your emails, search engine,
personal documents and office software such as a spreadsheet, a presentation tool or a
word processor. iGoogle is a good example:




Figure 38: An example of a customized interface on iGoogle

         It is like an operating system (Google) within the operating system
(Windows, Linux, Mac OS).

         In such a configuration, commercial search engines provide more
attractive services, because their tools are more intuitive than the ones available
within companies. The frustration of company employees can therefore be understood:
from their point of view, mass-market technology is moving faster than business technology.



                  3.2.9 Search engine market shares configuration


         A study entitled « Global Search Report 2007 »83, carried out in 2007, gives a
clear view of the market. It shows that the configuration of each market is always the
same:




83
     cf. Wilsdon, N. (2007). Global Search Report 2007. [online]. Available from:
     http://www.e3internet.com/downloads/global-search-report-2007.pdf [Accessed 23 January 2009]
Figure 39: Search engine market shares in 2007 for the Czech Republic

           It is very rare to find a country where there is close competition among
search engines. Even if things change from one day to the next in the high-technology
sector, one often finds the configuration shown in Figure 39, where the leading search
engine is ahead of its followers by more than 30 points.

           When a search engine gets more than 50% of the market, it is adopted as a
standard. This trend seems common in the software industry, where people seem to
look for a standard used by all. This is the case for the operating system industry,
the browser industry and the e-learning industry. The explanation for such success within
a population can be found in word of mouth: is that not how Google became so
successful? Who has never heard a sentence such as « you just have to Google it »? "Google"
is even in dictionaries nowadays as a verb84.
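
           The two rules of thumb above (a leader more than 30 points ahead of its
followers, and a "standard" once a search engine exceeds 50% of the market) can be
sketched in a few lines of Python. This is only a minimal illustration: the function
name, the constants and the market shares in the example are hypothetical and are not
taken from the Global Search Report.

```python
from typing import Dict

# Thresholds taken from the two rules of thumb stated in the text.
STANDARD_THRESHOLD = 50.0  # market share (%) above which an engine is a "standard"
LEAD_THRESHOLD = 30.0      # lead (percentage points) of the leader over the runner-up


def describe_market(shares: Dict[str, float]) -> dict:
    """Summarize a national search market given market shares in percent."""
    if len(shares) < 2:
        raise ValueError("at least two search engines are needed")
    # Rank the engines from the largest to the smallest market share.
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    leader, top_share = ranked[0]
    second_share = ranked[1][1]
    lead = top_share - second_share
    return {
        "leader": leader,
        "lead_points": lead,
        "is_standard": top_share > STANDARD_THRESHOLD,
        "is_dominant": lead > LEAD_THRESHOLD,
    }


# Hypothetical shares, for illustration only (not real market data).
example = {"EngineA": 62.0, "EngineB": 25.0, "EngineC": 13.0}
result = describe_market(example)
print(result)
```

           With these illustrative figures the leader is both a "standard" and more than
30 points ahead, whereas a market split 40/35 would satisfy neither rule.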

           Markets also include many small local search engines which, if they are
original enough, are bought by the biggest ones and otherwise disappear quickly (examples
appear in the news every month). In the short term the only key to success
seems to be advertising, but in the long run the underlying technology is needed
in order to compete.



                   3.2.10 Search engines competition




84
     cf. Merriam Webster. (2001). Google - Definition from the Merriam-Webster Online Dictionary.
      [online]. Available from: http://www.merriam-webster.com/dictionary/google [Accessed 17 June
      2009]


         Google was created in 1998 and was not a pioneer in the field of search
engines. In a short period of time Google succeeded in taking the lead, and among the
pioneers in this field only Yahoo (created in 1994) is still in place.

         Even if Google has a dominant position on the market, it will take it a lot
of time to become number one in all countries (as we saw, this market is not rational,
mainly for political and cultural reasons). This situation in fact gives
hope and time to its competitors.

         Yahoo is still in discussion with Microsoft over the purchase of Yahoo's search
technologies. One can understand how strategic such an acquisition could be, with Yahoo
having the search know-how and Microsoft the funds as well as the software
ownership.

         Regarding Baidu, it is not clear how it could compete against
Google outside of China.

         What about newcomers? Starting from nothing, they could perhaps beat
famous search engines in a short period of time. This could have been the case for
services such as Cuil, launched in summer 2008, which received a lot of
publicity through the news85. But the search engine market is a very unforgiving
world where visitors give no more than one chance: the product works or it
does not. In the case of Cuil it did not.
         "An information retrieval system will tend not to be used whenever it is more
painful and troublesome for a customer to have information than for him not to have
it."86

         Users want the information as quickly as possible. When you move from
Google to another search engine you are often uncompromising. At the first result which
does not meet your expectations you will go back to Google. But is the search engine
wrong, or is it simply responding differently from what you were used to?

         In conclusion, it is hard to say how Google could lose its dominant position.
Until now only one company has managed to open such a gap in the world of



85
   cf. Arrington, M. (2008). Cuil On BusinessWeek's Most Successful of 2008 List. Huh?. [online].
Available from: http://www.techcrunch.com/2008/12/29/cuil-on-businessweeks-most-successful-of-
2008-list/ [Accessed 17 June 2009]
86
   Mooers, N. C. (1959). A panel discussion at the Annual Meeting of the
American Documentation Institute. 24 October.
search engines, and it is Google itself, at a time when everything had yet to
be created on the Internet.

        A new search technology is however mentioned more and more often in
this field: semantic search.



        3.3 Search engine dependency aspect


        As mentioned in the introduction, search engine dependency is the fact that
people use only one search engine, and therefore only one way of processing data, when
looking for information on the Internet.



                 3.3.1 Evidence of search engine dependency


        The sources used for this part come from panels of Canadian87, French88 and
Belgian89 students. Further information regarding Germany, China (Hong
Kong)90 and the United States91 has also been used.

        These studies were carried out on different panels (students, workers
(researchers) and households) and reached the same conclusion: the search
engine is the first tool people use when looking for information on the Internet.

        The surveys carried out on students also show that most of them did not
receive enough training on how to look for information.



87
   cf. Crepuq. (2003). Etude sur les connaissances en recherche documentaire des étudiants entrant au
    1er cycle dans les universités québécoises. [online]. Available from:
    http://www.crepuq.qc.ca/documents/bibl/formation/etude.pdf [Accessed 18 June 2009]
88
   cf. Université de Lyon. (2007). De la documentation au plagiat. [online]. Available from:
    http://www.compilatio.net/files/sixdegres-univ-lyon_enquete-plagiat_sept07.pdf [Accessed 18
    June 2009]
89
   cf. EduDoc. (2008). Enquête sur les compétences documentaires et informationnelles des étudiants
    qui accèdent à l'enseignement supérieur en Communauté française de Belgique. [online].
    Available from: http://www.edudoc.be/synthese.pdf [Accessed 18 June 2009]
90
   cf. Leung, H. W. 梁漢榮. (2004). A study of computer science students' conceptions of information
    literacy and their experiences in information search process and use. [online]. Available from:
    http://hub.hku.hk/handle/123456789/30758 [Accessed 18 June 2009]
91
    cf. Enquiro. (2004). Search Engine Usage in North America. [online]. Available from:
    http://www.enquiroresearch.com/download-research-whitepapers.aspx [Accessed 18 June 2009]
Search Engine Risk Dependency by Ronan Chardennau

  • 6. Table of figures
Figure 1: Internet Domain Survey Host Count January 1994 - January 2009 ..... 11
Figure 2: Do you use a personal blog? ..... 12
Figure 3: How frequently do Internet users participate in the most popular activities? ..... 12
Figure 4: Most used information source when people need help ..... 19
Figure 5: How much of the information on the World Wide Web overall is generally reliable? ..... 20
Figure 6: Enterprise findability goal ..... 23
Figure 7: 1st and 2nd results for "data quality" are Wikipedia websites ..... 25
Figure 8: A query made on Google images with the keyword "P5170009" ..... 27
Figure 9: How well is findability understood in your organization? ..... 28
Figure 10: How critical is findability to your Organization's Business Goals and Success? ..... 28
Figure 11: Triangle method application ..... 31
Figure 12: Ask search engine home page ..... 35
Figure 13: Yahoo home page ..... 35
Figure 14: An example of vertical search with Yahoo Images ..... 36
Figure 15: A semantic search engine: Wolfram Alpha ..... 36
Figure 16: Top 10 Worldwide Search December 2007 ..... 37
Figure 17: How Much Of The Information On the Internet Do You Think is Reliable and Accurate? ..... 39
Figure 18: Enterprise search satisfaction ..... 41
Figure 19: Influence of the consumer web on enterprise search tools ..... 41
Figure 20: Success rate of finding the information with enterprise search tools ..... 42
Figure 21: Worldwide Search by Region ..... 43
Figure 22: Search engine leaders (>50%) per country personal estimation ..... 44
Figure 23: Search engine market in the USA, source: Hitwise, February 2009 ..... 46
Figure 24: Google logo ..... 46
Figure 25: Yahoo Logo ..... 47
Figure 26: Japanese search engine market, source: webcreate.ga-pro.com, May 2009 ..... 47
Figure 27: Chinese search engine market, source: China IntelliConsulting Corp., Sept. 2008 ..... 47
Figure 28: Baidu logo ..... 47
Figure 29: Bing logo ..... 48
Figure 30: Korean search engine market, source: July 2007 Koreanclick ..... 48
Figure 31: Naver logo ..... 48
Figure 32: Yandex logo ..... 49
Figure 33: Search engine market in Russia, source: LiveInternet.ru, December 31, 2008 ..... 49
Figure 34: Seznam logo ..... 49
Figure 35: Search engine market in Czech Rep, source: navrcholu.cz, June 2008 ..... 49
Figure 36: Search engine market in Iceland, source: statice.is 2007 ..... 50
Figure 37: Leit.is logo ..... 50
Figure 38: An example of a customized interface on iGoogle ..... 51
Figure 39: Search engine market shares in 2007 for the Czech Republic ..... 52
Figure 40: Google market shares in Europe in 2008, source: Comscore ..... 56
  • 7. Figure 41: Use of search engines in 2004 and 2005 ..... 56
Figure 42: Search engine dependency relevancy ..... 57
Figure 43: Search users blame themselves not the technology ..... 58
Figure 44: Search engine syntax examples ..... 60
Figure 45: Use of advanced search functionalities in Canada ..... 61
Figure 46: Do users know how to use Boolean operators? ..... 61
Figure 47: Use of meta search engines ..... 62
Figure 48: Use of specialized search engines ..... 63
Figure 49: U.S. Advertising Market - Media Comparison – 2008 ($ Billions) ..... 67
Figure 50: Internet Ad Revenues by Advertising Format - 2008 Annual Results ..... 68
Figure 51: Search engine user behavior regarding results pages in the USA ..... 68
Figure 52: An eye tracking study on several search engines ..... 69
Figure 53: Differences between organic and sponsored results ..... 70
Figure 54: Type of Search Result Selected ..... 71
Figure 55: Results relevancy according to users by search engine in 2004 ..... 71
Figure 56: Attitudes towards search engines in India ..... 72
Figure 57: Powered by Google logo ..... 74
Figure 58: Powered by Yahoo logo ..... 74
Figure 59: Estimation of the indexable web per search engine ..... 74
Figure 60: Distribution of Public Web Sites By Country in 2002 ..... 75
Figure 61: Percentage of Web Sites Covered by Google in 2002 ..... 75
Figure 62: Google vertical search engines ..... 78
Figure 63: Search engine search within website content comparison ..... 79
Figure 64: Future of web 2.0 ..... 80
Figure 65: Search engines are not the Internet ..... 81
Figure 66: Time and knowledge lag ..... 82
Figure 67: Delicious bookmarks search ..... 83
Figure 68: Home page of the Similicious website ..... 83
Figure 69: Twitter real time information search engine ..... 84
Figure 70: Kartoo search results presentation ..... 85
Figure 71: 2008 Web trend map ..... 86
Figure 72: 2007 Web trend map ..... 86
Figure 73: Significant age-related differences in article discovery methods ..... 88
Figure 74: Google domination in Europe; Figure 75: Google domination in Latin America ..... 93
Figure 76: Google coverage representation of the visible web ..... 95
Figure 77: Google search failure ..... 96
Figure 78: Google bug analysis on January the 31st 2009 ..... 97
Figure 79: Google evolution traffic during the bug on January the 31st 2009 ..... 98
Figure 80: Google Gmail failure ..... 99
Figure 81: Main use of Internet ..... 100
  • 8. Foreword

A general trend of the early 21st century has been the use of the Internet instead of TV as an information provider [1]. There are today 1,596,270,108 Internet users in the world [2], and most of them already have their habits: checking their e-mail, doing research, finding information about goods and services, chatting online, reading the news [3]. Most of these activities can be carried out through a single kind of information exchange provider: the search engine. According to the main Internet traffic measurement companies, search engines are by far the most visited websites [4]. The main search engine companies now provide all kinds of services that make using the Internet very comfortable. However, using a single search engine every day conditions people to process information in a certain way. Habits formed at home may unfortunately carry over to work, or the other way around.

It is certainly comfortable to have a standard when dealing with computers. For example, Microsoft provides the leading operating system on computers, with more than 90% of the whole market [5]. But is Microsoft the computer? The same question arises with search engines: are they the Internet?

[1] Cogar, P. (ed.) (2007). TV vs. the Internet: Internet wins. [online]. Available from: http://www.bit-tech.net/news/2007/08/23/tv_vs_the_internet_internet_wins/1 [Accessed 17 June 2009]
[2] Internet World Stats. (2009). World Internet Usage Statistics News and World Population Stats. [online]. Available from: http://www.internetworldstats.com/stats.htm [Accessed 17 June 2009]
[3] Malaysian Communications and Multimedia Commission. (2005). Household use of the Internet survey 2005. [online]. Available from: http://www.skmm.gov.my/facts_figures/stats/pdf/Household_use_internet_survey2005.pdf [Accessed 17 June 2009]
[4] Alexa Web. (n.d). Alexa Top 500 Global Sites. [online]. Available from: http://www.alexa.com/topsites [Accessed 17 June 2009]; Netcraft. (n.d). Most visited websites. [online]. Available from: http://toolbar.netcraft.com/stats/topsites [Accessed 17 June 2009]
[5] One Stat. (2007). OneStat Website Statistics and website metrics - Press Room. [online]. Available from: http://www.onestat.com/html/aboutus_pressbox54-windows-vista-global-usage-share.html [Accessed 20 June 2009]
  • 9. « Risks of search engine dependency and its influence on data quality » was written with the aim of understanding the potential risks that search engine addiction poses to businesses. Search engines such as Google are used by all Internet users. According to studies, Internet users are confident in, satisfied with and trusting of search engines [6]. Unfortunately, the same studies show that users are also unaware and naïve. Search engines are built to find information on the Internet. Since information is the basis of any good decision, it is important and interesting for businesses to understand the consequences of using them.

Ronan CHARDONNEAU

[6] Fallows, D. (2005). Search Engine users. [online]. Available from: http://www.pewinternet.org/~/media//Files/Reports/2005/PIP_Searchengine_users.pdf.pdf [Accessed 17 June 2009]
  • 10. Chapter 1: Introduction of the topic background
  • 11. The Internet was created to share information and to communicate with one another. It is hard to evaluate how big the Internet is: estimates vary widely between companies, from 15 billion to some 30 billion Web pages [7]. The number of websites increases every day and is estimated at more than 600,000,000 for 2009 [8], with constant growth since the creation of the World Wide Web.

Figure 1: Internet Domain Survey Host Count January 1994 - January 2009

Websites are now used in diverse ways: they have become a standard for companies (extension of their business activity, new opportunities for advertising), and they are also a space for many individuals (the blog phenomenon).

[7] Cf. Koch, P. / Koch, S. (2009). How big is the Internet?. [online]. Available from: http://www.pandia.com/sew/383-websize.html [Accessed 19 January 2009]
[8] Internet Systems Consortium. (2009). The ISC Domain Survey Internet Systems Consortium. [online]. Available from: https://www.isc.org/solutions/survey [Accessed 17 June 2009]
  • 12. Figure 2: Do you use a personal blog? [9]

A study conducted in 29 countries shows that almost 25% of Internet users under 34 years old use a blog, and this trend has been growing since 2003. The popularization of the Internet and the fact that anyone can create a website for free have drastically increased the amount of content. The explosion of social networks (Facebook, Hi5…), blogs (Wordpress, Blogger, Myspace…) and microblogging (Twitter) is changing the nature and fabric of the World Wide Web: from an Internet built by a few thousand individuals, we have moved to one made by millions. [10]

Figure 3: How frequently do Internet users participate in the most popular activities?

[9] USC Annenberg School. (2008). The impact of the Internet. [online]. Available from: http://advertising.microsoft.com/sverige/WWDocs/User/sv-se/NewsAndEvents/Events/jeff_cole.ppt [Accessed 21 June 2009] p.7
[10] Cf. UCL. (2008). Information behaviour of the researcher of the future. [online]. Available from: http://www.bl.uk/news/pdf/googlegen.pdf [Accessed 19 June 2009] p.16
  • 13. If we take into account that searching is, after e-mail, the most common activity on the Internet [11], we can understand that more sophisticated tools are needed to find the right information on the Web. So far, websites are accessed in three ways:
- Direct access (for example entering the URL directly in the address bar, or clicking on a bookmark);
- External links (reaching a website through a link on another website, as is the case on most websites, catalogs and advertisements);
- Search engines.

Using only the first two options, one cannot browse the Internet normally. It has also been said that the first way is progressively disappearing in favor of search engines [12]. A search engine is therefore indispensable in order to explore the Web properly.

1.1 Relevance of the subject

The Internet is increasingly becoming our information provider. As studies show: “More people turned to the internet than any other source of information and support, including experts, family members, government agencies, or libraries” [13]. The Web is the primary source of information for many people, and its recognition is increasing [14].

[11] USC. (2008). Annual Internet Survey by the Center for the Digital Future. [online]. Available from: http://www.digitalcenter.org/pdf/2008-Digital-Future-Report-Final-Release.pdf [Accessed 21 June 2009] p.4
[12] Cf. Ohayon, O. (2008). Google, moteur de recherche ou moteur de navigation?. [online]. Available from: http://fr.techcrunch.com/2008/10/30/fr-google-moteur-de-recherche-ou-moteur-de-navigation/ [Accessed 17 June 2009]
[13] Estabrook, L. / Witt, E. / Rainie, L. (2007). Information searches that solve problems. [online]. Available from: http://www.pewinternet.org/~/media//Files/Reports/2007/Pew_UI_LibrariesReport.pdf.pdf [Accessed 17 June 2009] p.5
[14] Cole, J. I. / Suman, M. / Schramm, P. / Lunn, R. / Aquino, J.S. (2003). Surveying the Digital Future. [online]. Available from: http://www.digitalcenter.org/pdf/InternetReportYearThree.pdf [Accessed 17 June 2009]
  • 14. The number of Internet users is estimated at 1,463,632,361 (out of a world population of 6,676,120,288), with a growth rate of 305.5% between 2000 and 2008 [15]. The Internet is thus our main information provider, and its number of users is increasing every day. This holds for businesses as well as for individuals. More and more information is digitized, and it becomes easier for companies to get data from the Internet than to extract it the former way. For example, it is simpler to access the Yellow Pages online and copy and paste some information than to open the printed book and type in the data you want to work on.

The Internet is therefore a place where the working environment crosses that of the individual. This information sharing has some consequences (large amounts of information, accuracy issues, Internet users exposed to many advertisements). It is all the more problematic because this is the first time that an information provider brings these two sources of information together to such an extent. This was not the case with TV, radio or even newspapers. As we will see later, some companies rely solely on information; finding quality websites is therefore critical for businesses.

1.2 Major terms

In this thesis the following expressions will be used: search engines, search engine dependency, data quality, Web 2.0 and following versions.

Search engine is the most flexible technology created to browse the Web. A search engine is no more than a web application that processes information. It does not create data; it only processes the information it holds in its index.

[15] Internet World Stats. (2009). INTERNET USAGE STATISTICS The Internet Big Picture. [online]. Available from: http://www.internetworldstats.com/stats.htm [Accessed 17 June 2009]
  • 15. “A search engine is simply a means to ask information on the Web, a system for organizing the data held on the Internet. A search engine can be metaphorically compared to several activities: a miner panning for gold, a clerk looking for a document in a cabinet…” [16]

Search engine dependency is the fact that Internet users rely on a single search engine when looking for information on the Internet. This dependency can arise from different factors such as loyalty, patriotism or convenience.

Data quality is the quality of data. Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" [17]. Alternatively, data are deemed of high quality if they correctly represent the real-world construct to which they refer. These two views can often be in disagreement, even about the same set of data used for the same purpose. [18]

Web 2.0 and following versions are not the names of a specific software product or technology. For example, Web 2.0 is an online movement that encourages users to participate in the fresh, interactive nature of the Internet by using widely available, less expensive, and more mature state-of-the-art technologies [19].

1.3 Focus, goals and structure of the report

The focus of this work is to show that there is a critical lack of knowledge about how to use the Internet, both at home and within businesses, and that each influences the other. This lack of knowledge arises from our overestimation of technologies, commercial search engine strategies, lack of awareness, strong addiction to search engines, and lack of training within businesses and educational institutions. It has critical consequences for business decision-making as well as for day-to-day choices.

[16] Friedman, B. G. (2004). Web search savvy. p.19
[17] Juran, J. (1999). Juran’s quality handbook: Fifth edition. p.976
[18] Kaplan, I. (2008). Bad Data Can Cost You Big Time. [online]. Available from: http://www.federationofcredit.com/base/document/Newsletter/IKaplanSept08.html [Accessed 17 June 2009]
[19] Meyerson, M. / Scarborough, M. E. (2007). Mastering Online Marketing. p.223
  • 16. If those risks are relevant, it is very important to bring them to light by showing concretely what they are, where they come from, and how large the information gap is between a search made by a search-engine-addicted user and the most rational way of looking for the information.

The structure of the report is as follows:

The first idea is to introduce the concept of data quality. What do we mean by data quality? How can data quality be obtained on the Internet?

The next point deals with the world of search engines and the dependency that arises from them. Analyzing the world of search engines is important to understand why the Internet is not as rational as one might think and who the actors of the dependency are (search engines may not be the Internet, and search engines may differ from one country to another).

Once this analysis is made, we will look at the facts and figures regarding search engine users' attitudes. This should lead us to the conclusion that Internet users do not use a whole set of search tools but only a couple of them: this is the dependency concept.

Once the dependency concept is introduced, we will measure the risks of such addiction for data quality. Google, being the most used search engine in Europe, will serve as a concrete example in the last part.

Finally, recommendations will be given for companies interested in improving their information research system and reducing data quality issues when looking for information on the Internet.
  • 17. 1.4 Chapter 1: Key points
- The Internet is used to share information and to communicate;
- The number of websites created increases every day;
- Websites are used for diverse purposes (advertising, expressing personal opinions, running businesses…);
- Almost 25% of Internet users under 34 years old have a personal blog;
- In a decade we have moved from an Internet built by a few thousand individuals to one made by millions;
- Search is the second most common activity on the Internet after e-mail;
- Search engines are so far the only way to explore the Internet properly;
- The Internet is our main information provider;
- On the Internet, flows of information from businesses are mixed with those of individuals, which can lead to confusion;
- Search engines are the origin of that confusion, so it is critical to analyze how these technologies work and what the consequences of their use are.
  • 18. Chapter 2: Concept of data quality
  • 19. A recent study in the United States showed [20] that the Internet is the most used source of information when people need help:

Figure 4: Most used information source when people need help

This finding is all the more significant if we consider that the World Wide Web is now the largest resource of information [21]. The Internet thus has several strengths: it is the most used information system, the biggest resource of information, and also the most global and accessible one (free and mobile) [22]. The issue is how to use it wisely to get quality information.

If we look at how Internet users perceive the quality of information on the Internet, we can see that a high percentage of them are mindful of the data quality issue. Most of them nevertheless agree that, in general, the Internet is a reliable source of information [23]:

[20] Cf. Estabrook, L. / Witt, E. / Rainie, L. (2007). Information searches that solve problems. [online]. Available from: http://www.pewinternet.org/~/media//Files/Reports/2007/Pew_UI_LibrariesReport.pdf.pdf [Accessed 17 June 2009]
[21] Muñoz, C. / Moraga, A. / Piattini, M. (2008). Handbook of Research on Web Information Systems Quality. p.286
[22] Albarran, A.B. / Chan-Olmsted, S.M. / Wirth, M.O. (2006). Handbook of media management and economics. p.471
[23] Pierce, J. (2008). The World Internet Project. [online]. Available from: http://www.digitalcenter.org/WIP2009/WorldInternetProject-FinalRelease.pdf [Accessed 20 June 2009]
  • 20. Figure 5: How much of the information on the World Wide Web overall is generally reliable?

2.1 Data quality definition

“Data has quality if it satisfies the requirements of its intended use. It lacks quality to the extent that it does not satisfy the requirement. In other words, data quality depends as much on the intended use as it does on the data itself. To satisfy the intended use, the data must be accurate, timely, relevant, complete, understood, and trusted.” [24]

Data quality is generally defined according to six dimensions:
- Accuracy: the quality of being near to the true value [25]. Accuracy is the most important dimension [26].
- Timelessness: unaffected by time [27].
- Relevant: the degree to which search results meet the requirements or expectations implicit in the query [28].
- Complete: brought to a whole, with all the necessary parts or elements.
- Understood: perceived (an idea or situation) mentally.

[24] Olson, J. (2003). Data quality: The accuracy dimension. p.24
[25] Wordnet.princeton.edu. (2009). Accuracy definition. [online]. Available from: wordnet.princeton.edu/perl/webwn [Accessed 17 June 2009]
[26] Olson, J. (2003). Data quality: The accuracy dimension. p.3
[27] Wordnet.princeton.edu. (2009). Timelessness definition. [online]. Available from: wordnet.princeton.edu/perl/webwn [Accessed 17 June 2009]
[28] WhamTech. (n.d). Glossary of less-than-usual terms used in the Web site. [online]. Available from: www.whamtech.com/glossary.htm [Accessed 17 June 2009]
  • 21. Trusted: inclined to believe or confide readily.

Each of these dimensions can be accepted at a certain level. As previously said, everything depends on the intended use of the information. For example, a database with 70% accuracy may have value for some company departments (e.g. marketing, for estimations) because those 70% of the data are exploitable. On the other hand, it can be useless for others, e.g. an accounting department releasing a balance sheet that is 70% accurate.

Data quality is a complex topic, and additional dimensions can be included depending on the use of the data, such as: Accessibility, Accuracy, Amount of data, Applicability, Attractiveness, Availability, Believability, Completeness, Concise representation, Consistent representation, Cost effectiveness, Customer support, Currency, Documentation, Duplicates, Ease of operation, Expiration, Flexibility, Granularity, Interactive, Internal consistency, Interpretability, Latency, Maintainable, Novelty, Objectivity, Ontology, Organization, Price, Relevancy, Reliability, Reputation, Response time, Security, Specialization, Source's information, Timeliness, Understandability, Validity, Value-added. [29]

2.2 Data quality issues within businesses

As we saw previously, accuracy is the most important dimension of data quality. Data is at the heart of any good business or organization. Some companies, such as financial ones, live on information alone. The use of the Internet has increased the flow of information, and a company's data are now used by other companies to make decisions such as purchasing and selling.

[29] Muñoz, C. / Moraga, A. / Piattini, M. (2008). Handbook of Research on Web Information Systems Quality. p.138
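The fitness-for-use idea from section 2.1 (a database that is 70% accurate may serve marketing estimations but not an accounting department) can be sketched in a few lines of Python. This is only an illustration: the department thresholds, the function names and the sample records below are hypothetical assumptions and do not come from the cited sources.

```python
# Hypothetical minimum accuracy each department needs before data is usable.
# These thresholds are illustrative assumptions, not figures from the sources.
REQUIRED_ACCURACY = {
    "marketing": 0.60,   # rough estimations tolerate some inaccurate records
    "accounting": 0.99,  # a balance sheet needs near-perfect figures
}


def measured_accuracy(records, reference):
    """Share of records whose value matches the trusted reference value."""
    if not records:
        raise ValueError("no records to evaluate")
    correct = sum(1 for key, value in records.items() if reference.get(key) == value)
    return correct / len(records)


def fit_for_use(accuracy, department):
    """Data has quality for a department if it meets that department's threshold."""
    return accuracy >= REQUIRED_ACCURACY[department]


# Ten sample records, seven of which agree with the reference source.
reference = {i: f"city-{i}" for i in range(10)}
records = dict(reference)
records.update({7: "unknown", 8: "unknown", 9: "unknown"})

accuracy = measured_accuracy(records, reference)
for department in REQUIRED_ACCURACY:
    print(department, fit_for_use(accuracy, department))
```

The same data set passes the check for one department and fails it for the other, which is the point made above: quality is judged against the intended use, not in absolute terms.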
  • 22. So if company A provides poor-quality data that is afterwards reused by company B, a vicious circle begins in which the flow of biased information never stops. As Jack E. Olson mentions in his book “Data quality”: “Data is generated by more people, is used in the execution of more tasks by more people, and is used in corporate decision making more than ever before.” [30]

Data quality is critical. Even though databases are recognized as the most important asset, companies tolerate enormous inaccuracies in their databases. According to the same author, this issue is present not only within businesses but also in governmental organizations and educational systems:
- Businesses and organizations are aware of data issues;
- They all underestimate their consequences;
- They have no idea of the cost linked to those issues;
- They have no idea of the potential value of fixing the problem.

In his book, Jack E. Olson also gives an estimate of the loss associated with data quality, putting it at 15 to 25% of operating profit. These losses are of different kinds: transaction rework costs, costs incurred in implementing new systems, delays in delivering data to decision makers, lost customers through poor service, and lost production through supply chain problems.

These issues do not normally come from the data management system (a DMS is designed to answer a specific request). The failure mainly comes from its users. To avoid it, they need to be aware of three things:
- what the system's capabilities are;
- how to use it properly;
- how to interpret its results.

[30] Olson, J. (2003). Data quality: The accuracy dimension. p.5
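The 15 to 25% estimate quoted above translates into a simple calculation. The sketch below applies it to a hypothetical company; the function name and the sample operating profit are assumptions made for illustration, and only the two percentages come from the text.

```python
LOW_PERCENT = 15   # lower bound of the loss estimate quoted in the text
HIGH_PERCENT = 25  # upper bound of the loss estimate quoted in the text


def data_quality_loss_range(operating_profit):
    """Return the (low, high) estimated loss for a given operating profit."""
    if operating_profit < 0:
        raise ValueError("operating profit must be non-negative")
    low = operating_profit * LOW_PERCENT / 100
    high = operating_profit * HIGH_PERCENT / 100
    return low, high


# Hypothetical company with an operating profit of 2,000,000 (any currency).
low, high = data_quality_loss_range(2_000_000)
print(f"Estimated loss: {low:,.0f} to {high:,.0f}")
```

For an operating profit of 2,000,000, the range works out to between 300,000 and 500,000 lost to data quality problems.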
  • 23. The main remedy for this issue is a long-term strategy in which teams within the organization are trained in the concept of data quality management.
The concept of data quality is very relevant when dealing with search engines. Most of the search engines we know as consumers are commercial search engines. But as we know, the main objective of a commercial company is to make a profit, and many issues arise from this.
According to a study entitled "Findability"31, most businesses (62%) agree that finding information is critical; on the other hand, most of them do not know how critical it is, due to a general lack of awareness. It also shows that in nearly half of cases (49%) strategies are not defined:
Figure 6: Enterprise findability goal
and that proper goals are not clearly expressed. It draws the same conclusions as some authors on this topic32. As we saw, technology is not responsible for quality issues; rather, the use of technology and the interpretation of the retrieved information are sources of quality problems. This can be reduced by implementing methodologies such as:
– Putting in place a better information research management strategy33, mainly based on employee training. This means not only training employees on how to use technologies but also on how to develop a pro-efficient behavior when making
31 cf. The Association for Enterprise and Content Management. (2008). Findability: The Art and Science of Making Content Easy to Find. [online]. Available from: http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009] p.22
32 Olson, J. (2003). Data quality: The accuracy dimension. p.7-8
33 Kehoe, M. (2009). Overview of the Enterprise Search Market. [online]. Available from: http://www.ideaeng.com/tabId/98/itemId/181/Overview-of-the-Enterprise-Search-Market-2009.aspx [Accessed 17 June 2009]
  • 24. research. It means reconsidering the information process and participating in the improvement of the whole research information system (cf. chapter 2.3.4). Computer users expect too much from technologies, waiting to be fed the most rational solution, whereas it is not yet on the market;
– Implementing a more user-oriented research application. Studies show that too many libraries did not investigate this field enough, focusing on the size of their database rather than on how to retrieve the information34. This is one of the reasons why people move from libraries to the Internet as an information provider.
2.3 Origins of data quality issues: Garbage In Garbage Out
"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question." (Charles Babbage)
As we previously saw, data quality issues with search engines do not come from technology. They in fact come from:
– The one who wrote the content of the results: misspellings, no concrete sources to support the claims, no adoption of standards, advertisement;
– The one who types in the request (cf. chapter 3.3.6.1: Search engine use awareness).
The next parts will develop the first point in detail.
2.3.1 Poor data quality content: the Wikipedia example
Wikipedia is an easy example to illustrate the data quality issue with Internet content, and it introduces well the chapters that follow.
34 Cf. UCL. (2008). Information behaviour of the researcher of the future. [online]. Available from: http://www.bl.uk/news/pdf/googlegen.pdf [Accessed 19 June 2009] p.31
  • 25. Wikipedia is one of the greatest collaborative World Wide Web projects ever, but on the other hand it has a couple of drawbacks. These disadvantages mainly arise from an absence of data quality standards. Here are some of those points:
– Everybody can contribute and can do so anonymously, so in theory a 3-year-old child can write an article. According to Sara Baase: "Accuracy and quality are impossible. Truth does not come from populist free-for-alls. Some articles are biased and one sided"35;
– Some articles without reliable sources can be validated by an administrator; Internet users may then take the displayed information for granted;
– The success of Wikipedia: word of mouth;
– Wikipedia's popularity36 makes it rank first on Google for most requests. It has a PageRank of 9 out of 10, which corresponds to almost the maximum recognition Google can give to a website.
Figure 7: 1st and 2nd results for "data quality" are Wikipedia websites
2.3.2 Poor data quality content: the commercial example
In a study entitled "Of course it's true I saw it on the Internet!"37, aimed at understanding how American students conduct searches, the following question was asked: "List three major innovations developed by Microsoft over the past 10 years".
35 Baase, S. (2007). A gift of Fire. p.352
36 Baase, S. (2007). A gift of Fire. p.351
37 Graham, E. L./ Metaxas, P. T. (2003). Of course it's true I saw it on the Internet!: Critical thinking in the Internet. Available from: http://www.wellesley.edu/CS/pmetaxas/CriticalThinking.pdf [Accessed 17 June 2009]
  • 26. The survey was submitted to 180 college students in the United States during the school year 2000-2001. In answering, 63% used only one source of information: Microsoft's website. But is a commercial website a reliable, neutral and trustworthy source of information? One thing is sure: a company has no interest in criticizing itself on its own website, so it is highly probable that it will tend to sell itself rather than keep a neutral point of view. The commercial aspect of search engines will be taken up and developed further in the next chapters.
2.3.3 Metadata
Metadata is the key to understanding how search engines currently work and how to perform a good search. Commonly speaking, metadata is data about data. As an example, a librarian archives books by assigning a reference to each of them: the reference "AA1" corresponds to "Gone with the Wind". Each web page on the Internet has several pieces of metadata, such as the "title" of the page, the "keywords" associated with the page, the "description", etc.
Metadata issues mainly arise because the metadata does not represent all the data. The best example is image search. Today, when typing a request to look for pictures, we get as a result a strange cocktail of a bit of everything. The reasons in this case are a lack of metadata and an inappropriate use of it. As an example, most Internet users upload pictures without giving them any name, leaving just a number as identifier. This is an incredible amount of data which is unusable.
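As an illustration of what page metadata looks like in practice, the following minimal sketch reads the title, keywords and description of a web page using Python's standard html.parser module. The sample page is invented for the example (only the image name echoes the camera-style identifier discussed in this chapter), and real search engines use far richer signals than these three fields.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and the keywords/description <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.metadata[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html):
    """Return a dict with whichever of title/keywords/description are present."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Hypothetical page: it is described by its metadata, but the image inside it
# carries only a camera-generated file name and no descriptive text at all.
SAMPLE_PAGE = """<html><head>
<title>Data quality</title>
<meta name="keywords" content="data quality, accuracy">
<meta name="description" content="An introduction to data quality.">
</head><body><img src="P5170009.jpg"></body></html>"""

print(extract_metadata(SAMPLE_PAGE))
```

The page itself can be found through its title, keywords and description, whereas nothing in the markup says what the picture shows, which is the situation described above for image search.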
  • 27. Figure 8: A query made on Google Images with the keyword "P5170009"
This introduces another issue: findability.
2.3.4 Findability
"Findability Precedes Usability. In the alphabet and on the Web. You can't use what you can't find."38
Findability is the art and science of making content findable. The science is library science; the art is language arts and user interface design.39 Findability is more or less understood by businesses and too often confused with search.
38 Morville, P. (2005). Ambient Findability. p.111
39 The Association for Enterprise and Content Management. (2008). Findability: The Art and Science of Making Content Easy to Find. [online]. Available from: http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009] p.9
  • 28. Figure 9: How well is findability understood in your organization?
Findability is not only about searching but also about how to make information findable. Most businesses agree on this point: findability is critical to the organization's business goals and success (62%).
Figure 10: How critical is findability to your Organization's Business Goals and Success?
However, as a study on findability shows40, and as we will see later in Chapter 3, findability is not well defined and implemented within companies, mainly due to a management failure.41 As Peter Morville describes it in his book "Ambient Findability", findability defies classification. It flows across the borders between design, engineering, and marketing. Everybody is responsible, and so we run the risk that nobody is accountable.
Findability is everyone's concern within a company. For example, when designing the company website there are different actors: designers, engineers, information architects, brand architects, marketing department. Another example is a secretary or an archivist storing documents. He or she has to think about how to make those materials easy to find for everyone (by choosing the right metadata, the right technology); this includes a
40 Cf. The Association for Enterprise and Content Management. (2008). Findability: The Art and Science of Making Content Easy to Find. [online]. Available from: http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009]
41 Morville, P. (2005). Ambient Findability. p.111
  • 29. collaboration with all departments within a company. If not, those contents are not findable and are in a certain way lost. Peter Morville gives two solutions: cultivate cross-functional collaboration and, on an individual level, learn how to be pro-efficient and to go beyond one's job responsibility.
2.4 Data quality solutions
"A problem well defined is a problem half-solved." (John Dewey)
Data quality issues come from:
- Garbage In Garbage Out;
- No check of information accuracy.
Solutions are then easy to find:
- Learning how to use search tools;
- Checking the information.
2.4.1 Learning how to use search tools
The main issue with Internet users is that they stick to the "Principle of Least Effort" formulated by George Kingsley Zipf: "Each individual will adopt a course of action that will involve the expenditure of the probably least average of his work (least effort)."42 And according to Calvin Mooers, "people will not seek information that makes their jobs harder (even if it may benefit the organization they work for)".43 Studies in fact show that users sacrifice information quality
42 Case, D. O. (2007) Looking for information. p.151
43 Morville, P. (2005). Ambient Findability. p.54
  • 30. for accessibility44. So users do not care about quality; they are interested in easy-to-access information. This is mainly why the Google Advanced Search option is rarely used: people equate "Advanced" with "complex"45, whereas Advanced should be the right way to search.
2.4.2 Check out the information: the Triangle method
Commonly used in the educational system, the triangle method consists in locating three independent sources that point to the same answer in order to produce the most accurate information. This method does not distinguish between quality websites and poor-quality ones, but it helps in checking the information.
Applying this concept can be more powerful than we might imagine. As an example, one can take a recent news event such as the riots in Tibet in 2008. If we look at the news provided from the United Kingdom46 and Germany47 as symbols of Western media, Tibetans were suffering true chaos in March 2008.
On the other hand, according to CCTV (China Central Television)48, some information posted by Western media was totally biased and incoherent. And when looking at the evidence put forward by the Chinese media, it actually proves them right49. The inaccuracies came from the fact that
44 Hirsh, S./Dinkelacker, J. (2004). Seeking Information in order to produce information: an empirical study at Hewlett Packard Labs. p.816
45 Olausson, A. M. (2007). Advanced Search: Is the name a problem?. [online]. Available from: http://digital-lifestyles.info/2007/09/21/advanced-search-is-the-name-a-problem/ [Accessed 17 June 2009]
46 BBC. (2008). Tibetans describe continuing unrest. [online]. Available from: http://news.bbc.co.uk/2/hi/asia-pacific/7300312.stm [Accessed 17 June 2009]
47 Berliner Morgenpost. (2008). China rüstet sich für « die entscheidende Schlacht ». [online].
Available from: http://www.morgenpost.de/printarchiv/politik/article169230/China_ruestet_sich_fuer_die_entscheidende_Schlacht.html [Accessed 17 June 2009]
48 XinHua. (2008). Commentary: Facts about Tibet should not be distorted. [online]. Available from: http://news.xinhuanet.com/english/2008-03/24/content_7847789.htm http://news.xinhuanet.com/english/2008-03/24/content_7847789_1.htm [Accessed 17 June 2009]
49 Beijing Review. (2008). Dialogue: Media Coverage on Tibet. [online]. Available from: http://www.bjreview.com.cn/special/txt/2008-03/22/content_107054.htm [Accessed 17 June 2009]
  • 31. Western media did not know the Chinese and Tibetan cultures and languages well enough and were attaching captions to images which were not true. In this configuration, looking at three independent sources is critical. Who would have thought, for example, that Western media could be wrong?
Figure 11: Triangle method application
Reliable sources are thus a necessary condition for data accuracy, but this condition is not sufficient: one also needs to look at three independent and reliable sources of information which point to the same answer.
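The triangle method can be expressed as a small check: an answer is retained only when at least three distinct sources report it. The sketch below is a simplified illustration; the function name, the source labels and the sample answers are hypothetical, and the code cannot judge whether sources are truly independent or reliable, which remains the searcher's responsibility.

```python
from collections import defaultdict


def triangulate(reports, required_sources=3):
    """Return the answers reported by at least `required_sources` distinct sources.

    `reports` is an iterable of (source, answer) pairs. Answers are compared
    after trimming whitespace and lowercasing, a deliberately naive
    normalization chosen for this illustration.
    """
    sources_by_answer = defaultdict(set)
    for source, answer in reports:
        sources_by_answer[answer.strip().lower()].add(source)
    return {
        answer: sorted(sources)
        for answer, sources in sources_by_answer.items()
        if len(sources) >= required_sources
    }


# Hypothetical reports: three sources agree, a fourth one disagrees.
reports = [
    ("source_a", "Answer X"),
    ("source_b", "answer x"),
    ("source_c", "answer x "),
    ("source_d", "answer y"),
]
print(triangulate(reports))
```

Counting distinct sources rather than reports means that one website repeating the same claim several times does not count as confirmation.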
  • 32. 2.5 Chapter 2: key points
- The Internet is the most used, largest, most global and most accessible source of information;
- The majority of Internet users consider the Internet as a reliable source of information and are aware of quality issues;
- Accuracy is the most important dimension in data quality and can be accepted in some cases with a certain level of tolerance;
- Some companies live only on information;
- A company's data is used by other companies to make decisions;
- Data quality issues affect all kinds of organizations;
- The loss associated with data quality is estimated at 15 to 25% of operating profit;
- In most cases the database management system is not the cause of data quality issues;
- A majority of businesses do not have proper goals defined regarding the findability of their material within their research environment;
- Cultivating cross-functional collaboration and pro-efficient behavior within companies is the key to setting up good information retrieval systems;
- Making content findable is the job responsibility of everyone within a company;
- People will not seek information that makes their jobs harder (even if it may benefit the organization they work for);
- Users sacrifice information quality for accessibility;
- People equate "Advanced" with "complex", whereas Advanced is the right way to search;
- Accuracy issues can be reduced by checking the information against three independent and reliable sources.
  • 33. Chapter 3: Search engines dependency 33
  • 34. As previously seen, search is the second most popular activity on the Internet, and search engines are the most appropriate tool for it. Before introducing the concept of search engine dependency, it is useful to know how the search engine market is configured. Even if Google is recognized as the leading brand in this field, its superiority may not be worldwide.
A strong effort has been made in this thesis to make it as global as possible. Most publications in this area have been written considering the American and European markets as a representative sample of the whole market. The rise of India and China in the technological world and the increase of information on the Internet now allow us to get information about the Asian market. Even if most new technologies come from the United States, it is worth extending the study to Asia to get a more representative and exhaustive panel.
3.1 Search engine categories
Search engines can be divided into two categories:
- Commercial search engines, available for free to the general public, mainly in exchange for the display of advertisements;
- Enterprise search engines for businesses. They are generally paid services, free of advertisement and customized for a specific need.
3.1.1 Commercial search engines
Commercial search engines are divided into four categories:
- Standard: the best-known search engines, such as www.google.com, www.bing.com, http://www.ask.com/. They look for any kind of information across the Internet and are characterized by a very light interface (mostly text-based applications):
  • 35. Figure 12: Ask search engine home page
- Portals: portals are a mix between standard search engines and directories. Unlike search engines, directories use human beings instead of robots to index website addresses. In theory (leaving aside the commercial aspect), directories should provide quality information rather than quantity.50 Portals are therefore characterized by a lot of information on their home page, including the search engine function. The best-known portal is Yahoo.
Figure 13: Yahoo home page
- Specialized search engines: they belong to a subcategory of the first group and are also called vertical search engines. A vertical search engine searches the information sources of one industry or one kind.51 Specialized search engines crawl only a restricted area and not the entire web. For example, they can search only a specific website or only a specific kind of document (books, images, .pdf documents, videos…). Although specialized search engines are not a revolution in themselves (most of them are a filter on bigger search engines), they nevertheless find their
50 Friedman, B. G. (2004). Web Search Savvy. p.21
51 Wang, W. (2007). Integration and Innovation Orient to E-Society Volume 1. p.666
  • 36. place when standard search engines provide too many results for a given request.
Figure 14: An example of vertical search with Yahoo Images
- Semantic search engines: most search engines on the market are based on keywords and document popularity (for example Google PageRank) without taking into account the real content52. The idea behind semantic search is to understand the hidden meaning of the information. A recent example of such a search engine, called "Wolfram Alpha", has just come onto the market; it is described as a "knowledge engine"53 designed to give you answers to your request rather than directing you to a website which may have them. Semantic search engines belong to the Web 3.0 generation, where machines interpret the meaning of the data.54
Figure 15: A semantic search engine: Wolfram Alpha
52 Priss, U./Corbett, D./Angelova, G. (2002). Conceptual structures. p.92
53 Valentiner, Z. (2009). New search tool on the block: Wolfram Alpha. [online]. Available from: http://www.mndaily.com/blogs/tech-corner/2009/05/20/new-search-tool-block-wolframalpha [Accessed 17 June 2009]
54 Cf. Sankar, K./Bouchard, S./Mancini, D. (2009). Enterprise Web 2.0 Fundamentals. p.161
  • 37. 3.1.2 Enterprise search engine (ESE)
Enterprise search engines are dedicated to searching within a company's environment, such as the Internet, Intranet, Customer Management System, databases, wikis and software applications. Their use is clear when employees are looking for information which is not public or want to get pertinent information within their own environment. Enterprise search engines have more or less the same technology and function as commercial web search engines; they just target a specific group rather than a mass public audience55.
3.2 Search engine market
3.2.1 Commercial search engine market
The commercial search engine market is segmented as follows:
Figure 16: Top 10 Worldwide Search December 2007
55 cf. The Association for Enterprise and Content Management. (2008). Findability: The Art and Science of Making Content Easy to Find. [online]. Available from: http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009]
  • 38. - Google is the clear leader with more than 60%;
- Yahoo holds a comfortable second position with more than 10%;
- Three other major search engines share the 3rd, 4th and 5th places with market shares from 2.4 to 5%: Baidu, Microsoft and Naver;
- Some specialized search engines are present in the top 10.
As mentioned above, in 2007 the top 10 search websites showed an interesting market, with the presence of:
- 2 specialized search engines, eBay and Alibaba.com;
- 4 Asian search engines: Baidu, NHN, Yandex and Alibaba.com.
This clearly shows the presence of Asian technologies. Moreover, Baidu, NHN and Yandex are nationally oriented, as we will see later in chapter 3.2.7.
3.2.2 Commercial search engine market: Consumer behavior
A study made in the United States shows that Internet searchers are confident, satisfied and trust search engines56; some of those results are confirmed by a Taiwanese study57:
- 92% are confident about their searching skills;
- 87% have a successful search experience;
- 68% believe that search engines are a fair and unbiased source of information;
- 44% of searchers say they regularly use a single search engine, 48% use just two or three, 7% use more than three;
56 Fallows, D. (2005). Search engine users. [online]. Available from: http://www.pewinternet.org/~/media//Files/Reports/2005/PIP_Searchengine_users.pdf.pdf [Accessed 17 June 2009] p.2
57 Insight Xplorer. (2006). 創市際市場研究顧問. [online]. Available from: http://www.insightxplorer.com/specialtopic/co_info_acquisition.html [Accessed 17 June 2009]
  • 39. 62% are not aware of a distinction between commercial and non-commercial results.
Moreover, according to a study entitled "Surveying the Digital Future"58:
Figure 17: How Much Of The Information On the Internet Do You Think is Reliable and Accurate?
A large majority of them see it as a reliable and accurate source of information over time.
- According to another study, 22% of Internet users have a search engine such as Google or Yahoo as their home page. This share has doubled since 2005.59
- Regarding search engine reliability and accuracy, 51% in 2007 say that most or all of the information produced by search engines is reliable and accurate, down from 62% in 2006;
- Internet users find a high degree of reliability and accuracy on their favorite web sites: 81% in 2005, 83% in 2006 and 83% in 2007;60
58 UCLA Center for Communication Policy. (2004). Surveying the Digital Future. [online]. Available from: http://www.digitalcenter.org/downloads/DigitalFutureReport-Year4-2004.pdf [Accessed 18 June 2009]. p.39
59 Center for the Digital Future (2008). Annual Internet Survey by the Center for the Digital Future. [online]. Available from http://www.digitalcenter.org/pdf/2009_Digital_Future_Project_Release_Highlights.pdf [Accessed 19 June 2009] p.4
60 Center for the Digital Future (2008). Annual Internet Survey by the Center for the Digital Future. [online]. Available from http://www.digitalcenter.org/pdf/2009_Digital_Future_Project_Release_Highlights.pdf [Accessed 19 June 2009] p.5
  • 40. In 2007, 80% of Internet users considered that most or all of the information posted by well-known media such as the New York Times and CNN is reliable and accurate, up from 77% in 2006.
It seems that users of commercial search engines have a positive search experience. Even if they recognize data quality issues, they seem not to understand where those problems come from. It would therefore be worthwhile to inform them more about the commercial aspect of free search engines.
3.2.3 Enterprise Search Engine market
The Enterprise Search Engine market is far more confused and crowded61 than the commercial one. There is not much information on it, but what we can say is that the actors are different and that enterprise search engines are customized for a specific use. In a book entitled "Practical aspects of Knowledge Management", written in 2008 by Takahira Yamaguchi, a ranking of the main actors in this field is given62:
1st autonomy.com
2nd Fastsearch.com
3rd Endeca.com
As we can see, these three companies were not listed in the commercial search engine ranking. However, some commercial search engine firms are present in this market, such as Google with Google Search Appliance and Microsoft with Microsoft Search Server.
61 Feldman, S. (2005). Desperately seeking search. [online]. Available from: http://www.kmworld.com/Articles/Editorial/Feature/Desperately-seeking-search-9665.aspx [Accessed 17 June 2009]
62 Yamaguchi, T. (2008). Practical aspects of Knowledge Management. p.41
  • 41. 3.2.4 Enterprise Search Engine market: Consumer behavior
The enterprise search engine market has a different configuration from the commercial one. However, the main protagonists such as Google and Microsoft are still present63. In contrast to users of commercial web search engines, enterprise search engine users are mostly disappointed by their search experience.
Figure 18: Enterprise search satisfaction
It is quite striking that almost half (49%) have a negative image of searching for information with their enterprise search tools. The major reasons for this are:
– The lack of training and consulting on those search tools within organizations64;
– The expectation of results as pertinent as those of commercial web search engines.
Figure 19: Influence of the consumer web on enterprise search tools
63 Kehoe, M. (2009). 2009 Overview of the Enterprise Search Market. [online]. Available from: http://www.ideaeng.com/tabId/98/itemId/181/Overview-of-the-Enterprise-Search-Market-2009.aspx [Accessed 17 June 2009]
64 cf. The Association for Enterprise and Content Management. (2008). Findability: The Art and Science of Making Content Easy to Find. [online]. Available from: http://www.aiim.org/Research/MarketIQ/Findability-7-16-08.aspx [Accessed 17 June 2009] p.36
  • 42. A vast majority (82%) agree that their consumer web experience of looking for information on the Internet influences their expectations regarding the implementation of such technology within companies. As Ron Miller (cited in the following study) explained: "On the web, search engines like Google have the advantage of searching the entire web. Therefore, the likelihood of finding query matches is much greater than in the enterprise where the number of possible right answers is much smaller, and could in fact be found in just a single document. (Of course finding more results doesn't necessarily mean finding right ones, but that's another issue altogether.)"
It is then not surprising to see that most enterprise search engine users are not successful in finding what they are looking for:
Figure 20: Success rate of finding the information with enterprise search tools
According to Ron Miller, the problem is as follows: "I don't think the technology is failing us, I think it's the way we are using the technologies," but he adds, "If I can't find my content, it doesn't exist."65
This part clearly highlights that searchers within companies confuse commercial search engines with enterprise search engines, directly associating one with the other. It also shows the lack of training in those technologies and thus confirms the lack of technology literacy of Internet users. Moreover, it clearly defines what the market wants: simple and easy-to-use applications.
65 Miller, R. (2009). Unlock Power Enterprise Search. [online]. Available from: http://byronmiller.typepad.com/UnlockPowerEnterpriseSearch.pdf [Accessed 17 June 2009] p.5
  • 43. 3.2.5 The commercial search market distribution
Regarding the distribution of search use on the Internet, we can see that the block "Europe + North America" represents more than half of the market with 55%. The Asia-Pacific area is well represented with one third of the market.
Figure 21: Worldwide Search by Region
North American and Asian Internet users perform more or less the same volume of search, whereas it is in Europe and Latin America that Internet users search the most per capita. This part will be developed further in chapter 5.1.4: Google dependency state.
3.2.6 The commercial search engines in the world
As mentioned in chapter 3.2.1, 6 out of 10 searches on the Internet are made on Google. However, this does not mean that in each country in the world 60% of the population uses Google.
  • 44. Figure 22: Search engine leaders (>50%) per country, personal estimation66
The world is not covered entirely by Google. There are 7 other leaders: Yahoo, Yandex (Mail.ru), Baidu, Microsoft, Naver, Seznam and Leit.is. Almost all of the American continent uses Google, as do Europe, Northern Africa, Southern Africa, Australia and India: in a word, almost all countries which have strong links with Anglo-Saxon culture.
The strong presence of Yandex in Eastern Europe (ex-Soviet countries) and Russia could suggest a possible "boycott of American technologies" and support for Russian technologies. The recent partnership between Yandex (the main search engine in Russia) and the browser Firefox increases those suspicions67.
66 Alexa Web. (n.d). Alexa Top 500 Global Sites. [online]. Available from: http://www.alexa.com/topsites [Accessed 17 June 2009]
67 cf. Houste, F. (2009). Russie: Yandex sera le moteur de recherche par défaut de Firefox. [online]. Available from: http://www.search-engine-feng-shui.com/2009/01/russie-yandex-sera-le-moteur-de-recherche-par-defaut-de-firefox/ [Accessed 23 January 2009]
cf. Schwartz, B. (2009). Firefox Drops Google For Yandex In Russia, But Big Loser May Be Rambler. [online]. Available from: http://searchengineland.com/firefox-drops-google-for-yandex-in-russia-but-big-loser-may-be-rambler-16107 [Accessed 23 January 2009]
  • 45. The same observation can be made in China. The recent advertisements broadcast by Baidu68 (the search engine leader in China) point in the same direction, clearly showing the will to get rid of foreign search engines.69
The Russian and Chinese cases contradict the concept mentioned in the book "Winners, Losers and Microsoft", which says that the best product always wins70. The search engine market is therefore not a rational one.
Information regarding the Caribbean and Central Africa is hard to find and not very relevant, taking into account that the Internet is not yet well established there. On the other hand, the Pacific region is quite interesting because it contains all the "Tigers" (Taiwan, Thailand...), which are all in red: Yahoo.
In conclusion, the search engine world is divided into two parts:
- The Google planet, which is composed of all the Anglo-Saxon countries as well as countries which have strong links with the United States or Great Britain. The Czech Republic and Iceland seem to be only a matter of time71.
- The Asia-Pacific regions: Asia is composed of many countries and therefore many cultures. Among them we can identify four players:
o Yandex (Mail.ru), which dominates all the ex-Soviet countries;
o Baidu, which has total control over China;
o Naver (NHN Corporation), a 100% South Korean product which is the best example that search engines work by culture;
68 Baidu. (2006). Baidu advertisement. [online]. Available from: http://www.youtube.com/watch?v=EPnmsFl__nU [Accessed 17 June 2009]
69 cf. Einhorn, B. (2007). Baidu Thinks It Can Play in Japan. [online]. Available from: http://www.businessweek.com/globalbiz/content/feb2007/gb20070215_649662.htm?chan=globalbiz_asia_technology [Accessed 23 January 2009]
cf. Grallet, G. (2009). Baidu, un autre Google s'éveille. [online]. Available from: http://www.lexpress.fr/actualite/high-tech/baidu-un-autre-google-s-eveille_734826.html [Accessed 23 January 2009]
cf. Shijun, Z./Peng, N./Weifeng, X. (2006). 时尚中国—网动中国英. p.45
70 Liebowitz, S. J./Margolis, S. (1999). Winners, Losers and Microsoft
71 cf. Rafat, A. (2008). Czech Portal Seznam Could Fetch $900 Million; Google, Apax, Warburg and Others in Fray. [online] Available from: http://www.washingtonpost.com/wp-dyn/content/article/2008/08/15/AR2008081502517.html [Accessed 23 January 2009]
cf. Mar Hauksson, K. (2007). Global search report 2007 [online]. Available from: http://www.e3internet.com/downloads/global-search-report-2007.pdf [Accessed 23 January 2009] p.8
  • 46.
      o Yahoo, which is the leader in almost all the Asian "Tiger" countries.

Since Yahoo is an American technology, how can we explain its domination in Asia? The reason is mainly cultural: Yahoo is a flashy portal, and Asian Internet culture judges the quality of a website by the number of animations on it.72 Another explanation could be the leading presence of Yahoo in Japan, which can influence the Tiger countries. Moreover, Japan has one of the highest Internet penetration rates per capita in the world.73

3.2.7 Commercial search engine leaders presentation

Knowing the search engine leaders and the services they provide is critical to understanding the search engine dependency concept. Here is a list of the main commercial search engine players:

Google: 74
Created in 1998 in the United States.
Physically present in 34 countries around the world.
Services provided: News, Blogs, Images, Videos, Maps, Mail services, Social networks, e-commerce, Online advertising…
Languages supported: more than 65.
Figure 24: Google logo
Figure 23: Search engine market in the USA, source: Hitwise, February 2009

72 cf. Tobin, R./Hotchkiss, G./Lee, P. (2008). Chinese Search Engine Engagement. [online]. Available from: http://www.enquiroresearch.com/download-research-whitepapers.aspx [Accessed 17 June 2009] p.28
73 Internet World Stats. (2009). Internet Usage in Asia. [online]. Available from: http://www.internetworldstats.com/stats3.htm [Accessed 17 June 2009]
74 Miller, M. (2006). Googlepedia. p.11
  • 47. Yahoo ("Yet Another Hierarchical Officious Oracle"): 75
Created in 1994 in the United States, it started as a directory and later became an Internet portal.
Physically present in 20 countries around the world.
Services provided: News, Business directory, Maps, Videos, Images, Online advertising, Mail services, Jobs, Questions/Answers…
Languages supported: more than 20.
Figure 25: Yahoo logo
Figure 26: Japanese search engine market, source: webcreate.ga-pro.com, May 2009

Baidu: 76
Created in 2000 in China.
Physically present in China and in Japan.
Services provided 77: News, Business directory, Maps, Music, Videos, Images, Online advertising, Social networking…
Languages supported: 2 (Chinese and Japanese).
Figure 28: Baidu logo
Figure 27: Chinese search engine market, source: China IntelliConsulting Corp., September 2008

75 Yahoo Inc. (n.d.). Company Overview. [online]. Available from: http://yhoo.client.shareholder.com/press/overview.cfm [Accessed 17 June 2009]
Yahoo Inc. (n.d.). Yahoo dans le monde. [online]. Available from: http://world.yahoo.com/?c=fr [Accessed 17 June 2009]
76 Shijun, Z./Peng, N./Weifeng, X. (2006). 时尚中国—网动中国英. p.45
Baidu Japan Inc. (n.d.). Baidu (バイドゥ)会社情報 - 会社概要. [online]. Available from: http://www.baidu.jp/info/corp/data.html [Accessed 17 June 2009]
77 Baidu Inc. (n.d.). Baidu products. [online]. Available from: http://ir.baidu.com/phoenix.zhtml?c=188488&p=irol-products [Accessed 17 June 2009]
  • 48. Microsoft:
Microsoft's main search engine has been named "Bing" since June 2009. As search is not Microsoft's core activity, it is difficult to describe. Internet users do not go to Bing directly; rather, they reach it through other Microsoft sites and services such as Hotmail.
Microsoft is physically present all over the world.
Services provided: News, Social networking, Blogs, Mail services, Toolbar…
Languages supported: 78 41
Figure 29: Bing logo

Naver: 79
Created in 1999 in South Korea. Naver is an Internet portal.
Physically present in South Korea, China, Japan and the United States.
Services provided: News, e-commerce, Social networking, Blogs, Real-time information, Books, Mail services, Toolbar.
Language supported: Korean.
Figure 31: Naver logo
Figure 30: Korean search engine market, source: Koreanclick, July 2007

Yandex: 80
Created in 1997 in Russia.
Yandex is physically present in Russia, Ukraine and the United States.

78 Microsoft. (n.d.). Préférences Bing. [online]. Available from: http://www.bing.com/settings.aspx?sh=2&FORM=WIWA [Accessed 17 June 2009]
79 NHN Corporation. (n.d.). NHN Corporation. [online]. Available from: http://www.nhncorp.com/ [Accessed 17 June 2009]
  • 49. Services provided: News, e-commerce, Social networking, Blog search engine, Maps, Dictionary, Mail services, Photos, Website, Videos, Professional network, Online payment service, Online advertising.
Languages supported: Russian, Ukrainian and English.
Figure 32: Yandex logo
Figure 33: Search engine market in Russia, source: LiveInternet.ru, 31 December 2008

Seznam: 81
Created in 1996 in the Czech Republic. Seznam is an Internet portal.
Physically present in the Czech Republic.
Services provided: Search, Business directory, Images, Mail services, Online advertising, e-commerce, News, Social network, Jobs, Online games.
Language supported: Czech.
Figure 34: Seznam logo
Figure 35: Search engine market in the Czech Republic, source: navrcholu.cz, June 2008

Leit.is: 82
Leit.is is an Icelandic Internet portal created in 1999.
It is physically present in Iceland.
Services provided: Images, Music…

80 Yandex inc. (2008). Russia’s largest internet search engine and a leading internet and technology company. [online]. Available from: http://download.yandex.ru/company/mini_book_v19.pdf [Accessed 17 June 2009]
81 Seznam inc. (n.d.). Vize firmy | O společnosti Seznam.cz. [online]. Available from: http://firma.seznam.cz/cz/vize-firmy.html [Accessed 17 June 2009]
82 Leit.is. (n.d.). Leit.is - Um leit.is :: Um leit.is. [online]. Available from: http://www.leit.is/umleit/ [Accessed 17 June 2009]
  • 50. Languages supported: Icelandic and English.
Figure 37: Leit.is logo
Figure 36: Search engine market in Iceland, source: statice.is, 2007

As we can see, none of these search engine leaders is a simple search engine anymore. All of them provide a range of services linked to their search activity. Moreover, they all have roughly ten years or more of experience in the search field. The ones accumulating the largest market shares are the ones that operate internationally.

3.2.8 Commercial search engine complexity

As previously mentioned, commercial search engines no longer provide only a search experience. They are all moving toward a personalized interface with a set of associated services. In fact, they are turning into a personal desktop environment where, by creating a simple free account, you can access your emails, a search engine, your personal documents and an office software suite (spreadsheet, slides, word processor). iGoogle is a good example of this:
  • 51. Figure 38: An example of a customized interface on iGoogle

It is like an operating system (Google) within the operating system (Microsoft Windows, Linux, Mac OS). In such a configuration, commercial search engines provide more attractive services because their tools are more intuitive than those available within companies. The frustration of company employees is therefore understandable: for them, the consumer technology market is moving faster than the business one.

3.2.9 Search engine market shares configuration

A study entitled "Global Search Report 2007" 83, carried out in 2007, gives a clear view of the market. It shows that the configuration of each market is always the same:

83 cf. Wilsdon, N. (2007). Global Search Report 2007. [online]. Available from: http://www.e3internet.com/downloads/global-search-report-2007.pdf [Accessed 23 January 2009]
  • 52. Figure 39: Search engine market shares in 2007 for the Czech Republic

It is very rare to find a country where there is close competition among search engines. Even if things change from one day to the next in the high technology sector, you often find this configuration, in which the first search engine leads its followers by more than 30 points. When a search engine gets more than 50% of the market, it is adopted as a standard. This trend seems common in the software industry, where people seem to look for a standard used by all. This is the case for operating systems, browsers and e-learning. The explanation of such success within a population can be found in word of mouth: is this not how Google became so successful? Who has never heard sentences such as "you just have to Google it"? Google is nowadays even in dictionaries as a verb.84

Markets are also characterized by many small local search engines which, if original enough, are bought by the biggest ones and otherwise disappear quickly (examples appear in the news every month). In the short term the only key to success seems to be advertising, but in the long run you need the technology behind it in order to compete.

3.2.10 Search engines competition

84 cf. Merriam Webster. (2001). Google - Definition from the Merriam-Webster Online Dictionary. [online]. Available from: http://www.merriam-webster.com/dictionary/google [Accessed 17 June 2009]
  • 53. Google was created in 1998 and was not a pioneer in the field of search engines. In a short period of time Google managed to take the lead, and among the pioneers in this field only Yahoo (created in 1994) is still in place. Even if Google has a dominant position on the market, it will take it a long time to become number one in all countries (as we saw, this market is not rational, mainly for political and cultural reasons). This situation in fact gives hope and time to its competitors.

Yahoo is still in discussion with Microsoft, which wants to buy Yahoo's search technologies. One can understand how strategic such an acquisition could be: Yahoo has the search knowledge, and Microsoft has the funds as well as the software ownership. As for Baidu, we cannot clearly see how it could compete against Google outside China.

What about newcomers? Starting from nothing, they could perhaps beat the famous search engines in a short period of time. This could have been the case for services such as Cuil, launched in summer 2008, which received a lot of publicity in the news.85 But the search engine market is a very ungrateful world where visitors give no more than one chance: the product works or it does not. In the case of Cuil, it did not.

"An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it." 86

Users want the information as soon as they can get it. When you move from Google to another search engine, you are often unforgiving: at the first result which does not fit your expectations, you go back to Google. But is the search engine wrong, or is it simply responding differently from what you are used to?

In conclusion, it is hard to say how Google could lose its dominant position. So far, only one company has managed to open such a gap in the world of

85 cf. Arrington, M. (2008). Cuil On BusinessWeek's Most Successful of 2008 List. Huh?. [online]. Available from: http://www.techcrunch.com/2008/12/29/cuil-on-businessweeks-most-successful-of-2008-list/ [Accessed 17 June 2009]
86 Mooers, C. N. (1959). A panel discussion at the Annual Meeting of the American Documentation Institute. 24 October.
  • 54. search engines, and that company is Google itself, at a time when everything on the Internet still had to be created. However, a new search technology is coming up more and more often in this field: semantic search.

3.3 Search engine dependency aspect

As mentioned in the introduction, search engine dependency is the fact that people use only one search engine, and therefore only one way of processing data, when looking for information on the Internet.

3.3.1 Search engines dependency proves

The sources used for this part come from panels of Canadian 87, French 88 and Belgian 89 students. Some other information regarding Germany, China (Hong Kong) 90 and the United States 91 has also been used. These studies were carried out on different panels (students, workers such as researchers, and households) and reached the following conclusion: the search engine is the first tool used when looking for information on the Internet. The surveys carried out on students also show that most of them did not receive enough training on how to look for information.

87 cf. Crepuq. (2003). Etude sur les connaissances en recherche documentaire des étudiants entrant au 1er cycle dans les universités québécoises. [online]. Available from: http://www.crepuq.qc.ca/documents/bibl/formation/etude.pdf [Accessed 18 June 2009]
88 cf. Université de Lyon. (2007). De la documentation au plagiat. [online]. Available from: http://www.compilatio.net/files/sixdegres-univ-lyon_enquete-plagiat_sept07.pdf [Accessed 18 June 2009]
89 cf. EduDoc. (2008). Enquête sur les compétences documentaires et informationnelles des étudiants qui accèdent à l'enseignement supérieur en Communauté française de Belgique. [online]. Available from: http://www.edudoc.be/synthese.pdf [Accessed 18 June 2009]
90 cf. Leung, H. W. 梁漢榮. (2004). A study of computer science students' conceptions of information literacy and their experiences in information search process and use. [online]. Available from: http://hub.hku.hk/handle/123456789/30758 [Accessed 18 June 2009]
91 cf. Enquiro. (2004). Search Engine Usage in North America. [online]. Available from: http://www.enquiroresearch.com/download-research-whitepapers.aspx [Accessed 18 June 2009]