Internet Research Techniques
Internet Research Challenges
Finding information on the Internet that is relevant, useful, current and credible can be challenging.
Information on the Internet is:
• decentralized - thousands of networks are involved
• disorganized - no central index or database exists
• dynamic - changing every minute of every day
• expanding rapidly
• not subject to traditional pre-publication checks and balances
• not always authentic or accurate
• not always predictable - resources can disappear or change suddenly
Searching tools and techniques are:
• numerous and varied
• not standardized
• constantly changing
Cataloguing Internet Resources
Thousands of organizations are engaged in cataloguing internet resources. Addresses and
descriptions of sites are collected in databases which can then be searched by users. These
databases are developed by either:
• Human collection, review and classification of internet sites. This method usually
results in smaller but higher quality databases because of the application of human
judgement.
• Automated collection and classification of internet sites using programs called
spiders or crawlers. This method produces larger databases but often lacks the
quality found in human-generated databases.
HOW TO FIND INFORMATION ON THE INTERNET
There are a number of basic ways to access information on the Internet:
1. Go directly to a site if you have the address
2. Browse
3. Explore a subject directory
4. Conduct a search using a Web search engine
5. Explore the information stored in live databases on the Web, known as the "deep
Web"
6. Join an e-mail discussion group or Usenet newsgroup
Each of these options is described below.
1. GO DIRECTLY TO A SITE IF YOU HAVE THE ADDRESS
If you know the Internet address of a site you wish to visit, you can use a Web browser to
access that site. All you need to do is type the URL in the appropriate location window. URL
stands for Uniform Resource Locator. The URL specifies the Internet address of the
electronic document. Every file on the Internet, no matter what its access protocol, has a
unique URL. Web browsers use the URL to retrieve the file from the host computer and the
directory in which it resides. This file is then downloaded to the user's computer and
displayed on the monitor.
This is the format of the URL: protocol://host/path/filename
For example:
http://www.house.gov/agriculture/schedule.htm - a hypertext file on the Web
ftp://ftp.uu.net/graphics/picasso - a file at an FTP site
telnet://locis.loc.gov - a Telnet connection
Any of these addresses can be typed into the location window of a Web browser.
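The URL format above can be illustrated with a short Python sketch. It uses the standard library to split the three sample addresses into protocol, host, path and filename. The function name describe_url is an assumption made for this illustration only.

```python
from urllib.parse import urlsplit
import posixpath

def describe_url(url):
    """Split a URL into the protocol, host, path and filename parts."""
    parts = urlsplit(url)
    # The last path segment is treated as the filename, the rest as the directory path.
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        "path": directory,
        "filename": filename,
    }

for example in (
    "http://www.house.gov/agriculture/schedule.htm",
    "ftp://ftp.uu.net/graphics/picasso",
    "telnet://locis.loc.gov",
):
    print(describe_url(example))
```

The Telnet address has no path or filename, so those two parts come back empty.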
2. BROWSE
Browsing home pages on the Web is a haphazard but interesting way of finding desired
material on the Internet. Because the creator of a home page programs each link, you never
know where these links might lead. High quality starting pages will contain high quality links.
3. EXPLORE A SUBJECT DIRECTORY
Universities, libraries, companies, organizations, and even volunteers have created subject
directories to catalogue portions of the Internet. These directories are organized by subject
and consist of links to Internet resources relating to these subjects. The major subject
directories available on the Web tend to have overlapping but different databases. Most
directories provide a search capability that allows you to query the database on your topic of
interest.
When to use directories? Directories are useful for general topics, for topics that need
exploring, for in-depth research, and for browsing.
There are two basic types of directories: academic and professional directories often created
and maintained by subject experts to support the needs of researchers, and directories
featured on commercial portals that cater to the general public and are competing for traffic.
Be sure you use the directory that appropriately meets your needs.
• INFOMINE, from the University of California, is a good example of an academic
subject directory
• Yahoo is a famous example of a commercial portal
Subject directories differ significantly in selectivity. For example, the famous Yahoo directory
does not carefully evaluate user-submitted content when adding Web pages to its database.
It is therefore NOT a reliable research source and should not be used for this purpose. In
contrast, INFOMINE selects only those sources considered useful to the academic and
research community. Consider the policies of any directory that you visit. One challenge to
this is the fact that not all directory services are willing to disclose either their policies or the
names and qualifications of site reviewers. A number of subject directories consist of links
accompanied by annotations that describe or evaluate site content. A well-written annotation
from a known reviewer is more useful than an annotation written by the site creator, as is
usually the case with Yahoo.
It is useful to understand that certain directories are the result of many years of intellectual
effort. For this reason, it is important to consult subject directories when doing research on
the Web.
Recommended starting points:
• If you want to explore a large number and variety of sources, try Librarians' Internet
Index. Supported by a federal grant, it relies on a large number of Californian librarians
who select and annotate Web resources across a broad range of topics. New sites are
added continually, so the directory is up-to-date. With its extensive but careful selection,
objective and useful annotations, and hierarchical organization, LII might well be
thought of as "the thinking person's Yahoo."
• INFOMINE is a large directory of Web sites of scholarly interest compiled by the
University of California. The directory may be browsed or searched by subject,
keyword, or title. Each site listed is accompanied by a description.
• Resource Discovery Network is a searchable interface to major meta-sites in academic
disciplines maintained in Great Britain, including Business, EEVL - Engineering,
Mathematics and Computing, Humbul Humanities Hub, PsiGate - Physical Sciences
Information Gateway, and SOSIG - Social Science Information Gateway.
4. CONDUCT A SEARCH USING A WEB SEARCH ENGINE
An Internet search engine allows the user to enter keywords relating to a topic and retrieve
information about Internet sites containing those keywords. Search engines are available for
many of the Internet protocols. For example, Archie searches for files stored at anonymous
FTP sites.
Search engines located on the Web have become quite popular as the Web itself has
become the Internet's environment of choice. Web search engines have the advantage of
offering access to a vast range of information resources located on the Internet. Many
search engines also search multimedia or other file types on the deep Web, often accessible
as separate searches. Web search engines tend to be developed by private companies,
though most of them are available free of charge.
A Web search engine service consists of three components:
• Spider: Program that traverses the Web from link to link, identifying and reading
pages
• Index: Database containing a copy of each Web page gathered by the spider
• Search engine mechanism: Software that enables users to query the index and
that usually returns results ranked by relevance
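As a rough illustration of how the three components fit together, here is a minimal Python sketch. The three-page "Web" in PAGES, the page names, and the scoring rule (count how often the query words occur) are all assumptions made for this example. Real search engines are far more elaborate.

```python
import re
from collections import Counter, defaultdict

# Hypothetical three-page "Web": each page has text and outgoing links.
PAGES = {
    "home": {"text": "Topaz is a gem. See our topaz guide.", "links": ["guide", "flour"]},
    "guide": {"text": "A guide to topaz colours.", "links": ["home"]},
    "flour": {"text": "Robin Hood flour for baking.", "links": []},
}

def crawl(start):
    """Spider: follow links from the start page, visiting each page once."""
    seen, queue = [], [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.append(url)
        queue.extend(PAGES[url]["links"])
    return seen

def build_index(urls):
    """Index: map each word to a count of its occurrences on each page."""
    index = defaultdict(Counter)
    for url in urls:
        for word in re.findall(r"[a-z]+", PAGES[url]["text"].lower()):
            index[word][url] += 1
    return index

def search(index, query):
    """Search mechanism: rank pages by how often the query words occur."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, {}))
    return [url for url, _ in scores.most_common()]

index = build_index(crawl("home"))
print(search(index, "topaz guide"))
```

Pages that never contain a query word are left out of the results entirely.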
All search engines have rules for formulating queries. It is imperative that you read the help
files at the site before proceeding. Online tutorials can also help you learn the rules.
5. EXPLORE THE DEEP WEB
The concept of the "deep" or "invisible" Web is a challenging one. This refers to content that
is stored in databases accessible on the Web but usually not available via search engines.
In other words, this content is "invisible" to search engines. This is because spiders cannot
or will not enter into databases and extract content from them as they can from static Web
pages. In the past, these databases were fewer in number and referred to as specialty
databases, subject specific databases, and so on.
The best way to access information on the invisible Web is to search the databases
themselves. Topical coverage runs the gamut from scholarly resources to commercial
entities. Very current, dynamically changing information is likely to be stored in databases,
including news, job listings, available airline flights, etc. As the number of Web-accessible
databases grows, it will become essential that they be used to conduct successful
information finding on the Web.
Other content usually not gathered by spiders includes non-textual files such as multimedia
files, graphical files, and documents in non-standard formats such as Portable Document
Format (PDF). Google is an exception here, since it indexes PDF, Word, and other
documents in its searchable index.
Keep in mind that many search engine sites and commercial portals feature deep Web
content as part of their package of services. This phenomenon falls under the heading of
converging content. For example, you can visit AltaVista and look up news, maps, jobs,
auctions, items for purchase, etc., all things outside the purview of a spider-gathered index.
As another example, Google integrates searches of PDF and Microsoft Office files into its
general search service.
6. JOIN AN E-MAIL DISCUSSION GROUP OR USENET NEWSGROUP
Join any of the thousands of e-mail discussion groups or Usenet newsgroups. These groups
cover a wealth of topics. You can ask questions of the experts and read the answers to
questions that others ask. Belonging to these groups is somewhat like receiving a daily
newspaper on topics that interest you. These groups provide a good way of keeping up with
what is being discussed on the Internet about your subject area. In addition, they can help
you find out how to locate information--both online and offline--that you want.
E-mail discussion groups can be associated with academic institutions. Many topics are
scholarly in nature, and it is not unusual for experts in the field to be among the participants.
In contrast, Usenet newsgroups cover a far wider variety of topics and participants have a
range of expertise. Be careful to evaluate the knowledge and opinions offered in any
discussion forum. Note also that a small number of e-mail groups are cross-posted as
Usenet newsgroups. For example, the early music e-mail group EARLYM-L also exists as
the newsgroup rec.music.early.
E-mail discussion groups are managed by software programs. There are three in common
use: Listserv, Majordomo, and Listproc. The commands for using these programs are
similar.
A list of Usenet newsgroups can be accessed from within a newsreader program. Web
browser suites such as Netscape Communicator include a newsreader. This offers the
convenience of Usenet access in a graphical environment as a part of the Web experience.
A good Web-based directory to assist in locating e-mail discussion groups and Usenet
newsgroups is Tile.net, located at http://tile.net/
7. SUBSCRIBE TO RSS FEEDS
One of the newer communication technologies on the Web is RSS. This variously stands for
Rich Site Summary, Really Simple Syndication, and so on. RSS allows people to place news
and other announcement-type items into a simple XML format that can then be pushed to
RSS readers and Web pages. Users can subscribe to the RSS newsfeeds of their choice,
and then have access to the updated information as it comes in. RSS is used for all kinds of
purposes, including the news itself and announcing new content on Web sites.
RSS content may be read by using an RSS reader, or aggregator. This is usually free
software that you can install on your computer that posts new items and stores old ones in a
graphical interface. An RSS reader is similar to e-mail software in that it displays incoming
items and can store content for offline reading. Subscribing to a newsfeed is usually as
simple as entering the address of the RSS document. A useful list of RSS readers is
available on the site of Weblogs Compendium.
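To show what the "simple XML format" looks like in practice, here is a minimal Python sketch that reads the items out of a small RSS 2.0 document. The feed content and the example.com addresses are made up for illustration. A real reader would download the document from the newsfeed address instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real reader would fetch this from the feed address.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 document."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

for title, link in read_items(FEED):
    print(title, "-", link)
```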
A blog (a contraction of the term "web log") is a type of website, usually maintained by an
individual with regular entries of commentary, descriptions of events, or other material such
as graphics or video. Entries are commonly displayed in reverse-chronological order. "Blog"
can also be used as a verb, meaning to maintain or add content to a blog.
It is also possible to subscribe to and read your own collection of RSS feeds on Web sites
devoted to this purpose. Bloglines is one such example. The advantage here is that you can
access your RSS feeds from any computer that is connected to the Web.
Searching by Browsing
In this method of searching, the search page presents several topics and sub-topics. Users search by
selecting a topic, then a sub-topic and continue "drilling down" until the required information is found.
The browsing method is often used for broad searches.
Try this method of searching at Yahoo. Select the topic Regional, then Countries, then Canada and
continue making selections until you reach the category for Calgary.
Searching by Keyword
In this method of searching:
1. The user enters keywords in a query box and requests a search. Some search tools
place more emphasis on the first keyword, assuming it is the most important.
2. The search tool attempts to match the keywords with entries in its database then
returns a "hit list" of sites related to the keywords. The sites in the hit list are usually
ranked by relevance with the best matches at the top of the list. The information for
each site includes a link to the particular internet resource and in many cases a brief
abstract of the site.
3. The user selects appropriate sites from the hit list and reviews the pages to find the
information required. The keyword searching method is often used for narrow,
specific searches.
Example:
Hit List
Site 1 - link and description
Site 2 - link and description
Site 3 - link and description
etc.
Improving Keyword Search Results
Search results can usually be improved by using search operators. These
operators help the search tool select better matches from its database. Some search
tools recognize many different search operators - the user should consult the search
tool's HELP page for more information. Three search operators are so widely used
they are practically universal.
• +word - hit list includes sites that contain this word
• -word - hit list excludes sites that contain this word
• "phrase" - hit list includes sites that contain this exact phrase (multiple
words are treated as a single word)
Examples:
• +topaz -gem - hit list includes sites containing the word "topaz" and
excludes sites containing the word "gem"
• "Robin Hood" - hit list includes sites containing the exact phrase
"Robin Hood"
• "Robin Hood" -flour - hit list includes sites containing the exact phrase
"Robin Hood" and excludes sites containing the word "flour" (to exclude
sites about the Robin Hood Flour Company)
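The behaviour of these three operators can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular search tool works. The function names parse_query and matches are assumptions, and phrases are matched as simple substrings.

```python
import re

def parse_query(query):
    """Split a query into required words, excluded words and exact phrases."""
    required, excluded, phrases = [], [], []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query.lower()):
        if phrase:
            phrases.append(phrase)
        elif word.startswith("-"):
            excluded.append(word[1:])
        else:
            required.append(word.lstrip("+"))
    return required, excluded, phrases

def matches(query, text):
    """Return True if the text satisfies every term in the query."""
    required, excluded, phrases = parse_query(query)
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    return (
        all(w in words for w in required)
        and not any(w in words for w in excluded)
        and all(p in lowered for p in phrases)
    )

print(matches('+topaz -gem', 'Blue topaz mineral data'))
print(matches('"Robin Hood" -flour', 'Robin Hood flour recipes'))
```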
Try using this method of searching to answer the following questions with Google:
1. Who is the Mayor of Oshawa?
2. What is the third question in the Four-Way Test?
3. What was the real name of the legendary figure known as "Grey Owl"?
Boolean Search Operators
Boolean operators can be used to define the relationship between multiple keywords. Not all search
engines recognize these operators. Check the HELP section of the search engine to determine which
operators can be used as part of the query term.
AND Operator
• Include resources that contain all keywords
• Used to narrow or tighten a search
• Example:
OR Operator
• Include resources that contain either or both keywords
• Used to broaden a search
• Example:
NOT Operator
• Exclude resources that contain the keyword
• Used to narrow a search
• Example:
Parenthetical phrases can be used to create more complex query terms consisting
of multiple boolean operators.
Example:
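One way to picture the Boolean operators is as set operations on the lists of pages that contain each keyword. The sketch below is an illustration under that assumption. The keywords, the page names and the INDEX table are invented for the example.

```python
# Hypothetical index: keyword -> set of pages that contain it.
INDEX = {
    "cats": {"page1", "page2"},
    "dogs": {"page2", "page3"},
    "birds": {"page3", "page4"},
}

def pages(word):
    """Return the set of pages containing the keyword (empty if unknown)."""
    return INDEX.get(word, set())

both = pages("cats") & pages("dogs")       # cats AND dogs: narrows the search
either = pages("cats") | pages("dogs")     # cats OR dogs: broadens the search
without = pages("cats") - pages("dogs")    # cats NOT dogs: excludes pages with dogs
nested = (pages("cats") | pages("dogs")) & pages("birds")  # (cats OR dogs) AND birds

print(sorted(both), sorted(either), sorted(without), sorted(nested))
```

The parentheses in the last query work the same way as in arithmetic: the OR is evaluated first, then the AND.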
Proximity Search Operators
Proximity operators can be used to specify the relative location of keywords. Not all
search engines recognize these operators. Check the HELP section of the search
engine to determine which operators can be used as part of the query term.
ADJ Operator
• Keywords must occur beside each other, but may be in either order
• Example:
BEFORE Operator
• Find two keywords, one of which occurs before the other
• Example:
NEAR Operator
• Find two keywords that are within a specified number of words of
each other in either direction
• Example:
FAR Operator
• Find two keywords that are at least 25 words from each other
in either direction
• Example:
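The four proximity operators can be sketched in Python by comparing the word positions of the two keywords. This is a simplified illustration. The function names and the sample sentence are assumptions, and each search tool defines its own syntax and limits.

```python
import re

def _pairs(text, first, second):
    """Return every (position of first, position of second) pair in the text."""
    words = re.findall(r"\w+", text.lower())
    first_at = [i for i, w in enumerate(words) if w == first.lower()]
    second_at = [i for i, w in enumerate(words) if w == second.lower()]
    return [(i, j) for i in first_at for j in second_at]

def adj(text, a, b):
    """ADJ: the keywords sit beside each other, in either order."""
    return any(abs(i - j) == 1 for i, j in _pairs(text, a, b))

def before(text, a, b):
    """BEFORE: keyword a occurs somewhere before keyword b."""
    return any(i < j for i, j in _pairs(text, a, b))

def near(text, a, b, n):
    """NEAR: the keywords are within n words of each other."""
    return any(abs(i - j) <= n for i, j in _pairs(text, a, b))

def far(text, a, b, n=25):
    """FAR: the keywords are at least n words apart."""
    return any(abs(i - j) >= n for i, j in _pairs(text, a, b))

sentence = "Robin Hood lived in Sherwood Forest"
print(adj(sentence, "hood", "robin"), near(sentence, "robin", "sherwood", 4))
```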
Truncation Search Operator
Truncation locates resources that include alternate forms of a keyword. Not all
search engines recognize this operator. Check the HELP section of the search
engine to determine which operators can be used as part of the query term.
• Truncation, also known as stemming, is applied through the use of a
wildcard character. The universal wildcard character is an asterisk (*).
Some search engines recognize other characters as a wildcard.
• Truncation is used where there are multiple valid spellings of a
keyword
• Example: Canad* will locate Canada, Canadian, Canadienne, etc.
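Wildcard matching of this kind can be tried out with Python's standard fnmatch module, which also uses the asterisk as a wildcard. The word list and the function name truncation_matches are assumptions made for this sketch. Note that fnmatch gives special meaning to ? and [ ] as well.

```python
from fnmatch import fnmatchcase

def truncation_matches(pattern, words):
    """Return the words matching a pattern that uses * as a wildcard."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

words = ["Canada", "Canadian", "Canadienne", "Candle", "Canal"]
print(truncation_matches("Canad*", words))
```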
Field Search Operators
Field search operators direct the search engine to look for keywords in different parts
of the web page. Not all search engines recognize these operators. Check the HELP
section of the search engine to determine which operators can be used as part of the
query term.
TITLE Operator
• Locates resources where the keyword occurs in the title of the web
page
• Example:
URL Operator
• Locates resources where the keyword occurs in the URL of the web page
• Example:
LINK Operator
• Locates resources where the keyword occurs in hyper-text links on the
web page
• Example:
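Field searching can be pictured as looking for the keyword in only one part of each page record. The Python sketch below assumes a small, invented list of pages with title, url and links fields. The field names and the example.com addresses are hypothetical, and real search engines use their own operator syntax.

```python
# Hypothetical page records with the three fields described above.
PAGES = [
    {
        "title": "Calgary Weather",
        "url": "http://example.com/calgary/weather",
        "links": ["Environment Canada", "Forecast maps"],
    },
    {
        "title": "Travel Tips",
        "url": "http://example.com/travel",
        "links": ["Calgary hotels"],
    },
]

def field_search(field, keyword):
    """Return the URLs of pages whose given field contains the keyword."""
    keyword = keyword.lower()
    results = []
    for page in PAGES:
        value = page[field]
        # The links field is a list of link texts; join it into one string.
        text = " ".join(value) if isinstance(value, list) else value
        if keyword in text.lower():
            results.append(page["url"])
    return results

print(field_search("title", "calgary"))
print(field_search("links", "calgary"))
```

The same keyword finds different pages depending on the field searched, which is why field operators are useful for focusing a query.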
Planning and Conducting a Search
• Develop a general understanding of the search tools, process and language. However,
start with the information that you need to search subjects of interest. You will find
that your understanding will build as your experience broadens.
• Avoid searching for obscure information not likely to be found without use of
sophisticated search methods.
• In keyword searches, start by working with no more than two or three search tools
until you gain some mastery over them. A search tool’s Help section usually describes
its current keyword search practices. From these learn how best to compose a query
and focus the search.
• Focus on the subject you wish to pursue, develop a search plan and go about finding
the desired information.
• Use natural language to compose your query since it does not require the use of
operators or special rules.
• Yahoo is particularly effective as a start, because it has a keyword option along with
its subject search. With the largest subject directory, its sub-categories frequently
become detailed enough to provide choice documents. If necessary, use the keyword
search option at the appropriate place on the search path and employ natural language
for the query.
• Go to AltaVista and compose your query in direct question form. Be sure to use the
question mark at the end of the sentence to ensure a suitable search.
Selecting a Tool for Your Search
General guidelines to get you started on your search of the Web are introduced below. It
is important to think about your information need before selecting a search tool. To help you
do this, the chart below lists different kinds of questions, information needs and preferences
that you may have. These are organized into categories that list a sample of tools that you
should try.
Query types:
• Your topic is general
• You want to view a collection of sites recommended by experts
• You want to browse the possibilities
• You want a limited number of high-quality results
Examples:
• I'm doing research on drug abuse
What to use:
• Subject directories, especially academic and professional
• Some general queries might be helped by the next group of options below

Query types:
• You want results organized into concept clusters rather than one long list of results
• You are doing in-depth research that includes an exploration of several subtopics
• You want a better understanding of the scope of your topic
• You are unfamiliar with your topic
Examples:
• I'm doing research on discrimination
What to use:
• Concept clustering search engines: Accumo, Clusty, iBoogie, Infonetware, Query
Server, Vivisimo
• Search engines that offer searches on similar documents in the results list, or provide a
hyperlinked list of alternative topics for retrieving related results: AltaVista, Ask.com,
Google, Ixquick

Query types:
• Your topic is narrow and limited in scope
• You are looking for a specific site/fact/individual/event, etc.
• Your topic consists of more than one concept
• Your topic is obscure
• You are looking for a specific domain, file type, geographic location, etc.
Examples:
• I'm doing research on age discrimination
• I'm researching civil rights in Turkey
• I'm looking for the site of the Society of American Registered Architects
• I'm looking for information about Nelson Mandela from South Africa
What to use:
• Peer ranking search engines: Ask.com, Google
• General Search Engines
• Meta search engines: Fazzle, Ixquick, and more

Query types:
• You are looking for the appropriate search terms
• Your search terms are ambiguous
Examples:
• What search terms should I use to investigate my topic?
• How can I learn about bridge?
What to use:
• Thesaurus-creating search engine: SurfWax

Query types:
• You are looking for dynamically-changing information
• You are looking for very recent information
• You are looking for non-textual files such as software, graphics, multimedia,
documents in PDF format, etc.
• You are looking for information usually stored in a database such as a directory,
phone book, etc.
Examples:
• I need today's stock price for Microsoft
• I want to search news stories from yesterday
• I want to see a photo of the World Trade Center
• I need a list of lawyers in Albany, N.Y.
• I want to research the laws of California on computer crime
What to use:
• Deep Web sources:
o Specialized search engines: FindSounds.com, NewsLibrary, many more
o Search engines with specialized search offerings: AltaVista, IceRocket, many more
o General Search engines: keyword searches may turn up a relevant site with a
searchable database
o Directories: keyword searches may turn up a relevant site with a searchable database
o Collections of databases on the Web: Turbo10
Some Search Tools
AltaVista
Home Page Address: http://www.altavista.com/
Help Page Address (Advanced Query): http://www.altavista.com/help/search/help_adv
Advanced Query: http://www.altavista.com/web/adv
Search Method: Primarily keyword, with a subject option that draws on LookSmart
subject directories. Also provides Popular Sites on its Home Page under "Specialty
Searches".
Google
Home Page Address: http://www.google.com/
Help Page Address: http://www.google.com/help/index.html
Advanced Query: http://www.google.com/advanced_search?hl=en
Search Method: Primarily keyword. Selecting the ‘I'm Feeling Lucky’ option
may limit the search to the most relevant site.
AlltheWeb ( Fast )
Home Page Address: http://www.alltheweb.com/
Help Page Address: http://www.alltheweb.com/help/
Advanced Query: http://www.alltheweb.com/advanced?cs=utf-8
Search Method: Primarily keyword. Advanced query allows complicated boolean
searches with filtering.
EXCITE
Home Page Address: http://www.excite.com/
Help Page Address: http://www.infospace.com/info.xcite/about/corporate/help.htm
Search Method: Meta search. Provides advanced search. Also provides long lists of
Popular Sites.
DOGPILE
Home Page Address: http://www.dogpile.com/
Help Page Address: http://www.dogpile.com/notes.html
Advanced Query: http://www.dogpile.com/t/tools/custom
Search Method: Meta search. Also provides long lists of Popular Sites
Britannica Internet Guide
Home Page Address: http://www.britannica.com/
Help Page Address: http://www.britannica.com/help/search
Search Method: Primarily subject with a keyword option
HotBot
Home Page Address: http://www.hotbot.com/
Advanced Query: http://www.hotbot.com/prefs_filters.asp?prov=Inktomi&filter=web
Help Page Address: http://help.lycos.com/hotbot/
Search Method: Meta search portal through which Inktomi, Google, Fast, and
Teoma search engines can be utilized with specialized filters
GO (INFOSEEK)
Home Page Address: http://infoseek.go.com/
Help Page Address: http://www.go.com/Help/
Search Method: Has become a portal site for the ABC families with Disney and
ESPN. Google is now the default search engine.
Practical Assignment
Search for information on:
• Network Operating System and Rules/Security
• Network Connectors & Cabling
• Setting Up of a PC on Network
• Requirements for Setting Up an Internet
• Internet Browsing Techniques
Send your high-quality results to: odugbesanadekunle@gmail.com
By:
O.A. Odugbesan