UNIX Overview
The UNIX operating system was designed to let a number of programmers access the
computer at the same time and share its resources.
The operating system coordinates the use of the computer's resources, allowing one
person, for example, to run a spell check program while another creates a document, lets
another edit a document while another creates graphics, and lets another user format a
document -- all at the same time, with each user oblivious to the activities of the others.
The operating system controls all of the commands from all of the keyboards and all of
the data being generated, and permits each user to believe he or she is the only person
working on the computer.
This real-time sharing of resources makes UNIX one of the most powerful operating
systems ever.
Although UNIX was developed by programmers for programmers, it provides an
environment so powerful and flexible that it is found in businesses, sciences, academia,
and industry. Many telecommunications switches and transmission systems also are
controlled by administration and maintenance systems based on UNIX.
While initially designed for medium-sized minicomputers, the operating system was soon
moved to larger, more powerful mainframe computers. As personal computers grew in
popularity, versions of UNIX found their way into these boxes, and a number of
companies produce UNIX-based machines for the scientific and programming
communities.
The uniqueness of UNIX
The features that made UNIX a hit from the start are:
·        Multitasking capability
·        Multiuser capability
·        Portability
·        UNIX programs
·        Library of application software
Multitasking
Many computers do just one thing at a time, as anyone who uses a PC or laptop can
attest. Try logging onto your company's network while opening your browser while
opening a word processing program. Chances are the processor will freeze for a few
seconds while it sorts out the multiple instructions.
UNIX, on the other hand, lets a computer do several things at once, such as printing out
one file while the user edits another file. This is a major feature for users, since users
don't have to wait for one application to end before starting another one.
Multiusers
The same design that permits multitasking permits multiple users to use the computer.
The computer can take the commands of a number of users -- determined by the design of
the computer -- to run programs, access files, and print documents at the same time.
The computer can't tell the printer to print all the requests at once, but it does prioritize
the requests to keep everything orderly. It also lets several users access the same
document by compartmentalizing the document so that the changes of one user don't
override the changes of another user.
System portability
A major contribution of the UNIX system was its portability, permitting it to move from
one brand of computer to another with a minimum of code changes. At a time when
different computer lines of the same vendor didn't talk to each other -- let alone machines
of multiple vendors -- that meant great savings in both hardware and software upgrades.
It also meant that the operating system could be upgraded without having all the
customer's data entered again. And new versions of UNIX were backward compatible
with older versions, making it easier for companies to upgrade in an orderly manner.
UNIX tools
UNIX comes with hundreds of programs that can be divided into two classes:
·       Integral utilities that are absolutely necessary for the operation of the computer,
such as the command interpreter, and
·       Tools that aren't necessary for the operation of UNIX but provide the user with
additional capabilities, such as typesetting capabilities and e-mail.

Tools can be added or removed from a UNIX system, depending upon the applications
required.
UNIX Communications
E-mail is commonplace today, but it has only come into its own in the business
community within the last 10 years. Not so with UNIX users, who have been enjoying
e-mail for several decades.
UNIX e-mail at first permitted users on the same computer to communicate with each
other via their terminals. Then users on different machines, even made by different
vendors, were connected to support e-mail. And finally, UNIX systems around the world
were linked into a world wide web decades before the development of today's World
Wide Web.
Applications libraries
UNIX as it is known today didn't just develop overnight. Nor were just a few people
responsible for its growth. As soon as it moved from Bell Labs into the universities,
every computer programmer worth his or her own salt started developing programs for
UNIX.
Today there are hundreds of UNIX applications that can be purchased from third-party
vendors, in addition to the applications that come with UNIX.
How UNIX is organized
The UNIX system is functionally organized at three levels:
·       The kernel, which schedules tasks and manages storage;
·       The shell, which connects and interprets users' commands, calls programs from
memory, and executes them; and
·       The tools and applications that offer additional functionality to the operating
system

The three levels of the UNIX system: kernel, shell, and tools and applications.
The kernel
The heart of the operating system, the kernel controls the hardware and turns parts of the
system on and off at the programmer's command. If you ask the computer to list (ls) all the
files in a directory, the kernel tells the computer to read all the files in that directory from
the disk and display them on your screen.
The shell
There are several types of shell, most notably the command-driven Bourne Shell and the
C Shell (no pun intended), and menu-driven shells that make it easier for beginners to
use. Whatever shell is used, its purpose remains the same -- to act as an interpreter
between the user and the computer.
The shell also provides the functionality of "pipes," whereby a number of commands can
be linked together by a user, permitting the output of one program to become the input to
another program.
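
As a minimal sketch of a pipe, assuming a Bourne-compatible shell (the sample words are invented for illustration), the | character links the commands so that each one reads what the previous one wrote:

```shell
# Send three lines into sort, then pass the sorted lines on to head,
# which keeps only the first one.
first=$(printf 'pear\napple\nfig\n' | sort | head -n 1)
echo "$first"

# wc -l reads the output of printf as its input and counts the lines.
count=$(printf 'a\nb\nc\n' | wc -l)
echo $count
```

In the same way, who | wc -l would count the users currently logged on.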
Tools and applications
There are hundreds of tools available to UNIX users, although some have been written by
third party vendors for specific applications. Typically, tools are grouped into categories
for certain functions, such as word processing, business applications, or programming.

Logging In & Out

When you have established contact with the Unix system, the login prompt will be
displayed. You must give your username followed by your password:

login: lnp3jb
Password: secret1

The username can be up to 8 characters in length. Unix usernames contain only
lowercase characters, and it is important that you type your username in lower case (if
you type it in upper case you may still be permitted to log in, but the shell will then not
recognise case differences). The password must normally contain between 6 and 8
characters. On some Unix systems the password must contain at least 1 non-alphabetic
character.

System messages
When you log in a number of system messages may be displayed. The more filter will be
used to control the output if the file contains more than a screenful of information. Just
press the space bar to see the next screenful if it says 'more' at the bottom of the screen.

The message:
    You have new mail
indicates that electronic mail has been sent to your mailbox.

The prompt
When your login procedure is completed you should see the system prompt. This
indicates that the shell is running and is awaiting instructions from the user. The prompt
can take many forms, and you can change it later on if you want to. Often the prompt will
contain the % character, and a number in brackets. This number will represent the
number of a command, and can be used to recall commands already issued. It may also
display the name of the machine or system that you are logged onto. Some users prefer to
have the name of the current working directory displayed in their prompt. For
convenience, in this document, the % character will be used to represent the prompt.

Changing your password

Use the passwd command to change your password:

% passwd                   -where '%' is the prompt

Changing password for lnp5mw
Old password:          -type in your old password
New password:           -type in your new password
Retype new password:       -and again, to make sure
%


Logging out
When you have finished your Unix session you must log out from the system. To do this
give the command:

% logout

You should always wait for the message confirming that you have logged out.

On some Unix systems you may receive the message:

      logout: command not known

If this happens you should type:

     exit

You may occasionally get the message:

There are stopped jobs

If this happens simply give the logout command again.

--------------------------------------------------------------------------------
PRACTICE
Log in to the Unix system using your username and password. Change your password
using the passwd command. You may find that the system will not change your password
immediately. In this case you may have to use your old password next time that you log
on.
--------------------------------------------------------------------------------

THE UNIX FILESTORE
------------------

File hierarchy
Unix has a hierarchical tree-like filestore. The filestore contains files and directories.

The top-level directory is known as the root. Beneath the root are several system
directories. The root is designated by the / character.

The directories below the root are designated by the pathnames:

/bin                       /etc                       /usr

Confusingly, the / character is also used as a separator in pathnames. Historically, user
directories were often kept in the directory /usr. However, it is often desirable to organise
user directories in a different manner.

Users have their own directory in which they can create and delete files, and create their
own sub-directories. For example:

/user/ei/eib035

belongs to someone who has the username eib035.

Some typical system directories below the root directory:

/bin    contains many of the programs which will be executed by users
/etc    files used by system administrators
/dev    hardware peripheral devices
/lib    system libraries
/usr    normally contains applications software
/home   home directories for different systems

The current directory
This refers to your actual location in the filestore hierarchy. When you log in the current
directory is set to the home directory. You can then change current directory, effectively
moving around the filestore tree structure. The current directory is also called the "current
working directory" and the "working directory". The current directory can be referred to
in pathnames by the . character (a full stop).

Changing current directory
The command cd is used to change your current directory. For example:
% cd bin

will move you from your current directory, down one "branch" to the directory bin, if
such a directory exists. Typing cd with no arguments takes you to your home directory.

Display current directory
The command pwd is used to display your current directory. For example:

% pwd
/home/sunserv1_b/lnp5jb/bin

Pathnames
Files and directories may be referred to by their absolute pathname. For example:

/home/sunserv1_b/lnp5jb/bin/hello

Files and directories may also be referred to by a relative pathname. For example, if your
current directory is /home/sunserv1_b/lnp5jb, the above file can be referred to as:

bin/hello

The home directory
Each user has a home directory. They will be attached to this directory when they log in.
Jenny Brown's home directory is:

/home/sunserv1_b/lnp5jb

The symbol ~ can be used to refer to the home directory. If Jenny Brown wishes to refer
to her file she can give:

~/bin/hello

rather than typing the long form:

/home/sunserv1_b/lnp5jb/bin/hello

The symbol ~ can also refer to the home directories of other users. For example
Jenny can refer to a file in John Smith's home directory using:

~lnp5js/test.dat

The parent directory
The parent directory is the directory above the current directory. The parent directory can
be referred to by the .. characters (two full stops). For example to refer to the file test.dat
in the parent directory:
../test.dat
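
The different ways of naming the same file can be tried out safely in a scratch directory. This is only a sketch: it assumes a Bourne-compatible shell and the widely available (but not universal) mktemp command, and the directory and file names are made up for the example.

```shell
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
mkdir "$demo/bin"
echo 'hello' > "$demo/bin/hello"

cd "$demo"
abs=$(cat "$demo/bin/hello")   # absolute pathname
rel=$(cat bin/hello)           # relative to the current directory
dot=$(cat ./bin/hello)         # . is the current directory

cd bin
up=$(cat ../bin/hello)         # .. is the parent directory

echo "$abs $rel $dot $up"

# Tidy up the scratch directory.
cd / && rm -r "$demo"
```

All four pathnames reach the same file, so the same word is printed four times.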

Linking files
The ln command can be used to link files and directories across the filestore system. The
symbolic link function (ln -s) is the most useful. This enables a file or directory to appear
to be in a particular directory when it is in fact stored somewhere else. This can save the
user from having to type out long pathnames for frequently used files or directories. For
example, if you want to use the files in /usr/games regularly, you can set up a symbolic
link to this directory. If Jenny Brown is in her home directory and types:

% ln -s /usr/games fun

this will create what appears to be a new directory below her home directory, entitled fun.
When she does cd fun she will move to /usr/games. If she now does pwd, the current
directory will appear as /home/sunserv1_b/lnp5jb/fun. Some things may be a little
surprising however: the parent directory, for example, will be that of the original file or
directory.
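
The surprising behaviour of the parent directory can be seen in a small experiment. This sketch assumes a Bourne-compatible shell with mktemp; the games and home directories are stand-ins created in a scratch area, not the real /usr/games.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/games/chess" "$demo/home"
cd "$demo/home"
ln -s "$demo/games" fun      # fun appears here but is stored elsewhere

cd fun
logical=$(pwd -L)            # the path as followed, through the link
physical=$(pwd -P)           # the real location, with the link resolved
echo "$logical"
echo "$physical"

# ls looks up .. from the real location, so it lists the directory
# that holds games, not the home directory the link sits in.
parent=$(ls ..)
echo "$parent"

cd / && rm -r "$demo"
```

The first path ends in home/fun and the second in games, even though both name the same place.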


--------------------------------------------------------------------------------

Exercises

Check which directory you are currently in. If necessary, move to your home directory.
(Remember: cd will do this from anywhere).
Move to the root directory. ("Move to..." means "change your current working directory
to...". It is useful to picture the process as movement around the tree structure.)
Work your way down one directory at a time to your home directory.
Experiment with using relative and absolute pathnames; show how the two can produce
the same results.
Explore your system's filestore. Try to get into the home directory of someone else you
know! (You may not be able to view their files.)

--------------------------------------------------------------------------------

UNIX COMMANDS
--------------------------------------------------------------------------------

Unix commands have the general format:

command [options] [item]

Items in brackets are optional, and the words options and item are generic identifiers
(i.e. options must be replaced by a particular option, e.g. -a).

Note that:
Commands are case sensitive. The command ls is different from LS. In fact LS is not
recognised as a valid command.

Command options consist of a single character. The command to list all the files in a
directory is ls -a and could not be ls -all (the latter would have to mean a combination of
options.)

Command options can usually be combined or listed separately. For example:

ls -al or ls -a -l

The command item is given last. This is very often a file name. For example:

ls -a file1.f              not               ls file1.f -a
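
To check that combined and separate options really do the same thing, the two listings can be compared. This is a sketch assuming a Bourne-compatible shell with mktemp; the file names are invented.

```shell
demo=$(mktemp -d)
cd "$demo"
touch file1.f .hidden

combined=$(ls -al)     # options combined after one dash
separate=$(ls -a -l)   # the same options given separately

# The two commands should produce identical listings.
[ "$combined" = "$separate" ] && echo "same listing"

cd / && rm -r "$demo"
```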

The echo command
The echo command 'echoes' its argument to the standard output. This means that in its
simplest form it prints something out on screen. For example:

% echo Hello                                 - you type
Hello                                        - response from the shell
%

Who is logged on?
The command who gives a list of logged on users:

% who
root console Jan 4 10:34
men6matw ttyp1 Jan 6 09:45 (ecusun1)
cbl6nd ttyp2 Jan 6 10:10 (cblslcd)
cbl6ar ttyp3 Jan 6 16:03 (cblsuna)
csc6ea ttyp4 Jan 6 14:15 (csuna1)
root ttyp5 Jan 6 10:40 (sun032)
ecl6rsh ttyp6 Jan 6 15:39
csc6ea ttyp8 Jan 6 14:15 (csuna1)
lnp5mw ttyUf Jan 6 16:16
lnp5jb ttyp3 Jan 6 15:20 (sun051)

Also try the command finger. This command gives the full name of logged in users.



--------------------------------------------------------------------------------
PRACTICE
Type finger to get information on yourself and other users.
--------------------------------------------------------------------------------

Creating a directory
The mkdir command is used to create directories. The format of this command is:
% mkdir directory_name

Jenny Brown stores her unix scripts in a directory called scripts beneath her home
directory. In order to create this directory she uses the command:

% mkdir scripts

Deleting a directory
The rmdir command is used to delete directories. The format of this command is:
% rmdir directory_name

Jenny Brown stores files for project work in a directory called proj. When the project has
been completed she deletes the directory using the command:

% rmdir proj

Note that the directory must be empty before it can be deleted.
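
The rule that a directory must be empty can be demonstrated as follows. This is a sketch for a Bourne-compatible shell with mktemp; proj and notes.txt are example names only.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir proj
touch proj/notes.txt

# rmdir refuses because proj still contains a file.
if rmdir proj 2>/dev/null; then first=removed; else first=refused; fi

rm proj/notes.txt      # empty the directory first
rmdir proj             # now the directory can be deleted
[ -d proj ] && second=present || second=gone

echo "$first, then $second"
cd / && rm -r "$demo"
```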

Listing contents of a directory
The command ls is used to list the contents of a directory. For example:
% ls
file1 bin test.f test

Notice that directories are listed as well as files. To list all files, including hidden files,
give the command:

% ls -a
.cshrc file1      bin    test.f   test

Hidden files begin with . (a full stop). Hidden files are normally system files, and will
normally include the following:

% ls -a
.cshrc .forward .history .login .logout


.cshrc     contains commands that are executed every time you start off a C-shell,
           including when you log in
.forward   enables you to redirect your mail to another computer
.history   contains a record of previously executed commands
.login     contains commands that are executed at login time
.logout    contains commands that are executed at logout time

The purpose of some hidden files.

To identify directories in a listing give the command:

% ls -F
file1 bin/     test.f   test

Notice how the directory is identified by the slash (/) character.
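
The three forms of ls can be compared side by side in a scratch directory. This sketch assumes a Bourne-compatible shell with mktemp and uses made-up file names that mirror the listings above.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir bin
touch file1 test.f .cshrc

plain=$(ls)        # hidden files are left out
all=$(ls -a)       # includes .cshrc, and also the entries . and ..
marked=$(ls -F)    # the directory bin is shown with a trailing /

echo "$plain"
echo "$all"
echo "$marked"
cd / && rm -r "$demo"
```

Note that ls -a also lists . and .., the current and parent directories, which are themselves hidden entries.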

Deleting files
Files can be deleted using the rm command. For example:

% rm test.f

Displaying files
The command cat is used to display the contents of a file on the screen.

For example:

% cat file1

Creating files
The command cat can also be used to create a file. For example:

% cat > test.f
When typing in a new file
the input must be terminated by
^D

NOTE ^D means press the <ctrl> and the d keys simultaneously. Be careful not to type
^D when you have the shell prompt, because this might log you out. Normally you would
use an editor for creating files. This example is given since it illustrates how to create a
small file without needing to learn the use of an editor.
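
Inside a shell script there is no keyboard to type at, so the same idea is usually written with a "here-document". The following is a sketch for a Bourne-compatible shell with mktemp; the marker word EOF is an arbitrary choice and plays the part of ^D.

```shell
demo=$(mktemp -d)
cd "$demo"

# cat copies its standard input into test.f; the here-document
# stands in for the typed lines, and the EOF marker ends the input.
cat > test.f <<'EOF'
When typing in a new file
the input must be terminated by
an end-of-file character
EOF

lines=$(wc -l < test.f)
echo $lines
cat test.f

cd / && rm -r "$demo"
```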

Copying files
The command cp is used to copy a file. It takes the format:

% cp old_file new_file

For example:

% cp file1 file2

Renaming files
The command mv is used to rename a file.
For example:

% mv file2 temp

changes the name of file2 to temp.

Moving files
The command mv is also used to move a file to a new location in the filestore hierarchy.
For example:

% mv file2 bin

moves the file file2 into the subdirectory bin.
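
Copying, renaming and moving can be followed step by step in a scratch directory. This sketch assumes a Bourne-compatible shell with mktemp; the file names follow the examples above.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir bin
echo 'some text' > file1

cp file1 file2     # file1 and file2 now both exist
mv file2 temp      # rename: file2 is gone, temp holds the copy
mv temp bin        # bin is a directory, so temp moves into it

here=$(ls)         # lists bin and file1
inside=$(ls bin)   # lists temp
echo "$here"
echo "$inside"

cd / && rm -r "$demo"
```

Whether mv renames or moves depends only on whether the last argument is an existing directory.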

Overwriting files
Commands such as rm and cp can be dangerous if not used with care. The command:

% cp file1 file2

will overwrite file2 if a file of that name already exists, destroying its previous
contents. If you have spelled the name of the new file incorrectly you may accidentally
overwrite the contents of another file. Using the
wildcard symbol * with the command rm can also be very dangerous. The command:

% rm test*

will delete all files starting with test. However if you inadvertently type an extra space
(do not try this!):

% rm test *                              -do not try this!

the file test will be deleted if it exists. Then all other files in the directory will be deleted!
Often no warning will be given.

To prevent accidental deletion of files you can use the -i option with commands such as
rm. The format of the command is:

% rm -i file

You will be asked to confirm that files are to be deleted. You may find that this is set as
the default on your system.
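
Another cautious habit is to preview a wildcard with echo before handing it to rm. The shell expands the wildcard before the command runs, so echo shows exactly which names rm would receive. This is a sketch for a Bourne-compatible shell with mktemp; the file names are invented.

```shell
demo=$(mktemp -d)
cd "$demo"
touch test test1 test.f notes

safe=$(echo test*)     # the three names starting with test
risky=$(echo test *)   # test, then every name in the directory

echo "rm test*  would get: $safe"
echo "rm test * would get: $risky"

cd / && rm -r "$demo"
```

The second line includes notes, which shows how the stray space turns a selective delete into a delete of everything.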

Wildcards
Wildcard characters can be used to identify directory and file names. The wildcard
character * is used to refer to any combination of characters. For example:
% ls *            - refers to all files
% cat test*       - refers to all files starting with 'test',
                    e.g. 'test', 'testing', 'test.c', etc.

The wildcard character ? is used to refer to a single character. For example:

% ls test?        - refers to files starting with 'test' followed by a single
                    character, e.g. 'test1', 'test2', 'testz', etc.
% cat test.?      - refers to all files starting with 'test.' with a single
                    character after the full stop, e.g. 'test.c', 'test.f'
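
The difference between * and ? is easy to see with echo, which simply prints the names each pattern matches. This sketch assumes a Bourne-compatible shell with mktemp; the file names are examples only.

```shell
demo=$(mktemp -d)
cd "$demo"
touch test test1 test2 testing test.c test.f

star=$(echo test*)      # every name starting with test
one=$(echo test?)       # test plus exactly one more character
suffix=$(echo test.?)   # test. plus exactly one more character

echo "$star"
echo "$one"
echo "$suffix"

cd / && rm -r "$demo"
```

Here test? matches test1 and test2 but not test (too short) or testing (too long).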


--------------------------------------------------------------------------------

Exercises
Display your current working directory using the pwd command.
Make a directory called exercises.
Change your directory to the directory exercises. Display the current working directory.
Return to your home directory.
List the contents of your directory. Use the -l, -a and -F options and compare the output.
Change your directory to the directory exercises. Create a file called example1 using the
cat command containing the following text:
water, water everywhere
and all the boards did shrink;
water, water everywhere,
Nor drop to drink

List the contents of your directory. Use the -l option to obtain a long listing.

Viewing files with the more command

The command more is used to display the contents of a file on the screen. The command
is particularly useful for viewing long files since the display stops at the bottom of the
screen. The following is a listing of a program in the Icon programming language:

% more lookup.icn
# program to look up words (given at the terminal) in the
# computer usable version of the OALD
# last change 18.12.91
# set global parameters
global k
# main body
procedure main()
# input word to be searched for
   write("Give me a word: \n")
   word:=read()
# this the important line - call the 'lookup' procedure
   if not write(lookup(word)) then write("Not found in the dictionary.")
end
procedure lookup(voc)
# connect to the dictionary
(dict:=open("/home/sunserv1_a/ecl6rsh/oald.mitton/cuv2")) | stop("can't open the dictionary")
# lookup algorithm
      every k:=1 to *voc do {
--More-- (75%)

The message at the bottom of the screen means that 75% of the file has been viewed so
far. (The amount shown on screen will depend on the type of terminal you are using.)
You can now do the following:

To continue viewing press the space bar

To view the next line press <RETURN>

To quit press the <q> key

To jump to the next occurrence of a string of characters type /string

For a list of valid commands press the <h> key.

Viewing files with the pg command
The pg command is also available on some systems. This is an alternative to more

% pg lookup.icn
# program to look up words (given at the terminal) in the
# computer usable version of the OALD
# last change 18.12.91

# set global parameters
global k

# main body
procedure main()
# input word to be searched for
   write("Give me a word: \n")
   word:=read()
# this the important line - call the 'lookup' procedure
   if not write(lookup(word)) then write("Not found in the dictionary.")
end

procedure lookup(voc)
# connect to the dictionary
   (dict:=open("/home/sunserv1_a/ecl6rsh/oald.mitton/cuv2")) | stop("can't open the dictionary")
# lookup algorithm
      every k:=1 to *voc do {
      bit:=bite(voc)

Commands can be typed to the ':' prompt at the bottom of the screen: Type <RETURN>
to view the next screen. Type <h> for a list of valid commands.


--------------------------------------------------------------------------------

PRACTICE


--------------------------------------------------------------------------------

If you have a file longer than 20 lines use pg to view it. Compare the use of pg with more.
Use them both on the file /etc/passwd, and find the listing for your own username.

Searching for strings in files
The command grep is used to search a file for a string of characters. For example, to
search the file lookup.icn for the character '#' (which designates comments in the
program), use the command:

% grep '#' lookup.icn
# program to look up words (given at the terminal) in the
# computer usable version of the OALD
# last change 18.12.91
# set global parameters
# main body
# input word to be searched for
# this the important line - call the 'lookup' procedure
# connect to the dictionary
# lookup algorithm

A lot of pattern matching operations can be carried out with grep. The following example
shows the use of a regular expression. In this example, the search is restricted to lines
beginning with the 'p' character.

% grep "^p" lookup.icn
procedure main()     -output starts here
procedure lookup(voc)
procedure bite(voc2)

You will learn more about pattern matching expressions later.
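
Both kinds of search can be tried on a small file. This is a sketch for a Bourne-compatible shell with mktemp; sample.icn is a made-up file modelled on the listing above, and -c asks grep for a count of matching lines instead of the lines themselves.

```shell
demo=$(mktemp -d)
cd "$demo"
cat > sample.icn <<'EOF'
# main body
procedure main()
   write("Give me a word: ")
end
procedure lookup(voc)
EOF

# Quote the pattern so the shell passes it to grep unchanged.
comments=$(grep -c '#' sample.icn)   # number of lines containing #
procs=$(grep '^p' sample.icn)        # ^ anchors the match at line start

echo "$comments"
echo "$procs"

cd / && rm -r "$demo"
```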

Control characters
The actual key sequences for the following operations can vary between different
systems and different terminals. The most commonly used key sequences are described
below. If it is different on your system, remember the correct sequence and use it
whenever the key sequences below are referred to later in the text. Where possible the
operation itself is named (e.g. end-of-file), and not just the key sequence.

Deleting the last character typed
If you make a typing mistake you can delete the last character typed by using your delete
key, which is usually the one marked <DEL> or <DELETE>.

Deleting the entire line
If you make many typing mistakes you can delete the entire line by typing ^U.

NOTE Remember ^U means "press <CTRL> and <u> keys simultaneously".

Sending an interrupt
If you wish to terminate the execution of a command type ^C.

Sending an end-of-file character
In many Unix commands you need to finish your input with an end-of-file character. The
default end-of-file character is ^D.

Printing on paper
This is usually called 'obtaining hard copy output', as distinct from output to the screen or
a file. The command lpr sends a file to the line printer:

% lpr file1

Note that the command lp is used on some Unix systems. The command:

% lpr -Pprinter file

is used to submit the file to a specific printer.


--------------------------------------------------------------------------------

The locally developed command printers can be used to obtain a list of printers.


--------------------------------------------------------------------------------

Getting help
The command man is used to display help on the syntax of Unix commands.

The format of this command is:

% man [option] [file]

For example to obtain help information on the who command, type:

% man who

The keyword option -k keyword is used to display a list of help files associated with the
keyword. For example to display a list of all man files associated with password type the
command:

% man -k password
getpass(3) read a password
passwd(1)  change login password
passwd(5)  password file

The command man automatically invokes the more program for viewing files. You can
use the normal more commands to continue viewing.


--------------------------------------------------------------------------------

If you have any problems that can't be solved by referring to the manual, please consult
your supervisor or the Advisory Service. The Help Desk can be contacted in person in the
User Access Area, on the telephone on extension 5366, or by email to helpdesk. Also the
LUCS Unix system operators can be contacted on telephone extension 5380. With non-
urgent problems, an email message to your supervisor is usually the most efficient way of
getting help. (See next chapter on how to use email.)


--------------------------------------------------------------------------------


Exercises
1. Display a list of logged on users.

2. Obtain further information for a particular user using the finger command.

3. Use the man command to obtain further information on the finger command.

4. Use the man -k command to find what manual entries there are related to passwords.

5. Use the grep command to search the file example1 for occurrences of the string 'water'.

6. Use man and the keyword option to find out more information on communications and
e-mail in Unix.

7. Print out a file on paper.

COMMUNICATIONS

--------------------------------------------------------------------------------

Mail
The mail command enables the user to send and receive electronic mail messages to and
from users on both the local Unix system and remote systems.

This is the basic mail command. Enhanced versions, such as programs that run under a
windows program (e.g. mailtool), or screen-based versions of mail (e.g. elm) may be
available, and you will probably find them preferable to mail. If so, much of the
following can safely be ignored. Remember however that some version of mail will
definitely be available on any Unix system that you use.

Sending mail
To send a message to a user on your system, type:

% mail username

The cursor will move to the next line, and you will get a Subject: prompt. You can now
type in the subject of your message, and then press <RETURN>. The cursor will go to
the start of the next line and there will be no prompt. You now type in the text of your
message. Terminate each line with <RETURN>. When you have finished the text of the
message, type an end-of-file character (usually ^D), or a full-stop character. You should
now return to your normal shell prompt. If the message is dispatched successfully, you
will hear no more about it. The following is an example of the mail command in action:

% mail lnp6ttld
Subject: UNIX course
I don't think I'll ever be able to get the students
in the UNIX course to understand how to use e-mail.
^D
%

Entering the text of the message by this method is a rather crude process. Errors on the
line being typed can be erased with your delete key, but once you have pressed
<RETURN>, a line cannot be edited. A message may be aborted by pressing ^C twice.


--------------------------------------------------------------------------------
PRACTICE


--------------------------------------------------------------------------------

Send yourself a message. (You will find out where it has gone in the next section.)

Subcommands while entering mail
There are several commands you can type while entering mail:

<CTRL/Z> will cancel the message, and leave the text in a file named dead.letter.

~e invoke a text editor to edit your message.

~v invoke a screen editor to edit your message.

~f reads the contents of the message you have just read, into your message text.

~r file reads contents of file into your message text.

While this method is quick and easy to use, and quite adequate for short and simple
messages, many users prefer to first create a file containing the text of the message, and
then mail this file to the intended recipient. This enables you to use any system editor and
formatter to create the message, and you do not need to send it immediately.

The following sequence shows how to send a file note containing the text of a message to
another user.

% mail lnp6ttld < note

To understand fully how this works see the section on 'Re-direction of standard output' in
Chapter 8 below.

In this example the message will not contain a subject heading, unless one has already
been included as the first line of the file note. There is a -s option with the mail
command, that can be used to include a subject header, as follows:

% mail -s UNIX lnp6ttld < note

The string following the -s is the subject; in this case, the subject is "UNIX".

Receiving mail
If new mail is waiting for you when you login, you will see the message:

You have new mail

To start the mail program type the command:

% mail

Each message is summarised in a numbered list. The current message is marked with a
">" character. The mail prompt character is "&". Type the number of the message you
want to read, or just press <RETURN> to read through the list. The list of mail headers
will look something like this:

% mail
Mail version SMI 4.0 Thu Oct 11 12:59:09 PDT 1990 Type ? for help.
"/usr/spool/mail/lnp5jb": 2 messages 2 new
>N 1 lnp5mw        Thu Jan 9 15:10 11/262 hello
 N 2 lnp5js    Thu Jan 9 15:11 10/287 party
&

This tells Jenny Brown that she has two messages, one from user lnp5mw, and one from
lnp5js. The date and time at which the messages were received is also listed, and so is the
subject header (the last item on each line - here 'hello' and 'party'). The following
commands can be entered to the mail prompt:

d Mark the current message for deletion

d n Mark message number n for deletion

u n Undelete message number n

w file Save the current message in file without the mail header and mark for deletion

s file Save the current message in file with the mail header and mark for deletion

r Reply to the current message

q Quit mail, removing deleted messages from your system mailbox. Undeleted messages
that have been read are normally stored in your personal mailbox (see below)

x Exit mail, leaving your mailbox untouched, i.e. messages deleted in this session are
restored

h Show list of message headers

? List the useful mail commands

! command Execute specified shell command
- Re-read previous message.

m recipient Send mail to named recipient

Files used by mail
~/mbox Your personal mailbox, located in your home directory. This is where messages
that you have saved are stored, unless you specified another location when you saved
them. You can access this file by issuing the command:

% mail -f mbox

~/.mailrc A file that can hold commands for mail to obey when it starts up.


--------------------------------------------------------------------------------

PRACTICE


--------------------------------------------------------------------------------

See if you have received any mail. If you have, save a message to your mailbox file. Send
yourself another message, and this time discard it. Send a message to another user.

Sending mail to remote users
The following also applies to the elm mail program.

Sending mail to users on other computer systems is simple using mail. Simply type the
full address of the remote user where the system username is used above. For example:

% mail lnp5mw@uk.ac.leeds.gps
or

% mail -s Hello ecl6rsh@uk.ac.leeds.cms1 < note

These two examples use the two ways of sending mail shown above: typing the message at
the keyboard, and re-directing it from a file.

It is also possible to use mail to look at folders of mail that you have already received. To
do this type:

% mail -f folder_name

and it will treat the messages in the folder as incoming mail.

Sending on-line messages
As you have seen, messages sent using mail are received in a special buffer, and it is up
to the recipient when to look at them and what to do with them. It is also possible to send
a message that will simply appear on the screen of the recipient, if they are logged on.
This is less useful than mail for the following reasons:

mail can be used irrespective of whether the recipient is logged on or not.

mail messages can be stored by the recipient. This means that files can be transferred by
mail, and a record of transactions can be kept.

On-line messages can be confused with whatever the recipient has on screen and can
easily disrupt what they are doing. They can be very annoying!

On the other hand, on-line messages do have the advantage of obtaining the immediate
attention of another user, and it is possible to have an interactive conversation. Bearing
these facts in mind, use the following command with caution!

write
The write command is used to send on-line messages to another user on the same
machine.

The format of the write command is as follows:

% write username
text of message
^D

After typing the command, you enter your message, starting on the next line, terminating
with the end-of-file character. The recipient will then hear a bleep, then receive your
message on screen, with a short header attached. The following is a typical exchange.
User lnp5jb types:

% write lnp8zz
Hi there - want to go to lunch?
^D
%

User lnp8zz will hear a beep and the following will appear on his/her screen:

Message from lnp5jb on sun050 at 12:42
Hi there - want to go to lunch?
EOF

If lnp8zz wasn't logged on, the sender would see the following:

% write lnp8zz
lnp8zz not logged in.

SunOS has the talk command. This has several advantages over write. Firstly, talk can
call other machines on a network. Secondly, talk provides a clearer interface for the
exchange of messages, dividing the screen into two windows for the interlocutors. Type

talk username@machine

to start a conversation.


--------------------------------------------------------------------------------

PRACTICE


--------------------------------------------------------------------------------

Try to have an extended on-line conversation with another user.

You can stop messages being flashed up on your screen if you wish. To turn off direct
communications type:

% mesg n

It will remain off for the remainder of your session, unless you type:

% mesg y

to turn the facility back on. Typing just mesg lets you know whether it is on or off.

Remote logins
It is possible to log on to another machine on a Unix network, provided that you have
permission to do so. To do this use the rlogin command. Type:

rlogin machine

and you will be asked for your password. It may be necessary for you to do this to make
on-line communications with another user easier.



--------------------------------------------------------------------------------

Exercises
1. Send a message to another user on your Unix system, and get them to reply.

2. Create a small text file and send it to another user.
3. When you receive a message, save it to a file other than your mailbox. (Remember you
can always send yourself a message if you don't have one.)

4. Send a message to a user on a different computer system.

5. Send a note to your course tutor telling him that you can use mail now.

FILE PERMISSIONS

--------------------------------------------------------------------------------

What are file permissions?
The Unix file security system can prevent unauthorised users from reading or altering
files.

Every file and directory has specific permissions associated with it, giving different
categories of user certain permissions to look at or change a file, and to run executable
files.

NOTE Executable files are files containing commands that can themselves be executed
as if the file itself were a command.

The file permissions can be displayed using the command:

% ls -l [filename]

For example, to display the permissions on the file lookup.icn, type the command:

% ls -l lookup.icn
-rw-r--r-- 1 lnp5jb 777 Dec 18 lookup.icn

The first set of characters in the output from the command (-rw-r--r--) gives the
permissions. The username in the middle of the line (lnp5jb) is the owner of the file. This
is normally the user who created the file. The following fields tell you the number of
characters in the file, the date it was last modified and the name of the file.

Note that the first character specifies the file type. This is normally one of the following:

- indicates a file

d indicates a directory

The following nine characters represent permissions for different classes of users. Users
on a Unix system are assigned to a group or groups, which might correspond to a
particular department, or research group in the real world. Members of a particular group
can be allowed access to files belonging to other members of the group.

The second, third and fourth characters in the permissions string represent permissions
that apply to the owner of the file. The next three characters apply to members of the
owner's group. The last three apply to all other users. The file in this example therefore
has rw- for the owner, r-- for the group and r-- for others.

The three characters corresponding to each class of user each represent a different type of
permission. The first character represents 'read' permission. This means that a user has
permission to open a file and view the contents. If there is an r in this position then that
class of users has read permission. In this example all users have read permission. In this,
and in every case, a horizontal bar character (-) means that permission is denied.

The second position represents 'write' permission (the right to make changes to a file). In
the example, only the owner has write permission. Normally, you will not want others to
be allowed to make changes to your files, so write permission is only allowed to the
owner.

The third position represents 'execute permission'. This means permission to 'execute', or
run, a file that works like a command. In this example no-one has execute permission for
the file lookup.icn (it is an Icon program, and it would have to be compiled before it
could be executed, so execute permission would be useless). To summarise the above,
this is how the permissions string is divided up:

-                 rw-      r--      r--
type of file      owner    group    others

Here is another example, this time an executable file:

-rwxr-x--x 1     lnp5jb 562 Jan 10 hello

This tells us that hello is a file; the owner is lnp5jb, the owner has read, write and execute
permission; the group has read and execute permission; others just have execute
permission.
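
The division of the permissions string can also be checked mechanically. The following is
a small sketch only: it takes the string from the example above as plain text (it does not
read a real file) and uses cut to pick out the character positions described in this
section. The variable names are made up for the illustration.

```shell
# Split a permissions string, as printed by ls -l, into its four parts.
# The string is copied from the example above; no real file is examined.
perms="-rwxr-x--x"

type=$(printf '%s' "$perms" | cut -c1)        # file type: - or d
owner=$(printf '%s' "$perms" | cut -c2-4)     # permissions for the owner
group=$(printf '%s' "$perms" | cut -c5-7)     # permissions for the owner's group
others=$(printf '%s' "$perms" | cut -c8-10)   # permissions for all other users

echo "type=$type owner=$owner group=$group others=$others"
```

Changing the value of perms to -rw-r--r-- reproduces the breakdown given for lookup.icn.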


--------------------------------------------------------------------------------

PRACTICE


--------------------------------------------------------------------------------

What are the default permissions for your files and directories? Are they all the same?
When you copy a file what file permissions does the new file have?

Changing file permissions
The command chmod is used to change the permissions on a file. The format of this
command is:

% chmod mode filename

For example, to add read permission for the group to the file file1, give the command:

% chmod g+r file1

chmod modes
In the command:

% chmod mode filename

the mode consists of three elements:

who

operator

permissions

The following options are possible:

who:
u user (owner)

g group

o other

a all

operators:
- remove permission

+ add permission

= assign permission

permissions:
r read
w write

x execute

For example:

chmod o-rw file1.f

removes read and write permissions from others.

chmod u+x test

adds execute permission to the owner.
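
The effect of a series of chmod commands can be followed by listing the file before and
after. This is a sketch under some assumptions: it works in a scratch directory created
with mktemp (so none of your own files are touched), the file name file1 is only an
illustration, and the shell prompt is left out so that the lines can be run as a script.

```shell
# Work in a scratch directory so that no real files are changed.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file1

chmod u=rw,go= file1                  # known starting point: rw for the owner only
before=$(ls -l file1 | cut -c1-10)    # keep just the permissions string

chmod g+r file1                       # add read permission for the group
chmod a+x file1                       # add execute permission for everybody
chmod o-x file1                       # remove execute permission from others again
after=$(ls -l file1 | cut -c1-10)

echo "$before -> $after"
```

The first chmod uses the = operator to assign permissions outright, so the result does not
depend on whatever permissions the new file happened to start with.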

Permissions for directories
Read, write and execute permissions are set for directories as well as files. Read
permission means that the user may see the contents of a directory (e.g. use ls for this
directory.) Write permission means that a user may create files in the directory. Execute
permission means that the user may enter the directory (i.e. make it his current directory.)
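
Directory permissions are set with the same chmod modes. The sketch below assumes a
scratch directory made with mktemp and a made-up directory name, private. It uses ls -ld,
which lists the directory itself rather than its contents.

```shell
# Create a throwaway directory tree to experiment on.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir private

chmod u=rwx,go= private                   # only the owner may list, create files or enter
closed=$(ls -ld private | cut -c1-10)

chmod go+x private                        # group and others may enter, but not list
enter_only=$(ls -ld private | cut -c1-10)

echo "$closed"
echo "$enter_only"
```

With execute but not read permission, another user could cd into the directory and open a
file whose name they already know, but could not use ls to see what is there.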



--------------------------------------------------------------------------------

Exercises
1. Try to move to the home directory of someone else in your group. There are several
ways to do this, and you may find that you are not permitted to enter certain directories.
See what files they have, and what the file permissions are. (Remember that you can
protect your own files from prying eyes, or from interference.)

2. Try to copy a file from another user's directory to your own.

3. Set permissions on all of your files and directories to those that you want. You may
want to give read permission on some of your files and directories to members of your
group.

STANDARD INPUT AND OUTPUT

--------------------------------------------------------------------------------

Standard input
Input to Unix commands is normally given from the keyboard. For example you can use
the cat command interactively:

% cat
Hello               - you type
Hello               - response
there               - you type
there               - response
^D                  - you type
%

Note that input from the keyboard is terminated with the end-of-file character, usually
^D. For another example consider the spell command, which is the Unix spelling checker:

% spell                          - you type
Input to the spell ulitity       - you type
is typed at the keyboard         - you type
^D                               - you type
ulitity                          - response

The spell command outputs words that are incorrectly spelled in the input.

Standard output
Output from Unix commands is normally displayed on the screen. For example:

% spell
Input to the spell ulitity
is typed at the keyboard
^D
ulitity                    - output


--------------------------------------------------------------------------------

PRACTICE


--------------------------------------------------------------------------------

Try out the spell checker. See how it copes with British spellings (remember it's an
American system), proper nouns, hyphens and recently coined vocabulary.

Re-direction of standard input
It is possible to redirect standard input so that the input is taken from a file. Imagine you
wish to check for spelling errors in a report. A text can be put into the file report, which
can be fed into the spell command:

% cat > report
Input to the spell ulitity
can come from a file
^D
% spell < report
ulitity

The < character is used to re-direct the input from the file report to the command spell.
The general format for re-direction of user input is:
command < filename

Another common use of re-direction of standard input is to mail a file to another user.
The command:

% mail lnp8zz < report

will mail the file report to local user lnp8zz.
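
Any command that reads standard input can be used in this way. As a rough illustration
that does not depend on spell or mail being installed, the sketch below uses the word count
program wc on a made-up three-line file called report in a scratch directory.

```shell
# Build a small sample file in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'one\ntwo\nthree\n' > report

named=$(wc -l report)      # wc opens the file itself, so it also prints the file name
lines=$(wc -l < report)    # wc only sees standard input, so it prints just the count

echo $named
echo $lines
```

With re-direction the shell opens the file and the command never learns its name, which
is why the second form prints the number alone.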

Re-direction of standard output
You do not always want the output from a Unix command to be displayed on the screen.
It has already been shown how it is possible to direct the output from the cat command to
a file. Imagine you want a list of your files and directories kept in a file. You would use
the command:

% ls > filelist

The > character is used to re-direct the output from the command to the file called filelist.
The general format for re-direction of user output is:

% command > filename

Note that output directed to the file /dev/null is effectively discarded. This is the system
'wastebasket'.

Another example involves directing the output of echo to a file:

echo "Hello there" > greeting

This would normally overwrite any existing contents of the file greeting. Study the
following sequence:

% echo "Hello there" > greeting
% cat greeting
Hello there
% echo "This instead" > greeting
% cat greeting
This instead

It is possible to append output to a file, rather than overwriting it, by using the >>
operator. For example:

% echo "Hello there" > greeting
% cat greeting
Hello there
% echo "and goodbye" >> greeting
% cat greeting
Hello there
and goodbye

Look carefully at the difference between these two examples.

Re-direction of input and output
It is possible to re-direct both standard input and output. If you have a report containing
many spelling mistakes you may wish to keep a list of the mistakes in a file. You can do
this using the following command:

% spell < report > errors
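
The same pattern works with any command that reads standard input and writes standard
output. The following sketch uses sort instead of spell, with made-up file names
(unsorted and sorted) in a scratch directory.

```shell
# Prepare a small unsorted file in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'pear\napple\nfig\n' > unsorted

sort < unsorted > sorted    # input comes from unsorted, output goes to sorted
cat sorted
```

Use a different file for the output. The shell empties the output file before the command
starts, so a command such as sort < unsorted > unsorted would destroy the input.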

Piping
Output from one command can be sent ('piped') to the input of another command using
the | character:

command1 | command2

A common use for pipes is to control the output of large files to the screen. It is possible
to send output to the more command so that only one screenful at a time is output. If the
command

% ls -l

is used to give a long listing of all files and directories there may be too many lines to see
them all at once on the screen. (If you don't have many files, move to /etc where there
should be plenty.) Output from ls -l can be piped to more as follows:

% ls -l /etc | more

You can then use the usual more commands to control the output.

In the output from ls -l, directories are identified by the d character at the start of each
line. A list of just the directories can be obtained by piping the output of this command to
the grep command, giving grep an option which will list only lines containing the d
character at the start of the line. The command is:

% ls -l | grep "^d"

The commands sort and grep are often used when piping. For example:

% cat phonenos | sort | lpr
will send an alphabetically sorted list of the phone numbers contained in the file
phonenos to the line printer. The command:

% cat phonenos | grep leeds | sort | lpr

will send a sorted list of phone numbers containing the string 'leeds' to the line printer.
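
The pipelines above can be tried without a printer. The sketch below assumes a made-up
phonenos file with one "name town number" entry per line, and sends the result to the
screen or to wc rather than to lpr. Note that grep can read the file itself, so the
leading cat is optional.

```shell
# Build a sample phone list in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > phonenos <<'EOF'
smith leeds 0113-555-0101
jones york 01904-555-0102
brown leeds 0113-555-0103
EOF

leeds=$(grep leeds phonenos | sort)     # filter the entries, then sort them
count=$(grep leeds phonenos | wc -l)    # filter the entries, then count the lines

echo "$leeds"
echo "entries:" $count
```

Each command in the pipeline does one small job, and the | character joins the standard
output of one to the standard input of the next.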



--------------------------------------------------------------------------------

Exercises
1. Put a listing of the files in your directory into a file called filelist. (Then delete it!)

2. Create a text file containing a short story, then use the spell program to check the
spelling of the words in the file.

3. Redirect the output of the spell program to a file called errors.

4. Type the command ls -l and examine the format of the output. Pipe the output of the
command ls -l to the word count program wc to obtain a count of the number of files in
your directory.

AN INTRODUCTION TO THE EX LINE EDITOR

--------------------------------------------------------------------------------

What's ex for?
Editors available on Unix include:

ed basic line editor

ex line editor

vi screen editor

emacs screen editor

Ex is an enhanced and more friendly version of ed. Vi is a screen-based version of ex.
Most users have no practical use for a line editor nowadays, and they are really a relic of
an earlier age in computing. However, you may occasionally have to use ex, if for some
reason you can't run a screen editor on your terminal. It is covered here mainly to teach
something else, namely, the way that Unix handles texts. This is perhaps most transparent
when you are using ex. Ex forces the user to use complicated pattern matching operations
to do things that are comparatively easy with a screen editor, such as correcting
small typing errors in the text. While taking this approach may at times seem
unnecessarily difficult, it should be remembered that what follows here is just a stepping
stone to other Unix utilities, such as vi (which you are far more likely to want to use as an
editor than ex), and commands that use regular expressions, such as grep, tr and awk.
Learning to use ex involves skills necessary for getting the most out of these utilities.

Using ex
Starting ex
The command ex is used to invoke the editor. The format of this command is:

% ex [filename]

A filename can be supplied if you wish to edit an existing file.

% ex oldfile
"oldfile" 10 lines 465 characters
:

Alternatively the filename may be used as the name of a new file:

% ex newfile
"newfile" [Newfile]
:

Notice that the prompt for ex commands is the ':' character.

Adding Text
To enter text simply type the command a (short for append), and then type in the text, as
follows:

:a
This is the text

Input is terminated by typing a full stop ('.') on a new line:

:a
This is just one line of text
.
:

The command i is used to insert text before the current line.

Saving Your Data
The command w (short for 'write') is used to save your data. The format of this command
is:

:w [filename]
If no filename is specified, the filename given when ex was invoked will be used. E.g.:

:w test.f
test.f 50 lines 576 characters
:

The number of lines and characters in the file will be displayed.

Quitting the Editor
The command q (short for 'quit') is used to quit the editor. Note that if changes have been
made to the file and have not been saved the editor will respond with a warning message:

No write since last change (:quit! overrides)

The command quit! (or just q!) must be given if you wish to quit without saving your
changes.

Displaying Lines in the File
The p command (for 'print') is used to display lines in the file. The format of this command
is:

:[line_range] p

If no range is supplied the current line is displayed.

Pressing <RETURN> is equivalent to moving on to and displaying the next line. With
small files it is possible to display the entire file by pressing <RETURN> until the end of
the file is reached.

Line Ranges
Ranges of lines that can be given to edit commands include:

Absolute line number

6 refers to line 6

1,6 refers to lines 1 to 6

Relative line numbers

-2 refers to 2 lines before the current line

+3 refers to 3 lines after the current line

-2,+3 refers to a range from 2 lines before the current line to 3 lines after the current line
Special symbols

$ refers to the last line in the file e.g. $p to display last line, 1,$p to display entire file

. refers to the current line e.g. .,$p to display from the current line to the end

Examples:

6d                     - deletes the sixth line
1,6d                   - deletes the first six lines
1,$d                   - deletes all lines
3a                     - appends text after line three
.,+10w new             - saves the current line and the next ten lines to a file called new

The = operator gives the line number, with the last line the default, so typing = gives you
the number of lines in a text. The number of the current line is obtained by typing .=.

Deleting Lines
The d command is used to delete lines. The format of this command is:

:[line_range] d

If no line number is given the current line will be deleted. It is possible to supply a range
of lines. For example:

:1,$d

will delete the entire file.

Searching
Searches are carried out by including the search string in slashes ('/'):

/string/

The search will start at the current line.

:/Jane/
This is Jane's file

The special characters '^' and '$' can be used to assist the search. For example:

/^This/           will find a line beginning with 'This'
/file$/           will find a line ending in 'file'

The last string searched for is the default string. This means that you can repeat a search
just by typing //.
Reverse Searches
Reverse searches are carried out by including the search string in question marks ('?'):

:?string?

The search will start at the current line and search backwards through the file.

Making Substitutions
The s command is used to make substitutions. The format of this command is:

:[line_range]s/old_string/new_string/

If no line number is given substitutions will be made only on the current line. For
example:

:s/old/new/

will substitute the first occurrence of the string 'old' with 'new' on the current line. The
command:

:.,$s/old/new/

will substitute the first occurrence of the string 'old' with 'new' in every line from the
current line to the end of the file.

Global Substitutions
The g command (for 'global') is used to make multiple substitutions on a line. For
example:

:s/old/new/g

will substitute all occurrences of the string 'old' with 'new' on the current line. The
command:

:1,$s/old/new/g

will substitute all occurrences of the string 'old' with 'new' in the file.

Search strings can also be used in conjunction with the s command in order to carry out
more sophisticated global changes. The line range preceding a substitution string may
include a search for the string to be changed. For example:

:g/old/s//new/g

This means "search globally for 'old', then replace every occurrence with 'new'".
Remember the null string (in s//) stands for the last RE, in this case the RE 'old'. This
has the same effect as:

:1,$s/old/new/g
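
The difference that the g flag makes can be seen without starting an editor. The sketch
below is an illustration only: it uses sed, the stream editor introduced later in this
course, which accepts the same s command as ex, and it applies the commands to a made-up
line of text.

```shell
# A sample line containing the search string three times.
line="old hat, old coat, old shoes"

first=$(printf '%s\n' "$line" | sed 's/old/new/')      # first occurrence only
all=$(printf '%s\n' "$line" | sed 's/old/new/g')       # every occurrence on the line
same=$(printf '%s\n' "$line" | sed '/old/s//new/g')    # search first, then the null RE

echo "$first"
echo "$all"
echo "$same"
```

The last command mirrors g/old/s//new/g: the search selects the lines to work on, and the
empty pattern in s// reuses it.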

Additional ex facilities
Additional commands available using the ex editor include:

c replaces lines

t transfers lines

m moves lines

j joins lines

l shows invisible characters

f gives the name of the file being edited

r inserts named file

e edits named file

u undo last change

The commands m and t above work in a similar way, in that they require two line
addresses, one before and one after the command. The address in front refers to the
source and the address after the destination. If either is omitted, the current line is
assumed. Line addresses may be ranges, allowing blocks of text to be moved. Here are a
few examples of commands:

:.m2

This moves the current line to a position after line 2.

:1,.m$

This moves a block (line 1 to the current line) to the end of the text.

:1,.t$

This copies the block to the end of the text, leaving the original block untouched.

--------------------------------------------------------------------------------

Exercises
1. Create a file using ex. Put the text of a message in the file and then mail it to someone
(see chapter on mail).


--------------------------------------------------------------------------------

2. Use ex to explore the file /etc/passwd. Search for your own listing, and those of others
in your group. (You won't be able to save changes to the file).

3. Find a text file to which you have access and copy it to your home directory. Try
making some changes to it.


REGULAR EXPRESSIONS

--------------------------------------------------------------------------------

What are regular expressions?
A regular expression (RE) is a string of characters that can be used to match a set of
character strings. For example, a global search for all occurrences of the word "and"
would require a search for "and", "And", "AnD", "AND", etc. Without regular
expressions finding all possible occurrences of "and" would require eight separate
searches. Using an RE the search could be done with one command.

Regular expressions are used by many Unix utilities, including:

ed

ex

vi

grep

sed

awk
(The awk utility interprets a special-purpose programming language that makes it
possible to handle simple data-reformatting jobs easily with just a few lines of code. Awk
is not covered in this course, but the GAWK Manual is a good guide to its use.)

Regular expressions are used in searches and substitutions.
Character strings
A character string is the simplest regular expression which simply matches the string
itself. For example:

/hello/                  - matches 'hello'
s/hello/goodbye/         - matches 'hello' and makes a substitution

Matching single characters
The '.' character is used to match a single character. For example:

/p.t/   - matches 'p' and 't' separated by a single character, e.g. 'pit', 'put', 'pot', etc.

Sets of characters
The expression /[RE]/ is used to match a set of characters in a single character position. For
example:

/x[ab2X]y/      - matches any of the following:
xay
xby
x2y
xXy

In the expression /[RE]/ a range of characters can be specified. For example:

[a-z]   - matches any single lower case character
[0-9]   - matches any single digit

Note however:

[0-57] - matches any one of the following: 0 1 2 3 4 5 7

i.e. 0-5 and 7. Sets of characters can be combined:

[a-d5-8X-Z]     - matches any one of the following: a b c d 5 6 7 8 X Y Z

It is possible to specify a set of characters which are not to be matched in the RE. For
example:

[^0-9] - matches any single character which is not a digit
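
Sets of characters are easy to try out with grep, which prints the lines that match an RE.
The sketch below runs three of the patterns from this section against a made-up list of
three-letter strings.

```shell
# Six sample strings, one per line.
words=$(printf 'xay\nxby\nxcy\nx2y\nxXy\nx9y\n')

set_match=$(printf '%s\n' "$words" | grep 'x[ab2X]y')     # a, b, 2 or X in the middle
digit_match=$(printf '%s\n' "$words" | grep 'x[0-9]y')    # any single digit in the middle
not_digit=$(printf '%s\n' "$words" | grep 'x[^0-9]y')     # anything except a digit

echo "$set_match"
echo "$digit_match"
echo "$not_digit"
```

The patterns are put in single quotes so that the shell passes the square brackets to grep
unchanged.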

Anchors
An anchor is used to match a RE found at a particular position. For example:

/^RE/ - matches RE at the start of a line
/RE$/ - matches RE at the end of a line
/^RE$/ - matches RE as the whole line

Note that there are two separate uses of the '^' operator. One is as the start of line anchor,
and the other as the 'logical not' operator. The latter function only applies inside square
brackets.
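
The three anchored forms can be compared using grep on a few made-up lines. This is only
a sketch; the sample text is invented for the purpose.

```shell
# Three sample lines that all contain the word "file".
text=$(printf 'This is a file\nfile\nA file this is\n')

at_start=$(printf '%s\n' "$text" | grep '^This')      # 'This' at the start of a line
at_end=$(printf '%s\n' "$text" | grep 'file$')        # 'file' at the end of a line
whole=$(printf '%s\n' "$text" | grep '^file$')        # 'file' as the whole line

echo "$at_start"
echo "$at_end"
echo "$whole"
```

The third sample line contains 'file' but neither starts nor ends with it, so none of the
anchored patterns selects it.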

Repetitions
Multiple occurrences of REs can be specified. For example:

a*       - matches 0 or more occurrences of 'a'
aa*      - matches 1 or more occurrences of 'a'
.*       - matches any string of characters
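
Because a* can match nothing at all, it is best tried next to another character. The
sketch below uses grep on four made-up lines to show which ones each pattern selects.

```shell
# Sample lines with zero, one and two a's before the b, and one with other characters.
lines=$(printf 'b\nab\naab\na--b\n')

zero_or_more=$(printf '%s\n' "$lines" | grep 'a*b')    # b preceded by any number of a's
one_or_more=$(printf '%s\n' "$lines" | grep 'aa*b')    # at least one a directly before b
anything=$(printf '%s\n' "$lines" | grep 'a.*b')       # a, then any characters, then b

echo "$zero_or_more"
echo "$one_or_more"
echo "$anything"
```

The first pattern selects every line, since a line containing b always contains "zero or
more a's followed by b".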

Remembered regular expressions
A null RE stands for the last RE. For example:

:/[Tt]he.*car/p
The blue car exploded with a roar.
:s//(The blue car)/p
(The blue car) exploded with a roar.

The '&' character in a replacement string stands for the most recently matched string. For
example:

:/[Tt]he.*car/p
The blue car exploded with a roar.
:s//(&)/p
(The blue car) exploded with a roar.
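
The same substitution can be reproduced outside the editor. The sketch below is an
illustration that uses sed, which also treats & in the replacement string as the text that
was matched, on the sample line from the example above.

```shell
# The line from the ex example above.
line="The blue car exploded with a roar."

marked=$(printf '%s\n' "$line" | sed 's/[Tt]he.*car/(&)/')   # wrap the match in brackets
echo "$marked"
```

The replacement never has to spell out what was matched, which is useful when the RE can
match many different strings.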

Sub-expressions
A sub-expression in a RE can be referred to.

\(string\)     - defines an RE sub-expression
\n             - refers to the nth RE sub-expression

NOTE The backslash is the escape character for REs. This means it neutralises the
special meanings of special characters. For example:

:p
A line of text
:s/\(line\).*\(text\)/\2 \1/p
A text line
:*
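
A sub-expression swap of this kind can be checked with sed, which uses the same notation:
a backslash before each parenthesis to mark a sub-expression, and a backslash followed by
a digit to refer back to it. The sketch below is an illustration using the sample line
from the example above.

```shell
# The line from the ex example above.
line="A line of text"

# \1 is the first sub-expression (line), \2 the second (text); .* matches " of ".
swapped=$(printf '%s\n' "$line" | sed 's/\(line\).*\(text\)/\2 \1/')
echo "$swapped"
```

Everything matched by the whole RE, including the part between the two sub-expressions,
is replaced by the second word, a space and the first word.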

Repetition
It is possible to specify multiple occurrences of REs. For example:

c\{4\}         matches exactly 4 c's
c\{4,\}        matches 4 or more c's
c\{2,4\}       matches between 2 and 4 c's
For example, to find a line containing 5 digits:

/[0-9]\{5\}/
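
A repetition count can be tried with grep, which takes the count between backslashed
braces in its basic regular expressions. The sketch below uses made-up sample lines.

```shell
# Sample lines containing runs of 2, 5 and 7 digits.
ids=$(printf 'room 12\nzip 90210\ncode 1234567\n')

five=$(printf '%s\n' "$ids" | grep '[0-9]\{5\}')    # lines with 5 digits in a row
echo "$five"
```

The line with seven digits is selected as well, because it contains a run of five. To
insist on exactly five, the pattern would also have to say what comes before and after
the digits.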

A summary of special characters
Special characters in the search string
^ start of line anchor (or NOT operator inside [] )

$ end of line anchor

. any character

* character repeated any number of times

\ escape character

[ ] contains range of characters

Special characters in the replacement string
& string matched in search string

\ escape character

Note that any regular expression can be used with grep. (It gets its name from the editor
command g/RE/p which means 'globally search for RE and print it'). This opens up many
new possibilities for the use of grep. Unix commands that use regular expressions often
make the use of an editor redundant.



--------------------------------------------------------------------------------
PRACTICE
Obtain a listing of the members of your group from the password file using grep.


--------------------------------------------------------------------------------

Introduction to sed
sed is a non-interactive stream editor which is used for text. The command to invoke sed
is:

sed [-n] [-e command] [-f edfile] [input_file]

For example:
sed "s/UNIX/Unix/g" thesis > thesis.new

This will process the file thesis line by line, outputting each line to the file thesis.new and
replacing each occurrence of the string "UNIX" with "Unix".

In the above example every line of thesis will be output to thesis.new, irrespective of
whether it has been changed or not. This is because the default output for sed is every line
of the input. Using the -n option suppresses the default output, so that only specified lines
are output. This means that no lines would be output by the following command:

sed -n "s/UNIX/Unix/g" thesis > thesis.new

since a change but no output has been specified. If a print command is added, as follows:

sed -n "s/UNIX/Unix/gp" thesis > thesis.new

then only those lines in which "UNIX" had been changed to "Unix" would be output.

As the examples above show, the -e option is not necessary when there is only one
editor command. It is possible to specify more than one command, and in this case each
must be preceded by -e. For example:

% sed -e "s/a/A/" -e "s/b/B/" file1 > file2

This command will carry out the two substitutions on each line of file1.

The -f option enables the user to use a file containing editor commands, instead of typing
out a series of commands with the -e option.
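
The options described above can be compared side by side. The sketch below is an
illustration under some assumptions: it works in a scratch directory, the contents of
thesis and file1 are made up, and edfile is simply a file holding one sed command per
line.

```shell
# Build two small sample files in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'UNIX is old\nnothing here\n' > thesis
printf 'a cab\nbanana\n' > file1

all=$(sed 's/UNIX/Unix/g' thesis)            # default: every input line is output
changed=$(sed -n 's/UNIX/Unix/gp' thesis)    # -n with p: only the lines that were changed
none=$(sed -n 's/UNIX/Unix/g' thesis)        # -n without p: no output at all

two=$(sed -e 's/a/A/' -e 's/b/B/' file1)     # two commands, each preceded by -e
printf 's/a/A/\ns/b/B/\n' > edfile           # the same two commands, one per line
from_file=$(sed -f edfile file1)             # commands read from edfile instead

echo "$all"
echo "$changed"
echo "$two"
```

Without the g flag each substitution changes only the first match on a line, so banana
becomes BAnana rather than BAnAnA.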

sed examples
The sed command to list only files (exclude directories) is:

% ls -l | sed -n "/^-/p"
-rw------- 1 lnp5jb  1765 mbox
-rw------- 1 lnp5jb   320 example1

The sed command to extract a list of usernames from the password file is:

% sed "s/:.*//" /etc/passwd | more

What this does is to delete everything from the first ':' to the end of each line in the
password file, leaving just the username.



--------------------------------------------------------------------------------
Exercises
1. Reproduce the effects of the above sed examples using grep instead. Note that grep is
generally better for searches, such as this, while sed can be used to make changes to files.

2. Find the system's games directory and type quiz function ed-command to do the ed
commands quiz. Don't worry if there are a couple of things that you haven't come across.
Try it again and see if you improve your score.


PROCESSING LARGE TEXT CORPORA

--------------------------------------------------------------------------------

This section will focus on exploiting large files containing linguistic material with the use
of the commands already covered plus many more.

Compressed files
Often large files are compressed to save disk space. If this is the case then the user must
make the file revert to its original format in order to be able to do anything with it. A
popular compressing command is called, simply, compress. The command:

% compress filename

will cause the file to be replaced by a compressed file with a .Z suffix. The command
uncompress will cause it to revert to its original format. It is often not necessary to
uncompress a file to use it. In fact, the file will often be owned by someone else, and you
would have to copy it and then uncompress it, using up a great deal of disk space and
processor time. It is often better to use zcat, which sends the uncompressed contents of
a compressed file to the standard output, while leaving the compressed version of the file
in the filestore.
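
The idea can be tried out safely on a small file of your own. The sketch below is written for the Bourne shell and assumes that gzip is installed; it uses gzip in place of compress (so the suffix is .gz rather than .Z) and gzip -dc, which does for .gz files what zcat is described as doing above. The file name and contents are invented.

```shell
# Sketch only: gzip stands in for compress, and the file is a made-up sample.
dir=$(mktemp -d)
printf 'the cat sat\nthe dog ran\na cat slept\n' > "$dir/sample.txt"

gzip "$dir/sample.txt"      # replaces sample.txt with sample.txt.gz

# gzip -dc writes the uncompressed text to standard output, where it can be
# piped to another command; the compressed file on disk is left untouched.
lines=$(gzip -dc "$dir/sample.txt.gz" | wc -l)

left=$(ls "$dir")           # what is in the directory afterwards
rm -r "$dir"
echo "lines read: $lines; still on disk: $left"
```

No uncompressed copy is ever written to disk, which is the point of working this way with large files.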



--------------------------------------------------------------------------------
PRACTICE
Try compressing and uncompressing some of your own files.

Find a large compressed file on your system and search it for some appropriate string
using grep without uncompressing the file.


--------------------------------------------------------------------------------

Some useful commands for processing text files
The following is a summary of some useful commands for processing text files, some of
which you have met already, some of which are new to you. Both have been included so
that this section can easily be used for reference purposes. Not all of these commands are
standard Unix, so they may not all work in the way you expect (or at all) on your system.
For the same reasons, their syntax is somewhat incongruous and some use different input
and output conventions. Not all are included in the command summary in the appendix
below. See the relevant manual pages for more details.

sort sort into alphabetical order

sort -n sort into numerical order

sort -m merge sorted files into one sorted file

sort -r sort into reverse order (highest first)

sort -c check a file is already sorted

uniq remove duplicate lines (or partly-duplicate lines)

uniq -d output only duplicate lines

uniq -c count identical lines (or lines with identical fields)

grep find lines containing given string or pattern

grep -v find lines not containing given string or pattern

grep -c count lines containing given string or pattern

grep -n give line numbers of lines containing...

fgrep same as grep except that it does not recognise regular expressions

egrep same as grep except that it recognises all REs; grep only recognises certain special
characters

wc -c count characters

wc -w count words

wc -l count lines

NOTE
wc -l file will output the number of lines in the file, and the file name.
wc -l < file just gives the bare line count.

head -17 output first 17 lines

tail -17 output last 17 lines

tail +30 output from line 30 (written tail -n +30 on newer systems)

cut -f3 delete all but third field of each line

cut -f3,5 delete all but third and fifth fields of each line

cut -f3-5,7 delete all but 3rd, 4th, 5th, 7th fields of each line

cut -c-4,6-8 delete all but 1st to 4th and 6th to 8th characters

cut -f2 -d":" deletes all but the second field where ":" is the field delimiter (tab is the
default)

paste combines files horizontally; corresponding lines are appended

paste -d">" pastes with delimiter defined as ">" (tab is default). The special characters
"\n" (newline) and "\0" (null string) may be used.

cat concatenates file vertically (appends files to one another)

cat -n precedes each line with a line number in the output

cat -b as above, but does not number blank lines

cat -s reduces any number of successive blank lines to one blank line

tr "abc-e" "kmx-z" translates a, b, c, d, e to k, m, x, y, z respectively.

tr -d "xy" deletes all occurrences of x and y

tr -s "a" "b" translates all a to b and reduces any string of consecutive b to just one b.

To go down to the character, rather than field, level, sed is simplest for line by line
processing. sed looks for patterns, so is not very good with column or field positions.
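
For work at the field or column level, cut is the more natural tool. The following sketch (Bourne shell) runs three of the cut forms from the summary on two invented records that merely follow the colon-separated layout of the password file.

```shell
# Sketch only: the records are invented and just follow the password-file layout.
records='root:x:0:0:Super User:/root:/bin/sh
lnp5jb:x:1005:100:A Student:/home/lnp5jb:/bin/csh'

# -d":" sets the field delimiter; -f1 keeps only the first field.
names=$(printf '%s\n' "$records" | cut -d":" -f1)

# A list of fields: the first and the seventh (login name and shell).
shells=$(printf '%s\n' "$records" | cut -d":" -f1,7)

# -c selects by character position instead: characters 1 to 4 of each line.
prefix=$(printf '%s\n' "$records" | cut -c-4)

printf '%s\n' "$names" "$shells" "$prefix"
```

Note that cut keeps the delimiter between the fields it outputs, so the second result still contains a colon.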

uniq needs an already-sorted file. A common idiom is

sort | uniq
to produce a sorted list of all the different lines in a file. uniq has a peculiar way of
spacing its output, so it is difficult to use in a pipeline with another command such as cut.
tr is useful for converting blanks to newlines (hence converting a text to a vertical list of
words, which can then be sorted, counted etc.). The command:

% tr " " "\012" < filename

will do this. \012 is the octal code for the linefeed character. This is also useful for
converting strings of blanks or tabs to single characters. \011 is the octal code for the tab
character.
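
Putting these pieces together gives a simple word frequency count. The sketch below is for the Bourne shell; the sample sentence is made up, and the -s option is added so that a run of blanks or tabs produces one newline rather than several.

```shell
# Sketch only: the sample sentence is invented (it contains a double blank and a tab).
words=$(printf 'the cat  and the dog and\tthe bird\n' | tr -s ' \011' '\012\012')

# sort brings identical words together so that uniq -c can count them;
# sort -rn then puts the most frequent word first.
freq=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn)

printf '%s\n' "$freq"
```

Each line of the result is a count followed by a word. The count is padded with leading blanks, which is the peculiar spacing mentioned above; it has to be allowed for if the output is passed on to a command such as cut.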



--------------------------------------------------------------------------------
PRACTICE
Try out the following pipeline on a text file:


--------------------------------------------------------------------------------

tr " " "\012" < input_file | sort | uniq > output_file


--------------------------------------------------------------------------------

Using language corpora
A corpus (plural corpora) is a collection of language data. The corpora with which we
will be concerned here are electronic, that is they are stored in a computer. Corpora may
contain data about written or spoken language. They usually contain texts from one
language, but they may also be multilingual. Corpora are usually designed and collated
for a specific purpose. Many of the major corpora in use today aim to be representative of
different domains of language use, and can facilitate comparative studies. For example,
the average length of words in academic texts and newspaper reports could be compared
by measuring words in texts from these two domains. Computers obviously make this
type of number-crunching (or word-crunching) activity much easier than it would be if
you had to count words and letters in a printed text. Corpora are particularly useful for
checking the intuitions that we have and the generalisations that are made about language
use.

Unix commands can be used to extract information from language corpora, either issued
directly at the prompt or combined into simple scripts using the commands learned in this
course.
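
As a rough illustration of the word-length comparison mentioned above, the following Bourne shell sketch computes an average word length for two one-line "texts". Both texts and the helper function avg_len are invented for the example; a real study would of course read from corpus files.

```shell
# Sketch only: two made-up one-line "texts" stand in for the two domains.
academic='quantitative analysis demonstrates significant variation'
news='the mayor cut the red tape'

# Hypothetical helper: average word length = total letters / number of words.
avg_len() {
    words=$(printf '%s\n' "$1" | wc -w)
    letters=$(printf '%s' "$1" | tr -d ' ' | wc -c)
    # The shell only does integer arithmetic, so work in tenths of a letter.
    echo $(( $letters * 10 / $words ))
}

a=$(avg_len "$academic")
n=$(avg_len "$news")
echo "academic: $((a / 10)).$((a % 10))   news: $((n / 10)).$((n % 10))"
```

The helper assumes that words are separated by single blanks and that the text contains no punctuation; real text would first need the kind of cleaning with tr described earlier.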

Types of Corpora
There are many types of corpora, defined by the types of language that they represent and
the formats in which that information is stored. Unix commands for handling strings are
sufficiently flexible to handle many different formats. Users however need to be sensitive
to the arcane minutiae of the format and markup of the different corpora that they use.
The 'l' (list) command, entered as ':l' in the vi editor, can be used to view hidden characters (such as spaces
and tabs) in a file.

The LOB and Brown corpora
Brown and LOB are parallel corpora, with very similar formats and tagging. Brown,
which was constructed first, represents different types of written American English. LOB
represents the same categories of British English. All words are lemmatised and given a
word class tag. Here is a sample from the so-called 'vertical tagged' version of Brown:

^N01002001     -----   -----   -----
N01002010      -       NP      Alastair
N01002020      -       BEDZ    was
N01002030      -       AT      a
N01002040      -       NN      bachelor
N01002041      -       .       .
^N01002042     -----   -----   -----
N01002050      -       ABN     all
N01002060      -       PP$     his
N01002070      -       NN      life
N01002080      -       PP3A    he
N01002090      -       HVD     had
N01002100      -       BEN     been
N01002110      -       VBN     inclined
N01002120      -       TO      to
N01003010      -       VB      regard
N01003020      -       NNS     women
N01003030      -       IN      as
N01003040      -       PN      something
N01003050      -       WDTR    which
N01003060      -       MD      must
N01003070      -       RB      necessarily
N01003080      -       BE      be
N01003090      -       VBN     subordinated
N01003100      -       IN      to
N01004010      -       PP$     his

And the 'untagged' version of the same passage, plus the following lines:

N01 0010 DAN MORGAN TOLD HIMSELF HE WOULD FORGET Ann Turner. He
N01 0020 was well rid of her. He certainly didn't want a wife who was fickle
N01 0030 as Ann. If he had married her, he'd have been asking for trouble.
N01 0040 But all of this was rationalization. Sometimes he woke up in
N01 0050 the middle of the night thinking of Ann, and then could not get back
N01 0060 to sleep. His plans and dreams had revolved around her so much and for
N01 0070 so long that now he felt as if he had nothing. The easiest thing would
N01 0080 be to sell out to Al Budd and leave the country, but there was
N01 0090 a stubborn streak in him that wouldn't allow it. The best antidote
N01 0100 for the bitterness and disappointment that poisoned him was hard
N01 0110 work. He found that if he was tired enough at night, he went to sleep

Users can choose the version (from those available to them) which includes the
information that they need. If you are only interested in word frequencies, then the
grammatical information encoded in the tagged version is redundant, and the untagged
version can be used. If however you are looking for the word 'set' used as a noun, then it
would be necessary to use a tagged version, so that this word can be differentiated from
'set' used as a verb or adjective.
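
A search of this kind can be sketched with the commands already covered. The four lines below are invented; they only imitate the reference, dash, tag and word layout of the vertical tagged sample, and the tags NN (singular noun) and VBD (past tense verb) are used on the assumption that the corpus follows the usual Brown-style tagset.

```shell
# Sketch only: these four lines are invented and just imitate the
# reference / dash / tag / word layout of the vertical tagged format.
tagged='N01002010      -       AT    the
N01002020      -       NN    set
N01002030      -       VBD   set
N01002040      -       NNS   sets'

# tr -s squeezes each run of blanks to one, so that cut can treat a single
# space as the delimiter; fields 3 and 4 are then the tag and the word.
pairs=$(printf '%s\n' "$tagged" | tr -s ' ' | cut -d' ' -f3,4)

# Count only the lines where 'set' carries the singular noun tag NN.
nouns=$(printf '%s\n' "$pairs" | grep -c '^NN set$')

echo "$nouns"
```

The same search on the untagged version could not tell the noun from the verb, since both are simply the string 'set'.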

Processing LOB and Brown
The Susanne corpus
This corpus uses a section of the Brown corpus and marks it up with syntactic
information.

N01:0010a     -   YB      <minbrk>    -           [Oh.Oh]
N01:0010b     -   NP1m    DAN         Dan         [O[S[Nns:s.
N01:0010c     -   NP1s    MORGAN      Morgan      .Nns:s]
N01:0010d     -   VVDv    TOLD        tell        [Vd.Vd]
N01:0010e     -   PPX1m   HIMSELF     himself     [Nos:i.Nos:i]
N01:0010f     -   PPHS1m  HE          he          [Fn:o[Nas:s.Nas:s]
N01:0010g     -   VMd     WOULD       will        [Vdc.
N01:0010h     -   VV0v    FORGET      forget      .Vdc]
N01:0010i     -   NP1f    Ann         Ann         [Nns:o.
N01:0010j     -   NP1s    Turner      Turner      .Nns:o]Fn:o]S]
N01:0010k     -   YF      +.          -           .
N01:0010m     -   PPHS1m  He          he          [S[Nas:s.Nas:s]
N01:0020a     -   VBDZ    was         be          [Vsb.Vsb]
N01:0020b     -   RR      well        well        [Tn:e[R:h.R:h]
N01:0020c     -   VVNt    rid         rid         [Vn.Vn]
N01:0020d     -   IO      of          of          [Po:u.
N01:0020e     -   PPHO1f  her         she         .Po:u]Tn:e]S]
N01:0020f     -   YF      +.          -           .
N01:0020g     -   PPHS1m  He          he          [S[Nas:s.Nas:s]
N01:0020h     -   RR      certainly   certainly   [R:m.R:m]
N01:0020i     -   VDD     did         do          [Vde.
N01:0020j     -   XX      +n<apos>t   not         .
N01:0020k     -   VV0v    want        want        .Vde]
N01:0020m     -   AT1     a           a           [Ns:o101.
N01:0020n     -   NN1c    wife        wife        .
N01:0020p     -   PNQSr   who         who         [Fr[Nq:s101.Nq:s101]

The London-Lund corpus
This corpus differs from the others that we have looked at because it is a transcription of
spoken English. Intonation is marked.

1 1 1 10 1 1 B 11 ((of ^Spanish)) . graphology#/

1 1 1 20 1 1 A 11 ^w=ell# ./

1 1 1 30 1 1 A 11 ((if)) did ^y/ou _set _that# - /

1 1 1 40 1 1 B 11 ^well !Joe and _I#/

1 1 1 50 1 1 B 11 ^set it between _us#/

1 1 1 60 1 1 B 11 ^actually !Joe 'set the :paper#/

1 1 1 70 1 1 B 20 and *((3 to 4 sylls))*/

1 1 1 80 1 1 A 11 *^w=ell# ./

1 1 1 90 1 1 A 11 "^m/ay* I _ask#/

1 1 1 100 1 1 A 11 ^what goes !into that paper n/ow#/

1 1 1 110 1 1 A 11 be^cause I !have to adv=ise# ./

1 1 1 120 1 1 A 21 ((a)) ^couple of people who are !doing [dhi: @]/

1 1 1 130 1 1 B 11 well ^what you :d/o#/

1 1 1 140 1 2 B 12 ^is to - - ^this is sort of be:tween the :tw/o of /

1 1 1 140 1 1 B 12 _us# /

1 1 1 150 1 1 B 11 ^what *you* :d/o#/

1 1 1 160 2 1 B 23 is to ^make sure that your 'own . !candidate/

1 1 1 170 1 1 A 11 *^[m]#*/

1 1 1 160 1 2(B 13 is . *.* ^that your . there`s ^something that your /

1 1 1 160 1 1(B 13 :own candidate can :h/andle# - -/
CUVOALD
This acronym stands for the Computer Usable Version of the Oxford Advanced Learners
Dictionary. There are in fact two versions. The more useful is usually found in a file called
cuv2.dat, which contains 68742 words including inflected forms and proper nouns. It is most
often of use as a wordlist, but the file also contains a phonemic transcription and a
part-of-speech tag for every word. Here is a sample of cuv2.dat:

verbs            v3bz                Kj
verdancy         'v3dnsI             L@
verdant          'v3dnt              OA
verdict          'v3dIkt             K6
verdicts         'v3dIkts            Kj
verdigris        'v3dIgrIs           L@
verdure          'v3dj@R             L@
verge            v3dZ                I2,K6   3A
verged           v3dZd               Ic,Id   3A
verger           'v3dZ@R             K6
vergers          'v3dZ@z             Kj
verges           'v3dZIz             Ia,Kj   3A
verging          'v3dZIN             Ib      3A
verifiable       'verIfaI@bl         OA
verification     ,verIfI'keISn       M6
verifications    ,verIfI'keISnz      Mj
verified         'verIfaId           Hc,Hd   6A
verifies         'verIfaIz           Ha      6A
verify           'verIfaI            H3      6A
verifying        'verIfaIIN          Hb      6A
verily           'ver@lI             Pu
verisimilitude   ,verIsI'mIlItjud    M6
verisimilitudes  ,verIsI'mIlItjudz   Mj
veritable        'verIt@bl           OA
verities         'verItIz            Mj
verity           'verItI             M8
vermicelli       ,v3mI'selI          L@
vermiform        'v3mIfOm            OA
vermilion        v@'mIlI@n           M6,OA

The coding conventions for the phonemic and syntactic tags are explained in a file that
comes with the dictionary. Some examples of applications that use the dictionary can be
found in the appendix of this course.

Other texts
Corpus building is currently a growth area, and there are many, many more corpora as
well as the above examples. Currently available or under construction are a number of
very large corpora, comprehensive corpora aiming to cover all registers of English,
international English corpora, corpora of different languages and specialised corpora
covering a single well-defined domain of language.



--------------------------------------------------------------------------------

Exercises
1. Find a large text file with a fixed field format (e.g. the Brown or LOB corpora) and
inspect the format. Use zcat to view it if necessary.

3. Use cut to strip away the reference material and leave just the text field.

4. Use tr to strip away any tags that are actually in the text (e.g. attached to the words), so
that you are left with just the words.

5. Make a sorted wordlist from the file.

6. Combine the above commands in a shell script so that you have a small program for
extracting a wordlist.


INTRODUCTION TO THE VI SCREEN EDITOR

--------------------------------------------------------------------------------

What is vi
Vi is a screen editor. This means that you can see part of the file in a window on the
screen, and editing operations can be controlled by moving a cursor around the text on
screen.

Vi works in a different way from the editing functions of modern word processors. Its
effective use requires a considerable amount of expertise on the part of the user. The user
must have the ability to remember and manipulate opaquely named one-letter commands
that can be combined in an arbitrary variety of different ways.

Vi is a screen-based version of ex. Its lack of user-friendliness is largely a result of this.
In many ways it still works like a line editor, with complicated commands typed in by the
user.

The main enhancements on ex are the window, which enables you to constantly view part
or all of the file, the visible cursor and the commands that can be issued without moving
to the command line. Once you have learned to start vi, you will probably not need to use
ex again. Everything that you have learned with ex, you can do with vi. What is more,
with vi you have a window and the possibility to use interactive commands. The only
time that you might want to use ex now is if you have trouble running a screen-based
utility on your terminal.

Using vi
The next section lists the commands needed to start and use vi. In this section, the key
concepts underpinning the use of vi are explained so that you can understand what is
happening when you use it.

The first thing to understand is that there are three modes:

command mode

insert mode

last line mode (or command line mode)

You start in command mode. The commands listed below for moving the cursor and
changing the file are entered in command mode. To enter a command simply type it at
the keyboard. What you type will not appear anywhere on screen. To abandon a
command you have started, you can type <ESC>. If you are not sure which mode you are
in at any time you can type <ESC> and return to command mode. When you leave the
other modes you return to command mode. Insert mode is used to enter text. Insert mode
is entered by issuing one of a variety of commands that involve entering text. Insert mode
must be exited in order to issue more commands. A common mistake made is to attempt
to enter a command while in insert mode, which results in the command appearing on
screen as part of the text.

Last line mode is entered from command mode, and enables the user to type a command
on the last line of the screen. Any ex command can be used in this way, simply by typing
':' followed by the command. The current line will be that where the cursor is positioned.

When you start vi you will see a screen similar to the one below. If you are starting a new
file, or the file you are editing is less than 18 lines long, then the empty lines in the
window will be marked by the '~' (tilde) character.


--------------------------------------------------------------------------------

--------------------------------------------------------------------------------

This is a small file called 'vi.prac'.
This is the second and last line.
^
^
^
^
^
^
^
^
^
^
^
^
^
^
^
^
quot;vi.pracquot; 2 lines 103 characters
A typical vi screen

Note that it is necessary to press return at the end of each line of text that you enter.
Otherwise, vi will interpret all of your text as a single line!



--------------------------------------------------------------------------------
PRACTICE
Create a new file, enter several lines of text and save it.

Edit an existing file that you have, making several changes.


--------------------------------------------------------------------------------

vi reference
vi modes
command     Normal and initial state. <ESC> cancels partial command.
insert      Entered by the following commands: a, A, i, I, o, O, c, C, s, S, R.
            Terminates with <ESC> (or ^C).
last line   Entered by :, /, ? or !. Input is read and echoed at the bottom of the
            screen. Commands executed by <RETURN> or <ESC>, terminated by ^C.
Entering and leaving vi
% vi file          edit file
% vi +n file       edit starting at line n
% vi + file        edit starting at end
% vi +/RE/ file    edit starting at RE
% view file        read only mode
ZZ                 exit from vi, saving changes (same as :wq)
^Z                 stop vi process, for later resumption
Some simple commands
The following are examples of some compound commands, using the operators listed
later.

dw delete word
de delete word leaving punctuation
dd delete line
4dd delete 4 lines
xp transpose characters
cwtext<ESC> change word to text
File manipulation
The following are all last line mode commands, so must be preceded by a colon.

w save changes
wq save and quit
q quit
q! quit, discarding changes
e file edit file
e! re-edit current file, discarding changes
w file write to file
w! file overwrite file
! command execute shell command, then return
f show current file and line
Positioning within the file
^F forward one screenful
^B back one screenful
^D scroll down half screen
^U scroll up half screen
nG go to line n (last line default)
/RE/ go to next occurrence of RE
% find matching bracket
Marking
`` return to previous cursor position
mx mark position with x
`x go to mark x
Line positioning
H top line of window (home)
M middle line of window
L last line of window
+ next line, at first non-white character
- previous line, at first non-white character
<RETURN> same as +
j next line, same column (same as down arrow)
k previous line, same column (same as up arrow)
Character positioning
0 beginning of line
^ first non-white in line
$ end of line
<SPACE> forward (same as right arrow)
fx find x forwards in current line
Fx find x backwards in current line
; repeat last find command
, repeat last find command in the opposite direction
n| go to column n
Words, sentences, paragraphs
w forward to start of next word (delimited by non-alphanumeric character)
b back to start of last word
e forward to end of next word
W as w, with word delimited by blank only
B as b, with word delimited by blank only
E as e, with word delimited by blank only
) forward to start of next sentence
( back to start of last sentence
} forward to start of next paragraph
{ back to start of last paragraph
Corrections during insert
^H erase last character (or your usual delete key)
^W erase last word
\ escape character
<ESC> ends insert; back to command mode
^C ends insert
Insert and replace commands
a append after cursor
i insert before cursor
A append at end of line
I insert before first non-blank
o open line below current line
O open line above current line
rx replace single character with x
R replace characters
Operators
The following can be doubled to apply to a line and also preceded by a number to
indicate a number of lines. They can be combined with positional commands (e.g. d$ to
delete to end of line).

d delete
c change
y yank
Miscellaneous operations
x delete character
X delete character to left of cursor
C change rest of line (same as c$).
D delete rest of line (same as d$)
J join lines
Y yank (copy) lines
Yank and put
p put back after cursor
P put back before cursor
"xp put from buffer x
"xy yank to buffer x
"xd delete to buffer x
Undo, redo and retrieve
u undo last change
U restore current line
. repeat last command
"np retrieve nth last delete


TEXT FORMATTING

--------------------------------------------------------------------------------

There are text formatting facilities available with all Unix implementations. They will not
be investigated in any detail here. Many users will prefer to use a PC-based word
processing package for document production. Those that want to format text on Unix will
have vastly differing needs, and it would be impossible to go into all of the possibilities
here. A flavour of the simpler programs is given here, and users can look elsewhere for
more extensive documentation.

pr
This is a filter that will format a text, giving a choice of columns, page width, length etc.
It is not capable of sophisticated formatting for document production.

nroff
The simplest of the proper formatters is nroff. You can format a plain text file with nroff,
by simply typing:

% nroff text_file

Formatting commands can be inserted into text files. Some simple commands:

.ce    centre text
.ll    line length
.pl    page length
.po    page offset (left margin)
.sp    blank line

These commands may be followed by a numerical argument, which will make the
command apply to the specified number of lines, e.g. .sp 3 to leave three blank lines.
Formatting commands must be placed at the beginning of a line to be recognised as such.
Normally they appear as the only text on a line. Commands are normally composed of
lower-case characters. Here is an example of a text containing some nroff instructions:
.ce
This is the title
.sp 2
And this is the text, which
will be formatted and justified when I run nroff. You will see
that the line
breaks will change, and the text will look tidier. That is what
formatting is all about.
.sp
That was a blank line.

The following is what the output from this file would look like:

This is the title
And this is the text, which will be formatted and justified when I run nroff. You will see
that the line breaks will change, and the text will look tidier. That is what formatting is
all about.
That was a blank line.

nroff macros
Macros are a special type of nroff command, identified by being in upper-case characters.
Standard macro libraries can be invoked by using option flags with the nroff command,
e.g.:

nroff -ms filename

for the standard macros. Other macro libraries can be invoked by the me, mn and mv
options. Here are some standard macros:

.FS    footnote starts
.FE    footnote ends
.ND    no date
.TL    title
.PP    start paragraph

The .PP tag, for example, is the equivalent of the following sequence of ordinary nroff
instructions:

.sp 5
.ce 1
.sp 5

It is possible to write your own macros.

More details on nroff can be found in the manual.

MORE ON THE SHELL
--------------------------------------------------------------------------------

General
The role of the shell
A Unix shell is used to:

evaluate the command line. For example:

% car nofile
car: Command not found

Here the shell looks for a command called car. Since it cannot find this command it gives
an error message.

perform variable substitution. For example:

% echo "In directory $HOME"
In directory /home/sunserv1_b/lnp5jb

Here the shell variable $HOME is evaluated and displayed.

handle pipelines. For example:

% who | wc -l

Here the output from who is piped through to the wc command which displays a count of
the number of lines in its input.
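
The three jobs of the shell described above can be observed from within a short script. This is a sketch for the Bourne shell; the command name no_such_command_here is a hypothetical name chosen because it should not exist on any system.

```shell
# Sketch only: no_such_command_here is a deliberately non-existent name.

# Command lookup: command -v succeeds only if the shell can find the name.
if command -v no_such_command_here >/dev/null 2>&1; then
    found=yes
else
    found=no
fi

# Variable substitution: $HOME is replaced inside double quotes,
# but left alone inside single quotes.
expanded="In directory $HOME"
literal='In directory $HOME'

# Pipeline: three lines go in at one end, wc -l counts them at the other.
count=$(printf 'a\nb\nc\n' | wc -l)

echo "$found"
echo "$expanded"
echo "$literal"
echo "$count"
```

The difference between the two kinds of quotes matters whenever a command line contains characters such as $ that the shell would otherwise interpret.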

Types of shells
A number of shells are available for Unix systems, including:

Bourne shell

C shell

Korn shell

Graphical User Interface (GUI) shells

The Bourne shell, which was developed by Steve Bourne at Bell Laboratories, is one of
the oldest shells and, as such, has gained a lot of popularity. It is widely used for shell
programming because of its efficiency and because it is available on all Unix systems.

The C shell provides sophisticated interactive capabilities lacking in the Bourne shell.
The C shell, which was developed at the University of California, Berkeley, has a syntax
which resembles the C language. Features of the C shell include a command history
buffer, command aliases and file name completion.

However the C shell does not allow efficient shell programs (also known as scripts) to be
written. Due to the fact that C shell programs are written in a style similar to the C
programming language, people who are unfamiliar with C may find the C shell difficult
to program in.

The Korn shell combines the best features of the Bourne and C shells. Korn scripts are
95% upwardly compatible with Bourne scripts. The Korn shell interactive features
include:

in-line editing

command editing

job control

Graphical User Interface (GUI) shells provide an iconic interface to Unix. GUI shells
require the use of workstations (or powerful microcomputers) which perform part of the
processing locally. The use of GUIs such as X-Windows is likely to become increasingly
important in the near future. GUIs currently available include:

Sun View A Sun-specific GUI

Open Look GUI standard supported by Sun

Motif GUI standard supported by other suppliers

Vista eXceed Available on PCs; similar in style to Motif

There is a battle currently taking place in the market-place to establish the standard GUI.

Recommended shells
The Bourne shell is the oldest shell, and is widely used. The C shell has more utilities
however and is probably more widely used now.


--------------------------------------------------------------------------------

The default shell for interactive shells at Leeds is the C shell. The Bourne shell is the
default for shell programs.


--------------------------------------------------------------------------------
However the Bourne shell is recommended for shell programs. The Korn shell is not
widely available and is not a standard part of Unix, but is perhaps the best option if
available, unless you want to do a lot of C programming. You can change your default
login shell using the command:

% chsh username /bin/sh      Bourne shell
% chsh username /bin/csh     C shell
% chsh username /bin/ksh     Korn shell

Warning! You probably don't want to try these commands now.

C shell features
The history mechanism
The history mechanism enables previously typed Unix commands to be re-invoked and
edited. There are two forms. One is the quick substitution, which acts only on the
immediately preceding command, e.g.:

% car message
car: Command not found
% ^r^t
This is the message file

This command replaces the first occurrence of 'r' with 't' in the last command.

A list of previously entered commands can be displayed using the history command:

% history
1 cd texts
2 vi lookup
3 who
4 history

Commands can be re-entered using the number. For example:

% !2

will re-execute the second command (vi lookup). It is possible to add extra options to
commands re-executed. For example to redirect output from the who command to a file
called list we could give the command (for the above list):

% !3 > list

You may also edit previous commands, e.g.:

% !2:s/vi/cat/
cat lookup

although it is usually easier to re-type the whole command. The last command may be
referred to as !!, and you can count back using !-2, !-3 etc.

File name completion
Within the C shell when a file name is used in a command it is possible to specify only as
many characters as will uniquely identify the file, and then press the <ESC> key to
complete the filename:

% ls
mbox      message
% cat me<ESC>
This is the message file

When you type <ESC>, the file name will be extended to 'message' on screen.

Command aliases
Command aliases provide a way of customising commands. For example:

% alias dir ls
% dir
mbox     message

Note that command aliases are only valid during the execution of the current shell. It is
normal practice to include alias definitions in your .cshrc file.

The following aliases could be useful to shorten long command names:

alias hh history
alias ll 'ls -al'
alias q logout

The quotes around ls -al are necessary because of the space in the command. This tells
the shell that it is all one command.


--------------------------------------------------------------------------------

PRACTICE


--------------------------------------------------------------------------------

Put the above aliases in your .cshrc file. Think of some other aliases that you would use,
such as shortened versions of commands or different names for commands that you will
find easier to remember.
C shell startup files
Certain files are executed automatically.

These are:

.cshrc file

Executed whenever a new C shell is spawned.

Useful for specifying command aliases.

Since C shells may be spawned automatically by certain system commands (such as the
mail system or a compiler), this file should NOT contain commands which send output to
your terminal.

Contains a list of directories that are searched for commands. A line in the .cshrc file will
give a value to the PATH system variable. The user can add pathnames to this list. It is
conventional to store any of your own commands or shell scripts that you will use
frequently in a directory called bin, and to add ~/bin to your search path.

.login file

Executed when you login.

Use for setting system wide variables, such as your terminal type.

Can be used to display information, such as who is logged on, or news from the system
managers.

Shell processes
A process is an executing program. To display a list of processes use the ps command:

% ps
PID TTY TIME COMMAND
23268 ttyp1 0:01 ps
22520 ttyp1 0:00 csh

The PID is the process identifier. The TIME field gives the amount of CPU time used
by the process.
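
To see how the columns of the listing are used, the sketch below picks the PID of the
csh process out of the sample listing shown above. It is written in Bourne shell (sh)
syntax so that it can be saved and run as a script; the variable names listing and
csh_pid are made up for this example.

```sh
# Sample listing copied from the ps example above
listing='PID TTY TIME COMMAND
23268 ttyp1 0:01 ps
22520 ttyp1 0:00 csh'

# Column 1 is the PID and column 4 is the command name.
# Print the PID of the line whose command is csh.
csh_pid=$(printf '%s\n' "$listing" | awk '$4 == "csh" { print $1 }')
echo "csh is process $csh_pid"
```

On a live system you would pipe the output of ps itself into awk instead of using a
stored copy.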

Background processes
Normally processes run interactively (in the foreground), but they may also be run in the
background, to enable the user to do something else while a process is running (this is
known as 'multitasking'). This is useful when you are running a very long job. To run a
command in the background use the & character at the end of the command line, as follows:

% command &

Note that output from command will still be sent to standard output. If you fail to redirect
standard output it will be sent to your terminal where it is likely to be confused with
output from your interactive process.

For example, to sort logged on users using a background process give the command:

% who | sort > sortedwho &

Note that this would normally be a very short process and you would not in fact need to
run it in the background.
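
The same pattern can be tried out as a small script. This is a sketch in Bourne shell (sh)
syntax, under a few assumptions: three sample user names stand in for the output of who
so that the result is predictable, and the temporary file created by mktemp stands in for
sortedwho.

```sh
# Temporary file that plays the role of sortedwho
outfile=$(mktemp)

# Stand-in for "who": three sample user names, sorted in the background
printf 'lnp5mw\ncbl6nd\ncsc6ea\n' | sort > "$outfile" &

# The shell is free to do other work while the sort runs
echo "sorting in the background..."

# wait pauses until the background processes started by this shell have finished
wait
cat "$outfile"
```

Because the output is redirected to a file, nothing from the background process appears
on the terminal until you choose to look at the file.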

Controlling processes
You may wish to terminate a background process. To do this you must first find out
its process id (PID) using ps:

% ps
PID TTY TIME COMMAND
23397 ttyp1 0:01 who
23268 ttyp1 0:02 ps
22520 ttyp1 0:00 csh

Then use the kill command to terminate your process.

For example:

% kill 23397

If the process continues to run, use the -9 argument, which sends a signal that the
process cannot ignore:

% kill -9 23397
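
The two steps can be combined in a script. The sketch below is in Bourne shell (sh)
syntax and makes some assumptions: sleep 60 stands in for a long job, and $! (the PID of
the most recent background command) is an sh feature used here in place of looking the
PID up with ps.

```sh
# Start a long-running stand-in job in the background and note its PID
sleep 60 &
pid=$!

# Ask the process to terminate (kill sends the TERM signal by default)
kill "$pid"

# Collect its exit status; a value above 128 means it was ended by a signal
status=0
wait "$pid" 2>/dev/null || status=$?

# kill -0 sends no signal; it only checks whether the process still exists
if kill -0 "$pid" 2>/dev/null; then
    # Last resort: signal 9 (KILL) cannot be caught or ignored
    kill -9 "$pid"
    still_running=yes
else
    still_running=no
fi
echo "exit status $status, still running: $still_running"
```

Trying the plain kill first gives the process a chance to clean up; -9 gives it none, so
it is best kept as a fallback.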

Another way of displaying your background processes is to use the jobs command:

% jobs
[1] + Running who | sort > sortedwho

The background process (or 'job') has been assigned the number 1, and this can be used to
refer to it instead of the process id. The job number is identified by preceding it
with the '%' (per cent) character, so as to differentiate it from a process id. So, for
example, the command:

% kill %1

terminates job number 1.
Unix Quick Learn

  • 1. UNIX Overview The UNIX operating system was designed to let a number of programmers access the computer at the same time and share its resources. The operating system coordinates the use of the computer's resources, allowing one person, for example, to run a spell check program while another creates a document, lets another edit a document while another creates graphics, and lets another user format a document -- all at the same time, with each user oblivious to the activities of the others. The operating system controls all of the commands from all of the keyboards and all of the data being generated, and permits each user to believe he or she is the only person working on the computer. This real-time sharing of resources make UNIX one of the most powerful operating systems ever. Although UNIX was developed by programmers for programmers, it provides an environment so powerful and flexible that it is found in businesses, sciences, academia, and industry. Many telecommunications switches and transmission systems also are controlled by administration and maintenance systems based on UNIX. While initially designed for medium-sized minicomputers, the operating system was soon moved to larger, more powerful mainframe computers. As personal computers grew in popularity, versions of UNIX found their way into these boxes, and a number of companies produce UNIX-based machines for the scientific and programming communities. The uniqueness of UNIX The features that made UNIX a hit from the start are: · Multitasking capability · Multiuser capability · Portability · UNIX programs · Library of application software Multitasking Many computers do just one thing at a time, as anyone who uses a PC or laptop can attest. Try logging onto your company's network while opening your browser while opening a word processing program. Chances are the processor will freeze for a few seconds while it sorts out the multiple instructions. 
UNIX, on the other hand, lets a computer do several things at once, such as printing out one file while the user edits another file. This is a major feature for users, since users don't have to wait for one application to end before starting another one. Multiusers The same design that permits multitasking permits multiple users to use the computer. The computer can take the commands of a number of users -- determined by the design of the computer -- to run programs, access files, and print documents at the same time. The computer can't tell the printer to print all the requests at once, but it does prioritize the requests to keep everything orderly. It also lets several users access the same document by compartmentalizing the document so that the changes of one user don't override the changes of another user.
  • 2. System portability A major contribution of the UNIX system was its portability, permitting it to move from one brand of computer to another with a minimum of code changes. At a time when different computer lines of the same vendor didn't talk to each other -- yet alone machines of multiple vendors -- that meant a great savings in both hardware and software upgrades. It also meant that the operating system could be upgraded without having all the customer's data inputted again. And new versions of UNIX were backward compatible with older versions, making it easier for companies to upgrade in an orderly manner. UNIX tools UNIX comes with hundreds of programs that can divided into two classes: · Integral utilities that are absolutely necessary for the operation of the computer, such as the command interpreter, and · Tools that aren't necessary for the operation of UNIX but provide the user with additional capabilities, such as typesetting capabilities and e-mail. Tools can be added or removed from a UNIX system, depending upon the applications required. UNIX Communications E-mail is commonplace today, but it has only come into its own in the business community within the last 10 years. Not so with UNIX users, who have been enjoying e- mail for several decades. UNIX e-mail at first permitted users on the same computer to communicate with each other via their terminals. Then users on different machines, even made by different vendors, were connected to support e-mail. And finally, UNIX systems around the world were linked into a world wide web decades before the development of today's World Wide Web. Applications libraries UNIX as it is known today didn't just develop overnight. Nor were just a few people responsible for it's growth. As soon as it moved from Bell Labs into the universities, every computer programmer worth his or her own salt started developing programs for UNIX. 
Today there are hundreds of UNIX applications that can be purchased from third-party vendors, in addition to the applications that come with UNIX. How UNIX is organized The UNIX system is functionally organized at three levels: · The kernel, which schedules tasks and manages storage; · The shell, which connects and interprets users' commands, calls programs from memory, and executes them; and · The tools and applications that offer additional functionality to the operating system The three levels of the UNIX system: kernel, shell, and tools and applications. The kernel The heart of the operating system, the kernel controls the hardware and turns part of the system on and off at the programer's command. If you ask the computer to list (ls) all the
  • 3. files in a directory, the kernel tells the computer to read all the files in that directory from the disk and display them on your screen. The shell There are several types of shell, most notably the command driven Bourne Shell and the C Shell (no pun intended), and menu-driven shells that make it easier for beginners to use. Whatever shell is used, its purpose remains the same -- to act as an interpreter between the user and the computer. The shell also provides the functionality of quot;pipes,quot; whereby a number of commands can be linked together by a user, permitting the output of one program to become the input to another program. Tools and applications There are hundreds of tools available to UNIX users, although some have been written by third party vendors for specific applications. Typically, tools are grouped into categories for certain functions, such as word processing, business applications, or programming. Logging In & Out When you have established contact with the Unix system, the login prompt will be displayed. You must give your username followed by your password: login: lnp3jb Password: secret1 The username can be up to 8 characters in length. Unix usernames contain only lowercase characters, and it is important that you type your username in lower case (if you don't you will be permitted to log in, and then the shell will not recognise case differences.) The password must normally contain between 6 and 8 characters. On some unix systems the password must contain at least 1 non-alphabetic character. System messages When you log in a number of system messages may be displayed. The more filter will be used to control the output if the file contains more than a screenful of information. Just press the space bar to see the next screenful if it says 'more' at the bottom of the screen. The message: You have new mail indicates that electronic mail has been sent to your mailbox. 
The prompt When your login procedure is completed you should see the system prompt. This indicates that the shell is running and is awaiting instructions from the user. The prompt can take many forms, and you can change it later on if you want to. Often the prompt will contain the % character, and a number in brackets. This number will represent the number of a command, and can be used to recall commands already issued. It may also display the name of machine or system that you are logged onto. Some users prefer to
  • 4. have the name of the current working directory displayed in their prompt. For convenience, in this document, the % character will be used to represent the prompt. Changing your password Use the passwd command to change your password: % passwd -where '%' is the prompt Changing password for lnp5mw Old password: -type in your old password New password: -type in your new password Retype new password: -and again, to make sure % Logging out When you have finished your unix session you must log out from the system. To do this give the command: % logout You should always wait for the message confirming that you have logged out. On some unix systems you may receive the message: logout: command not known If this happens you should type: exit You may occasionally get the message: There are stopped jobs If this happens simply give the logout command again. ----------------------------------------------------------------------------------------------------------- - PRACTICE Log in to the unix system using your username and password.Change your password using the passwd command. You may find that the system will not change your password immediately. In this case you may have to use your old password next time that you log on.
  • 5. ----------------------------------------------------------------------------------------------------------- - THE UNIX FILESTORE ------------------ File hierarchy Unix has a hierarchical tree-like filestore. The filestore contains files and directories. The top-level directory is known as the root. Beneath the root are several system directories. The root is designated by the / character. The directories below the root are designated by the pathnames: /bin /etc /usr Confusingly, the / character is also used as a separator in pathnames. Historically, user directories were often kept in the directory /usr. However, it is often desirable to organise user directories in a different manner. Users have their own directory in which they can create and delete files, and create their own sub-directories. For example: /user/ei/eib035 belongs to someone whoe has the username eib035. Some typical system directories below the root directory: /bin contains many of the programs which will be executed by users /etc files used by system administrators /dev hardware peripheral devices /lib system libraries /usr normally contains applications software /home home directories for different systems The current directory This refers to your actual location in the filestore hierarchy. When you log in the current directory is set to the home directory. You can then change current directory, effectively moving around the filestore tree structure. The current directory is also called the quot;current working directoryquot; and the quot;working directoryquot;. The current directory can be referred to in pathnames by the . character (a full stop). Changing current directory The command cd is used to change your current directory. For example:
  • 6. % cd bin will move you from your current directory, down one quot;branchquot; to the directory bin, if such a directory exists. Typing cd with no arguments takes you to your home directory. Display current directory The command pwd is used to display your current directory. For example: % pwd /home/sunserv1_b/lnp5jb/bin Pathnames Files and directories may be referred to by their absolute pathname. For example: /home/sunserv1_b/lnp5jb/bin/hello Files and directories may also be referred to by a relative pathname. For example, if your current directory is /home/sunserv1_b/lnp5jb, the above file can be referred to as: bin/hello The home directory Each user has a home directory. They will be attached to this directory when they log in. Jenny Brown's home directory is: /home/sunserv1_b/lnp5jb The symbol ~ can be used to refer to the home directory. If Jenny Brown wishes to refer to her file she can give: ~/bin/hello rather than typing the long form: /home/sunserv1_b/lnp5jb/bin/hello The symbol ~ can also refer to other the home directory of other users. For example Jenny can refer to a file in John Smith's home directory using: ~lnp5js/test.dat The parent directory The parent directory is the directory above the current directory. The parent directory can be referred to by the .. characters (two full stops). For example to refer to the file test.dat in the parent directory:
  • 7. ../test.dat Linking files The ln command can be used to link files and directories across the filestore system. The symbolic link function (ln -s) is the most useful. This enables a file or directory to appear to be in a particular directory when it is in fact stored somewhere else. This can save the user from having to type out long pathnames for frequently used files or directories. For example, if you want to use the files in /usr/games regularly, you can set up a symbolic link to this directory. If Jenny Brown is in her home directory and types: % ln -s /usr/games fun this will create what appears to be a new directory below her home directory, entitled fun. When she does cd fun she will move to /usr/games. If she now does pwd, the current directory will appear as /home/sunserv2_a/lnp5jb/fun. Some things may be a little surprising however: the parent directory, for example, will be that of the original file or directory. -------------------------------------------------------------------------------- Exercises Check which directory you are currently in. If necessary, move to your home directory. (Remember: cd will do this from anywhere). Move to the root directory. (quot;Move to...quot; means quot;change your current working directory to...quot;. It is useful to picture the process as movement around the tree structure.) Work your way down one directory at a time to your home directory. Experiment with using relative and absolute pathnames; show how the two can produce the same results. Explore your systems filestore. Try to get into the home directory of someone else you know! (You may not be able to view their files.) -------------------------------------------------------------------------------- UNIX COMMANDS -------------------------------------------------------------------------------- Unix commands have the general format: command [options] [item] Items in brackets are optional, and words in italics are generic identifiers (i.e. 
options must be replaced by a particular option, e.g. -a). Note that:
  • 8. Commands are case sensitive. The command ls is different from LS. In fact LS is not recognised as a valid command.
Command options consist of a single character. The command to list all the files in a directory is ls -a and could not be ls -all (the latter would have to mean a combination of options).
Command options can usually be combined or listed separately. For example: ls -al or ls -a -l
The command item is given last. This is very often a file name. For example: ls -a file1.f not ls file1.f -a

The echo command
The echo command 'echoes' its argument to the standard output. This means that in its simplest form it prints something out on screen. For example:
    % echo Hello     - you type
    Hello            - response from the shell
    %

Who is logged on?
The command who gives a list of logged on users:
    % who
    root      console  Jan 4 10:34
    men6matw  ttyp1    Jan 6 09:45  (ecusun1)
    cbl6nd    ttyp2    Jan 6 10:10  (cblslcd)
    cbl6ar    ttyp3    Jan 6 16:03  (cblsuna)
    csc6ea    ttyp4    Jan 6 14:15  (csuna1)
    root      ttyp5    Jan 6 10:40  (sun032)
    ecl6rsh   ttyp6    Jan 6 15:39
    csc6ea    ttyp8    Jan 6 14:15  (csuna1)
    lnp5mw    ttyUf    Jan 6 16:16
    lnp5jb    ttyp3    Jan 6 15:20  (sun051)
Also try the command finger. This command gives the full name of logged in users.
--------------------------------------------------------------------------------
PRACTICE
Type finger to get information on yourself and other users.
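The notes above on combining options and on echo can be tried out with a short script. This is only a sketch: it is written for a POSIX sh rather than the C shell used in this text, it assumes the mktemp command is available, and the directory and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: combined options (ls -al) versus separate options (ls -a -l).
# demo_dir, file1.f and .hidden are illustrative names, not from the course.
demo_dir=$(mktemp -d)          # assumes mktemp exists on this system
cd "$demo_dir" || exit 1
: > file1.f                    # create an empty ordinary file
: > .hidden                    # create an empty hidden file

combined=$(ls -al)             # options combined
separate=$(ls -a -l)           # options listed separately
plain=$(ls)                    # no options: hidden files are not shown

if [ "$combined" = "$separate" ]; then
    echo "ls -al and ls -a -l give the same listing"
fi
echo "plain ls shows: $plain"

# echo prints its argument on standard output
greeting=$(echo Hello)
echo "$greeting"

# tidy up the scratch directory
cd / && rm -r "$demo_dir"
```

Run as a script, it should report that both forms of the ls command give the same listing, show that plain ls lists only file1.f, and then print Hello.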
  • 9. --------------------------------------------------------------------------------
Creating a directory
The mkdir command is used to create directories. The format of this command is:
    % mkdir directory_name
Jenny Brown stores her Unix scripts in a directory called scripts beneath her home directory. In order to create this directory she uses the command:
    % mkdir scripts

Deleting a directory
The rmdir command is used to delete directories. The format of this command is:
    % rmdir directory_name
Jenny Brown stores files for project work in a directory called proj. When the project has been completed she deletes the directory using the command:
    % rmdir proj
Note that the directory must be empty before it can be deleted.

Listing contents of a directory
The command ls is used to list the contents of a directory. For example:
    % ls
    file1 scripts test.f test
Notice that directories are listed as well as files. To list all files, including hidden files, give the command:
    % ls -a
    .cshrc file1 bin test.f test
Hidden files begin with . (a full stop). Hidden files are normally system files, and will normally include the following:
    % ls -a
    .cshrc .forward .history .login .logout

    .cshrc    contains commands that are executed every time you start a C shell, including when you log in
    .forward  enables you to redirect your mail to another computer
    .history  contains a record of previously executed commands
    .login    contains commands that are executed at login time
  • 10. .logout contains commands that are executed at logout time
The purpose of some hidden files.

To identify directories in a listing give the command:
    % ls -F
    file1 bin/ test.f test
Notice how the directory is identified by the slash (/) character.

Deleting files
Files can be deleted using the rm command. For example:
    % rm test.f

Displaying files
The command cat is used to display the contents of a file on the screen. For example:
    % cat file1

Creating files
The command cat can also be used to create a file. For example:
    % cat > test.f
When typing in a new file the input must be terminated by ^D.
NOTE ^D means press the <CTRL> and the d keys simultaneously. Be careful not to type ^D when you have the shell prompt, because this might log you out.
Normally you would use an editor for creating files. This example is given since it illustrates how to create a small file without needing to learn the use of an editor.

Copying files
The command cp is used to copy a file. It takes the format:
    % cp old_file new_file
For example:
    % cp file1 file2

Renaming files
The command mv is used to rename a file.
  • 11. For example:
    % mv file2 temp
changes the name of file2 to temp.

Moving files
The command mv is also used to move a file to a new location in the filestore hierarchy. For example:
    % mv file2 bin
moves the file file2 into the subdirectory bin.

Overwriting files
Commands such as rm and cp can be dangerous if not used with care. The command:
    % cp file1 file2
will overwrite file2 if a file of that name already exists, and its previous contents will be lost. If you have spelled the name of the new file incorrectly you may accidentally overwrite the contents of another file.
Using the wildcard symbol * with the command rm can also be very dangerous. The command:
    % rm test*
will delete all files starting with test. However if you inadvertently type an extra space (do not try this!):
    % rm test *     - do not try this!
the file test will be deleted if it exists. Then all other files in the directory will be deleted! Often no warning will be given.
To prevent accidental deletion of files you can use the -i option with commands such as rm. The format of the command is:
    % rm -i file
You will be asked to confirm that files are to be deleted. You may find that this is set as the default on your system.

Wildcards
Wildcard characters can be used to identify directory and file names. The wildcard character * is used to refer to any combination of characters. For example:
  • 12. % ls *         - refers to all files
    % cat test*    - refers to all files starting with 'test', e.g. 'test', 'testing', 'test.c', etc.
The wildcard character ? is used to refer to a single character. For example:
    % ls test?     - refers to files starting with 'test' followed by a single character, e.g. 'test1', 'test2', 'testz', etc.
    % cat test.?   - refers to all files starting with 'test.' with a single character after the full stop, e.g. 'test.c', 'test.f'
--------------------------------------------------------------------------------
Exercises
Display your current working directory using the pwd command.
Make a directory called exercises.
Change your directory to the directory exercises. Display the current working directory.
Return to your home directory.
List the contents of your directory. Use the -l, -a and -F options and compare the output.
Change your directory to the directory exercises. Create a file called example1 using the cat command containing the following text:
    water, water everywhere
    and all the boards did shrink;
    water, water everywhere,
    Nor drop to drink
List the contents of your directory. Use the -l option to obtain a long listing.

Viewing files with the more command
The command more is used to display the contents of a file on the screen. The command is particularly useful for viewing long files since the display stops at the bottom of the screen. The following is a listing of a program in the Icon programming language:
    % more lookup.icn
    # program to look up words (given at the terminal) in the
    # computer usable version of the OALD
    # last change 18.12.91
    # set global parameters
    global k
    # main body
    procedure main()
    # input word to be searched for
    write("Give me a word: \n")
    word:=read()
  • 13.
    # this the important line - call the 'lookup' procedure
    if not write(lookup(word)) then write("Not found in the dictionary.")
    end
    procedure lookup(voc)
    # connect to the dictionary
    (dict:=open("/home/sunserv1_a/ecl6rsh/oald.mitton/cuv2")) | stop("can't open the dictionary")
    # lookup algorithm
    every k:=1 to *voc do {
    --More-- (75%)
The message at the bottom of the screen means that 75% of the file has been viewed so far. (The amount shown on screen will depend on the type of terminal you are using.) You can now do the following:
    To continue viewing press the space bar
    To view the next line press <RETURN>
    To quit press the <q> key
    To jump to the next occurrence of a string of characters type /string
    For a list of valid commands press the <h> key

Viewing files with the pg command
The pg command is also available on some systems. This is an alternative to more.
    % pg lookup.icn
    # program to look up words (given at the terminal) in the
    # computer usable version of the OALD
    # last change 18.12.91
    # set global parameters
    global k
    # main body
    procedure main()
    # input word to be searched for
    write("Give me a word: \n")
    word:=read()
    # this the important line - call the 'lookup' procedure
    if not write(lookup(word)) then write("Not found in the dictionary.")
    end
    procedure lookup(voc)
  • 14.
    # connect to the dictionary
    (dict:=open("/home/sunserv1_a/ecl6rsh/oald.mitton/cuv2")) | stop("can't open the dictionary")
    # lookup algorithm
    every k:=1 to *voc do {
    bit:=bite(voc)
Commands can be typed to the ':' prompt at the bottom of the screen:
    Type <RETURN> to view the next screen.
    Type <h> for a list of valid commands.
--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------
If you have a file longer than 20 lines use pg to view it. Compare the use of pg with more. Use them both on the file /etc/passwd, and find the listing for your own username.

Searching for strings in files
The command grep is used to search a file for a string of characters. For example, to search the file lookup.icn for the character '#' (which designates comments in the program), use the command below. The '#' is quoted because some shells would otherwise treat it as the start of a comment:
    % grep '#' lookup.icn
    # program to look up words (given at the terminal) in the
    # computer usable version of the OALD
    # last change 18.12.91
    # set global parameters
    # main body
    # input word to be searched for
    # this the important line - call the 'lookup' procedure
    # connect to the dictionary
    # lookup algorithm
A lot of pattern matching operations can be carried out with grep. The following example shows the use of a regular expression. In this example, the search is restricted to lines beginning with the 'p' character; the ^ character anchors the pattern to the start of the line.
    % grep "^p" lookup.icn
    procedure main()          - output starts here
    procedure lookup(voc)
    procedure bite(voc2)
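The difference between searching for a plain string and searching with an anchored pattern can be seen on a small file. The sketch below is written for a POSIX sh rather than the C shell, assumes mktemp is available, and uses a made-up file sample.icn as a stand-in for lookup.icn.

```shell
#!/bin/sh
# Sketch: grep with a fixed string and with a pattern anchored by ^.
# sample.icn is an illustrative file, not the real lookup.icn.
work=$(mktemp -d)              # assumes mktemp exists on this system
cd "$work" || exit 1
cat > sample.icn <<'EOF'
# main body
procedure main()
write("stop")
end
procedure lookup(voc)
EOF

# Lines containing the comment character (quoted to protect it from the shell)
comments=$(grep '#' sample.icn)

# Lines that BEGIN with p: ^ anchors the match to the start of the line
procs=$(grep '^p' sample.icn)

# Count of lines containing p anywhere (-c prints a count instead of the lines)
any_p=$(grep -c 'p' sample.icn)

echo "$comments"
echo "$procs"
echo "lines containing p anywhere: $any_p"

cd / && rm -r "$work"
```

Here the anchored pattern picks out only the two procedure lines, while the unanchored pattern also counts the write("stop") line, because "stop" contains a p.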
  • 15. You will learn more about pattern matching expressions later.

Control characters
The actual key sequences for the following operations can vary between different systems and different terminals. The most commonly used key sequences are described below. If a sequence is different on your system, remember the correct one and use it whenever the key sequences below are referred to later in the text. Where possible the operation itself is named (e.g. end-of-file), and not just the key sequence.

Deleting the last character typed
If you make a typing mistake you can delete the last character typed by using your delete key, which is usually the one marked <DEL> or <DELETE>.

Deleting the entire line
If you make many typing mistakes you can delete the entire line by typing ^U.
NOTE Remember ^U means "press <CTRL> and <u> keys simultaneously".

Sending an interrupt
If you wish to terminate the execution of a command type ^C.

Sending an end-of-file character
In many Unix commands you need to finish your input with an end-of-file character. The default end-of-file character is ^D.

Printing on paper
This is usually called 'obtaining hard copy output', as distinct from output to the screen or a file. The command lpr sends a file to the line printer:
    % lpr file1
Note that the command lp is used on some Unix systems. The command:
    % lpr -Pprinter file
is used to submit the file to a specific printer.
--------------------------------------------------------------------------------
The locally developed command printers can be used to obtain a list of printers.
--------------------------------------------------------------------------------
Getting help
  • 16. The command man is used to display help on the syntax of Unix commands. The format of this command is:
    % man [option] [file]
For example to obtain help information on the who command, type:
    % man who
The keyword option -k keyword is used to display a list of help files associated with the keyword. For example to display a list of all man files associated with password type the command:
    % man -k password
    getpass(3)   read a password
    passwd(1)    change login password
    passwd(5)    password file
The command man automatically invokes the more program for viewing files. You can use the normal more commands to continue viewing.
--------------------------------------------------------------------------------
If you have any problems that can't be solved by referring to the manual, please consult your supervisor or the Advisory Service. The Help Desk can be contacted in person in the User Access Area, on the telephone on extension 5366, or by email to helpdesk. Also the LUCS Unix system operators can be contacted on telephone extension 5380. With non-urgent problems, an email message to your supervisor is usually the most efficient way of getting help. (See next chapter on how to use email.)
--------------------------------------------------------------------------------
Exercises
1. Display a list of logged on users.
2. Obtain further information for a particular user using the finger command.
3. Use the man command to obtain further information on the finger command.
4. Use the man -k command to find what manual entries there are related to passwords.
5. Use the grep command to search the file example1 for occurrences of the string 'water'.
  • 17. 6. Use man and the keyword option to find out more information on communications and e-mail in Unix.
7. Print out a file on paper.

COMMUNICATIONS
--------------------------------------------------------------------------------
Mail
The mail command enables the user to send and receive electronic mail messages to and from users on the local Unix system and on remote systems. This is the basic mail command. Enhanced versions, such as programs that run under a window system (e.g. mailtool), or screen-based versions of mail (e.g. elm) may be available, and you will probably find them preferable to mail. If so, much of the following can safely be ignored. Remember however that some version of mail will definitely be available on any Unix system that you use.

Sending mail
To send a message to a user on your system, type:
    % mail username
The cursor will move to the next line, and you will get a Subject: prompt. You can now type in the subject of your message, and then press <RETURN>. The cursor will go to the start of the next line and there will be no prompt. You now type in the text of your message. Terminate each line with <RETURN>. When you have finished the text of the message, type an end-of-file character (usually ^D), or a full stop on a line by itself. You should now return to your normal shell prompt. If the message is dispatched successfully, you will hear no more about it. The following is an example of the mail command in action:
    % mail lnp6ttld
    Subject: UNIX course
    I don't think I'll ever be able to get the students in the
    UNIX course to understand how to use e-mail.
    ^D
    %
Entering the text of the message by this method is a rather crude process. Errors on the line being typed can be erased with your delete key, but once you have pressed <RETURN>, a line cannot be edited. A message may be aborted by pressing ^C twice.
--------------------------------------------------------------------------------
  • 18. PRACTICE
--------------------------------------------------------------------------------
Send yourself a message. (You will find out where it has gone in the next section.)

Subcommands while entering mail
There are several commands you can type while entering mail:
    <CTRL/Z>   will cancel the message, and leave the text in a file named dead.letter.
    ~e         invoke a text editor to edit your message.
    ~v         invoke a screen editor to edit your message.
    ~f         reads the contents of the message you have just read into your message text.
    ~r file    reads contents of file into your message text.
While this method is quick and easy to use, and quite adequate for short and simple messages, many users prefer to first create a file containing the text of the message, and then mail this file to the intended recipient. This enables you to use any system editor and formatter to create the message, and you do not need to send it immediately. The following command shows how to send a file note containing the text of a message to another user.
    % mail lnp6ttld < note
To understand fully how this works see the section on 'Re-direction of standard input' in Chapter 8 below.
In this example the message will not contain a subject heading, unless one has already been included as the first line of the file note. There is a -s option with the mail command that can be used to include a subject header, as follows:
    % mail -s UNIX lnp6ttld < note
The string following the -s is the subject; in this case, the subject is "UNIX".

Receiving mail
If new mail is waiting for you when you log in, you will see the message:
    You have new mail
  • 19. To start the mail program type the command:
    % mail
Each message is summarised on a numbered list. The current message is marked with a ">" character. The mail prompt character is "&". Type the number of the message you want to read, or just press <RETURN> to read through the list. The list of mail headers will look something like this:
    % mail
    Mail version SMI 4.0 Thu Oct 11 12:59:09 PDT 1990 Type ? for help.
    "/usr/spool/mail/lnp5jb": 2 messages 2 new
    >N 1 lnp5mw Thu Jan 9 15:10 11/262 hello
     N 2 lnp5js Thu Jan 9 15:11 10/287 party
    &
This tells Jenny Brown that she has two messages, one from user lnp5mw, and one from lnp5js. The date and time at which the messages were received are also listed, and so is the subject header (the last item on each line - here 'hello' and 'party'). The following commands can be entered to the mail prompt:
    d           Mark the current message for deletion
    d n         Mark message number n for deletion
    u n         Undelete message number n
    s file      Save the current message in file with the mail header and mark for deletion
    w file      Save the current message in file without the mail header and mark for deletion
    r           Reply to the current message
    q           Quit mail, removing deleted messages from your system mailbox. Undeleted messages that have been read are normally stored in your personal mailbox (see below)
    x           Exit mail, leaving your mailbox untouched, i.e. messages deleted in this session are restored
    h           Show list of message headers
    ?           List the useful mail commands
    ! command   Execute specified shell command
  • 20.
    -             Re-read previous message
    m recipient   Send mail to named recipient

Files used by mail
    ~/mbox      Your personal mailbox, located in your home directory. This is where messages that you have saved are stored, unless you specified another location when you saved them. You can access this file by issuing the command:
                    % mail -f mbox
    ~/.mailrc   A file that can hold commands for mail to obey when it starts up.
--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------
See if you have received any mail. If you have, save a message to your mailbox file. Send yourself another message, and this time discard it. Send a message to another user.

Sending mail to remote users
The following also applies to the elm mail program. Sending mail to users on other computer systems is simple using mail. Simply type the full address of the remote user where the system username is used above. For example:
    % mail lnp5mw@uk.ac.leeds.gps
or
    % mail -s Hello ecl6rsh@uk.ac.leeds.cms1 < note
These two examples correspond to the two ways of sending mail shown above: typing the message interactively, and mailing a file.
It is also possible to use mail to look at folders of mail that you have already received. To do this type:
    % mail -f folder_name
and it will treat the messages in the folder as incoming mail.

Sending on-line messages
As you have seen, messages sent using mail are received in a special buffer, and it is up to the recipient when to look at them and what to do with them. It is also possible to send
  • 21. a message that will simply appear on the screen of the recipient, if they are logged on. This is less useful than mail for the following reasons:
    mail can be used irrespective of whether the recipient is logged on or not.
    mail messages can be stored by the recipient. This means that files can be transferred by mail, and a record of transactions can be kept.
    On-line messages can be confused with whatever the recipient has on screen and can easily disrupt what they are doing. They can be very annoying!
On the other hand, on-line messages do have the advantage of obtaining the immediate attention of another user, and it is possible to have an interactive conversation. Bearing these facts in mind, use the following command with caution!

write
The write command is used to send on-line messages to another user on the same machine. The format of the write command is as follows:
    % write username
    text of message
    ^D
After typing the command, you enter your message, starting on the next line, terminating with the end-of-file character. The recipient will then hear a beep, then receive your message on screen, with a short header attached. The following is a typical exchange. User lnp5jb types:
    % write lnp8zz
    Hi there - want to go to lunch?
    ^D
    %
User lnp8zz will hear a beep and the following will appear on his/her screen:
    Message from lnp5jb on sun050 at 12:42
    Hi there - want to go to lunch?
    EOF
If lnp8zz wasn't logged on, the sender would see the following:
    % write lnp8zz
    lnp8zz not logged in.
  • 22. SunOS has the talk command. This has several advantages over write. Firstly, talk can call other machines on a network. Secondly, talk provides a clearer interface for the exchange of messages, dividing the screen into two windows for the interlocutors. Type talk username@machine to start a conversation.
--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------
Try to have an extended on-line conversation with another user.

You can stop messages being flashed up on your screen if you wish. To turn off direct communications type:
    % mesg n
It will remain off for the remainder of your session, unless you type:
    % mesg y
to turn the facility back on. Typing just mesg lets you know whether it is on or off.

Remote logins
It is possible to log on to another machine on a Unix network, provided that you have permission to do so. To do this use the rlogin command. Type:
    rlogin machine
and you will be asked for your password. It may be necessary for you to do this to make on-line communications with another user easier.
--------------------------------------------------------------------------------
Exercises
1. Send a message to another user on your Unix system, and get them to reply.
2. Create a small text file and send it to another user.
  • 23. 3. When you receive a message, save it to a file other than your mailbox. (Remember you can always send yourself a message if you don't have one.)
4. Send a message to a user on a different computer system.
5. Send a note to your course tutor telling him that you can use mail now.

FILE PERMISSIONS
--------------------------------------------------------------------------------
What are file permissions?
The Unix file security system can prevent unauthorised users from reading or altering files. Every file and directory has specific permissions associated with it, giving different categories of user certain permissions to look at or change a file, and to run executable files.
NOTE Executable files are files containing commands that can themselves be executed as if the file itself were a command.
The file permissions can be displayed using the command:
    % ls -l [filename]
For example, to display the permissions on the file lookup.icn, type the command:
    % ls -l lookup.icn
    -rw-r--r-- 1 lnp5jb 777 Dec 18 lookup.icn
The first set of characters in the output from the command (-rw-r--r--) gives the permissions. The username in the middle of the line (lnp5jb) is the owner of the file. This is the user who created the file. The following fields tell you the number of characters in the file, the date it was last modified and the name of the file.
Note that the first character specifies the file type. This is normally one of the following:
    -   indicates a file
    d   indicates a directory
The following nine characters represent permissions for different classes of users. Users on a Unix system are assigned to a group or groups, which might correspond to a
  • 24. particular department, or research group in the real world. Members of a particular group can be allowed access to files belonging to other members of the group.
The second, third and fourth characters in the permissions string represent permissions that apply to the owner of the file. The next three characters apply to members of the owner's group. The last three apply to all other users. The file in this example therefore has rw- for the owner, r-- for the group and r-- for others.
The three characters corresponding to each class of user each represent a different type of permission.
The first character represents 'read' permission. This means that a user has permission to open a file and view the contents. If there is an r in this position then that class of users has read permission. In this example all users have read permission. In this, and in every case, a horizontal bar character (-) means that permission is denied.
The second position represents 'write' permission (the right to make changes to a file). In the example, only the owner has write permission. Normally, you will not want others to be allowed to make changes to your files, so write permission is only allowed to the owner.
The third position represents 'execute' permission. This means permission to 'execute', or run, a file that works like a command. In this example no-one has execute permission for the file lookup.icn (it is an Icon program, and it would have to be compiled before it could be executed, so execute permission would be useless).
To summarise the above, this is how the permissions string is divided up:
    -              rw-     r--     r--
    type of file   owner   group   others
Here is another example, this time an executable file:
    -rwxr-x--x 1 lnp5jb 562 Jan 10 hello
This tells us that hello is a file; the owner is lnp5jb; the owner has read, write and execute permission; the group has read and execute permission; others just have execute permission.
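The two permission strings discussed above can be reproduced and taken apart with a short script. This is a sketch under stated assumptions: it is written for a POSIX sh rather than the C shell, it assumes mktemp is available, it creates empty stand-in files named lookup.icn and hello in a scratch directory, and it uses the chmod command (with its = operator) to set the permissions directly.

```shell
#!/bin/sh
# Sketch: set the two permission patterns from the text and read them back.
# The files are empty stand-ins created in a scratch directory.
work=$(mktemp -d)              # assumes mktemp exists on this system
cd "$work" || exit 1
: > lookup.icn
: > hello

# who=permissions assigns exactly the listed permissions to each class
chmod u=rw,g=r,o=r lookup.icn
chmod u=rwx,g=rx,o=x hello

# The first 10 characters of ls -l are the file type plus nine permission characters
perm_lookup=$(ls -l lookup.icn | cut -c1-10)
perm_hello=$(ls -l hello | cut -c1-10)
echo "$perm_lookup"
echo "$perm_hello"

# Split the string for hello into its three classes of user
owner=$(echo "$perm_hello" | cut -c2-4)
group=$(echo "$perm_hello" | cut -c5-7)
others=$(echo "$perm_hello" | cut -c8-10)
echo "owner=$owner group=$group others=$others"

cd / && rm -r "$work"
```

The first two lines printed should match the strings in the examples above, -rw-r--r-- and -rwxr-x--x, and the last line shows the same three-way split as the summary table.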
-------------------------------------------------------------------------------- PRACTICE -------------------------------------------------------------------------------- What are the default permissions for your files and directories? Are they all the same?
  • 25. When you copy a file what file permissions does the new file have?

Changing file permissions
The command chmod is used to change the permissions on a file. The format of this command is:
    % chmod mode filename
For example, to add read permission for the group to the file file1, give the command:
    % chmod g+r file1

chmod modes
In the command:
    % chmod mode filename
the mode consists of three elements:
    who operator permissions
The following options are possible:
who:
    u  user (owner)
    g  group
    o  other
    a  all
operators:
    -  remove permission
    +  add permission
    =  assign permission
permissions:
    r  read
  • 26.
    w  write
    x  execute
For example:
    chmod o-rw file1.f    removes read and write permissions from others.
    chmod u+x test        adds execute permission for the owner.

Permissions for directories
Read, write and execute permissions are set for directories as well as files. Read permission means that the user may see the contents of a directory (e.g. use ls for this directory). Write permission means that a user may create files in the directory. Execute permission means that the user may enter the directory (i.e. make it their current directory).
--------------------------------------------------------------------------------
Exercises
1. Try to move to the home directory of someone else in your group. There are several ways to do this, and you may find that you are not permitted to enter certain directories. See what files they have, and what the file permissions are. (Remember that you can protect your own files from prying eyes, or from interference.)
2. Try to copy a file from another user's directory to your own.
3. Set permissions on all of your files and directories to those that you want. You may want to give read permission on some of your files and directories to members of your group.

STANDARD INPUT AND OUTPUT
--------------------------------------------------------------------------------
Standard input
Input to Unix commands is normally given from the keyboard. For example you can use the cat command interactively:
    % cat
  • 27.
    Hello     - you type
    Hello     - response
    there     - you type
    there     - response
    ^D        - you type
    %
Note that input from the keyboard is terminated with the end-of-file character, usually ^D. For another example consider the spell command, which is the Unix spelling checker:
    % spell                       - you type
    Input to the spell ulitity    - you type
    is typed at the keyboard      - you type
    ^D                            - you type
    ulitity                       - response
The spell command outputs words that are incorrectly spelled in the input.

Standard output
Output from Unix commands is normally displayed on the screen. For example:
    % spell
    Input to the spell ulitity
    is typed at the keyboard
    ^D
    ulitity                       - output
--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------
Try out the spell checker. See how it copes with British spellings (remember it's an American system), proper nouns, hyphens and recently coined vocabulary.

Re-direction of standard input
It is possible to redirect standard input so that the input is taken from a file. Imagine you wish to check for spelling errors in a report. A text can be put into the file report, which can be fed into the spell command:
    % cat > report
    Input to the spell ulitity
    can come from a file
    ^D
    % spell < report
    ulitity
The < character is used to re-direct the input from the file report to the command spell. The general format for re-direction of standard input is:
  • 28. command < filename
Another common use of re-direction of standard input is to mail a file to another user. The command:
    % mail lnp8zz < report
will mail the file report to local user lnp8zz.

Re-direction of standard output
You do not always want the output from a Unix command to be displayed on the screen. It has already been shown how it is possible to direct the output from the cat command to a file. Imagine you want a list of your files and directories kept in a file. You would use the command:
    % ls > filelist
The > character is used to re-direct the output from the command to the file called filelist. The general format for re-direction of standard output is:
    % command > filename
Note that output directed to the file /dev/null is effectively discarded. This is the system 'wastebasket'.
Another example involves directing the output of echo to a file:
    echo "Hello there" > greeting
This would normally overwrite any existing contents of the file greeting. Study the following sequence:
    % echo "Hello there" > greeting
    % cat greeting
    Hello there
    % echo "This instead" > greeting
    % cat greeting
    This instead
It is possible to append output to a file, rather than overwriting it, by using the >> operator. For example:
    % echo "Hello there" > greeting
    % cat greeting
    Hello there
  • 29. % echo "and goodbye" >> greeting
    % cat greeting
    Hello there
    and goodbye
Look carefully at the difference between these two examples.

Re-direction of input and output
It is possible to re-direct both standard input and output. If you have a report containing many spelling mistakes you may wish to keep a list of the mistakes in a file. You can do this using the following command:
    % spell < report > errors

Piping
Output from one command can be sent ('piped') to the input of another command using the | character:
    command1 | command2
A common use for pipes is to control the output of large files to the screen. It is possible to send output to the more command so that only one screenful at a time is output. If the command
    % ls -l
is used to give a long listing of all files and directories there may be too many lines to see them all at once on the screen. (If you don't have many files, move to /etc where there should be plenty.) Output from ls -l can be piped to more as follows:
    % ls -l /etc | more
You can then use the usual more commands to control the output.
In the output from ls -l, directories are identified by the d character at the start of each line. A list of just the directories can be obtained by piping the output of this command to the grep command, giving grep a pattern that matches only lines with the d character at the start of the line. The command is:
    % ls -l | grep "^d"
The commands sort and grep are often used when piping. For example:
    % cat phonenos | sort | lpr
  • 30. will send an alphabetically sorted list of the phone numbers contained in the file phonenos to the line printer. The command:

    % cat phonenos | grep leeds | sort | lpr

will send a sorted list of phone numbers containing the string 'leeds' to the line printer.

--------------------------------------------------------------------------------
Exercises

1. Put a listing of the files in your directory into a file called filelist. (Then delete it!)
2. Create a text file containing a short story, then use the spell program to check the spelling of the words in the file.
3. Redirect the output of the spell program to a file called errors.
4. Type the command ls -l and examine the format of the output. Pipe the output of the command ls -l to the word count program wc to obtain a count of the number of files in your directory.

AN INTRODUCTION TO THE EX LINE EDITOR
--------------------------------------------------------------------------------
What's ex for?

Editors available on Unix include:

    ed      basic line editor
    ex      line editor
    vi      screen editor
    emacs   screen editor

Ex is an enhanced and more friendly version of ed. Vi is a screen-based version of ex. Most users have no practical use for a line editor nowadays, and line editors are really a relic of an earlier age in computing. However, you may occasionally have to use ex, if for some reason you can't run a screen editor on your terminal. It is covered here mainly to teach something else, namely, the way that Unix handles texts. This is perhaps most transparent when you are using ex. Ex forces the user to use complicated pattern matching operations to do things that are comparatively easy with a screen editor, such as correcting small typing errors in the text. While taking this approach may at times seem
  • 31. unnecessarily difficult, it should be remembered that what follows here is just a stepping stone to other Unix utilities, such as vi (which you are far more likely to want to use as an editor than ex), and commands that use regular expressions, such as grep, tr and awk. Learning to use ex involves skills necessary for getting the most out of these utilities.

Using ex

Starting ex

The command ex is used to invoke the editor. The format of this command is:

    % ex [filename]

A filename can be supplied if you wish to edit an existing file.

    % ex oldfile
    "oldfile" 10 lines 465 characters
    :

Alternatively the filename may be used as the name of a new file:

    % ex newfile
    "newfile" [Newfile]
    :

Notice that the prompt for ex commands is the ':' character.

Adding Text

To enter text simply type the command a (short for append), and then type in the text, as follows:

    :a
    This is the text

Input is terminated by typing a full stop ('.') on a new line:

    :a
    This is just one line of text
    .
    :

The command i is used to insert text before the current line.

Saving Your Data

The command w (short for 'write') is used to save your data. The format of this command is:

    :w [filename]
  • 32. If no filename is specified, the filename given when ex was invoked will be used. E.g.:

    :w test.f
    test.f 50 lines 576 characters
    :

The number of lines and characters in the file will be displayed.

Quitting the Editor

The command q (short for 'quit') is used to quit the editor. Note that if changes have been made to the file and have not been saved the editor will respond with a warning message:

    No write since last change (:quit! overrides)

The command quit! (or just q!) must be given if you wish to quit without saving your changes.

Displaying Lines in the File

The p command (for 'print') is used to display lines in the file. The format of this command is:

    :[line_range] p

If no range is supplied the current line is displayed. Pressing <RETURN> is equivalent to moving on to and displaying the next line. With small files it is possible to display the entire file by pressing <RETURN> until the end of the file is reached.

Line Ranges

Ranges of lines that can be given to edit commands include:

Absolute line numbers
    6       refers to line 6
    1,6     refers to lines 1 to 6
Relative line numbers
    -2      refers to 2 lines before the current line
    +3      refers to 3 lines after the current line
    -2,+3   refers to a range from 2 lines before the current line to 3 lines after the current line
  • 33. Special symbols
    $       refers to the last line in the file, e.g. $p to display the last line, 1,$p to display the entire file
    .       refers to the current line, e.g. .,$p to display from the current line to the end

Examples:

    6d          - deletes the sixth line
    1,6d        - deletes the first six lines
    1,$d        - deletes all lines
    3a          - appends text after line three
    .,+10w new  - saves the current line and the ten lines after it to a file called new

The = operator gives the line number, with the last line the default, so typing = gives you the number of lines in a text. The number of the current line is obtained by typing .=.

Deleting Lines

The d command is used to delete lines. The format of this command is:

    :[line_range] d

If no line number is given the current line will be deleted. It is possible to supply a range of lines. For example:

    :1,$d

will delete the entire contents of the file.

Searching

Searches are carried out by enclosing the search string in slashes ('/'):

    /string/

The search will start at the current line.

    :/Jane/
    This is Jane's file

The special characters '^' and '$' can be used to assist the search. For example:

    /^This/   will find a line beginning with 'This'
    /file$/   will find a line ending in 'file'

The last string searched for is the default string. This means that you can repeat a search just by typing //.
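The line addresses and search patterns above are not peculiar to ex. As a rough sketch, and not part of the original course, the same forms can be tried non-interactively with sed, which accepts line numbers, $ and /pattern/ addresses in much the same way. The file name demo.txt and its contents are invented for illustration.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
workdir=$(mktemp -d)
printf '%s\n' \
    'This is the first line' \
    'Jane wrote the second line' \
    'A third line' \
    'The last line is in this file' > "$workdir/demo.txt"

# 1,2p prints lines 1 to 2, as the same range would in ex.
first_two=$(sed -n '1,2p' "$workdir/demo.txt")

# $p prints the last line.
last_line=$(sed -n '$p' "$workdir/demo.txt")

# = applied to the last line gives the number of lines in the file.
line_count=$(sed -n '$=' "$workdir/demo.txt")

# /^This/ selects a line beginning with 'This'; /file$/ one ending in 'file'.
starts_with_this=$(sed -n '/^This/p' "$workdir/demo.txt")
ends_with_file=$(sed -n '/file$/p' "$workdir/demo.txt")

printf '%s\n' "$first_two" "$last_line" "$line_count" \
    "$starts_with_this" "$ends_with_file"

rm -r "$workdir"
```

One difference worth noting: sed applies each address to every line of the input, whereas an ex search moves from the current line to the next match.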
  • 34. Reverse Searches

Reverse searches are carried out by enclosing the search string in question marks ('?'):

    :?string?

The search will start at the current line and search backwards through the file.

Making Substitutions

The s command is used to make substitutions. The format of this command is:

    :[line_range]s/old_string/new_string/

If no line number is given substitutions will be made only on the current line. For example:

    :s/old/new/

will substitute the first occurrence of the string 'old' with 'new' on the current line. The command:

    :.,$s/old/new/

will substitute the first occurrence of the string 'old' with 'new' in every line from the current line to the end of the file.

Global Substitutions

The g flag (for 'global'), placed at the end of an s command, is used to make multiple substitutions on a line. For example:

    :s/old/new/g

will substitute all occurrences of the string 'old' with 'new' on the current line. The command:

    :1,$s/old/new/g

will substitute all occurrences of the string 'old' with 'new' in the file.

Search strings can also be used in conjunction with the s command in order to carry out more sophisticated global changes. The line range preceding a substitution may include a search for the string to be changed. For example:

    :g/old/s//new/g
  • 35. This means 'search globally for "old", then replace every occurrence with "new"'. Remember that the null string (in s//) stands for the last RE, in this case the RE 'old'. This is the same as:

    :1,$s/old/new/g

Additional ex facilities

Additional commands available using the ex editor include:

    c   replaces lines
    t   transfers (copies) lines
    m   moves lines
    j   joins lines
    l   shows invisible characters
    f   gives the name of the file being edited
    r   inserts named file
    e   edits named file
    u   undoes the last change

The commands m and t above work in a similar way, in that they take two line addresses, one before and one after the command. The address in front refers to the source and the address after to the destination. If either is omitted, the current line is assumed. The source address may be a range, allowing blocks of text to be moved. Here are a few examples of commands:

    :.m2

This moves the current line to a position after line 2.

    :1,.m$

This moves a block (line 1 to the current line) to the end of the text.

    :1,.t$

This copies the same block to the end of the text, leaving the original block untouched.
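The equivalence between :g/old/s//new/g and :1,$s/old/new/g described above can be checked outside the editor. The sketch below is an illustration using sed rather than ex, on the assumption that sed treats an empty pattern as "the last RE used", as ex does; the sample text is made up.

```shell
# Made-up sample text for illustration.
sample='old shoes and old hats
nothing to change here
an old old story'

# The address /old/ selects matching lines; the empty pattern in s//new/g
# reuses the most recent regular expression, here 'old'.
with_null_re=$(printf '%s\n' "$sample" | sed '/old/s//new/g')

# The plain form attempts the substitution on every line directly.
plain=$(printf '%s\n' "$sample" | sed 's/old/new/g')

printf '%s\n' "$with_null_re"

# Both forms should give identical output.
if [ "$with_null_re" = "$plain" ]; then
    echo 'same result'
fi
```

The m and t commands have no simple sed counterpart, so they are best tried inside ex or vi itself.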
  • 36.
--------------------------------------------------------------------------------
Exercises

1. Create a file using ex. Put the text of a message in the file and then mail it to someone (see chapter on mail).
2. Use ex to explore the file /etc/passwd. Search for your own listing, and those of others in your group. (You won't be able to save changes to the file.)
3. Find a text file to which you have access and copy it to your home directory. Try making some changes to it.

REGULAR EXPRESSIONS
--------------------------------------------------------------------------------
What are regular expressions?

A regular expression (RE) is a string of characters that can be used to match a set of character strings. For example, to globally search for all occurrences of the word "and" would require a search for "and", "And", "AnD", "AND", etc. Without regular expressions, finding all possible occurrences of "and" would require eight separate searches. Using an RE the search could be done with one command.

Regular expressions are used by many Unix utilities, including:

    ed
    ex
    vi
    grep
    sed
    awk

(The awk utility interprets a special-purpose programming language that makes it possible to handle simple data-reformatting jobs easily with just a few lines of code. Awk is not covered in this course, but the GAWK Manual is a good guide to its use.)

Regular expressions are used in searches and substitutions.
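As a small illustration of the "and" example, the sketch below uses grep with one bracketed set per letter (the bracket notation is explained in the next section). The sample text is invented; only standard grep options are used.

```shell
# Invented sample text containing several spellings of 'and'.
text='Salt and pepper
AND so on
And then there were none
a sandwich
no match here'

# One set per letter covers all eight upper/lower case spellings at once.
matches=$(printf '%s\n' "$text" | grep '[Aa][Nn][Dd]')

# grep -c counts the matching lines instead of printing them.
match_count=$(printf '%s\n' "$text" | grep -c '[Aa][Nn][Dd]')

printf '%s\n' "$matches"
echo "$match_count"
```

Note that the pattern matches the letters wherever they occur, so 'a sandwich' is also reported; restricting the match to whole words needs a more specific RE.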
  • 37. Character strings

A character string is the simplest regular expression; it simply matches the string itself. For example:

    /hello/            - matches 'hello'
    s/hello/goodbye/   - matches 'hello' and makes a substitution

Matching single characters

The '.' character is used to match any single character. For example:

    /p.t/ - matches 'p' and 't' separated by a single character, e.g. 'pit', 'put', 'pot', etc.

Sets of characters

The expression /[RE]/ is used to match a set of characters in a single character position. For example:

    /x[ab2X]y/ - matches any of the following: xay xby x2y xXy

In the expression /[RE]/ a range of characters can be specified. For example:

    [a-z] - matches any single lower case letter
    [0-9] - matches any single digit

Note however:

    [0-57] - matches any one of the following: 0 1 2 3 4 5 7 (i.e. 0-5 and 7)

Sets of characters can be combined:

    [a-d5-8X-Z] - matches any one of the following: a b c d 5 6 7 8 X Y Z

It is possible to specify a set of characters which are not to be matched in the RE. For example:

    [^0-9] - matches any single character which is not a digit

Anchors

An anchor is used to match an RE found at a particular position. For example:

    /^RE/   - matches RE at the start of a line
    /RE$/   - matches RE at the end of a line
    /^RE$/  - matches RE as the whole line
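The sets, the '.' character and the anchors above can be tried from the shell with grep. This is a minimal sketch on an invented word list, not an example from the course.

```shell
# Invented word list, one item per line.
words='xay
x2y
xcy
pit
spot
42
a1'

# [ab2X] matches exactly one character from the set.
set_matches=$(printf '%s\n' "$words" | grep 'x[ab2X]y')

# . matches any single character between p and t.
dot_matches=$(printf '%s\n' "$words" | grep 'p.t')

# ^ and $ anchor the RE to the whole line: exactly two digits.
digit_lines=$(printf '%s\n' "$words" | grep '^[0-9][0-9]$')

# Inside brackets ^ negates: count lines containing at least one non-digit.
non_digit_count=$(printf '%s\n' "$words" | grep -c '[^0-9]')

printf '%s\n' "$set_matches" "$dot_matches" "$digit_lines" "$non_digit_count"
```

Here 'spot' is matched by p.t because the RE may match anywhere in the line unless it is anchored.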
  • 38. Note that there are two separate uses of the '^' operator. One is as the start of line anchor, and the other as the 'logical not' operator. The latter function only applies inside square brackets.

Repetitions

Multiple occurrences of REs can be specified. For example:

    a*   - matches 0 or more occurrences of 'a'
    aa*  - matches 1 or more occurrences of 'a'
    .*   - matches any string of characters

Remembered regular expressions

A null RE stands for the last RE. For example:

    :/[Tt]he.*car/p
    The blue car exploded with a roar.
    :s//(The blue car)/p
    (The blue car) exploded with a roar.

The '&' character in a replacement string stands for the most recently matched string. For example:

    :/[Tt]he.*car/p
    The blue car exploded with a roar.
    :s//(&)/p
    (The blue car) exploded with a roar.

Sub-expressions

A sub-expression in an RE can be referred to.

    \(string\)  - defines an RE sub-expression
    \n          - refers to the nth RE sub-expression

NOTE The backslash is the escape character for REs. This means it changes the meaning of the character that follows it, usually by neutralising the special meaning of a special character.

For example:

    :p
    A line of text
    :s/\(line\).*\(text\)/\2 \1/p
    A text line
    :

Repetition

It is possible to specify an exact number of occurrences of an RE. For example:

    c\{4\}    matches exactly 4 c's
    c\{4,\}   matches 4 or more c's
    c\{2,4\}  matches between 2 and 4 c's
  • 39. For example, to find a line containing 5 digits:

    /[0-9]\{5\}/

A summary of special characters

Special characters in the search string

    ^     start of line anchor (or NOT operator inside [ ])
    $     end of line anchor
    .     any character
    *     character repeated any number of times
    \     escape character
    [ ]   contains range of characters

Special characters in the replacement string

    &     string matched in search string
    \     escape character

Note that any regular expression can be used with grep. (It gets its name from the editor command g/RE/p which means 'globally search for RE and print it'.) This opens up many new possibilities for the use of grep. Unix commands that use regular expressions often make the use of an editor redundant.

--------------------------------------------------------------------------------
PRACTICE

Obtain a listing of the members of your group from the password file using grep.
--------------------------------------------------------------------------------
Introduction to sed

sed is a non-interactive stream editor for text. The command to invoke sed is:

    sed [-n] [-e command] [-f edfile] [input_file]

For example:
  • 40.
    sed "s/UNIX/Unix/g" thesis > thesis.new

This will process the file thesis line by line, outputting each line to the file thesis.new and replacing each occurrence of the string "UNIX" with "Unix".

In the above example every line of thesis will be output to thesis.new, irrespective of whether it has been changed or not. This is because the default output for sed is every line of the input.

Using the -n option suppresses the default output, so that only explicitly specified lines are output. Adding -n to the above example would therefore mean that no lines at all are output:

    sed -n "s/UNIX/Unix/g" thesis > thesis.new

since a change but no output has been specified. If a print command is added, as follows:

    sed -n "s/UNIX/Unix/gp" thesis > thesis.new

then only those lines in which "UNIX" had been changed to "Unix" would be output.

As these examples show, the -e option is not necessary when there is only one editor command. It is possible to specify more than one command, and in this case each must be preceded by -e. For example:

    % sed -e "s/a/A/" -e "s/b/B/" file1 > file2

This command will carry out the two substitutions on each line of file1.

The -f option enables the user to use a file containing editor commands, instead of typing out a series of commands with the -e option.

sed examples

The sed command to list only files (exclude directories) is:

    % ls -l | sed -n "/^-/p"
    -rw------- 1 lnp5jb 1765 mbox
    -rw------- 1 lnp5jb 320 example1

The sed command to extract a list of usernames from the password file is:

    % sed "s/:.*//" /etc/passwd | more

What this does is to delete everything from the first ':' to the end of each line in the password file.

--------------------------------------------------------------------------------
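The behaviour of -n, the p flag and -e described above can be seen on a few lines of made-up input. This is only a sketch: the text and the passwd-style records are invented, so that nothing depends on the real thesis file or /etc/passwd.

```shell
# Invented three-line stand-in for the file 'thesis'.
thesis='UNIX is old
nothing here
UNIX and more UNIX'

# Default behaviour: every input line is output, changed or not.
all_lines=$(printf '%s\n' "$thesis" | sed 's/UNIX/Unix/g')

# -n suppresses the default output; the p flag prints only the lines
# on which a substitution was actually made.
changed_only=$(printf '%s\n' "$thesis" | sed -n 's/UNIX/Unix/gp')

# Two commands, each introduced by -e, are applied in order to each line.
two_edits=$(printf 'abba\n' | sed -e 's/a/A/' -e 's/b/B/')

# Username extraction on invented passwd-style records:
# delete everything from the first ':' to the end of the line.
passwd_sample='root:x:0:0:root:/root:/bin/sh
lnp5jb:x:1001:100:J Bloggs:/home/lnp5jb:/bin/csh'
usernames=$(printf '%s\n' "$passwd_sample" | sed 's/:.*//')

printf '%s\n' "$all_lines" "$changed_only" "$two_edits" "$usernames"
```

Without the g flag each s command changes only the first match on a line, which is why 'abba' becomes 'ABba' rather than 'ABBA'.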
  • 41. Exercises

1. Reproduce the effects of the above sed examples using grep instead. Note that grep is generally better for searches such as these, while sed can be used to make changes to files.
2. Find the system's games directory and type

    quiz function ed-command

to do the ed commands quiz. Don't worry if there are a couple of things that you haven't come across. Try it again and see if you improve your score.

PROCESSING LARGE TEXT CORPORA
--------------------------------------------------------------------------------
This section will focus on exploiting large files containing linguistic material with the use of the commands already covered plus many more.

Compressed files

Often large files are compressed to save disk space. If this is the case then the user must make the file revert to its original format in order to be able to do anything with it. A popular compressing command is called, simply, compress. The command:

    % compress filename

will cause the file to be replaced by a compressed file with a .Z suffix. The command uncompress will cause it to revert to its original format.

It is often not necessary to uncompress a file to use it. In fact, the file will often be owned by someone else, and you would have to copy it and then uncompress it, using up a great deal of disk space and processor time. It is often better to use zcat, which sends the uncompressed contents of a compressed file to the standard output, while leaving the compressed version of the file in the filestore.

--------------------------------------------------------------------------------
PRACTICE

Try compressing and uncompressing some of your own files. Find a large compressed file on your system and search it for some appropriate string using grep without uncompressing the file.
--------------------------------------------------------------------------------
Some useful commands for processing text files
  • 42. The following is a summary of some useful commands for processing text files, some of which you have met already, some of which are new to you. Both have been included so that this section can easily be used for reference purposes. Not all of these commands are standard Unix, so they may not all work in the way you expect (or at all) on your system. For the same reason, their syntax is somewhat inconsistent and some use different input and output conventions. Not all are included in the command summary in the appendix below. See the relevant manual pages for more details.

    sort        sort into alphabetical order
    sort -n     sort into numerical order
    sort -m     merge sorted files into one sorted file
    sort -r     sort into reverse order (highest first)
    sort -c     check a file is already sorted

    uniq        remove duplicate lines (or partly-duplicate lines)
    uniq -d     output only duplicate lines
    uniq -c     count identical lines (or lines with identical fields)

    grep        find lines containing given string or pattern
    grep -v     find lines not containing given string or pattern
    grep -c     count lines containing given string or pattern
    grep -n     give line numbers of lines containing...
    fgrep       same as grep except that it does not recognise regular expressions
    egrep       same as grep except that it recognises all REs (grep only recognises certain special characters)

    wc -c       count characters
    wc -w       count words
    wc -l       count lines

NOTE wc -l file will output the number of lines in the file, and the file name.
  • 43. wc -l < file just gives the bare line count.

    head -17          output first 17 lines
    tail -17          output last 17 lines
    tail +30          output from line 30 (written tail -n +30 on many current systems)

    cut -f3           delete all but third field of each line
    cut -f3,5         delete all but third and fifth fields of each line
    cut -f3-5,7       delete all but 3rd, 4th, 5th, 7th fields of each line
    cut -c-4,6-8      delete all but the 1st to 4th and 6th to 8th characters
    cut -f2 -d":"     delete all but the second field, where ":" is the field delimiter (tab is the default)

    paste             combines files horizontally; corresponding lines are appended
    paste -d">"       pastes with delimiter defined as ">" (tab is default). The special characters "\n" (newline) and "\0" (null string) may be used.

    cat               concatenates files vertically (appends files to one another)
    cat -n            precedes each line with a line number in the output
    cat -b            as above, but does not number blank lines
    cat -s            reduces any number of successive blank lines to one blank line

    tr "abc-e" "kmx-z"   translates a, b, c, d, e to k, m, x, y, z respectively
    tr -d "xy"           deletes all occurrences of x and y
    tr -s "a" "b"        translates all a to b and reduces any string of consecutive b to just one b

To go down to the character, rather than field, level, sed is simplest for line by line processing. sed looks for patterns, so is not very good with column or field positions.

uniq needs an already-sorted file. A common idiom is

    sort | uniq
  • 44. to produce a sorted list of all the different lines in a file. uniq has a peculiar way of spacing its output, so it is difficult to use in a pipeline with another command such as cut.

tr is useful for converting blanks to newlines (hence converting a text to a vertical list of words, which can then be sorted, counted etc.). The command:

    % tr " " "\012" < filename

will do this. \012 is the octal code for the linefeed character. tr is also useful for converting strings of blanks or tabs to single characters. \011 is the octal code for the tab character.

--------------------------------------------------------------------------------
PRACTICE

Try out the following pipeline on a text file:

    tr " " "\012" < input_file | sort | uniq > output_file
--------------------------------------------------------------------------------
Using language corpora

A corpus (plural corpora) is a collection of language data. The corpora with which we will be concerned here are electronic, that is, they are stored in a computer. Corpora may contain data about written or spoken language. They usually contain texts from one language, but they may also be multilingual.

Corpora are usually designed and collated for a specific purpose. Many of the major corpora in use today aim to be representative of different domains of language use, and can facilitate comparative studies. For example, the average length of words in academic texts and newspaper reports could be compared by measuring words in texts from these two domains. Computers obviously make this type of number-crunching (or word-crunching) activity much easier than it would be if you had to count words and letters in a printed text. Corpora are particularly useful for checking the intuitions that we have and the generalisations that are made about language use.
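The word-crunching described above can be sketched with the commands from this section. The example below is illustrative only: the two-line sample stands in for a real text, and awk (which this course does not cover) is assumed to be available for tidying the output of uniq -c and for the arithmetic.

```shell
# Use the C locale so that sort order is the same on every system.
LC_ALL=C
export LC_ALL

# Invented two-line stand-in for a text file.
sample='the cat sat on the mat
the dog sat'

# One word per line: translate each blank to a newline (octal 012).
word_list=$(printf '%s\n' "$sample" | tr ' ' '\012')

# Sorted list of the distinct words.
distinct=$(printf '%s\n' "$word_list" | sort | uniq)

# Frequency list, most frequent first; awk removes the padding
# that uniq -c puts in front of each count.
freq=$(printf '%s\n' "$word_list" | sort | uniq -c | sort -rn | awk '{ print $1, $2 }')

# Mean word length: total number of characters divided by number of words.
mean_len=$(printf '%s\n' "$word_list" |
    awk '{ total += length($0) } END { printf "%.2f\n", total / NR }')

printf '%s\n' "$distinct"
printf '%s\n' "$freq"
echo "$mean_len"
```

Running the same pipeline over texts from two domains and comparing the final figure gives the kind of comparison suggested above; a real corpus would first need its punctuation and reference fields stripped.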
Unix commands can be used to extract information from language corpora. The commands learned in this course can be issued directly or combined into simple scripts for this purpose.

Types of Corpora

There are many types of corpora, defined by the types of language that they represent and the formats in which that information is stored. Unix commands for handling strings are
  • 45. sufficiently flexible to handle many different formats. Users however need to be sensitive to the arcane minutiae of the format and markup of the different corpora that they use. The ex command l (typed as :l in the vi editor) can be used to view hidden characters (such as tabs and trailing spaces) in a file.

The LOB and Brown corpora

Brown and LOB are parallel corpora, with very similar formats and tagging. Brown, which was constructed first, represents different types of written American English. LOB represents the same categories of British English. All words are lemmatised and given a word class tag. Here is a sample from the so-called 'vertical tagged' version of Brown:

    ^N01002001 ----- ----- -----
    N01002010 - NP Alastair
    N01002020 - BEDZ was
    N01002030 - AT a
    N01002040 - NN bachelor
    N01002041 - . .
    ^N01002042 ----- ----- -----
    N01002050 - ABN all
    N01002060 - PP$ his
    N01002070 - NN life
    N01002080 - PP3A he
    N01002090 - HVD had
    N01002100 - BEN been
    N01002110 - VBN inclined
    N01002120 - TO to
    N01003010 - VB regard
    N01003020 - NNS women
    N01003030 - IN as
    N01003040 - PN something
    N01003050 - WDTR which
    N01003060 - MD must
    N01003070 - RB necessarily
    N01003080 - BE be
    N01003090 - VBN subordinated
    N01003100 - IN to
    N01004010 - PP$ his

And the 'untagged' version of the same passage, plus the following lines:

    N01 0010 DAN MORGAN TOLD HIMSELF HE WOULD FORGET Ann Turner. He
    N01 0020 was well rid of her. He certainly didn't want a wife who was fickle
    N01 0030 as Ann. If he had married her, he'd have been asking for trouble.
  • 46. N01 0040 But all of this was rationalization. Sometimes he woke up in N01 0050 the middle of the night thinking of Ann, and then could not get back N01 0060 to sleep. His plans and dreams had revolved around her so much and for N01 0070 so long that now he felt as if he had nothing. The easiest thing would N01 0080 be to sell out to Al Budd and leave the country, but there was N01 0090 a stubborn streak in him that wouldn't allow it. The best antidote N01 0100 for the bitterness and disappointment that poisoned him was hard N01 0110 work. He found that if he was tired enough at night, he went to sleep Users can choose the version (from those available to them) which includes the information that they need. If you are only interested in word frequencies, then the grammatical information encoded in the tagged version is redundant, and the untagged version can be used. If however you are looking for the word 'set' used as a noun, then it would be necessary to use a tagged version, so that this word can be differentiated from 'set' used as a verb or adjective. Processing LOB and Brown The Susanne corpus This corpus uses a section of the Brown corpus and marks it up with syntactic information. N01:0010a - YB <minbrk> - [Oh.Oh] N01:0010b - NP1m DAN Dan [O[S[Nns:s. N01:0010c - NP1s MORGAN Morgan .Nns:s] N01:0010d - VVDv TOLD tell [Vd.Vd] N01:0010e - PPX1m HIMSELF himself [Nos:i.Nos:i] N01:0010f - PPHS1m HE he [Fn:o[Nas:s.Nas:s] N01:0010g - VMd WOULD will [Vdc. N01:0010h - VV0v FORGET forget .Vdc] N01:0010i - NP1f Ann Ann [Nns:o. N01:0010j - NP1s Turner Turner .Nns:o]Fn:o]S] N01:0010k - YF +. - . N01:0010m - PPHS1m He he [S[Nas:s.Nas:s] N01:0020a - VBDZ was be [Vsb.Vsb] N01:0020b - RR well well [Tn:e[R:h.R:h] N01:0020c - VVNt rid rid [Vn.Vn] N01:0020d - IO of of [Po:u. N01:0020e - PPHO1f her she .Po:u]Tn:e]S] N01:0020f - YF +. - . N01:0020g - PPHS1m He he [S[Nas:s.Nas:s] N01:0020h - RR certainly certainly [R:m.R:m] N01:0020i - VDD did do [Vde. 
N01:0020j - XX +n<apos>t not . N01:0020k - VV0v want want .Vde] N01:0020m - AT1 a a [Ns:o101. N01:0020n - NN1c wife wife .
  • 47. N01:0020p - PNQSr who who [Fr[Nq:s101.Nq:s101] The London-Lund corpus This corpus differs from the others that we have looked at because it is a transcription of spoken English. Intonation is marked. 1 1 1 10 1 1 B 11 ((of ^Spanish)) . graphology#/ 1 1 1 20 1 1 A 11 ^w=ell# ./ 1 1 1 30 1 1 A 11 ((if)) did ^y/ou _set _that# - / 1 1 1 40 1 1 B 11 ^well !Joe and _I#/ 1 1 1 50 1 1 B 11 ^set it between _us#/ 1 1 1 60 1 1 B 11 ^actually !Joe 'set the :paper#/ 1 1 1 70 1 1 B 20 and *((3 to 4 sylls))*/ 1 1 1 80 1 1 A 11 *^w=ell# ./ 1 1 1 90 1 1 A 11 quot;^m/ay* I _ask#/ 1 1 1 100 1 1 A 11 ^what goes !into that paper n/ow#/ 1 1 1 110 1 1 A 11 be^cause I !have to adv=ise# ./ 1 1 1 120 1 1 A 21 ((a)) ^couple of people who are !doing [dhi: @]/ 1 1 1 130 1 1 B 11 well ^what you :d/o#/ 1 1 1 140 1 2 B 12 ^is to - - ^this is sort of be:tween the :tw/o of / 1 1 1 140 1 1 B 12 _us# / 1 1 1 150 1 1 B 11 ^what *you* :d/o#/ 1 1 1 160 2 1 B 23 is to ^make sure that your 'own . !candidate/ 1 1 1 170 1 1 A 11 *^[m]#*/ 1 1 1 160 1 2(B 13 is . *.* ^that your . there`s ^something that your / 1 1 1 160 1 1(B 13 :own candidate can :h/andle# - -/
  • 48. CUVOALD

This acronym stands for the Computer Usable Version of the Oxford Advanced Learner's Dictionary. There are in fact two versions. The most useful is usually in a file called cuv2.dat, which contains 68742 words including inflected forms and proper nouns. It is most often of use as a wordlist, but the file also contains a phonemic transcription and a part-of-speech tag for every word. Here is a sample of cuv2.dat:

    verbs v3bz Kj verdancy 'v3dnsIL@ verdant 'v3dnt OA verdict 'v3dIkt K6 verdicts 'v3dIkts Kj verdigris 'v3dIgrIs L@ verdure 'v3dj@R L@ verge v3dZ I2,K6 3A verged v3dZd Ic,Id 3A verger 'v3dZ@R K6 vergers'v3dZ@z Kj verges 'v3dZIz Ia,Kj 3A verging 'v3dZIN Ib 3A verifiable 'verIfaI@bl OA verification ,verIfI'keISn M6 verifications ,verIfI'keISnz Mj verified 'verIfaId Hc,Hd 6A verifies 'verIfaIz Ha 6A verify 'verIfaI H3 6A verifying 'verIfaIIN Hb 6A verily 'ver@lIPu verisimilitude ,verIsI'mIlItjud M6 verisimilitudes,verIsI'mIlItjudz Mj veritable 'verIt@bl OA verities'verItIz Mj verity 'verItI M8 vermicelli ,v3mI'selI L@ vermiform 'v3mIfOm OA vermilion v@'mIlI@n M6,OA

The coding conventions for the phonemic and syntactic tags are explained in a file that comes with the dictionary. Some examples of applications that use the dictionary can be found in the appendix of this course.

Other texts

Corpus building is currently a growth area, and there are many more corpora in addition to the examples above. Currently available or under construction are a number of very large corpora, comprehensive corpora aiming to cover all registers of English,
  • 49. international English corpora, corpora of different languages and specialised corpora covering a single well-defined domain of language.

--------------------------------------------------------------------------------
Exercises

1. Find a large text file with a fixed field format (e.g. the Brown or LOB corpora) and inspect the format. Use zcat to view it if necessary.
3. Use cut to strip away the reference material and leave just the text field.
4. Use tr to strip away any tags that are actually in the text (e.g. attached to the words), so that you are left with just the words.
5. Make a sorted wordlist from the file.
6. Combine the above commands in a shell script so that you have a small program for extracting a wordlist.

INTRODUCTION TO THE VI SCREEN EDITOR
--------------------------------------------------------------------------------
What is vi?

Vi is a screen editor. This means that you can see part of the file in a window on the screen, and editing operations can be controlled by moving a cursor around the text on screen.

Vi works in a different way from the editing functions of modern word processors. Its effective use requires a considerable amount of expertise on the part of the user. The user must be able to remember and manipulate opaquely named one-letter commands that can be combined in many different ways.

Vi is a screen-based version of ex. Its lack of user-friendliness is largely a result of this. In many ways it still works like a line editor, with complicated commands typed in by the user. The main enhancements on ex are the window, which enables you to constantly view part or all of the file, the visible cursor, and the commands that can be issued without moving to the command line.

Once you have learned to start vi, you will probably not need to use ex again. Everything that you have learned with ex, you can do with vi. What is more, with vi you have a window and the possibility of using interactive commands. 
The only
  • 50. time that you might want to use ex now is if you have trouble running a screen-based utility on your terminal. Using vi The next section lists the commands needed to start and use vi. In this section, the key concepts underpinning the use of vi are explained so that you can understand what is happening when you use it. The first thing to understand is that there are three modes: command mode: insert mode last line mode (or command line mode) You start in command mode. The commands listed below for moving the cursor and changing the file are entered in command mode. To enter a command simply type it at the keyboard. What you type will not appear anywhere on screen. To abandon a command you have started, you can type <ESC>. If you are not sure which mode you are in at any time you can type <ESC> and return to command mode. When you leave the other modes you return to command mode. Insert mode is used to enter text. Insert mode is entered by issuing one of a variety of commands that involve entering text. Insert mode must be exited in order to issue more commands. A common mistake made is to attempt to enter a command while in insert mode, which results in the command appearing on screen as part of the text. Last line mode is entered from command mode, and enables the user to type a command on the last line of the screen. Any ex command can be used in this way, simply by typing ':' followed by the command. The current line will be that where the cursor is positioned. When you start vi you will see a screen similar to the one below. If you are starting a new file, or the file you are editing is less than 18 lines long, then the empty lines in the window will be marked by the '~' (tilde) character. -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- This is a small file called 'vi.prac'. This is the second and last line. ^ ^ ^ ^
  • 51. ~
~
~
~
~
~
~
~
~
~
~
~
"vi.prac" 2 lines 103 characters
A typical vi screen
Note that it is necessary to press return at the end of each line of text that you enter. Otherwise, vi will interpret all of your text as a single line!
--------------------------------------------------------------------------------
PRACTICE
Create a new file, enter several lines of text and save it. Edit an existing file that you have, making several changes.
--------------------------------------------------------------------------------
vi reference
vi modes
    command     Normal and initial state. <ESC> cancels partial command
    insert      entered by the following commands: a, A, i, I, o, O, c, C, s, S, R. Terminates with <ESC> (or ^C).
    last line   entered by :, /, ? or !. Input is read and echoed at the bottom of the screen. Commands executed by <RETURN> or <ESC>, terminated by ^C.
Entering and leaving vi
    % vi file         edit file
    % vi +n file      edit starting at line n
    % vi + file       edit starting at end
    % vi +/RE/ file   edit starting at RE
    % view file       read only mode
    ZZ                exit from vi, saving changes (same as :wq)
    ^Z                stop vi process, for later resumption
Some simple commands
  • 52. The following are examples of some compound commands, using the operators listed later.
    dw            delete word
    de            delete word leaving punctuation
    dd            delete line
    4dd           delete 4 lines
    xp            transpose characters
    cwtext<ESC>   change word to text
File manipulation
The following are all last line mode commands, so must be preceded by a colon.
    w             save changes
    wq            save and quit
    q             quit
    q!            quit, discarding changes
    e file        edit file
    e!            re-edit current file, discarding changes
    w file        write to file
    w! file       overwrite file
    ! command     execute shell command, then return
    f             show current file and line
Positioning within the file
    ^F            forward one screenful
    ^B            back one screenful
    ^D            scroll down half screen
    ^U            scroll up half screen
    nG            go to line n (last line default)
    /RE/          go to next occurrence of RE
    %             find matching bracket
Marking
    ``            return to previous cursor position
    mx            mark position with x
    `x            go to mark x
Line positioning
    H             top line of window (home)
    M             middle line of window
    L             last line of window
    +             next line, at first non-white character
    -             previous line, at first non-white character
    <RETURN>      same as +
    j             next line, same column (same as down arrow)
    k             previous line, same column (same as up arrow)
Character positioning
    0             beginning of line
    ^             first non-white in line
  • 53.
    $             end of line
    <SPACE>       forward (same as right arrow)
    fx            find x forwards in current line
    Fx            find x backwards in current line
    ;             repeat last find command in the same direction
    ,             repeat last find command in the opposite direction
    n|            go to column n
Words, sentences, paragraphs
    w             forward to start of next word (delimited by non-alphanumeric character)
    b             back to start of last word
    e             forward to end of next word
    W             as w, with word delimited by blank only
    B             as b, with word delimited by blank only
    E             as e, with word delimited by blank only
    )             forward to start of next sentence
    (             back to start of last sentence
    }             forward to start of next paragraph
    {             back to start of last paragraph
Corrections during insert
    ^H            erase last character (or your usual delete key)
    ^W            erase last word
    \             escape character
    <ESC>         ends insert; back to command mode
    ^C            ends insert
Insert and replace commands
    a             append after cursor
    i             insert before cursor
    A             append at end of line
    I             insert before first non-blank
    o             open line below current line
    O             open line above current line
    rx            replace single character with x
    R             replace characters
Operators
The following can be doubled to apply to a line, and can also be preceded by a number to indicate a number of lines. They can be combined with positional commands (e.g. d$ to delete to end of line).
    d             delete
    c             change
    y             yank
Miscellaneous operations
    x             delete character
    X             delete character to left of cursor
    C             change rest of line (same as c$)
    D             delete rest of line (same as d$)
  • 54.
    J             join lines
    Y             yank (copy) lines
Yank and put
    p             put back after cursor
    P             put back before cursor
    "xp           put from buffer x
    "xy           yank to buffer x
    "xd           delete to buffer x
Undo, redo and retrieve
    u             undo last change
    U             restore current line
    .             repeat last command
    "np           retrieve nth last delete
TEXT FORMATTING
--------------------------------------------------------------------------------
There are text formatting facilities available with all Unix implementations. They will not be investigated in any detail here. Many users will prefer to use a PC-based word processing package for document production. Those who want to format text on Unix will have vastly differing needs, and it would be impossible to go into all of the possibilities here. A flavour of the simpler programs is given here, and users can look elsewhere for more extensive documentation.
pr
This is a filter that will format a text, giving a choice of columns, page width, length etc. It is not capable of sophisticated formatting for document production.
nroff
The simplest of the proper formatters is nroff. You can format a plain text file with nroff by simply typing:
    % nroff text_file
Formatting commands can be inserted into text files. Some simple commands:
    .ce   centre text
    .ll   line length
    .pl   page length
    .po   page offset (left margin)
    .sp   blank line
These commands may be followed by a numerical argument, which will make the command apply to the specified number of lines, e.g. .sp 3 to leave three blank lines. Formatting commands must be placed at the beginning of a line to be recognised as such. Normally they appear as the only text on a line. Commands are normally composed of lower-case characters. Here is an example of a text containing some nroff instructions:
  • 55.
    .ce
    This is the title
    .sp 2
    And this is the text, which will be formatted and justified when I run nroff.
    You will see that the line breaks will change, and the text will look tidier.
    That is what formatting is all about.
    .sp
    That was a blank line.
The following is what the output from this file would look like:
                              This is the title


    And this is the text, which will be formatted and justified when I run nroff. You will see that the line breaks will change, and the text will look tidier. That is what formatting is all about.

    That was a blank line.
nroff macros
Macros are a special type of nroff command, identified by being in upper-case characters. Standard macro libraries can be invoked by using option flags with the nroff command, e.g.:
    nroff -ms filename
for the standard macros. Other macro libraries can be invoked by the me, mn and mv options. Here are some standard macros:
    .FS   footnote starts
    .FE   footnote ends
    .ND   no date
    .TL   title
    .PP   start paragraph
The .PP macro, for example, is the equivalent of the following sequence of ordinary nroff instructions:
    .sp 5
    .ce 1
    .sp 5
It is possible to write your own macros. More details on nroff can be found in the manual.
MORE ON THE SHELL
  • 56. --------------------------------------------------------------------------------
General
The role of the shell
A Unix shell is used to:
evaluate the command line. For example:
    % car nofile
    car: Command not found
Here the shell looks for a command called car. Since it cannot find this command it gives an error message.
perform variable substitution. For example:
    % echo "In directory $HOME"
    In directory /home/sunserv1_b/lnp5jb
Here the shell variable $HOME is evaluated and displayed.
handle pipelines. For example:
    % who | wc -l
Here the output from who is piped through to the wc command, which displays a count of the number of lines in its input.
Types of shells
A number of shells are available for Unix systems, including:
    Bourne shell
    C shell
    Korn shell
    Graphical User Interface (GUI) shells
The Bourne shell, which was developed by Steve Bourne at Bell Laboratories, is one of the oldest shells and, as such, has gained a lot of popularity. It is widely used for shell programming because of its efficiency and because it is available on all Unix systems.
The C shell provides sophisticated interactive capabilities lacking in the Bourne shell. The C shell, which was developed at the University of California, Berkeley, has a syntax
  • 57. which resembles the C language. Features of the C shell include a command history buffer, command aliases and file name completion. However, the C shell does not allow efficient shell programs (also known as scripts) to be written. Because C shell programs are written in a style similar to the C programming language, people who are unfamiliar with C may find the C shell difficult to program in.
The Korn shell combines the best features of the Bourne and C shells. Korn scripts are 95% upwardly compatible with Bourne scripts. The Korn shell interactive features include:
    in-line editing
    command editing
    job control
Graphical User Interface (GUI) shells provide an iconic interface to Unix. GUI shells require the use of workstations (or powerful microcomputers) which perform part of the processing locally. The use of GUIs such as X-Windows is likely to become increasingly important in the near future. GUIs currently available include:
    Sun View       A Sun-specific GUI
    Open Look      GUI standard supported by Sun
    Motif          GUI standard supported by other suppliers
    Vista eXceed   Available on PCs; similar in style to Motif
There is a battle currently taking place in the market-place to establish the standard GUI.
Recommended shells
The Bourne shell is the oldest shell, and is widely used. The C shell has more utilities, however, and is probably more widely used now.
--------------------------------------------------------------------------------
The default shell for interactive shells at Leeds is the C shell. The Bourne shell is the default for shell programs.
--------------------------------------------------------------------------------
  • 58. However, the Bourne shell is recommended for shell programs. The Korn shell is not widely available and is not a standard part of Unix, but is perhaps the best option if available, unless you want to do a lot of C programming.
You can change your default login shell using the command:
    % chsh username /bin/sh     Bourne shell
    % chsh username /bin/csh    C shell
    % chsh username /bin/ksh    Korn shell
Warning! You probably don't want to try these commands now.
C shell features
The history mechanism
The history mechanism enables previously typed Unix commands to be re-invoked and edited. There are two forms. One is the quick substitution, which acts only on the immediately preceding command, e.g.:
    % car message
    car: Command not found
    % ^r^t
    This is the message file
This command replaces the first occurrence of 'r' with 't' in the last command.
A list of previously entered commands can be displayed using the history command:
    % history
    1 cd texts
    2 vi lookup
    3 who
    4 history
Commands can be re-entered using the number. For example:
    % !2
will re-execute the second command (vi lookup). It is possible to add extra options to re-executed commands. For example, to redirect output from the who command to a file called list we could give the command (for the above list):
    % !3 > list
You may also edit previous commands, e.g.:
    % !2:s/vi/cat/
    cat lookup
  • 59. although it is usually easier to re-type the whole command. The last command may be referred to as !!, and you can count back using !-2, !-3 etc.
File name completion
Within the C shell, when a file name is used in a command it is possible to specify only as many characters as will uniquely identify the file, and then press the <ESC> key to complete the filename:
    % ls
    mbox message
    % cat me<ESC>
    This is the message file
When you type <ESC>, the file name will be extended to 'message' on screen.
Command aliases
Command aliases provide a way of customising commands. For example:
    % alias dir ls
    % dir
    mbox message
Note that command aliases are only valid during the execution of the current shell. It is normal practice to include alias definitions in your .cshrc file. The following aliases could be useful to shorten long command names:
    alias hh history
    alias ll 'ls -al'
    alias q logout
The quotes around ls -al are necessary because of the space in the command. This tells the shell that it is all one command.
--------------------------------------------------------------------------------
PRACTICE
--------------------------------------------------------------------------------
Put the above aliases in your .cshrc file. Think of some other aliases that you would use, such as shortened versions of commands or different names for commands that you will find easier to remember.
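As a starting point for the practice above, a .cshrc fragment might look like the following sketch. It assumes you are using the C shell (csh). The aliases hh, ll, q and dir are the ones shown above; the rm alias and the two set lines are additions for illustration only, and the history length of 40 is an arbitrary choice. On C shells that support it, set filec is what switches on the <ESC> file name completion described earlier.

```csh
# ~/.cshrc fragment (C shell syntax): a sketch, not a recommended standard setup.

# Aliases taken from the text above.
alias hh history
alias ll 'ls -al'      # quotes needed because the command contains a space
alias q logout
alias dir ls

# Illustrative extra alias (an assumption, not from the original text):
# ask for confirmation before removing files.
alias rm 'rm -i'

# Keep the last 40 commands in the history list (40 is an arbitrary value).
set history = 40

# Enable file name completion with <ESC>, where the C shell supports it.
set filec
```

New aliases only take effect in C shells started after the file is saved. To pick them up in the current shell, type source ~/.cshrc.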
  • 60. C shell startup files
Certain files are executed automatically. These are:
.cshrc file
    Executed whenever a new C shell is spawned.
    Useful for specifying command aliases.
    Since C shells may be spawned automatically by certain system commands (such as the mail system or a compiler), this file should NOT contain commands which send output to your terminal.
    Contains a list of directories that are searched for commands. A line in the .cshrc file will give a value to the PATH system variable. The user can add pathnames to this list. It is conventional to store any of your own commands or shell scripts that you will use frequently in a directory called bin, and to add ~/bin to your search path.
.login file
    Executed when you log in.
    Used for setting system-wide variables, such as your terminal type.
    Can be used to display information, such as who is logged on, or news from the system managers.
Shell processes
A process is an executing program. To display a list of processes use the ps command:
    % ps
    PID   TTY   TIME COMMAND
    23268 ttyp1 0:01 ps
    22520 ttyp1 0:00 csh
The PID specifies the Process Identifier. The 'time' field gives the amount of CPU time used by the process.
Background processes
Normally processes run interactively, but they may also be run in the background, to enable the user to do something else while a process is running (this is known as 'multitasking'). This is usually necessary when you are running a very long job.
To run a command in the background use the & character at the end of the command line, as follows:
  • 61.
    % command &
Note that output from command will still be sent to standard output. If you fail to redirect standard output it will be sent to your terminal, where it is likely to be confused with output from your interactive process. For example, to sort logged on users using a background process give the command:
    % who | sort > sortedwho &
Note that this would normally be a very short process and you would not in fact need to run it in the background.
Controlling processes
You may wish to terminate a background process. To do this you must first find out its process id (PID) using ps:
    % ps
    PID   TTY   TIME COMMAND
    23397 ttyp1 0:01 who
    23268 ttyp1 0:02 ps
    22520 ttyp1 0:00 csh
Then use the kill command to terminate your process. For example:
    % kill 23397
If the process continues, use the -9 argument:
    % kill -9 23397
Another way of displaying your background processes is to use the jobs command:
    % jobs
    [1] + Running who | sort > sortedwho
The background process (or 'job') has been assigned the number 1, and this can be used to refer to it instead of the process i.d. The job number is usually identified by preceding it with the '%' (per cent) character, so as to differentiate it from a process i.d. So, for example, the command:
    % kill %1