
Chapter 5: Important UNIX Concepts

5.4 Important Concepts

This section gives an overview of a few important UNIX concepts which are very different from those of other systems and may therefore confuse the novice user. To make effective use of UNIX, you need to understand these concepts.

5.4.1 Path

When you issue a command, the shell program parses the command line and either processes it directly or searches for an executable file with that name in any of the directories specified in your search path, which is controlled by the variable PATH. If the file is not found in any of the directories in your search path, the shell reports that the command was not found. The file may well be on the disk somewhere, but it is not in your path.[19]

FUE attempts to provide an appropriate path, and we recommend that you not change this basic path. However, feel free to add directories to it. For the csh family, your .login file contains a set path line for the shell variable path.[20] Uncomment this line (remove the #) and include additional directories in the shown format:

set path=($path /dir1 /dir2... )

Or change the environment variable PATH (also in .login), as follows:

setenv PATH "${PATH}:/dir1:/dir2"

For the sh family, uncomment and add directories to the PATH line in your .profile file:

PATH=$PATH:/dir1:/dir2...

See section 9.2 for information on the PATH variable.

As an aside, if you add an executable to one of the directories in your search path, you may need either to log out and log back in, or to rebuild the internal tables used by the shell with the rehash (csh) or hash -r (sh) command (see section 4.4).
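The effect of the search path can be seen in a short sh-family session. This is only a minimal sketch: the directory is a temporary one created for the illustration, and mydemo_tool is a hypothetical command name, not something provided by FUE.

```sh
# Hypothetical directory and command name, used only for illustration.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mydemo_tool"\n' > "$demo_dir/mydemo_tool"
chmod +x "$demo_dir/mydemo_tool"

# The file exists on disk, but its directory is not in the search path yet.
if command -v mydemo_tool >/dev/null 2>&1; then
    before=found
else
    before="not found"
fi

# Append the directory to PATH, then clear the shell's remembered locations.
PATH=$PATH:$demo_dir
export PATH
hash -r

after=$(mydemo_tool)
echo "before: $before"
echo "after:  $after"
```

The same command name fails before the directory is appended and succeeds afterwards, without the file itself being moved.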

5.4.2 Standard Input and Output Redirection

The shell and many UNIX commands take their input from standard input (stdin), write output to standard output (stdout), and write error output to standard error (stderr). By default, standard input is connected to the terminal keyboard and standard output and error to the terminal screen.[21]

The way of indicating an end-of-file on the default standard input, a terminal, is usually <Ctrl-d>.

Redirection of I/O, for example to a file, is accomplished by specifying the destination on the command line using a redirection metacharacter followed by the desired destination.

C Shell Family

Some of the forms of redirection for the C shell family are:

Character   Action
>           Redirect standard output
>&          Redirect standard output and standard error
<           Redirect standard input
>!          Redirect standard output; overwrite file if it exists
>&!         Redirect standard output and standard error; overwrite file if it exists
|           Redirect standard output to another command (pipe)
>>          Append standard output
>>&         Append standard output and standard error

The form of a command with standard input and output redirection is:

% command -[options] [arguments] < input file  > output file 

If you are using csh and do not have the noclobber variable set (see section 9.2), using > and >& to redirect output will overwrite any existing file of that name. Setting noclobber prevents this. Using >! and >&! always forces the file to be overwritten. Use >> and >>& to append output to existing files.

Redirection may fail under some circumstances: 1) if you have the variable noclobber set and you attempt to redirect output to an existing file without forcing an overwrite, 2) if you redirect output to a file you don't have write access to, and 3) if you redirect output to a directory.

Examples:

% who > names

Redirect standard output to a file named names

% (pwd; ls -l) > out

Redirect output of both commands to a file named out

% pwd; ls -l > out

Redirect output of ls command only to a file named out

Input redirection can be useful, for example, if you have written a FORTRAN program which expects input from the terminal but you want it to read from a file. In the following example, myprog, which was written to read standard input and write standard output, is redirected to read myin and write myout:

% myprog < myin > myout 

You can suppress redirected output and/or errors by sending it to the null device, /dev/null. The example shows redirection of both output and errors:

% who >& /dev/null 

To redirect standard error and output to different files, you can use grouping:

% (cat myfile > myout) >& myerror 

Bourne Shell Family

The Bourne shell uses a different format for redirection which includes numbers. The numbers refer to the file descriptor numbers (0 standard input, 1 standard output, 2 standard error). For example, 2> redirects file descriptor 2, or standard error. &n is the syntax for redirecting to a specific open file. For example 2>&1 redirects 2 (standard error) to 1 (standard output); if 1 has been redirected to a file, 2 goes there too. Other file descriptor numbers are assigned sequentially to other open files, or can be explicitly referenced in the shell scripts. Some of the forms of redirection for the Bourne shell family are:

Character   Action
>           Redirect standard output
2>          Redirect standard error
2>&1        Redirect standard error to standard output
<           Redirect standard input
|           Pipe standard output to another command
>>          Append standard output
2>&1|       Pipe standard output and standard error to another command

Note that < and > assume standard input and output, respectively, as the default, so the numbers 0 and 1 can be left off. The form of a command with standard input and output redirection is:

$ command -[options] [arguments] < input file > output file 

Redirection may fail under some circumstances: 1) if you have the variable noclobber set and you attempt to redirect output to an existing file without forcing an overwrite, 2) if you redirect output to a file you don't have write access to, and 3) if you redirect output to a directory.
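In POSIX-style shells of the sh family (for example ksh and bash), noclobber is a shell option rather than a variable: it is turned on with set -C (or set -o noclobber), and >| forces an overwrite. The following is a minimal sketch of the first failure case, using a temporary directory and a hypothetical file name out:

```sh
# Work in a scratch directory; "out" is a hypothetical file name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

set -C                      # turn on noclobber
echo one > out              # file does not exist yet, so this succeeds

# Redirecting onto the existing file is refused. Run it in a subshell
# so the failure cannot stop the surrounding script.
if (echo two > out) 2>/dev/null; then
    result=overwritten
else
    result=refused
fi

echo three >| out           # >| forces the overwrite
set +C                      # turn noclobber off again

echo "second attempt: $result"
echo "file now contains: $(cat out)"
```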

Examples:

$ who > names

Direct standard output to a file named names

$ (pwd; ls -l) > out

Direct output of both commands to a file named out

$ pwd; ls -l > out

Direct output of ls command only to a file named out
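The difference between the grouped and ungrouped forms can be checked with commands whose output is predictable. In this sketch echo stands in for pwd and ls, and the file names grouped and ungrouped are chosen only for the illustration:

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Grouped: the parentheses run both commands in a subshell,
# and the output of the whole group goes to the file.
(echo first; echo second) > grouped

# Ungrouped: the redirection belongs to the last command only,
# so "first" is written to the terminal instead.
echo first; echo second > ungrouped

echo "--- grouped ---";   cat grouped
echo "--- ungrouped ---"; cat ungrouped
```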

Input redirection can be useful if you have written a FORTRAN program which expects input from the terminal and you want to provide it from a file. In the following example, myprog, which was written to read standard input and write standard output, is redirected to read myin and write myout.

$ myprog < myin > myout 
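To try this without a compiled program, a small shell script can stand in for myprog. The script below is purely a hypothetical substitute for the FORTRAN program: it reads numbers from standard input and writes their sum to standard output, with no knowledge of where either one is connected.

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for myprog: sums the numbers it reads.
cat > myprog <<'EOF'
#!/bin/sh
total=0
while read -r n; do
    total=$((total + n))
done
echo "total: $total"
EOF
chmod +x myprog

# Prepare the input file, then run with both streams redirected.
printf '1\n2\n3\n' > myin
./myprog < myin > myout

cat myout
```

Run without the redirections, the same script would wait for numbers typed at the keyboard (ended with <Ctrl-d>) and print the total on the screen.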

You can suppress redirected output and/or error by sending it to the null device, /dev/null. The example shows redirection of standard error only:

$ who 2> /dev/null 

To redirect standard error and output to different files (note that grouping is not necessary in Bourne shell):

$ cat myfile > myout 2> myerror 
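The following sketch exercises several of the Bourne shell forms together. It assumes a scratch directory, and no_such_file is deliberately a name that does not exist, so that cat writes to both standard output and standard error:

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "some text" > myfile

# Separate files: output to myout, errors to myerror.
# "|| true" ignores the non-zero exit status cat returns for the missing file.
cat myfile no_such_file > myout 2> myerror || true

# One file for both: redirect 1 to the file first, then point 2 at 1.
cat myfile no_such_file > both 2>&1 || true

# Both streams through a pipe: count every line cat produced.
lines=$(cat myfile no_such_file 2>&1 | wc -l | tr -d ' ')

echo "myout:   $(cat myout)"
echo "myerror: $(cat myerror)"
echo "lines through the pipe: $lines"
```

The order matters in the second command: 2>&1 copies whatever file descriptor 1 refers to at that moment, so it has to come after the > redirection.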

5.4.3 Pipes

UNIX uses the concept of a pipe to connect the standard output of one program directly into the standard input of another program. This is specified by separating the two commands with the pipe operator, the vertical bar (|). The general format is:

% command1 | command2 | ... 

where, of course, each command can have options and arguments. To implement pipes of commands, the shell forks off multiple processes. For example if you run the command:

% history | more

the shell forks twice; the grandchild runs history, the child runs more (after hooking up the right file descriptors to the right pipe ends), and the parent shell waits for the process to finish. The history command, a built-in, is implemented in the grandchild shell process directly, while the more command requires an exec system call.

The tee command can be used to send output to a file as well as to another command.

% who | tee whoout | sort 

This creates a file named whoout which contains the original who output. It also sorts the who output and sends it to standard output, the terminal screen. The following example sends the (unsorted) who output to the file and the screen:

% who | tee whoout
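Because the output of who differs from system to system, the sketch below feeds tee a fixed list of hypothetical user names instead. The file names unsorted and sorted are likewise chosen only for the illustration.

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# tee copies its input to the file "unsorted" and passes it down the pipe.
printf 'carol\nalice\nbob\n' | tee unsorted | sort > sorted

echo "--- unsorted (saved by tee) ---"; cat unsorted
echo "--- sorted (end of pipeline) ---"; cat sorted
```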

5.4.4 Filters

A filter is a command or program which gets its input from standard input, sends its output to standard output, and may be used anywhere in a pipeline. Examples of filters are the UNIX utilities more and less, grep, awk, and sort, each described below.

The combination of UNIX filters grep, awk, and sort and the use of pipes is very powerful.
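As a rough illustration of such a combination, the sketch below selects some records with grep, rearranges two fields with awk, and orders the result with sort. It recreates the personnel sample file used later in this section; the particular selection (the Smiths) is an arbitrary choice for the example.

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > personnel <<'EOF'
John Smith 75 South Ave., Denver, CO 80145
Alice Jones 834 S. Jefferson St., Batavia, IL 60510
Mary Fahey 901 California St., San Francisco, CA 94121
Eric Smith 24 Birch St., Albert City, IA 50510
EOF

# grep keeps the matching lines, awk prints "last, first", sort orders them.
result=$(grep 'Smith' personnel | awk '{ print $2 ", " $1 }' | sort)
echo "$result"
```

Each stage reads standard input and writes standard output, so stages can be added, removed, or reordered without changing the others.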

more and less

The more filter allows you to display output on a terminal one screen at a time. You press Spacebar to move to the following screen, and q to quit.

less is a much more flexible variant of the standard UNIX utility more and is provided under FUE[22]. The command less lists the output (e.g., specified files) on the terminal screen by screen like the command more, but in addition allows backward movement in the file (press b to go back one full screen) as well as forward movement. You can also move a set number of lines instead of a whole page. To view a file with the less filter, enter:

% less [options] [filename]...

The options and usage are described in the man pages for more and less.

After displaying a page of information, more and less display a colon prompt (:) at the bottom of the screen and wait for instructions.

LESS(1)                  UNIX System V                  LESS(1)

NAME
     less - opposite of more

SYNOPSIS
     less [-[+]aABcCdeEimMnqQuUsw] [-bN] [-hN] [-xN] [-[z]N]
          [-P[mM=]string] [-[lL]logfile] [+cmd]
          [-ttag] [filename]...

DESCRIPTION
     Less is a program similar to more (1), but which allows
     backwards movement in the file as well as forward movement.
     Also, less does not have to read the entire input file
     before starting, so with large input files it starts up
     faster than text editors like vi (1). Less uses termcap (or
     terminfo on some systems), so it can run on a variety of
     terminals. There is even limited support for hardcopy
:

You can search for patterns in the file by entering /pattern at the less prompt. To repeat the search for the same pattern, enter n, or a slash (/) with no pattern. A further advantage is that less does not have to read the entire input file before starting, so with large input files it starts up faster than text editors like vi.

grep

The grep filter searches the contents of one or more files for a pattern and displays only those lines matching that pattern. grep is described in Section 6.4.2.

awk

awk is much more than a filter; it is a powerful pattern scanning and processing language. Although you will need to spend a little time learning how to use awk, it is very well suited to data-manipulation tasks. It handles internally what you would have to handle laboriously in a language like C or FORTRAN. You can do in a few lines what would take many, many lines of FORTRAN.

awk works best when the data it operates on has some structure, for example a document with heading levels, or a table. In the case of a table, you can tell it the field separator (spaces, colons, commas, tabs) and it can align and interpret the contents of the field according to the way you use it. Or you can reorder the columns, or change rows into columns and vice-versa.

We present here some very basic information to get you acquainted with the concepts of awk, but you will need a more in-depth reference in order to use this utility. A widely-available book on awk is The awk Programming Language by Aho, Kernighan, and Weinberger, Addison-Wesley. Another good reference, from which much of the information in the present section is extracted, is sed & awk published by O'Reilly & Associates.

There are several versions of awk, and they differ from platform to platform. "Old" awk may be awk or oawk, "new" awk may be nawk. FUE provides a GNU version of awk called gawk as part of the shells product.

Some of the features of awk are:

With nawk, additional features make it easier to write larger scripts. Using nawk you can:

awk executes a set of instructions for each line of input. You can specify instructions on the command line or create a script file. Input is read a line at a time from one or more files or from standard input. Instructions given on the command line must be enclosed in single quotes to protect them from the shell. (Instructions usually contain curly braces and other characters which the shell would otherwise interpret as special.) We refer you to one of the books on awk for the available instructions. We'll use the instruction print in our examples.

For command lines, the syntax is:

% awk 'instructions' files

As an example, say that file test contains only the line Hello, world. The command:

% awk '{ print }' test

produces the output:

Hello, world

Multiple command lines can be entered by separating commands with semicolons or using the multi-line input capability of the Bourne shell. awk programs are usually placed in a file where they can be tested and modified. The syntax for invoking awk with a script file is:

% awk -f script files

where -f indicates that the filename of a script follows.

awk interprets each line of the input data file(s) as a record, and each word on that line, delimited by blank spaces or tabs, as a field. You can reference these fields, either in patterns or procedures. $0 represents the entire input line; $1, $2, ... refer to the position of individual fields on the input line. As an example, say that the file personnel contains a list of employees' first names, last names, and addresses. The command:

% awk '{ print $1}' personnel

would produce output of the type:

John

Alice

Mary

Eric
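Fields need not be separated by blanks. The sketch below uses the -F option to name a colon as the field separator, and also shows the -f form with a script file. The file staff, its contents, and the script name names.awk are all hypothetical, made up for this illustration.

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical colon-separated data: login name, full name, state.
printf '%s\n' 'jsmith:John Smith:CO' 'ajones:Alice Jones:IL' > staff

# Instructions on the command line: print field 3, then field 1.
by_state=$(awk -F: '{ print $3, $1 }' staff)
echo "$by_state"

# The same kind of instruction kept in a script file and run with -f.
echo '{ print $2 }' > names.awk
full_names=$(awk -F: -f names.awk staff)
echo "$full_names"
```

With the colon as separator, the full name is a single field even though it contains a blank.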

To use the pattern-matching features of awk, you need to be familiar with the metacharacters used in regular expressions (see section 5.4.5). A pattern is enclosed between forward slashes (/) on the command line or in a script. When awk reads an input line, it attempts to match each pattern-matching rule in a script. Only the lines matching the particular pattern are the object of an action. If no action is specified, the line that matches the pattern is printed (executing the print statement is the default action).

In our personnel example, let's assume that Alice is from Illinois (IL) and Eric is from Iowa (IA). To bring up the complete records with the pattern IL or IA, we could issue the command:

% awk '/I./{print}' personnel

where the metacharacter . matches any single character. We could more simply type:

% awk '/I./' personnel

and get the same result in either case:

Alice Jones 834 S. Jefferson St., Batavia, IL 60510

Eric Smith 24 Birch St., Albert City, IA 50510
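Note that /I./ matches a capital I followed by any character anywhere on the line, so it would also select a record with, say, a street or city name containing a capital I. One possible way to tighten the match, sketched here under the assumption that the state is always the next-to-last blank-separated field, is to test that field alone:

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > personnel <<'EOF'
John Smith 75 South Ave., Denver, CO 80145
Alice Jones 834 S. Jefferson St., Batavia, IL 60510
Mary Fahey 901 California St., San Francisco, CA 94121
Eric Smith 24 Birch St., Albert City, IA 50510
EOF

# NF is the number of fields on the line, so $(NF-1) is the state.
# The pattern must match the whole field: I followed by A or L.
matches=$(awk '$(NF-1) ~ /^I[AL]$/ { print $1, $(NF-1) }' personnel)
echo "$matches"
```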

In Appendix D we present awk's programming model, which is beyond the scope of the present section. This model will help you understand the potential that awk offers the programmer.

sort

sort sorts the lines of the specified files, typically in alphabetical order. Using the -m option it can merge sorted input files. Its syntax is:

% sort [options] [field-specifier] [filename(s)] 

For example, start with the personnel file contents:

John Smith 75 South Ave., Denver, CO 80145

Alice Jones 834 S. Jefferson St., Batavia, IL 60510

Mary Fahey 901 California St., San Francisco, CA 94121

Eric Smith 24 Birch St., Albert City, IA 50510

Run the command:

% sort personnel

to reorder the file contents as follows:

Alice Jones 834 S. Jefferson St., Batavia, IL 60510

Eric Smith 24 Birch St., Albert City, IA 50510

John Smith 75 South Ave., Denver, CO 80145

Mary Fahey 901 California St., San Francisco, CA 94121

sort is very easy to use. Read the man page for sort to see what the available options are and how to specify the sort fields. If a field is not specified, the sort key is the entire line. The sorted output goes to standard output by default.
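As a sketch of specifying a sort field, the command below orders the personnel file by last name (the second field) using the POSIX -k option. Older versions of sort may use a different field notation, so check the man page on your system. Setting LC_ALL=C here is only an assumption made to keep the ordering independent of the local language settings.

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > personnel <<'EOF'
John Smith 75 South Ave., Denver, CO 80145
Alice Jones 834 S. Jefferson St., Batavia, IL 60510
Mary Fahey 901 California St., San Francisco, CA 94121
Eric Smith 24 Birch St., Albert City, IA 50510
EOF

# -k 2,2 makes the sort key start and end at field 2 (the last name).
# Lines with equal keys (the two Smiths) are then ordered by the whole line.
LC_ALL=C sort -k 2,2 personnel > by_lastname
cat by_lastname

# Keep just the first names, in sorted order, for a quick check.
first_names=$(awk '{ print $1 }' by_lastname | tr '\n' ' ')
echo "$first_names"
```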

5.4.5 Regular Expressions

A regular expression is a string composed of letters, numbers, and special symbols that defines one or more strings. Regular expressions are used to specify text patterns for searching, in a way similar to wildcards on VMS.

A regular expression is said to match any string it defines. The major capabilities include:

1) match single characters or strings of characters

2) match any arbitrary character

3) match classes of characters

4) match specified patterns only at the start or end of a line

5) match alternative patterns

Regular expressions are used by vi, grep, and awk (and at least a couple of utilities not covered in this manual, for instance ed and sed). The name grep in fact stands for global regular expression print. For a complete discussion of regular expressions, refer to a UNIX text. To get you started, we include a table of special characters that can be used in expressions.

Note that regular expression special characters are different from those used in filename expansion.

.       Matches any single character
        Example: .ing matches all strings with any character preceding ing; singing, ping

*       Represents 0 or more occurrences of the preceding character
        Example: ab*c matches a followed by 0 or more b's followed by c; ac, abc, abbbbbc

.*      Matches any string of characters (. matches any character, * matches any number of occurrences of the preceding regular expression)

$       Placed at the end of a regular expression, matches the end of a line
        Example: ay$ matches ay at the end of a line; ... today

^       Placed at the beginning of a regular expression, matches the beginning of a line
        Example: ^T matches a T at the beginning of a line; Today ...

"       Delimits operator characters to prevent interpretation

\       Turns off special meaning of the following single character (\ is often called a quote character)

[]      Specifies character classes

[...]   Matches any one of the characters enclosed in square brackets
        Example: [bB]ill matches bill or Bill

There is an extended set of special characters available for full regular expressions, including for example ? and +. These can be used in egrep and awk. Refer to a UNIX book for information.
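The special characters above can be tried out with grep. In this sketch the file sample and its six lines are made up for the illustration; grep -c prints the number of matching lines, and grep -E is the POSIX spelling of egrep.

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample text, one test case per line.
printf '%s\n' 'Today is Monday' 'bill paid today' 'Bill is singing' \
              'abbbc' 'ac' 'cost: $5' > sample

# The patterns are in single quotes so the shell leaves them alone.
starts_T=$(grep -c '^T' sample)        # T at the beginning of a line
ends_ay=$(grep -c 'ay$' sample)        # ay at the end of a line
any_ing=$(grep -c '.ing' sample)       # any character followed by ing
bill=$(grep -c '[bB]ill' sample)       # bill or Bill
ab_star_c=$(grep -c 'ab*c' sample)     # a, zero or more b's, then c
dollar=$(grep -c '\$5' sample)         # \ makes $ an ordinary character
ab_plus_c=$(grep -E -c 'ab+c' sample)  # extended: one or more b's

echo "^T: $starts_T  ay\$: $ends_ay  .ing: $any_ing  [bB]ill: $bill"
echo "ab*c: $ab_star_c  \\\$5: $dollar  ab+c (extended): $ab_plus_c"
```

Comparing ab*c with ab+c shows the difference between zero or more and one or more: only the former matches the line ac.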


[19] This concept will be familiar to users of MS-DOS.
[20] Shell versus environment variables are discussed in section 9.1.
[21] VMS users would know these as the logical devices SYS$INPUT, SYS$OUTPUT, and SYS$ERROR.
[22] FUE sets your environment variable PAGER to the less filter.

UNIX at Fermilab - 10 Apr 1998
