
Chapter 6: The UNIX File System

6.2 Files

An ordinary file contains ASCII characters or binary data and is considered by the UNIX system to be merely a sequence of bytes. No structure is imposed on the file and no meaning attached to its contents by the system; the meaning depends on the program that reads the file.

A directory file contains an entry for each file in that directory. The directory entry for a particular file contains the file name and inode number. The inode number identifies the file's inode, a per-volume data structure used by the file system. The inode is an entry in the volume's inode table and holds other information about the file, such as the owner, file protection, and modification date.
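As a rough illustration of the name-to-inode mapping described above, the following sketch creates a scratch file and displays its inode number with ls -i and the inode's other information with ls -l. It is written for a Bourne-compatible sh so it can be run as a script; the file name notes is hypothetical, and the availability of mktemp -d is an assumption (it is common but not universal).

```shell
#!/bin/sh
# Work in a scratch directory (assumes mktemp -d is available).
dir=$(mktemp -d) && cd "$dir" || exit 1

echo "some text" > notes

# ls -i prints the inode number stored in the directory entry, then the name.
ls -i notes

# ls -l prints information kept in the inode: protection, owner,
# size, and modification date.
ls -l notes

# Keep just the inode number (first field of the ls -i output).
inode=$(ls -i notes | awk '{print $1}')
echo "notes is inode number $inode"

cd / && rm -rf "$dir"
```

The actual number printed varies from system to system and from file to file.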

A hidden file is an ordinary file whose name begins with a period (called "dot"). The .login, .cshrc, and .logout files described in Chapter 9 are hidden files. The reason they are called hidden is that the ls (list files) command does not list them by default. Use the -a option with ls to see them. Hidden files do not appear in filename expansion of *, either. Filename expansion is discussed in section 6.2.2, below.
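The behavior of hidden files can be checked with a short sketch like the one below. It is written for a Bourne-compatible sh so it can be run as a script; the file names .secret and visible are hypothetical, and mktemp -d is assumed to be available. The last command shows that a pattern which spells out the leading dot does match hidden files.

```shell
#!/bin/sh
# Work in a scratch directory (assumes mktemp -d is available).
dir=$(mktemp -d) && cd "$dir" || exit 1

touch .secret visible

plain=$(ls)          # hidden file is not listed
all=$(ls -a)         # lists ., .., .secret and visible
star=$(echo *)       # * does not expand to names beginning with a dot
dotted=$(echo .s*)   # an explicit leading dot does match

echo "ls:       $plain"
echo "ls -a:    $all"
echo "echo *:   $star"
echo "echo .s*: $dotted"

cd / && rm -rf "$dir"
```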

UNIX does not support file versions. If you edit a file and save it with the same name, your earlier version is overwritten. Similarly, if you copy or rename (move) a file to a filename that already exists, the original file is overwritten.
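A minimal sketch of the overwriting behavior, written for a Bourne-compatible sh so it can be run as a script. The file names report and report.new are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Work in a scratch directory (assumes mktemp -d is available).
dir=$(mktemp -d) && cd "$dir" || exit 1

echo "first draft"  > report
echo "second draft" > report.new

# Copying onto an existing name replaces its contents without warning.
cp report.new report

result=$(cat report)
echo "report now contains: $result"

cd / && rm -rf "$dir"
```

If you would rather be asked first, cp and mv both accept the -i option, which prompts before overwriting an existing file.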

6.2.1 Filenames

A full file specification has only two parts, the directory specification and the file name. A filename is 1 to 14 characters long in old UNIX implementations and can be much longer in more recent versions (typically up to 255 characters). Although you can use any character in a filename except / (and the null character), UNIX assigns special meaning to many characters (metacharacters), so they should be avoided (see section 2.5). It is safe to use upper- and lowercase letters, digits, dash (-), underscore (_), period (.), and comma (,). As mentioned in the previous section, files whose names begin with a dot (.) are hidden files. The "filenames" . and .. (single and double dot) are reserved. The . refers to the current directory, and the .. refers to the current directory's parent directory. No two files in the same directory can have the same name, but files in different directories can have the same name.

VMS users have not had the luxury of using dashes in filenames. Dashes are much more common in UNIX filenames simply because it's easier to type my-file than my_file.

Filenames are case sensitive. This means that MYFILE, Myfile, myfile, and myFile are four different filenames.
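Case sensitivity can be confirmed with a sketch like the following, written for a Bourne-compatible sh so it can be run as a script. It assumes that mktemp -d is available and that the scratch directory lives on an ordinary case-sensitive UNIX file system; on a case-insensitive file system the four names would collapse into one file.

```shell
#!/bin/sh
# Work in a scratch directory (assumes mktemp -d is available).
dir=$(mktemp -d) && cd "$dir" || exit 1

# Four names that differ only in case.
touch MYFILE Myfile myfile myFile

# Count the directory entries; tr strips padding that some wc versions add.
count=$(ls | wc -l | tr -d ' ')
echo "$count separate files"   # 4 on a case-sensitive file system

cd / && rm -rf "$dir"
```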

You cannot distinguish a directory file from an ordinary file by its name, although some people make their own convention by beginning directory filenames with a capital letter, or ending them in .d.

Filename extensions are not required in UNIX. You can include a period and an extension in a filename to help describe the contents of the file, but it will not have special meaning to UNIX itself. However, programs can make use of extensions; for example, the FORTRAN compiler expects certain extensions. Note that you can have more than one period in a filename, for example, lex.yy.c.

6.2.2 Filename Expansion and Wildcard Characters

The UNIX shells have a number of special characters which can be used on the command line when specifying filenames and directory names. They allow the shell to expand the argument into a set of filenames. These characters are called wildcards. Filename references that contain these characters are called ambiguous file references. Filename expansion is also called globbing.

The question mark (?) causes the shell to generate filenames which match any single character in that position. For example, out? matches out1 but not out12.

The asterisk (*) causes the shell to generate filenames which match any number of characters (including zero characters) in that position. For example, myf* matches myfile, and it also matches myf itself. The * alone means all files (except those that begin with dot (.), which is a special case).

A pair of brackets ([ ]) surrounding a list of characters causes the shell to match filenames containing any one of the listed characters in that position. The brackets define a character class, and each class stands for exactly one character in a filename. In other words, it is like a question mark that will only allow certain characters. For example, memo[14a] matches memo1 and memoa, but not memo3 or memo1a. A hyphen can be used to define a range of characters; for example, [a-z] represents all lowercase letters. Thus memo[a-z] matches memoa but not memo2 or memoB.

Character   Action

?           matches any single character in a filename

*           matches any string of characters (including the empty string) in a filename

[ ]         matches any single character from the set enclosed in the brackets

Examples:

% ls out*        lists all files beginning with out

% ls out?        lists all files with 4-character names beginning with out

% ls out[ab]*    lists all files beginning with out followed by a or b (e.g., outa4)

% ls *out*       lists all files containing out
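The wildcard patterns described in this section can be tried safely in a scratch directory. The sketch below is written for a Bourne-compatible sh so it can be run as a script; the file names are hypothetical, mktemp -d is assumed to be available, and LC_ALL=C is set as an assumption so that sorting and the [a-z] range behave the same way everywhere.

```shell
#!/bin/sh
# Assumption: use the C locale so ranges and sort order are predictable.
LC_ALL=C; export LC_ALL

# Work in a scratch directory (assumes mktemp -d is available).
dir=$(mktemp -d) && cd "$dir" || exit 1

touch out out1 out12 outa4 outc fout
touch memo1 memo3 memoa memoB memo1a

q=$(echo out?)        # out1 outc
s=$(echo out*)        # out out1 out12 outa4 outc
b=$(echo out[ab]*)    # outa4
a=$(echo *out*)       # fout out out1 out12 outa4 outc
m=$(echo memo[14a])   # memo1 memoa
r=$(echo memo[a-z])   # memoa

echo "out?       -> $q"
echo "out*       -> $s"
echo "out[ab]*   -> $b"
echo "*out*      -> $a"
echo "memo[14a]  -> $m"
echo "memo[a-z]  -> $r"

cd / && rm -rf "$dir"
```

Using echo rather than ls here shows exactly which names the shell generates, without ls adding anything of its own.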

Filename expansion may produce surprising results. For example, ls b* lists all files in the current directory whose names start with b, but it also lists the contents of every directory whose name starts with b, because ls lists the contents of any directory given as an argument. To see what a pattern expands to before you run a command, check it with the echo command.[24] For example, suppose your directory contains a few files whose names contain out, and you type the command:

% echo *out* 

You would obtain output something like this:

fout fout275 inandout out out1 out2
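The ls b* surprise mentioned above can be reproduced with a sketch like this one, written for a Bourne-compatible sh so it can be run as a script. The names bar, bin, and tool are hypothetical, mktemp -d is assumed to be available, and LC_ALL=C is set as an assumption to keep the sort order predictable. The -d option of ls, which is not discussed in the text above, lists a directory argument by name instead of listing its contents.

```shell
#!/bin/sh
# Assumption: use the C locale so the sort order is predictable.
LC_ALL=C; export LC_ALL

# Work in a scratch directory (assumes mktemp -d is available).
dir=$(mktemp -d) && cd "$dir" || exit 1

touch bar          # an ordinary file starting with b
mkdir bin          # a directory starting with b
touch bin/tool     # a file inside that directory

preview=$(echo b*)       # what the shell hands to the command: bar bin
listed=$(ls b*)          # bar, followed by the contents of bin
names_only=$(ls -d b*)   # bar and bin themselves, nothing from inside bin

echo "echo b*  : $preview"
echo "ls b*    :"
echo "$listed"
echo "ls -d b* :"
echo "$names_only"

cd / && rm -rf "$dir"
```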

Filename expansion in csh can be turned off by setting the noglob variable:

% set noglob

To turn it back on, type unset noglob.
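The set noglob and unset noglob commands above are csh syntax. For readers who write Bourne-compatible sh scripts, the counterpart is set -f (expansion off) and set +f (expansion back on). The following is a minimal sketch under that assumption; the file names are hypothetical and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Assumption: use the C locale so the sort order is predictable.
LC_ALL=C; export LC_ALL

# Work in a scratch directory (assumes mktemp -d is available).
dir=$(mktemp -d) && cd "$dir" || exit 1

touch out1 out2

set -f               # sh counterpart of csh "set noglob"
off=$(echo out*)     # the pattern is passed through unchanged: out*

set +f               # sh counterpart of csh "unset noglob"
on=$(echo out*)      # expansion is active again: out1 out2

echo "expansion off: $off"
echo "expansion on:  $on"

cd / && rm -rf "$dir"
```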


[24] echo is otherwise useful for sending messages to the terminal from a script and sending known data into a pipe.

UNIX at Fermilab - 10 Apr 1998