Much of this was taken from William Knottenbelt (email@example.com); the original website is http://www.doc.ic.ac.uk/~wjk/UnixIntro
This lecture covers:
- The concept of an operating system.
- The internal architecture of an operating system.
- The evolution of the UNIX operating system into two broad schools (BSD and SYSV) and the development of Linux, a popular open source operating system.
- The architecture of the Linux operating system in more detail.
- The concept of a process.
- Passing output from one process as input to another using pipes.
- Redirecting process input and output.
- Controlling processes associated with the current shell.
- Controlling other processes.
|1.2 What is an Operating System?|
An operating system (OS) is a resource manager. It takes the form of a set of routines that allow users and application programs to access system resources (e.g. the CPU, memory, disks, modems, printers, network cards etc.) in a safe, efficient and abstract way.
For example, an OS ensures safe access to a printer by allowing only one application program to send data directly to the printer at any one time. An OS encourages efficient use of the CPU by suspending programs that are waiting for I/O operations to complete to make way for programs that can use the CPU more productively. An OS also provides convenient abstractions (such as files rather than disk locations) which isolate application programmers and users from the details of the underlying hardware.
Fig. 1.1: General operating system architecture
Fig. 1.1 presents the architecture of a typical operating system and shows how an OS succeeds in presenting users and application programs with a uniform interface without regard to the details of the underlying hardware. We see that:
- The operating system kernel is in direct control of the underlying hardware. The kernel provides low-level device, memory and processor management functions (e.g. dealing with interrupts from hardware devices, sharing the processor among multiple programs, allocating memory for programs etc.).
- Basic hardware-independent kernel services are exposed to higher-level programs through a library of system calls (e.g. services to create a file, begin execution of a program, or open a logical network connection to another computer).
- Application programs (e.g. word processors, spreadsheets) and system utility programs (simple but useful application programs that come with the operating system, e.g. programs which find text inside a group of files) make use of system calls. Applications and system utilities are launched using a shell (a textual command line interface) or a graphical user interface that provides direct user interaction.
Operating systems (and different flavours of the same operating system) can be distinguished from one another by the system calls, system utilities and user interface they provide, as well as by the resource scheduling policies implemented by the kernel.
|1.3 A Brief History of UNIX|
UNIX has been a popular OS for more than two decades because of its multi-user, multi-tasking environment, stability, portability and powerful networking capabilities. What follows here is a simplified history of how UNIX has developed (to get an idea of how complicated things really are, see the web site http://www.levenez.com/unix/).
Fig. 1.2: Simplified UNIX Family Tree
In the late 1960s, researchers from General Electric, MIT and Bell Labs launched a joint project to develop an ambitious multi-user, multi-tasking OS for mainframe computers known as MULTICS (Multiplexed Information and Computing Service). MULTICS failed (for some MULTICS enthusiasts "failed" is perhaps too strong a word to use here), but it did inspire Ken Thompson, who was a researcher at Bell Labs, to have a go at writing a simpler operating system himself. He wrote a simpler version of MULTICS on a PDP-7 in assembler and called his attempt UNICS (Uniplexed Information and Computing Service). Because memory and CPU power were at a premium in those days, UNICS (eventually shortened to UNIX) used short commands to minimize the space needed to store them and the time needed to decode them - hence the tradition of short UNIX commands we use today, e.g. ls, cp, rm, mv etc.
Ken Thompson then teamed up with Dennis Ritchie, the author of the first C compiler. In 1973 they rewrote the UNIX kernel in C - this was a big step forwards in terms of the system's portability - and released the Fifth Edition of UNIX to universities in 1974. The Seventh Edition, released in 1979, marked a split in UNIX development into two main branches: SYSV (System 5) and BSD (Berkeley Software Distribution). BSD arose from the University of California at Berkeley where Ken Thompson spent a sabbatical year. Its development was continued by students at Berkeley and other research institutions. SYSV was developed by AT&T and other commercial companies. UNIX flavours based on SYSV have traditionally been more conservative, but better supported than BSD-based flavours.
The latest incarnations of SYSV (SVR4 or System 5 Release 4) and BSD Unix are actually very similar. Some minor differences are to be found in file system structure, system utility names and options and system call libraries as shown in Fig 1.3.
Feature                      Typical SYSV            Typical BSD
kernel name                  /unix                   /vmunix
boot init                    /etc/rc.d directories   /etc/rc.* files
mounted FS                   /etc/mnttab             /etc/mtab
default shell                sh, ksh                 csh, tcsh
FS block size                512 bytes->2K           4K->8K
print subsystem              lp, lpstat, cancel      lpr, lpq, lprm
echo command (no new line)   echo "\c"               echo -n
ps command                   ps -fae                 ps -aux
multiple wait                poll                    select
memory access                memset, memcpy          bzero, bcopy
Fig. 1.3: Differences between SYSV and BSD
Linux is a free open source UNIX OS for PCs that was originally developed in 1991 by Linus Torvalds, a Finnish undergraduate student. Linux is neither pure SYSV nor pure BSD. Instead, it incorporates some features from each (e.g. SYSV-style startup files but BSD-style file system layout) and aims to conform with a set of IEEE standards called POSIX (Portable Operating System Interface). To maximise code portability, it typically supports SYSV, BSD and POSIX system calls (e.g. poll, select, memset, memcpy, bzero and bcopy are all supported).
The open source nature of Linux means that the source code for the Linux kernel is freely available so that anyone can add features and correct deficiencies. This approach has been very successful and what started as one person's project has now turned into a collaboration of hundreds of volunteer developers from around the globe. The open source approach has not just successfully been applied to kernel code, but also to application programs for Linux (see e.g. http://www.freshmeat.net).
As Linux has become more popular, several different development streams or distributions have emerged, e.g. Redhat, Slackware, Mandrake, Debian, and Caldera. A distribution comprises a prepackaged kernel, system utilities, GUIs and application programs.
Redhat is the most popular distribution because it has been ported to a large number of hardware platforms (including Intel, Alpha, and SPARC), it is easy to use and install, and it comes with a comprehensive set of utilities and applications including the X Window System, the GNOME and KDE GUI environments, and the StarOffice suite (an open source MS-Office clone for Linux).
|1.4 Architecture of the Linux Operating System|
Linux has all of the components of a typical OS (at this point you might like to refer back to Fig 1.1):
- Kernel
The Linux kernel includes device driver support for a large number of PC hardware devices (graphics cards, network cards, hard disks etc.), advanced processor and memory management features, and support for many different types of filesystems (including DOS floppies and the ISO9660 standard for CDROMs). In terms of the services that it provides to application programs and system utilities, the kernel implements most BSD and SYSV system calls, as well as the system calls described in the POSIX.1 specification.
The kernel (in raw binary form that is loaded directly into memory at system startup time) is typically found in the file /boot/vmlinuz, while the source files can usually be found in /usr/src/linux. The latest version of the Linux kernel sources can be downloaded from http://www.kernel.org.
- Shells and GUIs
Linux supports two forms of command input: through textual command line shells similar to those found on most UNIX systems (e.g. sh - the Bourne shell, bash - the Bourne again shell and csh - the C shell) and through graphical interfaces (GUIs) such as the KDE and GNOME desktop environments. If you are connecting remotely to a server, your access will typically be through a command line shell.
- System Utilities
Virtually every system utility that you would expect to find on standard implementations of UNIX (including every system utility described in the POSIX.2 specification) has been ported to Linux. This includes commands such as ls, cp, grep, awk, sed, bc, wc, more, and so on. These system utilities are designed to be powerful tools that do a single task extremely well (e.g. grep finds text inside files while wc counts the number of words, lines and bytes inside a file). Users can often solve problems by interconnecting these tools instead of writing a large monolithic application program.
Like other UNIX flavours, Linux's system utilities also include server programs called daemons which provide remote network and administration services (e.g. telnetd and sshd provide remote login facilities, lpd provides printing services, httpd serves web pages, crond runs regular system administration tasks automatically). A daemon (probably derived from the Latin word which refers to a beneficent spirit who watches over someone, or perhaps short for "Disk And Execution MONitor") is usually spawned automatically at system startup and spends most of its time lying dormant (lurking?) waiting for some event to occur.
- Application programs
Linux distributions typically come with several useful application programs as standard. Examples include the emacs editor, xv (an image viewer), gcc (a C compiler), g++ (a C++ compiler), xfig (a drawing package), latex (a powerful typesetting language) and soffice (StarOffice, which is an MS-Office style clone that can read and write Word, Excel and PowerPoint files).
Redhat Linux also comes with rpm, the Redhat Package Manager which makes it easy to install and uninstall application programs.
|1.5 Processes|
A process is a program in execution. Every time you invoke a system utility or an application program from a shell, one or more "child" processes are created by the shell in response to your command. All UNIX processes are identified by a unique process identifier or PID. An important process that is always present is the init process. This is the first process to be created when a UNIX system starts up and usually has a PID of 1. All other processes are said to be "descendants" of init.
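As a small illustration of PIDs and the parent/child relationship, the sketch below prints the PID of the current shell and then starts a child shell that reports its own PID and the PID of its parent. It assumes a POSIX-style shell (sh or bash) and the mktemp utility; the variable names are chosen only for this example.

```shell
# $$ expands to the PID of the current shell.
shell_pid=$$
echo "this shell has PID $shell_pid"

# Run a child shell that reports its own PID and its parent's PID ($PPID).
# The report goes through a scratch file so that no extra subshell sits
# between this shell and the child.
report=$(mktemp)
sh -c 'echo "$$ $PPID"' > "$report"
read -r child_pid child_parent < "$report"
rm -f "$report"

echo "child had PID $child_pid; its parent was PID $child_parent"
```

When this is typed at an ordinary shell prompt, the parent PID reported by the child would normally match the PID of the shell that launched it.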
|1.6 Pipes|
The pipe ('|') operator is used to create concurrently executing processes that pass data directly to one another. It is useful for combining system utilities to perform more complex functions. For example:
$ cat hello.txt | sort | uniq
creates three processes (corresponding to cat, sort and uniq) which execute concurrently. As they execute, the output of the cat process is passed on to the sort process, whose output is in turn passed on to the uniq process. uniq displays its output on the screen (the sorted lines of hello.txt with duplicate lines removed). Similarly:
$ cat hello.txt | grep "dog" | grep -v "cat"
finds all lines in hello.txt that contain the string "dog" but do not contain the string "cat".
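To try both pipelines without an existing hello.txt, the following sketch builds a small sample file first. The file contents and the scratch directory created with mktemp are assumptions made for this example only.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
printf 'dog\ncat\ndog\nbird\ncat and dog\n' > "$workdir/hello.txt"

# Sort the lines, then drop adjacent duplicates.
unique_lines=$(cat "$workdir/hello.txt" | sort | uniq)
echo "$unique_lines"

# Count the lines that mention "dog" but not "cat".
dog_only=$(cat "$workdir/hello.txt" | grep "dog" | grep -v "cat" | wc -l)
echo "lines with dog but no cat: $dog_only"

rm -r "$workdir"
```

uniq only removes duplicates that sit next to each other, which is why sort comes before it in the pipeline.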
|1.7 Redirecting input and output|
The output from programs is usually written to the screen, while their input usually comes from the keyboard (if no file arguments are given). In technical terms, we say that processes usually write to standard output (the screen) and take their input from standard input (the keyboard). There is in fact another output channel called standard error, where processes write their error messages; by default error messages are also sent to the screen.
To redirect standard output to a file instead of the screen, we use the > operator:
$ echo hello
hello
$ echo hello > output
$ cat output
hello
In this case, the contents of the file output will be destroyed if the file already exists. If instead we want to append the output of the echo command to the file, we can use the >> operator:
$ echo bye >> output
$ cat output
hello
bye
To capture standard error, prefix the > operator with a 2 (in UNIX the file numbers 0, 1 and 2 are assigned to standard input, standard output and standard error respectively), e.g.:
$ cat nonexistent 2>errors
$ cat errors
cat: nonexistent: No such file or directory
You can redirect standard error and standard output to two different files:
$ find . -print 1>files 2>errors
or to the same file (the second form is a shorthand supported by csh and bash):
$ find . -print 1>output 2>&1
$ find . -print >& output
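The sketch below shows the two cases side by side using a stand-in function that writes one line to each stream. The function name emit and the file names are hypothetical, and the same-file case uses the Bourne-shell form 2>&1, which means "send file number 2 to wherever file number 1 currently points".

```shell
workdir=$(mktemp -d)

# A stand-in command that writes one line to each output stream.
emit() {
    echo "normal output"
    echo "error output" >&2
}

# Separate files: standard output to one, standard error to the other.
emit 1>"$workdir/out.txt" 2>"$workdir/err.txt"

# Same file: redirect standard output first, then point standard error at it.
emit >"$workdir/both.txt" 2>&1

out_text=$(cat "$workdir/out.txt")
err_text=$(cat "$workdir/err.txt")
both_lines=$(wc -l < "$workdir/both.txt")
echo "out.txt:  $out_text"
echo "err.txt:  $err_text"
echo "both.txt has $both_lines lines"

rm -r "$workdir"
```

The order matters in the same-file case: 2>&1 has to come after the redirection of standard output, because redirections are processed from left to right.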
Standard input can also be redirected using the < operator, so that input is read from a file instead of the keyboard:
$ cat < output
You can combine input redirection with output redirection, but be careful not to use the same filename in both places. For example:
$ cat < output > output
will destroy the contents of the file output. This is because the first thing the shell does when it sees the > operator is to create an empty file ready for the output.
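One common way around this problem is to write the result to a temporary file and rename it over the original afterwards. The following is a minimal sketch of that idea, using sort as the example command; the scratch directory and the .tmp file name are assumptions for illustration.

```shell
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/output"

# Write the result to a temporary file first, then replace the
# original only if the command succeeded.
sort < "$workdir/output" > "$workdir/output.tmp" &&
    mv "$workdir/output.tmp" "$workdir/output"

first_line=$(head -n 1 "$workdir/output")
line_count=$(wc -l < "$workdir/output")
echo "first line: $first_line, lines kept: $line_count"

rm -r "$workdir"
```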
One last point to note is that many system utilities that expect a filename will read standard input instead if the filename is given as "-":
$ cat package.tar.gz | gzip -d | tar tvf -
Here the output of the gzip -d command is used as the input file to the tar command.
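The same convention can be tried without an archive to hand. In this sketch cat is given one real file and "-", so it prints the file followed by whatever arrives on its standard input; the file name header.txt is made up for the example.

```shell
workdir=$(mktemp -d)
echo "header line" > "$workdir/header.txt"

# cat reads header.txt first, then "-" (its standard input, fed by the pipe).
combined=$(echo "piped line" | cat "$workdir/header.txt" -)
echo "$combined"

rm -r "$workdir"
```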
|1.8 Controlling processes associated with the current shell|
Most shells provide sophisticated job control facilities that let you control many running jobs (i.e. processes) at the same time. This is useful if, for example, you are editing a text file and want to interrupt your editing to do something else. With job control, you can suspend the editor, go back to the shell prompt, and start work on something else. When you are finished, you can switch back to the editor and continue as if you hadn't left.
Jobs can either be in the foreground or the background. There can be only one job in the foreground at any time. The foreground job has control of the shell with which you interact - it receives input from the keyboard and sends output to the screen. Jobs in the background do not receive input from the terminal, generally running along quietly without the need for interaction (and drawing it to your attention if they do).
The foreground job may be suspended, i.e. temporarily stopped, by pressing the Ctrl-Z key. A suspended job can be made to continue running in the foreground or background as needed by typing "fg" or "bg" respectively. Note that suspending a job is very different from interrupting a job (by pressing the interrupt key, usually Ctrl-C); interrupted jobs are killed off permanently and cannot be resumed.
Background jobs can also be run directly from the command line, by appending a '&' character to the command line. For example:
$ find / -print 1>output 2>errors &
[1] 27501
Here the [1] returned by the shell represents the job number of the background process, and the 27501 is the PID of the process. To see a list of all the jobs associated with the current shell, type jobs:
$ jobs
[1]+ Running find / -print 1>output 2>errors &
Note that if you have more than one job you can refer to the job as %n where n is the job number. So for example fg %3 resumes job number 3 in the foreground.
To find out the process IDs of the underlying processes associated with the shell and its jobs, use ps (process status):
$ ps
  PID TTY          TIME CMD
17717 pts/10   00:00:00 bash
27501 pts/10   00:00:01 find
27502 pts/10   00:00:00 ps
So here the PID of the shell (bash) is 17717, the PID of find is 27501 and the PID of ps is 27502.
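Inside a shell script the PID of the most recent background job is also available in the special parameter $!. The sketch below starts a short background job, looks it up with ps, and then waits for it to finish. It assumes a ps that accepts the POSIX -o and -p options, and falls back to a placeholder message if ps is not installed.

```shell
# Start a short background job; $! holds the PID of the most recent one.
sleep 2 &
job_pid=$!

# Ask ps about just that process (-p selects by PID, -o picks the columns).
ps_line=$(ps -o pid= -o comm= -p "$job_pid") || ps_line="(ps not available)"
echo "background job: $ps_line"

# wait blocks until the job finishes and returns its exit status.
exit_status=0
wait "$job_pid" || exit_status=$?
echo "job $job_pid finished with status $exit_status"
```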
To terminate a process or job abruptly, use the kill command. kill allows jobs to be referred to in two ways - by their PID or by their job number. So
$ kill %1
or
$ kill 27501
would terminate the find process. Actually kill only sends the process a signal requesting that it shut down and exit gracefully (the SIGTERM signal), so this may not always work. To force a process to terminate abruptly (and with a higher probability of success), use a -9 option (the SIGKILL signal):
$ kill -9 27501
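kill can be used to send many other types of signals to running processes. For example, a -19 option (SIGSTOP on most Linux systems) will suspend a running process. Signal numbers vary between systems, so the symbolic form kill -STOP is more portable. To see a list of such signals, run kill -l.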
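The difference between SIGTERM and SIGKILL can be seen with a process that installs a handler for SIGTERM. In the sketch below, the worker and wait_ready functions and the file names are hypothetical stand-ins for a real long-running program; the shell's trap builtin is used to catch the signal.

```shell
workdir=$(mktemp -d)

# A stand-in long-running job that shuts down cleanly on SIGTERM.
worker() {
    name=$1
    trap 'echo "cleaned up" > "$workdir/$name.log"; exit 0' TERM
    : > "$workdir/$name.ready"
    while :; do sleep 1; done
}

# Block until the named worker has installed its handler.
wait_ready() {
    while [ ! -e "$workdir/$1.ready" ]; do sleep 1; done
}

# SIGTERM (the default signal) can be caught, so the worker gets to clean up.
worker polite &
polite_pid=$!
wait_ready polite
kill "$polite_pid"
polite_status=0
wait "$polite_pid" || polite_status=$?

# SIGKILL cannot be caught, so the handler never runs.
worker forced &
forced_pid=$!
wait_ready forced
kill -9 "$forced_pid"
forced_status=0
wait "$forced_pid" || forced_status=$?

polite_log=no; [ -e "$workdir/polite.log" ] && polite_log=yes
forced_log=no; [ -e "$workdir/forced.log" ] && forced_log=yes
echo "SIGTERM: exit status $polite_status, cleanup ran: $polite_log"
echo "SIGKILL: exit status $forced_status, cleanup ran: $forced_log"

rm -r "$workdir"
```

This is why it is usually worth trying a plain kill first: a process that is killed with -9 gets no chance to remove temporary files or save its state.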
|1.9 Controlling other processes|
You can also use ps to show all processes running on the machine (not just the processes in your current shell):
$ ps -fae (or ps -aux on BSD machines)
ps -aeH displays a full process hierarchy (including the init process).
Many UNIX versions have a system utility called top that provides an interactive way to monitor system activity. Detailed statistics about currently running processes are displayed and constantly refreshed. Processes are displayed in order of CPU utilization. Useful keys in top are:
s - set update frequency            k - kill process (by PID)
u - display processes of one user   q - quit
On some systems, the utility w is a non-interactive substitute for top.
One other useful process control utility that can be found on most UNIX systems is the killall command. You can use killall to kill processes by name instead of PID or job number. So another way to kill off our background find process (along with any other find processes we are running) would be:
$ killall find
[1]+ Terminated find / -print 1>output 2>errors
Note that, for obvious security reasons, you can only kill processes that belong to you (unless you are the superuser).