OPERATING SYSTEM

Unit-3 Memory Management
Background Memory is central to the operation of a modern computer system. Memory consists of a large array of words or bytes, each with its own address. The CPU fetches instructions from memory according to the value of the program counter. These instructions may cause additional loading from and storing to specific memory addresses. A typical instruction-execution cycle, for example, first fetches an instruction from memory. The instruction is then decoded and may cause operands to be fetched from memory. After the instruction has been executed on the operands, results may be stored back in memory. The memory unit sees only a stream of memory addresses; it does not know how they are generated (by the instruction counter, indexing, indirection, literal addresses, and so on) or what they are for (instructions or data).

Address Binding Usually, a program resides on a disk as a binary executable file. To be executed, the program must be brought into memory and placed within a process. Depending on the memory management in use, the process may be moved between disk and memory during its execution. The processes on the disk that are waiting to be brought into memory for execution form the input queue. The normal procedure is to select one of the processes in the input queue and to load that process into memory. As the process is executed, it accesses instructions and data from memory. Eventually, the process terminates, and its memory space is declared available. Most systems allow a user process to reside in any part of the physical memory. Thus, although the address space of the computer starts at 00000, the first address of the user process need not be 00000. This approach affects the addresses that the user program can use. In most cases, a user program will go through several steps—some of which may be optional—before being executed. Addresses may be represented in different ways during these steps. Addresses in the source program are generally symbolic (such as count). A compiler will typically bind these symbolic addresses to relocatable addresses (such as "14 bytes from the beginning of this module"). The linkage editor or loader will in turn bind the relocatable addresses to absolute addresses (such as 74014). Each binding is a mapping from one address space to another. Classically, the binding of instructions and data to memory addresses can be done at any step along the way:
• Compile time: If the memory location is known a priori, absolute code can be generated; the code must be recompiled if the starting location changes.
• Load time: Relocatable code must be generated if the memory location is not known at compile time.
• Execution time: Binding is delayed until run time if the process can be moved during its execution from one memory segment to another. Hardware support is needed for address maps (e.g., base and limit registers).


Multistep Processing of a User Program

Logical- versus Physical-Address Space An address generated by the CPU is commonly referred to as a logical address, whereas an address seen by the memory unit—that is, the one loaded into the memory-address register of the memory—is commonly referred to as a physical address. The compile-time and load-time address-binding methods generate identical logical and physical addresses. However, the execution-time address-binding scheme results in differing logical and physical addresses. In this case, we usually refer to the logical address as a virtual address. We use logical address and virtual address interchangeably in this text. The set of all logical addresses generated by a program is a logical-address space; the set of all physical addresses corresponding to these logical addresses is a physical-address space. Thus, in the execution-time address binding scheme, the logical- and physical-address spaces differ. The run-time mapping from virtual to physical addresses is done by a hardware device called the memory-management unit (MMU). The base register is now called a relocation register. The value in the relocation register is added to every address generated by a user process at the time it is sent to memory. For example, if the base is at 14000, then an attempt by the user to address


location 0 is dynamically relocated to location 14000; an access to location 346 is mapped to location 14346. The MS-DOS operating system running on the Intel 80x86 family of processors uses four relocation registers when loading and running processes.

Dynamic relocation using relocation register

The user program never sees the real physical addresses. The program can create a pointer to location 346, store it in memory, manipulate it, and compare it with other addresses—all as the number 346. Only when it is used as a memory address (in an indirect load or store, perhaps) is it relocated relative to the base register. The user program deals with logical addresses. The memory-mapping hardware converts logical addresses into physical addresses. We now have two different types of addresses: logical addresses (in the range 0 to max) and physical addresses (in the range R + 0 to R + max for a base value R). The user generates only logical addresses and thinks that the process runs in locations 0 to max. The user program supplies logical addresses; these logical addresses must be mapped to physical addresses before they are used. The concept of a logical-address space that is bound to a separate physical-address space is central to proper memory management.
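To make this concrete, the following minimal sketch (Python; the base value 14000 is taken from the example above, the rest is illustrative and not any particular machine) mimics what the MMU does with a relocation register:

    # Sketch of dynamic relocation with a relocation (base) register.
    RELOCATION_REGISTER = 14000   # loaded by the OS when the process is dispatched

    def mmu_translate(logical_address: int) -> int:
        # The hardware adds the base to every address the process generates.
        return RELOCATION_REGISTER + logical_address

    print(mmu_translate(0))    # -> 14000
    print(mmu_translate(346))  # -> 14346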

Dynamic Loading The entire program and all data of a process must be in physical memory for the process to execute. The size of a process is thus limited to the size of physical memory. To obtain better memory-space utilization, we can use dynamic loading. With dynamic loading, a routine is not loaded until it is called. All routines are kept on disk in a relocatable load format. The main program is loaded into memory and is executed. When a routine needs to call another routine, the calling routine first checks to see whether the other routine has been loaded. If not, the relocatable linking loader is called to load the desired routine into memory and to update the program's address tables to reflect this change. Then control is passed to the newly loaded routine. The advantage of dynamic loading is that an unused routine is never loaded. This method is particularly useful when large amounts of code are needed to handle infrequently occurring cases, such as error routines. Dynamic loading does not require special support from the operating system. It is the responsibility of the users to design their programs to take advantage of such a method. Operating


systems may help the programmer, however, by providing library routines to implement dynamic loading.
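As a rough analogy (a sketch only, not how a relocatable linking loader actually works), Python's standard importlib module can postpone loading a routine until it is first needed; the module name error_routines below is hypothetical:

    # Sketch: load a module only when one of its routines is first called,
    # in the spirit of dynamic loading.
    import importlib

    _loaded = {}   # plays the role of the program's address tables

    def call_routine(module_name, routine_name, *args):
        # Load on first use only, then reuse the cached module.
        if module_name not in _loaded:
            _loaded[module_name] = importlib.import_module(module_name)
        return getattr(_loaded[module_name], routine_name)(*args)

    # call_routine("error_routines", "report", "disk failure")  # loaded on demand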

Dynamic Linking and Shared Libraries Some operating systems support only static linking, in which system language libraries are treated like any other object module and are combined by the loader into the binary program image. The concept of dynamic linking is similar to that of dynamic loading. Here, though, linking, rather than loading, is postponed until execution time. This feature is usually used with system libraries, such as language subroutine libraries. Without this facility, each program on a system must have a copy of its language library (or at least the routines referenced by the program) included in the executable image. This requirement wastes both disk space and main memory. With dynamic linking, a stub is included in the image for each library-routine reference. The stub is a small piece of code that indicates how to locate the appropriate memory-resident library routine or how to load the library if the routine is not already present. When the stub is executed, it checks to see whether the needed routine is already in memory. If not, the program loads the routine into memory. Either way, the stub replaces itself with the address of the routine and executes the routine. Thus, the next time that particular code segment is reached, the library routine is executed directly, incurring no cost for dynamic linking. Under this scheme, all processes that use a language library execute only one copy of the library code. This feature can be extended to library updates (such as bug fixes). A library may be replaced by a new version, and all programs that reference the library will automatically use the new version. Without dynamic linking, all such programs would need to be relinked to gain access to the new library. So that programs will not accidentally execute new, incompatible versions of libraries, version information is included in both the program and the library; more than one version of a library may be loaded into memory. Only programs that are compiled with the new library version are affected by the incompatible changes incorporated in it. Other programs linked before the new library was installed will continue using the older library. This system is also known as shared libraries. Unlike dynamic loading, dynamic linking generally requires help from the operating system. If the processes in memory are protected from one another, then the operating system is the only entity that can check to see whether the needed routine is in another process's memory space or that can allow multiple processes to access the same memory addresses.

Overlays To enable a process to be larger than the amount of memory allocated to it, we can use overlays. The idea of overlays is to keep in memory only those instructions and data that are needed at any given time. When other instructions are needed, they are loaded into space occupied previously by instructions that are no longer needed. As an example, consider a two-pass assembler. During pass 1, it constructs a symbol table; then, during pass 2, it generates machine-language code. We may be able to partition such an assembler into pass 1 code, pass 2 code, the symbol table, and common support routines used by both pass 1 and pass 2. Assume that the sizes of these components are as follows:
• Pass 1: 70 KB
• Pass 2: 80 KB
• Symbol table: 20 KB
• Common routines: 30 KB
To load everything at once, we would require 200 KB of memory. If only 150 KB is available, we cannot run our process. However, notice that pass 1 and pass 2 do not need to be in memory at the


same time. We thus define two overlays: Overlay A is the symbol table, common routines, and pass 1; and overlay B is the symbol table, common routines, and pass 2. We add an overlay driver (which manages the overlays and requires 10 KB itself) and start with overlay A in memory. When we finish pass 1, we jump to the overlay driver, which reads overlay B into memory, overwriting overlay A, and then transfers control to pass 2. Including the driver, overlay A needs 130 KB and overlay B needs 140 KB; both fit in the available 150 KB, so we can now run our assembler. It will load somewhat faster because fewer data need to be transferred before execution starts. However, it will run somewhat slower because of the extra I/O required to read the code for overlay B over the code for overlay A. The code for overlay A and the code for overlay B are kept on disk as absolute memory images and are read by the overlay driver as needed. Special relocation and linking algorithms are needed to construct the overlays.

Overlay for Two-pass assemblers

Overlays do not require any special support from the operating system. They can be implemented completely by the user with simple file structures. The operating system notices only that there is more I/O than usual.

Swapping A single-user system uses a resident monitor, with the remainder of memory available to the currently executing user program. When the system switches to the next user program, the current contents of user memory are written out to a backing store (disk or drum), and the memory of the next user program is read in. This scheme is called swapping.

Backing store: Swapping requires a backing store. The backing store is commonly a fast drum or disk. It must be large enough to accommodate copies of the memory images of all user programs, and it must provide direct access to these memory images. The ready queue consists of all processes whose memory images are on the backing store and which are ready to run. A separate system variable indicates which process is currently in memory. Whenever the CPU scheduler decides to execute a process, it calls the dispatcher. The dispatcher checks to see whether that process is in memory; if not, it swaps out the process that is currently in memory and swaps in the desired process. It then reloads registers as normal and transfers control to the selected process.


The major part of the swap time is the transfer time. The total transfer time is directly proportional to the amount of memory swapped.
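For a rough feel of this cost (the figures are purely illustrative, not from any particular system): swapping a 10-MB process to a disk with a sustained transfer rate of 40 MB per second takes 10/40 = 0.25 seconds each way, so swapping that process out and an equally sized process in costs about half a second before the incoming process can run.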

Swapping of two processes using a disk as a backing store

A variant of this swapping policy is used for priority-based scheduling algorithms. If a higher-priority process arrives and wants service, the memory manager can swap out the lower-priority process and then load and execute the higher-priority process. When the higher-priority process finishes, the lower-priority process can be swapped back in and continued. This variant of swapping is sometimes called roll out, roll in. Normally, a process that is swapped out will be swapped back into the same memory space that it occupied previously. This restriction is dictated by the method of address binding. If binding is done at assembly or load time, then the process cannot be easily moved to a different location. If execution-time binding is being used, however, then a process can be swapped into a different memory space, because the physical addresses are computed during execution time. In a multiprogrammed system, the ready queue consists of all processes whose memory images are on the backing store or in memory and are ready to run. When the dispatcher finds that the next process in the queue is not in memory and there is no free memory region, it swaps out a process currently in memory and swaps in the desired process before reloading registers and transferring control. Limitations of swapping:
1. If we want to swap a process, we must be sure that it is completely idle.
2. If a process is waiting for an I/O operation, we may want to swap it out to free up its memory. However, if the I/O is asynchronously accessing the user memory for I/O buffers, then the process cannot be swapped.

Contiguous-Memory Allocation The main memory must accommodate both the operating system and the various user processes. We therefore need to allocate the parts of the main memory in the most efficient way possible.


The memory is usually divided into two partitions: one for the resident operating system and one for the user processes. We can place the operating system in either low memory or high memory. The major factor affecting this decision is the location of the interrupt vector. Since the interrupt vector is often in low memory, programmers usually place the operating system in low memory as well. We usually want several user processes to reside in memory at the same time. We therefore need to consider how to allocate available memory to the processes that are in the input queue waiting to be brought into memory. In this contiguous-memory allocation, each process is contained in a single contiguous section of memory.

Memory Protection We must protect the operating system from user processes and protect user processes from one another. We can provide this protection by using a relocation register together with a limit register. The relocation register contains the value of the smallest physical address; the limit register contains the range of logical addresses (for example, relocation = 100040 and limit = 74600). With relocation and limit registers, each logical address must be less than the limit register; the MMU maps the logical address dynamically by adding the value in the relocation register. This mapped address is sent to memory.

Hardware support for relocation and limit registers

When the CPU scheduler selects a process for execution, the dispatcher loads the relocation and limit registers with the correct values as part of the context switch. The relocation-register scheme provides an effective way to allow the operating system size to change dynamically. This flexibility is desirable in many situations. For example, the operating system contains code and buffer space for device drivers. If a device driver (or other operating-system service) is not commonly used, we do not want to keep the code and data in memory, as we might be able to use that space for other purposes. Such code is sometimes called transient operating-system code; it comes and goes as needed. Thus, using this code changes the size of the operating system during program execution.
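A minimal sketch of the check the hardware performs, using the register values from the example above (relocation = 100040, limit = 74600):

    # Sketch of the MMU's relocation/limit check.
    RELOCATION = 100040
    LIMIT = 74600

    def translate(logical: int) -> int:
        if logical >= LIMIT:           # every logical address must be < limit
            raise MemoryError("trap: addressing error beyond limit")
        return RELOCATION + logical    # mapped address sent to memory

    print(translate(0))      # -> 100040
    print(translate(74599))  # -> 174639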

Memory Allocation One of the simplest methods for memory allocation is to divide memory into several fixed-sized partitions. Each partition may contain exactly one process. Thus, the degree of multiprogramming


is bound by the number of partitions. In this multiple-partition method, when a partition is free, a process is selected from the input queue and is loaded into the free partition. When the process terminates, the partition becomes available for another process. This method was originally used by the IBM OS/360 operating system (called MFT); it is no longer in use. The method described next is a generalization of the fixed-partition scheme (called MVT); it is used primarily in a batch environment. Many of the ideas presented here are also applicable to a time-sharing environment in which pure segmentation is used for memory management. In this variable-partition scheme, the operating system keeps a table indicating which parts of memory are available and which are occupied. Initially, all memory is available for user processes and is considered as one large block of available memory, a hole. When a process arrives and needs memory, we search for a hole large enough for this process. If we find one, we allocate only as much memory as is needed, keeping the rest available to satisfy future requests. As processes enter the system, they are put into an input queue. The operating system takes into account the memory requirements of each process and the amount of available memory space in determining which processes are allocated memory. When a process is allocated space, it is loaded into memory, and it can then compete for the CPU. When a process terminates, it releases its memory, which the operating system may then fill with another process from the input queue. At any given time, we have a list of available block sizes and the input queue. The operating system can order the input queue according to a scheduling algorithm. Memory is allocated to processes until, finally, the memory requirements of the next process cannot be satisfied—that is, no available block of memory (or hole) is large enough to hold that process. The operating system can then wait until a large enough block is available, or it can skip down the input queue to see whether the smaller memory requirements of some other process can be met. This procedure is a particular instance of the general dynamic storage-allocation problem, which concerns how to satisfy a request of size n from a list of free holes. There are many solutions to this problem. The first-fit, best-fit, and worst-fit strategies are the ones most commonly used to select a free hole from the set of available holes.
• First fit: Allocate the first hole that is big enough. Searching can start either at the beginning of the set of holes or where the previous first-fit search ended. We can stop searching as soon as we find a free hole that is large enough.
• Best fit: Allocate the smallest hole that is big enough. We must search the entire list, unless the list is ordered by size. This strategy produces the smallest leftover hole.
• Worst fit: Allocate the largest hole. Again, we must search the entire list, unless it is sorted by size. This strategy produces the largest leftover hole, which may be more useful than the smaller leftover hole from a best-fit approach.
Both first fit and best fit are better than worst fit in terms of decreasing time and storage utilization. Neither first fit nor best fit is clearly better than the other in terms of storage utilization, but first fit is generally faster.
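A short sketch of the three strategies, assuming each hole is represented as a (start, size) pair; the functions return the index of the chosen hole, or None if no hole is large enough:

    def first_fit(holes, n):
        # Stop at the first hole that is big enough.
        for i, (_, size) in enumerate(holes):
            if size >= n:
                return i
        return None

    def best_fit(holes, n):
        # Smallest hole that is big enough -> smallest leftover hole.
        fits = [i for i, (_, size) in enumerate(holes) if size >= n]
        return min(fits, key=lambda i: holes[i][1]) if fits else None

    def worst_fit(holes, n):
        # Largest hole -> largest leftover hole.
        fits = [i for i, (_, size) in enumerate(holes) if size >= n]
        return max(fits, key=lambda i: holes[i][1]) if fits else None

    holes = [(0, 100), (300, 500), (900, 200)]
    print(first_fit(holes, 150), best_fit(holes, 150), worst_fit(holes, 150))
    # -> 1 2 1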

Fragmentation Both the first-fit and the best-fit strategies for memory allocation suffer from external fragmentation. As processes are loaded and removed from memory, the free memory space is


broken into little pieces. External fragmentation exists when there is enough total memory space to satisfy a request, but the available spaces are not contiguous; storage is fragmented into a large number of small holes. This fragmentation problem can be severe. In the worst case, we could have a block of free (or wasted) memory between every two processes. If all these small pieces of memory were in one big free block instead, we might be able to run several more processes. Whether we are using the first-fit or best-fit strategy can affect the amount of fragmentation. (First fit is better for some systems, whereas best fit is better for others.) Another factor is which end of a free block is allocated. No matter which algorithm is used, external fragmentation will be a problem. Depending on the total amount of memory storage and the average process size, external fragmentation may be a minor or a major problem. Statistical analysis of first fit, for instance, reveals that, even with some optimization, given N allocated blocks, another 0.5N blocks will be lost to fragmentation. That is, one-third of memory may be unusable! This property is known as the 50-percent rule.
Memory fragmentation can be internal as well as external. Consider a multiple-partition allocation scheme with a hole of 18,464 bytes. Suppose that the next process requests 18,462 bytes. If we allocate exactly the requested block, we are left with a hole of 2 bytes. The overhead to keep track of this hole will be substantially larger than the hole itself. The general approach to avoiding this problem is to break the physical memory into fixed-sized blocks and allocate memory in units based on block size. With this approach, the memory allocated to a process may be slightly larger than the requested memory. The difference between these two numbers is internal fragmentation— memory that is internal to a partition but is not being used.
One solution to the problem of external fragmentation is compaction. The goal is to shuffle the memory contents so as to place all free memory together in one large block. Compaction is not always possible, however. If relocation is static and is done at assembly or load time, compaction cannot be done; compaction is possible only if relocation is dynamic and is done at execution time. If addresses are relocated dynamically, relocation requires only moving the program and data and then changing the base register to reflect the new base address. When compaction is possible, we must determine its cost. The simplest compaction algorithm is to move all processes toward one end of memory; all holes move in the other direction, producing one large hole of available memory. This scheme can be expensive. Another possible solution to the external-fragmentation problem is to permit the logical-address space of a process to be noncontiguous, thus allowing a process to be allocated physical memory wherever the latter is available. Two complementary techniques achieve this solution: paging and segmentation. These techniques can also be combined.


Virtual Memory Virtual memory gives the programmer the illusion of a very large memory, even though the computer has a relatively small amount of main memory. For practical implementation, we use paging and segmentation.

Paging: Paging is a memory-management scheme that permits the physical-address space of a process to be noncontiguous. Paging avoids the considerable problem of fitting memory chunks of varying sizes onto the backing store; most memory-management schemes used before the introduction of paging suffered from this problem. The problem arises because, when some code fragments or data residing in main memory need to be swapped out, space must be found on the backing store. The backing store also has the fragmentation problems discussed in connection with main memory, except that access is much slower, so compaction is impossible. Because of its advantages over earlier methods, paging in its various forms is commonly used in most operating systems. (1) Basic Method The basic method for implementing paging involves breaking physical memory into fixed-sized blocks called frames and breaking logical memory into blocks of the same size called pages. When a process is to be executed, its pages are loaded into any available memory frames from the backing store.

Paging hardware

The backing store is divided into fixed-sized blocks that are of the same size as the memory frames. The hardware support for paging is illustrated in the figure above. Every address generated by the CPU is divided into two parts: a page number (p) and a page offset (d). The page number is used as an index into a page table. The page table contains the base address of each page in physical memory. This base address is combined with the page offset to define the physical memory address that is sent to the memory unit.

UNIT-4


Paging model of logical and physical memory

Physical memory is broken into fixed-sized blocks called "frames". Logical memory is also broken into blocks of the same size called "pages". When a program is to be executed, its pages are loaded into any available frames and the page table is defined to translate from user pages to memory frames. Let P be the page size (the number of words in a page) and U be the logical address. Then the page number and the offset can be calculated as follows:
Page number (p) = U div P
Offset (d) = U mod P
EX: Suppose the logical memory of a program is divided into 4 pages, each of size 4 words. Consider the following memory mapping:

Paging example for a 32-byte memory with 4-byte pages


For the logical address 13, the physical address is calculated as follows:
U = logical address = 13
P = page size = 4
p = page number = U div P = 13 div 4 = 3
d = offset = U mod P = 13 mod 4 = 1
From the figure, page 3 is loaded in frame 2. We also have: physical address = base address of the frame + offset, where base address = frame number × page size. Therefore, the physical address of the given logical address 13 is 2 × 4 + 1 = 9.
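The same calculation in a short Python sketch; the page-to-frame mapping is assumed from the figure (page 3 resides in frame 2, the other entries are illustrative):

    PAGE_SIZE = 4
    page_table = {0: 5, 1: 6, 2: 1, 3: 2}   # page -> frame, assumed from the figure

    def translate(logical):
        p, d = divmod(logical, PAGE_SIZE)     # page number and offset
        return page_table[p] * PAGE_SIZE + d  # frame base + offset

    print(translate(13))  # page 3, offset 1 -> frame 2 -> 2*4 + 1 = 9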

Free Frames (a) Before allocation (b) After allocation

(2) Hardware Support Each operating system has its own methods for storing page tables. Most allocate a page table for each process. A pointer to the page table is stored with the other register values (like the instruction counter) in the process control block. When the dispatcher is told to start a process, it must reload the user registers and define the correct hardware page-table values from the stored user page table. The hardware implementation of the page table can be done in several ways. In the simplest case, the page table is implemented as a set of dedicated registers. These registers should be built with very high-speed logic to make the paging-address translation efficient. Every access to memory must go through the paging map, so efficiency is a major consideration. The CPU dispatcher reloads these registers, just as it reloads the other registers. Instructions to load or modify the page-table registers are, of course, privileged, so that only the operating system can change the memory map. In such a design, the page table consists of a small number of entries kept in fast registers. The use of registers for the page table is satisfactory if the page table is reasonably small (for example, 256 entries). Most contemporary computers, however, allow the page table to be very


large (for example, 1 million entries). For these machines, the use of fast registers to implement the page table is not feasible. Rather, the page table is kept in main memory, and a page-table base register (PTBR) points to the page table. Changing page tables requires changing only this one register, substantially reducing context-switch time. The problem with this approach is the time required to access a user memory location. If we want to access location i, we must first index into the page table, using the value in the PTBR offset by the page number for i. This task requires a memory access. It provides us with the frame number, which is combined with the page offset to produce the actual address. We can then access the desired place in memory. With this scheme, two memory accesses are needed to access a byte (one for the page-table entry, one for the byte). Thus, memory access is slowed by a factor of 2.
The standard solution to this problem is to use a special, small, fast-lookup hardware cache, called a translation look-aside buffer (TLB). The TLB is associative, high-speed memory. Each entry in the TLB consists of two parts: a key (or tag) and a value. When the associative memory is presented with an item, the item is compared with all keys simultaneously. If the item is found, the corresponding value field is returned. The search is fast; the hardware, however, is expensive. The TLB is used with page tables in the following way. The TLB contains only a few of the page-table entries. When a logical address is generated by the CPU, its page number is presented to the TLB. If the page number is found, its frame number is immediately available and is used to access memory. If the page number is not in the TLB (known as a TLB miss), a memory reference to the page table must be made. When the frame number is obtained, we can use it to access memory. In addition, we add the page number and frame number to the TLB, so that they will be found quickly on the next reference. If the TLB is already full of entries, the operating system must select one for replacement. Replacement policies range from least recently used (LRU) to random. Some TLBs allow entries to be wired down, meaning that they cannot be removed from the TLB. Typically, TLB entries for kernel code are wired down.

Paging hardware with TLB


With every context switch, the TLB must be flushed; otherwise, it could include old entries that contain valid virtual addresses but have incorrect or invalid physical addresses left over from the previous process. The percentage of times that a particular page number is found in the TLB is called the hit ratio.
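The hit ratio lets us weigh fast TLB hits against slow page-table walks to obtain an effective access time. A small worked sketch, with timing values chosen purely for illustration (20-ns TLB lookup, 100-ns memory access, 80-percent hit ratio):

    TLB_LOOKUP = 20    # ns, illustrative
    MEM_ACCESS = 100   # ns, illustrative
    HIT_RATIO = 0.80

    hit_time  = TLB_LOOKUP + MEM_ACCESS       # frame number found in the TLB
    miss_time = TLB_LOOKUP + 2 * MEM_ACCESS   # extra access to read the page table

    eat = HIT_RATIO * hit_time + (1 - HIT_RATIO) * miss_time
    print(eat)  # 0.8 * 120 + 0.2 * 220 = 140.0 ns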

(3) Protection Memory protection in a paged environment is accomplished by protection bits associated with each frame. Normally, these bits are kept in the page table. One bit can define a page to be read-write or read-only. Every reference to memory goes through the page table to find the correct frame number. At the same time that the physical address is being computed, the protection bits can be checked to verify that no writes are being made to a read-only page. An attempt to write to a read-only page causes a hardware trap to the operating system (or memory-protection violation). One more bit is generally attached to each entry in the page table: a valid-invalid bit. When this bit is set to "valid," the associated page is in the process's logical-address space and is thus a legal (or valid) page. When the bit is set to "invalid," the page is not in the process's logical-address space. Illegal addresses are trapped by use of the valid-invalid bit. The operating system sets this bit for each page to allow or disallow accesses to the page. Suppose, for example, that in a system with a 14-bit address space (0 to 16383), we have a program that should use only addresses 0 to 10468. Given a page size of 2 KB, we get the situation shown in the figure below. Addresses in pages 0, 1, 2, 3, 4, and 5 are mapped normally through the page table. Any attempt to generate an address in pages 6 or 7, however, will find that the valid-invalid bit is set to invalid, and the computer will trap to the operating system (invalid page reference).

Valid (v) or invalid (i) bit in a page table

Notice that this scheme has created a problem. Because the program extends to only address 10468, any reference beyond that address is illegal. However, references to page 5 are classified as


valid, so accesses to addresses up to 12287 are valid. Only the addresses from 12288 to 16383 are invalid. This problem is a result of the 2-KB page size and reflects the internal fragmentation of paging.

Structure of the Page Table
Hierarchical Paging Most modern computer systems support a large logical-address space (2^32 to 2^64). In such an environment, the page table itself becomes excessively large. For example, consider a system with a 32-bit logical-address space. If the page size in such a system is 4 KB (2^12), then a page table may consist of up to 1 million entries (2^32/2^12 = 2^20). Assuming that each entry consists of 4 bytes, each process may need up to 4 MB of physical-address space for the page table alone. Clearly, we would not want to allocate the page table contiguously in main memory. One simple solution to this problem is to divide the page table into smaller pieces. We can accomplish this division in several ways. One way is to use a two-level paging algorithm, in which the page table itself is also paged. Remember our example of a 32-bit machine with a page size of 4 KB. A logical address is divided into a page number consisting of 20 bits and a page offset consisting of 12 bits. Because we page the page table, the page number is further divided into a 10-bit page number and a 10-bit page offset. Thus, a logical address is as follows: a page number p1 (10 bits) indexing the outer page table, a page number p2 (10 bits) giving the displacement within the page of the outer page table, and an offset d (12 bits).

A two-level page-table scheme

Address translation for a two-level 32-bit paging architecture
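A sketch of the split described above: the low 12 bits are the offset d, the next 10 bits index the inner page table (p2), and the top 10 bits index the outer page table (p1):

    def split_address(addr):
        d  = addr & 0xFFF           # low 12 bits: offset within the page
        p2 = (addr >> 12) & 0x3FF   # next 10 bits: inner page-table index
        p1 = (addr >> 22) & 0x3FF   # top 10 bits: outer page-table index
        return p1, p2, d

    print(split_address(0x12345678))  # -> (72, 837, 1656)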


Inverted Page Tables Usually, each process has an associated page table. The page table has one entry for each page that the process is using (or one slot for each virtual address, regardless of the latter's validity). This table representation is a natural one, since processes reference pages through the pages' virtual addresses. The operating system must then translate this reference into a physical memory address. Since the table is sorted by virtual address, the operating system is able to calculate where in the table the associated physical-address entry is and to use that value directly. One of the drawbacks of this method is that each page table may consist of millions of entries. These tables may consume large amounts of physical memory just to keep track of how the other physical memory is being used. To solve this problem, we can use an inverted page table. An inverted page table has one entry for each real page (or frame) of memory. Each entry consists of the virtual address of the page stored in that real memory location, along with information about the process that owns that page. Thus, only one page table is in the system, and it has only one entry for each page of physical memory. The figure below shows the operation of an inverted page table.

Inverted page table
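A sketch of the lookup, with an illustrative four-frame table; the index of the matching entry is the frame number (real implementations speed up this search, typically with hashing):

    PAGE_SIZE = 4096
    # One entry per physical frame: (process id, virtual page number).
    inverted = [("P1", 0), ("P2", 3), ("P1", 2), ("P2", 0)]   # illustrative

    def translate(pid, vaddr):
        p, d = divmod(vaddr, PAGE_SIZE)
        for frame, entry in enumerate(inverted):
            if entry == (pid, p):
                return frame * PAGE_SIZE + d
        raise MemoryError("page fault")

    print(translate("P1", 2 * PAGE_SIZE + 10))  # page 2 of P1 -> frame 2 -> 8202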

Paging itself is a form of dynamic relocation. Using a page scheme, we have no external fragmentation; any frame can be allocated to a job that needs it. However, we may have some internal fragmentation: the last frame allocated may not be completely full. Each job has its own page table, which is stored with the other register values in the process control block.


Shared Pages: Another advantage of paging is the possibility of sharing common code. This consideration is particularly important in a time-sharing environment. Consider a system that supports 40 users, each of whom executes a text editor. If the text editor consists of 30 KB of code and 5 KB of data space, we would need 40 × 35 KB = 1400 KB to support the 40 users. If the code is reentrant, it could be shared as shown below:

Sharing of code in a paging environment

In the figure above, we see a 3-page editor being shared among 3 processes. Each process has its own data page. Reentrant code, also called pure code, is non-self-modifying code. If the code is reentrant, then it never changes during execution. Thus two or more processes can execute the same code at the same time. Each process has its own copy of registers and data storage to hold the data for its execution. Only one copy of the editor needs to be kept in physical memory. Each user's page table maps onto the same physical copy of the editor, but data pages are mapped onto different frames. With sharing, we need only one copy of the editor plus the 40 private data spaces: 30 + 40 × 5 = 230 KB in total instead of 1400 KB, a significant saving.

Segmentation An important aspect of memory management that became unavoidable with paging is the separation of the user's view of memory and the actual physical memory. The user's view of memory is not the same as the actual physical memory. The user's view is mapped onto physical memory. This mapping allows differentiation between logical memory and physical memory.

Basic Method Many people would not think of memory as a linear array of bytes, some containing instructions and others containing data. Rather, users prefer to view memory as a collection of variable-sized segments, with no necessary ordering among segments.


User's view of a program

Consider how you think of a program when you are writing it. You think of it as a main program with a set of methods, procedures, or functions. It may also include various data structures: objects, arrays, stacks, variables, and so on. Each of these modules or data elements is referred to by name. You talk about "the symbol table," "method Sqrt()," "the main program," without caring what addresses in memory these elements occupy. You are not concerned with whether the symbol table is stored before or after the Sqrt() method. Each of these segments is of variable length; the length is intrinsically defined by the purpose of the segment in the program. Elements within a segment are identified by their offset from the beginning of the segment: the first statement of the program, the seventeenth entry in the symbol table, the fifth instruction of the Sqrt() method, and so on. Segmentation is a memory-management scheme that supports this user view of memory. A logical-address space is a collection of segments. Each segment has a name and a length. The addresses specify both the segment name and the offset within the segment. The user therefore specifies each address by two quantities: a segment name and an offset. For simplicity of implementation, segments are numbered and are referred to by a segment number, rather than by a segment name. Thus, a logical address consists of a two-tuple: <segment-number, offset>. Normally, the user program is compiled, and the compiler automatically constructs segments reflecting the input program. A Java compiler might create separate segments for the following: 1. The method area, which holds the code for all methods 2. The heap, from which memory for objects is allocated 3. The stacks used by each Java thread 4. The class loader A C compiler might create a separate segment for global variables. Libraries that are linked in during compile time might be assigned separate segments. The loader would take all these segments and assign them segment numbers.

Hardware Although the user can now refer to objects in the program by a two-dimensional address, the actual physical memory is still, of course, a one-dimensional sequence of bytes. Thus, we must define an implementation to map two-dimensional user-defined addresses into one-dimensional physical addresses. This mapping is effected by a segment table. Each entry in the segment table has a


segment base and a segment limit. The segment base contains the starting physical address where the segment resides in memory, whereas the segment limit specifies the length of the segment.

Segmentation Hardware

The use of a segment table is illustrated in the figure above. A logical address consists of two parts: a segment number, s, and an offset into that segment, d. The segment number is used as an index to the segment table. The offset d of the logical address must be between 0 and the segment limit. If it is not, we trap to the operating system (logical addressing attempt beyond end of segment). When an offset is legal, it is added to the segment base to produce the address in physical memory of the desired byte. The segment table is thus essentially an array of base-limit register pairs. As an example, consider the situation shown in the figure below. We have five segments numbered from 0 through 4. The segments are stored in physical memory as shown. The segment table has a separate entry for each segment, giving the beginning address of the segment in physical memory (or base) and the length of that segment (or limit). For example, segment 2 is 400 bytes long and begins at location 4300. Thus, a reference to byte 53 of segment 2 is mapped onto location 4300 + 53 = 4353. A reference to segment 3, byte 852, is mapped to 3200 (the base of segment 3) + 852 = 4052. A reference to byte 1222 of segment 0 would result in a trap to the operating system, as this segment is only 1,000 bytes long.
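A sketch of this translation, using base/limit pairs assumed from the example and figure (segment 0: base 1400, limit 1000; segment 2: base 4300, limit 400; segment 3: base 3200, limit 1100):

    segment_table = {0: (1400, 1000), 2: (4300, 400), 3: (3200, 1100)}

    def translate(s, d):
        base, limit = segment_table[s]
        if d >= limit:                # the offset must lie within the segment
            raise MemoryError("trap: offset beyond end of segment")
        return base + d

    print(translate(2, 53))    # -> 4300 + 53  = 4353
    print(translate(3, 852))   # -> 3200 + 852 = 4052
    # translate(0, 1222) would trap: segment 0 is only 1,000 bytes long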

Example of Segmentation


Protection and Sharing A particular advantage of segmentation is the association of protection with the segments. Because the segments represent a semantically defined portion of the program, it is likely that all entries in the segment will be used the same way. Hence, some segments are instructions, whereas other segments are data. In a modern architecture, instructions are non-self-modifying, so instruction segments can be defined as read-only or execute-only. The memory-mapping hardware will check the protection bits associated with each segment-table entry to prevent illegal accesses to memory, such as attempts to write into a read-only segment or to use an execute-only segment as data. By placing an array in its own segment, the memory management hardware will automatically check that array indexes are legal and do not stray outside the array boundaries. Thus, many common program errors will be detected by the hardware before they can cause serious damage. Another advantage of segmentation involves the sharing of code or data. Each process has an associated segment table, which the dispatcher uses to define the hardware segment table when this process is given the CPU. Segments are shared when entries in the segment tables of two different processes point to the same physical location. The sharing occurs at the segment level. Thus, any information can be shared if it is defined to be a segment. Several segments can be shared, so a program composed of several segments can be shared. For example, consider the use of a text editor in a time-sharing system. A complete editor might be quite large, composed of many segments. These segments can be shared among all users, limiting the physical memory needed to support editing tasks. Rather than n copies of the editor, we need only one copy. For each user, we still need separate, unique segments to store local variables. These segments, of course, are not shared. We can also share only parts of programs.

Sharing of segments in a segmented memory system


Although this sharing appears simple, there are subtle considerations. Code segments typically contain references to themselves. For example, a conditional jump normally has a transfer address, which consists of a segment number and an offset. The segment number of the transfer address will be the segment number of the code segment. If we try to share this segment, all sharing processes must define the shared code segment to have the same segment number. Read-only data segments that contain no physical pointers may be shared as different segment numbers, as may code segments that refer to themselves only indirectly.

Fragmentation The long-term scheduler must find and allocate memory for all the segments of a user program. This situation is similar to paging except that the segments are of variable length, whereas pages are all the same size. Thus, as with the variable-sized partition scheme, memory allocation is a dynamic storage-allocation problem, usually solved with a best-fit or first-fit algorithm. It follows that segmentation may cause external fragmentation. When all blocks of free memory are too small to accommodate a segment, the process may simply have to wait until more memory (or at least a larger hole) becomes available or until compaction creates a larger hole. Because segmentation is by its nature a dynamic relocation algorithm, we can compact memory whenever we want. If the CPU scheduler must wait for one process because of a memory-allocation problem, it may (or may not) skip through the CPU queue looking for a smaller, lower-priority process to run. Generally, if the average segment size is small, external fragmentation will also be small. Because the individual segments are smaller than the overall process, they are more likely to fit in the available memory blocks.

Segmentation with Paging Both paging and segmentation have advantages and disadvantages. In fact, of the two most popular microprocessors now being used, one—the Motorola 68000 line—is based on a flat address space, whereas the other—the Intel 80x86 and Pentium family—is based on segmentation. Both are merging memory models toward a mixture of paging and segmentation. We can combine these two methods to improve on each. The 80x86 uses segmentation with paging for memory management. The maximum number of segments per process is 16 K, and each segment can be as large as 4 gigabytes. The page size is 4 KB. The logical-address space of a process is divided into two partitions. The first partition consists of up to 8 K segments that are private to that process. The second partition consists of up to 8 K segments that are shared among all the processes. Information about the first partition is kept in the local descriptor table (LDT); information about the second partition is kept in the global descriptor table (GDT). Each entry in the LDT and GDT consists of an 8-byte segment descriptor with detailed information about a particular segment, including the base location and limit of that segment. The logical address is a pair (selector, offset), where the selector is a 16-bit number:


in which s designates the segment number (13 bits), g indicates whether the segment is in the GDT or LDT (1 bit), and p deals with protection (2 bits). The offset is a 32-bit number specifying the location of the byte (or word) within the segment in question.
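A sketch of decoding such a selector under the bit layout just described (13-bit segment number, 1-bit table indicator, 2 protection bits); the example value is arbitrary:

    def decode_selector(selector):
        p = selector & 0x3           # low 2 bits: protection
        g = (selector >> 2) & 0x1    # next bit: GDT (0) or LDT (1)
        s = selector >> 3            # top 13 bits: segment number
        return s, g, p

    print(decode_selector(0x000F))  # -> (1, 1, 3)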

Segmentation with Paging

Virtual Memory Virtual memory is a technique that allows the execution of processes that are not completely in memory. One major advantage of this scheme is that programs can be larger than physical memory. Further, virtual memory abstracts main memory into an extremely large, uniform array of storage, separating logical memory as viewed by the user from physical memory. This technique frees programmers from the concerns of memory-storage limitations. Virtual memory also allows processes to share files easily and to implement shared memory. In addition, it provides an efficient mechanism for process creation. Virtual memory is not easy to implement, however, and may substantially decrease performance if it is used carelessly. The ability to execute a program that is only partially in memory would confer many benefits:
• A program would no longer be constrained by the amount of physical memory that is available. Users would be able to write programs for an extremely large virtual address space, simplifying the programming task.
• Because each user program could take less physical memory, more programs could be run at the same time, with a corresponding increase in CPU utilization and throughput but with no increase in response time or turnaround time.
• Less I/O would be needed to load or swap each user program into memory, so each user program would run faster.
Thus, running a program that is not entirely in memory would benefit both the system and the user. Virtual memory involves the separation of logical memory as perceived by users from physical memory. This separation allows an extremely large virtual memory to be provided for programmers


when only a smaller physical memory is available. Virtual memory makes the task of programming much easier, because the programmer no longer needs to worry about the amount of physical memory available or about what code can be placed in overlays; she can concentrate instead on the problem to be programmed. Indeed, on systems that support virtual memory, overlays have almost disappeared.

Virtual Memory is larger than Physical Memory

The virtual-address space of a process refers to the logical (or virtual) view of how a process is stored in memory. Typically, this view is that a process begins at a certain logical address—say, address 0—and exists in contiguous memory.

Virtual Address Space

In addition to separating logical memory from physical memory, virtual memory also allows files and memory to be shared by two or more different processes through page sharing. This leads to the following benefits:
• System libraries can be shared by several different processes through mapping of the shared object into a virtual address space. Although each process considers the shared libraries to be part of its virtual address space, the actual pages where the libraries reside in physical memory are shared by all the processes. Typically, a library is mapped read-only into the space of each process that is linked with it.
• Similarly, virtual memory enables processes to share memory.


Sharing Libraries with Virtual Memory

Demand Paging Demand paging is one technique of virtual memory that allows us to execute a program that is not entirely in memory. Demand paging is similar to a paging system with swapping. Programs reside on a swapping device, the backing store. When we want to execute a program, we swap it into memory. However, rather than swapping the entire program into memory, we use a lazy swapper. The lazy swapper never swaps a page into memory unless it is needed. Whenever the program tries to use a page, the hardware translates the logical address into a physical address using the page table. For a page that has not yet been brought into memory, no frame is allocated in the page table; whether a page is in memory is indicated by a valid-invalid bit. The situation where a program tries to access a page that has not been brought into memory is called a page fault. A page fault leads to a hardware trap.

Transfer of a paged memory to contiguous disk space.


Since we are now viewing a process as a sequence of pages, rather than as one large contiguous address space, use of the term swapper is technically incorrect. A swapper manipulates entire processes, whereas a pager is concerned with the individual pages of a process. The procedure for handling a page fault is straightforward:
1. We check an internal table (usually kept with the process control block) for this process to determine whether the reference was a valid or an invalid memory access.
2. If the reference was invalid, we terminate the process. If it was valid, but we have not yet brought in that page, we now page it in.
3. We find a free frame (by taking one from the free-frame list, for example).
4. We schedule a disk operation to read the desired page into the newly allocated frame.
5. When the disk read is complete, we modify the internal table kept with the process and the page table to indicate that the page is now in memory.
6. We restart the instruction that was interrupted by the trap. The process can now access the page as though it had always been in memory.

In the extreme case, we could start executing a process with no pages in memory. When the operating system sets the instruction pointer to the first instruction of the process, which is on a non-memory-resident page, the process immediately faults for the page. After this page is brought into memory, the process continues to execute, faulting as necessary until every page that it needs is in memory. At that point, it can execute with no more faults. This scheme is pure demand paging: Never bring a page into memory until it is required. Programs tend to have locality of reference, which results in reasonable performance from demand paging. The hardware to support demand paging is the same as the hardware for paging and swapping:


• Page table: This table has the ability to mark an entry invalid through a valid-invalid bit or a special value of protection bits.
• Secondary memory: This memory holds those pages that are not present in main memory. The secondary memory is usually a high-speed disk. It is known as the swap device, and the section of disk used for this purpose is known as swap space.

Page Replacement Page replacement takes the following approach. If no frame is free, find one that is not currently being used and free it. We can free a frame by writing its contents to the backing store and changing the page table to indicate that the page is no longer in memory. The freed frame can now be used to hold the page for which the program faulted. The page-fault service routine now takes the following form:
1. Find the location of the desired page on the backing store.
2. Find a free frame:
   • If there is a free frame, use it.
   • Otherwise, use a page-replacement algorithm to select a victim page.
   • Write the victim page to the backing store; change the page and frame tables accordingly.
3. Read the desired page into the (newly) free frame.
4. Restart the user program.
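A sketch of this routine in Python; backing_store, free_frames, page_table, and select_victim are simplified stand-ins (select_victim can be any of the replacement policies discussed next):

    def handle_page_fault(page, page_table, free_frames, backing_store,
                          select_victim):
        if free_frames:                     # 2a. a free frame exists: use it
            frame = free_frames.pop()
        else:                               # 2b. otherwise choose a victim,
            victim = select_victim(page_table)
            frame = page_table.pop(victim)  #     mark it no longer in memory,
            backing_store.write(victim)     #     and write it out
        backing_store.read(page, frame)     # 3. read the desired page in
        page_table[page] = frame
        # 4. restart the user program (done by the OS and hardware)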

Page Replacement Algorithms An algorithm is evaluated by running it on a particular string of memory references and computing the number of page faults. The string of memory references is called a reference string. A virtual memory system has an address space of 8K and a memory space of 4K. The page and block size is 1K, so memory holds four frames. Consider the reference string: 4,2,0,1,2,6,1,4,0,1,0,2,3,5,7
1. First-in-First-out (FIFO): Here, each page is associated with the time when it was brought into memory. When a page must be replaced, the oldest page is chosen. For this, we can create a FIFO queue to hold all pages in memory.

Reference    Pages in memory    Page fault?
(initial)    4,2,0,1            4 faults
2            4,2,0,1            no
6            6,2,0,1            yes (replaces 4)
1            6,2,0,1            no
4            6,4,0,1            yes (replaces 2)
0            6,4,0,1            no
1            6,4,0,1            no
0            6,4,0,1            no
2            6,4,2,1            yes (replaces 0)
3            6,4,2,3            yes (replaces 1)
5            5,4,2,3            yes (replaces 6)
7            5,7,2,3            yes (replaces 4)

Number of page faults = 4 + 6 = 10

UNIT-4

Page 26

OPERATING SYSTEM

Least Recently Used (LRU): LRU replacement associates with each page the time of its last use. When a page must be replaced, LRU chooses the page that has not been used for the longest period of time. This is the optimal page-replacement algorithm looking backward in time. We can implement LRU using a counter, or using a stack maintained as a doubly linked list.

Reference    Pages in memory    LRU order (least recent first)    Page fault?
(initial)    4,2,0,1            4,2,0,1                           4 faults
2            4,2,0,1            4,0,1,2                           no
6            6,2,0,1            0,1,2,6                           yes (replaces 4)
1            6,2,0,1            0,2,6,1                           no
4            6,2,4,1            2,6,1,4                           yes (replaces 0)
0            6,0,4,1            6,1,4,0                           yes (replaces 2)
1            6,0,4,1            6,4,0,1                           no
0            6,0,4,1            6,4,1,0                           no
2            2,0,4,1            4,1,0,2                           yes (replaces 6)
3            2,0,3,1            1,0,2,3                           yes (replaces 4)
5            2,0,3,5            0,2,3,5                           yes (replaces 1)
7            2,7,3,5            2,3,5,7                           yes (replaces 0)

Number of page faults = 4 + 7 = 11

Optimal Replacement (OPT): Here, we replace the page that will not be used for the longest period of time.

Reference    Pages in memory    Page fault?
(initial)    4,2,0,1            4 faults
2            4,2,0,1            no
6            4,6,0,1            yes (replaces 2)
1            4,6,0,1            no
4            4,6,0,1            no
0            4,6,0,1            no
1            4,6,0,1            no
0            4,6,0,1            no
2            2,6,0,1            yes (replaces 4)
3            2,3,0,1            yes (replaces 6)
5            2,3,5,1            yes (replaces 0)
7            2,3,5,7            yes (replaces 1)

Number of page faults = 4 + 5 = 9
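The three fault counts can be checked mechanically. Below is a small simulator sketch; its tie-breaking for OPT (among pages never referenced again) may evict different victims than the tables above, but the fault counts come out the same:

    def count_faults(refs, nframes, policy):
        frames, faults = [], 0
        for i, page in enumerate(refs):
            if page in frames:
                if policy == "LRU":          # a hit refreshes recency
                    frames.remove(page)
                    frames.append(page)
                continue
            faults += 1
            if len(frames) < nframes:
                frames.append(page)
                continue
            if policy in ("FIFO", "LRU"):    # evict oldest / least recently used
                frames.pop(0)
            else:                            # OPT: evict the page whose next
                future = refs[i + 1:]        # use lies farthest in the future
                victim = max(frames, key=lambda q: future.index(q)
                             if q in future else len(future) + 1)
                frames.remove(victim)
            frames.append(page)
        return faults

    refs = [4, 2, 0, 1, 2, 6, 1, 4, 0, 1, 0, 2, 3, 5, 7]
    for policy in ("FIFO", "LRU", "OPT"):
        print(policy, count_faults(refs, 4, policy))
    # -> FIFO 10, LRU 11, OPT 9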

Thrashing: If the number of frames allocated to a low-priority process falls below the minimum number required by the computer architecture, we must suspend that process's execution. We should then page out its remaining pages, freeing all its allocated frames. This provision introduces a swap-in, swap-out level of intermediate CPU scheduling. Moreover, a process that does not have enough frames for its actively used pages will quickly page-fault; since all its pages are in active use, it must replace a page that will be needed again right away.


Consequently, it quickly faults again, and again, and again, replacing pages that it must bring back in right away. This high paging activity is called thrashing. A process is thrashing if it is spending more time paging than executing.
Cause of Thrashing
Thrashing results in severe performance problems. Consider the following scenario, which is based on the actual behavior of early paging systems.
• The operating system monitors CPU utilization. If CPU utilization is too low, we increase the degree of multiprogramming by introducing a new process to the system. A global page-replacement algorithm is used; it replaces pages with no regard to the process to which they belong.
• Now suppose a process enters a new phase of execution and needs more frames. It starts faulting and taking frames away from other processes; these processes fault in turn, and the queue for the paging device grows while the ready queue empties.
• The CPU scheduler sees the decreasing CPU utilization and increases the degree of multiprogramming as a result.
• CPU utilization drops even further, and the CPU scheduler tries to increase the degree of multiprogramming even more.
• Thrashing has occurred, and system throughput plunges. The page-fault rate increases tremendously.

So, at this point, to increase CPU utilization and stop thrashing, we must decrease the degree of multiprogramming. We can limit the effects of thrashing by using a local replacement algorithm (or priority replacement algorithm). With local replacement, if one process starts thrashing, it cannot steal frames from another process and cause the latter to thrash as well: pages are replaced with regard to the process of which they are a part.
However, the problem is not entirely solved. If processes are thrashing, they will be in the queue for the paging device most of the time. The average service time for a page fault will increase because of the longer average queue for the paging device. Thus, the effective access time will increase even for a process that is not thrashing.
To prevent thrashing, we must provide a process with as many frames as it needs. There are several techniques for this. The working-set strategy starts by looking at how many frames a process is actually using. This approach defines the locality model of process execution.


The locality model states that, as a process executes, it moves from locality to locality. A locality is a set of pages that are actively used together. A program is generally composed of several different localities, which may overlap.
Working-Set Model
The working-set model is based on the assumption of locality. This model uses a parameter, Δ, to define the working-set window. The idea is to examine the most recent Δ page references. The set of pages in the most recent Δ page references is the working set. If a page is in active use, it will be in the working set. If it is no longer being used, it will drop from the working set Δ time units after its last reference. Thus, the working set is an approximation of the program's locality.
The accuracy of the working set depends on the selection of Δ. If Δ is too small, it will not encompass the entire locality; if Δ is too large, it may overlap several localities. In the extreme, if Δ is infinite, the working set is the set of pages touched during the process's execution.

Working-set model

Once Δ has been selected, use of the working-set model is simple. The operating system monitors the working set of each process and allocates to that working set enough frames to provide it with its working-set size. If there are enough extra frames, another process can be initiated. If the sum of the working-set sizes increases, exceeding the total number of available frames, the operating system selects a process to suspend. The process's pages are written out (swapped), and its frames are reallocated to other processes. The suspended process can be restarted later. This working-set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible. Thus, it optimizes CPU utilization.
The difficulty with the working-set model is keeping track of the working set. The working-set window is a moving window: at each memory reference, a new reference appears at one end and the oldest reference drops off the other end. A page is in the working set if it is referenced anywhere in the working-set window. We can approximate the working-set model with a fixed-interval timer interrupt and a reference bit.
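As a concrete illustration of the definition, the working set can be computed directly from a reference string; the string and Δ values below are made up for the example:

    def working_set(refs, t, delta):
        """WS(t, delta): pages touched by the most recent delta references."""
        window = refs[max(0, t - delta + 1): t + 1]
        return set(window)

    refs = [2, 6, 1, 5, 7, 7, 7, 7, 5, 1, 6, 2, 3, 4]
    print(working_set(refs, t=9, delta=10))   # {1, 2, 5, 6, 7}
    print(working_set(refs, t=13, delta=4))   # {2, 3, 4, 6}: the locality moved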


File System Interface
For most users, the file system is the most visible aspect of an operating system. It provides the mechanism for on-line storage of and access to both data and programs of the operating system and all the users of the computer system. The file system consists of two distinct parts:
- a collection of files, each storing related data, and
- a directory structure, which organizes and provides information about all the files in the system.
Some file systems have a third part, partitions, which are used to separate physically or logically large collections of directories.

File Concept
Computers can store information on various storage media, such as magnetic disks, magnetic tapes, and optical disks. So that the computer system will be convenient to use, the operating system provides a uniform logical view of information storage. The operating system abstracts from the physical properties of its storage devices to define a logical storage unit, the file. Files are mapped by the operating system onto physical devices. These storage devices are usually nonvolatile, so the contents are persistent through power failures and system reboots.
A file is a named collection of related information that is recorded on secondary storage. From a user's perspective, a file is the smallest allotment of logical secondary storage; that is, data cannot be written to secondary storage unless they are within a file. Commonly, files represent programs (both source and object forms) and data. Data files may be numeric, alphabetic, alphanumeric, or binary. Files may be free form, such as text files, or may be formatted rigidly. In general, a file is a sequence of bits, bytes, lines, or records, the meaning of which is defined by the file's creator and user. Many different types of information may be stored in a file: source programs, object programs, executable programs, numeric data, text, payroll records, graphic images, sound recordings, and so on.
 A file has a certain defined structure, which depends on its type.
 A text file is a sequence of characters organized into lines (and possibly pages).
 A source file is a sequence of subroutines and functions, each of which is further organized as declarations followed by executable statements.
 An object file is a sequence of bytes organized into blocks understandable by the system's linker.
 An executable file is a series of code sections that the loader can bring into memory and execute.

File Attributes
A file has certain other attributes, which vary from one operating system to another but typically consist of these:
 Name: The symbolic file name is the only information kept in human-readable form.
 Identifier: This unique tag, usually a number, identifies the file within the file system; it is the non-human-readable name for the file.


 Type: This information is needed for those systems that support different types of files.
 Location: This information is a pointer to a device and to the location of the file on that device.
 Size: The current size of the file (in bytes, words, or blocks) and possibly the maximum allowed size are included in this attribute.
 Protection: Access-control information determines who can do reading, writing, executing, and so on.
 Time, date, and user identification: This information may be kept for creation, last modification, and last use. These data can be useful for protection, security, and usage monitoring.

File Operations
A file is an abstract data type. The operating system can provide system calls to create, write, read, reposition, delete, and truncate files.
 Creating a file: Two steps are necessary to create a file. First, space in the file system must be found for the file. Second, an entry for the new file must be made in the directory. The directory entry records the name of the file, its location in the file system, and possibly other information.
 Writing a file: To write a file, we make a system call specifying both the name of the file and the information to be written to the file. Given the name of the file, the system searches the directory to find the file's location. The system must keep a write pointer to the location in the file where the next write is to take place. The write pointer must be updated whenever a write occurs.
 Reading a file: To read from a file, we use a system call that specifies the name of the file and where (in memory) the next block of the file should be put. Again, the directory is searched for the associated entry, and the system needs to keep a read pointer to the location in the file where the next read is to take place. Once the read has taken place, the read pointer is updated. Because a given process is usually either reading from or writing to a given file, the current operation location is kept as a per-process current-file-position pointer. Both the read and write operations use this same pointer, saving space and reducing system complexity.
 Repositioning within a file: The directory is searched for the appropriate entry, and the current-file-position pointer is set to a given value. Repositioning within a file need not involve any actual I/O. This file operation is also known as a file seek.
 Deleting a file: To delete a file, we search the directory for the named file. Having found the associated directory entry, we release all file space, so that it can be reused by other files, and erase the directory entry.
 Truncating a file: The user may want to erase the contents of a file but keep its attributes. Rather than forcing the user to delete the file and then recreate it, this function allows all attributes to remain unchanged (except for file length) but lets the file be reset to length zero and its file space released.
These six basic operations comprise the minimal set of required file operations. Other common operations include appending new information to the end of an existing file and renaming an existing file. These primitive operations may then be combined to perform other file operations. For instance, creating a copy of a file, or copying the file to another I/O device, such as a printer or a display, can be accomplished by creating a new file and then reading from the old and writing to the new.
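On a POSIX system these six operations correspond almost one-to-one to system calls. A minimal sketch using Python's os wrappers (the file name is illustrative):

    import os

    # Create: allocate space and make a directory entry (open with O_CREAT).
    fd = os.open("demo.dat", os.O_CREAT | os.O_RDWR, 0o644)
    os.write(fd, b"hello, file system")   # write at the current write pointer
    os.lseek(fd, 0, os.SEEK_SET)          # reposition (file seek): no actual I/O
    print(os.read(fd, 5))                 # read 5 bytes -> b'hello'
    os.ftruncate(fd, 0)                   # truncate: length 0, attributes kept
    os.close(fd)
    os.remove("demo.dat")                 # delete: release space, erase the entry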

Open-file Table
Most of the file operations mentioned involve searching the directory for the entry associated with the named file. To avoid this constant searching, many systems require that an open() system call be made before a file is first used actively. The operating system keeps a small table, called the open-file table, containing information about all open files. When a file operation is requested, the file is specified via an index into this table, so no searching is required. When the file is no longer being actively used, it is closed by the process, and the operating system removes its entry from the open-file table.
The implementation of the open() and close() operations in a multiuser environment, such as UNIX, is more complicated. In such a system, several users may open the file at the same time. Typically, the operating system uses two levels of internal tables:
 Per-process table: The per-process table tracks all files that a process has open.
 System-wide table: This table contains process-independent information, such as the location of the file on disk, access dates, and file size.
The open-file table also has an open count associated with each file to indicate how many processes have the file open. Each close() decreases this count, and when the open count reaches zero, the file is no longer in use, and the file's entry is removed from the open-file table.
Several pieces of information are associated with an open file:
 File pointer: On systems that do not include a file offset as part of the read() and write() system calls, the system must track the last read-write location as a current-file-position pointer. This pointer is unique to each process operating on the file and therefore must be kept separate from the on-disk file attributes.
 File-open count: As files are closed, the operating system must reuse its open-file-table entries, or it could run out of space in the table. Because multiple processes may have opened a file, the system must wait for the last process to close the file before removing the open-file-table entry. The file-open counter tracks the number of opens and closes and reaches zero on the last close. The system can then remove the entry.
 Disk location of the file: Most file operations require the system to modify data within the file. The information needed to locate the file on disk is kept in memory so that the system does not have to read it from disk for each operation.
 Access rights: Each process opens a file in an access mode. This information is stored in the per-process table so the operating system can allow or deny subsequent I/O requests.

File Locks
Some operating systems provide facilities for locking an open file (or sections of a file). File locks allow one process to lock a file and prevent other processes from gaining access to it. File locks are useful for files that are shared by several processes, for example, a system log file that can be accessed and modified by a number of processes in the system.


File locks provide functionality similar to readers-writers locks:
 A shared lock is akin to a reader lock in that several processes may acquire the lock concurrently. It is of course up to the processes to only read from (and not modify) a file that has been accessed via a shared lock; the operating system does not enforce this.
 An exclusive lock behaves like a writer lock; a file can have only a single concurrent exclusive lock (and no shared locks).
It is important to note that not all operating systems provide both types of locks; some systems provide only exclusive file locking. Operating systems may provide either mandatory or advisory file-locking mechanisms.
- If a lock is mandatory, then once a process acquires a lock, the operating system will prevent any other process from accessing the locked file. For example, assume a process acquires an exclusive lock on the file system.log. If we attempt to open system.log from another process (for example, a text editor), the operating system will prevent access until the exclusive lock is released. This occurs even if the text editor is not written explicitly to acquire the lock.
- Alternatively, if the lock is advisory, then the operating system will not prevent the text editor from acquiring access to system.log. Rather, the text editor must be written so that it manually acquires the lock before accessing the file.
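On UNIX-like systems, advisory locking of this kind is exposed through, for example, Python's fcntl module. A sketch (system.log is the file from the example above; the scheme only works if every cooperating process asks for the lock):

    import fcntl   # UNIX-only module

    with open("system.log", "a") as log:
        fcntl.flock(log, fcntl.LOCK_EX)   # exclusive (writer) lock; blocks if held
        log.write("checkpoint reached\n")
        fcntl.flock(log, fcntl.LOCK_UN)   # release; LOCK_SH would be the shared lock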

File Types
When we design a file system (indeed, an entire operating system), we always consider whether the operating system should recognize and support file types. If an operating system recognizes the type of a file, it can then operate on the file in reasonable ways. A common technique for implementing file types is to include the type as part of the file name. The name is split into two parts, a name and an extension, usually separated by a period character. In this way, the user and the operating system can tell from the name alone what the type of a file is.
File type refers to the ability of the operating system to distinguish different types of file, such as text files, source files, and binary files. Many operating systems support many types of files. Operating systems like MS-DOS and UNIX have the following types of files:
ORDINARY FILES
• These are the files that contain user information.
• They may hold text, databases, or executable programs.
• The user can apply various operations on such files: add, modify, delete, or even remove the entire file.
DIRECTORY FILES
• These files contain lists of file names and other information related to those files.
SPECIAL FILES
• These files are also known as device files.
• They represent physical devices such as disks, terminals, printers, networks, and tape drives.
• These files are of two types:
o Character special files: data is handled character by character, as with terminals or printers.
o Block special files: data is handled in blocks, as with disks and tapes.


Common file types

File Structure
File types may be used to indicate the internal structure of a file. Source and object files have structures that match the expectations of the programs that read them. Further, certain files must conform to a required structure that is understood by the operating system. For example, the operating system may require that an executable file have a specific structure so that it can determine where in memory to load the file and what the location of the first instruction is. Some operating systems extend this idea into a set of system-supported file structures, with sets of special operations for manipulating files with those structures. For instance, DEC's VMS operating system has a file system that supports three defined file structures.
One disadvantage of having the operating system support multiple file structures is that the resulting size of the operating system becomes cumbersome. If the operating system defines five different file structures, it needs to contain the code to support each of them. In addition, every file may need to be definable as one of the file types supported by the operating system. When new applications require information structured in ways not supported by the operating system, severe problems may result.
For example, assume that a system supports two types of files: text files (composed of ASCII characters separated by a carriage return and line feed) and executable binary files. Now, if we (as users) want to define an encrypted file to protect our contents from being read by unauthorized people, we may find neither file type to be appropriate. The encrypted file is not ASCII text lines but rather is (apparently) random bits. Although it may appear to be a binary file, it is not executable. As a result, we may have to circumvent or misuse the operating system's file-types mechanism or abandon our encryption scheme.

Internal File Structure
Internally, locating an offset within a file can be complicated for the operating system. All disk I/O is performed in units of one block (physical record), and all blocks are the same size. It is unlikely that the physical record size will exactly match the length of the desired logical record; logical records may even vary in length. Packing a number of logical records into physical blocks is a common solution to this problem.
For example, the UNIX operating system defines all files to be simply streams of bytes. Each byte is individually addressable by its offset from the beginning (or end) of the file. In this case, the logical record size is 1 byte. The file system automatically packs and unpacks bytes into physical disk blocks, say, 512 bytes per block, as necessary.
The logical record size, physical block size, and packing technique determine how many logical records are in each physical block. The packing can be done either by the user's application program or by the operating system. In either case, the file may be considered a sequence of blocks. All the basic I/O functions operate in terms of blocks. The conversion from logical records to physical blocks is a relatively simple software problem.
Because disk space is always allocated in blocks, some portion of the last block of each file is generally wasted. If each block were 512 bytes, for example, then a file of 1,949 bytes would be allocated four blocks (2,048 bytes); the last 99 bytes would be wasted. The waste incurred to keep everything in units of blocks (instead of bytes) is internal fragmentation. All file systems suffer from internal fragmentation; the larger the block size, the greater the internal fragmentation.
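The arithmetic of the example generalizes to a one-line ceiling division:

    block_size = 512                          # bytes per physical block
    file_size = 1949                          # logical file size in bytes

    blocks = -(-file_size // block_size)      # ceiling division -> 4 blocks
    allocated = blocks * block_size           # 2048 bytes
    print(allocated - file_size)              # 99 bytes of internal fragmentation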

Access Methods
Files store information. When it is used, this information must be accessed and read into computer memory. The information in a file can be accessed in several ways:
1) Sequential access
2) Direct access
3) Other access methods
Sequential Access Methods
 Records are accessed in sequence; that is, the information in the file is processed in order, one record after the other. This access method is the most primitive one.
 Example: compilers usually access files in this fashion.


Direct Access Methods
• Another method is direct access (or relative access). A file is made up of fixed-length logical records. Direct (random) access file organization allows records to be read or written directly: each record has its own relative address within the file, by which it can be accessed for reading or writing. The records need not be in any particular sequence within the file, and they need not be in adjacent locations on the storage medium.
• Fixed-length logical records allow programs to read and write records rapidly in no particular order. The direct-access method is based on a disk model of a file, since disks allow random access to any file block. A direct-access file thus allows arbitrary blocks to be read or written.
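Direct access reduces to address arithmetic: relative record n lives at byte offset n times the record size, so the system can seek straight to it. A sketch with an assumed 64-byte record size and an illustrative file name:

    import os

    RECORD_SIZE = 64   # fixed-length records (size assumed for the example)

    def write_record(fd, n, data):
        os.lseek(fd, n * RECORD_SIZE, os.SEEK_SET)        # offset = n * size
        os.write(fd, data.ljust(RECORD_SIZE, b"\x00"))    # pad to full length

    def read_record(fd, n):
        """Jump straight to relative record n; no sequential scan needed."""
        os.lseek(fd, n * RECORD_SIZE, os.SEEK_SET)
        return os.read(fd, RECORD_SIZE)

    fd = os.open("records.dat", os.O_CREAT | os.O_RDWR, 0o644)
    write_record(fd, 3, b"record three")   # records 0-2 are simply a hole
    print(read_record(fd, 3)[:12])         # b'record three'
    os.close(fd)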

Other Access Methods
• Other access methods can be built on top of a direct-access method. These methods generally involve the construction of an index for the file. The index, like an index in the back of a book, contains pointers to the various blocks. To find a record in the file, we first search the index and then use the pointer to access the file directly and find the desired record.
• The Indexed Sequential Access Method (ISAM) uses a hierarchy of indexes to point to records in a file: the index is searched, and its pointer is used to access the file directly.


Directory Structure
To manage data, we need to organize them. This organization is usually done in two parts:
◦ Disks are split into one or more partitions, also known as minidisks. Each disk on a system contains at least one partition, which is a low-level structure in which files and directories reside.
◦ Each partition contains information about the files within it. This information is kept in entries in a device directory or volume table of contents. A directory can be viewed as a "symbol table" that translates file names into their directory entries.

A typical file system organization

The operations that are to be performed on a directory:
- Search for a file: we need to be able to find a particular entry or to find file names based on a pattern match.
- Create a file: new files need to be created and added to the directory.
- Delete a file: when a file is no longer needed, we want to remove it from the directory.
- List a directory: we need to be able to list both the files in the directory and the contents of the directory entry for each file.
- Rename a file: renaming may imply changing the position of the file entry in the directory structure.
- Traverse the file system: the directory needs a logical structure such that every directory and every file within each directory can be accessed efficiently.
The directory structure is a collection of nodes containing information about all files. Both the directory structure and the files reside on disk. Backups of these two structures are kept on tapes.


Directory Design Goals
To organize the logical structure to obtain:
- Efficiency: locating a file quickly.
- Naming: convenient to users.
o Two users can have the same name for different files.
o The same file can have several different names.
- Grouping: logical grouping of files by properties (e.g., all Java programs, all games, …).

Single-Level Directory  All files are contained in the same directory which is easy to understand  Limitations are ◦ Unique File names ◦ Limited to the length of the filename

Two-Level Directory  In this directory, each user has her own user file directory (UFD). The UFDs have similar structures, but each lists only the files of a single user. When a user job starts or a user logs in, the system's master file directory (MFD) is searched. The MFD is indexed by user name or account number, and each entry points to the UFD for that user.

• It solves the problem of name collision. • The isolation is an advantage when the users are completely independent; but is a disadvantage when the users want to cooperate on some task and to access one another’s files. Some systems do not allow local user files to be accessed by other users.

Tree-Structured Directories  A directory (or subdirectory) contains a set of files or subdirectories. A directory is simply another file, but it is treated in a special way. All directories have the same internal format. One bit in each directory entry defines the entry as a file (0) or as a subdirectory (1).  Special system calls are used to create and delete directories.  Pathnames can be of two types  Absolute Path (Begins at the root and follows a path down to the specified file giving directory names on the path) UNIT-4

Page 38

OPERATING SYSTEM

 Relative Path (defines a path from the current directory) • Deleting a directory requires directory must be empty. • We can access the files from one directory to another directory.

Acyclic-graph Directories  An acyclic graph—that is, a graph with no cycles—allows directories to share subdirectories and files. A shared directory or a file will exist in the file system in two or more places at once. A shared file is not the same as two copies of a file. This allows directories to have shared subdirectories and files. The same file or subdirectory may be in two different directories.  A link is effectively a pointer to another file or subdirectory.  “Aliasing” is a problem.  Dangling pointers is a problem (remove all the pointers if a file is deleted).

General Graph Directory:  When links are added to an existing tree-structured directory, a general graph structure can be created. A general graph can have cycles and cycles cause problems when searching or traversing file system. UNIT-4

Page 39

OPERATING SYSTEM

 The primary advantage of an acyclic graph is the relative simplicity of the algorithms to traverse the graph and to determine when there are no more references to a file.

• We want to avoid traversing shared sections of an acyclic graph twice, mainly for performance reasons. If we have just searched a major shared directory for a particular file without finding it, we want to avoid wasting time searching it again.
• If cycles are allowed to exist in the directory, we likewise want to avoid searching any component twice, for reasons of correctness as well as performance.
• Possible safeguards: on deletion of a file, remove all the symbolic links to it; allow links only to files, not to subdirectories; use garbage collection (computationally expensive); or, every time a new link is added, run a cycle-detection algorithm to determine whether a cycle now exists (also computationally expensive).

File System Mounting
Just as a file must be opened before it is used, a file system must be mounted before it can be available to processes on the system. More specifically, the directory structure can be built out of multiple partitions, which must be mounted to make them available within the file-system name space.
The mount procedure is straightforward. The operating system is given the name of the device and the mount point: the location within the file structure where the file system is to be attached. Typically, a mount point is an empty directory. For instance, on a UNIX system, a file system containing a user's home directories might be mounted as /home; then, to access the directory structure within that file system, we could precede the directory names with /home, as in /home/jane. Mounting that file system under /users would result in the path name /users/jane, which we could use to reach the same directory.
Next, the operating system verifies that the device contains a valid file system. It does so by asking the device driver to read the device directory and verifying that the directory has the expected format. Finally, the operating system notes in its directory structure that a file system is mounted at the specified mount point. This scheme enables the operating system to traverse its directory structure, switching among file systems as appropriate.


File system. (a) Existing. (b) Unmounted partition.

Systems impose semantics to clarify functionality. For example, a system may disallow a mount over a directory that contains files; or it may make the mounted file system available at that directory and obscure the directory's existing files until the file system is unmounted, terminating the use of the file system and allowing access to the original files in that directory. As another example, a system may allow the same file system to be mounted repeatedly at different mount points, or it may allow only one mount per file system. Before the mount, only the files on the existing file system can be accessed. The figure shows the effects of mounting the partition residing on /device/dsk over /users.

File Sharing
File sharing is very desirable for users who want to collaborate and to reduce the effort required to achieve a computing goal. Therefore, user-oriented operating systems must accommodate the need to share files in spite of the inherent difficulties. Once multiple users are allowed to share files, the challenge is to extend sharing to multiple file systems, including remote file systems.

Multiple Users
On a multi-user system, more information needs to be stored for each file:
 The owner (user), who owns the file and who can control its access.
 The group of other user IDs that may have some special access to the file.
 What access rights are afforded to the owner (user), the group, and the rest of the world (the universe, a.k.a. others).
Some systems have more complicated access control, allowing or denying specific accesses to specifically named users or groups.

Remote File Systems
The advent of the Internet introduces issues for accessing files stored on remote computers.
 The original method was ftp, allowing individual files to be transported across systems as needed. Ftp can be either account-and-password controlled or anonymous, not requiring any user name or password.
 Various forms of distributed file systems allow remote file systems to be mounted onto a local directory structure and accessed using normal file-access commands. (The actual files are still transported across the network as needed, possibly using ftp as the underlying transport mechanism.)
 The WWW has made it easy once again to access files on remote systems without mounting their file systems, generally using (anonymous) ftp as the underlying file-transport mechanism.
The Client-Server Model
 When one computer system remotely mounts a file system that is physically located on another system, the system that physically owns the files acts as a server, and the system that mounts them is the client.
 User IDs and group IDs must be consistent across both systems for the system to work properly (i.e., this is most applicable across multiple computers managed by the same organization and shared by a common group of users).
 The same computer can be both a client and a server (e.g., cross-linked file systems).
 There are a number of security concerns involved in this model:
o Servers commonly restrict mount permission to certain trusted systems only. Spoofing (a computer pretending to be a different computer) is a potential security risk.
o Servers may restrict remote access to read-only.
o Servers restrict which file systems may be remotely mounted. Generally the information within those subsystems is limited, relatively public, and protected by frequent backups.
 The NFS (Network File System) is a classic example of such a system.
Distributed Information Systems
 The Domain Name System, DNS, provides a unique naming system across all of the Internet.
 User and group information can be maintained by the Network Information Service, NIS, which unfortunately has several security issues. NIS+ is a more secure version but has not gained the same widespread acceptance as NIS.
 Microsoft's Common Internet File System, CIFS, establishes a network login for each user on a networked system with shared file access. Older Windows systems used domains; newer systems (XP, 2000) use active directories. User names must match across the network for this system to be valid.




 A newer approach is the Lightweight Directory Access Protocol, LDAP, which provides a secure single sign-on for all users to access all resources on a network. This secure system is gaining in popularity and has the maintenance advantage of combining authorization information in one central location.

Failure Modes
 When a local disk file is unavailable, the result is generally known immediately and is generally non-recoverable. The only reasonable response is for the request to fail.
 However, when a remote file is unavailable, there are many possible reasons, and whether or not the failure is unrecoverable is not readily apparent. Hence most remote-access systems allow for blocking or delayed response, in the hope that the remote system (or the network) will come back up eventually.

Consistency Semantics
Consistency semantics deal with the consistency between the views of shared files on a networked system: when one user changes the file, when do other users see the changes?
UNIX Semantics
The UNIX file system uses the following semantics:
• Writes to an open file are immediately visible to any other user who has the file open.
• One implementation uses a shared location pointer, which is adjusted for all sharing users.
• The file is associated with a single exclusive physical resource, which may delay some accesses.
Session Semantics
The Andrew File System, AFS, uses the following semantics:
• Writes to an open file are not immediately visible to other users.
• When a file is closed, any changes made become available only to users who open the file at a later time.
• According to these semantics, a file can be associated with multiple (possibly different) views. Almost no constraints are imposed on scheduling accesses; no user is delayed in reading or writing a personal copy of the file.
• AFS file systems may be accessible by systems around the world. Access control is maintained through (somewhat) complicated access-control lists, which may grant access to the entire world (literally) or to specifically named users accessing the files from specifically named remote environments.
Immutable-Shared-Files Semantics
• Under this system, when a file is declared shared by its creator, it becomes immutable and its name cannot be reused for any other resource. Hence it becomes read-only, and shared access is simple.

Protection
Files must be kept safe both for reliability (against accidental damage) and for protection (against deliberate malicious access). The former is usually managed with backup copies. One simple protection scheme is to remove all access to a file; however, this makes the file unusable, so some sort of controlled access must be arranged.


Types of Access
 The following low-level operations are often controlled:
o Read: view the contents of the file.
o Write: change the contents of the file.
o Execute: load the file onto the CPU and follow the instructions contained therein.
o Append: add to the end of an existing file.
o Delete: remove a file from the system.
o List: view the name and other attributes of files on the system.
 Higher-level operations, such as copy, can generally be performed through combinations of the above.
Access Control
 One approach is to have complicated access-control lists, ACLs, which specify exactly what access is allowed or denied for specific users or groups.
o The AFS uses this system for distributed access.
o Control is very finely adjustable, but may be complicated, particularly when the specific users involved are unknown. (AFS allows some wild cards, so, for example, all users on a certain remote system may be trusted, or a given username may be trusted when accessing from any remote system.)
 UNIX uses a set of 9 access-control bits, in three groups of three. These correspond to R, W, and X permissions for each of the owner, group, and others. The RWX bits control the following privileges for ordinary files and directories:

bit | Files                                | Directories
R   | Read (view) file contents.           | Read directory contents. Required to get a listing of the directory.
W   | Write (change) file contents.        | Change directory contents. Required to create or delete files.
X   | Execute file contents as a program.  | Access detailed directory information. Required to get a long listing, or to access any specific file in the directory.

Note that if a user has X but not R permission on a directory, they can still access specific files, but only if they already know the name of the file they are trying to access.

To condense the length of the access-control list, many systems recognize three classifications of users in connection with each file:
• Owner: The user who created the file is the owner.
• Group: A set of users who are sharing the file and need similar access is a group, or work group.
• Universe: All other users in the system constitute the universe.
In addition, there are some special bits that can also be applied:
• The set-user-ID (SUID) bit and/or the set-group-ID (SGID) bit applied to an executable file temporarily changes the identity of whoever runs the program to match that of the owner/group of the executable program. This allows users running specific programs to have access to files (while running that program) to which they would normally be unable to gain access. Setting of these two bits is usually restricted to root and must be done with caution, as it introduces a potential security leak.
• The sticky bit on a directory modifies write permission, allowing users to delete only files of which they are the owner. This allows everyone to create files in /tmp, for example, but to delete only files which they have created, and not anyone else's.
• The SUID, SGID, and sticky bits are indicated in the positions for execute permission for the user, group, and others, respectively. If the letter is lower case (s, s, t), then the corresponding execute permission is also given; if it is upper case (S, S, T), the corresponding execute permission is NOT given.
• The numeric form of chmod is needed to set these advanced bits.


File System Implementation
File System Structure
Disks provide the bulk of secondary storage on which a file system is maintained. They have two characteristics that make them a convenient medium for storing multiple files:
◦ They can be rewritten in place; it is possible to read a block from the disk, to modify the block, and to write it back into the same place.
◦ Any given block of information on the disk can be accessed directly.

File System Organization
To provide efficient and convenient access to the disk, the operating system imposes one or more file systems to allow the data to be stored, located, and retrieved easily. A file system poses two quite different design problems:
◦ The first problem is defining how the file system should look to the user. This task involves defining a file and its attributes, the operations allowed on a file, and the directory structure for organizing files.
◦ The second problem is creating algorithms and data structures to map the logical file system onto the physical secondary-storage devices.
The file system itself is generally composed of many different levels. The structure shown in the figure is an example of a layered design. Each level in the design uses the features of lower levels to create new features for use by higher levels.
• At the lowest layer are the physical devices, consisting of the magnetic media, motors and controls, and the electronics connected to them and controlling them. Modern disks put more and more of the electronic controls directly on the disk drive itself, leaving relatively little work for the disk-controller card to perform.
• I/O control consists of device drivers, special software programs (often written in assembly) which communicate with the devices by reading and writing special codes directly to and from memory addresses corresponding to the controller card's registers. Each controller card (device) on a system has a different set of addresses (registers, a.k.a. ports) that it listens to, and a unique set of command codes and result codes that it understands.
• The basic file system level works directly with the device drivers in terms of retrieving and storing raw blocks of data, without any consideration for what is in each block. Depending on the system, blocks may be referred to with a single block number (e.g., block #234234) or with head-sector-cylinder combinations.
• The file organization module knows about files and their logical blocks, and how they map to physical blocks on the disk. In addition to translating from logical to physical blocks, the file organization module also maintains the list of free blocks and allocates free blocks to files as needed.
• The logical file system deals with all of the metadata associated with a file (UID, GID, mode, dates, etc.), i.e., everything about the file except the data itself. This level manages the directory structure and the mapping of file names to file control blocks, FCBs, which contain all of the metadata as well as block-number information for finding the data on the disk.
The layered approach to file systems means that much of the code can be used uniformly for a wide variety of different file systems, and only certain layers need to be file-system specific.

File System Implementation
File systems store several important data structures on the disk:
• A boot-control block (per volume), a.k.a. the boot block in UNIX or the partition boot sector in Windows, contains information about how to boot the system off of this disk. This will generally be the first sector of the volume if there is a bootable system loaded on that volume, or the block will be left vacant otherwise.
• A volume-control block (per volume), a.k.a. the superblock in UNIX or the master file table in Windows/NTFS, which contains information such as the number of blocks in the file system and pointers to free blocks and free FCBs.
• A directory structure (per file system), containing file names and pointers to corresponding FCBs. UNIX uses inode numbers, and NTFS uses a master file table.
• The file control block, FCB (per file), containing details about ownership, size, permissions, dates, etc. UNIX stores this information in inodes, and NTFS in the master file table as a relational-database structure.
There are also several key data structures stored in memory:
• An in-memory mount table.
• An in-memory directory cache of recently accessed directory information.
• A system-wide open-file table, containing a copy of the FCB for every currently open file in the system, as well as some other related information.
• A per-process open-file table, containing a pointer to the system open-file table as well as some other information. (For example, the current file-position pointer may be either here or in the system file table, depending on the implementation and whether the file is being shared or not.)
The figure below illustrates some of the interactions of file-system components when files are created and/or used:
 When a new file is created, a new FCB is allocated and filled out with important information regarding the new file. The appropriate directory is modified with the new file name and FCB information.
 When a file is accessed during a program, the open() system call reads in the FCB information from disk and stores it in the system-wide open-file table. An entry is added to the per-process open-file table referencing the system-wide table, and an index into the per-process table is returned by the open() system call. UNIX refers to this index as a file descriptor, and Windows refers to it as a file handle.
 If another process already has a file open when a new request comes in for the same file, and it is sharable, then a counter in the system-wide table is incremented and the per-process table is adjusted to point to the existing entry in the system-wide table.
 When a file is closed, the per-process table entry is freed, and the counter in the system-wide table is decremented. If that counter reaches zero, then the system-wide table entry is also freed. Any data currently stored in the memory cache for this file is written out to disk if necessary.

In-memory file-system structures. (a) File open. (b) File read.

Partitions and Mounting
 Physical disks are commonly divided into smaller units called partitions. They can also be combined into larger units, but that is most commonly done for RAID installations and is left for later chapters.
 Partitions can either be used as raw devices (with no structure imposed upon them), or they can be formatted to hold a file system (i.e., populated with FCBs and initial directory structures as appropriate). Raw partitions are generally used for swap space and may also be used for certain programs, such as databases, that choose to manage their own disk-storage system. Partitions containing file systems can generally be accessed through the file-system structure only by ordinary users, but can often be accessed as raw devices by root as well.
 The boot block is accessed as part of a raw partition by the boot program, prior to any operating system being loaded. Modern boot programs understand multiple OSes and file-system formats, and can give the user a choice of which of several available systems to boot.
 The root partition contains the OS kernel and at least the key portions of the OS needed to complete the boot process. At boot time the root partition is mounted, and control is transferred from the boot program to the kernel found there. (Older systems required that the root partition lie completely within the first 1024 cylinders of the disk, because that was as far as the boot program could reach. Once the kernel had control, it could access partitions beyond the 1024-cylinder boundary.)
 Continuing with the boot process, additional file systems get mounted, adding their information into the appropriate mount-table structure. As part of the mounting process the file systems may be checked for errors or inconsistencies, either because they are flagged as not having been closed properly the last time they were used, or just on general principles. File systems may be mounted either automatically or manually.
 In UNIX, a mount point is indicated by setting a flag in the in-memory copy of the inode, so all future references to that inode get redirected to the root directory of the mounted file system.

Virtual File Systems
 Virtual file systems, VFS, provide a common interface to multiple different file-system types. In addition, a VFS provides a unique identifier (the vnode) for files across the entire space, including across all file systems of different types. (UNIX inodes are unique only within a single file system, and certainly do not carry across networked file systems.)
 The VFS in Linux is based upon four key object types:
o The inode object, representing an individual file.
o The file object, representing an open file.
o The superblock object, representing a file system.
o The dentry object, representing a directory entry.
 Linux VFS provides a set of common functionalities for each file system, using function pointers accessed through a table. The same functionality is accessed through the same table position for all file-system types, though the actual functions pointed to by the pointers may be file-system-specific. See /usr/include/linux/fs.h for full details. Common operations provided include open(), read(), write(), and mmap().


Allocation Methods
An allocation method refers to how disk blocks are allocated for files:
 Contiguous allocation
 Linked allocation
 Indexed allocation

Contiguous Allocation
This method requires each file to occupy a set of contiguous blocks on the disk. Disk addresses define a linear ordering on the disk. With this ordering, accessing block b+1 after block b normally requires no head movement. Contiguous allocation of a file is defined by the disk address of the first block and the length of the file (in block units). If the file is n blocks long and starts at location b, then it occupies blocks b, b + 1, b + 2, …, b + n − 1. The directory entry for each file indicates the address of the starting block and the length of the area allocated for the file.

Problems:  One of the difficulties with contiguous allocation is determining and finding space for a new file to be created and allocate.  Storage allocation involves the same issues for the allocation of contiguous blocks of memory ( first fit, best fit, fragmentation problems, etc. ) The distinction is that the high time penalty required for moving the disk heads from spot to spot may now justify the benefits of keeping files contiguously when possible.  Suffers from external fragmentation  Problems can arise when files grow, or if the exact size of a file is unknown at creation time: o Over-estimation of the file's final size increases external fragmentation and wastes disk space. o Under-estimation may require that a file be moved or a process aborted if the file grows beyond its originally allocated space. o If a file grows slowly over a long time period and the total final space must be allocated initially, then a lot of space becomes unusable before the file fills the space. UNIT-4


Linked Allocation
Linked allocation addresses the main problems of contiguous allocation. With linked allocation, each file is a linked list of disk blocks; the disk blocks may be scattered anywhere on the disk. The directory contains a pointer to the first and last blocks of the file.

Advantages: linked allocation involves no external fragmentation, does not require the file size to be known in advance, and allows files to grow dynamically at any time.
Disadvantages:
 Linked allocation is efficient only for sequential-access files; random access requires starting at the beginning of the list for each new location access.
 Allocating clusters of blocks reduces the space wasted by pointers, at the cost of internal fragmentation.
 Another big problem with linked allocation is reliability if a pointer is lost or damaged. Doubly linked lists provide some protection, at the cost of additional overhead and wasted space.
The File Allocation Table, FAT, used by DOS is a variation of linked allocation, where all the links are stored in a separate table at the beginning of the disk. The benefit of this approach is that the FAT table can be cached in memory, greatly improving random-access speeds.
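A toy model of the FAT idea: the table holds, for each block, the number of the block that follows it in the file (real FATs use reserved sentinel values rather than -1, and the block numbers here are invented):

    fat = {217: 618, 618: 339, 339: -1}   # -1 marks end-of-file in this toy

    def file_blocks(start):
        """Follow the chain of links from the file's starting block."""
        block = start
        while block != -1:
            yield block
            block = fat[block]

    print(list(file_blocks(217)))         # [217, 618, 339]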

Indexed Allocation
This allocation method provides a solution to the problems of contiguous and linked allocation: all of a file's pointers are brought together into one location, the index block. Each file has its own index block, which stores the addresses of the disk blocks occupied by the file. The directory contains the address of each file's index block.


Some disk space is wasted (relative to linked lists or FAT tables) because an entire index block must be allocated for each file, regardless of how many data blocks the file contains. This leads to questions of how big the index block should be, and how it should be implemented. There are several approaches:
 Linked scheme: An index block is one disk block, which can be read and written in a single disk operation. The first index block contains some header information and the first N block addresses, plus, if necessary, a pointer to additional linked index blocks.
 Multi-level index: The first index block contains a set of pointers to secondary index blocks, which in turn contain pointers to the actual data blocks.
 Combined scheme: This is the scheme used in UNIX inodes, in which the first 12 or so data-block pointers are stored directly in the inode, and then singly, doubly, and triply indirect pointers provide access to more data blocks as needed. The advantage of this scheme is that for small files (which many are), the data blocks are readily accessible (up to 48K with 4K block sizes); files up to about 4144K (using 4K blocks) are accessible with only a single indirect block (which can be cached); and huge files are still accessible using a relatively small number of disk accesses (larger in theory than can be addressed by a 32-bit address, which is why some systems have moved to 64-bit file pointers).
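The size limits quoted for the combined scheme follow from simple arithmetic, assuming 4K blocks and 4-byte block pointers:

    BLOCK = 4096                     # bytes per block (assumed)
    PTR = 4                          # bytes per block pointer (32-bit)
    per_block = BLOCK // PTR         # 1024 pointers per index block

    direct = 12 * BLOCK              # 48K reachable through direct pointers
    single = per_block * BLOCK       # + 4M through the single indirect block
    double = per_block ** 2 * BLOCK  # + 4G through the double indirect block
    print((direct + single) // 1024) # 4144 (KB), the figure quoted above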


Performance
 The optimal allocation method differs for sequential-access files and random-access files, and also differs for small files and large files.
 Some systems support more than one allocation method, which may require specifying how the file is to be used (sequential or random access) at the time it is allocated. Such systems also provide conversion utilities.
 Some systems have been known to use contiguous allocation for small files and to switch automatically to an indexed scheme when file sizes surpass a certain threshold.
 And of course some systems adjust their allocation schemes (e.g., block sizes) to best match the characteristics of the hardware for optimum performance.

Free Space Management
The free-space list records all disk blocks that are free, that is, not allocated to any file or directory. When a new file is created, it is allocated space from the free-space list, and that space is then removed from the list. The free-space list can be implemented using one of the following schemes:
◦ Bit vector
◦ Linked list
◦ Grouping
◦ Counting

Bit Vector
The free-space list is implemented as a bit map or bit vector. Each block is represented by one bit: if the block is free, the bit is 1; if the block is allocated, the bit is 0.

Advantage
 It is simple and efficient to find the first free block, or consecutive free blocks, on the disk.
Disadvantage
 It is inefficient unless the entire vector is kept in main memory, and the vector costs extra space (one bit per disk block).
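Finding the first free block is a word-at-a-time scan: skip zero-valued words, then locate the lowest set bit. A sketch (it assumes bit k of word w stands for block w*32 + k, which is an implementation choice):

    WORD = 32                                  # bits per word

    def first_free(bitmap):
        """Bit-vector scan; 1 = free, 0 = allocated, as in the text."""
        for w, word in enumerate(bitmap):
            if word != 0:                      # some block in this word is free
                bit = (word & -word).bit_length() - 1   # lowest set bit
                return w * WORD + bit
        return None                            # no free block on the disk

    # Blocks 0-31 allocated; in the next word only bit 5 is set -> block 37.
    print(first_free([0x00000000, 0b100000]))  # 37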

Linked List
A linked list can also be used to keep track of all free blocks: link together all the free disk blocks, keeping a pointer to the first free block in a special location on the disk and caching it in memory. The first block contains a pointer to the next free block, and so on. Traversing the list and finding a contiguous run of blocks of a given size are not easy, but fortunately these are not frequently needed operations; generally the system just adds and removes single blocks at the beginning of the list.


Grouping
A variation on the linked-list free list is to use blocks of indices of free blocks. If a block holds up to N addresses, then the first block in the linked list contains up to N−1 addresses of free blocks and a pointer to the next block of free addresses. The importance of this implementation is that the addresses of a large number of free blocks can be found quickly.
Counting
When there are multiple contiguous blocks of free space, the system can keep track of the starting address of the group and the number of contiguous free blocks. As long as the average length of a contiguous group of free blocks is greater than two, this offers a saving in the space needed for the free list. (This is similar to the run-length compression used for graphics images when a group of pixels of the same color is encountered.)

Directory Implementation
Directories need to be fast to search, insert, and delete, with a minimum of wasted disk space.

Linear List A linear list is the simplest and easiest directory structure to set up, but it does have some drawbacks.
 Finding a file (or verifying one does not already exist upon creation) requires a linear search.
 Deletions can be done by moving all entries, flagging an entry as deleted, or by moving the last entry into the newly vacant position.
 Sorting the list makes searches faster, at the expense of more complex insertions and deletions.
 A linked list makes insertions and deletions into a sorted list easier, with overhead for the links.
 More complex data structures, such as B-trees, could also be considered.

Hash Table A hash table can also be used to speed up searches. Hash tables are generally implemented in addition to a linear or other structure: a value computed from the file name returns a pointer to the entry for that name in the linear list, which can greatly decrease directory search time. Insertion and deletion are also fairly straightforward, although some provision must be made for collisions, i.e. situations where two file names hash to the same location. The major difficulties with a hash table are its generally fixed size and the dependence of the hash function on that size.


Secondary Storage Structure
Overview of Mass-Storage Structure
Magnetic Disks Traditional magnetic disks have the following basic structure:
o One or more platters in the form of disks covered with magnetic media. Hard disk platters are made of rigid metal, while "floppy" disks are made of more flexible plastic.
o Each platter has two working surfaces. Older hard disk drives would sometimes not use the very top or bottom surface of a stack of platters, as these surfaces were more susceptible to potential damage.
o Each working surface is divided into a number of concentric rings called tracks. The collection of all tracks that are the same distance from the edge of the platter (i.e. all tracks immediately above one another in Figure 1) is called a cylinder.
o Each track is further divided into sectors, traditionally containing 512 bytes of data each, although some modern disks use larger sector sizes. (Sectors also include a header and a trailer, containing checksum information among other things. Larger sector sizes reduce the fraction of the disk consumed by headers and trailers, but increase internal fragmentation and the amount of disk that must be marked bad in the case of errors.)
o The data on a hard drive is read by read-write heads. The standard configuration uses one head per surface, each on a separate arm, controlled by a common arm assembly which moves all heads simultaneously from one cylinder to another.
o The storage capacity of a traditional disk drive is equal to the number of heads (i.e. the number of working surfaces), times the number of tracks per surface, times the number of sectors per track, times the number of bytes per sector. A particular physical block of data is specified by the head-sector-cylinder number at which it is located.

Figure 1 - Moving-head disk mechanism.
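The capacity formula from the last bullet above can be checked with a short C calculation. The geometry figures here are made up for illustration; real drives vary.

#include <stdio.h>

/* Sketch: capacity = heads x tracks/surface x sectors/track x bytes/sector. */
int main(void) {
    unsigned long long heads   = 16;       /* working surfaces         */
    unsigned long long tracks  = 63000;    /* tracks per surface       */
    unsigned long long sectors = 400;      /* sectors per track (avg)  */
    unsigned long long bytes   = 512;      /* bytes per sector         */
    unsigned long long cap = heads * tracks * sectors * bytes;
    printf("capacity = %llu bytes (~%llu GB)\n",
           cap, cap / (1024ULL * 1024 * 1024));
    return 0;
}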

In operation the disk rotates at high speed, such as 7200 rpm (120 revolutions per second). The time to transfer data from the disk to the computer is composed of several components:
o The positioning time, a.k.a. the seek time or random-access time, is the time required to move the heads from one cylinder to another, and for the heads to settle down after the move. This is typically the slowest step in the process and the predominant bottleneck to overall transfer rates.
o The rotational latency is the time required for the desired sector to rotate around and come under the read-write head. This can range anywhere from zero to one full revolution, and on average will equal one-half revolution. This is another physical step and is usually the second slowest step behind seek time. (For a disk rotating at 7200 rpm, the average rotational latency is one-half revolution divided by 120 revolutions per second, or just over 4 milliseconds, a long time by computer standards; see the sketch below.)
o The transfer rate, which is the rate at which data move electronically from the disk to the computer. (Some authors also use the term transfer rate to refer to the overall transfer rate, including seek time and rotational latency as well as the electronic data transfer rate.)
Disk heads "fly" over the surface on a very thin cushion of air. If they should accidentally contact the disk, a head crash occurs, which may or may not permanently damage the disk or even destroy it completely. For this reason it is normal to park the disk heads when turning a computer off, which means moving the heads off the disk or to an area of the disk where no data are stored.
Floppy disks are normally removable. Hard drives can also be removable, and some are even hot-swappable, meaning they can be removed while the computer is running and a new hard drive inserted in their place.
Disk drives are connected to the computer via a cable known as the I/O bus. Some of the common interface formats include Enhanced Integrated Drive Electronics (EIDE); Advanced Technology Attachment (ATA); Serial ATA (SATA); Universal Serial Bus (USB); Fibre Channel (FC); and Small Computer Systems Interface (SCSI). The host controller is at the computer end of the I/O bus, and the disk controller is built into the disk itself. The CPU issues commands to the host controller via I/O ports. Data are transferred between the magnetic surface and onboard cache by the disk controller, and then from that cache to the host controller and the motherboard memory at electronic speeds.
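The rotational-latency figure quoted above works out as follows; a one-line C program makes the arithmetic explicit.

#include <stdio.h>

/* Sketch: average rotational latency is half a revolution. */
int main(void) {
    double rpm = 7200.0;
    double rev_per_sec = rpm / 60.0;                   /* 120 rev/s */
    double avg_latency_ms = 0.5 / rev_per_sec * 1000.0;
    printf("avg rotational latency at %.0f rpm: %.2f ms\n",
           rpm, avg_latency_ms);                       /* ~4.17 ms  */
    return 0;
}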

Solid-State Disks

As technologies improve and economics change, old technologies are often used in new ways. One example of this is the increasing use of solid-state disks, or SSDs. SSDs use memory technology as a small, fast hard disk. Specific implementations may use either flash memory or DRAM chips protected by a battery to sustain the information through power cycles. Because SSDs have no moving parts they are much faster than traditional hard drives, and certain problems, such as the scheduling of disk accesses, simply do not apply. However, SSDs also have their weaknesses: they are more expensive than hard drives, generally not as large, and may have shorter life spans. SSDs are especially useful as a high-speed cache of hard-disk information that must be accessed quickly. One example is to store file-system metadata, e.g. directory and inode information, that must be accessed quickly and often. Another variation is a boot disk containing the OS and some application executables, but no vital user data. SSDs are also used in laptops to make them smaller, faster, and lighter. Because SSDs are so much faster than traditional hard disks, the throughput of the bus can become a limiting factor, causing some SSDs to be connected directly to the system PCI bus, for example.

Magnetic Tapes

Magnetic tapes were once used for common secondary storage before the days of hard disk drives, but today are used primarily for backups. Accessing a particular spot on a magnetic tape can be slow, but once reading or writing commences, access speeds are comparable to disk drives. Capacities of tape drives can range from 20 to 200 GB, and compression can double that capacity.

Disk Structure The traditional head-sector-cylinder (HSC) numbers are mapped to linear block addresses by numbering the first sector on the first head on the outermost track as sector 0. Numbering proceeds with the rest of the sectors on that same track, then the rest of the tracks on the same cylinder, and then through the rest of the cylinders to the center of the disk. In modern practice these linear block addresses are used in place of the HSC numbers for a variety of reasons:
1. The linear length of tracks near the outer edge of the disk is much longer than for those tracks located near the center, and therefore it is possible to squeeze many more sectors onto outer tracks than onto inner ones.
2. All disks have some bad sectors, and therefore disks maintain a few spare sectors that can be used in place of the bad ones. The mapping of spare sectors to bad sectors is managed internally by the disk controller.
3. Modern hard drives can have thousands of cylinders and hundreds of sectors per track on their outermost tracks. These numbers exceed the range of HSC numbers for many (older) operating systems, and therefore disks can be configured for any convenient combination of HSC values that falls within the total number of sectors physically on the drive.
There is a limit to how closely packed individual bits can be placed on a physical medium, but that limit steadily increases as technological advances are made. Modern disks pack many more sectors into outer cylinders than inner ones, using one of two approaches:
o With Constant Linear Velocity (CLV), the density of bits is uniform from cylinder to cylinder. Because there are more sectors in outer cylinders, the disk spins more slowly when reading those cylinders, causing the rate of bits passing under the read-write head to remain constant. This is the approach used by modern CDs and DVDs.
o With Constant Angular Velocity (CAV), the disk rotates at a constant angular speed, with the bit density decreasing on outer cylinders. (These disks would have a constant number of sectors per track on all cylinders.)
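The classical HSC-to-linear mapping can be written as a one-line formula; the C sketch below assumes a fixed geometry (real drives use variable sectors per track and remapping, so this is only the textbook model). Sectors are numbered from 1 within a track, per convention.

#include <stdint.h>

/* Sketch: map a head-sector-cylinder address to a linear block address
 * under a fixed, idealized geometry. */
uint64_t hsc_to_lba(uint32_t cylinder, uint32_t head, uint32_t sector,
                    uint32_t heads_per_cyl, uint32_t sectors_per_track) {
    return ((uint64_t)cylinder * heads_per_cyl + head) * sectors_per_track
           + (sector - 1);
}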


Disk Scheduling

Disk transfer speeds are limited primarily by seek times and rotational latency. When multiple requests are to be processed there is also some inherent delay in waiting for other requests to be processed. Bandwidth is measured as the total amount of data transferred divided by the total time from the first request being made to the last transfer being completed, for a series of disk requests. Both bandwidth and access time can be improved by processing requests in a good order. A disk request includes the disk address, the memory address, the number of sectors to transfer, and whether the request is for reading or writing.

We illustrate the scheduling algorithms with a disk queue (cylinders 0-199) holding requests for I/O to blocks on cylinders 98, 183, 37, 122, 14, 124, 65, 67. The head pointer starts at cylinder 53.

FCFS Scheduling First-Come First-Serve is simple and intrinsically fair, but not very efficient.

Serving the requests in arrival order moves the head 53→98→183→37→122→14→124→65→67. The individual movements are |98-53| = 45, |183-98| = 85, |37-183| = 146, |122-37| = 85, |14-122| = 108, |124-14| = 110, |65-124| = 59, and |67-65| = 2, for a total head movement of 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2 = 640 cylinders (see the sketch below). The problem with this schedule is illustrated by the wild swing from 122 to 14 and then back to 124.
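The total above can be verified with a small simulation that simply walks the queue in arrival order.

#include <stdio.h>
#include <stdlib.h>

/* Sketch: total head movement under FCFS for the example queue. */
int main(void) {
    int req[] = {98, 183, 37, 122, 14, 124, 65, 67};
    int n = sizeof req / sizeof req[0];
    int head = 53, total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(req[i] - head);   /* distance to the next request */
        head = req[i];
    }
    printf("FCFS total head movement: %d cylinders\n", total);   /* 640 */
    return 0;
}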

SSTF (Shortest Seek Time First) Scheduling This algorithm selects the request with the minimum seek time from the current head position. SSTF scheduling is a form of SJF scheduling and may cause starvation of some requests if a constant stream of requests arrives for the same general area of the disk. SSTF reduces the total head movement to 236 cylinders, down from the 640 required for the same set of requests under FCFS. Note, however, that SSTF is not optimal: the distance could be reduced still further, to 208, by serving 37 and then 14 first before processing the rest of the requests.
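The greedy selection that defines SSTF is easy to simulate: repeatedly pick the pending request closest to the current head position.

#include <stdio.h>
#include <stdlib.h>

/* Sketch: total head movement under SSTF for the example queue. */
int main(void) {
    int req[] = {98, 183, 37, 122, 14, 124, 65, 67};
    int n = sizeof req / sizeof req[0];
    int done[8] = {0};
    int head = 53, total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)    /* find the closest pending request */
            if (!done[i] && (best < 0 ||
                             abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        total += abs(req[best] - head);
        head = req[best];
        done[best] = 1;
    }
    printf("SSTF total head movement: %d cylinders\n", total);   /* 236 */
    return 0;
}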

SCAN Scheduling The SCAN algorithm, a.k.a. the elevator algorithm, moves back and forth from one end of the disk to the other, similarly to an elevator processing requests in a tall building. Under the SCAN algorithm, if a request arrives just ahead of the moving head it will be processed right away, but if it arrives just after the head has passed, it will have to wait for the head to pass going the other way on the return trip. This leads to a fairly wide variation in access times, which can be improved upon. Consider, for example, the moment when the head reaches the high end of the disk: requests with high cylinder numbers just missed the passing head, which means they are all fairly recent requests, whereas requests with low numbers may have been waiting for a much longer time. Making the return scan from high to low then ends up accessing recent requests first and making older requests wait that much longer.

For the example queue, with the head starting at 53 and moving toward cylinder 0 first, SCAN gives a total head movement of 236 cylinders: 53 down to 0 (servicing 37 and 14 on the way), then 0 up to 183 (servicing the rest).
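A simulation of SCAN for this queue, with the downward direction chosen first; note the head travels all the way to cylinder 0 before reversing.

#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* Sketch: SCAN (elevator), head at 53 moving toward cylinder 0 first. */
int main(void) {
    int req[] = {98, 183, 37, 122, 14, 124, 65, 67};
    int n = sizeof req / sizeof req[0];
    qsort(req, n, sizeof req[0], cmp);   /* 14 37 65 67 98 122 124 183 */

    int head = 53, total = 0;
    /* downward sweep: requests at or below 53, served in descending order */
    for (int i = n - 1; i >= 0; i--)
        if (req[i] <= 53) { total += head - req[i]; head = req[i]; }
    total += head;                        /* continue on to cylinder 0 */
    head = 0;
    /* reverse: requests above 53, served in ascending order */
    for (int i = 0; i < n; i++)
        if (req[i] > 53) { total += req[i] - head; head = req[i]; }
    printf("SCAN total head movement: %d cylinders\n", total);   /* 236 */
    return 0;
}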


C-SCAN Scheduling The Circular-SCAN algorithm improves upon SCAN by treating all requests in a circular-queue fashion: once the head reaches the end of the disk, it returns to the other end without processing any requests, and then starts again from the beginning of the disk. This provides a more uniform wait time than SCAN.

LOOK and C-LOOK Scheduling In both SCAN and C-SCAN the disk arm goes all the way to the ends of the disk (cylinders 0 and 199), even when there are no pending requests there. LOOK and C-LOOK work the same as SCAN and C-SCAN respectively, except that the arm only goes as far as the last request in each direction, then reverses direction immediately, without first travelling all the way to the end of the disk. For the example queue this reduces the total head movement to 208 cylinders, as in the sketch below.
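This LOOK simulation hardcodes the two sweep orders for the example queue (requests below 53 in descending order, then requests above 53 in ascending order); the reversal happens at 14 rather than at cylinder 0, which is where the savings over SCAN come from.

#include <stdio.h>

/* Sketch: LOOK for the example queue, head at 53 moving down first. */
int main(void) {
    int head = 53, total = 0;
    int down[] = {37, 14};                        /* below 53, descending */
    int up[]   = {65, 67, 98, 122, 124, 183};     /* above 53, ascending  */
    for (int i = 0; i < 2; i++) { total += head - down[i]; head = down[i]; }
    for (int i = 0; i < 6; i++) { total += up[i] - head;   head = up[i];  }
    printf("LOOK total head movement: %d cylinders\n", total);   /* 208 */
    return 0;
}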


Selection of a Disk-Scheduling Algorithm
 With very low loads all algorithms are equal, since there will normally only be one request to process at a time.
 For slightly larger loads, SSTF offers better performance than FCFS, but may lead to starvation when loads become heavy enough.
 For busier systems, SCAN and LOOK algorithms eliminate starvation problems.
The actual optimal algorithm may be something even more complex than those discussed here, but the incremental improvements are generally not worth the additional overhead. Some improvement to overall file-system access times can be made by intelligent placement of directory and/or inode information: if those structures are placed in the middle of the disk instead of at the beginning, the maximum distance from those structures to data blocks is reduced to only one-half of the disk size. If those structures can be further distributed, and their data blocks stored as close as possible to the corresponding directory structures, the overall time to find disk block numbers and then access the corresponding data blocks is reduced still further.
On modern disks the rotational latency can be almost as significant as the seek time; however, it is not within the OS's control to account for that, because modern disks do not reveal their internal sector-mapping schemes (particularly when bad blocks have been remapped to spare sectors).
 Some disk manufacturers provide disk-scheduling algorithms directly on their disk controllers (which do know the actual geometry of the disk as well as any remapping), so that if a series of requests is sent from the computer to the controller, those requests can be processed in an optimal order.
 Unfortunately there are some considerations the OS must take into account that are beyond the abilities of the on-board disk-scheduling algorithms, such as priorities of some requests over others, or the need to process certain requests in a particular order. For this reason an OS may elect to spoon-feed requests to the disk controller one at a time in certain situations.

Disk Management
Disk Formatting

Before a disk can be used, it has to be low-level formatted, which means laying down all of the headers and trailers marking the beginning and end of each sector. Included in the header and trailer are the linear sector number and error-correcting codes (ECC), which allow damaged sectors not only to be detected, but in many cases the damaged data to be recovered (depending on the extent of the damage). Sector sizes are traditionally 512 bytes, but may be larger, particularly in larger drives. ECC calculation is performed with every disk read or write; if damage is detected but the data are recoverable, a soft error has occurred. Soft errors are generally handled by the on-board disk controller and never seen by the OS. Once the disk is low-level formatted, the next step is to partition the drive into one or more separate partitions. This step must be completed even if the disk is to be used as a single large partition, so that the partition table can be written to the beginning of the disk.


After partitioning, the file systems must be logically formatted, which involves laying down the master directory information (FAT table or inode structure), initializing free lists, and creating at least the root directory of the file system. (Disk partitions which are to be used as raw devices are not logically formatted. This saves the overhead and disk space of the file-system structure, but requires that the application program manage its own disk storage requirements.)

Boot Block
Computer ROM contains a bootstrap program (OS independent) with just enough code to find the first sector on the first hard drive on the first controller, load that sector into memory, and transfer control to it. (The ROM bootstrap program may look in floppy and/or CD drives before accessing the hard drive, and is smart enough to recognize whether it has found valid boot code or not.)
The first sector on the hard drive is known as the Master Boot Record (MBR), and contains a very small amount of code in addition to the partition table. The partition table documents how the disk is partitioned into logical disks, and indicates specifically which partition is the active or boot partition. The boot program then looks to the active partition to find an operating system, possibly loading up a slightly larger, more advanced boot program along the way. In a dual-boot (or larger multi-boot) system, the user may be given a choice of which operating system to boot, with a default action to be taken in the event of no response within some time frame.
Once the kernel is found by the boot program, it is loaded into memory and control is transferred to the OS. The kernel will normally continue the boot process by initializing all important kernel data structures, launching important system services (e.g. network daemons, sched, init), and finally providing one or more login prompts. Boot options at this stage may include single-user, a.k.a. maintenance or safe, modes, in which very few system services are started. These modes are designed for system administrators to repair problems or otherwise maintain the system.

Figure 10.9 - Booting from disk in Windows 2000.

Bad Blocks

No disk can be manufactured to 100% perfection, and all physical objects wear out over time. For these reasons all disks are shipped with a few bad blocks, and additional blocks can be expected to go bad slowly over time. If a large number of blocks go bad then the entire disk will need to be replaced, but a few here and there can be handled through other means.
In the old days, bad blocks had to be checked for manually. Formatting the disk or running certain disk-analysis tools would identify bad blocks and attempt to read the data off of them one last time through repeated tries. The bad blocks would then be mapped out and taken out of future service. Sometimes the data could be recovered, and sometimes it was lost forever. (Disk-analysis tools could be either destructive or non-destructive.)
Modern disk controllers make much better use of the error-correcting codes, so that bad blocks can be detected earlier and the data usually recovered. (Recall that blocks are tested with every write as well as with every read, so often errors can be detected before the write operation is complete, and the data simply written to a different sector instead.)
Note that re-mapping of sectors from their normal linear progression can throw off the disk-scheduling optimization of the OS, especially if the replacement sector is physically far away from the sector it is replacing. For this reason most disks normally keep a few spare sectors on each cylinder, as well as at least one spare cylinder. Whenever possible a bad sector will be mapped to another sector on the same cylinder, or at least to a cylinder as close as possible. Sector slipping may also be performed, in which all sectors between the bad sector and the replacement sector are moved down by one, so that the linear progression of sector numbers can be maintained.
If the data on a bad block cannot be recovered, then a hard error has occurred, which requires replacing the affected file(s) from backups, or rebuilding them from scratch.

Swap-Space Management Swap-space management is another low-level task of the operating system. Virtual memory uses disk space as an extension of main memory. Since disk access is much slower than memory access, using swap space significantly decreases system performance. The main goal for the design and implementation of swap space is to provide the best throughput for the virtual-memory system. Modern systems typically swap out pages as needed, rather than swapping out entire processes. Hence the swapping system is part of the virtual memory management system. Managing swap space is obviously an important task for modern OSes.

Swap-Space Use The amount of swap space needed by an OS varies greatly according to how it is used. Some systems require an amount equal to physical RAM; some want a multiple of that; some want an amount equal to the amount by which virtual memory exceeds physical RAM, and some systems use little or none at all! Some systems support multiple swap spaces on separate disks in order to speed up the virtual memory system.

Swap-Space Location Swap space can be physically located in one of two places:
 As a large file which is part of the regular file system. This is easy to implement, but inefficient: not only must the swap space be accessed through the directory structure, but the file is also subject to fragmentation issues. Caching the block locations helps in finding the physical blocks, but that is not a complete fix.
 As a raw partition, possibly on a separate or little-used disk. This allows the OS more control over swap-space management, which is usually faster and more efficient. Fragmentation of swap space is generally not a big issue, as the space is re-initialized every time the system is rebooted. The downside of keeping swap space on a raw partition is that it can only be grown by repartitioning the hard drive.

Swap-Space Management: An Example



Historically, OSes swapped out entire processes as needed. Modern systems swap out only individual pages, and only as needed. (For example, process code blocks and other blocks that have not been changed since they were originally loaded are normally just freed from the virtual-memory system rather than copied to swap space, because it is faster to find them again in the file system and read them back in from there than to write them out to swap space and then read them back.) In the mapping system shown below for Linux systems, a map of swap space is kept in memory, where each entry corresponds to a 4K block in the swap space. Zeros indicate free slots, and non-zero values record how many processes have a mapping to that particular block (greater than 1 for shared pages only).

The data structures for swapping on Linux systems.
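Since the original figure is not reproduced here, the following C sketch models the swap map in the style just described: one counter per 4K slot, with 0 meaning free. The values are made up for illustration.

#include <stdio.h>

/* Sketch: a swap map with one reference count per 4K swap slot. */
int main(void) {
    unsigned char swap_map[] = {1, 0, 0, 3, 0, 1, 1, 0, 0, 2};
    int slots = sizeof swap_map / sizeof swap_map[0];
    for (int i = 0; i < slots; i++) {
        if (swap_map[i] == 0)
            printf("slot %d: free\n", i);
        else
            printf("slot %d: in use by %u mapping(s)\n", i, swap_map[i]);
    }
    return 0;
}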


I/O Management Introduction The two main jobs of a computer are I/O and processing. In many cases, the main job is I/O and the processing is merely incidental. For instance, when we browse a web page or edit a file, our immediate interest is to read or enter some information, not to compute an answer. The role of the operating system in computer I/O is to manage and control I/O operations and I/O devices.

Overview The control of devices connected to the computer is a major concern of operating system designers. Because I/O devices vary so widely in their function and speed (consider a mouse, a hard disk, and a CD-ROM jukebox), a variety of methods is needed to control them. These methods form the I/O subsystem of the kernel, which separates the rest of the kernel from the complexity of managing I/O devices. I/O-device technology exhibits two conflicting trends. On one hand, we see increasing standardization of software and hardware interfaces. This trend helps us to incorporate improved device generations into existing computers and operating systems. On the other hand, we see an increasingly broad variety of I/O devices; new devices are so unlike previous devices that it is a challenge to incorporate them into our computers and operating systems. This challenge is met by a combination of hardware and software techniques. The basic I/O hardware elements, such as ports, buses, and device controllers, accommodate a wide variety of I/O devices. To encapsulate the details and oddities of different devices, the kernel of an operating system is structured to use device-driver modules. The device drivers present a uniform device-access interface to the I/O subsystem, much as system calls provide a standard interface between the application and the operating system.

I/O Hardware Computers operate a great many kinds of devices. Most fit into the general categories of storage devices (disks, tapes), transmission devices (network cards, modems), and human-interface devices (screen, keyboard, mouse). Other devices are more specialized, such as the steering of a military fighter jet or a space shuttle. In these aircraft, a human gives input to the flight computer via a joystick, and the computer sends output commands that cause motors to move rudders, flaps, and thrusters. Despite the incredible variety of I/O devices, we need only a few concepts to understand how the devices are attached and how the software can control the hardware. A device communicates with a computer system by sending signals over a cable or even through the air. The device communicates with the machine via a connection point (or port), for example, a serial port. If one or more devices use a common set of wires, the connection is called a bus. A bus is a set of wires and a rigidly defined protocol that specifies a set of messages that can be sent on the wires. In terms of the electronics, the messages are conveyed by patterns of electrical voltages applied to the wires with defined timings. When device A has a cable that plugs into device B, and device B has a cable that plugs into device C, and device C plugs into a port on the computer, this arrangement is called a daisy chain. A daisy chain usually operates as a bus. Buses are used widely in computer architecture. The figure below shows a typical PC bus structure: a PCI bus (the common PC system bus) connects the processor-memory subsystem to the fast devices, and an expansion bus connects relatively slow devices such as the keyboard and serial and parallel ports. In the upper-right portion of the figure, four disks are connected together on a SCSI bus plugged into a SCSI controller.

A typical PC Bus Structure
A controller is a collection of electronics that can operate a port, a bus, or a device. A serial-port controller is a simple device controller: a single chip (or portion of a chip) in the computer that controls the signals on the wires of a serial port. By contrast, a SCSI bus controller is not simple. Because the SCSI protocol is complex, the SCSI bus controller is often implemented as a separate circuit board (a host adapter) that plugs into the computer. It typically contains a processor, microcode, and some private memory to enable it to process the SCSI protocol messages. Some devices have their own built-in controllers. If you look at a disk drive, you will see a circuit board attached to one side. This board is the disk controller. It implements the disk side of the protocol for some kind of connection, SCSI or IDE, for instance. It has microcode and a processor to do many tasks, such as bad-sector mapping, pre-fetching, buffering, and caching.
The processor communicates with a controller by reading and writing bit patterns in the controller's registers. One way this communication can occur is through special I/O instructions that specify the transfer of a byte or word to an I/O port address. The I/O instruction triggers bus lines to select the proper device and to move bits into or out of a device register. Alternatively, the device controller can support memory-mapped I/O. In this case, the device-control registers are mapped into the address space of the processor, and the CPU executes I/O requests using the standard data-transfer instructions to read and write the device-control registers. Some systems use both techniques. For instance, PCs use I/O instructions to control some devices and memory-mapped I/O to control others. The figure below shows the usual PC I/O port addresses. The graphics controller has I/O ports for basic control operations, but the controller has a large memory-mapped region to hold screen contents. The process sends output to the screen by writing data into the memory-mapped region. The controller generates the screen image based on the contents of this memory. This technique is simple to use. Moreover, writing millions of bytes to the graphics memory is faster than issuing millions of I/O instructions. But the ease of writing to a memory-mapped I/O controller is offset by a disadvantage: because a common type of software fault is a write through an incorrect pointer to an unintended region of memory, a memory-mapped device register is vulnerable to accidental modification. Of course, protected memory helps to reduce this risk.

Device I/O Port Locations on PCs
An I/O port typically consists of four registers, called the status, control, data-in, and data-out registers; a sketch follows this list.
• The status register contains bits that can be read by the host. These bits indicate states such as whether the current command has completed, whether a byte is available to be read from the data-in register, and whether there has been a device error.
• The control register can be written by the host to start a command or to change the mode of a device. For instance, a certain bit in the control register of a serial port chooses between full-duplex and half-duplex communication, another enables parity checking, a third bit sets the word length to 7 or 8 bits, and other bits select one of the speeds supported by the serial port.
• The data-in register is read by the host to get input.
• The data-out register is written by the host to send output.
The data registers are typically 1 to 4 bytes. Some controllers have FIFO chips that can hold several bytes of input or output data to expand the capacity of the controller beyond the size of the data register. A FIFO chip can hold a small burst of data until the device or host is able to receive those data.
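The four registers can be modeled in C as a memory-mapped structure. The layout and bit assignments below are purely illustrative, not taken from any real controller; volatile tells the compiler the hardware may change these values at any time.

#include <stdint.h>

/* Sketch: the four registers of a typical I/O port. */
struct io_port {
    volatile uint8_t status;    /* read by host: busy, ready, error bits */
    volatile uint8_t control;   /* written by host: commands, modes      */
    volatile uint8_t data_in;   /* read by host to get input             */
    volatile uint8_t data_out;  /* written by host to send output        */
};

/* Illustrative bit assignments */
#define STATUS_BUSY    0x01     /* controller is busy working            */
#define STATUS_ERROR   0x02     /* last operation failed                 */
#define CTRL_CMD_READY 0x01     /* host has a command waiting            */
#define CTRL_WRITE     0x02     /* command is a write                    */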

Polling The complete protocol for interaction between the host and a controller can be intricate, but the basic handshaking notion is simple. We explain handshaking by an example. We assume that 2 bits are used to coordinate the producer-consumer relationship between the controller and the host. The controller indicates its state through the busy bit in the status register: it sets the busy bit when it is busy working, and clears the busy bit when it is ready to accept the next command. The host signals its wishes via the command-ready bit in the command register, setting the command-ready bit when a command is available for the controller to execute. For this example, the host writes output through a port, coordinating with the controller by handshaking as follows.
1. The host repeatedly reads the busy bit until that bit becomes clear.
2. The host sets the write bit in the command register and writes a byte into the data-out register.
3. The host sets the command-ready bit.
4. When the controller notices that the command-ready bit is set, it sets the busy bit.
5. The controller reads the command register and sees the write command.
6. It reads the data-out register to get the byte, and does the I/O to the device.
7. The controller clears the command-ready bit, clears the error bit in the status register to indicate that the device I/O succeeded, and clears the busy bit to indicate that it is finished.
This loop is repeated for each byte. In step 1, the host is busy-waiting or polling: it is in a loop, reading the status register over and over until the busy bit becomes clear. If the controller and device are fast, this method is a reasonable one. But if the wait may be long, the host should probably switch to another task. For some devices, the host must service the device quickly, or data will be lost. For instance, when data are streaming in on a serial port or from a keyboard, the small buffer on the controller will overflow and data will be lost if the host waits too long before returning to read the bytes. In many computer architectures, three CPU-instruction cycles are sufficient to poll a device: read a device register, logical-and to extract a status bit, and branch if not zero. Clearly, the basic polling operation is efficient. But polling becomes inefficient when it is attempted repeatedly yet rarely finds a device ready for service, while other useful CPU processing remains undone. In such instances, it may be more efficient to arrange for the hardware controller to notify the CPU when the device becomes ready for service, rather than to require the CPU to poll repeatedly for an I/O completion. The hardware mechanism that enables a device to notify the CPU is called an interrupt.
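The host's side of the numbered handshake can be sketched in C, reusing the illustrative io_port structure and bit masks from the earlier sketch (both are assumptions, not real hardware definitions).

#include <stdint.h>

/* Sketch: the host side of the write handshake (steps 1-3 above). */
void port_write_byte(struct io_port *p, uint8_t byte) {
    while (p->status & STATUS_BUSY)
        ;                                  /* 1. poll until not busy       */
    p->control |= CTRL_WRITE;              /* 2. select the write command  */
    p->data_out = byte;                    /*    ...and stage the byte     */
    p->control |= CTRL_CMD_READY;          /* 3. signal command-ready      */
    /* Steps 4-7 happen in the controller: it sets busy, performs the
     * device I/O, then clears command-ready, error, and busy. */
    while (p->status & STATUS_BUSY)
        ;                                  /* wait for completion          */
}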

Interrupts The basic interrupt mechanism works as follows. The CPU hardware has a wire called the interrupt-request line that the CPU senses after executing every instruction. When the CPU detects that a controller has asserted a signal on the interrupt-request line, the CPU saves a small amount of state, such as the current value of the instruction pointer, and jumps to the interrupt-handler routine at a fixed address in memory. The interrupt handler determines the cause of the interrupt, performs the necessary processing, and executes a return-from-interrupt instruction to return the CPU to the execution state prior to the interrupt. We say that the device controller raises an interrupt by asserting a signal on the interrupt-request line, the CPU catches the interrupt and dispatches to the interrupt handler, and the handler clears the interrupt by servicing the device. This basic interrupt mechanism enables the CPU to respond to an asynchronous event, such as a device controller becoming ready for service. In a modern operating system, we need more sophisticated interrupt-handling features. First, we need the ability to defer interrupt handling during critical processing. Second, we need an efficient way to dispatch to the proper interrupt handler for a device, without first polling all the devices to see which one raised the interrupt. Third, we need multilevel interrupts, so that the operating system can distinguish between high- and low-priority interrupts and can respond with the appropriate degree of urgency. In modern computer hardware, these three features are provided by the CPU and by the interrupt-controller hardware.

Interrupt Driven I/O Cycle
Most CPUs have two interrupt-request lines. One is the non-maskable interrupt, which is reserved for events such as unrecoverable memory errors. The second interrupt line is maskable: it can be turned off by the CPU before the execution of critical instruction sequences that must not be interrupted. The maskable interrupt is used by device controllers to request service. The interrupt mechanism accepts an address, a number that selects a specific interrupt-handling routine from a small set. In most architectures, this address is an offset in a table called the interrupt vector. This vector contains the memory addresses of specialized interrupt handlers. The purpose of a vectored interrupt mechanism is to reduce the need for a single interrupt handler to search all possible sources of interrupts to determine which one needs service. In practice, however, computers have more devices (and hence, interrupt handlers) than they have address elements in the interrupt vector. A common way to solve this problem is to use the technique of interrupt chaining, in which each element in the interrupt vector points to the head of a list of interrupt handlers. When an interrupt is raised, the handlers on the corresponding list are called one by one, until one is found that can service the request (see the sketch below). This structure is a compromise between the overhead of a huge interrupt table and the inefficiency of dispatching to a single interrupt handler. The interrupt mechanism also implements a system of interrupt priority levels. This mechanism enables the CPU to defer the handling of low-priority interrupts without masking off all interrupts, and makes it possible for a high-priority interrupt to preempt the execution of a low-priority interrupt.
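Interrupt chaining can be sketched in C as a vector of handler lists; the types and names here are illustrative, not from any real kernel.

/* Sketch: interrupt chaining: each vector element heads a list of
 * handlers; on an interrupt, handlers are tried until one claims it. */
struct handler {
    int (*service)(void);       /* returns nonzero if it handled the IRQ */
    struct handler *next;
};

#define NVECTORS 16
static struct handler *interrupt_vector[NVECTORS];

void dispatch_interrupt(int vector) {
    for (struct handler *h = interrupt_vector[vector]; h; h = h->next)
        if (h->service())
            return;             /* first handler that services it wins */
    /* no handler claimed the interrupt: treat it as spurious */
}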


Direct Memory Access For a device that does large transfers, such as a disk drive, it seems wasteful to use an expensive general-purpose processor to watch status bits and to feed data into a controller register 1 byte at a time, a process termed programmed I/O (PIO). Many computers avoid burdening the main CPU with PIO by offloading some of this work to a special-purpose processor called a direct-memory-access (DMA) controller. To initiate a DMA transfer, the host writes a DMA command block into memory. This block contains a pointer to the source of the transfer, a pointer to the destination of the transfer, and a count of the number of bytes to be transferred (see the sketch below). The CPU writes the address of this command block to the DMA controller, and then goes on with other work. The DMA controller proceeds to operate the memory bus directly, placing addresses on the bus to perform transfers without the help of the main CPU. A simple DMA controller is a standard component in PCs, and bus-mastering I/O boards for the PC usually contain their own high-speed DMA hardware. Handshaking between the DMA controller and the device controller is performed via a pair of wires called DMA-request and DMA-acknowledge. The device controller places a signal on the DMA-request wire when a word of data is available for transfer. This signal causes the DMA controller to seize the memory bus, place the desired address on the memory-address wires, and place a signal on the DMA-acknowledge wire. When the device controller receives the DMA-acknowledge signal, it transfers the word of data to memory and removes the DMA-request signal.
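A DMA command block, as just described, can be modeled by a simple C structure; field widths and names are illustrative.

#include <stdint.h>

/* Sketch: a DMA command block: the host fills one in and hands its
 * address to the DMA controller. */
struct dma_command {
    uint64_t source;        /* physical address to copy from  */
    uint64_t destination;   /* physical address to copy to    */
    uint32_t byte_count;    /* number of bytes to transfer    */
};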

Steps in DMA Transfer
When the entire transfer is finished, the DMA controller interrupts the CPU. This process is depicted in the figure above. When the DMA controller seizes the memory bus, the CPU is momentarily prevented from accessing main memory, although it can still access data items in its primary and secondary caches. Although this cycle stealing can slow down the CPU computation, offloading the data-transfer work to a DMA controller generally improves total system performance. Some computer architectures use physical memory addresses for DMA, but others perform direct virtual-memory access (DVMA), using virtual addresses that undergo virtual-to-physical memory-address translation. DVMA can perform a transfer between two memory-mapped devices without the intervention of the CPU or the use of main memory.


On protected-mode kernels, the operating system generally prevents processes from issuing device commands directly. This discipline protects data from access-control violations, and also protects the system from erroneous use of device controllers that could cause a system crash. Instead, the operating system exports functions that a sufficiently privileged process can use to access low-level operations on the underlying hardware. On kernels without memory protection, processes can access device controllers directly. This direct access can be used to obtain high performance, since it can avoid kernel communication, context switches, and layers of kernel software. Unfortunately, it interferes with system security and stability. The trend in general-purpose operating systems is to protect memory and devices, so that the system can try to guard against erroneous or malicious applications. Although the hardware aspects of I/O are complex when considered at the level of detail of electronics-hardware designers, the concepts that we have just described are sufficient to understand many I/O aspects of operating systems. Let's review the main concepts:
• A bus
• A controller
• An I/O port and its registers
• The handshaking relationship between the host and a device controller
• The execution of this handshaking in a polling loop or via interrupts
• The offloading of this work to a DMA controller for large transfers

Application I/O Interface As with other complex software-engineering problems, the approach here involves abstraction, encapsulation, and software layering. Specifically, we can abstract away the detailed differences in I/O devices by identifying a few general kinds. Each general kind is accessed through a standardized set of functions, an interface. The differences are encapsulated in kernel modules called device drivers that internally are custom-tailored to each device, but that export one of the standard interfaces. The figure below illustrates how the I/O-related portions of the kernel are structured in software layers.


The purpose of the device-driver layer is to hide the differences among device controllers from the I/O subsystem of the kernel, much as the I/O system calls encapsulate the behavior of devices in a few generic classes that hide hardware differences from applications. Making the I/O subsystem independent of the hardware simplifies the job of the operating-system developer. It also benefits the hardware manufacturers: they either design new devices to be compatible with an existing host-controller interface (such as SCSI-2), or they write device drivers to interface the new hardware to popular operating systems. Thus, new peripherals can be attached to a computer without waiting for the operating-system vendor to develop support code. Devices vary in many dimensions, as illustrated in the figure below.
• Character-stream or block: A character-stream device transfers bytes one by one, whereas a block device transfers a block of bytes as a unit.
• Sequential or random-access: A sequential device transfers data in a fixed order determined by the device, whereas the user of a random-access device can instruct the device to seek to any of the available data storage locations.
• Synchronous or asynchronous: A synchronous device performs data transfers with predictable response times. An asynchronous device exhibits irregular or unpredictable response times.
• Sharable or dedicated: A sharable device can be used concurrently by several processes or threads; a dedicated device cannot.
• Speed of operation: Device speeds range from a few bytes per second to a few gigabytes per second.
• Read-write, read only, or write only: Some devices perform both input and output, but others support only one data direction.

Characteristics of I/O devices
Block and Character Devices The block-device interface captures all the aspects necessary for accessing disk drives and other block-oriented devices. The expectation is that the device understands commands such as read() and write(), and, if it is a random-access device, a seek() command to specify which block to transfer next. Applications normally access such a device through a file-system interface. The operating system itself, and special applications such as database-management systems, may prefer to access a block device as a simple linear array of blocks. This mode of access is sometimes called raw I/O. We can see that read(), write(), and seek() capture the essential behaviors of block-storage devices, so that applications are insulated from the low-level differences among those devices.
Memory-mapped file access can be layered on top of block-device drivers. Rather than offering read and write operations, a memory-mapped interface provides access to disk storage via an array of bytes in main memory. The system call that maps a file into memory returns the virtual-memory address of an array of characters that contains a copy of the file. The actual data transfers are performed only when needed to satisfy access to the memory image. Because the transfers are handled by the same mechanism as that used for demand-paged virtual-memory access, memory-mapped I/O is efficient. Memory mapping is also convenient for programmers: access to a memory-mapped file is as simple as reading and writing to memory. Operating systems that offer virtual memory commonly use the mapping interface for kernel services. For instance, to execute a program, the operating system maps the executable into memory and then transfers control to the entry address of the executable. The mapping interface is also commonly used for kernel access to swap space on disk.
A keyboard is an example of a device that is accessed through a character-stream interface. The basic system calls in this interface enable an application to get() or put() one character. On top of this interface, libraries can be built that offer line-at-a-time access, with buffering and editing services.
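Memory-mapped file access as described above is available on POSIX systems through mmap(); here is a minimal sketch. The file name "data.txt" is a placeholder.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch: map a file into memory and read it as an array of bytes;
 * the actual transfers happen on demand through the paging mechanism. */
int main(void) {
    int fd = open("data.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* reading the file is now just reading memory */
    for (off_t i = 0; i < st.st_size; i++)
        putchar(p[i]);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}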

Network Devices Because the performance and addressing characteristics of network I/O differ significantly from those of disk I/O, most operating systems provide a network I/O interface that is different from the read()-write()-seek() interface used for disks. One interface available in many operating systems, including UNIX and Windows NT, is the network socket interface. Think of a wall socket for electricity: any electrical appliance can be plugged in. By analogy, the system calls in the socket interface enable an application to create a socket, to connect a local socket to a remote address (which plugs this application into a socket created by another application), to listen for any remote application to plug into the local socket, and to send and receive packets over the connection. To support the implementation of servers, the socket interface also provides a function called select() that manages a set of sockets. A call to select() returns information about which sockets have a packet waiting to be received and which sockets have room to accept a packet to be sent. The use of select() eliminates the polling and busy waiting that would otherwise be necessary for network I/O. These functions encapsulate the essential behaviors of networks, greatly facilitating the creation of distributed applications that can use any underlying network hardware and protocol stack. Many other approaches to interprocess communication and network communication have been implemented. For instance, Windows NT provides one interface to the network interface card, and a second interface to the network protocols.
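A minimal sketch of select() as described above; sock1 and sock2 stand for socket descriptors the caller has already created, so this is an illustration of the call's shape rather than a complete server.

#include <stdio.h>
#include <sys/select.h>

/* Sketch: block until one of two sockets has data to read,
 * and return the readable descriptor. */
int wait_for_readable(int sock1, int sock2) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(sock1, &readfds);
    FD_SET(sock2, &readfds);
    int maxfd = (sock1 > sock2 ? sock1 : sock2) + 1;

    /* blocks until at least one socket has a packet waiting */
    if (select(maxfd, &readfds, NULL, NULL, NULL) < 0) {
        perror("select");
        return -1;
    }
    return FD_ISSET(sock1, &readfds) ? sock1 : sock2;
}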


Clocks and Timers Most computers have hardware clocks and timers that provide three basic functions:
• Give the current time
• Give the elapsed time
• Set a timer to trigger operation X at time T
These functions are used heavily by the operating system, and also by time-sensitive applications. Unfortunately, the system calls that implement these functions are not standardized across operating systems. The hardware to measure elapsed time and to trigger operations is called a programmable interval timer. It can be set to wait a certain amount of time and then generate an interrupt, and it can be set to do this operation once or to repeat the process, generating periodic interrupts. The scheduler uses this mechanism to generate an interrupt that will preempt a process at the end of its time slice. The disk I/O subsystem uses it to invoke the periodic flushing of dirty cache buffers to disk, and the network subsystem uses it to cancel operations that are proceeding too slowly because of network congestion or failures. The operating system may also provide an interface for user processes to use timers. The operating system can support more timer requests than the number of timer hardware channels by simulating virtual clocks. To do so, the kernel (or the timer device driver) maintains a list of interrupts wanted by its own routines and by user requests, sorted in earliest-time-first order. It sets the timer for the earliest time. When the timer interrupts, the kernel signals the requester and reloads the timer with the next earliest time.
Blocking and Non-blocking I/O Another aspect of the system-call interface relates to the choice between blocking I/O and non-blocking (or asynchronous) I/O. When an application issues a blocking system call, the execution of the application is suspended. The application is moved from the operating system's run queue to a wait queue. After the system call completes, the application is moved back to the run queue, where it is eligible to resume execution, at which time it will receive the values returned by the system call. The physical actions performed by I/O devices are generally asynchronous: they take a varying or unpredictable amount of time. Nevertheless, most operating systems use blocking system calls for the application interface, because blocking application code is easier to understand than non-blocking application code. Some user-level processes need non-blocking I/O. One example is a user interface that receives keyboard and mouse input while processing and displaying data on the screen. Another example is a video application that reads frames from a file on disk while simultaneously decompressing and displaying the output on the display. One way an application writer can overlap execution with I/O is to write a multithreaded application: some threads can perform blocking system calls, while others continue executing. The Solaris developers used this technique to implement a user-level library for asynchronous I/O, freeing the application writer from that task. Some operating systems provide non-blocking I/O system calls. A non-blocking call does not halt the execution of the application for an extended time. Instead, it returns quickly, with a return value that indicates how many bytes were transferred. An alternative to a non-blocking system call is an asynchronous system call. An asynchronous call returns immediately, without waiting for the I/O to complete. The application continues to execute its code.
The completion of the I/O at some future time is communicated to the application, either through the setting of some variable in the address space of the application, or through the triggering of a signal or software interrupt or a call-back routine that is executed outside the linear control flow of the application. The difference between non-blocking and asynchronous system calls is that a non-blocking read() returns immediately with whatever data are available (the full number of bytes requested, fewer, or none at all), whereas an asynchronous read() requests a transfer that will be performed in its entirety but will complete at some future time. A sketch of non-blocking input using the POSIX O_NONBLOCK flag follows.
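The POSIX O_NONBLOCK flag gives the non-blocking behavior just described: read() returns immediately with whatever is available instead of suspending the caller. Standard input is used here purely as an illustration.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: put stdin in non-blocking mode and attempt one read. */
int main(void) {
    int flags = fcntl(STDIN_FILENO, F_GETFL, 0);
    fcntl(STDIN_FILENO, F_SETFL, flags | O_NONBLOCK);

    char buf[128];
    ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
    if (n > 0)
        printf("got %zd bytes\n", n);
    else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        printf("no data available right now; doing other work\n");
    else if (n == 0)
        printf("end of input\n");
    return 0;
}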

Kernel I/O Subsystem Kernels provide many services related to I/O. Several services—scheduling, buffering, caching, spooling, device reservation, and error handling—are provided by the kernel's I/O subsystem and build on the hardware and device-driver infrastructure.

I/O Scheduling To schedule a set of I/O requests means to determine a good order in which to execute them. The order in which applications issue system calls is rarely the best choice. Scheduling can improve overall system performance, share device access fairly among processes, and reduce the average waiting time for I/O to complete. Here is a simple example to illustrate the opportunity. Suppose that a disk arm is near the beginning of a disk, and that three applications issue blocking read calls to that disk. Application 1 requests a block near the end of the disk, application 2 requests one near the beginning, and application 3 requests one in the middle of the disk. The operating system can reduce the distance the disk arm travels by serving the applications in the order 2, 3, 1. Rearranging the order of service in this way is the essence of I/O scheduling. Operating-system developers implement scheduling by maintaining a queue of requests for each device. When an application issues a blocking I/O system call, the request is placed on the queue for that device. The I/O scheduler rearranges the order of the queue to improve the overall system efficiency and the average response time experienced by applications. The operating system may also try to be fair, so that no one application receives especially poor service, or it may give priority service to delay-sensitive requests. For instance, requests from the virtual-memory subsystem may take priority over application requests.

Buffering A buffer is a memory area that stores data while they are transferred between two devices or between a device and an application. Buffering is done for three reasons. One reason is to cope with a speed mismatch between the producer and consumer of a data stream. Suppose, for example, that a file is being received via modem for storage on the hard disk. The modem is about a thousand times slower than the hard disk. So a buffer is created in main memory to accumulate the bytes received from the modem. When an entire buffer of data has arrived, the buffer can be written to disk in a single operation. Since the disk write is not instantaneous and the modem still needs a place to store additional incoming data, two buffers are used. After the modem fills the first buffer, the disk write is requested. The modem then starts to fill the second buffer while the first buffer is written to disk. By the time the modem has filled the second buffer, the disk write from the first one should have completed, so the modem can switch back to the first buffer while the disk writes the second one. This double buffering decouples the producer of data from the consumer, thus relaxing timing requirements between them.


A second use of buffering is to adapt between devices that have different data-transfer sizes. Such disparities are especially common in computer networking, where buffers are used widely for fragmentation and reassembly of messages. At the sending side, a large message is fragmented into small network packets. The packets are sent over the network, and the receiving side places them in a reassembly buffer to form an image of the source data. A third use of buffering is to support copy semantics for application I/O. An example will clarify the meaning of "copy semantics": suppose an application calls write() on a buffer and then immediately modifies that buffer. With copy semantics, the version of the data written to disk is guaranteed to be the version at the time of the system call, which the operating system ensures by copying the data into a kernel buffer before returning control to the application.

Caching A cache is a region of fast memory that holds copies of data. Access to the cached copy is more efficient than access to the original. For instance, the instructions of the currently running process are stored on disk, cached in physical memory, and copied again in the CPU's secondary and primary caches. The difference between a buffer and a cache is that a buffer may hold the only existing copy of a data item, whereas a cache, by definition, just holds a copy, on faster storage, of an item that resides elsewhere. Caching and buffering are distinct functions, but sometimes a region of memory can be used for both purposes. For instance, to preserve copy semantics and to enable efficient scheduling of disk I/O, the operating system uses buffers in main memory to hold disk data. These buffers are also used as a cache, to improve the I/O efficiency for files that are shared by applications or that are being written and reread rapidly. When the kernel receives a file I/O request, the kernel first accesses the buffer cache to see whether that region of the file is already available in main memory. If so, a physical disk I/O can be avoided or deferred. Also, disk writes are accumulated in the buffer cache for several seconds, so that large transfers are gathered to allow efficient write schedules; this strategy of delaying writes improves I/O efficiency.

Spooling and Device Reservation A spool is a buffer that holds output for a device, such as a printer, that cannot accept interleaved data streams. Although a printer can serve only one job at a time, several applications may wish to print their output concurrently, without having their output mixed together. The operating system solves this problem by intercepting all output to the printer. Each application's output is spooled to a separate disk file. When an application finishes printing, the spooling system queues the corresponding spool file for output to the printer. The spooling system copies the queued spool files to the printer one at a time. In some operating systems, spooling is managed by a system daemon process. In other operating systems, it is handled by an in-kernel thread. In either case, the operating system provides a control interface that enables users and system administrators to display the queue, to remove unwanted jobs before those jobs print, to suspend printing while the printer is serviced, and so on.
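A toy spooler loop under those assumptions is shown below; the job file names and the /dev/printer path are hypothetical.

    /* Toy spool daemon: finished jobs are queued as separate files, and
       one loop copies them to the printer one at a time, so output from
       different applications never interleaves. */
    #include <stdio.h>

    static void print_file(const char *path, FILE *printer)
    {
        FILE *f = fopen(path, "rb");
        int c;
        if (f == NULL)
            return;                        /* skip an unreadable job */
        while ((c = fgetc(f)) != EOF)
            fputc(c, printer);
        fclose(f);
    }

    int main(void)
    {
        const char *queue[] = { "job1.spl", "job2.spl" };   /* queued jobs */
        FILE *printer = fopen("/dev/printer", "wb");        /* hypothetical */
        if (printer == NULL)
            return 1;
        for (int i = 0; i < 2; i++)        /* one job at a time */
            print_file(queue[i], printer);
        fclose(printer);
        return 0;
    }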

Error Handling An operating system that uses protected memory can guard against many kinds of hardware and application errors, so that a complete system failure is not the usual result of each minor mechanical glitch. Devices and I/O transfers can fail in many ways, either for transient reasons, such as a network becoming overloaded, or for "permanent" reasons, such as a disk controller becoming defective. Operating systems can often compensate effectively for transient failures. For instance, a disk read() failure results in a read() retry, and a network send() error results in a resend(), if the protocol so specifies. Unfortunately, if an important component experiences a permanent failure, the operating system is unlikely to recover.
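The retry behavior for transient failures follows a bounded-retry pattern like the sketch below; device_read() and its status codes are illustrative stand-ins.

    /* Retry a transient I/O failure a bounded number of times; give up
       immediately on a permanent failure. */
    #include <stddef.h>

    enum io_status { IO_OK, IO_TRANSIENT, IO_PERMANENT };

    extern enum io_status device_read(void *buf, size_t n);  /* hypothetical */

    enum io_status read_with_retry(void *buf, size_t n, int max_tries)
    {
        enum io_status s = IO_PERMANENT;
        for (int i = 0; i < max_tries; i++) {
            s = device_read(buf, n);
            if (s != IO_TRANSIENT)
                break;           /* success, or a failure retries cannot fix */
        }
        return s;
    }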

Kernel Data Structures The kernel needs to keep state information about the use of I/O components. It does so through a variety of in-kernel data structures, such as the open-file table structure. The kernel uses many similar structures to track network connections, character-device communications, and other I/O activities. UNIX provides file-system access to a variety of entities, such as user files, raw devices, and the address spaces of processes. Although each of these entities supports a read() operation, the semantics differ. For instance, to read a user file, the kernel needs to probe the buffer cache before deciding whether to perform a disk I/O. To read a raw disk, the kernel needs to ensure that the request size is a multiple of the disk sector size and is aligned on a sector boundary. To read a process image, it is merely necessary to copy data from memory. UNIX encapsulates these differences within a uniform structure by using an object-oriented technique: the open-file record contains a dispatch table that holds pointers to the appropriate routines, depending on the type of file.
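The dispatch-table technique can be sketched with C function pointers. The structure and field names below are illustrative, not the actual UNIX open-file record.

    /* Per-file-type dispatch table: the open-file record points to the
       routines that implement read() and write() for that kind of object. */
    #include <stddef.h>
    #include <sys/types.h>

    struct file;                           /* open-file record (opaque here) */

    struct file_ops {
        ssize_t (*read)(struct file *f, void *buf, size_t n);
        ssize_t (*write)(struct file *f, const void *buf, size_t n);
    };

    struct file {
        const struct file_ops *ops;        /* installed at open() time */
        void *private_data;                /* type-specific state */
    };

    /* The generic read path never tests the file type; it simply
       dispatches through the table chosen when the file was opened. */
    ssize_t file_read(struct file *f, void *buf, size_t n)
    {
        return f->ops->read(f, buf, n);
    }

A user file, a raw device, and a process image would each install a different file_ops table at open() time, while callers of file_read() see one uniform interface.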

The I/O subsystem coordinates an extensive collection of services that are available to applications and to other parts of the kernel. The I/O subsystem supervises:
• The management of the name space for files and devices
• Access control to files and devices
• Operation control (for example, a modem cannot seek())
• File-system space allocation
• Device allocation
• Buffering, caching, and spooling
• I/O scheduling
• Device-status monitoring, error handling, and failure recovery
• Device-driver configuration and initialization
The upper levels of the I/O subsystem access devices via the uniform interface provided by the device drivers.
