Atomic Transactions

Atomic Transactions

The mutual exclusion of critical sections ensures that the critical sections are executed atomically. That is, if two critical sections are executed concurrently, the result is equivalent to their sequential execution in some unknown order. Although this property is useful in many application domains, in many cases we would like to make sure that a critical section forms a single logical unit of work that either is performed in its entirety or is not performed at all.

An example is funds transfer, in which one account is debited and another is credited. Clearly, it is essential for data consistency either that both the credit and debit occur or that neither occur. Consistency of data, along with storage and retrieval of data, is a concern often associated with database systems. Recently, there has been an upsurge of interest in using database-systems techniques in operating systems.

 Operating systems can be viewed as manipulators of data; as such, they can benefit from the advanced techniques and models available from database research. For instance, many of the ad hoc techniques used in operating systems to manage files could be more flexible and powerful if more formal database methods were used in their place. In Sections 6.9.2 to 6.9.4, we describe some of these database techniques and explain how they can be used by operating systems. First, however, we deal with the general issue of transaction atomicity. It is this property that the database techniques are meant to address.

System Model

 A collection of instructions (or operations) that performs a single logical function is called a transaction. A major issue in processing transactions is the preservation of atomicity despite the possibility of failures within the computer system. We can think of a transaction as a program unit that accesses and perhaps updates various data items that reside on a disk within some files. From our point of view, such a transaction is simply a sequence of read and write operations terminated by either a commit operation or an abort operation.

A commit operation signifies that the transaction has terminated its execution successfully, whereas an abort operation signifies that the transaction has ended its normal execution due to some logical error or a system failure. If a terminated transaction has completed its execution successfully, it is committed; otherwise, it is aborted. Since an aborted transaction may already have modified the data that it has accessed, the state of these data may not be the same as it would have been if the transaction had executed atomically. So that atomicity is ensured an aborted transaction must have no effect on the state of the data that it has already modified. Thus, the state of the data accessed by an aborted transaction must be restored to what it was just before the transaction started executing.

 We say that such a transaction has been rolled back. It is part of the responsibility of the system to ensure this property. To determine how the system should ensure atomicity, we need first to identify the properties of devices used for storing the various data accessed by the transactions. Various types of storage media are distinguished by their relative speed, capacity, and resilience to failure.

• Volatile storage. Information residing in volatile storage does not usually survive system crashes. Examples of such storage are main and cache memory. Access to volatile storage is extremely fast, both because of the speed of the memory access itself and because it is possible to access directly any data item in volatile storage.

• Nonvolatile storage. Information residing in nonvolatile storage usually survives system crashes. Examples of media for such storage are disks and magnetic tapes. Disks are more reliable than main memory but less reliable than magnetic tapes. Both disks and tapes, however, are subject to failure, which may result in loss of information. Currently, nonvolatile storage is slower than volatile storage by several orders of magnitude, because disk and tape devices are electromechanical and require physical motion to access data.

• Stable storage. Information residing in stable storage is never lost (never should be taken with a grain of salt, since theoretically such absolutes cannot be guaranteed). To implement an approximation of such storage, we need to replicate information in several nonvolatile storage caches (usually disk) with independent failure modes and to update the information in a controlled manner (Section 12.8). Here, we are concerned only with ensuring transaction atomicity in an environment where failures result in the loss of information on volatile storage.

Log-Based Recovery

One way to ensure atomicity is to record, on stable storage, information describing all the modifications made by the transaction to the various data it accesses. The most widely used method for achieving this form of recording is write-ahead logging. Here, the system maintains, on stable storage, a data structure called the log. Each log record describes a single operation of a transaction write and has the following fields:

• Transaction name. The unique name of the transaction that performed the write operation

• Data item name. The unique name of the data item written

 • Old value. The value of the data item prior to the write operation

 • New value.

The value that the data item will have after the write Other special log records exist to record significant events during transaction processing, such as the start of a transaction and the commit or abort of a transaction. Before a transaction T, starts its execution, the record < T, starts> is written to the log. During its execution, any write operation by T, is preceded by the writing of the appropriate new record to the log. When 7/ commits, the record < T, commits> is written to the log. Because the information in the log is used in reconstructing the state of the data items accessed by the various transactions, we cannot allow the actual update to a data item to take place before the corresponding log record is written out to stable storage. We therefore require that, prior to execution of a write(X) operation, the log records corresponding to X be written onto stable storage. Note the performance penalty inherent in this system. Two physical writes are required for every logical write requested. Also, more storage is needed, both for the data themselves and for the log recording the changes.

In cases where the data are extremely important and fast failure recovery is necessary, the price is worth the functionality. Using the log, the system can handle any failure that does not result in the loss of information on nonvolatile storage. The recovery algorithm uses two procedures:

• undo(TJ), which restores the value of all data updated by transaction T, to the old values

• redo(Tj), which sets the value of all data updated by transaction T; to the new values The set of data updated by 7} and their respective old and new values can be found in the log. The undo and redo operations must be idempotent (that is, multiple executions must have the same result as does one execution) to guarantee correct behavior, even if a failure occurs during the recovery process. If a transaction 7} aborts, then we can restore the state of the data that it has updated by simply executing undo(7}).

If a system failure occurs, we restore the state of all updated data by consulting the log to determine which transactions need to be redone and which need to be undone. This classification of transactions is accomplished as follows:

• Transaction 7, needs to be undone if the log contains the < T, starts> record but does not contain the < T-, commits> record.

• Transaction T, needs to be redone if the log contains both the < T, starts> and the < 7/ commits> records.


 When a system failure occurs, we must consult the log to determine those transactions that need to be redone and those that need to be undone. In principle, we need to search the entire log to make these determinations. There are two major drawbacks to this approach: 6.9 Atomic Transactions 225 1. The searching process is time consuming. » 2. Most of the transactions that, according to our algorithm, need to be redone have already actually updated the data that the log says they need to modify. Although redoing the data modifications will cause no harm (due to idempotency), it will nevertheless cause recovery to take longer.

To reduce these types of overhead, we introduce the concept of checkpoints. During execution, the system maintains the write-ahead log. In addition, the system periodically performs checkpoints that require the following sequence of actions to take place:

 1. Output all log records currently residing in volatile storage (usually main memory) onto stable storage.

 2. Output all modified data residing in volatile storage to the stable storage.

3. Output a log record onto stable storage. The presence of a record in the log allows the system to streamline its recovery procedure. Consider a transaction Tj that committed prior to the checkpoint.


The < T, commit s> record appears in the log before the record, and then finding the subsequent < Ti start > record. Once transaction Tj has been identified, the redo and undo operations need be applied only to transaction Tj and all transactions Tj that started executing after transaction Tj-. We'll call these transactions set T. The remainder of the log can thus be ignored.

 The recovery operations that are required are as follows: a For all transactions Tjt in T such that the record < Tj;- commits> appears in the log, execute redo(T)t). • For all transactions Tj- in T that have no < Ti- commits> record in the log, execute undo(TO- 6.9.4 Concurrent Atomic Transactions We have been considering an environment in which only one transaction can be executing at a time. We now turn to the case where multiple transactions are active simultaneously. Because each transaction is atomic, the concurrent execution of transactions must be equivalent to the case where these transactions are executed serially in some arbitrary order. This property, called serializability, can be maintained by simply executing each transaction within 226 Chapter 6 Process Synchronization a critical section. That is, all transactions share a common semaphore mutex, which is initialized to 1.

When a transaction starts executing, its first action is to execute wa.±t(mutex). After the transaction either commits or aborts, it executes signal(/?z«ta')- Although this scheme ensures the atomicity of all concurrently executing transactions, it is nevertheless too restrictive. As we shall see, in many cases we can allow transactions to overlap their execution while maintaining serializability. A number of different concurrency-control algorithms ensure serializability. These algorithms are described below.


 Consider a system with two data items, A and B, that are both read and written by two transactions, To and T\. Suppose that these transactions are executed atomically in the order To followed by T\. This execution sequence, which is called a schedule, is represented in Figure 6.22. In schedule 1 of Figure 6.22, the sequence of instruction steps is in chronological order from top to bottom, with instructions of To appearing in the left column and instructions of T\ appearing in the right column. A schedule in which each transaction is executed atomically is called a serial schedule. A serial schedule consists of a sequence of instructions from various transactions wherein the instructions belonging to a particular transaction appear together.

 Thus, for a set of n transactions, there exist n\ different valid serial schedules. Each serial schedule is correct, because it is equivalent to the atomic execution of the various participating transactions in some arbitrary order. If we allow the two transactions to overlap their execution, then the resulting schedule is no longer serial. A nonserial schedule does not necessarily imply an incorrect execution (that is, an execution that is not equivalent to one represented by a serial schedule).

Atomic Transactions

 To see that this is the case, we need to define the notion of conflicting operations. Consider a schedule S in which there are two consecutive operations O,- and Oj of transactions T, and Tj, respectively. We say that O, and Oj conflict if they access the same data item and at least one of them is a write operation. To illustrate the concept of conflicting operations, we consider the nonserial schedule 2 of Figure 6.23. The write(A) operation of To conflicts with the read(A) operation of Ti.

However, the write(A) operation of T\ does not conflict with the read(B) operation of To, because the two operations access different data items. Let Oj and Oj be consecutive operations of a schedule S. If O, and O; are operations of different transactions and O-, and Oj do not conflict, then we can swap the order of O, and 0/ to produce a new schedule S'. We expect S to be equivalent to S', as all operations appear in the same order in both schedules, except for O, and Oj, whose order does not matter. We can illustrate the swapping idea by considering again schedule 2 of Figure 6.23.

As the write(A) operation of T\ does not conflict with the read(B) operation of To, we can swap these operations to generate an equivalent schedule. Regardless of the initial system state, both schedules produce the same final system state. Continuing with this procedure of swapping nonconflicting operations, we get:

 • Swap the read(B) operation of TQ with the read(A) operation of T\.

• Swap the write(B) operation of To with the write(A) operation of T\.

• Swap the write(B) operation of To with the read(A) operation of T\. The final result of these swaps is schedule 1 in Figure 6.22, which is a serial schedule. Thus, we have shown that schedule 2 is equivalent to a serial schedule. This result implies that, regardless of the initial system state, schedule 2 will produce the same final state as will some serial schedule. If a schedule S can be transformed into a serial schedule S' by a series of swaps of nonconflicting operations, we say that a schedule S is conflict serializable. Thus, schedule 2 is conflict serializable, because it can be transformed into the serial schedule 1

Locking Protocol

One way to ensure serializability is to associate with each data item a lock and to require that each transaction follow a locking protocol that governs how locks are acquired and released. There are various modes in which a data item can be locked. In this section, we restrict our attention to two modes: 228 Chapter 6 Process Synchronization

• Shared. If a transaction X, has obtained a shared-mode lock (denoted by S) on data item Q, then 7] can read this item but cannot write Q

 • Exclusive. If a transaction T, has obtained an exclusive-mode lock (denoted by X) on data item Q, then 7} can both read and write Q. We require that every transaction request a lock in an appropriate mode on data item Q, depending on the type of operations it will perform on Q. To access data item Q, transaction 7} must first lock Q in the appropriate mode. If Q is not currently locked, then the lock is granted, and T; can now access it. However, if the data item Q is currently locked by some other transaction, then 7) may have to wait. More specifically, suppose that 7} requests an exclusive lock on Q. In this case, 7] must wait until the lock on Q is released. If T, requests a shared lock on Q, then 7) must wait if Q is locked in exclusive mode.

 Otherwise, it can obtain the lock and access Q. Notice that this scheme is quite similar to the readers-writers algorithm discussed in Section 6.6.2. A transaction may unlock a data item that it locked at an earlier point. It must, however, hold a lock on a data item as long as it accesses that item. Moreover, it is not always desirable for a transaction to unlock a data item immediately after its last access of that data item, because serializability may not be ensured.

One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each transaction issue lock and unlock requests in two phases:

• Growing phase. A transaction may obtain locks but may not release any lock.

• Shrinking phase. A transaction may release locks but may not obtain any new locks. Initially, a transaction is in the growing phase. The transaction acquires locks as needed.

Once the transaction releases a lock, it enters the shrinking phase, and no more lock requests can be issued. The two-phase locking protocol ensures conflict serializability (Exercise 6.25). It does not, however, ensure freedom from deadlock. In addition, it is possible that, for a given set of transactions, there are conflict-serializable schedules that cannot be obtained by use of the two-phase locking protocol. However, to improve performance over two-phase locking, we need either to have additional information about the transactions or to impose some structure or ordering on the set of data.

Timestamp-Based Protocols

 In the locking protocols described above, the order followed by pairs of conflicting transactions is determined at execution time by the first lock that both request and that involves incompatible modes. Another method for determining the serializability order is to select an order in advance. The most common method for doing so is to use a timestamp ordering scheme. With each transaction T, in the system, we associate a unique fixed timestamp, denoted by TS(T/). This timestamp is assigned by the system 6.9 Atomic Transactions 229 before the transaction T, starts execution. If a transaction 7} has been assigned timestamp TS(Tj-), and later a new transaction 7) enters the system, then TS(7}) < TS(Tj). There are two simple methods for implementing this scheme:

• Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system. This method will not work for transactions that occur on separate systems or for processors that do not share a clock.

• Use a logical counter as the timestamp; that is, a transaction's timestamp is equal to the value of the counter when the transaction enters the system. The counter is incremented after a new timestamp is assigned. The timestamps of the transactions determine the serializability order. Thus, if TS(T,) < TS(T,), then the system must ensure that the produced schedule is equivalent to a serial schedule in which transaction T, appears before transaction T,. To implement this scheme, we associate with each data item Q two timestamp values:

• W-timestamp(Q) denotes the largest timestamp of any transaction that successfully executed write(Q).

 • R-timestamp(Q) denotes the largest timestamp of any transaction that successfully executed read(Q). These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed. The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. This protocol operates as follows:

• Suppose that transaction T,- issues read(Q): o If TS(T,) < W-timestamp(), then T, needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Tj- is rolled back. o If TS(TJ) > W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(T,).

 • Suppose that transaction 7} issues write(Q): o If TS(T,) < R-timestamp(Q), then the value of Q that 7} is producing was needed previously and T,- assumed that this value would never be produced. Hence, the write operation is rejected, and 7} is rolled back. => If TS(T,) < W-timestamp(Q), then T, is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and T, is rolled back. o Otherwise, the write operation is executed. A transaction T, that is rolled back as a result of the issuing of either a read or write operation is assigned a new timestamp and is restarted.

Atomic Transactions

To illustrate this protocol, consider schedule 3 of Figure 6.24, which includes transactions % and T3. We assume that a transaction is assigned a timestamp immediately before its first instruction. Thus, in schedule 3, TS(T2) < TS(T3), and the schedule is possible under the timestamp protocol. This execution can also be produced by the two-phase locking protocol. However, some schedules are possible under the two-phase locking protocol but not under the timestamp protocol, and vice versa. The timestamp protocol ensures conflict serializability. This capability follows from the fact that conflicting operations are processed in timestamp order. The protocol also ensures freedom from deadlock, because no transaction ever waits.

Frequently Asked Questions

Ans: Overview of Mass-Storage Structure In this section we present a general overview of the physical structure of secondary and tertiary storage devices. view more..
Ans: Types of System Calls System calls can be grouped roughly into five major categories: process control, file manipulation, device manipulation, information maintenance, and communications. In Sections 2.4.1 through 2.4.5, we discuss briefly the types of system calls that may be provided by an operating system. view more..
Ans: Event Ordering In a centralized system, we can always determine the order in which two events occurred, since the system has a single common memory and clock. Many applications may require us to determine order. For example, in a resourceallocation scheme, we specify that a resource can be used only after the resource has been granted. A distributed system, however, has no common memory and no common clock. Therefore, it is sometimes impossible to say which of two events occurred first. The liappened-before relation is only a partial ordering of the events in distributed systems. Since the ability to define a total ordering is crucial in many applications, we present a distributed algorithm for exterding the happened-before relation to a consistent total ordering of all the events in the system. view more..
Ans: Atomic Transactions The mutual exclusion of critical sections ensures that the critical sections are executed atomically. That is, if two critical sections are executed concurrently, the result is equivalent to their sequential execution in some unknown order. Although this property is useful in many application domains, in many cases we would like to make sure that a critical section forms a single logical unit of work that either is performed in its entirety or is not performed at all. An example is funds transfer, in which one account is debited and another is credited. Clearly, it is essential for data consistency either that both the credit and debit occur or that neither occur. Consistency of data, along with storage and retrieval of data, is a concern often associated with database systems. Recently, there has been an upsurge of interest in using database-systems techniques in operating systems. view more..
Ans: Programmer Interface The Win32 API is the fundamental interface to the capabilities of Windows XP. This section describes five main aspects of the Win32 API: access to kernel objects, sharing of objects between processes, process management, interprocess communication, and memory management. view more..
Ans: Memory Management The main memory is central to the operation of a modern computer system. Main memory is a large array of words or bytes, ranging in size from hundreds of thousands to billions. Each word or byte has its own address. Main memory is a repository of quickly accessible data shared by the CPU and I/O devices. The central processor reads instructions from main memory during the instruction-fetch cycle and both reads and writes data from main memory during the data-fetch cycle (on a Von Neumann architecture). The main memory is generally the only large storage device that the CPU is able to address and access directly. view more..
Ans: Storage Management To make the computer system convenient for users, the operating system provides a uniform, logical view of information storage. The operating system abstracts from the physical properties of its storage devices to define a logical storage unit, the file. The operating system maps files onto physical media and accesses these files via the storage devices view more..
Ans: Protection and Security If a computer system has multiple users and allows the concurrent execution of multiple processes, then access to data must be regulated. For that purpose, mechanisms ensure that files, memory segments, CPU, and other resources can be operated on by only those processes that have gained proper authorization from the operating system. For example, memory-addressing hardware ensures that a process can execute only within its own address space. view more..
Ans: Distributed Systems A distributed system is a collection of physically separate, possibly heterogeneous computer systems that are networked to provide the users with access to the various resources that the system maintains. Access to a shared resource increases computation speed, functionality, data availability, and reliability. Some operating systems generalize network access as a form of file access, with the details of networking contained in the network interface's device driver. view more..
Ans: Special-Purpose Systems The discussion thus far has focused on general-purpose computer systems that we are all familiar with. There are, however, different classes of computer systems whose functions are more limited and whose objective is to deal with limited computation domains. view more..
Ans: Operating systems provide a number of services. At the lowest level, system calls allow a running program to make requests from the operating system directly. At a higher level, the command interpreter or shell provides a mechanism for a user to issue a request without writing a program. Commands may come from files during batch-mode execution or directly from a terminal when in an interactive or time-shared mode. System programs are provided to satisfy many common user requests. The types of requests vary according to level. view more..
Ans: Summary A thread is a flow of control within a process. A multithreaded process contains several different flows of control within the same address space. The benefits of multithreading include increased responsiveness to the user, resource sharing within the process, economy, and the ability to take advantage of multiprocessor architectures. User-level threads are threads that are visible to the programmer and are unknown to the kernel. view more..
Ans: Motivation A distributed system is a collection of loosely coupled processors interconnected by a communication network. From the point of view of a specific processor in a distributed system, the rest of the processors and their respective resources are remote, whereas its own resources are local. The processors in a distributed system may vary in size and function. They may include small microprocessors, workstations, minicomputers, and large general-purpose computer systems. view more..
Ans: Summary Multimedia applications are in common use in modern computer systems. Multimedia files include video and audio files, which may be delivered to systems such as desktop computers, personal digital assistants, and cell phones. view more..
Ans: Summary CPU scheduling is the task of selecting a waiting process from the ready queue and allocating the CPU to it. The CPU is allocated to the selected process by the dispatcher. First-come, first-served (FCFS) scheduling is the simplest scheduling algorithm, but it can cause short processes to wait for very long processes. Shortestjob-first (SJF) scheduling is provably optimal, providing the shortest average waiting time. Implementing SJF scheduling is difficult, however, because predicting the length of the next CPU burst is difficult. view more..
Ans: Summary It is desirable to be able to execute a process whose logical address space is larger than the available physical address space. Virtual memory is a technique that enables us to map a large logical address space onto a smaller physical memory. Virtual memory allowr s us to run extremely large processes and to raise the degree of multiprogramming, increasing CPU utilization. Further, it frees application programmers from worrying about memory availability. In addition, with virtual memory, several processes can share system libraries and memory. view more..
Ans: Summary Disk drives are the major secondary-storage I/O devices on most computers. Most secondary storage devices are either magnetic disks or magnetic tapes. Modern disk drives are structured as a large one-dimensional array of logical disk blocks which is usually 512 bytes. Disks may be attached to a computer system in one of two ways: (1) using the local I/O ports on the host computer or (2) using a network connection such as storage area networks. view more..
Ans: Microsoft designed Windows XP to be an extensible, portable operating system —one able to take advantage of new techniques and hardware. Windows XP supports multiple operating environments and symmetric multiprocessing, including both 32-bit and 64-bit processors and NUMA computers. view more..

Rating - 3/5