Explain Reaching Agreement.




Reaching Agreement

 For a system to be reliable, we need a mechanism that allows a set of processes to agree on a common value. Such an agreement may not take place, for several reasons. First, the communication medium may be faulty, resulting in lost or garbled messages. Second, the processes themselves may be faulty, resulting in unpredictable process behavior.

The best we can hope for in this case is that processes fail in a clean way, stopping their execution without deviating from their normal execution pattern. In the worst case, processes may send garbled or incorrect messages to other processes or even collaborate with other failed processes in an attempt to destroy the integrity of the system.

The Byzantine generals problem provides an analogy for this situation. Several divisions of the Byzantine army, each commanded by its own general, surround an enemy camp. The Byzantine generals must reach agreement on whether or not to attack the enemy at dawn. It is crucial that all generals agree, since an attack by only some of the divisions would result in defeat.

 The various divisions are geographically dispersed, and the generals can communicate with one another only via messengers who run from camp to camp. The generals may not be able to reach agreement for at least two major reasons:

1. Messengers may get caught by the enemy and thus may be unable to deliver their messages. This situation corresponds to unreliable communication in a computer system and is discussed further in Section 18.7.1.

2. Generals may be traitors, trying to prevent the loyal generals from reaching an agreement. This situation corresponds to faulty processes in a computer system and is discussed further in Section 18.7.2.

Unreliable Communications

Let us first assume that, if processes fail, they do so in a clean way and that the communication medium is unreliable. Suppose that process Pi at site S1, which has sent a message to process Pj at site S2, needs to know whether Pj has received the message so that it can decide how to proceed with its computation. For example, Pi may decide to compute a function foo if Pj has received its message or to compute a function boo if Pj has not received the message (because of some hardware failure). To detect failures, we can use a time-out scheme similar to the one described in Section 16.7.1.


When Pi sends out a message, it also specifies a time interval during which it is willing to wait for an acknowledgment message from Pj. When Pj receives the message, it immediately sends an acknowledgment to Pi. If Pi receives the acknowledgment message within the specified time interval, it can safely conclude that Pj has received its message. If, however, a time-out occurs, then Pi needs to retransmit its message and wait for an acknowledgment. This procedure continues until Pi either gets the acknowledgment message back or is notified by the system that site S2 is down. In the first case, it will compute foo; in the latter case, it will compute boo.

Note that, if these are the only two viable alternatives, Pi must wait until it has been notified that one of the situations has occurred.
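The retransmission procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `send_and_wait`, `site_is_down`, and `max_attempts` are hypothetical names that do not appear in the text, the time-out is modeled abstractly as a callback that reports whether an acknowledgment arrived in time, and the attempt limit exists only so that the sketch always terminates (the procedure as described keeps retrying).

```python
from typing import Callable, Optional


def decide_after_send(
    send_and_wait: Callable[[int], bool],
    site_is_down: Callable[[], bool],
    max_attempts: Optional[int] = None,
) -> str:
    """Sketch of the sender's loop.

    send_and_wait(attempt) -- hypothetical callback: transmits the message and
        returns True if an acknowledgment arrived within the time interval.
    site_is_down() -- hypothetical callback: True once the system has reported
        that the receiver's site is down.
    max_attempts -- assumption added for this sketch; None means retry forever.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        if send_and_wait(attempt):
            return "foo"  # acknowledgment received: the message arrived
        if site_is_down():
            return "boo"  # failure reported: take the alternative branch
        # Otherwise a time-out occurred: loop around and retransmit.
    return "undecided"  # neither event has occurred yet
```

For example, `decide_after_send(lambda attempt: attempt >= 3, lambda: False)` models a channel that loses the first two transmissions; the sender retransmits twice and then chooses foo.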

Suppose now that Pj also needs to know that Pi has received its acknowledgment message, so that it can decide how to proceed with its computation. For example, Pj may want to compute foo only if it is assured that Pi got its acknowledgment. In other words, Pi and Pj will compute foo if and only if both have agreed on it. It turns out that, in the presence of failure, it is not possible to accomplish this task. More precisely, it is not possible in a distributed environment for processes Pi and Pj to agree completely on their respective states.

To prove this claim, let us suppose that a minimal sequence of message transfers exists such that, after the messages have been delivered, both processes agree to compute foo. Let m' be the last message sent by Pi to Pj. Since Pi does not know whether its message will arrive at Pj (since the message may be lost due to a failure), Pi will execute foo regardless of the outcome of the message delivery. Thus, m' could be removed from the sequence without affecting the decision procedure. Hence, the original sequence was not minimal, contradicting our assumption and showing that there is no sequence. The processes can never be sure that both will compute foo.

Faulty Processes

Now let us assume that the communication medium is reliable but that processes can fail in unpredictable ways. Consider a system of n processes, of which no more than m are faulty. Suppose that each process Pi has some private value Vi. We wish to devise an algorithm that allows each nonfaulty process Pi to construct a vector Xi = (Ai,1, Ai,2, ..., Ai,n) such that the following conditions exist:

1. If Pj is a nonfaulty process, then Ai,j = Vj.

2. If Pi and Pj are both nonfaulty processes, then Xi = Xj.

There are many solutions to this problem, and they share the following properties:

1. A correct algorithm can be devised only if n ≥ 3 × m + 1.

2. The worst-case delay for reaching agreement is proportionate to m + 1 message-passing delays.

3. The number of messages required for reaching agreement is large. No single process is trustworthy, so all processes must collect all information and make their own decisions.

Rather than presenting a general solution, which would be complicated, we present an algorithm for the simple case where m = 1 and n = 4.

The algorithm requires two rounds of information exchange:

 1. Each process sends its private value to the other three processes.

2. Each process sends the information it has obtained in the first round to all other processes.

A faulty process obviously may refuse to send messages. In this case, a nonfaulty process can choose an arbitrary value and pretend that the value was sent by the faulty process.

Once these two rounds are completed, a nonfaulty process Pi can construct its vector Xi = (Ai,1, Ai,2, Ai,3, Ai,4) as follows:

1. Ai,i = Vi.

2. For j ≠ i, if at least two of the three values reported for process Pj (in the two rounds of exchange) agree, then the majority value is used to set the value of Ai,j. Otherwise, a default value (say, nil) is used to set the value of Ai,j.
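To make the two rounds concrete, here is a small Python simulation of the four-process, one-fault case. It is a sketch under stated assumptions, not a reference implementation: the function `agree`, the `faulty` index, and the `lie` callback (which lets the faulty process transmit anything it likes to each receiver) are hypothetical names introduced for illustration, processes are numbered 0 to 3, and `None` stands in for the default value nil.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Optional

N = 4  # the simple case from the text: n = 4 processes, at most m = 1 faulty

# lie(sender, receiver, subject, honest_value) -> value actually transmitted
Lie = Callable[[int, int, int, Hashable], Hashable]


def agree(values: List[Hashable],
          faulty: Optional[int] = None,
          lie: Optional[Lie] = None) -> Dict[int, List[Hashable]]:
    """Return the vector built by every nonfaulty process after two rounds."""

    def transmit(sender: int, receiver: int, subject: int, honest: Hashable):
        # Only the faulty process may deviate from the honest value.
        if sender == faulty and lie is not None:
            return lie(sender, receiver, subject, honest)
        return honest

    # Round 1: direct[i][j] is what process i received from process j.
    direct = [[None] * N for _ in range(N)]
    for j in range(N):
        for i in range(N):
            if i != j:
                direct[i][j] = transmit(j, i, j, values[j])

    # Round 2: relayed[i][k][j] is what k tells i that it heard from j.
    relayed = [[[None] * N for _ in range(N)] for _ in range(N)]
    for k in range(N):
        for i in range(N):
            for j in range(N):
                if len({i, j, k}) == 3:
                    relayed[i][k][j] = transmit(k, i, j, direct[k][j])

    vectors: Dict[int, List[Hashable]] = {}
    for i in range(N):
        if i == faulty:
            continue  # only nonfaulty processes are required to agree
        vector = []
        for j in range(N):
            if j == i:
                vector.append(values[i])  # a process knows its own value
                continue
            # Three reports about j: one direct, two relayed by the others.
            reports = [direct[i][j]]
            reports += [relayed[i][k][j] for k in range(N) if k not in (i, j)]
            value, count = Counter(reports).most_common(1)[0]
            vector.append(value if count >= 2 else None)
        vectors[i] = vector
    return vectors
```

Calling `agree(["a", "b", "c", "d"], faulty=2, lie=lambda s, r, subj, honest: f"junk-{r}-{subj}")` models a faulty process 2 that sends a different value to every receiver. The three nonfaulty processes still build identical vectors: the entries for the honest processes carry their true values, and the entry for process 2 falls back to the default because no two reports about it agree.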






