I/O performance
I/O is a major factor in system performance. It places heavy demands on the CPU to execute device-driver code and to schedule processes fairly and efficiently as they block and unblock. The resulting context switches stress the CPU and its hardware caches. I/O also exposes any inefficiencies in the interrupt-handling mechanisms in the kernel.
In addition, I/O loads down the memory bus during data copies between controllers and physical memory and again during copies between kernel buffers and application data space. Coping gracefully with all these demands is one of the major concerns of a computer architect. Although modern computers can handle many thousands of interrupts per second, interrupt handling is a relatively expensive task:
Each interrupt causes the system to perform a state change, to execute the interrupt handler, and then to restore state. Programmed I/O can be more efficient than interrupt-driven I/O if the number of cycles spent in busy waiting is not excessive. An I/O completion typically unblocks a process, leading to the full overhead of a context switch. Network traffic can also cause a high context-switch rate. Consider, for instance, a remote login from one machine to another.
Each character typed on the local machine must be transported to the remote machine. On the local machine, the character is typed; a keyboard interrupt is generated; and the character is passed through the interrupt handler to the device driver, to the kernel, and then to the user process. The user process issues a network I/O system call to send the character to the remote machine. The character then flows into the local kernel, through the network layers that construct a network packet, and into the network device driver. The network device driver transfers the packet to the network controller, which sends the character and generates an interrupt.
The interrupt is passed back up through the kernel to cause the network I/O system call to complete. Now, the remote system's network hardware receives the packet, and an interrupt is generated. The character is unpacked from the network protocols and is given to the appropriate network daemon. The network daemon identifies which remote login session is involved and passes the packet to the appropriate subdaemon for that session.
Throughout this flow, there are context switches and state switches (Figure 13.15). Usually, the receiver echoes the character back to the sender; that approach doubles the work. To eliminate the context switches involved in moving each character between daemons and the kernel, the Solaris developers reimplemented the telnet daemon using in-kernel threads. Sun estimates that this improvement increased the maximum number of network logins from a few hundred to a few thousand on a large server.
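The remote-login flow described above can be tallied with a small model. This is only a sketch: the stage list paraphrases the steps in the text, the three event kinds and the names `ONE_WAY_PATH` and `count_events` are illustrative assumptions, and the echo is modeled simply as the same path taken a second time. It gives event counts, not measured costs.

```python
# Simplified model of the path one typed character takes during a remote
# login. The stages follow the text; classifying them into three kinds of
# event is an assumption made for illustration, not a measurement.

INTERRUPT = "interrupt"
SYSCALL = "system call"
HANDOFF = "hand-off"  # data passed between kernel layers or processes

ONE_WAY_PATH = [
    ("local", INTERRUPT, "keyboard interrupt for the typed character"),
    ("local", HANDOFF, "interrupt handler -> device driver -> kernel -> user process"),
    ("local", SYSCALL, "user process issues a network I/O system call"),
    ("local", HANDOFF, "network layers build packet -> network driver -> controller"),
    ("local", INTERRUPT, "network controller reports that the packet was sent"),
    ("remote", INTERRUPT, "network hardware receives the packet"),
    ("remote", HANDOFF, "protocol layers unpack the character -> network daemon"),
    ("remote", HANDOFF, "network daemon -> subdaemon for the login session"),
]


def count_events(path, echo=False):
    """Count events by kind; an echo is modeled as the same path repeated."""
    counts = {}
    for _machine, kind, _description in path:
        counts[kind] = counts.get(kind, 0) + 1
    if echo:
        # The text notes that echoing the character doubles the work.
        counts = {kind: 2 * n for kind, n in counts.items()}
    return counts


if __name__ == "__main__":
    print("one way:  ", count_events(ONE_WAY_PATH))
    print("with echo:", count_events(ONE_WAY_PATH, echo=True))
```

Even in this coarse model, a single keystroke involves several interrupts and hand-offs on each machine, which is why removing the daemon-to-kernel crossings can raise the number of sessions a server can support.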
Other systems use separate front-end processors for terminal I/O to reduce the interrupt burden on the main CPU. For instance, a terminal concentrator can multiplex the traffic from hundreds of remote terminals into one port on a large computer. An I/O channel is a dedicated, special-purpose CPU found in mainframes and in other high-end systems. The job of a channel is to offload I/O work from the main CPU. The idea is that the channels keep the data flowing smoothly, while the main CPU remains free to process the data. Compared with the device controllers and DMA controllers found in smaller computers, a channel can process more general and sophisticated programs, so channels can be tuned for particular workloads.
We can employ several principles to improve the efficiency of I/O:
• Reduce the number of context switches.
• Reduce the number of times that data must be copied in memory while passing between device and application.
• Reduce the frequency of interrupts by using large transfers, smart controllers, and polling (if busy waiting can be minimized).
• Increase concurrency by using DMA-knowledgeable controllers or channels to offload simple data copying from the CPU.
• Move processing primitives into hardware, to allow their operation in device controllers to be concurrent with CPU and bus operation.
• Balance CPU, memory subsystem, bus, and I/O performance, because an overload in any one area will cause idleness in others.
Devices vary greatly in complexity. For instance, a mouse is simple. The mouse movements and button clicks are converted into numeric values that are passed from hardware, through the mouse device driver, to the application. By contrast, the functionality provided by the Windows NT disk device driver is complex. It not only manages individual disks but also implements RAID arrays (Section 12.7). To do so, it converts an application's read or write request into a coordinated set of disk I/O operations. Moreover, it implements sophisticated error-handling and data-recovery algorithms and takes many steps to optimize disk performance.
Where should the I/O functionality be implemented: in the device hardware, in the device driver, or in application software? Sometimes we observe the progression depicted in Figure 13.16.
• Initially, we implement experimental I/O algorithms at the application level, because application code is flexible and application bugs are unlikely to cause system crashes. Furthermore, by developing code at the application level, we avoid the need to reboot or reload device drivers after every change to the code. An application-level implementation can be inefficient, however, because of the overhead of context switches and because the application cannot take advantage of internal kernel data structures and kernel functionality (such as efficient in-kernel messaging, threading, and locking).
• When an application-level algorithm has demonstrated its worth, we may reimplement it in the kernel. This can improve performance, but the development effort is more challenging, because an operating-system kernel is a large, complex software system. Moreover, an in-kernel implementation must be thoroughly debugged to avoid data corruption and system crashes.
• The highest performance may be obtained by a specialized implementation in hardware, either in the device or in the controller. The disadvantages of a hardware implementation include the difficulty and expense of making further improvements or of fixing bugs, the increased development time (months rather than days), and the decreased flexibility. For instance, a hardware RAID controller may not provide any means for the kernel to influence the order or location of individual block reads and writes, even if the kernel has special information about the workload that would enable the kernel to improve the I/O performance.
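One of the efficiency principles listed earlier is to reduce per-operation overhead by using large transfers. At the application level, the same idea shows up as the number of system calls needed to move a fixed amount of data. The following sketch reads one temporary file twice with `os.read`, once in small chunks and once in large chunks, and counts the calls. The helper names `count_reads` and `demo`, the file size, and the chunk sizes are arbitrary choices for illustration; the sketch counts calls only and makes no claim about timing.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Read a whole file with os.read; return (bytes_read, read_calls)."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)  # one read system call per iteration
            calls += 1
            if not data:  # an empty result signals end of file
                break
            total += len(data)
    finally:
        os.close(fd)
    return total, calls


def demo(file_size=64 * 1024):
    """Compare small-chunk and large-chunk reads of the same temporary file."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x" * file_size)
        small = count_reads(path, 64)          # many small transfers
        large = count_reads(path, 16 * 1024)   # few large transfers
    finally:
        os.remove(path)
    return small, large


if __name__ == "__main__":
    small, large = demo()
    print("64-byte chunks:  bytes=%d, read calls=%d" % small)
    print("16-KiB chunks:   bytes=%d, read calls=%d" % large)
```

Both runs transfer the same number of bytes, but the small-chunk run crosses the user/kernel boundary far more often. Each crossing carries a fixed cost, which is the motivation for the large-transfer principle.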