Chapter 16. Server-Side Performance Tuning

Performance analysis and tuning, particularly when it involves NFS and NIS, is a topic subject to heated debate. The focus of the next three chapters is on the analysis techniques and configuration options used to identify performance bottlenecks and improve overall system response time. Tuning a network and its servers is similar to optimizing a piece of user-written code. Finding the obvious flaws and correcting poor programming habits generally leads to marked improvements in performance. Similarly, there is a definite and noticeable difference between networked systems with abysmal performance and those that run reasonably well; those with poor response generally suffer from "poor habits" in network resource use or configuration. It's easy to justify spending the time to eliminate major flaws when the return on your time investment is so large.

However, all tuning processes are subject to a law of diminishing returns. Getting the last 5-10% out of an application usually means hand-unrolling loops or reading assembly language listings. Fine-tuning a network server to an "optimum" configuration may yield that last bit of performance, but the next network change or new client added to the system may make performance of the finely tuned system worse than that of an untuned system. If other aspects of the computing environment are neglected as a result of the incremental server tuning, then the benefits of fine-tuning certainly do not justify its costs.

Our approach will be to make things "close enough for jazz." Folklore has it that jazz musicians take their instruments from their cases, and if all of the keys, strings, and valves look functional, they start playing music. Fine-tuning instruments is frowned upon, especially when the ambient street noise masks its effects. Simply ensuring that network and server performance are acceptable -- and remain consistently acceptable in the face of network changes -- is often a realistic goal for the tuning process.

As a network manager, you are also faced with the task of balancing the demands of individual users against the global constraints of the network and its resources. Users have a local view: they always want their machines to run faster, but the global view of the system administrator must be to tune the network to meet the aggregate demands of all users. There are no constraints in NFS or NIS that keep a client from using more than its fair share of network resources, so NFS and NIS tuning requires that you optimize both the servers and the ways in which the clients use these servers.[43]

[43] Add-on products such as the Solaris Bandwidth Manager allow you to specify the amount of network bandwidth available to specified ports, which lets you restrict the amount of network resources used by NFS. The Sun BluePrints Resource Management book published by Sun Microsystems Press provides good information on the Solaris Bandwidth Manager.

16.1. Characterization of NFS behavior

You must be able to characterize the demands placed on your servers as well as available configuration options before starting the tuning process. You'll need to know the quantities that you can adjust, and the mechanisms used to measure the success of any particular change. Above all else, it helps to understand the general behavior of a facility before you begin to measure it. In the first part of this book, we have examined individual NFS and NIS requests, but haven't really looked at how they are generated in "live" environments.

NFS requests exhibit randomness in two ways: they are typically generated in bursts, and the types of requests in each burst usually don't have anything to do with each other. It is very rare to have a steady, equally spaced stream of requests arriving at any server. The typical NFS request generation pattern involves a burst of requests as a user loads an application from an NFS server into memory or when the application reads or writes a file. These bursts are followed by quiet periods when the user is editing, thinking, or eating lunch. In addition, the requests from one client are rarely coordinated with those from another; one user may be reading mail while another is building software. Consecutive NFS requests received by a server are likely to perform different functions on different parts of one or more disks.
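One way to put a number on this burstiness is the ratio of the busiest interval to the average interval. The sketch below is only an illustration: the function name `peak_to_mean`, the one-second bucket size, and the sample arrival times are all assumptions made for the example, not measurements or tools described in this chapter.

```python
from collections import Counter

def peak_to_mean(timestamps, bucket_seconds=1.0):
    """Ratio of the busiest bucket's request count to the mean count per bucket.

    timestamps: request arrival times in seconds (hypothetical input format).
    A perfectly steady stream gives 1.0; bursty traffic gives a larger value.
    """
    if not timestamps:
        return 0.0
    start = min(timestamps)
    # Count the requests that fall into each fixed-width time bucket.
    buckets = Counter(int((t - start) // bucket_seconds) for t in timestamps)
    # Quiet buckets hold no requests but still count toward the mean.
    n_buckets = max(buckets) + 1
    mean = len(timestamps) / n_buckets
    return max(buckets.values()) / mean

# Made-up arrival times: ten requests over ten seconds in both cases.
steady = [float(i) for i in range(10)]
bursty = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 9.0]

print(peak_to_mean(steady))
print(peak_to_mean(bursty))
```

Both samples carry the same total volume, yet the second concentrates nine of its ten requests in the first second, so its ratio is much higher than that of the evenly spaced stream.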

NFS traffic volumes also vary somewhat predictably over the course of a day. In the early morning, many users read their mail, placing a heavier load on a central mail server; at the end of the day most file servers will be loaded as users wrap up their work for the day and write out modified files. Perhaps the most obvious case of time-dependent server usage is a student lab. The hours after class and after dinner are likely to be the busiest for the lab servers, since that's when most people gravitate toward the lab.

Simply knowing the sheer volume of requests won't help you characterize your NFS workload. It's easy to provide "tremendous" NFS performance if only a few requests require disk accesses. Requests vary greatly in the server resources they need to be completed. "Big" RPC requests force the server to read from or write to disk. In addition to the obvious NFS read and write requests, some symbolic link resolutions require reading information from disk. "Small" NFS RPC requests simply touch file attribute information, or the directory name look-up cache, and can usually be satisfied without a disk access if the server has previously cached the attribute information.

The average percentage of all RPC calls of each type is the "NFS RPC mixture," and it defines the kind of work the server is being asked to do, as opposed to simply the volume of work presented to it. The RPC mixture indicates possible areas of improvement, or flags obvious bottlenecks. It is important to determine if your environment is data- or attribute-intensive, since this will likely dictate the network utilization and the type of tuning required on the client and server.
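As a minimal sketch of the arithmetic, the mixture can be computed from cumulative per-procedure call counts, such as those a server's NFS statistics report. The function name `rpc_mixture` and the counts below are hypothetical, chosen only to make the percentages easy to check.

```python
def rpc_mixture(counts):
    """Return each procedure's share of all calls, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}

# Hypothetical call counts accumulated over some measurement period.
sample_counts = {
    "getattr": 5000,
    "lookup": 2500,
    "read": 1500,
    "write": 500,
    "readlink": 500,
}

mixture = rpc_mixture(sample_counts)

# Print the mixture with the most frequent procedures first.
for name, pct in sorted(mixture.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {pct:5.1f}%")
```

Because the counts are totals for the whole period, the resulting percentages are averages over that period; a mixture taken during a burst may look quite different.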

A data-intensive environment is one in which large file transfers dominate the NFS traffic. Transfers are considered large if the size of the files is over 100 MB. Examples of these environments include computer aided design and image processing.

An attribute-intensive environment, on the other hand, is dominated by small file and metadata access. The NFS clients mostly generate traffic to obtain directory contents, file attributes, and the data contents of small files. For example, in a software development environment, engineers edit relatively small source files, header files, and makefiles. The compilation and linkage process involves a large number of attribute checks that verify the modification time of the files to decide when new object files need to be rebuilt, resulting in frequent small file reads and writes.

Because of their nature, attribute-intensive environments will benefit greatly from aggressive caching of name-lookup information on the server, and a reduced network collision rate. On the other hand, a high-bandwidth network and a fast server with fast disks will most benefit data-intensive applications due to their dependence on data access. Studies have shown that most environments are attribute-intensive. Once you have characterized your NFS workload, you will need to know how to measure server performance as seen by NFS clients.
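A rough first guess at which kind of environment you have can be made from the share of calls that move file data. The sketch below is an assumption-laden illustration, not a method given in this chapter: the 50% threshold, the choice of read and write as the data-moving procedures, the function names, and the sample counts are all hypothetical. Call counts also say nothing about transfer sizes, so this is only a proxy for "large file transfers dominate."

```python
# Assumption: only read and write are treated as data-moving requests.
DATA_PROCS = {"read", "write"}

def data_share(counts):
    """Percentage of all calls that are data-moving requests."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    data_calls = sum(n for name, n in counts.items() if name in DATA_PROCS)
    return 100.0 * data_calls / total

def classify(counts, threshold_pct=50.0):
    """Label a workload using an arbitrary, adjustable threshold."""
    if data_share(counts) >= threshold_pct:
        return "data-intensive"
    return "attribute-intensive"

# Hypothetical call counts for two different servers.
build_server = {"getattr": 6000, "lookup": 2500, "read": 1000, "write": 500}
cad_server = {"getattr": 1000, "lookup": 500, "read": 6000, "write": 2500}

print(classify(build_server))
print(classify(cad_server))
```

With these made-up counts, the software development server is dominated by attribute checks, while the design server spends most of its calls moving file data.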

