Appendix B. NFS Problem Diagnosis
Throughout this book, we've used the output of nfsstat on both NFS
clients and servers to locate performance bottlenecks or inefficient
NFS architectures. The first two sections in this appendix summarize
symptoms of problems identified from the output of nfsstat. The last
section lists typical values for the error variable errno that may be
returned by file operations on NFS-mounted filesystems.
B.1. NFS server problems
Check the output of nfsstat -s for the following problems:
- badcalls > 0
-
RPC requests are being rejected out of hand by the NFS server. This
could indicate authentication problems caused by having a user in too
many groups, attempts to access exported filesystems as
root, or an improper Secure RPC configuration.
- badlen > 0 or xdrcall > 0
-
This indicates a malformed NFS request, detected by RPC or XDR
protocol decoding on the server. This can be caused by bugs in the
client or server, or by physical network problems.
- dupreqs > 0
-
The duplicate request cache keeps a record of previously executed NFS
requests. The dupchecks counter tracks the number of times this cache
was consulted, or checked. The dupreqs counter tracks the number of
times a check of the cache produced a "hit." In other words,
dupreqs counts the number of times the NFS server received a
previously executed request. For connection-oriented (TCP) requests,
a ratio of dupreqs to dupchecks above 0.01% is high. For
connectionless (UDP) requests, a ratio above one percent is high.
High ratios indicate one of three problems:
-
The timeout set on one or more clients' NFS mounts is too low.
Adjust the timeo option in the automounter map
or the NFS mount command upward.
-
The server is not responding quickly enough. Many causes relate to
the physical capabilities of the server: processor speed; the number
of processors (if it is a multiprocessor); insufficient primary
memory (check whether the percentage of reads is high, say over 5%,
which would indicate many reads that would be best served from cache
if there were enough memory); or the number of disk drives on the
system (spreading data accesses across more spindles reduces response
time; if you've eliminated primary memory as a cause, check whether
the percentage of writes is high, say over 5%). Other possibilities
include artificial limits, such as the number of server threads set
via nfsd.
-
There is a routing problem impeding replies from the server to one or
more clients.
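The ratio test described above can be sketched in a few lines of Python. This is only an illustration: the function name dup_ratio_is_high and the sample counter values are hypothetical, and the counters are assumed to have been read by hand from the nfsstat -s output rather than parsed from it.

```python
# Thresholds taken from the text: a dupreqs/dupchecks ratio above
# 0.01% is high for TCP, and above 1% is high for UDP.
HIGH_RATIO = {"tcp": 0.0001, "udp": 0.01}


def dup_ratio_is_high(dupreqs, dupchecks, transport):
    """Return (ratio, is_high) for the given counters.

    transport must be "tcp" or "udp". If the cache was never
    consulted, the ratio is reported as 0.0 and not flagged.
    """
    if dupchecks == 0:
        return 0.0, False
    ratio = dupreqs / dupchecks
    return ratio, ratio > HIGH_RATIO[transport]


# Illustrative counter values only, not real measurements.
for transport in ("tcp", "udp"):
    ratio, is_high = dup_ratio_is_high(30, 100000, transport)
    print(transport, f"{ratio:.4%}", "high" if is_high else "ok")
```

With these sample counters the same ratio is flagged for TCP but not for UDP, which reflects how much stricter the TCP threshold is.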
- readlink > 10%
-
Clients are making excessive use of symbolic links that are on
filesystems exported by the server. If the link is to a directory,
replace the symbolic link with a directory, and mount both the
underlying filesystem and the link's target on the client. If
the link is to a file, replace the symbolic link with a hard link.
- getattr > 60%
-
Check for possible non-default attribute cache values on NFS clients.
A very high percentage of getattr requests may
indicate that the attribute cache window has been reduced or set to
zero with the actimeo or
noac mount option. It can also indicate that the
NFS filesystem implementation is doing a poor job of attribute
caching.
- null > 1%
-
The automounter has been configured to mount replicated filesystems,
but the timeout values for the mount are too short. The null
procedure calls are made by the automounter to locate a server for
the filesystem; too many null calls indicate
that the automounter is retrying the mount frequently. Increase the
mount timeout parameter on the automounter command line.
- fsinfo > 1%
-
The fsinfo procedure is typically called only when a filesystem is
mounted. Many fsinfo calls suggest that the automounter is
frequently mounting and unmounting the same filesystems. If so, tune
the automounter to hold mounts longer via the -t
option to automount. This will improve the
response time on clients.
Keep in mind that the percentages given for each operation type are
only general rules of thumb. Your site may have legitimate
reasons for percentages that fall outside them.
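As a rough illustration, the percentage rules of thumb in this section can be collected into a small Python checker. The function name flag_operations and the sample call counts are hypothetical, and the per-operation counts are assumed to have been copied from the nfsstat -s output by hand. Treat the result as a list of things to look at, not a diagnosis.

```python
# Rule-of-thumb limits from this section, as a percentage of all calls.
THRESHOLDS = {
    "readlink": 10.0,
    "getattr": 60.0,
    "null": 1.0,
    "fsinfo": 1.0,
}


def flag_operations(call_counts):
    """Return {operation: percentage} for operations over their limit.

    call_counts maps each NFS operation name to its call count.
    Operations without a rule of thumb still count toward the total.
    """
    total = sum(call_counts.values())
    if total == 0:
        return {}
    flagged = {}
    for op, limit in THRESHOLDS.items():
        pct = 100.0 * call_counts.get(op, 0) / total
        if pct > limit:
            flagged[op] = pct
    return flagged


# Illustrative counts only, not real measurements.
sample = {"getattr": 700, "read": 200, "write": 50,
          "readlink": 20, "null": 30}
for op, pct in sorted(flag_operations(sample).items()):
    print(f"{op}: {pct:.1f}% exceeds {THRESHOLDS[op]:.0f}%")
```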