Stale filehandles (Managing NFS and NIS, 2nd Edition)

18.8. Stale filehandles

A filehandle becomes stale when the file or directory it references is removed by another host while your client still holds an active reference to the object. A typical example occurs when the current directory of a process running on your client is removed on the server, either by a process running on the server or by one running on another client. For example, the following sequence of operations produces a stale filehandle error for the current directory of the process running on client1:

client1                 client2 or server
% cd /shared/mod1 
                        % cd /shared 
                        % rm -rf mod1 
% ls 
.: Stale File Handle
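
A program running on client1 can check for this condition explicitly instead of failing on its next directory operation. The following is a minimal Python sketch, assuming the operating system reports the condition as errno ESTALE. The helper names is_stale and recover_cwd, the fallback directory, and the injectable stat parameter are illustrative assumptions, not part of any NFS tooling.

```python
import errno
import os


def is_stale(path=".", stat=os.stat):
    """Return True if stat() on path fails with ESTALE, False if it succeeds.

    Any other error (for example ENOENT) is re-raised unchanged.
    The stat parameter exists only so the check can be exercised without NFS.
    """
    try:
        stat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return True
        raise
    return False


def recover_cwd(fallback="/", stat=os.stat):
    """If the current directory is stale, move to fallback (a hypothetical choice).

    Returns True if a change of directory was needed, False otherwise.
    """
    if is_stale(".", stat=stat):
        os.chdir(fallback)
        return True
    return False
```

A long-running process could call recover_cwd() before operations that depend on the current directory. Whether "/" is a sensible fallback depends on the application.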

Note that recreating the removed directory before client1 lists it does not prevent the stale filehandle problem:

client1                 client2 or server
% cd /shared/mod1 
                        % cd /shared 
                        % rm -rf mod1
                        % mkdir mod1
% ls 
.: Stale File Handle
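
The sequence above can be modeled with a toy in-memory server. This is only a sketch under stated assumptions: the ToyServer class, its methods, and the two-field FileHandle are hypothetical, and real NFS filehandles are opaque to the client and carry more information. The model deliberately reuses the freed inode number to show that a changed generation count is enough to invalidate the old handle; on a real server the inode number may differ as well.

```python
import errno
import itertools
from collections import namedtuple

# In this toy model a filehandle is just (inode number, generation count).
FileHandle = namedtuple("FileHandle", ["inode", "generation"])


class ToyServer:
    """Hypothetical in-memory stand-in for one exported directory."""

    def __init__(self):
        self._names = {}        # name -> FileHandle currently bound to it
        self._generation = {}   # inode number -> latest generation count
        self._free_inodes = []  # inode numbers available for reuse
        self._next_inode = itertools.count(100)

    def mkdir(self, name):
        # Reuse a freed inode number if one exists, otherwise take a new one.
        if self._free_inodes:
            inode = self._free_inodes.pop()
        else:
            inode = next(self._next_inode)
        # Each allocation of an inode bumps its generation count.
        self._generation[inode] = self._generation.get(inode, 0) + 1
        handle = FileHandle(inode, self._generation[inode])
        self._names[name] = handle
        return handle

    def rmdir(self, name):
        handle = self._names.pop(name)
        self._free_inodes.append(handle.inode)

    def readdir(self, handle):
        # A handle is valid only while some name is still bound to it.
        if handle not in self._names.values():
            raise OSError(errno.ESTALE, "Stale file handle")
        return []


def replay():
    """Replay the sequence above; return (old, new, old_is_stale)."""
    server = ToyServer()
    old = server.mkdir("mod1")  # client1: cd /shared/mod1, holding this handle
    server.rmdir("mod1")        # client2: rm -rf mod1
    new = server.mkdir("mod1")  # client2: mkdir mod1
    try:
        server.readdir(old)     # client1: ls
        stale = False
    except OSError as exc:
        stale = exc.errno == errno.ESTALE
    return old, new, stale
```

In this model, replay() reports the old handle as stale even though a directory named mod1 exists again, because the name is now bound to a different handle.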

The filehandle stays stale because the client filehandle is tied to the inode number and generation count of the file or directory. Removing and recreating the directory mod1 creates a new directory entry with the same name as before but with a different inode number and generation count, and consequently a different filehandle.

This also explains why clients get stale filehandle errors when files or directories on the server are moved to a different filesystem. Inode numbers are not preserved across filesystem moves, so the moved files receive new inode numbers and generation counts. Be careful when you perform filesystem maintenance on the NFS server: you cannot bring a server down, move files to a new filesystem (perhaps to a larger disk), and reshare the new filesystem without risking stale filehandles on your clients.

If your client gets stale filehandles, you may need to terminate all processes accessing the filesystem on the client and then unmount the NFS filesystem to clear the large number of stale filehandles. Unfortunately, identifying all the processes that hold a filesystem busy is not always feasible, in which case you may have to resort to forcibly unmounting the filesystem:

 # umount -f /shared

Specify the -f option to the umount [59] command to forcibly unmount a filesystem. This should be done only as a last resort, since using this option can cause data loss for open files.

[59] The ability to forcibly unmount a filesystem was introduced in Solaris 8 and is supported by Linux kernel 2.1.116 or later. Previously, you would have had to reboot the NFS client to clear the stale filehandles.
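
Before resorting to a forced unmount, a best-effort list of the processes holding the filesystem busy can help. The sketch below assumes a Linux client whose /proc filesystem exposes /proc/<pid>/cwd and /proc/<pid>/fd/* as symbolic links; the function name pids_using and the proc_root parameter are illustrative. It does not see memory-mapped files or kernel references, and without root privileges it silently skips processes it cannot inspect, so an empty result does not prove the filesystem is idle. Tools such as fuser or lsof do this job more thoroughly.

```python
import os


def pids_using(mount_point, proc_root="/proc"):
    """Return sorted PIDs whose cwd or open files lie under mount_point.

    Assumes a Linux-style layout: <proc_root>/<pid>/cwd and <proc_root>/<pid>/fd/*.
    """
    root = mount_point.rstrip("/")
    prefix = root + "/"
    found = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        links = [os.path.join(base, "cwd")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited or permission denied
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # not a symlink, unreadable, or already gone
            if target == root or target.startswith(prefix):
                found.add(int(entry))
                break
    return sorted(found)
```

For the example in this section, pids_using("/shared") would return the candidate processes to terminate before running umount on /shared.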

You will also get stale filehandle errors when the server or another client removes a file that your client currently has open:

Process A on client1                 client2 or server
...
fd = open("/shared/foo", O_RDONLY);
                                     % rm /shared/foo       
read(fd, &buffer, buffer_len);
Read fails! Stale File Handle
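
An application can handle this failure by checking for ESTALE and reopening the file by name. The following Python sketch is one possible approach, not a standard recipe: the function name read_reopening, its retries parameter, and the injectable reader are assumptions made for illustration. It reads at an explicit offset with os.pread so that a retry reads the same position. Reopening succeeds only if a file with that name exists again, and the new file may hold different contents than the one originally opened.

```python
import errno
import os


def read_reopening(path, fd, length, offset=0, retries=1, reader=os.pread):
    """Read length bytes at offset from fd; on ESTALE, reopen path and retry.

    Returns (data, fd). The returned fd may be a new descriptor if a reopen
    occurred, so callers should use it for subsequent reads.
    The reader parameter exists only so the logic can be exercised without NFS.
    """
    for attempt in range(retries + 1):
        try:
            return reader(fd, length, offset), fd
        except OSError as exc:
            # Give up on any other error, or once the retries are exhausted.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            os.close(fd)
            # Raises (for example ENOENT) if the name no longer exists;
            # in that case the old descriptor has already been closed.
            fd = os.open(path, os.O_RDONLY)
```

In the sequence above, where /shared/foo is removed and not recreated, the reopen would fail with a "no such file" error, which is at least a clearer report to the user than a stale filehandle.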

If you consistently suffer from stale filehandle errors, you should look at the way in which users share files using NFS. Even though users see the same set of files, they do not necessarily have to do their work in the same directories. Watch out for users who share directories or copies of code. Use a source code control system that lets them make private copies of source files in their own directories. NFS provides an excellent mechanism for allowing all users to see the common source tree, but nobody should be doing development in it. Similarly, users who share scratch space may decide to clean it out periodically. Any user who had a scratch file open when another user on another NFS client purged the scratch directory will receive stale filehandle errors on the next reference to the (now removed) scratch file.

As with most things, it helps to understand how your users are using the filesystems presented to them by NFS. In many cases, users want access to a wide variety of filesystems, but they do not want all of them mounted at all times (for fear of server crashes), nor do they want to keep track of where all filesystems are exported from and where they should be mounted. The NFS automounter solves all of these problems by applying NIS management to NFS mount information. As part of your client tuning, consider using the automounter to make client NFS administration easier. Chapter 9, "The Automounter," describes the automounter in detail.

