    client1% ps -eaf | grep SuperApp
        mre 23796 10031  0 11:13:22 pts/6    0:00 SuperApp
    client1% pstack 23796
    23796:  SuperApp
     ff313134 fcntl    (1, 7, ffbef9dc)
     ff30de48 fcntl    (1, 7, ffbef9dc, 0, 0, 0) + 1c8
     ff30e254 lockf    (1, 1, 0, 2, ff332584, ff2a0140) + 98
     0001086c main     (1, ffbefac4, ffbefacc, 20800, 0, 0) + 1c
     00010824 _start   (0, 0, 0, 0, 0, 0) + dc
    client1% truss -p 23796
    fcntl(1, F_SETLKW, 0xFFBEF9DC)  (sleeping...)

This verifies that the application is stuck in a lock request. We can use pfiles to see what is going on with the files of process 23796:
    client1% pfiles 23796
    23796:  SuperApp
      Current rlimit: 256 file descriptors
       0: S_IFCHR mode:0620 dev:136,0 ino:37990 uid:466 gid:7 rdev:24,37
          O_RDWR
       1: S_IFREG mode:0644 dev:208,1823 ino:5516985 uid:466 gid:300 size:0
          O_WRONLY|O_LARGEFILE
          advisory write lock set by process 3242
       2: S_IFCHR mode:0620 dev:136,0 ino:37990 uid:466 gid:7 rdev:24,37
          O_RDWR

The report that another process, process ID 3242, holds an advisory lock on file descriptor 1 is useful, but unfortunately it doesn't tell us whether 3242 is a local process, a process on another NFS client, or a process on the NFS server. Nor does it tell us whether the file mapped to file descriptor 1 is a local file or an NFS file. We are, however, told that the major and minor device numbers of the filesystem are 208 and 1823, respectively. Running the mount command without any arguments dumps the list of mounted filesystems. You should see a display similar to:
    / on /dev/dsk/c0t0d0s0 read/write/setuid/intr/largefiles/onerror=panic/dev=2200000 on Thu Dec 21 11:13:33 2000
    /usr on /dev/dsk/c0t0d0s6 read/write/setuid/intr/largefiles/onerror=panic/dev=2200006 on Thu Dec 21 11:13:34 2000
    /proc on /proc read/write/setuid/dev=31c0000 on Thu Dec 21 11:13:29 2000
    /dev/fd on fd read/write/setuid/dev=32c0000 on Thu Dec 21 11:13:34 2000
    /etc/mnttab on mnttab read/write/setuid/dev=3380000 on Thu Dec 21 11:13:35 2000
    /var on /dev/dsk/c0t0d0s7 read/write/setuid/intr/largefiles/onerror=panic/dev=2200007 on Thu Dec 21 11:13:40 2000
    /home/mre on spike:/export/home/mre remote/read/write/setuid/intr/dev=340071f on Thu Dec 28 08:51:30 2000

The numbers after dev= are in hexadecimal. Device numbers are constructed by taking the major number, shifting it left several bits (18 on Solaris), and then adding the minor number. Convert the minor number 1823 to hexadecimal, and look for it in the mount table:

    client1% printf "%x\n" 1823
    71f
    client1% mount | grep 'dev=.*71f'
    /home/mre on spike:/export/home/mre remote/read/write/setuid/intr/dev=340071f on Thu Dec 28 08:51:30 2000
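If you would rather let a program do the arithmetic, the decomposition takes a few lines of C. The sketch below assumes the Solaris convention of an 18-bit minor field in a 32-bit device number, which the numbers in this example bear out (208 << 18 | 1823 == 0x340071f):

    /*
     * Sketch: decompose a dev= value from the mount table into its
     * major and minor halves, assuming an 18-bit minor field.
     */
    #include <stdio.h>

    #define NBITSMINOR  18                          /* Solaris 32-bit dev_t minor width */
    #define MAXMIN      ((1UL << NBITSMINOR) - 1)

    int main(void)
    {
        unsigned long dev = 0x340071f;              /* dev= value for /home/mre */

        unsigned long maj = dev >> NBITSMINOR;      /* 0x340071f >> 18  == 208 */
        unsigned long min = dev & MAXMIN;           /* 0x340071f & 0x3ffff == 1823 */

        /* Prints "major 208, minor 1823", matching pfiles' dev:208,1823. */
        printf("major %lu, minor %lu\n", maj, min);
        return 0;
    }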
We now know four things:

- The process is blocked on an NFS file, not a local one: the file lives in the filesystem mounted on /home/mre.
- The NFS server holding the file is spike, and the file is somewhere under spike:/export/home/mre.
- The inode number of the file is 5516985 (from the pfiles output).
- The conflicting advisory lock is held by process ID 3242, on a host we have not yet identified.
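Before moving to the server, it is worth seeing what these calls look like from C. SuperApp's source isn't shown here, so the following is only a sketch of the call path visible in the pstack and truss output (lockf() mapping to a blocking fcntl(F_SETLKW)), together with the F_GETLK query that underlies pfiles' "advisory write lock set by process 3242" report; the file name is an assumption:

    /*
     * Sketch only: SuperApp's source isn't shown here. This reproduces
     * the call path visible in pstack and truss: lockf() is implemented
     * on top of fcntl(), and F_SETLKW sleeps until a conflicting lock
     * is released.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Assumed path; the file is only identified later in this example. */
        int fd = open("/home/mre/database", O_WRONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* The same F_GETLK query is what lets pfiles report
         * "advisory write lock set by process 3242". */
        struct flock fl = { 0 };
        fl.l_type   = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;                    /* zero length means "to EOF" */
        if (fcntl(fd, F_GETLK, &fl) == 0 && fl.l_type != F_UNLCK)
            printf("conflicting write lock held by process %ld\n",
                   (long)fl.l_pid);

        /* This is where SuperApp is sleeping: lockf() turns into
         * fcntl(fd, F_SETLKW, ...), which blocks indefinitely. */
        if (lockf(fd, F_LOCK, 0) < 0)
            perror("lockf");

        close(fd);
        return 0;
    }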
The next step is to log on to the NFS server spike as root and locate the file and its lock. First use df to find the local filesystem that corresponds to the exported /export:

    spike# df -F ufs | grep /export
    /export               (/dev/dsk/c0t0d0s7 ):   503804 blocks   436848 files

Then you can use crash to get more lock state. The crash command is like a shell, but with internal commands for examining kernel state. The internal command we will be using is lck:
    spike# crash
    dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
    > lck
    Active and Sleep Locks:
    INO          TYP  START  END     PROC  PID   FLAGS  STATE  PREV     NEXT     LOCK
    30000c3ee18  w    0      0       13    136   0021   3      48bf0f8  ae9008   6878d00
    30000dd8710  w    0      MAXEND  17    212   0001   3      8f1a48   8f02d8   8f0e18
    30001cce1c0  w    193    MAXEND  -1    3242  2021   3      6878850  c43a08   2338a38

    Summary From List:
     TOTAL    ACTIVE   SLEEP
         3         3       0
    >

An important field is PROC. PROC is the "slot" number of the process. If it is -1, that indicates that the lock is being held by a nonlocal (i.e., an NFS client) process, and the PID field thus indicates the process ID relative to the NFS client. In the sample display, we see one such entry:
    30001cce1c0  w    193    MAXEND  -1    3242  2021   3      6878850  c43a08   2338a38

Note that the process ID, 3242, is equal to the one that the pfiles command displayed earlier in this example. We can confirm that this lock is for the file in question via crash's uinode command:
    > uinode 30001cce1c0
    UFS INODE MAX TABLE SIZE = 34020
    ADDR         MAJ/MIN  INUMB    RCNT  LINK  UID  GID  SIZE  MODE     FLAGS
    30001cce1c0  136, 7   5516985  2     1     466  300  403   f---644  mt rf
    >

The inode numbers match what pfiles earlier displayed on the NFS client. However, inode numbers are unique only within a single filesystem. We can make doubly sure this is the file by comparing the major and minor device numbers from the uinode command, 136 and 7, with those of the filesystem that is mounted on /export:
    spike# ls -lL /dev/dsk/c0t0d0s7
    brw-------   1 root     sys      136,  7 May  6  2000 /dev/dsk/c0t0d0s7
    spike#
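The same cross-check can be made programmatically: stat() the candidate file and compare its device and inode numbers against what pfiles and uinode reported. A small sketch follows; the server-side path is an assumption here, since we don't actually learn it until the find command later in this section:

    /*
     * Sketch: verify a file's identity by its (device, inode) pair,
     * the same comparison made above with pfiles, uinode, and ls -lL.
     */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>   /* major()/minor(); <sys/mkdev.h> on Solaris */

    int main(void)
    {
        struct stat sb;

        /* Assumed path; discovered later with find -inum. */
        if (stat("/export/home/mre/database", &sb) < 0) {
            perror("stat");
            return 1;
        }

        /* Expect dev 136,7 and inode 5516985 to match the lock entry. */
        printf("dev %u,%u  ino %lu\n",
               (unsigned)major(sb.st_dev), (unsigned)minor(sb.st_dev),
               (unsigned long)sb.st_ino);
        return 0;
    }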
The device numbers match, so we have found the file, and we know that the lock holder is a process on an NFS client. The next step is to see which clients the server's status monitor is tracking; it records their names in the /var/statmon/sm directory:

    spike# cd /var/statmon/sm
    spike# ls
    client1         ipv4.10.1.0.25  ipv4.10.1.0.26  gonzo           java

The entries prefixed with ipv4 are just symbolic links to other entries. The non-symbolic-link entries identify the hosts we want to check. The most likely cause of the lock not getting released is that the holding NFS client has crashed. You can take the list of hosts from the /var/statmon/sm directory and check whether any are dead, or not responding due to a network partition. Once you determine which are dead, you can use Solaris's clear_locks command to clear their lock state. Let's suppose you determine that gonzo is dead. Then you would do:
    spike# clear_locks gonzo

If clearing the lock state of dead clients doesn't fix the problem, then perhaps a now-live client crashed, but for some reason, after it rebooted, its status monitor did not send a notification to the NLM server's status monitor. You can log onto the live clients and check whether they are currently mounting the filesystem from the server (in our example, spike:/export). If they are not, then you should consider using clear_locks to clear any residual lock state those clients might have had. Ultimately, you may be forced to reboot your server. Short of that, there are other things you could do. Since you know the inode number and filesystem of the file in question, you can determine the file's name:

    spike# cd /export
    spike# find . -inum 5516985 -print
    ./home/mre/database
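The same search can be expressed in C with nftw(), should you want to build it into a monitoring script; this is only a sketch, with the inode number hardwired:

    /*
     * Sketch: a programmatic equivalent of "find . -inum 5516985 -print",
     * walking the filesystem and matching on inode number.
     */
    #define _XOPEN_SOURCE 500    /* for nftw() */
    #include <ftw.h>
    #include <stdio.h>

    #define WANTED_INO 5516985UL /* inode number from pfiles/uinode */

    /* Called for every object in the tree; print any path whose inode matches. */
    static int visit(const char *path, const struct stat *sb,
                     int type, struct FTW *ftwbuf)
    {
        (void)type; (void)ftwbuf;
        if (sb->st_ino == WANTED_INO)
            printf("%s\n", path);
        return 0;                /* zero keeps the walk going */
    }

    int main(void)
    {
        /* FTW_PHYS: do not follow symlinks; FTW_MOUNT: stay on one
         * filesystem, since inode numbers are only unique within one. */
        return nftw("/export", visit, 20, FTW_PHYS | FTW_MOUNT) != 0;
    }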
You could rename the file database to something else, and copy it back to a file named database. Then kill and restart the SuperApp application on client1. Of course, such an approach requires intimate knowledge of, or experience with, the application to know whether this will be safe.
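The reason the rename-and-copy trick works is that the copy lands in a brand-new inode, while the orphaned lock is attached to the old one (inode 5516985); when SuperApp is restarted and reopens database, there is no longer a conflicting lock on the file it gets. A sketch of the sequence in C follows; the temporary name database.locked is an assumption, and again, the application must be stopped first:

    /*
     * Sketch of the rename-and-copy workaround. The copy is written to
     * a brand-new inode, so the orphaned NLM lock on inode 5516985 no
     * longer applies to the file named "database". Only safe if the
     * application is stopped and the file is quiescent.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[65536];
        ssize_t n;

        /* Hypothetical temporary name for the locked original. */
        if (rename("database", "database.locked") < 0) {
            perror("rename");
            return 1;
        }

        int in  = open("database.locked", O_RDONLY);
        int out = open("database", O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (in < 0 || out < 0) {
            perror("open");
            return 1;
        }

        /* Byte-for-byte copy into the fresh inode. */
        while ((n = read(in, buf, sizeof buf)) > 0)
            if (write(out, buf, (size_t)n) != n) {
                perror("write");
                return 1;
            }

        close(in);
        close(out);
        return 0;
    }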