Most of the tools described in this chapter collect information at the protocol level. While it is unlikely that any of these tools will provide the detailed information you would want for a problem with a specific application, they should help you identify where the problem lies and will help if the problem is with the protocol. Most applications have their own approaches to solving problems, e.g., debug modes and log files. But you'll want to be sure the problem is with the application before you start with these. If the problem is with the application, you'll need to consult the specific documentation for the application.
If you are having trouble setting up a network application for the first time, you are probably better off rereading the documentation than investing time learning a new tool. But if you've read the directions three or four times in several different books or if you have used an application many times and it has suddenly stopped working, then it's probably time to look at tools. For many of the protocols, you'll have a number of choices. You won't need every tool, so pick the most appropriate, convenient tool and start there.
Providing a detailed description of all available tools is beyond the scope of any reasonable book. This would require both a detailed review of the protocol and a description of the tool. For example, Hal Stern's 400-page book, Managing NFS and NIS, has three chapters totaling about 125 pages on tools, debugging, and tuning NIS and NFS. What I'm trying to do here is provide you with enough information to get started and handle simple problems. If you need more information, you should consider looking at one of the many books, like Stern's, devoted to the specific protocol in question. A number of such books are described in Appendix B, "Resources and References".
Generally, these applications are based on a client/server model. The approach you'll take in debugging a client may be different from that used to debug a server. The first step, in general, is to decide if the problem is with the client application, the server application, or the underlying protocols. If any client on any machine can connect to a server, the server and protocols are probably operating correctly. So when communications fail, the first thing you may want to try is a different client program or a similar client on a different computer. With many protocols, you don't even need a client program. Many protocols are based on the exchange of commands in NVT ASCII[38] over a TCP connection. You can interact with these servers using any program that can open a TCP connection using NVT ASCII. Examples include telnet and netcat.
[38] Network Virtual Terminal (NVT) ASCII is a 7-bit U.S. variant of the common ASCII character code. It is used throughout the TCP/IP protocol suite. Each 7-bit character is transmitted as an 8-bit byte with the high-order bit set to 0.
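The idea that any program able to open a TCP connection and send ASCII text can stand in for a client can be sketched in a few lines of Python. This is only an illustration, not a replacement for telnet or netcat. The function names to_nvt_line and first_reply are hypothetical, and the sketch assumes a protocol in which the server speaks first, as SMTP, POP3, and FTP servers do.

```python
import socket


def to_nvt_line(command):
    """Encode one command as 7-bit ASCII terminated by CR LF."""
    # encode("ascii") raises UnicodeEncodeError for any non-ASCII character,
    # which keeps the high-order bit of every byte at 0.
    return command.encode("ascii") + b"\r\n"


def first_reply(host, port, timeout=5.0):
    """Open a TCP connection and return the first line the server sends.

    Hypothetical helper: only meaningful for protocols where the server
    sends a greeting before the client says anything.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rb") as reader:
            line = reader.readline()
    return line.decode("ascii", errors="replace").rstrip("\r\n")
```

Calling first_reply with the name of a mail server and port 25 should return the server's greeting line if the server is up and reachable.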
For example, the following session uses telnet to send a message through an SMTP server:

    bsd2# telnet mail.lander.edu 25
    Trying 205.153.62.5...
    Connected to mail.lander.edu.
    Escape character is '^]'.
    220 mail.lander.edu ESMTP Sendmail 8.9.3/8.9.3; Wed, 22 Nov 2000 13:22:15 -0500
    helo 205.153.60.236
    250 mail.lander.edu Hello [205.153.60.236], pleased to meet you
    mail from:<jsloan@205.153.60.236>
    250 <jsloan@205.153.60.236>... Sender ok
    rcpt to:<jsloan@lander.edu>
    250 jsloan@lander.edu... Recipient ok
    data
    354 Enter mail, end with "." on a line by itself
    This is the body of a message.
    .
    250 NAA28089 Message accepted for delivery
    quit
    221 mail.lander.edu closing connection
    Connection closed by foreign host.

The process is very simple. telnet is used to connect to port 25, the SMTP port, on the email server in question. Of the next four lines, the first three come from telnet itself and the fourth, beginning with 220, is the server's greeting. At this point, we can see that the server is up and that we are able to communicate with it. To send email, use the commands helo to identify yourself, mail from: to specify the email source, and rcpt to: to specify the destination. Use names, not IP addresses, to specify the destination. Notice that no password is required to send email. (The server's responses are the lines that start with numeric codes.) The data command was used to signal the start of the body of the message. The body is one line long here but can be as long as you like. When you are done entering the body, terminate it with a line that contains only a single period. The session was terminated with the quit command. Clearly the server is up and can be reached in this example. Any problems you may be having must be with your email client.
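If you want to script a check like this, the numeric codes at the start of each server line are the part worth examining. The following sketch splits a reply line into its pieces and classifies it by its first digit. The function names are hypothetical, and the short descriptions are my own summary of the reply classes rather than text from any server.

```python
def parse_smtp_reply(line):
    """Split an SMTP reply line into (code, is_last_line, text)."""
    code = int(line[:3])
    # A hyphen right after the code marks a continuation line of a
    # multiline reply; a space (or end of line) marks the final line.
    is_last_line = line[3:4] != "-"
    return code, is_last_line, line[4:].strip()


def describe(code):
    """Classify a reply code by its first digit."""
    classes = {
        2: "ok",
        3: "send more input",
        4: "temporary failure",
        5: "permanent failure",
    }
    return classes.get(code // 100, "unknown")
```

Applied to the session above, the 250 replies are classified as "ok" and the 354 reply after data as "send more input".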
As noted, you had a pretty good idea the server was working as soon as it replied and could have quit at this point. There are a couple of reasons for going through the process of sending a message. First, it gives a nice warm feeling seeing that everything is truly working. More important, it confirms that the recipient is known to the server. For example, consider the following:
    rcpt to:<jsloane@lander.edu>
    550 <jsloane@lander.edu>... User unknown

This reply lets us know that the user is unknown to the system. If you have doubts about a recipient, you can use the vrfy and expn commands. The vrfy command will confirm the recipient address is valid, as shown in the following example:
    vrfy jsloan
    250 Joseph Sloan <jsloan@mail.lander.edu>
    vrfy freddy
    550 freddy... User unknown

expn fully expands an alias, giving a list of all the recipients named in the alias. Be warned, expn and vrfy are often seen as security holes and may be disabled. (Prudence would dictate using vrfy and expn only on your own systems.) There are other commands, but these are enough to verify that the server is available.
Another reason for sending the email is that it gives you something to retrieve, the next step in testing your email connection. The process of retrieving email with telnet is similar, although the commands will vary with the specific protocol being used. Here is an example using a POP3 server:
    bsd2# telnet mail.lander.edu 110
    Trying 205.153.62.5...
    Connected to mail.lander.edu.
    Escape character is '^]'.
    +OK POP3 mail.lander.edu v7.59 server ready
    user jsloan
    +OK User name accepted, password please
    pass xyzzy
    +OK Mailbox open, 1 messages
    retr 1
    +OK 347 octets
    Return-Path: <jsloan@205.153.60.236>
    Received: from 205.153.60.236 ([205.153.60.236])
            by mail.lander.edu (8.9.3/8.9.3) with SMTP id NAA28089;
            Wed, 22 Nov 2000 13:23:14 -0500
    Date: Wed, 22 Nov 2000 13:23:14 -0500
    From: jsloan@205.153.60.236
    Message-Id: <200011221823.NAA28089@mail.lander.edu>
    Status:

    This is the body of a message.
    .
    dele 1
    +OK Message deleted
    quit
    +OK Sayonara
    Connection closed by foreign host.

As you can see, telnet is used to connect to port 110, the POP3 port. As soon as the first message comes back, you know the server is up and reachable. Next, you identify yourself using the user and pass commands. This is a quick way to make sure that the account exists and you have the right password. Often, email readers give cryptic error messages when you use a bad account or password. The system has informed us that there is one message waiting for this user. Next, retrieve that message with the retr command. The argument is the message number. This is the message we just sent. Delete the message and log off with the dele and quit commands, respectively. (As an aside, sometimes mail clients will hang with overlarge attachments. You can use the dele command to delete the offending message.)
Of course, this is how a system running POP3 or SMTP is supposed to work. If it works this way, any subsequent problems are probably with the client, and you need to turn to the client documentation. You can confirm this with packet capture software. If your system doesn't work properly, the problem could be with the server software or with communications. You might try logging onto the server and verifying that the appropriate software is listening, using ps, or, if it is started by inetd, using netstat. Or you might try using telnet to connect to the server directly from the server, i.e., telnet localhost 25. If this succeeds, you may have routing problems, name service problems, or firewall problems. If it fails, then look to the documentation for the software you are using on the server.
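The reasoning in the preceding paragraph amounts to a small decision table, which can be written down as follows. This is only a sketch of that logic. The function names can_connect and diagnose are hypothetical, and a successful TCP connection is used here as a stand-in for "the server replied", which is a weaker test than a full session with telnet.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def diagnose(remote_ok, local_ok):
    """Suggest where to look next.

    remote_ok: the connection worked from another machine.
    local_ok:  the connection worked from the server to itself.
    """
    if remote_ok:
        return "server reachable: suspect the client"
    if local_ok:
        return "server answers locally: suspect routing, name service, or a firewall"
    return "server does not answer locally: suspect the server software"
```

On the server, can_connect("localhost", 25) plays the role of telnet localhost 25. The result, together with the result of the same test run from a client machine, can be passed to diagnose.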
The commands used by most email protocols are described in the relevant RFCs. For SMTP, see RFC 821; for POP2, see RFC 937; for POP3, see RFC 1939; and for IMAP, see RFC 1176.
The same approach works with a web server:

    bsd2# telnet localhost http
    Trying 127.0.0.1...
    Connected to localhost.lander.edu.
    Escape character is '^]'.
    HEAD / HTTP/1.0

    HTTP/1.1 200 OK
    Date: Sun, 22 Apr 2001 13:27:32 GMT
    Server: Apache/1.3.12 (Unix)
    Content-Location: index.html.en
    Vary: negotiate,accept-language,accept-charset
    TCN: choice
    Last-Modified: Tue, 29 Aug 2000 09:14:16 GMT
    ETag: "a4cd3-55a-39ab7ee8;3a4a1b39"
    Accept-Ranges: bytes
    Content-Length: 1370
    Connection: close
    Content-Type: text/html
    Content-Language: en
    Expires: Sun, 22 Apr 2001 13:27:32 GMT

    Connection closed by foreign host.

In this example, I've checked to see if the server is responding from the server itself. In general, however, using telnet is probably not worth the effort since it is usually very easy to find a working web browser that you can use somewhere on your network.
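For completeness, here is a sketch of the two pieces of an exchange like this one that a script would need: the request as it goes over the wire and the first line of the response. The function names are hypothetical. Note that an HTTP/1.0 request is ended by an empty line, which is why the request ends with two CR LF pairs.

```python
def build_head_request(path="/"):
    """Return the bytes of a minimal HTTP/1.0 HEAD request."""
    # The blank line (second CR LF) tells the server the request is complete.
    return ("HEAD %s HTTP/1.0\r\n\r\n" % path).encode("ascii")


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version = parts[0]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason
```

A status code of 200, as in the session above, means the server found the page and is answering normally.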
Most web problems, in my experience, stem from incorrectly configured security files or are performance problems. For security configuration problems, you'll need to consult the appropriate documentation for your software. For a quick performance profile of your server, you might visit Patrick Killelea's web site, http://patrick.net. If you have problems, you probably want to look at his book, Web Performance Tuning.
    lnx1# telnet bsd2 ftp
    Trying 172.16.2.236...
    Connected to bsd2.lander.edu.
    Escape character is '^]'.
    220 bsd2.lander.edu FTP server (Version 6.00LS) ready.
    user jsloan
    331 Password required for jsloan.
    pass xyzzy
    230 User jsloan logged in.
    stat
    211- bsd2.lander.edu FTP server status:
         Version 6.00LS
         Connected to 172.16.3.234
         Logged in as jsloan
         TYPE: ASCII, FORM: Nonprint; STRUcture: File; transfer MODE: Stream
         No data connection
    211 End of status
    quit
    221 Goodbye.
    Connection closed by foreign host.

Once you know the server is up, you'll want to switch over to a real FTP client. Because FTP opens a reverse connection when transferring information, you are limited in what you can do with telnet. Fortunately, this is enough to verify that the server is up, communication works, and you can successfully log on to the server.
Unlike FTP, TFTP is UDP based. Consequently, TCP-based tools like telnet are not appropriate. You'll want to use a TFTP client to test for connectivity. Fortunately, TFTP is a simple protocol and usually works well.
If you suspect name resolution is not working on a client, try using ping, alternating between hostnames and IP addresses. If you are consistently able to reach remote hosts with IP addresses but not with names, then you are having a problem with name resolution. If you have a problem with name resolution on the client side, start by reviewing the configuration files. It is probably easiest to start with /etc/hosts and then look at DNS. Leave NIS until last.
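If you would rather test resolution directly than infer it from ping, a few lines of Python will do. This is a different technique from the ping test described above: it asks the system resolver for an address and reports only whether the lookup succeeded. The function name name_resolves is hypothetical, and the resolver parameter exists only so the function can be exercised without a network.

```python
import socket


def name_resolves(name, resolver=socket.gethostbyname):
    """Return True if the resolver can turn name into an address."""
    try:
        resolver(name)
        return True
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
```

If a host answers ping by IP address but name_resolves returns False for its name, the problem is with name resolution rather than with connectivity.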
For most purposes, there is not much difference among these programs. Your choice will largely be a matter of personal preference. However, you should be aware that some other programs may be built on top of dig, so be sure to keep it around even if you prefer one of the other tools.
Of these, nslookup, written by Andrew Cherenson, is the most ubiquitous and the most likely to be installed by default. It is even available under Windows. It can be used either in command-line mode or interactively. In command-line mode, you use the name or IP address of interest as an argument:
    sol1# nslookup 205.153.60.20
    Server:  lab.lander.edu
    Address:  205.153.60.5

    Name:    ntp.lander.edu
    Address:  205.153.60.20

    bsd2# nslookup www.lander.edu
    Server:  lab.lander.edu
    Address:  205.153.60.5

    Name:    web.lander.edu
    Address:  205.153.60.15
    Aliases:  www.lander.edu

As you can see, it returns both the name and IP address of the host in question, the identity of the server supplying the information, and, in the second example, that the queried name is an alias. You can specify the server you want to use as well as other options on the command line. You should be aware, however, that it is not unusual for reverse lookups to fail, usually because the DNS database is incomplete.
Earlier versions of nslookup required a special format for finding the names associated with IP addresses. For example, to look up the name associated with 205.153.60.20, you would have used the command nslookup 20.60.153.205.in-addr.arpa. Fortunately, unless you are using a very old version of nslookup, you won't need to bother with this.
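The special format is simply the four octets of the address in reverse order followed by in-addr.arpa. A minimal sketch of the conversion, using a hypothetical function name and handling only IPv4 addresses:

```python
import ipaddress


def reverse_name(address):
    """Return the in-addr.arpa name used for a reverse lookup of an IPv4 address."""
    # IPv4Address validates the input and raises ValueError on a bad address.
    octets = str(ipaddress.IPv4Address(address)).split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"
```

For the address used above, reverse_name("205.153.60.20") returns 20.60.153.205.in-addr.arpa. Python's ipaddress module offers the same result through the reverse_pointer attribute of an address object.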
While command-line mode is adequate for an occasional quick query, if you want more information, you'll probably want to use nslookup in interactive mode. If you know the right combination of options, you could use command-line options. But if you are not sure, it is easier to experiment step-by-step in interactive mode.
Interactive mode is started by typing nslookup without any arguments:
    sol1# nslookup
    Default Server:  lab.lander.edu
    Address:  205.153.60.5

    >

As you can see, nslookup responds with the name of the default server and a prompt. A ? will return a list of available options. You can change the server you want to query with the server command. You can get a listing of all machines in a domain with the ls command. For example, ls netlab.lander.edu would list all the machines in the netlab.lander.edu domain. Use the ls command with caution, since it can return a lot of information. You can use the -t option to specify a query type, i.e., a particular type of record. For example, ls -t mx lander.edu will return the mail entries from lander.edu. Query types can include cname to list canonical names for aliases, hinfo for host information, ns for name servers for named zones, soa for zone authority record, and so on. For more information, start with the manpage for nslookup.
One useful trick is to retrieve the soa record for local and authoritative servers. Here is part of one such record retrieved in interactive mode:
    > ls -t soa lander.edu
    [lab.lander.edu]
    $ORIGIN lander.edu.
    @                       1D IN SOA       lab root (
                                            960000090       ; serial

The entry labeled serial is a counter that should be incremented each time the DNS records are updated. If the serial number on your local server, when compared to the authoritative server, is off by more than 1 or 2, the local server is not updating its records in a timely manner. One possible cause is an old version of bind.
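The comparison of serial numbers is easy to automate once you have retrieved both values. The sketch below assumes you have already extracted the two serials as integers. The function names are hypothetical, the default tolerance of 2 is taken from the rule of thumb above, and the sketch ignores the wraparound that very large serial numbers can undergo.

```python
def serial_lag(authoritative_serial, local_serial):
    """Return how many updates the local server is behind."""
    return authoritative_serial - local_serial


def is_stale(authoritative_serial, local_serial, tolerance=2):
    """Return True if the local server lags by more than the tolerance."""
    return serial_lag(authoritative_serial, local_serial) > tolerance
```

For example, is_stale(960000090, 960000089) is False, while a local serial of 960000080 against the same authoritative value would be flagged.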
Many administrators prefer dig to nslookup. While not quite as ubiquitous as nslookup, it is included as a tool with bind and is also available as a separate tool. dig is a command-line tool that is quite easy to use. It seems to have a few more options and, since it is command line oriented, it is more suited for shell scripts. On the other hand, using nslookup interactively may be better if you are groping around and not really sure what you are looking for.
dig, short for Domain Internet Groper, was written by Steve Hotz. Here is an example of using dig to do a simple query:
    bsd2# dig @lander.edu www.lander.edu

    ; <<>> DiG 8.3 <<>> @lander.edu www.lander.edu
    ; (1 server found)
    ;; res options: init recurs defnam dnsrch
    ;; got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 1, ADDITIONAL: 1
    ;; QUERY SECTION:
    ;;      www.lander.edu, type = A, class = IN

    ;; ANSWER SECTION:
    www.lander.edu.         1D IN CNAME     web.lander.edu.
    web.lander.edu.         1D IN A         205.153.60.15

    ;; AUTHORITY SECTION:
    lander.edu.             1D IN NS        lander.edu.

    ;; ADDITIONAL SECTION:
    lander.edu.             1D IN A         205.153.60.5

    ;; Total query time: 9 msec
    ;; FROM: bsd2.lander.edu to SERVER: lander.edu  205.153.60.5
    ;; WHEN: Tue Nov  7 10:26:42 2000
    ;; MSG SIZE  sent: 32  rcvd: 106

The first argument, in this case @lander.edu, is optional. It gives the name of the name server to be queried. The second argument is the name of the host you are looking up.
As you can see, a simple dig provides a lot more information, by default at least, than does nslookup. It begins with information about the name server and resolver flags used. (The flags are documented in the manpage for bind's resolver.) Next come the header fields and flags followed by the query being answered. These are followed by the answer, authority records, and additional records. The format is the domain name, TTL field, class, type code for the record, and the data field. Finally, summary information about the exchange is included.
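Because each record occupies one line with whitespace-separated fields, dig output is easy to pick apart in a script. The following sketch parses a single record line as it appears in the output above, where the class (IN) sits between the TTL and the type code. The names Record and parse_record are hypothetical, and the field order is an assumption based on this version of dig.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str    # domain name, e.g. web.lander.edu.
    ttl: str     # time to live as printed, e.g. 1D
    rclass: str  # record class, normally IN
    rtype: str   # type code, e.g. A, CNAME, NS, PTR
    data: str    # the data field


def parse_record(line):
    """Parse one resource record line from dig's answer sections."""
    # Split on runs of whitespace, keeping everything after the type
    # code together as the data field.
    name, ttl, rclass, rtype, data = line.split(None, 4)
    return Record(name, ttl, rclass, rtype, data.strip())
```

Lines beginning with a semicolon are comments in dig's output and should be skipped before calling parse_record.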
You can also use dig to get other types of information. For example, the -x option is used to do a reverse name lookup:
    bsd2# dig -x 205.153.63.30

    ; <<>> DiG 8.3 <<>> -x
    ;; res options: init recurs defnam dnsrch
    ;; got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
    ;; QUERY SECTION:
    ;;      30.63.153.205.in-addr.arpa, type = ANY, class = IN

    ;; ANSWER SECTION:
    30.63.153.205.in-addr.arpa.  1D IN PTR  sloan.lander.edu.

    ;; AUTHORITY SECTION:
    63.153.205.in-addr.arpa.  1D IN NS  lander.edu.

    ;; ADDITIONAL SECTION:
    lander.edu.             1D IN A         205.153.60.5

    ;; Total query time: 10 msec
    ;; FROM: bsd2.lander.edu to SERVER: default -- 205.153.60.5
    ;; WHEN: Mon Nov  6 10:54:17 2000
    ;; MSG SIZE  sent: 44  rcvd: 127

The mx option (no hyphen) will return mail records, the soa option will return zone authority records, and so on. See the manpage for details.
nslookup and dig are not unique. For example, host and dnsquery are other alternatives you may want to look at. host is said to be designed as a successor for nslookup and dig. But it does everything online and can generate a lot of traffic as a result. While very useful tools, all of them rely on your ability to go back and analyze the information returned. There are other tools that help to fill this gap.
For example, here is doc checking a domain:

    bsd2# doc lander.edu.
    Doc-2.1.4: doc lander.edu.
    Doc-2.1.4: Starting test of lander.edu.   parent is edu.
    Doc-2.1.4: Test date - Mon Nov  6 11:55:07 EST 2000
    ;; res_nsend to server g.root-servers.net.  192.112.36.4: Operation timed out
    DIGERR (UNKNOWN): dig @g.root-servers.net. for SOA of parent (edu.) failed
    Summary:
       ERRORS found for lander.edu. (count: 3)
       WARNINGS issued for lander.edu. (count: 1)
    Done testing lander.edu.  Mon Nov  6 11:55:40 EST 2000

The results are recorded in a log file; in this case log.lander.edu. is the filename. (Note its trailing period.)
dnswalk, written by David Barr, is a similar tool. It is a Perl script that does a zone transfer and checks the database for internal consistency. (Be aware that more and more systems are disabling zone transfers from unknown sites.)
    bsd2# dnswalk lander.edu.
    Checking lander.edu.
    BAD: lander.edu. has only one authoritative nameserver
    Getting zone transfer of lander.edu. from lander.edu...done.
    SOA=lab.lander.edu      contact=root.lander.edu
    WARN: bookworm.lander.edu A 205.153.62.205: no PTR record
    WARN: library.lander.edu A 205.153.61.11: no PTR record
    WARN: wamcmaha.lander.edu A 205.153.62.11: no PTR record
    WARN: mrtg.lander.edu CNAME elmer.lander.edu: unknown host
    0 failures, 4 warnings, 1 errors.

Be sure to include the period at the end of the domain name. This can produce a lot of output, so you may want to redirect output to a file. A number of options are available. Consult the manpage.
You'll want to take the output from these tools with a grain of salt. Even though these tools do a lot of work for you, you'll need a pretty good understanding of DNS to make sense of the error messages. And, as you can see, for the same domain, one found three errors and one warning while the other found one error and four warnings for a fully functional DNS domain. There is no question that this domain's database, which was being updated when this was run, has a few minor problems. But it does work. The moral is, don't panic when you see an error message.
Another program you might find useful is lamers. This was written by Bryan Beecher and requires both doc and dig. It is used to find lame delegations, i.e., a name server that is listed as authoritative for a domain but is not actually performing that service for the listed domain. This problem most often arises when name services are moved from one machine to another, but the parent domain is not updated. lamers is a simple script that can be used to identify this problem.
If you are setting up NIS, your best strategy is to fully test DNS first. If you are having problems with NIS, there are a number of simple utilities supplied with NIS. ypcat lists an entire map, ypmatch matches a single key and prints an entry, and ypwhich identifies client bindings. But if you have read the NIS documentation, you are already familiar with these.
If you are using RIP, rtquery and ripquery are two tools that can be used to retrieve routing tables from remote systems. rtquery is supplied as part of the routed distribution, while ripquery comes with gated. The advantage of these tools is that they use the RIP query and response mechanism to retrieve the route information. Thus, you can use either of these tools to confirm that the RIP exchange mechanism is really working, as well as to retrieve the routing tables to check for correctness.
It really doesn't matter which of these you use, as the output from the two is basically the same. Here is the output from ripquery:
    bsd2# ripquery 172.16.2.1
    84 bytes from NLCisco.netlab.lander.edu(172.16.2.1) to 172.16.2.236 version 2:
            172.16.1.0/255.255.255.0        router 0.0.0.0  metric  1  tag 0000
            172.16.3.0/255.255.255.0        router 0.0.0.0  metric  1  tag 0000
            172.16.5.0/255.255.255.0        router 0.0.0.0  metric  2  tag 0000
            172.16.7.0/255.255.255.0        router 0.0.0.0  metric  2  tag 0000

Here is the output from rtquery:
    bsd2# rtquery 172.16.2.1
    NLCisco.netlab.lander.edu (172.16.2.1): RIPv2 84 bytes
      172.16.1.0/24        metric  1
      172.16.3.0/24        metric  1
      172.16.5.0/24        metric  2
      172.16.7.0/24        metric  2

You'll notice that these are not your usual routing tables. Rather, these are the tables used by RIP's distance vector algorithm. They give reachable networks and the associated costs. Of course, you could always capture a RIP update with tcpdump or ethereal or use SNMP, but the tools discussed here are a lot easier to use.
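The main difference between the two listings is notation: ripquery prints a dotted netmask where rtquery prints a prefix length. If you want to compare the output of the two tools in a script, a sketch like the following converts one form to the other. The function names are hypothetical, and the line layout assumed is the one shown in the ripquery output above.

```python
import ipaddress


def to_prefix(route):
    """Convert network/netmask notation to network/prefix-length notation."""
    # ip_network accepts either form and prints the prefix-length form.
    return str(ipaddress.ip_network(route))


def parse_ripquery_line(line):
    """Return (network, metric) from one route line of ripquery output."""
    fields = line.split()
    network = to_prefix(fields[0])
    # The metric value is the field immediately after the word "metric".
    metric = int(fields[fields.index("metric") + 1])
    return network, metric
```

Run over the four routes ripquery reported, this yields the same networks and metrics that rtquery printed.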
If you are using Open Shortest Path First (OSPF) (regrettably, I don't at present), gated provides ospf_monitor. This interactive program provides a wealth of statistics, including I/O statistics and error logs in addition to OSPF routing tables. (For more information on routing protocols, you might consult Routing in the Internet by Christian Huitema or Interconnections by Radia Perlman.)
Unlike most other protocols where a single process is started, NFS relies on a number of different programs or daemons that vary from client to server and, to some extent, from system to system. If you are having problems with NFS, the first step is to consult your documentation to determine which daemons need to be running on your system. Next, make sure they are running. Be warned, the daemons you need and the names they go by vary from operating system to operating system. For example, on most systems, mountd and nfsd, respectively, mount filesystems and access files. On some systems they go by the names rpc.mountd and rpc.nfsd. Since these rely on portmap, sometimes called rpcbind, you'll need to make sure it is running as well. (NFS daemons are typically based on RPC and use the portmapper daemon to provide access information.) The list of daemons will be different for the client and the server. For example, nfsiod (or biod) will typically be running on the client but not the server. Keep in mind, however, that a computer may be both a client and a server.
There are a couple of ways to ensure the appropriate processes are available. You could log on to both machines and use ps to discover what is running. This has the advantage of showing you everything that is running. Another approach is to use rpcinfo to do a portmapper dump. Here is an example of querying a server from a client:
    bsd2# rpcinfo -p bsd1
       program vers proto   port
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper
        100005    3   udp   1023  mountd
        100005    3   tcp   1023  mountd
        100005    1   udp   1023  mountd
        100005    1   tcp   1023  mountd
        100003    2   udp   2049  nfs
        100003    3   udp   2049  nfs
        100003    2   tcp   2049  nfs
        100003    3   tcp   2049  nfs
        100024    1   udp   1011  status
        100024    1   tcp   1022  status

This has the advantage of showing that these services are actually reachable across the network.
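If you check several servers regularly, you may want a script to read the portmapper dump for you. The sketch below takes the text printed by rpcinfo -p and reports which expected services are absent. The function names are hypothetical, and the default list of required services is only an assumption based on the listing above. Adjust it to match the names your system registers.

```python
def registered_services(rpcinfo_output):
    """Return the set of service names found in rpcinfo -p output."""
    services = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows have five columns: program, version, protocol, port, name.
        # The header row is skipped because its first field is not a number.
        if len(fields) == 5 and fields[0].isdigit():
            services.add(fields[4])
    return services


def missing_services(rpcinfo_output, required=("portmapper", "mountd", "nfs")):
    """Return a sorted list of required services that are not registered."""
    return sorted(set(required) - registered_services(rpcinfo_output))
```

An empty list from missing_services means every service you asked about is registered with the portmapper on the server.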
Once you know that everything is running, you should check the access files, typically /etc/dfs/dfstab or /etc/exports, to make sure the client isn't being blocked. You can't just edit these files and expect to see the results immediately. Consult your documentation on how to inform your NFS implementation of the changes. Be generous with privileges if you are having problems, but don't forget to tighten security once everything is working.
Finally, check your syntax. Make sure the mount point exists and has appropriate permissions. Mount the remote system manually and verify that it is mounted with the mount command. You should see something recognizable. Here are mount table entries returned, respectively, by Linux, FreeBSD, and Solaris:
    bsd1:/ on /mnt/nfs type nfs (rw,addr=172.16.2.231,addr=172.16.2.231)
    172.16.2.231:/ on /mnt/nfs (nfs)
    /mnt/nfs on 172.16.2.231:/usr read/write/remote on Thu Nov 30 09:49:52 2000

Although the formats differ, you should see a recognizable change to the mount table before and after mounting a remote filesystem.
If you are having intermittent problems or if you suspect performance problems, you might want to use the nfsstat command. It provides a wealth of statistics about your NFS connection and its performance. You can use it to query the client, the server, or both. When called without any options, it queries both client and server. With the -c option, it queries the client. With the -s option, it queries the server. Here is an example of querying a client:
    bsd2# nfsstat -c
    Client Info:
    Rpc Counts:
      Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
            0         0        33         2         0        21         4         0
       Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
            0         0         0         0         0         8         0        66
        Mknod    Fsstat    Fsinfo  PathConf    Commit    GLease    Vacate     Evict
            0        13         3         0         2         0         0         0
    Rpc Info:
     TimedOut   Invalid X Replies   Retries  Requests
            0         0         0         0       152
    Cache Info:
    Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
          232        36        74        33         0         0         0        21
    BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
           13         2        18         8        13         0

Unfortunately, it seems that every operating system has its own implementation of nfsstat and each implementation returns a different set of statistics labeled in a different way. What you'll be most interested in is the number of problems in relation to the total number of requests. For example, a large number of timeouts is no cause for concern if it is a small percentage of a much larger number of total requests. If the timeouts are less than a couple of percent of the requests, they are probably not a cause for concern. But if the percentage of timeouts is large, you need to investigate. You'll need to sort out the meaning of various numbers returned by your particular implementation of nfsstat. And, unfortunately, the labels aren't always intuitive.
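The arithmetic is simple enough, but it is worth making explicit because it is the ratio, not the raw count, that matters. In this sketch the function names are hypothetical, and the threshold of 2 percent is my reading of "a couple of percent", not a figure from any nfsstat documentation.

```python
def timeout_percent(timed_out, requests):
    """Return timeouts as a percentage of total requests."""
    if requests == 0:
        return 0.0
    return 100.0 * timed_out / requests


def needs_investigation(timed_out, requests, threshold_percent=2.0):
    """Return True if the timeout rate exceeds the threshold."""
    return timeout_percent(timed_out, requests) > threshold_percent
```

With the figures from the example above, 0 timeouts in 152 requests, the rate is 0 percent. By the same rule, 50 timeouts in 10,000 requests (0.5 percent) would not be a concern, while 30 timeouts in 152 requests would be.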
Several other NFS tools were once popular but seem to have languished in recent years. You probably won't have much luck in finding these or getting them running. Two of the ones that were once more popular are nhfsstone and nfswatch. nhfsstone is a benchmark tool for NFS, which seems to have been superseded by the rather pricey SFS tool from SPEC. nfswatch is a tool that allows you to watch NFS traffic. tcpdump or ethereal, when used with the appropriate filters, provides a workable alternative to nfswatch.
Copyright © 2002 O'Reilly & Associates. All rights reserved.