Reading Debugging Output (DNS and BIND, 4th Edition)

13.3. Reading Debugging Output

We'll cover five examples of debugging output. The first example shows the name server starting up. The next two examples show successful name lookups. The fourth example shows a secondary name server keeping its zone up to date. And in the last example, we switch from showing you name server behavior to showing you resolver behavior: the resolver search algorithm. After each trace (except the last one) we killed the name server and started it again so that each trace started with a fresh, nearly empty cache.

You might wonder why we've chosen to show normal name server behavior for all our examples; after all, this chapter is about debugging. We're showing you normal behavior because you have to know what normal operation is before you track down abnormal operation. Another reason is to help you understand the concepts (retransmissions, roundtrip times, etc.) we described in earlier chapters.

13.3.1. Name Server Startup (BIND 8, Debug Level 1)

We'll start the debugging examples by watching the name server initialize. This first name server is a BIND 8 name server. We used -d 1 on the command line, and this is the named.run output that resulted:

1) Debug level 1
2) Version = named 8.2.3-T7B Mon Aug 21 19:21:21 MDT 2000
3) cricket@abugslife.movie.edu:/usr/local/src/bind-8.2.3-T7B/src/bin/named
4) conffile = ./named.conf
5) starting.  named 8.2.3-T7B Mon Aug 21 19:21:21 MDT 2000
6) cricket@abugslife.movie.edu:/usr/local/src/bind-8.2.3-T7B/src/bin/named
7) ns_init(./named.conf)
8) Adding 64 template zones
9) update_zone_info('0.0.127.in-addr.arpa', 1)
10) source = db.127.0.0
11) purge_zone(0.0.127.in-addr.arpa,1)
12) reloading zone
13) db_load(db.127.0.0, 0.0.127.in-addr.arpa, 1, Nil, Normal)
14) purge_zone(0.0.127.in-addr.arpa,1)
15) master zone "0.0.127.in-addr.arpa" (IN) loaded (serial 2000091500)
16) zone[1] type 1: '0.0.127.in-addr.arpa' z_time 0, z_refresh 0
17) update_zone_info('.', 3)
18) source = db.cache
19) reloading hint zone
20) db_load(db.cache, , 2, Nil, Normal)
21) purge_zone(,1)
22) hint zone "" (IN) loaded (serial 0)
23) zone[2] type 3: '.' z_time 0, z_refresh 0
24) update_pid_file(  )
25) getnetconf(generation 969052965)
26) getnetconf: considering lo [127.0.0.1]
27) ifp->addr [127.0.0.1].53 d_dfd 20
28) evSelectFD(ctx 0x80d8148, fd 20, mask 0x1, func 0x805e710, uap 0x40114344)
29) evSelectFD(ctx 0x80d8148, fd 21, mask 0x1, func 0x8089540, uap 0x4011b0e8)
30) listening on [127.0.0.1].53 (lo)
31) getnetconf: considering eth0 [192.249.249.3]
32) ifp->addr [192.249.249.3].53 d_dfd 22
33) evSelectFD(ctx 0x80d8148, fd 22, mask 0x1, func 0x805e710, uap 0x401143b0)
34) evSelectFD(ctx 0x80d8148, fd 23, mask 0x1, func 0x8089540, uap 0x4011b104)
35) listening on [206.168.194.122].53 (eth0)
36) fwd ds 5 addr [0.0.0.0].1085
37) Forwarding source address is [0.0.0.0].1085
38) evSelectFD(ctx 0x80d8148, fd 5, mask 0x1, func 0x805e710, uap 0)
39) evSetTimer(ctx 0x80d8148, func 0x807cbe8, uap 0x40116158, due 969052990.
812648000, inter 0.000000000)
40) exit ns_init(  )
41) update_pid_file(  )
42) Ready to answer queries.
43) prime_cache: priming = 0, root = 0
44) evSetTimer(ctx 0x80d8148, func 0x805bc30, uap 0, due 969052969.000000000,
inter 0.000000000)
45) sysquery: send -> [192.33.4.12].53 dfd=5 nsid=32211 id=0 retry=969052969
46) datagram from [192.33.4.12].53, fd 5, len 436
47) 13 root servers

We added the line numbers to the debugging output; you won't see them in yours. Lines 2 through 6 give the version of BIND you are running and the name of the configuration file. Version 8.2.3-T 7B was released by ISC (Internet Software Consortium) in August 2000. We used the configuration file in the current directory, . /named.conf, for this run.

Lines 7 through 23 show BIND reading the configuration file and the zone data files. This name server is a caching-only name server -- the only files read are db.127.0.0 (lines 9 through 16) and db.cache (lines 17-23). Line 9 shows the zone being updated (0.0.127.IN-ADDR.ARPA) and line 10 shows the file containing the zone data (db.127.0.0). Line 11 indicates that any old data for the zone is purged before new data is added. Line 12 says the zone is being reloaded, even though the zone is actually being loaded for the first time. The zone data is loaded during lines 13 through 15. On lines 16 and 23, z_time is the time to check when this zone is up to date; z_refresh is the zone refresh time. These values matter only if the name server is a slave for the zone.

Lines 25 through 39 show the initialization of file descriptors. (In this case, they're really socket descriptors.) File descriptors 20 and 21 (lines 27-29) are bound to 127.0.0.1, the loopback address. Descriptor 20 is a datagram socket and descriptor 21 is a stream socket. File descriptors 22 and 23 (lines 32-34) are bound to the 192.249.249.3 interface. Each interface address was considered and used -- they would not be used if the interface had not been initialized or if the address were already in the list. File descriptor 5 (lines 36-39) is bound to 0.0.0.0, the wildcard address. Most network daemons use only one socket bound to the wildcard address, not sockets bound to individual interfaces. The wildcard address picks up packets sent to any interface on the host. Let's digress for a moment to explain why named uses both a socket bound to the wildcard address and sockets bound to specific interfaces.

When named receives a request from an application or from another name server, it receives the request on one of the sockets bound to a specific interface. If named did not have sockets bound to specific interfaces, it would receive the requests on the socket bound to the wildcard address. When named sends back a response, it uses the same socket descriptor that the request came in on. Why does named do this? When responses are sent out via the socket bound to the wildcard address, the kernel fills in the sender's address with the address of the interface the response was actually sent out on. This address may or may not be the same address that the request was sent to. When responses are sent out via the socket bound to a specific address, the kernel fills in the sender's address with that specific address -- the same address the request was sent to. If the name server gets a response from an IP address it doesn't know about, the response is tagged a "martian" and discarded. named tries to avoid martian responses by sending its responses on descriptors bound to specific interfaces, so the sender's address is the same address the request was sent to. However, when named sends out queries, it uses the wildcard descriptor since there is no need to use a specific IP address.

Lines 43 through 47 show the name server sending out a system query to find out which name servers are currently serving the root zone. This is known as "priming the cache." The first server queried sent a response that included 13 name servers.

The name server is now initialized and ready to answer queries.

13.3.2. Name Server Startup (BIND 9, Debug Level 1)

Here's what a BIND 9 name server looks like starting up:

1)  Sep 15 15:34:53.878 starting BIND 9.1.0 -d1
2)  Sep 15 15:34:53.883 using 1 CPU
3)  Sep 15 15:34:53.899 loading configuration from './named.conf'
4)  Sep 15 15:34:53.920 the default for the 'auth-nxdomain' option is now 'no'
5)  Sep 15 15:34:54.141 no IPv6 interfaces found
6)  Sep 15 15:34:54.143 listening on IPv4 interface lo, 127.0.0.1#53
7)  Sep 15 15:34:54.151 listening on IPv4 interface eth0, 192.249.249.3#53
8)  Sep 15 15:34:54.163 command channel listening on 0.0.0.0#953
9)  Sep 15 15:34:54.180 now using logging configuration from config file
10) Sep 15 15:34:54.181 dns_zone_load: zone 0.0.127.in-addr.arpa/IN: start
11) Sep 15 15:34:54.188 dns_zone_load: zone 0.0.127.in-addr.arpa/IN: loaded
12) Sep 15 15:34:54.189 dns_zone_load: zone 0.0.127.in-addr.arpa/IN: dns_journal
_rollforward: no journal
13) Sep 15 15:34:54.190 dns_zone_maintenance: zone 0.0.127.in-addr.arpa/IN: enter
14) Sep 15 15:34:54.190 dns_zone_maintenance: zone version.bind/CHAOS: enter
15) Sep 15 15:34:54.190 running

The first difference you probably noticed between BIND 9's debugging output and BIND 8's is BIND 9's terseness. Remember that BIND 8 has been around for three years, and the authors have had plenty of time to add debugging messages to the code. BIND 9 is brand-spanking-new, so there aren't as many debugging messages yet.

You probably also noticed that BIND 9 includes a timestamp for each debugging message, which can be handy if you're trying to correlate messages to real-world events.

Lines 1 and 2 show the version of BIND we're running (9.1.0) and the configuration file it's reading. As with the previous example, we're using named.conf in the current directory. Line 3 tells us we're using only one CPU -- to be expected on a box with just one processor.

Line 4 gives us a simple warning that the default for the auth-nxdomain substatement (covered in Chapter 10, "Advanced Features") has changed. Line 5 reminds us that our host doesn't have any IP Version 6 network interfaces; if it did, BIND 9 could listen on those interfaces for queries.

Lines 6 and 7 show the name server listening on two network interfaces: lo, the loopback interface, and eth0, the Ethernet interface. BIND 9 displays the address and port in the format address#port, unlike BIND 8, which uses [address].port. Line 8 shows named listening on port 953, the default port, for control messages.

Lines 10-12 show the name server loading 0.0.127.in-addr.arpa. The start and loaded messages are self-explanatory. The no journal message indicates that no journal was present. (A journal, described in Chapter 10, "Advanced Features", is a record of dynamic updates the name server received for the zone.)

Finally, lines 13 and 14 show the name server doing maintenance on the 0.0.127.in-addr.arpa and version.bind zones. (version.bind is a built-in CHAOSNET zone that contains a single TXT record, attached to the domain name version.bind.) Zone maintenance is the process that schedules periodic tasks, such as SOA queries for slave and stub zones or NOTIFY messages.

13.3.3. A Successful Lookup (BIND 8, Debug Level 1)

Suppose you want to watch the name server look up a name. Your name server wasn't started with debugging. Use ndc once to turn on debugging, look up the name, then again to turn off debugging, like this:

# ndc trace 1
# /etc/ping galt.cs.purdue.edu.
# ndc notrace

We did this; here's the resulting named.run file:

datagram from [192.249.249.3].1162, fd 20, len 36

req: nlookup(galt.cs.purdue.edu) id 29574 type=1 class=1
req: missed 'galt.cs.purdue.edu' as '' (cname=0)
forw: forw -> [198.41.0.10].53 ds=4 nsid=40070 id=29574 2ms retry 4sec
datagram from [198.41.0.10].53, fd 4, len 343

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40070
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 9, ADDITIONAL: 9
;;              galt.cs.purdue.edu, type = A, class = IN
EDU.                        6D IN NS    A.ROOT-SERVERS.NET.
EDU.                        6D IN NS    H.ROOT-SERVERS.NET.
EDU.                        6D IN NS    B.ROOT-SERVERS.NET.
EDU.                        6D IN NS    C.ROOT-SERVERS.NET.
EDU.                        6D IN NS    D.ROOT-SERVERS.NET.
EDU.                        6D IN NS    E.ROOT-SERVERS.NET.
EDU.                        6D IN NS    I.ROOT-SERVERS.NET.
EDU.                        6D IN NS    F.ROOT-SERVERS.NET.
EDU.                        6D IN NS    G.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.             5w6d16h IN A    198.41.0.4
H.ROOT-SERVERS.NET.             5w6d16h IN A    128.63.2.53
B.ROOT-SERVERS.NET.             5w6d16h IN A    128.9.0.107
C.ROOT-SERVERS.NET.             5w6d16h IN A    192.33.4.12
D.ROOT-SERVERS.NET.             5w6d16h IN A    128.8.10.90
E.ROOT-SERVERS.NET.             5w6d16h IN A    192.203.230.10
I.ROOT-SERVERS.NET.             5w6d16h IN A    192.36.148.17
F.ROOT-SERVERS.NET.             5w6d16h IN A    192.5.5.241
G.ROOT-SERVERS.NET.             5w6d16h IN A    192.112.36.4
resp: nlookup(galt.cs.purdue.edu) qtype=1
resp: found 'galt.cs.purdue.edu' as 'edu' (cname=0)
resp: forw -> [192.36.148.17].53 ds=4 nsid=40071 id=29574 1ms
datagram from [192.36.148.17].53, fd 4, len 202

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40071
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 4
;;   galt.cs.purdue.edu, type = A, class = IN
PURDUE.EDU.                 2D IN NS    NS.PURDUE.EDU.
PURDUE.EDU.                 2D IN NS    MOE.RICE.EDU.
PURDUE.EDU.                 2D IN NS    PENDRAGON.CS.PURDUE.EDU.
PURDUE.EDU.                 2D IN NS    HARBOR.ECN.PURDUE.EDU.
NS.PURDUE.EDU.                  2D IN A     128.210.11.5
MOE.RICE.EDU.                   2D IN A     128.42.5.4
PENDRAGON.CS.PURDUE.EDU.              2D IN A  128.10.2.5
HARBOR.ECN.PURDUE.EDU.              2D IN A     128.46.199.76
resp: nlookup(galt.cs.purdue.edu) qtype=1
resp: found 'galt.cs.purdue.edu' as 'cs.purdue.edu' (cname=0)
resp: forw -> [128.46.199.76].53 ds=4 nsid=40072 id=29574 8ms
datagram from [128.46.199.76].53, fd 4, len 234

send_msg -> [192.249.249.3].1162 (UDP 20) id=29574
Debug off

First, notice that IP addresses, not domain names, are logged -- odd for a name server, don't you think? It's really not that odd, though. If you are trying to debug a problem with looking up names, you don't want the name server looking up additional names just to make the debugging output more readable -- the extra queries would interfere with the debugging. None of the debugging levels translates IP addresses into domain names. You'll have to use a tool (like the one we provide later) to convert them for you.

Let's go through this debugging output line by line. This detailed approach is important if you want to understand what each line means. If you turn on debugging, you're probably trying to find out why some name can't be looked up, and you're going to have to figure out what the trace means.

datagram from [192.249.249.3].1162, fd 20, len 36

A datagram came from the host with IP address 192.249.249.3 (terminator.movie.edu). You may see the datagram come from 127.0.0.1 if the sender is on the same host as the name server. The sending application used port 1162. The name server received the datagram on file descriptor (fd) 20. The startup debugging output, like the one shown earlier, tells you which interface file descriptor 20 is bound to. The length (len) of the datagram was 36 bytes.

req: nlookup(galt.cs.purdue.edu) id 29574 type=1 class=1

Since the next debugging line starts with req, we know that the datagram was a request. The name looked up in the request was galt.cs.purdue.edu. The request id is 29574. The type=1 means the request is for address information. The class=1 means the class is IN. You can find a complete list of query types and classes in the header file /usr/ include/arpa/nameser.h.

req: missed 'galt.cs.purdue.edu' as '' (cname=0)

The name server looked up the requested name and didn't find it. Then it tried to find a remote name server to ask; none was found until the root zone (the empty quotes). The cname=0 means the name server didn't encounter a CNAME record. If it does see a CNAME record, the canonical name is looked up instead of the original name, and cname will be nonzero.

forw: forw -> [198.41.0.10].53 ds=4 nsid=40070 id=29574 2ms retry 4sec

The query was forwarded to the name server (port 53) on host 198.41.0.10 ( j.root-servers.net). The name server used file descriptor 4 (which is bound to the wildcard address) to send the query. The name server tagged this query with ID number 40070 (nsid=40070) so that it could match the response to the original question. The application used ID number 29574 (id=29574 ), as you saw on the nlookup line. The name server will wait four seconds before trying the next name server.

datagram from [198.41.0.10].53, fd 4, len 343

The name server on j.root-servers.net responded. Since the response was a delegation, it is printed in full in the debug log.

resp: nlookup(galt.cs.purdue.edu) qtype=1

After the information in the response message is cached, the name is looked up again. As mentioned earlier, qtype=1 means that the name server is looking for address information.

resp: found 'galt.cs.purdue.edu' as 'edu' (cname=0)
resp: forw -> [192.36.148.17].53 ds=4 nsid=40071 id=29574 1ms
datagram from [192.36.148.17].53, fd 4, len 202

The root name server responded with a delegation to the edu servers. The same query is sent to 192.36.148.17 (i.root-servers.net), one of the edu servers. i.root-servers.net responds with information about the purdue.edu servers.

resp: found 'galt.cs.purdue.edu' as 'cs.purdue.edu' (cname=0)

This time there is some information at the cs.purdue.edu level.

resp: forw -> [128.46.199.76].53 ds=4 nsid=40072 id=29574 8ms

A query was sent to the name server on 128.46.199.76 (harbor.ecn.purdue.edu). This time the name server ID is 40072.

datagram from [128.46.199.76].53, fd 4, len 234

The name server on harbor.ecn.purdue.edu responded. We have to look at what happens next to figure out the contents of this response.

send_msg -> [192.249.249.3].1162 (UDP 20) id=29574

The last response must have contained the address requested, since the name server responded to the application (which used port 1162, if you look back at the original query). The response was in a UDP packet (as opposed to a TCP connection), and it used file descriptor 20.

This name server was "quiet" when we did this trace; it wasn't handling other queries at the same time. When you do a trace on an active name server, though, you won't be so lucky. You'll have to sift through the output and patch together those pieces that pertain to the lookup in which you are interested. It's not that hard, though. Start up your favorite editor, search for the nlookup line with the name you looked up, then trace the entries with the same nsid. You'll see how to follow the nsid in the next BIND 8 trace.

13.3.4. A Successful Lookup (BIND 9, Debug Level 1)

We'll show you the debugging output produced by looking up the same domain name on a BIND 9 name server at debug level 1, but it's almost laughably short. Still, as we said, it's important to know what debugging output looks like under correct operation. Anyway, here goes:

Sep 16 17:20:57.193 client 192.249.249.3#1090: query: galt.cs.purdue.edu A
Sep 16 17:20:57.194 createfetch: galt.cs.purdue.edu. A

The first line tells us that a client at IP address 192.249.249.3 (that is, the local host), running on port 1090, sent us a query for galt.cs.purdue.edu's address. The second line is logged by the portion of the name server that does name resolution to let us know what it's up to.

13.3.5. A Successful Lookup with Retransmissions (BIND 8, Debug Level 1)

Not all lookups are as "clean" as the last one -- sometimes the query must be retransmitted. The user doesn't see any difference as long as the lookup succeeds, although a query involving retransmissions will take longer. Following is a trace where there are retransmissions. We converted the IP addresses to domain names after the trace was done. Notice how much easier it is to read with names!

1)  Debug turned ON, Level 1
2)
3)  datagram from terminator.movie.edu port 3397, fd 20, len 35
4)  req: nlookup(ucunix.san.uc.edu) id 1 type=1 class=1
5)  req: found 'ucunix.san.uc.edu' as 'edu' (cname=0)
6)  forw: forw -> i.root-servers.net port 53  ds=4 nsid=2 id=1 0ms retry 4 sec
7)
8)  datagram from i.root-servers.net port 53, fd 4, len 240
    <delegation lines removed>
9)  resp: nlookup(ucunix.san.uc.edu) qtype=1
10) resp: found 'ucunix.san.uc.edu' as 'san.uc.edu' (cname=0)
11) resp: forw -> uceng.uc.edu port 53 ds=4 nsid=3 id=1 0ms
12) resend(addr=1 n=0) - > ucbeh.san.uc.edu port 53 ds=4 nsid=3 id=1 0ms
13)
14) datagram from terminator.movie.edu port 3397, fd 20, len 35
15) req: nlookup(ucunix.san.uc.edu) id 1 type=1 class=1
16) req: found 'ucunix.san.uc.edu' as 'san.uc.edu' (cname=0)
17) resend(addr=2 n=0) - > uccba.uc.edu port 53 ds=4 nsid=3 id=1 0ms
18) resend(addr=3 n=0) - > mail.cis.ohio-state.edu port 53 ds=4 nsid=3 id=1 0ms
19)
20) datagram from mail.cis.ohio-state.edu port 53, fd 4, len 51
21) send_msg -> terminator.movie.edu (UDP 20 3397) id=1

This trace starts out the same way as the last trace (lines 1 through 11): the name server receives a query for ucunix.san.uc.edu, sends the query to an edu name server (i.root-servers.net), receives a response that includes a list of name servers for uc.edu, and sends the query to one of the uc.edu name servers (uceng.uc.edu).

What's new in this trace is the resend lines (lines 12, 17, and 18). The forw on line 11 counts as resend(addr=0 n=0) -- we CS dweebs always start counting at zero. Since uceng.uc.edu didn't respond, the name server went on to try ucbeh.san.uc.edu (line 12), uccba.uc.edu (line 17), and mail.cis.ohio-state.edu (line 18). The off-site name server on mail.cis.ohio-state.edu finally responded (line 20). Notice that you can track all the retransmissions by searching for nsid=3; that's important to know, because lots of other queries may be wedged between these.

Also, notice the second datagram from terminator.movie.edu (line 14). It has the same port, file descriptor, length, ID, and type as the query on line 3. The application didn't receive a response in time, so it retransmitted its original query. Since the name server is still working on the first query transmitted, this one is a duplicate. It doesn't say so in this output, but the name server detected the duplicate and dropped it. We can tell because there is no forw: line after the req: lines, as there was on lines 4 through 6.

Can you guess what this output might look like if the name server were having trouble looking up a name? You'd see a lot of retransmissions as the name server kept trying to look up the name (which you could track by matching the nsid= lines). You'd see the application send a couple more retransmissions, thinking that the name server hadn't received the application's first query. Eventually the name server would give up, usually after the application itself gave up.

With a BIND 9.1.0 name server, you won't see resends until debug level 3, and at that point they'll be very difficult to pick out from BIND 9's other logged messages. Moreover, even at debug level 3, BIND 9.1.0 doesn't tell you which name server it's resending to.

13.3.6. A Slave Name Server Checking Its Zone (BIND 8, Debug Level 1)

In addition to tracking down problems with name server lookups, you may have to track down why a slave server is not loading from its master. Tracking down this problem can often be done by simply comparing the zone's SOA serial numbers on the two servers using nslookup or dig, as we'll show in Chapter 14, "Troubleshooting DNS and BIND". If your problem is more elusive, you may have to resort to looking at the debugging information. We'll show you what the debugging information should look like if your server is running normally.

This debugging output was generated on a "quiet" name server -- one not receiving any queries -- to show you exactly which lines pertain to zone maintenance. Remember that a BIND 4 or 8 slave name server uses a child process to transfer the zone data to the local disk before reading it in. While the slave logs its debugging information to named.run, the slave's child process logs its debugging information to xfer.ddt.PID. The PID suffix, by default the process ID of the child process, may be changed to ensure that the filename is unique. Beware -- turning on debugging on a slave name server will leave xfer.ddt.PID files lying around, even if you are only trying to trace a lookup. Our trace is at debugging level 1, and we turned on the BIND 8 logging option print-time. Debug level 3 gives you more information, more than you may want if a transfer actually occurs. A debugging level 3 trace of a zone transfer of several hundred resource records can create an xfer.ddt.PID file several megabytes in size.

21-Feb 00:13:18.026 do_zone_maint for zone movie.edu (class IN)
21-Feb 00:13:18.034 zone_maint('movie.edu')
21-Feb 00:13:18.035 qserial_query(movie.edu)
21-Feb 00:13:18.043 sysquery: send -> [192.249.249.3].53 dfd=5
                         nsid=29790 id=0 retry=888048802
21-Feb 00:13:18.046 qserial_query(movie.edu) QUEUED
21-Feb 00:13:18.052 next maintenance for zone 'movie.edu' in 2782 sec
21-Feb 00:13:18.056 datagram from [192.249.249.3].53, fd 5, len 380
21-Feb 00:13:18.059 qserial_answer(movie.edu, 26739)
21-Feb 00:13:18.060 qserial_answer: zone is out of date
21-Feb 00:13:18.061 startxfer(  ) movie.edu
21-Feb 00:13:18.063 /usr/etc/named-xfer -z movie.edu -f db.movie
                    -s 26738 -C 1 -P 53 -d 1 -l xfer.ddt 192.249.249.3
21-Feb 00:13:18.131 started xfer child 390
21-Feb 00:13:18.132 next maintenance for zone 'movie.edu' in 7200 sec

21-Feb 00:14:02.089 endxfer: child 390 zone movie.edu returned
                         status=1 termsig=-1
21-Feb 00:14:02.094 loadxfer(  ) "movie.edu"
21-Feb 00:14:02.094 purge_zone(movie.edu,1)

21-Feb 00:14:30.049 db_load(db.movie, movie.edu, 2, Nil)
21-Feb 00:14:30.058 next maintenance for zone 'movie.edu' in 1846 sec

21-Feb 00:17:12.478 slave zone "movie.edu" (IN) loaded (serial 26739)
21-Feb 00:17:12.486 no schedule change for zone 'movie.edu'

21-Feb 00:42:44.817 Cleaned cache of 0 RRs

21-Feb 00:45:16.046 do_zone_maint for zone movie.edu (class IN)
21-Feb 00:45:16.054 zone_maint('movie.edu')
21-Feb 00:45:16.055 qserial_query(movie.edu)
21-Feb 00:45:16.063 sysquery: send -> [192.249.249.3].53 dfd=5
                         nsid=29791 id=0 retry=888050660
21-Feb 00:45:16.066 qserial_query(movie.edu) QUEUED
21-Feb 00:45:16.067 next maintenance for zone 'movie.edu' in 3445 sec
21-Feb 00:45:16.074 datagram from [192.249.249.3].53, fd 5, len 380
21-Feb 00:45:16.077 qserial_answer(movie.edu, 26739)
21-Feb 00:45:16.078 qserial_answer: zone serial is still OK
21-Feb 00:45:16.131 next maintenance for zone 'movie.edu' in 2002 sec

Unlike the previous traces, each line in this trace has a timestamp. The timestamp makes it clear which debug statements are grouped together.

This name server is a slave for a single zone, movie.edu. The line with time 00:13:18.026 shows that it is time to check with the master server. The server queries for the zone's SOA record and compares serial numbers before deciding to load the zone. The lines with times 00:13:18.059 through 00:13:18.131 show you the zone's serial number (26739), tell you the zone is out of date, and start a child process (pid 390) to transfer the zone. At time 00:13:18.132, a timer is set to expire 7200 seconds later. This is the amount of time the server allows for a transfer to complete. At time 00:14:02.089, you see the exit status of the child process. The status of 1 indicates that the zone data was successfully transferred. The old zone data is purged (time 00:14:02.094), and the new data is loaded.

The next maintenance (see time 00:14:30.058) is scheduled for 1846 seconds later. For this zone, the refresh interval is 3600, but the name server chose to check again in 1846 seconds. Why? The name server is trying to avoid having its refresh timer become synchronized. Instead of using 3600 exactly, it uses a random time between half the refresh interval (1800) and the full refresh interval (3600). At 00:45:16.046, the zone is checked again and this time it is up to date.

If your trace ran long enough, you'd see more lines like the one at 00:42:44.817 -- one line each hour. What's happening is that the server is making a pass through its cache, freeing any data that has expired to reduce the amount of memory used.

The master server for this zone is a BIND 4 name server. If the master were a BIND 8 name server, the slave would have been notified when the zone changed rather than waiting for the refresh interval to pass. The slave server's debug output would look almost exactly the same, but the trigger to check the zone status is a NOTIFY:

rcvd NOTIFY(movie.edu, IN, SOA) from [192.249.249.3].1059
qserial_query(movie.edu)
sysquery: send -> [192.249.249.3].53 dfd=5
          nsid=29790 id=0 retry=888048802

13.3.7. A Slave Name Server Checking Its Zone (BIND 9 Debug Level 1)

The equivalent debugging output from a BIND 9.1.0 name server at level 1 is, as usual, more concise. Here's what it looks like:

Sep 18 15:05:00.059 zone_timer: zone movie.edu/IN: enter
Sep 18 15:05:00.059 dns_zone_maintenance: zone movie.edu/IN: enter
Sep 18 15:05:00.059 queue_soa_query: zone movie.edu/IN: enter
Sep 18 15:05:00.059 soa_query: zone movie.edu/IN: enter
Sep 18 15:05:00.061 refresh_callback: zone movie.edu/IN: enter
Sep 18 15:05:00.062 refresh_callback: zone movie.edu/IN: Serial: new 2000010923, old 2000010922
Sep 18 15:05:00.062 queue_xfrin: zone movie.edu/IN: enter
Sep 18 15:05:00.070 zone_xfrdone: zone movie.edu/IN: success
Sep 18 15:05:00.070 transfer of 'movie.edu' from 192.249.249.3#53: end of transfer
Sep 18 15:05:01.089 zone_timer: zone movie.edu/IN: enter
Sep 18 15:05:01.089 dns_zone_maintenance: zone movie.edu/IN: enter
Sep 18 15:05:19.121 notify_done: zone movie.edu/IN: enter
Sep 18 15:05:19.621 notify_done: zone movie.edu/IN: enter

The message at 15:05:00.059 shows the refresh timer popping, causing the name server to begin maintenance for the zone on the next line. First the name server queues a query for the SOA record for the IN class zone movie.edu (queue_soa_query at the same timestamp), which it sends. At 15:05:00.062, the name server finds that the master name server has a higher serial number than it does (2000010923 to its 2000010922), so it queues an inbound zone transfer (queue_xfrin). All of eight milliseconds later (at 15:05:00.070) the transfer is done, and at 15:05:01.089 the name server resets the refresh timer (zone_timer).

The next three lines show the name server doing maintenance on movie.edu again. If, for example, some of movie.edu's name servers were outside the movie.edu zone, the name server would use this opportunity to look up their addresses (not just A, but also A6 and AAAA records!) so that it could include them in future responses. On the last two lines, our name server sends NOTIFY messages -- two, to be exact -- to the name servers listed in the NS records for movie.edu.


13.2. Turning On Debugging		13.4. The Resolver Search Algorithm and Negative Caching