13.3. Remote procedure call tools
Network failures on a grand scale are generally
caused by problems at the MAC or IP level, and are immediately
noticed by users. Problems involving higher layers of the network
protocol stack manifest themselves in more subtle ways, affecting
only a few machines or particular pairs of clients and servers. The
utilities discussed in the following sections analyze functionality
from the remote procedure call (RPC) layer up through the NFS or NIS
application layer. The next section contains a detailed examination
of the RPC mechanism at the heart of NFS and NIS.
13.3.1. RPC mechanics
The Remote Procedure Call (RPC) mechanism imposes a client/server
relationship on machines in a network. A server is a
host that physically owns some shared resource, such as a disk
exported for NFS service or an NIS map. Clients operate on resources
owned by servers by making RPC requests; these operations appear (to
the client) to have been executed locally. For example, when
performing a read RPC on an NFS-mounted disk, the reading application
has no knowledge of where the read is actually executed. Many
client-server relationships may be defined for each machine on a
network; a server for one resource is often a client for many others
in the same network.
13.3.1.1. Identifying RPC services
Services available through RPC are identified by four values:
- Program number
- Version number
- Procedure number
- Protocol (UDP or TCP)
 
The program number uniquely identifies the RPC service. Each RPC
service, such as the mountd or NIS server daemons, is assigned a
program number. The file /etc/rpc and the rpc NIS map contain an
enumeration of RPC program numbers, formal names, and nicknames for
each service:
Excerpt from /etc/rpc:
nfs             100003  nfsprog
ypserv          100004  ypprog
mountd          100005  mount showmount
ypbind          100007
Note that program 100005, mountd, has two nicknames, mount and
showmount, reflecting the fact that the mountd daemon services both
mount requests and the showmount utility.
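As a rough illustration of how the name-to-number lookup works, the following Python sketch parses lines in the /etc/rpc layout shown above. The function names are hypothetical, and the sketch assumes that "#" starts a comment.

```python
def parse_rpc_line(line):
    """Parse one /etc/rpc-style line into (name, number, aliases), or None."""
    # Drop trailing comments and surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = line.split()
    name, number = fields[0], int(fields[1])
    return name, number, fields[2:]


def build_rpc_table(text):
    """Map every formal name and nickname to its RPC program number."""
    table = {}
    for line in text.splitlines():
        entry = parse_rpc_line(line)
        if entry is None:
            continue
        name, number, aliases = entry
        for key in [name, *aliases]:
            table[key] = number
    return table


EXCERPT = """\
nfs             100003  nfsprog
ypserv          100004  ypprog
mountd          100005  mount showmount
ypbind          100007
"""

table = build_rpc_table(EXCERPT)
print(table["mountd"], table["showmount"])  # both name program 100005
```

Because every nickname maps to the same number, a tool can accept either mountd or showmount and still reach program 100005.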
Program numbers can also be expressed in hexadecimal. Well-known RPC
services such as NFS and NIS are assigned reserved program numbers in
the range 0x0 to 0x1fffffff. Numbers above this range may be assigned
to local applications such as license servers. The well-known
programs are commonly expressed in decimal, though.
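For readers who meet both notations, this short Python sketch converts the NFS program number between decimal and hexadecimal. The variable names are chosen only for illustration.

```python
NFS_PROGRAM = 100003

as_hex = hex(NFS_PROGRAM)          # decimal -> hexadecimal string
back_to_decimal = int(as_hex, 16)  # hexadecimal string -> decimal

print(f"{NFS_PROGRAM} == {as_hex}")
```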
A version number is used to differentiate between various flavors of
the same service, and is
mostly utilized to evolve the service over time, while providing
backwards compatibility if so desired. For example, there are two
versions of the NFS service: Versions 2 and 3 (there is no Version
1). Each version of the program may be composed of many procedures.
Each version of the NFS service, program number 100003, consists of
several procedures, each of which is assigned a procedure number.
These procedures perform client requests on the NFS server. For
example: read a directory, create a file, read a block from a file,
write to a file, get the file's attributes, or get statistics
about a filesystem. The procedure number is passed in an RPC request
as an "op code" for the RPC server. Procedure numbers
start with 1; procedure 0 is reserved for a "null"
function. While RPC program numbers are well-advertised, version and
procedure numbers are particular to the service and often are
contained in a header file that gets compiled into the client
program. NFS procedure numbers, for example, are defined in the
header file /usr/include/nfs/nfs.h.
RPC clients and servers deal exclusively with RPC program numbers. At
the session layer in the protocol stack, the code doesn't
really care what protocols are used to provide the session services.
The UDP and TCP transport protocols need port numbers to identify the
local and remote ends of a connection. The portmapper is used to
perform translation between the RPC program number-based view of the
world and the TCP/UDP port numbers.
 
13.3.1.2. RPC portmapper  --  rpcbind
The rpcbind daemon (also known as the portmapper)[32] exists to
register RPC services and to provide their IP port numbers when given
an RPC program number. rpcbind itself is an RPC service, but it
resides at a well-known IP port (port 111) so that it may be contacted
directly by remote hosts. For example, if host fred needs to mount a
filesystem from host barney, it must send an RPC request to the mountd
daemon on barney. The mechanics of making the RPC request are as
follows:
- fred gets the IP address for barney, using the ipnodes NIS map.
fred also looks up the RPC program number for mountd in the rpc NIS
map. The RPC program number for mountd is 100005.
- Knowing that the portmapper lives at port 111, fred sends an RPC
request to the portmapper on barney, asking for the IP port (on
barney) of RPC program 100005. fred also specifies the particular
protocol and version number for the RPC service. barney's portmapper
responds to the request with port 704, the IP port at which mountd is
listening for incoming mount RPC requests over the specified protocol.
Note that it is possible for the portmapper to return an error, if the
specified program does not exist or if it hasn't been registered on
the remote host. barney, for example, might not be an NFS server and
would therefore have no reason to run the mountd daemon.
- fred sends a mount RPC request to barney, using the IP port number
returned by the portmapper. This RPC request contains an RPC procedure
number, which tells the mountd daemon what to do with the request. The
RPC request also contains the parameters for the procedure, in this
case, the name of the filesystem fred needs to mount.
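The second step above can be sketched in Python as the construction of a port lookup request. This is a minimal sketch, not the Solaris implementation: it assumes version 2 of the portmapper protocol (program 100000, procedure 3, GETPORT), null authentication, and the standard big-endian RPC encoding. The function names are hypothetical, and nothing is sent over the network.

```python
import struct

PMAP_PROG = 100000      # the portmapper's own RPC program number
PMAP_VERS = 2           # portmapper protocol version 2 (assumed here)
PMAPPROC_GETPORT = 3    # procedure: look up the port for a program
IPPROTO_TCP, IPPROTO_UDP = 6, 17


def build_getport_call(xid, prog, vers, proto):
    """Return the bytes of a GETPORT call for (prog, vers, proto)."""
    header = struct.pack(
        ">6I",
        xid,               # transaction ID chosen by the caller
        0,                 # message type 0 = CALL
        2,                 # RPC protocol version
        PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT,
    )
    null_auth = struct.pack(">2I", 0, 0)             # AUTH_NONE, empty body
    args = struct.pack(">4I", prog, vers, proto, 0)  # port field is unused
    return header + null_auth + null_auth + args     # credential + verifier


def parse_getport_reply(data, xid):
    """Return the port from an accepted reply, or None if not registered."""
    rxid, mtype, reply_stat, _flavor, vlen = struct.unpack(">5I", data[:20])
    if rxid != xid or mtype != 1 or reply_stat != 0:
        raise ValueError("not an accepted reply to this call")
    # Skip the verifier body, which is padded to a 4-byte boundary.
    offset = 20 + ((vlen + 3) & ~3)
    (accept_stat,) = struct.unpack(">I", data[offset:offset + 4])
    if accept_stat != 0:
        raise ValueError("portmapper could not execute the call")
    (port,) = struct.unpack(">I", data[offset + 4:offset + 8])
    return port or None    # port 0 means the program is not registered


# fred asking barney's portmapper for mountd (100005), version 3, over UDP.
call = build_getport_call(xid=1, prog=100005, vers=3, proto=IPPROTO_UDP)
print(len(call), "byte request for UDP port 111")
```

A caller would send these bytes in a UDP datagram to port 111 on the server and pass the response to parse_getport_reply; in this version of the protocol, a returned port of 0 is how an unregistered program is reported.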
 
The portmapper is also used to handle an RPC broadcast. Recall that a
network broadcast is a packet that is sent to all hosts on the
network; an RPC broadcast is a request that is sent to all servers for
a particular RPC service. For example, the NIS client ypbind daemon
uses an RPC broadcast to locate an NIS server for its domain. There's
one small problem with RPC broadcasts: to send a broadcast packet, a
host must fill in the remote port number, so all hosts receiving the
packet know where to deliver the broadcast packet. RPC doesn't
have any knowledge of port numbers, and the RPC server daemons on
some hosts may be registered at different port numbers. This problem
is resolved by sending RPC broadcasts to the portmapper, and asking
the portmapper to make the RPC call indirectly on behalf of the
sender. In the case of the ypbind daemon, it sends a broadcast to all
rpcbind daemons; they in turn call the ypserv RPC server on each host.
 
13.3.1.3. RPC version numbers
As mentioned before, each new implementation of an RPC server has its
own version number. Different version numbers are used to
coordinate multiple implementations of the same service, each of
which may have a different interface. As an RPC service matures, the
service's author may find it necessary to add new procedures or
add arguments to existing procedures. Changing the interface in this
way requires incrementing the version number. The first (and
earliest) version of an RPC program is version 1; subsequent releases
of the server should use consecutive version numbers. For example,
the mount service has several versions, each one supporting more
options than its predecessors.
Multiple versions are implemented in a single server process; there
doesn't need to be a separate instance of the RPC server daemon
for each version supported. Each RPC server daemon registers its RPC
program number and all versions it supports with the portmapper. It
is helpful to think of dispatching a request through an RPC server as
a two-level switch: the first level discriminates on the version
number, and chooses a set of procedure routines comprising that
version of the RPC service. The second level dispatch invokes one of
the routines in that set based on the procedure number in the RPC
request.
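The two-level switch can be sketched in Python as a table of tables: the outer key is the version number and the inner key is the procedure number. The handlers and their return values are hypothetical placeholders, not real NFS code.

```python
def null_proc(args):
    """Procedure 0: do nothing and return an empty reply."""
    return b""


def getattr_v2(args):
    return f"v2 attributes for {args}"


def getattr_v3(args):
    return f"v3 attributes for {args}"


# First level: version number -> table of procedures for that version.
# Second level: procedure number -> handler routine.
DISPATCH = {
    2: {0: null_proc, 1: getattr_v2},
    3: {0: null_proc, 1: getattr_v3},
}


def dispatch(version, procedure, args=None):
    """Route one request through the two-level switch."""
    try:
        procedures = DISPATCH[version]
    except KeyError:
        raise LookupError(f"version {version} not supported") from None
    try:
        handler = procedures[procedure]
    except KeyError:
        raise LookupError(f"procedure {procedure} unavailable") from None
    return handler(args)


print(dispatch(3, 1, "/export/home"))
```

Keeping all versions in one table mirrors the point made above: a single server process can serve every version it has registered.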
When contacting the portmapper on a remote host, the local and remote
sides must agree on the version number of the RPC service that will be
used. The rule of thumb is to use the highest-numbered version that
both parties understand. In cases where version numbers are not
consecutively numbered, or no mutually agreeable version number can be
found, the portmapper returns a version mismatch error looking like:
mount: RPC: Program version mismatch
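The rule of thumb for choosing a version can be sketched in a few lines of Python. The function name is hypothetical, and the version lists are illustrative rather than taken from a real client or server.

```python
def pick_version(client_versions, server_versions):
    """Return the highest version both sides support, or None."""
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None


print(pick_version([1, 2, 3], [1, 2]))   # highest shared version
print(pick_version([3], [1, 2]))         # no overlap: a version mismatch
```

A result of None corresponds to the case in which no mutually agreeable version exists and the mismatch error is reported.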
Even though Solaris supports Transport-Independent RPC (TI-RPC), in
reality most RPC services use the TCP, UDP,
and loopback transport protocols. Servers may register themselves for
any of the protocols, depending upon the varieties of connections
they need to support. UDP packets are unreliable and unsequenced and
are often used for broadcast or stateless services. The RPC server
for the spray utility, which "catches" packets thrown at the remote
host, uses the UDP
protocol to accept as many requests as it can without requiring
retransmission of any missed packets. In contrast to UDP, TCP packets
are reliably delivered and are presented in the order in which they
were transmitted, making them a requirement when requests must be
processed by the server in the order in which they were transmitted
by the client. The loopback transports are used for communication
within the local host and can be connection-less or
connection-oriented. For example, the automounter daemon uses RPC
over a connection-oriented loopback transport to communicate with the
local kernel.
RPC servers listen on the ports they have registered with the
portmapper, and are used repeatedly for short-lived sessions.
Connections to an RPC server may exist for the duration of the RPC
call only, or may remain across calls. RPC servers do not usually fork
new processes for each request, since the overhead of doing so would
significantly impair the performance of RPC-intensive services such
as NFS. Many RPC servers are multithreaded, such as NFS in Solaris,
which allows the server to process multiple NFS requests in parallel.
A multithreaded NFS server can take advantage of multiple disks and
disk controllers; it also keeps "fast" NFS requests such as attribute
or name lookups from getting trapped behind slower disk requests.
 
 
13.3.2. RPC registration
Making RPC calls is a reasonably complex affair because there are
several places for the procedure to break down. The rpcinfo utility is
an analog of ping that queries RPC servers and their registration with
the portmapper. Like ping, rpcinfo provides a measure of basic
connectivity, albeit at the session layer in the network protocol
stack. Pinging a remote machine ensures that the underlying physical
network and IP address handling are correct; using rpcinfo to perform
a similar test verifies that the remote machine is capable of
accepting and replying to an RPC request.
rpcinfo can be used to detect and debug a variety of failures:
- "Dead" or hung servers caused by improper configuration or a failed
daemon
- RPC program version number mismatches between client and server
- Bogus or renegade RPC servers, such as an NIS server that does not
have valid maps for the domain it pretends to serve
- Broadcast-related problems
 
In its simplest usage, rpcinfo -p takes a remote hostname (or uses
the local hostname if none is specified) and queries the portmapper on
that host for all registered RPC services:
% rpcinfo -p corvette 
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32781  status
    100024    1   tcp  32775  status
    100011    1   udp  32787  rquotad
    100002    2   udp  32789  rusersd
    100002    3   udp  32789  rusersd
    100002    2   tcp  32777  rusersd
    100002    3   tcp  32777  rusersd
    100021    1   udp   4045  nlockmgr
    100021    2   udp   4045  nlockmgr
    100021    3   udp   4045  nlockmgr
    100021    4   udp   4045  nlockmgr
    100021    1   tcp   4045  nlockmgr
    100021    2   tcp   4045  nlockmgr
    100021    3   tcp   4045  nlockmgr
    100021    4   tcp   4045  nlockmgr
    100012    1   udp  32791  sprayd
    100008    1   udp  32793  walld
    100001    2   udp  32795  rstatd
    100001    3   udp  32795  rstatd
    100001    4   udp  32795  rstatd
    100068    2   udp  32796  cmsd
    100068    3   udp  32796  cmsd
    100068    4   udp  32796  cmsd
    100068    5   udp  32796  cmsd
    100005    1   udp  32810  mountd
    100005    2   udp  32810  mountd
    100005    3   udp  32810  mountd
    100005    1   tcp  32795  mountd
    100005    2   tcp  32795  mountd
    100005    3   tcp  32795  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100227    2   udp   2049
    100227    3   udp   2049
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100227    2   tcp   2049
    100227    3   tcp   2049
The output from rpcinfo shows the RPC program and version numbers,
the protocols supported, the IP port used by the RPC server, and the
name of the RPC service. Service names come from the rpc.bynumber NIS
map; if no name is printed next to the registration information then
the RPC program number does not appear in the map. This may be
expected for third-party packages that run RPC server daemons, since
the hardware vendor creating the /etc/rpc file doesn't necessarily
list all of the software vendors' RPC numbers. However, a well-known
RPC service should be listed properly. Missing RPC service names could
indicate a corrupted or incomplete rpc.bynumber NIS map. One exception
is the NFS ACL service, defined as RPC program 100227. Solaris does
not list it in /etc/rpc, and therefore its name is not printed in the
previous output. The NFS ACL service implements the protocol used
between Solaris hosts to exchange ACL (Access Control List)
information; it is currently interoperable only between Solaris
hosts. If the client or server does not implement the service, then
traditional Unix file access control based on permission bits is
used.
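When the listing is long, it can help to process it with a script. The Python sketch below parses text in the rpcinfo -p layout shown above and reports program numbers that have no service name. The function name is hypothetical, and the sample is a shortened copy of the listing.

```python
def parse_rpcinfo_p(text):
    """Turn `rpcinfo -p` output into a list of registration dicts."""
    registrations = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "program vers proto port service" header.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        registrations.append({
            "program": int(fields[0]),
            "version": int(fields[1]),
            "proto": fields[2],
            "port": int(fields[3]),
            # The service column is empty when the number has no known name.
            "service": fields[4] if len(fields) > 4 else None,
        })
    return registrations


SAMPLE = """\
   program vers proto   port  service
    100000    2   udp    111  portmapper
    100005    3   tcp  32795  mountd
    100003    3   udp   2049  nfs
    100227    3   udp   2049
"""

unnamed = sorted({r["program"] for r in parse_rpcinfo_p(SAMPLE)
                  if r["service"] is None})
print("programs without a service name:", unnamed)
```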
If the portmapper on the remote machine has died or is not accepting
connections for some reason, rpcinfo times out attempting to reach it
and reports the error. This is a good first
step toward diagnosing any RPC-related problem: verify that the
remote portmapper is alive and returning valid RPC service
registrations.
rpcinfo can also be used like ping for a particular RPC server:
rpcinfo -u host program version           UDP-based services 
rpcinfo -t host program version           TCP-based services
The -u or -t parameter specifies the transport protocol to be
used -- UDP or TCP,
respectively. The hostname must be specified, even if the local host
is being queried. Finally, the RPC program and version number are
given; the program may be supplied by name (one reported by
rpcinfo -p) or by explicit numerical value.
As a practical example, consider trying to mount an NFS filesystem
from server mahimahi. You can mount it successfully, but attempts to
operate on its files hang the client. You can use rpcinfo to check on
the status of the NFS RPC daemons on mahimahi:
% rpcinfo -u mahimahi nfs 2 
program 100003 version 2 ready and waiting
In this example, the NFS v2 RPC service is queried on remote host
mahimahi. Since the service is specified by name, rpcinfo looks it up
in the rpc NIS map. The -u flag tells rpcinfo to use the UDP protocol.
If the -t option had been specified instead, rpcinfo would have
reported the status of the NFS over TCP service. At the time of this
writing, a handful of vendors still do not support NFS over TCP; a -t
query to one of their servers would therefore report that rpcinfo
could not find a registration for the service over that protocol.
rpcinfo -u and rpcinfo -t call the null procedure (procedure 0) of the
RPC server. The null procedure normally does nothing more than return
a zero-length reply. If you cannot contact the null procedure of a
server, then the health of the server daemon process is suspect. If
the daemon never started running, rpcinfo would have reported that it
couldn't find the server daemon at all. If rpcinfo finds the RPC
server daemon but can't get a null procedure reply from it, then the
daemon is probably hung.
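The three outcomes described above (alive, never registered, or registered but silent) can be told apart in a script. The Python sketch below wraps the rpcinfo command and classifies what it prints. The function names are hypothetical, and matching on the phrase "not registered" is an assumption about the wording of the error message, which may differ between systems; only the "ready and waiting" text comes from the example above.

```python
import subprocess


def classify_probe(output):
    """Interpret the text printed by an `rpcinfo -u` or `rpcinfo -t` probe."""
    text = output.lower()
    if "ready and waiting" in text:
        return "alive"            # the null procedure answered
    if "not registered" in text:  # assumed error wording
        return "not registered"   # the daemon never registered
    return "no reply"             # registered but silent: possibly hung


def probe(host, program, version, transport="-u"):
    """Run rpcinfo against one service and classify what it printed."""
    result = subprocess.run(
        ["rpcinfo", transport, host, str(program), str(version)],
        capture_output=True, text=True, check=False,
    )
    return classify_probe(result.stdout + result.stderr)


print(classify_probe("program 100003 version 2 ready and waiting"))
```

Calling probe("mahimahi", "nfs", 2) would repeat the check from the example; the classification is only as reliable as the assumed message text.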
 
13.3.3. Debugging RPC problems
In the previous examples, we used rpcinfo to see if a particular
service was registered or not. If the RPC
service is not registered, or if you can't reach the RPC server
daemon, it's likely there is a low-level problem in the
network. However, sometimes you reach an RPC server, but you find the
wrong one or it gives you the wrong answer. If you have a
heterogeneous environment and are running multiple versions of each
RPC service, it's possible to get RPC version number mismatch
errors.
These problems affect NIS and diskless client booting; they are best
sorted out by using rpcinfo to emulate an RPC call and by observing
server responses. Networks with multiple, heterogeneous servers may
produce multiple, conflicting responses to the same broadcast request.
Debugging problems that arise from this behavior often requires
knowing the order in which the responses are received.
Here's an example: we'll perform a broadcast and then
watch the order in which responses are received. When a diskless
client boots, it may receive several replies to a request for boot
parameters. The boot fails if the first reply contains incorrect or
invalid boot parameter information. rpcinfo -b sends a broadcast
request to the specified RPC program and version
number. The RPC program can either be specified in numeric (100026)
form, or in its name equivalent (bootparam):
% rpcinfo -b bootparam 1
fe80::a00:20ff:feb5:1fba.128.67           unknown
fe80::a00:20ff:feb9:2ad1.128.78           unknown
131.40.52.238.128.67                      mora
131.40.52.81.128.68                       kanawha
131.40.52.221.128.79                      holydev
Next Broadcast
% rpcinfo -b bootparam 1
131.40.52.81.128.68                       kanawha
fe80::a00:20ff:feb5:1fba.128.67           unknown
131.40.52.238.128.67                      mora
fe80::a00:20ff:feb9:2ad1.128.78           unknown
131.40.52.221.128.79                      holydev
Next Broadcast
In this example, a broadcast packet is sent to the boot parameter
server (bootparam). rpcinfo obtains the RPC program number (100026)
from /etc/rpc or the rpc.bynumber NIS map (depending on
/etc/nsswitch.conf). Any host that is running the boot parameter
server replies to the broadcast with the standard null procedure
"empty" reply. The universal address for the RPC service is printed by
the requesting host in the order in which replies are received from
these hosts (see the sidebar). After a short interval, another
broadcast is sent.
Universal addresses
A universal address identifies the location of a transport endpoint.
For UDP and TCP, it is composed of the dotted IP address with the
port number of the service appended. In this example, the host
kanawha has a universal address of 131.40.52.81.128.68.
The first four elements in the dotted string form the IP address of
the server kanawha:
% ypmatch 131.40.52.81 hosts.byaddr
131.40.52.81   kanawha
The last two elements, "128.68", are the high and low
octets of the port on which the service is registered (32836). This
number is obtained by multiplying the high octet value by 2^8 and
adding it to the low octet value:
  128 * 2^8 = 32768   (high octet)
            +    68   (low octet)
              -----
              32836   (decimal representation of port)
rpcinfo helps us verify that bootparam is indeed registered on port
32836:
% rpcinfo -p kanawha | grep bootparam
100026 1 udp 32836 bootparam
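The same arithmetic can be written as a small Python function. This is only a sketch for TCP and UDP universal addresses; the function name is hypothetical.

```python
def parse_universal_address(uaddr):
    """Split a TCP/UDP universal address into (ip_address, port)."""
    # The last two dotted fields are the port's high and low octets;
    # everything before them is the IPv4 or IPv6 address.
    host, high, low = uaddr.rsplit(".", 2)
    return host, int(high) * 2**8 + int(low)


for uaddr in ("131.40.52.81.128.68", "fe80::a00:20ff:feb5:1fba.128.67"):
    print(parse_universal_address(uaddr))
```

Splitting from the right means the same function handles the IPv6 link-local addresses that appear in the broadcast output.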
Server loading may cause the order of replies between successive
broadcasts to vary significantly. A busy server takes longer to
schedule the RPC server and process the request. Differing reply
sequences from RPC servers are not themselves indicative of a
problem, if the servers all return the correct information. If one or
more servers have incorrect information, though, you will see
irregular failures. A machine returning correct information may not
always be the first to deliver a response to a client broadcast, so
sometimes the client gets the wrong response.
In the last example (diskless client booting), a client that gets the
wrong response won't boot. The boot failures may be very
intermittent due to variations in server loading: when the server
returning an invalid reply is heavily loaded, the client will boot
without problem. However, when the servers with the correct
information are loaded, then the client gets an invalid set of boot
parameters and cannot start booting a kernel.
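To see how often the first responder changes, the broadcast output can be split into rounds. The Python sketch below assumes the output layout shown earlier, with a "Next Broadcast" line closing each round; the function name is hypothetical and the sample is a shortened copy of that output.

```python
def parse_broadcast_rounds(text):
    """Split `rpcinfo -b` output into rounds of (address, hostname) replies."""
    rounds, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue                      # skip blanks and shell prompts
        if line == "Next Broadcast":
            rounds.append(current)        # this marker ends one round
            current = []
            continue
        address, hostname = line.split()
        current.append((address, hostname))
    if current:
        rounds.append(current)
    return rounds


OUTPUT = """\
131.40.52.238.128.67                      mora
131.40.52.81.128.68                       kanawha
Next Broadcast
131.40.52.81.128.68                       kanawha
131.40.52.238.128.67                      mora
Next Broadcast
"""

rounds = parse_broadcast_rounds(OUTPUT)
first_responders = [r[0][1] for r in rounds if r]
print(first_responders)
```

If a host with bad boot parameters ever appears at the head of a round, a client booting at that moment would receive the bad reply.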
Binding to the wrong NIS server causes another kind of problem. A
renegade NIS server may be the first to answer a
ypbind broadcast for NIS service, and its lack
of information about the domain makes the client machine unusable.
Sometimes, just looking at the list of servers that respond to a
request may flag a problem, if you notice that one of the servers
should not be answering the broadcast:
% rpcinfo -b ypserv 1 
131.40.52.138.3.255      poi 
131.40.52.27.3.166       onaga 
131.40.52.28.3.163       mahimahi
In this example, all NIS servers on the local network answer the
rpcinfo broadcast request to the null procedure of the ypserv daemon.
If poi should not be an NIS server, then the
network will be prone to periods of intermittent failure if clients
bind to it. Failure to fully decommission a host as an NIS server
 --  leaving empty NIS map directories, for example  --  may
cause this problem.
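Checking the responders against a list of intended servers is easy to automate. In the Python sketch below, the expected list is a hypothetical example of what an administrator might maintain; the responders are the hosts from the output above.

```python
def unexpected_servers(responders, expected):
    """Return hosts that answered the broadcast but are not on the list."""
    return sorted(set(responders) - set(expected))


responders = ["poi", "onaga", "mahimahi"]   # hosts from the rpcinfo -b output
expected = ["onaga", "mahimahi"]            # hypothetical list of NIS servers

print(unexpected_servers(responders, expected))
```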
There's another possibility for NIS failure that rpcinfo cannot
detect: there may be NIS servers on the network, but no servers for
the client's NIS domain. In the previous example, poi may be a valid
NIS server in another domain, in which case it is operating properly
by responding to the rpcinfo broadcast. You might not be able to get
ypbind started on an NIS client because all of the servers are in the
wrong domain, and therefore the client's broadcasts are not answered.
The rpcinfo -b test is a little misleading because it doesn't ask the
NIS RPC daemons what domains they are serving, although the client's
requests will be domain-specific. Check the servers that reply to an
rpcinfo -b and ensure that they serve the NIS domain used by the
clients experiencing NIS failures.
If a client cannot find an NIS server, ypbind hangs the boot sequence
with errors of the form:
WARNING: Timed out waiting for NIS to come up
Using rpcinfo as shown helps to determine why a particular client
cannot start the NIS service: if no host replies to the rpcinfo
request, then the broadcast packet is failing to reach any NIS
servers. If the NIS domain name and the broadcast address are correct,
then it may be necessary to override the broadcast-based search and
hand ypbind the name and address of a valid NIS server. Tools for
examining and altering NIS bindings are the subject of the next
section.
 