It's helpful to know the default configuration when you're called upon to correct the configuration of someone else's system. In this section we examine the values set in the default configuration on a Solaris 8 system. (The default Solaris 8 configuration file is listed in Appendix F, "Solaris httpd.conf File".)
Here we focus on the directives that are actually used in the Solaris 8 configuration, and a few others that show important Apache features. There are some other directives that we don't discuss. If you need additional information about any directive, there are many places to look. The full httpd.conf file contains many comments, which explain the purpose of each directive and are an excellent source of information. The Apache web site (http://www.apache.org) provides online documentation. Two excellent books on Apache configuration are Apache: The Definitive Guide, by Ben and Peter Laurie (O'Reilly), and Linux Apache Web Server Administration, by Charles Aulds (Sybex). However, you'll probably find more information about the httpd.conf file than you need for an average configuration right here in this chapter.
The httpd.conf file that comes with Solaris has 160 active configuration lines. To tackle that much information, the following sections organize the configuration directives into different groups. Note that the configuration file itself organizes directives by scope: global environment directives, main server directives, and virtual host directives. (Virtual hosts are explained later in this chapter.) Although that organization is great for httpd when it is processing the file, it's not so great for a human reading the file. Here, related directives are grouped by function to make the individual directives more understandable. Once you understand the individual directives, you will understand the entire configuration.
We start our look at the httpd.conf file with the directives that load dynamically loadable modules. These modules must be loaded before the directives they provide can be used in the configuration, so it makes sense to discuss loading the modules before we discuss the features they provide. Understanding dynamically loadable modules is a good place to start understanding Apache configuration.
The two directives that appear most often in the Solaris httpd.conf file are LoadModule and AddModule. Together, they make up more than 60 of the 160 active lines in the httpd.conf file, and all of these lines configure the Dynamic Shared Object (DSO) modules used by the Apache web server.
Apache is composed of many software modules. Like kernel modules, Apache modules can be compiled into the server or loaded at runtime; the runtime-loadable ones are DSO modules. Running httpd with the -l command-line option lists all the modules compiled into Apache. The following example is from a Solaris 8 system:
$ /usr/apache/bin/httpd -l
Compiled-in modules:
  http_core.c
  mod_so.c
Some systems may have many modules compiled into the Apache daemon. Solaris and Red Hat systems are delivered with only the following two modules compiled in:
http_core.c
This is the core module. It is always statically linked into the Apache kernel, and it provides the basic functions that must be found in every Apache web server. This module is required; all other modules are optional.
mod_so.c
This module provides runtime support for Dynamic Shared Object modules. It is required if you plan to dynamically link other modules into Apache at runtime, so any configuration that loads modules through the httpd.conf file depends on it. Because mod_so is the module that does the loading, it cannot itself be loaded dynamically; it must be statically linked into the Apache kernel.
In addition to these statically linked modules, Solaris uses many dynamically loadable modules. The LoadModule and AddModule directives are used in the httpd.conf file to load DSOs. First, each module is identified by a LoadModule directive. For example, this line in the Solaris httpd.conf file identifies the module that tracks users through the use of cookies:
LoadModule usertrack_module /usr/apache/libexec/mod_usertrack.so
The LoadModule directive is followed by the module name and the path of the shared object file.
Before a module can be used, it must be added to the list of modules that are available to Apache. The first step in building the new module list is to clear the old one. This is done with the ClearModuleList directive. ClearModuleList has no arguments or options. It occurs in the httpd.conf file after the last LoadModule directive and before the first AddModule directive.
The AddModule directive adds a module name to the module list. The module list must include all optional modules, both those compiled into the server and those that are dynamically loaded. On our sample Solaris system, that means that there is one more AddModule directive in the httpd.conf file than there are LoadModule directives. The extra AddModule directive handles mod_so.c, which is the only optional module compiled into Apache on our sample system.[127]
[127]The http_core.c module is an integrated part of Apache. It is not installed with LoadModule and AddModule commands.
Mostly, however, LoadModule and AddModule directives occur in pairs: there is one AddModule directive for every LoadModule directive. For example, the following AddModule directive in the Solaris httpd.conf file adds the usertrack_module defined by the LoadModule directive shown previously to the module list:
AddModule mod_usertrack.c
The AddModule directive is followed by the name of the source file for the module being loaded. Notice that this is the name of the source file that produced the object module, not the module name seen in the LoadModule directive. This name is identical to the object filename except for the extension. In the LoadModule directive, which uses the shared object extension .so, the object filename is mod_usertrack.so. AddModule uses the source filename extension .c, so the module name is mod_usertrack.c.
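Because the AddModule name is derived mechanically from the LoadModule filename, the pairing is easy to check with a short script. The sketch below is a hypothetical helper, not a tool shipped with Apache or Solaris; it assumes the simple one-directive-per-line layout of the Solaris file with unquoted paths. Third-party modules are not guaranteed to follow the .so/.c naming pattern, so treat a mismatch as something to look into rather than as a definite error.

```python
import os


def addmodule_name(so_path):
    """Derive the AddModule name from a LoadModule shared-object path.

    /usr/apache/libexec/mod_usertrack.so -> mod_usertrack.c
    """
    stem, ext = os.path.splitext(os.path.basename(so_path))
    if ext != ".so":
        raise ValueError("expected a .so file: " + so_path)
    return stem + ".c"


def unpaired_modules(config_text):
    """Compare the LoadModule and AddModule lines in a config fragment.

    Returns (loaded but never added, added but never loaded). Commented-out
    lines are skipped because their first word starts with '#'.
    """
    loaded, added = set(), set()
    for line in config_text.splitlines():
        words = line.split()
        if len(words) == 3 and words[0] == "LoadModule":
            loaded.add(addmodule_name(words[2]))
        elif len(words) == 2 and words[0] == "AddModule":
            added.add(words[1])
    return sorted(loaded - added), sorted(added - loaded)


sample = """\
LoadModule usertrack_module /usr/apache/libexec/mod_usertrack.so
ClearModuleList
AddModule mod_so.c
AddModule mod_usertrack.c
"""
print(unpaired_modules(sample))  # ([], ['mod_so.c'])
```

On the sample fragment, the only "unpaired" name is mod_so.c, which is exactly the extra AddModule line expected for the compiled-in mod_so module.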
Table 11-1 lists all the modules referenced by AddModule directives in the Solaris 8 httpd.conf file.
| Module | Function |
|---|---|
| mod_access | Enables allow/deny type access controls. |
| mod_actions | Enables the use of user-defined handlers for specific MIME types or access methods. |
| mod_alias | Allows references to documents and scripts outside the document root. |
| mod_asis | Sends files that contain their own HTTP headers. |
| mod_auth | Enables user authentication with text password files. |
| mod_auth_anon | Enables anonymous logins. |
| mod_auth_dbm | Enables use of a DBM authentication file. |
| mod_autoindex | Enables automatic index generation. |
| mod_cern_meta | Enables compatibility with old CERN web servers. |
| mod_cgi | Enables execution of CGI programs. |
| mod_digest | Enables MD5 digest authentication. |
| mod_dir | Handles requests for directories: redirects URLs that lack a trailing slash and serves DirectoryIndex files. |
| mod_env | Controls which environment variables are passed to CGI scripts and server-side includes (SSI). |
| mod_expires | Sets the date for the Expires: header. |
| mod_headers | Enables customized response headers. |
| mod_imap | Processes image map files. |
| mod_include | Processes SSI files. |
| mod_info | Enables use of the server-info handler. |
| mod_log_config | Enables use of custom log formats. |
| mod_mime | Associates filename extensions with MIME types, encodings, languages, and handlers. |
| mod_mime_magic | Determines the MIME type of a file from its content. |
| mod_negotiation | Enables MIME content negotiation. |
| mod_perl | Provides support for the Perl language. |
| mod_proxy | Provides proxy and web caching services. |
| mod_rewrite | Enables rule-based URL rewriting and URI-to-filename mapping. |
| mod_setenvif | Sets environment variables from client information. |
| mod_so | Provides runtime support for dynamic shared objects (DSOs). |
| mod_speling | Automatically corrects minor spelling errors in URLs. |
| mod_status | Provides web-based access to the server-status report. |
| mod_unique_id | Generates a unique request identifier for each request. |
| mod_userdir | Defines where users can create public web pages. |
| mod_usertrack | Provides user tracking through a unique identifier stored in a cookie. |
| mod_vhost_alias | Provides support for dynamically configured mass virtual hosting. |
If you decide to add modules to your configuration, do so very carefully. The order of the LoadModule and AddModule directives in the httpd.conf file is critical. Don't change things without knowing what you're doing. Before proceeding with a new installation, read the documentation that comes with your new module and the module documentation found in the manual/mod directory of the Apache distribution. See the previously mentioned book Linux Apache Web Server Administration for detailed advice about adding new modules.
Once the DSOs are loaded, the directives that they provide can be used in the configuration file. Let's continue looking at the Solaris httpd.conf file by examining some of the basic configuration directives.
This section covers six different directives. The directives as they appear in the sample configuration we created for our Solaris system are:
ServerAdmin webmaster@www.wrotethebook.com
ServerName www.wrotethebook.com
UseCanonicalName On
ServerRoot "/var/apache"
ServerType standalone
Port 80
Two of the basic directives, ServerAdmin and ServerName, were touched upon earlier in the chapter. ServerAdmin defines the email address of the web server administrator. This is set to a bogus value, you@your.host, in the default Solaris configuration. You should change this to the full email address of the real web administrator before starting the server.
ServerName defines the hostname returned to clients when they read data from this server. In the default Solaris configuration, the ServerName directive is commented out, which means that the "real" hostname is sent to clients. Thus, if the name assigned to the first network interface is crab.wrotethebook.com, then that is the name sent to clients. Many Apache experts suggest defining an explicit value for ServerName in order to document your configuration and to ensure that you get exactly the value you want. Earlier, we set ServerName to www.wrotethebook.com, so that even though the web server is running on crab, the server will be known as www.wrotethebook.com during web interactions. Of course, www.wrotethebook.com must be a valid hostname configured in DNS. (See Chapter 8, "Configuring DNS", where www is defined as a nickname for crab in the wrotethebook.com zone file.)
A configuration directive related to ServerName is UseCanonicalName, which defines how httpd builds "self-referencing" URLs. A self-referencing URL contains the name of the server itself in the hostname portion of the URL. For example, on the server www.wrotethebook.com, a URL that starts with http://www.wrotethebook.com would be a self-referencing URL. The hostname in the URL should be a canonical name, which is a name that DNS can resolve to a valid IP address. When UseCanonicalName is set to on, as it is in the default Solaris configuration, the value in ServerName is used to identify the server in self-referencing URLs. For most configurations, leave it set to on. If it is set to off, the hostname and port that the client supplied in its request are used instead.
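The difference between the two settings shows up whenever httpd has to send a client a URL that names the server, such as the redirect it issues when a directory is requested without its trailing slash. The function below is a simplified model, not Apache's code; the function name and parameters are illustrative. It assumes, as the Apache documentation describes, that with UseCanonicalName off the server falls back to ServerName and Port when the client sends no Host: header.

```python
def self_referencing_url(path, server_name, port, use_canonical_name,
                         host_header=None):
    """Build a self-referencing URL (simplified model of UseCanonicalName).

    use_canonical_name=True:  use ServerName and Port from httpd.conf.
    use_canonical_name=False: use the Host: header the client sent, if any;
                              otherwise fall back to ServerName and Port.
    """
    if use_canonical_name or not host_header:
        # Omit the port from the URL when it is the standard HTTP port.
        host = server_name if port == 80 else "%s:%d" % (server_name, port)
    else:
        host = host_header  # e.g. "crab" or "crab:8080", exactly as sent
    return "http://" + host + path


# A user on the local network typed http://crab/books into a browser.
print(self_referencing_url("/books/", "www.wrotethebook.com", 80, True, "crab"))
print(self_referencing_url("/books/", "www.wrotethebook.com", 80, False, "crab"))
```

With the Solaris setting (on), the client is sent http://www.wrotethebook.com/books/; with off, it is sent http://crab/books/, a name that may not resolve for users outside the local domain.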
The ServerRoot option defines the directory that contains important files used by httpd, including error files, log files, and the three configuration files: httpd.conf, srm.conf, and access.conf. In the Solaris configuration, ServerRoot points to /var/apache. This is surprising in that the Solaris httpd configuration files are actually located in /etc/apache, so clearly something else is at work.
Solaris uses the -f option on the httpd command line to override the location of the httpd.conf file at runtime. httpd is started at boot time using the script /etc/init.d/apache. That script defines a variable named CONF_FILE that contains the value /etc/apache/httpd.conf. This variable is used with the httpd command that launches the web server, and it is this variable that defines the location of the configuration file on a Solaris system.
The ServerType option defines how the server is started. If the server starts from a startup script at boot time, the option is set to standalone. If the server is run on demand by inetd, the option is set to inetd. The default Solaris configuration sets ServerType to standalone, which is the best value; web servers are usually in high demand, so it is best to start them at boot time. It is possible, of course, for a user to set up a small, rarely used web site on a desktop workstation, in which case running the server from inetd may be desirable. But the web server you create for your network should be standalone.
Port defines the TCP port number used by the server. The standard port number is 80. On occasion, private web servers run on other port numbers. For example, Solaris runs the AnswerBook2 server on port 8888. Other popular alternative ports for special-purpose web sites are 8080 and 8000. If you change the port number, you must then tell your users the nonstandard port number. For example, http://jerboas.wrotethebook.com:8080 is a URL for a web site running on TCP port 8080 on host jerboas.wrotethebook.com.
When ServerType is set to inetd, it is usually desirable to set Port to something other than 80. The reason for this is that the ports under 1024 are "privileged" ports. If 80 is used, httpd must be run from inetd with the userid root. This is a potential security problem, as an intruder might be able to exploit the web site to get root access. Using port 80 is okay when ServerType is standalone because the initial httpd process does not provide direct client service. Instead it starts several other HTTP daemons, called the swarm, to provide client services. The daemons in the swarm do not run with root privilege.
In the original web server design, the server would create separate processes to handle individual requests. This placed a heavy load on the CPU when the server was busy and had a major negative impact on responsiveness. It was possible for the entire system to be overwhelmed by httpd processes.
Apache uses a different approach. A swarm of server processes starts at boot time (the ps command earlier in the chapter shows several httpd processes running on the Solaris system), and all the processes in the swarm share the workload. If all the persistent httpd processes become busy, spare processes are started to share the work. Five directives in the Apache configuration control how the swarm of server child processes is managed. They are:
MinSpareServers
This directive sets the minimum number of idle server processes that must be maintained. In the Solaris configuration, this is set to 5, which is the default value used in the Apache distribution. When the number of idle processes drops below 5, another process is created to maintain the correct number of idle processes. Five is a good value for an average server; it allows a burst of up to five quick requests to be handled without making the client wait for a child process to start. A lightly used server might have a lower number, and a heavily used server could benefit from a higher number. However, you don't want too many idle servers waiting around for requests that may never come.
MaxSpareServers
This directive sets the maximum number of idle server processes that may be maintained. It prevents too many idle servers from sitting around with nothing to do. If the number of idle servers exceeds MaxSpareServers, the excess idle servers are killed. In the Solaris configuration, MaxSpareServers is set to 10, which is the default value that ships with the Apache distribution. Set this value to about twice the value set for MinSpareServers.
StartServers
This directive defines the number of httpd daemons started at boot time. In the Solaris configuration, it is set to 5. The effect of this directive can be seen in the output of the ps command earlier in this chapter, which showed that six httpd daemons were running. One of these is the parent process that manages the swarm; the other five are the child processes that actually handle client requests for data.
MaxClients
This directive sets the maximum number of client connections that can be serviced simultaneously. HTTP connection requests beyond the number set by MaxClients are not serviced immediately; they are queued (up to the limit set by the ListenBacklog directive) until a child process becomes free. Solaris sets MaxClients to 150, which is the most commonly used value. MaxClients prevents the server from consuming all system resources when it receives an overwhelming number of client requests. MaxClients should be increased only if you have an extremely powerful system with fast disks and a large amount of memory. It is generally best to handle additional clients by adding additional servers. The upper limit for MaxClients is set by HARD_SERVER_LIMIT, which is compiled into Apache. The default for HARD_SERVER_LIMIT is 256.
MaxRequestsPerChild
This directive defines the number of client requests a child process can handle before it must terminate. Solaris sets MaxRequestsPerChild to 0, which means "unlimited": a child process can keep handling client requests for as long as the system is up and running. This directive should always be set to 0, unless you know for a fact that the library you used to compile Apache has a memory leak.
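The interaction of MinSpareServers, MaxSpareServers, and MaxClients can be modeled as a single decision the parent process makes while watching its children. The function below is a deliberately simplified sketch with a made-up name: real httpd ramps up spawning over several seconds and retires surplus children gradually, whereas this model does the whole adjustment in one step. The defaults are the Solaris values described above.

```python
def maintenance_pass(idle, busy, min_spare=5, max_spare=10, max_clients=150):
    """One simplified pass of the parent httpd's idle-server check.

    Returns the number of children to start (positive), to kill (negative),
    or 0. Defaults are the Solaris MinSpareServers, MaxSpareServers, and
    MaxClients values.
    """
    total = idle + busy
    if idle < min_spare:
        # Top the idle pool back up, but never let the swarm exceed MaxClients.
        return max(0, min(min_spare - idle, max_clients - total))
    if idle > max_spare:
        # Retire the surplus idle children.
        return -(idle - max_spare)
    return 0


print(maintenance_pass(idle=5, busy=0))    # StartServers 5 at boot: nothing to do
print(maintenance_pass(idle=2, busy=20))   # busy period: start 3 more
print(maintenance_pass(idle=14, busy=1))   # burst is over: kill 4
```

Near the MaxClients ceiling the model starts fewer children than MinSpareServers asks for; with 148 children already running, only 2 more can be started.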
The User and Group directives define the UID and GID under which the swarm of httpd processes are run. When httpd starts at boot time, it runs as a root process, binds to port 80, and then starts a group of child processes that provide the actual web services. These child processes are the ones given the UID and GID defined in httpd.conf. The UID and GID should provide the least possible system privileges to the web server. On the Solaris system, this is the user nobody and the group nobody. The previous ps command output shows this clearly. One httpd process belongs to root and five other httpd processes belong to the user nobody. An alternative to using nobody is to create a userid and groupid just for httpd. If you do this, set the file permissions granted to the new user account very carefully. The advantage of creating a special user and group for httpd is that you can use group permissions for added protection, and you won't be completely dependent on the world permissions granted to nobody.
The DocumentRoot directive defines the directory that contains the web server documents. For security reasons, this is not the same directory that holds the configuration files. As we saw earlier, the Solaris setting for DocumentRoot is:
DocumentRoot "/var/apache/htdocs"
To apply directives to a specific directory, create a container for those directives. Three of the httpd.conf directives used to create containers are:
<Directory pathname>
The Directory directive creates a container for directives that apply to the directory identified by pathname and, unless a more specific container overrides them, to its subdirectories. Any configuration directives that occur after the Directory directive and before the next </Directory> statement apply only to that directory tree.
<Location document>
The Location directive creates a container for directives that apply to a specific document, identified by its URL rather than by a filesystem path. Any configuration directives that occur after the Location directive and before the next </Location> statement apply only to the specified document.
<Files filename>
The Files directive creates a container for directives that apply to the file identified by filename. Any configuration directives that occur after the Files directive and before the next </Files> statement apply only to the specified file. filename can refer to more than one file if it contains the Unix wildcard characters * or ?. Additionally, if filename is preceded by an optional ~ (tilde), it is interpreted as a regular expression.
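The two ways a Files container can name its files (shell wildcards, or a regular expression after ~) can be illustrated with Python's fnmatch and re modules. This is an approximation of Apache's matching, not its implementation; the function name and the sample filenames are hypothetical. The sketch compares only the final component of the path and treats the regular expression as an unanchored search, so anchors such as $ must be written explicitly.

```python
import fnmatch
import re


def files_container_matches(pattern, filename, regex=False):
    """Would a <Files pattern> (or <Files ~ "pattern">) container apply?

    Only the last component of the path is compared in this sketch.
    """
    name = filename.rsplit("/", 1)[-1]
    if regex:
        # <Files ~ "..."> : an unanchored regular-expression search.
        return re.search(pattern, name) is not None
    # Plain <Files ...>: case-sensitive shell-style wildcards such as * and ?.
    return fnmatch.fnmatchcase(name, pattern)


print(files_container_matches("*.cgi", "/var/apache/cgi-bin/test.cgi"))          # True
print(files_container_matches(r"\.(gif|jpe?g)$", "logo.jpeg", regex=True))       # True
```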
Directories and files are easy to understand: they are parts of the Unix filesystem that every system administrator knows. Documents, on the other hand, are specific to the web server. The screenful of information that appears in response to a web query is a document; it can be made up of many files from different directories. The Location container provides an easy way to refer to a complex document as a single entity. We will see examples of Location and Files containers later in this chapter. Here we look at Directory containers.
The Solaris configuration defines a Directory container for the server's root directory and for the DocumentRoot:
<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>

<Directory "/var/apache/htdocs">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
Each Directory container starts with a Directory directive and ends with a </Directory> tag. Both containers shown here enclose configuration statements that apply to a single directory and, unless a more specific container overrides them, to the subdirectories beneath it. The purpose of the directives inside these containers is covered later in Section 11.4, "Web Server Security". For now, it is sufficient to understand that containers are used inside the httpd.conf file to limit the scope of various configuration directives.
The Alias directive and the ScriptAlias directive both map a URL path to a directory on the server. For example, the Solaris configuration contains the following three directives:
Alias /icons/ "/var/apache/icons/"
Alias /manuals/ "/usr/apache/htdocs/manual/"
ScriptAlias /cgi-bin/ "/var/apache/cgi-bin/"
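The first line maps the URL path /icons/ to the directory /var/apache/icons/. Thus a request for http://www.wrotethebook.com/icons/ is served from the directory /var/apache/icons/ instead of from a subdirectory of the DocumentRoot. The second directive maps the URL path /manuals/ to the directory /usr/apache/htdocs/manual/, and the ScriptAlias directive maps the URL path /cgi-bin/ to the directory /var/apache/cgi-bin/.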
You may have several Alias directives to handle several different mappings, but you will usually have only one ScriptAlias directive. The ScriptAlias directive functions in exactly the same way as the Alias directive, except that the directory it points to contains executable CGI programs. Therefore, httpd treats the files in this directory as programs to be executed rather than as documents to be sent to the client. ScriptAlias is particularly important because it allows you to maintain executable web scripts in a directory separate from the DocumentRoot. CGI scripts are the single biggest security threat to your server; maintaining them separately allows you to have tighter control over who has access to the scripts.
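The effect of the three Solaris Alias and ScriptAlias lines can be sketched as a prefix table consulted before the DocumentRoot. The names ALIASES, DOCUMENT_ROOT, and map_url_path are hypothetical; the sketch assumes the first matching prefix wins and, like the directives shown above, requires the trailing slash, so a request for /icons without the slash would not match the /icons/ alias.

```python
# (URL prefix, directory, run as CGI?) taken from the Solaris Alias/ScriptAlias lines
ALIASES = [
    ("/icons/", "/var/apache/icons/", False),
    ("/manuals/", "/usr/apache/htdocs/manual/", False),
    ("/cgi-bin/", "/var/apache/cgi-bin/", True),
]
DOCUMENT_ROOT = "/var/apache/htdocs"


def map_url_path(url_path):
    """Return (filesystem path, run_as_cgi) for the path part of a URL."""
    for prefix, directory, is_script in ALIASES:
        if url_path.startswith(prefix):
            # Replace the URL prefix with the aliased directory.
            return directory + url_path[len(prefix):], is_script
    # No alias matched: the path is looked up under DocumentRoot.
    return DOCUMENT_ROOT + url_path, False


print(map_url_path("/icons/folder.gif"))   # ('/var/apache/icons/folder.gif', False)
print(map_url_path("/cgi-bin/search"))     # ('/var/apache/cgi-bin/search', True)
```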
The Solaris configuration has containers for the /var/apache/icons directory and the /var/apache/cgi-bin directory, but none for the /usr/apache/htdocs/manual directory. Just because a directory is defined inside the httpd.conf file does not mean that a Directory container must be created for that directory. The /var/apache/icons and the /var/apache/cgi-bin containers are shown here:
<Directory "/var/apache/icons">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

<Directory "/var/apache/cgi-bin">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
These containers enclose AllowOverride, Options, Order, and Allow statements -- all of which relate to security. Most of the directives found in containers have security implications, and have been placed in containers to provide special security settings for a file, document, or directory. All of the directives used in the containers shown above are covered in Section 11.4, "Web Server Security" later in this chapter.
The UserDir directive enables personal user web pages and points to the directory that contains the user pages. UserDir usually points to public_html, and it does in the Solaris configuration. With this default setting, users create a directory named public_html in their home directories to hold their personal web pages. When a request comes in for www.wrotethebook.com/~sara, for example, it is served from the directory /export/home/sara/public_html. An alternative is to define a full pathname on the UserDir directive line, such as /export/home/userpages. Then the administrator creates the directory and allows each user to store personal pages in subdirectories of this directory, so that a request for www.wrotethebook.com/~sara is served from /export/home/userpages/sara. The advantage of this approach is that it makes it easier for you to monitor the content of user pages. The disadvantage is that a separate user web directory tree must be created and protected separately, whereas a web folder within the user's home directory inherits the protection of that user's home directory.
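Both UserDir styles can be expressed in one small mapping function. This is a sketch under stated assumptions: the function name is hypothetical, and the home_dirs dictionary stands in for the password-file lookup that httpd performs for a relative UserDir value.

```python
def userdir_path(url_path, userdir, home_dirs):
    """Map /~user/rest to a filesystem path according to a UserDir setting.

    userdir:   "public_html" (relative: inside each home directory) or an
               absolute path such as "/export/home/userpages".
    home_dirs: username -> home directory; a stand-in for the password file.
    Returns None when the URL is not a ~user URL or the user is unknown.
    """
    if not url_path.startswith("/~"):
        return None
    user, _, rest = url_path[2:].partition("/")
    if userdir.startswith("/"):
        # Absolute UserDir: one subdirectory per user under a shared tree.
        base = userdir.rstrip("/") + "/" + user
    else:
        if user not in home_dirs:
            return None
        # Relative UserDir: a directory inside the user's home directory.
        base = home_dirs[user] + "/" + userdir
    return base + "/" + rest


homes = {"sara": "/export/home/sara"}
print(userdir_path("/~sara/index.html", "public_html", homes))
print(userdir_path("/~sara/index.html", "/export/home/userpages", homes))
```

The first call yields /export/home/sara/public_html/index.html and the second /export/home/userpages/sara/index.html, matching the two layouts described above.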
The PidFile and ScoreBoardFile directives define the paths of files that relate to process status. The PidFile is the file in which httpd stores its process ID, and the ScoreBoardFile is the file where httpd writes process status information.
The DirectoryIndex option defines the name of the file retrieved if the client's request does not include a filename. Our Solaris system has the following value for this option:
DirectoryIndex index.html
Given the value defined for DocumentRoot and this value, if the server gets a request for http://www.wrotethebook.com, it gives the client the file /var/apache/htdocs/index.html. If it gets a request for http://www.wrotethebook.com/books/, it gives the client the file /var/apache/htdocs/books/index.html. The DocumentRoot is prepended to every request, and the DirectoryIndex is appended to any request that doesn't end in a filename.
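The prepend-and-append rule can be written out directly. The sketch below ignores aliases and the redirect httpd sends for a directory named without a trailing slash; resolve_document is a hypothetical name, and the exists parameter lets the logic be exercised without real files. DirectoryIndex may list several filenames, which are tried in order, so the sketch accepts a tuple.

```python
import os


def resolve_document(url_path, document_root="/var/apache/htdocs",
                     directory_index=("index.html",), exists=os.path.exists):
    """Return the file httpd would send for url_path, or None when no index
    file exists and httpd must fall back to a directory listing (or an error)."""
    url_path = url_path or "/"  # http://www.wrotethebook.com means "/"
    path = document_root.rstrip("/") + url_path  # DocumentRoot is prepended
    if not url_path.endswith("/"):
        return path  # the request already names a file
    for index in directory_index:  # DirectoryIndex is appended, in order
        candidate = path + index
        if exists(candidate):
            return candidate
    return None


present = {"/var/apache/htdocs/index.html", "/var/apache/htdocs/books/index.html"}
print(resolve_document("/books/", exists=present.__contains__))
```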
Earlier in this chapter, we saw from an ls of /var/apache/htdocs that the directory contains a file named index.html. But what if it didn't? What would Apache send to the client? If the file index.html is not found in the directory, httpd sends the client a listing of the directory, if the configuration permits it. A directory listing is allowed if the Options directive in the Directory container for the directory contains the keyword Indexes. (More on Options later.) If a directory index is allowed, several different directives control how that directory listing is formatted.
The keyword FancyIndexing is used on the IndexOptions directive line to enable a "fancy index" of the directory when Apache is forced to send the client a directory listing. When fancy indexing is enabled, httpd creates a directory list that includes graphics, links, and other advanced features. The Solaris configuration enables fancy indexing with the IndexOptions directive, and it contains about 20 extra lines to help configure the fancy index. Solaris uses the following directives to define the graphics and features used in the fancy directory listing:
IndexIgnore
Identifies the files that should not be included in the directory listing. Files can be specified by name, partial name, extension, or by standard wildcard characters.
HeaderName
Specifies the name of a file that contains information to be displayed at the top of the directory listing.
ReadmeName
Specifies the name of a file that contains information to be displayed at the bottom of the directory listing.
AddIconByEncoding
Points to the icon used to represent a file based on its MIME encoding type.
AddIconByType
Points to the icon used to represent a file based on its MIME file type.
AddIcon
Points to the icon used to represent a file based on its extension.
DefaultIcon
Points to the icon file used to represent a file that has not been given an icon by any other option.
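Two of these directives, IndexIgnore and AddIcon with its DefaultIcon fallback, amount to simple filename matching. The sketch below is illustrative only: the function names, the sample patterns (.??* for dot files, *~ for editor backups), and the icon paths are hypothetical rather than copied from the Solaris file, and real AddIcon entries can also match whole names and special keywords, which this sketch does not handle. The icon list is ordered because the first matching entry is used.

```python
import fnmatch


def listing_entries(filenames, ignore_patterns):
    """Drop names matched by any IndexIgnore-style pattern; keep the rest in order."""
    return [name for name in filenames
            if not any(fnmatch.fnmatchcase(name, pat) for pat in ignore_patterns)]


def icon_for(filename, icons_by_ext, default_icon):
    """Pick an icon by extension (AddIcon-style), else fall back to DefaultIcon."""
    for ext, icon in icons_by_ext:  # first matching entry wins
        if filename.endswith(ext):
            return icon
    return default_icon


names = ["index.html", ".htaccess", "notes.txt~", "README", "data.tar.gz"]
icons = [(".gz", "/icons/compressed.gif"), (".txt", "/icons/text.gif")]
for name in listing_entries(names, [".??*", "*~", "README*"]):
    print(name, icon_for(name, icons, "/icons/unknown.gif"))
```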
MIME file types and file extensions play a major role in helping the server determine how a file should be handled. Specifying MIME options is also a major part of the Solaris httpd.conf file. The directives involved are:
DefaultType
Defines the MIME type that is used when the server cannot determine the type of a file. In the Solaris configuration this is set to text/plain. Thus, when the server cannot determine a file's type from its extension, it assumes the file is plain text.
AddEncoding
Maps a MIME encoding type to a file extension. The Solaris configuration contains two AddEncoding directives:
AddEncoding x-compress Z
AddEncoding x-gzip gz tgz
The first directive maps the extension Z to the MIME encoding type x-compress. The second line maps the extensions gz and tgz to MIME encoding type x-gzip.
AddLanguage
Maps a MIME language type to a file extension. The Solaris configuration contains mappings for six languages, e.g., .en for English and .fr for French.
LanguagePriority
Sets the precedence of languages during MultiViews content negotiation. When several language versions of a document exist, this order is used to choose among them when the client does not state a preference or rates the available languages equally. In the Solaris configuration, the priority is English (en), French (fr), and German (de), so a client that specifies no language preference is sent the English version if one exists.
AddType
Maps a MIME file type to a file extension. The Solaris configuration has only one AddType directive; it maps MIME type application/x-tar to the extension .tgz. A configuration can have several AddType directives.
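The way these directives combine to label a file can be sketched as a lookup over the file's extensions. The AddType, AddEncoding, and DefaultType values below come from the Solaris configuration; the MIME_TYPES dictionary is a small hypothetical excerpt standing in for the server's mime.types file. The sketch assumes, as mod_mime does, that a file may carry several extensions and that AddType takes precedence over mime.types; it compares extensions case-sensitively for simplicity.

```python
# From the Solaris httpd.conf (AddType, AddEncoding, DefaultType).
ADD_TYPE = {"tgz": "application/x-tar"}
ADD_ENCODING = {"Z": "x-compress", "gz": "x-gzip", "tgz": "x-gzip"}
DEFAULT_TYPE = "text/plain"
# Hypothetical excerpt of mime.types.
MIME_TYPES = {"html": "text/html", "gif": "image/gif", "tar": "application/x-tar"}


def content_headers(filename):
    """Return (Content-Type, Content-Encoding or None) for filename."""
    content_type, encoding = None, None
    # Examine every extension after the first dot, left to right.
    for ext in filename.split(".")[1:]:
        if ext in ADD_TYPE:
            content_type = ADD_TYPE[ext]  # AddType overrides mime.types
        elif ext in MIME_TYPES:
            content_type = MIME_TYPES[ext]
        if ext in ADD_ENCODING:
            encoding = ADD_ENCODING[ext]
    return content_type or DEFAULT_TYPE, encoding


print(content_headers("backup.tar.gz"))  # ('application/x-tar', 'x-gzip')
print(content_headers("README"))         # ('text/plain', None)
```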
Another directive that is commonly used to process files based on the filename extension is the AddHandler directive. This directive maps a file handler to a file extension. A file handler is a program that knows how to process a specific file type. For example, the handler cgi-script is able to execute CGI files. The Solaris configuration does not define any optional handlers, so all the AddHandler directives are commented out.
The KeepAlive directive enables the use of persistent connections. Without persistent connections, the client must make a new connection to the server for every document or image it requests. Because HTTP runs over TCP, every connection requires a connection setup, adding time to every file retrieval. With persistent connections, the server waits to see if the client has additional requests before it closes the connection. Therefore, the client does not need to create a new connection to request a new document. The KeepAliveTimeout directive defines the number of seconds the server holds a persistent connection open waiting to see if the client has additional requests. The Solaris configuration turns KeepAlive on and sets KeepAliveTimeout to 15 seconds.
MaxKeepAliveRequests defines the maximum number of requests that will be accepted on a "kept-alive" connection before a new TCP connection is required. Solaris sets this value to 100, which is the Apache default. Setting MaxKeepAliveRequests to 0 allows unlimited requests. 100 is a good value for this parameter: few users request 100 document transfers, so the value essentially creates a persistent connection for all reasonable cases. If the client does request more than 100 document transfers, it might indicate a problem with the client system, so requiring another connection request is probably a good idea.
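The savings from persistent connections, and the effect of MaxKeepAliveRequests, come down to simple arithmetic. The sketch below uses a hypothetical function name and assumes the client supports persistent connections and never pauses longer than KeepAliveTimeout between requests.

```python
import math


def tcp_connections_needed(requests, keepalive=True, max_keepalive_requests=100):
    """TCP connections a client needs for `requests` back-to-back requests."""
    if requests <= 0:
        return 0
    if not keepalive:
        return requests  # KeepAlive off: one connection per request
    if max_keepalive_requests == 0:
        return 1  # 0 means unlimited requests per connection
    # Each connection carries at most MaxKeepAliveRequests requests.
    return math.ceil(requests / max_keepalive_requests)


# A page with eleven inline images is 12 requests.
print(tcp_connections_needed(12, keepalive=False))  # 12
print(tcp_connections_needed(12))                   # 1
print(tcp_connections_needed(250))                  # 3
```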
Timeout defines the number of seconds the server waits for activity on a connection before giving up: for example, the time it allows for a client's request to arrive, or between acknowledgments while a response is being sent. The value needs to be large enough to accommodate clients on slow modem connections. But if it is set too high, the server will hold open connections for clients that may have gone offline. The Solaris configuration has the Timeout set to 5 minutes (300 seconds), which is a very common setting.
BrowserMatch is a different type of tuning parameter: it reduces performance for compatibility's sake. The Solaris configuration contains the following five BrowserMatch directives:
BrowserMatch "Mozilla/2" nokeepalive
BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
BrowserMatch "RealPlayer 4\.0" force-response-1.0
BrowserMatch "Java/1\.0" force-response-1.0
BrowserMatch "JDK/1\.0" force-response-1.0
The BrowserMatch statements are used to present information in ways that are compatible with the capabilities of different web browsers. For example, a browser may be able to handle only HTTP 1.0, not HTTP 1.1. In this case, downgrade-1.0 is used on the BrowserMatch line to ensure that the server uses only HTTP 1.0 when dealing with that browser.
In the Solaris configuration, keepalives are disabled for two browsers. One browser is offered only HTTP 1.0 during the connection, and responses are formatted to be compatible with HTTP 1.0 for four different browsers.
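Each BrowserMatch line is a regular expression tested against the client's User-Agent header; every line that matches sets its variables. The sketch below reproduces the five Solaris rules with Python's re module, assuming the unanchored, case-sensitive search that BrowserMatch performs (BrowserMatchNoCase is the case-insensitive variant). The function name and the sample User-Agent strings are hypothetical.

```python
import re

# The five BrowserMatch lines from the Solaris httpd.conf: (regex, variables to set)
BROWSER_MATCH = [
    (r"Mozilla/2", ["nokeepalive"]),
    (r"MSIE 4\.0b2;", ["nokeepalive", "downgrade-1.0", "force-response-1.0"]),
    (r"RealPlayer 4\.0", ["force-response-1.0"]),
    (r"Java/1\.0", ["force-response-1.0"]),
    (r"JDK/1\.0", ["force-response-1.0"]),
]


def browser_env(user_agent):
    """Return the set of environment variables the BrowserMatch lines would set."""
    env = set()
    for pattern, variables in BROWSER_MATCH:
        # Unanchored, case-sensitive search; every matching line applies.
        if re.search(pattern, user_agent):
            env.update(variables)
    return env


print(sorted(browser_env("Mozilla/4.0 (compatible; MSIE 4.0b2; Windows 95)")))
```

For that beta Internet Explorer string, the result is downgrade-1.0, force-response-1.0, and nokeepalive; a current browser matches none of the rules and is served normally.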
Don't fiddle with the BrowserMatch directives. These settings are shipped as defaults in the Apache distribution, and are set to handle the limitations of different browsers. These are tuning parameters, but they are used by the Apache developers to adjust to the limitations of older browsers.
HostnameLookups tells httpd whether or not it should log hostnames as well as IP addresses. The advantage of enabling hostname logging is that you get a more readable log. The disadvantage is that httpd has the added overhead of DNS name lookups. Setting this to off, as in the Solaris configuration, enhances server performance. The HostnameLookups directive affects what is logged, but its major impact is on system performance, which is why we cover it under tuning parameters instead of logging directives.
Log files provide a great deal of information about the web server. The following seven lines define the Apache logging configuration in the default Solaris 8 httpd.conf file:
ErrorLog /var/apache/logs/error_log
LogLevel warn
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
CustomLog /var/apache/logs/access_log common
ErrorLog defines the path of the error log file. Use the error log to track and correct failures. You should review the log at least once a day to check for problems. To keep a close eye on the file while you're logged in, use the tail command with the -f option:
$ tail -n 1 -f /var/apache/logs/error_log
The tail command prints the tail end of a file; in the example, the file is /var/apache/logs/error_log, the path set by the ErrorLog directive. The -n option is the lines option: it tells tail how many lines from the end of the file to print, so -n 1 directs tail to print only the last line in the file. The -f option keeps the tail process running so that you see each record as it is written to the file. This allows you to monitor the file in real time.
The LogLevel directive defines the type of events written to the error log. The Solaris configuration sets LogLevel to warn, which specifies that warnings and other more critical errors are to be written to the log. This is a safe setting for an error log because it logs a wide variety of operational errors. LogLevel has eight possible settings: debug, info, notice, warn, error, crit, alert, and emerg. The log levels are cumulative. For example, warn provides warnings, errors, critical messages, alerts, and emergency messages; debug provides all types of logging, which causes the file to grow at a very rapid rate; emerg keeps the file small but notifies you only of disasters. warn is a good compromise between not enough detail and too much detail.
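Because the levels are cumulative, a LogLevel setting acts as a threshold. The following Python sketch (hypothetical names logged_levels and is_logged) models that threshold using the eight levels listed above, ordered from least to most severe; it illustrates the rule and is not code taken from Apache.

```python
# LogLevel values ordered from least severe to most severe.
LEVELS = ["debug", "info", "notice", "warn", "error", "crit", "alert", "emerg"]


def logged_levels(setting: str) -> list[str]:
    """Return the message levels written to the error log for a LogLevel setting.

    A setting logs its own level plus every more severe level.
    """
    return LEVELS[LEVELS.index(setting):]


def is_logged(setting: str, message_level: str) -> bool:
    """True if a message at message_level passes the LogLevel threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(setting)
```

With the Solaris setting, logged_levels("warn") yields warn, error, crit, alert, and emerg, while a notice message is discarded.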
Just as important as reporting errors, the logs provide information about who is using the server, how much it is being used, and how well it is servicing the users. Web servers are used to distribute information; if no one wants or uses the information, you need to know it. The LogFormat and CustomLog directives do not configure the error log, but rather how server activity is logged.
The LogFormat directives define the format of log file entries. A LogFormat directive contains two things: the layout of a file entry and a label used in the httpd.conf file to identify the log entry. The layout of the entry is placed directly after the LogFormat keyword and is enclosed in quotes. The layout is defined using literals and variables.
Examining a sample LogFormat directive shows how the variables are used. The basic Apache log file conforms to the Common Log Format (CLF). CLF is a de facto standard supported by nearly all web servers, and using this format means that the logs generated by Apache servers can be processed by any log analysis tool that understands the standard. The format of a standard CLF entry is clearly defined by the second LogFormat directive in the Solaris httpd.conf file:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
This LogFormat directive specifies exactly the information required for a CLF log entry. It does this using seven different LogFormat variables:
%h
Logs the IP address of the client. If HostnameLookups is set to on, this is the client's fully qualified hostname. On the sample Solaris system, this would be the client's IP address because HostnameLookups is turned off to enhance server performance.
%l
Logs the username used to log in to the client, if available. The name is retrieved using the identd protocol; however, most clients do not run identd and thus do not provide this information. Therefore, this field usually contains a hyphen to indicate a missing value. Likewise, if the server does not have a value for any field, the log contains a hyphen in that field.
%u
Logs the username used to access a password-protected web page. This should match a name defined in the password file named by AuthUserFile or in the database named by AuthDBMUserFile. (AuthUserFile and AuthDBMUserFile are covered in Section 11.4, "Web Server Security," of this chapter.) Most documents are not password protected, and therefore this field contains a hyphen in most log entries.
%t
Logs the date and time the request was received.
\"%r\"
Logs the first line of the client's request. This often contains the URL of the requested document. The \" characters in the LogFormat directive indicate that quotes should be inserted in the output. In the log file, the client's request will be enclosed in quotes.
%>s
Logs the final status of the request: the three-digit response code that the server returned to the client. (The > selects the status of the last request when the server has internally redirected the original one.)
%b
Logs the number of bytes contained in the document sent to the client.
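The seven variables map directly onto the fields of a log line. As a sketch of how a log-analysis tool might read CLF entries, the Python function below splits one entry into named fields and turns hyphens into None. The regular expression, the function name parse_clf, and the sample entries are assumptions for illustration; a request line containing embedded quotes is not handled.

```python
import re

# One capture group per CLF variable: %h %l %u %t "%r" %>s %b
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)


def parse_clf(line: str) -> dict:
    """Split one Common Log Format entry into its seven fields.

    A hyphen means "no value" (for example, %l when identd is not
    running) and is returned as None.
    """
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a CLF entry: {line!r}")
    fields = {k: (None if v == "-" else v) for k, v in match.groupdict().items()}
    fields["status"] = int(fields["status"])  # always three digits
    if fields["bytes"] is not None:  # "-" when no body was sent
        fields["bytes"] = int(fields["bytes"])
    return fields
```

Because re.match anchors only at the start of the line, the same pattern also reads the CLF portion of a combined entry and ignores the trailing Referer and User-Agent fields.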
Apache log entries are not limited to the CLF format. The LogFormat directive lets you define what information is logged. A wide variety of information can be logged.
The Solaris configuration contains three additional LogFormat directives that demonstrate some optional log formats. The three directives are:
LogFormat "%{User-agent}i" agent
LogFormat "%{Referer}i -> %U" referer
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
All of these directives log the contents of HTTP headers. For example, the first directive logs the value received from the client in the User-agent header. User-agent is the user program that generates the document request; generally this is the name of a browser. The format that logs the header is:
%{User-agent}i
This format works for any header: simply replace User-agent with the name of the header. The i indicates that this is an input header; output headers are indicated by an o. Apache can log the contents of any header records received or sent.
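Here is a minimal Python sketch of how %{Header}i and %{Header}o tokens can be expanded: request headers fill i tokens, response headers fill o tokens, names are compared without regard to case (HTTP header names are case-insensitive, which is why the Solaris file can spell the same header User-agent and User-Agent), and a missing header becomes a hyphen. The function expand_headers is hypothetical, and it leaves every other % variable untouched.

```python
import re

# %{Header-Name}i is a request (input) header; %{Header-Name}o is a response (output) header.
HEADER_TOKEN = re.compile(r"%\{([^}]+)\}([io])")


def expand_headers(fmt: str, request_headers: dict, response_headers: dict) -> str:
    """Replace %{...}i and %{...}o tokens in a LogFormat layout with header values."""

    def lookup(headers: dict, name: str) -> str:
        # Header names are matched case-insensitively.
        for key, value in headers.items():
            if key.lower() == name.lower():
                return value
        return "-"  # a missing header is logged as a hyphen

    def replace(match) -> str:
        name, direction = match.group(1), match.group(2)
        headers = request_headers if direction == "i" else response_headers
        return lookup(headers, name)

    return HEADER_TOKEN.sub(replace, fmt)
```

For example, expand_headers("%{User-agent}i", {"User-Agent": "Mozilla/4.7 [en]"}, {}) returns "Mozilla/4.7 [en]" even though the capitalization differs.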
The second LogFormat directive logs the contents of the Referer header received from the client (%{Referer}i), the literal characters dash and greater-than sign (->), and the requested URL (%U). Referer contains the URL of the remote page that referred the client to your web site; %U is the document on your server to which that page referred the client.
The last LogFormat directive starts with the CLF (%h %l %u %t \"%r\" %>s %b) and adds to that the values from the Referer header and the User-agent header. This format is labeled combined because it combines the CLF with other information; the other two formats are just as aptly labeled agent and referer. Yet none of these formats is actually used in the Solaris configuration. Simply creating a LogFormat is not enough to generate a log file; you must also add a matching CustomLog directive to map the format to a file, as explained later.
In the LogFormat directive, the layout of the log entry is enclosed in quotes. The label that occurs after the closing quote is not part of the format. In the LogFormat directive that defines the CLF format, the label common is an arbitrary string used to tie the LogFormat directive to a CustomLog directive. In the Solaris configuration, that particular LogFormat is tied to the file /var/apache/logs/access_log defined by this line:
CustomLog /var/apache/logs/access_log common
The label common binds the two directives together. Thus the CLF entries defined by this LogFormat directive are written to the file defined by this CustomLog directive.
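The label binding can be modeled in a few lines of Python. This sketch, with the hypothetical function name log_bindings, collects LogFormat labels and then resolves each uncommented CustomLog to the layout its label names. It handles only the three-word forms shown in the Solaris file and uses Python's shlex module to strip the quotes and the \" escapes; that is an approximation of httpd's own parsing, not a copy of it.

```python
import shlex


def log_bindings(config_text: str) -> dict:
    """Map each active CustomLog file to the layout selected by its label."""
    formats = {}  # label -> layout, from: LogFormat "layout" label
    targets = []  # (file, label) pairs, from: CustomLog file label
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line.startswith(("LogFormat", "CustomLog")):
            continue  # also skips commented-out (#) directives
        words = shlex.split(line)  # POSIX mode removes quotes and unescapes \"
        if len(words) != 3:
            continue  # this sketch ignores other forms of these directives
        if words[0] == "LogFormat":
            formats[words[2]] = words[1]
        else:
            targets.append((words[1], words[2]))
    # Apache also accepts a literal layout in place of a label;
    # this sketch simply returns an unknown label unchanged.
    return {path: formats.get(label, label) for path, label in targets}
```

Run against the Solaris logging lines, it maps /var/apache/logs/access_log to the common layout, because the referer, agent, and combined CustomLog lines are commented out.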
In the Solaris configuration, the other CustomLog directives that create the agent, referer, and combined log files are commented out:
#CustomLog /var/apache/logs/referer_log referer
#CustomLog /var/apache/logs/agent_log agent
#CustomLog /var/apache/logs/access_log combined
The referer_log stores the URL of the source page that linked to your web server. This helps you determine what sites are pointing to your web pages. Entries in the referer_log are defined by this line:
LogFormat "%{Referer}i -> %U" referer
To create the log, uncomment this line:
CustomLog /var/apache/logs/referer_log referer
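Once referer_log is enabled, a short script can summarize which sites link to you. The Python sketch below, with the hypothetical function referring_sites and hypothetical sample lines, counts entries per referring host. It assumes each line has exactly the layout produced by the referer format (the Referer URL, the literal ->, and the requested path) and skips the hyphen Apache writes when a request carries no Referer header.

```python
from collections import Counter
from urllib.parse import urlsplit


def referring_sites(lines) -> Counter:
    """Count referer_log entries per referring host."""
    counts = Counter()
    for line in lines:
        referer, sep, _path = line.rstrip("\n").partition(" -> ")
        if not sep or referer == "-":
            continue  # malformed line, or no Referer header was sent
        host = urlsplit(referer).hostname
        if host:
            counts[host] += 1
    return counts
```

Calling referring_sites(open("/var/apache/logs/referer_log")).most_common(10) would list the ten sites that send you the most visitors.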
The agent_log identifies the browsers that are used to access your site, and is defined by this LogFormat statement:
LogFormat "%{User-agent}i" agent
To create the log, uncomment this line:
CustomLog /var/apache/logs/agent_log agent
Lastly, the format for the expanded CLF log is defined by this line:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
To create a combined log, uncomment this line:
CustomLog /var/apache/logs/access_log combined
and comment this line:
#CustomLog /var/apache/logs/access_log common
These changes cause the combined log format to be used to build the log file /var/apache/logs/access_log, the same file used by the default common log format. To avoid duplicate log entries, turn off common logging when you turn on combined logging. In effect, these changes switch the access_log file from the common log format to the combined log format.
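Forgetting to comment out the common line is an easy mistake. The following Python sketch, with the hypothetical name duplicate_log_targets, reports any file named by more than one uncommented CustomLog directive. It reads a single configuration text and does not distinguish CustomLog lines inside VirtualHost sections from those in the main server.

```python
import shlex
from collections import Counter


def duplicate_log_targets(config_text: str) -> list[str]:
    """List files named by more than one active (uncommented) CustomLog."""
    targets = Counter()
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("CustomLog"):  # "#CustomLog" lines never match
            words = shlex.split(line)
            if len(words) >= 3:
                targets[words[1]] += 1
    return sorted(path for path, count in targets.items() if count > 1)
```

An empty list means every active CustomLog writes to its own file.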
Each LogFormat statement and its associated CustomLog statement end with the same label. The label is an arbitrary name used to bind the format and the file together.
Apache also supports conditional logging, in which a field is logged only when the server returns certain status codes. The status codes are listed in Table 11-2.
| Status code | Meaning |
|---|---|
| 200: OK | A valid request |
| 302: Found | The requested document was found at a different, temporary URL |
| 304: Not Modified | The requested document has not been modified |
| 400: Bad Request | An invalid request |
| 401: Unauthorized | The client or user is denied access |
| 403: Forbidden | The requested access is not allowed |
| 404: Not Found | The requested document does not exist |
| 500: Server Error | An unspecified server error occurred |
| 501: Not Implemented | The requested server feature is not available |
| 502: Bad Gateway | The server, acting as a gateway or proxy, received an invalid response from the upstream server |
| 503: Out of Resources (Service Unavailable) | The server has insufficient resources to honor the request |
To make a field conditional, insert one or more status codes between the % and the field specifier in the LogFormat entry. If multiple status codes are used, separate them with commas. Assume that you want to log the browser name only if the browser requests a service that is not implemented in your server. Combine the Not Implemented (501) status code with the User-agent header in this manner:
%501{User-agent}i
If this value appears in the LogFormat statement, the name of the browser is logged only when the status code is 501.
Place an exclamation mark in front of the status codes to specify that you want to log a field only when the status code is not one of the specified values. For example, to log the address of the site that referred the user to your web page only when the status code is not one of the good status codes, add the following to a LogFormat:
%!200,302,304{Referer}i
This particular conditional log entry is very useful, as it tells you when a remote page has a stale link pointing to your web site.
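The rule for conditional fields is simple enough to express directly. The Python sketch below (hypothetical function conditional_value) evaluates one conditional request-header field, such as %!200,302,304{Referer}i, for a given status code: without the exclamation mark the header is logged only for the listed codes, with it only for codes not listed, and otherwise a hyphen is logged in place of the field.

```python
import re

# %[!]codes{Header}i, e.g. %501{User-agent}i or %!200,302,304{Referer}i
CONDITIONAL = re.compile(r"%(!?)([0-9]{3}(?:,[0-9]{3})*)\{([^}]+)\}i")


def conditional_value(token: str, status: int, headers: dict) -> str:
    """Return what a conditional header field logs for this status code."""
    match = CONDITIONAL.fullmatch(token)
    if match is None:
        raise ValueError(f"not a conditional header field: {token}")
    negate, codes, name = match.groups()
    listed = status in {int(code) for code in codes.split(",")}
    # Plain form logs only listed codes; "!" form logs only unlisted codes.
    if listed == bool(negate):
        return "-"
    for key, value in headers.items():  # header names are case-insensitive
        if key.lower() == name.lower():
            return value
    return "-"
```

With a 404 response, %!200,302,304{Referer}i yields the referring page, which is exactly the stale-link report described above; with a 200 response it yields a hyphen.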
Combine these features with the common log format to create a more useful log entry. Here we modify the Solaris combined format to include conditional logging:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%!200,302,304{Referer}i\" \"%{User-Agent}i\"" combined
This entry provides all the data of the CLF and thus can be analyzed by standard tools. But it also provides the browser name and, when the user requests a stale link, it provides the address of the remote site that references that link.
Although the Solaris configuration file contains 160 active lines, there are some interesting Apache features that it does not exploit. Before we move on to the important ongoing tasks of server security and server monitoring, the following sections provide a quick overview of three features not included in the default Solaris configuration: proxies and caching, multi-homed server configuration, and virtual hosts.
Servers that act as intermediaries between clients and web servers are called proxy servers. When firewalls are used, direct web access is often blocked. Instead, users connect to the proxy server through the local network, and the proxy server is trusted to connect to the remote web server. Proxy servers can maintain cached copies of remote servers' web pages to improve performance by reducing the amount of traffic sent over the wide area network and by reducing the contention for popular web sites. The options that control caching behavior are:
CacheNegotiatedDocs
Allows proxy servers to cache content-negotiated documents from your server. By default, Apache asks proxy servers not to cache these documents. This directive takes no arguments.
ProxyRequests
Setting this option to on turns your server into a proxy server. By default, this is set to off.
ProxyVia
Enables or disables the use of Via: headers, which aid in tracking where cached pages actually came from.
CacheRoot
Specifies the directory path where cached web pages are written when this server is configured as a proxy server. To avoid making the directory writable by the user nobody, create a special userid for httpd when you run a proxy server.
CacheSize
Sets the maximum size of the cache in kilobytes. The default is 5.
CacheGcInterval
Sets the time interval (in hours) at which the server prunes the cache. The default is 4. Given the defaults, the server prunes the cache down to 5 kilobytes every 4 hours.
CacheMaxExpire
Sets the maximum number of hours a document can be held in the cache without requesting a fresh copy from the remote server. The default is 24 hours. With the default, a cached document can be up to a day out of date.
CacheLastModifiedFactor
Sets the length of time a document is cached based on when it was last modified. The default factor is 0.1. Therefore, if a document that was modified 10 hours ago is retrieved, it is held in the cache for only 1 hour before a fresh copy is requested. The assumption is that if a document changes frequently, the time of its last modification will be recent; thus, documents that change frequently are cached for only a short period of time. Regardless, nothing is cached longer than CacheMaxExpire.
CacheDefaultExpire
Sets a default cache expiration value for protocols that do not provide the value. The default is 1 hour.
NoCache
Defines a list of servers whose pages you do not want to cache. If you know that a server has constantly changing information, you won't want to cache information from that server because your cache will always be out of date. Listing the name of that server on the NoCache directive means that queries are sent directly to the server, and responses from the server are not saved in the cache.
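The expiry rules above amount to a small calculation. Here is a Python sketch, assuming the default values listed above and a hypothetical function cache_lifetime_hours: a document's age is multiplied by CacheLastModifiedFactor and capped at CacheMaxExpire. As a simplification of the protocol-dependent rules, a missing age (None) stands for the case CacheDefaultExpire covers. It models the arithmetic only, not mod_proxy's code.

```python
from typing import Optional


def cache_lifetime_hours(
    hours_since_modified: Optional[float],
    last_modified_factor: float = 0.1,  # CacheLastModifiedFactor
    max_expire: float = 24.0,           # CacheMaxExpire, in hours
    default_expire: float = 1.0,        # CacheDefaultExpire, in hours
) -> float:
    """Hours a proxy keeps a document before requesting a fresh copy."""
    if hours_since_modified is None:
        # No modification time available: fall back to the default.
        return default_expire
    # Recently changed documents get short lifetimes; CacheMaxExpire caps the result.
    return min(hours_since_modified * last_modified_factor, max_expire)
```

cache_lifetime_hours(10) returns 1.0, the 1-hour figure from the CacheLastModifiedFactor example, and a document last modified 1,000 hours ago is held for only 24 hours because of CacheMaxExpire.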
All of these directives are commented out in the Solaris configuration. By default, the Solaris Apache server is not configured to be a proxy server. If you need to create a proxy server, refer to a book dedicated to Apache configuration such as Linux Apache Web Server Administration.
Web servers with more than one IP address are said to be multi-homed. A multi-homed web server needs to know what address it should listen to for incoming server requests. There are two configuration options to handle this:
BindAddress
Specifies the address used for server interactions. The default value is *, which means that the server should respond to web service requests addressed to any of its valid IP addresses. If a specific address is used on the BindAddress line, only requests for that address are honored.
Listen
Specifies addresses and ports to monitor for web service requests in addition to the default port and address. Address and port pairs are separated by a colon. For example, to monitor port 8080 on IP address 172.16.12.5, enter Listen 172.16.12.5:8080. If a port is entered with no address, the address of the server is used. If the Listen directive is not used, httpd monitors only the port defined by the Port directive.
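The Listen argument syntax is easy to check mechanically. This Python sketch (hypothetical function parse_listen) splits an argument into an address and a port, returning None for the address when only a port is given; it handles IPv4-style addresses only.

```python
from typing import Optional, Tuple


def parse_listen(arg: str) -> Tuple[Optional[str], int]:
    """Split a Listen argument such as "172.16.12.5:8080" into (address, port)."""
    address, sep, port_text = arg.strip().rpartition(":")
    if not sep:
        # Port only: no address was named on the Listen line.
        address, port_text = None, arg.strip()
    elif not address:
        raise ValueError(f"missing address before ':' in {arg!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range in {arg!r}")
    return address, port
```

parse_listen("172.16.12.5:8080") gives ("172.16.12.5", 8080), and parse_listen("8080") gives (None, 8080).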
The BindAddress and Listen directives are commented out of the Solaris configuration.
Some of the options commented out of the sample httpd.conf file are used if your server hosts multiple web sites. For example, to host web sites for fish.edu and mammals.com on the crab.wrotethebook.com server, add these lines to the httpd.conf file:
<VirtualHost "www.fish.edu">
    DocumentRoot /var/apache/fish
    ServerName www.fish.edu
</VirtualHost>
<VirtualHost "www.mammals.com">
    DocumentRoot /var/apache/mammals
    ServerName www.mammals.com
</VirtualHost>
Each VirtualHost directive defines a hostname alias that your server responds to. For this to be valid, DNS must define the alias with a CNAME record. Our example requires CNAME records that assign crab.wrotethebook.com the aliases www.fish.edu and www.mammals.com. Because both aliases resolve to crab's own IP address, Apache 1.3 also needs a NameVirtualHost directive for that address; without it, httpd warns that the two VirtualHost sections overlap and gives the first one precedence. When crab receives a server request addressed to one of these aliases, it uses the configuration parameters defined here to override its normal settings. Therefore, when it gets a request for www.fish.edu, it uses www.fish.edu as its ServerName value instead of its own server name, and /var/apache/fish as the DocumentRoot.
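To make the selection step concrete, here is a minimal Python sketch of name-based lookup, assuming the request carries a Host header naming the alias the client used, as HTTP/1.1 clients send. The table copies the two VirtualHost sections above; the function name document_root_for is hypothetical, and the sketch returns None rather than guessing how httpd handles a name that matches no VirtualHost.

```python
from typing import Optional

# Name -> DocumentRoot pairs copied from the two <VirtualHost> sections.
VIRTUAL_HOSTS = {
    "www.fish.edu": "/var/apache/fish",
    "www.mammals.com": "/var/apache/mammals",
}


def document_root_for(host_header: str) -> Optional[str]:
    """Return the DocumentRoot of the virtual host named in a Host header.

    Names are compared without case and without any :port suffix.
    None means no virtual host matched.
    """
    name = host_header.strip().split(":", 1)[0].lower()
    return VIRTUAL_HOSTS.get(name)
```

A request with Host: WWW.Mammals.COM:80 is therefore served from /var/apache/mammals.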
Copyright © 2002 O'Reilly & Associates. All rights reserved.