3.2. A Cryptography Primer
We've
covered the basic properties of SSH. Now we focus on cryptography,
introducing important terms and ideas regarding the technology in
general. There are many good references on cryptographic theory and
practice, and we make no attempt here to be comprehensive. (For more
detailed information, check out Bruce Schneier's excellent
book,
Applied Cryptography, published by John
Wiley & Sons.) We introduce encryption and decryption, plaintext
and ciphertext, keys, secret-key and public-key cryptography, and
hash functions, both in general and as they apply to SSH.
Encryption is
the process of scrambling data so that it can't be read by
unauthorized parties. An
encryption algorithm
(or
cipher)
is a particular method of performing the scrambling; examples of
currently popular encryption algorithms are RSA, RC4, DSA, and IDEA.
The original, readable data is called the
plaintext,
or data in the clear, while the encrypted version is called the
corresponding ciphertext. To convert plaintext to ciphertext, you
apply an encryption algorithm parameterized by a
key, a string that is
typically known only to you. An encryption algorithm is considered
secure if it is infeasible for anyone to read (or
decrypt)
the encrypted data without the key. An attempt to decrypt data
without its key is called
cryptanalysis.
3.2.1. How Secure Is Secure?
It's important to understand the word "infeasible"
in the previous paragraph. Today's most popular and secure
ciphers are vulnerable to
brute-force attacks: if you try every possible key,
you will eventually succeed in decryption. However, when the number
of possible keys is large, a brute-force search requires a great deal
of time and computing power. Based on the state of the art in
computer hardware and algorithms, it is possible to pick sufficiently
large key sizes so as to render brute-force key search infeasible for
your adversary. What counts as infeasible, though, varies depending
on how valuable the data is, how long it must stay secure, and how
motivated and well-funded your adversary is. Keeping something secret
from your rival startup for a few days is one thing; keeping it
secret from a major world government for 10 years is quite another.
Of course, for all this to make sense, you must be convinced that
brute force is the only way to attack your cipher. Encryption
algorithms have structure and are susceptible to mathematical
analysis. Over the years, many ciphers previously thought secure have
fallen to advances in cryptanalysis. It isn't currently
possible to
prove a practical cipher secure.
Rather, a cipher acquires respectability through intensive study by
mathematicians and cryptographers. If a new cipher exhibits good
design principles, and well-known researchers study it for some time
and fail to find a practical, faster method of breaking it than brute
force, then people will consider it secure.
[16]
3.2.2. Public- and Secret-Key Cryptography
Encryption algorithms as described so far are called
symmetric or
secret-key ciphers; the same key is used for
encrypting and decrypting. Examples are Blowfish, DES, IDEA, and RC4.
Such a cipher immediately introduces the
key-distribution
problem: how do you get the key to your intended recipient? If you
can meet in person every once and a while and exchange a list of
keys, all well and good, but for dynamic communication over computer
networks, this doesn't work.
Public-key, or
asymmetric, cryptography replaces the single key
with a pair of related keys: public and private. They are related in
a mathematically clever way: data encrypted with the public key may
be decrypted with its private counterpart, and it is infeasible to
derive the private key from the public one. You keep your private
key, well... private, and give the public key to anyone who wants it,
without worrying about disclosure. Ideally, you publish it in a
directory next to your name, like a telephone book. When someone
wants to send you a secret message, they encrypt it with your public
key. Other people may have your public key, but that won't
allow them to decrypt the message; only you can do that with the
corresponding private key. Public-key cryptography goes a long way
towards solving the key-distribution problem.
[17]
Public-key methods are also the basis for
digital
signatures: extra information attached to a
digital document to provide evidence that a particular person has
seen and agreed to it, much as a pen-and-ink signature does with a
paper document. Any asymmetric cipher (RSA, ElGamal, Elliptic Curve,
etc.) may be used for digital signatures, though the reverse
isn't true. For instance, the DSA algorithm, which is used by
the SSH-2 protocol for its keys, is a signature-only public-key
scheme and can't be used for encryption.
[18]
Secret- and public-key encryption algorithms differ in another way:
performance. All common public-key algorithms are enormously slower
than secret-key ciphers -- by orders of magnitude. It is simply
infeasible to encrypt large quantities of data using a public-key
cipher. For this reason, modern data encryption uses both methods
together. Suppose you want to send some data securely to your friend
Bob Bitflipper. Here's what a modern encryption program does:
- Generate a random key, called the bulk
key, for a fast, secret-key algorithm such as 3
DES (a.k.a the bulk cipher).
- Encrypt the plaintext with the bulk key.
- Secure the bulk key by encrypting it with Bob Bitflipper's
public key, so only Bob can decrypt it. Since secret keys are small
(a few hundred bits long at most), the speed of the public-key
algorithm isn't an issue.
To reverse the operation, Bob's decryption program first
decrypts the bulk key, and then uses it to decrypt the ciphertext.
This method yields the advantages of both kinds of encryption
technology, and in fact, SSH uses this technique. User data crossing
an SSH connection is encrypted using a fast secret-key cipher, the
key for which is shared between the client and server using
public-key methods.
3.2.3. Hash Functions
In cryptography (and elsewhere in computing and network technology),
it is often useful to know if some collection of data has changed. Of
course, one can just send along (or keep around) the original data
for comparison, but that can be prohibitively expensive both in time
and storage. The common tool addressing this need is called a
hash function. Hash functions are used by SSH-1 for
integrity checking (and have various other uses in cryptography we
won't discuss here).
A hash function is simply a mapping from a larger set of data values
to a smaller set. For instance, a hash function
H might take an input bit string of any length
up to 50,000 bits, and uniformly produce a 128-bit output. The idea
is that when sending a message
m to Alice, I
also send along the hash value
H(m). Alice
computes
H(m) independently and compares it to
the
H(m) value I sent; if they differ, she
concludes that the message was modified in transit.
This simple technique can't be completely effective. Since the
range of the hash function is strictly smaller than its domain, many
different messages have the same hash value. To be useful,
H must have the property that the kinds of
alterations expected to happen to the messages in transit, must be
overwhelmingly likely to cause a change in the message hash. Put
another way: given a message
m and a typical
changed message
m', it must be extremely
unlikely that
H(m) = H(m').
Thus a hash function must be tailored to its intended use. One common
use is in networking: datagrams transmitted over a network frequently
include a message hash that detects transmission errors due to
hardware failure or software bugs. Another use is in cryptography, to
implement digital signatures. Signing a large amount of data is
prohibitively expensive, since it involves slow public-key operations
as well as shipping along a complete encrypted copy of the data. What
is actually done is to first hash the document, producing a small
hash value, and then sign that, sending the signed hash along
instead. A verifier independently computes the hash, then decrypts
the signature using the appropriate public key, and compares them. If
they are the same, he concludes (with high probability) that the
signature is valid, and that the data hasn't changed since the
private key holder signed it.
These two uses, however, have different requirements, and a hash
function suitable for detecting transmission errors due to line noise
might be ineffective at detecting deliberate alterations introduced
by a human attacker! A cryptographic hash function must make it
computationally infeasible to find two different messages having the
same hash or to find a message having a particular fixed hash. Such a
function is said to be
collision-resistant (or
collision-proof, though that's a bit
misleading), and
pre-image-resistant .
The
Cyclic Redundancy Check hash commonly
used to detect accidental data changes (e.g., in Ethernet frame
transmissions) is an example of a non-collision-resistant hash. It is
easy to find CRC-32 hash collisions, and the SSH-1 insertion attack
is based on this fact. [
Section 3.10.5, "The Insertion Attack"] Examples of
cryptographically strong hash functions are MD5 and SHA-1.
| | |
3. Inside SSH | | 3.3. The Architecture of an SSH System |