
Following a single SMTP handoff

What with all of the strange jargon, cryptic header dumps, and seemingly redundant information thrown around on this website and elsewhere, you might get the impression that e-mail is a rather complicated business — and, that’s a correct impression! Most of this complexity comes about because the basic protocol for transmitting e-mail is just that: a protocol for transmitting e-mail, and not for transmitting fully authenticated and verified e-mail. There are quite a few holes in the dyke, as it were, and at least as many measures that are undertaken to plug them, which results in the mail system taking on the appearance of a crazy-quilt. It can be tough to understand how the various pieces of the quilt are fitted together.

On this page, we’ll look at the transfer of a message from one mail host to another in which the simple mail transfer protocol (SMTP) is used. Specifically, we will look at the critical handoff of the message to the recipient’s mail exchanger (MX) host — the point at which the spam becomes the problem of the recipient and his ISP. My intention is to demystify this process and also to show some of the loopholes that spammers can exploit.

Preliminaries: we’ve got mail, what do we do with it?

We start with a computer (the “originating host”) that has an e-mail message to send. This computer could be any of the following:

  1. An honest mail host belonging to an ISP (or other business or institution) which received the message from one of its users (e.g., someone tapping away on a home or office computer);
  2. Some other honest machine (not strictly a mail host) that uses direct-to-MX mailing for some reason;
  3. A spammer’s machine (or an open proxy that he has “recruited” using viruses or other malware) that has direct-to-MX mail-sending software running on it; or,
  4. An “open relay” mail host that got the message from a spammer (i.e., not an authorized user) seeking to hide his activities behind the coattails of the relay host.

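Whatever the case, this machine has two pieces of information at hand:

  1. The recipient’s e-mail address (harry@chinchilla.tv in our example); and,
  2. The message packet itself: the header lines and body text that make up the e-mail.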

The originating host’s first task is to figure out what remote machine it has to contact in order to send the mail. This machine is known as a mail exchanger or simply MX.

To find the proper MX to use, the originating host isolates the domain portion of the recipient’s e-mail address (“chinchilla.tv” in this case) and looks up the MX records for this domain in the Domain Name System (DNS) by contacting its local name server. The originating host’s mail transfer program (sendmail, for example) will probably use its own internal calls for this task, but a human being performing the same task might use the dig mx command (the info in this example is phony, made up by me for purposes of illustration):

[G4733:~] rconner% dig mx chinchilla.tv

; <<>> DiG 9.2.2 <<>> mx chinchilla.tv
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60810
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 2, ADDITIONAL: 3

;; QUESTION SECTION:
;chinchilla.tv.              IN   MX

;; ANSWER SECTION:
chinchilla.tv.       14400   IN   MX   10 mx1.friendly-isp.com.
chinchilla.tv.       14400   IN   MX   10 mx3.friendly-isp.com.
chinchilla.tv.       14400   IN   MX   20 mx4.friendly-isp.com.

[remaining output snipped]

Our dig mx lookup tells us that the three hosts listed in the ANSWER SECTION are registered within DNS as being the mail hosts (MXs) that are officially assigned to receive mail messages directed to the chinchilla.tv domain. In other words, if we want to send e-mail to an address in the chinchilla.tv domain, we simply transfer the mail to one of these three hosts in the friendly-isp.com domain; this MX host will complete the delivery for us.
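The numbers in front of each host name (10 and 20) are MX preference values. Under RFC 5321, a sender tries the hosts with the lowest value first, picks at random among hosts that share a value, and falls back on higher values only if those fail. The short Python sketch below applies that rule to the phony dig output above; it parses the printed text rather than querying DNS (a real MTA would use its resolver library), and the names `parse_mx` and `delivery_order` are made up for illustration.

```python
import random

# The answer section of the (made-up) dig output shown above.
DIG_ANSWER = """\
chinchilla.tv.       14400   IN   MX   10 mx1.friendly-isp.com.
chinchilla.tv.       14400   IN   MX   10 mx3.friendly-isp.com.
chinchilla.tv.       14400   IN   MX   20 mx4.friendly-isp.com.
"""


def parse_mx(answer_text):
    """Return (preference, host) pairs from dig-style MX answer lines."""
    records = []
    for line in answer_text.splitlines():
        fields = line.split()
        # Expected columns: owner name, TTL, class, type, preference, exchange.
        if len(fields) == 6 and fields[3] == "MX":
            records.append((int(fields[4]), fields[5].rstrip(".")))
    return records


def delivery_order(records, rng=random):
    """Hosts to try, lowest preference first; ties come out in random order."""
    shuffled = list(records)
    rng.shuffle(shuffled)  # randomize first...
    # ...then sort by preference only; sorted() is stable, so hosts with
    # equal preference keep their shuffled order.
    shuffled = sorted(shuffled, key=lambda record: record[0])
    return [host for _, host in shuffled]


print(delivery_order(parse_mx(DIG_ANSWER)))
```

Each run prints mx1 and mx3 in some order, always followed by mx4.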

Opening the connection to the MX

Now that the originating host has a list of MX hosts for the recipient, it can pick one (starting with a host that has the lowest preference number, the 10 or 20 shown in each MX record) and open a standard socket connection (a TCP connection between the two machines) on port 25, the well-known port for SMTP service.

If the MX host is online, and it has a mail transfer agent (MTA) software application up and listening on port 25, then this MTA app will set up its end of the socket connection and return a greeting message (i.e., it “picks up the phone”):

220 mx1.friendly-isp.com SENDMAIL ready

This message officially marks the start of our SMTP session. The code number (220) at the beginning tells the originating host that the SMTP service at the MX is ready to accept mail. The originating host usually looks at just this code in order to figure out what to do next; the rest of the information is interesting, but not particularly useful to us at this point.
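Since the originating host mostly cares about the three-digit code, here is a small Python sketch of how a client might split a reply line and classify it. Per RFC 5321, the first digit carries the meaning (2 = success, 3 = send more, 4 = temporary failure, 5 = permanent failure), and a hyphen instead of a space after the code signals that more lines of the same reply follow. The function names are hypothetical.

```python
def parse_reply(line):
    """Split one SMTP reply line into (code, is_last_line, text)."""
    code = int(line[:3])
    # "250-..." means more lines of this reply follow; "250 ..." is the last.
    is_last = len(line) == 3 or line[3] == " "
    return code, is_last, line[4:]


def reply_class(code):
    """Name the kind of reply from its first digit (RFC 5321, section 4.2.1)."""
    return {
        2: "positive completion",    # e.g. 220, 250, 221: done, carry on
        3: "positive intermediate",  # e.g. 354: go ahead, send the rest
        4: "transient negative",     # try again later
        5: "permanent negative",     # e.g. 550: don't retry as-is
    }.get(code // 100, "unknown")


code, last, text = parse_reply("220 mx1.friendly-isp.com SENDMAIL ready")
print(code, reply_class(code), last, text)
# 220 positive completion True mx1.friendly-isp.com SENDMAIL ready
```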

HELO there! — Signing on with the MX

Having gotten the go-ahead from the MX, the originating host can start by (maybe) identifying itself, using the SMTP HELO command followed by its host name.

HELO toothfairy.org

As the command suggests, the originating host is simply saying “Hello, I’m a host named toothfairy.org.” Is this true? We don’t know yet. This is the first of several crufty bits of SMTP that we will encounter: SMTP does not require this HELO name to be valid, to trace back to an actual host, nor even to look like a mail host name. Most receiving hosts will simply accept the HELO name without question. For example, most or all of the following HELOs would be permissible under minimal SMTP rules (although a particular mail host might decide to reject some or all of them according to its own programming):

HELO "I am a spammer"    || a string
HELO <fake pharmz pimp>  || another kind of string
HELO aol.com             || a bare domain name (not a host name)
HELO grow.your.penis     || looks like a host name
HELO 87.65.43.21         || looks like an IP address
HELO -12385745           || huh what??
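A receiving host that does want to look at the HELO argument can only check its form, not its truth. The Python sketch below is a hypothetical helper, not any real MTA’s rule: it sorts the examples above into IP addresses, syntactically valid DNS names, single labels, and everything else. Note that it cannot tell aol.com (a bare domain) from grow.your.penis (which merely looks like a host name); only a DNS lookup could say more, and even that says nothing about honesty.

```python
import ipaddress
import re

# One DNS label: letters, digits and inner hyphens, 1 to 63 characters.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def classify_helo(arg):
    """Describe the *form* of a HELO argument; says nothing about its truth."""
    arg = arg.strip()
    try:
        # Accept a bare IP or the bracketed form [87.65.43.21].
        ipaddress.ip_address(arg.strip("[]"))
        return "ip address"
    except ValueError:
        pass
    labels = arg.rstrip(".").split(".")
    if not all(LABEL.fullmatch(label) for label in labels):
        return "not a dns name"
    return "dns name" if len(labels) > 1 else "single label"


for helo in ['"I am a spammer"', "<fake pharmz pimp>", "aol.com",
             "grow.your.penis", "87.65.43.21", "-12385745"]:
    print(f"{helo:20} {classify_helo(helo)}")
```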

Fortunately, the MX does not have to rely upon the HELO command as an accurate identification of the host that is trying to send the mail. In fact, one of the first things that most MX hosts will do is to retrieve the IP address (say, 12.34.56.78) of the originating host from the data supporting the socket connection (this can be obtained by basic system calls, and cannot be forged or disguised by the originating host). The MX may then try a reverse DNS lookup on this address; again, it will use its own system calls for this, but a human could use the host command to perform the same task (as in this phony example):

[G4733:~] rconner% host 12.34.56.78
78.56.34.12.in-addr.arpa domain name pointer gus.evil-isp.foo.
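Under the hood, a reverse lookup like this is a DNS query for a PTR record under a special name: the address’s four octets in reverse order, followed by in-addr.arpa (exactly the name printed in the host output above). A minimal Python sketch, with hypothetical function names, builds that name and asks the system resolver for the PTR host name through the standard `socket.gethostbyaddr` call.

```python
import ipaddress
import socket


def ptr_name(ip):
    """The in-addr.arpa name that a reverse (PTR) lookup of an IPv4 address asks for."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # ValueError if malformed
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def reverse_lookup(ip):
    """Host name from the system resolver's PTR lookup, or None if there isn't one."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


print(ptr_name("12.34.56.78"))  # 78.56.34.12.in-addr.arpa
```

Python’s `ipaddress` module also exposes the same name directly as the `reverse_pointer` attribute of an address object.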

Here, we see that the host name for this IP address isn’t even close to toothfairy.org. You might think that the MX would be entitled to shut down the connection at this point (since the originating host appears to have lied about its identity), but things aren’t this simple. The originating host may in fact be under the control of a lying spammer, but it’s also possible that an honest host has been set up with a mistaken HELO name, or that a subtle DNS misconfiguration makes its reverse lookup disagree with its real name.

For these reasons, apparently, most mail hosts simply accept bogus HELOs rather than refuse potentially honest e-mail messages. The MX will have other means to check the bona fides of the originating host, as we will shortly see. For now, the MX simply acknowledges the HELO in a response with a status code of 250 (which essentially means “OK, go ahead”):

250 mx1.friendly-isp.com Hello, pleased to meet you.

Checking spam blocklists

On the other hand, before giving the code-250 message above, the MX might be programmed to check the IP address of the originating host in one or more spam block lists; these block lists form a sort of “DEW line” against spam, allowing the MXs to reject any and all connections from suspected spam sources. In other words, consulting an accurate and well-maintained block list enables the MX to stop spam from being accepted and delivered to users, who thereby are relieved from having to figure out how to filter or block it themselves.

There are lots of block lists operating on the internet today, maintained by various individuals, organizations, and private companies. Many can be used free of charge, but others require payment (or contributions) for use, particularly when the users are large mail operations. A few of the more famous and popular block lists today are the Spamhaus Block List (SBL), the SpamCop Blocking List (SCBL), and the Composite Block List (CBL). These sites all have web-based lookups that human visitors can use to check addresses of interest. Jeff Makey at the San Diego Supercomputer Center has a listing of links to many other block lists, and Al Iverson’s DNSBL website (http://www.dnsbl.com/) also provides more detailed discussion of how various block lists work.

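Block lists vary in their focus: some list addresses that have been caught sending spam, some list open relays and open proxies, and some list dial-up and other dynamically assigned addresses that would not normally be sending mail directly to an MX.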

The best lists are operated in an automated or semi-automated fashion by teams of careful and ethical professionals, and provide easy means for innocent parties to remove themselves; others are operated by individuals who are engaging in some capricious and highly-subjective axe-grinding. It’s up to the user of a block list to decide whether that list’s priorities coincide with his own.

It would be way beyond the scope of this page (and beyond my expertise) for me to describe how MTA software is set up to use block lists, but generally (in the case of DNSBL-style lists) the MX will perform an nslookup-style lookup of the originator’s IP address at a specific server operated by the block list, and will then get a rapid, simple response from this server indicating whether or not the address is on the block list. Note that it is the IP address that is looked up, and never the host name given in the HELO (which is trivial to forge and therefore not trustworthy).
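For a DNSBL-style list, the query name is typically built the same way as for a reverse lookup: the IP address’s octets are reversed and the block list’s DNS zone is appended. By the usual convention (written up in RFC 5782), an answer in the 127.0.0.x range means “listed” and a nonexistent name means “not listed.” In the Python sketch below, the zone dnsbl.xyz-abc.net is hypothetical (borrowed from the made-up xyz-abc block list used on this page), and the helper names are illustrative only.

```python
import ipaddress
import socket

# Hypothetical block-list zone, named after this page's made-up xyz-abc list.
BLOCKLIST_ZONE = "dnsbl.xyz-abc.net"


def dnsbl_query_name(ip, zone=BLOCKLIST_ZONE):
    """Reverse the IPv4 octets and append the block list's DNS zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone=BLOCKLIST_ZONE):
    """True if the zone answers with an address in 127.0.0.0/8 for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except (socket.gaierror, socket.herror):
        # No such name means "not listed". (This sketch also treats lookup
        # failures that way, a policy choice a real MTA would make deliberately.)
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.ip_network("127.0.0.0/8")


print(dnsbl_query_name("12.34.56.78"))  # 78.56.34.12.dnsbl.xyz-abc.net
```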

If the address appears to be clean (i.e., the block list has no records for it), then the MX can give the code-250 reply shown above. Otherwise, it has the option of refusing the mail and shutting down the connection to the originating host with a message like this:

550 Address blocked -- see http://xyz-abc.net/dnsbl for info.

The text part of this message, like those of other SMTP replies, isn’t usually used by the originating host, but it can be retained in the originating host’s log files. The text may help an honest system administrator to figure out why his customers’ mail isn’t getting out, and what he needs to do to fix the problem (i.e., to talk to the folks at the “xyz-abc” block list about getting his addresses removed).

Who’s it from?

Assuming that the originating host passed the block-list checks and got the code-250 reply from the MX, the ball is now back in its court. It will generally start by identifying (maybe) the originator of the message using the SMTP command MAIL FROM:

MAIL FROM:<alligator@reptiles.be>

As in the case of the HELO, SMTP does not require this address to be valid or correct. Since the MX does not “peek” inside the message body (which, anyway, hasn’t yet been sent at this point), the MAIL FROM address is the only indication it has of the original sender’s identity, but this address is trivial to forge. In fact, this is the point at which spammers will “inject” made-up or stolen e-mail addresses in order to deflect responsibility for the mailing.

This may seem like an obvious flaw in SMTP, but we should consider that SMTP is based on the model of postal mail: when you send a letter or postcard, you do not have to provide a correct return address (or, for that matter, any return address at all). SMTP, above all, is simply a protocol that enables the transfer of mail, not necessarily the secure and fully-validated transfer of mail.

These days, however, most MTA apps will do a rather minimal sort of check on the from-address: if the address doesn’t look like an address should look (i.e., a user name followed by “@” followed by what looks like a valid domain name), the MX may end the session. For example, given the following:

MAIL FROM:<chicken.gizzards>

the MX could reject the mail and end the session with a reply like:

550 MAIL FROM address must resolve

Otherwise, if the from-address looks legit, the MX will reply with another code-250 message:

250 alligator@reptiles.be ... sender OK
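That minimal check might look something like the Python sketch below. It only tests the shape of the address (something, then “@”, then a dotted domain name); the 550 text suggests a real MTA might also check that the domain resolves in DNS, which this sketch skips. It also accepts the empty path MAIL FROM:<>, which SMTP reserves for bounce messages. The function name and the exact pattern are assumptions, not any particular MTA’s rule.

```python
import re

# Deliberately loose: local part, "@", then a dotted domain name.
ADDRESS = re.compile(r"[^@\s<>]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def check_mail_from(command):
    """Return the reply a lenient MX might give to one MAIL FROM command."""
    prefix = "MAIL FROM:"
    if not command.upper().startswith(prefix):
        return "500 Command unrecognized"
    path = command[len(prefix):].strip()
    if path.startswith("<") and path.endswith(">"):
        path = path[1:-1]  # the standard form is MAIL FROM:<address>
    if path == "":
        return "250 <> ... sender OK"  # null sender, reserved for bounces
    if ADDRESS.fullmatch(path):
        return f"250 {path} ... sender OK"
    return "550 MAIL FROM address must resolve"


print(check_mail_from("MAIL FROM:<alligator@reptiles.be>"))
# 250 alligator@reptiles.be ... sender OK
print(check_mail_from("MAIL FROM:<chicken.gizzards>"))
# 550 MAIL FROM address must resolve
```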

Who’s it to?

With the MAIL FROM out of the way, the originating host will then identify the address to which the message is to be sent, using the RCPT TO command:

RCPT TO:<harry@chinchilla.tv>

Here at last, the originating host is required to provide information that must be valid — or, at any rate, it must be the actual address to which the sender wants the mail delivered. This is the only point in the mail handoff at which the recipient is identified, so if the address isn’t valid, the message won’t go through.

Checking the to-address (or not)

Now that the MX has the intended to-address in hand, it must do some checking to find out whether it can deliver the message. It must first ensure that it does not have to make an external relay to send the message; then, it must determine whether the address exists and is allowed to receive mail.

Or else, as we will see, it might simply punt on one or both of these checks.

Relay check

First, we must check whether delivery to the to-address would require making an external relay.

Each MX has information in its configuration (or “rulesets,” to use sendmail parlance) that tells it which domains it is directly supporting (this is the converse of the information we got above from dig, telling us which MXs support a particular domain). If the MX is offered a message bound for one of its “customer” domains, it simply accepts the message and consults its rulesets to find out what else it needs to do to deliver the message (e.g., to which MDA it has to send the message). In our example here, the message is bound for a user at chinchilla.tv, which is one of the domains assigned to this MX, so all is well.

But, consider the following:

RCPT TO:<somebody-else@another-isp.net>

Here, the MX will quickly find that another-isp.net is not in its ruleset—in other words, not one of its assigned domains. Therefore, if it wanted to deliver this message, this MX would have to look up the MX for another-isp.net (which would probably be another host somewhere halfway around the world) and then try to drop the message onto that MX. This is known as an “external relay.”

SMTP allows external relaying; in fact, this capability was probably very useful back in the days when internet connectivity wasn’t nearly as complete and voluminous as it is now. External relaying would also not have posed much of an abuse problem in those days, when the internet consisted of a few dozen universities, military installations, and private companies who could all trust each other not to send spam. Things have changed just a bit since those days, and indiscriminate relaying is no longer considered good practice. Mail system administrators have buttoned down their MXs to forbid them making external relays; in fact, most MTA software (like sendmail) comes pre-configured “out of the box” to prohibit relaying (and this ban can be difficult to lift even when you need to lift it for special reasons).

As a result, if the MX finds (as in the another-isp.net case) that delivering a message would require an external relay, it will simply hang up on the originating host with a suitable message:

550 We do not relay.
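The relay check amounts to a set-membership test on the domain part of the RCPT TO address. In this Python sketch, `LOCAL_DOMAINS` is a hypothetical stand-in for the MX’s ruleset and holds only the example domain; a real MTA would also let its own authorized users relay, which is left out here.

```python
# Hypothetical ruleset: the domains this MX is assigned to receive mail for.
LOCAL_DOMAINS = {"chinchilla.tv"}


def relay_check(recipient, local_domains=LOCAL_DOMAINS):
    """Return a refusal if delivery would need an external relay, else None."""
    address = recipient.strip().strip("<>")
    domain = address.rpartition("@")[2].lower()  # domain names are case-insensitive
    if domain in local_domains:
        return None  # one of our own domains: go on to the address check
    # (A real MTA would also let its own authorized users relay; omitted here.)
    return "550 We do not relay."


print(relay_check("<harry@chinchilla.tv>"))            # None
print(relay_check("<somebody-else@another-isp.net>"))  # 550 We do not relay.
```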

Address check

Now, the MX has an opportunity to decide whether the to-address is allowed to receive mail; or, the MX can just pass the buck for this task to some other machine.

The SMTP protocol is designed to allow MX hosts to check the to-addresses and then provide immediate feedback to the originating host as to whether or not the message will go through. For example, if an MX somehow determines that the to-address doesn’t exist within its domains (i.e., no one is using it now), then it could respond with:

550 No such user

If the address is in use, but the MX finds that the user has exceeded his online storage allocation (i.e., it’s been a while since he downloaded and deleted his mail), then the MX could respond with:

552 Storage allocation exceeded

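Either of these is a permanent (code 5xx) refusal of that recipient: the message will not be delivered to that address, and the originating host will typically give up on it (it may go on to offer other recipients, or simply end the session).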

If the address exists and is “deliverable,” then the MX can return an appropriate code-250 response:

250 harry@chinchilla.tv ... recipient OK
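For a small MX, the address check can be a simple table lookup. The Python sketch below uses a hypothetical in-memory table of mailboxes (with storage used and quota) and returns the three replies shown above; sally@chinchilla.tv is an invented user with a full mailbox. It is exactly this kind of lookup that a very busy MX may not have time for.

```python
# Hypothetical user table: address -> (bytes stored, quota in bytes).
MAILBOXES = {
    "harry@chinchilla.tv": (1_000_000, 50_000_000),
    "sally@chinchilla.tv": (50_000_000, 50_000_000),  # mailbox is full
}


def address_check(recipient, mailboxes=MAILBOXES):
    """Return the RCPT TO reply for an address in one of our own domains."""
    # Lower-casing the whole address is common practice, although SMTP lets
    # a domain treat the local part (before "@") as case-sensitive.
    address = recipient.strip().strip("<>").lower()
    if address not in mailboxes:
        return "550 No such user"
    used, quota = mailboxes[address]
    if used >= quota:
        return "552 Storage allocation exceeded"
    return f"250 {address} ... recipient OK"


print(address_check("<harry@chinchilla.tv>"))  # 250 harry@chinchilla.tv ... recipient OK
```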

These checks are easy enough to do for an MX that serves a small community of users at a single domain; however, things get much more complicated if the MX serves many independent domains or does a high volume of mail transfer (some big domains, like AOL or Comcast, could easily be offered hundreds of incoming messages per second); in such cases, there simply isn’t time for the MX to fish through databases containing millions of user account records or to query remote domain hosts for the status of their user accounts.

For this reason, many MXs don’t bother making these checks; instead, they will simply accept the messages and pass them on to other “back-office” hosts called mail delivery agents (MDAs); it will be up to the MDAs to determine whether the messages are deliverable. Since the MDAs aren’t under the same kind of time pressure, it is easier for them to make the checks. If the MXs and MDAs are set up properly, users won’t notice any delays in receiving their incoming mail.

Unfortunately, however, by the time the MDA receives the message, the original SMTP session is long over with. The MDA doesn’t have the option of refusing the message directly if it is not deliverable; instead, the best it can do is to send a “bounce” message back to the from-address to inform the sender that the message is refused.

This behavior of the MDA gives rise to an annoying secondary effect of spam: the misdirected bounce, or “blowback.” As we saw above, you don’t have to give a valid from-address to send an e-mail. If a spammer uses the address of an innocent party as the MAIL FROM address, the MDA will send any and all bounces to this address. That party will probably get confused or upset about the sudden appearance in his inbox of several hundred cryptic bounce messages (most of which probably won’t quote the original message, so he’ll have no way to see what was done in his name).

Finally, the message

Once the originating host has indicated the from-address and the to-address(es) for the message, and these have been vetted by the MX, pretty much all that’s left is to send the message packet itself (you do remember my mentioning the message packet, don’t you?), using the SMTP DATA command:

DATA

In response to the DATA command, the MX will return a code-354 response:

354 Start mail input, end with "." on a line by itself

The originating host will then simply send out the mail packet, ending with the indicated end-of-message marker:

Received: from WHOCARES by MYOB (GOO) with ESMTP id 12345;
   Wed, 28 Dec 2005 22:51:03 -0500
Date: 28 December 2005 23:14:29 -0500
From: savebigondrugs@hotmail.com
To: not-you@nowhere.xxx

Save big on all your fake meds at http://woifjapsoidjf.poisonpills.nl

To be removed, visit http://go-fsck-yourself.com
.

Note that the first header field of this message (the Received: line, folded onto two lines) is, in this case, an obvious forgery. The transfer implied by this line never took place; the spammers simply made up the info and planted it in the message in order to mislead. It is possible to make these forgeries look a bit more authentic than what is shown here, but generally spammers don’t bother.

The blank line after the To: line marks the end of the header, per RFC 2822. Note also that the To: and From: addresses here are not the same as those given in the SMTP transaction, and probably not valid; this is perfectly OK in SMTP, and extremely common in spam.

For the sake of simplicity, I made this a plain-text (non-MIME) e-mail; if the message were MIME encoded, then this packet would include any MIME header info, part boundaries, and (possibly) MIME-encoded data.
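Because a line holding only “.” ends the message, the sender has to protect any message line that happens to start with a dot. SMTP handles this with “dot-stuffing” (RFC 5321, section 4.5.2): the sender adds an extra dot to such lines, and the receiver removes it. A minimal Python sketch of both sides, with hypothetical function names, also converts line endings to the CR LF pairs that SMTP requires.

```python
def encode_data(message):
    """Turn message text into what the sender transmits after DATA."""
    lines = []
    for line in message.splitlines():
        # A leading "." is doubled so that no message line can look like
        # the lone "." that ends the message (the receiver strips it again).
        lines.append("." + line if line.startswith(".") else line)
    # SMTP lines end in CRLF; a final line holding only "." ends the data.
    return "".join(line + "\r\n" for line in lines) + ".\r\n"


def decode_data(received_lines):
    """Receiver side: collect lines up to the lone ".", undoing the stuffing."""
    body = []
    for line in received_lines:
        if line == ".":
            break
        body.append(line[1:] if line.startswith(".") else line)
    return body


wire = encode_data("Hello\n.\nBye")
print(repr(wire))  # 'Hello\r\n..\r\nBye\r\n.\r\n'
```

Without the stuffing, the lone “.” in the middle of this sample would have cut the message off after “Hello”.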

So long, and thanks for all the spam

If the originating host has sent all of the data properly, and ended with the “.” mark, then the MX will return a code-250 response, possibly including the unique message ID it has assigned:

250 23fsea9fseq Message accepted for delivery

From this point, the message has now been successfully handed off and has become the responsibility of the MX. The originating host can go back and start another mail transfer (with another MAIL FROM command, say), or it can simply sign off:

QUIT

After which the MX will acknowledge and break the connection:

221 mx1.friendly-isp.com Closing command channel
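The commands in this session must come in a fixed order: HELO, then MAIL FROM, then one or more RCPT TO, then DATA, and finally QUIT (or another MAIL FROM to start a new message). An MX that receives them out of order typically answers with code 503, “bad sequence of commands.” The Python sketch below checks that order with a small, hypothetical transition table; it is a simplification that ignores other commands such as EHLO, RSET, and NOOP.

```python
# Which command verbs may come next, given the last accepted one (None = start).
ALLOWED_NEXT = {
    None: {"HELO", "QUIT"},
    "HELO": {"MAIL", "QUIT"},
    "MAIL": {"RCPT", "QUIT"},
    "RCPT": {"RCPT", "DATA", "QUIT"},  # more than one recipient is allowed
    "DATA": {"MAIL", "QUIT"},          # message accepted: start another, or quit
    "QUIT": set(),                     # nothing may follow QUIT
}


def first_out_of_order(commands):
    """Return the first command verb an MX would refuse with 503, or None."""
    state = None
    for command in commands:
        verb = command.split()[0].split(":")[0].upper()
        if verb not in ALLOWED_NEXT[state]:
            return verb
        state = verb
    return None


session = [
    "HELO toothfairy.org",
    "MAIL FROM:<alligator@reptiles.be>",
    "RCPT TO:<harry@chinchilla.tv>",
    "DATA",
    "QUIT",
]
print(first_out_of_order(session))              # None
print(first_out_of_order(["HELO x", "DATA"]))   # DATA
```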

Recording the transaction

One task remains for the MX before it can forward the message to an MDA or to the user’s mail queue: to record the details of the handoff in the message header. To do this, it simply adds a new Received: header field to the top of the message packet. The packet now looks like this (the new field is the first five lines, down through the envelope-from comment):

Received: from toothfairy.org (gus.evil-isp.foo [12.34.56.78])
   by mx1.friendly-isp.com (SENDMAIL) with SMTP id 23fsea9fseq
   for harry@chinchilla.tv;
   Wed, 28 Dec 2005 23:41:22 -0500
   (envelope-from alligator@reptiles.be)
Received: from WHOCARES by MYOB (GOO) with ESMTP id 12345;
   Wed, 28 Dec 2005 22:51:03 -0500
Date: 28 December 2005 23:14:29 -0500
From: savebigondrugs@hotmail.com
To: not-you@nowhere.xxx

Save big on all your fake meds at http://woifjapsoidjf.poisonpills.nl

To be removed, visit http://go-fsck-yourself.com

This new line ties together most of the info we saw in the SMTP transaction above. The exact format of the line may vary from one MTA to another, but in general most of this information will be available.
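Assembling that field is just string formatting over values the MX already has: the HELO name (which the sender chose), plus the reverse-DNS name and IP address (which the MX found out for itself). This Python sketch reproduces the layout of the example above, which, as noted, varies between MTAs; the function names are hypothetical, and the tab-indented continuation lines follow the header-folding rule of RFC 2822.

```python
def received_field(helo, ptr, ip, mx, software, queue_id, recipient, date,
                   envelope_from):
    """Build a Received: header field laid out like the example above."""
    # Continuation lines start with whitespace (header "folding"), so the
    # whole thing is one header field even though it spans five lines.
    return (
        f"Received: from {helo} ({ptr} [{ip}])\r\n"
        f"\tby {mx} ({software}) with SMTP id {queue_id}\r\n"
        f"\tfor {recipient};\r\n"
        f"\t{date}\r\n"
        f"\t(envelope-from {envelope_from})\r\n"
    )


def stamp(message, field):
    """Prepend the new trace field; earlier Received: fields stay below it."""
    return field + message


field = received_field("toothfairy.org", "gus.evil-isp.foo", "12.34.56.78",
                       "mx1.friendly-isp.com", "SENDMAIL", "23fsea9fseq",
                       "harry@chinchilla.tv", "Wed, 28 Dec 2005 23:41:22 -0500",
                       "alligator@reptiles.be")
print(field.splitlines()[0])
# Received: from toothfairy.org (gus.evil-isp.foo [12.34.56.78])
```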

Can the leaks in SMTP be plugged?

What this very long page has shown is that there are several weak spots in the mail process that allow spammers to sneak in their spewage. These include the following:

PROBLEM: The originating host does not have to provide its real host name in the HELO command.

This may not be a problem worth fixing. As we noted, there are reasons why an honest mail host might give an incorrect HELO. If we required the originating host to give a proper HELO name (i.e., one that will survive a both-ways DNS lookup), then subtle DNS misconfigurations or other problems might prevent honest mail from going through. As we have seen, the MX has other ways to check the identity of the originator if it is so inclined.

PROBLEM: The originating host in an SMTP transfer does not have to be an “official” mail server.

We could probably knock out quite a bit of spam at one stroke if we could somehow prevent non-mail-host computers from sending mail directly to MXs. This would stop spammers from using direct-to-MX mailings from dialup connections or open proxies.

Unfortunately, however, it would be very difficult to figure out how to let “real” MTAs identify themselves in some fashion that couldn’t be emulated by spam machines. We could perhaps bury some sort of certificate data within DNS, but this isn’t what DNS was designed for — DNS simply tells you where things are, not necessarily what they are or whether you should trust them.

Some folks might propose some sort of central “registry” of mail hosts to take care of this issue: the MX could look up the host in this list and then reject the mail if the originating host didn’t appear. This wouldn’t be (much of) a technical problem, but it would be an administrative and internetworking nightmare. Who would run such a service? How would we pay for it? How could the operators of such a registry prevent spammers from getting on it? What would prevent these operators from becoming “mail nazis,” arbitrarily denying access to legitimate users who want to run mail hosts? Such a body would truly have the world’s e-mail system by the throat.

Really, there is no good reason why you shouldn’t be able to run SMTP transfers from any machine you want (at least, not as long as you don’t send spam). To restrict the use of the internet in this fashion would be going against the grain of years of laissez-faire internet practice.

PROBLEM: The originating host does not have to give the proper from-address for the message in the MAIL FROM command.

As things stand now, SMTP is modeled after postal mail, which does not require return addresses on mailings (undeliverable mailings won’t get returned to you if you don’t provide a return address, but you may not care). There are plenty of honest mails sent today that don’t really need to have a return address (e.g., automated reminders and update messages).

If we wanted to fix this problem, we’d have to set up a system whereby a valid from-address would have to be “tightly bound” (as the programmers say) to each message; we would then need a mechanism to make sure that the binding was correct, and that the e-mail address was valid, and these seemingly simple matters could very quickly become very complicated.





(c) 2003-2006, Richard C. Conner


Updated: Wed, 28 Jun 2006