Spammers use pretty much the same protocols and mechanisms to send their mail as do you or I. However, the difference is that they use special tools and network resources (often hijacked or misused) to send out thousands or millions of copies of the same message in a single spam run. Such a gigantic amount of network traffic is very easily spotted and traced (and perhaps stopped) if it goes through the normal mail-host mechanisms that normal people use. Thus, the spammer must find ways to send this volume of mail that will not attract unwanted attention from network administrators. This gives rise to the techniques that I’ll describe on this page.
First, let’s get a “baseline” by looking at how a typical normal e-mail message is transmitted. This transmission is accomplished by a series of relays from one computer to another, the first of these being your own computer, and the last one being that of the recipient of your message. Each of these relays is carried out using a particular procedure (or protocol, as the network folks like to call these). Although this relaying business sounds as though it might be very poky and inefficient, it is in fact quite fast given good network connections and well-managed host computers. Not surprisingly, perhaps, the e-mail delivery process also bears a great deal of resemblance to the manner in which postal mail is delivered, albeit at a much faster pace.
The following diagram may be helpful; it illustrates the path taken by a typical e-mail (not spam) from you to a friend (start at the lower left corner in “your domain”):
When you compose an e-mail and then hit the “send” button in your mail program, your computer contacts a computer on the network (usually one belonging to your ISP, your employer, your school, etc.) called a mail transfer agent (MTA). This MTA is known to your computer because you entered its name when you set up your mail services (or perhaps you used an automated setup script of some sort to do this). Your computer generally uses a procedure known as the Simple Mail Transfer Protocol (SMTP), defined in IETF RFC 2821 (since superseded by RFC 5321), to transfer your outgoing message to the MTA.
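To make that first handoff a little more concrete, here is a minimal sketch in Python of what a mail program does when you hit “send”. The host name mail.example.org, the addresses, and the function names are all made up for illustration, and real mail programs usually add authentication and encryption that I have left out.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a simple text e-mail with the usual headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def submit_to_mta(msg, mta_host, port=25):
    """Open an SMTP session with the MTA and transfer the message."""
    with smtplib.SMTP(mta_host, port) as session:
        session.send_message(msg)


msg = build_message("you@example.org", "friend@example.net",
                    "Hello", "Just saying hi.")

# The actual handoff needs a reachable MTA, so it is left commented out:
# submit_to_mta(msg, "mail.example.org")
```

The MTA name passed to submit_to_mta plays the same role as the outgoing-server name you typed in when you set up your mail program.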
Once the MTA has the message, it will look at the to-address you supplied and then figure out (using DNS) which mail host elsewhere is accepting mail for that address. That host is also an MTA, but in this role it is known as a mail exchanger (MX). Your MTA then contacts the other guy’s MX and uses the same SMTP procedures to transfer the message to that host.
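Here is a rough sketch of the decision the MTA makes at this step. Python’s standard library has no MX resolver, so the DNS query itself is left out; the record list below is invented, and the function names are mine. The one real-world rule the sketch leans on is that MX records carry a preference number and the lowest number is tried first.

```python
def domain_of(address):
    """Return the domain part of an e-mail address, lower-cased."""
    return address.rsplit("@", 1)[1].lower()


def order_mail_exchangers(mx_records):
    """Order (preference, host) pairs so the most preferred MX comes first.

    Lower preference numbers are tried before higher ones.
    """
    return [host for _preference, host in sorted(mx_records)]


# Hypothetical answer to an MX query for example.net
records = [(20, "backup-mx.example.net"), (10, "mx.example.net")]

domain = domain_of("friend@Example.NET")
candidates = order_mail_exchangers(records)
```

With these made-up records, the MTA would try mx.example.net first and fall back to backup-mx.example.net only if the first host could not be reached.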
We’re not quite done, but almost. The MX in your pal’s domain will usually hand off the mail (typically using SMTP again, or in some cases a proprietary protocol) to yet another mail host, this one known as a mail delivery agent (MDA). The next time your buddy checks his e-mail, his computer will contact this MDA and download your message. This time, however, a different protocol is used to transfer the mail: either the Post Office Protocol (POP3) or the Internet Message Access Protocol (IMAP).
You may be seeing a pattern here: internet hosts that want to send e-mail messages generally use SMTP to send them (there are exceptions, but they are not commonly found on the public network). In the hypothetical message shown above, for instance, there would be at least three separate SMTP handoffs or relays:
1. From your computer to your MTA.
2. From your MTA to your friend’s MX.
3. From your friend’s MX to his MDA.
This path is recorded in the header of the message, a block of data that neither you nor your correspondent would normally see. The path is documented as a series of Received: lines, one for each SMTP handoff.
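If you would like to see these lines without squinting at raw headers, a few lines of Python will pull them out. This is only a sketch: the sample message, host names, and addresses below are invented, and real Received: lines vary a good deal from one mail program to another. Each host adds its line at the top of the header, so the list reads from the most recent handoff down to the earliest.

```python
from email import message_from_string

# Invented sample message showing three SMTP handoffs
RAW = """\
Received: from mx.example.net (mx.example.net [192.0.2.20])
    by mda.example.net with SMTP; Fri, 13 Jun 2008 10:00:03 -0400
Received: from mail.example.org (mail.example.org [198.51.100.5])
    by mx.example.net with SMTP; Fri, 13 Jun 2008 10:00:02 -0400
Received: from yourpc (yourpc.example.org [198.51.100.77])
    by mail.example.org with SMTP; Fri, 13 Jun 2008 10:00:01 -0400
From: you@example.org
To: friend@example.net
Subject: Hello

Just saying hi.
"""


def received_lines(raw_message):
    """Return the Received: header values, newest first, one per string."""
    msg = message_from_string(raw_message)
    # Collapse the folded (multi-line) headers into single lines
    return [" ".join(value.split()) for value in msg.get_all("Received", [])]


for line in received_lines(RAW):
    print(line)
```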
If we checked the header of the message as your buddy received it, then, we would expect to see as many as three Received: lines, one for each of the handoffs above. In reality, we might see fewer than three Received: lines for reasons such as the following:
Notwithstanding these variations, there should always be at least one Received: line in the header: the one that shows the handoff to the recipient’s MX (that’s transaction #2 in our list above). It turns out that this is the critical information you need in order to determine the origin of spam. Now, hold that thought for the moment, and read on to see how spam mail differs from normal mail.
As I said above, spammers send their mail quite a bit differently than you or I, mainly because of the need to keep their operations as invisible and untraceable as possible. They mainly use two techniques: open relays and direct-to-MX. We’ll look at these here.
If you have spam to send, why not just pump it all through your own dial-up or broadband internet service? Early on, that’s exactly what most spammers did. However, when they found out they’d get in trouble if they sent millions of bulk-mails through their own ISPs’ facilities, they figured out a simple ruse: they would steal service from some other ISP’s facilities.
To do this, they would find (through trial and error, perhaps) some other MTA belonging to some other outfit somewhere else that was willing to accept mail from them and then deliver it to you. Such an MTA is known as an open relay, because it is open for use by any sender of mail. The following shows the path for a typical open-relay spam (start at bottom left from the “spammer-controlled computer”):
Since the open relay was willing to accept mail from anywhere for delivery to anywhere, it didn’t really matter that the open relay wasn’t in the spammer’s own domain. And, unless the administrator of this open relay host somehow got suspicious about who was suddenly using his machine to relay thousands of messages, there was no reckoning for the spammer to face.
However, as spam mail grew to become a major internet nuisance, mail host administrators began tightening up access to their machines, all but closing the open-relay pipeline. It’s now considered best practice for a mail host to accept mail for transmission only from computers within its domain. Those mail hosts that do still permit relaying from outsiders will usually require the sending host to validate itself (using a username and password) before accepting the mail for relay. It has taken a while (particularly in the developing world, where folks have other fish to fry), but most mail host admins are now on board with the program. Also, newer distributions of mail-host software tend to come configured out-of-the-box to discourage open relaying. For these reasons, open-relay spam has greatly diminished over the past few years.
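The relay rule described above boils down to a short decision, which I have sketched here in Python. This is not how any particular mail server is configured; the network range, the domain, and the function name are all hypothetical, chosen only to show the logic.

```python
import ipaddress

# Hypothetical settings for a mail host serving example.org
LOCAL_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
LOCAL_DOMAINS = {"example.org"}


def may_accept(client_ip, rcpt_domain, authenticated=False):
    """Decide whether to accept a message from client_ip for rcpt_domain."""
    if rcpt_domain.lower() in LOCAL_DOMAINS:
        return True  # inbound mail for our own users: this is the MX role
    if authenticated:
        return True  # outsider who logged in with a username and password
    # Otherwise relay only for computers inside our own network
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in LOCAL_NETWORKS)
```

An open relay is, in effect, a host where this function always returns True. With the settings above, a stranger at 203.0.113.9 asking to send mail to example.net would be turned away unless he had logged in first.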
When you look at an open-relay spam, you will see three trustworthy Received: lines corresponding to three separate SMTP handoffs:
Once the open relays started drying up, it didn’t take long for the spammers to find another path to your inbox. They made it a shorter path by cutting out the intermediate MTA altogether, and sending the mail directly from machines under their control to your domain’s MX host. The so-called “direct-to-MX” model is now used for the vast majority of spam. It takes advantage of the fact that spammers can now use bulk-mail software capable of looking up MX records for itself, so they no longer need the services of intermediate MTAs. Here’s a diagram of how direct-to-MX works (again, start at the lower left). You can see how much simpler the path has become.
Here, some computer under the spammer’s control simply looks up the MX associated with your e-mail address and drops the message directly onto that MX, whence it winds up in your inbox. If that spammer-controlled computer is a “zombie” or open proxy (i.e., a malware-infested machine belonging to an innocent third party), it will be very difficult to trace that spam any farther back than this hijacked machine; the zombie’s ISP may eventually get around to shutting down this particular spam source, but there will be many, many others to take its place.
When you examine a direct-to-MX spam, you will see two trustworthy Received: lines:
Either of the above methods of transmission provides the spammer with the opportunity for some camouflage: specifically, it allows him to tamper with the mail header. This tampering can take a number of forms, and can serve to confuse novice spam-hunters. It is important to note, however, that the spammer cannot (as yet, anyway) conceal the most important information: the IP address of the host that left mail with your domain (which comes from the Received: line that documents the handoff to your MX).
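Pulling that IP address out of the Received: line is easy to automate. The sketch below assumes the line was written by your own MX and uses the common layout in which the receiving host records the sender’s address in square brackets inside a parenthesized comment. Other mail software formats this differently, and the sample line, the function name, and the IPv4-only pattern are my own simplifications.

```python
import re

# An IPv4 address in square brackets inside a parenthesized comment,
# e.g. "(unknown [203.0.113.45])"
BRACKETED_IP = re.compile(
    r"\([^()]*\[(\d{1,3}(?:\.\d{1,3}){3})\][^()]*\)"
)


def handoff_ip(received_line):
    """Return the sender IP recorded by the receiving host, or None."""
    match = BRACKETED_IP.search(received_line)
    return match.group(1) if match else None


# Invented example of a line written by your own MX
line = ("from aol.com (unknown [203.0.113.45]) "
        "by mx.example.net with SMTP; Fri, 13 Jun 2008 10:00:02 -0400")
sender_ip = handoff_ip(line)
```

Note that the name right after “from” (here, aol.com) is whatever the sending host chose to call itself; the address in brackets is what your MX actually saw.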
Header forgery is against the acceptable use policies of nearly all ISPs that have such published policies. It is also a violation of U.S. federal law (specifically, the CAN-SPAM Act, which bans the “material falsification” of mail headers). It’s also quite easy to spot, and provides more-or-less positive proof of the ill intent of the spammer (since there’s really no purpose for header tampering other than to deceive). Here are the principal forms of header forgery used in spam mail:
In SMTP, the HELO command is used by the sending host early on in an SMTP handoff to identify itself by name to the receiving host. As it happens, SMTP does not require the HELO name to be the host’s actual name, or even to be in the correct format for a host name. It’s extremely common, if not universal, for hosts sending spam to use fake HELO names (like “aol.com”, “BITE_ME”, and so forth). Fortunately, the receiving host does not have to rely upon the HELO name to know who is trying to send it mail; although you might think that checking a HELO name against its address could be a good way to stop spam, very few spam filters use this technique (mainly because it is prone to reject non-spam mail from innocent but mildly-misconfigured mail hosts).
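You can often see a fake HELO name for yourself in the Received: line, because many receiving hosts record both the name the sender claimed and the name and address they looked up on their own. The Python sketch below compares the two. It assumes one common line layout, “from HELO-name (looked-up-name [IP])”, and the sample lines and function name are invented. As noted above, a mismatch is only a hint, since innocent but misconfigured hosts produce mismatches too.

```python
import re

# Assumed layout: from <HELO name> (<looked-up name> [<IP>])
RECEIVED_FROM = re.compile(
    r"from\s+(?P<helo>\S+)\s+\((?P<rdns>\S+)\s+\[(?P<ip>[^\]]+)\]\)"
)


def helo_mismatch(received_line):
    """Return True if the HELO name differs from the looked-up name.

    Returns None when the line does not follow the assumed layout.
    """
    match = RECEIVED_FROM.match(received_line)
    if match is None:
        return None
    return match.group("helo").lower() != match.group("rdns").lower()


suspect = "from aol.com (dsl-45.example.com [203.0.113.45]) by mx.example.net"
honest = "from mail.example.org (mail.example.org [198.51.100.5]) by mx.example.net"
```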
Our next bit of forgery is like the old spy-movie cliché: give the secret agent a false identity, complete with a passport containing lots of visa stamps from places he’s never actually been. In this case, the spammer can actually insert false Received: records into the header before he hands it to the open relay host or the MX. If properly done, these records can fool inexperienced spam investigators into attributing the blame elsewhere than where it belongs.
If you are patient and careful, you will have little trouble spotting such forgeries; they will show up as “breaks” in the chain that links the ultimate message source to the ultimate recipient. See my page on finding spam mail hosts for more information on how this is done.
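To give a feel for what a “break” looks like, here is a deliberately simple Python sketch. It is my own rough heuristic, not the full procedure: reading from the top of the header down, it checks whether the host named after “from” in one line is the same host named after “by” in the line beneath it. Legitimate hosts sometimes go by more than one name, so a mismatch here is a reason to look closer, not proof. The sample lines and function names are invented.

```python
import re

FROM_BY = re.compile(r"from\s+(?P<sender>\S+).*?\bby\s+(?P<receiver>\S+)", re.S)


def parse_hop(received_line):
    """Return (sender name, receiver name) from one Received: line."""
    match = FROM_BY.search(received_line)
    if match is None:
        raise ValueError("unrecognized Received: line")
    return (match.group("sender").lower(),
            match.group("receiver").rstrip(";").lower())


def chain_breaks(received_lines):
    """Return indexes i where line i does not link to line i + 1.

    Lines are expected newest first, as they appear in the header.
    """
    hops = [parse_hop(line) for line in received_lines]
    breaks = []
    for i in range(len(hops) - 1):
        claimed_sender = hops[i][0]
        receiver_below = hops[i + 1][1]
        if claimed_sender != receiver_below:
            breaks.append(i)
    return breaks


lines = [
    "from mx.example.net (mx.example.net [192.0.2.20]) by mda.example.net with SMTP",
    "from aol.com (unknown [203.0.113.45]) by mx.example.net with SMTP",
    "from mail.bigbank.example (mail.bigbank.example [192.0.2.99]) by relay.isp.example with SMTP",
]
breaks = chain_breaks(lines)
```

In this made-up header the first two lines link up, but the second line does not connect to the third, so everything from the third line down deserves suspicion.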
A good spammer never misses an opportunity to lie, and so you will often see other less consequential fudging like this:
It used to be popular among some really low-as-whale-sh*t spammers to troll through other people’s websites to find unsecured mailback scripts; by manipulating the data submitted to these scripts, they could get them to send mail to anybody, not just the website owner. These mails were not traceable any farther back than the website from which they originated, so they provided pretty good cover for the spammer even if they didn’t give him much to work with (besides a few lines of plain text).
You can find an example of this stuff here, although this is fortunately of no more than historical interest at this point, since modern mailback scripts have been armored to prevent such abuse. I don’t find much mailback spam coming into my inbox anymore these days.
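For the curious, here is a sketch of one common kind of armor. I am assuming the usual form of this abuse, in which the spammer slips line breaks into a form field so that extra header lines (a Bcc: list, say) get written into the outgoing message. The function names and the webmaster address are made up, and a real script would want further checks beyond this one.

```python
def safe_header_value(value):
    """Reject form input that tries to smuggle in extra header lines."""
    if "\r" in value or "\n" in value:
        raise ValueError("line break in a header field")
    return value.strip()


def build_mailback_headers(visitor_email, subject,
                           owner="webmaster@example.org"):
    """Build headers for a mailback message that goes only to the owner."""
    return {
        "To": owner,  # fixed by the script, never taken from the form
        "Reply-To": safe_header_value(visitor_email),
        "Subject": safe_header_value(subject),
    }


headers = build_mailback_headers("visitor@example.com", "Nice site")
```

A submission such as “x@example.com” followed by a line break and “Bcc: victim@example.net” would raise an error here instead of quietly adding a recipient.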
(c) 2003-2008, Richard C. Conner
Updated: Fri, 13 Jun 2008