Legend: new window outside link tools page glossary link
Now that we’ve opened the hood on the body of the spam message, we can peek underneath for features of interest. At the least, these will include web URLs and other network resources that can be included in spam reports. We can also find other features that prove the ill-intent and deceitfulness of the senders of this mail.
Most “merchant” spams will refer you to websites where the products are sold; a few may refer you to still other sites to “remove” (ahem) your address from their lists. You may report any of these that you find in a spam e-mail body, provided you are sure that these actually are connected with the spam pitch.
To repeat, you should be sure that the web links you report are actually involved in the spam pitch (i.e., they are used to sell a product, collect form data, or gather “list removal” requests). Beware of unrelated web links included as “bait” for inexperienced spam hunters (who will report these and then possibly get nasty, discouraging notes from the operators of innocent websites). You may have to visit some of these bait sites yourself to ascertain whether they are truly involved with the spam.
You will find the web links anywhere in the plain-text or HTML parts of a message body. You should be able to determine by context whether they are reportable links (it helps to be able to read HTML markup). The links may be found as the HREFs of actual hyperlinks such as:
or they may just be part of the message text (especially in text-only messages):
Along the way, you may also find spammers playing tricks with their website links such as these:
Many spam recipients can be taken in by a shamefully simple trick: putting a “friendly” URL in the anchored (visible) text of a hyperlink that actually points to another completely different (and hostile) URL. This is a dodge much beloved of phishers. Consider the following bit of HTML markup:
<p>We have detected
possible misuse of your account.</p>
<p>Please log in at
<a href="http://evil.phisher.foo">https://www.yourbank.com</a> to change your password.</p>
When you view this in your mail program, it will look something like:
We have detected possible misuse of your account.
Please log in at https://www.yourbank.com to change your password.
You may be fooled into using the link provided, but what you may not know (unless you look carefully before clicking) is that you will be going not to yourbank.com, but to a phisher’s website (evil.phisher.foo, the HREF value) that may be phonied up to look like the login page for yourbank.com.
It is also worth noting that the anchor text (https://www.yourbank.com) in this example implies that this is a secure HTTPS link, but the actual link to http://evil.phisher.foo is plain old HTTP, and therefore not protected by encryption.
In these cases, the only link that you should report would be the one in the HREF (i.e., evil.phisher.foo). The link to yourbank.com is not really a link at all, just camouflage.
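The mismatch between anchor text and HREF can also be checked mechanically. The following is a minimal Python sketch, not a real filter: the names LinkAuditor and mismatched_links are hypothetical, and it only considers anchor text that itself looks like a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAuditor(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # visible text seen inside that <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(markup):
    """Return (actual_host, displayed_host) pairs where the two differ."""
    auditor = LinkAuditor()
    auditor.feed(markup)
    auditor.close()
    suspicious = []
    for href, text in auditor.links:
        shown = urlsplit(text).hostname if "://" in text else None
        actual = urlsplit(href).hostname
        if shown and actual and shown != actual:
            suspicious.append((actual, shown))
    return suspicious


markup = (
    '<p>Please log in at '
    '<a href="http://evil.phisher.foo">https://www.yourbank.com</a> '
    'to change your password.</p>'
)
print(mismatched_links(markup))
```

For the markup above, the first element of each pair (evil.phisher.foo) is the host worth reporting; the second is the camouflage.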
Spammers sometimes will exploit the seldom-used “user-ID” feature of URLs to obfuscate or disguise their links.
According to IETF RFC 3986, which defines URI (URL) syntax, you can include a user ID (and even a password) in a URL. This might be useful if you are using a browser (or perhaps curl) to fetch files from an FTP site or some other resource that requires a user name and password. These features are almost never used with web URLs, however, because the idea behind the web is to make information public without requiring user IDs or passwords (besides which, transmitting user names and especially passwords “in the clear” without any encryption or other protection is a big security risk).
Take a look at this URL, for example:
To an untrained eye, this looks like a link to a host named “fine-watches.yahoo.com” (with some forgettable numeric gobbledygook after it); the misuse of the Yahoo domain name lends some credibility to the link. In fact, however, the portion up to the trailing “@” sign is simply a user-ID field, while the actual web host is at the raw IP address 10.1.2.3, which presumably has nothing to do with Yahoo. The web server at 10.1.2.3 will simply ignore the user-ID stuff, which functions here as camouflage. The site you need to report is 10.1.2.3; again, fine-watches.yahoo.com@ is not a link, just camouflage.
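If you would rather not pick the URL apart by eye, a URL parser will separate the user-ID field from the real host for you. Here is a small Python sketch using the standard urllib.parse module; the function name real_host is hypothetical, and the sample URL is only an assumption assembled from the names discussed above.

```python
from urllib.parse import urlsplit


def real_host(url):
    """Return (host, user_id): the real destination and the decoy before '@'."""
    parts = urlsplit(url)
    return parts.hostname, parts.username


# Hypothetical URL built from the host names mentioned in the text.
host, decoy = real_host("http://fine-watches.yahoo.com@10.1.2.3/")
print("report this host:", host)
print("ignore this decoy:", decoy)
```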
It was once popular among spammers to disguise one URL as another by using a “public redirector” link of the kind that many search engines (and other businesses) provide for their own internal tracking purposes. For example:
See more watches at http://rd.yahoo.com/?http://cheap-crap.ru
Here, “rd.yahoo.com” is a host that does nothing but redirect you to whatever site is named after the question mark. To the untrained eye, however, it looks as though the watches are being sold by Yahoo. Once again, the spammer is using Yahoo as camouflage; in effect, he’s “bouncing” you off Yahoo to get to his site. The host you want to report in this case is cheap-crap.ru.
This trick was rampant some years back, but Yahoo (for one) has spoiled the game by including a warning message whenever such links are used from outside its domain.
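Pulling the real destination out of a redirector link of this simple kind takes only a couple of lines. The sketch below is a Python illustration under one big assumption: that the target URL sits bare in the query string, as in the rd.yahoo.com example. Real redirectors use many other layouts, and the function name redirect_target is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def redirect_target(url):
    """Return the host named after the '?' in a bare-query redirector URL."""
    query = unquote(urlsplit(url).query)
    if "://" not in query:
        return None  # the query does not look like an embedded URL
    return urlsplit(query).hostname


print(redirect_target("http://rd.yahoo.com/?http://cheap-crap.ru"))
```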
Spammers can disguise their URLs with deliberate over-application of a standard encoding technique commonly used with URLs.
Certain characters in URLs have special meanings (e.g., : / + ? @ ). If these characters were to appear in a URL outside their normal meanings (e.g., in a CGI call), the software trying to process the URL could become confused. So-called “URI encoding” (see RFC 3986) is used in these cases to “escape” the special characters with harmless constructions that can’t be misinterpreted. These “escaped” characters can be converted back to their normal values once the URL has been digested.
You can actually encode huge portions of a URL even if they don’t require such treatment. That’s how you can get URLs that look like this:
This link uses “www.rolex.com” as a user ID (note the “@” that follows it, marking it as a URL user-ID field). To most folks, however, this would look as though www.rolex.com were the actual host name. We know better: the host is actually encoded in all of the %-escapes after the @ sign. Using the URL-deobfuscator tool at http://www.dnsstuff.com/, we can easily decode this stuff:
So we know now that the site we need to report is watchcrooks.cn.
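You can also undo the %-escapes yourself without an online tool. The following Python sketch first builds a disguised URL of the kind described (the exact URL is an assumption made up for illustration, using the two host names from the text) and then decodes it. The helper names escape_all and decode_host are hypothetical. Decoding the whole URL before splitting it is a simplification that is fine for eyeballing spam links, but it could misparse a URL whose escapes hide a "/" or "@".

```python
from urllib.parse import unquote, urlsplit


def escape_all(text):
    """Percent-encode every character, needed or not, as a spammer might."""
    return "".join(f"%{ord(ch):02x}" for ch in text)


def decode_host(url):
    """Undo the %-escapes, then return the real host (the part after any '@')."""
    return urlsplit(unquote(url)).hostname


# Hypothetical disguised URL: a decoy user ID followed by an escaped host.
disguised = "http://www.rolex.com@" + escape_all("watchcrooks.cn") + "/"
print(disguised)
print(decode_host(disguised))
```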
Spammers can mangle the IP addresses in their URLs to make them difficult for humans to figure out, while still allowing browsers to find them. Consider the following URL:
We know enough about host names to know that 3519447827 doesn’t seem to be one, but it doesn’t look like a “dotted-quad” IP address either. What is it?
In fact, it is an IP address, but one that has been rendered in an alternative form. Rather than the “dotted-quad” form, which displays each of the four octets separately, this form mashes them all together into one big 32-bit integer value. You could untangle this yourself using a lot of calculation, but it’s easier just to paste it into the DNS Stuff de-obfuscator tool to get:
...which just happens to be the IP address of this site (at least at this writing):
[G4733:~] rconner% host 209.198.131.19
19.131.198.209.in-addr.arpa domain name pointer www.rickconner.net.
By the way, the obfuscated URL http://3519447827/spamweb/ will actually work in many browsers (provided, of course, that my site hasn’t been moved to another IP address in the meantime).
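The “lot of calculation” is really just base-256 arithmetic: each octet of the dotted quad is one digit of the big number. Here is a short Python sketch using the standard ipaddress module to convert in both directions; the function names dotted_quad and as_integer are hypothetical.

```python
import ipaddress


def dotted_quad(value):
    """Convert a 32-bit integer (or its decimal string) to dotted-quad form."""
    return str(ipaddress.ip_address(int(value)))


def as_integer(dotted):
    """Convert a dotted-quad address back to the single-integer form."""
    return int(ipaddress.IPv4Address(dotted))


# By hand: 209*256**3 + 198*256**2 + 131*256 + 19 == 3519447827
print(dotted_quad("3519447827"))
print(as_integer("209.198.131.19"))
```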
Spammers often direct their web traffic to alternate connections. This isn’t necessarily used as a means to disguise the URL, but it does tend to indicate something strange going on. For example:
This URL directs us to IP port #3591 on the host chickenlips.scrounge.foo, rather than the standard port #80 for unencrypted, normal web service.
Why would a spammer use alternate port numbers? Possibly because he’s sharing a host that already has web service on port 80, and he wants to run his own private web server and keep it somewhat out of sight. This server would probably have its own private logs through which the spammer could check his traffic.
We should note that using an alternate port number in a URL is not automatically suspicious behavior; however, it is sufficiently unusual to cause one to raise an eyebrow. The host to report in this case is simply chickenlips.scrounge.foo; your report should include the alternate port number used by the spammer.
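To pull the host and port out of such a link for your report, something like the following Python sketch would do. The sample URL is an assumption (only the host and port come from the text), and nonstandard_port and DEFAULT_PORTS are hypothetical names.

```python
from urllib.parse import urlsplit

# Usual ports for plain and encrypted web service.
DEFAULT_PORTS = {"http": 80, "https": 443}


def nonstandard_port(url):
    """Return (host, port) if the URL names an unusual port, else None."""
    parts = urlsplit(url)
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return None
    return parts.hostname, port


print(nonstandard_port("http://chickenlips.scrounge.foo:3591/"))
```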
As we noted on another page, the spammer can’t do very many tricks with a plain-text e-mail body. He can still use a fairly crude form of disguise for his URLs, however:
To see our line of fine replica
watches, paste the following into your browser:
You may report any such web host (wer39sksegi.cheezy-watchs.foo in this case); the fact that it isn’t formatted as a clickable link doesn’t make it any less culpable.
Here’s another example of a type of “DIY” link more often seen these days:
This URL is probably safe from routine detection by spam filters because even though it has “http://” in front, it has been munged so that it cannot be parsed by a machine as a valid URL. If you cared to, you could report the site at morepillz4u.foo.
One occasionally-used way to disguise URLs in an HTML-format spam message is provided by the <BASE> tag, found in the <HEAD> element (near the top of the HTML markup). This tag lets a web designer break up hyperlinks within his page into two pieces: the “base” part, and the “relative” part; this basically saves him the labor of retyping the base portion over and over within the page. For instance, here’s a fragment of HTML markup for a spam message body:
. . .
<base href="http://green-wrists.foo/"> << the base part
. . .
<p>Visit our <a href="showroom.php">showroom</a> to see our line of fine timepieces. << "showroom.php" is a relative part
. . .
<p>Click <a href="remove.php">here</a> to remove yourself.
^^ "remove.php" is a relative part
Here, http://green-wrists.foo/ is the “base” of all relative URLs in the markup. Two “relative” links are provided, to showroom.php and remove.php. The full URLs in each case are given simply by putting the base in front of the relative:
Many spam filters won’t catch the <BASE> tag; if you find it yourself, and if it appears to be used within the message (twice, in this example), you can report the web host identified in the <BASE> tag.
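Joining the base and relative parts follows the standard URL-resolution rules, which Python’s urllib.parse.urljoin implements. A minimal sketch for the green-wrists.foo example above (the helper name resolve is hypothetical):

```python
from urllib.parse import urljoin


def resolve(base, relatives):
    """Combine a <BASE> href with each relative link found in the markup."""
    return [urljoin(base, rel) for rel in relatives]


for url in resolve("http://green-wrists.foo/", ["showroom.php", "remove.php"]):
    print(url)
```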
Suppose that, inside every piece of postal junk mail you received, there was a tiny radio transmitter. Suppose further that, as soon as you brought the mail into your house, the transmitter would kick on and relay your acceptance of the mail back to the company that sent the mail. Sounds too much like science fiction, like something from a long-lost Philip K. Dick novel? Well, maybe so in the world of postal mail, but in the world of marketing e-mail, this practice is not only possible but absolutely rampant. We refer to these “virtual transmitters” as web bugs or (as I first heard them called) beacon URLs.
A web bug is generally a standard hyperlink of one sort or another that has been “tagged” with information that identifies the recipient of the mail (either by a plain-text e-mail address, or a unique identifying number or code). The purpose of this link is to leave a trace (including the tagged info) in the web-bugger’s web server log files. When you receive a web-bugged message and click on the link (or, in many cases, if you just open the message without clicking on anything), your mail program will transmit this link back to the web-bugger’s server, which will record the information in its log files. When the web-bugger reads the log files, he’ll find your code or e-mail address, indicating that the message was successfully delivered to, and possibly read by, the recipient.
Yes, of course the web-bugger already has your e-mail address (or else you wouldn’t have gotten the mail), but he doesn’t necessarily know that your address works, or that you have any interest in his pitch. The web bug helps him to answer these questions, since it proves that you received the message or followed one of its links. Also, he can use this technique to “wash” his mailing list, removing those addresses that don’t appear to work (i.e., those that don’t later appear in his web server logs); a dishonest web-bugger can even make money by selling or renting his laundered list to spammers and other equally-unscrupulous mailers.
Many legitimate businesses use web bugs in their mailings; I personally don’t consider it to be a particularly ethical practice (since it represents undeclared spying), but at least these businesses can usually claim existing relationships with the recipients of their mail. Clearly, however, for spammers to use web bugs is a particularly scabrous practice. Fortunately, the use of web-buggery (ugh!) in spam is confined to a small group of well-heeled spammers, such as those who run their own net blocks or who send spam on behalf of quasi-legitimate “mainsleaze” businesses. The usual variety of warez-pills-’n-watches spammers don’t bother with web bugging, possibly because they are ill-equipped to deal with the receipt of thousands or millions of web-bug responses.
As I mentioned above, there are two types of web bugs.
Now, let’s look at the main forms that these web-bugs take:
The spammer can use a straightforward <A> hyperlink with tagged data to collect your e-mail address when you click on the link. For instance:
Click here to remove your address from our lists
In this case, the link is used to (ha ha) “remove” your address from the spammer’s list. Note that your address is included in the call, so it will appear in the spammer’s server logs, as well as being made available to the program “remove.pl.”
This kind of web-bug does not work unless you click the link. If you’re curious to see what’s going on, you can copy the link (in this case, the part in the yellow highlight), trim off your address, and load it in your browser; you may see the spam pitch, or you may get some sort of error page (because the program didn’t get the data that it was expecting).
As I said, the “involuntary” type of web-bug will work silently as soon as you bring the message into view in an HTML-capable mail program or browser; you do not have to click anything in order for the tagged data to be sent. The most usual hiding place for an involuntary web bug is within a standard image link.
It’s common for commercial e-mails (even non-spam ones) to include images loaded from external servers. For example, an online merchant will use this feature to transmit pictures of his products to you. These images appear thanks to standard HTML image links (i.e., <IMG SRC="...">). It’s very easy to create a web bug using this mechanism. Consider the following image link:
<IMG SRC="http://zip.foo/images/pic39.gif?a=frank16%40nobody.zzz" HEIGHT="1" WIDTH="1">
This is an involuntary web bug: when the page is loaded, your mail program will follow this link and request the image using the URL shown; unless you’ve somehow turned off automatic image loading in your mail program (always a good idea) or you’ve disconnected from the internet before reading this message, this URL will wind up in the spammer’s server log.
Let’s take a closer look:
There are many variations on this theme. For example, the argument could be a “funny number” (a long alphanumeric code) that contains your address in an encrypted form, or that “points” to your address in some database maintained by the spammer:
<IMG SRC="http://choke.chicken.lv/image12.jpg?qw93s7ekxif347892" ...>
Or, the image link could contain a bogus anchor name:
<img src="http://weroisudfsd.gieerfuhdiry.com/pansy.gif#frank16-nobody-zzz" ...>
(this notation is “bogus” because in HTML you can’t anchor to a specific point inside a single image; neither the web server nor your browser will care, however, and the fetch and its arguments will be recorded in the spammer’s web logs).
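If you want to see what a given image link is carrying, you can split off the tagged data without ever fetching the image. This Python sketch returns whatever follows the “?” and the “#” in a URL, with %-escapes decoded; the function name tag_data is hypothetical, and the sample URLs are the ones shown above.

```python
from urllib.parse import unquote, urlsplit


def tag_data(url):
    """Return (query, fragment) with %-escapes decoded; either may be empty."""
    parts = urlsplit(url)
    return unquote(parts.query), unquote(parts.fragment)


samples = [
    "http://zip.foo/images/pic39.gif?a=frank16%40nobody.zzz",
    "http://choke.chicken.lv/image12.jpg?qw93s7ekxif347892",
    "http://weroisudfsd.gieerfuhdiry.com/pansy.gif#frank16-nobody-zzz",
]
for url in samples:
    print(tag_data(url))
```

An image link that carries any such data, especially one sized 1 by 1, deserves a second look.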
Another way to force an HTTP fetch from an e-mail message is to specify an external stylesheet link. You do this with the <LINK REL="..."> tag in the head of the HTML markup:
<link href="http://sodifji.foo/x.txt?3s90jscvu" rel="stylesheet" media="screen">
This is a complete block of HTML markup, but it would display as a blank. It will, however, try to load the stylesheet at the indicated URL, even though the stylesheet may not be used at all within the markup. In this case, the stylesheet URL contains not a “*.css” file as we would expect, but another “tagged” link of the kind we saw above in the IMG-based web bugs. The HTTP fetch for the stylesheet, including the funny number, will be recorded in the web logs for sodifji.foo.
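Since both image links and stylesheet links can trigger a fetch as soon as the message is displayed, it can be useful to list every such link in a message body before opening it in an HTML-capable mail program. The following Python sketch covers only the two tags discussed here (IMG and LINK); the class name RemoteFetchFinder and the function remote_fetches are hypothetical, and a real message could use other tags as well.

```python
from html.parser import HTMLParser


class RemoteFetchFinder(HTMLParser):
    """Collect remote URLs that <img> and <link> tags would load on display."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            url = attrs.get("src")
        elif tag == "link":
            url = attrs.get("href")
        else:
            url = None
        # Only http(s) links reach out to a server; cid: links do not.
        if url and url.lower().startswith(("http://", "https://")):
            self.urls.append((tag, url))


def remote_fetches(markup):
    finder = RemoteFetchFinder()
    finder.feed(markup)
    finder.close()
    return finder.urls


body = (
    '<link href="http://sodifji.foo/x.txt?3s90jscvu" rel="stylesheet" media="screen">'
    '<IMG SRC="http://zip.foo/images/pic39.gif?a=frank16%40nobody.zzz" HEIGHT="1" WIDTH="1">'
    '<img src="cid:embedded-picture">'
)
for tag, url in remote_fetches(body):
    print(tag, url)
```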
Content-based spam filters are getting smarter about scanning e-mails to look for spammy text. This means that the spammer must often resort to trickery (of the sorts listed below) in order to get past the filters.
It’s easy for a spam filter to check text when it appears as a string of character values from a known character set. It’s much harder to deal with the text if it is contained in an image (e.g., GIF, JPEG, or PNG), even though humans will still be able to read it.
Of course, including such an image in a spam e-mail poses a small logistical problem: how will the image file be delivered to the recipient?
The normal route would be to store the image on a remote web server, and then use an <IMG SRC="http://..."> link in the HTML markup of the message to fetch and position the image. The problems with this approach for the spammer are that (1) some folks’ mail programs may be set not to automatically load remote images in e-mail messages (an excellent way to stop web bugs as well as image display), and (2) the image link might go offline before the recipient opens his mail.
The usual workaround for this (particularly among stock, drug, and mortgage spammers) is to embed the image file directly into the message as a separate MIME attachment, and then to use a “Content ID” or “cid” link within the HTML (e.g., <IMG SRC="cid:...">) to place the image in the spam pitch. This means that a spam mailing that might normally be 1-2 kBytes in size might suddenly swell up to many, many times that value (because of the image file), but this apparently no longer poses the kind of risk and expense to the spammer that it once did.
A lot of spam these days is full of odd, unrelated text like this (which comes verbatim from fragments of Charles Dickens by way of a recent stock spam):
to kiss her fan again and shake it at
the sequester who was looking at us in a state of
half a crown I was got up in a special great coat and shawl expressly to do honour to that
Mr Vogt said not one word though the old lady looked to him as if for his commentary on
This represents the spammers’ countermeasure against one of the recent innovations that has made their lives more difficult: the Bayesian spam filter, which uses text analysis and historical information to accurately distinguish spam from honest mail.
When properly designed, coded, and maintained, Bayesian filters are very tough to beat — but, maybe not impossible to beat. Spammers imagine that they can make a message less likely to be detained by a Bayesian filter if they dilute its “spamfulness” with random, non-spammy text (just as you might be better able to hide that box of smuggled Cuban cigars in a big duffel full of dirty clothes than in a shopping bag from the duty-free store). This might have two effects: not only might it allow a given message to sneak by a filter undetected, but it might also have the long-term effect of making the differences between spam and non-spam messages less apparent to a filter (i.e., the “spam” and “non-spam” populations of messages on which the filter will be trained will start to look more and more alike). I don’t know whether this tactic actually works, but the spammers are certainly doing it to death these days.
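To see why spammers hope that dilution will work, consider a toy model. The Python sketch below uses one commonly described way of combining per-word spam probabilities into a single score; the word probabilities are invented for illustration, and real filters are more careful (for instance, some consider only the most telling words), so this shows the spammers’ reasoning rather than how any particular filter behaves.

```python
import math


def combined_spam_probability(token_probs):
    """Combine per-word spam probabilities: prod(p) / (prod(p) + prod(1 - p))."""
    log_spam = sum(math.log(p) for p in token_probs)
    log_ham = sum(math.log(1.0 - p) for p in token_probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Invented probabilities: three very spammy words, four innocent-looking ones.
pitch = [0.99, 0.97, 0.95]
padding = [0.05, 0.05, 0.05, 0.05]

print(round(combined_spam_probability(pitch), 4))
print(round(combined_spam_probability(pitch + padding), 4))
```

In this toy model the padded message scores well below the unpadded one, which is the effect the spammers are after.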
The neutral text that spammers use could come from a variety of sources; some of it looks as though it may have come from old books (possibly via free e-texts from the Gutenberg Project or other sources); or, spammers will sometimes use recent articles from news websites. One recent spammer quoted lengthy National Weather Service forecasts. Some particularly puckish spammers may use programs that can generate random, quasi-intelligible text from sources such as those given above.
The spammer has several options as to how and where he can place the neutral text in his message. One trick is to use basic HTML tags to make the text very small or invisible:
<font size="1">tiny, tiny text</font>
<font color="white">white-on-white text</font>
In the first case, the text is rendered at “size 1”, which may be just a bit too small to be readable in many browsers (it will look like “flyspeck” on the screen). In the second case, the text is rendered in the same color as the background; this is sometimes called the invisible ink trick. Since some spam filters can check color tags and recognize when text is the same color as the background, spammers sometimes use HTML RGB codes to set the color to something slightly different from the background but still quite invisible; this is known, of course, as the nearly-invisible ink trick.
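A filter that wants to catch the nearly-invisible variant has to compare colors by distance rather than by exact match. Here is a minimal Python sketch of that idea; it handles only six-digit “#rrggbb” codes, and the function names and the threshold of 24 are arbitrary assumptions, not values taken from any real filter.

```python
def rgb(hex_color):
    """Parse a six-digit '#rrggbb' code into an (r, g, b) tuple."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def nearly_invisible(text_color, background="#ffffff", threshold=24):
    """True if no color channel differs from the background by more than threshold."""
    differences = (abs(a - b) for a, b in zip(rgb(text_color), rgb(background)))
    return max(differences) <= threshold


print(nearly_invisible("#ffffff"))  # exact match: invisible ink
print(nearly_invisible("#fefefe"))  # off by one: nearly-invisible ink
print(nearly_invisible("#000000"))  # ordinary black text
```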
It’s also possible to conceal neutral text inside HTML comment tags:
<!-- blah blah yadda yadda -->
This trick won’t work as well in cases where the Bayesian filter is smart enough to ignore HTML comments (i.e., it won’t take the bait).
Sometimes, you’ll see where a spammer meant to include random text, but something in his software broke: instead of the text, you will see some sort of token that didn’t get removed and replaced:
The use of cascading style sheets (CSS) gives spammers further options for making things invisible; the most obvious is to change the visibility attribute of the text enclosed in a SPAN or DIV structure:
<span style="visibility:hidden">goo goo da da etc.</span>
Trying to decode CSS information is probably beyond the ability of most spam filters, so they will probably scan the text and include it in their calculations, even though the text isn’t visible.
On the other hand, one particular school of low-tech spammers doesn’t consider it necessary to bother hiding the neutral text from the recipient; it goes right into the spam message, maybe alongside or underneath a GIF image that carries the actual spam pitch.
According to current internet legend, researchers at Cambridge University in England postulated that humans have little trouble reading text even when the spelling is grossly botched, so long as a certain resemblance to the proper spelling is maintained. Some spammers apparently read the same postings as the rest of us back in 2003, since they’ve now taken to deliberate and selective misspelling in order to disguise their pitches. For example:
We truly hope our website will be a
one-stop Syber phua rm acy
and help you get their pree
Best stuff on Tigera only at $ 1.56/ puill, and all others too in same brackets
Softabs" is better than Pfizer Viiagrra
and normal Ci-ialis because:
- Guaaraantees 36 hours lasting
- Safe to take, no side effects at all
- Boost and increase se-xual performance
- Haarder e-rectiions and quick recharge
- Proven and certified by experts and doctors
- only $3.99 per tabs
Merge Your Deibts and owe to one
person and pay a small monthly.
Pay 55-75% leesser
ST0P Cried!tor Harassment
It’s worth noting here that the spammers tend to garble just those words that might have a high “spam index” (like “Viagra,” “debts,” “pharmacy,” “creditor,” etc.). The theory here, I suppose, is that the human recipients will still be able to make out the messages (if only barely), while simple spam filters will be fooled. Messages like these may get past the filters, but they certainly don’t give the reader a very good impression of the advertisers. Of course, if these people were marketing experts, they wouldn’t have to be sending spam.
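A filter does not have to be fooled by this, of course. As a rough illustration, the Python sketch below strips non-letters from a word and then uses fuzzy matching (the standard difflib module) to compare it against a short watchlist. The watchlist, the 0.8 cutoff, and the function name unmask are all assumptions made for the example; it will not catch words split by spaces (like “phua rm acy”) and could flag innocent words that happen to resemble a watchlist entry.

```python
import difflib
import re

# Hypothetical watchlist of words with a high "spam index".
WATCHLIST = ["viagra", "cialis", "pharmacy", "debts", "creditor"]


def unmask(word, cutoff=0.8):
    """Return the watchlist word that a garbled word most resembles, or None."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    matches = difflib.get_close_matches(letters, WATCHLIST, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for sample in ["Viiagrra", "Ci-ialis", "Deibts", "Safe"]:
    print(sample, "->", unmask(sample))
```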
Related to the creative-misspelling trick is another trend that’s cropped up in recent spam: the use of convoluted, awkward, or uncustomary words or phrasings (presumably to avoid the more familiar forms that might get caught in a spam filter). For example, here’s a spammer who’s made good use of his thesaurus:
Avail yourself of our online phramacuetical emporium and realize substantial reductions of value for your needful preparations.
Translated, this means “come to our website and save big on drugs you need.”
It seems to me that at some point this kind of stuff will get entirely too baroque and constipated for anyone reading it to be able to understand it (particularly the less-than-astute folks whom the spammers count on to respond to their pitches). However, I say let ’em keep trying; eventually they’ll mallocute themselves out of business.
You can see examples of this sort of thing in my sample spam analyses here and here.
The spammer can apply some further “creativity” to the HTML tags he uses in the markup of his message. He can use undefined HTML tags (or use defined tags out of normal context) to break up key words in the message.
Most browsers are designed to be fairly tolerant of invalid HTML tags. Therefore, you can often put anything you like into pointy brackets and chances are most browsers will ignore these “tags.” Spammers often use invalid tags (like “<goofy>”), but they also use empty tag pairs (like “mort<b></b>gage”) to get the same effect. They can also use certain tags outside their intended context (e.g., using the table cell tags “<td>...</td>” outside an actual HTML table). As I said, your browser will probably ignore these tags when it renders the message, but a content filter would have to know how to weed them out before scanning the message text. See my sample spam analysis #4 for an example of this practice.
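Weeding out the tags is not hard in principle. The Python sketch below keeps only the text a reader would see, dropping every tag (valid or not) and every HTML comment, so that broken-up words are rejoined before scanning. The names TextOnly and visible_text are hypothetical, and the sketch deliberately ignores styling tricks such as hidden or tiny text.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep character data; tags and comments are simply not collected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


print(visible_text("mort<b></b>gage <!-- blah blah --><goofy>rates</goofy>"))
```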
On rare occasions, a spammer will go to incredible lengths to obfuscate text using HTML tricks. Here’s an example taken from my page on spammer tricks (where you can go to see the HTML markup for this example):
It’s not possible for a spam filter to make much sense out of this as raw HTML markup; the key word “viagra” does not appear except as isolated letters widely separated by various table tags.
Another recent innovation in the world of text-mangling is to use obscure CSS (stylesheet) features such as the “float” attribute:
V I A G R A
Here’s the markup used to create this gem:
cellspacing="2" cellpadding="10" bgcolor="#dcdcdc">
<span style="float: right">A</span>
<span style="float: right">R</span>
<span style="float: right">G</span>
<span style="float: right">A</span>
<span style="float: right">I</span>
<span style="float: right">V</span>
As you can see, this markup uses the “float: right” attribute to add the letters one at a time starting from the right (i.e., backwards); so, even if a spam filter could read around the span tags, all it would see would be “ARGAIV.”
E-mail addresses that you see in spam require special treatment; in fact, in most cases, it is actually a waste of time (and a type of ill-informed network abuse) to deal with them because they are nearly always false and forged, and do not point back to the spam operation in any way. There are, however, some notable exceptions.
Every spam message, like every other e-mail message, generally includes an address that purports to be the source of the message; you will find these addresses in the familiar “From:” or “Reply-To:” fields in the visible header of the mail, as well as in the Return-Path: or Envelope-From: fields of the “invisible” header. In the vast majority of cases, the spammer does not expect these addresses to be used for response; they are included only because they must be present in order for the message to be delivered (i.e., most mail hosts require incoming mail to be marked with what looks like a valid from-address, although they do not check up to see whether such addresses are true or correct). In most cases, spammers “borrow” the e-mail addresses of innocent, uninvolved parties, or make up fictitious addresses at existing, valid domains (see this page for more info).
In such cases, if you attempt to report the e-mail address that the spam is supposedly “from,” you are in effect making an allegation with no proof against a completely innocent party. For this reason, the general rule is simply to ignore e-mail addresses found in spam.
Every rule has exceptions, however: some e-mail fraudsters do actually expect to receive responses via e-mail. Their messages will usually explicitly request such replies, and may provide special addresses within the body of the message to be used for this purpose (or, they may rely upon the From: or Reply-To: addresses). If you are faced with a spam (such as a 419 come-on, a job scam, a lottery scam, or a lonelyhearts scam) that explicitly requests e-mail replies, you may (and should) report the reply addresses to the providers responsible for having issued them. If these addresses can be shut down quickly, this will prevent the scammers from hearing from potential victims.
(c) 2003-2008, Richard C. Conner
Updated: Fri, 11 Jul 2008