Popular Spammer Tricks

Although spammers would like you to think that they’re up-and-up business people, they do seem to go to some great lengths to hide their identities, obscure the origin of their messages or of their websites, and to collect information about you without your knowing. Gee, they must have guilty consciences or something, doncha think?

If you need any further proof as to the bona-fides (or lack thereof) of the typical spammer, consider some of these artful (and not-so-artful) technical and psychological dodges commonly used to transmit or cloak spam, and to get you to open it:

Transfer of mail via open relay mail hosts

SMTP, the protocol that governs e-mail transfer over the internet, was originally designed to permit cross-domain “relaying” of mail: that is, a user from domain A could contact a mail host in domain B to send mail to a user in domain C. Now, suppose that the domain-A user is a spammer, and the domain-C user is you — then the poor mail host at domain B becomes a pipeline for spam (it thereby earns the name “open-relay mail host”). In addition, the spammer can prepend bogus SMTP handoff records (“Received” lines) to the message before submitting it to the open-relay host, so that anyone trying to trace the message would likely be led down a blind alley.

However useful this relaying capability may have been in earlier days (when there were fewer mail hosts and sparser network connections), it has been horribly abused by spammers and others to the point where it is now a significant security breach (and an invitation for theft of service) to allow an open relay host to operate on the public network. System operators have been diligent in closing off these open relay hosts, which has driven spammers to develop new spam-mail vectors (keep reading).

Use of offshore ISPs

Most reputable ISPs in the developed countries realize that taking on a spammer’s traffic can do nothing but get them in trouble with other ISPs as well as millions of angry spam recipients (including their own current and prospective customers). So, spammers are often forced to go elsewhere to get their messages sent or to host their websites. Often, they’ll go halfway around the world where operators aren’t so picky about ethical practice and (as a bonus) often sell their services for much less (or, their networks may simply be more vulnerable to subversion than those in the developed world). Popular choices include mainland China (and occasionally Taiwan), Hong Kong, South Korea, Indonesia, Malaysia, and other countries on the Eastern Rim, as well as countries in South America and the former Soviet bloc. Sometimes, spammers organize cadres of such offshore resources to create “bulletproof hosting” and other offerings to the spam industry.

As these countries struggle to join the new information economy, they are eventually confronted with the necessity to play nice with other network users; this often leads to crusades against spam operators. The foremost example here may be Brazil, which is waging a fierce war against spammers, although they face resourceful and well-entrenched opponents. Mainland China, on the other hand, seems to run hot and cold; while they are quite diligent about restricting their own citizens’ access to (and activities on) the internet, they seem only fitfully concerned about the use of their network bandwidth to host websites for pornography, stolen software, bogus drugs, and counterfeit goods offered to the rest of the world.

Pink contracts

Rest assured, the average retail ISP will have no problem cancelling your $20/month dialup account if they catch you using it for spam. What, however, if you were instead paying them hundreds or thousands per month, much more than you would have to pay for the same services elsewhere?

Some spammers do so well at their trade that they can afford to pay a premium for, eh, “special” levels of service from their internet providers. Such services often include turning a blind eye to complaints about the spammers’ activities, even when these activities are explicitly in violation of the ISPs’ abuse policies. This kind of under-the-table deal is known as a pink contract (presumably from the pink color of SPAM™ the meat product) and more than one large or reputable ISP has gotten embarrassed when their participation in such deals was revealed to the public. This practice may be more widespread than we know, but of course we don’t have any way to prove it.

Often, the pink contract is offered or accepted by the sales force at an ISP, who are naturally interested in maximizing their incoming revenue. Once such contracts are in place, however, the sales folks are pretty much out of the picture, and it falls to the technical and abuse-desk folks to have to deal with the consequences. Frequently, these harassed folks will rationalize matters by simply forwarding spam complaints to the spamming customer so that the customer can, ah, deal with them. At best, they may simply ignore complaints or else pretend not to understand them.

If a spammer is really well-off, he can buy his own sandbox to play in: he can get a block of addresses signed over to him by some large wholesale provider. With the spammer listed in the whois database as the owner of the block, investigators find it necessary to do a lot more digging in order to identify this upstream provider; given the revenue it derives from such a relationship, this provider will often be far less inclined to interfere directly with the operations of its customer even with clear evidence of abuse.

Transfer of mail via insecure third-party mailback software

The mailback application, which allows website visitors to communicate with a website operator by filling in and transmitting a web form, is a popular feature found on many websites. It is a good way for a webmaster to exercise some control over the mail he gets (and to avoid spam), as you can read elsewhere. Yet, if the mailback script is vulnerable, it can be hijacked by spammers to send mail to anyone — not just the intended recipient. The spammer can simply bang away at the script as long as he is able, and can be confident that the spam won’t be traceable any further back than the innocent website proprietor, who can log on one day to find his inbox inexplicably filled with angry spam complaints.

If you choose to run a mailback script (such as the popular formmail), make sure that it can’t be hijacked by a spammer (go here for some advice). If you can manage it, it might also be a good idea for you to have the script retain statistics (e.g., HTTP query info including the user’s IP address) for each mailing made by the script so you can report any abuse of your scripts to the source. Try to use a good script or program that’s been around for a while, and has received frequent security updates. Make sure that you understand (and make) all appropriate security settings within the script, particularly those that (1) limit the distribution of mail from the script and that (2) prevent the execution of shell commands that might be embedded in the text submitted to the script.
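As a rough illustration of setting (1), a mailback handler can simply refuse any recipient that is not on a fixed list. The sketch below is not the code of formmail or of any real script; the names ALLOWED_RECIPIENTS and is_safe_submission and the example.org addresses are hypothetical. The check for line breaks is my own added assumption: it guards against a submitter smuggling extra header lines into a form field.

```python
# Hypothetical whitelist: the only addresses this form may ever mail to.
ALLOWED_RECIPIENTS = {"webmaster@example.org", "sales@example.org"}


def is_safe_submission(recipient, subject):
    """Return True only if the form data cannot redirect or pad the mail."""
    # Refuse line breaks, which could be used to inject extra header lines.
    for field in (recipient, subject):
        if "\r" in field or "\n" in field:
            return False
    # Refuse any recipient that is not explicitly whitelisted.
    return recipient.strip().lower() in ALLOWED_RECIPIENTS


print(is_safe_submission("webmaster@example.org", "Hello"))
print(is_safe_submission("victim@elsewhere.example", "Buy now"))
```

With a list like this in place, a spammer banging away at the script can reach nobody but the site's own operators.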

If you’re reporting the rare spam delivered these days via mailback script, remember that you may be dealing with a fellow-victim (i.e., the operator of the script), so spare the invective. You can (and should), however, advise him to get his mailback facility under control or else turn it off.

Transfer of mail via open proxies or “zombies”

How would you like to become part of a spam operation? It’s easy to do. Just ignore prudent security precautions while you leave a vulnerable computer on the network full-time (via a DSL or cable-modem connection), and the spammers will eventually move in and do the rest. Soon, your computer will be turned into a remote-control spam pump (called in the trade an open proxy or a zombie), and you may not even be aware that it is happening. In fact, it could already be happening now.

In network parlance, a proxy is a network-based computer that does work on behalf of other computers, usually hiding their identity in the process. For example, if you work for a large company, your web surfing at the office may be done behind a proxy server that can consolidate and cache web lookups, and can hide information about individual computers in the domain. These are good uses of network proxies.

In the case of the open spam proxy, the spammer uses some hostile mechanism (typically viruses like Sobig and its children) to implant a proxy agent on your computer (thereby turning it into a sort of internet “sock puppet”); he can then remotely command this program to send out spam mail via your normal internet connection. The proxy won’t use your mail program to send its mailings, and probably won’t leave any traces of its activity for you to detect. Spam entrepreneurs often recruit large “bot-nets” of such machines and resell access to them to other spammers who want to send mail.

It’s very difficult if not impossible to trace proxy-originated spam messages all the way back to the actual spammer; usually, you can only get to the ISP whose customer’s infected computer sent the message in the first place. Thus, the open proxy provides excellent cover for a spammer, and represents a nexus between the worlds of spam and computer malware.

In addition to sending e-mail untraceably, spammers also use their open proxy botnets to “front” for spam websites or name servers, or to engage in directory harvest attacks in order to collect fresh addresses for spam mailing lists.

The machines most vulnerable to becoming open proxies are those running Microsoft Windows (yeah, big surprise, I know), particularly the machines of unsophisticated users who don’t keep up proper security. Macs, Linux machines, and other non-Windows boxes don’t generally get targeted, possibly because cracking them requires a different skill set than most virus writers possess (and also because there aren’t nearly as many of these machines in circulation).

Your best defense against open proxy infestation is to follow all the normal precautions you would take to defend against virus attacks in general: be careful with downloaded files and mail attachments, keep up with the security news, patch your operating system (and anti-virus software) when required, and scan your system often for problems. An external firewall or router that can block activity on unused IP ports may also help stop zombie activity (although these, too, are often vulnerable to “back-door” subversion). You might also consider simply pulling the plug (any plug) on your DSL or cable modem when you’re not actually using the network (this won’t remove the proxy, but it will at least hinder its work). You can also shut down your computer when it isn’t in use, to much the same effect. Simply turning off your network access from your software (e.g., PPPoE control panel) may not be effective in every case.

Transfer of mail via “direct-to-MX” software

When you or I (or other “normal” users) send an e-mail message, we just type in the address of the recipient as, say, “servo@satellite-of-love.mst,” and hit the “send” button. But where will this message go? Of course, it will go (eventually) to the satellite-of-love.mst domain (or to some other related domain), but to what machine specifically inside that domain? We don’t know and we don’t care; these knotty details are handled for us by our ISP’s mail system.

When we send that message, it will go from our computer to a mail transfer agent (MTA) computer in our ISP’s domain. This machine then takes the recipient address we supply and does a DNS lookup to find out exactly which other mail host should receive the mail. This other host is known as a mail exchanger or MX. This interaction of MTAs, DNS, and MXs is a very good setup, because it allows ISPs to change their mail host setups at will with minimal disruption to users, and it does not require us normal users to keep up with these changes. It also allows ISPs to keep track (in mail server logs) of the mail they’re sending out to other domains.

Of course, if you have the right bulk-mailing software with the ability to look up MX records directly, you can bypass this process and drop the mail directly onto the recipient’s MX host from your computer. This is known as direct-to-MX mailing, and is very useful to (and much used by) spammers. First, as with open-relay and open-proxy mailing, it allows them to construct an almost completely forged header (since the message will never have actually passed through any MTAs on its way to the destination). Second, the spammer is also able to forge the HELO (the host name by which a sending host identifies itself to a receiving host). Finally, since the message bypasses the sending ISP’s MTA, the ISP will have no record of the mail in its logs.

Direct-to-MX mailing, however, is actually quite easy to spot: there’s usually only one non-forged Received: line in the header that shows an external relay (from the spammer to your MX). If the spammer uses a forged HELO with the MX, it can be easily checked against the IP address that his machine must provide when it talks to the MX: you need only do an nslookup on the HELO name, and another reverse nslookup on the IP address, and then see whether these all match up; if not, then you probably have spam (or at least very suspicious mail).
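The forward-and-reverse check described above can be sketched in a few lines of Python. This is only a minimal sketch: the function name helo_looks_consistent, the host names, and the documentation-range IP addresses are all made up for illustration, and the canned lookup tables stand in for real DNS so the example runs without a network. A mismatch is a hint, not proof; plenty of legitimate hosts have untidy DNS records.

```python
import socket


def helo_looks_consistent(helo_name, peer_ip, forward=None, reverse=None):
    """Compare a claimed HELO name with the IP address that actually connected."""
    # Default to live DNS lookups; callers may pass stand-ins for testing.
    forward = forward or socket.gethostbyname
    reverse = reverse or (lambda ip: socket.gethostbyaddr(ip)[0])
    try:
        forward_ip = forward(helo_name)    # name -> address
        reverse_name = reverse(peer_ip)    # address -> name
    except OSError:
        # A failed lookup (nonexistent host, no reverse record) is itself suspicious.
        return False
    same_ip = forward_ip == peer_ip
    same_name = reverse_name.rstrip(".").lower() == helo_name.rstrip(".").lower()
    return same_ip and same_name


# Canned lookup tables standing in for DNS, so the example runs offline.
fake_forward = {"mail.example.org": "192.0.2.10"}
fake_reverse = {"192.0.2.10": "mail.example.org",
                "198.51.100.7": "dsl-7.example.net"}

# Honest host: name and address agree in both directions.
print(helo_looks_consistent("mail.example.org", "192.0.2.10",
                            fake_forward.__getitem__, fake_reverse.__getitem__))
# Forged HELO: the connecting address belongs to some broadband customer.
print(helo_looks_consistent("mail.example.org", "198.51.100.7",
                            fake_forward.__getitem__, fake_reverse.__getitem__))
```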

Currently, the vast majority of spam is sent using the direct-to-MX technique. The typical direct-to-MX user can use a “sacrificial” dialup or broadband internet access account to send his mail, with the expectation that this account will be closed soon after the complaints roll in. Of course, if the spammer uses a stolen credit card to pay for the account, he doesn’t lose much when the account is terminated. Direct-to-MX can also be used in conjunction with open proxies (the zombies can be programmed to do the lookups necessary for direct-to-MX mailing).

Forged header information

As you can read elsewhere, the header of every e-mail contains detailed information about the route that a message has taken to get from the ultimate origin to the ultimate recipient. Normally, this information is automatically added by each mail host through which the message passes, but if a spammer can find a spam-friendly open relay mail host (or a convenient open proxy “zombie”), it is possible for them to insert bogus routing information into the header in order to hide the origin of the message. Also, direct-to-MX spammers will try to jigger the header by adding bogus mail host names or “HELOs” (which most mail transfer agents will ignore, but which inexperienced amateur spam-sleuths might not).

Smart spam analysis tools (like SpamCop) can usually detect forged header lines and will simply ignore them, since such information is highly suspect. If you do try to trace such header information down, you could either hit a dead end (due to malformed records or a non-existent host) or, worse, you could end up implicating an innocent party.
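If you want to look at the routing records yourself, Python's standard email package can pull the Received lines out of a raw message. The sketch below uses an invented two-hop message (all host names and addresses are placeholders). The topmost Received line is the one added by the last host to handle the message, normally your own provider's, so it is the one you can trust most; anything beneath it may have been supplied by the sender.

```python
from email import message_from_string

# Invented sample message; the second Received line plays the forged one.
RAW = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby mx.recipient.example with SMTP; Tue, 1 Jan 2008 10:00:02 -0500
Received: from bogus.relay.invalid (unknown [203.0.113.99])
\tby mail.example.org with SMTP; Tue, 1 Jan 2008 09:59:00 -0500
From: someone@example.org
Subject: hello

body text
"""


def received_chain(raw_message):
    """Return the Received headers, newest hop first, each folded onto one line."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    # Collapse the line breaks and tabs used to fold long header lines.
    return [" ".join(hop.split()) for hop in hops]


for number, hop in enumerate(received_chain(RAW), start=1):
    print(number, hop)
```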

As well as being against the posted policies of nearly all ISPs, forging headers is now a federal crime in the United States, thanks to the CAN-SPAM Act. You have to catch the spammer before you can prosecute him, however, and this is usually far easier said than done.

Bogus to- and from-addresses

As I explain elsewhere, the SMTP protocol that governs the transfer of e-mail messages across the internet does not require the To- or From-address of the message to appear in the message itself. This may seem surprising, since these seem like pretty fundamental parts of an e-mail message, but it is true. The actual recipient address is passed “out-of-band” in the SMTP conversation (i.e., the mail hosts don’t peek inside the message to get routing info), while the sender's address simply doesn’t have to be provided at all.
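To make the envelope-versus-message distinction concrete, here is a small Python sketch that lists what a sending machine says to a mail host when it hands over one message. It only builds the text of the conversation and sends nothing; the function name and every address in it are hypothetical. Notice that the real destination appears only in the RCPT TO command, while the From: and To: lines the recipient sees are just text inside the message.

```python
def smtp_client_commands(helo_name, envelope_from, envelope_to, message_text):
    """List what a client sends to a mail host to hand over one message."""
    return [
        f"HELO {helo_name}",
        f"MAIL FROM:<{envelope_from}>",   # envelope sender
        f"RCPT TO:<{envelope_to}>",       # envelope recipient: the real destination
        "DATA",
        message_text,                     # headers and body, not used for routing
        ".",                              # a lone dot ends the message
        "QUIT",
    ]


# The visible From: and To: lines are whatever the sender chooses to type.
MESSAGE = (
    "From: Candi <candi@friendly.example>\r\n"
    "To: undisclosed-recipients:;\r\n"
    "Subject: Haven't heard from you lately\r\n"
    "\r\n"
    "Hi there!"
)

for line in smtp_client_commands("pc.example.net", "x91@throwaway.example",
                                 "you@your-isp.example", MESSAGE):
    print(line)
```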

Therefore, it is risky to rely upon the veracity of the visible To- and From-addresses in questionable e-mails. Most smart spam tools won’t even bother to process these. Initially, most e-mail programs depended upon From-address blacklisting as the principal means of filtering spam, but this is now regarded as a very ineffective technique (because spammers never use their own e-mail addresses, and seldom reuse their stolen or forged addresses from one run to the next).

Obfuscated or misleading URLs

Did you know that 1077562591 equals 64.58.76.223, and that both equal www.yahoo.com (or at least they did as of this writing)? Well, most web browsers do, and so do many spammers. That’s why they’ll commonly render the hostname or IP-address part of a website URL in strange looking numbers (decimal, hex, or octal), or obscure character escapes.
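The arithmetic behind this trick is easy to check: an IPv4 address is just a 32-bit number, and the dotted form writes each of its four bytes separately. A minimal Python sketch follows (the function name dotted_quad is my own, and the 0x and 0o prefixes are Python's notation for hex and octal, which is not necessarily the notation a browser accepts).

```python
import ipaddress


def dotted_quad(number_text):
    """Convert a decimal, hex (0x...), or octal (0o...) number to dotted-quad form."""
    value = int(number_text, 0)  # base 0 lets Python infer the base from the prefix
    return str(ipaddress.IPv4Address(value))


print(dotted_quad("1077562591"))   # 64.58.76.223
print(dotted_quad("0x403A4CDF"))   # 64.58.76.223
```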

Also, the spammer may sometimes try to make his site look like someone else’s with the seldom-used user-ID field in the URL (e.g., “http://www.notspam.foo@10.10.10.10/,” where the stuff to the left of the @ sign is a bogus “user id” that most web servers will simply ignore, and the 10.10.10.10 address points to the actual site). To a casual observer, the URL may look like “www.notspam.foo” rather than the true IP address.

Possibly related to the above is the trick of sending you to a different IP port than the customary one used for HTTP transactions. For example, “http://www.sleazyspamweasel.foo:2236” will send you to port 2236 instead of the default port 80 for HTTP service. I’m guessing that this trick is used when the spammer is running his own private web server on a host that already serves port-80 traffic; this may enable him to keep private logs or to keep his traffic somewhat out of sight of system admins.
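Both the user-ID decoy and the odd port number fall out immediately if you let a URL parser take the link apart rather than trusting your eyes. Here is a short sketch using Python's standard urllib.parse module on the two example URLs above; the function name real_destination is hypothetical, and treating a missing port as 80 assumes a plain http link.

```python
from urllib.parse import urlsplit


def real_destination(url):
    """Split a URL into (decoy user id, actual host, actual port)."""
    parts = urlsplit(url)
    # parts.port is None when no port is given; assume the HTTP default then.
    return parts.username, parts.hostname, parts.port or 80


for url in ("http://www.notspam.foo@10.10.10.10/",
            "http://www.sleazyspamweasel.foo:2236"):
    user, host, port = real_destination(url)
    print(url)
    print("  user id:", user)
    print("  host   :", host)
    print("  port   :", port)
```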

Occasionally spammers find apparent loopholes in URL processing (or in the URL-handling behavior of particular web browsers), embedding invalid characters in the URL in order to disguise it. Most people, even if they want to bother trying to track such URLs down, won’t know what to make of them. It is often necessary to use a URL de-obfuscation tool to remove the garbage and noise; you may be able to find such a tool from this Google search.

Redirecting URLs

Probably everyone who has built a web page knows how easy it is to create links that visitors can use to jump to other websites (learning to create <A HREF="..."> links is pretty close to being Lesson #1 in web development). Of course, it is also possible to link visitors to other websites automatically, without their having to take any action. This behavior is known as redirection, and there are a number of ways to accomplish it (you can find the technical details elsewhere on this site).

Redirection is just the ticket for spammers who want to protect their website operations. All they need to do is set up a phalanx of “portal sites” that do nothing more than to redirect visitors elsewhere (i.e., to the real drug- or watch-selling website). It will be one of these redirector links that gets included in the spam mailing, and not the link to the actual website. If one of the portal websites should be nailed by investigators and shut down by its hosting service, it is a simple matter to set up more redirectors elsewhere (often using the very same hosting service). 

Redirector websites and links come in various flavors, with new ones being concocted regularly. Here are some of the current varieties:

Free web (and blog) hosts

Free web hosts, like Geocities or Googlepages, as well as free blog services (like Blogger.com) are fertile ground for planting redirectors. Since these sites are free, users don’t have to enter much in the way of verifiable personal or financial information, and it is even possible for spammers to use automated tools to create and stockpile large numbers of redirectors for later use. While the firms mentioned above (along with many other free hosters) do make efforts to weed out the redirectors when they receive complaints about them, this process can often take a few days, giving time for the redirector to do its work.

URL-shortening services

Another means to set up a redirection is to use a so-called URL-shortening service (with TinyURL being the most familiar example). For example, the following link will open another copy of this page in a new window:

http://tinyurl.com/yugqq2
(opens http://www.rickconner.net/spamweb/tricks.html)

These "short" URLs generally do not contain any hints as to where they actually point, so spammers can use them to redirect invisibly to their actual websites. Many URL shortening services (including TinyURL) have strict anti-spam policies that allow them to break links that are reported to point to “spamvertised” websites, but this can take a few days to happen (by which time the link may already have fulfilled its purpose for the spammer).

Creating a URL-shortening service is an attractive project for journeyman web developers, and so lots of these services spring up from month to month. Unfortunately, their proprietors are not always aware of the potential for their abuse by spammers, and one suspects that some of these services are even set up just to serve spammers.

“Public” redirector links

Many search engines and other large web enterprises use internal redirector links to send you to sites that you might click on in your searches. One example of such a public redirector is rd.yahoo.com:

http://rd.yahoo.com/?http://www.rickconner.net/

All that rd.yahoo.com does here is simply to redirect your browser to the URL named after the question mark. Yahoo often uses this technique when it provides you links to non-Yahoo sites (they could just as well link directly to the external URL by itself, but I suspect they use the redirector in order to capture info in their server logs in order to develop marketing or usage statistics). Yahoo doesn’t have a thing to do with the sites listed in such redirection links, but the naive user might assume that the spammer’s site is hosted by Yahoo so that he couldn’t be such a bad egg.
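For a link of exactly the shape shown above, the true destination is simply the part after the question mark, and a URL parser will hand it to you. A minimal sketch follows (the function name redirect_target is hypothetical; other redirectors bury the target in differently named parameters, so this handles only this one pattern).

```python
from urllib.parse import urlsplit


def redirect_target(url):
    """Return whatever follows the first '?' in a redirector-style link."""
    return urlsplit(url).query


print(redirect_target("http://rd.yahoo.com/?http://www.rickconner.net/"))
```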

Doctored search-engine links

Most people arrive at most websites not by loading them directly but through the intermediary of a search engine (Google, Yahoo, etc.). If a spammer can contrive to get his website indexed by a search engine, he can then use an appropriate search-engine link to point to the site, effectively using the search engine itself as a redirector. Such links are often combined with public redirectors (as above) to create a veritable fiesta of obfuscatory redirection. Google, of course, is a very popular choice for such links, but other search engines are also exploited in this way. It can be difficult for search-engine operators to deal with such redirection, because it simply exploits the natural behavior of a good search engine.

Needless encoding of message bodies and other data

MIME, or multipurpose internet mail extensions, is a set of procedures used to include various kinds of (possibly) binary data in the bodies or attachments of e-mail messages. Basically, these techniques turn binary data (which can interfere with the mail transfer process) into a text-like form containing only legal ASCII characters. This is a good thing, because it allows you to send pictures or word-processing documents with your e-mail, and it also allows those in other countries to send their messages in native character sets (such as Chinese or Cyrillic). However, these techniques can also be used to disguise the content of a mail message as it traverses the internet, which is exactly the kind of thing that spammers like to do. Most browsers or mail programs will decode this data when they display such messages (thus allowing us suckers to read them), but if you’re dealing with raw mail messages either on the net, or on your computer, you may be mystified by the encodings.

Three types of MIME encoding are (ab)used by spammers:

=?ISO-8859-1?b?c3NlZCBsYXRl?=
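The sample string above has the shape of a MIME “encoded-word” as used in message headers: a character set name, a letter naming the encoding (b for base64), and then the encoded text, all wrapped in =? and ?= markers. Python's standard email.header module can unwrap it, as this small sketch shows (the function name decode_encoded_words is my own).

```python
from email.header import decode_header, make_header


def decode_encoded_words(header_value):
    """Decode any MIME encoded-words found in a header value."""
    return str(make_header(decode_header(header_value)))


print(decode_encoded_words("=?ISO-8859-1?b?c3NlZCBsYXRl?="))  # ssed late
```

The decoded fragment is perfectly ordinary text, which is the point: nothing about it needed encoding in the first place.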

Two other encoding techniques are often used by spammers in HTML message bodies; these work in much the same way as MIME but aren’t MIME per se. So-called URI encoding is used to disguise selected characters in website URLs (indeed, any type of URL or URI); this stuff looks like “%nn” where “nn” is the hex code of the character being escaped (just like QP encoding above). Less often, spammers will use HTML character entity codes to disguise selected characters in an HTML message body (these look like “&#nnn;”, where “nnn” is the decimal (not hex) code of the character being escaped).
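Both of these disguises come off with one standard-library call apiece. The sketch below (the function name reveal and the sample strings are invented for illustration) undoes the percent-escapes first and the numeric character codes second; a filter would want to do something like this before weighing the text.

```python
from html import unescape
from urllib.parse import unquote


def reveal(text):
    """Undo URI percent-escapes, then HTML numeric character references."""
    return unescape(unquote(text))


print(reveal("%76%69%61%67%72%61"))                   # viagra
print(reveal("&#118;&#105;&#97;&#103;&#114;&#97;"))   # viagra
```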

Again, the point of these techniques is to disguise the message content while it is in transit on the internet, thereby allowing it (so hopes the spammer) to sneak past content-based spam filtering. If your spam filter doesn’t know enough to decode this stuff before it weighs the message, these tricks will work.

Encrypted message bodies

From time to time, you may get what looks like a normal HTML spam; upon inspecting the HTML source, however, you find nothing but some JavaScript and a big pile of cryptic data (as in my spam example #5). What’s going on? Well, the spammer is hiding the content of the message in the weird data, and using the script to get your browser to decode it and display it. Many spam-reporting tools like SpamCop will not try very hard to analyze such information, as it can be misleading and result in misdirected complaints. Indeed, as you can see from example #5, unraveling such a message can be a lot of work (almost prohibitively difficult to do without a web browser).

Let’s get one thing straight: There is no reason (other than deliberate deception) for the sender of a message to hide its full contents from the recipient. If the sender has something he doesn’t want you to see, he damned well ought not to send you the message (now, there’s a concept!).

It may take some detective work to ferret out any reportable information from the body of such messages; if you’re not up to deep detective work (like that in example #5), then try putting the mouse over hyperlinks in the message and see whether they show up in your browser’s status line, but be careful not to click on them unless you’re sure you know what you are getting into (be aware that these links can be disguised on certain browsers, as we see in spam example #3). On the other hand, the good news is that the mail header is not affected by this trick and can be investigated in the usual manner.

Disabled right mouse button

In most browsers, when you right-click on an e-mail or web page (or, if you have a Mac with one-button mouse, when you hold down “control” and click), you get a little pop-up menu that, among other things, allows you to load the raw HTML markup for the page or message into a text editor window.

Looking at other people’s HTML source is a great way to learn how things are done in HTML; I’ve used it, and continue to use it from time to time. Examining HTML source is also a great way to uncover clues as to the origin and nature of spam messages. This apparently is why spammers hit on the tactic of “turning off” the right-click function with a simple JavaScript trick. Often, instead of the usual pop-up menu, a right-click will display an alert saying “Source not available” or some such.

Perhaps this is enough to put off most casual readers, but workarounds for this dirty little trick are usually pathetically easy. First, you can try the “show source” command that lurks somewhere in your menu bar. If that doesn’t work, or if you can’t find it, you can always do a “save as” and save the page as HTML source.

Frankly, I’m not impressed by such crude attempts to control the operation of my software on my computer, and they make me want to drop the hammer all the harder on the weasels who use them.

Hashbusters

Many people may wonder why ISPs can’t simply detect outgoing spam as thousands of instances of the same message passing through their mail hosts. In theory, this would be easy to do, but in practice it would require huge amounts of processor time and would slow down legitimate mail transfers.

Some mail servers, however, use a less computationally intensive form of such checking, whereby they distill the message arithmetically into a smaller package called a “hash.” Identical messages going through can thus be relatively easily spotted by comparing the abbreviated hashes. However, if the spammer does the simple trick of altering the message ever-so-slightly on every few dozen iterations, the hash detection can be effectively bypassed. These “hashbusters” usually take the form of a “mutable” string of gibberish appearing in the message subject or in the body.
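To see why a few characters of gibberish are enough, here is a minimal Python sketch. I've used SHA-256 from the standard hashlib module purely as a stand-in; which hash any particular mail server uses is an assumption on my part, and the sample messages are invented.

```python
import hashlib


def fingerprint(message_text):
    """Distill a message into a short fixed-length hash."""
    return hashlib.sha256(message_text.encode("utf-8")).hexdigest()


original = "Subject: Cheap watches\n\nBuy now!"
same_again = "Subject: Cheap watches\n\nBuy now!"
busted = "Subject: Cheap watches xqzv7\n\nBuy now!"  # hashbuster added

print(fingerprint(original) == fingerprint(same_again))  # True: exact copies match
print(fingerprint(original) == fingerprint(busted))      # False: gibberish breaks the match
```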

It would be difficult, if not impossible, to construct an effective client-side filter to spot hashbuster subject lines. Of course, when you receive such a message, you can quickly spot these goofy subject lines, and they’ll just scream, “SPAM!”

Embedding recipients’ e-mail addresses in hyperlinks or “web bugs” (beacon URLs)

It is a waste of time and resources for spammers to send mail to nonworking e-mail addresses. They may have only limited access to their mail conduits before they’re detected and cut off, so transferring a bunch of messages that will just get bounced or rejected is a waste of precious bandwidth.

For these reasons, spammers like to find out whether their messages actually reach anyone. Thanks to HTML-based e-mail, they now have the tools to do so, often silently (without your being aware of the process). Check my spam analysis #2, and the “good clean humor” spam, to see how they do this. Often, your address isn’t shown “in the clear,” but is encrypted into an apparently-meaningless string of letters and numbers which is then “tagged” to one of two kinds of URLs. The more benign of these is the tagged hyperlink, which requires you to click on the link in order to transmit the data. The worse of the pair is the so-called “web bug” (I first saw these with the more descriptive term “beacon URL”), which embeds a zero-size or invisible image in the message body, and tags this image link with your identity; the data will be sent automatically without your knowledge every time you view the message in an HTML-enabled mail reader.

Needless to say, then, you don’t want to click on any links in a spam e-mail, at least not without carefully scrutinizing them first. You should also see whether you can turn off automatic image loading in your mail program, which will render most (not all) of the beacon URLs inoperative.
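If you'd like to hunt for beacon URLs in a message's HTML source, a crude detector is not hard to write. The sketch below uses Python's standard html.parser module and two heuristics of my own choosing: the image is declared zero or one pixel in size, and its URL carries a query string that could hold a tag. The class and function names and the sample HTML are all invented, and real web bugs can easily be built to dodge such simple tests.

```python
from html.parser import HTMLParser


class BeaconFinder(HTMLParser):
    """Collect image URLs that look like tracking beacons."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        # Heuristic 1: image declared as zero or one pixel wide or high.
        tiny = (attributes.get("width") in ("0", "1")
                or attributes.get("height") in ("0", "1"))
        # Heuristic 2: the image URL carries a query string (a possible tag).
        tagged = "?" in src
        if tiny and tagged:
            self.suspects.append(src)


def find_beacons(html_text):
    """Return the list of suspicious image URLs in a chunk of HTML."""
    finder = BeaconFinder()
    finder.feed(html_text)
    return finder.suspects


SAMPLE = (
    '<html><body><p>Good clean humor!</p>'
    '<img src="http://pics.example/logo.gif" width="120" height="40">'
    '<img src="http://track.example/b.gif?id=a8f3e1c09" width="1" height="1">'
    '</body></html>'
)
print(find_beacons(SAMPLE))
```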

“Personalized” messages and provocative subject lines

What red-blooded male wouldn’t be interested in opening an e-mail message from “Candi”, saying “Haven’t heard from you lately”? Or, consider the ominousness of messages with subject lines like “Regarding your account,” “Payment past due,” or even “You are being monitored.” These are all subject lines that I’ve seen in recent spam.

Sometimes you’ll get what looks like a personal message that purports to have been intended for someone else. Who could resist the opportunity to open and read someone else’s mail, there in the privacy of his own home? It is exceptionally rare, if not impossible, for a message to be misdelivered by internet mail systems, but most folks probably don’t know that.

These really aren’t technical tricks per se, although they are what hackers sometimes call “social engineering,” exploiting the naïveté of users to get them to open the message and read it. Sometimes these ruses are easy to spot; other times not.

“Creative” misspelling

What with persistent errors like “loose” for “lose,” “your” for “you’re” (and vice-versa), the inevitable confusion between “its” and “it’s”, and a thousand other little shocks, the internet seems to be a hotbed of variable orthography. You will also see a lot of misspelling in spam, but it turns out that not all of it is due to ignorance or negligence.

For example, if you have a content filter that traps messages containing “viagra” (because of the flood of viagra spams), all the spammer has to do is spell it as “v1agra” or “\/iagra” or even “viagara” and he’ll probably sneak past. On the other hand, so-called Bayesian spam filters can remember such odd spellings and assign them a very high statistical weight, ironically making the spam easier to spot.
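The following Python sketch shows why a plain keyword match misses these spellings, and how a filter might undo the most common character swaps before matching. The substitution table is my own invention for the example (a real filter would need a much larger one), and the function names are hypothetical.

```python
# Hypothetical substitution tables, invented for this example only.
MULTI_CHAR = {"\\/": "v"}  # the two-character sequence \/ standing in for "v"
SINGLE_CHAR = str.maketrans({"1": "i", "0": "o", "3": "e", "@": "a", "$": "s"})


def normalize(text):
    """Lower-case the text and undo a few common character swaps."""
    text = text.lower()
    for fake, real in MULTI_CHAR.items():
        text = text.replace(fake, real)
    return text.translate(SINGLE_CHAR)


def naive_match(text, keyword="viagra"):
    """Plain substring match, as a simple content filter might do."""
    return keyword in text.lower()


def normalized_match(text, keyword="viagra"):
    """Substring match after undoing the character swaps."""
    return keyword in normalize(text)
```

With these definitions, naive_match misses both “v1agra” and “\/iagra”, while normalized_match catches them. Neither catches “viagara”, which is a true misspelling rather than a character swap, and the normalization would also mangle legitimate numbers, so this is a sketch of the idea and not a usable filter.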

One variant on the creative-misspelling trick might be called the “creative syntax” trick, in which the spammers use very strange phrasings for their messages; these may sneak past a filter while still remaining marginally readable. Here are a couple of real-life examples:

Sir/Madam,
Your current homeloan qualifies you to get abundant gains. Our database will synchronise you with the most qualified broker, so that you will have more finances in your statement at the end of each month.

According to the most recent paper,
they said that most American's
are drawn in retaining their savings
That is why they divulged yesterday this place drug website URL deleted
They ran upon it after going around the internet.
The great secrets of the net.

I sometimes wonder, however, whether these spammers’ target audience (i.e., the ill-informed and gullible) will be able to understand such constipated English.

Bogus text

Bayesian spam filters, as mentioned above, look at the content of a message and “weigh” the presence of spam-related words and phrases (e.g., “viagra”, “remove”, etc.) against the message as a whole. If you can contrive to put enough “neutral” text in your spam message, you might be able to drop it below the statistical threshold of such filters. And so, spammers will often introduce lots of unrelated words or phrases, sometimes inside the HTML markup (but often outside it, after the final </HTML> tag). Sometimes they mask the words by making them the same color as the page background, or putting them in a comment. Often, the words are quotations from some unnamed literary work, but they may just as often be randomly generated strings of words.
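Here is a toy Python illustration of the dilution effect. The per-word probabilities are invented for the example, and the scoring is a simplified naive-Bayes combination over every word in the message; real Bayesian filters choose and weight their tokens differently, so take this only as a sketch of the arithmetic.

```python
import math

# Hypothetical per-word spam probabilities, as if learned from past mail.
WORD_SPAM_PROB = {
    "viagra": 0.99, "remove": 0.90,
    "meeting": 0.10, "garden": 0.15, "tuesday": 0.12, "report": 0.10,
}
UNKNOWN = 0.5  # assumed probability for words the filter has never seen


def spam_score(message):
    """Combine per-word probabilities into one score between 0 and 1."""
    log_spam = 0.0
    log_ham = 0.0
    for word in message.lower().split():
        p = WORD_SPAM_PROB.get(word, UNKNOWN)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Same as prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

Under these made-up numbers, spam_score("viagra remove") comes out well above 0.9, while appending the four “neutral” words (meeting, garden, tuesday, report) pulls the score below 0.5. That drop is exactly what the bogus text is meant to achieve.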

In the most recent spams I’m getting, the ratio of bogus text to real text has increased markedly, and the spammers are also no longer always bothering to make it invisible. When combined with the “creative misspelling” trick, this results in sales pitches that are almost complete gibberish. Nowhere else in the world of advertising do its practitioners see fit to deliver such wilfully incomprehensible messages (okay, there were those original ads for Nissan Infiniti cars).

Obfuscation of HTML message bodies

Most browsers are designed to be fairly tolerant of invalid HTML tags. Therefore, you can often put anything you like into pointy brackets and chances are most browsers will ignore these “tags.” Often, invalid tags are used (like “<goofy>”), but often spammers use empty tag pairs (like “mort<b></b>gage”) to get the same effect. They can also use certain tags outside their intended context (e.g., using the table cell tags “<td> ... </td>” outside an actual HTML table). As I said, your browser will probably ignore these tags when it renders the message, but a content filter would have to know how to weed them out before scanning the message text. See my sample spam analysis #4 for an example of this practice.
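As a minimal sketch of the “weeding out” step, a filter could delete anything in pointy brackets before scanning the text. The Python below uses one regular expression for this; the function name is my own, and I am not suggesting that any particular filter works this way.

```python
import re

# Anything from "<" up to the next ">" is treated as a tag, valid or not.
TAG = re.compile(r"<[^>]*>")


def strip_tags(markup):
    """Delete every tag-like sequence, leaving only the text between them."""
    return TAG.sub("", markup)
```

With this, “mort<b></b>gage” becomes “mortgage” again, and a bogus tag like “<goofy>” simply disappears. The catch is that tags which legitimately separate words vanish too, so text from adjacent paragraphs or table cells can run together.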

On rare occasions, a spammer will go to incredible lengths to obfuscate text using HTML tricks. For example, you might not notice anything amiss about the following (except perhaps for the raggedy letterspace):

B u y
V I A G R A
H e r e

but in fact it was constructed using a 6x3 HTML table, with one letter in each cell. Here’s what the markup looks like:

<div align="center">
<table width="45" border="0" cellspacing="2" cellpadding="0">
<tr>
<td>
<div align="center">
<b><font size="5">B</font></b></div>
</td>
<td>
<div align="center">
<b><font size="5">u</font></b></div>
</td>
<td>
<div align="center">
<b><font size="5">y</font></b></div>
</td>
<td>
<div align="center">
</div>
</td>
<td>
<div align="center">
</div>
</td>
<td>
<div align="center">
</div>
</td>
</tr>
<tr>
<td>
<div align="center">
<b><font size="5">V</font></b></div>
</td>
<td>
<div align="center">
<b><font size="5" color="red">I</font></b></div>
</td>
<td>
<div align="center">
<b><font size="5" color="red">A</font></b></div>
</td>
<td>
<div align="center">
<b><font size="5" color="red">G</font></b></div>
</td>
<td>
<div align="center">
<b><font size="5" color="red">R</font></b></div>
</td>
<td>
<div align="center">
<b><font size="5" color="red">A</font></b></div>
</td>
</tr>
<tr>
<td>
<div align="center">
<b><font size="5"><a href="http://www.pfizer.com/" target="_blank">H</a></font></b></div>
</td>
<td>
<div align="center">
<b><font size="5"><a href="http://www.pfizer.com/" target="_blank">e</a></font></b></div>
</td>
<td>
<div align="center">
<b><font size="5"><a href="http://www.pfizer.com/" target="_blank">r</a></font></b></div>
</td>
<td>
<div align="center">
<b><font size="5"><a href="http://www.pfizer.com/" target="_blank">e</a></font></b></div>
</td>
<td>
<div align="center">
</div>
</td>
<td>
<div align="center">
</div>
</td>
</tr>
</table>
</div>

As you can see, it would be very difficult for a spam filter to fish the text out of this markup, although it would be able to see (and trace) the web URLs provided.
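To give a feel for what the “fishing” would involve, here is a hedged Python sketch that glues together the text of each table row using the standard html.parser module. It assumes the spammer laid the letters out one word per row, as in the sample above; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class RowText(HTMLParser):
    """Concatenates the visible text found inside each <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append("")  # start collecting a new row

    def handle_data(self, data):
        if self.rows:
            # Drop surrounding whitespace so the letters join up.
            self.rows[-1] += data.strip()


def table_words(markup):
    """Return one string per non-empty table row."""
    parser = RowText()
    parser.feed(markup)
    parser.close()
    return [row for row in parser.rows if row]
```

This recovers the words only because each row happens to hold one word. A spammer who arranged the letters by column, nested the tables, or mixed in invisible filler cells would defeat it, which is why a general-purpose filter has such a hard time here.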

CSS features, like the “float” property, can also be used to break up text within HTML markup, yet still have it be perfectly readable to a human once it is rendered. See this page from my web log (scroll down) for an example.

If you find it necessary to code this deep just to get past routine spam filters, you really should think about whether you are in a viable business.

Monkeying with DNS for websites

In early days, spammers depended upon e-mail responses to their ads, and many still require you to call them on the telephone to take advantage of their amazing offers. However, most spammers have long since moved on to selling their goods via websites. It’s (apparently) pretty easy for a spammer to find “bulletproof” web hosting by services that don’t mind supporting spam and throwing carloads of complaints into /dev/null (they’re being richly rewarded for this by the spammers).

Yet, sneaky as they are, the spammers still apparently find it necessary to play games with their web resources to keep spam hunters off their trail. Most often, these tricks involve manipulations of the domain name service (or DNS), the distributed service that helps match host names with numeric IP addresses.

The spammer may contract with a low-profile spam-friendly DNS operator to play these tricks, or more often nowadays will simply include rudimentary name server applications on his own hosts, giving him DNS service that he can manipulate as and when he desires.

The DNS tricks I see most often include:

You can read more about some of these dodges in my page on spam website cloaking.

Affiliates and portal sites

One of the best ways that an online merchant can “leverage” his internet coverage is to develop a cadre of affiliates; that is, people who use their own websites or e-mail to direct traffic to the merchant’s site. Usually, the merchant will offer a commission, a bounty, or other kickback to the affiliate, which is payable when orders are placed bearing the affiliate’s code number. This sort of thing is widely (and for the most part ethically) used in the porn-website business, and also by Amazon.com and other firms (such as the latter-day Napster online music service).

Spammers have also glommed onto the affiliate approach, as have marginal businesses who aren’t offended by spam marketing but would rather that others do it on their behalf so as not to risk getting nailed themselves. You can often spot affiliate links in spam mail because the URLs will contain codes like “a=103” or some such to identify the particular affiliate (see my spam analysis #5 for an example of affiliate spamming). The affiliate model seems to be the standard division of labor in many spam rackets today: the spam merchant puts up the websites to advertise the goods and take the orders, and pays bounties to affiliates to send out the mail (several unrelated affiliates may be used, judging by the varying “fingerprints” in the mails I get advertising the same sites). If one or another affiliate is busted off the net, there will be others to take up the slack. I’m guessing that most of the more stubborn and pernicious spam we get these days is based on the affiliate model.
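If you want to pull such a code out of a link, the standard urllib.parse module will do it. In this Python sketch the parameter name “a” is taken from the example above, while the URL used in the comment is invented; real affiliate links use all sorts of parameter names, and some bury the code in the path instead.

```python
from urllib.parse import parse_qs, urlparse


def affiliate_code(url, param="a"):
    """Return the value of the given query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    return values[0] if values else None


# Hypothetical link of the kind described above:
# affiliate_code("http://shop.example.invalid/order.html?a=103&item=7")
```

Comparing the codes across several spams for the same site is one way to tell whether you are dealing with one mailer or several.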

Dictionary attacks

The “dictionary attack,” also known as the “directory harvest attack” (DHA) or “MX probe,” is a popular activity among spammers who want to develop or maintain lists of target addresses. It relies for its success upon the fact that most mail exchanger hosts (i.e., mail receivers) allow visiting mail-sending hosts to deliver (or attempt to deliver) hundreds or thousands of messages in a very short time, and within a single SMTP session. The sending host will usually get very quick feedback from the MX as to whether each message will be deliverable. This trick can thus be used to identify non-deliverable addresses (which the spammer can then strike from his list), but more importantly to gather new addresses (the spammer can literally make up an address — like jsmith@foo.bar, for example — and then test it against the MX to find out whether or not it works).
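From the receiving side, one possible defense is to notice when a single sending host racks up many failed recipients. The Python sketch below works on a made-up log format (pairs of sending host and attempted recipient) and a threshold I picked arbitrarily; it is not how any particular mail server does this.

```python
from collections import Counter


def flag_probers(rcpt_log, valid_addresses, max_misses=3):
    """Return hosts that tried more than max_misses nonexistent recipients.

    rcpt_log is an iterable of (sending_host, recipient) pairs, a
    hypothetical format chosen for this example.
    """
    misses = Counter()
    for host, recipient in rcpt_log:
        if recipient.lower() not in valid_addresses:
            misses[host] += 1
    return sorted(host for host, n in misses.items() if n > max_misses)
```

A host that guesses asmith, bsmith, csmith, and dsmith at your domain before hitting a real address would be flagged, while a correspondent who makes a single typo would not. A mail server could then slow down or drop sessions from flagged hosts so that the prober no longer gets quick feedback.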

For the normal mail user, such activity shows up in the form of mysteriously blank incoming messages (often without subject lines or bodies). These prove that the prober has managed to get a message to you and now knows that your address works (so, stand by for more spam).

Even if you carefully protect your address from harvesting, and even if you never use it at all, it is still vulnerable to being picked up in a dictionary attack. One possible way to avoid such harvesting would be to make up an e-mail address that does not contain recognizable words or names; see my page on avoiding spam for further detail.

You can read more about dictionary attacks in my spam analysis #11.





(c) 2003-2008, Richard C. Conner


Updated: Wed, 25 Jun 2008
