
Finding the service that hosts a spam website

What we’re doing on this page: Using the URLs collected from the body of the spam mail to find out which services are hosting the websites involved.

Finding the owner of an address used as a spam mail source is a bit of a slam-dunk, once you get used to reading mail headers: the raw IP address of the spam source is always unambiguously included in the header, so once you find it all you have to do is a whois lookup and you’re (usually) done.

In the case of spam web hosts, however, we have a bit more digging to do in order to ensure that we file an accurate, factual report. Not only is the IP address of the spam website usually not available to us within the message, but we also find that spammers use various tricks to hide the technical “guts” of their sites from snoops. Specifically, we must do the following:

  1. Resolve the web host in the spam URL to an IP address (preferably from an authoritative source).
  2. Determine whether there is any web service at this host.
  3. If there is web service at this host, we must decide whether the website it presents is implicated in the spamming.

If we cannot resolve the web host to an address, either authoritatively or non-authoritatively, then it’s the same as there being no website at all, so we have nothing to report. Likewise, if there’s no web service at this host (even if the host is online), we have nothing to report. Finally, if the website has (apparently) already been dealt with by the hosting provider, or if the website doesn’t seem directly connected to the spam, then we have nothing to do.

Finding the address(es) of the website

Assuming that you have host names for one or more spam websites that were mentioned in the message, the first thing to try is a basic DNS lookup of each host name, using the host or nslookup commands (in the case below, we’re looking at an actual web host mentioned in a recent spam). The answer to our query is the final “Address” line of the output:

alu-g4pb:~ rconner$ nslookup chmabeiklh.goldheir.net
Server: 199.45.32.43
Address: 199.45.32.43#53

Non-authoritative answer:
Name: chmabeiklh.goldheir.net
Address: 82.98.78.100

That was pretty easy, wasn’t it? We aren’t done, however. Note that nslookup told us above that it was giving us a “non-authoritative answer” (i.e., a cached lookup). This cached information is usually correct — but not always.
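A minimal Python sketch of how one might read that output mechanically, assuming the nslookup layout shown above (other versions of nslookup may format their output differently). The function name parse_nslookup and the SAMPLE constant are my own illustrations, not part of any tool:

```python
# Transcript copied from the nslookup run shown above.
SAMPLE = """Server: 199.45.32.43
Address: 199.45.32.43#53

Non-authoritative answer:
Name: chmabeiklh.goldheir.net
Address: 82.98.78.100
"""


def parse_nslookup(output):
    """Return (from_cache, addresses) for one nslookup transcript.

    from_cache is True when nslookup labelled the reply "Non-authoritative".
    Only Address lines that follow a Name line are counted, so the address
    of the name server itself (the first Address line) is ignored.
    """
    from_cache = "Non-authoritative answer" in output
    addresses = []
    in_answer = False
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Name:"):
            in_answer = True
        elif in_answer and line.startswith("Address"):
            addresses.append(line.split(":", 1)[1].strip())
    return from_cache, addresses
```

Calling parse_nslookup(SAMPLE) reports that the answer came from a cache and that the only address found was 82.98.78.100.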

In general, for spam-reporting purposes, it is best to get your address lookup (if possible) from an authoritative name server for the domain in question. This ensures that you get the most accurate and up-to-date information about the site, bypassing local name server caches. There are many ways to do authoritative lookups, the most basic of which is to use one host (or nslookup) command to find the name(s) of the authoritative server(s), and then more host (or nslookup) commands to query each of these servers for the address (see my page on nslookup and host). 
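The two-step procedure just described could be scripted roughly as follows. This is only a sketch: it assumes the host command is installed, that its output uses the usual “name server” and “has address” wording, and that you supply the registered domain yourself. The function names and the run parameter are hypothetical conveniences of mine.

```python
import subprocess


def run_host(args):
    """Run the `host` command with the given arguments and return its output."""
    result = subprocess.run(["host", *args], capture_output=True,
                            text=True, timeout=30)
    return result.stdout


def name_servers(domain, run=run_host):
    """Step 1: collect the name servers that `host -t ns` lists for the domain."""
    servers = []
    for line in run(["-t", "ns", domain]).splitlines():
        if " name server " in line:
            # Line looks like: "<domain> name server <server>."
            servers.append(line.rsplit(" ", 1)[1].rstrip("."))
    return servers


def authoritative_addresses(hostname, domain, run=run_host):
    """Step 2: ask each listed server directly for the host's address."""
    answers = {}
    for server in name_servers(domain, run):
        found = []
        for line in run([hostname, server]).splitlines():
            if " has address " in line:
                found.append(line.rsplit(" ", 1)[1])
        answers[server] = found
    return answers
```

The result maps each name server to the addresses it returned, so disagreements between servers (or servers that return nothing) are easy to spot. The run parameter exists only so that the parsing can be exercised without touching the network.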

If you cannot find an authoritative lookup, or if the process becomes too confusing or overwhelming (as we will see just below), you can usually just report the address given by your local name server instead, although as I said there is some risk in relying upon this information.

In doing your authoritative lookups, you are likely to uncover various kinds of DNS chicanery practiced by the more sophisticated spammers:

You can read some more in-depth info about these oddities on my page devoted to spammers’ website DNS tricks. Sadly, these techniques are very effective at keeping spam websites alive despite the best efforts of investigators, since it is simply too much work for the average spam recipient to collect all of the necessary data and to file all the necessary reports.

Reasons for failure of DNS lookups on web domains

What if our lookups do not return an address for the web host name?

There are a number of reasons why host lookups might fail. In general, you’ll get some clues in the readout from nslookup (or host, dig, etc.); if you’re keen on LARTing a particular spam website, you’ll want to pay close attention to these clues, since they differ slightly in their implications for spam reporting.

NXDOMAIN (non-existent domain)

Consider this host lookup:

alu-g4pb:~ rconner$ host www.fakehostname.com
Host www.fakehostname.com not found: 3(NXDOMAIN)

Here’s a domain for which host could find no information within DNS (return code 3, “NXDOMAIN” for “non-existent domain”). This could be due to any of the following:

Getting an NXDOMAIN result provides fairly positive proof that you cannot reach this domain right now from where you sit, since neither the local cache nor the authoritative server lookup reports an address. This doesn’t mean, however, that other people using other ISPs are also unable to reach this domain (due to DNS pathology sometimes exploited by clever spammers, it is possible for a site to be NXDOMAIN from one location, but perfectly resolvable from somewhere else). Unless you want to investigate this possibility further, however — and you probably don’t — there’s nothing to do here, and no report to file.

SERVFAIL (name server failure)

Here's another host lookup that failed, this time for a different reason:

alu-g4pb:~ rconner$ host www.notherfakehostname.com
Host www.notherfakehostname.com not found: 2(SERVFAIL)

Here, this return code (2, “SERVFAIL”) indicates that there was a problem with a name server somewhere. The culprit could be your own local name server (i.e., the one run by your ISP), or (on extremely rare occasions) the generic top-level domain (gTLD) servers, but this error is far more likely to be due to lack of response from any of the authoritative name servers for the domain. In the case of a spam domain, this may mean that the jackleg DNS service set up by the spammer has gone offline, timed out, or has been deliberately shut down.

This result differs from NXDOMAIN because it implies that the domain may indeed be listed in DNS (and is therefore reachable), but that no authoritative lookup was possible at the time you checked. In other words, while NXDOMAIN indicates that all of the necessary name servers were online and operating properly (but had no info for the domain in question), SERVFAIL means that there was a critical failure or disappearance of one or more name servers, so that the query could not be completed and no address could be found for this host.

It would seem that a SERVFAIL error pretty much leaves a domain unreachable and therefore dead in the water. However, the domain may not be completely unreachable:

As with NXDOMAIN, a SERVFAIL result means that you could get neither a cached nor an authoritative lookup for the host. This means that you can’t reach the site from your service, but it is possible that others may find a cached lookup that will lead them to the host (as we’ll see shortly).

Unless you’re really keen on complaining about this one spammer, it probably isn't worth waiting for the SERVFAIL status to change, or doing any other sort of deeper digging. In any case, you’ll probably get more spam from this operator later on, so you’ll certainly have another chance to report his sites.

General timeout message

Sometimes, a DNS lookup will take a long time, after which you’ll get back a timeout message, indicating that your computer (or some other name server up the chain) has given up:

alu-g4pb:~ rconner$ host www.tomato-aspic.ch
;; connection timed out; no servers could be reached

You may not necessarily see this exact message depending upon your operating system and the services you use, but you should see something about a timeout.

It’s a bit difficult to tell exactly what went wrong (maybe your internet service is offline or broken, perhaps the name server hosts were up, but did not have a name server application running, etc.), but clearly some server failed to respond, so we may be able to interpret this as being equivalent to a SERVFAIL error.
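If you check many host names, it may help to sort these three failure cases automatically. The sketch below assumes output worded like the host transcripts above; the function name and the RESOLVED, TIMEOUT, and UNKNOWN labels are my own.

```python
import re


def classify_host_output(output):
    """Label one `host` transcript by the kind of result it shows."""
    # Failures look like: "Host <name> not found: 3(NXDOMAIN)"
    match = re.search(r"not found: \d+\((\w+)\)", output)
    if match:
        return match.group(1)          # e.g. "NXDOMAIN" or "SERVFAIL"
    if "timed out" in output:
        return "TIMEOUT"               # treat much like SERVFAIL
    if " has address " in output or " has IPv6 address " in output:
        return "RESOLVED"
    return "UNKNOWN"
```

For example, the NXDOMAIN transcript shown earlier yields the label "NXDOMAIN", and the timeout message yields "TIMEOUT".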

Does the website work?

Just because you can resolve the name of a spam website’s host to an IP address does not mean that there’s any website there (or, indeed, any web service at all). Your next task, then, is to determine whether any website is online at the indicated address. If the host is still “up” but the website itself is “down,” then you don’t need to (and should not) file any reports on it.

The easiest way to confirm that a website is online is, of course, to load its URL into a web browser. There are many good reasons, however, why you might not want to do this (particularly if you are stuck with an ill-secured operating system or web browser), so here are some workarounds.

Batten down your browser

If you plan to explore a spam website with a normal browser, it might be a good idea to make some security-related settings before you do. These include:

Even if you do take these precautions, however, you are not guaranteed to be immune from all tricks that spammers might play. Website redirections, for example, will probably work unhindered.

Using a command-line (or web based) web fetch tool

It may be safer, and not much more work, to use some other tool besides a conventional web browser to retrieve data from the site. Specifically, you want a tool that simply fetches files from a website, but does not try to render, display, or execute them (as a browser will). You will also want this tool to show you the HTTP status code for the transaction, so you can judge whether a site has been terminated, or is redirecting you elsewhere. There are several tools fit for this purpose; my favorite is curl; another is wget, while Perl programmers can put together their own scripts using the LWP module (other languages should have comparable packages or modules).

I strongly recommend this method for those of you who are unsure about the security of your operating system or browser, and can read HTML markup to figure out what it is showing or doing. You can use any of these tools to save the web output to disk, then examine it with a suitable text editor, pager, or other tool.

Another advantage of these tools is that they can usually be made (e.g., via the -i option of curl or the -S option of wget) to show the HTTP header returned by the server, which includes the HTTP status code and other possibly useful info. This makes these tools the best way to spot and track spam website redirection. Oddly, it seems that few if any normal web browsers can be made to show HTTP header info (even though this seems like a fairly fundamental diagnostic feature).
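For those who would rather script the fetch than use curl or wget, here is a rough Python equivalent built on the standard http.client module, which makes a single request and never follows redirects or executes anything. The function names and the wording of the descriptions are mine, and the status categories are a simplification.

```python
import http.client
from urllib.parse import urlsplit


def fetch_headers(url, timeout=10):
    """Fetch a URL once; return (status, headers, first 64 KB of body)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port,
                                           timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port,
                                          timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read(65536)
        return response.status, dict(response.getheaders()), body
    finally:
        conn.close()


def describe(status, headers):
    """Give a one-line reading of the status code and any Location header."""
    location = next((value for name, value in headers.items()
                     if name.lower() == "location"), None)
    if 300 <= status < 400 and location:
        return f"redirect to {location}"
    if status in (404, 410):
        return "page not found or gone"
    if 200 <= status < 300:
        return "content served"
    return f"other status {status}"
```

A typical use would be to call fetch_headers on the scrubbed URL, pass the status and headers to describe, and save the body to disk for inspection in a text editor.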

Scrubbing the URL before loading it

If the spam website URL contains any information that might identify you personally, then you’ll be sending that info to the spammer if you load the URL without “munging” away the web bug data. This is true whether you use a browser or some other non-browser technique. Therefore, you might want to cut anything from the right side of the URL that looks suspicious (e.g., CGI argument lists or long, strange-looking alphanumeric codes). For example:

http://get-instant-dates.foo/?a=123&e=you@yourisp.fum
http://buy-phony-watches.foo/?ea234siausd3235

If you’re feeling puckish, you can also substitute your e-mail address (if it appears) with another less digestible to the spammer (like “uce@fcc.gov”).
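A simple way to do this cutting in a script is to drop everything after the path. The sketch below removes only the query string and fragment; it does not try to judge whether a path segment looks suspicious, which still needs a human eye. The function name scrub_url is my own.

```python
from urllib.parse import urlsplit, urlunsplit


def scrub_url(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Applied to the two example URLs above, this leaves http://get-instant-dates.foo/ and http://buy-phony-watches.foo/ respectively.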

If you get one of those URLs with long "alphabet soup" host names in front, it might be the better part of wisdom to remove one (or all) of these host name parts, down to the bare domain name:

http://soijfaowiemnoireg.slivoisjdt.vissfudi.info/ex

Usually, the spammers who use URLs like the one above have set up so-called "wild-card DNS" on their hosts, so the host name portions are really irrelevant, and the site will load properly when just the bare domain name is given. It is possible that these long host-name strings could be web bugs of a sort, but they are more likely just random character strings meant to confuse or distract.
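Trimming the host name can be scripted too. This sketch assumes that the bare domain is simply the last two labels of the host name, which is true for a name like the one above but wrong for domains registered under two-part suffixes (such as .co.uk), so the keep parameter is there to adjust it. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_host_labels(url, keep=2):
    """Rewrite the URL so that only the last `keep` host labels remain."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    bare = ".".join(labels[-keep:])
    # Preserve an explicit port number if the URL had one.
    netloc = bare if parts.port is None else f"{bare}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))
```

For the example URL above, the result is http://vissfudi.info/ex.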

You should be aware that if you mangle the URL in this fashion, you might be sent to some other page, and you could thus draw an incorrect conclusion about the nature of the site. For example, spammers frequently redirect partial URLs to “removal” pages, and in the past they often used to respond to link-mangling by sending the snoops to Yahoo! or some other unrelated and innocent website.

Evaluating the contents of a spam website

What do you see when you load the spam website URL? It will probably fall into one of these three categories:

  1. A website that matches what you expect based upon your reading of the spam mail (e.g., a drug sales website for a drug spam).
  2. An “error page” from the hosting service indicating that the site is no longer available for viewing, or could not be found.
  3. A website that somehow doesn’t quite correspond to the spam mail, or that may have a disclaimer stating that the site’s proprietors are under “Joe job” attack by someone who wants to entangle them in false accusations of spam.

In the first case, you can be fairly confident that you’ve reached a reportable site. In the second, it is likely that the spam website has already been taken offline (probably by the hosting provider), so no report will be necessary. In the third, you will have to use some human judgement. Here are some common pitfalls to watch out for:

Website has been shut down

If you see something in your browser like “can’t be found,” “not available,” “shut down for abuse,” or the like, then you can probably mark this site down as a kill. Frankly, many ISPs don’t get on top of spam websites as quickly as they might, but you do occasionally see such things.

Innocent bystanders

Sometimes, spammers will insert web URLs into their messages that have nothing directly to do with their spam pitch. Stock spammers used to include links to various stock tracking websites (e.g., http://finance.yahoo.com/) to beef up their appeals, while other spammers used to include random URLs (some nonexistent) as camouflage to confound automated spam tracing tools or inexperienced investigators (such as in this classic example).

Such websites are really innocent bystanders, since they have nothing to do with the spam pitch. In fact, they are really fellow-victims of the spammer. You should avoid reporting these sites.

It can sometimes be difficult to tell whether or not there is any relationship between a spammer and a website that he cites (as in the case of my classic lottery spam). Some possible clues of a direct linkage include:

Joe-jobbery

As I mentioned above (and elsewhere), a “Joe job” is a run of spam mail sent by a bad guy in the name of an innocent party; Joe jobs are usually done to harass enemies or competitors, or to get revenge for real or perceived injuries. Usually, the Joe-jobbers don’t stop at simply tagging their victims as spammers; they often will attempt to implicate them as child-pornographers, drug-pedlars, fraud artists, or other types of criminals. If successful, a Joe job can cause a lot of damage for the victim, ruining his net reputation and requiring him to prove (or attempt to prove) his innocence to his ISP, to angry complainers, and possibly even to law enforcement.

Often, once a webmaster has detected that he’s being targeted with a Joe job, he will post a note on his website to disclaim any responsibility for the spam. This is more likely to be the case for a small site than for one belonging to a large corporation or institution (which wouldn’t want to let on that it is vulnerable to such tactics). You will need to exercise some judgement in deciding whether to report such sites; it may be best not to report them unless they become repeat offenders.

Keep in mind that you yourself may someday get hit by a Joe-jobber, particularly if your anti-spam efforts become too effective or too high in profile, so the Golden Rule (“do unto others...”) may be worth following here.

Redirected websites

As I explain on my page about spammers’ website-hiding tricks, spammers will often redirect you from the URL given in the spam mail to some other site somewhere else. In such cases, the URL in the mail message is usually just a “portal” site set up to protect the main site (the one you finally land on) from being reported.

The most common ways in which spammers do this redirection are:

In general, if you end up on a different URL than the one you entered into your browser, you have probably been redirected. In order to see exactly how this has been done, you can use a web fetch tool to show you the HTTP header and the HTML <HEAD> element for the portal site; the answers will usually be found in one or the other of these places.
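As an illustration of checking both places, the sketch below looks first for a Location header on a 3xx response and then for a “meta refresh” tag in the HTML. Treating meta refresh as the HTML-side mechanism is my assumption about what you will most often find there; redirects done in JavaScript or with frames are not detected. The class and function names are my own.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like: "0; url=http://somewhere/"
        _, _, rest = (attrs.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and self.target is None:
            self.target = value.strip().strip("'\"")


def find_redirect(status, headers, html):
    """Return (mechanism, target) if a redirect is found, otherwise None."""
    if 300 <= status < 400:
        for name, value in headers.items():
            if name.lower() == "location":
                return ("http-header", value)
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target:
        return ("meta-refresh", finder.target)
    return None
```

The status, headers, and HTML would come from whatever fetch tool you prefer; the function only interprets them.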

You are entitled to report not only the portal website, but the target site to which you have been redirected, although few people have the time or patience to track down these “hidden” sites.

Looking up (finally!) the owner of the address

Once you’ve (1) found an IP address for the spam website, (2) confirmed that it is a valid address for the site, and (3) concluded that the site is reportable, then you are ready to proceed with finding the service that controls the address. To find this service, and the abuse contact to whom you can report, you use the same techniques as you do for finding the parties responsible for spam mail hosts. Be aware that some spammers may put their websites in IP address blocks that they themselves control (i.e., they are listed in IP-whois as the owners of the blocks), so you may want to track down the upstream providers for these blocks.
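For completeness, here is a bare-bones sketch of an IP-whois lookup in Python. Whois is just a line of text sent over TCP port 43. The default server (whois.arin.net) is an assumption on my part; addresses outside ARIN’s region will need the registry that the reply refers you to. The heuristic for picking out abuse contacts (any e-mail address containing “abuse”) is also mine and is deliberately crude.

```python
import re
import socket


def whois_query(query, server="whois.arin.net", timeout=10):
    """Send one whois query over TCP port 43 and return the reply text."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def abuse_addresses(whois_text):
    """Pick out e-mail addresses in the reply that mention "abuse"."""
    found = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", whois_text)
    return sorted({addr for addr in found if "abuse" in addr.lower()})
```

You would pass the website’s IP address to whois_query and then run abuse_addresses over the reply; if nothing turns up, read the full reply by hand, since many records list their abuse contact in a remarks field or under another name.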





(c) 2003-2009, Richard C. Conner


Updated: Wed, 01 Apr 2009