home | legal stuff | glossary | blog | search

 Legend:  new window    outside link    tools page  glossary link   

Classic Spam: Link Spamming

The bottom line: Not everyone who wants to swap links with you deserves such consideration, and not everyone who visits your website does so with the intention to read it. This page doesn’t deal with e-mail spam per se (the messages dissected here probably don’t fall under my definition of spam), but I include it here anyway mainly for the information of my fellow webmasters.

Search engine optimization, or “SEO” (to use the inevitable Three-Letter Acronym), is the art of improving the placement of websites in search engine results. Since most folks’ first stop in surfing the web (and, perhaps more to the point, shopping on the web) is usually one of the major search engines, it is obviously very important for an online merchant to have his site place as highly as possible in the lists of results returned by these search engines in order to capture the most clicks.

There are, of course, both straightforward and devious means to pursue this goal. The straightforward approaches to the problem include providing good and relevant content on your pages, the use of descriptive page titles, careful organization of the pages (to facilitate “spidering” by the search engines), proper use of robots.txt to direct search-engine spiders in the manner you desire, and — perhaps most importantly — linking to other related sites and having them link to yours.

In the cases described here, however, the methods are a bit less up-and-up; these folks are interested only in gaming the system by indiscriminately peppering the web with links to their sites that they intend to be followed only by search-engine spiders. They are, in effect, attempting to leverage the high search rankings of other sites to boost the ratings of their own, without any real benefit to the linking sites. This is often known as link spam.

For the link spammer, it’s all about getting the maximum search-engine exposure for his website with the minimum cost (or, better yet, no cost at all). In most cases, the only way to make this happen, short of taking paid placements or ads, is to increase the “linkfulness” of the website by getting lots of other people to link to it. If the link spammer can circulate his URLs to lots of other websites, particularly those that already have fairly high placement themselves, then his own placement will generally tend to rise.

On this page, I’ll describe a couple of the many varieties of link spam, the kinds that have involved me personally: spammy link-exchange requests, and referrer spam.

Link-exchange requests

Suppose a stranger knocks on your door and announces that he's selling penis pills, and asks whether you would mind putting one of his signs on your lawn (for free, of course) in exchange for him putting one of yours on his. Does this sound like something you would want to do? Well, that's the kind of deal offered by the folks here.

Example one

If you operate a domain and have a website with any sort of ranking or visibility, you’re bound to get occasional messages like this:

Return-Path: <team@betterbreasts.org>
Received: from zaxxon.io.com (zaxxon.io.com [209.198.128.81])
 by postoffice.prismnet.com (8.13.4/8.13.3) with ESMTP
 id k2F0jffH080624 for address-hidden;
 Tue, 14 Mar 2006 18:45:41 -0600 (CST)
 (envelope-from team@betterbreasts.org)
Received: from whois-1.gkg.net (whois.gkg.net [216.217.56.138])
 by zaxxon.io.com (8.13.4/8.13.4) with ESMTP
 id k2F0jNm3083685 for address-hidden;
 Tue, 14 Mar 2006 18:45:40 -0600 (CST)
(envelope-from team@betterbreasts.org)
Authentication-Results: zaxxon.io.com from=team@betterbreasts.org;
 sender-id=neutral; spf=neutral
Received: from amavis.gkg.net (amavisd.gkg.net [216.217.56.20])
 by whois-1.gkg.net (Postfix) with ESMTP id 68258D9DD4
 for address-hidden; Tue, 14 Mar 2006 18:45:18 -0600 (CST)
Received: from semidedicated11.websitwelcome.com
 (unknown [70.85.158.50])
 by whois-1.gkg.net (Postfix) with ESMTP id 3C647D9DD2
 for whois contact address hidden;
 Tue, 14 Mar 2006 18:45:15 -0600 (CST)
Received: from s010600131009604b.no.shawcable.net
 ([70.67.134.178] helo=Bowie)
 by semidedicated11.websitwelcome.com with esmtpa (Exim 4.52)
 id 1FJK8b-0007Qg-5U
 for whois contact address hidden; Tue, 14 Mar 2006 18:45:17 -0600
From: "Kevin Gartland" <team@betterbreasts.org>
Subject: penis pills
To: "Rickconner Net" whois contact address hidden
Date: Tue, 14 Mar 2006 16:45:14 -0800
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - semidedicated11.websitwelcome.com
X-AntiAbuse: Original Domain - privatedomain.gkg.net
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - betterbreasts.org
X-Source:
X-Source-Args:
X-Source-Dir:
Message-Id: <20060315004515.3C647D9DD2@whois-1.gkg.net>
X-Virus-Scanned: ClamAV version 0.88, clamav-milter version 0.87 on zaxxon.io.com
X-Virus-Scanned: by amavisd-new at gkg.net
X-Spam-Score: 4.726
X-Virus-Status: Clean
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on zaxxon.io.com
Status:

Hello,

We visited rickconner.net and it seems like it would interest our
visitors. We have two herbal health sites -

http://www.optimumpenis.com - PR3
http://www.penisperfection.com - (Brand New)

We have resource directories on both and invite you to submit your
related site(s).

Our Directories are -
http://www.optimumpenis.com/directory/resources.html
http://www.penisperfection.com/directory/index.html

From them you'll find our link submission pages.

Thanks for your time,
Shannon Daniels

Here, the link spammers have decided that my site “...seems like it would interest our visitors,” and so they want me to “...submit [my] related site.” When I visited the “link submission pages,” however, I found that they were more interested in getting their link onto my site than vice-versa. If they could do this, then they could leverage my current search engine ratings to boost those of their own sites (which appear to be selling good old-fashioned penis pills).

Does it matter to them that my pages on penis pills and breast enlargement scams are hardly the sort of reading they would want to offer to prospective customers (since I’m highly critical of both)? Not really—their link to my site, if indeed they provide such, will no doubt be buried well inside the site and not prominently featured. As for my links to their site (were I to provide such), the only potential users of these links that they’re interested in are the search engine robots when these come a-spidering to see to whom I’ve linked. They couldn’t give a rat’s ass whether any humans actually use these links (although if they did, it would be so much more gravy for them).

At least in this case, these penis-pill pushers have actually come right out and asked me to make a link to them; they did not extend this courtesy to the numerous blogs on which they planted their links, according to my own Google searches for the domains in question. Since many blog systems are vulnerable to automated comment-planting (by which link-spam URLs can be introduced onto a “foreign” site), weeding out such link spam becomes a big headache for bloggers.

What ticks me off about this message is the fact that they wrote me at the contact address included in the whois data for this domain, rather than use the mail address I posted on the site. There doesn’t seem to be anything in ICANN policy that prohibits the use of this information for one-on-one marketing contacts (although collecting whois data in bulk for the development of spam lists is definitely off-limits), but this just strikes me as wrong. If they were really interested in my site and the content thereof, they should have used the contact info I provided on the site, and not the domain contact. After all, the domain owner for a website may not necessarily be responsible for the content that “...seems like it would interest our visitors” (e.g., the registered owner of the geocities.com domain is not responsible for the content of the thousands of individual sites hosted under this domain, so it would be pointless to write to this party about any given site).

Also notable here is the use of a forged HELO (semidedicated11.websitwelcome.com), a non-existent domain name. The mail actually came through 70.85.158.50, in the domain of the ISP ThePlanet.com; this is the same address at which we find both of the sites mentioned above, as well as their authoritative DNS servers, so this must be one very busy host. The domains mentioned here are mostly registered to various parties in and around Liverpool in the UK. Do I want to do a link exchange with a header forger? I don’t think so.
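The comparison made above (the name a sending host claims versus the address it actually connected from) can be sketched in a few lines of Python. This is only a rough illustration: the function name parse_received and the regular expression are my own assumptions, and the pattern covers only the "from NAME (RDNS [IP])" shape that Postfix and sendmail wrote in the headers shown earlier. Other mail servers (such as the Exim hop in the same message) format this line differently and will simply not match.

```python
import re

# Assumed pattern: "from CLAIMED-NAME (REVERSE-DNS-NAME [IP])", as seen in the
# Postfix and sendmail Received lines quoted in this example. Not a general parser.
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+\((?P<rdns>\S+)\s+\[(?P<ip>[0-9.]+)\]\)"
)

def parse_received(header_value):
    """Return (claimed_name, reverse_dns_name, ip), or None if the shape differs."""
    # Unfold the header: continuation lines are joined with single spaces.
    unfolded = " ".join(header_value.split())
    match = RECEIVED_RE.search(unfolded)
    if match is None:
        return None
    return match.group("helo"), match.group("rdns"), match.group("ip")

# Received line copied from the message above (trailing fields trimmed).
sample = """from semidedicated11.websitwelcome.com
 (unknown [70.85.158.50])
 by whois-1.gkg.net (Postfix) with ESMTP id 3C647D9DD2"""

claimed, rdns, ip = parse_received(sample)
print("claimed name:", claimed)
print("reverse DNS :", rdns)
print("connected IP:", ip)
```

The receiving server recorded "unknown" where a confirmed host name would normally appear, so the IP address in brackets is the only part of this line worth trusting. A mismatch by itself is a weak hint rather than proof, since plenty of legitimate mail servers also announce names that do not resolve back to them.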

Example two

Here’s the body of another recent link spam solicitation I received. It seems excessively tricky for my tastes, since it uses both a redirect link (from www.webpark.ru) and a web bug (which I have defaced):

I have found your website rickconner.net by searching Yahoo for "how a webserver has a url name instead of a ip address". I think our websites has a similar theme, so I have already added your link to my website.

Your link: http://www.rickconner.net/spamweb/tools.html
Your link title: Rick's spam digest :: {attic} Spam analysis tools

You can view the page where your link was added, approve and modify your listing anytime by clicking link below:
http://www.webpark.ru/go.php?url
   =http://www.netihotell.net/approve.php?key=
 web bug code hidden 

If you will not approve your listing in 10 days, link will be automatically removed.

This message was addressed to my webmaster@ address, which I do not use; however, this is far closer to being a correct contact address for me than my domain-whois info. One could reasonably assume that mailing to this address would actually reach someone responsible for the content of the site, which is not the case when you use domain-whois contact addresses.
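The redirect link in the message above hides its real destination inside a query parameter. A minimal sketch of how to peel it apart without clicking it, using only Python's standard urllib.parse module, follows. The helper names unwrap_redirect and target_host are illustrative assumptions, as is the idea that the parameter is always called "url" (it is in this message). The key value was hidden in the original, so it is left empty here as well.

```python
from urllib.parse import urlsplit, parse_qs

# The link from the message, joined back onto one line; the key stays elided.
LINK = "http://www.webpark.ru/go.php?url=http://www.netihotell.net/approve.php?key="

def unwrap_redirect(link, param="url"):
    """Return the destination carried in the given query parameter, or None."""
    query = urlsplit(link).query
    values = parse_qs(query).get(param)
    return values[0] if values else None

def target_host(link, param="url"):
    """Return the host name of the wrapped destination, or None."""
    destination = unwrap_redirect(link, param)
    return urlsplit(destination).hostname if destination else None

print(unwrap_redirect(LINK))
print(target_host(LINK))
```

Nothing here contacts either server; it only splits the text of the link, which is enough to see that the visible host and the real destination are two different sites.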

Referrer spam

Another way in which spammers try to steal bandwidth from others to promote their own websites is known as referrer spam. Again, the idea is for the spammer to embed his links within other higher-ranking websites using automated means (referrer logs, in this case) so as to raise the page ranks of his own sites.

First, let me explain about web server logs and referrer data. If you run a website and you’re like me, you like to read through your web server logs to find out who’s been by and whence they came. The most interesting field in the server log for this purpose is the “referrer” field, which simply contains the URL that the user’s browser was displaying when he requested your page.

In most cases, the URL in the referrer field of a log record indicates some website from which your page was linked, or else perhaps a search-engine query that turned up your page. By reading these records, you can get an idea of who's linking to you, or what people are searching for when they find your site.

But what if you find something in your logs that looks like this?

85.17.35.46 - - [18/May/2006:11:26:25 -0500] "GET /spamweb/figleaves.html HTTP/1.1" 200 18384 "http://www.lp.etilaazak" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)"
85.17.35.46 - - [18/May/2006:11:26:25 -0500] "GET /spamweb/figleaves.html HTTP/1.1" 200 18384 "http://www.lp.etilaazak" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)"
85.17.35.46 - - [18/May/2006:11:26:26 -0500] "GET /spamweb/figleaves.html HTTP/1.1" 200 18384 "http://www.lp.etilaazak" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)"

[ 77 more-or-less identical records omitted ]

85.17.35.46 - - [18/May/2006:11:26:31 -0500] "GET /spamweb/figleaves.html HTTP/1.1" 200 18384 "http://www.lp.etilaazak" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)"

One would gather from these records that a Windows 98 user at IP address 85.17.35.46 used Microsoft Internet Explorer 5.01 to request my page figleaves.html a total of 81 times in the space of six seconds (that’s over 13 hits per second!). Nobody that I know can click a mouse that fast (particularly where Windows 98 is involved), so we’re obviously looking at some sort of automated program here; this conclusion is backed up by the fact that this client never requested any of the other items (style sheets, pictures, etc.) linked from that page of mine.
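The same reading of the log can be done mechanically. The sketch below assumes the records are in the usual "combined" log layout shown above; the regular expression and the names parse_line and summarize_by_ip are my own, not part of any log-analysis tool. It uses only the four records actually printed on this page (the 77 omitted ones are not reconstructed), so the count it reports is 4 rather than 81.

```python
import re
from datetime import datetime

# Assumed layout: ip, two unused fields, [timestamp], "request", status, size,
# "referrer", "user agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Split one log record into named fields, or return None if it does not fit."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

def summarize_by_ip(lines):
    """Map each client IP to (hit_count, seconds between first and last hit)."""
    stamps_by_ip = {}
    for line in lines:
        record = parse_line(line)
        if record is not None:
            stamps_by_ip.setdefault(record["ip"], []).append(record["time"])
    return {
        ip: (len(stamps), (max(stamps) - min(stamps)).total_seconds())
        for ip, stamps in stamps_by_ip.items()
    }

TAIL = ('"GET /spamweb/figleaves.html HTTP/1.1" 200 18384 '
        '"http://www.lp.etilaazak" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)"')
# The four records shown above; only the timestamps differ.
SAMPLE_LINES = [
    "85.17.35.46 - - [18/May/2006:11:26:%s -0500] %s" % (second, TAIL)
    for second in ("25", "25", "26", "31")
]

print(summarize_by_ip(SAMPLE_LINES))
print(81 / 6)  # hits per second for the full run described in the text
```

Run against a complete log, a summary like this makes the bursts stand out at a glance: a human visitor rarely produces more than a handful of requests for the same page within a few seconds.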

The referrer URL for each of these records is the same: a software developers’ site in Poland (I mangled the host name in the log records above for reasons that will shortly become clear).

Just what exactly is the point of this, besides bloating up my log files (which fortunately aren’t included in my shell account disk quota)? Simple — referrer spam.

It turns out that many bloggers (and other webmasters) like to use server log analysis tools to generate online listings of referrers so that they can learn more about who’s visiting their sites; often, these referrer logs are linked publicly so that others (including search engines) can see them as well. What the referrer spammer is doing is trying to embed his URL into these public referrer logs.

Just as in the link-exchange messages described above, the idea isn’t so much to get human beings to click on the links as it is to get search engines to “spider” the pages and collect these links so as to boost the page ranks of the spammer’s site.

It is very easy to forge an HTTP request to include a fake referrer URL; you don’t even need a web browser to do this, since a simple script (Perl, Visual Basic, etc.) will suffice. In fact, you have to use such a program to get the kinds of “hit rates” we see here.
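To see why the referrer field deserves so little trust, it helps to look at where it comes from: it is just another header that the client fills in for itself (HTTP spells the header name "Referer"). The sketch below builds a request object with Python's standard urllib.request module and sets that header to an arbitrary value. Both URLs are placeholder assumptions on reserved example domains, and the request is only constructed and inspected, never sent.

```python
from urllib.request import Request

# Placeholder values for illustration only; neither is taken from the logs above.
PAGE_URL = "http://example.com/some/page.html"
CLAIMED_REFERRER = "http://example.org/not-really-where-i-came-from"

# The client chooses the Referer value; the server has no way to verify it.
request = Request(PAGE_URL, headers={"Referer": CLAIMED_REFERRER})

print(request.full_url)
print(request.get_header("Referer"))
```

The server simply writes whatever arrives in that header into its log, which is why anything published from the referrer field should be treated as unverified input.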

Finding one or two such records in your own server logs isn’t worth your throwing a fit over, but many of these guys are like hungry piglets elbowing their siblings away from the sow’s teat: they can make dozens or even hundreds of page requests in a very short time, thereby tying up the web server with pointless transactions that prevent others from getting to the site (and might even overstress server-side tasks such as CGI or SSI that might be associated with the page). This amounts to a denial-of-service attack, albeit one with (fortunately) short duration. Such large numbers of requests can also increase the costs for operators of high-volume websites who are often billed based on the bandwidth their websites consume.

In other words, the referrer spammer is stealing bandwidth (and possibly even cash money) from the website operator twice: once when he makes the bogus HTTP queries, and again when the search engines pull these multiple instances of the referrer URL from the victim’s referrer logs.

What can you do about referrer spam? Not all that much, unfortunately. If you maintain a public referrer log, you may want to pare these entries out by hand if you can, or else just stop publicizing the referrer data (i.e., keep referrer logs in a private space where you can see them but others cannot).
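If you would rather keep a public referrer listing than abandon it, the paring-out can be partly automated. The following is a minimal sketch under a simple assumption of my own: any client that makes more than a chosen number of requests is treated as a robot, and its referrers are left out of the published list. The function name publishable_referrers, the threshold of 20, and the second visitor (an address from a documentation-only range with an example.org referrer) are all hypothetical; only the first IP address and its referrer come from the log shown earlier.

```python
from collections import Counter

def publishable_referrers(records, max_hits_per_ip=20):
    """Return sorted unique referrers, skipping clients that exceed the hit limit.

    records is an iterable of (client_ip, referrer) pairs.
    """
    records = list(records)
    hits = Counter(ip for ip, _referrer in records)
    noisy = {ip for ip, count in hits.items() if count > max_hits_per_ip}
    return sorted(
        {referrer for ip, referrer in records if ip not in noisy and referrer}
    )

# 81 hits from the spammer's address, plus one hypothetical ordinary visitor.
records = [("85.17.35.46", "http://www.lp.etilaazak")] * 81
records.append(("192.0.2.10", "http://example.org/links.html"))

print(publishable_referrers(records))
```

A fixed threshold is crude, and a patient spammer who spreads his requests out will slip under it, so this is a way to cut down the hand-weeding rather than a replacement for it.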

You really can’t stop people from trying to fetch files from your server, even if those files are nonexistent or inaccessible to the public, or even if the same party has requested the same file many times in a short space of time. You could block the particular IP address used for the attack from future use of your website (e.g., by putting that IP in a hosts.deny file on your server box, or using Apache configuration directives to block them), but this probably won’t be effective, and might prevent legitimate users of the same IP address from visiting your site (e.g., if the address is in a dynamic IP pool for broadband or dialup users, or is a web proxy serving a number of users).
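The trade-off described above can be made concrete with Python's standard ipaddress module. This is a sketch of the blocking decision only, not of any particular server's configuration syntax; the name is_blocked and the choice of networks are assumptions for illustration.

```python
import ipaddress

# Block only the single address seen in the log above (a /32 is one host).
BLOCKED = [ipaddress.ip_network("85.17.35.46/32")]

def is_blocked(client_ip, blocked=BLOCKED):
    """True if the client address falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in blocked)

print(is_blocked("85.17.35.46"))  # the address from the log
print(is_blocked("85.17.35.47"))  # its neighbor is unaffected

# Widening the block to the surrounding /24 catches far more than one client.
wider = ipaddress.ip_network("85.17.35.0/24")
print(wider.num_addresses)
```

Blocking one address is precise but easy for the spammer to sidestep by moving to another; blocking a whole range is harder to sidestep but shuts out every legitimate visitor who happens to share it.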





(c) 2003-2007, Richard C. Conner


Updated: Sat, 18 Aug 2007