This page offers a glossary of selected terms you may hear in connection with spam. Many of these definitions were gleaned from other sources, but were rewritten by me, so any mistakes or misattributions are most likely my fault and can be reported by contacting me at .
A type of advance-fee fraud propagated via e-mail.
The department of an ISP or other network service that deals with complaints of network abuse (including spam).
Typically, although not always, they can be reached at the e-mail address "abuse@xxx.yyy" where "xxx.yyy" is the name of the domain in question. You can use a particular kind of WHOIS lookup to find out for sure.
The policies published by an internet service provider (ISP) that set forth what it allows (and forbids) its customers to do with the service it provides.
Not all ISPs have a published AUP, but most serious retail ISPs do, and they also make these part of their customers’ contracts (so that the ISP has grounds to terminate service if the customer violates the AUP). AUPs generally include some provisions barring spam; some go as far as to forbid the customer using spam to promote services (like websites) offered using their facilities, even if the spam itself does not pass through their facilities.
A type of fraud (often perpetrated via e-mail) in which the victim is offered large amounts of cash with no apparent risk on his own part. Before he can get any of the money, however, the victim is required to fund bank accounts or pay advance fees that the fraudster will simply steal.
(see also Nigerian 419 scam)
The advance-fee fraudster’s pitch is calculated to appeal to the greed and ignorance of the target, and often seeks to make the target himself think that he’s doing the swindling, and not the other way round. The bite may take any of various forms:
In internet commerce, a person who receives money or other considerations in exchange for using his internet resources to direct business to other firms.
For example, if you enter into an affiliate relationship with a web merchant, you put a special link to the merchant’s website on your own site or in your outgoing e-mails; you may then receive cash, merchandise, or discounts based on the volume of sales your links have generated for the merchant. Many perfectly legitimate and respectable businesses (like amazon.com) use the affiliate model to great advantage, but spamvertisers and mainsleaze businesses often pretend to be ignorant of the spam activities of their affiliates, whom they use to amplify their mail coverage (and distribute the blame).
An alternate name, registered within DNS, for an internet host.
For example, a web server host at capuchin.monkey.foo might be assigned the alias www.monkey.foo (and the bare domain name monkey.foo might also be set as an alias for www.monkey.foo, so that surfers who like to omit the “www.” from their URLs can still reach the main web server host). The aliases and the “real” host name should all be resolvable by DNS, and one or more of them may also appear in a reverse-DNS lookup of the IP address.
(“American Standard Code for Information Interchange”)
One of the oldest and most common character sets used in computing today (it dates from the 1960s), and the one on which internet e-mail standards are based. Any text that appears in an e-mail message must be ASCII, or else must be identified (using MIME) as being of some other character set.
See also: binary.
ASCII assigns an individual character value (or “glyph”) to each of the numeric codes 0 through 127, which makes it a “seven-bit” code (i.e., 2^7 = 128 values), although an individual ASCII character is generally stored in eight bits (one byte or octet) with the high bit masked off (set to zero). ASCII only supports the letters, numerals, and punctuation marks commonly used in American English (plus a set of basic data link and carriage control marks); in order for an e-mail message to be able to contain accented characters, foreign alphabets or glyph-sets, and other specialized characters, the message must be encoded using MIME.
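As a rough illustration of the seven-bit range described above, the following Python sketch checks whether a block of bytes is pure ASCII and shows what masking off the high bit does to a non-ASCII byte. The helper names are made up for this example.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte falls in the ASCII range 0..127."""
    return all(b < 128 for b in data)


def mask_high_bit(data: bytes) -> bytes:
    """Clear the eighth (high) bit of each byte, leaving a 7-bit value."""
    return bytes(b & 0x7F for b in data)


plain = "Hello".encode("ascii")
accented = "café".encode("latin-1")  # the e-acute becomes byte 0xE9 (233)

print(is_ascii(plain))          # True
print(is_ascii(accented))       # False
print(mask_high_bit(accented))  # b'cafi' (0xE9 & 0x7F == 0x69, the letter "i")
```

The last line shows why non-ASCII text cannot simply be pushed through a seven-bit channel: the accented letter silently turns into a different character.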
(see acceptable use policy)
The IP address returned by an authoritative name server for a given host name or alias. The address returned by a local name server is usually not authoritative because it probably came from the local server’s cache and not directly from the authoritative name server (and thus might be outdated by a few minutes or hours).
Reporting spam websites usually requires you to find their authoritative addresses, since spammers can use various tricks to confuse matters.
New web domains, when they are entered into DNS, must be assigned one or more authoritative name servers (usually at least two are provided, preferably in different IP blocks). These authoritative servers are where the rest of the world will be sent when looking for hosts in this new domain.
Local name servers (such as those used by the typical internet user for everyday transactions) get their info from the authoritative name servers for each domain, and usually retain this information for a time in their cache (memory) before “refreshing” it with another authoritative lookup.
If a spammer can set up his own authoritative name servers for his website domains, he can use these to rapidly change the apparent address of his websites among large "botnets" of compromised home computers (see IP rotation).
(short for “automatic responder”)
A machine or software program that indiscriminately sends automatic replies to all incoming messages.
Mail systems often use autoresponders for “vacation messages” or other out-of-office notices, mailing list management systems (and other systems that are controlled by commands sent by e-mail), or challenge-response spam filters. Generally, the autoresponders simply mail their responses back to the from-address of the incoming message, without considering whether this address may have been stolen or forged.
You should not use an autoresponder for any purpose unless absolutely necessary, because these can result in unwanted “blowback” for those whose e-mail addresses have been, eh, “borrowed” by spammers.
A form of MIME encoding that converts large blocks of possibly binary data into larger blocks of ASCII text data (using a numerical technique called “base64”) for safe transmission over e-mail systems.
Since base64 can be applied to pure text as well as binary or mixed data, and since it will disguise the text content of the data, it is often used by spammers to hide the content of their messages from lazy spam filters.
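To see how base64 disguises ordinary text, here is a minimal Python sketch using the standard library's base64 module. The sample phrase is invented for illustration.

```python
import base64

text = "Buy cheap pills now"
encoded = base64.b64encode(text.encode("ascii"))

# The encoded form is pure ASCII, but the original words are no longer visible.
print(encoded)
print(b"pills" in encoded)  # False: a naive keyword match finds nothing

# Decoding recovers the original text exactly.
print(base64.b64decode(encoded).decode("ascii"))  # Buy cheap pills now

# The encoded block is larger than the original (4 output bytes per 3 input bytes).
print(len(text), len(encoded))  # 19 28
```

A filter that wants to inspect such a message has to decode the body first, which is exactly the step a lazy filter skips.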
A form of content-based spam filter named for its application of the Reverend Thomas Bayes’ famous theorem of conditional probabilities. Such a filter uses Bayesian inference to determine whether or not a message may be spam.
A Bayesian filter looks at individual words (or “tokens”) in a message and “weighs” them for their frequency of appearance in spam versus their frequency of appearance in general, non-spam e-mail; when the weights of all these words are combined using the Bayes equation, the result is an estimate of the probability that a message is spam.
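The combining step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-token probabilities below are invented (a real filter would derive them from large collections of spam and non-spam mail), unknown words are given a neutral 0.5, and the combining formula is the simple product form popularized by Paul Graham's essay, not necessarily what any particular mail client uses.

```python
from math import prod

# Hypothetical per-token probabilities that a message containing the token is spam.
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "free": 0.85,
    "meeting": 0.10,
    "tuesday": 0.20,
}

NEUTRAL = 0.5  # assumed value for tokens the filter has never seen


def combine(probabilities):
    """Combine individual token probabilities into one overall spam estimate."""
    spam = prod(probabilities)
    ham = prod(1.0 - p for p in probabilities)
    return spam / (spam + ham)


def spam_score(message):
    """Split a message into lowercase words and score it."""
    tokens = message.lower().split()
    return combine([TOKEN_SPAM_PROB.get(t, NEUTRAL) for t in tokens])


print(spam_score("free viagra"))      # close to 1
print(spam_score("meeting tuesday"))  # close to 0
```

Tokens with a probability of 0.5 drop out of the ratio, so only words with some track record in spam or non-spam mail move the score.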
One distinctive feature of the Bayesian approach is that it relies on objective analysis of actual data (i.e., large, frequently-updated compendia of known spam mail) rather than on some filter-writer’s supposition of what is in spam and what isn’t, and proponents of Bayesian filters claim very effective filtering with low false positives and negatives.
Many of the filters now found on popular mail clients (such as Apple Mail) are based on Bayesian principles.
The most often cited treatise on Bayesian mail filtering (which provides a very readable overview of the technique) is Paul Graham’s “A Plan for Spam,” found at http://www.paulgraham.com/spam.html.
(also “web bug”)
A special type of hyperlink planted in an HTML-formatted spam message; it is used to signal back to the spammer which of his recipients specifically have opened or responded to the mailing. The beacon URL is designed to be undetectable to the recipient, and may require no effort on the recipient's part in order to "fire."
Beacon URLs usually take the form of invisible images or undefined anchors that have clear or encrypted data appended to them; these data will show up in the spammer’s web server logs or private databases when the link is fetched, and can be used for list laundering as well as for measuring the penetration of a spam remailer’s campaigns (so he can deliver a bigger bill to his spamvertising customer). Most web bugs use the "<IMG SRC=...>" HTML tag, since this tag causes an automatic fetch of the image each time the page is displayed (i.e., you can’t stop it being transmitted).
A typical beacon URL found in a spam message might look something like:
when the image is fetched by your browser or mail program, the data in the call (after the question mark) is recorded in the spammer’s web server logs.
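To illustrate how the data after the question mark can be read back on the receiving end, here is a small Python sketch. The URL and its parameter names are invented for illustration and are not taken from any real spam message.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical beacon URL; "id" and "list" are made-up parameter names.
beacon_url = "http://images.example.com/pixel.gif?id=12345&list=7"

parts = urlsplit(beacon_url)
data = parse_qs(parts.query)

print(parts.path)  # /pixel.gif
print(data)        # {'id': ['12345'], 'list': ['7']}
```

Whoever operates the web server sees the same values in the request line, which is all it takes to tie a fetch of the image back to one recipient.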
In the context of e-mail systems, describes a block of data that should not be treated as ASCII text.
While “binary” has a specific meaning to mathematicians and computer scientists (which I won't cover here), in the world of e-mail processing it generally just means “not ASCII.” Binary data may contain byte values that are outside the legal ASCII range (i.e., greater than 127), and even the legal values the block contains may not be meant to be interpreted as ASCII text.
Binary data may represent text in some other character set besides ASCII, or they may be non-text information (e.g., pictures, MP3 files, or proprietary application documents like spreadsheets). They may also represent machine-language code (like executable programs or libraries). The basic protocols for composing and transmitting e-mail do not allow the transmission of non-ASCII data at any point; in order to include such data in e-mail messages, the data must be encoded to a text-like form using MIME.
(not to be confused with block list)
See also: whitelist.
Blacklisting is pretty ineffective at catching spam since it depends upon the from-address given by the spammer. The spammers seldom re-use these forged or stolen addresses, so adding them to a blacklist could be a big waste of time.
(from the Hollywood Western cliché in which “bad guys always wear black hats”)
Describes an internet provider that does not restrict the spamming activities of its customers, or actively colludes with them (e.g., “I had to quit using that remailing service, their hats were very black.”)
People will sometimes ask for a “hat color check” on a particular internet service; they are seeking to know whether the service has a reputation for effective spam control.
See also: white hat.
(also blocking list or DNSBL; not to be confused with blacklist)
A database of IP addresses (or sometimes web URLs) suspected of being involved in spam or other abuse; generally a blocklist is not directly used by end-users, but is instead queried by mail hosts using so-called DNSBL procedures in order to reject or tag probable spam messages at the time they are offered for delivery.
Today, blocklists form the core of the most effective ISP-based spam filtering systems; when effectively managed and properly used, they can enable ISPs to reject or detain 90% or better of incoming spam mail.
There are dozens of blocklists maintained by many organizations (and even individuals), and mail hosts can query these blocklists before allowing untrusted hosts to leave mail messages for them. If the sender’s IP appears in the blocklist, the mail host can reject delivery of the message, so that it will not appear in the recipient’s inbox. Alternatively, the mail host can simply tag the message as spam (perhaps by inserting “[SPAM]” into the subject line) so that the recipient can take action on his own.
Many of the most useful blocklists follow automated or semi-automated procedures to add addresses (e.g., mail sent to spam trap addresses), and also generally delete them from the list according to some automated procedure (e.g., after some time period free from offending mail from the address). This largely removes the elements of personal vindictiveness and human inertia from the picture.
Blocklists are somewhat controversial. Despite the fairly rigorous procedures that most blocklist operators follow for naming addresses or blocks of addresses to their lists, innocent or quasi-innocent providers’ addresses are occasionally added; bitter complaints and even lawsuits often ensue. Blocklist operators are often accused (often by spammers, no surprise) of being “vigilantes” or “censors” although it is the action of the blocklist user, rather than the blocklist operator, that results in denial of service to blocklisted addresses. Blocklist users depend upon the probity and good judgement of the blocklist operators to make sure that the blocking is legitimate.
Automated responses sent indiscriminately to innocent parties in response to incoming spam e-mails.
The term “blowback” describes various kinds of automatically-generated e-mail messages, including vacation messages, challenge-response notifications, MDA bounces, and other “autoresponder” mails. These messages are typically sent back automatically and immediately to the return-path address of each incoming message. In the case of spam, the return paths are almost always false, and may belong to innocent third parties who suddenly find themselves receiving dozens (or more) of mysterious responses to messages they never sent. Blowback is thus an indirect nuisance resulting from spam.
(short for 'robot')
The term "bot" has two different meanings in the context of spamming.
A person who organizes a botnet.
(short for “robot network”)
A term usually applied to a group of open proxy computers (“zombies”) used for spam transmission and related tasks. The operator of a botnet can use its services for his own spamming, or can rent them to other spammers for profit.
(more formally called non-delivery notification (NDN))
See also: reject.
In theory, the mail exchanger should reject the transfer of a message outright if it can’t be delivered (this is sometimes called an "SMTP reject"); however, some MXs simply punt this job to the MDA. Since the MDA can’t “reject” the mail (it’s already been accepted by the MX), the best it can do is to send a bounce message back to the return-path address indicating that the mail is undeliverable. When spammers forge other people’s e-mail addresses into their messages, these people often receive hundreds or thousands of bounce messages from ill-configured mail systems (a form of blowback).
The service offered by a hosting provider who claims to be able to host websites (i.e., for spammers) that will not be shut down due to AUP violations, questionable content or use of spam marketing.
(“Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003”)
The identifier of a bill signed into federal law by U.S. President George W. Bush in late 2003, intended to make spamming a federal crime within the United States.
CAN-SPAM is supposed to limit spam by making it a federal crime, but in practice it permits opt-out spamming (provided certain conditions are met), attempts to invalidate some tougher anti-spam laws already in effect in individual states, and does not effectively deal with the problem of off-shore spamming and spam website hosting. Although wags started calling it the “YOU CAN SPAM” act almost as soon as it came into effect, the law may well have had some success in curtailing spam, particularly among mainsleaze operations. Some prosecutors are relying upon more concrete indictments, such as wire fraud or computer crime, to go after some of the more recalcitrant spammers.
(“Completely Automated Public Turing test to tell Computers and Humans Apart”)
A mechanism used on many websites to ensure that users are human beings and not software “robots” bent on abuse. The website visitor is required to transcribe some text from a complex, distorted graphical image into a web form before proceeding with his business.
See also: (official website for CAPTCHA)
Developed by researchers at Carnegie Mellon University, CAPTCHAs are based on the supposition that automated tools (such as spam harvesters) cannot reliably decode text from distorted images, a task that humans can usually do without much trouble (early CAPTCHA tests were rendered ineffective when attackers found means to get their programs to decode the images; modern tests apply more extensive text distortion and visual noise to make automated decoding much more difficult).
CAPTCHAs are a sort of inverse of the classic Turing test, which is designed to test whether a human can discriminate between a conversationalist who is another human and one that is a machine (only in the case of the CAPTCHA, it is the machine that must do the discriminating).
CAPTCHAs are deployed to prevent automated activity on websites that are intended only for use by humans. For example, many domain registrars protect their web-based WHOIS lookups with CAPTCHAs, as do many providers of other online tools that might be used for spam harvesting or other forms of abuse. Individual users can also deploy CAPTCHAs to protect web logs or message boards from being flooded with automated traffic.
(probably from “cartoon attorney”)
A loud (but usually empty) threat of legal action made by a spammer against those who seek to limit his operations. The connotation is that a “cartoon attorney” (i.e., an imaginary one) is the only sort of legal counsel at the spammer’s disposal.
Typically, the cartooney is directed at blocklist operators that have listed the spammer’s addresses for blocking, or ISPs that have taken action to block mail from the spammer. The spammer may also target individuals who post information, allegations, or threats in public fora (web logs, usenet groups, etc.).
Most cartooneys are nothing more than crude and blustery attempts to intimidate. However, spammers do sometimes make good on their threats of litigation against individuals, particularly when they think they can deflect attention away from their own abusive activities and onto possibly ill-advised or intemperate responses of the defendants (such as when someone posts threats or exposes private information about the spammer). An excellent reason to remember my spam rules #5 and #6.
An e-mail address to which any incoming e-mails for a domain are delivered if the actual address does not exist. Commonly used with small virtual-domain setups.
For example, mail to email@example.com or firstname.lastname@example.org or even email@example.com might be directed to the catchall address firstname.lastname@example.org if these addresses were not in operation at the foo.bar domain.
While catchall addresses are a well-intentioned feature (ensuring that any mail directed to a domain can be received and directed to someone in charge, even if the address is misspelled), they cause problems when mixed with spam and other forms of mail abuse. For example, if a spammer decides to forge non-existent addresses in the foo.bar domain into his messages, then hundreds or thousands of bounces for all of these messages will be sent back to the victim's catchall address.
If you operate a private domain, you would be well advised to have your hosting or mail service disable catchall addresses unless you have a good reason to use them.
The Coalition Against Unsolicited Commercial E-mail (http://www.cauce.org/), an anti-spam organization.
(commercial electronic mail message)
A synonym for commercial e-mail (which may or may not be spam), coined in the CAN-SPAM act, and used virtually nowhere else.
A fraudulent e-mail (or postal letter, fax, etc.) that invites you to send money to people on a list, and then add your own name to the list and forward the modified message to others (so that they can send money to you).
The typical chain letter usually promises extravagant returns to the participant (although the originators are the only ones likely to make any significant money, owing to the laws of geometric propagation and market saturation). The chain letter often uses the exchange of trivial goods for money (such as “e-books” on how to spam) as a sort of fig-leaf, although this doesn’t prevent it from being considered mail fraud if it involves postal mail at any point (e.g., for sending the funds). Considered by law enforcement folks to be a form of gambling, the chain letter is a very old scam that predates e-mail. Sometimes also known as an MMF (more information here).
(also “C/R filter”)
An “active” sort of spam filter that works by blocking unknown senders’ mail from delivery until they prove that they are well-intentioned human beings and not spammers or robots.
See also: my comments on challenge-response filters (elsewhere on this site).
Challenge-response filters have gained some traction in recent years among individuals and even a few ISPs. In the C/R scheme, e-mail from an untrusted source is temporarily withheld from delivery to the intended recipient; a “challenge” message is automatically sent back to the from-address of the message, asking for a trivial “response” (e.g., replying to the challenge message, or visiting a web link), and if such a response is received, the sender’s message (and usually any further message from the sender) is released for delivery. The C/R technique depends for its effectiveness upon the fact that most spams have invalid, stolen, or unmonitored from-addresses, and so challenging them will likely never yield a response.
Challenge-response filters may seem to work for their users, but they are a nuisance to almost everyone else (with the exception of the spammer, who suffers no more than to lose a delivery or two).
(originated among usenet anti-spammers)
A very derogatory term for a particularly obvious or maladroit spammer (usually of the “network marketing” or “MLM” variety).
The chickenboner, according to a popular imaginary word-portrait, is a “fat bald redneck” who taps away at his computer surrounded by buckets of rotting carry-out chicken bones. This term was intended to apply to the sort of stupid, inexperienced, or low-rent spammers who try to pass themselves off (falsely) as successful, wealthy, and credible business people, but it may not apply to many in the current cohort of professional spammers (who often really are successful and wealthy, and anything but stupid).
(also “closed-loop opt-in”)
Characteristic of an e-mail list that requires you first to request to be added, and then to confirm separately that you want to be added, before you can begin receiving mail.
Confirmed or “closed-loop” opt-in is the best way to run an ethical mailing list, since it virtually eliminates the possibility that a recipient could be sent mail when he did not ask for it.
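The two-step signup can be sketched as a tiny Python class. This is a hypothetical illustration, not the workings of any particular list manager: the class and method names are invented, and the step of actually mailing the token to the address is left out.

```python
import secrets


class ConfirmedOptInList:
    """Toy mailing list that only adds addresses after a separate confirmation."""

    def __init__(self):
        self.pending = {}     # token -> address awaiting confirmation
        self.members = set()  # confirmed addresses only

    def request(self, address):
        """Step 1: record the request and return the token to mail to the address."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = address
        return token

    def confirm(self, token):
        """Step 2: add the address only if the token matches a pending request."""
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.members.add(address)
        return True


mailing_list = ConfirmedOptInList()
token = mailing_list.request("reader@example.com")
print("reader@example.com" in mailing_list.members)  # False: not yet confirmed
print(mailing_list.confirm(token))                   # True
print("reader@example.com" in mailing_list.members)  # True
```

Because the token travels only to the requested address, someone who signs up a third party's address cannot complete the second step on that person's behalf.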
A spam filter that depends upon analysis of a message’s content (in the body or subject line) to determine whether the message may be spam.
The simplest content filters may just look for certain words or patterns (like “viagra” or “S.1618”), but these are not as effective as content filters using more complex techniques such as Bayesian filtering.
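A word-and-pattern filter of the simplest kind might look like the following Python sketch. The pattern list and function name are assumptions made for this example.

```python
import re

# Hypothetical pattern list built from the two examples mentioned in the text.
PATTERNS = [
    re.compile(r"\bviagra\b", re.IGNORECASE),
    re.compile(r"\bS\.\s?1618\b", re.IGNORECASE),
]


def looks_like_spam(subject, body):
    """Return True if any pattern appears in the subject line or body."""
    text = subject + "\n" + body
    return any(p.search(text) for p in PATTERNS)


print(looks_like_spam("Hello", "This message complies with S.1618"))  # True
print(looks_like_spam("Lunch?", "See you at noon"))                   # False
print(looks_like_spam("Special offer", "Cheap v1agra here"))          # False
```

The last call shows the weakness of this approach: a trivial misspelling slips straight past a fixed pattern.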
(“cascading style sheet”)
A technique used in website design to provide precise control of the placement and appearance of text, images, and other elements on a web page.
Some of the more advanced spammers employ CSS tricks to obfuscate their messages or to plant web-bugs.
(Dynamic Host Configuration Protocol)
A protocol (defined in IETF RFC 2131) whereby a host computer connects to a server and receives assignment of an IP address, along with other useful items (such as the names of network gateways and name servers). DHCP automates the addition of computers (and other devices) to local networks, and makes efficient dynamic addressing possible.
(also known as “directory harvest attack” (DHA) or “MX probe”)
A type of attack in which the spammer repeatedly attempts to deliver empty messages to large numbers of addresses within a given domain in order to learn which among these addresses are deliverable.
See also: How a dictionary attack works (elsewhere on this site).
The dictionary attack is a very frequently used means by which spammers can collect deliverable addresses for their spam lists. The spammer can use a dictionary attack to test any addresses he has for a domain, even those he may simply have guessed at. For example, the spammer could contact the MX host for the domain xxx.yyy and try to deliver a message to “email@example.com;” he would have a fair probability of actually reaching someone at this address. It’s as though the spammer is throwing a whole “dictionary” of possible addresses at the MX host.
Typically, the dictionary attack makes itself known to end-users in the form of strangely blank messages without content or subject line.
("domain internet groper")
“dig” is a common network utility that lets users interact directly with name servers to get (very) detailed information about domains and hosts. It can also find the mail exchangers used to send mail to addresses within a domain.
See also: How to use dig (elsewhere on this site).
The direct-to-MX technique allows message origins to be disguised (although not completely), and keeps the outgoing mail from being monitored or detected by the sending ISP’s mail system. Direct-to-MX mailing can be done from a simple dialup or broadband account belonging to the spammer, or by open proxy machines controlled by the spammer. The vast majority of spam is now sent via direct-to-MX, usually from open proxies.
(see domain name service)
(“DNS block list”)
See also: blocklist.
The DNSBL mechanism makes it easy for mail host operators to include blocklist checking in their mail handling procedures. When the mail host queries a DNSBL for a particular host name or address, it receives not an IP address, but a coded answer resembling an IP address. The code indicates the blocklist’s opinion of the address in question, in particular whether the address is implicated in spam and should not be trusted.
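The lookup convention can be sketched in Python without touching the network. By the usual DNSBL convention, the octets of the IPv4 address are reversed and prepended to the blocklist's zone name, and a listed address produces an answer in the 127.0.0.0/8 range. The zone name below is a placeholder, the function names are invented, and the actual DNS query is left out.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the host name a mail host would look up for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(answer):
    """Interpret a lookup result; None stands for 'no such name' (not listed)."""
    if answer is None:
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.ip_network("127.0.0.0/8")


# "dnsbl.example.org" is a hypothetical zone, not a real blocklist.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org

print(is_listed("127.0.0.2"))  # True: a coded answer resembling an IP address
print(is_listed(None))         # False
```

The exact meaning of each 127.0.0.x code varies from one blocklist to another, so a real mail host would consult the operator's documentation before acting on it.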
The term “domain” is also used within proprietary Microsoft Windows networks, but has a slightly different meaning in this case (which is beyond the scope of this glossary).
A name (e.g., “rickconner.net”) for a group of individual host computers, each of which will have a host name and possibly one or more aliases, as well as common use of authoritative name servers and mail exchangers.
The domain name is in some ways analogous to a person’s family name (i.e., it is usually the same for others in his family), while the fully-qualified host name or alias (e.g., tiger.rickconner.net) is analogous to the person’s full name (it has a first portion to distinguish it from other hosts in the domain).
Public internet domain names must be registered by their owners with an ICANN-accredited domain registrar in order for their hosts to be included in the domain name service. Sometimes, information about spammers can be uncovered by using a WHOIS tool to look up information about their web domains.
A distributed database, accessible everywhere on the public internet, that can convert or “resolve” host name references to IP addresses (or vice-versa), identify mail exchanger hosts for a domain, and provide other useful features.
DNS queries are made to a type of internet host known as a name server. Typically DNS queries are done invisibly to the end-user by his applications (web browsers, mail programs, etc.), but “manual” DNS-related tools such as nslookup or dig are often used by investigators for tracing and identifying the sources of spam messages.
A company that is accredited by ICANN to provide domain registration services for the public.
See also: looking up domain registration info for a spam domain (elsewhere on this site).
Registrars are required to adhere to the terms of an ICANN agreement regarding the process for registering domains, including the collecting and posting of registrant information, but they are otherwise free to set their own policies regarding the types of domains they offer, their prices, terms of service and payment, etc.
Domain registrars vary widely in their attitudes toward spam. Some will cancel domain registrations wherever the domains are shown to have been used in spam, while others choose not to police their customers in this fashion. Still others apparently seek out the spam trade, allowing customers to register dozens or hundreds of domains automatically and in bulk, wilfully accepting incomplete or bogus registrant info, and taking no responsibility regarding the use of these domains in spam or other forms of abuse.
(also dynamic IP)
A practice whereby internet providers assign IP addresses to users’ computers on demand, and only for a limited period of time. Dynamic addressing allows a provider to serve a large group of users with a smaller number of addresses, on the assumption that only a fraction of users will be online at any point in time. The previous practice of permanently assigning IP addresses to clients has been given the retronym static addressing.
See also: DHCP
If your ISP uses dynamic IP (as do most retail providers these days), then when you sign on to your service your computer will request assignment of an IP address (usually via the DHCP protocol). You’ll get a “lease” on an address that is good only for a limited time (say, 24 hours). Once your lease runs out, or if you sign back onto your service after having disconnected, your computer will need to ask for another address assignment. The upshot of this is that a given computer will have an IP address that changes from day to day. From the user’s point of view, this process is completely transparent and tends not to cause any problems.
Dynamic IP comes into the study of spam because most spammers now use networks of malware-infested home computers (“botnets”) to send their mail (and perform other useful tasks for them). The majority of these botnet hosts, one imagines, are on dynamic IP networks, so their addresses are constantly changing. So, while spam filters and tracing tools can very easily detect the IP addresses from which spam originates, they cannot tie these addresses back to specific computers or users. On the other hand, many spam filters can determine whether an IP address belongs to a dynamic-IP “pool,” and thus may not belong to a bona-fide mail host (which should have a static IP address outside of the pool), and this capability is often used to spot spam attacks.
(also “mailing list”)
A message broadcast system that uses e-mail to deliver its messages; it generally consists of a closed group of members who send their messages to a central e-mail address (i.e., they “send to the list”); a computer program monitors this address and broadcasts any incoming mail (from members) out to all other members of the list.
The software programs typically used to run mailing lists include majordomo and LISTSERV, which names are sometimes seen in conjunction with mailing lists (e.g., “join the tractor-pull LISTSERV”).
A good mailing list is usually quite secure against spam, because only bona-fide members can send mail to the list, and membership usually requires a voluntary signup and verification (and may even be restricted by invitation only). It is also easy to kick miscreant members off such a list should they begin spamming. However, such lists are vulnerable to harvesting if they post publicly-accessible archives (e.g. on the web).
It is generally easier to control and moderate traffic on mailing lists than on usenet groups.
Many old-style LISTSERV/majordomo mailing lists have migrated to web-based message boards, or hybrids of mailing-list and web-board (e.g., Google Groups).
A form of MIME encoding that operates on visible portions of an e-mail header (such as an address nickname or a subject line).
Encoded-word encoding allows non-ASCII text to be used in address nicknames and subject lines of e-mail messages; it does this by encoding this text to a safe ASCII-like form. The feature permits address nicknames and subject lines to be rendered in non-ASCII character sets for the convenience of those who use such character sets; it also allows spammers to disguise portions of their messages during transit.
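Python's standard email.header module can produce and decode encoded words, which makes for a convenient demonstration. The subject text below is invented for illustration.

```python
from email.header import Header, decode_header, make_header

subject = "Café menu"

# Encode the non-ASCII subject into an ASCII-only encoded word.
encoded = Header(subject, "utf-8").encode()
print(encoded)            # begins with =?utf-8? and contains only ASCII characters
print(encoded.isascii())  # True

# Decode it again, as a mail client would before showing the subject line.
decoded = str(make_header(decode_header(encoded)))
print(decoded)            # Café menu
```

Anything that inspects subject lines in transit needs to perform the same decoding step, or it will see only the encoded form.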
Escapes are often used (legitimately) to modify data that might be confusing to computers in plain form (e.g., so-called URI escapes are applied to complex URLs quoted in HTML markup). Spammers can use escapes even when they aren’t needed, in order to disguise portions of the message or the origins of their network resources.
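To make the difference between needed and unneeded escapes concrete, here is a small Python sketch using urllib.parse from the standard library. The URL shown is an invented example (the reserved example.com domain), not one taken from a real message.

```python
from urllib.parse import quote, unquote

# Legitimate use: a space and an ampersand must be escaped to be safe in a URL.
print(quote("a b&c"))

# Disguise: every letter of the host name is escaped even though none needs it.
disguised = "http://%65%78%61%6d%70%6c%65.com/"
print(unquote(disguised))
```

Unescaping the string before examining it is usually enough to reveal what the spammer was trying to hide.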
Microsoft’s trade name for its proprietary e-mail management system, popular among large corporations or institutions that are Microsoft shops.
Just to put things in perspective, Exchange is the service that your company uses to run its e-mail system; it is not the client (program) you run on your own computer to pick up your mail. Exchange is supported primarily by Microsoft’s Outlook mail client, and by an increasing number of non-Microsoft clients as well. Exchange generally does not use open standard protocols (SMTP, POP, etc.) for internal relaying and delivery of mail, although it can be made to use SMTP to transfer messages to and from external domains.
Spam recipients who use Exchange-based mail services are at a relative disadvantage in tracing spam messages, since Exchange clients often make it difficult to see SMTP headers, and will often munge the format of messages and of complex MIME-based e-mail bodies.
A spam message that is wrongly tagged as non-spam and allowed to pass through a spam filter.
See also: false positive.
Spammers are always on the lookout for ways to make their messages pass the filters as false negatives. A low rate of false negatives is one characteristic of an effective spam filter.
A non-spam e-mail message that is wrongly tagged as a spam message by a spam filter.
See also: false negative.
False positives can be more worrisome than false negatives because they might represent important personal communications that the recipient will never see unless he carefully examines the messages trapped by his filter. Unfortunately, however, the more spam a user receives each day, the greater is the likelihood that a false-positive message could be overlooked in the noise. A low rate of false positives is a measure of the effectiveness of a spam filter.
The e-mail address that appears in the “For” clause of a routing line in the typical e-mail header; it indicates the e-mail address to which the sender intends the message to be delivered.
The for-address, where present, indicates the individual e-mail address to which the message originator wished to send the message (using the SMTP “RCPT TO” command). It may be different from the to-address, particularly in spam mail (which explains why you can receive spam that does not appear to have been addressed to you). The for-address is typically not displayed to the user unless he elects to view the full mail headers for the message.
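If you have the full headers in hand, the for-address can be pulled out of a routing line mechanically. The following Python sketch assumes the common layout in which the clause reads "for <address>;" and it will miss less regular formats. The function name for_address and the sample routing line are invented for illustration.

```python
import re

# Matches "for <user@host>" or "for user@host" inside a routing line.
FOR_CLAUSE = re.compile(r"\bfor\s+<?([^\s<>;]+@[^\s<>;]+)>?", re.IGNORECASE)

def for_address(received_line: str):
    """Return the address in the 'for' clause, or None if there is none."""
    match = FOR_CLAUSE.search(received_line)
    return match.group(1) if match else None

line = ("from mail.example.net (mail.example.net [192.0.2.10]) "
        "by mx.example.org with SMTP id abc123 "
        "for <alice@example.org>; Mon, 1 Jan 2024 00:00:00 +0000")
print(for_address(line))
```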
An SMTP mail header that has been manipulated by a spammer in order to disguise the message’s true origins.
Common header-forging tricks include use of phony from- and return-path addresses, bogus HELO mail host names (often simple domain names like “aol.com”), and fictitious routing lines to give a false history to the message. The vast majority of spam messages make use of forged headers, although they are explicitly prohibited by the AUPs of most ISPs (as well as by the CAN SPAM law).
“Formmail” has come to be a generic term for mailback apps in general.
The e-mail address that appears in the visible “From:” field of the customary e-mail header. It is supposed to be (but isn’t always) the e-mail address of the person who sent the message.
See also: return-path address.
The from-address (for technical reasons) need not be that of the ultimate sender of the mail; in nearly all spams, this address is absent, bogus, or stolen from an innocent party.
(or “greylisting;” less “black” than “blocklisting”)
See also: Wikipedia article on graylisting.
As we know, spammers think that following SMTP rules is for other people; graylisting is an elegant technique that uses their own disdain against them. It works to discourage spam deliveries at the (theoretically slight) cost of delaying honest mail.
Graylisting requires a bit of extra work by the MX (particularly for large mailing services that must coordinate among many MX hosts), but possibly not as much CPU time as would be required to process and filter the spam if it were not rejected. Graylisting can result in delays in delivery of honest mail that may be objectionable in some cases (e.g., when waiting on an e-mail confirmation of a web transaction). Still, many mail services are pleased with the performance of graylisting (as are, one imagines, their users).
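The notes here do not spell out the mechanism, so the following Python sketch rests on the usual description of graylisting (as in the Wikipedia article mentioned above): the MX temporarily refuses the first delivery attempt from an unfamiliar combination of sending IP, sender, and recipient, and accepts a retry that arrives after a minimum delay. The class name, the 300-second delay, and the reply strings are assumptions chosen for illustration, not settings from any particular mail server.

```python
class Graylist:
    """Toy graylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds a sender must wait before retrying
        self.first_seen = {}         # triplet -> time of first delivery attempt

    def check(self, client_ip, sender, recipient, now):
        key = (client_ip, sender, recipient)
        # Remember when this triplet was first seen (no change if already known).
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 accept"
        return "451 try again later"

gl = Graylist()
triplet = ("192.0.2.10", "bob@example.net", "alice@example.org")
print(gl.check(*triplet, now=0))     # first attempt is deferred
print(gl.check(*triplet, now=600))   # a patient retry is accepted
```

A mailer that follows SMTP queues the message and retries later; spamware that fires once and moves on never gets past the first reply.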
Graylisting has spawned a couple of related tricks that also exploit the fact that spam mail senders don’t follow the rules (of SMTP). In the technique curiously called nolisting, the ISP deliberately sets up a dummy or non-existent MX as its primary mail host; bona-fide mailers will follow SMTP and automatically move to the secondary MX when the primary does not work; spammers (it is supposed) will move on without trying the secondary MX. On the other hand, there is evidence to suggest that spammers often go to the secondary servers first (in the hope that these will be less well-protected against abuse), so nolisting may not be as effective as first thought. Still, it is very cheap protection.
(by contrast with “SPAM”)
E-mail that is not spam (e.g., “Since using the new filter, my spam-to-ham ratio has dropped sharply.”).
I don’t really like this term, since it carries the somewhat elitist implication that ham (the food) is somehow superior to SPAM (the food).
Collecting e-mail addresses for a spam mailing list.
Spammers employ various techniques to gather potentially-deliverable addresses for their lists: collecting them from websites or usenet groups using spambot software, testing them out on MX hosts using dictionary attacks, or simply asking users to surrender them (by offering services like electronic greeting cards, online dating, and the like).
A piece of random text appearing in the subject line or body of a spam message, designed to elude a basic form of spam filtering on outgoing mail hosts.
It used to be that outgoing mail hosts would try to detect bulk mail by distilling each outgoing message into a compact string called a “hash.” By comparing the hashes of successive messages, a host could decide whether they were identical; if a given user submitted too many identical copies of the same message, the host could stop all deliveries for that user and end the spam run. Spammers soon figured out that by slightly altering the message (using a mutable “hashbuster” string) every few iterations, they could get around this process. Hashbusters frequently appeared in the subject lines of messages, or were appended to the ends of the message bodies. Hashbusters are no longer as common or as useful as they once were, since spammers now use direct-to-MX mailing (which does not incur hashing) rather than risk interacting with outgoing MTA hosts.
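The following Python sketch shows why a few random characters are enough to defeat this kind of duplicate detection. It assumes SHA-256 as the hash and a made-up threshold of three identical copies; real mail hosts may have used different hash functions and limits.

```python
import hashlib
from collections import Counter

def digest(message: str) -> str:
    """Distill a message into a compact, fixed-length hash string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def looks_like_bulk(messages, limit=3):
    """True if any single hash value appears more than `limit` times."""
    counts = Counter(digest(m) for m in messages)
    return any(n > limit for n in counts.values())

pitch = "Buy cheap pills now!"
identical = [pitch] * 5
busted = [f"{pitch} xq{i}z" for i in range(5)]   # pitch plus a changing hashbuster

print(looks_like_bulk(identical))   # identical copies share one hash
print(looks_like_bulk(busted))      # each copy now hashes differently
```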
(see mail header)
(from the SMTP command of the same name)
The host name given in an SMTP transaction by a mail host that wishes to leave mail for another mail host.
SMTP does not require that the HELO be authentic or correct, and spammers invariably use bogus or forged HELO names (like “aol.com” or “BADF00”). Since the numeric IP address of the sending host is known to the receiving host (it comes from the basic IP socket transaction), it can be matched to the HELO name using DNS lookups; if the match fails, it’s pretty conclusive evidence of header forgery.
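The comparison described here can be sketched in a few lines of Python. The function helo_matches and its resolve parameter are invented for this example; dns_resolve shows one way to supply a real lookup using the standard socket module. A mismatch is strong evidence but the check is simplified: a legitimate host with several names or addresses could also fail it.

```python
import socket

def dns_resolve(name):
    """Look up the IP addresses for a host name (requires network access)."""
    return {info[4][0] for info in socket.getaddrinfo(name, None)}

def helo_matches(helo_name, peer_ip, resolve=dns_resolve):
    """True if the name given in HELO resolves to the connecting IP address."""
    try:
        return peer_ip in set(resolve(helo_name))
    except OSError:
        # The name does not resolve at all (e.g., "BADF00").
        return False

# Offline demonstration with a fake lookup table instead of real DNS.
fake_dns = {"mail.example.org": ["192.0.2.25"]}
lookup = lambda name: fake_dns.get(name, [])
print(helo_matches("mail.example.org", "192.0.2.25", lookup))
print(helo_matches("aol.com", "192.0.2.25", lookup))
```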
(from the spycraft term meaning a well-baited trap for enemy agents or targets)
A vulnerable host, placed on the network for the specific purpose of attracting attack (under controlled conditions) by hostile parties. The honeypot is usually used to detect and study various types of system attacks (not just spam).
See also: about the 'host' command (elsewhere on this site).
The text name assigned to a host computer (e.g., “horace.rickconner.net”).
See also: domain name.
Locally (i.e. on a local area network), a host might simply be known by a single name (which can be set or retrieved on Unix systems using the "hostname" command), for example “horace.” When called from outside the local area, the computer will need to have its domain name appended (e.g., “horace.rickconner.net”). Such a “fully-qualified” host name is thus analogous to a person’s first name (which is his own) and last name (which he shares with others in his family). In addition to its “proper” host name, a host can also be called by one or more aliases.
An ISP that provides hosting services (disk space, network access, software tools, etc.) to individuals or companies that wish to operate websites.
See also: bulletproof hosting.
Most spammers use websites to collect orders and inquiries, and these usually have nothing to do with the mail service(s) that they use (or steal) to send their spam. Some hosting services have strict acceptable-use policies that prohibit customers using spam (even from outside services) to promote their sites; many other providers, however, don’t much care. Still other hosting services specialize in the spam trade (see bulletproof hosting).
(“hypertext markup language”)
The “markup language” used to create web pages that include formatted text, embedded images, client-side automation, and other features.
See also: HTML 4.0 specification at http://www.w3.org/.
Most modern e-mail client programs can also understand and render HTML-formatted mail messages, and most spams are now delivered in HTML form (which offers many advantages to the spammer, such as the use of beacon URLs).
A type of character escape used in HTML markup, in which certain unusual, non-Latin, or “reserved” characters are represented using a special notation (e.g., “&lt;” for the less-than sign, or “&#66;” for “B” -- 66 being the decimal character code for “B” in the Latin-1 character set).
Occasionally, spammers will use HTML CEs to disguise the contents of their messages during transit.
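Undoing this kind of disguise is straightforward. The Python sketch below uses html.unescape from the standard library; the sample string is invented for illustration.

```python
import html

# Every letter of "Buy" is written as a numeric character entity,
# and the angle brackets of the markup are written as named entities.
disguised = "&#66;&#117;&#121; &lt;b&gt;now&lt;/b&gt;"
print(html.unescape(disguised))
```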
(“hypertext transfer protocol”)
See also: IETF RFC 2616.
HTTP is the everyday workhorse of the World Wide Web. Spammers use HTTP in conjunction with their HTML-format e-mails for a variety of malign purposes.
(“Internet Corporation for Assigned Names and Numbers”)
The international non-profit corporation set up to administer various technical features of the internet (including creation of top-level domains, domain registration, protocol identifiers, etc.). They are, in effect, the czars of the Internet, although they tend to delegate most of their work to businesses and private individuals.
See also: ICANN website http://www.icann.org/.
(“Internet Engineering Task Force”)
An arm of the Internet Society (ISOC) (http://www.isoc.org/) that oversees the technical development of internet services. They exert their authority through the development of requests for comment (RFCs) and standards (STDs) that define how these services are to work. Most common internet services, including SMTP (e-mail) and HTTP (web services), are defined in IETF RFCs and standards.
See also: IETF website http://www.ietf.org/.
Spam mailings in which the “pitch” is embedded in a graphic image (usually a GIF or JPEG, sometimes a PDF). It is far more difficult to extract and analyze message content from a raster image than from a simple string of text, so this technique helps spammers get their messages past content-based filters.
The images are generally embedded directly into the e-mail packet as MIME attachments, which greatly increases the size of the message packet. This technique was once widely used, particularly by stock spammers (who used a great deal of “boilerplate” text that was otherwise easily spotted and detained). With the development of spam filters incorporating efficient optical character recognition (OCR) technology, this technique seems to have gone on hiatus.
See also: Page about stock spam with examples of image spam (elsewhere on this site).
(“internet message access protocol;” formerly “interactive mail access protocol”)
IMAP is less commonly used than the similar post office protocol (POP3). These protocols don’t come up very often in the study of spam, because they do not figure in the propagation of spam.
A party whose computers, networks, or other resources have been used without his knowledge or permission to transmit or support spam. Such parties should not be accused of spam or abuse insofar as it is possible to avoid doing so.
See also: SpamCop Wiki entry on Innocent Bystanders.
Classic examples of innocent bystanders include:
Innocent bystanders may not be to blame for the spam, but they may often be negligent in failing to secure their computers and networks against exploitation by spammers and other crooks.
A business that offers internet services (such as home internet hookups, commerce websites, etc.).
The term can also generally refer to a school, government agency, private employer, or other institution that provides such services for its members, employees, or constituents.
See also: IETF RFC 791.
IP provides for the transfer of packets of data between computers on large, heterogeneous networks; additional “higher-level” protocols (such as TCP, HTTP, and SMTP) “sit on top” of IP to define what these packets must look like and how they are to be processed once received.
A numerical address, usually rendered as a so-called “dotted quad” (e.g., 192.168.0.3) that uniquely identifies a computer (or other device) on an IP network (like the public internet). In other words, the IP address is the address at which a particular machine can be reached for IP communications.
The dotted-quad is the customary or “canonical” form for standard IPv4 addresses, although they can also be given in other numerical forms (which spammers sometimes use in order to deflect scrutiny from their resources). The new IPv6 addressing scheme uses other forms to represent its much larger addresses.
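To see what these “other numerical forms” look like, here is a short Python sketch using the standard ipaddress module. It shows only the single-number decimal and hexadecimal forms; the function names are invented for this example, and whether a given browser or tool accepts such forms is not something the sketch can guarantee.

```python
import ipaddress

def numeric_forms(dotted: str) -> dict:
    """Return the 32-bit value of a dotted-quad address in decimal and hex."""
    value = int(ipaddress.IPv4Address(dotted))
    return {"decimal": str(value), "hex": hex(value)}

def to_dotted(value: int) -> str:
    """Convert a 32-bit number back to the canonical dotted-quad form."""
    return str(ipaddress.IPv4Address(value))

print(numeric_forms("192.168.0.3"))
print(to_dotted(3232235523))
```

Converting a suspicious all-digits “host name” back to a dotted quad is often the first step before doing a WHOIS lookup on it.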
The IP address is independent of any host names or domain names that a machine might have; the linkage between addresses and names is maintained in the domain name service.
The “traditional” form of IP addressing, which uses a 32-bit register that can contain any of up to about 4,300 million possible addresses. The popularity of the internet has led to most of these addresses being “claimed,” although clever routing and address-translation techniques have generally kept the threat of “address exhaustion” at bay, buying some time for the deployment of IPv6 addressing.
A new IP addressing scheme, which uses a 128-bit register that can contain a whopping 3.4 x 10^38 possible addresses (about 8 x 10^28 times as many as standard IPv4). IPv6 is intended to solve the problem of address exhaustion, and also to simplify the allocation and routing of IP addresses. As yet, IPv6 is not in widespread use on the public network, but several large nations and corporations have committed to support it in the future. What impacts IPv6 will have on spamming and other forms of network abuse are as yet not fully known.
See also: http://en.wikipedia.org/wiki/Ipv6
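The address counts quoted for IPv4 and IPv6 follow directly from the register widths, as this short Python check shows (the variable names are just labels for this example):

```python
# Address-space sizes implied by a 32-bit and a 128-bit register.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128
ratio = ipv6_total // ipv4_total    # exactly 2 ** 96

print(f"IPv4:  {ipv4_total:,}")     # about 4,300 million
print(f"IPv6:  {ipv6_total:.1e}")
print(f"ratio: {ratio:.1e}")
```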
A programming language developed by Netscape; today, it is the most popular choice for “client-side” web automation.
(also “employment scam,” “payment processing scam”)
A fraud propagated via e-mail in which the scammer offers the victim a high-paying stay-at-home job, usually as a pretext to steal from him.
See also: example of a job scam (elsewhere on this site).
The “job” offered by the fraudster usually involves “payment processing;” the victim is sent a check (supposedly a customer payment) that he is instructed to deposit in his own bank account, forwarding the bulk of the payment (minus his “processing fee”) to the “boss.” The check invariably bounces after the victim has sent the money.
(named for Joe Doll of Joe’s CyberPost, canonically the first recorded victim of the practice)
A type of vindictive attack in which a spammer sends bulk mail deliberately implicating an innocent party as the source.
Joe-job attacks are frequently launched as retaliation for real or imagined injuries (such as being exposed as a spammer, or being kicked off a web forum), and often attempt to implicate the victim as a fraud artist, thief, drug dealer, child pornographer, or the like. Spams that use stolen from-addresses are a milder form of Joe-job.
(hacker/sysadmin jargon: “luser attitude readjustment tool;” adept hackers and admins often regard simple users (“losers” or “lusers”) with contempt)
The connotation here is that the list launderer is interested only in removing addresses that don’t work, and not in removing recipients who don’t want the spam.
(technically, a “mail user agent”)
A software application that helps users create and send outgoing e-mail, and receive and read incoming mail.
Mail clients are end-users’ “one-stop” software interface with the world of e-mail. They range from the old TTY-based Unix clients such as Mail, Elm, or Pine, to modern PC-based programs like Eudora, Microsoft Outlook, Mozilla Thunderbird, and Apple Mail. Mail clients are now invariably included in the software shipped with new computers, and are frequently “bundled” with web browsers as part of an internet software suite. Mail clients interact with remote mail hosts in order to pick up and transmit mail on behalf of the user. Mail clients are also sometimes known technically as mail user agents or MUAs.
A technical name for a mail host, one specifically intended to hold mail for pickup by mail clients. It can receive mail from other hosts via SMTP relay, but does not typically relay mail itself (instead holding it for pickup via POP3 or IMAP). MDAs may also host special capabilities such as spam filtering or virus detection.
See also: procmail.
An external host with mail to send to a user in a given domain will use DNS to find the MX hosts for the domain, and will then deposit the mail with one of these MXs. For example, when you send mail to a user at “big-isp.foo,” your mail host will use DNS to look up the MX record for “big-isp.foo” (which might, for instance, be “mx1.somewhere-else.foo”) and forward the mail to that host.
The portion of a complete SMTP e-mail message packet that appears before the body, and that contains detailed technical information about the message, including routing lines.
The familiar lines showing the to-address and from-address of the message, its subject line, and its date, are technically not part of the SMTP header (as properly understood); they are easy to forge and do not contain any trustworthy information that can be used in spam tracing. The actual SMTP headers are normally hidden from users by their mail clients. Tracing the origins of spam requires exposing and analyzing the information in the SMTP header.
A computer on the internet that is dedicated to transferring e-mail from senders to recipients. Mail hosts usually must be further identified as to the specific role they play in the mail process.
In the most general case, the sender of an e-mail message usually transfers it from his computer (using his mail client) to a mail transfer agent (MTA) within his provider's domain. This MTA will then transfer the message to a mail exchanger (MX) for the recipient’s domain, identified via a DNS lookup. The MX usually passes the message to a mail delivery agent (MDA) for pickup by the recipient (via her computer and mail client).
(synonym for mail host)
A technical synonym for mail host. The mail transfer agent is understood to be a host that does SMTP mail transfers to and from other MTAs (as well as from mail user agents, to mail exchangers, and to mail delivery agents).
A technical synonym for mail client.
A web automation program designed to collect messages from website visitors (via an HTML form) and deliver them privately to the webmaster via e-mail.
Formmail is one of the more popular of such programs, although there are many others. Mailback scripts can be used by website proprietors to get feedback from visitors without exposing an e-mail address to spambots, but misconfigured mailback scripts can also be exploited by spammers to send spam untraceably.
(wordplay on “mainstream” and “sleazy”)
A term used (by many anti-spammers) to describe an otherwise-respectable or “mainstream” company that uses spam in its advertising (and thus becomes a “sleazy” spamvertiser).
The stereotypical mainsleaze company claims no responsibility for mailings sent in its name, and does not respond meaningfully to spam complaints, preferring instead to pass the buck to the remailers with which it has contracted to distribute the spam. Naturally, it does not directly involve itself in controlling the behavior of these remailers, maintaining a degree of “plausible deniability” with regard to the spam.
Many companies, including some very famous mail-order merchants and speciality retailers, have fallen into this trap over the years, although there is evidence to show that the practice has retreated in the face of anti-spam publicity and legislation like CAN SPAM.
(see mail delivery agent)
(short for “mail filter”)
See also: Wikipedia entry for milter.
(“multipurpose internet mail extensions”)
A group of standards, defined in IETF RFC 2045 and associated documents, that define how data can be included in e-mails when they are not necessarily text in the ASCII character set (e.g., foreign language text, binary attachments).
MIME is a necessary update to the SMTP-based mail system because SMTP does not allow non-ASCII data to appear anywhere in e-mail messages.
MIME defines both (1) the structure of the e-mail body and (2) the methods used to encode the data in the body to make it safe for transmission. MIME allows one or more “chunks” of possibly related data (plain or formatted text, HTML markup, attached images and files) to be inserted into the body of a message packet. The forms of MIME encoding most often seen in conjunction with spam are base64, encoded-word encoding, and quoted-printable encoding; spammers abuse these techniques to disguise their messages during transit.
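Both aspects, structure and encoding, can be seen in a tiny hand-made message. The Python sketch below uses the standard email package to walk the “chunks” of a two-part body and decode each one; the message text, the boundary string, and the helper name decoded_parts are all invented for illustration.

```python
from email import message_from_string
from email.policy import default

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

Q2hlYXAgbWVkcw==
--XYZ
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<p>Caf=E9 sp=E9cial</p>
--XYZ--
"""

def decoded_parts(raw: str):
    """Return (content type, decoded text) for each leaf part of a message."""
    msg = message_from_string(raw, policy=default)
    return [(part.get_content_type(), part.get_content().strip())
            for part in msg.walk() if not part.is_multipart()]

for content_type, text in decoded_parts(RAW):
    print(content_type, "->", text)
```

Neither part is readable in transit, but both decode to ordinary text once the transfer encoding is undone.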
A business in which participants make money not only by selling product, but by recruiting and collecting commissions from subordinates or “downlines.”
The MLM model is the basis for many ethical and honest businesses, but is also often exploited by spammers in a fashion similar to chain letters.
(from “Make Money Fast,” a tagline often found in older spams)
A type of spam, such as a chain letter, that tempts recipients with the opportunity to create a “business” that will make them lots of money with no real work on their part.
MMFs are always frauds of one sort or another and are usually illegal. The term is most often associated with a virulent breed of chain-letter spam once epidemic on usenet. The MMF usually suggests that the target will build a continuing business enterprise, as opposed to an advance-fee fraud, which offers a one-time windfall.
(see mail transfer agent)
(see mail user agent)
(also mung; hacker jargon for “mash until no good”)
To manipulate a string of data so that it can no longer be understood properly by machines, although it may still be recognizable to humans (for example, to alter an e-mail address by disguising the “@” sign).
See also: use of munging to avoid spam (elsewhere on this site).
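A minimal Python sketch of one common munging style follows. The “[at]” and “[dot]” substitutions and the function name munge are only one possible choice, picked here for illustration; a human can still read the result, while a naive address-harvesting program looking for the “@” sign will not recognize it.

```python
def munge(address: str) -> str:
    """Disguise an e-mail address by spelling out the "@" and the dots."""
    user, _, domain = address.partition("@")
    return f"{user} [at] {domain.replace('.', ' [dot] ')}"

print(munge("alice@example.org"))
```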
(from the name of former U.S. Senator Frank Murkowski)
A phony disclaimer in a spam message stating that the message was sent in compliance with S.1618 (e.g., “this old-fashioned spam contains a murk”).
Sen. Murkowski was one of the original sponsors of S.1618. His namesakes used to be commonly included in spams, although the spammers were seldom in compliance with the terms of the bill. Nowadays, hard-core spammers seldom claim to obey any laws regarding their behavior.
(see mail exchanger)
Most internet users interact with local name servers run by their ISPs; these local servers in turn get their information from the designated authoritative name servers for each domain. The local name servers will usually cache (store in memory) the lookups they are asked to make, so that the next user who wants to make the same lookup can be served in much less time (and so that the load on the authoritative name servers is reduced and distributed).
A group of IP addresses assigned to (and therefore controlled by) a single individual, company, or institution.
You can sometimes track spammers to particular IP addresses that they use (for sending mail or for maintaining websites); you can then use WHOIS lookups to determine the owners of the netblocks in which these addresses appear, and then report the abuse to these parties, or to the upstream provider responsible for having sold the block.
A criminal racket named for a section of the Nigerian criminal code that supposedly outlaws the practice. A form of advance-fee fraud, this scam invites you to share in the distribution of lots of ill-gotten cash, but not before you pay various advance fees (that the scammer will simply steal).
These low-tech, low-profile scams are most frequently associated with Nigerians, both inside and outside Nigeria, but are also sometimes tried by non-Nigerians.
An IP networking utility that can query the domain name service to find out what IP address is associated with a given host name or vice-versa. On some operating systems, the similar command “host” is preferred.
See also: how to use host and nslookup (elsewhere on this site).
A spam filter that does its work on the user’s computer (and thus “off the network”), once a user has downloaded his mail.
See also: on-network filter.
Usually off-network filtering is done by built-in features of the user’s mail client, although add-on filter programs are becoming increasingly popular. Usually, however, the best that such filters can do is simply to segregate spam from wanted mail; they are rarely effective at analyzing and tracing spam, or helping to report it to the proper sources. Off-network filters do not prevent the transmission, relay, acceptance, or downloading of spam, which represent the principal economic damage of spam.
An internet host maintained outside the U.S. (or, not to be US-centric, outside any country where you happen to live).
Offshore hosts are frequently used by spammers for mail services and website hosting, since they are essentially beyond the reach of U.S. law enforcement, may not be bound by onerous anti-spam laws, and are more willing to take on spammers’ traffic for the money they can make.
A spam filter that performs its work “on the network,” before mail is released to the recipient for download.
See also: off-network filtering.
Usually, on-network spam filtering is done either on the user’s mail delivery agent (using procmail or similar), or on a special host or service to which the mail has been forwarded for screening (e.g., SpamCop).
On-network filters offer the advantage that the user does not have to download messages identified as spam to his own computer, but they require that the user periodically inspect the trapped or detained mail to report it or to release “false positives” (nonspam messages erroneously identified as spam). Alternatively, users can often instruct their ISPs to simply discard any mail detained by the filter.
The spam filtering services offered by many retail ISPs, commercial online spam-filtering services (such as SpamCop), MDA-based procmail scripts, or packages like SpamAssassin, are all examples of on-network filtering. Some mail client add-on programs can screen mail while it remains on the mail delivery host, providing on-network filtering of a sort, although these programs invariably must download at least part of each message in order to screen it.
A third-party computer that has been configured (typically using a virus or other hostile code) to serve spammers by untraceably relaying their mail, hosting or guarding their websites, or performing other tasks.
The open proxies (often known as “bots” or “zombies”) are usually individual home or business computers that have not been properly secured against malware attack, and that are connected to the internet full-time via some sort of broadband connection (so they’re always available to the spammer). Typically, the owners of such computers do not know that their machines are open proxies, since the proxy code leaves little in the way of traceable data. Usually, the open proxies are not detected until they have become so bogged-down with botnet traffic that they no longer have CPU time left to provide good performance for their owners’ own tasks.
While the vast majority of open-proxy hosts are Microsoft Windows machines (reflecting the dominance of this operating system among computers in general), it is also possible for non-Windows machines (e.g., Linux or MacOS machines) to become open proxies of a sort if their owners do not keep them safe from installation of “rootkits” or similar mechanisms. Infecting such machines is more difficult, however, and requires that their owners actually disable important security features of their systems that are otherwise (usually) enabled by default.
Since the open proxies are controlled “by proxy” outside recognized channels, they provide effective cover for spammers. Open proxies now account for the vast majority of spam mail transmissions, as they make the mail much more difficult to trace to its ultimate source. Criminal gangs, assisted by resourceful hackers, regularly “recruit” large networks of open proxies whose services they sell to prospective spammers.
A mail host that has been misconfigured (or deliberately configured) to accept mail (including spam) from any host for transmission (“relay”) to any other host.
Currently, most mail host administrators (assisted by the packagers of mail transport software and operating systems) have pretty much eliminated open relays, requiring either the sending host or the recipient (or both) to be within the mail host’s domain, or else requiring the sending host to validate itself with a user name and password. This has forced spammers to find other means of transmitting spam, such as direct-to-MX mailing.
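The relay rule described here can be expressed as a small decision function. This Python sketch is a simplification: it compares domain names, whereas real mail hosts usually identify “their own” senders by IP address or network, and the function and parameter names are invented for illustration.

```python
def may_relay(sender_domain, recipient_domain, local_domains, authenticated=False):
    """Decide whether a mail host should accept a message for relay.

    Relay is allowed if the client has logged in, or if either the sender
    or the recipient belongs to one of the host's own domains.
    """
    if authenticated:
        return True
    local = {d.lower() for d in local_domains}
    return sender_domain.lower() in local or recipient_domain.lower() in local

ours = ["example.org"]
print(may_relay("example.org", "elsewhere.example", ours))        # our user sending out
print(may_relay("spammer.example", "victim.example", ours))       # stranger to stranger
print(may_relay("spammer.example", "victim.example", ours, True)) # logged-in client
```

An open relay is, in effect, a host whose version of this function always answers yes.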
Characteristic of a mailing list to which the mail recipient is not added unless he makes an explicit prior request.
Opt-in without confirmation is better than opt-out, but can still permit people to be added to mailing lists without their permission.
Characteristic of a mailing list to which a mail recipient may be added without having given explicit prior consent, and from which the recipient must explicitly request to be removed.
The typical spam mailing list is effectively an opt-out list, although some spammers will claim otherwise. In fact, there is now usually no way to “opt out” of a particular spammer’s list, the CAN SPAM law notwithstanding.
(crackerish respelling of “fishing”)
A type of fraud propagated by e-mail that seeks to trick users into surrendering valuable personal information (user names, passwords, account numbers, and the like).
See also: more info about phishing scams (elsewhere on this site)
Phishing is one of the more common types of internet fraud, and borrows heavily from techniques perfected by spammers.
(from submariners’ jargon)
See also: how to use ping (elsewhere on this site).
Ping is not extremely useful by itself in spam investigation, but can be used to determine whether a host you are interested in is currently online (or whether your own internet connection currently works).
(probably from the pink color of SPAM meat product)
A secret agreement between an ISP and a spamming customer in which the ISP agrees to waive or weaken its published anti-spam policies in exchange for significantly higher fees.
(“post office protocol”)
See also: IETF RFC 1939 (STD 53)
POP (or POP3 as the improved version is usually known) is the most widely used protocol for internet mail pickup, but the IMAP protocol is also sometimes used. These protocols don’t come up very often in the study of spam, because they do not figure in the propagation of spam.
See also: postfix website http://www.postfix.org/.
Postfix is described by its maintainers as “...fast, easy to administer, and secure, while at the same time being sendmail compatible enough to not upset existing users. Thus, the outside has a sendmail-ish flavor, but the inside is completely different.” (http://www.postfix.org/start.html).
A vestigial e-mail message that represents the fallout from a dictionary attack, usually with a bare minimum of information (i.e., no body content, no from-address, no subject line, etc.).
See also: example of a probe message (elsewhere on this site).
Probe messages are evidence of a dictionary attack (or “directory harvest attack” or “MX probe”) in which the spammer sends probe messages to a large number of e-mail addresses (some of which may be guesses) and prunes out those that are rejected by the MX.
A mail delivery agent program that can accept and process incoming mail for delivery to users.
See also: procmail website http://www.procmail.org/.
Procmail is a popular component of Unix-based SMTP/POP/IMAP mail systems, and can be configured for spam filtering. Procmail is most often found as a component of the mail systems of ISPs, businesses, and institutions that support large numbers of mail user accounts.
A type of securities fraud in which victims are tricked into buying nonperforming stocks for the ultimate benefit of the fraudsters. Presently endemic in spam e-mail.
See also: all about stock spam (elsewhere on this website).
In the pump-and-dump game, a nonperforming stock is heavily promoted to the public with the expectation that it will temporarily rise in value, allowing current investors (including the perpetrators of the scam) to sell out at a profit, at which point the share price usually collapses back to its original levels. Spam is now a very popular means for pump-and-dump artists to “spread the word,” since it offers widespread coverage, is very cheap, and is difficult to trace back to the perpetrator.
A type of MIME encoding applied to the bodies and attachments in e-mail messages.
See also: MIME.
Quoted-printable encoding can replace parts of a block of data (such as an e-mail body or an attached file) with a text-like representation to allow the data to be sent safely over e-mail systems. It is normally used when the text to be encoded is already very close to ASCII text, but may have occasional non-ASCII characters (such as accented letters). QP encoding also ensures that mail bodies have no lines longer than 76 characters, which keeps them well within the line-length limits of SMTP. Like other types of encodings and escapes, QP encoding can be used by spammers to disguise the contents of their messages during transit.
Quoted-printable data can be recognized by the use of “=nn” codes, where “nn” is a hexadecimal integer (i.e., a byte value), and also by the presence of “=” signs at the ends of longer lines of a message.
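As a rough illustration of the “=nn” codes and end-of-line “=” signs described above, here is a minimal sketch using the quopri module from Python's standard library. The sample text is made up for the example.

```python
import quopri

# Hypothetical sample text: mostly ASCII, with one accented letter
# (\u00e9) and a literal "=" sign.
text = "Caf\u00e9 prices rose = 10%"

# Non-ASCII bytes and the "=" sign itself become "=nn" hexadecimal codes.
encoded = quopri.encodestring(text.encode("utf-8"))
decoded = quopri.decodestring(encoded).decode("utf-8")

# A long line is split with a trailing "=" (a "soft" line break).
long_encoded = quopri.encodestring(b"a" * 200)

print(encoded)
print(decoded)
print(long_encoded.decode("ascii"))
```

Decoding a suspicious message body this way is often the quickest way to see what a spammer was trying to hide.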
Describes any of a number of techniques used by spammers to hide their “real” websites behind disposable “portal” sites. Generally, the spam recipient is invited to visit the portal site, which immediately redirects to the spam website (sometimes by way of additional portals).
See also: examples of web redirection (elsewhere on this site)
See also: Inside an SMTP handoff (elsewhere on this site).
For a number of reasons (including limiting spam), the best practice is for the mail system administrator to program his MXs to reject the mail at the time of the delivery attempt, rather than to pass it on to an MDA to be bounced (which may generate unwanted “blowback”). Rejecting known spam mail is also beneficial to users, who do not have to download and deal with messages that would otherwise be accepted by the MX.
A mail host is supposed to reject messages immediately if they cannot be delivered, or if the mail host does not wish to deliver them for some reason (e.g., they are suspected as spam). The rejection is indicated by a 500-series SMTP return code in response to a request by the sending mail host.
If the receiving host has good reason to believe that an offered message is spam (e.g., the address of the sending host shows up in a DNSBL), it can reject the message as described above, which will spare the recipient from having to download it and deal with it.
Some large mail providers (particularly retail ISPs) may not reject nondeliverable mail, instead passing it to an MDA which must then bounce the message later on.
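The role of the return code can be sketched in a few lines. This is only an illustration: the function name and the sample reply lines are hypothetical, and the sketch looks only at the first digit of the three-digit code, which is what separates a permanent rejection (500 series) from a temporary failure (400 series) or a success (200 series).

```python
def classify_reply(reply_line):
    """Return (code, meaning) for one SMTP reply line such as '550 No such user'."""
    code = int(reply_line[:3])
    meanings = {
        2: "success",
        3: "intermediate (more input expected)",
        4: "transient failure (sender may retry later)",
        5: "permanent failure (message rejected)",
    }
    # Only the first digit (the "series") matters for this rough classification.
    return code, meanings.get(code // 100, "unknown")


# Hypothetical replies a sending host might see during a handoff.
for line in ("250 OK", "451 Try again later", "550 5.7.1 Message refused"):
    print(line, "->", classify_reply(line))
```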
The use of an open relay mail host to transmit spam or other abusive e-mail.
The connotation of this rather-too-picturesque term is that the operator of the relay host is presumed to be an innocent victim of the spammer, and not a willing conspirator.
Restriction of indiscriminate relaying, along with the rise of open proxies or zombies, has greatly reduced the proportion of spam sent by traditional open relays.
An e-mail address or web link (or sometimes a telephone number or postal address) provided in a spam e-mail, ostensibly to allow recipients to remove themselves from future mailings.
No doubt some spammers may be diligent and honest in honoring such requests, but more often the remove links are simply used to verify that the complainer’s address actually works and can be sent more spam. For this reason, it is never advisable to use remove links (see my Spam Rule #4). Likewise, network-based removal links (websites, e-mail addresses) are fully reportable to the internet providers who support them as possible violations of the providers’ anti-spam policies.
(“request for comment”)
A document published by the Internet Engineering Task Force (IETF) that sets forth the requirements for an internet protocol (or a feature of a protocol, or some other matter related to the internet).
See also: IETF documents website http://tools.ietf.org/html/
The common protocols used on the internet are all described in RFCs, which are technically “requests for comment” but are usually accepted as law (or, at least, as standard operating procedure) once published in a stable form. A few of these have been “promoted” to STD (“standard”) documents, but they are often still more widely known by their RFC number. For example, STD 9 defines the file transfer protocol (FTP), but the “original” RFC 959 is cited just as often.
RFCs are prepared in a simple standard format (usually as simple “ROFF’ed” text files) by individuals, groups, or committees that volunteer to perform this service for the community at large. The RFCs are available for download free of charge from many locations including IETF’s own website.
A business that specializes in the mechanics of e-mail delivery; remailers make money by sending e-mail on behalf of paying clients.
See also: tips on ethical bulk-mailing (elsewhere on this site).
There are legitimate applications for bulk e-mail, and the process is sufficiently complex that a good ethical remailer service is to be recommended in such cases. Many remailers are honest and spam-averse, although a few are simply spammers-for-hire.
This field usually appears at or near the top of the “invisible” portion of the header, and if present indicates the return e-mail address that was passed in the MAIL FROM command of the first SMTP handoff. This need not be the same as the visible from-address, and may not be trustworthy, particularly in forged-header spam.
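To compare the return path with the visible from-address, something like the following sketch may help. It uses Python's standard email package; the message and all of its addresses are invented for the example, and a mismatch by itself proves nothing (legitimate mailing lists often produce one).

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message; every address here is made up.
RAW = """Return-Path: <bounce@mailer.example>
From: "Friendly Bank" <service@bank.example>
To: you@example.org
Subject: Account notice

Please verify your account.
"""


def compare_addresses(raw):
    """Return (return_path, from_address, differ) for a raw message."""
    msg = message_from_string(raw)
    _, return_path = parseaddr(msg.get("Return-Path", ""))
    _, from_addr = parseaddr(msg.get("From", ""))
    return return_path.lower(), from_addr.lower(), return_path.lower() != from_addr.lower()


print(compare_addresses(RAW))
```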
The reverse web proxy takes requests from website visitors and relays them to the protected spam website, returning any data it receives from the protected site back to the visitor. To the visitor, the proxy itself appears to be the spam website, and there is no good way to tell that it is actually only a proxy (i.e., we usually cannot learn very much about the actual protected website itself). Many botnet spammers will use large networks of reverse proxies to shield their websites, and will rotate the proxies at intervals as small as a couple of minutes to make detection and enforcement difficult.
(“Registry of Known Spam Operations”)
An extensive database of spam perpetrators and support operations, maintained by The Spamhaus Project.
See also: ROKSO website.
(also “rotating DNS”)
Continual and frequent changing of the IP address of a network resource (such as a spam website) so as to make it difficult to pinpoint and report to providers.
Fast rotating of IP addresses is used by some spammers to keep their websites from being traced and shut down. The spammer will set up reverse web proxies on many hosts (drawing from botnets that can number well into the thousands), and then use name servers under his control (often hosted on the very same machines, and also rotated over time) to frequently change which one (or more) of these proxies will be pointed to by an authoritative (“official”) DNS lookup for the site. The rotation often occurs at intervals as short as a couple of minutes. Even if the spam investigator finds and reports the addresses for the website that were authoritative at the time of his analysis, the website will have long since moved on to dozens or hundreds of other addresses in succession by the time the report recipients can take any action.
See also: content filter.
Routing filters look for evidence like forged header lines, or IP addresses known to have been spam sources (i.e., addresses that appear on a blocklist). The results of routing filtering are usually (but not always) more certain than those of content filtering.
See also: inside an SMTP handoff (elsewhere on this site).
A complete header will contain one or more routing lines to describe the complete path taken by a message, from host to host, on its way from the sending machine to the recipient’s mail host. Spammers are able to forge some, but not all, of these records.
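The routing lines can be pulled out of a header and put in travel order with a short script. This is a sketch only: the header is invented, the regular expression handles just the simple “from … by …” form, and (as noted above) the lower lines in a real spam may be forged. It assumes the usual convention that each host adds its line at the top, so the newest hop comes first.

```python
import re
from email import message_from_string

# Hypothetical header with two routing lines; hosts and addresses are made up.
RAW = """Received: from mail.sender.example (mail.sender.example [192.0.2.10])
    by mx.recipient.example with ESMTP id ABC123
    for <you@recipient.example>; Mon, 1 Jan 2007 12:00:05 -0500
Received: from workstation (unknown [198.51.100.7])
    by mail.sender.example with SMTP id XYZ789; Mon, 1 Jan 2007 12:00:01 -0500
From: someone@sender.example
Subject: test

body
"""

HOP = re.compile(r"\bfrom\s+(\S+).*?\bby\s+(\S+)", re.S)


def hops(raw):
    """Return (from_host, by_host) pairs in the order the message travelled."""
    msg = message_from_string(raw)
    lines = msg.get_all("Received") or []
    found = []
    for value in reversed(lines):  # oldest hop is at the bottom of the header
        match = HOP.search(value)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


for sender, receiver in hops(RAW):
    print(sender, "->", receiver)
```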
(U.S. Senate bill 1618, 105th Congress)
The identifier of a bill deliberated in the United States Senate during the 105th Congress (1997-1998) that was one of Congress’ first attempts to legislate specifically against spam.
See also: CAN-SPAM.
S.1618’s anti-spam provisions were actually only a small part of the bill, which was chiefly aimed at the then-prevalent problem of sharp trading or “slamming” in the retail long-distance telephone market. The bill did not pass both houses of Congress and therefore never became law. Even if it had, it would not have disrupted the livelihoods of spammers to any great degree, since it permitted opt-out spamming. Disclaimers (or “murks”) citing S.1618 were once very popular in spams, although those who used them were almost never in compliance with even the minimal requirements of the bill (i.e., they did not give a correct postal address or phone number, and did not provide a trustworthy means of opt-out).
Collecting e-mail addresses for a spam list by scanning publicly-posted websites, mail list archives, or usenet newsgroups. Usually, a spambot or other automated tool is used for scraping; it gulps in all the pages or posts it finds and spits out text patterns that look like e-mail addresses.
A set of procedures that allows mail providers to exchange information in real time on who they permit to send mail and from which hosts.
See also: http://www.openspf.org/ (Open SPF website).
Sender policy framework, or SPF, is a sort of “patch” for internet e-mail systems that (1) allows the operators of an internet domain to post their “sender policy” (that is, which hosts are allowed and which are not allowed to send mail on behalf of their users), and (2) allows hosts receiving mail purporting to be from a given domain to look up the policies of this domain to determine whether the mail may be spoofed or illegitimate.
The “policy” is posted in a specially-formatted TXT record within the domain’s DNS setup, and any receiving mail hosts can use special lookup tools to check these records before they accept mail. If the IP address of the sending host does not appear in the policies of the domain given in the return-path e-mail address, the receiving host can either reject the mail or can flag it as possibly abusive.
In other words, SPF can be thought of as a sort of reverse of the dig mx lookup, allowing receiving mail hosts to look up sending mail hosts for a domain, just as sending hosts can use dig mx to find the receiving hosts for a domain.
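The check a receiving host makes can be sketched roughly as follows. This is a much-simplified illustration, not a complete SPF evaluator: the function name is hypothetical, the policy record is supplied as a string rather than fetched from DNS, and only the ip4, ip6, and all terms are evaluated (terms such as a, mx, and include, which need further DNS lookups, are skipped).

```python
import ipaddress

# Meaning of the one-character qualifier that may precede each term.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, sender_ip):
    """Evaluate a simplified SPF record against the sending host's IP address."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF policy record at all
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default when no qualifier is written
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        name, _, arg = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            if ip in ipaddress.ip_network(arg, strict=False):
                return RESULTS[qualifier]
        elif name == "all":
            return RESULTS[qualifier]
        # Other mechanisms need DNS lookups and are ignored in this sketch.
    return "neutral"


# Hypothetical policy: only hosts in 192.0.2.0/24 may send for the domain.
policy = "v=spf1 ip4:192.0.2.0/24 -all"
print(check_spf(policy, "192.0.2.55"))
print(check_spf(policy, "203.0.113.9"))
```

A result of “fail” is what would let the receiving host reject the message or decline to send a bounce.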
The use of SPF by mail providers is, of course, purely voluntary. Right now, unfortunately, SPF isn’t in widespread-enough use to be viable as a spam-detection tool (i.e., the spammer can simply make sure that his spam-sending machines don’t reside in domains that have rigorous SPF policies); however, it is somewhat beneficial in preventing blowback bounces to innocent mail users whose addresses have been forged into spam (i.e., the receiving mail host can check the SPF records, and if they do not match, it can opt not to send a bounce message).
A very popular mail transfer agent program, which serves as the baseline MTA for many standard internet mail systems.
(“simple mail transfer protocol”)
The protocol (defined in RFC 2821, since superseded by RFC 5321) that a mail host uses to transmit messages to another mail host along the path from sender to recipient.
Knowledge of SMTP is useful in understanding how spam works.
(from the Hormel Foods canned meat product (normally spelled in all-caps as SPAM), or from a sketch from the Monty Python’s Flying Circus television series)
See also: frequently-asked questions (elsewhere on this site).
The term “spam” was previously more general in meaning, referring to any sort of unwanted or off-topic traffic on usenet or mailing lists; then, it narrowed to refer mainly to unsolicited bulk commercial e-mail. Of late, however, the meaning of “spam” is once again expanding to include other forms of objectionable internet advertising such as “link spam,” “blog spam,” or “search engine spam.”
A software process that examines a stream of incoming mail messages and sorts them into those that may be spam and those that probably aren’t.
See also: content filter, routing filter, challenge-response filter, on-network filter, off-network filter, Bayesian filter, SpamAssassin, milter, false negative, false positive, discussion of e-mail filtering (elsewhere on this site).
Spam filters can be deployed on the network (where the spam can be detained before you have to download it), or on your computer (where they filter the mail after downloading). The filter can operate on the routing of a message, on its content, or both. Filters are not perfect, and can make false-positive errors (detaining e-mails that aren’t spam) and false-negative errors (passing e-mails that are spam). Filters can be set to discard mail that is suspected to be spam, but more often they are used just to tag or segregate suspected spam to allow the user to inspect it for false positives, or to report the spam to its sources.
A spam transmission session, although the term “session” seems wimpy. Like any other industrial-strength enterprise, successful spam runs require a great deal of preparation and coordination.
(possibly not to be confused with spambait)
A valid and deliverable e-mail address that exists only for the purpose of attracting spam.
Some website operators deploy such addresses in locations where they are likely to be seen only by spambots or harvesters (e.g., in invisible portions of a public web page); by monitoring spam sent to these otherwise unused and uncirculated addresses, analysts can detect and study ongoing spam attacks.
An open-source spam filtering system currently maintained by the Apache Software Foundation, and used by many small-to-medium-size ISPs, online businesses, and end-users.
See also: http://spamassassin.apache.org/ (SpamAssassin website).
SpamAssassin is a highly versatile and configurable spam filtering program that identifies spam by testing for certain predefined “fingerprints” and assigning arbitrary numerical scores where it finds them; messages that exceed a user-adjustable threshold score are then tagged as spam.
SpamAssassin is typically run on a mail delivery agent (i.e., so that spam can be held and not downloaded to the user).
Administrators can specify the tests to be used by their local SpamAssassin installations, can vary these tests for each user, and can even write their own tests (using a basic syntax plus regular expressions).
While it is arithmetical in nature, SpamAssassin is not itself a Bayesian filter, although Bayesian filters can be included as SpamAssassin tests.
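The score-and-threshold idea can be sketched in a few lines. To be clear, this is not SpamAssassin code: the rule names, patterns, scores, and threshold below are all invented for illustration, and real rule sets are far larger and more subtle.

```python
import re

# Hypothetical rules: (name, pattern, score). None of these are real SpamAssassin tests.
RULES = [
    ("SUBJ_ALL_CAPS", re.compile(r"^Subject: [^a-z\n]{10,}$", re.M), 1.5),
    ("MENTIONS_FREE_MONEY", re.compile(r"free money", re.I), 2.5),
    ("HAS_REMOVE_LINK", re.compile(r"click here to (be )?remove", re.I), 1.0),
    ("MANY_EXCLAMATIONS", re.compile(r"!{3,}"), 1.0),
]


def score_message(raw, threshold=5.0):
    """Return (total_score, tagged_as_spam, names_of_rules_hit)."""
    hits = [(name, points) for name, pattern, points in RULES if pattern.search(raw)]
    total = sum(points for _, points in hits)
    # The message is tagged once its total reaches the adjustable threshold.
    return total, total >= threshold, [name for name, _ in hits]


spammy = "Subject: FREE MONEY NOW!!!\n\nGet free money today. Click here to remove yourself."
plain = "Subject: Lunch on Friday\n\nSee you at noon."
print(score_message(spammy))
print(score_message(plain))
```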
Bogus (invalid and undeliverable) e-mail addresses deliberately planted on a website with the expectation that a spambot will harvest them and thereby pollute its output.
Sometimes, automated programs can be used to randomly generate hundreds or thousands of such addresses for the delectation of hungry spambots. The term possibly connotes an element of desperation, harassment, or revenge as opposed to spam trap, which suggests an interest in detecting and studying spam.
(portmanteau word for “spam robot”)
A computer program that can visit websites and usenet news groups in the same manner that a human reader would, and can thereby collect what look like valid e-mail addresses for inclusion in a spam mailing list.
If you run a website and have access to its web server logs, you can often find evidence of the activities of spambots -- they frequently leave odd signatures in the http-user-agent field.
An anti-spam service that automates the filtering, detection, and reporting of spam.
See also: http://spamcop.net/ (SpamCop website).
SpamCop helps you identify where the spam comes from and what other net resources it involves, and provides pro-forma reports for you to file with the responsible parties. For a small fee, you can also get a spamcop.net e-mail address, and have SpamCop filter all your incoming mail and detain the spam.
Derisive nickname among anti-spammers for Sanford Wallace, a notoriously prolific and unapologetic spammer of the 1990s.
See also: Wikipedia entry for Sanford Wallace.
After unsuccessfully fighting considerable litigation directed against him, Wallace announced his retirement from the spam business in 1997. However, this retirement appears to have been less than permanent, and he presently faces judgments totaling in the millions of dollars for continued spamming, spyware, and related network abuse. More recently, he was sued by the online social network MySpace for creating thousands of fake profiles. For a time, he spun records at a Las Vegas nightclub under the “nom de disque” DJ Masterweb.
(mock-German for “spam house”)
A derisive term usually applied to a support operation used by spammers (e.g., a remailer service or spam-friendly hosting provider). Sometimes used to refer to a particular spam operation as a whole.
The German-like plural form “spamhausen” is used when referring to more than one spamhaus (although sticklers might prefer the correct form “Spamhäuser”).
See also: http://www.spamhaus.org/ (The Spamhaus Project's website).
Needless to say, perhaps, The Spamhaus Project is not itself a spamhaus. As a backhanded tribute to its effectiveness in militating against spam, The Spamhaus Project has become the frequent target of denial-of-service attacks and lawsuits from vengeful spammers.
One who sends spam e-mails.
The use of spam for advertising goods or services.
(from the German for “tar pit,” properly pronounced “TEER-groo-buh”)
The spammers who encounter a teergrube are supposedly attracted to the prospect of an open relay, and then become stuck in the “tar pit” of slow-moving or non-moving mail transfers. Honest users of the teergrube are said not to be affected, since they seldom have more than a couple of messages to pass per session (as opposed to the hundreds or thousands that a spammer might want to pass).
More than one teergrube is often called by the German plural form “teergruben,” while the act of running a teergrube is often called “teergrubing.” Sometimes the English terms “tarpit” and “tarpitting” are used instead.
Once regarded by some as a promising spam countermeasure, the use of “teergrubing” is no longer effective for stopping spam mail now that spammers have moved to open proxies rather than continuing to hunt for scarce open relays. However, tarpitting is regaining some glory as a possible means for MX mail hosts to protect themselves from directory harvest attacks, or “dictionary attacks;” an MX equipped for tarpitting can detect abusers and then restrict or block service to them.
An e-mail address obtained from a free e-mail provider (such as Hotmail, Yahoo, etc.), and used for defense against spam or other abuse.
See also: avoiding spam (elsewhere on this site).
In earlier times, spammers used such addresses to collect responses from the public, although websites are now more popular (and more reliable) for such purposes. People who wish to avoid or deflect spam can make use of throwaway addresses in conjunction with other techniques.
A basic network utility that can identify the intermediate stops that an IP packet will make on its way from an originating host to a receiving host.
See also: about traceroute (elsewhere on this site).
Traceroute is often useful for tracing down the upstream providers of spammers who own their own net blocks.
The address that appears in the visible “To:” field of the customary e-mail header.
See also: for-address.
The to-address (for technical reasons) need not be the address of the ultimate recipient of the message (this is more authoritatively given in the for-address header clause), and in many spams it is bogus. It used to be possible to filter out a great deal of spam by simply trashing messages in which your e-mail address did not appear in the to-address field, although spammers have now gotten less lazy about this.
(“unsolicited bulk e-mail”)
A less colloquial synonym for spam.
(“unsolicited commercial e-mail”)
A less colloquial synonym for spam.
The internet provider that sells network access services to a particular network user (the user may be an individual, a business, or even another ISP).
The term “upstream provider” is somewhat general, and is probably derived from the notion of network traffic (or perhaps network access fees) flowing “upstream” from a smaller to a larger provider. Even upstream providers may have their own upstream providers. In the usual case, the upstream provider (or simply “upstream”) is a “wholesale” ISP, or one that provides services to businesses (including spammers) or to other network providers; the upstream is then not necessarily a “retail” ISP that serves the public directly. The ultimate upstream provider would be a provider that holds a “direct allocation” of IP addresses from a regional internet registry (such as ARIN or APNIC), since there’s no one “above” him in IP space.
In the context of spam, the term usually refers to the hosting service that serves a spam domain, or the provider from whose net blocks spam messages have been sent, or the provider that has sold (and transferred) a net block to a spammer.
Even if a spammer appears to control his own net block, he’s gotten it from an upstream provider which may have an interest in stopping spam. Frequently, then, the activities of spammers can be stopped by reporting them to the spammer’s “upstream” (see also LART).
(“uniform resource identifier”)
A structured text string (conforming to IETF RFC 2396, since superseded by RFC 3986) that identifies some network resource, as well as the scheme or protocol that is to be followed to obtain it.
The URI is the “parent class” of the more familiar widget known as a uniform resource locator or URL. URIs can also be interpreted as URNs (uniform resource names). The distinctions among URIs, URLs and URNs are rather esoteric and don’t bear much discussion here (see the RFC for more info). Mainly the only reason to know about URIs is that they are the basis for the URI encoding scheme.
A type of escape following the rules of RFC 2396, in which certain characters that have special meanings within the URI syntax are replaced by alternate representations so that programs that process URIs aren’t confused.
See also: IETF RFC 2396.
For example, the forward slash “/” has a special meaning in many URIs; if you insert it “bare naked” into a URI outside this intended role, this can confuse programs (e.g., web browsers) that must process URIs. The URI escape allows the slash to be replaced by the escaped sequence “%2f,” which represents the slash unambiguously (2f is the hexadecimal ASCII code for the slash).
Spammers sometimes use URI escapes within the bodies of their messages as a crude means to disguise the locations of their websites.
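Escapes of this kind are easy to apply and to undo with the urllib.parse module in Python's standard library. In this small sketch the “disguised” address is invented; note that the hexadecimal digits in an escape are not case-sensitive, so “%2f” and “%2F” mean the same thing.

```python
from urllib.parse import quote, unquote

# Escaping a bare slash (safe="" tells quote() not to leave "/" alone).
escaped_slash = quote("/", safe="")

# Undoing an escape works for either letter case.
plain_slash = unquote("%2f")

# Hypothetical disguised link of the sort sometimes seen in spam bodies.
disguised = "http://%77%77%77.example.com/%70ills"
revealed = unquote(disguised)

print(escaped_slash)
print(plain_slash)
print(revealed)
```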
(uniform resource locator)
A URI that defines the physical location of a resource on the internet.
A typical URL (like “http://www.rickconner.net/spamweb/index.html”) first gives the protocol to be used to fetch the resource (“http”), followed by the server that holds the resource (“www.rickconner.net”), and finally by the location on the server (usually the path and name of a file) where the resource is found (“/spamweb/index.html”). The “resource” might be a static file (as in this example), a server-side program to be run (e.g., a PHP or CGI script), or a program to be downloaded and run by the client (e.g., a Java applet). Aside from the well-known HTTP URLs, the most commonly-seen varieties are mailto (for causing e-mail programs to create pre-addressed blank messages) and ftp (for retrieving files from servers using the FTP protocol).
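The same breakdown can be done mechanically. Here is a brief sketch using urlsplit from Python's standard library on the example URL above; the mailto address is made up.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.rickconner.net/spamweb/index.html")
print(parts.scheme)    # the protocol used to fetch the resource
print(parts.hostname)  # the server that holds the resource
print(parts.path)      # the location of the resource on that server

# A mailto URL has a scheme and an address but no server part.
mail = urlsplit("mailto:someone@example.org")
print(mail.scheme, mail.path)
```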
(short for “user network”)
A large, distributed collection of online text-based message boards or “newsgroups,” which is maintained and accessed using the network news transfer protocol (NNTP).
Along with e-mail, rlogin/telnet, WAIS, gopher, and FTP, usenet news is one of the original applications to find a home on the internet in the days before the web. Usenet is still widely used, although much of its “market share” has been absorbed by e-mail broadcast lists and web-based message boards. The problem of spam first became apparent on usenet, and usenetters developed many ways to deal with it. Since the technology of usenet is considerably different from that of e-mail, the technical study of usenet spam remains largely distinct from that of e-mail spam.
See: beacon URL.
(not to be confused with “web log” or “blog”)
A structured text log of transactions (e.g., file fetches) handled by a web server, identifying the time, date, and nature of the transaction.
See also: Apache 1.3 documentation for log files.
Web server logs are usually maintained in a special directory outside those of the website(s) handled by the server (e.g., “/var/log/httpd”). The logs are in text form and are readable by humans, but are more conveniently dealt with by log-analysis programs or scripts. Website operators can use the logs to spot trends in traffic, web-based attacks of various sorts, and errors in their web setups. Spammers often use web server logs in conjunction with beacon URLs and the like to keep track of who is reading their mailings.
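As an example of the sort of script that can pick such logs apart, here is a minimal sketch. It assumes the server writes the widely used “combined” log format (your server may be configured differently), and the sample line, including its user-agent string, is invented.

```python
import re

# Pattern for one line of the "combined" log format (an assumption about the server setup).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of the fields in one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Hypothetical log entry; the address and user-agent are made up.
sample = (
    '198.51.100.23 - - [10/Oct/2007:13:55:36 -0700] '
    '"GET /spamweb/index.html HTTP/1.1" 200 2326 "-" "ExampleHarvester/1.0"'
)
entry = parse_line(sample)
print(entry["host"], entry["request"], entry["agent"])
```

Tallying the agent field over a whole log is one simple way to spot the odd signatures that automated visitors tend to leave.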
A service offered by many ISPs and other organizations that allows users to send and receive e-mail via a web-based interface, rather than a standard mail client.
Webmail is a great convenience for many users, because it frees them from always having to use their own private computers to send and receive mail. It does tend to complicate the life of the spam investigator, however. Certain scammers (such as advance-fee fraudsters) like to use webmail (mainly because they are probably using public computers in libraries or net cafés); the webmail services vary widely in how they document the transaction in the SMTP mail header, and it can be difficult to trace webmail messages back from the webmail service to the host from which the webmail service was accessed.
A list of e-mail addresses created by a user from which the user will accept e-mail; messages from senders not on the whitelist are presumably discarded or sequestered.
See also: blacklist.
By itself, a whitelist filter that simply trashes all mail from addresses not on the list is a bit too draconian to be much good to most people. If combined with a mechanism such as challenge-response filtering, however, whitelists can be marginally more useful.
(from the Hollywood Western cliché in which “good guys always wear white hats”)
Describes an internet provider that has (and enforces) effective anti-spam policies.
See also: black hat.
(from “Who is … ?”)
A basic network utility that is typically used to find out who has registered for the use of a particular network domain name (e.g., “rickconner.net”), or who controls a particular numeric IP address, or to whom abuse complaints regarding a particular domain should be addressed (via the whois.abuse.net server).
See also: about WHOIS (elsewhere on this site).
Persons or organizations that register for the use of domain names or IP address blocks are required to provide identifying information about themselves (e.g., names, postal addresses, phone numbers, contacts) as part of their contracts. This information is published in databases that are accessible to the public via WHOIS client software. WHOIS is a “well-known” network service on port 43, defined in IETF RFC 3912. It was originally intended to be used for conveying personal information (like e-mail addresses and telephone numbers), but is now used mainly to fetch domain and IP-block ownership info.
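The protocol itself is simple enough to sketch with a bare socket: connect to port 43, send the query followed by a carriage return and line feed, and read until the server closes the connection. The function names below are hypothetical, and the caller must supply a WHOIS server appropriate to the domain or address in question; ordinary WHOIS client programs handle that choice for you.

```python
import socket


def build_query(name):
    """Format a WHOIS query: the name followed by CR LF."""
    return name.strip().encode("ascii") + b"\r\n"


def whois(name, server, port=43, timeout=10.0):
    """Send one query to a WHOIS server and return its text response."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_query(name))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example call (needs a network connection; the server name is the caller's choice):
# print(whois("rickconner.net", "whois.verisign-grs.com"))
```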
The X-lines, if used, usually appear near the bottom of the header and are preceded by “X-.” Common examples include “X-Priority,” “X-Virus-Scanned,” etc. Such records may have special meanings for a particular mail service and may not be used or understood by other mail services; for this reason, they are considered “experimental” header lines, and not standard features of SMTP. They may, nevertheless, contain information of interest to spam investigators (for example, SpamAssassin typically puts the results of its analysis of a message into an X-line within the message header).
(from the mythical creature that can carry out the bidding of others with no will of its own)
A computer (usually belonging to a home or small-business user) that has been implanted with open proxy software.
Such a computer can form part of a botnet that can be used by spammers (and others) for various malign purposes.