Spam analysis tools
(WARNING! High geekage ahead!)

About IP networking
IP tools "roadmap" (start here if you like)
ping (ICMP) || nslookup || traceroute
whois
     ( ... for domain-owner lookups)
     ( ... for abuse-contact lookups)
     ( ... for IP block owner lookups)
"safe" web surfing (telnet, curl)
Other tools occasionally mentioned
Where to find tools (including on the web)

Sherlock Holmes had his magnifying glass. Sam Spade had his gat. Clouseau had ... well, whatever he had. Every good detective needs tools, and spam detectives are no exception. On this page, we'll provide a very brief overview of IP networking (just enough for some background info), and then introduce you to some of the tools that you can use to analyze and trace spam messages. This isn't the place, nor am I the person, to provide a complete discussion of IP or of the implications of some of the tools described. If you need to know more, you should seek out references on the topic (a Yahoo search would not be a bad place to start).

Ten-cent tour of IP networking

The Internet Protocol ("IP") is the foundation upon which the internet is built. By following the rules and procedures of IP, computers of all types can communicate successfully and efficiently over networks of all sizes, from your home or office LAN, to your cell phone network, all the way up to the global carrier network. You'll sometimes see IP referred to as "TCP/IP", but TCP (the "Transmission Control Protocol") is just one of the many protocols that operate over IP networks.

First of all, every computer in an IP network (including yours, when it is connected to the internet by modem or by LAN gateway) has a distinct IP address. This address is usually represented as a so-called "dotted quad" (e.g., "192.168.0.13"), four numbers in the range 0-255 separated by decimal points.
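To make the "dotted quad" idea a bit more concrete: the four numbers are really the four bytes of a single 32-bit number. Here's a minimal Python sketch (the function names are my own, not part of any standard tool) that converts between the two forms and rejects anything that isn't four fields in the 0-255 range. Python's standard ipaddress module does the same job more thoroughly; this just shows the arithmetic.

```python
import re

# One field of a dotted quad: one to three ASCII digits.
_FIELD = re.compile(r"[0-9]{1,3}")


def dotted_quad_to_int(text: str) -> int:
    """Convert a dotted quad such as '192.168.0.13' to its 32-bit value.

    Raises ValueError unless the text is four decimal fields, each 0-255.
    """
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four fields, got {text!r}")
    value = 0
    for field in fields:
        if not _FIELD.fullmatch(field) or int(field) > 255:
            raise ValueError(f"bad field {field!r} in {text!r}")
        value = (value << 8) | int(field)  # each field supplies one byte
    return value


def int_to_dotted_quad(value: int) -> str:
    """Inverse of dotted_quad_to_int: split a 32-bit value into four bytes."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(dotted_quad_to_int("192.168.0.13"))  # 3232235533
```

So "192.168.0.13" is just a friendlier way of writing the number 3232235533.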

Of course, humans have a hard time dealing with bunches of numbers, so we usually prefer to use host names instead. Host names are built up from simple strings connected by dots (e.g., "www.yahoo.com"). The last two (or sometimes three) "words" in a host name are known as the domain name. In the example above, "yahoo.com" is a domain that could contain many individual hosts — www.yahoo.com is a specific host (or more properly an alias for a specific host).

On the other hand, computers can't deal very efficiently with names, so we need a way to go from names to numbers and vice versa. This is done using a huge distributed database known as the "Domain Name System" (DNS). When you ask for a host computer by name (e.g., when you type a URL into your web browser or an e-mail address into your mail program), your computer consults DNS to get the corresponding address. Your computer can then contact the address to get the information or service you want.
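If you'd like to see the name-to-address step from a program rather than from a browser, here's a rough Python sketch that asks your system's resolver (which normally consults DNS) for a host's IPv4 addresses. The function name resolve_ipv4 is hypothetical; the standard-library call socket.getaddrinfo does the real work.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the IPv4 addresses the system resolver (normally DNS) gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({sockaddr[0] for _f, _t, _p, _c, sockaddr in infos})
```

Calling resolve_ipv4("www.yahoo.com") needs a network connection, and the list you get back will vary with time and with where you are; a numeric address such as "127.0.0.1" simply comes back unchanged.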

IP is a very involved subject, but for our purposes it is most important to understand that:

Your network toolbox

Now, we'll look at some specific tools that come in handy for network gumshoeing. First, here's a useful "road map" to tell you what tools are best used for common spam analysis tasks:

If you want to . . . | you use . . .
...find out whether a host is online and responding to IP communications | ping [hostname-or-IP]
...find out the IP address of a host (or the host name associated with an IP address) | nslookup [hostname-or-IP]
...trace the network route to a host | traceroute [hostname-or-IP]
...find out who controls a given IP address | whois -h [RIR] [IP]
...find out who has registered a given domain name | whois [domain], or whois -h [registrar] [domain]
...find (maybe) abuse contacts registered at abuse.net | whois -h whois.abuse.net [host]
...download the raw contents of a web page safely without using a browser | telnet [host] 80, then GET [page]; or curl [url]
...find out more info on a domain or a named host | dig [domain]

ping -- "Are you out there?"

Our first tool, ping, is a basic but very useful diagnostic tool for IP networks. Using ping, you can determine whether your computer can "reach" (connect to) some other computer or device elsewhere at a given IP address. Ping uses a specialized IP protocol called ICMP (internet control message protocol), an acronym you may often see in conjunction with ping.

For spam-hunting purposes, we usually use ping to determine whether a particular host machine is "alive" (i.e., that it is turned on and answering at the given address). Ping does not tell you what the remote machine does (e.g., whether it has a web server, whether anyone is logged on), what kind of machine it is (e.g., a home user, a server, etc.), or who it belongs to (the Pentagon, the Roman Catholic Church, your next-door neighbor, etc.), just whether a path exists to send IP packets to it from your machine.

If you use a Unix-type operating system (like Linux or Mac OS X) or Microsoft Windows 95 (or later), you have ping built right in on your command line. Just open a terminal window (for Unix/OS X) or a command prompt (DOS window) on Windows and type the ping command like so:

ping [host-name or IP-address]

supplying the host name or numeric IP address you want to reach. Ping will send out a series of ICMP "echo request" packets, typically one per second, and then print the replies it receives (if any) from the remote machine (this process can sometimes take a couple of seconds). On Unix-type systems, ping keeps going until you stop it; the Windows version stops on its own after four packets.
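For the curious, here's roughly what one of those ICMP packets looks like, sketched in Python. This only builds an echo request (ICMP type 8, code 0) with its standard 16-bit ones'-complement checksum (RFC 1071). Actually sending it needs a raw socket and, on most systems, administrator privileges, which is one reason ping has traditionally been installed with special privileges. The identifier and sequence values are arbitrary choices for illustration. An 8-byte header plus 56 data bytes gives the "64 bytes" you'll see in the sample output that follows.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence, data."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum 0 for now
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload
```

A handy property for checking your work: recomputing the checksum over a finished packet (checksum field included) gives zero.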

For example, here's a ping I ran to hostname www.yahoo.com from my Mac's terminal window (the first line is the command prompt followed by my command, and the rest is the output from the command):

[localhost:~] rconner% ping www.yahoo.com
PING www.yahoo.akadns.net (64.58.76.179): 56 data bytes
64 bytes from 64.58.76.179: icmp_seq=0 ttl=51 time=24.126 ms
64 bytes from 64.58.76.179: icmp_seq=1 ttl=51 time=25.078 ms
64 bytes from 64.58.76.179: icmp_seq=2 ttl=51 time=23.183 ms
64 bytes from 64.58.76.179: icmp_seq=3 ttl=51 time=24.562 ms
^C
--- www.yahoo.akadns.net ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 23.183/24.237/25.078 ms

I stopped this ping after four packets by typing control-C; when you ping on the public network, it is considered poor form to send any more packets than you need in order to verify the connection (on Unix-type systems, the "-c" option, as in "ping -c 4 www.yahoo.com", stops automatically after the given count). Also, don't use the "-f" (flood) option with ping: it sends packets as fast as it can and so places a burden on the remote computer (plus, if you get obnoxious with flood-pings, you can find yourself in trouble for launching a "denial of service" attack). Again, you don't need to see more than a couple of replies to know that you can make a connection.

Most of the output of ping is rather technical in nature, but as long as you see a lot of lines with "time=nnn" at the end, and as long as the final lines (after the control-C) show a zero (or very low) packet-loss rate, you know there's a reliable connection between the two machines.
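If you find yourself checking lots of ping results, a few lines of Python can pull out the two numbers that matter: the packet-loss figure and the average time. This sketch assumes Unix-style output like the sample above (Linux prints "4 received" rather than "4 packets received", which the pattern also accepts); Windows ping formats its summary differently and would need its own pattern. The function name is my own.

```python
import re

# Matches BSD/Mac OS X "4 packets received" and Linux "4 received".
_STATS = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
_TIME = re.compile(r"time=([\d.]+) ms")


def summarize_ping(output: str) -> dict:
    """Extract packet counts, loss percentage and average reply time from ping output."""
    match = _STATS.search(output)
    if match is None:
        raise ValueError("no ping statistics line found")
    sent, received = int(match.group(1)), int(match.group(2))
    times = [float(t) for t in _TIME.findall(output)]
    return {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent if sent else 100.0,
        "avg_ms": sum(times) / len(times) if times else None,
    }
```

Fed the www.yahoo.com transcript above, this reports 4 sent, 4 received, 0% loss and an average of about 24.237 ms, matching ping's own summary line.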

nslookup -- "I know your name, but what's your address?"

As we said, your computer can automatically convert names you supply into IP addresses using DNS. Normally, this is done transparently to you. However, if you're curious, you can do a lookup "by hand" using our next featured tool, the nslookup command. Again, this command is available to you from the Unix (or Darwin) or Windows command line, and you type it thusly:

nslookup [host-name or ip-address]

If the DNS is reachable from your computer, nslookup will return a result.

For example:

(Get address from name...)

[localhost:~] rconner% nslookup www.yahoo.com
Server: home1.bellatlantic.net
Address: 199.45.32.43

Non-authoritative answer:
Name: www.yahoo.akadns.net
Addresses: 64.58.76.178, 64.58.76.179, 64.58.76.229, 64.58.76.177
64.58.76.227, 64.58.76.223, 64.58.76.225, 64.58.76.176, 64.58.76.224
Aliases: www.yahoo.com

(Get canonical name from address...)

[localhost:~] rconner% nslookup 64.58.76.178
Server: home1.bellatlantic.net
Address: 199.45.32.43

Name: www9.dcx.yahoo.com
Address: 64.58.76.178

Note that we did two lookups here: first on "www.yahoo.com," which returned 9 different IP addresses we could use to contact www.yahoo.com. Then, we looked up one of those addresses and got back the principal name of that specific host.

Here, DNS reports that www.yahoo.com is an alias for www.yahoo.akadns.net, which in turn maps to several different addresses; this is a means for Yahoo to share its tremendous load among several servers. (The "Non-authoritative answer" label means that the reply came from my ISP's name server, home1.bellatlantic.net, rather than directly from a name server responsible for Yahoo's names.) For one of these addresses (64.58.76.178), the reverse lookup gives the name www9.dcx.yahoo.com; this is the actual name of the host, although it can also be reached at the alias www.yahoo.com. It's all rather confusing, but we don't need to know much about it for our purposes here.

Note that nslookup does not tell us whether the host is available right now, or what it does; it tells us only what DNS says is the name or address corresponding to the address or name you supplied. Typically, however, if a DNS lookup is successful, the host is probably available (unless it broke, or unless there was some isolated problem with the network). You can use ping to check this.
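The second lookup above (address to name) works through a special corner of DNS: the address is written backwards and looked up as a name under in-addr.arpa. This Python sketch shows that name and performs the same reverse lookup through the standard socket.gethostbyaddr call; the function names are just illustrative. Not every address has a reverse entry, so the sketch returns None when the lookup fails.

```python
import ipaddress
import socket
from typing import Optional


def reverse_name(address: str) -> str:
    """The in-addr.arpa name that a reverse (PTR) lookup actually asks DNS about."""
    return ipaddress.ip_address(address).reverse_pointer


def reverse_lookup(address: str) -> Optional[str]:
    """Return the host name DNS gives for an address, or None if there isn't one."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
    return hostname
```

For 64.58.76.178, reverse_name gives "178.76.58.64.in-addr.arpa"; reverse_lookup needs network access, and what it returns today may differ from the nslookup result shown above.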

One important use of nslookup in spam analysis is to test whether a host name given in a routing header actually corresponds to the IP address given for it. For example, spammers frequently provide false HELO host names (like "yahoo.com") when passing spam e-mail; if you have the actual IP address in hand, you can use nslookup to detect the forgery.
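The forgery check can be sketched in a few lines of Python, assuming you've already pulled the claimed HELO name and the connecting IP address out of a Received: header by hand. The resolver is passed in as a parameter so you can try the logic on made-up data; the addresses in the test are hypothetical documentation addresses (192.0.2.x), not real Yahoo servers. A mismatch is a strong hint rather than proof, since some legitimate mail servers announce themselves with names that don't resolve back to them.

```python
import socket
from typing import Callable, Iterable, List


def system_resolve(hostname: str) -> List[str]:
    """IPv4 addresses for hostname via the system resolver; empty list if none."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    return [info[4][0] for info in infos]


def helo_matches_ip(helo_name: str, connecting_ip: str,
                    resolve: Callable[[str], Iterable[str]] = system_resolve) -> bool:
    """True if the name announced in HELO resolves to the IP that really connected."""
    return connecting_ip in set(resolve(helo_name))
```

In real use you would leave resolve at its default, which needs network access.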

traceroute -- "How do I reach you?"

Packets of data sent over the internet will seldom go directly from originator to recipient; instead, they may pass over any number of intermediary stops (host machines, routers, gateways, proxy servers, etc.), each of which will nudge them a bit closer to their ultimate destination. In fact, successive packets can in theory go by completely different routes and can therefore be received out of their intended sequence (it's the job of protocols like TCP to sort out the mess for you).

Sometimes the route that a packet might take from here to there can be of interest in hunting down spam, particularly if other tools fail to provide complete information about a spammer's host. You can determine the route that a packet would take from your computer to some other computer using the traceroute command. Traceroute is available from the command line on Unix and Windows computers (on Windows boxes it is called "tracert"), and you call it like so:

traceroute [host-name or IP-address]

The traceroute command works its way through the routing step by step, and may sometimes take a few seconds to complete. However, if the route is successfully traced, you'll get a display like the following:

(Find the traceroute command on my system...)

[localhost:~] rconner% whereis traceroute
/usr/sbin/traceroute

(Run traceroute...)

[localhost:~] rconner% /usr/sbin/traceroute www.yahoo.com
traceroute to www.yahoo.akadns.net (64.58.76.225), 30 hops max, 40 byte packets
1 151.200.154.1 (151.200.154.1) 17.752 ms 12.812 ms 13.441 ms
2 a7-0-31120.q-gsr1.res.verizon-gni.net (151.200.4.194) 22.36 ms 16.015 ms 13.464 ms
3 wash02-edge01.dc.inet.qwest.net (63.148.64.221) 25.414 ms 19.462 ms 24.43 ms
4 wash02-core01.dc.inet.qwest.net (205.171.9.21) 21.657 ms 18.968 ms 27.921 ms
5 wash05-brdr01.dc.inet.qwest.net (205.171.209.46) 19.516 ms 19.831 ms 29.036 ms
6 bpr2-so-7-2-0-0.virginiaequinix.cw.net (208.173.50.233) 18.003 ms 19.38 ms 30.727 ms
7 acr2-loopback.restonrst.cw.net (206.24.178.62) 21.112 ms 21.836 ms 20.704 ms
8 agr3-loopback.washington.cw.net (206.24.226.103) 19.137 ms 24.865 ms 23.93 ms
9 dcr1-so-6-2-0.washington.cw.net (206.24.238.57) 20.065 ms 19.064 ms 25.061 ms
10 cable-and-wireless-internal-isp.washington.cw.net (206.24.238.26) 24.163 ms 20.706 ms 44.278 ms
11 dcr01-g8-0.stng01.exodus.net (216.33.96.145) 20.487 ms 19.979 ms 20.077 ms
12 csr22-ve241.stng01.exodus.net (216.33.98.19) 25.192 ms 243.914 ms 22.006 ms
13 216.35.210.126 (216.35.210.126) 145.252 ms 34.287 ms 35.901 ms
14 w4.dcx.yahoo.com (64.58.76.225) 30.63 ms 19.245 ms 26.693 ms


The readout from traceroute shows each host that the packet passes through on its way to the destination, along with the name (if available) and IP address of the host, and the round-trip times for each of three probes sent to that host. Typically, the last couple of entries will be inside the domain that is hosting the spammer, so this is one way to deduce whom to complain to if other methods fail.

If you get asterisks ("* * *") in your traceroute printout, this means that some of the intermediary hosts did not answer the traceroute probes in time, either because they are slow, because they are misconfigured, or because they are set up (or firewalled) not to answer such probes. Traceroute will, however, keep trying the next hosts in the chain until the end is reached. Occasionally, you'll continue to get the missing probe lines for a very long time; this does not necessarily mean that the site is unreachable (ping can tell you this more quickly), just that the intermediate hosts don't respond to traceroute queries.
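If you save a traceroute printout to a text file, it's easy to pick out the last few hops programmatically. Here's a rough Python sketch that parses hop lines in the Unix format shown above (hop number, name, address in parentheses, then times) and treats "* * *" lines as hops with no reply. The Hop type and the function names are my own invention, and Windows tracert uses a different layout that this won't handle.

```python
import re
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    number: int
    name: Optional[str]      # None when the hop did not answer
    address: Optional[str]
    times_ms: List[float]    # round-trip time of each probe that came back


# e.g. "14 w4.dcx.yahoo.com (64.58.76.225) 30.63 ms 19.245 ms 26.693 ms"
_HOP = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s*(.*)$")
_HOP_NUMBER = re.compile(r"^\s*(\d+)\s")
_TIME = re.compile(r"([\d.]+) ms")


def parse_hop(line: str) -> Hop:
    match = _HOP.match(line)
    if match is None:
        # Lines such as "15 * * *": the hop number is there but nothing answered.
        number = _HOP_NUMBER.match(line)
        if number is None:
            raise ValueError(f"not a traceroute hop line: {line!r}")
        return Hop(int(number.group(1)), None, None, [])
    number, name, address, rest = match.groups()
    times = [float(t) for t in _TIME.findall(rest)]
    return Hop(int(number), name, address, times)


def last_answering_hops(lines: List[str], count: int = 2) -> List[Hop]:
    """The final `count` hops that actually replied, usually the ones nearest the target."""
    hops = [parse_hop(line) for line in lines if _HOP_NUMBER.match(line)]
    return [hop for hop in hops if hop.address is not None][-count:]
```

The header line ("traceroute to ...") doesn't start with a hop number, so last_answering_hops skips it.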

Although the Yahoo site seems to load very quickly when you have a good internet connection, the above traceroute tells us that each packet to Yahoo passes through as many as 14 hops (possibly more from your own location, and the return trip may even take a different route), which gives you an idea of how fast and efficient IP networking can be even if it is rather complicated.

whois -- "Who owns this domain?"

OK, so we've used nslookup to identify the address of a host by its name, and used ping to determine whether that host is open for business, and finally used traceroute to figure out how we get there. Now, how can we tell who that host belongs to?

As you may know, you can't just barge in on the public internet and set up your own name and address. You must obtain permission to use a given domain name by "registering" it with any of a number of companies that provide this service, and then you have to contract with an internet service provider to "host" this domain by inserting the name(s) and corresponding IP address(es) of your host(s) into the DNS (the ISP may also provide the computers you use for this domain, or they may simply "park" your domain and assign you "private" static IP addresses for your own computers).

Our next tool, whois, enables you to get (among other data) information about who has registered a domain name; this information may include people's names, company names, telephone numbers, and e-mail addresses. Whois is available from the command line in Unix systems, and is called as follows:

whois [domain-name]

Be sure to use a domain name here (e.g., rickconner.net) rather than a specific host name (e.g., some-host.rickconner.net), or else you may not get anything useful back from plain whois.

Also, note that it doesn't (usually) make sense to call plain old whois with an IP address (since you can't register IP addresses, only domain names). If you do, you'll probably just get nothing useful back (but read on to discover when you can use whois to investigate IP addresses).

Here's an example of a "default" whois query (on my employer's domain, good old arinc.com):

[localhost:~] rconner% whois arinc.com

Whois Server Version 1.3

Domain names in the .com, .net, and .org domains can now be registered with many different competing registrars. Go to http://www.internic.net for detailed information.

Domain Name: ARINC.COM
Registrar: NETWORK SOLUTIONS, INC.
Whois Server: whois.networksolutions.com
Referral URL: http://www.networksolutions.com
Name Server: NS.CW.NET
Name Server: ENTERPRISE.ARINC.COM
Name Server: ZULU.ARINC.NET
Updated Date: 13-may-2002

>>> Last update of whois database: Fri, 16 Aug 2002 17:02:01 EDT <<<

The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains and Registrars.

The default whois query might not give you all the information you need to identify the registrant for a spam website. For this, you'll need to dig a bit deeper with whois, as we will see below.

For spam-hunters, whois can provide valuable information about who owns a domain, and where spam complaints might be directed. You can also use information from the whois entry to link spammers to other persons or entities (such as ISPs) that might be assisting them; or, you can spot similarities in whois data for two or more spam domains. A detailed whois report usually includes name server addresses, as well as administrative and abuse contact information. You can also usually see the full name and address of the registrant (although this information is often unreliable in the case of spammers, see below).

Advanced whois #1 -- querying specific domain registrars

The growth of the internet has made whois lookups somewhat more complicated in recent years. More and more new domain types are appearing (e.g., ".biz"), and the responsibility for registering domains has been spun off to many different private companies and quasi-government entities. There are a bunch of these private registrars, and each one is required to maintain a full set of domain-related information for public lookup via whois.

Thanks to this multiplicity of registrars, default whois lookups may only give you cursory information on the domain, and then tell you where else to look. Or, in some cases, they may tell you nothing at all.

Let's tackle the easy case first, in which the default lookup gives us the name of the registrar who sold the domain. In the arinc.com example above, we see that the default whois lookup mentions a specific whois server (at whois.networksolutions.com) that we can consult to learn more about this domain. In order to use these servers, we use the "-h" option with whois, which works as follows:

whois -h [whois-server] [domain-name]

where [whois-server] is replaced with the host name of the whois server we want to query. So, to learn more about arinc.com, you can use the following command (try this yourself for practice):

whois -h whois.networksolutions.com arinc.com
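Under the hood, whois is a very simple protocol (documented in RFC 3912): the client opens a TCP connection to port 43 on the server, sends the query followed by a carriage return and line feed, and reads until the server hangs up. Here's a minimal Python sketch of that exchange, plus a helper that pulls the referral out of a "Whois Server:" line like the one in the arinc.com example. The function names are hypothetical, and the key match is deliberately loose (any key ending in "whois server") because output formats vary between registries.

```python
import socket
from typing import Optional


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one query to a whois server (TCP port 43) and return its whole reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server hangs up once it has sent everything
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def referred_server(reply: str) -> Optional[str]:
    """Pick the registrar's whois server out of a '... Whois Server:' line, if any."""
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower().endswith("whois server") and value.strip():
            return value.strip()
    return None
```

With network access, you would pass the reply from your default query into referred_server and then call whois_query again with the server it names (whois.networksolutions.com, in the arinc.com example above). Which server your whois command consults by default depends on your system, so I haven't hard-coded one.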

Occasionally, your default whois lookup may not yield anything at all on a domain. This might be the case if the domain is very new, is registered with a fairly obscure registrar, or is itself fairly obscure (e.g., a ".aero" domain). You could go through ICANN's list of accredited registrars, but there are too many of them to hit (and even that list is not complete). What to do?

Your best chance would be to use one of the sophisticated "smart" whois services on the web, such as that at http://www.geektools.com/whois.php, or the VERY formidable http://www.completewhois.com/ (these folks also offer a whois server at whois.completewhois.com that you can use for command-line lookups). If you are persistent, you should soon be able to identify the specific parties who own the domain in which you're interested.

Whois Blues (added 31 August 2005)

Before you go off half-cocked with whois for domain lookups, I should point out some of its current problems. Over the last decade or so, with the expansion of the internet and the "privatization" of the domain registration process, the whois service has become much more unwieldy and tolerant of abuse. Here are three of the bigger problems you may run into when using whois for looking up domain info.

Finally, and needless to say, it is very risky to act upon the registrant contact information provided by whois for a suspected spam domain. I would not use such data for anything other than circumstantial evidence (possibly to link the domain to other spam operations), and I certainly would not send any e-mails or postal mails (or worse) based on such information.

Advanced whois #2 -- whois.abuse.net lookups for abuse contacts

One specialized whois server is worthy of further discussion here: the folks at abuse.net run a whois server (at whois.abuse.net) that returns not the usual whois information, but a list of e-mail addresses that can be used to report abuse pertaining to the domain name you supply. In this case, "abuse" might include anything from spam websites and e-mail addresses to malware and cracker attacks. To use this particular server, you type:

whois -h whois.abuse.net [domain-name]

Note that listing domains and hosts with whois.abuse.net is strictly voluntary, so you may not always get back useful information from a whois.abuse.net inquiry. When you use this server, you should make sure that you get "real" addresses rather than the "default -- no info" addresses that will be supplied if whois.abuse.net doesn't have any entries for the host or domain. You may not want to report to such default addresses, so as not to reveal your e-mail address to people who shouldn't see it.

Advanced whois #3 -- IP-whois: who controls the IP address?

Sometimes, you are confronted by a bare IP address (say, for a spam website), and an nslookup doesn't do you much good (i.e., no reverse lookup is defined for the address). Even a traceroute may not be of much help if the last couple of hosts before the target are also identified only by cryptic IP addresses. You need to know who is responsible for the address in order to be able to report it to them, but how can you find out?

Again, whois, the Swiss Army Knife of network utilities, comes to your rescue. You can (with a small amount of effort) trace the IP address right back to its owner using the whois -h command together with the names of a few selected whois hosts belonging to the regional internet registries (RIRs) that control the allocation of IP address blocks throughout the world. This sort of lookup is often called "IP-whois."

An IP-whois lookup would look something like:

whois -h [RIR-whois-server] [IP-address]

Note that this is the one instance where it makes sense to use an IP address in the arguments to whois; in fact, since the RIRs don't control domain names, it may not be productive to look up domain names via the RIR whois hosts.

The following are the RIR whois servers that will be of most use:

Region | RIR | whois server
Asia, Pacific Rim | Asia-Pacific Network Information Centre (APNIC) (http://www.apnic.net/) | whois.apnic.net
USA, Canada, Caribbean (partial), Africa (partial) | American Registry for Internet Numbers (ARIN) (http://www.arin.net/) | whois.arin.net
Europe | Réseaux IP Européens (RIPE) NCC (http://www.ripe.net/) | whois.ripe.net
Latin America, Caribbean (partial) | Latin American and Caribbean Internet Addresses Registry (LACNIC) (http://lacnic.org) | whois.lacnic.org
Africa | African Network Information Centre (AfriNIC) (http://www.afrinic.net/) | whois.afrinic.net

You need not use the whois command line to query these hosts; if you find it more comfortable, you can go to the RIR's website and use a web-based whois lookup (follow the links above, and look for a link to "whois").

So, which one of these do you contact? In general, you can't tell just from scanning an address which RIR to use (because the IP numbers are rather haphazardly allocated and don't easily match up to a particular RIR). So, you may have to try a couple of them before you hit pay dirt. The best place to start is with ARIN, since it tends to refer you to the correct RIR if the address is not one they have allocated (the other RIRs don't necessarily do this).
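The ARIN-first strategy is easy to automate, because ARIN's reply names the right registry on a "ReferralServer:" line (as in the example that follows). This Python sketch reads that line and re-asks the named server, stopping after a few hops. The ask parameter stands for any function that sends a query to a whois server and returns the text reply (a small TCP port-43 client would do); it's a parameter here so the logic can be tried without network access. Only the ReferralServer key from ARIN's output is recognized, and the function names are my own.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit


def referral_server(reply: str) -> Optional[str]:
    """Host named on a 'ReferralServer:' line, e.g. whois://whois.apnic.net."""
    for line in reply.splitlines():
        key, sep, value = line.partition(":")  # splits at the first colon only
        if sep and key.strip() == "ReferralServer" and value.strip():
            return urlsplit(value.strip()).hostname
    return None


def ip_whois(address: str, ask: Callable[[str, str], str],
             start: str = "whois.arin.net", max_hops: int = 3) -> Tuple[str, str]:
    """Query `start`, then follow referrals; return (last server asked, its reply)."""
    server = start
    reply = ask(server, address)
    for _ in range(max_hops):
        next_server = referral_server(reply)
        if next_server is None or next_server == server:
            break  # no further referral, or a loop back to the same server
        server = next_server
        reply = ask(server, address)
    return server, reply
```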

For example, let's pick an IP address at random -- oh, say 211.144.150.35. We'll try an ARIN lookup first:

[G4733:~] rconner% whois -h whois.arin.net 211.144.150.35

OrgName: Asia Pacific Network Information Centre
OrgID: APNIC
Address: PO Box 2131
City: Milton
StateProv: QLD
PostalCode: 4064
Country: AU

ReferralServer: whois://whois.apnic.net

NetRange: 210.0.0.0 - 211.255.255.255
CIDR: 210.0.0.0/7
NetName: APNIC-CIDR-BLK2
NetHandle: NET-210-0-0-0-1
Parent:
NetType: Allocated to APNIC
NameServer: NS1.APNIC.NET
NameServer: NS3.APNIC.NET
NameServer: NS4.APNIC.NET
NameServer: NS.RIPE.NET
NameServer: TINNIE.ARIN.NET
NameServer: DNS1.TELSTRA.NET
Comment: This IP address range is not registered in the ARIN database.
Comment: For details, refer to the APNIC Whois Database via
Comment: WHOIS.APNIC.NET or http://www.apnic.net/apnic-bin/whois2.pl
Comment: ** IMPORTANT NOTE: APNIC is the Regional Internet Registry
Comment: for the Asia Pacific region. APNIC does not operate networks
Comment: using this IP address range and is not able to investigate
Comment: spam or abuse reports relating to these addresses. For more
Comment: help, refer to http://www.apnic.net/info/faq/abuse
Comment:
RegDate: 1996-07-01
Updated: 2004-03-30

OrgTechHandle: AWC12-ARIN
OrgTechName: APNIC Whois Contact
OrgTechPhone: +61 7 3858 3100
OrgTechEmail: search-apnic-not-arin@apnic.net

# ARIN WHOIS database, last updated 2004-05-14 19:15
# Enter ? for additional hints on searching ARIN's WHOIS database.


Here, we got no specific information tracing the address down to its owner. So, we kinda struck out, but at least ARIN points us promptly to the correct RIR (see the ReferralServer line), which happens to be APNIC:

[G4733:~] rconner% whois -h whois.apnic.net 211.144.150.35
% [whois.apnic.net node-1]
% Whois data copyright terms http://www.apnic.net/db/dbcopyright.html

inetnum: 211.144.150.0 - 211.144.150.254
netname: YONGCHANG
descr: Beijing Yong Chang Wireless Telecom Co.,Ltd
descr: Co.
descr: BeiJing
country: CN
admin-c: LL212-AP
tech-c: LL212-AP
mnt-by: MAINT-CNNIC-AP
changed: llz@srit.com.cn 20020218
status: ASSIGNED NON-PORTABLE
source: APNIC
changed: hm-changed@apnic.net 20020827

person: lizhang li
nic-hdl: LL212-AP
e-mail: as9811@srit.com.cn
address: No.225 Chaonei Street Dongcheng District Beijing China
phone: +86-10-65253831
fax-no: +86-10-65244907
country: CN
changed: ipas@cnnic.net.cn 20030318
mnt-by: MAINT-CNNIC-AP
source: APNIC

Aha! This query returned all of the info that APNIC has on this particular IP number. The inetnum and descr lines show that this address is part of a block of 255 addresses (211.144.150.0 through 211.144.150.254) assigned to Beijing Yong Chang Wireless Telecom Co., Ltd. in mainland China. We also uncovered a contact for inquiries regarding this address: one Lizhang Li, whose e-mail address is given on the e-mail line of the person record; we also have this person's telephone number and mailing address. If we were to receive spam implicating this address, we would at least be able to contact this party to file a complaint (it would have to wait in line behind the zillions this person has no doubt already received, for this address block is currently a veritable anthill of spam websites).
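The arithmetic behind "a block of 255 addresses" is easy to check with Python's standard ipaddress module. This sketch parses an inetnum range like the one in the APNIC record, counts its addresses, and tests whether a given address falls inside it; it also expands the CIDR notation from the ARIN record (210.0.0.0/7) into the matching NetRange. The helper names are my own.

```python
import ipaddress
from typing import Tuple

IPv4 = ipaddress.IPv4Address


def parse_inetnum(value: str) -> Tuple[IPv4, IPv4]:
    """Turn an inetnum value such as '211.144.150.0 - 211.144.150.254' into two addresses."""
    first_text, last_text = value.split("-")
    return IPv4(first_text.strip()), IPv4(last_text.strip())


def range_size(first: IPv4, last: IPv4) -> int:
    return int(last) - int(first) + 1  # both ends are included


def in_range(address: str, first: IPv4, last: IPv4) -> bool:
    return first <= IPv4(address) <= last


# ARIN's "CIDR: 210.0.0.0/7" and "NetRange: 210.0.0.0 - 211.255.255.255"
# describe the same block of addresses.
block = ipaddress.ip_network("210.0.0.0/7")
print(block.network_address, "-", block.broadcast_address)  # 210.0.0.0 - 211.255.255.255
```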

"Safe surfing" with telnet and curl

So you've received some spam advertising a particular website, and you're curious to know what might be on that site—but you're afraid to load it in a browser (because it might contain harmful stuff, or tricky code that hides important information about itself). What to do? Can you surf the web safely?

In fact, you can, and there are many ways to do so, but most of these may not be exactly what you want.

What you really want (probably) is to fetch a web page from a web server as a plain, unprocessed, unrendered file. This would allow you to see exactly what gets fed to your browser (as opposed to what appears in the browser window, which could be quite different). There are two good ways to do this: with the telnet command, which appears on most Windows and Unix-like systems (including Mac OS X/Darwin), and with the utility command curl, which can fetch files using HTTP, HTTPS, FTP, or other protocols specified in the URL you pass to it. The curl command is usually included in modern Unix distributions (including Linux and Mac OS X), but if you don't have it you can obtain it from http://curl.haxx.se/ for your own system.

OK, so why on Earth would you want to go all around Robinson's barn to download web pages in this way when you have a perfectly good web browser? The answer is that these techniques don't filter or interpret the downloaded data in any way (as a browser might); they would be useful in spam investigations when you need to see exactly what's going on at a website. The advantages of using telnet or curl rather than a browser to fetch a web page are:

Using curl to fetch web pages

The curl command is fairly simple to use; in its basic form, you simply type:

curl [url]

and the file at the URL you specify will be printed in the console window. If you would rather save this data to a file without printing onscreen (probably a good idea if you are fetching a binary file, such as an image), you can simply redirect standard output to the file:

curl http://www.rickconner.net/spamweb/ > swindex.html

Here, you would be fetching the index page of this site and storing it on your system as "swindex.html".

Note that you must give curl a full URL so that it knows what protocol to use in order to fetch the file. This means that for a web fetch you must type "http://something-or-other/directory/file" and not just "something-or-other." Similarly, to fetch a file via FTP, you type an FTP URL (like "ftp://big.ftp.site/pub/freestuff.zip"). Refer to the curl manual page for more information, including special options that might be of use.

Using telnet to fetch web pages

If you don't have curl, you can use a brief telnet session to fetch a web page. This isn't as convenient as curl (it's a two-step operation, and doesn't permit you to redirect output to files), but it is adequate for most uses.

Normally, telnet is used for logging on to a remote system for a command-line or text-based session (and it operates by default on IP port 23, the "well-known" port for telnet service). However, you can specify any port you like on the telnet command line (such as port 80, used for most web transactions); this is the capability we will use in order to "pretend" to be a web browser.

Fetching a page via telnet is a two-step procedure: you first issue the telnet command, giving the port option:

telnet [web-server-name-or-ip] [port-number]

Then, after the connection is made (and you are given a blank line without a prompt), you use the HTTP command "GET" (note it is in upper case):

GET /[file-to-get]

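(be sure to add the "/" before the file name). Then, telnet will type the page or file onto the screen (or an error message, if the server can't fill the request), after which the server closes the connection (this disconnection is perfectly normal: with this simple form of HTTP request, the server hangs up as soon as it has sent the page; you're not being "booted" by the spammer). Some modern servers won't answer this bare form of the request; if you get nothing useful back, try typing "GET /[file-to-get] HTTP/1.0", then a "Host: [web-server-name]" line, then an empty line.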

For example, here are the commands you would use in a telnet session to fetch this page (http://www.rickconner.net/spamweb/tools.html) as a raw file. You type the telnet command and the GET line; everything else is printed by telnet:

[G4733:~] rconner% telnet www.rickconner.net 80
Trying 206.224.90.170...
Connected to www.rickconner.net.
Escape character is '^]'.
GET /spamweb/tools.html


(file printout will follow)
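Here's a rough Python equivalent of the telnet trick, for those who would rather capture the raw bytes in a file. There is one deliberate difference from the bare "GET /file" typed above: this sketch sends an HTTP/1.0 request line with a Host header (including the port when it isn't 80), which more servers will accept. The reply therefore begins with the server's status line and headers, ahead of the page itself. It handles plain http only (no https), and the function names are hypothetical.

```python
import socket


def build_get_request(host: str, path: str, port: int = 80) -> bytes:
    """A minimal HTTP/1.0 GET request, much like typing it into telnet by hand."""
    if not path.startswith("/"):
        path = "/" + path
    host_header = host if port == 80 else f"{host}:{port}"
    lines = [f"GET {path} HTTP/1.0", f"Host: {host_header}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def raw_fetch(host: str, port: int, path: str, timeout: float = 15.0) -> bytes:
    """Fetch the raw reply (status line, headers, body) without interpreting it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path, port))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after the reply
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, raw_fetch("www.rickconner.net", 80, "/spamweb/tools.html") would return the whole reply as bytes, which you could save with open("tools.raw", "wb").write(...); nothing in it is rendered or executed.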

There are two things to be aware of: first, port 80 is used by default for web connections, but some websites (particularly those belonging to some spammers) use alternate ports. You can recognize these as a colon followed by a number after the domain name. In the following URL, for example, the port number is actually 8080 rather than the standard 80:

http://www.greedyspammer.zzz:8080/getrichquick.html

so your telnet fetch of this page (using, for variety, an MS-DOS command shell) would go something like:

C:\> telnet www.greedyspammer.zzz 8080
(telnet introductory stuff followed by blank line prompt...)

GET /getrichquick.html

(file printout follows...)

Secondly, if the last character in the URL is "/", web servers treat this as a request for the default or "index" page of the directory; you can thus just add "/" to the end of the directory path in the GET command in order to fetch this default page:

C:\> telnet www.rickconner.net 80
(telnet introductory stuff followed by blank line prompt...)

GET /spamweb/

(printout of "/spamweb/index.html" follows...)

Note that the telnet command will print its output to the "standard output" (the console or terminal window or DOS window, in this case). There's no easy way (that I know of) to redirect this output into a file, but you can simply copy the text from your terminal window and paste it into a file using a text editor. You can also fetch any kind of file using this trick, even "binary" files like images; be warned, however, that your screen display may get messed up by such files, and it won't be easy to save the data into a file (this is a good reason to use curl instead, since it can be used with normal pipes and redirects on the command line).
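Working out the telnet host, port, and path from a URL is mechanical enough to sketch in Python with the standard urllib.parse.urlsplit function. The helpers below (hypothetical names) fall back to port 80 and to "/" when the URL doesn't say otherwise, and refuse anything that isn't plain http://, since telnet can't speak https.

```python
from typing import Tuple
from urllib.parse import urlsplit


def telnet_target(url: str) -> Tuple[str, int, str]:
    """Split an http:// URL into the host, port and path a telnet fetch needs."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"not a plain http:// URL: {url!r}")
    port = parts.port or 80       # no ":number" in the URL means port 80
    path = parts.path or "/"      # a bare host name means the index page
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path


def telnet_commands(url: str) -> Tuple[str, str]:
    """The two lines you would type: the telnet command, then the GET line."""
    host, port, path = telnet_target(url)
    return f"telnet {host} {port}", f"GET {path}"
```

For the greedyspammer URL above, telnet_commands gives "telnet www.greedyspammer.zzz 8080" and "GET /getrichquick.html".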

Deeper DNS digging with dig

The dig tool is like the special wrench you keep down in the bottom of your toolbox, wrapped up in oilcloth; you don't use it very often, but when you do, it's just the ticket. The dig command is most useful on those occasions when you need to know more about a domain or host name than nslookup or host can provide. Dig (short for "domain information groper," apparently) isn't the easiest tool to use, and its output isn't the easiest to read, but there are occasions when enduring these shortcomings will pay off. I am hardly an expert with dig, but I can offer a few tips.

The basic dig call looks like this:

dig @NAME-SERVER TYPE NAME

where NAME-SERVER is the server you want to query (if you omit it, your local name server(s) will be queried), TYPE is the type of lookup you want (i.e., the question you want to ask), and NAME is the domain or host name you're interested in. In most cases, you want to use the domain name (e.g., "rickconner.net") rather than a specific host name or alias within the domain ("www.rickconner.net").
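If you find yourself running dig from a script, the call syntax above maps directly onto an argument list. The Python sketch below (dig_args and run_dig are made-up names, and it assumes the dig program is installed and on your PATH) builds that list, leaving out the @NAME-SERVER part when you don't supply one, just as you would on the command line.

```python
import subprocess


def dig_args(name, rtype="A", server=None):
    """Build the argument list for 'dig @NAME-SERVER TYPE NAME'. With no
    server, the '@...' part is omitted so dig asks your local name server(s)."""
    args = ["dig"]
    if server:
        args.append("@" + server)
    args += [rtype, name]
    return args


def run_dig(name, rtype="A", server=None):
    """Run dig and return its text output; raises CalledProcessError if
    dig exits with an error status."""
    result = subprocess.run(dig_args(name, rtype, server),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(dig_args("rickconner.net", "MX", "ns1.prismnet.com")))
    # print(run_dig("rickconner.net", "MX"))   # needs dig and network access
```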

There are many possible types of lookups you can do with dig, but most of them probably won't work (i.e., you'll get no "answer section" in the dig output), mainly because the DNS admins for the domain haven't seen fit to set them up. The most useful lookups should be available, however; they are the A (address), NS (name server), and MX (mail exchanger) lookups described below.

The simplest dig call (i.e., "dig rickconner.net") will give you an A lookup, and you'll get more or less the same info you would get from nslookup or host. You can use either a domain name or a host name as the NAME. For example:

[G4733:~] rconner% dig www.rickconner.net

; <<>> DiG 9.2.2 <<>> www.rickconner.net
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42365
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;www.rickconner.net. IN A

;; ANSWER SECTION:
www.rickconner.net. 5540 IN A 209.198.131.19

;; AUTHORITY SECTION:
rickconner.net. 5540 IN NS ns2.prismnet.com.
rickconner.net. 5540 IN NS ns1.prismnet.com.

;; ADDITIONAL SECTION:
ns1.prismnet.com. 8605 IN A 209.198.128.11
ns2.prismnet.com. 8605 IN A 209.198.128.27

;; Query time: 147 msec
;; SERVER: 199.45.32.43#53(199.45.32.43)
;; WHEN: Mon Oct 10 22:44:18 2005
;; MSG SIZE rcvd: 132
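For most spam-hunting purposes, the ANSWER SECTION is the part of that output you care about. If you need to pull it out programmatically, something like the following Python sketch will do (answer_records is a hypothetical name; the sketch assumes the section layout shown above, where a blank line ends each section). Each record line is split into at most five fields, so the last field keeps any embedded spaces, such as the preference number and host name in an MX record.

```python
def answer_records(dig_output):
    """Return the ANSWER SECTION of dig's output as a list of
    (name, ttl, class, type, data) tuples."""
    records = []
    in_answer = False
    for line in dig_output.splitlines():
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
            continue
        if in_answer:
            if not line.strip():             # a blank line ends the section
                break
            # Assumes a well-formed record line; maxsplit=4 keeps the data field whole.
            name, ttl, rclass, rtype, data = line.split(None, 4)
            records.append((name, int(ttl), rclass, rtype, data))
    return records


if __name__ == "__main__":
    sample = """;; ANSWER SECTION:
www.rickconner.net. 5540 IN A 209.198.131.19

;; AUTHORITY SECTION:
rickconner.net. 5540 IN NS ns2.prismnet.com.
"""
    print(answer_records(sample))
```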

If you want to find out specifically what the name servers are for a domain, you can use the NS lookup. This will usually work only when you use a domain name (not a host name or alias) as the NAME. Most of the time, you'll get the same name-server info that appears in the AUTHORITY SECTION of the default A-type lookup, so I'll skip the example.

The last dig lookup I'll cover here is the MX lookup, which will tell you what mail hosts will accept e-mail directed to the domain. Here, again, you should use a domain name as the NAME. For example:

[G4733:~] rconner% dig @ns1.prismnet.com mx rickconner.net

; <<>> DiG 9.2.2 <<>> @ns1.prismnet.com mx rickconner.net
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11053
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 2, ADDITIONAL: 6

;; QUESTION SECTION:
;rickconner.net. IN MX

;; ANSWER SECTION:
rickconner.net. 14400 IN MX 10 defender.io.com.
rickconner.net. 14400 IN MX 20 mx2.prismnet.com.
rickconner.net. 14400 IN MX 10 zaxxon.io.com.
rickconner.net. 14400 IN MX 10 redbaron.io.com.

;; AUTHORITY SECTION:
rickconner.net. 14400 IN NS ns1.prismnet.com.
rickconner.net. 14400 IN NS ns2.prismnet.com.

;; ADDITIONAL SECTION:
defender.io.com. 600 IN A 209.198.128.79
mx2.prismnet.com. 14400 IN A 209.198.128.34
zaxxon.io.com. 600 IN A 209.198.128.81
redbaron.io.com. 600 IN A 209.198.128.80
ns1.prismnet.com. 14400 IN A 209.198.128.11
ns2.prismnet.com. 14400 IN A 209.198.128.27

;; Query time: 134 msec
;; SERVER: 209.198.128.11#53(ns1.prismnet.com)
;; WHEN: Mon Oct 10 23:00:10 2005
;; MSG SIZE rcvd: 272
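The number in front of each mail host in the MX answer is its preference value: a sending mail server tries the host with the lowest value first and falls back to higher values only if those fail, while hosts with equal values are treated as equivalent. In the example, defender, zaxxon, and redbaron (all at 10) are the primary mail hosts, and mx2.prismnet.com (at 20) is the backup. This Python sketch (mx_hosts_by_preference is a hypothetical name; it assumes the six-field MX lines shown above) puts the hosts into that order.

```python
def mx_hosts_by_preference(answer_lines):
    """Take MX lines from dig's ANSWER SECTION and return
    (preference, host) pairs, most-preferred (lowest number) first."""
    pairs = []
    for line in answer_lines:
        fields = line.split()
        # Expected fields: name, TTL, class, "MX", preference, mail host
        if len(fields) == 6 and fields[3].upper() == "MX":
            pairs.append((int(fields[4]), fields[5].rstrip(".")))
    return sorted(pairs)


if __name__ == "__main__":
    answer = [
        "rickconner.net. 14400 IN MX 10 defender.io.com.",
        "rickconner.net. 14400 IN MX 20 mx2.prismnet.com.",
        "rickconner.net. 14400 IN MX 10 zaxxon.io.com.",
        "rickconner.net. 14400 IN MX 10 redbaron.io.com.",
    ]
    for pref, host in mx_hosts_by_preference(answer):
        print(pref, host)
```

Hosts that share a preference value come out in alphabetical order here purely for tidiness; the ordering among them carries no meaning.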

There's a lot more digging you can do with dig, but for further information I'd direct you to the dig manual page.

Other tools you may hear about

The tools I describe above should fill most of your spam-hunting requirements. You may hear tell of other tools, however, and I'll describe them here for your information.

Finger

The service known as finger used to be quite useful and popular; from a finger lookup, you could have found out the name of the person who "owned" a particular e-mail address, possibly along with that person's postal address, telephone number, office location, and other information (including an indication of whether or not he or she was online at the time via a telnet or rlogin session).

I say that finger "used to be" useful, but support for public finger services has just about disappeared from the face of the Earth (some organizations, mainly universities, may still use "private" finger service). Blame this mostly on spammers and other abusers; finger made it so easy for them to harvest addresses or collect other information they shouldn't have that no competent network operator would now dare risk running a public finger server.

Port scanner

The port scanner is a type of diagnostic software application that determines whether a remote machine is accepting connections on one or more IP ports of interest. For instance, you can use a port scan on port 25 of a given internet host to determine whether that host is running SMTP service.
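At its simplest, a port check just tries to open a TCP connection and reports whether the remote host accepts it, which is exactly what telnet does when you give it a port number. Here's a minimal Python sketch of that single-port test (port_is_open is a hypothetical name); full-featured port scanners add refinements such as probing many ports at once, but the basic idea is the same.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if host accepts a TCP connection on port (the same
    test telnet performs when you give it a port number)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:      # refused, timed out, unreachable, or name lookup failed
        return False


if __name__ == "__main__":
    # Checks for an SMTP listener on your own machine; replace 127.0.0.1
    # with the host you are investigating.
    print(port_is_open("127.0.0.1", 25))
```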

Port scanners are usually built into more comprehensive network security tools, although telnet (as we saw above) can provide a rather crude and quick port scanning capability thanks to its port-number option. Port scanning is seldom, if ever, of interest to me in investigating spam; I don't want to try to break into a spammer's machine, I just want to ask the spammer's ISP to shut it down.

Packet sniffer

The packet sniffer is a program that can listen in on ALL Ethernet traffic on the network to which it is connected; it can be "tuned" to pick out packets of interest to the user and print these to a display or a file. Sniffers may offer other useful tools such as packet parsers or TCP transaction reconstruction.

One of the best known of the breed is Wireshark (http://www.wireshark.org/), formerly known as Ethereal, an open-source program that is available for a variety of computer platforms.

Packet sniffers sound rather sinister (and, in fact, they can be), but their reach is fortunately limited: if your computer is separated from the sniffer by an Ethernet switch, a router, a firewall, or a DSL/cable "modem", the sniffer will generally see your traffic only when that device forwards it onto the sniffer's own network segment. Again, while the packet sniffer is an invaluable tool for general network security purposes, it isn't of much help in spam hunting (although I have used Wireshark to study the behavior of certain spam websites as they load onto my computer).

Spam filtering software

There's a lot of software on offer that will help you filter or interdict spam. These fall into two categories:

You'll find descriptions of particular tools for mail filtering on my page about filters. I also explain there why some of these are less effective or useful than others.

Spam filtering hardware

Many ISP admins and corporate IT managers prefer to solve problems by simply dropping a new box into the rack. A few firms have stepped forward to meet this demand with "turn-key" spam filtering machines. These machines are basically standard servers running specialized software; the organization's incoming mail is directed to these machines, and they can spot spam and viruses using a variety of techniques (many of which are described on these pages). Usually, these systems will be linked to central servers or databases that can distribute software patches and new "mal-mail" fingerprints so that they can be kept up to date with the latest tricks of the criminal trade.

The advantage of using such machines is that they can provide a high level of protection against spam and other mail-borne threats without taxing the abilities of an individual company's IT staff.

Where to find tools

As I noted, ping, traceroute, telnet, and nslookup are all available to most Windows and Unix users from the command line (on Windows, traceroute is spelled "tracert"). If you use a Macintosh with MacOS 9 or earlier, you may not have access to these tools. However, there are several shareware and freeware utilities available to fill the gap; you can find many of these at the Tucows Macintosh internet tools site.

Whois and dig are not provided on most Windows machines (nor on pre-OS-X Macintoshes), but you can find versions of them in various open-source or shareware repositories (and from http://samspade.org/ssw/).

There's actually no need, however, to download software to your computer; you can find web-based versions of most of these tools, some optimized for particular uses (including investigation of spam or other abuse). I offer a sampling of these below; I don't actively endorse them and have no business relationship with them, but I do use many of them frequently, and I note that they all offer their basic services free of charge for the public benefit:


(c) 2003-2006, Richard C. Conner


Updated: Fri, 02 Feb 2007