A Web Developer's Guide to Combating Spam

- by Robert D. Hughes -

Key Points

Unsolicited junk e-mail, also known as spam, is a growing and seemingly out of control problem on the Internet.

Exposed website e-mail addresses contribute to the spam problem by attracting programs called spambots, which rove the Internet looking for e-mail addresses to harvest and send back to spammers.

Several techniques can be used to mask or hide website e-mail addresses from spambots. These include munging, using graphics for e-mail addresses, and using JavaScript and ASCII.

The Problem

Spam, or unsolicited junk e-mail, is the bane of the Internet. An estimated 70% of all e-mail messages are now considered spam. That's nearly 3 out of 4 e-mails. Clearly the situation is out of control, and Web developers have contributed to the problem by exposing Web page e-mail addresses to automated programs called spambots. These spambots scour the Web looking for @ symbols and mailto: schemes in e-mail links, and when they find them they send the associated e-mail addresses back to their spammer parents. The spammers then use the collected e-mail addresses to send out spam, or they sell them to other spammers. The end result is an increase in spam.

This article demonstrates techniques that Web developers can use to hide or mask e-mail addresses in Web page mailto links from roving spambots.

Note: Though effective, the techniques described in this article are not foolproof; the only way to guarantee that a Web page e-mail address won't be discovered by spambots is not to publish it. This article is not intended to be a primer on spam itself. For that, try a Web search using "spam" as the keyword.

Techniques

Munging

Munging means removing certain characters from an e-mail address and replacing them with other characters that spambots don't recognize, thus hiding the e-mail address. For example, in the hypothetical e-mail address below, the @ symbol has been replaced with the letters AT.

james.peterson AT someisp.com

The main drawback of this technique is that a munged e-mail address appears on a Web page as plain text and not as a clickable link (don't try to use a munged e-mail address in an e-mail link; it won't work). Thus a user who wants to use a munged e-mail address has two choices: either type the address manually into the To: field of an e-mail message, substituting the correct characters along the way, or copy the munged address from the Web page, paste it into the To: field, and then replace the substituted characters with the correct ones. Less Web-savvy users might not know how to do this, and there's always the possibility of a mistake during the replacement, e.g., the required @ symbol being replaced by another character, rendering the e-mail address unusable. It's also fairly labor-intensive compared to just clicking a link. For these reasons munging is a less than desirable way of hiding Web page e-mail addresses from spambots.
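The substitution itself is simple enough to sketch in a few lines of JavaScript. The function names below (mungeAddress and unmungeAddress) and the " AT " marker padded with spaces are illustrative assumptions, not part of any library; the second function is roughly the step a reader otherwise has to perform by hand.

```javascript
// Hypothetical helpers for illustration only.

// Replace the @ symbol with the letters AT, padded with spaces.
function mungeAddress(address) {
  return address.replace("@", " AT ");
}

// Reverse the substitution to recover a usable address.
function unmungeAddress(munged) {
  return munged.replace(" AT ", "@");
}

console.log(mungeAddress("james.peterson@someisp.com"));
// prints "james.peterson AT someisp.com"
```

Note that anything this easy to reverse by hand is also easy for a program to reverse, so a more unusual marker may hold up somewhat better than a common one.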

Graphical E-mail Addresses

Some websites use graphics to represent e-mail addresses. For example, the hypothetical e-mail address below is actually a GIF graphic created in Photoshop. If you click it or try to copy and paste it into the To: field of an e-mail message, nothing happens. The user has to type the characters manually into the To: field of an e-mail message. Like munging, this is a fairly labor-intensive process compared to clicking a link, and there's always the possibility of making a typo. Also, some users turn image rendering off in their browsers. A graphical e-mail address won't appear on a Web page if this is the case.

Hypothetical e-mail address

Using JavaScript

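JavaScript can be used to obfuscate, or "scramble," Web page e-mail addresses to hide them from spambots. (This is not encryption in the strict sense; the address is simply split up so that it never appears whole in the page source.) The example below shows a hypothetical e-mail address scrambled using JavaScript: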

<noscript>
<p> If you are seeing this message you have either turned JavaScript off in your browser or you are using a browser that doesn't support JavaScript. Without JavaScript, the e-mail address below will not work. </p>
</noscript>
<script type="text/javascript">
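// Build the link from separate pieces so the complete address never
// appears as one string in the page source.
var first_part = "<a href='mailto:";
var user_name = "james.peterson";
var at = "@";
var domain = "someisp.com";
var close_tag = "'>"; // closes the href attribute and the opening <a> tag
var link_text = "James Peterson"; // visible text of the link
var last_part = "</a>";
document.write(first_part + user_name + at + domain + close_tag + link_text + last_part);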
</script>

In the example above, the e-mail address has been broken up into parts and each part has been assigned to a variable. The document.write statement is then used to write the variables to the Web page. On a Web page this all appears as a clickable mailto link, if JavaScript is enabled (see the next paragraph). This is only one way of using JavaScript to scramble an e-mail address, and if you're familiar with JavaScript you can probably come up with a few others. If you're not familiar with JavaScript, you can use The Hivelogic Enkoder, an online form that will automatically scramble e-mail addresses for you.

The downside of using JavaScript to scramble e-mail addresses is that some Web users disable JavaScript in their browsers. If JavaScript has been disabled, a JavaScript-scrambled e-mail address won't appear on a Web page. According to the latest browser statistics from W3Schools, JavaScript is enabled in about 96% of all browsers, so the overwhelming majority of Web users have JavaScript enabled. For the few that don't, you can add a <noscript> tag and a message (as I did in the e-mail address above) warning them that the e-mail address will not work without JavaScript. My feeling about this matter is that the people who disable JavaScript in their browsers are Web savvy enough to know that they will be missing certain website features without JavaScript.
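As one possible variation on the script above, the sketch below also splits up the mailto: scheme and builds the @ symbol from its character code, so that neither appears literally in the page source. The function name buildMailtoLink and its parameters are hypothetical, and the sketch assumes the three inputs are trusted text that needs no HTML escaping.

```javascript
// Hypothetical helper for illustration only.
// Assembles a mailto link from fragments so that neither "mailto:" nor
// "user@domain" appears as a single literal in the page source.
function buildMailtoLink(userName, domain, linkText) {
  var scheme = "mai" + "lto" + ":";
  var at = String.fromCharCode(64); // 64 is the character code for @
  var address = userName + at + domain;
  return '<a href="' + scheme + address + '">' + linkText + "</a>";
}

// In a browser, the result could be written to the page with document.write:
// document.write(buildMailtoLink("james.peterson", "someisp.com", "James Peterson"));
```

Like the original example, this depends on JavaScript being enabled, so the same <noscript> warning applies.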

Using ASCII

American Standard Code for Information Interchange (ASCII for short and pronounced ASS-kee) "is a code for representing English characters as numbers" (from Webopedia). ASCII code can be substituted for all of the characters in an e-mail address. Take a look at the hypothetical e-mail address below. The @ symbol has been replaced by its ASCII equivalent, the number 64, which is written in HTML as &#64;. As mentioned in the introduction, spambots key in on the @ symbol in Web page e-mail links to find e-mail addresses. My personal e-mail address on this Web page, including the mailto: scheme, is written in ASCII code (view the source code of this page to see it), yet my e-mail address appears as (and works as) a normal link on the page.

Hypothetical e-mail address

At the very least the mailto: scheme and the @ symbol should be replaced by their ASCII equivalents, and to be safe the rest of the e-mail address should be replaced as well. An ASCII Reference page can be found on the W3Schools website. If you use ASCII on your Web pages, don't forget to include the ampersand sign (&) and the pound sign (#) before the number, and the semicolon (;) after the number. Without these additional symbols the browser will display the bare number instead of the character it stands for.
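Converting an address by hand from a reference table is tedious, so a small script can do the lookup. The following is a minimal sketch, assuming a hypothetical function name (toCharacterReferences) and plain ASCII input; it wraps each character's code in the &, # and ; symbols described above.

```javascript
// Hypothetical helper for illustration only.
// Converts every character of the input into its numeric form, e.g. "@"
// becomes "&#64;". The output is meant to be pasted into the HTML source.
function toCharacterReferences(text) {
  var encoded = "";
  for (var i = 0; i < text.length; i++) {
    encoded += "&#" + text.charCodeAt(i) + ";";
  }
  return encoded;
}

console.log(toCharacterReferences("@")); // prints "&#64;"
```

Running the function over both the mailto: scheme and the address gives a string that can be placed inside the href attribute and the link text of an ordinary e-mail link.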

The beauty of using ASCII to mask Web page e-mail addresses is that all browsers understand it, since ASCII is just plain text. The downside is that it probably wouldn't be too hard for spammers to write programs that can sniff out ASCII-based e-mail addresses and translate them back into English characters, thus eliminating the advantage of using ASCII. This is just speculation though.
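To give a rough sense of how little effort that translation might take, here is a sketch of a decoder. The function name decodeCharacterReferences is hypothetical, and the sketch handles only the decimal &#number; form described in this article; it is not a claim about how any actual spambot works.

```javascript
// Hypothetical helper for illustration only.
// Turns decimal codes such as "&#64;" back into the characters they represent.
function decodeCharacterReferences(encoded) {
  return encoded.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(parseInt(code, 10));
  });
}

console.log(decodeCharacterReferences("&#97;&#64;&#98;")); // prints "a@b"
```

A single pattern match is enough to undo the encoding, which is why ASCII is best thought of as a hurdle for simple spambots rather than as real protection.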

Conclusion

Spam will probably be a part of our Internet lives for the foreseeable future, but that doesn't mean that we as Web developers are powerless against it. The techniques described in this article for hiding website e-mail links can help reduce the glut of spam.
