A Web Developer's Guide to Combating Spam

Key Points

Unsolicited junk email, also known as spam, is a growing and seemingly out-of-control problem on the Internet.

Exposed website email addresses contribute to the spam problem by attracting programs called spambots. Spambots rove the Internet looking for email addresses to harvest and send back to spammers.

Several techniques can be used to mask or hide website email addresses from spambots. These include munging, using graphics for email addresses, using JavaScript, and using ASCII codes.

By Robert D. Hughes

The Problem

Spam, or unsolicited junk email, is the bane of the Internet. An estimated 70% of all email messages are now considered spam. That's nearly 3 out of 4 emails. Clearly the situation is out of control, and Web developers have contributed to the problem by exposing Web page email addresses to automated programs called spambots. These spambots scour the Web looking for @ symbols and mailto: schemes in email links, and when they find them they send the associated email addresses back to the spammers who run them. The spammers then use the collected addresses to send out spam, or they sell them to other spammers. The end result is more spam.

This article demonstrates techniques that Web developers can use to hide or mask email addresses in Web page mailto links from roving spambots. It is not intended to be a primer on spam itself. For that, try a Web search using "spam" as the keyword.

Note: Though effective, the techniques described in this article are not foolproof; the only way to guarantee that a Web page email address won't be discovered by spambots is not to publish it.

Techniques

Munging

Munging means removing certain characters from an email address and replacing them with other characters that spambots don't recognize, thus hiding the email address. For example, in the hypothetical email address below, the @ symbol has been replaced with the letters AT.

james.peterson AT someisp.com

The main drawback of this technique is that a munged email address appears on a Web page as plain text and not as a clickable link (don't try to use a munged email address in an email link; it won't work). A user who wants to use a munged address therefore has two choices: type the address manually into the To: field of an email message, substituting the correct characters along the way, or copy the munged address from the Web page, paste it into the To: field, and replace the substituted characters with the correct ones. Less Web-savvy users might not know how to do this, and mistakes are always possible during the replacement, e.g., the required @ symbol may be replaced by some other character, rendering the address unusable. The process is also fairly labor intensive compared to just clicking a link. For these reasons munging is a less than desirable way of hiding Web page email addresses from spambots.
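The substitution itself is simple to express in code. The sketch below is only an illustration: the function names munge and unmunge are hypothetical, and it assumes the single AT replacement used in the example address above. The second function shows the step a reader has to perform by hand before the address can be used.

```javascript
// Hypothetical helpers; the names munge and unmunge are illustrative only.

// Replace the @ symbol with the letters AT, surrounded by spaces.
function munge(address) {
  return address.replace("@", " AT ");
}

// Reverse the substitution, as a reader would have to do manually.
function unmunge(munged) {
  return munged.replace(" AT ", "@");
}

const munged = munge("james.peterson@someisp.com");
console.log(munged);          // james.peterson AT someisp.com
console.log(unmunge(munged)); // james.peterson@someisp.com
```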

Graphical Email Addresses

Some websites use graphics to represent email addresses. For example, a hypothetical email address can be displayed as a GIF graphic created in Photoshop. If you click such a graphic or try to copy and paste it into the To: field of an email message, nothing happens. The user has to manually type the characters into the To: field of an email message. Like munging, this is a fairly labor intensive process compared to clicking a link, and there's always the possibility of making a typo. Also, some users turn image rendering off in their browsers. A graphical email address won't appear on a Web page if this is the case.

Using JavaScript

JavaScript can be used to obfuscate, or "scramble," Web page email addresses to hide them from spambots. The example below shows a hypothetical email address scrambled using JavaScript:
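What follows is a minimal sketch of the approach and not necessarily the author's original listing. The function name buildMailtoLink and the variable names are assumptions made for illustration, and the address is the same hypothetical one used earlier. In a browser the script would sit inside a script element at the point where the link should appear.

```javascript
// Hypothetical sketch: function and variable names are assumptions.

// Assemble a mailto link from separate parts so that the complete
// address never appears as one literal string in the page source.
function buildMailtoLink(user, domain) {
  var scheme = "mail" + "to:";
  var at = String.fromCharCode(64); // character code 64 is the @ symbol
  var address = user + at + domain;
  return '<a href="' + scheme + address + '">' + address + "</a>";
}

var link = buildMailtoLink("james.peterson", "someisp.com");

// In a browser, write the link into the page; elsewhere, just print it.
if (typeof document !== "undefined") {
  document.write(link);
} else {
  console.log(link);
}
```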

In the example above, the email address has been broken up into parts and each part has been assigned to a variable. The document.write statement is then used to write the variables to the Web page. On a Web page this all appears as a clickable mailto link, provided JavaScript is enabled (see the next paragraph). This is only one way of using JavaScript to scramble an email address, and if you're familiar with JavaScript you can probably come up with a few others.

The downside of using JavaScript to scramble email addresses is that some Web users disable JavaScript in their browsers. If JavaScript has been disabled, a JavaScript-scrambled email address won't appear on a Web page. According to the latest browser statistics from W3Schools, JavaScript is enabled in about 96% of all browsers, so the overwhelming majority of Web users have JavaScript enabled. For the few who don't, you can add a <noscript> tag and a message (like I did in the email address above) warning them that the email address will not work without JavaScript. My feeling is that people who disable JavaScript in their browsers are Web savvy enough to know that they will be missing certain website features without it.

Using ASCII

American Standard Code for Information Interchange (ASCII for short and pronounced ASS-kee) "is a code for representing English characters as numbers" (from Webopedia). ASCII codes can be substituted for all of the characters in an email address. Take a look at the hypothetical email address below. The @ symbol has been replaced by its ASCII equivalent, the number 64. As mentioned in the introduction, spambots key in on the @ symbol in Web page email links to find email addresses. My personal email address on this Web page, including the mailto: scheme, is written in ASCII code (view the source code of this page to see it), yet it appears as (and works as) a normal link on the page.

At the very least the mailto: scheme and the @ symbol should be replaced by their ASCII equivalents, and to be safe the rest of the email address should be replaced as well. An ASCII reference page can be found on the W3Schools website. If you use ASCII on your Web pages, don't forget to include the ampersand sign (&) and the pound sign (#) before the number, and the semicolon (;) after the number. For example, the @ symbol is written as &#64;. Without these additional symbols the browser will display the bare number instead of the character.
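Converting an address by hand from a reference table is tedious, so a small helper can generate the codes. This is a sketch under stated assumptions: the function name toCharacterReferences is hypothetical, and the input is assumed to contain only plain ASCII characters, as email addresses and the mailto: scheme normally do.

```javascript
// Hypothetical helper; the name toCharacterReferences is illustrative only.
// Assumes plain ASCII input.

// Convert every character to the form &#NN; that is, ampersand,
// pound sign, decimal character code, semicolon.
function toCharacterReferences(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    out += "&#" + text.charCodeAt(i) + ";";
  }
  return out;
}

console.log(toCharacterReferences("@"));       // &#64;
console.log(toCharacterReferences("mailto:")); // &#109;&#97;&#105;&#108;&#116;&#111;&#58;
```

The generated text can then be pasted into the page source in place of the readable address, both in the href attribute and in the link text.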

The beauty of using ASCII to mask Web page email addresses is that all browsers understand it, since the codes are just plain text in the page source. The downside is that it probably wouldn't be too hard for spammers to write programs that sniff out ASCII-based email addresses and translate them back into ordinary characters, thus eliminating the advantage of using ASCII. This is just speculation, though.
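To give a sense of how little work the reverse translation takes, here is a minimal sketch. The function name decodeCharacterReferences is hypothetical, it handles only the decimal &#NN; form described in this article, and it says nothing about what real spambots actually do.

```javascript
// Hypothetical helper; the name decodeCharacterReferences is illustrative only.

// Replace each decimal code of the form &#NN; with the character it stands for.
function decodeCharacterReferences(html) {
  return html.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(parseInt(code, 10));
  });
}

console.log(decodeCharacterReferences("james.peterson&#64;someisp.com"));
// james.peterson@someisp.com
```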

Conclusion

Spam will probably be a part of our Internet lives for the foreseeable future, but that doesn't mean that we as Web developers are powerless against it. The techniques described in this article for hiding website email links can help reduce the glut of spam.