A Thorn in Spammers’ Sides

I’m not sure if this means that the spammers win or that I win, but I’ve made a couple decisions about how I’m going to fight comment spam. I’m changing a few policies for a couple reasons.

First, my auto-block count has reached 13,000 blocks. There’s no question that MT-Blacklist has saved me hours of work, and saved my blog from falling into some horrible, decrepit state where my readers would be forced to wade neck-deep through ads for refinanced mortgages, cheap medication, and porn just to read each other’s comments. I would sooner shut down my blog than allow that to happen.

The second reason is due to something I never anticipated until the first time it happened. My apologies for the lengthy explanation, but I figure some of you may find this interesting. Even though I am successfully blocking thousands of attempts to spam my blog and the sub-blogs on this site, I cannot keep the spammers’ scripts from using the CPU power of my server. Each time a spammer’s script attempts a posting, the server has to think for a second before it can tell them that they cannot post their URLs, because it has to check the URL they’re trying to post against my blacklist, and determine if it should allow them to make their post. And when the spammer attempts to post 750 comments (or over 3000 comments as happened recently) my server becomes quite busy just blocking them.

The guys who run my server don’t like the fact that my site can take CPU power away from the other sites hosted on their machine, so they have to either shut down comments (which they did the first time), or install some kind of software governor that slows down the CPU usage for just my site, so that the other sites on the server aren’t inconvenienced by the spam attacks I get. That’s what they’ve done for the most recent attacks. So now, when an attack happens, no one can read my site until the CPU usage goes back down to normal levels. Visitors even see a page that says something like, “Resource limit exceeded. Will refresh in a few seconds…”. Some of you may have seen that recently. That’s what you see when I’m in the middle of a Spam Storm.

Obviously I don’t like the idea that spammers can temporarily deny my readers access to my content, so I’ve decided to close the comments for any entry that has been up for a few months. But as usual, I want to fight this war on a larger scale. Closing comments on old entries by going through each entry and clicking “Close” and “Save” can take hours of work (at least the first time through). I want to help people do that by providing a free PHP script (at least for Movable Type 2.x users, but possibly 3.x users as well) that can automatically close old posts whenever it is run. So any time you want to close the comments on any posts that are older than X days, you just run the script, tell it how many days to leave open, and it does the rest for you in about 1 second. Probably less than 1 second, but you get the idea. A quick rebuild of your blog will finish up the work.
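
To make the idea concrete, here is a minimal sketch of the core step such a script would perform. It is written in Python against an in-memory SQLite table rather than in PHP against MySQL, purely so it can be run anywhere; it is not the actual script. The table and column names (`mt_entry`, `entry_blog_id`, `entry_created_on`, `entry_allow_comments`) are assumptions about the Movable Type schema, and `close_old_comments` and `demo` are hypothetical names used only for illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def close_old_comments(conn, blog_id, days_open, now=None):
    """Disable comments on one blog's entries older than `days_open` days.

    Returns the number of entries that were changed.
    Table and column names are assumed, not confirmed, MT schema names.
    """
    now = now or datetime.now()
    # Entries created before this timestamp get their comments closed.
    cutoff = (now - timedelta(days=days_open)).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute(
        "UPDATE mt_entry SET entry_allow_comments = 0 "
        "WHERE entry_blog_id = ? AND entry_created_on < ? "
        "AND entry_allow_comments <> 0",
        (blog_id, cutoff),
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Build a tiny stand-in table and close comments older than 60 days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mt_entry ("
        "entry_id INTEGER PRIMARY KEY, entry_blog_id INTEGER, "
        "entry_created_on TEXT, entry_allow_comments INTEGER)"
    )
    rows = [
        (1, 1, "2004-06-01 09:00:00", 1),  # old entry on blog 1
        (2, 1, "2005-02-20 09:00:00", 1),  # recent entry on blog 1
        (3, 2, "2004-06-01 09:00:00", 1),  # old entry on a different blog
    ]
    conn.executemany("INSERT INTO mt_entry VALUES (?, ?, ?, ?)", rows)
    now = datetime(2005, 3, 1, 12, 0, 0)  # fixed "today" so the demo is repeatable
    changed = close_old_comments(conn, blog_id=1, days_open=60, now=now)
    state = conn.execute(
        "SELECT entry_id, entry_allow_comments FROM mt_entry ORDER BY entry_id"
    ).fetchall()
    return changed, state


if __name__ == "__main__":
    print(demo())
```

In the demo only the old entry on blog 1 is closed; the recent entry and the entry belonging to the other blog are left alone. Restricting the update to one blog matters on servers that host several blogs in the same database, and a rebuild would still be needed afterwards so the published pages reflect the change.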

I’d like to use this blog post to gauge interest in such a script. Assuming it was easy to use, would you find such a thing useful, or not? If not, why not?

11 thoughts on “A Thorn in Spammers’ Sides”

  1. Sounds interesting. If I was having the same problems as you, I’d consider it too. Of course, I’d probably just do the following:

    % mysql --host #myhost# -u #nick# -p
    USE #mydb#;
    UPDATE mt_entry SET entry_allow_comments = 0 WHERE DATE_SUB(CURDATE(),INTERVAL 2 MONTH) > entry_created_on;

    This will disable comments on all entries older than 2 months. This is from an MT 3.x installation, but I believe the relevant table/keys are the same in MT 2.x as well.

  2. Kevin,

    Yes, that’s what we would do, knowing MySQL and being geeks. But it’s not user-friendly at all. I want to give all the non-geeks something they can drop on their servers and enter information they completely understand, and get results they will understand.

    (Also, the query above assumes you only host one blog on your server. You’d want to add what in MT 2.x is blog_id = #myblog# to that query)

  3. This is a bit unrelated, but I have never had an issue with comment spam on my ‘blog’ in the 4 years I have kept it. Is that because I don’t use Movable Type or is it a problem with all of the blogging services out there?

    Whenever I post an anon. comment on some sites I have to authenticate and prove that I’m a human being (with that funky picture where you have to type in the letters and numbers to match it). Could something like this be done to keep comment spam from even getting to your server?

    Or perhaps just switching to a different blogging software if the one you are using doesn’t have the functionality you need?

    (Forgive my complete lack of technical knowledge….)

    In any case, good luck! I can’t stand junk in my inbox so I have highly developed filters. It would frustrate me to no end if I also dealt with it on my blog.

  4. That does actually sound handy. For some MT sites I host I’ve taken the approach of keeping the request from even getting to MT if some simple conditions aren’t met (mod_perl), but that depends on a lot of server access. It’s unfortunate that people are trying to exploit MT so much. It seems like anything that’s popular gets that way. 🙁 I also haven’t had issues (custom software), but I hear about everyone with MT or WordPress getting hit. Hopefully little stuff like this will catch on but stay diverse enough that spammers won’t be able to pick on it. It’d suck if everyone had to start using CAPTCHAs (although Lara’s comment sounds like some services do use them already?).

  5. I really should’ve known better than to look at your blacklist. Apparently 8+ years of internet use wasn’t enough for my commonsense to remind me of the kind of garbage found on something labeled ‘blacklist’. Seriously, I am supposed to be smarter than that!

  6. D’oh! You know, sometimes I forget about stuff like that, and I just put that stuff up there without realizing it might cause difficulties for people. Sorry about that. I’ve removed the link.

    Suffice it to say, my blacklist is a collection of 3000-some sites, some of which are obviously quite nasty.

  7. For anyone wondering what a CAPTCHA is, you can read about them here. In a nutshell, they’re those little images you see in certain webpages that ask you to type the word you see in the image. When you garble text in just the right way, humans can still easily read it, but computers can’t. So it’s basically a “prove you’re a human” test. Suddenly I’m thinking of Blade Runner.

    I agree Jeremy, it would suck if those were enforced everywhere. What a pain! I’ve seen them on people’s blogs and on signup forms for different services. I guess it’s a nice way to keep a blog spam-free unless there are actual humans out there doing the spamming. But in that case, the volume of spam won’t be very high.

    Lara, I’m guessing that the fact you haven’t been spammed much probably does have to do with the fact that you’re not using popular blogging software. I’m more interested in helping the community to avoid spam than just in de-spamming my blog. Hopefully the script I’m writing (which is almost done! I’m trying to make it non-ugly and make it foolproof) will be a nice kick in the shins to spammers worldwide.

    I’m happy to say, though, that I don’t get much email spam. I’m ultra-careful about giving out my email address, and I have an infinite number of email aliases I use.

  8. The big issue I have with CAPTCHAs is the whole accessibility thing. You basically lock out anyone with a screenreader when you use a visual one. Of course you can have both visual and audio CAPTCHAs and get almost everyone, but it just becomes a pain for everyone.
