19 January 2011

Comment Spam Purged And Comments Analyzed

I've finished reviewing every single comment ever made on this blog in order to purge comment spam, and have removed about two hundred spam comments. The signal-to-noise ratio isn't bad. Only about one in twenty was spam (although this doesn't reflect an almost equal number that I had found and deleted along the way before now). I may have missed a handful of subtle spam comments, but I think I have it reasonably well in hand. Comment spam will not, of course, be tolerated in the future either, and will continue to be routinely deleted.

I also reviewed the spam filter cache that I didn't know that I had, which contained 44 comments at the time, of which about 4 were not spam (and were restored to the blog) and the rest were spam. It is annoying that legitimate comments were being withheld without my knowledge, but I know now and a 10% false positive rate isn't horrible.
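For anyone who wants to check the arithmetic, here is a minimal Python sketch using the counts above (44 held comments, about 4 of them legitimate). The function name is my own invention for illustration, not anything Blogger provides, and the count of 4 is approximate. Strictly speaking, this is the share of held comments that turned out to be legitimate, which is what I am loosely calling the false positive rate.

```python
def share_legitimate(held: int, legitimate: int) -> float:
    """Return the fraction of held comments that were not actually spam.

    Hypothetical helper for illustration; the inputs are the rough
    counts quoted in the post, not exact figures.
    """
    if held <= 0:
        raise ValueError("held must be positive")
    return legitimate / held


held = 44        # comments sitting in the spam filter
legitimate = 4   # approximate number that were not spam
rate = share_legitimate(held, legitimate)
print(f"{rate:.1%}")  # prints 9.1%
```

That works out to a bit over 9%, which rounds to the 10% figure.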

The effort was made possible by the new Blogger comment tab feature, although I wish it were possible to review comments in a way that screened out contributors whom I know to be non-spammers, as about 90% of the comments on this blog have come from me or one of about eight regulars. Likewise, if I could protect these individuals from the spam filter, the false positive rate of the filter's spam detection would go way down. The good aspect of the Blogger comment tab feature is that it makes it much easier to be on the lookout for spam directed at old posts without having to shut down comments at those posts and without having to receive an e-mail every time a comment is made to the blog.

The biggest comment spam offenders are selling prescription drugs (sex and non-sex related), subprime lending (e.g. payday, mortgage and car title loans), criminal lawyers, personal injury lawyers, vacations, real estate, and jobs in third world countries.

I also removed one death threat directed at me by an anonymous poster in connection with a post about the anti-Islamic comic matter, which is notable and only the second death threat that I've ever received in my life. The other was a variant on the Nigerian money transfer scam which I received via e-mail a few years ago (and reported to the appropriate authorities), demanding a payoff or my life in short order.

Once the spam was removed, it turns out that there are roughly six comments on this blog for every ten posts. About half of them are from me, either as updates to old posts or in response to other comments. This leaves about three comments per ten posts from everyone else. Perhaps eighty percent of those (perhaps 750) are from the small group of people who comment regularly, and about ten percent (perhaps 160) are on the dozen or so posts that have attracted a great deal of comment activity (Kent Hovind, Rich Dad-Poor Dad, Against Planners, New Orleans Is Doomed, Shane Co. bankruptcy, How Safe Are Motorcycles, and a few others). The rest are isolated comments to posts from people who don't comment regularly.
