spam || !spam

(Or, (or spam (not spam)) in Scheme… :-)

This blog is powered by WordPress. Among other things, it notifies me when somebody has posted a comment that needs to be moderated. Notifications go to my GMail address.

The other day, I found two notification mails in GMail’s Spam section that looked like this:

A new comment on the post #29 “Python vs Scheme: strings” is waiting for your approval
http://4.flowsnake.org/archives/29

Author : Xvyozvcu (IP: 206.53.55.5 , 206.53.55.5)
E-mail : yqufupiy@gmail.com
URL : http://blahblah.com/blah.html
Whois : http://ws.arin.net/cgi-bin/whois.pl?queryinput=206.53.55.5
Comment:
[...lots of bogus text with spammy URLs elided...]

The question is: from GMail’s point of view, is this message spam or not?

My first reaction was: no, it’s not spam… it’s a valid notification message whose comment text happens to contain spam. But there’s a problem with that: the actual mail *does* contain spam, whether it’s in the context of a WordPress comment or not, and marking it as non-spam might well give the spam filter the wrong idea.

On the other hand, if I do mark it as spam in GMail, then it might conclude that valid WordPress notifications are spam as well! (After all, they share the same header and structure.)
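To make the worry concrete, here is a toy sketch in Python. It is only an illustration under loose assumptions: it pretends the filter looks at plain word tokens (GMail's real features are unknown to me), and the template and comment texts are made up. Even so, more than half of the distinct tokens across the two mails below are shared, purely because of the notification boilerplate:

```python
import re

def tokens(text):
    """Lowercased word tokens, the kind of features a naive bag-of-words filter might use."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Fraction of tokens shared between two token sets (intersection over union)."""
    return len(a & b) / len(a | b)

# Hypothetical boilerplate shared by every WordPress moderation mail (made-up sample).
TEMPLATE = (
    "A new comment on the post is waiting for your approval "
    "Author IP E-mail URL Whois Comment"
)

# Hypothetical comment bodies: one legitimate, one spammy.
legit = TEMPLATE + " Nice article, thanks for the comparison of strings in Python and Scheme."
spammy = TEMPLATE + " cheap pills casino loans click here"

shared = tokens(legit) & tokens(spammy)
print(sorted(shared))
print(round(jaccard(tokens(legit), tokens(spammy)), 2))
```

The shared tokens are exactly the template's, so any filter that weighs them heavily would have trouble telling a "spam" verdict about the comment apart from a verdict about the notification itself.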

Hmm. Can’t win for losing. Eventually I decided to leave them marked as non-spam, and deleted them manually. I’d rather get a few notifications that contain spam than miss valid comments because they were mistakenly thrown in the spam bucket. Akismet should catch this kind of thing anyway (and usually does), so I should not get too many of those messages. Still, it’s an odd problem.

4 Comments

  1. John Cowan said,

    March 22, 2008 @ 10:03 pm

    Disclaimer: I work for Google, but not on GMail, and I don’t know any secrets about it, though if I did I couldn’t tell you.

    From what I understand, clicking Report Spam doesn’t directly affect the filtering of your personal incoming spam. Rather, it just provides another data point for Google’s general spam-catching algorithm. With something over 10 million users, what you do or don’t do about a particular message isn’t going to have huge knock-on effects for any one user, not even you.

    So if it looks spammy, go ahead and report it — it helps Google’s filters improve incrementally. It’s pretty unlikely that a class of valid messages will all come to be treated as spam unless the overwhelming majority of that class contains spammy stuff — which seemingly is not the case.

  2. Piet Delport said,

    March 23, 2008 @ 1:41 am

    Hans Nowak: I’m not sure how it interacts with spam filtering, but you can try setting up a filter to label these notifications.

  3. Hans Nowak said,

    March 23, 2008 @ 9:32 am

    I didn’t know you worked at Google, John. Although it doesn’t come as a complete surprise… =)

    So, in other words, GMail’s spam filter is based on all users’ mail and spam, rather than being personalized. Makes sense.

    I also like pjdelport’s idea of sticking a label to notification messages… I’m going to try that right now.

    Thanks!

  4. John Cowan said,

    March 23, 2008 @ 12:34 pm

    Indeed, the fact that identical or very similar emails are sent to thousands or tens of thousands of Gmail users is a strong suggestion that they are spam — which is not something that any one user’s email filter can pick up, obviously.
