Almost certainly the bulk of the anti-spam system at any large mailbox flogger is a Bayesian classifier. Tell your mailbox what you don't want (mark it as spam) and it will gradually learn.
It does not care whether the mail is in German or in Cyrillic script; it will gradually learn the characteristics of either. It does require effort.
If you train a Bayesian classifier with around 500 ham and 500 spam, you will see very little spam, and if you continue to train it, it gets better and better. I am assuming that is what Google give you; if I were them, that's what I'd do. I'd also add a few block lists and so on, but the gold standard is a trained Bayesian filter. You get to do the training; there is no shortcut. I suspect that if you don't mark a mail as spam then it will be implicitly treated as ham.
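To make the training idea concrete, here is a minimal sketch of a naive Bayes filter in Python. It is not what Google or anyone else actually runs; the class name `NaiveBayesFilter`, the word tokenizer and the add-one smoothing are all my own assumptions for illustration. It only counts tokens, so it has no notion of language at all.

```python
import math
import re
from collections import Counter

# Assumption: plain word tokens are good enough for a demo.
# In Python 3, \w matches letters in any script (Latin, Cyrillic, ...).
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Hypothetical toy filter: token counts per class, add-one smoothing."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.messages = {"ham": 0, "spam": 0}

    def train(self, text, label):
        # label must be "ham" or "spam"
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        tokens = tokenize(text)
        vocab = len(set(self.counts["ham"]) | set(self.counts["spam"]))
        total = sum(self.messages.values())
        score = {}
        for label in ("ham", "spam"):
            # Smoothed class prior, then smoothed per-token likelihoods,
            # all summed in log space to avoid underflow.
            s = math.log((self.messages[label] + 1) / (total + 2))
            n = sum(self.counts[label].values())
            for tok in tokens:
                s += math.log((self.counts[label][tok] + 1) / (n + vocab + 1))
            score[label] = s
        # Turn the log-odds into a probability; clamp to keep exp() in range.
        diff = max(min(score["ham"] - score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


# Usage: every "mark as spam" click is one more call to train().
spam_filter = NaiveBayesFilter()
spam_filter.train("Gewinnen Sie jetzt Geld", "spam")
spam_filter.train("выиграй деньги сейчас", "spam")
spam_filter.train("Besprechung morgen um zehn", "ham")
spam_filter.train("встреча завтра в десять", "ham")
```

With four messages it is only a toy, but the shape is the point: the more ham and spam you feed `train()`, the better the counts, and German and Cyrillic tokens are handled in exactly the same way.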
Getting an efficient spam feedback mechanism working in a mail system is surprisingly hard. Email changes at each hop as it travels from the source MUA via MTAs to the destination MUA: at a minimum, headers are added at each hop. Anyway, that's my problem, not yours!
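As a rough illustration of why the hops matter, here is a sketch in Python that fingerprints a message using only a few headers that I am assuming stay put in transit, plus the decoded body. The `fingerprint` function and the `STABLE_HEADERS` list are hypothetical, not any provider's method, and real MTAs can also re-encode bodies or rewrite headers, which this does not handle.

```python
import hashlib
from email import message_from_string

# Assumption: these headers are set by the sender and left alone in transit.
STABLE_HEADERS = ("Message-ID", "From", "Date", "Subject")


def fingerprint(raw_message):
    """Hash the parts of a message that should survive relaying.

    Trace headers such as Received are ignored, so the copy that arrives
    in the mailbox can be matched to the copy seen at an earlier hop.
    """
    msg = message_from_string(raw_message)
    digest = hashlib.sha256()
    for name in STABLE_HEADERS:
        # Collapse whitespace so header re-folding does not change the hash.
        value = " ".join(str(msg.get(name, "")).split())
        digest.update(name.lower().encode("ascii") + b":")
        digest.update(value.encode("utf-8") + b"\n")
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        body = part.get_payload(decode=True) or b""
        digest.update(b" ".join(body.split()))
    return digest.hexdigest()


original = (
    "Message-ID: <abc@example.org>\n"
    "From: alice@example.org\n"
    "Date: Mon, 1 Jan 2024 10:00:00 +0000\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)
# The same mail after one more hop: an MTA has prepended a trace header.
relayed = (
    "Received: from mx.example.net by mail.example.com; "
    "Mon, 1 Jan 2024 10:00:05 +0000\n" + original
)
```

Hashing the raw bytes of `original` and `relayed` gives two different values, while `fingerprint()` gives the same value for both. That is the sort of trick a feedback loop needs before a "this is spam" click can be tied back to the message the filter originally scored.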
So no, they do not need to run up a language detector, but given that Google have an online translation service, I doubt that would be tricky. That sort of thing may be added to an "enterprise" offering.
Try teaching your mailbox what you want and don't want, and see if the clever buggers at Google have actually mastered the basics. They probably have, but you need to do the work of providing the data that shows what you want.