Via the ESET blog: AMTSO (the Anti-Malware Testing Standards Organization) has published its guidelines for testing anti-malware products. Go and read them if you are so inclined (each of them is only 5 pages long - you have to give them props for brevity - although maybe they just wanted to avoid being too specific so there is less of a chance of being wrong ;-)).
In general I feel that testers (no offense) don't have the necessary technical skills to objectively evaluate the relevance of their tests. Sorry, but someone who has never laid hands on an ASM-level debugger (like Olly) or a disassembler (IDA), who has never participated in a crackme contest (with at least some success), who has never analyzed shellcode, who has never unpacked a piece of malware - just doesn't cut it.
Also, I find some conflicting statements in the two papers. First, they sidestep the question of what constitutes "creating new malware" (this is interesting in the context of the Consumer Reports situation - BTW, my personal opinion on the matter is that CR was justified in creating variants).
Second, they say that "test results should be statistically valid". First of all, the usual expression is "statistically significant". Statistics (as I found out) is not a black and white game. Usually, the cutoff criteria are selected somewhat arbitrarily (using "well accepted" values is common - however, they are more a psychological factor than a mathematical one). Example: what is an acceptable error margin? 5%? 10%? 50%? There is no magic formula that can answer that; it is largely determined by how you feel about risk.
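To make the "how much error can you live with" question concrete, here is a rough sketch of the textbook sample-size formula for estimating a proportion (say, a detection rate). It assumes simple random sampling, the normal approximation, and the conventional 95% confidence level (z = 1.96), which is itself one of those "well accepted" values. The function name `required_sample_size` is mine and is not taken from the AMTSO papers.

```python
import math


def required_sample_size(margin, z=1.96, p=0.5):
    """Samples needed to estimate a proportion within +/- margin.

    Assumptions (not from the AMTSO papers): simple random sampling,
    normal approximation, worst-case proportion p = 0.5,
    and z = 1.96 for a 95% confidence level.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# The three error margins asked about in the text above.
for margin in (0.05, 0.10, 0.50):
    print(f"margin {margin:.0%}: {required_sample_size(margin)} samples")
# margin 5%: 385 samples
# margin 10%: 97 samples
# margin 50%: 4 samples
```

The formula happily gives an answer for any margin you feed it. Which margin to feed it is exactly the part that no formula decides for you.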
Now, this principle is at odds with the dynamic testing guidelines, which acknowledge that (given the complexity of the situation) as few as 50 (!) samples might be tested in a particular test. Given that each month more than 100 000 new (undetected) samples appear (and this is a conservative number), this sample set is utterly insignificant.
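For a feel of what 50 samples buys you, here is a back-of-the-envelope sketch of the margin of error on a measured detection rate. It assumes the 50 samples are a simple random draw from everything in circulation (a generous assumption), uses the normal approximation, and takes z = 1.96 for 95% confidence. The function name `margin_of_error` and the 90% example rate are illustrative only and do not come from the guidelines.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the approximate confidence interval for a proportion.

    Assumptions (illustrative, not from the guidelines): simple random
    sampling, normal approximation, z = 1.96 for 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Worst case (p = 0.5) with the 50 samples mentioned above.
print(f"{margin_of_error(50):.1%}")         # 13.9%

# A product that detected 90% of the 50 samples (hypothetical figure).
print(f"{margin_of_error(50, p=0.9):.1%}")  # 8.3%
```

Under these assumptions, a product scoring 90% on 50 samples could plausibly sit anywhere from the low 80s to the high 90s, which is wider than the gaps that usually separate products in published rankings.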