
Monday, November 09, 2009

What VirusTotal is not

Since its inception, VirusTotal has been used by people to compare different AV products (just in case you don’t know: VirusTotal is a great free service which scans an uploaded file with, currently, 40 AV engines and reports back the results). The AV industry has objected to this practice for a couple of reasons, some more valid than others IMHO.

Today however I want to talk about the practice of saying “(only) X% of AV detect this” and then giving a VirusTotal link. Two recent examples: here and here (to be clear: I don’t have anything against the particular blogs / companies / authors – there are many more examples of this practice, these are just two recent ones which came to my attention).

Why is this percentage meaningless, and why does it serve only to perpetuate FUD?

  • As a first argument, I could mention the whole debate about AV engine configuration (it is raised in nearly every discussion about detection, so I won’t dissect it further). Another thoroughly discussed argument is that VT results represent a “point in time” rather than “now” (i.e. detections might have changed since the scan).
  • The second argument would be: VirusTotal goes for quantity, not necessarily quality. That is, the fact that a given engine is included in the list of engines used by VirusTotal isn’t a statement about the engine’s resource use, detection rate or false positive rate. Again, this doesn’t mean that the engines used are of low quality; it just means that VirusTotal isn’t in the AV engine testing business. It doesn’t say anything about the market share of the product either.
  • This means that the statement “X% of the engines detect a given file on VT” isn’t equivalent to the statement “X% of the users using AV are protected” or “AV software is X% effective”. However, these are the thoughts which appear (by association) in a reader’s mind when seeing the initial statement.
  • Furthermore, some engines appear in multiple products (for example GData integrates BitDefender, amongst others) while other engines appear “split” (for example the McAfee desktop product contains both the “classical” and the “cloud” engine, but on VT they appear as two separate entries, “McAfee” and “McAfee+Artemis” respectively). If these relations are not considered (and I’m almost sure that they aren’t, given that they are not always publicly documented and can change over time), the results come out skewed.
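
To make the last two points concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the vendor and product names, the detections, the market shares and the helper functions are invented, and none of it comes from VirusTotal or from any real test. It only shows how the copy-pasted "X% of engines" figure can drift away from "X% of users are protected" once split entries and market share are taken into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of a multi-engine scan report (all values below are invented)."""
    name: str       # label shown in the report
    product: str    # desktop product the entry ships in (assumed mapping)
    detected: bool  # whether this entry flagged the file


def naive_percentage(entries):
    """Share of report rows that flagged the file: the copy-pasted number."""
    return 100.0 * sum(e.detected for e in entries) / len(entries)


def share_weighted_percentage(entries, market_share):
    """Share of AV users whose product flags the file.

    Rows belonging to the same product are merged first: the product
    counts as detecting the file if any of its rows does.
    """
    product_detects = {}
    for e in entries:
        product_detects[e.product] = product_detects.get(e.product, False) or e.detected
    total = sum(market_share[p] for p in product_detects)
    covered = sum(market_share[p] for p, hit in product_detects.items() if hit)
    return 100.0 * covered / total


# Hypothetical report: vendor names, detections and market shares are made up.
ENTRIES = [
    Entry("VendorA", "ProductA", False),
    Entry("VendorA+Cloud", "ProductA", True),  # "split" engine, same product
    Entry("VendorB", "ProductB", True),
    Entry("VendorC", "ProductC", False),
    Entry("VendorD", "ProductD", False),
    Entry("VendorE", "ProductE", False),
]
MARKET_SHARE = {"ProductA": 40, "ProductB": 30, "ProductC": 10,
                "ProductD": 10, "ProductE": 10}

print(f"rows detecting the file:   {naive_percentage(ENTRIES):.0f}%")
print(f"users with a detecting AV: {share_weighted_percentage(ENTRIES, MARKET_SHARE):.0f}%")
```

With these invented inputs, two of the six rows detect the file (about 33%), while the products that detect it cover 70% of the assumed user base. The sketch does not model the opposite case, one engine licensed into several products, and real market-share figures and engine relations are exactly the data that is hard to come by, which is why quoting any single percentage is misleading.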

Conclusion: please never, ever take the VT result page and copy-paste the percentage from it! Do provide permalinks to the result pages, and you can even make some sensible general statements (like “most of the major AV vendors detect this threat” or “this threat is not well detected by the smaller, Asian AV companies, but given its reliance on the English language for social engineering, it might not be such a big threat”). However, giving a percentage reeks of FUD and smells of negative propaganda (do we really want to be at each other’s throats, analyzing which vendor doesn’t detect what? There would be no winners in such a discussion). Let’s concentrate on giving sensible security advice to users instead.

Picture taken from Peter Kaminski's photostream with permission.


  1. Hi,

    I've read your article and can understand your standpoint but, to be honest, what I do with Virus Total is not negative propaganda but pure information. Your arguments are debatable but I respect them.

    When I publish a virus report on the MX Lab Blog I include the Virus Total information as additional information but also as a warning that certain viruses, trojans and variants of those aren't detected by the majority of AV engines.

    Of course, this is only at a certain 'point in time'. Perhaps I should do a blog article where I submit a virus at regular intervals to see how the AV engines detect the virus over time.

    Virus Total allows us to analyse a threat, up to a certain level (quite a limited one, I have to admit), without going to the hassle of installing and maintaining 40+ computers or virtual machines with all the available AV engines.

    But we have to face the fact that AV engines, with signature based techniques, aren't adequate for the job. And I am not the only person who is thinking this and loudly saying it. You should read the following article:

  2. @Peter: information is very rarely pure, especially if it comes from persons who have considerable involvement with the given issue.

    While I agree with you that AV is not perfect, I disagree with the method of "talking down" a given technology, especially when your own technology (ie. the "Zero-hour detection") is very similar to the one you criticize. IMHO, it also makes no business sense, because it is much easier to sell using the slogan "we are a better AV" than the slogan "we have magic pixie dust which is better than AV!" - but what do I know, I'm just a techno geek, right?

    Disclaimer: I have no detailed knowledge of the inner workings of the "Zero-hour" technology, but from what I've seen, it is very similar to the existing AV technologies (ie. centralized updates, pushed frequently and the clients use pattern matching based on the db). If I understand correctly the distinct features would be the automatic generation of patterns and the collection of samples from clients - however both of these are present in current mainstream AV products (some widely publicized - like McAfee Artemis - others not).

    PS. Hopefully you take this as it was intended - as constructive criticism. I still am (and will be) a subscriber to your blog.