Note that although devolution identifiers are part of some sub-variant identifiers, they are treated in their own sub-section below because of the size of both sections.

Minor variants are members of the same malware family with the same infective length (if relevant to the malware type) and with similar structure and behaviour to each other, yet still different in their invariant parts. That is, if fully specified names have been built for two (or more) pieces of malware, incorporating all components of the name structure from left to right, and to this point the names do not diverge, sub-variant modifiers are used to set the samples’ names apart. For binary parasitic infectors, minor variants are usually different patches of one and the same virus, but for most macro and script viruses, monolithic replicators and all non-infective malware, the sub-variant designator is the main level of naming differentiation between variants.

Sub-variant identifiers are constructed from consecutive letters of the English alphabet (A, B, C, … Z, AA, AB, etc.). Earlier advice that longer and less formally derived sub-variant identifiers are acceptable is now rejected. The need for a formal scheme largely became obvious with the rise of virus types, particularly macro viruses, for which ‘infective length’ was generally meaningless or otherwise problematic. Where infective length is not used as the main variant identifier, sub-variant identifiers take that role. Note also that variants of pure boot infectors (i.e. ones that are not file/boot multipartite) were seldom, if ever, given infective length identifiers but were instead separated, naming-wise, by sub-variant identifiers.
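The letter sequence described above (A … Z, then AA, AB, and so on) can be sketched as a conversion from a 1-based position to an identifier. This is only an illustration: the function name `subvariant_id` is hypothetical and is not part of the naming scheme itself.

```python
def subvariant_id(position: int) -> str:
    """Return the sub-variant identifier for a 1-based position.

    Position 1 is "A", 26 is "Z", 27 is "AA", 28 is "AB", and so on.
    The function name and the use of integer positions are assumptions
    made for this sketch only.
    """
    if position < 1:
        raise ValueError("position must be 1 or greater")
    letters = []
    while position > 0:
        # Subtract 1 first because the sequence has no "zero" letter.
        position, remainder = divmod(position - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


if __name__ == "__main__":
    print([subvariant_id(n) for n in (1, 2, 26, 27, 28)])
```

Under these assumptions, the 27th sub-variant of a family would be named with the suffix AA rather than with a longer, informally derived identifier.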

Further, the increasing size of malware, and particularly Win32 malware, has also contributed to this changing point of view: with the infective length of currently common mass mailers being five and six digits long, the infective length specifier is increasingly less useful for identifying the malware to users. Although ‘user friendliness’ is not a primary function of a technical naming standard, having the technical and pragmatic standards closely track each other is a desirable state of affairs. If common practice were to separate monolithic replicators, Trojans and the like name-wise with sub-variant designators and not to use infective lengths, then ‘popular’ names, as reported to users, in the media, etc., such as Win32/FooBar.A and Win32/FooBar.B, would quickly bear no obvious or readily maintainable mapping to FSMNs that did use infective lengths, say trojan://Win32/FooBar.12345.A (which might actually be Win32/FooBar.B) and trojan://Win32/FooBar.12346.A. Thus sub-variant identifiers must follow a standard and expected pattern that, in a perfect world, would reflect the order of discovery of the sub-variants. Occasional classification errors and other more ‘human’ mistakes mean that, in reality, no family with more than a small number of sub-variants will have sub-variant names that exactly match discovery order. A scheme designed to track discovery order as closely as possible has a certain parsimony to it, but so does a scheme that results in few ‘gaps’ in its sub-variant ascriptions. Between these two, the CARO preference is to fill gaps rather than to undertake possibly large renamings in order to maintain a neat mapping between discovery order and sub-variant name order.
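The gap-filling preference can be illustrated with a minimal sketch that, given the identifiers already assigned within one family, returns the first unused identifier in sequence order. The function names and the idea of holding assigned identifiers in a set are assumptions for illustration; the scheme does not prescribe any particular tooling.

```python
from itertools import count, product
from string import ascii_uppercase


def iter_subvariant_ids():
    """Yield A, B, ... Z, AA, AB, ... in sequence order (hypothetical helper)."""
    for length in count(1):
        for letters in product(ascii_uppercase, repeat=length):
            yield "".join(letters)


def first_free_subvariant(assigned):
    """Return the earliest identifier in the sequence not yet assigned.

    `assigned` is any iterable of identifiers already used in the family.
    A gap left by an earlier reclassification is filled before the
    sequence is extended.
    """
    taken = {name.upper() for name in assigned}
    for candidate in iter_subvariant_ids():
        if candidate not in taken:
            return candidate


if __name__ == "__main__":
    # "C" was vacated earlier, so it is reused before "E" is allocated.
    print(first_free_subvariant({"A", "B", "D"}))
```

The trade-off is the one the text describes: filling the gap keeps the set of names compact, at the cost of a name order that no longer matches discovery order exactly.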

An FSMN will always include a sub-variant identifier. Historically, virus://Win32/Foo.1234 and trojan://VBS/Bar were considered sufficient FSMNs: a sub-variant ascription of ‘.A’ was assumed if no sub-variant identifier was specified. This practice is now rejected and sub-variants must be specified in FSMNs. Note that this does not address whether a scanner should include a sub-variant identifier in the names it reports. In fact, misleading sub-variant ‘identification’ is one of the most common causes of confusion among AV users. Traversing an FSMN from left to right to extract a suitable name to report to users, a scanner should name detected malware only as far as the right-most level at which the product reliably identifies the malware in accordance with its ‘official’ FSMN.

Thus, products that detect, say, all 19 known variants of the W97M/FooBar family with just four ‘generic’ scan strings are wrong to report detecting W97M/FooBar.A, .B, .C or .D when detecting some W97M/FooBar variant. The same product is equally wrong to report W97M/FooBar.A, .E, .L or .Q, even if the isolation of those variants was what prompted the addition of each of the new scan strings to the product. The point is that reporting names to the sub-variant level implies a much higher level of identification precision than the product possesses. Sticking with the fictional W97M/FooBar example, this scanner should only report FooBar detections as W97M/FooBar, as the family name level is the right-most level of the naming structure that is reliably distinguished for this family.
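The reporting rule can be sketched as truncating a name to the right-most level a product reliably distinguishes. The sketch below assumes the name has already been split into a platform, a family and an ordered list of trailing modifiers (such as infective length and sub-variant); that representation, the function name `report_name` and the `reliable_levels` parameter are all hypothetical and are used only to illustrate the rule.

```python
def report_name(platform, family, modifiers, reliable_levels):
    """Build the name a scanner should report to users.

    platform, family : e.g. "W97M", "FooBar"
    modifiers        : trailing components in left-to-right order,
                       e.g. ["A"] or ["12345", "A"]
    reliable_levels  : how many of those modifiers the product can
                       reliably distinguish (0 means family level only)
    """
    if not 0 <= reliable_levels <= len(modifiers):
        raise ValueError("reliable_levels is out of range for this name")
    # Keep only the modifiers the product can actually tell apart.
    kept = list(modifiers[:reliable_levels])
    return ".".join([f"{platform}/{family}"] + kept)


if __name__ == "__main__":
    # Generic detection of the family: report no sub-variant at all.
    print(report_name("W97M", "FooBar", ["A"], reliable_levels=0))
    # Exact identification: the sub-variant may be reported.
    print(report_name("W97M", "FooBar", ["A"], reliable_levels=1))
```

In this sketch, the product with four generic scan strings from the example would always call `report_name` with `reliable_levels=0` for this family, whichever scan string matched.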
