Thus far we have mostly considered relatively straightforward examples, but what if a program is part of a complex ‘cycle’ of programs that, overall, is malware? Or what if a virus works under two or more distinct platforms? Or what if developers whose products identify variants quite precisely, and who wish to report as precisely as possible and in line with the naming standard, face borderline cases where their products cannot exactly separate two trivially different variants of a malware family?

Is there a way for the standard to accommodate such cases?

There is.

A simple form of set notation is used to aggregate identifiers of the type appropriate to the name component being specified. The general notation is:

{<identifier>,<identifier>[,…]}

That is, at least two identifiers of the appropriate type, separated with commas and no spaces, all enclosed in a matching pair of curly braces.
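The rule just stated can be sketched in a few lines of Python. This is only an illustration, not part of the standard: the function names `format_set` and `parse_set` are hypothetical, and the assumption that an identifier consists solely of letters and digits is ours, since the identifier character set is not spelled out here.

```python
import re

# Assumption: identifiers are runs of ASCII letters and digits only.
_IDENT = re.compile(r"[A-Za-z0-9]+")


def _check(items):
    """Enforce the two rules from the text: two or more valid identifiers."""
    if len(items) < 2:
        raise ValueError("set notation needs at least two identifiers")
    for item in items:
        if not _IDENT.fullmatch(item):
            raise ValueError(f"invalid identifier: {item!r}")


def format_set(identifiers):
    """Join identifiers as {a,b,...}: commas, no spaces, curly braces."""
    items = list(identifiers)
    _check(items)
    return "{" + ",".join(items) + "}"


def parse_set(text):
    """Split a {a,b,...} string back into its identifiers."""
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("missing enclosing curly braces")
    items = text[1:-1].split(",")
    _check(items)  # a stray space makes an identifier invalid
    return items
```

Under these assumptions, `format_set(["DOS", "BAT"])` yields `{DOS,BAT}`, while `parse_set` rejects any set that contains a space after a comma.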

Let’s look at an example: the virus many products detect as O97M/Tristate.A is most correctly (and fully) named virus://{W97M,X97M,PP97M}/Tristate.A. For the purposes of reporting such a detection to an end-user of a virus scanner this is, of course, very ugly and not very practical. Thus the ‘catch-all’ infection platform name O97M (or Office97Macro) was created for optional use with cross-application infectors that target two or more Microsoft Office and/or related VBA platforms (Project, Visio and potentially others). The existence of such a non-specific platform name raises issues surrounding the correct identification of samples. Should a Word document infected with O97M/Tristate.A be reported as infected with O97M/Tristate.A or W97M/Tristate.A? Either is correct, and both are preferable (from the perspective of reporting the infection to an end-user) to the technically accurate virus://{W97M,X97M,PP97M}/Tristate.A.

Aside from the O97M pseudo-platform, which has become well-established, another pseudo-platform name, reserved for a few very special cases, has recently been accepted into the naming convention. Some platforms have very ‘loose’ file formats and often ‘easy’ error handling (tending to continue execution rather than aborting, except for the most extreme errors). Thus, occasionally a single file can be treated as functional code on more than one platform. For example, a DOS COM-style program may also be a functional BAT file and/or a valid program for some other text-based scripting platform. The full, formal name of a virus that exploited such situations would be of the form virus://{DOS,BAT}/Foo.A; however, reporting such a name to a user is not desirable. Thus it is agreed that wherever multiple platforms must be bracketed for a full formal name, the pseudo-platform name ‘Multi’, or ‘Mul’ in its short form, may be used instead. Mul/Multi may be used for the O97M case just discussed as well, but as O97M is well-established, its continued use is acceptable. Note that O97M and Mul/Multi are purely reporting issues: the FSMN of such malware is correctly handled by the simple set notation described above.
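As a rough sketch of how a scanner might choose the platform name it reports, consider the following Python fragment. The function `report_platform`, its `prefer_o97m` parameter and the membership set `OFFICE97_MACRO` are hypothetical; the set lists only the three identifiers named in the text, because the identifiers for Project and Visio are not given here.

```python
# Hypothetical membership list: only the platforms named in the text.
OFFICE97_MACRO = {"W97M", "X97M", "PP97M"}


def report_platform(platforms, prefer_o97m=True):
    """Pick a user-facing platform name for a (possibly multi-platform) name.

    The formal name keeps the full {a,b,...} set; this affects reporting only.
    """
    items = list(dict.fromkeys(platforms))  # drop duplicates, keep order
    if not items:
        raise ValueError("at least one platform is required")
    if len(items) == 1:
        return items[0]  # a single platform needs no pseudo-platform
    if prefer_o97m and set(items) <= OFFICE97_MACRO:
        return "O97M"  # well-established name for Office 97 macro sets
    return "Mul"  # short form of 'Multi'
```

With these assumptions, the Tristate platform set is reported as `O97M`, and a `{DOS,BAT}` set is reported as `Mul`. Passing `prefer_o97m=False` reports `Mul` in both cases, which the text also permits.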

Specifying multiple values in a single name component slot can be useful in other situations too. For example, a script virus may be hard-coded to use directory names that are specific to the Italian and Spanish versions of Windows, and may depend on the existence of those directories in such a way that, if neither exists, the code fails to be classified as malware at all. In such a case, an FSMN along the lines of virus://VBS/FooBar.A:{It,Sp} would be appropriate.

However, note that, at least insofar as an FSMN is concerned, it is rare for name components other than the platform name and locale specifier to hold multiple values, as the purpose of most of the name components is to separate and identify variants. For example, VBS/FooBar.{A,B} is nonsensical as an FSMN. Despite that, a scanner can meaningfully report such a detection. Staying with this hypothetical example, two obvious possibilities arise. First, related variants of a piece of malware composed of multiple files may share one or more of those files, with the variant-specific code being in one or more of the other files. Thus, the identification of a specific file may correctly indicate that this file is known to be part of two (or more) variants of the same malware. A second possibility is that two variants may differ so slightly that a detection engine cannot differentiate them. Despite this, the developer may wish to provide as detailed identification information as possible (see the discussion about detection and reporting precision at the end of the .<sub-variant>[<devolution>] section). If the developer knows that other known variants will not be ‘mis-identified’ by this same detection item, and that unknown, possible future variants are very unlikely to be mis-identified, grouping the variants that this detection engine cannot distinguish is an acceptable solution. Note the important distinction here between detecting or reporting a piece of malware and uniquely identifying, then naming, a malware variant. When doing the former, some liberties in bracketing multiple values together in a name component may be acceptable, whereas such bracketing is clearly ‘impossible’ when uniquely identifying a malware variant.
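The distinction between a reported name and a uniquely identifying name can be illustrated with a small Python check. This is a sketch under stated assumptions: it handles only the simplified shape `[<type>://]<platform>/<family>.<variant>[:<locale>]` seen in the examples above, not the full naming grammar, and the helper names `set_slots` and `is_valid_formal` are hypothetical.

```python
import re

# Assumption: a simplified name shape covering only the examples in the text.
_NAME = re.compile(
    r"(?:(?P<type>[^:/]+)://)?"   # optional malware type, e.g. virus://
    r"(?P<platform>[^/:]+)/"      # platform, possibly a {a,b} set
    r"(?P<family>[^.]+)\."        # family name
    r"(?P<variant>[^:]+)"         # variant, possibly a {A,B} set
    r"(?::(?P<locale>.+))?"       # optional locale, possibly a set
)
_SLOTS = ("type", "platform", "family", "variant", "locale")


def set_slots(name):
    """Return the name components that hold a {..} set, in slot order."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised name shape: {name!r}")
    found = []
    for slot in _SLOTS:
        value = match.group(slot)
        if value and value.startswith("{") and value.endswith("}"):
            found.append(slot)
    return found


def is_valid_formal(name):
    """A formal name must identify one variant, so no set in that slot."""
    return "variant" not in set_slots(name)
```

Here `VBS/FooBar.{A,B}` fails `is_valid_formal` yet remains a string that a scanner could display as a report, whereas `virus://VBS/FooBar.A:{It,Sp}` passes because the set sits in the locale slot.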
