Truth about computer security hysteria
The debut of realtime virus data

Rob Rosenberger, Vmyths co-founder
Friday, 23 February 2001
KOURNIKOVA MARKS THE first media event with realtime virus data.
I waited a long time for this to happen. It took the antivirus industry a whopping eleven years to reach this point. And we still have a long way to go — most corporate virus fighters can't track their own firms "OnTheFly" (pun intended) even if their jobs depended on it. Sadly, fearmongers once again showed us just how badly we need to assess and interpret realtime data. Raw numbers mean nothing if you can't evaluate their impact. Hmmm, let me caveat my previous sentence. The media as a whole doesn't really care if you understand a number, so long as you have a number they can cite. Fearmongers like to throw figures around for this very reason. This leads us to
Media Axiom #1: all other things being equal, a guy with a number gets more airtime.
Take Kenny Liao (Trend Micro), for example. It doesn't even look like he gave the Associated Press a hard number. He merely pulled an estimate out of his butt: "if we look at the total number of users that have been reported to us and consider we are only contacting a small portion of the Australian population — the estimation would be more than 100,000 [infected Australian computer users]." The Xinhua news agency soon reported a complete meltdown of email service in Australia thanks to Liao's brown-tinged guesstimate. This leads us to
Corollary #1 to Media Axiom #1: all other things being equal, a guy with a big number gets more airtime.
The media suckled for 15 years on wild conjectures in the antivirus world and I see no real end to their guess-feeding. Reporters will forever mistake estimates for facts — but at least now we'll see some actual empirical data mixed in with all the fearmongering.
THEN AGAIN, ADDING real data to the freakshow may result in more fearmongering, not less. The more raw figures you have, the more easily you can mislead reporters. I can imagine someone making the following conjecture while talking to the press:
...so as you can see, our raw data obviously shows a virus catastrophe on the horizon. It's going to be big, too. Based on a pseudo-logarithmic increase in detections from 3:14am to 5:13pm, and based on a quasi-logarithmic increase from 5:14pm to 7:32pm, we at Fearmongers Inc. believe this horrifying virus will infect 148,932,000 PCs, with a margin of error of +/- 22,500, before antivirus vendors ultimately contain the threat. Of course, this is an estimate; your actual mileage may vary. We at Fearmongers Inc. pray our calculations prove wrong, so we beg every reporter on the planet to help us spread the word. The media will make a difference if they save even one PC user...
We've certainly seen this type of BS before, although no one bothered to crutch it with raw data before now. Critics will find it even more difficult to combat fearmongers who mix facts with estimates. Reporters desperately want to cite numbers, so they'll ask a trick question: "if their figures are bad, then what's a good estimate?" The question defies an answer because it assumes someone knows a good estimate right when a virus takes off. It'll take years before antivirus experts can quickly do a careful realtime data analysis. Therefore, critics will continue to get shoved aside in favor of fearmongers.
Like I said: all other things being equal, a guy with a number gets more airtime. "I gotta put something in my story," even if it's bad. Which brings us back to the need for valid assessment and interpretation of realtime virus data. A good estimate will only come from careful (repeat: careful) analysis. Consider the following:
  • What does it truly mean if a small email outsource firm trapped m-thousand file attachments during x time period?
  • What does it truly mean if a large email outsource firm trapped n-thousand file attachments during y time period?
  • What does it truly mean if we add m+n file attachments? Does it matter if the x and y time periods match up?
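One way to make the three questions above concrete is to attach the time window to every count before comparing or adding anything. The Python sketch below is only an illustration: the TrapReport class, the combined_total helper, and every figure in it are hypothetical, not data from any email outsource firm.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TrapReport:
    """One firm's count of trapped attachments over a stated window (hypothetical)."""
    firm: str
    trapped: int
    start: datetime
    end: datetime

    @property
    def hours(self) -> float:
        # Length of the reporting window in hours.
        return (self.end - self.start).total_seconds() / 3600

    @property
    def per_hour(self) -> float:
        # Normalizing by time lets two different windows be compared at all.
        return self.trapped / self.hours


def combined_total(a: TrapReport, b: TrapReport) -> int:
    """Add raw counts only when both reports cover the same window."""
    if (a.start, a.end) != (b.start, b.end):
        raise ValueError("time windows differ; m+n is not a meaningful total")
    return a.trapped + b.trapped


# Made-up figures: a small firm reporting 4 hours, a large firm reporting 24.
small = TrapReport("small firm", 2_000,
                   datetime(2001, 2, 12, 12), datetime(2001, 2, 12, 16))
large = TrapReport("large firm", 9_000,
                   datetime(2001, 2, 12, 0), datetime(2001, 2, 13, 0))

print(small.per_hour)  # 500.0
print(large.per_hour)  # 375.0
```

With these made-up figures the small firm's shorter window yields the higher hourly rate, and calling combined_total(small, large) raises ValueError because the x and y periods do not match.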
Here, I'll ask another simple question. Put McAfee's or Trend's virus tracking maps into motion and show me how the-- oops, my apologies: you can't yet put their maps into motion like a weather report. (Study this map instead.) Let me ask a different question, then. What does it truly mean if you can't see time as a factor of virus proliferation?
WHAT DOES IT truly mean? Answer: it means you can still promote virus hysteria — even if your own (broken) virus tracking maps contradict everything you say. In an earlier column, I pointed out how MessageLabs' initial comparison charts didn't support their claims about the spread of Kournikova. Spokesmodel Alex Shipp wrote to say he agreed with my view:
What we meant to convey by this was that the time between first release and achieving epidemic spread rates was half as much for Kournikova as for LoveBug. However, looking at the page again, I take your point that it is not at all obvious what we mean, and in fact we haven't even bothered to tell anyone what the coloured bars mean on the graph, so how anyone is meant to make any sense of it all beats me.
[Image: MessageLabs' radioactive tennis ball]
Read Shipp's quote again: "it is not at all obvious what we mean." Reporters still lavished ink on MessageLabs, simply because they provided realtime virus data. Shipp promised to update the pages "to be more useful," and indeed he did. Click on the radioactive tennis ball to see the improvements. (A radioactive tennis ball? The antivirus world spends too much time creating graphics if you ask me.) One of the best improvements came when they added this caveat: "MessageLabs [acquired] more customers since the outbreak of LoveBug [last year], so direct comparisons by number may not be too meaningful." Their updated report offers a much better comparison, shown in numbers as a ratio of malicious emails. You don't see raw data with those statistics (tsk tsk), but the web page includes enough data to deduce MessageLabs' daily email flow during ILoveYou and Kournikova. I'll leave it for you to calculate as an exercise in math.
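The exercise comes down to one multiplication, assuming the page states the number of malicious emails stopped in a day and gives the ratio in "one in N emails" form. Both the form of the published figures and every number below are assumptions for illustration; they are placeholders, not MessageLabs' statistics.

```python
def estimated_daily_flow(malicious_per_day: int, one_in: float) -> float:
    """Estimate total daily email flow from a malicious count and a 1-in-N ratio.

    If one in `one_in` emails was malicious, then total flow is roughly
    malicious_per_day * one_in.
    """
    if one_in <= 0:
        raise ValueError("one_in must be positive")
    return malicious_per_day * one_in


# Placeholder inputs only (hypothetical, chosen to be round numbers).
lovebug_flow = estimated_daily_flow(10_000, 30)      # 1 in 30 emails malicious
kournikova_flow = estimated_daily_flow(5_000, 100)   # 1 in 100 emails malicious

print(lovebug_flow)      # 300000
print(kournikova_flow)   # 500000
```

Dividing one result by the other gives the growth in traffic between the two outbreaks, which is exactly the context a bar chart of raw detections leaves out.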
I WON'T BLAME MessageLabs for misleading the media with their initial comparison charts. They really do want to please reporters & critics alike. No, the real blame must fall on the press. They collectively can't (or won't) study a chart for validity. Reporters saw some numbers on a bar graph and said "duh, it's intuitively obvious." This leads us to
Corollary #2 to Media Axiom #1: all other things being equal, a guy with a slick chart gets more airtime.
I critiqued MessageLabs pretty heavily, but I want to give them genuine kudos for supplying raw data and for using time as the key factor in a virus proliferation map. As I said, I waited a long time for this to happen. MessageLabs set a new standard by providing realtime virus data during the Kournikova coverage. Honorable mentions go to Mail.com, Brightmail, McAfee, and Trend Micro. Once you've got raw data, you can begin to create metrics. Come on, say it out loud with me: "you can't develop metrics without raw data." I'd love to see a line chart showing each day's ratio of malicious emails, for example. Forget the immediacy of a plummeting value — I want to see how the line rises or falls over the long term.
Whew! I love to fantasize about virus metrics. My brain gets hard just thinking about it. Some of the biggest companies on Earth got their first taste of global realtime data. They looked at those tracking maps and wondered why they don't have a corporate version for their own use. I can already hear security managers on the phone with their antivirus vendors. "What do you mean, you can't make a map for us? We spend a million-plus per year on you guys! Or should I say, we used to buy your stuff..." Reporters got a taste of global realtime data, too, and they'll want more. Nay, they'll expect more. This leads us to
Corollary #3 to Media Axiom #1: all other things being equal, a guy with more numbers gets more airtime.
Oh, sure, the antivirus industry will stumble while they try to figure out how to present the raw data they collect. Security managers will scratch their heads while they try to analyze the numbers. CIOs will stare blankly at charts they've never seen before. But I can assure you, our world just got better. All because some firms published a bit of raw data during a virus media event.
NOT EVERYONE MAY feel as ecstatic as I do about the debut of realtime virus data. Shipp seemed just a little too eager to email me about his firm's data collection theories. I'll go out on a limb here — didn't MessageLabs find anyone else willing to listen to them? Surely Virus Bulletin would beg Shipp to write an article about the vectoring data he assembled. It makes little sense to hand me so much great material unless other antivirus media outlets turned it down first. If they did turn it down, then we need to ask why. Shipp knows more than he wants to say under direct questioning. Fine: his evasiveness gives me an opportunity to speculate. I believe the antivirus industry holds a dirty little secret. Namely, they don't have access to as much "proprietary client data" as they've led reporters to believe over the years. Indeed, this industry has thrived for its whole life on inaccuracy. I first started ranting about it, what, a dozen years ago? (Time flies when you're having fun.)
First, antivirus vendors collect much more hearsay than data during a virus crisis. We know this because vendors will admit it under direct questioning. It often comes from low-level worker bees who "confirm" their company will file for bankruptcy after the attack subsides. "Whatever our market capitalization was yesterday, that's how much damage this virus caused!" Other hearsay comes straight from a news story, and antivirus vendors simply parrot it to the next reporter who calls.
Second, I think what little true data they collect during a virus crisis tends to overlap. Multiple vendors repeat one anonymous company's woes, giving the impression of multiple anonymous companies. I can't confirm it, though, because vendors won't reveal their proprietary client data. Doing so would violate one of Marketing's ten commandments: "Thou Shalt Not Smite Customers By Name."
Third, the antivirus grapevine distorts anecdotes during a virus crisis. Again, we know this because vendors will admit it under direct questioning. If Microsoft shut down, say, four email servers during a virus attack, the grapevine will quickly report Microsoft shut down their entire email capability worldwide. Believe it or not, virus experts embrace some of the urban legends grown on their grapevine. And they'll tell you those whoppers with a perfectly straight face.
I usually hit a roadblock when I confront reporters with the industry's dirty little secret. David Perry (Trend Micro), for example, will correctly (I repeat, correctly) explain "we can't divulge our proprietary client data." To which I tell a reporter: "how can you gauge the accuracy of his data if you don't even know what it is in the first place?" Reporters then erect the roadblock: "how do you know it's not accurate if you've never seen it, either?" To which I respond, "I've got 15 years of hysteria on my side." The reporter dismisses me at this point: "he may actually be right for once, and besides, he's got a number and you don't." Media Axiom #1 applies. (I use Trend in this example as a compliment, not an insult. Their PR team has the guts to back me up on this point, "with clarifications" of course. Go on, call Susan Orbuch or David Perry. They'll vouch for me.)
I think realtime virus data will force the antivirus industry to come to grips with its dirty little secret. It will put a damper on the overlapping hearsay. It will repel anecdotes and expose distortions. It will deflate vendors who pump themselves up for PR reasons. So I think some people in the antivirus world will publicly embrace, yet privately sneer at, realtime virus data for these reasons.
Memo to U.S. Navy's top admiral: ask your CIO to produce imagery showing Kournikova's impact on the fleet every ¼hr from 0000Z 12 Feb 01 to 2359Z 15 Feb 01, with accuracy down to the ship. Smile when the CIO responds "not possible."
And yes, it will create more of a software workload when it comes to analysis & design & programming & deployment & support. Just another reason why they won't like it. Did other antivirus media outlets pass on Shipp's story before he turned to me? Too bad for them. Those who cover the glasnost in its early stages will cover it best...
LET'S TALK NOW about some data we gather at Vmyths.com. What does it truly mean when we track the weekly 'Top 3' virus hoaxes? Take a quick glance at the data. You'll notice every 'winner' received a few dozen reports at most. We don't yet break a hundred with the weekly top three combined. Say it out loud: "your figures look boring, Rob." And rightly so. You can't deduce worldwide impacts from a few dozen emails per week.
Ah! Now let's suppose Vmyths.com never divulged the raw numbers. Suppose we offered percentages instead. We might further cloud things by saying we provide the world's only open-source data on virus hoax proliferation. You'd take our weekly 'Top 3' list a lot more seriously then, wouldn't you? In fact, you might even take it as seriously as Sophos' monthly 'Top 10' virus list. If I recall correctly, Sophos used to include raw numbers in their reports. The low values didn't give the impression of a large firm, so they switched to percentages.
The raw data in our 'Top 3' list ironically explains why you shouldn't take it too seriously right now. We think it will grow more precise over time as the word gets out, but it qualifies as a novelty for the moment. Treat it as such. And why shouldn't we admit it? After all, "truth" is the first word in our website's slogan.
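The point about percentages is easy to check with a few lines of Python. This is a minimal sketch under stated assumptions: the hoax names and report counts are invented, and the shares are computed against the top three combined, which may not match how any real list calculates its figures.

```python
def percent_shares(counts: dict) -> dict:
    """Convert raw report counts into percentage shares, rounded to 0.1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reports to summarize")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Hypothetical weekly tally: a few dozen reports per hoax, under 100 combined.
weekly_top3 = {"Hoax A": 31, "Hoax B": 24, "Hoax C": 17}

# The same tally scaled up a thousandfold (also hypothetical).
big_sample = {name: n * 1000 for name, n in weekly_top3.items()}

print(percent_shares(weekly_top3))  # {'Hoax A': 43.1, 'Hoax B': 33.3, 'Hoax C': 23.6}
print(percent_shares(big_sample))   # identical percentages
print(sum(weekly_top3.values()))    # 72
```

Both tallies print the same percentages, so a reader handed only the shares cannot tell 72 reports from 72,000. Publishing the raw count alongside the percentage removes that ambiguity.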