On the Evolution of Malware Species
Vasileios Vlachos∗, Christos Ilioudis† and Alexandros Papanikolaou∗
Abstract
Computer viruses have evolved from amusing artifacts, crafted mostly to annoy inexperienced users, into sophisticated tools for industrial espionage, unsolicited bulk email (UBE), piracy and other illicit acts. Despite the steadily increasing number of new malware species, we observe the formation of monophyletic clusters. In this paper, using publicly available data, we demonstrate the departure from the democratic virus writing model, in which even moderately skilled programmers managed to create successful virus strains, to an entirely aristocratic ecosystem of highly evolved malcode.
Keywords: malware, computer virus, phylogeny, cybercrime, malware writers
1
Introduction
Malicious software is one of the most persistent threats to computer users. Early types of malcode debuted on mainframes [1, 2], but their substantial rise can be attributed to the proliferation of home and personal computers [3]. Computer virology was theoretically and experimentally established by Fred Cohen and his supervisor Leonard Adleman [4–6]. Since then, computer viruses and other parasitic applications have become a common annoyance for most computer users. As a result, a multibillion worldwide market for security applications has emerged and soared. Europe spent more than 4.6 billion EUR on security applications and services in 2008 [7]. According to antivirus vendors, more than 4500 new malware species appear daily [8]. The effective handling of such a large number of threats requires substantial efforts and resources, human as well as computational, in order to provide timely remedies and protective measures. As a consequence, the absolute number of malware species constantly increases and at the time of writing exceeds 2.6 million threats [8]. The overwhelming majority of the malware is either proof-of-concept code or flawed malicious programming attempts. Only a small number of viruses and worms manage to propagate in the wild (in other words, to reach and affect normal users), and merely a handful of them have the potential to become epidemics or pandemics. Therefore it is necessary to prioritize the imminent malware threats and devote the appropriate resources accordingly. In this paper we analyze a large data set of the computer viruses and other forms of malcode that have been seen in the wild, and we evaluate the current landscape so as to identify the hot spots that should trigger immediate attention. We believe that, through the understanding of malcode evolution, a prioritization of current threats is both viable and beneficial. By extending the well-established Darwinian theory, we find that the small percentage of computer viruses that are capable of mutating and adapting to their environment is responsible for the majority of the security incidents. The rest of this paper is organized as follows: Section 2 summarizes the related work, Section 3 presents and discusses our findings, and Section 4 concludes this paper along with possible future directions.
∗ {vsvlachos, alpapanik}@teilar.gr, Department of Computer Science and Telecommunications, Technological Educational Institute of Larissa, Larissa, GR 411 10, Greece.
† [email protected], Department of Information Technology, Alexander Technological Educational Institute of Thessaloniki, P.O. BOX 141 GR, 57400 Thessaloniki, Greece.
2
Related Work
A number of analogies between biological and computer viruses have been revealed in the past [4, 9] and more recently [10, 11]. An important outcome of this approach is the realization that monocultures are particularly harmful for the security of the software ecosystem [12–15]. Most of the work, however, tackled the evolution of the security mechanisms from the defender's perspective [9, 16–18]. A more aggressive strategy would focus on reconnaissance of the weak points of the malware development process through biological analogies. Phylogenetics is the study of the relationships between organisms based on how closely they are related to each other [19]. Researchers have applied similar methodologies to investigate the evolution of software, and malware in particular, using either manual methodologies [20] or automated techniques [21–23]. It is reasonable to expect that only successful viruses will have the chance to mutate and eventually to create phylogenetic clusters. Therefore the WildList is better suited to become the basis of an evolutionary study. Though there is no reason to believe that the actual number of computer viruses differs from the estimation of major antivirus vendors, there is a clear difference between the malcode that has been developed for proof-of-concept purposes in in vitro environments and the number of malware strains that can be found in vivo. Moreover, even if a virus circulates, it is not expected to cause significant damage given the total number of viruses in the wild. In our previous work we examined the factors that contributed to the success or the failure of a worm [10]. In this study we decided to utilize data from the WildList Foundation to capture the dynamics of the malware that has been seen in the wild. This list is somewhat arbitrary, as it is based on a limited number of participants, but as we will discuss, we believe that it provides significant advantages over other traditional approaches [24].
Despite the fact that some antivirus vendors [25] and researchers [26] do not agree with the methodology used by the WildList, in general “it is considered as an authoritative collection of the widespread malcode and is widely utilized as the test bench for in-the-wild virus testing and certification of anti-virus products by the ICSA and Virus Bulletin” [27]. Various AV vendors provide statistical data about the proliferation of computer malcode, paying more attention to the evolution of the malware codebase and the financial motives of their developers [8]. On the other hand, researchers have focused on interviewing malware writers in order to explain their psychological makeup [28–31]. These findings are important and useful, but have not been updated and correlated with the current trends. Our work shows that the development of malcode is no longer a “democratic” activity, in which any individual with moderate skills (for fun, political, religious or other reasons) could develop a new strain of a computer virus and cause significant or widespread damage. Most modern malware incidents are the result of a small number of prominent malcode families, which dominate the landscape and are responsible for most annoyances and damages. The rate at which improved versions of these families are rolled out dominates most of the malware activity.

Malware Families
Name of Virus       [Alias(es)]         List Date   Reported by
W32/Feebs!ITW#33    [!3501..........]   7/06        SjSt
W32/Feebs!ITW#45    [!E7A1..........]   7/06        SjSt
W32/Feebs!ITW#83    [!D840..........]   5/07        JgRsSt
W32/Feebs!ITW#89    [!9FA2..........]   11/07       PaStTl
Table 1: Malware.
3
Discussion
Although the current malware activity can be obtained from various sources, we deliberately chose to work with the WildList because we believe it better represents the observed malcode dynamics. According to its definition, “The list should not be considered a list of ‘the most common viruses’, however, since no specific provision is made for a commonness factor. This data indicates only ‘which’ viruses are In-the-Wild, but viruses reported by many (or most) participants are obviously widespread”. In other words, this list contains the viruses, worms and other types of malicious software that managed to propagate sufficiently to be detectable, which clearly excludes proof-of-concept prototypes, academic examples and ill-engineered malcode artifacts. The WildList employs an arbitrary naming scheme to identify malware threats, which is basically the name most used by different AV scanners or the name given to a virus by the person who first reported it. For the purpose of identifying malicious code of the same malware family, we analyzed the archives of the WildList Organization from July 1993 till June 2010 and classified the entries according to their name. For example, in the January 2008 list we identified the worm strains shown in Table 1 as members of the W32/Feebs family. This approach, which is based on the categorization of the WildList, is not as detailed as the manual or automatic inspection of the malcode using “phylogeny model generators (PMGs)” [21] to discern their phylogenetic characteristics. Nonetheless, we find the naming of the WildList Organization sufficient to correctly assign most malcode species to malware families. Another issue with the WildList is that it does not provide absolute numbers regarding the malevolent activity of the malware species. Therefore we cannot know the number of infections, and so we cannot categorize the viruses and worms according to their virulence.
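The name-based grouping described above can be illustrated with a minimal Python sketch. It is not the program used in this study: the function names and, in particular, the rule that a family name ends at the first "!", ".", "-" or "@" are assumptions made for illustration, chosen so that the Table 1 entries collapse to W32/Feebs.

```python
import re
from collections import Counter

# Assumed convention (not the WildList's documented grammar): everything
# from the first '!', '.', '-' or '@' onwards is a variant marker.
_VARIANT_SUFFIX = re.compile(r"[!.\-@].*$")


def family_of(name):
    """Return the family part of a WildList-style malware name."""
    return _VARIANT_SUFFIX.sub("", name.strip())


def count_families(names):
    """Count how many listed names fall into each family."""
    return Counter(family_of(n) for n in names)


if __name__ == "__main__":
    # The four W32/Feebs entries of Table 1.
    table_1 = [
        "W32/Feebs!ITW#33",
        "W32/Feebs!ITW#45",
        "W32/Feebs!ITW#83",
        "W32/Feebs!ITW#89",
    ]
    print(count_families(table_1))
```

A rule this simple would mislabel any family whose own name contains one of the assumed separator characters, which is one reason the text treats name-based grouping as less detailed than inspection of the code itself.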
Because the WildList records no infection counts, a worm with a single entry in the WildList might have caused more infections than all mutations of a malware family. On the other hand, the fact that numerous mutations of a malcode phylogeny managed to propagate widely enough to be included in the WildList is indicative of its capability to exploit a large pool of victims.
In order to proceed with the classification, we used a small bash script to download all the monthly archives from the WildList Organization. A Python program stripped all the unnecessary content from the archives, and a subsequent Python application identified the malware families and analyzed the data. Our applications processed 175 files containing 238474 lines of text, which were eventually stripped down to 69820 lines of data. These data were the basis of the analysis for identifying the current threats in computer virology.
The first and most observable trend indicates an important clustering of malicious software into a small number of malware families. From Figure 2 we can see that the percentage of malcode species that belong to the dominant malware family does not show a significant change with respect to the first available data of 1993. Though one can observe an evident increase for some months after February 1997, as well as during the last years (after 2005), the latest measurements show that the share of the dominant malcode family has stabilized around 15% of all the viruses, worms and spyware that were found in the wild each month. Far more important are the findings when we analyze the trends of the three, five or ten most dominant families together. In that case, according to the latest data (January 2010), the three most dominant families now represent 40.81% of all malware species that have been actively circulating, compared to a mere 24.04% in the first available data of July 1993. Over the same period, the share of the five most dominant families rose from 28.85% to 58.77%, while that of the ten most dominant malware families grew from 38.46% to 77.42%. These trends depict a significant change in malware activity.
Our interpretation of these findings agrees with the work of S. Gordon [28–30], who examined the motivation of malware writers from a psychological perspective, and that of S. Savage et al. [32], which focused on the economic incentives that drive the proliferation of computer crime through the development and maintenance of botnets. The earliest data (1993) depict a number of different malware strains that managed to propagate sufficiently to be included in the WildList. This trend eventually fades out, as very few dominant malware families and their respective members represent the vast majority of the viruses that succeed in circulating at large. Therefore it is not as easy for a malicious entity to develop a new virus, worm or spyware as it used to be fifteen years ago. On the contrary, one has much better chances of achieving widespread infection by using a modified or extended version of a well-maintained malware family. Based on the data analysis, the top ten malware families with the most incidents in the WildList are presented in the following table (Table 3), while a more extended view of the malicious software landscape is available in Figure 3. Unfortunately, due to space limitations we had to include only the viruses, worms, spyware and bots that had more than 100 entries in the WildList in total, and hence Figure 3 contains only 97 of the 821 malicious applications that were identified in the WildList. Further analysis of the data indicates that the top ten malware families account for 37.4% of the 817 total incidents that have been recorded in the WildList, while the top ten malware species are responsible for 48.5% of all the incidents (Figure 1). In other words, ten malware phylogenetic clusters account for half of the cases that make up the WildList so far. The common characteristic of the top ten entrants is that they have caused widespread problems and are also well known for their ability to mutate rapidly.
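The monthly shares of the one, three, five and ten most dominant families discussed in this section can be computed along the following lines. This is a hedged sketch rather than the analysis program itself: `top_k_share` is a hypothetical helper, and it assumes each monthly list has already been reduced to one family label per entry.

```python
from collections import Counter


def top_k_share(families, k):
    """Percentage of one monthly list that belongs to its k largest families.

    `families` holds one family label per WildList entry of that month
    (an assumed input format, not the archive's raw layout).
    """
    counts = Counter(families)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return 100.0 * top / total


if __name__ == "__main__":
    # Toy month with made-up labels: 5 entries of A, 3 of B, 2 of C.
    month = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
    for k in (1, 2, 3):
        print(k, top_k_share(month, k))
```

Applying the same function to every monthly list with k set to 1, 3, 5 and 10 would give one time series per k, which is the kind of curve Figure 2 plots.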
Dominant Malware in the WildList
Rank   Name of Virus      WildList Incidents
1      W32/Mytob          7097
2      W32/Onlinegames    2664
3      W32/Sdbot          2504
4      W32/Bagle          2473
5      W32/Autorun        2373
6      W32/Netsky         1562
7      W32/Rbot           1210
8      W32/Opaserv        1141
9      W32/Mydoom         1110
10     W32/Lovgate        1046
Figure 1: Malware with most incidents and total number of viruses. (The accompanying bar chart shows the number of threats for the WildList and for Symantec.)
The implications of these findings are important, as they suggest that most viruses, worms and spyware do not manage to propagate in the wild and remain in vitro samples of malicious code. Even the malcode that manages to infect enough victims to be included in the WildList either mutates and evolves rapidly, or eventually diminishes and vanishes. Therefore only well-written malcode, which offers a high degree of upgradability or can be easily mutated, has improved chances of surviving in the wild for a sufficient period.
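Cumulative percentages such as the share of all incidents attributed to the top ten families follow from a running sum over the ranked per-family incident counts. The sketch below shows that calculation under stated assumptions: `cumulative_shares` is a hypothetical helper, the overall incident total is passed in by the caller, and the numbers in the usage example are toy values rather than WildList data.

```python
from itertools import accumulate


def cumulative_shares(incidents, total):
    """Cumulative percentage of `total` covered by the largest families.

    `incidents` lists per-family incident counts in any order; `total` is
    the number of incidents over all families, including those not listed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    ranked = sorted(incidents, reverse=True)
    return [100.0 * running / total for running in accumulate(ranked)]


if __name__ == "__main__":
    # Toy values: three listed families out of 200 incidents overall.
    print(cumulative_shares([30, 50, 20], total=200))
```

Keeping the total as an explicit parameter lets the listed counts be a truncated ranking (for example, only families above some threshold) without distorting the percentages.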
Figure 2: Percentage of malware incidents attributed to top malcode families. (Series: top 1, top 3, top 5 and top 10 malware families; vertical axis: percentage, 0 to 100; horizontal axis: month.)
Figure 3: Dominant Malware in the WildList. (Series: top malware threat, with axis "Incidents involved (absolute numbers)", and cumulative sum of top threats, with axis "Incidents involved (percentage)"; one entry per malware family, from W32/Mytob to W32/Funlove.)
Figure 4: Top family per month. (Vertical axis: percentage; horizontal axis: month, July 1993 to January 2010.)
Figure 5: Number of incidents per month. (Vertical axis: malware incidents; horizontal axis: month, July 1993 to January 2010.)
4
Future Work and Concluding Remarks
Our analysis aims to identify the dominant malware phylogenetic clusters and to indicate by statistical means that virus writing has become a professional activity in which amateurs with moderate skills can no longer participate. Of course, the available data of the WildList Organization could reveal other significant characteristics of computer virology. Our intention is to work in the future towards the prediction of imminent threats by applying econometric models and technical analysis to security data. Specifically, known models such as AR, MA and ARMA could be used to predict future threats from past data by finding self-similarities and periodicity.
The latest highly sophisticated malcode of the largest malware families indicates an escalation of the security arms race between malware writers and security researchers. The analysis of the WildList data emphasizes that malware writing is no longer a trivial task. Gone are the days when disgruntled teenagers, activists or college dropouts could wreak havoc using simplistic programming tricks and earn their 15 minutes of fame. Competent malware must be able to mutate rapidly so as to propagate sufficiently, outpace the creation of effective signatures and evade other security mechanisms. The available data, on the other hand, signal that spreading a virus or a worm on a wide scale is far from trivial. Therefore, from a malware writer's perspective, it is better to work on a well-maintained malicious code base than to develop a new virus strain from scratch. Security professionals might find more promising an approach that prioritizes and concentrates their efforts against the most dominant malware phylogenies rather than trying to neutralize an overwhelming number of threats. For that reason, if the available resources are not adequate, it would be more productive for the research community to focus on the largest malware families, to monitor closely all the related developments and to disseminate any findings as fast as possible. For years malcode developers have exploited the monoculture weakness of modern IT in order to perform their vicious acts. By turning our attention to the most common and widely used malcode, we can exploit their tactics for our benefit.
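To make the autoregressive idea mentioned above as future work more concrete, the following sketch fits a first-order model x[t] = c + phi * x[t-1] by ordinary least squares and produces a one-step forecast. It is an illustration only: no such model was fitted in this paper, the function names are hypothetical, and the series in the usage example is synthetic.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    if var == 0:
        raise ValueError("series is constant; phi is not identifiable")
    cov = sum((a - mean_prev) * (b - mean_next)
              for a, b in zip(x_prev, x_next))
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_next(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


if __name__ == "__main__":
    # Synthetic series generated by x[t] = 2 + 0.5 * x[t-1], x[0] = 10.
    synthetic = [10.0, 7.0, 5.5, 4.75, 4.375]
    print(fit_ar1(synthetic), forecast_next(synthetic))
```

A real study would more likely rely on an established time-series library and compare several model orders; the hand-written fit is used here only to keep the example self-contained.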
References
[1] D. Ferbrache. A Pathology of Computer Viruses. Springer-Verlag, NY, USA, 1992.
[2] P. Szor. The Art of Computer Virus Research and Defense. Addison-Wesley, Upper Saddle River, NJ, February 2005.
[3] E. Skoudis. Malware: Fighting Malicious Code. Computer Networking and Distributed Systems. Prentice Hall, NJ, USA, 6th edition, 2004.
[4] F. Cohen. Computer Viruses: Theory and Experiments. In Proceedings of the 7th national security conference, pages 240–263, September 1984.
[5] F. Cohen. Computer Viruses – Theory and Experiments. Computers and Security, 6:22–35, 1987.
[6] F. Cohen. A Short Course on Computer Viruses. Wiley Professional Computing. Wiley, Canada, 1994.
[7] R. Anderson, R. Böhme, R. Clayton, and T. Moore. Security Economics and the Internal Market. Technical report, European Network and Information Security Agency (ENISA), January 2008.
[8] D. Turner, J. Blackbird, M. K. Low, T. Adams, D. McKinney, S. Entwisle, M. Laucht, C. Wueest, P. Wood, D. Bleaken, G. Ahmad, D. Kemp, and A. Samnani. Symantec Global Internet Security Threat Report. Trends for 2008. Technical report, Symantec, April 2009.
[9] S. Forrest, S. Hofmeyr, and A. Somayaji. Computer Immunology. Communications of the ACM, 40(10):88–96, 1997.
[10] V. Vlachos, D. Spinellis, and S. Androutsellis-Theotokis. Biological Aspects of Computer Virology. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 26:209–219, January 2010.
[11] J. Li and P. Knickerbocker. Functional Similarities Between Computer Worms and Biological Pathogens. Computers & Security, 26:338–347, 2007.
[12] D. Geer. Monoculture on the Back of the Envelope. ;login, 30(6):6–8, December 2005.
[13] G. Goth. Addressing the Monoculture. IEEE Security & Privacy, 1(6):8–10, December 2003.
[14] D. Geer, R. Bace, P. Gutmann, P. Metzger, C. P. Pfleeger, J. S. Quarterman, and B. Schneier. Cyber Insecurity: The Cost of Monopoly. Technical report, Computer & Communications Industry Association, 2003.
[15] D. Geer. The Evolution of Security. ACM Queue, pages 31–35, April 2007.
[16] A. Somayaji, S. Hofmeyr, and S. Forrest. Principles of a Computer Immune System. In Meeting on New Security Paradigms, 23-26 Sept. 1997, Langdale, UK, pages 75–82. New York, NY, USA: ACM, 1998, 1997.
[17] K. Anagnostakis, M. Greenwald, S. Ioannidis, A. Keromytis, and D. Li. A Cooperative Immunization System for an Untrusting Internet. In Proceedings of the 11th IEEE International Conference on Networks (ICON) 2003, pages 403–408, October 2003.
[18] S. Sidiroglou and A. Keromytis. A Network Worm Vaccine Architecture. In IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), Workshop on Enterprise Security, Linz, Austria, June 2003.
[19] MedicineNet. Definition of Phylogenetics. http://www.medterms.com/script/main/art.asp?articlekey=39615, 2010. (Accessed March 2010).
[20] F. de la Cuadra. The Genealogy of Malware. Network Security, pages 17–20, April 2007.
[21] M. Hayes, A. Walenstein, and A. Lakhotia. Evaluation of Malware Phylogeny Modelling Systems Using Automated Variant Generation. Journal in Computer Virology, 5(4):335–343, November 2009.
[22] M. Karim, A. Walenstein, A. Lakhotia, and L. Parida. Malware Phylogeny Using Permutations of Code. Journal in Computer Virology, 1(1):13–23, November 2005.
[23] A. K. Seewald. Towards Automating Malware Classification and Characterization. In Konferenzband der 4. Jahrestagung des Fachbereichs Sicherheit der Gesellschaft für Informatik (German-language proceedings), pages 291–302, Saarbrücken, April 2008.
[24] S. Gordon. What is Wild? In Proceedings of the 20th National Information Systems Security Conference, 1997.
[25] P. Bustamante. The Disconnect Between the WildList and Reality. Technical report, PandaLabs, January 2007.
[26] A. Marx and F. Dessman. The WildList is Dead, Long Live the WildList! In Virus Bulletin Conference, pages 136–146, September 2007.
[27] The WildList Organization International. Wildlist. http://www.wildlist.org/WildList/201001.htm, 2010. (Accessed 2010).
[28] S. Gordon. Inside the Mind of Dark Avenger. In Virus News International, 1993.
[29] S. Gordon. Generic Virus Writer. In 4th International Virus Bulletin Conference, Jersey, UK, September 1994.
[30] S. Gordon. Generic Virus Writer II. In 6th International Virus Bulletin Conference, Brighton, UK, September 1996.
[31] S. Gordon. Understanding the Adversary. IEEE Security & Privacy, 4(5):67–70, September 2006.
[32] C. Kanich, C. Kreibich, K. Levchenko, B. Enright, G. Voelker, V. Paxson, and S. Savage. Spamalytics: An Empirical Analysis of Spam Marketing Conversion. Communications of the ACM, 52(9):99–107, 2009.