JOURNAL OF COMPUTING, VOLUME 2, ISSUE 12, DECEMBER 2010, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 15

Malware detection using OWA measure


M. Eskandari, B. Hosseini, S. Hashemi and A. Salajegheh

Abstract—Threats from malware are growing every day. New methods are therefore needed that offer higher performance with lower processing-time and memory overhead, so that malicious activities can be discovered faster and more reliably through static feature analysis instead of costly dynamic analysis, which requires execution. The problem is deciding whether given operations are malicious or not. We introduce a new method that uses the OWA operator to make this decision process more realistic, with maximum dispersion influenced by all decision factors, fewer false positives and a better detection rate than previous plain analyses. OWA shows maximum entropy over all possible occurrences while scaling down the dimension of the decision. It fuses multiple features efficiently into one, which reduces partial relations and magnifies hidden but significant relations, and it considerably reduces false detections while maintaining an acceptable rate of true detections. The empirical results that follow support this claim.

Index Terms—Classification, Data mining, Detection, Malicious, Malware, Ordered weighted averaging, OWA.

——————————  ——————————

• M. Eskandari, Islamic Azad University, South Tehran Branch, and also APA Malware Research and Education Center at Shiraz University
• B. Hosseini, APA Malware Research and Education Center at Shiraz University
• S. Hashemi, Department of Computer and IT of Shiraz University and also APA Malware Research and Education Center at Shiraz University
• A. Salajegheh, Islamic Azad University, South Tehran Branch

1 INTRODUCTION

The term computer virus was first used in a science fiction novel by David Gerrold in 1972 [1]. Computer virus detection has evolved into malicious program detection since Cohen first formalized the term computer virus in 1983 [2]. Malicious programs can be classified into viruses, worms, trojans, spyware, adware and a variety of other classes and subclasses that sometimes overlap and blur the boundaries among these classes [3]. Both traditional signature-based detection and generalized approaches can be used to identify these malicious programs. To avoid detection by traditional signature-based algorithms, malicious code writers have developed a number of stealth techniques. The inability of traditional signature-based detection approaches to catch this new breed of malicious programs has shifted the focus of virus research toward more generalized and scalable features that can identify malicious behavior as a process instead of a single signature action.

Detection approaches based on data mining methods use different kinds of features, obtained through static and dynamic analysis. Dynamic analysis executes the program, monitors its behaviour, and extracts behavioural attributes as data mining features. In static analysis, features are extracted without executing the program; in other words, static analysis works directly on the program source code or binary code. Dynamic analysis needs more computation power, puts the host system at risk by executing suspicious code, and imposes the overhead of monitoring a program concurrently with its execution, but it allows more flexible decisions because they are made during execution. Static methods do not need such resources; however, they are fixed decision makers and need to be as precise as possible.

In this paper we focus on static analysis. As in previous works, assembly instructions and API calls are used as features, but we innovate on the importance weight factor in the feature selection phase. We propose a modified feature selection measure that detects malware more accurately and faster than previously published methods.

The remainder of this paper is organized as follows. Section 2 describes related work in malware detection based on data mining approaches. Section 3 discusses the proposed method: an overview of our malware detection approach, feature extraction, feature selection using the OWA measure, and the classification algorithms used in the experiments. Section 4 presents the system evaluation and interprets its results. Section 5 summarizes the results and concludes with our achievements and future work.

2 RELATED WORK

Bergeron et al. used a static misuse detection scheme in which program slicing extracts the program regions that are critical from a security point of view. Prior to this, the programs were disassembled and converted to an intermediate representation. Once the program slices were obtained, their behavior was checked against a pre-defined security policy to detect the presence of maliciousness [4].

In other work, Bergeron et al. extracted an API call graph instead of program slices to test against the security policy. The programs were first disassembled, and a control graph was created from the disassembly. The next step was to extract the API call graph from the control graph [5].

Lo et al. proposed the idea of tell-tale signs, which are heuristic signatures of malicious program behaviors. They created an intermediate representation of the program under investigation in the form of a control flow graph (CFG). The CFG was verified against the tell-tale signs to
detect any malicious activity. The approach was implemented in their system called Malicious Code Filter (MCF) [6].

Sung et al. implemented signatures in the form of API calls in a technique called Static Analysis for Vicious Executables (SAVE). The API calls from any program under investigation were compared against the signature calls using Euclidean distance [7].

3 THE PROPOSED APPROACH

In the first step we extract the features and normalize them. After that, we apply the OWA operator to the feature vector weights to select the feature distribution that has the maximum entropy over all constraints. In the final phase, we use a classification algorithm to decide whether an input file is malicious or not.

3.1 Extracting features

The features used in this study are divided into two main classes: assembly instructions and API calls. To extract both types, we first disassemble the input file with a COTS disassembler. In the next phase, called the normalization phase, unnecessary instructions are removed, all conditional jumps are replaced with one jump instruction, procedure call and “return” instructions are converted to equivalent jumps, and stack elimination is performed to remove push, pop, pushf, popf, etc. Afterward the normalized file is scanned, and the instructions and their repetition rates are saved to an instruction repository. The repetition rate of an instruction is the number of occurrences of that instruction divided by the total number of instructions.

To extract the called APIs, we first find each API call instruction and get its parameters, then add the name of the called API to an API repository.

To create the feature vector dataset, the API repository and the instruction repository are combined.

3.2 OWA Definition

The Ordered Weighted Averaging (OWA) operator is an aggregation operator originally introduced by Yager [8] to provide a means for aggregating scores associated with the satisfaction of multiple criteria in decision making; it unifies conjunctive and disjunctive behavior in one operator. A thorough survey of weight determination methods was written by Xu [10]. Various applications have been improved by modifications of the OWA operator. Some recent ones include Cheng [11], who used OWA reordering for classification, Emrouznejad [9], who exploited OWA in document similarity detection, and Sadiq [20], who used the operator for optimum source distribution with criteria.

The OWA operator of dimension $n$ is a mapping $F: R^n \to R$ given by:

$$\mathrm{OWA}(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{n} w_j \, x_{\sigma(j)} \tag{1}$$

where $\sigma$ is a permutation that orders the elements so that $x_{\sigma(1)} \le x_{\sigma(2)} \le \cdots \le x_{\sigma(n)}$. The weights are all non-negative ($w_i \ge 0$) and sum to one ($\sum_{i=1}^{n} w_i = 1$).

This operator has proved very useful because of its versatility: the OWA operators provide a parameterized family of aggregation operators that includes many well-known operators such as the maximum, the minimum, the k-order statistics, the median and the arithmetic mean. To obtain these particular operators we simply choose particular weights. The OWA operators are commutative, monotone and idempotent, they are stable under positive linear transformations, and they have a compensatory behavior. This last property expresses the fact that the aggregation performed by an OWA operator always lies between the minimum and the maximum. It can be seen as a parameterized way to go from the min to the max. In this context, a degree of maxness (initially called orness) was introduced in [8], defined, for the ascending order used in Equation (1), by

$$\mathrm{maxness}(w_1, w_2, \ldots, w_n) = \frac{1}{n-1} \sum_{i=1}^{n} (i-1)\, w_i \tag{2}$$

so that for the minimum maxness(1, 0, . . . , 0) = 0 and for the maximum maxness(0, . . . , 0, 1) = 1. A simple class of OWA operators, the exponential class, was introduced to generate OWA weights satisfying a given degree of maxness. The optimistic and pessimistic exponential OWA operators were correspondingly introduced to determine proportionally what the result of the aggregation would be when the most or the least importance is given to the ordered inputs. This provides a means of reordering the weight vector to maximize relational decisions constrained to the highest dispersion.

3.3 Feature weight reordering using OWA

Assume there is a collection of $m$ sample observations, each comprising an n-tuple of arguments $(a_{k1}, a_{k2}, \ldots, a_{kn})$ and an associated aggregated value $d_k$. We denote the reordered arguments of the kth sample by $(b_{k1}, b_{k2}, \ldots, b_{kn})$, where $b_{kj}$ is the jth largest element of the argument collection $(a_{k1}, a_{k2}, \ldots, a_{kn})$. Using these ordered arguments, we need to find a vector of OWA weights $w = (w_1, w_2, \ldots, w_n)^T$ that satisfies the following condition as faithfully as possible:

$$b_{k1} w_1 + b_{k2} w_2 + b_{k3} w_3 + \cdots + b_{kn} w_n = d_k, \quad k = 1, \ldots, m \tag{3}$$

We relax the above condition by looking for a vector of OWA weights that approximates the aggregation operator by minimizing the instantaneous errors

$$e_k = \frac{1}{2} \left( b_{k1} w_1 + b_{k2} w_2 + \cdots + b_{kn} w_n - d_k \right)^2, \quad k = 1, \ldots, m \tag{4}$$

with respect to the weights $w_i$, $i = 1, 2, \ldots, n$, subject to $w_i \in [0, 1]$ and $\sum_{i=1}^{n} w_i = 1$. Now let $\lambda_i$, $i = 1, 2, \ldots, n$, be $n$ parameters with initial values $\lambda_i(0) = 0$. The procedure used at each iteration $l$ is as follows:

1. Take the current estimates $\lambda_i(l)$, $i = 1, 2, \ldots, n$, and a new observation consisting of the ordered arguments $b_{k1}, b_{k2}, \ldots, b_{kn}$.
2. Use the $\lambda_i(l)$, $i = 1, 2, \ldots, n$, to provide a current estimate of the weights by Equation (5).
3. Use the estimated weights along with the ordered arguments to compute an aggregated value using Equation (6).
4. Update the estimates of the $\lambda_i$ according to Equation (7).

$$w_i(l) = \frac{e^{\lambda_i(l)}}{\sum_{j=1}^{n} e^{\lambda_j(l)}}, \quad i = 1, 2, \ldots, n \tag{5}$$

$$\hat{d}_k = b_{k1} w_1(l) + b_{k2} w_2(l) + \cdots + b_{kn} w_n(l), \quad k = 1, 2, \ldots, m \tag{6}$$

$$\lambda_i(l+1) = \lambda_i(l) - \beta \, w_i(l) \, (b_{ki} - \hat{d}_k)(\hat{d}_k - d_k) \tag{7}$$

Clearly, the parameters $\lambda_i$ that determine the OWA weights are updated by propagating the error $(\hat{d}_k - d_k)$ between the current estimated aggregated value and the actual aggregated value, scaled by the factors $w_i$ and $(b_{ki} - \hat{d}_k)$; $\beta$ denotes the learning rate.

This gives a normalized solution for the weights according to the importance of the API calls and of the instruction order applied during code synthesis. At the first iteration the entropy, or dispersion, over all weights is maximal, so every feature starts with the same weight and the result does not depend on the order in which the inputs are supplied.

From this step on, the reordering process changes the order of the inputs: instructions and API calls are organized according to their optimistic OWA weights. The solution is found by solving the following linear objective programming model:

$$\min J = \sum_{k=1}^{m} (e_k^{+} + e_k^{-}) \tag{8}$$

$$\text{s.t.} \quad \sum_{j=1}^{n} b_{kj} w_j - d_k - e_k^{+} + e_k^{-} = 0, \quad k = 1, 2, \ldots, m \tag{9}$$

$$w \in H, \qquad \sum_{i=1}^{n} w_i = 1, \qquad w_i \ge 0, \; i = 1, 2, \ldots, n, \qquad e_k^{+} \ge 0, \; e_k^{-} \ge 0, \; k = 1, 2, \ldots, m$$

where $e_k^{+}$ and $e_k^{-}$ are the upper and lower deviation variables of $d_k$, respectively. By solving the above linear objective programming model, we obtain the vector of OWA weights. Applicable weights are obtained for each data set. With the new ordering, shaped by the relationships between instructions, related classes of equivalent functions are detected better. Here the orness is set to 1/2 in order to have a normal and symmetric vector space. We start from $\lambda_i(0) = 0$; according to Xu [10], $\beta = 0.37$ gives the best practical results. As the experiments proceeded, 0.251 was found to be the suitable value in the final step, giving the least error rate for $e_k^{+} = e_k^{-} = 0.001$. These optimum weights meet our needs for aggregating the selected features according to their repetition rate.

3.4 Classification Models

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature.

Bayesian networks belong to the family of probabilistic graphical models. These graphical structures are used to represent knowledge about an uncertain domain. In particular, each node in the graph represents a random variable, while the edges between the nodes represent probabilistic dependencies among the corresponding random variables. These conditional dependencies in the graph are often estimated by using known statistical and computational methods. Hence, Bayesian networks combine principles from graph theory, probability theory, computer science, and statistics [12].

A random tree is a tree or arborescence that is formed by a stochastic process. Different types of random trees include: uniform spanning tree, random minimal spanning tree, random binary tree, random recursive tree, treap or randomized binary search tree, rapidly exploring random tree, Brownian tree, random forest and branching process [13].

Sequential minimal optimization (SMO) is an algorithm for solving large quadratic programming (QP) optimization problems, widely used for the training of support vector machines. SMO breaks up large QP problems into a series of smallest possible QP problems, which are then solved analytically [14].

Logistic regression is used to predict the probability of occurrence of an event by fitting data to a logistic curve (logit function). It is a generalized linear model used for binomial regression. Like many forms of regression analysis, it makes use of several predictor variables that may be either numerical or categorical.

4 EXPERIMENTAL RESULTS

4.1 Data Collection

Many researchers in this field used an imbalanced dataset in their experiments, in which the number of malware samples is much larger than the number of benign files [15, 16, 17]. We think, however, that this assumption does not hold in real-world settings, where the number of malicious binaries is at most equal to that of benign files. Methods evaluated on a dataset with considerably more malicious binaries than benign ones can clearly be expected to report better accuracy. For this reason we collected 537 benign Windows PE files and selected 563 malware samples from the malware repository of the APA malware research center at Shiraz University.

4.2 Evaluation measures

We define “Detection Rate” as the percentage of all input files labeled “malicious” that receive the correct label from the system, as illustrated in Equation (10).

$$\mathrm{DetectionRate}\ (\mathrm{Dt.Rate}) = \frac{TP}{TP + FN} \tag{10}$$
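As a minimal sketch, Equation (10) and the two companion measures of this section, Equations (11) and (12), can be computed directly from confusion-matrix counts. The function names and the counts below are illustrative assumptions and are not values from our experiments.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Equation (10): fraction of malicious files that are labeled malicious."""
    return tp / (tp + fn)


def false_alarm_rate(fp: int, tn: int) -> float:
    """Equation (11): fraction of benign files that are wrongly labeled malicious."""
    return fp / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Equation (12): fraction of all files that receive the correct label."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical counts for 100 malicious and 100 benign files (assumption).
    tp, fn, fp, tn = 90, 10, 5, 95
    print(f"Dt.Rate  = {detection_rate(tp, fn):.3f}")
    print(f"FA.Rate  = {false_alarm_rate(fp, tn):.3f}")
    print(f"Accuracy = {accuracy(tp, tn, fp, fn):.3f}")
```

The functions return fractions in [0, 1]; multiplying by 100 gives the percentage form used when reporting results.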

The “False Alarm Rate” is the percentage of all input files labeled “normal” that receive the wrong label from the system, as illustrated in Equation (11).

$$\mathrm{FalseAlarmRate}\ (\mathrm{FA.Rate}) = \frac{FP}{TN + FP} \tag{11}$$

The “Accuracy” is the overall accuracy of the system in detecting malware and benign files, as illustrated in Equation (12).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$

“Cross validation” is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. One round of cross validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset, called the training set, and validating the analysis on the other subset, called the testing set. To reduce variability, multiple rounds of cross validation are performed using different partitions, and the validation results are averaged over the rounds. We call each round a fold [18].

The “ROC curve” is a graphical plot of the true positive rate versus the false positive rate for a binary classifier system as its discrimination threshold is varied. The area under the ROC curve is called the “AUC”. The AUC is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one [19].

4.3 Analysis

Table 1 reports the results of the different classification models and compares each of them with the same model combined with the OWA operator. These results were obtained with 10-fold cross validation. For every classifier, the variant based on OWA feature selection has a higher detection rate, a lower false alarm rate and a higher accuracy than the plain variant. According to this comparison, our approach has the best accuracy among the detection methods compared, with SMO with OWA reaching 96.1.

TABLE 1
EXPERIMENTAL RESULTS ON THE MENTIONED DATASET WITH DIFFERENT CLASSIFIERS AND COMPARISON WITH OWA-BASED CLASSIFIERS

Model                                   Dt. Rate   FA. Rate   Accuracy   AUC
Bayes net                               81.3       59.7       60.8       85.3
Bayes net with OWA                      91.7       57.7       67.5       87.5
Random Tree                             85.1       22.5       81.3       86.4
Random Tree with OWA                    92.2       12.5       89.9       90.6
Bayesian Logistic Regression            88.6       19.8       84.4       87.5
Bayesian Logistic Regression with OWA   97.9       6.1        95.9       95.9
SMO                                     92.9       16.8       88.1       91.4
SMO with OWA                            97.9       5.8        96.1       96.0

5 CONCLUSIONS

In static approaches, keeping false detections to a minimum is critical, while true detection precision should stay as high as possible. By using the OWA operator we obtain better feature selection for malware detection with no additional overhead in computation time or memory complexity compared with equivalent purely static methods. The false detection rate is reduced drastically and the true detection rate increases noticeably, making this method the best among the rival approaches.

ACKNOWLEDGMENT

This work was supported in part by APA Malware Research and Education Center at Shiraz University.

REFERENCES
[1] D. Gerrold, “When Harlie Was One,” Doubleday, 1972.
[2] F. Cohen, “Computer Viruses,” PhD thesis, University of Southern California, 1985.
[3] P. Szor, “The Art of Computer Virus Research and Defense,” Addison Wesley for Symantec Press, New Jersey, 2005.
[4] J. Bergeron, M. Debbabi, M. M. Erhioui, and B. Ktari, “Static Analysis of Binary Code to Isolate Malicious Behavior,” In Proceedings of the 8th Workshop on Enabling Technologies on Infrastructure for Collaborative Enterprises (WETICE’99), pp. 184-189, 1999.
[5] J. Bergeron, M. Debbabi, J. Desharnais, M. M. Erhioui, Y. Lavoie, and N. Tawbi, “Static Detection of Malicious Code in Executable Programs,” Symposium on Requirements Engineering for Information Security (SREIS’01), 2001.
[6] R.W. Lo, K.N. Levitt, and R.A. Olsson, “MCF: A Malicious Code Filter,” Computers and Security, vol 14, no 6, pp 541-566, 1995.
[7] A.H. Sung, J. Xu, P. Chavez, and S. Mukkamala. “Static Ana-
lyzer of Vicious Executables,” In 20th Annual Computer Securi-
ty Applications Conference, pp 326-334, 2004.
[8] R.R. Yager, “On ordered weighted averaging aggregation oper-
ators in multi-criteria decision making,” IEEE transactions on
Systems, Man and Cybernetics, vol 18, pp 183–190, 1988.
[9] A. Emrouznejad, G.R. Amin, “Document Similarity: A New Measure Using OWA,” Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on, vol 7, no 1, pp. 186-190, 14-16 Aug. 2009.
[10] Z.S. Xu, “An overview of methods for determining OWA weights,” Int. J. Intelligent Systems, vol 20, pp 843–865, 2005.
[11] C.H. Cheng, J.W. Wang, M.C. Wu, “OWA-weighted based
clustering method for classification problem,” Expert Systems
with Applications, vol 36, pp 4988–499, 2009.
[12] N.L. Zhang, D. Poole, “Exploiting causal independence in
Bayesian network inference,” Journal of Artificial Intelligence
Research vol 5, pp 301-328, 1996.
[13] T.G. Dietterich, “An Experimental Comparison of Three Me-
thods for Constructing Ensembles of Decision Trees: Bagging,
Boosting, and Randomization,” Machine Learning, vol 40, no
2, pp. 139-157, 2000.
[14] J.C. Platt, “Fast training of support vector machines using se-
quential minimal optimization,” In Advances in kernel me-
thods, pp. 185-208, MIT Press, Cambridge, MA, USA, 1999.
[15] G. Bonfante, M. Kaczmarek, and J. Marion, “Control Flow to
Detect Malware,” Inter-Regional Workshop on Rigorous Sys-
tem Development and Analysis, 2007.
[16] Y. Ye, D. Wang, T. Li, D. Ye, Q. Jiang, “An intelligent PE-
malware detection system based on association mining,” Jour-
nal of Computer Virology, Feb 2008.
[17] G. Bonfante, M. Kaczmarek, and J. Marion, “Architecture of a
morphological malware detector,” Journal of Computer Virolo-
gy, Sep 2008.
[18] R. Picard, D. Cook, “Cross-Validation of Regression Models,”
Journal of the American Statistical Association, pp 575-583,
1984.
[19] L.E. Dodd, M.S. Pepe, “Partial AUC Estimation and Regres-
sion,” Biometrics, pp 614-623, 2003.
[20] R. Sadiq, M.J. Rodríguez, S. Tesfamariam, “Integrating indicators for performance assessment of small water utilities using ordered weighted averaging (OWA) operators,” Expert Systems with Applications, vol 37, pp 4881–4891, 2010.