HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 15
Abstract—Threats from malware are growing every day. New methods are therefore needed that offer higher performance and lower overhead in processing time and memory, so that malicious activity can be discovered faster and more reliably through static feature analysis rather than through cumbersome dynamic analysis that requires execution. The problem is deciding whether a given operation is malicious or not. We introduce a new method that uses the OWA operator to make this decision process more realistic, with maximum dispersion influenced by all decision factors, fewer false positives, and a better detection rate than previous plain analysis. While attaining maximum entropy over all possible occurrences, OWA also scales down the dimension of the decision and efficiently fuses multiple features into one, which reduces partial relations and magnifies hidden but significant relations. It also noticeably reduces false detections while maintaining an acceptable rate of true detections. The empirical results that follow support this claim.
Index Terms—Classification, Data mining, Detection, Malicious, Malware, Ordered weighted averaging, OWA.
1 INTRODUCTION
... detect any malicious activity. The approach was implemented in their system, called Malicious Code Filter (MCF) [6].

Sung et al. implemented signatures in the form of API calls in a technique called Static Analysis for Vicious Executables (SAVE). The API calls from any program under investigation were compared against the signature calls using Euclidean distance [7].

3 THE PROPOSED APPROACH

In the first step we extract the features and normalize them. After that, we apply the OWA operator to the feature vector weights to select the feature distribution that has the maximum entropy over all constraints. In the final phase, we use a classification algorithm to decide whether an input file is malicious or not.

3.1 Extracting features

The features used in this study are divided into two main classes: assembly instructions and API calls. To extract both types, we first disassemble the input file with one of the COTS disassemblers. In the next phase, called the normalization phase, unnecessary instructions are removed, all conditional jumps are replaced with one jump instruction, procedure call and "return" instructions are converted to equivalent jumps, and stack elimination is performed in order to remove push, pop, pushf, popf, etc. Afterward the normalized file is scanned, and the instructions and their repetition rates are saved to an instruction repository. The repetition rate of an instruction is obtained by dividing the number of occurrences of that instruction by the total number of instructions.

To extract the called APIs, we first find each API call instruction and get its parameters, then add the called API name to an API repository.

To create a feature vector dataset, the API repository and the instruction repository are combined.

3.2 OWA Definition

The Ordered Weighted Averaging (OWA) operator is an aggregation operator originally introduced by Yager [8] to provide a means for aggregating scores associated with the satisfaction of multiple criteria in decision making. It unifies conjunctive and disjunctive behavior in one operator. A survey of weight determination methods was written by Xu [10]. Various applications have been improved by OWA operator modifications. Some recent ones include ching [11], using OWA reordering for classification; Emrouznejad [9], who exploited OWA in document similarity detection; and Sadiq [20], who used this function for optimum source distribution with criteria.

The OWA operator of dimension n is a mapping $F: \mathbb{R}^n \to \mathbb{R}$ given by:

$$\mathrm{OWA}(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{n} w_j \, x_{\sigma(j)} \tag{1}$$

where $\sigma$ is a permutation that orders the elements so that $x_{\sigma(1)} \le x_{\sigma(2)} \le \cdots \le x_{\sigma(n)}$. The weights are all non-negative ($w_i \ge 0$) and sum to one ($\sum_{i=1}^{n} w_i = 1$).

This operator has proved very useful because of its versatility: the OWA operators provide a parameterized family of aggregation operators that includes many well-known operators, such as the maximum, the minimum, the k-order statistics, the median and the arithmetic mean. To obtain these particular operators we simply choose particular weights. The OWA operators are commutative, monotone and idempotent, they are stable under positive linear transformations, and they have a compensatory behavior. This last property expresses the fact that the aggregation done by an OWA operator always lies between the minimum and the maximum. It can be seen as a parameterized way to go from the min to the max. In this context, a degree of maxness (initially called orness) was introduced in [8]. With the ascending ordering of Equation (1), it is defined by

$$\mathrm{maxness}(w_1, w_2, \ldots, w_n) = \frac{1}{n-1} \sum_{i=1}^{n} (i-1)\, w_i \tag{2}$$

so that, for the minimum, maxness(1, 0, ..., 0) = 0 and, for the maximum, maxness(0, ..., 0, 1) = 1. A simple class of OWA operators, the exponential class, was introduced to generate OWA weights satisfying a given degree of maxness. The optimistic and pessimistic exponential OWA operators were correspondingly introduced to determine, proportionally, what the result of the aggregation would be when the most or the least importance is placed on the ordered inputs. This is a means of reordering the weight vector to maximize relational decision constraints with the highest dispersion.

3.2 Features Weight Reordering Using OWA

Assume there is a collection of m sample observations, each comprising an n-tuple of arguments $(a_{k1}, a_{k2}, \ldots, a_{kn})$ and an associated aggregated value $d_k$. We denote the reordered objects of the kth sample by $(b_{k1}, b_{k2}, \ldots, b_{kn})$, where $b_{kj}$ is the jth largest element of the argument collection $(a_{k1}, a_{k2}, \ldots, a_{kn})$. Using these ordered arguments, we need to find a vector of OWA weights $w = (w_1, w_2, \ldots, w_n)^T$ that satisfies the following condition as faithfully as possible:

$$b_{k1} w_1 + b_{k2} w_2 + b_{k3} w_3 + \cdots + b_{kn} w_n = d_k, \quad k = 1, \ldots, m \tag{3}$$

We relax the above condition by looking for a vector of OWA weights that approximates the aggregation operator by minimizing the instantaneous errors

$$e_k = \frac{1}{2} \left( b_{k1} w_1 + b_{k2} w_2 + \cdots + b_{kn} w_n - d_k \right)^2, \quad k = 1, \ldots, m \tag{4}$$

with respect to the weights $w_i$, $i = 1, 2, \ldots, n$, subject to $w_i \in [0, 1]$ and $\sum_{i=1}^{n} w_i = 1$. Now let $\lambda_i$, $i = 1, 2, \ldots, n$, be n parameters, and set the initial values $\lambda_i(0) = 0$, $i = 1, 2, \ldots, n$. The procedure used at each iteration is as follows:

1. Take the current estimates of the $\lambda_i(l)$, $i = 1, 2, \ldots, n$, and a new observation consisting of the ordered arguments $b_{k1}, b_{k2}, \ldots, b_{kn}$.
2. Use the $\lambda_i(l)$, $i = 1, 2, \ldots, n$, to provide a current estimate of the weights by Equation (5).
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 12, DECEMBER 2010, ISSN 2151-9617
3. Use the estimated weights along with the ordered arguments to get a calculated aggregated value using Equation (6).
4. Update the estimates of the $\lambda_i$ according to Equation (7).

$$w_i(l) = \frac{e^{\lambda_i(l)}}{\sum_{j=1}^{n} e^{\lambda_j(l)}}, \quad i = 1, 2, \ldots, n \tag{5}$$

$$\hat{d}_k = b_{k1} w_1(l) + b_{k2} w_2(l) + \cdots + b_{kn} w_n(l), \quad k = 1, 2, \ldots, m \tag{6}$$

$$\lambda_i(l+1) = \lambda_i(l) - w_i(l)\,(b_{ki} - \hat{d}_k)\,(\hat{d}_k - d_k) \tag{7}$$

Clearly, the parameters $\lambda_i$ that determine the OWA weights are updated by propagating the error $(\hat{d}_k - d_k)$ between the current estimated aggregated value and the actual aggregated value, scaled by the factors $w_i$ and $(b_{ki} - \hat{d}_k)$.

This gives a normalized solution for finding the weights according to the importance of the API calls and of the instruction orders applied during code synthesis. At the first iteration all features start from a homogeneous weight, because $\lambda_i(0) = 0$ gives $w_i = 1/n$ in Equation (5). This maximizes the entropy, or dispersion, over all weights, and the initial result is not affected by the order in which the inputs arrive.

From this step on, the reordering process changes the ordering of the input structure: instructions and API calls are organized according to their optimistic OWA weights. The solution is found by solving the following linear objective programming model:

$$\min J = \sum_{k=1}^{m} \left( e_k^{+} + e_k^{-} \right) \tag{8}$$

$$\text{s.t.} \quad \sum_{j=1}^{n} b_{kj} w_j - d_k - e_k^{+} + e_k^{-} = 0, \quad k = 1, 2, \ldots, m \tag{9}$$

... regarding their repetition rate.

3.3 Classification Models

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature.

Bayesian networks belong to the family of probabilistic graphical models. These graphical structures are used to represent knowledge about an uncertain domain. In particular, each node in the graph represents a random variable, while the edges between the nodes represent probabilistic dependencies among the corresponding random variables. These conditional dependencies in the graph are often estimated by using known statistical and computational methods. Hence, Bayesian networks combine principles from graph theory, probability theory, computer science, and statistics [12].

A random tree is a tree or arborescence that is formed by a stochastic process. Different types of random trees include: uniform spanning tree, random minimal spanning tree, random binary tree, random recursive tree, treap or randomized binary search tree, rapidly exploring random tree, Brownian tree, random forest and branching process [13].

Sequential minimal optimization (SMO) is an algorithm for solving large quadratic programming (QP) optimization problems, widely used for the training of support vector machines. SMO breaks up large QP problems into a series of smallest possible QP problems, which are then solved analytically [14].

Logistic regression is used to predict the probability of occurrence of an event by fitting data to a logistic curve (the logit function). It is a generalized linear model used for binomial regression. Like many forms of regression analysis, it makes use of several predictor variables that may be either numerical or categorical.

4 EXPERIMENTAL RESULTS
The "False Alarm Rate" is the percentage of samples labeled "normal" that receive the wrong label from the system, as illustrated in Equation (11).

$$\text{False Alarm Rate (FA. Rate)} = \frac{FP}{TN + FP} \tag{11}$$

The "Accuracy" is the overall accuracy of the system in detecting malware and benign files, as illustrated in Equation (12).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$

"Cross validation" is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. One round of cross validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset, called the training set, and validating the analysis on the other subset, called the testing set. To reduce variability, multiple rounds of cross validation are performed using different partitions, and the validation results are averaged over the rounds. Each round is called a fold [18].

The "ROC curve" is a graphical plot of the true positive rate versus the false positive rate for a binary classifier system as its discrimination threshold is varied. The area under the ROC curve is called the "AUC". The AUC is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one [19].

TABLE 1
EXPERIMENTAL RESULTS ON THE MENTIONED DATASET WITH DIFFERENT CLASSIFIERS AND COMPARISON WITH OWA-BASED CLASSIFIERS

Model                                    Dt. Rate (%)   FA. Rate (%)   Accuracy (%)   AUC (%)
Bayes net                                81.3           59.7           60.8           85.3
Bayes net with OWA                       91.7           57.7           67.5           87.5
Random Tree                              85.1           22.5           81.3           86.4
Random Tree with OWA                     92.2           12.5           89.9           90.6
Bayesian Logistic Regression             88.6           19.8           84.4           87.5
Bayesian Logistic Regression with OWA    97.9           6.1            95.9           95.9
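
To make the formulas used in this paper concrete, the following is a minimal Python sketch, not the authors' implementation. It covers the OWA aggregation of Equation (1), the maxness of Equation (2), one pass of the weight-update procedure of Equations (5) to (7), and the evaluation measures of Equations (11) and (12) together with the pairwise definition of the AUC. All function names are hypothetical. Three choices are assumptions: maxness uses the (i - 1) form, which matches the ascending order of Equation (1) and the boundary values stated in the text; `beta` is an assumed step size that is not part of Equation (7), which corresponds to `beta = 1.0`; and ties in the AUC count as one half.

```python
import math


def owa(values, weights):
    """Equation (1): weighted sum of the values sorted in ascending order."""
    return sum(w * x for w, x in zip(weights, sorted(values)))


def maxness(weights):
    """Equation (2): 0 for the minimum operator, 1 for the maximum (needs n > 1)."""
    n = len(weights)
    return sum((i - 1) * w for i, w in enumerate(weights, start=1)) / (n - 1)


def softmax_weights(lambdas):
    """Equation (5): turn free parameters into non-negative weights summing to one."""
    exps = [math.exp(v) for v in lambdas]
    total = sum(exps)
    return [e / total for e in exps]


def update_lambdas(lambdas, ordered_args, target, beta=1.0):
    """One iteration of steps 1 to 4 for a single observation.

    ordered_args are the already ordered arguments b_k1..b_kn, target is d_k.
    beta is an assumed step size; beta = 1.0 gives Equation (7) as written.
    """
    w = softmax_weights(lambdas)                               # Equation (5)
    estimate = sum(b * wi for b, wi in zip(ordered_args, w))   # Equation (6)
    error = estimate - target
    return [lam - beta * wi * (b - estimate) * error           # Equation (7)
            for lam, wi, b in zip(lambdas, w, ordered_args)]


def false_alarm_rate(fp, tn):
    """Equation (11): share of benign samples flagged as malicious."""
    return fp / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Equation (12): share of all samples labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative.

    Ties count as one half (an assumed convention).
    """
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, `maxness([1, 0, 0])` evaluates to 0 and `maxness([0, 0, 1])` to 1, and `softmax_weights([0.0] * n)` returns the uniform weights 1/n from which the procedure starts. Multiply the rates by 100 to compare them with the percentages in Table 1.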