
Encoding constants in Watermarking Structure

A Graph-based Software Watermarking Technique

Riya Rajan
Computer Science and Engineering Department College of Engineering SNGIST Ernakulam, Kerala e.sngist@gmail.com
Abstract: Software watermarking embeds an identification mark, i.e. a watermark value, within software to discourage software theft. Several graph-theoretic watermarking techniques encode the watermark value as a graph structure, which is then embedded in the application program. In this paper we propose an efficient algorithm that encodes both the constants in the program and the watermark value in a graph called a Reducible Permutation Graph. Since both the watermark value and the program constants are encoded in a single graph structure, any modification of this graph leads to execution failure. This property makes the watermarking system resilient to attacks. Moreover, our encoding and decoding algorithms have low time complexity and can be easily implemented.

Keywords: Software watermarking, Watermark value, encoding, Reducible Permutation Graph, Self Inverting Permutation.

Jyothimon C
Computer Science and Engineering Department College of Engineering SNGIST Ernakulam, Kerala jyothi.sngist@gmail.com

SOFTWARE WATERMARKING

Software watermarking can be described as the problem of embedding a structure w into a program P such that w can be reliably located and extracted from P even after P has been subjected to code transformations. Precisely, given a program P, a watermark w, and a key k, the software watermarking problem can be described by the following two functions: $\mathrm{embed}(P, w, k) \rightarrow P_w$ and $\mathrm{extract}(P_w, k) \rightarrow w$.

A. Classification of Watermarking algorithms

Watermarking algorithms can generally be divided into static algorithms and dynamic algorithms [2,3,4]. A static watermark is stored inside the program code in a certain format and does not change during program execution. A dynamic watermark is built during program execution, perhaps only after a particular sequence of inputs. According to the representation of the watermark information, there are two types of static watermarks: data watermarks and code watermarks. A data watermark stores the watermark information as program data and can be placed anywhere inside a program, such as in comments or in variables. A code watermark is represented by choosing a particular sequence of instructions from among sequences that have an equivalent effect. There are three dynamic watermarking techniques [2]: Easter Egg Watermarking, Execution Trace Watermarking and Dynamic Data Structure Watermarking.

B. Characteristics of a good watermarking algorithm

Data rate, stealth, and resilience are the characteristics of software watermarking [2]. The data rate expresses the quantity of hidden data that can be embedded within the cover message. The stealth expresses how imperceptible the embedded data is to an observer, and the resilience expresses the hidden message's degree of immunity to attack by an adversary [10,11,12].

INTRODUCTION

With the development of Internet technology and the widespread use of peer-to-peer resource sharing, software products are spreading faster. A survey conducted by the Business Software Alliance [1] shows that more than 41% of the software in use worldwide is pirated. Protecting the intellectual property rights of software products has become a serious challenge for the software industry. Software watermarking is a technique to prevent or discourage software piracy and copyright infringement.

This paper is structured as follows. In section 2, we give the formal definition of graph based software watermarking. Preliminaries are discussed in section 3. Different models of attacks that can occur in graph based watermarking algorithms are described in section 4. In section 5, we propose an algorithm that effectively overcomes these attacks. In section 6, we conclude by discussing the main advantages of our proposed algorithm.


C. Attacks against Watermarks

A successful attack against the watermarked program Pw prevents the recognizer from extracting the watermark while not seriously harming the performance or correctness of Pw. Attacks against software watermarks can be classified into subtractive attacks, distortive attacks and additive attacks. A subtractive attack is one where the attacker, perhaps with some tool support, tries to locate and remove the watermark. If the attacker cannot locate the watermark and is willing to accept some degradation in the quality of the watermarked program, he can apply distortive transformations uniformly over the watermarked program; this is known as a distortive attack. An additive attack is one where the attacker inserts his own watermark in an attempt to override the owner's watermark, or at least to make it plausible that the owner's watermark was not inserted before the attacker's [5,6,7].

We can classify the most relevant existing software watermarking techniques as graph based, register based, thread based, obfuscation based, branch based, program slicing based and abstract interpretation based software watermarking [6,7,12,13]. In this paper we propose a graph based software watermarking algorithm in which the constants used in the program and the watermark information are encoded into a graph called a Reducible Permutation Graph. This graph is embedded into the program. Hence any modification of the graph will destroy the proper execution of the program.

GRAPH BASED SOFTWARE WATERMARKING

Several software watermarking algorithms have been proposed that encode watermarks as graph structures [9, 10, 11]. In general, such an encoding makes use of an encoding function encode, which converts a watermark number w into a graph G, $\mathrm{encode}(w) \rightarrow G$, and a decoding function decode, which converts the graph G back into the number w, $\mathrm{decode}(G) \rightarrow w$.
We usually call the pair (encode, decode) a graph codec. From a graph-theoretic point of view, we are looking for a class of graphs $\mathcal{G}$ and a corresponding codec $(\mathrm{encode}, \mathrm{decode})_{\mathcal{G}}$. Collberg and Thomborson [2,4] proposed the first dynamic graph watermarking scheme, CT, to overcome problems with static watermarking schemes. Static watermarks are highly fragile and therefore susceptible to semantics-preserving transformation attacks. Dynamic graph watermarking schemes are similar to static graph watermarking except that the graph is built at run-time.

PRELIMINARIES

This section describes an efficient and easily implemented algorithm for encoding numbers as reducible permutation graphs through the use of self-inverting permutations. It also gives some basic definitions required for understanding how to produce a self-inverting permutation from a watermark value. The algorithm for converting a watermark value W to a self-inverting permutation, and the reverse process, proposed by Maria Chroni and Nikolopoulos [14,15], is described. They also presented algorithms for embedding the reducible permutation graph into the program code and explained how to extract the reducible permutation graph from the program code.

We consider finite graphs with no multiple edges. For a graph G, we denote by V(G) and E(G) the vertex set and edge set of G, respectively. Next we introduce some definitions that are key objects in our algorithms for encoding numbers as graphs. Let $\pi$ be a permutation over the set $N_n = \{1, 2, \ldots, n\}$. We think of the permutation $\pi$ as a sequence $(\pi_1, \pi_2, \ldots, \pi_n)$.

Definition 1: The inverse of a permutation $(\pi_1, \pi_2, \ldots, \pi_n)$ is the permutation $(q_1, q_2, \ldots, q_n)$ with $q_{\pi_i} = \pi_{q_i} = i$. A self-inverting permutation is a permutation that is its own inverse: $\pi_{\pi_i} = i$.

By definition, every permutation has a unique inverse, and the inverse of the inverse is the original permutation. Clearly, a permutation is a self-inverting permutation if and only if all its cycles are of length 1 or 2; hereafter, we shall denote a 2-cycle as c = (x, y) and a 1-cycle as c = (x), or, equivalently, c = (x, x).

Definition 2: Let $C_{1,2} = \{c_1 = (x_1, y_1), c_2 = (x_2, y_2), \ldots, c_k = (x_k, y_k)\}$ be the set of all the cycles of a self-inverting permutation $\pi$ such that $x_i \le y_i$ ($1 \le i \le k$), and let $\prec$ be a linear order on $C_{1,2}$ such that $c_i \prec c_j$ if $x_i < x_j$. A sequence $C = (c_1, c_2, \ldots, c_k)$ of all the cycles of a self-inverting permutation $\pi$ is called the increasing cycle representation of $\pi$ if $c_1 \prec c_2 \prec \cdots \prec c_k$. The cycle $c_1$ is the minimum element of the sequence C.

Let $\pi$ be a permutation on $N_n = \{1, 2, \ldots, n\}$. We say that an element i of the permutation $\pi$ dominates the element j if $i > j$ and $\pi^{-1}_i < \pi^{-1}_j$. An element i directly dominates the element j if i dominates j and there exists no element k in $\pi$ such that i dominates k and k dominates j.
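Definitions 1 and 2 can be checked mechanically. The following is a minimal Python sketch, not part of the cited algorithms; the function names `inverse`, `is_self_inverting` and `increasing_cycles` are hypothetical, and the permutation is assumed to be given as a list of the values 1..n.

```python
from typing import List, Tuple


def inverse(pi: List[int]) -> List[int]:
    """Return the inverse q of pi, so that q[pi_i] = i (values are 1-based)."""
    q = [0] * len(pi)
    for i, value in enumerate(pi, start=1):
        q[value - 1] = i
    return q


def is_self_inverting(pi: List[int]) -> bool:
    """A permutation is self-inverting when it equals its own inverse."""
    return inverse(pi) == list(pi)


def increasing_cycles(pi: List[int]) -> List[Tuple[int, int]]:
    """Return the cycles (x, y) with x <= y, ordered by increasing x.

    A 1-cycle is reported as (x, x), following the convention in the text.
    """
    if not is_self_inverting(pi):
        raise ValueError("pi is not a self-inverting permutation")
    # Each 2-cycle {x, y} is emitted once, at its smaller element x.
    return [(i, v) for i, v in enumerate(pi, start=1) if i <= v]


example = [5, 6, 9, 8, 1, 2, 7, 4, 3]
cycles = increasing_cycles(example)  # [(1, 5), (2, 6), (3, 9), (4, 8), (7, 7)]
```

The example permutation has four 2-cycles and one 1-cycle, so all its cycles have length 1 or 2, as the characterization above requires.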
Definition 3: An undirected graph G with vertices numbered from 1 to n, that is, $V(G) = \{1, 2, \ldots, n\}$, is called a permutation graph if there exists a permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_n)$ on $N_n$ such that $(i, j) \in E(G)$ if and only if $(i - j)(\pi^{-1}_i - \pi^{-1}_j) < 0$.

A flow-graph is a directed graph F with an initial node s from which all other nodes are reachable. A directed graph G is strongly connected when there is a path $x \rightarrow y$ for all nodes x, y in V(G). A node u is an entry for a subgraph H of the graph G when there is a path $p = (y_1, y_2, \ldots, y_k, u)$ such that $p \cap H = \{u\}$.

Definition 4: Reducible Flow Graph [16, 17]: A flow-graph is reducible when it does not have a strongly connected subgraph with two (or more) entries. A flow-graph G is reducible if and only if we can partition the edges into two disjoint groups, often called forward edges and back edges, with the following two properties:
1) The forward edges form an acyclic graph in which every node can be reached from the initial node of G.
2) The back edges consist only of edges whose heads dominate their tails.
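As an illustration of Definition 3 and of the dominance relation on permutation elements, the sketch below builds the edge set of the permutation graph of a given permutation. It is only an illustrative sketch: the helper names `positions`, `dominates` and `permutation_graph_edges` are hypothetical, and the permutation is assumed to be a list of the values 1..n.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def positions(pi: List[int]) -> Dict[int, int]:
    """Map each element to its 1-based position in pi, i.e. the inverse of pi."""
    return {value: index for index, value in enumerate(pi, start=1)}


def dominates(pi: List[int], i: int, j: int) -> bool:
    """Element i dominates j when i > j and i appears before j in pi."""
    pos = positions(pi)
    return i > j and pos[i] < pos[j]


def permutation_graph_edges(pi: List[int]) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, with (i - j) * (pos[i] - pos[j]) < 0."""
    pos = positions(pi)
    vertices = range(1, len(pi) + 1)
    return {
        (i, j)
        for i, j in combinations(vertices, 2)
        if (i - j) * (pos[i] - pos[j]) < 0
    }


edges = permutation_graph_edges([3, 1, 2])  # {(1, 3), (2, 3)}
```

Two vertices are adjacent exactly when the larger one dominates the smaller one, so the identity permutation yields a graph with no edges and the reversed permutation yields a complete graph.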

ENCODING CONSTANTS IN WATERMARKING STRUCTURE

A program consists of n functions or objects. These functions or objects are the real building blocks of the program, each having some unique properties with respect to the program requirements. Constants are common in most program code and, most of the time, determine whether the program executes properly. We use this property of constants in our new watermarking technique.

Suppose P is a program consisting of n functions or objects and C1, C2, ..., Cn are the constants collected. We select one constant from each function. If a function has more than one constant, we select one of them. If it has no constant, we add one. W is the value of the watermark to be encoded and B is the binary representation of W. Suppose p is the number of bits needed to represent W in binary. The self-inverting permutation (SIP) then has 2p+1 elements. Select 2p+1 constants from program P. Suppose C = C1, C2, ..., Cm are the constants collected from the program P, W is the watermark value, and k is the key used to extract the watermark value. We also use the algorithms Encode W to SIP and Encode RPG from SIP proposed by Maria Chroni et al. for encoding the watermark value W as a self-inverting permutation and for constructing the Reducible Permutation Graph from that permutation. RPG' is the new form after encoding the constants into RPG. We embed the new graph RPG' into the program. The algorithm for encoding both the constants C and the watermark W is explained in this section.

Algorithm computing Constants from Program

Suppose C consists of a sequence of constants C1, C2, ..., Cm collected from the program P having n functions or objects (where m <= n). P is the input program. The algorithm collects the sequence of constants C from the program P, where m is the number of constants collected.

Algorithm Encode Constants in RPG

The algorithm Encode C to RPG encodes the constants C into the Reducible Permutation Graph constructed from the watermark value W.
This algorithm does not add or delete any nodes or edges of the graph; we propose a new method of encoding that does not affect the structure of the graph. F = f1, f2, ..., fk are the factors of the least common multiple of the constants C, such that each Ci can be obtained as a product of factors in F. The initial node value of the Reducible Permutation Graph is S. Each node has two edges, a forward edge and a backward edge. The forward edge of a node is followed through rptr and the backward edge through lptr. Start is the pointer to the root node.
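The factor set F can be illustrated with a short Python sketch. It rests on assumptions the text does not spell out: the constants are taken to be positive integers, F is taken to be the prime factors (with multiplicity) of their least common multiple, and the names `lcm`, `prime_factors` and `factor_selection` are hypothetical.

```python
from collections import Counter
from functools import reduce
from math import gcd
from typing import List


def lcm(values: List[int]) -> int:
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)


def prime_factors(n: int) -> List[int]:
    """Prime factors of n with multiplicity, in non-decreasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def factor_selection(constant: int, factors: List[int]) -> List[int]:
    """Pick the factors from F whose product equals the given constant."""
    need = Counter(prime_factors(constant))
    have = Counter(factors)
    if any(have[p] < count for p, count in need.items()):
        raise ValueError("constant cannot be formed from the given factors")
    return sorted(need.elements())


constants = [12, 18, 5]
F = prime_factors(lcm(constants))            # [2, 2, 3, 3, 5]
selection_for_12 = factor_selection(12, F)   # [2, 2, 3]
```

Because every constant divides the least common multiple, each Ci can always be formed from a subset of F under these assumptions; how the factors are then assigned to graph nodes is left to the encoding algorithm.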

The algorithm finds a constant in one function, appends it to the constant sequence, and jumps to the next function. It moves on to the next function if a function contains no constant. Hence we always get a sequence of constants C = C1, C2, ..., Cm. The algorithm computing C from P performs basic search operations on sequences of length O(n), and the mergesort time complexity is O(m log m), where n is the number of functions in the program and m is the number of constants collected. Hence the time complexity is O(m log m), since m <= n. The algorithm uses no additional space except for the constants. Hence the algorithm takes O(m) space.
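The collection step can be sketched for Python source code using the standard `ast` module. This is only an illustration under stated assumptions: the paper does not fix a source language, only top-level functions and integer constants are considered, and the function name `collect_constants` is hypothetical.

```python
import ast
from typing import List


def collect_constants(source: str) -> List[int]:
    """Collect at most one integer constant from each top-level function.

    Functions without an integer constant are skipped, so the result has
    m <= n entries for a program with n functions.
    """
    tree = ast.parse(source)
    constants = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for child in ast.walk(node):
            is_int = (
                isinstance(child, ast.Constant)
                and isinstance(child.value, int)
                and not isinstance(child.value, bool)  # True/False are not counted
            )
            if is_int:
                constants.append(child.value)
                break  # one constant per function, then jump to the next
    return constants


program = """
def area(r):
    return 3 * r * r

def greet(name):
    return "hi " + name

def limit():
    return 100
"""
collected = collect_constants(program)  # [3, 100]
```

Here `greet` contributes nothing because it holds no integer constant. When a function holds several constants, which one is picked depends on the traversal order of `ast.walk`, which matches the text only in that any one constant may be selected.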

The algorithm computes the factors of the least common multiple of C, which is a standard arithmetic computation. We compute the distances in the Reducible Permutation Graph, dis[], by a simple graph traversal. We use mergesort to sort the distances (dis[]). A simple graph search is then used to place the factors in RPG so that each Ci can be produced from the node values.

Time Complexity: f1, f2, ..., fk are the factors of C, where C = C1, C2, ..., Cm. The algorithm checks all possible combinations of F = f1, f2, ..., fk in order to generate C. The time complexity of this algorithm is O(mk).

Algorithm Decode Constants From RPG

We have a Reducible Permutation Graph RPG that carries the watermark value W and an encoded set of constants C. The algorithm Decode Constants from RPG supplies the program with the constants C that we encoded in the RPG. Each node has a left pointer (lptr) and a right pointer (rptr). Using the input sequence k, the algorithm locates the value of the constant corresponding to RPG. The memory location that holds the starting location of the reducible permutation graph is represented by 'start'. k is the sequence from which the decoding starts.
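The decoding step can be pictured with the following Python sketch. The paper does not specify the format of the key k or how factors are laid out on the nodes, so everything concrete here is an assumption made for illustration: each node stores one factor, the key for a constant is the list of hop counts (along rptr from start) of the nodes whose values are multiplied, and the names `Node`, `build_chain` and `decode_constant` are hypothetical.

```python
from typing import Iterable, List, Optional


class Node:
    """Graph node with a value, a backward pointer lptr and a forward pointer rptr."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.lptr: Optional["Node"] = None  # backward edge, unused in this sketch
        self.rptr: Optional["Node"] = None  # forward edge


def build_chain(values: List[int]) -> Optional[Node]:
    """Link nodes through rptr in the given order and return the start node."""
    start: Optional[Node] = None
    previous: Optional[Node] = None
    for value in values:
        node = Node(value)
        if previous is None:
            start = node
        else:
            previous.rptr = node
        previous = node
    return start


def decode_constant(start: Optional[Node], key: Iterable[int]) -> int:
    """Multiply the values of the nodes selected by the key (assumed format)."""
    wanted = set(key)
    product = 1
    node, hops = start, 0
    while node is not None:  # single pass over the forward edges, O(n)
        if hops in wanted:
            product *= node.value
        node, hops = node.rptr, hops + 1
    return product


start = build_chain([2, 2, 3, 3, 5])
twelve = decode_constant(start, [0, 1, 2])  # 2 * 2 * 3
```

The traversal visits every node once and only reads node values, so the graph structure stays untouched, in line with the claim that encoding and decoding constants change node values only.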

This algorithm is a graph traversal; hence, if there are n nodes in the graph, the maximum time taken to execute the algorithm is O(n). The algorithm does not need any extra space other than that used to store the graph of n nodes, so the space complexity is also O(n).

Algorithm to Extract W from RPG

We know that RPG' is the graph in which the constants C are encoded; hence recovering RPG is the primary step in extracting the watermark. We use the key value k to extract the watermark, which was inserted by the owner as a proof of ownership. The inputs of the algorithm are RPG' and W. RPG and RPG' do not differ structurally; only the node values differ.

If n is the number of nodes in the reducible permutation graph RPG', then replacing the n node values takes O(n) time. The algorithms Decode SIP from RPG and Decode W from SIP each take O(n) time and space. Hence the time complexity of the above algorithm is O(n).

IMPLEMENTATION AND RESULTS

All software watermarking algorithms need an embedder to embed the watermark code into the program structure and a recognizer to recognize the watermark, which is encoded either in the program code or in a separate section outside the program code that helps the user locate the watermark. We can compute the number of constants in the program either by manual cross-checking or with a separate program for locating the constants. The Reducible Permutation Graph has error-correcting properties, which means that a small modification of the graph does not prevent the watermark value from being extracted. Hence the algorithm of Chroni and Nikolopoulos is more efficient than other graph based algorithms. Removal of a node or of some edges, however, would seriously hamper extraction of the watermark.

In the newly proposed algorithm we encode both the watermark value and the constants in the program into a graph called a reducible permutation graph. Even though we encode the constants and the watermark value into a single graph, the properties of the reducible graph do not change, because we do not modify the graph structure; only the node values change.

Next we check the correctness of the algorithm. We have seen that the first algorithm simply collects the constants from the program. Then the Reducible Permutation Graph is generated from the watermark value; the correctness of that algorithm is already proven. We compute the factors of the least common multiple of the constant sequence, which is a simple arithmetic computation for which algorithms exist. These factors are supplied to the Reducible Permutation Graph. A simple replacement procedure, which is also a graph traversal, is carried out to replace the node values. Hence the algorithm always produces an RPG' that is an encoded form of the constants and the watermark value.

CONCLUSION

We succeeded in developing a method of encoding a sequence of numbers in an encoded watermarking graph. We achieved double encoding of numbers. This is an entirely new concept in software watermarking. Our algorithm's efficiency depends on the graph we select to encode multiple sequences of values. The algorithm extracts the constants without harming the graph structure, which is very simple compared to other methods. However, a technique similar to the one used for extracting the watermark value could also be used for extracting the constants. The algorithm would then be stealthier; we leave this as a problem to investigate in the future. We also leave as an open problem to find a graph in which double encoding of sequences of values can be done in the graph structure.

REFERENCES

[1] Business Software Alliance. Sixth annual BSA and IDC global software piracy study. Technical Report, Business Software Alliance, 2008.
[2] Christian Collberg and Clark Thomborson. Software watermarking: Models and dynamic embeddings. In Principles of Programming Languages 1999, POPL'99, January 1999.
[3] William Feng Zhu. Concepts and Technologies in Software Watermarking and Obfuscation. PhD Thesis, The University of Auckland, 2007.
[4] Christian Collberg, Stephen Kobourov, Edward Carter, and Clark Thomborson. Error-correcting graphs for software watermarking. In Proceedings of the 29th Workshop on Graph Theoretic Concepts in Computer Science, pages 156-167, 2003.
[5] X. Chen, D. Fang, J. Shen, F. Chen, W. Wang, L. He. A Dynamic Graph Watermark Scheme of Tamper Resistance. Fifth International Conference on Information Assurance and Security, IEEE Computer Society, ISBN 978-0-7695-3744-3, pages 3-6, 2009.
[6] William Zhu and Clark Thomborson. Extraction in software watermarking. In Sviatoslav Voloshynovskiy, Jana Dittmann, and Jessica J, pages 175-181. ACM, 2006. ISBN 159593.
[7] S. Jamal, H. Zaidi and Hongxia Wang. On the Analysis of Software Watermarking. 2nd International Conference on Software Technologies and Engineering, IEEE, ISBN 978-1-4244-8666-3, pages VI 26-VI 30, 2010.
[8] L. Zhang, Y. Yang, X. Niu, and S. Niu. A Survey of Software Watermarking. Journal of Software, volume 14, pages 268-277, 2003.
[9] James Hamilton, Sebastian Danicic. An Evaluation of the Resilience of Static Java Bytecode Watermarks Against Distortive Attacks. IAENG International Journal of Computer Science, 2011.

[10] C. Collberg, A. Huntwork, E. Carter, and G. Townsend. Graph theoretic software watermarks: Implementation, analysis, and attacks. In Workshop on Information Hiding, 2004.
[11] Ramarathnam Venkatesan and Vijay Vazirani. Technique for producing through watermarking highly tamper-resistant executable code and resulting watermarked code so formed, May 2006. Microsoft Corporation, US Patent: 70521208.
[12] Ramarathnam Venkatesan, Vijay Vazirani and Saurabh Sinha. A graph theoretic approach to software watermarking. In Proceedings of the 4th International Workshop on Information Hiding, 2001.
[13] Robert Davidson and Nathan Myhrvold. Method and system for generating and auditing a signature for a computer program, June 1996. Microsoft Corporation, US Patent 5559884.
[14] Maria Chroni and Stavros D. Nikolopoulos. Encoding watermark integers as self-inverting permutations. In Proceedings of the 11th International Conference on Computer Systems and Technologies and Workshop for PhD Students, pages 125-130, Sofia, Bulgaria, 2010. ACM. ISBN 978-1-4503-0243-2.
[15] Maria Chroni and Stavros D. Nikolopoulos. Efficient Encoding of Watermark Numbers as Reducible Permutation Graphs. In Proceedings of the 10th International Conference on Computer Systems and Technologies and Workshop for PhD Students, 25-130, Sofia, Bulgaria, 2009.
[16] Maria Chroni and Stavros D. Nikolopoulos. Efficient encoding of watermark numbers as cographs using self-inverting permutations. In Proceedings of the 12th International Conference on Computer Systems and Technologies and Workshop for PhD Students, pages 142-148, Sofia, Bulgaria, 2011. ACM ICPS 578, 2011.
[17] Maria Chroni and S.D. Nikolopoulos. An embedding graph based model for software watermarking. 8th International Conference on Information Hiding and Multimedia Signal Processing, IEEE proceedings, 2012.
