
A NEURAL NETWORK BASED ADAPTIVE IMAGE RETRIEVAL SYSTEM WITH RELEVANCE FEEDBACK AND CLUSTERING

Ankit Nagpal, Tanveer Siddiqui (Indian Institute of Information Technology, Jhalwa, Allahabad, India) ankit.nagpal@is.iiita.ac.in, tanveer@iiita.ac.in

Abstract
The motivation for this work is to develop an adaptive image retrieval system with an innovative approach that uses artificial neural networks and clustering techniques to retrieve images similar to an input image. The system is made intelligent by learning the user's preferences from feedback. This paper also explains the use of relevance feedback to improve the accuracy of the system by analyzing the user's relevance feedback for each retrieved image, while neural network and clustering techniques are simultaneously used to reduce the time complexity of the system. The system uses a three-layer neural network trained in a supervised manner on image clusters as the training dataset. Using the feedback from the user, the image clusters are re-clustered by shuffling the images after each retrieval. Given a user query in the form of an image, the system retrieves similar images by computing similarities with the images in the given image clusters. To express a preference, the user selects one of the retrieved images as relevant, and all others are treated as irrelevant; the rank of the selected image is increased while the ranks of the other images are decreased. With this feedback, the system's refinement method estimates global approximations and adjusts the similarity probabilities.
Key words: RGB.
Retrieved images with a high similarity measure are placed first, while those with a low similarity measure are placed last. The similarity is calculated by a heuristic algorithm whose inputs are the neural network probability and the relevance value of the image; the output of the algorithm is the similarity of the image to the input image or, if an image cluster is selected, the rank of all the images in the cluster. The system is adaptive, adjusting itself according to the user's feedback.
The system also adds the user's input image to the training dataset in the relevant cluster and trains the neural network on that image as well, so that the input image becomes part of the training dataset and can be retrieved for later queries, increasing the accuracy of the system. This innovative feature makes the system adaptive to user queries. The system works in two stages: training and testing. Training consists of extracting the features of all images in the training dataset; the features of the images in each cluster are then used as input to train a separate neural network for that cluster. The image database consists of several clusters of images, where each cluster contains images of a particular type. In our training dataset we have taken 10 different clusters of images, e.g. Cars, Buildings, Flowers, Weapons etc. Each cluster contains around 50 training images whose features are used to train a separate neural network for that cluster. The content of an image can be expressed in terms of different features such as color, texture, and shape. Retrieval based on these features varies with how the features are extracted and used; since color, texture, and shape features are extracted using different computation methods, different features may give different similarity measurements. In our system we have used the following spatial features of an image: Entropy, Homogeneity, Energy, Correlation, and Contrast. These features are extracted from the image and then input to the neural network for training. A detailed explanation of these features is given in section 2.1. A neural network for each cluster of images is created and trained using the extracted features. The neural network has 3 layers: 2 hidden layers for non-linear learning using back-propagation and 1 output layer.

1. Introduction
In this paper, we present a neural network [1] based adaptive image retrieval [2] system with relevance feedback [3] and image clustering [4]. The system takes an image as input from the user, extracts features [5] from the image, measures the similarity of the image with existing clusters of images using neural networks, and ranks the retrieved images using relevance feedback. The user can initially either input an image to retrieve similar images or select an image cluster from a given set of image clusters to start with. If the user inputs an image, the system retrieves the most similar images from the image database; if the user starts by selecting an image cluster, all the images belonging to that cluster are retrieved in decreasing order of relevance, that is, images of the cluster with a high similarity measure are ranked first.

The input layer contains 5 neurons for the 5 spatial features of each image of a cluster. The output layer contains 1 neuron, whose output should be 1 if the image contains the object and 0 if it does not. The detailed architecture and working of the neural network training are discussed in section 2.2. In the testing process, the user is asked to input an image; its features are extracted and input to the trained neural network of each cluster, and if the output of one or more networks is greater than a certain threshold, the most relevant images of those clusters are shown to the user as the images most similar to the input image. If the user then clicks on any of the retrieved images, this information is used as relevance feedback to increase the relevance of the clicked image. Hence the results are refined and the accuracy is increased. The input image is also stored in the relevant cluster, the cluster's network is retrained with the new image, and its relevance value is initialized to a default value such that it is retrieved in subsequent queries with the lowest rank or similarity measure. The similarity is calculated using a heuristic algorithm whose inputs are the neural network probability and the relevance value of the image, and whose output is the similarity of the image. The detailed execution and calculations of the algorithm are discussed in section 2.4. The accuracy of the system is more than 98% with 10 clusters containing around 50 images each, tested with 90,000 user clicks as relevance feedback. These hits were simulated by the system for testing. The detailed architecture and working of the system are discussed in section 2. Section 3 contains the results obtained.
Section 4 concludes and briefly discusses directions for future research, including techniques that could make the system more efficient.

Once stored, the features need not be extracted again when they are input to the neural network. After the neural network is trained, the system is tested by classifying the user's input image into one or more clusters. When all the relevant images are retrieved, the user can click on any of them, which indicates that the clicked image is the most similar to the input image. The relevance of that image is then increased for that cluster, while the relevance of all other retrieved images is reduced, as those images were not clicked by the user. These relevance values and the neural network probability are input to a heuristic function which gives the overall similarity measure of the image. The user's image is also stored in the cluster whose similarity measure is highest for the input image, so the neural network of that cluster is trained again so that the newly stored image can be retrieved on subsequent queries. This cycle of calculating the similarity measure based on relevance feedback, storing the user's query image in the appropriate cluster, and then re-clustering and training the network of that cluster continues for each user query. This innovative feature improves the accuracy of the system. The detailed architecture of the system is explained in section 2 below.

2.1 Stage One: Spatial Features Extraction

The first stage of the system is image processing and feature extraction. Starting from the initial clusters of images, the feature extraction process is applied to each image, but before feature extraction all the images are resized to dimension 256 X 256, that is, 256 rows and 256 columns. This is necessary as the dimensions of images in a cluster may vary. All the images of a cluster are therefore resized to 256 X 256 and converted from RGB [6] to gray-level images. Then the Correlation, Energy, Contrast, Homogeneity, and Entropy features are extracted from each image.
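The resizing and gray-level conversion described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: nearest-neighbour resizing and the standard luminance weights are assumptions.

```python
import numpy as np

def preprocess(rgb, size=256):
    """Resize an RGB image to size x size and convert it to gray level,
    as done before feature extraction.  Nearest-neighbour resizing and
    the standard luminance weights are illustrative choices."""
    h, w, _ = rgb.shape
    rows = np.arange(size) * h // size   # nearest-neighbour row indices
    cols = np.arange(size) * w // size   # nearest-neighbour column indices
    resized = rgb[rows][:, cols]
    gray = (0.299 * resized[..., 0] +
            0.587 * resized[..., 1] +
            0.114 * resized[..., 2])
    return np.round(gray).astype(np.uint8)
```

Any interpolation scheme would do here; the important point is only that every image enters feature extraction at a fixed 256 x 256 gray-level resolution.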
First, a gray-level co-occurrence matrix is created from the image matrix by counting how often a pixel with gray-level (grayscale intensity) value i occurs horizontally adjacent to a pixel with value j. All the features mentioned above can then be extracted from this co-occurrence matrix. Correlation is a measure of how correlated a pixel is to its neighbor over the whole image, and it is defined as:

Correlation = sum over i, j of [ (i - μi)(j - μj) p(i, j) ] / (σi σj)

where μi, μj and σi, σj are the means and standard deviations of the row and column marginals of the normalized co-occurrence matrix p.

2. Description of the System

The system operates in two stages: Training and Testing. Initially the system contains folders of training images such as Cars, Buildings, Flowers, Food, Weapons etc., each containing images belonging to its cluster; e.g. the Flower folder contains images of flowers only, and the same is true for all other folders. These folders are the clusters of images used for training. In training, a separate neural network is trained for each cluster of images such that the output of the network during training is 1 for all the images of that particular folder. The inputs to the neural network are the features of the images. The features are extracted from the images before training and are stored in a file for later use, so the features need not be extracted repeatedly.
Energy is the sum of squared elements in the co-occurrence matrix, and it is defined as:

Energy = sum over i, j of p(i, j)^2

Contrast is a measure of the intensity contrast between a pixel and its neighbor over the whole image, and it is defined as:

Contrast = sum over i, j of |i - j|^2 p(i, j)

It is also noticed that the time required to train the network increases slightly as training images are added. This conclusion supports the technique of storing the user's input image in the relevant image cluster, as it increases accuracy at the cost of a slight increase in time complexity. The percentage of false detections is also reduced as the number of training images grows.

2.3 Stage Three: Testing using an Image as a Query by the User

Testing starts when the user either inputs a test image to retrieve similar images or selects an image cluster from a given list of image categories. When the user inputs an image, the image is fed to all the neural networks representing the different image clusters. If the probability output for the input image from one or more neural networks exceeds a threshold of 0.6, the image belongs to those image clusters. An image can belong to more than one cluster if it contains more than one object; for example, if an image contains both a Car and a Building then it belongs to both categories, so images from both clusters should be retrieved, ranked according to the output of the heuristic algorithm, which adds the neural network probability and the relevance value to determine the rank of each retrieved image. If the user clicks on one of the retrieved images, the relevance measure of that image is increased by 0.1, while the relevance measure of the images which are not clicked is decreased by 0.01. The user's input image is also stored in the training dataset under every relevant cluster whose probability exceeds the 0.6 threshold; the image is indexed using the heuristic algorithm, its features are extracted, and the network is retrained with the features of the new image. These steps are repeated for each query in which the user inputs an image.
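One query cycle of this testing stage can be sketched as follows. This is a hypothetical Python sketch: the data structures and function names are assumptions, while the 0.6 threshold and the +0.1 / -0.01 feedback steps come from the text.

```python
THRESHOLD = 0.6  # cluster-membership threshold from the text

def query(features, networks, relevance):
    """networks: {cluster: model}, relevance: {(cluster, image): score}.
    Returns the matching clusters and the retrieved images, ranked by
    neural-network probability plus stored relevance."""
    probs = {c: net(features) for c, net in networks.items()}
    matched = [c for c, p in probs.items() if p > THRESHOLD]
    retrieved = [key for key in relevance if key[0] in matched]
    # rank by neural-network probability plus stored relevance
    retrieved.sort(key=lambda key: probs[key[0]] + relevance[key], reverse=True)
    return matched, retrieved

def feedback(retrieved, clicked, relevance):
    """Click feedback: +0.1 for the clicked image, -0.01 for the rest."""
    for key in retrieved:
        relevance[key] += 0.1 if key == clicked else -0.01
```

Note that an image matching several cluster networks is naturally retrieved from all of them, which is how multi-object query images are handled.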

Homogeneity is a measure of the closeness of the distribution of elements in the co-occurrence matrix to the matrix diagonal, and it is defined as:

Homogeneity = sum over i, j of p(i, j) / (1 + |i - j|)

Entropy is a statistical measure of randomness that can be used to characterize the texture of the input image, and it is defined as:

Entropy = -sum over i, j of p(i, j) log p(i, j)
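The five features above can all be computed directly from a normalized co-occurrence matrix. Below is a minimal NumPy sketch; quantising to 8 gray-level bins, using horizontal adjacency only, and defining correlation as 1 when a standard deviation is zero are all assumptions, not details taken from the paper.

```python
import numpy as np

def glcm(gray, levels=8):
    """Horizontal gray-level co-occurrence matrix, normalised so that
    its entries p(i, j) sum to 1."""
    q = (gray.astype(np.int64) * levels) // 256   # quantise to `levels` bins
    P = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        P[a, b] += 1                              # count horizontal neighbours
    return P / P.sum()

def glcm_features(P):
    """Contrast, energy, homogeneity, entropy and correlation of a
    normalised co-occurrence matrix P, following the formulas above."""
    i, j = np.indices(P.shape)
    contrast = np.sum((i - j) ** 2 * P)
    energy = np.sum(P ** 2)
    homogeneity = np.sum(P / (1.0 + np.abs(i - j)))
    nz = P[P > 0]                                 # skip log(0) terms
    entropy = -np.sum(nz * np.log2(nz))
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    s_i = np.sqrt(np.sum((i - mu_i) ** 2 * P))
    s_j = np.sqrt(np.sum((j - mu_j) ** 2 * P))
    if s_i * s_j == 0:                            # constant image: define as 1
        correlation = 1.0
    else:
        correlation = np.sum((i - mu_i) * (j - mu_j) * P) / (s_i * s_j)
    return contrast, energy, homogeneity, entropy, correlation
```

For a perfectly uniform image the sketch gives contrast 0, energy 1, homogeneity 1 and entropy 0, which matches the intuition behind each formula.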

All these features are normalized to the range (0, 1) for input to the neural network, and they are also stored in a file for later use.

2.2 Stage Two: Neural Network Training

The extracted features are input to the neural network. The neural network architecture [2] consists of 5 neurons in the input layer for the 5 features, 5 and 3 neurons in the 1st and 2nd hidden layers respectively, and 1 neuron in the output layer, whose output is 1 if the image belongs to the cluster and 0 if it does not. The neural network is trained using the back-propagation algorithm and the Tan-Sigmoid transfer function shown below:

Fig. 1 Tan-Sigmoid Function

tansig(n) = 2 / (1 + e^(-2n)) - 1
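As a quick, purely illustrative sanity check, the tan-sigmoid above is algebraically identical to the hyperbolic tangent:

```python
import numpy as np

def tansig(n):
    """Tan-sigmoid transfer function: 2 / (1 + e^(-2n)) - 1."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0
```

This means any tanh implementation can stand in for the transfer function when reproducing the network.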

It takes about 300 epochs to train each cluster's neural network. Increasing the number of images in an image cluster does not cause much increase in the number of epochs, but the time required for each epoch increases slightly. The results improve as the number of training images increases.

Fig. 2: a test image

The results of testing all the neural networks on the test image are:

Cluster Name     Building  Car   Flower  Food  Space  Weapon
NN probability   0.9       0.7   0.1     0.2   0.3    0.16
Relevance        0.1       0.1   0       0     0      0
Total            1.0       0.8   0.1     0.2   0.3    0.16

Fig. 3: neural network probabilities for fig. 2

Flowers is the only image cluster whose neural network probability for the test image is greater than the threshold 0.6. Hence the image belongs to the Flowers cluster, and images from Flowers will be retrieved as the result. If the user clicks on any of the retrieved images, the relevance of that image will be increased by 0.1, while the relevance of all the images that were not clicked will be reduced by 0.01. The input image will also be stored in the Flowers cluster with a default relevance value of 0.1, and the neural network for Flowers will be re-trained with this image so that it can be retrieved in the next query for Flowers.

2.4 Stage Four: Heuristic Algorithm for Similarity Measurement

This is the main underlying algorithm, which decides the actual relevance of an image with respect to all the clusters. It decides which image cluster is most relevant to the input image, and it updates the relevance measure of each image after each query. When an image's relevance drops below 0, the algorithm also deletes that image file from the dataset; in this way it maintains the integrity of the dataset. When a new image is input to the system, the algorithm first creates an index of the input image in the form of an array which contains, for each cluster, the relevance value and the probability output by that cluster's neural network. An example index of a new image calculated by the algorithm is given below.

Each column represents an image cluster. Rows 1 and 2 represent the neural network probability and the relevance score respectively for each image cluster. For a new image, the relevance score is initially set to 0.1 for those clusters whose neural network probability after testing is more than the threshold 0.6, and to 0 for all other clusters. Images are retrieved from those relevant clusters for which the sum of both values exceeds the final threshold 1.0, and the user's input image is stored in the dataset along with its index. The neural networks of the relevant clusters are then trained again with this new image. If the user clicks on any of the retrieved images, the relevance of that image is increased by 0.1 and the relevance of the images that are not clicked is decreased by 0.01. This process continues for each query. For example, the index of an image already stored in the dataset might be:
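The index manipulation just described can be sketched as follows. This is a hypothetical sketch: the thresholds 0.6 and 1.0 and the initial relevance 0.1 come from the text, while the function and variable names are assumptions.

```python
PROB_THRESHOLD = 0.6    # cluster-membership threshold
TOTAL_THRESHOLD = 1.0   # final retrieval threshold
INITIAL_RELEVANCE = 0.1

def make_index(probs):
    """Index of a new image: per cluster, [network probability, relevance]."""
    return {c: [p, INITIAL_RELEVANCE if p > PROB_THRESHOLD else 0.0]
            for c, p in probs.items()}

def retrievable_clusters(index):
    """Clusters for which probability + relevance exceeds the final threshold."""
    return [c for c, (p, r) in index.items() if p + r > TOTAL_THRESHOLD]

def should_delete(index):
    """An image whose total stays below the threshold for every cluster
    is treated as irrelevant and removed from the dataset."""
    return not retrievable_clusters(index)
```

Clicks then simply add 0.1 to (and non-clicks subtract 0.01 from) the relevance entry of the appropriate cluster, which is what moves an image above or below the retrieval threshold over time.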
Cluster Name     Building  Car   Flower  Food  Space  Weapon
NN probability   0.2       0.9   0.1     0.2   0.3    0.16
Relevance        0.1       0.3   0       0     0      0
Total            0.3       1.2   0.1     0.2   0.3    0.16

If this image is retrieved for a query and clicked on, its relevance value for that cluster increases by 0.1:
Cluster Name     Building  Car   Flower  Food  Space  Weapon
NN probability   0.2       0.9   0.1     0.2   0.3    0.16
Relevance        0.1       0.4   0       0     0      0
Total            0.3       1.3   0.1     0.2   0.3    0.16

If the image is retrieved again on another query but is not clicked this time, its relevance value for that cluster decreases by 0.01:
Cluster Name     Building  Car   Flower  Food  Space  Weapon
NN probability   0.2       0.9   0.1     0.2   0.3    0.16
Relevance        0.1       0.39  0       0     0      0
Total            0.3       1.29  0.1     0.2   0.3    0.16

If the sum of both values falls below 1.0 for every image cluster, the image is deemed irrelevant to all clusters and is deleted from the dataset, and the network is trained again without that image. In this way irrelevant images are continuously pruned by the system. An image can be retrieved only for those clusters whose sum of values in its index is greater than 1.0. For example:
Fig. 4: some training and test images

Probabilities of the neural network for a query image:

Cluster Name     Building  Car   Flower  Food  Space  Weapon
NN probability   0.22      0.7   0.9     0.2   0.3    0.16
Relevance        0.1       0.5   0.4     0     0      0
Total            0.32      1.2   1.3     0.2   0.3    0.16

Here the total of the two values is greater than 1.0 for the Car and Flower clusters, so the image will be retrieved for queries relevant to either of these two clusters. The values increase or decrease with the feedback from the user, and thus the accuracy improves with the number of clicks.

Fig. 5: output probabilities of each neural network

A snapshot of the GUI of the system:

3. Results
Over a test set of 500 images distributed as 50 images in each of the 10 image clusters, the accuracy of the system is over 98% when simple images containing a single object are used for testing, and the number of false positives is less than 0.1%. The accuracy drops to 70% when images containing two or more objects are used. However, the accuracy keeps improving with the number of user clicks, with a slight increase in time complexity, because all the query images are added to the training dataset. Hence this approach proves to be very efficient. Some of the training and testing images are shown in fig. 4.

Fig. 6: initial screen of the system

A snapshot of images retrieved:

ICCVG 2004, Warsaw, Poland, September 2004, Proceedings.
[5] Adam Kuffner and Antonio Robles-Kelly, "Image Feature Evaluation for Contents-based Image Retrieval," Department of Theoretical Physics, Australian National University, Canberra, Australia.
[9] X. S. Zhou and T. S. Huang, "Relevance feedback in image retrieval: A comprehensive review," Multimedia Systems, 8, 6, pp. 536-544, 2003.
[6] B. S. Manjunath and W. Y. Ma, "Texture Features for Browsing and Retrieval of Image Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 18, n. 8, pp. 837-842, August 1996.

Fig. 7: screen showing the retrieved similar images

4. Conclusions and future work


The system has been tested on a wide variety of images, with many clusters and many images per cluster, giving very high accuracy. Still, there are a number of directions for future work. The main suggestion is to use wavelets to extract high-level features; this may slow down the execution of the system, but the accuracy will increase. The system is somewhat slow because of the training involved after each query. Other neural network architectures and different training algorithms can also be experimented with.

[7] K. Tieu and P. Viola, "Boosting image retrieval," in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (Hilton Head, SC), pp. 228-235, 2000.
[8] C. Nastar, M. Mitschke and C. Meilhac, "Efficient Query Refinement for Image Retrieval," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 547, June 23-25, 1998.
[9] Muwei Jian, Junyu Dong and Ruichun Tang, "Combining Color, Texture and Region with Objects of User's Interest for Content-Based Image Retrieval," Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007), pp. 764-769, 2007.

REFERENCES
[1] A. C. Gonzalez-Garcia, J. H. Sossa-Azuela and E. M. Felipe-Riveron, "Image Retrieval based on Wavelet Computation and Neural Network Classification," Eighth International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS '07), p. 44, 2007.
[2] P. Muneesawang and L. Guan, "A neural network approach for learning image similarity in adaptive CBIR," 2001 IEEE Fourth Workshop on Multimedia Signal Processing, 2001.
[3] Y. Rui, T. S. Huang, M. Ortega and S. Mehrotra, "Relevance feedback: A power tool for interactive content-based image retrieval," IEEE Trans. Circuits Syst. Video Technol. 8, 5, pp. 644-655, 1998.
[4] K. Wojciechowski, B. Smolka, H. Palus, R. S. Kozera, W. Skarbek and L. Noakes, "Clustering method for fast content based image retrieval," International Conference,
