
NIR: Content Based Image Retrieval on Cloud Computing

Zhuo YANG , Sei-ichiro KAMATA


Graduate School of Info., Pro.&Sys. Waseda University 2-7 Hibikino, Wakamatsu-ku, Kitakyushu, Japan joel@ruri.waseda.jp, kam@waseda.jp

Alireza AHRARY
IPSRC Waseda University 2-1 Hibikino, Wakamatsu-ku, Kitakyushu, Japan ahrary@ieee.org

Abstract: NIR is an open source, cloud computing enabled content based image retrieval system. With the development and popularization of cloud computing, more and more researchers from different areas use it in their research. Content based image retrieval, one of today's challenging and emerging technologies, is a computationally intensive task because of the complexity of its algorithms and the large amount of data involved. Because it is built on cloud computing infrastructure, NIR is easy to extend and flexible to deploy. As an open source project, NIR can be improved on demand and integrated into other existing systems. This paper presents our ideas, findings, design and the system from our work on NIR.

Keywords: content based image retrieval; cloud computing; open source

I. INTRODUCTION

Content based image retrieval (CBIR), also commonly known as image search, is an emerging technology that attracts researchers from fields such as computer vision, information retrieval, database systems and machine learning. Image and video content has become extremely popular with Flickr, which hosts hundreds of millions of pictures with diverse content, and YouTube, which has brought a new revolution in multimedia usage. CBIR systems, which find and retrieve images based on image content, have been under active research and development since the 1990s [1][2]. As research and development progress, new problems are encountered and new methods are proposed [3]. To provide an accurate and fast image retrieval system, two problems need to be solved:

• Semantic gap between low-level visual content and higher-level concepts.
• Computation time of image analysis, image indexing, image searching and machine learning algorithms.

This paper focuses on the second problem and discusses our ideas, findings, design and the system, which benefits from cloud computing. Cloud computing, due to its high performance and flexibility, is receiving much attention in industry and research [14][15][16][17]. How to use cloud computing and integrate it into research is vital.

The paper is organized as follows. Previous works are discussed in Section II. Section III explains our ideas and the architecture of the NIR system. Section IV presents the demo and the experimental results, which show that our system benefits from cloud computing. Section V concludes.

II. PREVIOUS WORKS

A. Content Based Image Retrieval

As mentioned in the previous section, the semantic gap between low-level visual content and high-level concepts is one problem that a content based image retrieval system needs to solve. Reducing the semantic gap is the key to accurate image retrieval. Many ideas have been proposed, including color histograms in RGB and other color spaces [4], MPEG-7 standard based descriptors for scalable color, color layout and edge histograms [5][10], the auto color correlation feature [8], the color and edge directivity descriptor (CEDD) [7], the Tamura texture features of coarseness, contrast and directionality [9], the fuzzy color and texture histogram (FCTH) [6] and other descriptors. Different descriptors have different characteristics. Based on prior knowledge and the requirements of the application, descriptors are chosen, improved, combined or tuned through their parameters.

Among the many implementations, one worth mentioning is the LIRe library [11][12], which is part of the Caliph & Emir project. It was first released in Feb. 2006 under the GPL license. After more than three years of development in the open source community, it covers most mainstream algorithms, including those mentioned in the previous paragraph. LIRe extracts image features from images and stores them in a Lucene [13] index for later retrieval. By implementing the DocumentBuilder interface, it can easily be extended to other descriptors as required. Because of its high quality and significant potential impact, the Caliph & Emir project won the ACM Multimedia Open Source Software Competition 2009. However, the LIRe library still suffers from long computation time as the amount of image data increases. We started the NIR project to solve this problem.

_____________________________

978-1-4244-4738-1/09/$25.00 ©2009 IEEE

B. Cloud Computing

Generally speaking, the premise of cloud computing is to provide computational resources on a shared infrastructure at lower computing cost, in both time and money [14]. Academia has begun to conduct research with cloud computing because of its convenience and capability [15][16]. Many cloud models have been proposed. MapReduce is a method that is widely used in cloud computing [17][19][20].

Figure 1. MapReduce

The MapReduce model consists of two steps, map and reduce. Mappers accept the incoming key/value pairs and map them into intermediate key/value pairs. After the mappers finish analyzing the input data, they pass the intermediate pairs to the reducers. Hadoop [18] is the most reputable open source implementation of MapReduce; it is sponsored by Yahoo and is widely used by Yahoo, Amazon, Baidu and a number of other companies. Hadoop is Java based and runs on most platforms. Many machine learning algorithms, such as Bayesian methods, K-Means and PCA, have been implemented on the Hadoop platform, which demonstrates its stability and usability [21][22]. In the next section we discuss how to apply MapReduce to content based image retrieval with Hadoop.

C. Nutch

Content based image retrieval is similar to a traditional text search engine such as Google. Nutch [26][27] is an open source search engine framework with a well designed plugin system. It is based on Hadoop and can easily be extended while sharing the flexibility of cloud computing. IBM researchers have demonstrated its high scalability and performance [28]. Because of the similarity between a traditional text search engine and a content based image retrieval system, NIR keeps the overall component structure of Nutch and extends its key components. As the blue parts in Figure 2 show, NIR redesigns the following three components: image fetching, image indexing and image searching. We discuss the details in the following section.

III. NIR SYSTEM ARCHITECTURE

Images differ from text in the following three respects:

• Image data is larger than text. Images capture a colorful world and contain a lot of data, so the storage needed for images is much larger than for ordinary text. The cloud computing philosophy is that moving computation is cheaper than moving data, so each node in the cloud stores, indexes and searches only its own data on disk.
• Image indexing is more complicated than text indexing. Text data is inherently easier to understand, whereas image analysis and understanding still have a long way to go. For image indexing, different descriptors can be applied.
• Image search criteria are harder to judge than text search criteria. Information retrieval from a huge data set needs a criterion; for images, the similarity between images is calculated.

For these three reasons, the following three subsections discuss each component separately.

A. Image Fetching

By default, Nutch does not fetch, index or search images. For content based image retrieval, we first need to fetch images. Supporting image fetching takes three steps: accepting files with image suffixes, parsing images, and generating thumbnails. Files such as HTML, TXT, PDF and Microsoft Word documents are fetched by Nutch because it can parse those file formats. NIR handles images by modifying crawl-urlfilter.txt under the conf folder so that it accepts all files with image suffixes such as gif|GIF|jpg|JPG|png|PNG. Each file type also needs a parser for that type.
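The effect of accepting image suffixes can be illustrated with a minimal Python sketch. It is not Nutch's filter code: the suffix pattern is the one quoted in the text, while the function name `is_image_url` and the handling of query strings and fragments are assumptions made for illustration.

```python
import re

# Suffix list quoted in the text; anchored at the end of the URL path.
IMAGE_SUFFIX = re.compile(r"\.(gif|GIF|jpg|JPG|png|PNG)$")


def is_image_url(url: str) -> bool:
    """Return True when the URL path ends with one of the image suffixes.

    Hypothetical helper: stripping the query string and fragment before
    matching is an assumption, not documented NIR behavior.
    """
    path = url.split("?", 1)[0].split("#", 1)[0]
    return IMAGE_SUFFIX.search(path) is not None


if __name__ == "__main__":
    for candidate in ("http://example.com/a/photo.jpg",
                      "http://example.com/index.html"):
        print(candidate, is_image_url(candidate))
```

A crawler that applies such a predicate fetches the first URL as an image and leaves the second to the ordinary text pipeline.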

The ImageParse class is registered with the NIR system through the plugin configuration:


<mimeType name="image/jpeg">
  <plugin id="parse-image" />
</mimeType>

As mentioned, Nutch has a plugin system. With these configuration entries, NIR handles GIF, JPG and PNG files using the ImageParse parser. Thumbnails are very important in an image retrieval system when retrieval results are returned, as Figure 5 shows. After ImageParse saves an image to local disk, it also generates a thumbnail for that image.
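The idea of routing a file to a parser plugin by MIME type can be sketched in a few lines of Python. This is only an illustration of the lookup, not Nutch's plugin loader: the `image/jpeg` entry mirrors the configuration shown in the paper, whereas the `image/gif` and `image/png` entries and the names `PARSER_BY_MIME` and `parser_for` are assumptions.

```python
import mimetypes

# MIME type -> parser plugin id. Only the image/jpeg entry appears in the
# paper; the gif and png entries are assumed to follow the same pattern.
PARSER_BY_MIME = {
    "image/jpeg": "parse-image",
    "image/gif": "parse-image",
    "image/png": "parse-image",
}


def parser_for(filename):
    """Return the plugin id registered for the file's MIME type, or None."""
    mime, _encoding = mimetypes.guess_type(filename)
    return PARSER_BY_MIME.get(mime)


if __name__ == "__main__":
    print(parser_for("photo.jpg"))   # plugin id for a registered image type
    print(parser_for("notes.txt"))   # no image parser registered
```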
Figure 2. NIR and Nutch

B. Image Indexing

Indexing is performed by the ImageIndexer class, which uses MapReduce to index all the images that have been fetched.
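As a rough illustration of MapReduce-style indexing, the following Python sketch runs a map step and a reduce step in a single process. It is a stand-in, not the Hadoop or LIRe code used by NIR: the quantized RGB histogram is a toy descriptor, keying each image by the MD5 hex digest of its URL string is one reading of the paper's description, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Toy descriptor: count pixels per quantized (r, g, b) color."""
    step = 256 // bins_per_channel
    counts = Counter(
        (r // step) * bins_per_channel ** 2
        + (g // step) * bins_per_channel
        + (b // step)
        for r, g, b in pixels
    )
    return [counts.get(i, 0) for i in range(bins_per_channel ** 3)]


def map_index(url, pixels):
    """Map step: emit (key, feature vector) for one fetched image."""
    key = hashlib.md5(url.encode("utf-8")).hexdigest()  # assumed key scheme
    return key, color_histogram(pixels)


def reduce_index(pairs):
    """Reduce step: collect the intermediate pairs into one index."""
    index = {}
    for key, bins in pairs:
        index[key] = bins
    return index


if __name__ == "__main__":
    fetched = {
        "http://example.com/red.jpg": [(255, 0, 0), (250, 10, 5)],
        "http://example.com/blue.jpg": [(0, 0, 255)],
    }
    index = reduce_index(map_index(u, p) for u, p in fetched.items())
    print(len(index), "images indexed")
```

On a real cluster the map calls would run in parallel on the nodes that hold the images, which is where the time savings come from.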

Figure 3. MapReduce image indexing

Mappers in the cloud calculate the feature vector of each image according to the descriptor algorithms. The intermediate data is in key-value format. The key is the MD5 signature of the image file, computed from its URL, which uniquely identifies the image. The value holds the bins of the feature vector. For example:

URL: http://uwall.net/holiday/Christmas_night_1600_01/s/uwall.net_Christmas_wallpaper_christmas_Night_view_101s.jpg
Key: 4e318ae26d9541f422413d867eb274cd
Value: RGB 256 17 0 2 2 2 3 3 4 8 8 7 10 14 32

C. Image Searching

Image searching is triggered when the user clicks one of the images displayed at random on the initial page or uploads an image. When the user uploads an image, its feature vectors are computed first. The feature vectors are passed to the mappers as a parameter. Mappers in the cloud compute the similarity between each indexed image and the query image, using the bins that were indexed and saved in the Lucene index. The intermediate data is again in key-value format: the key identifies an indexed image, and the value is the similarity between that image and the query image. Reducers collect all the results and sort them. A threshold is set to filter the search results.

Figure 4. MapReduce image searching

The ImageSearcherFactory class provides several similarity comparison schemes, including a color histogram searcher, a correlogram image searcher, a CEDD image searcher, a Tamura image searcher, an FCTH image searcher and a weighted searcher that combines the results of several searchers by assigning each a different weight. Developers can easily customize the similarity comparison algorithm by extending AbstractImageSearcher.

IV. NIR SYSTEM DEMO

A web based user interface is also included in the NIR system. The following demo shows how the NIR system is used. Figure 5 shows the clean initial page of NIR, on which several images are displayed at random. The user can upload an image or click one of the displayed images. For example, after the user clicks the Christmas present image, NIR retrieves the related images shown in Figure 6.

Figure 5. NIR image retrieval initial page

Figure 6. NIR search results page
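The search flow of Section III-C (score every indexed image against the query, drop results below a threshold, sort the rest) can be sketched in Python as follows. This is a single-process illustration, not NIR's searcher classes: histogram intersection is swapped in as an assumed similarity measure, the weighted combination is one plausible reading of the weighted searcher, and all function names are hypothetical.

```python
def similarity(query_bins, image_bins):
    """Histogram intersection normalized by the query size (assumed measure)."""
    total = sum(query_bins)
    if total == 0:
        return 0.0
    overlap = sum(min(q, b) for q, b in zip(query_bins, image_bins))
    return overlap / total


def map_search(query_bins, index_items):
    """Map step: emit (image key, similarity to the query) per indexed image."""
    for key, bins in index_items:
        yield key, similarity(query_bins, bins)


def reduce_search(scored, threshold):
    """Reduce step: keep scores at or above the threshold, best match first."""
    kept = [(key, score) for key, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def weighted_similarity(scores, weights):
    """Combine per-descriptor scores with weights (assumed weighted average)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in zip(scores, weights)) / total


if __name__ == "__main__":
    index = {"a": [2, 0, 1], "b": [0, 3, 0], "c": [1, 1, 1]}
    query = [2, 0, 1]
    for key, score in reduce_search(map_search(query, index.items()), 0.5):
        print(key, round(score, 3))
```

With the toy index above, image `a` matches the query exactly, `c` overlaps in two of three pixels, and `b` falls below the threshold and is filtered out.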

Experiments were conducted to compare the indexing time and searching time of the NIR system on clouds of 1, 2 and 4 nodes. The test data consists of 1770 images fetched from the Internet. For indexing, the color histogram, auto color correlation, CEDD, Tamura and FCTH descriptors were used. For searching, the time to retrieve one image was measured. PCs (Intel Core 2 Quad, 2.66 GHz) running Windows XP and Sun Java 1.6 u13 were used in the experiments.

TABLE I. NIR SYSTEM INDEXING AND SEARCHING RESULTS

                          Cloud Running Time (s)
                          1 node    2 nodes   4 nodes
Color histogram           11.64     7.45      5.19
Auto color correlation    36.25     21.82     15.09
CEDD                      26.92     19.52     15.64
Tamura                    476.47    241.77    128.86
FCTH                      39.23     23.19     17.66
Searching                 0.56      0.45      0.39
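The speedups implied by Table I can be worked out directly. The short Python sketch below assumes the three timing columns correspond to 1, 2 and 4 nodes in decreasing order of running time; the helper names `speedups` and `efficiencies` are illustrative only.

```python
# Running times in seconds as read from Table I: (1 node, 2 nodes, 4 nodes).
TIMES = {
    "Color histogram": (11.64, 7.45, 5.19),
    "Auto color correlation": (36.25, 21.82, 15.09),
    "CEDD": (26.92, 19.52, 15.64),
    "Tamura": (476.47, 241.77, 128.86),
    "FCTH": (39.23, 23.19, 17.66),
    "Searching": (0.56, 0.45, 0.39),
}
NODES = (1, 2, 4)


def speedups(times):
    """Speedup relative to the single-node running time."""
    base = times[0]
    return [base / t for t in times]


def efficiencies(times, nodes=NODES):
    """Parallel efficiency: speedup divided by the number of nodes."""
    return [s / n for s, n in zip(speedups(times), nodes)]


if __name__ == "__main__":
    for name, times in TIMES.items():
        row = ", ".join(f"{s:.2f}x" for s in speedups(times))
        print(f"{name}: {row}")
```

Under that column assumption, Tamura reaches about 1.97x on 2 nodes and 3.70x on 4 nodes (roughly 92% efficiency), while the lightweight searching step gains only about 1.44x on 4 nodes.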

Table I shows that the running time decreases as the number of cloud nodes increases. For computationally intensive algorithms such as Tamura in particular, the speedup is almost linear in the number of nodes.

V. CONCLUSION

In summary, this paper presents our ideas and findings from designing and implementing NIR, the first open source, cloud computing enabled content based image retrieval system. We hope our pioneering work can help other researchers do further research on cloud computing enabled image retrieval and, ultimately, manage and retrieve all the images around the world.

ACKNOWLEDGMENT

We would like to thank the many people who provide their code for research, with special thanks to the Lucene, Nutch and LIRe communities. NIR is available online; the latest updates can be found at http://sourceforge.net/projects/nir. This work was supported in part by Grant-in-Aid (No. 21500181) for Scientific Research by the Ministry of Education, Science and Culture of Japan.

REFERENCES
[1] Y. Rui and T. S. Huang, "Image Retrieval: Current Techniques, Promising Directions, and Open Issues," Journal of Visual Communication and Image Representation, Vol. 10, pages 39-62, Mar. 1999.
[2] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22, No. 12, pages 1349-1380, Dec. 2000.
[3] R. Datta, D. Joshi, J. Li and J. Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age," ACM Computing Surveys, Vol. 40, No. 2, Article 5, 2008.
[4] M. J. Swain and D. H. Ballard, "Color indexing," Int. J. Comput. Vision, 7(1):11-32, 1991.
[5] S.-F. Chang, T. Sikora and A. Puri, "Overview of the MPEG-7 standard," IEEE Transactions on Circuits and Systems for Video Technology, 11(6):688-695, June 2001.
[6] S. A. Chatzichristofis and Y. S. Boutalis, "FCTH: Fuzzy color and texture histogram, a low level feature for accurate image retrieval," Proceedings of the 9th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2008, pages 191-196, Klagenfurt, Austria, May 2008.
[7] S. A. Chatzichristofis and Y. S. Boutalis, "CEDD: Color and edge directivity descriptor. A compact descriptor for image indexing and retrieval," Proceedings of the 6th International Conference on Computer Vision Systems, ICVS 2008, volume 5008 of LNCS, pages 312-322, Santorini, Greece, May 2008.
[8] J. Huang, S. R. Kumar, M. Mitra, W.-J. Zhu and R. Zabih, "Image indexing using color correlograms," Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition, CVPR 97, pages 762-768, San Juan, Puerto Rico, June 1997.
[9] H. Tamura, S. Mori and T. Yamawaki, "Textural features corresponding to visual perception," IEEE Transactions on Systems, Man, and Cybernetics, 8(6):460-472, June 1978.
[10] M. Lux, "Revisiting the Vector Retrieval Model in Context of the MPEG-7 Semantic Description Scheme," Proceedings of the 2008 Ninth International Workshop on Image Analysis for Multimedia Interactive Services, pages 134-138, 2008.
[11] M. Lux and S. A. Chatzichristofis, "LIRe: Lucene Image Retrieval - An Extensible Java CBIR Library," Proceedings of the 16th ACM International Conference on Multimedia, pages 1085-1088, 2008.
[12] LIRe library, http://www.semanticmetadata.net/lire/
[13] Lucene project, http://lucene.apache.org/
[14] F. Douglis, "Staring at Clouds," IEEE Internet Computing, 13(3):4-6, May-June 2009.
[15] P. F. Gorder, "Coming Soon: Research in a Cloud," Computing in Science & Engineering, 10(6):6-10, Nov.-Dec. 2008.
[16] P. Mika and G. Tummarello, "Web Semantics in the Clouds," IEEE Intelligent Systems, 23(5):82-87, Sept.-Oct. 2008.
[17] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, 51(1):107-113, Jan. 2008.
[18] Hadoop project, http://hadoop.apache.org/
[19] J. Dean, "Experiences with MapReduce, an abstraction for large-scale computation," Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 1-1, 2006.
[20] B. He, W. Fang, Q. Luo, N. K. Govindaraju and T. Wang, "Mars: a MapReduce framework on graphics processors," Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pages 260-269, 2008.
[21] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski and A. Y. Ng, "Map-Reduce for Machine Learning on Multicore," http://www.cs.stanford.edu/people/ang/papers/nips06mapreducemulticore.pdf
[22] Mahout project, http://lucene.apache.org/mahout/
[23] M. M. Rafique, B. Rose, A. R. Butt and D. S. Nikolopoulos, "Supporting MapReduce on large-scale asymmetric multi-core clusters," ACM SIGOPS Operating Systems Review, 43(2):25-34, Apr. 2009.
[24] T. Sandholm and K. Lai, "MapReduce optimization using regulated dynamic prioritization," Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, pages 299-310, 2009.
[25] S. Westman, A. Lustila and P. Oittinen, "Search strategies in multimodal image retrieval," Proceedings of the Second International Symposium on Information Interaction, pages 13-20, 2008.
[26] Nutch project, http://lucene.apache.org/nutch/
[27] M. Cafarella and D. Cutting, "Building Nutch: Open Source Search," Search Engines, 2(2):54-61, Apr. 2004.
[28] J. E. Moreira, M. M. Michael, D. D. Silva, D. Shiloach, P. Dube and L. Zhang, "Scalability of the Nutch search engine," Proceedings of the 21st Annual International Conference on Supercomputing, pages 3-12, 2007.
[29] NIR project, https://sourceforge.net/projects/nir
