Inter-Chunk Popularity-Based Edge-First Caching in Content-Centric Networking

Sung-Hwa Lim, Young-Bae Ko, Member, IEEE, Gue-Hwan Jung, Jaehoon Kim, and Myeong-Wuk Jang

Abstract: Content-centric networking (CCN) is considered promising for the efficient support of ever-increasing streaming multimedia services. Inter-chunk popularity-based caching is a key requirement for CCN multimedia services because some chunks of a content file tend to be requested more frequently than others. For multimedia content, the forepart chunks often have higher popularity than the rest, as users may interrupt and abort playback before finishing. This paper presents a novel cache replication scheme that places the more popular chunks on the edge router first and establishes a cache pipeline on the relaying routers along the path to reduce user-perceived delay. Simulation results show that the proposed scheme incurs less delay and reduces overall redundant network traffic while guaranteeing a higher cache hit ratio.

Index Terms: Content-centric networking, cache replication, streaming services.

I. INTRODUCTION

Recent trends in multimedia services have seen a vast increase in the popularity of streaming video/audio and VoD (Video on Demand) services. These services deliver large amounts of data, which may incur high user delay under congestion and lead to poor user satisfaction [14]. To provide faster content delivery and reduce redundant network traffic, the Content-Centric Networking (CCN) [1] approach exploits in-network storage embedded in CCN routers by caching frequently or recently requested content. A multimedia file is often too big to be cached as a whole in a CCN router, so it is much more efficient to cache only certain parts of the content [2], [3]. In CCN, a content file consists of a number of chunks, the minimal unit of data transferred over the network.
Some chunks of a content file may be requested more frequently than other chunks of the same file. Yu et al. [4] showed that the internal chunk popularity of a streaming video file follows a Zipf-like distribution. It is also known that, for multimedia content, forepart chunks are likely to have higher popularity than others [3], [4]. On YouTube, 60% of all videos are watched for less than 20% of their duration [5], and 80% of these video interrupts are due to a lack of user interest [6]. Therefore, high performance can be gained when the more frequently requested foreparts of a content file are cached on an edge CCN router closer to the end users.

Manuscript received September 30, 2013; revised April 24, 2014; accepted May 21, 2014. Date of publication June 12, 2014; date of current version August 8, 2014. This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant 2012R1A1B3003573 and in part by Namseoul University. The associate editor coordinating the review of this paper and approving it for publication was P. Chatzimisios. (Corresponding author: Young-Bae Ko.)
S.-H. Lim is with the Department of Multimedia, School of Engineering, Namseoul University, Cheonan 330-707, Korea (e-mail: sunghwa@nsu.ac.kr).
Y.-B. Ko and G.-H. Jung are with the Department of Computer Engineering, Ajou University, Suwon 446-749, Korea (e-mail: youngko@ajou.ac.kr; guehwan@ajou.ac.kr).
J. Kim and M. Jang are with Samsung Electronics, Yongin 446-711, Korea (e-mail: jaehoonk@samsung.com; myeong.jang@samsung.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LCOMM.2014.2329482
Existing chunk-based cache replication proposals in CCN, such as Leave Copy Down (LCD) [7] or probabilistic caching [8], do not effectively consider such inter-chunk popularity of a content file, because they treat each chunk as an individual content item. Cho et al. presented WAVE [9], a chunk-based cache replication scheme that considers a chunk as one of the data chunks belonging to a content file. However, this scheme also does not consider the inter-chunk popularity of a content file. Moreover, the main motivation of existing proposals is to reduce server stress and network burden, which encourages placing caches first on the core side (i.e., near the content server). However, user-perceived delay can be significantly reduced by locating caches on the edge side (i.e., near the user). This paper presents a new caching scheme, named Inter-chunk Popularity-based Edge-first Caching (IPEC). Our scheme initially caches the more popular chunks, such as the forepart chunks of streaming content, on the edge-side router and gradually builds a cache pipeline on the CCN routers located along the path between the user and the content origin server, according to the request frequency for the content file. The proposed scheme is decentralized and is easily operable with any cache replacement scheme as well as any content routing scheme. Our simulation studies show that the proposed IPEC scheme provides better performance than the existing proposals [9], especially for streaming multimedia services.

II. SYSTEM MODEL

A CCN network consists of a number of content-aware nodes (or routers) equipped with forwarding engines. The forwarding engine of a CCN router consists of three data structures, as presented in [1]: a Content Store (CS), a Pending Interest Table (PIT), and a Forwarding Information Base (FIB). The CS represents the in-network cache storage.
The PIT aggregates multiple ongoing user requests for the same content and also maintains the backward path for delivering content chunks by remembering the incoming interface of each content request. Lastly, the FIB plays the same role as the routing table in traditional IP networks. The CCN packet carrying a user request for a desired content is called an Interest, while the chunk carrying the desired content is called a Data chunk. We augmented the original CCN architecture proposed in [1] as follows:
- Interests and Data chunks carry an m value, which represents the number of chunks constituting a chunk block of a content file. We assume that each content file consists of multiple chunk blocks.
- Interests and Data chunks carry metadata about the content file (e.g., name, size, etc.).
- An Interest carries hop count information, which is set to 0 at the content requestor and increased by one whenever the Interest is relayed by a CCN router.
- An Interest carries a reservation bit, which suggests caching the requested Data chunk to the next upstream router when the Interest is forwarded.
- When an Interest is stored in the PIT, its hop count value and reservation bit value are also stored.

Note: a cache pipeline for a content item is built by sequentially placing exclusive chunks of the item on the content routers along the path from a user to a server.

1089-7798 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

IEEE COMMUNICATIONS LETTERS, VOL. 18, NO. 8, AUGUST 2014

Algorithm 1.
Interest processing algorithm for caching

    m  : the number of chunks in a block of content file K
    n_i: the hop count of Interest i from the user

    On receiving Interest i requesting the i-th chunk of file K:
        if the i-th Data chunk of K is cached:
            return the cached Data chunk and exit
        if n_i == 1:                                    // edge router
            if i <= m:                                  // forepart block
                set Interest i's reservation bit, then copy it into the PIT
                clear Interest i's reservation bit again for forwarding
            else if chunks 1 through ⌊(i-1)/m⌋·m are cached:
                set Interest i's reservation bit
        else:                                           // not an edge router
            if Interest i's reservation bit is set:
                if ⌈i/m⌉ == n_i:
                    copy Interest i into the PIT with n_i, marked to cache
                    clear Interest i's reservation bit for forwarding
            else if i > n_i·m and chunks (n_i-1)·m+1 through ⌊(i-1)/m⌋·m are cached:
                set Interest i's reservation bit
        forward Interest i to the next upstream router

Algorithm 2. Data chunk processing algorithm for caching

    n_i: the hop count value of Interest i stored in the PIT

    On receiving the i-th Data chunk of file K (i.e., Data chunk i):
        if Interest i resides in the PIT:
            if Interest i's reservation bit is set:     // marked to cache
                if Data chunk i is in the name caching table:
                    cache Data chunk i
                    remove Data chunk i from the name caching table
                else:
                    put the name of Data chunk i into the name caching table
        forward Data chunk i

Fig. 1. Example operation of the proposed IPEC scheme.

III. EDGE & FOREPART FIRST CACHING

Our proposed scheme, Inter-chunk Popularity-based Edge-first Caching (IPEC), is required to: (1) divide a content file into several blocks, where the chunks of a file, sorted in ascending order by sequence number, are partitioned into blocks of m chunks each; (2) cache the most popular block(s) of a content file (the forepart of multimedia content) on the edge
router nearest to the user at the first request for the content file; and (3) gradually build a cache pipeline along the path from the edge router to the content server according to the request rate for the content file.

IPEC requires neither centralized network management nor heavy additional control overhead; it simply operates in every CCN router in a distributed manner. Algorithms 1 and 2, illustrated in Fig. 1, present how Interests and Data chunks are processed by the IPEC scheme. Note that, in Algorithm 2, we employ the content name caching mechanism [10], in which only the name of a chunk is cached in the name caching table at first, instead of the actual data. After the name is cached, the actual data will be cached only when a subsequent caching trial occurs. The name caching mechanism prevents the forepart chunks of unpopular content from evicting the chunks of very popular content already cached in the CCN router.

Fig. 1 illustrates the operation of our scheme in a simple scenario, where two users u1 and u2 are connected to the Internet through an edge router a and request a streaming content from a content server just three hops away. For simplicity, a content file K is assumed to consist of two consecutive blocks, K1 and K2, where each block is in turn divided into m chunks. First, u1 sequentially sends Interests for K to its edge router a, as shown in Fig. 1(1). On receiving the Interests, router a copies them into its PIT, marking only those for the forepart chunks (i.e., K1) to be cached, and forwards the Interests to the next router b. Then b, currently holding no data chunks of K, keeps forwarding the Interests to the content server. Fig. 1(2) depicts how the requested content is returned by the content server, cached by intermediate routers, and eventually delivered to the requesting user u1 via the reverse path.
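The per-router decision logic of Algorithms 1 and 2 can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the function names, the returned tuples, and the set-based data layout are illustrative choices, not the authors' implementation, and the propagation of an unmatched reservation bit is simplified away.

```python
def process_interest(i, m, n_i, cached, resv_in):
    """Sketch of Algorithm 1 for one router.

    i       -- sequence number of the requested chunk (1-based)
    m       -- number of chunks per block
    n_i     -- hop count of the Interest at this router (1 = edge router)
    cached  -- set of chunk numbers of this file held in the router's CS
    resv_in -- reservation bit carried by the incoming Interest
    Returns (action, mark_in_pit, resv_out).
    """
    if i in cached:
        return ('reply', False, False)        # cache hit: answer locally
    mark, resv_out = False, False
    if n_i == 1:                              # edge router
        if i <= m:                            # forepart block: cache here
            mark = True
        elif all(c in cached for c in range(1, ((i - 1) // m) * m + 1)):
            resv_out = True                   # ask next hop to cache this block
    else:                                     # intermediate router
        if resv_in:
            if -(-i // m) == n_i:             # ceil(i/m): chunk's block index
                mark = True                   # this hop caches the block
        elif i > n_i * m and all(
                c in cached
                for c in range((n_i - 1) * m + 1, ((i - 1) // m) * m + 1)):
            resv_out = True
    return ('forward', mark, resv_out)


def process_data(name, marked, name_table, store):
    """Sketch of Algorithm 2 with the name caching mechanism of [10]:
    a chunk's name must be observed twice before its data is cached."""
    if marked:                                # PIT entry was marked to cache
        if name in name_table:
            store.add(name)                   # second trial: cache real data
            name_table.discard(name)
        else:
            name_table.add(name)              # first trial: name only
    return name                               # always forward downstream
```

For example, with m = 3, an edge router that already holds chunks 1 through 3 sets the reservation bit on the Interest for chunk 4, and the router at hop 2 then marks that block for caching, which is exactly the pipeline-building step of Fig. 1(3).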
Observe that, while forwarding these data chunks, router a caches the data chunks of K1, because the entries for K1 stored in its PIT have been marked for caching. However, the data chunks of K2 will never be cached in a, as no entries for K2 in the PIT have been marked for caching. This is important for cache utilization, because the remaining space of a can be used to cache the forepart chunks of other content files. Fig. 1(3) shows the case where a different user u2 sends new Interests for the same content K to its edge router a. On receiving the Interests, a immediately returns the cached data chunks of K1 to u2. At the same time, a also sends the Interests requesting K2 to b after setting the reservation bit of every such Interest, to make b cache all the Data chunks of K2. When b gets the Interests requesting K2, it copies those Interests into its PIT, marks them to be cached, and forwards the Interests to the content server after clearing the reservation bit of every Interest. The final step is illustrated in Fig. 1(4), where the content server sends every Data chunk of K2 to b, and b forwards the data chunks to a after caching them, because the prefixes for K2 in the PIT have been marked to be cached. Through the process above, a pipelined cache access for content K is provided to all users connected to a.

Fig. 2. Example of pipelined cache access with user interrupts. (1) Case 1: Forepart caching using IPEC. (2) Case 2: Traditional file caching.

Fig. 2 depicts user interrupt cases with two content files, K and L. Let d1, d2, and d3 be the transmit delays between the user and router a, between routers a and b, and between router b and the content server, respectively. Let us assume that each router has storage space to cache only two blocks of chunks (i.e., 2m chunks). The IPEC scheme is employed in Fig. 2(1) but not in Fig. 2(2). In Fig.
2(1), router a caches all chunks of K1 and L1, and b caches all chunks of K2 and L2, so the cache pipelines of K and L are complete for all users connected to a. Let us now assume that u1 has requested all chunks of L, and u2 has requested all chunks of K, from a. Then u1 and u2 each interrupt their watching or downloading just after receiving the forepart chunks (i.e., L1 and K1, respectively). The delay for u1 is just sizeof(L1)·d1, because a has cached the forepart chunks under our scheme. In the same way, the delay for u2 is sizeof(K1)·d1. In Fig. 2(2), by contrast, a caches all chunks of K, and b caches all chunks of L. For the same user behavior, including the user interrupts shown in Fig. 2(1), the delay for u1 is expected to be sizeof(L1)·d1 + sizeof(L1)·d2, whereas that for u2 is sizeof(K1)·d1.

IV. PERFORMANCE EVALUATION

For the performance evaluation, we developed an OPNET-based CCN simulator. OPNET [11] is a widely used commercial simulator that emulates various real network environments. Using the simulator, we implemented the forwarding engine of CCN and evaluated it on an Internet-like topology randomly generated by the Georgia Tech Internet Topology Model [12]. A screenshot of our simulation topology is shown in Fig. 3. The FIB of the forwarding engine is constructed by a simple routing protocol providing the shortest path, based on Dijkstra's algorithm, from each client to a content server. The least recently used (LRU) policy is utilized for cache replacement in the content store of the forwarding engine. Note that the CCN protocol stack replaces the TCP (or UDP)/IP stack [1].

Fig. 3. Simulation topology.
TABLE I. SIMULATION PARAMETERS

There are 10 transit routers, 3 stub routers per transit router, and 10 users per stub router. Transit routers and stub routers have caching capabilities and work as CCN routers. We assume that every CCN router has the same cache capacity.
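The delay comparison of Fig. 2 reduces to simple arithmetic. The following toy calculation reproduces the two cases; the block size and per-link delays are invented numbers for illustration, not values from the paper.

```python
# Hypothetical numbers: forepart block sizes (Mbytes) and per-Mbyte
# transfer delays (s) of the user-to-a and a-to-b links from Fig. 2.
size_K1 = size_L1 = 10.0
d1, d2 = 0.1, 0.2

# Case 1 (IPEC): router a holds the foreparts K1 and L1, so both
# interrupting users are served entirely from the edge router.
delay_u1_ipec = size_L1 * d1
delay_u2_ipec = size_K1 * d1

# Case 2 (whole-file caching): a holds all of K and b holds all of L,
# so u1's forepart L1 must cross two links before reaching the user.
delay_u1_file = size_L1 * d1 + size_L1 * d2
delay_u2_file = size_K1 * d1
```

With any positive d2, u1's delay is strictly larger under whole-file caching, while u2 is unaffected, which is the asymmetry Fig. 2 illustrates.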
To build a more realistic network configuration, we set the network bandwidth to 100 Mb/s for the link between a user and its edge-side CCN router (i.e., a stub router) and 1 Gb/s for the links among CCN routers. In each scenario, one content server holding 10,000 video clips is deployed on the user side. The size of each content file is set to 45 Mbytes, and we assume a streaming video file with 720 ×
480 pixels and a 180-second duration, encoded in H.264. The probability distribution of requests over all content files follows a Zipf distribution. We also assume that 60% of user requests for a content file are interrupted during the download, based on the well-known observation that 60% of all videos are watched for less than 20% of their duration [5]. The interrupt points are determined by a Zipf distribution with α = 0.8. Table I summarizes the simulation parameters. The arrival rate represents the periodic generation of initial Interest scheduling for a new content. We make the network congested in the simulation, because real-world networks tend to be congested, which badly increases user-perceived delay.

We evaluate the performance of our IPEC scheme by comparing it to the following two schemes:
- Prob, in which a data chunk is randomly cached on each of the relaying routers with some static probability. The probability is set to 0.3 in our simulation, i.e., Prob (0.3).
- WAVE [9], in which data chunks of a content file are first cached at a core router and gradually populated while propagating toward the edge side.

We measure three metrics for performance comparison:
- Effective delay: the average end-to-end delay per effective chunk. Effective chunks are the data chunks of a multimedia content file actually played by users. This metric therefore reflects the user-perceived delay incurred by a cache replication scheme, considering that a user may interrupt the play of a content file while downloading it or after completing its download.
- Cache hit ratio: the average cache hit ratio on Interests.

Fig. 4. Simulation results. (1) Effective delay with different cache capacities. (2) Effective delay with different request arrival rates. (3) Effective delay with different numbers of chunks per content file. (4) Effective delay when the server is on the core side.
(5) Cache hit ratio with different cache capacities. (6) Network traffic per data chunk with different cache capacities.

- Network traffic per data chunk: the total traffic produced in the whole network to serve a user request for a data chunk, including the Interests and Data chunks produced by the user and by every CCN router for the request. This metric is averaged over all data chunk requests by all user nodes during the experiment, and shows how much unnecessary or redundant traffic is generated.

The size of a chunk block (i.e., m) may affect the performance of our IPEC scheme. If m is too small, the caching effect may not be fully realized. On the other hand, if m is too large, too much cache storage in a content router is occupied by a single chunk block, which may cause more frequent cache replacements. In our simulations, IPEC shows the best performance when the size of a chunk block is around 1/3 of the size of a content file. Simulations for each scenario were repeated 10 times and averaged. We used t-statistics with 29 degrees of freedom for the confidence intervals, which are shown for each result in the graphs.

Fig. 4(1) presents the effective chunk delay of each scheme with different cache capacities. The IPEC scheme shows the lowest effective delay when content is downloaded. More specifically, IPEC incurs around 20% less effective delay than WAVE. This reduction is a substantial improvement, because multimedia data are usually large in real-time environments. We measure the effective delay while varying the request rate, as shown in Fig. 4(2). The IPEC scheme shows the lowest delay when the rate is higher than 4. Below 4, all schemes show similar performance, because too little data was generated compared to the network capacity. We also measure the effective delay with different numbers of chunks per content file while fixing the file size, as shown in Fig. 4(3). The IPEC scheme shows the lowest delay in all cases.
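As a concrete reading of the effective-delay metric defined earlier in this section, the helper below divides the total delay spent downloading chunks by the number of chunks actually played. The function name and the per-session data layout are our own assumptions for illustration.

```python
def effective_delay(sessions):
    """Average end-to-end delay per *effective* (actually played) chunk.

    sessions -- list of (chunk_delays, played) pairs, where chunk_delays
    lists the delay of every downloaded chunk and played is the number of
    chunks the user watched before interrupting (hypothetical layout).
    """
    total_delay = sum(sum(delays) for delays, _ in sessions)   # all downloads
    effective_chunks = sum(played for _, played in sessions)   # played only
    return total_delay / effective_chunks if effective_chunks else 0.0
```

Under this reading, a session that downloads ten chunks at 1 s each but plays only five contributes 10 s over 5 effective chunks, so delay wasted on unplayed chunks inflates the metric, which is why interrupt-heavy workloads favor forepart caching.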
To show the effect of the content server's location, we measure the effective chunk delay after placing the content server on the core side of the network (i.e., directly connected to a transit node), as shown in Fig. 4(4). Although WAVE incurs a slightly lower effective delay than it does when the content server is on the user side, the IPEC scheme still shows the smallest effective delay. The performance in terms of average cache hit ratio is shown in Fig. 4(5). Again, the IPEC scheme performs best, achieving a 50% to
100% higher cache hit ratio than the other schemes. Fig. 4(6) compares the three schemes in terms of network traffic. IPEC again performs best, generating about 10% less network traffic than the other schemes.

V. CONCLUSION

This paper presented a novel cache replication scheme to reduce user-perceived delays. Simulation results show that the proposed IPEC scheme incurs less user delay (by about 20%) and less network traffic (by about 10%), and achieves a higher cache hit ratio (by about 30%) than other schemes. In the future, we will optimize our scheme for various traffic and usage patterns, and configure diverse content sizes and communication models to evaluate user satisfaction aspects (e.g., Mean Opinion Scores).

REFERENCES

[1] V. Jacobson et al., "Networking named content," in Proc. CoNEXT, Rome, Italy, Dec. 2009, pp. 1-12.
[2] A. Dan and D. Sitaram, "A generalized interval caching policy for mixed interactive and long video workloads," in Proc. MMCN, San Jose, CA, USA, Jan. 1996, pp. 699-706.
[3] Y. Liu, Y. Guo, and C. Liang, "A survey on peer-to-peer video streaming systems," Peer-to-Peer Netw. Appl., vol. 1, no. 1, pp. 18-28, Mar. 2008.
[4] J. Yu, C. T. Chou, X. Du, and T. Wang, "Internal popularity of streaming video and its implication on caching," in Proc. IEEE AINA, Vienna, Austria, Apr. 2006, pp. 1-6.
[5] P. Gill, M. Arlitt, Z. Li, and A. Mahanti, "YouTube traffic characterization: A view from the edge," in Proc. Internet Meas. Conf., San Diego, CA, USA, Oct. 2007, pp. 15-28.
[6] A. Finamore, M. Mellia, M. M. Munafò, R. Torres, and S. G. Rao, "YouTube everywhere: Impact of device and infrastructure synergies on user experience," Purdue Univ., West Lafayette, IN, USA, Tech. Rep. 418, 2011.
[7] N. Laoutaris, H. Che, and I. Stavrakakis, "The LCD interconnection of LRU caches and its analysis," Perform. Eval., vol. 63, no. 7, pp. 609-634, Jul. 2006.
[8] I. Psaras, W. K. Chai, and G. Pavlou, "Probabilistic in-network caching for information-centric networks," in Proc.
SIGCOMM ICN Workshop, Helsinki, Finland, Aug. 2012, pp. 55-60.
[9] K. Cho et al., "WAVE: Popularity-based and collaborative in-network caching for content-oriented networks," in Proc. IEEE INFOCOM NOMEN Workshop, Orlando, FL, USA, Mar. 2012, pp. 316-321.
[10] M. Xie, I. Widjaja, and H. Wang, "Enhancing cache robustness for content-centric networking," in Proc. IEEE INFOCOM, Orlando, FL, USA, Mar. 2012, pp. 2426-2434.
[11] OPNET official page. [Online]. Available: http://www.opnet.com
[12] E. W. Zegura, K. L. Calvert, and S. Bhattacharjee, "How to model an internetwork," in Proc. IEEE INFOCOM, San Francisco, CA, USA, Mar. 1996, pp. 594-602.
[13] C. Fricker, P. Robert, J. Roberts, and N. Sbihi, "Impact of traffic mix on caching performance in a content-centric network," in Proc. IEEE INFOCOM NOMEN Workshop, Orlando, FL, USA, Mar. 2012, pp. 310-315.
[14] T. Hossfeld et al., "Initial delay vs. interruptions: Between the devil and the deep blue sea," in Proc. QoMEX, 2012, pp. 1-6.