Distributed Content Services Framework

PhD Candidacy Exam
Lisa Amini
Advisor: Henning Schulzrinne

The content delivery model for the Internet is evolving from centralized hosting to an affiliated broadcast, or content distribution network, model. Until recently, much of the Web's rich media content, especially streaming media, was hosted at a centralized location and accessed by clients distributed throughout the Internet. Hot spots caused by popular content are an obvious problem with this model. Early attempts to solve this problem included mirroring sites and installing caching proxies at IAPs (Internet Access Providers). While caching proxies help alleviate hot spots and address IAP requirements for minimizing traffic on their backbone and peering links, they do not focus on the concerns of Content Publishers (CPs). Mirroring generally requires the user to select a server manually, without knowing which one would actually provide the best service. A key turning point in this evolution was the realization that CPs were willing to pay to outsource fast, reliable delivery of their content. Specifically, CPs will pay not just for hosting services, but for the distributed hosting services offered by Content Distribution Networks (CDNs). We anticipate that content distribution and delivery services are only the first of a spectrum of content services that will radically change Internet operations.

Building the case for change requires a benchmark against which we can evaluate current technology and predict its shortcomings. Over the past ten years there has been extensive research [1, 2, 3, 4, 5] into building an appropriate model of the Internet. While the rapid evolution of the Internet ensures that no characterization can be definitive, certain invariants have been identified from which we can gain insights and derive solutions.
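One such invariant is skewed request popularity. Breslau et al. [2] describe Web request popularity as Zipf-like, with the probability of a request for the i-th most popular object proportional to 1/i^α. The Python sketch below computes the hit ratio of an idealized cache that always holds the k most popular objects; the catalog size, the exponent α = 0.8, and the function names are illustrative assumptions, not values or code taken from [2].

```python
from __future__ import annotations


def zipf_like_probabilities(n: int, alpha: float) -> list[float]:
    """Request probability for ranks 1..n: P(i) = (1 / i**alpha) / sum_j (1 / j**alpha)."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_ratio(n: int, alpha: float, cache_slots: int) -> float:
    """Fraction of requests served by a cache holding exactly the `cache_slots` most
    popular objects. Idealized: ignores object sizes, misses on first access, and updates."""
    probs = zipf_like_probabilities(n, alpha)
    return sum(probs[:cache_slots])


if __name__ == "__main__":
    n = 100_000   # hypothetical catalog size
    alpha = 0.8   # illustrative exponent (assumption)
    for slots in (100, 1_000, 10_000):
        print(f"cache {slots:>6} of {n} objects -> hit ratio {ideal_hit_ratio(n, alpha, slots):.3f}")
```

Because the per-object probabilities decrease with rank, each additional cache slot adds less to the hit ratio than the one before it, which is one reason simply growing caches yields diminishing returns.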

With this background on the state of the art in characterizing the Internet, focusing primarily on WWW traffic, we can begin an in-depth look at current best-effort technology for storage in the network. These techniques exploit well-known properties such as temporal locality, Zipf-like popularity distributions, hierarchical efficiencies, and the periodic nature of human access to content. However, they tend to fall short in addressing the heavy-tailed distributions that characterize file sizes, distribution times, idle times between accesses, session durations, and the number of page requests per site. In this context, we will discuss local cache management algorithms [10, 11], the organization of cache meshes [6, 7, 9], and the optimal placement of storage [8, 12, 23, 24, 25] in the network.
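As an example of a local cache management algorithm, GreedyDual* [11] refines GreedyDual-Size (Cao and Irani), which weighs recency against object size and retrieval cost. The Python sketch below implements plain GreedyDual-Size, not the GreedyDual* refinement from [11]: each object gets priority H = L + cost/size, the lowest-H object is evicted, and the global inflation value L is raised to that H so that objects not referenced recently age relative to newly admitted ones. The class name, the default cost of 1 per object, and the linear-scan eviction are assumptions made for brevity; a production cache would keep priorities in a heap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class _Entry:
    size: int
    cost: float
    h: float  # priority H = L + cost / size, set at admission or last hit


class GreedyDualSizeCache:
    """Byte-capacity cache with GreedyDual-Size replacement (illustrative sketch)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # global value L
        self.entries = {}     # key -> _Entry

    def access(self, key: str, size: int, cost: float = 1.0) -> bool:
        """Request `key`. Returns True on a hit; on a miss, admits the object if it fits."""
        entry = self.entries.get(key)
        if entry is not None:
            entry.h = self.inflation + entry.cost / entry.size  # refresh priority on a hit
            return True
        if size > self.capacity:
            return False  # larger than the whole cache: bypass
        while self.used + size > self.capacity:
            # Evict the lowest-priority object and inflate L to its H.
            victim_key = min(self.entries, key=lambda k: self.entries[k].h)
            victim = self.entries.pop(victim_key)
            self.inflation = victim.h
            self.used -= victim.size
        self.entries[key] = _Entry(size, cost, self.inflation + cost / size)
        self.used += size
        return False


if __name__ == "__main__":
    cache = GreedyDualSizeCache(capacity_bytes=100)
    cache.access("small-a", size=10)
    cache.access("big", size=80)
    cache.access("small-b", size=20)  # must evict: "big" has the lowest H (1/80)
    print(sorted(cache.entries))      # ['small-a', 'small-b']
```

With a uniform cost of 1, large objects receive low priorities and are evicted first, which tends to favor hit ratio over byte hit ratio; setting cost to, for example, estimated fetch latency shifts that balance.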

While Web caching has achieved significant gains, it has limitations on several fronts. Specifically, the requirements of some content types (e.g., streaming media) and of some content providers cannot be adequately met by best-effort services. From a technical perspective, hierarchies and whole-object caching tend to create instability as object sizes grow. From a business perspective, the value proposition, and therefore the required quality of service, is not uniform across content types. In this context, we evaluate current proposals for video caching [13, 14, 15, 16] and replication [17, 19, 20, 21].
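One plausible mechanism behind the instability of whole-object caching can be shown with a toy byte-bounded LRU cache: admitting a single large media object whole can flush many small objects at once, so subsequent requests for them miss. The Python sketch below is a toy illustration under assumed sizes (100 objects of 10 KB, one 600 KB clip, a 1000 KB cache); the class name and sizes are hypothetical and are not drawn from [13, 14, 15, 16].

```python
from collections import OrderedDict


class LRUByteCache:
    """Whole-object LRU cache bounded by total bytes (toy model)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.objects = OrderedDict()  # key -> size, least recently used first

    def access(self, key: str, size: int) -> int:
        """Request `key`, admitting it whole on a miss. Returns how many objects were evicted."""
        if key in self.objects:
            self.objects.move_to_end(key)  # mark as most recently used
            return 0
        if size > self.capacity:
            return 0  # too large to cache at all: bypass
        evicted = 0
        while self.used + size > self.capacity:
            _, victim_size = self.objects.popitem(last=False)  # drop least recently used
            self.used -= victim_size
            evicted += 1
        self.objects[key] = size
        self.used += size
        return evicted


if __name__ == "__main__":
    cache = LRUByteCache(capacity_bytes=1000)  # sizes in KB (hypothetical)
    for i in range(100):
        cache.access(f"page-{i}", size=10)     # fills the cache exactly
    print(cache.access("clip", size=600))      # 60
```

Here one clip displaces 60 of the 100 small objects; if those objects are later re-requested, refilling them can in turn evict the clip, producing the churn that grows worse as object sizes increase.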

Finally, we will discuss current proposals for content distribution and delivery networks [18, 22], for peer-to-peer solutions for distributed content storage and access, and for the interoperation of various content service providers [26, 27]. The goals of these proposals are to better address the requirements of CPs, to significantly improve end-user service, and potentially to create a content services environment that goes well beyond today's distribution and delivery orientation. We will also evaluate these solutions [24, 28] and identify the pressing research issues in developing this next phase in the evolution of Web infrastructure.

Bibliography

1. Web Server Workload Characterization: The Search for Invariants
Arlitt, M., Williamson, C.,
ACM Sigmetrics Conference, 1996.

2. Web Caching and Zipf-like Distributions: Evidence and Implications
Breslau, L., Cao, P., Fan, L., Phillips, G., Shenker, S.,
IEEE Infocom, 1999.

3. Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes
Crovella, M., Bestavros, A.,
IEEE/ACM Transactions on Networking, Vol 5, No 6, December 1997.

4. Traffic Analysis of a Web Proxy Caching Hierarchy
Mahanti, A., Williamson, C., Eager, D.,
IEEE Network, May/June 2000.

5. The Dependence of Internet User Traffic Characteristics on Access Speed
Vicari, N., Kohler, S., Charzinski, J.,
University of Wurzburg, Research Report 246, January 2000.

6. A Quantitative Comparison of Graph-Based Models for Internet Topology
Zegura, E., Calvert, K., Donahoo, M.,
IEEE/ACM Transactions on Networking, 1997.

7. World Wide Web Caching: The Application-Level View of the Internet
Baentsch, M., Baum, L., Molter, G., Rothkugel, S., Sturm, P.,
IEEE Communications Magazine, June 1997.

8. Self-Organizing Wide-Area Network Caches
Bhattacharjee, S., Calvert, K., Zegura, E.,
IEEE Infocom, 1998.

9. Beyond Hierarchies: Design Considerations for Distributed Caching on the Internet
Tewari, R., Dahlin, M., Vin, H., Kay, J.,
UTCS Technical Report TR98-04, 1998.

10. Caching on the World Wide Web
Aggarwal, C., Wolf, J., Yu, P.,
IEEE Transactions on Knowledge and Data Engineering, Vol 11, No 1, January 1999.

11. GreedyDual* Web Caching Algorithm: Exploiting the Two Sources of Temporal Locality in Web Request Streams
Jin, S., Bestavros, A.,
Proceedings of the Web Caching Workshop, 2000.

12. Coordinated Placement and Replacement for Large-Scale Distributed Caches
Korupolu, M., Dahlin, M.,
Workshop on Internet Applications, July 1999.

13. Resource Based Caching for Web Servers
Tewari, R., Vin, H., Dan, A., Sitaram, D.,
Proceedings of SPIE/ACM Conference on Multimedia Computing and Networking, 1998.

14. An Active Services Framework and Its Application to Real-time Multimedia Transcoding
Amir, E., McCanne, S., Katz, R.,
Proceedings of ACM SIGCOMM, September 1998.

15. Multimedia Proxy Caching Mechanism for Quality Adaptive Streaming Applications in the Internet
Rejaie, R., Yu, H., Handley, M., Estrin, D.,
Proceedings of IEEE Infocom, 2000.

16. Optimized Caching in Systems with Heterogeneous Client Populations
Eager, D., Ferris, M., Vernon, M.,
Performance Evaluation, Vol 42, 2000.

17. The Case for Geographical Push Caching
Gwertzman, J., Seltzer, M.,
Proceedings of the 1995 Workshop on Hot Topics in Operating Systems, May 1995.

18. Distributed Network Storage
Chuang, J.,
University of California at Berkeley, Ph.D. Thesis, 1999.

19. World-Wide Web Cache Consistency
Gwertzman, J., Seltzer, M.
Proceedings of the 1996 Usenix Technical Conference, January 1996.

20. Using Leases to Support Server-Driven Consistency in Large-Scale Systems
Yin, J., Alvisi, L., Dahlin, M., Lin, C.,
IEEE Transactions on Knowledge and Data Engineering, Special Issue on Web Technologies, 1999.

21. Adaptive Leases: A Strong Consistency Mechanism for the World Wide Web
Duvvuri, V., Shenoy, P., Tewari, R.,
Proceedings of IEEE INFOCOM 2000, Tel Aviv, Israel, March 2000.

22. Active Names: Flexible Location and Transport of Wide-Area Resources
Vahdat, A., Anderson, T., Dahlin, M.,
USENIX Symposium on Internet Technologies and Systems (USITS99), October 1999.

23. On the Optimal Placement of Web Proxies in the Internet
Li, B., Golin, M., Italiano, G., Deng, X., Sohraby, K.,
Proceedings of IEEE Infocom, 1999.

24. The Cache Location Problem
Krishnan, P., Raz, D., Shavitt, Y.,
IEEE/ACM Transactions on Networking, August 2000.

25. Optimum Distribution of Switching Centers in a Communications Network and Some Related Graph Theoretic Problems
Hakimi, S.,
Operations Research, Vol 13, 1965.

26. A Model for CDN Peering
Day, M., Cain, B., Tomlinson, G.,
Internet Draft, November 2000, http://www.ietf.org/internet-drafts/draft-day-cdnp-model.04.txt

27. CDN Peering Architectural Overview
Green, M., Cain, B., Tomlinson, G., Thomas, S.,
Internet Draft, November 2000, http://www.ietf.org/internet-drafts/draft-cdnp-gen-arch-02.txt

28. An Economy for Managing Replicated Data in Autonomous Decentralized Systems
Ferguson, D., Nikolaou, C., Yemini, Y.,
Proceedings of the International Symposium on Autonomous and Decentralized Systems, 1993.