Linux and Unix Books, page 6
- An Analysis of Internet Content Delivery Systems
Few things compare with the growth of the Internet over the last decade, except perhaps its growth in the last several years. A key challenge for Internet infrastructure has been delivering increasingly complex data to a voracious and growing user population. The need to scale has led to the development of thousand-node clusters, global-scale content delivery networks, and more recently, self-managing peer-to-peer structures. These content delivery mechanisms are rapidly changing the nature of Internet content delivery and traffic; therefore, an understanding of the modern Internet requires a detailed understanding of these new mechanisms and the data they serve.
This paper examines content delivery by focusing on four content delivery systems: HTTP web traffic, the Akamai content delivery network, and the Kazaa and Gnutella peer-to-peer file sharing systems. To perform the study, we traced all incoming and outgoing Internet traffic at the University of Washington, a large university with over 60,000 students, faculty, and staff.
- The Operating System I/O Speculation
In the past decade, the gap between processing speeds and disk access times has increased by an order of magnitude. At the same time, although memory sizes have increased substantially, so have application data requirements. Systems therefore continue to swap their data sets to and from disk, because those data sets are often too large to fit entirely in memory.
In recognition of this problem, there has been a great deal of research into automating disk prefetching algorithms that are dramatically more accurate than the standard heuristics in current operating systems. Recent work by Demke and Mowry demonstrated impressive performance results using compiler-based techniques. However, system-wide use of their approach would require recompiling every application. Moreover, their compiler analyses are limited to looping array codes.
- Detecting BGP Configuration Faults with Static Analysis
This paper describes the design, implementation, and evaluation of rcc, the router configuration checker, a tool that uses static analysis to detect faults in Border Gateway Protocol (BGP) configuration. By finding faults over a distributed set of router configurations, rcc enables network operators to test and debug configurations before deploying them in an operational network. This approach improves on the status quo of "stimulus-response" debugging where operators need to run configurations in an operational network before finding faults.
Network operators use router configurations to provide reachability, express routing policy, configure primary and backup links, and perform traffic engineering across multiple links. Configuring a network of BGP routers is like writing a distributed program where complex feature interactions occur both within one router and across multiple routers.
- Circumventing Web Censorship and Surveillance
The World Wide Web is a prime facilitator of free speech; many people rely on it to voice their views and to gain access to information that traditional publishing venues may be loath to publish. However, over the past few years, many countries, political regimes, and corporations have attempted to monitor and often restrict access to portions of the Web by clients who use networks they control. Many of these attempts have been successful, and the use of the Web as a free-flowing medium for information exchange is being severely compromised.
Several countries filter Internet content at their borders, fearful of alternate political views or external influences. For example, China forbids access to many news sites that have been critical of the country's domestic policies. Saudi Arabia is currently soliciting content filter vendors to help block access to sites that the government deems inappropriate for political or religious reasons.
- Introduction of Multidimensional Data and
Large, multidimensional datasets are becoming more prevalent in both scientific and business computing. Applications such as earthquake simulation and oil and gas exploration use large three-dimensional datasets representing the composition of the earth. Simulation and visualization transform these datasets into four dimensions, adding time as a component of the data. Conventional two-dimensional relational databases can be represented as multidimensional data using online analytical processing (OLAP) techniques, allowing complex queries for data mining. Queries on this data are often ad hoc, making it difficult to optimize for a particular workload or access pattern. As these datasets grow in size and popularity, the performance of the applications that access them is growing in importance.