Computer Science Department, Columbia University
CS E6998-10. Advanced Topics in Network Storage Systems
Spring 2004


Course Staff

Instructor: Kostas Magoutis | magoutis at cs | Office Hours: Wednesday 7pm-9pm at CSB 464 (and by appt).
TA: Marc Eaddy | eaddy at cs | Office Hours: Tuesday 5-6, Friday 1-2 in CEPSR 7LW1 (and by appt).

General Information

The course meets on Tuesdays from 6:10pm to 8pm in Mudd 327.

Announcements

The final exam is scheduled for Tuesday 5/11, 7:10-10pm. This is the reading list. (4/28/2004).

Answers to quiz #2 are now posted. (4/14/2004)

Please note the reorganization of the reading schedule. (3/29/2004)

Answers to the midterm questions are now posted. (3/27/2004)

The reading list for the midterm consists of these papers. (3/2/2004)

Answers to quiz #1 are now posted. (3/1/2004)

Suggestions on how to read the papers for Tuesday 3/2 are now posted. (2/25/2004)

Suggestions on how to read the papers for Tuesday 2/3 are now posted (1/28/2004, updated 2/2/2004).

Paper reviews and suggested format (1/28/2004). Please note that paper reviews are required and count as part of class participation, so be sure to email them to cs699810@cs before class. Reviews can be as detailed as you like; the suggested format, however, is a bulleted list summarizing the main points of each paper. You can use the following example as a guideline.

Assignments

Assignment #2 is due on Friday 4/23.

Grading criteria for assignment #1.

Assignment #1 is due on Tuesday 2/24. You will need this paper for the first part. The SHA-1 algorithm is described in this RFC.

Course Description

Network storage systems are distributed systems designed to offer access to storage resources over a network. In recent years, network storage has emerged as an important systems research field, driven by the demand for scalable storage structures to satisfy the growing needs of Internet services.

Some of the advantages of the network storage model over direct-attached storage include better scalability and improved utilization and sharing of distributed storage resources. The network storage architect, however, faces a number of challenges. First, the distributed nature of a network storage system makes it more complex than direct-attached storage: administration, capacity planning, configuration, backup, and disaster recovery are all complicated in large-scale network storage systems. Second, transferring data over a network requires stronger security and safety guarantees than transferring it over the system I/O bus, and it sometimes requires new, storage-specific network transport protocols. These and other challenges make network storage an exciting research area that has made significant advances in recent years.

This course will offer an exploration and study of network storage systems based on readings of classic and current papers and class discussions.

Coursework

Prerequisites

Grading

The final grade depends on class participation, two homework assignments, two quizzes, a midterm, a final examination, and a research project. Quizzes are 15 minutes long, held in class on previously announced dates, and consist of short-answer questions about the material covered in the papers we've read.

Research projects will be chosen by students either independently or from a list of possible project topics that will be made available by the course staff.

Readings

There are a number of paper readings that are available online. You are expected to read the papers and send a short review for each paper to the course account before the beginning of each class.

There is no required textbook for this class. The following textbooks, however, are recommended readings:

Syllabus

Date | Notes | Papers
Tue 1/20 | Lecture slides | Chen: RAID: High-performance, Reliable Secondary Storage (optional)
 | | UNIX Internals: The New Frontiers, Sections 11.2.3, 11.2.4, 11.4, 11.4.1, 11.7, 11.7.1-11.7.4 (optional)
 | | Lampson: Atomic Transactions, in Distributed Systems--Architecture and Implementation, Lecture Notes in Computer Science, pp. 246-265, 1981 (optional)
Tue 1/27 | - | Wilkes: HP AutoRAID Hierarchical Storage System
 | | Sandberg: Design and Implementation of NFS
 | | Pawlowski: NFS Version 3 Design and Implementation (optional)
Tue 2/3 | How to read these papers | Bhide: A Highly-Available Network File Server (HA-NFS)
 | | Hartung: IBM Enterprise Storage Server: A Designer's View
 | | Howard: Scale and Performance of a Distributed File System (AFS)
Tue 2/10 | - | Lee: Petal: Distributed Virtual Disks
 | | Thekkath: Frangipani: A Scalable Distributed File System
Tue 2/17 | Notes | Hagmann: Reimplementing the Cedar File System Using Logging and Group Commit
 | | Lampson: An Open Operating System for a Single-user Machine (Sections 3, 3.1-3.6, optional)
Tue 2/24 | How to read these papers | Gibson: A Cost-effective High-bandwidth Storage Architecture (NASD)
 | | Anderson: Serverless Network File Systems (xFS)
Tue 3/2 | How to read these papers | Martin: NFS Sensitivity to High Performance Networks
 | | Magoutis: Structure and Performance of the Direct Access File System (DAFS)
Tue 3/9 | Midterm | Meth: Design of the iSCSI Protocol
 | | Jurgens: Fibre Channel: A Connection to the Future
Tue 3/23 | - | Wee Teck Ng: Obtaining High-Performance for Storage Outsourcing
 | | Anderson: Running Circles around Storage Administration (Hippodrome)
Tue 3/30 | - | Quinlan: A New Approach to Archival Storage (Venti)
 | | Muthitacharoen: A Low-bandwidth File System (LBFS) (optional)
 | | Henson: An Analysis of Compare-by-Hash (optional)
Tue 4/6 | - | Radkov: A Performance Comparison of NFS and iSCSI for IP-Networked Storage
Tue 4/13 | - | Saito: Manageability, Availability and Performance in Porcupine
Tue 4/20 | - | Kistler: Disconnected Operation in the Coda File System
 | | Satyanarayanan: On the Ubiquity of Logging in Distributed File Systems (optional)
Tue 4/27 | - | Dabek: Wide Area Cooperative Storage with CFS
Tue 5/11 | Final exam | Reading list

Other Resources