Design and Implementation of the Sun NFS

Goals:
- Machine and OS independence
  - Not tied to Unix
- Crash recovery
  - Stateless
- Transparent access
  - Hides server identity, which is used only at mount time
- Unix semantics maintained on client
  - File locking and concurrent access
- Performance

NFS protocol
- How is it specified: As a set of RPC procedures with inputs, outputs and their effects.
- Stateless --> Simple recovery
- Transport independent. First implementation uses UDP/IP.
- File handle. Opaque to the client.
- XDR. External Data Representation.

NFS server side
- Stable writes. Data and metadata must reach stable storage before the server replies.
- Add generation number in the inode, which is reflected in the fhandle.

Client side
- Transparent access to remote files.
  - Server hostname is only used at mount time.
    VFS vs. VNODE operations
    Namespace traversal done a component at a time.
- Implementation
  - Hard vs. soft mount
  - nfsiods (client) and nfsds (server).
- Hard issues
  - Root filesystems
    Architecture-dependent binaries
    /tmp and /dev hard to share
  - Credentials
    Lack of flat uid, gid space over entire network
  - File locking
    Need additional protocol to do the locking.
    Should be mandatory, not advisory.
  - Concurrent access
    Unix locks the file (inode) per I/O, so that I/Os serialize.
  - Unix open file semantics
    Allows files to be deleted but still be accessible if not closed.
    Requires renaming with NFS (the client renames the open file and removes it on close).
  - Time skew.
    Can be a problem, for example with make.

Performance
- Care about end-to-end, not benchmarks. Initial performance bad.
  The graphs are missing units.
  What improvements did they try?
  - Client caching
  - Large UDP datagrams
  - avoid many bcopies
  - cache attributes (getattr)
  - Swap in small executables to take advantage of server readahead in case of client readahead.
  - Name cache
- Future improvements
  - Full diskless operation
    - To avoid dependence on ND for root fs
  - Remote file locking
    - Presently, it is advisory.
  - Support other clients.
    - CIFS is a bit different, more emphasis on locking.
  - Performance
    - Build dedicated NFS server.
  - Improved security
  - Automatic mounting
    - See amd (automount daemon)

Things not mentioned in this paper
- Close-to-open consistency
- Retransmission cache (at-most-once semantics)
- Improve write performance
  - v3 adds asynchronous reads/writes
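
Sketch of a retransmission cache. The paper does not describe one, so what follows is only a minimal illustration of the idea, not the Sun implementation. The motivation: over UDP a client resends a request when the reply is lost, and a non-idempotent operation such as REMOVE would then fail the second time even though the first attempt succeeded. A server can remember recent replies keyed by client and RPC transaction id (xid) and replay them instead of re-executing. The names ToyServer, handle_remove, and cache_size are hypothetical, the namespace is an in-memory dict, and a bounded cache only approximates at-most-once semantics.

```python
from collections import OrderedDict


class ToyServer:
    """Toy NFS-like server with a bounded retransmission (duplicate request) cache."""

    def __init__(self, cache_size=128):
        self.files = {"a.txt": b"hello"}  # hypothetical in-memory namespace
        self.replies = OrderedDict()      # (client, xid) -> saved reply
        self.cache_size = cache_size

    def _remove(self, name):
        # REMOVE is not idempotent: executing it twice yields ENOENT.
        if name not in self.files:
            return "ENOENT"
        del self.files[name]
        return "OK"

    def handle_remove(self, client, xid, name):
        key = (client, xid)
        if key in self.replies:
            # Retransmission: replay the saved reply, do not re-execute.
            self.replies.move_to_end(key)
            return self.replies[key]
        reply = self._remove(name)
        self.replies[key] = reply
        if len(self.replies) > self.cache_size:
            self.replies.popitem(last=False)  # evict the oldest entry
        return reply


server = ToyServer()
first = server.handle_remove("client1", 7, "a.txt")  # executes the REMOVE
retry = server.handle_remove("client1", 7, "a.txt")  # same xid: reply was lost, client resends
fresh = server.handle_remove("client1", 8, "a.txt")  # new xid: a genuinely new request
print(first, retry, fresh)  # prints: OK OK ENOENT
```

Without the cache, the retry would return ENOENT for an operation that actually succeeded. Once an entry is evicted, a late retransmission is executed again, which is why this gives at-most-once behavior only within the cache window.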