Commit dc7a08166f3a5f23e79e839a8a88849bd3397c32

Authored by J. Bruce Fields
1 parent e343eb0d60

nfs: new subdir Documentation/filesystems/nfs

We're adding enough nfs documentation that it may as well have its own
subdirectory.

Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

Showing 21 changed files with 1035 additions and 1029 deletions

Documentation/filesystems/00-INDEX
1 1 00-INDEX
2 2 - this file (info on some of the filesystems supported by linux).
3   -Exporting
4   - - explanation of how to make filesystems exportable.
5 3 Locking
6 4 - info on locking rules as they pertain to Linux VFS.
7 5 9p.txt
... ... @@ -66,12 +64,8 @@
66 64 - info on the Linux implementation of Sys V mandatory file locking.
67 65 ncpfs.txt
68 66 - info on Novell Netware(tm) filesystem using NCP protocol.
69   -nfs41-server.txt
70   - - info on the Linux server implementation of NFSv4 minor version 1.
71   -nfs-rdma.txt
72   - - how to install and set up the Linux NFS/RDMA client and server software.
73   -nfsroot.txt
74   - - short guide on setting up a diskless box with NFS root filesystem.
  67 +nfs/
  68 + - nfs-related documentation.
75 69 nilfs2.txt
76 70 - info and mount options for the NILFS2 filesystem.
77 71 ntfs.txt
Documentation/filesystems/Exporting
1   -
2   -Making Filesystems Exportable
3   -=============================
4   -
5   -Overview
6   ---------
7   -
8   -All filesystem operations require a dentry (or two) as a starting
9   -point. Local applications have a reference-counted hold on suitable
10   -dentries via open file descriptors or cwd/root. However remote
11   -applications that access a filesystem via a remote filesystem protocol
12   -such as NFS may not be able to hold such a reference, and so need a
13   -different way to refer to a particular dentry. As the alternative
14   -form of reference needs to be stable across renames, truncates, and
15   -server-reboot (among other things, though these tend to be the most
16   -problematic), there is no simple answer like 'filename'.
17   -
18   -The mechanism discussed here allows each filesystem implementation to
19   -specify how to generate an opaque (outside of the filesystem) byte
20   -string for any dentry, and how to find an appropriate dentry for any
21   -given opaque byte string.
22   -This byte string will be called a "filehandle fragment" as it
23   -corresponds to part of an NFS filehandle.
24   -
25   -A filesystem which supports the mapping between filehandle fragments
26   -and dentries will be termed "exportable".
27   -
28   -
29   -
30   -Dcache Issues
31   --------------
32   -
33   -The dcache normally contains a proper prefix of any given filesystem
34   -tree. This means that if any filesystem object is in the dcache, then
35   -all of the ancestors of that filesystem object are also in the dcache.
36   -As normal access is by filename this prefix is created naturally and
37   -maintained easily (by each object maintaining a reference count on
38   -its parent).
39   -
40   -However when objects are included into the dcache by interpreting a
41   -filehandle fragment, there is no automatic creation of a path prefix
42   -for the object. This leads to two related but distinct features of
43   -the dcache that are not needed for normal filesystem access.
44   -
45   -1/ The dcache must sometimes contain objects that are not part of the
46   - proper prefix. i.e. that are not connected to the root.
47   -2/ The dcache must be prepared for a newly found (via ->lookup) directory
48   - to already have a (non-connected) dentry, and must be able to move
49   - that dentry into place (based on the parent and name in the
50   - ->lookup). This is particularly needed for directories as
51   - it is a dcache invariant that directories only have one dentry.
52   -
53   -To implement these features, the dcache has:
54   -
55   -a/ A dentry flag DCACHE_DISCONNECTED which is set on
56   - any dentry that might not be part of the proper prefix.
57   - This is set when anonymous dentries are created, and cleared when a
58   - dentry is noticed to be a child of a dentry which is in the proper
59   - prefix.
60   -
61   -b/ A per-superblock list "s_anon" of dentries which are the roots of
62   - subtrees that are not in the proper prefix. These dentries, as
63   - well as the proper prefix, need to be released at unmount time. As
64   - these dentries will not be hashed, they are linked together on the
65   - d_hash list_head.
66   -
67   -c/ Helper routines to allocate anonymous dentries, and to help attach
68   - loose directory dentries at lookup time. They are:
69   - d_alloc_anon(inode) will return a dentry for the given inode.
70   - If the inode already has a dentry, one of those is returned.
71   - If it doesn't, a new anonymous (IS_ROOT and
72   - DCACHE_DISCONNECTED) dentry is allocated and attached.
73   - In the case of a directory, care is taken that only one dentry
74   - can ever be attached.
75   - d_splice_alias(inode, dentry) will make sure that there is a
76   - dentry with the same name and parent as the given dentry, and
77   - which refers to the given inode.
78   - If the inode is a directory and already has a dentry, then that
79   - dentry is d_moved over the given dentry.
80   - If the passed dentry gets attached, care is taken that this is
81   - mutually exclusive to a d_alloc_anon operation.
82   - If the passed dentry is used, NULL is returned, else the used
83   - dentry is returned. This corresponds to the calling pattern of
84   - ->lookup.
85   -
86   -
87   -Filesystem Issues
88   ------------------
89   -
90   -For a filesystem to be exportable it must:
91   -
92   - 1/ provide the filehandle fragment routines described below.
93   - 2/ make sure that d_splice_alias is used rather than d_add
94   - when ->lookup finds an inode for a given parent and name.
95   - Typically the ->lookup routine will end with a:
96   -
97   - return d_splice_alias(inode, dentry);
98   - }
99   -
100   -
101   -
102   - A file system implementation declares that instances of the filesystem
103   -are exportable by setting the s_export_op field in the struct
104   -super_block. This field must point to a "struct export_operations"
105   -struct which has the following members:
106   -
107   - encode_fh (optional)
108   - Takes a dentry and creates a filehandle fragment which can later be used
109   - to find or create a dentry for the same object. The default
110   - implementation creates a filehandle fragment that encodes a 32bit inode
111   - and generation number for the inode encoded, and if necessary the
112   - same information for the parent.
113   -
114   - fh_to_dentry (mandatory)
115   - Given a filehandle fragment, this should find the implied object and
116   - create a dentry for it (possibly with d_alloc_anon).
117   -
118   - fh_to_parent (optional but strongly recommended)
119   - Given a filehandle fragment, this should find the parent of the
120   - implied object and create a dentry for it (possibly with d_alloc_anon).
121   - May fail if the filehandle fragment is too small.
122   -
123   - get_parent (optional but strongly recommended)
124   - When given a dentry for a directory, this should return a dentry for
125   - the parent. Quite possibly the parent dentry will have been allocated
126   - by d_alloc_anon. The default get_parent function just returns an error
127   - so any filehandle lookup that requires finding a parent will fail.
128   - ->lookup("..") is *not* used as a default as it can leave ".." entries
129   - in the dcache which are too messy to work with.
130   -
131   - get_name (optional)
132   - When given a parent dentry and a child dentry, this should find a name
133   - in the directory identified by the parent dentry, which leads to the
134   - object identified by the child dentry. If no get_name function is
135   - supplied, a default implementation is provided which uses vfs_readdir
136   - to find potential names, and matches inode numbers to find the correct
137   - match.
138   -
139   -
140   -A filehandle fragment consists of an array of 1 or more 4byte words,
141   -together with a one byte "type".
142   -The decode_fh routine should not depend on the stated size that is
143   -passed to it. This size may be larger than the original filehandle
144   -generated by encode_fh, in which case it will have been padded with
145   -nuls. Rather, the encode_fh routine should choose a "type" which
146   -indicates to decode_fh how much of the filehandle is valid, and how
147   -it should be interpreted.
Documentation/filesystems/nfs-rdma.txt
1   -################################################################################
2   -# #
3   -# NFS/RDMA README #
4   -# #
5   -################################################################################
6   -
7   - Author: NetApp and Open Grid Computing
8   - Date: May 29, 2008
9   -
10   -Table of Contents
11   -~~~~~~~~~~~~~~~~~
12   - - Overview
13   - - Getting Help
14   - - Installation
15   - - Check RDMA and NFS Setup
16   - - NFS/RDMA Setup
17   -
18   -Overview
19   -~~~~~~~~
20   -
21   - This document describes how to install and set up the Linux NFS/RDMA client
22   - and server software.
23   -
24   - The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
25   - was first included in the following release, Linux 2.6.25.
26   -
27   - In our testing, we have obtained excellent performance results (full 10Gbit
28   - wire bandwidth at minimal client CPU) under many workloads. The code passes
29   - the full Connectathon test suite and operates over both InfiniBand and iWARP
30   - RDMA adapters.
31   -
32   -Getting Help
33   -~~~~~~~~~~~~
34   -
35   - If you get stuck, you can ask questions on the
36   -
37   - nfs-rdma-devel@lists.sourceforge.net
38   -
39   - mailing list.
40   -
41   -Installation
42   -~~~~~~~~~~~~
43   -
44   - These instructions are a step by step guide to building a machine for
45   - use with NFS/RDMA.
46   -
47   - - Install an RDMA device
48   -
49   - Any device supported by the drivers in drivers/infiniband/hw is acceptable.
50   -
51   - Testing has been performed using several Mellanox-based IB cards, the
52   - Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.
53   -
54   - - Install a Linux distribution and tools
55   -
56   - The first kernel release to contain both the NFS/RDMA client and server was
57   - Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
58   - Linux kernel releases should be installed.
59   -
60   - The procedures described in this document have been tested with
61   - distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
62   -
63   - - Install nfs-utils-1.1.2 or greater on the client
64   -
65   - An NFS/RDMA mount point can be obtained by using the mount.nfs command in
66   - nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
67   - version with support for NFS/RDMA mounts, but for various reasons we
68   - recommend using nfs-utils-1.1.2 or greater). To see which version of
69   - mount.nfs you are using, type:
70   -
71   - $ /sbin/mount.nfs -V
72   -
73   - If the version is less than 1.1.2 or the command does not exist,
74   - you should install the latest version of nfs-utils.
75   -
76   - Download the latest package from:
77   -
78   - http://www.kernel.org/pub/linux/utils/nfs
79   -
80   - Uncompress the package and follow the installation instructions.
81   -
82   - If you will not need the idmapper and gssd executables (you do not need
83   - these to create an NFS/RDMA enabled mount command), the installation
84   - process can be simplified by disabling these features when running
85   - configure:
86   -
87   - $ ./configure --disable-gss --disable-nfsv4
88   -
89   - To build nfs-utils you will need the tcp_wrappers package installed. For
90   - more information on this see the package's README and INSTALL files.
91   -
92   - After building the nfs-utils package, there will be a mount.nfs binary in
93   - the utils/mount directory. This binary can be used to initiate NFS v2, v3,
94   - or v4 mounts. To initiate a v4 mount, the binary must be called
95   - mount.nfs4. The standard technique is to create a symlink called
96   - mount.nfs4 to mount.nfs.
97   -
98   - This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
99   -
100   - $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
101   -
102   - In this location, mount.nfs will be invoked automatically for NFS mounts
103   - by the system mount command.
104   -
105   - NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
106   - on the NFS client machine. You do not need this specific version of
107   - nfs-utils on the server. Furthermore, only the mount.nfs command from
108   - nfs-utils-1.1.2 is needed on the client.
109   -
110   - - Install a Linux kernel with NFS/RDMA
111   -
112   - The NFS/RDMA client and server are both included in the mainline Linux
113   - kernel version 2.6.25 and later. This and other versions of the 2.6 Linux
114   - kernel can be found at:
115   -
116   - ftp://ftp.kernel.org/pub/linux/kernel/v2.6/
117   -
118   - Download the sources and place them in an appropriate location.
119   -
120   - - Configure the RDMA stack
121   -
122   - Make sure your kernel configuration has RDMA support enabled. Under
123   - Device Drivers -> InfiniBand support, update the kernel configuration
124   - to enable InfiniBand support [NOTE: the option name is misleading. Enabling
125   - InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].
126   -
127   - Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
128   - iWARP adapter support (amso, cxgb3, etc.).
129   -
130   - If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.
131   -
132   - - Configure the NFS client and server
133   -
134   - Your kernel configuration must also have NFS file system support and/or
135   - NFS server support enabled. These and other NFS related configuration
136   - options can be found under File Systems -> Network File Systems.
137   -
138   - - Build, install, reboot
139   -
140   - The NFS/RDMA code will be enabled automatically if NFS and RDMA
141   - are turned on. The NFS/RDMA client and server are configured via the hidden
142   - SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
143   - value of SUNRPC_XPRT_RDMA will be:
144   -
145   - - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client
146   - and server will not be built
147   - - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M,
148   - in this case the NFS/RDMA client and server will be built as modules
149   - - Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client
150   - and server will be built into the kernel
151   -
152   - Therefore, if you have followed the steps above and turned on NFS and RDMA,
153   - the NFS/RDMA client and server will be built.
154   -
155   - Build a new kernel, install it, boot it.
156   -
157   -Check RDMA and NFS Setup
158   -~~~~~~~~~~~~~~~~~~~~~~~~
159   -
160   - Before configuring the NFS/RDMA software, it is a good idea to test
161   - your new kernel to ensure that the kernel is working correctly.
162   - In particular, it is a good idea to verify that the RDMA stack
163   - is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
164   - is working properly.
165   -
166   - - Check RDMA Setup
167   -
168   - If you built the RDMA components as modules, load them at
169   - this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
170   - card:
171   -
172   - $ modprobe ib_mthca
173   - $ modprobe ib_ipoib
174   -
175   - If you are using InfiniBand, make sure there is a Subnet Manager (SM)
176   - running on the network. If your IB switch has an embedded SM, you can
177   - use it. Otherwise, you will need to run an SM, such as OpenSM, on one
178   - of your end nodes.
179   -
180   - If an SM is running on your network, you should see the following:
181   -
182   - $ cat /sys/class/infiniband/driverX/ports/1/state
183   - 4: ACTIVE
184   -
185   - where driverX is mthca0, ipath5, ehca3, etc.
186   -
187   - To further test the InfiniBand software stack, use IPoIB (this
188   - assumes you have two IB hosts named host1 and host2):
189   -
190   - host1$ ifconfig ib0 a.b.c.x
191   - host2$ ifconfig ib0 a.b.c.y
192   - host1$ ping a.b.c.y
193   - host2$ ping a.b.c.x
194   -
195   - For other device types, follow the appropriate procedures.
196   -
197   - - Check NFS Setup
198   -
199   - For the NFS components enabled above (client and/or server),
200   - test their functionality over standard Ethernet using TCP/IP or UDP/IP.
201   -
202   -NFS/RDMA Setup
203   -~~~~~~~~~~~~~~
204   -
205   - We recommend that you use two machines, one to act as the client and
206   - one to act as the server.
207   -
208   - One time configuration:
209   -
210   - - On the server system, configure the /etc/exports file and
211   - start the NFS/RDMA server.
212   -
213   - Exports entries with the following formats have been tested:
214   -
215   - /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
216   - /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
217   -
218   - The IP address(es) is(are) the client's IPoIB address for an InfiniBand
219   - HCA or the client's iWARP address(es) for an RNIC.
220   -
221   - NOTE: The "insecure" option must be used because the NFS/RDMA client does
222   - not use a reserved port.
223   -
224   - Each time a machine boots:
225   -
226   - - Load and configure the RDMA drivers
227   -
228   - For InfiniBand using a Mellanox adapter:
229   -
230   - $ modprobe ib_mthca
231   - $ modprobe ib_ipoib
232   - $ ifconfig ib0 a.b.c.d
233   -
234   - NOTE: use unique addresses for the client and server
235   -
236   - - Start the NFS server
237   -
238   - If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
239   - kernel config), load the RDMA transport module:
240   -
241   - $ modprobe svcrdma
242   -
243   - Regardless of how the server was built (module or built-in), start the
244   - server:
245   -
246   - $ /etc/init.d/nfs start
247   -
248   - or
249   -
250   - $ service nfs start
251   -
252   - Instruct the server to listen on the RDMA transport:
253   -
254   - $ echo rdma 20049 > /proc/fs/nfsd/portlist
255   -
256   - - On the client system
257   -
258   - If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
259   - kernel config), load the RDMA client module:
260   -
261   - $ modprobe xprtrdma
262   -
263   - Regardless of how the client was built (module or built-in), use this
264   - command to mount the NFS/RDMA server:
265   -
266   - $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
267   -
268   - To verify that the mount is using RDMA, run "cat /proc/mounts" and check
269   - the "proto" field for the given mount.
270   -
271   - Congratulations! You're using NFS/RDMA!
Documentation/filesystems/nfs.txt
1   -
2   -The NFS client
3   -==============
4   -
5   -The NFS version 2 protocol was first documented in RFC1094 (March 1989).
6   -Since then two more major releases of NFS have been published, with NFSv3
7   -being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April
8   -2003).
9   -
10   -The Linux NFS client currently supports all the above published versions,
11   -and work is in progress on adding support for minor version 1 of the NFSv4
12   -protocol.
13   -
14   -The purpose of this document is to provide information on some of the
15   -upcall interfaces that are used in order to provide the NFS client with
16   -some of the information that it requires in order to fully comply with
17   -the NFS spec.
18   -
19   -The DNS resolver
20   -================
21   -
22   -NFSv4 allows for one server to refer the NFS client to data that has been
23   -migrated onto another server by means of the special "fs_locations"
24   -attribute. See
25   - http://tools.ietf.org/html/rfc3530#section-6
26   -and
27   - http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
28   -
29   -The fs_locations information can take the form of either an ip address and
30   -a path, or a DNS hostname and a path. The latter requires the NFS client to
31   -do a DNS lookup in order to mount the new volume, and hence the need for an
32   -upcall to allow userland to provide this service.
33   -
34   -Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
35   -/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps:
36   -
37   - (1) The process checks the dns_resolve cache to see if it contains a
38   - valid entry. If so, it returns that entry and exits.
39   -
40   - (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent'
41   - (may be changed using the 'nfs.cache_getent' kernel boot parameter)
42   - is run, with two arguments:
43   - - the cache name, "dns_resolve"
44   - - the hostname to resolve
45   -
46   - (3) After looking up the corresponding ip address, the helper script
47   - writes the result into the rpc_pipefs pseudo-file
48   - '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel'
49   - in the following (text) format:
50   -
51   - "<ip address> <hostname> <ttl>\n"
52   -
53   - Where <ip address> is in the usual IPv4 (123.456.78.90) or IPv6
54   - (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format.
55   - <hostname> is identical to the second argument of the helper
56   - script, and <ttl> is the 'time to live' of this cache entry (in
57   - units of seconds).
58   -
59   - Note: If <ip address> is invalid, say the string "0", then a negative
60   - entry is created, which will cause the kernel to treat the hostname
61   - as having no valid DNS translation.
62   -
63   -
64   -
65   -
66   -A basic sample /sbin/nfs_cache_getent
67   -=====================================
68   -
69   -#!/bin/bash
70   -#
71   -ttl=600
72   -#
73   -cut=/usr/bin/cut
74   -getent=/usr/bin/getent
75   -rpc_pipefs=/var/lib/nfs/rpc_pipefs
76   -#
77   -die()
78   -{
79   - echo "Usage: $0 cache_name entry_name"
80   - exit 1
81   -}
82   -
83   -[ $# -lt 2 ] && die
84   -cachename="$1"
85   -cache_path=${rpc_pipefs}/cache/${cachename}/channel
86   -
87   -case "${cachename}" in
88   - dns_resolve)
89   - name="$2"
90   - result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )"
91   - [ -z "${result}" ] && result="0"
92   - ;;
93   - *)
94   - die
95   - ;;
96   -esac
97   -echo "${result} ${name} ${ttl}" >${cache_path}
Documentation/filesystems/nfs/00-INDEX
  1 +00-INDEX
  2 + - this file (nfs-related documentation).
  3 +Exporting
  4 + - explanation of how to make filesystems exportable.
  5 +nfs.txt
  6 + - nfs client, and DNS resolution for fs_locations.
  7 +nfs41-server.txt
  8 + - info on the Linux server implementation of NFSv4 minor version 1.
  9 +nfs-rdma.txt
  10 + - how to install and set up the Linux NFS/RDMA client and server software.
  11 +nfsroot.txt
  12 + - short guide on setting up a diskless box with NFS root filesystem.
Documentation/filesystems/nfs/Exporting
  1 +
  2 +Making Filesystems Exportable
  3 +=============================
  4 +
  5 +Overview
  6 +--------
  7 +
  8 +All filesystem operations require a dentry (or two) as a starting
  9 +point. Local applications have a reference-counted hold on suitable
  10 +dentries via open file descriptors or cwd/root. However remote
  11 +applications that access a filesystem via a remote filesystem protocol
  12 +such as NFS may not be able to hold such a reference, and so need a
  13 +different way to refer to a particular dentry. As the alternative
  14 +form of reference needs to be stable across renames, truncates, and
  15 +server-reboot (among other things, though these tend to be the most
  16 +problematic), there is no simple answer like 'filename'.
  17 +
  18 +The mechanism discussed here allows each filesystem implementation to
  19 +specify how to generate an opaque (outside of the filesystem) byte
  20 +string for any dentry, and how to find an appropriate dentry for any
  21 +given opaque byte string.
  22 +This byte string will be called a "filehandle fragment" as it
  23 +corresponds to part of an NFS filehandle.
  24 +
  25 +A filesystem which supports the mapping between filehandle fragments
  26 +and dentries will be termed "exportable".
  27 +
  28 +
  29 +
  30 +Dcache Issues
  31 +-------------
  32 +
  33 +The dcache normally contains a proper prefix of any given filesystem
  34 +tree. This means that if any filesystem object is in the dcache, then
  35 +all of the ancestors of that filesystem object are also in the dcache.
  36 +As normal access is by filename this prefix is created naturally and
  37 +maintained easily (by each object maintaining a reference count on
  38 +its parent).
  39 +
  40 +However when objects are included into the dcache by interpreting a
  41 +filehandle fragment, there is no automatic creation of a path prefix
  42 +for the object. This leads to two related but distinct features of
  43 +the dcache that are not needed for normal filesystem access.
  44 +
  45 +1/ The dcache must sometimes contain objects that are not part of the
  46 + proper prefix. i.e. that are not connected to the root.
  47 +2/ The dcache must be prepared for a newly found (via ->lookup) directory
  48 + to already have a (non-connected) dentry, and must be able to move
  49 + that dentry into place (based on the parent and name in the
  50 + ->lookup). This is particularly needed for directories as
  51 + it is a dcache invariant that directories only have one dentry.
  52 +
  53 +To implement these features, the dcache has:
  54 +
  55 +a/ A dentry flag DCACHE_DISCONNECTED which is set on
  56 + any dentry that might not be part of the proper prefix.
  57 + This is set when anonymous dentries are created, and cleared when a
  58 + dentry is noticed to be a child of a dentry which is in the proper
  59 + prefix.
  60 +
  61 +b/ A per-superblock list "s_anon" of dentries which are the roots of
  62 + subtrees that are not in the proper prefix. These dentries, as
  63 + well as the proper prefix, need to be released at unmount time. As
  64 + these dentries will not be hashed, they are linked together on the
  65 + d_hash list_head.
  66 +
  67 +c/ Helper routines to allocate anonymous dentries, and to help attach
  68 + loose directory dentries at lookup time. They are:
  69 + d_alloc_anon(inode) will return a dentry for the given inode.
  70 + If the inode already has a dentry, one of those is returned.
  71 + If it doesn't, a new anonymous (IS_ROOT and
  72 + DCACHE_DISCONNECTED) dentry is allocated and attached.
  73 + In the case of a directory, care is taken that only one dentry
  74 + can ever be attached.
  75 + d_splice_alias(inode, dentry) will make sure that there is a
  76 + dentry with the same name and parent as the given dentry, and
  77 + which refers to the given inode.
  78 + If the inode is a directory and already has a dentry, then that
  79 + dentry is d_moved over the given dentry.
  80 + If the passed dentry gets attached, care is taken that this is
  81 + mutually exclusive to a d_alloc_anon operation.
  82 + If the passed dentry is used, NULL is returned, else the used
  83 + dentry is returned. This corresponds to the calling pattern of
  84 + ->lookup.
  85 +
  86 +
  87 +Filesystem Issues
  88 +-----------------
  89 +
  90 +For a filesystem to be exportable it must:
  91 +
  92 + 1/ provide the filehandle fragment routines described below.
  93 + 2/ make sure that d_splice_alias is used rather than d_add
  94 + when ->lookup finds an inode for a given parent and name.
  95 + Typically the ->lookup routine will end with a:
  96 +
  97 + return d_splice_alias(inode, dentry);
  98 + }
  99 +
  100 +
  101 +
  102 + A file system implementation declares that instances of the filesystem
  103 +are exportable by setting the s_export_op field in the struct
  104 +super_block. This field must point to a "struct export_operations"
  105 +struct which has the following members:
  106 +
  107 + encode_fh (optional)
  108 + Takes a dentry and creates a filehandle fragment which can later be used
  109 + to find or create a dentry for the same object. The default
  110 + implementation creates a filehandle fragment that encodes a 32bit inode
  111 + and generation number for the inode encoded, and if necessary the
  112 + same information for the parent.
  113 +
  114 + fh_to_dentry (mandatory)
  115 + Given a filehandle fragment, this should find the implied object and
  116 + create a dentry for it (possibly with d_alloc_anon).
  117 +
  118 + fh_to_parent (optional but strongly recommended)
  119 + Given a filehandle fragment, this should find the parent of the
  120 + implied object and create a dentry for it (possibly with d_alloc_anon).
  121 + May fail if the filehandle fragment is too small.
  122 +
  123 + get_parent (optional but strongly recommended)
  124 + When given a dentry for a directory, this should return a dentry for
  125 + the parent. Quite possibly the parent dentry will have been allocated
  126 + by d_alloc_anon. The default get_parent function just returns an error
  127 + so any filehandle lookup that requires finding a parent will fail.
  128 + ->lookup("..") is *not* used as a default as it can leave ".." entries
  129 + in the dcache which are too messy to work with.
  130 +
  131 + get_name (optional)
  132 + When given a parent dentry and a child dentry, this should find a name
  133 + in the directory identified by the parent dentry, which leads to the
  134 + object identified by the child dentry. If no get_name function is
  135 + supplied, a default implementation is provided which uses vfs_readdir
  136 + to find potential names, and matches inode numbers to find the correct
  137 + match.
  138 +
  139 +
  140 +A filehandle fragment consists of an array of 1 or more 4byte words,
  141 +together with a one byte "type".
  142 +The decode_fh routine should not depend on the stated size that is
  143 +passed to it. This size may be larger than the original filehandle
  144 +generated by encode_fh, in which case it will have been padded with
  145 +nuls. Rather, the encode_fh routine should choose a "type" which
  146 +indicates to decode_fh how much of the filehandle is valid, and how
  147 +it should be interpreted.
Documentation/filesystems/nfs/nfs-rdma.txt
  1 +################################################################################
  2 +# #
  3 +# NFS/RDMA README #
  4 +# #
  5 +################################################################################
  6 +
  7 + Author: NetApp and Open Grid Computing
  8 + Date: May 29, 2008
  9 +
  10 +Table of Contents
  11 +~~~~~~~~~~~~~~~~~
  12 + - Overview
  13 + - Getting Help
  14 + - Installation
  15 + - Check RDMA and NFS Setup
  16 + - NFS/RDMA Setup
  17 +
  18 +Overview
  19 +~~~~~~~~
  20 +
  21 + This document describes how to install and set up the Linux NFS/RDMA client
  22 + and server software.
  23 +
  24 + The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
  25 + was first included in the following release, Linux 2.6.25.
  26 +
  27 + In our testing, we have obtained excellent performance results (full 10Gbit
  28 + wire bandwidth at minimal client CPU) under many workloads. The code passes
  29 + the full Connectathon test suite and operates over both InfiniBand and iWARP
  30 + RDMA adapters.
  31 +
  32 +Getting Help
  33 +~~~~~~~~~~~~
  34 +
  35 + If you get stuck, you can ask questions on the
  36 +
  37 + nfs-rdma-devel@lists.sourceforge.net
  38 +
  39 + mailing list.
  40 +
  41 +Installation
  42 +~~~~~~~~~~~~
  43 +
  44 + These instructions are a step by step guide to building a machine for
  45 + use with NFS/RDMA.
  46 +
  47 + - Install an RDMA device
  48 +
  49 + Any device supported by the drivers in drivers/infiniband/hw is acceptable.
  50 +
  51 + Testing has been performed using several Mellanox-based IB cards, the
  52 + Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.
  53 +
  54 + - Install a Linux distribution and tools
  55 +
  56 + The first kernel release to contain both the NFS/RDMA client and server was
  57 + Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
  58 + Linux kernel releases should be installed.
  59 +
  60 + The procedures described in this document have been tested with
  61 + distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
  62 +
  63 + - Install nfs-utils-1.1.2 or greater on the client
  64 +
  65 + An NFS/RDMA mount point can be obtained by using the mount.nfs command in
  66 + nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
  67 + version with support for NFS/RDMA mounts, but for various reasons we
  68 + recommend using nfs-utils-1.1.2 or greater). To see which version of
  69 + mount.nfs you are using, type:
  70 +
  71 + $ /sbin/mount.nfs -V
  72 +
  73 + If the version is less than 1.1.2 or the command does not exist,
  74 + you should install the latest version of nfs-utils.
  75 +
  76 + Download the latest package from:
  77 +
  78 + http://www.kernel.org/pub/linux/utils/nfs
  79 +
  80 + Uncompress the package and follow the installation instructions.
  81 +
  82 + If you will not need the idmapper and gssd executables (you do not need
  83 + these to create an NFS/RDMA enabled mount command), the installation
  84 + process can be simplified by disabling these features when running
  85 + configure:
  86 +
  87 + $ ./configure --disable-gss --disable-nfsv4
  88 +
  89 + To build nfs-utils you will need the tcp_wrappers package installed. For
  90 + more information on this see the package's README and INSTALL files.
  91 +
  92 + After building the nfs-utils package, there will be a mount.nfs binary in
  93 + the utils/mount directory. This binary can be used to initiate NFS v2, v3,
  94 + or v4 mounts. To initiate a v4 mount, the binary must be called
  95 + mount.nfs4. The standard technique is to create a symlink called
  96 + mount.nfs4 to mount.nfs.
  97 +
  98 + This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
  99 +
  100 + $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
  101 +
  102 + In this location, mount.nfs will be invoked automatically for NFS mounts
  103 + by the system mount command.
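
 For the mount.nfs4 symlink mentioned above, a minimal sketch (assuming
 mount.nfs was installed to /sbin as shown; adjust the path otherwise) is:

    $ sudo ln -s /sbin/mount.nfs /sbin/mount.nfs4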
  104 +
  105 + NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
  106 + on the NFS client machine. You do not need this specific version of
  107 + nfs-utils on the server. Furthermore, only the mount.nfs command from
  108 + nfs-utils-1.1.2 is needed on the client.
  109 +
  110 + - Install a Linux kernel with NFS/RDMA
  111 +
  112 + The NFS/RDMA client and server are both included in the mainline Linux
  113 + kernel version 2.6.25 and later. This and other versions of the 2.6 Linux
  114 + kernel can be found at:
  115 +
  116 + ftp://ftp.kernel.org/pub/linux/kernel/v2.6/
  117 +
  118 + Download the sources and place them in an appropriate location.
  119 +
  120 + - Configure the RDMA stack
  121 +
  122 + Make sure your kernel configuration has RDMA support enabled. Under
  123 + Device Drivers -> InfiniBand support, update the kernel configuration
  124 + to enable InfiniBand support [NOTE: the option name is misleading. Enabling
  125 + InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].
  126 +
  127 + Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
  128 + iWARP adapter support (amso, cxgb3, etc.).
  129 +
  130 + If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.
  131 +
  132 + - Configure the NFS client and server
  133 +
  134 + Your kernel configuration must also have NFS file system support and/or
  135 + NFS server support enabled. These and other NFS related configuration
  136 + options can be found under File Systems -> Network File Systems.
  137 +
  138 + - Build, install, reboot
  139 +
  140 + The NFS/RDMA code will be enabled automatically if NFS and RDMA
  141 + are turned on. The NFS/RDMA client and server are configured via the hidden
  142 + SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
  143 + value of SUNRPC_XPRT_RDMA will be:
  144 +
  145 + - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client
  146 + and server will not be built
  147 + - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M,
  148 + in this case the NFS/RDMA client and server will be built as modules
  149 + - Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client
  150 + and server will be built into the kernel
  151 +
  152 + Therefore, if you have followed the steps above and turned on NFS and RDMA,
  153 + the NFS/RDMA client and server will be built.
  154 +
  155 + Build a new kernel, install it, boot it.
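
 As a quick sanity check of the configuration steps above, you can inspect
 the generated .config before building; a sketch (the exact NFS client/server
 option names may vary slightly between kernel versions):

    $ grep -E 'SUNRPC_XPRT_RDMA|CONFIG_INFINIBAND=|CONFIG_NFS_FS=|CONFIG_NFSD=' .config

 CONFIG_SUNRPC_XPRT_RDMA should appear as =m or =y according to the rules above.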
  156 +
  157 +Check RDMA and NFS Setup
  158 +~~~~~~~~~~~~~~~~~~~~~~~~
  159 +
  160 + Before configuring the NFS/RDMA software, it is a good idea to test
  161 + your new kernel to ensure that the kernel is working correctly.
  162 + In particular, it is a good idea to verify that the RDMA stack
  163 + is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
  164 + is working properly.
  165 +
  166 + - Check RDMA Setup
  167 +
  168 + If you built the RDMA components as modules, load them at
  169 + this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
  170 + card:
  171 +
  172 + $ modprobe ib_mthca
  173 + $ modprobe ib_ipoib
  174 +
  175 + If you are using InfiniBand, make sure there is a Subnet Manager (SM)
  176 + running on the network. If your IB switch has an embedded SM, you can
  177 + use it. Otherwise, you will need to run an SM, such as OpenSM, on one
  178 + of your end nodes.
  179 +
  180 + If an SM is running on your network, you should see the following:
  181 +
  182 + $ cat /sys/class/infiniband/driverX/ports/1/state
  183 + 4: ACTIVE
  184 +
  185 + where driverX is mthca0, ipath5, ehca3, etc.
  186 +
  187 + To further test the InfiniBand software stack, use IPoIB (this
  188 + assumes you have two IB hosts named host1 and host2):
  189 +
  190 + host1$ ifconfig ib0 a.b.c.x
  191 + host2$ ifconfig ib0 a.b.c.y
  192 + host1$ ping a.b.c.y
  193 + host2$ ping a.b.c.x
  194 +
  195 + For other device types, follow the appropriate procedures.
  196 +
  197 + - Check NFS Setup
  198 +
  199 + For the NFS components enabled above (client and/or server),
  200 + test their functionality over standard Ethernet using TCP/IP or UDP/IP.
  201 +
  202 +NFS/RDMA Setup
  203 +~~~~~~~~~~~~~~
  204 +
  205 + We recommend that you use two machines, one to act as the client and
  206 + one to act as the server.
  207 +
  208 + One time configuration:
  209 +
  210 + - On the server system, configure the /etc/exports file and
  211 + start the NFS/RDMA server.
  212 +
  213 + Exports entries with the following formats have been tested:
  214 +
  215 + /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
  216 + /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
  217 +
  218 + The IP address(es) is(are) the client's IPoIB address for an InfiniBand
  219 + HCA or the client's iWARP address(es) for an RNIC.
  220 +
  221 + NOTE: The "insecure" option must be used because the NFS/RDMA client does
  222 + not use a reserved port.
  223 +
  224 + Each time a machine boots:
  225 +
  226 + - Load and configure the RDMA drivers
  227 +
  228 + For InfiniBand using a Mellanox adapter:
  229 +
  230 + $ modprobe ib_mthca
  231 + $ modprobe ib_ipoib
  232 + $ ifconfig ib0 a.b.c.d
  233 +
  234 + NOTE: use unique addresses for the client and server
  235 +
  236 + - Start the NFS server
  237 +
  238 + If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  239 + kernel config), load the RDMA transport module:
  240 +
  241 + $ modprobe svcrdma
  242 +
  243 + Regardless of how the server was built (module or built-in), start the
  244 + server:
  245 +
  246 + $ /etc/init.d/nfs start
  247 +
  248 + or
  249 +
  250 + $ service nfs start
  251 +
  252 + Instruct the server to listen on the RDMA transport:
  253 +
  254 + $ echo rdma 20049 > /proc/fs/nfsd/portlist
  255 +
  256 + - On the client system
  257 +
  258 + If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  259 + kernel config), load the RDMA client module:
  260 +
  261 + $ modprobe xprtrdma
  262 +
  263 + Regardless of how the client was built (module or built-in), use this
  264 + command to mount the NFS/RDMA server:
  265 +
  266 + $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
  267 +
  268 + To verify that the mount is using RDMA, run "cat /proc/mounts" and check
  269 + the "proto" field for the given mount.
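
 A sketch of such a check (an RDMA mount is expected to report proto=rdma,
 though the exact option string depends on the kernel version):

    $ grep proto=rdma /proc/mounts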
  270 +
  271 + Congratulations! You're using NFS/RDMA!
Documentation/filesystems/nfs/nfs.txt
  1 +
  2 +The NFS client
  3 +==============
  4 +
  5 +The NFS version 2 protocol was first documented in RFC1094 (March 1989).
  6 +Since then two more major releases of NFS have been published, with NFSv3
  7 +being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April
  8 +2003).
  9 +
  10 +The Linux NFS client currently supports all the above published versions,
  11 +and work is in progress on adding support for minor version 1 of the NFSv4
  12 +protocol.
  13 +
  14 +The purpose of this document is to provide information on some of the
  15 +upcall interfaces that are used in order to provide the NFS client with
  16 +some of the information that it requires in order to fully comply with
  17 +the NFS spec.
  18 +
  19 +The DNS resolver
  20 +================
  21 +
  22 +NFSv4 allows for one server to refer the NFS client to data that has been
  23 +migrated onto another server by means of the special "fs_locations"
  24 +attribute. See
  25 + http://tools.ietf.org/html/rfc3530#section-6
  26 +and
  27 + http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
  28 +
  29 +The fs_locations information can take the form of either an ip address and
  30 +a path, or a DNS hostname and a path. The latter requires the NFS client to
  31 +do a DNS lookup in order to mount the new volume, and hence the need for an
  32 +upcall to allow userland to provide this service.
  33 +
  34 +Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
  35 +/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps:
  36 +
  37 + (1) The process checks the dns_resolve cache to see if it contains a
  38 + valid entry. If so, it returns that entry and exits.
  39 +
  40 + (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent'
  41 + (may be changed using the 'nfs.cache_getent' kernel boot parameter)
  42 + is run, with two arguments:
  43 + - the cache name, "dns_resolve"
  44 + - the hostname to resolve
  45 +
  46 + (3) After looking up the corresponding ip address, the helper script
  47 + writes the result into the rpc_pipefs pseudo-file
  48 + '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel'
  49 + in the following (text) format:
  50 +
  51 + "<ip address> <hostname> <ttl>\n"
  52 +
  53 + Where <ip address> is in the usual IPv4 (123.456.78.90) or IPv6
  54 + (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format.
  55 + <hostname> is identical to the second argument of the helper
  56 + script, and <ttl> is the 'time to live' of this cache entry (in
  57 + units of seconds).
  58 +
  59 + Note: If <ip address> is invalid, say the string "0", then a negative
  60 + entry is created, which will cause the kernel to treat the hostname
  61 + as having no valid DNS translation.
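
 The same channel can also be fed by hand when testing; a sketch using a
 hypothetical address and hostname, in the text format described in step (3):

    $ echo "192.168.0.10 nfsserver.example.com 600" > /var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel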
  62 +
  63 +
  64 +
  65 +
  66 +A basic sample /sbin/nfs_cache_getent
  67 +=====================================
  68 +
  69 +#!/bin/bash
  70 +#
  71 +ttl=600
  72 +#
  73 +cut=/usr/bin/cut
  74 +getent=/usr/bin/getent
  75 +rpc_pipefs=/var/lib/nfs/rpc_pipefs
  76 +#
  77 +die()
  78 +{
  79 + echo "Usage: $0 cache_name entry_name"
  80 + exit 1
  81 +}
  82 +
  83 +[ $# -lt 2 ] && die
  84 +cachename="$1"
  85 +cache_path=${rpc_pipefs}/cache/${cachename}/channel
  86 +
  87 +case "${cachename}" in
  88 + dns_resolve)
  89 + name="$2"
  90 + result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )"
  91 + [ -z "${result}" ] && result="0"
  92 + ;;
  93 + *)
  94 + die
  95 + ;;
  96 +esac
  97 +echo "${result} ${name} ${ttl}" >${cache_path}
Documentation/filesystems/nfs/nfs41-server.txt
  1 +NFSv4.1 Server Implementation
  2 +
  3 +Server support for minorversion 1 can be controlled using the
  4 +/proc/fs/nfsd/versions control file. The string output returned
  5 +by reading this file will contain either "+4.1" or "-4.1"
  6 +correspondingly.
  7 +
  8 +Currently, server support for minorversion 1 is disabled by default.
  9 +It can be enabled at run time by writing the string "+4.1" to
  10 +the /proc/fs/nfsd/versions control file. Note that to write this
  11 +control file, the nfsd service must be taken down. Use your user-mode
  12 +nfs-utils to set this up; see rpc.nfsd(8).
  13 +
  14 +(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
  15 +"-4", respectively. Therefore, code meant to work on both new and old
  16 +kernels must turn 4.1 on or off *before* turning support for version 4
  17 +on or off; rpc.nfsd does this correctly.)
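
A minimal sketch of enabling it by hand (service commands vary by
distribution; as noted above, the nfsd service must be down while the
control file is written):

    $ /etc/init.d/nfs stop
    $ echo "+4.1" > /proc/fs/nfsd/versions
    $ /etc/init.d/nfs start
    $ cat /proc/fs/nfsd/versions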
  18 +
  19 +The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
  20 +on the latest NFSv4.1 Internet Draft:
  21 +http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29
  22 +
  23 +Of the many new features in NFSv4.1, the current implementation
  24 +focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
  25 +"exactly once" semantics and better control and throttling of the
  26 +resources allocated for each client.
  27 +
  28 +Other NFSv4.1 features, Parallel NFS operations in particular,
  29 +are still under development out of tree.
  30 +See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design
  31 +for more information.
  32 +
  33 +The current implementation is intended for developers only: while it
  34 +does support ordinary file operations on clients we have tested against
  35 +(including the linux client), it is incomplete in ways which may limit
  36 +features unexpectedly, cause known bugs in rare cases, or cause
  37 +interoperability problems with future clients. Known issues:
  38 +
  39 + - gss support is questionable: currently mounts with kerberos
  40 + from a linux client are possible, but we aren't really
  41 + conformant with the spec (for example, we don't use kerberos
  42 + on the backchannel correctly).
  43 + - no trunking support: no clients currently take advantage of
  44 + trunking, but this is a mandatory feature, and its use is
  45 + recommended to clients in a number of places. (E.g. to ensure
  46 + timely renewal in case an existing connection's retry timeouts
  47 + have gotten too long; see section 8.3 of the draft.)
  48 + Therefore, lack of this feature may cause future clients to
  49 + fail.
  50 + - Incomplete backchannel support: incomplete backchannel gss
  51 + support and no support for BACKCHANNEL_CTL mean that
  52 + callbacks (hence delegations and layouts) may not be
  53 + available and clients confused by the incomplete
  54 + implementation may fail.
  55 + - Server reboot recovery is unsupported; if the server reboots,
  56 + clients may fail.
  57 + - We do not support SSV, which provides security for shared
  58 + client-server state (thus preventing unauthorized tampering
  59 + with locks and opens, for example). It is mandatory for
  60 + servers to support this, though no clients use it yet.
  61 + - Mandatory operations which we do not support, such as
  62 + DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and
  63 + TEST_STATEID, are not currently used by clients, but will be
  64 + (and the spec recommends their uses in common cases), and
  65 + clients should not be expected to know how to recover from the
  66 + case where they are not supported. This will eventually cause
  67 + interoperability failures.
  68 +
  69 +In addition, some limitations are inherited from the current NFSv4
  70 +implementation:
  71 +
  72 + - Incomplete delegation enforcement: if a file is renamed or
  73 + unlinked, a client holding a delegation may continue to
  74 + indefinitely allow opens of the file under the old name.
  75 +
  76 +The table below, taken from the NFSv4.1 document, lists
  77 +the operations that are mandatory to implement (REQ), optional
  78 +(OPT), and NFSv4.0 operations that are required not to implement (MNI)
  79 +in minor version 1. The first column indicates the operations that
  80 +are not supported yet by the linux server implementation.
  81 +
  82 +The OPTIONAL features identified and their abbreviations are as follows:
  83 + pNFS Parallel NFS
  84 + FDELG File Delegations
  85 + DDELG Directory Delegations
  86 +
  87 +The following abbreviations indicate the linux server implementation status.
  88 + I Implemented NFSv4.1 operations.
  89 + NS Not Supported.
  90 + NS* unimplemented optional feature.
  91 + P pNFS features implemented out of tree.
  92 + PNS pNFS features that are not supported yet (out of tree).
  93 +
  94 +Operations
  95 +
  96 + +----------------------+------------+--------------+----------------+
  97 + | Operation | REQ, REC, | Feature | Definition |
  98 + | | OPT, or | (REQ, REC, | |
  99 + | | MNI | or OPT) | |
  100 + +----------------------+------------+--------------+----------------+
  101 + | ACCESS | REQ | | Section 18.1 |
  102 +NS | BACKCHANNEL_CTL | REQ | | Section 18.33 |
  103 +NS | BIND_CONN_TO_SESSION | REQ | | Section 18.34 |
  104 + | CLOSE | REQ | | Section 18.2 |
  105 + | COMMIT | REQ | | Section 18.3 |
  106 + | CREATE | REQ | | Section 18.4 |
  107 +I | CREATE_SESSION | REQ | | Section 18.36 |
  108 +NS*| DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 |
  109 + | DELEGRETURN | OPT | FDELG, | Section 18.6 |
  110 + | | | DDELG, pNFS | |
  111 + | | | (REQ) | |
  112 +NS | DESTROY_CLIENTID | REQ | | Section 18.50 |
  113 +I | DESTROY_SESSION | REQ | | Section 18.37 |
  114 +I | EXCHANGE_ID | REQ | | Section 18.35 |
  115 +NS | FREE_STATEID | REQ | | Section 18.38 |
  116 + | GETATTR | REQ | | Section 18.7 |
  117 +P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 |
  118 +P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 |
  119 + | GETFH | REQ | | Section 18.8 |
  120 +NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 |
  121 +P | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 |
  122 +P | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 |
  123 +P | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 |
  124 + | LINK | OPT | | Section 18.9 |
  125 + | LOCK | REQ | | Section 18.10 |
  126 + | LOCKT | REQ | | Section 18.11 |
  127 + | LOCKU | REQ | | Section 18.12 |
  128 + | LOOKUP | REQ | | Section 18.13 |
  129 + | LOOKUPP | REQ | | Section 18.14 |
  130 + | NVERIFY | REQ | | Section 18.15 |
  131 + | OPEN | REQ | | Section 18.16 |
  132 +NS*| OPENATTR | OPT | | Section 18.17 |
  133 + | OPEN_CONFIRM | MNI | | N/A |
  134 + | OPEN_DOWNGRADE | REQ | | Section 18.18 |
  135 + | PUTFH | REQ | | Section 18.19 |
  136 + | PUTPUBFH | REQ | | Section 18.20 |
  137 + | PUTROOTFH | REQ | | Section 18.21 |
  138 + | READ | REQ | | Section 18.22 |
  139 + | READDIR | REQ | | Section 18.23 |
  140 + | READLINK | OPT | | Section 18.24 |
  141 +NS | RECLAIM_COMPLETE | REQ | | Section 18.51 |
  142 + | RELEASE_LOCKOWNER | MNI | | N/A |
  143 + | REMOVE | REQ | | Section 18.25 |
  144 + | RENAME | REQ | | Section 18.26 |
  145 + | RENEW | MNI | | N/A |
  146 + | RESTOREFH | REQ | | Section 18.27 |
  147 + | SAVEFH | REQ | | Section 18.28 |
  148 + | SECINFO | REQ | | Section 18.29 |
  149 +NS | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, |
  150 + | | | layout (REQ) | Section 13.12 |
  151 +I | SEQUENCE | REQ | | Section 18.46 |
  152 + | SETATTR | REQ | | Section 18.30 |
  153 + | SETCLIENTID | MNI | | N/A |
  154 + | SETCLIENTID_CONFIRM | MNI | | N/A |
  155 +NS | SET_SSV | REQ | | Section 18.47 |
  156 +NS | TEST_STATEID | REQ | | Section 18.48 |
  157 + | VERIFY | REQ | | Section 18.31 |
  158 +NS*| WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 |
  159 + | WRITE | REQ | | Section 18.32 |
  160 +
  161 +Callback Operations
  162 +
  163 + +-------------------------+-----------+-------------+---------------+
  164 + | Operation | REQ, REC, | Feature | Definition |
  165 + | | OPT, or | (REQ, REC, | |
  166 + | | MNI | or OPT) | |
  167 + +-------------------------+-----------+-------------+---------------+
  168 + | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 |
  169 +P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 |
  170 +NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 |
  171 +P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 |
  172 +NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 |
  173 +NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 |
  174 + | CB_RECALL | OPT | FDELG, | Section 20.2 |
  175 + | | | DDELG, pNFS | |
  176 + | | | (REQ) | |
  177 +NS*| CB_RECALL_ANY | OPT | FDELG, | Section 20.6 |
  178 + | | | DDELG, pNFS | |
  179 + | | | (REQ) | |
  180 +NS | CB_RECALL_SLOT | REQ | | Section 20.8 |
  181 +NS*| CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS | Section 20.7 |
  182 + | | | (REQ) | |
  183 +I | CB_SEQUENCE | OPT | FDELG, | Section 20.9 |
  184 + | | | DDELG, pNFS | |
  185 + | | | (REQ) | |
  186 +NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 |
  187 + | | | DDELG, pNFS | |
  188 + | | | (REQ) | |
  189 + +-------------------------+-----------+-------------+---------------+
  190 +
  191 +Implementation notes:
  192 +
  193 +DELEGPURGE:
  194 +* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or
  195 + CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that
  196 + persist across client reboots). Thus we need not implement this for
  197 + now.
  198 +
  199 +EXCHANGE_ID:
  200 +* only SP4_NONE state protection supported
  201 +* implementation ids are ignored
  202 +
  203 +CREATE_SESSION:
  204 +* backchannel attributes are ignored
  205 +* backchannel security parameters are ignored
  206 +
  207 +SEQUENCE:
  208 +* no support for dynamic slot table renegotiation (optional)
  209 +
  210 +nfsv4.1 COMPOUND rules:
  211 +The following cases aren't supported yet:
  212 +* Enforcing of NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION,
  213 + DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID.
  214 +* DESTROY_SESSION MUST be the final operation in the COMPOUND request.
  215 +
  216 +Nonstandard compound limitations:
  217 +* No support for a session's fore channel RPC compound that requires both a
  218 + ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
  219 + fail to live up to the promise we made in CREATE_SESSION fore channel
  220 + negotiation.
  221 +* No more than one IO operation (read, write, readdir) allowed per
  222 + compound.
Documentation/filesystems/nfs/nfsroot.txt
  1 +Mounting the root filesystem via NFS (nfsroot)
  2 +===============================================
  3 +
  4 +Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
  5 +Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
  6 +Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
  7 +Updated 2006 by Horms <horms@verge.net.au>
  8 +
  9 +
  10 +
  11 +In order to use a diskless system, such as an X-terminal or printer server
  12 +for example, it is necessary for the root filesystem to be present on a
  13 +non-disk device. This may be an initramfs (see Documentation/filesystems/
  14 +ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a
  15 +filesystem mounted via NFS. The following text describes how to use NFS
  16 +for the root filesystem. For the rest of this text 'client' means the
  17 +diskless system, and 'server' means the NFS server.
  18 +
  19 +
  20 +
  21 +
  22 +1.) Enabling nfsroot capabilities
  23 + -----------------------------
  24 +
  25 +In order to use nfsroot, NFS client support needs to be selected as
  26 +built-in during configuration. Once this has been selected, the nfsroot
  27 +option will become available, which should also be selected.
  28 +
  29 +In the networking options, kernel level autoconfiguration can be selected,
  30 +along with the types of autoconfiguration to support. Selecting all of
  31 +DHCP, BOOTP and RARP is safe.
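
As a rough sketch, such a configuration usually ends up with settings along
these lines in the kernel .config (option names may differ slightly between
kernel versions, so check the configuration menus of your tree):

	# NFS client and nfsroot support built into the kernel (not as modules)
	CONFIG_NFS_FS=y
	CONFIG_ROOT_NFS=y
	# Kernel level IP autoconfiguration plus the individual protocols
	CONFIG_IP_PNP=y
	CONFIG_IP_PNP_DHCP=y
	CONFIG_IP_PNP_BOOTP=y
	CONFIG_IP_PNP_RARP=y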
  32 +
  33 +
  34 +
  35 +
  36 +2.) Kernel command line
  37 + -------------------
  38 +
  39 +When the kernel has been loaded by a boot loader (see below) it needs to be
   40 +told what root fs device to use and, in the case of nfsroot, where to find
  41 +both the server and the name of the directory on the server to mount as root.
  42 +This can be established using the following kernel command line parameters:
  43 +
  44 +
  45 +root=/dev/nfs
  46 +
  47 + This is necessary to enable the pseudo-NFS-device. Note that it's not a
  48 + real device but just a synonym to tell the kernel to use NFS instead of
  49 + a real device.
  50 +
  51 +
  52 +nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
  53 +
  54 + If the `nfsroot' parameter is NOT given on the command line,
  55 + the default "/tftpboot/%s" will be used.
  56 +
  57 + <server-ip> Specifies the IP address of the NFS server.
  58 + The default address is determined by the `ip' parameter
  59 + (see below). This parameter allows the use of different
  60 + servers for IP autoconfiguration and NFS.
  61 +
  62 + <root-dir> Name of the directory on the server to mount as root.
  63 + If there is a "%s" token in the string, it will be
  64 + replaced by the ASCII-representation of the client's
  65 + IP address.
  66 +
  67 + <nfs-options> Standard NFS options. All options are separated by commas.
  68 + The following defaults are used:
  69 + port = as given by server portmap daemon
  70 + rsize = 4096
  71 + wsize = 4096
  72 + timeo = 7
  73 + retrans = 3
  74 + acregmin = 3
  75 + acregmax = 60
  76 + acdirmin = 30
  77 + acdirmax = 60
  78 + flags = hard, nointr, noposix, cto, ac
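
  For example (the server address and export path below are purely
  illustrative), a client that wants larger transfer sizes than the
  defaults could use:

	nfsroot=192.168.0.1:/export/clients/%s,rsize=8192,wsize=8192

  With the "%s" token as shown, a client whose IP address is 192.168.0.2
  would mount 192.168.0.1:/export/clients/192.168.0.2 as its root.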
  79 +
  80 +
  81 +ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
  82 +
  83 + This parameter tells the kernel how to configure IP addresses of devices
  84 + and also how to set up the IP routing table. It was originally called
  85 + `nfsaddrs', but now the boot-time IP configuration works independently of
  86 + NFS, so it was renamed to `ip' and the old name remained as an alias for
  87 + compatibility reasons.
  88 +
  89 + If this parameter is missing from the kernel command line, all fields are
  90 + assumed to be empty, and the defaults mentioned below apply. In general
  91 + this means that the kernel tries to configure everything using
  92 + autoconfiguration.
  93 +
  94 + The <autoconf> parameter can appear alone as the value to the `ip'
  95 + parameter (without all the ':' characters before). If the value is
  96 + "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
  97 + autoconfiguration will take place. The most common way to use this
  98 + is "ip=dhcp".
  99 +
  100 + <client-ip> IP address of the client.
  101 +
  102 + Default: Determined using autoconfiguration.
  103 +
  104 + <server-ip> IP address of the NFS server. If RARP is used to determine
  105 + the client address and this parameter is NOT empty, only
  106 + replies from the specified server are accepted.
  107 +
  108 + Only required for NFS root. That is, autoconfiguration
  109 + will not be triggered if it is missing and NFS root is not
  110 + in operation.
  111 +
  112 + Default: Determined using autoconfiguration.
  113 + The address of the autoconfiguration server is used.
  114 +
  115 + <gw-ip> IP address of a gateway if the server is on a different subnet.
  116 +
  117 + Default: Determined using autoconfiguration.
  118 +
  119 + <netmask> Netmask for local network interface. If unspecified
  120 + the netmask is derived from the client IP address assuming
  121 + classful addressing.
  122 +
  123 + Default: Determined using autoconfiguration.
  124 +
  125 + <hostname> Name of the client. May be supplied by autoconfiguration,
  126 + but its absence will not trigger autoconfiguration.
  127 +
  128 + Default: Client IP address is used in ASCII notation.
  129 +
  130 + <device> Name of network device to use.
  131 +
  132 + Default: If the host only has one device, it is used.
  133 + Otherwise the device is determined using
  134 + autoconfiguration. This is done by sending
  135 + autoconfiguration requests out of all devices,
  136 + and using the device that received the first reply.
  137 +
  138 + <autoconf> Method to use for autoconfiguration. In the case of options
  139 + which specify multiple autoconfiguration protocols,
  140 + requests are sent using all protocols, and the first one
  141 + to reply is used.
  142 +
  143 + Only autoconfiguration protocols that have been compiled
  144 + into the kernel will be used, regardless of the value of
  145 + this option.
  146 +
  147 + off or none: don't use autoconfiguration
  148 + (do static IP assignment instead)
  149 + on or any: use any protocol available in the kernel
  150 + (default)
  151 + dhcp: use DHCP
  152 + bootp: use BOOTP
  153 + rarp: use RARP
  154 + both: use both BOOTP and RARP but not DHCP
  155 + (old option kept for backwards compatibility)
  156 +
  157 + Default: any
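
  Putting the pieces together, two illustrative complete command lines
  (all addresses, the export path and the device name below are
  placeholders) are:

	root=/dev/nfs nfsroot=192.168.0.1:/export/client ip=dhcp

	root=/dev/nfs nfsroot=192.168.0.1:/export/client ip=192.168.0.2:192.168.0.1:192.168.0.254:255.255.255.0:client:eth0:off

  The first leaves all network settings to DHCP; the second configures
  everything statically and turns autoconfiguration off.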
  158 +
  159 +
  160 +
  161 +
  162 +3.) Boot Loader
  163 + ----------
  164 +
  165 +To get the kernel into memory different approaches can be used.
  166 +They depend on various facilities being available:
  167 +
  168 +
  169 +3.1) Booting from a floppy using syslinux
  170 +
  171 + When building kernels, an easy way to create a boot floppy that uses
  172 + syslinux is to use the zdisk or bzdisk make targets which use zImage
  173 + and bzImage images respectively. Both targets accept the
  174 + FDARGS parameter which can be used to set the kernel command line.
  175 +
  176 + e.g.
  177 + make bzdisk FDARGS="root=/dev/nfs"
  178 +
  179 + Note that the user running this command will need to have
  180 + access to the floppy drive device, /dev/fd0.
  181 +
  182 + For more information on syslinux, including how to create bootdisks
  183 + for prebuilt kernels, see http://syslinux.zytor.com/
  184 +
  185 + N.B: Previously it was possible to write a kernel directly to
  186 + a floppy using dd, configure the boot device using rdev, and
  187 + boot using the resulting floppy. Linux no longer supports this
  188 + method of booting.
  189 +
  190 +3.2) Booting from a cdrom using isolinux
  191 +
  192 + When building kernels, an easy way to create a bootable cdrom that
  193 + uses isolinux is to use the isoimage target which uses a bzImage
  194 + image. Like zdisk and bzdisk, this target accepts the FDARGS
  195 + parameter which can be used to set the kernel command line.
  196 +
  197 + e.g.
  198 + make isoimage FDARGS="root=/dev/nfs"
  199 +
  200 + The resulting iso image will be arch/<ARCH>/boot/image.iso
  201 + This can be written to a cdrom using a variety of tools including
  202 + cdrecord.
  203 +
  204 + e.g.
  205 + cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso
  206 +
  207 + For more information on isolinux, including how to create bootdisks
  208 + for prebuilt kernels, see http://syslinux.zytor.com/
  209 +
  210 +3.3) Using LILO
  211 + When using LILO all the necessary command line parameters may be
  212 + specified using the 'append=' directive in the LILO configuration
  213 + file.
  214 +
  215 + However, to use the 'root=' directive you also need to create
  216 + a dummy root device, which may be removed after LILO is run.
  217 +
  218 + mknod /dev/boot255 c 0 255
  219 +
  220 + For information on configuring LILO, please refer to its documentation.
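
  As a sketch (the kernel image path, label, server address and export
  path are placeholders), a lilo.conf stanza using the dummy device
  created above might look like:

	image=/boot/vmlinuz
	    label=nfsroot
	    root=/dev/boot255
	    append="nfsroot=192.168.0.1:/export/client ip=dhcp"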
  221 +
  222 +3.4) Using GRUB
  223 + When using GRUB, kernel parameters are simply appended after the kernel
  224 + specification: kernel <kernel> <parameters>
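
  For example, a menu entry might contain a line like the following (the
  kernel path, server address and export path are only placeholders):

	kernel /boot/vmlinuz root=/dev/nfs nfsroot=192.168.0.1:/export/client ip=dhcp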
  225 +
  226 +3.5) Using loadlin
  227 + loadlin may be used to boot Linux from a DOS command prompt without
  228 + requiring a local hard disk to mount as root. This has not been
  229 + thoroughly tested by the authors of this document, but in general
  230 + it should be possible to configure the kernel command line similarly
  231 + to the configuration of LILO.
  232 +
  233 + Please refer to the loadlin documentation for further information.
  234 +
  235 +3.6) Using a boot ROM
  236 + This is probably the most elegant way of booting a diskless client.
  237 + With a boot ROM the kernel is loaded using the TFTP protocol. The
  238 + authors of this document are not aware of any commercial boot
  239 + ROMs that support booting Linux over the network. However, there
  240 + are two free implementations of a boot ROM, netboot-nfs and
  241 + etherboot, both of which are available on sunsite.unc.edu, and both
  242 + of which contain everything you need to boot a diskless Linux client.
  243 +
  244 +3.7) Using pxelinux
  245 + Pxelinux may be used to boot Linux using the PXE boot loader
  246 + which is present on many modern network cards.
  247 +
  248 + When using pxelinux, the kernel image is specified using
  249 + "kernel <relative-path-below /tftpboot>". The nfsroot parameters
  250 + are passed to the kernel by adding them to the "append" line.
  251 + It is common to use a serial console in conjunction with pxelinux,
  252 + see Documentation/serial-console.txt for more information.
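
  As a sketch, a pxelinux.cfg/default file for such a client could
  contain something like the following (kernel file name, addresses and
  export path are placeholders; the console option is optional):

	default nfsroot
	label nfsroot
	    kernel vmlinuz
	    append root=/dev/nfs nfsroot=192.168.0.1:/export/client ip=dhcp console=ttyS0,115200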
  253 +
  254 + For more information on pxelinux, including how to create bootdisks
  255 + for prebuilt kernels, see http://syslinux.zytor.com/
  256 +
  257 +
  258 +
  259 +
  260 +4.) Credits
  261 + -------
  262 +
  263 + The nfsroot code in the kernel and the RARP support have been written
  264 + by Gero Kuhlmann <gero@gkminix.han.de>.
  265 +
  266 + The rest of the IP layer autoconfiguration code has been written
  267 + by Martin Mares <mj@atrey.karlin.mff.cuni.cz>.
  268 +
  269 + In order to write the initial version of nfsroot I would like to thank
  270 + Jens-Uwe Mager <jum@anubis.han.de> for his help.
Documentation/filesystems/nfs41-server.txt
1   -NFSv4.1 Server Implementation
2   -
3   -Server support for minorversion 1 can be controlled using the
4   -/proc/fs/nfsd/versions control file. The string output returned
5   -by reading this file will contain either "+4.1" or "-4.1"
6   -correspondingly.
7   -
8   -Currently, server support for minorversion 1 is disabled by default.
9   -It can be enabled at run time by writing the string "+4.1" to
10   -the /proc/fs/nfsd/versions control file. Note that to write this
11   -control file, the nfsd service must be taken down. Use your user-mode
12   -nfs-utils to set this up; see rpc.nfsd(8)
13   -
14   -(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
15   -"-4", respectively. Therefore, code meant to work on both new and old
16   -kernels must turn 4.1 on or off *before* turning support for version 4
17   -on or off; rpc.nfsd does this correctly.)
18   -
19   -The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
20   -on the latest NFSv4.1 Internet Draft:
21   -http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29
22   -
23   -From the many new features in NFSv4.1 the current implementation
24   -focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
25   -"exactly once" semantics and better control and throttling of the
26   -resources allocated for each client.
27   -
28   -Other NFSv4.1 features, Parallel NFS operations in particular,
29   -are still under development out of tree.
30   -See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design
31   -for more information.
32   -
33   -The current implementation is intended for developers only: while it
34   -does support ordinary file operations on clients we have tested against
35   -(including the linux client), it is incomplete in ways which may limit
36   -features unexpectedly, cause known bugs in rare cases, or cause
37   -interoperability problems with future clients. Known issues:
38   -
39   - - gss support is questionable: currently mounts with kerberos
40   - from a linux client are possible, but we aren't really
41   - conformant with the spec (for example, we don't use kerberos
42   - on the backchannel correctly).
43   - - no trunking support: no clients currently take advantage of
44   - trunking, but this is a mandatory feature, and its use is
45   - recommended to clients in a number of places. (E.g. to ensure
46   - timely renewal in case an existing connection's retry timeouts
47   - have gotten too long; see section 8.3 of the draft.)
48   - Therefore, lack of this feature may cause future clients to
49   - fail.
50   - - Incomplete backchannel support: incomplete backchannel gss
51   - support and no support for BACKCHANNEL_CTL mean that
52   - callbacks (hence delegations and layouts) may not be
53   - available and clients confused by the incomplete
54   - implementation may fail.
55   - - Server reboot recovery is unsupported; if the server reboots,
56   - clients may fail.
57   - - We do not support SSV, which provides security for shared
58   - client-server state (thus preventing unauthorized tampering
59   - with locks and opens, for example). It is mandatory for
60   - servers to support this, though no clients use it yet.
61   - - Mandatory operations which we do not support, such as
62   - DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and
63   - TEST_STATEID, are not currently used by clients, but will be
64   - (and the spec recommends their uses in common cases), and
65   - clients should not be expected to know how to recover from the
66   - case where they are not supported. This will eventually cause
67   - interoperability failures.
68   -
69   -In addition, some limitations are inherited from the current NFSv4
70   -implementation:
71   -
72   - - Incomplete delegation enforcement: if a file is renamed or
73   - unlinked, a client holding a delegation may continue to
74   - indefinitely allow opens of the file under the old name.
75   -
76   -The table below, taken from the NFSv4.1 document, lists
77   -the operations that are mandatory to implement (REQ), optional
78   -(OPT), and NFSv4.0 operations that are required not to implement (MNI)
79   -in minor version 1. The first column indicates the operations that
80   -are not supported yet by the linux server implementation.
81   -
82   -The OPTIONAL features identified and their abbreviations are as follows:
83   - pNFS Parallel NFS
84   - FDELG File Delegations
85   - DDELG Directory Delegations
86   -
87   -The following abbreviations indicate the linux server implementation status.
88   - I Implemented NFSv4.1 operations.
89   - NS Not Supported.
90   - NS* unimplemented optional feature.
91   - P pNFS features implemented out of tree.
92   - PNS pNFS features that are not supported yet (out of tree).
93   -
94   -Operations
95   -
96   - +----------------------+------------+--------------+----------------+
97   - | Operation | REQ, REC, | Feature | Definition |
98   - | | OPT, or | (REQ, REC, | |
99   - | | MNI | or OPT) | |
100   - +----------------------+------------+--------------+----------------+
101   - | ACCESS | REQ | | Section 18.1 |
102   -NS | BACKCHANNEL_CTL | REQ | | Section 18.33 |
103   -NS | BIND_CONN_TO_SESSION | REQ | | Section 18.34 |
104   - | CLOSE | REQ | | Section 18.2 |
105   - | COMMIT | REQ | | Section 18.3 |
106   - | CREATE | REQ | | Section 18.4 |
107   -I | CREATE_SESSION | REQ | | Section 18.36 |
108   -NS*| DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 |
109   - | DELEGRETURN | OPT | FDELG, | Section 18.6 |
110   - | | | DDELG, pNFS | |
111   - | | | (REQ) | |
112   -NS | DESTROY_CLIENTID | REQ | | Section 18.50 |
113   -I | DESTROY_SESSION | REQ | | Section 18.37 |
114   -I | EXCHANGE_ID | REQ | | Section 18.35 |
115   -NS | FREE_STATEID | REQ | | Section 18.38 |
116   - | GETATTR | REQ | | Section 18.7 |
117   -P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 |
118   -P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 |
119   - | GETFH | REQ | | Section 18.8 |
120   -NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 |
121   -P | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 |
122   -P | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 |
123   -P | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 |
124   - | LINK | OPT | | Section 18.9 |
125   - | LOCK | REQ | | Section 18.10 |
126   - | LOCKT | REQ | | Section 18.11 |
127   - | LOCKU | REQ | | Section 18.12 |
128   - | LOOKUP | REQ | | Section 18.13 |
129   - | LOOKUPP | REQ | | Section 18.14 |
130   - | NVERIFY | REQ | | Section 18.15 |
131   - | OPEN | REQ | | Section 18.16 |
132   -NS*| OPENATTR | OPT | | Section 18.17 |
133   - | OPEN_CONFIRM | MNI | | N/A |
134   - | OPEN_DOWNGRADE | REQ | | Section 18.18 |
135   - | PUTFH | REQ | | Section 18.19 |
136   - | PUTPUBFH | REQ | | Section 18.20 |
137   - | PUTROOTFH | REQ | | Section 18.21 |
138   - | READ | REQ | | Section 18.22 |
139   - | READDIR | REQ | | Section 18.23 |
140   - | READLINK | OPT | | Section 18.24 |
141   -NS | RECLAIM_COMPLETE | REQ | | Section 18.51 |
142   - | RELEASE_LOCKOWNER | MNI | | N/A |
143   - | REMOVE | REQ | | Section 18.25 |
144   - | RENAME | REQ | | Section 18.26 |
145   - | RENEW | MNI | | N/A |
146   - | RESTOREFH | REQ | | Section 18.27 |
147   - | SAVEFH | REQ | | Section 18.28 |
148   - | SECINFO | REQ | | Section 18.29 |
149   -NS | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, |
150   - | | | layout (REQ) | Section 13.12 |
151   -I | SEQUENCE | REQ | | Section 18.46 |
152   - | SETATTR | REQ | | Section 18.30 |
153   - | SETCLIENTID | MNI | | N/A |
154   - | SETCLIENTID_CONFIRM | MNI | | N/A |
155   -NS | SET_SSV | REQ | | Section 18.47 |
156   -NS | TEST_STATEID | REQ | | Section 18.48 |
157   - | VERIFY | REQ | | Section 18.31 |
158   -NS*| WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 |
159   - | WRITE | REQ | | Section 18.32 |
160   -
161   -Callback Operations
162   -
163   - +-------------------------+-----------+-------------+---------------+
164   - | Operation | REQ, REC, | Feature | Definition |
165   - | | OPT, or | (REQ, REC, | |
166   - | | MNI | or OPT) | |
167   - +-------------------------+-----------+-------------+---------------+
168   - | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 |
169   -P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 |
170   -NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 |
171   -P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 |
172   -NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 |
173   -NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 |
174   - | CB_RECALL | OPT | FDELG, | Section 20.2 |
175   - | | | DDELG, pNFS | |
176   - | | | (REQ) | |
177   -NS*| CB_RECALL_ANY | OPT | FDELG, | Section 20.6 |
178   - | | | DDELG, pNFS | |
179   - | | | (REQ) | |
180   -NS | CB_RECALL_SLOT | REQ | | Section 20.8 |
181   -NS*| CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS | Section 20.7 |
182   - | | | (REQ) | |
183   -I | CB_SEQUENCE | OPT | FDELG, | Section 20.9 |
184   - | | | DDELG, pNFS | |
185   - | | | (REQ) | |
186   -NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 |
187   - | | | DDELG, pNFS | |
188   - | | | (REQ) | |
189   - +-------------------------+-----------+-------------+---------------+
190   -
191   -Implementation notes:
192   -
193   -DELEGPURGE:
194   -* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or
195   - CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that
196   - persist across client reboots). Thus we need not implement this for
197   - now.
198   -
199   -EXCHANGE_ID:
200   -* only SP4_NONE state protection supported
201   -* implementation ids are ignored
202   -
203   -CREATE_SESSION:
204   -* backchannel attributes are ignored
205   -* backchannel security parameters are ignored
206   -
207   -SEQUENCE:
208   -* no support for dynamic slot table renegotiation (optional)
209   -
210   -nfsv4.1 COMPOUND rules:
211   -The following cases aren't supported yet:
212   -* Enforcing of NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION,
213   - DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID.
214   -* DESTROY_SESSION MUST be the final operation in the COMPOUND request.
215   -
216   -Nonstandard compound limitations:
217   -* No support for a sessions fore channel RPC compound that requires both a
218   - ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
219   - fail to live up to the promise we made in CREATE_SESSION fore channel
220   - negotiation.
221   -* No more than one IO operation (read, write, readdir) allowed per
222   - compound.
Documentation/filesystems/nfsroot.txt
1   -Mounting the root filesystem via NFS (nfsroot)
2   -===============================================
3   -
4   -Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
5   -Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
6   -Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
7   -Updated 2006 by Horms <horms@verge.net.au>
8   -
9   -
10   -
11   -In order to use a diskless system, such as an X-terminal or printer server
12   -for example, it is necessary for the root filesystem to be present on a
13   -non-disk device. This may be an initramfs (see Documentation/filesystems/
14   -ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a
15   -filesystem mounted via NFS. The following text describes on how to use NFS
16   -for the root filesystem. For the rest of this text 'client' means the
17   -diskless system, and 'server' means the NFS server.
18   -
19   -
20   -
21   -
22   -1.) Enabling nfsroot capabilities
23   - -----------------------------
24   -
25   -In order to use nfsroot, NFS client support needs to be selected as
26   -built-in during configuration. Once this has been selected, the nfsroot
27   -option will become available, which should also be selected.
28   -
29   -In the networking options, kernel level autoconfiguration can be selected,
30   -along with the types of autoconfiguration to support. Selecting all of
31   -DHCP, BOOTP and RARP is safe.
32   -
33   -
34   -
35   -
36   -2.) Kernel command line
37   - -------------------
38   -
39   -When the kernel has been loaded by a boot loader (see below) it needs to be
40   -told what root fs device to use. And in the case of nfsroot, where to find
41   -both the server and the name of the directory on the server to mount as root.
42   -This can be established using the following kernel command line parameters:
43   -
44   -
45   -root=/dev/nfs
46   -
47   - This is necessary to enable the pseudo-NFS-device. Note that it's not a
48   - real device but just a synonym to tell the kernel to use NFS instead of
49   - a real device.
50   -
51   -
52   -nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
53   -
54   - If the `nfsroot' parameter is NOT given on the command line,
55   - the default "/tftpboot/%s" will be used.
56   -
57   - <server-ip> Specifies the IP address of the NFS server.
58   - The default address is determined by the `ip' parameter
59   - (see below). This parameter allows the use of different
60   - servers for IP autoconfiguration and NFS.
61   -
62   - <root-dir> Name of the directory on the server to mount as root.
63   - If there is a "%s" token in the string, it will be
64   - replaced by the ASCII-representation of the client's
65   - IP address.
66   -
67   - <nfs-options> Standard NFS options. All options are separated by commas.
68   - The following defaults are used:
69   - port = as given by server portmap daemon
70   - rsize = 4096
71   - wsize = 4096
72   - timeo = 7
73   - retrans = 3
74   - acregmin = 3
75   - acregmax = 60
76   - acdirmin = 30
77   - acdirmax = 60
78   - flags = hard, nointr, noposix, cto, ac
79   -
80   -
81   -ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
82   -
83   - This parameter tells the kernel how to configure IP addresses of devices
84   - and also how to set up the IP routing table. It was originally called
85   - `nfsaddrs', but now the boot-time IP configuration works independently of
86   - NFS, so it was renamed to `ip' and the old name remained as an alias for
87   - compatibility reasons.
88   -
89   - If this parameter is missing from the kernel command line, all fields are
90   - assumed to be empty, and the defaults mentioned below apply. In general
91   - this means that the kernel tries to configure everything using
92   - autoconfiguration.
93   -
94   - The <autoconf> parameter can appear alone as the value to the `ip'
95   - parameter (without all the ':' characters before). If the value is
96   - "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
97   - autoconfiguration will take place. The most common way to use this
98   - is "ip=dhcp".
99   -
100   - <client-ip> IP address of the client.
101   -
102   - Default: Determined using autoconfiguration.
103   -
104   - <server-ip> IP address of the NFS server. If RARP is used to determine
105   - the client address and this parameter is NOT empty only
106   - replies from the specified server are accepted.
107   -
108   - Only required for NFS root. That is autoconfiguration
109   - will not be triggered if it is missing and NFS root is not
110   - in operation.
111   -
112   - Default: Determined using autoconfiguration.
113   - The address of the autoconfiguration server is used.
114   -
115   - <gw-ip> IP address of a gateway if the server is on a different subnet.
116   -
117   - Default: Determined using autoconfiguration.
118   -
119   - <netmask> Netmask for local network interface. If unspecified
120   - the netmask is derived from the client IP address assuming
121   - classful addressing.
122   -
123   - Default: Determined using autoconfiguration.
124   -
125   - <hostname> Name of the client. May be supplied by autoconfiguration,
126   - but its absence will not trigger autoconfiguration.
127   -
128   - Default: Client IP address is used in ASCII notation.
129   -
130   - <device> Name of network device to use.
131   -
132   - Default: If the host only has one device, it is used.
133   - Otherwise the device is determined using
134   - autoconfiguration. This is done by sending
135   - autoconfiguration requests out of all devices,
136   - and using the device that received the first reply.
137   -
138   - <autoconf> Method to use for autoconfiguration. In the case of options
139   - which specify multiple autoconfiguration protocols,
140   - requests are sent using all protocols, and the first one
141   - to reply is used.
142   -
143   - Only autoconfiguration protocols that have been compiled
144   - into the kernel will be used, regardless of the value of
145   - this option.
146   -
147   - off or none: don't use autoconfiguration
148   - (do static IP assignment instead)
149   - on or any: use any protocol available in the kernel
150   - (default)
151   - dhcp: use DHCP
152   - bootp: use BOOTP
153   - rarp: use RARP
154   - both: use both BOOTP and RARP but not DHCP
155   - (old option kept for backwards compatibility)
156   -
157   - Default: any
158   -
159   -
160   -
161   -
162   -3.) Boot Loader
163   - ----------
164   -
165   -To get the kernel into memory different approaches can be used.
166   -They depend on various facilities being available:
167   -
168   -
169   -3.1) Booting from a floppy using syslinux
170   -
171   - When building kernels, an easy way to create a boot floppy that uses
172   - syslinux is to use the zdisk or bzdisk make targets which use zimage
173   - and bzimage images respectively. Both targets accept the
174   - FDARGS parameter which can be used to set the kernel command line.
175   -
176   - e.g.
177   - make bzdisk FDARGS="root=/dev/nfs"
178   -
179   - Note that the user running this command will need to have
180   - access to the floppy drive device, /dev/fd0
181   -
182   - For more information on syslinux, including how to create bootdisks
183   - for prebuilt kernels, see http://syslinux.zytor.com/
184   -
185   - N.B: Previously it was possible to write a kernel directly to
186   - a floppy using dd, configure the boot device using rdev, and
187   - boot using the resulting floppy. Linux no longer supports this
188   - method of booting.
189   -
190   -3.2) Booting from a cdrom using isolinux
191   -
192   - When building kernels, an easy way to create a bootable cdrom that
193   - uses isolinux is to use the isoimage target which uses a bzimage
194   - image. Like zdisk and bzdisk, this target accepts the FDARGS
195   - parameter which can be used to set the kernel command line.
196   -
197   - e.g.
198   - make isoimage FDARGS="root=/dev/nfs"
199   -
200   - The resulting iso image will be arch/<ARCH>/boot/image.iso
201   - This can be written to a cdrom using a variety of tools including
202   - cdrecord.
203   -
204   - e.g.
205   - cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso
206   -
207   - For more information on isolinux, including how to create bootdisks
208   - for prebuilt kernels, see http://syslinux.zytor.com/
209   -
210   -3.2) Using LILO
211   - When using LILO all the necessary command line parameters may be
212   - specified using the 'append=' directive in the LILO configuration
213   - file.
214   -
215   - However, to use the 'root=' directive you also need to create
216   - a dummy root device, which may be removed after LILO is run.
217   -
218   - mknod /dev/boot255 c 0 255
219   -
220   - For information on configuring LILO, please refer to its documentation.
221   -
222   -3.3) Using GRUB
223   - When using GRUB, kernel parameter are simply appended after the kernel
224   - specification: kernel <kernel> <parameters>
225   -
226   -3.4) Using loadlin
227   - loadlin may be used to boot Linux from a DOS command prompt without
228   - requiring a local hard disk to mount as root. This has not been
229   - thoroughly tested by the authors of this document, but in general
230   - it should be possible configure the kernel command line similarly
231   - to the configuration of LILO.
232   -
233   - Please refer to the loadlin documentation for further information.
234   -
235   -3.5) Using a boot ROM
236   - This is probably the most elegant way of booting a diskless client.
237   - With a boot ROM the kernel is loaded using the TFTP protocol. The
238   - authors of this document are not aware of any no commercial boot
239   - ROMs that support booting Linux over the network. However, there
240   - are two free implementations of a boot ROM, netboot-nfs and
241   - etherboot, both of which are available on sunsite.unc.edu, and both
242   - of which contain everything you need to boot a diskless Linux client.
243   -
244   -3.6) Using pxelinux
245   - Pxelinux may be used to boot linux using the PXE boot loader
246   - which is present on many modern network cards.
247   -
248   - When using pxelinux, the kernel image is specified using
249   - "kernel <relative-path-below /tftpboot>". The nfsroot parameters
250   - are passed to the kernel by adding them to the "append" line.
251   - It is common to use serial console in conjunction with pxeliunx,
252   - see Documentation/serial-console.txt for more information.
253   -
254   - For more information on isolinux, including how to create bootdisks
255   - for prebuilt kernels, see http://syslinux.zytor.com/
256   -
257   -
258   -
259   -
260   -4.) Credits
261   - -------
262   -
263   - The nfsroot code in the kernel and the RARP support have been written
264   - by Gero Kuhlmann <gero@gkminix.han.de>.
265   -
266   - The rest of the IP layer autoconfiguration code has been written
267   - by Martin Mares <mj@atrey.karlin.mff.cuni.cz>.
268   -
269   - In order to write the initial version of nfsroot I would like to thank
270   - Jens-Uwe Mager <jum@anubis.han.de> for his help.
Documentation/filesystems/porting
... ... @@ -140,7 +140,7 @@
140 140 New super_block field "struct export_operations *s_export_op" for
141 141 explicit support for exporting, e.g. via NFS. The structure is fully
142 142 documented at its declaration in include/linux/fs.h, and in
143   -Documentation/filesystems/Exporting.
  143 +Documentation/filesystems/nfs/Exporting.
144 144  
145 145 Briefly it allows for the definition of decode_fh and encode_fh operations
146 146 to encode and decode filehandles, and allows the filesystem to use
Documentation/kernel-parameters.txt
... ... @@ -1017,7 +1017,7 @@
1017 1017 No delay
1018 1018  
1019 1019 ip= [IP_PNP]
1020   - See Documentation/filesystems/nfsroot.txt.
  1020 + See Documentation/filesystems/nfs/nfsroot.txt.
1021 1021  
1022 1022 ip2= [HW] Set IO/IRQ pairs for up to 4 IntelliPort boards
1023 1023 See comment before ip2_setup() in
1024 1024  
... ... @@ -1538,10 +1538,10 @@
1538 1538 going to be removed in 2.6.29.
1539 1539  
1540 1540 nfsaddrs= [NFS]
1541   - See Documentation/filesystems/nfsroot.txt.
  1541 + See Documentation/filesystems/nfs/nfsroot.txt.
1542 1542  
1543 1543 nfsroot= [NFS] nfs root filesystem for disk-less boxes.
1544   - See Documentation/filesystems/nfsroot.txt.
  1544 + See Documentation/filesystems/nfs/nfsroot.txt.
1545 1545  
1546 1546 nfs.callback_tcpport=
1547 1547 [NFS] set the TCP port on which the NFSv4 callback
... ... @@ -24,7 +24,7 @@
24 24 */
25 25  
26 26 /*
27   - * See Documentation/filesystems/Exporting
  27 + * See Documentation/filesystems/nfs/Exporting
28 28 * and examples in fs/exportfs
29 29 *
30 30 * Since cifs is a network file system, an "fsid" must be included for
... ... @@ -6,7 +6,7 @@
6 6 * and for mapping back from file handles to dentries.
7 7 *
8 8 * For details on why we do all the strange and hairy things in here
9   - * take a look at Documentation/filesystems/Exporting.
  9 + * take a look at Documentation/filesystems/nfs/Exporting.
10 10 */
11 11 #include <linux/exportfs.h>
12 12 #include <linux/fs.h>
... ... @@ -9,7 +9,7 @@
9 9 *
10 10 * The following files are helpful:
11 11 *
12   - * Documentation/filesystems/Exporting
  12 + * Documentation/filesystems/nfs/Exporting
13 13 * fs/exportfs/expfs.c.
14 14 */
15 15  
... ... @@ -90,7 +90,7 @@
90 90 If you want your system to mount its root file system via NFS,
91 91 choose Y here. This is common practice for managing systems
92 92 without local permanent storage. For details, read
93   - <file:Documentation/filesystems/nfsroot.txt>.
  93 + <file:Documentation/filesystems/nfs/nfsroot.txt>.
94 94  
95 95 Most people say N here.
96 96  
include/linux/exportfs.h
... ... @@ -97,7 +97,7 @@
97 97 * @get_name: find the name for a given inode in a given directory
98 98 * @get_parent: find the parent of a given directory
99 99 *
100   - * See Documentation/filesystems/Exporting for details on how to use
  100 + * See Documentation/filesystems/nfs/Exporting for details on how to use
101 101 * this interface correctly.
102 102 *
103 103 * encode_fh:
... ... @@ -166,7 +166,7 @@
166 166  
167 167 If unsure, say Y. Note that if you want to use DHCP, a DHCP server
168 168 must be operating on your network. Read
169   - <file:Documentation/filesystems/nfsroot.txt> for details.
  169 + <file:Documentation/filesystems/nfs/nfsroot.txt> for details.
170 170  
171 171 config IP_PNP_BOOTP
172 172 bool "IP: BOOTP support"
... ... @@ -181,7 +181,7 @@
181 181 does BOOTP itself, providing all necessary information on the kernel
182 182 command line, you can say N here. If unsure, say Y. Note that if you
183 183 want to use BOOTP, a BOOTP server must be operating on your network.
184   - Read <file:Documentation/filesystems/nfsroot.txt> for details.
  184 + Read <file:Documentation/filesystems/nfs/nfsroot.txt> for details.
185 185  
186 186 config IP_PNP_RARP
187 187 bool "IP: RARP support"
... ... @@ -194,7 +194,7 @@
194 194 older protocol which is being obsoleted by BOOTP and DHCP), say Y
195 195 here. Note that if you want to use RARP, a RARP server must be
196 196 operating on your network. Read
197   - <file:Documentation/filesystems/nfsroot.txt> for details.
  197 + <file:Documentation/filesystems/nfs/nfsroot.txt> for details.
198 198  
199 199 # not yet ready..
200 200 # bool ' IP: ARP support' CONFIG_IP_PNP_ARP
... ... @@ -1447,7 +1447,7 @@
1447 1447  
1448 1448 /*
1449 1449 * Decode any IP configuration options in the "ip=" or "nfsaddrs=" kernel
1450   - * command line parameter. See Documentation/filesystems/nfsroot.txt.
  1450 + * command line parameter. See Documentation/filesystems/nfs/nfsroot.txt.
1451 1451 */
1452 1452 static int __init ic_proto_name(char *name)
1453 1453 {