Merge branch 'docs' of git://git.lwn.net/linux-2.6

* 'docs' of git://git.lwn.net/linux-2.6:
  Add additional examples in Documentation/spinlocks.txt
  Move sched-rt-group.txt to scheduler/
  Documentation: move rpc-cache.txt to filesystems/
  Documentation: move nfsroot.txt to filesystems/
  Spell out behavior of atomic_dec_and_lock() in kerneldoc
  Fix a typo in highres.txt
  Fixes to the seq_file document
  Fill out information on patch tags in SubmittingPatches
  Add the seq_file documentation

Linus Torvalds
2 parents b0fac02370 14dadf1d5e
Showing 18 changed files
Documentation/00-INDEX
Documentation/SubmittingPatches
Documentation/filesystems/00-INDEX
Documentation/filesystems/nfsroot.txt
Documentation/filesystems/rpc-cache.txt
Documentation/filesystems/seq_file.txt
Documentation/hrtimers/highres.txt
Documentation/kernel-parameters.txt
Documentation/nfsroot.txt
Documentation/rpc-cache.txt
Documentation/sched-rt-group.txt
Documentation/scheduler/00-INDEX
Documentation/scheduler/sched-rt-group.txt
Documentation/spinlocks.txt
fs/Kconfig
include/linux/spinlock.h
net/ipv4/Kconfig
net/ipv4/ipconfig.c
@@ -271,8 +271,6 @@
 	- directory with information on the NetLabel subsystem.
 networking/
 	- directory with info on various aspects of networking with Linux.
-nfsroot.txt
-	- short guide on setting up a diskless box with NFS root filesystem.
 nmi_watchdog.txt
 	- info on NMI watchdog for SMP systems.
 nommu-mmap.txt
@@ -321,8 +319,6 @@
 	- a description of what robust futexes are.
 rocket.txt
 	- info on the Comtrol RocketPort multiport serial driver.
-rpc-cache.txt
-	- introduction to the caching mechanisms in the sunrpc layer.
 rt-mutex-design.txt
 	- description of the RealTime mutex implementation design.
 rt-mutex.txt
@@ -328,7 +328,7 @@
 point out some special detail about the sign-off. 
  
  
-13) When to use Acked-by:
+13) When to use Acked-by: and Cc:
  
 The Signed-off-by: tag indicates that the signer was involved in the
 development of the patch, or that he/she was in the patch's delivery path.
  
  
@@ -349,11 +349,59 @@
 For example, if a patch affects multiple subsystems and has an Acked-by: from
 one subsystem maintainer then this usually indicates acknowledgement of just
 the part which affects that maintainer's code.  Judgement should be used here.
- When in doubt people should refer to the original discussion in the mailing
+When in doubt people should refer to the original discussion in the mailing
 list archives.
  
+If a person has had the opportunity to comment on a patch, but has not
+provided such comments, you may optionally add a "Cc:" tag to the patch.
+This is the only tag which might be added without an explicit action by the
+person it names.  This tag documents that potentially interested parties
+have been included in the discussion.
  
-14) The canonical patch format
+
+14) Using Tested-by: and Reviewed-by:
+
+A Tested-by: tag indicates that the patch has been successfully tested (in
+some environment) by the person named.  This tag informs maintainers that
+some testing has been performed, provides a means to locate testers for
+future patches, and ensures credit for the testers.
+
+Reviewed-by:, instead, indicates that the patch has been reviewed and found
+acceptable according to the Reviewer's Statement:
+
+	Reviewer's statement of oversight
+
+	By offering my Reviewed-by: tag, I state that:
+
+ 	 (a) I have carried out a technical review of this patch to
+	     evaluate its appropriateness and readiness for inclusion into
+	     the mainline kernel.
+
+	 (b) Any problems, concerns, or questions relating to the patch
+	     have been communicated back to the submitter.  I am satisfied
+	     with the submitter's response to my comments.
+
+	 (c) While there may be things that could be improved with this
+	     submission, I believe that it is, at this time, (1) a
+	     worthwhile modification to the kernel, and (2) free of known
+	     issues which would argue against its inclusion.
+
+	 (d) While I have reviewed the patch and believe it to be sound, I
+	     do not (unless explicitly stated elsewhere) make any
+	     warranties or guarantees that it will achieve its stated
+	     purpose or function properly in any given situation.
+
+A Reviewed-by tag is a statement of opinion that the patch is an
+appropriate modification of the kernel without any remaining serious
+technical issues.  Any interested reviewer (who has done the work) can
+offer a Reviewed-by tag for a patch.  This tag serves to give credit to
+reviewers and to inform maintainers of the degree of review which has been
+done on the patch.  Reviewed-by: tags, when supplied by reviewers known to
+understand the subject area and to perform thorough reviews, will normally
+increase the likelihood of your patch getting into the kernel.
+
+
+15) The canonical patch format
  
 The canonical patch subject line is:
  
@@ -66,6 +66,8 @@
 	- info on the Linux implementation of Sys V mandatory file locking.
 ncpfs.txt
 	- info on Novell Netware(tm) filesystem using NCP protocol.
+nfsroot.txt
+	- short guide on setting up a diskless box with NFS root filesystem.
 ntfs.txt
 	- info and mount options for the NTFS filesystem (Windows NT).
 ocfs2.txt
@@ -82,6 +84,10 @@
 	- info on relay, for efficient streaming from kernel to user space.
 romfs.txt
 	- description of the ROMFS filesystem.
+rpc-cache.txt
+	- introduction to the caching mechanisms in the sunrpc layer.
+seq_file.txt
+	- how to use the seq_file API
 sharedsubtree.txt
 	- a description of shared subtrees for namespaces.
 smbfs.txt
+Mounting the root filesystem via NFS (nfsroot)
+===============================================
+
+Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
+Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
+Updated 2006 by Horms <horms@verge.net.au>
+
+
+
+In order to use a diskless system, such as an X-terminal or printer server
+for example, it is necessary for the root filesystem to be present on a
+non-disk device. This may be an initramfs (see Documentation/filesystems/
+ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a
+filesystem mounted via NFS. The following text describes how to use NFS
+for the root filesystem. For the rest of this text 'client' means the
+diskless system, and 'server' means the NFS server.
+
+
+
+
+1.) Enabling nfsroot capabilities
+    -----------------------------
+
+In order to use nfsroot, NFS client support needs to be selected as
+built-in during configuration. Once this has been selected, the nfsroot
+option will become available, which should also be selected.
+
+In the networking options, kernel level autoconfiguration can be selected,
+along with the types of autoconfiguration to support. Selecting all of
+DHCP, BOOTP and RARP is safe.
+
+
+
+
+2.) Kernel command line
+    -------------------
+
+When the kernel has been loaded by a boot loader (see below) it needs to be
+told what root fs device to use. And in the case of nfsroot, where to find
+both the server and the name of the directory on the server to mount as root.
+This can be established using the following kernel command line parameters:
+
+
+root=/dev/nfs
+
+  This is necessary to enable the pseudo-NFS-device. Note that it's not a
+  real device but just a synonym to tell the kernel to use NFS instead of
+  a real device.
+
+
+nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
+
+  If the `nfsroot' parameter is NOT given on the command line,
+  the default "/tftpboot/%s" will be used.
+
+  <server-ip>	Specifies the IP address of the NFS server.
+		The default address is determined by the `ip' parameter
+		(see below). This parameter allows the use of different
+		servers for IP autoconfiguration and NFS.
+
+  <root-dir>	Name of the directory on the server to mount as root.
+		If there is a "%s" token in the string, it will be
+		replaced by the ASCII-representation of the client's
+		IP address.
+
+  <nfs-options>	Standard NFS options. All options are separated by commas.
+		The following defaults are used:
+			port		= as given by server portmap daemon
+			rsize		= 4096
+			wsize		= 4096
+			timeo		= 7
+			retrans		= 3
+			acregmin	= 3
+			acregmax	= 60
+			acdirmin	= 30
+			acdirmax	= 60
+			flags		= hard, nointr, noposix, cto, ac
+
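Illustration (not part of the patch): the "%s" substitution described for
<root-dir> can be sketched in user space as below. The helper name
expand_root_dir is hypothetical, and replacing only the first "%s" token is
an assumption; the kernel's own code may differ in detail.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy `tmpl` into `out`, replacing the first "%s"
 * token with the client's IP address in dotted-decimal ASCII form.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
static int expand_root_dir(const char *tmpl, const char *client_ip,
                           char *out, size_t out_len)
{
    const char *tok = strstr(tmpl, "%s");
    int n;

    if (!tok)
        /* No token: the directory name is used unchanged. */
        n = snprintf(out, out_len, "%s", tmpl);
    else
        /* Prefix before the token, the address, then the remainder. */
        n = snprintf(out, out_len, "%.*s%s%s",
                     (int)(tok - tmpl), tmpl, client_ip, tok + 2);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With the default "/tftpboot/%s" and a client at 192.168.1.10, this sketch
yields "/tftpboot/192.168.1.10".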
+
+ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
+
+  This parameter tells the kernel how to configure IP addresses of devices
+  and also how to set up the IP routing table. It was originally called
+  `nfsaddrs', but now the boot-time IP configuration works independently of
+  NFS, so it was renamed to `ip' and the old name remained as an alias for
+  compatibility reasons.
+
+  If this parameter is missing from the kernel command line, all fields are
+  assumed to be empty, and the defaults mentioned below apply. In general
+  this means that the kernel tries to configure everything using
+  autoconfiguration.
+
+  The <autoconf> parameter can appear alone as the value to the `ip'
+  parameter (without all the ':' characters before).  If the value is
+  "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
+  autoconfiguration will take place.  The most common way to use this
+  is "ip=dhcp".
+
+  <client-ip>	IP address of the client.
+
+  		Default:  Determined using autoconfiguration.
+
+  <server-ip>	IP address of the NFS server. If RARP is used to determine
+		the client address and this parameter is NOT empty only
+		replies from the specified server are accepted.
+
+		Only required for NFS root. That is, autoconfiguration
+		will not be triggered if it is missing and NFS root is not
+		in operation.
+
+		Default: Determined using autoconfiguration.
+		         The address of the autoconfiguration server is used.
+
+  <gw-ip>	IP address of a gateway if the server is on a different subnet.
+
+		Default: Determined using autoconfiguration.
+
+  <netmask>	Netmask for local network interface. If unspecified
+		the netmask is derived from the client IP address assuming
+		classful addressing.
+
+		Default:  Determined using autoconfiguration.
+
+  <hostname>	Name of the client. May be supplied by autoconfiguration,
+  		but its absence will not trigger autoconfiguration.
+
+  		Default: Client IP address is used in ASCII notation.
+
+  <device>	Name of network device to use.
+
+		Default: If the host only has one device, it is used.
+			 Otherwise the device is determined using
+			 autoconfiguration. This is done by sending
+			 autoconfiguration requests out of all devices,
+			 and using the device that received the first reply.
+
+  <autoconf>	Method to use for autoconfiguration. In the case of options
+                which specify multiple autoconfiguration protocols,
+		requests are sent using all protocols, and the first one
+		to reply is used.
+
+		Only autoconfiguration protocols that have been compiled
+		into the kernel will be used, regardless of the value of
+		this option.
+
+                  off or none: don't use autoconfiguration
+				(do static IP assignment instead)
+		  on or any:   use any protocol available in the kernel
+			       (default)
+		  dhcp:        use DHCP
+		  bootp:       use BOOTP
+		  rarp:        use RARP
+		  both:        use both BOOTP and RARP but not DHCP
+		               (old option kept for backwards compatibility)
+
+                Default: any
+
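Illustration (not part of the patch): a minimal user-space sketch of how the
seven colon-separated fields of the `ip' parameter could be split, including
the case where <autoconf> appears alone. The names struct ip_param and
parse_ip_param are hypothetical; this is not the parser in
net/ipv4/ipconfig.c.

```c
#include <stdio.h>
#include <string.h>

#define IP_FIELD_LEN 64

/* The seven colon-separated fields of the `ip' parameter, in order. */
struct ip_param {
    char client_ip[IP_FIELD_LEN];
    char server_ip[IP_FIELD_LEN];
    char gw_ip[IP_FIELD_LEN];
    char netmask[IP_FIELD_LEN];
    char hostname[IP_FIELD_LEN];
    char device[IP_FIELD_LEN];
    char autoconf[IP_FIELD_LEN];
};

/*
 * Hypothetical parser: split `value` (the text after "ip=") into fields.
 * Empty or missing fields are left as empty strings, meaning "use the
 * default".  A value with no ':' at all is taken as <autoconf> alone.
 * Over-long fields are truncated; fields beyond the seventh are ignored.
 */
static void parse_ip_param(const char *value, struct ip_param *p)
{
    char *fields[] = { p->client_ip, p->server_ip, p->gw_ip, p->netmask,
                       p->hostname, p->device, p->autoconf };
    size_t i;

    memset(p, 0, sizeof(*p));

    if (!strchr(value, ':')) {
        snprintf(p->autoconf, IP_FIELD_LEN, "%s", value);
        return;
    }

    for (i = 0; i < sizeof(fields) / sizeof(fields[0]); i++) {
        size_t len = strcspn(value, ":");

        snprintf(fields[i], IP_FIELD_LEN, "%.*s", (int)len, value);
        if (value[len] == '\0')
            break;
        value += len + 1;   /* skip the ':' separator */
    }
}
```

Under this sketch "ip=dhcp" fills only the autoconf field, while
"ip=192.168.1.10:192.168.1.1::255.255.255.0:box:eth0:off" fills every field
except <gw-ip>, which stays empty and so falls back to its default.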
+
+
+
+3.) Boot Loader
+    -----------
+
+To get the kernel into memory different approaches can be used.
+They depend on various facilities being available:
+
+
+3.1)  Booting from a floppy using syslinux
+
+	When building kernels, an easy way to create a boot floppy that uses
+	syslinux is to use the zdisk or bzdisk make targets which use
+	zimage and bzimage images respectively. Both targets accept the
+     	FDARGS parameter which can be used to set the kernel command line.
+
+	e.g.
+	   make bzdisk FDARGS="root=/dev/nfs"
+
+   	Note that the user running this command will need to have
+     	access to the floppy drive device, /dev/fd0
+
+     	For more information on syslinux, including how to create bootdisks
+     	for prebuilt kernels, see http://syslinux.zytor.com/
+
+	N.B: Previously it was possible to write a kernel directly to
+	     a floppy using dd, configure the boot device using rdev, and
+	     boot using the resulting floppy. Linux no longer supports this
+	     method of booting.
+
+3.2) Booting from a cdrom using isolinux
+
+     	When building kernels, an easy way to create a bootable cdrom that
+     	uses isolinux is to use the isoimage target which uses a bzimage
+     	image. Like zdisk and bzdisk, this target accepts the FDARGS
+     	parameter which can be used to set the kernel command line.
+
+	e.g.
+	  make isoimage FDARGS="root=/dev/nfs"
+
+     	The resulting iso image will be arch/<ARCH>/boot/image.iso
+     	This can be written to a cdrom using a variety of tools including
+     	cdrecord.
+
+	e.g.
+	  cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso
+
+     	For more information on isolinux, including how to create bootdisks
+     	for prebuilt kernels, see http://syslinux.zytor.com/
+
+3.3) Using LILO
+	When using LILO all the necessary command line parameters may be
+	specified using the 'append=' directive in the LILO configuration
+	file.
+
+	However, to use the 'root=' directive you also need to create
+	a dummy root device, which may be removed after LILO is run.
+
+	mknod /dev/boot255 c 0 255
+
+	For information on configuring LILO, please refer to its documentation.
+
+3.4) Using GRUB
+	When using GRUB, kernel parameters are simply appended after the kernel
+	specification: kernel <kernel> <parameters>
+
+3.5) Using loadlin
+	loadlin may be used to boot Linux from a DOS command prompt without
+	requiring a local hard disk to mount as root. This has not been
+	thoroughly tested by the authors of this document, but in general
+	it should be possible to configure the kernel command line similarly
+	to the configuration of LILO.
+
+	Please refer to the loadlin documentation for further information.
+
+3.6) Using a boot ROM
+	This is probably the most elegant way of booting a diskless client.
+	With a boot ROM the kernel is loaded using the TFTP protocol. The
+	authors of this document are not aware of any commercial boot
+	ROMs that support booting Linux over the network. However, there
+	are two free implementations of a boot ROM, netboot-nfs and
+	etherboot, both of which are available on sunsite.unc.edu, and both
+	of which contain everything you need to boot a diskless Linux client.
+
+3.7) Using pxelinux
+	Pxelinux may be used to boot Linux using the PXE boot loader
+	which is present on many modern network cards.
+
+	When using pxelinux, the kernel image is specified using
+	"kernel <relative-path-below /tftpboot>". The nfsroot parameters
+	are passed to the kernel by adding them to the "append" line.
+	It is common to use a serial console in conjunction with pxelinux,
+	see Documentation/serial-console.txt for more information.
+
+	For more information on pxelinux, including how to create bootdisks
+	for prebuilt kernels, see http://syslinux.zytor.com/
+
+
+
+
+4.) Credits
+    -------
+
+  The nfsroot code in the kernel and the RARP support have been written
+  by Gero Kuhlmann <gero@gkminix.han.de>.
+
+  The rest of the IP layer autoconfiguration code has been written
+  by Martin Mares <mj@atrey.karlin.mff.cuni.cz>.
+
+  In order to write the initial version of nfsroot I would like to thank
+  Jens-Uwe Mager <jum@anubis.han.de> for his help.
+	This document gives a brief introduction to the caching
+mechanisms in the sunrpc layer that is used, in particular,
+for NFS authentication.
+
+CACHES
+======
+The caching replaces the old exports table and allows for
+a wide variety of values to be cached.
+
+There are a number of caches that are similar in structure though
+quite possibly very different in content and use.  There is a corpus
+of common code for managing these caches.
+
+Examples of caches that are likely to be needed are:
+  - mapping from IP address to client name
+  - mapping from client name and filesystem to export options
+  - mapping from UID to list of GIDs, to work around NFS's limitation
+    of 16 gids.
+  - mappings between local UID/GID and remote UID/GID for sites that
+    do not have uniform uid assignment
+  - mapping from network identity to public key for crypto authentication.
+
+The common code handles such things as:
+   - general cache lookup with correct locking
+   - supporting 'NEGATIVE' as well as positive entries
+   - allowing an EXPIRED time on cache items, and removing
+     items after they expire, and are no longer in-use.
+   - making requests to user-space to fill in cache entries
+   - allowing user-space to directly set entries in the cache
+   - delaying RPC requests that depend on as-yet incomplete
+     cache entries, and replaying those requests when the cache entry
+     is complete.
+   - clean out old entries as they expire.
+
+Creating a Cache
+----------------
+
+1/ A cache needs a datum to store.  This is in the form of a
+   structure definition that must contain a
+     struct cache_head
+   as an element, usually the first.
+   It will also contain a key and some content.
+   Each cache element is reference counted and contains
+   expiry and update times for use in cache management.
+2/ A cache needs a "cache_detail" structure that
+   describes the cache.  This stores the hash table, some
+   parameters for cache management, and some operations detailing how
+   to work with particular cache items.
+   The operations required are:
+   	struct cache_head *alloc(void)
+		This simply allocates appropriate memory and returns
+   		a pointer to the cache_head embedded within the
+		structure
+	void cache_put(struct kref *)
+		This is called when the last reference to an item is
+		dropped.  The pointer passed is to the 'ref' field
+		in the cache_head.  cache_put should release any
+		references created by 'cache_init' and, if CACHE_VALID
+		is set, any references created by cache_update.
+		It should then release the memory allocated by
+   		'alloc'.
+        int match(struct cache_head *orig, struct cache_head *new)
+		test if the keys in the two structures match.  Return
+		1 if they do, 0 if they don't.
+	void init(struct cache_head *orig, struct cache_head *new)
+		Set the 'key' fields in 'new' from 'orig'.  This may
+		include taking references to shared objects.
+	void update(struct cache_head *orig, struct cache_head *new)
+		Set the 'content' fields in 'new' from 'orig'.
+	int cache_show(struct seq_file *m, struct cache_detail *cd,
+			struct cache_head *h)
+		Optional.  Used to provide a /proc file that lists the
+		contents of a cache.  This should show one item,
+   		usually on just one line.
+	int cache_request(struct cache_detail *cd, struct cache_head *h,
+   		char **bpp, int *blen)
+		Format a request to be sent to user-space for an item
+   		to be instantiated.  *bpp is a buffer of size *blen.
+		bpp should be moved forward over the encoded message,
+		and  *blen should be reduced to show how much free
+		space remains.  Return 0 on success or <0 if not
+		enough room or other problem.
+	int cache_parse(struct cache_detail *cd, char *buf, int len)
+		A message from user space has arrived to fill out a
+		cache entry.  It is in 'buf' of length 'len'.
+		cache_parse should parse this, find the item in the
+		cache with sunrpc_cache_lookup, and update the item
+		with sunrpc_cache_update.
+
+
+3/ A cache needs to be registered using cache_register().  This
+   includes it on a list of caches that will be regularly
+   cleaned to discard old data.
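Illustration (not part of the patch): the embedding described in step 1/ and
the match/init/update operations from step 2/ can be sketched in user space
as below. Every name here (demo_cache_head, demo_ip_map, demo_container_of,
demo_match and so on) is hypothetical, the head's fields are placeholders,
and the argument order follows the descriptions in the text above rather
than any particular kernel version.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's struct cache_head (fields are illustrative). */
struct demo_cache_head {
    long expiry_time;
    long last_refresh;
    int  refcount;
};

/* Hypothetical cache item: embedded head, then a key and some content. */
struct demo_ip_map {
    struct demo_cache_head h;   /* embedded head, placed first */
    char addr[16];              /* key: client IP address */
    char client[32];            /* content: client name */
};

/* Recover the enclosing item from a pointer to its embedded head. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* match(): return 1 if the keys of the two items are equal, 0 otherwise. */
static int demo_match(struct demo_cache_head *orig,
                      struct demo_cache_head *new_item)
{
    struct demo_ip_map *o = demo_container_of(orig, struct demo_ip_map, h);
    struct demo_ip_map *n = demo_container_of(new_item, struct demo_ip_map, h);

    return strcmp(o->addr, n->addr) == 0;
}

/* init(): set the key fields in new_item from orig. */
static void demo_init(struct demo_cache_head *orig,
                      struct demo_cache_head *new_item)
{
    struct demo_ip_map *o = demo_container_of(orig, struct demo_ip_map, h);
    struct demo_ip_map *n = demo_container_of(new_item, struct demo_ip_map, h);

    memcpy(n->addr, o->addr, sizeof(n->addr));
}

/* update(): set the content fields in new_item from orig. */
static void demo_update(struct demo_cache_head *orig,
                        struct demo_cache_head *new_item)
{
    struct demo_ip_map *o = demo_container_of(orig, struct demo_ip_map, h);
    struct demo_ip_map *n = demo_container_of(new_item, struct demo_ip_map, h);

    memcpy(n->client, o->client, sizeof(n->client));
}
```

The common code only ever sees pointers to the embedded head; each cache
converts back to its own item type inside its operations, which is why one
body of lookup code can serve caches with very different contents.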
+
+Using a cache
+-------------
+
+To find a value in a cache, call sunrpc_cache_lookup passing a pointer
+to the cache_head in a sample item with the 'key' fields filled in.
+This will be passed to ->match to identify the target entry.  If no
+entry is found, a new entry will be created, added to the cache, and
+marked as not containing valid data.
+
+The item returned is typically passed to cache_check which will check
+if the data is valid, and may initiate an up-call to get fresh data.
+cache_check will return -ENOENT if the entry is negative or if an up
+call is needed but not possible, -EAGAIN if an upcall is pending,
+or 0 if the data is valid.
+
+cache_check can be passed a "struct cache_req *".  This structure is
+typically embedded in the actual request and can be used to create a
+deferred copy of the request (struct cache_deferred_req).  This is
+done when the found cache item is not uptodate, but there is reason to
+believe that userspace might provide information soon.  When the cache
+item does become valid, the deferred copy of the request will be
+revisited (->revisit).  It is expected that this method will
+reschedule the request for processing.
+
+The value returned by sunrpc_cache_lookup can also be passed to
+sunrpc_cache_update to set the content for the item.  A second item is
+passed which should hold the content.  If the item found by _lookup
+has valid data, then it is discarded and a new item is created.  This
+saves any user of an item from worrying about content changing while
+it is being inspected.  If the item found by _lookup does not contain
+valid data, then the content is copied across and CACHE_VALID is set.
+
+Populating a cache
+------------------
+
+Each cache has a name, and when the cache is registered, a directory
+with that name is created in /proc/net/rpc
+
+This directory contains a file called 'channel' which is a channel
+for communicating between kernel and user for populating the cache.
+This directory may later contain other files for interacting
+with the cache.
+
+The 'channel' works a bit like a datagram socket. Each 'write' is
+passed as a whole to the cache for parsing and interpretation.
+Each cache can treat the write requests differently, but it is
+expected that a message written will contain:
+  - a key
+  - an expiry time
+  - a content.
+with the intention that an item in the cache with the given key
+should be created or updated to have the given content, and the
+expiry time should be set on that item.
+
+Reading from a channel is a bit more interesting.  When a cache
+lookup fails, or when it succeeds but finds an entry that may soon
+expire, a request is lodged for that cache item to be updated by
+user-space.  These requests appear in the channel file.
+
+Successive reads will return successive requests.
+If there are no more requests to return, read will return EOF, but a
+select or poll for read will block waiting for another request to be
+added.
+
+Thus a user-space helper is likely to:
+  open the channel.
+    select for readable
+    read a request
+    write a response
+  loop.
+
+If it dies and needs to be restarted, any requests that have not been
+answered will still appear in the file and will be read by the new
+instance of the helper.
+
+Each cache should define a "cache_parse" method which takes a message
+written from user-space and processes it.  It should return an error
+(which propagates back to the write syscall) or 0.
+
+Each cache should also define a "cache_request" method which
+takes a cache item and encodes a request into the buffer
+provided.
+
+Note: If a cache has no active readers on the channel, and has had no
+active readers for more than 60 seconds, further requests will not be
+added to the channel but instead all lookups that do not find a valid
+entry will fail.  This is partly for backward compatibility: The
+previous nfs exports table was deemed to be authoritative and a
+failed lookup meant a definite 'no'.
+
+request/response format
+-----------------------
+
+While each cache is free to use its own format for requests
+and responses over the channel, the following is recommended as
+appropriate and support routines are available to help:
+Each request or response record should be printable ASCII
+with precisely one newline character which should be at the end.
+Fields within the record should be separated by spaces, normally one.
+If spaces, newlines, or nul characters are needed in a field they
+must be quoted.  Two mechanisms are available:
+1/ If a field begins '\x' then it must contain an even number of
+   hex digits, and pairs of these digits provide the bytes in the
+   field.
+2/ otherwise a \ in the field must be followed by 3 octal digits
+   which give the code for a byte.  Other characters are treated
+   as themselves.  At the very least, space, newline, nul, and
+   '\' must be quoted in this way.
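Illustration (not part of the patch): a user-space sketch of the two quoting
mechanisms, written against the same "advance the buffer pointer, shrink the
remaining length" convention that the text describes for cache_request. The
names add_quoted_field, unquote_field and hex_digit are hypothetical; the
support routines in the sunrpc code are not reproduced here. The encoder
takes a C string, so it cannot carry a nul byte, and the decoder assumes the
octal value fits in one byte.

```c
#include <string.h>

/*
 * Append `field` at *bpp using octal quoting for space, newline and '\',
 * followed by one space as the field separator.  On success *bpp is moved
 * forward and *blen reduced; returns 0, or -1 if there is not enough room
 * (in which case *bpp and *blen are left unchanged).
 */
static int add_quoted_field(char **bpp, int *blen, const char *field)
{
    char *bp = *bpp;
    int len = *blen;

    for (; *field; field++) {
        unsigned char c = (unsigned char)*field;

        if (c == ' ' || c == '\n' || c == '\\') {
            if (len < 4)
                return -1;
            bp[0] = '\\';
            bp[1] = (char)('0' + ((c >> 6) & 3));
            bp[2] = (char)('0' + ((c >> 3) & 7));
            bp[3] = (char)('0' + (c & 7));
            bp += 4;
            len -= 4;
        } else {
            if (len < 1)
                return -1;
            *bp++ = (char)c;
            len--;
        }
    }
    if (len < 1)
        return -1;
    *bp++ = ' ';
    len--;

    *bpp = bp;
    *blen = len;
    return 0;
}

static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode one field (already split off at a space) into dst.
 * Returns the number of bytes decoded, or -1 on malformed input
 * or if dst is too small.
 */
static int unquote_field(const char *src, char *dst, int dst_len)
{
    int n = 0;

    if (src[0] == '\\' && src[1] == 'x') {
        /* Mechanism 1: the rest of the field is pairs of hex digits. */
        for (src += 2; *src; src += 2) {
            int hi = hex_digit(src[0]);
            int lo = src[1] ? hex_digit(src[1]) : -1;

            if (hi < 0 || lo < 0 || n >= dst_len)
                return -1;
            dst[n++] = (char)(hi * 16 + lo);
        }
        return n;
    }

    /* Mechanism 2: '\' followed by exactly three octal digits. */
    for (; *src; src++) {
        int c = (unsigned char)*src;

        if (c == '\\') {
            if (src[1] < '0' || src[1] > '3' ||
                src[2] < '0' || src[2] > '7' ||
                src[3] < '0' || src[3] > '7')
                return -1;
            c = (src[1] - '0') * 64 + (src[2] - '0') * 8 + (src[3] - '0');
            src += 3;
        }
        if (n >= dst_len)
            return -1;
        dst[n++] = (char)c;
    }
    return n;
}
```

For instance, the five bytes "a b\c" are encoded as "a\040b\134c", and the
field "\x4869" decodes to the two bytes "Hi".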
+The seq_file interface
+
+	Copyright 2003 Jonathan Corbet <corbet@lwn.net>
+	This file is originally from the LWN.net Driver Porting series at
+	http://lwn.net/Articles/driver-porting/
+
+
+There are numerous ways for a device driver (or other kernel component) to
+provide information to the user or system administrator.  One useful
+technique is the creation of virtual files, in debugfs, /proc or elsewhere.
+Virtual files can provide human-readable output that is easy to get at
+without any special utility programs; they can also make life easier for
+script writers. It is not surprising that the use of virtual files has
+grown over the years.
+
+Creating those files correctly has always been a bit of a challenge,
+however. It is not that hard to make a virtual file which returns a
+string. But life gets trickier if the output is long - anything greater
+than an application is likely to read in a single operation.  Handling
+multiple reads (and seeks) requires careful attention to the reader's
+position within the virtual file - that position is, likely as not, in the
+middle of a line of output. The kernel has traditionally had a number of
+implementations that got this wrong.
+
+The 2.6 kernel contains a set of functions (implemented by Alexander Viro)
+which are designed to make it easy for virtual file creators to get it
+right.
+
+The seq_file interface is available via <linux/seq_file.h>. There are
+three aspects to seq_file:
+
+     * An iterator interface which lets a virtual file implementation
+       step through the objects it is presenting.
+
+     * Some utility functions for formatting objects for output without
+       needing to worry about things like output buffers.
+
+     * A set of canned file_operations which implement most operations on
+       the virtual file.
+
+We'll look at the seq_file interface via an extremely simple example: a
+loadable module which creates a file called /proc/sequence. The file, when
+read, simply produces a set of increasing integer values, one per line. The
+sequence will continue until the user loses patience and finds something
+better to do. The file is seekable, in that one can do something like the
+following:
+
+    dd if=/proc/sequence of=out1 count=1
+    dd if=/proc/sequence skip=1 of=out2 count=1
+
+Then concatenate the output files out1 and out2 and get the right
+result. Yes, it is a thoroughly useless module, but the point is to show
+how the mechanism works without getting lost in other details.  (Those
+wanting to see the full source for this module can find it at
+http://lwn.net/Articles/22359/).
+
+
+The iterator interface
+
+Modules implementing a virtual file with seq_file must implement a simple
+iterator object that allows stepping through the data of interest.
+Iterators must be able to move to a specific position - like the file they
+implement - but the interpretation of that position is up to the iterator
+itself. A seq_file implementation that is formatting firewall rules, for
+example, could interpret position N as the Nth rule in the chain.
+Positioning can thus be done in whatever way makes the most sense for the
+generator of the data, which need not be aware of how a position translates
+to an offset in the virtual file. The one obvious exception is that a
+position of zero should indicate the beginning of the file.
+
+The /proc/sequence iterator just uses the count of the next number it
+will output as its position.
+
+Four functions must be implemented to make the iterator work. The first,
+called start(), takes a position as an argument and returns an iterator
+which will start reading at that position. For our simple sequence example,
+the start() function looks like:
+
+	static void *ct_seq_start(struct seq_file *s, loff_t *pos)
+	{
+	        loff_t *spos = kmalloc(sizeof(loff_t), GFP_KERNEL);
+	        if (! spos)
+	                return NULL;
+	        *spos = *pos;
+	        return spos;
+	}
+
+The entire data structure for this iterator is a single loff_t value
+holding the current position. There is no upper bound for the sequence
+iterator, but that will not be the case for most other seq_file
+implementations; in most cases the start() function should check for a
+"past end of file" condition and return NULL if need be.
+
+For more complicated applications, the private field of the seq_file
+structure can be used. There is also a special value which can be returned
+by the start() function called SEQ_START_TOKEN; it can be used if you wish
+to instruct your show() function (described below) to print a header at the
+top of the output. SEQ_START_TOKEN should only be used if the offset is
+zero, however.
+
+The next function to implement is called, amazingly, next(); its job is to
+move the iterator forward to the next position in the sequence.  The
+example module can simply increment the position by one; more useful
+modules will do what is needed to step through some data structure. The
+next() function returns a new iterator, or NULL if the sequence is
+complete. Here's the example version:
+
+	static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
+	{
+	        loff_t *spos = v;
+	        *pos = ++*spos;
+	        return spos;
+	}
+
+The stop() function is called when iteration is complete; its job, of
+course, is to clean up. If dynamic memory is allocated for the iterator,
+stop() is the place to free it.
+
+	static void ct_seq_stop(struct seq_file *s, void *v)
+	{
+	        kfree(v);
+	}
+
+Finally, the show() function should format the object currently pointed to
+by the iterator for output. It should return zero, or an error code if
+something goes wrong. The example module's show() function is:
+
+	static int ct_seq_show(struct seq_file *s, void *v)
+	{
+	        loff_t *spos = v;
+	        seq_printf(s, "%lld\n", (long long)*spos);
+	        return 0;
+	}
+
+We will look at seq_printf() in a moment. But first, the definition of the
+seq_file iterator is finished by creating a seq_operations structure with
+the four functions we have just defined:
+
+	static const struct seq_operations ct_seq_ops = {
+	        .start = ct_seq_start,
+	        .next  = ct_seq_next,
+	        .stop  = ct_seq_stop,
+	        .show  = ct_seq_show
+	};
+
+This structure will be needed to tie our iterator to the /proc file in
+a little bit.
+
+It's worth noting that the iterator value returned by start() and
+manipulated by the other functions is considered to be completely opaque by
+the seq_file code. It can thus be anything that is useful in stepping
+through the data to be output. Counters can be useful, but it could also be
+a direct pointer into an array or linked list. Anything goes, as long as
+the programmer is aware that things can happen between calls to the
+iterator function. However, the seq_file code (by design) will not sleep
+between the calls to start() and stop(), so holding a lock during that time
+is a reasonable thing to do. The seq_file code will also avoid taking any
+other locks while the iterator is active.
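The calling sequence described above can be sketched in userspace. This is
not the kernel implementation: struct sim_seq_ops, sim_walk() and the
CT_LIMIT bound are hypothetical names invented for illustration, malloc()
and free() stand in for the kernel allocators, and the sketch assumes that
stop() receives whatever start() or next() returned last (so next() frees
the iterator itself when it returns NULL).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CT_LIMIT 5	/* assumed end of the sequence, so the walk terminates */

/* Hypothetical userspace stand-in for struct seq_operations. */
struct sim_seq_ops {
	void *(*start)(long long *pos);
	void *(*next)(void *v, long long *pos);
	void (*stop)(void *v);
	int (*show)(char *out, size_t len, void *v);
};

static void *ct_start(long long *pos)
{
	long long *spos;

	if (*pos >= CT_LIMIT)
		return NULL;		/* already past the end */
	spos = malloc(sizeof(*spos));
	if (spos)
		*spos = *pos;
	return spos;
}

static void *ct_next(void *v, long long *pos)
{
	long long *spos = v;

	*pos = ++*spos;
	if (*spos >= CT_LIMIT) {
		free(spos);		/* sequence complete */
		return NULL;
	}
	return spos;
}

static void ct_stop(void *v)
{
	free(v);			/* free(NULL) is a no-op */
}

/* Returns the number of bytes written, or -1 if the buffer is full. */
static int ct_show(char *out, size_t len, void *v)
{
	long long *spos = v;
	int n = snprintf(out, len, "%lld\n", *spos);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

static const struct sim_seq_ops ct_counter_ops = {
	.start = ct_start,
	.next  = ct_next,
	.stop  = ct_stop,
	.show  = ct_show
};

/* One pass: start(), then show()/next() until NULL, then stop(). */
static int sim_walk(const struct sim_seq_ops *ops, char *out, size_t len)
{
	long long pos = 0;
	size_t used = 0;
	void *v;

	assert(ops && out);
	v = ops->start(&pos);
	while (v) {
		int n = ops->show(out + used, len - used, v);

		if (n < 0)
			break;		/* buffer full: stop early */
		used += (size_t)n;
		v = ops->next(v, &pos);
	}
	ops->stop(v);
	return (int)used;
}
```

Calling sim_walk(&ct_counter_ops, buf, sizeof(buf)) with a large enough
buffer fills it with the counter values 0 through 4, one per line.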
+
+
+Formatted output
+
+The seq_file code manages positioning within the output created by the
+iterator and getting it into the user's buffer. But, for that to work, that
+output must be passed to the seq_file code. Some utility functions have
+been defined which make this task easy.
+
+Most code will simply use seq_printf(), which works pretty much like
+printk(), but which requires the seq_file pointer as an argument. It is
+common to ignore the return value from seq_printf(), but a function
+producing complicated output may want to check that value and quit if
+something non-zero is returned; an error return means that the seq_file
+buffer has been filled and further output will be discarded.
+
+For straight character output, the following functions may be used:
+
+	int seq_putc(struct seq_file *m, char c);
+	int seq_puts(struct seq_file *m, const char *s);
+	int seq_escape(struct seq_file *m, const char *s, const char *esc);
+
+The first two output a single character and a string, just like one would
+expect. seq_escape() is like seq_puts(), except that any character in s
+which is in the string esc will be represented in octal form in the output.
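As a rough illustration of the escaping rule, here is a userspace sketch.
The helper name escape_octal() is hypothetical, and the exact output form
(a backslash followed by three octal digits) is an assumption; the text
above only says that the character is represented in octal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy s into out, replacing every character that
 * appears in esc with a backslash and three octal digits.  Returns the
 * number of bytes written (not counting the NUL), or -1 if out is too
 * small.
 */
static int escape_octal(char *out, size_t len, const char *s, const char *esc)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		if (strchr(esc, c)) {
			if (len - used < 5)	/* 4 bytes plus the NUL */
				return -1;
			out[used++] = '\\';
			out[used++] = (char)('0' + ((c >> 6) & 3));
			out[used++] = (char)('0' + ((c >> 3) & 7));
			out[used++] = (char)('0' + (c & 7));
		} else {
			if (len - used < 2)	/* 1 byte plus the NUL */
				return -1;
			out[used++] = (char)c;
		}
	}
	out[used] = '\0';
	return (int)used;
}
```

With esc set to " \n", the string "a b\n" comes out as a\040b\012.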
+
+There is also a function for printing filenames:
+
+	int seq_path(struct seq_file *m, struct path *path, char *esc);
+
+Here, path indicates the file of interest, and esc is a set of characters
+which should be escaped in the output.
+
+
+Making it all work
+
+So far, we have a nice set of functions which can produce output within the
+seq_file system, but we have not yet turned them into a file that a user
+can see. Creating a file within the kernel requires, of course, the
+creation of a set of file_operations which implement the operations on that
+file. The seq_file interface provides a set of canned operations which do
+most of the work. The virtual file author still must implement the open()
+method, however, to hook everything up. The open function is often a single
+line, as in the example module:
+
+	static int ct_open(struct inode *inode, struct file *file)
+	{
+		return seq_open(file, &ct_seq_ops);
+	}
+
+Here, the call to seq_open() takes the seq_operations structure we created
+before, and gets set up to iterate through the virtual file.
+
+On a successful open, seq_open() stores the struct seq_file pointer in
+file->private_data. If you have an application where the same iterator can
+be used for more than one file, you can store an arbitrary pointer in the
+private field of the seq_file structure; that value can then be retrieved
+by the iterator functions.
+
+The other operations of interest - read(), llseek(), and release() - are
+all implemented by the seq_file code itself. So a virtual file's
+file_operations structure will look like:
+
+	static const struct file_operations ct_file_ops = {
+	        .owner   = THIS_MODULE,
+	        .open    = ct_open,
+	        .read    = seq_read,
+	        .llseek  = seq_lseek,
+	        .release = seq_release
+	};
+
+There is also a seq_release_private() which passes the contents of the
+seq_file private field to kfree() before releasing the structure.
+
+The final step is the creation of the /proc file itself. In the example
+code, that is done in the initialization code in the usual way:
+
+	static int ct_init(void)
+	{
+	        struct proc_dir_entry *entry;
+
+	        entry = create_proc_entry("sequence", 0, NULL);
+	        if (entry)
+	                entry->proc_fops = &ct_file_ops;
+	        return 0;
+	}
+
+	module_init(ct_init);
+
+And that is pretty much it.
+
+
+seq_list
+
+If your file will be iterating through a linked list, you may find these
+routines useful:
+
+	struct list_head *seq_list_start(struct list_head *head,
+	       		 		 loff_t pos);
+	struct list_head *seq_list_start_head(struct list_head *head,
+			 		      loff_t pos);
+	struct list_head *seq_list_next(void *v, struct list_head *head,
+					loff_t *ppos);
+
+These helpers will interpret pos as a position within the list and iterate
+accordingly.  Your start() and next() functions need only invoke the
+seq_list_* helpers with a pointer to the appropriate list_head structure.  
+
+
+The extra-simple version
+
+For extremely simple virtual files, there is an even easier interface.  A
+module can define only the show() function, which should create all the
+output that the virtual file will contain. The file's open() method then
+calls:
+
+	int single_open(struct file *file,
+	                int (*show)(struct seq_file *m, void *p),
+	                void *data);
+
+When output time comes, the show() function will be called once. The data
+value given to single_open() can be found in the private field of the
+seq_file structure. When using single_open(), the programmer should use
+single_release() instead of seq_release() in the file_operations structure
+to avoid a memory leak.
@@ -98,7 +98,7 @@
 event devices are used to provide local CPU functionality such as process
 accounting, profiling, and high resolution timers.
  
-The management layer assignes one or more of the folliwing functions to a clock
+The management layer assigns one or more of the following functions to a clock
 event device:
       - system global periodic tick (jiffies update)
       - cpu local update_process_times
@@ -844,7 +844,7 @@
 			arch/alpha/kernel/core_marvel.c.
  
 	ip=		[IP_PNP]
-			See Documentation/nfsroot.txt.
+			See Documentation/filesystems/nfsroot.txt.
  
 	ip2=		[HW] Set IO/IRQ pairs for up to 4 IntelliPort boards
 			See comment before ip2_setup() in
  
@@ -1198,10 +1198,10 @@
 			file if at all.
  
 	nfsaddrs=	[NFS]
-			See Documentation/nfsroot.txt.
+			See Documentation/filesystems/nfsroot.txt.
  
 	nfsroot=	[NFS] nfs root filesystem for disk-less boxes.
-			See Documentation/nfsroot.txt.
+			See Documentation/filesystems/nfsroot.txt.
  
 	nfs.callback_tcpport=
 			[NFS] set the TCP port on which the NFSv4 callback
-Mounting the root filesystem via NFS (nfsroot)
-===============================================
-
-Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
-Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
-Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
-Updated 2006 by Horms <horms@verge.net.au>
-
-
-
-In order to use a diskless system, such as an X-terminal or printer server
-for example, it is necessary for the root filesystem to be present on a
-non-disk device. This may be an initramfs (see Documentation/filesystems/
-ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a
-filesystem mounted via NFS. The following text describes how to use NFS
-for the root filesystem. For the rest of this text 'client' means the
-diskless system, and 'server' means the NFS server.
-
-
-
-
-1.) Enabling nfsroot capabilities
-    -----------------------------
-
-In order to use nfsroot, NFS client support needs to be selected as
-built-in during configuration. Once this has been selected, the nfsroot
-option will become available, which should also be selected.
-
-In the networking options, kernel level autoconfiguration can be selected,
-along with the types of autoconfiguration to support. Selecting all of
-DHCP, BOOTP and RARP is safe.
-
-
-
-
-2.) Kernel command line
-    -------------------
-
-When the kernel has been loaded by a boot loader (see below) it needs to be
-told what root fs device to use. And in the case of nfsroot, where to find
-both the server and the name of the directory on the server to mount as root.
-This can be established using the following kernel command line parameters:
-
-
-root=/dev/nfs
-
-  This is necessary to enable the pseudo-NFS-device. Note that it's not a
-  real device but just a synonym to tell the kernel to use NFS instead of
-  a real device.
-
-
-nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
-
-  If the `nfsroot' parameter is NOT given on the command line,
-  the default "/tftpboot/%s" will be used.
-
-  <server-ip>	Specifies the IP address of the NFS server.
-		The default address is determined by the `ip' parameter
-		(see below). This parameter allows the use of different
-		servers for IP autoconfiguration and NFS.
-
-  <root-dir>	Name of the directory on the server to mount as root.
-		If there is a "%s" token in the string, it will be
-		replaced by the ASCII-representation of the client's
-		IP address.
-
-  <nfs-options>	Standard NFS options. All options are separated by commas.
-		The following defaults are used:
-			port		= as given by server portmap daemon
-			rsize		= 4096
-			wsize		= 4096
-			timeo		= 7
-			retrans		= 3
-			acregmin	= 3
-			acregmax	= 60
-			acdirmin	= 30
-			acdirmax	= 60
-			flags		= hard, nointr, noposix, cto, ac
-
-
-ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
-
-  This parameter tells the kernel how to configure IP addresses of devices
-  and also how to set up the IP routing table. It was originally called
-  `nfsaddrs', but now the boot-time IP configuration works independently of
-  NFS, so it was renamed to `ip' and the old name remained as an alias for
-  compatibility reasons.
-
-  If this parameter is missing from the kernel command line, all fields are
-  assumed to be empty, and the defaults mentioned below apply. In general
-  this means that the kernel tries to configure everything using
-  autoconfiguration.
-
-  The <autoconf> parameter can appear alone as the value to the `ip'
-  parameter (without all the ':' characters before).  If the value is
-  "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
-  autoconfiguration will take place.  The most common way to use this
-  is "ip=dhcp".
-
-  <client-ip>	IP address of the client.
-
-  		Default:  Determined using autoconfiguration.
-
-  <server-ip>	IP address of the NFS server. If RARP is used to determine
-		the client address and this parameter is NOT empty only
-		replies from the specified server are accepted.
-
-		Only required for NFS root. That is, autoconfiguration
-		will not be triggered if it is missing and NFS root is not
-		in operation.
-
-		Default: Determined using autoconfiguration.
-		         The address of the autoconfiguration server is used.
-
-  <gw-ip>	IP address of a gateway if the server is on a different subnet.
-
-		Default: Determined using autoconfiguration.
-
-  <netmask>	Netmask for local network interface. If unspecified
-		the netmask is derived from the client IP address assuming
-		classful addressing.
-
-		Default:  Determined using autoconfiguration.
-
-  <hostname>	Name of the client. May be supplied by autoconfiguration,
-  		but its absence will not trigger autoconfiguration.
-
-  		Default: Client IP address is used in ASCII notation.
-
-  <device>	Name of network device to use.
-
-		Default: If the host only has one device, it is used.
-			 Otherwise the device is determined using
-			 autoconfiguration. This is done by sending
-			 autoconfiguration requests out of all devices,
-			 and using the device that received the first reply.
-
-  <autoconf>	Method to use for autoconfiguration. In the case of options
-                which specify multiple autoconfiguration protocols,
-		requests are sent using all protocols, and the first one
-		to reply is used.
-
-		Only autoconfiguration protocols that have been compiled
-		into the kernel will be used, regardless of the value of
-		this option.
-
-                  off or none: don't use autoconfiguration
-				(do static IP assignment instead)
-		  on or any:   use any protocol available in the kernel
-			       (default)
-		  dhcp:        use DHCP
-		  bootp:       use BOOTP
-		  rarp:        use RARP
-		  both:        use both BOOTP and RARP but not DHCP
-		               (old option kept for backwards compatibility)
-
-                Default: any
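The layout of the `ip' value can be illustrated with a small userspace
parser. This is only a sketch: struct ip_param and parse_ip_param() are
hypothetical names, the 64-byte field size is an arbitrary assumption,
and the kernel's own parsing may differ in detail.

```c
#include <assert.h>
#include <string.h>

enum { IP_NFIELDS = 7 };

/*
 * Fields in order: client-ip, server-ip, gw-ip, netmask, hostname,
 * device, autoconf.  An empty string means "not given".
 */
struct ip_param {
	char field[IP_NFIELDS][64];
};

/* Hypothetical parser.  Returns 0 on success, -1 on malformed input. */
static int parse_ip_param(const char *val, struct ip_param *p)
{
	int i = 0;

	assert(val && p);
	memset(p, 0, sizeof(*p));

	/* A value without any ':' is <autoconf> on its own. */
	if (!strchr(val, ':')) {
		if (strlen(val) >= sizeof(p->field[6]))
			return -1;
		strcpy(p->field[6], val);
		return 0;
	}

	while (i < IP_NFIELDS) {
		size_t n = strcspn(val, ":");

		if (n >= sizeof(p->field[i]))
			return -1;
		memcpy(p->field[i], val, n);	/* NUL comes from memset */
		i++;
		if (val[n] == '\0')
			return 0;
		val += n + 1;
	}
	return -1;	/* more than seven fields */
}
```

For example, "192.168.1.5:192.168.1.1::255.255.255.0:client:eth0:off"
leaves the gateway field empty, while "dhcp" fills in only the autoconf
field.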
-
-
-
-
-3.) Boot Loader
-    ----------
-
-To get the kernel into memory different approaches can be used.
-They depend on various facilities being available:
-
-
-3.1)  Booting from a floppy using syslinux
-
-	When building kernels, an easy way to create a boot floppy that uses
-	syslinux is to use the zdisk or bzdisk make targets which use
-	zimage and bzimage images respectively. Both targets accept the
-     	FDARGS parameter which can be used to set the kernel command line.
-
-	e.g.
-	   make bzdisk FDARGS="root=/dev/nfs"
-
-   	Note that the user running this command will need to have
-     	access to the floppy drive device, /dev/fd0
-
-     	For more information on syslinux, including how to create bootdisks
-     	for prebuilt kernels, see http://syslinux.zytor.com/
-
-	N.B: Previously it was possible to write a kernel directly to
-	     a floppy using dd, configure the boot device using rdev, and
-	     boot using the resulting floppy. Linux no longer supports this
-	     method of booting.
-
-3.2) Booting from a cdrom using isolinux
-
-     	When building kernels, an easy way to create a bootable cdrom that
-     	uses isolinux is to use the isoimage target which uses a bzimage
-     	image. Like zdisk and bzdisk, this target accepts the FDARGS
-     	parameter which can be used to set the kernel command line.
-
-	e.g.
-	  make isoimage FDARGS="root=/dev/nfs"
-
-     	The resulting iso image will be arch/<ARCH>/boot/image.iso
-     	This can be written to a cdrom using a variety of tools including
-     	cdrecord.
-
-	e.g.
-	  cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso
-
-     	For more information on isolinux, including how to create bootdisks
-     	for prebuilt kernels, see http://syslinux.zytor.com/
-
-3.2) Using LILO
-	When using LILO all the necessary command line parameters may be
-	specified using the 'append=' directive in the LILO configuration
-	file.
-
-	However, to use the 'root=' directive you also need to create
-	a dummy root device, which may be removed after LILO is run.
-
-	mknod /dev/boot255 c 0 255
-
-	For information on configuring LILO, please refer to its documentation.
-
-3.3) Using GRUB
-	When using GRUB, kernel parameters are simply appended after the kernel
-	specification: kernel <kernel> <parameters>
-
-3.4) Using loadlin
-	loadlin may be used to boot Linux from a DOS command prompt without
-	requiring a local hard disk to mount as root. This has not been
-	thoroughly tested by the authors of this document, but in general
-	it should be possible to configure the kernel command line similarly
-	to the configuration of LILO.
-
-	Please refer to the loadlin documentation for further information.
-
-3.5) Using a boot ROM
-	This is probably the most elegant way of booting a diskless client.
-	With a boot ROM the kernel is loaded using the TFTP protocol. The
-	authors of this document are not aware of any commercial boot
-	ROMs that support booting Linux over the network. However, there
-	are two free implementations of a boot ROM, netboot-nfs and
-	etherboot, both of which are available on sunsite.unc.edu, and both
-	of which contain everything you need to boot a diskless Linux client.
-
-3.6) Using pxelinux
-	Pxelinux may be used to boot linux using the PXE boot loader
-	which is present on many modern network cards.
-
-	When using pxelinux, the kernel image is specified using
-	"kernel <relative-path-below /tftpboot>". The nfsroot parameters
-	are passed to the kernel by adding them to the "append" line.
-	It is common to use a serial console in conjunction with pxelinux,
-	see Documentation/serial-console.txt for more information.
-
-	For more information on pxelinux, including how to create bootdisks
-	for prebuilt kernels, see http://syslinux.zytor.com/
-
-
-
-
-4.) Credits
-    -------
-
-  The nfsroot code in the kernel and the RARP support have been written
-  by Gero Kuhlmann <gero@gkminix.han.de>.
-
-  The rest of the IP layer autoconfiguration code has been written
-  by Martin Mares <mj@atrey.karlin.mff.cuni.cz>.
-
-  In order to write the initial version of nfsroot I would like to thank
-  Jens-Uwe Mager <jum@anubis.han.de> for his help.
-	This document gives a brief introduction to the caching
-mechanisms in the sunrpc layer that is used, in particular,
-for NFS authentication.
-
-CACHES
-======
-The caching replaces the old exports table and allows for
-a wide variety of values to be cached.
-
-There are a number of caches that are similar in structure though
-quite possibly very different in content and use.  There is a corpus
-of common code for managing these caches.
-
-Examples of caches that are likely to be needed are:
-  - mapping from IP address to client name
-  - mapping from client name and filesystem to export options
-  - mapping from UID to list of GIDs, to work around NFS's limitation
-    of 16 gids.
-  - mappings between local UID/GID and remote UID/GID for sites that
-    do not have uniform uid assignment
-  - mapping from network identity to public key for crypto authentication.
-
-The common code handles such things as:
-   - general cache lookup with correct locking
-   - supporting 'NEGATIVE' as well as positive entries
-   - allowing an EXPIRED time on cache items, and removing
-     items after they expire, and are no longer in-use.
-   - making requests to user-space to fill in cache entries
-   - allowing user-space to directly set entries in the cache
-   - delaying RPC requests that depend on as-yet incomplete
-     cache entries, and replaying those requests when the cache entry
-     is complete.
-   - clean out old entries as they expire.
-
-Creating a Cache
-----------------
-
-1/ A cache needs a datum to store.  This is in the form of a
-   structure definition that must contain a
-     struct cache_head
-   as an element, usually the first.
-   It will also contain a key and some content.
-   Each cache element is reference counted and contains
-   expiry and update times for use in cache management.
-2/ A cache needs a "cache_detail" structure that
-   describes the cache.  This stores the hash table, some
-   parameters for cache management, and some operations detailing how
-   to work with particular cache items.
-   The operations required are:
-   	struct cache_head *alloc(void)
-		This simply allocates appropriate memory and returns
-		a pointer to the cache_head embedded within the
-		structure
-	void cache_put(struct kref *)
-		This is called when the last reference to an item is
-		dropped.  The pointer passed is to the 'ref' field
-		in the cache_head.  cache_put should release any
-		references created by 'cache_init' and, if CACHE_VALID
-		is set, any references created by cache_update.
-		It should then release the memory allocated by
-   		'alloc'.
-        int match(struct cache_head *orig, struct cache_head *new)
-		test if the keys in the two structures match.  Return
-		1 if they do, 0 if they don't.
-	void init(struct cache_head *orig, struct cache_head *new)
-		Set the 'key' fields in 'new' from 'orig'.  This may
-		include taking references to shared objects.
-	void update(struct cache_head *orig, struct cache_head *new)
-		Set the 'content' fields in 'new' from 'orig'.
-	int cache_show(struct seq_file *m, struct cache_detail *cd,
-			struct cache_head *h)
-		Optional.  Used to provide a /proc file that lists the
-		contents of a cache.  This should show one item,
-   		usually on just one line.
-	int cache_request(struct cache_detail *cd, struct cache_head *h,
-   		char **bpp, int *blen)
-		Format a request to be sent to user-space for an item
-   		to be instantiated.  *bpp is a buffer of size *blen.
-		bpp should be moved forward over the encoded message,
-		and  *blen should be reduced to show how much free
-		space remains.  Return 0 on success or <0 if not
-		enough room or other problem.
-	int cache_parse(struct cache_detail *cd, char *buf, int len)
-		A message from user space has arrived to fill out a
-		cache entry.  It is in 'buf' of length 'len'.
-		cache_parse should parse this, find the item in the
-		cache with sunrpc_cache_lookup, and update the item
-		with sunrpc_cache_update.
-
-
-3/ A cache needs to be registered using cache_register().  This
-   includes it on a list of caches that will be regularly
-   cleaned to discard old data.
-
-Using a cache
--------------
-
-To find a value in a cache, call sunrpc_cache_lookup passing a pointer
-to the cache_head in a sample item with the 'key' fields filled in.
-This will be passed to ->match to identify the target entry.  If no
-entry is found, a new entry will be created, added to the cache, and
-marked as not containing valid data.
-
-The item returned is typically passed to cache_check which will check
-if the data is valid, and may initiate an up-call to get fresh data.
-cache_check will return -ENOENT if the entry is negative or if an
-upcall is needed but not possible, -EAGAIN if an upcall is pending,
-or 0 if the data is valid.
-
-cache_check can be passed a "struct cache_req *".  This structure is
-typically embedded in the actual request and can be used to create a
-deferred copy of the request (struct cache_deferred_req).  This is
-done when the found cache item is not up to date, but there is reason to
-believe that userspace might provide information soon.  When the cache
-item does become valid, the deferred copy of the request will be
-revisited (->revisit).  It is expected that this method will
-reschedule the request for processing.
-
-The value returned by sunrpc_cache_lookup can also be passed to
-sunrpc_cache_update to set the content for the item.  A second item is
-passed which should hold the content.  If the item found by _lookup
-has valid data, then it is discarded and a new item is created.  This
-saves any user of an item from worrying about content changing while
-it is being inspected.  If the item found by _lookup does not contain
-valid data, then the content is copied across and CACHE_VALID is set.
-
-Populating a cache
-------------------
-
-Each cache has a name, and when the cache is registered, a directory
-with that name is created in /proc/net/rpc
-
-This directory contains a file called 'channel' which is a channel
-for communicating between kernel and user for populating the cache.
-This directory may later contain other files for interacting
-with the cache.
-
-The 'channel' works a bit like a datagram socket. Each 'write' is
-passed as a whole to the cache for parsing and interpretation.
-Each cache can treat the write requests differently, but it is
-expected that a message written will contain:
-  - a key
-  - an expiry time
-  - a content.
-with the intention that an item in the cache with the given key
-should be created or updated to have the given content, and the
-expiry time should be set on that item.
-
-Reading from a channel is a bit more interesting.  When a cache
-lookup fails, or when it succeeds but finds an entry that may soon
-expire, a request is lodged for that cache item to be updated by
-user-space.  These requests appear in the channel file.
-
-Successive reads will return successive requests.
-If there are no more requests to return, read will return EOF, but a
-select or poll for read will block waiting for another request to be
-added.
-
-Thus a user-space helper is likely to:
-  open the channel.
-    select for readable
-    read a request
-    write a response
-  loop.
-
-If it dies and needs to be restarted, any requests that have not been
-answered will still appear in the file and will be read by the new
-instance of the helper.
-
-Each cache should define a "cache_parse" method which takes a message
-written from user-space and processes it.  It should return an error
-(which propagates back to the write syscall) or 0.
-
-Each cache should also define a "cache_request" method which
-takes a cache item and encodes a request into the buffer
-provided.
-
-Note: If a cache has no active readers on the channel, and has had no
-active readers for more than 60 seconds, further requests will not be
-added to the channel but instead all lookups that do not find a valid
-entry will fail.  This is partly for backward compatibility: The
-previous nfs exports table was deemed to be authoritative and a
-failed lookup meant a definite 'no'.
-
-request/response format
------------------------
-
-While each cache is free to use its own format for requests
-and responses over channel, the following is recommended as
-appropriate and support routines are available to help:
-Each request or response record should be printable ASCII
-with precisely one newline character which should be at the end.
-Fields within the record should be separated by spaces, normally one.
-If spaces, newlines, or nul characters are needed in a field they
-must be quoted.  Two mechanisms are available:
-1/ If a field begins '\x' then it must contain an even number of
-   hex digits, and pairs of these digits provide the bytes in the
-   field.
-2/ otherwise a \ in the field must be followed by 3 octal digits
-   which give the code for a byte.  Other characters are treated
-   as themselves.  At the very least, space, newline, nul, and
-   '\' must be quoted in this way.
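The two quoting mechanisms can be illustrated with a small userspace
decoder. decode_field() and hexval() are hypothetical helpers written for
this sketch; they are not the sunrpc support routines, whose names and
exact behaviour are not given in this text.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical decoder for a single field.  Returns the number of
 * decoded bytes, or -1 if the field is malformed or out is too small.
 */
static int decode_field(const char *in, unsigned char *out, size_t len)
{
	size_t n = 0;

	assert(in && out);

	/* Mechanism 1: "\x" followed by an even number of hex digits. */
	if (in[0] == '\\' && in[1] == 'x') {
		in += 2;
		while (*in) {
			int hi = hexval(in[0]);
			int lo = hi < 0 ? -1 : hexval(in[1]);

			if (lo < 0 || n >= len)
				return -1;
			out[n++] = (unsigned char)(hi << 4 | lo);
			in += 2;
		}
		return (int)n;
	}

	/* Mechanism 2: a '\' must be followed by three octal digits. */
	while (*in) {
		if (n >= len)
			return -1;
		if (*in == '\\') {
			int i, v = 0;

			for (i = 1; i <= 3; i++) {
				if (in[i] < '0' || in[i] > '7')
					return -1;
				v = v * 8 + (in[i] - '0');
			}
			if (v > 255)
				return -1;
			out[n++] = (unsigned char)v;
			in += 4;
		} else {
			out[n++] = (unsigned char)*in++;
		}
	}
	return (int)n;
}
```

Under these rules the fields \x6869 and a\040b decode to the bytes "hi"
and "a b" respectively.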
-
-
-Real-Time group scheduling.
-
-The problem space:
-
-In order to schedule multiple groups of realtime tasks each group must
-be assigned a fixed portion of the CPU time available. Without a minimum
-guarantee a realtime group can obviously fall short. A fuzzy upper limit
-is of no use since it cannot be relied upon. Which leaves us with just
-the single fixed portion.
-
-CPU time is divided by means of specifying how much time can be spent
-running in a given period. Say a frame fixed realtime renderer must
-deliver 25 frames a second, which yields a period of 0.04s. Now say
-it will also have to play some music and respond to input, leaving it
-with around 80% for the graphics. We can then give this group a runtime
-of 0.8 * 0.04s = 0.032s.
-
-This way the graphics group will have a 0.04s period with a 0.032s runtime
-limit.
-
-Now if the audio thread needs to refill the DMA buffer every 0.005s, but
-needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s
-= 0.00015s.
-
-
-The Interface:
-
-system wide:
-
-/proc/sys/kernel/sched_rt_period_us
-/proc/sys/kernel/sched_rt_runtime_us
-
-CONFIG_FAIR_USER_SCHED
-
-/sys/kernel/uids/<uid>/cpu_rt_runtime_us
-
-or
-
-CONFIG_FAIR_CGROUP_SCHED
-
-/cgroup/<cgroup>/cpu.rt_runtime_us
-
-[ time is specified in us because the interface is s32; this gives an
-  operating range of ~35 minutes down to 1us ]
-
-The period takes values in [ 1, INT_MAX ], runtime in [ -1, INT_MAX - 1 ].
-
-A runtime of -1 specifies runtime == period, i.e. no limit.
-
-New groups get the period from /proc/sys/kernel/sched_rt_period_us and
-a runtime of 0.
-
-Settings are constrained to:
-
-   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
-
-in order to keep the configuration schedulable.
@@ -12,6 +12,8 @@
 	- information on scheduling domains.
 sched-nice-design.txt
 	- How and why the scheduler's nice levels are implemented.
+sched-rt-group.txt
+	- real-time group scheduling.
 sched-stats.txt
 	- information on schedstats (Linux Scheduler Statistics).
+
+
+Real-Time group scheduling.
+
+The problem space:
+
+In order to schedule multiple groups of realtime tasks each group must
+be assigned a fixed portion of the CPU time available. Without a minimum
+guarantee a realtime group can obviously fall short. A fuzzy upper limit
+is of no use since it cannot be relied upon. Which leaves us with just
+the single fixed portion.
+
+CPU time is divided by means of specifying how much time can be spent
+running in a given period. Say a frame fixed realtime renderer must
+deliver 25 frames a second, which yields a period of 0.04s. Now say
+it will also have to play some music and respond to input, leaving it
+with around 80% for the graphics. We can then give this group a runtime
+of 0.8 * 0.04s = 0.032s.
+
+This way the graphics group will have a 0.04s period with a 0.032s runtime
+limit.
+
+Now if the audio thread needs to refill the DMA buffer every 0.005s, but
+needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s
+= 0.00015s.
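The arithmetic above can be restated in microseconds, the unit the
interface uses. rt_runtime_us() is a hypothetical helper for illustration
only; it is not part of the scheduler.

```c
#include <assert.h>

/*
 * Hypothetical helper: runtime budget, in microseconds, for a group that
 * needs `percent' of the CPU within each period of `period_us'.
 */
static long long rt_runtime_us(long long period_us, int percent)
{
	assert(period_us > 0);
	assert(percent >= 0 && percent <= 100);
	return period_us * percent / 100;
}
```

The graphics group (0.04s period, 80%) gets rt_runtime_us(40000, 80),
which is 32000us; the audio thread (0.005s period, 3%) gets
rt_runtime_us(5000, 3), which is 150us.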
+
+
+The Interface:
+
+system wide:
+
+/proc/sys/kernel/sched_rt_period_us
+/proc/sys/kernel/sched_rt_runtime_us
+
+CONFIG_FAIR_USER_SCHED
+
+/sys/kernel/uids/<uid>/cpu_rt_runtime_us
+
+or
+
+CONFIG_FAIR_CGROUP_SCHED
+
+/cgroup/<cgroup>/cpu.rt_runtime_us
+
+[ time is specified in us because the interface is s32; this gives an
+  operating range of ~35 minutes down to 1us ]
+
+The period takes values in [ 1, INT_MAX ], runtime in [ -1, INT_MAX - 1 ].
+
+A runtime of -1 specifies runtime == period, i.e. no limit.
+
+New groups get the period from /proc/sys/kernel/sched_rt_period_us and
+a runtime of 0.
+
+Settings are constrained to:
+
+   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
+
+in order to keep the configuration schedulable.
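Since every group shares the global period, the constraint reduces to
comparing the sum of the group runtimes with the global runtime. A minimal
sketch, assuming all runtimes are non-negative microsecond values (the -1
"no limit" case is not handled) and using the hypothetical name
rt_schedulable():

```c
#include <assert.h>

/*
 * Hypothetical check: returns 1 if the group runtimes fit within the
 * global runtime, 0 otherwise.  Both sides of the inequality are divided
 * by the same global period, so the period cancels out.
 */
static int rt_schedulable(const long long *runtime_us, int n,
			  long long global_runtime_us)
{
	long long sum = 0;
	int i;

	assert(runtime_us && n >= 0);
	for (i = 0; i < n; i++) {
		assert(runtime_us[i] >= 0);
		sum += runtime_us[i];
	}
	return sum <= global_runtime_us;
}
```

For instance, with a global runtime of 950000us, two groups asking for
320000us and 150000us fit, while two groups asking for 800000us and
200000us do not.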
@@ -5,6 +5,28 @@
 __SPIN_LOCK_UNLOCKED()/__RW_LOCK_UNLOCKED() as appropriate for static
 initialization.
  
+Most of the time, you can simply turn:
+
+	static spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
+
+into:
+
+	static DEFINE_SPINLOCK(xxx_lock);
+
+Static structure member variables go from:
+
+	struct foo bar = {
+		.lock	=	SPIN_LOCK_UNLOCKED,
+	};
+
+to:
+
+	struct foo bar = {
+		.lock	=	__SPIN_LOCK_UNLOCKED(bar.lock),
+	};
+
+Declarations of static rw_locks undergo a similar transformation.
+
 Dynamic initialization, when necessary, may be performed as
 demonstrated below.
  
@@ -1744,10 +1744,10 @@
 	  If you want your Linux box to mount its whole root file system (the
 	  one containing the directory /) from some other computer over the
 	  net via NFS (presumably because your box doesn't have a hard disk),
-	  say Y. Read <file:Documentation/nfsroot.txt> for details. It is
-	  likely that in this case, you also want to say Y to "Kernel level IP
-	  autoconfiguration" so that your box can discover its network address
-	  at boot time.
+	  say Y. Read <file:Documentation/filesystems/nfsroot.txt> for
+	  details. It is likely that in this case, you also want to say Y to
+	  "Kernel level IP autoconfiguration" so that your box can discover
+	  its network address at boot time.
  
 	  Most people say N here.
  
@@ -341,6 +341,9 @@
  * atomic_dec_and_lock - lock on reaching reference count zero
  * @atomic: the atomic counter
  * @lock: the spinlock in question
+ *
+ * Decrements @atomic by 1.  If the result is 0, returns true and locks
+ * @lock.  Returns false for all other cases.
  */
 extern int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
 #define atomic_dec_and_lock(atomic, lock) \
@@ -160,7 +160,7 @@
  
 	  If unsure, say Y. Note that if you want to use DHCP, a DHCP server
 	  must be operating on your network.  Read
-	  <file:Documentation/nfsroot.txt> for details.
+	  <file:Documentation/filesystems/nfsroot.txt> for details.
  
 config IP_PNP_BOOTP
 	bool "IP: BOOTP support"
@@ -175,7 +175,7 @@
 	  does BOOTP itself, providing all necessary information on the kernel
 	  command line, you can say N here. If unsure, say Y. Note that if you
 	  want to use BOOTP, a BOOTP server must be operating on your network.
-	  Read <file:Documentation/nfsroot.txt> for details.
+	  Read <file:Documentation/filesystems/nfsroot.txt> for details.
  
 config IP_PNP_RARP
 	bool "IP: RARP support"
@@ -187,8 +187,8 @@
 	  discovered automatically at boot time using the RARP protocol (an
 	  older protocol which is being obsoleted by BOOTP and DHCP), say Y
 	  here. Note that if you want to use RARP, a RARP server must be
-	  operating on your network. Read <file:Documentation/nfsroot.txt> for
-	  details.
+	  operating on your network. Read
+	  <file:Documentation/filesystems/nfsroot.txt> for details.
  
 # not yet ready..
 #   bool '    IP: ARP support' CONFIG_IP_PNP_ARP		
@@ -1411,7 +1411,7 @@
  
 /*
  *  Decode any IP configuration options in the "ip=" or "nfsaddrs=" kernel
- *  command line parameter.  See Documentation/nfsroot.txt.
+ *  command line parameter.  See Documentation/filesystems/nfsroot.txt.
  */
 static int __init ic_proto_name(char *name)
 {
...	...	@@ -271,8 +271,6 @@
271	271	- directory with information on the NetLabel subsystem.
272	272	networking/
273	273	- directory with info on various aspects of networking with Linux.
274		-nfsroot.txt
275		- - short guide on setting up a diskless box with NFS root filesystem.
276	274	nmi_watchdog.txt
277	275	- info on NMI watchdog for SMP systems.
278	276	nommu-mmap.txt
...	...	@@ -321,8 +319,6 @@
321	319	- a description of what robust futexes are.
322	320	rocket.txt
323	321	- info on the Comtrol RocketPort multiport serial driver.
324		-rpc-cache.txt
325		- - introduction to the caching mechanisms in the sunrpc layer.
326	322	rt-mutex-design.txt
327	323	- description of the RealTime mutex implementation design.
328	324	rt-mutex.txt
...	...	@@ -328,7 +328,7 @@
328	328	point out some special detail about the sign-off.
329	329
330	330
331		-13) When to use Acked-by:
	331	+13) When to use Acked-by: and Cc:
332	332
333	333	The Signed-off-by: tag indicates that the signer was involved in the
334	334	development of the patch, or that he/she was in the patch's delivery path.
335	335
336	336
...	...	@@ -349,11 +349,59 @@
349	349	For example, if a patch affects multiple subsystems and has an Acked-by: from
350	350	one subsystem maintainer then this usually indicates acknowledgement of just
351	351	the part which affects that maintainer's code. Judgement should be used here.
352		- When in doubt people should refer to the original discussion in the mailing
	352	+When in doubt people should refer to the original discussion in the mailing
353	353	list archives.
354	354
	355	+If a person has had the opportunity to comment on a patch, but has not
	356	+provided such comments, you may optionally add a "Cc:" tag to the patch.
	357	+This is the only tag which might be added without an explicit action by the
	358	+person it names. This tag documents that potentially interested parties
	359	+have been included in the discussion.
355	360
356		-14) The canonical patch format
	361	+
	362	+14) Using Tested-by: and Reviewed-by:
	363	+
	364	+A Tested-by: tag indicates that the patch has been successfully tested (in
	365	+some environment) by the person named. This tag informs maintainers that
	366	+some testing has been performed, provides a means to locate testers for
	367	+future patches, and ensures credit for the testers.
	368	+
	369	+Reviewed-by:, instead, indicates that the patch has been reviewed and found
	370	+acceptable according to the Reviewer's Statement:
	371	+
	372	+ Reviewer's statement of oversight
	373	+
	374	+ By offering my Reviewed-by: tag, I state that:
	375	+
	376	+ (a) I have carried out a technical review of this patch to
	377	+ evaluate its appropriateness and readiness for inclusion into
	378	+ the mainline kernel.
	379	+
	380	+ (b) Any problems, concerns, or questions relating to the patch
	381	+ have been communicated back to the submitter. I am satisfied
	382	+ with the submitter's response to my comments.
	383	+
	384	+ (c) While there may be things that could be improved with this
	385	+ submission, I believe that it is, at this time, (1) a
	386	+ worthwhile modification to the kernel, and (2) free of known
	387	+ issues which would argue against its inclusion.
	388	+
	389	+ (d) While I have reviewed the patch and believe it to be sound, I
	390	+ do not (unless explicitly stated elsewhere) make any
	391	+ warranties or guarantees that it will achieve its stated
	392	+ purpose or function properly in any given situation.
	393	+
	394	+A Reviewed-by tag is a statement of opinion that the patch is an
	395	+appropriate modification of the kernel without any remaining serious
	396	+technical issues. Any interested reviewer (who has done the work) can
	397	+offer a Reviewed-by tag for a patch. This tag serves to give credit to
	398	+reviewers and to inform maintainers of the degree of review which has been
	399	+done on the patch. Reviewed-by: tags, when supplied by reviewers known to
	400	+understand the subject area and to perform thorough reviews, will normally
	401	+increase the likelihood of your patch getting into the kernel.
	402	+
	403	+
	404	+15) The canonical patch format
357	405
358	406	The canonical patch subject line is:
359	407
...	...	@@ -66,6 +66,8 @@
66	66	- info on the Linux implementation of Sys V mandatory file locking.
67	67	ncpfs.txt
68	68	- info on Novell Netware(tm) filesystem using NCP protocol.
	69	+nfsroot.txt
	70	+ - short guide on setting up a diskless box with NFS root filesystem.
69	71	ntfs.txt
70	72	- info and mount options for the NTFS filesystem (Windows NT).
71	73	ocfs2.txt
...	...	@@ -82,6 +84,10 @@
82	84	- info on relay, for efficient streaming from kernel to user space.
83	85	romfs.txt
84	86	- description of the ROMFS filesystem.
	87	+rpc-cache.txt
	88	+ - introduction to the caching mechanisms in the sunrpc layer.
	89	+seq_file.txt
	90	+ - how to use the seq_file API
85	91	sharedsubtree.txt
86	92	- a description of shared subtrees for namespaces.
87	93	smbfs.txt
	1	+Mounting the root filesystem via NFS (nfsroot)
	2	+===============================================
	3	+
	4	+Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
	5	+Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
	6	+Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
	7	+Updated 2006 by Horms <horms@verge.net.au>
	8	+
	9	+
	10	+
	11	+In order to use a diskless system, such as an X-terminal or printer server
	12	+for example, it is necessary for the root filesystem to be present on a
	13	+non-disk device. This may be an initramfs (see Documentation/filesystems/
	14	+ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a
	15	+filesystem mounted via NFS. The following text describes how to use NFS
	16	+for the root filesystem. For the rest of this text 'client' means the
	17	+diskless system, and 'server' means the NFS server.
	18	+
	19	+
	20	+
	21	+
	22	+1.) Enabling nfsroot capabilities
	23	+ -----------------------------
	24	+
	25	+In order to use nfsroot, NFS client support needs to be selected as
	26	+built-in during configuration. Once this has been selected, the nfsroot
	27	+option will become available, which should also be selected.
	28	+
	29	+In the networking options, kernel level autoconfiguration can be selected,
	30	+along with the types of autoconfiguration to support. Selecting all of
	31	+DHCP, BOOTP and RARP is safe.
	32	+
	33	+
	34	+
	35	+
	36	+2.) Kernel command line
	37	+ -------------------
	38	+
	39	+When the kernel has been loaded by a boot loader (see below) it needs to be
	40	+told what root fs device to use and, in the case of nfsroot, where to find
	41	+both the server and the name of the directory on the server to mount as root.
	42	+This can be established using the following kernel command line parameters:
	43	+
	44	+
	45	+root=/dev/nfs
	46	+
	47	+ This is necessary to enable the pseudo-NFS-device. Note that it's not a
	48	+ real device but just a synonym to tell the kernel to use NFS instead of
	49	+ a real device.
	50	+
	51	+
	52	+nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
	53	+
	54	+ If the `nfsroot' parameter is NOT given on the command line,
	55	+ the default "/tftpboot/%s" will be used.
	56	+
	57	+ <server-ip> Specifies the IP address of the NFS server.
	58	+ The default address is determined by the `ip' parameter
	59	+ (see below). This parameter allows the use of different
	60	+ servers for IP autoconfiguration and NFS.
	61	+
	62	+ <root-dir> Name of the directory on the server to mount as root.
	63	+ If there is a "%s" token in the string, it will be
	64	+ replaced by the ASCII-representation of the client's
	65	+ IP address.
	66	+
	67	+ <nfs-options> Standard NFS options. All options are separated by commas.
	68	+ The following defaults are used:
	69	+ port = as given by server portmap daemon
	70	+ rsize = 4096
	71	+ wsize = 4096
	72	+ timeo = 7
	73	+ retrans = 3
	74	+ acregmin = 3
	75	+ acregmax = 60
	76	+ acdirmin = 30
	77	+ acdirmax = 60
	78	+ flags = hard, nointr, noposix, cto, ac
	79	+
	80	+
	81	+ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
	82	+
	83	+ This parameter tells the kernel how to configure IP addresses of devices
	84	+ and also how to set up the IP routing table. It was originally called
	85	+ `nfsaddrs', but now the boot-time IP configuration works independently of
	86	+ NFS, so it was renamed to `ip' and the old name remained as an alias for
	87	+ compatibility reasons.
	88	+
	89	+ If this parameter is missing from the kernel command line, all fields are
	90	+ assumed to be empty, and the defaults mentioned below apply. In general
	91	+ this means that the kernel tries to configure everything using
	92	+ autoconfiguration.
	93	+
	94	+ The <autoconf> parameter can appear alone as the value to the `ip'
	95	+ parameter (without all the ':' characters before). If the value is
	96	+ "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
	97	+ autoconfiguration will take place. The most common way to use this
	98	+ is "ip=dhcp".
	99	+
	100	+ <client-ip> IP address of the client.
	101	+
	102	+ Default: Determined using autoconfiguration.
	103	+
	104	+ <server-ip> IP address of the NFS server. If RARP is used to determine
	105	+ the client address and this parameter is NOT empty only
	106	+ replies from the specified server are accepted.
	107	+
	108	+ Only required for NFS root. That is, autoconfiguration
	109	+ will not be triggered if it is missing and NFS root is not
	110	+ in operation.
	111	+
	112	+ Default: Determined using autoconfiguration.
	113	+ The address of the autoconfiguration server is used.
	114	+
	115	+ <gw-ip> IP address of a gateway if the server is on a different subnet.
	116	+
	117	+ Default: Determined using autoconfiguration.
	118	+
	119	+ <netmask> Netmask for local network interface. If unspecified
	120	+ the netmask is derived from the client IP address assuming
	121	+ classful addressing.
	122	+
	123	+ Default: Determined using autoconfiguration.
	124	+
	125	+ <hostname> Name of the client. May be supplied by autoconfiguration,
	126	+ but its absence will not trigger autoconfiguration.
	127	+
	128	+ Default: Client IP address is used in ASCII notation.
	129	+
	130	+ <device> Name of network device to use.
	131	+
	132	+ Default: If the host only has one device, it is used.
	133	+ Otherwise the device is determined using
	134	+ autoconfiguration. This is done by sending
	135	+ autoconfiguration requests out of all devices,
	136	+ and using the device that received the first reply.
	137	+
	138	+ <autoconf> Method to use for autoconfiguration. In the case of options
	139	+ which specify multiple autoconfiguration protocols,
	140	+ requests are sent using all protocols, and the first one
	141	+ to reply is used.
	142	+
	143	+ Only autoconfiguration protocols that have been compiled
	144	+ into the kernel will be used, regardless of the value of
	145	+ this option.
	146	+
	147	+ off or none: don't use autoconfiguration
	148	+ (do static IP assignment instead)
	149	+ on or any: use any protocol available in the kernel
	150	+ (default)
	151	+ dhcp: use DHCP
	152	+ bootp: use BOOTP
	153	+ rarp: use RARP
	154	+ both: use both BOOTP and RARP but not DHCP
	155	+ (old option kept for backwards compatibility)
	156	+
	157	+ Default: any
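
The "%s" substitution described for <root-dir> can be illustrated in userspace. This is a minimal sketch, assuming only the first "%s" token is expanded; expand_root_dir() is a hypothetical helper and not the kernel's nfsroot code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy root_dir into out, replacing the first
 * "%s" token (if any) with the client's IP address in ASCII form.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int expand_root_dir(char *out, size_t outlen,
			   const char *root_dir, const char *client_ip)
{
	const char *tok;
	int n;

	assert(out && root_dir && client_ip);

	tok = strstr(root_dir, "%s");
	if (!tok)
		n = snprintf(out, outlen, "%s", root_dir);
	else
		n = snprintf(out, outlen, "%.*s%s%s",
			     (int)(tok - root_dir), root_dir,	/* text before the token */
			     client_ip,				/* the substitution */
			     tok + 2);				/* text after the token */

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With the documented default of "/tftpboot/%s" and a client address of 192.168.1.5, the sketch yields "/tftpboot/192.168.1.5".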
	158	+
	159	+
	160	+
	161	+
	162	+3.) Boot Loader
	163	+ ----------
	164	+
	165	+To get the kernel into memory different approaches can be used.
	166	+They depend on various facilities being available:
	167	+
	168	+
	169	+3.1) Booting from a floppy using syslinux
	170	+
	171	+ When building kernels, an easy way to create a boot floppy that uses
	172	+ syslinux is to use the zdisk or bzdisk make targets which use
	173	+ zimage and bzimage images respectively. Both targets accept the
	174	+ FDARGS parameter which can be used to set the kernel command line.
	175	+
	176	+ e.g.
	177	+ make bzdisk FDARGS="root=/dev/nfs"
	178	+
	179	+ Note that the user running this command will need to have
	180	+ access to the floppy drive device, /dev/fd0
	181	+
	182	+ For more information on syslinux, including how to create bootdisks
	183	+ for prebuilt kernels, see http://syslinux.zytor.com/
	184	+
	185	+ N.B: Previously it was possible to write a kernel directly to
	186	+ a floppy using dd, configure the boot device using rdev, and
	187	+ boot using the resulting floppy. Linux no longer supports this
	188	+ method of booting.
	189	+
	190	+3.2) Booting from a cdrom using isolinux
	191	+
	192	+ When building kernels, an easy way to create a bootable cdrom that
	193	+ uses isolinux is to use the isoimage target which uses a bzimage
	194	+ image. Like zdisk and bzdisk, this target accepts the FDARGS
	195	+ parameter which can be used to set the kernel command line.
	196	+
	197	+ e.g.
	198	+ make isoimage FDARGS="root=/dev/nfs"
	199	+
	200	+ The resulting iso image will be arch/<ARCH>/boot/image.iso
	201	+ This can be written to a cdrom using a variety of tools including
	202	+ cdrecord.
	203	+
	204	+ e.g.
	205	+ cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso
	206	+
	207	+ For more information on isolinux, including how to create bootdisks
	208	+ for prebuilt kernels, see http://syslinux.zytor.com/
	209	+
	210	+3.2) Using LILO
	211	+ When using LILO all the necessary command line parameters may be
	212	+ specified using the 'append=' directive in the LILO configuration
	213	+ file.
	214	+
	215	+ However, to use the 'root=' directive you also need to create
	216	+ a dummy root device, which may be removed after LILO is run.
	217	+
	218	+ mknod /dev/boot255 c 0 255
	219	+
	220	+ For information on configuring LILO, please refer to its documentation.
	221	+
	222	+3.3) Using GRUB
	223	+ When using GRUB, kernel parameters are simply appended after the kernel
	224	+ specification: kernel <kernel> <parameters>
	225	+
	226	+3.4) Using loadlin
	227	+ loadlin may be used to boot Linux from a DOS command prompt without
	228	+ requiring a local hard disk to mount as root. This has not been
	229	+ thoroughly tested by the authors of this document, but in general
	230	+ it should be possible to configure the kernel command line similarly
	231	+ to the configuration of LILO.
	232	+
	233	+ Please refer to the loadlin documentation for further information.
	234	+
	235	+3.5) Using a boot ROM
	236	+ This is probably the most elegant way of booting a diskless client.
	237	+ With a boot ROM the kernel is loaded using the TFTP protocol. The
	238	+ authors of this document are not aware of any commercial boot
	239	+ ROMs that support booting Linux over the network. However, there
	240	+ are two free implementations of a boot ROM, netboot-nfs and
	241	+ etherboot, both of which are available on sunsite.unc.edu, and both
	242	+ of which contain everything you need to boot a diskless Linux client.
	243	+
	244	+3.6) Using pxelinux
	245	+ Pxelinux may be used to boot linux using the PXE boot loader
	246	+ which is present on many modern network cards.
	247	+
	248	+ When using pxelinux, the kernel image is specified using
	249	+ "kernel <relative-path-below /tftpboot>". The nfsroot parameters
	250	+ are passed to the kernel by adding them to the "append" line.
	251	+ It is common to use serial console in conjunction with pxelinux,
	252	+ see Documentation/serial-console.txt for more information.
	253	+
	254	+ For more information on pxelinux, including how to create bootdisks
	255	+ for prebuilt kernels, see http://syslinux.zytor.com/
	256	+
	257	+
	258	+
	259	+
	260	+4.) Credits
	261	+ -------
	262	+
	263	+ The nfsroot code in the kernel and the RARP support have been written
	264	+ by Gero Kuhlmann <gero@gkminix.han.de>.
	265	+
	266	+ The rest of the IP layer autoconfiguration code has been written
	267	+ by Martin Mares <mj@atrey.karlin.mff.cuni.cz>.
	268	+
	269	+ In order to write the initial version of nfsroot I would like to thank
	270	+ Jens-Uwe Mager <jum@anubis.han.de> for his help.
	1	+ This document gives a brief introduction to the caching
	2	+mechanisms in the sunrpc layer that is used, in particular,
	3	+for NFS authentication.
	4	+
	5	+CACHES
	6	+======
	7	+The caching replaces the old exports table and allows for
	8	+a wide variety of values to be cached.
	9	+
	10	+There are a number of caches that are similar in structure though
	11	+quite possibly very different in content and use. There is a corpus
	12	+of common code for managing these caches.
	13	+
	14	+Examples of caches that are likely to be needed are:
	15	+ - mapping from IP address to client name
	16	+ - mapping from client name and filesystem to export options
	17	+ - mapping from UID to list of GIDs, to work around NFS's limitation
	18	+ of 16 gids.
	19	+ - mappings between local UID/GID and remote UID/GID for sites that
	20	+ do not have uniform uid assignment
	21	+ - mapping from network identity to public key for crypto authentication.
	22	+
	23	+The common code handles such things as:
	24	+ - general cache lookup with correct locking
	25	+ - supporting 'NEGATIVE' as well as positive entries
	26	+ - allowing an EXPIRED time on cache items, and removing
	27	+ items after they expire, and are no longer in-use.
	28	+ - making requests to user-space to fill in cache entries
	29	+ - allowing user-space to directly set entries in the cache
	30	+ - delaying RPC requests that depend on as-yet incomplete
	31	+ cache entries, and replaying those requests when the cache entry
	32	+ is complete.
	33	+ - clean out old entries as they expire.
	34	+
	35	+Creating a Cache
	36	+----------------
	37	+
	38	+1/ A cache needs a datum to store. This is in the form of a
	39	+ structure definition that must contain a
	40	+ struct cache_head
	41	+ as an element, usually the first.
	42	+ It will also contain a key and some content.
	43	+ Each cache element is reference counted and contains
	44	+ expiry and update times for use in cache management.
	45	+2/ A cache needs a "cache_detail" structure that
	46	+ describes the cache. This stores the hash table, some
	47	+ parameters for cache management, and some operations detailing how
	48	+ to work with particular cache items.
	49	+ The operations required are:
	50	+ struct cache_head *alloc(void)
	51	+ This simply allocates appropriate memory and returns
	52	+ a pointer to the cache_head embedded within the
	53	+ structure
	54	+ void cache_put(struct kref *)
	55	+ This is called when the last reference to an item is
	56	+ dropped. The pointer passed is to the 'ref' field
	57	+ in the cache_head. cache_put should release any
	58	+ references created by 'cache_init' and, if CACHE_VALID
	59	+ is set, any references created by cache_update.
	60	+ It should then release the memory allocated by
	61	+ 'alloc'.
	62	+ int match(struct cache_head *orig, struct cache_head *new)
	63	+ test if the keys in the two structures match. Return
	64	+ 1 if they do, 0 if they don't.
	65	+ void init(struct cache_head *orig, struct cache_head *new)
	66	+ Set the 'key' fields in 'new' from 'orig'. This may
	67	+ include taking references to shared objects.
	68	+ void update(struct cache_head *orig, struct cache_head *new)
	69	+ Set the 'content' fields in 'new' from 'orig'.
	70	+ int cache_show(struct seq_file *m, struct cache_detail *cd,
	71	+ struct cache_head *h)
	72	+ Optional. Used to provide a /proc file that lists the
	73	+ contents of a cache. This should show one item,
	74	+ usually on just one line.
	75	+ int cache_request(struct cache_detail *cd, struct cache_head *h,
	76	+ char **bpp, int *blen)
	77	+ Format a request to be sent to user-space for an item
	78	+ to be instantiated. *bpp is a buffer of size *blen.
	79	+ *bpp should be moved forward over the encoded message,
	80	+ and *blen should be reduced to show how much free
	81	+ space remains. Return 0 on success or <0 if not
	82	+ enough room or other problem.
	83	+ int cache_parse(struct cache_detail *cd, char *buf, int len)
	84	+ A message from user space has arrived to fill out a
	85	+ cache entry. It is in 'buf' of length 'len'.
	86	+ cache_parse should parse this, find the item in the
	87	+ cache with sunrpc_cache_lookup, and update the item
	88	+ with sunrpc_cache_update.
	89	+
	90	+
	91	+3/ A cache needs to be registered using cache_register(). This
	92	+ includes it on a list of caches that will be regularly
	93	+ cleaned to discard old data.
	94	+
	95	+Using a cache
	96	+-------------
	97	+
	98	+To find a value in a cache, call sunrpc_cache_lookup passing a pointer
	99	+to the cache_head in a sample item with the 'key' fields filled in.
	100	+This will be passed to ->match to identify the target entry. If no
	101	+entry is found, a new entry will be created, added to the cache, and
	102	+marked as not containing valid data.
	103	+
	104	+The item returned is typically passed to cache_check which will check
	105	+if the data is valid, and may initiate an up-call to get fresh data.
	106	+cache_check will return -ENOENT if the entry is negative or if an up
	107	+call is needed but not possible, -EAGAIN if an upcall is pending,
	108	+or 0 if the data is valid.
	109	+
	110	+cache_check can be passed a "struct cache_req *". This structure is
	111	+typically embedded in the actual request and can be used to create a
	112	+deferred copy of the request (struct cache_deferred_req). This is
	113	+done when the found cache item is not uptodate, but there is reason to
	114	+believe that userspace might provide information soon. When the cache
	115	+item does become valid, the deferred copy of the request will be
	116	+revisited (->revisit). It is expected that this method will
	117	+reschedule the request for processing.
	118	+
	119	+The value returned by sunrpc_cache_lookup can also be passed to
	120	+sunrpc_cache_update to set the content for the item. A second item is
	121	+passed which should hold the content. If the item found by _lookup
	122	+has valid data, then it is discarded and a new item is created. This
	123	+saves any user of an item from worrying about content changing while
	124	+it is being inspected. If the item found by _lookup does not contain
	125	+valid data, then the content is copied across and CACHE_VALID is set.
	126	+
	127	+Populating a cache
	128	+------------------
	129	+
	130	+Each cache has a name, and when the cache is registered, a directory
	131	+with that name is created in /proc/net/rpc
	132	+
	133	+This directory contains a file called 'channel' which is a channel
	134	+for communicating between kernel and user for populating the cache.
	135	+This directory may later contain other files for interacting
	136	+with the cache.
	137	+
	138	+The 'channel' works a bit like a datagram socket. Each 'write' is
	139	+passed as a whole to the cache for parsing and interpretation.
	140	+Each cache can treat the write requests differently, but it is
	141	+expected that a message written will contain:
	142	+ - a key
	143	+ - an expiry time
	144	+ - a content.
	145	+with the intention that an item in the cache with the given key
	146	+should be created or updated to have the given content, and the
	147	+expiry time should be set on that item.
	148	+
	149	+Reading from a channel is a bit more interesting. When a cache
	150	+lookup fails, or when it succeeds but finds an entry that may soon
	151	+expire, a request is lodged for that cache item to be updated by
	152	+user-space. These requests appear in the channel file.
	153	+
	154	+Successive reads will return successive requests.
	155	+If there are no more requests to return, read will return EOF, but a
	156	+select or poll for read will block waiting for another request to be
	157	+added.
	158	+
	159	+Thus a user-space helper is likely to:
	160	+ open the channel.
	161	+ select for readable
	162	+ read a request
	163	+ write a response
	164	+ loop.
	165	+
	166	+If it dies and needs to be restarted, any requests that have not been
	167	+answered will still appear in the file and will be read by the new
	168	+instance of the helper.
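
The helper loop outlined above might look roughly like this in C. It is a sketch only: serve_one() and echo_answer() are hypothetical names, the echo reply is a placeholder because the real request and response formats are specific to each cache, and error handling is reduced to a bare minimum.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>	/* only for trying the sketch on a socketpair() */
#include <unistd.h>

/* Hypothetical callback: turn one request record into one response record. */
typedef int (*answer_fn)(const char *req, char *resp, size_t resplen);

/* Placeholder answer that echoes the request; a real helper would look it up. */
static int echo_answer(const char *req, char *resp, size_t resplen)
{
	size_t n = strlen(req);

	if (n >= resplen)
		return -1;
	memcpy(resp, req, n + 1);
	return (int)n;
}

/* Wait for one request on the channel fd, answer it; 0 on success, -1 on error. */
static int serve_one(int fd, answer_fn answer)
{
	char req[1024], resp[1024];
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	ssize_t n;
	int len;

	assert(answer);

	if (poll(&pfd, 1, -1) < 0)		/* "select for readable" */
		return -1;
	n = read(fd, req, sizeof(req) - 1);	/* "read a request" */
	if (n <= 0)
		return -1;
	req[n] = '\0';

	len = answer(req, resp, sizeof(resp));
	if (len < 0)
		return -1;
	/* "write a response": each write is handed to the cache as a whole. */
	return write(fd, resp, (size_t)len) == len ? 0 : -1;
}
```

A helper would open the cache's 'channel' file read-write and call serve_one() in a loop. For experiments without a kernel cache, a datagram socketpair() can stand in for the channel.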
	169	+
	170	+Each cache should define a "cache_parse" method which takes a message
	171	+written from user-space and processes it. It should return an error
	172	+(which propagates back to the write syscall) or 0.
	173	+
	174	+Each cache should also define a "cache_request" method which
	175	+takes a cache item and encodes a request into the buffer
	176	+provided.
	177	+
	178	+Note: If a cache has no active readers on the channel, and has had no
	179	+active readers for more than 60 seconds, further requests will not be
	180	+added to the channel but instead all lookups that do not find a valid
	181	+entry will fail. This is partly for backward compatibility: The
	182	+previous nfs exports table was deemed to be authoritative and a
	183	+failed lookup meant a definite 'no'.
	184	+
	185	+request/response format
	186	+-----------------------
	187	+
	188	+While each cache is free to use its own format for requests
	189	+and responses over channel, the following is recommended as
	190	+appropriate and support routines are available to help:
	191	+Each request or response record should be printable ASCII
	192	+with precisely one newline character which should be at the end.
	193	+Fields within the record should be separated by spaces, normally one.
	194	+If spaces, newlines, or nul characters are needed in a field they
	195	+must be quoted. Two mechanisms are available:
	196	+1/ If a field begins '\x' then it must contain an even number of
	197	+ hex digits, and pairs of these digits provide the bytes in the
	198	+ field.
	199	+2/ otherwise a \ in the field must be followed by 3 octal digits
	200	+ which give the code for a byte. Other characters are treated
	201	+ as themselves. At the very least, space, newline, nul, and
	202	+ '\' must be quoted in this way.
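
The second quoting mechanism (a backslash followed by three octal digits) can be sketched as a small userspace encoder. quote_field() is a hypothetical name, not one of the kernel's support routines. As an assumption beyond the text, it also quotes every byte outside printable ASCII so that the record stays printable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical encoder for one field using the \ooo octal form.
 * Space, newline, nul and backslash are always quoted. Returns the
 * encoded length, or -1 if out is too small.
 */
static int quote_field(char *out, size_t outlen,
		       const unsigned char *in, size_t inlen)
{
	size_t o = 0;

	assert(out && in);

	for (size_t i = 0; i < inlen; i++) {
		unsigned char c = in[i];
		int must_quote = c == ' ' || c == '\n' || c == '\0' ||
				 c == '\\' || c < 0x20 || c > 0x7e;
		size_t need = must_quote ? 4 : 1;

		if (o + need >= outlen)		/* keep room for the final nul */
			return -1;
		if (must_quote)
			snprintf(out + o, 5, "\\%03o", (unsigned int)c);
		else
			out[o] = (char)c;
		o += need;
	}
	out[o] = '\0';
	return (int)o;
}
```

For example, the four bytes 'a', ' ', 'b', '\' encode to the ten characters a\040b\134.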
	1	+The seq_file interface
	2	+
	3	+ Copyright 2003 Jonathan Corbet <corbet@lwn.net>
	4	+ This file is originally from the LWN.net Driver Porting series at
	5	+ http://lwn.net/Articles/driver-porting/
	6	+
	7	+
	8	+There are numerous ways for a device driver (or other kernel component) to
	9	+provide information to the user or system administrator. One useful
	10	+technique is the creation of virtual files, in debugfs, /proc or elsewhere.
	11	+Virtual files can provide human-readable output that is easy to get at
	12	+without any special utility programs; they can also make life easier for
	13	+script writers. It is not surprising that the use of virtual files has
	14	+grown over the years.
	15	+
	16	+Creating those files correctly has always been a bit of a challenge,
	17	+however. It is not that hard to make a virtual file which returns a
	18	+string. But life gets trickier if the output is long - anything greater
	19	+than an application is likely to read in a single operation. Handling
	20	+multiple reads (and seeks) requires careful attention to the reader's
	21	+position within the virtual file - that position is, likely as not, in the
	22	+middle of a line of output. The kernel has traditionally had a number of
	23	+implementations that got this wrong.
	24	+
	25	+The 2.6 kernel contains a set of functions (implemented by Alexander Viro)
	26	+which are designed to make it easy for virtual file creators to get it
	27	+right.
	28	+
	29	+The seq_file interface is available via <linux/seq_file.h>. There are
	30	+three aspects to seq_file:
	31	+
	32	+ * An iterator interface which lets a virtual file implementation
	33	+ step through the objects it is presenting.
	34	+
	35	+ * Some utility functions for formatting objects for output without
	36	+ needing to worry about things like output buffers.
	37	+
	38	+ * A set of canned file_operations which implement most operations on
	39	+ the virtual file.
	40	+
	41	+We'll look at the seq_file interface via an extremely simple example: a
	42	+loadable module which creates a file called /proc/sequence. The file, when
	43	+read, simply produces a set of increasing integer values, one per line. The
	44	+sequence will continue until the user loses patience and finds something
	45	+better to do. The file is seekable, in that one can do something like the
	46	+following:
	47	+
	48	+ dd if=/proc/sequence of=out1 count=1
	49	+ dd if=/proc/sequence skip=1 of=out2 count=1
	50	+
	51	+Then concatenate the output files out1 and out2 and get the right
	52	+result. Yes, it is a thoroughly useless module, but the point is to show
	53	+how the mechanism works without getting lost in other details. (Those
	54	+wanting to see the full source for this module can find it at
	55	+http://lwn.net/Articles/22359/).
	56	+
	57	+
	58	+The iterator interface
	59	+
	60	+Modules implementing a virtual file with seq_file must implement a simple
	61	+iterator object that allows stepping through the data of interest.
	62	+Iterators must be able to move to a specific position - like the file they
	63	+implement - but the interpretation of that position is up to the iterator
	64	+itself. A seq_file implementation that is formatting firewall rules, for
	65	+example, could interpret position N as the Nth rule in the chain.
	66	+Positioning can thus be done in whatever way makes the most sense for the
	67	+generator of the data, which need not be aware of how a position translates
	68	+to an offset in the virtual file. The one obvious exception is that a
	69	+position of zero should indicate the beginning of the file.
	70	+
	71	+The /proc/sequence iterator just uses the count of the next number it
	72	+will output as its position.
	73	+
	74	+Four functions must be implemented to make the iterator work. The first,
	75	+called start(), takes a position as an argument and returns an iterator
	76	+which will start reading at that position. For our simple sequence example,
	77	+the start() function looks like:
	78	+
	79	+ static void *ct_seq_start(struct seq_file *s, loff_t *pos)
	80	+ {
	81	+         loff_t *spos = kmalloc(sizeof(loff_t), GFP_KERNEL);
	82	+         if (! spos)
	83	+                 return NULL;
	84	+         *spos = *pos;
	85	+         return spos;
	86	+ }
	87	+
	88	+The entire data structure for this iterator is a single loff_t value
	89	+holding the current position. There is no upper bound for the sequence
	90	+iterator, but that will not be the case for most other seq_file
	91	+implementations; in most cases the start() function should check for a
	92	+"past end of file" condition and return NULL if need be.
	93	+
	94	+For more complicated applications, the private field of the seq_file
	95	+structure can be used. There is also a special value which can be returned
	96	+by the start() function called SEQ_START_TOKEN; it can be used if you wish
	97	+to instruct your show() function (described below) to print a header at the
	98	+top of the output. SEQ_START_TOKEN should only be used if the offset is
	99	+zero, however.
	100	+
	101	+The next function to implement is called, amazingly, next(); its job is to
	102	+move the iterator forward to the next position in the sequence. The
	103	+example module can simply increment the position by one; more useful
	104	+modules will do what is needed to step through some data structure. The
	105	+next() function returns a new iterator, or NULL if the sequence is
	106	+complete. Here's the example version:
	107	+
	108	+ static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
	109	+ {
	110	+         loff_t *spos = v;
	111	+         *pos = ++*spos;
	112	+         return spos;
	113	+ }
	114	+
	115	+The stop() function is called when iteration is complete; its job, of
	116	+course, is to clean up. If dynamic memory is allocated for the iterator,
	117	+stop() is the place to free it.
	118	+
	119	+ static void ct_seq_stop(struct seq_file *s, void *v)
	120	+ {
	121	+         kfree(v);
	122	+ }
	123	+
	124	+Finally, the show() function should format the object currently pointed to
	125	+by the iterator for output. It should return zero, or an error code if
	126	+something goes wrong. The example module's show() function is:
	127	+
	128	+ static int ct_seq_show(struct seq_file *s, void *v)
	129	+ {
	130	+ loff_t *spos = v;
	131	+ seq_printf(s, "%lld\n", (long long)*spos);
	132	+ return 0;
	133	+ }
	134	+
	135	+We will look at seq_printf() in a moment. But first, the definition of the
	136	+seq_file iterator is finished by creating a seq_operations structure with
	137	+the four functions we have just defined:
	138	+
	139	+ static const struct seq_operations ct_seq_ops = {
	140	+ .start = ct_seq_start,
	141	+ .next = ct_seq_next,
	142	+ .stop = ct_seq_stop,
	143	+ .show = ct_seq_show
	144	+ };
	145	+
	146	+This structure will be needed to tie our iterator to the /proc file in
	147	+a little bit.
	148	+
	149	+It's worth noting that the iterator value returned by start() and
	150	+manipulated by the other functions is considered to be completely opaque by
	151	+the seq_file code. It can thus be anything that is useful in stepping
	152	+through the data to be output. Counters can be useful, but it could also be
	153	+a direct pointer into an array or linked list. Anything goes, as long as
	154	+the programmer is aware that things can happen between calls to the
	155	+iterator function. However, the seq_file code (by design) will not sleep
	156	+between the calls to start() and stop(), so holding a lock during that time
	157	+is a reasonable thing to do. The seq_file code will also avoid taking any
	158	+other locks while the iterator is active.
	159	+
	160	+
	161	+Formatted output
	162	+
	163	+The seq_file code manages positioning within the output created by the
	164	+iterator and getting it into the user's buffer. But, for that to work, that
	165	+output must be passed to the seq_file code. Some utility functions have
	166	+been defined which make this task easy.
	167	+
	168	+Most code will simply use seq_printf(), which works pretty much like
	169	+printk(), but which requires the seq_file pointer as an argument. It is
	170	+common to ignore the return value from seq_printf(), but a function
	171	+producing complicated output may want to check that value and quit if
	172	+something non-zero is returned; an error return means that the seq_file
	173	+buffer has been filled and further output will be discarded.
	174	+
	175	+For straight character output, the following functions may be used:
	176	+
	177	+ int seq_putc(struct seq_file *m, char c);
	178	+ int seq_puts(struct seq_file *m, const char *s);
	179	+ int seq_escape(struct seq_file *m, const char *s, const char *esc);
	180	+
	181	+The first two output a single character and a string, just like one would
	182	+expect. seq_escape() is like seq_puts(), except that any character in s
	183	+which is in the string esc will be represented in octal form in the output.
	184	+
	185	+There is also a function for printing filenames:
	186	+
	187	+ int seq_path(struct seq_file *m, struct path *path, char *esc);
	188	+
	189	+Here, path indicates the file of interest, and esc is a set of characters
	190	+which should be escaped in the output.
	191	+
	192	+
	193	+Making it all work
	194	+
	195	+So far, we have a nice set of functions which can produce output within the
	196	+seq_file system, but we have not yet turned them into a file that a user
	197	+can see. Creating a file within the kernel requires, of course, the
	198	+creation of a set of file_operations which implement the operations on that
	199	+file. The seq_file interface provides a set of canned operations which do
	200	+most of the work. The virtual file author still must implement the open()
	201	+method, however, to hook everything up. The open function is often a single
	202	+line, as in the example module:
	203	+
	204	+ static int ct_open(struct inode *inode, struct file *file)
	205	+ {
	206	+ return seq_open(file, &ct_seq_ops);
	207	+ }
	208	+
	209	+Here, the call to seq_open() takes the seq_operations structure we created
	210	+before, and gets set up to iterate through the virtual file.
	211	+
	212	+On a successful open, seq_open() stores the struct seq_file pointer in
	213	+file->private_data. If you have an application where the same iterator can
	214	+be used for more than one file, you can store an arbitrary pointer in the
	215	+private field of the seq_file structure; that value can then be retrieved
	216	+by the iterator functions.
	217	+
	218	+The other operations of interest - read(), llseek(), and release() - are
	219	+all implemented by the seq_file code itself. So a virtual file's
	220	+file_operations structure will look like:
	221	+
	222	+ static const struct file_operations ct_file_ops = {
	223	+ .owner = THIS_MODULE,
	224	+ .open = ct_open,
	225	+ .read = seq_read,
	226	+ .llseek = seq_lseek,
	227	+ .release = seq_release
	228	+ };
	229	+
	230	+There is also a seq_release_private() which passes the contents of the
	231	+seq_file private field to kfree() before releasing the structure.
	232	+
	233	+The final step is the creation of the /proc file itself. In the example
	234	+code, that is done in the initialization code in the usual way:
	235	+
	236	+ static int ct_init(void)
	237	+ {
	238	+ struct proc_dir_entry *entry;
	239	+
	240	+ entry = create_proc_entry("sequence", 0, NULL);
	241	+ if (entry)
	242	+ entry->proc_fops = &ct_file_ops;
	243	+ return 0;
	244	+ }
	245	+
	246	+ module_init(ct_init);
	247	+
	248	+And that is pretty much it.
	249	+
	250	+
	251	+seq_list
	252	+
	253	+If your file will be iterating through a linked list, you may find these
	254	+routines useful:
	255	+
	256	+ struct list_head *seq_list_start(struct list_head *head,
	257	+ loff_t pos);
	258	+ struct list_head *seq_list_start_head(struct list_head *head,
	259	+ loff_t pos);
	260	+ struct list_head *seq_list_next(void *v, struct list_head *head,
	261	+ loff_t *ppos);
	262	+
	263	+These helpers will interpret pos as a position within the list and iterate
	264	+accordingly. Your start() and next() functions need only invoke the
	265	+seq_list_* helpers with a pointer to the appropriate list_head structure.
	266	+
	267	+
	268	+The extra-simple version
	269	+
	270	+For extremely simple virtual files, there is an even easier interface. A
	271	+module can define only the show() function, which should create all the
	272	+output that the virtual file will contain. The file's open() method then
	273	+calls:
	274	+
	275	+ int single_open(struct file *file,
	276	+ int (*show)(struct seq_file *m, void *p),
	277	+ void *data);
	278	+
	279	+When output time comes, the show() function will be called once. The data
	280	+value given to single_open() can be found in the private field of the
	281	+seq_file structure. When using single_open(), the programmer should use
	282	+single_release() instead of seq_release() in the file_operations structure
	283	+to avoid a memory leak.
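The calling sequence described in the seq_file text above (start(), then show() and next() in a loop, then stop()) can be modelled in ordinary userspace C. This is only a sketch for illustration, not kernel code: model_file, model_ops, model_off_t, model_read_all() and MODEL_LIMIT are hypothetical names invented here, and the real seq_file code does considerably more (buffer management, partial reads, seeking). Unlike the documented example, this model has an artificial end of sequence so that the loop terminates.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
#define MODEL_LIMIT 5               /* artificial end of the sequence */

typedef long long model_off_t;      /* plays the role of loff_t */

struct model_file {                 /* plays the role of struct seq_file */
	char buf[128];
	size_t count;
};

struct model_ops {                  /* plays the role of struct seq_operations */
	void *(*start)(struct model_file *m, model_off_t *pos);
	void *(*next)(struct model_file *m, void *v, model_off_t *pos);
	void (*stop)(struct model_file *m, void *v);
	int (*show)(struct model_file *m, void *v);
};

/* Allocate the iterator, or return NULL when positioned past the end. */
static void *ct_start(struct model_file *m, model_off_t *pos)
{
	model_off_t *spos;

	(void)m;
	if (*pos >= MODEL_LIMIT)
		return NULL;
	spos = malloc(sizeof(*spos));
	if (!spos)
		return NULL;
	*spos = *pos;
	return spos;
}

/* Advance by one; free the iterator and return NULL at the end. */
static void *ct_next(struct model_file *m, void *v, model_off_t *pos)
{
	model_off_t *spos = v;

	(void)m;
	*pos = ++*spos;
	if (*spos >= MODEL_LIMIT) {
		free(spos);
		return NULL;
	}
	return spos;
}

/* Release whatever iterator is still live; free(NULL) is harmless. */
static void ct_stop(struct model_file *m, void *v)
{
	(void)m;
	free(v);
}

/* Format the current position into the output buffer. */
static int ct_show(struct model_file *m, void *v)
{
	model_off_t *spos = v;
	size_t room = sizeof(m->buf) - m->count;
	int n = snprintf(m->buf + m->count, room, "%lld\n", *spos);

	if (n < 0 || (size_t)n >= room)
		return -1;              /* buffer full */
	m->count += (size_t)n;
	return 0;
}

static const struct model_ops ct_ops = {
	.start = ct_start,
	.next  = ct_next,
	.stop  = ct_stop,
	.show  = ct_show
};

/* Drive one complete pass over the sequence, as a single read would. */
static int model_read_all(struct model_file *m, const struct model_ops *ops)
{
	model_off_t pos = 0;
	int rc = 0;
	void *v;

	m->count = 0;
	m->buf[0] = '\0';
	v = ops->start(m, &pos);
	while (v) {
		rc = ops->show(m, v);
		if (rc)
			break;
		v = ops->next(m, v, &pos);
	}
	ops->stop(m, v);
	return rc;
}
```

Calling model_read_all(&m, &ct_ops) on a struct model_file m leaves the formatted positions in m.buf. Note that stop() receives whatever next() last returned, so an iterator that can reach the end of its sequence has to release its memory before returning NULL, as ct_next() does in this sketch.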
...	...	@@ -98,7 +98,7 @@
98	98	event devices are used to provide local CPU functionality such as process
99	99	accounting, profiling, and high resolution timers.
100	100
101		-The management layer assignes one or more of the folliwing functions to a clock
	101	+The management layer assigns one or more of the following functions to a clock
102	102	event device:
103	103	- system global periodic tick (jiffies update)
104	104	- cpu local update_process_times
...	...	@@ -844,7 +844,7 @@
844	844	arch/alpha/kernel/core_marvel.c.
845	845
846	846	ip= [IP_PNP]
847		- See Documentation/nfsroot.txt.
	847	+ See Documentation/filesystems/nfsroot.txt.
848	848
849	849	ip2= [HW] Set IO/IRQ pairs for up to 4 IntelliPort boards
850	850	See comment before ip2_setup() in
851	851
...	...	@@ -1198,10 +1198,10 @@
1198	1198	file if at all.
1199	1199
1200	1200	nfsaddrs= [NFS]
1201		- See Documentation/nfsroot.txt.
	1201	+ See Documentation/filesystems/nfsroot.txt.
1202	1202
1203	1203	nfsroot= [NFS] nfs root filesystem for disk-less boxes.
1204		- See Documentation/nfsroot.txt.
	1204	+ See Documentation/filesystems/nfsroot.txt.
1205	1205
1206	1206	nfs.callback_tcpport=
1207	1207	[NFS] set the TCP port on which the NFSv4 callback
1		-Mounting the root filesystem via NFS (nfsroot)
2		-===============================================
3		-
4		-Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
5		-Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
6		-Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
7		-Updated 2006 by Horms <horms@verge.net.au>
8		-
9		-
10		-
11		-In order to use a diskless system, such as an X-terminal or printer server
12		-for example, it is necessary for the root filesystem to be present on a
13		-non-disk device. This may be an initramfs (see Documentation/filesystems/
14		-ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a
15		-filesystem mounted via NFS. The following text describes on how to use NFS
16		-for the root filesystem. For the rest of this text 'client' means the
17		-diskless system, and 'server' means the NFS server.
18		-
19		-
20		-
21		-
22		-1.) Enabling nfsroot capabilities
23		- -----------------------------
24		-
25		-In order to use nfsroot, NFS client support needs to be selected as
26		-built-in during configuration. Once this has been selected, the nfsroot
27		-option will become available, which should also be selected.
28		-
29		-In the networking options, kernel level autoconfiguration can be selected,
30		-along with the types of autoconfiguration to support. Selecting all of
31		-DHCP, BOOTP and RARP is safe.
32		-
33		-
34		-
35		-
36		-2.) Kernel command line
37		- -------------------
38		-
39		-When the kernel has been loaded by a boot loader (see below) it needs to be
40		-told what root fs device to use. And in the case of nfsroot, where to find
41		-both the server and the name of the directory on the server to mount as root.
42		-This can be established using the following kernel command line parameters:
43		-
44		-
45		-root=/dev/nfs
46		-
47		- This is necessary to enable the pseudo-NFS-device. Note that it's not a
48		- real device but just a synonym to tell the kernel to use NFS instead of
49		- a real device.
50		-
51		-
52		-nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
53		-
54		- If the `nfsroot' parameter is NOT given on the command line,
55		- the default "/tftpboot/%s" will be used.
56		-
57		- <server-ip> Specifies the IP address of the NFS server.
58		- The default address is determined by the `ip' parameter
59		- (see below). This parameter allows the use of different
60		- servers for IP autoconfiguration and NFS.
61		-
62		- <root-dir> Name of the directory on the server to mount as root.
63		- If there is a "%s" token in the string, it will be
64		- replaced by the ASCII-representation of the client's
65		- IP address.
66		-
67		- <nfs-options> Standard NFS options. All options are separated by commas.
68		- The following defaults are used:
69		- port = as given by server portmap daemon
70		- rsize = 4096
71		- wsize = 4096
72		- timeo = 7
73		- retrans = 3
74		- acregmin = 3
75		- acregmax = 60
76		- acdirmin = 30
77		- acdirmax = 60
78		- flags = hard, nointr, noposix, cto, ac
79		-
80		-
81		-ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
82		-
83		- This parameter tells the kernel how to configure IP addresses of devices
84		- and also how to set up the IP routing table. It was originally called
85		- `nfsaddrs', but now the boot-time IP configuration works independently of
86		- NFS, so it was renamed to `ip' and the old name remained as an alias for
87		- compatibility reasons.
88		-
89		- If this parameter is missing from the kernel command line, all fields are
90		- assumed to be empty, and the defaults mentioned below apply. In general
91		- this means that the kernel tries to configure everything using
92		- autoconfiguration.
93		-
94		- The <autoconf> parameter can appear alone as the value to the `ip'
95		- parameter (without all the ':' characters before). If the value is
96		- "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
97		- autoconfiguration will take place. The most common way to use this
98		- is "ip=dhcp".
99		-
100		- <client-ip> IP address of the client.
101		-
102		- Default: Determined using autoconfiguration.
103		-
104		- <server-ip> IP address of the NFS server. If RARP is used to determine
105		- the client address and this parameter is NOT empty only
106		- replies from the specified server are accepted.
107		-
108		- Only required for for NFS root. That is autoconfiguration
109		- will not be triggered if it is missing and NFS root is not
110		- in operation.
111		-
112		- Default: Determined using autoconfiguration.
113		- The address of the autoconfiguration server is used.
114		-
115		- <gw-ip> IP address of a gateway if the server is on a different subnet.
116		-
117		- Default: Determined using autoconfiguration.
118		-
119		- <netmask> Netmask for local network interface. If unspecified
120		- the netmask is derived from the client IP address assuming
121		- classful addressing.
122		-
123		- Default: Determined using autoconfiguration.
124		-
125		- <hostname> Name of the client. May be supplied by autoconfiguration,
126		- but its absence will not trigger autoconfiguration.
127		-
128		- Default: Client IP address is used in ASCII notation.
129		-
130		- <device> Name of network device to use.
131		-
132		- Default: If the host only has one device, it is used.
133		- Otherwise the device is determined using
134		- autoconfiguration. This is done by sending
135		- autoconfiguration requests out of all devices,
136		- and using the device that received the first reply.
137		-
138		- <autoconf> Method to use for autoconfiguration. In the case of options
139		- which specify multiple autoconfiguration protocols,
140		- requests are sent using all protocols, and the first one
141		- to reply is used.
142		-
143		- Only autoconfiguration protocols that have been compiled
144		- into the kernel will be used, regardless of the value of
145		- this option.
146		-
147		- off or none: don't use autoconfiguration
148		- (do static IP assignment instead)
149		- on or any: use any protocol available in the kernel
150		- (default)
151		- dhcp: use DHCP
152		- bootp: use BOOTP
153		- rarp: use RARP
154		- both: use both BOOTP and RARP but not DHCP
155		- (old option kept for backwards compatibility)
156		-
157		- Default: any
158		-
159		-
160		-
161		-
162		-3.) Boot Loader
163		- ----------
164		-
165		-To get the kernel into memory different approaches can be used.
166		-They depend on various facilities being available:
167		-
168		-
169		-3.1) Booting from a floppy using syslinux
170		-
171		- When building kernels, an easy way to create a boot floppy that uses
172		- syslinux is to use the zdisk or bzdisk make targets which use
173		- zimage and bzimage images respectively. Both targets accept the
174		- FDARGS parameter which can be used to set the kernel command line.
175		-
176		- e.g.
177		- make bzdisk FDARGS="root=/dev/nfs"
178		-
179		- Note that the user running this command will need to have
180		- access to the floppy drive device, /dev/fd0
181		-
182		- For more information on syslinux, including how to create bootdisks
183		- for prebuilt kernels, see http://syslinux.zytor.com/
184		-
185		- N.B: Previously it was possible to write a kernel directly to
186		- a floppy using dd, configure the boot device using rdev, and
187		- boot using the resulting floppy. Linux no longer supports this
188		- method of booting.
189		-
190		-3.2) Booting from a cdrom using isolinux
191		-
192		- When building kernels, an easy way to create a bootable cdrom that
193		- uses isolinux is to use the isoimage target which uses a bzimage
194		- image. Like zdisk and bzdisk, this target accepts the FDARGS
195		- parameter which can be used to set the kernel command line.
196		-
197		- e.g.
198		- make isoimage FDARGS="root=/dev/nfs"
199		-
200		- The resulting iso image will be arch/<ARCH>/boot/image.iso
201		- This can be written to a cdrom using a variety of tools including
202		- cdrecord.
203		-
204		- e.g.
205		- cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso
206		-
207		- For more information on isolinux, including how to create bootdisks
208		- for prebuilt kernels, see http://syslinux.zytor.com/
209		-
210		-3.2) Using LILO
211		- When using LILO all the necessary command line parameters may be
212		- specified using the 'append=' directive in the LILO configuration
213		- file.
214		-
215		- However, to use the 'root=' directive you also need to create
216		- a dummy root device, which may be removed after LILO is run.
217		-
218		- mknod /dev/boot255 c 0 255
219		-
220		- For information on configuring LILO, please refer to its documentation.
221		-
222		-3.3) Using GRUB
223		- When using GRUB, kernel parameter are simply appended after the kernel
224		- specification: kernel <kernel> <parameters>
225		-
226		-3.4) Using loadlin
227		- loadlin may be used to boot Linux from a DOS command prompt without
228		- requiring a local hard disk to mount as root. This has not been
229		- thoroughly tested by the authors of this document, but in general
230		- it should be possible configure the kernel command line similarly
231		- to the configuration of LILO.
232		-
233		- Please refer to the loadlin documentation for further information.
234		-
235		-3.5) Using a boot ROM
236		- This is probably the most elegant way of booting a diskless client.
237		- With a boot ROM the kernel is loaded using the TFTP protocol. The
238		- authors of this document are not aware of any no commercial boot
239		- ROMs that support booting Linux over the network. However, there
240		- are two free implementations of a boot ROM, netboot-nfs and
241		- etherboot, both of which are available on sunsite.unc.edu, and both
242		- of which contain everything you need to boot a diskless Linux client.
243		-
244		-3.6) Using pxelinux
245		- Pxelinux may be used to boot linux using the PXE boot loader
246		- which is present on many modern network cards.
247		-
248		- When using pxelinux, the kernel image is specified using
249		- "kernel <relative-path-below /tftpboot>". The nfsroot parameters
250		- are passed to the kernel by adding them to the "append" line.
251		- It is common to use serial console in conjunction with pxeliunx,
252		- see Documentation/serial-console.txt for more information.
253		-
254		- For more information on isolinux, including how to create bootdisks
255		- for prebuilt kernels, see http://syslinux.zytor.com/
256		-
257		-
258		-
259		-
260		-4.) Credits
261		- -------
262		-
263		- The nfsroot code in the kernel and the RARP support have been written
264		- by Gero Kuhlmann <gero@gkminix.han.de>.
265		-
266		- The rest of the IP layer autoconfiguration code has been written
267		- by Martin Mares <mj@atrey.karlin.mff.cuni.cz>.
268		-
269		- In order to write the initial version of nfsroot I would like to thank
270		- Jens-Uwe Mager <jum@anubis.han.de> for his help.
1		- This document gives a brief introduction to the caching
2		-mechanisms in the sunrpc layer that is used, in particular,
3		-for NFS authentication.
4		-
5		-CACHES
6		-======
7		-The caching replaces the old exports table and allows for
8		-a wide variety of values to be caches.
9		-
10		-There are a number of caches that are similar in structure though
11		-quite possibly very different in content and use. There is a corpus
12		-of common code for managing these caches.
13		-
14		-Examples of caches that are likely to be needed are:
15		- - mapping from IP address to client name
16		- - mapping from client name and filesystem to export options
17		- - mapping from UID to list of GIDs, to work around NFS's limitation
18		- of 16 gids.
19		- - mappings between local UID/GID and remote UID/GID for sites that
20		- do not have uniform uid assignment
21		- - mapping from network identify to public key for crypto authentication.
22		-
23		-The common code handles such things as:
24		- - general cache lookup with correct locking
25		- - supporting 'NEGATIVE' as well as positive entries
26		- - allowing an EXPIRED time on cache items, and removing
27		- items after they expire, and are no longer in-use.
28		- - making requests to user-space to fill in cache entries
29		- - allowing user-space to directly set entries in the cache
30		- - delaying RPC requests that depend on as-yet incomplete
31		- cache entries, and replaying those requests when the cache entry
32		- is complete.
33		- - clean out old entries as they expire.
34		-
35		-Creating a Cache
36		-----------------
37		-
38		-1/ A cache needs a datum to store. This is in the form of a
39		- structure definition that must contain a
40		- struct cache_head
41		- as an element, usually the first.
42		- It will also contain a key and some content.
43		- Each cache element is reference counted and contains
44		- expiry and update times for use in cache management.
45		-2/ A cache needs a "cache_detail" structure that
46		- describes the cache. This stores the hash table, some
47		- parameters for cache management, and some operations detailing how
48		- to work with particular cache items.
49		- The operations requires are:
50		- struct cache_head *alloc(void)
51		- This simply allocates appropriate memory and returns
52		- a pointer to the cache_detail embedded within the
53		- structure
54		- void cache_put(struct kref *)
55		- This is called when the last reference to an item is
56		- dropped. The pointer passed is to the 'ref' field
57		- in the cache_head. cache_put should release any
58		- references create by 'cache_init' and, if CACHE_VALID
59		- is set, any references created by cache_update.
60		- It should then release the memory allocated by
61		- 'alloc'.
62		- int match(struct cache_head *orig, struct cache_head *new)
63		- test if the keys in the two structures match. Return
64		- 1 if they do, 0 if they don't.
65		- void init(struct cache_head *orig, struct cache_head *new)
66		- Set the 'key' fields in 'new' from 'orig'. This may
67		- include taking references to shared objects.
68		- void update(struct cache_head *orig, struct cache_head *new)
69		- Set the 'content' fileds in 'new' from 'orig'.
70		- int cache_show(struct seq_file *m, struct cache_detail *cd,
71		- struct cache_head *h)
72		- Optional. Used to provide a /proc file that lists the
73		- contents of a cache. This should show one item,
74		- usually on just one line.
75		- int cache_request(struct cache_detail *cd, struct cache_head *h,
76		- char **bpp, int *blen)
77		- Format a request to be send to user-space for an item
78		- to be instantiated. *bpp is a buffer of size *blen.
79		- bpp should be moved forward over the encoded message,
80		- and *blen should be reduced to show how much free
81		- space remains. Return 0 on success or <0 if not
82		- enough room or other problem.
83		- int cache_parse(struct cache_detail *cd, char *buf, int len)
84		- A message from user space has arrived to fill out a
85		- cache entry. It is in 'buf' of length 'len'.
86		- cache_parse should parse this, find the item in the
87		- cache with sunrpc_cache_lookup, and update the item
88		- with sunrpc_cache_update.
89		-
90		-
91		-3/ A cache needs to be registered using cache_register(). This
92		- includes it on a list of caches that will be regularly
93		- cleaned to discard old data.
94		-
95		-Using a cache
96		--------------
97		-
98		-To find a value in a cache, call sunrpc_cache_lookup passing a pointer
99		-to the cache_head in a sample item with the 'key' fields filled in.
100		-This will be passed to ->match to identify the target entry. If no
101		-entry is found, a new entry will be create, added to the cache, and
102		-marked as not containing valid data.
103		-
104		-The item returned is typically passed to cache_check which will check
105		-if the data is valid, and may initiate an up-call to get fresh data.
106		-cache_check will return -ENOENT in the entry is negative or if an up
107		-call is needed but not possible, -EAGAIN if an upcall is pending,
108		-or 0 if the data is valid;
109		-
110		-cache_check can be passed a "struct cache_req *". This structure is
111		-typically embedded in the actual request and can be used to create a
112		-deferred copy of the request (struct cache_deferred_req). This is
113		-done when the found cache item is not uptodate, but the is reason to
114		-believe that userspace might provide information soon. When the cache
115		-item does become valid, the deferred copy of the request will be
116		-revisited (->revisit). It is expected that this method will
117		-reschedule the request for processing.
118		-
119		-The value returned by sunrpc_cache_lookup can also be passed to
120		-sunrpc_cache_update to set the content for the item. A second item is
121		-passed which should hold the content. If the item found by _lookup
122		-has valid data, then it is discarded and a new item is created. This
123		-saves any user of an item from worrying about content changing while
124		-it is being inspected. If the item found by _lookup does not contain
125		-valid data, then the content is copied across and CACHE_VALID is set.
126		-
127		-Populating a cache
128		-------------------
129		-
130		-Each cache has a name, and when the cache is registered, a directory
131		-with that name is created in /proc/net/rpc
132		-
133		-This directory contains a file called 'channel' which is a channel
134		-for communicating between kernel and user for populating the cache.
135		-This directory may later contain other files of interacting
136		-with the cache.
137		-
138		-The 'channel' works a bit like a datagram socket. Each 'write' is
139		-passed as a whole to the cache for parsing and interpretation.
140		-Each cache can treat the write requests differently, but it is
141		-expected that a message written will contain:
142		- - a key
143		- - an expiry time
144		- - a content.
145		-with the intention that an item in the cache with the give key
146		-should be create or updated to have the given content, and the
147		-expiry time should be set on that item.
148		-
149		-Reading from a channel is a bit more interesting. When a cache
150		-lookup fails, or when it succeeds but finds an entry that may soon
151		-expire, a request is lodged for that cache item to be updated by
152		-user-space. These requests appear in the channel file.
153		-
154		-Successive reads will return successive requests.
155		-If there are no more requests to return, read will return EOF, but a
156		-select or poll for read will block waiting for another request to be
157		-added.
158		-
159		-Thus a user-space helper is likely to:
160		- open the channel.
161		- select for readable
162		- read a request
163		- write a response
164		- loop.
165		-
166		-If it dies and needs to be restarted, any requests that have not been
167		-answered will still appear in the file and will be read by the new
168		-instance of the helper.
169		-
170		-Each cache should define a "cache_parse" method which takes a message
171		-written from user-space and processes it. It should return an error
172		-(which propagates back to the write syscall) or 0.
173		-
174		-Each cache should also define a "cache_request" method which
175		-takes a cache item and encodes a request into the buffer
176		-provided.
177		-
178		-Note: If a cache has no active readers on the channel, and has had not
179		-active readers for more than 60 seconds, further requests will not be
180		-added to the channel but instead all lookups that do not find a valid
181		-entry will fail. This is partly for backward compatibility: The
182		-previous nfs exports table was deemed to be authoritative and a
183		-failed lookup meant a definite 'no'.
184		-
185		-request/response format
186		------------------------
187		-
188		-While each cache is free to use it's own format for requests
189		-and responses over channel, the following is recommended as
190		-appropriate and support routines are available to help:
191		-Each request or response record should be printable ASCII
192		-with precisely one newline character which should be at the end.
193		-Fields within the record should be separated by spaces, normally one.
194		-If spaces, newlines, or nul characters are needed in a field they
195		-much be quoted. two mechanisms are available:
196		-1/ If a field begins '\x' then it must contain an even number of
197		- hex digits, and pairs of these digits provide the bytes in the
198		- field.
199		-2/ otherwise a \ in the field must be followed by 3 octal digits
200		- which give the code for a byte. Other characters are treated
201		- as them selves. At the very least, space, newline, nul, and
202		- '\' must be quoted in this way.
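The second quoting mechanism described above (a backslash followed by three octal digits) could be sketched in userspace C roughly as follows. The function name quote_field() and its return convention are assumptions made for this illustration; this is not the kernel's own support routine, and it does not cover the '\x' hex form. It quotes only the minimum set named in the text: space, newline, nul, and backslash.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy 'inlen' bytes from 'in' to 'out', replacing
 * space, newline, nul and backslash with a backslash plus three octal
 * digits. Returns the number of characters written (not counting the
 * terminating nul), or -1 if 'out' is too small.
 */
static int quote_field(const unsigned char *in, size_t inlen,
		       char *out, size_t outlen)
{
	size_t o = 0;
	size_t i;

	for (i = 0; i < inlen; i++) {
		unsigned char c = in[i];

		if (c == ' ' || c == '\n' || c == '\0' || c == '\\') {
			/* need four characters plus room for the final nul */
			if (outlen - o < 5)
				return -1;
			out[o++] = '\\';
			out[o++] = (char)('0' + ((c >> 6) & 3));
			out[o++] = (char)('0' + ((c >> 3) & 7));
			out[o++] = (char)('0' + (c & 7));
		} else {
			/* need one character plus room for the final nul */
			if (outlen - o < 2)
				return -1;
			out[o++] = (char)c;
		}
	}
	if (o >= outlen)
		return -1;
	out[o] = '\0';
	return (int)o;
}
```

With this sketch, the six input bytes 'a', space, 'b', backslash, 'c', newline are encoded as a\040b\134c\012, which contains no spaces or newlines and can therefore sit in a space-separated record terminated by a single newline.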