13 Jul, 2011

1 commit

  • If the client is using NFS v4.1, then we can use SECINFO_NO_NAME to find
    the secflavor for the initial mount. If the server doesn't support
    SECINFO_NO_NAME then I fall back on the "guess and check" method used
    for v4.0 mounts.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
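
The fallback described in this commit can be sketched in user space. This is only an illustration of the control flow, not the kernel code: `server_model`, `query_flavor()`, `guess_flavor()`, `find_mount_flavor()` and the flavor numbers are all hypothetical names invented for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flavor numbers, for illustration only. */
enum { FLAVOR_SYS = 1, FLAVOR_KRB5 = 2 };

/* Hypothetical model of a server: the one flavor it accepts, and whether
 * it implements a SECINFO_NO_NAME-style query. */
struct server_model {
	int supports_secinfo_no_name;
	unsigned int accepted_flavor;
};

/* Ask the server directly. Returns 0 and fills *flavor, or -ENOTSUP. */
static int query_flavor(const struct server_model *srv, unsigned int *flavor)
{
	if (!srv->supports_secinfo_no_name)
		return -ENOTSUP;
	*flavor = srv->accepted_flavor;
	return 0;
}

/* "Guess and check": try each candidate until the server accepts one. */
static int guess_flavor(const struct server_model *srv,
			const unsigned int *candidates, size_t n,
			unsigned int *flavor)
{
	for (size_t i = 0; i < n; i++) {
		if (candidates[i] == srv->accepted_flavor) {
			*flavor = candidates[i];
			return 0;
		}
	}
	return -EPERM;
}

/* Prefer the direct query; fall back only when it is unsupported. */
static int find_mount_flavor(const struct server_model *srv,
			     const unsigned int *candidates, size_t n,
			     unsigned int *flavor)
{
	int err = query_flavor(srv, flavor);

	if (err == -ENOTSUP)
		err = guess_flavor(srv, candidates, n, flavor);
	return err;
}
```

Calling `find_mount_flavor()` on a model with `supports_secinfo_no_name` set returns the accepted flavor without consulting the candidate list; with it cleared, the candidate list is scanned instead.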
     

13 Apr, 2011

1 commit


07 Apr, 2011

1 commit

  • rpc_authflavor_t is cast from an unsigned int, but the
    initial code tried to use it as a signed int. I fix
    this by passing an rpc_authflavor_t pointer around, and
    returning signed integers from functions.

    Signed-off-by: Bryan Schumaker
    Reported-by: Dan Carpenter
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
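
A small user-space sketch of the pattern this fix describes: the status comes back as a signed integer and the flavor travels through a pointer. `authflavor_t` stands in for `rpc_authflavor_t`, and `pick_flavor()` and `error_stored_as_flavor()` are hypothetical helpers written for this example.

```c
#include <errno.h>

typedef unsigned int authflavor_t;	/* stand-in for rpc_authflavor_t */

/*
 * Returns 0 on success or a negative errno. The flavor is written through
 * the pointer, so an error code can never be mistaken for a flavor value.
 */
static int pick_flavor(const authflavor_t *offered, int count,
		       authflavor_t *flavor)
{
	if (count <= 0)
		return -ENOENT;
	*flavor = offered[0];
	return 0;
}

/*
 * The problematic shape: a negative errno squeezed into the unsigned type
 * becomes a large positive number, so a "less than zero" check on it can
 * never fire.
 */
static authflavor_t error_stored_as_flavor(void)
{
	return (authflavor_t)(-ENOENT);
}
```

With this split, callers test the `int` return value for errors and read `*flavor` only on success.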
     

27 Mar, 2011

1 commit


25 Mar, 2011

2 commits


21 Mar, 2011

1 commit


17 Mar, 2011

2 commits


16 Jan, 2011

3 commits

  • Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
    added rather than calling do_add_mount() itself. follow_automount() will then
    do the addition.

    This slightly complicates things as ->d_automount() normally wants to add the
    new vfsmount to an expiration list and start an expiration timer. The problem
    with that is that the vfsmount will be deleted if it has a refcount of 1 and
    the timer will not repeat if the expiration list is empty.

    To this end, we require the vfsmount to be returned from d_automount() with a
    refcount of (at least) 2. One of these refs will be dropped unconditionally.
    In addition, follow_automount() must get a 3rd ref around the call to
    do_add_mount() lest it eat a ref and return an error, leaving the mount we
    have open to being expired as we would otherwise have only 1 ref on it.

    d_automount() should also add the vfsmount to the expiration list (by
    calling mnt_set_expiry()) and start the expiration timer before returning, if
    this mechanism is to be used. The vfsmount will be unlinked from the
    expiration list by follow_automount() if do_add_mount() fails.

    This patch also fixes the call to do_add_mount() for AFS to propagate the mount
    flags from the parent vfsmount.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Make NFS use the new d_automount() dentry operation rather than abusing
    follow_link() on directories.

    Signed-off-by: David Howells
    Acked-by: Trond Myklebust
    Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
    sleep when it tries to transit away from one of that filesystem's directories
    during a pathwalk. The operation is keyed off a new dentry flag
    (DCACHE_MANAGE_TRANSIT).

    The filesystem is allowed to be selective about which processes it holds and
    which it permits to continue on or prohibits from transiting from each flagged
    directory. This will allow autofs to hold up client processes whilst letting
    its userspace daemon through to maintain the directory or the stuff behind it
    or mounted upon it.

    The ->d_manage() dentry operation:

    int (*d_manage)(struct path *path, bool mounting_here);

    takes a pointer to the directory about to be transited away from and a flag
    indicating whether the transit is undertaken by do_add_mount() or
    do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

    It should return 0 on success, to let the process continue on its way;
    -EISDIR to prohibit the caller from skipping to overmounted filesystems or
    automounting, and to use this directory; or some other error code to return to
    the user.

    ->d_manage() is called with namespace_sem writelocked if mounting_here is true
    and no other locks held, so it may sleep. However, if mounting_here is true,
    it may not initiate or wait for a mount or unmount upon the parameter
    directory, even if the act is actually performed by userspace.

    Within fs/namei.c, follow_managed() is extended to check with d_manage() first
    on each managed directory, before transiting away from it or attempting to
    automount upon it.

    follow_down() is renamed follow_down_one() and should only be used where the
    filesystem deliberately intends to avoid management steps (e.g. autofs).

    A new follow_down() is added that incorporates the loop done by all other
    callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
    and CIFS do use it, their use is removed by converting them to use
    d_automount()). The new follow_down() calls d_manage() as appropriate. It
    also takes an extra parameter to indicate if it is being called from mount code
    (with namespace_sem writelocked) which it passes to d_manage(). follow_down()
    ignores automount points so that it can be used to mount on them.

    __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
    DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
    sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have
    that determine whether to abort or not itself. That would allow the autofs
    daemon to continue on in rcu-walk mode.

    Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
    required as every transit from that directory will cause d_manage() to be
    invoked. It can always be set again when necessary.

    ==========================
    WHAT THIS MEANS FOR AUTOFS
    ==========================

    Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
    trigger the automounting of indirect mounts, and both of these can be called
    with i_mutex held.

    autofs knows that the i_mutex will be held by the caller in lookup(), and so
    can drop it before invoking the daemon - but this isn't so for d_revalidate(),
    since the lock is only held on _some_ of the code paths that call it. This
    means that autofs can't risk dropping i_mutex from its d_revalidate() function
    before it calls the daemon.

    The bug could manifest itself as, for example, a process that's trying to
    validate an automount dentry that gets made to wait because that dentry is
    expired and needs cleaning up:

    mkdir S ffffffff8014e05a 0 32580 24956
    Call Trace:
    [] :autofs4:autofs4_wait+0x674/0x897
    [] avc_has_perm+0x46/0x58
    [] autoremove_wake_function+0x0/0x2e
    [] :autofs4:autofs4_expire_wait+0x41/0x6b
    [] :autofs4:autofs4_revalidate+0x91/0x149
    [] __lookup_hash+0xa0/0x12f
    [] lookup_create+0x46/0x80
    [] sys_mkdirat+0x56/0xe4

    versus the automount daemon which wants to remove that dentry, but can't
    because the normal process is holding the i_mutex lock:

    automount D ffffffff8014e05a 0 32581 1 32561
    Call Trace:
    [] __mutex_lock_slowpath+0x60/0x9b
    [] do_path_lookup+0x2ca/0x2f1
    [] .text.lock.mutex+0xf/0x14
    [] do_rmdir+0x77/0xde
    [] tracesys+0x71/0xe0
    [] tracesys+0xd5/0xe0

    which means that the system is deadlocked.

    This patch allows autofs to hold up normal processes whilst the daemon goes
    ahead and does things to the dentry tree behind the automounter point without
    risking a deadlock as almost no locks are held in d_manage() and none in
    d_automount().

    Signed-off-by: David Howells
    Was-Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
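
The return-value contract of ->d_manage() described in this commit (0 to continue, -EISDIR to use this directory, any other error returned to the user) can be modelled in user space. This is a hedged sketch only: `dir_model`, `try_transit()` and the three callbacks are hypothetical, and no locking or real pathwalk is modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct dir_model;
typedef int (*d_manage_fn)(struct dir_model *dir, bool mounting_here);

struct dir_model {
	bool manage_transit;	/* models the DCACHE_MANAGE_TRANSIT flag */
	d_manage_fn d_manage;	/* models the ->d_manage() dentry op */
};

enum transit { TRANSIT_CONTINUE, TRANSIT_USE_THIS_DIR, TRANSIT_ERROR };

/* Decide what a walker does when it is about to leave 'dir'. */
static enum transit try_transit(struct dir_model *dir, bool mounting_here,
				int *error)
{
	int ret;

	*error = 0;
	/* The op is only consulted when the flag is set. */
	if (!dir->manage_transit || dir->d_manage == NULL)
		return TRANSIT_CONTINUE;

	ret = dir->d_manage(dir, mounting_here);
	if (ret == 0)
		return TRANSIT_CONTINUE;	/* carry on past this directory */
	if (ret == -EISDIR)
		return TRANSIT_USE_THIS_DIR;	/* do not skip to overmounts */
	*error = ret;				/* anything else goes to the user */
	return TRANSIT_ERROR;
}

/* Three hypothetical filesystem policies. */
static int allow_all(struct dir_model *dir, bool mounting_here)
{
	(void)dir; (void)mounting_here;
	return 0;
}

static int pin_here(struct dir_model *dir, bool mounting_here)
{
	(void)dir; (void)mounting_here;
	return -EISDIR;
}

static int refuse(struct dir_model *dir, bool mounting_here)
{
	(void)dir; (void)mounting_here;
	return -EACCES;
}
```

Clearing `manage_transit` in the model skips the callback entirely, which mirrors the advice above to clear the flag when it is not required.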
     

07 Jan, 2011

2 commits

  • dcache_lock no longer protects anything. remove it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • The remaining uses of dcache_lock are to allow atomic, multi-step read-side
    operations over the directory tree by excluding modifications to the tree.
    Also, to walk in the leaf->root direction in the tree where we don't have
    a natural d_lock ordering.

    This could be accomplished by taking every d_lock, but this would mean a
    huge number of locks and actually gets very tricky.

    Solve this instead by using the rename seqlock for multi-step read-side
    operations, retry in case of a rename so we don't walk up the wrong parent.
    Concurrent dentry insertions are not serialised against. Concurrent deletes
    are tricky when walking up the directory: our parent might have been deleted
    when dropping locks so also need to check and retry for that.

    We can also use the rename lock in cases where livelock is a worry (and it
    is introduced in a subsequent patch).

    Signed-off-by: Nick Piggin

    Nick Piggin
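
The read-and-retry idea behind using the rename seqlock for multi-step read-side operations can be illustrated with a user-space sequence counter built on C11 atomics. This is a sketch under assumptions, not the kernel's seqlock: `tree_model`, `rename_node()` and `depth_of()` are hypothetical, a single writer is assumed, and the tree is just an array of parent indices.

```c
#include <stdatomic.h>

#define NODES 4

struct tree_model {
	atomic_uint rename_seq;		/* even: stable, odd: rename running */
	atomic_int parent_of[NODES];	/* parent index, -1 for the root */
};

/* Build a simple chain: 0 is the root, node i hangs under node i - 1. */
static void tree_init(struct tree_model *t)
{
	atomic_init(&t->rename_seq, 0u);
	for (int i = 0; i < NODES; i++)
		atomic_init(&t->parent_of[i], i - 1);
}

/* Writer side (single writer assumed): bump the counter around the change. */
static void rename_node(struct tree_model *t, int node, int new_parent)
{
	unsigned int seq = atomic_load_explicit(&t->rename_seq,
						memory_order_relaxed);

	atomic_store_explicit(&t->rename_seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&t->parent_of[node], new_parent,
			      memory_order_relaxed);
	atomic_store_explicit(&t->rename_seq, seq + 2, memory_order_release);
}

/* Reader side: walk leaf -> root, and redo the walk if a rename raced. */
static int depth_of(struct tree_model *t, int node)
{
	unsigned int seq;
	int depth;

	do {
		/* Wait for any rename in progress to finish. */
		do {
			seq = atomic_load_explicit(&t->rename_seq,
						   memory_order_acquire);
		} while (seq & 1u);

		depth = 0;
		/* Bounded walk, so a torn view cannot loop forever. */
		for (int n = node; depth < NODES; depth++) {
			int parent = atomic_load_explicit(&t->parent_of[n],
							  memory_order_relaxed);
			if (parent < 0)
				break;
			n = parent;
		}
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&t->rename_seq,
				      memory_order_relaxed) != seq);

	return depth;
}
```

The reader takes no lock at all. It only pays for a retry when a rename actually overlapped its walk, which is the trade-off the commit message describes.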
     

15 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

23 Jun, 2009

1 commit


12 Jun, 2009

1 commit


08 Oct, 2008

2 commits


01 Aug, 2008

1 commit


17 May, 2008

2 commits


20 Apr, 2008

1 commit


15 Feb, 2008

2 commits

  • * Add path_put() functions for releasing a reference to the dentry and
    vfsmount of a struct path in the right order

    * Switch from path_release(nd) to path_put(&nd->path)

    * Rename dput_path() to path_put_conditional()

    [akpm@linux-foundation.org: fix cifs]
    Signed-off-by: Jan Blunck
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Christoph Hellwig
    Cc:
    Cc: Al Viro
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
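
A user-space model of releasing both halves of a path in a fixed order: the dentry reference first, then the vfsmount reference. The types and helpers here (`ref_model`, `path_model`, `path_get_model()`, `path_put_model()`) are hypothetical stand-ins, and the `released_at` stamp exists only to make the order observable.

```c
/* A reference-counted object that records when its last ref went away. */
struct ref_model {
	int count;
	int released_at;	/* 0 while alive, otherwise a release ordinal */
};

/* Models struct path: a (vfsmount, dentry) pair that is pinned together. */
struct path_model {
	struct ref_model *mnt;
	struct ref_model *dentry;
};

static int release_clock;

static void ref_put(struct ref_model *ref)
{
	if (--ref->count == 0)
		ref->released_at = ++release_clock;
}

/* Take the mount reference first, then the dentry reference. */
static void path_get_model(struct path_model *path)
{
	path->mnt->count++;
	path->dentry->count++;
}

/* Drop in the opposite order: the dentry goes before the mount it lives on. */
static void path_put_model(struct path_model *path)
{
	ref_put(path->dentry);
	ref_put(path->mnt);
}
```

Keeping the order inside one helper means callers cannot get it wrong, which appears to be the point of replacing open-coded release pairs with a single call.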
     
  • This is the central patch of a cleanup series. In most cases there is no good
    reason why someone would want to use a dentry for itself. This series reflects
    that fact and embeds a struct path into nameidata.

    Together with the other patches of this series
    - it enforces the correct order of getting/releasing the reference count on
    pairs
    - it prepares the VFS for stacking support since it is essential to have a
    struct path in every place where the stack can be traversed
    - it reduces the overall code size:

    without patch series:
    text data bss dec hex filename
    5321639 858418 715768 6895825 6938d1 vmlinux

    with patch series:
    text data bss dec hex filename
    5320026 858418 715768 6894212 693284 vmlinux

    This patch:

    Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix cifs]
    [akpm@linux-foundation.org: fix smack]
    Signed-off-by: Jan Blunck
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     

30 Jan, 2008

1 commit


01 Sep, 2007

1 commit


08 Aug, 2007

1 commit

  • This will avoid deadlocks of the form:

    stack backtrace:
    [] show_trace_log_lvl+0x1a/0x30
    [] show_trace+0x12/0x20
    [] dump_stack+0x15/0x20
    [] __lock_acquire+0xc22/0x1030
    [] lock_acquire+0x61/0x80
    [] flush_workqueue+0x49/0x70
    [] flush_scheduled_work+0xd/0x10
    [] nfs_release_automount_timer+0x2c/0x30 [nfs]
    [] nfs_free_server+0x9e/0xd0 [nfs]
    [] nfs_kill_super+0x16/0x20 [nfs]
    [] deactivate_super+0x7d/0xa0
    [] mntput_no_expire+0x4b/0x80
    [] expire_mount_list+0xe4/0x140
    [] mark_mounts_for_expiry+0x99/0xb0
    [] nfs_expire_automounts+0xd/0x40 [nfs]
    [] run_workqueue+0x12b/0x1e0
    [] worker_thread+0x9b/0x100
    [] kthread+0x42/0x70
    [] kernel_thread_helper+0x7/0x18
    =======================

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

13 Feb, 2007

1 commit

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
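
The same idea in a user-space sketch: an operations table declared `const` cannot be assigned to, and the compiler rejects any attempt. The names below (`inode_model`, `inode_ops_model`, `model_dir_ops`, `call_getsize()`) are hypothetical and only loosely shaped like struct inode_operations.

```c
#include <stddef.h>

struct inode_model {
	long size;
};

/* A tiny operations table, loosely modelled on struct inode_operations. */
struct inode_ops_model {
	long (*getsize)(const struct inode_model *inode);
};

static long model_getsize(const struct inode_model *inode)
{
	return inode->size;
}

/*
 * Declared const: toolchains typically place this in a read-only section,
 * and an assignment such as
 *
 *	model_dir_ops.getsize = NULL;
 *
 * is rejected at compile time.
 */
static const struct inode_ops_model model_dir_ops = {
	.getsize = model_getsize,
};

/* Users only ever need a pointer-to-const to call through the table. */
static long call_getsize(const struct inode_ops_model *ops,
			 const struct inode_model *inode)
{
	return ops->getsize(inode);
}
```

Because every consumer takes `const struct inode_ops_model *`, the table can be shared freely without any caller being able to modify it.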
     

22 Nov, 2006

2 commits

  • Pass the work_struct pointer to the work function rather than context data.
    The work function can use container_of() to work out the data.

    For the cases where the container of the work_struct may go away the moment the
    pending bit is cleared, it is made possible to defer the release of the
    structure by deferring the clearing of the pending bit.

    To make this work, an extra flag is introduced into the management side of the
    work_struct. This governs auto-release of the structure upon execution.

    Ordinarily, the work queue executor would release the work_struct for further
    scheduling or deallocation by clearing the pending bit prior to jumping to the
    work function. This means that, unless the driver makes some guarantee itself
    that the work_struct won't go away, the work function may not access anything
    else in the work_struct or its container lest they be deallocated. This is a
    problem if the auxiliary data is taken away (as done by the last patch).

    However, if the pending bit is *not* cleared before jumping to the work
    function, then the work function *may* access the work_struct and its container
    with no problems. But then the work function must itself release the
    work_struct by calling work_release().

    In most cases, automatic release is fine, so this is the default. Special
    initiators exist for the non-auto-release case (ending in _NAR).

    Signed-Off-By: David Howells

    David Howells
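
How a work function recovers its data from the work item pointer can be shown with a user-space version of container_of(). The macro below is a simplified form built on `offsetof`, and `work_model`, `automount_timer_model`, `expire_work()` and `run_work()` are hypothetical names used only for this sketch.

```c
#include <stddef.h>

/* Simplified user-space container_of(): step back from a member pointer
 * to the structure that embeds it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Models a work item: it carries only the function, no context pointer. */
struct work_model {
	void (*func)(struct work_model *work);
};

/* Hypothetical container that embeds the work item next to its data. */
struct automount_timer_model {
	int expired_count;
	struct work_model work;
};

/* The work function receives the work item and derives its container. */
static void expire_work(struct work_model *work)
{
	struct automount_timer_model *timer =
		container_of(work, struct automount_timer_model, work);

	timer->expired_count++;
}

/* Models the executor: it knows nothing about the container type. */
static void run_work(struct work_model *work)
{
	work->func(work);
}
```

The executor stays generic, and each work function pays only a constant pointer adjustment to reach its own data.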
     
  • Separate delayable work items from non-delayable work items by splitting them
    into a separate structure (delayed_work), which incorporates a work_struct and
    the timer_list removed from work_struct.

    The work_struct struct is huge, and this limits its usefulness. On a 64-bit
    architecture it's nearly 100 bytes in size. This reduces that by half for the
    non-delayable type of event.

    Signed-Off-By: David Howells

    David Howells
     

04 Oct, 2006

1 commit


27 Sep, 2006

1 commit


23 Sep, 2006

3 commits

  • The attached patch makes NFS share superblocks between mounts from the same
    server and FSID over the same protocol.

    It does this by creating each superblock with a false root and returning the
    real root dentry in the vfsmount presented by get_sb(). The root dentry set
    starts off as an anonymous dentry if we don't already have the dentry for its
    inode, otherwise it simply returns the dentry we already have.

    We may thus end up with several trees of dentries in the superblock, and if at
    some later point one of the anonymous tree roots is discovered by normal filesystem
    activity to be located in another tree within the superblock, the anonymous
    root is named and materialises attached to the second tree at the appropriate
    point.

    Why do it this way? Why not pass an extra argument to the mount() syscall to
    indicate the subpath and then pathwalk from the server root to the desired
    directory? You can't guarantee this will work for two reasons:

    (1) The root and intervening nodes may not be accessible to the client.

    With NFS2 and NFS3, for instance, mountd is called on the server to get
    the filehandle for the tip of a path. mountd won't give us handles for
    anything we don't have permission to access, and so we can't set up NFS
    inodes for such nodes, and so can't easily set up dentries (we'd have to
    have ghost inodes or something).

    With this patch we don't actually create dentries until we get handles
    from the server that we can use to set up their inodes, and we don't
    actually bind them into the tree until we know for sure where they go.

    (2) Inaccessible symbolic links.

    If we're asked to mount two exports from the server, eg:

    mount warthog:/warthog/aaa/xxx /mmm
    mount warthog:/warthog/bbb/yyy /nnn

    We may not be able to access anything nearer the root than xxx and yyy,
    but we may find out later that /mmm/www/yyy, say, is actually the same
    directory as the one mounted on /nnn. What we might then find out, for
    example, is that /warthog/bbb was actually a symbolic link to
    /warthog/aaa/xxx/www, but we can't actually determine that by talking to
    the server until /warthog is made available by NFS.

    This would lead to having constructed an erroneous dentry tree which we
    can't easily fix. We can end up with a dentry marked as a directory when
    it should actually be a symlink, or we could end up with an apparently
    hardlinked directory.

    With this patch we need not make assumptions about the type of a dentry
    for which we can't retrieve information, nor need we assume we know its
    place in the grand scheme of things until we actually see that place.

    This patch reduces the possibility of aliasing in the inode and page caches for
    inodes that may be accessed by more than one NFS export. It also reduces the
    number of superblocks required for NFS where there are many NFS exports being
    used from a server (home directory server + autofs for example).

    This in turn makes it simpler to do local caching of network filesystems, as it
    can then be guaranteed that there won't be links from multiple inodes in
    separate superblocks to the same cache file.

    Obviously, cache aliasing between different levels of NFS protocol could still
    be a problem, but at least that gives us another key to use when indexing the
    cache.

    This patch makes the following changes:

    (1) The server record construction/destruction has been abstracted out into
    its own set of functions to make things easier to get right. These have
    been moved into fs/nfs/client.c.

    All the code in fs/nfs/client.c has to do with the management of
    connections to servers, and doesn't touch superblocks in any way; the
    remaining code in fs/nfs/super.c has to do with VFS superblock management.

    (2) The sequence of events undertaken by NFS mount is now reordered:

    (a) A volume representation (struct nfs_server) is allocated.

    (b) A server representation (struct nfs_client) is acquired. This may be
    allocated or shared, and is keyed on server address, port and NFS
    version.

    (c) If allocated, the client representation is initialised. The state
    member variable of nfs_client is used to prevent a race during
    initialisation from two mounts.

    (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find
    the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we
    are given the root FH in advance.

    (e) The volume FSID is probed for on the root FH.

    (f) The volume representation is initialised from the FSINFO record
    retrieved on the root FH.

    (g) sget() is called to acquire a superblock. This may be allocated or
    shared, keyed on client pointer and FSID.

    (h) If allocated, the superblock is initialised.

    (i) If the superblock is shared, then the new nfs_server record is
    discarded.

    (j) The root dentry for this mount is looked up from the root FH.

    (k) The root dentry for this mount is assigned to the vfsmount.

    (3) nfs_readdir_lookup() creates dentries for each of the entries readdir()
    returns; this function now attaches disconnected trees from alternate
    roots that happen to be discovered attached to a directory being read (in
    the same way nfs_lookup() is made to do for lookup ops).

    The new d_materialise_unique() function is now used to do this, thus
    permitting the whole thing to be done under one set of locks, and thus
    avoiding any race between mount and lookup operations on the same
    directory.

    (4) The client management code uses a new debug facility: NFSDBG_CLIENT which
    is set by echoing 1024 to /proc/net/sunrpc/nfs_debug.

    (5) Clone mounts are now called xdev mounts.

    (6) Use the dentry passed to the statfs() op as the handle for retrieving fs
    statistics rather than the root dentry of the superblock (which is now a
    dummy).

    Signed-Off-By: David Howells
    Signed-off-by: Trond Myklebust

    David Howells
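
Step (g) above, acquiring a superblock that is either shared or newly allocated and keyed on client pointer and FSID, can be sketched as a lookup-or-allocate over a small table. This is a user-space model under assumptions: `sget_model()`, `sb_model`, `client_model`, `fsid_model` and the fixed-size table are hypothetical, and no locking is shown.

```c
#include <stddef.h>
#include <stdint.h>

struct fsid_model {
	uint64_t major;
	uint64_t minor;
};

struct client_model {
	int id;
};

/* Models a superblock: identified by (client pointer, FSID). */
struct sb_model {
	int in_use;
	const struct client_model *client;
	struct fsid_model fsid;
	int mounts;		/* how many mounts share this superblock */
};

#define MAX_SB 8
static struct sb_model sb_table[MAX_SB];

/* Return a matching superblock if one exists, else claim a free slot. */
static struct sb_model *sget_model(const struct client_model *client,
				   struct fsid_model fsid)
{
	struct sb_model *free_slot = NULL;

	for (size_t i = 0; i < MAX_SB; i++) {
		struct sb_model *sb = &sb_table[i];

		if (!sb->in_use) {
			if (free_slot == NULL)
				free_slot = sb;
			continue;
		}
		if (sb->client == client &&
		    sb->fsid.major == fsid.major &&
		    sb->fsid.minor == fsid.minor) {
			sb->mounts++;	/* shared: same server and FSID */
			return sb;
		}
	}

	if (free_slot == NULL)
		return NULL;		/* table full */

	free_slot->in_use = 1;
	free_slot->client = client;
	free_slot->fsid = fsid;
	free_slot->mounts = 1;
	return free_slot;
}
```

Two mounts of the same client and FSID get the same `sb_model` back, while a different FSID or a different client yields a separate one.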
     
  • Move the rpc_ops from the nfs_server struct to the nfs_client struct as they're
    common to all server records of a particular NFS protocol version.

    Signed-Off-By: David Howells
    Signed-off-by: Trond Myklebust

    David Howells
     
  • Add some extra const qualifiers into NFS.

    Signed-Off-By: David Howells
    Signed-off-by: Trond Myklebust

    David Howells
     

04 Aug, 2006

1 commit


09 Jun, 2006

2 commits

  • As fs/nfs/inode.c is rather large, heterogeneous and unwieldy, the attached
    patch splits it up into a number of files:

    (*) fs/nfs/inode.c

    Strictly inode specific functions.

    (*) fs/nfs/super.c

    Superblock management functions for NFS and NFS4, normal access, clones
    and referrals. The NFS4 superblock functions _could_ move out into a
    separate conditionally compiled file, but it's probably not worth it as
    there're so many common bits.

    (*) fs/nfs/namespace.c

    Some namespace-specific functions have been moved here.

    (*) fs/nfs/nfs4namespace.c

    NFS4-specific namespace functions (this could be merged into the previous
    file). This file is conditionally compiled.

    (*) fs/nfs/internal.h

    Inter-file declarations, plus a few simple utility functions moved from
    fs/nfs/inode.c.

    Additionally, all the in-.c-file externs have been moved here, and those
    files they were moved from now include this file.

    For the most part, the functions have not been changed, only some multiplexor
    functions have changed significantly.

    I've also:

    (*) Added some extra banner comments above some functions.

    (*) Rearranged the function order within the files to be more logical and
    better grouped (IMO), though someone may prefer a different order.

    (*) Reduced the number of #ifdefs in .c files.

    (*) Added missing __init and __exit directives.

    Signed-Off-By: David Howells

    David Howells
     
  • Respond to a moved error on NFS lookup by setting up the referral.
    Note: We don't actually follow the referral during lookup/getattr, but
    later when we detect fsid mismatch in inode revalidation (similar to the
    processing done for cloning submounts). Referrals will have fake attributes
    until they are actually followed or traversed.

    Signed-off-by: Manoj Naik
    Signed-off-by: Trond Myklebust

    Manoj Naik