02 Nov, 2011

1 commit

  • Since the commit below which added O_PATH support to the *at() calls, the
    error return for readlink/readlinkat for the empty pathname has switched
    from ENOENT to EINVAL:

    commit 65cfc6722361570bfe255698d9cd4dccaf47570d
    Author: Al Viro
    Date: Sun Mar 13 15:56:26 2011 -0400

    readlinkat(), fchownat() and fstatat() with empty relative pathnames

    This is both unexpected for userspace and makes readlink/readlinkat
    inconsistant with all other interfaces; and inconsistant with our stated
    return for these pathnames.

    As the readlinkat call does not have a flags parameter we cannot use the
    AT_EMPTY_PATH approach used in the other calls. Therefore expose whether
    the original path is infact entry via a new user_path_at_empty() path
    lookup function. Use this to determine whether to default to EINVAL or
    ENOENT for failures.

    Addresses http://bugs.launchpad.net/bugs/817187

    [akpm@linux-foundation.org: remove unused getname_flags()]
    Signed-off-by: Andy Whitcroft
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Christoph Hellwig

    Andy Whitcroft
     

27 Sep, 2011

2 commits

  • That flag no longer makes sense, since we don't look up automount points
    as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT
    handling was buggy to begin with: it would avoid automounting even for
    cases where we really *needed* to do the automount handling, and could
    return ENOENT for autofs entries that hadn't been instantiated yet.

    With our new non-eager automount semantics, one discussion has been
    about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the
    newfstatat() and fstatat64() system calls), but it's probably not worth
    it: you can always force at least directory automounting by simply
    adding the final '/' to the filename, which works for *all* of the stat
    family system calls, old and new.

    So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
    result of our bad default behavior.

    Acked-by: Ian Kent
    Acked-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Since we've now turned around and made LOOKUP_FOLLOW *not* force an
    automount, we want to add the ability to force an automount event on
    lookup even if we don't happen to have one of the other flags that force
    it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)

    Most cases will never want to use this, since you'd normally want to
    delay automounting as long as possible, which usually implies
    LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
    the automount any more).

    But Trond argued sufficiently forcefully that at a minimum bind mounting
    a file and quotactl will want to force the automount lookup. Some other
    cases (like nfs_follow_remote_path()) could use it too, although
    LOOKUP_DIRECTORY would work there as well.

    This commit just adds the flag and logic, no users yet, though. It also
    doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
    was made irrelevant by the same change that made us not follow on
    LOOKUP_FOLLOW.

    Cc: Trond Myklebust
    Cc: Ian Kent
    Cc: Jeff Layton
    Cc: Miklos Szeredi
    Cc: David Howells
    Cc: Al Viro
    Cc: Greg KH
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

20 Jul, 2011

3 commits


18 Mar, 2011

1 commit


15 Mar, 2011

1 commit

  • For name_to_handle_at(2) we'll want both ...at()-style syscall that
    would be usable for non-directory descriptors (with empty relative
    pathname). Introduce new flag (AT_EMPTY_PATH) to deal with that and
    corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to
    deal with the latter.

    Signed-off-by: Al Viro

    Al Viro
     

14 Mar, 2011

4 commits

  • New lookup flag: LOOKUP_ROOT. nd->root is set (and held) by caller,
    path_init() starts walking from that place and all pathname resolution
    machinery never drops nd->root if that flag is set. That turns
    vfs_path_lookup() into a special case of do_path_lookup() *and*
    gets us down to 3 callers of link_path_walk(), making it finally
    feasible to rip the handling of trailing symlink out of link_path_walk().
    That will not only simply the living hell out of it, but make life
    much simpler for unionfs merge. Trailing symlink handling will
    become iterative, which is a good thing for stack footprint in
    a lot of situations as well.

    Signed-off-by: Al Viro

    Al Viro
     
  • Don't stash the struct file * used as starting point of walk in nameidata;
    pass file ** to path_init() instead.

    Signed-off-by: Al Viro

    Al Viro
     
  • instead of ad-hackery around need_reval_dot(), do the following:
    set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute
    symlink traversal, on ".." and on procfs-style symlinks. Clear on
    normal components, leave unchanged on ".". Non-nested callers of
    link_path_walk() call handle_reval_path(), which checks that flag
    is set and that fs does want the final revalidate thing, then does
    ->d_revalidate(). In link_path_walk() all the return_reval stuff
    is gone.

    Signed-off-by: Al Viro

    Al Viro
     
  • all remaining callers pass LOOKUP_PARENT to it, so
    flags argument can die; renamed to kern_path_parent()

    Signed-off-by: Al Viro

    Al Viro
     

16 Jan, 2011

2 commits

  • Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount
    point directories. This can be used by fstatat() users to permit the
    gathering of attributes on an automount point and also prevent
    mass-automounting of a directory of automount points by ls.

    Signed-off-by: David Howells
    Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
    sleep when it tries to transit away from one of that filesystem's directories
    during a pathwalk. The operation is keyed off a new dentry flag
    (DCACHE_MANAGE_TRANSIT).

    The filesystem is allowed to be selective about which processes it holds and
    which it permits to continue on or prohibits from transiting from each flagged
    directory. This will allow autofs to hold up client processes whilst letting
    its userspace daemon through to maintain the directory or the stuff behind it
    or mounted upon it.

    The ->d_manage() dentry operation:

    int (*d_manage)(struct path *path, bool mounting_here);

    takes a pointer to the directory about to be transited away from and a flag
    indicating whether the transit is undertaken by do_add_mount() or
    do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

    It should return 0 if successful and to let the process continue on its way;
    -EISDIR to prohibit the caller from skipping to overmounted filesystems or
    automounting, and to use this directory; or some other error code to return to
    the user.

    ->d_manage() is called with namespace_sem writelocked if mounting_here is true
    and no other locks held, so it may sleep. However, if mounting_here is true,
    it may not initiate or wait for a mount or unmount upon the parameter
    directory, even if the act is actually performed by userspace.

    Within fs/namei.c, follow_managed() is extended to check with d_manage() first
    on each managed directory, before transiting away from it or attempting to
    automount upon it.

    follow_down() is renamed follow_down_one() and should only be used where the
    filesystem deliberately intends to avoid management steps (e.g. autofs).

    A new follow_down() is added that incorporates the loop done by all other
    callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
    and CIFS do use it, their use is removed by converting them to use
    d_automount()). The new follow_down() calls d_manage() as appropriate. It
    also takes an extra parameter to indicate if it is being called from mount code
    (with namespace_sem writelocked) which it passes to d_manage(). follow_down()
    ignores automount points so that it can be used to mount on them.

    __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
    DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
    sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have
    that determine whether to abort or not itself. That would allow the autofs
    daemon to continue on in rcu-walk mode.

    Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
    required as every tranist from that directory will cause d_manage() to be
    invoked. It can always be set again when necessary.

    ==========================
    WHAT THIS MEANS FOR AUTOFS
    ==========================

    Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
    trigger the automounting of indirect mounts, and both of these can be called
    with i_mutex held.

    autofs knows that the i_mutex will be held by the caller in lookup(), and so
    can drop it before invoking the daemon - but this isn't so for d_revalidate(),
    since the lock is only held on _some_ of the code paths that call it. This
    means that autofs can't risk dropping i_mutex from its d_revalidate() function
    before it calls the daemon.

    The bug could manifest itself as, for example, a process that's trying to
    validate an automount dentry that gets made to wait because that dentry is
    expired and needs cleaning up:

    mkdir S ffffffff8014e05a 0 32580 24956
    Call Trace:
    [] :autofs4:autofs4_wait+0x674/0x897
    [] avc_has_perm+0x46/0x58
    [] autoremove_wake_function+0x0/0x2e
    [] :autofs4:autofs4_expire_wait+0x41/0x6b
    [] :autofs4:autofs4_revalidate+0x91/0x149
    [] __lookup_hash+0xa0/0x12f
    [] lookup_create+0x46/0x80
    [] sys_mkdirat+0x56/0xe4

    versus the automount daemon which wants to remove that dentry, but can't
    because the normal process is holding the i_mutex lock:

    automount D ffffffff8014e05a 0 32581 1 32561
    Call Trace:
    [] __mutex_lock_slowpath+0x60/0x9b
    [] do_path_lookup+0x2ca/0x2f1
    [] .text.lock.mutex+0xf/0x14
    [] do_rmdir+0x77/0xde
    [] tracesys+0x71/0xe0
    [] tracesys+0xd5/0xe0

    which means that the system is deadlocked.

    This patch allows autofs to hold up normal processes whilst the daemon goes
    ahead and does things to the dentry tree behind the automouter point without
    risking a deadlock as almost no locks are held in d_manage() and none in
    d_automount().

    Signed-off-by: David Howells
    Was-Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     

07 Jan, 2011

2 commits

  • Perform common cases of path lookups without any stores or locking in the
    ancestor dentry elements. This is called rcu-walk, as opposed to the current
    algorithm which is a refcount based walk, or ref-walk.

    This results in far fewer atomic operations on every path element,
    significantly improving path lookup performance. It also avoids cacheline
    bouncing on common dentries, significantly improving scalability.

    The overall design is like this:
    * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
    * Take the RCU lock for the entire path walk, starting with the acquiring
    of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
    not required for dentry persistence.
    * synchronize_rcu is called when unregistering a filesystem, so we can
    access d_ops and i_ops during rcu-walk.
    * Similarly take the vfsmount lock for the entire path walk. So now mnt
    refcounts are not required for persistence. Also we are free to perform mount
    lookups, and to assume dentry mount points and mount roots are stable up and
    down the path.
    * Have a per-dentry seqlock to protect the dentry name, parent, and inode,
    so we can load this tuple atomically, and also check whether any of its
    members have changed.
    * Dentry lookups (based on parent, candidate string tuple) recheck the parent
    sequence after the child is found in case anything changed in the parent
    during the path walk.
    * inode is also RCU protected so we can load d_inode and use the inode for
    limited things.
    * i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
    * i_op can be loaded.

    When we reach the destination dentry, we lock it, recheck lookup sequence,
    and increment its refcount and mountpoint refcount. RCU and vfsmount locks
    are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
    not match, we can not drop rcu-walk gracefully at the current point in the
    lokup, so instead return -ECHILD (for want of a better errno). This signals the
    path walking code to re-do the entire lookup with a ref-walk.

    Aside from the final dentry, there are other situations that may be encounted
    where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
    a reference on the last good dentry) and continue with a ref-walk. Again, if
    we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
    using ref-walk. But it is very important that we can continue with ref-walk
    for most cases, particularly to avoid the overhead of double lookups, and to
    gain the scalability advantages on common path elements (like cwd and root).

    The cases where rcu-walk cannot continue are:
    * NULL dentry (ie. any uncached path element)
    * parent with d_inode->i_op->permission or ACLs
    * dentries with d_revalidate
    * Following links

    In future patches, permission checks and d_revalidate become rcu-walk aware. It
    may be possible eventually to make following links rcu-walk aware.

    Uncached path elements will always require dropping to ref-walk mode, at the
    very least because i_mutex needs to be grabbed, and objects allocated.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dcache_lock no longer protects anything. remove it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

23 Dec, 2009

1 commit


12 Dec, 2009

1 commit

  • By teaching sysfs_revalidate to hide a dentry for
    a sysfs_dirent if the sysfs_dirent has been renamed,
    and by teaching sysfs_lookup to return the original
    dentry if the sysfs dirent has been renamed. I can
    show the results of renames correctly without having to
    update the dcache during the directory rename.

    This massively simplifies the rename logic allowing a lot
    of weird sysfs special cases to be removed along with
    a lot of now unnecesary helper code.

    Acked-by: Tejun Heo
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

21 Sep, 2009

1 commit


12 Jun, 2009

3 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • New field: nd->root. When pathname resolution wants to know the root,
    check if nd->root.mnt is non-NULL; use nd->root if it is, otherwise
    copy current->fs->root there. After path_walk() is finished, we check
    if we'd got a cached value in nd->root and drop it. Before calling
    path_walk() we should either set nd->root.mnt to NULL *or* copy (and
    pin down) some path to nd->root. In the latter case we won't be
    looking at current->fs->root at all.

    Signed-off-by: Al Viro

    Al Viro
     

09 May, 2009

1 commit


01 Jan, 2009

1 commit


23 Oct, 2008

3 commits


27 Jul, 2008

5 commits


15 Feb, 2008

4 commits

  • * Add path_put() functions for releasing a reference to the dentry and
    vfsmount of a struct path in the right order

    * Switch from path_release(nd) to path_put(&nd->path)

    * Rename dput_path() to path_put_conditional()

    [akpm@linux-foundation.org: fix cifs]
    Signed-off-by: Jan Blunck
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Christoph Hellwig
    Cc:
    Cc: Al Viro
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     
  • This is the central patch of a cleanup series. In most cases there is no good
    reason why someone would want to use a dentry for itself. This series reflects
    that fact and embeds a struct path into nameidata.

    Together with the other patches of this series
    - it enforced the correct order of getting/releasing the reference count on
    pairs
    - it prepares the VFS for stacking support since it is essential to have a
    struct path in every place where the stack can be traversed
    - it reduces the overall code size:

    without patch series:
    text data bss dec hex filename
    5321639 858418 715768 6895825 6938d1 vmlinux

    with patch series:
    text data bss dec hex filename
    5320026 858418 715768 6894212 693284 vmlinux

    This patch:

    Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix cifs]
    [akpm@linux-foundation.org: fix smack]
    Signed-off-by: Jan Blunck
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     
  • Move the definition of struct path into its own header file for further
    patches.

    Signed-off-by: Jan Blunck
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     
  • path_release_on_umount() should only be called from sys_umount(). I merged the
    function into sys_umount() instead of having in in namei.c.

    Signed-off-by: Jan Blunck
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     

14 Feb, 2008

1 commit


17 Oct, 2007

1 commit

  • Try to fix the mess created by sysfs braindamage.

    - refactor code internal to fs/namei.c a little to avoid too much
    duplication:
    o __lookup_hash_kern is renamed back to __lookup_hash
    o the old __lookup_hash goes away, permission checks moves to
    the two callers
    o useless inline qualifiers on above functions go away
    - lookup_one_len_kern loses it's last argument and is renamed to
    lookup_one_noperm to make it's useage a little more clear
    - added kerneldoc comments to describe lookup_one_len aswell as
    lookup_one_noperm and make it very clear that no one should use
    the latter ever.

    Signed-off-by: Christoph Hellwig
    Cc: Josef 'Jeff' Sipek
    Cc: Miklos Szeredi
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

20 Jul, 2007

2 commits