18 Jan, 2011

1 commit

  • There is a ref count problem in fs/namei.c:do_lookup().

    When walking in ref-walk mode, if follow_managed() returns a fail we
    need to drop dentry and possibly vfsmount. Clean up properly,
    as we do in the other caller of follow_managed().

    Signed-off-by: Ian Kent
    Signed-off-by: Al Viro

    Ian Kent
     

17 Jan, 2011

2 commits

  • ... and shift it from namei.c to namespace.c

    Signed-off-by: Al Viro

    Al Viro
     
  • Instead of splitting refcount between (per-cpu) mnt_count
    and (SMP-only) mnt_longrefs, make all references contribute
    to mnt_count again and keep track of how many are longterm
    ones.

    Accounting rules for longterm count:
    * 1 for each fs_struct.root.mnt
    * 1 for each fs_struct.pwd.mnt
    * 1 for having non-NULL ->mnt_ns
    * decrement to 0 happens only under vfsmount lock exclusive

    That allows nice common case for mntput() - since we can't drop the
    final reference until after mnt_longterm has reached 0 due to the rules
    above, mntput() can grab vfsmount lock shared and check mnt_longterm.
    If it turns out to be non-zero (which is the common case), we know
    that this is not the final mntput() and can just blindly decrement
    percpu mnt_count. Otherwise we grab vfsmount lock exclusive and
    do usual decrement-and-check of percpu mnt_count.

    For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
    namespace.c uses the latter in places where we don't already hold
    vfsmount lock exclusive and opencodes a few remaining spots where
    we need to manipulate mnt_longterm.

    Note that we mostly revert the code outside of fs/namespace.c back
    to what we used to have; in particular, normal code doesn't need
    to care about two kinds of references, etc. And we get to keep
    the optimization Nick's variant had bought us...

    Signed-off-by: Al Viro

    Al Viro
     

16 Jan, 2011

8 commits

  • Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
    added rather than calling do_add_mount() itself. follow_automount() will then
    do the addition.

    This slightly complicates things as ->d_automount() normally wants to add the
    new vfsmount to an expiration list and start an expiration timer. The problem
    with that is that the vfsmount will be deleted if it has a refcount of 1 and
    the timer will not repeat if the expiration list is empty.

    To this end, we require the vfsmount to be returned from d_automount() with a
    refcount of (at least) 2. One of these refs will be dropped unconditionally.
    In addition, follow_automount() must get a 3rd ref around the call to
    do_add_mount() lest it eat a ref and return an error, leaving the mount we
    have open to being expired as we would otherwise have only 1 ref on it.

    d_automount() should also add the the vfsmount to the expiration list (by
    calling mnt_set_expiry()) and start the expiration timer before returning, if
    this mechanism is to be used. The vfsmount will be unlinked from the
    expiration list by follow_automount() if do_add_mount() fails.

    This patch also fixes the call to do_add_mount() for AFS to propagate the mount
    flags from the parent vfsmount.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
    as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call
    d_manage() directly. d_manage() needs a parameter to indicate that it is in
    RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
    -ECHILD instead).

    autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
    accesses it and otherwise request dropping back to ref-walk mode.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Remove a further kludge from __do_follow_link() as it's no longer required with
    the automount code.

    This reverts the non-helper-function parts of
    051d381259eb57d6074d02a6ba6e90e744f1a29f, which breaks union mounts.

    Reported-by: vaurora@redhat.com
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Remove the automount through follow_link() kludge code from pathwalk in favour
    of using d_automount().

    Signed-off-by: David Howells
    Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount
    point directories. This can be used by fstatat() users to permit the
    gathering of attributes on an automount point and also prevent
    mass-automounting of a directory of automount points by ls.

    Signed-off-by: David Howells
    Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
    sleep when it tries to transit away from one of that filesystem's directories
    during a pathwalk. The operation is keyed off a new dentry flag
    (DCACHE_MANAGE_TRANSIT).

    The filesystem is allowed to be selective about which processes it holds and
    which it permits to continue on or prohibits from transiting from each flagged
    directory. This will allow autofs to hold up client processes whilst letting
    its userspace daemon through to maintain the directory or the stuff behind it
    or mounted upon it.

    The ->d_manage() dentry operation:

    int (*d_manage)(struct path *path, bool mounting_here);

    takes a pointer to the directory about to be transited away from and a flag
    indicating whether the transit is undertaken by do_add_mount() or
    do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

    It should return 0 if successful and to let the process continue on its way;
    -EISDIR to prohibit the caller from skipping to overmounted filesystems or
    automounting, and to use this directory; or some other error code to return to
    the user.

    ->d_manage() is called with namespace_sem writelocked if mounting_here is true
    and no other locks held, so it may sleep. However, if mounting_here is true,
    it may not initiate or wait for a mount or unmount upon the parameter
    directory, even if the act is actually performed by userspace.

    Within fs/namei.c, follow_managed() is extended to check with d_manage() first
    on each managed directory, before transiting away from it or attempting to
    automount upon it.

    follow_down() is renamed follow_down_one() and should only be used where the
    filesystem deliberately intends to avoid management steps (e.g. autofs).

    A new follow_down() is added that incorporates the loop done by all other
    callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
    and CIFS do use it, their use is removed by converting them to use
    d_automount()). The new follow_down() calls d_manage() as appropriate. It
    also takes an extra parameter to indicate if it is being called from mount code
    (with namespace_sem writelocked) which it passes to d_manage(). follow_down()
    ignores automount points so that it can be used to mount on them.

    __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
    DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
    sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have
    that determine whether to abort or not itself. That would allow the autofs
    daemon to continue on in rcu-walk mode.

    Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
    required as every tranist from that directory will cause d_manage() to be
    invoked. It can always be set again when necessary.

    ==========================
    WHAT THIS MEANS FOR AUTOFS
    ==========================

    Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
    trigger the automounting of indirect mounts, and both of these can be called
    with i_mutex held.

    autofs knows that the i_mutex will be held by the caller in lookup(), and so
    can drop it before invoking the daemon - but this isn't so for d_revalidate(),
    since the lock is only held on _some_ of the code paths that call it. This
    means that autofs can't risk dropping i_mutex from its d_revalidate() function
    before it calls the daemon.

    The bug could manifest itself as, for example, a process that's trying to
    validate an automount dentry that gets made to wait because that dentry is
    expired and needs cleaning up:

    mkdir S ffffffff8014e05a 0 32580 24956
    Call Trace:
    [] :autofs4:autofs4_wait+0x674/0x897
    [] avc_has_perm+0x46/0x58
    [] autoremove_wake_function+0x0/0x2e
    [] :autofs4:autofs4_expire_wait+0x41/0x6b
    [] :autofs4:autofs4_revalidate+0x91/0x149
    [] __lookup_hash+0xa0/0x12f
    [] lookup_create+0x46/0x80
    [] sys_mkdirat+0x56/0xe4

    versus the automount daemon which wants to remove that dentry, but can't
    because the normal process is holding the i_mutex lock:

    automount D ffffffff8014e05a 0 32581 1 32561
    Call Trace:
    [] __mutex_lock_slowpath+0x60/0x9b
    [] do_path_lookup+0x2ca/0x2f1
    [] .text.lock.mutex+0xf/0x14
    [] do_rmdir+0x77/0xde
    [] tracesys+0x71/0xe0
    [] tracesys+0xd5/0xe0

    which means that the system is deadlocked.

    This patch allows autofs to hold up normal processes whilst the daemon goes
    ahead and does things to the dentry tree behind the automouter point without
    risking a deadlock as almost no locks are held in d_manage() and none in
    d_automount().

    Signed-off-by: David Howells
    Was-Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • Add a dentry op (d_automount) to handle automounting directories rather than
    abusing the follow_link() inode operation. The operation is keyed off a new
    dentry flag (DCACHE_NEED_AUTOMOUNT).

    This also makes it easier to add an AT_ flag to suppress terminal segment
    automount during pathwalk and removes the need for the kludge code in the
    pathwalk algorithm to handle directories with follow_link() semantics.

    The ->d_automount() dentry operation:

    struct vfsmount *(*d_automount)(struct path *mountpoint);

    takes a pointer to the directory to be mounted upon, which is expected to
    provide sufficient data to determine what should be mounted. If successful, it
    should return the vfsmount struct it creates (which it should also have added
    to the namespace using do_add_mount() or similar). If there's a collision with
    another automount attempt, NULL should be returned. If the directory specified
    by the parameter should be used directly rather than being mounted upon,
    -EISDIR should be returned. In any other case, an error code should be
    returned.

    The ->d_automount() operation is called with no locks held and may sleep. At
    this point the pathwalk algorithm will be in ref-walk mode.

    Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
    added to handle mountpoints. It will return -EREMOTE if the automount flag was
    set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
    symlinks or mountpoints, -EISDIR if the walk point should be used without
    mounting and 0 if successful. The path will be updated to point to the mounted
    filesystem if a successful automount took place.

    __follow_mount() is replaced by follow_managed() which is more generic
    (especially with the patch that adds ->d_manage()). This handles transits from
    directories during pathwalk, including automounting and skipping over
    mountpoints (and holding processes with the next patch).

    __follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
    automount point with nothing mounted on it.

    follow_dotdot*() does not handle automounts as you don't want to trigger them
    whilst following "..".

    I've also extracted the mount/don't-mount logic from autofs4 and included it
    here. It makes the mount go ahead anyway if someone calls open() or creat(),
    tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
    or sticks a '/' on the end of the pathname. If they do a stat(), however,
    they'll only trigger the automount if they didn't also say O_NOFOLLOW.

    I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
    inodes as automount points. This flag is automatically propagated to the
    dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could
    save AFS a private flag bit apiece, but is not strictly necessary. It would be
    preferable to do the propagation in d_set_d_op(), but that doesn't normally
    have access to the inode.

    [AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu()
    succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after
    that, rather than just returning with ungrabbed *path]

    Signed-off-by: David Howells
    Was-Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     
  • do_lookup() has a path leading from LOOKUP_RCU case to non-RCU
    crossing of mountpoints, which breaks things badly. If we
    hit need_revalidate: and do nothing in there, we need to come
    back into LOOKUP_RCU half of things, not to done: in non-RCU
    one.

    Signed-off-by: Al Viro

    Al Viro
     

15 Jan, 2011

1 commit


14 Jan, 2011

5 commits

  • J. R. Okajima noticed that ->put_link is being attempted on the
    wrong inode, and suggested the way to fix it. I changed it a bit
    according to Al's suggestion to keep an explicit link path around.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • …t/npiggin/linux-npiggin

    * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
    fs: fix do_last error case when need_reval_dot
    nfs: add missing rcu-walk check
    fs: hlist UP debug fixup
    fs: fix dropping of rcu-walk from force_reval_path
    fs: force_reval_path drop rcu-walk before d_invalidate
    fs: small rcu-walk documentation fixes

    Fixed up trivial conflicts in Documentation/filesystems/porting

    Linus Torvalds
     
  • When open(2) without O_DIRECTORY opens an existing dir, it should return
    EISDIR. In do_last(), the variable 'error' is initialized EISDIR, but it
    is changed by d_revalidate() which returns any positive to represent
    'the target dir is valid.'

    Should we keep and return the initialized 'error' in this case.

    Signed-off-by: Nick Piggin

    J. R. Okajima
     
  • As J. R. Okajima noted, force_reval_path passes in the same dentry to
    d_revalidate as the one in the nameidata structure (other callers pass in a
    child), so the locking breaks. This can oops with a chrooted nfs mount, for
    example. Similarly there can be other problems with revalidating a dentry
    which is already in nameidata of the path walk.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • d_revalidate can return in rcu-walk mode even when it returns 0. We can't just
    call any old dcache function on rcu-walk dentry (the dentry is unstable, so
    even through d_lock can safely be taken, the result may no longer be what we
    expect -- careful re-checks would be required). So just drop rcu in this case.

    (I missed this conversion when switching to the rcu-walk convention that Linus
    suggested)

    Signed-off-by: Nick Piggin

    Nick Piggin
     

13 Jan, 2011

1 commit

  • When a file is opened with O_TRUNC, the truncate processing is handled
    by handle_truncate(). This function however doesn't receive any info
    about the newly instantiated filp, and therefore can't pass that info
    along so that the setattr can use it.

    This makes NFSv4 misbehave. The client does an open and gets a valid
    stateid, and then doesn't use that stateid on the subsequent truncate.
    It uses the zero-stateid instead. Most servers ignore this fact and
    just do the truncate anyway, but some don't like it (notably, RHEL4).

    It seems more correct that since we have a fully instantiated file at
    the time that handle_truncate is called, that we pass that along so
    that the truncate operation can properly use it.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     

10 Jan, 2011

1 commit

  • Fix new kernel-doc notation warnings in fs/namei.c and spell
    ECHILD correctly.

    Warning(fs/namei.c:218): No description found for parameter 'flags'
    Warning(fs/namei.c:425): Excess function parameter 'Returns' description in 'nameidata_drop_rcu'
    Warning(fs/namei.c:478): Excess function parameter 'Returns' description in 'nameidata_dentry_drop_rcu'
    Warning(fs/namei.c:540): Excess function parameter 'Returns' description in 'nameidata_drop_rcu_last'

    Signed-off-by: Randy Dunlap
    Cc: Nick Piggin
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

07 Jan, 2011

9 commits

  • The problem that this patch aims to fix is vfsmount refcounting scalability.
    We need to take a reference on the vfsmount for every successful path lookup,
    which often go to the same mount point.

    The fundamental difficulty is that a "simple" reference count can never be made
    scalable, because any time a reference is dropped, we must check whether that
    was the last reference. To do that requires communication with all other CPUs
    that may have taken a reference count.

    We can make refcounts more scalable in a couple of ways, involving keeping
    distributed counters, and checking for the global-zero condition less
    frequently.

    - check the global sum once every interval (this will delay zero detection
    for some interval, so it's probably a showstopper for vfsmounts).

    - keep a local count and only taking the global sum when local reaches 0 (this
    is difficult for vfsmounts, because we can't hold preempt off for the life of
    a reference, so a counter would need to be per-thread or tied strongly to a
    particular CPU which requires more locking).

    - keep a local difference of increments and decrements, which allows us to sum
    the total difference and hence find the refcount when summing all CPUs. Then,
    keep a single integer "long" refcount for slow and long lasting references,
    and only take the global sum of local counters when the long refcount is 0.

    This last scheme is what I implemented here. Attached mounts and process root
    and working directory references are "long" references, and everything else is
    a short reference.

    This allows scalable vfsmount references during path walking over mounted
    subtrees and unattached (lazy umounted) mounts with processes still running
    in them.

    This results in one fewer atomic op in the fastpath: mntget is now just a
    per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
    and non-atomic decrement in the common case. However code is otherwise bigger
    and heavier, so single threaded performance is basically a wash.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Require filesystems be aware of .d_revalidate being called in rcu-walk
    mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
    -ECHILD from all implementations.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Use a seqlock in the fs_struct to enable us to take an atomic copy of the
    complete cwd and root paths. Use this in the RCU lookup path to avoid a
    thread-shared spinlock in RCU lookup operations.

    Multi-threaded apps may now perform path lookups with scalability matching
    multi-process apps. Operations such as stat(2) become very scalable for
    multi-threaded workload.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Perform common cases of path lookups without any stores or locking in the
    ancestor dentry elements. This is called rcu-walk, as opposed to the current
    algorithm which is a refcount based walk, or ref-walk.

    This results in far fewer atomic operations on every path element,
    significantly improving path lookup performance. It also avoids cacheline
    bouncing on common dentries, significantly improving scalability.

    The overall design is like this:
    * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
    * Take the RCU lock for the entire path walk, starting with the acquiring
    of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
    not required for dentry persistence.
    * synchronize_rcu is called when unregistering a filesystem, so we can
    access d_ops and i_ops during rcu-walk.
    * Similarly take the vfsmount lock for the entire path walk. So now mnt
    refcounts are not required for persistence. Also we are free to perform mount
    lookups, and to assume dentry mount points and mount roots are stable up and
    down the path.
    * Have a per-dentry seqlock to protect the dentry name, parent, and inode,
    so we can load this tuple atomically, and also check whether any of its
    members have changed.
    * Dentry lookups (based on parent, candidate string tuple) recheck the parent
    sequence after the child is found in case anything changed in the parent
    during the path walk.
    * inode is also RCU protected so we can load d_inode and use the inode for
    limited things.
    * i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
    * i_op can be loaded.

    When we reach the destination dentry, we lock it, recheck lookup sequence,
    and increment its refcount and mountpoint refcount. RCU and vfsmount locks
    are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
    not match, we can not drop rcu-walk gracefully at the current point in the
    lokup, so instead return -ECHILD (for want of a better errno). This signals the
    path walking code to re-do the entire lookup with a ref-walk.

    Aside from the final dentry, there are other situations that may be encounted
    where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
    a reference on the last good dentry) and continue with a ref-walk. Again, if
    we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
    using ref-walk. But it is very important that we can continue with ref-walk
    for most cases, particularly to avoid the overhead of double lookups, and to
    gain the scalability advantages on common path elements (like cwd and root).

    The cases where rcu-walk cannot continue are:
    * NULL dentry (ie. any uncached path element)
    * parent with d_inode->i_op->permission or ACLs
    * dentries with d_revalidate
    * Following links

    In future patches, permission checks and d_revalidate become rcu-walk aware. It
    may be possible eventually to make following links rcu-walk aware.

    Uncached path elements will always require dropping to ref-walk mode, at the
    very least because i_mutex needs to be grabbed, and objects allocated.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dcache_lock no longer protects anything. remove it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
    0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
    we start protecting many other dentry members with d_lock.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_hash so it may be called from lock-free RCU lookups. See similar
    patch for d_compare for details.

    For in-tree filesystems, this is just a mechanical change.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

08 Dec, 2010

1 commit


29 Oct, 2010

1 commit

  • nameidata_to_filp() drops nd->path or transfers it to opened
    file. In the former case it's a Bad Idea(tm) to do mnt_drop_write()
    on nd->path.mnt, since we might race with umount and vfsmount in
    question might be gone already.

    Fix: don't drop it, then... IOW, have nameidata_to_filp() grab nd->path
    in case it transfers it to file and do path_drop() in callers. After
    they are through with accessing nd->path...

    Reported-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Al Viro
     

26 Oct, 2010

2 commits


18 Aug, 2010

4 commits

  • fs: brlock vfsmount_lock

    Use a brlock for the vfsmount lock. It must be taken for write whenever
    modifying the mount hash or associated fields, and may be taken for read when
    performing mount hash lookups.

    A new lock is added for the mnt-id allocator, so it doesn't need to take
    the heavy vfsmount write-lock.

    The number of atomics should remain the same for fastpath rlock cases, though
    code would be slightly slower due to per-cpu access. Scalability is not not be
    much improved in common cases yet, due to other locks (ie. dcache_lock) getting
    in the way. However path lookups crossing mountpoints should be one case where
    scalability is improved (currently requiring the global lock).

    The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
    Altix system (high latency to remote nodes), a simple umount microbenchmark
    (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it
    took 6.8s, afterwards took 7.1s, about 5% slower.

    Cc: Al Viro
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • fs: remove extra lookup in __lookup_hash

    Optimize lookup for create operations, where no dentry should often be
    common-case. In cases where it is not, such as unlink, the added overhead
    is much smaller than the removed.

    Also, move comments about __d_lookup racyness to the __d_lookup call site.
    d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
    vein, add kerneldoc comments to __d_lookup and clean up some of the comments:

    - We are interested in how the RCU lookup works here, particularly with
    renames. Make that explicit, and point to the document where it is explained
    in more detail.
    - RCU is pretty standard now, and macros make implementations pretty mindless.
    If we want to know about RCU barrier details, we look in RCU code.
    - Delete some boring legacy comments because we don't care much about how the
    code used to work, more about the interesting parts of how it works now. So
    comments about lazy LRU may be interesting, but would better be done in the
    LRU or refcount management code.

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • fs: dentry allocation consolidation

    There are 2 duplicate copies of code in dentry allocation in path lookup.
    Consolidate them into a single function.

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • fs: fix do_lookup false negative

    In do_lookup, if we initially find no dentry, we take the directory i_mutex and
    re-check the lookup. If we find a dentry there, then we revalidate it if
    needed. However if that revalidate asks for the dentry to be invalidated, we
    return -ENOENT from do_lookup. What should happen instead is an attempt to
    allocate and lookup a new dentry.

    This is probably not noticed because it is rare. It is only reached if a
    concurrent create races in first (in which case, the dentry probably won't be
    invalidated anyway), or if the racy __d_lookup has failed due to a
    false-negative (which is very rare).

    Fix this by removing code and have it use the normal reval path.

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     

11 Aug, 2010

2 commits

  • Add three helpers that retrieve a refcounted copy of the root and cwd
    from the supplied fs_struct.

    get_fs_root()
    get_fs_pwd()
    get_fs_root_and_pwd()

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • * 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
    fanotify: use both marks when possible
    fsnotify: pass both the vfsmount mark and inode mark
    fsnotify: walk the inode and vfsmount lists simultaneously
    fsnotify: rework ignored mark flushing
    fsnotify: remove global fsnotify groups lists
    fsnotify: remove group->mask
    fsnotify: remove the global masks
    fsnotify: cleanup should_send_event
    fanotify: use the mark in handler functions
    audit: use the mark in handler functions
    dnotify: use the mark in handler functions
    inotify: use the mark in handler functions
    fsnotify: send fsnotify_mark to groups in event handling functions
    fsnotify: Exchange list heads instead of moving elements
    fsnotify: srcu to protect read side of inode and vfsmount locks
    fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
    fsnotify: use _rcu functions for mark list traversal
    fsnotify: place marks on object in order of group memory address
    vfs/fsnotify: fsnotify_close can delay the final work in fput
    fsnotify: store struct file not struct path
    ...

    Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.

    Linus Torvalds
     

02 Aug, 2010

2 commits

  • SELinux needs to pass the MAY_ACCESS flag so it can handle auditting
    correctly. Presently the masking of MAY_* flags is done in the VFS. In
    order to allow LSMs to decide what flags they care about and what flags
    they don't just pass them all and the each LSM mask off what they don't
    need. This patch should contain no functional changes to either the VFS or
    any LSM.

    Signed-off-by: Eric Paris
    Acked-by: Stephen D. Smalley
    Signed-off-by: James Morris

    Eric Paris
     
  • When commit be6d3e56a6b9b3a4ee44a0685e39e595073c6f0d "introduce new LSM hooks
    where vfsmount is available." was proposed, regarding security_path_truncate(),
    only "struct file *" argument (which AppArmor wanted to use) was removed.
    But length and time_attrs arguments are not used by TOMOYO nor AppArmor.
    Thus, let's remove these arguments.

    Signed-off-by: Tetsuo Handa
    Acked-by: Nick Piggin
    Signed-off-by: James Morris

    Tetsuo Handa