16 Apr, 2015

2 commits


18 Feb, 2015

2 commits


20 Nov, 2014

1 commit


05 Jun, 2014

2 commits


22 Nov, 2013

1 commit

  • A race window in configfs, it starts from one dentry is UNHASHED and end
    before configfs_d_iput is called. In this window, if a lookup happen,
    since the original dentry was UNHASHED, so a new dentry will be
    allocated, and then in configfs_attach_attr(), sd->s_dentry will be
    updated to the new dentry. Then in configfs_d_iput(),
    BUG_ON(sd->s_dentry != dentry) will be triggered and system panic.

    sys_open: sys_close:
    ... fput
    dput
    dentry_kill
    __d_drop dentry still point
    to this dentry.

    lookup_real
    configfs_lookup
    configfs_attach_attr---> update sd->s_dentry
    to new allocated dentry here.

    d_kill
    configfs_d_iput s_dentry != dentry)
    triggered here.

    To fix it, change configfs_d_iput to not update sd->s_dentry if
    sd->s_count > 2, that means there are another dentry is using the sd
    beside the one that is going to be put. Use configfs_dirent_lock in
    configfs_attach_attr to sync with configfs_d_iput.

    With the following steps, you can reproduce the bug.

    1. enable ocfs2, this will mount configfs at /sys/kernel/config and
    fill configure in it.

    2. run the following script.
    while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
    while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &

    Signed-off-by: Junxiao Bi
    Cc: Joel Becker
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     

16 Nov, 2013

1 commit


15 Jul, 2013

1 commit

  • Pull more vfs stuff from Al Viro:
    "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
    making simple_lookup() usable for filesystems with non-NULL s_d_op,
    which allows us to get rid of quite a bit of ugliness"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    sunrpc: now we can just set ->s_d_op
    cgroup: we can use simple_lookup() now
    efivarfs: we can use simple_lookup() now
    make simple_lookup() usable for filesystems that set ->s_d_op
    configfs: don't open-code d_alloc_name()
    __rpc_lookup_create_exclusive: pass string instead of qstr
    rpc_create_*_dir: don't bother with qstr
    llist: llist_add() can use llist_add_batch()
    llist: fix/simplify llist_add() and llist_add_batch()
    fput: turn "list_head delayed_fput_list" into llist_head
    fs/file_table.c:fput(): add comment
    Safer ABI for O_TMPFILE

    Linus Torvalds
     

14 Jul, 2013

1 commit


10 Jul, 2013

1 commit

  • Pull third set of VFS updates from Al Viro:
    "Misc stuff all over the place. There will be one more pile in a
    couple of days"

    This is an "evil merge" that also uses the new d_count helper in
    fs/configfs/dir.c, missed by commit 84d08fa888e7 ("helper for reading
    ->d_count")

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    ncpfs: fix error return code in ncp_parse_options()
    locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock
    seq_file: add seq_list_*_percpu helpers
    f2fs: fix readdir incorrectness
    mode_t whack-a-mole...
    lustre: kill the pointless wrapper
    helper for reading ->d_count

    Linus Torvalds
     

29 Jun, 2013

1 commit


27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


22 Feb, 2013

1 commit


18 Dec, 2012

1 commit


14 Jul, 2012

1 commit

  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     

21 Mar, 2012

4 commits


04 Jan, 2012

3 commits


28 May, 2011

1 commit


27 May, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (25 commits)
    cifs: remove unnecessary dentry_unhash on rmdir/rename_dir
    ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir
    exofs: remove unnecessary dentry_unhash on rmdir/rename_dir
    nfs: remove unnecessary dentry_unhash on rmdir/rename_dir
    ext2: remove unnecessary dentry_unhash on rmdir/rename_dir
    ext3: remove unnecessary dentry_unhash on rmdir/rename_dir
    ext4: remove unnecessary dentry_unhash on rmdir/rename_dir
    btrfs: remove unnecessary dentry_unhash in rmdir/rename_dir
    ceph: remove unnecessary dentry_unhash calls
    vfs: clean up vfs_rename_other
    vfs: clean up vfs_rename_dir
    vfs: clean up vfs_rmdir
    vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems
    libfs: drop unneeded dentry_unhash
    vfs: update dentry_unhash() comment
    vfs: push dentry_unhash on rename_dir into file systems
    vfs: push dentry_unhash on rmdir into file systems
    vfs: remove dget() from dentry_unhash()
    vfs: dentry_unhash immediately prior to rmdir
    vfs: Block mmapped writes while the fs is frozen
    ...

    Linus Torvalds
     

26 May, 2011

1 commit

  • Only a few file systems need this. Start by pushing it down into each
    fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
    basis.

    This does not change behavior for any in-tree file systems.

    Acked-by: Christoph Hellwig
    Signed-off-by: Sage Weil
    Signed-off-by: Al Viro

    Sage Weil
     

18 May, 2011

2 commits

  • configfs_readdir() will use the existing inode numbers of inodes in the
    dcache, but it makes them up for attribute files that aren't currently
    instantiated. There is a race where a closing attribute file can be
    tearing down at the same time as configfs_readdir() is trying to get its
    inode number.

    We want to get the inode number of open attribute files, because they
    should match while instantiated. We can't lock down the transition
    where dentry->d_inode is set to NULL, so we just check for NULL there.
    We can, however, ensure that an inode we find isn't iput() in
    configfs_d_iput() until after we've accessed it.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • When configfs is faking mkdir() on its subsystem or default group
    objects, it starts by adding a negative dentry. It then tries to
    instantiate the group. If that should fail, it must clean up after
    itself.

    I was using d_delete() here, but configfs_attach_group() promises to
    return an empty dentry on error. d_delete() explodes with the entry
    dentry. Let's try d_drop() instead. The unhashing is what we want for
    our dentry.

    Signed-off-by: Joel Becker

    Joel Becker
     

31 Mar, 2011

1 commit


13 Jan, 2011

1 commit


07 Jan, 2011

4 commits

  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
    0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
    we start protecting many other dentry members with d_lock.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_delete from a dentry deletion notification to a dentry caching
    advise, more like ->drop_inode. Require it to be constant and idempotent,
    and not take d_lock. This is how all existing filesystems use the callback
    anyway.

    This makes fine grained dentry locking of dput and dentry lru scanning
    much simpler.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Switching d_op on a live dentry is racy in general, so avoid it. In this case
    it is a negative dentry, which is safer, but there are still concurrent ops
    which may be called on d_op in that case (eg. d_revalidate). So in general
    a filesystem may not do this. Fix configfs so as not to do this.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

15 May, 2010

1 commit

  • 1) i_flags simply doesn't work for mount/unlink race prevention;
    we may have many links to file and rm on one of those obviously
    shouldn't prevent bind on top of another later on. To fix it
    right way we need to mark _dentry_ as unsuitable for mounting
    upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and
    i_mutex on the inode in question. Set it (with dont_mount(dentry))
    in unlink/rmdir/etc., check (with cant_mount(dentry)) in places
    in namespace.c that used to check for S_DEAD. Setting S_DEAD
    is still needed in places where we used to set it (for directories
    getting killed), since we rely on it for readdir/rmdir race
    prevention.

    2) rename()/mount() protection has another bogosity - we unhash
    the target before we'd checked that it's not a mountpoint. Fixed.

    3) ancient bogosity in pivot_root() - we locked i_mutex on the
    right directory, but checked S_DEAD on the different (and wrong)
    one. Noticed and fixed.

    Signed-off-by: Al Viro

    Al Viro
     

01 May, 2009

2 commits

  • configfs_depend_item() recursively locks all inodes mutex from configfs root to
    the target item, which makes lockdep unhappy. The purpose of this recursive
    locking is to ensure that the item tree can be safely parsed and that the target
    item, if found, is not about to leave.

    This patch reworks configfs_depend_item() locking using configfs_dirent_lock.
    Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and
    protects tagging of items to be removed, this lock can be used instead of the
    inodes mutex lock chain.
    This needs that the check for dependents be done atomically with
    CONFIGFS_USET_DROPPING tagging.

    Now lockdep looks happy with configfs.

    [ Lifted the setting of s_type into configfs_new_dirent() to satisfy the
    atomic setting of CONFIGFS_USET_CREATING -- Joel ]

    Signed-off-by: Louis Rilling
    Signed-off-by: Joel Becker

    Louis Rilling
     
  • When attaching default groups (subdirs) of a new group (in mkdir() or
    in configfs_register()), configfs recursively takes inode's mutexes
    along the path from the parent of the new group to the default
    subdirs. This is needed to ensure that the VFS will not race with
    operations on these sub-dirs. This is safe for the following reasons:

    - the VFS allows one to lock first an inode and second one of its
    children (The lock subclasses for this pattern are respectively
    I_MUTEX_PARENT and I_MUTEX_CHILD);
    - from this rule any inode path can be recursively locked in
    descending order as long as it stays under a single mountpoint and
    does not follow symlinks.

    Unfortunately lockdep does not know (yet?) how to handle such
    recursion.

    I've tried to use Peter Zijlstra's lock_set_subclass() helper to
    upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know
    that we might recursively lock some of their descendant, but this
    usage does not seem to fit the purpose of lock_set_subclass() because
    it leads to several i_mutex locked with subclass I_MUTEX_PARENT by
    the same task.

    >From inside configfs it is not possible to serialize those recursive
    locking with a top-level one, because mkdir() and rmdir() are already
    called with inodes locked by the VFS. So using some
    mutex_lock_nest_lock() is not an option.

    I am proposing two solutions:
    1) one that wraps recursive mutex_lock()s with
    lockdep_off()/lockdep_on().
    2) (as suggested earlier by Peter Zijlstra) one that puts the
    i_mutexes recursively locked in different classes based on their
    depth from the top-level config_group created. This
    induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the
    nesting of configfs default groups whenever lockdep is activated
    but this limit looks reasonably high. Unfortunately, this also
    isolates VFS operations on configfs default groups from the others
    and thus lowers the chances to detect locking issues.

    Nobody likes solution 1), which I can understand.

    This patch implements solution 2). However lockdep is still not happy with
    configfs_depend_item(). Next patch reworks the locking of
    configfs_depend_item() and finally makes lockdep happy.

    [ Note: This hides a few locking interactions with the VFS from lockdep.
    That was my big concern, because we like lockdep's protection. However,
    the current state always dumps a spurious warning. The locking is
    correct, so I tell people to ignore the warning and that we'll keep
    our eyes on the locking to make sure it stays correct. With this patch,
    we eliminate the warning. We do lose some of the lockdep protections,
    but this only means that we still have to keep our eyes on the locking.
    We're going to do that anyway. -- Joel ]

    Signed-off-by: Louis Rilling
    Signed-off-by: Joel Becker

    Louis Rilling
     

28 Mar, 2009

1 commit