14 Jul, 2012

4 commits


06 May, 2012

1 commit

  • After we moved inode_sync_wait() from end_writeback() it doesn't make sense
    to call the function end_writeback() anymore. Rename it to clear_inode()
    which well says what the function really does - set I_CLEAR flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara
     

22 Mar, 2012

2 commits

  • Pull vfs pile 1 from Al Viro:
    "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
    yet."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
    ext4: initialization of ext4_li_mtx needs to be done earlier
    debugfs-related mode_t whack-a-mole
    hfsplus: add an ioctl to bless files
    hfsplus: change finder_info to u32
    hfsplus: initialise userflags
    qnx4: new helper - try_extent()
    qnx4: get rid of qnx4_bread/qnx4_getblk
    take removal of PF_FORKNOEXEC to flush_old_exec()
    trim includes in inode.c
    um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
    um: embed ->stub_pages[] into mmu_context
    gadgetfs: list_for_each_safe() misuse
    ocfs2: fix leaks on failure exits in module_init
    ecryptfs: make register_filesystem() the last potential failure exit
    ntfs: forgets to unregister sysctls on register_filesystem() failure
    logfs: missing cleanup on register_filesystem() failure
    jfs: mising cleanup on register_filesystem() failure
    make configfs_pin_fs() return root dentry on success
    configfs: configfs_create_dir() has parent dentry in dentry->d_parent
    configfs: sanitize configfs_create()
    ...

    Linus Torvalds
     
  • Pull kmap_atomic cleanup from Cong Wang.

    It's been in -next for a long time, and it gets rid of the (no longer
    used) second argument to k[un]map_atomic().

    Fix up a few trivial conflicts in various drivers, and do an "evil
    merge" to catch some new uses that have come in since Cong's tree.

    * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
    feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
    highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
    drbd: remove the second argument of k[un]map_atomic()
    zcache: remove the second argument of k[un]map_atomic()
    gma500: remove the second argument of k[un]map_atomic()
    dm: remove the second argument of k[un]map_atomic()
    tomoyo: remove the second argument of k[un]map_atomic()
    sunrpc: remove the second argument of k[un]map_atomic()
    rds: remove the second argument of k[un]map_atomic()
    net: remove the second argument of k[un]map_atomic()
    mm: remove the second argument of k[un]map_atomic()
    lib: remove the second argument of k[un]map_atomic()
    power: remove the second argument of k[un]map_atomic()
    kdb: remove the second argument of k[un]map_atomic()
    udf: remove the second argument of k[un]map_atomic()
    ubifs: remove the second argument of k[un]map_atomic()
    squashfs: remove the second argument of k[un]map_atomic()
    reiserfs: remove the second argument of k[un]map_atomic()
    ocfs2: remove the second argument of k[un]map_atomic()
    ntfs: remove the second argument of k[un]map_atomic()
    ...

    Linus Torvalds
     

21 Mar, 2012

2 commits


20 Mar, 2012

1 commit


17 Mar, 2012

2 commits

  • When writing files to afs I sometimes hit a BUG:

    kernel BUG at fs/afs/rxrpc.c:179!

    With a backtrace of:

    afs_free_call
    afs_make_call
    afs_fs_store_data
    afs_vnode_store_data
    afs_write_back_from_locked_page
    afs_writepages_region
    afs_writepages

    The cause is:

    ASSERT(skb_queue_empty(&call->rx_queue));

    Looking at a tcpdump of the session the abort happens because we
    are exceeding our disk quota:

    rx abort fs reply store-data error diskquota exceeded (32)

    So the abort error is valid. We hit the BUG because we haven't
    freed all the resources for the call.

    By freeing any skbs in call->rx_queue before calling afs_free_call
    we avoid hitting leaking memory and avoid hitting the BUG.

    Signed-off-by: Anton Blanchard
    Signed-off-by: David Howells
    Cc:
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • A read of a large file on an afs mount failed:

    # cat junk.file > /dev/null
    cat: junk.file: Bad message

    Looking at the trace, call->offset wrapped since it is only an
    unsigned short. In afs_extract_data:

    _enter("{%u},{%zu},%d,,%zu", call->offset, len, last, count);
    ...

    if (call->offset < count) {
    if (last) {
    _leave(" = -EBADMSG [%d < %zu]", call->offset, count);
    return -EBADMSG;
    }

    Which matches the trace:

    [cat ] ==> afs_extract_data({65132},{524},1,,65536)
    [cat ] < 65536]

    call->offset went from 65132 to 0. Fix this by making call->offset an
    unsigned int.

    Signed-off-by: Anton Blanchard
    Signed-off-by: David Howells
    Cc:
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     

04 Jan, 2012

4 commits


02 Nov, 2011

1 commit


21 Jul, 2011

2 commits

  • Fix silly characters in a comment in AFS code (some weird characters replaced
    the word 'flag' some point way back).

    Reported-by: viro@ZenIV.linux.org.uk
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push down taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems can drop taking the i_mutex altogether it seems, like ext3 and
    ocfs2. For correctness sake I just pushed everything down in all cases to make
    sure that we keep the current behavior the same for everybody, and then each
    individual fs maintainer can make up their mind about what to do from there.
    Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     

20 Jul, 2011

3 commits


16 Jun, 2011

3 commits


13 Jun, 2011

1 commit

  • * set ->s_fs_info in set() callback passed to sget()
    * allocate the thing and set it up enough for afs_test_super() before
    making it visible
    * have it freed in ->kill_sb() (current tree simply leaks it)
    * have ->put_super() leave ->s_fs_info->volume alone; it's too early for
    dropping it; do that from ->kill_sb() after having called kill_anon_super().

    Signed-off-by: Al Viro

    Al Viro
     

28 May, 2011

1 commit


26 May, 2011

2 commits


31 Mar, 2011

1 commit


26 Feb, 2011

1 commit

  • I'm seeing the following oops when testing afs:

    Unable to handle kernel paging request for data at address 0x00000008
    ...
    NIP [c0000000003393b0] .afs_unlink_writeback+0x38/0xc0
    LR [c00000000033987c] .afs_put_writeback+0x98/0xec
    Call Trace:
    [c00000000345f600] [c00000000033987c] .afs_put_writeback+0x98/0xec
    [c00000000345f690] [c00000000033ae80] .afs_write_begin+0x6a4/0x75c
    [c00000000345f790] [c00000000012b77c] .generic_file_buffered_write+0x148/0x320
    [c00000000345f8d0] [c00000000012e1b8] .__generic_file_aio_write+0x37c/0x3e4
    [c00000000345f9d0] [c00000000012e2a8] .generic_file_aio_write+0x88/0xfc
    [c00000000345fa90] [c0000000003390a8] .afs_file_write+0x10c/0x178
    [c00000000345fb40] [c000000000188788] .do_sync_write+0xc4/0x128
    [c00000000345fcc0] [c000000000189658] .vfs_write+0xe8/0x1d8
    [c00000000345fd70] [c000000000189884] .SyS_write+0x68/0xb0
    [c00000000345fe30] [c000000000008564] syscall_exit+0x0/0x40

    afs_write_begin hits an error and calls afs_unlink_writeback. In there
    we do list_del_init on an uninitialised list.

    The patch below initialises ->link when creating the afs_writeback struct.

    Signed-off-by: Anton Blanchard
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     

16 Jan, 2011

3 commits

  • Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
    added rather than calling do_add_mount() itself. follow_automount() will then
    do the addition.

    This slightly complicates things as ->d_automount() normally wants to add the
    new vfsmount to an expiration list and start an expiration timer. The problem
    with that is that the vfsmount will be deleted if it has a refcount of 1 and
    the timer will not repeat if the expiration list is empty.

    To this end, we require the vfsmount to be returned from d_automount() with a
    refcount of (at least) 2. One of these refs will be dropped unconditionally.
    In addition, follow_automount() must get a 3rd ref around the call to
    do_add_mount() lest it eat a ref and return an error, leaving the mount we
    have open to being expired as we would otherwise have only 1 ref on it.

    d_automount() should also add the the vfsmount to the expiration list (by
    calling mnt_set_expiry()) and start the expiration timer before returning, if
    this mechanism is to be used. The vfsmount will be unlinked from the
    expiration list by follow_automount() if do_add_mount() fails.

    This patch also fixes the call to do_add_mount() for AFS to propagate the mount
    flags from the parent vfsmount.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Make AFS use the new d_automount() dentry operation rather than abusing
    follow_link() on directories.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
    sleep when it tries to transit away from one of that filesystem's directories
    during a pathwalk. The operation is keyed off a new dentry flag
    (DCACHE_MANAGE_TRANSIT).

    The filesystem is allowed to be selective about which processes it holds and
    which it permits to continue on or prohibits from transiting from each flagged
    directory. This will allow autofs to hold up client processes whilst letting
    its userspace daemon through to maintain the directory or the stuff behind it
    or mounted upon it.

    The ->d_manage() dentry operation:

    int (*d_manage)(struct path *path, bool mounting_here);

    takes a pointer to the directory about to be transited away from and a flag
    indicating whether the transit is undertaken by do_add_mount() or
    do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

    It should return 0 if successful and to let the process continue on its way;
    -EISDIR to prohibit the caller from skipping to overmounted filesystems or
    automounting, and to use this directory; or some other error code to return to
    the user.

    ->d_manage() is called with namespace_sem writelocked if mounting_here is true
    and no other locks held, so it may sleep. However, if mounting_here is true,
    it may not initiate or wait for a mount or unmount upon the parameter
    directory, even if the act is actually performed by userspace.

    Within fs/namei.c, follow_managed() is extended to check with d_manage() first
    on each managed directory, before transiting away from it or attempting to
    automount upon it.

    follow_down() is renamed follow_down_one() and should only be used where the
    filesystem deliberately intends to avoid management steps (e.g. autofs).

    A new follow_down() is added that incorporates the loop done by all other
    callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
    and CIFS do use it, their use is removed by converting them to use
    d_automount()). The new follow_down() calls d_manage() as appropriate. It
    also takes an extra parameter to indicate if it is being called from mount code
    (with namespace_sem writelocked) which it passes to d_manage(). follow_down()
    ignores automount points so that it can be used to mount on them.

    __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
    DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
    sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have
    that determine whether to abort or not itself. That would allow the autofs
    daemon to continue on in rcu-walk mode.

    Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
    required as every tranist from that directory will cause d_manage() to be
    invoked. It can always be set again when necessary.

    ==========================
    WHAT THIS MEANS FOR AUTOFS
    ==========================

    Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
    trigger the automounting of indirect mounts, and both of these can be called
    with i_mutex held.

    autofs knows that the i_mutex will be held by the caller in lookup(), and so
    can drop it before invoking the daemon - but this isn't so for d_revalidate(),
    since the lock is only held on _some_ of the code paths that call it. This
    means that autofs can't risk dropping i_mutex from its d_revalidate() function
    before it calls the daemon.

    The bug could manifest itself as, for example, a process that's trying to
    validate an automount dentry that gets made to wait because that dentry is
    expired and needs cleaning up:

    mkdir S ffffffff8014e05a 0 32580 24956
    Call Trace:
    [] :autofs4:autofs4_wait+0x674/0x897
    [] avc_has_perm+0x46/0x58
    [] autoremove_wake_function+0x0/0x2e
    [] :autofs4:autofs4_expire_wait+0x41/0x6b
    [] :autofs4:autofs4_revalidate+0x91/0x149
    [] __lookup_hash+0xa0/0x12f
    [] lookup_create+0x46/0x80
    [] sys_mkdirat+0x56/0xe4

    versus the automount daemon which wants to remove that dentry, but can't
    because the normal process is holding the i_mutex lock:

    automount D ffffffff8014e05a 0 32581 1 32561
    Call Trace:
    [] __mutex_lock_slowpath+0x60/0x9b
    [] do_path_lookup+0x2ca/0x2f1
    [] .text.lock.mutex+0xf/0x14
    [] do_rmdir+0x77/0xde
    [] tracesys+0x71/0xe0
    [] tracesys+0xd5/0xe0

    which means that the system is deadlocked.

    This patch allows autofs to hold up normal processes whilst the daemon goes
    ahead and does things to the dentry tree behind the automouter point without
    risking a deadlock as almost no locks are held in d_manage() and none in
    d_automount().

    Signed-off-by: David Howells
    Was-Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     

15 Jan, 2011

1 commit

  • flush_scheduled_work() is going away. afs needs to make sure all the
    works it has queued have finished before being unloaded and there can
    be arbitrary number of pending works. Add afs_wq and use it as the
    flush domain instead of the system workqueue.

    Also, convert cancel_delayed_work() + flush_scheduled_work() to
    cancel_delayed_work_sync() in afs_mntpt_kill_timer().

    Signed-off-by: Tejun Heo
    Signed-off-by: David Howells
    Cc: linux-afs@lists.infradead.org
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

13 Jan, 2011

1 commit


07 Jan, 2011

4 commits

  • Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Require filesystems be aware of .d_revalidate being called in rcu-walk
    mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
    -ECHILD from all implementations.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downsides of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin