15 Sep, 2014

4 commits

  • Pull vfs fixes from Al Viro:
    "double iput() on failure exit in lustre, racy removal of spliced
    dentries from ->s_anon in __d_materialise_dentry() plus a bunch of
    assorted RCU pathwalk fixes"

    The RCU pathwalk fixes end up fixing a couple of cases where we
    incorrectly dropped out of RCU walking, due to incorrect initialization
    and testing of the sequence locks in some corner cases. Since dropping
    out of RCU walk mode forces the slow locked accesses, those corner cases
    slowed down quite dramatically.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    be careful with nd->inode in path_init() and follow_dotdot_rcu()
    don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu()
    fix bogus read_seqretry() checks introduced in b37199e
    move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon)
    [fix] lustre: d_make_root() does iput() on dentry allocation failure

    Linus Torvalds
     
  • The performance regression that Josef Bacik reported in the pathname
    lookup (see commit 99d263d4c5b2 "vfs: fix bad hashing of dentries") made
    me look at performance stability of the dcache code, just to verify that
    the problem was actually fixed. That turned up a few other problems in
    this area.

    There are a few cases where we exit RCU lookup mode and go to the slow
    serializing case when we shouldn't; Al has fixed those, and they'll come
    in with the next VFS pull.

    But my performance verification also shows that link_path_walk() turns
    out to have a very unfortunate 32-bit store of the length and hash of
    the name we look up, followed by a 64-bit read of the combined hash_len
    field. That screws up the processor's store-to-load forwarding, causing
    an unnecessary hiccup in this critical routine.

    It's caused by the ugly calling convention for the "hash_name()"
    function, and easily fixed by just making hash_name() fill in the whole
    'struct qstr' rather than passing it a pointer to just the hash value.

    With that, the profile for this function looks much smoother.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • in the former we simply check if dentry is still valid after picking
    its ->d_inode; in the latter we fetch ->d_inode in the same places
    where we fetch dentry and its ->d_seq, under the same checks.

    Cc: stable@vger.kernel.org # 2.6.38+
    Signed-off-by: Al Viro

    Al Viro
     
  • return the value instead, and have path_init() do the assignment. Broken by
    "vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
    which was Cc-stable with 2.6.38+ as destination. This one should go where
    it went.

    To avoid returning a dummy value when root is already set (that would do
    no harm, actually, since the only caller that doesn't ignore the return value
    is guaranteed to have nd->root *not* set, but it's more obvious this way),
    lift the check into the callers. And do the same to set_root(), to keep them
    in sync.

    Cc: stable@vger.kernel.org # 2.6.38+
    Signed-off-by: Al Viro

    Al Viro
     
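The hash_name() calling-convention change described in the link_path_walk() commit above can be modeled in userspace. This is a minimal sketch, not the kernel code: `struct toy_qstr`, `toy_hash()`, `component_len()`, `hash_name_ptr()` and `hash_name_whole()` are hypothetical names, the hash is a placeholder, and the layout assumes a little-endian machine where the 32-bit hash sits in the low half of the 64-bit `hash_len` (as with `struct qstr` on x86).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct qstr: the 32-bit hash and 32-bit length
 * alias one 64-bit hash_len (little-endian layout assumed). */
struct toy_qstr {
    union {
        struct {
            uint32_t hash;
            uint32_t len;
        };
        uint64_t hash_len;
    };
    const char *name;
};

/* Placeholder byte-at-a-time hash; NOT the kernel's hash function. */
static uint32_t toy_hash(const char *s, uint32_t len)
{
    uint32_t h = 0;
    for (uint32_t i = 0; i < len; i++)
        h = (h + (unsigned char)s[i]) * 31u;
    return h;
}

/* Length of one path component: stop at '/' or the terminating NUL. */
static uint32_t component_len(const char *name)
{
    uint32_t len = 0;
    while (name[len] != '\0' && name[len] != '/')
        len++;
    return len;
}

/* Old-style convention: return the length and write the hash through a
 * pointer, so the caller performs two separate 32-bit stores. */
static uint32_t hash_name_ptr(const char *name, uint32_t *hashp)
{
    uint32_t len = component_len(name);
    *hashp = toy_hash(name, len);
    return len;
}

/* New-style convention: build the combined value and store it in one go. */
static void hash_name_whole(const char *name, struct toy_qstr *q)
{
    uint32_t len = component_len(name);
    q->name = name;
    q->hash_len = ((uint64_t)len << 32) | toy_hash(name, len);
}
```

With the pointer convention, a caller writes `q.len = hash_name_ptr(p, &q.hash);` and later reads `q.hash_len`: a 64-bit load spanning two recent 32-bit stores, which many processors cannot satisfy through store-to-load forwarding. Filling the whole structure lets the compiler emit a single 64-bit store that the later 64-bit load can forward from. Both conventions compute the same value; only the store pattern differs.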

14 Sep, 2014

3 commits

  • read_seqretry() returns true on mismatch, not on match...

    Cc: stable@vger.kernel.org # 3.15+
    Signed-off-by: Al Viro

    Al Viro
     
  • and lock the right list there

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     
  • Josef Bacik found a performance regression between 3.2 and 3.10 and
    narrowed it down to commit bfcfaa77bdf0 ("vfs: use 'unsigned long'
    accesses for dcache name comparison and hashing"). He reports:

    "The test case is essentially

    for (i = 0; i < 1000000; i++) {
        snprintf(name, sizeof(name), "a%d", i);
        mkdir(name, 0755);
    }

    On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
    dir/sec with 3.10. This is because we spend waaaaay more time in
    __d_lookup on 3.10 than in 3.2.

    The new hashing function for strings is suboptimal for <
    sizeof(unsigned long) string names (and hell even > sizeof(unsigned
    long) string names that I've tested). I broke out the old hashing
    function and the new one into a userspace helper to get real numbers
    and this is what I'm getting:

    Old hash table had 1000000 entries, 0 dupes, 0 max dupes
    New hash table had 12628 entries, 987372 dupes, 900 max dupes
    We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash

    My test does the hash, and then does the d_hash into an integer pointer
    array the same size as the dentry hash table on my system, and then
    just increments the value at the address we got to see how many
    entries we overlap with.

    As you can see, the old hash function ended up with each of the 1 million
    entries in its own bucket, whereas with the new one they are
    distributed among only ~12.5k buckets, which is why we're using so much
    more CPU in __d_lookup".

    The reason for this hash regression is two-fold:

    - On 64-bit architectures the down-mixing of the original 64-bit
    word-at-a-time hash into the final 32-bit hash value is very
    simplistic and suboptimal, and just adds the two 32-bit parts
    together.

    In particular, because there is no bit shuffling and the mixing
    boundary is also a byte boundary, similar character patterns in the
    low and high word easily end up just canceling each other out.

    - the old byte-at-a-time hash mixed each byte into the final hash as it
    hashed the path component name, resulting in the low bits of the hash
    generally being a good source of hash data. That is not true for the
    word-at-a-time case, and the hash data is distributed among all the
    bits.

    The fix is the same in both cases: do a better job of mixing the bits up
    and using as much of the hash data as possible. We already have the
    "hash_32|64()" functions to do that.

    Reported-by: Josef Bacik
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Chris Mason
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
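The read_seqretry() fix above hinges on the polarity of the retry check: it returns true when the sequence changed, meaning the data just read may be inconsistent and the reader must retry or fall back. Below is a minimal single-threaded model with hypothetical `toy_*` names. It uses C11 atomics with their default sequentially consistent ordering and is not the kernel's seqlock implementation, which relies on explicit barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy sequence counter: even = stable, odd = writer in progress. */
struct toy_seqcount {
    atomic_uint sequence;
};

/* Begin a read section: wait out any in-progress writer, return a snapshot. */
static unsigned toy_read_seqbegin(struct toy_seqcount *s)
{
    unsigned seq;
    while ((seq = atomic_load(&s->sequence)) & 1u)
        ;   /* writer active: spin (never happens in a single-threaded demo) */
    return seq;
}

/* Like read_seqretry(): returns true on MISMATCH, i.e. "retry needed". */
static bool toy_read_seqretry(struct toy_seqcount *s, unsigned start)
{
    return atomic_load(&s->sequence) != start;
}

static void toy_write_begin(struct toy_seqcount *s) { atomic_fetch_add(&s->sequence, 1); }
static void toy_write_end(struct toy_seqcount *s)   { atomic_fetch_add(&s->sequence, 1); }

/* Correct reader: repeat the read while the check reports a mismatch.
 * (Real concurrent code would also need READ_ONCE-style accesses.) */
static int read_value(struct toy_seqcount *s, const int *shared)
{
    unsigned seq;
    int v;
    do {
        seq = toy_read_seqbegin(s);
        v = *shared;
    } while (toy_read_seqretry(s, seq));
    return v;
}
```

In this model, a check written the wrong way round fails exactly when the snapshot is valid, so a lockless fast path that bails out on failure would abandon it on every uncontended read.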

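Both causes of the hash regression can be shown with a few integers. This sketch contrasts an additive 64-to-32-bit fold, modeled on the "just adds the two 32-bit parts together" behavior the commit describes, with a multiplicative fold that keeps the top bits in the spirit of `hash_64()`. It likewise contrasts choosing a bucket by masking the low bits with taking the top bits of a `hash_32()`-style product. The multipliers are the `GOLDEN_RATIO_PRIME_64`/`_32` values from 2014-era `include/linux/hash.h`, used illustratively; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_GOLDEN_64 0x9e37fffffffc0001ULL  /* 2014-era GOLDEN_RATIO_PRIME_64 */
#define TOY_GOLDEN_32 0x9e370001U            /* 2014-era GOLDEN_RATIO_PRIME_32 */

/* Simplistic fold: add the high 32 bits into the low 32 bits. */
static uint32_t fold_add(uint64_t hash)
{
    return (uint32_t)(hash + (hash >> 32));
}

/* Multiplicative fold: multiply, then keep the better-mixed top 32 bits. */
static uint32_t fold_mul(uint64_t hash)
{
    return (uint32_t)((hash * TOY_GOLDEN_64) >> 32);
}

/* Bucket index from the low bits only (bits must be in 1..31). */
static uint32_t bucket_mask(uint32_t hash, unsigned bits)
{
    return hash & ((1U << bits) - 1);
}

/* Bucket index from the top bits of a multiplicative hash (bits in 1..31). */
static uint32_t bucket_mul(uint32_t hash, unsigned bits)
{
    return (uint32_t)(hash * TOY_GOLDEN_32) >> (32 - bits);
}
```

For example, 0x0000000100000002 and 0x0000000200000001 (the two halves swapped) both fold to 3 under `fold_add()` but to different values under `fold_mul()`; and hashes 0x100 and 0x200 share bucket 0 of a 256-entry table under `bucket_mask()` but land in different buckets under `bucket_mul()`.
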
13 Sep, 2014

2 commits

  • Pull NFS client fixes from Trond Myklebust:
    "Highlights:
    - fix a kernel warning when removing /proc/net/nfsfs
    - revert commit 49a4bda22e18 due to Oopses
    - fix a typo in the pNFS file layout commit code"

    * tag 'nfs-for-3.17-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    pnfs: fix filelayout_retry_commit when idx > 0
    nfs: revert "nfs4: queue free_lock_state job submission to nfsiod"
    nfs: fix kernel warning when removing proc entry

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "Filipe is doing a careful pass through fsync problems, and these are
    the fixes so far. I'll have one more for rc6 that we're still
    testing.

    My big commit is fixing up some inode hash races that Al Viro found
    (thanks Al)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: use insert_inode_locked4 for inode creation
    Btrfs: fix fsync data loss after a ranged fsync
    Btrfs: kfree()ing ERR_PTRs
    Btrfs: fix crash while doing a ranged fsync
    Btrfs: fix corruption after write/fsync failure + fsync + log recovery
    Btrfs: fix autodefrag with compression

    Linus Torvalds
     

11 Sep, 2014

6 commits

  • Merge misc fixes from Andrew Morton:
    "10 fixes"

    * emailed patches from Andrew Morton:
    fs/notify: don't show f_handle if exportfs_encode_inode_fh failed
    fsnotify/fdinfo: use named constants instead of hardcoded values
    kcmp: fix standard comparison bug
    mm/mmap.c: use pr_emerg when printing BUG related information
    shm: add memfd.h to UAPI export list
    checkpatch: allow commit descriptions on separate line from commit id
    sh: get_user_pages_fast() must flush cache
    eventpoll: fix uninitialized variable in epoll_ctl
    kernel/printk/printk.c: fix faulty logic in the case of recursive printk
    mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()

    Linus Torvalds
     
  • Currently we handle only ENOSPC. In case of other errors the file_handle
    variable isn't filled properly and we will show part of the stack.

    Signed-off-by: Andrey Vagin
    Acked-by: Cyrill Gorcunov
    Cc: Alexander Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • MAX_HANDLE_SZ is equal to 128, but currently the size of pad is only 64
    bytes, so exportfs_encode_inode_fh can return an error.

    Signed-off-by: Andrey Vagin
    Acked-by: Cyrill Gorcunov
    Cc: Alexander Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
    not initialized but ep_take_care_of_epollwakeup reads its event field.
    When this uninitialized field has the EPOLLWAKEUP bit set, a capability check
    is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup. This
    produces unexpected messages in the audit log, such as (on a system
    running SELinux):

    type=AVC msg=audit(1408212798.866:410): avc: denied
    { block_suspend } for pid=7754 comm="dbus-daemon" capability=36
    scontext=unconfined_u:unconfined_r:unconfined_t
    tcontext=unconfined_u:unconfined_r:unconfined_t
    tclass=capability2 permissive=1

    type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233
    success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1
    pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
    fsgid=0 tty=(none) ses=3 comm="dbus-daemon"
    exe="/usr/bin/dbus-daemon"
    subj=unconfined_u:unconfined_r:unconfined_t key=(null)

    ("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")

    Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.

    Fixes: 4d7e30d98939 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
    Signed-off-by: Nicolas Iooss
    Cc: Alexander Viro
    Cc: Arve Hjønnevåg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Iooss
     
  • Pull UDF fixes from Jan Kara:
    "Fixes for UDF handling of NFS handles and one fix for proper handling
    of corrupted media"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    udf: saner calling conventions for udf_new_inode()
    udf: fix the udf_iget() vs. udf_new_inode() races
    udf: merge the pieces inserting a new non-directory object into directory
    udf: Set i_generation field
    udf: Properly detect stale inodes
    udf: Make udf_read_inode() and udf_iget() return error
    udf: Avoid infinite loop when processing indirect ICBs
    udf: Fold udf_fill_inode() into __udf_read_inode()
    udf: Avoid dir link count to go negative

    Linus Torvalds
     
  • filelayout_retry_commit was recently split out from alloc_ds_commits,
    but was done in such a way that the bucket pointer always starts at
    index 0 no matter what the @idx argument is set to.

    The intention of the @idx argument is to retry commits starting at
    bucket @idx. This is called when alloc_ds_commits fails for a bucket.

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     
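The epoll_ctl() fix amounts to not consulting the event structure for an operation that does not carry one. Below is a userspace model, assuming hypothetical names (`toy_epoll_event`, `take_care_of_wakeup()`, `op_has_event()`, `toy_epoll_ctl()`) and a `has_block_suspend` flag standing in for the CAP_BLOCK_SUSPEND check. The numeric values mirror Linux's `EPOLL_CTL_ADD`/`DEL`/`MOD` and `EPOLLWAKEUP`, but this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_EPOLL_CTL_ADD 1
#define TOY_EPOLL_CTL_DEL 2
#define TOY_EPOLL_CTL_MOD 3
#define TOY_EPOLLWAKEUP   (1u << 29)

struct toy_epoll_event {
    uint32_t events;
    uint64_t data;
};

static int capability_checks;   /* counts checks, i.e. potential audit noise */

/* Strip EPOLLWAKEUP if the caller lacks the (modeled) capability. */
static void take_care_of_wakeup(struct toy_epoll_event *ev, bool has_block_suspend)
{
    if (ev->events & TOY_EPOLLWAKEUP) {
        capability_checks++;
        if (!has_block_suspend)
            ev->events &= ~TOY_EPOLLWAKEUP;
    }
}

/* ADD and MOD carry an event; DEL does not (its pointer may be NULL). */
static bool op_has_event(int op)
{
    return op != TOY_EPOLL_CTL_DEL;
}

/* Fixed flow: copy and inspect the event only when the op has one. */
static int toy_epoll_ctl(int op, const struct toy_epoll_event *uevent,
                         bool has_block_suspend, struct toy_epoll_event *out)
{
    struct toy_epoll_event epds;      /* left uninitialized, as in the bug */

    if (op_has_event(op)) {
        if (uevent == NULL)
            return -1;                /* the kernel would return -EFAULT */
        memcpy(&epds, uevent, sizeof(epds));
        take_care_of_wakeup(&epds, has_block_suspend);
        if (out)
            *out = epds;
    }
    return 0;
}
```

For EPOLL_CTL_DEL the event is never examined, so stack garbage can no longer trigger the capability check and its audit record.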

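The filelayout_retry_commit() bug is the classic mistake of honoring a start index in the loop counter but not in the pointer. A minimal sketch with hypothetical types and functions (`struct toy_bucket`, `retry_from_buggy()`, `retry_from()`); it shows only the shape of the bug and the fix, not the NFS code.

```c
#include <assert.h>

struct toy_bucket {
    int retried;   /* how many times this bucket was handed back for retry */
};

/* Buggy shape: the counter starts at idx, but the pointer starts at 0,
 * so the wrong buckets are processed whenever idx > 0. */
static void retry_from_buggy(struct toy_bucket *buckets, int nbuckets, int idx)
{
    struct toy_bucket *bucket = buckets;            /* ignores idx */
    for (int i = idx; i < nbuckets; i++, bucket++)
        bucket->retried++;
}

/* Fixed shape: derive the starting pointer from idx as well. */
static void retry_from(struct toy_bucket *buckets, int nbuckets, int idx)
{
    struct toy_bucket *bucket = &buckets[idx];
    for (int i = idx; i < nbuckets; i++, bucket++)
        bucket->retried++;
}
```

Called with idx = 2 on four buckets, the buggy version touches buckets 0 and 1 while the fixed one touches buckets 2 and 3.
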
10 Sep, 2014

1 commit

  • Pull cifs/smb3 fixes from Steve French:
    "This includes various cifs and smb3 bug fixes including those for bugs
    found with the recently updated xfstests.

    Also I am working on fixes for two additional cifs problems found by
    xfstests, which I plan to send later (once they are reviewed and I have
    run additional tests)"

    * 'for-next-3.17' of git://git.samba.org/sfrench/cifs-2.6:
    Clarify Kconfig help text for CIFS and SMB2/SMB3
    CIFS: Fix wrong filename length for SMB2
    CIFS: Fix wrong restart readdir for SMB1
    CIFS: Fix directory rename error
    cifs: No need to send SIGKILL to demux_thread during umount
    cifs: Allow directIO read/write during cache=strict
    cifs: remove unneeded check of null checking in if condition
    cifs: fix a possible use of uninit variable in SMB2_sess_setup
    cifs: fix memory leak when password is supplied multiple times
    cifs: fix a possible null pointer deref in decode_ascii_ssetup
    Trivial whitespace fix

    Linus Torvalds
     

09 Sep, 2014

9 commits

  • This reverts commit 49a4bda22e186c4d0eb07f4a36b5b1a378f9398d.

    Christoph reported an oops due to the above commit:

    generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1]
    SMP
    [ 2187.042899] Modules linked in:
    [ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
    [ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [ 2187.044287] Workqueue: nfsiod free_lock_state_work
    [ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
    [ 2187.044287] RIP: 0010:[] [] free_lock_state_work+0x16/0x30
    [ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296
    [ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
    [ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
    [ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
    [ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
    [ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
    [ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
    [ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
    [ 2187.044287] Stack:
    [ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
    [ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
    [ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
    [ 2187.044287] Call Trace:
    [ 2187.044287] [] process_one_work+0x1c7/0x490
    [ 2187.044287] [] ? process_one_work+0x15d/0x490
    [ 2187.044287] [] worker_thread+0x119/0x4f0
    [ 2187.044287] [] ? trace_hardirqs_on+0xd/0x10
    [ 2187.044287] [] ? init_pwq+0x190/0x190
    [ 2187.044287] [] kthread+0xdf/0x100
    [ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
    [ 2187.044287] [] ret_from_fork+0x7c/0xb0
    [ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
    [ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
    [ 2187.044287] RIP [] free_lock_state_work+0x16/0x30
    [ 2187.044287] RSP
    [ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---

    The original reason for this patch was that the fl_release_private
    operation couldn't sleep. With commit ed9814d85810 (locks: defer freeing
    locks in locks_delete_lock until after i_lock has been dropped), this is
    no longer a problem so we can revert this patch.

    Reported-by: Christoph Hellwig
    Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig
    Tested-by: Christoph Hellwig
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • I saw the following kernel warning:

    [ 1852.321222] ------------[ cut here ]------------
    [ 1852.326527] WARNING: CPU: 0 PID: 118 at fs/proc/generic.c:521 remove_proc_entry+0x154/0x16b()
    [ 1852.335630] remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 'volumes'
    [ 1852.344084] CPU: 0 PID: 118 Comm: kworker/u8:2 Not tainted 3.16.0+ #540
    [ 1852.350036] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 1852.354992] Workqueue: netns cleanup_net
    [ 1852.358701] 0000000000000000 ffff880116f2fbd0 ffffffff819c03e9 ffff880116f2fc18
    [ 1852.366474] ffff880116f2fc08 ffffffff810744ee ffffffff811e0e6e ffff8800d4e96238
    [ 1852.373507] ffffffff81dbe665 ffff8800d46a5948 0000000000000005 ffff880116f2fc68
    [ 1852.380224] Call Trace:
    [ 1852.381976] [] dump_stack+0x4d/0x66
    [ 1852.385495] [] warn_slowpath_common+0x7a/0x93
    [ 1852.389869] [] ? remove_proc_entry+0x154/0x16b
    [ 1852.393987] [] warn_slowpath_fmt+0x4c/0x4e
    [ 1852.397999] [] remove_proc_entry+0x154/0x16b
    [ 1852.402034] [] nfs_fs_proc_net_exit+0x53/0x56
    [ 1852.406136] [] nfs_net_exit+0x12/0x1d
    [ 1852.409774] [] ops_exit_list+0x44/0x55
    [ 1852.413529] [] cleanup_net+0xee/0x182
    [ 1852.417198] [] process_one_work+0x209/0x40d
    [ 1852.502320] [] ? process_one_work+0x162/0x40d
    [ 1852.587629] [] worker_thread+0x1f0/0x2c7
    [ 1852.673291] [] ? process_scheduled_works+0x2f/0x2f
    [ 1852.759470] [] kthread+0xc9/0xd1
    [ 1852.843099] [] ? finish_task_switch+0x3a/0xce
    [ 1852.926518] [] ? __kthread_parkme+0x61/0x61
    [ 1853.008565] [] ret_from_fork+0x7c/0xb0
    [ 1853.076477] [] ? __kthread_parkme+0x61/0x61
    [ 1853.140653] ---[ end trace 69c4c6617f78e32d ]---

    It looks wrong that we add "/proc/net/nfsfs" in nfs_fs_proc_net_init()
    but remove "/proc/fs/nfsfs" in nfs_fs_proc_net_exit().

    Fixes: commit 65b38851a17 (NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes)
    Cc: Eric W. Biederman
    Cc: Trond Myklebust
    Cc: Dan Aloni
    Signed-off-by: Cong Wang
    [Trond: replace uses of remove_proc_entry() with remove_proc_subtree()
    as suggested by Al Viro]
    Cc: stable@vger.kernel.org # 3.4.x : 65b38851a17: NFS: Fix /proc/fs/nfsfs/servers
    Cc: stable@vger.kernel.org # 3.4.x
    Signed-off-by: Trond Myklebust

    Cong Wang
     
  • Pull ext4 bugfix from Ted Ts'o.

    [ Hmm. It's possible we should make kfree() aware of error pointers,
    and use IS_ERR_OR_NULL rather than a NULL check. But in the meantime
    this is obviously the right fix. - Linus ]

    * 'for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: avoid trying to kfree an ERR_PTR pointer

    Linus Torvalds
     
  • Pull nfsd bugfixes from Bruce Fields:
    "A couple minor nfsd bugfixes"

    * 'for-3.17' of git://linux-nfs.org/~bfields/linux:
    lockd: fix rpcbind crash on lockd startup failure
    nfsd4: fix rd_dircount enforcement

    Linus Torvalds
     
  • Btrfs was inserting inodes into the hash table before we had fully
    set the inode up on disk. This leaves us open to rare races that allow
    two different inodes in memory for the same [root, inode] pair.

    This patch fixes things by using insert_inode_locked4 to insert an I_NEW
    inode and unlock_new_inode when we're ready for the rest of the kernel
    to use the inode.

    It also makes sure to init the operations pointers on the inode before
    going into the error handling paths.

    Signed-off-by: Chris Mason
    Reported-by: Al Viro

    Chris Mason
     
  • While we're doing a full fsync (when the inode has the flag
    BTRFS_INODE_NEEDS_FULL_SYNC set) that is ranged too (covers only a
    portion of the file), we might have ordered operations that are started
    before or while we're logging the inode and that fall outside the fsync
    range.

    Therefore when a full ranged fsync finishes don't remove every extent
    map from the list of modified extent maps - as for some of them, that
    fall outside our fsync range, their respective ordered operation hasn't
    finished yet, meaning the corresponding file extent item wasn't inserted
    into the fs/subvol tree yet and therefore we didn't log it, and we must
    let the next fast fsync (one that checks only the modified list) see this
    extent map and log a matching file extent item to the log btree and wait
    for its ordered operation to finish (if it's still ongoing).

    A test case for xfstests follows.

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     
  • The "inherit" in btrfs_ioctl_snap_create_v2() and "vol_args" in
    btrfs_ioctl_rm_dev() are ERR_PTRs so we can't call kfree() on them.

    These kinds of bugs are "One Err Bugs" where there is just one error
    label that does everything. I could set the "inherit = NULL" and keep
    the single out label but it ends up being more complicated that way. It
    makes the code simpler to re-order the unwind so it's in the mirror
    order of the allocation and introduce some new error labels.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Chris Mason

    Dan Carpenter
     
  • Nikita Yushchenko reported that booting a kernel with init=/bin/sh and
    then nfs mounting without portmap or rpcbind running using a busybox
    mount resulted in:

    # mount -t nfs 10.30.130.21:/opt /mnt
    svc: failed to register lockdv1 RPC service (errno 111).
    lockd_up: makesock failed, error=-111
    Unable to handle kernel paging request for data at address 0x00000030
    Faulting instruction address: 0xc055e65c
    Oops: Kernel access of bad area, sig: 11 [#1]
    MPC85xx CDS
    Modules linked in:
    CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
    task: cf29cea0 ti: cf35c000 task.ti: cf35c000
    NIP: c055e65c LR: c0566490 CTR: c055e648
    REGS: cf35dad0 TRAP: 0300 Not tainted (3.10.44.cge)
    MSR: 00029000 CR: 22442488 XER: 20000000
    DEAR: 00000030, ESR: 00000000

    GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086
    00000000
    GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000
    10090ae8
    GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000
    bfa46ef0
    GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000
    cf0ded80
    NIP [c055e65c] call_start+0x14/0x34
    LR [c0566490] __rpc_execute+0x70/0x250
    Call Trace:
    [cf35db80] [00000080] 0x80 (unreliable)
    [cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
    [cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
    [cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
    [cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
    [cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
    [cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
    [cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
    [cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
    [cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
    [cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
    [cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
    [cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
    [cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
    [cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
    [cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
    [cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
    [cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
    [cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c

    The addition of svc_shutdown_net() resulted in two calls to
    svc_rpcb_cleanup(); the second is no longer necessary and crashes when
    it calls rpcb_register_call with clnt=NULL.

    Reported-by: Nikita Yushchenko
    Fixes: 679b033df484 "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up"
    Cc: stable@vger.kernel.org
    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Commit 3b299709091b "nfsd4: enforce rd_dircount" totally misunderstood
    rd_dircount; it refers to total non-attribute bytes returned, not number
    of directory entries returned.

    Bring the code into agreement with RFC 3530 section 14.2.24.

    Cc: stable@vger.kernel.org
    Fixes: 3b299709091b "nfsd4: enforce rd_dircount"
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
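The btrfs ERR_PTR fix (and the ext4 one pulled the same day) rests on two facts: an ERR_PTR value is not a real allocation, so it must never reach kfree(), and unwinding in the mirror order of allocation keeps each label freeing only what was successfully set up. Below is a userspace sketch, with `ERR_PTR`/`IS_ERR`/`PTR_ERR` re-created after the kernel's `include/linux/err.h` (MAX_ERRNO of 4095) and hypothetical `toy_dup()`/`toy_ioctl()` helpers whose variable names are borrowed from the commit message for readability.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace re-creation of the kernel's error-pointer encoding. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical memdup_user()-like helper: returns a copy or an ERR_PTR. */
static void *toy_dup(const char *src, int fail)
{
    if (fail)
        return ERR_PTR(-EFAULT);
    char *p = malloc(strlen(src) + 1);
    if (!p)
        return ERR_PTR(-ENOMEM);
    return strcpy(p, src);
}

/* Two allocations, unwound in mirror order with one label per step. */
static long toy_ioctl(int fail_first, int fail_second)
{
    long ret = 0;
    char *vol_args, *inherit;

    vol_args = toy_dup("vol", fail_first);
    if (IS_ERR(vol_args))
        return PTR_ERR(vol_args);      /* nothing to free yet */

    inherit = toy_dup("inherit", fail_second);
    if (IS_ERR(inherit)) {
        ret = PTR_ERR(inherit);
        goto free_args;                /* never free() the ERR_PTR itself */
    }

    /* ... real work would go here ... */

    free(inherit);
free_args:
    free(vol_args);
    return ret;
}
```

Each label frees only what was successfully allocated before the jump, so no error path passes an ERR_PTR value to free(). Linus's note on the ext4 pull floats the alternative of making kfree() tolerate error pointers; the mirrored-unwind style does not depend on that.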

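The rd_dircount change replaces an entry count with a byte budget. Below is a minimal sketch with hypothetical `entry_cost()` and `entries_within_dircount()` helpers. It assumes (an assumption for illustration, not a quote of the nfsd code) that each entry is charged its XDR-encoded name, a 4-byte length plus the name padded to a multiple of 4, plus an 8-byte cookie, and that the first entry is always admitted, a choice made here because RFC 3530 describes dircount as a hint.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed per-entry cost: XDR string (4-byte length + name padded to a
 * multiple of 4) plus an 8-byte cookie. Attributes are not counted. */
static size_t entry_cost(size_t namelen)
{
    return 4 + ((namelen + 3) & ~(size_t)3) + 8;
}

/* Returns how many of the names fit in a budget of dircount bytes. */
static size_t entries_within_dircount(const char *const *names, size_t n,
                                      size_t dircount)
{
    size_t used = 0, count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t cost = entry_cost(strlen(names[i]));
        if (count > 0 && used + cost > dircount)
            break;                    /* budget spent; the first always fits */
        used += cost;
        count++;
    }
    return count;
}
```

With names "a", "bb" and "ccccc" (costs of 16, 16 and 20 bytes under these assumptions), a budget of 32 admits two entries, a budget of 0 still admits the first, and 100 admits all three.
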
08 Sep, 2014

2 commits

  • Pull filesystem fixes from Al Viro:
    "Several bugfixes (all of them -stable fodder).

    Alexey's one deals with double mutex_lock() in UFS (apparently, nobody
    has tried to test "ufs: sb mutex merge + mutex_destroy" on something
    like file creation/removal on ufs). Mine deal with two kinds of
    umount bugs, in umount propagation and in handling of automounted
    submounts, both resulting in bogus transient EBUSY from umount"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    ufs: fix deadlocks introduced by sb mutex merge
    fix EBUSY on umount() from MNT_SHRINKABLE
    get rid of propagate_umount() mistakenly treating slaves as busy.

    Linus Torvalds
     
  • Commit 0244756edc4b ("ufs: sb mutex merge + mutex_destroy") introduces
    deadlocks in ufs_new_inode() and ufs_free_inode().
    Most callers of these functions acquire the mutex themselves, and
    ufs_{new,free}_inode() do that too via lock_ufs(),
    i.e. we have an unavoidable double lock.

    The patch proposes to resolve the issue by making sure that
    ufs_{new,free}_inode() are not called with the mutex held.

    Found by Linux Driver Verification project (linuxtesting.org).

    Cc: stable@vger.kernel.org # 3.16
    Signed-off-by: Alexey Khoroshilov
    Signed-off-by: Al Viro

    Alexey Khoroshilov
     

07 Sep, 2014

1 commit

  • Pull xfs fixes from Dave Chinner:
    "The fixes all address recently discovered data corruption issues.

    The original Direct IO issue was discovered by Chris Mason @ Facebook
    on a production workload which mixed buffered reads with direct read
    and write IO to the same file. The fix for that exposed other issues
    with page invalidation (exposed by millions of fsx operations) failing
    due to dirty buffers beyond EOF.

    Finally, the collapse_range code could also cause problems due to
    racing writeback changing the extent map while it was being shifted
    around. The commits for that problem are simple mitigation fixes that
    prevent the problem from occurring. A more robust fix for 3.18 that
    addresses the underlying problem is currently being worked on by
    Brian.

    Summary of fixes:
    - a direct IO read/buffered read data corruption
    - the associated fallout from the DIO data corruption fix
    - collapse range bugs that are potential data corruption issues"

    * tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
    xfs: trim eofblocks before collapse range
    xfs: xfs_file_collapse_range is delalloc challenged
    xfs: don't log inode unless extent shift makes extent modifications
    xfs: use ranged writeback and invalidation for direct IO
    xfs: don't zero partial page cache pages during O_DIRECT writes
    xfs: don't zero partial page cache pages during O_DIRECT writes
    xfs: don't dirty buffers beyond EOF

    Linus Torvalds
     

05 Sep, 2014

9 commits

  • This patch changes sync_filesystem() to be EXPORT_SYMBOL().

    The reason this is needed is that starting with 3.15 kernel, due to
    Theodore Ts'o's commit 02b9984d6408 ("fs: push sync_filesystem() down to
    the file system's remount_fs()"), all file systems that have dirty data
    to be written out need to call sync_filesystem() from their
    ->remount_fs() method when remounting read-only.

    As this is now a generically required function rather than an internal
    only function it should be EXPORT_SYMBOL() so that all file systems can
    call it.

    Signed-off-by: Anton Altaparmakov
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Altaparmakov
     
  • Pull aio bugfixes from Ben LaHaise:
    "Two small fixes"

    * git://git.kvack.org/~bcrl/aio-fixes:
    aio: block exit_aio() until all context requests are completed
    aio: add missing smp_rmb() in read_events_ring

    Linus Torvalds
     
  • It seems that exit_aio() also needs to wait for all iocbs to complete (like
    io_destroy), but we missed the wait step in the current implementation, so fix
    it in the same way as we did in io_destroy.

    Signed-off-by: Gu Zheng
    Signed-off-by: Benjamin LaHaise
    Cc: stable@vger.kernel.org

    Gu Zheng
     
  • Signed-off-by: Al Viro
    Signed-off-by: Jan Kara

    Al Viro
     
  • Currently udf_iget() (triggered by NFS) can race with udf_new_inode()
    leading to two inode structures with the same inode number:

    nfsd: iget_locked() creates inode
    nfsd: try to read from disk, block on that.
    udf_new_inode(): allocate inode with that inumber
    udf_new_inode(): insert it into icache, set it up and dirty
    udf_write_inode(): write inode into buffer cache
    nfsd: get CPU again, look into buffer cache, see nice and sane on-disk
    inode, set the in-core inode from it

    Fix the problem by putting inode into icache in locked state (I_NEW set)
    and unlocking it only after it's fully set up.

    Signed-off-by: Al Viro
    Signed-off-by: Jan Kara

    Al Viro
     
  • boilerplate code in udf_{create,mknod,symlink} taken to new helper

    symlink case converted to unique id calculated by udf_new_inode() - no
    point finding a new one.

    Signed-off-by: Al Viro
    Signed-off-by: Jan Kara

    Al Viro
     
  • Currently UDF doesn't initialize i_generation in any way and thus NFS
    can easily get reallocated inodes from stale file handles. Luckily UDF
    already has a unique object identifier associated with each inode -
    i_unique. Use that for initialization of i_generation.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • NFS can easily ask for inodes that are already deleted. Currently UDF
    happily returns such inodes, which is a bug. Return -ESTALE if
    udf_read_inode() is asked to read a deleted inode.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Currently __udf_read_inode() doesn't return anything, and we find out
    whether reading the inode succeeded by checking whether the inode is bad
    or not. udf_iget() returns NULL on failure and the inode pointer otherwise.
    Make these two functions properly propagate errors up the call stack and
    use the return value in callers.

    Signed-off-by: Jan Kara

    Jan Kara
     
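The UDF race (and the btrfs insert_inode_locked4() change) follow one rule: publish an inode in the cache only in a locked, I_NEW state, and clear that state once the inode is fully initialized, so a concurrent lookup cannot use half-built or stale data. Below is a single-threaded model of that state machine. `toy_icache`, `TOY_I_NEW`, `insert_locked()`, `unlock_new()` and `lookup()` are hypothetical; a real lookup would sleep until I_NEW clears rather than return a busy status.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_I_NEW 0x1u
#define TOY_CACHE_SIZE 8

struct toy_inode {
    unsigned long ino;
    unsigned state;
    int mode;          /* stands in for fields set up by the creator */
    int in_use;
};

static struct toy_inode toy_icache[TOY_CACHE_SIZE];

enum lookup_result { LOOKUP_MISS, LOOKUP_BUSY, LOOKUP_HIT };

/* Insert with I_NEW set; refuse if this inode number is already cached. */
static struct toy_inode *insert_locked(unsigned long ino)
{
    struct toy_inode *free_slot = NULL;
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
        if (toy_icache[i].in_use && toy_icache[i].ino == ino)
            return NULL;                    /* someone else owns this ino */
        if (!toy_icache[i].in_use && !free_slot)
            free_slot = &toy_icache[i];
    }
    if (!free_slot)
        return NULL;
    free_slot->in_use = 1;
    free_slot->ino = ino;
    free_slot->state = TOY_I_NEW;
    return free_slot;
}

/* Make the fully initialized inode visible to lookups. */
static void unlock_new(struct toy_inode *inode)
{
    inode->state &= ~TOY_I_NEW;
}

/* Lookup: an I_NEW inode must not be used yet (a real icache would wait). */
static enum lookup_result lookup(unsigned long ino, struct toy_inode **out)
{
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
        if (toy_icache[i].in_use && toy_icache[i].ino == ino) {
            if (toy_icache[i].state & TOY_I_NEW)
                return LOOKUP_BUSY;
            *out = &toy_icache[i];
            return LOOKUP_HIT;
        }
    }
    return LOOKUP_MISS;
}
```

Only the creator sees the half-built inode; everybody else waits (in the kernel) or, in this model, gets LOOKUP_BUSY until unlock_new() runs.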

04 Sep, 2014

3 commits