05 Aug, 2016

5 commits

  • vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
    returns EEXIST if the object already exists.

    This check is therefore unnecessary.

    (In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to
    RFC 1094, our code seems to believe that a CREATE of an existing file
    should succeed. I'm leaving that behavior alone.)

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • There's some odd logic in nfsd_create() that allows it to be called with
    the parent directory either locked or unlocked. The only already-locked
    caller is NFSv2's nfsd_proc_create(). It's less confusing to split out
    the unlocked case into a separate function which the NFSv2 code can call
    directly.

    Also fix some comments while we're here.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Create and other nfsd ops generally assume we can call lookup_one_len on
    inodes with S_IFDIR set. Al says that this assumption isn't true in
    general, though it should be for the filesystem objects nfsd sees.

    Add a check just to make sure our assumption isn't violated.

    Remove a couple checks for i_op->lookup in create code.

    Cc: Al Viro
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • lookup_one_len already has this check.

    The only effect of this patch is to return access instead of perm in the
    0-length-filename case. I actually prefer nfserr_perm (or _inval?), but
    I doubt anyone cares.

    The isdotent check seems redundant too, but I worry that some client
    might actually care about that strange nfserr_exist error.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • When doing a create (mkdir/mknod) on a name, it's worth
    checking whether the name already exists before returning EACCES
    because the directory is not writable by the user.
    This makes return values on the client more consistent
    regardless of whether the entry is cached in the local
    cache or not.
    Another positive side effect is that certain programs expect only
    EEXIST in that case, even though POSIX allows any valid
    error to be returned.

    Signed-off-by: Oleg Drokin
    Signed-off-by: J. Bruce Fields

    Oleg Drokin
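
The ordering described in the commit above (report EEXIST before considering directory permissions) can be illustrated with a minimal userspace sketch. This is not the nfsd code: `struct create_ctx` and `check_create()` are hypothetical names invented for the example, and the two booleans stand in for the real lookup and permission checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the outcome of a lookup and a permission check. */
struct create_ctx {
    bool name_exists;   /* the target name is already present */
    bool dir_writable;  /* the caller may write to the parent directory */
};

/*
 * Returns 0 if the create may proceed, or a negative errno.
 * Existence is tested first, so an existing name yields -EEXIST
 * even when the parent directory is not writable.
 */
int check_create(const struct create_ctx *ctx)
{
    assert(ctx != NULL);

    if (ctx->name_exists)
        return -EEXIST;
    if (!ctx->dir_writable)
        return -EACCES;
    return 0;
}
```

With this ordering a caller sees the same error for an existing name whether or not it can write to the directory, which is the consistency the commit message is after.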
     

02 Aug, 2016

2 commits

  • This modification is useful for debugging issues that happen while
    the socket is being initialised.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • We're seeing traces of the following form:

    [10952.396347] svc: transport ffff88042ba4a000 dequeued, inuse=2
    [10952.396351] svc: tcp_accept ffff88042ba4a000 sock ffff88042a6e4c80
    [10952.396362] nfsd: connect from 10.2.6.1, port=187
    [10952.396364] svc: svc_setup_socket ffff8800b99bcf00
    [10952.396368] setting up TCP socket for reading
    [10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800)
    [10952.396373] svc: transport ffff8803eb10a000 put into queue
    [10952.396375] svc: transport ffff88042ba4a000 put into queue
    [10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000)
    [10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2
    [10952.396381] svc_recv: found XPT_CLOSE
    [10952.396397] svc: svc_delete_xprt(ffff8803eb10a000)
    [10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000)
    [10952.396399] svc: svc_sock_detach(ffff8803eb10a000)
    [10952.396412] svc: svc_sock_free(ffff8803eb10a000)

    i.e. an immediate close of the socket after initialisation.

    The culprit appears to be the test at the end of svc_tcp_init, which
    checks if the newly created socket is in the TCP_ESTABLISHED state,
    and immediately closes it if not. The evidence appears to suggest that
    the socket might still be in the SYN_RECV state at this time.

    The fix is to check for both states, and then to add a check in
    svc_tcp_state_change() to ensure we don't close the socket when
    it transitions into TCP_ESTABLISHED.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
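
The two checks described in the commit above can be sketched as a pair of predicates. This is a simplified userspace model, not the sunrpc code: the enum and both function names are local to the example, and the state set is deliberately reduced to three values.

```c
#include <stdbool.h>

/* Illustrative subset of TCP states; these names are local to this sketch. */
enum sketch_tcp_state {
    SKETCH_TCP_ESTABLISHED,
    SKETCH_TCP_SYN_RECV,
    SKETCH_TCP_CLOSE,
};

/*
 * At initialisation time: keep the socket if it is established or is
 * still completing the handshake, instead of requiring ESTABLISHED only.
 */
bool keep_after_init(enum sketch_tcp_state state)
{
    return state == SKETCH_TCP_ESTABLISHED || state == SKETCH_TCP_SYN_RECV;
}

/*
 * In the state-change callback: a transition into ESTABLISHED is the
 * handshake finishing, so it must not be treated as a reason to close.
 */
bool close_on_state_change(enum sketch_tcp_state new_state)
{
    return new_state != SKETCH_TCP_ESTABLISHED;
}
```

The second predicate matters only because of the first: once sockets in SYN_RECV are kept, the later move to ESTABLISHED arrives as a state change and has to be ignored.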
     

16 Jul, 2016

4 commits

  • If the underlying filesystem supports multiple layout types, then there
    is little reason not to advertise that fact to clients and let them
    choose what type to use.

    Turn the ex_layout_type field into a bitfield. For each supported
    layout type, we set a bit in that field. When the client requests a
    layout, ensure that the bit for that layout type is set. When the
    client requests attributes, send back a list of supported types.

    Signed-off-by: Jeff Layton
    Reviewed-by: Weston Andros Adamson
    Signed-off-by: J. Bruce Fields

    Jeff Layton
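
A bitfield of supported layout types, as described in the commit above, might look like the following sketch. The function names and the 32-bit mask width are assumptions made for illustration; they are not taken from the nfsd source, and the type numbers used in the usage note are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LAYOUT_TYPES 32u  /* assumed width of the mask */

/* Bit corresponding to one layout type number (caller ensures type < 32). */
static uint32_t layout_bit(unsigned type)
{
    return UINT32_C(1) << type;
}

/* Export setup: set one bit per layout type the filesystem supports. */
void export_add_layout(uint32_t *mask, unsigned type)
{
    if (type < SKETCH_MAX_LAYOUT_TYPES)
        *mask |= layout_bit(type);
}

/* Layout request: accept only a type whose bit is set. */
bool export_supports_layout(uint32_t mask, unsigned type)
{
    return type < SKETCH_MAX_LAYOUT_TYPES && (mask & layout_bit(type)) != 0;
}

/* Attribute request: write the supported types to out[], return the count. */
size_t export_list_layouts(uint32_t mask, unsigned *out, size_t cap)
{
    size_t n = 0;

    for (unsigned t = 0; t < SKETCH_MAX_LAYOUT_TYPES && n < cap; t++)
        if (mask & layout_bit(t))
            out[n++] = t;
    return n;
}
```

For example, after adding types 3 and 5 to an empty mask, `export_supports_layout()` accepts 3 and 5 and rejects everything else, and `export_list_layouts()` reports the two types in ascending order.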
     
  • nfsd4_release_lockowner finds a lock owner that has no lock state,
    and drops cl_lock. Then release_lockowner picks up cl_lock and
    unhashes the lock owner.

    During the window where cl_lock is dropped, I don't see anything
    preventing a concurrent nfsd4_lock from finding that same lock owner
    and adding lock state to it.

    Move release_lockowner() into nfsd4_release_lockowner and hang onto
    the cl_lock until after the lock owner's state cannot be found
    again.

    Found by inspection; we don't currently have a reproducer.

    Fixes: 2c41beb0e5cf ("nfsd: reduce cl_lock thrashing in ... ")
    Reviewed-by: Jeff Layton
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • These values are all multiples of 4 already, so there's no change in
    behavior from this patch. But perhaps this will prevent mistakes in the
    future.

    Signed-off-by: Kinglong Mee
    Signed-off-by: J. Bruce Fields

    Kinglong Mee
     
  • Instead of creeping pnfs layout configuration into filesystems, move the
    definition of block-based export operations under a more abstract
    configuration.

    Signed-off-by: Benjamin Coddington
    Reviewed-by: Christoph Hellwig
    Acked-by: Dave Chinner
    Signed-off-by: J. Bruce Fields

    Benjamin Coddington
     

14 Jul, 2016

17 commits


02 Jul, 2016

1 commit


01 Jul, 2016

2 commits

  • (Another one for the f_path debacle.)

    ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.

    The reason is that generic_add_lease() used filp->f_path.dentry->inode
    while all the others use file_inode(). This makes a difference for files
    opened on overlayfs, since the former points to the overlay inode and
    the latter to the underlying inode.

    So generic_add_lease() added the lease to the overlay inode and
    generic_delete_lease() removed it from the underlying inode. When the file
    was released the lease remained on the overlay inode's lock list, resulting
    in use after free.

    Reported-by: Eryu Guan
    Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
    Cc:
    Signed-off-by: Miklos Szeredi
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Miklos Szeredi
     
  • If the lockd service fails to start up then we need to be sure that the
    notifier blocks are not registered, otherwise a subsequent start of the
    service could cause the same notifier to be registered twice, leading to
    soft lockups.

    Signed-off-by: Scott Mayhew
    Cc: stable@vger.kernel.org
    Fixes: 0751ddf77b6a ("lockd: Register callbacks on the inetaddr_chain...")
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
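
The failure mode in the commit above comes from a startup path that registers a notifier and then fails without undoing the registration. A minimal model of the corrected unwinding is sketched below. Everything here is hypothetical: a counter stands in for the notifier chain, and `start_ok` simulates whether the rest of startup succeeds.

```c
#include <stdbool.h>

/* Number of times the (hypothetical) notifier is currently on the chain. */
static int registered;

static void notifier_register(void)   { registered++; }
static void notifier_unregister(void) { registered--; }

/*
 * Start the service. If anything after registration fails, the
 * registration is undone before returning, so a later retry does not
 * put the same notifier on the chain a second time.
 */
int service_start(bool start_ok)
{
    notifier_register();

    if (!start_ok) {
        notifier_unregister();
        return -1;
    }
    return 0;
}

/* Accessor used to inspect the model's state. */
int notifier_registrations(void)
{
    return registered;
}
```

In this model a failed start followed by a successful one leaves exactly one registration. Without the `notifier_unregister()` call on the error path the count would reach two, which corresponds to the double registration the commit message describes.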
     

27 Jun, 2016

2 commits

  • Linus Torvalds
     
  • Pull SCSI fixes from James Bottomley:
    "Two straightforward fixes.

    One is a concurrency issue only affecting SAS connected SATA drives,
    but which could hang the storage subsystem if it triggers (because the
    outstanding command count on error never goes back to zero) and the
    other is a NO_TAG fallout from the switch to hostwide tags which
    causes the system to crash on module insertion (we've checked
    carefully and only the 53c700 family of drivers is vulnerable to this
    issue)"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    53c700: fix BUG on untagged commands
    scsi: fix race between simultaneous decrements of ->host_failed

    Linus Torvalds
     

25 Jun, 2016

7 commits

  • …git/mason/linux-btrfs

    Pull btrfs fixes part 2 from Chris Mason:
    "This has one patch from Omar to bring iterate_shared back to btrfs.

    We have a tree of work we queue up for directory items and it doesn't
    lend itself well to shared access. While we're cleaning it up, Omar
    has changed things to use an exclusive lock when there are delayed
    items"

    * 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "I have a two part pull this time because one of the patches Dave
    Sterba collected needed to be against v4.7-rc2 or higher (we used
    rc4). I try to make my for-linus-xx branch testable on top of the
    last major so we can hand fixes to people on the list more easily, so
    I've split this pull in two.

    This first part has some fixes and two performance improvements that
    we've been testing for some time.

    Josef's two performance fixes are most notable. The transid tracking
    patch makes a big improvement on pretty much every workload"

    * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: Force stripesize to the value of sectorsize
    btrfs: fix disk_i_size update bug when fallocate() fails
    Btrfs: fix error handling in map_private_extent_buffer
    Btrfs: fix error return code in btrfs_init_test_fs()
    Btrfs: don't do nocow check unless we have to
    btrfs: fix deadlock in delayed_ref_async_start
    Btrfs: track transid for delayed ref flushing

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "Again pretty calm weeks: we've had only a few trivial / stable
    HD-audio fixes in addition to a possible race fix for snd-dummy driver
    spotted by syzkaller"

    * tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: dummy: Fix a use-after-free at closing
    ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
    ALSA: hda - Fix the headset mic jack detection on Dell machine
    ALSA: hda/tegra: iomem fixups for sparse warnings
    ALSA: hdac_regmap - fix the register access for runtime PM

    Linus Torvalds
     
  • Pull x86 kprobe fix from Thomas Gleixner:
    "A single fix clearing the TF bit when a fault is single stepped"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    kprobes/x86: Clear TF bit in fault on single-stepping

    Linus Torvalds
     
  • Pull scheduler fixes from Thomas Gleixner:
    "A couple of scheduler fixes:

    - force watchdog reset while processing sysrq-w

    - fix a deadlock when enabling trace events in the scheduler

    - fixes to the throttled next buddy logic

    - fixes for the average accounting (missing serialization and
    underflow handling)

    - allow kernel threads for fallback to online but not active cpus"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/core: Allow kthreads to fall back to online && !active cpus
    sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
    sched/fair: Initialize throttle_count for new task-groups lazily
    sched/fair: Fix cfs_rq avg tracking underflow
    kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
    sched/debug: Fix deadlock when enabling sched events
    sched/fair: Fix post_init_entity_util_avg() serialization

    Linus Torvalds
     
  • Commit fe742fd4f90f ("Revert "btrfs: switch to ->iterate_shared()"")
    backed out the conversion to ->iterate_shared() for Btrfs because the
    delayed inode handling in btrfs_real_readdir() is racy. However, we can
    still do readdir in parallel if there are no delayed nodes.

    This is a temporary fix which upgrades the shared inode lock to an
    exclusive lock only when we have delayed items until we come up with a
    more complete solution. While we're here, rename the
    btrfs_{get,put}_delayed_items functions to make it very clear that
    they're just for readdir.

    Tested with xfstests and by doing a parallel kernel build:

    while make tinyconfig && make -j4 && git clean dqfx; do
    :
    done

    along with a bunch of parallel finds in another shell:

    while true; do
    for ((i=0; i/dev/null &
    done
    wait
    done

    Signed-off-by: Omar Sandoval
    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    Omar Sandoval
     
  • Pull locking fix from Thomas Gleixner:
    "A single fix to address a race in the static key logic"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/static_key: Fix concurrent static_key_slow_inc()

    Linus Torvalds