28 Oct, 2017

3 commits

  • Pull cifs fixes from Steve French:
    "Various SMB3 fixes for 4.14 and stable"

    * tag '4.14-smb3-fixes-for-stable' of git://git.samba.org/sfrench/cifs-2.6:
    SMB3: Validate negotiate request must always be signed
    SMB: fix validate negotiate info uninitialised memory use
    SMB: fix leak of validate negotiate info response buffer
    CIFS: Fix NULL pointer deref on SMB2_tcon() failure
    CIFS: do not send invalid input buffer on QUERY_INFO requests
    cifs: Select all required crypto modules
    CIFS: SMBD: Fix the definition for SMB2_CHANNEL_RDMA_V1_INVALIDATE
    cifs: handle large EA requests more gracefully in smb2+
    Fix encryption labels and lengths for SMB3.1.1

    Linus Torvalds
     
  • Pull overlayfs fixes from Miklos Szeredi:
    "Fix several issues, most of them introduced in the last release"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: do not cleanup unsupported index entries
    ovl: handle ENOENT on index lookup
    ovl: fix EIO from lookup of non-indexed upper
    ovl: Return -ENOMEM if an allocation fails ovl_lookup()
    ovl: add NULL check in ovl_alloc_inode

    Linus Torvalds
     
  • Pull fuse fix from Miklos Szeredi:
    "This fixes a longstanding bug, which can be triggered by interrupting
    a directory reading syscall"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: fix READDIRPLUS skipping an entry

    Linus Torvalds
     

27 Oct, 2017

1 commit


26 Oct, 2017

7 commits


25 Oct, 2017

2 commits

  • Marios Titas running a Haskell program noticed a problem with fuse's
    readdirplus: when it is interrupted by a signal, it skips one directory
    entry.

    The reason is that fuse erroneously updates ctx->pos after a failed
    dir_emit().

    The issue originates from the patch adding readdirplus support.
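The rule described above (advance the position only after an entry was emitted successfully) can be sketched in user space. This is a minimal illustration, not the fuse code: `struct dir_ctx`, `emit_entry()` and `fill_dir()` are hypothetical stand-ins for the kernel's directory context and dir_emit().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's directory context. */
struct dir_ctx {
    size_t pos;   /* index of the next entry to hand to the caller */
    size_t room;  /* how many more entries the caller's buffer can hold */
};

/* Stand-in for dir_emit(): fails when the caller's buffer is full. */
static bool emit_entry(struct dir_ctx *ctx)
{
    if (ctx->room == 0)
        return false;
    ctx->room--;
    return true;
}

/*
 * Emit entries starting at ctx->pos and return how many were emitted.
 * ctx->pos moves past an entry only after that entry was emitted, so a
 * failed emit leaves the position pointing at the entry to retry.
 */
static size_t fill_dir(struct dir_ctx *ctx, size_t n_entries)
{
    size_t emitted = 0;

    while (ctx->pos < n_entries) {
        if (!emit_entry(ctx))
            break;          /* do not touch ctx->pos on failure */
        ctx->pos++;         /* advance only on success */
        emitted++;
    }
    return emitted;
}
```

A second call to `fill_dir()` with more room resumes at the entry that failed, so nothing is skipped.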

    Reported-by: Jakob Unterwurzacher
    Tested-by: Marios Titas
    Signed-off-by: Miklos Szeredi
    Fixes: 0b05b18381ee ("fuse: implement NFS-like readdirplus support")
    Cc: # v3.9

    Miklos Szeredi
     
  • sparse warns:

    fs/ceph/caps.c:2042:9: warning: context imbalance in 'try_flush_caps' - wrong count at exit

    We need to exit this function with the lock unlocked, but a couple of
    cases leave it locked.
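One common way to keep lock and unlock balanced is to route every path through a single exit label. The sketch below shows that pattern only; it is not the ceph change itself, and `struct toy_lock` and `try_flush()` are hypothetical names invented for the illustration.

```c
#include <stdbool.h>

/* Hypothetical toy lock that counts acquisitions so imbalance is visible. */
struct toy_lock {
    int held;
};

static void toy_lock_acquire(struct toy_lock *l) { l->held++; }
static void toy_lock_release(struct toy_lock *l) { l->held--; }

/*
 * Returns true if there was something to flush. Every path taken after
 * the lock is acquired leaves through "out", so the lock is released
 * exactly once regardless of which early exit was hit.
 */
static bool try_flush(struct toy_lock *l, bool has_dirty, bool session_ok)
{
    bool flushed = false;

    toy_lock_acquire(l);
    if (!has_dirty)
        goto out;       /* nothing to do, but still unlock */
    if (!session_ok)
        goto out;       /* cannot flush now, but still unlock */
    flushed = true;     /* the real work would happen here */
out:
    toy_lock_release(l);
    return flushed;
}
```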

    Cc: stable@vger.kernel.org
    Signed-off-by: Jeff Layton
    Reviewed-by: "Yan, Zheng"
    Reviewed-by: Ilya Dryomov
    Signed-off-by: Ilya Dryomov

    Jeff Layton
     

24 Oct, 2017

4 commits

  • With index=on, ovl_indexdir_cleanup() tries to clean up invalid index
    entries (e.g. bad index name). This behavior could result in cleaning of
    entries created by newer kernels and is therefore undesirable.
    Instead, abort mount if such entries are encountered. We still clean up
    'stale' entries and 'orphan' entries, since both of those cases can be a
    result of offline changes to lower and upper dirs.

    When encountering an index entry of type directory or whiteout, the
    kernel was supposed to fall back to a read-only mount, but the
    fill_super() operation returns EROFS in this case instead of returning
    success with the read-only mount flag, so mount fails when encountering
    directory or whiteout index entries. Bless this behavior by returning
    -EINVAL on directory and whiteout index entries as we do for all
    unsupported index entries.

    Fixes: 61b674710cd9 ("ovl: do not cleanup directory and whiteout index..")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein

    Amir Goldstein
     
  • Treat ENOENT from index entry lookup the same way as treating a returned
    negative dentry. Apparently, either could be returned if file is not
    found, depending on the underlying file system.

    Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein

    Amir Goldstein
     
  • Commit fbaf94ee3cd5 ("ovl: don't set origin on broken lower hardlink")
    attempts to avoid the condition of non-indexed upper inode with lower
    hardlink as origin. If this condition is found, lookup returns EIO.

    The protection added by the commit mentioned above does not cover the
    case of a lower that is not a hardlink when it is copied up (with either
    index=off/on) and is then hardlinked while the overlay is offline.

    Changes to lower layer while overlayfs is offline should not result in
    unexpected behavior, so a permanent EIO error after creating a link in
    lower layer should not be considered as correct behavior.

    This fix replaces EIO error with success in cases where upper has origin
    but no index is found, or index is found that does not match upper
    inode. In those cases, lookup will not fail and the returned overlay inode
    will be hashed by upper inode instead of by lower origin inode.

    Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
    Cc: # v4.13
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     
  • Apparently our current rwsem code doesn't like doing the trylock, then
    lock for real scheme. So change our read/write methods to just do the
    trylock for the RWF_NOWAIT case. This fixes a ~25% regression in
    AIM7.
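The control flow described above can be sketched in user space: try the lock only when the caller asked not to wait, and otherwise take the blocking lock directly instead of trying first. This is an illustration under stated assumptions: `struct toy_rwlock` and its helpers are a hypothetical spin-based stand-in for an rwsem, and returning -EAGAIN for the no-wait failure is assumed here, following the usual RWF_NOWAIT convention.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reader/writer lock: state > 0 counts readers, -1 means a writer. */
struct toy_rwlock {
    atomic_int state;
};

static bool toy_read_trylock(struct toy_rwlock *l)
{
    int s = atomic_load(&l->state);

    while (s >= 0) {
        /* On failure, s is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
            return true;
    }
    return false;   /* a writer holds the lock */
}

static void toy_read_lock(struct toy_rwlock *l)
{
    while (!toy_read_trylock(l))
        ;           /* busy-wait; a real rwsem would sleep */
}

static void toy_read_unlock(struct toy_rwlock *l)
{
    atomic_fetch_sub(&l->state, 1);
}

static bool toy_write_trylock(struct toy_rwlock *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void toy_write_unlock(struct toy_rwlock *l)
{
    atomic_store(&l->state, 0);
}

/* Trylock only in the no-wait case; otherwise block straight away. */
static int lock_for_read(struct toy_rwlock *l, bool nowait)
{
    if (nowait) {
        if (!toy_read_trylock(l))
            return -EAGAIN;
    } else {
        toy_read_lock(l);
    }
    return 0;
}
```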

    Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads")
    Reported-by: kernel test robot
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

22 Oct, 2017

1 commit


20 Oct, 2017

2 commits

  • …jmorris/linux-security

    Pull key handling fixes from James Morris:
    "This includes a fix for the capabilities code from Colin King, and a
    set of further fixes for the keys subsystem. From David:

    - Fix a bunch of places where kernel drivers may access revoked
    user-type keys and don't do it correctly.

    - Fix some ecryptfs bits.

    - Fix big_key to require CONFIG_CRYPTO.

    - Fix a couple of bugs in the asymmetric key type.

    - Fix a race between updating and finding negative keys.

    - Prevent add_key() from updating uninstantiated keys.

    - Make loading of key flags and expiry time atomic when not holding
    locks"

    * 'fixes-v4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    commoncap: move assignment of fs_ns to avoid null pointer dereference
    pkcs7: Prevent NULL pointer dereference, since sinfo is not always set.
    KEYS: load key flags and expiry time atomically in proc_keys_show()
    KEYS: Load key expiry time atomically in keyring_search_iterator()
    KEYS: load key flags and expiry time atomically in key_validate()
    KEYS: don't let add_key() update an uninstantiated key
    KEYS: Fix race between updating and finding a negative key
    KEYS: checking the input id parameters before finding asymmetric key
    KEYS: Fix the wrong index when checking the existence of second id
    security/keys: BIG_KEY requires CONFIG_CRYPTO
    ecryptfs: fix dereference of NULL user_key_payload
    fscrypt: fix dereference of NULL user_key_payload
    lib/digsig: fix dereference of NULL user_key_payload
    FS-Cache: fix dereference of NULL user_key_payload
    KEYS: encrypted: fix dereference of NULL user_key_payload

    Linus Torvalds
     
  • This introduces a "register private expedited" membarrier command which
    allows eventual removal of important memory barrier constraints on the
    scheduler fast-paths. It changes how the "private expedited" membarrier
    command (new to 4.14) is used from user-space.

    This new command allows processes to register their intent to use the
    private expedited command. This affects how the expedited private
    command introduced in 4.14-rc is meant to be used, and should be merged
    before 4.14 final.

    Processes are now required to register before using
    MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.

    This fixes a problem that arose when designing requested extensions to
    sys_membarrier() to allow JITs to efficiently flush old code from
    instruction caches. Several potential algorithms are much less painful
    if the user registers intent to use this functionality early on, for
    example, before the process spawns the second thread. Registering at
    this time removes the need to interrupt each and every thread in that
    process at the first expedited sys_membarrier() system call.
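From user space, the register-then-use sequence might look like the sketch below. It assumes kernel headers that define MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED and goes through syscall(2) on the assumption that no libc wrapper is available. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (flags argument is 0). */
static int membarrier_call(int cmd)
{
    return (int)syscall(__NR_membarrier, cmd, 0);
}

/*
 * Register the calling process's intent to use the private expedited
 * command. Returns 0 on success or a negative errno value.
 */
static int register_private_expedited(void)
{
    if (membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED) < 0)
        return -errno;
    return 0;
}

/*
 * Issue a private expedited barrier. Without prior registration the
 * kernel is expected to fail this with EPERM, as described above.
 * Returns 0 on success or a negative errno value.
 */
static int private_expedited_barrier(void)
{
    if (membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED) < 0)
        return -errno;
    return 0;
}
```

A process would call `register_private_expedited()` once, ideally before creating its second thread, and `private_expedited_barrier()` afterwards wherever the barrier is needed.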

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     

19 Oct, 2017

9 commits


17 Oct, 2017

6 commits

  • Currently we try to defer completion of async DIO to the process context
    in case there are any mapped pages associated with the inode so that we
    can invalidate the pages when the IO completes. However the check is racy
    and the pages can be mapped afterwards. If this happens we might end up
    calling invalidate_inode_pages2_range() in dio_complete() in interrupt
    context which could sleep. This can be reproduced by generic/451.

    Fix this by telling dio_complete() whether or not it may invalidate
    the pages. Thanks to Eryu Guan for reporting this and to Jan Kara
    for suggesting a fix.

    Fixes: 332391a9935d ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO")
    Reported-by: Eryu Guan
    Reviewed-by: Jan Kara
    Tested-by: Eryu Guan
    Signed-off-by: Lukas Czerner
    Signed-off-by: Jens Axboe

    Lukas Czerner
     
  • The mount i_version flag is not enabled in the new sb_flags. This patch
    adds the missing SB_I_VERSION flag.

    Fixes: e462ec5 "VFS: Differentiate mount flags (MS_*) from internal
    superblock flags"
    Signed-off-by: Mimi Zohar
    Signed-off-by: Al Viro

    Mimi Zohar
     
  • The last cleanup introduced two harmless warnings:

    fs/xfs/xfs_fsmap.c:480:1: warning: '__xfs_getfsmap_rtdev' defined but not used
    fs/xfs/xfs_fsmap.c:372:1: warning: 'xfs_getfsmap_rtdev_rtbitmap_helper' defined but not used

    This moves those two functions as well.

    Fixes: bb9c2e543325 ("xfs: move more RT specific code under CONFIG_XFS_RT")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Brian Foster
    Acked-by: Geert Uytterhoeven
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Arnd Bergmann
     
  • The writeback rework in commit fbcc02561359 ("xfs: Introduce
    writeback context for writepages") introduced a subtle change in
    behavior with regard to the block mapping used across the
    ->writepages() sequence. The previous xfs_cluster_write() code would
    only flush pages up to EOF at the time of the writepage, thus
    ensuring that any pages due to file-extending writes would be
    handled on a separate cycle and with a new, updated block mapping.

    The updated code establishes a block mapping in xfs_writepage_map()
    that could extend beyond EOF if the file has post-eof preallocation.
    Because we now use the generic writeback infrastructure and pass the
    cached mapping to each writepage call, there is no implicit EOF
    limit in place. If eofblocks trimming occurs during ->writepages(),
    any post-eof portion of the cached mapping becomes invalid. The
    eofblocks code has no means to serialize against writeback because
    there are no pages associated with post-eof blocks. Therefore if an
    eofblocks trim occurs and is followed by a file-extending buffered
    write, not only has the mapping become invalid, but we could end up
    writing a page to disk based on the invalid mapping.

    Consider the following sequence of events:

    - A buffered write creates a delalloc extent and post-eof
    speculative preallocation.
    - Writeback starts and on the first writepage cycle, the delalloc
    extent is converted to real blocks (including the post-eof blocks)
    and the mapping is cached.
    - The file is closed and xfs_release() trims post-eof blocks. The
    cached writeback mapping is now invalid.
    - Another buffered write appends the file with a delalloc extent.
    - The concurrent writeback cycle picks up the just written page
    because the writeback range end is LLONG_MAX. xfs_writepage_map()
    attributes it to the (now invalid) cached mapping and writes the
    data to an incorrect location on disk (and where the file offset is
    still backed by a delalloc extent).

    This problem is reproduced by xfstests test generic/464, which
    triggers racing writes, appends, open/closes and writeback requests.

    To address this problem, trim the mapping used during writeback to
    within EOF when the mapping is validated. This ensures the mapping
    is revalidated for any pages encountered beyond EOF as of the time
    the current mapping was cached or last validated.
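The trimming step can be illustrated with simple range arithmetic. This is a simplified sketch working in bytes rather than filesystem blocks. `struct toy_mapping`, `trim_mapping_to_eof()` and `mapping_covers()` are hypothetical names, not XFS functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached mapping covering file bytes [offset, offset + len). */
struct toy_mapping {
    uint64_t offset;
    uint64_t len;
};

/*
 * Trim the mapping so that it never extends past eof. A mapping that
 * starts at or beyond eof becomes empty.
 */
static void trim_mapping_to_eof(struct toy_mapping *m, uint64_t eof)
{
    if (m->offset >= eof) {
        m->len = 0;
        return;
    }
    if (m->len > eof - m->offset)
        m->len = eof - m->offset;
}

/*
 * A page may reuse the cached mapping only if its offset falls inside
 * it; otherwise the caller has to look up a fresh mapping.
 */
static bool mapping_covers(const struct toy_mapping *m, uint64_t page_off)
{
    return page_off >= m->offset && page_off - m->offset < m->len;
}
```

After trimming, a page appended beyond the old EOF no longer matches the cached mapping, which forces a fresh lookup for it.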

    Reported-by: Eryu Guan
    Diagnosed-by: Eryu Guan
    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • Commit 332391a9935d ("fs: Fix page cache inconsistency when mixing
    buffered and AIO DIO") moved page cache invalidation from
    iomap_dio_rw() to iomap_dio_complete() for iomap based direct write
    path, but before the dio->end_io() call, and it re-introduced the bug
    fixed by commit c771c14baa33 ("iomap: invalidate page caches should
    be after iomap_dio_complete() in direct write").

    I found this because fstests generic/418 started failing on XFS with
    v4.14-rc3 kernel, which is the regression test for this specific
    bug.

    So similarly, fix it by moving dio->end_io() (which does the
    unwritten extent conversion) before page cache invalidation, to make
    sure the next buffered read sees the final real allocations, not
    unwritten extents. I also add some comments about why end_io() should
    go first in case we get it wrong again in the future.

    Note that there's no such problem in the non-iomap based direct
    write path, because we didn't remove the page cache invalidation
    after the ->direct_IO() call in generic_file_direct_write(), but I
    decided to fix dio_complete() too so we don't leave a landmine
    there, and to be consistent with iomap_dio_complete().

    Fixes: 332391a9935d ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO")
    Signed-off-by: Eryu Guan
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Jan Kara
    Reviewed-by: Lukas Czerner

    Eryu Guan
     
  • Recently we've had warnings arise from the vm handing us pages
    without bufferheads attached to them. This should not ever occur
    in XFS, but we don't defend against it properly if it does. The only
    place where we remove bufferheads from a page is in
    xfs_vm_releasepage(), but we can't tell the difference here between
    "page is dirty so don't release" and "page is dirty but is being
    invalidated so release it".

    Some places that invalidate pages ask for the pages to be released
    and, after calling ->releasepage, check whether the page was dirty
    and then abort the invalidation. This is a possible vector for
    releasing buffers from a page but then leaving it in the mapping, so
    we really do need to avoid dirty pages in xfs_vm_releasepage().

    To differentiate between invalidated pages and normal pages, we need
    to clear the page dirty flag when invalidating the pages. This can
    be done through xfs_vm_invalidatepage(), and will result in
    xfs_vm_releasepage() seeing the page as clean, which matches the
    bufferhead state on the page after calling block_invalidatepage().

    Hence we can re-add the page dirty check in xfs_vm_releasepage to
    catch the case where we might be releasing a page that is actually
    dirty and so should not have the bufferheads on it removed. This
    will remove one possible vector of "dirty page with no bufferheads"
    and so help narrow down the search for the root cause of that
    problem.

    Signed-Off-By: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

14 Oct, 2017

2 commits

  • inode->i_private is assigned a Node pointer only after registering a
    new binary format, so it could be NULL if the inode was created by
    bm_fill_super() (or iput() was called by the error path in
    bm_register_write()), and this could result in a NULL pointer
    dereference when evicting such an inode. For example, mounting the
    binfmt_misc filesystem and then unmounting it immediately:

    mount -t binfmt_misc binfmt_misc /proc/sys/fs/binfmt_misc
    umount /proc/sys/fs/binfmt_misc

    will result in

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000013
    IP: bm_evict_inode+0x16/0x40 [binfmt_misc]
    ...
    Call Trace:
    evict+0xd3/0x1a0
    iput+0x17d/0x1d0
    dentry_unlink_inode+0xb9/0xf0
    __dentry_kill+0xc7/0x170
    shrink_dentry_list+0x122/0x280
    shrink_dcache_parent+0x39/0x90
    do_one_tree+0x12/0x40
    shrink_dcache_for_umount+0x2d/0x90
    generic_shutdown_super+0x1f/0x120
    kill_litter_super+0x29/0x40
    deactivate_locked_super+0x43/0x70
    deactivate_super+0x45/0x60
    cleanup_mnt+0x3f/0x70
    __cleanup_mnt+0x12/0x20
    task_work_run+0x86/0xa0
    exit_to_usermode_loop+0x6d/0x99
    syscall_return_slowpath+0xba/0xf0
    entry_SYSCALL_64_fastpath+0xa3/0xa

    Fix it by making sure Node (e) is not NULL.
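The shape of such a guard can be sketched with toy types. The sketch only shows the NULL check before the dereference: `struct toy_node`, `struct toy_inode` and `toy_evict_inode()` are hypothetical stand-ins and do not mirror the binfmt_misc structures.

```c
#include <stddef.h>

/* Hypothetical per-format state that is attached only after registration. */
struct toy_node {
    int interp_open;    /* nonzero while an interpreter file is held open */
};

/* Hypothetical inode whose private pointer may never have been set. */
struct toy_inode {
    void *i_private;
};

/*
 * Evict an inode. Returns 1 if attached state was cleaned up and 0 if
 * nothing was attached. The pointer is checked before any field access,
 * so an inode that never had a node attached is handled safely.
 */
static int toy_evict_inode(struct toy_inode *inode)
{
    struct toy_node *e = inode->i_private;

    if (!e)
        return 0;           /* nothing was ever attached */

    e->interp_open = 0;     /* stands in for closing the interpreter file */
    inode->i_private = NULL;
    return 1;
}
```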

    Link: http://lkml.kernel.org/r/20171010100642.31786-1-eguan@redhat.com
    Fixes: 83f918274e4b ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()")
    Signed-off-by: Eryu Guan
    Acked-by: Oleg Nesterov
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eryu Guan
     
  • When using FAT on a block device which supports rw_page, we can hit
    BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we
    call clean_buffers() after unlocking the page we've written. Introduce
    a new clean_page_buffers() which cleans all buffers associated with a
    page and call it from within bdev_write_page().

    [akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew]
    Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.org
    Signed-off-by: Matthew Wilcox
    Reported-by: Toshi Kani
    Reported-by: OGAWA Hirofumi
    Tested-by: Toshi Kani
    Acked-by: Johannes Thumshirn
    Cc: Ross Zwisler
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

13 Oct, 2017

3 commits

  • Pull xfs fixes from Darrick Wong:

    - Fix a stale kernel memory exposure when logging inodes.

    - Fix some build problems with CONFIG_XFS_RT=n

    - Don't change inode mode if the acl write fails, leaving the file
    totally inaccessible.

    - Fix a dangling pointer problem when removing an attr fork under
    memory pressure.

    - Don't crash while trying to invalidate a null buffer associated with
    a corrupt metadata pointer.

    * tag 'xfs-4.14-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    xfs: handle error if xfs_btree_get_bufs fails
    xfs: reinit btree pointer on attr tree inactivation walk
    xfs: Fix bool initialization/comparison
    xfs: don't change inode mode if ACL update fails
    xfs: move more RT specific code under CONFIG_XFS_RT
    xfs: Don't log uninitialised fields in inode structures

    Linus Torvalds
     
  • Pull quota fix from Jan Kara:
    "A fix for a regression in handling of quota grace times and warnings"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    quota: Generate warnings for DQUOT_SPACE_NOFAIL allocations

    Linus Torvalds
     
  • In eCryptfs, we failed to verify that the authentication token keys are
    not revoked before dereferencing their payloads, which is problematic
    because the payload of a revoked key is NULL. request_key() *does* skip
    revoked keys, but there is still a window where the key can be revoked
    before we acquire the key semaphore.

    Fix it by updating ecryptfs_get_key_payload_data() to return
    -EKEYREVOKED if the key payload is NULL. For completeness we check this
    for "encrypted" keys as well as "user" keys, although encrypted keys
    cannot be revoked currently.

    Alternatively we could use key_validate(), but since we'll also need to
    fix ecryptfs_get_key_payload_data() to validate the payload length, it
    seems appropriate to just check the payload pointer.
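The payload check can be sketched as follows. This is a user-space illustration, not the eCryptfs helper: `struct toy_key` and `toy_get_key_payload()` are hypothetical, and the sketch relies on the Linux-specific EKEYREVOKED value from <errno.h>.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical key whose payload becomes NULL once the key is revoked. */
struct toy_key {
    const void *payload;
};

/*
 * Fetch the payload of a key. Returns 0 and stores the payload in *out
 * on success, or -EKEYREVOKED if the payload is NULL, so callers never
 * dereference the payload of a revoked key.
 */
static int toy_get_key_payload(const struct toy_key *key, const void **out)
{
    if (!key->payload)
        return -EKEYREVOKED;

    *out = key->payload;
    return 0;
}
```

Callers check the return value before touching `*out`, which is left unmodified on failure.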

    Fixes: 237fead61998 ("[PATCH] ecryptfs: fs/Makefile and fs/Kconfig")
    Reviewed-by: James Morris
    Cc: [v2.6.19+]
    Cc: Michael Halcrow
    Signed-off-by: Eric Biggers
    Signed-off-by: David Howells

    Eric Biggers