04 Sep, 2016

3 commits

  • Pull btrfs fixes from Chris Mason:
    "I'm still prepping a set of fixes for btrfs fsync, just nailing down a
    hard to trigger memory corruption. For now, these are tested and ready."

    * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
    Btrfs: fix endless loop in balancing block groups
    Btrfs: kill invalid ASSERT() in process_all_refs()

    Linus Torvalds
     
  • Pull driver core fixes from Greg KH:
    "Here are three small fixes for 4.8-rc5.

    One for sysfs, one for kernfs, and one documentation fix, all for
    reported issues. All of these have been in linux-next for a while"

    * tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    sysfs: correctly handle read offset on PREALLOC attrs
    documentation: drivers/core/of: fix name of of_node symlink
    kernfs: don't depend on d_find_any_alias() when generating notifications

    Linus Torvalds
     
  • In commit 8ead9dd54716 ("devpts: more pty driver interface cleanups") I
    made devpts_get_priv() just return the dentry->fs_data directly. And
    because I thought it wouldn't happen, I added a warning if you ever saw
    a pts node that wasn't on devpts.

    And no, that warning never triggered under any actual real use, but you
    can trigger it by creating nonsensical pts nodes by hand.

    So just revert the warning, and make devpts_get_priv() return NULL for
    that case like it used to.

    Reported-by: Dmitry Vyukov
    Cc: stable@vger.kernel.org # 4.6+
    Cc: Eric W Biederman
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 Sep, 2016

1 commit

  • Pull overlayfs fixes from Miklos Szeredi:
    "Most of this is regression fixes for posix acl behavior introduced in
    4.8-rc1 (these were caught by the pjd-fstest suite). There are also
    miscellaneous fixes marked as stable material and cleanups.

    Other than overlayfs code, it touches to add a constant
    with which to disable posix acl caching. No changes needed to the
    actual caching code, it automatically does the right thing, although
    later we may want to optimize this case.

    I'm now testing overlayfs with the following test suites to catch
    regressions:

    - unionmount-testsuite
    - xfstests
    - pjd-fstest"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: update doc
    ovl: listxattr: use strnlen()
    ovl: Switch to generic_getxattr
    ovl: copyattr after setting POSIX ACL
    ovl: Switch to generic_removexattr
    ovl: Get rid of ovl_xattr_noacl_handlers array
    ovl: Fix OVL_XATTR_PREFIX
    ovl: fix spelling mistake: "directries" -> "directories"
    ovl: don't cache acl on overlay layer
    ovl: use cached acl on underlying layer
    ovl: proper cleanup of workdir
    ovl: remove posix_acl_default from workdir
    ovl: handle umask and posix_acl_default correctly on creation
    ovl: don't copy up opaqueness

    Linus Torvalds
     

02 Sep, 2016

2 commits

  • Pull audit fixes from Paul Moore:
    "Two small patches to fix some bugs with the audit-by-executable
    functionality we introduced back in v4.3 (both patches are marked
    for the stable folks)"

    * 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
    audit: fix exe_file access in audit_exe_compare
    mm: introduce get_task_exe_file

    Linus Torvalds
     
  • Pull xfs and iomap fixes from Dave Chinner:
    "Most of these changes are small regression fixes that address problems
    introduced in the 4.8-rc1 window. The two fixes that aren't (IO
    completion fix and superblock inprogress check) are fixes for problems
    introduced some time ago and need to be pushed back to stable kernels.

    Changes in this update:
    - iomap FIEMAP_EXTENT_MERGED usage fix
    - additional mount-time feature restrictions
    - rmap btree query fixes
    - freeze/unmount io completion workqueue fix
    - memory corruption fix for deferred operations handling"

    * tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
    xfs: track log done items directly in the deferred pending work item
    iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystems
    xfs: prevent dropping ioend completions during buftarg wait
    xfs: fix superblock inprogress check
    xfs: simple btree query range should look right if LE lookup fails
    xfs: fix some key handling problems in _btree_simple_query_range
    xfs: don't log the entire end of the AGF
    xfs: disallow mounting of realtime + rmap filesystems
    xfs: don't perform lookups on zero-height btrees

    Linus Torvalds
     

01 Sep, 2016

17 commits

  • If can_overcommit() in btrfs_calc_reclaim_metadata_size() returns true,
    btrfs_async_reclaim_metadata_space() will not reclaim metadata space; it
    just returns directly and also forgets to wake up the processes that are
    waiting for their tickets, so these processes will wait endlessly.

    Fstests case generic/172 with mount option "-o compress=lzo" revealed
    this bug on my test machine. Here if we have tickets to handle, we must
    handle them first.

    Signed-off-by: Wang Xiaoguang
    Reviewed-by: Josef Bacik
    Signed-off-by: David Sterba

    Wang Xiaoguang
     
  • Qgroup function may overwrite the saved error 'err' with 0
    in case quota is not enabled, and this ends up with an
    endless loop in balance because we keep going back to balance
    the same block group.

    It really should use 'ret' instead.

    Signed-off-by: Liu Bo
    Reviewed-by: Qu Wenruo
    Signed-off-by: David Sterba

    Liu Bo
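The error-overwrite pattern described in the commit above can be sketched in plain C. This is a minimal userspace illustration, not the btrfs code: `qgroup_fixup_sketch()`, `relocate_buggy_sketch()` and `relocate_fixed_sketch()` are hypothetical names, and the helper is assumed to return 0 when quota is disabled.

```c
/* Hypothetical stand-in for a helper that returns 0 when quota is disabled. */
static int qgroup_fixup_sketch(void)
{
    return 0;
}

/* Buggy shape: the helper's result lands in 'err' and erases the saved failure. */
int relocate_buggy_sketch(int earlier_error)
{
    int err = earlier_error;

    err = qgroup_fixup_sketch();    /* saved error is overwritten with 0 */
    return err;
}

/* Fixed shape: the helper's result goes to 'ret'; 'err' keeps the first failure. */
int relocate_fixed_sketch(int earlier_error)
{
    int err = earlier_error;
    int ret;

    ret = qgroup_fixup_sketch();
    if (ret < 0 && !err)
        err = ret;
    return err;
}
```

With an earlier failure passed in, the buggy variant reports success, so a caller that retries on "success with work left" would keep coming back to the same block group. The fixed variant still reports the original error.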
     
  • Suppose you have the following tree in snap1 on a file system mounted with -o
    inode_cache so that inode numbers are recycled

    └── [ 258] a
        └── [ 257] b

    and then you remove b, rename a to c, and then re-create b in c so you have the
    following tree

    └── [ 258] c
        └── [ 257] b

    and then you try to do an incremental send you will hit

    ASSERT(pending_move == 0);

    in process_all_refs(). This is because we assume that any recycling of inodes
    will not have a pending change in our path, which isn't the case. This is the
    case for the DELETE side, since we want to remove the old file using the old
    path, but on the create side we could have a pending move and need to do the
    normal pending rename dance. So remove this ASSERT() and put a comment about
    why we ignore pending_move. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba

    Josef Bacik
     
  • Be defensive about what underlying fs provides us in the returned xattr
    list buffer. If it's not properly null terminated, bail out with a warning
    instead of BUG.

    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
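A defensive walk over a listxattr-style buffer, as the commit above describes, might look roughly like the following userspace sketch. The function name `count_xattr_names_sketch()` and the -1 return value are assumptions made for illustration; the kernel patch warns and bails out in its own way.

```c
#define _POSIX_C_SOURCE 200809L     /* for strnlen() */
#include <stddef.h>
#include <string.h>

/*
 * Count the names in a buffer of back-to-back NUL-terminated strings.
 * Returns -1 if the last name is not terminated inside the buffer,
 * instead of reading past the end.
 */
long count_xattr_names_sketch(const char *list, size_t size)
{
    long count = 0;

    while (size > 0) {
        size_t len = strnlen(list, size);   /* never scans beyond 'size' bytes */

        if (len == size)
            return -1;                      /* no terminator found: bail out */
        list += len + 1;
        size -= len + 1;
        count++;
    }
    return count;
}
```

The point of strnlen() over strlen() is that the scan is bounded by the remaining buffer size, so a missing terminator is detected rather than followed into unrelated memory.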
     
  • Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use
    those same handlers for iop->getxattr as well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
     
  • Setting a POSIX ACL may also modify the file mode, so we need to copy that
    up to the overlay inode.

    Reported-by: Eryu Guan
    Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Commit d837a49bd57f ("ovl: fix POSIX ACL setting") switches
    iop->setxattr from ovl_setxattr to generic_setxattr, so switch from
    ovl_removexattr to generic_removexattr as well. As far as permission
    checking goes, the same rules should apply in either case.

    While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that
    this is not an iop->setxattr implementation and remove the unused inode
    argument.

    Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the
    order of handlers in ovl_xattr_handlers.

    Signed-off-by: Andreas Gruenbacher
    Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
     
  • Use an ordinary #ifdef to conditionally include the POSIX ACL handlers
    in ovl_xattr_handlers, like the other filesystems do. Flag the code
    that is now only used conditionally with __maybe_unused.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
     
  • Make sure ovl_own_xattr_handler only matches attribute names starting
    with "overlay.", not "overlayXXX".

    Signed-off-by: Andreas Gruenbacher
    Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
    Signed-off-by: Miklos Szeredi

    Andreas Gruenbacher
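The prefix issue in the commit above comes down to whether the trailing dot is part of the compared prefix. A minimal sketch, assuming a hypothetical `OWN_PREFIX_SKETCH` macro and matcher function (neither is the overlayfs definition):

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative prefix; the trailing dot is deliberately part of it. */
#define OWN_PREFIX_SKETCH "overlay."

bool matches_own_prefix_sketch(const char *name)
{
    /* sizeof counts the NUL terminator, so subtract 1 to compare through the dot. */
    return strncmp(name, OWN_PREFIX_SKETCH, sizeof(OWN_PREFIX_SKETCH) - 1) == 0;
}
```

With the dot included, "overlay.opaque" matches while "overlayXXX" and a bare "overlay" do not.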
     
  • Trivial fix to spelling mistake in pr_err message.

    Signed-off-by: Colin Ian King
    Signed-off-by: Miklos Szeredi

    Colin Ian King
     
  • Some operations (setxattr/chmod) can make the cached acl stale. We either
    need to clear overlay's acl cache for the affected inode or prevent acl
    caching on the overlay altogether. Preventing caching has the following
    advantages:

    - no double caching, less memory used

    - overlay cache doesn't go stale when fs clears its own cache

    Possible disadvantage is performance loss. If that becomes a problem
    get_acl() can be optimized for overlayfs.

    This patch disables caching by presetting i_*acl to a value that

    - has bit 0 set, so is_uncached_acl() will return true

    - is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it

    The constant -3 was chosen for this purpose.

    Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes")
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
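The two properties required of the sentinel in the commit above can be checked with a small userspace model. The `_SKETCH` names are illustrative, and the value -1 for the "not cached" marker is an assumption of this sketch; only the constant -3 and the two conditions come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the cache slot values; names are illustrative only. */
#define NOT_CACHED_SKETCH  ((intptr_t)-1)   /* assumed "nothing cached yet" marker */
#define DONT_CACHE_SKETCH  ((intptr_t)-3)   /* the constant chosen by the patch */

/* Bit 0 set means the slot does not hold a real, aligned pointer. */
bool is_uncached_sketch(intptr_t v)
{
    return (v & 1) != 0;
}

/* A get_acl()-style lookup is modelled as filling the slot only from NOT_CACHED. */
bool may_fill_cache_sketch(intptr_t v)
{
    return v == NOT_CACHED_SKETCH;
}
```

In this model -3 is reported as uncached, so lookups go to the underlying filesystem every time, yet it is never replaced by a cached value because it differs from the "not cached" marker.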
     
  • Instead of calling ->get_acl() directly, use get_acl() to get the cached
    value.

    We will have the acl cached on the underlying inode anyway, because we do
    permission checking on both the overlay and the underlying fs.

    So, since we already have double caching, this improves performance without
    any cost.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • When mounting overlayfs it needs a clean "work" directory under the
    supplied workdir.

    Previously the mount code removed this directory if it already existed and
    created a new one. If the removal failed (e.g. directory was not empty)
    then it fell back to a read-only mount not using the workdir.

    While this has never been reported, it is possible to get a non-empty
    "work" dir from a previous mount of overlayfs in case of crash in the
    middle of an operation using the work directory.

    In this case the left over state should be discarded and the overlay
    filesystem will be consistent, guaranteed by the atomicity of operations on
    moving to/from the workdir to the upper layer.

    This patch implements cleaning out any files left in workdir. It is
    implemented using real recursion for simplicity, but the depth is limited
    to 2, because the worst case is that of a directory containing whiteouts
    under "work".

    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
     
  • Clear out posix acl xattrs on workdir and also reset the mode after
    creation so that an inherited sgid bit is cleared.

    Signed-off-by: Miklos Szeredi
    Cc:

    Miklos Szeredi
     
  • Setting MS_POSIXACL in sb->s_flags has the side effect of passing mode to
    create functions without masking against umask.

    Another problem when creating over a whiteout is that the default posix acl
    is not inherited from the parent dir (because the real parent dir at the
    time of creation is the work directory).

    Fix these problems by:

    a) If upper fs does not have MS_POSIXACL, then mask mode with umask.

    b) If creating over a whiteout, call posix_acl_create() to get the
    inherited acls. After creation (but before moving to the final
    destination) set these acls on the created file. posix_acl_create() also
    updates the file creation mode as appropriate.

    Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes")
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
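Step (a) in the commit above is ordinary umask arithmetic applied conditionally. A minimal sketch, assuming a hypothetical helper `creation_mode_sketch()` and a boolean standing in for the MS_POSIXACL check on the upper filesystem:

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Return the mode to create with. When the upper filesystem does not
 * handle POSIX ACLs itself, clear the bits that the umask forbids.
 */
mode_t creation_mode_sketch(mode_t mode, mode_t mask, bool upper_has_posixacl)
{
    if (!upper_has_posixacl)
        mode &= ~mask;
    return mode;
}
```

For example, a requested mode of 0666 with a umask of 0022 becomes 0644 when masking applies, and stays 0666 when the upper filesystem is expected to apply default ACLs instead.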
     
  • For more convenient access if one has a pointer to the task.

    As a minor nit take advantage of the fact that only task lock + rcu are
    needed to safely grab ->exe_file. This saves mm refcount dance.

    Use the helper in proc_exe_link.

    Signed-off-by: Mateusz Guzik
    Acked-by: Konstantin Khlebnikov
    Acked-by: Richard Guy Briggs
    Cc: # 4.3.x
    Signed-off-by: Paul Moore

    Mateusz Guzik
     
  • We used to delay switching to the new credentials until after we had
    mapped the executable (and possibly the ELF interpreter). That was kind of
    odd to begin with, since the new executable will actually then _run_
    with the new creds, but whatever.

    The bigger problem was that we also want to make sure that we turn off
    prof events and tracing before we start mapping the new executable
    state. So while this is a cleanup, it's also a fix for a possible
    information leak.

    Reported-by: Robert Święcki
    Tested-by: Peter Zijlstra
    Acked-by: David Howells
    Acked-by: Oleg Nesterov
    Acked-by: Andy Lutomirski
    Acked-by: Eric W. Biederman
    Cc: Willy Tarreau
    Cc: Kees Cook
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

31 Aug, 2016

3 commits

  • Attributes declared with __ATTR_PREALLOC use sysfs_kf_read(), which returns
    zero bytes for a non-zero offset. This breaks the checkarray script in the
    mdadm tool on Debian, where /bin/sh is 'dash', because its builtin 'read'
    reads only one byte at a time. The script gets 'i' instead of 'idle' when it
    reads the current action from /sys/block/$dev/md/sync_action and as a
    result does nothing.

    This patch adds a trivial implementation of partial read: generate the
    whole string and move the required part to the head of the buffer.

    Signed-off-by: Konstantin Khlebnikov
    Fixes: 4ef67a8c95f3 ("sysfs/kernfs: make read requests on pre-alloc files use the buffer.")
    Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950
    Cc: Stable # v3.19+
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
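The partial-read approach from the commit above (generate the whole string, then move the requested part to the head of the buffer) can be sketched in userspace C. `show_sketch()` and `read_at_sketch()` are hypothetical names, and the fixed "idle\n" value only mirrors the example in the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute: writes its whole value into buf, returns the length. */
static size_t show_sketch(char *buf, size_t bufsize)
{
    static const char value[] = "idle\n";
    size_t len = sizeof(value) - 1;

    if (len > bufsize)
        len = bufsize;
    memcpy(buf, value, len);
    return len;
}

/*
 * Read up to 'count' bytes starting at offset 'pos'. The whole value is
 * generated first, then the requested part is moved to the head of buf.
 * Returns the number of bytes now at the start of buf, or 0 at end of data.
 */
size_t read_at_sketch(char *buf, size_t bufsize, size_t count, size_t pos)
{
    size_t len = show_sketch(buf, bufsize);

    if (pos >= len)
        return 0;
    len -= pos;
    if (len > count)
        len = count;
    memmove(buf, buf + pos, len);   /* regions may overlap, hence memmove */
    return len;
}
```

A reader that asks for one byte at a time, as dash's builtin 'read' does, now gets each successive character instead of an empty read after the first byte.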
     
  • kernfs_notify_workfn() sends out file modified events for the
    scheduled kernfs_nodes. Because the modifications aren't from
    userland, it doesn't have the matching file struct at hand and can't
    use fsnotify_modify(). Instead, it looked up the inode and then used
    d_find_any_alias() to find the dentry and used fsnotify_parent() and
    fsnotify() directly to generate notifications.

    The assumption was that the relevant dentries would have been pinned
    if there are listeners, which isn't true as inotify doesn't pin
    dentries at all and watching the parent doesn't pin the child dentries
    even for dnotify. This led to, for example, inotify watchers not
    getting notifications if the system is under memory pressure and the
    matching dentries got reclaimed. It can also be triggered through
    /proc/sys/vm/drop_caches or a remount attempt which involves shrinking
    dcache.

    fsnotify_parent() only uses the dentry to access the parent inode,
    which kernfs can do easily. Update kernfs_notify_workfn() so that it
    uses fsnotify() directly for both the parent and target inodes without
    going through d_find_any_alias(). While at it, supply the target file
    name to fsnotify() from kernfs_node->name.

    Signed-off-by: Tejun Heo
    Reported-by: Evgeny Vereshchagin
    Fixes: d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events too")
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Eric Paris
    Cc: stable@vger.kernel.org # v3.16+
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Stable patches:
    - Fix a refcount leak in nfs_callback_up_net
    - Fix an Oopsable condition when the flexfile pNFS driver connection
    to the DS fails
    - Fix an Oopsable condition in NFSv4.1 server callback races
    - Ensure pNFS clients stop doing I/O to the DS if their lease has
    expired, as required by the NFSv4.1 protocol

    Bugfixes:
    - Fix potential looping in the NFSv4.x migration code
    - Patch series to close callback races for OPEN, LAYOUTGET and
    LAYOUTRETURN
    - Silence WARN_ON when NFSv4.1 over RDMA is in use
    - Fix a LAYOUTCOMMIT race in the pNFS/blocks client
    - Fix pNFS timeout issues when the DS fails"

    * tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4.x: Fix a refcount leak in nfs_callback_up_net
    NFS4: Avoid migration loops
    pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails
    NFSv4.1: Remove obsolete and incorrrect assignment in nfs4_callback_sequence
    NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN
    NFSv4.1: Defer bumping the slot sequence number until we free the slot
    NFSv4.1: Delay callback processing when there are referring triples
    NFSv4.1: Fix Oopsable condition in server callback races
    SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use
    pnfs/blocklayout: update last_write_offset atomically with extents
    pNFS: The client must not do I/O to the DS if it's lease has expired
    pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls
    pNFS/flexfiles: Set reasonable default retrans values for the data channel
    NFS: Allow the mount option retrans=0
    pNFS/flexfiles: Fix layoutstat periodic reporting

    Linus Torvalds
     

30 Aug, 2016

5 commits

  • On error, the callers expect us to return without bumping
    nn->cb_users[].

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org # v3.7+

    Trond Myklebust
     
  • If a server returns itself as a location while migrating, the client may
    end up getting stuck attempting to migrate twice to the same server. Catch
    this by checking if the nfs_client found is the same as the existing
    client. For the other two callers to nfs4_set_client, the nfs_client will
    always be ERR_PTR(-EINVAL).

    Signed-off-by: Benjamin Coddington
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
     
  • Christoph reports slab corruption when a deferred refcount update
    aborts during _defer_finish(). The cause of this was broken log item
    state tracking in xfs_defer_pending -- upon an abort,
    _defer_trans_abort() will call abort_intent on all intent items,
    including the ones that have already had a done item attached.

    This is incorrect because each intent item has 2 refcounts: the first
    is released when the intent item is committed to the log; and the
    second is released when the _done_ item is committed to the log, or
    by the intent creator if there is no done item. In other words, once
    we log the done item, responsibility for releasing the intent item's
    second refcount is transferred to the done item and /must not/ be
    performed by anything else.

    The dfp_committed flag should have been tracking whether or not we had
    a done item so that _defer_trans_abort could decide if it needs to
    abort the intent item, but due to a thinko this was not the case. Rip
    it out and track the done item directly so that we do the right thing
    w.r.t. intent item freeing.

    Signed-off-by: Darrick J. Wong
    Reported-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Pull ext4 fixes from Ted Ts'o:
    "Fix bugs that could cause kernel deadlocks or file system corruption
    while moving xattrs to expand the extended inode.

    Also add some sanity checks to the block group descriptors to make
    sure we don't end up overwriting the superblock"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: avoid deadlock when expanding inode size
    ext4: properly align shifted xattrs when expanding inodes
    ext4: fix xattr shifting when expanding inodes part 2
    ext4: fix xattr shifting when expanding inodes
    ext4: validate that metadata blocks do not overlap superblock
    ext4: reserve xattr index for the Hurd

    Linus Torvalds
     
  • If the attempt to connect to a DS fails inside ff_layout_pg_init_read or
    ff_layout_pg_init_write, then we currently end up clearing the layout
    segment carried by the struct nfs_pageio_descriptor, causing an Oops
    when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist.

    The fix is to ensure we return the layout and then retry.

    Fixes: 446ca2195303 ("pNFS/flexfiles: When initing reads or writes, we...")
    Cc: stable@vger.kernel.org # v4.7+
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

29 Aug, 2016

6 commits


27 Aug, 2016

3 commits

  • Merge fixes from Andrew Morton:
    "11 fixes"

    * emailed patches from Andrew Morton:
    mm: silently skip readahead for DAX inodes
    dax: fix device-dax region base
    fs/seq_file: fix out-of-bounds read
    mm: memcontrol: avoid unused function warning
    mm: clarify COMPACTION Kconfig text
    treewide: replace config_enabled() with IS_ENABLED() (2nd round)
    printk: fix parsing of "brl=" option
    soft_dirty: fix soft_dirty during THP split
    sysctl: handle error writing UINT_MAX to u32 fields
    get_maintainer: quiet noisy implicit -f vcs_file_exists checking
    byteswap: don't use __builtin_bswap*() with sparse

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "We've queued up a few different fixes in here. These range from
    enospc corners to fsync and quota fixes, and a few targeted at error
    handling for corrupt metadata/fuzzing"

    * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix lockdep warning on deadlock against an inode's log mutex
    Btrfs: detect corruption when non-root leaf has zero item
    Btrfs: check btree node's nritems
    btrfs: don't create or leak aliased root while cleaning up orphans
    Btrfs: fix em leak in find_first_block_group
    btrfs: do not background blkdev_put()
    Btrfs: clarify do_chunk_alloc()'s return value
    btrfs: fix fsfreeze hang caused by delayed iputs deal
    btrfs: update btrfs_space_info's bytes_may_use timely
    btrfs: divide btrfs_update_reserved_bytes() into two functions
    btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
    btrfs: qgroup: Fix qgroup incorrectness caused by log replay
    btrfs: relocation: Fix leaking qgroups numbers on data extents
    btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent()
    btrfs: waiting on qgroup rescan should not always be interruptible
    btrfs: properly track when rescan worker is running
    btrfs: flush_space: treat return value of do_chunk_alloc properly
    Btrfs: add ASSERT for block group's memory leak
    btrfs: backref: Fix soft lockup in __merge_refs function
    Btrfs: fix memory leak of reloc_root

    Linus Torvalds
     
  • Pull dlm fix from David Teigland:
    "This fixes a bug introduced by recent debugfs cleanup"

    * tag 'dlm-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
    dlm: fix malfunction of dlm_tool caused by debugfs changes

    Linus Torvalds