06 Jan, 2017

1 commit

  • Pull audit fixes from Paul Moore:
    "Two small fixes relating to audit's use of fsnotify.

    The first patch plugs a leak and the second fixes some lock
    shenanigans. The patches are small and I banged on this for an
    afternoon with our testsuite and didn't see anything odd"

    * 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit:
    audit: Fix sleep in atomic
    fsnotify: Remove fsnotify_duplicate_mark()

    Linus Torvalds
     

05 Jan, 2017

2 commits

  • Pull xfs fixes from Darrick Wong:

    - fixes for crashes and double-cleanup errors

    - XFS maintainership handover

    - fix to prevent absurdly large block reservations

    - fix broken sysfs getter/setters

    * tag 'xfs-for-linus-4.10-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    xfs: fix max_retries _show and _store functions
    xfs: update MAINTAINERS
    xfs: fix crash and data corruption due to removal of busy COW extents
    xfs: use the actual AG length when reserving blocks
    xfs: fix double-cleanup when CUI recovery fails

    Linus Torvalds
     
  • Pull block layer fixes from Jens Axboe:
    "A set of fixes for the current series, one fixing a regression with
    block size < page cache size in the alias series from Jan. Outside of
    that, two small cleanups for wbt from Bart, a nvme pull request from
    Christoph, and a few small fixes or documentation updates"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: fix up io_poll documentation
    block: Avoid that sparse complains about context imbalance in __wbt_wait()
    block: Make wbt_wait() definition consistent with declaration
    clean_bdev_aliases: Prevent cleaning blocks that are not in block range
    genhd: remove dead and duplicated scsi code
    block: add back plugging in __blkdev_direct_IO
    nvmet/fcloop: remove some logically dead code performing redundant ret checks
    nvmet: fix KATO offset in Set Features
    nvme/fc: simplify error handling of nvme_fc_create_hw_io_queues
    nvme/fc: correct some printk information
    nvme/scsi: Remove START STOP emulation
    nvme/pci: Delete misleading queue-wrap comment
    nvme/pci: Fix whitespace problem
    nvme: simplify stripe quirk
    nvme: update maintainers information

    Linus Torvalds
     

04 Jan, 2017

4 commits

  • max_retries _show and _store functions should test against cfg->max_retries,
    not cfg->retry_timeout

    Signed-off-by: Carlos Maiolino
    Reviewed-by: Eric Sandeen
    Signed-off-by: Darrick J. Wong

    Carlos Maiolino
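
A minimal user-space sketch of the getter/setter pairing described above. The struct, the `RETRY_FOREVER` sentinel, and the function names are hypothetical stand-ins, not the XFS definitions; the point is only that each accessor inspects the same field it reports or updates.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical sentinel; the real XFS constant may differ. */
#define RETRY_FOREVER INT_MAX

struct err_cfg {
	int  max_retries;    /* RETRY_FOREVER means "never give up" */
	long retry_timeout;  /* separate knob, must not be consulted below */
};

/* Getter: tests max_retries, the field it actually prints. */
static int cfg_max_retries_show(const struct err_cfg *cfg, char *buf, size_t len)
{
	int retries = (cfg->max_retries == RETRY_FOREVER) ? -1 : cfg->max_retries;

	return snprintf(buf, len, "%d\n", retries);
}

/* Setter: -1 selects "forever"; any other negative value is rejected. */
static int cfg_max_retries_store(struct err_cfg *cfg, int val)
{
	if (val < -1)
		return -1;
	cfg->max_retries = (val == -1) ? RETRY_FOREVER : val;
	return 0;
}
```

With this shape, storing -1 and reading the value back yields "-1" regardless of what `retry_timeout` holds.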
     
  • There is a race window between write_cache_pages calling
    clear_page_dirty_for_io and XFS calling set_page_writeback, in which
    the mapping for an inode is tagged neither as dirty, nor as writeback.

    If the COW shrinker hits in exactly that window we'll remove the delayed
    COW extents and writepages trying to write it back, which in release
    kernels will manifest as corruption of the bmap btree, and in debug
    kernels will trip the ASSERT about now calling xfs_bmapi_write with the
    COWFORK flag for holes. A complex customer load manages to hit this
    window fairly reliably, probably by always having COW writeback in flight
    while the cow shrinker runs.

    This patch adds another check for having the I_DIRTY_PAGES flag set,
    which is still set during this race window. While this fixes the problem
    I'm still not overly happy about the way the COW shrinker works as it
    still seems a bit fragile.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
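
The extra check described above can be pictured as one more condition in the predicate that decides whether the shrinker may drop an inode's COW extents. This is a simplified model: the struct and function names are hypothetical, and the real code tests inode state and mapping tags rather than plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the shrinker looks at. */
struct inode_model {
	bool tagged_dirty;      /* mapping is tagged dirty */
	bool tagged_writeback;  /* mapping is tagged writeback */
	bool dirty_pages;       /* stands in for I_DIRTY_PAGES */
};

/* Returns true only when no writeback can be pending or in flight. */
static bool cow_shrinker_may_trim(const struct inode_model *ip)
{
	if (ip->tagged_dirty || ip->tagged_writeback)
		return false;
	/* Covers the window where neither tag is set yet. */
	if (ip->dirty_pages)
		return false;
	return true;
}
```

In the race window the two tag checks both pass, so only the third condition keeps the extents in place.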
     
  • We need to use the actual AG length when making per-AG reservations,
    since we could otherwise end up reserving more blocks out of the last
    AG than there are actual blocks.

    Complained-about-by: Brian Foster
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
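
To illustrate why the last AG needs special care, here is a small sketch that derives each AG's actual length from the filesystem geometry. The struct, the function names, and the "reserve one eighth" sizing rule are assumptions made for the example, not the XFS implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry description. */
struct fs_geom {
	uint64_t total_blocks;  /* blocks in the whole filesystem */
	uint32_t ag_blocks;     /* nominal blocks per AG */
	uint32_t ag_count;      /* number of AGs */
};

/* Every AG is full-sized except the last, which holds the remainder. */
static uint32_t ag_actual_length(const struct fs_geom *g, uint32_t agno)
{
	assert(agno < g->ag_count);
	if (agno < g->ag_count - 1)
		return g->ag_blocks;
	return (uint32_t)(g->total_blocks - (uint64_t)agno * g->ag_blocks);
}

/* Assumed sizing rule for illustration: reserve 1/8 of the AG. */
static uint32_t ag_reservation(const struct fs_geom *g, uint32_t agno)
{
	return ag_actual_length(g, agno) / 8;
}
```

Sizing the reservation from the nominal `ag_blocks` instead would over-reserve whenever the last AG is short.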
     
  • Dan Carpenter reported a double-free of rcur if _defer_finish fails
    while we're recovering CUI items. Fix the error recovery to prevent
    this.

    Reported-by: Dan Carpenter
    Signed-off-by: Darrick J. Wong

    Darrick J. Wong
     

03 Jan, 2017

2 commits


31 Dec, 2016

1 commit

  • Attempting to link a device node, named pipe, or socket file into an
    encrypted directory through rename(2) or link(2) always failed with
    EPERM. This happened because fscrypt_has_permitted_context() saw that
    the file was unencrypted and forbade creating the link. This behavior
    was unexpected because such files are never encrypted; only regular
    files, directories, and symlinks can be encrypted.

    To fix this, make fscrypt_has_permitted_context() always return true on
    special files.

    This will be covered by a test in my encryption xfstests patchset.

    Fixes: 9bd8212f981e ("ext4 crypto: add encryption policy and password salt support")
    Signed-off-by: Eric Biggers
    Reviewed-by: Richard Weinberger
    Cc: stable@vger.kernel.org
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     

28 Dec, 2016

1 commit

  • Commit f1c131b45410a: "crypto: xts - Convert to skcipher" now fails
    the setkey operation if the AES key is the same as the tweak key.
    Previously this check was only done if FIPS mode is enabled. Now this
    check is also done if weak key checking was requested. This is
    reasonable, but since we were using the dummy key which was a constant
    series of 0x42 bytes, it now caused dummy encryption test mode to
    fail.

    Fix this by using 0x42... and 0x24... for the two keys, so they are
    different.

    Fixes: f1c131b45410a202eb45cc55980a7a9e4e4b4f40
    Cc: stable@vger.kernel.org
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
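
The following sketch shows the idea of the fix: fill the two halves of the dummy XTS key with different byte patterns so the AES key and the tweak key can never be equal. The key size constant and function names are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed size: two 32-byte AES-256 keys back to back. */
#define DUMMY_XTS_KEY_SIZE 64

/* First half gets 0x42 bytes, second half gets 0x24 bytes. */
static void fill_dummy_xts_key(unsigned char *key, size_t size)
{
	size_t half = size / 2;

	memset(key, 0x42, half);
	memset(key + half, 0x24, size - half);
}

/* Mirrors the weak-key rule: the two halves must not be identical. */
static bool xts_halves_differ(const unsigned char *key, size_t size)
{
	size_t half = size / 2;

	return memcmp(key, key + half, half) != 0;
}
```

A key made of 64 identical 0x42 bytes fails `xts_halves_differ()`, which is the situation the old dummy key ran into.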
     

27 Dec, 2016

6 commits

  • Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we
    can use transaction starting in ext4_iomap_begin() and thus simplify
    ext4_dax_fault(). It also provides us proper retries in case of ENOSPC.

    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • Currently ->iomap_begin() handler is called with entry lock held. If the
    filesystem held any locks between ->iomap_begin() and ->iomap_end()
    (such as ext4 which will want to hold transaction open), this would cause
    lock inversion with the iomap_apply() from standard IO path which first
    calls ->iomap_begin() and only then calls ->actor() callback which grabs
    entry locks for DAX (if it faults when copying from/to user provided
    buffers).

    Fix the problem by nesting grabbing of entry lock inside ->iomap_begin()
    - ->iomap_end() pair.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • The only case when we do not finish the page fault completely is when we
    are loading hole pages into a radix tree. Avoid this special case and
    finish the fault in that case as well inside the DAX fault handler. This
    will make iomap handling easier.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • Currently dax_iomap_rw() takes care of invalidating page tables and
    evicting hole pages from the radix tree when write(2) to the file
    happens. This invalidation is only necessary when there is some block
    allocation resulting from write(2). Furthermore, in its current place the
    invalidation is racy with respect to a page fault instantiating a hole
    page just after we have invalidated it.

    So perform the page invalidation inside dax_iomap_actor() where we can
    do it only when really necessary and after blocks have been allocated so
    nobody will be instantiating new hole pages anymore.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • Currently invalidate_inode_pages2_range() and invalidate_mapping_pages()
    just delete all exceptional radix tree entries they find. For DAX this
    is not desirable as we track cache dirtiness in these entries and when
    they are evicted, we may not flush caches although it is necessary. This
    can for example manifest when we write to the same block both via mmap
    and via write(2) (to different offsets) and fsync(2) then does not
    properly flush CPU caches when modification via write(2) was the last
    one.

    Create appropriate DAX functions to handle invalidation of DAX entries
    for invalidate_inode_pages2_range() and invalidate_mapping_pages() and
    wire them up into the corresponding mm functions.

    Acked-by: Johannes Weiner
    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • So far we did not return BH_New buffers from ext2_get_blocks() when we
    allocated and zeroed-out a block for DAX inode to avoid racy zeroing in
    DAX code. This zeroing is gone these days so we can remove the
    workaround.

    Reviewed-by: Ross Zwisler
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     

26 Dec, 2016

3 commits

  • No point in going through loops and hoops instead of just comparing the
    values.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     
  • ktime_set(S,N) was required for the timespec storage type and is still
    useful for situations where a Seconds and Nanoseconds part of a time value
    needs to be converted. For anything where the Seconds argument is 0, this
    is pointless and can be replaced with a simple assignment.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     
    ktime is a union because the initial implementation stored the time in
    scalar nanoseconds on 64-bit machines and in an endianness-optimized
    timespec variant on 32-bit machines. The Y2038 cleanup removed the timespec
    variant and switched everything to scalar nanoseconds. The union remained,
    but became completely pointless.

    Get rid of the union and just keep ktime_t as a simple typedef of type s64.

    The conversion was done with coccinelle and some manual mopping up.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
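
The three ktime commits above share one idea: once the time is a plain signed 64-bit nanosecond count, ordinary assignment and comparison are enough. A user-space sketch, assuming `int64_t` in place of the kernel's s64 and omitting the range clamping that the kernel helper performs:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Plain scalar nanoseconds, no union wrapper. */
typedef int64_t ktime_t;

/* Still useful when a seconds part and a nanoseconds part are combined. */
static ktime_t ktime_set(int64_t secs, unsigned long nsecs)
{
	return secs * NSEC_PER_SEC + (int64_t)nsecs;
}

/* With a scalar type, comparison is a direct value comparison. */
static int ktime_before(ktime_t a, ktime_t b)
{
	return a < b;
}
```

When the seconds argument is 0, `ktime_set(0, n)` and a direct assignment of `n` produce the same value, which is why the simple assignment suffices.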
     

25 Dec, 2016

2 commits

  • This was entirely automated, using the script by Al:

    PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
    sed -i -e "s!$PATT!#include !" \
    $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

    to do the replacement at the end of the merge window.

    Requested-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull cifs fixes from Steve French:
    "This includes various cifs/smb3 bug fixes, mostly for stable as well.

    In the next week I expect that Germano will have some reconnection
    fixes, and also I expect to have the remaining pieces of the snapshot
    enablement and SMB3 ACLs, but wanted to get this set of bug fixes in"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    cifs_get_root shouldn't use path with tree name
    Fix default behaviour for empty domains and add domainauto option
    cifs: use %16phN for formatting md5 sum
    cifs: Fix smbencrypt() to stop pointing a scatterlist at the stack
    CIFS: Fix a possible double locking of mutex during reconnect
    CIFS: Fix a possible memory corruption during reconnect
    CIFS: Fix a possible memory corruption in push locks
    CIFS: Fix missing nls unload in smb2_reconnect()
    CIFS: Decrease verbosity of ioctl call
    SMB3: parsing for new snapshot timestamp mount parm

    Linus Torvalds
     

24 Dec, 2016

3 commits

  • There are only two call sites of fsnotify_duplicate_mark(). Those are
    in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
    for audit tree, inode pointer and group gets set in
    fsnotify_add_mark_locked() later anyway, mask and free_mark are already
    set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
    actively harmful because following fsnotify_add_mark_locked() will leak
    group reference by overwriting the group pointer. So just remove the two
    calls to fsnotify_duplicate_mark() and the function.

    Signed-off-by: Jan Kara
    [PM: line wrapping to fit in 80 chars]
    Signed-off-by: Paul Moore

    Jan Kara
     
  • Pull final vfs updates from Al Viro:
    "Assorted cleanups and fixes all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    sg_write()/bsg_write() is not fit to be called under KERNEL_DS
    ufs: fix function declaration for ufs_truncate_blocks
    fs: exec: apply CLOEXEC before changing dumpable task flags
    seq_file: reset iterator to first record for zero offset
    vfs: fix isize/pos/len checks for reflink & dedupe
    [iov_iter] fix iterate_all_kinds() on empty iterators
    move aio compat to fs/aio.c
    reorganize do_make_slave()
    clone_private_mount() doesn't need to touch namespace_sem
    remove a bogus claim about namespace_sem being held by callers of mnt_alloc_id()

    Linus Torvalds
     
  • Pull befs updates from Luis de Bethencourt:
    "A series of small fixes and adding NFS export support"

    * tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befs:
    befs: add NFS export support
    befs: remove trailing whitespaces
    befs: remove signatures from comments
    befs: fix style issues in header files
    befs: fix style issues in linuxvfs.c
    befs: fix typos in linuxvfs.c
    befs: fix style issues in io.c
    befs: fix style issues in inode.c
    befs: fix style issues in debug.c

    Linus Torvalds
     

23 Dec, 2016

7 commits

  • Al Viro
     
  • sparse says:

    fs/ufs/inode.c:1195:6: warning: symbol 'ufs_truncate_blocks' was not declared. Should it be static?

    Note that the forward declaration in the file is already marked static.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • If you have a process that has set itself to be non-dumpable, and it
    then undergoes exec(2), any CLOEXEC file descriptors it has open are
    "exposed" during a race window between the dumpable flags of the process
    being reset for exec(2) and CLOEXEC being applied to the file
    descriptors. This can be exploited by a process by attempting to access
    /proc//fd/... during this window, without requiring CAP_SYS_PTRACE.

    The race in question is after set_dumpable has been (for get_link,
    though the trace is basically the same for readlink):

    [vfs]
    -> proc_pid_link_inode_operations.get_link
    -> proc_pid_get_link
    -> proc_fd_access_allowed
    -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);

    Which will return 0 during the race window, and CLOEXEC file descriptors
    will still be open during this window because do_close_on_exec has not
    been called yet. As a result, the ordering of these calls should be
    reversed to avoid this race window.

    This is of particular concern to container runtimes, where joining a
    PID namespace with file descriptors referring to the host filesystem
    can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
    against access of CLOEXEC file descriptors -- file descriptors which may
    reference filesystem objects the container shouldn't have access to).

    Cc: dev@opencontainers.org
    Cc: # v3.2+
    Reported-by: Michael Crosby
    Signed-off-by: Aleksa Sarai
    Signed-off-by: Al Viro

    Aleksa Sarai
     
  • If a kernfs file is empty on the first read, successive read operations
    using the same file descriptor will return no data, even when data is
    available. The default kernfs 'seq_next' implementation advances the
    iterator position even when the next object is not there. Kernfs
    'seq_start' for following requests will not return an iterator, as the
    position is already on the second object.

    This defect makes it impossible to monitor badblocks sysfs files from MD
    raid. They are initially empty, but if data appears at some stage,
    userspace is not able to read it.

    Signed-off-by: Tomasz Majchrzak
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Tomasz Majchrzak
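
The failure mode above can be reproduced with a toy model of an iterator whose index advances even on an empty read. Everything here (struct names, the `reset_on_zero` switch, the single-record store) is hypothetical and only mimics the behaviour the commit describes; it is not the seq_file code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-open iterator state. */
struct reader {
	size_t index;
};

/* Hypothetical backing store that may gain records over time. */
struct store {
	const char *records[4];
	size_t count;
};

/* One read request at file offset pos; returns a record or NULL. */
static const char *model_read(struct reader *r, const struct store *s,
			      long pos, bool reset_on_zero)
{
	const char *rec;

	/* The fix: a read from offset zero restarts at the first record. */
	if (reset_on_zero && pos == 0)
		r->index = 0;

	rec = (r->index < s->count) ? s->records[r->index] : NULL;
	r->index++;  /* advances even when nothing was there */
	return rec;
}
```

Without the reset, a first read on an empty store leaves `index` at 1, so a record that appears later at position 0 is never returned to that file descriptor.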
     
  • Strengthen the checking of pos/len vs. i_size, clarify the return values
    for the clone prep function, and remove pointless code.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Al Viro

    Darrick J. Wong
     
  • ... and fix the minor buglet in compat io_submit() - native one
    kills ioctx as cleanup when put_user() fails. Get rid of
    bogus compat_... in !CONFIG_AIO case, while we are at it - they
    should simply fail with ENOSYS, same as for native counterparts.

    Signed-off-by: Al Viro

    Al Viro
     
  • This allows sending larger than 1 MB requests to devices that support
    large I/O sizes.

    Signed-off-by: Christoph Hellwig
    Reported-by: Laurence Oberman
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

22 Dec, 2016

8 commits

  • Implement mandatory export_operations, so it is possible to export befs via
    nfs.

    Signed-off-by: Luis de Bethencourt

    Luis de Bethencourt
     
  • Removing all trailing whitespaces in befs.

    I was skeptical about tainting the history with this, but whitespace changes
    can be ignored by using 'git blame -w' and 'git log -w'.

    Signed-off-by: Luis de Bethencourt

    Luis de Bethencourt
     
  • No idea why some comments have signatures. These predate git. Removing them
    since they add noise and no information.

    Signed-off-by: Luis de Bethencourt

    Luis de Bethencourt
     
  • Fixing checkpatch.pl issues in befs header files:
    WARNING: Missing a blank line after declarations
    + befs_inode_addr iaddr;
    + iaddr.allocation_group = blockno >> BEFS_SB(sb)->ag_shift;

    WARNING: space prohibited between function name and open parenthesis '('
    + return BEFS_SB(sb)->block_size / sizeof (befs_disk_inode_addr);

    ERROR: "foo * bar" should be "foo *bar"
    + const char *key, befs_off_t * value);

    ERROR: Macros with complex values should be enclosed in parentheses
    +#define PACKED __attribute__ ((__packed__))

    Signed-off-by: Luis de Bethencourt

    Luis de Bethencourt
     
  • Fix the following type of checkpatch.pl issues:
    WARNING: line over 80 characters
    +static struct dentry *befs_lookup(struct inode *, struct dentry *, unsigned int);

    ERROR: code indent should use tabs where possible
    + if (!bi)$

    WARNING: please, no spaces at the start of a line
    + if (!bi)$

    WARNING: labels should not be indented
    + unacquire_bh:

    WARNING: space prohibited between function name and open parenthesis '('
    + sizeof (struct befs_inode_info),

    WARNING: braces {} are not necessary for single statement blocks
    + if (!*out) {
    + return -ENOMEM;
    + }

    WARNING: Block comments use a trailing */ on a separate line
    + * in special cases */

    WARNING: Missing a blank line after declarations
    + int token;
    + if (!*p)

    ERROR: do not use assignment in if condition
    + if (!(bh = sb_bread(sb, sb_block))) {

    ERROR: space prohibited after that open parenthesis '('
    + if( befs_sb->num_blocks > ~((sector_t)0) ) {

    ERROR: space prohibited before that close parenthesis ')'
    + if( befs_sb->num_blocks > ~((sector_t)0) ) {

    ERROR: space required before the open parenthesis '('
    + if( befs_sb->num_blocks > ~((sector_t)0) ) {

    Signed-off-by: Luis de Bethencourt

    Luis de Bethencourt
     
  • Signed-off-by: Luis de Bethencourt

    Luis de Bethencourt
     
  • Fixing the two following checkpatch.pl issues:
    ERROR: trailing whitespace
    + * Based on portions of file.c and inode.c $

    WARNING: labels should not be indented
    + error:

    Signed-off-by: Luis de Bethencourt

    Luis de Bethencourt
     
  • Fixing the following checkpatch.pl errors and warning:
    ERROR: trailing whitespace
    + * $

    WARNING: Block comments use * on subsequent lines
    +/*
    + Validates the correctness of the befs inode

    ERROR: "foo * bar" should be "foo *bar"
    +befs_check_inode(struct super_block *sb, befs_inode * raw_inode,

    Signed-off-by: Luis de Bethencourt

    Luis de Bethencourt