17 Nov, 2015

1 commit

  • fs/cachefiles/rdwr.c: In function ‘cachefiles_write_page’:
    fs/cachefiles/rdwr.c:882: warning: ‘ret’ may be used uninitialized in
    this function

    If the jump to label "error" is taken, "ret" will indeed be
    uninitialized, and random stack data may be printed by the debug code.

    Fixes: 102f4d900c9c8f5e ("FS-Cache: Handle a write to the page immediately beyond the EOF marker")
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Geert Uytterhoeven
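
The failure mode described in the cachefiles fix above can be illustrated with a minimal userspace sketch. The function name write_page_sketch() and the choice of -ENOBUFS are assumptions made for illustration and are not taken from the patch; the point is only that every path reaching a shared error label should assign "ret" first.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a write routine with a shared error label. */
int write_page_sketch(int have_backing_file)
{
	int ret;

	if (!have_backing_file) {
		/* Assign before jumping so the error path never reads garbage. */
		ret = -ENOBUFS;
		goto error;
	}

	ret = 0;	/* pretend the write succeeded */
	return ret;

error:
	/* Debug output that would print stack garbage if ret were unset. */
	fprintf(stderr, "write failed: %d\n", ret);
	return ret;
}
```

Calling write_page_sketch(0) takes the error path and reports the value that was assigned before the jump.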
     

14 Nov, 2015

16 commits

  • …/olof/chrome-platform

    Pull chrome platform updates from Olof Johansson:
    "Here's the branch of chrome platform changes for v4.4. Some have been
    queued up for the full 4.3 release cycle since I forgot to send them
    in for that round (rebased early on to deal with fixes conflicts).

    Most of these enable EC communication stuff -- Pixel 2015 support,
    enabling building for ARM64 platforms, and a few fixes for memory
    leaks.

    There's also a patch in here to allow reading/writing the verified
    boot context, which depends on a sysfs patch acked by Greg"

    * tag 'chrome-platform-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
    platform/chrome: Fix i2c-designware adapter name
    platform/chrome: Support reading/writing the vboot context
    sysfs: Support is_visible() on binary attributes
    platform/chrome: cros_ec: Fix possible leak in led_rgb_store()
    platform/chrome: cros_ec: Fix leak in sequence_store()
    platform/chrome: Enable Chrome platforms on 64-bit ARM
    platform/chrome: cros_ec_dev - Add a platform device ID table
    platform/chrome: cros_ec_lpc - Add support for Google Pixel 2
    platform/chrome: cros_ec_lpc - Use existing function to check EC result
    platform/chrome: Make depends on MFD_CROS_EC instead CROS_EC_PROTO
    Revert "platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM"

    Linus Torvalds
     
  • Pull SCSI target updates from Nicholas Bellinger:
    "This series contains HCH's changes to absorb configfs attribute
    ->show() + ->store() function pointer usage from its original
    tree-wide consumers, into common configfs code.

    It includes usb-gadget, target w/ drivers, netconsole and ocfs2
    changes to realize the improved simplicity that now renders the
    original include/target/configfs_macros.h CPP magic for fabric drivers
    and others unnecessary and obsolete.

    And with common code in place, new configfs attributes can be added
    more easily than ever before.

    Note, there are further improvements in-flight from other folks for
    v4.5 code in configfs land, plus a number of target fixes for post -rc1
    code"

    In the meantime, a new user of the now-removed old configfs API came in
    through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce
    an abstraction for System Trace Module devices").

    This merge resolution comes from Alexander Shishkin, who updated his stm
    class tracing abstraction to account for the removal of the old
    show_attribute and store_attribute methods in commit 517982229f78
    ("configfs: remove old API") from this pull. As Alexander says about
    that patch:

    "There's no need to keep an extra wrapper structure per item and the
    awkward show_attribute/store_attribute item ops are no longer needed.

    This patch converts policy code to the new api, all the while making
    the code quite a bit smaller and easier on the eyes.

    Signed-off-by: Alexander Shishkin "

    That patch was folded into the merge so that the tree should be fully
    bisectable.

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits)
    configfs: remove old API
    ocfs2/cluster: use per-attribute show and store methods
    ocfs2/cluster: move locking into attribute store methods
    netconsole: use per-attribute show and store methods
    target: use per-attribute show and store methods
    spear13xx_pcie_gadget: use per-attribute show and store methods
    dlm: use per-attribute show and store methods
    usb-gadget/f_serial: use per-attribute show and store methods
    usb-gadget/f_phonet: use per-attribute show and store methods
    usb-gadget/f_obex: use per-attribute show and store methods
    usb-gadget/f_uac2: use per-attribute show and store methods
    usb-gadget/f_uac1: use per-attribute show and store methods
    usb-gadget/f_mass_storage: use per-attribute show and store methods
    usb-gadget/f_sourcesink: use per-attribute show and store methods
    usb-gadget/f_printer: use per-attribute show and store methods
    usb-gadget/f_midi: use per-attribute show and store methods
    usb-gadget/f_loopback: use per-attribute show and store methods
    usb-gadget/ether: use per-attribute show and store methods
    usb-gadget/f_acm: use per-attribute show and store methods
    usb-gadget/f_hid: use per-attribute show and store methods
    ...

    Linus Torvalds
     
  • Pull vfs xattr cleanups from Al Viro.

    * 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    f2fs: xattr simplifications
    squashfs: xattr simplifications
    9p: xattr simplifications
    xattr handlers: Pass handler to operations instead of flags
    jffs2: Add missing capability check for listing trusted xattrs
    hfsplus: Remove unused xattr handler list operations
    ubifs: Remove unused security xattr handler
    vfs: Fix the posix_acl_xattr_list return value
    vfs: Check attribute names in posix acl xattr handers

    Linus Torvalds
     
  • Pull libnvdimm fixes from Dan Williams:

    - three fixes tagged for -stable including a crash fix, simple
    performance tweak, and an invalid i/o error.

    - build regression fix for the nvdimm unit tests

    - nvdimm documentation update

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    dax: fix __dax_pmd_fault crash
    libnvdimm: documentation clarifications
    libnvdimm, pmem: fix size trim in pmem_direct_access()
    libnvdimm, e820: fix numa node for e820-type-12 pmem ranges
    tools/testing/nvdimm, acpica: fix flag rename build breakage

    Linus Torvalds
     
  • Now that the xattr handler is passed to the xattr handler operations, we
    have access to the attribute name prefix, so simplify
    f2fs_xattr_generic_list.

    Also, f2fs_xattr_advise_list is only ever called for
    f2fs_xattr_advise_handler; there is no need to double check for that.

    Signed-off-by: Andreas Gruenbacher
    Cc: Jaegeuk Kim
    Cc: Changman Lee
    Cc: Chao Yu
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • Now that the xattr handler is passed to the xattr handler operations, we
    have access to the attribute name prefix, so simplify the squashfs xattr
    handlers a bit.

    Signed-off-by: Andreas Gruenbacher
    Cc: Phillip Lougher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • Now that the xattr handler is passed to the xattr handler operations, we
    can use the same get and set operations for the user, trusted, and security
    xattr namespaces. In those namespaces, we can access the full attribute
    name by "reattaching" the name prefix the vfs has skipped for us. Add a
    xattr_full_name helper to make this obvious in the code.

    For the "system.posix_acl_access" and "system.posix_acl_default"
    attributes, handler->prefix is the full attribute name; the suffix is the
    empty string.

    Signed-off-by: Andreas Gruenbacher
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Latchesar Ionkov
    Cc: v9fs-developer@lists.sourceforge.net
    Signed-off-by: Al Viro

    Andreas Gruenbacher
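
The "reattaching" idea in the 9p change above can be sketched in userspace C. This is a simplified model under stated assumptions: struct xattr_handler_sketch and both function names are hypothetical, and the only property relied on is that the suffix handed to an operation points into the original full name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of an xattr handler; only the prefix matters here. */
struct xattr_handler_sketch {
	const char *prefix;	/* e.g. "user." */
};

/*
 * Return the part of full_name that follows handler->prefix, or NULL if
 * the prefix does not match. This models the vfs skipping the prefix.
 */
const char *skip_prefix_sketch(const struct xattr_handler_sketch *handler,
			       const char *full_name)
{
	size_t len = strlen(handler->prefix);

	if (strncmp(full_name, handler->prefix, len) != 0)
		return NULL;
	return full_name + len;
}

/*
 * "Reattach" the prefix: the suffix points into the full name, so stepping
 * back by the prefix length recovers it. Only valid for a suffix obtained
 * from skip_prefix_sketch() with the same handler.
 */
const char *full_name_sketch(const struct xattr_handler_sketch *handler,
			     const char *suffix)
{
	return suffix - strlen(handler->prefix);
}
```

With a handler whose prefix is the complete attribute name, such as "system.posix_acl_access", skip_prefix_sketch() returns the empty string, which matches the second paragraph of the commit message.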
     
  • The xattr_handler operations are currently all passed a file system
    specific flags value which the operations can use to disambiguate between
    different handlers; some file systems use that to distinguish the xattr
    namespace, for example. In some operations, it would be useful to also have
    access to the handler prefix. To allow that, pass a pointer to the handler
    to operations instead of the flags value alone.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher
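
A rough userspace sketch of why passing the handler is more useful than passing the flags value alone: a single routine can serve several namespaces because the prefix travels with the handler. The struct layout and the names handler_sketch and generic_list_sketch are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct handler_sketch {
	const char *prefix;	/* e.g. "user." or "trusted." */
	int flags;		/* file-system-specific value */
	int (*list)(const struct handler_sketch *handler, const char *name,
		    char *buf, size_t size);
};

/*
 * Build "prefix + name" into buf. Returns the length written, or -1 if
 * buf is too small. No per-namespace switch on flags is needed.
 */
int generic_list_sketch(const struct handler_sketch *handler,
			const char *name, char *buf, size_t size)
{
	int n = snprintf(buf, size, "%s%s", handler->prefix, name);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

Two handlers that differ only in prefix and flags can then share generic_list_sketch() as their list operation.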
     
  • The vfs checks if a task has the appropriate access for get and set
    operations, but it cannot do that for the list operation; the file system
    must check for that itself.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Cc: David Woodhouse
    Cc: linux-mtd@lists.infradead.org
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • The list operations can never be called; they are even documented to be
    unused.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • Ubifs installs a security xattr handler in sb->s_xattr but doesn't use the
    generic_{get,set,list,remove}xattr inode operations needed for processing
    this list of attribute handlers; the handler is never called. Instead,
    ubifs uses its own xattr handlers which also process security xattrs.

    Remove the dead code.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Richard Weinberger
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: linux-mtd@lists.infradead.org
    Cc: Subodh Nijsure
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • When a filesystem that contains POSIX ACLs is mounted without ACL support
    (-o noacl), the appropriate behavior is not to list any existing POSIX ACL
    xattrs. The return value for list xattr handlers in this case is 0, not an
    error code: several filesystems that use the POSIX ACL xattr handlers do
    not expect the list operation to fail.

    Symlinks cannot have ACLs, so posix_acl_xattr_list will never be called for
    symlinks in the first place.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
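
The return-value convention described above can be sketched as follows. This is a userspace illustration under assumptions: acl_list_sketch() and its acl_enabled parameter are hypothetical, and the routine only models "report zero bytes instead of an error when ACLs are disabled".

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical list handler: copies the attribute name (with its
 * terminating NUL) into list when it fits and returns the space needed.
 */
long acl_list_sketch(int acl_enabled, const char *name,
		     char *list, size_t list_size)
{
	size_t len = strlen(name) + 1;

	if (!acl_enabled)
		return 0;	/* nothing to list; not an error */
	if (list && len <= list_size)
		memcpy(list, name, len);
	return (long)len;
}
```

A caller that sums the sizes returned by all handlers keeps working when ACLs are disabled, because the handler simply contributes zero bytes.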
     
  • The get and set operations of the POSIX ACL xattr handlers failed to check
    the attribute names, so all names with "system.posix_acl_access" or
    "system.posix_acl_default" as a prefix were accepted. Reject invalid names
    from now on.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
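
The difference between prefix matching and exact matching can be sketched like this. The function name check_acl_name_sketch() and the specific error codes are assumptions made for illustration; the sketch only shows that anything following the handler's name must be rejected.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Accept the requested name only if it equals handler_name exactly.
 * A pure prefix comparison would also accept names with trailing
 * characters, such as "system.posix_acl_accessXYZ".
 */
int check_acl_name_sketch(const char *handler_name, const char *requested)
{
	size_t len = strlen(handler_name);

	if (strncmp(requested, handler_name, len) != 0)
		return -EOPNOTSUPP;	/* different attribute (assumed code) */
	if (requested[len] != '\0')
		return -EINVAL;		/* trailing characters after the name */
	return 0;
}
```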
     
  • Pull SMB3 updates from Steve French:
    "A collection of SMB3 patches adding some reliability features
    (persistent and resilient handles) and improving SMB3 copy offload.

    I will have some additional patches for SMB3 encryption and SMB3.1.1
    signing (important security features), and also for improving SMB3
    persistent handle reconnection (setting ChannelSequence number e.g.)
    that I am still working on but wanted to get this set in since they
    can stand alone"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    Allow copy offload (CopyChunk) across shares
    Add resilienthandles mount parm
    [SMB3] Send durable handle v2 contexts when use of persistent handles required
    [SMB3] Display persistenthandles in /proc/mounts for SMB3 shares if enabled
    [SMB3] Enable checking for continuous availability and persistent handle support
    [SMB3] Add parsing for new mount option controlling persistent handles
    Allow duplicate extents in SMB3 not just SMB3.1.1

    Linus Torvalds
     
  • Pull btrfs fixes and cleanups from Chris Mason:
    "Some of this got cherry-picked from a github repo this week, but I
    verified the patches.

    We have three small scrub cleanups and a collection of fixes"

    * 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: Use fs_info directly in btrfs_delete_unused_bgs
    btrfs: Fix lost-data-profile caused by balance bg
    btrfs: Fix lost-data-profile caused by auto removing bg
    btrfs: Remove len argument from scrub_find_csum
    btrfs: Reduce unnecessary arguments in scrub_recheck_block
    btrfs: Use scrub_checksum_data and scrub_checksum_tree_block for scrub_recheck_block_checksum
    btrfs: Reset sblock->xxx_error stats before calling scrub_recheck_block_checksum
    btrfs: scrub: setup all fields for sblock_to_check
    btrfs: scrub: set error stats when tree block spanning stripes
    Btrfs: fix race when listing an inode's xattrs
    Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow
    Btrfs: fix race leading to incorrect item deletion when dropping extents
    Btrfs: fix sleeping inside atomic context in qgroup rescan worker
    Btrfs: fix race waiting for qgroup rescan worker
    btrfs: qgroup: exit the rescan worker during umount
    Btrfs: fix extent accounting for partial direct IO writes

    Linus Torvalds
     
  • Pull Ceph updates from Sage Weil:
    "There are several patches from Ilya fixing RBD allocation lifecycle
    issues, a series adding a nocephx_sign_messages option (and associated
    bug fixes/cleanups), several patches from Zheng improving the
    (directory) fsync behavior, a big improvement in IO for direct-io
    requests when striping is enabled from Caifeng, and several other
    small fixes and cleanups"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: clear msg->con in ceph_msg_release() only
    libceph: add nocephx_sign_messages option
    libceph: stop duplicating client fields in messenger
    libceph: drop authorizer check from cephx msg signing routines
    libceph: msg signing callouts don't need con argument
    libceph: evaluate osd_req_op_data() arguments only once
    ceph: make fsync() wait unsafe requests that created/modified inode
    ceph: add request to i_unsafe_dirops when getting unsafe reply
    libceph: introduce ceph_x_authorizer_cleanup()
    ceph: don't invalidate page cache when inode is no longer used
    rbd: remove duplicate calls to rbd_dev_mapping_clear()
    rbd: set device_type::release instead of device::release
    rbd: don't free rbd_dev outside of the release callback
    rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails
    libceph: use local variable cursor instead of &msg->cursor
    libceph: remove con argument in handle_reply()
    ceph: combine as many iovec as possile into one OSD request
    ceph: fix message length computation
    ceph: fix a comment typo
    rbd: drop null test before destroy functions

    Linus Torvalds
     

13 Nov, 2015

2 commits

  • Since 4.3 introduced devm_memremap_pages() the pfns handled by DAX may
    optionally have a struct page backing. When a mapped pfn reaches
    vmf_insert_pfn_pmd() it fails with a crash signature like the following:

    kernel BUG at mm/huge_memory.c:905!
    [..]
    Call Trace:
    [] __dax_pmd_fault+0x2ea/0x5b0
    [] xfs_filemap_pmd_fault+0x92/0x150 [xfs]
    [] handle_mm_fault+0x312/0x1b50

    Fix this by falling back to 4K mappings in the pfn_valid() case. Longer
    term, vmf_insert_pfn_pmd() needs to grow support for architectures that
    can provide a 'pmd_special' capability.

    Cc:
    Cc: Andrew Morton
    Reported-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Pull misc block fixes from Jens Axboe:
    "Stuff that got collected after the merge window opened. This
    contains:

    - NVMe:
    - Fix for non-striped transfer size setting for NVMe from
    Sathyavathi.
    - (Some) support for the weird Apple nvme controller in the
    macbooks. From Stephan Günther.

    - The error value leak for dax from Al.

    - A few minor blk-mq tweaks from me.

    - Add the new linux-block@vger.kernel.org mailing list to the
    MAINTAINERS file.

    - Discard fix for brd, from Jan.

    - A kerneldoc warning for block core from Randy.

    - An older fix from Vivek, converting a WARN_ON() to a rate limited
    printk when a device is hot removed with dirty inodes"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: don't hardcode blk_qc_t -> tag mask
    dax_io(): don't let non-error value escape via retval instead of EFAULT
    block: fix blk-core.c kernel-doc warning
    fs/block_dev.c: Remove WARN_ON() when inode writeback fails
    NVMe: add support for Apple NVMe controller
    NVMe: use split lo_hi_{read,write}q
    blk-mq: mark __blk_mq_complete_request() static
    MAINTAINERS: add reference to new linux-block list
    NVMe: Increase the max transfer size when mdts is 0
    brd: Refuse improperly aligned discard requests

    Linus Torvalds
     

12 Nov, 2015

5 commits

  • Pull xfs updates from Dave Chinner:
    "There is nothing really major here - the only significant addition is
    the per-mount operation statistics infrastructure. Otherwise there's
    various ACL, xattr, DAX, AIO and logging fixes, and a smattering of
    small cleanups and fixes elsewhere.

    Summary:

    - per-mount operational statistics in sysfs
    - fixes for concurrent aio append write submission
    - various logging fixes
    - detection of zeroed logs and invalid log sequence numbers on v5 filesystems
    - memory allocation failure message improvements
    - a bunch of xattr/ACL fixes
    - fdatasync optimisation
    - miscellaneous other fixes and cleanups"

    * tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (39 commits)
    xfs: give all workqueues rescuer threads
    xfs: fix log recovery op header validation assert
    xfs: Fix error path in xfs_get_acl
    xfs: optimise away log forces on timestamp updates for fdatasync
    xfs: don't leak uuid table on rmmod
    xfs: invalidate cached acl if set via ioctl
    xfs: Plug memory leak in xfs_attrmulti_attr_set
    xfs: Validate the length of on-disk ACLs
    xfs: invalidate cached acl if set directly via xattr
    xfs: xfs_filemap_pmd_fault treats read faults as write faults
    xfs: add ->pfn_mkwrite support for DAX
    xfs: DAX does not use IO completion callbacks
    xfs: Don't use unwritten extents for DAX
    xfs: introduce BMAPI_ZERO for allocating zeroed extents
    xfs: fix inode size update overflow in xfs_map_direct()
    xfs: clear PF_NOFREEZE for xfsaild kthread
    xfs: fix an error code in xfs_fs_fill_super()
    xfs: stats are no longer dependent on CONFIG_PROC_FS
    xfs: simplify /proc teardown & error handling
    xfs: per-filesystem stats counter implementation
    ...

    Linus Torvalds
     
  • Pull nfsd updates from Bruce Fields:
    "Apologies for coming a little late in the merge window. Fortunately
    this is another fairly quiet one:

    Mainly smaller bugfixes and cleanup. We're still finding some bugs
    from the breakup of the big NFSv4 state lock in 3.17 -- thanks
    especially to Andrew Elble and Jeff Layton for tracking down some of
    the remaining races"

    * tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux:
    svcrpc: document lack of some memory barriers
    nfsd: fix race with open / open upgrade stateids
    nfsd: eliminate sending duplicate and repeated delegations
    nfsd: remove recurring workqueue job to clean DRC
    SUNRPC: drop stale comment in svc_setup_socket()
    nfsd: ensure that seqid morphing operations are atomic wrt to copies
    nfsd: serialize layout stateid morphing operations
    nfsd: improve client_has_state to check for unused openowners
    nfsd: fix clid_inuse on mount with security change
    sunrpc/cache: make cache flushing more reliable.
    nfsd: move include of state.h from trace.c to trace.h
    sunrpc: avoid warning in gss_key_timeout
    lockd: get rid of reference-counted NSM RPC clients
    SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage()
    lockd: create NSM handles per net namespace
    nfsd: switch unsigned char flags in svc_fh to bools
    nfsd: move svc_fh->fh_maxsize to just after fh_handle
    nfsd: drop null test before destroy functions
    nfsd: serialize state seqid morphing operations

    Linus Torvalds
     
  • Pull vfs update from Al Viro:

    - misc stable fixes

    - trivial kernel-doc and comment fixups

    - remove never-used block_page_mkwrite() wrapper function, and rename
    the function that is _actually_ used to not have double underscores.

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: 9p: cache.h: Add #define of include guard
    vfs: remove stale comment in inode_operations
    vfs: remove unused wrapper block_page_mkwrite()
    binfmt_elf: Correct `arch_check_elf's description
    fs: fix writeback.c kernel-doc warnings
    fs: fix inode.c kernel-doc warning
    fs/pipe.c: return error code rather than 0 in pipe_write()
    fs/pipe.c: preserve alloc_file() error code
    binfmt_elf: Don't clobber passed executable's file header
    FS-Cache: Handle a write to the page immediately beyond the EOF marker
    cachefiles: perform test on s_blocksize when opening cache file.
    FS-Cache: Don't override netfs's primary_index if registering failed
    FS-Cache: Increase reference of parent after registering, netfs success
    debugfs: fix refcount imbalance in start_creating

    Linus Torvalds
     
  • Signed-off-by: Al Viro
    Reported-by: Sasha Levin
    Cc: stable@vger.kernel.org # 4.0+
    Signed-off-by: Jens Axboe

    Al Viro
     
  • If a block device is hot removed and the last reference to the device
    is later put, we try to write back the dirty inode. But the device is
    gone and that writeback fails.

    Currently we do a WARN_ON(), which does not seem to be the right thing.
    Convert it to a ratelimited kernel warning.

    Reported-by: Andi Kleen
    Signed-off-by: Vivek Goyal
    Acked-by: Tejun Heo
    [jmoyer@redhat.com: get rid of unnecessary name initialization, 80 cols]
    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Vivek Goyal
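
The general idea behind a rate-limited warning can be sketched with a fixed time window and a burst count. This is not the kernel's implementation; struct ratelimit_sketch, its fields, and the caller-supplied timestamp are assumptions chosen to keep the example self-contained.

```c
#include <assert.h>

struct ratelimit_sketch {
	long interval;		/* length of one window, in arbitrary ticks */
	int burst;		/* messages allowed per window */
	long window_start;	/* tick at which the current window began */
	int printed;		/* messages already emitted in this window */
};

/* Return 1 if a message may be emitted at time 'now', 0 to suppress it. */
int ratelimit_ok_sketch(struct ratelimit_sketch *rs, long now)
{
	if (now - rs->window_start >= rs->interval) {
		/* A new window begins; reset the budget. */
		rs->window_start = now;
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return 0;
	rs->printed++;
	return 1;
}
```

A caller would emit the warning only when ratelimit_ok_sketch() returns 1, so repeated writeback failures cannot flood the log.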
     

11 Nov, 2015

16 commits

  • The include file was intended to have an include guard, but the #define
    part is missing.

    Signed-off-by: Tzvetelin Katchov
    Signed-off-by: Al Viro

    Tzvetelin Katchov
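
For reference, a complete include guard has both the #ifndef test and the matching #define. The sketch below simulates two inclusions of a hypothetical header in a single file; the macro name EXAMPLE_CACHE_H and the struct are illustrative and are not the names used in the 9p header.

```c
#include <assert.h>

/* First "inclusion" of the hypothetical header. */
#ifndef EXAMPLE_CACHE_H
#define EXAMPLE_CACHE_H	/* without this line the guard never takes effect */
struct example_cookie {
	int id;
};
#endif

/* Second "inclusion": skipped because EXAMPLE_CACHE_H is now defined. */
#ifndef EXAMPLE_CACHE_H
#define EXAMPLE_CACHE_H
struct example_cookie {
	int id;
};
#endif
```

If the first #define were removed, the second copy would be compiled as well and the struct would be defined twice, which is a compile error.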
     
  • The function currently called "__block_page_mkwrite()" used to be called
    "block_page_mkwrite()" until a wrapper for this function was added by:

    commit 24da4fab5a61 ("vfs: Create __block_page_mkwrite() helper passing
    error values back")

    This wrapper, the current "block_page_mkwrite()", is currently unused.
    __block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.

    Remove the unused wrapper, rename __block_page_mkwrite() back to
    block_page_mkwrite() and update the comment above block_page_mkwrite().

    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Jan Kara
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Al Viro

    Ross Zwisler
     
  • Correct `arch_check_elf's description, mistakenly copied and pasted from
    `arch_elf_pt_proc'.

    Signed-off-by: Maciej W. Rozycki
    Signed-off-by: Al Viro

    Maciej W. Rozycki
     
  • Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro
    to after the function's opening brace. Also #undef this macro at the
    end of the function.

    ..//fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE'
    ..//fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Al Viro

    Randy Dunlap
     
  • Fix kernel-doc warning in fs/inode.c:

    ..//fs/inode.c:1606: warning: No description found for parameter 'inode'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Al Viro

    Randy Dunlap
     
  • pipe_write() would return 0 if it failed to merge the beginning of the
    data to write with the last, partially filled pipe buffer. It should
    return an error code instead. Userspace programs could be confused by
    write() returning 0 when called with a nonzero 'count'.

    The EFAULT error case was a regression from f0d1bec9d5 ("new helper:
    copy_page_from_iter()"), while the ops->confirm() error case was a much
    older bug.

    Test program:

    #include <assert.h>
    #include <errno.h>
    #include <unistd.h>

    int main(void)
    {
            int fd[2];
            char data[1] = {0};

            assert(0 == pipe(fd));
            assert(1 == write(fd[1], data, 1));

            /* prior to this patch, write() returned 0 here */
            assert(-1 == write(fd[1], NULL, 1));
            assert(errno == EFAULT);
    }

    Cc: stable@vger.kernel.org # at least v3.15+
    Signed-off-by: Eric Biggers
    Signed-off-by: Al Viro

    Eric Biggers
     
  • If sys_pipe() was unable to allocate a 'struct file', it always failed
    with ENFILE, which means "The number of simultaneously open files in the
    system would exceed a system-imposed limit." However, alloc_file()
    actually returns an ERR_PTR value and might fail with other error codes.
    Currently, in addition to ENFILE, it can fail with ENOMEM, potentially
    when there are few open files in the system. Update sys_pipe() to
    preserve this error code.

    In a prior submission of a similar patch (1) some concern was raised
    about introducing a new error code for sys_pipe(). However, for most
    system calls, programs cannot assume that new error codes will never be
    introduced. In addition, ENOMEM was, in fact, already a possible error
    code for sys_pipe(), in the case where the file descriptor table could
    not be expanded due to insufficient memory.

    (1) http://comments.gmane.org/gmane.linux.kernel/1357942

    Signed-off-by: Eric Biggers
    Signed-off-by: Al Viro

    Eric Biggers
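
The error-pointer idiom that the commit above relies on can be modelled in userspace. All names ending in _sketch are hypothetical, and the helpers only imitate the convention of encoding a small negative error number in a pointer value; this is a sketch, not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_SKETCH 4095

void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}

long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

struct file_sketch {
	int fd;
};

/* Hypothetical allocator that can fail for more than one reason. */
struct file_sketch *alloc_file_sketch(int table_full, int out_of_memory)
{
	struct file_sketch *f;

	if (table_full)
		return err_ptr_sketch(-ENFILE);
	if (out_of_memory)
		return err_ptr_sketch(-ENOMEM);
	f = malloc(sizeof(*f));
	if (!f)
		return err_ptr_sketch(-ENOMEM);
	f->fd = -1;
	return f;
}

/* Propagate whatever error the allocator reported; do not hard-code one. */
int create_pipe_sketch(int table_full, int out_of_memory)
{
	struct file_sketch *f = alloc_file_sketch(table_full, out_of_memory);

	if (is_err_sketch(f))
		return (int)ptr_err_sketch(f);
	free(f);
	return 0;
}
```

The caller sees -ENOMEM when memory is the actual problem, instead of a misleading -ENFILE.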
     
  • Do not clobber the buffer space passed from `search_binary_handler' and
    originally preloaded by `prepare_binprm' with the executable's file
    header by overwriting it with its interpreter's file header. Instead
    keep the buffer space intact and directly use the data structure locally
    allocated for the interpreter's file header, fixing a bug introduced in
    2.1.14 with loadable module support (linux-mips.org commit beb11695
    [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
    Adjust the amount of data read from the interpreter's file accordingly.

    This was not an issue before loadable module support, because back then
    `load_elf_binary' was executed only once for a given ELF executable,
    whether the function succeeded or failed.

    With loadable module support supported and enabled, upon a failure of
    `load_elf_binary' -- which may for example be caused by architecture
    code rejecting an executable due to a missing hardware feature requested
    in the file header -- a module load is attempted and then the function
    reexecuted by `search_binary_handler'. With the executable's file
    header replaced with its interpreter's file header the executable can
    then be erroneously accepted in this subsequent attempt.

    Cc: stable@vger.kernel.org # all the way back
    Signed-off-by: Maciej W. Rozycki
    Signed-off-by: Al Viro

    Maciej W. Rozycki
     
  • Handle a write being requested to the page immediately beyond the EOF
    marker on a cache object. Currently this gets an assertion failure in
    CacheFiles because the EOF marker is used there to encode information about
    a partial page at the EOF - which could lead to an unknown blank spot in
    the file if we extend the file over it.

    The problem is actually in fscache where we check the index of the page
    being written against store_limit. store_limit is set to the number of
    pages that we're allowed to store by fscache_set_store_limit() - which
    means it's one more than the index of the last page we're allowed to store.
    The problem is that we permit writing to a page with an index _equal_ to
    the store limit - when we should reject that case.

    Whilst we're at it, change the triggered assertion in CacheFiles to just
    return -ENOBUFS instead.

    The assertion failure looks something like this:

    CacheFiles: Assertion failed
    1000 < 7b1 is false
    ------------[ cut here ]------------
    kernel BUG at fs/cachefiles/rdwr.c:962!
    ...
    RIP: 0010:[] [] cachefiles_write_page+0x273/0x2d0 [cachefiles]

    Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754f (at least)
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
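
The off-by-one described above comes down to one comparison. In the sketch below, check_store_index_sketch() is a hypothetical helper; store_limit is treated as a page count, as the commit message states, so valid indices run from 0 to store_limit - 1.

```c
#include <assert.h>
#include <errno.h>

/* Reject any page index at or beyond the store limit. */
int check_store_index_sketch(unsigned long index, unsigned long store_limit)
{
	if (index >= store_limit)	/* '>' would wrongly accept index == store_limit */
		return -ENOBUFS;
	return 0;
}
```

With a store limit of 4 pages, index 3 is the last page accepted and index 4 is rejected.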
     
  • cachefiles requires that s_blocksize in the cache is not greater than
    PAGE_SIZE, and performs the check every time a block is accessed.

    Move the test to the place where the file is "opened", where other
    file-validity tests are performed.

    Signed-off-by: NeilBrown
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    NeilBrown
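
Moving a validity test from the per-access path to open time can be sketched as below. The names, the 4096-byte page size, and the -EINVAL return value are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>

#define PAGE_SIZE_SKETCH 4096UL	/* assumed page size for this example */

struct cache_object_sketch {
	unsigned long blocksize;
	int opened;
};

/* Validate the block size once, when the backing file is opened. */
int open_object_sketch(struct cache_object_sketch *obj,
		       unsigned long fs_blocksize)
{
	if (fs_blocksize > PAGE_SIZE_SKETCH)
		return -EINVAL;	/* assumed error code */
	obj->blocksize = fs_blocksize;
	obj->opened = 1;
	return 0;
}
```

Later accesses can rely on obj->blocksize being no greater than the page size without testing it again.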
     
  • Only override netfs->primary_index when registration succeeds.

    Cc: stable@vger.kernel.org # v2.6.30+
    Signed-off-by: Kinglong Mee
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Kinglong Mee
     
  • If the netfs already exists, fscache should not increase the parent's
    usage and n_children counts; otherwise they are never decreased.

    v2: thanks to David's suggestion,
    move increasing the reference of the parent to the success path
    use kmem_cache_free() to free primary_index directly

    v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;"

    Cc: stable@vger.kernel.org # v2.6.30+
    Signed-off-by: Kinglong Mee
    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    Kinglong Mee
     
  • In debugfs' start_creating(), we pin the file system to safely access
    its root. When we fail to create a file, we unpin the file system via
    failed_creating() to release the mount count and eventually the reference
    of the vfsmount.

    However, when we run into an error during lookup_one_len() while still
    in start_creating(), we only release the parent's mutex but not the
    reference on the mount. It looks like this was done in the past, but after
    splitting portions of __create_file() into start_creating() and
    end_creating() via 190afd81e4a5 ("debugfs: split the beginning and the
    end of __create_file() off"), it seems to have been missed. Noticed during
    code review.

    Fixes: 190afd81e4a5 ("debugfs: split the beginning and the end of __create_file() off")
    Cc: stable@vger.kernel.org # v4.0+
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Al Viro

    Daniel Borkmann
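
The refcount imbalance described above follows a common pattern: a resource pinned at the start of a function must be released on every early exit. The sketch below models it with a plain counter; struct mount_sketch and the function names are hypothetical, and -ENOMEM is an arbitrary stand-in for whatever error the lookup reports.

```c
#include <assert.h>
#include <errno.h>

struct mount_sketch {
	int pin_count;
};

void pin_sketch(struct mount_sketch *m)
{
	m->pin_count++;
}

void unpin_sketch(struct mount_sketch *m)
{
	m->pin_count--;
}

/*
 * Returns 0 with the mount still pinned for the caller, or a negative
 * error with the pin already released.
 */
int start_creating_sketch(struct mount_sketch *m, int lookup_fails)
{
	pin_sketch(m);
	if (lookup_fails) {
		unpin_sketch(m);	/* release on the early-exit path too */
		return -ENOMEM;
	}
	return 0;
}
```

After a failed call the pin count is back at its previous value, so repeated failures cannot leak references.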
     
  • No need to use root->fs_info in btrfs_delete_unused_bgs(),
    use fs_info directly instead.

    Signed-off-by: Zhao Lei
    Signed-off-by: Chris Mason

    Zhao Lei
     
  • Reproduce:
    (In integration-4.3 branch)

    TEST_DEV=(/dev/vdg /dev/vdh)
    TEST_DIR=/mnt/tmp

    umount "$TEST_DEV" >/dev/null
    mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"

    mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
    btrfs balance start -dusage=0 $TEST_DIR
    btrfs filesystem usage $TEST_DIR

    dd if=/dev/zero of="$TEST_DIR"/file count=100
    btrfs filesystem usage $TEST_DIR

    Result:
    We can see "no data chunk" in first "btrfs filesystem usage":
    # btrfs filesystem usage $TEST_DIR
    Overall:
    ...
    Metadata,single: Size:8.00MiB, Used:0.00B
    /dev/vdg 8.00MiB
    Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
    /dev/vdg 122.88MiB
    /dev/vdh 122.88MiB
    System,single: Size:4.00MiB, Used:0.00B
    /dev/vdg 4.00MiB
    System,RAID1: Size:8.00MiB, Used:16.00KiB
    /dev/vdg 8.00MiB
    /dev/vdh 8.00MiB
    Unallocated:
    /dev/vdg 1.06GiB
    /dev/vdh 1.07GiB

    And "data chunks changed from raid1 to single" in second
    "btrfs filesystem usage":
    # btrfs filesystem usage $TEST_DIR
    Overall:
    ...
    Data,single: Size:256.00MiB, Used:0.00B
    /dev/vdh 256.00MiB
    Metadata,single: Size:8.00MiB, Used:0.00B
    /dev/vdg 8.00MiB
    Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
    /dev/vdg 122.88MiB
    /dev/vdh 122.88MiB
    System,single: Size:4.00MiB, Used:0.00B
    /dev/vdg 4.00MiB
    System,RAID1: Size:8.00MiB, Used:16.00KiB
    /dev/vdg 8.00MiB
    /dev/vdh 8.00MiB
    Unallocated:
    /dev/vdg 1.06GiB
    /dev/vdh 841.92MiB

    Reason:
    btrfs balance deletes the last data chunk when there is no data in
    the filesystem, so the "fi usage" command shows "no data chunk".

    When we then write to the fs, the only available data profile is 0x0,
    so all new chunks are allocated as single type.

    Fix:
    Allocate a data chunk explicitly to ensure we don't lose the
    raid profile for data.

    Test:
    Tested with the above script, and confirmed the logic by debug output.

    Reviewed-by: Filipe Manana
    Signed-off-by: Zhao Lei
    Signed-off-by: Chris Mason

    Zhao Lei
     
  • Reproduce:
    (In integration-4.3 branch)

    TEST_DEV=(/dev/vdg /dev/vdh)
    TEST_DIR=/mnt/tmp

    umount "$TEST_DEV" >/dev/null
    mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"

    mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
    umount "$TEST_DEV"

    mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
    btrfs filesystem usage $TEST_DIR

    We can see the data chunk changed from raid1 to single:
    # btrfs filesystem usage $TEST_DIR
    Data,single: Size:8.00MiB, Used:0.00B
    /dev/vdg 8.00MiB
    #

    Reason:
    When an empty filesystem is mounted with -o nospace_cache, the last
    data blockgroup is auto-removed on umount.

    If we then mount it again, there is no data chunk in the filesystem,
    so the only available data profile is 0x0, and all new chunks are
    created as single type.

    Fix:
    Don't auto-delete last blockgroup for a raid type.

    Test:
    Tested with the above script, and confirmed the logic by debug output.

    Reviewed-by: Filipe Manana
    Signed-off-by: Zhao Lei
    Signed-off-by: Chris Mason

    Zhao Lei