18 Jan, 2020

1 commit

  • commit 83c9c547168e8b914ea6398430473a4de68c52cc upstream.

    Commit 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    adds bio_truncate() for handling bio EOD. However, bio_truncate()
    doesn't use the passed 'op' parameter from guard_bio_eod's callers.

    So bio_truncate() may retrieve the wrong 'op', and zeroing pages may
    not be done for READ bio.

    Fix this issue by moving guard_bio_eod() after bio_set_op_attrs()
    in submit_bh_wbc() so that bio_truncate() can always retrieve the correct
    op info.

    Meanwhile, remove the 'op' parameter from guard_bio_eod() because it isn't
    used anymore.

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixes: 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    Signed-off-by: Ming Lei
    Signed-off-by: Greg Kroah-Hartman

    Fold in kerneldoc and bio_op() change.

    Signed-off-by: Jens Axboe

    Ming Lei
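
    The ordering problem described in this entry can be illustrated with a
    small user-space model. This is only a sketch: toy_bio, toy_truncate()
    and toy_submit_read() are hypothetical stand-ins, not the kernel
    functions, and the TOY_OP_UNSET state is an assumption made to show a
    bio whose op has not been filled in yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the kernel types; every name here is hypothetical. */
enum toy_op { TOY_OP_UNSET, TOY_OP_READ, TOY_OP_WRITE };

struct toy_bio {
    enum toy_op op;      /* operation recorded in the bio itself */
    unsigned char *data; /* payload buffer */
    size_t size;         /* bytes the bio covers */
};

/* Cut the bio down to max_size; zero the cut-off tail only for reads. */
static bool toy_truncate(struct toy_bio *bio, size_t max_size)
{
    bool zeroed = false;

    assert(bio->data != NULL);
    if (bio->size <= max_size)
        return false;
    if (bio->op == TOY_OP_READ) {
        memset(bio->data + max_size, 0, bio->size - max_size);
        zeroed = true;
    }
    bio->size = max_size;
    return zeroed;
}

/*
 * Submit a read of len bytes when only dev_left bytes remain on the device.
 * guard_first == true models the old order (truncate, then set the op);
 * guard_first == false models the fixed order (set the op, then truncate).
 * Returns whether the tail past the end of the device was zeroed.
 */
static bool toy_submit_read(unsigned char *buf, size_t len, size_t dev_left,
                            bool guard_first)
{
    struct toy_bio bio = { .op = TOY_OP_UNSET, .data = buf, .size = len };
    bool zeroed;

    if (guard_first) {
        zeroed = toy_truncate(&bio, dev_left); /* op is still unset here */
        bio.op = TOY_OP_READ;
    } else {
        bio.op = TOY_OP_READ;
        zeroed = toy_truncate(&bio, dev_left); /* op is already READ */
    }
    return zeroed;
}
```

    In this model, calling toy_submit_read() with guard_first set leaves the
    tail of the buffer untouched, while the fixed order zeroes it.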
     

09 Jan, 2020

1 commit

  • [ Upstream commit 85a8ce62c2eabe28b9d76ca4eecf37922402df93 ]

    Some filesystems, such as vfat, may send a bio which crosses the device
    boundary, and worse, an IO request starting within device boundaries
    can contain more than one segment past EOD.

    Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    tries to fix this issue by returning -EIO for this situation. However,
    this way fs user code loses the chance to handle -EIO, and sync_inodes_sb()
    may then hang forever.

    Also, the current truncation of the last segment is dangerous because it
    updates the last bvec: the bvec table is no longer immutable, and fs bio
    users may not retrieve the truncated pages via bio_for_each_segment_all() in
    their .end_io callback.

    Fix this issue by supporting multi-segment truncation. The approach
    is also simpler:

    - just update the bio size, since the block layer can make the correct
    bvec with the updated bio size. The bvec table then becomes really immutable.

    - zero all truncated segments for read bio

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixed-by: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Ming Lei
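
    The two steps listed in this entry (shrink only the bio size, zero every
    truncated segment for reads) can be sketched in user space as follows.
    toy_vec, toy_bio and toy_bio_truncate() are hypothetical simplifications
    and do not reproduce the kernel's bio_truncate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for bio_vec and bio. */
struct toy_vec {
    unsigned char *buf;
    size_t len;
};

struct toy_bio {
    const struct toy_vec *vecs; /* segment table, never modified */
    size_t nr_vecs;
    size_t size;                /* bytes of the table that are in use */
    bool is_read;
};

/* Shrink the bio to new_size; for reads, zero every byte that was cut off. */
static void toy_bio_truncate(struct toy_bio *bio, size_t new_size)
{
    size_t done = 0; /* bytes of the bio walked so far */

    assert(bio->vecs != NULL);
    if (new_size >= bio->size)
        return;

    if (bio->is_read) {
        for (size_t i = 0; i < bio->nr_vecs && done < bio->size; i++) {
            const struct toy_vec *v = &bio->vecs[i];
            size_t len = v->len;

            if (len > bio->size - done)
                len = bio->size - done;
            if (done + len > new_size) {
                /* Offset of the first byte in this segment past the new end. */
                size_t off = done < new_size ? new_size - done : 0;

                memset(v->buf + off, 0, len - off);
            }
            done += len;
        }
    }
    bio->size = new_size; /* only the size changes, not the table */
}
```

    Because only the size field is written, a completion handler that walks
    the full segment table still sees every original segment, including the
    ones that were zeroed.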
     

16 Jul, 2019

1 commit

  • Pull more block updates from Jens Axboe:
    "A later pull request with some followup items. I had some vacation
    coming up to the merge window, so certain items were delayed a
    bit. This pull request also contains fixes that came in within the
    last few days of the merge window, which I didn't want to push right
    before sending you a pull request.

    This contains:

    - NVMe pull request, mostly fixes, but also a few minor items on the
    feature side that were timing constrained (Christoph et al)

    - Report zones fixes (Damien)

    - Removal of dead code (Damien)

    - Turn on cgroup psi memstall (Josef)

    - block cgroup MAINTAINERS entry (Konstantin)

    - Flush init fix (Josef)

    - blk-throttle low iops timing fix (Konstantin)

    - nbd resize fixes (Mike)

    - nbd 0 blocksize crash fix (Xiubo)

    - block integrity error leak fix (Wenwen)

    - blk-cgroup writeback and priority inheritance fixes (Tejun)"

    * tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits)
    MAINTAINERS: add entry for block io cgroup
    null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED
    block: Limit zone array allocation size
    sd_zbc: Fix report zones buffer allocation
    block: Kill gfp_t argument of blkdev_report_zones()
    block: Allow mapping of vmalloc-ed buffers
    block/bio-integrity: fix a memory leak bug
    nvme: fix NULL deref for fabrics options
    nbd: add netlink reconfigure resize support
    nbd: fix crash when the blksize is zero
    block: Disable write plugging for zoned block devices
    block: Fix elevator name declaration
    block: Remove unused definitions
    nvme: fix regression upon hot device removal and insertion
    blk-throttle: fix zero wait time for iops throttled group
    block: Fix potential overflow in blk_report_zones()
    blkcg: implement REQ_CGROUP_PUNT
    blkcg, writeback: Implement wbc_blkcg_css()
    blkcg, writeback: Add wbc->no_cgroup_owner
    blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner()
    ...

    Linus Torvalds
     

10 Jul, 2019

1 commit


28 Jun, 2019

1 commit


21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

01 May, 2019

2 commits

  • In iomap_write_end, we're not holding a page reference anymore when
    calling the page_done callback, but the callback needs that reference to
    access the page. To fix that, move the put_page call in
    __generic_write_end into the callers of __generic_write_end. Then, in
    iomap_write_end, put the page after calling the page_done callback.

    Reported-by: Jan Kara
    Fixes: 63899c6f8851 ("iomap: add a page_done callback")
    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Andreas Gruenbacher
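
    The reference ordering described in this entry can be modelled with a
    plain counter. This is a sketch under stated assumptions: toy_page and
    the toy_* functions are hypothetical and only mimic the order of
    operations, not the real page cache API.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
    int refcount; /* references currently held */
    int done;     /* how many times the callback saw a live page */
};

static void toy_put_page(struct toy_page *page)
{
    assert(page->refcount > 0);
    page->refcount--;
}

/* Stand-in for a page_done callback: only valid while a reference is held. */
static bool toy_page_done(struct toy_page *page)
{
    if (page->refcount <= 0)
        return false; /* the page could already be gone */
    page->done++;
    return true;
}

/* Generic helper after the change: it no longer drops the reference. */
static void toy_generic_write_end(struct toy_page *page)
{
    (void)page; /* bookkeeping elided in this sketch */
}

/* Caller: run the callback first, then drop the reference. */
static bool toy_iomap_write_end(struct toy_page *page)
{
    bool ok;

    toy_generic_write_end(page);
    ok = toy_page_done(page);
    toy_put_page(page);
    return ok;
}
```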
     
  • The VFS-internal __generic_write_end helper always returns the value of
    its @copied argument. This can be confusing, and it isn't very useful
    anyway, so turn __generic_write_end into a function returning void
    instead.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Andreas Gruenbacher
     

01 Mar, 2019

1 commit

  • guard_bio_eod() can truncate a segment in bio to allow it to do IO on
    odd last sectors of a device.

    It already checks if the IO starts past EOD, but it does not consider
    the possibility that an IO request starting within device boundaries can
    contain more than one segment past EOD.

    In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
    underflow bvec->bv_len.

    Fix this by checking if truncated_bytes is lower than PAGE_SIZE.

    This situation has been found on filesystems such as isofs and vfat,
    which don't check the device size before mount. If the device is
    smaller than the filesystem itself, a readahead on such a filesystem
    that spans EOD can trigger this situation, leading to a call to
    zero_user() with a wrong size, possibly corrupting memory.

    I didn't see any crash, or didn't let the system run long enough to
    check if memory corruption will be hit somewhere, but adding
    instrumentation to guard_bio_eod() to check the truncated_bytes size was
    enough to see the error.

    The following script can trigger the error.

    MNT=/mnt
    IMG=./DISK.img
    DEV=/dev/loop0

    mkfs.vfat $IMG
    mount $IMG $MNT
    cp -R /etc $MNT &> /dev/null
    umount $MNT

    losetup -D

    losetup --find --show --sizelimit 16247280 $IMG
    mount $DEV $MNT

    find $MNT -type f -exec cat {} + >/dev/null

    Kudos to Eric Sandeen for coming up with the reproducer above

    Reviewed-by: Ming Lei
    Signed-off-by: Carlos Maiolino
    Signed-off-by: Jens Axboe

    Carlos Maiolino
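
    The underflow described in this entry is ordinary unsigned wrap-around.
    The sketch below uses hypothetical helpers and an assumed 4096-byte
    page. It compares against the segment length, which is at most one page
    in this model, whereas the message states the check in terms of
    PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for the sketch */

/*
 * Shrink the last segment by truncated_bytes.
 * Returns false and leaves *seg_len alone when the cut is larger than the
 * segment can absorb, which is the case that used to wrap around.
 */
static bool toy_trim_last_segment(unsigned int *seg_len,
                                  unsigned int truncated_bytes)
{
    assert(seg_len != NULL && *seg_len <= TOY_PAGE_SIZE);
    if (truncated_bytes > *seg_len)
        return false;
    *seg_len -= truncated_bytes;
    return true;
}

/* What the unchecked subtraction yields in unsigned arithmetic. */
static unsigned int toy_unchecked_trim(unsigned int seg_len,
                                       unsigned int truncated_bytes)
{
    return seg_len - truncated_bytes; /* wraps if truncated_bytes > seg_len */
}
```

    With a one-page segment and two pages to cut, toy_unchecked_trim()
    returns a value far larger than a page, which is the kind of length that
    would then be handed to a zeroing routine.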
     

15 Feb, 2019

2 commits

  • Pull in 5.0-rc6 to avoid a dumb merge conflict with fs/iomap.c.
    This is needed since io_uring is now based on the block branch,
    to avoid a conflict between the multi-page bvecs and the bits
    of io_uring that touch the core block parts.

    * tag 'v5.0-rc6': (525 commits)
    Linux 5.0-rc6
    x86/mm: Make set_pmd_at() paravirt aware
    MAINTAINERS: Update the ocores i2c bus driver maintainer, etc
    blk-mq: remove duplicated definition of blk_mq_freeze_queue
    Blk-iolatency: warn on negative inflight IO counter
    blk-iolatency: fix IO hang due to negative inflight counter
    MAINTAINERS: unify reference to xen-devel list
    x86/mm/cpa: Fix set_mce_nospec()
    futex: Handle early deadlock return correctly
    futex: Fix barrier comment
    net: dsa: b53: Fix for failure when irq is not defined in dt
    blktrace: Show requests without sector
    mips: cm: reprime error cause
    mips: loongson64: remove unreachable(), fix loongson_poweroff().
    sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
    geneve: should not call rt6_lookup() when ipv6 was disabled
    KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
    KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
    kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
    signal: Better detection of synchronous signals
    ...

    Jens Axboe
     
    Once multi-page bvec is enabled, the last bvec may include more than one
    page, so this patch uses mp_bvec_last_segment() to truncate the bio.

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

07 Feb, 2019

1 commit

    When something makes __find_get_block_slow() hit the all_mapped path, it
    calls printk() 100+ times per second. But there is no need to print the
    same message at such a high frequency; it is just asking for a stall
    warning, or at least bloating log files.

    [ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
    [ 399.873324][T15342] b_state=0x00000029, b_size=512
    [ 399.878403][T15342] device loop0 blocksize: 4096
    [ 399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
    [ 399.890400][T15342] b_state=0x00000029, b_size=512
    [ 399.895595][T15342] device loop0 blocksize: 4096
    [ 399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
    [ 399.907471][T15342] b_state=0x00000029, b_size=512
    [ 399.912506][T15342] device loop0 blocksize: 4096

    This patch reduces the frequency to at most once per second, in addition
    to concatenating the three lines into one.

    [ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096

    Signed-off-by: Tetsuo Handa
    Reviewed-by: Jan Kara
    Cc: Dmitry Vyukov
    Signed-off-by: Jens Axboe

    Tetsuo Handa
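
    The behaviour described in this entry (at most one message per second,
    printed as a single line) can be sketched in user space. toy_ratelimit
    and toy_report_failure() are hypothetical. This is not the kernel's
    ratelimit API, and time is passed in as whole seconds to keep the
    example deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_ratelimit {
    long last_s;     /* second at which a message was last printed */
    long interval_s; /* minimum gap between messages, in seconds */
    bool primed;     /* false until the first message goes out */
};

/* Returns true when a message may be printed at time now_s. */
static bool toy_ratelimit_ok(struct toy_ratelimit *rl, long now_s)
{
    assert(rl != NULL);
    if (rl->primed && now_s - rl->last_s < rl->interval_s)
        return false;
    rl->primed = true;
    rl->last_s = now_s;
    return true;
}

/* Print the three former lines as one, subject to the rate limit. */
static bool toy_report_failure(struct toy_ratelimit *rl, long now_s,
                               unsigned long block, unsigned long b_blocknr,
                               unsigned long b_state, unsigned int b_size,
                               const char *dev, unsigned int blocksize)
{
    if (!toy_ratelimit_ok(rl, now_s))
        return false;
    printf("__find_get_block_slow() failed. block=%lu, b_blocknr=%lu, "
           "b_state=0x%08lx, b_size=%u, device %s blocksize: %u\n",
           block, b_blocknr, b_state, b_size, dev, blocksize);
    return true;
}
```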
     

05 Jan, 2019

1 commit


08 Dec, 2018

1 commit

  • One of the goals of this series is to remove a separate reference to
    the css of the bio. This can and should be accessed via bio_blkcg(). In
    this patch, wbc_init_bio() now requires a bio to have a device
    associated with it.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

03 Nov, 2018

1 commit

  • Pull block layer fixes from Jens Axboe:
    "The biggest part of this pull request is the revert of the blkcg
    cleanup series. It had one fix earlier for a stacked device issue, but
    another one was reported. Rather than play whack-a-mole with this,
    revert the entire series and try again for the next kernel release.

    Apart from that, only small fixes/changes.

    Summary:

    - Indentation fixup for mtip32xx (Colin Ian King)

    - The blkcg cleanup series revert (Dennis Zhou)

    - Two NVMe fixes. One fixing a regression in the nvme request
    initialization in this merge window, causing nvme-fc to not work.
    The other is a suspend/resume p2p resource issue (James, Keith)

    - Fix sg discard merge, allowing us to merge in cases where we didn't
    before (Jianchao Wang)

    - Call rq_qos_exit() after the queue is frozen, preventing a hang
    (Ming)

    - Fix brd queue setup, fixing an oops if we fail setting up all
    devices (Ming)"

    * tag 'for-linus-20181102' of git://git.kernel.dk/linux-block:
    nvme-pci: fix conflicting p2p resource adds
    nvme-fc: fix request private initialization
    blkcg: revert blkcg cleanups series
    block: brd: associate with queue until adding disk
    block: call rq_qos_exit() after queue is frozen
    mtip32xx: clean an indentation issue, remove extraneous tabs
    block: fix the DISCARD request merge

    Linus Torvalds
     

02 Nov, 2018

1 commit

    This reverts a series committed earlier due to a null pointer exception
    bug report in [1]. It seems there are edge case interactions that I did
    not consider, and I will need some time to understand what causes the
    adverse interactions.

    The original series can be found in [2] with a follow up series in [3].

    [1] https://www.spinics.net/lists/cgroups/msg20719.html
    [2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
    [3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

    This reverts the following commits:
    d459d853c2ed, b2c3fa546705, 101246ec02b5, b3b9f24f5fcc, e2b0989954ae,
    f0fcb3ec89f3, c839e7a03f92, bdc2491708c4, 74b7c02a9bc1, 5bf9a1f3b4ef,
    a7b39b4e961c, 07b05bcc3213, 49f4c2dc2b50, 27e6fa996c53

    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

29 Oct, 2018

1 commit

  • Pull XArray conversion from Matthew Wilcox:
    "The XArray provides an improved interface to the radix tree data
    structure, providing locking as part of the API, specifying GFP flags
    at allocation time, eliminating preloading, less re-walking the tree,
    more efficient iterations and not exposing RCU-protected pointers to
    its users.

    This patch set

    1. Introduces the XArray implementation

    2. Converts the pagecache to use it

    3. Converts memremap to use it

    The page cache is the most complex and important user of the radix
    tree, so converting it was most important. Converting the memremap
    code removes the only other user of the multiorder code, which allows
    us to remove the radix tree code that supported it.

    I have 40+ followup patches to convert many other users of the radix
    tree over to the XArray, but I'd like to get this part in first. The
    other conversions haven't been in linux-next and aren't suitable for
    applying yet, but you can see them in the xarray-conv branch if you're
    interested"

    * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
    radix tree: Remove multiorder support
    radix tree test: Convert multiorder tests to XArray
    radix tree tests: Convert item_delete_rcu to XArray
    radix tree tests: Convert item_kill_tree to XArray
    radix tree tests: Move item_insert_order
    radix tree test suite: Remove multiorder benchmarking
    radix tree test suite: Remove __item_insert
    memremap: Convert to XArray
    xarray: Add range store functionality
    xarray: Move multiorder_check to in-kernel tests
    xarray: Move multiorder_shrink to kernel tests
    xarray: Move multiorder account test in-kernel
    radix tree test suite: Convert iteration test to XArray
    radix tree test suite: Convert tag_tagged_items to XArray
    radix tree: Remove radix_tree_clear_tags
    radix tree: Remove radix_tree_maybe_preload_order
    radix tree: Remove split/join code
    radix tree: Remove radix_tree_update_node_t
    page cache: Finish XArray conversion
    dax: Convert page fault handlers to XArray
    ...

    Linus Torvalds
     

21 Oct, 2018

1 commit


22 Sep, 2018

1 commit

  • One of the goals of this series is to remove a separate reference to
    the css of the bio. This can and should be accessed via bio_blkcg. In
    this patch, the wbc_init_bio call is changed such that it must be called
    after a queue has been associated with the bio.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou (Facebook)
     

30 Aug, 2018

1 commit


18 Aug, 2018

1 commit

    The buffer_head can consume a significant amount of system memory and is
    directly related to the amount of page cache. In our production
    environment we have observed that a lot of machines are spending a
    significant amount of memory on buffer_heads, which cannot be left as
    system memory overhead.

    Charging buffer_head is not as simple as adding __GFP_ACCOUNT to the
    allocation. The buffer_heads can be allocated in a memcg different from
    the memcg of the page for which buffer_heads are being allocated. One
    concrete example is memory reclaim. The reclaim can trigger I/O of
    pages of any memcg on the system. So, the right way to charge
    buffer_head is to extract the memcg from the page for which buffer_heads
    are being allocated and then use targeted memcg charging API.

    [shakeelb@google.com: use __GFP_ACCOUNT for directed memcg charging]
    Link: http://lkml.kernel.org/r/20180702220208.213380-1-shakeelb@google.com
    Link: http://lkml.kernel.org/r/20180627191250.209150-3-shakeelb@google.com
    Signed-off-by: Shakeel Butt
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jan Kara
    Cc: Amir Goldstein
    Cc: Greg Thelen
    Cc: Vladimir Davydov
    Cc: Roman Gushchin
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shakeel Butt
     

20 Jun, 2018

2 commits

  • In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as
    new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW
    flag set. This will be used by filesystems like gfs2, which allocate
    blocks in iomap->begin.

    Minor corrections to the comment for IOMAP_UNWRITTEN maps.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Andreas Gruenbacher
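
    The rule stated in this entry reduces to a small predicate. The sketch
    below follows the wording of the message only. The TOY_* constants are
    hypothetical stand-ins for the iomap types and flags, and any further
    conditions in the real iomap_to_bh() are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the iomap mapping types and flags. */
enum toy_iomap_type { TOY_IOMAP_HOLE, TOY_IOMAP_MAPPED, TOY_IOMAP_UNWRITTEN };

#define TOY_IOMAP_F_NEW 0x01u

/* Should a buffer head backed by this mapping be marked as new? */
static bool toy_bh_is_new(enum toy_iomap_type type, unsigned int flags)
{
    if (type == TOY_IOMAP_UNWRITTEN)
        return true; /* unwritten maps were already treated as new */
    /* Added case: mapped blocks that were allocated by this operation. */
    return type == TOY_IOMAP_MAPPED && (flags & TOY_IOMAP_F_NEW) != 0;
}
```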
     
  • Bits of the buffer.c based write_end implementations that don't know
    about buffer_heads and can be reused by other implementations.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Andreas Gruenbacher
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

02 Jun, 2018

1 commit

  • This function is only used by the iomap code, depends on being called
    from it, and will soon stop poking into buffer head internals.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Andreas Gruenbacher
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

13 Apr, 2018

1 commit


12 Apr, 2018

2 commits

  • Remove the address_space ->tree_lock and use the xa_lock newly added to
    the radix_tree_root. Rename the address_space ->page_tree to ->i_pages,
    since we don't really care that it's a tree.

    [willy@infradead.org: fix nds32, fs/dax.c]
    Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.org
    Link: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jeff Layton
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Ryusuke Konishi
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • XFS currently contains a copy-and-paste of __set_page_dirty(). Export
    it from buffer.c instead.

    Link: http://lkml.kernel.org/r/20180313132639.17387-6-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jeff Layton
    Reviewed-by: Darrick J. Wong
    Cc: Ryusuke Konishi
    Cc: Dave Chinner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

06 Apr, 2018

1 commit

  • Prior to commit d47992f86b30 ("mm: change invalidatepage prototype to
    accept length"), an offset of 0 meant that the full page was being
    invalidated. After that commit, we need to instead check the length.

    Jan said:
    :
    : The only possible issue is that try_to_release_page() was called more
    : often than necessary. Otherwise the issue is harmless but still it's good
    : to have this fixed.

    Link: http://lkml.kernel.org/r/x49fu5rtnzs.fsf@segfault.boston.devel.redhat.com
    Fixes: d47992f86b307 ("mm: change invalidatepage prototype to accept length")
    Signed-off-by: Jeff Moyer
    Reviewed-by: Jan Kara
    Cc: Lukas Czerner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
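
    The difference between the old and new full-page test described in this
    entry can be shown with two predicates. Both are hypothetical helpers
    with an assumed 4096-byte page, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for the sketch */

/* Old rule: an offset of 0 was taken to mean "the whole page". */
static bool toy_full_page_old(unsigned int offset)
{
    return offset == 0;
}

/* New rule: the invalidated range must actually span the whole page. */
static bool toy_full_page_new(unsigned int offset, unsigned int length)
{
    assert(offset <= TOY_PAGE_SIZE && length <= TOY_PAGE_SIZE - offset);
    return length == TOY_PAGE_SIZE;
}
```

    For a partial invalidation of the first 512 bytes, the old rule reports
    a full page while the new rule does not, which matches the "called more
    often than necessary" remark quoted in the entry.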
     

19 Mar, 2018

1 commit

  • There are 2 distinct freezing mechanisms - one operates on block
    devices and another one directly on super blocks. Both end up with the
    same result, but thaw of only one of these does not thaw the other.

    In particular fsfreeze --freeze uses the ioctl variant going to the
    super block. Since prior to this patch emergency thaw was not doing
    a relevant thaw, filesystems frozen with this method remained
    unaffected.

    The patch is a hack which adds blind unfreezing.

    In order to keep the super block write-locked the whole time the code
    is shuffled around and the newly introduced __iterate_supers is
    employed.

    Signed-off-by: Mateusz Guzik
    Signed-off-by: Al Viro

    Mateusz Guzik
     

01 Feb, 2018

1 commit

  • Pull misc vfs updates from Al Viro:
    "All kinds of misc stuff, without any unifying topic, from various
    people.

    Neil's d_anon patch, several bugfixes, introduction of kvmalloc
    analogue of kmemdup_user(), extending bitfield.h to deal with
    fixed-endians, assorted cleanups all over the place..."

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
    alpha: osf_sys.c: use timespec64 where appropriate
    alpha: osf_sys.c: fix put_tv32 regression
    jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
    dcache: delete unused d_hash_mask
    dcache: subtract d_hash_shift from 32 in advance
    fs/buffer.c: fold init_buffer() into init_page_buffers()
    fs: fold __inode_permission() into inode_permission()
    fs: add RWF_APPEND
    sctp: use vmemdup_user() rather than badly open-coding memdup_user()
    snd_ctl_elem_init_enum_names(): switch to vmemdup_user()
    replace_user_tlv(): switch to vmemdup_user()
    new primitive: vmemdup_user()
    memdup_user(): switch to GFP_USER
    eventfd: fold eventfd_ctx_get() into eventfd_ctx_fileget()
    eventfd: fold eventfd_ctx_read() into eventfd_read()
    eventfd: convert to use anon_inode_getfd()
    nfs4file: get rid of pointless include of btrfs.h
    uvc_v4l2: clean copyin/copyout up
    vme_user: don't use __copy_..._user()
    usx2y: don't bother with memdup_user() for 16-byte structure
    ...

    Linus Torvalds
     

26 Jan, 2018

1 commit


07 Jan, 2018

1 commit


16 Nov, 2017

1 commit

  • Every pagevec_init user claims the pages being released are hot even in
    cases where it is unlikely the pages are hot. As no one cares about the
    hotness of pages being released to the allocator, just ditch the
    parameter.

    No performance impact is expected as the overhead is marginal. The
    parameter is removed simply because it is a bit stupid to have a useless
    parameter copied everywhere.

    Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Dave Hansen
    Cc: Jan Kara
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

15 Nov, 2017

2 commits

  • Pull core block layer updates from Jens Axboe:
    "This is the main pull request for block storage for 4.15-rc1.

    Nothing out of the ordinary in here, and no API changes or anything
    like that. Just various new features for drivers, core changes, etc.
    In particular, this pull request contains:

    - A patch series from Bart, closing the hole on blk/scsi-mq queue
    quiescing.

    - A series from Christoph, building towards hidden gendisks (for
    multipath) and ability to move bio chains around.

    - NVMe
    - Support for native multipath for NVMe (Christoph).
    - Userspace notifications for AENs (Keith).
    - Command side-effects support (Keith).
    - SGL support (Chaitanya Kulkarni)
    - FC fixes and improvements (James Smart)
    - Lots of fixes and tweaks (Various)

    - bcache
    - New maintainer (Michael Lyle)
    - Writeback control improvements (Michael)
    - Various fixes (Coly, Elena, Eric, Liang, et al)

    - lightnvm updates, mostly centered around the pblk interface
    (Javier, Hans, and Rakesh).

    - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)

    - Writeback series that fix the much discussed hundreds of millions
    of sync-all units. This goes all the way, as discussed previously
    (me).

    - Fix for missing wakeup on writeback timer adjustments (Yafang
    Shao).

    - Fix laptop mode on blk-mq (me).

    - {mq,name} tuple lookup for IO schedulers, allowing us to have
    alias names. This means you can use 'deadline' on both !mq and on
    mq (where it's called mq-deadline). (me).

    - blktrace race fix, oopsing on sg load (me).

    - blk-mq optimizations (me).

    - Obscure waitqueue race fix for kyber (Omar).

    - NBD fixes (Josef).

    - Disable writeback throttling by default on bfq, like we do on cfq
    (Luca Miccio).

    - Series from Ming that enable us to treat flush requests on blk-mq
    like any other request. This is a really nice cleanup.

    - Series from Ming that improves merging on blk-mq with schedulers,
    getting us closer to flipping the switch on scsi-mq again.

    - BFQ updates (Paolo).

    - blk-mq atomic flags memory ordering fixes (Peter Z).

    - Loop cgroup support (Shaohua).

    - Lots of minor fixes from lots of different folks, both for core and
    driver code"

    * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
    nvme: fix visibility of "uuid" ns attribute
    blk-mq: fixup some comment typos and lengths
    ide: ide-atapi: fix compile error with defining macro DEBUG
    blk-mq: improve tag waiting setup for non-shared tags
    brd: remove unused brd_mutex
    blk-mq: only run the hardware queue if IO is pending
    block: avoid null pointer dereference on null disk
    fs: guard_bio_eod() needs to consider partitions
    xtensa/simdisk: fix compile error
    nvme: expose subsys attribute to sysfs
    nvme: create 'slaves' and 'holders' entries for hidden controllers
    block: create 'slaves' and 'holders' entries for hidden gendisks
    nvme: also expose the namespace identification sysfs files for mpath nodes
    nvme: implement multipath access to nvme subsystems
    nvme: track shared namespaces
    nvme: introduce a nvme_ns_ids structure
    nvme: track subsystems
    block, nvme: Introduce blk_mq_req_flags_t
    block, scsi: Make SCSI quiesce and resume work reliably
    block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
    ...

    Linus Torvalds
     
  • Pull ext4 updates from Ted Ts'o:

    - Add support for online resizing of file systems with bigalloc

    - Fix two data corruption bugs involving DAX, as well as a corruption
    bug after a crash during a racing fallocate and delayed allocation.

    - Finally, a number of cleanups and optimizations.

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: improve smp scalability for inode generation
    ext4: add support for online resizing with bigalloc
    ext4: mention noload when recovering on read-only device
    Documentation: fix little inconsistencies
    ext4: convert timers to use timer_setup()
    jbd2: convert timers to use timer_setup()
    ext4: remove duplicate extended attributes defs
    ext4: add ext4_should_use_dax()
    ext4: add sanity check for encryption + DAX
    ext4: prevent data corruption with journaling + DAX
    ext4: prevent data corruption with inline data + DAX
    ext4: fix interaction between i_size, fallocate, and delalloc after a crash
    ext4: retry allocations conservatively
    ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
    ext4: Add iomap support for inline data
    iomap: Add IOMAP_F_DATA_INLINE flag
    iomap: Switch from blkno to disk offset

    Linus Torvalds
     

11 Nov, 2017

1 commit

  • guard_bio_eod() needs to look at the partition capacity, not just the
    capacity of the whole device, when determining if truncation is
    necessary.

    [ 60.268688] attempt to access beyond end of device
    [ 60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
    [ 60.268693] buffer_io_error: 2 callbacks suppressed
    [ 60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read

    Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
    Cc: stable@vger.kernel.org # v4.13
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Greg Edwards
    Signed-off-by: Jens Axboe

    Greg Edwards
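
    The limit check described in this entry can be sketched as a function
    that counts the sectors of an IO lying past a given limit.
    toy_sectors_past_end() is a hypothetical helper. The point of the entry
    is which limit is passed in: the partition size rather than the size of
    the whole device.

```c
#include <assert.h>

typedef unsigned long long toy_sector_t;

/*
 * Number of sectors of the IO [sector, sector + nr) that lie past limit.
 * All values are in sectors, relative to the start of the partition.
 */
static toy_sector_t toy_sectors_past_end(toy_sector_t limit,
                                         toy_sector_t sector, toy_sector_t nr)
{
    assert(nr > 0);
    if (sector >= limit)
        return nr;            /* the whole IO is out of range */
    if (nr <= limit - sector)
        return 0;             /* the IO fits */
    return nr - (limit - sector);
}
```

    Using limit=67103506 from the log above and an assumed 8-sector IO
    starting at sector 67103501 (so that want=67103509), the function
    reports 3 sectors to truncate. With a larger limit, such as a made-up
    whole-device size, it reports 0 and the IO would go out untruncated.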
     

25 Oct, 2017

1 commit

  • …READ_ONCE()/WRITE_ONCE()

    Please do not apply this to mainline directly, instead please re-run the
    coccinelle script shown below and apply its output.

    For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
    preference to ACCESS_ONCE(), and new code is expected to use one of the
    former. So far, there's been no reason to change most existing uses of
    ACCESS_ONCE(), as these aren't harmful, and changing them results in
    churn.

    However, for some features, the read/write distinction is critical to
    correct operation. To distinguish these cases, separate read/write
    accessors must be used. This patch migrates (most) remaining
    ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
    coccinelle script:

    ----
    // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
    // WRITE_ONCE()

    // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

    virtual patch

    @ depends on patch @
    expression E1, E2;
    @@

    - ACCESS_ONCE(E1) = E2
    + WRITE_ONCE(E1, E2)

    @ depends on patch @
    expression E;
    @@

    - ACCESS_ONCE(E)
    + READ_ONCE(E)
    ----

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: davem@davemloft.net
    Cc: linux-arch@vger.kernel.org
    Cc: mpe@ellerman.id.au
    Cc: shuah@kernel.org
    Cc: snitzer@redhat.com
    Cc: thor.thayer@linux.intel.com
    Cc: tj@kernel.org
    Cc: viro@zeniv.linux.org.uk
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Mark Rutland
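
    The read/write split described in this entry can be illustrated with
    user-space macros. TOY_READ_ONCE() and TOY_WRITE_ONCE() are hypothetical
    and much simpler than the kernel's definitions. They assume a compiler
    that supports __typeof__ (GCC or Clang) and rely on a volatile access to
    keep the compiler from merging or repeating the load or store.

```c
#include <assert.h>

/* Hypothetical user-space approximations; GCC/Clang __typeof__ assumed. */
#define TOY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int toy_shared;

/* Was: ACCESS_ONCE(toy_shared) = v; */
static void toy_publish(int v)
{
    TOY_WRITE_ONCE(toy_shared, v);
}

/* Was: return ACCESS_ONCE(toy_shared); */
static int toy_observe(void)
{
    return TOY_READ_ONCE(toy_shared);
}
```

    The two accessors correspond to the two rules of the coccinelle script
    above: an assignment through ACCESS_ONCE() becomes a write accessor, and
    any other use becomes a read accessor.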
     

03 Oct, 2017

3 commits

  • Since the previous commit removed any case where grow_buffers()
    would return failure due to memory allocations, we can safely
    remove the case where we have to call free_more_memory() in
    this function.

    Since this is also the last user of free_more_memory(), kill
    it off completely.

    Reviewed-by: Nikolay Borisov
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jens Axboe
     
    We currently use it for find_or_create_page(), which means that it
    cannot fail. Ensure we also pass in 'retry == true' to
    alloc_page_buffers(), which also ensures that it cannot fail.

    After this, there are no failure cases in grow_dev_page() that
    occur because of a failed memory allocation.

    Reviewed-by: Nikolay Borisov
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Instead of adding weird retry logic in that function, utilize
    __GFP_NOFAIL to ensure that the vm takes care of handling any
    potential retries appropriately. This means we don't have to
    call free_more_memory() from here.

    Reviewed-by: Nikolay Borisov
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jens Axboe