06 Oct, 2020

1 commit

  • bio_crypt_clone() assumes its gfp_mask argument always includes
    __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.

    However, bio_crypt_clone() might be called with GFP_ATOMIC via
    setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via
    kcryptd_io_read() in drivers/md/dm-crypt.c.

    Neither case is currently reachable with a bio that actually has an
    encryption context. However, it's fragile to rely on this. Just make
    bio_crypt_clone() able to fail, analogous to bio_integrity_clone().

    Reported-by: Miaohe Lin
    Signed-off-by: Eric Biggers
    Reviewed-by: Mike Snitzer
    Reviewed-by: Satya Tangirala
    Cc: Satya Tangirala
    Signed-off-by: Jens Axboe

    Eric Biggers
     

01 Jul, 2020

1 commit


14 May, 2020

1 commit

  • We must have some way of letting a storage device driver know what
    encryption context it should use for en/decrypting a request. However,
    it's the upper layers (like the filesystem/fscrypt) that know about and
    manages encryption contexts. As such, when the upper layer submits a bio
    to the block layer, and this bio eventually reaches a device driver with
    support for inline encryption, the device driver will need to have been
    told the encryption context for that bio.

    We want to communicate the encryption context from the upper layer to the
    storage device along with the bio, when the bio is submitted to the block
    layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
    represent an encryption context (note that we can't use the bi_private
    field in struct bio to do this because that field does not function to pass
    information across layers in the storage stack). We also introduce various
    functions to manipulate the bio_crypt_ctx and make the bio/request merging
    logic aware of the bio_crypt_ctx.

    We also make changes to blk-mq to make it handle bios with encryption
    contexts. blk-mq can merge many bios into the same request. These bios need
    to have contiguous data unit numbers (the necessary changes to blk-merge
    are also made to ensure this) - as such, it suffices to keep the data unit
    number of just the first bio, since that's all a storage driver needs to
    infer the data unit number to use for each data block in each bio in a
    request. blk-mq keeps track of the encryption context to be used for all
    the bios in a request with the request's rq_crypt_ctx. When the first bio
    is added to an empty request, blk-mq will program the encryption context
    of that bio into the request_queue's keyslot manager, and store the
    returned keyslot in the request's rq_crypt_ctx. All the functions to
    operate on encryption contexts are in blk-crypto.c.

    Upper layers only need to call bio_crypt_set_ctx with the encryption key,
    algorithm and data_unit_num; they don't have to worry about getting a
    keyslot for each encryption context, as blk-mq/blk-crypto handles that.
    Blk-crypto also makes it possible for request-based layered devices like
    dm-rq to make use of inline encryption hardware by cloning the
    rq_crypt_ctx and programming a keyslot in the new request_queue when
    necessary.

    Note that any user of the block layer can submit bios with an
    encryption context, such as filesystems, device-mapper targets, etc.

    Signed-off-by: Satya Tangirala
    Reviewed-by: Eric Biggers
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Satya Tangirala
     

30 Apr, 2019

1 commit


22 Feb, 2019

1 commit

  • Block bounce needs to allocate new page for doing IO, and the
    new page has to be updated to bvec table.

    Commit 6dc4f100c switches __blk_queue_bounce() to use the new
    bio_for_each_segment_all() interface. Unfortunately the new
    bio_for_each_segment_all() can't be used to update bvec table.

    This patch fixes this issue by retrieving bvec from the table
    directly, then the new allocated page can be updated to the bio.
    This way is safe because the cloned bio is single page bvec.

    Fixes: 6dc4f100c ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec")
    Cc: Christoph Hellwig
    Cc: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

15 Feb, 2019

1 commit

  • This patch introduces one extra iterator variable to bio_for_each_segment_all(),
    then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

    Given it is just one mechannical & simple change on all bio_for_each_segment_all()
    users, this patch does tree-wide change in one single patch, so that we can
    avoid to use a temporary helper for this conversion.

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

08 Dec, 2018

2 commits

  • Prior patches ensured that any bio that interacts with a request_queue
    is properly associated with a blkg. This makes bio->bi_css unnecessary
    as blkg maintains a reference to blkcg already.

    This removes the bio field bi_css and transfers corresponding uses to
    access via bi_blkg.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • bio_issue_init among other things initializes the timestamp for an IO.
    Rather than have this logic handled by policies, this consolidates it to
    be on the init paths (normal, clone, bounce clone).

    Signed-off-by: Dennis Zhou
    Acked-by: Tejun Heo
    Reviewed-by: Liu Bo
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

13 Nov, 2018

1 commit

  • We need to copy the io priority, too; otherwise the clone will run
    with a different priority than the original one.

    Fixes: 43b62ce3ff0a ("block: move bio io prio to a new field")
    Signed-off-by: Hannes Reinecke
    Signed-off-by: Jean Delvare

    Fixed up subject, and ordered stores.

    Signed-off-by: Jens Axboe

    Hannes Reinecke
     

03 Nov, 2018

1 commit

  • Pull block layer fixes from Jens Axboe:
    "The biggest part of this pull request is the revert of the blkcg
    cleanup series. It had one fix earlier for a stacked device issue, but
    another one was reported. Rather than play whack-a-mole with this,
    revert the entire series and try again for the next kernel release.

    Apart from that, only small fixes/changes.

    Summary:

    - Indentation fixup for mtip32xx (Colin Ian King)

    - The blkcg cleanup series revert (Dennis Zhou)

    - Two NVMe fixes. One fixing a regression in the nvme request
    initialization in this merge window, causing nvme-fc to not work.
    The other is a suspend/resume p2p resource issue (James, Keith)

    - Fix sg discard merge, allowing us to merge in cases where we didn't
    before (Jianchao Wang)

    - Call rq_qos_exit() after the queue is frozen, preventing a hang
    (Ming)

    - Fix brd queue setup, fixing an oops if we fail setting up all
    devices (Ming)"

    * tag 'for-linus-20181102' of git://git.kernel.dk/linux-block:
    nvme-pci: fix conflicting p2p resource adds
    nvme-fc: fix request private initialization
    blkcg: revert blkcg cleanups series
    block: brd: associate with queue until adding disk
    block: call rq_qos_exit() after queue is frozen
    mtip32xx: clean an indentation issue, remove extraneous tabs
    block: fix the DISCARD request merge

    Linus Torvalds
     

02 Nov, 2018

1 commit

  • This reverts a series committed earlier due to null pointer exception
    bug report in [1]. It seems there are edge case interactions that I did
    not consider and will need some time to understand what causes the
    adverse interactions.

    The original series can be found in [2] with a follow up series in [3].

    [1] https://www.spinics.net/lists/cgroups/msg20719.html
    [2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
    [3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

    This reverts the following commits:
    d459d853c2ed, b2c3fa546705, 101246ec02b5, b3b9f24f5fcc, e2b0989954ae,
    f0fcb3ec89f3, c839e7a03f92, bdc2491708c4, 74b7c02a9bc1, 5bf9a1f3b4ef,
    a7b39b4e961c, 07b05bcc3213, 49f4c2dc2b50, 27e6fa996c53

    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

31 Oct, 2018

1 commit

  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include

    @@
    @@
    - #include
    + #include

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

22 Oct, 2018

1 commit

  • We're only setting up the bounce bio sets if we happen
    to need bouncing for regular HIGHMEM, not if we only need
    it for ISA devices.

    Protect the ISA bounce setup with a mutex, since it's
    being invoked from driver init functions and can thus be
    called in parallel.

    Cc: stable@vger.kernel.org
    Reported-by: Ondrej Zary
    Tested-by: Ondrej Zary
    Signed-off-by: Jens Axboe

    Jens Axboe
     

22 Sep, 2018

2 commits


25 Jul, 2018

1 commit


31 May, 2018

2 commits


08 May, 2018

1 commit

  • bounce_copy_vec() disables interrupts around kmap_atomic(). This is a
    leftover from the old kmap_atomic() implementation which relied on fixed
    mapping slots, so the caller had to make sure that the same slot could not
    be reused from an interrupting context.

    kmap_atomic() was changed to dynamic slots long ago and commit 1ec9c5ddc17a
    ("include/linux/highmem.h: remove the second argument of k[un]map_atomic()")
    removed the slot assignements, but the callers were not checked for now
    redundant interrupt disabling.

    Remove the conditional interrupt disable.

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Jens Axboe

    Sebastian Andrzej Siewior
     

26 Mar, 2018

1 commit


30 Jan, 2018

1 commit

  • Pull block updates from Jens Axboe:
    "This is the main pull request for block IO related changes for the
    4.16 kernel. Nothing major in this pull request, but a good amount of
    improvements and fixes all over the map. This contains:

    - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
    Paolo.

    - Support for SMR zones for deadline and mq-deadline from Damien and
    Christoph.

    - Set of fixes for bcache by way of Michael Lyle, including fixes
    from himself, Kent, Rui, Tang, and Coly.

    - Series from Matias for lightnvm with fixes from Hans Holmberg,
    Javier, and Matias. Mostly centered around pblk, and the removing
    rrpc 1.2 in preparation for supporting 2.0.

    - A couple of NVMe pull requests from Christoph. Nothing major in
    here, just fixes and cleanups, and support for command tracing from
    Johannes.

    - Support for blk-throttle for tracking reads and writes separately.
    From Joseph Qi. A few cleanups/fixes also for blk-throttle from
    Weiping.

    - Series from Mike Snitzer that enables dm to register its queue more
    logically, something that's alwways been problematic on dm since
    it's a stacked device.

    - Series from Ming cleaning up some of the bio accessor use, in
    preparation for supporting multipage bvecs.

    - Various fixes from Ming closing up holes around queue mapping and
    quiescing.

    - BSD partition fix from Richard Narron, fixing a problem where we
    can't mount newer (10/11) FreeBSD partitions.

    - Series from Tejun reworking blk-mq timeout handling. The previous
    scheme relied on atomic bits, but it had races where we would think
    a request had timed out if it to reused at the wrong time.

    - null_blk now supports faking timeouts, to enable us to better
    exercise and test that functionality separately. From me.

    - Kill the separate atomic poll bit in the request struct. After
    this, we don't use the atomic bits on blk-mq anymore at all. From
    me.

    - sgl_alloc/free helpers from Bart.

    - Heavily contended tag case scalability improvement from me.

    - Various little fixes and cleanups from Arnd, Bart, Corentin,
    Douglas, Eryu, Goldwyn, and myself"

    * 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
    block: remove smart1,2.h
    nvme: add tracepoint for nvme_complete_rq
    nvme: add tracepoint for nvme_setup_cmd
    nvme-pci: introduce RECONNECTING state to mark initializing procedure
    nvme-rdma: remove redundant boolean for inline_data
    nvme: don't free uuid pointer before printing it
    nvme-pci: Suspend queues after deleting them
    bsg: use pr_debug instead of hand crafted macros
    blk-mq-debugfs: don't allow write on attributes with seq_operations set
    nvme-pci: Fix queue double allocations
    block: Set BIO_TRACE_COMPLETION on new bio during split
    blk-throttle: use queue_is_rq_based
    block: Remove kblockd_schedule_delayed_work{,_on}()
    blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
    blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
    lib/scatterlist: Fix chaining support in sgl_alloc_order()
    blk-throttle: track read and write request individually
    block: add bdev_read_only() checks to common helpers
    block: fail op_is_write() requests to read-only partitions
    blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
    ...

    Linus Torvalds
     

07 Jan, 2018

2 commits


19 Dec, 2017

1 commit

  • Commit a8821f3f3("block: Improvements to bounce-buffer handling") tries
    to make sure that the bio to .make_request_fn won't exceed BIO_MAX_PAGES,
    but ignores that passthrough I/O can use blk_queue_bounce() too.
    Especially, passthrough IO may not be sector-aligned, and the check
    of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may
    become true even though the max bvec number doesn't exceed BIO_MAX_PAGES,
    then cause the bio splitted, and the original passthrough bio is submited
    to generic_make_request().

    This patch fixes this issue by checking if the bio is passthrough IO,
    and use bio_kmalloc() to allocate the cloned passthrough bio.

    Cc: NeilBrown
    Fixes: a8821f3f3("block: Improvements to bounce-buffer handling")
    Tested-by: Michele Ballabio
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

28 Jun, 2017

3 commits


22 Jun, 2017

1 commit

  • Avoid that building with W=1 causes the compiler to complain that
    a declaration for bounce_bio_set and bounce_bio_split is missing.

    References: commit a8821f3f32be ("block: Improvements to bounce-buffer handling")
    Signed-off-by: Bart Van Assche
    Cc: Neil Brown
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Ming Lei
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

19 Jun, 2017

1 commit

  • Since commit 23688bf4f830 ("block: ensure to split after potentially
    bouncing a bio") blk_queue_bounce() is called *before*
    blk_queue_split().
    This means that:
    1/ the comments blk_queue_split() about bounce buffers are
    irrelevant, and
    2/ a very large bio (more than BIO_MAX_PAGES) will no longer be
    split before it arrives at blk_queue_bounce(), leading to the
    possibility that bio_clone_bioset() will fail and a NULL
    will be dereferenced.

    Separately, blk_queue_bounce() shouldn't use fs_bio_set as the bio
    being copied could be from the same set, and this could lead to a
    deadlock.

    So:
    - allocate 2 private biosets for blk_queue_bounce, one for
    splitting enormous bios and one for cloning bios.
    - add code to split a bio that exceeds BIO_MAX_PAGES.
    - Fix up the comments in blk_queue_split()

    Credit-to: Ming Lei (suggested using single bio_for_each_segment loop)
    Reviewed-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
     

09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep arround at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

20 Sep, 2015

1 commit

  • Pull block updates from Jens Axboe:
    "This is a bit bigger than it should be, but I could (did) not want to
    send it off last week due to both wanting extra testing, and expecting
    a fix for the bounce regression as well. In any case, this contains:

    - Fix for the blk-merge.c compilation warning on gcc 5.x from me.

    - A set of back/front SG gap merge fixes, from me and from Sagi.
    This ensures that we honor SG gapping for integrity payloads as
    well.

    - Two small fixes for null_blk from Matias, fixing a leak and a
    capacity propagation issue.

    - A blkcg fix from Tejun, fixing a NULL dereference.

    - A fast clone optimization from Ming, fixing a performance
    regression since the arbitrarily sized bio's were introduced.

    - Also from Ming, a regression fix for bouncing IOs"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: fix bounce_end_io
    block: blk-merge: fast-clone bio when splitting rw bios
    block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkg
    block: Copy a user iovec if it includes gaps
    block: Refuse adding appending a gapped integrity page to a bio
    block: Refuse request/bio merges with gaps in the integrity payload
    block: Check for gaps on front and back merges
    null_blk: fix wrong capacity when bs is not 512 bytes
    null_blk: fix memory leak on cleanup
    block: fix bogus compiler warnings in blk-merge.c

    Linus Torvalds
     

18 Sep, 2015

1 commit

  • When bio bounce is involved, one new bio and its biovecs are
    cloned from the comming bio, which can be one fast-cloned bio
    from upper layer(such as dm).

    So it is obviously wrong to assume the start index of the coming(
    original) bio's io vector is zero, which can be any value between
    0 and (bi_max_vecs - 1), especially in case of bio split.

    This patch fixes Fedora's booting oops on i386, often with the
    following kernel log together:

    > [ 9.026738] systemd[1]: Switching root.
    > [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1
    > (systemd).
    > [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
    > [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178
    > index:0x16a
    > [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
    > [ 9.087284] page dumped because: page still charged to cgroup
    > [ 9.088772] bad because of flags:
    > [ 9.089731] flags: 0x21(locked|lru)
    > [ 9.090818] page->mem_cgroup:f2c3e400

    Reported-by: Josh Boyer
    Tested-by: Adam Williamson
    Cc: Ming Lin
    Cc: Mike Snitzer
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

04 Sep, 2015

1 commit

  • Pull ext3 removal, quota & udf fixes from Jan Kara:
    "The biggest change in the pull is the removal of ext3 filesystem
    driver (~28k lines removed). Ext4 driver is a full featured
    replacement these days and both RH and SUSE use it for several years
    without issues. Also there are some workarounds in VM & block layer
    mainly for ext3 which we could eventually get rid of.

    Other larger change is addition of proper error handling for
    dquot_initialize(). The rest is small fixes and cleanups"

    [ I wasn't convinced about the ext3 removal and worried about things
    falling through the cracks for legacy users, but ext4 maintainers
    piped up and were all unanimously in favor of removal, and maintaining
    all legacy ext3 support inside ext4. - Linus ]

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    udf: Don't modify filesystem for read-only mounts
    quota: remove an unneeded condition
    ext4: memory leak on error in ext4_symlink()
    mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition
    ext4: Improve ext4 Kconfig test
    block: Remove forced page bouncing under IO
    fs: Remove ext3 filesystem driver
    doc: Update doc about journalling layer
    jfs: Handle error from dquot_initialize()
    reiserfs: Handle error from dquot_initialize()
    ocfs2: Handle error from dquot_initialize()
    ext4: Handle error from dquot_initialize()
    ext2: Handle error from dquot_initalize()
    quota: Propagate error from ->acquire_dquot()

    Linus Torvalds
     

29 Jul, 2015

2 commits

  • Some places use helpers now, others don't. We only have the 'is set'
    helper, add helpers for setting and clearing flags too.

    It was a bit of a mess of atomic vs non-atomic access. With
    BIO_UPTODATE gone, we don't have any risk of concurrent access to the
    flags. So relax the restriction and don't make any of them atomic. The
    flags that do have serialization issues (reffed and chained), we
    already handle those separately.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not beeing persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 Jul, 2015

1 commit

  • JBD layer wrote back data buffers without setting PageWriteback bit.
    Thus standard mechanism for guaranteeing stable pages under IO did not
    work. Since JBD is gone now and there is no other user of the
    functionality, just remove it.

    Acked-by: Jens Axboe
    Signed-off-by: Jan Kara

    Jan Kara
     

26 Jun, 2015

2 commits

  • Pull cgroup writeback support from Jens Axboe:
    "This is the big pull request for adding cgroup writeback support.

    This code has been in development for a long time, and it has been
    simmering in for-next for a good chunk of this cycle too. This is one
    of those problems that has been talked about for at least half a
    decade, finally there's a solution and code to go with it.

    Also see last weeks writeup on LWN:

    http://lwn.net/Articles/648292/"

    * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
    writeback, blkio: add documentation for cgroup writeback support
    vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
    writeback: do foreign inode detection iff cgroup writeback is enabled
    v9fs: fix error handling in v9fs_session_init()
    bdi: fix wrong error return value in cgwb_create()
    buffer: remove unusued 'ret' variable
    writeback: disassociate inodes from dying bdi_writebacks
    writeback: implement foreign cgroup inode bdi_writeback switching
    writeback: add lockdep annotation to inode_to_wb()
    writeback: use unlocked_inode_to_wb transaction in inode_congested()
    writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
    writeback: implement [locked_]inode_to_wb_and_lock_list()
    writeback: implement foreign cgroup inode detection
    writeback: make writeback_control track the inode being written back
    writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
    mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
    writeback: implement memcg writeback domain based throttling
    writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
    writeback: implement memcg wb_domain
    writeback: update wb_over_bg_thresh() to use wb_domain aware operations
    ...

    Linus Torvalds
     
  • Pull core block IO update from Jens Axboe:
    "Nothing really major in here, mostly a collection of smaller
    optimizations and cleanups, mixed with various fixes. In more detail,
    this contains:

    - Addition of policy specific data to blkcg for block cgroups. From
    Arianna Avanzini.

    - Various cleanups around command types from Christoph.

    - Cleanup of the suspend block I/O path from Christoph.

    - Plugging updates from Shaohua and Jeff Moyer, for blk-mq.

    - Eliminating atomic inc/dec of both remaining IO count and reference
    count in a bio. From me.

    - Fixes for SG gap and chunk size support for data-less (discards)
    IO, so we can merge these better. From me.

    - Small restructuring of blk-mq shared tag support, freeing drivers
    from iterating hardware queues. From Keith Busch.

    - A few cfq-iosched tweaks, from Tahsin Erdogan and me. Makes the
    IOPS mode the default for non-rotational storage"

    * 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits)
    cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL
    cfq-iosched: fix sysfs oops when attempting to read unconfigured weights
    cfq-iosched: move group scheduling functions under ifdef
    cfq-iosched: fix the setting of IOPS mode on SSDs
    blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file
    block, cgroup: implement policy-specific per-blkcg data
    block: Make CFQ default to IOPS mode on SSDs
    block: add blk_set_queue_dying() to blkdev.h
    blk-mq: Shared tag enhancements
    block: don't honor chunk sizes for data-less IO
    block: only honor SG gap prevention for merges that contain data
    block: fix returnvar.cocci warnings
    block, dm: don't copy bios for request clones
    block: remove management of bi_remaining when restoring original bi_end_io
    block: replace trylock with mutex_lock in blkdev_reread_part()
    block: export blkdev_reread_part() and __blkdev_reread_part()
    suspend: simplify block I/O handling
    block: collapse bio bit space
    block: remove unused BIO_RW_BLOCK and BIO_EOF flags
    block: remove BIO_EOPNOTSUPP
    ...

    Linus Torvalds
     

02 Jun, 2015

1 commit

  • With the planned cgroup writeback support, backing-dev related
    declarations will be more widely used across block and cgroup;
    unfortunately, including backing-dev.h from include/linux/blkdev.h
    makes cyclic include dependency quite likely.

    This patch separates out backing-dev-defs.h which only has the
    essential definitions and updates blkdev.h to include it. c files
    which need access to more backing-dev details now include
    backing-dev.h directly. This takes backing-dev.h off the common
    include dependency chain making it a lot easier to use it across block
    and cgroup.

    v2: fs/fat build failure fixed.

    Signed-off-by: Tejun Heo
    Reviewed-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Tejun Heo