11 Sep, 2015

1 commit

  • Pull blk-cg updates from Jens Axboe:
    "A bit later in the cycle, but this has been in the block tree for a a
    while. This is basically four patchsets from Tejun, that improve our
    buffered cgroup writeback. It was dependent on the other cgroup
    changes, but they went in earlier in this cycle.

    Series 1 is set of 5 patches that has cgroup writeback updates:

    - bdi_writeback iteration fix which could lead to some wb's being
    skipped or repeated during e.g. sync under memory pressure.

    - Simplification of wb work wait mechanism.

    - Writeback tracepoints updated to report cgroup.

    Series 2 is is a set of updates for the CFQ cgroup writeback handling:

    cfq has always charged all async IOs to the root cgroup. It didn't
    have much choice as writeback didn't know about cgroups and there
    was no way to tell who to blame for a given writeback IO.
    writeback finally grew support for cgroups and now tags each
    writeback IO with the appropriate cgroup to charge it against.

    This patchset updates cfq so that it follows the blkcg each bio is
    tagged with. Async cfq_queues are now shared across cfq_group,
    which is per-cgroup, instead of per-request_queue cfq_data. This
    makes all IOs follow the weight based IO resource distribution
    implemented by cfq.

    - Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff.

    - Other misc review points addressed, acks added and rebased.

    Series 3 is the blkcg policy cleanup patches:

    This patchset contains assorted cleanups for blkcg_policy methods
    and blk[c]g_policy_data handling.

    - alloc/free added for blkg_policy_data. exit dropped.

    - alloc/free added for blkcg_policy_data.

    - blk-throttle's async percpu allocation is replaced with direct
    allocation.

    - all methods now take blk[c]g_policy_data instead of blkcg_gq or
    blkcg.

    And finally, series 4 is a set of patches cleaning up the blkcg stats
    handling:

    blkcg's stats have always been somwhat of a mess. This patchset
    tries to improve the situation a bit.

    - The following patches added to consolidate blkcg entry point and
    blkg creation. This is in itself is an improvement and helps
    colllecting common stats on bio issue.

    - per-blkg stats now accounted on bio issue rather than request
    completion so that bio based and request based drivers can behave
    the same way. The issue was spotted by Vivek.

    - cfq-iosched implements custom recursive stats and blk-throttle
    implements custom per-cpu stats. This patchset make blkcg core
    support both by default.

    - cfq-iosched and blk-throttle keep track of the same stats
    multiple times. Unify them"

    * 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits)
    blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy
    blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/
    blkcg: implement interface for the unified hierarchy
    blkcg: misc preparations for unified hierarchy interface
    blkcg: separate out tg_conf_updated() from tg_set_conf()
    blkcg: move body parsing from blkg_conf_prep() to its callers
    blkcg: mark existing cftypes as legacy
    blkcg: rename subsystem name from blkio to io
    blkcg: refine error codes returned during blkcg configuration
    blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device()
    blkcg: reduce stack usage of blkg_rwstat_recursive_sum()
    blkcg: remove cfqg_stats->sectors
    blkcg: move io_service_bytes and io_serviced stats into blkcg_gq
    blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq
    blkcg: make blkcg_[rw]stat per-cpu
    blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it
    blkcg: consolidate blkg creation in blkcg_bio_issue_check()
    blk-throttle: improve queue bypass handling
    blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup()
    blkcg: inline [__]blkg_lookup()
    ...

    Linus Torvalds
     

19 Aug, 2015

1 commit

  • blkg (blkcg_gq) currently is created by blkcg policies invoking
    blkg_lookup_create() which ends up repeating about the same code in
    different policies. Theoretically, this can avoid the overhead of
    looking and/or creating blkg's if blkcg is enabled but no policy is in
    use; however, the cost of blkg lookup / creation is very low
    especially if only the root blkcg is in use which is highly likely if
    no blkcg policy is in active use - it boils down to a single very
    predictable conditional and surrounding RCU protection.

    This patch consolidates blkg creation to a new function
    blkcg_bio_issue_check() which is called during bio issue from
    generic_make_request_checks(). blkcg_bio_issue_check() is now the
    only function which tries to create missing blkg's. The subsequent
    policy and request_list operations just perform blkg_lookup() and if
    missing falls back to the root.

    * blk_get_rl() no longer tries to create blkg. It uses blkg_lookup()
    instead of blkg_lookup_create().

    * blk_throtl_bio() is now called from blkcg_bio_issue_check() with rcu
    read locked and blkg already looked up. Both throtl_lookup_tg() and
    throtl_lookup_create_tg() are dropped.

    * cfq is similarly updated. cfq_lookup_create_cfqg() is replaced with
    cfq_lookup_cfqg()which uses blkg_lookup().

    This consolidates blkg handling and avoids unnecessary blkg creation
    retries under memory pressure. In addition, this provides a common
    bio entry point into blkcg where things like common accounting can be
    performed.

    v2: Build fixes for !CONFIG_CFQ_GROUP_IOSCHED and
    !CONFIG_BLK_DEV_THROTTLING.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Cc: Arianna Avanzini
    Signed-off-by: Jens Axboe

    Tejun Heo
     

14 Aug, 2015

1 commit

  • The way the block layer is currently written, it goes to great lengths
    to avoid having to split bios; upper layer code (such as bio_add_page())
    checks what the underlying device can handle and tries to always create
    bios that don't need to be split.

    But this approach becomes unwieldy and eventually breaks down with
    stacked devices and devices with dynamic limits, and it adds a lot of
    complexity. If the block layer could split bios as needed, we could
    eliminate a lot of complexity elsewhere - particularly in stacked
    drivers. Code that creates bios can then create whatever size bios are
    convenient, and more importantly stacked drivers don't have to deal with
    both their own bio size limitations and the limitations of the
    (potentially multiple) devices underneath them. In the future this will
    let us delete merge_bvec_fn and a bunch of other code.

    We do this by adding calls to blk_queue_split() to the various
    make_request functions that need it - a few can already handle arbitrary
    size bios. Note that we add the call _after_ any call to
    blk_queue_bounce(); this means that blk_queue_split() and
    blk_recalc_rq_segments() don't need to be concerned with bouncing
    affecting segment merging.

    Some make_request_fn() callbacks were simple enough to audit and verify
    they don't need blk_queue_split() calls. The skipped ones are:

    * nfhd_make_request (arch/m68k/emu/nfblock.c)
    * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
    * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
    * brd_make_request (ramdisk - drivers/block/brd.c)
    * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
    * loop_make_request
    * null_queue_bio
    * bcache's make_request fns

    Some others are almost certainly safe to remove now, but will be left
    for future patches.

    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Ming Lei
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Lars Ellenberg
    Cc: drbd-user@lists.linbit.com
    Cc: Jiri Kosina
    Cc: Geoff Levand
    Cc: Jim Paris
    Cc: Philip Kelleher
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Oleg Drokin
    Cc: Andreas Dilger
    Acked-by: NeilBrown (for the 'md/md.c' bits)
    Acked-by: Mike Snitzer
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Kent Overstreet
    [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
    Signed-off-by: Dongsu Park
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

29 Jul, 2015

2 commits

  • Some places use helpers now, others don't. We only have the 'is set'
    helper, add helpers for setting and clearing flags too.

    It was a bit of a mess of atomic vs non-atomic access. With
    BIO_UPTODATE gone, we don't have any risk of concurrent access to the
    flags. So relax the restriction and don't make any of them atomic. The
    flags that do have serialization issues (reffed and chained), we
    already handle those separately.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not beeing persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

07 Jul, 2015

1 commit


27 Jun, 2015

1 commit

  • Pull device mapper fixes from Mike Snitzer:
    "Apologies for not pressing this request-based DM partial completion
    issue further, it was an oversight on my part. We'll have to get it
    fixed up properly and revisit for a future release.

    - Revert block and DM core changes the removed request-based DM's
    ability to handle partial request completions -- otherwise with the
    current SCSI LLDs these changes could lead to silent data
    corruption.

    - Fix two DM version bumps that were missing from the initial 4.2 DM
    pull request (enabled userspace lvm2 to know certain changes have
    been made)"

    * tag 'dm-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm cache policy smq: fix "default" version to be 1.4.0
    dm: bump the ioctl version to 4.32.0
    Revert "block, dm: don't copy bios for request clones"
    Revert "dm: do not allocate any mempools for blk-mq request-based DM"

    Linus Torvalds
     

26 Jun, 2015

3 commits

  • This reverts commit 5f1b670d0bef508a5554d92525f5f6d00d640b38.

    Justification for revert as reported in this dm-devel post:
    https://www.redhat.com/archives/dm-devel/2015-June/msg00160.html

    this change should not be pushed to mainline yet.

    Firstly, Christoph has a newer version of the patch that fixes silent
    data corruption problem:
    https://www.redhat.com/archives/dm-devel/2015-May/msg00229.html

    And the new version still depends on LLDDs to always complete requests
    to the end when error happens, while block API doesn't enforce such a
    requirement. If the assumption is ever broken, the inconsistency between
    request and bio (e.g. rq->__sector and rq->bio) will cause silent data
    corruption:
    https://www.redhat.com/archives/dm-devel/2015-June/msg00022.html

    Reported-by: Junichi Nomura
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     
  • Pull cgroup writeback support from Jens Axboe:
    "This is the big pull request for adding cgroup writeback support.

    This code has been in development for a long time, and it has been
    simmering in for-next for a good chunk of this cycle too. This is one
    of those problems that has been talked about for at least half a
    decade, finally there's a solution and code to go with it.

    Also see last weeks writeup on LWN:

    http://lwn.net/Articles/648292/"

    * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
    writeback, blkio: add documentation for cgroup writeback support
    vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
    writeback: do foreign inode detection iff cgroup writeback is enabled
    v9fs: fix error handling in v9fs_session_init()
    bdi: fix wrong error return value in cgwb_create()
    buffer: remove unusued 'ret' variable
    writeback: disassociate inodes from dying bdi_writebacks
    writeback: implement foreign cgroup inode bdi_writeback switching
    writeback: add lockdep annotation to inode_to_wb()
    writeback: use unlocked_inode_to_wb transaction in inode_congested()
    writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
    writeback: implement [locked_]inode_to_wb_and_lock_list()
    writeback: implement foreign cgroup inode detection
    writeback: make writeback_control track the inode being written back
    writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
    mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
    writeback: implement memcg writeback domain based throttling
    writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
    writeback: implement memcg wb_domain
    writeback: update wb_over_bg_thresh() to use wb_domain aware operations
    ...

    Linus Torvalds
     
  • Pull core block IO update from Jens Axboe:
    "Nothing really major in here, mostly a collection of smaller
    optimizations and cleanups, mixed with various fixes. In more detail,
    this contains:

    - Addition of policy specific data to blkcg for block cgroups. From
    Arianna Avanzini.

    - Various cleanups around command types from Christoph.

    - Cleanup of the suspend block I/O path from Christoph.

    - Plugging updates from Shaohua and Jeff Moyer, for blk-mq.

    - Eliminating atomic inc/dec of both remaining IO count and reference
    count in a bio. From me.

    - Fixes for SG gap and chunk size support for data-less (discards)
    IO, so we can merge these better. From me.

    - Small restructuring of blk-mq shared tag support, freeing drivers
    from iterating hardware queues. From Keith Busch.

    - A few cfq-iosched tweaks, from Tahsin Erdogan and me. Makes the
    IOPS mode the default for non-rotational storage"

    * 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits)
    cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL
    cfq-iosched: fix sysfs oops when attempting to read unconfigured weights
    cfq-iosched: move group scheduling functions under ifdef
    cfq-iosched: fix the setting of IOPS mode on SSDs
    blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file
    block, cgroup: implement policy-specific per-blkcg data
    block: Make CFQ default to IOPS mode on SSDs
    block: add blk_set_queue_dying() to blkdev.h
    blk-mq: Shared tag enhancements
    block: don't honor chunk sizes for data-less IO
    block: only honor SG gap prevention for merges that contain data
    block: fix returnvar.cocci warnings
    block, dm: don't copy bios for request clones
    block: remove management of bi_remaining when restoring original bi_end_io
    block: replace trylock with mutex_lock in blkdev_reread_part()
    block: export blkdev_reread_part() and __blkdev_reread_part()
    suspend: simplify block I/O handling
    block: collapse bio bit space
    block: remove unused BIO_RW_BLOCK and BIO_EOF flags
    block: remove BIO_EOPNOTSUPP
    ...

    Linus Torvalds
     

02 Jun, 2015

5 commits

  • Now that bdi layer can handle per-blkcg bdi_writeback_congested state,
    blk_{set|clear}_congested() can propagate non-root blkcg congestion
    state to them.

    This can be easily achieved by disabling the root_rl tests in
    blk_{set|clear}_congested(). Note that we still need those tests when
    !CONFIG_CGROUP_WRITEBACK as otherwise we'll end up flipping root blkcg
    wb's congestion state for events happening on other blkcgs.

    v2: Updated for bdi_writeback_congested.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk_{set|clear}_queue_congested() take @q and set or clear,
    respectively, the congestion state of its bdi's root wb. Because bdi
    used to be able to handle congestion state only on the root wb, the
    callers of those functions tested whether the congestion is on the
    root blkcg and skipped if not.

    This is cumbersome and makes implementation of per cgroup
    bdi_writeback congestion state propagation difficult. This patch
    renames blk_{set|clear}_queue_congested() to
    blk_{set|clear}_congested(), and makes them take request_list instead
    of request_queue and test whether the specified request_list is the
    root one before updating bdi_writeback congestion state. This makes
    the tests in the callers unnecessary and simplifies them.

    As there are no external users of these functions, the definitions are
    moved from include/linux/blkdev.h to block/blk-core.c.

    This patch doesn't introduce any noticeable behavior difference.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • cgroup writeback requires support from both bdi and filesystem sides.
    Add BDI_CAP_CGROUP_WRITEBACK and FS_CGROUP_WRITEBACK to indicate
    support and enable BDI_CAP_CGROUP_WRITEBACK on block based bdi's by
    default. Also, define CONFIG_CGROUP_WRITEBACK which is enabled if
    both MEMCG and BLK_CGROUP are enabled.

    inode_cgwb_enabled() which determines whether a given inode's both bdi
    and fs support cgroup writeback is added.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
    and the role of the separation is unclear. For cgroup support for
    writeback IOs, a bdi will be updated to host multiple wb's where each
    wb serves writeback IOs of a different cgroup on the bdi. To achieve
    that, a wb should carry all states necessary for servicing writeback
    IOs for a cgroup independently.

    This patch moves bdi->state into wb.

    * enum bdi_state is renamed to wb_state and the prefix of all enums is
    changed from BDI_ to WB_.

    * Explicit zeroing of bdi->state is removed without adding zeoring of
    wb->state as the whole data structure is zeroed on init anyway.

    * As there's still only one bdi_writeback per backing_dev_info, all
    uses of bdi->state are mechanically replaced with bdi->wb.state
    introducing no behavior changes.

    Signed-off-by: Tejun Heo
    Reviewed-by: Jan Kara
    Cc: Jens Axboe
    Cc: Wu Fengguang
    Cc: drbd-dev@lists.linbit.com
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • cgroup aware writeback support will require exposing some of blkcg
    details. In preprataion, move block/blk-cgroup.h to
    include/linux/blk-cgroup.h. This patch is pure file move.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

30 May, 2015

1 commit


22 May, 2015

1 commit

  • Currently dm-multipath has to clone the bios for every request sent
    to the lower devices, which wastes cpu cycles and ties down memory.

    This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio
    to not complete bios attached to a request, which we set on clone
    requests similar to bios in a flush sequence. With this change I/O
    errors on a path failure only get propagated to dm-multipath, which
    can then either resubmit the I/O or complete the bios on the original
    request.

    I've done some basic testing of this on a Linux target with ALUA support,
    and it survives path failures during I/O nicely.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

19 May, 2015

1 commit


13 May, 2015

1 commit

  • With commit ff36ab345 ("dm: remove request-based logic from
    make_request_fn wrapper") DM no longer calls blk_queue_bio() directly,
    so remove its export. Doing so required a forward declaration in
    blk-core.c.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

09 May, 2015

2 commits

  • Last patch makes plug work for multiple queue case. However it only
    works for single disk case, because it assumes only one request in the
    plug list. If a task is accessing multiple disks, eg MD/DM, the
    assumption is wrong. Let blk_attempt_plug_merge() record request from
    the same queue.

    V2: use NULL parameter in !mq case. Fix a bug. Add comments in
    blk_attempt_plug_merge to make it less (hopefully) confusion.

    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • Current code looks like inner plug gets flushed with a
    blk_finish_plug(). Actually it's a nop. All requests/callbacks are added
    to current->plug, while only outmost plug is assigned to current->plug.
    So inner plug always has empty request/callback list, which makes
    blk_flush_plug_list() a nop. This tries to make the code more clear.

    Signed-off-by: Shaohua Li
    Reviewed-by: Jeff Moyer
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Shaohua Li
     

06 May, 2015

1 commit


28 Apr, 2015

1 commit

  • Because of the peculiar way that md devices are created (automatically
    when the device node is opened), a new device can be created and
    registered immediately after the
    blk_unregister_region(disk_devt(disk), disk->minors);
    call in del_gendisk().

    Therefore it is important that all visible artifacts of the previous
    device are removed before this call. In particular, the 'bdi'.

    Since:
    commit c4db59d31e39ea067c32163ac961e9c80198fd37
    Author: Christoph Hellwig
    fs: don't reassign dirty inodes to default_backing_dev_info

    moved the
    device_unregister(bdi->dev);
    call from bdi_unregister() to bdi_destroy() it has been quite easy to
    lose a race and have a new (e.g.) "md127" be created after the
    blk_unregister_region() call and before bdi_destroy() is ultimately
    called by the final 'put_disk', which must come after del_gendisk().

    The new device finds that the bdi name is already registered in sysfs
    and complains

    > [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
    > [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'

    We can fix this by moving the bdi_destroy() call out of
    blk_release_queue() (which can happen very late when a refcount
    reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
    device driver calls it.

    Then it is only necessary for md to call blk_cleanup_queue() before
    del_gendisk(). As loop.c devices are also created on demand by
    opening the device node, we make the same change there.

    Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
    Reported-by: Azat Khuzhin
    Cc: Christoph Hellwig
    Cc: stable@vger.kernel.org (v4.0)
    Signed-off-by: NeilBrown
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    NeilBrown
     

25 Mar, 2015

1 commit

  • blk_init_rl() allocates a mempool using mempool_create_node() with node
    local memory. This only allocates the mempool and element list locally
    to the requeue queue node.

    What we really want to do is allocate the request itself local to the
    queue. To do this, we need our own alloc and free functions that will
    allocate from request_cachep and pass the request queue node in to prefer
    node local memory.

    Acked-by: Tejun Heo
    Signed-off-by: David Rientjes
    Signed-off-by: Jens Axboe

    David Rientjes
     

13 Feb, 2015

1 commit

  • Pull core block IO changes from Jens Axboe:
    "This contains:

    - A series from Christoph that cleans up and refactors various parts
    of the REQ_BLOCK_PC handling. Contributions in that series from
    Dongsu Park and Kent Overstreet as well.

    - CFQ:
    - A bug fix for cfq for realtime IO scheduling from Jeff Moyer.
    - A stable patch fixing a potential crash in CFQ in OOM
    situations. From Konstantin Khlebnikov.

    - blk-mq:
    - Add support for tag allocation policies, from Shaohua. This is
    a prep patch enabling libata (and other SCSI parts) to use the
    blk-mq tagging, instead of rolling their own.
    - Various little tweaks from Keith and Mike, in preparation for
    DM blk-mq support.
    - Minor little fixes or tweaks from me.
    - A double free error fix from Tony Battersby.

    - The partition 4k issue fixes from Matthew and Boaz.

    - Add support for zero+unprovision for blkdev_issue_zeroout() from
    Martin"

    * 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits)
    block: remove unused function blk_bio_map_sg
    block: handle the null_mapped flag correctly in blk_rq_map_user_iov
    blk-mq: fix double-free in error path
    block: prevent request-to-request merging with gaps if not allowed
    blk-mq: make blk_mq_run_queues() static
    dm: fix multipath regression due to initializing wrong request
    cfq-iosched: handle failure of cfq group allocation
    block: Quiesce zeroout wrapper
    block: rewrite and split __bio_copy_iov()
    block: merge __bio_map_user_iov into bio_map_user_iov
    block: merge __bio_map_kern into bio_map_kern
    block: pass iov_iter to the BLOCK_PC mapping functions
    block: add a helper to free bio bounce buffer pages
    block: use blk_rq_map_user_iov to implement blk_rq_map_user
    block: simplify bio_map_kern
    block: mark blk-mq devices as stackable
    block: keep established cmd_flags when cloning into a blk-mq request
    block: add blk-mq support to blk_insert_cloned_request()
    block: require blk_rq_prep_clone() be given an initialized clone request
    blk-mq: add tag allocation policy
    ...

    Linus Torvalds
     

29 Jan, 2015

3 commits

  • blk_mq_alloc_request() may establish REQ_MQ_INFLIGHT in addition to
    incrementing the hctx->nr_active count. Any cmd_flags that are
    established in the newly allocated clone request must be preserved in
    addition to the cmd_flags that are later copied over from the original
    request as part of blk_rq_prep_clone().

    Otherwise, if REQ_MQ_INFLIGHT isn't set in the clone request the
    hctx->nr_active count won't get decremented via blk_mq_free_request().

    The only consumer of blk_rq_prep_clone() is request-based DM, which uses
    blk_rq_init() prior to calling blk_rq_prep_clone() for the non-blk-mq
    case. Given the cloned request's cmd_flags will be 0 it is safe to OR
    them with the original request's cmd_flags for both the non-blk-mq and
    blk-mq cases.

    Reported-by: Bart Van Assche
    Signed-off-by: Keith Busch
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • If the request passed to blk_insert_cloned_request() was allocated by
    a blk-mq device it must be submitted using blk_mq_insert_request().

    Signed-off-by: Keith Busch
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • Prepare to allow blk_rq_prep_clone() to accept clone requests that were
    allocated from blk-mq request queues. As such the blk_rq_prep_clone()
    caller must first initialize the clone request.

    Signed-off-by: Keith Busch
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Keith Busch
     

21 Jan, 2015

1 commit

  • Since "BDI: Provide backing device capability information [try #3]" the
    backing_dev_info structure also provides flags for the kind of mmap
    operation available in a nommu environment, which is entirely unrelated
    to it's original purpose.

    Introduce a new nommu-only file operation to provide this information to
    the nommu mmap code instead. Splitting this from the backing_dev_info
    structure allows to remove lots of backing_dev_info instance that aren't
    otherwise needed, and entirely gets rid of the concept of providing a
    backing_dev_info for a character device. It also removes the need for
    the mtd_inodefs filesystem.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Tejun Heo
    Acked-by: Brian Norris
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

01 Jan, 2015

1 commit

  • If it's dying, we can't expect new request to complete and come
    in an wake up other tasks waiting for requests. So after we
    have marked it as dying, wake up everybody currently waiting
    for a request. Once they wake, they will retry their allocation
    and fail appropriately due to the state of the queue.

    Tested-by: Keith Busch
    Signed-off-by: Jens Axboe

    Jens Axboe
     

14 Dec, 2014

1 commit

  • Pull block driver core update from Jens Axboe:
    "This is the pull request for the core block IO changes for 3.19. Not
    a huge round this time, mostly lots of little good fixes:

    - Fix a bug in sysfs blktrace interface causing a NULL pointer
    dereference, when enabled/disabled through that API. From Arianna
    Avanzini.

    - Various updates/fixes/improvements for blk-mq:

    - A set of updates from Bart, mostly fixing buts in the tag
    handling.

    - Cleanup/code consolidation from Christoph.

    - Extend queue_rq API to be able to handle batching issues of IO
    requests. NVMe will utilize this shortly. From me.

    - A few tag and request handling updates from me.

    - Cleanup of the preempt handling for running queues from Paolo.

    - Prevent running of unmapped hardware queues from Ming Lei.

    - Move the kdump memory limiting check to be in the correct
    location, from Shaohua.

    - Initialize all software queues at init time from Takashi. This
    prevents a kobject warning when CPUs are brought online that
    weren't online when a queue was registered.

    - Single writeback fix for I_DIRTY clearing from Tejun. Queued with
    the core IO changes, since it's just a single fix.

    - Version X of the __bio_add_page() segment addition retry from
    Maurizio. Hope the Xth time is the charm.

    - Documentation fixup for IO scheduler merging from Jan.

    - Introduce (and use) generic IO stat accounting helpers for non-rq
    drivers, from Gu Zheng.

    - Kill off artificial limiting of max sectors in a request from
    Christoph"

    * 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits)
    bio: modify __bio_add_page() to accept pages that don't start a new segment
    blk-mq: Fix uninitialized kobject at CPU hotplugging
    blktrace: don't let the sysfs interface remove trace from running list
    blk-mq: Use all available hardware queues
    blk-mq: Micro-optimize bt_get()
    blk-mq: Fix a race between bt_clear_tag() and bt_get()
    blk-mq: Avoid that __bt_get_word() wraps multiple times
    blk-mq: Fix a use-after-free
    blk-mq: prevent unmapped hw queue from being scheduled
    blk-mq: re-check for available tags after running the hardware queue
    blk-mq: fix hang in bt_get()
    blk-mq: move the kdump check to blk_mq_alloc_tag_set
    blk-mq: cleanup tag free handling
    blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq cpu map
    blk: introduce generic io stat accounting help function
    blk-mq: handle the single queue case in blk_mq_hctx_next_cpu
    genhd: check for int overflow in disk_expand_part_tbl()
    blk-mq: add blk_mq_free_hctx_request()
    blk-mq: export blk_mq_free_request()
    blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable
    ...

    Linus Torvalds
     

11 Dec, 2014

1 commit

  • Pull ACPI and power management updates from Rafael Wysocki:
    "This time we have some more new material than we used to have during
    the last couple of development cycles.

    The most important part of it to me is the introduction of a unified
    interface for accessing device properties provided by platform
    firmware. It works with Device Trees and ACPI in a uniform way and
    drivers using it need not worry about where the properties come from
    as long as the platform firmware (either DT or ACPI) makes them
    available. It covers both devices and "bare" device node objects
    without struct device representation as that turns out to be necessary
    in some cases. This has been in the works for quite a few months (and
    development cycles) and has been approved by all of the relevant
    maintainers.

    On top of that, some drivers are switched over to the new interface
    (at25, leds-gpio, gpio_keys_polled) and some additional changes are
    made to the core GPIO subsystem to allow device drivers to manipulate
    GPIOs in the "canonical" way on platforms that provide GPIO
    information in their ACPI tables, but don't assign names to GPIO lines
    (in which case the driver needs to do that on the basis of what it
    knows about the device in question). That also has been approved by
    the GPIO core maintainers and the rfkill driver is now going to use
    it.

    Second is support for hardware P-states in the intel_pstate driver.
    It uses CPUID to detect whether or not the feature is supported by the
    processor in which case it will be enabled by default. However, it
    can be disabled entirely from the kernel command line if necessary.

    Next is support for a platform firmware interface based on ACPI
    operation regions used by the PMIC (Power Management Integrated
    Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
    That interface is used for manipulating power resources and for
    thermal management: sensor temperature reporting, trip point setting
    and so on.

    Also the ACPI core is now going to support the _DEP configuration
    information in a limited way. Basically, _DEP it supposed to reflect
    off-the-hierarchy dependencies between devices which may be very
    indirect, like when AML for one device accesses locations in an
    operation region handled by another device's driver (usually, the
    device depended on this way is a serial bus or GPIO controller). The
    support added this time is sufficient to make the ACPI battery driver
    work on Asus T100A, but it is general enough to be able to cover some
    other use cases in the future.

    Finally, we have a new cpufreq driver for the Loongson1B processor.

    In addition to the above, there are fixes and cleanups all over the
    place as usual and a traditional ACPICA update to a recent upstream
    release.

    As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for
    Intel platforms should be able to handle power management of the DMA
    engine correctly, the cpufreq-dt driver should interact with the
    thermal subsystem in a better way and the ACPI backlight driver should
    handle some more corner cases, among other things.

    On top of the ACPICA update there are fixes for race conditions in the
    ACPICA's interrupt handling code which might lead to some random and
    strange looking failures on some systems.

    In the cleanups department the most visible part is the series of
    commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration
    option. That was triggered by a discussion regarding the generic
    power domains code during which we realized that trying to support
    certain combinations of PM config options was painful and not really
    worth it, because nobody would use them in production anyway. For
    this reason, we decided to make CONFIG_PM_SLEEP select
    CONFIG_PM_RUNTIME and that lead to the conclusion that the latter
    became redundant and CONFIG_PM could be used instead of it. The
    material here makes that replacement in a major part of the tree, but
    there will be at least one more batch of that in the second part of
    the merge window.

    Specifics:

    - Support for retrieving device properties information from ACPI _DSD
    device configuration objects and a unified device properties
    interface for device drivers (and subsystems) on top of that. As
    stated above, this works with Device Trees and ACPI and allows
    device drivers to be written in a platform firmware (DT or ACPI)
    agnostic way. The at25, leds-gpio and gpio_keys_polled drivers are
    now going to use this new interface and the GPIO subsystem is
    additionally modified to allow device drivers to assign names to
    GPIO resources returned by ACPI _CRS objects (in case _DSD is not
    present or does not provide the expected data). The changes in
    this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron
    Lu, and Darren Hart with some fixes from others (Fabio Estevam,
    Geert Uytterhoeven).

    - Support for Hardware Managed Performance States (HWP) as described
    in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
    driver. CPUID is used to detect whether or not the feature is
    supported by the processor. If supported, it will be enabled
    automatically unless the intel_pstate=no_hwp switch is present in
    the kernel command line. From Dirk Brandewie.

    - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).

    - Support for firmware interface based on ACPI operation regions used
    by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
    platforms for power resource control and thermal management (Aaron
    Lu).

    - Limited support for retrieving off-the-hierarchy dependencies
    between devices from ACPI _DEP device configuration objects and
    deferred probing support for the ACPI battery driver based on the
    _DEP information to make that driver work on Asus T100A (Lan
    Tianyu).

    - New cpufreq driver for the Loongson1B processor (Kelvin Cheung).

    - ACPICA update to upstream revision 20141107 which only affects
    tools (Bob Moore).

    - Fixes for race conditions in the ACPICA's interrupt handling code
    and in the ACPI code related to system suspend and resume (Lv Zheng
    and Rafael J Wysocki).

    - ACPI core fix for an RCU-related issue in the ioremap() regions
    management code that slowed down significantly after CPUs had been
    allowed to enter idle states even if they'd had RCU callbakcs
    queued and triggered some problems in certain proprietary graphics
    driver (and elsewhere). The fix replaces synchronize_rcu() in that
    code with synchronize_rcu_expedited() which makes the issue go
    away. From Konstantin Khlebnikov.

    - ACPI LPSS (Low-Power Subsystem) driver fix to handle power
    management of the DMA engine included into the LPSS correctly. The
    problem is that the DMA engine doesn't have ACPI PM support of its
    own and it simply is turned off when the last LPSS device having
    ACPI PM support goes into D3cold. To work around that, the PM
    domain used by the ACPI LPSS driver is redesigned so at least one
    device with ACPI PM support will be on as long as the DMA engine is
    in use. From Andy Shevchenko.

    - ACPI backlight driver fix to avoid using it on "Win8-compatible"
    systems where it doesn't work and where it was used by default by
    mistake (Aaron Lu).

    - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
    Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin
    Chaugule (mostly related to the upcoming ARM64 support).

    - Intel RAPL (Running Average Power Limit) power capping driver fixes
    and improvements including new processor IDs (Jacob Pan).

    - Generic power domains modification to power up domains after
    attaching devices to them to meet the expectations of device
    drivers and bus types assuming devices to be accessible at probe
    time (Ulf Hansson).

    - Preliminary support for controlling device clocks from the generic
    power domains core code and modifications of the ARM/shmobile
    platform to use that feature (Ulf Hansson).

    - Assorted minor fixes and cleanups of the generic power domains core
    code (Ulf Hansson, Geert Uytterhoeven).

    - Assorted minor fixes and cleanups of the device clocks control code
    in the PM core (Geert Uytterhoeven, Grygorii Strashko).

    - Consolidation of device power management Kconfig options by making
    CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
    which is now redundant (Rafael J Wysocki and Kevin Hilman). That
    is the first batch of the changes needed for this purpose.

    - Core device runtime power management support code cleanup related
    to the execution of callbacks (Andrzej Hajda).

    - cpuidle ARM support improvements (Lorenzo Pieralisi).

    - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a
    new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
    Bartlomiej Zolnierkiewicz).

    - New cpufreq driver callback (->ready) to be executed when the
    cpufreq core is ready to use a given policy object and cpufreq-dt
    driver modification to use that callback for cooling device
    registration (Viresh Kumar).

    - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James
    Geboski, Tomeu Vizoso).

    - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
    cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
    Stefan Wahren, Petr Cvek).

    - OPP (Operating Performance Points) framework modification to allow
    OPPs to be removed too and update of a few cpufreq drivers
    (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
    during initialization) on driver removal (Viresh Kumar).

    - Hibernation core fixes and cleanups (Tina Ruchandani and Markus
    Elfring).

    - PM Kconfig fix related to CPU power management (Pankaj Dubey).

    - cpupower tool fix (Prarit Bhargava)"

    * tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits)
    i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c
    dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    tools: cpupower: fix return checks for sysfs_get_idlestate_count()
    drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME
    MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    leds: leds-gpio: Fix multiple instances registration without 'label' property
    iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    hwrandom / exynos / PM: Use CONFIG_PM in #ifdef
    block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
    USB / PM: Drop CONFIG_PM_RUNTIME from the USB core
    PM: Merge the SET*_RUNTIME_PM_OPS() macros
    ...

    Linus Torvalds
     

10 Dec, 2014

1 commit

  • blk-mq users are allowed to free the memory request_queue.tag_set
    points at after blk_cleanup_queue() has finished but before
    blk_release_queue() has started. This can happen e.g. in the SCSI
    core. The SCSI core namely embeds the tag_set structure in a SCSI
    host structure. The SCSI host structure is freed by
    scsi_host_dev_release(). This function is called after
    blk_cleanup_queue() finished but can be called before
    blk_release_queue().

    This means that it is not safe to access request_queue.tag_set from
    inside blk_release_queue(). Hence remove the blk_sync_queue() call
    from blk_release_queue(). This call is not necessary - outstanding
    requests must have finished before blk_release_queue() is
    called. Additionally, move the blk_mq_free_queue() call from
    blk_release_queue() to blk_cleanup_queue() to avoid that struct
    request_queue.tag_set gets accessed after it has been freed.

    This patch avoids that the following kernel oops can be triggered
    when deleting a SCSI host for which scsi-mq was enabled:

    Call Trace:
    [] lock_acquire+0xc4/0x270
    [] mutex_lock_nested+0x61/0x380
    [] blk_mq_free_queue+0x30/0x180
    [] blk_release_queue+0x84/0xd0
    [] kobject_cleanup+0x7b/0x1a0
    [] kobject_put+0x30/0x70
    [] blk_put_queue+0x15/0x20
    [] disk_release+0x99/0xd0
    [] device_release+0x36/0xb0
    [] kobject_cleanup+0x7b/0x1a0
    [] kobject_put+0x30/0x70
    [] put_disk+0x1a/0x20
    [] __blkdev_put+0x135/0x1b0
    [] blkdev_put+0x50/0x160
    [] kill_block_super+0x44/0x70
    [] deactivate_locked_super+0x44/0x60
    [] deactivate_super+0x4e/0x70
    [] cleanup_mnt+0x43/0x90
    [] __cleanup_mnt+0x12/0x20
    [] task_work_run+0xac/0xe0
    [] do_notify_resume+0x61/0xa0
    [] int_signal+0x12/0x17

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Robert Elliott
    Cc: Ming Lei
    Cc: Alexander Gordeev
    Cc: # v3.13+
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

04 Dec, 2014

1 commit

  • After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
    selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
    depending on CONFIG_PM_RUNTIME may now be changed to depend on
    CONFIG_PM.

    Replace CONFIG_PM_RUNTIME with CONFIG_PM in the block device core.

    Reviewed-by: Aaron Lu
    Acked-by: Jens Axboe
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

12 Nov, 2014

1 commit

  • Currently scsi piggy backs on the block layer to define the concept
    of a tagged command. But we want to be able to have block-level host-wide
    tags assigned even for untagged commands like the initial INQUIRY, so add
    a new SCSI-level flag for commands that are tagged at the scsi level, so
    that even commands without that set can have tags assigned to them. Note
    that this alredy is the case for the blk-mq code path, and this just lets
    the old path catch up with it.

    We also set this flag based upon sdev->simple_tags instead of the block
    queue flag, so that it is entirely independent of the block layer tagging,
    and thus always correct even if a driver doesn't use block level tagging
    yet.

    Also remove the old blk_rq_tagged; it was only used by SCSI drivers, and
    removing it forces them to look for the proper replacement.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mike Christie
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke

    Christoph Hellwig
     

19 Oct, 2014

1 commit

  • Pull core block layer changes from Jens Axboe:
    "This is the core block IO pull request for 3.18. Apart from the new
    and improved flush machinery for blk-mq, this is all mostly bug fixes
    and cleanups.

    - blk-mq timeout updates and fixes from Christoph.

    - Removal of REQ_END, also from Christoph. We pass it through the
    ->queue_rq() hook for blk-mq instead, freeing up one of the request
    bits. The space was overly tight on 32-bit, so Martin also killed
    REQ_KERNEL since it's no longer used.

    - blk integrity updates and fixes from Martin and Gu Zheng.

    - Update to the flush machinery for blk-mq from Ming Lei. Now we
    have a per hardware context flush request, which both cleans up the
    code should scale better for flush intensive workloads on blk-mq.

    - Improve the error printing, from Rob Elliott.

    - Backing device improvements and cleanups from Tejun.

    - Fixup of a misplaced rq_complete() tracepoint from Hannes.

    - Make blk_get_request() return error pointers, fixing up issues
    where we NULL deref when a device goes bad or missing. From Joe
    Lawrence.

    - Prep work for drastically reducing the memory consumption of dm
    devices from Junichi Nomura. This allows creating clone bio sets
    without preallocating a lot of memory.

    - Fix a blk-mq hang on certain combinations of queue depths and
    hardware queues from me.

    - Limit memory consumption for blk-mq devices for crash dump
    scenarios and drivers that use crazy high depths (certain SCSI
    shared tag setups). We now just use a single queue and limited
    depth for that"

    * 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
    block: Remove REQ_KERNEL
    blk-mq: allocate cpumask on the home node
    bio-integrity: remove the needless fail handle of bip_slab creating
    block: include func name in __get_request prints
    block: make blk_update_request print prefix match ratelimited prefix
    blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
    block: fix alignment_offset math that assumes io_min is a power-of-2
    blk-mq: Make bt_clear_tag() easier to read
    blk-mq: fix potential hang if rolling wakeup depth is too high
    block: add bioset_create_nobvec()
    block: use bio_clone_fast() in blk_rq_prep_clone()
    block: misplaced rq_complete tracepoint
    sd: Honor block layer integrity handling flags
    block: Replace strnicmp with strncasecmp
    block: Add T10 Protection Information functions
    block: Don't merge requests if integrity flags differ
    block: Integrity checksum flag
    block: Relocate bio integrity flags
    block: Add a disk flag to block integrity profile
    block: Add prefix to block integrity profile flags
    ...

    Linus Torvalds
     

13 Oct, 2014

2 commits

  • In __get_request calls to printk_ratelimited, include the function name so
    the callbacks suppressed message matches the messages that are printed,
    and add "dev" before the device name so it matches other block layer
    messages.

    Signed-off-by: Robert Elliott
    Reviewed-by: Webb Scales
    Signed-off-by: Jens Axboe

    Robert Elliott
     
  • In blk_update_request, change the printk_ratelimited
    prefix from end_request to blk_update_request so it
    matches the name printed if rate limiting occurs.

    Old:
    [10234.933106] blk_update_request: 174 callbacks suppressed
    [10234.934940] end_request: critical target error, dev sdr, sector 16
    [10234.949788] end_request: critical target error, dev sdr, sector 16

    New:
    [16863.445173] blk_update_request: 398 callbacks suppressed
    [16863.447029] blk_update_request: critical target error, dev sdr, sector
    1442066176
    [16863.449383] blk_update_request: critical target error, dev sdr, sector
    802802888
    [16863.451680] blk_update_request: critical target error, dev sdr, sector
    1609535456

    Signed-off-by: Robert Elliott
    Reviewed-by: Webb Scales
    Signed-off-by: Jens Axboe

    Robert Elliott
     

08 Oct, 2014

1 commit

  • Pull "trivial tree" updates from Jiri Kosina:
    "Usual pile from trivial tree everyone is so eagerly waiting for"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Remove MN10300_PROC_MN2WS0038
    mei: fix comments
    treewide: Fix typos in Kconfig
    kprobes: update jprobe_example.c for do_fork() change
    Documentation: change "&" to "and" in Documentation/applying-patches.txt
    Documentation: remove obsolete pcmcia-cs from Changes
    Documentation: update links in Changes
    Documentation: Docbook: Fix generated DocBook/kernel-api.xml
    score: Remove GENERIC_HAS_IOMAP
    gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
    tty: doc: Fix grammar in serial/tty
    dma-debug: modify check_for_stack output
    treewide: fix errors in printk
    genirq: fix reference in devm_request_threaded_irq comment
    treewide: fix synchronize_rcu() in comments
    checkstack.pl: port to AArch64
    doc: queue-sysfs: minor fixes
    init/do_mounts: better syntax description
    MIPS: fix comment spelling
    powerpc/simpleboot: fix comment
    ...

    Linus Torvalds
     

04 Oct, 2014

1 commit

  • Request cloning clones bios in the request to track the completion
    of each bio.
    For that purpose, we can use bio_clone_fast() instead of bio_clone()
    to avoid unnecessary allocation and copy of bvecs.

    This patch reduces memory footprint of request-based device-mapper
    (about 1-4KB for each request) and is a preparation for further
    reduction of memory usage by removing unused bvec mempool.

    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Junichi Nomura