14 Apr, 2016

1 commit


13 Apr, 2016

2 commits

  • We don't have any drivers left using blk_queue_flush(), so kill it
    off. Update documentation to use the newer blk_queue_write_cache().

    Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     
  • Add an internal helper and flag for setting whether a queue has
    write back caching, or write through (or none). Add a sysfs file
    to show this as well, and make it changeable from user space.

    This will replace the (awkward) blk_queue_flush() interface that
    drivers currently use to inform the block layer of write cache state
    and capabilities.

    Signed-off-by: Jens Axboe
    Reviewed-by: Christoph Hellwig

    Jens Axboe
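
    A sketch of the conversion this pair of commits enables (the driver
    function name is hypothetical; blk_queue_write_cache() takes the queue,
    a "write back cache present" flag, and a "FUA supported" flag):

        /* old interface, now on its way out:
         *     blk_queue_flush(q, REQ_FLUSH | REQ_FUA);
         */

        /* new interface: state the cache mode directly */
        static void mydrv_configure_cache(struct request_queue *q)
        {
                /* device has a volatile write back cache and supports FUA */
                blk_queue_write_cache(q, true, true);
        }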
     

05 Apr, 2016

1 commit

  • The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
    time ago with the promise that one day it would be possible to
    implement the page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion whether the
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
    much breakage to be doable.

    Let's stop pretending that pages in the page cache are special. They
    are not.

    The changes are pretty straightforward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files. I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
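
    As a concrete illustration of the renames the script performs (a
    hypothetical filesystem fragment, not taken from the patch itself):

        /* before */
        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
        unsigned offset = pos & ~PAGE_CACHE_MASK;
        page_cache_get(page);
        /* ... use the page ... */
        page_cache_release(page);

        /* after */
        pgoff_t index = pos >> PAGE_SHIFT;
        unsigned offset = pos & ~PAGE_MASK;
        get_page(page);
        /* ... use the page ... */
        put_page(page);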
     

12 Feb, 2016

1 commit

  • The new queue limit is not used by the majority of block drivers, and
    should be initialized to 0 for the driver's requested settings to be used.

    Signed-off-by: Keith Busch
    Acked-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
     

26 Nov, 2015

1 commit

  • Commit 4f258a46346c ("sd: Fix maximum I/O size for BLOCK_PC requests")
    had the unfortunate side-effect of removing an implicit clamp to
    BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
    code. This caused problems for some SMR drives.

    Debugging this issue revealed a few problems with the existing
    infrastructure since the block layer didn't know how to deal with
    device-imposed limits, only limits set by the I/O controller.

    - Introduce a new queue limit, max_dev_sectors, which is used by the
    ULD to signal the maximum sectors for a REQ_TYPE_FS request.

    - Ensure that max_dev_sectors is correctly stacked and taken into
    account when overriding max_sectors through sysfs.

    - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
    values for later processing.

    - In sd_revalidate() set the queue's max_dev_sectors based on the
    MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
    is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
    field size.

    - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
    VPD--if reported and sane--to signal the preferred device transfer
    size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.

    - blk_limits_max_hw_sectors() is no longer used and can be removed.

    Signed-off-by: Martin K. Petersen
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581
    Reviewed-by: Christoph Hellwig
    Tested-by: sweeneygj@gmx.com
    Tested-by: Arzeets
    Tested-by: David Eisner
    Tested-by: Mario Kicherer
    Signed-off-by: Martin K. Petersen

    Martin K. Petersen
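
    The resulting clamping logic can be sketched as follows (a
    simplification of what blk_queue_max_hw_sectors() ends up doing after
    this series; field names are from struct queue_limits):

        /* start from the I/O controller limit */
        unsigned int max_sectors = limits->max_hw_sectors;

        /* apply the device-imposed limit from the Block Limits VPD, if set */
        max_sectors = min_not_zero(max_sectors, limits->max_dev_sectors);

        /* and the default soft cap for REQ_TYPE_FS requests */
        max_sectors = min_t(unsigned int, max_sectors, BLK_DEF_MAX_SECTORS);

        limits->max_sectors = max_sectors;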
     

03 Sep, 2015

1 commit

  • Pull core block updates from Jens Axboe:
    "This first core part of the block IO changes contains:

    - Cleanup of the bio IO error signaling from Christoph. We used to
    rely on the uptodate bit and passing around of an error, now we
    store the error in the bio itself.

    - Improvement of the above from myself, by shrinking the bio size
    down again to fit in two cachelines on x86-64.

    - Revert of the max_hw_sectors cap removal from a revision again,
    from Jeff Moyer. This caused performance regressions in various
    tests. Reinstate the limit, bump it to a more reasonable size
    instead.

    - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
    Most devices have huge trim limits, which can cause nasty latencies
    when deleting files. Enable the admin to configure the size down.
    We will look into having a more sane default instead of UINT_MAX
    sectors.

    - Improvement of the SG gaps logic from Keith Busch.

    - Enable the block core to handle arbitrarily sized bios, which
    enables a nice simplification of bio_add_page() (which is an IO hot
    path). From Kent.

    - Improvements to the partition io stats accounting, making it
    faster. From Ming Lei.

    - Also from Ming Lei, a basic fixup for overflow of the sysfs pending
    file in blk-mq, as well as a fix for a blk-mq timeout race
    condition.

    - Ming Lin has been carrying Kent's above-mentioned patches forward
    for a while, and testing them. Ming also did a few fixes around
    that.

    - Sasha Levin found and fixed a use-after-free problem introduced by
    the bio->bi_error changes from Christoph.

    - Small blk cgroup cleanup from Viresh Kumar"

    * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
    blk: Fix bio_io_vec index when checking bvec gaps
    block: Replace SG_GAPS with new queue limits mask
    block: bump BLK_DEF_MAX_SECTORS to 2560
    Revert "block: remove artifical max_hw_sectors cap"
    blk-mq: fix race between timeout and freeing request
    blk-mq: fix buffer overflow when reading sysfs file of 'pending'
    Documentation: update notes in biovecs about arbitrarily sized bios
    block: remove bio_get_nr_vecs()
    fs: use helper bio_add_page() instead of open coding on bi_io_vec
    block: kill merge_bvec_fn() completely
    md/raid5: get rid of bio_fits_rdev()
    md/raid5: split bio for chunk_aligned_read
    block: remove split code in blkdev_issue_{discard,write_same}
    btrfs: remove bio splitting and merge_bvec_fn() calls
    bcache: remove driver private bio splitting code
    block: simplify bio_add_page()
    block: make generic_make_request handle arbitrarily sized bios
    blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
    block: don't access bio->bi_error after bio_put()
    block: shrink struct bio down to 2 cache lines again
    ...

    Linus Torvalds
     

20 Aug, 2015

1 commit

  • The SG_GAPS queue flag caused checks for bio vector alignment against
    PAGE_SIZE, but the device may have different constraints. This patch
    adds a queue limit so a driver with such constraints can set it to
    allow requests that would otherwise have been unnecessarily split. The
    new gaps check takes the request_queue as a parameter to simplify the
    logic around invoking this function.

    This new limit makes the queue flag redundant, so remove it and all
    of its usage. Device-mappers will inherit the correct settings through
    blk_stack_limits().

    Signed-off-by: Keith Busch
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Keith Busch
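
    The new check reduces to testing bio vector boundaries against the
    device's mask instead of PAGE_SIZE. A sketch, close to the
    bvec_gap_to_prev() helper this patch reworks (queue_virt_boundary()
    returns the driver-set mask, 0 if unset):

        static inline bool bvec_gap_to_prev(struct request_queue *q,
                                            struct bio_vec *bprv,
                                            unsigned int offset)
        {
                if (!queue_virt_boundary(q))
                        return false;   /* driver imposed no constraint */

                /* a gap exists unless the next vector starts exactly where
                 * the previous one ended, relative to the boundary mask */
                return offset ||
                       ((bprv->bv_offset + bprv->bv_len) &
                        queue_virt_boundary(q));
        }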
     

19 Aug, 2015

1 commit

  • This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc.
    That commit caused performance regressions for streaming I/O
    workloads on a number of different storage devices, from
    SATA disks to external RAID arrays. It also managed to
    trip up some buggy firmware in at least one drive, causing
    data corruption.

    The next patch will bump the default max_sectors_kb value to
    1280, which will accommodate a 10-data-disk stripe write
    with chunk size 128k. In the testing I've done using iozone,
    fio, and aio-stress, a value of 1280 does not show a big
    performance difference from 512. This will hopefully still
    help the software RAID setup that Christoph saw the original
    performance gains with while still not regressing other
    storage configurations.

    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jeff Moyer
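
    The arithmetic behind the value chosen in the follow-up patch, as a
    worked check (BLK_DEF_MAX_SECTORS counts 512-byte sectors):

        /* 10 data disks x 128 KiB chunk = 1280 KiB per full stripe write */
        #define STRIPE_KB               (10 * 128)               /* 1280 */
        /* 1280 KiB / 512 B per sector = 2560 sectors */
        #define BLK_DEF_MAX_SECTORS     (STRIPE_KB * 1024 / 512) /* 2560 */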
     

14 Aug, 2015

1 commit

  • As generic_make_request() is now able to handle arbitrarily sized bios,
    it's no longer necessary for each individual block driver to define its
    own ->merge_bvec_fn() callback. Remove every invocation completely.

    Cc: Jens Axboe
    Cc: Lars Ellenberg
    Cc: drbd-user@lists.linbit.com
    Cc: Jiri Kosina
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Neil Brown
    Cc: linux-raid@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: "Martin K. Petersen"
    Acked-by: NeilBrown (for the 'md' bits)
    Acked-by: Mike Snitzer
    Signed-off-by: Kent Overstreet
    [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
    dm-era-target, and resolve merge conflicts]
    Signed-off-by: Dongsu Park
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

13 Aug, 2015

1 commit

  • Commit bcdb247c6b6a ("sd: Limit transfer length") clamped the maximum
    size of an I/O request to the MAXIMUM TRANSFER LENGTH field in the BLOCK
    LIMITS VPD. This had the unfortunate effect of also limiting the maximum
    size of non-filesystem requests sent to the device through sg/bsg.

    Avoid using blk_queue_max_hw_sectors() and set the max_sectors queue
    limit directly.

    Also update the comment in blk_limits_max_hw_sectors() to clarify that
    max_hw_sectors defines the limit for the I/O controller only.

    Signed-off-by: Martin K. Petersen
    Reported-by: Brian King
    Tested-by: Brian King
    Cc: stable@vger.kernel.org # 3.17+
    Signed-off-by: James Bottomley

    Martin K. Petersen
     

17 Jul, 2015

1 commit

  • Lots of devices support huge discard sizes these days. Depending
    on how the device handles them internally, huge discards can
    introduce massive latencies (hundreds of msec) on the device side.

    We have a sysfs file, discard_max_bytes, that advertises the max
    hardware supported discard size. Make this writeable, and split
    the settings into a soft and hard limit. This can be set from
    'discard_granularity' and up to the hardware limit.

    Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw
    set limit.

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe
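
    A sketch of what storing a new soft limit involves (a simplification,
    assuming, as elsewhere in queue_limits, that discard limits are kept
    internally in 512-byte sectors):

        /* user writes a byte count to discard_max_bytes */
        if (max_bytes & (q->limits.discard_granularity - 1))
                return -EINVAL;         /* keep it granularity-aligned */

        /* the soft limit may never exceed the hardware limit */
        q->limits.max_discard_sectors =
                min_t(u64, max_bytes >> 9,
                      q->limits.max_hw_discard_sectors);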
     

31 Mar, 2015

1 commit

  • Linux 3.19 commit 69c953c ("lib/lcm.c: lcm(n,0)=lcm(0,n) is 0, not n")
    caused blk_stack_limits() to not properly stack queue_limits for stacked
    devices (e.g. DM).

    Fix this regression by establishing lcm_not_zero() and switching
    blk_stack_limits() over to using it.

    DM uses blk_set_stacking_limits() to establish the initial top-level
    queue_limits that are then built up based on underlying devices' limits
    using blk_stack_limits(). In the case of optimal_io_size (io_opt)
    blk_set_stacking_limits() establishes a default value of 0. With commit
    69c953c, lcm(0, n) is no longer n, which compromises proper stacking of
    the underlying devices' io_opt.

    Test:
    $ modprobe scsi_debug dev_size_mb=10 num_tgts=1 opt_blks=1536
    $ cat /sys/block/sde/queue/optimal_io_size
    786432
    $ dmsetup create node --table "0 100 linear /dev/sde 0"

    Before this fix:
    $ cat /sys/block/dm-5/queue/optimal_io_size
    0

    After this fix:
    $ cat /sys/block/dm-5/queue/optimal_io_size
    786432

    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # 3.19+
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
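
    The helper is tiny; conceptually it treats a zero argument as "no
    constraint" instead of letting it collapse the result to zero (a
    sketch matching the behavior described above):

        unsigned long lcm_not_zero(unsigned long a, unsigned long b)
        {
                unsigned long l = lcm(a, b);

                if (l)
                        return l;

                /* at least one argument was 0: fall back to the other */
                return (b ? : a);
        }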
     

22 Oct, 2014

1 commit

  • Set max_sectors to the value the driver provides as the hardware limit
    by default. Linux has had proper I/O throttling for a long time and
    doesn't rely on an artificially small maximum I/O size anymore. By not
    limiting the I/O size by default we remove an annoying tuning step
    required for most Linux installations.

    Note that both the user and, if absolutely required, the driver can
    still impose a limit for FS requests below max_hw_sectors_kb.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

09 Oct, 2014

1 commit

  • The math in both blk_stack_limits() and queue_limit_alignment_offset()
    assumes that a block device's io_min (aka minimum_io_size) is always a
    power-of-2. Fix the math so that it works for a non-power-of-2 io_min.

    This issue (of alignment_offset != 0) became apparent when testing
    dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
    1280K. Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data
    block size") unlocked the potential for alignment_offset != 0 due to
    the dm-thin-pool's io_min possibly being a non-power-of-2.

    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
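
    The essence of the fix is replacing power-of-2 mask arithmetic with a
    true modulo. A sketch of the alignment-offset side (granularity is in
    bytes; sector_div() divides a sector_t in place and returns the
    remainder):

        static int queue_limit_alignment_offset(struct queue_limits *lim,
                                                sector_t sector)
        {
                unsigned int granularity = max(lim->physical_block_size,
                                               lim->io_min);
                unsigned int alignment;

                /* broken for non-power-of-2 io_min (assumes 2^n):
                 *     alignment = (sector << 9) & (granularity - 1);
                 */

                /* correct for any granularity: take an actual remainder */
                alignment = sector_div(sector, granularity >> 9) << 9;

                return (granularity + lim->alignment_offset - alignment)
                        % granularity;
        }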
     

11 Jun, 2014

1 commit

  • Commit 762380ad9322 added support for chunk sizes and no merging
    across them, but it broke the rule of always allowing the addition of
    a single page to an empty bio. So relax the restriction a bit to allow
    for that, similarly to what we have always done.

    This fixes a crash with mkfs.xfs and 512b sector sizes on NVMe.

    Reported-by: Keith Busch
    Signed-off-by: Jens Axboe

    Jens Axboe
     

06 Jun, 2014

1 commit

  • Some drivers have different limits on what size a request should
    optimally be, depending on the offset of the request. Similar to
    dividing a device into chunks. Add a setting that allows the driver
    to inform the block layer of such a chunk size. The block layer will
    then prevent merging across the chunks.

    This is needed to optimally support NVMe with a non-zero stripe size.

    Signed-off-by: Jens Axboe

    Jens Axboe
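
    With a chunk size configured, the largest allowed request at a given
    offset is simply the distance to the next chunk boundary. A sketch
    close to the blk_max_size_offset() helper introduced for this (the
    mask trick assumes chunk_sectors is a power of 2):

        static inline unsigned int blk_max_size_offset(struct request_queue *q,
                                                       sector_t offset)
        {
                if (!q->limits.chunk_sectors)
                        return q->limits.max_sectors;  /* no chunking set */

                /* sectors remaining until the next chunk boundary */
                return q->limits.chunk_sectors -
                       (offset & (q->limits.chunk_sectors - 1));
        }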
     

09 Jan, 2014

1 commit

  • Now that we've got code for raid5/6 stripe awareness, bcache just needs
    to know about the stripes and when writing partial stripes is expensive
    - we probably don't want to enable this optimization for raid1 or 10,
    even though they have stripes. So add a flag to queue_limits.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     

14 Nov, 2013

1 commit

  • Pull block IO core updates from Jens Axboe:
    "This is the pull request for the core changes in the block layer for
    3.13. It contains:

    - The new blk-mq request interface.

    This is a new and more scalable queueing model that marries the
    best part of the request based interface we currently have (which
    is fully featured, but scales poorly) and the bio based "interface"
    which the new drivers for high IOPS devices end up using because
    it's much faster than the request based one.

    The bio interface has no block layer support, since it taps into
    the stack much earlier. This means that drivers end up having to
    implement a lot of functionality on their own, like tagging,
    timeout handling, requeue, etc. The blk-mq interface provides all
    these. Some drivers even provide a switch to select bio or rq and
    have code to handle both, since things like merging only work in
    the rq model and hence are faster for some workloads. This is a
    huge mess. Conversion of these drivers nets us a substantial code
    reduction. Initial results on converting SCSI to this model even
    show an 8x improvement on single queue devices. So while the
    model was intended to work on the newer multiqueue devices, it has
    substantial improvements for "classic" hardware as well. This code
    has gone through extensive testing and development, it's now ready
    to go. A pull request to convert virtio-blk to this model will be
    coming as well, with more drivers scheduled for 3.14 conversion.

    - Two blktrace fixes from Jan and Chen Gang.

    - A plug merge fix from Alireza Haghdoost.

    - Conversion of __get_cpu_var() from Christoph Lameter.

    - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.

    - A fix for a race between request completion and the timeout
    handling from Jeff Moyer. This is what caused the merge conflict
    with blk-mq/core, in case you are looking at that.

    - A dm stacking fix from Mike Snitzer.

    - A code consolidation fix and duplicated code removal from Kent
    Overstreet.

    - A handful of block bug fixes from Mikulas Patocka, fixing a loop
    crash and memory corruption on blk cg.

    - Elevator switch bug fix from Tomoki Sekiyama.

    A heads-up that I had to rebase this branch. Initially the immutable
    bio_vecs had been queued up for inclusion, but a week later, it became
    clear that it wasn't fully cooked yet. So the decision was made to
    pull this out and postpone it until 3.14. It was a straight forward
    rebase, just pruning out the immutable series and the later fixes of
    problems with it. The rest of the patches applied directly and no
    further changes were made"

    * 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
    block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
    block: Do not call sector_div() with a 64-bit divisor
    kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
    block: Consolidate duplicated bio_trim() implementations
    block: Use rw_copy_check_uvector()
    block: Enable sysfs nomerge control for I/O requests in the plug list
    block: properly stack underlying max_segment_size to DM device
    elevator: acquire q->sysfs_lock in elevator_change()
    elevator: Fix a race in elevator switching and md device initialization
    block: Replace __get_cpu_var uses
    bdi: test bdi_init failure
    block: fix a probe argument to blk_register_region
    loop: fix crash if blk_alloc_queue fails
    blk-core: Fix memory corruption if blkcg_init_queue fails
    block: fix race between request completion and timeout handling
    blktrace: Send BLK_TN_PROCESS events to all running traces
    blk-mq: don't disallow request merges for req->special being set
    blk-mq: mq plug list breakage
    blk-mq: fix for flush deadlock
    ...

    Linus Torvalds
     

09 Nov, 2013

1 commit

  • Without this patch all DM devices will default to BLK_MAX_SEGMENT_SIZE
    (65536) even if the underlying device(s) have a larger value -- this is
    due to blk_stack_limits() using min_not_zero() when stacking the
    max_segment_size limit.

    underlying device:
    1073741824

    DM device before patch:
    65536

    DM device after patch:
    1073741824

    Reported-by: Lukasz Flis
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # v3.3+
    Signed-off-by: Jens Axboe

    Mike Snitzer
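
    The pitfall and the fix in miniature (min_not_zero() returns the
    smaller of its arguments while treating 0 as "unset"; the numbers
    follow the example above, and the UINT_MAX starting point is what
    blk_set_stacking_limits() provides after the patch):

        /* stacking: t = top-level (DM) limits, b = bottom device limits */
        t->max_segment_size = min_not_zero(t->max_segment_size,
                                           b->max_segment_size);

        /* before: min_not_zero(65536, 1073741824)    == 65536 (too small) */
        /* after:  min_not_zero(UINT_MAX, 1073741824) == 1073741824        */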
     

31 Oct, 2013

1 commit


15 Dec, 2012

1 commit


20 Sep, 2012

1 commit

  • The WRITE SAME command supported on some SCSI devices allows the same
    block to be efficiently replicated throughout a block range. Only a
    single logical block is transferred from the host and the storage device
    writes the same data to all blocks described by the I/O.

    This patch implements support for WRITE SAME in the block layer. The
    blkdev_issue_write_same() function can be used by filesystems and block
    drivers to replicate a buffer across a block range. This can be used to
    efficiently initialize software RAID devices, etc.

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
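
    A caller-side sketch (blkdev_issue_write_same() takes the device, a
    start sector and length in 512-byte sectors, an allocation mask, and
    the single page holding the block to replicate; bdev, page, and the
    range are assumed to be set up by the caller):

        int ret;

        /* replicate one logical block across the whole range */
        ret = blkdev_issue_write_same(bdev, start_sector, nr_sectors,
                                      GFP_KERNEL, page);
        if (ret)
                /* e.g. device lacks WRITE SAME: fall back to plain writes */
                return ret;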
     

01 Aug, 2012

1 commit

  • blk_set_stacking_limits is intended to allow stacking drivers to build
    up the limits of the stacked device based on the underlying devices'
    limits. But defaulting 'max_sectors' to BLK_DEF_MAX_SECTORS (1024)
    doesn't allow the stacking driver to inherit a max_sectors larger than
    1024 -- due to blk_stack_limits' use of min_not_zero.

    It is now clear that this artificial limit is getting in the way, so
    change blk_set_stacking_limits()'s max_sectors to UINT_MAX (which
    allows stacking drivers like dm-multipath to inherit 'max_sectors'
    from the underlying paths).

    Reported-by: Vijay Chauhan
    Tested-by: Vijay Chauhan
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

11 Jan, 2012

1 commit

  • Stacking driver queue limits are typically bounded exclusively by the
    capabilities of the low level devices, not by the stacking driver
    itself.

    This patch introduces blk_set_stacking_limits() which has more liberal
    metrics than the default queue limits function. This allows us to
    inherit topology parameters from bottom devices without manually
    tweaking the default limits in each driver prior to calling the stacking
    function.

    Since there is now a clear distinction between stacking and low-level
    devices, blk_set_default_limits() has been modified to carry the more
    conservative values that we used to manually set in
    blk_queue_make_request().

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
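
    The split can be sketched as follows (an abbreviation of the two
    helpers; the exact field lists are elided):

        void blk_set_default_limits(struct queue_limits *lim)
        {
                /* conservative values for low-level drivers -- what used
                 * to be set manually in blk_queue_make_request() */
                lim->max_segments = BLK_MAX_SEGMENTS;
                lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
                /* ... */
        }

        void blk_set_stacking_limits(struct queue_limits *lim)
        {
                blk_set_default_limits(lim);

                /* then open everything up, so the stacked device inherits
                 * its limits from the devices below it */
                lim->max_segments = USHRT_MAX;
                lim->max_hw_sectors = UINT_MAX;
                /* ... */
        }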
     

18 May, 2011

1 commit

  • In some cases we would end up stacking discard_zeroes_data incorrectly.
    Fix this by enabling the feature by default for stacking drivers and
    clearing it for low-level drivers. Incorporating a device that does not
    support dzd will then cause the feature to be disabled in the stacking
    driver.

    Also ensure that the maximum discard value does not overflow when
    exported in sysfs and return 0 in the alignment and dzd fields for
    devices that don't support discard.

    Reported-by: Lukas Czerner
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
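
    The stacking rule for the flag reduces to a logical AND across all
    components (a one-line sketch of the blk_stack_limits() behavior
    described above; t is the stacking driver's limits, b the bottom
    device's):

        /* stacking drivers start with dzd enabled; any component that
         * cannot zero on discard clears it for the whole stack */
        t->discard_zeroes_data &= b->discard_zeroes_data;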
     

07 May, 2011

1 commit

  • Flush requests aren't queueable in some drives. Add a flag to let the
    driver notify the block layer about this. We can optimize flush
    performance with this knowledge.

    Stable: 2.6.39 only

    Cc: stable@kernel.org
    Signed-off-by: Shaohua Li
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    shaohua.li@intel.com
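
    A driver-side sketch of the new hint (blk_queue_flush_queueable() is
    the setter added for this flag; the call site is hypothetical):

        /* this device cannot queue a flush alongside other commands */
        blk_queue_flush_queueable(q, false);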
     

18 Apr, 2011

1 commit


12 Apr, 2011

1 commit


10 Mar, 2011

2 commits


03 Mar, 2011

1 commit

  • There does not seem to be a clear convention for whether q->queue_lock
    is initialized or not when blk_cleanup_queue() is called. In the past
    it was not necessary, but now blk_throtl_exit() takes the queue lock
    by default and needs the queue lock to be available.

    In fact the elevator_exit() code has a similar requirement, just less
    stringent in the sense that elevator_exit() is called only if the
    elevator is initialized.

    Two problems have been noticed because of this ambiguity about the
    spin lock status.

    - If a driver calls blk_alloc_queue() and then calls
    blk_cleanup_queue() almost immediately (because some other
    driver structure allocation failed or some other error happened),
    then blk_throtl_exit() will run into issues, as the queue lock is
    not initialized. The loop driver ran into this issue recently, and
    I noticed error paths in the md driver too. Similar error paths
    should exist in other drivers too.

    - If a driver provided an external spin lock and zapped the lock
    before blk_cleanup_queue(), that can also lead to issues.

    So this patch initializes the default queue lock at queue allocation
    time.

    The block throttling code is one of the users of the queue lock and
    is initialized at queue allocation time, so it makes sense to also
    initialize ->queue_lock to the internal lock there. A driver can
    override that lock later. This takes care of the issue, so a driver
    no longer has to worry about initializing the queue lock to the
    default before calling blk_cleanup_queue().

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
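
    The fix in miniature (a sketch of the allocation-time initialization;
    struct request_queue carries an internal __queue_lock for this
    purpose):

        /* in blk_alloc_queue(): default to the internal lock */
        spin_lock_init(&q->__queue_lock);
        q->queue_lock = &q->__queue_lock;

        /* a driver may still override q->queue_lock later, e.g. via
         * blk_init_queue(), with its own external lock */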
     

17 Dec, 2010

2 commits

  • Implement blk_limits_max_hw_sectors() and make
    blk_queue_max_hw_sectors() a wrapper around it.

    DM needs this to avoid setting queue_limits' max_hw_sectors and
    max_sectors directly. dm_set_device_limits() now leverages
    blk_limits_max_hw_sectors() logic to establish the appropriate
    max_hw_sectors minimum (PAGE_SIZE). This fixes an issue where DM was
    incorrectly setting max_sectors rather than max_hw_sectors (which
    caused dm_merge_bvec()'s max_hw_sectors check to be ineffective).

    Signed-off-by: Mike Snitzer
    Cc: stable@kernel.org
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
     
  • When stacking devices, a request_queue is not always available. This
    forced us to have a no_cluster flag in the queue_limits that could be
    used as a carrier until the request_queue had been set up for a
    metadevice.

    There were several problems with that approach. First of all, it was
    up to the stacking device to remember to set the queue flag after
    stacking had completed. Also, the queue flag and the queue limits had to be kept in
    sync at all times. We got that wrong, which could lead to us issuing
    commands that went beyond the max scatterlist limit set by the driver.

    The proper fix is to avoid having two flags for tracking the same thing.
    We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
    block layer merging functions. The queue_limit 'no_cluster' is turned
    into 'cluster' to avoid double negatives and to ease stacking.
    Clustering defaults to being enabled as before. The queue flag logic is
    removed from the stacking function, and explicitly setting the cluster
    flag is no longer necessary in DM and MD.

    Reported-by: Ed Lin
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     

14 Oct, 2010

1 commit


01 Oct, 2010

2 commits


25 Sep, 2010

1 commit

  • The bounce_pfn of the request queue on 64-bit systems is set to the
    current max_low_pfn. Adding more memory later makes this incorrect.
    Memory allocated beyond this boot-time max_low_pfn appears to require
    bounce buffers (bounce buffers are not actually allocated, but they
    are used in calculating segments, which may result in "over max
    segments limit" errors).

    Signed-off-by: Malahal Naineni
    Signed-off-by: Jens Axboe

    Malahal Naineni
     

11 Sep, 2010

1 commit

  • Some controllers have a hardware limit on the number of protection
    information scatter-gather list segments they can handle.

    Introduce a max_integrity_segments limit in the block layer and provide
    a new scsi_host_template setting that allows HBA drivers to provide a
    value suitable for the hardware.

    Add support for honoring the integrity segment limit when merging both
    bios and requests.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
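
    An HBA-side sketch of how the limit would be surfaced (the template is
    illustrative; sg_prot_tablesize is the scsi_host_template field this
    work adds, and blk_queue_max_integrity_segments() is the block-layer
    setter):

        /* the SCSI host template advertises the hardware's limit on
         * protection-information scatter-gather segments */
        static struct scsi_host_template mydrv_template = {
                /* ... */
                .sg_prot_tablesize = 64,        /* illustrative value */
        };

        /* the block layer then records it per queue: */
        blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);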