11 Feb, 2014

1 commit

  • Switch to using a preallocated flush_rq for blk-mq similar to what's done
    with the old request path. This allows us to set up the request properly
    with a tag from the actually allowed range and ->rq_disk as needed by
    some drivers. To make life easier we also switch to dynamic allocation
    of ->flush_rq for the old path.

    This effectively reverts most of

    "blk-mq: fix for flush deadlock"

    and

    "blk-mq: Don't reserve a tag for flush request"

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

25 Oct, 2013

1 commit

  • Linux currently has two models for block devices:

    - The classic request_fn based approach, where drivers use struct
    request units for IO. The block layer provides various helper
    functionalities to let drivers share code, things like tag
    management, timeout handling, queueing, etc.

    - The "stacked" approach, where a driver squeezes in between the
    block layer and IO submitter. Since this bypasses the IO stack,
    drivers generally have to manage everything themselves.

    With drivers being written for new high IOPS devices, the classic
    request_fn based driver doesn't work well enough. The design dates
    back to when both SMP and high IOPS were rare. It has problems with
    scaling to bigger machines, and runs into scaling issues even on
    smaller machines when you have IOPS in the hundreds of thousands
    per device.

    The stacked approach is then most often selected as the model
    for the driver. But this means that everybody has to re-invent
    everything, and along with that we get all the problems again
    that the shared approach solved.

    This commit introduces blk-mq, block multi queue support. The
    design is centered around per-cpu queues for queueing IO, which
    then funnel down into x number of hardware submission queues.
    We might have a 1:1 mapping between the two, or it might be
    an N:M mapping. That all depends on what the hardware supports.

    blk-mq provides various helper functions, which include:

    - Scalable support for request tagging. Most devices need to
    be able to uniquely identify a request both in the driver and
    to the hardware. The tagging uses per-cpu caches for freed
    tags, to enable cache hot reuse.

    - Timeout handling without tracking requests on a per-device
    basis. Basically the driver should be able to get a notification
    if a request happens to fail.

    - Optional support for non 1:1 mappings between issue and
    submission queues. blk-mq can redirect IO completions to the
    desired location.

    - Support for per-request payloads. Drivers almost always need
    to associate a request structure with some driver private
    command structure. Drivers can tell blk-mq this at init time,
    and then any request handed to the driver will have the
    required size of memory associated with it.

    - Support for merging of IO, and plugging. The stacked model
    gets neither of these. Even for high IOPS devices, merging
    sequential IO reduces per-command overhead and thus
    increases bandwidth.

    For now, this is provided as a potential 3rd queueing model, with
    the hope being that, as it matures, it can replace both the classic
    and stacked model. That would get us back to having just 1 real
    model for block devices, leaving the stacked approach to dm/md
    devices (as it was originally intended).
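
    The per-request payload helper above, in sketch form (hypothetical
    names; the real blk-mq code differs): request and driver command are
    allocated as one piece, so no second allocation is needed per IO.

    /* illustrative only */
    struct example_request {
        int tag;
        /* cmd_size bytes of driver payload follow immediately */
    };

    static void *example_rq_payload(struct example_request *rq)
    {
        return rq + 1;      /* driver private command lives right here */
    }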

    Contributions in this patch from the following people:

    Shaohua Li
    Alexander Gordeev
    Christoph Hellwig
    Mike Christie
    Matias Bjorling
    Jeff Moyer

    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

04 Apr, 2013

1 commit

  • As found by gcc-4.8, the QUEUE_SYSFS_BIT_FNS macro creates functions
    that use a value generated by queue_var_store independent of whether
    that value was set or not.

    block/blk-sysfs.c: In function 'queue_store_nonrot':
    block/blk-sysfs.c:244:385: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]

    Unlike most other such warnings, this one is not a false positive:
    writing any non-number string into the sysfs files indeed has
    an undefined result rather than returning an error.
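
    The shape of the fix, condensed (hypothetical function name; the real
    code is generated by the QUEUE_SYSFS_BIT_FNS macro): the store function
    bails out before touching val when parsing fails.

    static ssize_t queue_store_example(struct request_queue *q,
                                       const char *page, size_t count)
    {
        unsigned long val;
        ssize_t ret = queue_var_store(&val, page, count);

        if (ret < 0)        /* val is untouched on parse failure */
            return ret;

        /* only now is val known to hold a valid number */
        return ret;
    }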

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     

06 Dec, 2012

1 commit

  • QUEUE_FLAG_DEAD is used to indicate that queuing new requests must
    stop. After this flag has been set queue draining starts. However,
    during the queue draining phase it is still safe to invoke the
    queue's request_fn, so QUEUE_FLAG_DYING is a better name for this
    flag.

    This patch has been generated by running the following command
    over the kernel source tree:

    git grep -lEw 'blk_queue_dead|QUEUE_FLAG_DEAD' |
    xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g' \
    -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g'; \
    sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \
    include/linux/blkdev.h; \
    sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \
    -e 's/Dead queue/A dying queue/' block/blk-core.c

    Signed-off-by: Bart Van Assche
    Acked-by: Tejun Heo
    Cc: James Bottomley
    Cc: Mike Christie
    Cc: Jens Axboe
    Cc: Chanho Min
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

21 Sep, 2012

1 commit

  • …_init_allocated_queue()

    b82d4b197c ("blkcg: make request_queue bypassing on allocation") made
    request_queues bypassed on allocation to avoid switching on and off
    bypass mode on a queue being initialized. Some drivers allocate and
    then destroy a lot of queues without fully initializing them and
    incurring bypass latency overhead on each of them could add up to
    significant overhead.

    Unfortunately, blk_init_allocated_queue() is never used by queues of
    bio-based drivers, which means that all bio-based driver queues are in
    bypass mode even after initialization and registration complete
    successfully.

    Due to the limited way request_queues are used by bio drivers, this
    problem is hidden pretty well but it shows up when blk-throttle is
    used in combination with a bio-based driver. Trying to configure
    (echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
    indefinitely in blkg_conf_prep() waiting for bypass mode to end.

    This patch moves the initial blk_queue_bypass_end() call from
    blk_init_allocated_queue() to blk_register_queue() which is called for
    any userland-visible queues regardless of their type.

    I believe this is correct because I don't think there is any block
    driver which needs or wants working elevator and blk-cgroup on a queue
    which isn't visible to userland. If there are such users, we need a
    different solution.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-by: Joseph Glanville <joseph.glanville@orionvm.com.au>
    Cc: stable@vger.kernel.org
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

    Tejun Heo
     

20 Sep, 2012

1 commit

  • The WRITE SAME command supported on some SCSI devices allows the same
    block to be efficiently replicated throughout a block range. Only a
    single logical block is transferred from the host and the storage device
    writes the same data to all blocks described by the I/O.

    This patch implements support for WRITE SAME in the block layer. The
    blkdev_issue_write_same() function can be used by filesystems and block
    drivers to replicate a buffer across a block range. This can be used to
    efficiently initialize software RAID devices, etc.
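
    A hedged usage sketch (kernel-internal caller; assumes the page holds
    at least one logical block worth of data):

    #include <linux/blkdev.h>

    static int replicate_page(struct block_device *bdev, sector_t start,
                              sector_t nr_sects, struct page *page)
    {
        /* one block is transferred from the host; the device writes
         * that data to every block in [start, start + nr_sects) */
        return blkdev_issue_write_same(bdev, start, nr_sects,
                                       GFP_KERNEL, page);
    }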

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

09 Sep, 2012

1 commit

  • Instead of using simple_strtoul which "converts" invalid numbers to 0,
    use strict_strtoul and perform error checking to ensure that userspace
    passes us a valid unsigned long. This addresses problems with functions
    such as writev, which might want to write a trailing newline -- the
    newline should rightfully be rejected, but the value preceding it
    should be preserved.
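
    The reworked helper looks roughly like this (simplified):

    static ssize_t queue_var_store(unsigned long *var, const char *page,
                                   size_t count)
    {
        int err;
        unsigned long v;

        /* strict_strtoul tolerates a single trailing newline but
         * rejects any other non-number input */
        err = strict_strtoul(page, 10, &v);
        if (err || v > UINT_MAX)
            return -EINVAL;

        *var = v;
        return count;
    }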

    Fixes BZ#46981.

    Signed-off-by: Dave Reisner
    Signed-off-by: Jens Axboe

    Dave Reisner
     

27 Jun, 2012

1 commit

  • Currently, request_queue has one request_list to allocate requests
    from regardless of blkcg of the IO being issued. When the unified
    request pool is used up, cfq proportional IO limits become meaningless
    - whoever grabs the next request being freed wins the race regardless
    of the configured weights.

    This can be easily demonstrated by creating a blkio cgroup w/ very low
    weight, put a program which can issue a lot of random direct IOs there
    and running a sequential IO from a different cgroup. As soon as the
    request pool is used up, the sequential IO bandwidth crashes.

    This patch implements per-blkg request_list. Each blkg has its own
    request_list and any IO allocates its request from the matching blkg
    making blkcgs completely isolated in terms of request allocation.

    * Root blkcg uses the request_list embedded in each request_queue,
    which was renamed to @q->root_rl from @q->rq. While making blkcg rl
    handling a bit hairier, this enables avoiding most overhead for root
    blkcg.

    * Queue fullness is properly per request_list but bdi isn't blkcg
    aware yet, so congestion state currently just follows the root
    blkcg. As writeback isn't aware of blkcg yet, this works okay for
    async congestion but readahead may get the wrong signals. It's
    better than blkcg completely collapsing with shared request_list but
    needs to be improved with future changes.

    * After this change, each block cgroup gets a full request pool making
    resource consumption of each cgroup higher. This makes allowing
    non-root users to create cgroups less desirable; however, note that
    allowing non-root users to directly manage cgroups is already
    severely broken regardless of this patch - each block cgroup
    consumes kernel memory and skews IO weight (IO weights are not
    hierarchical).

    v2: queue-sysfs.txt updated and patch description updated as suggested
    by Vivek.

    v3: blk_get_rl() wasn't checking error return from
    blkg_lookup_create() and may cause oops on lookup failure. Fix it
    by falling back to root_rl on blkg lookup failures. This problem
    was spotted by Rakesh Iyer.

    v4: Updated to accommodate 458f27a982 "block: Avoid missed wakeup in
    request waitqueue". blk_drain_queue() now wakes up waiters on all
    blkg->rl on the target queue.
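
    The v3 fallback in sketch form (heavily simplified from what
    blk_get_rl() does):

    static struct request_list *example_get_rl(struct request_queue *q,
                                               struct blkcg *blkcg)
    {
        struct blkcg_gq *blkg = blkg_lookup_create(blkcg, q);

        if (IS_ERR(blkg))
            return &q->root_rl;     /* fall back instead of oopsing */

        return &blkg->rl;           /* per-blkg request_list */
    }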

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Cc: Wu Fengguang
    Signed-off-by: Jens Axboe

    Tejun Heo
     

25 Jun, 2012

1 commit

  • Request allocation is about to be made per-blkg meaning that there'll
    be multiple request lists.

    * Make queue full state per request_list. blk_*queue_full() functions
    are renamed to blk_*rl_full() and take @rl instead of @q.

    * Rename blk_init_free_list() to blk_init_rl() and make it take @rl
    instead of @q. Also add @gfp_mask parameter.

    * Add blk_exit_rl() instead of destroying rl directly from
    blk_release_queue().

    * Add request_list->q and make request alloc/free functions -
    blk_free_request(), [__]freed_request(), __get_request() - take @rl
    instead of @q.

    This patch doesn't introduce any functional difference.
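
    The reshaped interface, as described above (signatures approximate):

    int blk_init_rl(struct request_list *rl, struct request_queue *q,
                    gfp_t gfp_mask);
    void blk_exit_rl(struct request_list *rl);

    /* queue-full state is now tracked per request_list */
    int blk_rl_full(struct request_list *rl, int sync);
    void blk_set_rl_full(struct request_list *rl, int sync);
    void blk_clear_rl_full(struct request_list *rl, int sync);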

    Signed-off-by: Tejun Heo
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

07 Mar, 2012

2 commits

  • Currently, blkg is per cgroup-queue-policy combination. This is
    unnatural and leads to various convolutions in partially used
    duplicate fields in blkg, config / stat access, and general management
    of blkgs.

    This patch makes blkgs per cgroup-queue and lets them serve all
    policies. blkgs are now created and destroyed by blkcg core proper.
    This will allow further consolidation of common management logic into
    blkcg core and API with better defined semantics and layering.

    As a transitional step to untangle blkg management, elvswitch and
    policy [de]registration, all blkgs except the root blkg are being shot
    down during elvswitch and bypass. This patch adds blkg_root_update()
    to update root blkg in place on policy change. This is hacky and racy
    but should be good enough as interim step until we get locking
    simplified and switch over to proper in-place update for all blkgs.

    -v2: Root blkgs need to be updated on elvswitch too and blkg_alloc()
    comment wasn't updated according to the function change. Fixed.
    Both pointed out by Vivek.

    -v3: v2 updated blkg_destroy_all() to invoke update_root_blkg_pd() for
    all policies. This freed root pd during elvswitch before the
    last queue finished exiting and led to oops. Directly invoke
    update_root_blkg_pd() only on BLKIO_POLICY_PROP from
    cfq_exit_queue(). This also is closer to what will be done with
    proper in-place blkg update. Reported by Vivek.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently block core calls directly into blk-throttle for init, drain
    and exit. This patch adds blkcg_{init|drain|exit}_queue() which wraps
    the blk-throttle functions. This is to give more control and
    visibility to blkcg core layer for proper layering. Further patches
    will add logic common to blkcg policies to the functions.

    While at it, collapse blk_throtl_release() into blk_throtl_exit().
    There's no reason to keep them separate.
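
    A sketch of the wrappers (simplified; the real functions carry a few
    assertions):

    int blkcg_init_queue(struct request_queue *q)
    {
        return blk_throtl_init(q);
    }

    void blkcg_drain_queue(struct request_queue *q)
    {
        blk_throtl_drain(q);
    }

    void blkcg_exit_queue(struct request_queue *q)
    {
        blk_throtl_exit(q);     /* blk_throtl_release() is folded in */
    }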

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     

14 Dec, 2011

3 commits

  • With kmem_cache managed by blk-ioc, io_cq exit/release can be moved to
    blk-ioc too. The odd ->io_cq->exit/release() callbacks are replaced
    with elevator_ops->elevator_exit_icq_fn() with unlinking from both ioc
    and q, and freeing automatically handled by blk-ioc. The elevator
    operation only needs to perform the exit work specific to the elevator
    - in cfq's case, exiting the cfqqs.

    Also, clearing of io_cq's on q detach is moved to block core and
    automatically performed on elevator switch and q release.

    Because the q io_cq points to might be freed before RCU callback for
    the io_cq runs, blk-ioc code should remember to which cache the io_cq
    needs to be freed when the io_cq is released. New field
    io_cq->__rcu_icq_cache is added for this purpose. As both the new
    field and rcu_head are used only after io_cq is released and the
    q/ioc_node fields aren't, they are put into unions.
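
    The resulting layout, roughly: fields used only after release share
    storage with fields used only before it.

    struct io_cq {
        struct request_queue    *q;
        struct io_context       *ioc;

        union {
            struct list_head    q_node;            /* while linked to q */
            struct kmem_cache   *__rcu_icq_cache;  /* which cache to free into */
        };
        union {
            struct hlist_node   ioc_node;          /* while linked to ioc */
            struct rcu_head     __rcu_head;        /* for the RCU callback */
        };
    };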

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • cfq allocates per-queue id using ida and uses it to index cic radix
    tree from io_context. Move it to q->id and allocate on queue init and
    free on queue release. This simplifies cfq a bit and will allow for
    further improvements of io context life-cycle management.

    This patch doesn't introduce any functional difference.
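
    The allocation pattern, in sketch form (simplified):

    static DEFINE_IDA(blk_queue_ida);

    /* on queue init */
    q->id = ida_simple_get(&blk_queue_ida, 0, 0, GFP_KERNEL);
    if (q->id < 0)
        goto fail;

    /* on queue release */
    ida_simple_remove(&blk_queue_ida, q->id);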

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • There are a number of QUEUE_FLAG_DEAD tests. Add blk_queue_dead()
    macro and use it.

    This patch doesn't introduce any functional difference.
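
    The macro follows the existing queue-flag helpers in blkdev.h, roughly:

    #define blk_queue_dead(q)  test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags)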

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

19 Oct, 2011

2 commits

  • request_queue is refcounted but actually depends on lifetime
    management from the queue owner - on blk_cleanup_queue(), block layer
    expects that there's no request passing through request_queue and no
    new one will.

    This is fundamentally broken. The queue owner (e.g. SCSI layer)
    doesn't have a way to know whether there are other active users before
    calling blk_cleanup_queue() and other users (e.g. bsg) don't have any
    guarantee that the queue is and would stay valid while it's holding a
    reference.

    With delay added in blk_queue_bio() before queue_lock is grabbed, the
    following oops can be easily triggered when a device is removed with
    in-flight IOs.

    sd 0:0:1:0: [sdb] Stopping disk
    ata1.01: disabled
    general protection fault: 0000 [#1] PREEMPT SMP
    CPU 2
    Modules linked in:

    Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs
    RIP: 0010:[] [] elv_rqhash_find+0x61/0x100
    ...
    Process test_rawio (pid: 648, threadinfo ffff880019efa000, task ffff880019ef8a80)
    ...
    Call Trace:
    [] elv_merge+0x84/0xe0
    [] blk_queue_bio+0xf4/0x400
    [] generic_make_request+0xca/0x100
    [] submit_bio+0x74/0x100
    [] dio_bio_submit+0xbc/0xc0
    [] __blockdev_direct_IO+0x92e/0xb40
    [] blkdev_direct_IO+0x57/0x60
    [] generic_file_aio_read+0x6d5/0x760
    [] do_sync_read+0xda/0x120
    [] vfs_read+0xc5/0x180
    [] sys_pread64+0x9a/0xb0
    [] system_call_fastpath+0x16/0x1b

    This happens because blk_cleanup_queue() destroys the queue and
    elevator whether IOs are in progress or not and DEAD tests are
    sprinkled in the request processing path without proper
    synchronization.

    Similar problem exists for blk-throtl. On queue cleanup, blk-throtl
    is shutdown whether it has requests in it or not. Depending on
    timing, it either oopses or throttled bios are lost, putting tasks
    which are waiting for bio completion into eternal D state.

    The way it should work is having the usual clear distinction between
    shutdown and release. Shutdown drains all currently pending requests,
    marks the queue dead, and performs partial teardown of the now
    unnecessary part of the queue. Even after shutdown is complete,
    reference holders are still allowed to issue requests to the queue
    although they will be immediately failed. The rest of teardown
    happens on release.

    This patch makes the following changes to make blk_cleanup_queue()
    behave as proper shutdown.

    * QUEUE_FLAG_DEAD is now set while holding both q->exit_mutex and
    queue_lock.

    * Unsynchronized DEAD check in generic_make_request_checks() removed.
    This couldn't make any meaningful difference as the queue could die
    after the check.

    * blk_drain_queue() updated such that it can drain all requests and is
    now called during cleanup.

    * blk_throtl updated such that it checks DEAD on grabbing queue_lock,
    drains all throttled bios during cleanup and free td when queue is
    released.
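
    The shutdown half, compressed into a sketch (illustrative only;
    locking simplified):

    void example_shutdown_queue(struct request_queue *q)
    {
        spin_lock_irq(q->queue_lock);
        queue_flag_set(QUEUE_FLAG_DEAD, q);   /* no new requests admitted */
        spin_unlock_irq(q->queue_lock);

        blk_drain_queue(q, true);   /* drain everything, throttled bios too */

        /* partial teardown happens here; reference holders may still
         * submit, but their requests now fail immediately. The rest of
         * the teardown waits for the final release. */
    }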

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Conflicts:
    block/blk-core.c
    include/linux/blkdev.h

    Signed-off-by: Jens Axboe

    Jens Axboe
     

28 Sep, 2011

1 commit

  • A kernel crash is observed when a mounted ext3/ext4 filesystem is
    physically removed. The problem is that blk_cleanup_queue() frees up
    some resources eg by calling elevator_exit(), which are not checked for
    in normal operation. So we should rather move these calls to the
    destructor function blk_release_queue() as at that point all remaining
    references are gone. However, in doing so we have to ensure that any
    externally supplied queue_lock is disconnected as the driver might free
    up the lock after the call of blk_cleanup_queue(),

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     

24 Jul, 2011

1 commit

  • Some systems benefit from completions always being steered to the strict
    requester cpu rather than the looser "per-socket" steering that
    blk_cpu_to_group() attempts by default. This is because the first
    CPU in the group mask ends up being completely overloaded with work,
    while the others (including the original submitter) have power left
    to spare.

    Allow the strict mode to be set by writing '2' to the sysfs control
    file. This is identical to the scheme used for the nomerges file,
    where '2' is a more aggressive setting than just being turned on.

    echo 2 > /sys/block/<device>/queue/rq_affinity

    Cc: Christoph Hellwig
    Cc: Roland Dreier
    Tested-by: Dave Jiang
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     

21 May, 2011

1 commit

  • Since for-2.6.40/core was forked off the 2.6.39 devel tree, we've
    had churn in the core area that makes it difficult to handle
    patches for eg cfq or blk-throttle. Instead of requiring that they
    be based in older versions with bugs that have been fixed later
    in the rc cycle, merge in 2.6.39 final.

    Also fixes up conflicts in the below files.

    Conflicts:
    drivers/block/paride/pcd.c
    drivers/cdrom/viocd.c
    drivers/ide/ide-cd.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 May, 2011

1 commit

  • In some cases we would end up stacking discard_zeroes_data incorrectly.
    Fix this by enabling the feature by default for stacking drivers and
    clearing it for low-level drivers. Incorporating a device that does not
    support dzd will then cause the feature to be disabled in the stacking
    driver.

    Also ensure that the maximum discard value does not overflow when
    exported in sysfs and return 0 in the alignment and dzd fields for
    devices that don't support discard.
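
    The stacking rule then reduces to a single AND when limits are
    combined, roughly:

    /* one member that can't zero on discard disables it for the stack */
    t->discard_zeroes_data &= b->discard_zeroes_data;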

    Reported-by: Lukas Czerner
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

19 Apr, 2011

2 commits

  • In queue_requests_store, the code looks like

    if (rl->count[BLK_RW_SYNC] >= q->nr_requests) {
        blk_set_queue_full(q, BLK_RW_SYNC);
    } else if (rl->count[BLK_RW_SYNC]+1 <= q->nr_requests) {
        blk_clear_queue_full(q, BLK_RW_SYNC);
        wake_up(&rl->wait[BLK_RW_SYNC]);
    }

    If we don't satisfy the condition of the "if", we know that
    rl->count[BLK_RW_SYNC] < q->nr_requests, which is the same as
    rl->count[BLK_RW_SYNC]+1 <= q->nr_requests. Every "else" case
    therefore satisfies the "else if" check, so the check isn't needed.

    Signed-off-by: Tao Ma
    Signed-off-by: Jens Axboe

    Tao Ma
     
  • We do not call blk_trace_remove_sysfs() in the error return path
    if kobject_add() fails. This patch fixes it.

    Cc: stable@kernel.org
    Signed-off-by: Liu Yuan
    Signed-off-by: Jens Axboe

    Liu Yuan
     

03 Mar, 2011

1 commit

  • Move blk_throtl_exit() in blk_cleanup_queue() as blk_throtl_exit() is
    written in such a way that it needs queue lock. In blk_release_queue()
    there is no guarantee that ->queue_lock is still around.

    Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported
    one problem.

    https://lkml.org/lkml/2010/10/23/86

    And a quick fix moved blk_throtl_exit() to blk_release_queue().

    commit 7ad58c028652753814054f4e3ac58f925e7343f4
    Author: Jens Axboe
    Date: Sat Oct 23 20:40:26 2010 +0200

    block: fix use-after-free bug in blk throttle code

    This patch reverts the above change and does not try to shut down the
    throtl work in blk_sync_queue(). By avoiding the call to
    throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid
    the problem reported by Ingo.

    blk_sync_queue() seems to be used only by md driver and it seems to be
    using it to make sure q->unplug_fn is not called as md registers its
    own unplug functions and it is about to free up the data structures
    used by unplug_fn(). Block throttle does not call back into unplug_fn()
    or into md. So there is no need to cancel blk throttle work.

    In fact I think cancelling block throttle work is bad, because it might
    happen that some bios are throttled and scheduled to be dispatched later
    with the help of the pending work, and if that work is cancelled these
    bios might never be dispatched.

    Block layer also uses blk_sync_queue() during blk_cleanup_queue() and
    blk_release_queue() time. That should be safe as we are also calling
    blk_throtl_exit() which should make sure all the throttling related
    data structures are cleaned up.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

17 Dec, 2010

1 commit

  • When stacking devices, a request_queue is not always available. This
    forced us to have a no_cluster flag in the queue_limits that could be
    used as a carrier until the request_queue had been set up for a
    metadevice.

    There were several problems with that approach. First of all it was up
    to the stacking device to remember to set the queue flag after stacking
    completed. Also, the queue flag and the queue limits had to be kept in
    sync at all times. We got that wrong, which could lead to us issuing
    commands that went beyond the max scatterlist limit set by the driver.

    The proper fix is to avoid having two flags for tracking the same thing.
    We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
    block layer merging functions. The queue_limit 'no_cluster' is turned
    into 'cluster' to avoid double negatives and to ease stacking.
    Clustering defaults to being enabled as before. The queue flag logic is
    removed from the stacking function, and explicitly setting the cluster
    flag is no longer necessary in DM and MD.
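
    The limit is then read through a small helper, roughly:

    static inline int blk_queue_cluster(struct request_queue *q)
    {
        return q->limits.cluster;
    }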

    Reported-by: Ed Lin
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

24 Oct, 2010

1 commit

  • blk_throtl_exit() frees the throttle data hanging off the queue
    in blk_cleanup_queue(), but blk_put_queue() will indirectly
    dereference this data when calling blk_sync_queue() which in
    turns calls throtl_shutdown_timer_wq().

    Fix this by moving the freeing of the throttle data to when
    the queue is truly being released, and post the call to
    blk_sync_queue().

    Reported-by: Ingo Molnar
    Tested-by: Ingo Molnar
    Signed-off-by: Jens Axboe

    Jens Axboe
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
    cfq-iosched: Fix a gcc 4.5 warning and put some comments
    block: Turn bvec_k{un,}map_irq() into static inline functions
    block: fix accounting bug on cross partition merges
    block: Make the integrity mapped property a bio flag
    block: Fix double free in blk_integrity_unregister
    block: Ensure physical block size is unsigned int
    blkio-throttle: Fix possible multiplication overflow in iops calculations
    blkio-throttle: limit max iops value to UINT_MAX
    blkio-throttle: There is no need to convert jiffies to milli seconds
    blkio-throttle: Fix link failure on i386
    blkio: Recalculate the throttled bio dispatch time upon throttle limit change
    blkio: Add root group to td->tg_list
    blkio: deletion of a cgroup was causing oops
    blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
    block: set the bounce_pfn to the actual DMA limit rather than to max memory
    block: revert bad fix for memory hotplug causing bounces
    Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
    block: set the bounce_pfn to the actual DMA limit rather than to max memory
    block: Prevent hang_check firing during long I/O
    cfq: improve fsync performance for small files
    ...

    Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h

    Linus Torvalds
     

11 Sep, 2010

1 commit

  • Some controllers have a hardware limit on the number of protection
    information scatter-gather list segments they can handle.

    Introduce a max_integrity_segments limit in the block layer and provide
    a new scsi_host_template setting that allows HBA drivers to provide a
    value suitable for the hardware.

    Add support for honoring the integrity segment limit when merging both
    bios and requests.
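
    A sketch of the new knobs (hw_prot_seg_limit and nr_prot_segments()
    are hypothetical stand-ins for driver-specific values):

    /* HBA setup: cap the number of protection-information SG entries */
    blk_queue_max_integrity_segments(q, hw_prot_seg_limit);

    /* merge path: combined requests must stay within the cap */
    if (nr_prot_segments(req) > queue_max_integrity_segments(q))
        /* reject the merge */;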

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

10 Apr, 2010

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
    cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
    loop: Update mtime when writing using aops
    block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
    backing-dev: Handle class_create() failure
    Block: Fix block/elevator.c elevator_get() off-by-one error
    drbd: lc_element_by_index() never returns NULL
    cciss: unlock on error path
    cfq-iosched: Do not merge queues of BE and IDLE classes
    cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
    i2o: Remove the dangerous kobj_to_i2o_device macro
    block: remove 16 bytes of padding from struct request on 64bits
    cfq-iosched: fix a kbuild regression
    block: make CONFIG_BLK_CGROUP visible
    Remove GENHD_FL_DRIVERFS
    block: Export max number of segments and max segment size in sysfs
    block: Finalize conversion of block limits functions
    block: Fix overrun in lcm() and move it to lib
    vfs: improve writeback_inodes_wb()
    paride: fix off-by-one test
    drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
    ...

    Linus Torvalds
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.
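
    The typical edit the script produced, illustrated on a hypothetical
    file that calls kmalloc() but previously only included percpu.h:

    #include <linux/percpu.h>   /* no longer provides slab.h implicitly */
    #include <linux/slab.h>     /* added so kmalloc()/kfree() keep building */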

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     
