14 Dec, 2011

3 commits

  • With kmem_cache managed by blk-ioc, io_cq exit/release can be moved to
    blk-ioc too. The odd ->io_cq->exit/release() callbacks are replaced
    with elevator_ops->elevator_exit_icq_fn() with unlinking from both ioc
    and q, and freeing automatically handled by blk-ioc. The elevator
    operation only needs to perform the exit operation specific to the elevator
    - in cfq's case, exiting the cfqq's.

    Also, clearing of io_cq's on q detach is moved to block core and
    automatically performed on elevator switch and q release.

    Because the q an io_cq points to might be freed before the RCU
    callback for the io_cq runs, the blk-ioc code must remember which
    cache the io_cq needs to be freed to when the io_cq is released. A
    new field, io_cq->__rcu_icq_cache, is added for this purpose. As both the new
    field and rcu_head are used only after io_cq is released and the
    q/ioc_node fields aren't, they are put into unions.
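
    The resulting io_cq layout looks roughly like the sketch below (field
    names follow the description above; the exact struct in the tree may
    differ slightly):

        struct io_cq {
                struct request_queue    *q;
                struct io_context       *ioc;

                /* q_node/__rcu_icq_cache and ioc_node/__rcu_head are
                 * never needed at the same time, hence the unions. */
                union {
                        struct list_head        q_node;
                        struct kmem_cache       *__rcu_icq_cache;
                };
                union {
                        struct hlist_node       ioc_node;
                        struct rcu_head         __rcu_head;
                };
        };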

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • cfq allocates per-queue id using ida and uses it to index cic radix
    tree from io_context. Move it to q->id and allocate on queue init and
    free on queue release. This simplifies cfq a bit and will allow for
    further improvements of io context life-cycle management.
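
    A minimal sketch of that pattern, assuming the usual ida helpers
    (ida_simple_get()/ida_simple_remove()) back q->id:

        /* on queue init */
        q->id = ida_simple_get(&blk_queue_ida, 0, 0, GFP_KERNEL);
        if (q->id < 0)
                goto fail;

        /* on queue release */
        ida_simple_remove(&blk_queue_ida, q->id);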

    This patch doesn't introduce any functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • There are a number of QUEUE_FLAG_DEAD tests. Add blk_queue_dead()
    macro and use it.
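
    The macro is a one-liner along these lines (following the standard
    queue_flags test pattern):

        #define blk_queue_dead(q)  test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags)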

    This patch doesn't introduce any functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

19 Oct, 2011

2 commits

  • request_queue is refcounted but actually depends on lifetime
    management from the queue owner - on blk_cleanup_queue(), the block
    layer expects that no request is passing through the request_queue
    and that no new one will be issued.

    This is fundamentally broken. The queue owner (e.g. the SCSI layer)
    doesn't have a way to know whether there are other active users
    before calling blk_cleanup_queue(), and other users (e.g. bsg) have
    no guarantee that the queue is, and will stay, valid while they hold
    a reference.

    With a delay added in blk_queue_bio() before queue_lock is grabbed,
    the following oops can easily be triggered when a device is removed
    with in-flight IOs:

    sd 0:0:1:0: [sdb] Stopping disk
    ata1.01: disabled
    general protection fault: 0000 [#1] PREEMPT SMP
    CPU 2
    Modules linked in:

    Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs
    RIP: 0010:[] [] elv_rqhash_find+0x61/0x100
    ...
    Process test_rawio (pid: 648, threadinfo ffff880019efa000, task ffff880019ef8a80)
    ...
    Call Trace:
    [] elv_merge+0x84/0xe0
    [] blk_queue_bio+0xf4/0x400
    [] generic_make_request+0xca/0x100
    [] submit_bio+0x74/0x100
    [] dio_bio_submit+0xbc/0xc0
    [] __blockdev_direct_IO+0x92e/0xb40
    [] blkdev_direct_IO+0x57/0x60
    [] generic_file_aio_read+0x6d5/0x760
    [] do_sync_read+0xda/0x120
    [] vfs_read+0xc5/0x180
    [] sys_pread64+0x9a/0xb0
    [] system_call_fastpath+0x16/0x1b

    This happens because blk_cleanup_queue() destroys the queue and
    elevator whether IOs are in progress or not, and DEAD tests are
    sprinkled in the request processing path without proper
    synchronization.

    A similar problem exists for blk-throtl. On queue cleanup, blk-throtl
    is shut down whether it has requests in it or not. Depending on
    timing, it either oopses or throttled bios are lost, putting tasks
    which are waiting for bio completion into eternal D state.

    The way it should work is having the usual clear distinction between
    shutdown and release. Shutdown drains all currently pending requests,
    marks the queue dead, and performs partial teardown of the now
    unnecessary parts of the queue. Even after shutdown is complete,
    reference holders are still allowed to issue requests to the queue,
    although they will be immediately failed. The rest of the teardown
    happens on release.

    This patch makes the following changes so that blk_cleanup_queue()
    behaves as a proper shutdown:

    * QUEUE_FLAG_DEAD is now set while holding both q->exit_mutex and
    queue_lock.

    * Unsynchronized DEAD check in generic_make_request_checks() removed.
    This couldn't make any meaningful difference as the queue could die
    after the check.

    * blk_drain_queue() updated such that it can drain all requests and is
    now called during cleanup.

    * blk_throtl updated such that it checks DEAD on grabbing queue_lock,
    drains all throttled bios during cleanup and frees td when the queue
    is released.
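
    Put together, the cleanup path now behaves roughly like the sketch
    below (heavily simplified; the names follow the list above and the
    locking details are elided - a sketch, not the exact patch):

        void blk_cleanup_queue(struct request_queue *q)
        {
                mutex_lock(&q->exit_mutex);

                spin_lock_irq(q->queue_lock);
                queue_flag_set(QUEUE_FLAG_DEAD, q);     /* new requests now fail */
                spin_unlock_irq(q->queue_lock);

                blk_drain_queue(q);     /* wait out everything in flight */
                mutex_unlock(&q->exit_mutex);

                blk_put_queue(q);       /* remaining teardown happens on release */
        }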

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Conflicts:
    block/blk-core.c
    include/linux/blkdev.h

    Signed-off-by: Jens Axboe

    Jens Axboe
     

28 Sep, 2011

1 commit

  • A kernel crash is observed when a mounted ext3/ext4 filesystem is
    physically removed. The problem is that blk_cleanup_queue() frees up
    some resources (e.g. by calling elevator_exit()) whose absence is not
    checked for in normal operation. So we should rather move these calls
    to the destructor function blk_release_queue(), as at that point all
    remaining references are gone. However, in doing so we have to ensure
    that any externally supplied queue_lock is disconnected, as the driver
    might free up the lock after the call to blk_cleanup_queue().

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     

21 Sep, 2011

1 commit


24 Aug, 2011

1 commit


24 Jul, 2011

1 commit

  • Some systems benefit from completions always being steered to the strict
    requester cpu rather than the looser "per-socket" steering that
    blk_cpu_to_group() attempts by default. This is because the first
    CPU in the group mask ends up being completely overloaded with work,
    while the others (including the original submitter) have power left
    to spare.

    Allow the strict mode to be set by writing '2' to the sysfs control
    file. This is identical to the scheme used for the nomerges file,
    where '2' is a more aggressive setting than just being turned on.

    echo 2 > /sys/block/<dev>/queue/rq_affinity

    Cc: Christoph Hellwig
    Cc: Roland Dreier
    Tested-by: Dave Jiang
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     

21 May, 2011

1 commit

  • Since for-2.6.40/core was forked off the 2.6.39 devel tree, we've
    had churn in the core area that makes it difficult to handle
    patches for e.g. cfq or blk-throttle. Instead of requiring that they
    be based on older versions with bugs that have been fixed later
    in the rc cycle, merge in 2.6.39 final.

    Also fixes up conflicts in the below files.

    Conflicts:
    drivers/block/paride/pcd.c
    drivers/cdrom/viocd.c
    drivers/ide/ide-cd.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 May, 2011

1 commit

  • In some cases we would end up stacking discard_zeroes_data incorrectly.
    Fix this by enabling the feature by default for stacking drivers and
    clearing it for low-level drivers. Incorporating a device that does not
    support dzd will then cause the feature to be disabled in the stacking
    driver.

    Also ensure that the maximum discard value does not overflow when
    exported in sysfs and return 0 in the alignment and dzd fields for
    devices that don't support discard.

    Reported-by: Lukas Czerner
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

19 Apr, 2011

2 commits

  • In queue_requests_store(), the code looks like:

        if (rl->count[BLK_RW_SYNC] >= q->nr_requests) {
                blk_set_queue_full(q, BLK_RW_SYNC);
        } else if (rl->count[BLK_RW_SYNC]+1 <= q->nr_requests) {
                blk_clear_queue_full(q, BLK_RW_SYNC);
                wake_up(&rl->wait[BLK_RW_SYNC]);
        }

    If the "if" condition is not satisfied, we know that
    rl->count[BLK_RW_SYNC] < q->nr_requests, which is the same as
    rl->count[BLK_RW_SYNC]+1 <= q->nr_requests. So everything that falls
    through to the "else" already satisfies the "else if" check, and
    that check isn't actually needed.

    Signed-off-by: Tao Ma
    Signed-off-by: Jens Axboe

    Tao Ma
     
  • We do not call blk_trace_remove_sysfs() in the error return path
    when kobject_add() fails. This patch fixes it.

    Cc: stable@kernel.org
    Signed-off-by: Liu Yuan
    Signed-off-by: Jens Axboe

    Liu Yuan
     

14 Apr, 2011

1 commit


03 Mar, 2011

1 commit

  • Move blk_throtl_exit() into blk_cleanup_queue(), as blk_throtl_exit()
    is written in such a way that it needs the queue lock. In
    blk_release_queue() there is no guarantee that ->queue_lock is still
    around.

    Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported
    one problem.

    https://lkml.org/lkml/2010/10/23/86

    And a quick fix moved blk_throtl_exit() to blk_release_queue().

    commit 7ad58c028652753814054f4e3ac58f925e7343f4
    Author: Jens Axboe
    Date: Sat Oct 23 20:40:26 2010 +0200

    block: fix use-after-free bug in blk throttle code

    This patch reverts the above change and does not try to shut down the
    throtl work in blk_sync_queue(). By avoiding the call to
    throtl_shutdown_timer_wq() from blk_sync_queue(), we should also
    avoid the problem reported by Ingo.

    blk_sync_queue() seems to be used only by the md driver, which uses
    it to make sure q->unplug_fn is not called while md is about to free
    up the data structures used by unplug_fn() (md registers its own
    unplug functions). Block throttle does not call back into unplug_fn()
    or into md, so there is no need to cancel blk throttle work.

    In fact I think cancelling block throttle work is bad, because it
    might happen that some bios are throttled and scheduled to be
    dispatched later with the help of the pending work; if that work is
    cancelled, these bios might never be dispatched.

    The block layer also uses blk_sync_queue() at blk_cleanup_queue() and
    blk_release_queue() time. That should be safe, as we are also calling
    blk_throtl_exit(), which should make sure all the throttling related
    data structures are cleaned up.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

17 Dec, 2010

1 commit

  • When stacking devices, a request_queue is not always available. This
    forced us to have a no_cluster flag in the queue_limits that could be
    used as a carrier until the request_queue had been set up for a
    metadevice.

    There were several problems with that approach. First of all, it was
    up to the stacking device to remember to set the queue flag after
    stacking had completed. Also, the queue flag and the queue limits had to be kept in
    sync at all times. We got that wrong, which could lead to us issuing
    commands that went beyond the max scatterlist limit set by the driver.

    The proper fix is to avoid having two flags for tracking the same thing.
    We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
    block layer merging functions. The queue_limit 'no_cluster' is turned
    into 'cluster' to avoid double negatives and to ease stacking.
    Clustering defaults to being enabled as before. The queue flag logic is
    removed from the stacking function, and explicitly setting the cluster
    flag is no longer necessary in DM and MD.

    Reported-by: Ed Lin
    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

24 Oct, 2010

1 commit

  • blk_throtl_exit() frees the throttle data hanging off the queue
    in blk_cleanup_queue(), but blk_put_queue() will indirectly
    dereference this data when calling blk_sync_queue(), which in
    turn calls throtl_shutdown_timer_wq().

    Fix this by moving the freeing of the throttle data to when
    the queue is truly being released, and post the call to
    blk_sync_queue().

    Reported-by: Ingo Molnar
    Tested-by: Ingo Molnar
    Signed-off-by: Jens Axboe

    Jens Axboe
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
    cfq-iosched: Fix a gcc 4.5 warning and put some comments
    block: Turn bvec_k{un,}map_irq() into static inline functions
    block: fix accounting bug on cross partition merges
    block: Make the integrity mapped property a bio flag
    block: Fix double free in blk_integrity_unregister
    block: Ensure physical block size is unsigned int
    blkio-throttle: Fix possible multiplication overflow in iops calculations
    blkio-throttle: limit max iops value to UINT_MAX
    blkio-throttle: There is no need to convert jiffies to milli seconds
    blkio-throttle: Fix link failure failure on i386
    blkio: Recalculate the throttled bio dispatch time upon throttle limit change
    blkio: Add root group to td->tg_list
    blkio: deletion of a cgroup was causes oops
    blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
    block: set the bounce_pfn to the actual DMA limit rather than to max memory
    block: revert bad fix for memory hotplug causing bounces
    Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
    block: set the bounce_pfn to the actual DMA limit rather than to max memory
    block: Prevent hang_check firing during long I/O
    cfq: improve fsync performance for small files
    ...

    Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h

    Linus Torvalds
     

11 Sep, 2010

1 commit

  • Some controllers have a hardware limit on the number of protection
    information scatter-gather list segments they can handle.

    Introduce a max_integrity_segments limit in the block layer and provide
    a new scsi_host_template setting that allows HBA drivers to provide a
    value suitable for the hardware.

    Add support for honoring the integrity segment limit when merging both
    bios and requests.
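
    Roughly, an HBA driver advertises its limit and the block layer
    setter applies it when the device queue is set up - a sketch, with
    blk_queue_max_integrity_segments() as the new setter and
    sg_prot_tablesize standing in for the host template field:

        blk_queue_max_integrity_segments(sdev->request_queue,
                                         shost->sg_prot_tablesize);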

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

23 Aug, 2010

1 commit


08 Aug, 2010

2 commits


10 Apr, 2010

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
    cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
    loop: Update mtime when writing using aops
    block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
    backing-dev: Handle class_create() failure
    Block: Fix block/elevator.c elevator_get() off-by-one error
    drbd: lc_element_by_index() never returns NULL
    cciss: unlock on error path
    cfq-iosched: Do not merge queues of BE and IDLE classes
    cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
    i2o: Remove the dangerous kobj_to_i2o_device macro
    block: remove 16 bytes of padding from struct request on 64bits
    cfq-iosched: fix a kbuild regression
    block: make CONFIG_BLK_CGROUP visible
    Remove GENHD_FL_DRIVERFS
    block: Export max number of segments and max segment size in sysfs
    block: Finalize conversion of block limits functions
    block: Fix overrun in lcm() and move it to lib
    vfs: improve writeback_inodes_wb()
    paride: fix off-by-one test
    drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
    ...

    Linus Torvalds
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, while for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files, but without automatically
    editing them, as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored, as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on the arch to make
    things build (like ipr on powerpc/64, which failed due to missing
    writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that they could be applied
    as a separate patch and serve as a bisection point.

    Given the fact that I had only a couple of failures from the tests in
    step 6, I'm fairly confident about the coverage of this conversion
    patch. If there is a breakage, it's likely to be something in one of
    the arch headers, which should be easily discoverable on most builds
    of the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

19 Mar, 2010

1 commit


15 Mar, 2010

1 commit


08 Mar, 2010

1 commit

  • Constify struct sysfs_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker moves const data into .rodata
    and therefore excludes it from false sharing

    Signed-off-by: Emese Revfy
    Acked-by: David Teigland
    Acked-by: Matt Domsch
    Acked-by: Maciej Sosnowski
    Acked-by: Hans J. Koch
    Acked-by: Pekka Enberg
    Acked-by: Jens Axboe
    Acked-by: Stephen Hemminger
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     

29 Jan, 2010

1 commit

  • Updated 'nomerges' tunable to accept a value of '2' - indicating that _no_
    merges at all are to be attempted (not even the simple one-hit cache).

    The following table illustrates the additional benefit - 5 minute runs of
    a random I/O load were applied to a dozen devices on a 16-way x86_64 system.

    nomerges    Throughput      %System        Improvement (tput / %sys)
    --------    ------------    -----------    --------------------------
    0           12.45 MB/sec    0.669365609
    1           12.50 MB/sec    0.641519199    0.40% / 2.71%
    2           12.52 MB/sec    0.639849750    0.56% / 2.96%
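
    As with other queue tunables, the mode is selected through sysfs
    (device name is a placeholder):

    echo 2 > /sys/block/<dev>/queue/nomerges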

    Signed-off-by: Alan D. Brunelle
    Signed-off-by: Jens Axboe

    Alan D. Brunelle
     

03 Dec, 2009

1 commit

  • The discard ioctl is used by mkfs utilities to clear a block device
    prior to putting metadata down. However, not all devices return zeroed
    blocks after a discard. Some drives return stale data, potentially
    containing old superblocks. It is therefore important to know whether
    discarded blocks are properly zeroed.

    Both ATA and SCSI drives have configuration bits that indicate whether
    zeroes are returned after a discard operation. Implement a block level
    interface that allows this information to be bubbled up the stack and
    queried via a new block device ioctl.
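
    A minimal userspace sketch of querying the flag (the ioctl this
    change adds is exposed as BLKDISCARDZEROES in <linux/fs.h>):

        #include <stdio.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/fs.h>           /* BLKDISCARDZEROES */

        int main(int argc, char **argv)
        {
                unsigned int zeroes = 0;
                int fd;

                if (argc != 2)
                        return 1;
                fd = open(argv[1], O_RDONLY);
                if (fd < 0 || ioctl(fd, BLKDISCARDZEROES, &zeroes) < 0) {
                        perror(argv[1]);
                        return 1;
                }
                printf("%s: discard zeroes data: %u\n", argv[1], zeroes);
                close(fd);
                return 0;
        }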

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

10 Nov, 2009

1 commit

  • While SSDs track block usage on a per-sector basis, RAID arrays often
    have allocation blocks that are bigger. Allow the discard granularity
    and alignment to be set and teach the topology stacking logic how to
    handle them.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

02 Oct, 2009

1 commit

  • Add the missing blk_trace_remove_sysfs() to pair with
    blk_trace_init_sysfs() introduced in commit
    1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82.
    Also release the kobject in case request_fn is NULL.

    The problem was noticed via a kmemleak backtrace when some sysfs
    entries were not properly destroyed during device removal:

    unreferenced object 0xffff88001aa76640 (size 80):
    comm "lvcreate", pid 2120, jiffies 4294885144
    hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff .........e......
    90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff .f........S.....
    backtrace:
    [] kmemleak_alloc+0x26/0x60
    [] kmem_cache_alloc+0x133/0x1c0
    [] sysfs_new_dirent+0x41/0x120
    [] sysfs_add_file_mode+0x3c/0xb0
    [] internal_create_group+0xc1/0x1a0
    [] sysfs_create_group+0x13/0x20
    [] blk_trace_init_sysfs+0x14/0x20
    [] blk_register_queue+0x3c/0xf0
    [] add_disk+0x94/0x160
    [] dm_create+0x598/0x6e0 [dm_mod]
    [] dev_create+0x51/0x350 [dm_mod]
    [] ctl_ioctl+0x1a3/0x240 [dm_mod]
    [] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
    [] compat_sys_ioctl+0xcd/0x4f0
    [] sysenter_dispatch+0x7/0x2c
    [] 0xffffffffffffffff

    Signed-off-by: Zdenek Kabelac
    Reviewed-by: Li Zefan
    Signed-off-by: Jens Axboe

    Zdenek Kabelac
     

14 Sep, 2009

1 commit


02 Sep, 2009

1 commit

  • The patch "block: Use accessor functions for queue limits"
    (ae03bf639a5027d27270123f5f6e3ee6a412781d) changed queue_max_sectors_store()
    to use blk_queue_max_sectors() instead of directly assigning the value.

    But blk_queue_max_sectors() differs a bit:
    1. It sets both max_sectors_kb and max_hw_sectors_kb.
    2. It never allows one to raise max_sectors_kb above
    BLK_DEF_MAX_SECTORS. If one specifies a greater value, then
    max_hw_sectors is set to that value but max_sectors is set to
    BLK_DEF_MAX_SECTORS.

    I am not sure whether blk_queue_max_sectors() should be changed, as
    it seems to have been that way for a long time, and there may be
    callers dependent on that behaviour.

    This patch simply reverts to the older way of directly assigning the value to
    max_sectors as it was before.

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Jens Axboe

    Nikanth Karthikesan
     

17 Jul, 2009

1 commit

  • In blk-sysfs.c, queue_var_store() uses an unsigned long to store
    data, but queue_var_show() uses an unsigned int to show data. This
    causes:

    # echo 70000000000 > /sys/block/<dev>/queue/read_ahead_kb
    # cat /sys/block/<dev>/queue/read_ahead_kb   => get wrong value

    Fix it by using unsigned long.

    While at it, convert queue_rq_affinity_show() such that it uses a
    bool variable instead of explicit != 0 testing.
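
    The mismatch is plain integer truncation; a tiny userspace analogue
    of the example above (assumes a 64-bit unsigned long, as on x86_64):

        #include <stdio.h>

        int main(void)
        {
                unsigned long stored = 70000000000UL;      /* what the store side kept */
                unsigned int shown = (unsigned int)stored; /* what the show side printed */

                /* 70000000000 mod 2^32 = 1280523264 - the "wrong value" */
                printf("stored=%lu shown=%u\n", stored, shown);
                return 0;
        }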

    Signed-off-by: Xiaotian Feng
    Signed-off-by: Tejun Heo

    Xiaotian Feng
     

12 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     

23 May, 2009

4 commits

  • To support devices with physical block sizes bigger than 512 bytes we
    need to ensure proper alignment. This patch adds support for exposing
    I/O topology characteristics as devices are stacked.

    logical_block_size is the smallest unit the device can address.

    physical_block_size indicates the smallest I/O the device can write
    without incurring a read-modify-write penalty.

    The io_min parameter is the smallest preferred I/O size reported by
    the device. In many cases this is the same as the physical block
    size. However, the io_min parameter can be scaled up when stacking
    (RAID5 chunk size > physical block size).

    The io_opt characteristic indicates the optimal I/O size reported by
    the device. This is usually the stripe width for arrays.

    The alignment_offset parameter indicates the number of bytes the start
    of the device/partition is offset from the device's natural alignment.
    Partition tools and MD/DM utilities can use this to pad their offsets
    so filesystems start on proper boundaries.
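
    Taken together, the new characteristics amount to a handful of
    queue_limits fields, roughly as below (types and ordering here are
    illustrative, not the exact kernel definition):

        struct queue_limits_topology_sketch {
                unsigned short  logical_block_size;  /* smallest addressable unit */
                unsigned int    physical_block_size; /* smallest write without RMW */
                unsigned int    io_min;              /* smallest preferred I/O size */
                unsigned int    io_opt;              /* optimal I/O size (stripe width) */
                unsigned int    alignment_offset;    /* offset from natural alignment */
        };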

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Currently stacking devices do not have a queue directory in sysfs.
    However, many of the I/O characteristics like sector size, maximum
    request size, etc. are queue properties.

    This patch enables the queue directory for MD/DM devices. The elevator
    code has been modified to deal with queues that do not have an I/O
    scheduler.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Convert all external users of queue limits to using wrapper functions
    instead of poking the request queue variables directly.
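
    For illustration, the conversion swaps direct field access for the
    wrappers, e.g. (queue_max_sectors() and queue_logical_block_size()
    being two of the accessors):

        /* before: max_sectors = q->max_sectors; */
        max_sectors = queue_max_sectors(q);

        /* before: block_size = q->hardsect_size; */
        block_size = queue_logical_block_size(q);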

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Until now we have had a 1:1 mapping between storage device physical
    block size and the logical block size used when addressing the
    device. With SATA 4KB drives coming out, that will no longer be the
    case. The sector size will be 4KB but the logical block size will
    remain 512 bytes. Hence we need to distinguish between the physical
    block size and the logical ditto.

    This patch renames hardsect_size to logical_block_size.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

07 May, 2009

1 commit