18 Oct, 2007

2 commits


17 Oct, 2007

4 commits

  • provide BDI constructor/destructor hooks

    [akpm@linux-foundation.org: compile fix]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • The memset() of the sg entry was originally removed, because it could
    overwrite a chain pointer. But it's quite OK to memset() it when we know
    it's a valid entry, since it can't contain a chain pointer.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • * 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: (63 commits)
    Fix memory leak in dm-crypt
    SPARC64: sg chaining support
    SPARC: sg chaining support
    PPC: sg chaining support
    PS3: sg chaining support
    IA64: sg chaining support
    x86-64: enable sg chaining
    x86-64: update pci-gart iommu to sg helpers
    x86-64: update nommu to sg helpers
    x86-64: update calgary iommu to sg helpers
    swiotlb: sg chaining support
    i386: enable sg chaining
    i386 dma_map_sg: convert to using sg helpers
    mmc: need to zero sglist on init
    Panic in blk_rq_map_sg() from CCISS driver
    remove sglist_len
    remove blk_queue_max_phys_segments in libata
    revert sg segment size ifdefs
    Fixup u14-34f ENABLE_SG_CHAINING
    qla1280: enable use_sg_chaining option
    ...

    Linus Torvalds
     
  • Remove the size limit max_sectors_kb imposed on max_readahead_kb.

    The size restriction is unreasonable. Especially when max_sectors_kb cannot
    grow larger than max_hw_sectors_kb, which can be rather small for some disk
    drives.

    Cc: Jens Axboe
    Signed-off-by: Fengguang Wu
    Acked-by: Jens Axboe
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

16 Oct, 2007

7 commits


13 Oct, 2007

1 commit


10 Oct, 2007

10 commits

  • As bi_end_io is only called once when the request is complete,
    the 'size' argument is now redundant. Remove it.

    Now there is no need for bio_endio to subtract the size completed
    from bi_size. So don't do that either.

    While we are at it, change bi_end_io to return void.

    Signed-off-by: Neil Brown
    Signed-off-by: Jens Axboe

    NeilBrown
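
The interface change described above can be sketched in user space as follows. This is a minimal model, not the kernel source: `struct bio` is cut down to three fields, and `record_result` and `complete_once` are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct bio (assumption: only the
 * fields needed for this sketch are present). */
struct bio {
    unsigned int bi_size;                        /* bytes described by the bio */
    void (*bi_end_io)(struct bio *, int error);  /* no 'size' argument, returns void */
    void *bi_private;
};

/* Complete a bio exactly once; the callback only sees the error code. */
static void bio_endio(struct bio *bio, int error)
{
    if (bio->bi_end_io)
        bio->bi_end_io(bio, error);
}

/* Hypothetical completion handler: records success (1) or failure (-1). */
static void record_result(struct bio *bio, int error)
{
    *(int *)bio->bi_private = error ? -1 : 1;
}

/* Hypothetical helper: run one completion and return what was recorded. */
static int complete_once(int error)
{
    int result = 0;
    struct bio bio = { .bi_size = 4096, .bi_end_io = record_result,
                       .bi_private = &result };

    bio_endio(&bio, error);
    /* bi_size is untouched: bio_endio no longer subtracts from it. */
    return bio.bi_size == 4096 ? result : 0;
}
```

Since the callback runs once per bio, a handler written this way has no partial-completion case to test for.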
     
  • The only caller of bio_endio that does not pass the full bi_size
    is end_that_request_first. Also, no ->bi_end_io method is really
    interested in bi_size being decremented.

    So move the decrement and related code into ll_rw_blk and merge it
    with order_bio_endio to form req_bio_endio which does endio functionality
    specific to request completion.

    As some ->bi_end_io methods do check bi_size of 0, we set it thus for
    now, but that will go in the next patch.

    Signed-off-by: Neil Brown

    ### Diffstat output
    ./block/ll_rw_blk.c | 42 +++++++++++++++++++++++++++---------------
    ./fs/bio.c | 23 +++++++++++------------
    2 files changed, 38 insertions(+), 27 deletions(-)

    Signed-off-by: Jens Axboe

    NeilBrown
     
  • The entire function of flush_dry_bio_endio is to undo the effects
    of bio_endio (when called on a barrier request). So remove the
    function and the call to bio_endio.

    This allows us to remove "bi_size" from "struct request_queue".

    Signed-off-by: Neil Brown

    ### Diffstat output
    ./block/ll_rw_blk.c | 39 ++-------------------------------------
    ./include/linux/blkdev.h | 1 -
    2 files changed, 2 insertions(+), 38 deletions(-)

    Signed-off-by: Jens Axboe

    NeilBrown
     
  • blk_cpu_notifier is marked as __devinitdata, but __devinitdata need not
    be __init even if HOTPLUG_CPU=n, which wastes space. It should be marked
    __cpuinitdata, and the callback itself as __cpuinit.

    Signed-off-by: Satyam Sharma
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Satyam Sharma
     
  • Remove one level of nesting where appropriate.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • These have very similar functions and should share code where
    possible.

    Signed-off-by: Neil Brown

    Signed-off-by: Jens Axboe

    NeilBrown
     
  • blk_rq_bio_prep is exported for use in exactly
    one place. That place can benefit from using
    the new blk_rq_append_bio instead.
    So
    - change dm-emc to call blk_rq_append_bio
    - stop exporting blk_rq_bio_prep, and
    - initialise rq_disk in blk_rq_bio_prep,
    as dm-emc needs it.

    Signed-off-by: Neil Brown

    Signed-off-by: Jens Axboe

    NeilBrown
     
  • ll_back_merge_fn is currently exported to SCSI where it is used,
    together with blk_rq_bio_prep, in exactly the same way these
    functions are used in __blk_rq_map_user.

    So move the common code into a new function (blk_rq_append_bio), and
    don't export ll_back_merge_fn any longer.

    Signed-off-by: Neil Brown

    Signed-off-by: Jens Axboe

    NeilBrown
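
A rough user-space sketch of the shared helper described above: the first bio initialises the request, and later bios are appended at the tail if a merge check allows it. The names `rq_bio_prep`, `back_merge_ok`, `rq_append_bio` and the `max_len` field are simplified stand-ins chosen for this example, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct bio {
    struct bio *bi_next;
    unsigned int bi_size;
};

struct request {
    struct bio *bio, *biotail;
    unsigned int data_len;
    unsigned int max_len;  /* hypothetical limit standing in for queue limits */
};

/* Stand-in for blk_rq_bio_prep: the first bio sets up the request. */
static void rq_bio_prep(struct request *rq, struct bio *bio)
{
    rq->bio = rq->biotail = bio;
    rq->data_len = bio->bi_size;
}

/* Stand-in for ll_back_merge_fn: may this bio be added at the tail? */
static int back_merge_ok(const struct request *rq, const struct bio *bio)
{
    return rq->data_len + bio->bi_size <= rq->max_len;
}

/* The common code both callers would share. Returns 0 on success, -1 if
 * the bio cannot be merged into the request. */
static int rq_append_bio(struct request *rq, struct bio *bio)
{
    if (!rq->bio) {
        rq_bio_prep(rq, bio);
        return 0;
    }
    if (!back_merge_ok(rq, bio))
        return -1;
    rq->biotail->bi_next = bio;
    rq->biotail = bio;
    rq->data_len += bio->bi_size;
    return 0;
}

/* Hypothetical demo: two bios fit, the third exceeds the limit. */
static int demo_append(void)
{
    struct bio a = { NULL, 4096 }, b = { NULL, 4096 }, c = { NULL, 4096 };
    struct request rq = { NULL, NULL, 0, 8192 };
    int r1 = rq_append_bio(&rq, &a);
    int r2 = rq_append_bio(&rq, &b);
    int r3 = rq_append_bio(&rq, &c);

    return r1 == 0 && r2 == 0 && r3 == -1 &&
           rq.data_len == 8192 && rq.biotail == &b;
}
```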
     
  • Every usage of rq_for_each_bio wraps a usage of
    bio_for_each_segment, so these can be combined into
    rq_for_each_segment.

    We define "struct req_iterator" to hold the 'bio' and 'index' that
    are needed for the double iteration.

    Signed-off-by: Neil Brown

    Various compile fixes by me...

    Signed-off-by: Jens Axboe

    NeilBrown
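
The combined iteration can be illustrated with a small user-space model. The structures are reduced to the fields the loop needs, the macro below starts each bio at index 0 as a simplification, and `rq_total_bytes` and `demo_total` are hypothetical helpers for the example.

```c
#include <assert.h>
#include <stddef.h>

struct bio_vec { unsigned int bv_len; };

struct bio {
    struct bio *bi_next;
    struct bio_vec *bi_io_vec;
    int bi_vcnt;
};

struct request { struct bio *bio; };

/* Holds the 'bio' and 'index' needed for the double iteration. */
struct req_iterator {
    int i;
    struct bio *bio;
};

/* Outer loop walks the bios of a request, inner loop walks the segments
 * of each bio; the caller sees one flat loop. */
#define rq_for_each_segment(bvl, rq, iter)                                   \
    for ((iter).bio = (rq)->bio; (iter).bio;                                 \
         (iter).bio = (iter).bio->bi_next)                                   \
        for ((iter).i = 0;                                                   \
             (iter).i < (iter).bio->bi_vcnt &&                               \
             ((bvl) = &(iter).bio->bi_io_vec[(iter).i], 1);                  \
             (iter).i++)

/* Sum the length of every segment in the request. */
static unsigned int rq_total_bytes(struct request *rq)
{
    struct req_iterator iter;
    struct bio_vec *bvec;
    unsigned int total = 0;

    rq_for_each_segment(bvec, rq, iter)
        total += bvec->bv_len;
    return total;
}

/* Hypothetical demo: two bios with 2 + 1 segments. */
static unsigned int demo_total(void)
{
    struct bio_vec v1[] = { { 512 }, { 1024 } };
    struct bio_vec v2[] = { { 4096 } };
    struct bio b2 = { NULL, v2, 1 };
    struct bio b1 = { &b2, v1, 2 };
    struct request rq = { &b1 };

    return rq_total_bytes(&rq);
}
```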
     
  • blk_recalc_rq_segments calls blk_recount_segments on each bio,
    then does some extra calculations to handle segments that overlap
    two bios.

    If we merge the code from blk_recount_segments into
    blk_recalc_rq_segments, we can process the whole request one bio_vec
    at a time, and not need the messy cross-bio calculations.

    Then blk_recount_segments can be implemented by calling
    blk_recalc_rq_segments, passing it a simple on-stack request which
    stores just the bio.

    Signed-off-by: Neil Brown

    Signed-off-by: Jens Axboe

    NeilBrown
     

15 Sep, 2007

1 commit

  • Should add some comments for the tag barriers (they won't be so important
    if we can switch over to the explicit _lock bitops, but for now we should
    make it clear).

    Jens' original patch said a barrier after the test_and_clear_bit was also
    required. I can't see why (and it would prevent the use of the _lock bitop).

    Acked-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

13 Sep, 2007

1 commit

  • There's a race condition in blk_queue_end_tag() for shared tag maps,
    users include stex (promise supertrak thingy) and qla2xxx. The former
    at least has reported bugs in this area, not sure why we haven't seen
    any for the latter. It could be because the window is narrow and that
    other conditions in the qla2xxx code hide this. It's a real bug,
    though, as the stex smp users can attest.

    We need to ensure two things - the tag bit clearing needs to happen
    AFTER we cleared the tag pointer, as the tag bit clearing/setting is
    what protects this map. Secondly, we need to ensure that the visibility
    of the tag pointer and tag bit clear are ordered properly.

    [ I removed the SMP barriers - "test_and_clear_bit()" already implies
    all the required barriers. -- Linus ]

    Also see http://bugzilla.kernel.org/show_bug.cgi?id=7842

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
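
The ordering requirement above (clear the tag pointer first, then release the tag bit with an operation that implies a full barrier) can be sketched in user space with C11 atomics. This is an analogy only: `struct tag_map`, `start_tag`, `end_tag` and `demo_tags` are hypothetical names, and a sequentially consistent `atomic_fetch_and` stands in for the kernel's `test_and_clear_bit()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_TAGS 32

struct tag_map {
    void *tag_index[MAX_TAGS];  /* tag -> owner pointer */
    atomic_ulong tag_bits;      /* bit set = tag in use; the bit guards the slot */
};

/* Claim the first free tag; the pointer is written only after the bit is owned. */
static int start_tag(struct tag_map *map, void *owner)
{
    for (int tag = 0; tag < MAX_TAGS; tag++) {
        unsigned long mask = 1UL << tag;
        unsigned long old = atomic_fetch_or(&map->tag_bits, mask);

        if (!(old & mask)) {
            map->tag_index[tag] = owner;
            return tag;
        }
    }
    return -1;
}

/* Release a tag: clear the pointer FIRST, then clear the bit. The atomic
 * read-modify-write is sequentially consistent, so the pointer store is
 * visible before any other thread can observe the bit as free.
 * Returns 1 if the tag was in use, 0 otherwise. */
static int end_tag(struct tag_map *map, int tag)
{
    unsigned long mask = 1UL << tag;
    unsigned long old;

    map->tag_index[tag] = NULL;
    old = atomic_fetch_and(&map->tag_bits, ~mask);
    return (old & mask) != 0;
}

/* Hypothetical single-threaded demo of claim, release, and reuse. */
static int demo_tags(void)
{
    static struct tag_map map;  /* static storage: zero-initialised */
    int a, b;
    int t0 = start_tag(&map, &a);
    int t1 = start_tag(&map, &b);
    int freed = end_tag(&map, t0);
    int again = start_tag(&map, &b);

    return t0 == 0 && t1 == 1 && freed == 1 && again == 0 &&
           map.tag_index[0] == (void *)&b;
}
```

If the two steps in `end_tag` were swapped, another thread could claim the tag and store its pointer, only to have that pointer overwritten with NULL by the late store.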
     

12 Aug, 2007

1 commit

  • This patch provides more information concerning REMAP operations on block
    IOs. The additional information provides clearer details at the user level,
    and supports post-processing analysis in btt.

    o Adds in partition remaps on the same device.
    o Fixed up the remap information in DM to be in the right order
    o Sent up mapped-from and mapped-to device information

    Signed-off-by: Alan D. Brunelle
    Signed-off-by: Jens Axboe

    Alan D. Brunelle
     

24 Jul, 2007

1 commit

  • Some of the code has been gradually transitioned to using the proper
    struct request_queue, but there's lots left. So do a full sweep of
    the kernel and get rid of this typedef and replace its uses with
    the proper type.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

18 Jul, 2007

1 commit

  • kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
    variant in the past. But with __GFP_ZERO it is possible now to do zeroing
    while allocating.

    Use __GFP_ZERO to remove the explicit clearing of memory via memset wherever
    we can.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
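
As a user-space analogy of the cleanup described above, the sketch below moves the zeroing from the call site into the allocator via a flag. `sketch_alloc`, `SKETCH_GFP_ZERO` and `struct queue_stub` are hypothetical stand-ins; they are not kernel interfaces.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_GFP_ZERO 0x1u  /* hypothetical stand-in for __GFP_ZERO */

/* Hypothetical allocator: zeroes the memory only when the flag is passed. */
static void *sketch_alloc(size_t size, unsigned int flags)
{
    void *p = malloc(size);

    if (p && (flags & SKETCH_GFP_ZERO))
        memset(p, 0, size);
    return p;
}

struct queue_stub {
    int nr_requests;
    void *elevator;
};

/* Before: allocate, then clear by hand at the call site. */
static struct queue_stub *alloc_queue_old(void)
{
    struct queue_stub *q = sketch_alloc(sizeof(*q), 0);

    if (!q)
        return NULL;
    memset(q, 0, sizeof(*q));
    return q;
}

/* After: ask the allocator for zeroed memory; no explicit memset. */
static struct queue_stub *alloc_queue_new(void)
{
    return sketch_alloc(sizeof(struct queue_stub), SKETCH_GFP_ZERO);
}

/* Hypothetical check that both variants yield identical, zeroed objects. */
static int queues_match(void)
{
    struct queue_stub *a = alloc_queue_old();
    struct queue_stub *b = alloc_queue_new();
    int same = a && b && memcmp(a, b, sizeof(*a)) == 0 &&
               b->nr_requests == 0;

    free(a);
    free(b);
    return same;
}
```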
     

16 Jul, 2007

4 commits


10 Jul, 2007

2 commits

  • Barrier bios are completed twice - once after the barrier write itself
    is done and again after the whole sequence is complete.
    flush_dry_bio_endio() is for the first completion. It doesn't really
    complete the bio. It rewinds bvec and resets bio so that it can be
    completed again when the whole barrier sequence is complete.

    The bvec rewinding code has the following problems.

    1. The rewinding code is wrong because filesystems may pass bvec with
    non zero bv_offset.

    2. The block layer doesn't guarantee anything about the state of
    bvec array on request completion. bv_offset and len are updated
    iff __end_that_request_first() completes the bvec partially.

    Because of #2, #1 doesn't really matter (nobody cares whether bvec is
    re-wound correctly or not) but then again by not doing unwinding at
    all, we'll always give back the same bvec to the caller as full bvec
    completion doesn't alter bvecs and the final completion is always full
    completion.

    Drop unnecessary rewinding code.

    This was spotted by Neil Brown.

    Signed-off-by: Tejun Heo
    Cc: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Two bugs in there:

    - The virt oversize check should use the current bio hardware back
    size and the next bio front size, not the same bio. Spotted by
    Neil Brown.

    - The segment size check should add hw front sizes, not total bio
    sizes. Spotted by James Bottomley.

    Acked-by: James Bottomley
    Acked-by: NeilBrown
    Signed-off-by: Jens Axboe

    Jens Axboe
     

16 Jun, 2007

1 commit

  • SCSI marks internal commands with REQ_PREEMPT and pushes them to the front
    of the request queue using blk_execute_rq(). When entering suspended
    or frozen state, SCSI devices are quiesced using
    scsi_device_quiesce(). In quiesced state, only REQ_PREEMPT requests
    are processed. This is how SCSI blocks other requests out while
    suspending and resuming. As all internal commands are pushed at the
    front of the queue, this usually works.

    Unfortunately, this interacts badly with ordered requeueing. To
    preserve request order on requeueing (due to busy device, active EH or
    other failures), requests are sorted according to ordered sequence on
    requeue if IO barrier is in progress.

    The following sequence deadlocks.

    1. IO barrier sequence issues.

    2. Suspend requested. Queue is quiesced with part or all of IO
    barrier sequence at the front.

    3. During suspending or resuming, SCSI issues internal command which
    gets deferred and requeued for some reason. As the command is
    issued after the IO barrier in #1, ordered requeueing code puts the
    request after IO barrier sequence.

    4. The device is ready to process requests again but still is in
    quiesced state and the first request of the queue isn't
    REQ_PREEMPT, so command processing is deadlocked -
    suspending/resuming waits for the issued request to complete while
    the request can't be processed till device is put back into
    running state by resuming.

    This can be fixed by always putting !fs requests at the front when
    requeueing.

    The following thread reports this deadlock.

    http://thread.gmane.org/gmane.linux.kernel/537473

    Signed-off-by: Tejun Heo
    Acked-by: David Greaves
    Acked-by: Jeff Garzik
    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Tejun Heo
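
The fix described above can be modelled with a toy queue: on requeue, a non-filesystem request is treated as having the lowest ordering sequence, so it always lands in front of any pending barrier sequence. The array-based queue and the names `effective_seq`, `requeue` and `demo_requeue` are assumptions made for this sketch, not the kernel's data structures.

```c
#include <assert.h>
#include <string.h>

#define QLEN 8

struct request {
    int is_fs;   /* 1 = filesystem request, 0 = internal (e.g. REQ_PREEMPT) */
    int ordseq;  /* position in the ordered (barrier) sequence */
    int id;
};

struct queue {
    struct request *slot[QLEN];
    int n;
};

/* Non-fs requests do not follow barrier ordering: give them the lowest
 * sequence so the sort below always puts them at the front. */
static int effective_seq(const struct request *rq)
{
    return rq->is_fs ? rq->ordseq : 0;
}

/* Requeue in ordered-sequence order while a barrier is in progress. */
static void requeue(struct queue *q, struct request *rq)
{
    int pos = 0;

    assert(q->n < QLEN);
    while (pos < q->n && effective_seq(q->slot[pos]) < effective_seq(rq))
        pos++;
    memmove(&q->slot[pos + 1], &q->slot[pos],
            (size_t)(q->n - pos) * sizeof(q->slot[0]));
    q->slot[pos] = rq;
    q->n++;
}

/* Hypothetical demo: a barrier sequence is queued, then an internal
 * command and a later fs request are requeued. */
static int demo_requeue(void)
{
    struct request bar1 = { 1, 1, 10 }, bar2 = { 1, 2, 11 };
    struct request internal = { 0, 3, 99 }, late_fs = { 1, 3, 12 };
    struct queue q = { { &bar1, &bar2 }, 2 };

    requeue(&q, &internal);  /* goes to the front despite ordseq 3 */
    requeue(&q, &late_fs);   /* stays behind the barrier sequence */
    return q.slot[0]->id == 99 && q.slot[1]->id == 10 &&
           q.slot[2]->id == 11 && q.slot[3]->id == 12;
}
```

With the internal command at the head of the queue, a quiesced device can still pick it up, which is what breaks the deadlock in step 4.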
     

16 May, 2007

1 commit


11 May, 2007

1 commit

  • Recursive calls to generic_make_request can use up a lot of stack space, and
    we would rather they didn't.

    As generic_make_request is a void function, and as it is generally not
    expected that it will have any effect immediately, it is safe to delay any
    call to generic_make_request until there is sufficient stack space
    available.

    As ->bi_next is reserved for the driver to use, it can have no valid value
    when generic_make_request is called, and as __make_request implicitly
    assumes it will be NULL (ELEVATOR_BACK_MERGE fork of switch) we can be
    certain that all callers set it to NULL. We can therefore safely use
    bi_next to link pending requests together, providing we clear it before
    making the real call.

    So, we choose to allow each thread to only be active in one
    generic_make_request at a time. If a subsequent (recursive) call is made,
    the bio is linked into a per-thread list, and is handled when the active
    call completes.

    As the list of pending bios is per-thread, there are no locking issues to
    worry about.

    I say above that it is "safe to delay any call...". There are, however,
    some behaviours of a make_request_fn which would make it unsafe. These
    include any behaviour that assumes anything will have changed after a
    recursive call to generic_make_request.

    These could include:
    - waiting for that call to finish and call its bi_end_io function.
    md used to sometimes do this (marking the superblock dirty before
    completing a write) but doesn't any more
    - inspecting the bio for fields that generic_make_request might
    change, such as bi_sector or bi_bdev. It is hard to see a good
    reason for this, and I don't think anyone actually does it.
    - inspecting the queue to see if, e.g. it is 'full' yet. Again, I
    think this is very unlikely to be useful, or to be done.

    Signed-off-by: Neil Brown
    Cc: Jens Axboe
    Cc:

    Alasdair G Kergon said:

    I can see nothing wrong with this in principle.

    For device-mapper at the moment though it's essential that, while the bio
    mappings may now get delayed, they still get processed in exactly
    the same order as they were passed to generic_make_request().

    My main concern is whether the timing changes implicit in this patch
    will make the rare data-corrupting races in the existing snapshot code
    more likely. (I'm working on a fix for these races, but the unfinished
    patch is already several hundred lines long.)

    It would be helpful if some people on this mailing list would test
    this patch in various scenarios and report back.

    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Neil Brown
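
The scheme in this commit message (one active generic_make_request per thread, with nested submissions linked through bi_next and handled afterwards) can be sketched in user space as below. Globals stand in for the per-thread list, `make_request_fn` is a hypothetical stacking driver that resubmits the bio once per remaining layer, and the counters exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct bio {
    struct bio *bi_next;  /* reused to link pending bios */
    int layers_left;      /* hypothetical: stacked layers still to traverse */
};

/* Per-thread state in the scheme described; plain globals suffice here. */
static struct bio *pending_head;
static struct bio **pending_tail;  /* non-NULL while a call is active */
static int nesting, max_nesting, handled;

static void generic_make_request(struct bio *bio);

/* Hypothetical stacking driver: passes the bio down to the next layer. */
static void make_request_fn(struct bio *bio)
{
    if (++nesting > max_nesting)
        max_nesting = nesting;
    handled++;
    if (bio->layers_left > 0) {
        bio->layers_left--;
        generic_make_request(bio);  /* nested call: only queues the bio */
    }
    nesting--;
}

static void generic_make_request(struct bio *bio)
{
    if (pending_tail) {
        /* A call is already active in this thread: link and return. */
        bio->bi_next = NULL;
        *pending_tail = bio;
        pending_tail = &bio->bi_next;
        return;
    }

    /* Outermost call: handle this bio, then drain whatever was queued. */
    pending_head = NULL;
    pending_tail = &pending_head;
    do {
        bio->bi_next = NULL;  /* clear the link before the real call */
        make_request_fn(bio);
        bio = pending_head;
        if (bio) {
            pending_head = bio->bi_next;
            if (!pending_head)
                pending_tail = &pending_head;
        }
    } while (bio);
    pending_tail = NULL;  /* deactivate */
}

/* Hypothetical driver for the demo: submit one bio through N layers. */
static void run_stack(int layers)
{
    struct bio bio = { NULL, layers };

    nesting = max_nesting = handled = 0;
    generic_make_request(&bio);
}
```

Submitting through three stacked layers with `run_stack(3)` invokes the driver four times, yet `max_nesting` never exceeds one, so stack use no longer grows with the depth of the device stack. Queued bios are handled in the order they were submitted, which matches the ordering concern raised for device-mapper.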
     

10 May, 2007

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
    sound: convert "sound" subdirectory to UTF-8
    MAINTAINERS: Add cxacru website/mailing list
    include files: convert "include" subdirectory to UTF-8
    general: convert "kernel" subdirectory to UTF-8
    documentation: convert the Documentation directory to UTF-8
    Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
    remove broken URLs from net drivers' output
    Magic number prefix consistency change to Documentation/magic-number.txt
    trivial: s/i_sem /i_mutex/
    fix file specification in comments
    drivers/base/platform.c: fix small typo in doc
    misc doc and kconfig typos
    Remove obsolete fat_cvf help text
    Fix occurrences of "the the "
    Fix minor typoes in kernel/module.c
    Kconfig: Remove reference to external mqueue library
    Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
    Correct comments in genrtc.c to refer to correct /proc file.
    Fix more "deprecated" spellos.
    Fix "deprecated" typoes.
    ...

    Fix trivial comment conflict in kernel/relay.c.

    Linus Torvalds