10 Jul, 2007

5 commits

  • elevator

    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Matthias Kaehlcke
     
  • instead of going through all options.

    Signed-off-by: Jan Engelhardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Jan Engelhardt
     
  • With the cfq_queue hash removal, we inadvertently got rid of the
    async queue sharing. This was not intentional, in fact CFQ purposely
    shares the async queue per priority level to get good merging for
    async writes.

    So put some logic in cfq_get_queue() to track the shared queues.
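
    The sharing described above can be sketched in user space as follows.
    This is only an illustration, not the CFQ code: every `toy_` name, the
    number of priority levels and the reference counting are assumptions
    made for the example. Async requests at the same priority level get the
    same queue object, while each sync request gets its own.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TOY_PRIO_LEVELS 8   /* assumed number of priority levels */

/* Hypothetical stand-in for a CFQ queue. */
struct toy_cfq_queue {
    int prio;
    bool is_sync;
    int refs;               /* number of users holding this queue */
};

/* Hypothetical per-device data: one shared async queue per priority. */
struct toy_cfq_data {
    struct toy_cfq_queue *async[TOY_PRIO_LEVELS];
};

/* Return the shared async queue for this priority, or a fresh queue. */
static struct toy_cfq_queue *toy_get_queue(struct toy_cfq_data *d,
                                           bool is_sync, int prio)
{
    struct toy_cfq_queue *q;

    if (prio < 0 || prio >= TOY_PRIO_LEVELS)
        return NULL;

    /* Async I/O: reuse the queue already tracked for this priority. */
    if (!is_sync && d->async[prio]) {
        d->async[prio]->refs++;
        return d->async[prio];
    }

    q = calloc(1, sizeof(*q));
    if (!q)
        return NULL;
    q->prio = prio;
    q->is_sync = is_sync;
    q->refs = 1;

    /* Remember new async queues so later callers can share them. */
    if (!is_sync)
        d->async[prio] = q;
    return q;
}
```

    Two calls to toy_get_queue(&d, false, 4) return the same pointer, which
    is what lets async writes from different contexts merge in one queue.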

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Barrier bios are completed twice - once after the barrier write itself
    is done and again after the whole sequence is complete.
    flush_dry_bio_endio() is for the first completion. It doesn't really
    complete the bio. It rewinds bvec and resets bio so that it can be
    completed again when the whole barrier sequence is complete.

    The bvec rewinding code has the following problems.

    1. The rewinding code is wrong because filesystems may pass bvec with
    non zero bv_offset.

    2. The block layer doesn't guarantee anything about the state of
    bvec array on request completion. bv_offset and len are updated
    iff __end_that_request_first() completes the bvec partially.

    Because of #2, #1 doesn't really matter (nobody cares whether bvec is
    re-wound correctly or not) but then again by not doing unwinding at
    all, we'll always give back the same bvec to the caller as full bvec
    completion doesn't alter bvecs and the final completion is always full
    completion.

    Drop unnecessary rewinding code.

    This was spotted by Neil Brown.

    Signed-off-by: Tejun Heo
    Cc: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Two bugs in there:

    - The virt oversize check should use the current bio hardware back
    size and the next bio front size, not the same bio. Spotted by
    Neil Brown.

    - The segment size check should add hw front sizes, not total bio
    sizes. Spotted by James Bottomley.

    Acked-by: James Bottomley
    Acked-by: NeilBrown
    Signed-off-by: Jens Axboe

    Jens Axboe
     

16 Jun, 2007

1 commit

  • SCSI marks internal commands with REQ_PREEMPT and pushes them at the front
    of the request queue using blk_execute_rq(). When entering suspended
    or frozen state, SCSI devices are quiesced using
    scsi_device_quiesce(). In quiesced state, only REQ_PREEMPT requests
    are processed. This is how SCSI blocks other requests out while
    suspending and resuming. As all internal commands are pushed at the
    front of the queue, this usually works.

    Unfortunately, this interacts badly with ordered requeueing. To
    preserve request order on requeueing (due to busy device, active EH or
    other failures), requests are sorted according to ordered sequence on
    requeue if IO barrier is in progress.

    The following sequence deadlocks.

    1. IO barrier sequence issues.

    2. Suspend requested. Queue is quiesced with part or all of IO
    barrier sequence at the front.

    3. During suspending or resuming, SCSI issues internal command which
    gets deferred and requeued for some reason. As the command is
    issued after the IO barrier in #1, ordered requeueing code puts the
    request after IO barrier sequence.

    4. The device is ready to process requests again but still is in
    quiesced state and the first request of the queue isn't
    REQ_PREEMPT, so command processing is deadlocked -
    suspending/resuming waits for the issued request to complete while
    the request can't be processed till device is put back into
    running state by resuming.

    This can be fixed by always putting !fs requests at the front when
    requeueing.
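
    A minimal user-space sketch of that requeue rule, assuming a simple
    singly linked queue. The `toy_` types and fields are hypothetical and
    are not the kernel's struct request or its elevator code: non-fs
    requests always go to the head, while fs requests keep their ordered
    position.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request. */
struct toy_request {
    bool is_fs;               /* filesystem I/O vs. internal command */
    unsigned int seq;         /* position in the ordered sequence */
    struct toy_request *next;
};

struct toy_queue {
    struct toy_request *head;
};

/* Requeue rq: !fs requests go to the front, fs requests are sorted by seq. */
static void toy_requeue(struct toy_queue *q, struct toy_request *rq)
{
    struct toy_request **pos = &q->head;

    if (rq->is_fs) {
        /* Skip internal commands and fs requests that must stay ahead. */
        while (*pos && (!(*pos)->is_fs || (*pos)->seq <= rq->seq))
            pos = &(*pos)->next;
    }

    rq->next = *pos;
    *pos = rq;
}
```

    With this rule an internal command requeued after a barrier sequence
    still ends up first in the queue, so a quiesced device can process it.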

    The following thread reports this deadlock.

    http://thread.gmane.org/gmane.linux.kernel/537473

    Signed-off-by: Tejun Heo
    Acked-by: David Greaves
    Acked-by: Jeff Garzik
    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

24 May, 2007

2 commits

  • Send an uevent to user space to indicate that a media change event has
    occurred.

    Signed-off-by: Kristen Carlson Accardi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kristen Carlson Accardi
     
  • Allow user space to determine if a disk supports Asynchronous Notification of
    media changes. This is done by adding a new sysfs file "capability_flags",
    which is documented in (insert file name). This sysfs file will export all
    disk capabilities flags to user space. We also define a new flag to define
    the media change notification capability.

    Signed-off-by: Kristen Carlson Accardi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kristen Carlson Accardi
     

16 May, 2007

1 commit


11 May, 2007

1 commit

  • to generic_make_request can use up a lot of space, and we would rather they
    didn't.

    As generic_make_request is a void function, and as it is generally not
    expected that it will have any effect immediately, it is safe to delay any
    call to generic_make_request until there is sufficient stack space
    available.

    As ->bi_next is reserved for the driver to use, it can have no valid value
    when generic_make_request is called, and as __make_request implicitly
    assumes it will be NULL (ELEVATOR_BACK_MERGE fork of switch) we can be
    certain that all callers set it to NULL. We can therefore safely use
    bi_next to link pending requests together, providing we clear it before
    making the real call.

    So, we choose to allow each thread to only be active in one
    generic_make_request at a time. If a subsequent (recursive) call is made,
    the bio is linked into a per-thread list, and is handled when the active
    call completes.

    As the list of pending bios is per-thread, there are no locking issues to
    worry about.
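
    The scheme can be sketched in user space roughly as follows. This is an
    illustration under stated assumptions, not the patch itself: the `toy_`
    names are hypothetical, the per-thread state is modelled with plain
    statics (the toy is single-threaded), and the "driver" simply resubmits
    a child bio to mimic a stacked device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio. */
struct toy_bio {
    struct toy_bio *bi_next;  /* NULL on submission, reused as list link */
    struct toy_bio *child;    /* bio the toy driver resubmits, or NULL */
    int handled_order;        /* set when the bio is really processed */
};

/* Per-thread in the scheme described above; statics in this toy. */
static struct toy_bio *pending_head, **pending_tail;
static int active;
static int nesting, max_nesting, next_order;

static void toy_submit(struct toy_bio *bio);

/* Stand-in for a stacking driver's make_request_fn. */
static void toy_make_request(struct toy_bio *bio)
{
    if (++nesting > max_nesting)
        max_nesting = nesting;
    bio->handled_order = next_order++;
    if (bio->child)
        toy_submit(bio->child);   /* would recurse without the list */
    nesting--;
}

static void toy_submit(struct toy_bio *bio)
{
    assert(bio->bi_next == NULL);

    if (active) {
        /* Already inside a submission: link the bio and return. */
        *pending_tail = bio;
        pending_tail = &bio->bi_next;
        return;
    }

    active = 1;
    pending_head = NULL;
    pending_tail = &pending_head;
    while (bio) {
        toy_make_request(bio);
        /* Take the next pending bio and clear its link before use. */
        bio = pending_head;
        if (bio) {
            pending_head = bio->bi_next;
            if (!pending_head)
                pending_tail = &pending_head;
            bio->bi_next = NULL;
        }
    }
    active = 0;
}
```

    Submitting a chain of three stacked bios through toy_submit() handles
    them in submission order while the driver function is never nested more
    than one level deep.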

    I say above that it is "safe to delay any call...". There are, however,
    some behaviours of a make_request_fn which would make it unsafe. These
    include any behaviour that assumes anything will have changed after a
    recursive call to generic_make_request.

    These could include:
    - waiting for that call to finish and call its bi_end_io function.
    md used to sometimes do this (marking the superblock dirty before
    completing a write) but doesn't any more.
    - inspecting the bio for fields that generic_make_request might
    change, such as bi_sector or bi_bdev. It is hard to see a good
    reason for this, and I don't think anyone actually does it.
    - inspecting the queue to see if, e.g. it is 'full' yet. Again, I
    think this is very unlikely to be useful, or to be done.

    Signed-off-by: Neil Brown
    Cc: Jens Axboe

    Alasdair G Kergon said:

    I can see nothing wrong with this in principle.

    For device-mapper at the moment though it's essential that, while the bio
    mappings may now get delayed, they still get processed in exactly
    the same order as they were passed to generic_make_request().

    My main concern is whether the timing changes implicit in this patch
    will make the rare data-corrupting races in the existing snapshot code
    more likely. (I'm working on a fix for these races, but the unfinished
    patch is already several hundred lines long.)

    It would be helpful if some people on this mailing list would test
    this patch in various scenarios and report back.

    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Neil Brown
     

10 May, 2007

5 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
    sound: convert "sound" subdirectory to UTF-8
    MAINTAINERS: Add cxacru website/mailing list
    include files: convert "include" subdirectory to UTF-8
    general: convert "kernel" subdirectory to UTF-8
    documentation: convert the Documentation directory to UTF-8
    Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
    remove broken URLs from net drivers' output
    Magic number prefix consistency change to Documentation/magic-number.txt
    trivial: s/i_sem /i_mutex/
    fix file specification in comments
    drivers/base/platform.c: fix small typo in doc
    misc doc and kconfig typos
    Remove obsolete fat_cvf help text
    Fix occurrences of "the the "
    Fix minor typoes in kernel/module.c
    Kconfig: Remove reference to external mqueue library
    Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
    Correct comments in genrtc.c to refer to correct /proc file.
    Fix more "deprecated" spellos.
    Fix "deprecated" typoes.
    ...

    Fix trivial comment conflict in kernel/relay.c.

    Linus Torvalds
     
  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq
    (this was possible from the very beginning, I missed this). So we can unify
    flush_work_keventd and flush_work.

    Also, rename flush_work() to cancel_work_sync() and fix all callers.
    Perhaps this is not the best name, but "flush_work" is really bad.

    (akpm: this is why the earlier patches bypassed maintainers)

    Signed-off-by: Oleg Nesterov
    Cc: Jeff Garzik
    Cc: "David S. Miller"
    Cc: Jens Axboe
    Cc: Tejun Heo
    Cc: Auke Kok
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Switch the kblockd flushing from a global flush to a more specific
    flush_work().

    (akpm: bypassed maintainers, sorry. There are other patches which depend on
    this)

    Cc: "Maciej W. Rozycki"
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Display all possible partitions when the root filesystem is not mounted.
    This helps to track spell'o's and missing drivers.

    Updated to work with newer kernels.

    Example output:

    VFS: Cannot open root device "foobar" or unknown-block(0,0)
    Please append a correct "root=" boot option; here are the available partitions:
    0800 8388608 sda driver: sd
    0801 192748 sda1
    0802 8193150 sda2
    0810 4194304 sdb driver: sd
    Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

    [akpm@linux-foundation.org: cleanups, fix printk warnings]
    Signed-off-by: Jan Engelhardt
    Cc: Dave Gilbert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Gilbert
     

09 May, 2007

4 commits

  • Signed-off-by: Michael Opdenacker
    Signed-off-by: Adrian Bunk

    Michael Opdenacker
     
  • * 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block:
    [PATCH] ll_rw_blk: fix missing bounce in blk_rq_map_kern()
    [PATCH] splice: always call into page_cache_readahead()
    [PATCH] splice(): fix interaction with readahead

    Linus Torvalds
     
  • Fix units mismatch (jiffies vs msecs) in as-iosched.c, spotted by Xiaoning
    Ding.
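
    As a generic illustration of this class of bug (not the actual
    as-iosched.c change), the sketch below converts a millisecond timeout
    into ticks before comparing it with tick-based timestamps. TOY_HZ and
    both helper names are assumptions made for the example.

```c
#include <stdbool.h>

#define TOY_HZ 250UL   /* assumed tick rate, for this sketch only */

/* Convert milliseconds to ticks, rounding up. */
static unsigned long toy_msecs_to_jiffies(unsigned long ms)
{
    return (ms * TOY_HZ + 999UL) / 1000UL;
}

/* Compare like with like: both sides of the test are in ticks. */
static bool toy_timeout_elapsed(unsigned long start, unsigned long now,
                                unsigned long timeout_ms)
{
    return now - start >= toy_msecs_to_jiffies(timeout_ms);
}
```

    Comparing the tick difference directly against timeout_ms would make
    the timeout four times too long at this assumed tick rate.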

    Signed-off-by: Nick Piggin
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • I think we might just need the blk_map_kern users now. For the async
    execute I added the bounce code already and the block SG_IO has it
    already. I think the blk_map_kern bounce code got dropped because we
    thought the correct gfp_t would be passed in. But I think all we need is
    the patch below and all the paths are taken care of. The patch is not
    tested. Patch was made against scsi-misc.

    The last place that is sending non sg commands may just be md/dm-emc.c
    but that is just waiting on Alasdair to take some patches that fix
    that and a bunch of junk in there including adding bounce support. If
    the patch below is ok though and dm-emc finally gets converted then it
    will have sg and bounce buffer support.

    Signed-off-by: Mike Christie
    Signed-off-by: Jens Axboe

    Mike Christie
     

08 May, 2007

2 commits

  • This patch provides a new macro

    KMEM_CACHE(<struct>, <flags>)

    to simplify slab creation. KMEM_CACHE creates a slab with the name of the
    struct, with the size of the struct and with the alignment of the struct.
    Additional slab flags may be specified if necessary.

    Example

    struct test_slab {
            int a, b, c;
            struct list_head list;
    } __cacheline_aligned_in_smp;

    test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);

    will create a new slab named "test_slab" of the size sizeof(struct
    test_slab) and aligned to the alignment of struct test_slab. If it fails
    then we panic.
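
    How such a macro can derive name, size and alignment from the struct
    tag alone can be sketched in user space as below. This is not the
    kernel definition: toy_cache_create(), TOY_KMEM_CACHE and TOY_SLAB_PANIC
    are hypothetical, and C11 _Alignof/_Alignas stand in for the kernel's
    alignment helpers.

```c
#include <stddef.h>

#define TOY_SLAB_PANIC 0x1UL   /* hypothetical flag value */

/* Hypothetical cache descriptor; a real slab cache holds far more. */
struct toy_cache {
    const char *name;
    size_t size;
    size_t align;
    unsigned long flags;
};

static struct toy_cache toy_cache_create(const char *name, size_t size,
                                         size_t align, unsigned long flags)
{
    struct toy_cache c = { name, size, align, flags };
    return c;
}

/* Stringify the tag for the name; take size and alignment from the type. */
#define TOY_KMEM_CACHE(__struct, __flags)                     \
    toy_cache_create(#__struct, sizeof(struct __struct),     \
                     _Alignof(struct __struct), (__flags))

/* Example struct, aligned to an assumed 64-byte cache line. */
struct test_slab {
    _Alignas(64) int a;
    int b, c;
};
```

    TOY_KMEM_CACHE(test_slab, TOY_SLAB_PANIC) then yields a descriptor named
    "test_slab" without the caller repeating the name, size or alignment.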

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
    been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
            quilt add "$file";
            sed -i -e 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' "$file";
    done

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

06 May, 2007

1 commit

  • * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (87 commits)
    [SCSI] fusion: fix domain validation loops
    [SCSI] qla2xxx: fix regression on sparc64
    [SCSI] modalias for scsi devices
    [SCSI] sg: cap reserved_size values at max_sectors
    [SCSI] BusLogic: stop using check_region
    [SCSI] tgt: fix rdma transfer bugs
    [SCSI] aacraid: fix aacraid not finding device
    [SCSI] aacraid: Correct SMC products in aacraid.txt
    [SCSI] scsi_error.c: Add EH Start Unit retry
    [SCSI] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test.
    [SCSI] ipr: Driver version to 2.3.2
    [SCSI] ipr: Faster sg list fetch
    [SCSI] ipr: Return better qc_issue errors
    [SCSI] ipr: Disrupt device error
    [SCSI] ipr: Improve async error logging level control
    [SCSI] ipr: PCI unblock config access fix
    [SCSI] ipr: Fix for oops following SATA request sense
    [SCSI] ipr: Log error for SAS dual path switch
    [SCSI] ipr: Enable logging of debug error data for all devices
    [SCSI] ipr: Add new PCI-E IDs to device table
    ...

    Linus Torvalds
     

03 May, 2007

1 commit


30 Apr, 2007

17 commits