16 Dec, 2011

1 commit

  • While probing, fd sets up queue, probes hardware and tears down the
    queue if probing fails. In the process, blk_drain_queue() kicks the
    queue which failed to finish initialization and fd is unhappy about
    that.

    floppy0: no floppy controllers found
    ------------[ cut here ]------------
    WARNING: at drivers/block/floppy.c:2929 do_fd_request+0xbf/0xd0()
    Hardware name: To Be Filled By O.E.M.
    VFS: do_fd_request called on non-open device
    Modules linked in:
    Pid: 1, comm: swapper Not tainted 3.2.0-rc4-00077-g5983fe2 #2
    Call Trace:
    [] warn_slowpath_common+0x7a/0xb0
    [] warn_slowpath_fmt+0x41/0x50
    [] do_fd_request+0xbf/0xd0
    [] blk_drain_queue+0x65/0x80
    [] blk_cleanup_queue+0xe3/0x1a0
    [] floppy_init+0xdeb/0xe28
    [] ? daring+0x6b/0x6b
    [] do_one_initcall+0x3f/0x170
    [] kernel_init+0x9d/0x11e
    [] ? schedule_tail+0x22/0xa0
    [] kernel_thread_helper+0x4/0x10
    [] ? start_kernel+0x2be/0x2be
    [] ? gs_change+0xb/0xb

    Avoid it by making blk_drain_queue() kick queue iff dispatch queue has
    something on it.
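
    The rule above can be sketched as a small userspace model. This is only an
    illustration, not the kernel code: ToyQueue, request_fn() and drain_queue()
    are hypothetical names, and the "dispatch" list stands in for the dispatch
    queue mentioned in the message.

```python
class ToyQueue:
    """Toy stand-in for a request queue (hypothetical, not struct request_queue)."""

    def __init__(self, initialized):
        self.initialized = initialized  # did the driver finish setting up?
        self.dispatch = []              # requests waiting to be dispatched
        self.kicks = 0                  # how many times request_fn() ran

    def request_fn(self):
        # A half-initialized driver (like fd after a failed probe) objects
        # to having its request function invoked at all.
        if not self.initialized:
            raise RuntimeError("request_fn called on non-open device")
        self.kicks += 1
        self.dispatch.clear()


def drain_queue(q):
    """Kick the queue only if the dispatch list has something on it."""
    if q.dispatch:
        q.request_fn()
```

    With this guard, draining a queue whose probe failed (and which therefore
    has an empty dispatch list) never reaches the driver's request function.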

    Signed-off-by: Tejun Heo
    Reported-by: Ralf Hildebrandt
    Reported-by: Wu Fengguang
    Tested-by: Sergei Trofimovich
    Signed-off-by: Jens Axboe

    Tejun Heo
     

02 Dec, 2011

1 commit

  • cfq_cic_link() has a race condition. When several processes that share an
    ioc issue I/O to the same block device simultaneously, cfq_cic_link()
    sometimes returns -EEXIST. The race can stall I/O through the following steps:

    step 1: Process A: Issue an I/O to /dev/sda
    step 2: Process A: Get an ioc (iocA here) in get_io_context() which is not
    linked with a cic for the device
    step 3: Process A: Get a new cic for the device (cicA here) in
    cfq_alloc_io_context()

    step 4: Process B: Issue an I/O to /dev/sda
    step 5: Process B: Get iocA in get_io_context() since process A and B share the
    same ioc
    step 6: Process B: Get a new cic for the device (cicB here) in
    cfq_alloc_io_context() since iocA has not been linked with a
    cic for the device yet

    step 7: Process A: Link cicA to iocA in cfq_cic_link()
    step 8: Process A: Dispatch I/O to driver and finish it

    step 9: Process B: Try to link cicB to iocA in cfq_cic_link()
    But it fails, showing the "cfq: cic link failed!" kernel
    message, since iocA was already linked with cicA at step 7.
    step 10: Process B: Wait for I/O to finish in get_request_wait()
    The function never wakes up when there is no I/O to the
    device.

    When cfq_cic_link() returns -EEXIST, it means the ioc has already been linked
    with a cic. So when cfq_cic_link() returns -EEXIST, retry cfq_cic_lookup().
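
    The retry can be sketched with a toy model in which an ioc is a plain
    dict mapping a device name to its cic. The function names below mirror the
    ones in the message, but the bodies are assumptions made for illustration
    and link_or_reuse() is a hypothetical helper, not a kernel function.

```python
import errno


def cic_lookup(ioc, dev):
    """Return the cic already linked to this ioc for dev, or None."""
    return ioc.get(dev)


def cic_link(ioc, dev, cic):
    """Link cic to ioc for dev; return -EEXIST if another cic is already linked."""
    if dev in ioc:
        return -errno.EEXIST
    ioc[dev] = cic
    return 0


def link_or_reuse(ioc, dev, new_cic):
    """Try to link new_cic; if the race was lost, look up the winner instead."""
    ret = cic_link(ioc, dev, new_cic)
    if ret == -errno.EEXIST:
        # Another process sharing the ioc linked its cic first: reuse that one
        # rather than failing the I/O.
        return cic_lookup(ioc, dev)
    return new_cic
```

    Replaying steps 5 to 9 against this model, process B ends up with cicA
    rather than an error, so its I/O can proceed.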

    Signed-off-by: Yasuaki Ishimatsu
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Yasuaki Ishimatsu
     

30 Nov, 2011

1 commit


23 Nov, 2011

1 commit

  • struct request_queue is allocated with __GFP_ZERO so its "node" field is
    zero before initialization. This causes an oops if node 0 is offline in
    the page allocator because its zonelists are not initialized. From Dave
    Young's dmesg:

    SRAT: Node 1 PXM 2 0-d0000000
    SRAT: Node 1 PXM 2 100000000-330000000
    SRAT: Node 0 PXM 1 330000000-630000000
    Initmem setup node 1 0000000000000000-000000000affb000
    ...
    Built 1 zonelists in Node order, mobility grouping on.
    ...
    BUG: unable to handle kernel paging request at 0000000000001c08
    IP: [] __alloc_pages_nodemask+0xb5/0x870

    and __alloc_pages_nodemask+0xb5 translates to a NULL pointer on
    zonelist->_zonerefs.

    The fix is to initialize q->node at the time of allocation so the correct
    node is passed to the slab allocator later.

    Since blk_init_allocated_queue_node() is no longer needed, merge it with
    blk_init_allocated_queue().

    [rientjes@google.com: changelog, initializing q->node]
    Cc: stable@vger.kernel.org [2.6.37+]
    Reported-by: Dave Young
    Signed-off-by: Mike Snitzer
    Signed-off-by: David Rientjes
    Tested-by: Dave Young
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

16 Nov, 2011

2 commits


14 Nov, 2011

1 commit


10 Nov, 2011

1 commit

  • This reverts commit a72c5e5eb738033938ab30d6a634b74d1d060f10.

    The commit introduced alias for block devices which is intended to be
    used during logging although actual usage hasn't been committed yet.
    This approach adds very limited benefit (raw log might be easier to
    follow) which can be trivially implemented in userland but has a lot
    of problems.

    It is much worse than netif renames because it doesn't rename the
    actual device but just adds a convenience name which isn't used
    universally or enforced. Everything internal including device lookup
    and sysfs still uses the internal name and nothing prevents two
    devices from using conflicting aliases, i.e. sda can have sdb as its
    alias.

    This has been nacked by people working on device driver core, block
    layer and kernel-userland interface and shouldn't have been
    upstreamed. Revert it.

    http://thread.gmane.org/gmane.linux.kernel/1155104
    http://thread.gmane.org/gmane.linux.scsi/68632
    http://thread.gmane.org/gmane.linux.scsi/69776

    Signed-off-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Acked-by: Kay Sievers
    Cc: "James E.J. Bottomley"
    Cc: Nao Nishijima
    Cc: Alan Cox
    Cc: Al Viro
    Signed-off-by: Jens Axboe

    Tejun Heo
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

05 Nov, 2011

2 commits

  • * 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
    virtio-blk: use ida to allocate disk index
    hpsa: add small delay when using PCI Power Management to reset for kump
    cciss: add small delay when using PCI Power Management to reset for kump
    xen/blkback: Fix two races in the handling of barrier requests.
    xen/blkback: Check for proper operation.
    xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
    xen/blkback: Report VBD_WSECT (wr_sect) properly.
    xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
    xen-blkfront: plug device number leak in xlblk_init() error path
    xen-blkfront: If no barrier or flush is supported, use invalid operation.
    xen-blkback: use kzalloc() in favor of kmalloc()+memset()
    xen-blkback: fixed indentation and comments
    xen-blkfront: fix a deadlock while handling discard response
    xen-blkfront: Handle discard requests.
    xen-blkback: Implement discard requests ('feature-discard')
    xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
    drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
    drivers/block/loop.c: emit uevent on auto release
    drivers/block/cpqarray.c: use pci_dev->revision
    loop: always allow userspace partitions and optionally support automatic scanning
    ...

    Fix up trivial header file inclusion conflict in drivers/block/loop.c

    Linus Torvalds
     
  • * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
    block: don't call blk_drain_queue() if elevator is not up
    blk-throttle: use queue_is_locked() instead of lockdep_is_held()
    blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
    blk-throttle: Free up policy node associated with deleted rule
    block: warn if tag is greater than real_max_depth.
    block: make gendisk hold a reference to its queue
    blk-flush: move the queue kick into
    blk-flush: fix invalid BUG_ON in blk_insert_flush
    block: Remove the control of complete cpu from bio.
    block: fix a typo in the blk-cgroup.h file
    block: initialize the bounce pool if high memory may be added later
    block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
    block: drop @tsk from attempt_plug_merge() and explain sync rules
    block: make get_request[_wait]() fail if queue is dead
    block: reorganize throtl_get_tg() and blk_throtl_bio()
    block: reorganize queue draining
    block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
    block: pass around REQ_* flags instead of broken down booleans during request alloc/free
    block: move blk_throtl prototypes to block/blk.h
    block: fix genhd refcounting in blkio_policy_parse_and_set()
    ...

    Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
    and making the request functions be of type "void" instead of "int" in
    - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
    - drivers/staging/zram/zram_drv.c

    Linus Torvalds
     

04 Nov, 2011

1 commit

  • blk_cleanup_queue() may be called before elevator is set up on a
    queue which triggers the following oops.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] elv_drain_elevator+0x1c/0x70
    ...
    Pid: 830, comm: kworker/0:2 Not tainted 3.1.0-next-20111025_64+ #1590
    Bochs Bochs
    RIP: 0010:[] [] elv_drain_elevator+0x1c/0x70
    ...
    Call Trace:
    [] blk_drain_queue+0x42/0x70
    [] blk_cleanup_queue+0xd0/0x1c0
    [] md_free+0x50/0x70
    [] kobject_release+0x8b/0x1d0
    [] kref_put+0x36/0xa0
    [] kobject_put+0x27/0x60
    [] mddev_delayed_delete+0x2f/0x40
    [] process_one_work+0x100/0x3b0
    [] worker_thread+0x15f/0x3a0
    [] kthread+0x87/0x90
    [] kernel_thread_helper+0x4/0x10

    Fix it by making blk_cleanup_queue() check whether q->elevator is set
    up before invoking blk_drain_queue.

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Jiri Slaby
    Signed-off-by: Jens Axboe

    Tejun Heo
     

01 Nov, 2011

2 commits


29 Oct, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits)
    [SCSI] qla4xxx: export address/port of connection (fix udev disk names)
    [SCSI] ipr: Fix BUG on adapter dump timeout
    [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer
    [SCSI] hpsa: change confusing message to be more clear
    [SCSI] iscsi class: fix vlan configuration
    [SCSI] qla4xxx: fix data alignment and use nl helpers
    [SCSI] iscsi class: fix link local mispelling
    [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA
    [SCSI] aacraid: use lower snprintf() limit
    [SCSI] lpfc 8.3.27: Change driver version to 8.3.27
    [SCSI] lpfc 8.3.27: T10 additions for SLI4
    [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery
    [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name
    [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout
    [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes
    [SCSI] megaraid_sas: Changelog and version update
    [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic
    [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support
    [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers
    [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts
    ...

    Linus Torvalds
     

25 Oct, 2011

4 commits


24 Oct, 2011

6 commits

  • Jens Axboe
     
  • The following command sequence triggers an oops.

    # mount /dev/sdb1 /mnt
    # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
    # umount /mnt

    general protection fault: 0000 [#1] PREEMPT SMP
    CPU 2
    Modules linked in:

    Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
    RIP: 0010:[] [] __lock_acquire+0x389/0x1d60
    ...
    Call Trace:
    [] lock_acquire+0x95/0x140
    [] _raw_spin_lock+0x3b/0x50
    [] bdi_lock_two+0x5c/0x70
    [] bdev_inode_switch_bdi+0x4c/0xf0
    [] __blkdev_put+0x11b/0x1d0
    [] __blkdev_put+0x160/0x1d0
    [] blkdev_put+0x5f/0x190
    [] kill_block_super+0x4d/0x80
    [] deactivate_locked_super+0x45/0x70
    [] deactivate_super+0x4a/0x70
    [] mntput_no_expire+0xed/0x130
    [] sys_umount+0x7e/0x3a0
    [] system_call_fastpath+0x16/0x1b

    This is because bdev holds on to disk but disk doesn't pin the
    associated queue. If a SCSI device is removed while the device is
    still open, the sdev puts the base reference to the queue on release.
    When the bdev is finally released, the associated queue is already
    gone along with the bdi and bdev_inode_switch_bdi() ends up
    dereferencing already freed bdi.

    Even if it were not for this bug, disk not holding onto the associated
    queue is very unusual and error-prone.

    Fix it by making add_disk() take an extra reference to its queue and
    put it on disk_release() and ensuring that disk and its fops owner are
    put in that order after all accesses to the disk and queue are
    complete.
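
    The reference-counting rule described above can be illustrated with a
    small userspace model. RefCounted and Disk are hypothetical classes written
    for this sketch; only the ordering (take the queue reference when the disk
    is added, drop it when the disk is released) comes from the message.

```python
class RefCounted:
    """Minimal reference-counted object that records when it is freed."""

    def __init__(self, name):
        self.name = name
        self.refs = 1        # base reference held by the creator
        self.freed = False

    def get(self):
        if self.freed:
            raise RuntimeError(f"use after free: {self.name}")
        self.refs += 1

    def put(self):
        if self.freed:
            raise RuntimeError(f"double free: {self.name}")
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


class Disk(RefCounted):
    """Disk that pins its queue for as long as the disk itself is alive."""

    def __init__(self, queue):
        super().__init__("disk")
        self.queue = queue
        queue.get()              # models add_disk() taking a queue reference

    def put(self):
        super().put()
        if self.freed:
            self.queue.put()     # models disk_release() dropping it
```

    In this model the owner can drop its base reference to the queue while the
    disk is still open, and the queue survives until the disk is released.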

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • A dm-multipath user reported[1] a problem when trying to boot
    a kernel with commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
    (block: fix flush machinery for stacking drivers with differring
    flush flags) applied. It turns out that an empty flush request
    can be sent into blk_insert_flush. When the BUG_ON was fixed
    to allow for this, I/O on the underlying device would stall. The
    reason is that blk_insert_cloned_request does not kick the queue.
    In the aforementioned commit, I had added a special case to
    kick the queue if data was sent down but the queue flags did
    not require a flush. A better solution is to push the queue
    kick up into blk_insert_cloned_request.

    This patch, along with a follow-on which fixes the BUG_ON, fixes
    the issue reported.

    [1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

    Reported-by: Christophe Saout
    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo

    Stable note: 3.1
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Jeff Moyer
     
  • A user reported a regression due to commit
    4853abaae7e4a2af938115ce9071ef8684fb7af4 (block: fix flush
    machinery for stacking drivers with differring flush flags).
    Part of the problem is that blk_insert_flush required a
    single bio be attached to the request. In reality, having
    no attached bio is also a valid case, as can be observed with
    an empty flush.

    [1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

    Reported-by: Christophe Saout
    Signed-off-by: Jeff Moyer

    Stable note: 3.1
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Jeff Moyer
     
  • bio originally had the functionality to set the completion CPU, but
    it is broken.

    Christoph said that "This code is unused, and from the all the
    discussions lately pretty obviously broken. The only thing keeping
    it serves is creating more confusion and possibly more bugs."

    And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
    with leaving cpu control to the request based drivers, they are the
    only ones that can toggle the setting anyway".

    So this patch removes all handling of completion CPU control
    from a bio.

    Cc: Shaohua Li
    Cc: Christoph Hellwig
    Signed-off-by: Tao Ma
    Signed-off-by: Jens Axboe

    Tao Ma
     
  • byptes -> bytes.

    Signed-off-by: Jie Liu
    Signed-off-by: Jens Axboe

    Jie Liu
     

19 Oct, 2011

11 commits

  • request_queue is refcounted but actually depends on lifetime
    management from the queue owner - on blk_cleanup_queue(), the block layer
    expects that there's no request passing through request_queue and no
    new one will.

    This is fundamentally broken. The queue owner (e.g. SCSI layer)
    doesn't have a way to know whether there are other active users before
    calling blk_cleanup_queue() and other users (e.g. bsg) don't have any
    guarantee that the queue is and would stay valid while it's holding a
    reference.

    With delay added in blk_queue_bio() before queue_lock is grabbed, the
    following oops can be easily triggered when a device is removed with
    in-flight IOs.

    sd 0:0:1:0: [sdb] Stopping disk
    ata1.01: disabled
    general protection fault: 0000 [#1] PREEMPT SMP
    CPU 2
    Modules linked in:

    Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs
    RIP: 0010:[] [] elv_rqhash_find+0x61/0x100
    ...
    Process test_rawio (pid: 648, threadinfo ffff880019efa000, task ffff880019ef8a80)
    ...
    Call Trace:
    [] elv_merge+0x84/0xe0
    [] blk_queue_bio+0xf4/0x400
    [] generic_make_request+0xca/0x100
    [] submit_bio+0x74/0x100
    [] dio_bio_submit+0xbc/0xc0
    [] __blockdev_direct_IO+0x92e/0xb40
    [] blkdev_direct_IO+0x57/0x60
    [] generic_file_aio_read+0x6d5/0x760
    [] do_sync_read+0xda/0x120
    [] vfs_read+0xc5/0x180
    [] sys_pread64+0x9a/0xb0
    [] system_call_fastpath+0x16/0x1b

    This happens because blk_cleanup_queue() destroys the queue and
    elevator whether IOs are in progress or not and DEAD tests are
    sprinkled in the request processing path without proper
    synchronization.

    A similar problem exists for blk-throtl. On queue cleanup, blk-throtl
    is shut down whether it has requests in it or not. Depending on
    timing, it either oopses or throttled bios are lost, putting tasks
    which are waiting for bio completion into eternal D state.

    The way it should work is having the usual clear distinction between
    shutdown and release. Shutdown drains all currently pending requests,
    marks the queue dead, and performs partial teardown of the now
    unnecessary part of the queue. Even after shutdown is complete,
    reference holders are still allowed to issue requests to the queue
    although they will be immediately failed. The rest of teardown
    happens on release.

    This patch makes the following changes to make blk_cleanup_queue()
    behave as proper shutdown.

    * QUEUE_FLAG_DEAD is now set while holding both q->exit_mutex and
    queue_lock.

    * Unsynchronized DEAD check in generic_make_request_checks() removed.
    This couldn't make any meaningful difference as the queue could die
    after the check.

    * blk_drain_queue() updated such that it can drain all requests and is
    now called during cleanup.

    * blk_throtl updated such that it checks DEAD on grabbing queue_lock,
    drains all throttled bios during cleanup and free td when queue is
    released.
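
    The shutdown/release split described above can be sketched as a
    single-threaded model. LifecycleQueue and its methods are hypothetical
    names chosen for this illustration; real code performs the DEAD check
    under queue_lock, which this sketch does not model.

```python
class LifecycleQueue:
    """Toy queue with separate shutdown (cleanup) and release (last put)."""

    def __init__(self):
        self.refs = 1          # owner's reference
        self.dead = False      # models QUEUE_FLAG_DEAD
        self.pending = []      # requests accepted but not yet completed
        self.completed = []
        self.released = False

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.released = True   # the rest of teardown happens here

    def submit(self, bio):
        """Accept a bio, or fail it immediately if the queue is dead."""
        if self.dead:
            return False
        self.pending.append(bio)
        return True

    def cleanup(self):
        """Shutdown: drain everything pending, then mark the queue dead."""
        while self.pending:
            self.completed.append(self.pending.pop(0))
        self.dead = True
```

    A reference holder can still call submit() after cleanup() and gets an
    immediate failure instead of touching torn-down state; the queue is only
    released once the last reference is put.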

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • attempt_plug_merge() accesses elevator without holding queue_lock and
    may call into ->elevator_bio_merge_fn(). The elevator is guaranteed to
    be valid because it's accessed iff the plugged list has requests and
    elevator is never exited with live requests, so as long as the
    elevator method can deal with unlocked access, this is safe.

    Explain the sync rules around attempt_plug_merge() and drop the
    unnecessary @tsk parameter.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently get_request[_wait]() allocates request whether queue is dead
    or not. This patch makes get_request[_wait]() return NULL if @q is
    dead. blk_queue_bio() is updated to fail the submitted bio if request
    allocation fails. While at it, add docbook comments for
    get_request[_wait]().

    Note that the current code has rather unclear (there are spurious DEAD
    tests scattered around) assumption that the owner of a queue
    guarantees that no request travels block layer if the queue is dead
    and this patch in itself doesn't change much; however, this will allow
    fixing the broken assumption in the next patch.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk_throtl_bio() and throtl_get_tg() have rather unusual interfaces.

    * throtl_get_tg() returns pointer to a valid tg or ERR_PTR(-ENODEV),
    and drops queue_lock in the latter case. Different locking context
    depending on return value is error-prone and DEAD state is scheduled
    to be protected by queue_lock anyway. Move DEAD check inside
    queue_lock and return valid tg or NULL.

    * blk_throtl_bio() indicates return status both with its return value
    and in/out param **@bio. The former is used to indicate whether
    queue is found to be dead during throtl processing. The latter
    whether the bio is throttled.

    There's no point in returning DEAD check result from
    blk_throtl_bio(). The queue can die after blk_throtl_bio() is
    finished but before make_request_fn() grabs queue lock.

    Make it take *@bio instead and return boolean result indicating
    whether the request is throttled or not.

    This patch doesn't cause any visible functional difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Reorganize queue draining related code in preparation of queue exit
    changes.

    * Factor out actual draining from elv_quiesce_start() to
    blk_drain_queue().

    * Make elv_quiesce_start/end() responsible for their own locking.

    * Replace open-coded ELVSWITCH clearing in elevator_switch() with
    elv_quiesce_end().

    This patch doesn't cause any visible functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk_get/put_queue() in scsi_cmd_ioctl() and throtl_get_tg() are
    completely bogus. The caller must have a reference to the queue on
    entry and taking an extra reference doesn't change anything.

    For scsi_cmd_ioctl(), the only effect is that it ends up checking
    QUEUE_FLAG_DEAD on entry; however, this is bogus as queue can die
    right after blk_get_queue(). Dead queue should be and is handled in
    request issue path (it's somewhat broken now but that's a separate
    problem and doesn't affect this one much).

    throtl_get_tg() incorrectly assumes that q is rcu freed. Also, it
    doesn't check return value of blk_get_queue(). If the queue is
    already dead, it ends up doing an extra put.

    Drop them.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk_alloc_request() and freed_request() take different combinations of
    REQ_* @flags, @priv and @is_sync when @flags is superset of the latter
    two. Make them take @flags only. This cleans up the code a bit and
    will ease updating allocation related REQ_* flags.

    This patch doesn't introduce any functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk_throtl interface is block internal and there's no reason to have
    them in linux/blkdev.h. Move them to block/blk.h.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blkio_policy_parse_and_set() calls blkio_check_dev_num() to check
    whether the given dev_t is valid. blkio_check_dev_num() uses
    get_gendisk() for verification but never puts the returned genhd
    leaking the reference.

    This patch collapses blkio_check_dev_num() into its caller and updates
    it such that the genhd is put before returning.
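
    The leak and its fix can be modelled in a few lines. Gendisk,
    get_gendisk() and put_disk() below are toy stand-ins with assumed
    behaviour, and check_dev_num_leaky() and parse_and_set() are hypothetical
    simplifications of the functions named in the message.

```python
import errno


class Gendisk:
    """Toy disk object that only tracks its reference count."""

    def __init__(self):
        self.refs = 1


def get_gendisk(disks, dev):
    """Look up a disk by device number and take a reference if found."""
    disk = disks.get(dev)
    if disk is not None:
        disk.refs += 1
    return disk


def put_disk(disk):
    """Drop a reference taken by get_gendisk()."""
    disk.refs -= 1


def check_dev_num_leaky(disks, dev):
    """Old pattern: validates the device but never puts the reference."""
    return 0 if get_gendisk(disks, dev) is not None else -errno.ENODEV


def parse_and_set(disks, dev, rules, value):
    """Fixed pattern: the lookup is inlined and the disk is put on every path."""
    disk = get_gendisk(disks, dev)
    if disk is None:
        return -errno.ENODEV
    try:
        rules[dev] = value
        return 0
    finally:
        put_disk(disk)
```

    After parse_and_set() returns, the disk's reference count is back where it
    started, whereas each call to the leaky checker leaves it one higher.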

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • The following command sequence triggers an oops.

    # mount /dev/sdb1 /mnt
    # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
    # umount /mnt

    general protection fault: 0000 [#1] PREEMPT SMP
    CPU 2
    Modules linked in:

    Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
    RIP: 0010:[] [] __lock_acquire+0x389/0x1d60
    ...
    Call Trace:
    [] lock_acquire+0x95/0x140
    [] _raw_spin_lock+0x3b/0x50
    [] bdi_lock_two+0x5c/0x70
    [] bdev_inode_switch_bdi+0x4c/0xf0
    [] __blkdev_put+0x11b/0x1d0
    [] __blkdev_put+0x160/0x1d0
    [] blkdev_put+0x5f/0x190
    [] kill_block_super+0x4d/0x80
    [] deactivate_locked_super+0x45/0x70
    [] deactivate_super+0x4a/0x70
    [] mntput_no_expire+0xed/0x130
    [] sys_umount+0x7e/0x3a0
    [] system_call_fastpath+0x16/0x1b

    This is because bdev holds on to disk but disk doesn't pin the
    associated queue. If a SCSI device is removed while the device is
    still open, the sdev puts the base reference to the queue on release.
    When the bdev is finally released, the associated queue is already
    gone along with the bdi and bdev_inode_switch_bdi() ends up
    dereferencing already freed bdi.

    Even if it were not for this bug, disk not holding onto the associated
    queue is very unusual and error-prone.

    Fix it by making add_disk() take an extra reference to its queue and
    put it on disk_release() and ensuring that disk and its fops owner are
    put in that order after all accesses to the disk and queue are
    complete.

    Signed-off-by: Tejun Heo
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Conflicts:
    block/blk-core.c
    include/linux/blkdev.h

    Signed-off-by: Jens Axboe

    Jens Axboe
     

28 Sep, 2011

1 commit

  • A kernel crash is observed when a mounted ext3/ext4 filesystem is
    physically removed. The problem is that blk_cleanup_queue() frees up
    some resources, e.g. by calling elevator_exit(), which are not checked for
    in normal operation. So we should rather move these calls to the
    destructor function blk_release_queue() as at that point all remaining
    references are gone. However, in doing so we have to ensure that any
    externally supplied queue_lock is disconnected as the driver might free
    up the lock after the call of blk_cleanup_queue().

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     

21 Sep, 2011

3 commits

  • The bug is we're not able to remove the device from blkio cgroup's
    per-device control files if it gets unplugged.

    To reproduce the bug:

    # mount -t cgroup -o blkio xxx /cgroup
    # cd /cgroup
    # echo "8:0 1000" > blkio.throttle.read_bps_device
    # unplug the device
    # cat blkio.throttle.read_bps_device
    8:0 1000
    # echo "8:0 0" > blkio.throttle.read_bps_device
    -bash: echo: write error: No such device

    After patching, the device removal will succeed.

    Thanks for the comments of Paul, Zefan, and Vivek.

    Signed-off-by: Wanlong Gao
    Cc: Li Zefan
    Cc: Paul Menage
    Acked-by: Vivek Goyal
    Cc: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Wanlong Gao
     
  • The kerneldoc for blk_release_queue() is referring to blk_cleanup_queue().

    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Andrew Morton
     
  • Thus spake Andrew Morton:

    "And I have the usual maintainability whine. If someone comes up to
    vmscan.c and sees it calling blk_start_plug(), how are they supposed to
    work out why that call is there? They go look at the blk_start_plug()
    definition and it is undocumented. I think we can do better than this?"

    Adapted from the LWN article - http://lwn.net/Articles/438256/ by Jens
    Axboe and from an earlier attempt by Shaohua Li to document blk-plug.

    [akpm@linux-foundation.org: grammatical and spelling tweaks]
    Signed-off-by: Suresh Jayaraman
    Cc: Shaohua Li
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Suresh Jayaraman