30 Jul, 2012

4 commits

  • This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature,
    which exposes the cache mode in the configuration space and lets the
    driver modify it. The cache mode is exposed via sysfs.

    Even if the host does not support the new feature, the cache mode is
    visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable.

    Signed-off-by: Paolo Bonzini
    Signed-off-by: Rusty Russell

    Paolo Bonzini
     
  • Block layer will allocate a spinlock for the queue if the driver does
    not provide one in blk_init_queue().

    The reason to use the internal spinlock is that blk_cleanup_queue() will
    switch to use the internal spinlock in the cleanup code path.

    if (q->queue_lock != &q->__queue_lock)
            q->queue_lock = &q->__queue_lock;

    However, processes which are in D state might have taken the driver
    provided spinlock. When those processes wake up, they would release the
    block layer provided spinlock instead.

    =====================================
    [ BUG: bad unlock balance detected! ]
    3.4.0-rc7+ #238 Not tainted
    -------------------------------------
    fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
    [] blk_queue_bio+0x2a2/0x380
    but there are no more locks to release!

    other info that might help us debug this:
    1 lock held by fio/3587:
    #0: (&(&vblk->lock)->rlock){......}, at:
    [] get_request_wait+0x19a/0x250

    Other drivers use block layer provided spinlock as well, e.g. SCSI.

    Switching to the block layer provided spinlock saves a bit of memory and
    does not increase lock contention. Performance tests show no real
    difference before and after this patch.

    Changes in v2: Improve commit log as Michael suggested.

    Cc: virtualization@lists.linux-foundation.org
    Cc: kvm@vger.kernel.org
    Cc: stable@kernel.org
    Signed-off-by: Asias He
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Asias He
     
  • blk_cleanup_queue() will call blk_drain_queue() to drain all the
    requests before queue DEAD marking. If we reset the device before
    blk_cleanup_queue() the drain would fail.

    1) if the queue is stopped in do_virtblk_request() because device is
    full, the q->request_fn() will not be called.

    blk_drain_queue() {
        while (true) {
            ...
            if (!list_empty(&q->queue_head))
                __blk_run_queue(q) {
                    if (queue is not stopped)
                        q->request_fn()
                }
            ...
        }
    }

    Not resetting the device before blk_cleanup_queue() gives the interrupt
    handler blk_done() the chance to start the queue.

    2) In commit b79d866c8b7014a51f611a64c40546109beaf24a, we abort requests
    dispatched to driver before blk_cleanup_queue(). There is a race if
    requests are dispatched to driver after the abort and before the queue
    DEAD mark. To fix this, instead of aborting the requests explicitly, we
    can just reset the device after blk_cleanup_queue() so that the
    device can complete all the requests before queue DEAD marking in the
    drain process.

    Cc: Rusty Russell
    Cc: virtualization@lists.linux-foundation.org
    Cc: kvm@vger.kernel.org
    Cc: stable@kernel.org
    Signed-off-by: Asias He
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Asias He
     
  • del_gendisk() might not return due to failing to remove the
    /sys/block/vda/serial sysfs entry when another thread (udev) is
    trying to read it.

    virtblk_remove()
      vdev->config->reset() : guest will not kick us through interrupt
      del_gendisk()
        device_del()
          kobject_del(): got stuck, sysfs entry ref count non zero

    sysfs_open_file(): user space process read /sys/block/vda/serial
      sysfs_get_active() : got sysfs entry ref count
      dev_attr_show()
        virtblk_serial_show()
          blk_execute_rq() : got stuck, interrupt is disabled
                             request cannot be finished

    This patch fixes it by calling del_gendisk() before we disable guest's
    interrupt so that the request sent in virtblk_serial_show() will be
    finished and del_gendisk() will succeed.

    This fixes another race in hot-unplug process.

    It is safe to call del_gendisk(vblk->disk) before
    flush_work(&vblk->config_work) which might access vblk->disk, because
    vblk->disk is not freed until put_disk(vblk->disk).
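
    The ordering constraints from this commit and the one above it can be
    sketched as a small userspace model. The *_stub functions below are
    hypothetical stand-ins that only record the order in which they run;
    they are not the kernel functions. The position of del_gendisk()
    relative to blk_cleanup_queue(), and of flush_work() relative to the
    reset, is an assumption that the commit text does not spell out.

```c
#include <assert.h>
#include <string.h>

/* Userspace model only: each stub appends its name to a log. */
static char order_log[128];

static void record(const char *step)
{
	/* Room for the separator, the step name and the terminating nul. */
	assert(strlen(order_log) + strlen(step) + 2 <= sizeof(order_log));
	if (order_log[0] != '\0')
		strcat(order_log, ",");
	strcat(order_log, step);
}

static void del_gendisk_stub(void)       { record("del_gendisk"); }
static void blk_cleanup_queue_stub(void) { record("blk_cleanup_queue"); }
static void device_reset_stub(void)      { record("reset"); }
static void flush_work_stub(void)        { record("flush_work"); }
static void put_disk_stub(void)          { record("put_disk"); }

/* Hypothetical teardown order implied by the two commit messages. */
static const char *remove_sketch(void)
{
	order_log[0] = '\0';
	del_gendisk_stub();       /* interrupts still enabled, sysfs readers can finish */
	blk_cleanup_queue_stub(); /* drain while the device can still complete requests */
	device_reset_stub();      /* only now stop the device */
	flush_work_stub();        /* the disk has not been freed yet */
	put_disk_stub();          /* drop the disk reference last */
	return order_log;
}
```

    Calling remove_sketch() returns the recorded order as a comma-separated
    string, which makes the intended sequence easy to check in a test.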

    Cc: virtualization@lists.linux-foundation.org
    Cc: kvm@vger.kernel.org
    Cc: stable@kernel.org
    Signed-off-by: Asias He
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Asias He
     

22 May, 2012

2 commits

  • Benchmarks show a small performance improvement on a Fusion-io device.

    Before:
    seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec
    seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec
    rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec
    rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec

    After:
    seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec
    seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec
    rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec
    rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec

    Signed-off-by: Asias He
    Signed-off-by: Rusty Russell

    Asias He
     
  • If we reset the virtio-blk device before the requests already dispatched
    to the virtio-blk driver from the block layer are finished, we will get
    stuck in blk_cleanup_queue() and the remove will fail.

    blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued
    before DEAD marking. However, it will never succeed if the device is
    already stopped. We'll have q->in_flight[] > 0, so the drain will not
    finish.

    How to reproduce the race:
    1. hot-plug a virtio-blk device
    2. keep reading/writing the device in guest
    3. hot-unplug while the device is busy serving I/O

    Test:
    ~1000 rounds of hot-plug/hot-unplug test passed with this patch.

    Changes in v3:
    - Drop blk_abort_queue and blk_abort_request
    - Use __blk_end_request_all to complete request dispatched to driver

    Changes in v2:
    - Drop req_in_flight
    - Use virtqueue_detach_unused_buf to get request dispatched to driver

    Signed-off-by: Asias He
    Signed-off-by: Rusty Russell

    Asias He
     

17 Apr, 2012

1 commit

  • Pull virtio fixes from Michael S. Tsirkin:
    "Here are some virtio fixes for 3.4: a test build fix, a patch by Ren
    fixing naming for systems with a massive number of virtio blk devices,
    and balloon fixes for powerpc by David Gibson.

    There was some discussion about Ren's patch for virtio disc naming:
    some people wanted to move the legacy name mangling function to the
    block core. But there's no consensus on that yet, and we can always
    deduplicate later. Added comments in the hope that this will stop
    people from copying this legacy naming scheme into future drivers."

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio_balloon: fix handling of PAGE_SIZE != 4k
    virtio_balloon: Fix endian bug
    virtio_blk: helper function to format disk names
    tools/virtio: fix up vhost/test module build

    Linus Torvalds
     

12 Apr, 2012

1 commit

  • The current virtio block naming algorithm supports only 18278
    (26^3 + 26^2 + 26) disks. If there are more virtio block devices,
    there will be disks with the same name.

    Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, add
    a function "virtblk_name_format()" for virtio block to support
    naming a large number of disks.
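
    The letter-suffix scheme can be illustrated with a userspace sketch. The
    function name name_format_sketch() and its signature are hypothetical;
    it shows the usual bijective base-26 encoding (a..z, aa..zz, aaa..)
    and is not a copy of the driver code. With three letters the scheme
    covers 26 + 26^2 + 26^3 = 18278 names, so index 18278 is the first
    one that needs a fourth letter.

```c
#include <assert.h>
#include <string.h>

/*
 * Write prefix followed by a letter suffix for index into buf.
 * Returns 0 on success, -1 if index is negative or buf is too small.
 */
static int name_format_sketch(const char *prefix, int index,
			      char *buf, size_t buflen)
{
	const int base = 'z' - 'a' + 1;
	size_t plen;
	char *begin, *end, *p;

	assert(prefix != NULL && buf != NULL);
	plen = strlen(prefix);
	if (index < 0 || buflen <= plen)
		return -1;

	begin = buf + plen;
	end = buf + buflen;
	p = end - 1;
	*p = '\0';

	/* Build the suffix backwards from the end of the buffer. */
	do {
		if (p == begin)
			return -1;	/* no room for another letter */
		*--p = (char)('a' + index % base);
		index = index / base - 1;
	} while (index >= 0);

	/* Slide the suffix (with its nul) next to the prefix. */
	memmove(begin, p, (size_t)(end - p));
	memcpy(buf, prefix, plen);
	return 0;
}
```

    With the prefix "vd", index 0 maps to "vda", index 25 to "vdz" and
    index 26 to "vdaa".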

    Notes:
    - Our naming scheme is ugly. We are stuck with it
    for virtio but don't use it for any new driver:
    new drivers should name their devices PREFIX%d
    where the sequence number can be allocated by ida
    - sd_format_disk_name has exactly the same logic.
    Moving it to a central place was deferred over worries
    that this will make people keep using the legacy naming
    in new drivers.
    We kept the code identical in case someone wants to deduplicate later.

    Signed-off-by: Ren Mingxin
    Acked-by: Asias He
    Signed-off-by: Michael S. Tsirkin

    Ren Mingxin
     

29 Mar, 2012

1 commit

  • If a virtio disk is open in the guest and a disk resize operation is done
    (virsh blockresize), the new size is not visible to tools like "fdisk -l".
    This seems to happen because we update only part->nr_sects and not
    the bdev->bd_inode size.

    Call revalidate_disk() which should take care of it. I tested growing disk
    size of already open disk and it works for me.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

15 Jan, 2012

1 commit

  • Introduce a wrapper around scsi_cmd_ioctl that takes a block device.

    The function will then be enhanced to detect partition block devices
    and, in that case, subject the ioctls to whitelisting.

    Cc: linux-scsi@vger.kernel.org
    Cc: Jens Axboe
    Cc: James Bottomley
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Linus Torvalds

    Paolo Bonzini
     

12 Jan, 2012

4 commits


07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

02 Nov, 2011

1 commit

  • Based on a patch by Mark Wu

    Current index allocation in virtio-blk is based on a monotonically
    increasing variable "index". This means we'll run out of numbers
    after a while. It also could cause confusion about the disk
    name in the case of hot-plugging disks.
    Change virtio-blk to use ida to allocate index, instead.
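
    The difference between a monotonically increasing counter and an
    allocator that reuses freed numbers can be sketched in userspace. The
    bitmap allocator below is a hypothetical stand-in for the idea behind
    ida (hand out the lowest free index); it is not the kernel ida API, and
    the 64-entry limit is an arbitrary assumption for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_INDEX 64

/* One bit per index; a set bit means the index is in use. */
static uint64_t sketch_used_bitmap;

/* Return the lowest free index, or -1 if all are taken. */
static int index_alloc_sketch(void)
{
	for (int i = 0; i < SKETCH_MAX_INDEX; i++) {
		if (!(sketch_used_bitmap & (UINT64_C(1) << i))) {
			sketch_used_bitmap |= UINT64_C(1) << i;
			return i;
		}
	}
	return -1;
}

/* Give an index back so a later hot-plug can reuse it. */
static void index_free_sketch(int index)
{
	assert(index >= 0 && index < SKETCH_MAX_INDEX);
	sketch_used_bitmap &= ~(UINT64_C(1) << index);
}
```

    After allocating indexes 0, 1 and 2 and freeing 1, the next allocation
    returns 1 again, so a re-plugged disk can get its old number back
    instead of an ever-growing one.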

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     

01 Nov, 2011

1 commit


30 May, 2011

2 commits

  • It is easier to figure out the context by reading SCSI_SENSE_BUFFERSIZE
    instead of plain '96'.

    Signed-off-by: Liu Yuan
    Signed-off-by: Rusty Russell

    Liu Yuan
     
  • Wire up the virtio_driver config_changed method to get notified about
    config changes raised by the host. For now we just re-read the device
    size to support online resizing of devices, but once we add more
    attributes that might be changeable they could be added as well.

    Note that the config_changed method is called from irq context, so
    we'll have to use the workqueue infrastructure to provide us a proper
    user context for our changes.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Rusty Russell

    Christoph Hellwig
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     

21 Oct, 2010

1 commit

  • Remove the BKL usage added in "block: push down BKL into .locked_ioctl".
    Virtio-blk doesn't use the BKL for anything, and doesn't implement any
    ioctl command by itself, but only uses the generic scsi_cmd_ioctl
    which is fine without the BKL.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Rusty Russell

    Christoph Hellwig
     

19 Oct, 2010

1 commit


10 Oct, 2010

1 commit


10 Sep, 2010

3 commits

  • Remove now unused REQ_HARDBARRIER support. virtio_blk already
    supports REQ_FLUSH and the usefulness of REQ_FUA for virtio_blk is
    questionable at this point, so there's nothing else to do to support
    new REQ_FLUSH/FUA interface.

    Signed-off-by: Tejun Heo
    Cc: Michael S. Tsirkin
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
    requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with
    -EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
    blk_queue_flush().

    blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a
    device has write cache and can flush it, it should set REQ_FLUSH. If
    the device can handle FUA writes, it should also set REQ_FUA.

    All blk_queue_ordered() users are converted.

    * ORDERED_DRAIN is mapped to 0 which is the default value.
    * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
    * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
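
    The mapping listed above can be written out as a small table-like
    function. The enum and the SKETCH_REQ_* bit values below are
    illustrative placeholders defined only for this sketch; they are not
    the kernel's definitions.

```c
#include <assert.h>

/* Placeholder names mirroring the three modes in the list above. */
enum sketch_ordered_mode {
	SKETCH_ORDERED_DRAIN,
	SKETCH_ORDERED_DRAIN_FLUSH,
	SKETCH_ORDERED_DRAIN_FLUSH_FUA
};

/* Placeholder flag bits; the real values are not given in the text. */
#define SKETCH_REQ_FLUSH (1u << 0)
#define SKETCH_REQ_FUA   (1u << 1)

/* Translate an old ordered mode into the new flush flag combination. */
static unsigned int flush_flags_sketch(enum sketch_ordered_mode mode)
{
	switch (mode) {
	case SKETCH_ORDERED_DRAIN:
		return 0;	/* default: no cache flush needed */
	case SKETCH_ORDERED_DRAIN_FLUSH:
		return SKETCH_REQ_FLUSH;
	case SKETCH_ORDERED_DRAIN_FLUSH_FUA:
		return SKETCH_REQ_FLUSH | SKETCH_REQ_FUA;
	}
	assert(!"unknown ordered mode");
	return 0;
}
```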

    Signed-off-by: Tejun Heo
    Acked-by: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Nick Piggin
    Cc: Michael S. Tsirkin
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Cc: FUJITA Tomonori
    Cc: Geert Uytterhoeven
    Cc: David S. Miller
    Cc: Alasdair G Kergon
    Cc: Pierre Ossman
    Cc: Stefan Weinhuber
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Nobody is making meaningful use of ORDERED_BY_TAG now and queue
    draining for barrier requests will be removed soon which will render
    the advantage of tag ordering moot. Kill ORDERED_BY_TAG. The
    following users are affected.

    * brd: converted to ORDERED_DRAIN.
    * virtio_blk: ORDERED_TAG path was already marked deprecated. Removed.
    * xen-blkfront: ORDERED_TAG case dropped.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Cc: Nick Piggin
    Cc: Michael S. Tsirkin
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Signed-off-by: Jens Axboe

    Tejun Heo
     

11 Aug, 2010

1 commit

  • * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
    block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
    xen-blkfront: fix missing out label
    blkdev: fix blkdev_issue_zeroout return value
    block: update request stacking methods to support discards
    block: fix missing export of blk_types.h
    writeback: fix bad _bh spinlock nesting
    drbd: revert "delay probes", feature is being re-implemented differently
    drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
    drbd: Disable delay probes for the upcomming release
    writeback: cleanup bdi_register
    writeback: add new tracepoints
    writeback: remove unnecessary init_timer call
    writeback: optimize periodic bdi thread wakeups
    writeback: prevent unnecessary bdi threads wakeups
    writeback: move bdi threads exiting logic to the forker thread
    writeback: restructure bdi forker loop a little
    writeback: move last_active to bdi
    writeback: do not remove bdi from bdi_list
    writeback: simplify bdi code a little
    writeback: do not lose wake-ups in bdi threads
    ...

    Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
    drivers/scsi/scsi_error.c as per Jens.

    Linus Torvalds
     

08 Aug, 2010

5 commits

  • As a preparation for the removal of the big kernel
    lock in the block layer, this removes the BKL
    from the common ioctl handling code, moving it
    into every single driver still using it.

    Signed-off-by: Arnd Bergmann
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • This removes q->prepare_flush_fn completely (changes the
    blk_queue_ordered API).

    Signed-off-by: FUJITA Tomonori
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • use REQ_FLUSH flag instead.

    Signed-off-by: FUJITA Tomonori
    Cc: Rusty Russell
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • On compilation, gcc correctly detects that we do not handle
    all types:

    In function ‘blk_done’:
    warning: enumeration value ‘REQ_TYPE_FS’ not handled in switch
    warning: enumeration value ‘REQ_TYPE_SENSE’ not handled in switch
    warning: enumeration value ‘REQ_TYPE_PM_SUSPEND’ not handled in switch
    warning: enumeration value ‘REQ_TYPE_PM_RESUME’ not handled in switch
    warning: enumeration value ‘REQ_TYPE_PM_SHUTDOWN’ not handled in switch
    warning: enumeration value ‘REQ_TYPE_LINUX_BLOCK’ not handled in switch
    warning: enumeration value ‘REQ_TYPE_ATA_TASKFILE’ not handled in switch
    warning: enumeration value ‘REQ_TYPE_ATA_PC’ not handled in switch

    which is a bit pointless since this is at the end of the request
    processing. Add a default case that just breaks out.
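
    The pattern is the generic one for silencing this class of warning. A
    minimal sketch, using a hypothetical enum and function rather than the
    driver's own types: only some enumerators need work, and an empty
    default case tells the compiler the rest are ignored on purpose.

```c
#include <assert.h>

/* Hypothetical request types, defined only for this sketch. */
enum sketch_req_type {
	SKETCH_REQ_FS,
	SKETCH_REQ_BLOCK_PC,
	SKETCH_REQ_SPECIAL,
	SKETCH_REQ_SENSE
};

/* Return 1 for the types that need extra completion work, else 0. */
static int sketch_needs_extra_work(enum sketch_req_type type)
{
	int extra = 0;

	switch (type) {
	case SKETCH_REQ_BLOCK_PC:
	case SKETCH_REQ_SPECIAL:
		extra = 1;
		break;
	default:
		/* Every other type is deliberately left alone. */
		break;
	}
	return extra;
}

/* Small self-check showing the handled and the defaulted paths. */
static void sketch_selfcheck(void)
{
	assert(sketch_needs_extra_work(SKETCH_REQ_BLOCK_PC) == 1);
	assert(sketch_needs_extra_work(SKETCH_REQ_FS) == 0);
}
```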

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
    struct requests. This allows much easier grepping for different request
    types instead of unwinding through macros.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

05 Aug, 2010

3 commits

  • With the availability of a sysfs device attribute for examining disk serial
    numbers the ioctl is no longer needed. The user-space changes for this aren't
    upstream yet so we don't have any users to worry about.

    Signed-off-by: Ryan Harper
    Signed-off-by: Rusty Russell

    Ryan Harper
     
  • Create a new attribute for virtio-blk devices that will fetch the serial number
    of the block device. This attribute can be used by udev to create disk/by-id
    symlinks for devices that don't have a UUID (filesystem) associated with them.

    ATA_IDENTIFY strings are special in that they can be up to 20 chars long
    and aren't required to be nul-terminated. The buffer is also zero-padded,
    meaning that if the serial is 19 chars or less we get a nul-terminated
    string. When copying this value into a string buffer, we must be careful to
    copy up to the nul (if it is present), or only 20 chars if the serial is
    longer, and not attempt to nul-terminate it; that isn't needed.
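
    The copy rule can be sketched in userspace. serial_copy_sketch() and
    SKETCH_ID_BYTES are hypothetical names for this illustration. Unlike the
    description above, the sketch zero-fills the output buffer itself, so
    the result is always a valid C string as long as the buffer is larger
    than 20 bytes.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_ID_BYTES 20

/*
 * Copy at most SKETCH_ID_BYTES bytes of id into out, stopping early at
 * a nul. Returns the number of bytes copied.
 */
static size_t serial_copy_sketch(const char *id, char *out, size_t outlen)
{
	size_t n = 0;

	/* The output must have room for 20 chars plus a terminating nul. */
	assert(id != NULL && out != NULL && outlen > SKETCH_ID_BYTES);
	memset(out, 0, outlen);

	while (n < SKETCH_ID_BYTES && id[n] != '\0') {
		out[n] = id[n];
		n++;
	}
	return n;
}
```

    A full 20-byte serial without a nul is copied in its entirety, while a
    shorter, zero-padded serial stops at the first nul.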

    Changes since v1:
    - Added BUILD_BUG_ON() for PAGE_SIZE check
    - Removed min() since BUILD_BUG_ON() handles the check
    - Replaced serial_sysfs() by copying id directly to buffer

    Signed-off-by: Ryan Harper
    Signed-off-by: john cooper
    Signed-off-by: Rusty Russell

    Ryan Harper
     
  • If we want to support barriers with the cache=writethrough mode in qemu
    we need to tell the block layer that we only need queue drains to
    implement a barrier. Follow the model set by SCSI and IDE and assume
    that there is no volatile write cache if the host doesn't advertize it.
    While this might imply working barriers on old qemu versions or other
    hypervisors that actually have a volatile write cache this is only a
    cosmetic issue - these hypervisors don't guarantee any data integrity
    with or without this patch, but with the patch we at least provide
    data ordering.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Rusty Russell

    Christoph Hellwig
     

03 Jun, 2010

1 commit


19 May, 2010

4 commits