10 Feb, 2017

1 commit

  • Today a Xenstore watch event is delivered via a callback function
    declared as:

    void (*callback)(struct xenbus_watch *,
                     const char **vec, unsigned int len);

    As all watch events only ever come with two parameters (path and token)
    changing the prototype to:

    void (*callback)(struct xenbus_watch *,
                     const char *path, const char *token);

    is the natural thing to do.

    Apply this change and adapt all users.

    Cc: konrad.wilk@oracle.com
    Cc: roger.pau@citrix.com
    Cc: wei.liu2@citrix.com
    Cc: paul.durrant@citrix.com
    Cc: netdev@vger.kernel.org

    Signed-off-by: Juergen Gross
    Reviewed-by: Paul Durrant
    Reviewed-by: Wei Liu
    Reviewed-by: Roger Pau Monné
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Juergen Gross
     

14 Dec, 2016

1 commit

  • Pull xen updates from Juergen Gross:
    "Xen features and fixes for 4.10

    These are some fixes, a move of some arm related headers to share them
    between arm and arm64 and a series introducing a helper to make code
    more readable.

    The most notable change is David stepping down as maintainer of the
    Xen hypervisor interface. This results in me sending you the pull
    requests for Xen related code from now on"

    * tag 'for-linus-4.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (29 commits)
    xen/balloon: Only mark a page as managed when it is released
    xenbus: fix deadlock on writes to /proc/xen/xenbus
    xen/scsifront: don't request a slot on the ring until request is ready
    xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
    x86: Make E820_X_MAX unconditionally larger than E820MAX
    xen/pci: Bubble up error and fix description.
    xen: xenbus: set error code on failure
    xen: set error code on failures
    arm/xen: Use alloc_percpu rather than __alloc_percpu
    arm/arm64: xen: Move shared architecture headers to include/xen/arm
    xen/events: use xen_vcpu_id mapping for EVTCHNOP_status
    xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing
    xen-scsifront: Add a missing call to kfree
    MAINTAINERS: update XEN HYPERVISOR INTERFACE
    xenfs: Use proc_create_mount_point() to create /proc/xen
    xen-platform: use builtin_pci_driver
    xen-netback: fix error handling output
    xen: make use of xenbus_read_unsigned() in xenbus
    xen: make use of xenbus_read_unsigned() in xen-pciback
    xen: make use of xenbus_read_unsigned() in xen-fbfront
    ...

    Linus Torvalds
     

07 Nov, 2016

1 commit

  • Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
    This requires changing the type of one read from int to unsigned,
    but that case was already wrong: negative values are not allowed
    for the modified case.

    Cc: konrad.wilk@oracle.com
    Cc: roger.pau@citrix.com

    Signed-off-by: Juergen Gross
    Acked-by: David Vrabel

    Juergen Gross
     

01 Nov, 2016

1 commit


28 Jul, 2016

1 commit

  • Pull xen updates from David Vrabel:
    "Features and fixes for 4.8-rc0:

    - ACPI support for guests on ARM platforms.
    - Generic steal time support for arm and x86.
    - Support cases where kernel cpu is not Xen VCPU number (e.g., if
    in-guest kexec is used).
    - Use the system workqueue instead of a custom workqueue in various
    places"

    * tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits)
    xen: add static initialization of steal_clock op to xen_time_ops
    xen/pvhvm: run xen_vcpu_setup() for the boot CPU
    xen/evtchn: use xen_vcpu_id mapping
    xen/events: fifo: use xen_vcpu_id mapping
    xen/events: use xen_vcpu_id mapping in events_base
    x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
    x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
    xen: introduce xen_vcpu_id mapping
    x86/acpi: store ACPI ids from MADT for future usage
    x86/xen: update cpuid.h from Xen-4.7
    xen/evtchn: add IOCTL_EVTCHN_RESTRICT
    xen-blkback: really don't leak mode property
    xen-blkback: constify instance of "struct attribute_group"
    xen-blkfront: prefer xenbus_scanf() over xenbus_gather()
    xen-blkback: prefer xenbus_scanf() over xenbus_gather()
    xen: support runqueue steal time on xen
    arm/xen: add support for vm_assist hypercall
    xen: update xen headers
    xen-pciback: drop superfluous variables
    xen-pciback: short-circuit read path used for merging write values
    ...

    Linus Torvalds
     

22 Jul, 2016

3 commits


09 Jun, 2016

1 commit


08 Jun, 2016

2 commits


14 Apr, 2016

1 commit


04 Mar, 2016

2 commits

  • Process names are truncated to 17 characters, while we had sized
    the process name at 20, which meant that although we filled
    it out with various details, the last 3 characters (which held
    the queue number) never surfaced to user space.

    To simplify this and be able to fit the device name, domain id,
    and the queue number we remove the 'blkback' from the name.

    Prior to this patch the device name is "blkback.<domid>.<name>",
    for example: blkback.8.xvda, blkback.11.hda.

    With the multiqueue block backend we add "-%d" for the queue.
    But sadly this is already way past the limit so it gets stripped.

    A possible solution had been identified by Ian:
    http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg03516.html

    "
    If you are pressed for space then the "xvd" is probably a bit redundant
    in a string which starts blkbk.

    The guest may not even call the device xvdN (iirc BSD has another
    prefix) any how, so having blkback say so seems of limited use anyway.

    Since this seems to not include a partition number how does this work in
    the split partition scheme? (i.e. one where the guest is given xvda1 and
    xvda2 rather than xvda with a partition table)

    [It will be 'blkback.8.xvda1', and 'blkback.11.xvda2']

    Perhaps something derived from one of the schemes in
    http://xenbits.xen.org/docs/unstable/misc/vbd-interface.txt might be a
    better fit?

    After a bit of discussion (see
    http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg01588.html)
    we settled on dropping the "blkback" part.

    This will make it possible to have the <domid>.<name>-<queue> form:

    [1.xvda-0]
    [1.xvda-1]

    And we have enough space to make it go up to:

    [32100.xvdfg9-5]

    Acked-by: Roger Pau Monné
    Reported-by: Jan Beulich
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • There's no reason to defer this until the connect phase, and in fact
    there are frontend implementations expecting this to be available
    earlier. Move it into the probe function.

    Acked-by: Roger Pau Monné
    Signed-off-by: Jan Beulich
    Cc: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Jan Beulich
     

22 Jan, 2016

1 commit

  • Pull block driver updates from Jens Axboe:
    "This is the block driver pull request for 4.5, with the exception of
    NVMe, which is in a separate branch and will be posted after this one.

    This pull request contains:

    - A set of bcache stability fixes, which have been acked by Kent.
    These have been used and tested for more than a year by the
    community, so it's about time that they got in.

    - A set of drbd updates from the drbd team (Andreas, Lars, Philipp)
    and Markus Elfring, Oleg Drokin.

    - A set of fixes for xen blkback/front from the usual suspects, (Bob,
    Konrad) as well as community based fixes from Kiri, Julien, and
    Peng.

    - A 2038 time fix for sx8 from Shraddha, with a fix from me.

    - A small mtip32xx cleanup from Zhu Yanjun.

    - A null_blk division fix from Arnd"

    * 'for-4.5/drivers' of git://git.kernel.dk/linux-block: (71 commits)
    null_blk: use sector_div instead of do_div
    mtip32xx: restrict variables visible in current code module
    xen/blkfront: Fix crash if backend doesn't follow the right states.
    xen/blkback: Fix two memory leaks.
    xen/blkback: make st_ statistics per ring
    xen/blkfront: Handle non-indirect grant with 64KB pages
    xen-blkfront: Introduce blkif_ring_get_request
    xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule()
    xen/blkback: Free resources if connect_ring failed.
    xen/blocks: Return -EXX instead of -1
    xen/blkback: make pool of persistent grants and free pages per-queue
    xen/blkback: get the number of hardware queues/rings from blkfront
    xen/blkback: pseudo support for multi hardware queues/rings
    xen/blkback: separate ring information out of struct xen_blkif
    xen/blkfront: correct setting for xen_blkif_max_ring_order
    xen/blkfront: make persistent grants pool per-queue
    xen/blkfront: Remove duplicate setting of ->xbdev.
    xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors.
    xen/blkfront: negotiate number of queues/rings to be used with backend
    xen/blkfront: split per device io_lock
    ...

    Linus Torvalds
     

05 Jan, 2016

9 commits

  • This patch fixes two memory leaks:
    backtrace:
    [] kmemleak_alloc+0x28/0x50
    [] kmem_cache_alloc+0xbb/0x1d0
    [] xen_blkbk_probe+0x58/0x230
    [] xenbus_dev_probe+0x76/0x130
    [] driver_probe_device+0x166/0x2c0
    [] __device_attach_driver+0xac/0xb0
    [] bus_for_each_drv+0x67/0x90
    [] __device_attach+0xc7/0x120
    [] device_initial_probe+0x13/0x20
    [] bus_probe_device+0x9a/0xb0
    [] device_add+0x3b1/0x5c0
    [] device_register+0x1e/0x30
    [] xenbus_probe_node+0x158/0x170
    [] xenbus_dev_changed+0x1af/0x1c0
    [] backend_changed+0x1b/0x20
    [] xenwatch_thread+0xb6/0x160
    unreferenced object 0xffff880007ba8ef8 (size 224):

    backtrace:
    [] kmemleak_alloc+0x28/0x50
    [] __kmalloc+0xd3/0x1e0
    [] frontend_changed+0x2c7/0x580
    [] xenbus_otherend_changed+0xa2/0xb0
    [] frontend_changed+0x10/0x20
    [] xenwatch_thread+0xb6/0x160
    [] kthread+0xd7/0xf0
    [] ret_from_fork+0x3f/0x70
    [] 0xffffffffffffffff
    unreferenced object 0xffff8800048dcd38 (size 224):

    The first leak is caused by never putting the be->blkif reference
    that we got in xen_blkif_alloc(), while the second is caused by
    not freeing blkif->rings in the right place.

    Signed-off-by: Bob Liu
    Reported-and-Tested-by: Konrad Rzeszutek Wilk
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
     
  • Make the st_* statistics per ring; the VBD sysfs code then iterates over
    all the rings.

    Note: xenvbd_sysfs_delif() is called in xen_blkbk_remove() before all rings
    are torn down, so it's safe.

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk
    ---
    v2: Aligned the variables on the same column.

    Bob Liu
     
  • xen_blkif_schedule() kthread calls try_to_freeze() at the beginning of
    every attempt to purge the LRU. This operation can't ever succeed though,
    as the kthread hasn't marked itself as freezable.

    Before (hopefully eventually) kthread freezing gets converted to filesystem
    freezing, we'd rather mark xen_blkif_schedule() freezable (as it can
    generate I/O during suspend).

    Signed-off-by: Jiri Kosina
    Signed-off-by: Konrad Rzeszutek Wilk

    Jiri Kosina
     
  • With the multi-queue support we could fail at setting up
    some of the rings and fail the connection. That meant that
    all resources tied to rings[0..n-1] (where n is the ring
    that failed to be set up) stayed allocated. Eventually the
    frontend will switch states and we will call xen_blkif_disconnect.

    However we do not want to be at the mercy of the frontend
    deciding when to change states. This allows us to do the
    cleanup right away and free the resources.

    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • Let's return sensible values instead of -1.

    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • Make pool of persistent grants and free pages per-queue/ring instead of
    per-device to get better scalability.

    Test was done based on null_blk driver:
    dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
    domu: v4.2-rc8 16vcpus 10GB

    [test]
    rw=read
    direct=1
    ioengine=libaio
    bs=4k
    time_based
    runtime=30
    filename=/dev/xvdb
    numjobs=16
    iodepth=64
    iodepth_batch=64
    iodepth_batch_complete=64
    group_reporting

    Results:
    iops1: After patch "xen/blkfront: make persistent grants per-queue".
    iops2: After this patch.

    Queues:        1     4            8            16
    Iops orig(k):  810   1064         780          700
    Iops1(k):      810   1230(~20%)   1024(~20%)   850(~20%)
    Iops2(k):      810   1410(~35%)   1354(~75%)   1440(~100%)

    With 4 queues after this commit we can get ~75% increase in IOPS, and
    performance won't drop if increasing queue numbers.

    Please find the respective chart in this link:
    https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
     
  • The backend advertises "multi-queue-max-queues" to the frontend and reads
    the negotiated number from "multi-queue-num-queues", written by blkfront.

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
     
  • Preparatory patch for multiple hardware queues (rings). The number of
    rings is unconditionally set to 1; a larger number will be enabled in
    "xen/blkback: get the number of hardware queues/rings from blkfront".

    Signed-off-by: Arianna Avanzini
    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk
    ---
    v2: Align variables in the structures.

    Konrad Rzeszutek Wilk
     
  • Split per ring information to a new structure "xen_blkif_ring", so that one vbd
    device can be associated with one or more rings/hardware queues.

    Introduce 'pers_gnts_lock' to protect the pool of persistent grants since we
    may have multiple backend threads.

    This patch is a preparation for supporting multi hardware queues/rings.

    Signed-off-by: Arianna Avanzini
    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk
    ---
    v2: Align the variables in the structure.

    Bob Liu
     

18 Dec, 2015

2 commits

  • Since indirect descriptors are in memory shared with the frontend, the
    frontend could alter the first_sect and last_sect values after they have
    been validated but before they are recorded in the request. This may
    result in I/O requests that overflow the foreign page, possibly
    overwriting local pages when the I/O request is executed.

    When parsing indirect descriptors, only read first_sect and last_sect
    once.

    This is part of XSA155.

    CC: stable@vger.kernel.org
    Signed-off-by: Roger Pau Monné
    Signed-off-by: David Vrabel
    Signed-off-by: Konrad Rzeszutek Wilk

    Roger Pau Monné
     
  • A compiler may load a switch statement value multiple times, which could
    be bad when the value is in memory shared with the frontend.

    When converting a non-native request to a native one, ensure that
    src->operation is only loaded once by using READ_ONCE().

    This is part of XSA155.

    CC: stable@vger.kernel.org
    Signed-off-by: Roger Pau Monné
    Signed-off-by: David Vrabel
    Signed-off-by: Konrad Rzeszutek Wilk

    Roger Pau Monné
     

23 Oct, 2015

2 commits

  • Linux may use a different page size than the size of a grant. So make
    clear that the order is actually in number of grants.

    Signed-off-by: Julien Grall
    Signed-off-by: David Vrabel

    Julien Grall
     
  • The PV block protocol uses 4KB page granularity. The goal of this
    patch is to allow a Linux using 64KB page granularity to behave as a
    block backend on an unmodified Xen.

    It's only necessary to adapt the ring size and the number of requests per
    indirect frame. The rest of the code relies on the grant table
    code.

    Note that the grant table code allocates a Linux page per grant,
    which wastes 60KB for every grant when Linux is using 64KB
    page granularity. This could be improved by sharing the page between
    multiple grants.

    Signed-off-by: Julien Grall
    Acked-by: "Roger Pau Monné"
    Signed-off-by: David Vrabel

    Julien Grall
     

24 Sep, 2015

2 commits

  • …git/konrad/xen into for-linus

    Konrad writes:

    It has one fix that should go in and also be put in stable tree (I've
    added the CC already).

    It is a fix for a memory leak that can be exposed by using the UEFI
    xen-blkfront driver.

    Jens Axboe
     
  • This is due to commit 86839c56dee28c315a4c19b7bfee450ccd84cd25
    "xen/block: add multi-page ring support"

    When using a guest under UEFI, after the domain is destroyed
    the following warning comes from blkback.

    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 95 at
    /home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
    xen_blkif_deferred_free+0x1f4/0x1f8()
    Modules linked in:
    CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G W 4.2.0 #85
    Hardware name: APM X-Gene Mustang board (DT)
    Workqueue: events xen_blkif_deferred_free
    Call trace:
    [] dump_backtrace+0x0/0x124
    [] show_stack+0x10/0x1c
    [] dump_stack+0x78/0x98
    [] warn_slowpath_common+0x9c/0xd4
    [] warn_slowpath_null+0x14/0x20
    [] xen_blkif_deferred_free+0x1f0/0x1f8
    [] process_one_work+0x160/0x3b4
    [] worker_thread+0x140/0x494
    [] kthread+0xd8/0xf0
    ---[ end trace 6f859b7883c88cdd ]---

    Request allocation has been moved to connect_ring, which is called every
    time blkback connects to the frontend (this can happen multiple times during
    a blkback instance life cycle). On the other hand, request freeing has not
    been moved, so it's only called when destroying the backend instance. Due to
    this mismatch, blkback can allocate the request pool multiple times, without
    freeing it.

    In order to fix it, move the freeing of requests to xen_blkif_disconnect to
    restore the symmetry between request allocation and freeing.

    Reported-by: Julien Grall
    Signed-off-by: Roger Pau Monné
    Tested-by: Julien Grall
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: xen-devel@lists.xenproject.org
    CC: stable@vger.kernel.org # 4.2
    Signed-off-by: Konrad Rzeszutek Wilk

    Roger Pau Monne
     

03 Sep, 2015

1 commit

  • Pull core block updates from Jens Axboe:
    "This first core part of the block IO changes contains:

    - Cleanup of the bio IO error signaling from Christoph. We used to
    rely on the uptodate bit and passing around of an error, now we
    store the error in the bio itself.

    - Improvement of the above from myself, by shrinking the bio size
    down again to fit in two cachelines on x86-64.

    - Revert of the max_hw_sectors cap removal from a revision again,
    from Jeff Moyer. This caused performance regressions in various
    tests. Reinstate the limit, bump it to a more reasonable size
    instead.

    - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
    Most devices have huge trim limits, which can cause nasty latencies
    when deleting files. Enable the admin to configure the size down.
    We will look into having a more sane default instead of UINT_MAX
    sectors.

    - Improvement of the SGP gaps logic from Keith Busch.

    - Enable the block core to handle arbitrarily sized bios, which
    enables a nice simplification of bio_add_page() (which is an IO hot
    path). From Kent.

    - Improvements to the partition io stats accounting, making it
    faster. From Ming Lei.

    - Also from Ming Lei, a basic fixup for overflow of the sysfs pending
    file in blk-mq, as well as a fix for a blk-mq timeout race
    condition.

    - Ming Lin has been carrying Kent's above-mentioned patches forward
    for a while, and testing them. Ming also did a few fixes around
    that.

    - Sasha Levin found and fixed a use-after-free problem introduced by
    the bio->bi_error changes from Christoph.

    - Small blk cgroup cleanup from Viresh Kumar"

    * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
    blk: Fix bio_io_vec index when checking bvec gaps
    block: Replace SG_GAPS with new queue limits mask
    block: bump BLK_DEF_MAX_SECTORS to 2560
    Revert "block: remove artifical max_hw_sectors cap"
    blk-mq: fix race between timeout and freeing request
    blk-mq: fix buffer overflow when reading sysfs file of 'pending'
    Documentation: update notes in biovecs about arbitrarily sized bios
    block: remove bio_get_nr_vecs()
    fs: use helper bio_add_page() instead of open coding on bi_io_vec
    block: kill merge_bvec_fn() completely
    md/raid5: get rid of bio_fits_rdev()
    md/raid5: split bio for chunk_aligned_read
    block: remove split code in blkdev_issue_{discard,write_same}
    btrfs: remove bio splitting and merge_bvec_fn() calls
    bcache: remove driver private bio splitting code
    block: simplify bio_add_page()
    block: make generic_make_request handle arbitrarily sized bios
    blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
    block: don't access bio->bi_error after bio_put()
    block: shrink struct bio down to 2 cache lines again
    ...

    Linus Torvalds
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not being persistent
    when bios are queued up, and of not being passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

28 Jul, 2015

1 commit


24 Jul, 2015

1 commit


02 Jul, 2015

1 commit

  • Pull xen updates from David Vrabel:
    "Xen features and cleanups for 4.2-rc0:

    - add "make xenconfig" to assist in generating configs for Xen guests

    - preparatory cleanups necessary for supporting 64 KiB pages in ARM
    guests

    - automatically use hvc0 as the default console in ARM guests"

    * tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    block/xen-blkback: s/nr_pages/nr_segs/
    block/xen-blkfront: Remove invalid comment
    block/xen-blkfront: Remove unused macro MAXIMUM_OUTSTANDING_BLOCK_REQS
    arm/xen: Drop duplicate define mfn_to_virt
    xen/grant-table: Remove unused macro SPP
    xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
    xen: Include xen/page.h rather than asm/xen/page.h
    kconfig: add xenconfig defconfig helper
    kconfig: clarify kvmconfig is for kvm
    xen/pcifront: Remove usage of struct timeval
    xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
    hvc_xen: avoid uninitialized variable warning
    xenbus: avoid uninitialized variable warning
    xen/arm: allow console=hvc0 to be omitted for guests
    arm,arm64/xen: move Xen initialization earlier
    arm/xen: Correctly check if the event channel interrupt is present

    Linus Torvalds
     

28 Jun, 2015

1 commit


17 Jun, 2015

1 commit


06 Jun, 2015

2 commits

  • Extend xen/block to support multi-page ring, so that more requests can be
    issued by using more than one page as the request ring between blkfront
    and backend.
    As a result, the performance can improve significantly.

    We got some impressive improvements on our highend iscsi storage cluster
    backend. If using 64 pages as the ring, the IOPS increased about 15 times
    for the throughput testing and more than doubled for the latency testing.

    The reason was that the limit on outstanding requests is 32 when using only
    a one-page ring, but in our case the iscsi lun was spread across about 100
    physical drives, so 32 was really not enough to keep them busy.

    Changes in v2:
    - Rebased to 4.0-rc6.
    - Document how the multi-page ring feature works in linux io/blkif.h.

    Changes in v3:
    - Remove changes to linux io/blkif.h and follow the protocol defined
    in io/blkif.h of XEN tree.
    - Rebased to 4.1-rc3

    Changes in v4:
    - Turn to use 'ring-page-order' and 'max-ring-page-order'.
    - A few comments from Roger.

    Changes in v5:
    - Clarify with 4k granularity to comment
    - Address more comments from Roger

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
     
  • This is a pre-patch for the multi-page ring feature.
    In connect_ring we know exactly how many pages are used for the shared
    ring, so delay pending_req allocation until then to avoid wasting memory.

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
     

27 Apr, 2015

1 commit