20 Aug, 2016

3 commits


28 Jul, 2016

1 commit

  • Pull xen updates from David Vrabel:
    "Features and fixes for 4.8-rc0:

    - ACPI support for guests on ARM platforms.
    - Generic steal time support for arm and x86.
    - Support cases where kernel cpu is not Xen VCPU number (e.g., if
    in-guest kexec is used).
    - Use the system workqueue instead of a custom workqueue in various
    places"

    * tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits)
    xen: add static initialization of steal_clock op to xen_time_ops
    xen/pvhvm: run xen_vcpu_setup() for the boot CPU
    xen/evtchn: use xen_vcpu_id mapping
    xen/events: fifo: use xen_vcpu_id mapping
    xen/events: use xen_vcpu_id mapping in events_base
    x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
    x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
    xen: introduce xen_vcpu_id mapping
    x86/acpi: store ACPI ids from MADT for future usage
    x86/xen: update cpuid.h from Xen-4.7
    xen/evtchn: add IOCTL_EVTCHN_RESTRICT
    xen-blkback: really don't leak mode property
    xen-blkback: constify instance of "struct attribute_group"
    xen-blkfront: prefer xenbus_scanf() over xenbus_gather()
    xen-blkback: prefer xenbus_scanf() over xenbus_gather()
    xen: support runqueue steal time on xen
    arm/xen: add support for vm_assist hypercall
    xen: update xen headers
    xen-pciback: drop superfluous variables
    xen-pciback: short-circuit read path used for merging write values
    ...

    Linus Torvalds
     

27 Jul, 2016

2 commits

  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a follow-up to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification for failed device add for loop, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will throw
    some merge conflicts

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix for plug merging not involving the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     

22 Jul, 2016

1 commit


30 Jun, 2016

1 commit

  • Uncompleted reqs used to be 'saved and resubmitted' in blkfront_recover() during
    migration, but that is too late now that multi-queue has been introduced.

    After migration to another host (which may not have multiqueue support), the
    number of rings (block hardware queues) may change, and the ring and shadow
    structures will also be reallocated.

    blkfront_recover() then can't 'save and resubmit' the real
    uncompleted reqs, because the shadow structures have been reallocated.

    This patch fixes the issue by moving the 'save' logic out of
    blkfront_recover() to an earlier point in blkfront_resume().

    The 'resubmit' logic is unchanged and remains in blkfront_recover().

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk
    Cc: stable@vger.kernel.org

    Bob Liu
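    The save-then-resubmit ordering described above can be sketched in plain C. This is a minimal userspace model, not blkfront code: `struct ring`, `struct dev`, `save_inflight()`, `free_rings()`, `alloc_rings()` and `resubmit_saved()` are hypothetical names, and a shadow slot is reduced to a request id. It illustrates that in-flight requests must be copied out of the shadow arrays before those arrays are freed and reallocated with a possibly different ring count.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical, simplified model: each ring keeps a shadow array of
     * in-flight request ids (0 = free slot). These are not blkfront's types. */
    struct ring {
        int *shadow;
        size_t nr_slots;
    };

    struct dev {
        struct ring *rings;
        size_t nr_rings;
        int saved[64];      /* requests carried across the reallocation */
        size_t nr_saved;
    };

    /* Free every ring; safe on a partially allocated (zeroed) array. */
    static void free_rings(struct dev *d)
    {
        for (size_t r = 0; r < d->nr_rings; r++)
            free(d->rings[r].shadow);
        free(d->rings);
        d->rings = NULL;
        d->nr_rings = 0;
    }

    static int alloc_rings(struct dev *d, size_t nr_rings, size_t nr_slots)
    {
        d->rings = calloc(nr_rings, sizeof(*d->rings));
        if (!d->rings)
            return -1;
        d->nr_rings = nr_rings;
        for (size_t r = 0; r < nr_rings; r++) {
            d->rings[r].shadow = calloc(nr_slots, sizeof(int));
            if (!d->rings[r].shadow) {
                free_rings(d);
                return -1;
            }
            d->rings[r].nr_slots = nr_slots;
        }
        return 0;
    }

    /* "Save" step: must run while the old shadow arrays still exist. */
    static void save_inflight(struct dev *d)
    {
        d->nr_saved = 0;
        for (size_t r = 0; r < d->nr_rings; r++)
            for (size_t i = 0; i < d->rings[r].nr_slots; i++)
                if (d->rings[r].shadow[i] != 0) {
                    assert(d->nr_saved < sizeof(d->saved) / sizeof(d->saved[0]));
                    d->saved[d->nr_saved++] = d->rings[r].shadow[i];
                }
    }

    /* "Resubmit" step: spread saved requests over the new rings. Returns the
     * number placed; this sketch drops requests that find no free slot. */
    static size_t resubmit_saved(struct dev *d)
    {
        size_t placed = 0;

        assert(d->nr_rings > 0);
        for (size_t k = 0; k < d->nr_saved; k++) {
            struct ring *rg = &d->rings[k % d->nr_rings];
            for (size_t i = 0; i < rg->nr_slots; i++)
                if (rg->shadow[i] == 0) {
                    rg->shadow[i] = d->saved[k];
                    placed++;
                    break;
                }
        }
        d->nr_saved = 0;
        return placed;
    }
    ```

    On resume, the assumed sequence is `save_inflight()`, `free_rings()`, `alloc_rings()` with the newly negotiated ring count, and later `resubmit_saved()`. A real driver would requeue requests that do not fit rather than dropping them as this sketch does.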
     

28 Jun, 2016

1 commit

  • For block drivers that specify a parent device, convert them to use
    device_add_disk().

    This conversion was done with the following semantic patch:

    @@
    struct gendisk *disk;
    expression E;
    @@

    - disk->driverfs_dev = E;
    ...
    - add_disk(disk);
    + device_add_disk(E, disk);

    @@
    struct gendisk *disk;
    expression E1, E2;
    @@

    - disk->driverfs_dev = E1;
    ...
    E2 = disk;
    ...
    - add_disk(E2);
    + device_add_disk(E1, E2);

    ...plus some manual fixups for a few missed conversions.

    Cc: Jens Axboe
    Cc: Keith Busch
    Cc: Michael S. Tsirkin
    Cc: David Woodhouse
    Cc: David S. Miller
    Cc: James Bottomley
    Cc: Ross Zwisler
    Cc: Konrad Rzeszutek Wilk
    Cc: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

09 Jun, 2016

3 commits

  • Use a separate operation type for secure erase instead of overloading the
    discard support with the REQ_SECURE flag. Use the opportunity to rename the
    queue flag as well, and remove the dead checks for this flag in the RAID 1
    and RAID 10 drivers that don't claim support for secure erase.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • After migration to another host (which may not have multiqueue
    support), the number of rings (block hardware queues)
    may change, and the ring info structure will also be reallocated.

    This patch fixes two related bugs:
    * call blk_mq_update_nr_hw_queues() to let blk-core know that the number
    of hardware queues has changed.
    * Don't store the rinfo pointer in hctx->driver_data, because rinfo may be
    reallocated; use hctx->queue_num to get the rinfo structure instead.

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
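    A minimal sketch of the second fix, assuming simplified stand-ins for blkfront's per-ring info and blk-mq's hardware context (`struct ring_info`, `struct dev_info`, `struct hw_ctx`, `ring_for_hctx()` and `realloc_rings()` are all hypothetical). Looking the ring up by the hardware queue index on every call, rather than caching a pointer, stays valid after the ring array is reallocated.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical stand-ins; field names are illustrative only. */
    struct ring_info {
        unsigned int id;
    };

    struct dev_info {
        struct ring_info *rinfo;   /* array; may be reallocated on resume */
        unsigned int nr_rings;
    };

    struct hw_ctx {
        unsigned int queue_num;    /* stable index of the hardware queue */
    };

    /* Resolve the ring from the queue index on every call, instead of caching
     * a pointer that would dangle once dev->rinfo is reallocated. */
    static struct ring_info *ring_for_hctx(struct dev_info *dev,
                                           const struct hw_ctx *hctx)
    {
        assert(hctx->queue_num < dev->nr_rings);
        return &dev->rinfo[hctx->queue_num];
    }

    /* Replace the ring array, e.g. after migrating to a host with a different
     * number of queues. Returns 0 on success, -1 on allocation failure. */
    static int realloc_rings(struct dev_info *dev, unsigned int nr_rings)
    {
        struct ring_info *n = calloc(nr_rings, sizeof(*n));

        if (!n)
            return -1;
        for (unsigned int i = 0; i < nr_rings; i++)
            n[i].id = i;
        free(dev->rinfo);
        dev->rinfo = n;
        dev->nr_rings = nr_rings;
        return 0;
    }
    ```

    In the real driver the block layer must also be told about the new queue count (via blk_mq_update_nr_hw_queues(), as the commit says) so that no queue index exceeds the number of rings; the `assert` in this sketch stands in for that guarantee.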
     
  • Sometimes blkfront may receive the blkback_changed() notification
    (XenbusStateConnected) twice after migration, which causes
    talk_to_blkback() to be called twice as well and confuses xen-blkback.

    The flow is as follows:
       blkfront                                 blkback
    blkfront_resume()
     > talk_to_blkback()
      > Set blkfront to XenbusStateInitialised
                                                front_changed()
                                                 > Connect()
                                                  > Set blkback to XenbusStateConnected

    blkback_changed()
     > Skip talk_to_blkback()
       because frontstate == XenbusStateInitialised
     > blkfront_connect()
      > Set blkfront to XenbusStateConnected

    -----
    And here we get another XenbusStateConnected notification leading
    to:
    -----
    blkback_changed()
     > because now frontstate != XenbusStateInitialised
       talk_to_blkback() is also called again
      > blkfront state changed from
        XenbusStateConnected to XenbusStateInitialised
        (Which is not correct!)

                                                front_changed():
                                                 > Do nothing because blkback
                                                   already in XenbusStateConnected

    Now blkback is in XenbusStateConnected but blkfront is still
    in XenbusStateInitialised, leading to no disks.

    Poking the XenbusStateConnected state again is allowed (to deal with
    a block disk change) and has to be handled. The most likely
    cause of this bug is custom udev scripts hooking up the disks
    and then validating the size.

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
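    One way to express the fix implied by the trace above is a small state machine. This is a sketch under stated assumptions, not the xen-blkfront implementation: `enum xb_state`, `struct front`, `talk_to_backend()`, `connect_front()` and `on_backend_connected()` are hypothetical, although the numeric values follow the xenbus state numbering (Initialised = 3, Connected = 4). The guard skips the handshake both when it has just been done (Initialised) and when the frontend is already Connected, so a repeated Connected notification only re-runs the connect step.

    ```c
    /* Hypothetical names; the values follow the xenbus state numbering. */
    enum xb_state {
        XB_UNKNOWN      = 0,
        XB_INITIALISING = 1,
        XB_INITWAIT     = 2,
        XB_INITIALISED  = 3,
        XB_CONNECTED    = 4,
        XB_CLOSING      = 5,
        XB_CLOSED       = 6,
    };

    struct front {
        enum xb_state state;
        int handshakes;            /* times talk_to_backend() has run */
    };

    /* Stand-in for talk_to_blkback(): negotiate and move to Initialised. */
    static void talk_to_backend(struct front *f)
    {
        f->handshakes++;
        f->state = XB_INITIALISED;
    }

    /* Stand-in for blkfront_connect(): finish setup and move to Connected.
     * Running it again while connected is harmless in this model. */
    static void connect_front(struct front *f)
    {
        f->state = XB_CONNECTED;
    }

    /* Backend reported Connected. Only handshake if it has not been done yet;
     * a repeated notification while Connected must not reset the state. */
    static void on_backend_connected(struct front *f)
    {
        if (f->state != XB_INITIALISED && f->state != XB_CONNECTED)
            talk_to_backend(f);
        connect_front(f);
    }
    ```

    With this guard, the second notification in the trace leaves the frontend in Connected with a single handshake, while a Connected notification that arrives before any handshake still triggers one.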
     

08 Jun, 2016

4 commits

  • The last patch added a REQ_OP_FLUSH for request_fn drivers
    and the next patch renames REQ_FLUSH to REQ_PREFLUSH which
    will be used by file systems and make_request_fn drivers so
    they can send a write/flush combo.

    This patch drops xen's use of REQ_FLUSH to track if it supports
    REQ_OP_FLUSH requests, so REQ_FLUSH can be deleted.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Juergen Gross
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • This adds a REQ_OP_FLUSH operation that is sent to request_fn
    based drivers by the block layer's flush code, instead of
    sending requests with the request->cmd_flags REQ_FLUSH bit set.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • The req operation REQ_OP is separated from the rq_flag_bits
    definition. This converts the block layer drivers to
    use req_op to get the op from the request struct.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
    instead of passing it in. This makes that usage consistent with
    generic_make_request and with how we set the other bio fields.

    Signed-off-by: Mike Christie

    Fixed up fs/ext4/crypto.c

    Signed-off-by: Jens Axboe

    Mike Christie
     

13 Apr, 2016

1 commit


19 Mar, 2016

1 commit

  • Pull block driver updates from Jens Axboe:
    "This is the block driver pull request for this merge window. It sits
    on top of for-4.6/core, that was just sent out.

    This contains:

    - A set of fixes for lightnvm. One from Alan, fixing an overflow,
    and the rest from the usual suspects, Javier and Matias.

    - A set of fixes for nbd from Markus and Dan, and a fixup from Arnd
    for correct usage of the signed 64-bit divider.

    - A set of bug fixes for the Micron mtip32xx, from Asai.

    - A fix for the brd discard handling from Bart.

    - Update the maintainers entry for cciss, since that hardware has
    transferred ownership.

    - Three bug fixes for bcache from Eric Wheeler.

    - Set of fixes for xen-blk{back,front} from Jan and Konrad.

    - Removal of the cpqarray driver. It has been disabled in Kconfig
    since 2013, and we were initially scheduled to remove it in 3.15.

    - Various updates and fixes for NVMe, with the most important being:

        - Removal of the per-device NVMe thread, replacing that with a
          watchdog timer instead. From Christoph.

        - Exposing the namespace WWID through sysfs, from Keith.

        - Set of cleanups from Ming Lin.

        - Logging the controller device name instead of the underlying
          PCI device name, from Sagi.

        - And a bunch of fixes and optimizations from the usual suspects
          in this area"

    * 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits)
    NVMe: Expose ns wwid through single sysfs entry
    drivers:block: cpqarray clean up
    brd: Fix discard request processing
    cpqarray: remove it from the kernel
    cciss: update MAINTAINERS
    NVMe: Remove unused sq_head read in completion path
    bcache: fix cache_set_flush() NULL pointer dereference on OOM
    bcache: cleaned up error handling around register_cache()
    bcache: fix race of writeback thread starting before complete initialization
    NVMe: Create discard zero quirk white list
    nbd: use correct div_s64 helper
    mtip32xx: remove unneeded variable in mtip_cmd_timeout()
    lightnvm: generalize rrpc ppa calculations
    lightnvm: remove struct nvm_dev->total_blocks
    lightnvm: rename ->nr_pages to ->nr_sects
    lightnvm: update closed list outside of intr context
    xen/blback: Fit the important information of the thread in 17 characters
    lightnvm: fold get bb tbl when using dual/quad plane mode
    lightnvm: fix up nonsensical configure overrun checking
    xen-blkback: advertise indirect segment support earlier
    ...

    Linus Torvalds
     

04 Mar, 2016

1 commit

  • "max" is rather ambiguous and carries little meaning, all the more so
    since there are also "max_queues" and "max_ring_page_order". Make this
    "max_indirect_segments" instead, and at the same time change the type
    from int to uint (to match the type of the respective variable).

    Acked-by: Roger Pau Monné
    Signed-off-by: Jan Beulich
    Signed-off-by: Konrad Rzeszutek Wilk

    Jan Beulich
     

30 Jan, 2016

1 commit

  • The ring info needs to be reallocated in the resume path, because info->rinfo
    was freed in blkif_free(), and the 'multi-queue-max-queues' value reported by
    the backend may have changed.

    Signed-off-by: Bob Liu
    Reported-and-Tested-by: Konrad Rzeszutek Wilk
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
     

05 Jan, 2016

11 commits

  • We have split the setting up of all the resources into two steps:
    1) talk_to_blkback, which figures out the nr_ring_pages (from
    the default value of zero), sets up the shadow structures and so on;
    2) blkfront_connect, which does the real work of filling out the
    internal structures.

    The problem is that if we bypass step 1) and go straight to 2)
    and call blkfront_setup_indirect, where we use the macro
    BLK_RING_SIZE, it returns a negative value (because
    sz is zero, since nr_ring_pages is zero, since it has never
    been set).

    We can fix this by making sure that we always have called
    talk_to_blkback before going to blkfront_connect.

    Or we could set info->nr_ring_pages = 1 in blkfront_probe
    to have a default value. But that looks odd, as we haven't
    actually negotiated any ring size.

    This patch changes the handling of the XenbusStateConnected state to
    detect whether we have done the initial handshake, and if not, continue
    as if we were in the XenbusStateInitWait state.

    We also roll the error recovery (freeing the structure) into the
    talk_to_blkback error path, which is safe since that function
    is only called from blkback_changed.

    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
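    The guard can be illustrated with a simplified ring-size computation. The layout constants below (`GRANT_BYTES`, `RING_HEADER_BYTES`, `RING_ENTRY_BYTES`) are assumed values, chosen so that one 4KB page yields 32 slots; `rd_pow2()`, `ring_slots()` and `setup_indirect()` are hypothetical and only loosely mirror BLK_RING_SIZE and blkfront_setup_indirect. The sketch reports 0 slots when no ring pages have been negotiated, and the connect step refuses to continue in that case instead of sizing structures from a meaningless value.

    ```c
    #include <assert.h>

    /* Assumed layout: a fixed header followed by fixed-size entries, with
     * sizes chosen so that one 4KB page gives 32 slots. Names are hypothetical. */
    #define GRANT_BYTES       4096u
    #define RING_HEADER_BYTES 64u
    #define RING_ENTRY_BYTES  112u

    /* Round down to a power of two; 0 stays 0. */
    static unsigned int rd_pow2(unsigned int x)
    {
        unsigned int p = 1;

        if (x == 0)
            return 0;
        while (p <= x / 2)
            p *= 2;
        return p;
    }

    /* Slots in a ring spanning nr_ring_pages grants; 0 if nothing negotiated. */
    static unsigned int ring_slots(unsigned int nr_ring_pages)
    {
        if (nr_ring_pages == 0)
            return 0;
        assert(nr_ring_pages * GRANT_BYTES > RING_HEADER_BYTES);
        return rd_pow2((nr_ring_pages * GRANT_BYTES - RING_HEADER_BYTES) /
                       RING_ENTRY_BYTES);
    }

    /* Connect step: refuse to size per-ring structures before the handshake. */
    static int setup_indirect(unsigned int nr_ring_pages, unsigned int *slots_out)
    {
        unsigned int slots = ring_slots(nr_ring_pages);

        if (slots == 0)
            return -1;      /* step 1) (the handshake) has not run yet */
        *slots_out = slots;
        return 0;
    }
    ```

    Under these assumed sizes, 1, 2 and 4 ring pages give 32, 64 and 128 slots respectively.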
     
  • The minimum size of a request in the block framework is always PAGE_SIZE.
    This means that when 64KB guests are supported, a request will be at least
    64KB.

    However, if the backend doesn't support indirect descriptors (such as QDISK
    in QEMU), a ring request can only accommodate 11 segments of 4KB
    (i.e. 44KB).

    The current frontend assumes that an I/O request will always fit in
    a ring request. This is no longer true when using 64KB page
    granularity, and the guest will therefore crash during boot.

    On ARM64, the ABI is completely neutral to the page granularity used by
    the domU. The guest can choose between the different page granularities
    supported by the processor (for instance on ARM64: 4KB, 16KB, 64KB).
    This can't be enforced by the hypervisor, and therefore it's possible to
    run guests using different page granularities.

    So we can't require the block backend to support indirect descriptors
    when the frontend is using 64KB page granularity, and have to fix it
    properly in the frontend.

    The solution below modifies the frontend directly rather than asking
    the block framework to support smaller sizes (i.e. < PAGE_SIZE). This is
    because the changes in the block framework are not trivial, as everything
    seems to rely on a struct page (see [1]). However, someone may succeed
    in doing so in the future, and we would then be able to use it.

    Given that a block request may not fit in a single ring request, a
    second request is introduced for the data that cannot fit in the first
    one. This means that the second ring request should never be used on
    Linux if the page size is smaller than 44KB.

    To support the extra ring request, the block queue size is halved.
    Therefore, the ring will always contain enough space to accommodate
    2 ring requests. While this will reduce the overall performance, it
    will keep the implementation more contained. The way forward to get
    better performance is to implement either indirect descriptors or
    multiple-grant rings in the backend.

    Note that the parameters passed to the blk_queue_max_* helpers haven't
    been updated. The block code will set the minimum size supported, and we
    may be able to directly support any change in the block framework that
    lowers the minimum size of a request.

    [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html

    Signed-off-by: Julien Grall
    Acked-by: Roger Pau Monné
    Signed-off-by: Konrad Rzeszutek Wilk

    Julien Grall
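    The arithmetic behind the extra ring request can be sketched as follows, assuming 4KB grants, the 11-segment limit quoted above, and a block request of at most one 64KB page; `grants_for()`, `ring_reqs_for()` and `queue_depth()` are hypothetical helpers. A 64KB request needs 16 grants, more than the 11 a single ring request can carry, so it spills into a second ring request; halving the queue depth keeps room for that second request.

    ```c
    #include <assert.h>

    /* Values from the text above; helper names are hypothetical. */
    #define SEGS_PER_RING_REQ 11u     /* segments per ring request, no indirect */
    #define GRANT_SIZE        4096u   /* PV protocol granularity */

    /* Number of 4KB grants needed for len bytes starting on a grant boundary. */
    static unsigned int grants_for(unsigned int len)
    {
        return (len + GRANT_SIZE - 1) / GRANT_SIZE;
    }

    /* Number of ring requests needed for one block request of len bytes.
     * Assumes a block request is at most one 64KB page, hence at most two. */
    static unsigned int ring_reqs_for(unsigned int len)
    {
        unsigned int n = (grants_for(len) + SEGS_PER_RING_REQ - 1) / SEGS_PER_RING_REQ;

        assert(n <= 2);
        return n;
    }

    /* Queue depth offered to the block layer: half the ring when a block
     * request may need a second ring request, so both always fit. */
    static unsigned int queue_depth(unsigned int ring_slots, int may_need_extra_req)
    {
        return may_need_extra_req ? ring_slots / 2 : ring_slots;
    }
    ```

    Anything up to 44KB (11 grants) still fits in one ring request, which is why the second request is never used with page sizes below 44KB.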
     
  • The code to get a request is always the same. Therefore we can factor
    it out into a single function.

    Signed-off-by: Julien Grall
    Acked-by: Roger Pau Monné
    Signed-off-by: Konrad Rzeszutek Wilk

    Julien Grall
     
  • Let's return sensible values instead of -1.

    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • According to this piece of code:
    "
    pr_info("Invalid max_ring_order (%d), will use default max: %d.\n",
    xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER);
    "
    if xen_blkif_max_ring_order is bigger than XENBUS_MAX_RING_GRANT_ORDER,
    xen_blkif_max_ring_order needs to be set to XENBUS_MAX_RING_GRANT_ORDER,
    not 0.

    Signed-off-by: Peng Fan
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: "Roger Pau Monné"
    Signed-off-by: Konrad Rzeszutek Wilk

    Peng Fan
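    A minimal sketch of the corrected clamping, assuming a hypothetical `clamp_ring_order()` helper and an assumed value of 4 for the maximum order (check XENBUS_MAX_RING_GRANT_ORDER in your tree). An out-of-range value is clamped to the maximum, as the log message promises, instead of being reset to 0.

    ```c
    #include <stdio.h>

    /* Assumed value for illustration; the driver uses XENBUS_MAX_RING_GRANT_ORDER. */
    #define MAX_RING_GRANT_ORDER 4u

    /* Hypothetical helper: validate a user-supplied max_ring_order. */
    static unsigned int clamp_ring_order(unsigned int requested)
    {
        if (requested > MAX_RING_GRANT_ORDER) {
            fprintf(stderr, "Invalid max_ring_order (%u), will use default max: %u.\n",
                    requested, MAX_RING_GRANT_ORDER);
            return MAX_RING_GRANT_ORDER;   /* clamp to the maximum, not to 0 */
        }
        return requested;
    }
    ```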
     
  • Make persistent grants per-queue/ring instead of per-device, so that we can
    drop the 'dev_lock' and get better scalability.

    The test was done using the null_blk driver:
    dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
    domu: v4.2-rc8 16vcpus 10GB

    [test]
    rw=read
    direct=1
    ioengine=libaio
    bs=4k
    time_based
    runtime=30
    filename=/dev/xvdb
    numjobs=16
    iodepth=64
    iodepth_batch=64
    iodepth_batch_complete=64
    group_reporting

    Queues:             1      4              8              16
    IOPS orig (k):      810    1064           780            700
    IOPS patched (k):   810    1230 (~20%)    1024 (~20%)    850 (~20%)

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
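    The per-ring arrangement can be sketched as a grant pool that each ring owns together with its own lock, so rings never contend on a device-wide lock. This is a userspace model using POSIX mutexes, not the blkfront code: `struct grant_pool`, `struct ring`, `pool_init()`, `pool_get()` and `pool_put()` are hypothetical, and a persistent grant is reduced to an integer reference.

    ```c
    #include <pthread.h>
    #include <stddef.h>

    #define POOL_MAX 32

    /* Hypothetical per-ring pool of persistent grant references. */
    struct grant_pool {
        pthread_mutex_t lock;              /* protects only this ring's pool */
        unsigned int free_refs[POOL_MAX];
        unsigned int nr_free;
    };

    struct ring {
        struct grant_pool pool;            /* no pool shared across rings */
    };

    static void pool_init(struct grant_pool *p, unsigned int first_ref,
                          unsigned int n)
    {
        pthread_mutex_init(&p->lock, NULL);
        p->nr_free = 0;
        for (unsigned int i = 0; i < n && i < POOL_MAX; i++)
            p->free_refs[p->nr_free++] = first_ref + i;
    }

    /* Take a grant from this ring's pool: 0 on success, -1 if it is empty. */
    static int pool_get(struct grant_pool *p, unsigned int *ref)
    {
        int ret = -1;

        pthread_mutex_lock(&p->lock);
        if (p->nr_free > 0) {
            *ref = p->free_refs[--p->nr_free];
            ret = 0;
        }
        pthread_mutex_unlock(&p->lock);
        return ret;
    }

    /* Return a grant to the pool of the ring it came from. */
    static void pool_put(struct grant_pool *p, unsigned int ref)
    {
        pthread_mutex_lock(&p->lock);
        if (p->nr_free < POOL_MAX)
            p->free_refs[p->nr_free++] = ref;
        pthread_mutex_unlock(&p->lock);
    }
    ```

    Because an empty pool on one ring does not borrow from another, each ring's lock is only ever taken by the queue that owns it, which is the scalability gain the commit measures.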
     
  • We do the same exact operations a bit earlier in the
    function.

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
     
  • Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • The maximum number of hardware queues for xen/blkfront is set by the parameter
    'max_queues' (default 4), and is also capped by the maximum value that
    xen/blkback exposes through the XenStore key 'multi-queue-max-queues'.

    The negotiated number is the smaller of the two and is written back to XenStore
    as "multi-queue-num-queues"; blkback needs to read this negotiated number.

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
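    A minimal sketch of the negotiation, assuming a hypothetical `negotiate_nr_rings()` helper. It also assumes that a backend which does not advertise 'multi-queue-max-queues' supports a single queue, and that at least one ring is always used. `format_num_queues()` is likewise hypothetical and only formats the value; the real driver writes it to XenStore.

    ```c
    #include <stdbool.h>
    #include <stdio.h>

    /* front_max: the 'max_queues' module parameter.
     * backend_max: the backend's 'multi-queue-max-queues' value, if present. */
    static unsigned int negotiate_nr_rings(unsigned int front_max,
                                           bool backend_has_key,
                                           unsigned int backend_max)
    {
        /* Assumption: no key means the backend supports a single queue. */
        unsigned int n = backend_has_key ? backend_max : 1;

        if (front_max < n)
            n = front_max;          /* the smaller of the two limits wins */
        return n ? n : 1;           /* always use at least one ring */
    }

    /* Format the value that would be written back as "multi-queue-num-queues". */
    static int format_num_queues(char *buf, size_t len, unsigned int nr_rings)
    {
        return snprintf(buf, len, "multi-queue-num-queues=%u", nr_rings);
    }
    ```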
     
  • After patch "xen/blkfront: separate per ring information out of device
    info", per-ring data is protected by a per-device lock ('io_lock').

    This is not ideal and will affect scalability, so introduce a
    per-ring lock ('ring_lock').

    The old 'io_lock' is renamed to 'dev_lock', which protects the ->grants list and
    ->persistent_gnts_c, which are shared by all rings.

    Note that in 'blkfront_probe' the 'blkfront_info' is set up via kzalloc,
    so setting ->persistent_gnts_c to zero is not needed.

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
     
  • Preparatory patch for multiple hardware queues (rings). The number of
    rings is unconditionally set to 1; larger numbers will be enabled in
    patch "xen/blkfront: negotiate number of queues/rings to be used with backend"
    to make review easier.

    Note that blkfront_gather_backend_features does not call
    blkfront_setup_indirect anymore (as that needs to be done per ring).
    That means that in blkif_recover/blkif_connect we have to do it in a loop
    (bounded by nr_rings).

    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
     

04 Jan, 2016

1 commit

  • Split per-ring information into a new structure "blkfront_ring_info".

    A ring represents a hardware queue; every vbd device can be associated
    with one or more rings, depending on how many hardware queues/rings are used.

    This patch is a preparation for supporting real multiple hardware queues/rings.

    We also add a backpointer to 'struct blkfront_info' (dev_info), which
    is not strictly needed (we could use container_of), but a later patch
    ("xen/blkfront: pseudo support for multi hardware queues/rings")
    will make the allocation of 'blkfront_ring_info' dynamic.

    Signed-off-by: Arianna Avanzini
    Signed-off-by: Bob Liu
    Signed-off-by: Konrad Rzeszutek Wilk

    Bob Liu
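    The trade-off between container_of and an explicit backpointer can be shown in userspace C. The `container_of` macro below is a simplified re-creation of the kernel idiom (without its type checking), and all structure and function names (`struct blk_dev`, `struct blk_dev_embedded`, `alloc_dev()`) are hypothetical. container_of only works while the ring is embedded in the device structure; once the ring array is allocated separately, as the later patch does, a backpointer is needed.

    ```c
    #include <stddef.h>
    #include <stdlib.h>

    /* Simplified userspace version of the kernel's container_of idiom. */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct blk_dev;                      /* forward declaration */

    struct ring_info {
        struct blk_dev *dev_info;        /* backpointer, valid wherever it lives */
        unsigned int id;
    };

    /* Layout A: ring embedded in the device, so container_of works. */
    struct blk_dev_embedded {
        int vdevice;
        struct ring_info ring;
    };

    /* Layout B: rings allocated separately, so only the backpointer works. */
    struct blk_dev {
        int vdevice;
        struct ring_info *rinfo;
        unsigned int nr_rings;
    };

    static struct blk_dev *alloc_dev(int vdevice, unsigned int nr_rings)
    {
        struct blk_dev *info = calloc(1, sizeof(*info));

        if (!info)
            return NULL;
        info->rinfo = calloc(nr_rings, sizeof(*info->rinfo));
        if (!info->rinfo) {
            free(info);
            return NULL;
        }
        info->vdevice = vdevice;
        info->nr_rings = nr_rings;
        for (unsigned int i = 0; i < nr_rings; i++) {
            info->rinfo[i].dev_info = info;   /* set the backpointer */
            info->rinfo[i].id = i;
        }
        return info;
    }
    ```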
     

05 Nov, 2015

1 commit

  • Pull xen updates from David Vrabel:

    - Improve balloon driver memory hotplug placement.

    - Use unpopulated hotplugged memory for foreign pages (if
    supported/enabled).

    - Support 64 KiB guest pages on arm64.

    - CPU hotplug support on arm/arm64.

    * tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (44 commits)
    xen: fix the check of e_pfn in xen_find_pfn_range
    x86/xen: add reschedule point when mapping foreign GFNs
    xen/arm: don't try to re-register vcpu_info on cpu_hotplug.
    xen, cpu_hotplug: call device_offline instead of cpu_down
    xen/arm: Enable cpu_hotplug.c
    xenbus: Support multiple grants ring with 64KB
    xen/grant-table: Add an helper to iterate over a specific number of grants
    xen/xenbus: Rename *RING_PAGE* to *RING_GRANT*
    xen/arm: correct comment in enlighten.c
    xen/gntdev: use types from linux/types.h in userspace headers
    xen/gntalloc: use types from linux/types.h in userspace headers
    xen/balloon: Use the correct sizeof when declaring frame_list
    xen/swiotlb: Add support for 64KB page granularity
    xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb
    arm/xen: Add support for 64KB page granularity
    xen/privcmd: Add support for Linux 64KB page granularity
    net/xen-netback: Make it running on 64KB page granularity
    net/xen-netfront: Make it running on 64KB page granularity
    block/xen-blkback: Make it running on 64KB page granularity
    block/xen-blkfront: Make it running on 64KB page granularity
    ...

    Linus Torvalds
     

23 Oct, 2015

5 commits

  • Linux may use a page size different from the grant size. So make it
    clear that the order is actually expressed in number of grants.

    Signed-off-by: Julien Grall
    Signed-off-by: David Vrabel

    Julien Grall
     
  • The PV block protocol uses 4KB page granularity. The goal of this
    patch is to allow Linux using 64KB page granularity to use block
    devices on an unmodified Xen.

    The block API uses segments that should be at least the size of a
    Linux page. Therefore, the driver has to break the page into chunks
    of 4KB before giving the page to the backend.

    When breaking a 64KB segment into 4KB chunks, it is possible that some
    chunks are empty. As the PV protocol always requires data in the
    chunk, we have to count the number of Xen pages that will be in use and
    avoid sending empty chunks.

    Note that a pre-defined number of grants is reserved before preparing
    the request. This pre-defined number is based on the number and the
    maximum size of the segments. If each segment contains a very small
    amount of data, the driver may reserve too many grants (16 grants are
    reserved per segment with 64KB page granularity).

    Furthermore, in the case of persistent grants we allocate one Linux page
    per grant, although only the first 4KB of the page is actually
    used. This could be improved by sharing the page with multiple grants.

    Signed-off-by: Julien Grall
    Acked-by: Roger Pau Monné
    Signed-off-by: David Vrabel

    Julien Grall
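    Splitting a segment into 4KB chunks while skipping empty ones can be sketched as below, assuming a 64KB Linux page and the 4KB PV protocol granularity; `for_each_chunk()`, `chunk_fn` and `max_grants_per_segment()` are hypothetical. Only chunks that actually hold data are visited, so a 200-byte buffer at offset 4000 straddles a 4KB boundary and uses 2 grants, while 512 bytes at offset 4096 use 1; the worst case reserved up front is 16 grants per 64KB segment.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define XEN_CHUNK  4096u            /* PV block protocol granularity */
    #define LINUX_PAGE (64u * 1024u)    /* assumed 64KB guest page */

    typedef void (*chunk_fn)(unsigned int chunk, unsigned int off,
                             unsigned int n, void *arg);

    /* Visit each non-empty 4KB chunk covered by [offset, offset + len) within
     * one Linux page; returns how many grants that data would use. */
    static unsigned int for_each_chunk(unsigned int offset, unsigned int len,
                                       chunk_fn fn, void *arg)
    {
        unsigned int used = 0;

        assert(offset + len <= LINUX_PAGE);
        while (len > 0) {
            unsigned int in_chunk = offset % XEN_CHUNK;
            unsigned int n = XEN_CHUNK - in_chunk;

            if (n > len)
                n = len;
            if (fn)
                fn(offset / XEN_CHUNK, in_chunk, n, arg);
            used++;
            offset += n;
            len -= n;
        }
        return used;
    }

    /* Grants reserved up front per segment, assuming the worst case. */
    static unsigned int max_grants_per_segment(void)
    {
        return LINUX_PAGE / XEN_CHUNK;
    }
    ```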
     
  • Prepare the code to support 64KB page granularity. The first
    implementation will use a full Linux page per indirect and persistent
    grant. When non-persistent grants are used, each page of a bio request
    may be split into multiple grants.

    Furthermore, the page field of the grant structure is only used to copy
    data from persistent or indirect grants. Avoid setting it for other
    use cases, as it has no meaning when the page is split into
    multiple grants.

    Provide two functions: one to set up an indirect grant, the other for a bio page.

    Signed-off-by: Julien Grall
    Acked-by: Roger Pau Monné
    Signed-off-by: David Vrabel

    Julien Grall
     
  • All uses of the pfn field follow the same idiom:

    pfn_to_page(grant->pfn)

    This always returns the same page. Store the page directly in the
    grant to clean up the code.

    Signed-off-by: Julien Grall
    Acked-by: Roger Pau Monné
    Reviewed-by: Stefano Stabellini
    Signed-off-by: David Vrabel

    Julien Grall
     
  • Currently, blkif_queue_request has two distinct execution paths:
    - Send a discard request
    - Send a read/write request

    The function also allocates the grants used to generate the
    request, although they are only used for read/write requests.

    Rather than having a function with two distinct execution paths, split
    the function in two. This also removes one level of indentation.

    Signed-off-by: Julien Grall
    Reviewed-by: Roger Pau Monné
    Signed-off-by: David Vrabel

    Julien Grall
     

08 Oct, 2015

2 commits

  • …git/konrad/xen into for-linus

    Konrad writes:

    Please git pull an update branch to your 'for-4.3/drivers' branch (which
    oddly does not seem to have the previous pull?)

    git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-4.3

    which has two fixes: one where we use the Xen blockfront EFI driver and
    don't release all the requests, the other where, if the allocation of
    resources for a particular state failed, we would go back to 'Closing'
    and assume that a structure had been allocated while in fact it might
    not have been, and crash.

    Jens Axboe
     
  • xen-blkfront will crash if the call to talk_to_blkback()
    in blkback_changed() (XenbusStateInitWait) returns an error.
    The driver data is freed and info is set to NULL. Later, during
    the close process triggered by talk_to_blkback's call to xenbus_dev_fatal(),
    the NULL pointer is passed to and dereferenced in blkfront_closing.

    CC: stable@vger.kernel.org
    Signed-off-by: Cathy Avery
    Signed-off-by: Konrad Rzeszutek Wilk

    Cathy Avery