26 Jan, 2017

1 commit

  • commit ae7871be189cb41184f1e05742b4a99e2c59774d upstream.

    Convert the flag swiotlb_force from an int to an enum, to prepare for
    the advent of more possible values.

    Suggested-by: Konrad Rzeszutek Wilk
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    Geert Uytterhoeven
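
As a rough illustration of the conversion described above, the following userspace sketch replaces an integer flag with an enum. The enumerator names and the helper `parse_swiotlb_option()` are assumptions made for this example; the commit text itself only states that the type changes from int to enum.

```c
#include <assert.h>
#include <string.h>

/* Assumed enumerator names; the commit text only says "int to an enum". */
enum swiotlb_force {
	SWIOTLB_NORMAL,	/* default: bounce only when the device needs it */
	SWIOTLB_FORCE,	/* always bounce (the old "flag != 0" case) */
};

/* Hypothetical helper: map an option string to the enum value. */
enum swiotlb_force parse_swiotlb_option(const char *arg)
{
	if (arg && strcmp(arg, "force") == 0)
		return SWIOTLB_FORCE;
	return SWIOTLB_NORMAL;
}
```

With an enum in place, a later third state can be added as one more enumerator without redefining what the existing integer values mean.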
     

06 Jan, 2017

1 commit

  • commit 30faaafdfa0c754c91bac60f216c9f34a2bfdf7e upstream.

    Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to
    NUMA balancing") set VM_IO flag to prevent grant maps from being
    subjected to NUMA balancing.

    It was discovered recently that this flag causes get_user_pages() to
    always fail with -EFAULT.

    check_vma_flags
    __get_user_pages
    __get_user_pages_locked
    __get_user_pages_unlocked
    get_user_pages_fast
    iov_iter_get_pages
    dio_refill_pages
    do_direct_IO
    do_blockdev_direct_IO
    do_blockdev_direct_IO
    ext4_direct_IO_read
    generic_file_read_iter
    aio_run_iocb

    (which can happen if guest's vdisk has direct-io-safe option).

    To avoid this let's use VM_MIXEDMAP flag instead --- it prevents
    NUMA balancing just as VM_IO does and has no effect on
    check_vma_flags().

    Reported-by: Olaf Hering
    Suggested-by: Hugh Dickins
    Signed-off-by: Boris Ostrovsky
    Acked-by: Hugh Dickins
    Tested-by: Olaf Hering
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Boris Ostrovsky
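
The failure mode described above can be modelled in a few lines of userspace C. This is a simplified sketch of only the relevant test in check_vma_flags(), not the kernel function; the flag values are written from memory and should be treated as illustrative, and `model_check_vma_flags()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values; treat them as assumptions of this sketch. */
#define VM_PFNMAP	0x00000400UL
#define VM_IO		0x00004000UL
#define VM_MIXEDMAP	0x10000000UL

/*
 * Reduced model of the check that matters here: get_user_pages()
 * refuses mappings marked VM_IO or VM_PFNMAP, but VM_MIXEDMAP is
 * not part of that test.
 */
int model_check_vma_flags(unsigned long vm_flags)
{
	if (vm_flags & (VM_IO | VM_PFNMAP))
		return -EFAULT;
	return 0;
}
```

In this model a grant mapping flagged VM_IO is rejected with -EFAULT, while the same mapping flagged VM_MIXEDMAP passes, which is the behaviour the fix relies on.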
     

25 Oct, 2016

1 commit


24 Oct, 2016

3 commits


07 Oct, 2016

1 commit

  • Pull xen updates from David Vrabel:
    "xen features and fixes for 4.9:

    - switch to new CPU hotplug mechanism

    - support driver_override in pciback

    - require vector callback for HVM guests (the alternate mechanism via
    the platform device has been broken for ages)"

    * tag 'for-linus-4.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/x86: Update topology map for PV VCPUs
    xen/x86: Initialize per_cpu(xen_vcpu, 0) a little earlier
    xen/pciback: support driver_override
    xen/pciback: avoid multiple entries in slot list
    xen/pciback: simplify pcistub device handling
    xen: Remove event channel notification through Xen PCI platform device
    xen/events: Convert to hotplug state machine
    xen/x86: Convert to hotplug state machine
    x86/xen: add missing \n at end of printk warning message
    xen/grant-table: Use kmalloc_array() in arch_gnttab_valloc()
    xen: Make VPMU init message look less scary
    xen: rename xen_pmu_init() in sys-hypervisor.c
    hotplug: Prevent alloc/free of irq descriptors during cpu up/down (again)
    xen/x86: Move irq allocation from Xen smp_op.cpu_up()

    Linus Torvalds
     

30 Sep, 2016

5 commits

  • Support the driver_override scheme introduced with commit 782a985d7af2
    ("PCI: Introduce new device binding path using pci_dev.driver_override")

    As pcistub_probe() is called for all devices (it has to check for a
    match based on the slot address rather than device type) it has to
    check for driver_override set to "pciback" itself.

    Up to now for assigning a pci device to pciback you need something like:

    echo 0000:07:10.0 > /sys/bus/pci/devices/0000\:07\:10.0/driver/unbind
    echo 0000:07:10.0 > /sys/bus/pci/drivers/pciback/new_slot
    echo 0000:07:10.0 > /sys/bus/pci/drivers_probe

    while with the patch you can use the same mechanism as for similar
    drivers like pci-stub and vfio-pci:

    echo pciback > /sys/bus/pci/devices/0000\:07\:10.0/driver_override
    echo 0000:07:10.0 > /sys/bus/pci/devices/0000\:07\:10.0/driver/unbind
    echo 0000:07:10.0 > /sys/bus/pci/drivers_probe

    So e.g. libvirt doesn't need special handling for pciback.

    Signed-off-by: Juergen Gross
    Signed-off-by: David Vrabel

    Juergen Gross
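
The matching rule described above (claim the device if its slot was registered, or if driver_override names pciback) can be sketched in userspace C. The struct and function names here are hypothetical stand-ins, not the driver's actual types.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define STUB_DRIVER_NAME "pciback"

/* Hypothetical, reduced stand-in for struct pci_dev. */
struct model_pci_dev {
	const char *slot;		/* e.g. "0000:07:10.0" */
	const char *driver_override;	/* NULL when unset */
};

/* Returns true when the stub driver should claim the device. */
bool model_should_seize(const struct model_pci_dev *dev,
			const char *const *slots, size_t nslots)
{
	size_t i;

	/* New path: the user wrote "pciback" to driver_override. */
	if (dev->driver_override &&
	    strcmp(dev->driver_override, STUB_DRIVER_NAME) == 0)
		return true;

	/* Old path: the slot address was registered via new_slot. */
	for (i = 0; i < nslots; i++)
		if (strcmp(dev->slot, slots[i]) == 0)
			return true;

	return false;
}
```

Because the probe callback sees every PCI device, the override string has to be compared explicitly; a device overridden to some other driver is left alone unless its slot is listed.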
     
  • The Xen pciback driver has a list of all pci devices it is ready to
    seize. There is no check whether a to-be-added entry already exists.
    While this might be no problem in the common case it might confuse
    those which consume the list via sysfs.

    Modify the handling of this list by not adding an entry which already
    exists. As this will be needed later split out the list handling into
    a separate function.

    Signed-off-by: Juergen Gross
    Signed-off-by: David Vrabel

    Juergen Gross
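
A minimal sketch of the "do not add an entry which already exists" rule, using a fixed-size array instead of the driver's linked list. All names and the array-based layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 16

/* Hypothetical, reduced slot identifier (domain:bus:slot.func). */
struct model_slot {
	int domain, bus, slot, func;
};

struct model_slot_list {
	struct model_slot entries[MAX_SLOTS];
	int count;
};

static bool same_slot(const struct model_slot *a, const struct model_slot *b)
{
	return a->domain == b->domain && a->bus == b->bus &&
	       a->slot == b->slot && a->func == b->func;
}

/*
 * Add a slot unless it is already listed. Returns 0 on success or
 * when the entry already exists, -1 when the list is full.
 */
int model_slot_list_add(struct model_slot_list *list, struct model_slot s)
{
	int i;

	for (i = 0; i < list->count; i++)
		if (same_slot(&list->entries[i], &s))
			return 0;	/* already present: nothing to do */

	if (list->count == MAX_SLOTS)
		return -1;

	list->entries[list->count++] = s;
	return 0;
}
```

Keeping the lookup and the insertion in one function is what lets other callers reuse it later, which matches the stated reason for splitting the list handling out.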
     
  • The Xen pciback driver maintains a list of all its seized devices.
    There are two functions searching the list for a specific device with
    basically the same semantics just returning different structures in
    case of a match.

    Split out the search function.

    Signed-off-by: Juergen Gross
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • Ever since commit 254d1a3f02eb ("xen/pv-on-hvm kexec: shutdown watches
    from old kernel") using the INTx interrupt from Xen PCI platform
    device for event channel notification would just lock up the guest
    during bootup. postcore_initcall now calls xs_reset_watches which
    will eventually try to read a value from XenStore and will get stuck
    on read_reply at XenBus forever since the platform driver is not
    probed yet and its INTx interrupt handler is not registered yet. That
    means that the guest cannot be notified at this moment of any pending
    event channels and none of the per-event handlers will ever be invoked
    (including the XenStore one) and the reply will never be picked up by
    the kernel.

    The exact stack where things get stuck during xenbus_init:

    -xenbus_init
    -xs_init
    -xs_reset_watches
    -xenbus_scanf
    -xenbus_read
    -xs_single
    -xs_single
    -xs_talkv

    Vector callbacks have been the favourite event notification mechanism
    since their introduction in commit 38e20b07efd5 ("x86/xen: event
    channels delivery on HVM."), and Xen has advertised the vector
    callback feature for quite some time. That is why INTx could stay
    broken for several years without impacting anyone.

    Luckily this also means that event channel notification through INTx
    is basically dead-code which can be safely removed without impacting
    anybody since it has been effectively disabled for more than 4 years
    with nobody complaining about it (at least as far as I'm aware).

    This commit removes event channel notification through Xen PCI
    platform device.

    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Juergen Gross
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: x86@kernel.org
    Cc: Konrad Rzeszutek Wilk
    Cc: Bjorn Helgaas
    Cc: Stefano Stabellini
    Cc: Julien Grall
    Cc: Vitaly Kuznetsov
    Cc: Paul Gortmaker
    Cc: Ross Lagerwall
    Cc: xen-devel@lists.xenproject.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Cc: Anthony Liguori
    Signed-off-by: KarimAllah Ahmed
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: David Vrabel

    KarimAllah Ahmed
     
  • Install the callbacks via the state machine.

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Boris Ostrovsky
    Signed-off-by: David Vrabel

    Sebastian Andrzej Siewior
     

25 Aug, 2016

2 commits

  • There are two functions with name xen_pmu_init() in the kernel. Rename
    the one in drivers/xen/sys-hypervisor.c to avoid shadowing the one in
    arch/x86/xen/pmu.c

    To avoid the same problem in future rename some more functions.

    Signed-off-by: Juergen Gross
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • This should really only be done for XS_TRANSACTION_END messages, or
    else at least some of the xenstore-* tools don't work anymore.

    Fixes: 0beef634b8 ("xenbus: don't BUG() on user mode induced condition")
    Reported-by: Richard Schütz
    Cc:
    Signed-off-by: Jan Beulich
    Tested-by: Richard Schütz
    Signed-off-by: David Vrabel

    Jan Beulich
     

04 Aug, 2016

1 commit

  • The dma-mapping core and the implementations do not change the DMA
    attributes passed by pointer. Thus the pointer can point to const data.
    However the attributes do not have to be a bitfield. Instead unsigned
    long will do fine:

    1. This is just simpler. Both in terms of reading the code and setting
    attributes. Instead of initializing local attributes on the stack
    and passing pointer to it to dma_set_attr(), just set the bits.

    2. It brings safety and checking for const correctness because the
    attributes are passed by value.

    Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

    and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

    Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
    Signed-off-by: Krzysztof Kozlowski
    Acked-by: Vineet Gupta
    Acked-by: Robin Murphy
    Acked-by: Hans-Christian Noren Egtvedt
    Acked-by: Mark Salter [c6x]
    Acked-by: Jesper Nilsson [cris]
    Acked-by: Daniel Vetter [drm]
    Reviewed-by: Bart Van Assche
    Acked-by: Joerg Roedel [iommu]
    Acked-by: Fabien Dessenne [bdisp]
    Reviewed-by: Marek Szyprowski [vb2-core]
    Acked-by: David Vrabel [xen]
    Acked-by: Konrad Rzeszutek Wilk [xen swiotlb]
    Acked-by: Joerg Roedel [iommu]
    Acked-by: Richard Kuo [hexagon]
    Acked-by: Geert Uytterhoeven [m68k]
    Acked-by: Gerald Schaefer [s390]
    Acked-by: Bjorn Andersson
    Acked-by: Hans-Christian Noren Egtvedt [avr32]
    Acked-by: Vineet Gupta [arc]
    Acked-by: Robin Murphy [arm64 and dma-iommu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Kozlowski
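
To make the "just set the bits" point concrete, here is a small userspace sketch of attributes passed by value as an unsigned long. The attribute names and bit positions are invented for this example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative attribute bits; names and positions are assumptions. */
#define MODEL_ATTR_WRITE_COMBINE	(1UL << 0)
#define MODEL_ATTR_NO_KERNEL_MAPPING	(1UL << 1)

/*
 * Attributes arrive by value, so the callee cannot modify the
 * caller's copy, and "no attributes" is simply 0 instead of NULL.
 */
bool model_wants_write_combine(unsigned long attrs)
{
	return (attrs & MODEL_ATTR_WRITE_COMBINE) != 0;
}

/* Hypothetical caller: builds the attribute word with plain ORs. */
unsigned long model_default_attrs(void)
{
	return MODEL_ATTR_WRITE_COMBINE | MODEL_ATTR_NO_KERNEL_MAPPING;
}
```

Compared with a pointer to a bitfield structure, the caller no longer needs a local object on the stack, and the NULL-to-0 rewrite performed by the semantic patches falls out naturally.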
     

28 Jul, 2016

1 commit

  • Pull xen updates from David Vrabel:
    "Features and fixes for 4.8-rc0:

    - ACPI support for guests on ARM platforms.
    - Generic steal time support for arm and x86.
    - Support cases where kernel cpu is not Xen VCPU number (e.g., if
    in-guest kexec is used).
    - Use the system workqueue instead of a custom workqueue in various
    places"

    * tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits)
    xen: add static initialization of steal_clock op to xen_time_ops
    xen/pvhvm: run xen_vcpu_setup() for the boot CPU
    xen/evtchn: use xen_vcpu_id mapping
    xen/events: fifo: use xen_vcpu_id mapping
    xen/events: use xen_vcpu_id mapping in events_base
    x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
    x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
    xen: introduce xen_vcpu_id mapping
    x86/acpi: store ACPI ids from MADT for future usage
    x86/xen: update cpuid.h from Xen-4.7
    xen/evtchn: add IOCTL_EVTCHN_RESTRICT
    xen-blkback: really don't leak mode property
    xen-blkback: constify instance of "struct attribute_group"
    xen-blkfront: prefer xenbus_scanf() over xenbus_gather()
    xen-blkback: prefer xenbus_scanf() over xenbus_gather()
    xen: support runqueue steal time on xen
    arm/xen: add support for vm_assist hypercall
    xen: update xen headers
    xen-pciback: drop superfluous variables
    xen-pciback: short-circuit read path used for merging write values
    ...

    Linus Torvalds
     

27 Jul, 2016

1 commit

  • I have noticed that frontswap.h first declares "frontswap_enabled" as
    extern bool variable, and then overrides it with "#define
    frontswap_enabled (1)" for CONFIG_FRONTSWAP=Y or (0) when disabled. The
    bool variable isn't actually instantiated anywhere.

    This all looks like an unfinished attempt to make frontswap_enabled
    reflect whether a backend is instantiated. But in the current state,
    all frontswap hooks call unconditionally into frontswap.c just to check
    if frontswap_ops is non-NULL. This should at least be checked inline,
    but we can further eliminate the overhead when CONFIG_FRONTSWAP is
    enabled and no backend registered, using a static key that is initially
    disabled, and gets enabled only upon first backend registration.

    Thus, checks for "frontswap_enabled" are replaced with
    "frontswap_enabled()" wrapping the static key check. There are two
    exceptions:

    - xen's selfballoon_process() was testing frontswap_enabled in code guarded
    by #ifdef CONFIG_FRONTSWAP, which was effectively always true when reachable.
    The patch just removes this check. Using frontswap_enabled() does not sound
    correct here, as this can be true even without xen's own backend being
    registered.

    - in SYSCALL_DEFINE2(swapon), change the check to IS_ENABLED(CONFIG_FRONTSWAP)
    as it seems the bitmap allocation cannot currently be postponed until a
    backend is registered. This means that frontswap will still have some
    memory overhead by being configured, but without a backend.

    After the patch, we can expect that some functions in frontswap.c are
    called only when frontswap_ops is non-NULL. Change the checks there to
    VM_BUG_ONs. While at it, convert other BUG_ONs to VM_BUG_ONs as
    frontswap has been stable for some time.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1463152235-9717-1-git-send-email-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Juergen Gross
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
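
The idea of checking an initially disabled key inline, before calling into the backend code, can be modelled in userspace C with a plain bool standing in for the static key. Every name below is hypothetical; a real static key patches the branch at runtime, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced backend operations table. */
struct model_frontswap_ops {
	int (*store)(unsigned int type, unsigned long offset);
};

/* Stand-in for the static key: false until the first backend registers. */
static bool model_enabled_key;
static const struct model_frontswap_ops *model_ops;
static int model_slow_path_calls;

bool model_frontswap_enabled(void)
{
	return model_enabled_key;
}

void model_register_ops(const struct model_frontswap_ops *ops)
{
	model_ops = ops;
	model_enabled_key = true;	/* flipped on first registration */
}

/* Hook as seen by swap code: cheap check first, backend call second. */
int model_frontswap_store(unsigned int type, unsigned long offset)
{
	if (!model_frontswap_enabled())
		return -1;		/* no backend: skip the slow path */
	model_slow_path_calls++;
	return model_ops->store(type, offset);
}

int model_slow_path_count(void)
{
	return model_slow_path_calls;
}

/* Dummy backend used only to exercise the sketch. */
int model_dummy_store(unsigned int type, unsigned long offset)
{
	(void)type;
	(void)offset;
	return 0;
}

const struct model_frontswap_ops model_dummy_ops = { model_dummy_store };
```

Until `model_register_ops()` runs, the hook returns without touching the backend pointer, which mirrors the overhead the commit aims to eliminate when frontswap is configured but unused.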
     

26 Jul, 2016

1 commit


25 Jul, 2016

5 commits

  • Use the newly introduced xen_vcpu_id mapping to get Xen's idea of vCPU
    id for CPU0.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: David Vrabel

    Vitaly Kuznetsov
     
  • EVTCHNOP_init_control has vCPU id as a parameter and Xen's idea of
    vCPU id should be used. Use the newly introduced xen_vcpu_id mapping
    to convert it from Linux's id.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: David Vrabel

    Vitaly Kuznetsov
     
  • EVTCHNOP_bind_ipi and EVTCHNOP_bind_virq pass vCPU id as a parameter
    and Xen's idea of vCPU id should be used. Use the newly introduced
    xen_vcpu_id mapping to convert it from Linux's id.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: David Vrabel

    Vitaly Kuznetsov
     
  • HYPERVISOR_vcpu_op() passes Linux's idea of vCPU id as a parameter
    while Xen's idea is expected. In some cases these ideas diverge so we
    need to do remapping.

    Convert all callers of HYPERVISOR_vcpu_op() to use xen_vcpu_nr().

    Leave xen_fill_possible_map() and xen_filter_cpu_maps() intact as
    they're only being called by PV guests before percpu areas are
    initialized. While the issue could be solved by switching to
    early_percpu for xen_vcpu_id I think it's not worth it: PV guests will
    probably never get to the point where their idea of vCPU id diverges
    from Xen's.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: David Vrabel

    Vitaly Kuznetsov
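
The remapping described above amounts to a lookup from the Linux CPU number to the vCPU id Xen expects, applied just before each hypercall. The sketch below models that in userspace; the array, the stub hypercall and the command constant are all assumptions, and only the role of xen_vcpu_nr() is taken from the commit text.

```c
#include <assert.h>

#define MODEL_NR_CPUS	8
#define MODEL_VCPUOP_UP	3	/* illustrative command number */

/*
 * Stand-in for the per-cpu xen_vcpu_id variable: the index is the
 * Linux CPU number, the value is the vCPU id Xen uses.
 */
static unsigned int model_xen_vcpu_id[MODEL_NR_CPUS];

void model_set_vcpu_id(int cpu, unsigned int id)
{
	model_xen_vcpu_id[cpu] = id;
}

/* Same role as xen_vcpu_nr(): translate before talking to Xen. */
unsigned int model_xen_vcpu_nr(int cpu)
{
	return model_xen_vcpu_id[cpu];
}

/* Hypothetical hypercall stub that records what it was given. */
int model_last_cmd;
unsigned int model_last_vcpuid;

int model_vcpu_op(int cmd, unsigned int vcpuid)
{
	model_last_cmd = cmd;
	model_last_vcpuid = vcpuid;
	return 0;
}

/* Caller pattern after the conversion. */
int model_bring_up(int cpu)
{
	return model_vcpu_op(MODEL_VCPUOP_UP, model_xen_vcpu_nr(cpu));
}
```

If Linux CPU 1 corresponds to Xen vCPU 2 (for instance after an in-guest kexec), `model_bring_up(1)` passes 2 to the stub rather than 1.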
     
  • IOCTL_EVTCHN_RESTRICT limits the file descriptor to being able to bind
    to interdomain event channels from a specific domain. Event channels
    that are already bound continue to work for sending and receiving
    notifications.

    This is useful as part of deprivileging a user space PV backend or
    device model (QEMU). E.g., once the device model has bound to the
    ioreq server event channels it can restrict the file handle so an
    exploited DM cannot use it to create or bind to arbitrary event
    channels.

    Signed-off-by: David Vrabel
    Reviewed-by: Boris Ostrovsky

    David Vrabel
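
The restriction semantics can be sketched as a small state machine on the file handle. This is a userspace model with hypothetical names; the sentinel value, the error code, and the rule that a handle can be restricted only once are assumptions of the sketch, not statements from the commit text.

```c
#include <assert.h>
#include <errno.h>

typedef unsigned short model_domid_t;

/* Assumed sentinel meaning "no restriction applied yet". */
#define MODEL_UNRESTRICTED ((model_domid_t)-1)

/* Hypothetical per-open-file state. */
struct model_evtchn_user {
	model_domid_t restrict_domid;
};

void model_open(struct model_evtchn_user *u)
{
	u->restrict_domid = MODEL_UNRESTRICTED;
}

/* Models the restrict ioctl. Assumption: it can be applied only once. */
int model_restrict(struct model_evtchn_user *u, model_domid_t domid)
{
	if (u->restrict_domid != MODEL_UNRESTRICTED)
		return -EACCES;
	u->restrict_domid = domid;
	return 0;
}

/*
 * Models a new interdomain bind: allowed when the handle is
 * unrestricted or when the remote domain matches the restriction.
 * Existing bindings are not touched by this check.
 */
int model_bind_interdomain(const struct model_evtchn_user *u,
			   model_domid_t remote)
{
	if (u->restrict_domid != MODEL_UNRESTRICTED &&
	    u->restrict_domid != remote)
		return -EACCES;
	return 0;
}
```

A device model would bind the channels it needs first and then restrict the handle, so that a later compromise cannot bind to channels of any other domain.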
     

08 Jul, 2016

3 commits

  • As of Xen 4.7 PV CPUID doesn't expose either of CPUID[1].ECX[7] and
    CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on
    both Intel and AMD systems. Doing any kind of hardware capability
    checks in the driver as a prerequisite was wrong anyway: With the
    hypervisor being in charge, all such checking should be done by it. If
    ACPI data gets uploaded despite some missing capability, the hypervisor
    is free to ignore part or all of that data.

    Ditch the entire check_prereq() function, and do the only valid check
    (xen_initial_domain()) in the caller in its place.

    Signed-off-by: Jan Beulich
    Cc:
    Signed-off-by: David Vrabel

    Jan Beulich
     
  • No need to retain a local copy of the full request message, only the
    type is really needed.

    Signed-off-by: Jan Beulich
    Signed-off-by: David Vrabel

    Jan Beulich
     
  • xenbus_dev_request_and_reply() needs to track whether a transaction is
    open. For XS_TRANSACTION_START messages it calls transaction_start()
    and for XS_TRANSACTION_END messages it calls transaction_end().

    If sending an XS_TRANSACTION_START message fails or responds with an
    error, the transaction is not open and transaction_end() must be
    called.

    If sending an XS_TRANSACTION_END message fails, the transaction is
    still open, but if an error response is returned the transaction is
    closed.

    Commit 027bd7e89906 ("xen/xenbus: Avoid synchronous wait on XenBus
    stalling shutdown/restart") introduced a regression where failed
    XS_TRANSACTION_START messages were leaving the transaction open. This
    can cause problems with suspend (and migration) as all transactions
    must be closed before suspending.

    It appears that the problematic change was added accidentally, so just
    remove it.

    Signed-off-by: Jan Beulich
    Cc: Konrad Rzeszutek Wilk
    Cc:
    Signed-off-by: David Vrabel

    Jan Beulich
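
The bookkeeping rules stated above can be written down as a pure function over a count of open transactions. This is a userspace sketch with hypothetical names, intended only to make the four cases explicit; it is not the xenbus code.

```c
#include <assert.h>

enum model_msg_type {
	MODEL_TRANSACTION_START,
	MODEL_TRANSACTION_END,
	MODEL_OTHER,
};

/* Outcome of one request, as seen by the caller. */
enum model_outcome {
	MODEL_REPLY_OK,		/* sent, normal reply */
	MODEL_REPLY_ERROR,	/* sent, error response */
	MODEL_SEND_FAILED,	/* the request was never delivered */
};

/* Returns the new count of open transactions. */
int model_track(int open, enum model_msg_type type, enum model_outcome out)
{
	/* Mirrors transaction_start(): count before sending. */
	if (type == MODEL_TRANSACTION_START)
		open++;

	/* A START that failed to send or got an error opened nothing. */
	if (type == MODEL_TRANSACTION_START && out != MODEL_REPLY_OK)
		open--;

	/*
	 * An END closes the transaction once any reply arrives, even
	 * an error; a failed send leaves it open.
	 */
	if (type == MODEL_TRANSACTION_END && out != MODEL_SEND_FAILED)
		open--;

	return open;
}
```

The regression described above corresponds to dropping the second branch: a failed START would then leave the count at 1 and block suspend.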
     

07 Jul, 2016

1 commit

  • Inability to locate a user mode specified transaction ID should not
    lead to a kernel crash. For other than XS_TRANSACTION_START also
    don't issue anything to xenbus if the specified ID doesn't match that
    of any active transaction.

    Signed-off-by: Jan Beulich
    Cc:
    Signed-off-by: David Vrabel

    Jan Beulich
     

06 Jul, 2016

13 commits

  • Up to now reading the stolen time of a remote cpu was not possible in a
    performant way under Xen. This made support of runqueue steal time via
    paravirt_steal_rq_enabled impossible.

    With the addition of an appropriate hypervisor interface this is now
    possible, so add the support.

    Signed-off-by: Juergen Gross
    Reviewed-by: Stefano Stabellini
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • req_start is simply an alias of the "offset" function parameter, and
    req_end is being used just once in each function. (And both variables
    were loop invariant anyway, so should at least have got initialized
    outside the loop.)

    Signed-off-by: Jan Beulich
    Signed-off-by: David Vrabel

    Jan Beulich
     
  • There's no point calling xen_pcibk_config_read() here - all it'll do is
    return whatever conf_space_read() returns for the field which was found
    here (and which would be found there again). Also there's no point
    clearing tmp_val before the call.

    Signed-off-by: Jan Beulich
    Signed-off-by: David Vrabel

    Jan Beulich
     
  • Signed-off-by: Jan Beulich
    Signed-off-by: David Vrabel

    Jan Beulich
     
  • Other than for raw BAR values, flags are properly separated in the
    internal representation.

    Signed-off-by: Jan Beulich
    Signed-off-by: David Vrabel

    Jan Beulich
     
  • Signed-off-by: Jan Beulich
    Signed-off-by: David Vrabel

    Jan Beulich
     
  • It is now identical to bar_init().

    Signed-off-by: Jan Beulich
    Signed-off-by: David Vrabel

    Jan Beulich
     
  • Signed-off-by: Jan Beulich
    Signed-off-by: David Vrabel

    Jan Beulich
     
  • System workqueues have been able to handle high level of concurrency
    for a long time now and there's no reason to use dedicated workqueues
    just to gain concurrency. Replace dedicated xenbus_frontend_wq with the
    use of system_wq.

    Unlike a dedicated per-cpu workqueue created with create_workqueue(),
    system_wq allows multiple work items to overlap executions even on
    the same CPU; however, a per-cpu workqueue doesn't have any CPU
    locality or global ordering guarantees unless the target CPU is
    explicitly specified and the increase of local concurrency shouldn't
    make any difference.

    In this case, there is only a single work item, so the increase of
    concurrency level from switching to system_wq should not make any
    difference.

    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Signed-off-by: David Vrabel

    Bhaktipriya Shridhar
     
  • System workqueues have been able to handle high level of concurrency
    for a long time now and there's no reason to use dedicated workqueues
    just to gain concurrency. Replace dedicated xen_pcibk_wq with the
    use of system_wq.

    Unlike a dedicated per-cpu workqueue created with create_workqueue(),
    system_wq allows multiple work items to overlap executions even on
    the same CPU; however, a per-cpu workqueue doesn't have any CPU
    locality or global ordering guarantees unless the target CPU is
    explicitly specified and thus the increase of local concurrency shouldn't
    make any difference.

    Since the work items could be pending, flush_work() has been used in
    xen_pcibk_disconnect(). xen_pcibk_xenbus_remove() calls free_pdev()
    which in turn calls xen_pcibk_disconnect() for every pdev to ensure that
    there is no pending task while disconnecting the driver.

    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Signed-off-by: David Vrabel

    Bhaktipriya Shridhar
     
  • The pv_time_ops structure contains a function pointer for the
    "steal_clock" functionality used only by KVM and Xen on ARM. Xen on x86
    uses its own mechanism to account for the "stolen" time a thread wasn't
    able to run due to hypervisor scheduling.

    Add support in Xen arch independent time handling for this feature by
    moving it out of the arm arch into drivers/xen and remove the x86 Xen
    hack.

    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Reviewed-by: Stefano Stabellini
    Signed-off-by: David Vrabel

    Juergen Gross
     
  • Replace explicit computation of vma page count by a call to
    vma_pages().

    Signed-off-by: Muhammad Falak R Wani
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: David Vrabel

    Muhammad Falak R Wani
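
For reference, the computation being replaced is the usual "length of the mapping in pages" arithmetic. The sketch below reproduces it in userspace with a hypothetical struct and a page shift of 12, which assumes 4 KiB pages.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Hypothetical, reduced stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* first byte after the mapping */
};

/* Same arithmetic a vma_pages()-style helper performs. */
unsigned long model_vma_pages(const struct model_vma *vma)
{
	return (vma->vm_end - vma->vm_start) >> MODEL_PAGE_SHIFT;
}
```

Using one helper instead of repeating the shift at each call site keeps the page-count computation in a single place.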
     
  • When running on Xen hypervisor, runtime services are supported through
    hypercall. Add a Xen specific function to initialize runtime services.

    Signed-off-by: Shannon Zhao
    Reviewed-by: Stefano Stabellini
    Tested-by: Julien Grall
    Acked-by: Catalin Marinas

    Shannon Zhao