13 Jun, 2016

1 commit

  • The Hyper-V Linux Integration Services use the VMBus implementation for
    communication with the Hypervisor. VMBus registers its own interrupt
    handler that completely bypasses the common Linux interrupt handling.
    This implies that the interrupt entropy collector is not triggered.

    This patch adds the interrupt entropy collection callback into the VMBus
    interrupt handler function.

    Cc: stable@kernel.org
    Signed-off-by: Stephan Mueller
    Signed-off-by: Theodore Ts'o

    Stephan Mueller
     

02 May, 2016

4 commits

  • We set host_specified_ha_region = true on certain requests, but this is a
    global state which stays 'true' forever. We need to reset it when we
    receive a request where ha_region is not specified. I did not see any
    real issues; the bug was found by code inspection.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • When we iterate through all HA regions in handle_pg_range() we assume
    that all these regions are sorted in the list and that the
    'start_pfn >= has->end_pfn' check is enough to find the proper region.
    Unfortunately that's not the case with WS2016, where the host can hot-add
    regions in a different order. We end up modifying the wrong HA region and
    crashing later when pages are onlined. Modify the check to make sure we
    found the region we were searching for while iterating. Fix the same check
    in pfn_covered() as well.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
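
The containment check described above can be sketched in userspace C. This is a minimal illustration, not the driver's code: `struct ha_region_sketch`, `region_covers()` and `find_region()` are hypothetical names, and the half-open range [start_pfn, end_pfn) is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's hot-add region state. */
struct ha_region_sketch {
    unsigned long start_pfn;          /* first page frame number in the region */
    unsigned long end_pfn;            /* one past the last page frame number */
    struct ha_region_sketch *next;    /* regions are linked in arrival order */
};

/* True only when pfn lies inside [start_pfn, end_pfn). */
bool region_covers(const struct ha_region_sketch *r, unsigned long pfn)
{
    return pfn >= r->start_pfn && pfn < r->end_pfn;
}

/* Walk the whole list; do not assume it is sorted by start_pfn. */
const struct ha_region_sketch *find_region(const struct ha_region_sketch *head,
                                           unsigned long pfn)
{
    for (const struct ha_region_sketch *r = head; r != NULL; r = r->next) {
        if (region_covers(r, pfn))
            return r;
    }
    return NULL;
}
```

Checking both bounds means a region that was added out of order is skipped instead of being mistaken for the target.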
     
  • Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
    delivered to the CPU which was used for initial contact or to CPU0
    depending on host version. vmbus_wait_for_unload() doesn't account for
    the fact that in case we're crashing on some other CPU we won't get the
    CHANNELMSG_UNLOAD_RESPONSE message and our wait on the current CPU will
    never end.

    Do the following:
    1) Check for completion_done() in the loop. In case the interrupt handler
    is still alive we'll get the confirmation we need.

    2) Read the message pages of all CPUs, as we're unsure where
    CHANNELMSG_UNLOAD_RESPONSE is going to be delivered. We can race with a
    still-alive interrupt handler doing the same, so add cmpxchg() to
    vmbus_signal_eom() to not lose the CHANNELMSG_UNLOAD_RESPONSE message.

    3) Clean up message pages on all CPUs. This is required (at least for the
    current CPU as we're clearing CPU0 messages now but we may want to bring
    up additional CPUs on crash) as new messages won't be delivered till we
    consume what's pending. On boot we'll place message pages somewhere else
    and we won't be able to read stale messages.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
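
Steps 2 and 3 above can be illustrated with a small userspace sketch using C11 atomics in place of the kernel's cmpxchg(). Everything here is an assumption for illustration: the per-CPU message page is reduced to a single atomic word, and `NCPUS_SKETCH`, the `MSG_*` values, `signal_eom_sketch()` and `poll_unload_response()` are hypothetical names, not the driver's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NCPUS_SKETCH 4            /* hypothetical CPU count for the sketch */
#define MSG_NONE 0u               /* slot is empty */
#define MSG_UNLOAD_RESPONSE 17u   /* illustrative value, not taken from the log */

/* One message slot per CPU; stands in for the per-CPU message page. */
static _Atomic unsigned int msg_slot[NCPUS_SKETCH];

/*
 * Clear a slot only if it still holds the message type we read. If a racing
 * reader already consumed it (and something else arrived), leave it alone.
 * Returns true when this caller did the clearing.
 */
bool signal_eom_sketch(int cpu, unsigned int seen)
{
    return atomic_compare_exchange_strong(&msg_slot[cpu], &seen, MSG_NONE);
}

/* Scan every CPU's slot once; report whether the unload response was seen. */
bool poll_unload_response(void)
{
    bool found = false;

    for (int cpu = 0; cpu < NCPUS_SKETCH; cpu++) {
        unsigned int type = atomic_load(&msg_slot[cpu]);

        if (type == MSG_NONE)
            continue;
        if (type == MSG_UNLOAD_RESPONSE)
            found = true;
        /* Consume the slot so that new messages can be delivered. */
        signal_eom_sketch(cpu, type);
    }
    return found;
}
```

A caller would invoke `poll_unload_response()` repeatedly in a busy-wait loop, also leaving the loop when the completion has already been signalled by a still-running interrupt handler (step 1).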
     
  • Hyper-V VMs can be replicated to other hosts, and there is a feature to
    set a different IP for replicas called 'Failover TCP/IP'. When such a
    guest starts, the Hyper-V host sends it a KVP_OP_SET_IP_INFO message as
    soon as we finish the negotiation procedure. The problem is that this can
    happen (and it actually happens) before the userspace daemon connects, and
    we reply with HV_E_FAIL to the message. As there are no retries, we fail
    to set the requested IP.

    Solve the issue by postponing our reply to the negotiation message till
    the userspace daemon is connected. We can't wait too long as there is a
    host-side timeout (approx. 75 seconds) and if we fail to reply within this
    time frame the whole KVP service will become inactive. The solution is not
    ideal: if it takes the userspace daemon more than 60 seconds to connect,
    IP Failover will still fail, but I don't see a solution with our current
    separation between kernel and userspace parts.

    The other two modules (VSS and FCOPY) don't require such a delay; leave
    them untouched.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     

01 May, 2016

13 commits

  • Simplify the logic that picks MMIO ranges by pulling out the
    logic related to trying to lay frame buffer claim on top of where
    the firmware placed the frame buffer.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     
  • Later in the boot sequence, we need to figure out which memory
    ranges can be given out to various paravirtual drivers. The
    hyperv_fb driver should, ideally, be placed right on top of
    the frame buffer, without some other device getting plopped on
    top of this range in the meantime. Recording this now allows
    that to be guaranteed.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     
  • This patch changes vmbus_allocate_mmio() and vmbus_free_mmio() so
    that when child paravirtual devices allocate memory-mapped I/O
    space, they allocate it privately from a resource tree pointed
    at by hyperv_mmio and also by the public resource tree
    iomem_resource. This allows the region to be marked as "busy"
    in the private tree, but a "bridge window" in the public tree,
    guaranteeing that no two bridge windows will overlap each other
    while also allowing the PCI device children of the bridge
    windows to overlap that window.

    One might conclude that this belongs in the pnp layer, rather
    than in this driver. Rafael Wysocki, the maintainer of the
    pnp layer, has previously asked that we not modify the pnp layer
    as it is considered deprecated. This patch is thus essentially
    a workaround.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     
  • A patch later in this series allocates child nodes
    in this resource tree. For that to work, this tree
    needs to be sorted in ascending order.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     
  • This patch introduces a function that reverses everything
    done by vmbus_allocate_mmio(). Existing code just called
    release_mem_region(). Future patches in this series
    require a more complex sequence of actions, so this function
    is introduced to wrap those actions.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     
  • In existing code, this tree of resources is created
    in single-threaded code and never modified after it is
    created, and thus needs no locking. This patch introduces
    a semaphore for tree access, as other patches in this
    series introduce run-time modifications of this resource
    tree which can happen on multiple threads.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     
  • Implement APIs for in-place consumption of vmbus packets. Currently, each
    packet is copied and processed one at a time and as part of processing
    each packet we potentially may signal the host (if it is waiting for
    room to produce a packet).

    These APIs help batched in-place processing of vmbus packets.
    We also optimize host signaling by having a separate API to signal
    the end of in-place consumption. With netvsc using these APIs,
    on an iperf run on average I see about 20X reduction in checks to
    signal the host.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • In preparation for implementing APIs for in-place consumption of VMBUS
    packets, move some ring buffer functionality into hyperv.h.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • In preparation for moving some ring buffer functionality out of the
    vmbus driver, export the API for signaling the host.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • Use the virt_xx barriers that have been defined for use in virtual machines.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • Use the READ_ONCE macro to access variables that can change asynchronously.
    This is the recommended mechanism for dealing with "unsafe" compiler
    optimizations.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
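
As a rough userspace approximation of the idea (not the kernel's definition), a READ_ONCE-style access for scalar types can be written with a volatile cast. `READ_ONCE_SKETCH`, `struct shared_ring_sketch` and `snapshot_write_index()` are hypothetical names, and `__typeof__` is a GNU C extension assumed to be available.

```c
#include <stdint.h>

/*
 * Approximation for scalar types only: the volatile-qualified access makes
 * the compiler emit one real load each time the macro is used, instead of
 * reusing a value cached in a register or merging repeated loads.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical shared structure that another agent (e.g. the host) updates. */
struct shared_ring_sketch {
    uint32_t write_index;
};

/* Take a single snapshot of the index and work with the local copy. */
uint32_t snapshot_write_index(struct shared_ring_sketch *r)
{
    return READ_ONCE_SKETCH(r->write_index);
}
```

Taking one snapshot and then using only the local copy keeps later checks and computations consistent with each other even if the shared value changes in between.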
     
  • Introduce separate functions for estimating how much can be read from
    and written to the ring buffer.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
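
One plausible shape for such helpers, sketched in userspace C under the assumption of a circular buffer described by a read index, a write index and a data size. The struct and function names are hypothetical and the driver's actual definitions may differ.

```c
#include <stdint.h>

/* Hypothetical, simplified ring descriptor. */
struct ring_sketch {
    uint32_t read_index;    /* next byte the consumer will read */
    uint32_t write_index;   /* next byte the producer will write */
    uint32_t data_size;     /* size of the data area in bytes */
};

/* Bytes the consumer can read right now. */
uint32_t bytes_to_read(const struct ring_sketch *r)
{
    return r->write_index >= r->read_index
        ? r->write_index - r->read_index
        : (r->data_size - r->read_index) + r->write_index;
}

/* Bytes the producer can write right now. */
uint32_t bytes_to_write(const struct ring_sketch *r)
{
    return r->write_index >= r->read_index
        ? r->data_size - (r->write_index - r->read_index)
        : r->read_index - r->write_index;
}
```

In this sketch the two results always add up to `data_size`; keeping them as separate functions lets a reader or writer compute only the quantity it needs.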
     
  • On the consumer side, we have interrupt driven flow management of the
    producer. It is sufficient to base the signaling decision on the
    amount of space that is available to write after the read is complete.
    The current code samples the previous available space and uses this
    in making the signaling decision. This state can be stale and is
    unnecessary. Since the state can be stale, we end up not signaling
    the host (when we should) and this can result in a hang. Fix this
    problem by removing the unnecessary check. I would like to thank
    Arseney Romanenko for pointing out this issue.

    Also, issue a full memory barrier before making the signaling decision
    to correctly deal with potential reordering of the write (read index)
    followed by the read of pending_sz.

    Signed-off-by: K. Y. Srinivasan
    Tested-by: Dexuan Cui
    Cc:
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
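
The decision described above can be sketched in userspace C, with a C11 fence standing in for the kernel's full memory barrier. The struct layout, the function names, and the early return when `pending_sz` is zero are assumptions made for the illustration, not the driver's code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the shared ring state. */
struct ring_state_sketch {
    uint32_t read_index;
    uint32_t write_index;
    uint32_t data_size;
    uint32_t pending_sz;   /* bytes the producer is waiting for; 0 if not blocked */
};

/* Space currently available to the producer. */
uint32_t free_space_sketch(const struct ring_state_sketch *r)
{
    return r->write_index >= r->read_index
        ? r->data_size - (r->write_index - r->read_index)
        : r->read_index - r->write_index;
}

/* Call only after the consumer has stored its new read_index. */
bool need_to_signal_on_read(const struct ring_state_sketch *r)
{
    /* Full fence: order the read_index store before the pending_sz load. */
    atomic_thread_fence(memory_order_seq_cst);

    /* Volatile access forces a fresh load of the shared field. */
    uint32_t pending = *(const volatile uint32_t *)&r->pending_sz;

    if (pending == 0)
        return false;               /* assumption: producer is not blocked */

    /* Decide from the space available now, not from a sample taken earlier. */
    return free_space_sketch(r) >= pending;
}
```

Because the decision uses only the state after the read, there is no earlier sample that can go stale and suppress a signal the host is waiting for.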
     

18 Mar, 2016

1 commit

  • Pull char/misc updates from Greg KH:
    "Here is the big char/misc driver update for 4.6-rc1.

    The majority of the patches here is hwtracing and some new mic
    drivers, but there's a lot of other driver updates as well. Full
    details in the shortlog.

    All have been in linux-next for a while with no reported issues"

    * tag 'char-misc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (238 commits)
    goldfish: Fix build error of missing ioremap on UM
    nvmem: mediatek: Fix later provider initialization
    nvmem: imx-ocotp: Fix return value of imx_ocotp_read
    nvmem: Fix dependencies for !HAS_IOMEM archs
    char: genrtc: replace blacklist with whitelist
    drivers/hwtracing: make coresight-etm-perf.c explicitly non-modular
    drivers: char: mem: fix IS_ERROR_VALUE usage
    char: xillybus: Fix internal data structure initialization
    pch_phub: return -ENODATA if ROM can't be mapped
    Drivers: hv: vmbus: Support kexec on ws2012 r2 and above
    Drivers: hv: vmbus: Support handling messages on multiple CPUs
    Drivers: hv: utils: Remove util transport handler from list if registration fails
    Drivers: hv: util: Pass the channel information during the init call
    Drivers: hv: vmbus: avoid unneeded compiler optimizations in vmbus_wait_for_unload()
    Drivers: hv: vmbus: remove code duplication in message handling
    Drivers: hv: vmbus: avoid wait_for_completion() on crash
    Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages
    misc: at24: replace memory_accessor with nvmem_device_read
    eeprom: 93xx46: extend driver to plug into the NVMEM framework
    eeprom: at25: extend driver to plug into the NVMEM framework
    ...

    Linus Torvalds
     

02 Mar, 2016

8 commits

  • WS2012 R2 and above hosts can support kexec in that they can support
    reconnecting to the host (as would be needed in the kexec path)
    on any CPU. Enable this. Pre-WS2012 R2 hosts don't have this ability
    and consequently cannot support kexec.

    Signed-off-by: Alex Ng
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Alex Ng
     
  • Starting with Windows 2012 R2, message interrupts can be delivered
    on any VCPU in the guest. Support this functionality.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • If util transport fails to initialize for any reason, the list of transport
    handlers may become corrupted due to freeing the transport handler without
    removing it from the list. Fix this by cleaning it up from the list.

    Signed-off-by: Alex Ng
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Alex Ng
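
The unlink-before-free pattern behind this fix can be shown with a minimal userspace sketch. The list helpers and `transport_init_sketch()` are hypothetical, and `reg_status` simply stands in for the result of whatever registration step may fail.

```c
#include <stdlib.h>

/* Hypothetical transport handler embedded in a circular doubly linked list. */
struct transport_sketch {
    struct transport_sketch *prev, *next;
};

/* Initialise an empty list: the sentinel head points at itself. */
void tlist_init(struct transport_sketch *head)
{
    head->prev = head->next = head;
}

/* Insert t right after the sentinel head. */
void tlist_add(struct transport_sketch *head, struct transport_sketch *t)
{
    t->next = head->next;
    t->prev = head;
    head->next->prev = t;
    head->next = t;
}

/* Unlink t from whatever list it is on. */
void tlist_del(struct transport_sketch *t)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->prev = t->next = NULL;
}

/* Allocate a handler, add it to the list, and undo both steps on failure. */
struct transport_sketch *transport_init_sketch(struct transport_sketch *head,
                                               int reg_status)
{
    struct transport_sketch *t = calloc(1, sizeof(*t));

    if (t == NULL)
        return NULL;
    tlist_add(head, t);

    if (reg_status != 0) {
        tlist_del(t);   /* unlink first, so no list node points at freed memory */
        free(t);
        return NULL;
    }
    return t;
}
```

Skipping the `tlist_del()` call on the error path would leave the list's neighbours pointing at freed memory, which is the kind of corruption the commit message describes.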
     
  • Pass the channel information to the util drivers that need to defer
    reading the channel while they are processing a request. This would address
    the following issue reported by Vitaly:

    Commit 3cace4a61610 ("Drivers: hv: utils: run polling callback always in
    interrupt context") removed direct *_transaction.state = HVUTIL_READY
    assignments from *_handle_handshake() functions introducing the following
    race: if a userspace daemon connects before we get first non-negotiation
    request from the server hv_poll_channel() won't set transaction state to
    HVUTIL_READY as (!channel) condition will fail, we set it to non-NULL on
    the first real request from the server.

    Signed-off-by: K. Y. Srinivasan
    Reported-by: Vitaly Kuznetsov
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • The message header is modified by the hypervisor and we read it in a loop,
    so we need to prevent compilers from optimizing the accesses. There are no
    such optimizations at this moment; this is just future-proofing.

    Suggested-by: Radim Krcmar
    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Radim Krčmář
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • We have 3 functions dealing with messages and they all implement
    the same logic to finalize reads, move it to vmbus_signal_eom().

    Suggested-by: Radim Krcmar
    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Radim Krčmář
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • wait_for_completion() may sleep, it enables interrupts and this
    is something we really want to avoid on crashes because interrupt
    handlers can cause other crashes. Switch to the recently introduced
    vmbus_wait_for_unload() doing busy wait instead.

    Reported-by: Radim Krcmar
    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Radim Krčmář
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • We must handle HVMSG_TIMER_EXPIRED messages in the interrupt context
    and we offload all the rest to the vmbus_on_msg_dpc() tasklet. This
    function loops to see if there are new messages pending. If we ever see
    an HVMSG_TIMER_EXPIRED message there, we're going to lose it as we can't
    handle it from there. Avoid looping in vmbus_on_msg_dpc(); we're OK
    with handling one message per interrupt.

    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Radim Krčmář
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     

17 Feb, 2016

1 commit

  • VMBus hypercall codes inside Hyper-V UAPI header will
    be used by QEMU to implement VMBus host devices support.

    Signed-off-by: Andrey Smetanin
    Acked-by: K. Y. Srinivasan
    Reviewed-by: Roman Kagan
    CC: Gleb Natapov
    CC: Paolo Bonzini
    CC: Joerg Roedel
    CC: "K. Y. Srinivasan"
    CC: Haiyang Zhang
    CC: Roman Kagan
    CC: Denis V. Lunev
    CC: qemu-devel@nongnu.org
    [Do not rename the constant at the same time as moving it, as that
    would cause semantic conflicts with the Hyper-V tree. - Paolo]
    Signed-off-by: Paolo Bonzini

    Andrey Smetanin
     

08 Feb, 2016

12 commits