06 Aug, 2015

18 commits

  • IRQ_DOMAIN is a hidden config option, so depending on it doesn't
    make any sense. Select the config option because it's required to
    compile this driver.

    Signed-off-by: Stephen Boyd
    Reviewed-by: Andy Gross
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
     
  • Reviewed-by: Andy Gross
    Signed-off-by: Courtney Cavin
    Signed-off-by: Bjorn Andersson
    Tested-by: Tim Bird
    Signed-off-by: Greg Kroah-Hartman

    Courtney Cavin
     
  • Add tracepoints to retrieve information about read, write
    and non-data commands. For performance measurement support
    tracepoints are added at the beginning and at the end of
    transfers. Following is a list showing the new tracepoint
    events. The "cmd" parameter here represents the opcode, SID,
    and full 16-bit address.

    spmi_write_begin: cmd and data buffer.
    spmi_write_end : cmd and return value.
    spmi_read_begin : cmd.
    spmi_read_end : cmd, return value and data buffer.
    spmi_cmd : cmd.

    The reason that cmd appears at both the beginning and at
    the end event is that SPMI drivers can request commands
    concurrently. cmd helps in matching the corresponding
    events.

    SPMI tracepoints can be enabled like:

    echo 1 >/sys/kernel/debug/tracing/events/spmi/enable

    and will dump messages that can be viewed in
    /sys/kernel/debug/tracing/trace that look like:

    ... spmi_read_begin: opc=56 sid=00 addr=0x0000
    ... spmi_read_end: opc=56 sid=00 addr=0x0000 ret=0 len=02 buf=0x[01-40]
    ... spmi_write_begin: opc=48 sid=00 addr=0x0000 len=3 buf=0x[ff-ff-ff]

    Suggested-by: Sagar Dharia
    Acked-by: Steven Rostedt
    Reviewed-by: Stephen Boyd
    Signed-off-by: Gilad Avidov
    Signed-off-by: Ankit Gupta
    Signed-off-by: Greg Kroah-Hartman

    Ankit Gupta
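
The matching of begin and end events by cmd, as described above, can be sketched in a small userspace script. This is an illustration only: the field layout is inferred from the three sample trace lines in the message, and the helper names `parse_spmi_line` and `pair_transfers` are hypothetical, not part of any kernel tool.

```python
import re

# Field layout inferred from the sample trace lines; other layouts are not handled.
EVENT_RE = re.compile(
    r"(?P<event>spmi_\w+): opc=(?P<opc>\d+) sid=(?P<sid>\d+) "
    r"addr=(?P<addr>0x[0-9a-fA-F]+)"
    r"(?: ret=(?P<ret>-?\d+))?"
    r"(?: len=(?P<len>\d+) buf=0x\[(?P<buf>[0-9a-fA-F-]*)\])?"
)


def parse_spmi_line(line):
    """Return a dict for one trace line, or None if it is not an SPMI event."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    f = m.groupdict()
    return {
        "event": f["event"],
        # "cmd" is the (opcode, SID, address) triple, kept as printed.
        "cmd": (f["opc"], f["sid"], f["addr"]),
        "ret": int(f["ret"]) if f["ret"] is not None else None,
        "buf": bytes.fromhex(f["buf"].replace("-", "")) if f["buf"] is not None else None,
    }


def pair_transfers(lines):
    """Pair each *_end event with the oldest unmatched *_begin of the same cmd."""
    pending = {}
    pairs = []
    for line in lines:
        ev = parse_spmi_line(line)
        if ev is None:
            continue
        kind, _, phase = ev["event"].rpartition("_")
        key = (kind, ev["cmd"])
        if phase == "begin":
            pending.setdefault(key, []).append(ev)
        elif phase == "end" and pending.get(key):
            pairs.append((pending[key].pop(0), ev))
    return pairs
```

Feeding the three sample lines above to `pair_transfers` would yield one read pair; the write stays unmatched because its end event is not in the sample.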
     
  • Until now, only 32-bit DMA addressing was allowed, following a report on
    some old Intel machine that dropped 64-bit PCIe packets, even though
    pci_set_dma_mask() was successful with DMA_BIT_MASK(64).

    But then came TI's Keystone II chip (ARM Cortex A15 + DSPs), which refuses
    32-bit DMA addressing (for good reasons). So 64-bit DMA is allowed as a
    fallback option.

    Signed-off-by: Eli Billauer
    Signed-off-by: Greg Kroah-Hartman

    Eli Billauer
     
  • Commit e513229b4c38 ("Drivers: hv: vmbus: prevent cpu offlining on newer
    hypervisors") was altering smp_ops.cpu_disable to prevent CPU offlining.
    We can do better by using cpu_hotplug_enable/disable functions instead of
    such hard-coding.

    Reported-by: Radim Krčmář
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • Hyper-V module needs to disable cpu hotplug (offlining) as there is no
    support from hypervisor side to reassign already opened event channels
    to a different CPU. Currently it is being done by altering
    smp_ops.cpu_disable but it is hackish.

    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Thomas Gleixner
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • As a prerequisite to exporting cpu_hotplug_enable/cpu_hotplug_disable
    functions to modules we need to convert cpu_hotplug_disabled to a counter
    to properly support disable -> disable -> enable call sequences. E.g.
    after Hyper-V vmbus module (which is supposed to be the first user of
    exported cpu_hotplug_enable/cpu_hotplug_disable) did cpu_hotplug_disable()
    hibernate path calls disable_nonboot_cpus() and if we hit an error in
    _cpu_down() enable_nonboot_cpus() will be called on the failure path (thus
    making cpu_hotplug_disabled = 0 and leaving cpu hotplug in 'enabled'
    state). Same problem is possible if more than 1 module use
    cpu_hotplug_disable/cpu_hotplug_enable on their load/unload paths. When
    one of these modules is being unloaded it is logical to leave cpu hotplug
    in 'disabled' state.

    To support the change we need to increase cpu_hotplug_disabled counter
    in disable_nonboot_cpus() unconditionally as all users of
    disable_nonboot_cpus() are supposed to do enable_nonboot_cpus() in case
    an error was returned.

    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Thomas Gleixner
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
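
The disable -> disable -> enable problem described above can be modelled in a few lines of userspace Python. This is an illustrative model only, not the kernel code: the class names `BoolGate` and `CounterGate` and their methods are hypothetical, and raising on an unbalanced enable is an assumption made for the sketch.

```python
class BoolGate:
    """Old behaviour: a single flag, so one enable undoes every disable."""

    def __init__(self):
        self.disabled = False

    def disable(self):
        self.disabled = True

    def enable(self):
        self.disabled = False

    def hotplug_allowed(self):
        return not self.disabled


class CounterGate:
    """New behaviour: hotplug is allowed only when every disable was undone."""

    def __init__(self):
        self.disabled = 0

    def disable(self):
        self.disabled += 1

    def enable(self):
        # Assumption for this sketch: an unbalanced enable is a caller bug.
        if self.disabled == 0:
            raise RuntimeError("enable without matching disable")
        self.disabled -= 1

    def hotplug_allowed(self):
        return self.disabled == 0


def module_then_failed_hibernate(gate):
    """Replay the sequence from the commit message on the given gate."""
    gate.disable()   # module load (e.g. the vmbus module)
    gate.disable()   # hibernate path: disable_nonboot_cpus()
    gate.enable()    # failure path: enable_nonboot_cpus()
    return gate.hotplug_allowed()
```

With the flag, the replay ends with hotplug allowed even though the module still wants it disabled; with the counter, it stays disabled until the module issues its own enable.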
     
  • The 4 sysfs files should be stable ABIs to the user space.

    Signed-off-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • By default lsvmbus lists all the devices in the VMBus.
    With -v or -vv, more information is printed, including the VMBus
    Rel_ID, class ID, device ID and which channel is bound to which
    virtual processor, etc.

    Signed-off-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • This is useful for analyzing performance issues.

    Signed-off-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • The current Hyper-V clock source is based on the per-partition reference counter
    and this counter is being accessed via a synthetic MSR - HV_X64_MSR_TIME_REF_COUNT.
    Hyper-V has a more efficient way of computing the per-partition reference
    counter value that does not involve reading a synthetic MSR. We implement
    a time source based on this mechanism.

    Tested-by: Vivek Yadav
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • Migrate hv driver to the new 'set-state' interface provided by
    clockevents core; the earlier 'set-mode' interface is marked obsolete
    now.

    This also enables us to implement callbacks for new states of clockevent
    devices, for example: ONESHOT_STOPPED.

    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: devel@linuxdriverproject.org
    Signed-off-by: Viresh Kumar
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Viresh Kumar
     
  • Fixes a bug where previously hv_ringbuffer_read would pass in the old
    number of bytes available to read instead of the expected old read index
    when calculating when to signal to the host that the ringbuffer is empty.
    Since the previous write size is already saved, also changes the
    hv_need_to_signal_on_read to use the previously read value rather than
    recalculating it.

    Signed-off-by: Christopher Oo
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Christopher Oo
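
A rough userspace model of the signalling decision may help in reading the description above. It assumes a design in which the reader signals only when a read makes the free space cross a threshold the writer is waiting for; that threshold (`pending_send_sz` here), the function names and the parameter names are assumptions of this sketch rather than details taken from the commit message.

```python
def bytes_avail_to_write(read_index, write_index, ring_size):
    """Free space in a ring of ring_size bytes, given both indices."""
    if write_index >= read_index:
        return ring_size - (write_index - read_index)
    return read_index - write_index


def need_to_signal_on_read(old_write_avail, new_read_index, write_index,
                           ring_size, pending_send_sz):
    """Decide whether the reader should signal the other end after a read.

    old_write_avail is the free space saved *before* the read, so it is
    reused here instead of being recomputed from a stale index.
    """
    if pending_send_sz == 0:
        # Assumption: zero means the writer is not blocked waiting for space.
        return False
    cur_write_avail = bytes_avail_to_write(new_read_index, write_index, ring_size)
    # Signal only when this read is what made enough room available.
    return old_write_avail < pending_send_sz <= cur_write_avail
```

Passing a byte count where an index is expected, which is the kind of mix-up the fix describes, would make the comparison above meaningless, since the two values are measured in different terms.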
     
  • Keep track of CPU affiliations of sub-channels within the scope of the primary
    channel. This will allow us to better distribute the load amongst available
    CPUs.

    Signed-off-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • The current code tracks the assigned CPUs within a NUMA node in the context of
    the primary channel. So, if we have a VM with a single NUMA node with 8 VCPUs, we may
    end up unevenly distributing the channel load. Fix the issue by tracking affiliations
    globally.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
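
The effect of where the assignment state lives can be shown with a small model. It is a sketch under assumptions: a simple "first unassigned CPU, start over once all are used" policy, and hypothetical names (`make_allocator`, `next_cpu`); the driver's actual policy may differ.

```python
def make_allocator(node_cpus):
    """Return a function handing out CPUs of one NUMA node in turn."""
    assigned = set()

    def next_cpu():
        # Once every CPU has been handed out, start a new round.
        if len(assigned) == len(node_cpus):
            assigned.clear()
        for cpu in node_cpus:
            if cpu not in assigned:
                assigned.add(cpu)
                return cpu

    return next_cpu


NODE_CPUS = list(range(8))  # one NUMA node with 8 VCPUs, as in the message


def per_primary_tracking(num_primary_channels):
    """State kept per primary channel: every channel starts from scratch."""
    return [make_allocator(NODE_CPUS)() for _ in range(num_primary_channels)]


def global_tracking(num_primary_channels):
    """State kept globally: channels share one view of what is assigned."""
    next_cpu = make_allocator(NODE_CPUS)
    return [next_cpu() for _ in range(num_primary_channels)]
```

In this model, eight primary channels tracked separately all land on the same CPU, while global tracking spreads them over all eight.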
     
  • This patch deletes the logic from hyperv_fb which picked a range of MMIO space
    for the frame buffer and adds new logic to hv_vmbus which picks ranges for
    child drivers. The new logic isn't quite the same as the old, as it considers
    more possible ranges.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     
  • This patch changes the logic in hv_vmbus to record all of the ranges in the
    VM's firmware (BIOS or UEFI) that offer regions of memory-mapped I/O space for
    use by paravirtual front-end drivers. The old logic just found one range
    above 4GB and called it good. This logic will find any ranges above 1MB.

    It would have been possible with this patch to just use existing resource
    allocation functions, rather than keep track of the entire set of Hyper-V
    related MMIO regions in VMBus. This strategy, however, is not sufficient
    when the resource allocator needs to be aware of the constraints of a
    Hyper-V virtual machine, which is what happens in the next patch in the series.
    So this first patch exists to show the first steps in reworking the MMIO
    allocation paths for Hyper-V front-end drivers.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     
  • With well over 200 users of this api, there are a mere 12 users that
    actually checked the return value of this function. And all of them
    really didn't do anything with that information as the system or module
    was shutting down no matter what.

    So stop pretending like it matters, and just return void from
    misc_deregister(). If something goes wrong in the call, you will get a
    WARNING splat in the syslog so you know how to fix up your driver.
    Other than that, there's nothing that can go wrong.

    Cc: Alasdair Kergon
    Cc: Neil Brown
    Cc: Oleg Drokin
    Cc: Andreas Dilger
    Cc: "Michael S. Tsirkin"
    Cc: Wim Van Sebroeck
    Cc: Christine Caulfield
    Cc: David Teigland
    Cc: Mark Fasheh
    Acked-by: Joel Becker
    Acked-by: Alexandre Belloni
    Acked-by: Alessandro Zummo
    Acked-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

05 Aug, 2015

16 commits

  • We cycle through all the "high performance" channels to distribute
    load across the available CPUs. Process the NetworkDirect as a
    high performance device.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • Hypervisor Top Level Functional Specification v3.1/4.0 notes that cpuid
    (0x40000003) EDX's 10th bit should be used to check that Hyper-V guest
    crash MSR's functionality is available.

    This patch should fix this recognition. Currently the code checks EAX
    register instead of EDX.

    Signed-off-by: Andrey Smetanin
    Signed-off-by: Denis V. Lunev
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Denis V. Lunev
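
The difference between the two checks is a one-register change, sketched below on plain integers. The sketch assumes "10th bit" means bit index 10 (mask 0x400); the constant and function names are illustrative and are not the identifiers used in the kernel.

```python
CPUID_FEATURES_LEAF = 0x40000003    # leaf named in the commit message
GUEST_CRASH_MSR_BIT = 1 << 10       # assumption: "10th bit" is bit index 10


def crash_msrs_available_old(eax, edx):
    """Check as described for the old code: tests the wrong register."""
    return bool(eax & GUEST_CRASH_MSR_BIT)


def crash_msrs_available_fixed(eax, edx):
    """Check as described for the fix: tests EDX."""
    return bool(edx & GUEST_CRASH_MSR_BIT)
```

For a leaf value where only EDX has the bit set, the old check reports the feature as missing and the fixed one reports it as present.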
     
  • Pre-Win2012R2 hosts don't properly handle CHANNELMSG_UNLOAD and
    wait_for_completion() hangs. Avoid sending such a request on old hosts.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • This fixes a typo: base_flag_bumber to base_flag_number

    Signed-off-by: Nik Nyby
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Nik Nyby
     
  • We don't catch this allocation failure because there is a typo and we
    check the wrong variable.

    Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')

    Signed-off-by: Dan Carpenter
    Reviewed-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • The guest may have to send a completion packet back to the host.
    To support this usage, permit sending a packet without a payload -
    we would be only sending the descriptor in this case.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • Support Win10 protocol for Dynamic Memory. This patch allows guests on Win10 hosts
    to hot-add memory even when dynamic memory is not enabled on the guest.

    Signed-off-by: Alex Ng
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Alex Ng
     
  • struct hv_start_fcopy is too big to be on stack on i386, the following
    warning is reported:

    >> drivers/hv/hv_fcopy.c:159:1: warning: the frame size of 1088 bytes is larger than 1024 bytes [-Wframe-larger-than=]

    Reported-by: kbuild test robot
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • kzalloc() return value check was accidentally lost in 11bc3a5fa91f:
    "Drivers: hv: kvp: convert to hv_utils_transport" commit.

    We don't need to reset kvp_transaction.state here as we have the
    kvp_timeout_func() timeout function and in case we're in OOM situation
    it is preferable to wait.

    Reported-by: Dan Carpenter
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • current_pt_regs() sometimes returns regs of the userspace process and in
    case of a kernel crash this is not what we need to report. E.g. when we
    trigger crash with sysrq we see the following:
    ...
    RIP: 0010:[] [] sysrq_handle_crash+0x16/0x20
    RSP: 0018:ffff8800db0a7d88 EFLAGS: 00010246
    RAX: 000000000000000f RBX: ffffffff820a0660 RCX: 0000000000000000
    ...
    at the same time current_pt_regs() gives us:
    ip=7f899ea7e9e0, ax=ffffffffffffffda, bx=26c81a0, cx=7f899ea7e9e0, ...
    These registers come from the userspace process that triggered the crash. As we
    don't even know which process it was this information is rather useless.

    When kernel crash happens through 'die' proper regs are being passed to
    all receivers on the die_chain (and panic_notifier_list is being notified
    with the string passed to panic() only). If panic() is called manually
    (e.g. on BUG()) we won't get 'die' notification so keep the 'panic'
    notification reporter as well but guard against double reporting.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • Full kernel hang is observed when kdump kernel starts after a crash. This
    hang happens in vmbus_negotiate_version() function on
    wait_for_completion() as Hyper-V host (Win2012R2 in my testing) never
    responds to CHANNELMSG_INITIATE_CONTACT as it thinks the connection is
    already established. We need to perform some mandatory minimalistic
    cleanup before we start new kernel.

    Reported-by: K. Y. Srinivasan
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • At the very late stage of kexec a driver (which is not being unloaded) can
    try to post a message or signal an event. This will crash the kernel as we
    already did hv_cleanup() and the hypercall page is NULL.

    Move all common (between 32 and 64 bit code) declarations to the beginning
    of the do_hypercall() function. Unfortunately we have to write the
    !hypercall_page check twice to not mix declarations and code.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • When general-purpose kexec (not kdump) is being performed in Hyper-V guest
    the newly booted kernel fails with an MCE error coming from the host. It
    is the same error which was fixed in the "Drivers: hv: vmbus: Implement
    the protocol for tearing down vmbus state" commit - monitor pages remain
    special and when they're being written to (as the new kernel doesn't know
    these pages are special) bad things happen. We need to perform some
    minimalistic cleanup before booting a new kernel on kexec. To do so we
    need to register a special machine_ops.shutdown handler to be executed
    before the native_machine_shutdown(). Registering a shutdown notification
    handler via the register_reboot_notifier() call is not sufficient as it
    happens too early for our purposes. machine_ops is not being exported to
    modules (and I don't think we want to export it) so let's do this in
    mshyperv.c

    The minimalistic cleanup consists of cleaning up clockevents, synic MSRs,
    guest os id MSR, and hypercall MSR.

    Kdump doesn't require all this stuff as it lives in a separate memory
    space.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • If some piece of code wants to check kexec_in_progress it has to be put
    in #ifdef CONFIG_KEXEC block to not break the build in !CONFIG_KEXEC
    case. Overcome this limitation by defining kexec_in_progress to false.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • We already have hv_synic_free() which frees all per-cpu pages for all
    CPUs, let's remove the hv_synic_free_cpu() call from hv_synic_cleanup()
    so it will be possible to do separate cleanup (writing to MSRs) and final
    freeing. This is going to be used to assist kexec.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • Remove bogus check on pm_runtime_active that prevented
    disconnection from a client in case the device was resuming
    from power gating but not yet active.

    Fix regression introduced by
    18901357e70ae29e3fd1c58712a6847c2ae52eae
    mei: disconnect on connection request timeout

    Signed-off-by: Tomas Winkler
    Signed-off-by: Alexander Usyskin
    Signed-off-by: Greg Kroah-Hartman

    Tomas Winkler
     

04 Aug, 2015

6 commits