04 Nov, 2015

1 commit

  • Moved Hyper-V synic constants from the guest Hyper-V drivers' private
    header into the x86 arch uapi Hyper-V header.

    Added Hyper-V synic MSR flags to the x86 arch uapi Hyper-V header.

    Signed-off-by: Andrey Smetanin
    Reviewed-by: Roman Kagan
    Signed-off-by: Denis V. Lunev
    CC: Vitaly Kuznetsov
    CC: "K. Y. Srinivasan"
    CC: Gleb Natapov
    CC: Paolo Bonzini
    Signed-off-by: Paolo Bonzini

    Andrey Smetanin
     

21 Sep, 2015

1 commit


06 Aug, 2015

9 commits

  • Commit e513229b4c38 ("Drivers: hv: vmbus: prevent cpu offlining on newer
    hypervisors") was altering smp_ops.cpu_disable to prevent CPU offlining.
    We can do better by using cpu_hotplug_enable/disable functions instead of
    such hard-coding.

    Reported-by: Radim Krčmář
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • This is useful for analyzing performance issues.

    Signed-off-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • The current Hyper-V clock source is based on the per-partition reference counter
    and this counter is being accessed via a synthetic MSR - HV_X64_MSR_TIME_REF_COUNT.
    Hyper-V has a more efficient way of computing the per-partition reference
    counter value that does not involve reading a synthetic MSR. We implement
    a time source based on this mechanism.

    Tested-by: Vivek Yadav
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • Migrate the hv driver to the new 'set-state' interface provided by the
    clockevents core; the earlier 'set-mode' interface is now marked
    obsolete.

    This also enables us to implement callbacks for new states of clockevent
    devices, for example: ONESHOT_STOPPED.

    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: devel@linuxdriverproject.org
    Signed-off-by: Viresh Kumar
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Viresh Kumar
     
  • Fixes a bug where previously hv_ringbuffer_read would pass in the old
    number of bytes available to read instead of the expected old read index
    when calculating when to signal to the host that the ringbuffer is empty.
    Since the previous write size is already saved, also changes the
    hv_need_to_signal_on_read to use the previously read value rather than
    recalculating it.

    Signed-off-by: Christopher Oo
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Christopher Oo
     
  • Keep track of CPU affiliations of sub-channels within the scope of the primary
    channel. This will allow us to better distribute the load amongst available
    CPUs.

    Signed-off-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • The current code tracks the assigned CPUs within a NUMA node in the context of
    the primary channel. So, if we have a VM with a single NUMA node with 8 VCPUs, we may
    end up unevenly distributing the channel load. Fix the issue by tracking affiliations
    globally.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • This patch deletes the logic from hyperv_fb which picked a range of MMIO space
    for the frame buffer and adds new logic to hv_vmbus which picks ranges for
    child drivers. The new logic isn't quite the same as the old, as it considers
    more possible ranges.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     
  • This patch changes the logic in hv_vmbus to record all of the ranges in the
    VM's firmware (BIOS or UEFI) that offer regions of memory-mapped I/O space for
    use by paravirtual front-end drivers. The old logic just found one range
    above 4GB and called it good. This logic will find any ranges above 1MB.

    It would have been possible with this patch to just use existing resource
    allocation functions, rather than keep track of the entire set of Hyper-V
    related MMIO regions in VMBus. This strategy, however, is not sufficient
    when the resource allocator needs to be aware of the constraints of a
    Hyper-V virtual machine, which is what happens in the next patch in the series.
    So this first patch exists to show the first steps in reworking the MMIO
    allocation paths for Hyper-V front-end drivers.

    Signed-off-by: Jake Oshins
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jake Oshins
     

05 Aug, 2015

14 commits

  • We cycle through all the "high performance" channels to distribute
    load across the available CPUs. Process the NetworkDirect as a
    high performance device.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • Hypervisor Top Level Functional Specification v3.1/4.0 notes that cpuid
    (0x40000003) EDX's 10th bit should be used to check whether the Hyper-V
    guest crash MSR functionality is available.

    This patch fixes the detection. Currently the code checks the EAX
    register instead of EDX.

    Signed-off-by: Andrey Smetanin
    Signed-off-by: Denis V. Lunev
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Denis V. Lunev
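
    The check described in this commit can be illustrated with a minimal
    sketch. It is not the kernel code: the struct, macro and function names
    below are hypothetical, and "10th bit" is read here as bit index 10,
    which is an assumption. Only the leaf number and the EDX register come
    from the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. The leaf and the register come from the commit text;
 * treating "10th bit" as bit index 10 is an assumption of this sketch. */
#define SKETCH_HV_FEATURES_LEAF 0x40000003u
#define SKETCH_HV_CRASH_MSR_BIT (UINT32_C(1) << 10)

/* Register values as returned by one CPUID invocation. */
struct sketch_cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Returns true when the crash-MSR feature bit is set in EDX.
 * The pre-fix behaviour described above would have tested regs->eax. */
static bool sketch_crash_msr_available(const struct sketch_cpuid_regs *regs)
{
    return (regs->edx & SKETCH_HV_CRASH_MSR_BIT) != 0;
}
```

    A caller would fill the struct from CPUID leaf SKETCH_HV_FEATURES_LEAF
    and pass it in; a value with the bit set only in EAX is reported as
    "not available".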
     
  • Pre-Win2012R2 hosts don't properly handle CHANNELMSG_UNLOAD and
    wait_for_completion() hangs. Avoid sending such requests to old hosts.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • This fixes a typo: base_flag_bumber to base_flag_number

    Signed-off-by: Nik Nyby
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Nik Nyby
     
  • We don't catch this allocation failure because there is a typo and we
    check the wrong variable.

    Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')

    Signed-off-by: Dan Carpenter
    Reviewed-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • The guest may have to send a completion packet back to the host.
    To support this usage, permit sending a packet without a payload -
    we would be only sending the descriptor in this case.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • Support Win10 protocol for Dynamic Memory. This patch allows guests on Win10 hosts
    to hot-add memory even when dynamic memory is not enabled on the guest.

    Signed-off-by: Alex Ng
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Alex Ng
     
  • struct hv_start_fcopy is too big to be on the stack on i386; the following
    warning is reported:

    >> drivers/hv/hv_fcopy.c:159:1: warning: the frame size of 1088 bytes is larger than 1024 bytes [-Wframe-larger-than=]

    Reported-by: kbuild test robot
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • kzalloc() return value check was accidentally lost in 11bc3a5fa91f:
    "Drivers: hv: kvp: convert to hv_utils_transport" commit.

    We don't need to reset kvp_transaction.state here as we have the
    kvp_timeout_func() timeout function and in case we're in OOM situation
    it is preferable to wait.

    Reported-by: Dan Carpenter
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • current_pt_regs() sometimes returns regs of the userspace process and in
    case of a kernel crash this is not what we need to report. E.g. when we
    trigger crash with sysrq we see the following:
    ...
    RIP: 0010:[] [] sysrq_handle_crash+0x16/0x20
    RSP: 0018:ffff8800db0a7d88 EFLAGS: 00010246
    RAX: 000000000000000f RBX: ffffffff820a0660 RCX: 0000000000000000
    ...
    at the same time current_pt_regs() gives us:
    ip=7f899ea7e9e0, ax=ffffffffffffffda, bx=26c81a0, cx=7f899ea7e9e0, ...
    These registers come from the userspace process that triggered the crash.
    As we don't even know which process it was, this information is rather
    useless.

    When kernel crash happens through 'die' proper regs are being passed to
    all receivers on the die_chain (and panic_notifier_list is being notified
    with the string passed to panic() only). If panic() is called manually
    (e.g. on BUG()) we won't get 'die' notification so keep the 'panic'
    notification reporter as well but guard against double reporting.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
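
    The "guard against double reporting" idea can be sketched as follows.
    This is a simplified, single-threaded illustration with hypothetical
    names, not the driver's code: both the 'die' and the 'panic' paths call
    the same function, and a flag ensures that only the first call reports.

```c
#include <stdbool.h>

static bool sketch_crash_reported;   /* set once the first report is sent */
static int sketch_reports_sent;      /* counts reports, for illustration only */

/* Intended to be called from both the 'die' and the 'panic' paths. */
static void sketch_report_crash(void)
{
    if (sketch_crash_reported)
        return;                 /* a report already went out */
    sketch_crash_reported = true;
    sketch_reports_sent++;      /* stand-in for reporting the registers */
}
```

    Calling sketch_report_crash() twice leaves sketch_reports_sent at 1.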
     
  • A full kernel hang is observed when the kdump kernel starts after a crash.
    This hang happens in the vmbus_negotiate_version() function on
    wait_for_completion() as the Hyper-V host (Win2012R2 in my testing) never
    responds to CHANNELMSG_INITIATE_CONTACT as it thinks the connection is
    already established. We need to perform some mandatory minimalistic
    cleanup before we start the new kernel.

    Reported-by: K. Y. Srinivasan
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • At the very late stage of kexec a driver (drivers are not being unloaded)
    can try to post a message or signal an event. This will crash the kernel
    as we already did hv_cleanup() and the hypercall page is NULL.

    Move all common (between 32 and 64 bit code) declarations to the beginning
    of the do_hypercall() function. Unfortunately we have to write the
    !hypercall_page check twice to not mix declarations and code.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
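
    A minimal sketch of the guard this commit describes is shown below. The
    function-pointer model, the sentinel value and all names are
    assumptions made for illustration; the real do_hypercall() works
    differently in detail.

```c
#include <stdint.h>

/* Hypothetical sentinel returned when no hypercall can be issued. */
#define SKETCH_HYPERCALL_UNAVAILABLE UINT64_MAX

typedef uint64_t (*sketch_hypercall_fn)(uint64_t control);

/* NULL models the state after cleanup has released the hypercall page. */
static sketch_hypercall_fn sketch_hypercall_page;

/* Stand-in for a real hypercall entry point; it just echoes its argument. */
static uint64_t sketch_fake_hypercall(uint64_t control)
{
    return control;
}

/* Refuses to jump through a NULL hypercall page instead of crashing. */
static uint64_t sketch_do_hypercall(uint64_t control)
{
    if (!sketch_hypercall_page)
        return SKETCH_HYPERCALL_UNAVAILABLE;
    return sketch_hypercall_page(control);
}
```

    With sketch_hypercall_page left NULL, a late caller gets the sentinel
    back rather than dereferencing a NULL pointer.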
     
  • When general-purpose kexec (not kdump) is being performed in Hyper-V guest
    the newly booted kernel fails with an MCE error coming from the host. It
    is the same error which was fixed in the "Drivers: hv: vmbus: Implement
    the protocol for tearing down vmbus state" commit - monitor pages remain
    special and when they're being written to (as the new kernel doesn't know
    these pages are special) bad things happen. We need to perform some
    minimalistic cleanup before booting a new kernel on kexec. To do so we
    need to register a special machine_ops.shutdown handler to be executed
    before the native_machine_shutdown(). Registering a shutdown notification
    handler via the register_reboot_notifier() call is not sufficient as it
    happens too early for our purposes. machine_ops is not being exported to
    modules (and I don't think we want to export it) so let's do this in
    mshyperv.c

    The minimalistic cleanup consists of cleaning up clockevents, synic MSRs,
    guest os id MSR, and hypercall MSR.

    Kdump doesn't require all this stuff as it lives in a separate memory
    space.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • We already have hv_synic_free() which frees all per-cpu pages for all
    CPUs, let's remove the hv_synic_free_cpu() call from hv_synic_cleanup()
    so it will be possible to do separate cleanup (writing to MSRs) and final
    freeing. This is going to be used to assist kexec.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     

13 Jun, 2015

1 commit


01 Jun, 2015

4 commits

  • Channels/sub-channels can be affinitized to VCPUs in the guest. Implement
    this affinity in a way that is NUMA aware. The current protocol distributed
    the primary channels uniformly across all available CPUs. The new protocol
    is NUMA aware: primary channels are distributed across the available NUMA
    nodes while the sub-channels within a primary channel are distributed amongst
    CPUs within the NUMA node assigned to the primary channel.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
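
    The two-level placement described here can be sketched with two small
    helpers. The round-robin policy, the contiguous CPU numbering within a
    node and all names are assumptions of this sketch; the commit text only
    says that primary channels are spread across NUMA nodes and
    sub-channels across the CPUs of the primary channel's node.

```c
/* Node chosen for the i-th primary channel: round-robin over NUMA nodes. */
static int sketch_primary_node(int primary_index, int num_nodes)
{
    return primary_index % num_nodes;
}

/* CPU chosen for the j-th sub-channel of a primary channel on 'node',
 * assuming CPUs are numbered contiguously within each node. */
static int sketch_subchannel_cpu(int node, int sub_index, int cpus_per_node)
{
    return node * cpus_per_node + sub_index % cpus_per_node;
}
```

    With 2 nodes of 4 CPUs each, the second primary channel lands on node 1
    and its sub-channels cycle over CPUs 4 to 7.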
     
  • Map target_cpu to target_vcpu using the mapping table.
    We should use the mapping table to transform guest CPU ID to VP Index
    as is done for the non-performance critical channels.
    While the value CPU 0 is special and will
    map to VP index 0, it is good to be consistent.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • Memory notifiers are being executed in a sequential order and when one of
    them fails returning something different from NOTIFY_OK the remainder of
    the notification chain is not being executed. When a memory block is being
    onlined in online_pages() we do memory_notify(MEM_GOING_ONLINE, ) and if
    one of the notifiers in the chain fails we end up doing
    memory_notify(MEM_CANCEL_ONLINE, ) so it is possible for a notifier to see
    MEM_CANCEL_ONLINE without seeing the corresponding MEM_GOING_ONLINE event.
    E.g. when CONFIG_KASAN is enabled the kasan_mem_notifier() is being used
    to prevent memory hotplug, it returns NOTIFY_BAD for all MEM_GOING_ONLINE
    events. As kasan_mem_notifier() comes before the hv_memory_notifier() in
    the notification chain we don't see the MEM_GOING_ONLINE event and we do
    not take the ha_region_mutex. We, however, see the MEM_CANCEL_ONLINE event
    and unconditionally try to release the lock, the following is observed:

    [ 110.850927] =====================================
    [ 110.850927] [ BUG: bad unlock balance detected! ]
    [ 110.850927] 4.1.0-rc3_bugxxxxxxx_test_xxxx #595 Not tainted
    [ 110.850927] -------------------------------------
    [ 110.850927] systemd-udevd/920 is trying to release lock
    (&dm_device.ha_region_mutex) at:
    [ 110.850927] [] mutex_unlock+0xe/0x10
    [ 110.850927] but there are no more locks to release!

    At the same time we can have the ha_region_mutex taken when we get the
    MEM_CANCEL_ONLINE event in case one of the memory notifiers after the
    hv_memory_notifier() in the notification chain failed so we need to add
    the mutex_is_locked() check. In case of MEM_ONLINE we are always supposed
    to have the mutex locked.

    Signed-off-by: Vitaly Kuznetsov

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
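
    The unbalanced-unlock problem and the conditional release can be
    modelled in a few lines. This is a single-threaded sketch with
    hypothetical names: a boolean stands in for ha_region_mutex and the
    'held' test stands in for mutex_is_locked().

```c
#include <stdbool.h>

enum sketch_mem_event {
    SKETCH_MEM_GOING_ONLINE,
    SKETCH_MEM_ONLINE,
    SKETCH_MEM_CANCEL_ONLINE
};

struct sketch_lock {
    bool held;
    int bad_unlocks;   /* unlock attempts made while the lock was not held */
};

static void sketch_unlock(struct sketch_lock *l)
{
    if (!l->held)
        l->bad_unlocks++;
    l->held = false;
}

static void sketch_memory_notifier(struct sketch_lock *l,
                                   enum sketch_mem_event ev)
{
    switch (ev) {
    case SKETCH_MEM_GOING_ONLINE:
        l->held = true;
        break;
    case SKETCH_MEM_ONLINE:
        sketch_unlock(l);          /* lock is expected to be held here */
        break;
    case SKETCH_MEM_CANCEL_ONLINE:
        if (l->held)               /* GOING_ONLINE may never have reached us */
            sketch_unlock(l);
        break;
    }
}
```

    A MEM_CANCEL_ONLINE event that arrives without a preceding
    MEM_GOING_ONLINE leaves bad_unlocks at 0, which is the behaviour the
    added check is meant to provide.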
     
  • Add support for Windows 10.

    Signed-off-by: Keith Mange
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Keith Mange
     

25 May, 2015

10 commits

  • Primary channels are distributed evenly across all vcpus we have. When the host
    asks us to create subchannels it usually makes us num_cpus-1 offers and we are
    supposed to distribute the work evenly among the channel itself and all its
    subchannels. Make sure they are all assigned to different vcpus.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • We need to call init_vp_index() after we added the channel to the appropriate
    list (global or subchannel) to be able to use this information when assigning
    the channel to the particular vcpu. To do so we need to move a couple of
    functions around. The only real change is the init_vp_index() call. This is a
    small refactoring without a functional change.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • It is unlikely that the host will ask us to close only one subchannel for a
    device but let's be consistent. Do both num_sc++ and num_sc-- with
    channel->lock to be on the safe side.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • Remove some code duplication, no functional change intended.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • Explicitly kill tasklets we create on module unload.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • In case there was an error reported in the response to the CHANNELMSG_OPENCHANNEL
    call we need to do the cleanup as a vmbus_open() user won't be doing it after
    receiving an error. The cleanup should be done on all failure paths. We also need
    to avoid returning open_info->response.open_result.status as the return value as
    all other errors we return from vmbus_open() are -EXXX and vmbus_open() callers
    are not supposed to analyze host error codes.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • Implement the protocol for tearing down the monitor state established with
    the host.

    Signed-off-by: K. Y. Srinivasan
    Tested-by: Vitaly Kuznetsov
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • free_channel() has been invoked in
    vmbus_remove() -> hv_process_channel_removal(), or vmbus_remove() ->
    ... -> vmbus_close_internal() -> hv_process_channel_removal().

    We also change to use list_for_each_entry_safe(), because the entry
    is removed in hv_process_channel_removal().

    This patch fixes a bug in the vmbus unload path.

    Thanks to Dan Carpenter for finding the issue!

    Signed-off-by: Dexuan Cui
    Reported-by: Dan Carpenter
    Cc: K. Y. Srinivasan
    Cc: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
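
    The reason a "safe" iteration is needed when entries are removed during
    the walk can be shown with a plain singly linked list. This sketch does
    not use the kernel's list.h; the struct and function names are
    hypothetical. The point is that the next pointer is saved before the
    current entry is freed.

```c
#include <stdlib.h>

struct sketch_channel {
    int id;
    struct sketch_channel *next;
};

/* Prepends a new entry; returns 0 on success, -1 on allocation failure. */
static int sketch_add_channel(struct sketch_channel **head, int id)
{
    struct sketch_channel *c = malloc(sizeof(*c));

    if (!c)
        return -1;
    c->id = id;
    c->next = *head;
    *head = c;
    return 0;
}

/* Removes and frees every entry; returns how many were removed. */
static int sketch_remove_all(struct sketch_channel **head)
{
    struct sketch_channel *cur = *head, *next;
    int removed = 0;

    while (cur) {
        next = cur->next;   /* saved before cur is freed */
        free(cur);
        removed++;
        cur = next;
    }
    *head = NULL;
    return removed;
}
```

    Reading cur->next after free(cur) would be a use-after-free, which is
    the class of problem the safe iterator avoids.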
     
  • Commit 96c1d0581d00f7abe033350edb021a9d947d8d81 ("Drivers: hv: vmbus: Add
    support for VMBus panic notifier handler") introduced
    atomic_notifier_chain_register() call on module load. We also need to call
    atomic_notifier_chain_unregister() on module unload as otherwise the following
    crash is observed when we bring hv_vmbus back:

    [ 39.788877] BUG: unable to handle kernel paging request at ffffffffa00078a8
    [ 39.788877] IP: [] notifier_call_chain+0x3f/0x80
    ...
    [ 39.788877] Call Trace:
    [ 39.788877] [] __atomic_notifier_call_chain+0x5d/0x90
    ...
    [ 39.788877] [] ? atomic_notifier_chain_register+0x38/0x70
    [ 39.788877] [] ? atomic_notifier_chain_register+0x17/0x70
    [ 39.788877] [] hv_acpi_init+0x14f/0x1000 [hv_vmbus]
    [ 39.788877] [] do_one_initcall+0xd4/0x210

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • In case we do request_resource() in vmbus_acpi_add() we need to tear it down
    to be able to load the driver again. Otherwise the following crash is observed
    when hv_vmbus unload/load sequence is performed on a Generation2 instance:

    [ 38.165701] BUG: unable to handle kernel paging request at ffffffffa00075a0
    [ 38.166315] IP: [] __request_resource+0x2f/0x50
    [ 38.166315] PGD 1f34067 PUD 1f35063 PMD 3f723067 PTE 0
    [ 38.166315] Oops: 0000 [#1] SMP
    [ 38.166315] Modules linked in: hv_vmbus(+) [last unloaded: hv_vmbus]
    [ 38.166315] CPU: 0 PID: 267 Comm: modprobe Not tainted 3.19.0-rc5_bug923184+ #486
    [ 38.166315] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
    [ 38.166315] task: ffff88003f401cb0 ti: ffff88003f60c000 task.ti: ffff88003f60c000
    [ 38.166315] RIP: 0010:[] [] __request_resource+0x2f/0x50
    [ 38.166315] RSP: 0018:ffff88003f60fb58 EFLAGS: 00010286

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov