13 Dec, 2018

1 commit

  • commit 37c2578c0c40e286bc0d30bdc05290b2058cf66e upstream.

    vmbus_process_offer() mustn't call channel->sc_creation_callback()
    directly for sub-channels, because sc_creation_callback() ->
    vmbus_open() may never get the host's response to the
    OPEN_CHANNEL message (the host may rescind a channel at any time,
    e.g. in the case of hot removing a NIC), and vmbus_onoffer_rescind()
    may not wake up the vmbus_open() as it's blocked due to a non-zero
    vmbus_connection.offer_in_progress, and finally we have a deadlock.

    The above is also true for primary channels, if the related device
    drivers use sync probing mode by default.

    And, usually the handling of primary channels and sub-channels can
    depend on each other, so we should offload them to different
    workqueues to avoid possible deadlock, e.g. in sync-probing mode,
    NIC1's netvsc_subchan_work() can race with NIC2's netvsc_probe() ->
    rtnl_lock(), and causes deadlock: the former gets the rtnl_lock
    and waits for all the sub-channels to appear, but the latter
    can't get the rtnl_lock and this blocks the handling of sub-channels.

    The patch can fix the multiple-NIC deadlock described above for
    v3.x kernels (e.g. RHEL 7.x) which don't support async-probing
    of devices, and v4.4, v4.9, v4.14 and v4.18 which support async-probing
    but don't enable async-probing for Hyper-V drivers (yet).

    The patch can also fix the hang issue in sub-channel's handling described
    above for all versions of kernels, including v4.19 and v4.20-rc4.

    So actually the patch should be applied to all the existing kernels,
    not only the kernels that have 8195b1396ec8.

    Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
    Cc: stable@vger.kernel.org
    Cc: Stephen Hemminger
    Cc: K. Y. Srinivasan
    Cc: Haiyang Zhang
    Signed-off-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     

10 Oct, 2018

1 commit

  • commit 41e270f6898e7502be9fd6920ee0a108ca259d36 upstream.

    With CONFIG_DEBUG_PREEMPT=y, I always see this warning:
    BUG: using smp_processor_id() in preemptible [00000000]

    Fix the false warning by using get/put_cpu().

    Here vmbus_connect() sends a message to the host and waits for the
    host's response. The host will deliver the response message and an
    interrupt on CPU msg->target_vcpu, and later the interrupt handler
    will wake up vmbus_connect(). vmbus_connect() doesn't really have
    to run on the same cpu as CPU msg->target_vcpu, so it's safe to
    call put_cpu() just here.

    Signed-off-by: Dexuan Cui
    Cc: stable@vger.kernel.org
    Cc: K. Y. Srinivasan
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     

10 Aug, 2017

3 commits

  • To support implementing remote TLB flushing on Hyper-V with a hypercall
    we need to make vp_index available outside of vmbus module. Rename and
    globalize.

    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Andy Shevchenko
    Reviewed-by: Stephen Hemminger
    Cc: Andy Lutomirski
    Cc: Haiyang Zhang
    Cc: Jork Loeser
    Cc: K. Y. Srinivasan
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Simon Xiao
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: devel@linuxdriverproject.org
    Link: http://lkml.kernel.org/r/20170802160921.21791-7-vkuznets@redhat.com
    Signed-off-by: Ingo Molnar

    Vitaly Kuznetsov
     
  • We need to pass only 8 bytes of input for HvSignalEvent which makes it a
    perfect fit for fast hypercall. hv_input_signal_event_buffer is not needed
    any more and hv_input_signal_event is converted to union for convenience.

    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Andy Shevchenko
    Reviewed-by: Stephen Hemminger
    Cc: Andy Lutomirski
    Cc: Haiyang Zhang
    Cc: Jork Loeser
    Cc: K. Y. Srinivasan
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Simon Xiao
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: devel@linuxdriverproject.org
    Link: http://lkml.kernel.org/r/20170802160921.21791-5-vkuznets@redhat.com
    Signed-off-by: Ingo Molnar

    Vitaly Kuznetsov
     
  • We have only three call sites for hv_do_hypercall() and we're going to
    change HVCALL_SIGNAL_EVENT to doing fast hypercall so we can inline this
    function for optimization.

    Hyper-V top level functional specification states that r9-r11 registers
    and flags may be clobbered by the hypervisor during hypercall and with
    inlining this is somewhat important, add the clobbers.

    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Andy Shevchenko
    Reviewed-by: Stephen Hemminger
    Cc: Andy Lutomirski
    Cc: Haiyang Zhang
    Cc: Jork Loeser
    Cc: K. Y. Srinivasan
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Simon Xiao
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: devel@linuxdriverproject.org
    Link: http://lkml.kernel.org/r/20170802160921.21791-3-vkuznets@redhat.com
    Signed-off-by: Ingo Molnar

    Vitaly Kuznetsov
     

25 May, 2017

1 commit


18 May, 2017

2 commits

  • Fix the rescind handling. This patch addresses the following rescind
    scenario that is currently not handled correctly:

    If a rescind were to be received while the offer is still being
    processed, we will be blocked indefinitely since the rescind message
    is handled on the same work element as the offer message. Fix this
    issue.

    I would like to thank Dexuan Cui and
    Long Li for working with me on this patch.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • ENOBUFS is a more appropriate error code to be returned
    when the hypervisor cannot post the message because of
    insufficient buffers. Make the adjustment.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     

17 Mar, 2017

1 commit

  • The change to reschedule tasklet if more data arrives in ring buffer
    can cause performance regression if host timing is such that the
    next response happens in a small window.

    Go back to a modified version of the original looping behavior.
    If the race occurs in a small time, then loop. But if the tasklet
    has been running for a long interval due to flood, then reschedule
    the tasklet to allow migration to ksoftirqd.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Stephen Hemminger
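
The looping policy described in the commit above (keep draining while the race window is short, hand back to the scheduler once a time budget is spent) can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the driver code: `struct fake_ring`, `drain_bounded()` and the tick-based clock are hypothetical stand-ins for the ring buffer, the tasklet body and jiffies, and each processed batch is assumed to cost one tick.

```c
/* Hypothetical model of a channel ring buffer and a coarse clock. */
struct fake_ring {
    unsigned int batches_left;  /* batches of data still waiting */
    unsigned long now;          /* stand-in for a jiffies-like tick counter */
};

enum drain_result {
    DRAIN_DONE,        /* ring is empty, nothing more to do */
    DRAIN_RESCHEDULE   /* budget spent, caller should reschedule the tasklet */
};

/*
 * Process batches until the ring is empty or `budget` ticks have elapsed.
 * Assumption: processing one batch advances the clock by one tick.
 */
static enum drain_result drain_bounded(struct fake_ring *r, unsigned long budget)
{
    unsigned long deadline = r->now + budget;

    while (r->batches_left > 0) {
        if (r->now >= deadline)
            return DRAIN_RESCHEDULE;  /* flood: yield instead of looping on */
        r->batches_left--;            /* "process" one batch */
        r->now++;
    }
    return DRAIN_DONE;
}
```

In a real tasklet the `DRAIN_RESCHEDULE` branch is where the work would be queued again, so that a long flood can migrate to ksoftirqd rather than monopolising the CPU.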
     

15 Feb, 2017

3 commits

  • Change the simple boolean batched_reading into a tri-value.
    For future NAPI support in netvsc driver, the callback needs to
    occur directly in interrupt handler.

    Batched mode is also changed to disable host interrupts immediately
    in interrupt routine (to avoid unnecessary host signals), and the
    tasklet is rescheduled if more data is detected.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Stephen Hemminger
     
  • Make the event handling tasklet per channel rather than per-cpu.
    This allows for better fairness when getting lots of data on the same
    cpu.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Stephen Hemminger
     
  • The hv_context structure had several arrays which were per-cpu
    and was allocating small structures (tasklet_struct). Instead use
    a single per-cpu array.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Stephen Hemminger
     

10 Feb, 2017

1 commit


20 Jan, 2017

1 commit


11 Jan, 2017

1 commit

  • DoS protection conditions were altered in WS2016 and now it's easy to get
    -EAGAIN returned from vmbus_post_msg() (e.g. when we try changing MTU on a
    netvsc device in a loop). None of the vmbus_post_msg() callers retry the
    operation and we usually end up with a non-functional device or crash.

    While host's DoS protection conditions are unknown to me my tests show that
    it can take up to 10 seconds before the message is sent so doing udelay()
    is not an option, we really need to sleep. Almost all vmbus_post_msg()
    callers are ready to sleep but there is one special case:
    vmbus_initiate_unload() which can be called from interrupt/NMI context and
    we can't sleep there. I'm also not sure about the lonely
    vmbus_send_tl_connect_request() which has no in-tree users but its external
    users are most likely waiting for the host to reply so sleeping there is
    also appropriate.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Cc:
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
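
The distinction drawn in the commit above, sleeping between retries where the caller may sleep and busy-waiting where it may not (interrupt/NMI context), can be sketched like this. It is a hedged user-space model only: `post_with_retry()`, `struct post_stats`, `fake_host_post()` and the `max_retries` parameter are hypothetical, and the waits are counted rather than performed. In kernel code the two branches would correspond to something like msleep() versus udelay()/mdelay().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callback standing in for the hypercall that posts a message. */
typedef int (*post_fn)(void *ctx);

/* Bookkeeping so the sketch can be checked without real delays. */
struct post_stats {
    unsigned int attempts;
    unsigned int sleeps;      /* waits taken in sleepable context */
    unsigned int busy_waits;  /* waits taken where sleeping is forbidden */
};

/*
 * Retry `post` while it reports -EAGAIN, at most `max_retries` extra times.
 * `can_sleep` is supplied by the caller because only the caller knows
 * whether it runs in a context where sleeping is allowed.
 */
static int post_with_retry(post_fn post, void *ctx, bool can_sleep,
                           unsigned int max_retries, struct post_stats *st)
{
    for (unsigned int i = 0; ; i++) {
        int ret = post(ctx);

        st->attempts++;
        if (ret != -EAGAIN || i == max_retries)
            return ret;

        if (can_sleep)
            st->sleeps++;      /* kernel code would sleep here */
        else
            st->busy_waits++;  /* kernel code would spin-delay here */
    }
}

/* Hypothetical host: rejects the first *ctx posts, then accepts. */
static int fake_host_post(void *ctx)
{
    unsigned int *rejections_left = ctx;

    if (*rejections_left > 0) {
        (*rejections_left)--;
        return -EAGAIN;
    }
    return 0;
}
```

Passing the context as an explicit flag keeps the retry loop in one place while leaving the sleep-or-spin decision with the call site.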
     

06 Dec, 2016

1 commit


31 Aug, 2016

1 commit


01 May, 2016

1 commit


02 Mar, 2016

2 commits

  • WS2012 R2 and above hosts can support kexec in that they can support
    reconnecting to the host (as would be needed in the kexec path)
    on any CPU. Enable this. Pre-WS2012 R2 hosts don't have this ability
    and consequently cannot support kexec.

    Signed-off-by: Alex Ng
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Alex Ng
     
  • wait_for_completion() may sleep, it enables interrupts and this
    is something we really want to avoid on crashes because interrupt
    handlers can cause other crashes. Switch to the recently introduced
    vmbus_wait_for_unload() doing busy wait instead.

    Reported-by: Radim Krcmar
    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Radim Krčmář
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     

08 Feb, 2016

2 commits


15 Dec, 2015

2 commits


01 Jun, 2015

1 commit


25 May, 2015

1 commit


03 Apr, 2015

2 commits

  • Most of the retries can be done within a millisecond successfully, so we
    sleep 1ms before the first retry, then gradually increase the retry
    interval to 2^n with max value of 2048ms. Doing so, we will have shorter
    overall delay time, because most of the cases succeed within 1-2 attempts.

    Signed-off-by: Haiyang Zhang
    Reviewed-by: K. Y. Srinivasan
    Reviewed-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Haiyang Zhang
     
  • Since the 2 functions can safely run in vmbus_connection.work_queue without
    hang, we don't need to schedule new work items into the per-channel workqueue.

    Actually we can even remove the per-channel workqueue now -- we'll do it
    in the next patch.

    Signed-off-by: Dexuan Cui
    Cc: K. Y. Srinivasan
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
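
The retry schedule described in the first commit of this group (sleep 1 ms before the first retry, then double the interval up to a 2048 ms cap) works out as in the sketch below. The function and macro names are hypothetical; only the 1 ms start and the 2048 ms maximum come from the commit message.

```c
#include <stdint.h>

#define RETRY_START_MS 1u     /* delay before the first retry */
#define RETRY_MAX_MS   2048u  /* cap on the delay between retries */

/* Delay to wait before retry number `attempt` (0 is the first retry). */
static uint32_t retry_delay_ms(unsigned int attempt)
{
    uint32_t delay = RETRY_START_MS;

    /* Double once per earlier retry, stopping at the cap. */
    while (attempt-- > 0 && delay < RETRY_MAX_MS)
        delay *= 2;
    return delay;
}

/* Total time spent waiting if the first `retries` retries are all needed. */
static uint64_t total_delay_ms(unsigned int retries)
{
    uint64_t total = 0;

    for (unsigned int i = 0; i < retries; i++)
        total += retry_delay_ms(i);
    return total;
}
```

With this schedule a post that succeeds on its second retry has waited 1 + 2 = 3 ms in total, which is the "shorter overall delay" argument made in the commit message.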
     

25 Mar, 2015

1 commit


02 Mar, 2015

3 commits

  • Currently we log messages when either we are not able to map an ID to a
    channel or when the channel does not have a callback associated
    (in the channel interrupt handling path). These messages don't add
    any value, get rid of them.

    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     
  • I got HV_STATUS_INVALID_CONNECTION_ID on Hyper-V 2008 R2 when repeatedly running
    "rmmod hv_netvsc; modprobe hv_netvsc; rmmod hv_utils; modprobe hv_utils"
    in a Linux guest. It looks like the host has some kind of throttling mechanism if
    some kinds of hypercalls are sent too frequently.
    Without the patch, the driver can occasionally fail to load.

    Also let's retry HV_STATUS_INSUFFICIENT_MEMORY, though we didn't get it
    before.

    Removed 'case -ENOMEM', since the hypervisor doesn't return this.

    CC: "K. Y. Srinivasan"
    Reviewed-by: Jason Wang
    Signed-off-by: Dexuan Cui
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • We need to destroy hv_vmbus_con on module shutdown, otherwise the following
    crash is sometimes observed:

    [ 76.569845] hv_vmbus: Hyper-V Host Build:9600-6.3-17-0.17039; Vmbus version:3.0
    [ 82.598859] BUG: unable to handle kernel paging request at ffffffffa0003480
    [ 82.599287] IP: [] 0xffffffffa0003480
    [ 82.599287] PGD 1f34067 PUD 1f35063 PMD 3f72d067 PTE 0
    [ 82.599287] Oops: 0010 [#1] SMP
    [ 82.599287] Modules linked in: [last unloaded: hv_vmbus]
    [ 82.599287] CPU: 0 PID: 26 Comm: kworker/0:1 Not tainted 3.19.0-rc5_bug923184+ #488
    [ 82.599287] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
    [ 82.599287] Workqueue: hv_vmbus_con 0xffffffffa0003480
    [ 82.599287] task: ffff88007b6ddfa0 ti: ffff88007f8f8000 task.ti: ffff88007f8f8000
    [ 82.599287] RIP: 0010:[] [] 0xffffffffa0003480
    [ 82.599287] RSP: 0018:ffff88007f8fbe00 EFLAGS: 00010202
    ...

    To avoid memory leaks we need to free monitor_pages and int_page for
    vmbus_connection. Implement vmbus_disconnect() function by separating cleanup
    path from vmbus_connect().

    As we use hv_vmbus_con to release channels (see free_channel() in channel_mgmt.c)
    we need to make sure the work was done before we remove the queue, do that with
    drain_workqueue(). We also need to avoid handling messages which can (potentially)
    create new channels, so set vmbus_connection.conn_state = DISCONNECTED at the very
    beginning of vmbus_exit() and check for that in vmbus_onmessage_work().

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
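
The retry decision described in the second commit of this group (treat HV_STATUS_INVALID_CONNECTION_ID and HV_STATUS_INSUFFICIENT_MEMORY as transient, and do not test for -ENOMEM because the hypervisor does not return Linux error codes) might look roughly like the sketch below. The helper name `hv_post_should_retry()` is hypothetical, and the numeric status values are given for illustration only; they should be checked against the Hyper-V headers before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status codes named in the commit; numeric values assumed for illustration. */
#define HV_STATUS_SUCCESS               0
#define HV_STATUS_INSUFFICIENT_MEMORY   11
#define HV_STATUS_INVALID_CONNECTION_ID 18

/*
 * Decide whether a failed post is worth retrying. The input is the raw
 * hypervisor status, not a Linux errno, which is why there is no -ENOMEM
 * case here.
 */
static bool hv_post_should_retry(uint64_t status)
{
    switch (status) {
    case HV_STATUS_INVALID_CONNECTION_ID:  /* seen when the host throttles */
    case HV_STATUS_INSUFFICIENT_MEMORY:    /* not observed, retried anyway */
        return true;
    default:
        return false;  /* success or a non-transient failure */
    }
}
```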
     

12 Jan, 2015

1 commit


24 Sep, 2014

1 commit

  • Posting messages to the host can fail because of transient resource
    related failures. Correctly deal with these failures and increase the
    number of attempts to post the message before giving up.

    In this version of the patch, I have normalized the error code to
    Linux error code.

    Signed-off-by: K. Y. Srinivasan
    Cc:
    Tested-by: Sitsofe Wheeler
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     

10 Jul, 2014

1 commit

  • Starting with Win8, we have implemented several optimizations to improve the
    scalability and performance of the VMBUS transport between the Host and the
    Guest. Some of the non-performance critical services cannot leverage these
    optimization since they only read and process one message at a time.
    Make adjustments to the callback dispatch code to account for the way
    non-performance critical drivers handle reading of the channel.

    Signed-off-by: K. Y. Srinivasan
    Cc:
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
     

29 May, 2014

1 commit

  • We try to free two pages when only one has been allocated.
    Cleanup path is unlikely, so I haven't found any trace that would fit,
    but I hope that free_pages_prepare() does catch it.

    Cc: stable@vger.kernel.org
    Signed-off-by: Radim Krčmář
    Reviewed-by: Amos Kong
    Acked-by: Jason Wang
    Signed-off-by: Greg Kroah-Hartman

    Radim Krčmář
     

04 May, 2014

2 commits


17 Apr, 2014

1 commit


08 Feb, 2014

1 commit

  • When the guest attempts to connect with the host when there may already be a
    connection with the host (as would be the case during the kdump/kexec path),
    it is difficult to guarantee timely response from the host. Starting with
    WS2012 R2, the host supports this ability to re-connect with the host
    (explicitly to support kexec). Prior to responding to the guest, the host
    needs to ensure that device states based on the previous connection to
    the host have been properly torn down. This may introduce unbounded delays.
    To deal with this issue, don't do a timed wait during the initial connect
    with the host.

    Signed-off-by: K. Y. Srinivasan
    Cc:
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan