02 Oct, 2019

1 commit

  • xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
    to cache extra fragments. This is incorrect because the return type of
    xennet_fill_frags() is RING_IDX and 0xffffffff is a valid value for a
    ring buffer index.

    When rsp_cons is approaching 0xffffffff, the return value of
    xennet_fill_frags() may become 0xffffffff, which xennet_poll() (the
    caller) would regard as an error. As a result, queue->rx.rsp_cons is set
    incorrectly, because xennet_fill_frags() updates it only when there is an
    error. If there is no error, xennet_poll() is responsible for updating
    queue->rx.rsp_cons.
    Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
    queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
    This leads to NULL pointer access in the next iteration to process rx ring
    buffer entries.

    The symptom is similar to the one fixed in
    commit 00b368502d18 ("xen-netfront: do not assume sk_buff_head list is
    empty in error handling").

    This patch changes the return type of xennet_fill_frags() to indicate
    whether it succeeded or failed. queue->rx.rsp_cons is now always
    updated inside this function.

    Fixes: ad4f15dc2c70 ("xen/netfront: don't bug in case of too many frags")
    Signed-off-by: Dongli Zhang
    Reviewed-by: Juergen Gross
    Signed-off-by: David S. Miller

    Dongli Zhang
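
The sentinel collision described in the commit above can be sketched in user-space C. This is only an illustration under stated assumptions: `ring_idx_t`, `FRAG_LIMIT`, and both function names are hypothetical stand-ins, not the driver's code. The first function returns the index and overloads ~0U as the error marker; the second reports status separately and always writes the index back.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;   /* stand-in for RING_IDX (assumed 32-bit unsigned) */

#define FRAG_LIMIT 17u         /* hypothetical fragment limit for this sketch */

/* Old style: the index is returned and ~0U doubles as the error marker. */
static ring_idx_t fill_frags_sentinel(ring_idx_t cons, unsigned int nr_frags)
{
    if (nr_frags >= FRAG_LIMIT)
        return ~0U;                 /* error */
    return cons + 1;                /* may legitimately equal 0xffffffff */
}

/* New style: status is returned separately, the index is always written back. */
static int fill_frags_status(ring_idx_t *cons, unsigned int nr_frags)
{
    *cons += 1;                     /* consumer index updated on every path */
    return nr_frags < FRAG_LIMIT ? 0 : -1;
}
```

With `cons` equal to 0xfffffffe, the first variant returns 0xffffffff on success, which the caller cannot tell apart from the error marker; the second variant has no such ambiguity.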
     

18 Sep, 2019

1 commit


17 Sep, 2019

1 commit

  • When skb_shinfo(skb) is not able to cache an extra fragment (that is,
    skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS), xennet_fill_frags() assumes
    the sk_buff_head list is already empty. As a result, cons is increased only
    by 1 and control returns to the error handling path in xennet_poll().

    However, if the sk_buff_head list is not empty, queue->rx.rsp_cons may be
    set incorrectly. That is, queue->rx.rsp_cons would point to the rx ring
    buffer entries whose queue->rx_skbs[i] and queue->grant_rx_ref[i] are
    already cleared to NULL. This leads to NULL pointer access in the next
    iteration to process rx ring buffer entries.

    Below is how xennet_poll() does error handling. All remaining entries in
    tmpq are accounted to queue->rx.rsp_cons without assuming how many
    outstanding skbs remain in the list.

    static int xennet_poll(struct napi_struct *napi, int budget)
    ... ...
            if (unlikely(xennet_set_skb_gso(skb, gso))) {
                    __skb_queue_head(&tmpq, skb);
                    queue->rx.rsp_cons += skb_queue_len(&tmpq);
                    goto err;
            }

    It is better to handle errors the same way everywhere.

    Fixes: ad4f15dc2c70 ("xen/netfront: don't bug in case of too many frags")
    Signed-off-by: Dongli Zhang
    Signed-off-by: David S. Miller

    Dongli Zhang
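
The accounting rule from the commit above (skip the failing entry plus everything still queued) can be written as a small helper. This is a sketch under assumptions: `account_on_error` and its parameters are hypothetical names, and the index is modeled as a plain 32-bit counter that wraps naturally.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance the consumer index past the entry that failed (one slot)
 * plus every entry still waiting in the temporary queue.
 */
static uint32_t account_on_error(uint32_t rsp_cons, unsigned int still_queued)
{
    return rsp_cons + 1u + still_queued;   /* unsigned arithmetic wraps at 2^32 */
}
```

Advancing by only one slot when `still_queued` is non-zero would leave the index pointing at entries that were already consumed, which is the failure mode the commit message describes.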
     

31 Jul, 2019

1 commit


17 Apr, 2019

1 commit

  • In preparation to enabling -Wimplicit-fallthrough, mark switch
    cases where we are expecting to fall through.

    This patch fixes the following warning:

    drivers/net/xen-netfront.c: In function ‘netback_changed’:
    drivers/net/xen-netfront.c:2038:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    if (dev->state == XenbusStateClosed)
    ^
    drivers/net/xen-netfront.c:2041:2: note: here
    case XenbusStateClosing:
    ^~~~

    Warning level 3 was used: -Wimplicit-fallthrough=3

    Notice that, in this particular case, the code comment is modified
    in accordance with what GCC is expecting to find.

    This patch is part of the ongoing efforts to enable
    -Wimplicit-fallthrough.

    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Juergen Gross
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
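
A minimal sketch of an intentional, annotated fall-through, in the spirit of the commit above. The enum and function are hypothetical; only the placement of the comment directly before the next case label mirrors what GCC looks for.

```c
#include <assert.h>

enum backend_state { STATE_CONNECTED, STATE_CLOSED, STATE_CLOSING };  /* hypothetical */

/* Returns how many teardown steps run for a given state. */
static int teardown_steps(enum backend_state state, int already_closed)
{
    int steps = 0;

    switch (state) {
    case STATE_CLOSED:
        if (already_closed)
            break;
        steps++;            /* extra work done only for the CLOSED case */
        /* fall through */
    case STATE_CLOSING:
        steps++;            /* shared work for CLOSED and CLOSING */
        break;
    default:
        break;
    }
    return steps;
}
```

Without the comment, building with -Wimplicit-fallthrough would flag the transition from the first case into the second.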
     

21 Mar, 2019

1 commit

  • After the previous patch, all the callers of ndo_select_queue()
    pass netdev_pick_tx as the 'fallback' argument.
    The only exceptions are nested calls to ndo_select_queue(),
    which pass down the 'fallback' available in the current scope
    - still netdev_pick_tx.

    We can drop this argument and replace the fallback() invocation with
    netdev_pick_tx(). This avoids an indirect call per xmit packet
    in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
    with device drivers implementing such ndo. It also cleans up the code
    a bit.

    Tested with ixgbe and CONFIG_FCOE=m

    With pktgen using queue xmit:
    threads   vanilla   patched
              (kpps)    (kpps)
    1         2334      2428
    2         4166      4278
    4         7895      8100

    v1 -> v2:
    - rebased after helper's name change

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
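
The change described above, replacing a function-pointer parameter that always holds the same function with a direct call, can be sketched as follows. All names here (`struct pkt`, `default_pick_tx`, `select_queue_old`, `select_queue_new`) are hypothetical stand-ins and do not reproduce the kernel's signatures.

```c
#include <assert.h>

struct pkt { unsigned int hash; };   /* hypothetical packet stand-in */

/* Stand-in for the core's default queue picker. */
static unsigned int default_pick_tx(const struct pkt *p, unsigned int nqueues)
{
    return p->hash % nqueues;
}

typedef unsigned int (*fallback_fn)(const struct pkt *, unsigned int);

/* Before: the driver hook receives the fallback as a function pointer. */
static unsigned int select_queue_old(const struct pkt *p, unsigned int nqueues,
                                     fallback_fn fallback)
{
    return fallback(p, nqueues);          /* indirect call */
}

/* After: the parameter is dropped and the default picker is called directly. */
static unsigned int select_queue_new(const struct pkt *p, unsigned int nqueues)
{
    return default_pick_tx(p, nqueues);   /* direct call */
}
```

Both variants pick the same queue; the second simply lets the compiler emit a direct call.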
     

21 Dec, 2018

1 commit


19 Dec, 2018

1 commit

  • At least old Xen net backends seem to send frags with no real data
    sometimes. If such a fragment arrives when the frag limit has already
    been reached, the frontend currently hits a BUG even though this
    situation is easily recoverable.

    Modify the BUG_ON() condition accordingly.

    Tested-by: Dietmar Hahn
    Signed-off-by: Juergen Gross
    Signed-off-by: David S. Miller

    Juergen Gross
     

10 Nov, 2018

1 commit

  • RING_PUSH_REQUESTS_AND_CHECK_NOTIFY is already able to make sure the backend
    sees requests before req_prod is updated.

    Signed-off-by: Jacob Wen
    Reviewed-by: Juergen Gross
    Reviewed-by: Wei Liu
    Signed-off-by: David S. Miller

    Jacob Wen
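
The ordering guarantee mentioned above (request contents visible before the producer index moves) has a user-space analogue in C11 atomics. The sketch below is that analogue only, not the Xen ring macro; `struct shared_ring`, `push_request`, `pop_request`, and `RING_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u                    /* hypothetical ring size (power of two) */

struct shared_ring {
    uint32_t slots[RING_SIZE];          /* request payloads */
    _Atomic uint32_t req_prod;          /* producer index visible to the consumer */
};

/* Write the request, then publish the new producer index with release ordering. */
static void push_request(struct shared_ring *r, uint32_t value)
{
    uint32_t prod = atomic_load_explicit(&r->req_prod, memory_order_relaxed);

    r->slots[prod % RING_SIZE] = value;
    /* Release store: the slot write is ordered before the index update. */
    atomic_store_explicit(&r->req_prod, prod + 1, memory_order_release);
}

/* Consumer side: acquire the index, then read the slot. Returns 1 if a request was read. */
static int pop_request(struct shared_ring *r, uint32_t cons, uint32_t *out)
{
    uint32_t prod = atomic_load_explicit(&r->req_prod, memory_order_acquire);

    if (cons == prod)
        return 0;                       /* nothing new */
    *out = r->slots[cons % RING_SIZE];
    return 1;
}
```

Because the publish step already carries the ordering, an extra barrier placed just before it adds nothing, which is the reasoning the commit message gives.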
     

13 Sep, 2018

1 commit

  • Commit 57f230ab04d291 ("xen/netfront: raise max number of slots in
    xennet_get_responses()") raised the max number of allowed slots by one.
    This seems to be problematic in some configurations with netback using
    a larger MAX_SKB_FRAGS value (e.g. an old Linux kernel with MAX_SKB_FRAGS
    defined as 18 instead of today's 17).

    Instead of hitting BUG_ON() in this case, just fall back to retransmission.

    Fixes: 57f230ab04d291 ("xen/netfront: raise max number of slots in xennet_get_responses()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Juergen Gross
    Signed-off-by: David S. Miller

    Juergen Gross
     

08 Sep, 2018

1 commit

  • Commit 822fb18a82aba ("xen-netfront: wait xenbus state change when load
    module manually") added a new wait queue to wait on for a state change
    when the module is loaded manually. Unfortunately there is no wakeup
    anywhere to stop that waiting.

    Instead of introducing a new wait queue, rename the existing
    module_unload_q to module_wq and use it for both purposes (loading and
    unloading).

    As any state change of the backend might be intended to stop waiting,
    do the wake_up_all() in any case when netback_changed() is called.

    Fixes: 822fb18a82aba ("xen-netfront: wait xenbus state change when load module manually")
    Cc: #4.18
    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: David S. Miller

    Juergen Gross
     

15 Aug, 2018

1 commit

  • There is a call trace generated after commit
    2d408c0d4574b01b9ed45e02516888bf925e11a9 ("xen-netfront: fix queue name
    setting"). No 'device/vif/xx-q0-tx' entry is found under /proc/irq/xx/.

    This patch uses only the device type and id as the queue name.

    With the patch, /proc/interrupts looks like below and the warning message is gone:
    70: 21 0 0 0 xen-dyn -event vif0-q0-tx
    71: 15 0 0 0 xen-dyn -event vif0-q0-rx
    72: 14 0 0 0 xen-dyn -event vif0-q1-tx
    73: 33 0 0 0 xen-dyn -event vif0-q1-rx
    74: 12 0 0 0 xen-dyn -event vif0-q2-tx
    75: 24 0 0 0 xen-dyn -event vif0-q2-rx
    76: 19 0 0 0 xen-dyn -event vif0-q3-tx
    77: 21 0 0 0 xen-dyn -event vif0-q3-rx

    Below is call trace information without this patch:

    name 'device/vif/0-q0-tx'
    WARNING: CPU: 2 PID: 37 at fs/proc/generic.c:174 __xlate_proc_name+0x85/0xa0
    RIP: 0010:__xlate_proc_name+0x85/0xa0
    RSP: 0018:ffffb85c40473c18 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000006
    RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff984c7f516930
    RBP: ffffb85c40473cb8 R08: 000000000000002c R09: 0000000000000229
    R10: 0000000000000000 R11: 0000000000000001 R12: ffffb85c40473c98
    R13: ffffb85c40473cb8 R14: ffffb85c40473c50 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff984c7f500000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f69b6899038 CR3: 000000001c20a006 CR4: 00000000001606e0
    Call Trace:
    __proc_create+0x45/0x230
    ? snprintf+0x49/0x60
    proc_mkdir_data+0x35/0x90
    register_handler_proc+0xef/0x110
    ? proc_register+0xfc/0x110
    ? proc_create_data+0x70/0xb0
    __setup_irq+0x39b/0x660
    ? request_threaded_irq+0xad/0x160
    request_threaded_irq+0xf5/0x160
    ? xennet_tx_buf_gc+0x1d0/0x1d0 [xen_netfront]
    bind_evtchn_to_irqhandler+0x3d/0x70
    ? xenbus_alloc_evtchn+0x41/0xa0
    netback_changed+0xa46/0xcda [xen_netfront]
    ? find_watch+0x40/0x40
    xenwatch_thread+0xc5/0x160
    ? finish_wait+0x80/0x80
    kthread+0x112/0x130
    ? kthread_create_worker_on_cpu+0x70/0x70
    ret_from_fork+0x35/0x40
    Code: 81 5c 00 48 85 c0 75 cc 5b 49 89 2e 31 c0 5d 4d 89 3c 24 41 5c 41 5d 41 5e 41 5f c3 4c 89 ee 48 c7 c7 40 4f 0e b4 e8 65 ea d8 ff 0b b8 fe ff ff ff 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 0f 1f
    ---[ end trace 650e5561b0caab3a ]---

    Signed-off-by: Xiao Liang
    Reviewed-by: Juergen Gross

    Signed-off-by: David S. Miller

    Xiao Liang
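
The naming scheme shown in the /proc/interrupts output above (device type plus id, then the queue number) can be sketched as a small helper. This is an assumption-laden illustration: `build_queue_name` is a hypothetical function, and the "vif" prefix is hard-coded here rather than derived from the nodename.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "vif<id>-q<queue>" from a xenbus-style nodename such as "device/vif/0".
 * Returns 0 on success, -1 if the nodename has no id after a '/' or the
 * buffer is too small.
 */
static int build_queue_name(char *buf, size_t len, const char *nodename,
                            unsigned int queue_id)
{
    const char *slash = strrchr(nodename, '/');
    int n;

    if (slash == NULL || slash[1] == '\0')
        return -1;
    n = snprintf(buf, len, "vif%s-q%u", slash + 1, queue_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting name contains no '/' characters, unlike the full nodename that appears in the warning quoted in the call trace.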
     

12 Aug, 2018

2 commits


03 Aug, 2018

1 commit


31 Jul, 2018

1 commit

  • When the module is loaded manually, after xenbus_switch_state() is called
    to initialize the state of the netfront device, the driver state may not
    change quickly enough, which can lead to no device being created on recent
    kernels. This patch adds a wait to make sure xenbus knows the driver is
    not in the closed/unknown state.

    Current state:
    [vm]# ethtool eth0
    Settings for eth0:
    Link detected: yes
    [vm]# modprobe -r xen_netfront
    [vm]# modprobe xen_netfront
    [vm]# ethtool eth0
    Settings for eth0:
    Cannot get device settings: No such device
    Cannot get wake-on-lan settings: No such device
    Cannot get message level: No such device
    Cannot get link status: No such device
    No data available

    With the patch installed:
    [vm]# ethtool eth0
    Settings for eth0:
    Link detected: yes
    [vm]# modprobe -r xen_netfront
    [vm]# modprobe xen_netfront
    [vm]# ethtool eth0
    Settings for eth0:
    Link detected: yes

    Signed-off-by: Xiao Liang
    Signed-off-by: David S. Miller

    Xiao Liang
     

23 Jul, 2018

1 commit

  • Commit f599c64fdf7d ("xen-netfront: Fix race between device setup and
    open") changed the initialization order: xennet_create_queues() now
    happens before we do register_netdev() so using netdev->name in
    xennet_init_queue() is incorrect, we end up with the following in
    /proc/interrupts:

    60: 139 0 xen-dyn -event eth%d-q0-tx
    61: 265 0 xen-dyn -event eth%d-q0-rx
    62: 234 0 xen-dyn -event eth%d-q1-tx
    63: 1 0 xen-dyn -event eth%d-q1-rx

    and this looks ugly. Actually, using the early netdev name (even when it's
    already set) is also not ideal: nowadays we tend to rename eth devices
    and the queue name may end up not corresponding to the netdev name.

    Use the nodename from the xenbus device for queue naming: this can't change
    during the VM's lifetime. Now /proc/interrupts looks like

    62: 202 0 xen-dyn -event device/vif/0-q0-tx
    63: 317 0 xen-dyn -event device/vif/0-q0-rx
    64: 262 0 xen-dyn -event device/vif/0-q1-tx
    65: 17 0 xen-dyn -event device/vif/0-q1-rx

    Fixes: f599c64fdf7d ("xen-netfront: Fix race between device setup and open")
    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Ross Lagerwall
    Signed-off-by: David S. Miller

    Vitaly Kuznetsov
     

10 Jul, 2018

1 commit

  • This patch makes it so that instead of passing a void pointer as the
    accel_priv we instead pass a net_device pointer as sb_dev. Making this
    change allows us to pass the subordinate device through to the fallback
    function eventually so that we can keep the actual code in the
    ndo_select_queue call as focused as possible on the exception cases.

    Signed-off-by: Alexander Duyck
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher

    Alexander Duyck
     

22 Jun, 2018

2 commits

  • Update the features after calling register_netdev(); otherwise the
    device features are not set up correctly and it is not possible to change
    the MTU of the device. After this change, the features reported by
    ethtool match the device's features before the commit which introduced
    the issue and it is possible to change the device's MTU.

    Fixes: f599c64fdf7d ("xen-netfront: Fix race between device setup and open")
    Reported-by: Liam Shepherd
    Signed-off-by: Ross Lagerwall
    Reviewed-by: Juergen Gross
    Signed-off-by: David S. Miller

    Ross Lagerwall
     
  • Fixes: f599c64fdf7d ("xen-netfront: Fix race between device setup and open")
    Reported-by: Ben Hutchings
    Signed-off-by: Ross Lagerwall
    Reviewed-by: Juergen Gross
    Signed-off-by: David S. Miller

    Ross Lagerwall
     

13 Jun, 2018

1 commit


14 May, 2018

1 commit


27 Mar, 2018

1 commit


01 Mar, 2018

1 commit

  • A toolstack may delete the vif frontend and backend xenstore entries
    while xen-netfront is in the removal code path. In that case, the
    checks for xenbus_read_driver_state would return XenbusStateUnknown, and
    xennet_remove would hang indefinitely. This hang prevents system
    shutdown.

    xennet_remove must be able to handle XenbusStateUnknown, and
    netback_changed must wake up the wake_queue for that state as well.

    Fixes: 5b5971df3bc2 ("xen-netfront: remove warning when unloading module")

    Signed-off-by: Jason Andryuk
    Cc: Eduardo Otubo
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Jason Andryuk
     

06 Feb, 2018

1 commit

  • When a netfront device is set up it registers a netdev fairly early on,
    before it has set up the queues and is actually usable. A userspace tool
    like NetworkManager will immediately try to open it and access its state
    as soon as it appears. The bug can be reproduced by hotplugging VIFs
    until the VM runs out of grant refs. It registers the netdev but fails
    to set up any queues (since there are no more grant refs). In the
    meantime, NetworkManager opens the device and the kernel crashes trying
    to access the queues (of which there are none).

    Fix this in two ways:
    * For initial setup, register the netdev much later, after the queues
      are set up. This avoids the race entirely.
    * During a suspend/resume cycle, the frontend reconnects to the backend
      and the queues are recreated. It is possible (though highly unlikely) to
      race with something opening the device and accessing the queues after
      they have been destroyed but before they have been recreated. Extend the
      region covered by the rtnl semaphore to protect against this race. There
      is a possibility that we fail to recreate the queues, so check for this
      in the open function.

    Signed-off-by: Ross Lagerwall
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Ross Lagerwall
     

09 Jan, 2018

1 commit

  • When loading the module after unloading it, the network interface would
    not be enabled and thus wouldn't have a backend counterpart, leaving it
    unusable by the guest.

    The guest would face errors like:

    [root@guest ~]# ethtool -i eth0
    Cannot get driver information: No such device

    [root@guest ~]# ifconfig eth0
    eth0: error fetching interface information: Device not found

    This patch initializes the state of the netfront device whenever it is
    loaded manually. This state tells netback to create its device and
    establish the connection between them.

    Signed-off-by: Eduardo Otubo
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: David S. Miller

    Eduardo Otubo
     

30 Nov, 2017

1 commit

  • Pull networking fixes from David Miller:

    1) The forcedeth conversion from pci_*() DMA interfaces to dma_*() ones
    missed one spot. From Zhu Yanjun.

    2) Missing CRYPTO_SHA256 Kconfig dep in cfg80211, from Johannes Berg.

    3) Fix checksum offloading in thunderx driver, from Sunil Goutham.

    4) Add SPDX to vm_sockets_diag.h, from Stephen Hemminger.

    5) Fix use after free of packet headers in TIPC, from Jon Maloy.

    6) "sizeof(ptr)" vs "sizeof(*ptr)" bug in i40e, from Gustavo A R Silva.

    7) Tunneling fixes in mlxsw driver, from Petr Machata.

    8) Fix crash in fanout_demux_rollover() of AF_PACKET, from Mike
    Maloney.

    9) Fix race in AF_PACKET bind() vs. NETDEV_UP notifier, from Eric
    Dumazet.

    10) Fix regression in sch_sfq.c due to one of the timer_setup()
    conversions. From Paolo Abeni.

    11) SCTP does list_for_each_entry() using wrong struct member, fix from
    Xin Long.

    12) Don't use big endian netlink attribute read for
    IFLA_BOND_AD_ACTOR_SYSTEM, it is in cpu endianness. Also from Xin
    Long.

    13) Fix mis-initialization of q->link.clock in CBQ scheduler, preventing
    adding filters there. From Jiri Pirko.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
    ethernet: dwmac-stm32: Fix copyright
    net: via: via-rhine: use %p to format void * address instead of %x
    net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
    myri10ge: Update MAINTAINERS
    net: sched: cbq: create block for q->link.block
    atm: suni: remove extraneous space to fix indentation
    atm: lanai: use %p to format kernel addresses instead of %x
    VSOCK: Don't set sk_state to TCP_CLOSE before testing it
    atm: fore200e: use %pK to format kernel addresses instead of %x
    ambassador: fix incorrect indentation of assignment statement
    vxlan: use __be32 type for the param vni in __vxlan_fdb_delete
    bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM
    sctp: use right member as the param of list_for_each_entry
    sch_sfq: fix null pointer dereference at timer expiration
    cls_bpf: don't decrement net's refcount when offload fails
    net/packet: fix a race in packet_bind() and packet_notifier()
    packet: fix crash in fanout_demux_rollover()
    sctp: remove extern from stream sched
    sctp: force the params with right types for sctp csum apis
    sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
    ...

    Linus Torvalds
     

28 Nov, 2017

1 commit

  • v2:
    * Replace busy wait with wait_event()/wake_up_all()
    * Cannot guarantee that at the time xennet_remove is called, the
      xen_netback state will not be XenbusStateClosed, so added a
      condition for that
    * There's a small chance that the xen_netback state is
      XenbusStateUnknown by the time the xen_netfront switches to Closed,
      so added a condition for that.

    When unloading module xen_netfront from guest, dmesg would output
    warning messages like below:

    [ 105.236836] xen:grant_table: WARNING: g.e. 0x903 still in use!
    [ 105.236839] deferring g.e. 0x903 (pfn 0x35805)

    This problem stems from netfront and netback being out of sync. By the time
    netfront revokes the g.e.'s, netback has not had enough time to free all of
    them, hence the warnings in dmesg.

    The trick here is to make netfront wait until netback frees all the g.e.'s
    and only then continue to clean up for the module removal, and this is done by
    manipulating both device states.

    Signed-off-by: Eduardo Otubo
    Acked-by: Juergen Gross
    Signed-off-by: David S. Miller

    Eduardo Otubo
     

22 Nov, 2017

1 commit

  • This converts all remaining cases of the old setup_timer() API into using
    timer_setup(), where the callback argument is the structure already
    holding the struct timer_list. These should have no behavioral changes,
    since they just change which pointer is passed into the callback with
    the same available pointers after conversion. It handles the following
    examples, in addition to some other variations.

    Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

    and forced object casts:

    void my_callback(struct something *ptr)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

    become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

    Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    ptr->my_timer.function = my_callback;

    have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

    And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

    have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

    The conversion is done with the following Coccinelle script:

    spatch --very-quiet --all-includes --include-headers \
    -I ./arch/x86/include -I ./arch/x86/include/generated \
    -I ./include -I ./arch/x86/include/uapi \
    -I ./arch/x86/include/generated/uapi -I ./include/uapi \
    -I ./include/generated/uapi --include ./include/linux/kconfig.h \
    --dir . \
    --cocci-file ~/src/data/timer_setup.cocci

    @fix_address_of@
    expression e;
    @@

    setup_timer(
    -&(e)
    +&e
    , ...)

    // Update any raw setup_timer() usages that have a NULL callback, but
    // would otherwise match change_timer_function_usage, since the latter
    // will update all function assignments done in the face of a NULL
    // function initialization in setup_timer().
    @change_timer_function_usage_NULL@
    expression _E;
    identifier _timer;
    type _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, NULL, _E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, &_E);
    +timer_setup(&_E._timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
    +timer_setup(&_E._timer, NULL, 0);
    )

    @change_timer_function_usage@
    expression _E;
    identifier _timer;
    struct timer_list _stl;
    identifier _callback;
    type _cast_func, _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, _callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    _E->_timer@_stl.function = _callback;
    |
    _E->_timer@_stl.function = &_callback;
    |
    _E->_timer@_stl.function = (_cast_func)_callback;
    |
    _E->_timer@_stl.function = (_cast_func)&_callback;
    |
    _E._timer@_stl.function = _callback;
    |
    _E._timer@_stl.function = &_callback;
    |
    _E._timer@_stl.function = (_cast_func)_callback;
    |
    _E._timer@_stl.function = (_cast_func)&_callback;
    )

    // callback(unsigned long arg)
    @change_callback_handle_cast
    depends on change_timer_function_usage@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    (
    ... when != _origarg
    _handletype *_handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    )
    }

    // callback(unsigned long arg) without existing variable
    @change_callback_handle_cast_no_arg
    depends on change_timer_function_usage &&
    !change_callback_handle_cast@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    + _handletype *_origarg = from_timer(_origarg, t, _timer);
    +
    ... when != _origarg
    - (_handletype *)_origarg
    + _origarg
    ... when != _origarg
    }

    // Avoid already converted callbacks.
    @match_callback_converted
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    { ... }

    // callback(struct something *handle)
    @change_callback_handle_arg
    depends on change_timer_function_usage &&
    !match_callback_converted &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_handletype *_handle
    +struct timer_list *t
    )
    {
    + _handletype *_handle = from_timer(_handle, t, _timer);
    ...
    }

    // If change_callback_handle_arg ran on an empty function, remove
    // the added handler.
    @unchange_callback_handle_arg
    depends on change_timer_function_usage &&
    change_callback_handle_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    {
    - _handletype *_handle = from_timer(_handle, t, _timer);
    }

    // We only want to refactor the setup_timer() data argument if we've found
    // the matching callback. This undoes changes in change_timer_function_usage.
    @unchange_timer_function_usage
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg &&
    !change_callback_handle_arg@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type change_timer_function_usage._cast_data;
    @@

    (
    -timer_setup(&_E->_timer, _callback, 0);
    +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    |
    -timer_setup(&_E._timer, _callback, 0);
    +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    )

    // If we fixed a callback from a .function assignment, fix the
    // assignment cast now.
    @change_timer_function_assignment
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_func;
    typedef TIMER_FUNC_TYPE;
    @@

    (
    _E->_timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -&_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    )

    // Sometimes timer functions are called directly. Replace matched args.
    @change_timer_function_calls
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression _E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_data;
    @@

    _callback(
    (
    -(_cast_data)_E
    +&_E->_timer
    |
    -(_cast_data)&_E
    +&_E._timer
    |
    -_E
    +&_E->_timer
    )
    )

    // If a timer has been configured without a data argument, it can be
    // converted without regard to the callback argument, since it is unused.
    @match_timer_function_unused_data@
    expression _E;
    identifier _timer;
    identifier _callback;
    @@

    (
    -setup_timer(&_E->_timer, _callback, 0);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0L);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0UL);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0L);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0UL);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0L);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0UL);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0L);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0UL);
    +timer_setup(_timer, _callback, 0);
    )

    @change_callback_unused_data
    depends on match_timer_function_unused_data@
    identifier match_timer_function_unused_data._callback;
    type _origtype;
    identifier _origarg;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *unused
    )
    {
    ... when != _origarg
    }
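
A minimal userspace sketch of the callback shape these rules convert to. The `struct timer_list` here is a stand-in, not the kernel's definition, and `struct demo_dev`, `demo_setup()`, `demo_timeout()` and `demo_fire()` are hypothetical names. Only the `container_of()` idiom reflects what the kernel's `from_timer()` helper relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct timer_list (assumption: only the
 * callback pointer matters for this illustration). */
struct timer_list {
	void (*function)(struct timer_list *t);
};

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical driver structure that embeds its timer. */
struct demo_dev {
	int fired;
	struct timer_list timer;
};

/* New-style callback: it receives the timer and recovers the object
 * that embeds it, instead of taking an unsigned long "data" cookie. */
static void demo_timeout(struct timer_list *t)
{
	struct demo_dev *dev = container_of(t, struct demo_dev, timer);

	dev->fired++;
}

/* Plays the role of timer_setup(&dev->timer, demo_timeout, 0). */
static void demo_setup(struct demo_dev *dev)
{
	dev->fired = 0;
	dev->timer.function = demo_timeout;
}

/* A direct call passes &dev->timer, as the rewritten call sites do. */
static int demo_fire(struct demo_dev *dev)
{
	assert(dev->timer.function != NULL);
	dev->timer.function(&dev->timer);
	return dev->fired;
}
```

Because the callback derives its context from the timer pointer, a call site that used to pass a cast `data` value only needs to pass the address of the embedded timer.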

    Signed-off-by: Kees Cook

    Kees Cook
     

17 Oct, 2017

1 commit

  • RFC791 specifies the minimum MTU to be 68, while xen-net{front|back}
    drivers use a minimum value of 0.

    When the MTU is set to a value from 0 to 67 with the xen_net{front|back}
    driver, the network becomes unreachable immediately and the guest can no
    longer be pinged.

    xen_net{front|back} should not allow the user to set a value that causes
    network problems.
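
A minimal sketch of the kind of range check this implies, written as plain userspace C. `demo_change_mtu()`, `demo_mtu_is_valid()` and `DEMO_MAX_MTU` are hypothetical and are not the driver's code. The lower bound of 68 is the RFC 791 value, which the kernel also exposes as `ETH_MIN_MTU`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimum IPv4 MTU per RFC 791 (same value as the kernel's ETH_MIN_MTU). */
#define DEMO_MIN_MTU 68

/* Hypothetical upper bound, chosen only for this illustration. */
#define DEMO_MAX_MTU 65535

static bool demo_mtu_is_valid(int new_mtu)
{
	return new_mtu >= DEMO_MIN_MTU && new_mtu <= DEMO_MAX_MTU;
}

/* Reject out-of-range values and leave the current MTU untouched. */
static int demo_change_mtu(int *cur_mtu, int new_mtu)
{
	assert(cur_mtu != NULL);
	if (!demo_mtu_is_valid(new_mtu))
		return -EINVAL;
	*cur_mtu = new_mtu;
	return 0;
}
```

With such a check in place, a request for an MTU of 0 to 67 fails up front instead of silently cutting the guest off from the network.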

    Reported-by: Chen Shi
    Signed-off-by: Mohammed Gamal
    Acked-by: Wei Liu
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Mohammed Gamal
     

31 Aug, 2017

1 commit

  • xennet_start_xmit() might copy an skb with an inappropriate layout
    into a fresh one.

    The old skb is freed, and at this point it is not a drop but
    a consume. The new skb will then be either consumed or dropped.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 May, 2017

1 commit

  • Unavoidable crashes in netfront_resume() and netback_changed() were
    discovered after a previous failure in talk_to_netback() (e.g. when we
    fail to read the MAC from xenstore). The failure path in talk_to_netback()
    unregisters and frees the netdev, but we don't reset drvdata and we try
    accessing it after resume.

    Fix the bug by removing the whole xen device with device_unregister().
    This guarantees we won't have any calls into netfront after a failure.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: David S. Miller

    Vitaly Kuznetsov
     

23 Feb, 2017

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
    Varadhan.

    2) Simplify classifier state on sk_buff in order to shrink it a bit.
    From Willem de Bruijn.

    3) Introduce SIPHASH and its usage for secure sequence numbers and
    syncookies. From Jason A. Donenfeld.

    4) Reduce CPU usage for ICMP replies we are going to limit or
    suppress, from Jesper Dangaard Brouer.

    5) Introduce Shared Memory Communications socket layer, from Ursula
    Braun.

    6) Add RACK loss detection and allow it to actually trigger fast
    recovery instead of just assisting after other algorithms have
    triggered it. From Yuchung Cheng.

    7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

    8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

    9) Export MPLS packet stats via netlink, from Robert Shearman.

    10) Significantly improve inet port bind conflict handling, especially
    when an application is restarted and changes its setting of
    reuseport. From Josef Bacik.

    11) Implement TX batching in vhost_net, from Jason Wang.

    12) Extend the dummy device so that VF (virtual function) features,
    such as configuration, can be more easily tested. From Phil
    Sutter.

    13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
    Dumazet.

    14) Add new bpf MAP, implementing a longest prefix match trie. From
    Daniel Mack.

    15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

    16) Add new aquantia driver, from David VomLehn.

    17) Add bpf tracepoints, from Daniel Borkmann.

    18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
    Florian Fainelli.

    19) Remove custom busy polling in many drivers; it has been done in the
    core networking code since 4.5. From Eric Dumazet.

    20) Support XDP adjust_head in virtio_net, from John Fastabend.

    21) Fix several major holes in neighbour entry confirmation, from
    Julian Anastasov.

    22) Add XDP support to bnxt_en driver, from Michael Chan.

    23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

    24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

    25) Support GRO in IPSEC protocols, from Steffen Klassert"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
    Revert "ath10k: Search SMBIOS for OEM board file extension"
    net: socket: fix recvmmsg not returning error from sock_error
    bnxt_en: use eth_hw_addr_random()
    bpf: fix unlocking of jited image when module ronx not set
    arch: add ARCH_HAS_SET_MEMORY config
    net: napi_watchdog() can use napi_schedule_irqoff()
    tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
    net/hsr: use eth_hw_addr_random()
    net: mvpp2: enable building on 64-bit platforms
    net: mvpp2: switch to build_skb() in the RX path
    net: mvpp2: simplify MVPP2_PRS_RI_* definitions
    net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
    net: mvpp2: remove unused register definitions
    net: mvpp2: simplify mvpp2_bm_bufs_add()
    net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
    net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
    net: mvpp2: release reference to txq_cpu[] entry after unmapping
    net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
    net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
    net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
    ...

    Linus Torvalds
     

22 Feb, 2017

1 commit

  • Pull xen updates from Juergen Gross:
    "Xen features and fixes:

    - a series from Boris Ostrovsky adding support for booting Linux as
    Xen PVH guest

    - a series from Juergen Gross streamlining the xenbus driver

    - a series from Paul Durrant adding support for the new device model
    hypercall

    - several small corrections"

    * tag 'for-linus-4.11-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/privcmd: add IOCTL_PRIVCMD_RESTRICT
    xen/privcmd: Add IOCTL_PRIVCMD_DM_OP
    xen/privcmd: return -ENOTTY for unimplemented IOCTLs
    xen: optimize xenbus driver for multiple concurrent xenstore accesses
    xen: modify xenstore watch event interface
    xen: clean up xenbus internal headers
    xenbus: Neaten xenbus_va_dev_error
    xen/pvh: Use Xen's emergency_restart op for PVH guests
    xen/pvh: Enable CPU hotplug
    xen/pvh: PVH guests always have PV devices
    xen/pvh: Initialize grant table for PVH guests
    xen/pvh: Make sure we don't use ACPI_IRQ_MODEL_PIC for SCI
    xen/pvh: Bootstrap PVH guest
    xen/pvh: Import PVH-related Xen public interfaces
    xen/x86: Remove PVH support
    x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C
    xen/manage: correct return value check on xenbus_scanf()
    x86/xen: Fix APIC id mismatch warning on Intel
    xen/netback: set default upper limit of tx/rx queues to 8
    xen/netfront: set default upper limit of tx/rx queues to 8

    Linus Torvalds
     

11 Feb, 2017

2 commits


10 Feb, 2017

2 commits

  • This fixes a crash when running out of grant refs when creating many
    queues across many netdevs.

    * If creating queues fails (i.e. there are no grant refs available),
    call xenbus_dev_fatal() to ensure that the xenbus device is set to the
    closed state.
    * If no queues are created, don't call xennet_disconnect_backend as
    netdev->real_num_tx_queues will not have been set correctly.
    * If setup_netfront() fails, ensure that all the queues created are
    cleaned up, not just those that have been set up.
    * If any queues were set up and an error occurs, call
    xennet_destroy_queues() to clean up the napi context.
    * If any fatal error occurs, unregister and destroy the netdev to avoid
    leaving around a half setup network device.
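
The cleanup rules above can be sketched as an all-or-nothing setup routine. This is a userspace illustration under stated assumptions: `struct demo_pool`, `struct demo_queue`, `struct demo_netdev` and every `demo_*` function are hypothetical stand-ins for the grant reference pool, the queues and the netdev, and `closed` merely models the effect of reporting a fatal xenbus error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the pool of free grant references. */
struct demo_pool {
	unsigned int free_refs;
};

struct demo_queue {
	unsigned int refs_held;
};

struct demo_netdev {
	struct demo_queue *queues;
	unsigned int num_created;	/* entries allocated in queues[] */
	bool closed;			/* models the "closed" device state */
};

#define DEMO_REFS_PER_QUEUE 4u

static int demo_queue_init(struct demo_pool *pool, struct demo_queue *q)
{
	if (pool->free_refs < DEMO_REFS_PER_QUEUE)
		return -1;	/* out of grant references */
	pool->free_refs -= DEMO_REFS_PER_QUEUE;
	q->refs_held = DEMO_REFS_PER_QUEUE;
	return 0;
}

/* Walk every created queue, not only the ones that finished setup;
 * queues that never got that far simply hold zero references. */
static void demo_destroy_queues(struct demo_pool *pool,
				struct demo_netdev *dev)
{
	for (unsigned int i = 0; i < dev->num_created; i++)
		pool->free_refs += dev->queues[i].refs_held;
	free(dev->queues);
	dev->queues = NULL;
	dev->num_created = 0;
}

/* On any failure, undo everything and mark the device closed rather
 * than leaving it half set up. */
static int demo_create_queues(struct demo_pool *pool,
			      struct demo_netdev *dev, unsigned int wanted)
{
	assert(wanted > 0);
	dev->closed = false;
	dev->num_created = 0;
	dev->queues = calloc(wanted, sizeof(*dev->queues));
	if (!dev->queues) {
		dev->closed = true;
		return -1;
	}
	dev->num_created = wanted;
	for (unsigned int i = 0; i < wanted; i++) {
		if (demo_queue_init(pool, &dev->queues[i]) < 0) {
			demo_destroy_queues(pool, dev);
			dev->closed = true;
			return -1;
		}
	}
	return 0;
}
```

The point of the design is that a failed setup returns every reference to the pool, so a later attempt with fewer queues can still succeed.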

    Signed-off-by: Ross Lagerwall
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: David S. Miller

    Ross Lagerwall
     
  • The commit 90c311b0eeea ("xen-netfront: Fix Rx stall during network
    stress and OOM") caused the refill timer to be triggered on almost
    all invocations of xennet_alloc_rx_buffers for certain workloads.
    This reworks the fix by reverting to the old behaviour and taking
    skb allocation failure into consideration. The refill timer is now
    triggered on insufficient requests or skb allocation failure.
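
A minimal sketch of the resulting decision, assuming free-running 32-bit ring indices. `struct demo_rx_ring`, `demo_need_refill_timer()` and the `DEMO_RX_SLOTS_MIN` threshold are hypothetical. The driver has its own structures and its own minimum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed threshold for this illustration only. */
#define DEMO_RX_SLOTS_MIN 19u

/* Hypothetical view of the two ring indices that matter here. */
struct demo_rx_ring {
	uint32_t req_prod;	/* requests produced by the frontend */
	uint32_t rsp_cons;	/* responses consumed by the frontend */
};

/* Arm the refill timer when too few requests are outstanding or when
 * allocating a buffer failed. The unsigned subtraction stays correct
 * when the indices wrap around. */
static bool demo_need_refill_timer(const struct demo_rx_ring *ring,
				   bool alloc_failed)
{
	uint32_t outstanding;

	assert(ring != NULL);
	outstanding = ring->req_prod - ring->rsp_cons;
	return outstanding < DEMO_RX_SLOTS_MIN || alloc_failed;
}
```

When enough requests are outstanding and no allocation failed, the function returns false, which corresponds to leaving the timer alone on the common path.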

    Signed-off-by: Vineeth Remanan Pillai
    Fixes: 90c311b0eeea ("xen-netfront: Fix Rx stall during network stress and OOM")
    Reported-by: Boris Ostrovsky
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: David S. Miller

    Vineeth Remanan Pillai
     

31 Jan, 2017

1 commit

  • napi_complete_done() allows opting in to gro_flush_timeout,
    added back in linux-3.19 by commit 3b47d30396ba
    ("net: gro: add a per device gro flush timer").

    This allows for more efficient GRO aggregation without
    sacrificing latency.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Jan, 2017

1 commit

  • The default for the number of tx/rx queues of one interface is
    currently the number of vcpus of the system. As each queue pair
    reserves 512 grant pages, this default consumes a ridiculous number
    of grants for large guests.

    Limit the number of queues to 8 by default. This value can be modified
    via a module parameter if required.
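
A minimal sketch of the arithmetic behind this change. `demo_pick_max_queues()` and `demo_grants_needed()` are hypothetical helpers, and treating a module parameter of 0 as "not set" is an assumption made for the illustration. The figure of 512 grant pages per queue pair is the one quoted above.

```c
#include <assert.h>

#define DEMO_MAX_QUEUES_DEFAULT 8u
#define DEMO_GRANTS_PER_QUEUE_PAIR 512u	/* figure from the commit message */

/* Assumption: module_param == 0 means the administrator set nothing,
 * so the default cap applies. */
static unsigned int demo_pick_max_queues(unsigned int module_param,
					 unsigned int online_cpus)
{
	assert(online_cpus > 0);
	if (module_param != 0)
		return module_param;
	return online_cpus < DEMO_MAX_QUEUES_DEFAULT ?
	       online_cpus : DEMO_MAX_QUEUES_DEFAULT;
}

/* Grant pages reserved by one interface with the given queue count. */
static unsigned int demo_grants_needed(unsigned int queues)
{
	return queues * DEMO_GRANTS_PER_QUEUE_PAIR;
}
```

For a guest with 64 vcpus, one queue pair per vcpu would reserve 64 × 512 = 32768 grant pages per interface, while the cap of 8 brings that down to 4096.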

    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Juergen Gross