21 Jun, 2020

1 commit

  • This will be useful to allow busy poll for tunneled traffic. For busy
    polling on sessions over tunnels, the underlying physical device's
    queues need to be polled.

    Tunnels schedule NAPI either via netif_rx() for the backlog queue or
    via gro_cell_poll(). netif_rx() propagates the valid skb->napi_id to
    the socket. gro_cell_poll(), on the other hand, stamps skb->napi_id
    again by calling skb_mark_napi_id() with the tunnel NAPI, which is not
    a busy poll candidate. This was preventing tunneled traffic from using
    busy poll. A valid NAPI ID in the skb indicates it was already marked
    for busy poll by a NAPI driver and hence needs to be copied into the
    socket (see the sketch after this entry).

    Signed-off-by: Amritha Nambiar
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Amritha Nambiar
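    A minimal sketch of the idea described above, assuming the change is a
    guard in skb_mark_napi_id() in include/net/busy_poll.h so that an skb
    which already carries a valid NAPI ID is not re-stamped by the tunnel's
    NAPI; the exact guard shown here is an assumption based on the text of
    the entry:

        /* Sketch only: skip re-stamping skbs that already carry a valid
         * NAPI ID, so the driver-assigned ID survives the tunnel's
         * gro_cell_poll() path and can later be copied into the socket.
         */
        static inline void skb_mark_napi_id(struct sk_buff *skb,
                                            struct napi_struct *napi)
        {
        #ifdef CONFIG_NET_RX_BUSY_POLL
                /* IDs below MIN_NAPI_ID mean "not set by a real NAPI driver". */
                if (skb->napi_id < MIN_NAPI_ID)
                        skb->napi_id = napi->napi_id;
        #endif
        }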
     

31 Oct, 2019

1 commit

  • We already annotated most accesses to sk->sk_napi_id.

    We missed sk_mark_napi_id() and sk_mark_napi_id_once(),
    which might be called without the socket lock held in the UDP stack
    (see the sketch after this entry).

    KCSAN reported :
    BUG: KCSAN: data-race in udpv6_queue_rcv_one_skb / udpv6_queue_rcv_one_skb

    write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 0:
    sk_mark_napi_id include/net/busy_poll.h:125 [inline]
    __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
    udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
    udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
    udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
    __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
    udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
    ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
    ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
    dst_input include/net/dst.h:442 [inline]
    ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
    process_backlog+0x1d3/0x420 net/core/dev.c:5955
    napi_poll net/core/dev.c:6392 [inline]
    net_rx_action+0x3ae/0xa90 net/core/dev.c:6460

    write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 1:
    sk_mark_napi_id include/net/busy_poll.h:125 [inline]
    __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
    udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
    udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
    udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
    __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
    udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
    ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
    ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
    dst_input include/net/dst.h:442 [inline]
    ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
    process_backlog+0x1d3/0x420 net/core/dev.c:5955

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 10890 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Fixes: e68b6e50fa35 ("udp: enable busy polling for all sockets")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
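    A minimal sketch of the kind of annotation described above, assuming
    the fix is to perform the lockless writes to sk->sk_napi_id with
    WRITE_ONCE() (and the read in the _once variant with READ_ONCE()) in
    these two busy_poll.h helpers; the exact form is an assumption based
    on the KCSAN report:

        /* Sketch only: annotate the lockless stores to sk->sk_napi_id that
         * KCSAN flagged when two CPUs deliver to the same UDP socket.
         */
        static inline void sk_mark_napi_id(struct sock *sk,
                                           const struct sk_buff *skb)
        {
        #ifdef CONFIG_NET_RX_BUSY_POLL
                WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
        #endif
        }

        static inline void sk_mark_napi_id_once(struct sock *sk,
                                                const struct sk_buff *skb)
        {
        #ifdef CONFIG_NET_RX_BUSY_POLL
                if (!READ_ONCE(sk->sk_napi_id))
                        WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
        #endif
        }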
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not write to the free
    software foundation inc 51 franklin st fifth floor boston ma 02110
    1301 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 111 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190530000436.567572064@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

31 Jul, 2018

2 commits


02 Jul, 2018

1 commit

  • This patch adds a new field to sock_common, 'skc_rx_queue_mapping',
    which holds the receive queue number for the connection. The Rx queue
    is marked in tcp_finish_connect() to allow a client app to query
    SO_INCOMING_NAPI_ID after a connect() call and get the right queue
    association for the socket (see the usage sketch after this entry).
    The Rx queue is also marked in tcp_conn_request() so that the SYN-ACK
    goes out on the Tx queue associated with the queue on which the SYN
    was received.

    Signed-off-by: Amritha Nambiar
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Amritha Nambiar
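    A minimal userspace sketch of the flow this entry enables: after
    connect(), a client reads SO_INCOMING_NAPI_ID to learn which NAPI
    context services the connection. The address, port and fallback
    sockopt value are illustrative assumptions:

        #include <stdio.h>
        #include <unistd.h>
        #include <arpa/inet.h>
        #include <sys/socket.h>

        #ifndef SO_INCOMING_NAPI_ID
        #define SO_INCOMING_NAPI_ID 56  /* value from asm-generic/socket.h */
        #endif

        int main(void)
        {
                struct sockaddr_in addr = {
                        .sin_family = AF_INET,
                        .sin_port   = htons(8000),      /* example port */
                };
                unsigned int napi_id = 0;
                socklen_t len = sizeof(napi_id);
                int fd = socket(AF_INET, SOCK_STREAM, 0);

                inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr); /* example server */
                if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                        return 1;

                /* With the Rx queue marked in tcp_finish_connect(), the app
                 * can now discover the queue association for this socket.
                 */
                if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID,
                               &napi_id, &len) == 0)
                        printf("connection serviced by NAPI ID %u\n", napi_id);

                close(fd);
                return 0;
        }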
     

26 May, 2018

1 commit


12 Aug, 2017

1 commit

  • MIN_NAPI_ID is used in various places outside of
    CONFIG_NET_RX_BUSY_POLL wrapping, so when it's not set
    we run into build errors such as:

    net/core/dev.c: In function 'dev_get_by_napi_id':
    net/core/dev.c:886:16: error: ‘MIN_NAPI_ID’ undeclared (first use in this function)
    if (napi_id < MIN_NAPI_ID)
    ^~~~~~~~~~~

    Thus, have MIN_NAPI_ID always defined to fix these errors (see the
    sketch after this entry).

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
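    A sketch of the shape of the fix, assuming MIN_NAPI_ID is simply moved
    outside the CONFIG_NET_RX_BUSY_POLL guard in include/net/busy_poll.h;
    the comment reflects the NAPI ID layout implied by the value:

        /* Sketch only: define MIN_NAPI_ID unconditionally so users outside
         * CONFIG_NET_RX_BUSY_POLL, such as dev_get_by_napi_id(), still build.
         *
         *              0 - reserved, means "NAPI ID not set"
         *     1..NR_CPUS - reserved for skb->sender_cpu
         *  NR_CPUS+1..~0 - region available for real NAPI IDs
         */
        #define MIN_NAPI_ID ((unsigned int)(NR_CPUS + 1))

        #ifdef CONFIG_NET_RX_BUSY_POLL
        /* busy-poll helpers that still depend on the config option ... */
        #endif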
     

25 Mar, 2017

5 commits

  • Move the core functionality of sk_busy_loop() into napi_busy_loop() and
    make it independent of the socket.

    This enables re-using the function in the epoll busy loop
    implementation (see the sketch following this group of commits).

    Signed-off-by: Sridhar Samudrala
    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Sridhar Samudrala
     
  • This patch flips the logic we were using to determine whether busy
    polling has timed out. The main motivation is that we will need to
    support two different timeout values in the future; by recording the
    start time rather than the deadline, we can make the end time specific
    to the task, be it epoll or socket based polling.

    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • Only a few places actually check the return value of sk_busy_loop.
    Since the data being checked for can be replaced with a check for
    !skb_queue_empty(), we might as well pull the code out of sk_busy_loop
    and place it in the spots that actually need it.

    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • Instead of defining two versions of skb_mark_napi_id, it is more
    readable to match the format of the sk_mark_napi_id functions and wrap
    the contents of a single function in the config ifdef. This saves a few
    lines of code, since only two ifdef/endif lines are needed instead of
    the five needed for the extra function declaration (see the sketch
    following this group of commits).

    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch is a cleanup/fix for NAPI IDs following the changes that made
    it so that sender_cpu and napi_id do a better job of sharing the same
    location in the sk_buff.

    One issue I found is that we weren't validating the napi_id before we
    started trying to set up busy polling. This change corrects that by
    using the MIN_NAPI_ID value that is now used both in allocating the
    NAPI IDs and in validating them (see the sketch following this group
    of commits).

    Signed-off-by: Alexander Duyck
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
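    A combined sketch of how the pieces from this group of commits could
    fit together in busy_poll.h: skb_mark_napi_id() defined once with only
    its body guarded, the core loop extracted into a socket-agnostic
    napi_busy_loop() that records the start time, and sk_busy_loop()
    validating the socket's NAPI ID against MIN_NAPI_ID before delegating.
    The exact signatures and the sk_busy_loop_end() callback name are
    assumptions based on the commit descriptions:

        /* Sketch only: one definition whose body is wrapped in the config
         * guard, instead of two versions of the whole function.
         */
        static inline void skb_mark_napi_id(struct sk_buff *skb,
                                            struct napi_struct *napi)
        {
        #ifdef CONFIG_NET_RX_BUSY_POLL
                skb->napi_id = napi->napi_id;
        #endif
        }

        /* Sketch only: the core loop knows nothing about sockets; it records
         * the start time and asks the loop_end callback whether the
         * caller-specific budget (socket or epoll) has run out.
         */
        void napi_busy_loop(unsigned int napi_id,
                            bool (*loop_end)(void *, unsigned long),
                            void *loop_end_arg);
        bool sk_busy_loop_end(void *p, unsigned long start_time);

        static inline void sk_busy_loop(struct sock *sk, int nonblock)
        {
        #ifdef CONFIG_NET_RX_BUSY_POLL
                unsigned int napi_id = sk->sk_napi_id;

                /* Only IDs >= MIN_NAPI_ID refer to real NAPI contexts. */
                if (napi_id >= MIN_NAPI_ID)
                        napi_busy_loop(napi_id,
                                       nonblock ? NULL : sk_busy_loop_end, sk);
        #endif
        }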
     

02 Mar, 2017

2 commits


14 Feb, 2017

1 commit


18 Nov, 2016

1 commit

  • UDP busy polling is restricted to connected UDP sockets.

    This is because sk_busy_loop() only takes care of one NAPI context.

    There are cases where it could be extended.

    1) Some hosts receive traffic on a single NIC, with one RX queue.

    2) Some applications use SO_REUSEPORT and an associated BPF filter
    to split the incoming traffic onto one UDP socket per RX
    queue/thread/cpu

    3) Some UDP sockets are used to send/receive traffic for one flow, but
    they do not bother with connect()

    This patch records the napi_id of the first received skb, extending
    the reach of busy polling (see the sketch after this entry).

    Tested:

    lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
    lpaa24:~# echo 70 >/proc/sys/net/core/busy_read

    lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done

    Before patch :
    27867 28870 37324 41060 41215
    36764 36838 44455 41282 43843
    After patch :
    73920 73213 70147 74845 71697
    68315 68028 75219 70082 73707

    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
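    A minimal sketch of the "record the napi_id of the first received skb"
    idea, assuming a helper along the lines of sk_mark_napi_id_once() that
    only stamps the socket when no NAPI ID has been recorded yet:

        /* Sketch only: remember the NAPI ID of the first skb that reaches an
         * unconnected UDP socket, so later busy polling has a queue to spin on.
         */
        static inline void sk_mark_napi_id_once(struct sock *sk,
                                                const struct sk_buff *skb)
        {
        #ifdef CONFIG_NET_RX_BUSY_POLL
                if (!sk->sk_napi_id)
                        sk->sk_napi_id = skb->napi_id;
        #endif
        }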
     

17 Nov, 2016

1 commit


19 Nov, 2015

1 commit

  • There is really little gain from inlining this big function.
    We'll soon make it even bigger in the following patches.

    This means we no longer need to export napi_by_id().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Jan, 2014

1 commit

  • The only valid use of preempt_enable_no_resched() is if the very next
    line is schedule() or if we know preemption cannot actually be enabled
    by that statement due to known additional preempt_count 'refs'.

    This busy_poll stuff looks to be completely and utterly broken;
    sched_clock() can return utter garbage with interrupts enabled (rare
    but still) and it can drift unbounded between CPUs.

    This means that if you get preempted/migrated and your new CPU is
    years behind the previous CPU, we get to busy spin for a _very_ long
    time.

    There is a _REASON_ sched_clock() warns about preemptability -
    papering over it with a preempt_disable()/preempt_enable_no_resched()
    is just terminal brain damage on so many levels.

    Replace sched_clock() usage with local_clock(), which has a bounded
    drift between CPUs. A sketch of the resulting clock helper follows
    this entry.

    Signed-off-by: Peter Zijlstra
    Cc: David S. Miller
    Cc: rui.zhang@intel.com
    Cc: jacob.jun.pan@linux.intel.com
    Cc: Mike Galbraith
    Cc: hpa@zytor.com
    Cc: Arjan van de Ven
    Cc: lenb@kernel.org
    Cc: rjw@rjwysocki.net
    Cc: Eliezer Tamir
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
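    A minimal sketch of what the busy-poll time source could look like
    after this change, assuming a busy_loop_us_clock()-style helper built
    on local_clock(); the helper name and the exact unit conversion are
    assumptions:

        /* Sketch only: derive an approximate-microsecond timestamp for the
         * busy-poll loop from local_clock(), which (unlike raw sched_clock())
         * has bounded drift across CPUs, so no preempt_disable() games are
         * needed around it.
         */
        static inline u64 busy_loop_us_clock(void)
        {
                /* local_clock() returns nanoseconds; >> 10 approximates /1000. */
                return local_clock() >> 10;
        }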
     

29 Aug, 2013

1 commit

  • Add a cpu_relax() to sk_busy_loop.

    Julie Cummings reported performance issues when hyperthreading is on.
    Arjan van de Ven observed that we should have a cpu_relax() in the
    busy poll loop (see the sketch after this entry).

    Reported-by: Julie Cummings
    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
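    A sketch of the pattern rather than the actual sk_busy_loop() body:
    cpu_relax() inside a tight polling loop lets a hyperthread sibling make
    progress while this CPU spins. The helper names are hypothetical
    stand-ins for the real NAPI poll and time-budget checks:

        /* Hypothetical hooks standing in for the driver poll and the
         * busy-poll time budget; sketch only.
         */
        bool example_poll_once(void);
        bool example_budget_left(void);

        static bool example_busy_loop(void)
        {
                do {
                        if (example_poll_once())
                                return true;
                        /* Hint to the CPU that we are spinning, so an HT
                         * sibling can use the execution resources.
                         */
                        cpu_relax();
                } while (example_budget_left());

                return false;
        }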
     

10 Aug, 2013

1 commit

  • Rename the MIB counter from "low latency" to "busy poll".

    v1 also moved the counter to the IP MIB (suggested by Shawn Bohrer),
    but Eric Dumazet suggested that the current location is better.

    So v2 just renames the counter to fit the new naming convention.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

05 Aug, 2013

1 commit

  • When renaming ll_poll to busy poll, I introduced a typo
    in the name of the do-nothing placeholder for sk_busy_loop
    and called it sk_busy_poll.
    This broke the build when busy poll was not configured.
    Cong Wang submitted a patch to fix that.
    This patch removes the now redundant, misspelled placeholder.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

02 Aug, 2013

2 commits


11 Jul, 2013

3 commits