29 Nov, 2020

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2020-11-28

    1) Do not reference the skb for xsk's generic TX side since when looped
    back into RX it might crash in generic XDP, from Björn Töpel.

    2) Fix umem cleanup on a partially set up xsk socket when being destroyed,
    from Magnus Karlsson.

    3) Fix an incorrect netdev reference count when failing xsk_bind() operation,
    from Marek Majtyka.

    4) Fix bpftool to set an error code on failed calloc() in build_btf_type_table(),
    from Zhen Lei.

    * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
    bpf: Add MAINTAINERS entry for BPF LSM
    bpftool: Fix error return value in build_btf_type_table
    net, xsk: Avoid taking multiple skbuff references
    xsk: Fix incorrect netdev reference count
    xsk: Fix umem cleanup bug at socket destruct
    MAINTAINERS: Update XDP and AF_XDP entries
    ====================

    Link: https://lore.kernel.org/r/20201128005104.1205-1-daniel@iogearbox.net
    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

25 Nov, 2020

1 commit

  • Commit 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY")
    addressed the problem that packets were discarded from the Tx AF_XDP
    ring, when the driver returned NETDEV_TX_BUSY. Part of the fix was
    bumping the skbuff reference count, so that the buffer would not be
    freed by dev_direct_xmit(). A reference count larger than one means
    that the skbuff is "shared", which is not the case.

    If the "shared" skbuff is sent to the generic XDP receive path,
    netif_receive_generic_xdp(), and pskb_expand_head() is entered, the
    BUG_ON(skb_shared(skb)) will trigger.

    This patch adds a variant to dev_direct_xmit(), __dev_direct_xmit(),
    where a user can select the skbuff free policy. This allows AF_XDP to
    avoid bumping the reference count, but still keep the NETDEV_TX_BUSY
    behavior.
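
    A minimal userspace sketch of the free-policy idea follows. It is not the
    kernel code: struct toy_buf, toy_direct_xmit() and the free_on_busy flag
    are hypothetical stand-ins, and the driver result is passed in so the
    example stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an skbuff: only the reference count matters here. */
struct toy_buf {
	int users;   /* reference count; > 1 means "shared" */
	bool freed;  /* set when the last reference is dropped */
};

enum toy_tx_status { TOY_TX_OK, TOY_TX_BUSY };

static bool toy_buf_shared(const struct toy_buf *buf)
{
	return buf->users > 1;
}

static void toy_buf_put(struct toy_buf *buf)
{
	assert(buf->users > 0);
	if (--buf->users == 0)
		buf->freed = true;
}

/*
 * Hypothetical transmit helper. When free_on_busy is false, the caller
 * keeps ownership on TOY_TX_BUSY and can retry later, without having
 * taken an extra reference beforehand.
 */
static enum toy_tx_status toy_direct_xmit(struct toy_buf *buf,
					  enum toy_tx_status driver_result,
					  bool free_on_busy)
{
	if (driver_result == TOY_TX_OK) {
		toy_buf_put(buf);  /* consumed by the "driver" */
		return TOY_TX_OK;
	}
	if (free_on_busy)
		toy_buf_put(buf);
	return TOY_TX_BUSY;
}
```

    With free_on_busy set to false, the buffer survives a busy return with a
    reference count of one, so it never looks shared on a later receive path.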

    Fixes: 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY")
    Reported-by: Yonghong Song
    Signed-off-by: Björn Töpel
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20201123175600.146255-1-bjorn.topel@gmail.com

    Björn Töpel
     

24 Nov, 2020

1 commit

  • In the patchset merged by commit b9fcf0a0d826
    ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which
    did not have header_ops were given one for the purpose of protocol parsing
    on af_packet transmit path.

    That change made af_packet receive path regard these devices as having a
    visible L3 header and therefore aligned incoming skb->data to point to the
    skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not
    reset their mac_header prior to ingress and therefore their incoming
    packets became malformed.

    Ideally these devices would reset their mac headers, or af_packet would be
    able to rely on dev->hard_header_len being 0 for such cases, but it seems
    this is not the case.

    Fix by changing af_packet RX ll visibility criteria to include the
    existence of a '.create()' header operation, which is used when creating
    a device hard header - via dev_hard_header() - by upper layers, and does
    not exist in these L3 devices.

    As this predicate may be useful in other situations, add it as a common
    dev_has_header() helper in netdevice.h.
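
    The predicate described above can be sketched in userspace as follows.
    The toy_ structures are simplified assumptions that carry only the fields
    needed here; they are not the definitions from netdevice.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures; only the fields used here. */
struct toy_header_ops {
	int (*create)(void *frame, const void *daddr);  /* builds a hard header */
	int (*parse)(const void *frame, unsigned char *haddr);
};

struct toy_net_device {
	const struct toy_header_ops *header_ops;
};

/* True only if the device has header ops and can build a hard header. */
static bool toy_dev_has_header(const struct toy_net_device *dev)
{
	assert(dev != NULL);
	return dev->header_ops != NULL && dev->header_ops->create != NULL;
}

/* Example ops for an Ethernet-like device. */
static int toy_eth_create(void *frame, const void *daddr)
{
	(void)frame;
	(void)daddr;
	return 14;  /* length of an Ethernet header */
}

/* Example ops for an L3 device that only got a parse hook. */
static int toy_l3_parse(const void *frame, unsigned char *haddr)
{
	(void)frame;
	(void)haddr;
	return 0;
}
```

    A device whose header_ops only provides a parse hook, as the L3 devices
    above do, fails the check just like a device without header_ops.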

    Fixes: b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'")
    Signed-off-by: Eyal Birger
    Acked-by: Jason A. Donenfeld
    Acked-by: Willem de Bruijn
    Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.com
    Signed-off-by: Jakub Kicinski

    Eyal Birger
     

14 Oct, 2020

1 commit

  • In several places the same code is used to populate rtnl_link_stats64
    fields with data from pcpu_sw_netstats. Therefore factor out this code
    to a new function dev_fetch_sw_netstats().
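
    As a rough userspace illustration of the factored-out loop, assuming
    simplified toy_ structures and a plain array in place of real per-CPU
    data. The kernel version also has to read each CPU's counters
    consistently, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-CPU software counters (toy version of pcpu_sw_netstats). */
struct toy_pcpu_stats {
	uint64_t rx_packets, rx_bytes, tx_packets, tx_bytes;
};

/* Aggregate counters (toy version of the rtnl_link_stats64 fields used). */
struct toy_link_stats {
	uint64_t rx_packets, rx_bytes, tx_packets, tx_bytes;
};

/* Add every CPU's counters to the totals in *s. */
static void toy_fetch_sw_netstats(struct toy_link_stats *s,
				  const struct toy_pcpu_stats *netstats,
				  size_t ncpus)
{
	assert(s != NULL && netstats != NULL);
	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		s->rx_packets += netstats[cpu].rx_packets;
		s->rx_bytes   += netstats[cpu].rx_bytes;
		s->tx_packets += netstats[cpu].tx_packets;
		s->tx_bytes   += netstats[cpu].tx_bytes;
	}
}
```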

    v2:
    - constify argument netstats
    - don't ignore netstats being NULL or an ERRPTR
    - switch to EXPORT_SYMBOL_GPL

    Signed-off-by: Heiner Kallweit
    Link: https://lore.kernel.org/r/6d16a338-52f5-df69-0020-6bc771a7d498@gmail.com
    Signed-off-by: Jakub Kicinski

    Heiner Kallweit
     

13 Oct, 2020

1 commit

  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2020-10-12

    The main changes are:

    1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong.

    2) libbpf relocation support for different size load/store, from Andrii.

    3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel.

    4) BPF support for per-cpu variables, from Hao.


    5) sockmap improvements, from John.
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

12 Oct, 2020

1 commit

  • Add an efficient ingress to ingress netns switch that can be used out of tc BPF
    programs in order to redirect traffic from host ns ingress into a container
    veth device ingress without having to go via CPU backlog queue [0]. For local
    containers this can also be utilized, so that the path via CPU backlog queue only
    needs to be taken once, not twice. On a high level this borrows from ipvlan, which
    does a similar switch in __netif_receive_skb_core() and then iterates via another_round.
    This helps to reduce latency for the mentioned use cases.

    Pod to remote pod with redirect(), TCP_RR [1]:

    # percpu_netperf 10.217.1.33
    RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 )
    MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 )
    STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 )
    MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 )
    P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 )
    P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 )
    P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 )

    TRANSACTION_RATE: 32666.400 (per CPU: 8152.200 8169.842 8174.439 8169.897 )

    Pod to remote pod with redirect_peer(), TCP_RR:

    # percpu_netperf 10.217.1.33
    RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 )
    MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 )
    STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 )
    MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 )
    P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 )
    P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 )
    P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 )

    TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 )
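
    To put the two runs side by side, the ratios can be computed directly
    from the figures above. This is only arithmetic on the reported numbers;
    the helper and constant names are made up for the example.

```c
#include <assert.h>

/* Figures copied from the two percpu_netperf runs above. */
static const double rt_latency_redirect      = 122.450;
static const double rt_latency_redirect_peer = 44.449;
static const double rate_redirect            = 32666.400;
static const double rate_redirect_peer       = 90039.500;

/* Plain ratio a / b; b must be non-zero. */
static double toy_ratio(double a, double b)
{
	assert(b != 0.0);
	return a / b;
}
```

    Both toy_ratio(rate_redirect_peer, rate_redirect) and
    toy_ratio(rt_latency_redirect, rt_latency_redirect_peer) come out between
    2.7 and 2.8 for this particular setup.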

    [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf
    [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net

    Daniel Borkmann
     

06 Oct, 2020

2 commits


04 Oct, 2020

1 commit


03 Oct, 2020

1 commit

  • As warned by "make htmldocs", there are two new struct elements
    that aren't documented:

    ../include/linux/netdevice.h:2159: warning: Function parameter or member 'unlink_list' not described in 'net_device'
    ../include/linux/netdevice.h:2159: warning: Function parameter or member 'nested_level' not described in 'net_device'

    Fixes: 1fc70edb7d7b ("net: core: add nested_level variable in net_device")
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: David S. Miller

    Mauro Carvalho Chehab
     

30 Sep, 2020

1 commit

  • Quite a few drivers make conditional decisions based on in_interrupt() to
    invoke either netif_rx() or netif_rx_ni().

    Conditionals based on in_interrupt() or other variants of preempt count
    checks in drivers should not exist for various reasons and Linus clearly
    requested to either split the code paths or pass an argument to the
    common functions which provides the context.

    This is obviously the correct solution, but for some of the affected
    drivers this needs a major rewrite due to their convoluted structure.

    As in_interrupt() usage in drivers needs to be phased out, provide
    netif_rx_any_context() as a stop gap for these drivers.

    This confines the in_interrupt() conditional to core code, which in turn
    allows removing access to this check from driver code and provides one
    central place to do further modifications once the driver maze is cleaned
    up.
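
    The shape of such a stop-gap helper can be modelled in userspace as
    below. Everything prefixed toy_ is a stand-in invented for this sketch:
    the context check is a plain flag, and the two receive paths only report
    which one was taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_rx_path { TOY_RX_IRQ, TOY_RX_PROCESS };

/* Stand-in for the kernel's context check; set by the caller in this model. */
static bool toy_in_interrupt_flag;

static bool toy_in_interrupt(void)
{
	return toy_in_interrupt_flag;
}

/* Stand-ins for netif_rx() and netif_rx_ni(); they only report the path. */
static enum toy_rx_path toy_netif_rx(const void *pkt)
{
	(void)pkt;
	return TOY_RX_IRQ;
}

static enum toy_rx_path toy_netif_rx_ni(const void *pkt)
{
	(void)pkt;
	return TOY_RX_PROCESS;
}

/* The single core-side place that still looks at the execution context. */
static enum toy_rx_path toy_netif_rx_any_context(const void *pkt)
{
	assert(pkt != NULL);
	if (toy_in_interrupt())
		return toy_netif_rx(pkt);
	return toy_netif_rx_ni(pkt);
}
```

    A driver then calls only the any-context helper and no longer contains a
    context check of its own.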

    Suggested-by: Thomas Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Signed-off-by: David S. Miller

    Sebastian Andrzej Siewior
     

29 Sep, 2020

2 commits

  • This patch is to add a new variable 'nested_level' into the net_device
    structure.
    This variable will be used as a parameter of spin_lock_nested() of
    dev->addr_list_lock.

    netif_addr_lock() can be called recursively so spin_lock_nested() is
    used instead of spin_lock() and dev->lower_level is used as a parameter
    of spin_lock_nested().
    But the dev->lower_level value can be updated while it is being used,
    so lockdep would warn about a possible deadlock scenario.

    When a stacked interface is deleted, netif_{uc | mc}_sync() is
    called recursively.
    So, spin_lock_nested() is called recursively too.
    At this moment, the dev->lower_level variable is used as a parameter of it.
    The dev->lower_level value is updated immediately when interfaces are
    unlinked or linked.
    Thus, after unlinking, dev->lower_level shouldn't be a parameter of
    spin_lock_nested().

    A (macvlan)
    |
    B (vlan)
    |
    C (bridge)
    |
    D (macvlan)
    |
    E (vlan)
    |
    F (bridge)

    A->lower_level : 6
    B->lower_level : 5
    C->lower_level : 4
    D->lower_level : 3
    E->lower_level : 2
    F->lower_level : 1

    When an interface 'A' is removed, it releases resources.
    At this moment, netif_addr_lock() would be called.
    Then, netdev_upper_dev_unlink() is called recursively.
    Then dev->lower_level is updated.
    There is no problem.

    But, when the bridge module is removed, 'C' and 'F' interfaces
    are removed at once.
    If 'F' is removed first, the lower_level values are as follows:
    A->lower_level : 5
    B->lower_level : 4
    C->lower_level : 3
    D->lower_level : 2
    E->lower_level : 1
    F->lower_level : 1

    Then, 'C' is removed. At this moment, netif_addr_lock() is called
    recursively.
    The ordering is like this.
    C(3)->D(2)->E(1)->F(1)
    At this moment, the lower_level values of 'E' and 'F' are the same.
    So, lockdep warns about a possible deadlock scenario.

    In order to avoid this problem, a new variable 'nested_level' is added.
    This value is the same as dev->lower_level - 1.
    But this value is updated in rtnl_unlock().
    So, this variable can be used as a parameter of spin_lock_nested() safely
    in the rtnl context.
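
    The difference between the two values can be modelled in userspace as
    below. This is a toy model, not the kernel implementation: the device
    stack is a plain array, and toy_sync_nested_levels() stands in for the
    deferred update that the text above places in rtnl_unlock().

```c
#include <assert.h>

#define TOY_NDEV 6

struct toy_dev {
	int lower;         /* index of the device below, or -1 if none */
	int lower_level;   /* recomputed immediately on every unlink */
	int nested_level;  /* lower_level - 1, refreshed only on sync */
};

/* Depth of the stack below and including device i. */
static int toy_depth(const struct toy_dev *devs, int i)
{
	int depth = 1;

	while (devs[i].lower >= 0) {
		i = devs[i].lower;
		depth++;
	}
	return depth;
}

static void toy_update_lower_levels(struct toy_dev *devs)
{
	for (int i = 0; i < TOY_NDEV; i++)
		devs[i].lower_level = toy_depth(devs, i);
}

/* Models the deferred refresh of nested_level. */
static void toy_sync_nested_levels(struct toy_dev *devs)
{
	for (int i = 0; i < TOY_NDEV; i++)
		devs[i].nested_level = devs[i].lower_level - 1;
}

/* Build the chain A(0) -> B(1) -> ... -> F(5), with A on top. */
static void toy_build_chain(struct toy_dev *devs)
{
	for (int i = 0; i < TOY_NDEV; i++)
		devs[i].lower = (i + 1 < TOY_NDEV) ? i + 1 : -1;
	toy_update_lower_levels(devs);
	toy_sync_nested_levels(devs);
}

/* Detach device `upper` from the one below it; nested_level is not synced. */
static void toy_unlink(struct toy_dev *devs, int upper)
{
	assert(upper >= 0 && upper < TOY_NDEV);
	devs[upper].lower = -1;
	toy_update_lower_levels(devs);
}
```

    After toy_unlink(devs, 4) detaches 'E' from 'F', both have lower_level 1,
    while their nested_level values stay distinct (1 and 0) until the next
    sync.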

    Test commands:
    ip link add br0 type bridge vlan_filtering 1
    ip link add vlan1 link br0 type vlan id 10
    ip link add macvlan2 link vlan1 type macvlan
    ip link add br3 type bridge vlan_filtering 1
    ip link set macvlan2 master br3
    ip link add vlan4 link br3 type vlan id 10
    ip link add macvlan5 link vlan4 type macvlan
    ip link add br6 type bridge vlan_filtering 1
    ip link set macvlan5 master br6
    ip link add vlan7 link br6 type vlan id 10
    ip link add macvlan8 link vlan7 type macvlan

    ip link set br0 up
    ip link set vlan1 up
    ip link set macvlan2 up
    ip link set br3 up
    ip link set vlan4 up
    ip link set macvlan5 up
    ip link set br6 up
    ip link set vlan7 up
    ip link set macvlan8 up
    modprobe -rv bridge

    Splat looks like:
    [ 36.057436][ T744] WARNING: possible recursive locking detected
    [ 36.058848][ T744] 5.9.0-rc6+ #728 Not tainted
    [ 36.059959][ T744] --------------------------------------------
    [ 36.061391][ T744] ip/744 is trying to acquire lock:
    [ 36.062590][ T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30
    [ 36.064922][ T744]
    [ 36.064922][ T744] but task is already holding lock:
    [ 36.066626][ T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
    [ 36.068851][ T744]
    [ 36.068851][ T744] other info that might help us debug this:
    [ 36.070731][ T744] Possible unsafe locking scenario:
    [ 36.070731][ T744]
    [ 36.072497][ T744] CPU0
    [ 36.073238][ T744] ----
    [ 36.074007][ T744] lock(&vlan_netdev_addr_lock_key);
    [ 36.075290][ T744] lock(&vlan_netdev_addr_lock_key);
    [ 36.076590][ T744]
    [ 36.076590][ T744] *** DEADLOCK ***
    [ 36.076590][ T744]
    [ 36.078515][ T744] May be due to missing lock nesting notation
    [ 36.078515][ T744]
    [ 36.080491][ T744] 3 locks held by ip/744:
    [ 36.081471][ T744] #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490
    [ 36.083614][ T744] #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
    [ 36.085942][ T744] #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80
    [ 36.088400][ T744]
    [ 36.088400][ T744] stack backtrace:
    [ 36.089772][ T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728
    [ 36.091364][ T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 36.093630][ T744] Call Trace:
    [ 36.094416][ T744] dump_stack+0x77/0x9b
    [ 36.095385][ T744] __lock_acquire+0xbc3/0x1f40
    [ 36.096522][ T744] lock_acquire+0xb4/0x3b0
    [ 36.097540][ T744] ? dev_set_rx_mode+0x19/0x30
    [ 36.098657][ T744] ? rtmsg_ifinfo+0x1f/0x30
    [ 36.099711][ T744] ? __dev_notify_flags+0xa5/0xf0
    [ 36.100874][ T744] ? rtnl_is_locked+0x11/0x20
    [ 36.101967][ T744] ? __dev_set_promiscuity+0x7b/0x1a0
    [ 36.103230][ T744] _raw_spin_lock_bh+0x38/0x70
    [ 36.104348][ T744] ? dev_set_rx_mode+0x19/0x30
    [ 36.105461][ T744] dev_set_rx_mode+0x19/0x30
    [ 36.106532][ T744] dev_set_promiscuity+0x36/0x50
    [ 36.107692][ T744] __dev_set_promiscuity+0x123/0x1a0
    [ 36.108929][ T744] dev_set_promiscuity+0x1e/0x50
    [ 36.110093][ T744] br_port_set_promisc+0x1f/0x40 [bridge]
    [ 36.111415][ T744] br_manage_promisc+0x8b/0xe0 [bridge]
    [ 36.112728][ T744] __dev_set_promiscuity+0x123/0x1a0
    [ 36.113967][ T744] ? __hw_addr_sync_one+0x23/0x50
    [ 36.115135][ T744] __dev_set_rx_mode+0x68/0x90
    [ 36.116249][ T744] dev_uc_sync+0x70/0x80
    [ 36.117244][ T744] dev_uc_add+0x50/0x60
    [ 36.118223][ T744] macvlan_open+0x18e/0x1f0 [macvlan]
    [ 36.119470][ T744] __dev_open+0xd6/0x170
    [ 36.120470][ T744] __dev_change_flags+0x181/0x1d0
    [ 36.121644][ T744] dev_change_flags+0x23/0x60
    [ 36.122741][ T744] do_setlink+0x30a/0x11e0
    [ 36.123778][ T744] ? __lock_acquire+0x92c/0x1f40
    [ 36.124929][ T744] ? __nla_validate_parse.part.6+0x45/0x8e0
    [ 36.126309][ T744] ? __lock_acquire+0x92c/0x1f40
    [ 36.127457][ T744] __rtnl_newlink+0x546/0x8e0
    [ 36.128560][ T744] ? lock_acquire+0xb4/0x3b0
    [ 36.129623][ T744] ? deactivate_slab.isra.85+0x6a1/0x850
    [ 36.130946][ T744] ? __lock_acquire+0x92c/0x1f40
    [ 36.132102][ T744] ? lock_acquire+0xb4/0x3b0
    [ 36.133176][ T744] ? is_bpf_text_address+0x5/0xe0
    [ 36.134364][ T744] ? rtnl_newlink+0x2e/0x70
    [ 36.135445][ T744] ? rcu_read_lock_sched_held+0x32/0x60
    [ 36.136771][ T744] ? kmem_cache_alloc_trace+0x2d8/0x380
    [ 36.138070][ T744] ? rtnl_newlink+0x2e/0x70
    [ 36.139164][ T744] rtnl_newlink+0x47/0x70
    [ ... ]

    Fixes: 845e0ebb4408 ("net: change addr_list_lock back to static key")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     
  • Functions related to nested interface infrastructure such as
    netdev_walk_all_{ upper | lower }_dev() pass both private functions
    and "data" pointer to handle their own things.
    At this point, the data pointer type is void *.
    In order to make it easier to expand common variables and functions,
    this new netdev_nested_priv structure is added.

    In the following patch, a new member variable will be added into this
    struct to fix the lockdep issue.

    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

23 Sep, 2020

1 commit

  • Two minor conflicts:

    1) net/ipv4/route.c, adding a new local variable while
    moving another local variable and removing its
    initial assignment.

    2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
    One pretty prints the port mode differently, whilst another
    changes the driver to try and obtain the port mode from
    the port node rather than the switch node.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Sep, 2020

1 commit

  • Earlier commit 316cdaa1158a ("net: add option to not create fall-back
    tunnels in root-ns as well") removed the CONFIG_SYSCTL to enable the
    kernel-commandline to work. However, this variable gets defined only
    when CONFIG_SYSCTL option is selected.

    With this change the behavior would default to creating fall-back
    tunnels in all namespaces when CONFIG_SYSCTL is not selected and
    the kernel commandline option will be ignored.

    Fixes: 316cdaa1158a ("net: add option to not create fall-back tunnels in root-ns as well")
    Signed-off-by: Mahesh Bandewar
    Reported-by: Randy Dunlap
    Reported-by: kernel test robot
    Acked-by: Randy Dunlap # build-tested
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     

18 Sep, 2020

1 commit


11 Sep, 2020

2 commits

  • To RCUify napi->dev_list we need to replace list_del_init()
    with list_del_rcu(). There is no _init() version for RCU for
    obvious reasons. Up until now netif_napi_del() was idempotent
    so to make sure it remains such add a bit which is set when
    NAPI is listed, and cleared when it is removed. Since we don't
    expect multiple calls to netif_napi_add() to be correct,
    add a warning on that side.

    Now that napi_hash_add / napi_hash_del are only called by
    napi_add / del we can actually steal its bit. We just need
    to make sure hash node is initialized correctly.
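
    The "listed" bit idea can be sketched in userspace as follows. The toy_
    names are invented for the example, and the flag is updated with plain
    non-atomic operations, whereas kernel code would need atomic bit
    operations here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STATE_LISTED (1u << 0)

struct toy_napi {
	unsigned int state;
	int list_removals;  /* counts how often the unlink step really ran */
};

/* Returns false (where the kernel would warn) if already listed. */
static bool toy_napi_add(struct toy_napi *napi)
{
	assert(napi != NULL);
	if (napi->state & TOY_STATE_LISTED)
		return false;
	napi->state |= TOY_STATE_LISTED;
	return true;
}

/* Safe to call repeatedly: only the first call after an add unlinks. */
static void toy_napi_del(struct toy_napi *napi)
{
	assert(napi != NULL);
	if (!(napi->state & TOY_STATE_LISTED))
		return;
	napi->state &= ~TOY_STATE_LISTED;
	napi->list_removals++;  /* stands in for list_del_rcu() */
}
```

    Because the bit records whether the entry is on the list, a second delete
    becomes a no-op without relying on a re-initialized list node.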

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • We allow drivers to call napi_hash_del() before calling
    netif_napi_del() to batch RCU grace periods. This makes
    the API asymmetric and leaks internal implementation details.
    Soon we will want the grace period to protect more than just
    the NAPI hash table.

    Restructure the API and have drivers call a new function -
    __netif_napi_del() if they want to take care of RCU waits.

    Note that only core was checking the return status from
    napi_hash_del() so the new helper does not report if the
    NAPI was actually deleted.

    Some notes on driver oddness:
    - veth observed the grace period before calling netif_napi_del()
    but that should not matter
    - myri10ge observed normal RCU flavor
    - bnx2x and enic did not actually observe the grace period
    (unless they did so implicitly)
    - virtio_net and enic only unhashed Rx NAPIs

    The last two points seem to indicate that the calls to
    napi_hash_del() were a left over rather than an optimization.
    Regardless, it's easy enough to correct them.

    This patch may introduce extra synchronize_net() calls for
    interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
    free_netdev() to call netif_napi_del(). This seems inevitable
    since we want to use RCU for netpoll dev->napi_list traversal,
    and almost no drivers set IFF_DISABLE_NETPOLL.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

08 Sep, 2020

2 commits

  • Fix kernel-doc warning in :

    ../include/linux/netdevice.h:2158: warning: Function parameter or member 'xdp_state' not described in 'net_device'

    Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
    Signed-off-by: Randy Dunlap
    Cc: Andrii Nakryiko
    Cc: Alexei Starovoitov
    Signed-off-by: Jakub Kicinski

    Randy Dunlap
     
  • Fix kernel-doc warning in :

    ../include/linux/netdevice.h:2158: warning: Function parameter or member 'proto_down_reason' not described in 'net_device'

    Fixes: 829eb208e80d ("rtnetlink: add support for protodown reason")
    Signed-off-by: Randy Dunlap
    Acked-by: Roopa Prabhu
    Signed-off-by: Jakub Kicinski

    Randy Dunlap
     

02 Sep, 2020

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2020-09-01

    The following pull-request contains BPF updates for your *net-next* tree.

    There are two small conflicts when pulling, resolve as follows:

    1) Merge conflict in tools/lib/bpf/libbpf.c between 88a82120282b ("libbpf: Factor
    out common ELF operations and improve logging") in bpf-next and 1e891e513e16
    ("libbpf: Fix map index used in error message") in net-next. Resolve by taking
    the hunk in bpf-next:

    [...]
    scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx);
    data = elf_sec_data(obj, scn);
    if (!scn || !data) {
    pr_warn("elf: failed to get %s map definitions for %s\n",
    MAPS_ELF_SEC, obj->path);
    return -EINVAL;
    }
    [...]

    2) Merge conflict in drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c between
    9647c57b11e5 ("xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for
    better performance") in bpf-next and e20f0dbf204f ("net/mlx5e: RX, Add a prefetch
    command for small L1_CACHE_BYTES") in net-next. Resolve the two locations by retaining
    net_prefetch() and taking xsk_buff_dma_sync_for_cpu() from bpf-next. Should look like:

    [...]
    xdp_set_data_meta_invalid(xdp);
    xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
    net_prefetch(xdp->data);
    [...]

    We've added 133 non-merge commits during the last 14 day(s) which contain
    a total of 246 files changed, 13832 insertions(+), 3105 deletions(-).

    The main changes are:

    1) Initial support for sleepable BPF programs along with bpf_copy_from_user() helper
    for tracing to reliably access user memory, from Alexei Starovoitov.

    2) Add BPF infra for writing and parsing TCP header options, from Martin KaFai Lau.

    3) bpf_d_path() helper for returning full path for given 'struct path', from Jiri Olsa.

    4) AF_XDP support for shared umems between devices and queues, from Magnus Karlsson.

    5) Initial prep work for full BPF-to-BPF call support in libbpf, from Andrii Nakryiko.

    6) Generalize bpf_sk_storage map & add local storage for inodes, from KP Singh.

    7) Implement sockmap/hash updates from BPF context, from Lorenz Bauer.

    8) BPF xor verification for scalar types & add BPF link iterator, from Yonghong Song.

    9) Use target's prog type for BPF_PROG_TYPE_EXT prog verification, from Udip Pant.

    10) Rework BPF tracing samples to use libbpf loader, from Daniel T. Lee.

    11) Fix xdpsock sample to really cycle through all buffers, from Weqaar Janjua.

    12) Improve type safety for tun/veth XDP frame handling, from Maciej Żenczykowski.

    13) Various smaller cleanups and improvements all over the place.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Sep, 2020

1 commit

  • Replace the explicit umem reference passed to the driver in AF_XDP
    zero-copy mode with the buffer pool instead. This is in preparation for
    extending the functionality of the zero-copy mode so that umems can be
    shared between queues on the same netdev and also between netdevs. In
    this commit, only a umem reference has been added to the buffer pool
    struct. But later commits will add other entities to it. These are
    going to be entities that are different between different queue ids
    and netdevs even though the umem is shared between them.

    Signed-off-by: Magnus Karlsson
    Signed-off-by: Daniel Borkmann
    Acked-by: Björn Töpel
    Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com

    Magnus Karlsson
     

28 Aug, 2020

1 commit

  • The sysctl added earlier by commit 79134e6ce2c ("net: do
    not create fallback tunnels for non-default namespaces") restricts
    fallback tunnel creation to root-ns. This patch extends that behavior
    with an option to not create fallback tunnels in root-ns either. Since
    modules that create fallback tunnels could be built-in, setting the sysctl
    value after booting would be pointless, so a kernel cmdline option is
    added to change this default. The default setting is preserved for backward
    compatibility. The kernel command line option fb_tunnels=initns will
    set the sysctl value to 1 and will create fallback tunnels only in initns,
    while fb_tunnels=none will set the sysctl value to 2 and
    fallback tunnels are skipped in every netns.
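
    The three settings can be modelled in userspace as below. The function
    and variable names are hypothetical; only the option strings and the
    values 1 and 2 come from the description above, with 0 assumed as the
    default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* 0: create everywhere (default), 1: only in the initial netns, 2: nowhere */
static int toy_fb_tunnels_mode;

/* Parse the value given as fb_tunnels=<value>; unknown values change nothing. */
static void toy_fb_tunnels_setup(const char *str)
{
	assert(str != NULL);
	if (strcmp(str, "initns") == 0)
		toy_fb_tunnels_mode = 1;
	else if (strcmp(str, "none") == 0)
		toy_fb_tunnels_mode = 2;
}

/* Should a namespace get fallback tunnels under the current mode? */
static bool toy_net_has_fallback_tunnels(bool is_init_net)
{
	return toy_fb_tunnels_mode == 0 ||
	       (is_init_net && toy_fb_tunnels_mode == 1);
}
```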

    Signed-off-by: Mahesh Bandewar
    Cc: Eric Dumazet
    Cc: Maciej Zenczykowski
    Cc: Jian Yang
    Cc: Randy Dunlap
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     

27 Aug, 2020

1 commit


06 Aug, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Support 6GHz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in core TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Florian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds
     

05 Aug, 2020

2 commits

  • Pull documentation updates from Jonathan Corbet:
    "It's been a busy cycle for documentation - hopefully the busiest for a
    while to come. Changes include:

    - Some new Chinese translations

    - Progress on the battle against double words words and non-HTTPS
    URLs

    - Some block-mq documentation

    - More RST conversions from Mauro. At this point, that task is
    essentially complete, so we shouldn't see this kind of churn again
    for a while. Unless we decide to switch to asciidoc or
    something...:)

    - Lots of typo fixes, warning fixes, and more"

    * tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
    scripts/kernel-doc: optionally treat warnings as errors
    docs: ia64: correct typo
    mailmap: add entry for
    doc/zh_CN: add cpu-load Chinese version
    Documentation/admin-guide: tainted-kernels: fix spelling mistake
    MAINTAINERS: adjust kprobes.rst entry to new location
    devices.txt: document rfkill allocation
    PCI: correct flag name
    docs: filesystems: vfs: correct flag name
    docs: filesystems: vfs: correct sync_mode flag names
    docs: path-lookup: markup fixes for emphasis
    docs: path-lookup: more markup fixes
    docs: path-lookup: fix HTML entity mojibake
    CREDITS: Replace HTTP links with HTTPS ones
    docs: process: Add an example for creating a fixes tag
    doc/zh_CN: add Chinese translation prefer section
    doc/zh_CN: add clearing-warn-once Chinese version
    doc/zh_CN: add admin-guide index
    doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
    futex: MAINTAINERS: Re-add selftests directory
    ...

    Linus Torvalds
     
  • Currently, processes sending traffic to a local bridge with an
    encapsulation device as a port don't get ICMP errors if they exceed
    the PMTU of the encapsulated link.

    David Ahern suggested this as a hack, but it actually looks like
    the correct solution: when we update the PMTU for a given destination
    by means of updating or creating a route exception, the encapsulation
    might trigger this because of PMTU discovery happening either on the
    encapsulation device itself, or its lower layer. This happens on
    bridged encapsulations only.

    The output interface shouldn't matter, because we already have a
    valid destination. Drop the output interface restriction from the
    associated route lookup.

    For UDP tunnels, we will now have a route exception created for the
    encapsulation itself, with an MTU value reflecting its headroom, which
    allows a bridge forwarding IP packets originated locally to deliver
    errors back to the sending socket.

    The behaviour is now consistent with IPv6 and verified with selftests
    pmtu_ipv{4,6}_br_{geneve,vxlan}{4,6}_exception introduced later in
    this series.

    v2:
    - reset output interface only for bridge ports (David Ahern)
    - add and use netif_is_any_bridge_port() helper (David Ahern)

    Suggested-by: David Ahern
    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
     

04 Aug, 2020

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2020-08-04

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 73 non-merge commits during the last 9 day(s) which contain
    a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).

    The main changes are:

    1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
    syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.

    2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
    in-kernel inspection, from Yonghong Song and Alexei Starovoitov.

    3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
    unwinder errors, from Song Liu.

    4) Allow cgroup local storage map to be shared between programs on the same
    cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.

    5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
    load instructions, from Jean-Philippe Brucker.

    6) Follow-up fixes on BPF socket lookup in combination with reuseport group
    handling. Also add related BPF selftests, from Jakub Sitnicki.

    7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
    socket create/release as well as bind functions, from Stanislav Fomichev.

    8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
    xdp_statistics, from Peilin Ye.

    9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.

    10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
    fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.

    11) Fix a bpftool segfault due to missing program type name and make it more robust
    to prevent them in future gaps, from Quentin Monnet.

    12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
    resolver issue, from John Fastabend.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Aug, 2020

1 commit

  • netdev protodown is a mechanism that allows protocols to
    hold an interface down. It was initially introduced in
    the kernel to hold links down by a multihoming protocol.
    There was also an attempt to introduce protodown
    reason at the time, but it was rejected. protodown and protodown reason
    are supported by almost every switching and routing platform.
    It was ok for a while to live without a protodown reason.
    But it has become more critical now, given that more than
    one protocol may need to keep a link down on a system
    at the same time, e.g. vrrp peer node, port security,
    multihoming protocol. It's common for network operators and
    protocol developers to look for such a reason on a networking
    box (it's also known as errDisable by most networking operators).

    This patch adds support for link protodown reason
    attribute. There are two ways to maintain protodown
    reasons.
    (a) enumerate every possible reason code in kernel
    - A protocol developer has to make a request and
    have that appear in a certain kernel version
    (b) provide the bits in the kernel, and allow user-space
    (sysadmin or NOS distributions) to manage the bit-to-reasonname
    map.
    - This makes extending reason codes easier (kind of like
    the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)

    This patch takes approach (b).

    A few things about the patch:
    - It treats the protodown reason bits as a counter to indicate
    active protodown users
    - Since the protodown attribute is already an exposed UAPI,
    the reason is not enforced on a protodown set. It's a no-op
    if not used.

    The patch follows the algorithm below:
    - presence of reason bits set indicates protodown
    is in use
    - user can set protodown and protodown reason in a
    single or multiple setlink operations
    - setlink operation to clear protodown, will return -EBUSY
    if there are active protodown reason bits
    - reason is not included in link dumps if not used
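
    The rules in this list can be sketched as a small user-space model. This
    is only an illustration under stated assumptions: struct link_model,
    set_reason() and set_proto_down() are hypothetical names, not the kernel's
    data structures or functions, and a 32-bit reason mask is assumed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a link; not the kernel's net_device. */
struct link_model {
    bool     proto_down;
    uint32_t proto_down_reason;   /* one bit per active protodown user */
};

/* Set or clear a single reason bit (bit must be below 32). */
void set_reason(struct link_model *l, unsigned int bit, bool on)
{
    if (on)
        l->proto_down_reason |= UINT32_C(1) << bit;
    else
        l->proto_down_reason &= ~(UINT32_C(1) << bit);
}

/* Change protodown; clearing is refused while any reason bit is active. */
int set_proto_down(struct link_model *l, bool on)
{
    if (!on && l->proto_down_reason)
        return -EBUSY;
    l->proto_down = on;
    return 0;
}
```

    In this model a reason is optional: set_proto_down() works with an empty
    mask, which mirrors the "no-op if not used" rule.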

    example with patched iproute2:
    $cat /etc/iproute2/protodown_reasons.d/r.conf
    0 mlag
    1 evpn
    2 vrrp
    3 psecurity

    $ip link set dev vxlan0 protodown on protodown_reason vrrp on
    $ip link set dev vxlan0 protodown_reason mlag on
    $ip link show
    14: vxlan0: mtu 1500 qdisc noop state DOWN mode
    DEFAULT group default qlen 1000
    link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on

    $ip link set dev vxlan0 protodown_reason mlag off
    $ip link set dev vxlan0 protodown off protodown_reason vrrp off

    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

26 Jul, 2020

3 commits

  • Now that BPF program/link management is centralized in generic net_device
    code, kernel code never queries program id from drivers, so
    XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.

    This patch removes all the implementations of those commands in kernel, along
    with xdp_attachment_query().

    This patch was compile-tested on allyesconfig.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com

    Andrii Nakryiko
     
  • Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through
    BPF_LINK_CREATE command.

    bpf_xdp_link is mutually exclusive with direct BPF program attachment: the
    previous BPF program should be detached prior to attempting to create a new
    bpf_xdp_link attachment (for a given XDP mode). Once a BPF link is attached, it
    can't be replaced by another BPF program attachment or link attachment. It will
    be detached only when the last BPF link FD is closed.

    bpf_xdp_link will be auto-detached when net_device is shut down, similarly to
    how other BPF links behave (cgroup, flow_dissector). At that point bpf_link
    will become defunct, but won't be destroyed until last FD is closed.
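
    The exclusivity rules described in this commit message can be modeled for
    a single XDP mode roughly as follows. This is a sketch, not the kernel
    implementation: struct xdp_slot and the slot_*() helpers are hypothetical,
    and returning -EBUSY on a conflict is an assumption made for the model.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the attachment state for one XDP mode. */
struct xdp_slot {
    const void *prog;   /* directly attached program, or NULL */
    const void *link;   /* link-based attachment, or NULL */
};

/* Direct attach (NULL detaches): refused while a link owns the slot. */
int slot_attach_prog(struct xdp_slot *s, const void *prog)
{
    if (s->link)
        return -EBUSY;
    s->prog = prog;     /* a direct program may replace another one */
    return 0;
}

/* Link attach: refused while a program or another link is attached. */
int slot_attach_link(struct xdp_slot *s, const void *link)
{
    if (s->prog || s->link)
        return -EBUSY;
    s->link = link;
    return 0;
}

/* Models the detach that happens when the last link FD is closed. */
void slot_release_link(struct xdp_slot *s)
{
    s->link = NULL;
}
```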

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com

    Andrii Nakryiko
     
  • Instead of delegating to drivers, maintain information about which BPF
    programs are attached in which XDP modes (generic/skb, driver, or hardware)
    locally in net_device. This effectively obsoletes XDP_QUERY_PROG command.

    Such re-organization already simplifies the existing code. It also allows
    bpf_link-based XDP attachments to be added later without drivers having to
    know about any of this at all, which seems like a good setup.
    XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to the driver to
    install/uninstall the active BPF program. All the higher-level concerns about
    prog/link interaction will be contained within generic driver-agnostic logic.

    All the XDP_QUERY_PROG calls to the driver in dev_xdp_uninstall() were removed.
    It's not clear to me why dev_xdp_uninstall() was passing previous prog_flags
    when resetting installed programs. That seems unnecessary, plus most drivers
    don't populate prog_flags anyways. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW
    should be enough of an indicator of what is required of the driver to correctly
    reset the active BPF program. dev_xdp_uninstall() is also generalized as an
    iteration over all three supported modes.
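
    As a rough illustration of the bookkeeping described here, the sketch
    below keeps one program pointer per XDP mode in a device-level structure
    and uninstalls by iterating over the modes. All names (netdev_model,
    xdp_mode_model, query_prog(), uninstall_all()) are hypothetical and the
    driver call is reduced to a comment.

```c
#include <stddef.h>

/* The three modes named in the commit message: generic/skb, driver, hardware. */
enum xdp_mode_model { MODE_SKB, MODE_DRV, MODE_HW, MODE_MAX };

/* Hypothetical stand-in for the per-device state. */
struct netdev_model {
    const void *xdp_prog[MODE_MAX];   /* NULL when nothing is attached */
};

/* A query is answered from device-level state; no driver call is needed. */
const void *query_prog(const struct netdev_model *dev, enum xdp_mode_model mode)
{
    return dev->xdp_prog[mode];
}

/* Uninstall everything by walking all modes; returns how many were removed. */
int uninstall_all(struct netdev_model *dev)
{
    int removed = 0;

    for (int m = 0; m < MODE_MAX; m++) {
        if (!dev->xdp_prog[m])
            continue;
        /* Real code would ask the driver to remove the program here. */
        dev->xdp_prog[m] = NULL;
        removed++;
    }
    return removed;
}
```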

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com

    Andrii Nakryiko
     

11 Jul, 2020

1 commit

  • Cater to devices which:
    (a) may want to sleep in the callbacks;
    (b) only have IPv4 support;
    (c) need all the programming to happen while the netdev is up.

    Drivers attach UDP tunnel offload info struct to their netdevs,
    where they declare how many UDP ports of various tunnel types
    they support. Core takes care of tracking which ports to offload.

    Use a fixed-size array since this matches what almost all drivers
    do, and avoids complexity and uncertainty around memory allocations
    in an atomic context.

    Make sure that tunnel drivers don't try to replay the ports when
    a new NIC netdev is registered. Automatic replays would mess up
    reference counting, and will be removed completely once all drivers
    are converted.
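
    A minimal sketch of port tracking with a fixed-size array, assuming a
    simple per-entry use count. The table size, struct port_table and the
    port_add()/port_del() helpers are hypothetical and only illustrate why no
    allocation is needed on the add path; they are not the interface added by
    this commit.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 4   /* hypothetical number of ports a device supports */

struct port_entry {
    uint16_t     port;
    unsigned int use_cnt;   /* 0 means the slot is free */
};

struct port_table {
    struct port_entry e[TABLE_SIZE];   /* fixed size: no allocation on add */
};

/* Take a reference on a port, claiming a free slot if it is new. */
int port_add(struct port_table *t, uint16_t port)
{
    struct port_entry *free_slot = NULL;

    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->e[i].use_cnt && t->e[i].port == port) {
            t->e[i].use_cnt++;
            return 0;
        }
        if (!t->e[i].use_cnt && !free_slot)
            free_slot = &t->e[i];
    }
    if (!free_slot)
        return -ENOSPC;   /* table full */
    free_slot->port = port;
    free_slot->use_cnt = 1;
    return 0;
}

/* Drop a reference; the slot becomes free when the count reaches zero. */
int port_del(struct port_table *t, uint16_t port)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->e[i].use_cnt && t->e[i].port == port) {
            t->e[i].use_cnt--;
            return 0;
        }
    }
    return -ENOENT;
}
```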

    v4:
    - use a #define NULL to avoid build issues with CONFIG_INET=n.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

27 Jun, 2020

1 commit

  • Changeset 6f8b12d661d0 ("net: napi: add hard irqs deferral feature")
    added a new element at struct net_device.

    Add a description for it, based on what's described at the changeset
    which added such feature.

    Fixes: 6f8b12d661d0 ("net: napi: add hard irqs deferral feature")
    Signed-off-by: Mauro Carvalho Chehab
    Link: https://lore.kernel.org/r/807a3840e7bc1562adefadb0535c9f47e6ab52e0.1592895969.git.mchehab+huawei@kernel.org
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

19 Jun, 2020

1 commit

    In the current code, ->ndo_start_xmit() can be executed recursively only
    10 times because of limited stack memory.
    But in the case of vxlan, a recursion limit of 10 still results in
    a stack overflow.
    The nesting depth of interfaces is already limited to 8.
    There is no critical reason for the recursion limit to be 10,
    so it makes sense to use the same value as the interface nesting
    depth limit.
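
    The effect of a transmit recursion limit can be illustrated with a small
    user-space model. The names (RECURSION_LIMIT_MODEL, xmit_depth,
    model_xmit()) are hypothetical, and the exact comparison used by the
    kernel is not reproduced here; the sketch only shows a depth counter that
    refuses to go past a limit of 8.

```c
#define RECURSION_LIMIT_MODEL 8   /* value chosen to match the nesting limit */

int xmit_depth;   /* current recursion depth; a per-CPU counter in spirit */

/*
 * Model one transmit call on a device that has `lower` devices stacked
 * beneath it. Returns 0 if the packet reaches the bottom device and -1 if
 * it is dropped because the recursion limit was exceeded.
 */
int model_xmit(int lower)
{
    int ret;

    if (xmit_depth > RECURSION_LIMIT_MODEL)
        return -1;            /* too deep: drop instead of recursing */
    if (lower == 0)
        return 0;             /* bottom device: packet leaves the stack */

    xmit_depth++;
    ret = model_xmit(lower - 1);
    xmit_depth--;
    return ret;
}
```

    In this model, a stack with 8 lower devices still transmits, while a
    stack with 9 lower devices is cut off before the deepest call does any
    work.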

    Test commands:
    ip link add vxlan10 type vxlan vni 10 dstport 4789 srcport 4789 4789
    ip link set vxlan10 up
    ip a a 192.168.10.1/24 dev vxlan10
    ip n a 192.168.10.2 dev vxlan10 lladdr fc:22:33:44:55:66 nud permanent

    for i in {9..0}
    do
    let A=$i+1
    ip link add vxlan$i type vxlan vni $i dstport 4789 srcport 4789 4789
    ip link set vxlan$i up
    ip a a 192.168.$i.1/24 dev vxlan$i
    ip n a 192.168.$i.2 dev vxlan$i lladdr fc:22:33:44:55:66 nud permanent
    bridge fdb add fc:22:33:44:55:66 dev vxlan$A dst 192.168.$i.2 self
    done
    hping3 192.168.10.2 -2 -d 60000

    Splat looks like:
    [ 103.814237][ T1127] =============================================================================
    [ 103.871955][ T1127] BUG kmalloc-2k (Tainted: G B ): Padding overwritten. 0x00000000897a2e4f-0x000
    [ 103.873187][ T1127] -----------------------------------------------------------------------------
    [ 103.873187][ T1127]
    [ 103.874252][ T1127] INFO: Slab 0x000000005cccc724 objects=5 used=5 fp=0x0000000000000000 flags=0x10000000001020
    [ 103.881323][ T1127] CPU: 3 PID: 1127 Comm: hping3 Tainted: G B 5.7.0+ #575
    [ 103.882131][ T1127] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 103.883006][ T1127] Call Trace:
    [ 103.883324][ T1127] dump_stack+0x96/0xdb
    [ 103.883716][ T1127] slab_err+0xad/0xd0
    [ 103.884106][ T1127] ? _raw_spin_unlock+0x1f/0x30
    [ 103.884620][ T1127] ? get_partial_node.isra.78+0x140/0x360
    [ 103.885214][ T1127] slab_pad_check.part.53+0xf7/0x160
    [ 103.885769][ T1127] ? pskb_expand_head+0x110/0xe10
    [ 103.886316][ T1127] check_slab+0x97/0xb0
    [ 103.886763][ T1127] alloc_debug_processing+0x84/0x1a0
    [ 103.887308][ T1127] ___slab_alloc+0x5a5/0x630
    [ 103.887765][ T1127] ? pskb_expand_head+0x110/0xe10
    [ 103.888265][ T1127] ? lock_downgrade+0x730/0x730
    [ 103.888762][ T1127] ? pskb_expand_head+0x110/0xe10
    [ 103.889244][ T1127] ? __slab_alloc+0x3e/0x80
    [ 103.889675][ T1127] __slab_alloc+0x3e/0x80
    [ 103.890108][ T1127] __kmalloc_node_track_caller+0xc7/0x420
    [ ... ]

    Fixes: 11a766ce915f ("net: Increase xmit RECURSION_LIMIT to 10.")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

14 Jun, 2020

1 commit

  • Pull networking fixes from David Miller:

    1) Fix cfg80211 deadlock, from Johannes Berg.

    2) RXRPC fails to send notifications, from David Howells.

    3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from
    Geliang Tang.

    4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu.

    5) The ucc_geth driver needs __netdev_watchdog_up exported, from
    Valentin Longchamp.

    6) Fix hashtable memory leak in dccp, from Wang Hai.

    7) Fix how nexthops are marked as FDB nexthops, from David Ahern.

    8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni.

    9) Fix crashes in tipc_disc_rcv(), from Tuong Lien.

    10) Fix link speed reporting in iavf driver, from Brett Creeley.

    11) When a channel is used for XSK and then reused again later for XSK,
    we forget to clear out the relevant data structures in mlx5 which
    causes all kinds of problems. Fix from Maxim Mikityanskiy.

    12) Fix memory leak in genetlink, from Cong Wang.

    13) Disallow sockmap attachments to UDP sockets, it simply won't work.
    From Lorenz Bauer.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
    net: ethernet: ti: ale: fix allmulti for nu type ale
    net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init
    net: atm: Remove the error message according to the atomic context
    bpf: Undo internal BPF_PROBE_MEM in BPF insns dump
    libbpf: Support pre-initializing .bss global variables
    tools/bpftool: Fix skeleton codegen
    bpf: Fix memlock accounting for sock_hash
    bpf: sockmap: Don't attach programs to UDP sockets
    bpf: tcp: Recv() should return 0 when the peer socket is closed
    ibmvnic: Flush existing work items before device removal
    genetlink: clean up family attributes allocations
    net: ipa: header pad field only valid for AP->modem endpoint
    net: ipa: program upper nibbles of sequencer type
    net: ipa: fix modem LAN RX endpoint id
    net: ipa: program metadata mask differently
    ionic: add pcie_print_link_status
    rxrpc: Fix race between incoming ACK parser and retransmitter
    net/mlx5: E-Switch, Fix some error pointer dereferences
    net/mlx5: Don't fail driver on failure to create debugfs
    net/mlx5e: CT: Fix ipv6 nat header rewrite actions
    ...

    Linus Torvalds
     

10 Jun, 2020

1 commit

  • The dynamic key update for addr_list_lock still causes trouble,
    for example the following race condition still exists:

    CPU 0:                          CPU 1:
    (RCU read lock)                 (RTNL lock)
    dev_mc_seq_show()               netdev_update_lockdep_key()
                                      -> lockdep_unregister_key()
     -> netif_addr_lock_bh()

    because lockdep doesn't provide an API to update it atomically.
    Therefore, we have to move it back to static keys and use subclass
    for nest locking like before.

    In commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key
    changes"), I already reverted most parts of commit ab92d68fc22f
    ("net: core: add generic lockdep keys").

    This patch reverts the rest and also part of commit f3b0a18bb6cb
    ("net: remove unnecessary variables and callback"). After this
    patch, addr_list_lock changes back to using static keys and
    subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
    not have to change back to ->ndo_get_lock_subclass().

    And hopefully this reduces some syzbot lockdep noise too.

    Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
    Cc: Taehee Yoo
    Cc: Dmitry Vyukov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

09 Jun, 2020

1 commit

  • Instead of enabling dynamic debug globally with CONFIG_DYNAMIC_DEBUG,
    CONFIG_DYNAMIC_DEBUG_CORE will only enable core function of dynamic
    debug. With the DYNAMIC_DEBUG_MODULE defined for any modules, dynamic
    debug will be tied to them.

    This is useful for people who only want to enable dynamic debug for
    kernel modules without worrying about the kernel image size and memory
    consumption increasing too much.

    [orson.zhai@unisoc.com: v2]
    Link: http://lkml.kernel.org/r/1587408228-10861-1-git-send-email-orson.unisoc@gmail.com

    Signed-off-by: Orson Zhai
    Signed-off-by: Andrew Morton
    Acked-by: Greg Kroah-Hartman
    Acked-by: Petr Mladek
    Cc: Jonathan Corbet
    Cc: Sergey Senozhatsky
    Cc: Steven Rostedt
    Cc: Jason Baron
    Cc: Randy Dunlap
    Link: http://lkml.kernel.org/r/1586521984-5890-1-git-send-email-orson.unisoc@gmail.com
    Signed-off-by: Linus Torvalds

    Orson Zhai
     

24 May, 2020

1 commit


20 May, 2020

1 commit

  • This method is used to properly allow kernel callers of the IPv4 route
    management ioctls. The existing ip_tunnel_ioctl helper is renamed to
    ip_tunnel_ctl to better reflect that it doesn't directly implement ioctls
    touching user memory, and is used for the guts of ndo_tunnel_ctl
    implementations. A new ip_tunnel_ioctl helper is added that can be wired
    up directly to the ndo_do_ioctl method and takes care of the copy to and
    from userspace.
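
    The split described here, a core helper that only touches kernel memory
    plus a thin wrapper that handles the user copies, can be sketched in user
    space as follows. Everything in the block is a hypothetical model:
    tunnel_parm_model, the CMD_* values and both functions are made up for
    illustration, and memcpy() merely stands in for the copy from and to
    userspace.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical parameter block; the real tunnel parameters are larger. */
struct tunnel_parm_model {
    unsigned int ttl;
};

enum { CMD_GET, CMD_SET };   /* hypothetical command numbers */

struct tunnel_parm_model stored_parm;   /* the tunnel's current settings */

/* Core helper: operates only on caller-owned memory, so in-kernel
 * callers can use it directly. */
int tunnel_ctl_model(struct tunnel_parm_model *p, int cmd)
{
    switch (cmd) {
    case CMD_GET:
        *p = stored_parm;
        return 0;
    case CMD_SET:
        stored_parm = *p;
        return 0;
    default:
        return -EINVAL;
    }
}

/* ioctl-style wrapper: copy in, call the core helper, copy out. */
int tunnel_ioctl_model(void *user_buf, int cmd)
{
    struct tunnel_parm_model p;
    int err;

    memcpy(&p, user_buf, sizeof(p));        /* stand-in for the copy from user */
    err = tunnel_ctl_model(&p, cmd);
    if (!err)
        memcpy(user_buf, &p, sizeof(p));    /* stand-in for the copy to user */
    return err;
}
```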

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig