25 Oct, 2019

1 commit

  • Some interface types could be nested.
    (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc.)
    These interface types should set a lockdep class because, without a lockdep
    class key, lockdep always warns about nonexistent circular locking.

    In the current code, these interfaces have their own lockdep class keys and
    manage them themselves, so there is a lot of duplicated code around
    /drivers/net and /net/.
    This patch adds new generic lockdep keys and some helper functions for it.

    This patch makes the following changes:
    a) Add lockdep class keys in struct net_device
    - qdisc_running, xmit, addr_list, qdisc_busylock
    - these keys are used as dynamic lockdep key.
    b) When net_device is being allocated, lockdep keys are registered.
    - alloc_netdev_mqs()
    c) When net_device is being freed, lockdep keys are unregistered.
    - free_netdev()
    d) Add generic lockdep key helper functions
    - netdev_register_lockdep_key()
    - netdev_unregister_lockdep_key()
    - netdev_update_lockdep_key()
    e) Remove unnecessary generic lockdep macro and functions
    f) Remove unnecessary lockdep code from each interface.

    After this patch, individual interface modules don't need to maintain
    their own lockdep keys.

    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

02 Oct, 2019

1 commit

  • commit 174e23810cd31
    ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
    recycle always drop skb extensions. The additional skb_ext_del() that is
    performed via nf_reset on napi skb recycle is not needed anymore.

    Most nf_reset() calls in the stack are there so queued skb won't block
    'rmmod nf_conntrack' indefinitely.

    This removes the skb_ext_del from nf_reset, and renames it to a more
    fitting nf_reset_ct().

    In a few selected places, add a call to skb_ext_reset to make sure that
    no active extensions remain.

    I am submitting this for "net", because we're still early in the release
    cycle. The patch applies to net-next too, but I think the rename causes
    needless divergence between those trees.

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

28 Sep, 2019

1 commit

  • A user reported that vrf create fails when IPv6 is disabled at boot using
    'ipv6.disable=1':
    https://bugzilla.kernel.org/show_bug.cgi?id=204903

    The failure is adding fib rules at create time. Add RTNL_FAMILY_IP6MR to
    the check in vrf_fib_rule if ipv6_mod_enabled is disabled.

    Fixes: e4a38c0c4b27 ("ipv6: add vrf table handling code for ipv6 mcast")
    Signed-off-by: David Ahern
    Cc: Patrick Ruddy
    Signed-off-by: David S. Miller

    David Ahern
     

22 Jul, 2019

1 commit

  • vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
    using ip/ipv6 addresses, but don't make sure the header is available
    in skb->data[] (skb_headlen() can be less than the header size).

    Case:

    1) igb driver from intel.
    2) Packet size is greater than 255.
    3) MPLS forwards to VRF device.

    So this patch adds pskb_may_pull() calls to the vrf_process_v4/v6_outbound()
    functions.

    Signed-off-by: Peter Kosyh
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Peter Kosyh
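
The check added by the commit above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver code: `toy_skb`, `toy_may_pull()` and `toy_read_daddr()` are hypothetical names that only model the idea behind pskb_may_pull(), which is to make sure the header bytes sit in the directly addressable (linear) area before reading them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_IPV4_HDR_LEN 20  /* minimal IPv4 header size in bytes */

/* Hypothetical stand-in for an sk_buff: a linear area plus one fragment. */
struct toy_skb {
    unsigned char linear[64];   /* bytes directly addressable, like skb->data */
    size_t headlen;             /* valid bytes in linear[], like skb_headlen() */
    const unsigned char *frag;  /* remaining payload, not directly addressable */
    size_t frag_len;
};

/* Model of pskb_may_pull(): ensure at least `len` bytes are in linear[]. */
static int toy_may_pull(struct toy_skb *skb, size_t len)
{
    size_t need;

    assert(skb != NULL);
    if (len <= skb->headlen)
        return 1;               /* already linear, nothing to do */
    if (len > sizeof(skb->linear) || len > skb->headlen + skb->frag_len)
        return 0;               /* packet is too short or does not fit */

    /* Copy the missing bytes from the fragment into the linear area. */
    need = len - skb->headlen;
    memcpy(skb->linear + skb->headlen, skb->frag, need);
    skb->headlen += need;
    skb->frag += need;
    skb->frag_len -= need;
    return 1;
}

/* Read the IPv4 destination address only once the header is linear. */
static int toy_read_daddr(struct toy_skb *skb, unsigned char daddr[4])
{
    if (!toy_may_pull(skb, TOY_IPV4_HDR_LEN))
        return -1;
    memcpy(daddr, skb->linear + 16, 4);  /* daddr is at byte offset 16 */
    return 0;
}
```

Without the `toy_may_pull()` call, `toy_read_daddr()` would read bytes beyond `headlen`, which is the class of problem the commit message describes.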
     

28 Jun, 2019

1 commit

  • The new route handling in ip_mc_finish_output() from 'net' overlapped
    with the new support for returning congestion notifications from BPF
    programs.

    In order to handle this I had to take the dev_loopback_xmit() calls
    out of the switch statement.

    The aquantia driver conflicts were simple overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Jun, 2019

1 commit

  • There is no functional change in this patch, it only prepares the next one.

    rt6_nexthop() will be used by ip6_dst_lookup_neigh(), which uses const
    variables.

    Signed-off-by: Nicolas Dichtel
    Reported-by: kbuild test robot
    Acked-by: Nick Desaulniers
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

24 Jun, 2019

1 commit

  • For the tx path, in most cases, we still have to take a refcnt on the dst
    because the caller is caching the dst somewhere. But it is still
    beneficial to make use of the RT6_LOOKUP_F_DST_NOREF flag while doing the
    route lookup, because this flag prevents manipulating the refcnt on
    net->ipv6.ip6_null_entry when doing fib6_rule_lookup() to traverse each
    routing table. The null_entry is a shared object and constant updates on
    it cause false sharing.

    We converted the current major lookup function ip6_route_output_flags()
    to make use of RT6_LOOKUP_F_DST_NOREF.

    Together with the change in the rx path, we see a noticeable performance
    boost:
    I ran synflood tests between 2 hosts under the same switch. Both hosts
    have 20G mlx NIC, and 8 tx/rx queues.
    Sender sends pure SYN flood with random src IPs and ports using trafgen.
    Receiver has a simple TCP listener on the target port.
    Both hosts have multiple custom rules:
    - For incoming packets, only local table is traversed.
    - For outgoing packets, 3 tables are traversed to find the route.
    The packet processing rate on the receiver is as follows:
    - Before the fix: 3.78Mpps
    - After the fix: 5.50Mpps

    Signed-off-by: Wei Wang
    Signed-off-by: David S. Miller

    Wei Wang
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

26 Apr, 2019

1 commit


25 Apr, 2019

1 commit


09 Apr, 2019

3 commits

  • David S. Miller
     
  • A common theme in the output path is looking up a neigh entry for a
    nexthop, either the gateway in an rtable or a fallback to the daddr
    in the skb:

    nexthop = (__force u32)rt_nexthop(rt, ip_hdr(skb)->daddr);
    neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
    if (unlikely(!neigh))
            neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);

    To allow the nexthop to be an IPv6 address we need to consider the
    family of the nexthop and then call __ipv{4,6}_neigh_lookup_noref based
    on it.

    To make this simpler, add an ip_neigh_gw4 helper, similar to ip_neigh_gw6
    added in an earlier patch, which handles:

    neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
    if (unlikely(!neigh))
            neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);

    And then add a second one, ip_neigh_for_gw, that calls either
    ip_neigh_gw4 or ip_neigh_gw6 based on the address family of the gateway.

    Update the output paths in the VRF driver and core v4 code to use
    ip_neigh_for_gw simplifying the family based lookup and making both
    ready for a v6 nexthop.

    ipv4_neigh_lookup has a different need - the potential to resolve a
    passed in address in addition to any gateway in the rtable or skb. Since
    this is a one-off, add ip_neigh_gw4 and ip_neigh_gw6 directly. The
    difference between __neigh_create used by the helpers and neigh_create
    called by ipv4_neigh_lookup is taking a refcount, so add rcu_read_lock_bh
    and bump the refcnt on the neigh entry.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
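
The family-based dispatch described in the commit above can be sketched in plain userspace C. This is a simplified model under stated assumptions, not the kernel helpers: all `toy_*` names are hypothetical, each table holds a single entry, and "lookup, then create on a miss" stands in for the __ipv4_neigh_lookup_noref() / __neigh_create() pair quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

enum toy_family { TOY_AF_INET, TOY_AF_INET6 };

struct toy_neigh {
    enum toy_family family;
    int created;                /* set when the entry was created on a miss */
};

/* Hypothetical one-slot neighbour table (stand-in for arp_tbl / nd_tbl). */
struct toy_table {
    struct toy_neigh entry;
    int present;
};

/* Lookup first; create the entry only when the lookup misses. */
static struct toy_neigh *toy_lookup_or_create(struct toy_table *tbl,
                                              enum toy_family family)
{
    assert(tbl != NULL);
    if (!tbl->present) {
        tbl->entry.family = family;
        tbl->entry.created = 1;
        tbl->present = 1;
    }
    return &tbl->entry;
}

static struct toy_neigh *toy_neigh_gw4(struct toy_table *arp)
{
    return toy_lookup_or_create(arp, TOY_AF_INET);
}

static struct toy_neigh *toy_neigh_gw6(struct toy_table *nd)
{
    return toy_lookup_or_create(nd, TOY_AF_INET6);
}

/* Pick the per-family helper based on the gateway's address family. */
static struct toy_neigh *toy_neigh_for_gw(enum toy_family gw_family,
                                          struct toy_table *arp,
                                          struct toy_table *nd)
{
    if (gw_family == TOY_AF_INET6)
        return toy_neigh_gw6(nd);
    return toy_neigh_gw4(arp);
}
```

Callers then need only the gateway family and never touch the per-family tables themselves, which is the simplification the commit message aims for in the output paths.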
     
  • A later patch allows an IPv6 gateway with an IPv4 route. The neighbor
    entry will exist in the v6 ndisc table and the cached header will contain
    the ipv6 protocol which is wrong for an IPv4 packet. For an IPv4 packet to
    use the v6 neighbor entry, neigh_output needs to skip the cached header
    and just use the output callback for the neigh entry.

    A future patchset can look at expanding the hh_cache to handle 2
    protocols. For now, IPv6 gateways with an IPv4 route will take the
    extra overhead of generating the header.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

08 Apr, 2019

1 commit

  • When the mtu of a vrf device is set to 0, ping fails. So I think we
    should limit the vrf mtu to a reasonable range to solve this problem.
    I set dev->min_mtu to IPV6_MIN_MTU, so it works for both ipv4 and
    ipv6. Leaving dev->max_mtu at 0 can be confusing, so I set
    dev->max_mtu to ETH_MAX_MTU.

    Steps to reproduce:

    1. Configure a vrf interface and set its mtu to 0:
    3: enp4s0: mtu 1500 qdisc fq_codel
    master vrf1 state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff

    2. Ping peer:
    3: enp4s0: mtu 1500 qdisc fq_codel
    master vrf1 state UP group default qlen 1000
    link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/16 scope global enp4s0
    valid_lft forever preferred_lft forever
    connect: Network is unreachable

    3. Set mtu to the default value; ping works:
    PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
    64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.88 ms

    Fixes: ad49bc6361ca2 ("net: vrf: remove MTU limits for vrf device")
    Signed-off-by: Miaohe Lin
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Miaohe Lin
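
The effect of the range set by the commit above can be sketched as a small userspace model. The `toy_*` names are hypothetical; the two constants carry the values of the kernel's IPV6_MIN_MTU (1280) and ETH_MAX_MTU (0xFFFF), and the check is modelled on the usual min_mtu/max_mtu validation, where a max_mtu of 0 means "no upper limit".

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IPV6_MIN_MTU 1280U    /* same value as the kernel's IPV6_MIN_MTU */
#define TOY_ETH_MAX_MTU  0xFFFFU  /* same value as the kernel's ETH_MAX_MTU */

/* Hypothetical stand-in for the MTU fields of struct net_device. */
struct toy_netdev {
    unsigned int mtu;
    unsigned int min_mtu;
    unsigned int max_mtu;  /* 0 is treated as "no upper limit" */
};

/* Mirrors the idea of the fix: give the VRF device an explicit range. */
static void toy_vrf_setup(struct toy_netdev *dev)
{
    assert(dev != NULL);
    dev->min_mtu = TOY_IPV6_MIN_MTU;
    dev->max_mtu = TOY_ETH_MAX_MTU;
}

/* Reject values outside [min_mtu, max_mtu]; the kernel would use -EINVAL. */
static int toy_set_mtu(struct toy_netdev *dev, unsigned int new_mtu)
{
    assert(dev != NULL);
    if (new_mtu < dev->min_mtu)
        return -1;
    if (dev->max_mtu && new_mtu > dev->max_mtu)
        return -1;
    dev->mtu = new_mtu;
    return 0;
}
```

With this range in place, a request for mtu 0 is refused up front instead of leaving the device in a state where ping fails.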
     

28 Mar, 2019

1 commit

  • VRF devices don't work with upper devices. Currently, it's possible to
    add a VRF device to a bridge or team, and to create macvlan, macsec, or
    ipvlan devices on top of a VRF (bond and vlan are prevented respectively
    by the lack of an ndo_set_mac_address op and the NETIF_F_VLAN_CHALLENGED
    feature flag).

    Fix this by setting the IFF_NO_RX_HANDLER flag (introduced in commit
    f5426250a6ec ("net: introduce IFF_NO_RX_HANDLER")).

    Cc: David Ahern
    Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
    Signed-off-by: Sabrina Dubroca
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

22 Feb, 2019

1 commit

  • Similar to commit e94cd8113ce63 ("net: remove MTU limits for dummy and
    ifb device"), MTU is irrelevant for the VRF device. We init it to 64K, so
    limiting it to [68, 1500] may confuse users.

    Reported-by: Jianlin Shi
    Signed-off-by: Hangbin Liu
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Hangbin Liu
     

07 Dec, 2018

2 commits

  • In order to pass extack together with NETDEV_PRE_UP notifications, it's
    necessary to route the extack to __dev_open() from diverse (possibly
    indirect) callers. One prominent API through which the notification is
    invoked is dev_change_flags().

    Therefore extend dev_change_flags() with an extra extack argument and
    update all users. Most of the calls end up just encoding NULL, but
    several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.

    Since the function declaration line is changed anyway, name the other
    function arguments to placate checkpatch.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Petr Machata
     
  • A follow-up patch will extend dev_change_flags() with an extack
    argument. Extend cycle_netdev() to have that argument available for the
    conversion.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Petr Machata
     

08 Nov, 2018

1 commit

  • The skbs for packets that are multicast or to a link-local address are
    not marked as being enslaved to a VRF, if they are received on a socket
    bound to the VRF. This is needed for ND and it is preferable for the
    kernel not to have to deal with the additional use-cases if ll or mcast
    packets are handled as enslaved. However, this does not allow service
    instances listening on unbound and bound to VRF sockets to distinguish
    the VRF used, if packets are sent as multicast or to a link-local
    address. The fix is for the VRF driver to also mark these skb as being
    enslaved to the VRF.

    Signed-off-by: Mike Manning
    Reviewed-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Mike Manning
     

03 Oct, 2018

1 commit

  • The code to obtain the correct table for the incoming interface was
    missing for IPv6. This has been added along with the table creation
    notification to fib rules for the RTNL_FAMILY_IP6MR address family.

    Signed-off-by: Patrick Ruddy
    Signed-off-by: Mike Manning
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Patrick Ruddy
     

29 May, 2018

1 commit

  • SCTP sockets originated in a VRF can improve their performance if CRC32c
    computation is delegated to underlying devices: update device features,
    setting NETIF_F_SCTP_CRC. Iterating the following command in the topology
    proposed with [1],

    # ip vrf exec vrf-h2 netperf -H 192.0.2.1 -t SCTP_STREAM -- -m 10K

    the measured throughput in Mbit/s improved from 2395 ± 1% to 2720 ± 1%.

    [1] https://www.spinics.net/lists/netdev/msg486007.html

    Signed-off-by: Davide Caratti
    Reviewed-by: Marcelo Ricardo Leitner
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Davide Caratti
     

18 Apr, 2018

1 commit

  • A later patch removes rt6i_table from rt6_info. Save the ipv6
    table for a VRF in net_vrf. fib tables can not be deleted so
    no reference counting or locking is required.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

02 Apr, 2018

1 commit


31 Mar, 2018

1 commit

  • Miguel reported an skb use after free / double free in vrf_finish_output
    when neigh_output returns an error. The vrf driver should return after
    the call to neigh_output as it takes over the skb on error path as well.

    Patch is a simplified version of Miguel's patch which was written for 4.9,
    and updated to top of tree.

    Fixes: 8f58336d3f78a ("net: Add ethernet header for pass through VRF device")
    Signed-off-by: Miguel Fadon Perlines
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

28 Mar, 2018

1 commit


05 Mar, 2018

1 commit

  • IPv6 does path selection for multipath routes deep in the lookup
    functions. The next patch adds L4 hash option and needs the skb
    for the forward path. To get the skb to the relevant FIB lookup
    functions it needs to go through the fib rules layer, so add a
    lookup_data argument to the fib_lookup_arg struct.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    David Ahern
     

28 Feb, 2018

1 commit

  • These pernet_operations make pretty simple actions
    like variable initialization on init, debug checks
    on exit, and so on, and they obviously are able
    to be executed in parallel with any others:

    vrf_net_ops
    lockd_net_ops
    grace_net_ops
    xfrm6_tunnel_net_ops
    kcm_net_ops
    tcf_net_ops

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

24 Feb, 2018

1 commit

  • For ages iproute2 has used `struct rtmsg` as the ancillary header for
    FIB rules and in the process set the protocol value to RTPROT_BOOT.
    Until cac56209a66 ("net: Allow a rule to track originating protocol")
    the kernel rules code ignored the protocol value sent from userspace
    and always returned 0 in notifications. To avoid incompatibility with
    existing iproute2, send the protocol as a new attribute.

    Fixes: cac56209a66 ("net: Allow a rule to track originating protocol")
    Signed-off-by: Donald Sharp
    Signed-off-by: David S. Miller

    Donald Sharp
     

22 Feb, 2018

1 commit

  • Allow a rule that is being added/deleted/modified or
    dumped to contain the originating protocol's id.

    The protocol is handled just like a route's originating
    protocol is. This is especially useful because there
    is starting to be a plethora of different user space
    programs adding rules.

    Allow the vrf device to specify that the kernel is the originator
    of the rule created for this device.

    Signed-off-by: Donald Sharp
    Signed-off-by: David S. Miller

    Donald Sharp
     

16 Feb, 2018

1 commit

  • Remove rt_table_id from rtable. It was added for getroute to return the
    table id that was hit in the lookup. With the changes for fibmatch the
    table id can be extracted from the fib_info returned in the fib_result
    so it no longer needs to be in rtable directly.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

26 Jan, 2018

1 commit

  • Sukumar reported that sends to the local broadcast address
    (255.255.255.255) are broken. Check for the address in vrf driver
    and do not redirect to the VRF device - similar to multicast
    packets.

    With this change sockets can use SO_BINDTODEVICE to specify an
    egress interface and receive responses. Note: the egress interface
    can not be a VRF device but needs to be the enslaved device.

    https://bugzilla.kernel.org/show_bug.cgi?id=198521

    Reported-by: Sukumar Gopalakrishnan
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

04 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • FRA_L3MDEV is defined as U8, but is being added as a U32 attribute. On
    big endian architecture, this results in the l3mdev entry not being
    added to the FIB rules.

    Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
    Signed-off-by: Jeff Barnhill
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Jeff Barnhill
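
Why a u8/u32 mismatch like the one fixed above only shows up on big-endian machines can be shown with a host-independent sketch. The helpers below are hypothetical and do not use the netlink API; they only lay out a 32-bit value the way each kind of host would and then read the first byte, as a reader expecting a u8 payload would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lay out a 32-bit value as a big- or little-endian host stores it. */
static void toy_put_u32(uint8_t out[4], uint32_t v, int big_endian)
{
    int i;

    assert(out != NULL);
    for (i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (uint8_t)(v >> shift);
    }
}

/* A reader that expects a u8 attribute looks only at the first byte. */
static uint8_t toy_get_u8(const uint8_t *payload)
{
    assert(payload != NULL);
    return payload[0];
}
```

Writing the value 1 as a u32 and reading it back as a u8 yields 1 with the little-endian layout but 0 with the big-endian layout, so a flag written this way looks unset on big-endian hosts.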
     

05 Oct, 2017

3 commits


22 Sep, 2017

1 commit


16 Sep, 2017

1 commit

  • When building an allmodconfig kernel with gcc-4.6, we get a rather
    odd warning:

    drivers/net/vrf.c: In function ‘vrf_ip6_input_dst’:
    drivers/net/vrf.c:964:3: error: initialized field with side-effects overwritten [-Werror]
    drivers/net/vrf.c:964:3: error: (near initialization for ‘fl6’) [-Werror]

    I have no idea what this warning is even trying to say, but it does
    seem like a false positive. Reordering the initialization to match
    the structure definition gets rid of the warning, and might also avoid
    whatever gcc thinks is wrong here.

    Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

14 Aug, 2017

1 commit


08 Aug, 2017

1 commit

  • Add extack error messages for failure paths creating vrf devices. Once
    extack support is added to iproute2, we go from the unhelpful:
    $ ip li add foobar type vrf
    RTNETLINK answers: Invalid argument

    to:
    $ ip li add foobar type vrf
    Error: VRF table id is missing

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern