03 Nov, 2019

1 commit


25 Oct, 2019

1 commit

  • Some interface types could be nested.
    (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
    These interface types should set a lockdep class because, without a lockdep
    class key, lockdep always warns about nonexistent circular locking.

    In the current code, these interfaces have their own lockdep class keys and
    manage them themselves, so there is a lot of duplicate code under
    drivers/net/ and net/.
    This patch adds new generic lockdep keys and some helper functions for it.

    This patch makes the following changes:
    a) Add lockdep class keys in struct net_device
    - qdisc_running, xmit, addr_list, qdisc_busylock
    - these keys are used as dynamic lockdep keys.
    b) When net_device is being allocated, lockdep keys are registered.
    - alloc_netdev_mqs()
    c) When net_device is being freed, lockdep keys are unregistered.
    - free_netdev()
    d) Add generic lockdep key helper function
    - netdev_register_lockdep_key()
    - netdev_unregister_lockdep_key()
    - netdev_update_lockdep_key()
    e) Remove unnecessary generic lockdep macros and functions
    f) Remove unnecessary lockdep code from each interface.

    After this patch, individual interface modules no longer need to
    maintain their own lockdep keys.

    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

11 Oct, 2019

1 commit


17 Aug, 2019

1 commit

  • Allow encapsulated packets sent to tunnels layered over ipvlan to use
    offloads rather than forcing SW fallbacks.

    Since commit f21e5077010acda73a60 ("macvlan: add offload features for
    encapsulation"), macvlan has set dev->hw_enc_features to include
    everything in dev->features; do likewise in ipvlan.

    Signed-off-by: Bill Sommerfeld
    Acked-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Bill Sommerfeld
     

08 Jun, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) Free AF_PACKET po->rollover properly, from Willem de Bruijn.

    2) Read SFP eeprom in max 16 byte increments to avoid problems with
    some SFP modules, from Russell King.

    3) Fix UDP socket lookup wrt. VRF, from Tim Beale.

    4) Handle route invalidation properly in s390 qeth driver, from Julian
    Wiedmann.

    5) Memory leak on unload in RDS, from Zhu Yanjun.

    6) sctp_process_init leak, from Neil Horman.

    7) Fix fib_rules rule insertion semantic change that broke Android,
    from Hangbin Liu.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
    pktgen: do not sleep with the thread lock held.
    net: mvpp2: Use strscpy to handle stat strings
    net: rds: fix memory leak in rds_ib_flush_mr_pool
    ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
    ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
    Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
    net: aquantia: fix wol configuration not applied sometimes
    ethtool: fix potential userspace buffer overflow
    Fix memory leak in sctp_process_init
    net: rds: fix memory leak when unload rds_rdma
    ipv6: fix the check before getting the cookie in rt6_get_cookie
    ipv4: not do cache for local delivery if bc_forwarding is enabled
    s390/qeth: handle error when updating TX queue count
    s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
    s390/qeth: check dst entry before use
    s390/qeth: handle limited IPv4 broadcast in L3 TX path
    net: fix indirect calls helpers for ptype list hooks.
    net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set
    udp: only choose unbound UDP socket for multicast when not in a VRF
    net/tls: replace the sleeping lock around RX resync with a bit lock
    ...

    Linus Torvalds
     

05 Jun, 2019

1 commit

  • There are some NICs, such as hinic, with NETIF_F_IP_CSUM and NETIF_F_TSO
    on but NETIF_F_HW_CSUM off. The ipvlan device features then end up with
    NETIF_F_TSO on but NETIF_F_IP_CSUM and NETIF_F_HW_CSUM both off, because
    IPVLAN_FEATURES only cares about NETIF_F_HW_CSUM. As a result, TSO is
    disabled in netdev_fix_features().
    For example:
    Features for enp129s0f0:
    rx-checksumming: on
    tx-checksumming: on
    tx-checksum-ipv4: on
    tx-checksum-ip-generic: off [fixed]
    tx-checksum-ipv6: on

    Fixes: a188222b6ed2 ("net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK")
    Signed-off-by: Miaohe Lin
    Signed-off-by: David S. Miller

    Miaohe Lin
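
The interaction described in this commit can be sketched as a small user-space model. This is only an illustration in Python, not kernel code: the bit values are arbitrary, the mask names `OLD_IPVLAN_MASK` and `NEW_IPVLAN_MASK` are hypothetical, and the shape of the fix (letting the IPv4/IPv6 checksum bits through) is an assumption based on the commit text.

```python
# Stand-ins for a few NETIF_F_* feature bits; the numeric values are arbitrary.
IP_CSUM = 1 << 0
HW_CSUM = 1 << 1
IPV6_CSUM = 1 << 2
TSO = 1 << 3

# Per the commit text, the ipvlan feature mask only cared about HW_CSUM.
OLD_IPVLAN_MASK = HW_CSUM | TSO
# Assumed shape of the fix: also inherit the protocol-specific checksum bits.
NEW_IPVLAN_MASK = OLD_IPVLAN_MASK | IP_CSUM | IPV6_CSUM


def fix_features(features):
    """Model of the rule described above: TSO is dropped when neither
    HW_CSUM nor IP_CSUM is available."""
    if (features & TSO) and not (features & (HW_CSUM | IP_CSUM)):
        features &= ~TSO
    return features


def ipvlan_features(lower_features, mask):
    """Features an ipvlan device would end up with on top of a lower device."""
    return fix_features(lower_features & mask)


# A hinic-like lower device: IP_CSUM and TSO on, HW_CSUM off.
LOWER = IP_CSUM | IPV6_CSUM | TSO
```

With the old mask, `ipvlan_features(LOWER, OLD_IPVLAN_MASK)` loses TSO because no checksum bit survives the mask; with the wider mask, TSO is kept.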
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

2 commits


25 Feb, 2019

1 commit

  • Three conflicts, one of which, for marvell10g.c is non-trivial and
    requires some follow-up from Heiner or someone else.

    The issue is that Heiner converted the marvell10g driver over to
    use the generic c45 code as much as possible.

    However, in 'net' a bug fix appeared which makes sure that a new
    local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
    is cleared.

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Feb, 2019

1 commit

  • When running Docker with userns isolation e.g. --userns-remap="default"
    and spawning up some containers with CAP_NET_ADMIN under this realm, I
    noticed that link changes on ipvlan slave device inside that container
    can affect all devices from this ipvlan group which are in other net
    namespaces where the container should have no permission to make changes
    to, such as the init netns, for example.

    This effectively makes it possible to undo ipvlan private mode and switch
    globally to bridge mode where slaves can communicate directly without going
    through hostns, or to switch the global operation mode (l2/l3/l3s) for
    everyone bound to the given ipvlan master device. The libnetwork plugin
    here creates an ipvlan master and an ipvlan slave in hostns, plus one slave
    per container that is moved into the container's netns upon creation event.

    * In hostns:

    # ip -d a
    [...]
    8: cilium_host@bond0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 10.41.0.1/32 scope link cilium_host
    valid_lft forever preferred_lft forever
    [...]

    * Spawn container & change ipvlan mode setting inside of it:

    # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
    9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c

    # docker exec -ti client ip -d a
    [...]
    10: cilium0@if4: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
    valid_lft forever preferred_lft forever

    # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2

    # docker exec -ti client ip -d a
    [...]
    10: cilium0@if4: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
    valid_lft forever preferred_lft forever

    * In hostns (mode switched to l2):

    # ip -d a
    [...]
    8: cilium_host@bond0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 10.41.0.1/32 scope link cilium_host
    valid_lft forever preferred_lft forever
    [...]

    Same l3 -> l2 switch would also happen by creating another slave inside
    the container's network namespace when specifying the existing cilium0
    link to derive the actual (bond0) master:

    # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2

    # docker exec -ti client ip -d a
    [...]
    2: cilium1@if4: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    10: cilium0@if4: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
    valid_lft forever preferred_lft forever

    * In hostns:

    # ip -d a
    [...]
    8: cilium_host@bond0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 10.41.0.1/32 scope link cilium_host
    valid_lft forever preferred_lft forever
    [...]

    One way to mitigate it is to check CAP_NET_ADMIN permissions in
    the ipvlan master device's ns, and only then allow changing the
    mode or flags for all devices bound to it. The two cases above are
    then disallowed after the patch.

    Signed-off-by: Daniel Borkmann
    Acked-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Daniel Borkmann
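
The mitigation described in this commit can be sketched as a small user-space model. It is an illustration only, assuming a simplified view in which capabilities are tracked per user namespace; the class names (`NetNS`, `Caller`, `Port`) and the namespace labels are hypothetical and do not correspond to kernel structures.

```python
from dataclasses import dataclass

CAP_NET_ADMIN = "CAP_NET_ADMIN"


@dataclass(frozen=True)
class NetNS:
    name: str
    user_ns: str  # label of the user namespace that owns this netns


@dataclass
class Caller:
    # Capabilities held per user namespace, e.g. {"remapped": {"CAP_NET_ADMIN"}}
    caps: dict

    def ns_capable(self, user_ns, cap):
        return cap in self.caps.get(user_ns, set())


@dataclass
class Port:
    master_netns: NetNS  # netns of the ipvlan master device
    mode: str = "l3"     # shared by every slave bound to the master


def change_mode(caller, port, new_mode):
    """Change the port-wide mode only when the caller is privileged in the
    namespace of the master device, not merely in the slave's namespace."""
    if not caller.ns_capable(port.master_netns.user_ns, CAP_NET_ADMIN):
        raise PermissionError("CAP_NET_ADMIN required in the master's netns")
    port.mode = new_mode
```

A container that holds CAP_NET_ADMIN only in its remapped user namespace gets `PermissionError`, while a caller privileged in the master's namespace can still switch the mode.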
     

09 Feb, 2019

2 commits

  • An ipvlan bug fix in 'net' conflicted with the abstraction away
    of the IPV6 specific support in 'net-next'.

    Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
    action conversion in 'net-next'.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Right now ipvlan has a hard dependency on CONFIG_NETFILTER and
    otherwise it cannot be built. However, the only ipvlan operation
    mode that actually depends on netfilter is l3s, everything else
    is independent of it. Break this hard dependency such that users
    are able to use ipvlan l3 mode on systems where netfilter is not
    compiled in.

    Therefore, this adds a hidden CONFIG_IPVLAN_L3S bool which is
    defaulting to y when CONFIG_NETFILTER is set in order to retain
    existing behavior for l3s. All l3s related code is refactored
    into ipvlan_l3s.c that is compiled in when enabled.

    Signed-off-by: Daniel Borkmann
    Cc: Mahesh Bandewar
    Cc: Florian Westphal
    Cc: Martynas Pumputis
    Acked-by: Florian Westphal
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

31 Jan, 2019

1 commit

  • While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
    I ran into the issue that while l3 mode is working fine, l3s mode
    does not have any connectivity to kube-apiserver and hence all pods
    end up in Error state as well. The ipvlan master device sits on
    top of a bond device and hostns traffic to kube-apiserver (also running
    in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
    where the latter is the address of the bond0. While in l3 mode, a
    curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
    works fine from hostns, neither of them do in case of l3s. In the
    latter only a curl to https://127.0.0.1:37573 appeared to work where
    for local addresses of bond0 I saw kernel suddenly starting to emit
    ARP requests to query HW address of bond0 which remained unanswered
    and neighbor entries in INCOMPLETE state. These ARP requests only
    happen while in l3s.

    Debugging this further, I found the issue is that l3s mode is piggy-
    backing on l3 master device, and in this case local routes are using
    l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
    f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
    if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
    a loopback"). I found that reverting them back into using the
    net->loopback_dev fixed ipvlan l3s connectivity and got everything
    working for the CNI.

    Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
    l3mdev paper in [0], the sole reason why ipvlan l3s is relying
    on l3 master device is to get the l3mdev_ip_rcv() receive hook for
    setting the dst entry of the input route without adding its own
    ipvlan specific hacks into the receive path, however, any l3 domain
    semantics beyond just that are breaking l3s operation. Note that
    ipvlan also has the ability to dynamically switch its internal
    operation from l3 to l3s for all ports via ipvlan_set_port_mode()
    at runtime. In any case, l3 vs l3s solely distinguishes itself by
    'de-confusing' netfilter through switching skb->dev to ipvlan slave
    device late in NF_INET_LOCAL_IN before handing the skb to L4.

    The minimal fix taken here is to add an IFF_L3MDEV_RX_HANDLER flag which,
    if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
    without any additional l3mdev semantics on top. This should also have
    minimal impact since dev->priv_flags is already hot in cache. With
    this set, l3s mode is working fine and I also get things like
    masquerading pod traffic on the ipvlan master properly working.

    [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf

    Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
    Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
    Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
    Signed-off-by: Daniel Borkmann
    Cc: Mahesh Bandewar
    Cc: David Ahern
    Cc: Florian Westphal
    Cc: Martynas Pumputis
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Daniel Borkmann
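
The idea of a flag that grants only the receive hook can be sketched as follows. This is a user-space model under stated assumptions: the bit values are arbitrary, `Dev` and both helper functions are hypothetical names, and tying the remaining l3 domain semantics to the master flag alone is a simplification made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Arbitrary bit values standing in for priv_flags bits.
IFF_L3MDEV_MASTER = 1 << 0
IFF_L3MDEV_SLAVE = 1 << 1
IFF_L3MDEV_RX_HANDLER = 1 << 2


@dataclass
class Dev:
    name: str
    priv_flags: int = 0
    master: Optional["Dev"] = None


def l3_rcv_hook_owner(dev):
    """Return the device whose L3 receive hook should run, or None."""
    if dev.priv_flags & IFF_L3MDEV_SLAVE:
        return dev.master
    if dev.priv_flags & (IFF_L3MDEV_MASTER | IFF_L3MDEV_RX_HANDLER):
        return dev
    return None


def has_l3_domain_semantics(dev):
    """Assumed simplification: everything beyond the receive hook (for
    example local routes bound to the l3mdev) applies to real masters only."""
    return bool(dev.priv_flags & IFF_L3MDEV_MASTER)
```

In this model a device carrying only `IFF_L3MDEV_RX_HANDLER` still gets its receive hook called, but `has_l3_domain_semantics()` stays false for it.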
     

14 Dec, 2018

1 commit

  • A NETDEV_CHANGEADDR event implies a change of address of each of the
    IPVLANs of this IPVLAN device. Therefore propagate NETDEV_PRE_CHANGEADDR
    to all the IPVLANs.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     

11 Dec, 2018

1 commit

  • Fix following gcc warning:

    drivers/net/ipvlan/ipvlan_main.c:543:12: warning:
    comparison is always false due to limited range of data type [-Wtype-limits]

    'mode' is a u16 variable, IPVLAN_MODE_L2 is zero,
    the comparison is always false

    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller

    YueHaibing
     

07 Dec, 2018

2 commits

  • In order to pass extack together with NETDEV_PRE_UP notifications, it's
    necessary to route the extack to __dev_open() from diverse (possibly
    indirect) callers. One prominent API through which the notification is
    invoked is dev_change_flags().

    Therefore extend dev_change_flags() with an extra extack argument and
    update all users. Most of the calls end up just encoding NULL, but
    several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.

    Since the function declaration line is changed anyway, name the other
    function arguments to placate checkpatch.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Petr Machata
     
  • A follow-up patch will extend dev_change_flags() with an extack
    argument. Extend ipvlan_set_port_mode() to have that argument available
    for the conversion.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     

02 Jul, 2018

1 commit

  • After we change the ipvlan mode from l3 to l2, or vice versa, we only
    reset the IFF_NOARP flag but don't flush the ARP table cache, which
    causes eth->h_dest to be equal to eth->h_source in ipvlan_xmit_mode_l2().
    The packet then never leaves the host.

    Here is the reproducer on local host:

    ip link set eth1 up
    ip addr add 192.168.1.1/24 dev eth1
    ip link add link eth1 ipvlan1 type ipvlan mode l3

    ip netns add net1
    ip link set ipvlan1 netns net1
    ip netns exec net1 ip link set ipvlan1 up
    ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1

    ip route add 192.168.2.0/24 via 192.168.1.2
    ping 192.168.2.2 -c 2

    ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2
    ping 192.168.2.2 -c 2

    Add the same configuration on remote host. After we set the mode to l2,
    we could find that the src/dst MAC addresses are the same on eth1:

    21:26:06.648565 00:b7:13:ad:d3:05 > 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.2.1 > 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64

    Fix this by calling dev_change_flags(), which will call netdevice notifier
    with flag change info.

    v2:
    a) As pointed out by Wang Cong, check the return value of dev_change_flags()
    when changing dev flags.
    b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops.
    So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path.

    Reported-by: Jianlin Shi
    Reviewed-by: Stefano Brivio
    Reviewed-by: Sabrina Dubroca
    Fixes: 2ad7bf3638411 ("ipvlan: Initial check-in of the IPVLAN driver.")
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
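
Why going through a flag-change path matters can be sketched with a small user-space model. It is illustrative only: `Dev`, `set_mode_old` and `set_mode_fixed` are hypothetical names, and the flush-on-IFF_NOARP-change behaviour is modelled as a plain method call instead of a real notifier chain.

```python
IFF_NOARP = 0x80  # same numeric value as the Linux IFF_NOARP interface flag


class Dev:
    def __init__(self):
        self.flags = 0
        self.neigh_cache = {}  # IP address -> link-layer address

    def change_flags(self, new_flags):
        """Update flags and tell listeners which bits changed."""
        changed = self.flags ^ new_flags
        self.flags = new_flags
        self._notify_change(changed)

    def _notify_change(self, flags_changed):
        # Stand-in for a notifier that flushes neighbours when NOARP flips.
        if flags_changed & IFF_NOARP:
            self.neigh_cache.clear()


def set_mode_old(dev, mode):
    """Old behaviour: poke the flag directly, nobody is notified."""
    if mode == "l2":
        dev.flags &= ~IFF_NOARP
    else:
        dev.flags |= IFF_NOARP


def set_mode_fixed(dev, mode):
    """Fixed behaviour: route the change through change_flags()."""
    if mode == "l2":
        dev.change_flags(dev.flags & ~IFF_NOARP)
    else:
        dev.change_flags(dev.flags | IFF_NOARP)
```

With `set_mode_old()` a stale neighbour entry survives the l3 to l2 switch; with `set_mode_fixed()` the cache is emptied as soon as the flag flips.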
     

21 Jun, 2018

1 commit

  • Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted
    the mtu from the master device when creating an ipvlan device, but it
    would also override the mtu value set in rtnl_create_link. This causes
    the IFLA_MTU param not to take effect.

    So this patch does not adjust the mtu if the IFLA_MTU param is set when
    creating an ipvlan device.

    Fixes: 296d48568042 ("ipvlan: inherit MTU from master device")
    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
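
The rule this commit describes is small enough to state as a function. This is a minimal sketch, not the driver code; `initial_ipvlan_mtu` and its parameters are hypothetical names, with `ifla_mtu=None` standing for "no IFLA_MTU attribute was supplied".

```python
def initial_ipvlan_mtu(master_mtu, ifla_mtu=None):
    """Pick the MTU for a newly created ipvlan device.

    An explicitly requested MTU wins; otherwise the master's MTU is inherited.
    """
    if ifla_mtu is not None:
        return ifla_mtu
    return master_mtu
```

For example, `initial_ipvlan_mtu(1500)` inherits 1500 from the master, while `initial_ipvlan_mtu(9000, 1400)` keeps the requested 1400.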
     

20 Jun, 2018

1 commit

  • Similar to the fixes on team and bonding, this restores the ability
    to set an ipvlan device's mtu to anything higher than 1500.

    Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

16 May, 2018

1 commit


28 Mar, 2018

1 commit


10 Mar, 2018

1 commit

  • Some network devices - notably ipvlan slave - are not compatible with
    any kind of rx_handler. Currently the hook can be installed but any
    configuration (bridge, bond, macsec, ...) is nonfunctional.

    This change allocates a priv_flag bit to mark such devices and explicitly
    forbid installing a rx_handler if such bit is set. The new bit is used
    by ipvlan slave device.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
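
The behaviour described in this commit can be modelled in a few lines. This is a user-space sketch, not kernel code: the flag name `IFF_NO_RX_HANDLER` and the error codes are assumptions that do not appear in the commit text above, and the bit value is arbitrary.

```python
import errno

IFF_NO_RX_HANDLER = 1 << 0  # arbitrary bit value for this model


class Dev:
    def __init__(self, name, priv_flags=0):
        self.name = name
        self.priv_flags = priv_flags
        self.rx_handler = None


def rx_handler_register(dev, handler):
    """Return 0 on success or a negative errno, following kernel convention."""
    if dev.priv_flags & IFF_NO_RX_HANDLER:
        return -errno.EINVAL  # device type cannot carry an rx_handler
    if dev.rx_handler is not None:
        return -errno.EBUSY   # a handler is already installed
    dev.rx_handler = handler
    return 0
```

An ipvlan slave created with the flag set is refused up front, so a bridge or bond on top of it fails at configuration time instead of silently not working.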
     

09 Mar, 2018

1 commit

  • The rx_handler field is rcu-protected, but I forgot to use the
    proper accessor while refactoring netif_is_ipvlan_port(). That
    function only checks the rx_handler value, so it is safe, but we need
    to properly read rx_handler via rcu_access_pointer() to avoid sparse
    warnings.

    Fixes: 1ec54cb44e67 ("net: unpollute priv_flags space")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

08 Mar, 2018

1 commit

  • The ipvlan device driver defines and uses 2 bits inside the priv_flags
    net_device field. These bits and the related helpers are used only
    inside the ipvlan device driver, and the core networking does not
    need to be aware of them.

    This change moves the netif_is_ipvlan* helpers into the ipvlan driver and
    re-implements them by looking for ipvlan specific symbols instead of
    using priv_flags.

    Overall this frees two bits inside priv_flags - and moves the following
    ones to avoid gaps - without any intended functional change.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

05 Mar, 2018

2 commits

  • Currently we allow the creation of 8021q devices on top of
    ipvlan, but such devices are nonfunctional, as the underlying
    ipvlan rx_handler hook can't match the relevant traffic.

    Be explicit and forbid the creation of such nonfunctional devices.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • IPv6 does path selection for multipath routes deep in the lookup
    functions. The next patch adds L4 hash option and needs the skb
    for the forward path. To get the skb to the relevant FIB lookup
    functions it needs to go through the fib rules layer, so add a
    lookup_data argument to the fib_lookup_arg struct.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    David Ahern
     

01 Mar, 2018

2 commits

  • This changeset moves ipvlan address under RCU protection, using
    a per ipvlan device spinlock to protect list mutation and RCU
    read access to protect list traversal.

    Also explicitly use RCU read lock to traverse the per port
    ipvlans list, so that we can now perform a full address lookup
    without asserting the RTNL lock.

    Overall this allows the ipvlan driver to check fully for duplicate
    addresses - before this commit ipv6 addresses assigned by autoconf
    via prefix delegation were accepted without any check - and avoid
    the following rtnl assertion failure still in the same code path:

    RTNL: assertion failed at drivers/net/ipvlan/ipvlan_core.c (124)
    WARNING: CPU: 15 PID: 0 at drivers/net/ipvlan/ipvlan_core.c:124 ipvlan_addr_busy+0x97/0xa0 [ipvlan]
    Modules linked in: ipvlan(E) ixgbe
    CPU: 15 PID: 0 Comm: swapper/15 Tainted: G E 4.16.0-rc2.ipvlan+ #1782
    Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
    RIP: 0010:ipvlan_addr_busy+0x97/0xa0 [ipvlan]
    RSP: 0018:ffff881ff9e03768 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff881fdf2a9000 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 00000000000000f6 RDI: 0000000000000300
    RBP: ffff881fdf2a8000 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: ffff881ff9e034c0 R12: ffff881fe07bcc00
    R13: 0000000000000001 R14: ffffffffa02002b0 R15: 0000000000000001
    FS: 0000000000000000(0000) GS:ffff881ff9e00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc5c1a4f248 CR3: 000000207e012005 CR4: 00000000001606e0
    Call Trace:

    ipvlan_addr6_event+0x6c/0xd0 [ipvlan]
    notifier_call_chain+0x49/0x90
    atomic_notifier_call_chain+0x6a/0x100
    ipv6_add_addr+0x5f9/0x720
    addrconf_prefix_rcv_add_addr+0x244/0x3c0
    addrconf_prefix_rcv+0x2f3/0x790
    ndisc_router_discovery+0x633/0xb70
    ndisc_rcv+0x155/0x180
    icmpv6_rcv+0x4ac/0x5f0
    ip6_input_finish+0x138/0x6a0
    ip6_input+0x41/0x1f0
    ipv6_rcv+0x4db/0x8d0
    __netif_receive_skb_core+0x3d5/0xe40
    netif_receive_skb_internal+0x89/0x370
    napi_gro_receive+0x14f/0x1e0
    ixgbe_clean_rx_irq+0x4ce/0x1020 [ixgbe]
    ixgbe_poll+0x31a/0x7a0 [ixgbe]
    net_rx_action+0x296/0x4f0
    __do_softirq+0xcf/0x4f5
    irq_exit+0xf5/0x110
    do_IRQ+0x62/0x110
    common_interrupt+0x91/0x91

    v1 -> v2: drop unneeded in_softirq check in ipvlan_addr6_validator_event()

    Fixes: e9997c2938b2 ("ipvlan: fix check for IP addresses in control path")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Currently, if IPv6 is enabled on top of an ipvlan device in l3
    mode, the following warning message:

    Dropped {multi|broad}cast of type= [86dd]

    is emitted every time an RS is generated, and dmesg is soon
    filled with irrelevant messages. Replace pr_warn with pr_debug,
    to preserve debuggability, without scaring the sysadmin.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

28 Feb, 2018

1 commit

  • These pernet_operations unregister ipvlan net hooks.
    nf_unregister_net_hooks() removes hooks one-by-one,
    and then frees the memory via rcu. This looks similar
    to what happens when a new hook is added: allocation
    of a bigger memory region, copy of the old content, and rcu
    freeing of the old memory. So, all of the net code should be
    fine with this behavior. Also, at the time of hook
    unregistering, there are no packets, and foreign net
    pernet_operations are not interested in others' hooks.
    So, we mark them as async.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

22 Feb, 2018

1 commit

  • IPVlan has a hard dependency on IPv6. Refactor the ipvlan code to allow
    compiling it with IPv6 disabled, move duplicate code into addr_equal()
    and refactor a series of if-else into a switch.

    Signed-off-by: Matteo Croce
    Signed-off-by: David S. Miller

    Matteo Croce
     

16 Dec, 2017

2 commits

  • IPvlan currently scrubs packets at every location where packets may be
    crossing a namespace boundary. Though this is desirable, currently IPvlan
    does it more than necessary. For example, packets that are going to take
    the dev_forward_skb() path will get scrubbed there, so there is no point in
    scrubbing them before forwarding. Another side-effect of scrubbing is that
    pkt-type gets set to PACKET_HOST, which overrides what had already been set
    by the earlier path and causes erroneous delivery of the packets.

    Also, scrubbing packets just before calling dev_queue_xmit() has detrimental
    effects since packets lose skb->sk; because of that they miss prio updates,
    get incorrect socket back-pressure, and can even break TSQ.

    Fixes: b93dd49c1a35 ('ipvlan: Scrub skb before crossing the namespace boundary')
    Signed-off-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     
  • This reverts commit 92ff42645028fa6f9b8aa767718457b9264316b4.

    Even though the check added is not that taxing, it's not really needed.
    First of all this will be per packet cost and second thing is that the
    eth_type_trans() already does this correctly. The excessive scrubbing
    in IPvlan was changing the pkt-type skb metadata of the packet which
    made it necessary to re-check the mac. The subsequent patch in this
    series removes the faulty packet-scrub.

    Signed-off-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Mahesh Bandewar
     

12 Dec, 2017

1 commit

  • Packets that don't have dest mac as the mac of the master device should
    not be entertained by the IPvlan rx-handler. This is mostly true as the
    packet path mostly takes care of that, except when the master device is
    a virtual device. As demonstrated in the following case -

    ip netns add ns1
    ip link add ve1 type veth peer name ve2
    ip link add link ve2 name iv1 type ipvlan mode l2
    ip link set dev iv1 netns ns1
    ip link set ve1 up
    ip link set ve2 up
    ip -n ns1 link set iv1 up
    ip addr add 192.168.10.1/24 dev ve1
    ip -n ns1 addr add 192.168.10.2/24 dev iv1
    ping -c2 192.168.10.2

    ip neigh show dev ve1
    ip neigh show 192.168.10.2 lladdr dev ve1
    ping -c2 192.168.10.2

    This patch adds that missing check in the IPvlan rx-handler.

    Reported-by: Amit Sikka
    Signed-off-by: Mahesh Bandewar
    Signed-off-by: David S. Miller

    Mahesh Bandewar
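
The missing check can be illustrated with a short user-space sketch. It is not the driver's code: `rx_handler_accepts` is a hypothetical name, and letting multicast and broadcast frames through is an assumption made for this example; the commit text only says that frames not addressed to the master's MAC should be ignored.

```python
def is_multicast(mac: bytes) -> bool:
    """Group bit is the least significant bit of the first octet."""
    return bool(mac[0] & 0x01)


def rx_handler_accepts(dest_mac: bytes, master_mac: bytes) -> bool:
    """Decide whether the rx-handler should look at a frame at all.

    Unicast frames must be addressed to the master device's MAC.
    Multicast/broadcast frames are passed on (an assumption of this sketch).
    """
    if len(dest_mac) != 6 or len(master_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    return is_multicast(dest_mac) or dest_mac == master_mac
```

In the reproducer above, a frame sent to a made-up link-layer address would be rejected by this check even though the IP address matches a slave.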
     

07 Dec, 2017

1 commit


05 Dec, 2017

1 commit


03 Dec, 2017

2 commits


24 Nov, 2017

1 commit

  • In the function ipvlan_get_L3_hdr, the current code uses pskb_may_pull to
    make sure the skb header has enough linear room for the ipv6 header. But it
    then uses the memory beyond that directly, without a linear check, when the
    packet is icmp. So it may still access unexpected memory in
    ipvlan_addr_lookup.

    Now invoke pskb_may_pull again if it is ipv6 icmp.

    Signed-off-by: Gao Feng
    Signed-off-by: David S. Miller

    Gao Feng
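
The two-step length check can be sketched on a plain byte buffer. This is a user-space illustration only: buffer length stands in for "linear room" in an skb, `icmpv6_ns_target` is a hypothetical helper, and restricting it to Neighbor Solicitation messages is a choice made for the example.

```python
IPV6_HDR_LEN = 40
ICMPV6_HDR_LEN = 8        # type, code, checksum and 4 reserved bytes
IPV6_ADDR_LEN = 16
NEXTHDR_ICMPV6 = 58
ND_NEIGHBOR_SOLICIT = 135


def icmpv6_ns_target(packet: bytes):
    """Return the target address of a Neighbor Solicitation, or None.

    Each region is length-checked before it is read, mirroring the idea of
    validating again before touching bytes past the IPv6 header.
    """
    # First check: the fixed IPv6 header must be fully present.
    if len(packet) < IPV6_HDR_LEN:
        return None
    if packet[6] != NEXTHDR_ICMPV6:  # next-header field sits at offset 6
        return None
    # Second check: the ICMPv6 header lies beyond the bytes verified so far.
    if len(packet) < IPV6_HDR_LEN + ICMPV6_HDR_LEN:
        return None
    if packet[IPV6_HDR_LEN] != ND_NEIGHBOR_SOLICIT:
        return None
    # Third check: the 16-byte target address follows the ICMPv6 header.
    end = IPV6_HDR_LEN + ICMPV6_HDR_LEN + IPV6_ADDR_LEN
    if len(packet) < end:
        return None
    return packet[IPV6_HDR_LEN + ICMPV6_HDR_LEN:end]
```

A packet truncated right after the IPv6 header makes the function return `None` instead of reading past the verified region.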