06 Apr, 2016

1 commit

  • When creating an ip6tnl tunnel with ip tunnel, rtnl_link_ops is not set
    before ip6_tnl_create2 is called. When register_netdevice is called, there
    is no linkinfo attribute in the NEWLINK message because of that.

    Setting rtnl_link_ops before calling register_netdevice fixes that.

    Fixes: 0b112457229d ("ip6tnl: add support of link creation via rtnl")
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
     

09 Mar, 2016

1 commit


24 Feb, 2016

1 commit

  • IPCB may contain data from previous layers (in the observed case the
    qdisc layer). In the observed scenario, the data was misinterpreted as
    ip header options, which later caused the ihl to be set to an invalid
    value. This patch clears the IPCB opt field before dst_link_failure can
    be called for various types of tunnels. This change only applies to
    encapsulated ipv4 packets.

    The code introduced in 11c21a30 which clears all of IPCB has been removed
    to be consistent with these changes, and instead the opt field is cleared
    unconditionally in ip_tunnel_xmit. The change in ip_tunnel_xmit applies to
    SIT, GRE, and IPIP tunnels.

    The relevant vti, l2tp, and pptp functions already contain similar code for
    clearing the IPCB.

    Signed-off-by: Bernie Harris
    Signed-off-by: David S. Miller

    Bernie Harris
     

17 Feb, 2016

1 commit

  • This also fixes a potential race in the existing tunnel code, which
    could lead to the wrong dst being permanently cached:

    CPU1:                           CPU2:

    dst = ip6_route_output(...)
                                    dst_cache_reset() // no effect,
                                                      // the cache is empty
    dst_cache_set() // the wrong dst
                    // is permanently stored
                    // into the cache

    With the new dst implementation the above race is not possible,
    since the first cache lookup after dst_cache_reset will fail due
    to the timestamp check.
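
    The timestamp check can be sketched in user space as follows. This is
    a simplified model, not the kernel's dst_cache code: the struct layout,
    the function names and the explicit "now" argument are all assumptions
    made for illustration, and counter wraparound is ignored.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cached route; not a kernel type. */
struct route { int id; };

/* Hypothetical cache model: one entry plus two timestamps. */
struct cache {
    unsigned long reset_ts;   /* time of the most recent reset */
    unsigned long refresh_ts; /* time of the most recent failed lookup */
    struct route *entry;      /* cached route, or NULL */
};

/* Invalidate by recording the reset time; the entry itself is not touched. */
static void cache_reset(struct cache *c, unsigned long now)
{
    c->reset_ts = now;
}

/* Store a route. refresh_ts is deliberately left alone, so a route stored
 * after a reset is still treated as stale by the next lookup. */
static void cache_set(struct cache *c, struct route *r)
{
    c->entry = r;
}

/* Return the cached route only if a lookup has already failed (and so
 * refreshed the timestamp) after the last reset; otherwise drop the entry,
 * refresh the timestamp and report a miss. */
static struct route *cache_get(struct cache *c, unsigned long now)
{
    struct route *r = c->entry;

    if (r != NULL && c->refresh_ts > c->reset_ts)
        return r;

    c->entry = NULL;
    c->refresh_ts = now;
    return NULL;
}
```

    Replaying the race against this model: a miss at time 1, a reset at
    time 2 and a late cache_set() of the stale route make the lookup at
    time 3 fail, which forces a fresh route lookup instead of keeping the
    wrong dst.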

    Signed-off-by: Paolo Abeni
    Suggested-and-acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Paolo Abeni
     

19 Nov, 2015

1 commit

  • The commit cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
    introduced percpu storage for the ip6_tunnel dst cache, but while clearing
    that cache it used raw_cpu_ptr to walk the per-cpu entries, so cached
    dsts on CPUs other than the current one are not actually reset.

    This patch replaces raw_cpu_ptr with per_cpu_ptr, properly cleaning
    such storage.
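
    The difference can be modelled in user space with a plain array
    standing in for percpu storage. Everything below (the array, the
    current_cpu variable, the function names) is hypothetical; the real
    raw_cpu_ptr and per_cpu_ptr are kernel macros, and this sketch only
    mimics which slot each one would address inside the loop.

```c
#include <stddef.h>

#define SKETCH_NR_CPUS 4            /* hypothetical CPU count */

struct dst_slot { const void *dst; };

static struct dst_slot slots[SKETCH_NR_CPUS]; /* stands in for percpu storage */
static int current_cpu;                       /* CPU running the reset */

/* Populate every per-CPU slot with the same cached pointer. */
static void fill_all(const void *dst)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        slots[cpu].dst = dst;
}

/* Models the bug: each iteration resolves to the current CPU's slot,
 * so the other slots are never cleared. */
static void reset_buggy(void)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        slots[current_cpu].dst = NULL;
}

/* Models the fix: each iteration addresses the slot of the CPU being walked. */
static void reset_fixed(void)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        slots[cpu].dst = NULL;
}

/* Count the slots that still hold a cached pointer. */
static int count_cached(void)
{
    int n = 0;

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        if (slots[cpu].dst != NULL)
            n++;
    return n;
}
```

    After fill_all(), reset_buggy() leaves every slot except the current
    CPU's populated, while reset_fixed() clears them all.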

    Fixes: cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
    Signed-off-by: Paolo Abeni
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Paolo Abeni
     

25 Sep, 2015

1 commit

  • Currently error log messages in ip6_tnl_err are printed at 'warn'
    level. This is different to other tunnel types which don't print
    any messages. These log messages don't provide any information that
    couldn't be deduced with networking tools. Also it can be annoying
    to have one end of the tunnel go down and have the logs fill with
    pointless messages such as "Path to destination invalid or inactive!".

    This patch reduces the log level of these messages to 'dbg' level to
    bring the visible behaviour into line with other tunnel types.

    Signed-off-by: Matt Bennett
    Signed-off-by: David S. Miller

    Matt Bennett
     

16 Sep, 2015

3 commits

  • This patch uses a seqlock to ensure consistency between idst->dst and
    idst->cookie. It also makes dst freeing from the fib tree undergo an
    RCU grace period.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • Problems in the current dst_entry cache in ip6_tunnel:

    1. ip6_tnl_dst_set is racy. There is no lock to protect it:
       - One major problem is that the dst refcnt gets messed up. For
         example, the same dst_cache can be released multiple times,
         triggering the infamous "dst refcnt < 0" warning message.
       - Another issue is the inconsistency between dst_cache and
         dst_cookie.

       It can be reproduced by adding and removing the ip6gre tunnel
       while running a super_netperf TCP_CRR test.

    2. ip6_tnl_dst_get does not take the dst refcnt before returning
       the dst.

    This patch:
    1. Creates a percpu dst_entry cache in ip6_tnl
    2. Uses a spinlock to protect the dst_cache operations
    3. Makes ip6_tnl_dst_get always take the dst refcnt before returning

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • This is prep work to fix the dst_entry refcnt bugs in
    ip6_tunnel.

    This patch renames:
    1. ip6_tnl_dst_check() to ip6_tnl_dst_get() to better
    reflect that it will take a dst refcnt in the next patch.
    2. ip6_tnl_dst_store() to ip6_tnl_dst_set() to have a more
    conventional name matching ip6_tnl_dst_get().

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

01 Aug, 2015

2 commits


26 May, 2015

1 commit

  • Instead of doing the rt6->rt6i_node check whenever we need
    to get the route's cookie, refactor it into rt6_get_cookie().
    This is prep work to handle FLOWI_FLAG_KNOWN_NH and also
    percpu rt6_info later.

    Signed-off-by: Martin KaFai Lau
    Cc: Hannes Frederic Sowa
    Cc: Steffen Klassert
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

08 Apr, 2015

1 commit


03 Apr, 2015

1 commit


01 Apr, 2015

4 commits


21 Mar, 2015

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    net/core/sysctl_net_core.c
    net/ipv4/inet_diag.c

    The be_main.c conflict resolution was really tricky. The conflict
    hunks generated by GIT were very unhelpful, to say the least. It
    split functions in half and moved them around, when the actual
    conflict existed solely inside one function, that being
    be_map_pci_bars().

    So instead, to resolve this, I checked out be_main.c from the top
    of net-next, then I applied the be_main.c changes from 'net' since
    the last time I merged. And this worked beautifully.

    The inet_diag.c and sysctl_net_core.c conflicts were simple
    overlapping changes, and were easy to resolve.

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Mar, 2015

1 commit

  • After commit 2b0bb01b6edb, the kernel returns -ENOBUFS when a user tries
    to add an existing tunnel with the ioctl API:
    $ ip -6 tunnel add ip6tnl1 mode ip6ip6 dev eth1
    add tunnel "ip6tnl0" failed: No buffer space available

    This is confusing; the right error is EEXIST.

    This patch also changes some of the other error codes returned:
    - ENOBUFS -> ENOMEM
    - ENOENT -> ENODEV

    Fixes: 2b0bb01b6edb ("ip6_tunnel: Return an error when adding an existing tunnel.")
    CC: Steffen Klassert
    Reported-by: Pierre Cheynier
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

25 Feb, 2015

1 commit


20 Jan, 2015

1 commit


24 Nov, 2014

1 commit


07 Nov, 2014

3 commits


04 Nov, 2014

1 commit

  • ip6_tnl_dev_init() sets the dev->iflink via a call to
    ip6_tnl_link_config(). After that, register_netdevice()
    sets dev->iflink = -1. So we lose the iflink configuration
    for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
    ndo_init function. Then ip6_tnl_dev_init() is called after
    dev->iflink is set to -1 from register_netdevice().

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

31 Oct, 2014

1 commit

  • The fallback device is in ipv6 mode by default.
    The mode cannot be changed at runtime, so there
    is no way to decapsulate ip4in6 packets coming from
    various sources without creating a specific tunnel
    iface for each peer.

    This patch allows updating the fallback tunnel device, but only
    the mode can be changed. The usual command should work for the
    fallback device: `ip -6 tun change ip6tnl0 mode any`

    The fallback device can not be hidden from the packet receiver
    as a regular tunnel, but there is no need for synchronization
    as long as we do single assignment.

    Cc: David S. Miller
    Cc: Eric Dumazet
    Signed-off-by: Alexey Andriyanov
    Signed-off-by: David S. Miller

    Alexey Andriyanov
     

08 Oct, 2014

1 commit

  • Testing xmit_more support with netperf and connected UDP sockets,
    I found strange dst refcount false sharing.

    Current handling of IFF_XMIT_DST_RELEASE is not optimal.

    Dropping dst in validate_xmit_skb() is certainly too late in case
    the packet was queued by cpu X but dequeued by cpu Y.

    The logical point to take care of drop/force is in __dev_queue_xmit()
    before even taking qdisc lock.

    As Julian Anastasov pointed out, the need for skb_dst() might come from
    some packet schedulers or classifiers.

    This patch adds a new helper to cleanly express the needs of various
    drivers or qdiscs/classifiers.

    Drivers that need skb_dst() in their ndo_start_xmit() should call
    the following helper in their setup instead of the prior idiom:

    dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
    ->
    netif_keep_dst(dev);

    Instead of using a single bit, we use two bits, one being
    eventually rebuilt in bonding/team drivers.

    The other one is permanent and blocks IFF_XMIT_DST_RELEASE being
    rebuilt in bonding/team. Eventually, we could add something
    smarter later.
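
    One way to read the two-bit scheme is sketched below in user space.
    The flag values, struct and function names are hypothetical, and the
    rebuild rule shown (a master device only regains the release bit when
    every lower device still has both bits) is an assumption about how a
    bonding/team style driver would use the permanent bit, not a copy of
    the actual driver code.

```c
#include <stddef.h>

/* Hypothetical flag values; the real IFF_* bits differ. */
#define XMIT_DST_RELEASE      (1u << 0) /* may be rebuilt by a master device */
#define XMIT_DST_RELEASE_PERM (1u << 1) /* cleared once, never rebuilt */

struct dev { unsigned int priv_flags; };

/* New devices start with both bits set: dst may be dropped early. */
static void dev_init(struct dev *d)
{
    d->priv_flags = XMIT_DST_RELEASE | XMIT_DST_RELEASE_PERM;
}

/* What a driver needing skb_dst() in its xmit path would call in setup. */
static void keep_dst(struct dev *d)
{
    d->priv_flags &= ~(XMIT_DST_RELEASE | XMIT_DST_RELEASE_PERM);
}

/* Rebuild the master's release bit from its lower devices. */
static void master_recompute(struct dev *master,
                             const struct dev *lower, size_t n)
{
    const unsigned int both = XMIT_DST_RELEASE | XMIT_DST_RELEASE_PERM;
    unsigned int seen = both;

    for (size_t i = 0; i < n; i++)
        seen &= lower[i].priv_flags;

    master->priv_flags &= ~XMIT_DST_RELEASE;
    if (seen == both)
        master->priv_flags |= XMIT_DST_RELEASE;
}
```

    With two freshly initialised lower devices the master keeps the
    release bit; once keep_dst() has been called on either of them, a
    recompute clears it on the master.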

    Signed-off-by: Eric Dumazet
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Oct, 2014

1 commit


29 Sep, 2014

1 commit


25 Aug, 2014

1 commit

  • This patch makes no changes to the logic of the code but simply addresses
    coding style issues as detected by checkpatch.

    Both objdump and diff -w show no differences.

    A number of items are addressed in this patch:
    * Multiple spaces converted to tabs
    * Spaces before tabs removed.
    * Spaces in pointer typing cleansed (char *)foo etc.
    * Remove space after sizeof
    * Ensure spacing around comparators such as if statements.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

16 Jul, 2014

1 commit

  • Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
    all users to pass NET_NAME_UNKNOWN.

    Coccinelle patch:

    @@
    expression sizeof_priv, name, setup, txqs, rxqs, count;
    @@

    (
    -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
    +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
    |
    -alloc_netdev_mq(sizeof_priv, name, setup, count)
    +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
    |
    -alloc_netdev(sizeof_priv, name, setup)
    +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
    )

    v9: move comments here from the wrong commit

    Signed-off-by: Tom Gundersen
    Reviewed-by: David Herrmann
    Signed-off-by: David S. Miller

    Tom Gundersen
     

08 Jul, 2014

1 commit

  • Automatically generate flow labels for IPv6 packets on transmit.
    The flow label is computed based on skb_get_hash. The flow label will
    only automatically be set when it is zero otherwise (i.e. flow label
    manager hasn't set one). This supports the transmit side functionality
    of RFC 6438.

    Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
    system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
    functionality per socket.
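
    The selection rule described above can be sketched as a small
    user-space function. The function name, the host-byte-order values and
    the single "autolabel" switch (standing in for the combined sysctl and
    socket option state) are assumptions for illustration; only the 20-bit
    width of the IPv6 flow label is taken as given.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLOWLABEL_MASK 0x000FFFFFu /* the flow label is 20 bits wide */

/* Keep an explicitly set flow label; otherwise, if automatic labels are
 * enabled, derive one from the packet hash. Values are in host byte order
 * in this sketch. */
static uint32_t make_flowlabel(uint32_t flowlabel, uint32_t hash,
                               bool autolabel)
{
    if (flowlabel != 0 || !autolabel)
        return flowlabel;

    return hash & SKETCH_FLOWLABEL_MASK;
}
```

    A label already chosen by the flow label manager is passed through
    unchanged, and with automatic labels disabled a zero label stays zero.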

    By default, auto flowlabels are disabled to avoid possible conflicts
    with flow label manager, however if this feature proves useful we
    may want to enable it by default.

    It should also be noted that FreeBSD has already implemented automatic
    flow labels (including the sysctl and socket option). In FreeBSD,
    automatic flow labels default to enabled.

    Performance impact:

    Running super_netperf with 200 flows for TCP_RR and UDP_RR for
    IPv6. Note that in the UDP case, __skb_get_hash will be called for
    every packet, which explains the slight regression. In the TCP case
    the hash is saved in the socket so there is no regression.

    Automatic flow labels disabled:

    TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

    UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

    Automatic flow labels enabled:

    TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06 tps

    UDP_RR:
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06 tps

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

24 May, 2014

1 commit

  • Conflicts:
    drivers/net/bonding/bond_alb.c
    drivers/net/ethernet/altera/altera_msgdma.c
    drivers/net/ethernet/altera/altera_sgdma.c
    net/ipv6/xfrm6_output.c

    Several cases of overlapping changes.

    The xfrm6_output.c has a bug fix which overlaps the renaming
    of skb->local_df to skb->ignore_df.

    In the Altera TSE driver cases, the register access cleanups
    in net-next overlapped with bug fixes done in net.

    Similarly a bug fix to send ALB packets in the bonding driver using
    the right source address overlaps with cleanups in net-next.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 May, 2014

1 commit

  • Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.

    This is in line with how most other netdev kinds work, and will allow userspace
    to create tunnels without having CAP_SYS_MODULE.

    Signed-off-by: Tom Gundersen
    Signed-off-by: David S. Miller

    Tom Gundersen
     

13 May, 2014

1 commit

  • The function ip6_tnl_validate assumes that the rtnl
    attribute IFLA_IPTUN_PROTO is always filled. If this
    attribute is not filled by the userspace application,
    the kernel crashes with a NULL pointer dereference. This
    patch fixes the potential kernel crash when
    IFLA_IPTUN_PROTO is missing.
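
    The shape of such a guard can be sketched in user space. The attribute
    table, enum and function name below are hypothetical stand-ins for the
    netlink attribute array, and the set of accepted protocol values is an
    assumption; the point is only that a missing table or a missing entry
    is checked before anything is dereferenced.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute indices; not the real IFLA_IPTUN_* values. */
enum { SKETCH_ATTR_UNSPEC, SKETCH_ATTR_PROTO, SKETCH_ATTR_MAX };

#define SKETCH_PROTO_IPIP 4   /* IANA protocol number for IPv4 encapsulation */
#define SKETCH_PROTO_IPV6 41  /* IANA protocol number for IPv6 encapsulation */

/* Validate an optional protocol attribute. A NULL table or a NULL entry
 * means "not supplied" and is accepted without being dereferenced. */
static int validate_proto(const uint8_t *data[])
{
    uint8_t proto;

    if (data == NULL || data[SKETCH_ATTR_PROTO] == NULL)
        return 0;

    proto = *data[SKETCH_ATTR_PROTO];
    if (proto != SKETCH_PROTO_IPV6 && proto != SKETCH_PROTO_IPIP && proto != 0)
        return -EINVAL;

    return 0;
}
```

    Without the first check, a request that omits the attribute would
    reach the dereference with a NULL pointer.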

    Signed-off-by: Susant Sahani
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Susant Sahani
     

17 Apr, 2014

1 commit


15 Mar, 2014

1 commit

  • Replace the bh safe variant with the hard irq safe variant.

    We need a hard irq safe variant to deal with netpoll transmitting
    packets from hard irq context, and we need it in most if not all of
    the places using the bh safe variant.

    Except on 32bit uni-processor the code is exactly the same so don't
    bother with a bh variant, just have a hard irq safe variant that
    everyone can use.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

15 Feb, 2014

1 commit