26 Aug, 2019

1 commit

  • When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
    same for rt->rt_gw_family. Therefore, when rt->rt_gw_family is checked
    in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
    doesn't work anymore.

    This issue was found with LTP mpls03 tests.

    Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway")
    Signed-off-by: Alexey Kodanev
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Alexey Kodanev
     

10 Jun, 2019

1 commit

  • If you configure a route with multiple labels, e.g.
    ip route add 10.10.3.0/24 encap mpls 16/100 via 10.10.2.2 dev ens4
    A warning is logged:
    kernel: [ 130.561819] netlink: 'ip': attribute type 1 has an invalid
    length.

    This happens because mpls_iptunnel_policy has set the type of
    MPLS_IPTUNNEL_DST to fixed size NLA_U32.
    Change it to a minimum size.
    nla_get_labels() does the remaining validation.

    Fixes: e3e4712ec096 ("mpls: ip tunnel support")
    Signed-off-by: George Wilkie
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    George Wilkie
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Apr, 2019

1 commit

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

09 Apr, 2019

2 commits

  • Add support for an IPv6 gateway to rtable. Since a gateway is either
    IPv4 or IPv6, make it a union with rt_gw4 where rt_gw_family decides
    which address is in use.

    When dumping the route data, encode an ipv6 nexthop using RTA_VIA.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     
  • To allow the gateway to be either an IPv4 or IPv6 address, remove
    rt_uses_gateway from rtable and replace with rt_gw_family. If
    rt_gw_family is set it implies rt_uses_gateway. Rename rt_gateway
    to rt_gw4 to represent the IPv4 version.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     

20 Mar, 2019

1 commit

  • This patch adds support for 6PE (RFC 4798) which uses IPv4-mapped IPv6
    nexthop to connect IPv6 islands over IPv4 only MPLS network core.

    Prior to this fix, to find the link-layer destination mac address, 6PE
    enabled host/router was sending IPv6 ND requests for IPv4-mapped IPv6
    nexthop address over the interface facing the IPv4 only core which
    wouldn't success as the core is IPv6 free.

    This fix changes that behavior on 6PE host to treat the nexthop as IPv4
    address and send ARP requests whenever the next-hop address is an
    IPv4-mapped IPv6 address.

    Below topology illustrates the issue and how the patch addresses it.

    abcd::1.1.1.1 (lo) abcd::2.2.2.2 (lo)
    R0 (PE/host)------------------------R1--------------------------------R2 (PE/host)

    eth1 eth2 eth3 eth4
    172.18.0.10 172.18.0.11 172.19.0.11 172.19.0.12
    ffff::172.18.0.10 ffff::172.19.0.12

    R0 and R2 act as 6PE routers of IPv6 islands. R1 is IPv4 only with MPLS tunnels
    between R0,R1 and R1,R2.

    docker exec r0 ip -f inet6 route add abcd::2.2.2.2/128 nexthop encap mpls 100 via ::ffff:172.18.0.11 dev eth1
    docker exec r2 ip -f inet6 route add abcd::1.1.1.1/128 nexthop encap mpls 200 via ::ffff:172.19.0.11 dev eth4

    docker exec r1 ip -f mpls route add 100 via inet 172.19.0.12 dev eth3
    docker exec r1 ip -f mpls route add 200 via inet 172.18.0.10 dev eth2

    With the change, when R0 sends an IPv6 packet over MPLS tunnel to abcd::2.2.2.2,
    using ::ffff:172.18.0.11 as the nexthop, it does neighbor discovery for
    172.18.18.0.11.

    Signed-off-by: Vinay K Nallamothu
    Tested-by: Avinash Lingala
    Tested-by: Aravind Srinivas Srinivasa Prabhakar
    Signed-off-by: David S. Miller

    Vinay K Nallamothu
     

09 Feb, 2019

1 commit

  • One of the more common cases of allocation size calculations is finding
    the size of a structure that has a zero-sized array at the end, along
    with memory for some number of elements for that array. For example:

    struct foo {
    int stuff;
    struct boo entry[];
    };

    instance = alloc(sizeof(struct foo) + count * sizeof(struct boo));

    Instead of leaving these open-coded and prone to type mistakes, we can
    now use the new struct_size() helper:

    instance = alloc(struct_size(instance, entry, count));

    This code was detected with the help of Coccinelle.

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     

25 Jul, 2018

1 commit


30 May, 2017

2 commits


14 Apr, 2017

1 commit


02 Apr, 2017

1 commit

  • Alow users to push down more labels per MPLS encap. Similar to LSR case,
    move label array to the end of mpls_iptunnel_encap and allocate based on
    the number of labels for the route.

    For consistency with the LSR case, re-use the same maximum number of
    labels.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

14 Mar, 2017

1 commit

  • Allow TTL propagation from IP packets to MPLS packets to be
    configured. Add a new optional LWT attribute, MPLS_IPTUNNEL_TTL, which
    allows the TTL to be set in the resulting MPLS packet, with the value
    of 0 having the semantics of enabling propagation of the TTL from the
    IP header (i.e. non-zero values disable propagation).

    Also allow the configuration to be overridden globally by reusing the
    same sysctl to control whether the TTL is propagated from IP packets
    into the MPLS header. If the per-LWT attribute is set then it
    overrides the global configuration. If the TTL isn't propagated then a
    default TTL value is used which can be configured via a new sysctl,
    "net.mpls.default_ttl". This is kept separate from the configuration
    of whether IP TTL propagation is enabled as it can be used in the
    future when non-IP payloads are supported (i.e. where there is no
    payload TTL that can be propagated).

    Signed-off-by: Robert Shearman
    Acked-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Robert Shearman
     

31 Jan, 2017

1 commit


28 Jan, 2017

1 commit


25 Jan, 2017

1 commit


18 Jan, 2017

1 commit

  • Having MPLS packet stats is useful for observing network operation and
    for diagnosing network problems. In the absence of anything better,
    RFC2863 and RFC3813 are used for guidance for which stats to expose
    and the semantics of them. In particular rx_noroutes maps to in
    unknown protos in RFC2863. The stats are exposed to userspace via
    AF_MPLS attributes embedded in the IFLA_STATS_AF_SPEC attribute of
    RTM_GETSTATS messages.

    All the introduced fields are 64-bit, even error ones, to ensure no
    overflow with long uptimes. Per-CPU counters are used to avoid
    cache-line contention on the commonly used fields. The other fields
    have also been made per-CPU for code to avoid performance problems in
    error conditions on the assumption that on some platforms the cost of
    atomic operations could be more expensive than sending the packet
    (which is what would be done in the success case). If that's not the
    case, we could instead not use per-CPU counters for these fields.

    Only unicast and non-fragment are exposed at the moment, but other
    counters can be exposed in the future either by adding to the end of
    struct mpls_link_stats or by additional netlink attributes in the
    AF_MPLS IFLA_STATS_AF_SPEC nested attribute.

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     

24 Oct, 2016

1 commit


31 Aug, 2016

2 commits

  • As reported by Lennert the MPLS GSO code is failing to properly segment
    large packets. There are a couple of problems:

    1. the inner protocol is not set so the gso segment functions for inner
    protocol layers are not getting run, and

    2 MPLS labels for packets that use the "native" (non-OVS) MPLS code
    are not properly accounted for in mpls_gso_segment.

    The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
    to call the gso segment functions for the higher layer protocols. That
    means skb_mac_gso_segment is called twice -- once with the network
    protocol set to MPLS and again with the network protocol set to the
    inner protocol.

    This patch sets the inner skb protocol addressing item 1 above and sets
    the network_header and inner_network_header to mark where the MPLS labels
    start and end. The MPLS code in OVS is also updated to set the two
    network markers.

    >From there the MPLS GSO code uses the difference between the network
    header and the inner network header to know the size of the MPLS header
    that was pushed. It then pulls the MPLS header, resets the mac_len and
    protocol for the inner protocol and then calls skb_mac_gso_segment
    to segment the skb.

    Afterward the inner protocol segmentation is done the skb protocol
    is set to mpls for each segment and the network and mac headers
    restored.

    Reported-by: Lennert Buytenhek
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Today mpls iptunnel lwtunnel_output redirect expects the tunnel
    output function to handle fragmentation. This is ok but can be
    avoided if we did not do the mpls output redirect too early.
    ie we could wait until ip fragmentation is done and then call
    mpls output for each ip fragment.

    To make this work we will need,
    1) the lwtunnel state to carry encap headroom
    2) and do the redirect to the encap output handler on the ip fragment
    (essentially do the output redirect after fragmentation)

    This patch adds tunnel headroom in lwtstate to make sure we
    account for tunnel data in mtu calculations during fragmentation
    and adds new xmit redirect handler to redirect to lwtunnel xmit func
    after ip fragmentation.

    This includes IPV6 and some mtu fixes and testing from David Ahern.

    Signed-off-by: Roopa Prabhu
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

22 Feb, 2016

1 commit


18 Dec, 2015

1 commit


12 Dec, 2015

1 commit


08 Dec, 2015

1 commit

  • Locally generated IPv4 and (probably) IPv6 packets are dropped because
    skb->protocol isn't set. We could write wrappers to lwtunnel_output
    for IPv4 and IPv6 that set the protocol accordingly and then call
    lwtunnel_output, but mpls_output relies on the AF-specific type of dst
    anyway to get the via address.

    Therefore, make use of dst->dst_ops->family in mpls_output to
    determine the type of nexthop and thus protocol of the packet instead
    of checking skb->protocol.

    Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
    Reported-by: Sam Russell
    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     

08 Oct, 2015

1 commit


25 Aug, 2015

1 commit

  • Add cfg and family arguments to lwt build state functions. cfg is a void
    pointer and will either be a pointer to a fib_config or fib6_config
    structure. The family parameter indicates which one (either AF_INET
    or AF_INET6).

    LWT encpasulation implementation may use the fib configuration to build
    the LWT state.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

21 Aug, 2015

1 commit

  • Currently, the lwtunnel state resides in per-protocol data. This is
    a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa).
    The xmit function of the tunnel does not know whether the packet has been
    routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the
    lwtstate data to dst_entry makes such inter-protocol tunneling possible.

    As a bonus, this brings a nice diffstat.

    Signed-off-by: Jiri Benc
    Acked-by: Roopa Prabhu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
     

23 Jul, 2015

1 commit


22 Jul, 2015

1 commit

  • This implementation uses lwtunnel infrastructure to register
    hooks for mpls tunnel encaps.

    It picks cues from iptunnel_encaps infrastructure and previous
    mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman

    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu