02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

30 May, 2017

3 commits


02 Apr, 2017

5 commits

  • Allow users to push down more labels per MPLS encap. Similar to LSR case,
    move label array to the end of mpls_iptunnel_encap and allocate based on
    the number of labels for the route.

    For consistency with the LSR case, re-use the same maximum number of
    labels.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Allow users to push down more labels per MPLS route. With the previous
    patches, no memory allocations are based on MAX_NEW_LABELS; the limit
    is only used to keep userspace in check.

    At this point MAX_NEW_LABELS is only used for mpls_route_config (copying
    route data from userspace) and processing nexthops looking for the max
    number of labels across the route spec.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Move labels to the end of mpls_nh as a 0-sized array and within mpls_route
    move the via for a nexthop after the mpls_nh. The new layout becomes:

    +----------------------+
    | mpls_route           |
    +----------------------+
    | mpls_nh 0            |
    +----------------------+
    | alignment padding    |   4 bytes for odd number of labels; 0 for even
    +----------------------+
    | via[rt_max_alen] 0   |
    +----------------------+
    | alignment padding    |   via's aligned on sizeof(unsigned long)
    +----------------------+
    | ...                  |
    +----------------------+
    | mpls_nh n-1          |
    +----------------------+
    | via[rt_max_alen] n-1 |
    +----------------------+

    Memory allocated for nexthop + via is constant across all nexthops and
    their via. It is based on the maximum number of labels across all nexthops
    and the maximum via length. The size is saved in the mpls_route as
    rt_nh_size. Accessing a nexthop becomes rt->rt_nh + index * rt->rt_nh_size.

    The offset of the via address from a nexthop is saved as rt_via_offset
    so that given an mpls_nh pointer the via for that hop is simply
    nh + rt->rt_via_offset.
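The slot arithmetic described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `nh_sketch`, `route_sketch`, `align_up`, `route_sketch_alloc`, `route_nh` and `nh_via` are hypothetical names, and only the fields `rt_nhn`, `rt_max_alen`, `rt_nh_size` and `rt_via_offset` are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for mpls_nh and mpls_route. */
struct nh_sketch {
    uint8_t  nh_labels;    /* number of labels in use */
    uint32_t nh_label[];   /* labels sit at the end of the nexthop */
};

struct route_sketch {
    uint8_t       rt_nhn;         /* number of nexthops */
    uint8_t       rt_max_alen;    /* longest via address, in bytes */
    uint16_t      rt_nh_size;     /* bytes per nexthop + via slot */
    uint16_t      rt_via_offset;  /* bytes from a nexthop to its via */
    unsigned long rt_nh[];        /* first slot; forces long alignment */
};

/* Round n up to a multiple of a. */
static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/* Allocate a zeroed route whose slots all share one size. */
static struct route_sketch *route_sketch_alloc(uint8_t nhn, uint8_t max_labels,
                                               uint8_t max_alen)
{
    size_t via_off = align_up(sizeof(struct nh_sketch) +
                              max_labels * sizeof(uint32_t),
                              sizeof(unsigned long));
    size_t nh_size = align_up(via_off + max_alen, sizeof(unsigned long));
    struct route_sketch *rt = calloc(1, sizeof(*rt) + nhn * nh_size);

    if (!rt)
        return NULL;
    rt->rt_nhn = nhn;
    rt->rt_max_alen = max_alen;
    rt->rt_nh_size = (uint16_t)nh_size;
    rt->rt_via_offset = (uint16_t)via_off;
    return rt;
}

/* Nexthop i lives at rt_nh + i * rt_nh_size. */
static struct nh_sketch *route_nh(struct route_sketch *rt, unsigned int idx)
{
    assert(idx < rt->rt_nhn);
    return (struct nh_sketch *)((uint8_t *)rt->rt_nh +
                                (size_t)idx * rt->rt_nh_size);
}

/* The via for a nexthop lives at nh + rt_via_offset. */
static uint8_t *nh_via(const struct route_sketch *rt, struct nh_sketch *nh)
{
    return (uint8_t *)nh + rt->rt_via_offset;
}
```

Because every slot has the same size, both lookups are constant-time pointer arithmetic, at the cost of sizing each slot for the largest label count and via length in the route.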

    With prior code, memory allocated per mpls_route with 1 nexthop:
    via is an ethernet address - 64 bytes
    via is an ipv4 address - 64
    via is an ipv6 address - 72

    With this patch set, memory allocated per mpls_route with 1 nexthop and
    1 or 2 labels:
    via is an ethernet address - 56 bytes
    via is an ipv4 address - 56
    via is an ipv6 address - 64

    The 8-byte reduction is due to the previous patch; the change introduced
    by this patch has no impact on the size of allocations for 1 or 2 labels.

    Performance impact of this change was examined using network namespaces
    with veth pairs connecting namespaces. ns0 inserts the packet to the
    label-switched path using an lwt route with encap mpls. ns1 adds 1 or 2
    labels depending on test, ns2 (and ns3 for 2-label test) pops the label
    and forwards. ns3 (or ns4) for a 2-label is the destination. Similar
    series of namespaces used for 2-nexthop test.

    Intent is to measure changes to latency (overhead in manipulating the
    packet) in the forwarding path. Tests used netperf with UDP_RR.

    IPv4:                  current   patches
    1 label, 1 nexthop       29908     30115
    2 label, 1 nexthop       29071     29612
    1 label, 2 nexthop       29582     29776
    2 label, 2 nexthop       29086     29149

    IPv6:                  current   patches
    1 label, 1 nexthop       24502     24960
    2 label, 1 nexthop       24041     24407
    1 label, 2 nexthop       23795     23899
    2 label, 2 nexthop       23074     22959

    In short, the change ranges from no effect to a modest increase in performance.
    This is expected since this patch does not really have an impact on routes
    with 1 or 2 labels (the current limit) and 1 or 2 nexthops.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Number of nexthops and number of alive nexthops are tracked using an
    unsigned int. A route should never have more than 255 nexthops so
    convert both to u8. Update all references and intermediate variables
    to consistently use u8 as well.

    Shrinks the size of mpls_route from 32 bytes to 24 bytes with a 2-byte
    hole before the nexthops.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • The number of alive nexthops for a route (rt->rt_nhn_alive) and the
    flags for a next hop (nh->nh_flags) are modified by netdev event
    handlers. The event handlers run with rtnl_lock held so updates are
    always done with the lock held. The packet path accesses the fields
    under the rcu lock. Since those fields can change at any moment in
    the packet path, both fields should be accessed using READ_ONCE. Updates
    to both fields should use WRITE_ONCE.

    Update mpls_select_multipath (packet path) and mpls_ifdown and mpls_ifup
    (event handlers) accordingly.
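A rough userspace sketch of the access pattern follows. The two macros only approximate the kernel's READ_ONCE and WRITE_ONCE with a volatile cast (and rely on the GCC/Clang `__typeof__` extension); `rt_sketch`, `sketch_ifdown` and `sketch_pick` are hypothetical names, and no locking or RCU is modelled.

```c
#include <stdint.h>

/* Approximations of the kernel macros: force exactly one access. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct rt_sketch {
    uint8_t rt_nhn;        /* total nexthops */
    uint8_t rt_nhn_alive;  /* nexthops currently usable */
};

/* Event-handler side: writers are serialised (rtnl_lock in the kernel),
 * so a plain read is fine, but the store must be a single write. */
static void sketch_ifdown(struct rt_sketch *rt)
{
    uint8_t alive = rt->rt_nhn_alive;

    if (alive > 0)
        WRITE_ONCE_SKETCH(rt->rt_nhn_alive, (uint8_t)(alive - 1));
}

/* Packet-path side: take one snapshot and use only the snapshot, so the
 * zero check and the modulo cannot see two different values. */
static int sketch_pick(const struct rt_sketch *rt, uint32_t hash)
{
    uint8_t alive = READ_ONCE_SKETCH(rt->rt_nhn_alive);

    if (alive == 0)
        return -1;
    return (int)(hash % alive);
}
```

The point of the snapshot is that a concurrent `sketch_ifdown` between the check and the modulo cannot turn the divisor into zero.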

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

14 Mar, 2017

1 commit

  • Provide the ability to control on a per-route basis whether the TTL
    value from an MPLS packet is propagated to an IPv4/IPv6 packet when
    the last label is popped as per the theoretical model in RFC 3443
    through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to
    mean disable propagation and 1 to mean enable propagation.

    In order to provide the ability to change the behaviour for packets
    arriving with IPv4/IPv6 Explicit Null labels and to provide an easy
    way for a user to change the behaviour for all existing routes without
    having to reprogram them, a global knob is provided. This is done
    through the addition of a new per-namespace sysctl,
    "net.mpls.ip_ttl_propagate", which defaults to enabled. If the
    per-route attribute is set (either enabled or disabled) then it
    overrides the global configuration.
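The override rule can be sketched as a three-state per-route setting that falls back to the global knob. The enum and function names below are hypothetical, and the TTL handling in `egress_ip_ttl` is an assumption about one reasonable behaviour when the last label is popped, not a statement of what the kernel does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-route setting: unset, or explicitly on/off. */
enum ttl_prop_sketch {
    TTL_PROP_DEFAULT,   /* attribute absent: follow the sysctl */
    TTL_PROP_ENABLED,   /* RTA_TTL_PROPAGATE = 1 */
    TTL_PROP_DISABLED   /* RTA_TTL_PROPAGATE = 0 */
};

/* A per-route setting, when present, overrides the global knob. */
static bool ttl_should_propagate(enum ttl_prop_sketch route_setting,
                                 bool global_ip_ttl_propagate)
{
    if (route_setting == TTL_PROP_DEFAULT)
        return global_ip_ttl_propagate;
    return route_setting == TTL_PROP_ENABLED;
}

/* Assumed egress behaviour: with propagation the already-decremented
 * MPLS TTL is copied into the IP header; without it the IP TTL is
 * decremented on its own and never wraps below zero. */
static uint8_t egress_ip_ttl(uint8_t mpls_ttl_after_dec, uint8_t ip_ttl,
                             bool propagate)
{
    if (propagate)
        return mpls_ttl_after_dec;
    return ip_ttl ? (uint8_t)(ip_ttl - 1) : 0;
}
```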

    Signed-off-by: Robert Shearman
    Acked-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Robert Shearman
     

21 Feb, 2017

1 commit

  • Add netconf support to MPLS. Allows userspace to learn and be notified
    of changes to 'input' enable setting per interface.

    Acked-by: Nicolas Dichtel
    Signed-off-by: David Ahern
    Acked-by: Robert Shearman
    Signed-off-by: David S. Miller

    David Ahern
     

18 Jan, 2017

1 commit

  • Having MPLS packet stats is useful for observing network operation and
    for diagnosing network problems. In the absence of anything better,
    RFC2863 and RFC3813 are used for guidance for which stats to expose
    and the semantics of them. In particular rx_noroutes maps to in
    unknown protos in RFC2863. The stats are exposed to userspace via
    AF_MPLS attributes embedded in the IFLA_STATS_AF_SPEC attribute of
    RTM_GETSTATS messages.

    All the introduced fields are 64-bit, even error ones, to ensure no
    overflow with long uptimes. Per-CPU counters are used to avoid
    cache-line contention on the commonly used fields. The other fields
    have also been made per-CPU for code to avoid performance problems in
    error conditions on the assumption that on some platforms the cost of
    atomic operations could be more expensive than sending the packet
    (which is what would be done in the success case). If that's not the
    case, we could instead not use per-CPU counters for these fields.

    Only unicast and non-fragment are exposed at the moment, but other
    counters can be exposed in the future either by adding to the end of
    struct mpls_link_stats or by additional netlink attributes in the
    AF_MPLS IFLA_STATS_AF_SPEC nested attribute.
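The per-CPU counter idea can be illustrated with a plain array of per-CPU slots that are summed on read. This is a simplified sketch: the struct and function names, the fixed CPU count and the subset of fields are all assumptions, and the kernel's per-CPU allocation and 64-bit counter synchronisation are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4  /* assumed fixed CPU count for the sketch */

/* Hypothetical subset of link stats; every field is 64-bit. */
struct link_stats_sketch {
    uint64_t rx_packets;
    uint64_t rx_bytes;
    uint64_t rx_errors;
    uint64_t rx_noroute;
};

struct pcpu_stats_sketch {
    struct link_stats_sketch cpu[SKETCH_NR_CPUS];
};

/* Fast path: each CPU only ever touches its own slot. */
static void stats_rx(struct pcpu_stats_sketch *s, unsigned int cpu,
                     uint64_t len)
{
    assert(cpu < SKETCH_NR_CPUS);
    s->cpu[cpu].rx_packets++;
    s->cpu[cpu].rx_bytes += len;
}

/* Error path uses the same per-CPU scheme, so no atomics are needed. */
static void stats_rx_noroute(struct pcpu_stats_sketch *s, unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    s->cpu[cpu].rx_noroute++;
}

/* Slow path (e.g. answering a stats request): sum over all CPUs. */
static struct link_stats_sketch stats_sum(const struct pcpu_stats_sketch *s)
{
    struct link_stats_sketch total = { 0, 0, 0, 0 };
    unsigned int i;

    for (i = 0; i < SKETCH_NR_CPUS; i++) {
        total.rx_packets += s->cpu[i].rx_packets;
        total.rx_bytes   += s->cpu[i].rx_bytes;
        total.rx_errors  += s->cpu[i].rx_errors;
        total.rx_noroute += s->cpu[i].rx_noroute;
    }
    return total;
}
```

The trade-off is the one the message describes: updates stay cheap and contention-free, while reads pay for a walk over every CPU's slot.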

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     

03 Oct, 2016

1 commit


04 Dec, 2015

1 commit

  • Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
    routes due to link events. Also adds code to ignore dead
    routes during route selection.

    Unlike ip routes, mpls routes are not deleted when the route goes
    dead. This is current mpls behaviour and this patch does not change
    that. With this patch however, routes will be marked dead.
    Dead routes are not notified to userspace (this is consistent with ipv4
    routes).
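Skipping unusable nexthops during selection might look like the sketch below. It is an illustration under assumptions: the flag constants are local stand-ins for RTNH_F_DEAD and RTNH_F_LINKDOWN, treating linkdown nexthops as unusable is an assumption, and `nh_sel_sketch` and `select_alive_nh` are hypothetical names.

```c
#include <stdint.h>

/* Local stand-ins for the RTNH_F_DEAD / RTNH_F_LINKDOWN flags. */
#define NH_F_DEAD_SKETCH      1u
#define NH_F_LINKDOWN_SKETCH 16u

struct nh_sel_sketch {
    unsigned int flags;
};

static int nh_usable(const struct nh_sel_sketch *nh)
{
    return !(nh->flags & (NH_F_DEAD_SKETCH | NH_F_LINKDOWN_SKETCH));
}

/* Map a flow hash onto the usable nexthops only.
 * Returns the array index of the chosen nexthop, or -1 if none is usable. */
static int select_alive_nh(const struct nh_sel_sketch *nhs, unsigned int nhn,
                           uint32_t hash)
{
    unsigned int alive = 0, i, target;

    for (i = 0; i < nhn; i++)
        if (nh_usable(&nhs[i]))
            alive++;
    if (alive == 0)
        return -1;

    target = hash % alive;  /* index among usable nexthops */
    for (i = 0; i < nhn; i++) {
        if (!nh_usable(&nhs[i]))
            continue;
        if (target == 0)
            return (int)i;
        target--;
    }
    return -1;  /* not reached when alive > 0 */
}
```

The route entry itself stays in the table throughout; only the choice of nexthop changes as flags are set and cleared.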

    dead routes:
    -----------
    $ip -f mpls route show
    100
    nexthop as to 200 via inet 10.1.1.2 dev swp1
    nexthop as to 700 via inet 10.1.1.6 dev swp2

    $ip link set dev swp1 down

    $ip link show dev swp1
    4: swp1: mtu 1500 qdisc pfifo_fast state DOWN mode
    DEFAULT group default qlen 1000
    link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

    $ip -f mpls route show
    100
    nexthop as to 200 via inet 10.1.1.2 dev swp1 dead linkdown
    nexthop as to 700 via inet 10.1.1.6 dev swp2

    linkdown routes:
    ----------------
    $ip -f mpls route show
    100
    nexthop as to 200 via inet 10.1.1.2 dev swp1
    nexthop as to 700 via inet 10.1.1.6 dev swp2

    $ip link show dev swp1
    4: swp1: mtu 1500 qdisc pfifo_fast
    state UP mode DEFAULT group default qlen 1000
    link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

    /* carrier goes down */
    $ip link show dev swp1
    4: swp1: mtu 1500 qdisc pfifo_fast
    state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

    $ip -f mpls route show
    100
    nexthop as to 200 via inet 10.1.1.2 dev swp1 linkdown
    nexthop as to 700 via inet 10.1.1.6 dev swp2

    Signed-off-by: Roopa Prabhu
    Acked-by: Robert Shearman
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

28 Oct, 2015

1 commit

  • Nexthops for MPLS routes have a via address field sized for the
    largest via address that is expected, which is 32 bytes. This means
    that in the most common case of having ipv4 via addresses, 28 bytes of
    memory more than required are used per nexthop. In the other common
    case of an ipv6 nexthop then 16 bytes more than required are
    used. With large numbers of MPLS routes this extra memory usage could
    start to become significant.

    To avoid allocating memory for a maximum length via address when not
    all of it is required and to allow for ease of iterating over
    nexthops, then the via addresses are changed to be stored in the same
    memory block as the route and nexthops, but in an array after the end
    of the array of nexthops. New accessors are provided to retrieve a
    pointer to the via address.

    To allow for O(1) access without having to store a pointer or offset
    per nh, the via address for each nexthop is sized according to the
    maximum via address for any nexthop in the route, which is stored in a
    new route field, rt_max_alen, but this is in an existing hole in
    struct mpls_route so it doesn't increase the size of the
    structure. Each via address is ensured to be aligned to VIA_ALEN_ALIGN
    to account for architectures that don't allow unaligned accesses.
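The offset arithmetic for this layout (an array of nexthops followed by an array of equally sized via slots) can be sketched as follows. The helper names are hypothetical, the alignment is assumed to be sizeof(unsigned long), and the 32-byte constant is the fixed via size mentioned at the start of the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIA_ALIGN_SKETCH    sizeof(unsigned long)  /* assumed alignment */
#define MAX_VIA_ALEN_SKETCH 32                     /* old fixed via size */

/* Round n up to the via alignment. */
static size_t via_align_up(size_t n)
{
    return (n + VIA_ALIGN_SKETCH - 1) / VIA_ALIGN_SKETCH * VIA_ALIGN_SKETCH;
}

/* Every via slot in a route has the same size, derived from the
 * longest via address in that route. */
static size_t via_slot_size(uint8_t rt_max_alen)
{
    return via_align_up(rt_max_alen);
}

/* Byte offset of the via for nexthop idx, measured from the start of
 * the nexthop array. O(1): no per-nexthop pointer or offset is stored. */
static size_t via_offset(size_t nh_array_bytes, unsigned int idx,
                         uint8_t rt_max_alen)
{
    return via_align_up(nh_array_bytes) +
           (size_t)idx * via_slot_size(rt_max_alen);
}

/* Bytes saved for a route compared with fixed 32-byte via fields. */
static size_t via_bytes_saved(unsigned int nhn, uint8_t rt_max_alen)
{
    assert(rt_max_alen <= MAX_VIA_ALEN_SKETCH);
    return nhn * (MAX_VIA_ALEN_SKETCH - via_slot_size(rt_max_alen));
}
```

Note that with alignment applied, the saving per nexthop for a short via address can be slightly less than the unaligned difference quoted in the message.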

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     

23 Oct, 2015

1 commit

  • This patch adds support for MPLS multipath routes.

    Includes following changes to support multipath:
    - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

    - 'struct mpls_nh' represents an mpls nexthop label forwarding entry

    - moves mpls route and nexthop structures into internal.h

    - A mpls_route can point to multiple mpls_nh structs

    - the nexthops are maintained as an array (similar to ipv4 fib)

    - In the process of restructuring, this patch also consistently changes
    all labels to u8

    - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
    multipath routes similar to ipv4/v6 fib

    - In this patch, the multipath route nexthop selection algorithm
    simply returns the first nexthop. It is replaced by a
    hash based algorithm from Robert Shearman in the next patch

    - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
    mpls_route_update though implemented to update based on dev, it was
    never used that way. And the dev handling gets tricky with multiple
    nexthops. Cannot match against any single nexthops dev. So, this patch
    removes the unused 'dev' handling in mpls_route_update.

    - dead route/path handling will be implemented in a subsequent patch

    Example:

    $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
    nexthop as 700 via inet 10.1.1.6 dev swp2 \
    nexthop as 800 via inet 40.1.1.2 dev swp3

    $ip -f mpls route show
    100
    nexthop as to 200 via inet 10.1.1.2 dev swp1
    nexthop as to 700 via inet 10.1.1.6 dev swp2
    nexthop as to 800 via inet 40.1.1.2 dev swp3

    Signed-off-by: Roopa Prabhu
    Acked-by: Robert Shearman
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

22 Jul, 2015

1 commit


08 Jun, 2015

1 commit

  • The mpls device is used in an RCU read context without a lock being
    held. As the memory is freed without waiting for the RCU grace period
    to elapse, the freed memory could still be in use.

    Address this by using kfree_rcu to free the memory for the mpls device
    after the RCU grace period has elapsed.

    Fixes: 03c57747a702 ("mpls: Per-device MPLS state")
    Signed-off-by: Robert Shearman
    Acked-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Robert Shearman
     

06 May, 2015

1 commit


23 Apr, 2015

2 commits

  • An MPLS network is a single trust domain where the edges must be in
    control of what labels make their way into the core. The simplest way
    of ensuring this is for the edge device to always impose the labels,
    and not allow forward labeled traffic from untrusted neighbours. This
    is achieved by allowing a per-device configuration of whether MPLS
    traffic input from that interface should be processed or not.

    To be secure by default, the default state is changed to MPLS being
    disabled on all interfaces unless explicitly enabled and no global
    option is provided to change the default. Whilst this differs from
    other protocols (e.g. IPv6), network operators are used to explicitly
    enabling MPLS forwarding on interfaces, and with the number of links
    to the MPLS core typically fairly low this doesn't present too much of
    a burden on operators.
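The resulting input check can be sketched as below. The struct and function names are hypothetical; the sketch only captures the two conditions described in this series: the interface must have per-device MPLS state, and input must have been explicitly enabled on it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device MPLS state; input is disabled by default. */
struct mpls_dev_sketch {
    bool input_enabled;
};

struct netdev_sketch {
    struct mpls_dev_sketch *mpls;  /* NULL on unsupported interfaces */
};

/* Labeled traffic is processed only on interfaces that have MPLS state
 * and where the operator has explicitly enabled input. */
static bool mpls_input_allowed(const struct netdev_sketch *dev)
{
    return dev->mpls != NULL && dev->mpls->input_enabled;
}
```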

    Cc: "Eric W. Biederman"
    Signed-off-by: Robert Shearman
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • Add per-device MPLS state to supported interfaces. Use the presence of
    this state in mpls_route_add to determine that this is a supported
    interface.

    Use the presence of mpls_dev to drop packets that arrived on an
    unsupported interface - previously they were allowed through.

    Cc: "Eric W. Biederman"
    Signed-off-by: Robert Shearman
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Robert Shearman
     

04 Mar, 2015

2 commits

  • Reading and writing addresses in network byte order in netlink is
    traditional and I see no reason to change that. MPLS is interesting
    as effectively it has variable length addresses (the MPLS label
    stack). To represent these variable length addresses in netlink
    I use a valid MPLS label stack (complete with stop bit).

    This achieves two things: a well defined existing format is used,
    and the data can be interpreted without looking at its length.

    Not needing to look at the length to decode the variable length
    network representation allows existing userspace functions
    such as inet_ntop to be used without needing to change their
    prototype.
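A small sketch of why the stop bit makes the length unnecessary: each label is written as a 32-bit big-endian label stack entry, and the bottom-of-stack bit on the last entry tells the decoder where to stop. The function names are hypothetical, and the traffic class and TTL fields are simply left at zero here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LS_LABEL_SHIFT_SKETCH 12      /* label occupies the top 20 bits */
#define LS_STOP_BIT_SKETCH    0x100u  /* bottom-of-stack bit */

/* Write n labels as a label stack into out (4 * n bytes, big endian).
 * Returns the number of bytes written. */
static size_t label_stack_encode(const uint32_t *labels, size_t n,
                                 uint8_t *out)
{
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        uint32_t entry;

        assert(labels[i] < (1u << 20));
        entry = labels[i] << LS_LABEL_SHIFT_SKETCH;
        if (i == n - 1)
            entry |= LS_STOP_BIT_SKETCH;  /* mark the last entry */
        out[4 * i + 0] = (uint8_t)(entry >> 24);
        out[4 * i + 1] = (uint8_t)(entry >> 16);
        out[4 * i + 2] = (uint8_t)(entry >> 8);
        out[4 * i + 3] = (uint8_t)entry;
    }
    return 4 * n;
}

/* Read labels until the stop bit is seen; no length argument is needed
 * for the input. Returns the label count, or 0 if more than max entries
 * would be required. */
static size_t label_stack_decode(const uint8_t *in, uint32_t *labels,
                                 size_t max)
{
    size_t i;

    for (i = 0; i < max; i++) {
        uint32_t entry = ((uint32_t)in[4 * i + 0] << 24) |
                         ((uint32_t)in[4 * i + 1] << 16) |
                         ((uint32_t)in[4 * i + 2] << 8) |
                         (uint32_t)in[4 * i + 3];

        labels[i] = entry >> LS_LABEL_SHIFT_SKETCH;
        if (entry & LS_STOP_BIT_SKETCH)
            return i + 1;
    }
    return 0;
}
```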

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This change adds a new Kconfig option MPLS_ROUTING.

    The core of this change is the code to look at an mpls packet received
    from another machine. Look that packet up in a routing table and
    forward the packet on.

    Support of MPLS over ATM is not considered or attempted here. This
    implementation follows RFC3032 and implements the MPLS shim header that
    can pass over essentially any network.

    What RFC3031 refers to as the Incoming Label Map (ILM) I call
    net->mpls.platform_label[]. What RFC3031 refers to as the Next Hop
    Label Forwarding Entry (NHLFE) I call mpls_route. Though calling it the
    label forwarding information base (lfib) might also be valid.

    Further the implementation forwards packets as described in RFC3032.
    There is no need and given the original motivation for MPLS a strong
    disincentive to have a flexible label forwarding path. In essence
    the logic is the topmost label is read, looked up, removed, and
    replaced by 0 or more new labels and then sent out the specified
    interface to its next hop.

    Quite a few optional features are not implemented here. Among them
    are generation of ICMP errors when the TTL is exceeded or the packet
    is larger than the next hop MTU (those conditions are detected and the
    packets are dropped instead of generating an icmp error). The traffic
    class field is always set to 0. The implementation focuses on IP over
    MPLS and does not handle egress of other kinds of protocols.

    Instead of implementing coordination with the neighbour table and
    sorting out how to input next hops in a different address family (for
    which there is value), I was lazy and implemented a next hop mac
    address instead. The code is simpler and there are flavors of MPLS
    such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
    appropriate so a next hop by mac address would need to be implemented
    at some point.

    Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.

    Decoding the mpls header must be done by first byteswapping a 32-bit big
    endian word into the local cpu endian and then bit shifting to extract
    the pieces. There is no C bit-field that can represent a wire format
    mpls header on a little endian machine as the low bits of the 20-bit
    label wind up in the wrong half of the third byte. Therefore internally
    everything is dealt with in cpu native byte order except when writing
    to and reading from a packet.
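The byteswap-then-shift decoding can be sketched in userspace with ntohl and htonl. The field layout (20-bit label, 3-bit traffic class, 1-bit bottom of stack, 8-bit TTL) follows RFC 3032; the struct and function names are hypothetical and not the kernel's.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One label stack entry, held in CPU-native form. */
struct entry_sketch {
    uint32_t label;  /* 20 bits */
    uint8_t  tc;     /* 3 bits, traffic class */
    uint8_t  bos;    /* 1 bit, bottom of stack */
    uint8_t  ttl;    /* 8 bits */
};

/* Wire to native: byteswap the 32-bit word first, then shift and mask. */
static struct entry_sketch entry_decode(const uint8_t *wire)
{
    struct entry_sketch e;
    uint32_t be, host;

    memcpy(&be, wire, sizeof(be));  /* avoids unaligned access */
    host = ntohl(be);
    e.label = host >> 12;
    e.tc    = (uint8_t)((host >> 9) & 0x7);
    e.bos   = (uint8_t)((host >> 8) & 0x1);
    e.ttl   = (uint8_t)(host & 0xff);
    return e;
}

/* Native to wire: assemble in CPU order, byteswap once at the end. */
static void entry_encode(const struct entry_sketch *e, uint8_t *wire)
{
    uint32_t host, be;

    assert(e->label < (1u << 20) && e->tc < 8 && e->bos < 2);
    host = (e->label << 12) | ((uint32_t)e->tc << 9) |
           ((uint32_t)e->bos << 8) | e->ttl;
    be = htonl(host);
    memcpy(wire, &be, sizeof(be));
}
```

Doing the conversion on the whole 32-bit word sidesteps the bit-field problem the message describes, since the label straddles a byte boundary on the wire.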

    For management simplicity if a label is configured to forward out
    an interface that is down the packet is dropped early. Similarly
    if a network interface is removed rt_dev is updated to NULL
    (so no reference is preserved) and any packets for that label
    are dropped. Keeping the label entries in the kernel allows
    the kernel label table to function as the definitive source
    of which labels are allocated and which are not.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman