08 Jul, 2017

1 commit


05 Jul, 2017

1 commit


04 Jul, 2017

1 commit

  • This patch adds RTM_GETROUTE doit handler for mpls routes.

    Input:
    RTA_DST - input label
    RTA_NEWDST - labels in packet for multipath selection

    By default the getroute handler returns matched
    nexthop label, via and oif

    With RTM_F_FIB_MATCH flag, full matched route is
    returned.

    example (with patched iproute2):
    $ip -f mpls route show
    101
    nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
    nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12
    201
    nexthop as to 202/203 via inet6 2001:db8:2::2 dev virt1-2
    nexthop as to 402/403 via inet6 2001:db8:12::2 dev virt1-12

    $ip -f mpls route get 103
    RTNETLINK answers: Network is unreachable

    $ip -f mpls route get 101
    101 as to 102/103 via inet 172.16.2.2 dev virt1-2

    $ip -f mpls route get as to 302/303 101
    101 as to 302/303 via inet 172.16.12.2 dev virt1-12

    $ip -f mpls route get fibmatch 103
    RTNETLINK answers: Network is unreachable

    $ip -f mpls route get fibmatch 101
    101
    nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
    nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12

    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

07 Jun, 2017

1 commit


01 Jun, 2017

1 commit

  • Recent fixes to use WRITE_ONCE for nh_flags on link up
    accidentally ended up leaving the dead flags on a nh. This patch
    fixes the WRITE_ONCE to use freshly evaluated nh_flags.

    Fixes: 39eb8cd17588 ("net: mpls: rt_nhn_alive and nh_flags should be accessed using READ_ONCE")
    Reported-by: Satish Ashok
    Signed-off-by: Roopa Prabhu
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

30 May, 2017

6 commits


09 May, 2017

1 commit

  • There are many code paths opencoding kvmalloc. Let's use the helper
    instead. The main difference to kvmalloc is that those users are
    usually not considering all the aspects of the memory allocator. E.g.
    allocation requests
    Reviewed-by: Boris Ostrovsky # Xen bits
    Acked-by: Kees Cook
    Acked-by: Vlastimil Babka
    Acked-by: Andreas Dilger # Lustre
    Acked-by: Christian Borntraeger # KVM/s390
    Acked-by: Dan Williams # nvdim
    Acked-by: David Sterba # btrfs
    Acked-by: Ilya Dryomov # Ceph
    Acked-by: Tariq Toukan # mlx4
    Acked-by: Leon Romanovsky # mlx5
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Herbert Xu
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Tony Luck
    Cc: "Rafael J. Wysocki"
    Cc: Ben Skeggs
    Cc: Kent Overstreet
    Cc: Santosh Raspatur
    Cc: Hariprasad S
    Cc: Yishai Hadas
    Cc: Oleg Drokin
    Cc: "Yan, Zheng"
    Cc: Alexander Viro
    Cc: Alexei Starovoitov
    Cc: Eric Dumazet
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

18 Apr, 2017

1 commit

  • Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
    for doit functions that call it directly.

    This is the first step to using extended error reporting in rtnetlink.
    From here individual subsystems can be updated to set netlink_ext_ack as
    needed.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

14 Apr, 2017

1 commit


02 Apr, 2017

6 commits

  • Allow users to push down more labels per MPLS encap. Similar to LSR case,
    move label array to the end of mpls_iptunnel_encap and allocate based on
    the number of labels for the route.

    For consistency with the LSR case, re-use the same maximum number of
    labels.
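
The idea of sizing the allocation by the number of labels can be sketched in user-space C as follows. This is only an illustration: struct encap_sketch, encap_alloc and MAX_LABELS_SKETCH are hypothetical names and values, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LABELS_SKETCH 30 /* hypothetical cap, not taken from the patch */

/* Hypothetical stand-in for mpls_iptunnel_encap. */
struct encap_sketch {
    uint8_t  labels;   /* number of entries in label[] */
    uint32_t label[];  /* flexible array member, sized at allocation */
};

/* Allocate an encap sized for exactly n labels; NULL on bad input or OOM. */
static struct encap_sketch *encap_alloc(const uint32_t *labels, unsigned int n)
{
    struct encap_sketch *e;

    if (n == 0 || n > MAX_LABELS_SKETCH)
        return NULL;

    e = malloc(sizeof(*e) + n * sizeof(e->label[0]));
    if (!e)
        return NULL;

    e->labels = (uint8_t)n;
    memcpy(e->label, labels, n * sizeof(e->label[0]));
    return e;
}
```

With the array at the end of the structure, a route that pushes one label pays for one label, while the cap only bounds what userspace may request.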

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Allow users to push down more labels per MPLS route. With the previous
    patches, no memory allocations are based on MAX_NEW_LABELS; the limit
    is only used to keep userspace in check.

    At this point MAX_NEW_LABELS is only used for mpls_route_config (copying
    route data from userspace) and processing nexthops looking for the max
    number of labels across the route spec.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Limit memory allocation size for mpls_route to 4096.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Move labels to the end of mpls_nh as a 0-sized array and within mpls_route
    move the via for a nexthop after the mpls_nh. The new layout becomes:

    +----------------------+
    | mpls_route           |
    +----------------------+
    | mpls_nh 0            |
    +----------------------+
    | alignment padding    |   4 bytes for odd number of labels; 0 for even
    +----------------------+
    | via[rt_max_alen] 0   |
    +----------------------+
    | alignment padding    |   via's aligned on sizeof(unsigned long)
    +----------------------+
    | ...                  |
    +----------------------+
    | mpls_nh n-1          |
    +----------------------+
    | via[rt_max_alen] n-1 |
    +----------------------+

    Memory allocated for nexthop + via is constant across all nexthops and
    their via. It is based on the maximum number of labels across all nexthops
    and the maximum via length. The size is saved in the mpls_route as
    rt_nh_size. Accessing a nexthop becomes rt->rt_nh + index * rt->rt_nh_size.

    The offset of the via address from a nexthop is saved as rt_via_offset
    so that given an mpls_nh pointer the via for that hop is simply
    nh + rt->rt_via_offset.
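
The address arithmetic described above can be sketched in user-space C. The structures below are simplified, hypothetical stand-ins (nh_sketch, route_sketch), so their sizes and padding do not match the kernel's mpls_nh and mpls_route; only the way rt_nh_size and rt_via_offset are derived and used is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Hypothetical, simplified stand-in for mpls_nh. */
struct nh_sketch {
    uint8_t  nh_labels;    /* number of labels in use */
    uint8_t  nh_via_alen;  /* length of the via address */
    uint32_t nh_label[];   /* labels live at the end of the nexthop */
};

/* Hypothetical, simplified stand-in for mpls_route; slots follow it. */
struct route_sketch {
    uint8_t  rt_nhn;         /* number of nexthops */
    uint8_t  rt_max_alen;    /* longest via address in the route */
    uint16_t rt_via_offset;  /* offset of the via from its nexthop */
    size_t   rt_nh_size;     /* size of one slot: nexthop + via */
};

/* Every slot has the same size, derived from the route-wide maxima. */
static struct route_sketch *route_alloc(uint8_t num_nh, uint8_t max_labels,
                                        uint8_t max_alen)
{
    size_t via_off = ALIGN_UP(sizeof(struct nh_sketch) +
                              max_labels * sizeof(uint32_t),
                              sizeof(unsigned long));
    size_t nh_size = via_off + ALIGN_UP((size_t)max_alen,
                                        sizeof(unsigned long));
    struct route_sketch *rt = calloc(1, sizeof(*rt) + num_nh * nh_size);

    if (!rt)
        return NULL;
    rt->rt_nhn = num_nh;
    rt->rt_max_alen = max_alen;
    rt->rt_via_offset = (uint16_t)via_off;
    rt->rt_nh_size = nh_size;
    return rt;
}

/* Nexthop lookup: base of the slots + index * rt_nh_size. */
static struct nh_sketch *route_nh(struct route_sketch *rt, unsigned int index)
{
    unsigned char *slots = (unsigned char *)(rt + 1);

    return (struct nh_sketch *)(slots + index * rt->rt_nh_size);
}

/* Via lookup: nexthop pointer + rt_via_offset. */
static uint8_t *nh_via(struct route_sketch *rt, struct nh_sketch *nh)
{
    return (uint8_t *)nh + rt->rt_via_offset;
}
```

Because the slot size is constant per route, neither lookup needs to inspect the label count of an individual nexthop.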

    With prior code, memory allocated per mpls_route with 1 nexthop:
    via is an ethernet address - 64 bytes
    via is an ipv4 address - 64
    via is an ipv6 address - 72

    With this patch set, memory allocated per mpls_route with 1 nexthop and
    1 or 2 labels:
    via is an ethernet address - 56 bytes
    via is an ipv4 address - 56
    via is an ipv6 address - 64

    The 8-byte reduction is due to the previous patch; the change introduced
    by this patch has no impact on the size of allocations for 1 or 2 labels.

    Performance impact of this change was examined using network namespaces
    with veth pairs connecting namespaces. ns0 inserts the packet to the
    label-switched path using an lwt route with encap mpls. ns1 adds 1 or 2
    labels depending on test, ns2 (and ns3 for 2-label test) pops the label
    and forwards. ns3 (or ns4) for a 2-label is the destination. Similar
    series of namespaces used for 2-nexthop test.

    Intent is to measure changes to latency (overhead in manipulating the
    packet) in the forwarding path. Tests used netperf with UDP_RR.

    IPv4:                  current   patches
    1 label, 1 nexthop     29908     30115
    2 label, 1 nexthop     29071     29612
    1 label, 2 nexthop     29582     29776
    2 label, 2 nexthop     29086     29149

    IPv6:                  current   patches
    1 label, 1 nexthop     24502     24960
    2 label, 1 nexthop     24041     24407
    1 label, 2 nexthop     23795     23899
    2 label, 2 nexthop     23074     22959

    In short, the change ranges from no effect to a modest increase in performance.
    This is expected since this patch does not really have an impact on routes
    with 1 or 2 labels (the current limit) and 1 or 2 nexthops.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Number of nexthops and number of alive nexthops are tracked using an
    unsigned int. A route should never have more than 255 nexthops so
    convert both to u8. Update all references and intermediate variables
    to consistently use u8 as well.

    Shrinks the size of mpls_route from 32 bytes to 24 bytes with a 2-byte
    hole before the nexthops.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • The number of alive nexthops for a route (rt->rt_nhn_alive) and the
    flags for a next hop (nh->nh_flags) are modified by netdev event
    handlers. The event handlers run with rtnl_lock held so updates are
    always done with the lock held. The packet path accesses the fields
    under the rcu lock. Since those fields can change at any moment in
    the packet path, both fields should be accessed using READ_ONCE. Updates
    to both fields should use WRITE_ONCE.

    Update mpls_select_multipath (packet path) and mpls_ifdown and mpls_ifup
    (event handlers) accordingly.
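
A rough user-space sketch of the access pattern follows. The macros are approximations of the kernel's READ_ONCE and WRITE_ONCE built on the GCC/Clang __typeof__ extension, and route_sketch, nh_sketch, NH_DEAD, select_nexthop and nexthop_set_dead are hypothetical names; the selection loop is only loosely modeled on mpls_select_multipath.

```c
#include <assert.h>
#include <stdint.h>

/* User-space approximations of the kernel macros (GCC/Clang only). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define NH_DEAD 0x1u /* hypothetical flag value */

struct nh_sketch {
    unsigned int nh_flags;
};

struct route_sketch {
    uint8_t rt_nhn;         /* number of nexthops */
    uint8_t rt_nhn_alive;   /* number of usable nexthops */
    struct nh_sketch nh[4];
};

/* Packet path: return the index of the selected nexthop, or -1. */
static int select_nexthop(struct route_sketch *rt, unsigned int hash)
{
    /* Take one snapshot; the event path may change it at any time. */
    uint8_t alive = READ_ONCE(rt->rt_nhn_alive);
    unsigned int n, seen = 0;

    if (alive == 0)
        return -1;

    n = hash % alive;
    for (unsigned int i = 0; i < rt->rt_nhn; i++) {
        if (READ_ONCE(rt->nh[i].nh_flags) & NH_DEAD)
            continue;
        if (seen++ == n)
            return (int)i;
    }
    return -1;
}

/* Event path: update one nexthop's flags, then publish the new count. */
static void nexthop_set_dead(struct route_sketch *rt, unsigned int idx, int dead)
{
    unsigned int flags = READ_ONCE(rt->nh[idx].nh_flags);
    uint8_t alive = 0;

    flags = dead ? (flags | NH_DEAD) : (flags & ~NH_DEAD);
    WRITE_ONCE(rt->nh[idx].nh_flags, flags);

    for (unsigned int i = 0; i < rt->rt_nhn; i++)
        if (!(rt->nh[i].nh_flags & NH_DEAD))
            alive++;
    WRITE_ONCE(rt->rt_nhn_alive, alive);
}
```

The point of the single snapshot is that the modulus and the loop see one consistent value of the alive count even if the event handler runs in between.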

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

30 Mar, 2017

1 commit


29 Mar, 2017

2 commits


28 Mar, 2017

2 commits

  • When all devices for all nexthops in a route have been deleted, the
    route is effectively dead, so remove it.

    Signed-off-by: David Ahern
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    David Ahern
     
  • If the device for a nexthop in a multipath route is deleted, the nexthop
    is effectively removed from the route. Currently, a route dump still
    returns the nexthop though without the device set:

    $ ip -f mpls ro ls
    100
    nexthop via inet 10.11.1.2 dev br0
    nexthop via inet 10.100.3.1 dev eth3
    $ ip li del br0
    $ ip -f mpls ro ls
    100
    nexthop via inet 10.11.1.2 dev * dead linkdown
    nexthop via inet 10.100.3.1 dev eth3

    Since the nexthop is effectively deleted, drop the hop from the route
    dump.

    Signed-off-by: David Ahern
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    David Ahern
     

25 Mar, 2017

1 commit


24 Mar, 2017

1 commit


17 Mar, 2017

1 commit

  • Alive tracking of nexthops can account for a link twice if the carrier
    goes down followed by an admin down of the same link rendering multipath
    routes useless. This is similar to 79099aab38c8 for UNREGISTER events and
    DOWN events.

    Fix by tracking number of alive nexthops in mpls_ifdown similar to the
    logic in mpls_ifup. Checking the flags per nexthop once after all events
    have been processed is simpler than trying to maintain a running count
    through all event combinations.
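
The difference between a running count and a recount can be illustrated with a small user-space sketch. The flag values and the names route_sketch, ifdown_running and ifdown_recount are hypothetical; F_DEAD and F_LINKDOWN merely stand in for the kernel's nexthop flags.

```c
#include <assert.h>

#define F_DEAD     0x1u /* hypothetical stand-in for the dead flag */
#define F_LINKDOWN 0x2u /* hypothetical stand-in for the linkdown flag */

struct nh_sketch {
    int          ifindex;
    unsigned int flags;
};

struct route_sketch {
    unsigned int     nhn;    /* number of nexthops */
    unsigned int     alive;  /* number of usable nexthops */
    struct nh_sketch nh[2];
};

/* Fragile: decrement on every event that touches the device. */
static void ifdown_running(struct route_sketch *rt, int ifindex,
                           unsigned int flag)
{
    for (unsigned int i = 0; i < rt->nhn; i++) {
        if (rt->nh[i].ifindex != ifindex)
            continue;
        rt->nh[i].flags |= flag;
        rt->alive--; /* counts the same link again on a second event */
    }
}

/* Robust: set the flag, then derive the count from the flags. */
static void ifdown_recount(struct route_sketch *rt, int ifindex,
                           unsigned int flag)
{
    unsigned int alive = 0;

    for (unsigned int i = 0; i < rt->nhn; i++) {
        if (rt->nh[i].ifindex == ifindex)
            rt->nh[i].flags |= flag;
        if (!(rt->nh[i].flags & (F_DEAD | F_LINKDOWN)))
            alive++;
    }
    rt->alive = alive;
}
```

Applying a carrier-down event followed by an admin-down event to one link of a two-nexthop route leaves the running variant at 0 alive nexthops, while the recount variant stays at 1.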

    Also, WRITE_ONCE is used instead of ACCESS_ONCE to set rt_nhn_alive
    per a comment from checkpatch:
    WARNING: Prefer WRITE_ONCE(<FOO>, <BAR>) over ACCESS_ONCE(<FOO>) = <BAR>

    Fixes: c89359a42e2a4 ("mpls: support for dead routes")
    Signed-off-by: David Ahern
    Acked-by: Robert Shearman
    Signed-off-by: David S. Miller

    David Ahern
     

16 Mar, 2017

1 commit


14 Mar, 2017

2 commits

  • Allow TTL propagation from IP packets to MPLS packets to be
    configured. Add a new optional LWT attribute, MPLS_IPTUNNEL_TTL, which
    allows the TTL to be set in the resulting MPLS packet, with the value
    of 0 having the semantics of enabling propagation of the TTL from the
    IP header (i.e. non-zero values disable propagation).

    Also allow the configuration to be overridden globally by reusing the
    same sysctl to control whether the TTL is propagated from IP packets
    into the MPLS header. If the per-LWT attribute is set then it
    overrides the global configuration. If the TTL isn't propagated then a
    default TTL value is used which can be configured via a new sysctl,
    "net.mpls.default_ttl". This is kept separate from the configuration
    of whether IP TTL propagation is enabled as it can be used in the
    future when non-IP payloads are supported (i.e. where there is no
    payload TTL that can be propagated).
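
The precedence rules described above can be written down as a small decision function. This is a sketch of the described semantics only, not the kernel code: encap_sketch, net_sketch and mpls_encap_ttl are hypothetical names, and the two net_sketch fields stand in for the sysctl settings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-encap state. */
struct encap_sketch {
    bool    ttl_attr_set; /* was MPLS_IPTUNNEL_TTL supplied? */
    uint8_t ttl;          /* attribute value; 0 means "propagate" */
};

/* Hypothetical per-namespace state mirroring the sysctls. */
struct net_sketch {
    bool    ip_ttl_propagate; /* global propagation setting */
    uint8_t default_ttl;      /* net.mpls.default_ttl */
};

/* Choose the TTL for the MPLS header pushed onto an IP packet. */
static uint8_t mpls_encap_ttl(const struct encap_sketch *e,
                              const struct net_sketch *net,
                              uint8_t ip_ttl)
{
    bool propagate;

    if (e->ttl_attr_set) {
        /* The per-LWT attribute overrides the global setting. */
        if (e->ttl != 0)
            return e->ttl;   /* fixed TTL, propagation disabled */
        propagate = true;    /* 0 enables propagation */
    } else {
        propagate = net->ip_ttl_propagate;
    }

    return propagate ? ip_ttl : net->default_ttl;
}
```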

    Signed-off-by: Robert Shearman
    Acked-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • Provide the ability to control on a per-route basis whether the TTL
    value from an MPLS packet is propagated to an IPv4/IPv6 packet when
    the last label is popped as per the theoretical model in RFC 3443
    through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to
    mean disable propagation and 1 to mean enable propagation.

    In order to provide the ability to change the behaviour for packets
    arriving with IPv4/IPv6 Explicit Null labels and to provide an easy
    way for a user to change the behaviour for all existing routes without
    having to reprogram them, a global knob is provided. This is done
    through the addition of a new per-namespace sysctl,
    "net.mpls.ip_ttl_propagate", which defaults to enabled. If the
    per-route attribute is set (either enabled or disabled) then it
    overrides the global configuration.

    Signed-off-by: Robert Shearman
    Acked-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Robert Shearman
     

13 Mar, 2017

2 commits

  • Multipath routes can be rendered useless when a device in one of the
    paths is deleted. For example:

    $ ip -f mpls ro ls
    100
    nexthop as to 200 via inet 172.16.2.2 dev virt12
    nexthop as to 300 via inet 172.16.3.2 dev br0
    101
    nexthop as to 201 via inet6 2000:2::2 dev virt12
    nexthop as to 301 via inet6 2000:3::2 dev br0

    $ ip li del br0

    When br0 is deleted the other hop is not considered in
    mpls_select_multipath because of the alive check -- rt_nhn_alive
    is 0.

    rt_nhn_alive is decremented once in mpls_ifdown when the device is taken
    down (NETDEV_DOWN) and again when it is deleted (NETDEV_UNREGISTER). For
    a 2 hop route, deleting one device drops the alive count to 0. Since
    devices are taken down before unregistering, the decrement on
    NETDEV_UNREGISTER is redundant.

    Fixes: c89359a42e2a4 ("mpls: support for dead routes")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • When the mpls_router module is unloaded, mpls routes are deleted but
    notifications are not sent to userspace leaving userspace caches
    out of sync. Add the call to mpls_notify_route in mpls_net_exit as
    routes are freed.

    Fixes: 0189197f44160 ("mpls: Basic routing support")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

21 Feb, 2017

1 commit

  • Add netconf support to MPLS. Allows userspace to learn and be notified
    of changes to 'input' enable setting per interface.

    Acked-by: Nicolas Dichtel
    Signed-off-by: David Ahern
    Acked-by: Robert Shearman
    Signed-off-by: David S. Miller

    David Ahern
     

31 Jan, 2017

1 commit


28 Jan, 2017

1 commit


25 Jan, 2017

1 commit


24 Jan, 2017

1 commit

  • MPLS multipath for LSR is broken -- always selecting the first nexthop
    in the one label case. For example:

    $ ip -f mpls ro ls
    100
    nexthop as to 200 via inet 172.16.2.2 dev virt12
    nexthop as to 300 via inet 172.16.3.2 dev virt13
    101
    nexthop as to 201 via inet6 2000:2::2 dev virt12
    nexthop as to 301 via inet6 2000:3::2 dev virt13

    In this example incoming packets have a single MPLS label which means
    BOS bit is set. The BOS bit is passed from mpls_forward down to
    mpls_multipath_hash which never processes the hash loop because BOS is 1.

    Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len
    tracks the total mpls header length on each pass (on pass N mpls_hdr_len
    is N * sizeof(mpls_shim_hdr)). When the label is found with the BOS set
    it verifies the skb has sufficient header for ipv4 or ipv6, and finds the
    IPv4 and IPv6 header by using the last mpls_hdr pointer and adding 1 to
    advance past it.
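
Walking the whole label stack can be sketched in user-space C as below. The entry layout (20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL) is the standard MPLS label stack encoding; the function names and the multiply-and-add hash are illustrative substitutes, not what mpls_multipath_hash actually computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPLS_LABEL_SHIFT 12
#define MPLS_BOS_BIT     0x100u /* bottom-of-stack (S) bit */

/* Read one 32-bit label stack entry from network byte order. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Walk the label stack until the entry with BOS set, mixing every
 * label into *hash. Returns the total MPLS header length, which is
 * also the offset of the payload, or 0 if the buffer ends first.
 */
static size_t mpls_walk_stack(const uint8_t *pkt, size_t len, uint32_t *hash)
{
    size_t hdr_len = 0;

    for (;;) {
        uint32_t entry;

        if (hdr_len + 4 > len)
            return 0; /* truncated stack */

        entry = read_be32(pkt + hdr_len);
        hdr_len += 4; /* after pass N this is N * 4 bytes */

        *hash = *hash * 31u + (entry >> MPLS_LABEL_SHIFT);

        if (entry & MPLS_BOS_BIT)
            return hdr_len; /* payload starts right after this entry */
    }
}
```

A single-label packet has BOS set on its first entry, so the loop body has to run at least once before the BOS test; that ordering is what lets the one-label case contribute to the hash.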

    With these changes I have verified the code correctly sees the label,
    BOS, IPv4 and IPv6 addresses in the network header and icmp/tcp/udp
    traffic for ipv4 and ipv6 are distributed across the nexthops.

    Fixes: 1c78efa8319ca ("mpls: flow-based multipath selection")
    Acked-by: Robert Shearman
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

18 Jan, 2017

1 commit

  • Having MPLS packet stats is useful for observing network operation and
    for diagnosing network problems. In the absence of anything better,
    RFC2863 and RFC3813 are used for guidance for which stats to expose
    and the semantics of them. In particular rx_noroutes maps to "in
    unknown protos" (ifInUnknownProtos) in RFC2863. The stats are exposed to userspace via
    AF_MPLS attributes embedded in the IFLA_STATS_AF_SPEC attribute of
    RTM_GETSTATS messages.

    All the introduced fields are 64-bit, even error ones, to ensure no
    overflow with long uptimes. Per-CPU counters are used to avoid
    cache-line contention on the commonly used fields. The other fields
    have also been made per-CPU for code to avoid performance problems in
    error conditions on the assumption that on some platforms the cost of
    atomic operations could be more expensive than sending the packet
    (which is what would be done in the success case). If that's not the
    case, we could instead not use per-CPU counters for these fields.
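
The per-CPU counting scheme can be illustrated with a simplified user-space sketch. All names here (link_stats_sketch, dev_sketch, NR_CPUS_SKETCH and the helper functions) are hypothetical, the field list is a small subset chosen for illustration, and the sketch ignores the concurrency handling that real per-CPU kernel counters need.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical fixed CPU count */

/* Hypothetical subset of per-link counters, all 64-bit. */
struct link_stats_sketch {
    uint64_t rx_packets;
    uint64_t rx_bytes;
    uint64_t rx_errors;
};

/* One private copy per CPU, so writers never share a cache line. */
struct dev_sketch {
    struct link_stats_sketch percpu[NR_CPUS_SKETCH];
};

/* Fast path: touch only the local CPU's counters. */
static void stats_rx(struct dev_sketch *dev, unsigned int cpu, uint64_t bytes)
{
    struct link_stats_sketch *s = &dev->percpu[cpu % NR_CPUS_SKETCH];

    s->rx_packets++;
    s->rx_bytes += bytes;
}

/* Error path: also per-CPU, so it needs no atomic operation. */
static void stats_rx_error(struct dev_sketch *dev, unsigned int cpu)
{
    dev->percpu[cpu % NR_CPUS_SKETCH].rx_errors++;
}

/* Slow path (e.g. a stats dump): fold all copies into one total. */
static struct link_stats_sketch stats_sum(const struct dev_sketch *dev)
{
    struct link_stats_sketch total = { 0, 0, 0 };

    for (unsigned int i = 0; i < NR_CPUS_SKETCH; i++) {
        total.rx_packets += dev->percpu[i].rx_packets;
        total.rx_bytes   += dev->percpu[i].rx_bytes;
        total.rx_errors  += dev->percpu[i].rx_errors;
    }
    return total;
}
```

The cost of this design moves from the packet path to the reader, which has to sum all copies on every dump.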

    Only unicast and non-fragment are exposed at the moment, but other
    counters can be exposed in the future either by adding to the end of
    struct mpls_link_stats or by additional netlink attributes in the
    AF_MPLS IFLA_STATS_AF_SPEC nested attribute.

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     

07 Dec, 2016

1 commit