09 Mar, 2016

2 commits

  • One of our customers observed issues with FIB6 garbage collectors
    running in different network namespaces blocking each other, resulting
    in soft lockups (fib6_run_gc() initiated from timer runs always in
    forced mode).

    Now that FIB6 walkers are separated per namespace, there is no more need
    for instances of fib6_run_gc() in different namespaces blocking each
    other. There is still a call to icmp6_dst_gc() which operates on shared
    data but this function is protected by its own shared lock.

    Signed-off-by: Michal Kubecek
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Michal Kubeček
     
  • The IPv6 FIB data structures are separated per network namespace but
    there is still only one global walkers list and one global walker list
    lock. This means changes in one namespace unnecessarily interfere with
    walkers in other namespaces.

    Replace the global list with per-netns lists (and give each its own
    lock).

    Signed-off-by: Michal Kubecek
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Michal Kubeček
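
    The per-namespace split described in this commit can be pictured with a small user-space model. This is only a sketch, not the kernel code: the NetNamespace class and its method names are hypothetical, and threading.Lock stands in for the kernel's walker list lock.

```python
import threading


class NetNamespace:
    """Toy stand-in for a network namespace that owns its walker state."""

    def __init__(self, name):
        self.name = name
        self.walkers = []                     # per-namespace walker list
        self.walkers_lock = threading.Lock()  # per-namespace lock

    def link_walker(self, walker):
        # Only this namespace's lock is taken, never a global one.
        with self.walkers_lock:
            self.walkers.append(walker)

    def unlink_walker(self, walker):
        with self.walkers_lock:
            self.walkers.remove(walker)


ns_a = NetNamespace("a")
ns_b = NetNamespace("b")
ns_a.link_walker("gc-walker-a")

# Holding namespace a's lock does not block list changes in namespace b.
with ns_a.walkers_lock:
    ns_b.link_walker("gc-walker-b")
```

    With a single global list and lock, the last statement would have had to wait for the lock held on behalf of namespace a.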
     

10 Jul, 2015

1 commit

  • Add support to allow non-local binds similar to how this was done for IPv4.
    Non-local binds are very useful in emulating the Internet in a box, etc.

    This adds the ip_nonlocal_bind sysctl under ipv6.

    Testing:

    Set up nonlocal binding and receive routing on a host, e.g.:

    ip -6 rule add from ::/0 iif eth0 lookup 200
    ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
    sysctl -w net.ipv6.ip_nonlocal_bind=1

    Set up routing to 2001:0:0:1::/64 on peer to go to first host

    ping6 -I 2001:0:0:1::1 peer-address -- to verify

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
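
    To check from a script whether the sysctl used in the test steps is set, one option is to read it through /proc/sys. The sketch below assumes the usual mapping of dotted sysctl names to files under /proc/sys; the helper names sysctl_path and read_sysctl are made up for illustration.

```python
from pathlib import PurePosixPath


def sysctl_path(name, proc_sys="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return PurePosixPath(proc_sys).joinpath(*name.split("."))


def read_sysctl(name, proc_sys="/proc/sys"):
    """Return the sysctl value as a string, or None if it cannot be read."""
    try:
        with open(str(sysctl_path(name, proc_sys))) as handle:
            return handle.read().strip()
    except OSError:
        # Missing file (older kernel, non-Linux host) or no permission.
        return None


print(sysctl_path("net.ipv6.ip_nonlocal_bind"))
print(read_sysctl("net.ipv6.ip_nonlocal_bind"))
```

    On a kernel without this patch, or on a non-Linux host, read_sysctl returns None instead of raising.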
     

04 May, 2015

1 commit

  • This patch divides the IPv6 flow label space into two ranges:
    0-7ffff is reserved for flow label manager, 80000-fffff will be
    used for creating auto flow labels (per RFC6438). This only affects how
    labels are set on transmit, it does not affect receive. This range split
    can be disabled by sysctl.

    Background:

    IPv6 flow labels have been an unmitigated disappointment thus far
    in the lifetime of IPv6. Support in HW devices to use them for ECMP
    is lacking, and OSes don't turn them on by default. If we had these
    we could get much better hashing in IPv6 networks without resorting
    to DPI, possibly eliminating some of the motivations to define new
    encaps in UDP just for getting ECMP.

    Unfortunately, the initial specifications of IPv6 did not clarify
    how they are to be used. There has always been a vague concept that
    these can be used for ECMP, flow hashing, etc. and we do now have a
    good standard for how to do this in RFC6438. The problem is that flow labels
    can be either stateful or stateless (as in RFC6438), and we are
    presented with the possibility that a stateless label may collide
    with a stateful one. Attempts to split the flow label space were
    rejected in IETF. When we added support in Linux for RFC6438, we
    could not turn on flow labels by default due to this conflict.

    This patch splits the flow label space and should give us
    a path to enabling auto flow labels by default for all IPv6 packets.
    This is an API change so we need to consider compatibility with
    existing deployment. The stateful range is chosen to be the lower
    values in hopes that most uses would have chosen small numbers.

    Once we resolve the stateless/stateful issue, we can proceed to
    look at enabling RFC6438 flow labels by default (starting with
    scaled testing).

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
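
    The range split can be illustrated with a few lines of bit arithmetic. This is a sketch under stated assumptions rather than the kernel implementation: the function names are hypothetical, and forcing the top bit of the 20-bit label is one straightforward way to land in 80000-fffff.

```python
FLOWLABEL_MASK = 0xFFFFF  # flow labels are 20 bits wide
STATELESS_FLAG = 0x80000  # top bit of the 20-bit label


def is_manager_label(label):
    """True if the label lies in 0-7ffff, the flow label manager range."""
    return 0 <= label <= 0x7FFFF


def make_auto_label(flow_hash, split_enabled=True):
    """Derive an automatic (stateless) label from a flow hash."""
    label = flow_hash & FLOWLABEL_MASK
    if split_enabled:
        label |= STATELESS_FLAG  # force the label into 80000-fffff
    return label


print(hex(make_auto_label(0x12345)))
print(is_manager_label(make_auto_label(0x12345)))
```

    With the split enabled an automatic label can never fall into the manager range, which is the collision the commit message is concerned about.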
     

24 Mar, 2015

1 commit


28 Feb, 2015

1 commit

  • Joining a multicast group at the Ethernet level via the "ip maddr" command would
    not work if we have an Ethernet switch that does IGMP snooping since
    the switch would not replicate multicast packets on ports that did not
    have IGMP reports for the multicast addresses.

    Linux vxlan interfaces created via "ip link add vxlan" have the group option
    that enables them to do the required join.

    By extending ip address command with option "autojoin" we can get similar
    functionality for openvswitch vxlan interfaces as well as other tunneling
    mechanisms that need to receive multicast traffic. The kernel code is
    structured similar to how the vxlan driver does a group join / leave.

    example:
    ip address add 224.1.1.10/24 dev eth5 autojoin
    ip address del 224.1.1.10/24 dev eth5

    Signed-off-by: Madhu Challa
    Signed-off-by: David S. Miller

    Madhu Challa
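
    For comparison, an IP-level join from user space (the kind of join that makes the kernel send IGMP reports) is normally done with the IP_ADD_MEMBERSHIP socket option. The sketch below shows that socket-based route, not the autojoin option itself; the helper names are hypothetical.

```python
import socket


def build_ip_mreq(group, interface_addr="0.0.0.0"):
    """Pack a struct ip_mreq: group address followed by interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface_addr)


def join_group(sock, group, interface_addr="0.0.0.0"):
    """Join the group on an existing IPv4 datagram socket."""
    mreq = build_ip_mreq(group, interface_addr)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)


print(build_ip_mreq("224.1.1.10").hex())
```

    The autojoin flag gives a similar effect without having to keep a socket open for the lifetime of the membership.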
     

07 Oct, 2014

1 commit

  • Try to reduce the number of possible fn_sernum mutations by constraining them
    to their namespace.

    Also remove rt_genid which I forgot to remove in 705f1c869d577c ("ipv6:
    remove rt6i_genid").

    Cc: YOSHIFUJI Hideaki
    Cc: Martin Lau
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

08 Jul, 2014

1 commit

  • Automatically generate flow labels for IPv6 packets on transmit.
    The flow label is computed based on skb_get_hash. The flow label will
    only automatically be set when it is zero otherwise (i.e. flow label
    manager hasn't set one). This supports the transmit side functionality
    of RFC 6438.

    Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
    system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
    functionality per socket.

    By default, auto flowlabels are disabled to avoid possible conflicts
    with flow label manager. However, if this feature proves useful, we
    may want to enable it by default.

    It should also be noted that FreeBSD has already implemented automatic
    flow labels (including the sysctl and socket option). In FreeBSD,
    automatic flow labels default to enabled.

    Performance impact:

    Running super_netperf with 200 flows for TCP_RR and UDP_RR for
    IPv6. Note that in UDP case, __skb_get_hash will be called for
    every packet, which explains the slight regression. In the TCP case
    the hash is saved in the socket so there is no regression.

    Automatic flow labels disabled:

    TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

    UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

    Automatic flow labels enabled:

    TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06 tps

    UDP_RR:
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06 tps

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
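
    The transmit-side rule in this commit (only fill in a label when none is set and the feature is enabled) can be sketched as follows. The function name and the plain 20-bit mask of the hash are assumptions for illustration; the kernel derives the value from skb_get_hash.

```python
FLOWLABEL_MASK = 0xFFFFF  # 20-bit flow label field


def tx_flowlabel(current_label, flow_hash, auto_flowlabels):
    """Pick the flow label for an outgoing packet."""
    if current_label != 0 or not auto_flowlabels:
        # A label set by the flow label manager always wins, and nothing
        # is generated while the feature is switched off.
        return current_label
    return flow_hash & FLOWLABEL_MASK


print(hex(tx_flowlabel(0, 0xDEADBEEF, True)))
```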
     

14 May, 2014

1 commit

  • Kernel-originated IP packets that have no user socket associated
    with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
    are emitted with a mark of zero. Add a sysctl to make them have
    the same mark as the packet they are replying to.

    This allows an administrator that wishes to do so to use
    mark-based routing, firewalling, etc. for these replies by
    marking the original packets inbound.

    Tested using user-mode linux:
    - ICMP/ICMPv6 echo replies and errors.
    - TCP RST packets (IPv4 and IPv6).

    Signed-off-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Lorenzo Colitti
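
    In other words, the mark of a kernel-generated reply becomes a function of the packet being answered. A minimal sketch, with a hypothetical function name and a boolean standing in for the new sysctl:

```python
def reply_mark(incoming_mark, reflect_enabled):
    """Mark for a kernel-originated reply with no user socket."""
    # Old behaviour: always zero. New behaviour: copy the incoming mark.
    return incoming_mark if reflect_enabled else 0


print(reply_mark(0x2A, reflect_enabled=True))
print(reply_mark(0x2A, reflect_enabled=False))
```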
     

20 Jan, 2014

1 commit

  • With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
    flow label uniqueness. This patch introduces a new sysctl to protect the old
    behaviour, enabled by default.

    Changelog of V3:
    * rename ip6_flowlabel_consistency to flowlabel_consistency
    * use net_info_ratelimited()
    * checkpatch cleanups

    Signed-off-by: Florent Fourcot
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot
     

15 Jan, 2014

1 commit


08 Jan, 2014

1 commit

  • This change makes it possible to follow a recommendation of RFC4942.

    - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
    as source addresses for ICMPv6 echo reply. This sysctl is false by default
    to preserve existing behavior.
    - Add inline check ipv6_anycast_destination().
    - Use them in icmpv6_echo_reply().

    Reference:
    RFC4942 - IPv6 Transition/Coexistence Security Considerations
    (http://tools.ietf.org/html/rfc4942#section-2.1.6)

    2.1.6. Anycast Traffic Identification and Security

    [...]
    To avoid exposing knowledge about the internal structure of the
    network, it is recommended that anycast servers now take advantage of
    the ability to return responses with the anycast address as the
    source address if possible.

    Signed-off-by: Francois-Xavier Le Bail
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    FX Le Bail
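
    The source address decision for the echo reply can be sketched like this. It is a simplified model, not the code in icmpv6_echo_reply(): the function name and the two address sets are hypothetical, and returning None stands for "let normal source address selection decide".

```python
def echo_reply_source(dest_addr, local_unicast, local_anycast,
                      anycast_src_echo_reply=False):
    """Choose the source address for an ICMPv6 echo reply."""
    if dest_addr in local_unicast:
        return dest_addr  # reply from the address that was pinged
    if anycast_src_echo_reply and dest_addr in local_anycast:
        return dest_addr  # RFC4942 recommendation, opt-in via sysctl
    return None           # fall back to normal source selection


unicast = {"2001:db8::1"}
anycast = {"2001:db8::a"}
print(echo_reply_source("2001:db8::a", unicast, anycast))
print(echo_reply_source("2001:db8::a", unicast, anycast, True))
```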
     

01 Aug, 2013

1 commit

  • The current net namespace has only one genid for both IPv4 and IPv6, which has
    the following drawbacks:

    - Adding or deleting an IPv4 address invalidates all IPv6 routing table entries.
    - Inserting or removing an XFRM policy invalidates both IPv4 and IPv6 routing
    table entries even when the policy applies to only one address family.

    Thus, this patch attempts to split the single genid in two, so that IPv4 and
    IPv6 are handled separately at a finer granularity.

    Signed-off-by: Fan Du
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    fan.du
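
    The effect of the split can be modelled with one generation counter per address family. This is a toy sketch: the class names and the dictionary layout are hypothetical and only mirror the idea that a cached entry is valid while its stored genid matches the current one.

```python
class NetNamespace:
    """Holds one generation id per address family."""

    def __init__(self):
        self.genid = {"ipv4": 0, "ipv6": 0}

    def bump(self, family):
        # Called when, for example, an address of that family changes.
        self.genid[family] += 1


class CachedRoute:
    """Remembers the genid that was current when the entry was created."""

    def __init__(self, ns, family):
        self.ns = ns
        self.family = family
        self.genid = ns.genid[family]

    def is_valid(self):
        return self.genid == self.ns.genid[self.family]


ns = NetNamespace()
route4 = CachedRoute(ns, "ipv4")
route6 = CachedRoute(ns, "ipv6")
ns.bump("ipv4")  # an IPv4 address was added or deleted
print(route4.is_valid(), route6.is_valid())
```

    With a single shared counter, the IPv6 entry would have been invalidated by the IPv4 change as well.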
     

25 Mar, 2013

1 commit

  • This patch adds a dev_addr_genid for IPv6. The goal is to use it, combined with
    dev_base_seq to check if a change occurs during a netlink dump.
    If a change is detected, the flag NLM_F_DUMP_INTR is set in the first message
    after the dump was interrupted.

    Note that only dump of unicast addresses is checked (multicast and anycast are
    not checked).

    Reported-by: Junwei Zhang
    Reported-by: Hongjun Li
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
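
    The consistency check can be sketched as a dump that remembers a sequence value and flags the first message produced after that value changes. The model below is an assumption-laden simplification: AddrState and dump are hypothetical names, the two counters are compared as a tuple, and only NLM_F_DUMP_INTR (0x10 in linux/netlink.h) is taken from the real API.

```python
NLM_F_DUMP_INTR = 0x10  # flag value from linux/netlink.h


class AddrState:
    """Address list plus the two counters used to detect changes."""

    def __init__(self, addrs):
        self.addrs = list(addrs)
        self.dev_addr_genid = 0
        self.dev_base_seq = 0

    def add_addr(self, addr):
        self.addrs.append(addr)
        self.dev_addr_genid += 1

    def seq(self):
        return (self.dev_addr_genid, self.dev_base_seq)


def dump(state, chunk_size=2):
    """Yield (flags, addresses) messages, flagging interrupted dumps."""
    seen_seq = state.seq()
    pos = 0
    while pos < len(state.addrs):
        flags = 0
        if state.seq() != seen_seq:
            flags |= NLM_F_DUMP_INTR  # first message after the change
            seen_seq = state.seq()
        yield flags, state.addrs[pos:pos + chunk_size]
        pos += chunk_size


state = AddrState(["a", "b", "c", "d"])
messages = dump(state)
first = next(messages)
state.add_addr("e")  # change while the dump is in progress
rest = list(messages)
print(first, rest)
```

    A reader that sees the flag knows the result may be inconsistent and can restart the dump.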
     

06 Feb, 2013

1 commit

  • The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
    sysctl but currently only in init_net, other namespaces always
    use the default value. This can substantially limit the number
    of IPsec tunnels that can be effectively used.

    Signed-off-by: Michal Kubecek
    Signed-off-by: Steffen Klassert

    Michal Kubecek
     

20 Sep, 2012

1 commit

  • As pointed out by Michal, it is necessary to add a new
    namespace for nf_conntrack_reasm code, this prepares
    for the second patch.

    Cc: Herbert Xu
    Cc: Michal Kubeček
    Cc: David Miller
    Cc: Patrick McHardy
    Cc: Pablo Neira Ayuso
    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Amerigo Wang
     

30 Aug, 2012

1 commit


09 Jun, 2012

1 commit

  • inetpeer currently does not support namespaces, so information
    leaks across namespaces.

    This patch moves the global variables v4_peers and v6_peers into
    netns_ipv4 and netns_ipv6 as a field named peers.

    Add struct pernet_operations inetpeer_ops to initialize the per-netns
    inetpeer data.

    Also change family_to_base and inet_getpeer to support namespaces.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

21 Apr, 2012

1 commit


11 May, 2010

4 commits

  • This patch adds support for multiple independent multicast routing instances,
    named "tables".

    Userspace multicast routing daemons can bind to a specific table instance by
    issuing a setsockopt call using a new option MRT6_TABLE. The table number is
    stored in the raw socket data and affects all following ip6mr setsockopt(),
    getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
    is created with a default routing rule pointing to it. Newly created pim6reg
    devices have the table number appended ("pim6regX"), with the exception of
    devices created in the default table, which are named just "pim6reg" for
    compatibility reasons.

    Packets are directed to a specific table instance using routing rules,
    similar to how regular routing rules work. Currently iif, oif and mark
    are supported as keys; source and destination addresses could be supported
    additionally.

    Example usage:

    - bind pimd/xorp/... to a specific table:

    uint32_t table = 123;
    setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

    - create routing rules directing packets to the new table:

    # ip -6 mrule add iif eth0 lookup 123
    # ip -6 mrule add oif eth0 lookup 123

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • The unres_queue is currently shared between all namespaces. Following patches
    will additionally allow creating multiple multicast routing tables in each
    namespace. Having a single shared queue for all these users seems excessive,
    so move the queue and the cleanup timer to the per-namespace data to unshare it.

    As a side-effect, this fixes a bug in the seq file iteration functions: the
    first entry returned is always from the current namespace, entries returned
    after that may belong to any namespace.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
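
    For the multiple-tables commit in this group, the device naming rule and the option payload can be sketched as below. The helper names are hypothetical, and the numeric value of the default table is passed in as a parameter rather than assumed, since the commit message only names it as RT6_TABLE_DFLT.

```python
import struct


def pim6reg_name(table, default_table):
    """Name of a pim6reg device created in the given table."""
    if table == default_table:
        return "pim6reg"        # unchanged name for compatibility
    return f"pim6reg{table}"    # table number appended


def pack_table_option(table):
    """Pack the uint32_t value handed to setsockopt(MRT6_TABLE)."""
    return struct.pack("=I", table)


DEFAULT_TABLE = 254  # placeholder value for this example only
print(pim6reg_name(123, DEFAULT_TABLE))
print(pim6reg_name(DEFAULT_TABLE, DEFAULT_TABLE))
print(len(pack_table_option(123)))
```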
     

18 Jan, 2010

1 commit


02 Sep, 2009

1 commit

  • struct net::ipv6.ip6_dst_ops is separately dynamically allocated,
    but there is no fundamental reason for it. Embed it directly into
    struct netns_ipv6.

    For that:
    * move struct dst_ops into a separate header to fix circular dependencies
    (I honestly tried not to, but it's pretty much impossible to do any other way)
    * drop dynamic allocation, allocate together with netns

    For a change, remove struct dst_ops::dst_net, it's deducible
    by using container_of() given dst_ops pointer.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
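
    The container_of() idea (recovering the enclosing structure from a pointer to an embedded member) can be demonstrated with ctypes. This is an illustration only: the structure layouts and the ns_id field are invented, and only the member name ip6_dst_ops comes from the commit message.

```python
import ctypes


class DstOps(ctypes.Structure):
    _fields_ = [("family", ctypes.c_ushort), ("gc_thresh", ctypes.c_uint)]


class NetnsIpv6(ctypes.Structure):
    # dst_ops is embedded directly instead of being allocated separately.
    _fields_ = [("ns_id", ctypes.c_int), ("ip6_dst_ops", DstOps)]


def container_of(member_addr, container_type, field_name):
    """Return the container that embeds the member at member_addr."""
    offset = getattr(container_type, field_name).offset
    return container_type.from_address(member_addr - offset)


ns = NetnsIpv6(ns_id=7)
ops_addr = ctypes.addressof(ns.ip6_dst_ops)
recovered = container_of(ops_addr, NetnsIpv6, "ip6_dst_ops")
print(recovered.ns_id)
```

    Because the owner can be computed from the member's address, a back pointer such as dst_ops::dst_net carries no extra information once the structure is embedded.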
     

11 Dec, 2008

6 commits

  • Preliminary work to make IPv6 multicast forwarding netns-aware.

    Declare variable 'reg_vif_num' per-namespace and move it into struct netns_ipv6.

    At the moment, this variable is only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • Preliminary work to make IPv6 multicast forwarding netns-aware.

    Declare IPv6 multicast forwarding variables 'mroute_do_assert' and
    'mroute_do_pim' per-namespace in struct netns_ipv6.

    At the moment, these variables are only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • Preliminary work to make IPv6 multicast forwarding netns-aware.

    Declare variable cache_resolve_queue_len per-namespace and move it into
    struct netns_ipv6.

    This variable counts the number of unresolved cache entries queued in the
    list mfc_unres_queue. This list is kept global to all netns as the number
    of entries per namespace is limited to 10 (hardcoded in routine
    ip6mr_cache_unresolved).
    Entries belonging to different namespaces in mfc_unres_queue will be
    identified by matching the mfc_net member introduced previously in
    struct mfc6_cache.

    Keeping this list global to all netns also allows us to keep a single
    timer (ipmr_expire_timer) to handle their expiration.
    In some places cache_resolve_queue_len value was tested for arming
    or deleting the timer. These tests were equivalent to testing
    mfc_unres_queue value instead and are replaced in this patch.

    At the moment, cache_resolve_queue_len is only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • Preliminary work to make IPv6 multicast forwarding netns-aware.

    Dynamically allocates IPv6 multicast forwarding cache, mfc6_cache_array,
    and moves it to struct netns_ipv6.

    At the moment, mfc6_cache_array is only referenced in init_net.

    Replace 'ARRAY_SIZE(mfc6_cache_array)' with mfc6_cache_array size: MFC6_LINES.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • Preliminary work to make IPv6 multicast forwarding netns-aware.

    Dynamically allocates interface table vif6_table and moves it to
    struct netns_ipv6, and updates MIF_EXISTS() macro.

    At the moment, vif6_table is only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • Preliminary work to make IPv6 multicast forwarding netns-aware.

    Make IPv6 multicast forwarding mroute6_socket per-namespace and
    move it into struct netns_ipv6.

    At the moment, mroute6_socket is only referenced in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
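
    The arrangement described in the cache_resolve_queue_len commit of this group (one global queue, a per-namespace counter, entries tagged with their namespace) can be sketched as follows. The class and function names are hypothetical; the limit of 10 is the value quoted in the commit message.

```python
UNRES_LIMIT_PER_NS = 10  # per-namespace cap quoted in the commit message

mfc_unres_queue = []     # single queue shared by all namespaces


class NetNs:
    def __init__(self, name):
        self.name = name
        self.cache_resolve_queue_len = 0  # per-namespace counter


def queue_unresolved(ns, entry):
    """Queue an unresolved entry unless the namespace hit its limit."""
    if ns.cache_resolve_queue_len >= UNRES_LIMIT_PER_NS:
        return False
    mfc_unres_queue.append((ns, entry))  # entry is tagged with its owner
    ns.cache_resolve_queue_len += 1
    return True


def entries_for(ns):
    """Entries in the shared queue that belong to one namespace."""
    return [entry for owner, entry in mfc_unres_queue if owner is ns]


ns_a, ns_b = NetNs("a"), NetNs("b")
results = [queue_unresolved(ns_a, i) for i in range(12)]
queue_unresolved(ns_b, "x")
print(results.count(True), entries_for(ns_b), len(mfc_unres_queue))
```

    The 11 May 2010 commit listed earlier later moved the queue itself into the per-namespace data.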
     

23 Jul, 2008

1 commit


10 Jun, 2008

1 commit


08 Mar, 2008

3 commits


05 Mar, 2008

3 commits


04 Mar, 2008

1 commit