04 Nov, 2016

1 commit

  • Andrey reported the following error while running the syzkaller
    fuzzer:

    general protection fault: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff8800398c4480 task.stack: ffff88003b468000
    RIP: 0010:[] [< inline >] inet_exact_dif_match include/net/tcp.h:808
    RIP: 0010:[] [] __inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
    RSP: 0018:ffff88003b46f270 EFLAGS: 00010202
    RAX: 0000000000000004 RBX: 0000000000004242 RCX: 0000000000000001
    RDX: 0000000000000000 RSI: ffffc90000e3c000 RDI: 0000000000000054
    RBP: ffff88003b46f2d8 R08: 0000000000004000 R09: ffffffff830910e7
    R10: 0000000000000000 R11: 000000000000000a R12: ffffffff867fa0c0
    R13: 0000000000004242 R14: 0000000000000003 R15: dffffc0000000000
    FS: 00007fb135881700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020cc3000 CR3: 000000006d56a000 CR4: 00000000000006f0
    Stack:
    0000000000000000 000000000601a8c0 0000000000000000 ffffffff00004242
    424200003b9083c2 ffff88003def4041 ffffffff84e7e040 0000000000000246
    ffff88003a0911c0 0000000000000000 ffff88003a091298 ffff88003b9083ae
    Call Trace:
    [] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
    [] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
    [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216
    ...

    MD5 has a code path that calls __inet_lookup_listener with a null skb,
    so inet{6}_exact_dif_match needs to check skb against null before pulling
    the flag.

    Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if
    dif is l3mdev")
    Reported-by: Andrey Konovalov
    Signed-off-by: David Ahern
    Tested-by: Andrey Konovalov
    Signed-off-by: David S. Miller

    David Ahern
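
The null check described above can be sketched in user space as follows. This is a minimal model, not the kernel code: `struct pkt`, `SKB_L3MDEV_SLAVE` and `exact_dif_match()` are hypothetical stand-ins for the skb, its control-block flag and the inet{6}_exact_dif_match helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-packet flag set by the VRF driver. */
#define SKB_L3MDEV_SLAVE 0x1u

/* Hypothetical, simplified model of a packet with a flags word. */
struct pkt {
    unsigned int flags;
};

/*
 * Returns true when the socket lookup must match the bound device exactly.
 * A NULL packet (as on the MD5 lookup path) is checked before the flag is
 * read, so it never demands an exact match and is never dereferenced.
 */
static bool exact_dif_match(bool l3mdev_accept, const struct pkt *p)
{
    if (!l3mdev_accept && p != NULL && (p->flags & SKB_L3MDEV_SLAVE))
        return true;
    return false;
}
```

The ordering of the conditions matters: the pointer test has to come before the flag read, which is the whole fix.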
     

17 Oct, 2016

1 commit

  • Currently, socket lookups for l3mdev (vrf) use cases can match a socket
    that is bound to a port but not a device (i.e., a global socket). If the
    sysctl tcp_l3mdev_accept is not set this leads to ack packets going out
    based on the main table even though the packet came in from an L3 domain.
    The end result is that the connection is not established, creating
    confusion for users since the service is running and a socket shows in
    ss output. Fix by requiring an exact dif to sk_bound_dev_if match if the
    skb came through an interface enslaved to an l3mdev device and the
    tcp_l3mdev_accept sysctl is not set.

    skb's through an l3mdev interface are marked by setting a flag in
    inet{6}_skb_parm. The IPv6 variant is already set; this patch adds the
    flag for IPv4. Using an skb flag avoids a device lookup on the dif. The
    flag is set in the VRF driver using the IP{6}CB macros. For IPv4, the
    inet_skb_parm struct is moved in the cb per commit 971f10eca186, so the
    match function in the TCP stack needs to use TCP_SKB_CB. For IPv6, the
    move is done after the socket lookup, so IP6CB is used.

    The flags field in inet_skb_parm struct needs to be increased to add
    another flag. There is currently a 1-byte hole following the flags,
    so it can be expanded to u16 without increasing the size of the struct.

    Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
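
The point about widening the flags field into an existing hole can be illustrated with a simplified layout. The structs below are hypothetical and much smaller than the real inet_skb_parm; they only assume a 1-byte field followed by a 2-byte aligned field, as the commit message describes.

```c
#include <stdint.h>

/* Hypothetical layout before the change: a u8 followed by a padding hole. */
struct parm_before {
    uint32_t iif;
    uint8_t  flags;          /* 1 byte, then 1 byte of padding */
    uint16_t frag_max_size;  /* 2-byte aligned */
};

/* Hypothetical layout after the change: flags widened into the hole. */
struct parm_after {
    uint32_t iif;
    uint16_t flags;          /* now 16 bits, room for more flags */
    uint16_t frag_max_size;
};

/* On common ABIs the widened field reuses the padding byte. */
_Static_assert(sizeof(struct parm_before) == sizeof(struct parm_after),
               "widening flags must not grow the struct");
```

The static assertion documents the assumption; it would fail at compile time on an ABI where the hole does not exist.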
     

30 Sep, 2016

1 commit

  • This implements:
    https://tools.ietf.org/html/rfc7559

    Backoff is performed according to RFC3315 section 14:
    https://tools.ietf.org/html/rfc3315#section-14

    We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations
    to a negative value meaning an unlimited number of retransmits,
    and we make this the new default (in line with the RFC).

    We also add a new setting:
    /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval
    defaulting to 1 hour (per RFC recommendation).

    Signed-off-by: Maciej Żenczykowski
    Acked-by: Erik Kline
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
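
For reference, one retransmission step in the style of RFC 3315 section 14 can be sketched as below. This is an illustration of the formula only, not the kernel implementation: `next_rt()` is a hypothetical helper, times are in seconds, and the randomization factor is passed in by the caller (expected in [-0.1, 0.1]).

```c
/*
 * Compute the next retransmission timeout from the previous one:
 *   RT = 2*RTprev + RAND*RTprev
 * and, when a maximum (MRT) is configured and exceeded:
 *   RT = MRT + RAND*MRT
 * A max_rt of 0 means no upper limit.
 */
static double next_rt(double prev_rt, double max_rt, double rnd)
{
    double rt = 2.0 * prev_rt + rnd * prev_rt;

    if (max_rt > 0.0 && rt > max_rt)
        rt = max_rt + rnd * max_rt;
    return rt;
}
```

With the 1 hour maximum mentioned above, repeated calls double the interval until it settles around 3600 seconds.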
     

10 Jun, 2016

1 commit

  • Frank Kellermann reported a kernel crash with 4.5.0 when IPv6 is
    disabled at boot using the kernel option ipv6.disable=1. Using
    current net-next with the boot option:

    $ ip link add red type vrf table 1001

    Generates:
    [12210.919584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000748
    [12210.921341] IP: [] fib6_get_table+0x2c/0x5a
    [12210.922537] PGD b79e3067 PUD bb32b067 PMD 0
    [12210.923479] Oops: 0000 [#1] SMP
    [12210.924001] Modules linked in: ipvlan 8021q garp mrp stp llc
    [12210.925130] CPU: 3 PID: 1177 Comm: ip Not tainted 4.7.0-rc1+ #235
    [12210.926168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
    [12210.928065] task: ffff8800b9ac4640 ti: ffff8800bacac000 task.ti: ffff8800bacac000
    [12210.929328] RIP: 0010:[] [] fib6_get_table+0x2c/0x5a
    [12210.930697] RSP: 0018:ffff8800bacaf888 EFLAGS: 00010202
    [12210.931563] RAX: 0000000000000748 RBX: ffffffff81a9e280 RCX: ffff8800b9ac4e28
    [12210.932688] RDX: 00000000000000e9 RSI: 0000000000000002 RDI: 0000000000000286
    [12210.933820] RBP: ffff8800bacaf898 R08: ffff8800b9ac4df0 R09: 000000000052001b
    [12210.934941] R10: 00000000657c0000 R11: 000000000000c649 R12: 00000000000003e9
    [12210.936032] R13: 00000000000003e9 R14: ffff8800bace7800 R15: ffff8800bb3ec000
    [12210.937103] FS: 00007faa1766c700(0000) GS:ffff88013ac00000(0000) knlGS:0000000000000000
    [12210.938321] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [12210.939166] CR2: 0000000000000748 CR3: 00000000b79d6000 CR4: 00000000000406e0
    [12210.940278] Stack:
    [12210.940603] ffff8800bb3ec000 ffffffff81a9e280 ffff8800bacaf8c8 ffffffff814b3135
    [12210.941818] ffff8800bb3ec000 ffffffff81a9e280 ffffffff81a9e280 ffff8800bace7800
    [12210.943040] ffff8800bacaf8f0 ffffffff81397c88 ffff8800bb3ec000 ffffffff81a9e280
    [12210.944288] Call Trace:
    [12210.944688] [] fib6_new_table+0x24/0x8a
    [12210.945516] [] vrf_dev_init+0xd4/0x162
    [12210.946328] [] register_netdevice+0x100/0x396
    [12210.947209] [] vrf_newlink+0x40/0xb3
    [12210.948001] [] rtnl_newlink+0x5d3/0x6d5
    ...

    The problem above is due to the fact that the fib hash table is not
    allocated when IPv6 is disabled at boot.

    As for the VRF driver it should not do any IPv6 initializations if IPv6
    is disabled, so it needs to know if IPv6 is disabled at boot. The disable
    parameter is private to the IPv6 module, so provide an accessor for
    modules to determine if IPv6 was disabled at boot time.

    Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
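
The accessor pattern described above can be modeled in a few lines. All names here (`disable_ipv6_mod`, `ipv6_enabled_at_boot()`, `struct vrf_dev`, `vrf_init()`) are hypothetical and only illustrate keeping the flag private while letting another module skip its IPv6 setup.

```c
#include <stdbool.h>

/* Private to the modeled IPv6 module; would be set from the boot option. */
static int disable_ipv6_mod;

/* Accessor other modules call instead of touching the private flag. */
static bool ipv6_enabled_at_boot(void)
{
    return disable_ipv6_mod == 0;
}

/* Hypothetical model of the per-device state a VRF driver sets up. */
struct vrf_dev {
    bool has_v4_table;
    bool has_v6_table;
};

/* IPv6 state is only initialized when IPv6 was not disabled at boot. */
static void vrf_init(struct vrf_dev *v)
{
    v->has_v4_table = true;
    v->has_v6_table = ipv6_enabled_at_boot();
}
```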
     

12 May, 2016

1 commit

  • Currently the VRF driver uses the rx_handler to switch the skb device
    to the VRF device. Switching the dev prior to the ip / ipv6 layer
    means the VRF driver has to duplicate IP/IPv6 processing which adds
    overhead and makes features such as retaining the ingress device index
    more complicated than necessary.

    This patch moves the hook to the L3 layer just after the first NF_HOOK
    for PRE_ROUTING. This location makes exposing the original ingress device
    trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
    in the future.

    dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
    with the switched device through the packet taps to maintain current
    behavior (tcpdump can be used on either the vrf device or the enslaved
    devices).

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

20 Apr, 2016

1 commit

  • Struct ctl_table_header holds a pointer to the sysctl table, which can be used
    for freeing it after unregistration. IPv4 sysctls already use that.
    Remove the redundant NULL assignment: ndev is allocated using kzalloc.

    This also saves some bytes: sysctl table could be shorter than
    DEVCONF_MAX+1 if some options are disabled in config.

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
     

26 Feb, 2016

1 commit

  • Currently, all ipv6 addresses are flushed when the interface is configured
    down, including global, static addresses:

    $ ip -6 addr show dev eth1
    3: eth1: mtu 1500 state UP qlen 1000
    inet6 2100:1::2/120 scope global
    valid_lft forever preferred_lft forever
    inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
    valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    << nothing; all addresses have been flushed>>

    Add a new sysctl to make this behavior optional. The new setting defaults to
    flush all addresses to maintain backwards compatibility. When set, global
    addresses with no expire times are not flushed on an admin down. The sysctl
    is per-interface or system-wide for all interfaces:

    $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
    or
    $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1

    Will keep addresses on eth1 on an admin down.

    $ ip -6 addr show dev eth1
    3: eth1: mtu 1500 state UP qlen 1000
    inet6 2100:1::2/120 scope global
    valid_lft forever preferred_lft forever
    inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
    valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    3: eth1: mtu 1500 state DOWN qlen 1000
    inet6 2100:1::2/120 scope global tentative
    valid_lft forever preferred_lft forever
    inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
    valid_lft forever preferred_lft forever

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

11 Feb, 2016

2 commits

  • In certain 802.11 wireless deployments, there will be NA proxies
    that use knowledge of the network to correctly answer requests.
    To prevent unsolicited advertisements on the shared medium from
    being a problem, on such deployments wireless needs to drop them.

    Enable this by providing an option called "drop_unsolicited_na".

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • In order to solve a problem with 802.11, the so-called hole-196 attack,
    add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
    enabled, causes the stack to drop IPv6 unicast packets encapsulated in
    link-layer multi- or broadcast frames. Such frames can (as an attack)
    be created by any member of the same wireless network and transmitted
    as valid encrypted frames since the symmetric key for broadcast frames
    is shared between all stations.

    Reviewed-by: Julian Anastasov
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
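
The drop rule for "drop_unicast_in_l2_multicast" can be sketched as a pure predicate. This is a simplified model: `enum l2_type` and `should_drop()` are hypothetical names, and any destination outside ff00::/8 is treated as unicast.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of how the link-layer frame was addressed. */
enum l2_type { L2_HOST, L2_BROADCAST, L2_MULTICAST };

/* IPv6 multicast addresses are ff00::/8. */
static bool ipv6_dst_is_multicast(const uint8_t dst[16])
{
    return dst[0] == 0xff;
}

/*
 * Drop when the knob is on, the frame arrived as link-layer multicast or
 * broadcast, and the IPv6 destination is not itself a multicast address.
 */
static bool should_drop(bool knob_enabled, enum l2_type l2,
                        const uint8_t dst[16])
{
    if (!knob_enabled)
        return false;
    if (l2 == L2_HOST)
        return false;
    return !ipv6_dst_is_multicast(dst);
}
```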
     

03 Dec, 2015

1 commit

  • This patch addresses multiple problems :

    UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
    while socket is not locked : Other threads can change np->opt
    concurrently. Dmitry posted a syzkaller
    (http://github.com/google/syzkaller) program demonstrating
    use-after-free.

    Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
    and dccp_v6_request_recv_sock() also need to use RCU protection
    to dereference np->opt once (before calling ipv6_dup_options())

    This patch adds full RCU protection to np->opt

    Reported-by: Dmitry Vyukov
    Signed-off-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Oct, 2015

1 commit


14 Aug, 2015

1 commit

  • Like the ipv4 patch with a similar title, this adds a sysctl to allow
    the user to change routing behavior based on whether or not the
    interface associated with the nexthop was an up or down link. The
    default setting preserves the current behavior, but anyone that enables
    it will notice that nexthops on down interfaces will no longer be
    selected:

    net.ipv6.conf.all.ignore_routes_with_linkdown = 0
    net.ipv6.conf.default.ignore_routes_with_linkdown = 0
    net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
    ...

    When the above sysctls are set, not only will link status be reported to
    userspace, but an indication that a nexthop is dead and will not be used
    is also reported.

    1000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
    1000::/8 via 8000::2 dev p8p1 metric 1024 pref medium
    7000::/8 dev p7p1 proto kernel metric 256 dead linkdown pref medium
    8000::/8 dev p8p1 proto kernel metric 256 pref medium
    9000::/8 via 8000::2 dev p8p1 metric 2048 pref medium
    9000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
    fe80::/64 dev p7p1 proto kernel metric 256 dead linkdown pref medium
    fe80::/64 dev p8p1 proto kernel metric 256 pref medium

    This also adds devconf support and notification when sysctl values
    change.

    v2: drop use of rt6i_nhflags since it is not needed right now

    Signed-off-by: Andy Gospodarek
    Signed-off-by: Dinesh Dutt
    Signed-off-by: David S. Miller

    Andy Gospodarek
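
The effect of the sysctl on nexthop choice can be sketched with a toy selection loop. This is not the kernel's route lookup: `struct nexthop` and `pick()` are hypothetical, and "best" simply means lowest metric among the candidates for one prefix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate nexthop for a single prefix. */
struct nexthop {
    const char *dev;
    unsigned int metric;
    bool linkdown;   /* carrier is down on the nexthop's interface */
};

/*
 * Return the lowest-metric candidate. When ignore_linkdown is set,
 * candidates on down links are skipped; otherwise they remain eligible.
 * Returns NULL if no candidate is usable.
 */
static const struct nexthop *pick(const struct nexthop *nh, size_t n,
                                  bool ignore_linkdown)
{
    const struct nexthop *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (ignore_linkdown && nh[i].linkdown)
            continue;
        if (best == NULL || nh[i].metric < best->metric)
            best = &nh[i];
    }
    return best;
}
```

Using the 9000::/8 entries from the listing above, the p7p1 route has the better metric but is marked linkdown, so enabling the sysctl moves traffic to p8p1.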
     

31 Jul, 2015

1 commit

  • Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
    disabled accepting a hop limit from an RA if it is smaller than the current hop
    limit, for security reasons. But this behavior breaks the RFC definition.

    RFC 4861, 6.3.4. Processing Received Router Advertisements
    A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
    and Retrans Timer) may contain a value denoting that it is
    unspecified. In such cases, the parameter should be ignored and the
    host should continue using whatever value it is already using.

    If the received Cur Hop Limit value is non-zero, the host SHOULD set
    its CurHopLimit variable to the received value.

    So add a sysctl option, accept_ra_min_hop_limit, to let the user choose the
    minimum hop limit value accepted from an RA, and set the default to 1 to meet
    RFC standards.

    Signed-off-by: Hangbin Liu
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Hangbin Liu
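
The resulting acceptance rule can be sketched as below. `apply_ra_hop_limit()` is a hypothetical helper; it only models the two conditions named above (zero means unspecified, and values under the configured minimum are ignored).

```c
#include <stdint.h>

/*
 * Return the hop limit to use after processing an RA.
 * - A received value of 0 is "unspecified" (RFC 4861) and is ignored.
 * - A received value below min_hop_limit is ignored.
 * - Otherwise the received value replaces the current one.
 */
static uint8_t apply_ra_hop_limit(uint8_t current, uint8_t ra_hop_limit,
                                  int min_hop_limit)
{
    if (ra_hop_limit == 0)
        return current;
    if (ra_hop_limit < min_hop_limit)
        return current;
    return ra_hop_limit;
}
```

With the default minimum of 1, any non-zero value from the RA is accepted, which matches the RFC text quoted above.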
     

23 Jul, 2015

1 commit

  • Per RFC 6724, section 4, "Candidate Source Addresses":

    It is RECOMMENDED that the candidate source addresses be the set
    of unicast addresses assigned to the interface that will be used
    to send to the destination (the "outgoing" interface).

    Add a sysctl to enable this behaviour.

    Signed-off-by: Erik Kline
    Signed-off-by: David S. Miller

    Erik Kline
     

10 Jul, 2015

1 commit


24 Mar, 2015

1 commit

  • This patch implements the procfs logic for the stable_address knob:
    The secret is formatted as an ipv6 address and will be stored per
    interface and per namespace. We track initialized flag and return EIO
    errors until the secret is set.

    The secret is not inherited by newly created namespaces.

    Cc: Erik Kline
    Cc: Fernando Gont
    Cc: Lorenzo Colitti
    Cc: YOSHIFUJI Hideaki/吉藤英明
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

03 Feb, 2015

1 commit

  • Pull IPv6 cork initialization into its own function that
    can be re-used. IPv6 specific cork data did not have an
    explicit data structure. This patch creates one so that
    just ipv6 cork data can be passed as arguments. Also, since
    IPv6 tries to save the flow label into the inet_cork_full
    structure, pass the full cork.

    Adjust ip6_cork_release() to take cork data structures.

    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

26 Jan, 2015

1 commit

  • The kernel forcefully applies MTU values received in router
    advertisements provided the new MTU is less than the current. This
    behavior is undesirable when the user space is managing the MTU. Instead
    a sysctl flag 'accept_ra_mtu' is introduced such that the user space
    can control whether or not RA provided MTU updates should be applied. The
    default behavior is unchanged; user space must explicitly set this flag
    to 0 for RA MTUs to be ignored.

    Signed-off-by: Harout Hedeshian
    Signed-off-by: David S. Miller

    Harout Hedeshian
     

06 Nov, 2014

1 commit


30 Oct, 2014

1 commit

  • Add a sysctl that causes an interface's optimistic addresses
    to be considered equivalent to other non-deprecated addresses
    for source address selection purposes. Preferred addresses
    will still take precedence over optimistic addresses, subject
    to other ranking in the source address selection algorithm.

    This is useful where different interfaces are connected to
    different networks from different ISPs (e.g., a cell network
    and a home wifi network).

    The current behaviour complies with RFC 3484/6724, and it
    makes sense if the host has only one interface, or has
    multiple interfaces on the same network (same or cooperating
    administrative domain(s)), but not in the multiple distinct
    networks case.

    For example, if a mobile device has an IPv6 address on an LTE
    network and then connects to IPv6-enabled wifi, while the wifi
    IPv6 address is undergoing DAD, IPv6 connections will try to use
    the wifi default route with the LTE IPv6 address, and will get
    stuck until they time out.

    Also, because optimistic nodes can receive frames, issue
    an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMISTIC
    flag appropriately set). A second RTM_NEWADDR is sent if DAD
    completes (the address flags have changed), otherwise an
    RTM_DELADDR is sent.

    Also: add an entry in ip-sysctl.txt for optimistic_dad.

    Signed-off-by: Erik Kline
    Acked-by: Lorenzo Colitti
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Erik Kline
     

08 Jul, 2014

1 commit

  • Automatically generate flow labels for IPv6 packets on transmit.
    The flow label is computed based on skb_get_hash. The flow label will
    only automatically be set when it is zero otherwise (i.e. flow label
    manager hasn't set one). This supports the transmit side functionality
    of RFC 6438.

    Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
    system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
    functionality per socket.

    By default, auto flowlabels are disabled to avoid possible conflicts
    with flow label manager, however if this feature proves useful we
    may want to enable it by default.

    It should also be noted that FreeBSD has already implemented automatic
    flow labels (including the sysctl and socket option). In FreeBSD,
    automatic flow labels default to enabled.

    Performance impact:

    Running super_netperf with 200 flows for TCP_RR and UDP_RR for
    IPv6. Note that in UDP case, __skb_get_hash will be called for
    every packet, which explains the slight regression. In the TCP case
    the hash is saved in the socket so there is no regression.

    Automatic flow labels disabled:

    TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

    UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

    Automatic flow labels enabled:

    TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06

    UDP_RR
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
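
The transmit-side rule can be sketched as a small helper. This is an assumption-laden model rather than the kernel function: `make_flowlabel()` is hypothetical, the hash is taken as an opaque 32-bit value, and the label is simply its low 20 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* The IPv6 flow label field is 20 bits wide. */
#define FLOWLABEL_MASK 0x000FFFFFu

/*
 * Keep a label that is already set (for example by the flow label
 * manager). Otherwise, when automatic labels are enabled, derive one
 * from the flow hash; when disabled, leave the label at zero.
 */
static uint32_t make_flowlabel(uint32_t current_label, uint32_t flow_hash,
                               bool autolabel)
{
    if (current_label != 0 || !autolabel)
        return current_label;
    return flow_hash & FLOWLABEL_MASK;
}
```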
     

02 Jul, 2014

2 commits

  • When a UDP application switches from AF_INET to AF_INET6 sockets, we
    have a small performance degradation for IPv4 communications because of
    extra cache line misses to access ipv6only information.

    This can also be noticed for TCP listeners, as ipv6_only_sock() is also
    used from __inet_lookup_listener()->compute_score()

    This is magnified when SO_REUSEPORT is used.

    Move ipv6only into struct sock_common so that it is available at
    no extra cost in lookups.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This can be used in virtual networking applications, and
    may have other uses as well. The option is disabled by
    default.

    A specific use case is setting up virtual routers, bridges, and
    hosts on a single OS without the use of network namespaces or
    virtual machines. With proper use of ip rules, routing tables,
    veth interface pairs and/or other virtual interfaces,
    and applications that can bind to interfaces and/or IP addresses,
    it is possible to create one or more virtual routers with multiple
    hosts attached. The host interfaces can act as IPv6 systems,
    with radvd running on the ports in the virtual routers. With the
    option provided in this patch enabled, those hosts can now properly
    obtain IPv6 addresses from the radvd.

    Signed-off-by: Ben Greear
    Signed-off-by: David S. Miller

    Ben Greear
     

28 Jun, 2014

1 commit

  • Since pktopts is used for IPv6 only and opts is used for IPv4
    only, we can move these fields into a union and this allows us to drop
    the inet6_reqsk_alloc function as after this change it becomes
    equivalent with inet_reqsk_alloc.

    This patch also fixes a kmemcheck issue in the IPv6 stack: the flags
    field was not annotated after a request_sock was allocated.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

20 Jan, 2014

2 commits

  • We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
    for IPv4 datagrams on IPv6 sockets.

    This patch splits the ip6_datagram_recv_ctl into two functions, one
    which handles both protocol families, AF_INET and AF_INET6, while the
    ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

    ip6_datagram_recv_*_ctl never reported back any errors, so we can make
    them return void. Also provide a helper for protocols which don't offer dual
    personality to further use ip6_datagram_recv_ctl, which is exported to
    modules.

    I needed to shuffle the code for ping around a bit to make it easier to
    implement dual personality for ping ipv6 sockets in future.

    Reported-by: Gert Doering
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • With this option, the socket will reply with the flow label value read
    on received packets.

    The goal is to have a connection with the same flow label in both
    directions of the communication.

    Changelog of V4:
    * Do not erase the flow label on the listening socket. Use pktopts to
    store the received value

    Signed-off-by: Florent Fourcot
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot
     

19 Dec, 2013

1 commit

  • IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it
    nonetheless for symmetry with IPv4 sockets. Also drop incoming MTU
    information if this mode is enabled.

    The additional bit in ipv6_pinfo just eats into the padding behind the
    bitfield. There are no changes to the layout of the struct at all.

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

10 Dec, 2013

2 commits

  • tclass information is now already stored in rcv_flowinfo.
    We do not need to store the same information twice.

    Signed-off-by: Florent Fourcot
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot
     
  • The current implementation of IPV6_FLOWINFO only gives a
    result if pktoptions is available (thanks to the
    ip6_datagram_recv_ctl function).
    It gives inconsistent results to user space, sometimes
    there is a result for getsockopt(IPV6_FLOWINFO), sometimes
    not.

    This patch adds rcv_flowinfo to store it, and returns it to
    userspace in the same way as other pkt_options.

    Signed-off-by: Florent Fourcot
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florent Fourcot
     

06 Dec, 2013

1 commit

  • The code to detect fragments in checksum_setup() was missing for IPv4 and
    too eager for IPv6. (It transpires that Windows seems to send IPv6 packets
    with a fragment header even if they are not a fragment - i.e. offset is zero,
    and M bit is not set).

    This patch also incorporates a fix to callers of maybe_pull_tail() where
    skb->network_header was being erroneously added to the length argument.

    Signed-off-by: Paul Durrant
    Signed-off-by: Zoltan Kiss
    Cc: Wei Liu
    Cc: Ian Campbell
    Cc: David Vrabel
    cc: David Miller
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Paul Durrant
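
The fragment test described above can be sketched as follows. The helpers are hypothetical and take the fragment field already converted to host byte order; the masks follow the IPv4 header (MF flag plus 13-bit offset) and the IPv6 fragment header (13-bit offset in the upper bits, M flag in bit 0).

```c
#include <stdbool.h>
#include <stdint.h>

/* IPv4 frag_off field, host order: flags in the top 3 bits. */
#define IP4_MF     0x2000u
#define IP4_OFFSET 0x1FFFu

/* IPv6 fragment header offset/flags field, host order. */
#define IP6_MF     0x0001u
#define IP6_OFFSET 0xFFF8u

/* A packet is a fragment if it has a non-zero offset or more fragments follow. */
static bool ipv4_is_fragment(uint16_t frag_off)
{
    return (frag_off & (IP4_MF | IP4_OFFSET)) != 0;
}

/*
 * The mere presence of an IPv6 fragment header is not enough: with
 * offset zero and the M bit clear the packet is complete.
 */
static bool ipv6_is_fragment(uint16_t frag_off)
{
    return (frag_off & (IP6_MF | IP6_OFFSET)) != 0;
}
```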
     

29 Oct, 2013

1 commit


10 Oct, 2013

1 commit

  • TCP listener refactoring, part 5 :

    We want to be able to insert request sockets (SYN_RECV) into main
    ehash table instead of the per listener hash table to allow RCU
    lookups and remove listener lock contention.

    This patch includes the needed struct sock_common in front
    of struct request_sock

    This means there is no more inet6_request_sock IPv6 specific
    structure.

    Following inet_request_sock fields were renamed as they became
    macros to reference fields from struct sock_common.
    Prefix ir_ was chosen to avoid name collisions.

    loc_port -> ir_loc_port
    loc_addr -> ir_loc_addr
    rmt_addr -> ir_rmt_addr
    rmt_port -> ir_rmt_port
    iif -> ir_iif

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Oct, 2013

1 commit

  • TCP listener refactoring, part 4 :

    To speed up inet lookups, we moved IPv4 addresses from inet to struct
    sock_common

    Now it is time to do the same for IPv6, because it permits us to have fast
    lookups for all kinds of sockets, including upcoming SYN_RECV.

    Getting IPv6 addresses in TCP lookups currently requires two extra cache
    lines, plus a dereference (and memory stall).

    inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

    This patch is way bigger than its IPv4 counterpart, because for IPv4,
    we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
    it's not doable easily.

    inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
    inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

    And timewait sockets also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
    at the same offset.

    We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
    macro.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Oct, 2013

1 commit

  • TCP listener refactoring, part 2 :

    We can use a generic lookup, sockets being in whatever state, if
    we are sure all relevant fields are at the same place in all socket
    types (ESTABLISH, TIME_WAIT, SYN_RECV)

    This patch removes these macros :

    inet_addrpair, inet_portpair, tw_addrpair, tw_portpair

    And adds :

    sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr

    Then, INET_TW_MATCH() is really the same as INET_MATCH()

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
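
The idea behind the addrpair/portpair fields can be sketched as below: when every socket type keeps the same packed fields at the same place, one comparison serves all states. The struct and helpers are hypothetical, and the packing order shown is an arbitrary choice for illustration (the kernel's layout depends on endianness).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fields shared by all socket types. */
struct sock_common_model {
    uint64_t addrpair;   /* remote and local IPv4 address packed together */
    uint32_t portpair;   /* remote and local port packed together */
};

static uint64_t make_addrpair(uint32_t daddr, uint32_t rcv_saddr)
{
    return ((uint64_t)daddr << 32) | rcv_saddr;
}

static uint32_t make_portpair(uint16_t dport, uint16_t num)
{
    return ((uint32_t)dport << 16) | num;
}

/* One match routine, regardless of the socket's state. */
static bool sk_match(const struct sock_common_model *sk,
                     uint64_t addrpair, uint32_t portpair)
{
    return sk->addrpair == addrpair && sk->portpair == portpair;
}
```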
     

30 Aug, 2013

1 commit


27 Aug, 2013

1 commit


20 Aug, 2013

1 commit

  • It is not allowed for an ipv6 packet to contain multiple fragmentation
    headers. So discard packets which were already reassembled by
    fragmentation logic and send back a parameter problem icmp.

    The updates for RFC 6980 will come in later, I have to do a bit more
    research here.

    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

14 Aug, 2013

1 commit

  • Commit cab70040dfd95ee32144f02fade64f0cb94f31a0 ("net: igmp:
    Reduce Unsolicited report interval to 1s when using IGMPv3") and
    2690048c01f32bf45d1c1e1ab3079bc10ad2aea7 ("net: igmp: Allow user-space
    configuration of igmp unsolicited report interval") by William Manley made
    igmp unsolicited report intervals configurable per interface and corrected
    the interval for resending unsolicited igmpv3 report messages to 1s.

    Same needs to be done for IPv6:

    MLDv1 (RFC2710 7.10.): 10 seconds
    MLDv2 (RFC3810 9.11.): 1 second

    Both intervals are configurable via new procfs knobs
    mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.

    (also added .force_mld_version to ipv6_devconf_dflt to bring structs in
    line without semantic changes)

    v2:
    a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
    unsolicited_report_interval procfs knobs.
    b) incorporate stylistic feedback from William Manley

    v3:
    a) add new DEVCONF_* values to the end of the enum (thanks to David
    Miller)

    Cc: Cong Wang
    Cc: William Manley
    Cc: Benjamin LaHaise
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
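
Choosing the interval by protocol version can be sketched as below. `struct mld_conf` and `unsolicited_report_interval_ms()` are hypothetical; the field names only mirror the two knobs named above, and the defaults in the usage note are the RFC values quoted in the message.

```c
/* Hypothetical per-interface configuration, intervals in milliseconds. */
struct mld_conf {
    unsigned int mldv1_unsolicited_report_interval;
    unsigned int mldv2_unsolicited_report_interval;
};

/* Pick the interval matching the MLD version in use on the interface. */
static unsigned int unsolicited_report_interval_ms(const struct mld_conf *c,
                                                   int mld_version)
{
    if (mld_version == 1)
        return c->mldv1_unsolicited_report_interval;
    return c->mldv2_unsolicited_report_interval;
}
```

Initializing the two fields to 10000 and 1000 gives the 10 second and 1 second defaults listed above.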
     

31 Jan, 2013

1 commit


14 Jan, 2013

1 commit