21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Mar, 2019

1 commit

  • According to Amit Klein and Benny Pinkas, IP ID generation is too weak
    and might be used by attackers.

    Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
    having 64bit key and Jenkins hash is risky.

    It is time to switch to siphash and its 128bit keys.

    Signed-off-by: Eric Dumazet
    Reported-by: Amit Klein
    Reported-by: Benny Pinkas
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Nov, 2017

1 commit

  • Tuntap and similar devices can inject GSO packets. Accept type
    VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

    Processes are expected to use feature negotiation such as TUNSETOFFLOAD
    to detect supported offload types and refrain from injecting other
    packets. This process breaks down with live migration: guest kernels
    do not renegotiate flags, so destination hosts need to expose all
    features that the source host does.

    Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
    This patch introduces nearly(*) no new code to simplify verification.
    It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
    insertion and software UFO segmentation.

    It does not reinstate protocol stack support, hardware offload
    (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
    of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

    To support SKB_GSO_UDP reappearing in the stack, also reinstate
    logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
    by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
    CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
    ("net: avoid skb_warn_bad_offload false positives on UFO").

    (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
    ipv6_proxy_select_ident is changed to return a __be32 and this is
    assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
    at the end of the enum to minimize code churn.

    Tested
    Booted a v4.13 guest kernel with QEMU. On a host kernel before this
    patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
    enabled, same as on a v4.13 host kernel.

    A UFO packet sent from the guest appears on the tap device:
    host:
    nc -l -u -p 8000 &
    tcpdump -n -i tap0

    guest:
    dd if=/dev/zero of=payload.txt bs=1 count=2000
    nc -u 192.16.1.1 8000 < payload.txt

    Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
    packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

    Changes
    v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

    Link: http://lkml.kernel.org/r/
    Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
    Reported-by: Michal Kubecek
    Signed-off-by: Willem de Bruijn
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

11 Nov, 2017

1 commit


23 Aug, 2017

1 commit

  • A packet length of exactly IPV6_MAXPLEN is allowed, we should
    refuse parsing options only if the size is 64KiB or more.

    While at it, remove one extra variable and one assignment which
    were also introduced by the commit that introduced the size
    check. Checking the sum 'offset + len' and only later adding
    'len' to 'offset' doesn't provide any advantage over directly
    summing to 'offset' and checking it.

    Fixes: 6399f1fae4ec ("ipv6: avoid overflow of offset in ip6_find_1stfragopt")
    Signed-off-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Stefano Brivio
     

20 Jul, 2017

1 commit

  • In some cases, offset can overflow and can cause an infinite loop in
    ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
    cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.
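
    The wrap-around can be reproduced in user space. The sketch below is
    illustrative only: it assumes the old offset was 16 bits wide, and the
    helper names sum_u16() and sum_capped() are hypothetical, not kernel
    functions. 2048 bytes is the largest length a single extension header
    can declare ((255 + 1) * 8).

```c
#include <assert.h>
#include <stdint.h>

#define IPV6_MAXPLEN 65535u

/* Sum `count` header lengths into a 16-bit offset, as a narrow counter
 * would. The result silently wraps modulo 65536. */
static uint16_t sum_u16(unsigned int count, unsigned int hdr_len)
{
	uint16_t offset = 40; /* size of the fixed IPv6 header */

	assert(hdr_len >= 8 && hdr_len <= 2048);
	for (unsigned int i = 0; i < count; i++)
		offset = (uint16_t)(offset + hdr_len);
	return offset;
}

/* Same sum with an unsigned int offset and a cap: returns -1 as soon as
 * the offset exceeds IPV6_MAXPLEN, otherwise the final offset. */
static long sum_capped(unsigned int count, unsigned int hdr_len)
{
	unsigned int offset = 40;

	assert(hdr_len >= 8 && hdr_len <= 2048);
	for (unsigned int i = 0; i < count; i++) {
		offset += hdr_len;
		if (offset > IPV6_MAXPLEN)
			return -1;
	}
	return (long)offset;
}
```

    With 32 headers of 2048 bytes each, sum_u16() wraps back to 40, so a
    loop that only stops once the offset passes the packet length could
    keep going; sum_capped() returns -1 instead.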

    This problem has been here since before the beginning of git history.

    Signed-off-by: Sabrina Dubroca
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

18 May, 2017

1 commit

  • The KASAN warning reported below was discovered with a syzkaller
    program. The reproducer is basically:
    int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
    send(s, &one_byte_of_data, 1, MSG_MORE);
    send(s, &more_than_mtu_bytes_data, 2000, 0);

    The socket() call sets the nexthdr field of the v6 header to
    NEXTHDR_HOP, the first send call primes the payload with a non zero
    byte of data, and the second send call triggers the fragmentation path.

    The fragmentation code tries to parse the header options in order
    to figure out where to insert the fragment option. Since nexthdr points
    to an invalid option, the calculation of the size of the network header
    can be made to be much larger than the linear section of the skb and data
    is read outside of it.

    This fix makes ip6_find_1stfragopt return an error if it detects
    running out-of-bounds.
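
    A user-space sketch of such a bounded walk is shown below. It is only
    loosely modeled on the description above: find_frag_insert_offset() is
    a hypothetical name, the packet is a flat byte buffer rather than an
    skb, and the handling of destination options is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_HOP      0
#define NEXTHDR_ROUTING 43
#define NEXTHDR_DEST    60
#define IPV6_HDR_LEN    40
#define IPV6_MAXPLEN 65535u

/* Return the offset at which a fragment header would be inserted, or -1
 * if the chain of extension headers runs past the end of the buffer. */
static long find_frag_insert_offset(const uint8_t *pkt, size_t len)
{
	size_t offset = IPV6_HDR_LEN;
	uint8_t nexthdr;
	int found_rhdr = 0;

	assert(pkt != NULL);
	if (len < IPV6_HDR_LEN)
		return -1;
	nexthdr = pkt[6]; /* Next Header field of the fixed header */

	while (offset <= len) {
		switch (nexthdr) {
		case NEXTHDR_HOP:
			break;
		case NEXTHDR_ROUTING:
			found_rhdr = 1;
			break;
		case NEXTHDR_DEST:
			if (found_rhdr)
				return (long)offset;
			break;
		default:
			return (long)offset;
		}
		/* The two-byte option header must lie inside the buffer. */
		if (offset + 2 > len)
			return -1;
		nexthdr = pkt[offset];
		/* Length field counts 8-byte units beyond the first 8 bytes. */
		offset += ((size_t)pkt[offset + 1] + 1) * 8;
		if (offset > IPV6_MAXPLEN)
			return -1;
	}
	return -1;
}
```

    A 41-byte buffer whose fixed header claims NEXTHDR_HOP but carries a
    single payload byte, as in the reproducer, is rejected because the
    option header would extend past the buffer.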

    [ 42.361487] ==================================================================
    [ 42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
    [ 42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
    [ 42.366469]
    [ 42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
    [ 42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
    [ 42.368824] Call Trace:
    [ 42.369183] dump_stack+0xb3/0x10b
    [ 42.369664] print_address_description+0x73/0x290
    [ 42.370325] kasan_report+0x252/0x370
    [ 42.370839] ? ip6_fragment+0x11c8/0x3730
    [ 42.371396] check_memory_region+0x13c/0x1a0
    [ 42.371978] memcpy+0x23/0x50
    [ 42.372395] ip6_fragment+0x11c8/0x3730
    [ 42.372920] ? nf_ct_expect_unregister_notifier+0x110/0x110
    [ 42.373681] ? ip6_copy_metadata+0x7f0/0x7f0
    [ 42.374263] ? ip6_forward+0x2e30/0x2e30
    [ 42.374803] ip6_finish_output+0x584/0x990
    [ 42.375350] ip6_output+0x1b7/0x690
    [ 42.375836] ? ip6_finish_output+0x990/0x990
    [ 42.376411] ? ip6_fragment+0x3730/0x3730
    [ 42.376968] ip6_local_out+0x95/0x160
    [ 42.377471] ip6_send_skb+0xa1/0x330
    [ 42.377969] ip6_push_pending_frames+0xb3/0xe0
    [ 42.378589] rawv6_sendmsg+0x2051/0x2db0
    [ 42.379129] ? rawv6_bind+0x8b0/0x8b0
    [ 42.379633] ? _copy_from_user+0x84/0xe0
    [ 42.380193] ? debug_check_no_locks_freed+0x290/0x290
    [ 42.380878] ? ___sys_sendmsg+0x162/0x930
    [ 42.381427] ? rcu_read_lock_sched_held+0xa3/0x120
    [ 42.382074] ? sock_has_perm+0x1f6/0x290
    [ 42.382614] ? ___sys_sendmsg+0x167/0x930
    [ 42.383173] ? lock_downgrade+0x660/0x660
    [ 42.383727] inet_sendmsg+0x123/0x500
    [ 42.384226] ? inet_sendmsg+0x123/0x500
    [ 42.384748] ? inet_recvmsg+0x540/0x540
    [ 42.385263] sock_sendmsg+0xca/0x110
    [ 42.385758] SYSC_sendto+0x217/0x380
    [ 42.386249] ? SYSC_connect+0x310/0x310
    [ 42.386783] ? __might_fault+0x110/0x1d0
    [ 42.387324] ? lock_downgrade+0x660/0x660
    [ 42.387880] ? __fget_light+0xa1/0x1f0
    [ 42.388403] ? __fdget+0x18/0x20
    [ 42.388851] ? sock_common_setsockopt+0x95/0xd0
    [ 42.389472] ? SyS_setsockopt+0x17f/0x260
    [ 42.390021] ? entry_SYSCALL_64_fastpath+0x5/0xbe
    [ 42.390650] SyS_sendto+0x40/0x50
    [ 42.391103] entry_SYSCALL_64_fastpath+0x1f/0xbe
    [ 42.391731] RIP: 0033:0x7fbbb711e383
    [ 42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    [ 42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
    [ 42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
    [ 42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
    [ 42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
    [ 42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
    [ 42.397257]
    [ 42.397411] Allocated by task 3789:
    [ 42.397702] save_stack_trace+0x16/0x20
    [ 42.398005] save_stack+0x46/0xd0
    [ 42.398267] kasan_kmalloc+0xad/0xe0
    [ 42.398548] kasan_slab_alloc+0x12/0x20
    [ 42.398848] __kmalloc_node_track_caller+0xcb/0x380
    [ 42.399224] __kmalloc_reserve.isra.32+0x41/0xe0
    [ 42.399654] __alloc_skb+0xf8/0x580
    [ 42.400003] sock_wmalloc+0xab/0xf0
    [ 42.400346] __ip6_append_data.isra.41+0x2472/0x33d0
    [ 42.400813] ip6_append_data+0x1a8/0x2f0
    [ 42.401122] rawv6_sendmsg+0x11ee/0x2db0
    [ 42.401505] inet_sendmsg+0x123/0x500
    [ 42.401860] sock_sendmsg+0xca/0x110
    [ 42.402209] ___sys_sendmsg+0x7cb/0x930
    [ 42.402582] __sys_sendmsg+0xd9/0x190
    [ 42.402941] SyS_sendmsg+0x2d/0x50
    [ 42.403273] entry_SYSCALL_64_fastpath+0x1f/0xbe
    [ 42.403718]
    [ 42.403871] Freed by task 1794:
    [ 42.404146] save_stack_trace+0x16/0x20
    [ 42.404515] save_stack+0x46/0xd0
    [ 42.404827] kasan_slab_free+0x72/0xc0
    [ 42.405167] kfree+0xe8/0x2b0
    [ 42.405462] skb_free_head+0x74/0xb0
    [ 42.405806] skb_release_data+0x30e/0x3a0
    [ 42.406198] skb_release_all+0x4a/0x60
    [ 42.406563] consume_skb+0x113/0x2e0
    [ 42.406910] skb_free_datagram+0x1a/0xe0
    [ 42.407288] netlink_recvmsg+0x60d/0xe40
    [ 42.407667] sock_recvmsg+0xd7/0x110
    [ 42.408022] ___sys_recvmsg+0x25c/0x580
    [ 42.408395] __sys_recvmsg+0xd6/0x190
    [ 42.408753] SyS_recvmsg+0x2d/0x50
    [ 42.409086] entry_SYSCALL_64_fastpath+0x1f/0xbe
    [ 42.409513]
    [ 42.409665] The buggy address belongs to the object at ffff88000969e780
    [ 42.409665] which belongs to the cache kmalloc-512 of size 512
    [ 42.410846] The buggy address is located 24 bytes inside of
    [ 42.410846] 512-byte region [ffff88000969e780, ffff88000969e980)
    [ 42.411941] The buggy address belongs to the page:
    [ 42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
    [ 42.413298] flags: 0x100000000008100(slab|head)
    [ 42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
    [ 42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
    [ 42.415074] page dumped because: kasan: bad access detected
    [ 42.415604]
    [ 42.415757] Memory state around the buggy address:
    [ 42.416222] ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [ 42.416904] ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [ 42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 42.418273] ^
    [ 42.418588] ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 42.419273] ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 42.419882] ==================================================================

    Reported-by: Andrey Konovalov
    Signed-off-by: Craig Gallek
    Signed-off-by: David S. Miller

    Craig Gallek
     

03 Dec, 2016

1 commit

  • When xfrm is applied to TSO/GSO packets, it follows this path:

    xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()

    where skb_gso_segment() relies on skb->protocol to function properly.

    This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
    fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
    when xfrm is involved.

    Cc: stable@vger.kernel.org
    Signed-off-by: Eli Cooper
    Signed-off-by: David S. Miller

    Eli Cooper
     

11 Sep, 2016

1 commit

  • This patch adds the infrastructure to the output path to pass an skb
    to an l3mdev device if it has a hook registered. This is the Tx parallel
    to l3mdev_ip{6}_rcv in the receive path and is the basis for removing
    the existing hook that returns the vrf dst on the fib lookup.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

08 Oct, 2015

6 commits


18 Sep, 2015

3 commits

  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation functions
    after netfilter has done its job, passing in the desired network namespace is in
    many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks the path a packet is taking through
    the network stack is well known which allows the network namespace to
    be determined easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Add a sock parameter to dst_output making dst_output_sk superfluous.
    Add a skb->sk parameter to all of the callers of dst_output.
    Have the callers of dst_output_sk call dst_output.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

19 Jun, 2015

1 commit

  • This pulls the full hook netfilter definitions from all those that include
    net_namespace.h.

    Instead let's just include the bare minimum required in the new
    linux/netfilter_defs.h file, and use it from the netfilter netns header files.

    I also needed to include in.h and in6.h from linux/netfilter.h otherwise we hit
    this compilation error:

    In file included from include/linux/netfilter_defs.h:4:0,
    from include/net/netns/netfilter.h:4,
    from include/net/net_namespace.h:22,
    from include/linux/netdevice.h:43,
    from net/netfilter/nfnetlink_queue_core.c:23:
    include/uapi/linux/netfilter.h:76:17: error: field ‘in’ has incomplete type
    struct in_addr in;

    Also explicitly include linux/netfilter.h in several spots.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Eric W. Biederman

    Pablo Neira Ayuso
     

26 May, 2015

3 commits

  • ipv6_select_ident() returns a 32bit value in network order.
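
    For readers unfamiliar with the byte-order distinction, here is a
    minimal user-space sketch. put_frag_id() and get_frag_id() are
    hypothetical helpers, and the POSIX htonl()/ntohl() functions stand in
    for the kernel's own conversion helpers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a host-order fragment id into a 4-byte wire field in network
 * (big-endian) order; htonl() converts where the host order differs. */
static void put_frag_id(uint8_t wire[4], uint32_t host_id)
{
	uint32_t be = htonl(host_id);

	assert(wire != NULL);
	memcpy(wire, &be, sizeof(be));
}

/* Read the wire field back into host order. */
static uint32_t get_frag_id(const uint8_t wire[4])
{
	uint32_t be;

	assert(wire != NULL);
	memcpy(&be, wire, sizeof(be));
	return ntohl(be);
}
```

    On a little-endian host, assigning the host-order value directly would
    put the bytes on the wire in reverse; the conversion keeps the most
    significant byte first on every host.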

    Fixes: 286c2349f666 ("ipv6: Clean up ipv6_select_ident() and ip6_fragment()")
    Signed-off-by: Eric Dumazet
    Reported-by: kbuild test robot
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This patch removes the assumptions that the returned rt is always
    a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the
    destination and source address. The dst and src can be recovered from
    the calling site.

    We may consider renaming (rt6i_dst, rt6i_src) to
    (rt6i_key_dst, rt6i_key_src) later.

    Signed-off-by: Martin KaFai Lau
    Reviewed-by: Hannes Frederic Sowa
    Cc: Steffen Klassert
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • This patch changes the ipv6_select_ident() signature to return a
    fragment id instead of taking a whole frag_hdr as a param to
    only set the frag_hdr->identification.

    It also cleans up ip6_fragment() to obtain the fragment id at the
    beginning instead of using multiple "if" checks later to test whether
    the fragment id has been generated.

    Signed-off-by: Martin KaFai Lau
    Cc: Hannes Frederic Sowa
    Cc: Steffen Klassert
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

08 Apr, 2015

2 commits

  • That way we can make sure the output path of ipv4/ipv6 operates on
    the UDP socket rather than whatever random thing happens to be in
    skb->sk.

    Based upon a patch by Jiri Pirko.

    Signed-off-by: David S. Miller
    Acked-by: Hannes Frederic Sowa

    David Miller
     
  • On the output paths in particular, we have to sometimes deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    And second, is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

26 Mar, 2015

1 commit


10 Feb, 2015

2 commits

  • Make __ipv6_select_ident() static as it isn't used outside
    the file.

    Fixes: 0508c07f5e0c9 (ipv6: Select fragment id during UFO segmentation if not set.)
    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Recent commit:
    0508c07f5e0c94f38afd5434e8b2a55b84553077
    Author: Vlad Yasevich
    Date: Tue Feb 3 16:36:15 2015 -0500

    ipv6: Select fragment id during UFO segmentation if not set.

    Introduced a bug on LE in how ipv6 fragment id is assigned.
    This was caught by the nightly sparse check:

    Resolve the following sparse error:
    net/ipv6/output_core.c:57:38: sparse: incorrect type in assignment
    (different base types)
    net/ipv6/output_core.c:57:38: expected restricted __be32
    [usertype] ip6_frag_id
    net/ipv6/output_core.c:57:38: got unsigned int [unsigned]
    [assigned] [usertype] id

    Fixes: 0508c07f5e0c9 (ipv6: Select fragment id during UFO segmentation if not set.)
    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

04 Feb, 2015

1 commit

  • If the IPv6 fragment id has not been set and we perform
    fragmentation due to UFO, select a new fragment id.
    We now consider a fragment id of 0 as unset and if id selection
    process returns 0 (after all the perturbations), we set it to
    0x80000000, thus giving us ample space not to create collisions
    with the next packet we may have to fragment.
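
    The "0 means unset" convention can be sketched in a few lines. The
    helper names fixup_frag_id() and select_frag_id() are hypothetical;
    only the 0 and 0x80000000 values come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* 0 is reserved to mean "no id chosen yet", so a generated 0 is
 * remapped to 0x80000000. */
static uint32_t fixup_frag_id(uint32_t generated)
{
	uint32_t id = generated ? generated : 0x80000000u;

	assert(id != 0); /* postcondition: never hand out the "unset" value */
	return id;
}

/* Keep an id that is already set; otherwise use the generated one. */
static uint32_t select_frag_id(uint32_t current, uint32_t generated)
{
	return current ? current : fixup_frag_id(generated);
}
```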

    When doing UFO integrity checking, we also select the
    fragment id if it has not been set yet. This is stored into
    the skb_shinfo() thus allowing UFO to function correctly.

    This patch also removes duplicate fragment id generation code
    and moves ipv6_select_ident() into the header as it may be
    used during GSO.

    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

31 Oct, 2014

1 commit


25 Aug, 2014

1 commit

  • This patch makes no changes to the logic of the code but simply addresses
    coding style issues as detected by checkpatch.

    Both objdump and diff -w show no differences.

    A number of items are addressed in this patch:
    * Multiple spaces converted to tabs
    * Spaces before tabs removed.
    * Spaces in pointer typing cleansed (char *)foo etc.
    * Remove space after sizeof
    * Ensure spacing around comparators such as if statements.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

12 Jun, 2014

1 commit


11 Jun, 2014

1 commit

  • Bug report on https://bugzilla.kernel.org/show_bug.cgi?id=75781

    When a locally generated IPsec packet matches a mangle table rule
    and has a mark value set, the packet is routed again in
    route_me_harder -> _session_decoder6.

    In this case, the nhoff in the CB of the skb is still the default
    value 0, so the protocol match fails, the packet does not match the
    correct SA rule, and it is sent out in plaintext.

    To fix the issue, CB->nhoff must be set.

    Signed-off-by: Hui Zhang
    Signed-off-by: David S. Miller

    huizhang
     

04 Jun, 2014

1 commit


03 Jun, 2014

2 commits

  • I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
    is disabled.
    Note how GSO/TSO packets do not have monotonically incrementing ID.

    06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
    06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
    06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
    06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
    06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)

    It appears I introduced this bug in linux-3.1.

    inet_getid() must return the old value of peer->ip_id_count,
    not the new one.

    Let's revert this part, and remove the prevention of
    a null identification field in IPv6 Fragment Extension Header,
    which is dubious and not even done properly.
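
    The difference between returning the old and the new counter value can
    be sketched with C11 atomics, used here as a stand-in for the kernel's
    own atomic helpers. reserve_ids() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Reserve `segs` consecutive ids and return the FIRST one, i.e. the
 * counter value before the addition. atomic_fetch_add() returns the
 * previous value, which is exactly what is wanted here. */
static uint32_t reserve_ids(atomic_uint *counter, unsigned int segs)
{
	assert(segs > 0);
	return (uint32_t)atomic_fetch_add(counter, segs);
}
```

    Starting from a counter of 100, reserving 3 ids yields 100 and leaves
    the counter at 103, so the next caller starts right after the range
    that was just handed out. Returning the new value instead would skip
    the first id of every range.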

    Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Ideally, we would need to generate IP ID using a per destination IP
    generator.

    Linux kernels used inet_peer cache for this purpose, but this had a huge
    cost on servers disabling MTU discovery.

    1) each inet_peer struct consumes 192 bytes

    2) inetpeer cache uses a binary tree of inet_peer structs,
    with a nominal size of ~66000 elements under load.

    3) lookups in this tree are hitting a lot of cache lines, as tree depth
    is about 20.

    4) If server deals with many tcp flows, we have a high probability of
    not finding the inet_peer, allocating a fresh one, inserting it in
    the tree with same initial ip_id_count, (cf secure_ip_id())

    5) We garbage collect inet_peer aggressively.

    IP ID generation does not have to be 'perfect'.

    Goal is trying to avoid duplicates in a short period of time,
    so that reassembly units have a chance to complete reassembly of
    fragments belonging to one message before receiving other fragments
    with a recycled ID.

    We simply use an array of generators, and a Jenkins hash using the dst IP
    as a key.
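
    The idea can be sketched in user space as follows. This is not the
    kernel code: the table size, the names id_generators and next_ip_id(),
    and the choice of the Jenkins one-at-a-time variant are assumptions
    made for illustration, and the hash is unkeyed, so the ids it produces
    are predictable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_BUCKETS 2048u /* hypothetical table size, a power of two */

static uint32_t id_generators[ID_BUCKETS];

/* Jenkins one-at-a-time hash over a byte string. */
static uint32_t jenkins_oaat(const uint8_t *key, size_t len)
{
	uint32_t h = 0;

	for (size_t i = 0; i < len; i++) {
		h += key[i];
		h += h << 10;
		h ^= h >> 6;
	}
	h += h << 3;
	h ^= h >> 11;
	h += h << 15;
	return h;
}

/* Reserve `segs` ids from the generator that the destination address
 * hashes to and return the first of them. Not thread-safe; concurrent
 * callers would need atomic updates. */
static uint32_t next_ip_id(const uint8_t dst[4], uint32_t segs)
{
	uint32_t *slot = &id_generators[jenkins_oaat(dst, 4) & (ID_BUCKETS - 1)];
	uint32_t first = *slot;

	assert(segs > 0);
	*slot += segs;
	return first;
}
```

    Destinations that hash to the same slot share a generator, which is
    acceptable under the stated goal: ids only need to avoid repeating
    within a short period, not be unique per destination.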

    ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
    belongs (it is only used from this file)

    secure_ip_id() and secure_ipv6_id() no longer are needed.

    Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
    unnecessary decrement/increment of the number of segments.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Apr, 2014

1 commit

  • First off, we don't need to check for non-NULL rt any more, as we are
    guaranteed to always get a valid rt6_info. Drop the check.

    In case we couldn't allocate an inet_peer for fragmentation information
    we currently generate strictly incrementing fragmentation ids for all
    destinations. This is done to maximize the cycle and avoid collisions.

    Those fragmentation ids are very predictable. At least we should try to
    mix in the destination address.

    While it should make no difference to simply use a PRNG at this point,
    secure_ipv6_id ensures that we don't leak information from prandom
    that could make its internal state recoverable.

    This fallback function should normally not get used thus this should
    not affect performance at all. It is just meant as a safety net.

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

07 Mar, 2014

1 commit

  • Commit e688a604807647 ("net: introduce DST_NOPEER dst flag") introduced
    DST_NOPEER because of crashes in ipv6_select_ident called from
    udp6_ufo_fragment.

    Since commit 916e4cf46d0204 ("ipv6: reuse ip6_frag_id from
    ip6_ufo_append_data") we don't call ipv6_select_ident any more from
    ip6_ufo_append_data, thus this flag lost its purpose and can be removed.

    Cc: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

01 Sep, 2013

2 commits


29 May, 2013

1 commit

  • This corrects a regression introduced by "net: Use 16bits for *_headers
    fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In
    that case skb->tail will be a pointer whereas skb->transport_header
    will be an offset from head. This is corrected by using wrappers that
    ensure that comparisons and calculations are always made using pointers.
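
    The pointer-versus-offset mismatch can be illustrated with a
    stripped-down structure. struct buf and the buf_*() helpers below are
    hypothetical stand-ins, not the real sk_buff API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down buffer: `tail` is a pointer while
 * `transport_header` is a 16-bit offset from `head`. */
struct buf {
	uint8_t *head;
	uint8_t *tail;
	uint16_t transport_header;
};

/* Wrappers turn both fields into pointers before they are compared. */
static uint8_t *buf_transport_header(const struct buf *b)
{
	return b->head + b->transport_header;
}

static uint8_t *buf_tail_pointer(const struct buf *b)
{
	return b->tail;
}

/* Bytes between the transport header and the tail, computed from
 * pointers only, never by mixing a pointer with a raw offset. */
static long buf_transport_len(const struct buf *b)
{
	assert(buf_tail_pointer(b) >= buf_transport_header(b));
	return (long)(buf_tail_pointer(b) - buf_transport_header(b));
}
```

    Because callers only ever see pointers, the same calling code stays
    correct whichever representation the structure uses internally.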

    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman