13 Nov, 2020

2 commits

  • udp{4,6}_lib_lookup_skb() use ip{,v6}_hdr() to get IP header of the
    packet. While it's probably OK for non-frag0 paths, these helpers
    will also point to junk on Fast/frag0 GRO when all headers are
    located in frags. As a result, sk/skb lookup may fail or give wrong
    results. To support both GRO modes, skb_gro_network_header() might
    be used. To not modify original functions, add private versions of
    udp{4,6}_lib_lookup_skb() only to perform correct sk lookups on GRO.

    Present since the introduction of "application-level" UDP GRO
    in 4.7-rc1.

    Misc: replace totally unneeded ternaries with plain ifs.

    Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
    Suggested-by: Willem de Bruijn
    Cc: Eric Dumazet
    Signed-off-by: Alexander Lobakin
    Acked-by: Willem de Bruijn
    Signed-off-by: Jakub Kicinski

    Alexander Lobakin
     
  • UDP GRO uses udp_hdr(skb) in its .gro_receive() callback. While it's
    probably OK for non-frag0 paths (when all headers or even the entire
    frame are already in skb head), this inline points to junk when
    using Fast GRO (napi_gro_frags() or napi_gro_receive() with only
    Ethernet header in skb head and all the rest in the frags) and breaks
    GRO packet compilation and the packet flow itself.
    To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
    are typically used. UDP even has an inline helper that makes use of
    them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
    to get rid of the out-of-order deliveries.

    Present since the introduction of plain UDP GRO in 5.0-rc1.

    Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
    Cc: Eric Dumazet
    Signed-off-by: Alexander Lobakin
    Acked-by: Willem de Bruijn
    Signed-off-by: Jakub Kicinski

    Alexander Lobakin
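
    The fast/slow header access that both commits above rely on can be
    modeled in userspace roughly as follows. This is a minimal sketch,
    not kernel code: struct toy_skb, toy_gro_header() and toy_udp_sport()
    are hypothetical names, and the real helpers (skb_gro_header_fast(),
    skb_gro_header_slow(), udp_gro_udphdr()) also pull data into the
    linear area, which is not modeled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an skb during GRO; NOT the kernel's struct sk_buff. */
struct toy_skb {
    const uint8_t *head;   /* linear area (skb head) */
    size_t head_len;       /* bytes available in the linear area */
    const uint8_t *frag0;  /* first page fragment, or NULL when unused */
    size_t frag0_len;      /* bytes available in frag0 */
};

/* Prefer frag0 when it covers the requested range ("fast" path),
 * otherwise fall back to the linear area ("slow" path, no pull modeled). */
static const uint8_t *toy_gro_header(const struct toy_skb *skb,
                                     size_t off, size_t len)
{
    if (skb->frag0 && off + len <= skb->frag0_len)
        return skb->frag0 + off;
    if (off + len <= skb->head_len)
        return skb->head + off;
    return NULL; /* header not available */
}

/* Read the big-endian UDP source port at transport offset `off`.
 * Returns -1 when the 8-byte UDP header cannot be reached. */
static int toy_udp_sport(const struct toy_skb *skb, size_t off)
{
    const uint8_t *uh = toy_gro_header(skb, off, 8);

    return uh ? (uh[0] << 8) | uh[1] : -1;
}
```

    A lookup that always reads from the linear area would miss the
    header whenever it lives in frag0, which is the failure mode the
    commit messages describe.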
     

31 Mar, 2020

1 commit

  • Without NAPI_GRO_CB(skb)->is_flist initialized, when the dev doesn't
    support NETIF_F_GRO_FRAGLIST, is_flist can still be set and fraglist
    will be used in udp_gro_receive().

    So fix it by initializing is_flist with 0 in udp_gro_receive.

    Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
    Signed-off-by: Xin Long
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Xin Long
     

27 Jan, 2020

1 commit

  • This patch extends UDP GRO to support fraglist GRO/GSO
    by using the previously introduced infrastructure.
    If the feature is enabled, all UDP packets are going to
    fraglist GRO (local input and forward).

    After validating the csum, we mark ip_summed as
    CHECKSUM_UNNECESSARY for fraglist GRO packets to
    make sure that the csum is not touched.

    Signed-off-by: Steffen Klassert
    Reviewed-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Steffen Klassert
     

04 Jan, 2020

1 commit


22 Jun, 2019

1 commit


19 Jun, 2019

1 commit

  • Fixes an issue where TX Timestamps are not arriving on the error queue
    when UDP_SEGMENT CMSG type is combined with CMSG type SO_TIMESTAMPING.
    This can be illustrated with an updated udpgso_bench_tx program which
    includes the '-T' option to test for this condition. It also introduces
    the '-P' option which will call poll() before reading the error queue.

    ./udpgso_bench_tx -4ucTPv -S 1472 -l2 -D 172.16.120.18
    poll timeout
    udp tx: 0 MB/s 1 calls/s 1 msg/s

    The "poll timeout" message above indicates that TX timestamp never
    arrived.

    This patch preserves tx_flags for the first UDP GSO segment. Only the
    first segment is timestamped, even though in some cases there may be
    benefit in timestamping both the first and last segment.

    Factors in deciding on first segment timestamp only:

    - Timestamping both first and last segments is not feasible. Hardware
    can only have one outstanding TS request at a time.

    - Timestamping last segment may underreport network latency of the
    previous segments. Even though the doorbell is suppressed, the ring
    producer counter has been incremented.

    - Timestamping the first segment has the upside in that it reports
    timestamps from the application's view, e.g. RTT.

    - Timestamping the first segment has the downside that it may
    underreport tx host network latency. It appears that we have to pick
    one or the other. And possibly follow-up with a config flag to choose
    behavior.

    v2: Remove tests as noted by Willem de Bruijn
    Moving tests from net to net-next

    v3: Update only relevant tx_flag bits as per
    Willem de Bruijn

    v4: Update comments and commit message as per
    Willem de Bruijn

    Fixes: ee80d1ebe5ba ("udp: add udp gso")
    Signed-off-by: Fred Klassen
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Fred Klassen
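
    The "update only relevant tx_flag bits" step noted for v3 can be
    sketched as a masked copy onto the first segment. This is an
    illustrative userspace model under stated assumptions: the TOY_TX_*
    bit values and struct toy_seg are hypothetical and do not match the
    kernel's SKBTX_* flags or struct sk_buff.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the kernel's values differ. */
#define TOY_TX_HW_TSTAMP  (1u << 0)
#define TOY_TX_SW_TSTAMP  (1u << 1)
#define TOY_TX_UNRELATED  (1u << 3)
#define TOY_TX_ANY_TSTAMP (TOY_TX_HW_TSTAMP | TOY_TX_SW_TSTAMP)

/* Toy stand-in for one segment produced by GSO. */
struct toy_seg {
    uint8_t tx_flags;
};

/* Carry over only the timestamp-request bits from the original GSO
 * packet to the first segment; unrelated bits are left alone. */
static void toy_preserve_tstamp(struct toy_seg *first, uint8_t gso_tx_flags)
{
    if (!first)
        return;
    first->tx_flags |= (uint8_t)(gso_tx_flags & TOY_TX_ANY_TSTAMP);
}
```

    Only the first segment is passed in, so later segments never request
    a timestamp, matching the first-segment-only choice argued above.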
     

08 Jun, 2019

1 commit


06 Jun, 2019

1 commit


31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

02 May, 2019

1 commit

  • syzbot was able to crash host by sending UDP packets with a 0 payload.

    TCP does not have this issue since we do not aggregate packets without
    payload.

    Since dev_gro_receive() sets gso_size based on skb_gro_len(skb)
    it seems not worth trying to cope with padded packets.

    BUG: KASAN: slab-out-of-bounds in skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
    Read of size 16 at addr ffff88808893fff0 by task syz-executor612/7889

    CPU: 0 PID: 7889 Comm: syz-executor612 Not tainted 5.1.0-rc7+ #96
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
    kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
    __asan_report_load16_noabort+0x14/0x20 mm/kasan/generic_report.c:133
    skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
    udp_gro_receive_segment net/ipv4/udp_offload.c:382 [inline]
    call_gro_receive include/linux/netdevice.h:2349 [inline]
    udp_gro_receive+0xb61/0xfd0 net/ipv4/udp_offload.c:414
    udp4_gro_receive+0x763/0xeb0 net/ipv4/udp_offload.c:478
    inet_gro_receive+0xe72/0x1110 net/ipv4/af_inet.c:1510
    dev_gro_receive+0x1cd0/0x23c0 net/core/dev.c:5581
    napi_gro_frags+0x36b/0xd10 net/core/dev.c:5843
    tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
    tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
    call_write_iter include/linux/fs.h:1866 [inline]
    do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
    do_iter_write fs/read_write.c:957 [inline]
    do_iter_write+0x184/0x610 fs/read_write.c:938
    vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
    do_writev+0x15e/0x370 fs/read_write.c:1037
    __do_sys_writev fs/read_write.c:1110 [inline]
    __se_sys_writev fs/read_write.c:1107 [inline]
    __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x441cc0
    Code: 05 48 3d 01 f0 ff ff 0f 83 9d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 3d 51 93 29 00 00 75 14 b8 14 00 00 00 0f 05 3d 01 f0 ff ff 0f 83 74 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00
    RSP: 002b:00007ffe8c716118 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
    RAX: ffffffffffffffda RBX: 00007ffe8c716150 RCX: 0000000000441cc0
    RDX: 0000000000000001 RSI: 00007ffe8c716170 RDI: 00000000000000f0
    RBP: 0000000000000000 R08: 000000000000ffff R09: 0000000000a64668
    R10: 0000000020000040 R11: 0000000000000246 R12: 000000000000c2d9
    R13: 0000000000402b50 R14: 0000000000000000 R15: 0000000000000000

    Allocated by task 5143:
    save_stack+0x45/0xd0 mm/kasan/common.c:75
    set_track mm/kasan/common.c:87 [inline]
    __kasan_kmalloc mm/kasan/common.c:497 [inline]
    __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
    kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
    slab_post_alloc_hook mm/slab.h:437 [inline]
    slab_alloc mm/slab.c:3393 [inline]
    kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
    mm_alloc+0x1d/0xd0 kernel/fork.c:1030
    bprm_mm_init fs/exec.c:363 [inline]
    __do_execve_file.isra.0+0xaa3/0x23f0 fs/exec.c:1791
    do_execveat_common fs/exec.c:1865 [inline]
    do_execve fs/exec.c:1882 [inline]
    __do_sys_execve fs/exec.c:1958 [inline]
    __se_sys_execve fs/exec.c:1953 [inline]
    __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 5351:
    save_stack+0x45/0xd0 mm/kasan/common.c:75
    set_track mm/kasan/common.c:87 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
    kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
    __cache_free mm/slab.c:3499 [inline]
    kmem_cache_free+0x86/0x260 mm/slab.c:3765
    __mmdrop+0x238/0x320 kernel/fork.c:677
    mmdrop include/linux/sched/mm.h:49 [inline]
    finish_task_switch+0x47b/0x780 kernel/sched/core.c:2746
    context_switch kernel/sched/core.c:2880 [inline]
    __schedule+0x81b/0x1cc0 kernel/sched/core.c:3518
    preempt_schedule_irq+0xb5/0x140 kernel/sched/core.c:3745
    retint_kernel+0x1b/0x2d
    arch_local_irq_restore arch/x86/include/asm/paravirt.h:767 [inline]
    kmem_cache_free+0xab/0x260 mm/slab.c:3766
    anon_vma_chain_free mm/rmap.c:134 [inline]
    unlink_anon_vmas+0x2ba/0x870 mm/rmap.c:401
    free_pgtables+0x1af/0x2f0 mm/memory.c:394
    exit_mmap+0x2d1/0x530 mm/mmap.c:3144
    __mmput kernel/fork.c:1046 [inline]
    mmput+0x15f/0x4c0 kernel/fork.c:1067
    exec_mmap fs/exec.c:1046 [inline]
    flush_old_exec+0x8d9/0x1c20 fs/exec.c:1279
    load_elf_binary+0x9bc/0x53f0 fs/binfmt_elf.c:864
    search_binary_handler fs/exec.c:1656 [inline]
    search_binary_handler+0x17f/0x570 fs/exec.c:1634
    exec_binprm fs/exec.c:1698 [inline]
    __do_execve_file.isra.0+0x1394/0x23f0 fs/exec.c:1818
    do_execveat_common fs/exec.c:1865 [inline]
    do_execve fs/exec.c:1882 [inline]
    __do_sys_execve fs/exec.c:1958 [inline]
    __se_sys_execve fs/exec.c:1953 [inline]
    __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff88808893f7c0
    which belongs to the cache mm_struct of size 1496
    The buggy address is located 600 bytes to the right of
    1496-byte region [ffff88808893f7c0, ffff88808893fd98)
    The buggy address belongs to the page:
    page:ffffea0002224f80 count:1 mapcount:0 mapping:ffff88821bc40ac0 index:0xffff88808893f7c0 compound_mapcount: 0
    flags: 0x1fffc0000010200(slab|head)
    raw: 01fffc0000010200 ffffea00025b4f08 ffffea00027b9d08 ffff88821bc40ac0
    raw: ffff88808893f7c0 ffff88808893e440 0000000100000001 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff88808893fe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff88808893ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    >ffff88808893ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ^
    ffff888088940000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff888088940080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
    Signed-off-by: Eric Dumazet
    Cc: Paolo Abeni
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
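
    The length sanity check implied by this message (no zero-payload
    packets, no attempt to cope with padding) can be sketched as below.
    The function name and parameters are hypothetical; in the kernel the
    two lengths would come from the UDP header and from skb_gro_len().

```c
#include <stdbool.h>

#define TOY_UDP_HLEN 8u  /* size of a UDP header in bytes */

/* udp_len: length field from the UDP header (header plus payload).
 * gro_len: bytes GRO actually sees from the UDP header onward.
 * Returns true only when there is payload and no padding. */
static bool toy_udp_gro_len_ok(unsigned int udp_len, unsigned int gro_len)
{
    if (udp_len <= TOY_UDP_HLEN)
        return false;          /* zero payload or truncated header */
    return udp_len == gro_len; /* a mismatch means padding or junk */
}
```

    A packet that fails this check would simply bypass aggregation and
    be delivered on its own.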
     

28 Apr, 2019

1 commit

  • Currently, the UDP GRO code path does bad things on some edge
    conditions: aggregation can happen even on packets with different
    lengths.

    Fix the above by rewriting the 'complete' condition for GRO
    packets. While at it, note explicitly that we allow merging the
    first packet per burst below gso_size.

    Reported-by: Sean Tong
    Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
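
    One reading of the rewritten 'complete' condition, reduced to the
    length comparison only, is sketched below. The enum and function are
    hypothetical, and the sketch deliberately ignores merge failures and
    any cap on the number of aggregated packets.

```c
enum toy_gro_action {
    TOY_MERGE_KEEP,          /* merge and keep waiting for more packets */
    TOY_MERGE_AND_COMPLETE,  /* merge, then complete the GRO packet */
    TOY_COMPLETE_ONLY        /* complete the held packet without merging */
};

/* held_len: per-packet length of the burst being aggregated.
 * new_len:  length of the packet that just arrived. */
static enum toy_gro_action toy_udp_gro_decide(unsigned int held_len,
                                              unsigned int new_len)
{
    if (new_len > held_len)
        return TOY_COMPLETE_ONLY;       /* larger packet never merges */
    if (new_len < held_len)
        return TOY_MERGE_AND_COMPLETE;  /* first short packet ends the burst */
    return TOY_MERGE_KEEP;              /* equal lengths keep aggregating */
}
```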
     

16 Dec, 2018

2 commits

  • This avoids another indirect call for UDP GRO. Again, the test
    for the IPv6 variant is performed first.

    v1 -> v2:
    - adapted to INDIRECT_CALL_ changes

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • This avoids an indirect call in the receive path for TCP and UDP
    packets. TCP takes precedence over UDP, so that we have a single
    additional conditional in the common case.

    When IPV6 is built as a module, all gro symbols except UDPv6 are
    builtin, while the latter belongs to the ipv6 module, so we
    need some special care.

    v1 -> v2:
    - adapted to INDIRECT_CALL_ changes
    v2 -> v3:
    - fix build issue with CONFIG_IPV6=m

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
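
    The idea behind avoiding an indirect call is to compare the function
    pointer against the likely targets and call them directly on a match.
    The sketch below is a userspace imitation: the TOY_INDIRECT_CALL_*
    macros and the toy_*_rx handlers are hypothetical and only mirror the
    shape of the kernel's INDIRECT_CALL_ helpers.

```c
typedef int (*toy_rx_fn)(int len);

/* Stand-in handlers; the return values just identify who ran. */
static int toy_tcp_rx(int len)   { return len + 1; }
static int toy_udp_rx(int len)   { return len + 2; }
static int toy_other_rx(int len) { return len + 3; }

/* Call f1 directly when f points at it, else fall back to the pointer. */
#define TOY_INDIRECT_CALL_1(f, f1, ...) \
    ((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))

/* Test f2 first, then f1, then fall back to the indirect call. */
#define TOY_INDIRECT_CALL_2(f, f2, f1, ...) \
    ((f) == (f2) ? f2(__VA_ARGS__) : TOY_INDIRECT_CALL_1(f, f1, __VA_ARGS__))

/* TCP is checked before UDP, so the common case costs one comparison. */
static int toy_dispatch(toy_rx_fn cb, int len)
{
    return TOY_INDIRECT_CALL_2(cb, toy_tcp_rx, toy_udp_rx, len);
}
```

    The result is the same as calling through the pointer; only the way
    the call is reached changes.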
     

08 Nov, 2018

1 commit

  • This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
    with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
    eligible for GRO in the rx path: UDP segments directed to such socket
    are assembled into a larger GSO_UDP_L4 packet.

    The core UDP GRO support is enabled with setsockopt(UDP_GRO).

    Initial benchmark numbers:

    Before:
    udp rx: 1079 MB/s 769065 calls/s

    After:
    udp rx: 1466 MB/s 24877 calls/s

    This change introduces a side effect in respect to UDP tunnels:
    after a UDP tunnel creation, now the kernel performs a lookup per ingress
    UDP packet, while before such lookup happened only if the ingress packet
    carried a valid internal header csum.

    rfc v2 -> rfc v3:
    - fixed typos in macro name and comments
    - really enforce UDP_GRO_CNT_MAX, instead of UDP_GRO_CNT_MAX + 1
    - acquire socket lock in UDP_GRO setsockopt

    rfc v1 -> rfc v2:
    - use a new option to enable UDP GRO
    - use static keys to protect the UDP GRO socket lookup

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
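
    From userspace, opting a socket in to UDP GRO might look like the
    sketch below. The helper name enable_udp_gro() is hypothetical; the
    fallback value for UDP_GRO is only used when the installed headers do
    not define it, and the call fails on kernels without this feature.

```c
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/socket.h>

#ifndef UDP_GRO
#define UDP_GRO 104 /* value from the Linux UAPI headers; fallback only */
#endif

/* Ask the kernel to aggregate incoming UDP segments for this socket.
 * Returns 0 on success, -1 on failure with errno set by setsockopt(). */
static int enable_udp_gro(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_UDP, UDP_GRO, &one, sizeof(one));
}
```

    A caller would typically create a SOCK_DGRAM socket, call
    enable_udp_gro() on it, and then be prepared to receive buffers that
    hold several datagrams' worth of payload.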
     

06 Oct, 2018

1 commit

  • Avoid the socket lookup cost in udp_gro_receive if no socket has a
    udp tunnel callback configured.

    udp_sk(sk)->gro_receive requires a registration with
    setup_udp_tunnel_sock, which enables the static key.

    Signed-off-by: Willem de Bruijn
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

03 Jul, 2018

1 commit


02 Jul, 2018

1 commit

  • Since the addition of GRO for ESP, gro_receive can consume the skb and
    return -EINPROGRESS. In that case, the lower layer GRO handler cannot
    touch the skb anymore.

    Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted
    some of the gro_receive handlers that can lead to ESP's gro_receive so
    that they wouldn't access the skb when -EINPROGRESS is returned, but
    missed other spots, mainly in tunneling protocols.

    This patch finishes the conversion to using skb_gro_flush_final(), and
    adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
    GUE.

    Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

26 Jun, 2018

1 commit

  • Manage pending per-NAPI GRO packets via list_head.

    Return an SKB pointer from the GRO receive handlers. When GRO receive
    handlers return non-NULL, it means that this SKB needs to be completed
    at this time and removed from the NAPI queue.

    Several operations are greatly simplified by this transformation,
    especially timing out the oldest SKB in the list when gro_count
    exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
    in reverse order.

    Signed-off-by: David S. Miller

    David Miller
     

12 May, 2018

1 commit

  • For some reason, Willem thought that the issue we fixed for TCP
    in commit 7ec318feeed1 ("tcp: gso: avoid refcount_t warning from
    tcp_gso_segment()") was not relevant for UDP GSO.

    But syzbot found its way.

    refcount_t: saturated; leaking memory.
    WARNING: CPU: 0 PID: 10261 at lib/refcount.c:78 refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 0 PID: 10261 Comm: syz-executor5 Not tainted 4.17.0-rc3+ #38
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    panic+0x22f/0x4de kernel/panic.c:184
    __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
    report_bug+0x252/0x2d0 lib/bug.c:186
    fixup_bug arch/x86/kernel/traps.c:178 [inline]
    do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
    do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
    RIP: 0010:refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
    RSP: 0018:ffff880196db6b90 EFLAGS: 00010282
    RAX: 0000000000000026 RBX: 00000000ffffff01 RCX: ffffc900040d9000
    RDX: 0000000000004a29 RSI: ffffffff8160f6f1 RDI: ffff880196db66f0
    RBP: ffff880196db6c78 R08: ffff8801b33d6740 R09: 0000000000000002
    R10: ffff8801b33d6740 R11: 0000000000000000 R12: 0000000000000000
    R13: 00000000ffffffff R14: ffff880196db6c50 R15: 0000000000020101
    refcount_add+0x1b/0x70 lib/refcount.c:102
    __udp_gso_segment+0xaa5/0xee0 net/ipv4/udp_offload.c:272
    udp4_ufo_fragment+0x592/0x7a0 net/ipv4/udp_offload.c:301
    inet_gso_segment+0x639/0x12b0 net/ipv4/af_inet.c:1342
    skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
    __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
    skb_gso_segment include/linux/netdevice.h:4050 [inline]
    validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3122
    __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3579
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3620
    neigh_direct_output+0x15/0x20 net/core/neighbour.c:1401
    neigh_output include/net/neighbour.h:483 [inline]
    ip_finish_output2+0xa5f/0x1840 net/ipv4/ip_output.c:229
    ip_finish_output+0x828/0xf80 net/ipv4/ip_output.c:317
    NF_HOOK_COND include/linux/netfilter.h:277 [inline]
    ip_output+0x21b/0x850 net/ipv4/ip_output.c:405
    dst_output include/net/dst.h:444 [inline]
    ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
    ip_send_skb+0x40/0xe0 net/ipv4/ip_output.c:1434
    udp_send_skb.isra.37+0x5eb/0x1000 net/ipv4/udp.c:825
    udp_push_pending_frames+0x5c/0xf0 net/ipv4/udp.c:853
    udp_v6_push_pending_frames+0x380/0x3e0 net/ipv6/udp.c:1105
    udp_lib_setsockopt+0x59a/0x600 net/ipv4/udp.c:2403
    udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1447
    sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046
    __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
    __do_sys_setsockopt net/socket.c:1914 [inline]
    __se_sys_setsockopt net/socket.c:1911 [inline]
    __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: ad405857b174 ("udp: better wmem accounting on gso")
    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Cc: Alexander Duyck
    Reported-by: syzbot
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 May, 2018

5 commits

  • This patch makes it so that if a destructor is not present we avoid trying
    to update the skb socket or any reference counting that would be associated
    with the NULL socket and/or descriptor. By doing this we can support
    traffic coming from another namespace without any issues.

    Acked-by: Willem de Bruijn
    Signed-off-by: Alexander Duyck
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch adds support for a software provided checksum and GSO_PARTIAL
    segmentation support. With this we can offload UDP segmentation on devices
    that only have partial support for tunnels.

    Since we are no longer needing the hardware checksum we can drop the checks
    in the segmentation code that were verifying if it was present.

    Signed-off-by: Alexander Duyck
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch allows us to take care of unrolling the first segment and the
    last segment of the loop for processing the segmented skb. Part of the
    motivation for this is that it makes it easier to process the fact that the
    first frame and all of the frames in between should be mostly identical
    in terms of header data, and the last frame has differences in the length
    and partial checksum.

    In addition I am dropping the header length calculation since we don't
    really need it for anything but the last frame and it can be easily
    obtained by just pulling the data_len and offset of tail from the transport
    header.

    Signed-off-by: Alexander Duyck
    Reviewed-by: Eric Dumazet
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch is meant to allow us to avoid having to recompute the checksum
    from scratch and have it passed as a parameter.

    Instead of taking that approach we can take advantage of the fact that the
    length that was used to compute the existing checksum is included in the
    UDP header.

    Finally to avoid the need to invert the result we can just call csum16_add
    and csum16_sub directly. By doing this we can avoid a number of
    instructions in the loop that is handling segmentation.

    Signed-off-by: Alexander Duyck
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Alexander Duyck
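
    The incremental update described here (subtract the old length from
    the existing checksum, add the new one) can be checked in userspace
    with 16-bit one's-complement arithmetic. This is a sketch: the toy_*
    functions are hypothetical re-implementations working on host-order
    values, not the kernel's csum16_add()/csum16_sub().

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement 16-bit addition with end-around carry. */
static uint16_t toy_csum16_add(uint16_t csum, uint16_t addend)
{
    uint16_t res = (uint16_t)(csum + addend);

    return (uint16_t)(res + (res < addend));
}

/* Subtracting is adding the bitwise complement. */
static uint16_t toy_csum16_sub(uint16_t csum, uint16_t addend)
{
    return toy_csum16_add(csum, (uint16_t)~addend);
}

/* Reference: sum a list of 16-bit words from scratch. */
static uint16_t toy_sum_words(const uint16_t *w, size_t n)
{
    uint16_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum = toy_csum16_add(sum, w[i]);
    return sum;
}

/* Replace old_len with new_len inside an existing partial sum. */
static uint16_t toy_retarget_len(uint16_t partial, uint16_t old_len,
                                 uint16_t new_len)
{
    return toy_csum16_add(toy_csum16_sub(partial, old_len), new_len);
}
```

    Because the old length is already part of the sum, swapping it out
    costs two additions per segment instead of a full recomputation.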
     
  • There is no point in passing MSS as a parameter for the GSO
    segmentation call as it is already available via the shared info for the
    skb itself.

    Reviewed-by: Eric Dumazet
    Acked-by: Willem de Bruijn
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

02 May, 2018

1 commit

  • Using the udp_v4_check() function to calculate the pseudo header
    for the newly segmented UDP packets results in assigning the complement
    of the value to the UDP header checksum field.

    Always undo the complement of the partial checksum value in order to
    match the case where GSO is not used on the UDP transmit path.

    Fixes: ee80d1ebe5ba ("udp: add udp gso")
    Signed-off-by: Sean Tranchetti
    Signed-off-by: Subash Abhinov Kasiviswanathan
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Sean Tranchetti
     

28 Apr, 2018

1 commit

  • UDP GSO needs to export __udp_gso_segment to call it from ipv6.

    I accidentally exported static ipv4 function __udp4_gso_segment.
    Remove that EXPORT_SYMBOL_GPL.

    Fixes: ee80d1ebe5ba ("udp: add udp gso")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

27 Apr, 2018

2 commits

  • skb_segment by default transfers allocated wmem from the gso skb
    to the tail of the segment list. This underreports real truesize
    of the list, especially if the tail might be dropped.

    Similar to tcp_gso_segment, update wmem_alloc with the aggregate
    list truesize and make each segment responsible for its own
    share by setting skb->destructor.

    Clear gso_skb->destructor prior to calling skb_segment to skip
    the default assignment to tail.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Implement generic segmentation offload support for udp datagrams. A
    follow-up patch adds support to the protocol stack to generate such
    packets.

    UDP GSO is not UFO. UFO fragments a single large datagram. GSO splits
    a large payload into a number of discrete UDP datagrams.

    The implementation adds a GSO type SKB_GSO_UDP_L4 to differentiate it
    from UFO (SKB_GSO_UDP).

    IPPROTO_UDPLITE is excluded, as that protocol has no gso handler
    registered.

    [ Export __udp_gso_segment for ipv6. -DaveM ]

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
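
    The difference from UFO is that each output is a complete datagram
    carrying at most gso_size bytes of payload. The arithmetic can be
    sketched as follows; struct toy_gso_plan and toy_udp_gso_plan() are
    hypothetical names used only for illustration.

```c
#include <stddef.h>

struct toy_gso_plan {
    size_t nsegs;     /* number of UDP datagrams produced */
    size_t last_len;  /* payload bytes in the final datagram */
};

/* Split payload_len bytes into datagrams carrying gso_size bytes each;
 * only the last one may be shorter. Returns 0 on success, -1 on bad input. */
static int toy_udp_gso_plan(size_t payload_len, size_t gso_size,
                            struct toy_gso_plan *plan)
{
    if (!plan || gso_size == 0 || payload_len == 0)
        return -1;
    plan->nsegs = (payload_len + gso_size - 1) / gso_size;
    plan->last_len = payload_len - (plan->nsegs - 1) * gso_size;
    return 0;
}
```

    For example, 4000 bytes of payload with a gso_size of 1472 would go
    out as three datagrams, the last one carrying 1056 bytes.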
     

23 Jan, 2018

1 commit

  • Validate gso_type during segmentation as SKB_GSO_DODGY sources
    may pass packets where the gso_type does not match the contents.

    Syzkaller was able to enter the SCTP gso handler with a packet of
    gso_type SKB_GSO_TCPV4.

    On entry of transport layer gso handlers, verify that the gso_type
    matches the transport protocol.

    Fixes: 90017accff61 ("sctp: Add GSO support")
    Link: http://lkml.kernel.org/r/
    Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
    Signed-off-by: Willem de Bruijn
    Acked-by: Jason Wang
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Willem de Bruijn
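
    The entry check described above amounts to testing the packet's
    gso_type against the bit the handler expects. A minimal sketch, with
    hypothetical TOY_GSO_* bit values and function names (the kernel's
    SKB_GSO_* flags and handlers differ):

```c
#include <errno.h>

/* Hypothetical type bits for this sketch only. */
#define TOY_GSO_TCPV4 (1u << 0)
#define TOY_GSO_UDP   (1u << 1)
#define TOY_GSO_SCTP  (1u << 2)
#define TOY_GSO_DODGY (1u << 3)  /* packet came from an untrusted source */

/* Return 0 when the packet carries the expected type bit, else -EINVAL. */
static int toy_gso_check_type(unsigned int gso_type, unsigned int expected)
{
    return (gso_type & expected) ? 0 : -EINVAL;
}

/* Entry point of a pretend SCTP handler: validate first, then segment. */
static int toy_sctp_gso_segment(unsigned int gso_type)
{
    int err = toy_gso_check_type(gso_type, TOY_GSO_SCTP);

    if (err)
        return err;
    return 0; /* segmentation itself is not modeled */
}
```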
     

24 Nov, 2017

1 commit

  • Tuntap and similar devices can inject GSO packets. Accept type
    VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

    Processes are expected to use feature negotiation such as TUNSETOFFLOAD
    to detect supported offload types and refrain from injecting other
    packets. This process breaks down with live migration: guest kernels
    do not renegotiate flags, so destination hosts need to expose all
    features that the source host does.

    Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
    This patch introduces nearly(*) no new code to simplify verification.
    It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
    insertion and software UFO segmentation.

    It does not reinstate protocol stack support, hardware offload
    (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
    of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

    To support SKB_GSO_UDP reappearing in the stack, also reinstate
    logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
    by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
    CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
    ("net: avoid skb_warn_bad_offload false positives on UFO").

    (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
    ipv6_proxy_select_ident is changed to return a __be32 and this is
    assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
    at the end of the enum to minimize code churn.

    Tested
    Booted a v4.13 guest kernel with QEMU. On a host kernel before this
    patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
    enabled, same as on a v4.13 host kernel.

    A UFO packet sent from the guest appears on the tap device:
    host:
    nc -l -u -p 8000 &
    tcpdump -n -i tap0

    guest:
    dd if=/dev/zero of=payload.txt bs=1 count=2000
    nc -u 192.16.1.1 8000 < payload.txt

    Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
    packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

    Changes
    v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

    Link: http://lkml.kernel.org/r/
    Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
    Reported-by: Michal Kubecek
    Signed-off-by: Willem de Bruijn
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

09 Oct, 2017

1 commit

  • When gso_size is reset to zero for the tail segment in skb_segment(), later
    in ipv6_gso_segment(), __skb_udp_tunnel_segment() and gre_gso_segment()
    we will get incorrect results (payload length, pcsum) for that segment.
    inet_gso_segment() already has a check for gso_size before calculating
    payload.

    The issue was found with LTP vxlan & gre tests over ixgbe NIC.

    Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
    Signed-off-by: Alexey Kodanev
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexey Kodanev
     

18 Jul, 2017

2 commits


25 Apr, 2017

1 commit

  • Otherwise, UDP checksum offloads could corrupt ESP packets by attempting
    to calculate UDP checksum when this inner UDP packet is already protected
    by IPsec.

    One way to reproduce this bug is to have a VM with virtio_net driver (UFO
    set to ON in the guest VM); and then encapsulate all guest's Ethernet
    frames in Geneve; and then further encrypt Geneve with IPsec. In this
    case following symptoms are observed:
    1. If using ixgbe NIC, then it will complain with following error message:
    ixgbe 0000:01:00.1: partial checksum but l4 proto=32!
    2. Receiving IPsec stack will drop all the corrupted ESP packets and
    increase XfrmInStateProtoError counter in /proc/net/xfrm_stat.
    3. iperf UDP test from the VM with packet sizes above MTU will not work at
    all.
    4. iperf TCP test from the VM will get ridiculously low performance.

    Signed-off-by: Ansis Atteka
    Co-authored-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Ansis Atteka
     

21 Oct, 2016

1 commit

  • Currently, GRO can do unlimited recursion through the gro_receive
    handlers. This was fixed for tunneling protocols by limiting tunnel GRO
    to one level with encap_mark, but both VLAN and TEB still have this
    problem. Thus, the kernel is vulnerable to a stack overflow, if we
    receive a packet composed entirely of VLAN headers.

    This patch adds a recursion counter to the GRO layer to prevent stack
    overflow. When a gro_receive function hits the recursion limit, GRO is
    aborted for this skb and it is processed normally. This recursion
    counter is put in the GRO CB, but could be turned into a percpu counter
    if we run out of space in the CB.

    Thanks to Vladimír Beneš for the initial bug report.

    Fixes: CVE-2016-7039
    Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
    Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Jiri Benc
    Acked-by: Hannes Frederic Sowa
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Sabrina Dubroca
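
    The per-packet recursion counter can be sketched in userspace as
    below. The names are hypothetical and the limit of 15 is an
    assumption for this sketch; nested VLAN-like headers are modeled as a
    plain nesting depth.

```c
#define TOY_GRO_RECURSION_LIMIT 15 /* assumed limit for this sketch */

/* Toy stand-in for the per-packet GRO control block. */
struct toy_gro_cb {
    int recursion_counter; /* nested handlers entered for this packet */
    int flush;             /* set when GRO is abandoned for this packet */
};

/* Returns 1 when the next nested handler may run, 0 at the limit. */
static int toy_gro_may_recurse(struct toy_gro_cb *cb)
{
    if (++cb->recursion_counter == TOY_GRO_RECURSION_LIMIT) {
        cb->flush = 1;
        return 0;
    }
    return 1;
}

/* Each nesting level stands for one header handled by a nested call.
 * Returns 0 when the payload is reached, -1 when GRO was aborted. */
static int toy_gro_receive(struct toy_gro_cb *cb, int nested_levels)
{
    if (nested_levels == 0)
        return 0;
    if (!toy_gro_may_recurse(cb))
        return -1; /* packet falls back to normal processing */
    return toy_gro_receive(cb, nested_levels - 1);
}
```

    A packet made entirely of nested headers stops at the limit instead
    of growing the stack without bound.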
     

20 Sep, 2016

1 commit

  • Since commit 8a29111c7 ("net: gro: allow to build full sized skb")
    gro may build buffers with a frag_list. This can hurt forwarding
    because most NICs can't offload such packets, they need to be
    segmented in software. This patch splits buffers with a frag_list
    at the frag_list pointer into buffers that can be TSO offloaded.

    Signed-off-by: Steffen Klassert
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Steffen Klassert
     

21 May, 2016

1 commit

  • In several gso_segment functions there are checks of gso_type against
    a seemingly arbitrary list of SKB_GSO_* flags. This seems like an
    attempt to identify unsupported GSO types, but since the stack is
    the one that set these GSO types in the first place this seems
    unnecessary to do. If a combination isn't valid in the first
    place the stack should not allow setting it.

    This is a code simplification, especially for adding new GSO types.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

10 May, 2016

1 commit


07 May, 2016

1 commit

  • UDP tunnel segmentation code relies on the inner offsets being set for
    a UDP tunnel GSO packet, but the inner *_complete() functions will
    set the inner offsets only if 'encapsulation' is set before calling
    them. Currently, udp_gro_complete() sets 'encapsulation' only after
    the inner *_complete() functions are done. This causes the inner
    offsets to have invalid values after udp_gro_complete() returns, which
    in turn will make it impossible to properly segment the packet in case
    it needs to be forwarded, which would be visible to the user either as
    invalid packets being sent or as packet loss.

    This patch fixes this by setting skb's 'encapsulation' in
    udp_gro_complete() before calling into the inner complete functions,
    and by making each possible UDP tunnel gro_complete() callback set the
    inner_mac_header to the beginning of the tunnel payload.

    Signed-off-by: Jarno Rajahalme
    Reviewed-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Jarno Rajahalme