06 Sep, 2019

1 commit

  • [ Upstream commit 8d650cdedaabb33e85e9b7c517c0c71fcecc1de9 ]

    Neal reported incorrect use of ns_capable() from bpf hook.

    bpf_setsockopt(...TCP_CONGESTION...)
    -> tcp_set_congestion_control()
    -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)
    -> ns_capable_common()
    -> current_cred()
    -> rcu_dereference_protected(current->cred, 1)

    Accessing 'current' in bpf context makes no sense, since packets
    are processed from softirq context.

    As Neal stated : The capability check in tcp_set_congestion_control()
    was written assuming a system call context, and then was reused from
    a BPF call site.

    The fix is to add a new parameter to tcp_set_congestion_control(),
    so that the ns_capable() call is only performed under the right
    context.
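
    The shape of the fix can be sketched as a toy model (not kernel code).
    The parameter name cap_net_admin, the RESTRICTED set, the string return
    values and the choice to have the BPF path pass a fixed True are all
    assumptions made for illustration:

```python
# Toy model, not kernel code. Names mirror the commit text where possible;
# RESTRICTED, the return strings and cap_net_admin are illustrative assumptions.
RESTRICTED = {"restricted_cc"}  # hypothetical algorithm needing CAP_NET_ADMIN


def ns_capable_from_syscall(caller_is_admin):
    """Stand-in for ns_capable(); only meaningful in process context."""
    return caller_is_admin


def tcp_set_congestion_control(name, cap_net_admin):
    """The capability verdict is passed in instead of being read from 'current'."""
    if name in RESTRICTED and not cap_net_admin:
        return "EPERM"
    return "ok"


def setsockopt_path(name, caller_is_admin):
    # System call context: 'current' is the calling task, so the check is valid.
    return tcp_set_congestion_control(name, ns_capable_from_syscall(caller_is_admin))


def bpf_path(name):
    # BPF hook in softirq context: no lookup through 'current' is performed.
    return tcp_set_congestion_control(name, True)
```

    The point of the design is that only the caller knows whether 'current'
    is meaningful, so the caller decides how the verdict is obtained.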

    Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
    Signed-off-by: Eric Dumazet
    Cc: Lawrence Brakmo
    Reported-by: Neal Cardwell
    Acked-by: Neal Cardwell
    Acked-by: Lawrence Brakmo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman
    (cherry picked from commit c60f57dfe995172c2f01e59266e3ffa3419c6cd9)

    Eric Dumazet
     

17 Apr, 2019

4 commits

  • [ Upstream commit 9a5a90d167b0e5fe3d47af16b68fd09ce64085cd ]

    __netif_receive_skb_list_ptype() leaves skb->next poisoned before passing
    it to pt_prev->func handler, which may produce (in certain cases, e.g. DSA
    setup) crashes like:

    [ 88.606777] CPU 0 Unable to handle kernel paging request at virtual address 0000000e, epc == 80687078, ra == 8052cc7c
    [ 88.618666] Oops[#1]:
    [ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-dlink-00206-g4192a172-dirty #1473
    [ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850
    [ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000
    [ 88.642526] $ 8 : 00000000 49600000 00000001 00000001
    [ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41
    [ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd
    [ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800
    [ 88.665793] $24 : 00000000 8011e658
    [ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c
    [ 88.677427] Hi : 00000003
    [ 88.680622] Lo : 7b5b4220
    [ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0
    [ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188
    [ 88.696734] Status: 10000404 IEp
    [ 88.700422] Cause : 50000008 (ExcCode 02)
    [ 88.704874] BadVA : 0000000e
    [ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi))
    [ 88.713005] Modules linked in:
    [ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
    [ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0 85cff080 80790000 8052cc7c
    [ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850 87cabf00 807d0000
    [ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a 00000000 8052d96c
    [ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558 87cabf00 85f58c50
    [ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001 85f58c00 864d7800
    [ 88.771770] ...
    [ 88.774483] Call Trace:
    [ 88.777199] [] vlan_dev_hard_start_xmit+0x1c/0x1a0
    [ 88.783504] [] dev_hard_start_xmit+0xac/0x188
    [ 88.789326] [] __dev_queue_xmit+0x6e8/0x7d4
    [ 88.794955] [] ip_finish_output2+0x238/0x4d0
    [ 88.800677] [] ip_output+0xc8/0x140
    [ 88.805526] [] ip_forward+0x364/0x560
    [ 88.810567] [] ip_rcv+0x48/0xe4
    [ 88.815030] [] __netif_receive_skb_one_core+0x44/0x58
    [ 88.821635] [] dsa_switch_rcv+0x108/0x1ac
    [ 88.827067] [] __netif_receive_skb_list_core+0x228/0x26c
    [ 88.833951] [] netif_receive_skb_list+0x1d4/0x394
    [ 88.840160] [] lunar_rx_poll+0x38c/0x828
    [ 88.845496] [] net_rx_action+0x14c/0x3cc
    [ 88.850835] [] __do_softirq+0x178/0x338
    [ 88.856077] [] irq_exit+0xbc/0x100
    [ 88.860846] [] plat_irq_dispatch+0xc0/0x144
    [ 88.866477] [] handle_int+0x14c/0x158
    [ 88.871516] [] r4k_wait+0x30/0x40
    [ 88.876462] Code: afb10014 8c8200a0 00803025 94a20468 00000000 10620042 00a08025 9605046a
    [ 88.887332]
    [ 88.888982] ---[ end trace eb863d007da11cf1 ]---
    [ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt
    [ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

    Fix this by pulling skb off the sublist and zeroing skb->next pointer
    before calling ptype callback.
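
    A minimal sketch of the idea, using a toy singly linked list rather than
    real sk_buffs. The Skb class and deliver_sublist() are hypothetical names:

```python
class Skb:
    """Toy packet with only the field that matters here."""

    def __init__(self, name):
        self.name = name
        self.next = None


def deliver_sublist(head, handler):
    """Detach each skb from the list and clear its next pointer before the callback."""
    skb = head
    while skb is not None:
        nxt = skb.next    # remember the rest of the sublist
        skb.next = None   # the handler must see a standalone skb
        handler(skb)
        skb = nxt
```

    A handler that forwards the skb (as in the trace above) then never
    follows a stale next pointer.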

    Fixes: 88eb1944e18c ("net: core: propagate SKB lists through packet_type lookup")
    Reviewed-by: Edward Cree
    Signed-off-by: Alexander Lobakin
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Alexander Lobakin
     
  • [ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]

    net_hash_mix() currently uses kernel address of a struct net,
    and is used in many places that could be used to reveal this
    address to a patient attacker, thus defeating KASLR, for
    the typical case (initial net namespace, &init_net is
    not dynamically allocated)

    I believe the original implementation tried to avoid spending
    too many cycles in this function, but security comes first.

    Also provide entropy regardless of CONFIG_NET_NS.
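
    The difference between the two approaches can be sketched as follows.
    This is a toy model: the Net class, the hash_mix field name and the shift
    by 6 are assumptions, and id() merely plays the role of a kernel pointer:

```python
import secrets


class Net:
    """Toy stand-in for struct net; the field name hash_mix is an assumption."""

    def __init__(self):
        # Drawn once when the namespace is set up, unrelated to any address.
        self.hash_mix = secrets.randbits(32)


def net_hash_mix_old(net):
    # Address-derived salt: id() plays the role of the kernel pointer here.
    return (id(net) >> 6) & 0xFFFFFFFF


def net_hash_mix(net):
    # Random salt: observing hash results reveals nothing about addresses.
    return net.hash_mix
```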

    Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes")
    Signed-off-by: Eric Dumazet
    Reported-by: Amit Klein
    Reported-by: Benny Pinkas
    Cc: Pavel Emelyanov
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Eric Dumazet
     
  • [ Upstream commit 0ab03f353d3613ea49d1f924faf98559003670a8 ]

    Currently we may incorrectly merge a received GSO packet
    or a packet with frag_list into a packet sitting in the
    gro_hash list. skb_segment() may then crash because
    the assumptions on the skb layout are not met.
    The correct behaviour would be to flush the packet in the
    gro_hash list and send the received GSO packet directly
    afterwards. Commit d61d072e87c8e ("net-gro: avoid reorders")
    sets NAPI_GRO_CB(skb)->flush in this case, but this is not
    checked before merging. This patch makes sure to check this
    flag and to not merge in that case.
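
    The intended behaviour can be sketched with a toy model in which packets
    are plain dicts. The function name gro_receive() and the dict keys are
    illustrative only and do not match the kernel's data structures:

```python
def gro_receive(held, incoming):
    """Return (delivered, held) after offering `incoming` to the GRO layer.

    `held` is the packet parked in the hash list (or None); packets are
    dicts with a byte count and a flush flag. All names are illustrative.
    """
    if incoming["flush"]:
        # Do not merge: push out the held packet first, then the new one.
        delivered = ([held] if held is not None else []) + [incoming]
        return delivered, None
    if held is None:
        return [], incoming
    held["len"] += incoming["len"]  # safe to coalesce
    return [], held
```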

    Fixes: d61d072e87c8e ("net-gro: avoid reorders")
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Steffen Klassert
     
  • [ Upstream commit 3d8830266ffc28c16032b859e38a0252e014b631 ]

    NULL or ZERO_SIZE_PTR will be returned for a zero-sized memory
    request, and dereferencing them will lead to a segfault.

    So it is unnecessary to call vzalloc for a zero-sized memory
    request, and functions which may dereference the NULL
    allocated memory should not be called.

    This also fixes a possible memory leak if phy_ethtool_get_stats
    returns an error; memory should be freed before exit.

    Signed-off-by: Li RongQing
    Reviewed-by: Wang Li
    Reviewed-by: Michal Kubecek
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Li RongQing
     

03 Apr, 2019

2 commits

  • [ Upstream commit a3e23f719f5c4a38ffb3d30c8d7632a4ed8ccd9e ]

    In netdev_queue_add_kobject and rx_queue_add_kobject,
    if sysfs_create_group fails, kobject_put will call
    netdev_queue_release to decrease the dev refcount, although
    dev_hold has not been called. So we will see this while
    unregistering dev:

    unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1

    Reported-by: Hulk Robot
    Fixes: d0d668371679 ("net: don't decrement kobj reference count on init failure")
    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    YueHaibing
     
  • [ Upstream commit 0b91bce1ebfc797ff3de60c8f4a1e6219a8a3187 ]

    Christoph reported a stall while peeking a datagram with an offset when
    busy polling is enabled. __skb_try_recv_datagram() uses 'queue empty'
    as the loop termination condition. When peeking, the socket
    queue may be non-empty even when no additional packets are received.

    Address the issue by explicitly checking for receive queue changes,
    as currently done by __skb_wait_for_more_packets().
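
    A toy model of the corrected loop condition. The queue is a Python list of
    bytes objects and peek_with_busy_poll() is a hypothetical name; only the
    termination logic is meant to resemble the description above:

```python
def peek_with_busy_poll(queue, offset, busy_poll):
    """Return (packet containing byte `offset` or None, loop iterations)."""
    iterations = 0
    while True:
        iterations += 1
        last = queue[-1] if queue else None
        pos = offset
        for pkt in queue:
            if pos < len(pkt):
                return pkt, iterations
            pos -= len(pkt)
        busy_poll(queue)  # may append newly received packets
        tail = queue[-1] if queue else None
        if tail is last:  # queue unchanged: stop instead of spinning
            return None, iterations
```

    With a 'queue not empty' condition the first case in the usage below
    (one queued packet, offset past its end, nothing arriving) would never
    terminate.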

    Fixes: 2b5cd0dfa384 ("net: Change return type of sk_busy_loop from bool to void")
    Reported-and-tested-by: Christoph Paasch
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

24 Mar, 2019

2 commits

  • [ Upstream commit 4c3024debf62de4c6ac6d3cb4c0063be21d4f652 ]

    BPF can adjust gso only for tcp bytestreams. Fail on other gso types.

    But only on gso packets. It does not touch this field if !gso_size.

    Fixes: b90efd225874 ("bpf: only adjust gso_size on bytestream protocols")
    Signed-off-by: Willem de Bruijn
    Acked-by: Yonghong Song
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin

    Willem de Bruijn
     
  • [ Upstream commit b90efd2258749e04e1b3f71ef0d716f2ac2337e0 ]

    bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
    For GSO packets they adjust gso_size to maintain the same MTU.

    The gso size can only be safely adjusted on bytestream protocols.
    Commit d02f51cbcf12 ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
    to deal with gso sctp skbs") excluded SKB_GSO_SCTP.

    Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
    gso_size unit per datagram. Also exclude these.

    Move from a blacklist to a whitelist check to future proof against
    additional such new GSO types, e.g., for fraglist based GRO.
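
    Why the whitelist is more robust can be shown with a small sketch. The
    flag values below are illustrative and are not the kernel's SKB_GSO_*
    constants; FUTURE_TYPE is a hypothetical type added after the check was
    written:

```python
from enum import IntFlag, auto


class Gso(IntFlag):
    # Bit values are illustrative, not the kernel's SKB_GSO_* constants.
    TCPV4 = auto()
    TCPV6 = auto()
    SCTP = auto()
    UDP_L4 = auto()
    FUTURE_TYPE = auto()  # hypothetical type added later


BYTESTREAM = Gso.TCPV4 | Gso.TCPV6


def blacklist_allows(gso_type):
    # Old style: reject only the types known to be unsafe at the time.
    return not gso_type & Gso.SCTP


def whitelist_allows(gso_type):
    # New style: accept only the types known to be bytestreams.
    return bool(gso_type & BYTESTREAM)
```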

    Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Sasha Levin

    Willem de Bruijn
     

19 Mar, 2019

1 commit

  • [ Upstream commit 2a5ff07a0eb945f291e361aa6f6becca8340ba46 ]

    We keep receiving syzbot reports [1] that show that tunnels do not play
    the rcu/IFF_UP rules properly.

    At device dismantle phase, gro_cells_destroy() will be called
    only after a full rcu grace period is observed after IFF_UP
    has been cleared.

    This means that IFF_UP needs to be tested before queueing packets
    into netif_rx() or gro_cells.

    This patch implements the test in gro_cells_receive() because
    too many callers do not seem to bother enough.

    [1]
    BUG: unable to handle kernel paging request at fffff4ca0b9ffffe
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.0.0+ #97
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: netns cleanup_net
    RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline]
    RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline]
    RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline]
    RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline]
    RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78
    Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03
    RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02
    RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe
    RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0
    RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645
    R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000
    R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75
    kobject: 'loop2' (000000004bd7d84a): kobject_uevent_env
    FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0
    Call Trace:
    kobject: 'loop2' (000000004bd7d84a): fill_kobj_path: path = '/devices/virtual/block/loop2'
    ip_tunnel_dev_free+0x19/0x60 net/ipv4/ip_tunnel.c:1010
    netdev_run_todo+0x51c/0x7d0 net/core/dev.c:8970
    rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:116
    ip_tunnel_delete_nets+0x423/0x5f0 net/ipv4/ip_tunnel.c:1124
    vti_exit_batch_net+0x23/0x30 net/ipv4/ip_vti.c:495
    ops_exit_list.isra.0+0x105/0x160 net/core/net_namespace.c:156
    cleanup_net+0x3fb/0x960 net/core/net_namespace.c:551
    process_one_work+0x98e/0x1790 kernel/workqueue.c:2173
    worker_thread+0x98/0xe40 kernel/workqueue.c:2319
    kthread+0x357/0x430 kernel/kthread.c:246
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
    Modules linked in:
    CR2: fffff4ca0b9ffffe
    [ end trace 513fc9c1338d1cb3 ]
    RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline]
    RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline]
    RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline]
    RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline]
    RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78
    Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03
    RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02
    RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe
    RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0
    RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645
    R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000
    kobject: 'loop3' (00000000e4ee57a6): kobject_uevent_env
    R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75
    FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0

    Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

14 Mar, 2019

1 commit

  • [ Upstream commit c9e4576743eeda8d24dedc164d65b78877f9a98c ]

    When sock recvbuff is set by bpf_setsockopt(), the value must be
    limited by rmem_max. It is the same with sendbuff.
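
    A minimal sketch of the clamping, assuming the value is then doubled and
    floored the way the regular SO_RCVBUF path usually does it. The floor
    value and the function name set_rcvbuf() are placeholders:

```python
SOCK_MIN_RCVBUF = 2304  # placeholder floor, not taken from the commit


def set_rcvbuf(val, rmem_max):
    """Clamp the requested size to rmem_max before applying it."""
    val = min(val, rmem_max)  # the step this fix adds
    # Doubling and the minimum are assumptions about the surrounding code.
    return max(val * 2, SOCK_MIN_RCVBUF)
```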

    Fixes: 8c4b4c7e9ff0 ("bpf: Add setsockopt helper function to bpf")
    Signed-off-by: Yafang Shao
    Acked-by: Martin KaFai Lau
    Acked-by: Lawrence Brakmo
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin

    Yafang Shao
     

10 Mar, 2019

2 commits

  • [ Upstream commit 895a5e96dbd6386c8e78e5b78e067dcc67b7f0ab ]

    syzkaller reported this:
    BUG: memory leak
    unreferenced object 0xffff88837a71a500 (size 256):
    comm "syz-executor.2", pid 9770, jiffies 4297825125 (age 17.843s)
    hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
    ff ff ff ff ff ff ff ff 20 c0 ef 86 ff ff ff ff ........ .......
    backtrace:
    [] netdev_register_kobject+0x124/0x2e0 net/core/net-sysfs.c:1751
    [] register_netdevice+0xcc1/0x1270 net/core/dev.c:8516
    [] tun_set_iff drivers/net/tun.c:2649 [inline]
    [] __tun_chr_ioctl+0x2218/0x3d20 drivers/net/tun.c:2883
    [] vfs_ioctl fs/ioctl.c:46 [inline]
    [] do_vfs_ioctl+0x1a5/0x10e0 fs/ioctl.c:690
    [] ksys_ioctl+0x89/0xa0 fs/ioctl.c:705
    [] __do_sys_ioctl fs/ioctl.c:712 [inline]
    [] __se_sys_ioctl fs/ioctl.c:710 [inline]
    [] __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:710
    [] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
    [] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [] 0xffffffffffffffff

    kset_unregister should be called to free 'dev->queues_kset'
    in the error path of register_queue_kobjects; otherwise memory is leaked.

    Reported-by: Hulk Robot
    Fixes: 1d24eb4815d1 ("xps: Transmit Packet Steering")
    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    YueHaibing
     
  • [ Upstream commit 46b1c18f9deb326a7e18348e668e4c7ab7c7458b ]

    In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
    John made the assumption that the data path had no need to read
    the qdisc qlen (number of packets in the qdisc).

    It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
    children.

    But pfifo_fast can be used as a leaf in classful qdiscs, and existing
    logic needs to access the child qlen in an efficient way.

    HTB breaks badly, since it uses cl->leaf.q->q.qlen in :
    htb_activate() -> WARN_ON()
    htb_dequeue_tree() to decide if a class can be htb_deactivated
    when it has no more packets.

    HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
    qdisc_tree_reduce_backlog() also read q.qlen directly.

    Using qdisc_qlen_sum() (which iterates over all possible cpus)
    in the data path is a non starter.

    It seems we have to put back qlen in a central location,
    at least for stable kernels.

    For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
    so the existing q.qlen{++|--} are correct.

    For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
    because the spinlock might be not held (for example from
    pfifo_fast_enqueue() and pfifo_fast_dequeue())

    This patch adds atomic_qlen (in the same location as qlen)
    and renames the following helpers, since we want to express that
    they can be used without the qdisc lock, and that qlen is no longer percpu.

    - qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
    - qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()

    Later (net-next) we might revert this patch by tracking all these
    qlen uses and replace them by a more efficient method (not having
    to access a precise qlen, but an empty/non_empty status that might
    be less expensive to maintain/track).

    Another possibility is to have a legacy pfifo_fast version that would
    be used when pfifo_fast is a child qdisc, since the parent qdisc needs
    a spinlock anyway. But then, future lockless qdiscs would also
    have the same problem.

    Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
    Signed-off-by: Eric Dumazet
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: Cong Wang
    Cc: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

27 Feb, 2019

2 commits

  • [ Upstream commit f4924f24da8c7ef64195096817f3cde324091d97 ]

    In sock_setsockopt() (net/core/sock.c), when the SO_MARK option is used
    to change sk_mark, sk_dst_reset(sk) is called. The same should be
    done in bpf_setsockopt().

    Fixes: 8c4b4c7e9ff0 ("bpf: Add setsockopt helper function to bpf")
    Reported-by: Maciej Żenczykowski
    Signed-off-by: Peter Oskolkov
    Acked-by: Martin KaFai Lau
    Reviewed-by: Maciej Żenczykowski
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Sasha Levin

    Peter Oskolkov
     
  • [ Upstream commit 31aa6503a15ba00182ea6dbbf51afb63bf9e851d ]

    The existing BPF TCP initial congestion window (TCP_BPF_IW) does not
    work on an (active) Fast Open sender. This is because it changes the
    (initial) window only if data_segs_out is zero -- but data_segs_out
    is also incremented on SYN-data. This patch fixes the issue by
    properly accounting for SYN-data additionally.

    Fixes: fc7478103c84 ("bpf: Adds support for setting initial cwnd")
    Signed-off-by: Yuchung Cheng
    Reviewed-by: Neal Cardwell
    Acked-by: Lawrence Brakmo
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Sasha Levin

    Yuchung Cheng
     

23 Feb, 2019

2 commits

  • [ Upstream commit 3bed3cc4156eedf652b4df72bdb35d4f1a2a739d ]

    This patch addresses the fact that there are drivers, specifically tun,
    that will call into the network page fragment allocators with buffer sizes
    that are not cache aligned. Doing this could result in data alignment
    and DMA performance issues as these fragment pools are also shared with the
    skb allocator and any other devices that will use napi_alloc_frags or
    netdev_alloc_frags.

    Fixes: ffde7328a36d ("net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag")
    Reported-by: Jann Horn
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Alexander Duyck
     
  • [ Upstream commit 3b89ea9c5902acccdbbdec307c85edd1bf52515e ]

    The features attribute is of type u64 and stored in the native endianness of
    the system. The for_each_set_bit() macro takes a pointer to a 32 bit array
    and goes over the bits in this area. On little endian systems this also
    works with a u64 as the most significant bit is at the highest address,
    but on big endian the words are swapped. When we expect bit 15 here we get
    bit 47 (15 + 32).

    This patch converts it more or less to its own for_each_set_bit()
    implementation which works on 64 bit integers directly. This is then
    completely in host endianness and should work as expected.
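
    The bit 15 versus bit 47 effect can be reproduced in user space. The
    sketch below emulates walking a u64 as two native 32-bit words; both
    function names are made up for this illustration:

```python
def bits_via_32bit_words(value, byteorder):
    """Walk a u64 as two native 32-bit words, as a 32-bit bitmap iterator would."""
    raw = value.to_bytes(8, byteorder)
    words = [int.from_bytes(raw[i:i + 4], byteorder) for i in (0, 4)]
    return [w * 32 + b
            for w, word in enumerate(words)
            for b in range(32)
            if (word >> b) & 1]


def bits_of_u64(value):
    """Iterate over the 64-bit integer directly, independent of byte order."""
    return [b for b in range(64) if (value >> b) & 1]
```

    For a value with only bit 15 set, the word-wise walk reports bit 15 with
    "little" and bit 47 with "big", while bits_of_u64() reports bit 15 in
    both cases.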

    Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack")
    Signed-off-by: Hauke Mehrtens
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Hauke Mehrtens
     

07 Feb, 2019

1 commit

  • [ Upstream commit 35edfdc77f683c8fd27d7732af06cf6489af60a5 ]

    Assign a default net namespace to netdevs created by init_dummy_netdev().
    Fixes a NULL pointer dereference caused by busy-polling a socket bound to
    an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
    if napi_poll() received packets:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
    IP: napi_busy_loop+0xd6/0x200
    Call Trace:
    sock_poll+0x5e/0x80
    do_sys_poll+0x324/0x5a0
    SyS_poll+0x6c/0xf0
    do_syscall_64+0x6b/0x1f0
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    Fixes: 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id instead of socket")
    Signed-off-by: Josh Elsasser
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Josh Elsasser
     

26 Jan, 2019

2 commits

  • [ Upstream commit 0fbe82e628c817e292ff588cd5847fc935e025f2 ]

    After setting SO_DONTROUTE to 1, the IP layer should not route packets if
    the dest IP address is not in link scope. But if the socket has cached
    the dst_entry, such packets would be routed until the sk_dst_cache
    expires. So we should clear the sk_dst_cache when a user sets the
    SO_DONTROUTE option. Below are server/client python scripts which
    can reproduce this issue:

    server side code:

    ==========================================================================
    import socket
    import struct
    import time

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(('0.0.0.0', 9000))
    s.listen(1)
    sock, addr = s.accept()
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, struct.pack('i', 1))
    while True:
        sock.send(b'foo')
        time.sleep(1)
    ==========================================================================

    client side code:
    ==========================================================================
    import socket
    import time

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('server_address', 9000))
    while True:
        data = s.recv(1024)
        print(data)
    ==========================================================================

    Signed-off-by: yupeng
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    yupeng
     
  • [ Upstream commit f8c468e8537925e0c4607263f498a1b7c0c8982e ]

    Commit dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by
    __GFP_RETRY_MAYFAIL with more useful semantic") replaced __GFP_REPEAT in
    alloc_skb_with_frags() with __GFP_RETRY_MAYFAIL when the allocation may
    directly reclaim.

    The previous behavior would require reclaim up to 1 << order pages for
    skb aligned header_len of order > PAGE_ALLOC_COSTLY_ORDER before failing,
    otherwise the allocations in alloc_skb() would loop in the page allocator
    looking for memory. __GFP_RETRY_MAYFAIL makes both allocations failable
    under memory pressure, including for the HEAD allocation.

    This can cause, among many other things, write() to fail with ENOTCONN
    during RPC when under memory pressure.

    These allocations should succeed as they did previous to dcda9b04713c
    even if it requires calling the oom killer and additional looping in the
    page allocator to find memory. There is no way to specify the previous
    behavior of __GFP_REPEAT, but it's unlikely to be necessary since the
    previous behavior only guaranteed that 1 << order pages would be reclaimed
    before failing for order > PAGE_ALLOC_COSTLY_ORDER. That reclaim is not
    guaranteed to be contiguous memory, so repeating for such large orders is
    usually not beneficial.

    Remove the setting of __GFP_RETRY_MAYFAIL to restore the previous
    behavior, specifically not allowing alloc_skb() to fail for small orders
    and oom kill if necessary rather than allowing RPCs to fail.

    Fixes: dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic")
    Signed-off-by: David Rientjes
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Rientjes
     

23 Jan, 2019

1 commit

  • commit e7c87bd6cc4ec7b0ac1ed0a88a58f8206c577488 upstream.

    Syzkaller was able to construct a packet of negative length by
    redirecting from bpf_prog_test_run_skb with BPF_PROG_TYPE_LWT_XMIT:

    BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:345 [inline]
    BUG: KASAN: slab-out-of-bounds in skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
    BUG: KASAN: slab-out-of-bounds in __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
    Read of size 4294967282 at addr ffff8801d798009c by task syz-executor2/12942

    kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
    check_memory_region_inline mm/kasan/kasan.c:260 [inline]
    check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
    memcpy+0x23/0x50 mm/kasan/kasan.c:302
    memcpy include/linux/string.h:345 [inline]
    skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
    __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
    __pskb_copy include/linux/skbuff.h:1053 [inline]
    pskb_copy include/linux/skbuff.h:2904 [inline]
    skb_realloc_headroom+0xe7/0x120 net/core/skbuff.c:1539
    ipip6_tunnel_xmit net/ipv6/sit.c:965 [inline]
    sit_tunnel_xmit+0xe1b/0x30d0 net/ipv6/sit.c:1029
    __netdev_start_xmit include/linux/netdevice.h:4325 [inline]
    netdev_start_xmit include/linux/netdevice.h:4334 [inline]
    xmit_one net/core/dev.c:3219 [inline]
    dev_hard_start_xmit+0x295/0xc90 net/core/dev.c:3235
    __dev_queue_xmit+0x2f0d/0x3950 net/core/dev.c:3805
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3838
    __bpf_tx_skb net/core/filter.c:2016 [inline]
    __bpf_redirect_common net/core/filter.c:2054 [inline]
    __bpf_redirect+0x5cf/0xb20 net/core/filter.c:2061
    ____bpf_clone_redirect net/core/filter.c:2094 [inline]
    bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2066
    bpf_prog_41f2bcae09cd4ac3+0xb25/0x1000

    The generated test constructs a packet with mac header, network
    header, skb->data pointing to network header and skb->len 0.

    Redirecting to a sit0 through __bpf_redirect_no_mac pulls the
    mac length, even though skb->data already is at skb->network_header.
    bpf_prog_test_run_skb has already pulled it as LWT_XMIT !is_l2.

    Update the offset calculation to pull only if skb->data differs
    from skb->network_header, which is not true in this case.
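
    The arithmetic behind the negative length can be sketched as below. This
    is a toy model with the skb as a dict of offsets; it assumes the old code
    always pulled network_header - mac_header, which is how "pulls the mac
    length" is read here. The helper names are hypothetical:

```python
U32 = 0xFFFFFFFF


def redirect_no_mac(skb, pull):
    """Pull `pull` bytes from the front; len wraps like an unsigned int."""
    return {**skb, "data": skb["data"] + pull, "len": (skb["len"] - pull) & U32}


def pull_old(skb):
    return skb["network_header"] - skb["mac_header"]  # always the mac length


def pull_new(skb):
    return skb["network_header"] - skb["data"]  # zero if already pulled
```

    For the packet described above (14-byte mac header, data already at the
    network header, len 0) the old calculation wraps len to 4294967282, the
    size seen in the KASAN report, while the new one leaves it at 0.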

    The test itself can be run only from commit 1cf1cae963c2 ("bpf:
    introduce BPF_PROG_TEST_RUN command"), but the same type of packets
    with skb at network header could already be built from lwt xmit hooks,
    so this fix is more relevant to that commit.

    Also set the mac header on redirect from LWT_XMIT, as even after this
    change to __bpf_redirect_no_mac that field is expected to be set, but
    is not yet in ip_finish_output2.

    Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
    Reported-by: syzbot
    Signed-off-by: Willem de Bruijn
    Acked-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     

10 Jan, 2019

2 commits

  • [ Upstream commit 3a0ed3e9619738067214871e9cb826fa23b2ddb9 ]

    Al Viro mentioned that there is probably a race condition
    lurking in accesses of sk_stamp on 32-bit machines.

    sock->sk_stamp is of type ktime_t which is always an s64.
    On a 32 bit architecture, we might run into situations of
    unsafe access as the access to the field becomes non atomic.

    Use seqlocks for synchronization.
    This allows us to avoid using spinlocks for readers as
    readers do not need mutual exclusion.

    Another approach to solve this is to require sk_lock for all
    modifications of the timestamps. The current approach allows
    for timestamps to have their own lock: sk_stamp_lock.
    This allows for the patch to not compete with already
    existing critical sections, and side effects are limited
    to the paths in the patch.
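
    The seqlock pattern can be sketched as follows. This is a user-space toy,
    not the kernel primitive: the 64-bit stamp is split into two halves to
    mimic a non-atomic access on a 32-bit machine, and the class and method
    names are invented for this illustration:

```python
import threading


class StampedSock:
    """Toy seqlock guarding a 64-bit stamp stored as two 32-bit halves."""

    def __init__(self):
        self._seq = 0                        # even: stable, odd: write in progress
        self._write_lock = threading.Lock()  # writers still exclude each other
        self._hi = 0
        self._lo = 0

    def write_stamp(self, value):
        with self._write_lock:
            self._seq += 1  # becomes odd
            self._hi = value >> 32
            self._lo = value & 0xFFFFFFFF
            self._seq += 1  # becomes even again

    def read_stamp(self):
        # Readers take no lock; they retry if a writer interfered.
        while True:
            start = self._seq
            if start & 1:
                continue
            value = (self._hi << 32) | self._lo
            if self._seq == start:
                return value
```

    Readers never block writers, which is why the timestamp paths do not
    have to compete with existing critical sections.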

    The addition of the new field maintains the data locality
    optimizations from
    commit 9115e8cd2a0c ("net: reorganize struct sock for better data
    locality")

    Note that all the instances of the sk_stamp accesses
    are either through the ioctl or the syscall recvmsg.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Deepa Dinamani
     
  • [ Upstream commit 8e1da73acded4751a93d4166458a7e640f37d26c ]

    Add napi_disable routine in gro_cells_destroy since starting from
    commit c42858eaf492 ("gro_cells: remove spinlock protecting receive
    queues") gro_cell_poll and gro_cells_destroy can run concurrently on
    napi_skbs list producing a kernel Oops if the tunnel interface is
    removed while gro_cell_poll is running. The following Oops has been
    triggered removing a vxlan device while the interface is receiving
    traffic

    [ 5628.948853] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    [ 5628.949981] PGD 0 P4D 0
    [ 5628.950308] Oops: 0002 [#1] SMP PTI
    [ 5628.950748] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.20.0-rc6+ #41
    [ 5628.952940] RIP: 0010:gro_cell_poll+0x49/0x80
    [ 5628.955615] RSP: 0018:ffffc9000004fdd8 EFLAGS: 00010202
    [ 5628.956250] RAX: 0000000000000000 RBX: ffffe8ffffc08150 RCX: 0000000000000000
    [ 5628.957102] RDX: 0000000000000000 RSI: ffff88802356bf00 RDI: ffffe8ffffc08150
    [ 5628.957940] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000
    [ 5628.958803] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000040
    [ 5628.959661] R13: ffffe8ffffc08100 R14: 0000000000000000 R15: 0000000000000040
    [ 5628.960682] FS: 0000000000000000(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000
    [ 5628.961616] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 5628.962359] CR2: 0000000000000008 CR3: 000000000221c000 CR4: 00000000000006b0
    [ 5628.963188] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 5628.964034] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 5628.964871] Call Trace:
    [ 5628.965179] net_rx_action+0xf0/0x380
    [ 5628.965637] __do_softirq+0xc7/0x431
    [ 5628.966510] run_ksoftirqd+0x24/0x30
    [ 5628.966957] smpboot_thread_fn+0xc5/0x160
    [ 5628.967436] kthread+0x113/0x130
    [ 5628.968283] ret_from_fork+0x3a/0x50
    [ 5628.968721] Modules linked in:
    [ 5628.969099] CR2: 0000000000000008
    [ 5628.969510] ---[ end trace 9d9dedc7181661fe ]---
    [ 5628.970073] RIP: 0010:gro_cell_poll+0x49/0x80
    [ 5628.972965] RSP: 0018:ffffc9000004fdd8 EFLAGS: 00010202
    [ 5628.973611] RAX: 0000000000000000 RBX: ffffe8ffffc08150 RCX: 0000000000000000
    [ 5628.974504] RDX: 0000000000000000 RSI: ffff88802356bf00 RDI: ffffe8ffffc08150
    [ 5628.975462] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000
    [ 5628.976413] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000040
    [ 5628.977375] R13: ffffe8ffffc08100 R14: 0000000000000000 R15: 0000000000000040
    [ 5628.978296] FS: 0000000000000000(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000
    [ 5628.979327] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 5628.980044] CR2: 0000000000000008 CR3: 000000000221c000 CR4: 00000000000006b0
    [ 5628.980929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 5628.981736] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 5628.982409] Kernel panic - not syncing: Fatal exception in interrupt
    [ 5628.983307] Kernel Offset: disabled

    Fixes: c42858eaf492 ("gro_cells: remove spinlock protecting receive queues")
    Signed-off-by: Lorenzo Bianconi
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Lorenzo Bianconi
     

17 Dec, 2018

4 commits

  • [ Upstream commit 867d0ad476db89a1e8af3f297af402399a54eea5 ]

    Commit 04157469b7b8 ("net: Use static_key for XPS maps") introduced a
    static key for XPS, but the increments/decrements don't match.

    First, the static key's counter is incremented once for each queue, but
    only decremented once for a whole batch of queues, leading to large
    imbalances.

    Second, the xps_rxqs_needed key is decremented whenever we reset a batch
    of queues, whether they had any rxqs mapping or not, so that if we set up
    cpu-XPS on em1 and RXQS-XPS on em2, resetting the queues on em1 would
    decrement the xps_rxqs_needed key.

    This reworks the accounting scheme so that the xps_needed key is
    incremented only once for each type of XPS for all the queues on a
    device, and the xps_rxqs_needed key is incremented only once for all
    queues. This is sufficient to let us retrieve queues via
    get_xps_queue().

    This patch introduces a new reset_xps_maps(), which reinitializes and
    frees the appropriate map (xps_rxqs_map or xps_cpus_map), and drops a
    reference to the needed keys:
    - both xps_needed and xps_rxqs_needed, in case of rxqs maps,
    - only xps_needed, in case of CPU maps.

    Now, we also need to call reset_xps_maps() at the end of
    __netif_set_xps_queue() when there's no active map left, for example
    when writing '00000000,00000000' to all queues' xps_rxqs setting.
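
    The accounting scheme described above can be sketched in userspace C.
    This is only a model under stated assumptions: the static keys are
    replaced by plain counters, and struct model_dev, model_set_xps_queue()
    and model_reset_xps_maps() are hypothetical names, not the kernel's
    functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Plain counters standing in for the xps_needed / xps_rxqs_needed static keys. */
int model_xps_needed;
int model_xps_rxqs_needed;

enum model_xps_type { MODEL_XPS_CPUS = 0, MODEL_XPS_RXQS = 1 };

/* Hypothetical device: one "map present" flag per XPS type. */
struct model_dev {
    bool has_map[2];
};

/* Called once per queue; only the first call per device and type takes references. */
void model_set_xps_queue(struct model_dev *dev, enum model_xps_type type)
{
    if (dev->has_map[type])
        return;
    model_xps_needed++;
    if (type == MODEL_XPS_RXQS)
        model_xps_rxqs_needed++;
    dev->has_map[type] = true;
}

/* Drops exactly the references taken above, and only if such a map exists. */
void model_reset_xps_maps(struct model_dev *dev, enum model_xps_type type)
{
    if (!dev->has_map[type])
        return;
    if (type == MODEL_XPS_RXQS)
        model_xps_rxqs_needed--;
    model_xps_needed--;
    dev->has_map[type] = false;
}

/* cpu-XPS on em1, RXQS-XPS on em2, then reset em1.
 * Returns model_xps_needed * 10 + model_xps_rxqs_needed. */
int model_xps_demo(void)
{
    struct model_dev em1 = { { false, false } };
    struct model_dev em2 = { { false, false } };
    int q;

    model_xps_needed = 0;
    model_xps_rxqs_needed = 0;
    for (q = 0; q < 8; q++) {
        model_set_xps_queue(&em1, MODEL_XPS_CPUS);
        model_set_xps_queue(&em2, MODEL_XPS_RXQS);
    }
    model_reset_xps_maps(&em1, MODEL_XPS_CPUS);
    return model_xps_needed * 10 + model_xps_rxqs_needed;
}
```

    In this model, resetting em1 leaves the rxqs counter untouched, which is
    the behaviour the second point of the changelog asks for.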

    Fixes: 04157469b7b8 ("net: Use static_key for XPS maps")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit f28c020fb488e1a8b87469812017044bef88aa2b ]

    Before commit 80d19669ecd3 ("net: Refactor XPS for CPUs and Rx queues"),
    netif_reset_xps_queues() did netdev_queue_numa_node_write() for all the
    queues being reset. Now, this is only done when the "active" variable in
    clean_xps_maps() is false, i.e. when on all the CPUs, there's no active
    XPS mapping left.

    Fixes: 80d19669ecd3 ("net: Refactor XPS for CPUs and Rx queues")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit 688838934c231bb08f46db687e57f6d8bf82709c ]

    kmsan was able to trigger a kernel-infoleak using a gre device [1]

    nlmsg_populate_fdb_fill() has a hard coded assumption
    that dev->addr_len is ETH_ALEN, as normally guaranteed
    for ARPHRD_ETHER devices.

    A similar issue was fixed recently in commit da71577545a5
    ("rtnetlink: Disallow FDB configuration for non-Ethernet device")

    [1]
    BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:143 [inline]
    BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
    CPU: 0 PID: 6697 Comm: syz-executor310 Not tainted 4.20.0-rc3+ #95
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x32d/0x480 lib/dump_stack.c:113
    kmsan_report+0x12c/0x290 mm/kmsan/kmsan.c:683
    kmsan_internal_check_memory+0x32a/0xa50 mm/kmsan/kmsan.c:743
    kmsan_copy_to_user+0x78/0xd0 mm/kmsan/kmsan_hooks.c:634
    copyout lib/iov_iter.c:143 [inline]
    _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
    copy_to_iter include/linux/uio.h:143 [inline]
    skb_copy_datagram_iter+0x4e2/0x1070 net/core/datagram.c:431
    skb_copy_datagram_msg include/linux/skbuff.h:3316 [inline]
    netlink_recvmsg+0x6f9/0x19d0 net/netlink/af_netlink.c:1975
    sock_recvmsg_nosec net/socket.c:794 [inline]
    sock_recvmsg+0x1d1/0x230 net/socket.c:801
    ___sys_recvmsg+0x444/0xae0 net/socket.c:2278
    __sys_recvmsg net/socket.c:2327 [inline]
    __do_sys_recvmsg net/socket.c:2337 [inline]
    __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
    __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
    do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x441119
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 db 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fffc7f008a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002f
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441119
    RDX: 0000000000000040 RSI: 00000000200005c0 RDI: 0000000000000003
    RBP: 00000000006cc018 R08: 0000000000000100 R09: 0000000000000100
    R10: 0000000000000100 R11: 0000000000000207 R12: 0000000000402080
    R13: 0000000000402110 R14: 0000000000000000 R15: 0000000000000000

    Uninit was stored to memory at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
    kmsan_save_stack mm/kmsan/kmsan.c:261 [inline]
    kmsan_internal_chain_origin+0x13d/0x240 mm/kmsan/kmsan.c:469
    kmsan_memcpy_memmove_metadata+0x1a9/0xf70 mm/kmsan/kmsan.c:344
    kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:362
    __msan_memcpy+0x61/0x70 mm/kmsan/kmsan_instr.c:162
    __nla_put lib/nlattr.c:744 [inline]
    nla_put+0x20a/0x2d0 lib/nlattr.c:802
    nlmsg_populate_fdb_fill+0x444/0x810 net/core/rtnetlink.c:3466
    nlmsg_populate_fdb net/core/rtnetlink.c:3775 [inline]
    ndo_dflt_fdb_dump+0x73a/0x960 net/core/rtnetlink.c:3807
    rtnl_fdb_dump+0x1318/0x1cb0 net/core/rtnetlink.c:3979
    netlink_dump+0xc79/0x1c90 net/netlink/af_netlink.c:2244
    __netlink_dump_start+0x10c4/0x11d0 net/netlink/af_netlink.c:2352
    netlink_dump_start include/linux/netlink.h:216 [inline]
    rtnetlink_rcv_msg+0x141b/0x1540 net/core/rtnetlink.c:4910
    netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
    __sys_sendmsg net/socket.c:2154 [inline]
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
    do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
    kmsan_internal_poison_shadow+0x6d/0x130 mm/kmsan/kmsan.c:170
    kmsan_kmalloc+0xa1/0x100 mm/kmsan/kmsan_hooks.c:186
    __kmalloc+0x14c/0x4d0 mm/slub.c:3825
    kmalloc include/linux/slab.h:551 [inline]
    __hw_addr_create_ex net/core/dev_addr_lists.c:34 [inline]
    __hw_addr_add_ex net/core/dev_addr_lists.c:80 [inline]
    __dev_mc_add+0x357/0x8a0 net/core/dev_addr_lists.c:670
    dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687
    ip_mc_filter_add net/ipv4/igmp.c:1128 [inline]
    igmp_group_added+0x4d4/0xb80 net/ipv4/igmp.c:1311
    __ip_mc_inc_group+0xea9/0xf70 net/ipv4/igmp.c:1444
    ip_mc_inc_group net/ipv4/igmp.c:1453 [inline]
    ip_mc_up+0x1c3/0x400 net/ipv4/igmp.c:1775
    inetdev_event+0x1d03/0x1d80 net/ipv4/devinet.c:1522
    notifier_call_chain kernel/notifier.c:93 [inline]
    __raw_notifier_call_chain kernel/notifier.c:394 [inline]
    raw_notifier_call_chain+0x13d/0x240 kernel/notifier.c:401
    __dev_notify_flags+0x3da/0x860 net/core/dev.c:1733
    dev_change_flags+0x1ac/0x230 net/core/dev.c:7569
    do_setlink+0x165f/0x5ea0 net/core/rtnetlink.c:2492
    rtnl_newlink+0x2ad7/0x35a0 net/core/rtnetlink.c:3111
    rtnetlink_rcv_msg+0x1148/0x1540 net/core/rtnetlink.c:4947
    netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
    __sys_sendmsg net/socket.c:2154 [inline]
    __do_sys_sendmsg net/socket.c:2163 [inline]
    __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
    do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    Bytes 36-37 of 105 are uninitialized
    Memory access of size 105 starts at ffff88819686c000
    Data copied to user address 0000000020000380

    Fixes: d83b06036048 ("net: add fdb generic dump routine")
    Signed-off-by: Eric Dumazet
    Cc: John Fastabend
    Cc: Ido Schimmel
    Cc: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 22f6bbb7bcfcef0b373b0502a7ff390275c575dd ]

    list_del() leaves the skb->next pointer poisoned, which can then lead to
    a crash in e.g. OVS forwarding. For example, setting up an OVS VXLAN
    forwarding bridge on sfc as per:

    ========
    $ ovs-vsctl show
    5dfd9c47-f04b-4aaa-aa96-4fbb0a522a30
    Bridge "br0"
    Port "br0"
    Interface "br0"
    type: internal
    Port "enp6s0f0"
    Interface "enp6s0f0"
    Port "vxlan0"
    Interface "vxlan0"
    type: vxlan
    options: {key="1", local_ip="10.0.0.5", remote_ip="10.0.0.4"}
    ovs_version: "2.5.0"
    ========
    (where 10.0.0.5 is an address on enp6s0f1)
    and sending traffic across it will lead to the following panic:
    ========
    general protection fault: 0000 [#1] SMP PTI
    CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc3-ehc+ #701
    Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013
    RIP: 0010:dev_hard_start_xmit+0x38/0x200
    Code: 53 48 89 fb 48 83 ec 20 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 ab 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 44 24 10 8b 33 48 c7 03 00 00 00 00 48 8b 05 c7 d1 b3 00 4d 85 f6 0f 95
    RSP: 0018:ffff888627b437e0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff88862279c000
    RDX: ffff888614a342c0 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffff888618a88000 R08: 0000000000000001 R09: 00000000000003e8
    R10: 0000000000000000 R11: ffff888614a34140 R12: 0000000000000000
    R13: 0000000000000062 R14: dead000000000100 R15: ffff888616430000
    FS: 0000000000000000(0000) GS:ffff888627b40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f6d2bc6d000 CR3: 000000000200a000 CR4: 00000000000006e0
    Call Trace:

    __dev_queue_xmit+0x623/0x870
    ? masked_flow_lookup+0xf7/0x220 [openvswitch]
    ? ep_poll_callback+0x101/0x310
    do_execute_actions+0xaba/0xaf0 [openvswitch]
    ? __wake_up_common+0x8a/0x150
    ? __wake_up_common_lock+0x87/0xc0
    ? queue_userspace_packet+0x31c/0x5b0 [openvswitch]
    ovs_execute_actions+0x47/0x120 [openvswitch]
    ovs_dp_process_packet+0x7d/0x110 [openvswitch]
    ovs_vport_receive+0x6e/0xd0 [openvswitch]
    ? dst_alloc+0x64/0x90
    ? rt_dst_alloc+0x50/0xd0
    ? ip_route_input_slow+0x19a/0x9a0
    ? __udp_enqueue_schedule_skb+0x198/0x1b0
    ? __udp4_lib_rcv+0x856/0xa30
    ? __udp4_lib_rcv+0x856/0xa30
    ? cpumask_next_and+0x19/0x20
    ? find_busiest_group+0x12d/0xcd0
    netdev_frame_hook+0xce/0x150 [openvswitch]
    __netif_receive_skb_core+0x205/0xae0
    __netif_receive_skb_list_core+0x11e/0x220
    netif_receive_skb_list+0x203/0x460
    ? __efx_rx_packet+0x335/0x5e0 [sfc]
    efx_poll+0x182/0x320 [sfc]
    net_rx_action+0x294/0x3c0
    __do_softirq+0xca/0x297
    irq_exit+0xa6/0xb0
    do_IRQ+0x54/0xd0
    common_interrupt+0xf/0xf

    ========
    So, in all listified-receive handling, instead pull skbs off the lists with
    skb_list_del_init().
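
    The difference between the two removal styles can be illustrated with a
    small userspace list. This is a sketch under assumptions: struct
    model_node and the model_* helpers are hypothetical, and two sentinel
    objects stand in for the kernel's poison pointer values.

```c
#include <assert.h>
#include <stddef.h>

struct model_node {
    struct model_node *next, *prev;
};

/* Sentinel objects standing in for the kernel's poison pointer values. */
struct model_node model_poison_next, model_poison_prev;

void model_list_init(struct model_node *head)
{
    head->next = head;
    head->prev = head;
}

void model_list_add_tail(struct model_node *n, struct model_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

void model_unlink(struct model_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* list_del() style: unlink, then poison both pointers. */
void model_list_del(struct model_node *n)
{
    model_unlink(n);
    n->next = &model_poison_next;
    n->prev = &model_poison_prev;
}

/* skb_list_del_init() style: unlink, then leave next as NULL. */
void model_list_del_init_next(struct model_node *n)
{
    model_unlink(n);
    n->next = NULL;
}

/* A consumer that walks ->next until NULL can only accept next == NULL. */
int model_safe_to_chain_walk(const struct model_node *n)
{
    return n->next == NULL;
}

/* Returns 1 if the removed node is safe to hand to such a consumer. */
int model_list_demo(int use_del_init)
{
    struct model_node head, a, b;

    model_list_init(&head);
    model_list_add_tail(&a, &head);
    model_list_add_tail(&b, &head);
    if (use_del_init)
        model_list_del_init_next(&a);
    else
        model_list_del(&a);
    return model_safe_to_chain_walk(&a);
}
```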

    Fixes: 9af86f933894 ("net: core: fix use-after-free in __netif_receive_skb_list_core")
    Fixes: 7da517a3bc52 ("net: core: Another step of skb receive list processing")
    Fixes: a4ca8b7df73c ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()")
    Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
    Signed-off-by: Edward Cree
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Edward Cree
     

06 Dec, 2018

2 commits

  • [ Upstream commit b5dd186d10ba59e6b5ba60e42b3b083df56df6f3 ]

    When a packet is trapped and the corresponding SKB marked as
    already-forwarded, it retains this marking even after it is forwarded
    across veth links into another bridge. There, since it ingresses the
    bridge over veth, which doesn't have offload_fwd_mark, it triggers a
    warning in nbp_switchdev_frame_mark().

    Then nbp_switchdev_allowed_egress() decides not to allow egress from
    this bridge through another veth, because the SKB is already marked, and
    the mark (of 0) of course matches. Thus the packet is incorrectly
    blocked.

    Solve by resetting offload_fwd_mark in skb_scrub_packet(). That
    function is called from tunnels and also from veth, and thus catches the
    cases where traffic is forwarded between bridges and transformed in a
    way that invalidates the marking.
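
    A minimal model of the blocked-egress case, assuming simplified stand-ins
    for the SKB and bridge port (struct model_skb, struct model_port and the
    model_* functions are hypothetical, and the egress test is only an
    approximation of the behaviour described above):

```c
#include <assert.h>
#include <stdbool.h>

struct model_skb {
    bool offload_fwd_mark; /* "already forwarded by hardware" flag */
    int cb_mark;           /* mark recorded on bridge ingress */
};

struct model_port {
    int offload_fwd_mark;  /* 0 for a port without offload, such as veth */
};

/* Egress is refused when the packet is marked and the marks match. */
bool model_allowed_egress(const struct model_port *p, const struct model_skb *skb)
{
    return !skb->offload_fwd_mark || skb->cb_mark != p->offload_fwd_mark;
}

/* The fix in this model: scrubbing clears the stale flag. */
void model_scrub_packet(struct model_skb *skb)
{
    skb->offload_fwd_mark = false;
}

/* A trapped, marked packet crosses veth into a second bridge. */
bool model_fwd_mark_demo(bool scrub)
{
    struct model_skb skb = { true, 0 };
    struct model_port veth = { 0 };

    if (scrub)
        model_scrub_packet(&skb);
    return model_allowed_egress(&veth, &skb);
}
```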

    Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
    Fixes: abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field")
    Signed-off-by: Petr Machata
    Suggested-by: Ido Schimmel
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Petr Machata
     
  • [ Upstream commit 605108acfe6233b72e2f803aa1cb59a2af3001ca ]

    Eric noted that with UDP GRO and NAPI timeout, we could keep a single
    UDP packet inside the GRO hash forever, if the related NAPI instance
    calls napi_gro_complete() at a higher frequency than the NAPI timeout.
    Willem noted that even TCP packets could be trapped there, till the
    next retransmission.
    This patch tries to address the issue, flushing the old packets -
    those with a NAPI_GRO_CB age before the current jiffy - before scheduling
    the NAPI timeout. The rationale is that such a timeout should be
    well below a jiffy and we are not flushing packets eligible for sane GRO.
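
    The "flush only the old packets" rule can be sketched as follows. This is
    a userspace model, not the kernel code: struct model_held_pkt and
    model_gro_flush() are hypothetical, and the age field plays the role of
    the NAPI_GRO_CB age mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_held_pkt {
    unsigned long age; /* jiffy at which the packet entered the GRO hash */
    bool held;
};

/* Flush held packets; with flush_old set, spare those queued in the
 * current jiffy. Returns the number of packets flushed. */
size_t model_gro_flush(struct model_held_pkt *pkts, size_t n,
                       unsigned long now, bool flush_old)
{
    size_t i, flushed = 0;

    for (i = 0; i < n; i++) {
        if (!pkts[i].held)
            continue;
        if (flush_old && pkts[i].age == now)
            continue; /* still eligible for aggregation */
        pkts[i].held = false;
        flushed++;
    }
    return flushed;
}

/* Three held packets, one of them from the previous jiffy. */
size_t model_gro_demo(bool flush_old)
{
    struct model_held_pkt pkts[3] = {
        { 100, true }, { 99, true }, { 100, true }
    };

    return model_gro_flush(pkts, 3, 100, flush_old);
}
```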

    v1 -> v2:
    - clarified the commit message and comment

    RFC -> v1:
    - added 'Fixes' tags, cleaned up the wording.

    Reported-by: Eric Dumazet
    Fixes: 3b47d30396ba ("net: gro: add a per device gro flush timer")
    Signed-off-by: Paolo Abeni
    Acked-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

01 Dec, 2018

1 commit

  • commit 8873c064d1de579ea23412a6d3eee972593f142b upstream.

    syzkaller was able to hit the WARN_ON(sock_owned_by_user(sk));
    in tcp_close()

    While a socket is being closed, it is very possible other
    threads find it in rtnetlink dump.

    tcp_get_info() will acquire the socket lock for a short amount
    of time (slow = lock_sock_fast(sk)/unlock_sock_fast(sk, slow);),
    enough to trigger the warning.

    Fixes: 67db3e4bfbc9 ("tcp: no longer hold ehash lock while calling tcp_get_info()")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

23 Nov, 2018

2 commits

  • [ Upstream commit 33d9a2c72f086cbf1087b2fd2d1a15aa9df14a7f ]

    eth_type_trans() assumes initial value for skb->pkt_type
    is PACKET_HOST.

    This is indeed the value right after a fresh skb allocation.

    However, it is possible that GRO merged a packet with a different
    value (like PACKET_OTHERHOST in case macvlan is used), so
    we need to make sure napi->skb will have pkt_type set back to
    PACKET_HOST.

    Otherwise, valid packets might be dropped by the stack because
    their pkt_type is not PACKET_HOST.
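
    A small model of the stale pkt_type problem, under the assumption that the
    classification step only overrides pkt_type for frames not addressed to
    the host. The model_* names and constants are hypothetical stand-ins, not
    kernel definitions.

```c
#include <assert.h>

enum { MODEL_PACKET_HOST = 0, MODEL_PACKET_OTHERHOST = 3 };

struct model_skb {
    int pkt_type;
};

/* Stand-in for the classification step: it relies on pkt_type already
 * being HOST and only changes it for frames not addressed to us. */
void model_eth_type_trans(struct model_skb *skb, int addressed_to_us)
{
    if (!addressed_to_us)
        skb->pkt_type = MODEL_PACKET_OTHERHOST;
}

/* Recycling an skb; the fix resets pkt_type to its fresh-allocation value. */
void model_reuse_skb(struct model_skb *skb, int apply_fix)
{
    if (apply_fix)
        skb->pkt_type = MODEL_PACKET_HOST;
}

/* An skb left as OTHERHOST is recycled for a frame addressed to us.
 * Returns the pkt_type the stack would then see. */
int model_reuse_demo(int apply_fix)
{
    struct model_skb skb = { MODEL_PACKET_OTHERHOST };

    model_reuse_skb(&skb, apply_fix);
    model_eth_type_trans(&skb, 1);
    return skb.pkt_type;
}
```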

    napi_reuse_skb() was added in commit 96e93eab2033 ("gro: Add
    internal interfaces for VLAN"), but this bug has always
    been there.

    Fixes: 96e93eab2033 ("gro: Add internal interfaces for VLAN")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 62230715fd2453b3ba948c9d83cfb3ada9169169 ]

    Only the first fragment carries the sport/dport information,
    not the following ones.

    If we want a consistent hash for all fragments, we need to
    ignore ports even for the first fragment.
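
    The rule can be sketched with a toy dissector. Everything here is an
    assumption made for illustration: struct model_pkt, struct
    model_flow_keys and model_dissect() are hypothetical, and the hash is an
    arbitrary FNV-style mix, not the kernel's flow hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_flow_keys {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct model_pkt {
    uint32_t saddr, daddr;
    uint16_t sport, dport; /* only meaningful when the L4 header is present */
    bool is_fragment;
    bool first_fragment;
};

/* Fill the keys; with the fix, ports are skipped for every fragment. */
void model_dissect(const struct model_pkt *p, bool ignore_ports_on_first_frag,
                   struct model_flow_keys *k)
{
    bool has_ports = !p->is_fragment ||
                     (p->first_fragment && !ignore_ports_on_first_frag);

    k->saddr = p->saddr;
    k->daddr = p->daddr;
    k->sport = has_ports ? p->sport : 0;
    k->dport = has_ports ? p->dport : 0;
}

/* Arbitrary illustrative hash over the keys. */
uint32_t model_flow_hash(const struct model_flow_keys *k)
{
    uint32_t h = 2166136261u;

    h = (h ^ k->saddr) * 16777619u;
    h = (h ^ k->daddr) * 16777619u;
    h = (h ^ k->sport) * 16777619u;
    h = (h ^ k->dport) * 16777619u;
    return h;
}

/* Do the first and a later fragment of one datagram get identical keys? */
bool model_frag_keys_match(bool fix)
{
    struct model_pkt first = { 1, 2, 1234, 53, true, true };
    struct model_pkt later = { 1, 2, 0, 0, true, false };
    struct model_flow_keys a, b;

    model_dissect(&first, fix, &a);
    model_dissect(&later, fix, &b);
    return a.saddr == b.saddr && a.daddr == b.daddr &&
           a.sport == b.sport && a.dport == b.dport;
}

/* With the fix applied, do both fragments hash to the same value? */
bool model_frag_hash_match_with_fix(void)
{
    struct model_pkt first = { 1, 2, 1234, 53, true, true };
    struct model_pkt later = { 1, 2, 0, 0, true, false };
    struct model_flow_keys a, b;

    model_dissect(&first, true, &a);
    model_dissect(&later, true, &b);
    return model_flow_hash(&a) == model_flow_hash(&b);
}
```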

    This bug is visible for IPv6 traffic, if incoming fragments
    do not have a flow label, since skb_get_hash() will give
    different results for first fragment and following ones.

    It is also visible if any routing rule wants dissection
    and sport or dport.

    See commit 5e5d6fed3741 ("ipv6: route: dissect flow
    in input path if fib rules need it") for details.

    [edumazet] rewrote the changelog completely.

    Fixes: 06635a35d13d ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
    Signed-off-by: 배석진
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    배석진
     

14 Nov, 2018

1 commit

  • [ Upstream commit a90e90b7d55e789c71d85b946ffb5c1ab2f137ca ]

    We have seen a customer complaining about soft lockups on a !PREEMPT
    kernel config with a 4.4-based kernel:

    [1072141.435366] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [systemd:1]
    [1072141.444090] Modules linked in: mpt3sas raid_class binfmt_misc af_packet 8021q garp mrp stp llc xfs libcrc32c bonding iscsi_ibft iscsi_boot_sysfs msr ext4 crc16 jbd2 mbcache cdc_ether usbnet mii joydev hid_generic usbhid intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif mgag200 i2c_algo_bit ttm ipmi_devintf drbg ixgbe drm_kms_helper vxlan ansi_cprng ip6_udp_tunnel drm aesni_intel udp_tunnel aes_x86_64 iTCO_wdt syscopyarea ptp xhci_pci lrw iTCO_vendor_support pps_core gf128mul ehci_pci glue_helper sysfillrect mdio pcspkr sb_edac ablk_helper cryptd ehci_hcd sysimgblt xhci_hcd fb_sys_fops edac_core mei_me lpc_ich ses usbcore enclosure dca mfd_core ipmi_si mei i2c_i801 scsi_transport_sas usb_common ipmi_msghandler shpchp fjes wmi processor button acpi_pad btrfs xor raid6_pq sd_mod crc32c_intel megaraid_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod md_mod autofs4
    [1072141.444146] Supported: Yes
    [1072141.444149] CPU: 21 PID: 1 Comm: systemd Not tainted 4.4.121-92.80-default #1
    [1072141.444150] Hardware name: LENOVO Lenovo System x3650 M5 -[5462P4U]- -[5462P4U]-/01GR451, BIOS -[TCE136H-2.70]- 06/13/2018
    [1072141.444151] task: ffff880191bd0040 ti: ffff880191bd4000 task.ti: ffff880191bd4000
    [1072141.444153] RIP: 0010:[] [] update_classid_sock+0x29/0x40
    [1072141.444157] RSP: 0018:ffff880191bd7d58 EFLAGS: 00000286
    [1072141.444158] RAX: ffff883b177cb7c0 RBX: 0000000000000000 RCX: 0000000000000000
    [1072141.444159] RDX: 00000000000009c7 RSI: ffff880191bd7d5c RDI: ffff8822e29bb200
    [1072141.444160] RBP: ffff883a72230980 R08: 0000000000000101 R09: 0000000000000000
    [1072141.444161] R10: 0000000000000008 R11: f000000000000000 R12: ffffffff815229d0
    [1072141.444162] R13: 0000000000000000 R14: ffff881fd0a47ac0 R15: ffff880191bd7f28
    [1072141.444163] FS: 00007f3e2f1eb8c0(0000) GS:ffff882000340000(0000) knlGS:0000000000000000
    [1072141.444164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [1072141.444165] CR2: 00007f3e2f200000 CR3: 0000001ffea4e000 CR4: 00000000001606f0
    [1072141.444166] Stack:
    [1072141.444166] ffffffa800000246 00000000000009c7 ffffffff8121d583 ffff8818312a05c0
    [1072141.444168] ffff8818312a1100 ffff880197c3b280 ffff881861422858 ffffffffffffffea
    [1072141.444170] ffffffff81522b1c ffffffff81d0ca20 ffff8817fa17b950 ffff883fdd8121e0
    [1072141.444171] Call Trace:
    [1072141.444179] [] iterate_fd+0x53/0x80
    [1072141.444182] [] write_classid+0x4c/0x80
    [1072141.444187] [] cgroup_file_write+0x9b/0x100
    [1072141.444193] [] kernfs_fop_write+0x11b/0x150
    [1072141.444198] [] __vfs_write+0x26/0x100
    [1072141.444201] [] vfs_write+0x9d/0x190
    [1072141.444203] [] SyS_write+0x42/0xa0
    [1072141.444207] [] entry_SYSCALL_64_fastpath+0x1e/0xca
    [1072141.445490] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xca

    If a cgroup has many tasks with many open file descriptors then we would
    end up in a large loop without any rescheduling point throughout the
    operation. Add cond_resched once per task.
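
    A rough userspace analogue of the loop shape, assuming POSIX
    sched_yield() as a stand-in for cond_resched() (the two are not
    equivalent; this only shows where the rescheduling point sits). struct
    model_task and model_update_all() are hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

struct model_task {
    size_t nr_fds; /* number of open file descriptors to visit */
};

/* Visit every fd of every task, yielding once per task rather than
 * once per fd. Returns the number of fds visited. */
size_t model_update_all(const struct model_task *tasks, size_t nr_tasks,
                        size_t *yields)
{
    size_t i, fd, visited = 0;

    *yields = 0;
    for (i = 0; i < nr_tasks; i++) {
        for (fd = 0; fd < tasks[i].nr_fds; fd++)
            visited++;   /* per-fd work would go here */
        sched_yield();   /* rescheduling point, once per task */
        (*yields)++;
    }
    return visited;
}

size_t model_demo_visited(void)
{
    const struct model_task tasks[3] = { { 3 }, { 0 }, { 5 } };
    size_t yields;

    return model_update_all(tasks, 3, &yields);
}

size_t model_demo_yields(void)
{
    const struct model_task tasks[3] = { { 3 }, { 0 }, { 5 } };
    size_t yields;

    model_update_all(tasks, 3, &yields);
    return yields;
}
```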

    Signed-off-by: Michal Hocko
    Signed-off-by: Tejun Heo
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
     

04 Nov, 2018

4 commits

  • [ Upstream commit ece23711dd956cd5053c9cb03e9fe0668f9c8894 ]

    Just like with normal GRO processing, we have to initialize
    skb->next to NULL when we unlink overflow packets from the
    GRO hash lists.

    Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
    Reported-by: Oleksandr Natalenko
    Tested-by: Oleksandr Natalenko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David S. Miller
     
  • [ Upstream commit da71577545a52be3e0e9225a946e5fd79cfab015 ]

    When an FDB entry is configured, the address is validated to have the
    length of an Ethernet address, but the device for which the address is
    configured can be of any type.

    The above can result in the use of uninitialized memory when the address
    is later compared against existing addresses since 'dev->addr_len' is
    used and it may be greater than ETH_ALEN, as with ip6tnl devices.

    Fix this by making sure that FDB entries are only configured for
    Ethernet devices.
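
    The shape of such a check can be sketched like this. It is a simplified
    model, not the patch itself: struct model_netdev, model_fdb_add() and the
    MODEL_* constants are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <errno.h>

enum {
    MODEL_ARPHRD_ETHER = 1,     /* illustrative device type values */
    MODEL_ARPHRD_TUNNEL6 = 769,
    MODEL_ETH_ALEN = 6
};

struct model_netdev {
    int type;
    int addr_len;
};

/* Accept an FDB entry only if the address has Ethernet length AND the
 * device itself is an Ethernet device. */
int model_fdb_add(const struct model_netdev *dev, int lladdr_len)
{
    if (lladdr_len != MODEL_ETH_ALEN)
        return -EINVAL;
    if (dev->type != MODEL_ARPHRD_ETHER)
        return -EINVAL; /* addr_len may exceed ETH_ALEN on such devices */
    return 0;
}

int model_fdb_demo_ether(void)
{
    struct model_netdev eth = { MODEL_ARPHRD_ETHER, 6 };

    return model_fdb_add(&eth, 6);
}

int model_fdb_demo_tunnel(void)
{
    struct model_netdev ip6tnl = { MODEL_ARPHRD_TUNNEL6, 16 };

    return model_fdb_add(&ip6tnl, 6);
}
```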

    BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
    CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x14b/0x190 lib/dump_stack.c:113
    kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
    __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
    memcmp+0x11d/0x180 lib/string.c:863
    dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
    ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
    rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
    rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
    netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
    rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
    netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
    netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
    netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
    __sys_sendmsg net/socket.c:2152 [inline]
    __do_sys_sendmsg net/socket.c:2161 [inline]
    __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
    do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x440ee9
    Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
    48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff
    ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
    RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
    R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
    R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
    kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
    kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
    kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2718 [inline]
    __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
    __kmalloc_reserve net/core/skbuff.c:138 [inline]
    __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
    alloc_skb include/linux/skbuff.h:996 [inline]
    netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
    netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
    __sys_sendmsg net/socket.c:2152 [inline]
    __do_sys_sendmsg net/socket.c:2161 [inline]
    __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
    __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
    do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    v2:
    * Make error message more specific (David)

    Fixes: 090096bf3db1 ("net: generic fdb support for drivers without ndo_fdb_")
    Signed-off-by: Ido Schimmel
    Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
    Cc: Vlad Yasevich
    Cc: David Ahern
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ido Schimmel
     
  • [ Upstream commit 89ab066d4229acd32e323f1569833302544a4186 ]

    This reverts commit dd979b4df817e9976f18fb6f9d134d6bc4a3c317.

    This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an
    internal TCP socket for the initial handshake with the remote peer.
    Whenever the SMC connection can not be established this TCP socket is
    used as a fallback. All socket operations on the SMC socket are then
    forwarded to the TCP socket. In case of poll, the file->private_data
    pointer references the SMC socket because the TCP socket has no file
    assigned. This causes tcp_poll to wait on the wrong socket.

    Signed-off-by: Karsten Graul
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Karsten Graul
     
  • [ Upstream commit db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 ]

    Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
    incorrect for any packet that has an incorrect checksum value.

    udp4/6_csum_init() will both make a call to
    __skb_checksum_validate_complete() to initialize/validate the csum
    field when receiving a CHECKSUM_COMPLETE packet. When this packet
    fails validation, skb->csum will be overwritten with the pseudoheader
    checksum so the packet can be fully validated by software, but the
    skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
    the stack can later warn the user about their hardware spewing bad
    checksums. Unfortunately, leaving the SKB in this state can cause
    problems later on in the checksum calculation.

    Since the packet is still marked as CHECKSUM_COMPLETE,
    udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
    from skb->csum instead of adding it, leaving us with a garbage value
    in that field. Once we try to copy the packet to userspace in the
    udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
    to checksum the packet data and add it in the garbage skb->csum value
    to perform our final validation check.

    Since the value we're validating is not the proper checksum, it's possible
    that the folded value could come out to 0, causing us not to drop the
    packet. Instead, we believe that the packet was checksummed incorrectly
    by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
    to warn the user with netdev_rx_csum_fault(skb->dev);

    Unfortunately, since this is the UDP path, skb->dev has been overwritten
    by skb->dev_scratch and is no longer a valid pointer, so we end up
    reading invalid memory.

    This patch addresses this problem in two ways:
    1) Do not use the dev pointer when calling netdev_rx_csum_fault()
    from skb_copy_and_csum_datagram_msg(). Since this gets called
    from the UDP path where skb->dev has been overwritten, we have
    no way of knowing if the pointer is still valid. Also for the
    sake of consistency with the other uses of
    netdev_rx_csum_fault(), don't attempt to call it if the
    packet was checksummed by software.

    2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
    If we receive a packet that's CHECKSUM_COMPLETE that fails
    verification (i.e. skb->csum_valid == 0), check who performed
    the calculation. It's possible that the checksum was done in
    software by the network stack earlier (such as Netfilter's
    CONNTRACK module), and if that says the checksum is bad,
    we can drop the packet immediately instead of waiting until
    we try and copy it to userspace. Otherwise, we need to
    mark the SKB as CHECKSUM_NONE, since the skb->csum field
    no longer contains the full packet checksum after the
    call to __skb_checksum_validate_complete().
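
    The decision described in point 2) can be sketched as a small function.
    This is a model under assumptions: struct model_skb and its fields are
    simplified stand-ins, and model_udp_csum_fixup() is a hypothetical name
    for the extra step at the end of checksum initialization.

```c
#include <assert.h>
#include <stdbool.h>

enum model_ip_summed { MODEL_CHECKSUM_NONE, MODEL_CHECKSUM_COMPLETE };

struct model_skb {
    enum model_ip_summed ip_summed;
    bool csum_valid;       /* result of the validation attempt */
    bool csum_complete_sw; /* checksum was computed by software, not hardware */
};

/* Returns 1 if the packet should be dropped immediately, 0 otherwise. */
int model_udp_csum_fixup(struct model_skb *skb)
{
    if (skb->ip_summed == MODEL_CHECKSUM_COMPLETE && !skb->csum_valid) {
        /* Software already computed the checksum and it is bad. */
        if (skb->csum_complete_sw)
            return 1;
        /* skb->csum no longer holds the full packet checksum. */
        skb->ip_summed = MODEL_CHECKSUM_NONE;
    }
    return 0;
}

/* Failed validation, checksum computed in software: expect a drop. */
int model_csum_demo_sw(void)
{
    struct model_skb skb = { MODEL_CHECKSUM_COMPLETE, false, true };

    return model_udp_csum_fixup(&skb);
}

/* Failed validation, checksum from hardware: expect a downgrade to NONE. */
int model_csum_demo_hw_downgraded(void)
{
    struct model_skb skb = { MODEL_CHECKSUM_COMPLETE, false, false };
    int drop = model_udp_csum_fixup(&skb);

    return drop == 0 && skb.ip_summed == MODEL_CHECKSUM_NONE;
}
```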

    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Fixes: c84d949057ca ("udp: copy skb->truesize in the first cache line")
    Cc: Sam Kumar
    Cc: Eric Dumazet
    Signed-off-by: Sean Tranchetti
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sean Tranchetti
     

21 Oct, 2018

1 commit

  • This reverts commit 8e326289e3069dfc9fa9c209924668dd031ab8ef.

    This patch results in unnecessary netlink notification when one
    tries to delete a neigh entry already in NUD_FAILED state. Found
    this with a buggy app that tries to delete a NUD_FAILED entry
    repeatedly. While the notification issue can be fixed with more
    checks, adding more complexity here seems unnecessary. Also,
    recent tests with other changes in the neighbour code have
    shown that the INCOMPLETE and PROBE checks are good enough for
    the original issue.

    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     

20 Oct, 2018

2 commits

  • We've been getting checksum errors involving small UDP packets, usually
    59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
    has been complaining that HW is providing bad checksums. Turns out the
    problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98d1bb
    ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").

    The source of the problem is that when the bytes we are trimming start
    at an odd address, as in the case of the 1 padding byte above,
    skb_checksum() returns a byte-swapped value. We cannot just combine this
    with skb->csum using csum_sub(). We need to use csum_block_sub() here
    that takes into account the parity of the start address and handles the
    swapping.

    Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().
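
    The parity effect can be reproduced with a toy 16-bit one's-complement
    checksum. This is a sketch, not the kernel implementation: the model_*
    helpers are hypothetical, work on folded 16-bit sums, and read bytes in
    big-endian pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

uint16_t model_fold(uint32_t s)
{
    while (s >> 16)
        s = (s & 0xffff) + (s >> 16);
    return (uint16_t)s;
}

/* One's-complement sum of a buffer; an odd trailing byte is zero-padded. */
uint16_t model_csum(const uint8_t *p, size_t len)
{
    uint32_t s = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        s += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (len & 1)
        s += (uint32_t)(p[len - 1] << 8);
    return model_fold(s);
}

uint16_t model_csum_sub(uint16_t a, uint16_t b)
{
    return model_fold((uint32_t)a + (uint16_t)~b);
}

/* Subtract a block's checksum, swapping its bytes when the block
 * started at an odd offset within the summed data. */
uint16_t model_csum_block_sub(uint16_t csum, uint16_t csum2, size_t offset)
{
    if (offset & 1)
        csum2 = (uint16_t)((csum2 << 8) | (csum2 >> 8));
    return model_csum_sub(csum, csum2);
}

/* Five payload bytes followed by one non-zero padding byte. */
const uint8_t model_buf[6] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0xAA };

/* Checksum after trimming the padding byte, via either subtraction. */
uint16_t model_trim_demo(int use_block_sub)
{
    size_t keep = 5;
    uint16_t full = model_csum(model_buf, 6);
    uint16_t tail = model_csum(model_buf + keep, 6 - keep);

    return use_block_sub ? model_csum_block_sub(full, tail, keep)
                         : model_csum_sub(full, tail);
}

/* Reference value: checksum computed directly over the kept bytes. */
uint16_t model_trim_expected(void)
{
    return model_csum(model_buf, 5);
}
```

    In this model only the offset-aware subtraction reproduces the checksum
    of the trimmed data; the plain subtraction yields a different value.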

    Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
    Signed-off-by: Dimitris Michailidis
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Dimitris Michailidis
     
  • This reverts commit 6fe9487892b32cb1c8b8b0d552ed7222a527fe30.

    It is causing more serious regressions than the RCU warning
    it is fixing.

    Signed-off-by: David S. Miller

    David S. Miller