05 Nov, 2020

3 commits


08 Oct, 2020

1 commit

  • * tag 'v5.4.70': (3051 commits)
    Linux 5.4.70
    netfilter: ctnetlink: add a range check for l3/l4 protonum
    ep_create_wakeup_source(): dentry name can change under you...
    ...

    Conflicts:
    arch/arm/mach-imx/pm-imx6.c
    arch/arm64/boot/dts/freescale/imx8mm-evk.dts
    arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts
    drivers/crypto/caam/caamalg.c
    drivers/gpu/drm/imx/dw_hdmi-imx.c
    drivers/gpu/drm/imx/imx-ldb.c
    drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
    drivers/mmc/host/sdhci-esdhc-imx.c
    drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
    drivers/net/ethernet/freescale/enetc/enetc.c
    drivers/net/ethernet/freescale/enetc/enetc_pf.c
    drivers/thermal/imx_thermal.c
    drivers/usb/cdns3/ep0.c
    drivers/xen/swiotlb-xen.c
    sound/soc/fsl/fsl_esai.c
    sound/soc/fsl/fsl_sai.c

    Signed-off-by: Jason Liu

    Jason Liu
     

01 Oct, 2020

1 commit

  • [ Upstream commit 9ed498c6280a2f2b51d02df96df53037272ede49 ]

    sk->sk_backlog.tail might be read without holding the socket spinlock,
    we need to add proper READ_ONCE()/WRITE_ONCE() to silence the warnings.

    KCSAN reported :

    BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg

    write to 0xffff8881265109f8 of 8 bytes by interrupt on cpu 1:
    __sk_add_backlog include/net/sock.h:907 [inline]
    sk_add_backlog include/net/sock.h:938 [inline]
    tcp_add_backlog+0x476/0xce0 net/ipv4/tcp_ipv4.c:1759
    tcp_v4_rcv+0x1a70/0x1bd0 net/ipv4/tcp_ipv4.c:1947
    ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
    ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
    dst_input include/net/dst.h:442 [inline]
    ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:4929
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5043
    netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5133
    napi_skb_finish net/core/dev.c:5596 [inline]
    napi_gro_receive+0x28f/0x330 net/core/dev.c:5629
    receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
    virtnet_receive drivers/net/virtio_net.c:1323 [inline]
    virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
    napi_poll net/core/dev.c:6311 [inline]
    net_rx_action+0x3ae/0xa90 net/core/dev.c:6379
    __do_softirq+0x115/0x33f kernel/softirq.c:292
    invoke_softirq kernel/softirq.c:373 [inline]
    irq_exit+0xbb/0xe0 kernel/softirq.c:413
    exiting_irq arch/x86/include/asm/apic.h:536 [inline]
    do_IRQ+0xa6/0x180 arch/x86/kernel/irq.c:263
    ret_from_intr+0x0/0x19
    native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
    arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
    default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
    cpuidle_idle_call kernel/sched/idle.c:154 [inline]
    do_idle+0x1af/0x280 kernel/sched/idle.c:263
    cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
    start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264
    secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241

    read to 0xffff8881265109f8 of 8 bytes by task 8057 on cpu 0:
    tcp_recvmsg+0x46e/0x1b40 net/ipv4/tcp.c:2050
    inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
    sock_recvmsg_nosec net/socket.c:871 [inline]
    sock_recvmsg net/socket.c:889 [inline]
    sock_recvmsg+0x92/0xb0 net/socket.c:885
    sock_read_iter+0x15f/0x1e0 net/socket.c:967
    call_read_iter include/linux/fs.h:1889 [inline]
    new_sync_read+0x389/0x4f0 fs/read_write.c:414
    __vfs_read+0xb1/0xc0 fs/read_write.c:427
    vfs_read fs/read_write.c:461 [inline]
    vfs_read+0x143/0x2c0 fs/read_write.c:446
    ksys_read+0xd5/0x1b0 fs/read_write.c:587
    __do_sys_read fs/read_write.c:597 [inline]
    __se_sys_read fs/read_write.c:595 [inline]
    __x64_sys_read+0x4c/0x60 fs/read_write.c:595
    do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 8057 Comm: syz-fuzzer Not tainted 5.4.0-rc6+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Eric Dumazet
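
A minimal userspace sketch of the marked-access pattern the sk_backlog.tail commit above describes. The READ_ONCE()/WRITE_ONCE() definitions here are simplified stand-ins built on volatile casts (an assumption, not the kernel's definitions), and every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros. They rely on the
 * GCC/Clang __typeof__ extension and give no ordering guarantees. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct toy_skb {
    struct toy_skb *next;
};

struct toy_backlog {
    struct toy_skb *head;
    struct toy_skb *tail;
};

/* Writer side: in the kernel this would run with the socket spinlock held. */
static void toy_add_backlog(struct toy_backlog *q, struct toy_skb *skb)
{
    assert(q && skb);
    skb->next = NULL;
    if (!q->tail)
        q->head = skb;
    else
        q->tail->next = skb;
    /* Marked store: pairs with the lockless read below. */
    WRITE_ONCE(q->tail, skb);
}

/* Lockless reader: only asks whether anything is queued. */
static int toy_backlog_nonempty(const struct toy_backlog *q)
{
    return READ_ONCE(q->tail) != NULL;
}
```

The reader never dereferences the tail pointer; it only tests it against NULL, which is why a single marked load is enough in this sketch.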
     

27 Sep, 2020

2 commits

  • [ Upstream commit fe81d9f6182d1160e625894eecb3d7ff0222cac5 ]

    When calculating ancestor_size with IPv6 enabled, simply using
    sizeof(struct ipv6_pinfo) doesn't account for extra bytes needed for
    alignment in the struct sctp6_sock. On x86, there aren't any extra
    bytes, but on ARM the ipv6_pinfo structure is aligned on an 8-byte
    boundary so there were 4 pad bytes that were omitted from the
    ancestor_size calculation. This would lead to corruption of the
    pd_lobby pointers, causing an oops when trying to free the sctp
    structure on socket close.

    Fixes: 636d25d557d1 ("sctp: not copy sctp_sock pd_lobby in sctp_copy_descendant")
    Signed-off-by: Henry Ptasinski
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Henry Ptasinski
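
The padding problem in the sctp ancestor_size commit above can be illustrated with hypothetical structures. This is a sketch only: the toy_ types are invented for the example and are not the kernel's sctp_sock, ipv6_pinfo or sctp6_sock layouts. It shows how summing sizeof() values undercounts when the second member needs 8-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 12 bytes, so the next 8-byte-aligned member cannot start right after it. */
struct toy_sctp_sock {
    uint32_t words[3];
};

/* 8 bytes, forced onto an 8-byte boundary. */
struct toy_ipv6_pinfo {
    _Alignas(8) uint64_t cookie;
};

struct toy_sctp6_sock {
    struct toy_sctp_sock sctp;
    struct toy_ipv6_pinfo inet6;   /* the compiler inserts pad bytes before this */
};

static_assert(sizeof(struct toy_sctp_sock) == 12, "toy layout assumption");

/* Naive calculation: ignores the alignment padding. */
static size_t toy_ancestor_size_naive(void)
{
    return sizeof(struct toy_sctp_sock) + sizeof(struct toy_ipv6_pinfo);
}

/* Layout-aware calculation: uses the real offset of the second member. */
static size_t toy_ancestor_size_padded(void)
{
    return offsetof(struct toy_sctp6_sock, inet6) +
           sizeof(struct toy_ipv6_pinfo);
}
```

In this toy layout the two results differ by exactly the 4 pad bytes, which is the size of the discrepancy the commit message reports on ARM.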
     
  • [ Upstream commit 1869e226a7b3ef75b4f70ede2f1b7229f7157fa4 ]

    flowi4_multipath_hash was added by the commit referenced below for
    tunnels. Unfortunately, the patch did not initialize the new field
    for several fast path lookups that do not initialize the entire flow
    struct to 0. Fix those locations. Currently, flowi4_multipath_hash
    is random garbage and affects the hash value computed by
    fib_multipath_hash for multipath selection.

    Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash")
    Signed-off-by: David Ahern
    Cc: wenxu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
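
A hedged sketch of the failure mode in the flowi4_multipath_hash commit above: a fast path that fills a flow key field by field must also set any later-added field, or that field keeps whatever was on the stack. The toy_flow4 structure and the hash function are hypothetical and much simpler than the kernel's flowi4 and fib_multipath_hash.

```c
#include <assert.h>
#include <stdint.h>

struct toy_flow4 {
    uint32_t daddr;
    uint32_t saddr;
    uint32_t multipath_hash;   /* stands in for the later-added field */
};

/* Fast-path style init: no memset of the whole struct, so every field
 * that is read later has to be assigned explicitly. */
static void toy_flow4_fast_init(struct toy_flow4 *fl,
                                uint32_t daddr, uint32_t saddr)
{
    assert(fl);
    fl->daddr = daddr;
    fl->saddr = saddr;
    fl->multipath_hash = 0;    /* without this line the field is garbage */
}

/* Toy hash (assumption): a non-zero override wins, otherwise the
 * addresses are mixed. */
static uint32_t toy_multipath_hash(const struct toy_flow4 *fl)
{
    assert(fl);
    if (fl->multipath_hash)
        return fl->multipath_hash;
    return (fl->daddr * 2654435761u) ^ fl->saddr;
}
```

With the explicit assignment in place, two keys built from the same addresses hash identically regardless of what the memory held before.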
     

10 Sep, 2020

2 commits

  • [ Upstream commit 1e105e6afa6c3d32bfb52c00ffa393894a525c27 ]

    The following bug was reported via IRC:
    nft list ruleset
    set knock_candidates_ipv4 {
    type ipv4_addr . inet_service
    size 65535
    elements = { 127.0.0.1 . 123,
    127.0.0.1 . 123 }
    }
    ..
    udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 }
    udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport }

    It should not have been possible to add a duplicate set entry.

    After some debugging it turned out that the problem is the immediate
    value (123) in the second-to-last rule.

    Concatenations use 32bit registers, i.e. the elements are 8 bytes each,
    not 6 and it turns out the kernel inserted

    inet firewall @knock_candidates_ipv4
    element 0100007f ffff7b00 : 0 [end]
    element 0100007f 00007b00 : 0 [end]

    Note the non-zero upper bits of the first element. It turns out that
    nft_immediate doesn't zero the destination register, but this is needed
    when the length isn't a multiple of 4.

    Furthermore, the zeroing in nft_payload is broken. We can't use
    [len / 4] = 0 -- if len is a multiple of 4, index is off by one.

    Skip zeroing in this case and use a conditional instead of (len -1) / 4.

    Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Florian Westphal
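
The register-zeroing rule from the nf_tables commit above can be sketched in userspace as follows. The function name and the TOY_REG32_SIZE macro are hypothetical; the sketch only demonstrates the conditional described in the message: zero the last 32-bit register when the length is not a multiple of 4, and skip zeroing otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_REG32_SIZE sizeof(uint32_t)

/* Copy len bytes into an array of 32-bit registers. When len is not a
 * multiple of 4, the final register is only partly overwritten, so it
 * is cleared first. When len is a multiple of 4, dest[len / 4] would be
 * one register past the data, so nothing is zeroed. */
static void toy_reg_store(uint32_t *dest, const void *src, size_t len)
{
    assert(dest && src);
    if (len % TOY_REG32_SIZE)
        dest[len / TOY_REG32_SIZE] = 0;
    memcpy(dest, src, len);
}
```

Storing the 6-byte key 127.0.0.1 . 123 into two registers that previously held 0xff bytes leaves the two trailing bytes zero, which corresponds to the second element shown in the commit message rather than the first.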
     
  • [ Upstream commit 1d4adfaf65746203861c72d9d78de349eb97d528 ]

    Fix rxrpc_kernel_get_srtt() to indicate the validity of the returned
    smoothed RTT. If we haven't had any valid samples yet, the SRTT isn't
    useful.

    Fixes: c410bf01933e ("rxrpc: Fix the excessive initial retransmission timeout")
    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin

    David Howells
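
One way to express "indicate the validity of the returned smoothed RTT", as the rxrpc commit above puts it, is a boolean return value plus an output parameter. The sketch below assumes a hypothetical toy_peer structure and does not reproduce the real rxrpc_kernel_get_srtt() signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_peer {
    unsigned int rtt_count;   /* number of valid RTT samples so far */
    uint32_t srtt_us;         /* smoothed RTT in microseconds */
};

/* Returns true and fills *srtt_us only when at least one sample exists. */
static bool toy_get_srtt(const struct toy_peer *peer, uint32_t *srtt_us)
{
    assert(peer && srtt_us);
    if (peer->rtt_count == 0)
        return false;
    *srtt_us = peer->srtt_us;
    return true;
}
```

A caller can then fall back to its own default timeout when the function returns false instead of trusting an unsampled value.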
     

21 Aug, 2020

1 commit

  • commit d9539752d23283db4692384a634034f451261e29 upstream.

    Add missed sock updates to compat path via a new helper, which will be
    used more in coming patches. (The net/core/scm.c code is left as-is here
    to assist with -stable backports for the compat path.)

    Cc: Christoph Hellwig
    Cc: Sargun Dhillon
    Cc: Jakub Kicinski
    Cc: stable@vger.kernel.org
    Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
    Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
    Acked-by: Christian Brauner
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

19 Aug, 2020

3 commits

  • [ Upstream commit 62ffc589abb176821662efc4525ee4ac0b9c3894 ]

    Refactor the fastreuse update code in inet_csk_get_port into a small
    helper function that can be called from other places.

    Acked-by: Matthieu Baerts
    Signed-off-by: Tim Froidcoeur
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tim Froidcoeur
     
  • [ Upstream commit f19008e676366c44e9241af57f331b6c6edf9552 ]

    When TFO keys are read back on big endian systems either via the global
    sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values
    don't match what was written.

    For example, on s390x:

    # echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key
    # cat /proc/sys/net/ipv4/tcp_fastopen_key
    02000000-01000000-04000000-03000000

    Instead of:

    # cat /proc/sys/net/ipv4/tcp_fastopen_key
    00000001-00000002-00000003-00000004

    Fix this by converting to the correct endianness on read. This was
    reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net
    selftest on s390x, which depends on the read value matching what was
    written. I've confirmed that the test now passes on big and little endian
    systems.

    Signed-off-by: Jason Baron
    Fixes: 438ac88009bc ("net: fastopen: robustness and endianness fixes for SipHash")
    Cc: Ard Biesheuvel
    Cc: Eric Dumazet
    Reported-and-tested-by: Colin Ian King
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jason Baron
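
The "convert to the correct endianness on read" idea from the TFO key commit above can be sketched as below. This is a simplified illustration: it assumes the key is kept as four little-endian 32-bit words, which is not necessarily the kernel's internal layout, and all toy_ helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store a host-order value as 4 little-endian bytes. */
static void toy_put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load 4 little-endian bytes into a host-order value. */
static uint32_t toy_get_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/* Write path: keep the key in a fixed byte order. */
static void toy_key_write(uint8_t key[16], const uint32_t words[4])
{
    for (int i = 0; i < 4; i++)
        toy_put_le32(key + 4 * i, words[i]);
}

/* Read path: convert back before printing, so the output matches what
 * was written on both big and little endian hosts. */
static int toy_key_read(const uint8_t key[16], char *buf, size_t buflen)
{
    assert(buf && buflen >= 36);
    return snprintf(buf, buflen, "%08x-%08x-%08x-%08x",
                    (unsigned int)toy_get_le32(key),
                    (unsigned int)toy_get_le32(key + 4),
                    (unsigned int)toy_get_le32(key + 8),
                    (unsigned int)toy_get_le32(key + 12));
}
```

Because both directions go through explicit byte operations, the round trip does not depend on the host's byte order.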
     
  • [ Upstream commit f0a5e4d7a594e0fe237d3dfafb069bb82f80f42f ]

    YangYuxi is reporting that connection reuse
    is causing a one-second delay when a SYN hits
    an existing connection in TIME_WAIT state.
    Such a delay was added to give time to expire
    both the IPVS connection and the corresponding
    conntrack. This was considered a rare case
    at that time but it is causing problems for
    some environments such as Kubernetes.

    As nf_conntrack_tcp_packet() can decide to
    release the conntrack in TIME_WAIT state and
    to replace it with a fresh NEW conntrack, we
    can use this to allow rescheduling just by
    tuning our check: if the conntrack is
    confirmed we cannot schedule it to a different
    real server and the one-second delay still
    applies, but if a new conntrack was created,
    we are free to select a new real server without
    any delays.

    YangYuxi lists some of the problem reports:

    - One second connection delay in masquerading mode:
    https://marc.info/?t=151683118100004&r=1&w=2

    - IPVS low throughput #70747
    https://github.com/kubernetes/kubernetes/issues/70747

    - Apache Bench can fill up ipvs service proxy in seconds #544
    https://github.com/cloudnativelabs/kube-router/issues/544

    - Additional 1s latency in `host -> service IP -> pod`
    https://github.com/kubernetes/kubernetes/issues/90854

    Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
    Co-developed-by: YangYuxi
    Signed-off-by: YangYuxi
    Signed-off-by: Julian Anastasov
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Julian Anastasov
     

11 Aug, 2020

1 commit

  • [ Upstream commit 8c0de6e96c9794cb523a516c465991a70245da1c ]

    IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket
    to IPv4, particularly struct ipv6_ac_socklist. Similar to
    struct ipv6_mc_socklist, we should just close it on this path.

    This bug can be easily reproduced with the following C program:

    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main()
    {
    int s, value;
    struct sockaddr_in6 addr;
    struct ipv6_mreq m6;

    s = socket(AF_INET6, SOCK_DGRAM, 0);
    addr.sin6_family = AF_INET6;
    addr.sin6_port = htons(5000);
    inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr);
    connect(s, (struct sockaddr *)&addr, sizeof(addr));

    inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr);
    m6.ipv6mr_interface = 5;
    setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6));

    value = AF_INET;
    setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value));

    close(s);
    return 0;
    }

    Reported-by: ch3332xr@gmail.com
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

05 Aug, 2020

2 commits

  • [ Upstream commit 101dde4207f1daa1fda57d714814a03835dccc3f ]

    The commits "xfrm: Move dst->path into struct xfrm_dst"
    and "net: Create and use new helper xfrm_dst_child()."
    changed xfrm bundle handling under the assumption
    that xdst->path and dst->child are not a NULL pointer
    only if dst->xfrm is not a NULL pointer. That is true
    with one exception. If the xfrm hold queue is used
    to wait until a SA is installed by the key manager,
    we create a dummy bundle without a valid dst->xfrm
    pointer. The current xfrm bundle handling crashes
    in that case. Fix this by extending the NULL check
    of dst->xfrm with a test of the DST_XFRM_QUEUE flag.

    Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst")
    Fixes: b92cf4aab8e6 ("net: Create and use new helper xfrm_dst_child().")
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Steffen Klassert
     
  • [ Upstream commit 4f47e8ab6ab796b5380f74866fa5287aca4dcc58 ]

    Commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
    takes 'priority' into account to make a policy unique, which allows
    duplicate policies with different 'priority' to be added. Userland does
    not expect this, as Tobias reported in strongswan.

    To fix this duplicate-policy issue, and also the issue addressed by
    commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
    this patch changes add/del/get/update on the user interfaces to look up
    a policy with both mark and mask by doing:

    mark.v == pol->mark.v && mark.m == pol->mark.m

    and leave the check:

    (mark & pol->mark.m) == pol->mark.v

    for tx/rx path only.

    Userland expects an exact mark and mask match to manage policies.

    v1->v2:
    - make xfrm_policy_mark_match inline and fix the changelog as
    Tobias suggested.

    Fixes: 295fae568885 ("xfrm: Allow user space manipulation of SPD mark")
    Fixes: ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
    Reported-by: Tobias Brunner
    Tested-by: Tobias Brunner
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Xin Long
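
The two comparisons quoted in the xfrm policy commit above can be written as a pair of predicates. This is a sketch with a hypothetical toy_mark structure (v for value, m for mask, following the field names in the message); it is not the kernel's xfrm_policy_mark_match().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_mark {
    uint32_t v;   /* mark value */
    uint32_t m;   /* mark mask */
};

/* Management path (add/del/get/update): value and mask must both match. */
static bool toy_mark_match_exact(const struct toy_mark *mark,
                                 const struct toy_mark *pol)
{
    assert(mark && pol);
    return mark->v == pol->v && mark->m == pol->m;
}

/* Data path (tx/rx): the packet mark is masked, then compared. */
static bool toy_mark_match_masked(uint32_t pkt_mark, const struct toy_mark *pol)
{
    assert(pol);
    return (pkt_mark & pol->m) == pol->v;
}
```

Keeping the predicates separate makes the difference visible: a policy with value 0x10 and mask 0xf0 matches packet mark 0x1a on the data path, but a management request for value 0x10 with mask 0xff does not refer to that policy.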
     

22 Jul, 2020

3 commits

  • [ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]

    There are a couple of places in net/sched/ that check skb->protocol and act
    on the value there. However, in the presence of VLAN tags, the value stored
    in skb->protocol can be inconsistent based on whether VLAN acceleration is
    enabled. The commit quoted in the Fixes tag below fixed the users of
    skb->protocol to use a helper that will always see the VLAN ethertype.

    However, most of the callers don't actually handle the VLAN ethertype, but
    expect to find the IP header type in the protocol field. This means that
    things like changing the ECN field, or parsing diffserv values, stops
    working if there's a VLAN tag, or if there are multiple nested VLAN
    tags (QinQ).

    To fix this, change the helper to take an argument that indicates whether
    the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
    make sure to skip all of them, so behaviour is consistent even in QinQ
    mode.

    To make the helper usable from the ECN code, move it to if_vlan.h instead
    of pkt_sched.h.

    v3:
    - Remove empty lines
    - Move vlan variable definitions inside loop in skb_protocol()
    - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
    bpf_skb_ecn_set_ce()

    v2:
    - Use eth_type_vlan() helper in skb_protocol()
    - Also fix code that reads skb->protocol directly
    - Change a couple of 'if/else if' statements to switch constructs to avoid
    calling the helper twice

    Reported-by: Ilya Ponetayev
    Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
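
A rough userspace sketch of the "skip all VLAN tags" behaviour described in the skb_protocol() commit above. It works on a raw byte buffer rather than an skb, and every toy_ name and constant is hypothetical; it only illustrates looping over nested tags until a non-VLAN EtherType is found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_P_8021Q  0x8100u
#define TOY_ETH_P_8021AD 0x88a8u

static bool toy_eth_type_vlan(uint16_t type)
{
    return type == TOY_ETH_P_8021Q || type == TOY_ETH_P_8021AD;
}

/* p points at the EtherType field of a frame and len bytes are readable
 * from there. Returns the EtherType in host order, or 0 if the buffer
 * is too short. With skip_vlan set, every nested VLAN tag is skipped. */
static uint16_t toy_skb_protocol(const uint8_t *p, size_t len, bool skip_vlan)
{
    uint16_t type;

    assert(p);
    if (len < 2)
        return 0;
    type = (uint16_t)((p[0] << 8) | p[1]);
    if (!skip_vlan)
        return type;

    while (toy_eth_type_vlan(type)) {
        /* A tag is 2 bytes of EtherType plus 2 bytes of TCI; the next
         * EtherType follows immediately after. */
        if (len < 6)
            return 0;
        p += 4;
        len -= 4;
        type = (uint16_t)((p[0] << 8) | p[1]);
    }
    return type;
}
```

For a QinQ frame, a caller that passes skip_vlan as true sees the inner protocol (for example IPv4), while a caller that passes false still sees the outer VLAN EtherType.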
     
  • [ Upstream commit 394de110a73395de2ca4516b0de435e91b11b604 ]

    The packets from tunnel devices (e.g. bareudp) may have only
    metadata in the dst pointer of skb. Hence a pointer check of
    neigh_lookup is needed in dst_neigh_lookup_skb.

    The kernel crashes when packets from a bareudp device are processed in
    the kernel neighbour subsystem.

    [ 133.384484] BUG: kernel NULL pointer dereference, address: 0000000000000000
    [ 133.385240] #PF: supervisor instruction fetch in kernel mode
    [ 133.385828] #PF: error_code(0x0010) - not-present page
    [ 133.386603] PGD 0 P4D 0
    [ 133.386875] Oops: 0010 [#1] SMP PTI
    [ 133.387275] CPU: 0 PID: 5045 Comm: ping Tainted: G W 5.8.0-rc2+ #15
    [ 133.388052] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [ 133.391076] RIP: 0010:0x0
    [ 133.392401] Code: Bad RIP value.
    [ 133.394029] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
    [ 133.396656] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
    [ 133.399018] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
    [ 133.399685] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
    [ 133.400350] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
    [ 133.401010] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
    [ 133.401667] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
    [ 133.402412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 133.402948] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
    [ 133.403611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 133.404270] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 133.404933] Call Trace:
    [ 133.405169] <IRQ>
    [ 133.405367] __neigh_update+0x5a4/0x8f0
    [ 133.405734] arp_process+0x294/0x820
    [ 133.406076] ? __netif_receive_skb_core+0x866/0xe70
    [ 133.406557] arp_rcv+0x129/0x1c0
    [ 133.406882] __netif_receive_skb_one_core+0x95/0xb0
    [ 133.407340] process_backlog+0xa7/0x150
    [ 133.407705] net_rx_action+0x2af/0x420
    [ 133.408457] __do_softirq+0xda/0x2a8
    [ 133.408813] asm_call_on_stack+0x12/0x20
    [ 133.409290] </IRQ>
    [ 133.409519] do_softirq_own_stack+0x39/0x50
    [ 133.410036] do_softirq+0x50/0x60
    [ 133.410401] __local_bh_enable_ip+0x50/0x60
    [ 133.410871] ip_finish_output2+0x195/0x530
    [ 133.411288] ip_output+0x72/0xf0
    [ 133.411673] ? __ip_finish_output+0x1f0/0x1f0
    [ 133.412122] ip_send_skb+0x15/0x40
    [ 133.412471] raw_sendmsg+0x853/0xab0
    [ 133.412855] ? insert_pfn+0xfe/0x270
    [ 133.413827] ? vvar_fault+0xec/0x190
    [ 133.414772] sock_sendmsg+0x57/0x80
    [ 133.415685] __sys_sendto+0xdc/0x160
    [ 133.416605] ? syscall_trace_enter+0x1d4/0x2b0
    [ 133.417679] ? __audit_syscall_exit+0x1d9/0x280
    [ 133.418753] ? __prepare_exit_to_usermode+0x5d/0x1a0
    [ 133.419819] __x64_sys_sendto+0x24/0x30
    [ 133.420848] do_syscall_64+0x4d/0x90
    [ 133.421768] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 133.422833] RIP: 0033:0x7fe013689c03
    [ 133.423749] Code: Bad RIP value.
    [ 133.424624] RSP: 002b:00007ffc7288f418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    [ 133.425940] RAX: ffffffffffffffda RBX: 000056151fc63720 RCX: 00007fe013689c03
    [ 133.427225] RDX: 0000000000000040 RSI: 000056151fc63720 RDI: 0000000000000003
    [ 133.428481] RBP: 00007ffc72890b30 R08: 000056151fc60500 R09: 0000000000000010
    [ 133.429757] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
    [ 133.431041] R13: 000056151fc636e0 R14: 000056151fc616bc R15: 0000000000000080
    [ 133.432481] Modules linked in: mpls_iptunnel act_mirred act_tunnel_key cls_flower sch_ingress veth mpls_router ip_tunnel bareudp ip6_udp_tunnel udp_tunnel macsec udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xt_MASQUERADE iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc ebtable_filter ebtables overlay ip6table_filter ip6_tables iptable_filter sunrpc ext4 mbcache jbd2 pcspkr i2c_piix4 virtio_balloon joydev ip_tables xfs libcrc32c ata_generic qxl pata_acpi drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ata_piix libata virtio_net net_failover virtio_console failover virtio_blk i2c_core virtio_pci virtio_ring serio_raw floppy virtio dm_mirror dm_region_hash dm_log dm_mod
    [ 133.444045] CR2: 0000000000000000
    [ 133.445082] ---[ end trace f4aeee1958fd1638 ]---
    [ 133.446236] RIP: 0010:0x0
    [ 133.447180] Code: Bad RIP value.
    [ 133.448152] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
    [ 133.449363] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
    [ 133.450835] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
    [ 133.452237] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
    [ 133.453722] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
    [ 133.455149] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
    [ 133.456520] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
    [ 133.458046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 133.459342] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
    [ 133.460782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 133.462240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 133.463697] Kernel panic - not syncing: Fatal exception in interrupt
    [ 133.465226] Kernel Offset: 0xfa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [ 133.467025] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

    Fixes: aaa0c23cb901 ("Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug")
    Signed-off-by: Martin Varghese
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Martin Varghese
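
The pointer check that the dst_neigh_lookup_skb commit above calls for can be sketched with a hypothetical ops table. The toy_ types below are invented for illustration and are far simpler than the kernel's dst_entry and dst_ops; the point is only that an optional callback is tested before it is called.

```c
#include <assert.h>
#include <stddef.h>

struct toy_neigh {
    int id;
};

struct toy_dst_ops {
    /* Optional: metadata-only dsts leave this NULL. */
    struct toy_neigh *(*neigh_lookup)(const void *daddr);
};

struct toy_dst {
    const struct toy_dst_ops *ops;
};

static struct toy_neigh toy_static_neigh = { 42 };

/* Example callback used by a "real" dst in this sketch. */
static struct toy_neigh *toy_lookup_always(const void *daddr)
{
    (void)daddr;
    return &toy_static_neigh;
}

/* Returns NULL instead of jumping through a NULL function pointer. */
static struct toy_neigh *toy_dst_neigh_lookup(const struct toy_dst *dst,
                                              const void *daddr)
{
    assert(dst && dst->ops);
    if (!dst->ops->neigh_lookup)
        return NULL;
    return dst->ops->neigh_lookup(daddr);
}
```

Calling through a NULL function pointer is what produces an instruction fetch at address 0, consistent with the "RIP: 0010:0x0" line in the oops quoted in the message.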
     
  • [ Upstream commit 1e82a62fec613844da9e558f3493540a5b7a7b67 ]

    A potential deadlock can occur during registering or unregistering a
    new generic netlink family between the main nl_table_lock and the
    cb_lock where each thread wants the lock held by the other, as
    demonstrated below.

    1) Thread 1 is performing a netlink_bind() operation on a socket. As part
    of this call, it will call netlink_lock_table(), incrementing the
    nl_table_users count to 1.
    2) Thread 2 is registering (or unregistering) a genl_family via the
    genl_(un)register_family() API. The cb_lock semaphore will be taken for
    writing.
    3) Thread 1 will call genl_bind() as part of the bind operation to handle
    subscribing to GENL multicast groups at the request of the user. It will
    attempt to take the cb_lock semaphore for reading, but it will fail and
    be scheduled away, waiting for Thread 2 to finish the write.
    4) Thread 2 will call netlink_table_grab() during the (un)registration
    call. However, as Thread 1 has incremented nl_table_users, it will not
    be able to proceed, and both threads will be stuck waiting for the
    other.

    genl_bind() is a noop, unless a genl_family implements the mcast_bind()
    function to handle setting up family-specific multicast operations. Since
    no one in-tree uses this functionality as Cong pointed out, simply removing
    the genl_bind() function will remove the possibility for deadlock, as there
    is no attempt by Thread 1 above to take the cb_lock semaphore.

    Fixes: c380d9a7afff ("genetlink: pass multicast bind/unbind to families")
    Suggested-by: Cong Wang
    Acked-by: Johannes Berg
    Reported-by: kernel test robot
    Signed-off-by: Sean Tranchetti
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sean Tranchetti
     

01 Jul, 2020

3 commits

  • [ Upstream commit 94579ac3f6d0820adc83b5dc5358ead0158101e9 ]

    During IPsec performance testing, we see bad ICMP checksums. The error packet
    has a duplicated ESP trailer due to double validate_xmit_xfrm calls. The first call
    is from ip_output, but the packet cannot be sent because
    netif_xmit_frozen_or_stopped is true and the packet is requeued by dev_requeue_skb.
    The second call is from NET_TX softirq. However, after the first call, the packet
    already has the ESP trailer.

    Fix by marking the skb with XFRM_XMIT bit after the packet is handled by
    validate_xmit_xfrm to avoid duplicate ESP trailer insertion.

    Fixes: f6e27114a60a ("net: Add a xfrm validate function to validate_xmit_skb")
    Signed-off-by: Huy Nguyen
    Reviewed-by: Boris Pismenny
    Reviewed-by: Raed Salem
    Reviewed-by: Saeed Mahameed
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Huy Nguyen
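
The "mark the packet once it has been handled" fix from the XFRM_XMIT commit above follows a common idempotency pattern, sketched here with hypothetical names. The toy_pkt structure, the flag value and the trailer length are assumptions made for the example, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_XFRM_XMIT       0x1u
#define TOY_ESP_TRAILER_LEN 16u

struct toy_pkt {
    unsigned int flags;
    size_t len;
};

/* Appends the trailer at most once, even if the packet is requeued and
 * passes through the transmit validation step a second time. */
static void toy_validate_xmit(struct toy_pkt *pkt)
{
    assert(pkt);
    if (pkt->flags & TOY_XFRM_XMIT)
        return;                       /* already transformed */
    pkt->len += TOY_ESP_TRAILER_LEN;
    pkt->flags |= TOY_XFRM_XMIT;
}
```

Running the function twice on the same packet grows it by one trailer only, which is the property the requeue path needs.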
     
  • [ Upstream commit 471e39df96b9a4c4ba88a2da9e25a126624d7a9c ]

    If a socket is set ipv6only, it will still send IPv4 addresses in the
    INIT and INIT_ACK packets. This potentially misleads the peer into using
    them, which then would cause association termination.

    The fix is to not add IPv4 addresses to ipv6only sockets.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: Corey Minyard
    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Corey Minyard
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Marcelo Ricardo Leitner
     
  • [ Upstream commit 41b14fb8724d5a4b382a63cb4a1a61880347ccb8 ]

    Clearing the sock TX queue in sk_set_socket() might cause unexpected
    out-of-order transmit when called from sock_orphan(), as outstanding
    packets can pick a different TX queue and bypass the ones already queued.

    This is undesired in general. More specifically, it breaks the in-order
    scheduling property guarantee for device-offloaded TLS sockets.

    Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
    explicitly only where needed.

    Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping")
    Signed-off-by: Tariq Toukan
    Reviewed-by: Boris Pismenny
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tariq Toukan
     

22 Jun, 2020

1 commit

  • [ Upstream commit e91de6afa81c10e9f855c5695eb9a53168d96b73 ]

    KTLS uses a stream parser to collect TLS messages and send them to
    the upper layer tls receive handler. This ensures the tls receiver
    has a full TLS header to parse when it is run. However, when a
    socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS
    is enabled we end up with two stream parsers running on the same
    socket.

    The result is both try to run on the same socket. First the KTLS
    stream parser runs and calls read_sock(), which calls tcp_read_sock,
    which in turn calls tcp_rcv_skb(). This dequeues the skb from the
    sk_receive_queue. When this is done, the KTLS code then calls the
    data_ready() callback, which, because we stacked KTLS on top of the bpf
    stream verdict program, has been replaced with sk_psock_start_strp().
    This will in turn kick the stream parser again and eventually do the
    same thing KTLS did above, calling into tcp_rcv_skb() and dequeuing
    an skb from the sk_receive_queue.

    At this point the data stream is broken. Part of the stream was
    handled by the KTLS side; some other bytes may have been handled
    by the BPF side. Generally this results in either missing data
    or more likely a "Bad Message" complaint from the kTLS receive
    handler as the BPF program steals some bytes meant to be in a
    TLS header and/or the TLS header length is no longer correct.

    We've already broken the idealized model where we can stack ULPs
    in any order with generic callbacks on the TX side to handle this.
    So in this patch we do the same thing but for RX side. We add
    a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict
    program is running and add a tls_sw_has_ctx_rx() helper so BPF
    side can learn there is a TLS ULP on the socket.

    Then on BPF side we omit calling our stream parser to avoid
    breaking the data stream for the KTLS receiver. Then on the
    KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS
    receiver is done with the packet but before it posts the
    msg to userspace. This gives us symmetry between the TX and
    RX halves and IMO makes it usable again. On the TX side we
    process packets in this order BPF -> TLS -> TCP and on
    the receive side in the reverse order TCP -> TLS -> BPF.

    Discovered while testing OpenSSL 3.0 Alpha2.0 release.

    Fixes: d829e9c4112b5 ("tls: convert to generic sk_msg interface")
    Signed-off-by: John Fastabend
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Sasha Levin

    John Fastabend
     

19 Jun, 2020

1 commit

  • * tag 'v5.4.47': (2193 commits)
    Linux 5.4.47
    KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
    KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
    ...

    Conflicts:
    arch/arm/boot/dts/imx6qdl.dtsi
    arch/arm/mach-imx/Kconfig
    arch/arm/mach-imx/common.h
    arch/arm/mach-imx/suspend-imx6.S
    arch/arm64/boot/dts/freescale/imx8qxp-mek.dts
    arch/powerpc/include/asm/cacheflush.h
    drivers/cpufreq/imx6q-cpufreq.c
    drivers/dma/imx-sdma.c
    drivers/edac/synopsys_edac.c
    drivers/firmware/imx/imx-scu.c
    drivers/net/ethernet/freescale/fec.h
    drivers/net/ethernet/freescale/fec_main.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
    drivers/net/phy/phy_device.c
    drivers/perf/fsl_imx8_ddr_perf.c
    drivers/usb/cdns3/gadget.c
    drivers/usb/dwc3/gadget.c
    include/uapi/linux/dma-buf.h

    Signed-off-by: Jason Liu

    Jason Liu
     

17 Jun, 2020

1 commit

  • [ Upstream commit c96b6acc8f89a4a7f6258dfe1d077654c11415be ]

    There are some memory leaks in dccp_init() and dccp_fini().

    In dccp_fini() and the error handling path in dccp_init(), freeing
    lhash2 is missing. Add inet_hashinfo2_free_mod() to do it.

    If inet_hashinfo2_init_mod() fails in dccp_init(),
    percpu_counter_destroy() should be called to destroy dccp_orphan_count.
    The code needs to goto out_free_percpu when inet_hashinfo2_init_mod() fails.

    Fixes: c92c81df93df ("net: dccp: fix kernel crash on module load")
    Reported-by: Hulk Robot
    Signed-off-by: Wang Hai
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wang Hai
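
The error-path ordering discussed in the dccp_init() commit above is an instance of the usual goto-based unwinding idiom. The sketch below uses hypothetical boolean "resources" and a fail_step parameter to simulate failures; the labels are illustrative and only loosely modelled on the ones named in the message.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_state {
    bool counter;   /* stands in for the per-cpu orphan counter */
    bool hash2;     /* stands in for the lhash2 table */
    bool bhash;     /* stands in for a later allocation */
};

/* fail_step selects which step fails (0 means none). Each label undoes
 * everything that was set up before the failing step. */
static int toy_init(struct toy_state *st, int fail_step)
{
    assert(st);
    st->counter = true;
    if (fail_step == 1)              /* hash2 setup failed */
        goto out_free_counter;
    st->hash2 = true;
    if (fail_step == 2)              /* later allocation failed */
        goto out_free_hash2;
    st->bhash = true;
    return 0;

out_free_hash2:
    st->hash2 = false;
out_free_counter:
    st->counter = false;
    return -1;
}

/* Teardown releases resources in reverse order of setup. */
static void toy_fini(struct toy_state *st)
{
    assert(st);
    st->bhash = false;
    st->hash2 = false;
    st->counter = false;
}
```

Jumping to the wrong label, or omitting one release in the teardown function, leaves a resource set after failure, which is the class of leak the commit describes.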
     

03 Jun, 2020

6 commits

  • commit 1fd1c768f3624a5e66766e7b4ddb9b607cd834a5 upstream.

    Similar to the last patch, we need to fix fib_info_nh_uses_dev for
    external nexthops to avoid referencing multiple nh_grp structs.
    Move the device check in fib_info_nh_uses_dev to a helper and
    create a nexthop version that is called if the fib_info uses an
    external nexthop.

    Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
    Signed-off-by: David Ahern
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • commit 0b5e2e39739e861fa5fc84ab27a35dbe62a15330 upstream.

    I got too fancy consolidating checks on multipath type. The result
    is that path lookups can access 2 different nh_grp structs, as exposed
    by Nik's torture tests. Expand nexthop_is_multipath within nexthop.h to
    avoid multiple nh_grp dereferences and make decisions based on the
    consistent struct.

    The only 2 places left using nexthop_is_multipath are within IPv6; both
    only check that the nexthop is a multipath for a branching decision,
    which is acceptable.

    Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
    Signed-off-by: David Ahern
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • commit 90f33bffa382598a32cc82abfeb20adc92d041b6 upstream.

    We must avoid modifying published nexthop groups while they might be
    in use, otherwise we might see NULL ptr dereferences. In order to do
    that, we allocate 2 nexthop group structures upon nexthop creation
    and swap between them when we have to delete an entry. The reason is
    that we can't fail nexthop group removal, so we can't handle an
    allocation failure there. Thus we move the extra allocation to creation,
    where we can safely fail and return ENOMEM.

    Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
     
  • [ Upstream commit 0cada33241d9de205522e3858b18e506ca5cce2c ]

    tls_sw_recvmsg() and tls_decrypt_done() can be run concurrently.

    // tls_sw_recvmsg()
    if (atomic_read(&ctx->decrypt_pending))
            crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
    else
            reinit_completion(&ctx->async_wait.completion);

    // tls_decrypt_done()
    pending = atomic_dec_return(&ctx->decrypt_pending);

    if (!pending && READ_ONCE(ctx->async_notify))
            complete(&ctx->async_wait.completion);

    Consider the scenario where tls_decrypt_done() is about to run complete()

    if (!pending && READ_ONCE(ctx->async_notify))

    and tls_sw_recvmsg() reads decrypt_pending == 0, does reinit_completion(),
    then tls_decrypt_done() runs complete(). This sequence of execution
    results in a wrong completion. Consequently, the next decrypt request
    will not wait for completion. Eventually, on connection close, the crypto
    resources are freed and there is no way to handle the pending decrypt
    response.

    This race condition can be avoided by making atomic_read() mutually
    exclusive with atomic_dec_return() and complete(). Introduce a spin lock
    to ensure the mutual exclusion.

    Addressed similar problem in tx direction.

    v1->v2:
    - More readable commit message.
    - Corrected the lock to fix new race scenario.
    - Removed barrier which is not needed now.

    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Signed-off-by: Vinay Kumar Yadav
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vinay Kumar Yadav
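
    The locking idea in the tls commit above can be sketched in userspace C.
    This is only an illustration under stated assumptions, not the kernel
    patch: a pthread mutex stands in for the spin lock, a plain flag stands
    in for the completion, and the names decrypt_ctx, decrypt_done() and
    recv_must_wait() are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the TLS rx context. */
struct decrypt_ctx {
    pthread_mutex_t lock;   /* stands in for the kernel spin lock */
    int pending;            /* outstanding async decrypt requests */
    bool async_notify;      /* receiver wants to be notified */
    bool completed;         /* stands in for struct completion */
};

/* Completion side: the decrement and the notification form one
 * critical section, so the receiver cannot observe pending == 0
 * before the matching complete() has been issued. */
void decrypt_done(struct decrypt_ctx *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->pending--;
    if (ctx->pending == 0 && ctx->async_notify)
        ctx->completed = true;
    pthread_mutex_unlock(&ctx->lock);
}

/* Receive side: read the pending count under the same lock.
 * Returns true if the caller has to wait for the completion;
 * otherwise the completion state is reset for the next request. */
bool recv_must_wait(struct decrypt_ctx *ctx)
{
    bool wait;

    pthread_mutex_lock(&ctx->lock);
    ctx->async_notify = true;
    wait = ctx->pending != 0;
    if (!wait)
        ctx->completed = false;
    pthread_mutex_unlock(&ctx->lock);
    return wait;
}
```

    With both paths serialized by one lock, the interleaving described in
    the commit message (read of zero, reinit, then a late complete) can no
    longer occur in this model.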
     
  • [ Upstream commit b15e62631c5f19fea9895f7632dae9c1b27fe0cd ]

    When a new action is installed, firstuse field of 'tcf_t' is explicitly set
    to 0. Value of zero means "new action, not yet used"; as a packet hits the
    action, 'firstuse' is stamped with the current jiffies value.

    tcf_tm_dump() should return 0 for firstuse if action has not yet been hit.

    Fixes: 48d8ee1694dd ("net sched actions: aggregate dumping of actions timeinfo")
    Cc: Jamal Hadi Salim
    Signed-off-by: Roman Mashak
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Roman Mashak
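
    The firstuse rule described in the commit above can be shown with a
    small sketch. It assumes plain tick counters instead of jiffies and
    clock_t conversion, and the function name dump_firstuse() is
    hypothetical, not a kernel symbol.

```c
#include <stdint.h>

/* Age reported for 'firstuse' when dumping an action.
 * A stored value of 0 means "new action, not yet used", so the
 * dump reports 0 instead of computing now - 0. */
uint64_t dump_firstuse(uint64_t now, uint64_t firstuse)
{
    return firstuse ? now - firstuse : 0;
}
```

    Without the zero check, an action that was never hit would appear to
    have been first used "now" ticks ago.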
     
  • [ Upstream commit 41b4bd986f86331efc599b9a3f5fb86ad92e9af9 ]

    In case we can't find a ->dumpit callback for the requested
    (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're
    in the same situation as if userspace had requested a PF_UNSPEC
    dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all
    the registered RTM_GETROUTE handlers.

    The requested table id may or may not exist for all of those
    families. commit ae677bbb4441 ("net: Don't return invalid table id
    error when dumping all families") fixed the problem when userspace
    explicitly requests a PF_UNSPEC dump, but missed the fallback case.

    For example, when we pass ipv6.disable=1 to a kernel with
    CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y,
    the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in
    rtnl_dump_all, and listing IPv6 routes will unexpectedly print:

    # ip -6 r
    Error: ipv4: MR table does not exist.
    Dump terminated

    commit ae677bbb4441 introduced the dump_all_families variable, which
    gets set when userspace requests a PF_UNSPEC dump. However, we can't
    simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the
    fallback case to get dump_all_families == true, because some message
    types (for example RTM_GETRULE and RTM_GETNEIGH) only register the
    PF_UNSPEC handler and use the family to filter in the kernel what is
    dumped to userspace. We would then export more entries, that userspace
    would have to filter. iproute does that, but other programs may not.

    Instead, this patch removes dump_all_families and updates the
    RTM_GETROUTE handlers to check if the family that is being dumped is
    their own. When it's not, which covers both the intentional PF_UNSPEC
    dumps (as dump_all_families did) and the fallback case, ignore the
    missing table id error.

    Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     

27 May, 2020

2 commits

  • commit c410bf01933e5e09d142c66c3df9ad470a7eec13 upstream.

    rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
    sufficiently sampled. This can cause problems with some fileservers with
    calls to the cache manager in the afs filesystem being dropped from the
    fileserver because a packet goes missing and the retransmission timeout is
    greater than the call expiry timeout.

    Fix this by:

    (1) Copying the RTT/RTO calculation code from Linux's TCP implementation
    and altering it to fit rxrpc.

    (2) Altering the various users of the RTT to make use of the new SRTT
    value.

    (3) Replacing the use of rxrpc_resend_timeout with the calculated RTO
    value (which is needed in jiffies), along with a backoff.

    Notes:

    (1) rxrpc provides RTT samples by matching the serial numbers on outgoing
    DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
    against the reference serial number in incoming REQUESTED ACK and
    PING-RESPONSE ACK packets.

    (2) Each packet that is transmitted on an rxrpc connection gets a new
    per-connection serial number, even for retransmissions, so an ACK can
    be cross-referenced to a specific trigger packet. This allows RTT
    information to be drawn from retransmitted DATA packets also.

    (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
    on an rxrpc_call because many RPC calls won't live long enough to
    generate more than one sample.

    (4) The calculated SRTT value is in units of 8ths of a microsecond rather
    than nanoseconds.

    The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
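
    The kind of RTT/RTO calculation the rxrpc commit above refers to can be
    sketched as follows. This is a simplified smoothing scheme in the style
    of RFC 6298, written under assumptions: it is not the rxrpc or TCP code,
    it omits clamping and backoff, and the names rtt_state, rtt_sample() and
    rto_us() are hypothetical. It does keep the SRTT in units of 8ths of a
    microsecond, as the commit notes.

```c
#include <stdint.h>

/* Hypothetical per-peer RTT state. */
struct rtt_state {
    uint32_t srtt_us8;  /* smoothed RTT, in 1/8 microsecond units */
    uint32_t mdev_us4;  /* mean deviation, in 1/4 microsecond units */
};

/* Fold one RTT sample (in microseconds) into the state. */
void rtt_sample(struct rtt_state *s, uint32_t sample_us)
{
    int32_t err;
    uint32_t abs_err;

    if (s->srtt_us8 == 0) {
        /* First sample: SRTT = R, deviation = R / 2. */
        s->srtt_us8 = sample_us << 3;
        s->mdev_us4 = sample_us << 1;
        return;
    }

    /* Error against the current smoothed value. */
    err = (int32_t)sample_us - (int32_t)(s->srtt_us8 >> 3);

    /* srtt = 7/8 * srtt + 1/8 * sample, kept scaled by 8. */
    s->srtt_us8 = (uint32_t)((int32_t)s->srtt_us8 + err);

    /* mdev = 3/4 * mdev + 1/4 * |err|, kept scaled by 4. */
    abs_err = (uint32_t)(err < 0 ? -err : err);
    s->mdev_us4 = s->mdev_us4 - (s->mdev_us4 >> 2) + abs_err;
}

/* RTO in microseconds: srtt + 4 * mdev. */
uint32_t rto_us(const struct rtt_state *s)
{
    return (s->srtt_us8 >> 3) + s->mdev_us4;
}
```

    Keeping the values pre-scaled lets each update use only shifts and
    additions, which is why the SRTT is stored in 8ths of a microsecond.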
     
  • [ Upstream commit 1cd9b3abf5332102d4d967555e7ed861a75094bf ]

    In net/Kconfig, NET_DEVLINK implies NET_DROP_MONITOR.

    The original behavior of the 'imply' keyword prevents NET_DROP_MONITOR
    from being 'm' when NET_DEVLINK=y.

    With the planned Kconfig change that relaxes the 'imply', the
    combination of NET_DEVLINK=y and NET_DROP_MONITOR=m would be allowed.

    Use IS_REACHABLE() to avoid the vmlinux link error for this case.

    Reported-by: Stephen Rothwell
    Signed-off-by: Masahiro Yamada
    Acked-by: Neil Horman
    Signed-off-by: Sasha Levin

    Masahiro Yamada
     

20 May, 2020

3 commits

  • [ Upstream commit 2c407aca64977ede9b9f35158e919773cae2082f ]

    gcc-10 warns around a suspicious access to an empty struct member:

    net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc':
    net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds]
    1522 | memset(&ct->__nfct_init_offset[0], 0,
    | ^~~~~~~~~~~~~~~~~~~~~~~~~~
    In file included from net/netfilter/nf_conntrack_core.c:37:
    include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset'
    90 | u8 __nfct_init_offset[0];
    | ^~~~~~~~~~~~~~~~~~

    The code is correct but a bit unusual. Rework it slightly in a way that
    does not trigger the warning, using an empty struct instead of an empty
    array. There are probably more elegant ways to do this, but this is the
    smallest change.

    Fixes: c41884ce0562 ("netfilter: conntrack: avoid zeroing timer")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Arnd Bergmann
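
    The "empty struct instead of an empty array" idea from the conntrack
    commit above can be sketched like this. It is an illustration only: the
    struct conn_sketch and its fields are hypothetical, not struct nf_conn,
    and an empty struct member is a GNU C extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical object: fields before the marker survive a reset,
 * fields after it are zeroed. */
struct conn_sketch {
    int refcnt;
    long timeout;
    struct { } init_offset;  /* zero-size marker (GNU C extension) */
    int mark;
    int status;
};

/* Zero everything from the marker to the end of the object.
 * The marker is only used through offsetof(), so no zero-length
 * array is ever subscripted. */
void conn_reset(struct conn_sketch *c)
{
    size_t off = offsetof(struct conn_sketch, init_offset);

    memset((char *)c + off, 0, sizeof(*c) - off);
}
```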
     
  • [ Upstream commit 24adbc1676af4e134e709ddc7f34cf2adc2131e4 ]

    We autotune rcvbuf whenever SO_RCVLOWAT is set to account for 100%
    overhead in tcp_set_rcvlowat()

    This works well when skb->len/skb->truesize ratio is bigger than 0.5

    But if we receive packets with small MSS, we can end up in a situation
    where not enough bytes are available in the receive queue to satisfy
    RCVLOWAT setting.
    As our sk_rcvbuf limit is hit, we send zero windows in ACK packets,
    preventing remote peer from sending more data.

    Even autotuning does not help, because it only triggers at the time
    user process drains the queue. If no EPOLLIN is generated, this
    can not happen.

    Note poll() has a similar issue, after commit
    c7004482e8dc ("tcp: Respect SO_RCVLOWAT in tcp_poll().")

    Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
    Signed-off-by: Eric Dumazet
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit a7df4870d79b00742da6cc93ca2f336a71db77f7 ]

    When we tell kernel to dump filters from root (ffff:ffff),
    those filters on ingress (ffff:0000) are matched, but their
    true parents must be dumped as they are. However, kernel
    dumps just whatever we tell it, that is either ffff:ffff
    or ffff:0000:

    $ nl-cls-list --dev=dummy0 --parent=root
    cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all
    cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all
    $ nl-cls-list --dev=dummy0 --parent=ffff:
    cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all
    cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all

    This is confusing and misleading; more importantly, this is
    a regression since 4.15, so the old behavior must be restored.

    And, when tc filters are installed on a tc class, the parent
    should be the classid, rather than the qdisc handle. Commit
    edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto")
    removed the classid we save for filters, we can just restore
    this classid in tcf_block.

    Steps to reproduce this:
    ip li set dev dummy0 up
    tc qd add dev dummy0 ingress
    tc filter add dev dummy0 parent ffff: protocol arp basic action pass
    tc filter show dev dummy0 root

    Before this patch:
    filter protocol arp pref 49152 basic
    filter protocol arp pref 49152 basic handle 0x1
    action order 1: gact action pass
    random type none pass val 0
    index 1 ref 1 bind 1

    After this patch:
    filter parent ffff: protocol arp pref 49152 basic
    filter parent ffff: protocol arp pref 49152 basic handle 0x1
    action order 1: gact action pass
    random type none pass val 0
    index 1 ref 1 bind 1

    Fixes: a10fa20101ae ("net: sched: propagate q and parent from caller down to tcf_fill_node")
    Fixes: edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto")
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Cong Wang
     

14 May, 2020

2 commits

  • [ Upstream commit b723748750ece7d844cdf2f52c01d37f83387208 ]

    RFC 6040 recommends propagating an ECT(1) mark from an outer tunnel header
    to the inner header if that inner header is already marked as ECT(0). When
    RFC 6040 decapsulation was implemented, this case of propagation was not
    added. This simply appears to be an oversight, so let's fix that.

    Fixes: eccc1bb8d4b4 ("tunnel: drop packet if ECN present with not-ECT")
    Reported-by: Bob Briscoe
    Reported-by: Olivier Tilmans
    Cc: Dave Taht
    Cc: Stephen Hemminger
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
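
    The decapsulation rule discussed in the commit above can be written as a
    small decision function. This is a sketch of the RFC 6040 behaviour as
    described there, not the kernel's tunnel code; the function name
    ecn_decapsulate() and the ECN_DROP value are hypothetical. The codepoint
    values are the standard two-bit ECN field encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Two-bit ECN field codepoints. */
enum ecn {
    ECN_NOT_ECT = 0,
    ECN_ECT_1   = 1,
    ECN_ECT_0   = 2,
    ECN_CE      = 3,
};

#define ECN_DROP (-1)   /* hypothetical marker: packet must be dropped */

/* Return the ECN codepoint for the forwarded inner header, or
 * ECN_DROP when the combination is invalid. */
int ecn_decapsulate(uint8_t outer, uint8_t inner)
{
    assert(outer <= ECN_CE && inner <= ECN_CE);

    /* An inner header that is not ECN-capable cannot carry CE. */
    if (inner == ECN_NOT_ECT)
        return outer == ECN_CE ? ECN_DROP : ECN_NOT_ECT;

    /* Congestion marked on the outer header is always propagated. */
    if (outer == ECN_CE)
        return ECN_CE;

    /* The case the commit adds: ECT(1) outer over ECT(0) inner. */
    if (inner == ECN_ECT_0 && outer == ECN_ECT_1)
        return ECN_ECT_1;

    /* Otherwise the inner marking is kept unchanged. */
    return inner;
}
```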
     
  • [ Upstream commit 8f34e53b60b337e559f1ea19e2780ff95ab2fa65 ]

    Nik reported a bug with pcpu dst cache when nexthop objects are
    used illustrated by the following:
    $ ip netns add foo
    $ ip -netns foo li set lo up
    $ ip -netns foo addr add 2001:db8:11::1/128 dev lo
    $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
    $ ip li add veth1 type veth peer name veth2
    $ ip li set veth1 up
    $ ip addr add 2001:db8:10::1/64 dev veth1
    $ ip li set dev veth2 netns foo
    $ ip -netns foo li set veth2 up
    $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
    $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Create a pcpu entry on cpu 0:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1

    Re-add the route entry:
    $ ip -6 ro del 2001:db8:11::1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Route get on cpu 0 returns the stale pcpu:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
    RTNETLINK answers: Network is unreachable

    While cpu 1 works:
    $ taskset -a -c 1 ip -6 route get 2001:db8:11::1
    2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium

    Conversion of FIB entries to work with external nexthop objects
    missed an important difference between IPv4 and IPv6 - how dst
    entries are invalidated when the FIB changes. IPv4 has a per-network
    namespace generation id (rt_genid) that is bumped on changes to the FIB.
    Checking if a dst_entry is still valid means comparing rt_genid in the
    rtable to the current value of rt_genid for the namespace.

    IPv6 also has a per network namespace counter, fib6_sernum, but the
    count is saved per fib6_node. With the per-node counter only dst_entries
    based on fib entries under the node are invalidated when changes are
    made to the routes - limiting the scope of invalidations. IPv6 uses a
    reference in the rt6_info, 'from', to track the corresponding fib entry
    used to create the dst_entry. When validating a dst_entry, the 'from'
    is used to backtrack to the fib6_node and check its sernum against the
    cookie passed to the dst_check operation.

    With the inline format (nexthop definition inline with the fib6_info),
    dst_entries cached in the fib6_nh have a 1:1 correlation between fib
    entries, nexthop data and dst_entries. With external nexthops, IPv6
    looks more like IPv4 which means multiple fib entries across disparate
    fib6_nodes can all reference the same fib6_nh. That means validation
    of dst_entries based on external nexthops needs to use the IPv4 format
    - the per-network namespace counter.

    Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
    rt6_get_cookie to return sernum if it is set and update dst_check for
    IPv6 to look for sernum set and base the check on it if so. Finally,
    rt6_get_pcpu_route needs to validate the cached entry before returning
    a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
    __mkroute_output for IPv4).

    This problem only affects routes using the new, external nexthops.

    Thanks to the kbuild test robot for catching the IS_ENABLED needed
    around rt_genid_ipv6 before I sent this out.

    Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects")
    Reported-by: Nikolay Aleksandrov
    Signed-off-by: David Ahern
    Reviewed-by: Nikolay Aleksandrov
    Tested-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
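
    The per-namespace generation counter scheme that the commit above
    describes for IPv4, and adopts for external nexthops, can be modelled in
    a few lines. This is a sketch under assumptions: netns_sketch,
    dst_sketch and the three helpers are hypothetical names, not kernel
    structures or functions.

```c
#include <stdbool.h>

/* Hypothetical namespace with one generation counter. */
struct netns_sketch {
    unsigned int genid;     /* bumped on every FIB change */
};

/* Hypothetical cached route entry. */
struct dst_sketch {
    unsigned int genid;     /* counter value when the entry was cached */
};

/* Record the current generation when an entry is cached. */
void dst_cache_fill(const struct netns_sketch *ns, struct dst_sketch *dst)
{
    dst->genid = ns->genid;
}

/* Any change to the FIB invalidates every cached entry at once. */
void fib_changed(struct netns_sketch *ns)
{
    ns->genid++;
}

/* A cached entry is usable only if no FIB change happened since. */
bool dst_still_valid(const struct netns_sketch *ns,
                     const struct dst_sketch *dst)
{
    return dst->genid == ns->genid;
}
```

    In this model, the stale per-cpu entry from the reproduction steps would
    fail dst_still_valid() after the route is deleted and re-added, so a
    fresh entry would be created instead of returning "unreachable".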
     

10 May, 2020

1 commit

  • commit d0208bf4da97f76237300afb83c097de25645de6 upstream.

    Commit 6cd021a58c18a ("udp: segment looped gso packets correctly")
    fixes an issue with rare udp gso multicast packets looped onto the
    receive path.

    The stable backport makes the narrowest change to target only these
    packets, when needed. As opposed to, say, expanding __udp_gso_segment,
    which is harder to reason about as being free from unintended side-effects.

    But the resulting code is hardly self-describing.
    Document its purpose and rationale.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     

29 Apr, 2020

1 commit

  • commit 6cb5f3ea4654faf8c28b901266e960b1a4787b26 upstream.

    When fixing the initialization race, we neglected to account for
    the fact that debugfs is initialized in wiphy_register(), and
    some debugfs things went missing (or rather were rerooted to the
    global debugfs root).

    Fix this by adding debugfs entries only after wiphy_register().
    This requires some changes in the rate control code since it
    currently adds debugfs at alloc time, which can no longer be
    done after the reordering.

    Reported-by: Jouni Malinen
    Reported-by: kernel test robot
    Reported-by: Hauke Mehrtens
    Reported-by: Felix Fietkau
    Cc: stable@vger.kernel.org
    Fixes: 52e04b4ce5d0 ("mac80211: fix race in ieee80211_register_hw()")
    Signed-off-by: Johannes Berg
    Acked-by: Sumit Garg
    Link: https://lore.kernel.org/r/20200423111344.0e00d3346f12.Iadc76a03a55093d94391fc672e996a458702875d@changeid
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg