30 Sep, 2022

1 commit

  • This is the 5.15.71 stable release

    * tag 'v5.15.71': (144 commits)
    Linux 5.15.71
    ext4: use locality group preallocation for small closed files
    ext4: avoid unnecessary spreading of allocations among groups
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    drivers/net/phy/aquantia_main.c
    drivers/tty/serial/fsl_lpuart.c

    Jason Liu
     

28 Sep, 2022

11 commits

  • [ Upstream commit c2e1cfefcac35e0eea229e148c8284088ce437b5 ]

    tfilter_put() needs to be called to drop the refcount taken by
    tp->ops->get, to avoid a possible refcount leak when
    chain->tmplt_ops != NULL and chain->tmplt_ops != tp->ops.

    Fixes: 7d5509fa0d3d ("net: sched: extend proto ops with 'put' callback")
    Signed-off-by: Hangyu Hua
    Reviewed-by: Vlad Buslov
    Link: https://lore.kernel.org/r/20220921092734.31700-1-hbh25y@gmail.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Hangyu Hua
     
  • [ Upstream commit e738455b2c6dcdab03e45d97de36476f93f557d2 ]

    There might be a potential race between SMC-R buffer map and
    link group termination.

    smc_smcr_terminate_all()     | smc_connect_rdma()
    --------------------------------------------------------------
                                 | smc_conn_create()
    for links in smcibdev        |
        schedule links down      |
                                 | smc_buf_create()
                                 |  \- smcr_buf_map_usable_links()
                                 |      \- no usable links found,
                                 |         (rmb->mr = NULL)
                                 |
                                 | smc_clc_send_confirm()
                                 |  \- access conn->rmb_desc->mr[]->rkey
                                 |     (panic)

    During reboot and IB device module removal, all links will be set
    down and no usable links remain in link groups. In such a situation
    smcr_buf_map_usable_links() should return an error and stop the
    CLC flow from accessing the uninitialized mr.

    Fixes: b9247544c1bc ("net/smc: convert static link ID instances to support multiple links")
    Signed-off-by: Wen Gu
    Link: https://lore.kernel.org/r/1663656189-32090-1-git-send-email-guwen@linux.alibaba.com
    Signed-off-by: Paolo Abeni
    Signed-off-by: Sasha Levin

    Wen Gu
     
  • [ Upstream commit 62ce44c4fff947eebdf10bb582267e686e6835c9 ]

    The bug fix was incomplete: it "replaced" the crash with a memory leak.
    The old code had an assignment to "ret" embedded into the conditional;
    restore this.

    Fixes: 7997eff82828 ("netfilter: ebtables: reject blobs that don't provide all entry points")
    Reported-and-tested-by: syzbot+a24c5252f3e3ab733464@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Sasha Levin

    Florian Westphal
     
  • [ Upstream commit 9a4d6dd554b86e65581ef6b6638a39ae079b17ac ]

    It seems to me that percpu memory for chain stats started leaking since
    commit 3bc158f8d0330f0a ("netfilter: nf_tables: map basechain priority to
    hardware priority") when nft_chain_offload_priority() returned an error.

    Signed-off-by: Tetsuo Handa
    Fixes: 3bc158f8d0330f0a ("netfilter: nf_tables: map basechain priority to hardware priority")
    Signed-off-by: Florian Westphal
    Signed-off-by: Sasha Levin

    Tetsuo Handa
     
  • [ Upstream commit 921ebde3c0d22c8cba74ce8eb3cc4626abff1ccd ]

    syzbot is reporting underflow of nft_counters_enabled counter at
    nf_tables_addchain() [1], for commit 43eb8949cfdffa76 ("netfilter:
    nf_tables: do not leave chain stats enabled on error") missed that
    nf_tables_chain_destroy() after nft_basechain_init() in the error path of
    nf_tables_addchain() decrements the counter because nft_basechain_init()
    makes nft_is_base_chain() return true by setting NFT_CHAIN_BASE flag.

    Increment the counter immediately after returning from
    nft_basechain_init().

    Link: https://syzkaller.appspot.com/bug?extid=b5d82a651b71cd8a75ab [1]
    Reported-by: syzbot
    Signed-off-by: Tetsuo Handa
    Tested-by: syzbot
    Fixes: 43eb8949cfdffa76 ("netfilter: nf_tables: do not leave chain stats enabled on error")
    Signed-off-by: Florian Westphal
    Signed-off-by: Sasha Levin

    Tetsuo Handa
     
  • [ Upstream commit 1461d212ab277d8bba1a753d33e9afe03d81f9d4 ]

    taprio can only operate as root qdisc, and to that end, there exists the
    following check in taprio_init(), just as in mqprio:

        if (sch->parent != TC_H_ROOT)
                return -EOPNOTSUPP;

    And indeed, when we try to attach taprio to an mqprio child, it fails as
    expected:

    $ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
    map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
    $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
    map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
    base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
    flags 0x0 clockid CLOCK_TAI
    Error: sch_taprio: Can only be attached as root qdisc.

    (extack message added by me)

    But when we try to attach a taprio child to a taprio root qdisc,
    surprisingly it doesn't fail:

    $ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
    map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
    base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
    flags 0x0 clockid CLOCK_TAI
    $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
    map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
    base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
    flags 0x0 clockid CLOCK_TAI

    This is because tc_modify_qdisc() behaves differently when mqprio is
    root, vs when taprio is root.

    In the mqprio case, it finds the parent qdisc through
    p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
    q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
    ignored according to the comment right below ("It may be default qdisc,
    ignore it"). As a result, tc_modify_qdisc() goes through the
    qdisc_create() code path, and this gives taprio_init() a chance to check
    for sch->parent != TC_H_ROOT and error out.

    Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
    different. It is not the default qdisc created for each netdev queue
    (both taprio and mqprio call qdisc_create_dflt() and keep them in
    a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
    makes qdisc_leaf() return the _root_ qdisc, aka itself.

    When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
    code path, because the qdisc layer never finds out about the child qdisc
    of the root. And through the ->change() ops, taprio has no reason to
    check whether its parent is root or not, just through ->init(), which is
    not called.

    The problem is the taprio_leaf() implementation. Even though code wise,
    it does the exact same thing as mqprio_leaf() which it is copied from,
    it works with different input data. This is because mqprio does not
    attach itself (the root) to each device TX queue, but one of the default
    qdiscs from its private array.

    In fact, since commit 13511704f8d7 ("net: taprio offload: enforce qdisc
    to netdev queue mapping"), taprio does this too, but just for the full
    offload case. So if we tried to attach a taprio child to a fully
    offloaded taprio root qdisc, it would properly fail too; just not to a
    software root taprio.

    To fix the problem, stop looking at the Qdisc that's attached to the TX
    queue, and instead, always return the default qdiscs that we've
    allocated (and to which we privately enqueue and dequeue, in software
    scheduling mode).

    Since Qdisc_class_ops :: leaf is only called from tc_modify_qdisc(),
    the risk of unforeseen side effects introduced by this change is
    minimal.

    Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
    Signed-off-by: Vladimir Oltean
    Reviewed-by: Vinicius Costa Gomes
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Vladimir Oltean
     
  • [ Upstream commit db46e3a88a09c5cf7e505664d01da7238cd56c92 ]

    In an incredibly strange API design decision, qdisc->destroy() gets
    called even if qdisc->init() never succeeded, not exclusively since
    commit 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"),
    but apparently also earlier (in the case of qdisc_create_dflt()).

    The taprio qdisc does not fully acknowledge this when it attempts full
    offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
    taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
    parsed from netlink (in taprio_change(), tail called from taprio_init()).

    But in taprio_destroy(), we call taprio_disable_offload(), and this
    determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).

    But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
    (a bitwise check of bit 1 in q->flags), it is invalid to call this macro
    on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
    to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
    an invalid set of flags.

    As a result, it is possible to crash the kernel if user space forces an
    error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
    of taprio_enable_offload(). This is because drivers do not expect the
    offload to be disabled when it was never enabled.

    The error that we force here is to attach taprio not as the root qdisc,
    but as a child of an mqprio root qdisc:

    $ tc qdisc add dev swp0 root handle 1: \
    mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
    $ tc qdisc replace dev swp0 parent 1:1 \
    taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
    sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
    flags 0x0 clockid CLOCK_TAI
    Unable to handle kernel paging request at virtual address fffffffffffffff8
    [fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Call trace:
    taprio_dump+0x27c/0x310
    vsc9959_port_setup_tc+0x1f4/0x460
    felix_port_setup_tc+0x24/0x3c
    dsa_slave_setup_tc+0x54/0x27c
    taprio_disable_offload.isra.0+0x58/0xe0
    taprio_destroy+0x80/0x104
    qdisc_create+0x240/0x470
    tc_modify_qdisc+0x1fc/0x6b0
    rtnetlink_rcv_msg+0x12c/0x390
    netlink_rcv_skb+0x5c/0x130
    rtnetlink_rcv+0x1c/0x2c

    Fix this by keeping track of the operations we made, and undo the
    offload only if we actually did it.
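
    The difference between deciding from the flags and deciding from an
    explicit record can be sketched in plain C. This is only an
    illustration, not the kernel code: the struct, the macro values and the
    function names below are stand-ins chosen to mirror the description
    above (an all-ones "invalid" value, and a full-offload flag in bit 1).

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for the definitions described in the text (assumed values). */
    #define FLAGS_INVALID      UINT32_MAX  /* plays the role of TAPRIO_FLAGS_INVALID */
    #define FLAG_FULL_OFFLOAD  (1u << 1)   /* "bit 1 in q->flags" */
    #define FULL_OFFLOAD_IS_ENABLED(flags) (((flags) & FLAG_FULL_OFFLOAD) != 0)

    /* Hypothetical, reduced scheduler state. */
    struct sched_state {
    	uint32_t flags;   /* FLAGS_INVALID until the netlink flags are parsed */
    	bool offloaded;   /* set only after the offload was really enabled */
    };

    /* Flag-based decision: an all-ones value has bit 1 set, so this says "yes". */
    bool must_disable_by_flags(const struct sched_state *q)
    {
    	return FULL_OFFLOAD_IS_ENABLED(q->flags);
    }

    /* Tracking-based decision: undo the offload only if it was actually done. */
    bool must_disable_by_tracking(const struct sched_state *q)
    {
    	return q->offloaded;
    }
    ```

    With a state that failed before the flags were parsed, the first helper
    asks for a disable that was never matched by an enable, while the second
    does not.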

    I've added "bool offloaded" inside a 4 byte hole between "int clockid"
    and "atomic64_t picos_per_byte". Now the first cache line looks like
    below:

    $ pahole -C taprio_sched net/sched/sch_taprio.o
    struct taprio_sched {
    struct Qdisc * * qdiscs; /* 0 8 */
    struct Qdisc * root; /* 8 8 */
    u32 flags; /* 16 4 */
    enum tk_offsets tk_offset; /* 20 4 */
    int clockid; /* 24 4 */
    bool offloaded; /* 28 1 */

    /* XXX 3 bytes hole, try to pack */

    atomic64_t picos_per_byte; /* 32 0 */

    /* XXX 8 bytes hole, try to pack */

    spinlock_t current_entry_lock; /* 40 0 */

    /* XXX 8 bytes hole, try to pack */

    struct sched_entry * current_entry; /* 48 8 */
    struct sched_gate_list * oper_sched; /* 56 8 */
    /* --- cacheline 1 boundary (64 bytes) --- */

    Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
    Signed-off-by: Vladimir Oltean
    Reviewed-by: Vinicius Costa Gomes
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Vladimir Oltean
     
  • [ Upstream commit 64ae13ed478428135cddc2f1113dff162d8112d4 ]

    __flow_hash_consistentify() wrongly swaps IPv4 addresses in a few cases.
    This function is indirectly used by __skb_get_hash_symmetric(), which is
    used to fanout packets in AF_PACKET.
    Intrusion detection systems may be impacted by this issue.

    __flow_hash_consistentify() computes the address difference, then swaps
    the addresses if the difference is negative. In a few cases src - dst and
    dst - src are both negative.

    The following snippet mimics __flow_hash_consistentify():

    ```
    #include <stdio.h>
    #include <stdint.h>

    int main(int argc, char** argv) {

    int diffs_d, diffd_s;
    uint32_t dst = 0xb225a8c0; /* 178.37.168.192 --> 192.168.37.178 */
    uint32_t src = 0x3225a8c0; /* 50.37.168.192 --> 192.168.37.50 */
    uint32_t dst2 = 0x3325a8c0; /* 51.37.168.192 --> 192.168.37.51 */

    diffs_d = src - dst;
    diffd_s = dst - src;

    printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n",
    src, dst, diffs_d, diffs_d, diffd_s, diffd_s);

    diffs_d = src - dst2;
    diffd_s = dst2 - src;

    printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n",
    src, dst2, diffs_d, diffs_d, diffd_s, diffd_s);

    return 0;
    }
    ```

    Results:

    src:3225a8c0 dst:b225a8c0, \
    diff(s-d)=-2147483648(0x80000000) \
    diff(d-s)=-2147483648(0x80000000)

    src:3225a8c0 dst:3325a8c0, \
    diff(s-d)=-16777216(0xff000000) \
    diff(d-s)=16777216(0x1000000)

    In the first case the address differences are always < 0, therefore
    __flow_hash_consistentify() always swaps, thus dst->src and src->dst
    packets have different hashes.
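
    One way to make the ordering decision symmetric is to compare the two
    addresses directly instead of testing the sign of their difference. The
    sketch below contrasts the two tests; the function names are
    hypothetical and this is not the kernel implementation. The sign test is
    written as a check of the top bit of the wrapped 32-bit difference,
    which is what "negative when read as a signed int" amounts to.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Difference-based test: swap when the wrapped difference looks negative. */
    bool swaps_by_diff(uint32_t src, uint32_t dst)
    {
    	uint32_t diff = dst - src;         /* wraps modulo 2^32 */

    	return (diff & 0x80000000u) != 0;  /* sign bit of the 32-bit result */
    }

    /* Comparison-based test: swap when dst orders before src. */
    bool swaps_by_compare(uint32_t src, uint32_t dst)
    {
    	return dst < src;
    }
    ```

    For two distinct addresses, a consistent scheme must swap in exactly one
    of the two directions. The comparison does so for every pair; the
    difference test swaps in both directions when the addresses are exactly
    0x80000000 apart, as in the first result above.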

    Fixes: c3f8324188fa8 ("net: Add full IPv6 addresses to flow_keys")
    Signed-off-by: Ludovic Cintrat
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Ludovic Cintrat
     
  • [ Upstream commit 559c36c5a8d730c49ef805a72b213d3bba155cc8 ]

    nf_osf_find() incorrectly returns true on mismatch, this leads to
    copying uninitialized memory area in nft_osf which can be used to leak
    stale kernel stack data to userspace.

    Fixes: 22c7652cdaa8 ("netfilter: nft_osf: Add version option support")
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Florian Westphal
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     
  • [ Upstream commit e8d5dfd1d8747b56077d02664a8838c71ced948e ]

    CTCP messages should only be at the start of an IRC message, not
    anywhere within it.

    While the helper only decodes packets in the ORIGINAL direction, it's
    possible to make a client send a CTCP message back by embedding one into
    a PING request. As-is, that's enough to make the helper believe that it
    saw a CTCP message.
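
    The distinction between "anywhere within" and "at the start" can be
    shown with two small string checks. This is a sketch under assumptions:
    the function names are hypothetical, `text` stands for the message text
    that the helper inspects, and the marker is taken to be the CTCP
    delimiter byte 0x01 followed by "DCC ".

    ```c
    #include <stdbool.h>
    #include <string.h>

    /* Assumed marker: CTCP delimiter (0x01) followed by the DCC keyword. */
    #define CTCP_DCC "\001DCC "

    /* Loose check: accepts the marker anywhere in the text. */
    bool ctcp_anywhere(const char *text)
    {
    	return strstr(text, CTCP_DCC) != NULL;
    }

    /* Strict check: accepts the marker only at the very start of the text. */
    bool ctcp_at_start(const char *text)
    {
    	return strncmp(text, CTCP_DCC, strlen(CTCP_DCC)) == 0;
    }
    ```

    A marker echoed back in the middle of some other text passes the loose
    check but not the strict one.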

    Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port")
    Signed-off-by: David Leadbeater
    Signed-off-by: Florian Westphal
    Signed-off-by: Sasha Levin

    David Leadbeater
     
  • [ Upstream commit 39aebedeaaa95757f5c1f2ddb5f43fdddbf478ca ]

    ct_sip_next_header and ct_sip_get_header return an absolute
    value of matchoff, not a shift from current dataoff.
    So dataoff should be assigned matchoff, not incremented by it.

    This issue can be seen in the scenario when there are multiple
    Contact headers and the first one is using a hostname and other headers
    use IP addresses. In this case, ct_sip_walk_headers will work as follows:

    The first ct_sip_get_header call will find the first Contact header
    but will return -1 as the header uses a hostname. But matchoff will
    be changed to the offset of this header. After that, dataoff should be
    set to matchoff, so that the next ct_sip_get_header call finds the next
    Contact header. But instead of dataoff being assigned matchoff, it is
    incremented by it, which is not correct, as matchoff is an absolute
    value of the offset. So on the next call to ct_sip_get_header,
    dataoff will be incorrect, and the next Contact header may not be
    found at all.
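
    The effect of adding an absolute offset instead of assigning it can be
    reproduced with a toy header walker. Everything here is hypothetical
    (find_token() and the two counting functions are not the conntrack
    helpers); the only property carried over from the description is that
    the finder reports an absolute offset.

    ```c
    #include <stddef.h>
    #include <string.h>

    /*
     * Hypothetical finder: searches buf starting at offset `from` and, on a
     * hit, stores the ABSOLUTE offset just past the token in *matchoff.
     */
    int find_token(const char *buf, size_t from, const char *tok, size_t *matchoff)
    {
    	const char *hit;

    	if (from > strlen(buf))
    		return 0;
    	hit = strstr(buf + from, tok);
    	if (hit == NULL)
    		return 0;
    	*matchoff = (size_t)(hit - buf) + strlen(tok);
    	return 1;
    }

    /* Walk that assigns the absolute offset: every occurrence is visited. */
    size_t count_with_assign(const char *buf, const char *tok)
    {
    	size_t dataoff = 0, matchoff = 0, n = 0;

    	while (find_token(buf, dataoff, tok, &matchoff)) {
    		n++;
    		dataoff = matchoff;
    	}
    	return n;
    }

    /* Walk that adds the absolute offset: dataoff overshoots later headers. */
    size_t count_with_increment(const char *buf, const char *tok)
    {
    	size_t dataoff = 0, matchoff = 0, n = 0;

    	while (find_token(buf, dataoff, tok, &matchoff)) {
    		n++;
    		dataoff += matchoff;
    	}
    	return n;
    }
    ```

    On a message with one leading line and three Contact headers, the
    assigning walk visits all three headers, while the incrementing walk
    jumps past the start of the third one and stops early.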

    Fixes: 05e3ced297fe ("[NETFILTER]: nf_conntrack_sip: introduce SIP-URI parsing helper")
    Signed-off-by: Igor Ryzhov
    Signed-off-by: Florian Westphal
    Signed-off-by: Sasha Levin

    Igor Ryzhov
     

27 Sep, 2022

1 commit

  • This is the 5.15.70 stable release

    * tag 'v5.15.70': (2444 commits)
    Linux 5.15.70
    ALSA: hda/sigmatel: Fix unused variable warning for beep power change
    cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    arch/arm/boot/dts/imx6ul.dtsi
    arch/arm/mm/mmu.c
    arch/arm64/boot/dts/freescale/imx8mp-evk.dts
    drivers/gpu/drm/imx/dcss/dcss-kms.c
    drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.c
    drivers/media/platform/nxp/imx-jpeg/mxc-jpeg.h
    drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
    drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
    drivers/soc/fsl/Kconfig
    drivers/soc/imx/gpcv2.c
    drivers/usb/dwc3/host.c
    net/dsa/slave.c
    sound/soc/fsl/imx-card.c

    Jason Liu
     

23 Sep, 2022

3 commits

  • commit e22aa14866684f77b4f6b6cae98539e520ddb731 upstream.

    If we set XFRM security policy by calling setsockopt with option
    IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
    struct. However tcp_v6_send_response doesn't look up dst_entry with the
    actual socket but looks up with tcp control socket. This may cause a
    problem that a RST packet is sent without ESP encryption & peer's TCP
    socket can't receive it.
    This patch will make the function look up dest_entry with actual socket,
    if the socket has XFRM policy(sock_policy), so that the TCP response
    packet via this function can be encrypted, & aligned on the encrypted
    TCP socket.

    Tested: We encountered this problem when a TCP socket which is encrypted
    in ESP transport mode encryption, receives challenge ACK at SYN_SENT
    state. After receiving challenge ACK, TCP needs to send RST to
    establish the socket at next SYN try. But the RST was not encrypted &
    peer TCP socket still remains on ESTABLISHED state.
    So we verified this with test step as below.
    [Test step]
    1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
    2. Client tries a new connection on the same TCP ports(src & dst).
    3. Server will return challenge ACK instead of SYN,ACK.
    4. Client will send RST to server to clear the SOCKET.
    5. Client will retransmit SYN to server on the same TCP ports.
    [Expected result]
    The TCP connection should be established.

    Cc: Maciej Żenczykowski
    Cc: Eric Dumazet
    Cc: Steffen Klassert
    Cc: Sehee Lee
    Signed-off-by: Sewook Seo
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    sewookseo
     
  • [ Upstream commit 214a9dc7d852216e83acac7b75bc18f01ce184c2 ]

    Fix the calculation of the resend age to add a microsecond value as
    microseconds, not nanoseconds.
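
    As a generic illustration of this kind of unit mix-up (not the rxrpc
    code itself), assume timestamps are kept as nanosecond counts and a
    delay is given in microseconds. The helper names are hypothetical.

    ```c
    #include <stdint.h>

    #define NSEC_PER_USEC 1000LL

    /* Mistaken: adds a microsecond count as if it were nanoseconds. */
    int64_t add_us_as_ns(int64_t t_ns, int64_t delta_us)
    {
    	return t_ns + delta_us;
    }

    /* Intended: converts the microsecond count before adding it. */
    int64_t add_us(int64_t t_ns, int64_t delta_us)
    {
    	return t_ns + delta_us * NSEC_PER_USEC;
    }
    ```

    The mistaken form yields a deadline that is a factor of 1000 too close
    to the base timestamp.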

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin

    David Howells
     
  • [ Upstream commit d3d863036d688313f8d566b87acd7d99daf82749 ]

    If the local processor work item for the rxrpc local endpoint gets requeued
    by an event (such as an incoming packet) between it getting scheduled for
    destruction and the UDP socket being closed, the rxrpc_local_destroyer()
    function can get run twice. The second time it can hang because it can end
    up waiting for cleanup events that will never happen.

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin

    David Howells
     

20 Sep, 2022

1 commit

  • [ Upstream commit 52267ce25f60f37ae40ccbca0b21328ebae5ae75 ]

    In case the source port cannot be decoded, print the warning only once. This
    still brings attention to the user and does not spam the logs at the same time.

    Signed-off-by: Kurt Kanzenbach
    Reviewed-by: Andrew Lunn
    Reviewed-by: Vladimir Oltean
    Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.de
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Kurt Kanzenbach
     

15 Sep, 2022

13 commits

  • [ Upstream commit 2f09707d0c972120bf794cfe0f0c67e2c2ddb252 ]

    Cong Wang noticed that the previous fix for sch_sfb accessing the queued
    skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue
    function was also calling qdisc_qstats_backlog_inc() after enqueue, which
    reads the pkt len from the skb cb field. Fix this by also storing the skb
    len, and using the stored value to increment the backlog after enqueueing.
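
    The pattern described here (read what you need before handing the
    buffer over, then account with the saved value) can be sketched in
    userspace C. The types and functions are hypothetical stand-ins; the
    child queue frees the packet unconditionally to model the worst case.

    ```c
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical packet with just the field we account on. */
    struct pkt {
    	size_t len;
    };

    struct pkt *pkt_new(size_t len)
    {
    	struct pkt *p = malloc(sizeof(*p));

    	if (p != NULL)
    		p->len = len;
    	return p;
    }

    /* Hypothetical child queue: takes ownership and may free the packet. */
    int child_enqueue(struct pkt *p)
    {
    	free(p);
    	return 0;
    }

    /* Returns the updated backlog without touching p after the hand-over. */
    size_t enqueue_and_account(struct pkt *p, size_t backlog)
    {
    	size_t len = p->len;  /* copy taken while p is still ours */

    	if (child_enqueue(p) != 0)
    		return backlog;
    	return backlog + len; /* uses the saved length, not p->len */
    }
    ```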

    Fixes: 9efd23297cca ("sch_sfb: Don't assume the skb is still around after enqueueing to child")
    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Cong Wang
    Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk
    Signed-off-by: Paolo Abeni
    Signed-off-by: Sasha Levin

    Toke Høiland-Jørgensen
     
  • [ Upstream commit 686dc2db2a0fdc1d34b424ec2c0a735becd8d62b ]

    Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
    of a spurious non-SACK RTO could cause a connection to fail to clear
    retrans_stamp, causing a later RTO to very prematurely time out the
    connection with ETIMEDOUT.

    Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
    report:

    (*1) Send one data packet on a non-SACK connection

    (*2) Because no ACK packet is received, the packet is retransmitted
    and we enter CA_Loss; but this retransmission is spurious.

    (*3) The ACK for the original data is received. The transmitted packet
    is acknowledged. The TCP timestamp is before the retrans_stamp,
    so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
    true without changing state to Open (because tcp_is_sack() is
    false), and tcp_process_loss() returns without calling
    tcp_try_undo_recovery(). Normally after undoing a CA_Loss
    episode, tcp_fastretrans_alert() would see that the connection
    has returned to CA_Open and fall through and call
    tcp_try_to_open(), which would set retrans_stamp to 0. However,
    for non-SACK connections we hold the connection in CA_Loss, so do
    not fall through to call tcp_try_to_open() and do not set
    retrans_stamp to 0. So retrans_stamp is (erroneously) still
    non-zero.

    At this point the first "retransmission event" has passed and
    been recovered from. Any future retransmission is a completely
    new "event". However, retrans_stamp is erroneously still
    set. (And we are still in CA_Loss, which is correct.)

    (*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
    packet is sent. Note: No data is transmitted between (*3) and
    (*4) and we disabled keep alives.

    The socket's timeout SHOULD be calculated from this point in
    time, but instead it's calculated from the prior "event" 16
    minutes ago (step (*2)).

    (*5) Because no ACK packet is received, the packet is retransmitted.

    (*6) At the time of the 2nd retransmission, the socket returns
    ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
    too far in the past (set at the time of (*2)).

    This commit fixes this bug by ensuring that we reuse in
    tcp_try_undo_loss() the same careful logic for non-SACK connections
    that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
    we factor out that logic into a new
    tcp_is_non_sack_preventing_reopen() helper and call that helper from
    both undo functions.

    Fixes: da34ac7626b5 ("tcp: only undo on partial ACKs in CA_Loss")
    Reported-by: Nagaraj Arankal
    Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Reviewed-by: Eric Dumazet
    Link: https://lore.kernel.org/r/20220903121023.866900-1-ncardwell.kernel@gmail.com
    Signed-off-by: Paolo Abeni
    Signed-off-by: Sasha Levin

    Neal Cardwell
     
  • [ Upstream commit 84a53580c5d2138c7361c7c3eea5b31827e63b35 ]

    The SRv6 layer allows defining HMAC data that can later be used to sign IPv6
    Segment Routing Headers. This configuration is realised via netlink through
    four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and
    SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual
    length of the SECRET attribute, it is possible to provide invalid combinations
    (e.g., secret = "", secretlen = 64). This case is not checked in the code and
    with an appropriately crafted netlink message, an out-of-bounds read of up
    to 64 bytes (max secret length) can occur past the skb end pointer and into
    skb_shared_info:

    Breakpoint 1, seg6_genl_sethmac (skb=, info=) at net/ipv6/seg6.c:208
    208 memcpy(hinfo->secret, secret, slen);
    (gdb) bt
    #0 seg6_genl_sethmac (skb=, info=) at net/ipv6/seg6.c:208
    #1 0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600,
    extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 , family=,
    family=) at net/netlink/genetlink.c:731
    #2 0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00,
    family=0xffffffff82fef6c0 ) at net/netlink/genetlink.c:775
    #3 genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792
    #4 0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 )
    at net/netlink/af_netlink.c:2501
    #5 0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803
    #6 0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000)
    at net/netlink/af_netlink.c:1319
    #7 netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=)
    at net/netlink/af_netlink.c:1345
    #8 0xffffffff81dff9a4 in netlink_sendmsg (sock=, msg=0xffffc90000ba7e48, len=) at net/netlink/af_netlink.c:1921
    ...
    (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end
    $1 = 0xffff88800b1b76c0
    (gdb) p/x secret
    $2 = 0xffff88800b1b76c0
    (gdb) p slen
    $3 = 64 '@'

    The OOB data can then be read back from userspace by dumping HMAC state. This
    commit fixes this by ensuring SECRETLEN cannot exceed the actual length of
    SECRET.
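
    A bounds check of this shape can be sketched as follows. The function
    and its parameters are hypothetical (this is not the seg6 netlink
    handler): `attr` and `attr_len` stand for the bytes actually carried in
    the message, and `slen` for the separately supplied length.

    ```c
    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_SECRET_LEN 64  /* "max secret length" from the text above */

    /*
     * Copy slen bytes of secret into out, refusing lengths that exceed
     * either the destination or the data really present in the attribute.
     */
    int copy_secret(uint8_t *out, const uint8_t *attr, size_t attr_len,
    		uint8_t slen)
    {
    	if (slen > MAX_SECRET_LEN || slen > attr_len)
    		return -EINVAL;
    	memcpy(out, attr, slen);
    	return 0;
    }
    ```

    With an empty attribute and a claimed length of 64, the copy is
    rejected instead of reading past the end of the message.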

    Reported-by: Lucas Leong
    Tested: verified that EINVAL is correctly returned when secretlen > len(secret)
    Fixes: 4f4853dc1c9c1 ("ipv6: sr: implement API to control SR HMAC structure")
    Signed-off-by: David Lebrun
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    David Lebrun
     
  • [ Upstream commit 3261400639463a853ba2b3be8bd009c2a8089775 ]

    We got a recent syzbot report [1] showing a possible misuse
    of pfmemalloc page status in TCP zerocopy paths.

    Indeed, for pages coming from user space or other layers,
    using page_is_pfmemalloc() is moot, and possibly could give
    false positives.

    There have been attempts to make page_is_pfmemalloc() more robust,
    but not using it in the first place in this context is probably better,
    removing cpu cycles.

    Note to stable teams :

    You need to backport 84ce071e38a6 ("net: introduce
    __skb_fill_page_desc_noacc") as a prereq.

    Race is more probable after commit c07aea3ef4d4
    ("mm: add a signature in struct page") because page_is_pfmemalloc()
    is now using low order bit from page->lru.next, which can change
    more often than page->index.

    Low order bit should never be set for lru.next (when used as an anchor
    in LRU list), so KCSAN report is mostly a false positive.

    Backporting to older kernel versions seems not necessary.

    [1]
    BUG: KCSAN: data-race in lru_add_fn / tcp_build_frag

    write to 0xffffea0004a1d2c8 of 8 bytes by task 18600 on cpu 0:
    __list_add include/linux/list.h:73 [inline]
    list_add include/linux/list.h:88 [inline]
    lruvec_add_folio include/linux/mm_inline.h:105 [inline]
    lru_add_fn+0x440/0x520 mm/swap.c:228
    folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246
    folio_batch_add_and_move mm/swap.c:263 [inline]
    folio_add_lru+0xf1/0x140 mm/swap.c:490
    filemap_add_folio+0xf8/0x150 mm/filemap.c:948
    __filemap_get_folio+0x510/0x6d0 mm/filemap.c:1981
    pagecache_get_page+0x26/0x190 mm/folio-compat.c:104
    grab_cache_page_write_begin+0x2a/0x30 mm/folio-compat.c:116
    ext4_da_write_begin+0x2dd/0x5f0 fs/ext4/inode.c:2988
    generic_perform_write+0x1d4/0x3f0 mm/filemap.c:3738
    ext4_buffered_write_iter+0x235/0x3e0 fs/ext4/file.c:270
    ext4_file_write_iter+0x2e3/0x1210
    call_write_iter include/linux/fs.h:2187 [inline]
    new_sync_write fs/read_write.c:491 [inline]
    vfs_write+0x468/0x760 fs/read_write.c:578
    ksys_write+0xe8/0x1a0 fs/read_write.c:631
    __do_sys_write fs/read_write.c:643 [inline]
    __se_sys_write fs/read_write.c:640 [inline]
    __x64_sys_write+0x3e/0x50 fs/read_write.c:640
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    read to 0xffffea0004a1d2c8 of 8 bytes by task 18611 on cpu 1:
    page_is_pfmemalloc include/linux/mm.h:1740 [inline]
    __skb_fill_page_desc include/linux/skbuff.h:2422 [inline]
    skb_fill_page_desc include/linux/skbuff.h:2443 [inline]
    tcp_build_frag+0x613/0xb20 net/ipv4/tcp.c:1018
    do_tcp_sendpages+0x3e8/0xaf0 net/ipv4/tcp.c:1075
    tcp_sendpage_locked net/ipv4/tcp.c:1140 [inline]
    tcp_sendpage+0x89/0xb0 net/ipv4/tcp.c:1150
    inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
    kernel_sendpage+0x184/0x300 net/socket.c:3561
    sock_sendpage+0x5a/0x70 net/socket.c:1054
    pipe_to_sendpage+0x128/0x160 fs/splice.c:361
    splice_from_pipe_feed fs/splice.c:415 [inline]
    __splice_from_pipe+0x222/0x4d0 fs/splice.c:559
    splice_from_pipe fs/splice.c:594 [inline]
    generic_splice_sendpage+0x89/0xc0 fs/splice.c:743
    do_splice_from fs/splice.c:764 [inline]
    direct_splice_actor+0x80/0xa0 fs/splice.c:931
    splice_direct_to_actor+0x305/0x620 fs/splice.c:886
    do_splice_direct+0xfb/0x180 fs/splice.c:974
    do_sendfile+0x3bf/0x910 fs/read_write.c:1249
    __do_sys_sendfile64 fs/read_write.c:1317 [inline]
    __se_sys_sendfile64 fs/read_write.c:1303 [inline]
    __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1303
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    value changed: 0x0000000000000000 -> 0xffffea0004a1d288

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 18611 Comm: syz-executor.4 Not tainted 6.0.0-rc2-syzkaller-00248-ge022620b5d05-dirty #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022

    Fixes: c07aea3ef4d4 ("mm: add a signature in struct page")
    Reported-by: syzbot
    Signed-off-by: Eric Dumazet
    Cc: Shakeel Butt
    Reviewed-by: Shakeel Butt
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Eric Dumazet
     
  • [ Upstream commit e2b224abd9bf45dcb55750479fc35970725a430b ]

    There is a shift wrapping bug in this code so anything above
    31 will return false.
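
    A minimal sketch of a bit test that works for positions above 31,
    assuming the map is a 64-bit value (the function name is hypothetical):
    the constant is widened to 64 bits before the shift, because a plain
    int constant cannot represent bits 32..63.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Test bit i (0..63) of a 64-bit map using a 64-bit shift operand. */
    bool map_bit_is_set(uint64_t map, unsigned int i)
    {
    	return ((map & (1ULL << i)) >> i) != 0;
    }
    ```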

    Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework")
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Dan Carpenter
     
  • [ Upstream commit 9efd23297cca530bb35e1848665805d3fcdd7889 ]

    The sch_sfb enqueue() routine assumes the skb is still alive after it has
    been enqueued into a child qdisc, using the data in the skb cb field in the
    increment_qlen() routine after enqueue. However, the skb may in fact have
    been freed, causing a use-after-free in this case. In particular, this
    happens if sch_cake is used as a child of sfb, and the GSO splitting mode
    of CAKE is enabled (in which case the skb will be split into segments and
    the original skb freed).

    Fix this by copying the sfb cb data to the stack before enqueueing the skb,
    and using this stack copy in increment_qlen() instead of the skb pointer
    itself.
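
    The pattern can be sketched in userspace C as follows. This is a minimal
    model under stated assumptions, not the sch_sfb code: struct pkt,
    struct sfb_cb_model, child_enqueue_and_free() and the qlen table are all
    hypothetical stand-ins. The point is only the ordering: the per-packet
    data is copied to the stack before the packet is handed to a child that
    may free it.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    struct sfb_cb_model { unsigned int hashes[2]; };
    struct pkt { struct sfb_cb_model cb; };

    static unsigned int qlen[2][16];

    /* Child that consumes the packet, as a GSO-splitting child might. */
    static int child_enqueue_and_free(struct pkt *p)
    {
        free(p);
        return 0;    /* success, but p must not be touched again */
    }

    static int enqueue_model(struct pkt *p)
    {
        struct sfb_cb_model cb;
        int ret;

        assert(p != NULL);
        memcpy(&cb, &p->cb, sizeof(cb));    /* copy before giving p away */
        ret = child_enqueue_and_free(p);
        if (ret == 0) {
            /* Only the stack copy is used from here on. */
            qlen[0][cb.hashes[0] % 16]++;
            qlen[1][cb.hashes[1] % 16]++;
        }
        return ret;
    }
    ```

    Reading p->cb after child_enqueue_and_free() instead of the stack copy
    would be the use-after-free described above.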

    Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231
    Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Toke Høiland-Jørgensen
     
  • [ Upstream commit 0d40f728e28393a8817d1fcae923dfa3409e488c ]

    rxkad_verify_packet_2() has a small stack-allocated sglist of 4 elements,
    but if that isn't sufficient for the number of fragments in the socket
    buffer, we try to allocate an sglist large enough to hold all the
    fragments.

    However, for large packets with a lot of fragments, this isn't sufficient
    and we need at least one additional fragment.

    The problem manifests as skb_to_sgvec() returning -EMSGSIZE and this then
    getting returned to userspace. Most of the time, this isn't a problem as
    rxrpc sets a limit of 5692, big enough for 4 jumbo subpackets to be glued
    together; occasionally, however, the server will ignore the reported limit
    and give a packet that's a lot bigger - say 19852 bytes with ->nr_frags
    being 7. skb_to_sgvec() then tries to return a "zeroth" fragment that
    seems to occur before the fragments counted by ->nr_frags and we hit the
    end of the sglist too early.

    Note that __skb_to_sgvec() also has an skb_walk_frags() loop that is
    recursive up to 24 deep. I'm not sure if I need to take account of that
    too - or if there's an easy way of counting those frags too.

    Fix this by counting an extra frag and allocating a larger sglist based on
    that.

    Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
    Reported-by: Marc Dionne
    Signed-off-by: David Howells
    cc: linux-afs@lists.infradead.org
    Signed-off-by: Sasha Levin

    David Howells
     
  • [ Upstream commit ac56a0b48da86fd1b4389632fb7c4c8a5d86eefa ]

    Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing
    it to siphon off UDP packets early in the handling of received UDP packets
    thereby avoiding the packet going through the UDP receive queue, it doesn't
    get ICMP packets through the UDP ->sk_error_report() callback. In fact, it
    doesn't appear that there's any usable option for getting hold of ICMP
    packets.

    Fix this by adding a new UDP encap hook to distribute error messages for
    UDP tunnels. If the hook is set, then the tunnel driver will be able to
    see ICMP packets. The hook provides the offset into the packet of the UDP
    header of the original packet that caused the notification.

    An alternative would be to call the ->error_handler() hook - but that
    requires that the skbuff be cloned (as ip_icmp_error() or ipv6_icmp_error()
    do), though that isn't really necessary or desirable in rxrpc's case, as we
    want to parse them there and then, not queue them.

    Changes
    =======
    ver #3)
    - Fixed an uninitialised variable.

    ver #2)
    - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals.

    Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin

    David Howells
     
  • [ Upstream commit 0efe125cfb99e6773a7434f3463f7c2fa28f3a43 ]

    Ensure the match happens in the right direction, previously the
    destination used was the server, not the NAT host, as the comment
    shows the code intended.

    Additionally nf_nat_irc uses port 0 as a signal and there's no valid way
    it can appear in a DCC message, so consider port 0 also forged.

    Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port")
    Signed-off-by: David Leadbeater
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    David Leadbeater
     
  • [ Upstream commit 77972a36ecc4db7fc7c68f0e80714263c5f03f65 ]

    splice back the hook list so nft_chain_release_hook() has a chance to
    release the hooks.

    BUG: memory leak
    unreferenced object 0xffff88810180b100 (size 96):
    comm "syz-executor133", pid 3619, jiffies 4294945714 (age 12.690s)
    hex dump (first 32 bytes):
    28 64 23 02 81 88 ff ff 28 64 23 02 81 88 ff ff (d#.....(d#.....
    90 a8 aa 83 ff ff ff ff 00 00 b5 0f 81 88 ff ff ................
    backtrace:
    kmalloc include/linux/slab.h:600 [inline]
    nft_netdev_hook_alloc+0x3b/0xc0 net/netfilter/nf_tables_api.c:1901
    nft_chain_parse_netdev net/netfilter/nf_tables_api.c:1998 [inline]
    nft_chain_parse_hook+0x33a/0x530 net/netfilter/nf_tables_api.c:2073
    nf_tables_addchain.constprop.0+0x10b/0x950 net/netfilter/nf_tables_api.c:2218
    nf_tables_newchain+0xa8b/0xc60 net/netfilter/nf_tables_api.c:2593
    nfnetlink_rcv_batch+0xa46/0xd20 net/netfilter/nfnetlink.c:517
    nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:638 [inline]
    nfnetlink_rcv+0x1f9/0x220 net/netfilter/nfnetlink.c:656
    netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
    netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
    netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
    sock_sendmsg_nosec net/socket.c:714 [inline]
    sock_sendmsg+0x56/0x80 net/socket.c:734
    ____sys_sendmsg+0x36c/0x390 net/socket.c:2482
    ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
    __sys_sendmsg+0x88/0x100 net/socket.c:2565
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

    Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook")
    Reported-by: syzbot+5fcdbfab6d6744c57418@syzkaller.appspotmail.com
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Pablo Neira Ayuso
     
  • [ Upstream commit d047283a7034140ea5da759a494fd2274affdd46 ]

    The IPv6 path already drops dst in the daddr changed case, but the IPv4
    path does not. This change makes the two code paths consistent.

    Further, it is possible that there is already a metadata_dst allocated from
    ingress that might already be attached to skbuff->dst while following
    the bridge path. If it is not released before setting a new
    metadata_dst, it will be leaked. This is similar to what is done in
    bpf_set_tunnel_key() or ip6_route_input().

    It is important to note that the memory being leaked is not the dst
    being set in the bridge code, but rather memory allocated from some
    other code path that is not being freed correctly before the skb dst is
    overwritten.
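
    The ordering issue can be modelled in a few lines of userspace C. This is
    a hedged sketch, not the bridge code: struct dst_model, struct skb_model,
    the live_dsts counter and all *_model helpers are hypothetical stand-ins
    for the kernel's refcounted dst entries and for helpers such as
    skb_dst_drop() and skb_dst_set().

    ```c
    #include <assert.h>
    #include <stdlib.h>

    static int live_dsts;    /* number of dst objects not yet freed */

    struct dst_model { int refcnt; };
    struct skb_model { struct dst_model *dst; };

    static struct dst_model *dst_alloc_model(void)
    {
        struct dst_model *d = malloc(sizeof(*d));

        assert(d != NULL);
        d->refcnt = 1;
        live_dsts++;
        return d;
    }

    /* Release whatever dst is attached, freeing it on the last reference. */
    static void skb_dst_drop_model(struct skb_model *skb)
    {
        if (skb->dst && --skb->dst->refcnt == 0) {
            free(skb->dst);
            live_dsts--;
        }
        skb->dst = NULL;
    }

    /* Fixed ordering: drop the old dst first, then attach the new one. */
    static void skb_dst_replace_model(struct skb_model *skb, struct dst_model *d)
    {
        skb_dst_drop_model(skb);
        skb->dst = d;
    }
    ```

    Assigning skb->dst directly without the drop would leave the earlier
    object counted in live_dsts forever, which is the kind of leak kmemleak
    reports.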

    An example of the leakage fixed by this commit found using kmemleak:

    unreferenced object 0xffff888010112b00 (size 256):
    comm "softirq", pid 0, jiffies 4294762496 (age 32.012s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 80 16 f1 83 ff ff ff ff ................
    e1 4e f6 82 ff ff ff ff 00 00 00 00 00 00 00 00 .N..............
    backtrace:
    metadata_dst_alloc+0x1b/0xe0
    udp_tun_rx_dst+0x174/0x1f0
    geneve_udp_encap_recv+0x350/0x7b0
    udp_queue_rcv_one_skb+0x380/0x560
    udp_unicast_rcv_skb+0x75/0x90
    ip_protocol_deliver_rcu+0xd8/0x230
    ip_local_deliver_finish+0x7a/0xa0
    __netif_receive_skb_one_core+0x89/0xa0
    process_backlog+0x93/0x190
    __napi_poll+0x28/0x170
    net_rx_action+0x14f/0x2a0
    __do_softirq+0xf4/0x305
    __irq_exit_rcu+0xc3/0x140
    sysvec_apic_timer_interrupt+0x9e/0xc0
    asm_sysvec_apic_timer_interrupt+0x16/0x20
    native_safe_halt+0x13/0x20

    Florian Westphal says: "Original code was likely fine because nothing
    ever did set a skb->dst entry earlier than bridge in those days."

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Harsh Modi
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Harsh Modi
     
  • [ Upstream commit c624c58e08b15105662b9ab9be23d14a6b945a49 ]

    skb_copy_bits() could fail, which requires a check on the return
    value.

    Signed-off-by: Li Zhong
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    lily
     
  • [ Upstream commit cf97769c761abfeac8931b35fe0e1a8d5fabc9d8 ]

    When a TCP endpoint sends more bytes than allowed by the receive window, all
    future packets can be marked as invalid.
    This can clog up the conntrack table because of the 5-day default timeout.

    Sequence of packets:
    01 initiator > responder: [S], seq 171, win 5840, options [mss 1330,sackOK,TS val 63 ecr 0,nop,wscale 1]
    02 responder > initiator: [S.], seq 33211, ack 172, win 65535, options [mss 1460,sackOK,TS val 010 ecr 63,nop,wscale 8]
    03 initiator > responder: [.], ack 33212, win 2920, options [nop,nop,TS val 068 ecr 010], length 0
    04 initiator > responder: [P.], seq 172:240, ack 33212, win 2920, options [nop,nop,TS val 279 ecr 010], length 68

    Window is 5840 starting from 33212 -> 39052.

    05 responder > initiator: [.], ack 240, win 256, options [nop,nop,TS val 872 ecr 279], length 0
    06 responder > initiator: [.], seq 33212:34530, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318

    This is fine, conntrack will flag the connection as having outstanding
    data (UNACKED), which lowers the conntrack timeout to 300s.

    07 responder > initiator: [.], seq 34530:35848, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
    08 responder > initiator: [.], seq 35848:37166, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
    09 responder > initiator: [.], seq 37166:38484, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
    10 responder > initiator: [.], seq 38484:39802, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318

    Packet 10 is already sending more than permitted, but conntrack doesn't
    validate this (only seq is tested vs. maxend, not 'seq+len').

    38484 is acceptable, but only up to 39052, so this packet should
    not have been sent (or only 568 bytes, not 1318).
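
    The arithmetic for packet 10 can be checked with a short userspace sketch.
    It only illustrates the numbers in the trace above and is not the change
    made by this patch; all function names are hypothetical, and the
    wrap-safe comparison is a common idiom for 32-bit sequence numbers.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* True if a is after b in 32-bit sequence space (wrap-safe). */
    static bool seq_after(uint32_t a, uint32_t b)
    {
        return (int32_t)(a - b) > 0;
    }

    /* Check that looks only at the start of the segment. */
    static bool start_within(uint32_t seq, uint32_t maxend)
    {
        return !seq_after(seq, maxend);
    }

    /* Stricter check: the end of the segment must fit as well. */
    static bool segment_within(uint32_t seq, uint32_t len, uint32_t maxend)
    {
        return !seq_after(seq + len, maxend);
    }

    /* Number of bytes of the segment that fit below maxend. */
    static uint32_t bytes_permitted(uint32_t seq, uint32_t len, uint32_t maxend)
    {
        uint32_t room;

        if (seq_after(seq, maxend))
            return 0;
        room = maxend - seq;
        return len < room ? len : room;
    }

    /* Values taken from the packet trace in this commit message. */
    static void check_trace_packets(void)
    {
        const uint32_t maxend = 33212 + 5840;    /* 39052 */

        assert(start_within(38484, maxend));            /* packet 10: seq ok */
        assert(!segment_within(38484, 1318, maxend));   /* but seq+len is not */
        assert(bytes_permitted(38484, 1318, maxend) == 568);
        assert(!start_within(39802, maxend));           /* packet 11: flagged */
    }
    ```

    A check on the start alone accepts packet 10, while a check on seq+len
    would already have rejected it, matching the description above.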

    At this point, connection is still in '300s' mode.

    Next packet however will get flagged:
    11 responder > initiator: [P.], seq 39802:40128, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 326

    nf_ct_proto_6: SEQ is over the upper bound (over the window of the receiver) .. LEN=378 .. SEQ=39802 ACK=240 ACK PSH ..

    Now, a couple of replies/acks come in:

    12 initiator > responder: [.], ack 34530, win 4368,
    [.. irrelevant acks removed ]
    16 initiator > responder: [.], ack 39802, win 8712, options [nop,nop,TS val 296201291 ecr 2982371892], length 0

    This ack is significant -- it acks the last packet sent by the
    responder that conntrack considered valid.

    This means that ack == td_end. This will withdraw the
    'unacked data' flag, and the connection moves back to the 5-day timeout
    of established conntracks.

    17 initiator > responder: ack 40128, win 10030, ...

    This packet is also flagged as invalid.

    Because conntrack only updates state based on packets that are
    considered valid, packet 11 'did not exist' and that gets us:

    nf_ct_proto_6: ACK is over upper bound 39803 (ACKed data not seen yet) .. SEQ=240 ACK=40128 WINDOW=10030 RES=0x00 ACK URG

    Because this is received and processed by the endpoints, the conntrack entry
    remains in a bad state, and no packets will ever be considered valid again:

    30 responder > initiator: [F.], seq 40432, ack 2045, win 391, ..
    31 initiator > responder: [.], ack 40433, win 11348, ..
    32 initiator > responder: [F.], seq 2045, ack 40433, win 11348 ..

    ... all trigger 'ACK is over bound' test and we end up with
    non-early-evictable 5-day default timeout.

    NB: This patch triggers a bunch of checkpatch warnings because of silly
    indent. I will resend the cleanup series linked below to reduce the
    indent level once this change has propagated to net-next.

    I could route the cleanup via nf but that causes extra backport work for
    stable maintainers.

    Link: https://lore.kernel.org/netfilter-devel/20220720175228.17880-1-fw@strlen.de/T/#mb1d7147d36294573cc4f81d00f9f8dadfdd06cd8
    Signed-off-by: Florian Westphal
    Signed-off-by: Sasha Levin

    Florian Westphal
     

08 Sep, 2022

10 commits

  • commit f0da47118c7e93cdbbc6fb403dd729a5f2c90ee3 upstream.

    Upon reception, a packet must be categorized: either its destination is
    the host, or it is another host. A packet with no destination addressing
    fields may be valid in two situations:
    - the packet has no source field: only ACKs are built like that, we
    consider the host as the destination.
    - the packet has a valid source field: it is directed to the PAN
    coordinator; as for now we don't have this information, we consider we
    are not the PAN coordinator.

    There was likely a copy/paste error made during a previous cleanup
    because the if clause is now containing exactly the same condition as in
    the switch case, which can never be true. In the past the destination
    address was used in the switch and the source address was used in the
    if, which matches what the spec says.

    Cc: stable@vger.kernel.org
    Fixes: ae531b9475f6 ("ieee802154: use ieee802154_addr instead of *_sa variants")
    Signed-off-by: Miquel Raynal
    Link: https://lore.kernel.org/r/20220826142954.254853-1-miquel.raynal@bootlin.com
    Signed-off-by: Stefan Schmidt
    Signed-off-by: Greg Kroah-Hartman

    Miquel Raynal
     
  • commit 278d3ba61563ceed3cb248383ced19e14ec7bc1f upstream.

    On 32bit-UP u64_stats_fetch_begin() disables only preemption. If the
    reader is in preemptible context and the writer side
    (u64_stats_update_begin*()) runs in an interrupt context (IRQ or
    softirq) then the writer can update the stats during the read operation.
    This update remains undetected.

    Use u64_stats_fetch_begin_irq() to ensure the stats fetch on 32bit-UP
    is not interrupted by a writer. 32bit-SMP remains unaffected by this
    change.

    Cc: "David S. Miller"
    Cc: Catherine Sullivan
    Cc: David Awogbemila
    Cc: Dimitris Michailidis
    Cc: Eric Dumazet
    Cc: Hans Ulli Kroll
    Cc: Jakub Kicinski
    Cc: Jeroen de Borst
    Cc: Johannes Berg
    Cc: Linus Walleij
    Cc: Paolo Abeni
    Cc: Simon Horman
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-wireless@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: oss-drivers@corigine.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Sebastian Andrzej Siewior
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sebastian Andrzej Siewior
     
  • commit eb55dc09b5dd040232d5de32812cc83001a23da6 upstream.

    __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
    My understanding is that fib_validate_source() is used to know if the src
    address and the gateway address are on the same link. For that,
    fib_validate_source() returns 1 (same link) or 0 (not the same network).
    __mkroute_input() is the only user of these positive values, all other
    callers only look if the returned value is negative.

    Since the below patch, fib_validate_source() no longer returned 1 when
    both addresses are on the same network, because the route lookup returns
    RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
    Let's adapt the test to return 1 again when both addresses are on the same
    link.

    CC: stable@vger.kernel.org
    Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
    Reported-by: kernel test robot
    Reported-by: Heng Qi
    Signed-off-by: Nicolas Dichtel
    Reviewed-by: David Ahern
    Link: https://lore.kernel.org/r/20220829100121.3821-1-nicolas.dichtel@6wind.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Dichtel
     
  • commit 60deb9f10eec5c6a20252ed36238b55d8b614a2c upstream.

    ieee80211_scan_rx() tries to access scan_req->flags after a
    null check, but a UAF is observed when the scan is completed
    and __ieee80211_scan_completed() executes, which then calls
    cfg80211_scan_done() leading to the freeing of scan_req.

    Since scan_req is rcu_dereference()'d, prevent the racing in
    __ieee80211_scan_completed() by ensuring that from mac80211's
    POV it is no longer accessed from an RCU read critical section
    before we call cfg80211_scan_done().

    Cc: stable@vger.kernel.org
    Link: https://syzkaller.appspot.com/bug?extid=f9acff9bf08a845f225d
    Reported-by: syzbot+f9acff9bf08a845f225d@syzkaller.appspotmail.com
    Suggested-by: Johannes Berg
    Signed-off-by: Siddh Raman Pant
    Link: https://lore.kernel.org/r/20220819200340.34826-1-code@siddh.me
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Siddh Raman Pant
     
  • commit 15bc8966b6d3a5b9bfe4c9facfa02f2b69b1e5f0 upstream.

    When we are not connected to a channel, sending a channel "switch"
    announcement doesn't make any sense.

    The BSS list is empty in that case. This causes the for loop in
    cfg80211_get_bss() to be bypassed, so the function returns NULL
    (check line 1424 of net/wireless/scan.c), causing the WARN_ON()
    in ieee80211_ibss_csa_beacon() to get triggered (check line 500
    of net/mac80211/ibss.c), which was consequently reported on the
    syzkaller dashboard.

    Thus, check if we have an existing connection before generating
    the CSA beacon in ieee80211_ibss_finish_csa().

    Cc: stable@vger.kernel.org
    Fixes: cd7760e62c2a ("mac80211: add support for CSA in IBSS mode")
    Link: https://syzkaller.appspot.com/bug?id=05603ef4ae8926761b678d2939a3b2ad28ab9ca6
    Reported-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com
    Signed-off-by: Siddh Raman Pant
    Tested-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com
    Link: https://lore.kernel.org/r/20220814151512.9985-1-code@siddh.me
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Siddh Raman Pant
     
  • [ Upstream commit a8424a9b4522a3ab9f32175ad6d848739079071f ]

    For passive connections, the refcount increment has been done in
    smc_clcsock_accept()-->smc_sock_alloc().

    Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc")
    Signed-off-by: Yacan Liu
    Reviewed-by: Tony Lu
    Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.com
    Signed-off-by: Paolo Abeni
    Signed-off-by: Sasha Levin

    Yacan Liu
     
  • [ Upstream commit 0b4f688d53fdc2a731b9d9cdf0c96255bc024ea6 ]

    This reverts commit 90fabae8a2c225c4e4936723c38857887edde5cc.

    Patch was applied hastily, revert and let the v2 be reviewed.

    Fixes: 90fabae8a2c2 ("sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb")
    Link: https://lore.kernel.org/all/87wnao2ha3.fsf@toke.dk/
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Jakub Kicinski
     
  • [ Upstream commit 8c70521238b7863c2af607e20bcba20f974c969b ]

    challenge_timestamp can be read and written by concurrent threads.

    This was expected, but we need to annotate the race to avoid potential issues.

    Following patch moves challenge_timestamp and challenge_count
    to per-netns storage to provide better isolation.
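
    A userspace analogue of annotating an expected race is sketched below,
    using C11 relaxed atomics in place of the kernel's READ_ONCE() and
    WRITE_ONCE(). This is an assumption-laden model, not the tcp code: the
    function name, the per-second budget logic and the limit parameter are
    simplified and hypothetical. The accesses stay racy by design; the
    annotations only make each load and store a single, non-torn access that
    the compiler will not duplicate or merge.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    static _Atomic uint32_t challenge_timestamp;
    static _Atomic uint32_t challenge_count;

    /* Returns true if a challenge may be sent during second "now". */
    static bool challenge_allowed(uint32_t now, uint32_t limit)
    {
        uint32_t count;

        assert(limit > 0);
        if (now != atomic_load_explicit(&challenge_timestamp,
                                        memory_order_relaxed)) {
            /* New second: refill the budget. */
            atomic_store_explicit(&challenge_timestamp, now,
                                  memory_order_relaxed);
            atomic_store_explicit(&challenge_count, limit,
                                  memory_order_relaxed);
        }
        count = atomic_load_explicit(&challenge_count, memory_order_relaxed);
        if (count == 0)
            return false;
        atomic_store_explicit(&challenge_count, count - 1,
                              memory_order_relaxed);
        return true;
    }
    ```

    Two threads may still both consume the same budget slot; as the commit
    message says, the race is expected and only needs to be annotated.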

    Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
    Reported-by: syzbot
    Signed-off-by: Eric Dumazet
    Acked-by: Neal Cardwell
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Eric Dumazet
     
  • [ Upstream commit 90fabae8a2c225c4e4936723c38857887edde5cc ]

    When the GSO splitting feature of sch_cake is enabled, GSO superpackets
    will be broken up and the resulting segments enqueued in place of the
    original skb. In this case, CAKE calls consume_skb() on the original skb,
    but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into
    assuming the original skb still exists, when it really has been freed. Fix
    this by adding the __NET_XMIT_STOLEN flag to the return value in this case.

    Fixes: 0c850344d388 ("sch_cake: Conditionally split GSO segments")
    Signed-off-by: Toke Høiland-Jørgensen
    Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231
    Link: https://lore.kernel.org/r/20220831092103.442868-1-toke@toke.dk
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Toke Høiland-Jørgensen
     
  • [ Upstream commit 8fc29ff3910f3af08a7c40a75d436b5720efe2bf ]

    strp_init() is called just a few lines above this csk->sk_user_data
    check, it also initializes strp->work etc., therefore, it is
    unnecessary to call strp_done() to cancel the freshly initialized
    work.

    And if sk_user_data is already used by KCM, psock->strp should not be
    touched, particularly strp->work state, so we need to move strp_init()
    after the csk->sk_user_data check.

    This also makes a lockdep warning reported by syzbot go away.

    Reported-and-tested-by: syzbot+9fc084a4348493ef65d2@syzkaller.appspotmail.com
    Reported-by: syzbot+e696806ef96cdd2d87cd@syzkaller.appspotmail.com
    Fixes: e5571240236c ("kcm: Check if sk_user_data already set in kcm_attach")
    Fixes: dff8baa26117 ("kcm: Call strp_stop before strp_done in kcm_attach")
    Cc: Tom Herbert
    Signed-off-by: Cong Wang
    Link: https://lore.kernel.org/r/20220827181314.193710-1-xiyou.wangcong@gmail.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Cong Wang