20 Jan, 2021

2 commits


13 Jan, 2021

2 commits

  • [ Upstream commit bd1248f1ddbc48b0c30565fce897a3b6423313b8 ]

    Check Scell_log shift size in red_check_params() and modify all callers
    of red_check_params() to pass Scell_log.

    This prevents a shift out-of-bounds as detected by UBSAN:
    UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22
    shift exponent 72 is too large for 32-bit type 'int'

    Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values")
    Signed-off-by: Randy Dunlap
    Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com
    Cc: Nogah Frankel
    Cc: Jamal Hadi Salim
    Cc: Cong Wang
    Cc: Jiri Pirko
    Cc: netdev@vger.kernel.org
    Cc: "David S. Miller"
    Cc: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Randy Dunlap
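
The shift-size check described in the entry above can be sketched in plain C. This is a minimal illustration, not the kernel's red_check_params(): the helper names are hypothetical, and the bound of 32 is an assumption taken from the "32-bit type" wording of the UBSAN report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: accept a scaling exponent only if shifting a
 * 32-bit value by it is well defined. */
static bool scell_log_is_valid(uint8_t scell_log)
{
    return scell_log < 32;
}

/* Shift only after validation, so an exponent such as 72 is rejected
 * instead of triggering undefined behaviour. Returns false on rejection. */
static bool scaled_shift(uint32_t value, uint8_t scell_log, uint32_t *out)
{
    assert(out);
    if (!scell_log_is_valid(scell_log))
        return false;
    *out = value >> scell_log;
    return true;
}
```

Validating the exponent where parameters are accepted, rather than at each shift, keeps the fast path free of extra checks.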
     
  • [ Upstream commit 698285da79f5b0b099db15a37ac661ac408c80eb ]

    taprio_graft() can insert a NULL element in the array of child qdiscs. As
    a consequence, taprio_reset() might not reset child qdiscs completely, and
    taprio_destroy() might leak resources. Fix it by ensuring that loops that
    iterate over q->qdiscs[] don't end when they find the first NULL item.

    Fixes: 44d4775ca518 ("net/sched: sch_taprio: reset child qdiscs before freeing them")
    Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
    Suggested-by: Jakub Kicinski
    Signed-off-by: Davide Caratti
    Link: https://lore.kernel.org/r/13edef6778fef03adc751582562fba4a13e06d6a.1608240532.git.dcaratti@redhat.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
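
The loop change described in the entry above can be illustrated with a small user-space sketch. The types and names here are hypothetical stand-ins for the child qdisc array; the point is only that a NULL slot is skipped rather than treated as the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a child qdisc. */
struct child {
    int backlog;
};

/* Reset every populated slot and return how many were reset. */
static size_t reset_children(struct child **slots, size_t n)
{
    size_t reset = 0;

    assert(slots);
    for (size_t i = 0; i < n; i++) {
        if (!slots[i])
            continue;   /* a hole in the array: keep iterating */
        slots[i]->backlog = 0;
        reset++;
    }
    return reset;
}
```

A loop written as `for (i = 0; i < n && slots[i]; i++)` would stop at the first hole and leave later children untouched.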
     

06 Jan, 2021

1 commit

  • [ Upstream commit 44d4775ca51805b376a8db5b34f650434a08e556 ]

    syzkaller shows that packets can still be dequeued while taprio_destroy()
    is running. Let sch_taprio use the reset() function to cancel the advance
    timer and drop all skbs from the child qdiscs.

    Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
    Link: https://syzkaller.appspot.com/bug?id=f362872379bf8f0017fb667c1ab158f2d1e764ae
    Reported-by: syzbot+8971da381fb5a31f542d@syzkaller.appspotmail.com
    Signed-off-by: Davide Caratti
    Acked-by: Vinicius Costa Gomes
    Link: https://lore.kernel.org/r/63b6d79b0e830ebb0283e020db4df3cdfdfb2b94.1608142843.git.dcaratti@redhat.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     

14 Dec, 2020

1 commit

  • Update txq0's trans_start in order to prevent the netdev watchdog from
    triggering too quickly. Since we set the LLTX flag, the stack won't update
    the jiffies for other tx queues. Prevent the watchdog from checking the
    other tx queues by adding the NETIF_HW_ACCEL_MQ flag.

    Signed-off-by: Camelia Groza

    Split core changes.

    Signed-off-by: Madalin Bucur

    Camelia Groza
     

10 Dec, 2020

1 commit

  • TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL is a u32 attribute (MPLS label is
    20 bits long).

    Fixes the following bug:

    $ tc filter add dev ethX ingress protocol mpls_uc \
        flower mpls lse depth 2 label 256 \
        action drop

    $ tc filter show dev ethX ingress
    filter protocol mpls_uc pref 49152 flower chain 0
    filter protocol mpls_uc pref 49152 flower chain 0 handle 0x1
      eth_type 8847
      mpls
        lse depth 2 label 0

    Signed-off-by: David S. Miller

    Guillaume Nault
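
One way a label of 256 can come back as 0 is for the value to pass through an 8-bit field. That failure mode is an assumption made for illustration; the entry above only states that the attribute is a u32. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MPLS_LABEL_MAX 0xFFFFFu   /* largest 20-bit label */

/* Carrying the label in 8 bits keeps only the low byte. */
static uint8_t label_as_u8(uint32_t label)
{
    assert(label <= TOY_MPLS_LABEL_MAX);
    return (uint8_t)label;
}

/* Carrying the label in 32 bits preserves all 20 bits. */
static uint32_t label_as_u32(uint32_t label)
{
    assert(label <= TOY_MPLS_LABEL_MAX);
    return label;
}
```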
     

05 Dec, 2020

1 commit

  • with the following tdc testcase:

    83be: (qdisc, fq_pie) Create FQ-PIE with invalid number of flows

    as fq_pie_init() fails, fq_pie_destroy() is called to clean up. Since the
    timer is not yet initialized, it's possible to observe a splat like this:

    INFO: trying to register non-static key.
    the code is fine but needs lockdep annotation.
    turning off the locking correctness validator.
    CPU: 0 PID: 975 Comm: tc Not tainted 5.10.0-rc4+ #298
    Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
    Call Trace:
    dump_stack+0x99/0xcb
    register_lock_class+0x12dd/0x1750
    __lock_acquire+0xfe/0x3970
    lock_acquire+0x1c8/0x7f0
    del_timer_sync+0x49/0xd0
    fq_pie_destroy+0x3f/0x80 [sch_fq_pie]
    qdisc_create+0x916/0x1160
    tc_modify_qdisc+0x3c4/0x1630
    rtnetlink_rcv_msg+0x346/0x8e0
    netlink_unicast+0x439/0x630
    netlink_sendmsg+0x719/0xbf0
    sock_sendmsg+0xe2/0x110
    ____sys_sendmsg+0x5ba/0x890
    ___sys_sendmsg+0xe9/0x160
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [...]
    ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0
    WARNING: CPU: 0 PID: 975 at lib/debugobjects.c:508 debug_print_object+0x162/0x210
    [...]
    Call Trace:
    debug_object_assert_init+0x268/0x380
    try_to_del_timer_sync+0x6a/0x100
    del_timer_sync+0x9e/0xd0
    fq_pie_destroy+0x3f/0x80 [sch_fq_pie]
    qdisc_create+0x916/0x1160
    tc_modify_qdisc+0x3c4/0x1630
    rtnetlink_rcv_msg+0x346/0x8e0
    netlink_rcv_skb+0x120/0x380
    netlink_unicast+0x439/0x630
    netlink_sendmsg+0x719/0xbf0
    sock_sendmsg+0xe2/0x110
    ____sys_sendmsg+0x5ba/0x890
    ___sys_sendmsg+0xe9/0x160
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    fix it by moving timer_setup() before any failure, like it was done on 'red'
    with former commit 608b4adab178 ("net_sched: initialize timer earlier in
    red_init()").

    Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
    Signed-off-by: Davide Caratti
    Reviewed-by: Cong Wang
    Link: https://lore.kernel.org/r/2e78e01c504c633ebdff18d041833cf2e079a3a4.1607020450.git.dcaratti@redhat.com
    Signed-off-by: Jakub Kicinski

    Davide Caratti
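
The ordering fix described in the entry above can be sketched as follows. Everything here is a toy model with hypothetical names: the "timer" is a pair of flags, and the flow limit of 65536 is an assumed validation rule, not taken from the source.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_timer {
    bool initialized;
    bool armed;
};

struct toy_sched {
    struct toy_timer timer;
    unsigned int flows;
};

static void toy_timer_setup(struct toy_timer *t)
{
    t->initialized = true;
    t->armed = false;
}

/* Returns false when asked to cancel a timer that was never set up. */
static bool toy_timer_cancel(struct toy_timer *t)
{
    if (!t->initialized)
        return false;
    t->armed = false;
    return true;
}

static int toy_init(struct toy_sched *q, unsigned int flows)
{
    assert(q);
    toy_timer_setup(&q->timer);   /* first, before any failure path */
    if (flows == 0 || flows > 65536)
        return -1;
    q->flows = flows;
    q->timer.armed = true;
    return 0;
}

/* Called by the core even when toy_init() failed. */
static bool toy_destroy(struct toy_sched *q)
{
    assert(q);
    return toy_timer_cancel(&q->timer);
}
```

Because the destroy hook runs after a failed init, anything it touches has to be in a valid state before the first early return.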
     

04 Dec, 2020

1 commit

  • when 'act_mpls' is used to mangle the LSE, the current value is read from
    the packet dereferencing 4 bytes at mpls_hdr(): ensure that the label is
    contained in the skb "linear" area.

    Found by code inspection.

    v2:
    - use MPLS_HLEN instead of sizeof(new_lse), thanks to Jakub Kicinski

    Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
    Signed-off-by: Davide Caratti
    Acked-by: Guillaume Nault
    Link: https://lore.kernel.org/r/3243506cba43d14858f3bd21ee0994160e44d64a.1606987058.git.dcaratti@redhat.com
    Signed-off-by: Jakub Kicinski

    Davide Caratti
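
The bounds requirement in the entry above can be shown with a user-space sketch. The buffer type and function are hypothetical; the kernel works on skbs and may pull data into the linear area, which this toy version does not attempt. It only refuses to read 4 bytes that are not all present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MPLS_HLEN 4u   /* size of one label stack entry */

/* Hypothetical stand-in for the directly addressable part of a packet. */
struct toy_buf {
    const uint8_t *data;
    size_t linear_len;
};

/* Read one big-endian LSE at offset, only if all 4 bytes are available. */
static bool toy_read_lse(const struct toy_buf *b, size_t offset, uint32_t *lse)
{
    uint8_t raw[TOY_MPLS_HLEN];

    assert(b && lse);
    if (offset > b->linear_len || b->linear_len - offset < TOY_MPLS_HLEN)
        return false;
    memcpy(raw, b->data + offset, TOY_MPLS_HLEN);
    *lse = ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
           ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];
    return true;
}
```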
     

30 Oct, 2020

1 commit

  • Currently it is possible to craft a special netlink RTM_NEWQDISC
    command that can result in jitter being equal to 0x80000000. It is
    enough to set the 32 bit jitter to 0x02000000 (it will later be
    multiplied by 2^6) or just set the 64 bit jitter via
    TCA_NETEM_JITTER64. This causes an overflow during the generation of
    uniformly distributed numbers in tabledist(), which in turn leads to
    division by zero (sigma != 0, but sigma * 2 is 0).

    The related fragment of code needs 32-bit division - see commit
    9b0ed89 ("netem: remove unnecessary 64 bit modulus"), so switching to
    64 bit is not an option.

    Fix the issue by keeping the value of jitter within the range that can
    be adequately handled by tabledist() - [0;INT_MAX]. As negative std
    deviation makes no sense, take the absolute value of the passed value
    and cap it at INT_MAX. Inside tabledist(), switch to unsigned 32 bit
    arithmetic in order to prevent overflows.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Aleksandr Nogikh
    Reported-by: syzbot+ec762a6342ad0d3c0d8f@syzkaller.appspotmail.com
    Acked-by: Stephen Hemminger
    Link: https://lore.kernel.org/r/20201028170731.1383332-1-aleksandrnogikh@gmail.com
    Signed-off-by: Jakub Kicinski

    Aleksandr Nogikh
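
The arithmetic behind the entry above can be checked with a short sketch. The function names are hypothetical and this is not the netem code; it only shows that doubling 0x80000000 in 32 bits wraps to 0, and that capping the absolute value at INT_MAX avoids that.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* 32-bit doubling wraps modulo 2^32, so 0x80000000 * 2 becomes 0. */
static uint32_t doubled_u32(uint32_t sigma)
{
    return sigma * 2u;
}

/* Take the absolute value of the requested jitter and cap it at INT_MAX. */
static int32_t clamp_jitter(int64_t jitter)
{
    if (jitter < 0)
        jitter = (jitter == INT64_MIN) ? INT64_MAX : -jitter;
    assert(jitter >= 0);
    return jitter > INT_MAX ? INT_MAX : (int32_t)jitter;
}
```

With the clamp in place, the largest possible sigma is INT_MAX, and twice that still fits in an unsigned 32-bit value.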
     

28 Oct, 2020

2 commits

  • The tcf_block_unbind() expects that the caller will take block->cb_lock
    before calling it, however the code took RTNL lock and dropped cb_lock
    instead. This causes the following kernel panic.

    WARNING: CPU: 1 PID: 13524 at net/sched/cls_api.c:1488 tcf_block_unbind+0x2db/0x420
    Modules linked in: mlx5_ib mlx5_core mlxfw ptp pps_core act_mirred act_tunnel_key cls_flower vxlan ip6_udp_tunnel udp_tunnel dummy sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm ib_uverbs ib_core overlay [last unloaded: mlxfw]
    CPU: 1 PID: 13524 Comm: test-ecmp-add-v Tainted: G W 5.9.0+ #1
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    RIP: 0010:tcf_block_unbind+0x2db/0x420
    Code: ff 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8d bc 24 30 01 00 00 be ff ff ff ff e8 7d 7f 70 00 85 c0 0f 85 7b fd ff ff 0b e9 74 fd ff ff 48 c7 c7 dc 6a 24 84 e8 02 ec fe fe e9 55 fd
    RSP: 0018:ffff888117d17968 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff88812f713c00 RCX: 1ffffffff0848d5b
    RDX: 0000000000000001 RSI: ffff88814fbc8130 RDI: ffff888107f2b878
    RBP: 1ffff11022fa2f3f R08: 0000000000000000 R09: ffffffff84115a87
    R10: fffffbfff0822b50 R11: ffff888107f2b898 R12: ffff88814fbc8000
    R13: ffff88812f713c10 R14: ffff888117d17a38 R15: ffff88814fbc80c0
    FS: 00007f6593d36740(0000) GS:ffff8882a4f00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00005607a00758f8 CR3: 0000000131aea006 CR4: 0000000000170ea0
    Call Trace:
    tc_block_indr_cleanup+0x3e0/0x5a0
    ? tcf_block_unbind+0x420/0x420
    ? __mutex_unlock_slowpath+0xe7/0x610
    flow_indr_dev_unregister+0x5e2/0x930
    ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core]
    ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core]
    ? flow_indr_block_cb_alloc+0x3c0/0x3c0
    ? mlx5_db_free+0x37c/0x4b0 [mlx5_core]
    mlx5e_cleanup_rep_tx+0x8b/0xc0 [mlx5_core]
    mlx5e_detach_netdev+0xe5/0x120 [mlx5_core]
    mlx5e_vport_rep_unload+0x155/0x260 [mlx5_core]
    esw_offloads_disable+0x227/0x2b0 [mlx5_core]
    mlx5_eswitch_disable_locked.cold+0x38e/0x699 [mlx5_core]
    mlx5_eswitch_disable+0x94/0xf0 [mlx5_core]
    mlx5_device_disable_sriov+0x183/0x1f0 [mlx5_core]
    mlx5_core_sriov_configure+0xfd/0x230 [mlx5_core]
    sriov_numvfs_store+0x261/0x2f0
    ? sriov_drivers_autoprobe_store+0x110/0x110
    ? sysfs_file_ops+0x170/0x170
    ? sysfs_file_ops+0x117/0x170
    ? sysfs_file_ops+0x170/0x170
    kernfs_fop_write+0x1ff/0x3f0
    ? rcu_read_lock_any_held+0x6e/0x90
    vfs_write+0x1f3/0x620
    ksys_write+0xf9/0x1d0
    ? __x64_sys_read+0xb0/0xb0
    ? lockdep_hardirqs_on_prepare+0x273/0x3f0
    ? syscall_enter_from_user_mode+0x1d/0x50
    do_syscall_64+0x2d/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    ---[ end trace bfdd028ada702879 ]---

    Fixes: 0fdcf78d5973 ("net: use flow_indr_dev_setup_offload()")
    Signed-off-by: Leon Romanovsky
    Link: https://lore.kernel.org/r/20201026123327.1141066-1-leon@kernel.org
    Signed-off-by: Jakub Kicinski

    Leon Romanovsky
     
  • TCA_MPLS_ACT_PUSH and TCA_MPLS_ACT_MAC_PUSH might be used on gso
    packets. Such packets will thus require mpls_gso.ko for segmentation.

    v2: Drop dependency on CONFIG_NET_MPLS_GSO in Kconfig (from Jakub and
    David).

    Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
    Signed-off-by: Guillaume Nault
    Link: https://lore.kernel.org/r/1f6cab15bbd15666795061c55563aaf6a386e90e.1603708007.git.gnault@redhat.com
    Signed-off-by: Jakub Kicinski

    Guillaume Nault
     

21 Oct, 2020

3 commits

  • the following command

    # tc action add action tunnel_key \
    > set src_ip 2001:db8::1 dst_ip 2001:db8::2 id 10 erspan_opts 1:6789:0:0

    generates the following splat:

    BUG: KASAN: slab-out-of-bounds in tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
    Write of size 4 at addr ffff88813f5f1cc8 by task tc/873

    CPU: 2 PID: 873 Comm: tc Not tainted 5.9.0+ #282
    Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
    Call Trace:
    dump_stack+0x99/0xcb
    print_address_description.constprop.7+0x1e/0x230
    kasan_report.cold.13+0x37/0x7c
    tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
    tunnel_key_init+0x160c/0x1f40 [act_tunnel_key]
    tcf_action_init_1+0x5b5/0x850
    tcf_action_init+0x15d/0x370
    tcf_action_add+0xd9/0x2f0
    tc_ctl_action+0x29b/0x3a0
    rtnetlink_rcv_msg+0x341/0x8d0
    netlink_rcv_skb+0x120/0x380
    netlink_unicast+0x439/0x630
    netlink_sendmsg+0x719/0xbf0
    sock_sendmsg+0xe2/0x110
    ____sys_sendmsg+0x5ba/0x890
    ___sys_sendmsg+0xe9/0x160
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f872a96b338
    Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
    RSP: 002b:00007ffffe367518 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 000000005f8f5aed RCX: 00007f872a96b338
    RDX: 0000000000000000 RSI: 00007ffffe367580 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000001c
    R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
    R13: 0000000000686760 R14: 0000000000000601 R15: 0000000000000000

    Allocated by task 873:
    kasan_save_stack+0x19/0x40
    __kasan_kmalloc.constprop.7+0xc1/0xd0
    __kmalloc+0x151/0x310
    metadata_dst_alloc+0x20/0x40
    tunnel_key_init+0xfff/0x1f40 [act_tunnel_key]
    tcf_action_init_1+0x5b5/0x850
    tcf_action_init+0x15d/0x370
    tcf_action_add+0xd9/0x2f0
    tc_ctl_action+0x29b/0x3a0
    rtnetlink_rcv_msg+0x341/0x8d0
    netlink_rcv_skb+0x120/0x380
    netlink_unicast+0x439/0x630
    netlink_sendmsg+0x719/0xbf0
    sock_sendmsg+0xe2/0x110
    ____sys_sendmsg+0x5ba/0x890
    ___sys_sendmsg+0xe9/0x160
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The buggy address belongs to the object at ffff88813f5f1c00
    which belongs to the cache kmalloc-256 of size 256
    The buggy address is located 200 bytes inside of
    256-byte region [ffff88813f5f1c00, ffff88813f5f1d00)
    The buggy address belongs to the page:
    page:0000000011b48a19 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f5f0
    head:0000000011b48a19 order:1 compound_mapcount:0
    flags: 0x17ffffc0010200(slab|head)
    raw: 0017ffffc0010200 0000000000000000 0000000d00000001 ffff888107c43400
    raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    >ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
    ^
    ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

    using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory for
    the tunnel metadata, but then it expects additional bytes to store tunnel
    specific metadata with tunnel_key_copy_opts().

    Fix the arguments of __ipv6_tun_set_dst(), so that 'md_size' contains the
    size previously computed by tunnel_key_get_opts_len(), like it's done for
    IPv4 tunnels.

    Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
    Reported-by: Shuang Li
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.com
    Signed-off-by: Jakub Kicinski

    Davide Caratti
     
  • We need to jump to the "err_out_locked" label when
    tcf_gate_get_entries() fails. Otherwise, tc_setup_flow_action() exits
    with ->tcfa_lock still held.

    Fixes: d29bdd69ecdd ("net: schedule: add action gate offloading")
    Signed-off-by: Guillaume Nault
    Acked-by: Cong Wang
    Link: https://lore.kernel.org/r/12f60e385584c52c22863701c0185e40ab08a7a7.1603207948.git.gnault@redhat.com
    Signed-off-by: Jakub Kicinski

    Guillaume Nault
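
The error-path pattern in the entry above can be sketched with a toy lock. All names are hypothetical; the lock is a counter so the test can observe whether it was released. Unlike the kernel function, this sketch shares one exit path between success and failure.

```c
#include <assert.h>

/* Hypothetical lock that counts how many times it is held. */
struct toy_lock {
    int held;
};

static void toy_lock_acquire(struct toy_lock *l) { l->held++; }
static void toy_lock_release(struct toy_lock *l) { l->held--; }

/* Every exit taken while the lock is held goes through err_out_locked. */
static int toy_setup(struct toy_lock *l, int entries_ok)
{
    int err = 0;

    assert(l);
    toy_lock_acquire(l);
    if (!entries_ok) {
        err = -1;
        goto err_out_locked;   /* a plain return here would leak the lock */
    }
    /* ... work done under the lock ... */
err_out_locked:
    toy_lock_release(l);
    return err;
}
```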
     
  • Need to use the udp header type and not tcp.

    Fixes: 9c26ba9b1f45 ("net/sched: act_ct: Instantiate flow table entry actions")
    Signed-off-by: Roi Dayan
    Reviewed-by: Paul Blakey
    Link: https://lore.kernel.org/r/20201019090244.3015186-1-roid@nvidia.com
    Signed-off-by: Jakub Kicinski

    Roi Dayan
     

09 Oct, 2020

1 commit

  • kmalloc() of sufficiently big portion of memory is cache-aligned
    in regular conditions. If some debugging options are used,
    there is no reason qdisc structures would need 64-byte alignment
    if most other kernel structures are not aligned.

    This gets rid of QDISC_ALIGN and QDISC_ALIGNTO.

    Addition of privdata field will help implementing
    the reverse of qdisc_priv() and documents where
    the private data is.

    Signed-off-by: Eric Dumazet
    Cc: Allen Pais
    Acked-by: Cong Wang
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
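
How a trailing privdata field makes both directions cheap can be sketched in user space. The struct and function names are hypothetical, and using a `long` flexible array for natural alignment is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical qdisc-like header followed by private data. */
struct toy_qdisc {
    unsigned int flags;
    long privdata[];   /* private area starts here */
};

/* Forward direction: header to private data. */
static void *toy_qdisc_priv(struct toy_qdisc *q)
{
    assert(q);
    return q->privdata;
}

/* Reverse direction: private data back to the header. */
static struct toy_qdisc *toy_priv_to_qdisc(void *priv)
{
    assert(priv);
    return (struct toy_qdisc *)((unsigned char *)priv -
                                offsetof(struct toy_qdisc, privdata));
}

static struct toy_qdisc *toy_qdisc_alloc(size_t priv_size)
{
    return calloc(1, sizeof(struct toy_qdisc) + priv_size);
}
```

With an explicit field, the offset of the private area is a compile-time constant rather than the result of rounding the header size up to an alignment.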
     

06 Oct, 2020

1 commit


05 Oct, 2020

1 commit

  • Although we take RTNL on dump path, it is possible to
    skip RTNL on insertion path. So the following race condition
    is possible:

    rtnl_lock()                 // no rtnl lock
                                mutex_lock(&idrinfo->lock);
                                // insert ERR_PTR(-EBUSY)
                                mutex_unlock(&idrinfo->lock);
    tc_dump_action()
    rtnl_unlock()

    So we have to skip those temporary -EBUSY entries on dump path
    too.

    Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com
    Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
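
Skipping placeholder entries on a dump path, as the entry above describes, can be sketched with a user-space imitation of error pointers. The helpers are hypothetical and only mirror the general idea of encoding a small negative errno in a pointer value; the limit of 4095 is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno as a pointer value. */
static void *toy_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer value lies in the range reserved for errors. */
static bool toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Count the entries a dump would emit, skipping holes and placeholders. */
static size_t toy_dump(void *const *slots, size_t n)
{
    size_t dumped = 0;

    assert(slots);
    for (size_t i = 0; i < n; i++) {
        if (!slots[i] || toy_is_err(slots[i]))
            continue;   /* e.g. a temporary -EBUSY placeholder */
        dumped++;
    }
    return dumped;
}
```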
     

04 Oct, 2020

2 commits

  • Define the MAC_PUSH action which pushes an MPLS LSE before the mac
    header (instead of between the mac and the network headers as the
    plain PUSH action does).

    The only special case is when the skb has an offloaded VLAN. In that
    case, it has to be inlined before pushing the MPLS header.

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to
    respectively pop and push a base Ethernet header at the beginning of a
    frame.

    POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any,
    must be stripped before calling POP_ETH.

    PUSH_ETH is restricted to skbs with no mac_header, and only the MAC
    addresses can be configured. The Ethertype is automatically set from
    skb->protocol. These restrictions ensure that all skb's fields remain
    consistent, so that this action can't confuse other part of the
    networking stack (like GSO).

    Since openvswitch already had these actions, consolidate the code in
    skbuff.c (like for vlan and mpls push/pop).

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     

29 Sep, 2020

2 commits

  • There is a regular need in the kernel to provide a way to declare having
    a dynamically sized set of trailing elements in a structure. Kernel code
    should always use “flexible array members”[1] for these cases. The older
    style of one-element or zero-length arrays should no longer be used[2].

    Refactor the code according to the use of a flexible-array member in
    struct tc_u_hnode and use the struct_size() helper to calculate the
    size for the allocations. Commit 5778d39d070b ("net_sched: fix struct
    tc_u_hnode layout in u32") makes it clear that the code is expected to
    dynamically allocate divisor + 1 entries for ->ht[] in tc_u_hnode. Also,
    based on other observations, such as the piece of code below:

    for (h = 0; h <= ht->divisor; h++) {
            for (n = rtnl_dereference(ht->ht[h]);
                 n;
                 n = rtnl_dereference(n->next)) {
                    if (tc_skip_hw(n->flags))
                            continue;

                    err = u32_reoffload_knode(tp, n, add, cb,
                                              cb_priv, extack);
                    if (err)
                            return err;
            }
    }

    we can assume that, in general, the code is actually expecting to allocate
    that extra space for the one-element array in tc_u_hnode, every time it
    allocates memory for instances of tc_u_hnode or tc_u_common structures.
    That's the reason for passing '1' as the last argument for struct_size()
    in the allocation for _root_ht_ and _tp_c_, and 'divisor + 1' in the
    allocation code for _ht_.

    [1] https://en.wikipedia.org/wiki/Flexible_array_member
    [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays

    Tested-by: kernel test robot
    Link: https://lore.kernel.org/lkml/5f7062af.z3T9tn9yIPv6h5Ny%25lkp@intel.com/
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
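
The allocation pattern in the entry above can be sketched in user space. The struct is a hypothetical reduction of a hash node, and toy_struct_size() is a hand-written stand-in for an overflow-checked size helper, not the kernel's struct_size() macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hash node with a flexible array of buckets. */
struct toy_hnode {
    unsigned int divisor;
    void *ht[];
};

/* Size of the header plus count buckets; SIZE_MAX signals overflow. */
static size_t toy_struct_size(size_t count)
{
    size_t elem = sizeof(void *);

    if (count > (SIZE_MAX - sizeof(struct toy_hnode)) / elem)
        return SIZE_MAX;
    return sizeof(struct toy_hnode) + count * elem;
}

/* Allocate divisor + 1 buckets, matching loops that run h <= divisor. */
static struct toy_hnode *toy_hnode_alloc(unsigned int divisor)
{
    size_t size = toy_struct_size((size_t)divisor + 1);
    struct toy_hnode *ht;

    if (size == SIZE_MAX)
        return NULL;
    ht = calloc(1, size);
    if (ht)
        ht->divisor = divisor;
    return ht;
}
```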
     
  • All TC actions call tcf_action_check_ctrlact() to validate
    goto chain, so this check in tcf_action_init_1() is actually
    redundant. Remove it to save troubles of leaking memory.

    Fixes: e49d8c22f126 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()")
    Reported-by: Vlad Buslov
    Suggested-by: Davide Caratti
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Reviewed-by: Davide Caratti
    Signed-off-by: David S. Miller

    Cong Wang
     

25 Sep, 2020

2 commits

  • syzbot is able to trigger a failure case inside the loop in
    tcf_action_init(), and when this happens we clean up with
    tcf_action_destroy(). But, as these actions are already inserted
    into the global IDR, other parallel process could free them
    before tcf_action_destroy(), then we will trigger a use-after-free.

    Fix this by deferring the insertions even later, after the loop,
    and committing all the insertions in a separate loop, so we will
    never fail in the middle of the insertions any more.

    One side effect is that the window between allocation and final
    insertion becomes larger, now it is more likely that the loop in
    tcf_del_walker() sees the placeholder -EBUSY pointer. So we have
    to check for error pointer in tcf_del_walker().

    Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com
    Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
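
The "fail first, publish afterwards" structure from the entry above can be sketched as two loops. The registry and helper names are hypothetical, and the rule that only positive ids are valid is an assumption standing in for any step that may fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ACTIONS 8

/* Hypothetical stand-in for a globally visible table. */
struct toy_registry {
    int ids[TOY_MAX_ACTIONS];
    size_t count;
};

/* A step that may fail; here only positive ids are accepted. */
static bool toy_prepare(int id)
{
    return id > 0;
}

static bool toy_add_all(struct toy_registry *r, const int *ids, size_t n)
{
    assert(r && ids);
    if (n > TOY_MAX_ACTIONS - r->count)
        return false;
    /* Phase 1: everything that can fail, with nothing published yet. */
    for (size_t i = 0; i < n; i++)
        if (!toy_prepare(ids[i]))
            return false;
    /* Phase 2: publish; this loop has no failure path. */
    for (size_t i = 0; i < n; i++)
        r->ids[r->count++] = ids[i];
    return true;
}
```

Since nothing becomes visible until the second loop, a failure never requires undoing an entry that another thread may already have seen.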
     
  • All TC actions call tcf_idr_insert() for new action at the end
    of their ->init(), so we can actually move it to a central place
    in tcf_action_init_1().

    And once the action is inserted into the global IDR, other parallel
    process could free it immediately as its refcnt is still 1, so we can
    not fail after this, we need to move it after the goto action
    validation to avoid handling the failure case after insertion.

    This is found during code review, is not directly triggered by syzbot.
    And this prepares for the next patch.

    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

23 Sep, 2020

1 commit

  • Two minor conflicts:

    1) net/ipv4/route.c, adding a new local variable while
    moving another local variable and removing its
    initial assignment.

    2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
    One pretty prints the port mode differently, whilst another
    changes the driver to try and obtain the port mode from
    the port node rather than the switch node.

    Signed-off-by: David S. Miller

    David S. Miller
     

15 Sep, 2020

2 commits

  • In fl_set_erspan_opt(), all bits of the erspan md were set to 1, as this
    function is also used to set the opt MASK. However, when setting
    md->u.index for the opt VALUE, the remaining bits of the union md->u
    were left at 1. This caused the match of the whole md to fail when
    version is 1 and only index is set.

    Fix this by initializing md->u with 0 before setting it.

    Reported-by: Shuang Li
    Fixes: 79b1011cb33d ("net: sched: allow flower to match erspan options")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
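
The stale-bits problem in the entry above can be reproduced with a toy union. The layout is hypothetical: it only assumes, for illustration, that the version 2 member is wider than the 32-bit index, so writing the index alone does not cover the whole union.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata: a 4-byte index overlaid on an 8-byte member. */
union toy_md_u {
    uint32_t index;              /* used when version is 1 */
    struct {
        uint32_t timestamp;
        uint16_t sgt;
        uint8_t dir;
        uint8_t hwid;
    } md2;                       /* used when version is 2 */
};

struct toy_md {
    int version;
    union toy_md_u u;
};

/* Start from all ones (the mask default), then set a version 1 index. */
static void toy_set_index(struct toy_md *md, uint32_t index, int zero_first)
{
    assert(md);
    memset(md, 0xff, sizeof(*md));
    md->version = 1;
    if (zero_first)
        memset(&md->u, 0, sizeof(md->u));   /* clear bits index won't cover */
    md->u.index = index;
}

/* Whole-union comparison, as a match over the full metadata would do. */
static int toy_md_equal(const struct toy_md *a, const struct toy_md *b)
{
    assert(a && b);
    return a->version == b->version &&
           memcmp(&a->u, &b->u, sizeof(a->u)) == 0;
}
```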
     
  • As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata
    on the vxlan rx/tx path, only the dont_learn/policy_applied/policy_id
    fields can be set in or parsed from the packet for the vxlan gbp option.

    So we should apply the mask when setting it in act_tunnel_key and
    cls_flower. Otherwise, users who are unaware of these bits may configure
    a value which can never be matched.

    Reported-by: Shuang Li
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
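
Masking a user-supplied gbp value down to the bits that can appear on the wire can be sketched as below. The macro names are hypothetical and the bit positions are assumptions chosen for illustration; only the three field names come from the entry above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout for the three fields named in the entry above. */
#define TOY_GBP_DONT_LEARN      (1u << 22)
#define TOY_GBP_POLICY_APPLIED  (1u << 19)
#define TOY_GBP_ID_MASK         0xFFFFu
#define TOY_GBP_MASK \
    (TOY_GBP_DONT_LEARN | TOY_GBP_POLICY_APPLIED | TOY_GBP_ID_MASK)

/* Drop every bit that can never be set in or parsed from a packet. */
static uint32_t toy_gbp_sanitize(uint32_t gbp)
{
    uint32_t masked = gbp & TOY_GBP_MASK;

    assert((masked & ~TOY_GBP_MASK) == 0);
    return masked;
}
```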
     

12 Sep, 2020

1 commit

  • It's possible that the user specifies an interval that couldn't allow
    any packet to be transmitted. This also avoids the issue of the
    hrtimer handler starving the other threads because it's running too
    often.

    The solution is to reject interval sizes that according to the current
    link speed wouldn't allow any packet to be transmitted.

    Reported-by: syzbot+8267241609ae8c23b248@syzkaller.appspotmail.com
    Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
    Signed-off-by: Vinicius Costa Gomes
    Signed-off-by: David S. Miller

    Vinicius Costa Gomes
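
The rejection rule in the entry above amounts to comparing an interval with the time needed to send the smallest frame at the current link speed. The sketch below uses hypothetical names, and the 60-byte minimum frame size is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MIN_FRAME_BYTES 60u   /* assumed smallest frame */

/* Picoseconds needed to transmit one byte at the given speed in Mbit/s. */
static uint64_t toy_picos_per_byte(uint32_t speed_mbps)
{
    assert(speed_mbps != 0);
    return (8ull * 1000ull * 1000ull) / speed_mbps;
}

/* An interval is acceptable only if one minimum-size frame fits in it. */
static bool toy_interval_ok(uint64_t interval_ns, uint32_t speed_mbps)
{
    uint64_t min_ps;

    if (speed_mbps == 0)
        return false;
    min_ps = TOY_MIN_FRAME_BYTES * toy_picos_per_byte(speed_mbps);
    return interval_ns * 1000ull >= min_ps;
}
```

At 1000 Mbit/s one byte takes 8000 ps, so a 60-byte frame needs 480 ns and any shorter interval is rejected.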
     

11 Sep, 2020

1 commit

  • Currently there is concurrent reset and enqueue operation for the
    same lockless qdisc when there is no lock to synchronize the
    q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
    qdisc_deactivate() called by dev_deactivate_queue(), which may cause
    out-of-bounds access for priv->ring[] in hns3 driver if user has
    requested a smaller queue num when __dev_xmit_skb() still enqueues an
    skb with a larger queue_mapping after the corresponding qdisc is
    reset, and call hns3_nic_net_xmit() with that skb later.

    Reused the existing synchronize_net() in dev_deactivate_many() to
    make sure skb with larger queue_mapping enqueued to old qdisc(which
    is saved in dev_queue->qdisc_sleeping) will always be reset when
    dev_reset_queue() is called.

    Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
    Signed-off-by: Yunsheng Lin
    Signed-off-by: David S. Miller

    Yunsheng Lin
     

09 Sep, 2020

1 commit

  • Reviewing the error handling in tcf_action_init_1()
    most of the early handling uses

    err_out:
    if (cookie) {
    kfree(cookie->data);
    kfree(cookie);
    }

    before cookie could ever be set.

    So skip the unnecessary check.

    Signed-off-by: Tom Rix
    Signed-off-by: David S. Miller

    Tom Rix
     

05 Sep, 2020

2 commits

  • The following deadlock scenario is triggered by syzbot:

    Thread A:                           Thread B:
    tcf_idr_check_alloc()
    ...
    populate_metalist()
      rtnl_unlock()
                                        rtnl_lock()
                                        ...
      request_module()                  tcf_idr_check_alloc()
      rtnl_lock()

    At this point, thread A is waiting for thread B to release RTNL
    lock, while thread B is waiting for thread A to commit the IDR
    change with tcf_idr_insert() later.

    Break this deadlock situation by preloading ife modules earlier,
    before tcf_idr_check_alloc(), this is fine because we only need
    to load modules we need potentially.

    Reported-and-tested-by: syzbot+80e32b5d1f9923f8ace6@syzkaller.appspotmail.com
    Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
    Cc: Jamal Hadi Salim
    Cc: Vlad Buslov
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: Jakub Kicinski

    Cong Wang
     
  • We got slightly different patches removing a double word
    in a comment in net/ipv4/raw.c - picked the version from net.

    Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
    values instead of VNIC login response buffer (following what
    commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login
    response buffer") did).

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

04 Sep, 2020

1 commit

  • Pull networking fixes from David Miller:

    1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
    Kivilinna.

    2) Fix loss of RTT samples in rxrpc, from David Howells.

    3) Memory leak in hns_nic_dev_probe(), from Dinghao Liu.

    4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.

    5) We disable BH for too long in sctp_get_port_local(), add a
    cond_resched() here as well, from Xin Long.

    6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.

    7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
    Yonghong Song.

    8) Missing of_node_put() in mt7530 DSA driver, from Sumera
    Priyadarsini.

    9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.

    10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.

    11) Memory leak in rxkad_verify_response, from Dinghao Liu.

    12) In tipc, don't use smp_processor_id() in preemptible context. From
    Tuong Lien.

    13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.

    14) Missing clk_disable_unprepare() in gemini driver, from Dan Carpenter.

    15) Fix ABI mismatch between driver and firmware in nfp, from Louis
    Peens.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
    net/smc: fix sock refcounting in case of termination
    net/smc: reset sndbuf_desc if freed
    net/smc: set rx_off for SMCR explicitly
    net/smc: fix toleration of fake add_link messages
    tg3: Fix soft lockup when tg3_reset_task() fails.
    doc: net: dsa: Fix typo in config code sample
    net: dp83867: Fix WoL SecureOn password
    nfp: flower: fix ABI mismatch between driver and firmware
    tipc: fix shutdown() of connectionless socket
    ipv6: Fix sysctl max for fib_multipath_hash_policy
    drivers/net/wan/hdlc: Change the default of hard_header_len to 0
    net: gemini: Fix another missing clk_disable_unprepare() in probe
    net: bcmgenet: fix mask check in bcmgenet_validate_flow()
    amd-xgbe: Add support for new port mode
    net: usb: dm9601: Add USB ID of Keenetic Plus DSL
    vhost: fix typo in error message
    net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
    pktgen: fix error message with wrong function name
    net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
    cxgb4: fix thermal zone device registration
    ...

    Linus Torvalds
     

28 Aug, 2020

1 commit

  • When ->init() fails, ->destroy() is called to clean up.
    So it is unnecessary to clean up in red_init(), and it
    would cause some refcount underflow.

    Fixes: aee9caa03fc3 ("net: sched: sch_red: Add qevents "early_drop" and "mark"")
    Reported-and-tested-by: syzbot+b33c1cb0a30ebdc8a5f9@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+e5ea5f8a3ecfd4427a1c@syzkaller.appspotmail.com
    Cc: Petr Machata
    Signed-off-by: Cong Wang
    Reviewed-by: Petr Machata
    Signed-off-by: David S. Miller

    Cong Wang
     

27 Aug, 2020

1 commit

  • Since commit 9c66d1564676 ("taprio: Add support for hardware
    offloading") there's a bit of inconsistency when offloading schedules
    to the hardware:

    In software mode, the gate masks are specified in terms of traffic
    classes, so if one says "sched-entry S 03 20000", it means that the traffic
    classes 0 and 1 are open for 20us; when taprio is offloaded to
    hardware, the gate masks are specified in terms of hardware queues.

    The idea here is to fix hardware offloading, so schedules in hardware
    and software mode have the same behavior. What needs to be done is to
    map traffic classes to queues when applying the offload to the driver.

    Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
    Signed-off-by: Vinicius Costa Gomes
    Signed-off-by: David S. Miller

    Vinicius Costa Gomes
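
    The mapping from traffic classes to queues described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the driver code: `tc_mask_to_queue_mask` and `tc_to_txq` are hypothetical names, and each traffic class is assumed to own one contiguous range of hardware queues given as an (offset, count) pair.

    ```python
    def tc_mask_to_queue_mask(tc_mask, tc_to_txq):
        """Translate a gate mask over traffic classes into a mask over queues.

        tc_to_txq is a hypothetical list of (offset, count) pairs, one per
        traffic class, giving the contiguous range of hardware queues that
        the class is mapped to.
        """
        queue_mask = 0
        for tc, (offset, count) in enumerate(tc_to_txq):
            if not tc_mask & (1 << tc):
                continue  # the gate for this traffic class is closed
            # Set `count` consecutive bits starting at bit `offset`.
            queue_mask |= ((1 << count) - 1) << offset
        return queue_mask


    # Assumed layout: TC 0 -> queue 0, TC 1 -> queues 1-2, TC 2 -> queue 3.
    example_map = [(0, 1), (1, 2), (3, 1)]

    # Gate mask 0x03 opens traffic classes 0 and 1, i.e. queues 0, 1 and 2.
    example_queue_mask = tc_mask_to_queue_mask(0x03, example_map)
    ```

    Under this layout the software-mode mask 0x03 becomes the queue mask 0b0111, so the same "sched-entry" opens the same traffic in both modes.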
     

24 Aug, 2020

2 commits


21 Aug, 2020

1 commit


19 Aug, 2020

1 commit


06 Aug, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Support 6GHz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in core TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Florian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds