06 Oct, 2020

1 commit


05 Oct, 2020

1 commit

  • Although we take RTNL on dump path, it is possible to
    skip RTNL on insertion path. So the following race condition
    is possible:

    rtnl_lock()             // no rtnl lock
                            mutex_lock(&idrinfo->lock);
                            // insert ERR_PTR(-EBUSY)
                            mutex_unlock(&idrinfo->lock);
    tc_dump_action()
    rtnl_unlock()

    So we have to skip those temporary -EBUSY entries on dump path
    too.

    Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com
    Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
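
The skip described in this commit can be illustrated in userspace. This is a minimal sketch, not the kernel code: err_ptr(), is_err(), struct action and dump_count() are hypothetical stand-ins for ERR_PTR(), IS_ERR(), struct tc_action and the dump walker, and a plain array stands in for the IDR.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily reduced action object. */
struct action {
	int index;
};

/*
 * Count the entries a dump would emit. A slot holding an error
 * pointer is a reservation that has not been committed yet, so it
 * is skipped instead of being dereferenced.
 */
static int dump_count(struct action *const *slots, size_t n)
{
	int dumped = 0;

	for (size_t i = 0; i < n; i++) {
		const struct action *a = slots[i];

		if (a == NULL || is_err(a))
			continue;
		dumped++;
	}
	return dumped;
}
```

With two real actions and one -EBUSY placeholder in the array, only the two real actions are counted.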
     

29 Sep, 2020

1 commit

  • All TC actions call tcf_action_check_ctrlact() to validate
    goto chain, so this check in tcf_action_init_1() is actually
    redundant. Remove it to avoid the trouble of leaking memory.

    Fixes: e49d8c22f126 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()")
    Reported-by: Vlad Buslov
    Suggested-by: Davide Caratti
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Reviewed-by: Davide Caratti
    Signed-off-by: David S. Miller

    Cong Wang
     

25 Sep, 2020

2 commits

  • syzbot is able to trigger a failure case inside the loop in
    tcf_action_init(), and when this happens we clean up with
    tcf_action_destroy(). But, as these actions are already inserted
    into the global IDR, another parallel process could free them
    before tcf_action_destroy() runs, and we then trigger a use-after-free.

    Fix this by deferring the insertions even later, after the loop,
    and committing all the insertions in a separate loop, so we will
    never fail in the middle of the insertions any more.

    One side effect is that the window between allocation and final
    insertion becomes larger, so it is now more likely that the loop in
    tcf_del_walker() sees the placeholder -EBUSY pointer. So we have
    to check for an error pointer in tcf_del_walker().

    Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com
    Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
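
The "build everything first, publish afterwards" idea from this commit can be sketched in userspace as follows. All names here (table, reserved, init_all) are hypothetical; the fixed-size array stands in for the global IDR and the reserved[] flags play the role of the -EBUSY placeholders.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 8

struct action {
	int index;
};

/* Hypothetical global table standing in for the action IDR. */
static struct action *table[TABLE_SIZE];
/* Slots that are reserved but not yet published. */
static bool reserved[TABLE_SIZE];

/*
 * Phase 1 builds every action privately and only reserves its slot.
 * Phase 2 publishes all of them and cannot fail, so a failure never
 * has to undo an entry that other threads may already have seen.
 */
static bool init_all(const int *indexes, size_t n)
{
	struct action *pending[TABLE_SIZE] = { NULL };
	size_t i;

	if (n > TABLE_SIZE)
		return false;

	for (i = 0; i < n; i++) {
		int idx = indexes[i];

		if (idx < 0 || idx >= TABLE_SIZE || table[idx] || reserved[idx])
			goto err;
		pending[i] = malloc(sizeof(*pending[i]));
		if (!pending[i])
			goto err;
		pending[i]->index = idx;
		reserved[idx] = true;
	}

	/* Phase 2: commit all insertions together. */
	for (i = 0; i < n; i++) {
		reserved[pending[i]->index] = false;
		table[pending[i]->index] = pending[i];
	}
	return true;

err:
	/* Only private, unpublished objects are released here. */
	while (i-- > 0) {
		reserved[pending[i]->index] = false;
		free(pending[i]);
	}
	return false;
}
```

A request that fails halfway leaves the table exactly as it was, which is the property the separate commit loop is meant to provide.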
     
  • All TC actions call tcf_idr_insert() for new action at the end
    of their ->init(), so we can actually move it to a central place
    in tcf_action_init_1().

    Once the action is inserted into the global IDR, another parallel
    process could free it immediately, as its refcnt is still 1. So we
    cannot fail after this point: the insertion has to move after the goto
    action validation to avoid handling a failure case after insertion.

    This was found during code review and is not directly triggered by syzbot.
    It also prepares for the next patch.

    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

09 Sep, 2020

1 commit

  • Reviewing the error handling in tcf_action_init_1(),
    most of the early handling uses

    err_out:
            if (cookie) {
                    kfree(cookie->data);
                    kfree(cookie);
            }

    before cookie could ever be set.

    So skip the unnecessary check.

    Signed-off-by: Tom Rix
    Signed-off-by: David S. Miller

    Tom Rix
     

21 Jun, 2020

1 commit


20 Jun, 2020

1 commit

  • This patch adds a dropped-frames counter to tc flower offloading.
    Reporting frames dropped by hardware is necessary for some actions.
    Actions such as police, and the upcoming stream gate action, can
    drop frames, and the user needs to see that count. The status
    update now shows how many packets were filtered and how many of
    those packets were dropped.

    v2: Changes
    - Update commit comments as suggested by Jiri Pirko.

    Signed-off-by: Po Liu
    Reviewed-by: Simon Horman
    Reviewed-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Po Liu
     

16 May, 2020

1 commit

  • Extend tcf_action_dump() with boolean argument 'terse' that is used to
    request terse-mode action dump. In terse mode only essential data needed to
    identify particular action (action kind, cookie, etc.) and its stats is put
    to resulting skb and everything else is omitted. Implement
    tcf_exts_terse_dump() helper in cls API that is intended to be used to
    request terse dump of all exts (actions) attached to the filter.

    Signed-off-by: Vlad Buslov
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     

01 May, 2020

1 commit

  • In the netlink policy, we currently have a void *validation_data
    that's pointing to different things:
    * a u32 value for bitfield32,
    * the netlink policy for nested/nested array
    * the string for NLA_REJECT

    Remove the pointer and place appropriate type-safe items in the
    union instead.

    While at it, completely dissolve the pointer for the bitfield32
    case and just put the value there directly.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
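
The change from an untyped pointer to typed union members can be sketched like this. It is a reduced illustration under assumed names: struct policy, enum attr_type and selector_allowed() are hypothetical and only model the idea; the real policy structure has more fields and more attribute types.

```c
#include <stdint.h>

enum attr_type {
	ATTR_BITFIELD32,
	ATTR_NESTED,
	ATTR_REJECT,
};

/*
 * Instead of one "void *validation_data" that means different things
 * per type, each type gets its own correctly typed union member.
 */
struct policy {
	enum attr_type type;
	union {
		uint32_t bitfield32_valid;          /* stored by value, no pointer */
		const struct policy *nested_policy; /* policy for nested attributes */
		const char *reject_message;         /* text reported on rejection */
	};
};

/* Check that a selector only uses bits the policy declares valid. */
static int selector_allowed(const struct policy *p, uint32_t selector)
{
	if (p->type != ATTR_BITFIELD32)
		return 0;
	return (selector & ~p->bitfield32_valid) == 0;
}
```

Because the mask is stored directly in the union, a policy entry for a bitfield needs no separate object to point at.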
     

31 Mar, 2020

2 commits

  • It may be up to the driver (in case ANY HW stats is passed) to select
    which type of HW stats it is going to use. Add an infrastructure to
    expose this information to the user.

    $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
    $ tc -s filter show dev enp3s0np1 ingress
    filter protocol ip pref 1 flower chain 0
    filter protocol ip pref 1 flower chain 0 handle 0x1
    eth_type ipv4
    dst_ip 192.168.1.1
    in_hw in_hw_count 2
    action order 1: gact action drop
    random type none pass val 0
    index 1 ref 1 bind 1 installed 10 sec used 10 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Introduce a helper to pass value and selector to. The helper packs them
    into struct and puts them into netlink message.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

24 Mar, 2020

1 commit

  • Commit 53eca1f3479f ("net: rename flow_action_hw_stats_types* ->
    flow_action_hw_stats*") renamed just the flow action types and
    helpers. For consistency rename variables, enums, struct members
    and UAPI too (note that this UAPI was not in any official release,
    yet).

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

09 Mar, 2020

1 commit

  • Currently, a user who is adding an action expects HW to report stats,
    but has no exact expectations about the stats type.
    That is aligned with TCA_ACT_HW_STATS_TYPE_ANY.

    Allow the user to specify the type of HW stats for an action and require it.

    Pass the information down to flow_offload layer.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

27 Feb, 2020

1 commit

  • The put of the flags was added by the commit referenced in fixes tag,
    however the size of the message was not extended accordingly.

    Fix this by adding size of the flags bitfield to the message size.

    Fixes: e38226786022 ("net: sched: update action implementations to support flags")
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
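
The size accounting behind this fix can be sketched as plain arithmetic. The helper names below are hypothetical; the sketch assumes the usual netlink layout of a 4-byte attribute header, 4-byte alignment, and a bitfield attribute made of two 32-bit words (value and selector).

```c
#include <stddef.h>
#include <stdint.h>

#define ATTR_HDRLEN 4 /* assumed size of one attribute header */

/* Assumed payload layout of the flags attribute. */
struct bitfield32 {
	uint32_t value;
	uint32_t selector;
};

/* Round a length up to the next multiple of 4. */
static size_t align4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Space one attribute occupies in the message: header plus padded payload. */
static size_t attr_total_size(size_t payload)
{
	return align4(ATTR_HDRLEN + payload);
}

/* The message size estimate has to grow by the flags attribute as well. */
static size_t action_fill_size(size_t base)
{
	return base + attr_total_size(sizeof(struct bitfield32));
}
```

Under these assumptions the flags attribute adds 12 bytes, which is the amount an estimate that forgets it falls short by.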
     

27 Nov, 2019

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Dynamic tick (nohz) updates, perhaps most notably changes to force
    the tick on when needed due to lengthy in-kernel execution on CPUs
    on which RCU is waiting.

    - Linux-kernel memory consistency model updates.

    - Replace rcu_swap_protected() with rcu_replace_pointer().

    - Torture-test updates.

    - Documentation updates.

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
    security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer()
    net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
    net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()
    net/core: Replace rcu_swap_protected() with rcu_replace_pointer()
    bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer()
    fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()
    drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer()
    drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()
    x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer()
    rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
    rcu: Suppress levelspread uninitialized messages
    rcu: Fix uninitialized variable in nocb_gp_wait()
    rcu: Update descriptions for rcu_future_grace_period tracepoint
    rcu: Update descriptions for rcu_nocb_wake tracepoint
    rcu: Remove obsolete descriptions for rcu_barrier tracepoint
    rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
    workqueue: Convert for_each_wq to use built-in list check
    rcu: Several rcu_segcblist functions can be static
    rcu: Remove unused function hlist_bl_del_init_rcu()
    Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch()
    ...

    Linus Torvalds
     

13 Nov, 2019

1 commit

  • After commit 4097e9d250fb ("net: sched: don't use tc_action->order during
    action dump"), 'act->order' is initialized but never read again, so
    we can just remove this member of struct tc_action.

    CC: Ivan Vecera
    Signed-off-by: Davide Caratti
    Acked-by: Jiri Pirko
    Reviewed-by: Ivan Vecera
    Signed-off-by: David S. Miller

    Davide Caratti
     

06 Nov, 2019

1 commit

  • Now the kernel uses 64bit packet counters in scheduler layer,
    we want to export these counters to user space.

    Instead of risking breaking user space by adding fields
    to struct gnet_stats_basic, add a new TCA_STATS_PKT64.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 Oct, 2019

5 commits

  • …k/linux-rcu into core/rcu

    Pull RCU and LKMM changes from Paul E. McKenney:

    - Documentation updates.

    - Miscellaneous fixes.

    - Dynamic tick (nohz) updates, perhaps most notably changes to
    force the tick on when needed due to lengthy in-kernel execution
    on CPUs on which RCU is waiting.

    - Replace rcu_swap_protected() with rcu_replace_pointer().

    - Torture-test updates.

    - Linux-kernel memory consistency model updates.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • Extend struct tc_action with new "tcfa_flags" field. Set the field in
    tcf_idr_create() function and provide new helper
    tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags
    value. Update individual hardware-offloaded actions init() to pass their
    "flags" argument to new helper in order to skip percpu stats allocation
    when user requested it through flags.

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extend TCA_ACT space with nla_bitfield32 flags. Add
    TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in
    tcf_action_init_1() and pass resulting value as additional argument to
    a_o->init().

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Modify stats update helper functions introduced in previous patches in this
    series to fallback to regular tc_action->tcfa_{b|q}stats if cpu stats are
    not allocated for the action argument. If regular non-percpu allocated
    counters are in use, then obtain action tcfa_lock while modifying them.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
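
The fallback described in this commit can be modelled in userspace. This is a sketch under stated assumptions, not the kernel helper: struct action, struct counters and stats_update() are hypothetical, a C11 atomic_flag spin loop stands in for tcfa_lock, and an array indexed by CPU number stands in for per-CPU allocation.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct counters {
	uint64_t bytes;
	uint64_t packets;
};

struct action {
	struct counters *cpu_stats; /* one slot per CPU, or NULL if not allocated */
	struct counters stats;      /* regular counters shared by all CPUs */
	atomic_flag lock;           /* stand-in for the action lock */
};

static void stats_update(struct action *a, unsigned int cpu,
			 uint64_t bytes, uint64_t packets)
{
	if (a->cpu_stats) {
		/* Per-CPU slot: only this CPU writes it, so no lock is taken. */
		a->cpu_stats[cpu].bytes += bytes;
		a->cpu_stats[cpu].packets += packets;
		return;
	}

	/* Fallback: shared counters are modified under the lock. */
	while (atomic_flag_test_and_set(&a->lock))
		;
	a->stats.bytes += bytes;
	a->stats.packets += packets;
	atomic_flag_clear(&a->lock);
}
```

An action created without per-CPU counters pays for the lock on every update but saves the per-CPU allocation, which is the trade-off the flag in the neighbouring commits lets the user choose.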
     
  • Currently, all implementations of tc_action_ops->stats_update() callback
    have almost exactly the same implementation of counters update
    code (besides gact which also updates drop counter). In order to simplify
    support for using both percpu-allocated and regular action counters
    depending on run-time flag in following patches, extract action counters
    update code into standalone function in act API.

    This commit doesn't change functionality.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     

30 Oct, 2019

1 commit

  • This commit replaces the use of rcu_swap_protected() with the more
    intuitively appealing rcu_replace_pointer() as a step towards removing
    rcu_swap_protected().

    Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/
    Reported-by: Linus Torvalds
    [ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ]
    Signed-off-by: Paul E. McKenney
    Cc: Jamal Hadi Salim
    Cc: Cong Wang
    Cc: Jiri Pirko
    Cc: "David S. Miller"

    Paul E. McKenney
     

16 Oct, 2019

1 commit

  • tc_ctl_action() has the ability to loop forever if tcf_action_add()
    returns -EAGAIN.

    This special case has been done in case a module needed to be loaded,
    but it turns out that tcf_add_notify() could also return -EAGAIN
    if the socket sk_rcvbuf limit is hit.

    We need to separate the two cases, and only loop for the module
    loading case.

    While we are at it, add a limit of 10 attempts since unbounded
    loops are always scary.

    syzbot repro was something like :

    socket(PF_NETLINK, SOCK_RAW|SOCK_NONBLOCK, NETLINK_ROUTE) = 3
    write(3, ..., 38) = 38
    setsockopt(3, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0
    sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{..., 388}], msg_controllen=0, msg_flags=0x10}, ...)

    NMI backtrace for cpu 0
    CPU: 0 PID: 1054 Comm: khungtaskd Not tainted 5.4.0-rc1+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
    nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
    arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
    trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
    check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
    watchdog+0x9d0/0xef0 kernel/hung_task.c:289
    kthread+0x361/0x430 kernel/kthread.c:255
    ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
    Sending NMI from CPU 0 to CPUs 1:
    NMI backtrace for cpu 1
    CPU: 1 PID: 8859 Comm: syz-executor910 Not tainted 5.4.0-rc1+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:751 [inline]
    RIP: 0010:lockdep_hardirqs_off+0x1df/0x2e0 kernel/locking/lockdep.c:3453
    Code: 5c 08 00 00 5b 41 5c 41 5d 5d c3 48 c7 c0 58 1d f3 88 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 d3 00 00 00 83 3d 21 9e 99 07 00 0f 84 b9 00 00 00 9c 58 0f 1f 44 00 00 f6
    RSP: 0018:ffff8880a6f3f1b8 EFLAGS: 00000046
    RAX: 1ffffffff11e63ab RBX: ffff88808c9c6080 RCX: 0000000000000000
    RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88808c9c6914
    RBP: ffff8880a6f3f1d0 R08: ffff88808c9c6080 R09: fffffbfff16be5d1
    R10: fffffbfff16be5d0 R11: 0000000000000003 R12: ffffffff8746591f
    R13: ffff88808c9c6080 R14: ffffffff8746591f R15: 0000000000000003
    FS: 00000000011e4880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffff600400 CR3: 00000000a8920000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    trace_hardirqs_off+0x62/0x240 kernel/trace/trace_preemptirq.c:45
    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
    _raw_spin_lock_irqsave+0x6f/0xcd kernel/locking/spinlock.c:159
    __wake_up_common_lock+0xc8/0x150 kernel/sched/wait.c:122
    __wake_up+0xe/0x10 kernel/sched/wait.c:142
    netlink_unlock_table net/netlink/af_netlink.c:466 [inline]
    netlink_unlock_table net/netlink/af_netlink.c:463 [inline]
    netlink_broadcast_filtered+0x705/0xb80 net/netlink/af_netlink.c:1514
    netlink_broadcast+0x3a/0x50 net/netlink/af_netlink.c:1534
    rtnetlink_send+0xdd/0x110 net/core/rtnetlink.c:714
    tcf_add_notify net/sched/act_api.c:1343 [inline]
    tcf_action_add+0x243/0x370 net/sched/act_api.c:1362
    tc_ctl_action+0x3b5/0x4bc net/sched/act_api.c:1410
    rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5386
    netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5404
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg+0xd7/0x130 net/socket.c:657
    ___sys_sendmsg+0x803/0x920 net/socket.c:2311
    __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
    __do_sys_sendmsg net/socket.c:2365 [inline]
    __se_sys_sendmsg net/socket.c:2363 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
    do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x440939

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot+cf0adbb9c28c8866c788@syzkaller.appspotmail.com
    Signed-off-by: David S. Miller

    Eric Dumazet
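
The separation of the two -EAGAIN sources and the bounded replay can be sketched as follows. The structure and function names are hypothetical; only the limit of 10 attempts and the rule "replay for module loading, never for notification" come from the commit message.

```c
#include <errno.h>

/* The commit message names a limit of 10 attempts; the macro name is hypothetical. */
#define MAX_ATTEMPTS 10

struct add_ops {
	int (*init)(int *calls);   /* may return -EAGAIN after requesting a module load */
	int (*notify)(int *calls); /* may return -EAGAIN when the receive buffer is full */
};

static int add_action(const struct add_ops *ops, int *calls)
{
	int attempts = 0;
	int ret;

	/* Only the init step is replayed, and only a bounded number of times. */
	for (;;) {
		ret = ops->init(calls);
		if (ret != -EAGAIN)
			break;
		if (++attempts >= MAX_ATTEMPTS)
			return ret;
	}
	if (ret < 0)
		return ret;

	/* A notification failure is returned as is and never replayed. */
	return ops->notify(calls);
}

/* Example callbacks; each one counts how often it was invoked. */
static int init_needs_module(int *calls) { (*calls)++; return -EAGAIN; }
static int init_ok(int *calls) { (*calls)++; return 0; }
static int notify_ok(int *calls) { (*calls)++; return 0; }
static int notify_rcvbuf_full(int *calls) { (*calls)++; return -EAGAIN; }
```

With init_needs_module the init step runs exactly 10 times before giving up; with notify_rcvbuf_full the error is reported after a single init and a single notify call.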
     

09 Oct, 2019

1 commit

  • For TCA_ACT_KIND, we have to keep the backward compatibility too,
    and rely on nla_strlcpy() to check and terminate the string with
    a NUL.

    Note for TC actions, nla_strcmp() is already used to compare kind
    strings, so we don't need to fix other places.

    Fixes: 199ce850ce11 ("net_sched: add policy validation for action attributes")
    Reported-by: Marcelo Ricardo Leitner
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: Jakub Kicinski

    Cong Wang
     

22 Sep, 2019

1 commit


02 Jul, 2019

1 commit

  • idr_for_each_entry_ul() is buggy as it can't handle the overflow
    case correctly. When we have an ID == UINT_MAX, it becomes an
    infinite loop. This happens when running on a 32-bit CPU, where
    unsigned long has the same size as unsigned int.

    There is no better way to fix this than casting it to a larger
    integer, but we can't just use a 64-bit integer on a 32-bit CPU. Instead
    we can use an additional integer to help us detect this
    overflow case, that is, add a new parameter to this macro.
    Fortunately tc action is its only user right now.

    Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
    Reported-by: Li Shuang
    Tested-by: Davide Caratti
    Cc: Matthew Wilcox
    Cc: Chris Mi
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
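
The overflow and its detection with an extra variable can be reproduced in userspace. This is a sketch of the idea only: ids[], get_next() and count_ids() are hypothetical, and unsigned int is used so the wrap at UINT_MAX also shows up on 64-bit machines.

```c
#include <limits.h>
#include <stddef.h>

/* Sorted IDs in a toy set; unsigned int stands in for a 32-bit unsigned long. */
static const unsigned int ids[] = { 1, 7, UINT_MAX };

/* Find the first stored ID >= *id, store it in *id and return 1, or return 0. */
static int get_next(unsigned int *id)
{
	for (size_t i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
		if (ids[i] >= *id) {
			*id = ids[i];
			return 1;
		}
	}
	return 0;
}

/*
 * Visit every ID once. After UINT_MAX has been visited, ++id wraps
 * to 0 while tmp still holds UINT_MAX, so "tmp <= id" is false and
 * the loop ends instead of starting over from the first ID.
 */
static int count_ids(void)
{
	unsigned int tmp, id;
	int visited = 0;

	for (tmp = 0, id = 0; tmp <= id && get_next(&id); tmp = id, ++id)
		visited++;
	return visited;
}
```

Dropping the "tmp <= id" test from the loop condition makes this example loop forever, which mirrors the bug described above.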
     

31 May, 2019

2 commits

  • Pull yet more SPDX updates from Greg KH:
    "Here is another set of reviewed patches that adds SPDX tags to
    different kernel files, based on a set of rules that are being used to
    parse the comments to try to determine that the license of the file is
    "GPL-2.0-or-later" or "GPL-2.0-only". Only the "obvious" versions of
    these matches are included here, a number of "non-obvious" variants of
    text have been found but those have been postponed for later review
    and analysis.

    There is also a patch in here to add the proper SPDX header to a bunch
    of Kbuild files that we have missed in the past due to new files being
    added and forgetting that Kbuild uses two different file names for
    Makefiles. This issue was reported by the Kbuild maintainer.

    These patches have been out for review on the linux-spdx@vger mailing
    list, and while they were created by automatic tools, they were
    hand-verified by a bunch of different people, all of whose names are on
    the patches as reviewers"

    * tag 'spdx-5.2-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (82 commits)
    treewide: Add SPDX license identifier - Kbuild
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 224
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 223
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 222
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 221
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 220
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 218
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 217
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 216
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 215
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 214
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 213
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 211
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 210
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 207
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 203
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201
    ...

    Linus Torvalds
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

25 May, 2019

1 commit

  • Function tcf_action_dump() relies on tc_action->order field when starting
    nested nla to send action data to userspace. This approach breaks in
    several cases:

    - When multiple filters point to same shared action, tc_action->order field
    is overwritten each time it is attached to filter. This causes filter
    dump to output action with incorrect attribute for all filters that have
    the action in different position (different order) from the last set
    tc_action->order value.

    - When action data is displayed using tc action API (RTM_GETACTION), action
    order is overwritten by tca_action_gd() according to its position in
    resulting array of nl attributes, which will break filter dump for all
    filters attached to that shared action that expect it to have different
    order value.

    Don't rely on tc_action->order when dumping actions. Set nla according to
    action position in resulting array of actions instead.

    Signed-off-by: Vlad Buslov
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Vlad Buslov
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
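
The wrapper arrangement and the reason a parser has to mask the flag bits can be sketched like this. The names are hypothetical and shortened; the bit positions (bit 15 for the nested flag, bit 14 for the byte-order flag) are assumed to match the netlink UAPI header.

```c
#include <stdint.h>

#define F_NESTED        (1u << 15)
#define F_NET_BYTEORDER (1u << 14)
#define TYPE_MASK       ((uint16_t)~(F_NESTED | F_NET_BYTEORDER))

/* Reduced attribute header: length and type only. */
struct attr {
	uint16_t len;
	uint16_t type;
};

/* Old behaviour, kept under a new name: the type is written as given. */
static void nest_start_noflag(struct attr *a, uint16_t type)
{
	a->len = sizeof(*a);
	a->type = type;
}

/* New default: same as above, but the nested flag is always added. */
static void nest_start(struct attr *a, uint16_t type)
{
	nest_start_noflag(a, (uint16_t)(type | F_NESTED));
}

/* Helper a careful parser uses: strip the flag bits before comparing. */
static uint16_t attr_type(const struct attr *a)
{
	return a->type & TYPE_MASK;
}
```

A parser that compares the raw type field sees a different number once the flag is set, which is why existing call sites were moved to the noflag variant instead of silently changing their output.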
     

22 Mar, 2019

2 commits

  • Use RCU when accessing the action chain, to avoid use after free in the
    traffic path when 'goto chain' is replaced on existing TC actions (see
    script below). Since the control action is read in the traffic path
    without holding the action spinlock, we need to explicitly ensure that
    a->goto_chain is not NULL before dereferencing (i.e. it's not sufficient
    to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL
    dereferences in tcf_action_goto_chain_exec() when the following script:

    # tc chain add dev dd0 chain 42 ingress protocol ip flower \
    > ip_proto udp action pass index 4
    # tc filter add dev dd0 ingress protocol ip flower \
    > ip_proto udp action csum udp goto chain 42 index 66
    # tc chain del dev dd0 chain 42 ingress
    (start UDP traffic towards dd0)
    # tc action replace action csum udp pass index 66

    was run repeatedly for several hours.

    Suggested-by: Cong Wang
    Suggested-by: Vlad Buslov
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
     
  • - pass a pointer to struct tcf_proto in each action's init() handler,
    to allow validating the control action, checking whether the chain
    exists and (eventually) refcounting it.
    - remove code that validates the control action after a successful call
    to the action's init() handler, and replace it with a test that forbids
    addition of actions having 'goto_chain' and NULL goto_chain pointer at
    the same time.
    - add tcf_action_check_ctrlact(), that will validate the control action
    and eventually allocate the action 'goto_chain' within the init()
    handler.
    - add tcf_action_set_ctrlact(), that will assign the control action and
    swap the current 'goto_chain' pointer with the new given one.

    This disallows 'goto_chain' on actions that don't initialize it properly
    in their init() handler, i.e. calling tcf_action_check_ctrlact() after
    successful IDR reservation and then calling tcf_action_set_ctrlact()
    to assign 'goto_chain' and 'tcf_action' consistently.

    By doing this, the kernel no longer leaks refcounts when a valid
    'goto chain' handle is replaced in TC actions, which caused kmemleak
    splats like the following one:

    # tc chain add dev dd0 chain 42 ingress protocol ip flower \
    > ip_proto tcp action drop
    # tc chain add dev dd0 chain 43 ingress protocol ip flower \
    > ip_proto udp action drop
    # tc filter add dev dd0 ingress matchall \
    > action gact goto chain 42 index 66
    # tc filter replace dev dd0 ingress matchall \
    > action gact goto chain 43 index 66
    # echo scan >/sys/kernel/debug/kmemleak

    unreferenced object 0xffff93c0ee09f000 (size 1024):
    comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    tc_ctl_chain+0x3d2/0x4c0
    rtnetlink_rcv_msg+0x263/0x2d0
    netlink_rcv_skb+0x4a/0x110
    netlink_unicast+0x1a0/0x250
    netlink_sendmsg+0x2c1/0x3c0
    sock_sendmsg+0x36/0x40
    ___sys_sendmsg+0x280/0x2f0
    __sys_sendmsg+0x5e/0xa0
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    0xffffffffffffffff

    Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
    Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
     

11 Feb, 2019

1 commit

  • Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For
    example, use TCA_ID_GACT instead of TCA_ACT_GACT. This will align with
    TCA_ID_POLICE and also differentiate these identifiers, used in the
    struct tc_action_ops type field, from other macros starting with TCA_ACT_.

    To make things clearer, we name the enum defining the TCA_ID_*
    identifiers and also change the "type" field of struct tc_action to
    id.

    Signed-off-by: Eli Cohen
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Eli Cohen
     

11 Dec, 2018

1 commit

  • The egdev mechanism was replaced by the TC indirect block notifications
    platform.

    Signed-off-by: Oz Shlomo
    Reviewed-by: Eli Britstein
    Reviewed-by: Jiri Pirko
    Cc: John Hurley
    Cc: Jakub Kicinski
    Signed-off-by: Saeed Mahameed

    Oz Shlomo
     

09 Oct, 2018

1 commit

  • Make sure extack is passed to nlmsg_parse where easy to do so.
    Most of these are dump handlers and leveraging the extack in
    the netlink_callback.

    Signed-off-by: David Ahern
    Acked-by: Christian Brauner
    Signed-off-by: David S. Miller

    David Ahern
     

05 Oct, 2018

1 commit

  • In commit ec3ed293e766 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
    we moved fl_hw_destroy_tmplt() to a workqueue to avoid blocking
    with the spinlock held. Unfortunately, this causes a lot of
    trouble here:

    1. tcf_chain_destroy() could be called right after we queue the work
    but before the work runs. This is a use-after-free.

    2. The chain refcnt is already 0, we can't even just hold it again.
    We can check refcnt==1 but it is ugly.

    3. The chain with refcnt 0 is still visible in its block, which means
    it could be still found and used!

    4. The block has a refcnt too, we can't hold it without introducing a
    proper API either.

    We could make it work, but the end result is ugly. Instead of wasting
    time on reviewing it, let's just convert the troublesome spinlock to
    a mutex, which allows us to use non-atomic allocations too.

    Fixes: ec3ed293e766 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
    Reported-by: Ido Schimmel
    Cc: Jamal Hadi Salim
    Cc: Vlad Buslov
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Tested-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Cong Wang
     

25 Sep, 2018

1 commit