21 Oct, 2020

1 commit

  • The following command

    # tc action add action tunnel_key \
    > set src_ip 2001:db8::1 dst_ip 2001:db8::2 id 10 erspan_opts 1:6789:0:0

    generates the following splat:

    BUG: KASAN: slab-out-of-bounds in tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
    Write of size 4 at addr ffff88813f5f1cc8 by task tc/873

    CPU: 2 PID: 873 Comm: tc Not tainted 5.9.0+ #282
    Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
    Call Trace:
    dump_stack+0x99/0xcb
    print_address_description.constprop.7+0x1e/0x230
    kasan_report.cold.13+0x37/0x7c
    tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
    tunnel_key_init+0x160c/0x1f40 [act_tunnel_key]
    tcf_action_init_1+0x5b5/0x850
    tcf_action_init+0x15d/0x370
    tcf_action_add+0xd9/0x2f0
    tc_ctl_action+0x29b/0x3a0
    rtnetlink_rcv_msg+0x341/0x8d0
    netlink_rcv_skb+0x120/0x380
    netlink_unicast+0x439/0x630
    netlink_sendmsg+0x719/0xbf0
    sock_sendmsg+0xe2/0x110
    ____sys_sendmsg+0x5ba/0x890
    ___sys_sendmsg+0xe9/0x160
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f872a96b338
    Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
    RSP: 002b:00007ffffe367518 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 000000005f8f5aed RCX: 00007f872a96b338
    RDX: 0000000000000000 RSI: 00007ffffe367580 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000001c
    R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
    R13: 0000000000686760 R14: 0000000000000601 R15: 0000000000000000

    Allocated by task 873:
    kasan_save_stack+0x19/0x40
    __kasan_kmalloc.constprop.7+0xc1/0xd0
    __kmalloc+0x151/0x310
    metadata_dst_alloc+0x20/0x40
    tunnel_key_init+0xfff/0x1f40 [act_tunnel_key]
    tcf_action_init_1+0x5b5/0x850
    tcf_action_init+0x15d/0x370
    tcf_action_add+0xd9/0x2f0
    tc_ctl_action+0x29b/0x3a0
    rtnetlink_rcv_msg+0x341/0x8d0
    netlink_rcv_skb+0x120/0x380
    netlink_unicast+0x439/0x630
    netlink_sendmsg+0x719/0xbf0
    sock_sendmsg+0xe2/0x110
    ____sys_sendmsg+0x5ba/0x890
    ___sys_sendmsg+0xe9/0x160
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The buggy address belongs to the object at ffff88813f5f1c00
    which belongs to the cache kmalloc-256 of size 256
    The buggy address is located 200 bytes inside of
    256-byte region [ffff88813f5f1c00, ffff88813f5f1d00)
    The buggy address belongs to the page:
    page:0000000011b48a19 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f5f0
    head:0000000011b48a19 order:1 compound_mapcount:0
    flags: 0x17ffffc0010200(slab|head)
    raw: 0017ffffc0010200 0000000000000000 0000000d00000001 ffff888107c43400
    raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    >ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
    ^
    ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

    When using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory
    for the tunnel metadata, but then expects additional bytes to store
    tunnel-specific metadata in tunnel_key_copy_opts().

    Fix the arguments of __ipv6_tun_set_dst() so that 'md_size' contains the
    size previously computed by tunnel_key_get_opts_len(), as is already done
    for IPv4 tunnels.
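
    The sizing rule can be illustrated with a userspace sketch. The struct
    and both functions are hypothetical stand-ins for the metadata dst,
    metadata_dst_alloc() and tunnel_key_copy_opts(); the point is only that
    room for the options has to be reserved at allocation time, so that a
    copy which does not fit is refused instead of overrunning the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for tunnel metadata followed by its options. */
struct tun_md {
	size_t opts_room;        /* bytes reserved for options at alloc time */
	size_t options_len;      /* bytes actually used */
	unsigned char options[]; /* flexible array member */
};

/* Allocate metadata with room for md_size bytes of options. */
static struct tun_md *tun_md_alloc(size_t md_size)
{
	struct tun_md *md = calloc(1, sizeof(*md) + md_size);

	if (md)
		md->opts_room = md_size;
	return md;
}

/* Copy options only if they fit in the space reserved by tun_md_alloc(). */
static int tun_md_copy_opts(struct tun_md *md, const void *opts, size_t len)
{
	if (len > md->opts_room)
		return -1; /* would write past the end of the allocation */
	memcpy(md->options, opts, len);
	md->options_len = len;
	return 0;
}
```

    Calling tun_md_alloc(0) models reserving no option space; copying 8
    option bytes into that object is then rejected by the bounds check
    rather than corrupting the neighbouring slab memory.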

    Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
    Reported-by: Shuang Li
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.com
    Signed-off-by: Jakub Kicinski

    Davide Caratti
     

25 Sep, 2020

1 commit

  • All TC actions call tcf_idr_insert() for new action at the end
    of their ->init(), so we can actually move it to a central place
    in tcf_action_init_1().

    Once the action is inserted into the global IDR, another process running
    in parallel could free it immediately, because its refcnt is still 1. So
    we cannot fail after the insertion, and we need to move it after the goto
    action validation to avoid handling the failure case after insertion.

    This was found during code review; it was not directly triggered by
    syzbot. It also prepares for the next patch.
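
    The ordering can be sketched in a single-threaded form. The table below
    is a hypothetical stand-in for the action IDR, and action_init() for the
    shared init path; real code also deals with refcounts and locking, which
    are left out here. Every check that can fail runs first, and publishing
    the action is the last step, so there is no error path after it.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 16

/* Hypothetical action and global table standing in for the IDR. */
struct action {
	int index;
	int valid_goto; /* result of the goto-chain validation */
};
static struct action *table[TABLE_SIZE];

static int action_init(struct action *a)
{
	if (a->index < 0 || a->index >= TABLE_SIZE)
		return -1;
	if (!a->valid_goto)
		return -1; /* validation fails while the action is still private */
	/* Last step: publish. Once visible, another thread could look the
	 * action up and drop the last reference, so nothing may fail after. */
	table[a->index] = a;
	return 0;
}
```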

    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

15 Sep, 2020

1 commit

  • As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata
    on the vxlan rx/tx path, only the dont_learn/policy_applied/policy_id
    fields can be set in or parsed from the packet for the vxlan gbp option.

    So it is better to apply the mask when setting it in act_tunnel_key and
    cls_flower. Otherwise, users who don't know about these bits may
    configure a value that can never be matched.
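
    A minimal sketch of the masking, assuming the GBP layout we believe
    include/net/vxlan.h uses (dont_learn at bit 22, policy_applied at bit 19,
    a 16-bit policy ID in the low bits). The macro names are local to this
    sketch and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the GBP field layout in include/net/vxlan.h. */
#define GBP_DONT_LEARN     (UINT32_C(1) << 22)
#define GBP_POLICY_APPLIED (UINT32_C(1) << 19)
#define GBP_ID_MASK        UINT32_C(0xFFFF)
#define GBP_MASK (GBP_DONT_LEARN | GBP_POLICY_APPLIED | GBP_ID_MASK)

/* Keep only the bits the vxlan rx/tx path can actually carry. */
static uint32_t gbp_sanitize(uint32_t gbp)
{
	return gbp & GBP_MASK;
}
```

    Under this layout a configured gbp of 0x01020304 would be stored as
    0x0304, which is the only part a received packet could ever match.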

    Reported-by: Shuang Li
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

27 Nov, 2019

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Dynamic tick (nohz) updates, perhaps most notably changes to force
    the tick on when needed due to lengthy in-kernel execution on CPUs
    on which RCU is waiting.

    - Linux-kernel memory consistency model updates.

    - Replace rcu_swap_protected() with rcu_replace_pointer().

    - Torture-test updates.

    - Documentation updates.

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
    security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer()
    net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
    net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()
    net/core: Replace rcu_swap_protected() with rcu_replace_pointer()
    bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer()
    fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()
    drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer()
    drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()
    x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer()
    rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
    rcu: Suppress levelspread uninitialized messages
    rcu: Fix uninitialized variable in nocb_gp_wait()
    rcu: Update descriptions for rcu_future_grace_period tracepoint
    rcu: Update descriptions for rcu_nocb_wake tracepoint
    rcu: Remove obsolete descriptions for rcu_barrier tracepoint
    rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
    workqueue: Convert for_each_wq to use built-in list check
    rcu: Several rcu_segcblist functions can be static
    rcu: Remove unused function hlist_bl_del_init_rcu()
    Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch()
    ...

    Linus Torvalds
     

23 Nov, 2019

1 commit


22 Nov, 2019

2 commits

  • This patch allows setting erspan options using the act_tunnel_key
    action. Unlike geneve options, only one erspan option can be set. Also,
    geneve, vxlan and erspan options can't be set at the same time.

    Options are expressed as ver:index:dir:hwid. When ver is set to 1, index
    is applied while dir and hwid are ignored; when ver is set to 2, dir and
    hwid are used while index is ignored.
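
    The version rule can be sketched as a small parser. The struct and
    erspan_opts_parse() are hypothetical and only model the ver:index:dir:hwid
    string and which fields each version keeps; the range checks the kernel
    and tc apply to each field are deliberately omitted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parsed form of an erspan_opts "ver:index:dir:hwid" string. */
struct erspan_opts {
	unsigned int ver, index, dir, hwid;
};

/* Parse the string, then zero the fields the chosen version ignores. */
static int erspan_opts_parse(const char *s, struct erspan_opts *o)
{
	int end = 0;

	if (sscanf(s, "%u:%u:%u:%u%n", &o->ver, &o->index, &o->dir,
		   &o->hwid, &end) != 4 || s[end] != '\0')
		return -1;
	if (o->ver == 1) {
		o->dir = 0;   /* version 1 uses only index */
		o->hwid = 0;
	} else if (o->ver == 2) {
		o->index = 0; /* version 2 uses only dir and hwid */
	} else {
		return -1;    /* unknown version */
	}
	return 0;
}
```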

    # ip link add name erspan1 type erspan external
    # tc qdisc add dev eth0 ingress
    # tc filter add dev eth0 protocol ip parent ffff: \
    flower indev eth0 \
    ip_proto udp \
    action tunnel_key \
    set src_ip 10.0.99.192 \
    dst_ip 10.0.99.193 \
    dst_port 6081 \
    id 11 \
    erspan_opts 1:2:0:0 \
    action mirred egress redirect dev erspan1

    v1->v2:
    - do the validation before dst is allocated, as Jakub suggested.
    - use "Duplicate" instead of "Wrong" in the extack error message.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch allows setting vxlan options using the act_tunnel_key
    action. Unlike geneve options, only one vxlan option can be set. Also,
    geneve options and vxlan options can't be set at the same time.
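
    The "one kind at a time" rule can be sketched as a check applied to each
    option as it is parsed. The enum, struct and opts_add() are hypothetical;
    this only models the rule described above, not the kernel's attribute
    walk.

```c
#include <assert.h>

/* Hypothetical option kinds; at most one kind may appear per action. */
enum opt_kind { OPT_NONE, OPT_GENEVE, OPT_VXLAN, OPT_ERSPAN };

struct opts_state {
	enum opt_kind kind; /* kind seen so far, OPT_NONE if none yet */
	int count;          /* options accepted so far */
};

/* Geneve allows several options; vxlan and erspan allow exactly one. */
static int opts_add(struct opts_state *st, enum opt_kind kind)
{
	if (st->kind != OPT_NONE && st->kind != kind)
		return -1; /* mixing option kinds is rejected */
	if (kind != OPT_GENEVE && st->count > 0)
		return -1; /* duplicate vxlan/erspan option */
	st->kind = kind;
	st->count++;
	return 0;
}
```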

    gbp is the only parameter for vxlan options:

    # ip link add name vxlan0 type vxlan dstport 0 external
    # tc qdisc add dev eth0 ingress
    # tc filter add dev eth0 protocol ip parent ffff: \
    flower indev eth0 \
    ip_proto udp \
    action tunnel_key \
    set src_ip 10.0.99.192 \
    dst_ip 10.0.99.193 \
    dst_port 6081 \
    id 11 \
    vxlan_opts 01020304 \
    action mirred egress redirect dev vxlan0

    v1->v2:
    - add .strict_start_type for enc_opts_policy, as Jakub noticed.
    - use "Duplicate" instead of "Wrong" in the extack error message, as
    Jakub suggested.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

19 Nov, 2019

1 commit

  • info->options_len is of type 'u8', so when opts_len has a value greater
    than IP_TUNNEL_OPTS_MAX, the assignment 'info->options_len = opts_len'
    truncates the int to u8 and stores a wrong value in info->options_len.

    The kernel crashed in my test when running:

    # opts="0102:80:00800022"
    # for i in {1..99}; do opts="$opts,0102:80:00800022"; done
    # ip link add name geneve0 type geneve dstport 0 external
    # tc qdisc add dev eth0 ingress
    # tc filter add dev eth0 protocol ip parent ffff: \
    flower indev eth0 ip_proto udp action tunnel_key \
    set src_ip 10.0.99.192 dst_ip 10.0.99.193 \
    dst_port 6081 id 11 geneve_opts $opts \
    action mirred egress redirect dev geneve0

    So do a check similar to the one cls_flower does, and return an error
    when opts_len > IP_TUNNEL_OPTS_MAX in tunnel_key_copy_opts().
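
    The truncation is easy to reproduce in plain C. Each option in the
    reproducer above takes 8 bytes (a 4-byte geneve option header plus 4
    bytes of data), so 100 options need 800 bytes, and 800 stored in a u8
    becomes 800 mod 256 = 32. The sketch assumes IP_TUNNEL_OPTS_MAX equals
    255, the largest u8 value; TUN_OPTS_MAX and both setters are
    hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to equal IP_TUNNEL_OPTS_MAX: the largest value a u8 holds. */
#define TUN_OPTS_MAX 255

struct tun_info {
	uint8_t options_len; /* u8, as described in the commit message */
};

/* Buggy pattern: the int is silently truncated to u8 (the conversion is
 * implicit in the original; the cast only makes it visible here). */
static void set_len_unchecked(struct tun_info *info, int opts_len)
{
	info->options_len = (uint8_t)opts_len;
}

/* Fixed pattern: reject lengths the field cannot represent. */
static int set_len_checked(struct tun_info *info, int opts_len)
{
	if (opts_len < 0 || opts_len > TUN_OPTS_MAX)
		return -1;
	info->options_len = (uint8_t)opts_len;
	return 0;
}
```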

    Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
    Signed-off-by: Xin Long
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Xin Long
     

31 Oct, 2019

3 commits

  • Extend struct tc_action with a new "tcfa_flags" field. Set the field in
    the tcf_idr_create() function and provide a new helper,
    tcf_idr_create_from_flags(), that derives the 'cpustats' boolean from the
    flags value. Update the init() of individual hardware-offloaded actions to
    pass their "flags" argument to the new helper, in order to skip percpu
    stats allocation when the user requested it through flags.
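
    The derivation of 'cpustats' can be sketched as follows. NO_PERCPU_STATS
    is a local copy assumed to match the uapi TCA_ACT_FLAGS_NO_PERCPU_STATS
    bit (1 << 0), and action_create_from_flags() is a hypothetical analogue
    of tcf_idr_create_from_flags(), not its real signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy; assumed to match TCA_ACT_FLAGS_NO_PERCPU_STATS. */
#define NO_PERCPU_STATS (UINT32_C(1) << 0)

struct action {
	uint32_t flags;    /* stands in for tcfa_flags */
	bool percpu_stats; /* whether per-CPU counters would be allocated */
};

/* Record the user flags and derive the cpustats boolean from them. */
static void action_create_from_flags(struct action *a, uint32_t flags)
{
	a->flags = flags;
	a->percpu_stats = !(flags & NO_PERCPU_STATS);
}
```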

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extend the TCA_ACT space with nla_bitfield32 flags. Add
    TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in
    tcf_action_init_1() and pass the resulting value as an additional argument
    to a_o->init().
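
    An nla_bitfield32 carries a value and a selector. The sketch below shows
    the kind of checks we believe a bitfield32 policy with a set of allowed
    flags performs; the struct mirrors the uapi layout, but
    bitfield32_validate() and ALLOWED_FLAGS are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as struct nla_bitfield32 in the netlink uapi header. */
struct bitfield32 {
	uint32_t value;
	uint32_t selector;
};

#define ALLOWED_FLAGS (UINT32_C(1) << 0) /* only NO_PERCPU_STATS */

/* Reject unknown bits in the selector or value, and value bits that the
 * selector does not cover. */
static int bitfield32_validate(const struct bitfield32 *bf, uint32_t allowed)
{
	if (bf->selector & ~allowed)
		return -1;
	if (bf->value & ~allowed)
		return -1;
	if (bf->value & ~bf->selector)
		return -1;
	return 0;
}
```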

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extract the common code that increments the cpu_bstats counter into a
    standalone act API function. Change hardware-offloaded actions that use
    percpu counter allocation to use the new function instead of incrementing
    cpu_bstats directly.
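
    The refactoring pattern, one helper for all callers instead of each
    action touching the counters, can be sketched with a fixed array
    standing in for per-CPU storage. All names are hypothetical; real per-CPU
    counters use the kernel's percpu and u64_stats APIs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4

struct bstats {
	uint64_t bytes, packets;
};

/* Hypothetical action with one counter slot per CPU. */
struct action {
	struct bstats cpu_bstats[NR_CPUS_SKETCH];
};

/* The single helper every action calls instead of updating slots itself. */
static void action_update_bstats(struct action *a, int cpu, uint64_t bytes)
{
	a->cpu_bstats[cpu].bytes += bytes;
	a->cpu_bstats[cpu].packets += 1;
}

/* Sum the per-CPU slots, as a stats dump would. */
static struct bstats action_sum_bstats(const struct action *a)
{
	struct bstats sum = { 0, 0 };

	for (int i = 0; i < NR_CPUS_SKETCH; i++) {
		sum.bytes += a->cpu_bstats[i].bytes;
		sum.packets += a->cpu_bstats[i].packets;
	}
	return sum;
}
```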

    This commit doesn't change functionality.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     

30 Oct, 2019

1 commit

  • This commit replaces the use of rcu_swap_protected() with the more
    intuitively appealing rcu_replace_pointer() as a step towards removing
    rcu_swap_protected().
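
    rcu_replace_pointer() returns the old pointer, whereas
    rcu_swap_protected() wrote the old value back into its second argument.
    The sketch below is a userspace analogue built on C11 atomics, not RCU
    (there are no grace periods); replace_pointer() is a hypothetical name,
    and the caller is assumed to hold the update-side lock.

```c
#include <assert.h>
#include <stdatomic.h>

struct config {
	int value;
};

/* The update-side lock is assumed held, so a relaxed load of the old
 * pointer is enough; the release store publishes the new object. */
static struct config *replace_pointer(_Atomic(struct config *) *slot,
				      struct config *newp)
{
	struct config *old = atomic_load_explicit(slot, memory_order_relaxed);

	atomic_store_explicit(slot, newp, memory_order_release);
	return old;
}
```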

    Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/
    Reported-by: Linus Torvalds
    [ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ]
    Signed-off-by: Paul E. McKenney
    Cc: Jamal Hadi Salim
    Cc: Cong Wang
    Cc: Jiri Pirko
    Cc: "David S. Miller"

    Paul E. McKenney
     

28 Aug, 2019

1 commit

  • The net pointer in struct xt_tgdtor_param is not explicitly
    initialized, so it is still NULL when it is dereferenced.
    So we have to find a way to pass the correct net pointer to
    ipt_destroy_target().

    The best way I found is to save the net pointer inside the per-netns
    struct tcf_idrinfo, which keeps this patch small.
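
    The shape of the fix can be sketched with hypothetical structs: the
    per-netns bookkeeping remembers its netns, and the destroy parameters are
    filled from it instead of being left NULL. None of these names are the
    kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct net {
	int id;
};

/* Hypothetical per-netns action bookkeeping that remembers its netns. */
struct idrinfo {
	struct net *net;
};

/* Hypothetical destroy parameters, like xt_tgdtor_param. */
struct tgdtor_param {
	struct net *net;
};

static struct net *destroy_target(const struct tgdtor_param *par)
{
	return par->net; /* the real destroy path dereferences this */
}

/* Fill the destroy parameters from the saved pointer before calling. */
static struct net *release_action(const struct idrinfo *info)
{
	struct tgdtor_param par = { .net = info->net };

	return destroy_target(&par);
}
```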

    Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset")
    Reported-and-tested-by: itugrok@yahoo.com
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

06 Aug, 2019

1 commit

  • Currently, the init call of every action (except ipt) initializes its
    'parm' structure as a direct pointer to nla data in the skb. This leads
    to a race condition when some of the filter's actions were initialized
    successfully (and were assigned an idr action index that was written
    directly into the nla data), but were then deleted and retried (due to a
    following action module missing, or a classifier-initiated retry). In
    that case the action init code tries to insert the action into the idr
    with the index that was assigned on the previous iteration. During the
    retry, that index can be reused by another action that was inserted
    concurrently, which causes unintended action sharing between filters.
    To fix the described race condition, save the action idr index to a
    temporary stack-allocated variable instead of in the nla data.
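
    The fix can be sketched by taking the request as const and keeping the
    assigned index in a local variable. struct parm, next_free and
    action_init() are hypothetical; the point is that a retry sees the
    request exactly as userspace sent it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an action's netlink 'parm' payload. */
struct parm {
	uint32_t index; /* 0 means "allocate one for me" */
};

static uint32_t next_free = 5; /* pretend index allocator */

/* The assigned index lives on the stack; the message data is const. */
static uint32_t action_init(const struct parm *parm)
{
	uint32_t index = parm->index;

	if (index == 0)
		index = next_free++;
	return index;
}
```

    Making the parameter const also lets the compiler reject any later
    attempt to write the index back into the message.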

    Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
    Signed-off-by: Dmytro Linkin
    Signed-off-by: Vlad Buslov
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Dmytro Linkin
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
       - undefined (type >= max) & NLA_UNSPEC attributes accepted
       - attribute length >= expected accepted
       - garbage at end of message accepted
    2) strict (opt-in)
       - NLA_UNSPEC attributes accepted
       - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
      attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally, it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) so
    that it is not an incompatible userspace ABI change, but would then,
    going forward, prevent forgetting attribute entries. The same can
    apply to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated
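
    How the four options combine can be sketched with a toy validator over a
    flat attribute list. The flag names follow the commit text, but the bit
    values, the policy kinds and validate() are illustrative and not taken
    from the kernel headers. V_DEPRECATED_STRICT reproduces the old
    *_strict() behavior (TRAILING | MAXTYPE) and V_STRICT enables everything.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names from the commit text; bit values are illustrative. */
enum validate_flags {
	V_TRAILING     = 1 << 0,
	V_MAXTYPE      = 1 << 1,
	V_UNSPEC       = 1 << 2,
	V_STRICT_ATTRS = 1 << 3,
};
#define V_LIBERAL           0
#define V_DEPRECATED_STRICT (V_TRAILING | V_MAXTYPE)
#define V_STRICT (V_TRAILING | V_MAXTYPE | V_UNSPEC | V_STRICT_ATTRS)

enum kind { K_UNSPEC, K_U32 };
struct policy { enum kind kind; };
struct attr { int type; size_t len; };

/* trailing_bytes: data left over after the last complete attribute. */
static int validate(const struct attr *a, size_t n, size_t trailing_bytes,
		    const struct policy *pol, int maxtype, unsigned flags)
{
	if ((flags & V_TRAILING) && trailing_bytes)
		return -1;
	for (size_t i = 0; i < n; i++) {
		if (a[i].type < 0 || a[i].type > maxtype) {
			if (flags & V_MAXTYPE)
				return -1;
			continue; /* liberal: unknown attributes are skipped */
		}
		if (pol[a[i].type].kind == K_UNSPEC) {
			if (flags & V_UNSPEC)
				return -1;
			continue;
		}
		/* K_U32: exactly 4 bytes when strict, at least 4 otherwise */
		if (a[i].len < 4 || ((flags & V_STRICT_ATTRS) && a[i].len != 4))
			return -1;
	}
	return 0;
}
```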

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)
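
    Why setting the flag can break careless userspace is easy to show. The
    flag values below are local copies of NLA_F_NESTED (1 << 15) and
    NLA_F_NET_BYTEORDER (1 << 14) from the uapi <linux/netlink.h>; the struct
    has the same layout as struct nlattr, and the helpers are hypothetical.
    A parser that compares nla_type directly stops recognizing the
    attribute once the flag is set, while one that masks the flags keeps
    working.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the flag bits from the uapi <linux/netlink.h>. */
#define NLA_F_NESTED        (1 << 15)
#define NLA_F_NET_BYTEORDER (1 << 14)
#define NLA_TYPE_MASK       (~(NLA_F_NESTED | NLA_F_NET_BYTEORDER))

/* Same layout as struct nlattr. */
struct attr_hdr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Robust consumers mask out the flags before comparing the type. */
static int attr_type(const struct attr_hdr *a)
{
	return a->nla_type & NLA_TYPE_MASK;
}

static int attr_is_nested(const struct attr_hdr *a)
{
	return (a->nla_type & NLA_F_NESTED) != 0;
}
```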

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

22 Mar, 2019

2 commits

  • The following script:

    # tc qdisc add dev crash0 clsact
    # tc filter add dev crash0 egress matchall \
    > action tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 \
    > nocsum id 1 pass index 90
    # tc actions replace action tunnel_key \
    > set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 nocsum id 1 \
    > goto chain 42 index 90 cookie c1a0c1a0
    # tc actions show action tunnel_key

    had the following output:

    Error: Failed to init TC action chain.
    We have an error talking to the kernel
    total acts 1

    action order 0: tunnel_key set
    src_ip 10.10.10.1
    dst_ip 20.20.2.0
    key_id 1
    dst_port 3128
    nocsum goto chain 42
    index 90 ref 2 bind 1
    cookie c1a0c1a0

    Then, the first packet transmitted by crash0 crashed the kernel:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    #PF error: [normal kernel read fault]
    PGD 800000002aba4067 P4D 800000002aba4067 PUD 795f9067 PMD 0
    Oops: 0000 [#1] SMP PTI
    CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:tcf_action_exec+0xb8/0x100
    Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
    RSP: 0018:ffff9346bdb83be0 EFLAGS: 00010246
    RAX: 000000002000002a RBX: ffff9346bb795c00 RCX: 0000000000000002
    RDX: 0000000000000000 RSI: ffff93466c881700 RDI: 0000000000000246
    RBP: ffff9346bdb83c80 R08: ffff9346b3e1e0c8 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff9346b978f000
    R13: ffff9346b978f008 R14: 0000000000000001 R15: ffff93466dceeb40
    FS: 0000000000000000(0000) GS:ffff9346bdb80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000007a6c2002 CR4: 00000000001606e0
    Call Trace:

    tcf_classify+0x58/0x120
    __dev_queue_xmit+0x40a/0x890
    ? ip6_finish_output2+0x369/0x590
    ip6_finish_output2+0x369/0x590
    ? ip6_output+0x68/0x110
    ip6_output+0x68/0x110
    ? nf_hook.constprop.35+0x79/0xc0
    mld_sendpack+0x16f/0x220
    mld_ifc_timer_expire+0x195/0x2c0
    ? igmp6_timer_handler+0x70/0x70
    call_timer_fn+0x2b/0x130
    run_timer_softirq+0x3e8/0x440
    ? tick_sched_timer+0x37/0x70
    __do_softirq+0xe3/0x2f5
    irq_exit+0xf0/0x100
    smp_apic_timer_interrupt+0x6c/0x130
    apic_timer_interrupt+0xf/0x20

    RIP: 0010:native_safe_halt+0x2/0x10
    Code: 55 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
    RSP: 0018:ffffa48a8038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
    RAX: ffffffffaa8184f0 RBX: 0000000000000003 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003
    RBP: 0000000000000003 R08: 0011251c6fcfac49 R09: ffff9346b995be00
    R10: ffffa48a805e7ce8 R11: 00000000024c38dd R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    ? __sched_text_end+0x1/0x1
    default_idle+0x1c/0x140
    do_idle+0x1c4/0x280
    cpu_startup_entry+0x19/0x20
    start_secondary+0x1a7/0x200
    secondary_startup_64+0xa4/0xb0
    Modules linked in: act_tunnel_key veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel mbcache snd_hda_intel jbd2 snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer snd pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops ttm net_failover virtio_console virtio_blk failover drm serio_raw crc32c_intel ata_piix virtio_pci floppy virtio_ring libata virtio dm_mirror dm_region_hash dm_log dm_mod
    CR2: 0000000000000000

    Validating the control action within tunnel_key_init() proved to fix
    the above issue. A TDC selftest is added to verify the correct behavior.
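
    A sketch of what "validating the control action" means for goto chain.
    The macros are local copies of the extended-opcode encoding we believe
    the uapi <linux/pkt_cls.h> uses (opcode in the top 4 bits, value in the
    low 28); check_ctrlact() and struct chain are hypothetical. Under this
    encoding the RAX value 0x2000002a in the oops above decodes as goto chain
    42, the chain that the failed replace left without a chain pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the extended-opcode encoding (assumed). */
#define TC_ACT_EXT_SHIFT    28
#define TC_ACT_EXT_VAL_MASK ((1u << TC_ACT_EXT_SHIFT) - 1)
#define TC_ACT_GOTO_CHAIN   (2u << TC_ACT_EXT_SHIFT)

struct chain {
	uint32_t index;
};

/* A goto-chain control action must come with the matching chain,
 * otherwise the datapath would later dereference a NULL pointer. */
static int check_ctrlact(uint32_t action, const struct chain *chain)
{
	if ((action & ~TC_ACT_EXT_VAL_MASK) != TC_ACT_GOTO_CHAIN)
		return 0; /* not a goto chain: nothing to check here */
	if (!chain || chain->index != (action & TC_ACT_EXT_VAL_MASK))
		return -1;
	return 0;
}
```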

    Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
    Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
     
  • - pass a pointer to struct tcf_proto in each action's init() handler,
    to allow validating the control action, checking whether the chain
    exists and (possibly) refcounting it.
    - remove code that validates the control action after a successful call
    to the action's init() handler, and replace it with a test that forbids
    adding actions that have 'goto_chain' and a NULL goto_chain pointer at
    the same time.
    - add tcf_action_check_ctrlact(), which validates the control action
    and, if needed, allocates the action 'goto_chain' within the init()
    handler.
    - add tcf_action_set_ctrlact(), which assigns the control action and
    swaps the current 'goto_chain' pointer with the new given one.

    This disallows 'goto_chain' on actions that don't initialize it properly
    in their init() handler, i.e. calling tcf_action_check_ctrlact() after
    successful IDR reservation and then calling tcf_action_set_ctrlact()
    to assign 'goto_chain' and 'tcf_action' consistently.
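
    The check/set split can be sketched with a refcounted chain. All names
    are hypothetical; set_ctrlact() models the idea that
    tcf_action_set_ctrlact() installs the new chain and hands the previous
    one back, so the caller can drop its reference. Dropping that reference
    is exactly what prevents the leak shown below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted chain and action. */
struct chain { int refcnt; };
struct action { int tcf_action; struct chain *goto_chain; };

static void chain_get(struct chain *c) { c->refcnt++; }
static void chain_put(struct chain *c) { if (c) c->refcnt--; }

/* Install control action and chain together; return the old chain. */
static struct chain *set_ctrlact(struct action *a, int ctrl,
				 struct chain *newc)
{
	struct chain *old = a->goto_chain;

	a->tcf_action = ctrl;
	a->goto_chain = newc;
	return old;
}

/* Replace: take a reference on the new chain (the "check" step), swap,
 * then release the reference held on the previous chain. */
static void replace_goto(struct action *a, int ctrl, struct chain *newc)
{
	chain_get(newc);
	chain_put(set_ctrlact(a, ctrl, newc));
}
```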

    With this change, the kernel no longer leaks refcounts when a valid
    'goto chain' handle is replaced in TC actions, which used to cause
    kmemleak splats like the following one:

    # tc chain add dev dd0 chain 42 ingress protocol ip flower \
    > ip_proto tcp action drop
    # tc chain add dev dd0 chain 43 ingress protocol ip flower \
    > ip_proto udp action drop
    # tc filter add dev dd0 ingress matchall \
    > action gact goto chain 42 index 66
    # tc filter replace dev dd0 ingress matchall \
    > action gact goto chain 43 index 66
    # echo scan >/sys/kernel/debug/kmemleak

    unreferenced object 0xffff93c0ee09f000 (size 1024):
    comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] tc_ctl_chain+0x3d2/0x4c0
    [] rtnetlink_rcv_msg+0x263/0x2d0
    [] netlink_rcv_skb+0x4a/0x110
    [] netlink_unicast+0x1a0/0x250
    [] netlink_sendmsg+0x2c1/0x3c0
    [] sock_sendmsg+0x36/0x40
    [] ___sys_sendmsg+0x280/0x2f0
    [] __sys_sendmsg+0x5e/0xa0
    [] do_syscall_64+0x5b/0x180
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [] 0xffffffffffffffff

    Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
    Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
     

06 Mar, 2019

1 commit

  • dst_cache_destroy() is already called from dst_release():

    dst_release-->dst_destroy_rcu-->dst_destroy-->metadata_dst_free
    -->dst_cache_destroy

    so the action code should not call dst_cache_destroy() before
    dst_release().
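
    The double teardown can be modeled by counting how often the cache is
    destroyed. The struct and all functions are hypothetical; md_release()
    stands for the release chain quoted above, which already destroys the
    cache when the last reference goes away.

```c
#include <assert.h>

/* Hypothetical metadata object; >1 destroy means a double free. */
struct md {
	int refcnt;
	int cache_destroyed;
};

static void cache_destroy(struct md *m) { m->cache_destroyed++; }

/* Dropping the last reference destroys the cache as part of the free. */
static void md_release(struct md *m)
{
	if (--m->refcnt == 0)
		cache_destroy(m);
}

/* Buggy teardown: explicit destroy, then release destroys it again. */
static void teardown_buggy(struct md *m)
{
	cache_destroy(m);
	md_release(m);
}

/* Fixed teardown: leave it to the release path. */
static void teardown_fixed(struct md *m)
{
	md_release(m);
}
```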

    Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support")
    Signed-off-by: wenxu
    Signed-off-by: David S. Miller

    wenxu
     

05 Mar, 2019

1 commit

  • The label is only used from inside the #ifdef and should be
    hidden the same way, to avoid this warning:

    net/sched/act_tunnel_key.c: In function 'tunnel_key_init':
    net/sched/act_tunnel_key.c:389:1: error: label 'release_tun_meta' defined but not used [-Werror=unused-label]
    release_tun_meta:
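
    The pattern can be sketched with a hypothetical switch standing in for
    the config option that guards the dst_cache code: the label sits under
    the same condition as its only goto, so it is never defined-but-unused
    when the option is off. cache_init() and init() are illustrative.

```c
#include <assert.h>

/* Hypothetical switch; comment it out to build without the cache code. */
#define SKETCH_HAVE_CACHE

static int cache_init(int fail)
{
	return fail ? -1 : 0;
}

static int init(int fail_cache)
{
	int err = 0;

#ifdef SKETCH_HAVE_CACHE
	err = cache_init(fail_cache);
	if (err)
		goto release_meta;
#endif
	return err;

#ifdef SKETCH_HAVE_CACHE
	/* Guarded like its only user, so -Werror=unused-label stays quiet. */
release_meta:
	/* the real code would free the tunnel metadata here */
	return err;
#endif
}
```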

    Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

03 Mar, 2019

1 commit


28 Feb, 2019

1 commit

  • Tunnel key action params->tcft_enc_metadata is only set when the action
    is TCA_TUNNEL_KEY_ACT_SET. However, the metadata pointer is incorrectly
    dereferenced during tunnel key init and release without verifying that
    the action is of the correct type, which causes a NULL pointer
    dereference. The metadata tunnel dst_cache is also leaked on action
    overwrite.

    Fix metadata handling:
    - Verify that the metadata pointer is not NULL before dereferencing it in
    tunnel_key_init error handling code.
    - Move dst_cache destroy code into the tunnel_key_release_params()
    function, which is called in both the action overwrite and release cases
    (fixes resource leak) and verifies that the action has the correct type
    before dereferencing the metadata pointer (fixes NULL pointer
    dereference).
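
    A single release helper that checks the action type before touching the
    metadata can be sketched as below. The action codes, structs and
    release_params() are hypothetical; the return value exists only to make
    the behavior observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative action codes, not the uapi values. */
enum { KEY_ACT_SET = 1, KEY_ACT_RELEASE = 2 };

struct md { int dummy; };
struct params { int action; struct md *enc_metadata; };

/* Shared by the overwrite and release paths. Metadata exists only for
 * SET actions, so check the type and the pointer before using it.
 * Returns the number of metadata objects freed. */
static int release_params(struct params *p)
{
	int freed = 0;

	if (!p)
		return 0;
	if (p->action == KEY_ACT_SET && p->enc_metadata) {
		/* the metadata (and its cache) is released here */
		free(p->enc_metadata);
		freed = 1;
	}
	free(p);
	return freed;
}
```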

    Oops with KASAN enabled during tdc tests execution:

    [ 261.080482] ==================================================================
    [ 261.088049] BUG: KASAN: null-ptr-deref in dst_cache_destroy+0x21/0xa0
    [ 261.094613] Read of size 8 at addr 00000000000000b0 by task tc/2976
    [ 261.102524] CPU: 14 PID: 2976 Comm: tc Not tainted 5.0.0-rc7+ #157
    [ 261.108844] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
    [ 261.116726] Call Trace:
    [ 261.119234] dump_stack+0x9a/0xeb
    [ 261.122625] ? dst_cache_destroy+0x21/0xa0
    [ 261.126818] ? dst_cache_destroy+0x21/0xa0
    [ 261.131004] kasan_report+0x176/0x192
    [ 261.134752] ? idr_get_next+0xd0/0x120
    [ 261.138578] ? dst_cache_destroy+0x21/0xa0
    [ 261.142768] dst_cache_destroy+0x21/0xa0
    [ 261.146799] tunnel_key_release+0x3a/0x50 [act_tunnel_key]
    [ 261.152392] tcf_action_cleanup+0x2c/0xc0
    [ 261.156490] tcf_generic_walker+0x4c2/0x5c0
    [ 261.160794] ? tcf_action_dump_1+0x390/0x390
    [ 261.165163] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
    [ 261.170865] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
    [ 261.176641] tca_action_gd+0x600/0xa40
    [ 261.180482] ? tca_get_fill.constprop.17+0x200/0x200
    [ 261.185548] ? __lock_acquire+0x588/0x1d20
    [ 261.189741] ? __lock_acquire+0x588/0x1d20
    [ 261.193922] ? mark_held_locks+0x90/0x90
    [ 261.197944] ? mark_held_locks+0x90/0x90
    [ 261.202018] ? __nla_parse+0xfe/0x190
    [ 261.205774] tc_ctl_action+0x218/0x230
    [ 261.209614] ? tcf_action_add+0x230/0x230
    [ 261.213726] rtnetlink_rcv_msg+0x3a5/0x600
    [ 261.217910] ? lock_downgrade+0x2d0/0x2d0
    [ 261.222006] ? validate_linkmsg+0x400/0x400
    [ 261.226278] ? find_held_lock+0x6d/0xd0
    [ 261.230200] ? match_held_lock+0x1b/0x210
    [ 261.234296] ? validate_linkmsg+0x400/0x400
    [ 261.238567] netlink_rcv_skb+0xc7/0x1f0
    [ 261.242489] ? netlink_ack+0x470/0x470
    [ 261.246319] ? netlink_deliver_tap+0x1f3/0x5a0
    [ 261.250874] netlink_unicast+0x2ae/0x350
    [ 261.254884] ? netlink_attachskb+0x340/0x340
    [ 261.261647] ? _copy_from_iter_full+0xdd/0x380
    [ 261.268576] ? __virt_addr_valid+0xb6/0xf0
    [ 261.275227] ? __check_object_size+0x159/0x240
    [ 261.282184] netlink_sendmsg+0x4d3/0x630
    [ 261.288572] ? netlink_unicast+0x350/0x350
    [ 261.295132] ? netlink_unicast+0x350/0x350
    [ 261.301608] sock_sendmsg+0x6d/0x80
    [ 261.307467] ___sys_sendmsg+0x48e/0x540
    [ 261.313633] ? copy_msghdr_from_user+0x210/0x210
    [ 261.320545] ? save_stack+0x89/0xb0
    [ 261.326289] ? __lock_acquire+0x588/0x1d20
    [ 261.332605] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 261.340063] ? mark_held_locks+0x90/0x90
    [ 261.346162] ? do_filp_open+0x138/0x1d0
    [ 261.352108] ? may_open_dev+0x50/0x50
    [ 261.357897] ? match_held_lock+0x1b/0x210
    [ 261.364016] ? __fget_light+0xa6/0xe0
    [ 261.369840] ? __sys_sendmsg+0xd2/0x150
    [ 261.375814] __sys_sendmsg+0xd2/0x150
    [ 261.381610] ? __ia32_sys_shutdown+0x30/0x30
    [ 261.388026] ? lock_downgrade+0x2d0/0x2d0
    [ 261.394182] ? mark_held_locks+0x1c/0x90
    [ 261.400230] ? do_syscall_64+0x1e/0x280
    [ 261.406172] do_syscall_64+0x78/0x280
    [ 261.411932] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 261.419103] RIP: 0033:0x7f28e91a8b87
    [ 261.424791] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
    [ 261.448226] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    [ 261.458183] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
    [ 261.467728] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
    [ 261.477342] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
    [ 261.486970] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
    [ 261.496599] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
    [ 261.506281] ==================================================================
    [ 261.516076] Disabling lock debugging due to kernel taint
    [ 261.523979] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
    [ 261.534413] #PF error: [normal kernel read fault]
    [ 261.541730] PGD 8000000317400067 P4D 8000000317400067 PUD 316878067 PMD 0
    [ 261.551294] Oops: 0000 [#1] SMP KASAN PTI
    [ 261.557985] CPU: 14 PID: 2976 Comm: tc Tainted: G B 5.0.0-rc7+ #157
    [ 261.568306] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
    [ 261.578874] RIP: 0010:dst_cache_destroy+0x21/0xa0
    [ 261.586413] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
    [ 261.611247] RSP: 0018:ffff888316447160 EFLAGS: 00010282
    [ 261.619564] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
    [ 261.629862] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
    [ 261.640149] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
    [ 261.650467] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
    [ 261.660785] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
    [ 261.671110] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
    [ 261.682447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 261.691491] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0
    [ 261.701283] Call Trace:
    [ 261.706374] tunnel_key_release+0x3a/0x50 [act_tunnel_key]
    [ 261.714522] tcf_action_cleanup+0x2c/0xc0
    [ 261.721208] tcf_generic_walker+0x4c2/0x5c0
    [ 261.728074] ? tcf_action_dump_1+0x390/0x390
    [ 261.734996] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key]
    [ 261.743247] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key]
    [ 261.751557] tca_action_gd+0x600/0xa40
    [ 261.757991] ? tca_get_fill.constprop.17+0x200/0x200
    [ 261.765644] ? __lock_acquire+0x588/0x1d20
    [ 261.772461] ? __lock_acquire+0x588/0x1d20
    [ 261.779266] ? mark_held_locks+0x90/0x90
    [ 261.785880] ? mark_held_locks+0x90/0x90
    [ 261.792470] ? __nla_parse+0xfe/0x190
    [ 261.798738] tc_ctl_action+0x218/0x230
    [ 261.805145] ? tcf_action_add+0x230/0x230
    [ 261.811760] rtnetlink_rcv_msg+0x3a5/0x600
    [ 261.818564] ? lock_downgrade+0x2d0/0x2d0
    [ 261.825433] ? validate_linkmsg+0x400/0x400
    [ 261.832256] ? find_held_lock+0x6d/0xd0
    [ 261.838624] ? match_held_lock+0x1b/0x210
    [ 261.845142] ? validate_linkmsg+0x400/0x400
    [ 261.851729] netlink_rcv_skb+0xc7/0x1f0
    [ 261.857976] ? netlink_ack+0x470/0x470
    [ 261.864132] ? netlink_deliver_tap+0x1f3/0x5a0
    [ 261.870969] netlink_unicast+0x2ae/0x350
    [ 261.877294] ? netlink_attachskb+0x340/0x340
    [ 261.883962] ? _copy_from_iter_full+0xdd/0x380
    [ 261.890750] ? __virt_addr_valid+0xb6/0xf0
    [ 261.897188] ? __check_object_size+0x159/0x240
    [ 261.903928] netlink_sendmsg+0x4d3/0x630
    [ 261.910112] ? netlink_unicast+0x350/0x350
    [ 261.916410] ? netlink_unicast+0x350/0x350
    [ 261.922656] sock_sendmsg+0x6d/0x80
    [ 261.928257] ___sys_sendmsg+0x48e/0x540
    [ 261.934183] ? copy_msghdr_from_user+0x210/0x210
    [ 261.940865] ? save_stack+0x89/0xb0
    [ 261.946355] ? __lock_acquire+0x588/0x1d20
    [ 261.952358] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 261.959468] ? mark_held_locks+0x90/0x90
    [ 261.965248] ? do_filp_open+0x138/0x1d0
    [ 261.970910] ? may_open_dev+0x50/0x50
    [ 261.976386] ? match_held_lock+0x1b/0x210
    [ 261.982210] ? __fget_light+0xa6/0xe0
    [ 261.987648] ? __sys_sendmsg+0xd2/0x150
    [ 261.993263] __sys_sendmsg+0xd2/0x150
    [ 261.998613] ? __ia32_sys_shutdown+0x30/0x30
    [ 262.004555] ? lock_downgrade+0x2d0/0x2d0
    [ 262.010236] ? mark_held_locks+0x1c/0x90
    [ 262.015758] ? do_syscall_64+0x1e/0x280
    [ 262.021234] do_syscall_64+0x78/0x280
    [ 262.026500] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 262.033207] RIP: 0033:0x7f28e91a8b87
    [ 262.038421] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48
    [ 262.060708] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    [ 262.070112] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87
    [ 262.079087] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003
    [ 262.088122] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c
    [ 262.097157] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
    [ 262.106207] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480
    [ 262.115271] Modules linked in: act_tunnel_key act_skbmod act_simple act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 act_csum libcrc32c act_meta_skbtcindex act_meta_skbprio act_meta_mark act_ife ife act_police act_sample psample act_gact veth nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc intel_rapl sb_edac mlx5_ib x86_pkg_temp_thermal sunrpc intel_powerclamp coretemp ib_uverbs kvm_intel ib_core kvm irqbypass mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel intel_cstate mlxfw iTCO_wdt devlink intel_uncore iTCO_vendor_support ipmi_ssif ptp mei_me intel_rapl_perf ioatdma joydev pps_core ses mei i2c_i801 pcspkr enclosure lpc_ich dca wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pcc_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm mpt3sas raid_class scsi_transport_sas
    [ 262.204393] CR2: 00000000000000b0
    [ 262.210390] ---[ end trace 2e41d786f2c7901a ]---
    [ 262.226790] RIP: 0010:dst_cache_destroy+0x21/0xa0
    [ 262.234083] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40
    [ 262.258311] RSP: 0018:ffff888316447160 EFLAGS: 00010282
    [ 262.266304] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071
    [ 262.276251] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297
    [ 262.286208] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89
    [ 262.296183] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0
    [ 262.306157] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001
    [ 262.316139] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000
    [ 262.327146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 262.335815] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0

    Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support")
    Signed-off-by: Vlad Buslov
    Reviewed-by: Roi Dayan
    Signed-off-by: David S. Miller

    Vlad Buslov
     

26 Feb, 2019

1 commit

  • The metadata pointer is only initialized for action TCA_TUNNEL_KEY_ACT_SET,
    but it is unconditionally dereferenced in the tunnel_key_init() error
    handler. Verify that the metadata pointer is not NULL before dereferencing
    it in the tunnel_key_init() error handling code.

    Fixes: ee28bb56ac5b ("net/sched: fix memory leak in act_tunnel_key_init()")
    Signed-off-by: Vlad Buslov
    Reviewed-by: Davide Caratti
    Signed-off-by: David S. Miller

    Vlad Buslov
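This fix follows a general pattern: a pointer that is only populated on one branch must be NULL-checked on the shared error path. The minimal userspace sketch below shows that pattern. All names (`enum tk_action`, `struct md`, `md_alloc()`, `md_put()`, `init_action()`) are hypothetical stand-ins for the metadata dst handling. They are not the actual act_tunnel_key code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for TCA_TUNNEL_KEY_ACT_SET / _RELEASE. */
enum tk_action { ACT_SET = 1, ACT_RELEASE = 2 };

/* Hypothetical stand-in for the tunnel metadata dst. */
struct md {
	int refcnt;
};

static int md_frees;	/* counts frees so the behaviour can be checked */

static struct md *md_alloc(void)
{
	struct md *m = malloc(sizeof(*m));

	if (m)
		m->refcnt = 1;
	return m;
}

static void md_put(struct md *m)
{
	if (--m->refcnt == 0) {
		free(m);
		md_frees++;
	}
}

/*
 * Only ACT_SET allocates metadata; fail_late simulates an error
 * raised after that point in the init routine.
 */
static int init_action(enum tk_action act, int fail_late, struct md **out)
{
	struct md *metadata = NULL;

	if (act == ACT_SET) {
		metadata = md_alloc();
		if (!metadata)
			return -1;
	}
	if (fail_late)
		goto err_release;

	*out = metadata;	/* ownership passes to the caller (may be NULL) */
	return 0;

err_release:
	/* The point of the fix: metadata is NULL for non-"set" actions. */
	if (metadata)
		md_put(metadata);
	return -1;
}
```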
     

25 Feb, 2019

1 commit


11 Feb, 2019

1 commit

  • Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For
    example, use TCA_ID_GACT instead of TCA_ACT_GACT. This aligns with
    TCA_ID_POLICE and also differentiates these identifiers, used in the struct
    tc_action_ops type field, from other macros starting with TCA_ACT_.

    To make things clearer, we name the enum defining the TCA_ID_*
    identifiers and also change the "type" field of struct tc_action to
    id.

    Signed-off-by: Eli Cohen
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Eli Cohen
     

16 Jan, 2019

1 commit

  • running the following TDC test cases:

    7afc - Replace tunnel_key set action with all parameters
    364d - Replace tunnel_key set action with all parameters and cookie

    it's possible to trigger kmemleak warnings like:

    unreferenced object 0xffff94797127ab40 (size 192):
    comm "tc", pid 3248, jiffies 4300565293 (age 1006.862s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 c0 93 f9 8a ff ff ff ff ................
    41 84 ee 89 ff ff ff ff 00 00 00 00 00 00 00 00 A...............
    backtrace:
    [] tunnel_key_init+0x31d/0x820 [act_tunnel_key]
    [] tcf_action_init_1+0x384/0x4c0
    [] tcf_action_init+0x12b/0x1a0
    [] tcf_action_add+0x73/0x170
    [] tc_ctl_action+0x122/0x160
    [] rtnetlink_rcv_msg+0x263/0x2d0
    [] netlink_rcv_skb+0x4a/0x110
    [] netlink_unicast+0x1a0/0x250
    [] netlink_sendmsg+0x2c1/0x3c0
    [] sock_sendmsg+0x36/0x40
    [] ___sys_sendmsg+0x280/0x2f0
    [] __sys_sendmsg+0x5e/0xa0
    [] do_syscall_64+0x5b/0x180
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [] 0xffffffffffffffff

    When the tunnel_key action is replaced, the kernel forgets to release the
    dst metadata: ensure it is released by tunnel_key_init(), the same way
    it's done in tunnel_key_release().

    Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Davide Caratti
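The sketch below illustrates this leak and its fix. It assumes a simplified model in which "set" params own a metadata object. Replacing params must release the old ones through the same helper the delete path uses, rather than just freeing the container. `struct params`, `make_params()`, `release_params()` and `replace_params()` are hypothetical names. The kernel also defers the free until after an RCU grace period, which this sketch omits.

```c
#include <stdlib.h>

/* Hypothetical model: "set" params own a metadata object, others don't. */
enum { TK_SET = 1, TK_RELEASE = 2 };

struct md {
	int dummy;
};

struct params {
	int action;
	struct md *md;	/* owned only when action == TK_SET */
};

struct action {
	struct params *params;
};

static int live_md;	/* metadata objects currently allocated */

static struct params *make_params(int action)
{
	struct params *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->action = action;
	if (action == TK_SET) {
		p->md = malloc(sizeof(*p->md));
		if (!p->md) {
			free(p);
			return NULL;
		}
		live_md++;
	}
	return p;
}

/* Release params exactly the way the delete path would. */
static void release_params(struct params *p)
{
	if (!p)
		return;
	if (p->action == TK_SET) {
		free(p->md);
		live_md--;
	}
	free(p);
}

/*
 * Replace path: a plain free(old) would leak old->md. Freeing here is
 * immediate only because this model has no concurrent readers.
 */
static void replace_params(struct action *a, struct params *newp)
{
	struct params *old = a->params;

	a->params = newp;
	release_params(old);
}
```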
     

05 Dec, 2018

2 commits

  • It's possible to set a tunnel without a destination port. However,
    on dump(), a zero dst port is returned to user space even if it was not
    set; fix that.

    Note that so far this wasn't required, because keyless tunnels were not
    supported and UDP tunnels do require a destination port.

    Signed-off-by: Adi Nissim
    Reviewed-by: Oz Shlomo
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Adi Nissim
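The sketch below shows the dump-side check. It assumes, as the commit implies, that a zero port means "not configured". `struct msg`, `put_u32()`, `has_attr()` and the `ATTR_*` IDs are hypothetical simplifications of netlink attribute handling. In the kernel, the port is carried by the TCA_TUNNEL_KEY_ENC_DST_PORT attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute IDs and a fixed-size attribute sink. */
enum { ATTR_SRC_IP = 1, ATTR_DST_IP, ATTR_DST_PORT };

struct msg {
	int types[8];
	uint32_t vals[8];
	size_t n;
};

static int put_u32(struct msg *m, int type, uint32_t v)
{
	if (m->n == 8)
		return -1;	/* out of room, like an nla_put() failure */
	m->types[m->n] = type;
	m->vals[m->n] = v;
	m->n++;
	return 0;
}

static int has_attr(const struct msg *m, int type)
{
	size_t i;

	for (i = 0; i < m->n; i++)
		if (m->types[i] == type)
			return 1;
	return 0;
}

struct tunnel_key {
	uint32_t src, dst;
	uint16_t tp_dst;	/* 0 = not configured (assumption) */
};

static int dump_key(struct msg *m, const struct tunnel_key *k)
{
	if (put_u32(m, ATTR_SRC_IP, k->src) || put_u32(m, ATTR_DST_IP, k->dst))
		return -1;
	/* Only report the destination port when one was configured. */
	if (k->tp_dst && put_u32(m, ATTR_DST_PORT, k->tp_dst))
		return -1;
	return 0;
}
```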
     
  • Allow setting a tunnel without a tunnel key. This is required for
    tunneling protocols, such as GRE, that define the key as an optional
    field.

    Signed-off-by: Adi Nissim
    Acked-by: Or Gerlitz
    Reviewed-by: Oz Shlomo
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Adi Nissim
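The sketch below shows one way to handle an optional key. The key ID and the "key present" flag are set only when the user supplied a key, so a protocol like GRE can leave the field out on the wire. `struct tun_cfg`, `struct tun_info`, `build_tun_info()` and `TUN_FLAG_KEY` are hypothetical stand-ins; the flag loosely mirrors the kernel's TUNNEL_KEY bit. This is not the act_tunnel_key parser.

```c
#include <stdbool.h>
#include <stdint.h>

#define TUN_FLAG_KEY 0x1u	/* hypothetical stand-in for a "key present" flag */

struct tun_cfg {
	bool has_key;	/* did the user supply a key id? */
	uint32_t key;
};

struct tun_info {
	uint64_t tun_id;
	unsigned int flags;
};

/* The key is optional: if absent, keep id 0 and leave the key flag clear. */
static void build_tun_info(const struct tun_cfg *cfg, struct tun_info *info)
{
	info->tun_id = 0;
	info->flags = 0;
	if (cfg->has_key) {
		info->tun_id = cfg->key;
		info->flags |= TUN_FLAG_KEY;
	}
}
```

With this approach, an encapsulation driver can test the flag, rather than the ID value, to decide whether to emit a key field.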
     

13 Sep, 2018

1 commit


08 Sep, 2018

1 commit

  • When nla_put*() fails after nla_nest_start(), we need
    to call nla_nest_cancel() to cancel the message; otherwise
    we end up calling nla_nest_end() as if it had succeeded.

    Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
    Cc: Davide Caratti
    Cc: Simon Horman
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
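The nest start/cancel/end protocol can be modeled with a flat buffer in which a nest is just a saved offset, so cancelling means truncating back to it. This is a hypothetical userspace sketch (`struct buf`, `buf_put()`, `nest_start()`, `nest_cancel()`, `dump_opts()`), not the real netlink API. The real nla_nest_start() also writes a header, whose length nla_nest_end() later patches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flat message buffer; a "nest" is just a saved length. */
struct buf {
	unsigned char data[16];
	size_t len;
};

static int buf_put(struct buf *b, const void *p, size_t n)
{
	if (b->len + n > sizeof(b->data))
		return -1;	/* like nla_put() failing for lack of room */
	memcpy(b->data + b->len, p, n);
	b->len += n;
	return 0;
}

/* Remember where the nest begins (cf. nla_nest_start()). */
static size_t nest_start(const struct buf *b)
{
	return b->len;
}

/* Drop everything written since the nest began (cf. nla_nest_cancel()). */
static void nest_cancel(struct buf *b, size_t start)
{
	b->len = start;
}

/* Dump options inside a nest; on failure, undo the partial nest. */
static int dump_opts(struct buf *b, const char *const opts[], size_t n)
{
	size_t start = nest_start(b);
	size_t i;

	for (i = 0; i < n; i++) {
		if (buf_put(b, opts[i], strlen(opts[i]))) {
			nest_cancel(b, start);	/* don't fall through to "end" */
			return -1;
		}
	}
	/* a real implementation would close the nest here (cf. nla_nest_end()) */
	return 0;
}
```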
     

06 Sep, 2018

1 commit

  • If users try to install act_tunnel_key 'set' rules with duplicate values
    of 'index', the tunnel metadata are allocated, but never released. Then,
    kmemleak complains as follows:

    # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
    # echo clear > /sys/kernel/debug/kmemleak
    # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
    Error: TC IDR already exists.
    We have an error talking to the kernel
    # echo scan > /sys/kernel/debug/kmemleak
    # cat /sys/kernel/debug/kmemleak
    unreferenced object 0xffff8800574e6c80 (size 256):
    comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff ................
    81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00 .$..............
    backtrace:
    [] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
    [] tcf_action_init_1+0x698/0xac0
    [] tcf_action_init+0x15c/0x590
    [] tc_ctl_action+0x336/0x5c2
    [] rtnetlink_rcv_msg+0x357/0x8e0
    [] netlink_rcv_skb+0x124/0x350
    [] netlink_unicast+0x40f/0x5d0
    [] netlink_sendmsg+0x6e8/0xba0
    [] sock_sendmsg+0xb3/0xf0
    [] ___sys_sendmsg+0x654/0x960
    [] __sys_sendmsg+0xd3/0x170
    [] do_syscall_64+0xa5/0x470
    [] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [] 0xffffffffffffffff

    In theory, this problem also occurs when users attempt to set up a
    geneve rule with wrong configuration data, or when the kernel fails to
    allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
    metadata in these conditions as well.

    Addresses-Coverity-ID: 1373974 ("Resource leak")
    Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
    Fixes: 0ed5269f9e41f ("net/sched: add tunnel option support to act_tunnel_key")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Davide Caratti
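The sketch below shows the "release on every error path" shape, with a single goto label that frees the metadata once it exists. The failure points and all names (`enum fail_at`, `init_set()`, `xalloc()`, `xfree()`) are hypothetical. They loosely mirror the failures described above: bad options, an existing index, and a failed params allocation.

```c
#include <stdlib.h>

static int live;	/* outstanding allocations, so leaks are observable */

static void *xalloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live++;
	return p;
}

static void xfree(void *p)
{
	if (p) {
		free(p);
		live--;
	}
}

/* Hypothetical failure points after the metadata has been allocated. */
enum fail_at { FAIL_NONE, FAIL_OPTS, FAIL_EXISTS, FAIL_PARAMS };

static int init_set(enum fail_at f, void **md_out, void **params_out)
{
	void *md, *params;

	md = xalloc(32);	/* stands in for the tunnel metadata */
	if (!md)
		return -1;
	if (f == FAIL_OPTS || f == FAIL_EXISTS)
		goto release_md;
	params = (f == FAIL_PARAMS) ? NULL : xalloc(16);
	if (!params)
		goto release_md;
	*md_out = md;
	*params_out = params;
	return 0;

release_md:
	/* Every failure after the metadata exists must free it. */
	xfree(md);
	return -1;
}
```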
     

01 Sep, 2018

1 commit


22 Aug, 2018

1 commit

  • All ops->delete() needs is tn->idrinfo, but we already
    have tc_action before calling ops->delete(), and tc_action has
    a pointer ->idrinfo.

    More importantly, each type of action does the same thing, that is,
    just calling tcf_idr_delete_index().

    So it can simply be removed.

    Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops")
    Cc: Jiri Pirko
    Cc: Vlad Buslov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
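The sketch below shows why a per-type ->delete() becomes redundant once each action carries a pointer to its own index table: one generic helper can do the work. `struct idrinfo`, `delete_index()` and `action_delete()` are hypothetical simplifications, and a fixed array stands in for the kernel's IDR. The real helper named in the commit is tcf_idr_delete_index().

```c
#include <stddef.h>

#define MAX_IDX 8

/* Hypothetical per-action-type index table. */
struct idrinfo {
	void *slots[MAX_IDX];
};

/* Every action keeps a back-pointer to the table it lives in. */
struct action {
	struct idrinfo *idrinfo;
	unsigned int index;
};

/* One generic helper instead of N identical per-type callbacks. */
static int delete_index(struct idrinfo *info, unsigned int index)
{
	if (index >= MAX_IDX || !info->slots[index])
		return -1;	/* not found */
	info->slots[index] = NULL;
	return 0;
}

static int action_delete(struct action *a)
{
	return delete_index(a->idrinfo, a->index);
}
```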
     

20 Aug, 2018

1 commit

  • Recently, ops->init() and ops->dump() of all actions were modified to
    always obtain tcf_lock when accessing private action state. Actions that
    don't depend on tcf_lock for synchronization with their data path use
    non-bh locking API. However, tcf_lock is also used to protect rate
    estimator stats in softirq context by timer callback.

    Change ops->init() and ops->dump() of all actions to disable bh when using
    tcf_lock, to prevent the deadlock reported by the following lockdep warning:

    [ 105.470398] ================================
    [ 105.475014] WARNING: inconsistent lock state
    [ 105.479628] 4.18.0-rc8+ #664 Not tainted
    [ 105.483897] --------------------------------
    [ 105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    [ 105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    [ 105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
    [ 105.509696] {SOFTIRQ-ON-W} state was registered at:
    [ 105.514925] _raw_spin_lock+0x2c/0x40
    [ 105.519022] tcf_bpf_init+0x579/0x820 [act_bpf]
    [ 105.523990] tcf_action_init_1+0x4e4/0x660
    [ 105.528518] tcf_action_init+0x1ce/0x2d0
    [ 105.532880] tcf_exts_validate+0x1d8/0x200
    [ 105.537416] fl_change+0x55a/0x268b [cls_flower]
    [ 105.542469] tc_new_tfilter+0x748/0xa20
    [ 105.546738] rtnetlink_rcv_msg+0x56a/0x6d0
    [ 105.551268] netlink_rcv_skb+0x18d/0x200
    [ 105.555628] netlink_unicast+0x2d0/0x370
    [ 105.559990] netlink_sendmsg+0x3b9/0x6a0
    [ 105.564349] sock_sendmsg+0x6b/0x80
    [ 105.568271] ___sys_sendmsg+0x4a1/0x520
    [ 105.572547] __sys_sendmsg+0xd7/0x150
    [ 105.576655] do_syscall_64+0x72/0x2c0
    [ 105.580757] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 105.586243] irq event stamp: 489296
    [ 105.590084] hardirqs last enabled at (489296): [] _raw_spin_unlock_irq+0x29/0x40
    [ 105.599765] hardirqs last disabled at (489295): [] _raw_spin_lock_irq+0x15/0x50
    [ 105.609277] softirqs last enabled at (489292): [] irq_enter+0x83/0xa0
    [ 105.618001] softirqs last disabled at (489293): [] irq_exit+0x140/0x190
    [ 105.626813]
    other info that might help us debug this:
    [ 105.633976] Possible unsafe locking scenario:

    [ 105.640526]        CPU0
    [ 105.643325]        ----
    [ 105.646125]   lock(&(&p->tcfa_lock)->rlock);
    [ 105.650747]   <Interrupt>
    [ 105.653717]     lock(&(&p->tcfa_lock)->rlock);
    [ 105.658514]
    *** DEADLOCK ***

    [ 105.665349] 1 lock held by swapper/16/0:
    [ 105.669629] #0: 00000000a640ad99 ((&est->timer)){+.-.}, at: call_timer_fn+0x10b/0x550
    [ 105.678200]
    stack backtrace:
    [ 105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
    [ 105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
    [ 105.698626] Call Trace:
    [ 105.701421] <IRQ>
    [ 105.703791] dump_stack+0x92/0xeb
    [ 105.707461] print_usage_bug+0x336/0x34c
    [ 105.711744] mark_lock+0x7c9/0x980
    [ 105.715500] ? print_shortest_lock_dependencies+0x2e0/0x2e0
    [ 105.721424] ? check_usage_forwards+0x230/0x230
    [ 105.726315] __lock_acquire+0x923/0x26f0
    [ 105.730597] ? debug_show_all_locks+0x240/0x240
    [ 105.735478] ? mark_lock+0x493/0x980
    [ 105.739412] ? check_chain_key+0x140/0x1f0
    [ 105.743861] ? __lock_acquire+0x836/0x26f0
    [ 105.748323] ? lock_acquire+0x12e/0x290
    [ 105.752516] lock_acquire+0x12e/0x290
    [ 105.756539] ? est_fetch_counters+0x3c/0xa0
    [ 105.761084] _raw_spin_lock+0x2c/0x40
    [ 105.765099] ? est_fetch_counters+0x3c/0xa0
    [ 105.769633] est_fetch_counters+0x3c/0xa0
    [ 105.773995] est_timer+0x87/0x390
    [ 105.777670] ? est_fetch_counters+0xa0/0xa0
    [ 105.782210] ? lock_acquire+0x12e/0x290
    [ 105.786410] call_timer_fn+0x161/0x550
    [ 105.790512] ? est_fetch_counters+0xa0/0xa0
    [ 105.795055] ? del_timer_sync+0xd0/0xd0
    [ 105.799249] ? __lock_is_held+0x93/0x110
    [ 105.803531] ? mark_held_locks+0x20/0xe0
    [ 105.807813] ? _raw_spin_unlock_irq+0x29/0x40
    [ 105.812525] ? est_fetch_counters+0xa0/0xa0
    [ 105.817069] ? est_fetch_counters+0xa0/0xa0
    [ 105.821610] run_timer_softirq+0x3c4/0x9f0
    [ 105.826064] ? lock_acquire+0x12e/0x290
    [ 105.830257] ? __bpf_trace_timer_class+0x10/0x10
    [ 105.835237] ? __lock_is_held+0x25/0x110
    [ 105.839517] __do_softirq+0x11d/0x7bf
    [ 105.843542] irq_exit+0x140/0x190
    [ 105.847208] smp_apic_timer_interrupt+0xac/0x3b0
    [ 105.852182] apic_timer_interrupt+0xf/0x20
    [ 105.856628] </IRQ>
    [ 105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
    [ 105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
    [ 105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
    [ 105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
    [ 105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
    [ 105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
    [ 105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
    [ 105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
    [ 105.929988] ? trace_hardirqs_on_caller+0x141/0x2d0
    [ 105.935232] do_idle+0x28e/0x320
    [ 105.938817] ? arch_cpu_idle_exit+0x40/0x40
    [ 105.943361] ? mark_lock+0x8c1/0x980
    [ 105.947295] ? _raw_spin_unlock_irqrestore+0x32/0x60
    [ 105.952619] cpu_startup_entry+0xc2/0xd0
    [ 105.956900] ? cpu_in_idle+0x20/0x20
    [ 105.960830] ? _raw_spin_unlock_irqrestore+0x32/0x60
    [ 105.966146] ? trace_hardirqs_on_caller+0x141/0x2d0
    [ 105.971391] start_secondary+0x2b5/0x360
    [ 105.975669] ? set_cpu_sibling_map+0x1330/0x1330
    [ 105.980654] secondary_startup_64+0xa5/0xb0

    Taking tcf_lock in the sample action with bh disabled causes lockdep to
    issue a warning about a possible irq lock inversion dependency between
    tcf_lock and psample_groups_lock, which is taken while holding tcf_lock in
    sample init:

    [ 162.108959] Possible interrupt unsafe locking scenario:

    [ 162.116386]        CPU0                    CPU1
    [ 162.121277]        ----                    ----
    [ 162.126162]   lock(psample_groups_lock);
    [ 162.130447]                                local_irq_disable();
    [ 162.136772]                                lock(&(&p->tcfa_lock)->rlock);
    [ 162.143957]                                lock(psample_groups_lock);
    [ 162.150813]   <Interrupt>
    [ 162.153808]     lock(&(&p->tcfa_lock)->rlock);
    [ 162.158608]
    *** DEADLOCK ***

    To prevent a potential lock inversion dependency between tcf_lock and
    psample_groups_lock, move the call to psample_group_get() out of the
    tcf_lock-protected section in the sample action init function.

    Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
    Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
    Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
    Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
    Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
    Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
    Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
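The sample action part of this fix is a lock-ordering change. The psample group lookup, which takes its own lock, now happens before the action lock is taken, so the group lock is never acquired inside it. The sketch below models only that ordering. It uses toy, non-blocking locks and a flag that records a violation; bh disabling has no userspace equivalent. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only tracks whether it is held; enough to check ordering. */
struct toy_lock {
	bool held;
};

static struct toy_lock action_lock;	/* plays the role of tcf_lock */
static struct toy_lock group_lock;	/* plays the role of psample_groups_lock */
static bool nested_violation;		/* group_lock taken under action_lock? */

static void toy_acquire(struct toy_lock *l)
{
	if (l == &group_lock && action_lock.held)
		nested_violation = true;
	l->held = true;
}

static void toy_release(struct toy_lock *l)
{
	l->held = false;
}

struct group {
	int num;
};

static struct group groups[4];

/* Hypothetical group lookup that takes group_lock internally. */
static struct group *group_get(int num)
{
	struct group *g;

	toy_acquire(&group_lock);
	g = &groups[num % 4];
	g->num = num;
	toy_release(&group_lock);
	return g;
}

struct sample_action {
	struct group *group;
	int rate;
};

/* Fixed ordering: look the group up first, then publish under action_lock. */
static void sample_init(struct sample_action *a, int group_num, int rate)
{
	struct group *g = group_get(group_num);	/* outside action_lock */

	toy_acquire(&action_lock);
	a->rate = rate;
	a->group = g;
	toy_release(&action_lock);
}
```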
     

12 Aug, 2018

1 commit

  • Use the tcf lock to protect tunnel key action private data from
    concurrent modification in init and dump. Use an RCU swap operation to
    reassign the params pointer under protection of the tcf lock (the old
    params value is not used by init, so there is no need for a standalone
    RCU dereference step).

    Remove rtnl lock assertion that is no longer required.

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
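The sketch below is a userspace analogue of publishing new params with a single pointer swap, which hands the old pointer back to the writer. It substitutes C11 atomics (`atomic_exchange_explicit`, `atomic_load_explicit`) for the kernel's tcf lock plus RCU swap. The names are hypothetical. Freeing the old params immediately, as the usage below does, is safe only because no concurrent reader exists; the kernel defers the free until after an RCU grace period.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct params {
	int action;
};

struct tk_action {
	_Atomic(struct params *) params;	/* readers load without a lock */
};

static struct params *new_params(int action)
{
	struct params *p = malloc(sizeof(*p));

	if (p)
		p->action = action;
	return p;
}

/* Writer: publish new params and get the old ones back in one step. */
static struct params *swap_params(struct tk_action *a, struct params *newp)
{
	/* release side: newp's fields are visible before the new pointer */
	return atomic_exchange_explicit(&a->params, newp, memory_order_acq_rel);
}

/* Reader: an acquire load stands in for an RCU dereference. */
static int read_action(struct tk_action *a)
{
	struct params *p = atomic_load_explicit(&a->params, memory_order_acquire);

	return p ? p->action : -1;
}
```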
     

31 Jul, 2018

1 commit

  • Each lockless action currently does its own RCU locking in ->act().
    This allows using plain RCU accessors, even though the context
    is really RCU BH.

    This change drops the per-action RCU lock, replaces the accessors
    with the _bh variants, cleans up the surrounding code a bit and
    documents the RCU status in the relevant header.
    No functional or performance change is intended.

    The goal of this patch is clarifying that the RCU critical section
    used by the tc actions extends up to the classifier's caller.

    v1 -> v2:
    - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
    as pointed out by Daniel

    v3 -> v4:
    - fixed some typos in the commit message (JiriP)

    Signed-off-by: Paolo Abeni
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paolo Abeni
     

21 Jul, 2018

1 commit


20 Jul, 2018

1 commit


08 Jul, 2018

1 commit

  • Implement a function that atomically checks whether an action exists and
    either takes a reference to it or allocates an idr slot for the action
    index, to prevent concurrent allocations of actions with the same index.
    Use an EBUSY error pointer to indicate that the idr slot is reserved.

    Implement a cleanup helper function that removes the temporary error
    pointer from the idr (in case of an error between idr allocation and
    insertion of the newly created action at the specified index).

    Refactor all action init functions to insert new actions into the idr
    using this API.

    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: Vlad Buslov
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
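The single-threaded sketch below illustrates the check-or-reserve idea. A distinct sentinel address takes the place of the EBUSY error pointer. `struct idr_table`, `idr_check_alloc()`, `idr_insert()` and `idr_cleanup()` are hypothetical names over a fixed array. Returning -EAGAIN for a reserved slot is an assumption about how a caller would learn to retry. In real code, the check and the reservation must happen atomically, for example under a lock.

```c
#include <errno.h>
#include <stddef.h>

#define NSLOTS 16

struct action {
	int refcnt;
};

/* A distinct address marks "reserved, being initialized". */
static struct action reserved_slot;
#define RESERVED (&reserved_slot)

struct idr_table {
	struct action *slots[NSLOTS];
};

/*
 * Returns 1 and takes a reference if an action already sits at index,
 * 0 after reserving the slot for the caller to fill, or -EAGAIN if
 * another writer has reserved it.
 */
static int idr_check_alloc(struct idr_table *t, unsigned int index,
			   struct action **out)
{
	struct action *p;

	if (index >= NSLOTS)
		return -EINVAL;
	p = t->slots[index];
	if (p == RESERVED)
		return -EAGAIN;
	if (p) {
		p->refcnt++;
		*out = p;
		return 1;
	}
	t->slots[index] = RESERVED;
	*out = NULL;
	return 0;
}

/* Replace the placeholder with the fully built action. */
static void idr_insert(struct idr_table *t, unsigned int index, struct action *a)
{
	t->slots[index] = a;
}

/* Error path: drop the placeholder so the index can be used again. */
static void idr_cleanup(struct idr_table *t, unsigned int index)
{
	if (t->slots[index] == RESERVED)
		t->slots[index] = NULL;
}
```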