10 Mar, 2019

1 commit

  • [ Upstream commit a3df633a3c92bb96b06552c3f828d7c267774379 ]

    Metadata pointer is only initialized for action TCA_TUNNEL_KEY_ACT_SET, but
    it is unconditionally dereferenced in tunnel_key_init() error handler.
    Verify that metadata pointer is not NULL before dereferencing it in
    tunnel_key_init error handling code.

    Fixes: ee28bb56ac5b ("net/sched: fix memory leak in act_tunnel_key_init()")
    Signed-off-by: Vlad Buslov
    Reviewed-by: Davide Caratti
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vlad Buslov
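
The error-path fix described above can be sketched in user space. This is a minimal model, not the kernel code: `struct metadata`, `metadata_release()`, `action_init()` and the `ACT_*` values are hypothetical stand-ins, and the only point shown is that a shared error label must check the pointer that just one of the actions initializes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's refcounted tunnel metadata. */
struct metadata {
    int refcnt;
};

static int releases;    /* how many times the error path released metadata */

/* Dereferences md, so it must never be called with NULL. */
static void metadata_release(struct metadata *md)
{
    if (--md->refcnt == 0)
        free(md);
    releases++;
}

enum { ACT_SET = 1, ACT_RELEASE = 2 };   /* hypothetical action values */

/*
 * Both actions share one error label, but only ACT_SET allocates
 * metadata; for ACT_RELEASE the pointer stays NULL.
 */
static int action_init(int action, int fail_late, struct metadata **out)
{
    struct metadata *md = NULL;

    if (action == ACT_SET) {
        md = malloc(sizeof(*md));
        if (!md)
            return -ENOMEM;
        md->refcnt = 1;
    }

    if (fail_late)
        goto release;

    *out = md;          /* success: the caller now owns md (may be NULL) */
    return 0;

release:
    if (md)             /* the fix: skip the release when nothing was set */
        metadata_release(md);
    return -EINVAL;
}
```

Without the `if (md)` test, the `ACT_RELEASE` failure case would pass NULL to a function that dereferences its argument.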
     

31 Jan, 2019

1 commit

  • [ Upstream commit 9174c3df1cd181c14913138d50ccbe539bb08335 ]

    running the following TDC test cases:

    7afc - Replace tunnel_key set action with all parameters
    364d - Replace tunnel_key set action with all parameters and cookie

    it's possible to trigger kmemleak warnings like:

    unreferenced object 0xffff94797127ab40 (size 192):
    comm "tc", pid 3248, jiffies 4300565293 (age 1006.862s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 c0 93 f9 8a ff ff ff ff ................
    41 84 ee 89 ff ff ff ff 00 00 00 00 00 00 00 00 A...............
    backtrace:
    tunnel_key_init+0x31d/0x820 [act_tunnel_key]
    tcf_action_init_1+0x384/0x4c0
    tcf_action_init+0x12b/0x1a0
    tcf_action_add+0x73/0x170
    tc_ctl_action+0x122/0x160
    rtnetlink_rcv_msg+0x263/0x2d0
    netlink_rcv_skb+0x4a/0x110
    netlink_unicast+0x1a0/0x250
    netlink_sendmsg+0x2c1/0x3c0
    sock_sendmsg+0x36/0x40
    ___sys_sendmsg+0x280/0x2f0
    __sys_sendmsg+0x5e/0xa0
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    0xffffffffffffffff

    When the tunnel_key action is replaced, the kernel forgets to release the
    dst metadata. Ensure it is released by tunnel_key_init(), the same way
    it's done in tunnel_key_release().

    Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
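
A user-space sketch of the replace path may help here. It assumes hypothetical `struct params` and `struct metadata` types and a `live_metadata` counter for observation; none of these are the kernel's own definitions. The idea illustrated is that replacing the params must also drop the metadata reference the old params held.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the dst metadata and the action params. */
struct metadata {
    int refcnt;
};

struct params {
    struct metadata *md;
};

static int live_metadata;   /* metadata objects not yet freed */

static struct params *params_new(void)
{
    struct params *p = malloc(sizeof(*p));
    struct metadata *md = malloc(sizeof(*md));

    if (!p || !md) {
        free(p);
        free(md);
        return NULL;
    }
    md->refcnt = 1;
    p->md = md;
    live_metadata++;
    return p;
}

/* Drop the params together with the metadata reference they hold. */
static void params_release(struct params *p)
{
    if (!p)
        return;
    if (p->md && --p->md->refcnt == 0) {
        free(p->md);
        live_metadata--;
    }
    free(p);
}

/* Install new params; the old ones are released with their metadata. */
static void action_replace(struct params **slot, struct params *new_p)
{
    struct params *old = *slot;

    *slot = new_p;
    params_release(old);    /* freeing only `old` would leak old->md */
}
```

After two installs and one replace, exactly one metadata object should remain live, which is what a leak checker such as kmemleak would expect.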
     

08 Sep, 2018

1 commit

  • When nla_put*() fails after nla_nest_start(), we need
    to call nla_nest_cancel() to cancel the message; otherwise
    we end up calling nla_nest_end() as if the dump had succeeded.

    Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
    Cc: Davide Caratti
    Cc: Simon Horman
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
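
The start/cancel/end discipline can be illustrated with a toy buffer. This is a sketch under stated assumptions: `struct msg`, `nest_start()`, `put()`, `nest_end()`, `nest_cancel()` and `dump_opts()` are hypothetical user-space helpers that only mimic the shape of the netlink attribute calls named in the commit, and the one-byte length header is invented for brevity.

```c
#include <stddef.h>
#include <string.h>

#define MSG_CAP 16

/* Toy message buffer; a hypothetical stand-in for a netlink message. */
struct msg {
    unsigned char buf[MSG_CAP];
    size_t len;
};

/* Open a nest: reserve a one-byte length header, return its offset or -1. */
static int nest_start(struct msg *m)
{
    if (m->len + 1 > MSG_CAP)
        return -1;
    m->buf[m->len] = 0;
    return (int)m->len++;
}

/* Append payload bytes; returns -1 when the buffer has no room. */
static int put(struct msg *m, const void *data, size_t n)
{
    if (n > MSG_CAP - m->len)
        return -1;
    memcpy(m->buf + m->len, data, n);
    m->len += n;
    return 0;
}

/* Close a nest: record how many payload bytes it holds. */
static void nest_end(struct msg *m, int start)
{
    m->buf[start] = (unsigned char)(m->len - (size_t)start - 1);
}

/* Cancel a nest: trim the message back to where the nest began. */
static void nest_cancel(struct msg *m, int start)
{
    m->len = (size_t)start;
}

/* Dump two attributes into one nest, cancelling it on any failure. */
static int dump_opts(struct msg *m, const void *a, size_t alen,
                     const void *b, size_t blen)
{
    int start = nest_start(m);

    if (start < 0)
        return -1;
    if (put(m, a, alen) || put(m, b, blen)) {
        nest_cancel(m, start);  /* leave no half-written nest behind */
        return -1;
    }
    nest_end(m, start);
    return 0;
}
```

When the second `put()` fails, the message length returns to its value before `nest_start()`, so a reader never sees a partially filled nest.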
     

06 Sep, 2018

1 commit

  • If users try to install act_tunnel_key 'set' rules with duplicate values
    of 'index', the tunnel metadata are allocated, but never released. Then,
    kmemleak complains as follows:

    # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
    # echo clear > /sys/kernel/debug/kmemleak
    # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
    Error: TC IDR already exists.
    We have an error talking to the kernel
    # echo scan > /sys/kernel/debug/kmemleak
    # cat /sys/kernel/debug/kmemleak
    unreferenced object 0xffff8800574e6c80 (size 256):
    comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff ................
    81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00 .$..............
    backtrace:
    tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
    tcf_action_init_1+0x698/0xac0
    tcf_action_init+0x15c/0x590
    tc_ctl_action+0x336/0x5c2
    rtnetlink_rcv_msg+0x357/0x8e0
    netlink_rcv_skb+0x124/0x350
    netlink_unicast+0x40f/0x5d0
    netlink_sendmsg+0x6e8/0xba0
    sock_sendmsg+0xb3/0xf0
    ___sys_sendmsg+0x654/0x960
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0xa5/0x470
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    0xffffffffffffffff

    In theory, the same leak also happens when users attempt to set up a
    geneve rule with invalid configuration data, or when the kernel fails to
    allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
    metadata in those cases as well.

    Addresses-Coverity-ID: 1373974 ("Resource leak")
    Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
    Fixes: 0ed5269f9e41f ("net/sched: add tunnel option support to act_tunnel_key")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Davide Caratti
     

22 Aug, 2018

1 commit

  • All ops->delete() needs is tn->idrinfo, but we already have the
    tc_action before calling ops->delete(), and tc_action has an
    ->idrinfo pointer.

    More importantly, every action type does the same thing: it just
    calls tcf_idr_delete_index().

    So ops->delete() can simply be removed.

    Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops")
    Cc: Jiri Pirko
    Cc: Vlad Buslov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

20 Aug, 2018

1 commit

  • Recently, ops->init() and ops->dump() of all actions were modified to
    always obtain tcf_lock when accessing private action state. Actions that
    don't depend on tcf_lock for synchronization with their data path use
    non-bh locking API. However, tcf_lock is also used to protect rate
    estimator stats in softirq context by timer callback.

    Change ops->init() and ops->dump() of all actions to disable bh when using
    tcf_lock, to prevent the deadlock reported by the following lockdep warning:

    [ 105.470398] ================================
    [ 105.475014] WARNING: inconsistent lock state
    [ 105.479628] 4.18.0-rc8+ #664 Not tainted
    [ 105.483897] --------------------------------
    [ 105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    [ 105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    [ 105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
    [ 105.509696] {SOFTIRQ-ON-W} state was registered at:
    [ 105.514925] _raw_spin_lock+0x2c/0x40
    [ 105.519022] tcf_bpf_init+0x579/0x820 [act_bpf]
    [ 105.523990] tcf_action_init_1+0x4e4/0x660
    [ 105.528518] tcf_action_init+0x1ce/0x2d0
    [ 105.532880] tcf_exts_validate+0x1d8/0x200
    [ 105.537416] fl_change+0x55a/0x268b [cls_flower]
    [ 105.542469] tc_new_tfilter+0x748/0xa20
    [ 105.546738] rtnetlink_rcv_msg+0x56a/0x6d0
    [ 105.551268] netlink_rcv_skb+0x18d/0x200
    [ 105.555628] netlink_unicast+0x2d0/0x370
    [ 105.559990] netlink_sendmsg+0x3b9/0x6a0
    [ 105.564349] sock_sendmsg+0x6b/0x80
    [ 105.568271] ___sys_sendmsg+0x4a1/0x520
    [ 105.572547] __sys_sendmsg+0xd7/0x150
    [ 105.576655] do_syscall_64+0x72/0x2c0
    [ 105.580757] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 105.586243] irq event stamp: 489296
    [ 105.590084] hardirqs last enabled at (489296): _raw_spin_unlock_irq+0x29/0x40
    [ 105.599765] hardirqs last disabled at (489295): _raw_spin_lock_irq+0x15/0x50
    [ 105.609277] softirqs last enabled at (489292): irq_enter+0x83/0xa0
    [ 105.618001] softirqs last disabled at (489293): irq_exit+0x140/0x190
    [ 105.626813]
    other info that might help us debug this:
    [ 105.633976] Possible unsafe locking scenario:

    [ 105.640526]        CPU0
    [ 105.643325]        ----
    [ 105.646125]   lock(&(&p->tcfa_lock)->rlock);
    [ 105.650747]   <Interrupt>
    [ 105.653717]     lock(&(&p->tcfa_lock)->rlock);
    [ 105.658514]
    *** DEADLOCK ***

    [ 105.665349] 1 lock held by swapper/16/0:
    [ 105.669629] #0: 00000000a640ad99 ((&est->timer)){+.-.}, at: call_timer_fn+0x10b/0x550
    [ 105.678200]
    stack backtrace:
    [ 105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
    [ 105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
    [ 105.698626] Call Trace:
    [ 105.701421]  <IRQ>
    [ 105.703791] dump_stack+0x92/0xeb
    [ 105.707461] print_usage_bug+0x336/0x34c
    [ 105.711744] mark_lock+0x7c9/0x980
    [ 105.715500] ? print_shortest_lock_dependencies+0x2e0/0x2e0
    [ 105.721424] ? check_usage_forwards+0x230/0x230
    [ 105.726315] __lock_acquire+0x923/0x26f0
    [ 105.730597] ? debug_show_all_locks+0x240/0x240
    [ 105.735478] ? mark_lock+0x493/0x980
    [ 105.739412] ? check_chain_key+0x140/0x1f0
    [ 105.743861] ? __lock_acquire+0x836/0x26f0
    [ 105.748323] ? lock_acquire+0x12e/0x290
    [ 105.752516] lock_acquire+0x12e/0x290
    [ 105.756539] ? est_fetch_counters+0x3c/0xa0
    [ 105.761084] _raw_spin_lock+0x2c/0x40
    [ 105.765099] ? est_fetch_counters+0x3c/0xa0
    [ 105.769633] est_fetch_counters+0x3c/0xa0
    [ 105.773995] est_timer+0x87/0x390
    [ 105.777670] ? est_fetch_counters+0xa0/0xa0
    [ 105.782210] ? lock_acquire+0x12e/0x290
    [ 105.786410] call_timer_fn+0x161/0x550
    [ 105.790512] ? est_fetch_counters+0xa0/0xa0
    [ 105.795055] ? del_timer_sync+0xd0/0xd0
    [ 105.799249] ? __lock_is_held+0x93/0x110
    [ 105.803531] ? mark_held_locks+0x20/0xe0
    [ 105.807813] ? _raw_spin_unlock_irq+0x29/0x40
    [ 105.812525] ? est_fetch_counters+0xa0/0xa0
    [ 105.817069] ? est_fetch_counters+0xa0/0xa0
    [ 105.821610] run_timer_softirq+0x3c4/0x9f0
    [ 105.826064] ? lock_acquire+0x12e/0x290
    [ 105.830257] ? __bpf_trace_timer_class+0x10/0x10
    [ 105.835237] ? __lock_is_held+0x25/0x110
    [ 105.839517] __do_softirq+0x11d/0x7bf
    [ 105.843542] irq_exit+0x140/0x190
    [ 105.847208] smp_apic_timer_interrupt+0xac/0x3b0
    [ 105.852182] apic_timer_interrupt+0xf/0x20
    [ 105.856628]  </IRQ>
    [ 105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
    [ 105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
    [ 105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
    [ 105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
    [ 105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
    [ 105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
    [ 105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
    [ 105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
    [ 105.929988] ? trace_hardirqs_on_caller+0x141/0x2d0
    [ 105.935232] do_idle+0x28e/0x320
    [ 105.938817] ? arch_cpu_idle_exit+0x40/0x40
    [ 105.943361] ? mark_lock+0x8c1/0x980
    [ 105.947295] ? _raw_spin_unlock_irqrestore+0x32/0x60
    [ 105.952619] cpu_startup_entry+0xc2/0xd0
    [ 105.956900] ? cpu_in_idle+0x20/0x20
    [ 105.960830] ? _raw_spin_unlock_irqrestore+0x32/0x60
    [ 105.966146] ? trace_hardirqs_on_caller+0x141/0x2d0
    [ 105.971391] start_secondary+0x2b5/0x360
    [ 105.975669] ? set_cpu_sibling_map+0x1330/0x1330
    [ 105.980654] secondary_startup_64+0xa5/0xb0

    Taking tcf_lock in the sample action with bh disabled causes lockdep to
    warn about a possible irq lock inversion dependency between tcf_lock and
    psample_groups_lock, which is taken while holding tcf_lock in sample init:

    [ 162.108959] Possible interrupt unsafe locking scenario:

    [ 162.116386]        CPU0                    CPU1
    [ 162.121277]        ----                    ----
    [ 162.126162]   lock(psample_groups_lock);
    [ 162.130447]                                local_irq_disable();
    [ 162.136772]                                lock(&(&p->tcfa_lock)->rlock);
    [ 162.143957]                                lock(psample_groups_lock);
    [ 162.150813]   <Interrupt>
    [ 162.153808]     lock(&(&p->tcfa_lock)->rlock);
    [ 162.158608]
    *** DEADLOCK ***

    To prevent a potential lock inversion dependency between tcf_lock and
    psample_groups_lock, move the call to psample_group_get() out of the
    tcf_lock-protected section in the sample action init function.

    Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
    Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
    Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
    Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
    Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
    Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
    Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     

12 Aug, 2018

1 commit

  • Use tcf lock to protect tunnel key action struct private data from
    concurrent modification in init and dump. Use rcu swap operation to
    reassign params pointer under protection of tcf lock. (The old params
    value is not used by init, so there is no need for a standalone rcu
    dereference step.)

    Remove rtnl lock assertion that is no longer required.

    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     

31 Jul, 2018

1 commit

  • Each lockless action currently does its own RCU locking in ->act().
    This allows using plain RCU accessors, even if the context
    is really RCU BH.

    This change drops the per-action RCU lock, replaces the accessors
    with the _bh variant, cleans up the surrounding code a bit, and
    documents the RCU status in the relevant header.
    No functional nor performance change is intended.

    The goal of this patch is clarifying that the RCU critical section
    used by the tc actions extends up to the classifier's caller.

    v1 -> v2:
    - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
    as pointed out by Daniel

    v3 -> v4:
    - fixed some typos in the commit message (JiriP)

    Signed-off-by: Paolo Abeni
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paolo Abeni
     

21 Jul, 2018

1 commit


20 Jul, 2018

1 commit


08 Jul, 2018

5 commits

  • Implement function that atomically checks if action exists and either takes
    reference to it, or allocates idr slot for action index to prevent
    concurrent allocations of actions with same index. Use EBUSY error pointer
    to indicate that idr slot is reserved.

    Implement cleanup helper function that removes the temporary error pointer
    from idr (in case of an error between idr allocation and insertion of the
    newly created action at the specified index).

    Refactor all action init functions to insert new action to idr using this
    API.

    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: Vlad Buslov
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Return from action init function with reference to action taken,
    even when overwriting existing action.

    Action init API initializes its fourth argument (pointer to pointer to tc
    action) to either an existing action with the same index or a newly created
    action. If the index already exists (and the bind argument is zero), the
    init function returns without incrementing the action reference counter.
    The caller of action init then proceeds working with the action without
    actually holding a reference to it. This means that the action could be
    deleted concurrently.

    Change action init behavior to always take reference to action before
    returning successfully, in order to protect from concurrent deletion.

    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: Vlad Buslov
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Extend action ops with a 'delete' function. Each action type implements
    its own delete function that doesn't depend on rtnl lock.

    Implement delete function that is required to delete actions without
    holding rtnl lock. Use action API function that atomically deletes action
    only if it is still in action idr. This implementation prevents concurrent
    threads from deleting same action twice.

    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: Vlad Buslov
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Add additional 'rtnl_held' argument to act API init functions. It is
    required to implement actions that need to release rtnl lock before loading
    a kernel module and reacquire it afterwards.

    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: Vlad Buslov
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Change type of action reference counter to refcount_t.

    Change type of action bind counter to atomic_t.
    This type is used to allow decrementing bind counter without testing
    for 0 result.

    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: Vlad Buslov
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
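
The check-and-reserve helper described in the first commit of this group can be modelled in a few lines. This is a single-threaded sketch, not the kernel's implementation: `check_alloc()`, `insert_done()`, `cleanup()`, the fixed `slots` table and the `SLOT_BUSY` encoding are all hypothetical, the lock that makes the real operation atomic is omitted, and returning -EBUSY to a second caller is a simplification chosen here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 8

struct action {
    int refcnt;
};

/* Sentinel meaning "index reserved, action not inserted yet". It plays the
 * role of the EBUSY error pointer; this encoding is an assumption. */
#define SLOT_BUSY ((struct action *)(intptr_t)-EBUSY)

static struct action *slots[SLOTS];

/*
 * Returns 1 and takes a reference if an action exists at index,
 * 0 if the index was free and is now reserved for the caller,
 * -EBUSY if another caller holds the reservation,
 * -EINVAL if the index is out of range.
 */
static int check_alloc(unsigned int index, struct action **out)
{
    struct action *p;

    if (index >= SLOTS)
        return -EINVAL;
    p = slots[index];
    if (p == SLOT_BUSY)
        return -EBUSY;
    if (p) {
        p->refcnt++;
        *out = p;
        return 1;
    }
    slots[index] = SLOT_BUSY;
    *out = NULL;
    return 0;
}

/* Replace the reservation with the fully initialised action. */
static void insert_done(unsigned int index, struct action *a)
{
    slots[index] = a;
}

/* Error path: drop the reservation so the index can be used again. */
static void cleanup(unsigned int index)
{
    if (slots[index] == SLOT_BUSY)
        slots[index] = NULL;
}
```

An init function would call `check_alloc()` first, build the action, and then finish with either `insert_done()` on success or `cleanup()` on failure.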
     

07 Jul, 2018

1 commit

  • the control action in the common member of struct tcf_tunnel_key must be a
    valid value, as it can contain the chain index when 'goto chain' is used.
    Ensure that the control action can be read as x->tcfa_action, when x is a
    pointer to struct tc_action and x->ops->type is TCA_ACT_TUNNEL_KEY, to
    prevent the following command:

    # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
    > $tcflags dst_mac $h2mac action tunnel_key unset goto chain 1

    from causing a NULL dereference when a matching packet is received:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    PGD 80000001097ac067 P4D 80000001097ac067 PUD 103b0a067 PMD 0
    Oops: 0000 [#1] SMP PTI
    CPU: 0 PID: 3491 Comm: mausezahn Tainted: G E 4.18.0-rc2.auguri+ #421
    Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.58 02/07/2013
    RIP: 0010:tcf_action_exec+0xb8/0x100
    Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
    RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
    RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
    R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
    R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
    FS: 00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
    Call Trace:
    <IRQ>
    fl_classify+0x1ad/0x1c0 [cls_flower]
    ? __update_load_avg_se.isra.47+0x1ca/0x1d0
    ? __update_load_avg_se.isra.47+0x1ca/0x1d0
    ? update_load_avg+0x665/0x690
    ? update_load_avg+0x665/0x690
    ? kmem_cache_alloc+0x38/0x1c0
    tcf_classify+0x89/0x140
    __netif_receive_skb_core+0x5ea/0xb70
    ? enqueue_entity+0xd0/0x270
    ? process_backlog+0x97/0x150
    process_backlog+0x97/0x150
    net_rx_action+0x14b/0x3e0
    __do_softirq+0xde/0x2b4
    do_softirq_own_stack+0x2a/0x40
    </IRQ>
    do_softirq.part.18+0x49/0x50
    __local_bh_enable_ip+0x49/0x50
    __dev_queue_xmit+0x4ab/0x8a0
    ? wait_woken+0x80/0x80
    ? packet_sendmsg+0x38f/0x810
    ? __dev_queue_xmit+0x8a0/0x8a0
    packet_sendmsg+0x38f/0x810
    sock_sendmsg+0x36/0x40
    __sys_sendto+0x10e/0x140
    ? do_vfs_ioctl+0xa4/0x630
    ? syscall_trace_enter+0x1df/0x2e0
    ? __audit_syscall_exit+0x22a/0x290
    __x64_sys_sendto+0x24/0x30
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7fd67e18dc93
    Code: 48 8b 0d 18 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
    RSP: 002b:00007ffe0189b748 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00000000020ca010 RCX: 00007fd67e18dc93
    RDX: 0000000000000062 RSI: 00000000020ca322 RDI: 0000000000000003
    RBP: 00007ffe0189b780 R08: 00007ffe0189b760 R09: 0000000000000014
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000062
    R13: 00000000020ca322 R14: 00007ffe0189b760 R15: 0000000000000003
    Modules linked in: act_tunnel_key act_gact cls_flower sch_ingress vrf veth act_csum(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic kvm_intel kvm irqbypass snd_hda_intel crct10dif_pclmul crc32_pclmul hp_wmi ghash_clmulni_intel pcbc snd_hda_codec aesni_intel sparse_keymap rfkill snd_hda_core snd_hwdep snd_seq crypto_simd iTCO_wdt gpio_ich iTCO_vendor_support wmi_bmof cryptd mei_wdt glue_helper snd_seq_device snd_pcm pcspkr snd_timer snd i2c_i801 lpc_ich sg soundcore wmi mei_me
    mei ie31200_edac nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom i915 video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_intel libahci serio_raw sfc libata mtd drm ixgbe mdio i2c_core e1000e dca
    CR2: 0000000000000000
    ---[ end trace 1ab8b5b5d4639dfc ]---
    RIP: 0010:tcf_action_exec+0xb8/0x100
    Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
    RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
    RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
    R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
    R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
    FS: 00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
    Kernel panic - not syncing: Fatal exception in interrupt
    Kernel Offset: 0x11400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

    Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
     

29 Jun, 2018

3 commits

  • Allow setting tunnel options using the act_tunnel_key action.

    Options are expressed as class:type:data and multiple options
    may be listed using a comma delimiter.

    # ip link add name geneve0 type geneve dstport 0 external
    # tc qdisc add dev eth0 ingress
    # tc filter add dev eth0 protocol ip parent ffff: \
    flower indev eth0 \
    ip_proto udp \
    action tunnel_key \
    set src_ip 10.0.99.192 \
    dst_ip 10.0.99.193 \
    dst_port 6081 \
    id 11 \
    geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0

    Signed-off-by: Simon Horman
    Signed-off-by: Pieter Jansen van Vuuren
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG
    during validation of user input.

    Cc: Alexander Aring
    Signed-off-by: Simon Horman
    Signed-off-by: Pieter Jansen van Vuuren
    Reviewed-by: Jakub Kicinski
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Metadata may be NULL for one of two reasons:
    * Missing user input
    * Failure to allocate the metadata dst

    Disambiguate these cases by returning -EINVAL for the former and -ENOMEM
    for the latter, rather than -EINVAL for both.

    This is in preparation for using extended ack to provide more information
    to users when parsing their input.

    Signed-off-by: Simon Horman
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Simon Horman
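
The commits in this group combine naturally into one sketch: parsing the comma-delimited class:type:data option list while keeping "bad user input" (-EINVAL) apart from "allocation failed" (-ENOMEM). This is a user-space illustration only. `struct opt`, `parse_opt()` and `parse_opts()` are hypothetical names, and the fixed widths (4 hex digits for class, 2 for type, data in multiples of 4 bytes) are assumptions made for this example rather than the parsing rules of any real tool.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct opt {
    uint16_t opt_class;
    uint8_t type;
    size_t len;         /* number of data bytes */
    uint8_t *data;      /* heap copy of the decoded data */
};

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse exactly `digits` hex digits at *s into *out and advance *s. */
static int parse_hex(const char **s, int digits, unsigned int *out)
{
    unsigned int v = 0;

    for (int i = 0; i < digits; i++) {
        int h = hexval((*s)[i]);

        if (h < 0)
            return -EINVAL;
        v = v << 4 | (unsigned int)h;
    }
    *s += digits;
    *out = v;
    return 0;
}

/* Parse one "class:type:data" item at *s; stops at ',' or end of string. */
static int parse_opt(const char **s, struct opt *o)
{
    unsigned int cls, type, byte;
    const char *p = *s;
    size_t ndigits;

    if (parse_hex(&p, 4, &cls) || *p++ != ':')
        return -EINVAL;
    if (parse_hex(&p, 2, &type) || *p++ != ':')
        return -EINVAL;
    ndigits = strspn(p, "0123456789abcdefABCDEF");
    if (ndigits == 0 || ndigits % 8 != 0)   /* assumed 4-byte multiples */
        return -EINVAL;
    if (p[ndigits] != ',' && p[ndigits] != '\0')
        return -EINVAL;

    o->data = malloc(ndigits / 2);
    if (!o->data)
        return -ENOMEM;         /* allocation failure, not bad input */
    for (size_t i = 0; i < ndigits / 2; i++) {
        parse_hex(&p, 2, &byte);    /* digits were validated above */
        o->data[i] = (uint8_t)byte;
    }
    o->opt_class = (uint16_t)cls;
    o->type = (uint8_t)type;
    o->len = ndigits / 2;
    *s = p;
    return 0;
}

/* Returns the number of options parsed, or a negative errno value. */
static int parse_opts(const char *s, struct opt *opts, int max)
{
    int n = 0, err;

    if (!s || !*s)
        return -EINVAL;         /* missing user input */
    for (;;) {
        err = n == max ? -EINVAL : parse_opt(&s, &opts[n]);
        if (err)
            break;
        n++;
        if (*s == '\0')
            return n;
        s++;                    /* skip the ',' delimiter */
    }
    while (n > 0)               /* release what was parsed so far */
        free(opts[--n].data);
    return err;
}
```

Keeping the two error codes distinct lets a caller attach a precise message to each failure instead of reporting every problem as invalid input.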
     

28 Mar, 2018

1 commit


23 Mar, 2018

1 commit

  • Fun set of conflict resolutions here...

    For the mac80211 stuff, these were fortunately just parallel
    adds. Trivially resolved.

    In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
    function phy_disable_interrupts() earlier in the file, whilst in
    'net-next' the phy_error() call from this function was removed.

    In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
    'rt_table_id' member of rtable collided with a bug fix in 'net' that
    added a new struct member "rt_mtu_locked" which needs to be copied
    over here.

    The mlxsw driver conflict consisted of net-next separating
    the span code and definitions into separate files, whilst
    a 'net' bug fix made some changes to that moved code.

    The mlx5 infiniband conflict resolution was quite non-trivial,
    the RDMA tree's merge commit was used as a guide here, and
    here are their notes:

    ====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch. This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
    drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524
    (IB/mlx5: Fix cleanup order on unload) added to for-rc and
    commit b5ca15ad7e61 (IB/mlx5: Add proper representors support)
    add as part of the devel cycle both needed to modify the
    init/de-init functions used by mlx5. To support the new
    representors, the new functions added by the cleanup patch
    needed to be made non-static, and the init/de-init list
    added by the representors patch needed to be modified to
    match the init/de-init list changes made by the cleanup
    patch.
    Updates:
    drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
    prototypes added by representors patch to reflect new function
    names as changed by cleanup patch
    drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
    stage list to match new order from cleanup patch
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Mar, 2018

1 commit

  • when the following command

    # tc action add action tunnel_key unset index 100

    is run for the first time, and tunnel_key_init() fails to allocate struct
    tcf_tunnel_key_params, tunnel_key_release() dereferences NULL pointers.
    This causes the following error:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    IP: tunnel_key_release+0xd/0x40 [act_tunnel_key]
    PGD 8000000033787067 P4D 8000000033787067 PUD 74646067 PMD 0
    Oops: 0000 [#1] SMP PTI
    Modules linked in: act_tunnel_key(E) act_csum ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel pcbc snd_hda_codec snd_hda_core snd_hwdep snd_seq aesni_intel snd_seq_device crypto_simd glue_helper snd_pcm cryptd joydev snd_timer pcspkr virtio_balloon snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk drm virtio_console crc32c_intel ata_piix serio_raw i2c_core virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
    CPU: 2 PID: 3101 Comm: tc Tainted: G E 4.16.0-rc4.act_vlan.orig+ #403
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:tunnel_key_release+0xd/0x40 [act_tunnel_key]
    RSP: 0018:ffffba46803b7768 EFLAGS: 00010286
    RAX: ffffffffc09010a0 RBX: 0000000000000000 RCX: 0000000000000024
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99ee336d7480
    RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
    R10: 0000000000000220 R11: ffff99ee79d73131 R12: 0000000000000000
    R13: ffff99ee32d67610 R14: ffff99ee7671dc38 R15: 00000000fffffff4
    FS: 00007febcb2cd740(0000) GS:ffff99ee7fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000010 CR3: 000000007c8e4005 CR4: 00000000001606e0
    Call Trace:
    __tcf_idr_release+0x79/0xf0
    tunnel_key_init+0xd9/0x460 [act_tunnel_key]
    tcf_action_init_1+0x2cc/0x430
    tcf_action_init+0xd3/0x1b0
    tc_ctl_action+0x18b/0x240
    rtnetlink_rcv_msg+0x29c/0x310
    ? _cond_resched+0x15/0x30
    ? __kmalloc_node_track_caller+0x1b9/0x270
    ? rtnl_calcit.isra.28+0x100/0x100
    netlink_rcv_skb+0xd2/0x110
    netlink_unicast+0x17c/0x230
    netlink_sendmsg+0x2cd/0x3c0
    sock_sendmsg+0x30/0x40
    ___sys_sendmsg+0x27a/0x290
    __sys_sendmsg+0x51/0x90
    do_syscall_64+0x6e/0x1a0
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x7febca6deba0
    RSP: 002b:00007ffe7b0dd128 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007ffe7b0dd250 RCX: 00007febca6deba0
    RDX: 0000000000000000 RSI: 00007ffe7b0dd1a0 RDI: 0000000000000003
    RBP: 000000005aaa90cb R08: 0000000000000002 R09: 0000000000000000
    R10: 00007ffe7b0dcba0 R11: 0000000000000246 R12: 0000000000000000
    R13: 00007ffe7b0dd264 R14: 0000000000000001 R15: 0000000000669f60
    Code: 44 00 00 8b 0d b5 23 00 00 48 8b 87 48 10 00 00 48 8b 3c c8 e9 a5 e5 d8 c3 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 9f b0 00 00 00 7b 10 01 74 0b 48 89 df 31 f6 5b e9 f2 fa 7f c3 48 8b 7b 18
    RIP: tunnel_key_release+0xd/0x40 [act_tunnel_key] RSP: ffffba46803b7768
    CR2: 0000000000000010

    Fix this in tunnel_key_release(), ensuring 'param' is not NULL before
    dereferencing it.

    Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
    Signed-off-by: Davide Caratti
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Davide Caratti
     

16 Mar, 2018

1 commit

  • If the set/unset mode of the tunnel_key action is not provided, ->init()
    still returns 0 and the caller proceeds with a bogus 'struct tc_action *'
    object, which results in a crash:

    % tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1

    [ 35.805515] general protection fault: 0000 [#1] SMP PTI
    [ 35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass
    crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64
    crypto_simd glue_helper cryptd serio_raw
    [ 35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286
    [ 35.808929] RIP: 0010:tcf_action_init+0x90/0x190
    [ 35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206
    [ 35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000
    [ 35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910
    [ 35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808
    [ 35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38
    [ 35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910
    [ 35.814006] FS: 00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000)
    knlGS:0000000000000000
    [ 35.814881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0
    [ 35.816457] Call Trace:
    [ 35.817158] tc_ctl_action+0x11a/0x220
    [ 35.817795] rtnetlink_rcv_msg+0x23d/0x2e0
    [ 35.818457] ? __slab_alloc+0x1c/0x30
    [ 35.819079] ? __kmalloc_node_track_caller+0xb1/0x2b0
    [ 35.819544] ? rtnl_calcit.isra.30+0xe0/0xe0
    [ 35.820231] netlink_rcv_skb+0xce/0x100
    [ 35.820744] netlink_unicast+0x164/0x220
    [ 35.821500] netlink_sendmsg+0x293/0x370
    [ 35.822040] sock_sendmsg+0x30/0x40
    [ 35.822508] ___sys_sendmsg+0x2c5/0x2e0
    [ 35.823149] ? pagecache_get_page+0x27/0x220
    [ 35.823714] ? filemap_fault+0xa2/0x640
    [ 35.824423] ? page_add_file_rmap+0x108/0x200
    [ 35.825065] ? alloc_set_pte+0x2aa/0x530
    [ 35.825585] ? finish_fault+0x4e/0x70
    [ 35.826140] ? __handle_mm_fault+0xbc1/0x10d0
    [ 35.826723] ? __sys_sendmsg+0x41/0x70
    [ 35.827230] __sys_sendmsg+0x41/0x70
    [ 35.827710] do_syscall_64+0x68/0x120
    [ 35.828195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    [ 35.828859] RIP: 0033:0x7f3d0ca4da67
    [ 35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX:
    000000000000002e
    [ 35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67
    [ 35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003
    [ 35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000
    [ 35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000
    [ 35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640
    [ 35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2
    fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff d0 48
    8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c
    [ 35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0
    [ 35.838291] ---[ end trace a095c06ee4b97a26 ]---
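
A minimal userspace sketch of the kind of check that avoids this, assuming the init path can simply reject an unknown mode. The enum values and `init_sketch()` are hypothetical and do not reproduce the real ->init() code.

```c
#include <errno.h>

/* Hypothetical mode values; the real ones live in the kernel UAPI headers. */
enum mode_sketch {
	MODE_UNSPEC_SKETCH = 0,	/* user gave neither set nor unset */
	MODE_SET_SKETCH,
	MODE_UNSET_SKETCH,
};

/*
 * Returns 0 and sets *created to 1 only for a known mode.
 * For anything else it returns -EINVAL and leaves *created at 0,
 * so the caller never proceeds with an object that was not created.
 */
static int init_sketch(enum mode_sketch mode, int *created)
{
	*created = 0;

	switch (mode) {
	case MODE_SET_SKETCH:
	case MODE_UNSET_SKETCH:
		break;
	default:
		return -EINVAL;
	}

	*created = 1;
	return 0;
}
```

The point of the sketch is the default branch: a missing mode produces an error code instead of a successful return with nothing behind it.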

    Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
    Signed-off-by: Roman Mashak
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Roman Mashak
     

28 Feb, 2018

1 commit

  • These pernet_operations are from net/sched directory, and they call only
    tc_action_net_init() and tc_action_net_exit():

    bpf_net_ops
    connmark_net_ops
    csum_net_ops
    gact_net_ops
    ife_net_ops
    ipt_net_ops
    xt_net_ops
    mirred_net_ops
    nat_net_ops
    pedit_net_ops
    police_net_ops
    sample_net_ops
    simp_net_ops
    skbedit_net_ops
    skbmod_net_ops
    tunnel_key_net_ops
    vlan_net_ops

    1) tc_action_net_init() just allocates and initializes per-net memory.
    2) There should not be in-flight packets at the time of the
    tc_action_net_exit() call, nor should another pernet_operations send
    packets to a dying net (except netlink). So it seems they can be marked
    as async.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

17 Feb, 2018

4 commits


14 Dec, 2017

1 commit


06 Dec, 2017

1 commit


09 Nov, 2017

1 commit

  • This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
    If we hold that refcnt, the netns can never be destroyed until
    all actions are destroyed by the user. This breaks our netns design,
    in which we expect all actions to be destroyed when we destroy the
    whole netns.

    Cc: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

03 Nov, 2017

1 commit

  • TC actions have been destroyed asynchronously for a long time,
    previously in a RCU callback and now in a workqueue. If we
    don't hold a refcnt for its netns, we could use the per netns
    data structure, struct tcf_idrinfo, after it has been freed by
    netns workqueue.

    Hold refcnt to ensure netns destroy happens after all actions
    are gone.

    Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
    Reported-by: Lucas Bates
    Tested-by: Lucas Bates
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

31 Aug, 2017

1 commit

  • Typically, each TC filter has its own action. All the actions of the
    same type are saved in its hash table. But the hash table has so few
    buckets that it degrades to a list, and performance suffers badly.
    For example, it takes about 0m11.914s to insert 64K rules.
    If we convert the hash table to IDR, it only takes about 0m1.500s.
    The improvement is huge.

    But please note that the test result is based on the previous patch,
    which makes cls_flower use IDR.

    Signed-off-by: Chris Mi
    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Chris Mi
     

16 Jun, 2017

2 commits

  • Allow requesting a zero UDP checksum for encapsulated packets. The
    attribute is named "NO_CSUM", and defined accordingly, so that a missing
    attribute and an attribute set to 0 mean the same thing.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • There's currently no way to request (outer) UDP checksum with
    act_tunnel_key. This is a problem especially for IPv6. Right now, the
    tunnel_key action with IPv6 does not work without going through hassles:
    both sides have to have udp6zerocsumrx configured on the tunnel interface.
    This is obviously not a good solution universally.

    It makes more sense to compute the UDP checksum by default even for IPv4.
    Just set the default to request the checksum when using act_tunnel_key.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

14 Apr, 2017

1 commit


24 Dec, 2016

1 commit


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    is thus an unsigned entity. Using a negative value is an
    out-of-bounds access by definition.

    2)
    On x86_64, unsigned 32-bit data mixed with pointers via array
    indexing, or via offsets added to or subtracted from pointers,
    are preferred to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
            g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is a 3-byte instruction, which isn't necessary if the variable is
    unsigned because x86_64 zero-extends by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
            ...
            ptr = ng->ptr[id - 1];
            ...
    }

    And this function is used a lot, so those sign extensions add up.
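
The difference can be sketched with two lookups that differ only in the index type. Both helpers are hypothetical and are not the real net_generic(). Whether a given compiler actually emits a sign extension for the first one depends on the compiler, its version and the optimization flags, so treat this as an illustration rather than a measurement.

```c
#include <stddef.h>

/* Signed index: on x86_64 the compiler may have to sign-extend 'id'
 * to 64 bits before it can be used in the address computation. */
static void *lookup_signed_sketch(void *const *slots, int id)
{
	return slots[id - 1];
}

/* Unsigned index: 32-bit operations on x86_64 already clear the upper
 * half of the register, so no separate extension step is required. */
static void *lookup_unsigned_sketch(void *const *slots, unsigned int id)
{
	return slots[id - 1];
}
```

For valid ids (1 and up) both helpers return the same slot; only the generated code may differ.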

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately, some functions actually grow bigger.
    This is a seemingly random artefact of code generation with the register
    allocator being used differently. gcc decides that some variable
    needs to live in the new r8+ registers and every access now requires a REX
    prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to
    be used, which is longer than [r8].

    However, the overall balance is in the negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function                          old     new   delta
    nfsd4_lock                       3886    3959     +73
    tipc_link_build_proto_msg        1096    1140     +44
    mac80211_hwsim_new_radio         2776    2808     +32
    tipc_mon_rcv                     1032    1058     +26
    svcauth_gss_legacy_init          1413    1429     +16
    tipc_bcbase_select_primary        379     392     +13
    nfsd4_exchange_id                1247    1260     +13
    nfsd4_setclientid_confirm         782     793     +11
    ...
    put_client_renew_locked           494     480     -14
    ip_set_sockfn_get                 730     716     -14
    geneve_sock_add                   829     813     -16
    nfsd4_sequence_done               721     703     -18
    nlmclnt_lookup_host               708     686     -22
    nfsd4_lockt                      1085    1063     -22
    nfs_get_client                   1077    1050     -27
    tcf_bpf_init                     1106    1076     -30
    nfsd4_encode_fattr               5997    5930     -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

10 Nov, 2016

2 commits