17 Apr, 2019

2 commits

  • [ Upstream commit 0db6f8befc32c68bb13d7ffbb2e563c79e913e13 ]

    It always returned NULL, so it was never possible to get the filter.

    Example:
    $ ip link add foo type dummy
    $ ip link add bar type dummy
    $ tc qdisc add dev foo clsact
    $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \
    matchall action mirred ingress mirror dev bar

    Before the patch:
    $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
    Error: Specified filter handle not found.
    We have an error talking to the kernel

    After:
    $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
    filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2
    not_in_hw
    action order 1: mirred (Ingress Mirror to device bar) pipe
    index 1 ref 1 bind 1

    CC: Yotam Gigi
    CC: Jiri Pirko
    Fixes: fd62d9f5c575 ("net/sched: matchall: Fix configuration race")
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Nicolas Dichtel
     
  • [ Upstream commit fae2708174ae95d98d19f194e03d6e8f688ae195 ]

    The control path of the 'sample' action does not validate the 'rate' value
    provided by the user, but the traffic path then uses it as a divisor.
    Validate it in tcf_sample_init(), and return -EINVAL with a proper extack
    message if that value is zero, to fix a splat with the script below:

    # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
    # tc -s a s action sample
    total acts 1

    action order 0: sample rate 1/0 group 1 pipe
    index 2 ref 1 bind 1 installed 19 sec used 19 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    # ping 192.0.2.1 -I test0 -c1 -q

    divide error: 0000 [#1] SMP PTI
    CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
    Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
    RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
    RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
    RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
    RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
    R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
    R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
    FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
    Call Trace:
    tcf_action_exec+0x7c/0x1c0
    tcf_classify+0x57/0x160
    __dev_queue_xmit+0x3dc/0xd10
    ip_finish_output2+0x257/0x6d0
    ip_output+0x75/0x280
    ip_send_skb+0x15/0x40
    raw_sendmsg+0xae3/0x1410
    sock_sendmsg+0x36/0x40
    __sys_sendto+0x10e/0x140
    __x64_sys_sendto+0x24/0x30
    do_syscall_64+0x60/0x210
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [...]
    Kernel panic - not syncing: Fatal exception in interrupt

    Add a TDC selftest to document that 'rate' is now being validated.
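    The principle behind this fix (validate once in the control path so the data path can divide safely) can be sketched in userspace C. This is a hypothetical model: struct sample_action, sample_init() and sample_act() are illustrative names, not the kernel's tcf_sample code, and the "random value modulo rate" test only approximates how the action decides to sample.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified model of a sampling action (not struct tcf_sample). */
struct sample_action {
	unsigned int rate;   /* sample roughly 1 packet out of every 'rate' */
	unsigned long seen;  /* packets seen by the data path */
};

/* Control path: reject rate == 0 so the data path never divides by zero. */
int sample_init(struct sample_action *a, unsigned int rate)
{
	if (rate == 0) {
		fprintf(stderr, "invalid sample rate\n");
		return -EINVAL;
	}
	a->rate = rate;
	a->seen = 0;
	return 0;
}

/* Data path: the modulo is only safe because sample_init() validated rate.
 * Returns 1 if this packet should be sampled. */
int sample_act(struct sample_action *a, unsigned int random_value)
{
	a->seen++;
	return (random_value % a->rate) == 0;
}
```

    With this split, an invalid configuration fails at setup time with an error instead of triggering a divide error later in packet processing.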

    Reported-by: Matteo Croce
    Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
    Signed-off-by: Davide Caratti
    Acked-by: Yotam Gigi
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Davide Caratti
     

03 Apr, 2019

1 commit

  • [ Upstream commit 064c5d6881e897077639e04973de26440ee205e6 ]

    A new mirred action is created by the tcf_mirred_init function. This
    contains a list head struct which is inserted into a global list on
    successful creation of a new action. However, after a creation, it is
    still possible to error out and call the tcf_idr_release function. This,
    in turn, calls the act_mirred cleanup function via __tcf_idr_release and
    __tcf_action_put. This cleanup function tries to delete the list entry,
    which is as yet uninitialised, leading to a NULL pointer dereference.

    Fix this by initialising the list entry on creation of a new action.
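    Why initialising the entry on creation makes an early cleanup safe can be shown with a small userspace re-implementation of a circular doubly linked list. This is a sketch, not <linux/list.h>: list_init(), list_add(), list_del_init(), mirred_create() and mirred_cleanup() are hypothetical names. A self-linked node can be unlinked harmlessly, whereas a node with NULL pointers cannot.

```c
/* Minimal circular doubly linked list modeled on the kernel's list_head
 * (illustrative userspace code, not <linux/list.h>). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;  /* an initialized, unlinked node points at itself */
	n->prev = n;
}

static void list_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlinks n. On a self-linked node this only rewrites n's own pointers;
 * on a node whose pointers were never set it would dereference NULL. */
static void list_del_init(struct list_node *n)
{
	n->next->prev = n->prev;
	n->prev->next = n->next;
	list_init(n);
}

struct mirred_action {
	struct list_node list;
	int ifindex;
};

/* Hypothetical create path: initialise the list entry immediately, so a
 * cleanup that runs before list_add() (e.g. on a later error) is safe. */
static void mirred_create(struct mirred_action *m)
{
	list_init(&m->list);
	m->ifindex = 0;
}

static void mirred_cleanup(struct mirred_action *m)
{
	list_del_init(&m->list);
}
```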

    Bug report:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    PGD 8000000840c73067 P4D 8000000840c73067 PUD 858dcc067 PMD 0
    Oops: 0002 [#1] SMP PTI
    CPU: 32 PID: 5636 Comm: handler194 Tainted: G OE 5.0.0+ #186
    Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015
    RIP: 0010:tcf_mirred_release+0x42/0xa7 [act_mirred]
    Code: f0 90 39 c0 e8 52 04 57 c8 48 c7 c7 b8 80 39 c0 e8 94 fa d4 c7 48 8b 93 d0 00 00 00 48 8b 83 d8 00 00 00 48 c7 c7 f0 90 39 c0 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 d0 00
    RSP: 0018:ffffac4aa059f688 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff9dcd1b214d00 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffff9dcd1fa165f8 RDI: ffffffffc03990f0
    RBP: ffff9dccf9c7af80 R08: 0000000000000a3b R09: 0000000000000000
    R10: ffff9dccfa11f420 R11: 0000000000000000 R12: 0000000000000001
    R13: ffff9dcd16b433c0 R14: ffff9dcd1b214d80 R15: 0000000000000000
    FS: 00007f441bfff700(0000) GS:ffff9dcd1fa00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 0000000839e64004 CR4: 00000000001606e0
    Call Trace:
    tcf_action_cleanup+0x59/0xca
    __tcf_action_put+0x54/0x6b
    __tcf_idr_release.cold.33+0x9/0x12
    tcf_mirred_init.cold.20+0x22e/0x3b0 [act_mirred]
    tcf_action_init_1+0x3d0/0x4c0
    tcf_action_init+0x9c/0x130
    tcf_exts_validate+0xab/0xc0
    fl_change+0x1ca/0x982 [cls_flower]
    tc_new_tfilter+0x647/0x8d0
    ? load_balance+0x14b/0x9e0
    rtnetlink_rcv_msg+0xe3/0x370
    ? __switch_to_asm+0x40/0x70
    ? __switch_to_asm+0x34/0x70
    ? _cond_resched+0x15/0x30
    ? __kmalloc_node_track_caller+0x1d4/0x2b0
    ? rtnl_calcit.isra.31+0xf0/0xf0
    netlink_rcv_skb+0x49/0x110
    netlink_unicast+0x16f/0x210
    netlink_sendmsg+0x1df/0x390
    sock_sendmsg+0x36/0x40
    ___sys_sendmsg+0x27b/0x2c0
    ? futex_wake+0x80/0x140
    ? do_futex+0x2b9/0xac0
    ? ep_scan_ready_list.constprop.22+0x1f2/0x210
    ? ep_poll+0x7a/0x430
    __sys_sendmsg+0x47/0x80
    do_syscall_64+0x55/0x100
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
    Signed-off-by: John Hurley
    Reviewed-by: Jakub Kicinski
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Hurley
     

19 Mar, 2019

1 commit

  • [ Upstream commit ecb3dea400d3beaf611ce76ac7a51d4230492cf2 ]

    When adding a new filter to the flower classifier, fl_change() inserts it into
    handle_idr before initializing filter extensions and assigning it a mask.
    Normally this ordering doesn't matter because all flower classifier ops
    callbacks assume rtnl lock protection. However, when a filter has an action
    that doesn't have its kernel module loaded, the rtnl lock is released before
    the call to request_module(). During this time the filter can be accessed by a
    concurrent task before its initialization is completed, which can lead to a
    crash.

    Example case of NULL pointer dereference in concurrent dump:

    Task 1                           Task 2

    tc_new_tfilter()
     fl_change()
      idr_alloc_u32(fnew)
      fl_set_parms()
       tcf_exts_validate()
        tcf_action_init()
         tcf_action_init_1()
          rtnl_unlock()
          request_module()
          ...                        rtnl_lock()
                                     tc_dump_tfilter()
                                      tcf_chain_dump()
                                       fl_walk()
                                        idr_get_next_ul()
                                        tcf_node_dump()
                                         tcf_fill_node()
                                          fl_dump()
                                           mask = &f->mask->key; <- NULL pointer dereference

    Extension initialization and mask assignment don't depend on fnew->handle
    that is allocated by idr_alloc_u32(). Move idr allocation code after action
    creation and mask assignment in fl_change() to prevent concurrent access
    to a not fully initialized filter when the rtnl lock is released to load
    the action module.
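    The reordering is an instance of "initialize fully, then publish". The sketch below models it single-threaded in userspace C: a plain array stands in for the IDR, and struct filter, add_filter() and dump_filter() are hypothetical names. It assumes, as in the kernel, that the publish step happens under the lock readers also take, so no extra memory barriers are modeled.

```c
/* Hypothetical model: a table slot that readers (like a dump) may inspect
 * as soon as a filter is published. Not the kernel's IDR API. */
struct mask { int key; };

struct filter {
	struct mask *mask;
	int handle;
};

#define TABLE_SIZE 16
static struct filter *table[TABLE_SIZE];

/* Reader side: dereferences f->mask, so every published filter must
 * already have its mask assigned. Returns -1 if nothing is published. */
static int dump_filter(int handle)
{
	struct filter *f;

	if (handle < 0 || handle >= TABLE_SIZE)
		return -1;
	f = table[handle];
	return f ? f->mask->key : -1;
}

/* Fixed ordering: finish all initialization (which in the kernel may drop
 * the lock to load a module), and only then publish the filter. */
static int add_filter(struct filter *f, struct mask *m, int handle)
{
	if (handle < 0 || handle >= TABLE_SIZE || table[handle])
		return -1;
	f->mask = m;        /* step 1: complete initialization */
	f->handle = handle;
	table[handle] = f;  /* step 2: make it visible to readers */
	return 0;
}
```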

    Fixes: 01683a146999 ("net: sched: refactor flower walk to iterate over idr")
    Signed-off-by: Vlad Buslov
    Reviewed-by: Roi Dayan
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vlad Buslov
     

10 Mar, 2019

5 commits

  • [ Upstream commit a3df633a3c92bb96b06552c3f828d7c267774379 ]

    Metadata pointer is only initialized for action TCA_TUNNEL_KEY_ACT_SET, but
    it is unconditionally dereferenced in tunnel_key_init() error handler.
    Verify that metadata pointer is not NULL before dereferencing it in
    tunnel_key_init error handling code.
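    The shape of this fix (a pointer that is only set on one branch must be NULL-checked on the shared error path) can be sketched as follows. This is a hypothetical userspace model: enum tk_action, struct metadata, metadata_put() and tunnel_key_init_model() are illustrative names, and metadata_put() deliberately dereferences its argument so that the NULL check matters.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the tunnel_key action types. */
enum tk_action { TK_ACT_SET = 1, TK_ACT_RELEASE = 2 };

struct metadata {
	int refcnt;
};

static int freed_count;  /* counts releases, for the usage check */

/* This model helper dereferences md, so calling it with NULL would crash. */
static void metadata_put(struct metadata *md)
{
	if (--md->refcnt == 0) {
		free(md);
		freed_count++;
	}
}

/* fail_later simulates an error in a step after metadata allocation. */
static int tunnel_key_init_model(enum tk_action act, int fail_later,
				 struct metadata **out)
{
	struct metadata *metadata = NULL;  /* only set for TK_ACT_SET */
	int ret;

	if (act == TK_ACT_SET) {
		metadata = calloc(1, sizeof(*metadata));
		if (!metadata)
			return -ENOMEM;
		metadata->refcnt = 1;
	}

	if (fail_later) {
		ret = -EINVAL;
		goto err_out;
	}

	*out = metadata;  /* success: ownership passes to the caller */
	return 0;

err_out:
	/* The check the fix adds: metadata may legitimately be NULL here. */
	if (metadata)
		metadata_put(metadata);
	return ret;
}
```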

    Fixes: ee28bb56ac5b ("net/sched: fix memory leak in act_tunnel_key_init()")
    Signed-off-by: Vlad Buslov
    Reviewed-by: Davide Caratti
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vlad Buslov
     
  • [ Upstream commit 6191da98062d25276a3b88fb2a94dcbcfb3ea65d ]

    When act_skbedit was converted to use RCU in the data plane, we added an
    error path, but we forgot to drop the action refcount in case of failure
    during a 'replace' operation:

    # tc actions add action skbedit ptype otherhost pass index 100
    # tc action show action skbedit
    total acts 1

    action order 0: skbedit ptype otherhost pass
    index 100 ref 1 bind 0
    # tc actions replace action skbedit ptype otherhost drop index 100
    RTNETLINK answers: Cannot allocate memory
    We have an error talking to the kernel
    # tc action show action skbedit
    total acts 1

    action order 0: skbedit ptype otherhost pass
    index 100 ref 2 bind 0

    Ensure we call tcf_idr_release(), in case 'params_new' allocation failed,
    also when the action is being replaced.
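    The refcount leak shown above ("ref 2" after a failed replace) comes from an error path that forgets to undo a reference taken at lookup. The sketch below is a hypothetical userspace model, not act_skbedit: struct action, action_hold(), action_release() and action_replace() are illustrative names, and dropping the lookup reference at the end of a successful replace is a modeling simplification.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of an existing action with a reference count and a
 * heap-allocated parameter block (standing in for 'params_new'). */
struct action {
	int refcnt;
	int *params;
};

static void action_hold(struct action *a) { a->refcnt++; }
static void action_release(struct action *a) { a->refcnt--; }

/* alloc_ok simulates whether allocating the new parameters succeeds. */
static int action_replace(struct action *a, int value, bool alloc_ok)
{
	int *params_new;

	action_hold(a);  /* looking up the existing action takes a reference */

	params_new = alloc_ok ? malloc(sizeof(*params_new)) : NULL;
	if (!params_new) {
		/* The fix: drop the reference on this error path too,
		 * not only when the action was newly created. */
		action_release(a);
		return -ENOMEM;
	}

	*params_new = value;
	free(a->params);
	a->params = params_new;
	action_release(a);  /* model simplification: done with the lookup ref */
	return 0;
}
```

    In this model, a failed replace leaves the count where it started, which corresponds to the "ref 1" a user would expect to see instead of "ref 2".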

    Fixes: c749cdda9089 ("net/sched: act_skbedit: don't use spinlock in the data path")
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     
  • [ Upstream commit 8f67c90ee9148eab3d2b4393c3cf76489b27f87c ]

    After commit 4e8ddd7f1758 ("net: sched: don't release reference on action
    overwrite"), the error path of all actions was converted to drop refcount
    also when the action was being overwritten. But we forgot act_ipt_init(),
    in case allocation of 'tname' was not successful:

    # tc action add action xt -j LOG --log-prefix hello index 100
    tablename: mangle hook: NF_IP_POST_ROUTING
    target: LOG level warning prefix "hello" index 100
    # tc action show action xt
    total acts 1

    action order 0: tablename: mangle hook: NF_IP_POST_ROUTING
    target LOG level warning prefix "hello"
    index 100 ref 1 bind 0
    # tc action replace action xt -j LOG --log-prefix world index 100
    tablename: mangle hook: NF_IP_POST_ROUTING
    target: LOG level warning prefix "world" index 100
    RTNETLINK answers: Cannot allocate memory
    We have an error talking to the kernel
    # tc action show action xt
    total acts 1

    action order 0: tablename: mangle hook: NF_IP_POST_ROUTING
    target LOG level warning prefix "hello"
    index 100 ref 2 bind 0

    Ensure we call tcf_idr_release(), in case 'tname' allocation failed, also
    when the action is being replaced.

    Fixes: 4e8ddd7f1758 ("net: sched: don't release reference on action overwrite")
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     
  • [ Upstream commit 5845f706388a4cde0f6b80f9e5d33527e942b7d9 ]

    It can be reproduced by the following steps:
    1. virtio_net NIC is configured with gso/tso on
    2. configure nginx as http server with an index file bigger than 1M bytes
    3. use tc netem to produce duplicate packets and delay:
    tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
    4. continually curl the nginx http server to get index file on client
    5. BUG_ON is seen quickly

    [10258690.371129] kernel BUG at net/core/skbuff.c:4028!
    [10258690.371748] invalid opcode: 0000 [#1] SMP PTI
    [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2
    [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
    [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
    [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
    [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
    [10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
    [10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
    [10258690.372094] Call Trace:
    [10258690.372094]
    [10258690.372094] skb_to_sgvec+0x11/0x40
    [10258690.372094] start_xmit+0x38c/0x520 [virtio_net]
    [10258690.372094] dev_hard_start_xmit+0x9b/0x200
    [10258690.372094] sch_direct_xmit+0xff/0x260
    [10258690.372094] __qdisc_run+0x15e/0x4e0
    [10258690.372094] net_tx_action+0x137/0x210
    [10258690.372094] __do_softirq+0xd6/0x2a9
    [10258690.372094] irq_exit+0xde/0xf0
    [10258690.372094] smp_apic_timer_interrupt+0x74/0x140
    [10258690.372094] apic_timer_interrupt+0xf/0x20
    [10258690.372094]

    In __skb_to_sgvec(), skb->len is not equal to the sum of the skb's
    linear and nonlinear data sizes, so the BUG_ON() triggers. This happens
    because the skb is cloned and part of its nonlinear data is split off.

    The duplicate packet is cloned in netem_enqueue() and may be delayed for
    some time in the qdisc. When the qdisc length reaches the limit and
    NET_XMIT_DROP is returned, the skb will be retransmitted later from the
    write queue. The skb is then fragmented by tso_fragment(): as the size
    limit, which depends on cwnd and MSS, decreases, the skb's nonlinear
    data is split off. The length of the skb cloned by netem is not updated,
    so when a virtio_net NIC invokes skb_to_sgvec(), the BUG_ON() triggers.

    To fix it, netem returns NET_XMIT_SUCCESS to upper stack
    when it clones a duplicate packet.
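    The decision can be modeled as choosing the status reported upward. This is a hypothetical sketch: netem_enqueue_result() is an illustrative name, and only the numeric values of NET_XMIT_SUCCESS (0) and NET_XMIT_DROP (1) are taken from the kernel. Once a duplicate has been cloned and queued, a later drop of the original is reported as success, so the upper stack does not retransmit (and re-fragment) data the queued clone still shares.

```c
#include <stdbool.h>

/* Values mirror the kernel's NET_XMIT_SUCCESS and NET_XMIT_DROP. */
enum { XMIT_SUCCESS = 0, XMIT_DROP = 1 };

/* Hypothetical model of the status a netem-like enqueue reports when it
 * drops the original packet (queue_full) or queues it normally. */
static int netem_enqueue_result(bool duplicated, bool queue_full)
{
	/* If a duplicate was already queued, hide the drop from the caller. */
	int rc_drop = duplicated ? XMIT_SUCCESS : XMIT_DROP;

	if (queue_full)
		return rc_drop;
	return XMIT_SUCCESS;
}
```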

    Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()")
    Signed-off-by: Sheng Lan
    Reported-by: Qin Ji
    Suggested-by: Eric Dumazet
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sheng Lan
     
  • [ Upstream commit 46b1c18f9deb326a7e18348e668e4c7ab7c7458b ]

    In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
    John made the assumption that the data path had no need to read
    the qdisc qlen (number of packets in the qdisc).

    It is true when pfifo_fast is used as the root qdisc, or as a direct MQ/MQPRIO
    child.

    But pfifo_fast can be used as a leaf in classful qdiscs, and existing
    logic needs to access the child qlen in an efficient way.

    HTB breaks badly, since it uses cl->leaf.q->q.qlen in:
    htb_activate() -> WARN_ON()
    htb_dequeue_tree() to decide if a class can be htb_deactivated
    when it has no more packets.

    HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
    qdisc_tree_reduce_backlog() also read q.qlen directly.

    Using qdisc_qlen_sum() (which iterates over all possible cpus)
    in the data path is a non-starter.

    It seems we have to put back qlen in a central location,
    at least for stable kernels.

    For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
    so the existing q.qlen{++|--} are correct.

    For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
    because the spinlock might not be held (for example from
    pfifo_fast_enqueue() and pfifo_fast_dequeue()).

    This patch adds atomic_qlen (in the same location as qlen)
    and renames the following helpers, since we want to express
    that they can be used without the qdisc lock, and that qlen is no longer percpu.

    - qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
    - qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()
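    A userspace analogue of a single, atomically updated queue length can be written with C11 atomics. This sketch borrows the helper names from the list above, but struct qdisc_model, qdisc_child_qlen() and the function bodies are illustrative assumptions; the commit places atomic_qlen in the same location as qlen, which this model does not attempt to reproduce.

```c
#include <stdatomic.h>

/* Illustrative model only: one central counter that lockless enqueue and
 * dequeue paths update without holding the qdisc lock. */
struct qdisc_model {
	atomic_uint atomic_qlen;
};

static void qdisc_qstats_atomic_qlen_inc(struct qdisc_model *q)
{
	atomic_fetch_add(&q->atomic_qlen, 1);
}

static void qdisc_qstats_atomic_qlen_dec(struct qdisc_model *q)
{
	atomic_fetch_sub(&q->atomic_qlen, 1);
}

/* A classful parent (e.g. HTB) can read one value cheaply instead of
 * summing per-CPU counters in the data path. */
static unsigned int qdisc_child_qlen(struct qdisc_model *q)
{
	return atomic_load(&q->atomic_qlen);
}
```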

    Later (net-next) we might revert this patch by tracking all these
    qlen uses and replacing them with a more efficient method (not having
    to access a precise qlen, but an empty/non_empty status that might
    be less expensive to maintain/track).

    Another possibility is to have a legacy pfifo_fast version that would
    be used for child qdiscs, since the parent qdisc needs
    a spinlock anyway. But then, future lockless qdiscs would also
    have the same problem.

    Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
    Signed-off-by: Eric Dumazet
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: Cong Wang
    Cc: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

27 Feb, 2019

3 commits

  • [ Upstream commit 1db817e75f5b9387b8db11e37d5f0624eb9223e0 ]

    struct tcindex_filter_result contains two parts:
    struct tcf_exts and struct tcf_result.

    For the local variable 'cr', its exts part is never used but is
    initialized without being released properly on the success path. So
    just completely remove the exts part to fix this leak.

    For the local variable 'new_filter_result', it is never properly
    released on the success path if it is not used by 'r'.

    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 033b228e7f26b29ae37f8bfa1bc6b209a5365e9f ]

    When tcindex_destroy() destroys all the filter results in
    the perfect hash table, it invokes the walker to delete
    each of them. However, results with class==0 are skipped
    in either tcindex_walk() or tcindex_delete(), which causes
    a memory leak reported by kmemleak.

    This patch fixes it by skipping the walker and directly
    deleting these filter results so we don't miss any filter
    result.

    As a result of this change, we have to initialize exts->net
    properly in tcindex_alloc_perfect_hash(). For net-next, we
    need to consider whether we should initialize ->net in
    tcf_exts_init() instead; until then, we just test
    CONFIG_NET_CLS_ACT=y directly.
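    The difference between a filtering walker and direct destruction can be modeled over a plain array standing in for the perfect hash table. This is a hypothetical sketch: struct filter_result, walk_and_delete() and destroy_all() are illustrative names, and 'exts' merely stands for resources each result owns.

```c
#include <stdlib.h>

/* Hypothetical model of one entry in a perfect hash table of results. */
struct filter_result {
	unsigned int classid;  /* 0 means "no class bound" */
	int *exts;             /* resources owned by the result */
};

/* Old approach (modeled): a walker that skips results with classid == 0,
 * so their resources are never freed. Returns how many were freed. */
static int walk_and_delete(struct filter_result *perfect, int n)
{
	int freed = 0;

	for (int i = 0; i < n; i++) {
		if (perfect[i].classid == 0)
			continue;  /* skipped: this entry leaks */
		free(perfect[i].exts);
		perfect[i].exts = NULL;
		freed++;
	}
	return freed;
}

/* Fixed approach: destroy every entry directly, regardless of classid. */
static int destroy_all(struct filter_result *perfect, int n)
{
	int freed = 0;

	for (int i = 0; i < n; i++) {
		if (perfect[i].exts) {
			free(perfect[i].exts);
			perfect[i].exts = NULL;
			freed++;
		}
	}
	return freed;
}
```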

    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 8015d93ebd27484418d4952284fd02172fa4b0b2 ]

    tcindex_destroy() invokes tcindex_destroy_element() via
    a walker to delete each filter result in its perfect hash
    table, and tcindex_destroy_element() calls tcindex_delete()
    which schedules tcf RCU works to do the final deletion work.
    Unfortunately this races with the RCU callback
    __tcindex_destroy(), which could lead to use-after-free as
    reported by Adrian.

    Fix this by migrating this RCU callback to tcf RCU work too:
    as that workqueue is ordered, the use-after-free can no longer happen.

    Note, we don't need to hold netns refcnt because we don't call
    tcf_exts_destroy() here.

    Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter")
    Reported-by: Adrian
    Cc: Ben Hutchings
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

31 Jan, 2019

3 commits

  • [ Upstream commit 2cddd20147826aef283115abb00012d4dafe3cdb ]

    Recent changes (especially 05cd271fd61a ("cls_flower: Support multiple
    masks per priority")) have grown the fl_flow_mask structure: its
    current size, e.g. on x86_64 with defconfig, is 760 bytes, and more than
    1024 bytes with some debug options enabled. Prior to that commit
    its size was 176 bytes (using defconfig on x86_64).
    Given this, it is reasonable to allocate this structure
    dynamically in fl_change() to reduce its stack usage.
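    Moving a large temporary from the stack to the heap looks roughly like this in userspace C. It is a sketch under stated assumptions: struct big_mask and change_with_heap_mask() are hypothetical, the 760-byte size echoes the figure above, and calloc(1, size) plays the role kzalloc() plays in the kernel (one zeroed object).

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a large structure such as fl_flow_mask. */
struct big_mask {
	unsigned char key[760];
};

/* Allocate the temporary on the heap instead of the stack. */
static int change_with_heap_mask(unsigned char first_byte)
{
	struct big_mask *mask = calloc(1, sizeof(*mask));
	int ret;

	if (!mask)
		return -ENOMEM;
	mask->key[0] = first_byte;          /* stands in for parsing the mask */
	ret = mask->key[0] + mask->key[1];  /* key[1] is zero from calloc() */
	free(mask);                         /* every exit path must free it */
	return ret;
}
```

    The trade-off is a new failure mode (-ENOMEM) and an obligation to free on every exit path, in exchange for a much smaller stack frame.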

    v2:
    - use kzalloc() instead of kcalloc()

    Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
    Cc: Jiri Pirko
    Cc: Paul Blakey
    Acked-by: Jiri Pirko
    Signed-off-by: Ivan Vecera
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ivan Vecera
     
  • [ Upstream commit cd0c4e70fc0ccfa705cdf55efb27519ce9337a26 ]

    Martin reported that a set of filters doesn't work after changing
    from reclassify to continue. Looking into the code, it
    seems the skb protocol is not always fetched for each
    iteration of the filters. But, as demonstrated by Martin,
    TC actions such as act_vlan can modify skb->protocol,
    which means we have to refetch the skb protocol in each iteration,
    rather than using the one we fetched at the beginning of the loop.
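    The fix amounts to reading the protocol inside the loop rather than before it. The sketch below is a hypothetical userspace model of a classification loop, not tcf_classify(): struct pkt, struct filter_model and classify() are illustrative names, and an action that rewrites the protocol stands in for act_vlan.

```c
#include <stdint.h>

/* Hypothetical model of a packet whose protocol an action may rewrite. */
struct pkt { uint16_t protocol; };

struct filter_model {
	uint16_t match_protocol;  /* 0 matches any protocol */
	uint16_t set_protocol;    /* nonzero: the action rewrites protocol */
	int result;               /* returned when this filter matches */
	int cont;                 /* nonzero: continue to the next filter */
};

static int classify(struct pkt *p, const struct filter_model *f, int n)
{
	for (int i = 0; i < n; i++) {
		/* Refetch on every iteration: an earlier action may have
		 * changed the protocol. */
		uint16_t protocol = p->protocol;

		if (f[i].match_protocol && f[i].match_protocol != protocol)
			continue;
		if (f[i].set_protocol)
			p->protocol = f[i].set_protocol;
		if (!f[i].cont)
			return f[i].result;
	}
	return -1;  /* no match */
}
```

    For example, with a first filter that matches 0x8100 and rewrites the protocol to 0x0800, a second filter matching 0x0800 is only reached because the protocol is re-read on the second iteration.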

    This bug was _not_ introduced by commit 3b3ae880266d
    ("net: sched: consolidate tc_classify{,_compat}"); technically,
    if act_vlan is the only action that modifies skb protocol, then
    it is commit c7e2b9689ef8 ("sched: introduce vlan action") which
    introduced this bug.

    Reported-by: Martin Olsson
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 9174c3df1cd181c14913138d50ccbe539bb08335 ]

    Running the following TDC test cases:

    7afc - Replace tunnel_key set action with all parameters
    364d - Replace tunnel_key set action with all parameters and cookie

    it's possible to trigger kmemleak warnings like:

    unreferenced object 0xffff94797127ab40 (size 192):
    comm "tc", pid 3248, jiffies 4300565293 (age 1006.862s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 c0 93 f9 8a ff ff ff ff ................
    41 84 ee 89 ff ff ff ff 00 00 00 00 00 00 00 00 A...............
    backtrace:
    tunnel_key_init+0x31d/0x820 [act_tunnel_key]
    tcf_action_init_1+0x384/0x4c0
    tcf_action_init+0x12b/0x1a0
    tcf_action_add+0x73/0x170
    tc_ctl_action+0x122/0x160
    rtnetlink_rcv_msg+0x263/0x2d0
    netlink_rcv_skb+0x4a/0x110
    netlink_unicast+0x1a0/0x250
    netlink_sendmsg+0x2c1/0x3c0
    sock_sendmsg+0x36/0x40
    ___sys_sendmsg+0x280/0x2f0
    __sys_sendmsg+0x5e/0xa0
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    0xffffffffffffffff

    When the tunnel_key action is replaced, the kernel forgets to release the
    dst metadata: ensure it is released by tunnel_key_init(), the same way
    it's done in tunnel_key_release().

    Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     

17 Dec, 2018

1 commit

  • [ Upstream commit 9410d386d0a829ace9558336263086c2fbbe8aed ]

    __qdisc_drop_all() accesses skb->prev to get to the tail of the
    segment-list.

    With commit 68d2f84a1368 ("net: gro: properly remove skb from list")
    the skb-list handling has been changed to set skb->next to NULL and set
    the list-poison on skb->prev.

    With that change, __qdisc_drop_all() will panic when it tries to
    dereference skb->prev.

    Since commit 992cba7e276d ("net: Add and use skb_list_del_init().")
    __list_del_entry is used, leaving skb->prev unchanged (thus,
    pointing to the list-head if it's the first skb of the list).
    This will make __qdisc_drop_all modify the next-pointer of the list-head
    and result in a panic later on:

    [ 34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
    [ 34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
    [ 34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
    [ 34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
    [ 34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 0f b6 04 30 84 c0 74 08 3c 04
    [ 34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
    [ 34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
    [ 34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
    [ 34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
    [ 34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
    [ 34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
    [ 34.512082] FS: 0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
    [ 34.513036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
    [ 34.514593] Call Trace:
    [ 34.514893]
    [ 34.515157] napi_gro_receive+0x93/0x150
    [ 34.515632] receive_buf+0x893/0x3700
    [ 34.516094] ? __netif_receive_skb+0x1f/0x1a0
    [ 34.516629] ? virtnet_probe+0x1b40/0x1b40
    [ 34.517153] ? __stable_node_chain+0x4d0/0x850
    [ 34.517684] ? kfree+0x9a/0x180
    [ 34.518067] ? __kasan_slab_free+0x171/0x190
    [ 34.518582] ? detach_buf+0x1df/0x650
    [ 34.519061] ? lapic_next_event+0x5a/0x90
    [ 34.519539] ? virtqueue_get_buf_ctx+0x280/0x7f0
    [ 34.520093] virtnet_poll+0x2df/0xd60
    [ 34.520533] ? receive_buf+0x3700/0x3700
    [ 34.521027] ? qdisc_watchdog_schedule_ns+0xd5/0x140
    [ 34.521631] ? htb_dequeue+0x1817/0x25f0
    [ 34.522107] ? sch_direct_xmit+0x142/0xf30
    [ 34.522595] ? virtqueue_napi_schedule+0x26/0x30
    [ 34.523155] net_rx_action+0x2f6/0xc50
    [ 34.523601] ? napi_complete_done+0x2f0/0x2f0
    [ 34.524126] ? kasan_check_read+0x11/0x20
    [ 34.524608] ? _raw_spin_lock+0x7d/0xd0
    [ 34.525070] ? _raw_spin_lock_bh+0xd0/0xd0
    [ 34.525563] ? kvm_guest_apic_eoi_write+0x6b/0x80
    [ 34.526130] ? apic_ack_irq+0x9e/0xe0
    [ 34.526567] __do_softirq+0x188/0x4b5
    [ 34.527015] irq_exit+0x151/0x180
    [ 34.527417] do_IRQ+0xdb/0x150
    [ 34.527783] common_interrupt+0xf/0xf
    [ 34.528223]

    This patch makes sure that skb->prev is set to NULL when entering
    netem_enqueue.
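    The hazard can be modeled with a minimal segment list and a function shaped like the kernel's __qdisc_drop_all(), which follows 'prev' to find the tail of a segment list. This is a userspace sketch: struct seg, drop_all() and enqueue_entry() are hypothetical names. Clearing a stale 'prev' on entry means drop_all() never writes through a pointer into some unrelated list head.

```c
#include <stddef.h>

/* Minimal model of an skb segment list: for a segmented packet, 'prev' on
 * the first segment points at the last one. Names are illustrative. */
struct seg {
	struct seg *next;
	struct seg *prev;
};

/* Modeled on __qdisc_drop_all(): chain the whole segment list onto the
 * to_free list, using 'prev' to reach the tail when it is set. */
static void drop_all(struct seg *skb, struct seg **to_free)
{
	if (skb->prev)
		skb->prev->next = *to_free;  /* tail -> existing free list */
	else
		skb->next = *to_free;
	*to_free = skb;
}

/* The fix, modeled: clear any stale 'prev' left by list handling when a
 * packet enters the qdisc, so drop_all() never follows it. */
static void enqueue_entry(struct seg *skb)
{
	skb->prev = NULL;
}
```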

    Cc: Prashant Bhole
    Cc: Tyler Hicks
    Cc: Eric Dumazet
    Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list")
    Suggested-by: Eric Dumazet
    Signed-off-by: Christoph Paasch
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Christoph Paasch
     

23 Nov, 2018

2 commits

  • [ Upstream commit 63c82997f5c0f3e1b914af43d82f712a86bc5f3a ]

    TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
    currently contain further nested attributes, which are parsed by
    hand, so the policy is never actually used resulting in a W=1
    build warning:

    net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=]
    enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {

    Add the validation anyway to avoid potential bugs when other
    attributes are added and to make the attribute structure slightly
    clearer. Validation will also set extack to point to the bad
    attribute on error.

    Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
    Signed-off-by: Jakub Kicinski
    Acked-by: Simon Horman
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     
  • [ Upstream commit 19ab69107d3ecfb7cd3e38ad262a881be40c01a3 ]

    tcf_idr_check_alloc() can return a negative value, on allocation failures
    (-ENOMEM) or IDR exhaustion (-ENOSPC): don't leak keys_ex in these cases.

    Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     

14 Nov, 2018

1 commit

  • commit e72bde6b66299602087c8c2350d36a525e75d06e upstream.

    Marco reported an error with hfsc:
    root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
    Error: Attribute failed policy validation.

    Apparently a few implementations pass TCA_OPTIONS as a binary attribute
    instead of a nested one, so drop TCA_OPTIONS from the policy.

    Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes")
    Reported-by: Marco Berizzi
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     

04 Nov, 2018

1 commit

  • [ Upstream commit 38b4f18d56372e1e21771ab7b0357b853330186c ]

    gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute,
    and expects it will be able to interpret its contents as
    struct tc_gred_sopt. Pass the correct gred attribute, instead of
    TCA_OPTIONS.

    This bug meant the table definition could never be changed after
    Qdisc was initialized (unless whatever TCA_OPTIONS contained both
    passed netlink validation and was a valid struct tc_gred_sopt...).

    Old behaviour:
    $ ip link add type dummy
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0
    RTNETLINK answers: Invalid argument

    Now:
    $ ip link add type dummy
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0
    $ tc qdisc replace dev dummy0 parent root handle 7: \
    gred setup vqs 4 default 0

    Fixes: f62d6b936df5 ("[PKT_SCHED]: GRED: Use central VQ change procedure")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     

19 Oct, 2018

1 commit

  • When dumping classes by parent, kernel would return classes twice:

    | # tc qdisc add dev lo root prio
    | # tc class show dev lo
    | class prio 8001:1 parent 8001:
    | class prio 8001:2 parent 8001:
    | class prio 8001:3 parent 8001:
    | # tc class show dev lo parent 8001:
    | class prio 8001:1 parent 8001:
    | class prio 8001:2 parent 8001:
    | class prio 8001:3 parent 8001:
    | class prio 8001:1 parent 8001:
    | class prio 8001:2 parent 8001:
    | class prio 8001:3 parent 8001:

    This comes from qdisc_match_from_root() potentially returning the root
    qdisc itself if its handle matched. Though in that case, root's classes
    were already dumped a few lines above.

    Fixes: cb395b2010879 ("net: sched: optimize class dumps")
    Signed-off-by: Phil Sutter
    Reviewed-by: Jiri Pirko
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Phil Sutter
     

16 Oct, 2018

1 commit

  • Similarly to what has been done in 8b4c3cdd9dd8 ("net: sched: Add policy
    validation for tc attributes"), fix classifier code to add validation of
    TCA_CHAIN and TCA_KIND netlink attributes.

    tested with:
    # ./tdc.py -c filter

    v2: Let sch_api and cls_api share nla_policy they have in common, thanks
    to David Ahern.
    v3: Avoid EXPORT_SYMBOL(), as validation of those attributes is not done
    by TC modules, thanks to Cong Wang.
    While at it, restore the 'Delete / get qdisc' comment to its original
    position, just above the tc_get_qdisc() function prototype.

    Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters")
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
     

12 Oct, 2018

2 commits

  • David writes:
    "Networking

    1) RXRPC receive path fixes from David Howells.

    2) Re-export __skb_recv_udp(), from Jiri Kosina.

    3) Fix refcounting in u32 classifier, from Al Viro.

    4) Userspace netlink ABI fixes from Eugene Syromiatnikov.

    5) Don't double iounmap on rmmod in ena driver, from Arthur
    Kiyanovski.

    6) Fix devlink string attribute handling: we must pull a copy into a
    kernel buffer if the lifetime extends past the netlink request.
    From Moshe Shemesh.

    7) Fix hangs in RDS, from Ka-Cheong Poon.

    8) Fix recursive locking lockdep warnings in tipc, from Ying Xue.

    9) Clear RX irq correctly in socionext, from Ilias Apalodimas.

    10) bcm_sf2 fixes from Florian Fainelli."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
    net: dsa: bcm_sf2: Call setup during switch resume
    net: dsa: bcm_sf2: Fix unbind ordering
    net: phy: sfp: remove sfp_mutex's definition
    r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips
    net: socionext: clear rx irq correctly
    net/mlx4_core: Fix warnings during boot on driverinit param set failures
    tipc: eliminate possible recursive locking detected by LOCKDEP
    selftests: udpgso_bench.sh explicitly requires bash
    selftests: rtnetlink.sh explicitly requires bash.
    qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface
    tipc: queue socket protocol error messages into socket receive buffer
    tipc: set link tolerance correctly in broadcast link
    net: ipv4: don't let PMTU updates increase route MTU
    net: ipv4: update fnhe_pmtu when first hop's MTU changes
    net/ipv6: stop leaking percpu memory in fib6 info
    rds: RDS (tcp) hangs on sendto() to unresponding address
    net: make skb_partial_csum_set() more robust against overflows
    devlink: Add helper function for safely copy string param
    devlink: Fix param cmode driverinit for string type
    devlink: Fix param set handling for string type
    ...

    Greg Kroah-Hartman
     
  • Kees writes:
    "Fix open-coded multiplication arguments to allocators

    - Fixes several new open-coded multiplications added in the 4.19
    merge window."

    * tag 'alloc-args-v4.19-rc8' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    treewide: Replace more open-coded allocation size multiplications

    Greg Kroah-Hartman
     

08 Oct, 2018

1 commit

  • cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references
    via ->hlist and via ->tp_root together. u32_destroy() drops the former
    and, in case when there had been links, leaves the sucker on the list.
    As a result, there's nothing to protect it from getting freed once links
    are dropped.
    That also makes the "is it busy" check incapable of catching the root
    hnode - it *is* busy (there's a reference from tp), but we don't see it as
    something separate. "Is it our root?" check partially covers that, but
    the problem exists for others' roots as well.

    AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
    include oopsen, that is) would be this:
    * count tp->root and tp_c->hlist as separate references. I.e.
    have u32_init() set refcount to 2, not 1.
    * in u32_destroy() we always drop the former;
    in u32_destroy_hnode() - the latter.

    That way we have *all* references contributing to refcount. List
    removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
    and in u32_destroy() in case of tc_u_common going away, along with
    everything reachable from it. IOW, that way we know that
    u32_destroy_key() won't free something still on the list (or pointed to by
    someone's ->root).

    Reproducer:

    tc qdisc add dev eth0 ingress
    tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: \
    u32 divisor 1
    tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: \
    u32 divisor 1
    tc filter add dev eth0 parent ffff: protocol ip prio 100 \
    handle 1:0:11 u32 ht 1: link 801: offset at 0 mask 0f00 shift 6 \
    plus 0 eat match ip protocol 6 ff
    tc filter delete dev eth0 parent ffff: protocol ip prio 200
    tc filter change dev eth0 parent ffff: protocol ip prio 100 \
    handle 1:0:11 u32 ht 1: link 0: offset at 0 mask 0f00 shift 6 plus 0 \
    eat match ip protocol 6 ff
    tc filter delete dev eth0 parent ffff: protocol ip prio 100

    Signed-off-by: Al Viro
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Al Viro
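The counting scheme proposed above (one reference for tp->root, one for the hlist, one per link) can be modelled in a few lines. This is a hypothetical userspace sketch, not cls_u32.c: `struct model_hnode` and the `model_*` functions are invented names, and "freed" is a flag rather than a real kfree(). It shows that the node can only reach zero after it has been unlinked from the list.

```c
#include <stdbool.h>

/* Hypothetical model of the reference-counting scheme described above; it is
 * not cls_u32.c. Each reference holder accounts for exactly one count. */
struct model_hnode {
    int refcnt;
    bool on_list;   /* stands in for membership in tp_c->hlist */
    bool freed;
};

/* Like the proposed u32_init(): one reference for tp->root and one for the
 * shared hlist, hence 2 rather than 1. */
static void model_init(struct model_hnode *h)
{
    h->refcnt = 2;
    h->on_list = true;
    h->freed = false;
}

static void model_put(struct model_hnode *h)
{
    /* Only reachable once every holder, including the list, let go. */
    if (--h->refcnt == 0)
        h->freed = true;
}

/* A link from another hnode ("link 801:") takes an extra reference. */
static void model_link(struct model_hnode *h)
{
    h->refcnt++;
}

/* Like the proposed u32_destroy(): always drops the tp->root reference. */
static void model_destroy_root(struct model_hnode *h)
{
    model_put(h);
}

/* Like u32_destroy_hnode(): only allowed when the list holds the last
 * reference; unlinks first, then drops the list's reference. */
static bool model_destroy_hnode(struct model_hnode *h)
{
    if (h->refcnt != 1)
        return false;   /* still busy, e.g. still someone's root */
    h->on_list = false;
    model_put(h);
    return true;
}
```

Because the root reference is counted separately, the "is it busy" check now sees a node that is still some tp's root as busy.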
     

06 Oct, 2018

2 commits

  • As done treewide earlier, this catches several more open-coded
    allocation size calculations that were added to the kernel during the
    merge window. This performs the following mechanical transformations
    using Coccinelle:

    kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...)
    kvzalloc(a * b, ...) -> kvcalloc(a, b, ...)
    devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...)

    Signed-off-by: Kees Cook

    Kees Cook
     
  • A number of TC attributes are processed without proper validation
    (e.g., length checks). Add a tca policy for all input attributes and use
    it when invoking nlmsg_parse().

    The two Fixes tags below cover the latest additions. The other attributes
    are either a string (KIND), a nested attribute (OPTIONS, which does seem
    to be validated in most cases), used for dumps only, or a flag.

    Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters")
    Fixes: d47a6b0e7c492 ("net: sched: introduce ingress/egress block index attributes for qdisc")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
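The Coccinelle transformations in the first commit above replace `a * b` size arguments with helpers that take the two factors separately, so the product can be checked for overflow before allocating. A userspace sketch of that check follows; `model_alloc_array()` is a hypothetical name, and the kernel's helpers may implement the check differently depending on the version.

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace sketch of what kvmalloc_array()-style helpers add over an
 * open-coded "a * b": the multiplication is checked before allocating.
 * model_alloc_array() is a hypothetical name, not a kernel function. */
static void *model_alloc_array(size_t n, size_t size)
{
    /* If n * size would wrap past SIZE_MAX, the open-coded product would
     * silently yield a too-small buffer; refuse instead. */
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return malloc(n * size);
}
```

An open-coded `malloc(n * size)` with a huge `n` could succeed with a small, wrapped size; the checked variant returns NULL instead.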
     

02 Oct, 2018

1 commit

  • If "td->u.target_size" is larger than sizeof(struct xt_entry_target) we
    return -EINVAL. But we don't check whether it's smaller than
    sizeof(struct xt_entry_target) and that could lead to an out of bounds
    read.

    Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
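A sketch of the two-sided bound the message calls for, assuming a simplified header type. `struct model_target_hdr` is a hypothetical stand-in for `struct xt_entry_target` (its layout here is invented), and the actual patch may express the check differently; the point is that a self-described size must be bounded from below as well as from above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xt_entry_target: a header
 * whose target_size field claims the total size, header included. */
struct model_target_hdr {
    uint16_t target_size;
    char name[30];
};

/* Checks a user-supplied target blob of 'attr_len' bytes. */
static int model_check_target(const struct model_target_hdr *t,
                              size_t attr_len)
{
    if (attr_len < sizeof(*t))
        return -EINVAL;
    /* Upper bound: never trust a size larger than what was received. */
    if (t->target_size > attr_len)
        return -EINVAL;
    /* Lower bound: a size smaller than the header itself lets later code
     * compute lengths that underflow and read out of bounds. */
    if (t->target_size < sizeof(*t))
        return -EINVAL;
    return 0;
}
```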
     

14 Sep, 2018

2 commits

  • Matteo reported the following splat while testing the datapath of TC 'sample':

    BUG: KASAN: null-ptr-deref in tcf_sample_act+0xc4/0x310
    Read of size 8 at addr 0000000000000000 by task nc/433

    CPU: 0 PID: 433 Comm: nc Not tainted 4.19.0-rc3-kvm #17
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
    Call Trace:
    kasan_report.cold.6+0x6c/0x2fa
    tcf_sample_act+0xc4/0x310
    ? dev_hard_start_xmit+0x117/0x180
    tcf_action_exec+0xa3/0x160
    tcf_classify+0xdd/0x1d0
    htb_enqueue+0x18e/0x6b0
    ? deref_stack_reg+0x7a/0xb0
    ? htb_delete+0x4b0/0x4b0
    ? unwind_next_frame+0x819/0x8f0
    ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    __dev_queue_xmit+0x722/0xca0
    ? unwind_get_return_address_ptr+0x50/0x50
    ? netdev_pick_tx+0xe0/0xe0
    ? save_stack+0x8c/0xb0
    ? kasan_kmalloc+0xbe/0xd0
    ? __kmalloc_track_caller+0xe4/0x1c0
    ? __kmalloc_reserve.isra.45+0x24/0x70
    ? __alloc_skb+0xdd/0x2e0
    ? sk_stream_alloc_skb+0x91/0x3b0
    ? tcp_sendmsg_locked+0x71b/0x15a0
    ? tcp_sendmsg+0x22/0x40
    ? __sys_sendto+0x1b0/0x250
    ? __x64_sys_sendto+0x6f/0x80
    ? do_syscall_64+0x5d/0x150
    ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    ? __sys_sendto+0x1b0/0x250
    ? __x64_sys_sendto+0x6f/0x80
    ? do_syscall_64+0x5d/0x150
    ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    ip_finish_output2+0x495/0x590
    ? ip_copy_metadata+0x2e0/0x2e0
    ? skb_gso_validate_network_len+0x6f/0x110
    ? ip_finish_output+0x174/0x280
    __tcp_transmit_skb+0xb17/0x12b0
    ? __tcp_select_window+0x380/0x380
    tcp_write_xmit+0x913/0x1de0
    ? __sk_mem_schedule+0x50/0x80
    tcp_sendmsg_locked+0x49d/0x15a0
    ? tcp_rcv_established+0x8da/0xa30
    ? tcp_set_state+0x220/0x220
    ? clear_user+0x1f/0x50
    ? iov_iter_zero+0x1ae/0x590
    ? __fget_light+0xa0/0xe0
    tcp_sendmsg+0x22/0x40
    __sys_sendto+0x1b0/0x250
    ? __ia32_sys_getpeername+0x40/0x40
    ? _copy_to_user+0x58/0x70
    ? poll_select_copy_remaining+0x176/0x200
    ? __pollwait+0x1c0/0x1c0
    ? ktime_get_ts64+0x11f/0x140
    ? kern_select+0x108/0x150
    ? core_sys_select+0x360/0x360
    ? vfs_read+0x127/0x150
    ? kernel_write+0x90/0x90
    __x64_sys_sendto+0x6f/0x80
    do_syscall_64+0x5d/0x150
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7fefef2b129d
    Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 51 37 0c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
    RSP: 002b:00007fff2f5350c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 000056118d60c120 RCX: 00007fefef2b129d
    RDX: 0000000000002000 RSI: 000056118d629320 RDI: 0000000000000003
    RBP: 000056118d530370 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
    R13: 000056118d5c2a10 R14: 000056118d5c2a10 R15: 000056118d5303b8

    tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init()
    forgot to allocate them because tcf_idr_create() was called with the wrong
    value of 'cpustats'. Setting it to true fixes the reported crash.

    Reported-by: Matteo Croce
    Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
    Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
    Tested-by: Matteo Croce
    Signed-off-by: Davide Caratti
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Davide Caratti
     
  • When we delete a chain of filters, we need to notify
    user space that each filter in this chain is being
    deleted too.

    Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi")
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
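The 'sample' crash in the first commit above comes from a fast path that touches counters the control path never allocated. The hypothetical userspace model below makes that dependency explicit: `model_action_create()` allocates per-CPU counters only when `cpustats` is true, and `model_action_run()` assumes they exist, just as the text says tcf_sample_act() did. None of these names belong to act_api.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of an action whose fast path updates counters that
 * are only allocated when the control path asks for them; not act_api. */
struct model_stats {
    unsigned long packets;
};

struct model_action {
    struct model_stats *stats;   /* NULL unless created with cpustats */
};

static struct model_action *model_action_create(bool cpustats, int ncpus)
{
    struct model_action *a = calloc(1, sizeof(*a));

    if (!a)
        return NULL;
    if (cpustats) {
        a->stats = calloc((size_t)ncpus, sizeof(*a->stats));
        if (!a->stats) {
            free(a);
            return NULL;
        }
    }
    return a;
}

/* Fast path: touches the counters unconditionally, so it relies on the
 * action having been created with cpustats == true. */
static void model_action_run(struct model_action *a, int cpu)
{
    a->stats[cpu].packets++;
}

static void model_action_destroy(struct model_action *a)
{
    free(a->stats);
    free(a);
}
```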
     

08 Sep, 2018

1 commit

  • When nla_put*() fails after nla_nest_start(), we need
    to call nla_nest_cancel() to cancel the nest; otherwise
    we end up calling nla_nest_end() as if it had succeeded.

    Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
    Cc: Davide Caratti
    Cc: Simon Horman
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
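A byte-buffer model of the start/put/cancel pattern described above. Everything here is a hypothetical simplification: `struct model_msg` and the `model_*` helpers are not the kernel's `nla_*` API, and the model omits the nest header the real `nla_nest_start()` reserves. It shows that, on failure, the message is rolled back to where the nest began instead of being closed as if complete.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-buffer model of a netlink message; not the nla_* API. */
struct model_msg {
    unsigned char buf[64];
    size_t len;
};

/* Appends 'n' bytes; fails without modifying the message if out of room. */
static int model_put(struct model_msg *m, const void *data, size_t n)
{
    if (n > sizeof(m->buf) - m->len)
        return -1;
    memcpy(m->buf + m->len, data, n);
    m->len += n;
    return 0;
}

/* Remembers where the nest begins (the real API returns a pointer). */
static size_t model_nest_start(const struct model_msg *m)
{
    return m->len;
}

/* Rolls the message back to the start of the nest, dropping partial data. */
static void model_nest_cancel(struct model_msg *m, size_t start)
{
    m->len = start;
}

/* Writes two nested attributes; on any failure, cancels the whole nest. */
static int model_dump_opts(struct model_msg *m, const void *a, size_t alen,
                           const void *b, size_t blen)
{
    size_t nest = model_nest_start(m);

    if (model_put(m, a, alen) || model_put(m, b, blen)) {
        model_nest_cancel(m, nest);
        return -1;
    }
    /* A real nest end would now patch the nest header's length. */
    return 0;
}
```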
     

06 Sep, 2018

1 commit

  • If users try to install act_tunnel_key 'set' rules with duplicate values
    of 'index', the tunnel metadata are allocated, but never released. Then,
    kmemleak complains as follows:

    # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
    # echo clear > /sys/kernel/debug/kmemleak
    # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
    Error: TC IDR already exists.
    We have an error talking to the kernel
    # echo scan > /sys/kernel/debug/kmemleak
    # cat /sys/kernel/debug/kmemleak
    unreferenced object 0xffff8800574e6c80 (size 256):
    comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff ................
    81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00 .$..............
    backtrace:
    tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
    tcf_action_init_1+0x698/0xac0
    tcf_action_init+0x15c/0x590
    tc_ctl_action+0x336/0x5c2
    rtnetlink_rcv_msg+0x357/0x8e0
    netlink_rcv_skb+0x124/0x350
    netlink_unicast+0x40f/0x5d0
    netlink_sendmsg+0x6e8/0xba0
    sock_sendmsg+0xb3/0xf0
    ___sys_sendmsg+0x654/0x960
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0xa5/0x470
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    0xffffffffffffffff

    This problem can theoretically also happen when users attempt to set up a
    geneve rule with wrong configuration data, or when the kernel fails to
    allocate 'params_new'. Ensure that tunnel_key_init() also releases the
    tunnel metadata in the above conditions.

    Addresses-Coverity-ID: 1373974 ("Resource leak")
    Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
    Fixes: 0ed5269f9e41f ("net/sched: add tunnel option support to act_tunnel_key")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Davide Caratti
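The fix above amounts to routing every failure after the metadata allocation through a common release label. A hypothetical userspace sketch of that error-path pattern follows; `struct model_md`, `model_tunnel_init()` and the `live_md` counter are invented for illustration and are not act_tunnel_key's types. The counter makes a leak observable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the tunnel metadata and the 'params_new'
 * allocation; these are not act_tunnel_key's real types. */
struct model_md {
    int id;
};

struct model_params {
    struct model_md *md;
};

static int live_md;   /* metadata objects allocated but not yet freed */

static struct model_md *model_md_alloc(int id)
{
    struct model_md *md = malloc(sizeof(*md));

    if (md) {
        md->id = id;
        live_md++;
    }
    return md;
}

static void model_md_free(struct model_md *md)
{
    if (md) {
        free(md);
        live_md--;
    }
}

/* 'idr_exists' simulates the "TC IDR already exists" failure. */
static int model_tunnel_init(int id, int idr_exists, struct model_params **out)
{
    struct model_params *p;
    struct model_md *md;
    int err;

    md = model_md_alloc(id);
    if (!md)
        return -ENOMEM;

    if (idr_exists) {
        err = -EEXIST;
        goto release_md;   /* the path that used to leak md */
    }

    p = malloc(sizeof(*p));   /* analogous to allocating 'params_new' */
    if (!p) {
        err = -ENOMEM;
        goto release_md;
    }
    p->md = md;
    *out = p;
    return 0;

release_md:
    model_md_free(md);
    return err;
}
```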
     

05 Sep, 2018

2 commits

  • Recent refactoring of add_metainfo() caused use_all_metadata() to add
    metainfo to the ife action metalist without taking a reference to the
    module. This causes a warning in module_put() called from the ife action
    cleanup function.

    Implement an add_metainfo_and_get_ops() function that returns with a
    reference to the module taken if the metainfo was added successfully, and
    call it from use_all_metadata() instead of calling __add_metainfo() directly.

    Example warning:

    [ 646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230
    [ 646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas
    [ 646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799
    [ 646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
    [ 646.440595] RIP: 0010:module_put+0x1cb/0x230
    [ 646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b
    [ 646.464997] RSP: 0018:ffff880354d37068 EFLAGS: 00010286
    [ 646.470599] RAX: 0000000000000000 RBX: ffffffffc0a52518 RCX: ffffffff8c2668db
    [ 646.478118] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffc0a52518
    [ 646.485641] RBP: ffffffffc0a52180 R08: fffffbfff814a4a4 R09: fffffbfff814a4a3
    [ 646.493164] R10: ffffffffc0a5251b R11: fffffbfff814a4a4 R12: 1ffff1006a9a6e0d
    [ 646.500687] R13: 00000000ffffffff R14: ffff880362bab890 R15: dead000000000100
    [ 646.508213] FS: 00007f4164c99800(0000) GS:ffff88036fe40000(0000) knlGS:0000000000000000
    [ 646.516961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 646.523080] CR2: 00007f41638b8420 CR3: 0000000351df0004 CR4: 00000000001606e0
    [ 646.530595] Call Trace:
    [ 646.533408] ? find_symbol_in_section+0x260/0x260
    [ 646.538509] tcf_ife_cleanup+0x11b/0x200 [act_ife]
    [ 646.543695] tcf_action_cleanup+0x29/0xa0
    [ 646.548078] __tcf_action_put+0x5a/0xb0
    [ 646.552289] ? nla_put+0x65/0xe0
    [ 646.555889] __tcf_idr_release+0x48/0x60
    [ 646.560187] tcf_generic_walker+0x448/0x6b0
    [ 646.564764] ? tcf_action_dump_1+0x450/0x450
    [ 646.569411] ? __lock_is_held+0x84/0x110
    [ 646.573720] ? tcf_ife_walker+0x10c/0x20f [act_ife]
    [ 646.578982] tca_action_gd+0x972/0xc40
    [ 646.583129] ? tca_get_fill.constprop.17+0x250/0x250
    [ 646.588471] ? mark_lock+0xcf/0x980
    [ 646.592324] ? check_chain_key+0x140/0x1f0
    [ 646.596832] ? debug_show_all_locks+0x240/0x240
    [ 646.601839] ? memset+0x1f/0x40
    [ 646.605350] ? nla_parse+0xca/0x1a0
    [ 646.609217] tc_ctl_action+0x215/0x230
    [ 646.613339] ? tcf_action_add+0x220/0x220
    [ 646.617748] rtnetlink_rcv_msg+0x56a/0x6d0
    [ 646.622227] ? rtnl_fdb_del+0x3f0/0x3f0
    [ 646.626466] netlink_rcv_skb+0x18d/0x200
    [ 646.630752] ? rtnl_fdb_del+0x3f0/0x3f0
    [ 646.634959] ? netlink_ack+0x500/0x500
    [ 646.639106] netlink_unicast+0x2d0/0x370
    [ 646.643409] ? netlink_attachskb+0x340/0x340
    [ 646.648050] ? _copy_from_iter_full+0xe9/0x3e0
    [ 646.652870] ? import_iovec+0x11e/0x1c0
    [ 646.657083] netlink_sendmsg+0x3b9/0x6a0
    [ 646.661388] ? netlink_unicast+0x370/0x370
    [ 646.665877] ? netlink_unicast+0x370/0x370
    [ 646.670351] sock_sendmsg+0x6b/0x80
    [ 646.674212] ___sys_sendmsg+0x4a1/0x520
    [ 646.678443] ? copy_msghdr_from_user+0x210/0x210
    [ 646.683463] ? lock_downgrade+0x320/0x320
    [ 646.687849] ? debug_show_all_locks+0x240/0x240
    [ 646.692760] ? do_raw_spin_unlock+0xa2/0x130
    [ 646.697418] ? _raw_spin_unlock+0x24/0x30
    [ 646.701798] ? __handle_mm_fault+0x1819/0x1c10
    [ 646.706619] ? __pmd_alloc+0x320/0x320
    [ 646.710738] ? debug_show_all_locks+0x240/0x240
    [ 646.715649] ? restore_nameidata+0x7b/0xa0
    [ 646.720117] ? check_chain_key+0x140/0x1f0
    [ 646.724590] ? check_chain_key+0x140/0x1f0
    [ 646.729070] ? __fget_light+0xbc/0xd0
    [ 646.733121] ? __sys_sendmsg+0xd7/0x150
    [ 646.737329] __sys_sendmsg+0xd7/0x150
    [ 646.741359] ? __ia32_sys_shutdown+0x30/0x30
    [ 646.746003] ? up_read+0x53/0x90
    [ 646.749601] ? __do_page_fault+0x484/0x780
    [ 646.754105] ? do_syscall_64+0x1e/0x2c0
    [ 646.758320] do_syscall_64+0x72/0x2c0
    [ 646.762353] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 646.767776] RIP: 0033:0x7f4163872150
    [ 646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
    [ 646.791474] RSP: 002b:00007ffdef7d6b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    [ 646.799721] RAX: ffffffffffffffda RBX: 0000000000000024 RCX: 00007f4163872150
    [ 646.807240] RDX: 0000000000000000 RSI: 00007ffdef7d6bd0 RDI: 0000000000000003
    [ 646.814760] RBP: 000000005b8b9482 R08: 0000000000000001 R09: 0000000000000000
    [ 646.822286] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdef7dad20
    [ 646.829807] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000679bc0
    [ 646.837360] irq event stamp: 6083
    [ 646.841043] hardirqs last enabled at (6081): __call_rcu+0x17d/0x500
    [ 646.849882] hardirqs last disabled at (6083): trace_hardirqs_off_thunk+0x1a/0x1c
    [ 646.859775] softirqs last enabled at (5968): __do_softirq+0x4a1/0x6ee
    [ 646.868784] softirqs last disabled at (6082): tcf_ife_cleanup+0x39/0x200 [act_ife]
    [ 646.878845] ---[ end trace b1b8c12ffe51e657 ]---

    Fixes: 5ffe57da29b3 ("act_ife: fix a potential deadlock")
    Signed-off-by: Vlad Buslov
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Immediately after module_put(), the user could unload this
    module, so e->ops could already be freed by the time we call
    e->ops->release().

    Fix this by moving module_put() after ops->release().

    Fixes: ef6980b6becb ("introduce IFE action")
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
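The ordering fix in the second commit above (call ops->release() before module_put()) can be illustrated with a hypothetical userspace model. Here `struct model_module` and its counters are invented; dropping the last reference is assumed to unload the module immediately, the worst case the text describes, and a release after unload is counted instead of actually dereferencing freed memory.

```c
#include <stdbool.h>

/* Hypothetical model of a loadable module that provides ops; not the
 * kernel's module API. 'loaded' goes false as soon as the last reference
 * is dropped, standing in for a user unloading the module at that moment. */
struct model_module {
    int refcnt;
    bool loaded;
    int releases_while_loaded;
    int releases_after_unload;   /* would be a use-after-free in the kernel */
};

static void model_module_put(struct model_module *mod)
{
    if (--mod->refcnt == 0)
        mod->loaded = false;     /* worst case: unloaded immediately */
}

static void model_ops_release(struct model_module *mod)
{
    if (mod->loaded)
        mod->releases_while_loaded++;
    else
        mod->releases_after_unload++;
}

/* Buggy order: drop the module reference, then call into its ops. */
static void model_cleanup_buggy(struct model_module *mod)
{
    model_module_put(mod);
    model_ops_release(mod);
}

/* Fixed order, as in the patch: ops->release() first, then the put. */
static void model_cleanup_fixed(struct model_module *mod)
{
    model_ops_release(mod);
    model_module_put(mod);
}
```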
     

04 Sep, 2018

1 commit

  • Currently, tcf_action_delete() sets the action's pointer in the actions
    array to NULL after putting and deleting it. However, if
    tcf_idr_delete_index() returns an error, the pointer is not set to NULL.
    As a result, the action is released a second time in the error-handling
    code of tca_action_gd().

    Kasan error:

    [ 807.367755] ==================================================================
    [ 807.375844] BUG: KASAN: use-after-free in tc_setup_cb_call+0x14e/0x250
    [ 807.382763] Read of size 8 at addr ffff88033e636000 by task tc/2732

    [ 807.391289] CPU: 0 PID: 2732 Comm: tc Tainted: G W 4.19.0-rc1+ #799
    [ 807.399542] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
    [ 807.407948] Call Trace:
    [ 807.410763] dump_stack+0x92/0xeb
    [ 807.414456] print_address_description+0x70/0x360
    [ 807.419549] kasan_report+0x14d/0x300
    [ 807.423582] ? tc_setup_cb_call+0x14e/0x250
    [ 807.428150] tc_setup_cb_call+0x14e/0x250
    [ 807.432539] ? nla_put+0x65/0xe0
    [ 807.436146] fl_dump+0x394/0x3f0 [cls_flower]
    [ 807.440890] ? fl_tmplt_dump+0x140/0x140 [cls_flower]
    [ 807.446327] ? lock_downgrade+0x320/0x320
    [ 807.450702] ? lock_acquire+0xe2/0x220
    [ 807.454819] ? is_bpf_text_address+0x5/0x140
    [ 807.459475] ? memcpy+0x34/0x50
    [ 807.462980] ? nla_put+0x65/0xe0
    [ 807.466582] tcf_fill_node+0x341/0x430
    [ 807.470717] ? tcf_block_put+0xe0/0xe0
    [ 807.474859] tcf_node_dump+0xdb/0xf0
    [ 807.478821] fl_walk+0x8e/0x170 [cls_flower]
    [ 807.483474] tcf_chain_dump+0x35a/0x4d0
    [ 807.487703] ? tfilter_notify+0x170/0x170
    [ 807.492091] ? tcf_fill_node+0x430/0x430
    [ 807.496411] tc_dump_tfilter+0x362/0x3f0
    [ 807.500712] ? tc_del_tfilter+0x850/0x850
    [ 807.505104] ? kasan_unpoison_shadow+0x30/0x40
    [ 807.509940] ? __mutex_unlock_slowpath+0xcf/0x410
    [ 807.515031] netlink_dump+0x263/0x4f0
    [ 807.519077] __netlink_dump_start+0x2a0/0x300
    [ 807.523817] ? tc_del_tfilter+0x850/0x850
    [ 807.528198] rtnetlink_rcv_msg+0x46a/0x6d0
    [ 807.532671] ? rtnl_fdb_del+0x3f0/0x3f0
    [ 807.536878] ? tc_del_tfilter+0x850/0x850
    [ 807.541280] netlink_rcv_skb+0x18d/0x200
    [ 807.545570] ? rtnl_fdb_del+0x3f0/0x3f0
    [ 807.549773] ? netlink_ack+0x500/0x500
    [ 807.553913] netlink_unicast+0x2d0/0x370
    [ 807.558212] ? netlink_attachskb+0x340/0x340
    [ 807.562855] ? _copy_from_iter_full+0xe9/0x3e0
    [ 807.567677] ? import_iovec+0x11e/0x1c0
    [ 807.571890] netlink_sendmsg+0x3b9/0x6a0
    [ 807.576192] ? netlink_unicast+0x370/0x370
    [ 807.580684] ? netlink_unicast+0x370/0x370
    [ 807.585154] sock_sendmsg+0x6b/0x80
    [ 807.589015] ___sys_sendmsg+0x4a1/0x520
    [ 807.593230] ? copy_msghdr_from_user+0x210/0x210
    [ 807.598232] ? do_wp_page+0x174/0x880
    [ 807.602276] ? __handle_mm_fault+0x749/0x1c10
    [ 807.607021] ? __handle_mm_fault+0x1046/0x1c10
    [ 807.611849] ? __pmd_alloc+0x320/0x320
    [ 807.615973] ? check_chain_key+0x140/0x1f0
    [ 807.620450] ? check_chain_key+0x140/0x1f0
    [ 807.624929] ? __fget_light+0xbc/0xd0
    [ 807.628970] ? __sys_sendmsg+0xd7/0x150
    [ 807.633172] __sys_sendmsg+0xd7/0x150
    [ 807.637201] ? __ia32_sys_shutdown+0x30/0x30
    [ 807.641846] ? up_read+0x53/0x90
    [ 807.645442] ? __do_page_fault+0x484/0x780
    [ 807.649949] ? do_syscall_64+0x1e/0x2c0
    [ 807.654164] do_syscall_64+0x72/0x2c0
    [ 807.658198] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 807.663625] RIP: 0033:0x7f42e9870150
    [ 807.667568] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
    [ 807.687328] RSP: 002b:00007ffdbf595b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    [ 807.695564] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f42e9870150
    [ 807.703083] RDX: 0000000000000000 RSI: 00007ffdbf595b80 RDI: 0000000000000003
    [ 807.710605] RBP: 00007ffdbf599d90 R08: 0000000000679bc0 R09: 000000000000000f
    [ 807.718127] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdbf599d88
    [ 807.725651] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

    [ 807.735048] Allocated by task 2687:
    [ 807.738902] kasan_kmalloc+0xa0/0xd0
    [ 807.742852] __kmalloc+0x118/0x2d0
    [ 807.746615] tcf_idr_create+0x44/0x320
    [ 807.750738] tcf_nat_init+0x41e/0x530 [act_nat]
    [ 807.755638] tcf_action_init_1+0x4e0/0x650
    [ 807.760104] tcf_action_init+0x1ce/0x2d0
    [ 807.764395] tcf_exts_validate+0x1d8/0x200
    [ 807.768861] fl_change+0x55a/0x26b4 [cls_flower]
    [ 807.773845] tc_new_tfilter+0x748/0xa20
    [ 807.778051] rtnetlink_rcv_msg+0x56a/0x6d0
    [ 807.782517] netlink_rcv_skb+0x18d/0x200
    [ 807.786804] netlink_unicast+0x2d0/0x370
    [ 807.791095] netlink_sendmsg+0x3b9/0x6a0
    [ 807.795387] sock_sendmsg+0x6b/0x80
    [ 807.799240] ___sys_sendmsg+0x4a1/0x520
    [ 807.803445] __sys_sendmsg+0xd7/0x150
    [ 807.807473] do_syscall_64+0x72/0x2c0
    [ 807.811506] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    [ 807.818776] Freed by task 2728:
    [ 807.822283] __kasan_slab_free+0x122/0x180
    [ 807.826752] kfree+0xf4/0x2f0
    [ 807.830080] __tcf_action_put+0x5a/0xb0
    [ 807.834281] tcf_action_put_many+0x46/0x70
    [ 807.838747] tca_action_gd+0x232/0xc40
    [ 807.842862] tc_ctl_action+0x215/0x230
    [ 807.846977] rtnetlink_rcv_msg+0x56a/0x6d0
    [ 807.851444] netlink_rcv_skb+0x18d/0x200
    [ 807.855731] netlink_unicast+0x2d0/0x370
    [ 807.860021] netlink_sendmsg+0x3b9/0x6a0
    [ 807.864312] sock_sendmsg+0x6b/0x80
    [ 807.868166] ___sys_sendmsg+0x4a1/0x520
    [ 807.872372] __sys_sendmsg+0xd7/0x150
    [ 807.876401] do_syscall_64+0x72/0x2c0
    [ 807.880431] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    [ 807.887704] The buggy address belongs to the object at ffff88033e636000
    which belongs to the cache kmalloc-256 of size 256
    [ 807.900909] The buggy address is located 0 bytes inside of
    256-byte region [ffff88033e636000, ffff88033e636100)
    [ 807.913155] The buggy address belongs to the page:
    [ 807.918322] page:ffffea000cf98d80 count:1 mapcount:0 mapping:ffff88036f80ee00 index:0x0 compound_mapcount: 0
    [ 807.928831] flags: 0x5fff8000008100(slab|head)
    [ 807.933647] raw: 005fff8000008100 ffffea000db44f00 0000000400000004 ffff88036f80ee00
    [ 807.942050] raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
    [ 807.950456] page dumped because: kasan: bad access detected

    [ 807.958240] Memory state around the buggy address:
    [ 807.963405] ffff88033e635f00: fc fc fc fc fb fb fb fb fb fb fb fc fc fc fc fb
    [ 807.971288] ffff88033e635f80: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
    [ 807.979166] >ffff88033e636000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 807.994882] ^
    [ 807.998477] ffff88033e636080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 808.006352] ffff88033e636100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
    [ 808.014230] ==================================================================
    [ 808.022108] Disabling lock debugging due to kernel taint

    Fixes: edfaf94fa705 ("net_sched: improve and refactor tcf_action_put_many()")
    Signed-off-by: Vlad Buslov
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Vlad Buslov
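The double release above happens because the caller's error path frees every slot that is still non-NULL. A hypothetical userspace sketch of the corrected pattern follows: the delete helper clears its slot before either return path, so the cleanup loop cannot release the same object twice. `model_action_delete()`, `model_put_many()` and the `release_calls` counter are illustrative names, not act_api functions.

```c
#include <errno.h>
#include <stdlib.h>

/* Counts real releases, so a double release would show up as an extra. */
static int release_calls;

static void model_release(void *p)
{
    if (p) {
        free(p);
        release_calls++;
    }
}

/* Releases actions[i]; 'fail_delete' simulates the index deletion failing
 * afterwards. The slot is cleared before either return path. */
static int model_action_delete(void **actions, int i, int fail_delete)
{
    model_release(actions[i]);
    actions[i] = NULL;          /* cleared regardless of the outcome below */
    return fail_delete ? -ENOENT : 0;
}

/* The caller's error handling: release whatever is still present. */
static void model_put_many(void **actions, int n)
{
    for (int i = 0; i < n; i++) {
        model_release(actions[i]);
        actions[i] = NULL;
    }
}
```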
     

30 Aug, 2018

2 commits

  • After commit 802bfb19152c ("net/sched: user-space can't set
    unknown tcfa_action values"), unknown tcfa_action values are
    converted to TC_ACT_UNSPEC, but the common agreement is instead
    to reject such configurations.

    This change also introduces a helper to simplify the destruction
    of a single action, avoiding code duplication.

    v1 -> v2:
    - helper is now static and renamed according to act_* convention
    - updated extack message, according to the new behavior

    Fixes: 802bfb19152c ("net/sched: user-space can't set unknown tcfa_action values")
    Signed-off-by: Paolo Abeni
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • In the (rare) case of a failure in nla_nest_start(), missing NULL checks in
    tcf_pedit_key_ex_dump() can make the following command

    # tc action add action pedit ex munge ip ttl set 64

    dereference a NULL pointer:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    PGD 800000007d1cd067 P4D 800000007d1cd067 PUD 7acd3067 PMD 0
    Oops: 0002 [#1] SMP PTI
    CPU: 0 PID: 3336 Comm: tc Tainted: G E 4.18.0.pedit+ #425
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:tcf_pedit_dump+0x19d/0x358 [act_pedit]
    Code: be 02 00 00 00 48 89 df 66 89 44 24 20 e8 9b b1 fd e0 85 c0 75 46 8b 83 c8 00 00 00 49 83 c5 08 48 03 83 d0 00 00 00 4d 39 f5 89 04 25 00 00 00 00 0f 84 81 01 00 00 41 8b 45 00 48 8d 4c 24
    RSP: 0018:ffffb5d4004478a8 EFLAGS: 00010246
    RAX: ffff8880fcda2070 RBX: ffff8880fadd2900 RCX: 0000000000000000
    RDX: 0000000000000002 RSI: ffffb5d4004478ca RDI: ffff8880fcda206e
    RBP: ffff8880fb9cb900 R08: 0000000000000008 R09: ffff8880fcda206e
    R10: ffff8880fadd2900 R11: 0000000000000000 R12: ffff8880fd26cf40
    R13: ffff8880fc957430 R14: ffff8880fc957430 R15: ffff8880fb9cb988
    FS: 00007f75a537a740(0000) GS:ffff8880fda00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000007a2fa005 CR4: 00000000001606f0
    Call Trace:
    ? __nla_reserve+0x38/0x50
    tcf_action_dump_1+0xd2/0x130
    tcf_action_dump+0x6a/0xf0
    tca_get_fill.constprop.31+0xa3/0x120
    tcf_action_add+0xd1/0x170
    tc_ctl_action+0x137/0x150
    rtnetlink_rcv_msg+0x263/0x2d0
    ? _cond_resched+0x15/0x40
    ? rtnl_calcit.isra.30+0x110/0x110
    netlink_rcv_skb+0x4d/0x130
    netlink_unicast+0x1a3/0x250
    netlink_sendmsg+0x2ae/0x3a0
    sock_sendmsg+0x36/0x40
    ___sys_sendmsg+0x26f/0x2d0
    ? do_wp_page+0x8e/0x5f0
    ? handle_pte_fault+0x6c3/0xf50
    ? __handle_mm_fault+0x38e/0x520
    ? __sys_sendmsg+0x5e/0xa0
    __sys_sendmsg+0x5e/0xa0
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f75a4583ba0
    Code: c3 48 8b 05 f2 62 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d fd c3 2c 00 00 75 10 b8 2e 00 00 00 0f 05 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
    RSP: 002b:00007fff60ee7418 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007fff60ee7540 RCX: 00007f75a4583ba0
    RDX: 0000000000000000 RSI: 00007fff60ee7490 RDI: 0000000000000003
    RBP: 000000005b842d3e R08: 0000000000000002 R09: 0000000000000000
    R10: 00007fff60ee6ea0 R11: 0000000000000246 R12: 0000000000000000
    R13: 00007fff60ee7554 R14: 0000000000000001 R15: 000000000066c100
    Modules linked in: act_pedit(E) ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul ext4 crc32_pclmul mbcache ghash_clmulni_intel jbd2 pcbc snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer cryptd glue_helper snd joydev pcspkr soundcore virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk virtio_console failover qxl crc32c_intel drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix virtio_pci libata virtio_ring i2c_core virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_pedit]
    CR2: 0000000000000000

    As is done for other TC actions, give up dumping pedit rules and return
    an error if nla_nest_start() returns NULL.

    Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Davide Caratti
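The behavioural change in the first commit above, from silently coercing unknown action values to rejecting them, is sketched below. The verdict range is a hypothetical simplification: `MODEL_VERDICT_MAX` is invented, and the real TC_ACT_* space also includes extended opcodes (jump, goto chain) that this sketch ignores. `MODEL_UNSPEC` plays the role of TC_ACT_UNSPEC.

```c
#include <errno.h>

/* Hypothetical verdict range: 0..MODEL_VERDICT_MAX count as "known" here,
 * and MODEL_UNSPEC plays the role of TC_ACT_UNSPEC. */
#define MODEL_UNSPEC (-1)
#define MODEL_VERDICT_MAX 8

/* Earlier behaviour described above: coerce unknown values silently. */
static int model_sanitize_old(int v)
{
    return (v < 0 || v > MODEL_VERDICT_MAX) ? MODEL_UNSPEC : v;
}

/* Behaviour after the fix: refuse the configuration, so user space gets an
 * error instead of a silently altered rule; *out is left untouched. */
static int model_validate_new(int v, int *out)
{
    if (v < 0 || v > MODEL_VERDICT_MAX)
        return -EINVAL;
    *out = v;
    return 0;
}
```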
     

28 Aug, 2018

2 commits


27 Aug, 2018

1 commit

  • Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink
    policy, so only the minimum length is enforced, not the maximum. This
    means nkeys (from userspace) was trusted without checking it against the
    actual size given by nla_len(), which could lead to a memory over-read
    and, ultimately, an exposure via a call to u32_dump(). The bug is
    reachable with CAP_NET_ADMIN within a namespace.

    Reported-by: Al Viro
    Cc: Jamal Hadi Salim
    Cc: Cong Wang
    Cc: Jiri Pirko
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Kees Cook
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Kees Cook
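A sketch of checking a self-described key count against the bytes actually received, assuming a simplified selector layout. `struct model_sel` and `struct model_key` are hypothetical stand-ins in the spirit of `struct tc_u32_sel` (their fields and sizes are invented), and `model_check_sel()` is not the kernel function. The idea is to compute header plus `nkeys` keys and compare that against the attribute length before trusting `nkeys`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical selector: a header followed by 'nkeys' keys, in the spirit
 * of struct tc_u32_sel; field names and sizes here are illustrative. */
struct model_key {
    unsigned int mask, val;
    int off, offmask;
};

struct model_sel {
    unsigned char nkeys;
    struct model_key keys[];   /* flexible array member */
};

/* Rejects selectors whose declared key count exceeds what an attribute
 * payload of 'attr_len' bytes actually contains. */
static int model_check_sel(const struct model_sel *sel, size_t attr_len)
{
    if (attr_len < sizeof(*sel))
        return -EINVAL;
    if (attr_len < sizeof(*sel) + sel->nkeys * sizeof(struct model_key))
        return -EINVAL;
    return 0;
}
```

Without the second check, a dump that walks `keys[0..nkeys)` would read past the end of the received payload.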