02 Nov, 2017

2 commits

  • Change all relevant instances of aleksandar.markovic@imgtec.com
    email address to aleksandar.markovic@mips.com.

    Signed-off-by: Miodrag Dinic
    Signed-off-by: Aleksandar Markovic
    Patchwork: https://patchwork.linux-mips.org/patch/17514/
    Signed-off-by: James Hogan

    Aleksandar Markovic
     
  • Commit 1ec9dd80bedc ("MIPS: CPS: Detect CPUs in secondary clusters")
    added a check in cps_boot_secondary() that the secondary being booted is
    in the same cluster as the CPU running this code. This check is
    performed using current_cpu_data without disabling preemption. As such
    when CONFIG_PREEMPT=y, a BUG is triggered:

    [ 57.991693] BUG: using smp_processor_id() in preemptible [00000000] code: hotplug/1749

    [ 58.063077] Call Trace:
    [ 58.065842] [] show_stack+0x84/0x114
    [ 58.070830] [] dump_stack+0xf8/0x140
    [ 58.075796] [] check_preemption_disabled+0xec/0x118
    [ 58.082204] [] cps_boot_secondary+0x84/0x44c
    [ 58.087935] [] __cpu_up+0x34/0x98
    [ 58.092624] [] bringup_cpu+0x38/0x114
    [ 58.097680] [] cpuhp_invoke_callback+0x168/0x8f0
    [ 58.103801] [] _cpu_up+0x154/0x1c8
    [ 58.108565] [] do_cpu_up+0x98/0xa8
    [ 58.113333] [] device_online+0x84/0xc0
    [ 58.118481] [] online_store+0x60/0x98
    [ 58.123562] [] kernfs_fop_write+0x158/0x1d4
    [ 58.129196] [] __vfs_write+0x4c/0x168
    [ 58.134247] [] vfs_write+0xe0/0x190
    [ 58.139095] [] SyS_write+0x68/0xc4
    [ 58.143854] [] syscall_common+0x34/0x58

    In reality we don't currently support running the kernel on CPUs not in
    cluster 0, so the answer to cpu_cluster(&current_cpu_data) will always
    be 0, even if this task is preempted and continues running on a
    different CPU. Regardless, the BUG should not be triggered, so fix this
    by switching to raw_current_cpu_data. When multicluster support lands
    upstream this check will need removing or changing anyway.

    Fixes: 1ec9dd80bedc ("MIPS: CPS: Detect CPUs in secondary clusters")
    Signed-off-by: Matt Redfearn
    Reviewed-by: Paul Burton
    CC: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/17563/
    Signed-off-by: James Hogan

    Matt Redfearn
     

01 Nov, 2017

8 commits

  • Commit 6f542ebeaee0 ("MIPS: Fix race on setting and getting
    cpu_online_mask") effectively reverted commit 8f46cca1e6c06 ("MIPS: SMP:
    Fix possibility of deadlock when bringing CPUs online") and thus has
    reinstated the possibility of deadlock.

    The commit was based on testing of kernel v4.4, where the CPU hotplug
    core code issued a BUG() if the starting CPU is not marked online when
    the boot CPU returns from __cpu_up. The commit fixes this race (in
    v4.4), but re-introduces the deadlock situation.

    As noted in the commit message, upstream differs in this area. Commit
    8df3e07e7f21f ("cpu/hotplug: Let upcoming cpu bring itself fully up")
    adds a completion event in the CPU hotplug core code, making this race
    impossible. However, people were unhappy with relying on the core code
    to do the right thing.

    To address the issues both commits were trying to fix, add a second
    completion event in the MIPS smp hotplug path. It removes the
    possibility of a race, since the MIPS smp hotplug code now synchronises
    both the boot and secondary CPUs before they return to the hotplug core
    code. It also addresses the deadlock by ensuring that the secondary CPU
    is not marked online before its counters are synchronised.

    This fix should also be backported to fix the race condition introduced
    by the backport of commit 8f46cca1e6c06 ("MIPS: SMP: Fix possibility of
    deadlock when bringing CPUs online"), though really that race only
    existed before commit 8df3e07e7f21f ("cpu/hotplug: Let upcoming cpu
    bring itself fully up").

    Signed-off-by: Matt Redfearn
    Fixes: 6f542ebeaee0 ("MIPS: Fix race on setting and getting cpu_online_mask")
    CC: Matija Glavinic Pecotic
    Cc: # v4.1+: 8f46cca1e6c0: "MIPS: SMP: Fix possibility of deadlock when bringing CPUs online"
    Cc: # v4.1+: a00eeede507c: "MIPS: SMP: Use a completion event to signal CPU up"
    Cc: # v4.1+: 6f542ebeaee0: "MIPS: Fix race on setting and getting cpu_online_mask"
    Cc: # v4.1+
    Patchwork: https://patchwork.linux-mips.org/patch/17376/
    Signed-off-by: James Hogan

    Matt Redfearn
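
The ordering this commit describes can be sketched in user space with events standing in for the completions. This is only an analogy in Python, not the kernel code: the names `bring_up_secondary`, `cpu_starting`, `counters_synced` and `cpu_running` are assumptions made for illustration, and a thread plays the role of the secondary CPU.

```python
import threading

TIMEOUT = 5.0  # seconds; keeps the sketch from hanging if a step is missed


def bring_up_secondary(cpu=1):
    """Model one CPU bring-up; return (online set, ordered event log)."""
    log = []
    log_lock = threading.Lock()
    online = set()

    cpu_starting = threading.Event()     # secondary -> boot: "I have started"
    counters_synced = threading.Event()  # boot -> secondary: counter sync done
    cpu_running = threading.Event()      # secondary -> boot: "I am online"

    def record(message):
        with log_lock:
            log.append(message)

    def secondary():
        record("secondary: started")
        cpu_starting.set()
        # The secondary is not marked online until counters are synchronised.
        if not counters_synced.wait(TIMEOUT):
            return
        online.add(cpu)
        record("secondary: marked online")
        cpu_running.set()

    thread = threading.Thread(target=secondary)
    thread.start()

    # Boot CPU side: wait for the secondary to start, then synchronise.
    if not cpu_starting.wait(TIMEOUT):
        raise RuntimeError("secondary never started")
    record("boot: counters synchronised")
    counters_synced.set()

    # Second wait: do not return until the secondary has marked itself online.
    if not cpu_running.wait(TIMEOUT):
        raise RuntimeError("secondary never came online")
    record("boot: returning to hotplug core")
    thread.join(TIMEOUT)
    return online, log
```

Because each side blocks on the other's event, the boot side cannot return before the secondary is online, and the secondary cannot go online before synchronisation has finished.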
     
  • Fix a typo in build_one_insn().

    Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
    Signed-off-by: Wei Yongjun
    Cc: # 4.13+
    Patchwork: https://patchwork.linux-mips.org/patch/17491/
    Signed-off-by: James Hogan

    Wei Yongjun
     
  • This appears to be a typo: the proper bit mask is
    "RT | RS" instead of "RS | RS".

    This issue was detected with the help of Coccinelle.

    Fixes: d6b3314b49e1 ("MIPS: uasm: Add lh uam instruction")
    Reported-by: Julia Lawall
    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: James Hogan
    Cc: # 3.16+
    Patchwork: https://patchwork.linux-mips.org/patch/17551/
    Signed-off-by: James Hogan

    Gustavo A. R. Silva
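
A small Python sketch of why a duplicated operand in a bit mask goes unnoticed: OR-ing a flag with itself is legal and simply yields that flag, so the second field is silently dropped. The `Field` class, its numeric values and `has_field` are hypothetical; only the names RS and RT come from the commit message.

```python
from enum import IntFlag


class Field(IntFlag):
    """Hypothetical operand-field flags; the numeric values are made up."""
    RS = 0x1
    RT = 0x2


def has_field(mask, field):
    """Return True if `field` is present in `mask`."""
    return bool(mask & field)


buggy_mask = Field.RS | Field.RS  # the typo: RT is never included
fixed_mask = Field.RT | Field.RS  # the intended mask
```

Here `buggy_mask` equals `Field.RS` alone, which is why a pattern checker such as Coccinelle is useful for spotting this class of mistake.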
     
  • The default CM target field in the GCR_BASE register is encoded with 0
    meaning memory & 1 being reserved. However the definitions we use for
    those bits effectively get these two values backwards - likely because
    they were copied from the definitions for the CM regions where the
    target is encoded differently. This results in us setting up GCR_BASE
    with the reserved target value by default, rather than targeting memory
    as intended. Although we currently seem to get away with this it's not a
    great idea to rely upon.

    Fix this by changing our macros to match the documented target values.

    The incorrect encoding became used as of commit 9f98f3dd0c51 ("MIPS: Add
    generic CM probe & access code") in the Linux v3.15 cycle, and was
    likely carried forwards from older but unused code introduced by
    commit 39b8d5254246 ("[MIPS] Add support for MIPS CMP platform.") in the
    v2.6.26 cycle.

    Fixes: 9f98f3dd0c51 ("MIPS: Add generic CM probe & access code")
    Signed-off-by: Paul Burton
    Reported-by: Matt Redfearn
    Reviewed-by: James Hogan
    Cc: Matt Redfearn
    Cc: Ralf Baechle
    Cc: linux-mips@linux-mips.org
    Cc: # v3.15+
    Patchwork: https://patchwork.linux-mips.org/patch/17562/
    Signed-off-by: James Hogan

    Paul Burton
     
  • Commit e83f7e02af50c ("MIPS: CPS: Have asm/mips-cps.h include CM & CPC
    headers") adds a #error to arch/mips/include/asm/mips-cpc.h if it is
    included directly. While this commit replaced almost all direct includes
    of mips-cm.h and mips-cpc.h, 2 remain.

    With some defconfigs, mips-cps.h is indirectly included before
    mips-cpc.h, but in others this results in compilation errors:

    In file included from arch/mips/generic/init.c:23:0:
    ./arch/mips/include/asm/mips-cpc.h:12:3: error: #error Please include
    asm/mips-cps.h rather than asm/mips-cpc.h
    # error Please include asm/mips-cps.h rather than asm/mips-cpc.h

    In file included from arch/mips/kernel/smp.c:23:0:
    ./arch/mips/include/asm/mips-cpc.h:12:3: error: #error Please include
    asm/mips-cps.h rather than asm/mips-cpc.h
    # error Please include asm/mips-cps.h rather than asm/mips-cpc.h

    In both cases, fix this by including mips-cps.h instead.

    Fixes: e83f7e02af50c ("MIPS: CPS: Have asm/mips-cps.h include CM & CPC headers")
    Signed-off-by: Matt Redfearn
    Patchwork: https://patchwork.linux-mips.org/patch/17492/
    Signed-off-by: James Hogan

    Matt Redfearn
     
  • Commit 9fef68686317b ("MIPS: Make SAVE_SOME more standard") made several
    changes to the order in which registers are saved in the SAVE_SOME
    macro, used by exception handlers to save the processor state. In
    particular, it removed the
    move k1, sp
    in the delay slot of the branch testing if the processor is already in
    kernel mode. This is replaced later in the macro by a
    move k0, sp
    When CONFIG_EVA is disabled, this instruction actually appears in the
    delay slot of the branch. However, when CONFIG_EVA is enabled, instead
    the RPS workaround of
    MFC0 k0, CP0_ENTRYHI
    appears in the delay slot. This results in k0 not containing the stack
    pointer, but some unrelated value, which is then saved to the kernel
    stack. On exit from the exception, this bogus value is restored to the
    stack pointer, resulting in an OOPS.

    Fix this by moving the save of SP in k0 explicitly in the delay slot of
    the branch, outside of the CONFIG_EVA section, restoring the expected
    instruction ordering when CONFIG_EVA is active.

    Fixes: 9fef68686317b ("MIPS: Make SAVE_SOME more standard")
    Signed-off-by: Matt Redfearn
    Reported-by: Vladimir Kondratiev
    Reviewed-by: Corey Minyard
    Reviewed-by: James Hogan
    Patchwork: https://patchwork.linux-mips.org/patch/17471/
    Signed-off-by: James Hogan

    Matt Redfearn
     
  • Since commit 04a85e087ad6 ("MIPS: generic: Move NI 169445 FIT image
    source to its own file"), a generic 32r2el_defconfig kernel fails to
    build with the following build error:

    ITB arch/mips/boot/vmlinux.gz.itb
    Error: arch/mips/boot/vmlinux.gz.its:111.1-2 syntax error
    FATAL ERROR: Unable to parse input tree
    mkimage Can't read arch/mips/boot/vmlinux.gz.itb.tmp: Invalid argument

    Fix arch/mips/generic/board-ni169445.its.S to include the necessary "/"
    node path before the first open brace.

    The original issue in arch/mips/generic/vmlinux.its.S was fixed directly
    in the original commit 7aacf86b75bc ("MIPS: NI 169445 board support")
    after https://patchwork.linux-mips.org/patch/16941/ was submitted, but
    the separate its.S file wasn't correctly fixed when resolving the
    conflict in commit 04a85e087ad6 ("MIPS: generic: Move NI 169445 FIT
    image source to its own file").

    Fixes: 04a85e087ad6 ("MIPS: generic: Move NI 169445 FIT image source to its own file")
    Signed-off-by: James Hogan
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: Nathan Sullivan
    Cc: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/17561/
    Signed-off-by: James Hogan

    James Hogan
     
  • MIPS will soon not be a part of Imagination Technologies, and as such
    many @imgtec.com email addresses will no longer be valid. This patch
    updates the addresses for those who:

    - Have 10 or more patches in mainline authored using an @imgtec.com
    email address, or any patches dated within the past year.

    - Are still with Imagination but leaving as part of the MIPS business
    unit, as determined from an internal email address list.

    - Haven't already updated their email address (i.e. JamesH) or expressed
    a desire to be excluded (i.e. Maciej).

    - Acked v2 or earlier of this patch, which leaves Deng-Cheng, Matt &
    myself.

    New addresses are of the form firstname.lastname@mips.com, and all
    verified against an internal email address list. An entry is added to
    .mailmap for each person such that get_maintainer.pl will report the new
    addresses rather than @imgtec.com addresses which will soon be dead.

    Instances of the affected addresses throughout the tree are then
    mechanically replaced with the new @mips.com address.

    Signed-off-by: Paul Burton
    Cc: Deng-Cheng Zhu
    Acked-by: Dengcheng Zhu
    Cc: Matt Redfearn
    Acked-by: Matt Redfearn
    Cc: Andrew Morton
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: trivial@kernel.org
    Patchwork: https://patchwork.linux-mips.org/patch/17540/
    Signed-off-by: James Hogan

    Paul Burton
     

30 Oct, 2017

1 commit


29 Oct, 2017

29 commits

  • Pull networking fixes from David Miller:

    1) Fix route leak in xfrm_bundle_create().

    2) In mac80211, validate user rate mask before configuring it. From
    Johannes Berg.

    3) Properly enforce memory limits in fair queueing code, from Toke
    Hoiland-Jorgensen.

    4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.

    5) Fix TSO header allocation and management in mvpp2 driver, from Yan
    Markman.

    6) Don't take socket lock in BH handler in strparser code, from Tom
    Herbert.

    7) Don't show sockets from other namespaces in AF_UNIX code, from
    Andrei Vagin.

    8) Fix double free in error path of tap_open(), from Girish Moodalbail.

    9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
    and Alexander Duyck.

    10) Fix DCB mode programming in stmmac driver, from Jose Abreu.

    11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
    Long.

    12) Properly align SKB head before building SKB in tuntap, from Jason
    Wang.

    13) Avoid matching qdiscs with a zero handle during lookups, from Cong
    Wang.

    14) Fix various endianness bugs in sctp, from Xin Long.

    15) Fix tc filter callback races and add selftests which trigger the
    problem, from Cong Wang.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
    selftests: Introduce a new test case to tc testsuite
    selftests: Introduce a new script to generate tc batch file
    net_sched: fix call_rcu() race on act_sample module removal
    net_sched: add rtnl assertion to tcf_exts_destroy()
    net_sched: use tcf_queue_work() in tcindex filter
    net_sched: use tcf_queue_work() in rsvp filter
    net_sched: use tcf_queue_work() in route filter
    net_sched: use tcf_queue_work() in u32 filter
    net_sched: use tcf_queue_work() in matchall filter
    net_sched: use tcf_queue_work() in fw filter
    net_sched: use tcf_queue_work() in flower filter
    net_sched: use tcf_queue_work() in flow filter
    net_sched: use tcf_queue_work() in cgroup filter
    net_sched: use tcf_queue_work() in bpf filter
    net_sched: use tcf_queue_work() in basic filter
    net_sched: introduce a workqueue for RCU callbacks of tc filter
    sctp: fix some type cast warnings introduced since very beginning
    sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
    sctp: fix some type cast warnings introduced by transport rhashtable
    sctp: fix some type cast warnings introduced by stream reconf
    ...

    Linus Torvalds
     
  • Cong Wang says:

    ====================
    net_sched: fix races with RCU callbacks

    Recently, the RCU callbacks used in TC filters and TC actions keep
    drawing my attention; they have introduced at least 4 race condition bugs:

    1. A simple one fixed by Daniel:

    commit c78e1746d3ad7d548bdf3fe491898cc453911a49
    Author: Daniel Borkmann
    Date: Wed May 20 17:13:33 2015 +0200

    net: sched: fix call_rcu() race on classifier module unloads

    2. A very nasty one fixed by me:

    commit 1697c4bb5245649a23f06a144cc38c06715e1b65
    Author: Cong Wang
    Date: Mon Sep 11 16:33:32 2017 -0700

    net_sched: carefully handle tcf_block_put()

    3. Two more bugs found by Chris:
    https://patchwork.ozlabs.org/patch/826696/
    https://patchwork.ozlabs.org/patch/826695/

    Usually RCU callbacks are simple, however for TC filters and actions,
    they are complex because at least TC actions could be destroyed
    together with the TC filter in one callback. And RCU callbacks are
    invoked in BH context, without locking they are parallel too. All of
    these contribute to the cause of these nasty bugs.

    Alternatively, we could also:

    a) Introduce a spinlock to serialize these RCU callbacks. But as I
    said in commit 1697c4bb5245 ("net_sched: carefully handle
    tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
    Potentially we need to do a lot of work to make it possible (if not
    impossible).

    b) Just get rid of these RCU callbacks, because they are not
    necessary at all, callers of these call_rcu() are all on slow paths
    and holding RTNL lock, so blocking is allowed in their contexts.
    However, David and Eric dislike adding synchronize_rcu() here.

    As suggested by Paul, we could defer the work to a workqueue and
    gain the permission of holding RTNL again without any performance
    impact, however, in tcf_block_put() we could have a deadlock when
    flushing workqueue while holding RTNL lock, the trick here is to
    defer the work itself in workqueue and make it queued after all
    other works so that we keep the same ordering to avoid any
    use-after-free. Please see the first patch for details.

    Patch 1 introduces the infrastructure, patch 2~12 move each
    tc filter to the new tc filter workqueue, patch 13 adds
    an assertion to catch potential bugs like this, patch 14
    closes another rcu callback race, patch 15 and patch 16 add
    new test cases.
    ====================

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    David S. Miller
     
  • In this patchset, we fixed a tc bug. This patch adds the test case
    that reproduces the bug. To run this test case, user should specify
    an existing NIC device:
    # sudo ./tdc.py -d enp4s0f0

    This test case belongs to category "flower". If user doesn't specify
    a NIC device, the test cases belonging to "flower" will not be run.

    In this test case, we create 1M filters and all filters share the same
    action. When destroying all filters, kernel should not panic. It takes
    about 18s to run it.

    Acked-by: Jamal Hadi Salim
    Acked-by: Lucas Bates
    Signed-off-by: Chris Mi
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Chris Mi
     
  • # ./tdc_batch.py -h
    usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file

    TC batch file generator

    positional arguments:
    device device name
    file batch file name

    optional arguments:
    -h, --help show this help message and exit
    -n NUMBER, --number NUMBER
    how many lines in batch file
    -o, --skip_sw skip_sw (offload), by default skip_hw
    -s, --share_action all filters share the same action
    -p, --prio all filters have different prio

    Acked-by: Jamal Hadi Salim
    Acked-by: Lucas Bates
    Signed-off-by: Chris Mi
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Chris Mi
     
  • Similar to commit c78e1746d3ad
    ("net: sched: fix call_rcu() race on classifier module unloads"),
    we need to wait for flying RCU callback tcf_sample_cleanup_rcu().

    Cc: Yotam Gigi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • After previous patches, it is now safe to claim that
    tcf_exts_destroy() is always called with RTNL lock.

    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Defer the tcf_exts_destroy() in RCU callback to
    tc filter workqueue and get RTNL lock.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • This patch introduces a dedicated workqueue for tc filters
    so that each tc filter's RCU callback could defer their
    action destroy work to this workqueue. The helper
    tcf_queue_work() is introduced for them to use.

    Because we hold RTNL lock when calling tcf_block_put(), we
    can not simply flush works inside it, therefore we have to
    defer it again to this workqueue and make sure all flying RCU
    callbacks have already queued their work before this one, in
    other words, to ensure this is the last one to execute to
    prevent any use-after-free.

    On the other hand, this makes tcf_block_put() ugly and
    harder to understand. Since David and Eric strongly dislike
    adding synchronize_rcu(), this is probably the only
    solution that could make everyone happy.

    Please also see the code comments below.

    Reported-by: Chris Mi
    Cc: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: John Fastabend
    Cc: Jamal Hadi Salim
    Cc: "Paul E. McKenney"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
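
The ordering argument in this commit can be sketched with a toy FIFO queue. This is a simplified Python model under stated assumptions, not the kernel implementation: `OrderedWorkqueue` and `destroy_block` are hypothetical names, and RCU and the RTNL lock are not modelled at all. It only shows why queueing the block teardown after every filter's work avoids a use-after-free.

```python
from collections import deque


class OrderedWorkqueue:
    """Toy single-threaded queue that runs work strictly in FIFO order."""

    def __init__(self):
        self._pending = deque()

    def queue_work(self, func):
        self._pending.append(func)

    def run_all(self):
        while self._pending:
            self._pending.popleft()()


def destroy_block(filter_names):
    """Queue per-filter destroy work, then the block teardown last."""
    wq = OrderedWorkqueue()
    freed = []
    block = {"alive": True}

    def make_filter_work(name):
        def work():
            # Running after teardown would model a use-after-free.
            assert block["alive"], "filter work ran after block teardown"
            freed.append(name)
        return work

    # Stand-in for each filter's callback deferring its destroy work.
    for name in filter_names:
        wq.queue_work(make_filter_work(name))

    def final_work():
        block["alive"] = False
        freed.append("block")

    # The teardown is itself deferred, so it is queued behind all filters.
    wq.queue_work(final_work)
    wq.run_all()
    return freed
```

Calling `destroy_block(["u32", "flower"])` frees both filters before the block, because the teardown was the last item queued.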
     
  • Xin Long says:

    ====================
    sctp: a bunch of fixes for some sparse warnings

    As Eric noticed, when running 'make C=2 M=net/sctp/', plenty of
    warnings or errors reported by sparse appear. They are all endianness
    and type cast problems.

    Most of them are just warnings by which no issues could be caused
    while some might be bugs.

    This patchset fixes them with four patches basically according to
    how they are introduced.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • These warnings were found by running 'make C=2 M=net/sctp/'.
    They have been there since the very beginning.

    Note that after this patch there is still one warning left in
    sctp_outq_flush():
    sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM)

    Since it has been moved to sctp_stream_outq_migrate on net-next,
    to avoid the extra job when merging net-next to net, I will post
    the fix for it after the merging is done.

    Reported-by: Eric Dumazet
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • These warnings were found by running 'make C=2 M=net/sctp/'.

    Commit d4d6fb5787a6 ("sctp: Try not to change a_rwnd when faking a
    SACK from SHUTDOWN.") expected to use the peer's old rwnd and add
    our flight size to the a_rwnd. But with the wrong endianness, it may
    not work as well as expected.

    So fix it by converting to the right value.

    Fixes: d4d6fb5787a6 ("sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.")
    Reported-by: Eric Dumazet
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
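
The effect of skipping a byte-order conversion before arithmetic can be illustrated with a short Python sketch. It assumes a 32-bit a_rwnd carried in network (big-endian) byte order and models the mistake as reading those bytes on a little-endian host; `add_flight_size` is a hypothetical helper, not a function from the SCTP code.

```python
def add_flight_size(wire_a_rwnd, flight_size, convert=True):
    """Add flight_size to a 32-bit a_rwnd given as 4 network-order bytes."""
    if convert:
        # Correct: convert from network order to a host integer first.
        a_rwnd = int.from_bytes(wire_a_rwnd, "big")
    else:
        # The mistake: use the wire bytes as if they were already in host
        # order (modelled here as a little-endian host).
        a_rwnd = int.from_bytes(wire_a_rwnd, "little")
    return (a_rwnd + flight_size) & 0xFFFFFFFF
```

With `wire = (1000).to_bytes(4, "big")`, the converted path adds the flight size to 1000, while the unconverted path adds it to an unrelated, byte-swapped value.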
     
  • These warnings were found by running 'make C=2 M=net/sctp/'.

    They were introduced by not being aware of the endianness of the
    port when coding transport rhashtable patches.

    Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable")
    Reported-by: Eric Dumazet
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • These warnings were found by running 'make C=2 M=net/sctp/'.

    They were introduced by not being aware of endianness when coding
    stream reconf patches.

    Since commit c0d8bab6ae51 ("sctp: add get and set sockopt for
    reconf_enable") enabled stream reconf feature for users, the
    Fixes tag below would use it.

    Fixes: c0d8bab6ae51 ("sctp: add get and set sockopt for reconf_enable")
    Reported-by: Eric Dumazet
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • Davide found the following script triggers a NULL pointer
    dereference:

    ip l a name eth0 type dummy
    tc q a dev eth0 parent :1 handle 1: htb

    This is because for a freshly created netdevice noop_qdisc
    is attached and when passing 'parent :1', kernel actually
    tries to match the major handle which is 0 and noop_qdisc
    has handle 0 so is matched by mistake. Commit 69012ae425d7
    tries to fix a similar bug but still misses this case.

    Handle 0 is not a valid one, should be just skipped. In
    fact, kernel uses it as TC_H_UNSPEC.

    Fixes: 69012ae425d7 ("net: sched: fix handling of singleton qdiscs with qdisc_hash")
    Fixes: 59cc1f61f09c ("net: sched:convert qdisc linked list to hashtable")
    Reported-by: Davide Caratti
    Cc: Jiri Kosina
    Cc: Eric Dumazet
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
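
The mismatch described above can be sketched in Python. The handle layout (major number in the upper 16 bits) and the value `TC_H_UNSPEC = 0` follow the tc convention; `make_handle`, `lookup` and the dictionary-based qdisc are hypothetical stand-ins for illustration only.

```python
TC_H_UNSPEC = 0
TC_H_MAJ_MASK = 0xFFFF0000


def make_handle(major, minor):
    """Pack a major:minor pair into a 32-bit tc handle."""
    return ((major & 0xFFFF) << 16) | (minor & 0xFFFF)


def tc_h_maj(handle):
    """Keep only the major part of a handle."""
    return handle & TC_H_MAJ_MASK


def lookup(qdiscs, handle, skip_unspec):
    """Find a qdisc by handle; optionally refuse to match handle 0."""
    if skip_unspec and handle == TC_H_UNSPEC:
        return None
    for qdisc in qdiscs:
        if qdisc["handle"] == handle:
            return qdisc
    return None


noop_qdisc = {"name": "noop", "handle": 0}
parent = make_handle(0, 1)  # what "parent :1" parses to: major 0, minor 1
```

Without the check, looking up `tc_h_maj(parent)` returns the noop qdisc by mistake; with it, the lookup finds nothing, which is the intended result for an unspecified handle.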
     
  • Now when migrating sock to another one in sctp_sock_migrate(), it only
    resets owner sk for the data in receive queues, not the chunks on out
    queues.

    This causes the data chunk length on the sock to be inconsistent
    with the sk's sk_wmem_alloc. When closing the sock or freeing these
    chunks, the old sk would never be freed, and the new sock may crash
    due to sk_wmem_alloc overflowing.

    syzbot found this issue with this series:

    r0 = socket$inet_sctp()
    sendto$inet(r0)
    listen(r0)
    accept4(r0)
    close(r0)

    Although listen() should have returned an error when a TCP-style
    socket is connecting (I may fix this one in another patch), it could
    also be reproduced by peeling off an assoc.

    This issue has been there since the very beginning.

    This patch is to reset owner sk for the chunks on out queues so that
    sk sk_wmem_alloc has correct value after accept one sock or peeloff
    an assoc to one sock.

    Note that when resetting owner sk for chunks on outqueue, it has to
    sctp_clear_owner_w/skb_orphan chunks before changing assoc->base.sk
    first and then sctp_set_owner_w them after changing assoc->base.sk,
    because sctp_wfree and its callees use assoc->base.sk.

    Reported-by: Dmitry Vyukov
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
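
The accounting problem can be sketched with a toy model in Python. The classes and helpers below (`Sock`, `Assoc`, `queue_chunk`, `migrate`, `free_chunks`) are hypothetical and only mimic the clear-owner, switch-socket, set-owner order described above; a plain integer stands in for sk_wmem_alloc, so the broken case shows up as a negative value rather than a real overflow.

```python
class Sock:
    def __init__(self, name):
        self.name = name
        self.wmem_alloc = 0  # bytes of queued outbound data charged here


class Assoc:
    def __init__(self, sk):
        self.sk = sk
        self.out_chunks = []  # sizes of chunks on the out queue


def queue_chunk(assoc, size):
    assoc.out_chunks.append(size)
    assoc.sk.wmem_alloc += size  # charge the current owner


def migrate(assoc, new_sk, reset_out_queue=True):
    """Move the association to new_sk, optionally re-owning queued chunks."""
    if reset_out_queue:
        for size in assoc.out_chunks:
            assoc.sk.wmem_alloc -= size  # uncharge the old owner first
    assoc.sk = new_sk
    if reset_out_queue:
        for size in assoc.out_chunks:
            assoc.sk.wmem_alloc += size  # then charge the new owner


def free_chunks(assoc):
    """Free all queued chunks, uncharging whoever owns the assoc now."""
    for size in assoc.out_chunks:
        assoc.sk.wmem_alloc -= size
    assoc.out_chunks.clear()
```

With `reset_out_queue=False` the old socket keeps its charge forever and the new socket is uncharged for data it never accounted for, which mirrors the two symptoms the commit message lists.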
     
  • John Fastabend says:

    ====================
    net: sockmap fixes

    Last two fixes (as far as I know) for sockmap code this round.

    First, we are using the qdisc cb structure when making the data end
    calculation. This is really just wrong, so store it with the other
    metadata in the correct tcp_skb_cb struct to avoid breaking things.

    Next, with recent work to attach multiple programs to a cgroup a
    specific enumeration of return codes was agreed upon. However,
    I wrote the sk_skb program types before seeing this work and used
    a different convention. Patch 2 in the series aligns the return
    codes to avoid breaking with this infrastructure and also aligns
    with other programming conventions to avoid being the odd duck out
    forcing programs to remember SK_SKB programs are different. Pushing
    to net because it's a user-visible change. With this SK_SKB program
    return codes are the same as other cgroup program types.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Recent additions to support multiple programs in cgroups impose
    a strict requirement, "all yes is yes, any no is no". To enforce
    this the infrastructure requires the 'no' return code, SK_DROP in
    this case, to be 0.

    To apply these rules to SK_SKB program types the sk_actions return
    codes need to be adjusted.

    This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
    SK_ABORTED to remove any chance that the API may allow aborted
    program flows to be passed up the stack. This would be incorrect
    behavior and allow programs to break existing policies.

    Signed-off-by: John Fastabend
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    John Fastabend
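
Why the 'no' code has to be 0 can be seen from a short Python sketch of the "all yes is yes, any no is no" rule. `SK_DROP = 0` comes from the commit message; the value 1 for `SK_PASS`, the `SkAction` enum and `combined_verdict` are assumptions made for illustration, and combining verdicts with a bitwise AND is one way to express the rule rather than a quote of the kernel code.

```python
from enum import IntEnum


class SkAction(IntEnum):
    SK_DROP = 0  # the 'no' verdict, fixed at 0 by this change
    SK_PASS = 1  # assumed value for the 'yes' verdict


def combined_verdict(verdicts):
    """AND all program verdicts: a single SK_DROP forces the result to 0."""
    result = SkAction.SK_PASS
    for verdict in verdicts:
        result = SkAction(result & verdict)
    return result
```

If the 'no' code were non-zero, an AND over the verdicts could not guarantee that one dissenting program wins, which is the property the cgroup infrastructure relies on.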
     
  • SK_SKB program types use bpf_compute_data to store the end of the
    packet data. However, bpf_compute_data assumes the cb is stored in the
    qdisc layer format. But, for SK_SKB this is the wrong layer of the
    stack for this type.

    It happens to work (sort of!) because in most cases nothing happens
    to be overwritten today. This is very fragile and error prone.
    Fortunately, we have another hole in tcp_skb_cb we can use, so let's
    put the data_end value there.

    Note, SK_SKB program types do not use data_meta, they are failed by
    sk_skb_is_valid_access().

    Signed-off-by: John Fastabend
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    John Fastabend
     
  • …t/masahiroy/linux-kbuild

    Pull Kbuild fixes from Masahiro Yamada:

    - fix O= building on dash

    - remove unused dependency in Makefile

    - fix default of a choice in Kconfig

    - fix typos and documentation style

    - fix command options unrecognized by sparse

    * tag 'kbuild-fixes-v4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    kbuild: clang: fix build failures with sparse check
    kbuild doc: a bundle of fixes on makefiles.txt
    Makefile: kselftest: fix grammar typo
    kbuild: Fix optimization level choice default
    kbuild: drop unused symverfile in Makefile.modpost
    kbuild: revert $(realpath ...) to $(shell cd ... && /bin/pwd)

    Linus Torvalds