29 Jul, 2020

1 commit

  • commit fe5ed7ab99c656bd2f5b79b49df0e9ebf2cead8a upstream.

    If a tracee is uprobed and hits an int3 inserted by a debugger, handle_swbp()
    does send_sig(SIGTRAP, current, 0), which means si_code == SI_USER. This used
    to work when the code was written, but GDB has since started to validate
    si_code, and now it simply can't use breakpoints if the tracee has an active uprobe:

    # cat test.c
    void unused_func(void)
    {
    }
    int main(void)
    {
    return 0;
    }

    # gcc -g test.c -o test
    # perf probe -x ./test -a unused_func
    # perf record -e probe_test:unused_func gdb ./test -ex run
    GNU gdb (GDB) 10.0.50.20200714-git
    ...
    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x00007ffff7ddf909 in dl_main () from /lib64/ld-linux-x86-64.so.2
    (gdb)

    The tracee hits the internal breakpoint inserted by GDB to monitor shared
    library events but GDB misinterprets this SIGTRAP and reports a signal.

    Change handle_swbp() to use force_sig(SIGTRAP); this matches do_int3_user()
    and fixes the problem.
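
    Why GDB cares: a SIGTRAP raised via send_sig() arrives with si_code ==
    SI_USER, while genuine breakpoint/single-step traps carry TRAP_* codes.
    A small, self-contained illustration (not part of the patch) of what a
    handler sees:

    ```
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static void on_trap(int sig, siginfo_t *info, void *ctx)
    {
            (void)sig; (void)ctx;
            /* write() is async-signal-safe, unlike printf() */
            if (info->si_code == SI_USER)
                    write(2, "SIGTRAP with SI_USER (kill-style)\n", 34);
            else
                    write(2, "SIGTRAP with a TRAP_* si_code\n", 30);
    }

    int main(void)
    {
            struct sigaction sa;

            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = on_trap;
            sa.sa_flags = SA_SIGINFO;
            sigaction(SIGTRAP, &sa, NULL);

            kill(getpid(), SIGTRAP);        /* delivered with si_code == SI_USER */
            return 0;
    }
    ```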

    This is the minimal fix for -stable; arch/x86/kernel/uprobes.c is equally
    wrong and should use send_sigtrap(TRAP_TRACE) instead of send_sig(SIGTRAP),
    but that case doesn't confuse GDB and needs a separate x86-specific patch.

    Reported-by: Aaron Merey
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Ingo Molnar
    Reviewed-by: Srikar Dronamraju
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20200723154420.GA32043@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     

22 Jul, 2020

6 commits

  • commit baedb87d1b53532f81b4bd0387f83b05d4f7eb9a upstream.

    Setting interrupt affinity on inactive interrupts is inconsistent when
    hierarchical irq domains are enabled. The core code should just store the
    affinity and not call into the irq chip driver for inactive interrupts
    because the chip drivers may not be in a state to handle such requests.

    X86 has a hacky workaround for that, but all other irq chips do not, which
    causes problems e.g. on GIC v3 ITS.

    Instead of adding more ugly hacks all over the place, solve the problem in
    the core code. If the affinity is set on an inactive interrupt then:

    - Store it in the irq descriptors affinity mask
    - Update the effective affinity to reflect that so user space has
    a consistent view
    - Don't call into the irq chip driver

    This is the core equivalent of the X86 workaround and works correctly
    because the affinity setting is established in the irq chip when the
    interrupt is activated later on.
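
    Roughly, the core-code behaviour described above looks like the sketch
    below; the guard and helper names are approximations of the idea, not
    necessarily the exact upstream hunk:

    ```
    static bool irq_set_affinity_deactivated(struct irq_data *data,
                                             const struct cpumask *mask)
    {
            struct irq_desc *desc = irq_data_to_desc(data);

            /* Active interrupts take the normal path into the irq chip. */
            if (irqd_is_activated(data))
                    return false;

            /* Inactive: record the request and mirror it as "effective" so
             * user space sees a consistent view, but never call the chip
             * driver. The real setting happens at activation time. */
            cpumask_copy(desc->irq_common_data.affinity, mask);
            irq_data_update_effective_affinity(data, mask);
            return true;
    }
    ```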

    Note, that this is only effective when hierarchical irq domains are enabled
    by the architecture. Doing it unconditionally would break legacy irq chip
    implementations.

    For hierarchical irq domains this works correctly, as none of the drivers
    can have a dependency on affinity settings in the inactive state by design.

    Remove the X86 workaround as it is no longer required.

    Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
    Reported-by: Ali Saidi
    Signed-off-by: Thomas Gleixner
    Tested-by: Ali Saidi
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com
    Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 01cfcde9c26d8555f0e6e9aea9d6049f87683998 upstream.

    task_h_load() can return 0 in some situations, like running stress-ng
    mmapfork, which forks thousands of threads, in a sched group on a 224-core
    system. The load balancer doesn't handle this correctly, because
    env->imbalance never decreases and it stops pulling tasks only after
    reaching loop_max, which can be equal to the number of running tasks of
    the cfs rq. Make sure that the imbalance is decreased by at least 1.

    Misfit-task handling is the other feature that doesn't handle this
    situation correctly, although it is probably harder to hit there because
    of the smaller number of CPUs and running tasks on heterogeneous systems.

    We can't simply ensure that task_h_load() returns at least one, because
    that would mean handling underflow in other places.
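
    On the load-balance side the fix therefore amounts to a clamp along these
    lines (a sketch of the idea in detach_tasks(), not necessarily the exact
    hunk):

    ```
    /* Never account a pulled task as zero load, so that env->imbalance
     * shrinks by at least 1 for every detached task. */
    load = max_t(unsigned long, task_h_load(p), 1);

    env->imbalance -= load;
    ```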

    Signed-off-by: Vincent Guittot
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Valentin Schneider
    Reviewed-by: Dietmar Eggemann
    Tested-by: Dietmar Eggemann
    Cc: # v4.4+
    Link: https://lkml.kernel.org/r/20200710152426.16981-1-vincent.guittot@linaro.org
    Signed-off-by: Greg Kroah-Hartman

    Vincent Guittot
     
  • commit ce3614daabea8a2d01c1dd17ae41d1ec5e5ae7db upstream.

    While integrating rseq into glibc and replacing glibc's sched_getcpu
    implementation with rseq, glibc's tests discovered an issue with
    incorrect __rseq_abi.cpu_id field value right after the first time
    a newly created process issues sched_setaffinity.

    For the record, it triggers after building glibc and running its tests,
    and then issuing:

    for x in {1..2000} ; do posix/tst-affinity-static & done

    and shows up as:

    error: Unexpected CPU 2, expected 0
    error: Unexpected CPU 2, expected 0
    error: Unexpected CPU 2, expected 0
    error: Unexpected CPU 2, expected 0
    error: Unexpected CPU 138, expected 0
    error: Unexpected CPU 138, expected 0
    error: Unexpected CPU 138, expected 0
    error: Unexpected CPU 138, expected 0

    This is caused by the scheduler invoking __set_task_cpu() directly from
    sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which
    is done by set_task_cpu().

    Add the missing rseq_migrate() to both functions. The only other direct
    use of __set_task_cpu() is done by init_idle(), which does not involve a
    user-space task.

    Based on my testing with the glibc test-case, just adding rseq_migrate()
    to wake_up_new_task() is sufficient to fix the observed issue. Also add
    it to sched_fork() to keep things consistent.
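
    In both places the change boils down to something like this (illustrative;
    the surrounding code is omitted):

    ```
    __set_task_cpu(p, cpu);  /* bypasses set_task_cpu()... */
    rseq_migrate(p);         /* ...so raise the rseq migration event by hand */
    ```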

    The reason why this never triggered so far with the rseq/basic_test
    selftest is unclear.

    The current use of sched_getcpu(3) does not typically require it to be
    always accurate. However, use of the __rseq_abi.cpu_id field within rseq
    critical sections requires it to be accurate. If it is not accurate, it
    can cause corruption in the per-cpu data targeted by rseq critical
    sections in user-space.

    Reported-By: Florian Weimer
    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-By: Florian Weimer
    Cc: stable@vger.kernel.org # v4.18+
    Link: https://lkml.kernel.org/r/20200707201505.2632-1-mathieu.desnoyers@efficios.com
    Signed-off-by: Greg Kroah-Hartman

    Mathieu Desnoyers
     
  • commit e2a71bdea81690b6ef11f4368261ec6f5b6891aa upstream.

    When an expiration delta falls into the last level of the wheel, that delta
    has to be compared against the maximum possible delay and reduced to fit in
    if necessary.

    However instead of comparing the delta against the maximum, the code
    compares the actual expiry against the maximum. Then instead of fixing the
    delta to fit in, it sets the maximum delta as the expiry value.

    This can result in various undesired outcomes, the worst being a timer that
    was due to expire 15 days ahead firing immediately instead.
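
    A self-contained illustration of why that happens; the constants are made
    up for the example, the real wheel uses its own cutoff:

    ```
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_DELTA ((uint64_t)1 << 30)  /* pretend max delay the wheel encodes */

    int main(void)
    {
            uint64_t clk = (uint64_t)1 << 40;          /* wheel clock, already large */
            uint64_t expires = clk + MAX_DELTA + 123;  /* delta just past the cutoff */

            /* Buggy logic: compare and clamp the absolute expiry, not the delta. */
            uint64_t buggy = expires > MAX_DELTA ? MAX_DELTA : expires;

            /* Intended logic: clamp the delta, keep the expiry relative to clk. */
            uint64_t delta = expires - clk;
            uint64_t fixed = delta > MAX_DELTA ? clk + MAX_DELTA : expires;

            printf("buggy expiry %llu is %s clk %llu\n", (unsigned long long)buggy,
                   buggy <= clk ? "behind (fires immediately)" : "after",
                   (unsigned long long)clk);
            printf("fixed expiry %llu is %s clk %llu\n", (unsigned long long)fixed,
                   fixed <= clk ? "behind (fires immediately)" : "after",
                   (unsigned long long)clk);
            return 0;
    }
    ```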

    Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20200717140551.29076-2-frederic@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Frederic Weisbecker
     
  • commit 30c66fc30ee7a98c4f3adf5fb7e213b61884474f upstream.

    When a timer is enqueued with a negative delta (ie: expiry is below
    base->clk), it gets added to the wheel as expiring now (base->clk).

    Yet the value that gets stored in base->next_expiry, while calling
    trigger_dyntick_cpu(), is the initial timer->expires value. The
    resulting state becomes:

    base->next_expiry < base->clk

    On the next timer enqueue, forward_timer_base() may accidentally
    rewind base->clk. As a possible outcome, timers may expire way too
    early, the worst case being that the highest wheel levels get spuriously
    processed again.

    To prevent that, make sure that base->next_expiry doesn't get below
    base->clk.
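
    A rough sketch of that clamp in the enqueue path (the exact form in the
    patch may differ):

    ```
    /* Never advertise a next expiry that is behind the wheel clock. */
    if (time_before(timer->expires, base->clk))
            base->next_expiry = base->clk;
    else
            base->next_expiry = timer->expires;
    ```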

    Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Anna-Maria Behnsen
    Tested-by: Juri Lelli
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20200703010657.2302-1-frederic@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Frederic Weisbecker
     
  • [ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

    When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
    copied, so the cgroup refcnt must be taken too. And, unlike the
    sk_alloc() path, sock_update_netprioidx() is not called here.
    Therefore, it is safe and necessary to grab the cgroup refcnt
    even when cgroup_sk_alloc is disabled.

    sk_clone_lock() is in BH context anyway, and the in_interrupt() check would
    terminate this function if it were called there. And for sk_alloc(),
    skcd->val is always zero. So it's safe to factor out the code to make it
    more readable.

    The global variable 'cgroup_sk_alloc_disabled' is used to determine
    whether to take these reference counts. It is impossible to make
    the reference counting correct unless we save this bit of information
    in skcd->val. So, add a new bit there to record whether the socket
    has already taken the reference counts. This obviously relies on
    kmalloc() to align cgroup pointers to at least 4 bytes;
    ARCH_KMALLOC_MINALIGN is certainly larger than that.
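
    Stashing a flag in the low bit of an aligned pointer is a classic trick;
    a toy, self-contained illustration (the struct and helper names are made
    up, they are not the kernel's):

    ```
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct cgroup_stub { int refcnt; };

    /* Pack "pointer + 1-bit flag" into a single word, like skcd->val. */
    static uintptr_t pack(struct cgroup_stub *cgrp, int ref_taken)
    {
            assert(((uintptr_t)cgrp & 1) == 0);  /* relies on >= 4-byte alignment */
            return (uintptr_t)cgrp | (ref_taken ? 1 : 0);
    }

    static struct cgroup_stub *unpack(uintptr_t val, int *ref_taken)
    {
            *ref_taken = val & 1;
            return (struct cgroup_stub *)(val & ~(uintptr_t)1);
    }

    int main(void)
    {
            struct cgroup_stub *cgrp = malloc(sizeof(*cgrp));
            int ref_taken;

            uintptr_t val = pack(cgrp, 1);
            struct cgroup_stub *back = unpack(val, &ref_taken);

            printf("same pointer: %d, ref taken: %d\n", back == cgrp, ref_taken);
            free(cgrp);
            return 0;
    }
    ```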

    This bug seems to have been present since the beginning; commit
    d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
    tried to fix it, but not completely. It seems it was not easy to trigger
    until the recent commit 090e28b229af
    ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

    Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
    Reported-by: Cameron Berkenpas
    Reported-by: Peter Geis
    Reported-by: Lu Fengqi
    Reported-by: Daniël Sonck
    Reported-by: Zhang Qiang
    Tested-by: Cameron Berkenpas
    Tested-by: Peter Geis
    Tested-by: Thomas Lamprecht
    Cc: Daniel Borkmann
    Cc: Zefan Li
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

16 Jul, 2020

6 commits

  • commit 63960260457a02af2a6cb35d75e6bdb17299c882 upstream.

    When evaluating access control over kallsyms visibility, credentials at
    open() time need to be used, not the "current" creds (though in BPF's
    case, this has likely always been the same). Plumb access to the associated
    file->f_cred down through bpf_dump_raw_ok() and its callers now that
    kallsyms_show_value() has been refactored to take struct cred.

    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: bpf@vger.kernel.org
    Cc: stable@vger.kernel.org
    Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump")
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit 60f7bb66b88b649433bf700acfc60c3f24953871 upstream.

    The kprobe show() functions were using "current"'s creds instead
    of the file opener's creds for kallsyms visibility. Fix to use
    seq_file->file->f_cred.

    Cc: Masami Hiramatsu
    Cc: stable@vger.kernel.org
    Fixes: 81365a947de4 ("kprobes: Show address of kprobes if kallsyms does")
    Fixes: ffb9bd68ebdb ("kprobes: Show blacklist addresses as same as kallsyms does")
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit b25a7c5af9051850d4f3d93ca500056ab6ec724b upstream.

    The printing of section addresses in /sys/module/*/sections/* was not
    using the correct credentials to evaluate visibility.

    Before:

    # cat /sys/module/*/sections/.*text
    0xffffffffc0458000
    ...
    # capsh --drop=CAP_SYSLOG -- -c "cat /sys/module/*/sections/.*text"
    0xffffffffc0458000
    ...

    After:

    # cat /sys/module/*/sections/.*text
    0xffffffffc0458000
    ...
    # capsh --drop=CAP_SYSLOG -- -c "cat /sys/module/*/sections/.*text"
    0x0000000000000000
    ...

    Additionally replaces the existing (safe) /proc/modules check with
    file->f_cred for consistency.

    Reported-by: Dominik Czarnota
    Fixes: be71eda5383f ("module: Fix display of wrong module .text address")
    Cc: stable@vger.kernel.org
    Tested-by: Jessica Yu
    Acked-by: Jessica Yu
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit ed66f991bb19d94cae5d38f77de81f96aac7813f upstream.

    In order to gain access to the open file's f_cred for kallsyms visibility
    permission checks, refactor the module section attributes to use the
    bin_attribute interface instead of the attribute interface. Additionally,
    remove the redundant "name" struct member.

    Cc: stable@vger.kernel.org
    Reviewed-by: Greg Kroah-Hartman
    Tested-by: Jessica Yu
    Acked-by: Jessica Yu
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit 160251842cd35a75edfb0a1d76afa3eb674ff40a upstream.

    In order to perform future tests against the cred saved during open(),
    switch kallsyms_show_value() to operate on a cred, and have all current
    callers pass current_cred(). This makes it very obvious where callers
    are checking the wrong credential in their "read" contexts. These will
    be fixed in the coming patches.

    Additionally switch return value to bool, since it is always used as a
    direct permission check, not a 0-on-success, negative-on-error style
    function return.
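
    The new shape, roughly (a sketch, not the full diff):

    ```
    /* before */
    int kallsyms_show_value(void);

    /* after: the caller supplies the credentials the check is made against */
    bool kallsyms_show_value(const struct cred *cred);

    /* existing "read" contexts keep today's behaviour by passing current_cred() */
    if (!kallsyms_show_value(current_cred()))
            value = 0;  /* print 0 instead of the real address */
    ```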

    Cc: stable@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • [ Upstream commit fd844ba9ae59b51e34e77105d79f8eca780b3bd6 ]

    This function is concerned with the long-term CPU mask, not the
    transitory mask the task might have while migrate disabled. Before
    this patch, if a task was migrate-disabled at the time
    __set_cpus_allowed_ptr() was called, and the new mask happened to be
    equal to the CPU that the task was running on, then the mask update
    would be lost.
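
    Conceptually the check becomes (a sketch; cpus_mask is the long-term
    affinity mask in task_struct):

    ```
    /* Compare the request against the long-term affinity mask, not the
     * possibly narrowed mask in effect while migrate-disabled. */
    if (cpumask_equal(&p->cpus_mask, new_mask))
            goto out;
    ```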

    Signed-off-by: Scott Wood
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Link: https://lkml.kernel.org/r/20200617121742.cpxppyi7twxmpin7@linutronix.de
    Signed-off-by: Sasha Levin

    Scott Wood
     

09 Jul, 2020

2 commits

  • [ Upstream commit 440ab9e10e2e6e5fd677473ee6f9e3af0f6904d6 ]

    At times when I'm using kgdb I see a splat on my console about
    suspicious RCU usage. I managed to come up with a case that could
    reproduce this that looked like this:

    WARNING: suspicious RCU usage
    5.7.0-rc4+ #609 Not tainted
    -----------------------------
    kernel/pid.c:395 find_task_by_pid_ns() needs rcu_read_lock() protection!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    3 locks held by swapper/0/1:
    #0: ffffff81b6b8e988 (&dev->mutex){....}-{3:3}, at: __device_attach+0x40/0x13c
    #1: ffffffd01109e9e8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x20c/0x7ac
    #2: ffffffd01109ea90 (dbg_slave_lock){....}-{2:2}, at: kgdb_cpu_enter+0x3ec/0x7ac

    stack backtrace:
    CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #609
    Hardware name: Google Cheza (rev3+) (DT)
    Call trace:
    dump_backtrace+0x0/0x1b8
    show_stack+0x1c/0x24
    dump_stack+0xd4/0x134
    lockdep_rcu_suspicious+0xf0/0x100
    find_task_by_pid_ns+0x5c/0x80
    getthread+0x8c/0xb0
    gdb_serial_stub+0x9d4/0xd04
    kgdb_cpu_enter+0x284/0x7ac
    kgdb_handle_exception+0x174/0x20c
    kgdb_brk_fn+0x24/0x30
    call_break_hook+0x6c/0x7c
    brk_handler+0x20/0x5c
    do_debug_exception+0x1c8/0x22c
    el1_sync_handler+0x3c/0xe4
    el1_sync+0x7c/0x100
    rpmh_rsc_probe+0x38/0x420
    platform_drv_probe+0x94/0xb4
    really_probe+0x134/0x300
    driver_probe_device+0x68/0x100
    __device_attach_driver+0x90/0xa8
    bus_for_each_drv+0x84/0xcc
    __device_attach+0xb4/0x13c
    device_initial_probe+0x18/0x20
    bus_probe_device+0x38/0x98
    device_add+0x38c/0x420

    If I understand properly we should just be able to blanket kgdb under
    one big RCU read lock and the problem should go away. We'll add it to
    the beast-of-a-function known as kgdb_cpu_enter().

    With this I no longer get any splats and things seem to work fine.

    Signed-off-by: Douglas Anderson
    Link: https://lore.kernel.org/r/20200602154729.v2.1.I70e0d4fd46d5ed2aaf0c98a355e8e1b7a5bb7e4e@changeid
    Signed-off-by: Daniel Thompson
    Signed-off-by: Sasha Levin

    Douglas Anderson
     
  • [ Upstream commit 9818427c6270a9ce8c52c8621026fe9cebae0f92 ]

    Writing to the sysctl of a sched_domain->flags directly updates the value of
    the field, and goes nowhere near update_top_cache_domain(). This means that
    the cached domain pointers can end up containing stale data (e.g. the
    domain pointed to doesn't have the relevant flag set anymore).

    Explicit domain walks that check for flags will be affected by
    the write, but this won't be in sync with the cached pointers which will
    still point to the domains that were cached at the last sched_domain
    build.

    In other words, writing to this interface is playing a dangerous game. It
    could be made to trigger an update of the cached sched_domain pointers when
    written to, but this does not seem to be worth the trouble. Make it
    read-only.

    Signed-off-by: Valentin Schneider
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200415210512.805-3-valentin.schneider@arm.com
    Signed-off-by: Sasha Levin

    Valentin Schneider
     

01 Jul, 2020

8 commits

  • commit 097350d1c6e1f5808cae142006f18a0bbc57018d upstream.

    Currently the ring buffer makes events that happen in interrupts that preempt
    another event have a delta of zero. (Hopefully we can change this soon). But
    this is to deal with the races of updating a global counter with lockless
    and nesting functions updating deltas.

    With the addition of absolute time stamps, the time extend didn't follow
    this rule. A time extend can happen if two events happen more than 2^27
    nanoseconds apart, as the delta time field in each event is only 27 bits.
    If that happens, a time extend is injected, which gives 59 bits of
    nanoseconds to use (about 18 years). But if the 2^27 nanoseconds pass
    between two events, and an interrupt triggers while the event is being
    written, the interrupt will see the 2^27 difference as well and inject a
    time extend of its own. However, a recent change made the time extend
    logic not take the nesting into account, and this can cause two time
    extend deltas to be applied, moving the time stamp much further ahead than
    the current time. This all gets reset when the ring buffer moves to the
    next page, but it can make time appear to go backwards.

    This was observed in a trace-cmd recording, and since the data is saved in a
    file, with trace-cmd report --debug, it was possible to see that this indeed
    did happen!

    bash-52501 110d... 81778.908247: sched_switch: bash:52501 [120] S ==> swapper/110:0 [120] [12770284:0x2e8:64]
    -0 110d... 81778.908757: sched_switch: swapper/110:0 [120] R ==> bash:52501 [120] [509947:0x32c:64]
    TIME EXTEND: delta:306454770 length:0
    bash-52501 110.... 81779.215212: sched_swap_numa: src_pid=52501 src_tgid=52388 src_ngid=52501 src_cpu=110 src_nid=2 dst_pid=52509 dst_tgid=52388 dst_ngid=52501 dst_cpu=49 dst_nid=1 [0:0x378:48]
    TIME EXTEND: delta:306458165 length:0
    bash-52501 110dNh. 81779.521670: sched_wakeup: migration/110:565 [0] success=1 CPU:110 [0:0x3b4:40]

    and at the next page, caused the time to go backwards:

    bash-52504 110d... 81779.685411: sched_switch: bash:52504 [120] S ==> swapper/110:0 [120] [8347057:0xfb4:64]
    CPU:110 [SUBBUFFER START] [81779379165886:0x1320000]
    -0 110dN.. 81779.379166: sched_wakeup: bash:52504 [120] success=1 CPU:110 [0:0x10:40]
    -0 110d... 81779.379167: sched_switch: swapper/110:0 [120] R ==> bash:52504 [120] [1168:0x3c:64]

    Link: https://lkml.kernel.org/r/20200622151815.345d1bf5@oasis.local.home

    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: Tom Zanussi
    Cc: stable@vger.kernel.org
    Fixes: dc4e2801d400b ("ring-buffer: Redefine the unimplemented RINGBUF_TYPE_TIME_STAMP")
    Reported-by: Julia Lawall
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 6784beada631800f2c5afd567e5628c843362cee upstream.

    Fix the event trigger to accept redundant spaces in
    the trigger input.

    For example, these return -EINVAL

    echo " traceon" > events/ftrace/print/trigger
    echo "traceon if common_pid == 0" > events/ftrace/print/trigger
    echo "disable_event:kmem:kmalloc " > events/ftrace/print/trigger

    But from these it is hard to tell what is actually wrong.

    To fix this issue, use skip_spaces() to remove spaces in front of the
    actual tokens, and set the parameter to NULL if there is no token left.
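
    The parsing fix then follows this pattern (a sketch of what the changelog
    describes):

    ```
    param = skip_spaces(param);  /* tolerate leading spaces */
    if (!*param)
            param = NULL;        /* an all-space parameter means "no parameter" */
    ```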

    Link: http://lkml.kernel.org/r/159262476352.185015.5261566783045364186.stgit@devnote2

    Cc: Tom Zanussi
    Cc: stable@vger.kernel.org
    Fixes: 85f2b08268c0 ("tracing: Add basic event trigger framework")
    Reviewed-by: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • [ Upstream commit 1b0b283648163dae2a214ca28ed5a99f62a77319 ]

    We use one blktrace per request_queue; that means one per the entire
    disk. So we cannot run one blktrace on, say, /dev/vda and then /dev/vda1,
    or just make two calls on /dev/vda.

    However, we check for a concurrent setup only at the very end of the
    blktrace setup.

    If we try to run two concurrent blktraces on the same block device, the
    second one will fail, and the first one seems to go on. However, when one
    tries to kill the first one, one will see things like this:

    The kernel will show these:

    ```
    debugfs: File 'dropped' in directory 'nvme1n1' already present!
    debugfs: File 'msg' in directory 'nvme1n1' already present!
    debugfs: File 'trace0' in directory 'nvme1n1' already present!
    ```

    And userspace just sees this error message for the second call:

    ```
    blktrace /dev/nvme1n1
    BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
    ```

    The first userspace process (#1) will also claim that the files were taken
    out from underneath its nose. The files are taken away from the first
    process because, when the second blktrace fails, it follows up with a
    BLKTRACESTOP and BLKTRACETEARDOWN. This means that even if the
    happy-go-lucky process #1 is waiting for blktrace data, we *have* been
    asked to tear down the blktrace.

    This can easily be reproduced with break-blktrace [0] run_0005.sh test.

    Just break out early if we know we're already going to fail; this prevents
    trying to create the files all over again, which we know still exist.

    [0] https://github.com/mcgrof/break-blktrace

    Signed-off-by: Luis Chamberlain
    Signed-off-by: Jan Kara
    Reviewed-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Luis Chamberlain
     
  • [ Upstream commit 6743ad432ec92e680cd0d9db86cb17b949cf5a43 ]

    Anders reported that lockdep warns about suspicious RCU list usage in
    register_kprobe() (detected by CONFIG_PROVE_RCU_LIST). This is because
    get_kprobe() accesses kprobe_table[] with hlist_for_each_entry_rcu()
    without holding rcu_read_lock.

    If we call get_kprobe() from the breakpoint handler context, it runs with
    preemption disabled, so this is not a problem. But in other cases, instead
    of rcu_read_lock(), we hold kprobe_mutex so that kprobe_table[] is not
    updated. So the current code is safe, but still not good from the RCU
    point of view.

    Joel suggested that we can silence that warning by passing
    lockdep_is_held() as the last argument of
    hlist_for_each_entry_rcu().

    Add lockdep_is_held(&kprobe_mutex) at the end of the
    hlist_for_each_entry_rcu() to suppress the warning.
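
    In get_kprobe() the traversal then looks roughly like this (sketch):

    ```
    /* Holding kprobe_mutex keeps kprobe_table[] stable, so tell RCU about
     * it; this silences the CONFIG_PROVE_RCU_LIST warning. */
    hlist_for_each_entry_rcu(p, head, hlist,
                             lockdep_is_held(&kprobe_mutex)) {
            if (p->addr == addr)
                    return p;
    }
    ```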

    Link: http://lkml.kernel.org/r/158927055350.27680.10261450713467997503.stgit@devnote2

    Reported-by: Anders Roxell
    Suggested-by: Joel Fernandes
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Sasha Levin

    Masami Hiramatsu
     
  • [ Upstream commit 740797ce3a124b7dd22b7fb832d87bc8fba1cf6f ]

    syzbot reported the following warning:

    WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
    enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504

    At deadline.c:628 we have:

    623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
    624 {
    625 struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
    626 struct rq *rq = rq_of_dl_rq(dl_rq);
    627
    628 WARN_ON(dl_se->dl_boosted);
    629 WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
    [...]
    }

    This means that setup_new_dl_entity() has been called on a task that is
    currently boosted. That shouldn't happen though, as setup_new_dl_entity()
    is only called when the 'dynamic' deadline of the new entity is in the past
    w.r.t. rq_clock, and boosted tasks shouldn't satisfy this condition.

    Digging through the PI code I noticed that the above might in fact happen
    if an RT task blocks on an rt_mutex held by a DEADLINE task. In the first
    branch of the boosting conditions we check only whether a pi_task's
    'dynamic' deadline is earlier than the mutex holder's, and in that case we
    mark the mutex holder as dl_boosted. However, since RT 'dynamic' deadlines
    are only initialized if such tasks get boosted at some point (or if they
    become DEADLINE, of course), in general RT 'dynamic' deadlines are usually
    equal to 0, and this satisfies the aforementioned condition.

    Fix it by checking that the potential donor task is actually (even if only
    temporarily, because it is in turn boosted) running at DEADLINE priority
    before using its 'dynamic' deadline value.

    Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
    Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Daniel Bristot de Oliveira
    Tested-by: Daniel Wagner
    Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain
    Signed-off-by: Sasha Levin

    Juri Lelli
     
  • [ Upstream commit ce9bc3b27f2a21a7969b41ffb04df8cf61bd1592 ]

    syzbot reported the following warning triggered via SYSC_sched_setattr():

    WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 setup_new_dl_entity /kernel/sched/deadline.c:594 [inline]
    WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_dl_entity /kernel/sched/deadline.c:1370 [inline]
    WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_task_dl+0x1c17/0x2ba0 /kernel/sched/deadline.c:1441

    This happens because the ->dl_boosted flag is currently not initialized by
    __dl_clear_params() (unlike the other flags) and setup_new_dl_entity()
    rightfully complains about it.

    Initialize dl_boosted to 0.

    Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
    Reported-by: syzbot+5ac8bac25f95e8b221e7@syzkaller.appspotmail.com
    Signed-off-by: Juri Lelli
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Tested-by: Daniel Wagner
    Link: https://lkml.kernel.org/r/20200617072919.818409-1-juri.lelli@redhat.com
    Signed-off-by: Sasha Levin

    Juri Lelli
     
  • [ Upstream commit d8fe449a9c51a37d844ab607e14e2f5c657d3cf2 ]

    Attaching to these hooks can break iptables because its optval is
    usually quite big, or at least bigger than the current PAGE_SIZE limit.
    David also mentioned some SCTP options can be big (around 256k).

    For such optvals we expose only the first PAGE_SIZE bytes to the BPF
    program. The BPF program then has two options (see the sketch after this
    list):
    1. Set ctx->optlen to 0 to indicate that the BPF's optval should be
    ignored and the kernel should use the original userspace value.
    2. Set ctx->optlen to something smaller than PAGE_SIZE.
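
    A minimal sketch of option 1 as a cgroup setsockopt program (illustrative
    only, not taken from the patch):

    ```
    // SPDX-License-Identifier: GPL-2.0
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("cgroup/setsockopt")
    int ignore_optval(struct bpf_sockopt *ctx)
    {
            /* Option 1 above: tell the kernel to ignore the (possibly
             * truncated) copy of optval that BPF saw and use the original
             * user-space value instead. */
            ctx->optlen = 0;
            return 1;  /* allow the setsockopt() to proceed */
    }

    char _license[] SEC("license") = "GPL";
    ```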

    v5:
    * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
    * update the docs accordingly

    v4:
    * use temporary buffer to avoid optval == optval_end == NULL;
    this removes the corner case in the verifier that might assume
    non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.

    v3:
    * don't increase the limit, bypass the argument

    v2:
    * proper comments formatting (Jakub Kicinski)

    Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Alexei Starovoitov
    Cc: David Laight
    Link: https://lore.kernel.org/bpf/20200617010416.93086-1-sdf@google.com
    Signed-off-by: Sasha Levin

    Stanislav Fomichev
     
  • [ Upstream commit 99c51064fb06146b3d494b745c947e438a10aaa7 ]

    Syzkaller discovered that creating a hash of type devmap_hash with a large
    number of entries can hit the memory allocator limit for allocating
    contiguous memory regions. There's really no reason to use kmalloc_array()
    directly in the devmap code, so just switch it to the existing
    bpf_map_area_alloc() function that is used elsewhere.

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-by: Xiumei Mu
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200616142829.114173-1-toke@redhat.com
    Signed-off-by: Sasha Levin

    Toke Høiland-Jørgensen
     

24 Jun, 2020

8 commits

  • commit 9b38cc704e844e41d9cf74e647bff1d249512cb3 upstream.

    Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
    My test was also able to trigger lockdep output:

    ============================================
    WARNING: possible recursive locking detected
    5.6.0-rc6+ #6 Not tainted
    --------------------------------------------
    sched-messaging/2767 is trying to acquire lock:
    ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

    but task is already holding lock:
    ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&(kretprobe_table_locks[i].lock));
    lock(&(kretprobe_table_locks[i].lock));

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    1 lock held by sched-messaging/2767:
    #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

    stack backtrace:
    CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
    Call Trace:
    dump_stack+0x96/0xe0
    __lock_acquire.cold.57+0x173/0x2b7
    ? native_queued_spin_lock_slowpath+0x42b/0x9e0
    ? lockdep_hardirqs_on+0x590/0x590
    ? __lock_acquire+0xf63/0x4030
    lock_acquire+0x15a/0x3d0
    ? kretprobe_hash_lock+0x52/0xa0
    _raw_spin_lock_irqsave+0x36/0x70
    ? kretprobe_hash_lock+0x52/0xa0
    kretprobe_hash_lock+0x52/0xa0
    trampoline_handler+0xf8/0x940
    ? kprobe_fault_handler+0x380/0x380
    ? find_held_lock+0x3a/0x1c0
    kretprobe_trampoline+0x25/0x50
    ? lock_acquired+0x392/0xbc0
    ? _raw_spin_lock_irqsave+0x50/0x70
    ? __get_valid_kprobe+0x1f0/0x1f0
    ? _raw_spin_unlock_irqrestore+0x3b/0x40
    ? finish_task_switch+0x4b9/0x6d0
    ? __switch_to_asm+0x34/0x70
    ? __switch_to_asm+0x40/0x70

    The code within the kretprobe handler checks for probe reentrancy,
    so we won't trigger any _raw_spin_lock_irqsave probe in there.

    The problem is outside of that, in kprobe_flush_task(), where we call:

    kprobe_flush_task
    kretprobe_table_lock
    raw_spin_lock_irqsave
    _raw_spin_lock_irqsave

    where _raw_spin_lock_irqsave triggers the kretprobe and installs the
    kretprobe_trampoline handler on _raw_spin_lock_irqsave's return.

    The kretprobe_trampoline handler is then executed with
    kretprobe_table_locks already held, and the first thing it does is take
    kretprobe_table_locks again ;-) The whole lockup path looks like:

    kprobe_flush_task
    kretprobe_table_lock
    raw_spin_lock_irqsave
    _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

    ---> kretprobe_table_locks locked

    kretprobe_trampoline
    trampoline_handler
    kretprobe_hash_lock(current, &head, &flags);

    Cc: "Gustavo A . R . Silva"
    Cc: Anders Roxell
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Reported-by: "Ziqian SUN (Zamir)"
    Acked-by: Masami Hiramatsu
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Jiri Olsa
     
  • commit 1a0aa991a6274161c95a844c58cfb801d681eb59 upstream.

    In kprobe_optimizer(), kick_kprobe_optimizer() is called without
    kprobe_mutex held, but this can race with other callers, which are
    protected by kprobe_mutex.

    To fix that, expand the kprobe_mutex-protected area to cover the
    kick_kprobe_optimizer() call.
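
    In other words, keep the re-kick inside the locked region, roughly (a
    sketch of the idea):

    ```
    mutex_lock(&kprobe_mutex);

    /* ... do the (un)optimization work ... */

    /* Re-kick while still holding the same mutex the other callers take. */
    if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list))
            kick_kprobe_optimizer();

    mutex_unlock(&kprobe_mutex);
    ```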

    Link: http://lkml.kernel.org/r/158927057586.27680.5036330063955940456.stgit@devnote2

    Fixes: cd7ebe2298ff ("kprobes: Use text_poke_smp_batch for optimizing")
    Cc: Ingo Molnar
    Cc: "Gustavo A . R . Silva"
    Cc: Anders Roxell
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Ziqian SUN
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • commit 3aa8fdc37d16735e8891035becf25b3857d3efe0 upstream.

    kmemleak report:
    [] __kmalloc_track_caller+0x139/0x2b0
    [] kstrndup+0x37/0x80
    [] parse_probe_arg.isra.7+0x3cc/0x630
    [] traceprobe_parse_probe_arg+0x2f5/0x810
    [] trace_kprobe_create+0x2ca/0x950
    [] create_or_delete_trace_kprobe+0xf/0x30
    [] trace_run_command+0x67/0x80
    [] trace_parse_run_command+0xa7/0x140
    [] probes_write+0x10/0x20
    [] __vfs_write+0x30/0x1e0
    [] vfs_write+0x96/0x1b0
    [] ksys_write+0x53/0xc0
    [] __ia32_sys_write+0x15/0x20
    [] do_syscall_32_irqs_on+0x3d/0x260
    [] do_fast_syscall_32+0x39/0xb0
    [] entry_SYSENTER_32+0xaf/0x102

    After parse_probe_arg(), the FETCH_OP_DATA operation type is overwritten
    to FETCH_OP_ST_STRING; as a result the memory is never freed, since
    traceprobe_free_probe_arg() only handles the SYMBOL and DATA op types.

    Set up the fetch-string operation correctly after the FETCH_OP_DATA
    operation.

    Link: https://lkml.kernel.org/r/20200615143034.GA1734@cosmos

    Cc: stable@vger.kernel.org
    Fixes: a42e3c4de964 ("tracing/probe: Add immediate string parameter support")
    Acked-by: Masami Hiramatsu
    Signed-off-by: Vamshi K Sthambamkadi
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Vamshi K Sthambamkadi
     
  • [ Upstream commit 22d5bd6867364b41576a712755271a7d6161abd6 ]

    Commit 60d53e2c3b75 ("tracing/probe: Split trace_event related data from
    trace_probe") removed the trace_[ku]probe structure from the
    trace_event_call->data pointer. As bpf_get_[ku]probe_info() were
    forgotten in that change, fix them now. These functions are currently
    only used by the bpf_task_fd_query() syscall handler to collect
    information about a perf event.

    Fixes: 60d53e2c3b75 ("tracing/probe: Split trace_event related data from trace_probe")
    Signed-off-by: Jean-Philippe Brucker
    Signed-off-by: Alexei Starovoitov
    Acked-by: Yonghong Song
    Acked-by: Masami Hiramatsu
    Link: https://lore.kernel.org/bpf/20200608124531.819838-1-jean-philippe@linaro.org
    Signed-off-by: Sasha Levin

    Jean-Philippe Brucker
     
  • [ Upstream commit 5aec598c456fe3c1b71a1202cbb42bdc2a643277 ]

    The function blk_log_remap() can be simplified by removing the call to
    get_pdu_remap(), which copies the values into an extra variable just to
    print the data; this also fixes the endianness warning reported by sparse.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Chaitanya Kulkarni
     
  • [ Upstream commit 71df3fd82e7cccec7b749a8607a4662d9f7febdd ]

    In get_pdu_len(), change the variable type from __u64 to __be64. This
    fixes a sparse warning.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Chaitanya Kulkarni
     
  • [ Upstream commit 48bc3cd3e07a1486f45d9971c75d6090976c3b1b ]

    In blk_add_trace_split() and blk_add_trace_bio_remap(), use
    blk_status_to_errno() to pass the error instead of passing bi_status.
    This fixes the sparse warning.

    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Chaitanya Kulkarni
     
  • [ Upstream commit 3234ac664a870e6ea69ae3a57d824cd7edbeacc5 ]

    Close the hole of holding a mapping over kernel driver takeover event of
    a given address range.

    Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
    introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
    kernel against scenarios where a /dev/mem user tramples memory that a
    kernel driver owns. However, this protection only prevents *new* read(),
    write() and mmap() requests. Established mappings prior to the driver
    calling request_mem_region() are left alone.

    Especially with persistent memory, and the core kernel metadata that is
    stored there, there are plentiful scenarios for a /dev/mem user to
    violate the expectations of the driver and cause amplified damage.

    Teach request_mem_region() to find and shoot down active /dev/mem
    mappings that it believes it has successfully claimed for the exclusive
    use of the driver. Effectively a driver call to request_mem_region()
    becomes a hole-punch on the /dev/mem device.

    The typical usage of unmap_mapping_range() is part of
    truncate_pagecache() to punch a hole in a file, but in this case the
    implementation is only doing the "first half" of a hole punch. Namely it
    is just evacuating current established mappings of the "hole", and it
    relies on the fact that /dev/mem establishes mappings in terms of
    absolute physical address offsets. Once existing mmap users are
    invalidated they can attempt to re-establish the mapping, or attempt to
    continue issuing read(2) / write(2) to the invalidated extent, but they
    will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
    block those subsequent accesses.

    Cc: Arnd Bergmann
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: Matthew Wilcox
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
    Signed-off-by: Dan Williams
    Reviewed-by: Kees Cook
    Link: https://lore.kernel.org/r/159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Sasha Levin

    Dan Williams
     

22 Jun, 2020

9 commits

  • commit b5945214b76a1f22929481724ffd448000ede914 upstream.

    cpu_pm_notify() is basically a wrapper of notifier_call_chain().
    notifier_call_chain() doesn't initialize *nr_calls to 0 before it
    starts incrementing it--presumably it's up to the callers to do this.

    Unfortunately the callers of cpu_pm_notify() don't init *nr_calls.
    This potentially means you could get too many or too few calls to
    CPU_PM_ENTER_FAILED or CPU_CLUSTER_PM_ENTER_FAILED, depending on the
    luck of the stack.

    Let's fix this.

    Fixes: ab10023e0088 ("cpu_pm: Add cpu power management notifiers")
    Cc: stable@vger.kernel.org
    Cc: Rafael J. Wysocki
    Reviewed-by: Stephen Boyd
    Reviewed-by: Greg Kroah-Hartman
    Signed-off-by: Douglas Anderson
    Link: https://lore.kernel.org/r/20200504104917.v6.3.I2d44fc0053d019f239527a4e5829416714b7e299@changeid
    Signed-off-by: Bjorn Andersson
    Signed-off-by: Greg Kroah-Hartman

    Douglas Anderson
     
  • [ Upstream commit 1ea0f9120c8ce105ca181b070561df5cbd6bc049 ]

    The map_lookup_and_delete_elem() function should check for both FMODE_CAN_WRITE
    and FMODE_CAN_READ permissions because it returns a map element to user space.
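
    Roughly, the check described above becomes (a sketch using the existing
    map_get_sys_perms() helper):

    ```
    /* The element is both returned to user space and deleted, so the map
     * fd must allow reading as well as writing. */
    if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ) ||
        !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
            err = -EPERM;
            goto err_put;
    }
    ```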

    Fixes: bd513cd08f10 ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall")
    Signed-off-by: Anton Protopopov
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200527185700.14658-5-a.s.protopopov@gmail.com
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Sasha Levin

    Anton Protopopov
     
  • [ Upstream commit d505b8af58912ae1e1a211fabc9995b19bd40828 ]

    When users write some huge number into cpu.cfs_quota_us or
    cpu.rt_runtime_us, overflow might happen during the to_ratio() shifts of
    the schedulability checks.

    to_ratio() could be altered to avoid unnecessary internal overflow, but
    min_cfs_quota_period is less than 1 << BW_SHIFT, so a cutoff would still
    be needed. Set a cap MAX_BW for cfs_quota_us and rt_runtime_us to
    prevent overflow.
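
    A self-contained illustration of the overflow; BW_SHIFT is the scheduler's
    fixed-point shift (20), the rest is made up for the example:

    ```
    #include <stdint.h>
    #include <stdio.h>

    #define BW_SHIFT 20  /* same shift the scheduler's to_ratio() uses */

    static uint64_t to_ratio(uint64_t period, uint64_t runtime)
    {
            /* runtime << 20 silently wraps once runtime needs more than 44 bits */
            return (runtime << BW_SHIFT) / period;
    }

    int main(void)
    {
            uint64_t period  = 100000000ULL;  /* 100 ms in ns */
            uint64_t runtime = 1ULL << 45;    /* a "huge" quota, ~9.8 hours in ns */

            /* The huge quota wraps to 0, so it looks like zero bandwidth. */
            printf("ratio = %llu\n", (unsigned long long)to_ratio(period, runtime));
            return 0;
    }
    ```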

    Signed-off-by: Huaixin Chang
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Ben Segall
    Link: https://lkml.kernel.org/r/20200425105248.60093-1-changhuaixin@linux.alibaba.com
    Signed-off-by: Sasha Levin

    Huaixin Chang
     
  • [ Upstream commit bf2c59fce4074e55d622089b34be3a6bc95484fb ]

    In the CPU-offline process, it calls mmdrop() after idle entry and the
    subsequent call to cpuhp_report_idle_dead(). Once execution passes the
    call to rcu_report_dead(), RCU is ignoring the CPU, which results in
    lockdep complaining when mmdrop() uses RCU from either memcg or
    debugobjects below.

    Fix it by cleaning up the active_mm state from BP instead. Every arch
    which has CONFIG_HOTPLUG_CPU should have already called idle_task_exit()
    from AP. The only exception is parisc because it switches them to
    &init_mm unconditionally (see smp_boot_one_cpu() and smp_cpu_init()),
    but the patch will still work there because it calls mmgrab(&init_mm) in
    smp_cpu_init() and then should call mmdrop(&init_mm) in finish_cpu().

    WARNING: suspicious RCU usage
    -----------------------------
    kernel/workqueue.c:710 RCU or wq_pool_mutex should be held!

    other info that might help us debug this:

    RCU used illegally from offline CPU!
    Call Trace:
    dump_stack+0xf4/0x164 (unreliable)
    lockdep_rcu_suspicious+0x140/0x164
    get_work_pool+0x110/0x150
    __queue_work+0x1bc/0xca0
    queue_work_on+0x114/0x120
    css_release+0x9c/0xc0
    percpu_ref_put_many+0x204/0x230
    free_pcp_prepare+0x264/0x570
    free_unref_page+0x38/0xf0
    __mmdrop+0x21c/0x2c0
    idle_task_exit+0x170/0x1b0
    pnv_smp_cpu_kill_self+0x38/0x2e0
    cpu_die+0x48/0x64
    arch_cpu_idle_dead+0x30/0x50
    do_idle+0x2f4/0x470
    cpu_startup_entry+0x38/0x40
    start_secondary+0x7a8/0xa80
    start_secondary_resume+0x10/0x14

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Qian Cai
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Michael Ellerman (powerpc)
    Link: https://lkml.kernel.org/r/20200401214033.8448-1-cai@lca.pw
    Signed-off-by: Sasha Levin

    Peter Zijlstra
     
  • [ Upstream commit 586b58cac8b4683eb58a1446fbc399de18974e40 ]

    With CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_CGROUPS=y, kernel oopses in
    non-preemptible context look untidy; after the main oops, the kernel prints
    a "sleeping function called from invalid context" report because
    exit_signals() -> cgroup_threadgroup_change_begin() -> percpu_down_read()
    can sleep, and that happens before the preempt_count_set(PREEMPT_ENABLED)
    fixup.

    It looks like the same thing applies to profile_task_exit() and
    kcov_task_exit().

    Fix it by moving the preemption fixup up and the calls to
    profile_task_exit() and kcov_task_exit() down.

    Fixes: 1dc0fffc48af ("sched/core: Robustify preemption leak checks")
    Signed-off-by: Jann Horn
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200305220657.46800-1-jannh@google.com
    Signed-off-by: Sasha Levin

    Jann Horn
     
  • [ Upstream commit 3054d06719079388a543de6adb812638675ad8f5 ]

    If audit_list_rules_send() fails when trying to create a new thread to
    send the rules, it also fails to clean up properly, leaking a reference to
    a net structure. This patch fixes the error path and renames
    audit_send_list() to audit_send_list_thread() to better match its cousin,
    audit_send_reply_thread().

    Reported-by: teroincn@gmail.com
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Paul Moore
    Signed-off-by: Sasha Levin

    Paul Moore
     
  • [ Upstream commit a48b284b403a4a073d8beb72d2bb33e54df67fb6 ]

    If audit_send_reply() fails when trying to create a new thread to send
    the reply, it also fails to clean up properly, leaking a reference to a
    net structure. This patch fixes the error path and makes a handful of
    other cleanups that came up while fixing the code.

    Reported-by: teroincn@gmail.com
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Paul Moore
    Signed-off-by: Sasha Levin

    Paul Moore
     
  • [ Upstream commit 3ca676e4ca60d1834bb77535dafe24169cadacef ]

    If we detect that we recursively entered the debugger we should hack
    our I/O ops to NULL so that the panic() in the next line won't
    actually cause another recursion into the debugger. The first line of
    kgdb_panic() will check this and return.

    Signed-off-by: Douglas Anderson
    Reviewed-by: Daniel Thompson
    Link: https://lore.kernel.org/r/20200507130644.v4.6.I89de39f68736c9de610e6f241e68d8dbc44bc266@changeid
    Signed-off-by: Daniel Thompson
    Signed-off-by: Sasha Levin

    Douglas Anderson
     
  • [ Upstream commit 202164fbfa2b2ffa3e66b504e0f126ba9a745006 ]

    In commit 81eaadcae81b ("kgdboc: disable the console lock when in
    kgdb") we avoided the WARN_CONSOLE_UNLOCKED() yell when we were in
    kgdboc. That still works fine, but it turns out that we get a similar
    yell when using other I/O drivers. One example is the "I/O driver"
    for the kgdb test suite (kgdbts). When I enabled that I again got the
    same yells.

    Even though "kgdbts" doesn't actually interact with the user over the
    console, using it still causes kgdb to print to the consoles. That
    trips the same warning:
    con_is_visible+0x60/0x68
    con_scroll+0x110/0x1b8
    lf+0x4c/0xc8
    vt_console_print+0x1b8/0x348
    vkdb_printf+0x320/0x89c
    kdb_printf+0x68/0x90
    kdb_main_loop+0x190/0x860
    kdb_stub+0x2cc/0x3ec
    kgdb_cpu_enter+0x268/0x744
    kgdb_handle_exception+0x1a4/0x200
    kgdb_compiled_brk_fn+0x34/0x44
    brk_handler+0x7c/0xb8
    do_debug_exception+0x1b4/0x228

    Let's increment/decrement the "ignore_console_lock_warning" variable
    all the time when we enter the debugger.

    This will allow us to later revert commit 81eaadcae81b ("kgdboc:
    disable the console lock when in kgdb").

    Signed-off-by: Douglas Anderson
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Daniel Thompson
    Link: https://lore.kernel.org/r/20200507130644.v4.1.Ied2b058357152ebcc8bf68edd6f20a11d98d7d4e@changeid
    Signed-off-by: Daniel Thompson
    Signed-off-by: Sasha Levin

    Douglas Anderson