27 Mar, 2019

1 commit

  • commit 71492580571467fb7177aade19c18ce7486267f5 upstream.

    Tetsuo Handa reported seeing an incorrect "downgrading a read lock"
    warning right after a previous lockdep warning. It is likely that the
    previous warning turned off lock debugging, leaving lockdep with
    inconsistent state and leading to the spurious lock-downgrade warning.

    Fix that by adding a check for debug_locks at the beginning of
    __lock_downgrade().
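
    A minimal sketch of the shape of that fix (illustrative, not the exact
    upstream diff): bail out of __lock_downgrade() as soon as lock debugging
    has been turned off, so stale lockdep state cannot produce a bogus
    warning.

    static int __lock_downgrade(struct lockdep_map *lock, unsigned long ip)
    {
            /* Lock debugging already disabled: internal state may be stale. */
            if (unlikely(!debug_locks))
                    return 0;

            /* ... the usual downgrade checks and bookkeeping ... */
            return 1;
    }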

    Debugged-by: Tetsuo Handa
    Reported-by: Tetsuo Handa
    Reported-by: syzbot+53383ae265fb161ef488@syzkaller.appspotmail.com
    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     

14 Nov, 2018

1 commit

  • [ Upstream commit 9506a7425b094d2f1d9c877ed5a78f416669269b ]

    It was found that when debug_locks was turned off because of a problem
    found by the lockdep code, the system performance could drop quite
    significantly when the lock_stat code was also configured into the
    kernel. For instance, parallel kernel build time on a 4-socket x86-64
    server nearly doubled.

    Further analysis traced the slowdown back to the frequent calls to
    debug_locks_off() from the __lock_acquired() function, probably due to
    some inconsistent lockdep state left behind with debug_locks off. The
    debug_locks_off() function did an unconditional atomic xchg to write a
    0 value into debug_locks, which had already been set to 0. This led to
    severe cacheline contention on the cacheline holding debug_locks. As
    debug_locks is referenced in quite a few different places in the
    kernel, this greatly slowed down system performance.

    To prevent that thrashing of the debug_locks cacheline, lock_acquired()
    and lock_contended() now check the state of debug_locks before
    proceeding. The debug_locks_off() function is also modified to check
    debug_locks before calling __debug_locks_off().
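
    The shape of the change, as a hedged sketch (simplified, not the exact
    upstream diff):

    /* lock_acquired()/lock_contended(): return early once lock debugging
     * is off, instead of repeatedly hitting debug_locks_off(). */
    void lock_acquired(struct lockdep_map *lock, unsigned long ip)
    {
            if (unlikely(!lock_stat || !debug_locks))
                    return;

            /* ... record contention statistics ... */
    }

    /* debug_locks_off(): read debug_locks first so the atomic xchg (and
     * the resulting cacheline bouncing) is skipped when it is already 0. */
    int debug_locks_off(void)
    {
            if (debug_locks && __debug_locks_off()) {
                    if (!debug_locks_silent) {
                            console_verbose();
                            return 1;
                    }
            }
            return 0;
    }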

    Signed-off-by: Waiman Long
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     

10 Sep, 2018

1 commit

  • Commit:

    c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage")

    added the inclusion of <trace/events/preemptirq.h>.

    liblockdep doesn't have a stub version of that header, so it now fails to build.

    However, commit:

    bff1b208a5d1 ("tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"")

    removed the use of functions declared in that header. So delete the #include.

    Signed-off-by: Ben Hutchings
    Cc: Joel Fernandes
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sasha Levin
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: bff1b208a5d1 ("tracing: Partial revert of "tracing: Centralize ...")
    Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints ...")
    Link: http://lkml.kernel.org/r/20180828203315.GD18030@decadent.org.uk
    Signed-off-by: Ingo Molnar

    Ben Hutchings
     

11 Aug, 2018

1 commit

  • Joel Fernandes created a nice patch that cleaned up the duplicate hooks used
    by lockdep and irqsoff latency tracer. It made both use tracepoints. But it
    caused lockdep to trigger several false positives. We have not figured out
    why yet, but removing lockdep from the trace event hooks and just calling
    its helper functions directly (like it used to) makes the problem go away.

    This is a partial revert of the clean up patch c3bc8fd637a9 ("tracing:
    Centralize preemptirq tracepoints and unify their usage") that adds direct
    calls for lockdep, but also keeps most of the clean up done to get rid of
    the horrible preprocessor if statements.
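
    A rough sketch of what the partial revert means in practice (function
    names here reflect the shape described above and are assumptions, not a
    verbatim diff): the generic irq-tracing entry points call lockdep's
    helper directly again, while the tracers keep using the tracepoint.

    void trace_hardirqs_on(void)
    {
            /* the tracer side still fires via its tracepoint ... */
            if (!in_nmi())
                    trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);

            /* ... but lockdep is called directly again, not via a probe: */
            lockdep_hardirqs_on(CALLER_ADDR0);
    }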

    Link: http://lkml.kernel.org/r/20180806155058.5ee875f4@gandalf.local.home

    Cc: Peter Zijlstra
    Reviewed-by: Joel Fernandes (Google)
    Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

31 Jul, 2018

2 commits

  • This patch detaches the preemptirq tracepoints from the tracers and
    keeps them separate.

    Advantages:
    * Lockdep and irqsoff event can now run in parallel since they no longer
    have their own calls.

    * This unifies the use case of adding hooks to an irqsoff and irqson
    event, and a preemptoff and preempton event.
    3 users of the events exist:
    - Lockdep
    - irqsoff and preemptoff tracers
    - irqs and preempt trace events

    The unification cleans up several ifdefs and makes the code in preempt
    tracer and irqsoff tracers simpler. It gets rid of all the horrific
    ifdeferry around PROVE_LOCKING and makes configuration of the different
    users of the tracepoints easier and more understandable. It also gets rid
    of the time_* function calls from the lockdep hooks used to call into
    the preemptirq tracer which is not needed anymore. The negative delta in
    lines of code in this patch is quite large too.

    In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
    as a single point for registering probes onto the tracepoints. With
    this, the web of config options for preempt/irq toggle tracepoints
    and its users becomes:

    PREEMPT_TRACER  PREEMPTIRQ_EVENTS  IRQSOFF_TRACER  PROVE_LOCKING
          |                |     \          |               |
          \   (selects)    /      \         \   (selects)   /
           TRACE_PREEMPT_TOGGLE    ---->     TRACE_IRQFLAGS
                        \                        /
                         \     (depends on)     /
                            PREEMPTIRQ_TRACEPOINTS

    Other than the performance tests mentioned in the previous patch, I also
    ran the locking API test suite. I verified that all test cases are
    passing.

    I also injected issues by not registering the lockdep probes onto the
    tracepoints, and saw failures, which confirms that the probes are indeed
    working.

    This series + lockdep probes not registered (just to inject errors):
    [ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok |
    [ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok |
    [ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok |
    [ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok |
    [ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok |
    [ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok |
    [ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
    [ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |

    With this series + lockdep probes registered, all locking tests pass:

    [ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
    [ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
    [ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok |
    [ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok |
    [ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok |
    [ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok |
    [ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
    [ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
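
    For completeness, registering a probe onto one of these tracepoints
    follows the generic tracepoint API; a hedged sketch, assuming the
    irq_enable event defined by this series (the probe body is illustrative
    only):

    #include <linux/module.h>
    #include <linux/tracepoint.h>
    #include <trace/events/preemptirq.h>

    /* Probe matching the (ip, parent_ip) prototype of the irq_enable event. */
    static void my_irq_enable_probe(void *data, unsigned long ip,
                                    unsigned long parent_ip)
    {
            /* consume the irqs-on event here */
    }

    static int __init my_probes_init(void)
    {
            return register_trace_irq_enable(my_irq_enable_probe, NULL);
    }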

    Link: http://lkml.kernel.org/r/20180730222423.196630-4-joel@joelfernandes.org

    Acked-by: Peter Zijlstra (Intel)
    Reviewed-by: Namhyung Kim
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Joel Fernandes (Google)
     
  • get_cpu_var() disables preemption, which has the potential to call into the
    preemption-disable trace points, causing some complications. There's also
    no need to disable preemption in the users of get_lock_stats() anyway, since
    preemption is already disabled there. So let's simplify the code.
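
    A sketch of the resulting simplification (assumed shape, based on the
    description above; cpu_lock_stats and lock_classes are lockdep-internal
    names):

    static struct lock_class_stats *get_lock_stats(struct lock_class *class)
    {
            /* Callers already run with preemption disabled, so a plain
             * this_cpu_ptr() access is enough; no get_cpu_var()/put_cpu_var()
             * pair toggling the preempt count is needed. */
            return &this_cpu_ptr(cpu_lock_stats)[class - lock_classes];
    }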

    Link: http://lkml.kernel.org/r/20180730222423.196630-2-joel@joelfernandes.org

    Suggested-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Joel Fernandes (Google)
     

22 Jun, 2018

1 commit

  • While debugging a mismatch between lockdep's view of interrupt
    enabling/disabling and the actual enabling and disabling of interrupts,
    I had to silence the IRQ disabling/enabling in
    debug_check_no_locks_freed() because it always showed up, as it was
    called before the splat was printed.

    Use raw_local_irq_save/restore() not only for debug_check_no_locks_freed()
    but for all internal lockdep functions, as the non-raw variants update the
    irq-tracing state and thereby hide useful information about where
    interrupts were last used incorrectly.
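
    As a sketch, the pattern now looks like this (simplified):

    void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
    {
            unsigned long flags;

            /* raw_ variants keep lockdep's own interrupt toggling out of the
             * irq-state tracking that is being debugged. */
            raw_local_irq_save(flags);
            /* ... walk current->held_locks and warn on overlap ... */
            raw_local_irq_restore(flags);
    }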

    Signed-off-by: Steven Rostedt (VMware)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: https://lkml.kernel.org/lkml/20180404140630.3f4f4c7a@gandalf.local.home
    Signed-off-by: Ingo Molnar

    Steven Rostedt (VMware)
     

14 May, 2018

2 commits

  • Calling lockdep_print_held_locks() on a running thread is considered unsafe.

    Since all callers should follow that rule and the sanity check is not heavy,
    this patch moves the sanity check inside lockdep_print_held_locks().

    As a side effect of this patch, the number of locks held by running threads
    will be printed as well. This change is preferable when we want to know
    which threads might be relevant to a problem but cannot print any further
    clues because those threads are running.
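
    A sketch of the idea (simplified, not the exact upstream code):

    static void lockdep_print_held_locks(struct task_struct *p)
    {
            int i, depth = READ_ONCE(p->lockdep_depth);

            if (!depth)
                    printk("no locks held by %s/%d.\n", p->comm, task_pid_nr(p));
            else
                    printk("%d lock%s held by %s/%d:\n", depth,
                           depth > 1 ? "s" : "", p->comm, task_pid_nr(p));
            /*
             * It is not reliable to print a task's held locks if it is not
             * sleeping and it is not the current task; only the count above
             * is printed in that case.
             */
            if (p->state == TASK_RUNNING && p != current)
                    return;
            for (i = 0; i < depth; i++) {
                    printk(" #%d: ", i);
                    print_lock(p->held_locks + i);
            }
    }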

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dmitry Vyukov
    Cc: Linus Torvalds
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1523011279-8206-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    Signed-off-by: Ingo Molnar

    Tetsuo Handa
     
  • debug_show_all_locks() tries to grab the tasklist_lock for two seconds, but
    calling while_each_thread() without tasklist_lock held is not safe.

    See the following commit for more information:

    4449a51a7c281602 ("vm_is_stack: use for_each_thread() rather then buggy while_each_thread()")

    Change debug_show_all_locks() from "do_each_thread()/while_each_thread()
    with possibility of missing tasklist_lock" to "for_each_process_thread()
    with RCU", and add a call to touch_all_softlockup_watchdogs() like
    show_state_filter() does.
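
    The resulting loop, as a hedged sketch (close to, but not necessarily
    identical with, the upstream code):

    void debug_show_all_locks(void)
    {
            struct task_struct *g, *p;

            pr_warn("\nShowing all locks held in the system:\n");

            rcu_read_lock();
            for_each_process_thread(g, p) {
                    if (!p->lockdep_depth)
                            continue;
                    lockdep_print_held_locks(p);
                    /* keep soft-lockup watchdogs quiet during slow printing */
                    touch_all_softlockup_watchdogs();
            }
            rcu_read_unlock();
    }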

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1523011279-8206-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    Signed-off-by: Ingo Molnar

    Tetsuo Handa
     

29 Mar, 2018

1 commit

  • The lock debug output in print_lock() has a few shortcomings:

    - It prints the hlock->acquire_ip field in %px and %pS format. That's
    redundant information.

    - It lacks information about the lock object itself. The lock class is
    not helpful to identify a particular instance of a lock.

    Change the output so it prints:

    - hlock->instance to allow identification of a particular lock instance.

    - only the %pS format of hlock->acquire_ip, which is sufficient to decode
    the actual code line with faddr2line.

    The resulting output is:

    3 locks held by a.out/31106:
    #0: 00000000b0f753ba (&mm->mmap_sem){++++}, at: copy_process.part.41+0x10d5/0x1fe0
    #1: 00000000ef64d539 (&mm->mmap_sem/1){+.+.}, at: copy_process.part.41+0x10fe/0x1fe0
    #2: 00000000b41a282e (&mapping->i_mmap_rwsem){++++}, at: copy_process.part.41+0x12f2/0x1fe0

    [ tglx: Massaged changelog ]

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Thomas Gleixner
    Acked-by: Michal Hocko
    Acked-by: David Rientjes
    Acked-by: Peter Zijlstra
    Cc: linux-mm@kvack.org
    Cc: Borislav Petkov
    Link: https://lkml.kernel.org/r/201803271941.GBE57310.tVSOJLQOFFOHFM@I-love.SAKURA.ne.jp

    Tetsuo Handa
     

28 Feb, 2018

1 commit

  • Show unadorned pointers in lockdep reports - lockdep is a debugging
    facility and hashing pointers there doesn't make a whole lotta sense.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Link: https://lkml.kernel.org/r/20180226134926.23069-1-bp@alien8.de

    Borislav Petkov
     

31 Jan, 2018

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes relate to making lock_is_held() et al (and external
    wrappers of them) work on const data types - this requires const
    propagation through the depths of lockdep.

    This removes a number of ugly type hacks the external helpers used"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lockdep: Convert some users to const
    lockdep: Make lockdep checking constant
    lockdep: Assign lock keys on registration

    Linus Torvalds
     

24 Jan, 2018

1 commit

  • debug_show_all_locks() iterates over all tasks and prints their held locks
    while holding tasklist_lock. This can take a while on a slow console device
    and may end up triggering the NMI hardlockup detector if someone else ends
    up waiting for tasklist_lock.

    Touch the NMI watchdog while printing the held locks to avoid
    spuriously triggering the hardlockup detector.
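
    A minimal sketch of the idea (loop details simplified; the essence is
    the added watchdog touch):

    for_each_process_thread(g, p) {
            if (!p->lockdep_depth)
                    continue;
            lockdep_print_held_locks(p);
            /* Printing to a slow console can take long enough to trip the
             * NMI hardlockup detector; pet it between tasks. */
            touch_nmi_watchdog();
    }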

    Signed-off-by: Tejun Heo
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: kernel-team@fb.com
    Link: http://lkml.kernel.org/r/20180122220055.GB1771050@devbig577.frc2.facebook.com
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

18 Jan, 2018

2 commits

  • There are several places in the kernel which would like to pass a const
    pointer to lockdep_is_held(). Constify the entire path so nobody has to
    trick the compiler.
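
    The constified signatures, sketched from the changelog (close to the
    upstream declarations, but not guaranteed verbatim):

    extern int lock_is_held_type(const struct lockdep_map *lock, int read);

    static inline int lock_is_held(const struct lockdep_map *lock)
    {
            return lock_is_held_type(lock, -1);
    }

    #define lockdep_is_held(lock)   lock_is_held(&(lock)->dep_map)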

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: "David S. Miller"
    Link: https://lkml.kernel.org/r/20180117151414.23686-3-willy@infradead.org

    Matthew Wilcox
     
  • Lockdep assigns lock keys when a lock is looked up. This is
    unnecessary; if the lock has never been registered then it is known that it
    is not locked. It also complicates the calling convention. Switch to
    assigning the lock key in register_lock_class().

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: "David S. Miller"
    Link: https://lkml.kernel.org/r/20180117151414.23686-2-willy@infradead.org

    Matthew Wilcox
     

12 Dec, 2017

1 commit

  • This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y),
    while it found a number of old bugs initially, was also causing too many
    false positives that caused people to disable lockdep - which is arguably
    a worse overall outcome.

    If we disable cross-release by default but keep the code upstream then
    in practice the most likely outcome is that we'll allow the situation
    to degrade gradually, by allowing entropy to introduce more and more
    false positives, until it overwhelms maintenance capacity.

    Another bad side effect was that people were trying to work around
    the false positives by uglifying/complicating unrelated code. There's
    a marked difference between annotating locking operations and
    uglifying good code just due to bad lock debugging code ...

    This gradual decrease in quality happened to a number of debugging
    facilities in the kernel, and lockdep is pretty complex already,
    so we cannot risk this outcome.

    Either cross-release checking can be done right with no false positives,
    or it should not be included in the upstream kernel.

    ( Note that it might make sense to maintain it out of tree and go through
    the false positives every now and then and see whether new bugs were
    introduced. )

    Cc: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

07 Dec, 2017

1 commit

  • We can't invalidate xhlocks when we've not yet allocated any.

    Reported-by: Dmitry Vyukov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Fixes: f52be5708076 ("locking/lockdep: Untangle xhlock history save/restore from task independence")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

16 Nov, 2017

1 commit

  • Patch series "kmemcheck: kill kmemcheck", v2.

    As discussed at LSF/MM, kill kmemcheck.

    KASan is a replacement that is able to work without the limitation of
    kmemcheck (single CPU, slow). KASan is already upstream.

    We are also not aware of any users of kmemcheck (or users who don't
    consider KASan as a suitable replacement).

    The only objection was that since KASAN wasn't supported by all GCC
    versions provided by distros at that time we should hold off for 2
    years, and try again.

    Now that 2 years have passed, and all distros provide gcc that supports
    KASAN, kill kmemcheck again for the very same reasons.

    This patch (of 4):

    Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

    [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
    Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
    Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Levin, Alexander (Sasha Levin)
     

25 Oct, 2017

2 commits

  • Add a Kconfig knob that enables the lockdep "crossrelease_fullstack" boot parameter.

    Suggested-by: Ingo Molnar
    Signed-off-by: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: amir73il@gmail.com
    Cc: axboe@kernel.dk
    Cc: darrick.wong@oracle.com
    Cc: david@fromorbit.com
    Cc: hch@infradead.org
    Cc: idryomov@gmail.com
    Cc: johan@kernel.org
    Cc: johannes.berg@intel.com
    Cc: kernel-team@lge.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: linux-xfs@vger.kernel.org
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Link: http://lkml.kernel.org/r/1508921765-15396-7-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Johan Hovold reported a heavy performance regression caused by lockdep
    cross-release:

    > Boot time (from "Linux version" to login prompt) had in fact doubled
    > since 4.13 where it took 17 seconds (with my current config) compared to
    > the 35 seconds I now see with 4.14-rc4.
    >
    > A quick bisect pointed to lockdep and specifically the following commit:
    >
    > 28a903f63ec0 ("locking/lockdep: Handle non(or multi)-acquisition
    > of a crosslock")
    >
    > which I've verified is the commit which doubled the boot time (compared
    > to 28a903f63ec0^) (added by lockdep crossrelease series [1]).

    Currently cross-release performs unwind on every acquisition, but that
    is very expensive.

    This patch makes unwind optional and disables it by default and only
    records acquire_ip.

    Full stack traces are sometimes required for full analysis, in which
    case a boot parameter, crossrelease_fullstack, can be specified.
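
    In sketch form (helper and field names are assumed from the changelog,
    not the exact upstream code):

    static int crossrelease_fullstack;

    static int __init allow_crossrelease_fullstack(char *str)
    {
            crossrelease_fullstack = 1;
            return 0;
    }
    early_param("crossrelease_fullstack", allow_crossrelease_fullstack);

    static void add_xhlock(struct held_lock *hlock)
    {
            /* ... */
            if (crossrelease_fullstack) {
                    save_stack_trace(&xhlock->trace);       /* expensive unwind */
            } else {
                    /* record just the acquisition ip */
                    xhlock->trace.nr_entries = 1;
                    xhlock->trace.entries[0] = hlock->acquire_ip;
            }
    }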

    On my qemu Ubuntu machine (x86_64, 4 cores, 512M), the regression was
    fixed. We measure boot times with 'perf stat --null --repeat 10 $QEMU',
    where $QEMU launches a kernel with init=/bin/true:

    1. No lockdep enabled:

    Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):

    2.756558155 seconds time elapsed ( +- 0.09% )

    2. Lockdep enabled:

    Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):

    2.968710420 seconds time elapsed ( +- 0.12% )

    3. Lockdep enabled + cross-release enabled:

    Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):

    3.153839636 seconds time elapsed ( +- 0.31% )

    4. Lockdep enabled + cross-release enabled + this patch applied:

    Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):

    2.963669551 seconds time elapsed ( +- 0.11% )

    I.e. lockdep cross-release performance is now indistinguishable from
    vanilla lockdep.

    Bisected-by: Johan Hovold
    Analyzed-by: Thomas Gleixner
    Suggested-by: Thomas Gleixner
    Reported-by: Johan Hovold
    Signed-off-by: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: amir73il@gmail.com
    Cc: axboe@kernel.dk
    Cc: darrick.wong@oracle.com
    Cc: david@fromorbit.com
    Cc: hch@infradead.org
    Cc: idryomov@gmail.com
    Cc: johannes.berg@intel.com
    Cc: kernel-team@lge.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: linux-xfs@vger.kernel.org
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Link: http://lkml.kernel.org/r/1508921765-15396-5-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     

10 Oct, 2017

1 commit

  • There is some complication between check_prevs_add() and
    check_prev_add() wrt. saving stack traces. The problem is that we want
    to be frugal with saving stack traces, since it consumes static
    resources.

    We'll only know in check_prev_add() if we need the trace, but we can
    call into it multiple times. So we want to do on-demand and re-use.

    A further complication is that check_prev_add() can drop graph_lock
    and mess with our static resources.

    In any case, the current state; after commit:

    ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")

    is that we'll assume the trace contains valid data once
    check_prev_add() returns '2'. However, as noted by Josh, this is
    false: check_prev_add() can return '2' before having saved a trace,
    which then results in the possibility of using uninitialized data.
    Testing, as reported by Wu, shows a NULL dereference.

    So simplify.

    Since the graph_lock() thing is a debug path that hasn't really been
    used in a long while, take it out back and avoid the headache.

    Further initialize the stack_trace to a known 'empty' state; as long
    as nr_entries == 0, nothing should deref entries. We can then use the
    'entries == NULL' test for a valid trace / on-demand saving.

    Analyzed-by: Josh Poimboeuf
    Reported-by: Fengguang Wu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

29 Aug, 2017

1 commit

  • Where XHLOCK_{SOFT,HARD} are save/restore points in xhlocks[] to
    ensure that temporal IRQ events don't interact with task state,
    XHLOCK_PROC is a fundamentally different beast that just happens to
    share the interface.

    The purpose of XHLOCK_PROC is to annotate independent execution inside
    one task. For example, with workqueues each work item should appear to
    run in its own 'pristine' 'task'.

    Remove XHLOCK_PROC in favour of its own interface to avoid confusion.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: david@fromorbit.com
    Cc: johannes@sipsolutions.net
    Cc: kernel-team@lge.com
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 Aug, 2017

1 commit

  • The new completion/crossrelease annotations interact unfavourably with
    the extant flush_work()/flush_workqueue() annotations.

    The problem is that when a single work class does:

    wait_for_completion(&C)

    and

    complete(&C)

    in different executions, we'll build dependencies like:

    lock_map_acquire(W)
    complete_acquire(C)

    and

    lock_map_acquire(W)
    complete_release(C)

    which results in the dependency chain W->C->W, which lockdep thinks
    spells deadlock, even though there is no deadlock potential since
    work items are run concurrently.

    One possibility would be to change the work 'lock' to recursive-read,
    but that would mean hitting a lockdep limitation on recursive locks.
    Also, unconditionally switching to recursive-read here would fail to
    detect the actual deadlock on single-threaded workqueues, which do
    have a problem with this.

    For now, forcefully disregard these locks for crossrelease.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: byungchul.park@lge.com
    Cc: david@fromorbit.com
    Cc: johannes@sipsolutions.net
    Cc: oleg@redhat.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

14 Aug, 2017

2 commits

  • As Boqun Feng pointed out, current->hist_id should be aligned with the
    latest valid xhlock->hist_id so that hist_id_save[] storing current->hist_id
    can be comparable with xhlock->hist_id. Fix it.

    Additionally, the condition for overwrite-detection should be the
    opposite. Fix the code and the comments as well.


    Signed-off-by: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: linux-mm@kvack.org
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502694052-16085-3-git-send-email-byungchul.park@lge.com
    [ Improve the comments some more. ]
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • In lockdep_sys_exit(), crossrelease_hist_end() is called unconditionally,
    even when we get here without the history having been started, e.g. just
    after forking.

    But it's no problem since it would roll back to an invalid entry anyway.
    Add a comment to explain this.

    Signed-off-by: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: linux-mm@kvack.org
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502694052-16085-2-git-send-email-byungchul.park@lge.com
    [ Improved the description and the comments. ]
    Signed-off-by: Ingo Molnar

    Byungchul Park
     

10 Aug, 2017

10 commits

  • print_circular_bug(), which reports a circular bug, assumes that the target
    hlock is owned by the current task. However, in crossrelease, the target
    hlock can be owned by a task other than the current one. So the report
    format needs to be changed to reflect this.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-9-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • No acquisition might be in progress at the time a crosslock is committed.
    Completion operations, with crossrelease enabled, are one such case:

    CONTEXT X                    CONTEXT Y
    ---------                    ---------
                                 trigger completion context
    complete AX
       commit AX
                                 wait_for_complete AX
                                    acquire AX
                                    wait

    where AX is a crosslock.

    When no acquisition is in progress, we should not perform commit because
    the lock does not exist, which might cause incorrect memory access. So
    we have to track the number of acquisitions of a crosslock to handle it.

    Moreover, in case more than one acquisition of a crosslock overlap,
    like:

    CONTEXT W                 CONTEXT X                 CONTEXT Y      CONTEXT Z
    ---------                 ---------                 ---------      ---------
    acquire AX (gen_id: 1)
                                                        acquire A
                              acquire AX (gen_id: 10)
                                                        acquire B
                                                        commit AX
                                                                       acquire C
                                                                       commit AX

    where A, B and C are typical locks and AX is a crosslock.

    Current crossrelease code performs commits in Y and Z with gen_id = 10.
    However, we can use gen_id = 1 to do it, since not only 'acquire AX in X'
    but 'acquire AX in W' also depends on each acquisition in Y and Z until
    their commits. So make it use gen_id = 1 instead of 10 on their commits,
    which adds an additional dependency 'AX -> A' in the example above.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-8-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • The ring buffer can be overwritten by hardirq/softirq/work contexts.
    Those cases must be considered on rollback or commit. For example,

              |<-------- hist_lock ring buffer -------->|
              ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
    wrapped > iiiiiiiiiiiiiiiiiiiiiii....................

    where 'p' represents an acquisition in process context, and
    'i' represents an acquisition in irq context.

    On irq exit, crossrelease tries to roll back the index to its original
    position, but it should not, because the entry has already been
    invalidated by the overwriting 'i'. Avoid rollback or commit for
    overwritten entries.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-7-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Lockdep is a runtime locking correctness validator that detects and
    reports a deadlock or its possibility by checking dependencies between
    locks. It's useful since it does not report just an actual deadlock but
    also the possibility of a deadlock that has not actually happened yet.
    That enables problems to be fixed before they affect real systems.

    However, this facility is only applicable to typical locks, such as
    spinlocks and mutexes, which are normally released within the context in
    which they were acquired. Synchronization primitives like page locks or
    completions, which are allowed to be released in any context, also create
    dependencies and can cause a deadlock.

    So lockdep should track these locks to do a better job. The 'crossrelease'
    implementation makes these primitives tracked as well.
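
    An illustrative example of the kind of dependency this lets lockdep see
    (not taken from the patch itself): a completion completed under a lock
    that the waiter also holds can deadlock, but classic lockdep cannot
    connect the two because the "release" happens in another context.

    static DEFINE_MUTEX(m);
    static DECLARE_COMPLETION(c);

    /* context A */
    mutex_lock(&m);
    wait_for_completion(&c);   /* waits while holding m                   */
    mutex_unlock(&m);

    /* context B */
    mutex_lock(&m);            /* must take m before it can ...           */
    complete(&c);              /* ... complete c -> the two contexts hang */
    mutex_unlock(&m);

    With crossrelease, wait_for_completion()/complete() act as a crosslock,
    so the m -> c and c -> m dependencies get recorded and the inversion can
    be reported before a real hang occurs.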

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Currently, space for a stack_trace is pinned in check_prev_add(), which
    prevents us from using an external stack_trace. The simplest way to
    achieve that is to pass an external stack_trace as an argument.

    A more suitable solution is to additionally pass a callback along with
    the stack_trace, so that callers can decide how to save it, or whether to
    save it at all. Crossrelease actually needs to do more than just save a
    stack_trace. So pass a stack_trace and a callback to handle it to
    check_prev_add().

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-5-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Firstly, return 1 instead of 2 when 'prev -> next' dependency already
    exists. Since the value 2 is not referenced anywhere, just return 1
    indicating success in this case.

    Secondly, return 2 instead of 1 when a lock_list entry was successfully
    added and a stack_trace saved. With that, a caller can decide whether
    to avoid a redundant save_trace() at the call site.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-4-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Crossrelease needs to build a chain between two classes regardless of
    their contexts. However, add_chain_cache() cannot be used for that
    purpose since it assumes that it's called in the acquisition context
    of the hlock. So this patch introduces a new function doing it.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-3-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Currently, lookup_chain_cache() provides both 'lookup' and 'add'
    functionality in a single function. However, each is useful on its own.
    So this patch makes lookup_chain_cache() do only the 'lookup' part and
    add_chain_cache() do only the 'add' part. It is also more readable
    than before.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-2-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Two boots + a make defconfig, the first didn't have the redundant bit
    in, the second did:

    lock-classes: 1168 1169 [max: 8191]
    direct dependencies: 7688 5812 [max: 32768]
    indirect dependencies: 25492 25937
    all direct dependencies: 220113 217512
    dependency chains: 9005 9008 [max: 65536]
    dependency chain hlocks: 34450 34366 [max: 327680]
    in-hardirq chains: 55 51
    in-softirq chains: 371 378
    in-process chains: 8579 8579
    stack-trace entries: 108073 88474 [max: 524288]
    combined max dependencies: 178738560 169094640

    max locking depth: 15 15
    max bfs queue depth: 320 329

    cyclic checks: 9123 9190

    redundant checks: 5046
    redundant links: 1828

    find-mask forwards checks: 2564 2599
    find-mask backwards checks: 39521 39789

    So it saves nearly 2k links and a fair chunk of stack-trace entries, but
    as expected, makes no real difference on the indirect dependencies.

    At the same time, you see the max BFS depth increase, which is also
    expected, although it could easily be boot variance -- these numbers are
    not entirely stable between boots.

    The down side is that the cycles in the graph become larger and thus
    the reports harder to read.

    XXX: do we want this as a CONFIG variable, implied by LOCKDEP_SMALL?

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Byungchul Park
    Cc: Linus Torvalds
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: iamjoonsoo.kim@lge.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Link: http://lkml.kernel.org/r/20170303091338.GH6536@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • A while ago someone, and I cannot find the email just now, asked if we
    could not implement the RECLAIM_FS inversion stuff with a 'fake' lock
    like we use for other things like workqueues etc. I think this should
    be possible; it allows reducing the 'irq' states and will reduce the
    number of __bfs() lookups we do.

    Removing the 1 IRQ state results in 4 fewer __bfs() walks per
    dependency, improving lockdep performance. And by moving this
    annotation out of the lockdep code it becomes easier for the mm people
    to extend.
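
    A sketch of the 'fake lock' idea (helper names and the simplified
    __GFP_FS condition are assumptions, not the exact upstream code): a
    global lockdep_map stands in for "we are inside fs reclaim"; acquiring
    it around reclaim and in allocation paths lets the ordinary dependency
    checks find RECLAIM_FS inversions without a dedicated irq-like state.

    static struct lockdep_map __fs_reclaim_map =
            STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);

    void fs_reclaim_acquire(gfp_t gfp_mask)
    {
            if (gfp_mask & __GFP_FS)
                    lock_map_acquire(&__fs_reclaim_map);
    }

    void fs_reclaim_release(gfp_t gfp_mask)
    {
            if (gfp_mask & __GFP_FS)
                    lock_map_release(&__fs_reclaim_map);
    }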

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Byungchul Park
    Cc: Linus Torvalds
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: iamjoonsoo.kim@lge.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 Jun, 2017

2 commits

  • The PROVE_RCU_REPEATEDLY Kconfig option was initially added due to
    the volume of messages from PROVE_RCU: Doing just one per boot would
    have required excessive numbers of boots to locate them all. However,
    PROVE_RCU messages are now relatively rare, so there is no longer any
    reason to need more than one such message per boot. This commit therefore
    removes the PROVE_RCU_REPEATEDLY Kconfig option.

    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar

    Paul E. McKenney
     
  • Commit a5dd63efda3d ("lockdep: Use "WARNING" tag on lockdep splats")
    substituted pr_warn() for printk() in places called out by Dmitry Vyukov.
    However, this resulted in an ugly mix of pr_warn() and printk(). This
    commit therefore changes printk() to pr_warn() or pr_cont(), depending
    on the absence or presence of KERN_CONT. This is done in all functions
    that had printk() changed to pr_warn() by the aforementioned commit.

    Reported-by: Peter Zijlstra
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

11 May, 2017

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes are:

    - Debloat RCU headers

    - Parallelize SRCU callback handling (plus overlapping patches)

    - Improve the performance of Tree SRCU on a CPU-hotplug stress test

    - Documentation updates

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
    rcu: Open-code the rcu_cblist_n_lazy_cbs() function
    rcu: Open-code the rcu_cblist_n_cbs() function
    rcu: Open-code the rcu_cblist_empty() function
    rcu: Separately compile large rcu_segcblist functions
    srcu: Debloat the header
    srcu: Adjust default auto-expediting holdoff
    srcu: Specify auto-expedite holdoff time
    srcu: Expedite first synchronize_srcu() when idle
    srcu: Expedited grace periods with reduced memory contention
    srcu: Make rcutorture writer stalls print SRCU GP state
    srcu: Exact tracking of srcu_data structures containing callbacks
    srcu: Make SRCU be built by default
    srcu: Fix Kconfig botch when SRCU not selected
    rcu: Make non-preemptive schedule be Tasks RCU quiescent state
    srcu: Expedite srcu_schedule_cbs_snp() callback invocation
    srcu: Parallelize callback handling
    kvm: Move srcu_struct fields to end of struct kvm
    rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
    rcu: Use true/false in assignment to bool
    rcu: Use bool value directly
    ...

    Linus Torvalds
     

04 May, 2017

2 commits

  • Merge misc updates from Andrew Morton:

    - a few misc things

    - most of MM

    - KASAN updates

    * emailed patches from Andrew Morton : (102 commits)
    kasan: separate report parts by empty lines
    kasan: improve double-free report format
    kasan: print page description after stacks
    kasan: improve slab object description
    kasan: change report header
    kasan: simplify address description logic
    kasan: change allocation and freeing stack traces headers
    kasan: unify report headers
    kasan: introduce helper functions for determining bug type
    mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page
    mm: hwpoison: call shake_page() unconditionally
    mm/swapfile.c: fix swap space leak in error path of swap_free_entries()
    mm/gup.c: fix access_ok() argument type
    mm/truncate: avoid pointless cleancache_invalidate_inode() calls.
    mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty
    fs/block_dev: always invalidate cleancache in invalidate_bdev()
    fs: fix data invalidation in the cleancache during direct IO
    zram: reduce load operation in page_same_filled
    zram: use zram_free_page instead of open-coded
    zram: introduce zram data accessor
    ...

    Linus Torvalds
     
  • GFP_NOFS context is used for the following 5 reasons currently:

    - to prevent from deadlocks when the lock held by the allocation
    context would be needed during the memory reclaim

    - to prevent from stack overflows during the reclaim because the
    allocation is performed from a deep context already

    - to prevent lockups when the allocation context depends on other
    reclaimers to make a forward progress indirectly

    - just in case because this would be safe from the fs POV

    - silence lockdep false positives

    Unfortunately overuse of this allocation context brings some problems to
    the MM. Memory reclaim is much weaker (especially during heavy FS
    metadata workloads), OOM killer cannot be invoked because the MM layer
    doesn't have enough information about how much memory is freeable by the
    FS layer.

    In many cases it is far from clear why the weaker context is even used
    and so it might be used unnecessarily. We would like to get rid of
    those as much as possible. One way to do that is to use the flag in
    scopes rather than isolated cases. Such a scope is declared when really
    necessary, tracked per task and all the allocation requests from within
    the context will simply inherit the GFP_NOFS semantic.

    Not only this is easier to understand and maintain because there are
    much less problematic contexts than specific allocation requests, this
    also helps code paths where FS layer interacts with other layers (e.g.
    crypto, security modules, MM etc...) and there is no easy way to convey
    the allocation context between the layers.

    Introduce memalloc_nofs_{save,restore} API to control the scope of
    GFP_NOFS allocation context. This is basically copying
    memalloc_noio_{save,restore} API we have for other restricted allocation
    context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is
    just an alias for PF_FSTRANS which has been xfs specific until recently.
    There are no PF_FSTRANS users anymore, so let's just drop it.
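
    A hedged usage sketch of the new scope API (the caller code is
    illustrative; the save/restore pair is the point):

    struct page *page;
    unsigned int nofs_flags;

    /* Everything allocated inside this scope implicitly drops __GFP_FS,
     * exactly as if GFP_NOFS had been passed. */
    nofs_flags = memalloc_nofs_save();

    page = alloc_page(GFP_KERNEL);    /* effectively GFP_KERNEL & ~__GFP_FS */

    memalloc_nofs_restore(nofs_flags);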

    PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
    implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags
    is renamed to current_gfp_context because it now cares about both
    PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve
    their semantic. kmem_flags_convert() doesn't need to evaluate the flag
    anymore.

    This patch shouldn't introduce any functional changes.

    Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
    usage as much as possible and only use properly documented
    memalloc_nofs_{save,restore} checkpoints where they are appropriate.

    [akpm@linux-foundation.org: fix comment typo, reflow comment]
    Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Jan Kara
    Cc: Brian Foster
    Cc: Darrick J. Wong
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko