21 Dec, 2018

9 commits

  • commit 7aa54be2976550f17c11a1c3e3630002dea39303 upstream.

    On x86 we cannot do fetch_or() with a single instruction and thus end up
    using a cmpxchg loop, this reduces determinism. Replace the fetch_or()
    with a composite operation: tas-pending + load.

    Using two instructions of course opens a window we previously did not
    have. Consider the scenario:

    CPU0                CPU1                CPU2

    1) lock
         trylock -> (0,0,1)

    2)                  lock
                          trylock /* fail */

    3) unlock -> (0,0,0)

    4)                                      lock
                                              trylock -> (0,0,1)

    5)                    tas-pending -> (0,1,1)
                          load-val <- (0,0,1)

    FAIL: _2_ owners

    where 5) is our new composite operation. When we consider each part of
    the qspinlock state as a separate variable (as we can when
    _Q_PENDING_BITS == 8) then the above is entirely possible, because
    tas-pending will only RmW the pending byte, so the later load is able
    to observe prior tail and lock state (but not earlier than its own
    trylock, which operates on the whole word, due to coherence).

    To avoid this we need 2 things:

    - the load must come after the tas-pending (obviously, otherwise it
    can trivially observe prior state).

    - the tas-pending must be a full-word RmW instruction (it cannot be an
      XCHGB, for example), such that we cannot observe other state prior to
      setting pending.

    On x86 we can realize this by using "LOCK BTS m32, r32" for
    tas-pending followed by a regular load.

    Note that observing later state is not a problem:

    - if we fail to observe a later unlock, we'll simply spin-wait for
    that store to become visible.

    - if we observe a later xchg_tail(), there is no difference from that
    xchg_tail() having taken place before the tas-pending.
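
    As a rough illustration of the shape of that composite operation, here is
    a stand-alone C11 sketch (the field layout, constants and function name
    are made up for the example; on x86 the RmW half is the LOCK BTS above,
    which only reports the pending bit, not a generic fetch_or):

    #include <stdatomic.h>
    #include <stdint.h>

    #define PENDING_VAL (1u << 8)          /* pending byte's lowest bit */

    struct qsl { _Atomic uint32_t val; };  /* (tail, pending, locked) in one word */

    static uint32_t fetch_set_pending_then_load(struct qsl *lock)
    {
        /* full-word RmW: cannot observe tail/locked state older than itself */
        uint32_t old = atomic_fetch_or_explicit(&lock->val, PENDING_VAL,
                                                memory_order_acquire);
        /* ordered after the RmW; observing later tail/locked state is fine */
        uint32_t now = atomic_load_explicit(&lock->val, memory_order_relaxed);

        /* report the old pending bit plus the reloaded tail and locked state */
        return (old & PENDING_VAL) | (now & ~PENDING_VAL);
    }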

    Suggested-by: Will Deacon
    Reported-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Will Deacon
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: andrea.parri@amarulasolutions.com
    Cc: longman@redhat.com
    Fixes: 59fb586b4a07 ("locking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath")
    Link: https://lkml.kernel.org/r/20181003130957.183726335@infradead.org
    Signed-off-by: Ingo Molnar
    [bigeasy: GEN_BINARY_RMWcc macro redo]
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Peter Zijlstra
     
  • commit 53bf57fab7321fb42b703056a4c80fc9d986d170 upstream.

    Flip the branch condition after atomic_fetch_or_acquire(_Q_PENDING_VAL)
    such that we lose the indentation. This also results in a more natural
    code flow, IMO.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: andrea.parri@amarulasolutions.com
    Cc: longman@redhat.com
    Link: https://lkml.kernel.org/r/20181003130257.156322446@infradead.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Peter Zijlstra
     
  • commit c61da58d8a9ba9238250a548f00826eaf44af0f7 upstream.

    When a queued locker reaches the head of the queue, it claims the lock
    by setting _Q_LOCKED_VAL in the lockword. If there isn't contention, it
    must also clear the tail as part of this operation so that subsequent
    lockers can avoid taking the slowpath altogether.

    Currently this is expressed as a cmpxchg() loop that practically only
    runs up to two iterations. This is confusing to the reader and unhelpful
    to the compiler. Rewrite the cmpxchg() loop without the loop, so that a
    failed cmpxchg() implies that there is contention and we just need to
    write to _Q_LOCKED_VAL without considering the rest of the lockword.
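
    A stand-alone C11 sketch of the resulting single-attempt transition (the
    type, constants and helper name here are illustrative, not the kernel's):

    #include <stdatomic.h>
    #include <stdint.h>

    #define LOCKED_VAL 1u

    struct qsl { _Atomic uint32_t val; };

    static void set_locked_at_queue_head(struct qsl *lock, uint32_t seen_val)
    {
        /* one try at the uncontended (n,0,0) -> (0,0,1) transition */
        if (atomic_compare_exchange_strong_explicit(&lock->val, &seen_val,
                                                    LOCKED_VAL,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed))
            return;                        /* no contention: tail cleared too */

        /*
         * The cmpxchg failed, so another waiter queued behind us and owns
         * the tail; just set the locked part and leave the rest of the word
         * alone (a portable stand-in for the kernel's byte-sized store).
         */
        atomic_fetch_or_explicit(&lock->val, LOCKED_VAL, memory_order_relaxed);
    }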

    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1524738868-31318-7-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • commit 3bea9adc96842b8a7345c7fb202c16ae9c8d5b25 upstream.

    The native clear_pending() function is identical to the PV version, so the
    latter can simply be removed.

    This fixes the build for systems with >= 16K CPUs using the PV lock implementation.

    Reported-by: Waiman Long
    Signed-off-by: Will Deacon
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/20180427101619.GB21705@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • commit 59fb586b4a07b4e1a0ee577140ab4842ba451acd upstream.

    The qspinlock locking slowpath utilises a "pending" bit as a simple form
    of an embedded test-and-set lock that can avoid the overhead of explicit
    queuing in cases where the lock is held but uncontended. This bit is
    managed using a cmpxchg() loop which tries to transition the uncontended
    lock word from (0,0,0) -> (0,0,1) or (0,0,1) -> (0,1,1).

    Unfortunately, the cmpxchg() loop is unbounded and lockers can be starved
    indefinitely if the lock word is seen to oscillate between unlocked
    (0,0,0) and locked (0,0,1). This could happen if concurrent lockers are
    able to take the lock in the cmpxchg() loop without queuing and pass it
    around amongst themselves.

    This patch fixes the problem by unconditionally setting _Q_PENDING_VAL
    using atomic_fetch_or, and then inspecting the old value to see whether
    we need to spin on the current lock owner, or whether we now effectively
    hold the lock. The tricky scenario is when concurrent lockers end up
    queuing on the lock and the lock becomes available, causing us to see
    a lockword of (n,0,0). With pending now set, simply queuing could lead
    to deadlock as the head of the queue may not have observed the pending
    flag being cleared. Conversely, if the head of the queue did observe
    pending being cleared, then it could transition the lock from (n,0,0) ->
    (0,0,1) meaning that any attempt to "undo" our setting of the pending
    bit could race with a concurrent locker trying to set it.

    We handle this race by preserving the pending bit when taking the lock
    after reaching the head of the queue and leaving the tail entry intact
    if we saw pending set, because we know that the tail is going to be
    updated shortly.
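
    A minimal userspace C11 sketch of the "set pending unconditionally, then
    inspect the old value" idea (happy path only; queuing and the
    pending/tail preservation described above are omitted, and all names and
    constants are illustrative):

    #include <stdatomic.h>
    #include <stdint.h>

    #define LOCKED_MASK  0x000000ffu
    #define PENDING_VAL  0x00000100u
    #define LOCKED_VAL   0x00000001u

    struct qsl { _Atomic uint32_t val; };

    static int pending_fastpath(struct qsl *lock)
    {
        /* one unconditional RmW instead of an unbounded cmpxchg() loop */
        uint32_t old = atomic_fetch_or_explicit(&lock->val, PENDING_VAL,
                                                memory_order_acquire);

        if (old & ~LOCKED_MASK)
            return 0;                   /* tail or pending already set: go queue */

        /* we own the pending bit: wait for the current owner, then lock */
        while (atomic_load_explicit(&lock->val, memory_order_acquire) & LOCKED_MASK)
            ;                           /* spin on (0,1,1) until it becomes (0,1,0) */

        /* (0,1,0) -> (0,0,1): clear pending, set locked in one RmW */
        atomic_fetch_sub_explicit(&lock->val, PENDING_VAL - LOCKED_VAL,
                                  memory_order_acquire);
        return 1;
    }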

    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1524738868-31318-6-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • commit 625e88be1f41b53cec55827c984e4a89ea8ee9f9 upstream.

    'struct __qspinlock' provides a handy union of fields so that
    subcomponents of the lockword can be accessed by name, without having to
    manage shifts and masks explicitly and take endianness into account.

    This is useful in qspinlock.h and also potentially in arch headers, so
    move the 'struct __qspinlock' into 'struct qspinlock' and kill the extra
    definition.
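
    For reference, the little-endian flavour of the merged lockword looks
    roughly like the stand-alone sketch below (kernel types replaced with
    stdint equivalents; the big-endian variant orders the bytes differently):

    #include <stdint.h>

    /* one 32-bit value addressable as a whole, as (locked, pending) bytes,
     * or as (locked_pending, tail) halves */
    struct qspinlock_sketch {
        union {
            uint32_t val;              /* atomic_t in the kernel */
            struct {
                uint8_t  locked;
                uint8_t  pending;
            };
            struct {
                uint16_t locked_pending;
                uint16_t tail;
            };
        };
    };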

    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Acked-by: Boqun Feng
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1524738868-31318-3-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • commit 6512276d97b160d90b53285bd06f7f201459a7e3 upstream.

    If a locker taking the qspinlock slowpath reads a lock value indicating
    that only the pending bit is set, then it will spin whilst the
    concurrent pending->locked transition takes effect.

    Unfortunately, there is no guarantee that such a transition will ever be
    observed since concurrent lockers could continuously set pending and
    hand over the lock amongst themselves, leading to starvation. Whilst
    this would probably resolve in practice, it means that it is not
    possible to prove liveness properties about the lock and means that lock
    acquisition time is unbounded.

    Rather than removing the pending->locked spinning from the slowpath
    altogether (which has been shown to heavily penalise a 2-threaded
    locking stress test on x86), this patch replaces the explicit spinning
    with a call to atomic_cond_read_relaxed and allows the architecture to
    provide a bound on the number of spins. For architectures that can
    respond to changes in cacheline state in their smp_cond_load implementation,
    it should be sufficient to use the default bound of 1.
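
    A stand-alone sketch of what such a bounded pending-wait can look like
    (names and constants are illustrative; the kernel expresses the bound via
    an architecture-overridable constant and atomic_cond_read_relaxed()):

    #include <stdatomic.h>
    #include <stdint.h>

    #define LOCKED_MASK   0x000000ffu
    #define PENDING_MASK  0x0000ff00u
    #define PENDING_LOOPS 1            /* default bound mentioned above */

    struct qsl { _Atomic uint32_t val; };

    static uint32_t wait_for_pending_to_settle(struct qsl *lock)
    {
        uint32_t val = atomic_load_explicit(&lock->val, memory_order_relaxed);
        int spins = PENDING_LOOPS;

        /* re-read only while pending-but-not-locked and within the bound */
        while ((val & PENDING_MASK) && !(val & LOCKED_MASK) && spins--)
            val = atomic_load_explicit(&lock->val, memory_order_relaxed);

        return val;   /* caller decides: fast path or fall back to queuing */
    }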

    Suggested-by: Waiman Long
    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1524738868-31318-4-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • commit 95bcade33a8af38755c9b0636e36a36ad3789fe6 upstream.

    When a locker ends up queuing on the qspinlock locking slowpath, we
    initialise the relevant mcs node and publish it indirectly by updating
    the tail portion of the lock word using xchg_tail. If we find that there
    was a pre-existing locker in the queue, we subsequently update their
    ->next field to point at our node so that we are notified when it's our
    turn to take the lock.

    This can be roughly illustrated as follows:

    /* Initialise the fields in node and encode a pointer to node in tail */
    tail = initialise_node(node);

    /*
     * Exchange tail into the lockword using an atomic read-modify-write
     * operation with release semantics
     */
    old = xchg_tail(lock, tail);

    /* If there was a pre-existing waiter ... */
    if (old & _Q_TAIL_MASK) {
        prev = decode_tail(old);
        smp_read_barrier_depends();

        /* ... then update their ->next field to point to node */
        WRITE_ONCE(prev->next, node);
    }

    The conditional update of prev->next therefore relies on the address
    dependency from the result of xchg_tail ensuring order against the
    prior initialisation of node. However, since the release semantics of
    the xchg_tail operation apply only to the write portion of the RmW,
    then this ordering is not guaranteed and it is possible for the CPU
    to return old before the writes to node have been published, consequently
    allowing us to point prev->next to an uninitialised node.

    This patch fixes the problem by making the update of prev->next a RELEASE
    operation, which also removes the reliance on dependency ordering.
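
    In C11 terms, the fix amounts to publishing the node with a release
    store, along the lines of this sketch (the type and helper here are
    illustrative, not the kernel's MCS node):

    #include <stdatomic.h>

    struct mcs_node_sketch {
        _Atomic(struct mcs_node_sketch *) next;
        _Atomic int locked;
    };

    static void publish_node(struct mcs_node_sketch *prev,
                             struct mcs_node_sketch *node)
    {
        /*
         * Release ordering guarantees node's earlier initialisation is
         * visible before prev can observe the pointer, without relying on
         * the address dependency from xchg_tail()'s return value.
         */
        atomic_store_explicit(&prev->next, node, memory_order_release);
    }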

    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1518528177-19169-2-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • commit 548095dea63ffc016d39c35b32c628d033638aca upstream.

    Queued spinlocks are not used by DEC Alpha, and furthermore operations
    such as READ_ONCE() and release/relaxed RMW atomics are being changed
    to imply smp_read_barrier_depends(). This commit therefore removes the
    now-redundant smp_read_barrier_depends() from queued_spin_lock_slowpath(),
    and adjusts the comments accordingly.

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Paul E. McKenney
     

14 Nov, 2018

1 commit

  • [ Upstream commit 9506a7425b094d2f1d9c877ed5a78f416669269b ]

    It was found that when debug_locks was turned off because of a problem
    found by the lockdep code, the system performance could drop quite
    significantly when the lock_stat code was also configured into the
    kernel. For instance, parallel kernel build time on a 4-socket x86-64
    server nearly doubled.

    Further analysis into the cause of the slowdown traced back to the
    frequent call to debug_locks_off() from the __lock_acquired() function
    probably due to some inconsistent lockdep states with debug_locks
    off. The debug_locks_off() function did an unconditional atomic xchg
    to write a 0 value into debug_locks which had already been set to 0.
    This led to severe cacheline contention in the cacheline that held
    debug_locks. As debug_locks is referenced in quite a few different
    places in the kernel, this greatly slowed down system performance.

    To prevent that thrashing of the debug_locks cacheline, lock_acquired()
    and lock_contended() now check the state of debug_locks before
    proceeding. The debug_locks_off() function is also modified to check
    debug_locks before calling __debug_locks_off().
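
    The shape of the fix, as a stand-alone C11 sketch with simplified names
    (the kernel code differs in detail):

    #include <stdatomic.h>

    static _Atomic int debug_locks = 1;

    static int debug_locks_off_sketch(void)
    {
        /*
         * Plain read first: when debug_locks is already 0 this avoids the
         * unconditional xchg that keeps bouncing the cacheline around.
         */
        if (!atomic_load_explicit(&debug_locks, memory_order_relaxed))
            return 0;

        /* only the first caller to flip 1 -> 0 reports success */
        return atomic_exchange_explicit(&debug_locks, 0, memory_order_seq_cst) != 0;
    }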

    Signed-off-by: Waiman Long
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     

04 Nov, 2018

1 commit

  • [ Upstream commit e4a02ed2aaf447fa849e3254bfdb3b9b01e1e520 ]

    If CONFIG_WW_MUTEX_SELFTEST=y is enabled, booting an image
    in an arm64 virtual machine results in the following
    traceback if 8 CPUs are enabled:

    DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
    WARNING: CPU: 2 PID: 537 at kernel/locking/mutex.c:1033 __mutex_unlock_slowpath+0x1a8/0x2e0
    ...
    Call trace:
    __mutex_unlock_slowpath()
    ww_mutex_unlock()
    test_cycle_work()
    process_one_work()
    worker_thread()
    kthread()
    ret_from_fork()

    If requesting b_mutex fails with -EDEADLK, the error variable
    is reassigned to the return value from calling ww_mutex_lock
    on a_mutex again. If this call fails, a_mutex is not locked.
    It is, however, unconditionally unlocked subsequently, causing
    the reported warning. Fix the problem by using two error variables.

    With this change, the selftest still fails as follows:

    cyclic deadlock not resolved, ret[7/8] = -35

    However, the traceback is gone.
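
    The gist of the two-error-variable fix, as a small self-contained sketch
    with stand-in lock helpers (not the actual ww_mutex API or selftest code):

    #include <stdbool.h>

    #define DEADLK 35                       /* stand-in for EDEADLK (-35 above) */

    struct fake_lock { bool held; };

    static int fake_lock_acquire(struct fake_lock *l)
    {
        if (l->held)
            return -DEADLK;                 /* pretend deadlock detection */
        l->held = true;
        return 0;
    }

    static void fake_lock_release(struct fake_lock *l)
    {
        l->held = false;
    }

    /* a is assumed to be held on entry, mirroring the cycle step in the text */
    static void cycle_step(struct fake_lock *a, struct fake_lock *b)
    {
        int err, erra = 0;

        err = fake_lock_acquire(b);
        if (err == -DEADLK) {
            fake_lock_release(a);
            erra = fake_lock_acquire(a);    /* second error variable */
        }

        if (!err)
            fake_lock_release(b);
        if (!erra)                          /* only unlock a if it is actually held */
            fake_lock_release(a);
    }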

    Signed-off-by: Guenter Roeck
    Cc: Chris Wilson
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Fixes: d1b42b800e5d0 ("locking/ww_mutex: Add kselftests for resolving ww_mutex cyclic deadlocks")
    Link: http://lkml.kernel.org/r/1538516929-9734-1-git-send-email-linux@roeck-us.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin

    Guenter Roeck
     

05 Sep, 2018

1 commit

  • [ Upstream commit 62cedf3e60af03e47849fe2bd6a03ec179422a8a ]

    Needed for annotating rt_mutex locks.

    Tested-by: John Sperbeck
    Signed-off-by: Peter Rosin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Davidlohr Bueso
    Cc: Deepa Dinamani
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Peter Chang
    Cc: Peter Zijlstra
    Cc: Philippe Ombredanne
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Wolfram Sang
    Link: http://lkml.kernel.org/r/20180720083914.1950-2-peda@axentia.se
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Peter Rosin
     

24 Aug, 2018

1 commit

  • [ Upstream commit fcc784be837714a9173b372ff9fb9b514590dad9 ]

    While debugging where things were going wrong with mapping
    enabling/disabling interrupts with the lockdep state and actual real
    enabling and disabling interrupts, I had to silent the IRQ
    disabling/enabling in debug_check_no_locks_freed() because it was
    always showing up as it was called before the splat was.

    Use raw_local_irq_save/restore() for not only debug_check_no_locks_freed()
    but for all internal lockdep functions, as they hide useful information
    about where interrupts were used incorrectly last.

    Signed-off-by: Steven Rostedt (VMware)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: https://lkml.kernel.org/lkml/20180404140630.3f4f4c7a@gandalf.local.home
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     

21 Jun, 2018

2 commits

  • [ Upstream commit 5a817641f68a6399a5fac8b7d2da67a73698ffed ]

    The filesystem freezing code needs to transfer ownership of a rwsem
    embedded in a percpu-rwsem from the task that does the freezing to
    another one that does the thawing by calling percpu_rwsem_release()
    after freezing and percpu_rwsem_acquire() before thawing.

    However, the new rwsem debug code runs afoul with this scheme by warning
    that the task that releases the rwsem isn't the one that acquires it,
    as reported by Amir Goldstein:

    DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
    WARNING: CPU: 1 PID: 1401 at /home/amir/build/src/linux/kernel/locking/rwsem.c:133 up_write+0x59/0x79

    Call Trace:
    percpu_up_write+0x1f/0x28
    thaw_super_locked+0xdf/0x120
    do_vfs_ioctl+0x270/0x5f1
    ksys_ioctl+0x52/0x71
    __x64_sys_ioctl+0x16/0x19
    do_syscall_64+0x5d/0x167
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    To work properly with the rwsem debug code, we need to annotate that the
    rwsem ownership is unknown during the transfer period until a brave soul
    comes forward to acquire the ownership. During that period, optimistic
    spinning will be disabled.

    Reported-by: Amir Goldstein
    Tested-by: Amir Goldstein
    Signed-off-by: Waiman Long
    Acked-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Jan Kara
    Cc: Linus Torvalds
    Cc: Matthew Wilcox
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Theodore Y. Ts'o
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linux-fsdevel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1526420991-21213-3-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     
  • [ Upstream commit d7d760efad70c7a030725499bf9f342f04af24dd ]

    There are use cases where a rwsem can be acquired by one task, but
    released by another task. In these cases, optimistic spinning may need
    to be disabled. One example will be the filesystem freeze/thaw code
    where the task that freezes the filesystem will acquire a write lock
    on a rwsem and then un-owns it before returning to userspace. Later on,
    another task will come along, acquire the ownership, thaw the filesystem
    and release the rwsem.

    Bit 0 of the owner field was used to designate that it is a reader
    owned rwsem. It is now repurposed to mean that the owner of the rwsem
    is not known. If only bit 0 is set, the rwsem is reader owned. If bit
    0 and other bits are set, it is writer owned with an unknown owner.
    One such value for the latter case is (-1L). So the owner field can be
    set to 1 for reader-owned and to -1 for writer-owned; in both cases the
    owning task is unknown.

    To handle transfer of rwsem ownership, the higher level code should
    set the owner field to -1 to indicate a write-locked rwsem with unknown
    owner. Optimistic spinning will be disabled in this case.

    Once the higher level code figures who the new owner is, it can then
    set the owner field accordingly.
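
    An illustrative encoding of the owner field as described above (purely a
    sketch; the helper names are not the kernel's):

    #include <stdbool.h>
    #include <stdint.h>

    #define OWNER_READER_BIT 1UL           /* bit 0: reader-owned / owner unknown */

    static bool owner_is_reader(uintptr_t owner)
    {
        return owner == OWNER_READER_BIT;            /* only bit 0 set */
    }

    static bool owner_is_unknown_writer(uintptr_t owner)
    {
        /* bit 0 plus other bits set, e.g. (-1L) = all ones */
        return (owner & OWNER_READER_BIT) && owner != OWNER_READER_BIT;
    }

    static bool can_spin_on_owner(uintptr_t owner)
    {
        /* optimistic spinning only makes sense when a concrete task owns it */
        return owner != 0 && !(owner & OWNER_READER_BIT);
    }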

    Tested-by: Amir Goldstein
    Signed-off-by: Waiman Long
    Acked-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Davidlohr Bueso
    Cc: Jan Kara
    Cc: Linus Torvalds
    Cc: Matthew Wilcox
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Theodore Y. Ts'o
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linux-fsdevel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1526420991-21213-2-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Waiman Long
     

26 Apr, 2018

1 commit

  • [ Upstream commit 11dc13224c975efcec96647a4768a6f1bb7a19a8 ]

    When queuing on the qspinlock, the count field for the current CPU's head
    node is incremented. This needn't be atomic because locking in e.g. IRQ
    context is balanced and so an IRQ will return with node->count as it
    found it.

    However, the compiler could in theory reorder the initialisation of
    node[idx] before the increment of the head node->count, causing an
    IRQ to overwrite the initialised node and potentially corrupt the lock
    state.

    Avoid the potential for this harmful compiler reordering by placing a
    barrier() between the increment of the head node->count and the subsequent
    node initialisation.
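
    A user-space approximation of the pattern (GCC/Clang-style compiler
    barrier; the node layout and per-CPU handling are simplified stand-ins):

    #define MAX_NODES 4
    #define compiler_barrier() __asm__ __volatile__("" ::: "memory")

    struct mcs_node_sketch {
        struct mcs_node_sketch *next;
        int locked;
        int count;                      /* nesting count lives in node[0] */
    };

    static _Thread_local struct mcs_node_sketch nodes[MAX_NODES]; /* per-CPU in the kernel */

    static struct mcs_node_sketch *grab_mcs_node(void)
    {
        int idx = nodes[0].count++;     /* plain increment: nesting is balanced */

        /*
         * Stop the compiler from hoisting the initialisation above the
         * increment; an interrupt taken in between must see the raised count
         * and pick the next index instead of clobbering this node.
         */
        compiler_barrier();

        nodes[idx].next = (void *)0;
        nodes[idx].locked = 0;
        return &nodes[idx];
    }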

    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1518528177-19169-3-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     

19 Mar, 2018

1 commit

  • [ Upstream commit 2ce77d16db4240dd2e422fc0a5c26d3e2ec03446 ]

    Things can explode for locktorture if the user does combinations
    of nwriters_stress=0 nreaders_stress=0. Fix this by not assuming
    we always want to torture writer threads.

    Reported-by: Jeremy Linton
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Jeremy Linton
    Tested-by: Jeremy Linton
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     

25 Feb, 2018

1 commit

  • [ Upstream commit 5e351ad106997e06b2dc3da9c6b939b95f67fb88 ]

    We can't invalidate xhlocks when we've not yet allocated any.

    Reported-by: Dmitry Vyukov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Fixes: f52be5708076 ("locking/lockdep: Untangle xhlock history save/restore from task independence")
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     

22 Feb, 2018

1 commit

  • commit 4950276672fce5c241857540f8561c440663673d upstream.

    Patch series "kmemcheck: kill kmemcheck", v2.

    As discussed at LSF/MM, kill kmemcheck.

    KASan is a replacement that is able to work without the limitation of
    kmemcheck (single CPU, slow). KASan is already upstream.

    We are also not aware of any users of kmemcheck (or users who don't
    consider KASan as a suitable replacement).

    The only objection was that since KASAN wasn't supported by all GCC
    versions provided by distros at that time we should hold off for 2
    years, and try again.

    Now that 2 years have passed, and all distros provide gcc that supports
    KASAN, kill kmemcheck again for the very same reasons.

    This patch (of 4):

    Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

    [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
    Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
    Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Levin, Alexander (Sasha Levin)
     

24 Jan, 2018

1 commit

  • commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 upstream.

    Julia reported futex state corruption in the following scenario:

    waiter                                  waker                                  stealer (prio > waiter)

    futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
          timeout=[N ms])
       futex_wait_requeue_pi()
          futex_wait_queue_me()
             freezable_schedule()

                                            futex(LOCK_PI, uaddr2)
                                            futex(CMP_REQUEUE_PI, uaddr,
                                                  uaddr2, 1, 0)
                                               /* requeues waiter to uaddr2 */
                                            futex(UNLOCK_PI, uaddr2)
                                               wake_futex_pi()
                                                  cmp_futex_value_locked(uaddr2, waiter)
                                                  wake_up_q()
       <hrtimer_wakeup() fires,
        clears sleeper->task>
                                                                                   futex(LOCK_PI, uaddr2)
                                                                                      __rt_mutex_start_proxy_lock()
                                                                                         try_to_take_rt_mutex() /* steals lock */
                                                                                            rt_mutex_set_owner(lock, stealer)

       rt_mutex_wait_proxy_lock()
          __rt_mutex_slowlock()
             try_to_take_rt_mutex() /* fails, lock held by stealer */
             if (timeout && !timeout->task)
                return -ETIMEDOUT;
          fixup_owner()
             /* lock wasn't acquired, so,
                fixup_pi_state_owner skipped */

    return -ETIMEDOUT;

    /* At this point, we've returned -ETIMEDOUT to userspace, but the
    * futex word shows waiter to be the owner, and the pi_mutex has
    * stealer as the owner */

    futex_lock(LOCK_PI, uaddr2)
    -> bails with EDEADLK, futex word says we're owner.

    And suggested that what commit:

    73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")

    removes from fixup_owner() looks to be just what is needed. And indeed
    it is -- I completely missed that requeue_pi could also result in this
    case. So we need to restore that, except that subsequent patches, like
    commit:

    16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")

    changed all the locking rules. Even without that, the sequence:

    - if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
    - locked = 1;
    - goto out;
    - }

    - raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
    - owner = rt_mutex_owner(&q->pi_state->pi_mutex);
    - if (!owner)
    - owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
    - raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
    - ret = fixup_pi_state_owner(uaddr, q, owner);

    already suggests there were races; otherwise we'd never have to look
    at next_owner.

    So instead of doing 3 consecutive wait_lock sections with who knows
    what races, we do it all in a single section. Additionally, the usage
    of pi_state->owner in fixup_owner() was only safe because only the
    rt_mutex owner would modify it, which this additional case wrecks.

    Luckily the values can only change away and not to the value we're
    testing, this means we can do a speculative test and double check once
    we have the wait_lock.

    Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
    Reported-by: Julia Cartwright
    Reported-by: Gratian Crisan
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Tested-by: Julia Cartwright
    Tested-by: Gratian Crisan
    Cc: Darren Hart
    Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65rrc@hirez.programming.kicks-ass.net
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

10 Oct, 2017

1 commit

  • There is some complication between check_prevs_add() and
    check_prev_add() wrt. saving stack traces. The problem is that we want
    to be frugal with saving stack traces, since it consumes static
    resources.

    We'll only know in check_prev_add() if we need the trace, but we can
    call into it multiple times. So we want to do on-demand and re-use.

    A further complication is that check_prev_add() can drop graph_lock
    and mess with our static resources.

    In any case, the current state; after commit:

    ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")

    is that we'll assume the trace contains valid data once
    check_prev_add() returns '2'. However, as noted by Josh, this is
    false: check_prev_add() can return '2' before having saved a trace,
    which then results in the possibility of using uninitialized data.
    Testing, as reported by Wu, shows a NULL deref.

    So simplify.

    Since the graph_lock() thing is a debug path that hasn't
    really been used in a long while, take it out back and avoid the
    head-ache.

    Further initialize the stack_trace to a known 'empty' state; as long
    as nr_entries == 0, nothing should deref entries. We can then use the
    'entries == NULL' test for a valid trace / on-demand saving.

    Analyzed-by: Josh Poimboeuf
    Reported-by: Fengguang Wu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

29 Sep, 2017

1 commit

  • If a spinner is present, there is a chance that the load of
    rwsem_has_spinner() in rwsem_wake() can be reordered with
    respect to decrement of rwsem count in __up_write() leading
    to wakeup being missed:

    spinning writer                        up_write caller
    ---------------                        -----------------------
    [S] osq_unlock()                       [L] osq
     spin_lock(wait_lock)
     sem->count=0xFFFFFFFF00000001
               +0xFFFFFFFF00000000
     count=sem->count
     MB
                                            sem->count=0xFFFFFFFE00000001
                                                      -0xFFFFFFFF00000001
                                            spin_trylock(wait_lock)
                                            return
     rwsem_try_write_lock(count)
     spin_unlock(wait_lock)
     schedule()

    Reordering of atomic_long_sub_return_release() in __up_write()
    and rwsem_has_spinner() in rwsem_wake() can cause the wakeup to be
    missed in the up_write() context. In the spinning writer, sem->count and
    the local variable count are 0xFFFFFFFE00000001, which results in
    rwsem_try_write_lock() failing to acquire the rwsem and the spinning
    writer going to sleep in rwsem_down_write_failed().

    The smp_rmb() will make sure that the spinner state is
    consulted after sem->count is updated in up_write context.

    Signed-off-by: Prateek Sood
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: longman@redhat.com
    Cc: parri.andrea@gmail.com
    Cc: sramana@codeaurora.org
    Link: http://lkml.kernel.org/r/1504794658-15397-1-git-send-email-prsood@codeaurora.org
    Signed-off-by: Ingo Molnar

    Prateek Sood
     

14 Sep, 2017

1 commit

  • GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
    and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. Its
    primary motivation was to allow users to tell that an allocation is
    short lived and so the allocator can try to place such allocations close
    together and prevent long term fragmentation. As much as this sounds
    like a reasonable semantic it becomes much less clear when to use the
    highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
    context holding that memory sleep? Can it take locks? It seems there is
    no good answer for those questions.

    The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
    __GFP_RECLAIMABLE which in itself is tricky because basically none of
    the existing callers provide a way to reclaim the allocated memory. So
    this is rather misleading and hard to evaluate for any benefits.

    I have checked some random users and none of them has added the flag
    with a specific justification. I suspect most of them just copied from
    other existing users and others just thought it might be a good idea to
    use without any measuring. This suggests that GFP_TEMPORARY just
    motivates for cargo cult usage without any reasoning.

    I believe that our gfp flags are quite complex already and especially
    those with highlevel semantic should be clearly defined to prevent from
    confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
    replace all existing users to simply use GFP_KERNEL. Please note that
    SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
    so they will be placed properly for memory fragmentation prevention.

    I can see reasons we might want some gfp flag to reflect short-term
    allocations but I propose starting from a clear semantic definition and
    only then add users with proper justification.

    This was brought up before LSF this year by Matthew [1] and it
    turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
    seems to be a heuristic without any measured advantage for most (if not
    all) its current users. The follow up discussion has revealed that
    opinions on what might be temporary allocation differ a lot between
    developers. So rather than trying to tweak existing users into a
    semantic which they haven't expected I propose to simply remove the flag
    and start from scratch if we really need a semantic for short term
    allocations.

    [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

    [akpm@linux-foundation.org: fix typo]
    [akpm@linux-foundation.org: coding-style fixes]
    [sfr@canb.auug.org.au: drm/i915: fix up]
    Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Stephen Rothwell
    Acked-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Neil Brown
    Cc: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

09 Sep, 2017

1 commit


05 Sep, 2017

1 commit

  • Pull locking updates from Ingo Molnar:

    - Add 'cross-release' support to lockdep, which allows APIs like
    completions, where it's not the 'owner' who releases the lock, to be
    tracked. It's all activated automatically under
    CONFIG_PROVE_LOCKING=y.

    - Clean up (restructure) the x86 atomics op implementation to be more
    readable, in preparation of KASAN annotations. (Dmitry Vyukov)

    - Fix static keys (Paolo Bonzini)

    - Add killable versions of down_read() et al (Kirill Tkhai)

    - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

    - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

    - Remove smp_mb__before_spinlock() and convert its usages, introduce
    smp_mb__after_spinlock() (Peter Zijlstra)

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
    locking/lockdep/selftests: Fix mixed read-write ABBA tests
    sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
    acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
    locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
    smp: Avoid using two cache lines for struct call_single_data
    locking/lockdep: Untangle xhlock history save/restore from task independence
    locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
    futex: Remove duplicated code and fix undefined behaviour
    Documentation/locking/atomic: Finish the document...
    locking/lockdep: Fix workqueue crossrelease annotation
    workqueue/lockdep: 'Fix' flush_work() annotation
    locking/lockdep/selftests: Add mixed read-write ABBA tests
    mm, locking/barriers: Clarify tlb_flush_pending() barriers
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
    locking/lockdep: Explicitly initialize wq_barrier::done::map
    locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
    locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
    locking/refcounts, x86/asm: Implement fast refcount overflow protection
    locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
    ...

    Linus Torvalds
     

29 Aug, 2017

2 commits

  • All the locking related cmpxchg's in the following functions are
    replaced with the _acquire variants:

    - pv_queued_spin_steal_lock()
    - trylock_clear_pending()

    This change should help performance on architectures that use LL/SC.

    The cmpxchg in pv_kick_node() is replaced with a relaxed version
    with explicit memory barrier to make sure that it is fully ordered
    in the writing of next->lock and the reading of pn->state whether
    the cmpxchg is a success or failure without affecting performance in
    non-LL/SC architectures.
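
    A rough C11 illustration of the "relaxed cmpxchg plus explicit barrier"
    idea (the structure, state values and function name are placeholders,
    not the pvqspinlock code):

    #include <stdatomic.h>
    #include <stdbool.h>

    enum { VCPU_RUNNING, VCPU_HALTED, VCPU_HASHED };

    static void kick_node_sketch(_Atomic int *next_locked, _Atomic int *pn_state)
    {
        int expected = VCPU_HALTED;

        atomic_store_explicit(next_locked, 1, memory_order_relaxed);

        /*
         * Full fence: the store above is ordered against the CAS's read of
         * *pn_state whether the CAS succeeds or fails, which an
         * acquire/release CAS would not guarantee on failure.
         */
        atomic_thread_fence(memory_order_seq_cst);

        if (!atomic_compare_exchange_strong_explicit(pn_state, &expected,
                                                     VCPU_HASHED,
                                                     memory_order_relaxed,
                                                     memory_order_relaxed))
            return;                    /* vCPU was not halted: nothing to do */

        /* ... the real code would hash the lock and kick the vCPU here ... */
    }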

    On a 2-socket 12-core 96-thread Power8 system with pvqspinlock
    explicitly enabled, the performance of a locking microbenchmark
    with and without this patch on a 4.13-rc4 kernel with Xinhui's PPC
    qspinlock patch were as follows:

    # of thread    w/o patch       with patch      % Change
    -----------    ---------       ----------      --------
         8         5054.8 Mop/s    5209.4 Mop/s     +3.1%
        16         3985.0 Mop/s    4015.0 Mop/s     +0.8%
        32         2378.2 Mop/s    2396.0 Mop/s     +0.7%

    Suggested-by: Peter Zijlstra
    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrea Parri
    Cc: Boqun Feng
    Cc: Linus Torvalds
    Cc: Pan Xinhui
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/1502741222-24360-1-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to
    ensure the temporal IRQ events don't interact with task state, the
    XHLOCK_PROC is a fundamentally different beast that just happens to share
    the interface.

    The purpose of XHLOCK_PROC is to annotate independent execution inside
    one task. For example workqueues, each work should appear to run in its
    own 'pristine' 'task'.

    Remove XHLOCK_PROC in favour of its own interface to avoid confusion.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: david@fromorbit.com
    Cc: johannes@sipsolutions.net
    Cc: kernel-team@lge.com
    Cc: oleg@redhat.com
    Cc: tj@kernel.org
    Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 Aug, 2017

1 commit

  • The new completion/crossrelease annotations interact unfavourable with
    the extant flush_work()/flush_workqueue() annotations.

    The problem is that when a single work class does:

    wait_for_completion(&C)

    and

    complete(&C)

    in different executions, we'll build dependencies like:

    lock_map_acquire(W)
    complete_acquire(C)

    and

    lock_map_acquire(W)
    complete_release(C)

    which results in the dependency chain: W->C->W, which lockdep thinks
    spells deadlock, even though there is no deadlock potential since
    works are run concurrently.

    One possibility would be to change the work 'lock' to recursive-read,
    but that would mean hitting a lockdep limitation on recursive locks.
    Also, unconditionally switching to recursive-read here would fail to
    detect the actual deadlock on single-threaded workqueues, which do
    have a problem with this.

    For now, forcefully disregard these locks for crossrelease.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Tejun Heo
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: byungchul.park@lge.com
    Cc: david@fromorbit.com
    Cc: johannes@sipsolutions.net
    Cc: oleg@redhat.com
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

17 Aug, 2017

1 commit

  • There is no agreed-upon definition of spin_unlock_wait()'s semantics,
    and it appears that all callers could do just as well with a lock/unlock
    pair. This commit therefore removes spin_unlock_wait() and related
    definitions from core code.

    Signed-off-by: Paul E. McKenney
    Cc: Arnd Bergmann
    Cc: Ingo Molnar
    Cc: Will Deacon
    Cc: Peter Zijlstra
    Cc: Alan Stern
    Cc: Andrea Parri
    Cc: Linus Torvalds

    Paul E. McKenney
     

14 Aug, 2017

2 commits

  • As Boqun Feng pointed out, current->hist_id should be aligned with the
    latest valid xhlock->hist_id so that hist_id_save[] storing current->hist_id
    can be comparable with xhlock->hist_id. Fix it.

    Additionally, the condition for overwrite-detection should be the
    opposite. Fix the code and the comments as well.


    Signed-off-by: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: linux-mm@kvack.org
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502694052-16085-3-git-send-email-byungchul.park@lge.com
    [ Improve the comments some more. ]
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • In lockdep_sys_exit(), crossrelease_hist_end() is called unconditionally
    even when getting here without having started e.g. just after forking.

    But it's no problem since it would roll back to an invalid entry anyway.
    Add a comment to explain this.

    Signed-off-by: Byungchul Park
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: linux-mm@kvack.org
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502694052-16085-2-git-send-email-byungchul.park@lge.com
    [ Improved the description and the comments. ]
    Signed-off-by: Ingo Molnar

    Byungchul Park
     

10 Aug, 2017

8 commits

    print_circular_bug(), when reporting a circular bug, assumes that the
    target hlock is owned by the current context. However, in crossrelease,
    the target hlock can be owned by a context other than the current one.
    So the report format needs to be changed to reflect that.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-9-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • No acquisition might be in progress on commit of a crosslock. Completion
    operations enabling crossrelease are the case like:

    CONTEXT X                       CONTEXT Y
    ---------                       ---------
    trigger completion context
                                    complete AX
                                    commit AX
    wait_for_complete AX
      acquire AX
      wait

    where AX is a crosslock.

    When no acquisition is in progress, we should not perform commit because
    the lock does not exist, which might cause incorrect memory access. So
    we have to track the number of acquisitions of a crosslock to handle it.

    Moreover, in case that more than one acquisition of a crosslock are
    overlapped like:

    CONTEXT W             CONTEXT X             CONTEXT Y             CONTEXT Z
    ---------             ---------             ---------             ---------
    acquire AX (gen_id: 1)
                                                acquire A
                          acquire AX (gen_id: 10)
                                                acquire B
                                                commit AX
                                                                      acquire C
                                                                      commit AX

    where A, B and C are typical locks and AX is a crosslock.

    Current crossrelease code performs commits in Y and Z with gen_id = 10.
    However, we can use gen_id = 1 to do it, since not only 'acquire AX in X'
    but 'acquire AX in W' also depends on each acquisition in Y and Z until
    their commits. So make it use gen_id = 1 instead of 10 on their commits,
    which adds an additional dependency 'AX -> A' in the example above.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-8-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • The ring buffer can be overwritten by hardirq/softirq/work contexts.
    Those cases must be considered on rollback or commit. For example,

              |<-------- hist_lock ring buffer size -------->|
              ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
    wrapped > iiiiiiiiiiiiiiiiiiiiiii....................

    where 'p' represents an acquisition in process context,
    'i' represents an acquisition in irq context.

    On IRQ exit, crossrelease tries to roll back idx to its original
    position, but it should not, because the entry has already been
    invalidated by the overwriting 'i'. Avoid rollback or commit for
    overwritten entries.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-7-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Lockdep is a runtime locking correctness validator that detects and
    reports a deadlock or its possibility by checking dependencies between
    locks. It's useful since it does not report just an actual deadlock but
    also the possibility of a deadlock that has not actually happened yet.
    That enables problems to be fixed before they affect real systems.

    However, this facility is only applicable to typical locks, such as
    spinlocks and mutexes, which are normally released within the context in
    which they were acquired. However, synchronization primitives like page
    locks or completions, which are allowed to be released in any context,
    also create dependencies and can cause a deadlock.

    So lockdep should track these locks to do a better job. The 'crossrelease'
    implementation makes these primitives also be tracked.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
    Currently, a space for a stack_trace is pinned in check_prev_add(),
    which prevents us from using an external stack_trace. The simplest way
    to achieve that is to pass an external stack_trace as an argument.

    A more suitable solution is to pass a callback additionally along with
    a stack_trace so that callers can decide the way to save or whether to
    save. Actually crossrelease needs to do other than saving a stack_trace.
    So pass a stack_trace and callback to handle it, to check_prev_add().

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-5-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Firstly, return 1 instead of 2 when 'prev -> next' dependency already
    exists. Since the value 2 is not referenced anywhere, just return 1
    indicating success in this case.

    Secondly, return 2 instead of 1 when a lock_list entry has been
    successfully added with its stack_trace saved. With that, a caller can
    decide whether to avoid a redundant save_trace() at the call site.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-4-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
  • Crossrelease needs to build a chain between two classes regardless of
    their contexts. However, add_chain_cache() cannot be used for that
    purpose since it assumes that it's called in the acquisition context
    of the hlock. So this patch introduces a new function doing it.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-3-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
    Currently, lookup_chain_cache() provides both the 'lookup' and the 'add'
    functionality in a single function. However, each is useful on its own.
    So this patch makes lookup_chain_cache() do only the 'lookup' part and
    add_chain_cache() do only the 'add' part, which is also more readable
    than before.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: boqun.feng@gmail.com
    Cc: kernel-team@lge.com
    Cc: kirill@shutemov.name
    Cc: npiggin@gmail.com
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/1502089981-21272-2-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park