28 Oct, 2014

1 commit

  • We're going to make might_sleep() test for TASK_RUNNING, because
    blocking without TASK_RUNNING will destroy the task state by setting
    it to TASK_RUNNING.

    There are a few occasions where it's 'valid' to call blocking
    primitives (and mutex_lock in particular) and not have TASK_RUNNING;
    typically such cases are right before we set TASK_RUNNING anyhow.

    Robustify the code by not assuming this; this has the beneficial side
    effect of allowing optional code emission for fixing the above
    might_sleep() false positives.
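
    As a minimal sketch of the pattern being described (the wait condition,
    lock and helper below are hypothetical, purely for illustration):

      for (;;) {
              set_current_state(TASK_UNINTERRUPTIBLE);
              if (condition)                      /* hypothetical wake condition */
                      break;
              /*
               * mutex_lock() may sleep; sleeping sets the task back to
               * TASK_RUNNING and silently corrupts the wait state set up
               * above -- this is what the new might_sleep() check flags.
               */
              mutex_lock(&some_mutex);            /* hypothetical lock */
              recheck_condition();                /* hypothetical helper */
              mutex_unlock(&some_mutex);
              schedule();
      }
      __set_current_state(TASK_RUNNING);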

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: tglx@linutronix.de
    Cc: ilya.dryomov@inktank.com
    Cc: umgwanakikbuti@gmail.com
    Cc: Oleg Nesterov
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140924082241.988560063@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Oct, 2014

1 commit

  • Pull core locking updates from Ingo Molnar:
    "The main updates in this cycle were:

    - mutex MCS refactoring finishing touches: improve comments, refactor
    and clean up code, reduce debug data structure footprint, etc.

    - qrwlock finishing touches: remove old code, self-test updates.

    - small rwsem optimization

    - various smaller fixes/cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/lockdep: Revert qrwlock recusive stuff
    locking/rwsem: Avoid double checking before try acquiring write lock
    locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definition
    locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S
    locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code
    locking/semaphore: Resolve some shadow warnings
    locking/selftest: Support queued rwlock
    locking/lockdep: Restrict the use of recursive read_lock() with qrwlock
    locking/spinlocks: Always evaluate the second argument of spin_lock_nested()
    locking/Documentation: Update locking/mutex-design.txt disadvantages
    locking/Documentation: Move locking related docs into Documentation/locking/
    locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate
    locking/mutexes: Refactor optimistic spinning code
    locking/mcs: Remove obsolete comment
    locking/mutexes: Document quick lock release when unlocking
    locking/mutexes: Standardize arguments in lock/unlock slowpaths
    locking: Remove deprecated smp_mb__() barriers

    Linus Torvalds
     

03 Oct, 2014

2 commits

  • Commit f0bab73cb539 ("locking/lockdep: Restrict the use of recursive
    read_lock() with qrwlock") changed lockdep to try and conform to the
    qrwlock semantics which differ from the traditional rwlock semantics.

    In particular qrwlock is fair outside of interrupt context, but in
    interrupt context readers will ignore all fairness.

    The problem modeling this is that read and write side have different
    lock state (interrupts) semantics but we only have a single
    representation of these. Therefore lockdep will get confused, thinking
    the lock can cause interrupt lock inversions.

    So revert it for now; the old rwlock semantics were already imperfectly
    modeled and the qrwlock extra won't fit either.

    If we want to properly fix this, I think we need to resurrect the work
    Gautham did a few years ago that split the read and write state of
    locks:

    http://lwn.net/Articles/332801/

    FWIW the locking selftest that would've failed (and was reported by
    Borislav earlier) is something like:

    RL(X1); /* IRQ-ON */
    LOCK(A);
    UNLOCK(A);
    RU(X1);

    IRQ_ENTER();
    RL(X1); /* IN-IRQ */
    RU(X1);
    IRQ_EXIT();

    At which point it would report that because A is an IRQ-unsafe lock we
    can suffer the following inversion:

    CPU0                        CPU1

    lock(A)
                                lock(X1)
                                lock(A)

    lock(X1)

    And this is 'wrong' because X1 can recurse (assuming the above locks are
    in fact read-locks) but lockdep doesn't know about this.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Waiman Long
    Cc: ego@linux.vnet.ibm.com
    Cc: bp@alien8.de
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Link: http://lkml.kernel.org/r/20140930132600.GA7444@worktop.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Commit 9b0fc9c09f1b ("rwsem: skip initial trylock in rwsem_down_write_failed")
    checks whether there are known active lockers in order to avoid write trylocking
    using an expensive cmpxchg() when it likely wouldn't get the lock.

    However, a subsequent patch was added such that we directly
    check for sem->count == RWSEM_WAITING_BIAS right before trying
    that cmpxchg().

    Thus, commit 9b0fc9c09f1b now just adds overhead.

    This patch modifies it so that we only check whether
    count == RWSEM_WAITING_BIAS.

    Also, add a comment on why we do an "extra check" of count
    before the cmpxchg().
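
    A rough sketch of the resulting shape of the write trylock (the exact
    new count value and the waiter bookkeeping are simplified here):

      static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
      {
              /*
               * Only attempt the expensive cmpxchg() when the count says
               * there are queued waiters but no active lockers.
               */
              if (count != RWSEM_WAITING_BIAS)
                      return false;

              return cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
                             RWSEM_WAITING_BIAS + RWSEM_ACTIVE_WRITE_BIAS) ==
                     RWSEM_WAITING_BIAS;
      }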

    Signed-off-by: Jason Low
    Acked-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Aswin Chandramouleeswaran
    Cc: Chegu Vinod
    Cc: Peter Hurley
    Cc: Tim Chen
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1410913017.2447.22.camel@j-VirtualBox
    Signed-off-by: Ingo Molnar

    Jason Low
     

30 Sep, 2014

4 commits


17 Sep, 2014

8 commits

  • The number of global variables is getting pretty ugly. Group variables
    related to the execution (i.e., not parameters) into a new context structure.
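
    A hypothetical sketch of such a grouping (field names are illustrative,
    not taken from the patch):

      /* Hypothetical shape of a torture "context" structure: */
      struct lock_torture_cxt {
              int nrealwriters_stress;          /* writer threads actually created */
              bool debug_lock;                  /* lock debugging enabled? */
              atomic_t n_lock_torture_errors;   /* failures seen during the run */
              struct lock_torture_ops *cur_ops; /* the lock type under test */
      };

      static struct lock_torture_cxt cxt;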

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney

    Davidlohr Bueso
     
  • We can easily do so with our new reader lock support. Just an arbitrary
    design default: readers have higher (5x) critical region latencies than
    writers: 50 ms and 10 ms, respectively.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney

    Davidlohr Bueso
     
  • Most of it is based on what we already have for writers. This allows
    readers to be very independent (and thus configurable), enabling
    future module parameters to control things such as rw distribution.
    Furthermore, readers have their own delaying function, allowing us
    to test different rw critical region latencies, and stress locking
    internals. Similarly, statistics will, for now, only track the
    number of lock acquisitions -- as opposed to writers, readers have
    no failure detection.

    In addition, introduce a new nreaders_stress module parameter. The
    default number of readers will be the same as the number of writer
    threads. Writer threads are interleaved with readers. Documentation
    is updated accordingly.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney

    Davidlohr Bueso
     
  • When performing module cleanups by calling torture_cleanup(), the
    'torture_type' string is nullified. However, callers are not necessarily
    done, and might still need to reference the variable. This impacts
    both rcutorture and locktorture, causing printouts like:

    [ 94.226618] (null)-torture: Stopping lock_torture_writer task
    [ 94.226624] (null)-torture: Stopping lock_torture_stats task

    Thus delay this operation until the very end of the cleanup process.
    The consequence (which shouldn't matter for this kind of program) is,
    of course, that we delay the window between rmmod and modprobing,
    for instance in module_torture_begin().

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney

    Davidlohr Bueso
     
  • The statistics structure can serve well for both reader and writer
    locks, thus simply rename some fields that mention 'write' and leave
    the declaration of lwsa.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney

    Davidlohr Bueso
     
  • Regular locks are very different from locks with debugging. For instance,
    for mutexes, debugging forces us to take only the slowpaths. As such, the
    locktorture module should take this into account when printing related
    information -- specifically when printing user-passed parameters, which
    seems the right place for such info.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney

    Davidlohr Bueso
     
  • Add a "mutex_lock" torture test. The main difference with the already
    existing spinlock tests is that the latency of the critical region
    is much larger. We randomly delay for (arbitrarily) either 500 ms or,
    otherwise, 25 ms. While this can considerably reduce the number of
    writes compared to non-blocking locks, if run long enough it can have
    the same torturous effect. Furthermore it is more representative of
    mutex hold times and can better stress things like thrashing.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney

    Davidlohr Bueso
     
  • ... to just 'torture_runnable'. It follows other variable naming
    and is shorter.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Paul E. McKenney

    Davidlohr Bueso
     

16 Sep, 2014

1 commit


04 Sep, 2014

1 commit

  • Resolve some shadow warnings resulting from using the name
    jiffies, which is a well-known global. This is not a problem
    of course, but it could be a trap for someone copying and
    pasting code, and it just makes W=2 a little cleaner.
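
    For illustration, the kind of rename that silences -Wshadow here
    (shown as prototypes only; a sketch of the likely shape of the change):

      /* Before: the parameter shadows the well-known global tick counter. */
      extern int down_timeout(struct semaphore *sem, long jiffies);

      /* After: renamed, so W=2 (-Wshadow) stays quiet and copy/pasted code
       * cannot mistake the parameter for the global. */
      extern int down_timeout(struct semaphore *sem, long timeout);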

    Signed-off-by: Mark Rustad
    Signed-off-by: Jeff Kirsher
    Acked-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Paul E. McKenney
    Link: http://lkml.kernel.org/r/1409739444-13635-1-git-send-email-jeffrey.t.kirsher@intel.com
    Signed-off-by: Ingo Molnar

    Mark Rustad
     

13 Aug, 2014

7 commits

  • Unlike the original unfair rwlock implementation, a queued rwlock
    will grant the lock according to the chronological order of the lock
    requests, except when the lock requester is in interrupt context.
    Consequently, recursive read_lock calls will now hang the process if
    there is a write_lock call somewhere in between the read_lock calls.

    This patch updates the lockdep implementation to look for recursive
    read_lock calls. A new read state (3) is used to mark those read_lock
    calls that cannot be called recursively except in interrupt
    context. The new read state does exhaust the 2 bits available in the
    held_lock:read bit field. The addition of any new read state in the
    future may require a redesign of how all those bits are squeezed
    together in the held_lock structure.
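
    For illustration, the kind of recursion that now hangs in process
    context (lock name arbitrary); in interrupt context the second
    read_lock() would instead be granted immediately:

      DEFINE_RWLOCK(lock);

      CPU0                              CPU1

      read_lock(&lock);
                                        write_lock(&lock);  /* queues behind CPU0 */
      read_lock(&lock);                 /* queues behind the writer: deadlock */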

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra
    Cc: Maarten Lankhorst
    Cc: Rik van Riel
    Cc: Scott J Norton
    Cc: Fengguang Wu
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1407345722-61615-2-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • Specifically:
    Documentation/locking/lockdep-design.txt
    Documentation/locking/lockstat.txt
    Documentation/locking/mutex-design.txt
    Documentation/locking/rt-mutex-design.txt
    Documentation/locking/rt-mutex.txt
    Documentation/locking/spinlocks.txt
    Documentation/locking/ww-mutex-design.txt

    Signed-off-by: Davidlohr Bueso
    Acked-by: Randy Dunlap
    Signed-off-by: Peter Zijlstra
    Cc: jason.low2@hp.com
    Cc: aswin@hp.com
    Cc: Alexei Starovoitov
    Cc: Al Viro
    Cc: Andrew Morton
    Cc: Chris Mason
    Cc: Dan Streetman
    Cc: David Airlie
    Cc: Davidlohr Bueso
    Cc: David S. Miller
    Cc: Greg Kroah-Hartman
    Cc: Heiko Carstens
    Cc: Jason Low
    Cc: Josef Bacik
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Lubomir Rintel
    Cc: Masanari Iida
    Cc: Paul E. McKenney
    Cc: Randy Dunlap
    Cc: Tim Chen
    Cc: Vineet Gupta
    Cc: fengguang.wu@intel.com
    Link: http://lkml.kernel.org/r/1406752916-3341-6-git-send-email-davidlohr@hp.com
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • 4badad35 ("locking/mutex: Disable optimistic spinning on some
    architectures") added an ARCH_SUPPORTS_ATOMIC_RMW flag to
    disable the mutex optimistic spinning feature on specific archs.

    Because CONFIG_MUTEX_SPIN_ON_OWNER only depended on DEBUG and
    SMP, it was OK to keep the conditional around the ->owner field a
    bit loose. However, with a new variable added to the mix, we can
    waste space with the unused field, i.e.: CONFIG_SMP &&
    (!CONFIG_MUTEX_SPIN_ON_OWNER && !CONFIG_DEBUG_MUTEX).
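
    A sketch of the layout in question -- the ->owner field wants to be
    guarded by the debug/spinning options rather than by SMP alone
    (approximate, not the exact diff):

      struct mutex {
              atomic_t                count;
              spinlock_t              wait_lock;
              struct list_head        wait_list;
      #if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_MUTEX_SPIN_ON_OWNER)
              struct task_struct      *owner;   /* only useful when debugging or spinning */
      #endif
      #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
              struct optimistic_spin_queue osq; /* spinner MCS lock */
      #endif
      };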

    Signed-off-by: Davidlohr Bueso
    Acked-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: aswin@hp.com
    Cc: Davidlohr Bueso
    Cc: Heiko Carstens
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Tim Chen
    Link: http://lkml.kernel.org/r/1406752916-3341-5-git-send-email-davidlohr@hp.com
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • When we fail to acquire the mutex in the fastpath, we end up calling
    __mutex_lock_common(). A *lot* goes on in this function. Move out the
    optimistic spinning code into mutex_optimistic_spin() and simplify
    the former a bit. Furthermore, this is similar to what we have in
    rwsems. No logical changes.

    Signed-off-by: Davidlohr Bueso
    Acked-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: aswin@hp.com
    Cc: mingo@kernel.org
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1406752916-3341-4-git-send-email-davidlohr@hp.com
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • ... as we clearly inline mcs_spin_lock() now.

    Signed-off-by: Davidlohr Bueso
    Acked-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: aswin@hp.com
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1406752916-3341-3-git-send-email-davidlohr@hp.com
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • When unlocking, we always want to reach the slowpath with the lock's
    counter indicating it is unlocked -- as returned by the asm fastpath
    call or by explicitly setting it. While doing so, at least in theory,
    we can optimize and allow faster lock stealing.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra
    Cc: jason.low2@hp.com
    Cc: aswin@hp.com
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1406752916-3341-2-git-send-email-davidlohr@hp.com
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • Just as the locking end behaves, when unlocking, go ahead and
    obtain the proper data structure immediately after the previous
    (asm-end) call exits and there are (probably) pending waiters.
    This simplifies some of the layering a bit.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra
    Cc: jason.low2@hp.com
    Cc: aswin@hp.com
    Cc: mingo@kernel.org
    Cc: Linus Torvalds
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1406752916-3341-1-git-send-email-davidlohr@hp.com
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

17 Jul, 2014

3 commits

  • The arch_mutex_cpu_relax() function, introduced by 34b133f, is
    hacky and ugly. It was added a few years ago to address the fact
    that common cpu_relax() calls include yielding on s390, and thus
    impact the optimistic spinning functionality of mutexes. Nowadays
    we use this function well beyond mutexes: rwsem, qrwlock, mcs and
    lockref. Since the macro that defines the call is in the mutex header,
    any users must include mutex.h and the naming is misleading as well.

    This patch (i) renames the call to cpu_relax_lowlatency ("relax, but
    only if you can do it with very low latency") and (ii) defines it in
    each arch's asm/processor.h local header, just like for regular cpu_relax
    functions. On all archs except s390, cpu_relax_lowlatency is simply cpu_relax,
    and thus we can take it out of mutex.h. While this can seem redundant,
    I believe it is a good choice as it allows us to move arch-specific
    logic out of the generic locking primitives and enables future(?) archs to
    transparently define it, similarly to System Z.
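
    Roughly, the per-arch definition pattern looks like this (a sketch;
    s390 shown as the exception):

      /* Most architectures, in their asm/processor.h: */
      #define cpu_relax_lowlatency()  cpu_relax()

      /* s390, where cpu_relax() may yield the hypervisor: */
      #define cpu_relax_lowlatency()  barrier()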

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Anton Blanchard
    Cc: Aurelien Jacquiot
    Cc: Benjamin Herrenschmidt
    Cc: Bharat Bhushan
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: David Howells
    Cc: David S. Miller
    Cc: Deepthi Dharwar
    Cc: Dominik Dingel
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Hirokazu Takata
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: James Hogan
    Cc: Jason Wang
    Cc: Jesper Nilsson
    Cc: Joe Perches
    Cc: Jonas Bonn
    Cc: Joseph Myers
    Cc: Kees Cook
    Cc: Koichi Yasutake
    Cc: Lennox Wu
    Cc: Linus Torvalds
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Neuling
    Cc: Michal Simek
    Cc: Mikael Starvik
    Cc: Nicolas Pitre
    Cc: Paolo Bonzini
    Cc: Paul Burton
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Paul Mackerras
    Cc: Qais Yousef
    Cc: Qiaowei Ren
    Cc: Rafael Wysocki
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Richard Kuo
    Cc: Russell King
    Cc: Steven Miao
    Cc: Steven Rostedt
    Cc: Stratos Karafotis
    Cc: Tim Chen
    Cc: Tony Luck
    Cc: Vasily Kulikov
    Cc: Vineet Gupta
    Cc: Vineet Gupta
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: Wolfram Sang
    Cc: adi-buildroot-devel@lists.sourceforge.net
    Cc: linux390@de.ibm.com
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-am33-list@redhat.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-c6x-dev@linux-c6x.org
    Cc: linux-cris-kernel@axis.com
    Cc: linux-hexagon@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux@lists.openrisc.net
    Cc: linux-m32r-ja@ml.linux-m32r.org
    Cc: linux-m32r@ml.linux-m32r.org
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linux-metag@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: sparclinux@vger.kernel.org
    Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • When lockdep turns itself off, the following message is logged:

    Please attach the output of /proc/lock_stat to the bug report

    Omit this message when CONFIG_LOCK_STAT is off, and /proc/lock_stat
    doesn't exist.
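
    The fix presumably amounts to guarding the hint, along these lines:

      /* Only advertise /proc/lock_stat when it can actually exist: */
      #ifdef CONFIG_LOCK_STAT
              printk(KERN_DEBUG "Please attach the output of /proc/lock_stat to the bug report\n");
      #endif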

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1405451452-3824-1-git-send-email-andreas.gruenbacher@gmail.com
    Signed-off-by: Ingo Molnar

    Andreas Gruenbacher
     
  • …and to refresh the branch with fixes

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

16 Jul, 2014

7 commits

  • Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER),
    encapsulate the dependencies for rwsem optimistic spinning.
    No logical changes here as it continues to depend on both
    SMP and the XADD algorithm variant.

    Signed-off-by: Davidlohr Bueso
    Acked-by: Jason Low
    [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com
    Cc: aswin@hp.com
    Cc: Chris Mason
    Cc: Davidlohr Bueso
    Cc: Josef Bacik
    Cc: Linus Torvalds
    Cc: Waiman Long
    Signed-off-by: Ingo Molnar

    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • There are two definitions of struct rw_semaphore, one in linux/rwsem.h
    and one in linux/rwsem-spinlock.h.

    For some reason they have different names for the initial field. This
    makes it impossible to use C99 named initialization for
    __RWSEM_INITIALIZER() -- or we have to duplicate that entire thing
    along with the structure definitions.

    The simpler patch is renaming the rwsem-spinlock variant to match the
    regular rwsem.

    This allows us to switch to C99 named initialization.
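
    With the field names unified, the initializer can use designated
    initializers, roughly:

      #define __RWSEM_INITIALIZER(name)                                \
      {       .count     = RWSEM_UNLOCKED_VALUE,                       \
              .wait_list = LIST_HEAD_INIT((name).wait_list),           \
              .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock) }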

    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In the unlock function of the cancellable MCS spinlock, the first
    thing we do is to retrieve the current CPU's osq node. However, due to
    the changes made in the previous patch, in the common case where the
    lock is not contended, we wouldn't need to access the current CPU's
    osq node anymore.

    This patch optimizes this by only retrieving this CPU's osq node
    after we attempt the initial cmpxchg to unlock the osq and find
    that it's contended.
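
    An approximate sketch of the resulting fast/slow path split:

      void osq_unlock(struct optimistic_spin_queue *lock)
      {
              struct optimistic_spin_node *node, *next;
              int curr = encode_cpu(smp_processor_id());

              /* Fast path: uncontended, never touches this CPU's node. */
              if (likely(atomic_cmpxchg(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr))
                      return;

              /* Slow path: contended, only now fetch our per-cpu node. */
              node = this_cpu_ptr(&osq_node);
              next = xchg(&node->next, NULL);
              if (next)
                      next->locked = 1;
              /* (further contention handling omitted in this sketch) */
      }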

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: Scott Norton
    Cc: "Paul E. McKenney"
    Cc: Dave Chinner
    Cc: Waiman Long
    Cc: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Cc: "H. Peter Anvin"
    Cc: Steven Rostedt
    Cc: Tim Chen
    Cc: Konrad Rzeszutek Wilk
    Cc: Aswin Chandramouleeswaran
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1405358872-3732-5-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • Currently, we initialize the osq lock by directly setting the lock's values. It
    would be preferable if we used an init macro to do the initialization, as we do
    with other locks.

    This patch introduces and uses a macro and function for initializing the osq lock.
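
    The macro and helper would look roughly like:

      #define OSQ_UNLOCKED_VAL (0)

      /* Static initializer: */
      #define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) }

      /* Runtime initializer: */
      static inline void osq_lock_init(struct optimistic_spin_queue *lock)
      {
              atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
      }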

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: Scott Norton
    Cc: "Paul E. McKenney"
    Cc: Dave Chinner
    Cc: Waiman Long
    Cc: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Cc: "H. Peter Anvin"
    Cc: Steven Rostedt
    Cc: Tim Chen
    Cc: Konrad Rzeszutek Wilk
    Cc: Aswin Chandramouleeswaran
    Cc: Linus Torvalds
    Cc: Chris Mason
    Cc: Josef Bacik
    Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • The cancellable MCS spinlock is currently used to queue threads that are
    doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
    the lock would access and queue the local node corresponding to the CPU that
    it's running on. Currently, the cancellable MCS lock is implemented by using
    pointers to these nodes.

    In this patch, instead of operating on pointers to the per-cpu nodes, we
    store, in an atomic_t, the CPU numbers to which the per-cpu nodes correspond.
    A similar concept is used with the qspinlock.

    By operating on the CPU # of the nodes using atomic_t instead of pointers
    to those nodes, this can reduce the overhead of the cancellable MCS spinlock
    by 32 bits (on 64 bit systems).
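
    The resulting handle is tiny, roughly:

      struct optimistic_spin_queue {
              /*
               * Stores an encoded value of the CPU number of the tail node
               * in the queue; OSQ_UNLOCKED_VAL when the queue is empty.
               */
              atomic_t tail;
      };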

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: Scott Norton
    Cc: "Paul E. McKenney"
    Cc: Dave Chinner
    Cc: Waiman Long
    Cc: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Cc: "H. Peter Anvin"
    Cc: Steven Rostedt
    Cc: Tim Chen
    Cc: Konrad Rzeszutek Wilk
    Cc: Aswin Chandramouleeswaran
    Cc: Linus Torvalds
    Cc: Chris Mason
    Cc: Heiko Carstens
    Cc: Josef Bacik
    Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • Currently, the per-cpu nodes structure for the cancellable MCS spinlock is
    named "optimistic_spin_queue". However, in a follow up patch in the series
    we will be introducing a new structure that serves as the new "handle" for
    the lock. It would make more sense if that structure were named
    "optimistic_spin_queue". Additionally, since the current uses of the
    "optimistic_spin_queue" structure are as "nodes", it might be better to
    rename them to "node" anyway.

    This preparatory patch renames all current "optimistic_spin_queue"
    to "optimistic_spin_node".

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: Scott Norton
    Cc: "Paul E. McKenney"
    Cc: Dave Chinner
    Cc: Waiman Long
    Cc: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Cc: "H. Peter Anvin"
    Cc: Steven Rostedt
    Cc: Tim Chen
    Cc: Konrad Rzeszutek Wilk
    Cc: Aswin Chandramouleeswaran
    Cc: Linus Torvalds
    Cc: Chris Mason
    Cc: Heiko Carstens
    Cc: Josef Bacik
    Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • Commit 4fc828e24cd9 ("locking/rwsem: Support optimistic spinning")
    introduced a major performance regression for workloads such as
    xfs_repair which mix read and write locking of the mmap_sem across
    many threads. The result was xfs_repair ran 5x slower on 3.16-rc2
    than on 3.15 and using 20x more system CPU time.

    Perf profiles indicate that in some workloads significant time can
    be spent spinning on !owner. This is because we don't set the lock
    owner when reader(s) obtain the rwsem.

    In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll
    return false if there is no lock owner. The rationale is that if we
    just entered the slowpath, yet there is no lock owner, then there is
    a possibility that a reader has the lock. To be conservative, we'll
    avoid spinning in these situations.

    This patch reduced the total run time of the xfs_repair workload from
    about 4 minutes 24 seconds down to approximately 1 minute 26 seconds,
    back to close to the same performance as on 3.15.

    Retesting with AIM7, whose workloads were among those used to test the
    original optimistic spinning code, confirmed that we still get big
    performance gains with optimistic spinning, even with this additional
    regression fix. Davidlohr found that while the 'custom' workload took
    a performance hit of ~-14% to throughput for >300 users with this
    additional patch, the overall gain with optimistic spinning is
    still ~+45%. The 'disk' workload even improved by ~+15% at >1000 users.
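
    A hedged sketch of the behavioural change -- default to "don't spin"
    when no owner is recorded, since a reader may hold the lock:

      static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
      {
              struct task_struct *owner;
              bool on_cpu = false;    /* conservative default: don't spin */

              rcu_read_lock();
              owner = ACCESS_ONCE(sem->owner);
              if (owner)
                      on_cpu = owner->on_cpu;
              rcu_read_unlock();

              /*
               * If sem->owner is not set, yet we have just entered the
               * slowpath, a reader may well hold the lock: avoid spinning.
               */
              return on_cpu;
      }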

    Tested-by: Dave Chinner
    Acked-by: Davidlohr Bueso
    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: Tim Chen
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox
    Signed-off-by: Ingo Molnar

    Jason Low
     

05 Jul, 2014

4 commits

  • The mutex_trylock() function calls into __mutex_trylock_fastpath() when
    trying to obtain the mutex. On 32 bit x86, in the !__HAVE_ARCH_CMPXCHG
    case, __mutex_trylock_fastpath() calls directly into __mutex_trylock_slowpath()
    regardless of whether or not the mutex is locked.

    In __mutex_trylock_slowpath(), we then acquire the wait_lock spinlock, xchg()
    lock->count with -1, then set lock->count back to 0 if there are no waiters,
    and return true if the prev lock count was 1.

    However, if the mutex is already locked, then there isn't much point
    in attempting all of the above expensive operations. In this patch, we only
    attempt the above trylock operations if the mutex is unlocked.

    Signed-off-by: Jason Low
    Reviewed-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra
    Cc: akpm@linux-foundation.org
    Cc: tim.c.chen@linux.intel.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: rostedt@goodmis.org
    Cc: Waiman.Long@hp.com
    Cc: scott.norton@hp.com
    Cc: aswin@hp.com
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1402511843-4721-5-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • Upon entering the slowpath in __mutex_lock_common(), we try once more to
    acquire the mutex. We only try to acquire if (lock->count >= 0). However,
    what we actually want here is to try to acquire if the mutex is unlocked
    (lock->count == 1).

    This patch changes it so that we only try-acquire the mutex upon entering
    the slowpath if it is unlocked, rather than if the lock count is non-negative.
    This helps further reduce unnecessary atomic xchg() operations.

    Furthermore, this patch uses !mutex_is_locked(lock) to do the initial
    checks for if the lock is free rather than directly calling atomic_read()
    on the lock->count, in order to improve readability.
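
    In the slowpath this roughly becomes:

      /*
       * Once more, try to acquire the lock. Only try-acquire the mutex if
       * it is unlocked, to avoid unnecessary xchg() operations.
       */
      if (!mutex_is_locked(lock) && (atomic_xchg(&lock->count, 0) == 1))
              goto skip_wait;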

    Signed-off-by: Jason Low
    Acked-by: Waiman Long
    Signed-off-by: Peter Zijlstra
    Cc: akpm@linux-foundation.org
    Cc: tim.c.chen@linux.intel.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: rostedt@goodmis.org
    Cc: davidlohr@hp.com
    Cc: scott.norton@hp.com
    Cc: aswin@hp.com
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1402511843-4721-4-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • MUTEX_SHOW_NO_WAITER() is a macro which checks whether there are
    "no waiters" on a mutex by checking if the lock count is non-negative.
    Based on feedback from the discussion in the earlier version of this
    patchset, the macro is not very readable.

    Furthermore, checking lock->count isn't always the correct way to
    determine if there are "no waiters" on a mutex. For example, a negative
    count on a mutex really only means that there "potentially" are
    waiters. Likewise, there can be waiters on the mutex even if the count is
    non-negative. Thus, "MUTEX_SHOW_NO_WAITER" doesn't always do what the name
    of the macro suggests.

    So this patch deletes the MUTEX_SHOW_NO_WAITER() macro, directly
    uses atomic_read() instead of the macro, and adds comments which
    elaborate on how the extra atomic_read() checks can help reduce
    unnecessary xchg() operations.
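
    For reference, the removed macro was of roughly this form:

      /* Checking count >= 0 only shows there are *potentially* no waiters: */
      #define MUTEX_SHOW_NO_WAITER(mutex)  (atomic_read(&(mutex)->count) >= 0)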

    Signed-off-by: Jason Low
    Acked-by: Waiman Long
    Signed-off-by: Peter Zijlstra
    Cc: akpm@linux-foundation.org
    Cc: tim.c.chen@linux.intel.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: rostedt@goodmis.org
    Cc: davidlohr@hp.com
    Cc: scott.norton@hp.com
    Cc: aswin@hp.com
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1402511843-4721-3-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • The mutex optimistic spinning documentation states that we spin for
    acquisition when we find that there are no pending waiters. However,
    in actuality, whether or not there are waiters for the mutex doesn't
    determine if we will spin for it.

    This patch removes that statement and also adds a comment which
    mentions that we spin for the mutex while we don't need to reschedule.

    Signed-off-by: Jason Low
    Acked-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra
    Cc: akpm@linux-foundation.org
    Cc: tim.c.chen@linux.intel.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: rostedt@goodmis.org
    Cc: Waiman.Long@hp.com
    Cc: scott.norton@hp.com
    Cc: aswin@hp.com
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1402511843-4721-2-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     

22 Jun, 2014

1 commit

  • In case the deadlock detector is enabled we follow the lock chain to
    the end in rt_mutex_adjust_prio_chain(), even if we could stop earlier
    due to the priority/waiter constellation.

    But once we are no longer the top priority waiter in a certain step,
    or the task holding the lock already has the same priority, then there
    is no point in dequeueing and enqueueing along the lock chain, as there
    is no change at all.

    So stop the queueing at this point.
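
    A simplified sketch of the early-exit condition (the real check lives
    inside rt_mutex_adjust_prio_chain(); variable names approximate):

      if (waiter->prio == task->prio) {
              if (!detect_deadlock)
                      goto out_unlock_pi;     /* nothing changes along the chain */
              else
                      requeue = false;        /* keep walking, skip the requeueing */
      }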

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Reviewed-by: Steven Rostedt
    Cc: Lai Jiangshan
    Link: http://lkml.kernel.org/r/20140522031950.280830190@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner