17 Jul, 2014

1 commit

  • The arch_mutex_cpu_relax() function, introduced by 34b133f, is
    hacky and ugly. It was added a few years ago to address the fact
    that common cpu_relax() calls include yielding on s390, and thus
    impact the optimistic spinning functionality of mutexes. Nowadays
    we use this function well beyond mutexes: rwsem, qrwlock, mcs and
    lockref. Since the macro that defines the call is in the mutex header,
    any users must include mutex.h and the naming is misleading as well.

    This patch (i) renames the call to cpu_relax_lowlatency ("relax, but
    only if you can do it with very low latency") and (ii) defines it in
    each arch's asm/processor.h local header, just like for regular cpu_relax
    functions. On all archs except s390, cpu_relax_lowlatency is simply cpu_relax,
    and thus we can take it out of mutex.h. While this can seem redundant,
    I believe it is a good choice as it allows us to move arch-specific
    logic out of the generic locking primitives and enables future archs to
    transparently define it, similarly to System z.
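
    For illustration, the end state in each arch's asm/processor.h looks
    roughly like this (the s390 variant reflects the intent described above
    and may differ in detail):

    /* default, used by almost every architecture */
    #define cpu_relax_lowlatency() cpu_relax()

    /* s390, where cpu_relax() may yield the virtual CPU's time slice */
    #define cpu_relax_lowlatency() barrier()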

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Anton Blanchard
    Cc: Aurelien Jacquiot
    Cc: Benjamin Herrenschmidt
    Cc: Bharat Bhushan
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf
    Cc: Christian Borntraeger
    Cc: Chris Zankel
    Cc: David Howells
    Cc: David S. Miller
    Cc: Deepthi Dharwar
    Cc: Dominik Dingel
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Hirokazu Takata
    Cc: Ivan Kokshaysky
    Cc: James E.J. Bottomley
    Cc: James Hogan
    Cc: Jason Wang
    Cc: Jesper Nilsson
    Cc: Joe Perches
    Cc: Jonas Bonn
    Cc: Joseph Myers
    Cc: Kees Cook
    Cc: Koichi Yasutake
    Cc: Lennox Wu
    Cc: Linus Torvalds
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Neuling
    Cc: Michal Simek
    Cc: Mikael Starvik
    Cc: Nicolas Pitre
    Cc: Paolo Bonzini
    Cc: Paul Burton
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Paul Mackerras
    Cc: Qais Yousef
    Cc: Qiaowei Ren
    Cc: Rafael Wysocki
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Richard Kuo
    Cc: Russell King
    Cc: Steven Miao
    Cc: Steven Rostedt
    Cc: Stratos Karafotis
    Cc: Tim Chen
    Cc: Tony Luck
    Cc: Vasily Kulikov
    Cc: Vineet Gupta
    Cc: Vineet Gupta
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: Wolfram Sang
    Cc: adi-buildroot-devel@lists.sourceforge.net
    Cc: linux390@de.ibm.com
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-am33-list@redhat.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-c6x-dev@linux-c6x.org
    Cc: linux-cris-kernel@axis.com
    Cc: linux-hexagon@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux@lists.openrisc.net
    Cc: linux-m32r-ja@ml.linux-m32r.org
    Cc: linux-m32r@ml.linux-m32r.org
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linux-metag@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: sparclinux@vger.kernel.org
    Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

16 Jul, 2014

2 commits

  • The cancellable MCS spinlock is currently used to queue threads that are
    doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
    the lock would access and queue the local node corresponding to the CPU that
    it's running on. Currently, the cancellable MCS lock is implemented by using
    pointers to these nodes.

    In this patch, instead of operating on pointers to the per-cpu nodes, we
    store, in an atomic_t, the CPU numbers to which the per-cpu nodes correspond.
    A similar concept is used with the qspinlock.

    By operating on the CPU numbers of the nodes via an atomic_t instead of on
    pointers to those nodes, we can reduce the size of the cancellable MCS
    spinlock by 32 bits (on 64-bit systems).
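
    A rough sketch of the resulting structure and CPU-number encoding
    (names follow the description above; details are illustrative):

    struct optimistic_spin_queue {
            /*
             * 0 means no CPU is queued; otherwise the value is the
             * queued CPU number + 1, which fits in 32 bits instead
             * of a 64-bit node pointer.
             */
            atomic_t tail;
    };

    static inline int encode_cpu(int cpu_nr)
    {
            return cpu_nr + 1;
    }

    static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
    {
            /* osq_node is the per-cpu spin node, defined elsewhere */
            return per_cpu_ptr(&osq_node, encoded_cpu_val - 1);
    }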

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: Scott Norton
    Cc: "Paul E. McKenney"
    Cc: Dave Chinner
    Cc: Waiman Long
    Cc: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Cc: "H. Peter Anvin"
    Cc: Steven Rostedt
    Cc: Tim Chen
    Cc: Konrad Rzeszutek Wilk
    Cc: Aswin Chandramouleeswaran
    Cc: Linus Torvalds
    Cc: Chris Mason
    Cc: Heiko Carstens
    Cc: Josef Bacik
    Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • Currently, the per-cpu nodes structure for the cancellable MCS spinlock is
    named "optimistic_spin_queue". However, a follow-up patch in the series
    will introduce a new structure that serves as the new "handle" for
    the lock, and it would make more sense if that structure were the one named
    "optimistic_spin_queue". Additionally, since the current uses of the
    "optimistic_spin_queue" structure are as "nodes", it is better to
    rename them to "node" anyway.

    This preparatory patch renames all current "optimistic_spin_queue"
    to "optimistic_spin_node".

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra
    Cc: Scott Norton
    Cc: "Paul E. McKenney"
    Cc: Dave Chinner
    Cc: Waiman Long
    Cc: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Cc: "H. Peter Anvin"
    Cc: Steven Rostedt
    Cc: Tim Chen
    Cc: Konrad Rzeszutek Wilk
    Cc: Aswin Chandramouleeswaran
    Cc: Linus Torvalds
    Cc: Chris Mason
    Cc: Heiko Carstens
    Cc: Josef Bacik
    Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     

11 Mar, 2014

1 commit

  • Since we want a task waiting for a mutex_lock() to go to sleep and
    reschedule on need_resched(), we must be able to abort the
    mcs_spin_lock() around the adaptive spin.

    Therefore implement a cancelable mcs lock.
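
    A simplified sketch of the resulting cancellation point (helper and
    field names are illustrative, not the exact implementation):

    bool osq_lock(struct optimistic_spin_queue **lock)
    {
            /* osq_node is the per-cpu spin node, defined elsewhere */
            struct optimistic_spin_queue *node = this_cpu_ptr(&osq_node);

            /* ... queue ourselves behind the current tail ... */

            while (!READ_ONCE(node->locked)) {
                    /*
                     * This is the cancellation point: if we need to
                     * reschedule, unqueue ourselves and report failure
                     * so the mutex slowpath can block instead of spinning.
                     */
                    if (need_resched())
                            goto unqueue;
                    cpu_relax();
            }
            return true;

    unqueue:
            /* ... carefully unlink our node from the queue ... */
            return false;
    }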

    Signed-off-by: Peter Zijlstra
    Cc: chegu_vinod@hp.com
    Cc: paulmck@linux.vnet.ibm.com
    Cc: Waiman.Long@hp.com
    Cc: torvalds@linux-foundation.org
    Cc: tglx@linutronix.de
    Cc: riel@redhat.com
    Cc: akpm@linux-foundation.org
    Cc: davidlohr@hp.com
    Cc: hpa@zytor.com
    Cc: andi@firstfloor.org
    Cc: aswin@hp.com
    Cc: scott.norton@hp.com
    Cc: Jason Low
    Link: http://lkml.kernel.org/n/tip-62hcl5wxydmjzd182zhvk89m@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 Jan, 2014

1 commit

  • We will need the MCS lock code for doing optimistic spinning for rwsem
    and queued rwlock. Extracting the MCS code from mutex.c and putting it into
    its own file allows us to reuse this code easily.

    We also inline the mcs_spin_lock and mcs_spin_unlock functions
    for better efficiency.

    Note that the smp_load_acquire/smp_store_release pair used in
    mcs_lock and mcs_unlock is not sufficient to form a full memory barrier
    across CPUs on many architectures (x86 being an exception). For applications
    that absolutely need a full barrier across multiple CPUs with the
    mcs_unlock/mcs_lock pair, smp_mb__after_unlock_lock() should be used after
    mcs_lock.
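
    For reference, a condensed sketch of the lock/unlock pair and where
    the acquire/release semantics sit (simplified; not the complete code):

    static inline void mcs_spin_lock(struct mcs_spinlock **lock,
                                     struct mcs_spinlock *node)
    {
            struct mcs_spinlock *prev;

            node->locked = 0;
            node->next   = NULL;

            prev = xchg(lock, node);                /* become the new tail */
            if (!prev)
                    return;                         /* lock was free */
            WRITE_ONCE(prev->next, node);

            /* acquire semantics: wait for the predecessor's hand-off */
            while (!smp_load_acquire(&node->locked))
                    cpu_relax();
    }

    static inline void mcs_spin_unlock(struct mcs_spinlock **lock,
                                       struct mcs_spinlock *node)
    {
            struct mcs_spinlock *next = READ_ONCE(node->next);

            if (!next) {
                    if (cmpxchg(lock, node, NULL) == node)
                            return;                 /* no successor */
                    while (!(next = READ_ONCE(node->next)))
                            cpu_relax();            /* successor is enqueueing */
            }
            /* release semantics: pass the lock to the successor */
            smp_store_release(&next->locked, 1);
    }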

    Reviewed-by: Paul E. McKenney
    Signed-off-by: Tim Chen
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1390347360.3138.63.camel@schen9-DESK
    Signed-off-by: Ingo Molnar

    Tim Chen
     

11 Nov, 2013

1 commit

  • Fix this docbook error:

    >> docproc: kernel/mutex.c: No such file or directory

    by updating the stale references to kernel/mutex.c.

    Reported-by: fengguang.wu@intel.com
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-34pikw1tlsskj65rrt5iusrq@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 Sep, 2013

1 commit

  • Linus suggested to replace

    #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
    #define arch_mutex_cpu_relax() cpu_relax()
    #endif

    with just a simple

    #ifndef arch_mutex_cpu_relax
    # define arch_mutex_cpu_relax() cpu_relax()
    #endif

    to get rid of CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX. That way architectures can
    simply define arch_mutex_cpu_relax if they want an architecture-specific
    function, instead of additionally having to add a select statement to
    their Kconfig.
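
    An architecture override then becomes a plain macro definition in its
    own headers, e.g. for s390 (illustrative):

    /* arch/s390/include/asm/mutex.h */
    #define arch_mutex_cpu_relax()  barrier()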

    Suggested-by: Linus Torvalds
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     

12 Jul, 2013

1 commit

  • Move the definitions for wound/wait mutexes out to a separate
    header, ww_mutex.h. This reduces clutter in mutex.h, and
    increases readability.

    Suggested-by: Linus Torvalds
    Signed-off-by: Maarten Lankhorst
    Acked-by: Peter Zijlstra
    Acked-by: Rik van Riel
    Acked-by: Maarten Lankhorst
    Cc: Dave Airlie
    Link: http://lkml.kernel.org/r/51D675DC.3000907@canonical.com
    [ Tidied up the code a bit. ]
    Signed-off-by: Ingo Molnar

    Maarten Lankhorst
     

26 Jun, 2013

2 commits

  • Injects EDEADLK conditions at pseudo-random intervals, with
    exponential backoff up to UINT_MAX (to ensure that every lock
    operation still completes in a reasonable time).

    This way we can test the wound slowpath even for ww mutex users
    where contention is never expected, and the ww deadlock
    avoidance algorithm is only needed for correctness against
    malicious userspace. An example would be protecting kernel
    modesetting properties, which thanks to single-threaded X isn't
    really expected to contend, ever.

    I've looked into using the CONFIG_FAULT_INJECTION
    infrastructure, but decided against it for two reasons:

    - EDEADLK handling is mandatory for ww mutex users and should
    never affect the outcome of a syscall. This is in contrast to -ENOMEM
    injection. So fine configurability isn't required.

    - The fault injection framework only allows setting a simple
    failure probability. Now the probability that a ww mutex acquire
    stage with N locks will never complete (due to too many injected
    EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock
    operations for the completely uncontended case would be O(exp(N)).
    The per-acquire-ctx exponential backoff solution chosen here only
    results in O(log N) overhead due to injection and so O(log N * N)
    lock operations. This way we can fail with high probability (and so
    have good test coverage even for fancy backoff and lock acquisition
    paths) without running into pathological cases.

    Note that EDEADLK will only ever be injected when we managed to
    acquire the lock. This prevents any behaviour changes for users
    which rely on the EALREADY semantics.
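
    A condensed sketch of the injection helper described above (field and
    config option names follow the description; details are illustrative):

    #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
    static inline int ww_mutex_deadlock_injection(struct ww_mutex *lock,
                                                  struct ww_acquire_ctx *ctx)
    {
            unsigned int tmp;

            if (ctx->deadlock_inject_countdown-- == 0) {
                    /* exponential backoff of the interval, capped at UINT_MAX */
                    tmp = ctx->deadlock_inject_interval;
                    if (tmp > UINT_MAX / 4)
                            tmp = UINT_MAX;
                    else
                            tmp = tmp * 2 + tmp + tmp / 2;

                    ctx->deadlock_inject_interval  = tmp;
                    ctx->deadlock_inject_countdown = tmp;
                    ctx->contending_lock = lock;

                    /* only inject after the lock was actually acquired */
                    ww_mutex_unlock(lock);
                    return -EDEADLK;
            }
            return 0;
    }
    #endif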

    Signed-off-by: Daniel Vetter
    Signed-off-by: Maarten Lankhorst
    Acked-by: Peter Zijlstra
    Cc: dri-devel@lists.freedesktop.org
    Cc: linaro-mm-sig@lists.linaro.org
    Cc: rostedt@goodmis.org
    Cc: daniel@ffwll.ch
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser
    Signed-off-by: Ingo Molnar

    Daniel Vetter
     
  • Wound/wait mutexes are used when multiple lock
    acquisitions of a similar type can be done in an arbitrary
    order. The deadlock handling used here is called wait/wound in
    the RDBMS literature: the older task waits until it can acquire
    the contended lock, while the younger task needs to back off and drop
    all the locks it is currently holding, i.e. the younger task is
    wounded.

    For full documentation please read Documentation/ww-mutex-design.txt.
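
    A minimal usage sketch of the wait/wound pattern for two hypothetical
    objects (my_obj, my_ww_class and lock_both are placeholders):

    struct my_obj {
            struct ww_mutex lock;
            /* ... */
    };

    /* DEFINE_WW_CLASS(my_ww_class) elsewhere */

    static int lock_both(struct my_obj *obj_a, struct my_obj *obj_b)
    {
            struct ww_acquire_ctx ctx;
            struct ww_mutex *contended;
            int ret;

            ww_acquire_init(&ctx, &my_ww_class);
    retry:
            ret = ww_mutex_lock(&obj_a->lock, &ctx);
            if (ret == -EDEADLK) {
                    contended = &obj_a->lock;
                    goto backoff;
            }
            ret = ww_mutex_lock(&obj_b->lock, &ctx);
            if (ret == -EDEADLK) {
                    ww_mutex_unlock(&obj_a->lock);
                    contended = &obj_b->lock;
                    goto backoff;
            }

            ww_acquire_done(&ctx);
            /* ... both objects are locked, do the work ... */
            ww_mutex_unlock(&obj_a->lock);
            ww_mutex_unlock(&obj_b->lock);
            ww_acquire_fini(&ctx);
            return 0;

    backoff:
            /* we are the younger task: wait for the contended lock, retry */
            ww_mutex_lock_slow(contended, &ctx);
            ww_mutex_unlock(contended);
            goto retry;
    }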

    References: https://lwn.net/Articles/548909/
    Signed-off-by: Maarten Lankhorst
    Acked-by: Daniel Vetter
    Acked-by: Rob Clark
    Acked-by: Peter Zijlstra
    Cc: dri-devel@lists.freedesktop.org
    Cc: linaro-mm-sig@lists.linaro.org
    Cc: rostedt@goodmis.org
    Cc: daniel@ffwll.ch
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.com
    Signed-off-by: Ingo Molnar

    Maarten Lankhorst
     

19 Apr, 2013

1 commit

  • The current mutex spinning code (with the MUTEX_SPIN_ON_OWNER option
    turned on) allows multiple tasks to spin on a single mutex
    concurrently. A potential problem with the current approach is
    that when the mutex becomes available, all the spinning tasks
    will try to acquire the mutex more or less simultaneously. As a
    result, there will be a lot of cacheline bouncing especially on
    systems with a large number of CPUs.

    This patch tries to reduce this kind of contention by putting
    the mutex spinners into a queue so that only the first one in
    the queue will try to acquire the mutex. This will reduce
    contention and allow all the tasks to move forward faster.

    The queuing of mutex spinners is done using an MCS-lock-based
    implementation, which reduces contention on the mutex cacheline
    more than a comparable ticket-spinlock-based implementation would.
    This patch adds a new field to the mutex data structure
    for holding the MCS lock. This expands the mutex size by 8 bytes
    on 64-bit systems and 4 bytes on 32-bit systems. This overhead
    is avoided if the MUTEX_SPIN_ON_OWNER option is turned off.
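
    Conceptually, the change amounts to something like the following
    (field and type names are illustrative):

    struct mspin_node {
            struct mspin_node *next;
            int                locked;      /* 1 if lock acquired */
    };

    struct mutex {
            atomic_t              count;
            spinlock_t            wait_lock;
            struct list_head      wait_list;
    #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
            void                 *spin_mlock;   /* tail of the MCS spinner queue */
    #endif
            /* ... debug/lockdep fields ... */
    };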

    The following table shows the jobs per minute (JPM) scalability
    data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
    numactl command is used to restrict the running of the fserver
    workloads to 1/2/4/8 nodes with hyperthreading off.

    +-----------------+-----------+-----------+-------------+----------+
    | Configuration   | Mean JPM  | Mean JPM  | Mean JPM    | % Change |
    |                 | w/o patch | patch 1   | patches 1&2 | 1->1&2   |
    +-----------------+------------------------------------------------+
    |                 |             User Range 1100 - 2000             |
    +-----------------+------------------------------------------------+
    | 8 nodes, HT off |  227972   |  227237   |   305043    |  +34.2%  |
    | 4 nodes, HT off |  393503   |  381558   |   394650    |   +3.4%  |
    | 2 nodes, HT off |  334957   |  325240   |   338853    |   +4.2%  |
    | 1 node , HT off |  198141   |  197972   |   198075    |   +0.1%  |
    +-----------------+------------------------------------------------+
    |                 |              User Range 200 - 1000             |
    +-----------------+------------------------------------------------+
    | 8 nodes, HT off |  282325   |  312870   |   332185    |   +6.2%  |
    | 4 nodes, HT off |  390698   |  378279   |   393419    |   +4.0%  |
    | 2 nodes, HT off |  336986   |  326543   |   340260    |   +4.2%  |
    | 1 node , HT off |  197588   |  197622   |   197582    |    0.0%  |
    +-----------------+-----------+-----------+-------------+----------+

    In the low user range of 10-100, the JPM differences were within +/-1%,
    so they are not that interesting.

    The fserver workload uses mutex spinning extensively. With just
    the mutex change in the first patch, there is no noticeable
    improvement in performance; rather, there is a slight drop.
    This mutex spinning patch more than recovers the
    lost performance and shows a significant increase of +30% at high
    user load with the full 8 nodes. Similar improvements were also
    seen in a 3.8 kernel.

    The table below shows the %time spent by different kernel
    functions as reported by perf when running the fserver workload
    at 1500 users with all 8 nodes.

    +-----------------------+-----------+---------+-------------+
    | Function              | % time    | % time  | % time      |
    |                       | w/o patch | patch 1 | patches 1&2 |
    +-----------------------+-----------+---------+-------------+
    | __read_lock_failed    |  34.96%   | 34.91%  |   29.14%    |
    | __write_lock_failed   |  10.14%   | 10.68%  |    7.51%    |
    | mutex_spin_on_owner   |   3.62%   |  3.42%  |    2.33%    |
    | mspin_lock            |    N/A    |   N/A   |    9.90%    |
    | __mutex_lock_slowpath |   1.46%   |  0.81%  |    0.14%    |
    | _raw_spin_lock        |   2.25%   |  2.50%  |    1.10%    |
    +-----------------------+-----------+---------+-------------+

    The fserver workload for an 8-node system is dominated by the
    contention on the read/write lock. Mutex contention also plays a
    role. With the first patch only, mutex contention is down (as
    shown by the __mutex_lock_slowpath figure), which helps a little;
    we saw only a few percent improvement with that.

    By applying patch 2 as well, the single mutex_spin_on_owner
    figure is now split out into an additional mspin_lock figure.
    The time increases from 3.42% to 11.23%. It shows a great
    reduction in contention among the spinners, leading to a 30%
    improvement. The time ratio 9.9/2.33=4.3 indicates that there
    are on average 4+ spinners waiting in the spin_lock loop for
    each spinner in the mutex_spin_on_owner loop. Contention in
    other locking functions also goes down by quite a lot.

    The table below shows the performance change of both patches 1 &
    2 over patch 1 alone in other AIM7 workloads (at 8 nodes,
    hyperthreading off).

    +--------------+---------------+----------------+-----------------+
    | Workload     | mean % change | mean % change  | mean % change   |
    |              | 10-100 users  | 200-1000 users | 1100-2000 users |
    +--------------+---------------+----------------+-----------------+
    | alltests     |     0.0%      |     -0.8%      |      +0.6%      |
    | five_sec     |    -0.3%      |     +0.8%      |      +0.8%      |
    | high_systime |    +0.4%      |     +2.4%      |      +2.1%      |
    | new_fserver  |    +0.1%      |    +14.1%      |     +34.2%      |
    | shared       |    -0.5%      |     -0.3%      |      -0.4%      |
    | short        |    -1.7%      |     -9.8%      |      -8.3%      |
    +--------------+---------------+----------------+-----------------+

    The short workload is the only one that shows a decline in
    performance, probably due to the spinner locking and queuing
    overhead.

    Signed-off-by: Waiman Long
    Reviewed-by: Davidlohr Bueso
    Acked-by: Rik van Riel
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Chandramouleeswaran Aswin
    Cc: Norton Scott J
    Cc: Paul E. McKenney
    Cc: David Howells
    Cc: Dave Jones
    Cc: Clark Williams
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1366226594-5506-4-git-send-email-Waiman.Long@hp.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

21 Jul, 2011

1 commit

  • The non-debug variant of mutex_destroy is a no-op, currently
    implemented as a macro which does nothing. This approach fails
    to check the type of the parameter, so an error would only show
    up when debugging gets enabled. Using an inline function instead
    offers type checking, catching bugs earlier.
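
    A sketch of the non-debug definition before and after (simplified):

    /* before: a macro, no type checking at all */
    #define mutex_destroy(mutex) do { } while (0)

    /* after: still a no-op, but the argument type is checked */
    static inline void mutex_destroy(struct mutex *lock) {}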

    Signed-off-by: Jean Delvare
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110716174200.41002352@endymion.delvare
    Signed-off-by: Ingo Molnar

    Jean Delvare
     

25 May, 2011

1 commit

  • In order to convert i_mmap_lock to a mutex we need a mutex equivalent to
    spin_lock_nest_lock(), thus provide the mutex_lock_nest_lock() annotation.

    As with spin_lock_nest_lock(), mutex_lock_nest_lock() allows annotation of
    the locking pattern where an outer lock serializes the acquisition order
    of nested locks. That is, if every time you lock multiple locks A, say A1
    and A2, you first acquire N, then the order of acquiring A1 and A2 is
    irrelevant.
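
    A usage sketch of the annotation with hypothetical structures:

    struct parent {
            struct mutex outer;             /* N: serializes child locking */
    };

    struct child {
            struct mutex lock;              /* A1, A2, ... */
    };

    static void lock_two_children(struct parent *p,
                                  struct child *c1, struct child *c2)
    {
            mutex_lock(&p->outer);
            /* with p->outer held, the order of c1/c2 is irrelevant */
            mutex_lock_nest_lock(&c1->lock, &p->outer);
            mutex_lock_nest_lock(&c2->lock, &p->outer);
            /* ... */
            mutex_unlock(&c2->lock);
            mutex_unlock(&c1->lock);
            mutex_unlock(&p->outer);
    }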

    Signed-off-by: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

14 Apr, 2011

1 commit

  • Since we now have p->on_cpu unconditionally available, use it to
    re-implement mutex_spin_on_owner.
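
    The resulting spin loop is roughly the following (simplified sketch):

    static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
    {
            if (lock->owner != owner)
                    return false;

            /* ensure we read ->owner before ->on_cpu */
            barrier();

            return owner->on_cpu;
    }

    static noinline
    int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
    {
            rcu_read_lock();
            while (owner_running(lock, owner)) {
                    if (need_resched())
                            break;
                    arch_mutex_cpu_relax();
            }
            rcu_read_unlock();

            /* spinning paid off only if the owner actually released the lock */
            return lock->owner == NULL;
    }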

    Requested-by: Thomas Gleixner
    Reviewed-by: Frank Rowand
    Cc: Mike Galbraith
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110405152728.826338173@chello.nl

    Peter Zijlstra
     

26 Nov, 2010

1 commit

  • The spinning mutex implementation uses cpu_relax() in busy loops as a
    compiler barrier. Depending on the architecture, cpu_relax() may do more
    than needed in these specific mutex spin loops. On System z we also give
    up the time slice of the virtual CPU in cpu_relax(), which prevents
    effective spinning on the mutex.

    This patch replaces cpu_relax() in the spinning mutex code with
    arch_mutex_cpu_relax(), which can be defined by each architecture that
    selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so
    this patch should not affect other architectures than System z for now.

    Signed-off-by: Gerald Schaefer
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Gerald Schaefer
     

03 Sep, 2010

1 commit


30 Apr, 2009

1 commit

  • include/linux/mutex.h:136: warning: 'mutex_lock' declared inline after being called
    include/linux/mutex.h:136: warning: previous declaration of 'mutex_lock' was here

    uninline it.

    [ Impact: clean up and uninline, address compiler warning ]

    Signed-off-by: Andrew Morton
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Eric Paris
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Andrew Morton
     

29 Apr, 2009

1 commit

  • Much like the atomic_dec_and_lock() function, in which we take and hold a
    spin_lock if we drop the atomic to 0, this function takes and holds the
    mutex if we decrement the atomic to 0.
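
    A sketch of the helper along the lines described above (the fast path
    avoids taking the mutex when the count cannot hit zero):

    int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock)
    {
            /* cheap path: decrement unless we could hit zero */
            if (atomic_add_unless(cnt, -1, 1))
                    return 0;

            /* we might hit zero: take the mutex, then decrement */
            mutex_lock(lock);
            if (!atomic_dec_and_test(cnt)) {
                    mutex_unlock(lock);
                    return 0;
            }

            /* count hit zero; return 1 with the mutex held */
            return 1;
    }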

    Signed-off-by: Eric Paris
    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Paris
     

15 Jan, 2009

1 commit

  • Change mutex contention behaviour such that it will sometimes busy wait on
    acquisition - moving its behaviour closer to that of spinlocks.

    This concept got ported to mainline from the -rt tree, where it was originally
    implemented for rtmutexes by Steven Rostedt, based on work by Gregory Haskins.

    Testing with Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50)
    gave a 345% boost for VFS scalability on my testbox:

    # ./test-mutex-shm V 16 10 | grep "^avg ops"
    avg ops/sec: 296604

    # ./test-mutex-shm V 16 10 | grep "^avg ops"
    avg ops/sec: 85870

    The key criteria for the busy wait is that the lock owner has to be running on
    a (different) cpu. The idea is that as long as the owner is running, there is a
    fair chance it'll release the lock soon, and thus we'll be better off spinning
    instead of blocking/scheduling.

    Since regular mutexes (as opposed to rtmutexes) do not atomically track the
    owner, we add the owner in a non-atomic fashion and deal with the races in
    the slowpath.

    Furthermore, to ease testing of the performance impact of this new code,
    there is a means to disable this behaviour at runtime (without having to
    reboot the system), when scheduler debugging is enabled
    (CONFIG_SCHED_DEBUG=y), by issuing the following command:

    # echo NO_OWNER_SPIN > /debug/sched_features

    This command re-enables spinning again (this is also the default):

    # echo OWNER_SPIN > /debug/sched_features

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

31 Oct, 2008

1 commit


09 Feb, 2008

1 commit


07 Dec, 2007

1 commit


17 Oct, 2007

1 commit


12 Oct, 2007

1 commit

  • The fancy mutex_lock fastpath has too many indirections to track the caller,
    hence all contentions are perceived to come from mutex_lock().

    Avoid this by explicitly not using the fastpath code (it was disabled already
    anyway).

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 May, 2007

1 commit


27 Jan, 2007

1 commit


09 Dec, 2006

1 commit

  • md_open takes ->reconfig_mutex, which causes lockdep to complain. This
    (normally) doesn't have deadlock potential, as the possible conflict is with
    a reconfig_mutex in a different device.

    I say "normally" because if a loop were created in the array->member
    hierarchy, a deadlock could happen. However, that causes bigger problems
    than a deadlock and should be fixed independently.

    So we flag the lock in md_open as a nested lock. This requires defining
    mutex_lock_interruptible_nested.
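
    A usage sketch with a hypothetical device structure; subclass 1 tells
    lockdep that this acquisition is distinct from the default subclass-0
    acquisitions of the same lock class:

    struct my_dev {
            struct mutex reconfig_mutex;
    };

    static int my_dev_open(struct my_dev *dev)
    {
            int err;

            err = mutex_lock_interruptible_nested(&dev->reconfig_mutex, 1);
            if (err)
                    return err;
            /* ... */
            mutex_unlock(&dev->reconfig_mutex);
            return 0;
    }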

    Cc: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

08 Dec, 2006

1 commit


04 Jul, 2006

2 commits

  • Use the lock validator framework to prove mutex locking correctness.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Generic lock debugging:

    - generalized lock debugging framework. For example, a bug in one lock
    subsystem turns off debugging in all lock subsystems.

    - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
    the mutex/rtmutex debugging code: it caused way too much prototype
    hackery, and lockdep will give the same information anyway.

    - ability to do silent tests

    - check lock freeing in vfree too.

    - more finegrained debugging options, to allow distributions to
    turn off more expensive debugging features.

    There's no separate 'held mutexes' list anymore - but there's a 'held locks'
    stack within lockdep, which unifies deadlock detection across all lock
    classes. (this is independent of the lockdep validation stuff - lockdep first
    checks whether we are holding a lock already)

    Here are the current debugging options:

    CONFIG_DEBUG_MUTEXES=y
    CONFIG_DEBUG_LOCK_ALLOC=y

    which do:

    config DEBUG_MUTEXES
    bool "Mutex debugging, basic checks"

    config DEBUG_LOCK_ALLOC
    bool "Detect incorrect freeing of live mutexes"

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

12 Jan, 2006

1 commit

  • Let's switch mutex_debug_check_no_locks_freed() to take (addr, len) as
    arguments instead, since all its callers were just calculating the 'to'
    address for themselves anyway... (and sometimes doing so badly).
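
    For a caller, the change looks roughly like this (illustrative):

    /* before: the caller computed the end address itself */
    mutex_debug_check_no_locks_freed(objp, objp + size);

    /* after: just pass the start address and the length */
    mutex_debug_check_no_locks_freed(objp, size);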

    Signed-off-by: David Woodhouse
    Acked-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    David Woodhouse
     

11 Jan, 2006

1 commit


10 Jan, 2006

1 commit