08 Jun, 2016

1 commit

  • Currently, it is not possible to determine for sure whether a reader
    owns a rwsem by looking at the content of the rwsem data structure.
    This patch adds a new state, RWSEM_READER_OWNED, to the owner field
    to indicate that readers currently own the lock. This enables us to
    address the following two issues in the rwsem optimistic spinning code:

    1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
    the owner field is NULL, which can mean either that readers own
    the lock or that the owning writer hasn't set the owner field yet.
    In the latter case, we miss the chance to do optimistic spinning.

    2) While a writer is waiting in the OSQ and a reader takes the lock,
    the writer will continue to spin in the main rwsem_optimistic_spin()
    loop after leaving the OSQ, as the owner field is NULL, wasting CPU
    cycles if some of the readers are sleeping.

    Adding the new state allows optimistic spinning to go forward as
    long as the owner field is not RWSEM_READER_OWNED and the owner,
    if set, is running, but to stop immediately once the reader-owned
    state is observed.
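
    A minimal sketch of the convention this introduces (simplified for
    illustration; the sentinel and helper names follow the description
    above, but the exact definitions in kernel/locking/rwsem.h differ
    in detail):

      /* A non-NULL sentinel that can never be a valid task pointer. */
      #define RWSEM_READER_OWNED  ((struct task_struct *)1UL)

      /* Called on every reader acquisition. */
      static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
      {
              /*
               * Avoid dirtying the cache line when another reader has
               * already marked the lock as reader-owned.
               */
              if (READ_ONCE(sem->owner) != RWSEM_READER_OWNED)
                      WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
      }

      static inline bool rwsem_owner_is_reader(struct task_struct *owner)
      {
              return owner == RWSEM_READER_OWNED;
      }

    With this in place, a NULL owner can only mean "a writer holds the
    lock but has not published itself yet", so the spinner may keep
    going; observing RWSEM_READER_OWNED means spinning is pointless and
    stops immediately.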

    On a 4-socket Haswell machine running a 4.6-rc1 based kernel, fio
    multithreaded randrw and randwrite tests were run against the same
    file on an XFS partition on top of an NVDIMM. The aggregated
    bandwidths before and after the patch were as follows:

    Test        BW before patch   BW after patch   % change
    ----        ---------------   --------------   --------
    randrw      988 MB/s          1192 MB/s        +21%
    randwrite   1513 MB/s         1623 MB/s        +7.3%

    The perf profile of the rwsem_down_write_failed() function in the
    randrw test, before and after the patch (children% / self%):

    before:  19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
    after:   14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed

    The actual CPU cycles spent in rwsem_down_write_failed() itself
    dropped from 5.88% to 1.52% after the patch.

    The xfstests suite was also run, and no regression was observed.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Jason Low
    Acked-by: Davidlohr Bueso
    Cc: Andrew Morton
    Cc: Dave Chinner
    Cc: Douglas Hatch
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Hurley
    Cc: Peter Zijlstra
    Cc: Scott J Norton
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1463534783-38814-2-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Ingo Molnar

    Waiman Long

22 Apr, 2016

1 commit

  • Now that all the architectures implement the necessary glue code,
    we can introduce down_write_killable(). The only difference from the
    regular down_write() is that the slow path waits in TASK_KILLABLE
    state and interruption by a fatal signal is reported as -EINTR to
    the caller.
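
    A usage sketch under assumed names (the struct and function below
    are hypothetical, made up to illustrate the contract):

      #include <linux/rwsem.h>

      struct frob {                       /* hypothetical object */
              struct rw_semaphore lock;
              /* ... */
      };

      static int frob_update(struct frob *f)
      {
              /*
               * Unlike down_write(), this can fail: a fatal signal
               * delivered while sleeping aborts the wait.
               */
              if (down_write_killable(&f->lock))
                      return -EINTR;

              /* ... exclusive critical section ... */

              up_write(&f->lock);
              return 0;
      }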

    Signed-off-by: Michal Hocko
    Cc: Andrew Morton
    Cc: Chris Zankel
    Cc: David S. Miller
    Cc: Linus Torvalds
    Cc: Max Filippov
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Davidlohr Bueso
    Cc: Jason Low
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: linux-xtensa@linux-xtensa.org
    Cc: sparclinux@vger.kernel.org
    Link: http://lkml.kernel.org/r/1460041951-22347-12-git-send-email-mhocko@kernel.org
    Signed-off-by: Ingo Molnar

    Michal Hocko

18 Feb, 2015

1 commit

  • In order to optimize the spinning step, we need to set the lock
    owner as soon as the lock is acquired, i.e., right after a successful
    counter cmpxchg operation. This is particularly useful as rwsems
    need to set the owner to NULL for readers, so there is a greater
    chance of bailing out of the spinning. Currently we only set the
    owner much later in the game, at the more generic level; latency
    can be especially bad when waiting for a node->next pointer while
    releasing the osq in up_write() calls.

    As such, update the owner inside rwsem_try_write_lock() (when the
    lock is obtained after blocking) and rwsem_try_write_lock_unqueued()
    (when the lock is obtained while spinning), as sketched below. This
    requires creating a new internal rwsem.h header to share the
    owner-related calls.

    Also clean up some headers for mutex and rwsem.
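
    A condensed sketch of the spinning-side change (simplified; the
    bias constants are those of the xadd variant of that era, and the
    helper lives in the new internal rwsem.h):

      /* New internal helper, shared via the internal rwsem.h header. */
      static inline void rwsem_set_owner(struct rw_semaphore *sem)
      {
              sem->owner = current;
      }

      /* Try to acquire the write lock while optimistically spinning. */
      static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
      {
              long old, count = READ_ONCE(sem->count);

              for (;;) {
                      /* Acquirable only with no active readers/writers. */
                      if (!(count == 0 || count == RWSEM_WAITING_BIAS))
                              return false;

                      old = cmpxchg(&sem->count, count,
                                    count + RWSEM_ACTIVE_WRITE_BIAS);
                      if (old == count) {
                              rwsem_set_owner(sem);  /* set owner ASAP */
                              return true;
                      }
                      count = old;
              }
      }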

    Suggested-by: Peter Zijlstra
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Michel Lespinasse
    Cc: Paul E. McKenney
    Cc: Tim Chen
    Link: http://lkml.kernel.org/r/1422609267-15102-4-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso

16 Jul, 2014

1 commit

  • Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER), encapsulate
    the dependencies for rwsem optimistic spinning in a single config
    symbol. There are no logical changes here, as it continues to depend
    on both SMP and the XADD algorithm variant.
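
    Roughly, the effect in the code is the following (a sketch, not the
    verbatim diff; the symbol mirrors CONFIG_MUTEX_SPIN_ON_OWNER and is
    defined once in kernel/Kconfig.locks):

      /* Before: dependencies spelled out at every site. */
      #if defined(CONFIG_SMP) && defined(CONFIG_RWSEM_XCHGADD_ALGORITHM)
      /* ... optimistic spinning code ... */
      #endif

      /*
       * After: one symbol carries SMP && RWSEM_XCHGADD_ALGORITHM (and,
       * per the note below, ARCH_SUPPORTS_ATOMIC_RMW as well).
       */
      #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
      static bool rwsem_optimistic_spin(struct rw_semaphore *sem);
      #else
      static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem)
      {
              return false;   /* no spinning: sleep right away */
      }
      #endif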

    Signed-off-by: Davidlohr Bueso
    Acked-by: Jason Low
    [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com
    Cc: aswin@hp.com
    Cc: Chris Mason
    Cc: Davidlohr Bueso
    Cc: Josef Bacik
    Cc: Linus Torvalds
    Cc: Waiman Long
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso

05 Jun, 2014

1 commit

  • We have reached the point where our mutexes are quite finely tuned
    for a number of situations. This includes the use of heuristics
    and optimistic spinning based on MCS locking techniques.

    Conceptually, exclusive ownership of a read-write semaphore is
    just about the same as that of a mutex, making the two close
    cousins. To this end we need to make them both perform similarly,
    and right now rwsems are simply not up to it. This was discovered
    both by reverting commit 4fc3f1d6 (mm/rmap, migration: Make
    rmap_walk_anon() and try_to_unmap_anon() more scalable) and,
    similarly, by converting some other mutexes (e.g. i_mmap_mutex) to
    rwsems. This creates a situation where users have to choose between
    a rwsem and a mutex taking into account this important performance
    difference. Specifically, the biggest difference between the two
    locks is that when we fail to acquire a mutex in the fastpath,
    optimistic spinning comes into play and we can avoid a large amount
    of unnecessary sleeping and the overhead of moving tasks in and out
    of the wait queue. Rwsems have no such logic.

    This patch, based on work from Tim Chen and myself, adds support
    for write-side optimistic spinning when the lock is contended.
    It also includes support for the recently added cancelable MCS
    locking for adaptive spinning. Note that this is only applicable
    to the xadd method; the spinlock rwsem variant remains intact.
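
    The shape of the write-side spin, condensed to its essentials (a
    sketch based on the description above; the helper names follow the
    kernel of that era, but details are simplified):

      static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
      {
              struct task_struct *owner;
              bool taken = false;

              preempt_disable();

              /* A sleeping owner makes spinning pointless up front. */
              if (!rwsem_can_spin_on_owner(sem))
                      goto done;

              /*
               * Cancelable MCS (OSQ) lock: only one spinner at a time
               * pounds on the lock word, the rest queue up locally.
               */
              if (!osq_lock(&sem->osq))
                      goto done;

              while (true) {
                      /* Spin while the owning writer runs on a CPU. */
                      owner = READ_ONCE(sem->owner);
                      if (owner && !rwsem_spin_on_owner(sem, owner))
                              break;

                      /* The owner let go: race to take the lock. */
                      if (rwsem_try_write_lock_unqueued(sem)) {
                              taken = true;
                              break;
                      }

                      /*
                       * No known owner (e.g. readers hold the lock):
                       * back off if we ought to reschedule instead of
                       * burning cycles.
                       */
                      if (!owner && (need_resched() || rt_task(current)))
                              break;

                      cpu_relax();
              }
              osq_unlock(&sem->osq);
      done:
              preempt_enable();
              return taken;
      }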

    Allowing optimistic spinning before putting the writer on the wait
    queue reduces wait-queue contention and provides a greater chance
    for the rwsem to get acquired. With these changes, rwsem is on par
    with mutex. The performance benefits can be seen on a number of
    workloads. For instance, on an 8-socket, 80-core 64-bit Westmere
    box, aim7 shows the following improvements in throughput:

    +--------------+---------------------+-----------------+
    | Workload     | throughput-increase | number of users |
    +--------------+---------------------+-----------------+
    | alltests     | 20%                 | >1000           |
    | custom       | 27%, 60%            | 10-100, >1000   |
    | high_systime | 36%, 30%            | >100, >1000     |
    | shared       | 58%, 29%            | 10-100, >1000   |
    +--------------+---------------------+-----------------+

    There was also improvement on smaller systems, such as a quad-core
    x86-64 laptop running a 30GB PostgreSQL (pgbench) workload, with up
    to +60% throughput for over 50 clients. Additionally, benefits were
    also noticed in exim (mail server) workloads. Furthermore, no
    performance regressions have been seen at all.

    Based-on-work-from: Tim Chen
    Signed-off-by: Davidlohr Bueso
    [peterz: rej fixup due to comment patches, sched/rt.h header]
    Signed-off-by: Peter Zijlstra
    Cc: Alex Shi
    Cc: Andi Kleen
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Peter Hurley
    Cc: "Paul E. McKenney"
    Cc: Jason Low
    Cc: Aswin Chandramouleeswaran
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: "Scott J Norton"
    Cc: Andrea Arcangeli
    Cc: Chris Mason
    Cc: Josef Bacik
    Link: http://lkml.kernel.org/r/1399055055.6275.15.camel@buesod1.americas.hpqcorp.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso

06 Nov, 2013

1 commit