22 Nov, 2016

1 commit

  • An over-committed guest with more vCPUs than pCPUs shows heavy overhead
    in osq_lock().

    This is because if vCPU-A holds the osq lock and is scheduled out, vCPU-B
    ends up busy-waiting for its per-CPU node->locked to be set. IOW, vCPU-B
    waits for vCPU-A to run again and release the osq lock.

    Use the new vcpu_is_preempted(cpu) interface to detect whether a vCPU is
    currently running, and break out of the spin-loop if it has been preempted.
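
    In the osq_lock() spin-wait this looks roughly like the sketch below;
    node_cpu() here stands for the small helper that maps the predecessor
    node back to its CPU number, and the exact shape is illustrative:

    while (!READ_ONCE(node->locked)) {
            /*
             * Bail out and unqueue if we need to reschedule, or if the
             * vCPU that owns the node we are spinning on has been
             * preempted and therefore cannot hand us the lock soon.
             */
            if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
                    goto unqueue;

            cpu_relax();
    }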

    test case:

    $ perf record -a perf bench sched messaging -g 400 -p && perf report

    before patch:
    18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
    12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
    5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
    3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
    3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
    3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
    2.49% sched-messaging [kernel.vmlinux] [k] system_call

    after patch:
    20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner
    8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock
    4.12% sched-messaging [kernel.vmlinux] [k] system_call
    3.01% sched-messaging [kernel.vmlinux] [k] system_call_common
    2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7
    2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
    2.00% sched-messaging [kernel.vmlinux] [k] osq_lock

    Suggested-by: Boqun Feng
    Tested-by: Juergen Gross
    Signed-off-by: Pan Xinhui
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Christian Borntraeger
    Acked-by: Paolo Bonzini
    Cc: David.Laight@ACULAB.COM
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: benh@kernel.crashing.org
    Cc: bsingharora@gmail.com
    Cc: dave@stgolabs.net
    Cc: kernellwp@gmail.com
    Cc: konrad.wilk@oracle.com
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: mpe@ellerman.id.au
    Cc: paulmck@linux.vnet.ibm.com
    Cc: paulus@samba.org
    Cc: rkrcmar@redhat.com
    Cc: virtualization@lists.linux-foundation.org
    Cc: will.deacon@arm.com
    Cc: xen-devel-request@lists.xenproject.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1478077718-37424-3-git-send-email-xinhui.pan@linux.vnet.ibm.com
    [ Translated to English. ]
    Signed-off-by: Ingo Molnar

    Pan Xinhui
     

16 Nov, 2016

1 commit

  • With the s390 special case of a yielding cpu_relax() implementation gone,
    we can now remove all users of cpu_relax_lowlatency() and replace them
    with cpu_relax().
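
    The change is mechanical; in a spin-wait loop such as osq_lock()'s it
    is roughly:

    /* Before: the caller had to opt into the "low latency" variant. */
    while (!READ_ONCE(node->locked))
            cpu_relax_lowlatency();

    /*
     * After: with the yielding s390 implementation gone, plain
     * cpu_relax() is cheap enough for tight spin loops everywhere.
     */
    while (!READ_ONCE(node->locked))
            cpu_relax();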

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Nicholas Piggin
    Cc: Noam Camus
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: virtualization@lists.linux-foundation.org
    Cc: xen-devel@lists.xenproject.org
    Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com
    Signed-off-by: Ingo Molnar

    Christian Borntraeger
     

18 Dec, 2015

1 commit

  • The Cavium guys reported a soft lockup on their arm64 machine, caused by
    commit c55a6ffa6285 ("locking/osq: Relax atomic semantics"):

    mutex_optimistic_spin+0x9c/0x1d0
    __mutex_lock_slowpath+0x44/0x158
    mutex_lock+0x54/0x58
    kernfs_iop_permission+0x38/0x70
    __inode_permission+0x88/0xd8
    inode_permission+0x30/0x6c
    link_path_walk+0x68/0x4d4
    path_openat+0xb4/0x2bc
    do_filp_open+0x74/0xd0
    do_sys_open+0x14c/0x228
    SyS_openat+0x3c/0x48
    el0_svc_naked+0x24/0x28

    This is because in osq_lock we initialise the node for the current CPU:

    node->locked = 0;
    node->next = NULL;
    node->cpu = curr;

    and then publish the current CPU in the lock tail:

    old = atomic_xchg_acquire(&lock->tail, curr);

    Once the update to lock->tail is visible to another CPU, the node is
    then live and can be both read and updated by concurrent lockers.

    Unfortunately, the ACQUIRE semantics of the xchg operation mean that
    there is no guarantee the contents of the node will be visible before
    the lock tail is updated. This can lead to lock corruption when, for
    example, a concurrent locker races with us to set the next field.
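
    A sketch of the ordering requirement (illustrative; one way to restore
    it is to make the publishing store fully ordered):

    node->locked = 0;
    node->next = NULL;
    node->cpu = curr;

    /* Broken: ACQUIRE does not order the stores above before the publish. */
    old = atomic_xchg_acquire(&lock->tail, curr);

    /*
     * Needed: the tail update must also carry RELEASE semantics, so that a
     * concurrent locker observing the new tail also observes the node
     * fields initialised above; a fully ordered xchg() provides that.
     */
    old = atomic_xchg(&lock->tail, curr);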

    Fixes: c55a6ffa6285 ("locking/osq: Relax atomic semantics")
    Reported-by: David Daney
    Reported-by: Andrew Pinski
    Tested-by: Andrew Pinski
    Acked-by: Davidlohr Bueso
    Signed-off-by: Will Deacon
    Signed-off-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/1449856001-21177-1-git-send-email-will.deacon@arm.com
    Signed-off-by: Linus Torvalds

    Will Deacon
     

18 Sep, 2015

1 commit

  • Relax the atomic semantics by using acquire/release operations on
    lock->tail. As such, weakly ordered archs can benefit from more relaxed
    use of barriers when issuing atomics.
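
    Concretely, the lock/unlock fast paths end up looking roughly like this
    sketch:

    /* osq_lock(): taking the lock is an ACQUIRE on lock->tail. */
    old = atomic_xchg_acquire(&lock->tail, curr);

    /* osq_unlock() fast path: releasing it only needs RELEASE. */
    if (likely(atomic_cmpxchg_release(&lock->tail, curr,
                                      OSQ_UNLOCKED_VAL) == curr))
            return;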

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/1442216244-4409-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

24 Feb, 2015

1 commit

  • With the new standardized functions, we can replace all
    ACCESS_ONCE() calls across relevant locking - this includes
    lockref and seqlock while at it.

    ACCESS_ONCE() does not work reliably on non-scalar types.
    For example gcc 4.6 and 4.7 might remove the volatile tag
    for such accesses during the SRA (scalar replacement of
    aggregates) step:

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145

    Convert the calls regardless of whether the access is to a scalar type;
    this is cleaner than having three alternatives.
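
    A representative before/after of the replacement (illustrative call
    sites, not an exhaustive list):

    /* Before: */
    next = ACCESS_ONCE(node->next);
    ACCESS_ONCE(prev->next) = node;

    /* After: reads become READ_ONCE(), writes become WRITE_ONCE(). */
    next = READ_ONCE(node->next);
    WRITE_ONCE(prev->next, node);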

    Signed-off-by: Davidlohr Bueso
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Paul E. McKenney
    Link: http://lkml.kernel.org/r/1424662301.6539.18.camel@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

14 Jan, 2015

2 commits

  • Both mutexes and rwsems took a performance hit when we switched
    over from the original mcs code to the cancelable variant (osq).
    The reason is the use of smp_load_acquire() when polling for
    node->locked. This is not needed, as reordering is not an issue;
    therefore, relax the barrier semantics. Paul describes the scenario
    nicely: https://lkml.org/lkml/2013/11/19/405

    - If we start polling before the insertion is complete, all that
    happens is that the first few polls have no chance of seeing a lock
    grant.

    - Ordering the polling against the initialization -- the xchg() on
    lock->tail is already doing that for us.

    The smp_load_acquire() when unqueuing makes sense. In addition,
    we don't need to worry about leaking the critical region as
    osq is only used internally.
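
    In code terms, the polling change looks roughly like this (a sketch;
    at the time of this commit the relaxed load would have been spelled
    ACCESS_ONCE(), i.e. READ_ONCE() in today's terms):

    /* Before: every poll iteration pays for an ACQUIRE barrier. */
    while (!smp_load_acquire(&node->locked))
            cpu_relax_lowlatency();

    /*
     * After: the xchg() on lock->tail already orders the node
     * initialization against the insertion, so a plain once-load
     * is sufficient while spinning.
     */
    while (!ACCESS_ONCE(node->locked))
            cpu_relax_lowlatency();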

    This benefits both regular and large levels of concurrency; e.g. on a
    40-core system with a disk-intensive workload (left column: baseline,
    right column: patched, relative change in parentheses):

    disk-1 804.83 ( 0.00%) 828.16 ( 2.90%)
    disk-61 8063.45 ( 0.00%) 18181.82 (125.48%)
    disk-121 7187.41 ( 0.00%) 20119.17 (179.92%)
    disk-181 6933.32 ( 0.00%) 20509.91 (195.82%)
    disk-241 6850.81 ( 0.00%) 20397.80 (197.74%)
    disk-301 6815.22 ( 0.00%) 20287.58 (197.68%)
    disk-361 7080.40 ( 0.00%) 20205.22 (185.37%)
    disk-421 7076.13 ( 0.00%) 19957.33 (182.04%)
    disk-481 7083.25 ( 0.00%) 19784.06 (179.31%)
    disk-541 7038.39 ( 0.00%) 19610.92 (178.63%)
    disk-601 7072.04 ( 0.00%) 19464.53 (175.23%)
    disk-661 7010.97 ( 0.00%) 19348.23 (175.97%)
    disk-721 7069.44 ( 0.00%) 19255.33 (172.37%)
    disk-781 7007.58 ( 0.00%) 19103.14 (172.61%)
    disk-841 6981.18 ( 0.00%) 18964.22 (171.65%)
    disk-901 6968.47 ( 0.00%) 18826.72 (170.17%)
    disk-961 6964.61 ( 0.00%) 18708.02 (168.62%)

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: "Paul E. McKenney"
    Cc: Thomas Gleixner
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1420573509-24774-7-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • We have two flavors of the MCS spinlock: standard and cancelable (OSQ).
    While each one is independent of the other, we currently mix and match
    them. This patch:

    - Moves the OSQ code out of mcs_spinlock.h (which only deals with the
    traditional version) into include/linux/osq_lock.h. No unnecessary code is
    added to the more global header file; any locks that make use of OSQ must
    include the new header anyway (the new header is sketched below).

    - Renames mcs_spinlock.c to osq_lock.c. This file only contains osq code.

    - Introduces CONFIG_LOCK_SPIN_ON_OWNER so that osq_lock is only built
    when there is support for it.
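
    Roughly, the new standalone header ends up exposing something like the
    following (a sketch, not the exact file contents):

    /* include/linux/osq_lock.h (sketch) */
    struct optimistic_spin_queue {
            /*
             * Stores an encoded value of the CPU # of the tail node in
             * the queue; OSQ_UNLOCKED_VAL when the queue is empty.
             */
            atomic_t tail;
    };

    #define OSQ_UNLOCKED_VAL (0)

    extern bool osq_lock(struct optimistic_spin_queue *lock);
    extern void osq_unlock(struct optimistic_spin_queue *lock);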

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Thomas Gleixner
    Cc: "Paul E. McKenney"
    Cc: Jason Low
    Cc: Linus Torvalds
    Cc: Mikulas Patocka
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/1420573509-24774-5-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso