12 Aug, 2015

1 commit

  • Including an asm/ header directly is best avoided, so use linux/atomic.h
    instead of asm/cmpxchg.h in linux/llist.h.

    Signed-off-by: Will Deacon
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman.Long@hp.com
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1438880084-18856-8-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar

    Will Deacon
     

15 Nov, 2013

1 commit


24 Jul, 2013

1 commit

  • In preparation for lockless flip buffers, make the flip buffer
    free list lockless.

    NB: using llist is not the optimal solution, as the driver and
    buffer work may contend over the llist head unnecessarily. However,
    test measurements indicate this contention is low.

    Signed-off-by: Peter Hurley
    Signed-off-by: Greg Kroah-Hartman

    Peter Hurley
     

13 Jul, 2013

2 commits

  • llist_add(new, head) can simply use llist_add_batch(new, new, head),
    no need to duplicate the code.

    This obviously uninlines llist_add(), and to me this is a win. But we
    can make llist_add_batch() inline if this is desirable; in that case
    gcc can notice that new_first == new_last if the caller is llist_add().
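
    As a rough illustration of the idea, here is a userspace sketch that
    uses C11 atomics in place of the kernel's cmpxchg(). The struct layout
    and the bool return value are assumptions made for this example; it is
    not the kernel source.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed minimal layout for this sketch. */
struct llist_node { struct llist_node *next; };
struct llist_head { _Atomic(struct llist_node *) first; };

/* Push the already linked chain new_first..new_last onto the list.
 * Returns true if the list was empty before the push. */
static bool llist_add_batch(struct llist_node *new_first,
                            struct llist_node *new_last,
                            struct llist_head *head)
{
    struct llist_node *first = atomic_load(&head->first);

    do {
        /* On failure the compare-exchange refreshes 'first'. */
        new_last->next = first;
    } while (!atomic_compare_exchange_weak(&head->first, &first, new_first));

    return first == NULL;
}

/* A single node is just a batch whose first and last entries coincide. */
static bool llist_add(struct llist_node *new, struct llist_head *head)
{
    return llist_add_batch(new, new, head);
}
```

    With llist_add_batch() visible to the compiler, a call through
    llist_add() passes the same pointer twice, which is the
    new_first == new_last case mentioned above.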

    Signed-off-by: Oleg Nesterov
    Cc: Al Viro
    Cc: Andrey Vagin
    Cc: "Eric W. Biederman"
    Cc: David Howells
    Cc: Huang Ying
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Oleg Nesterov
     
  • 1. This is mostly theoretical, but llist_add*() need ACCESS_ONCE().

    Otherwise it is not guaranteed that the first cmpxchg() uses the
    same value for old_entry and new_last->next.

    2. These helpers cache the result of cmpxchg() and read the initial
    value of head->first before the main loop. I do not think this
    makes sense. In the likely case cmpxchg() succeeds on the first try;
    otherwise it doesn't hurt to reload head->first.

    I think it would be better to simplify the code and simply read
    ->first before cmpxchg().
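
    The simplified loop could look roughly like the following userspace
    sketch. It assumes C11 atomics as a stand-in for ACCESS_ONCE() and
    cmpxchg(), and an assumed minimal struct layout; it is an illustration,
    not the patch itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed minimal layout for this sketch. */
struct llist_node { struct llist_node *next; };
struct llist_head { _Atomic(struct llist_node *) first; };

/* Returns true if the list was empty before the push. */
static bool llist_add_batch(struct llist_node *new_first,
                            struct llist_node *new_last,
                            struct llist_head *head)
{
    for (;;) {
        /* One explicit load per iteration (the ACCESS_ONCE() analogue):
         * the same value feeds both new_last->next and the expected
         * value of the compare-exchange. */
        struct llist_node *first =
            atomic_load_explicit(&head->first, memory_order_relaxed);
        struct llist_node *expected = first;

        new_last->next = first;
        if (atomic_compare_exchange_strong(&head->first, &expected, new_first))
            return first == NULL;
        /* Lost the race: simply reload head->first and retry. */
    }
}
```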

    Signed-off-by: Oleg Nesterov
    Cc: Al Viro
    Cc: Andrey Vagin
    Cc: "Eric W. Biederman"
    Cc: David Howells
    Cc: Huang Ying
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Oleg Nesterov
     

29 Mar, 2012

1 commit

  • asm/system.h is a cause of circular dependency problems because it contains
    commonly used primitive stuff like barrier definitions and uncommonly used
    stuff like switch_to() that might require MMU definitions.

    asm/system.h has been disintegrated by this point on all arches into the
    following common segments:

    (1) asm/barrier.h

    Moved memory barrier definitions here.

    (2) asm/cmpxchg.h

    Moved xchg() and cmpxchg() here. #included in asm/atomic.h.

    (3) asm/bug.h

    Moved die() and similar here.

    (4) asm/exec.h

    Moved arch_align_stack() here.

    (5) asm/elf.h

    Moved AT_VECTOR_SIZE_ARCH here.

    (6) asm/switch_to.h

    Moved switch_to() here.

    Signed-off-by: David Howells

    David Howells
     

01 Nov, 2011

1 commit


11 Oct, 2011

1 commit

  • Commit 1230db8e1543 ("llist: Make some llist functions inline")
    has deleted the definitions, causing problems for (not upstream yet)
    code that tries to make use of them.

    Signed-off-by: Stephen Rothwell
    Acked-by: Peter Zijlstra
    Cc: Huang Ying
    Cc: David Miller
    Link: http://lkml.kernel.org/r/20111005172528.0d0a8afc65acef7ace22a24e@canb.auug.org.au
    Signed-off-by: Ingo Molnar

    Stephen Rothwell
     

04 Oct, 2011

6 commits

  • Initial benchmarks show that the cpu_relax() calls in the llist cmpxchg
    loops are a net loss:

    $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
    $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
    $ ./sembench -t 2048 -w 1900 -o 0

    Pre:

    run time 30 seconds 778936 worker burns per second
    run time 30 seconds 912190 worker burns per second
    run time 30 seconds 817506 worker burns per second
    run time 30 seconds 830870 worker burns per second
    run time 30 seconds 845056 worker burns per second

    Post:

    run time 30 seconds 905920 worker burns per second
    run time 30 seconds 849046 worker burns per second
    run time 30 seconds 886286 worker burns per second
    run time 30 seconds 822320 worker burns per second
    run time 30 seconds 900283 worker burns per second

    So about 4% faster. (!)
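
    The figure can be checked by averaging the five runs on each side.
    The helper names below are made up for this sketch; the numbers are
    the ones listed above, and the means come out to about 836912 (Pre)
    and 872771 (Post), a gain of roughly 4.3%.

```c
#include <stddef.h>

/* Worker burns per second, copied from the runs listed above. */
static const double pre_runs[5]  = { 778936, 912190, 817506, 830870, 845056 };
static const double post_runs[5] = { 905920, 849046, 886286, 822320, 900283 };

/* Arithmetic mean of n samples. */
static double mean_rate(const double *runs, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return sum / (double)n;
}

/* Relative improvement of 'post' over 'pre', in percent. */
static double percent_gain(double pre, double post)
{
    return (post / pre - 1.0) * 100.0;
}
```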

    cpu_relax() stalls the pipeline, therefore, when used in a tight loop
    it has the following benefits:

    - allows SMT siblings to have a go;
    - reduces pressure on the CPU interconnect.

    However, cmpxchg loops are unfair and thus have unbounded completion
    time, therefore we should avoid getting in such heavily contended
    situations where the above benefits make any difference.

    A typical cmpxchg loop should not go round more than a handful of
    times at worst, therefore adding extra delays just slows things down.

    Since the llist primitives are new, there aren't any bad users yet,
    and we should avoid growing them. Heavily contended sites should
    generally be better off using the ticket locks for serialization since
    they provide bounded completion times (fifo-fair over the cpus).
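
    To illustrate why a ticket lock is FIFO-fair, here is a minimal
    userspace sketch with C11 atomics. The type and function names are
    invented for this example and it is not the kernel's implementation:
    each waiter takes a ticket and is served strictly in ticket order.

```c
#include <stdatomic.h>

/* Hypothetical ticket lock for illustration only. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Take a ticket, then wait until it is served. Returns the ticket. */
static unsigned int ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int me = atomic_fetch_add(&l->next, 1);

    while (atomic_load(&l->owner) != me)
        ;   /* spin: waiters are admitted in ticket order */
    return me;
}

/* Hand the lock to the holder of the next ticket. */
static void ticket_lock_release(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
}
```

    Unlike a bare cmpxchg retry loop, a late arrival here cannot overtake
    an earlier waiter, which is where the bounded completion time comes
    from.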

    Signed-off-by: Peter Zijlstra
    Cc: Huang Ying
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1315836358.26517.43.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • So we don't have to expose the struct llist_node member.

    Cc: Huang Ying
    Cc: Andrew Morton
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1315836348.26517.41.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Extend the llist_add*() functions to return a success indicator. This
    allows us in the scheduler code to send an IPI if the queue was empty.

    ( There's no effect on existing users, because the llist_add_xxx() functions
    are inline, thus this will be optimized out by the compiler if not used
    by callers. )
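
    A caller might use the indicator along these lines. This is a
    userspace sketch with C11 atomics; struct work_queue, queue_item()
    and the kick counter are hypothetical stand-ins for the scheduler
    code and the IPI.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed minimal layout for this sketch. */
struct llist_node { struct llist_node *next; };
struct llist_head { _Atomic(struct llist_node *) first; };

/* Returns true if the list was empty before 'new' was added. */
static inline bool llist_add(struct llist_node *new, struct llist_head *head)
{
    struct llist_node *first = atomic_load(&head->first);

    do {
        new->next = first;
    } while (!atomic_compare_exchange_weak(&head->first, &first, new));

    return first == NULL;
}

/* Hypothetical consumer-side queue; 'kicks' counts simulated IPIs. */
struct work_queue {
    struct llist_head list;
    int kicks;
};

static void queue_item(struct work_queue *q, struct llist_node *item)
{
    /* Only the producer that makes the queue non-empty needs to wake
     * the consumer; later producers find it already pending. */
    if (llist_add(item, &q->list))
        q->kicks++;     /* stand-in for sending an IPI */
}
```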

    Signed-off-by: Huang Ying
    Cc: Mathieu Desnoyers
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1315461646-1379-5-git-send-email-ying.huang@intel.com
    Signed-off-by: Ingo Molnar

    Huang Ying
     
  • If the first cmpxchg() call in llist_add() and similar functions succeeds,
    there is no need to call cpu_relax() before it. So cpu_relax() in a busy
    loop involving cmpxchg() should go after the cmpxchg(), not before it.

    This patch fixes this for all involved llist functions.

    Signed-off-by: Huang Ying
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1315461646-1379-4-git-send-email-ying.huang@intel.com
    Signed-off-by: Ingo Molnar

    Huang Ying
     
  • Remove the in_nmi() checks spread around the code. in_nmi() is not available
    on every architecture and it's a pretty obscure and ugly check in any case.

    Cc: Huang Ying
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1315461646-1379-3-git-send-email-ying.huang@intel.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Because the llist code will be used in a performance-critical scheduler
    code path, make llist_add() and llist_del_all() inline to avoid
    function call overhead and related 'glue' overhead.

    Signed-off-by: Huang Ying
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1315461646-1379-2-git-send-email-ying.huang@intel.com
    Signed-off-by: Ingo Molnar

    Huang Ying
     

03 Aug, 2011

1 commit

  • cmpxchg is used to implement adding a new entry to the list, deleting
    all entries from the list, deleting the first entry of the list, and some
    other operations.

    Because this is a singly linked list, the tail cannot be accessed in O(1).

    If there are multiple producers and multiple consumers, llist_add can
    be used in producers and llist_del_all can be used in consumers. They
    can work simultaneously without a lock. But llist_del_first cannot be
    used here, because llist_del_first depends on list->first->next not
    changing as long as list->first is unchanged during its operation, and
    a llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
    llist_add) sequence in another consumer may violate that.

    If there are multiple producers and one consumer, llist_add can be
    used in producers and llist_del_all or llist_del_first can be used in
    the consumer.

    This can be summarized as follows:

              | add | del_first | del_all
    add       |  -  |     -     |    -
    del_first |     |     L     |    L
    del_all   |     |           |    -

    Where "-" stands for no lock is needed, while "L" stands for lock is
    needed.
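
    The three operations in the table can be sketched in userspace with
    C11 atomics as follows. The struct layout and function bodies are
    assumptions made for illustration, not the kernel code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Assumed minimal layout for this sketch. */
struct llist_node { struct llist_node *next; };
struct llist_head { _Atomic(struct llist_node *) first; };

/* Lock-free against every other operation. */
static void llist_add(struct llist_node *new, struct llist_head *head)
{
    struct llist_node *first = atomic_load(&head->first);

    do {
        new->next = first;
    } while (!atomic_compare_exchange_weak(&head->first, &first, new));
}

/* Detach the whole chain with one exchange; lock-free against add and
 * against other del_all callers. */
static struct llist_node *llist_del_all(struct llist_head *head)
{
    return atomic_exchange(&head->first, NULL);
}

/* Pop the newest entry. It reads entry->next before the compare-exchange,
 * so concurrent del_first or del_all callers must be excluded by a lock
 * (the "L" cells): they could remove and re-add 'entry' in between. */
static struct llist_node *llist_del_first(struct llist_head *head)
{
    struct llist_node *entry = atomic_load(&head->first);

    while (entry) {
        struct llist_node *next = entry->next;

        if (atomic_compare_exchange_weak(&head->first, &entry, next))
            return entry;
        /* 'entry' now holds the current head; retry with it. */
    }
    return NULL;
}
```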

    The list entries deleted via llist_del_all can be traversed with
    traversal helpers such as llist_for_each. But the list entries
    cannot be traversed safely before they are deleted from the list. The
    deleted entries are ordered from the newest to the oldest added one. If you
    want to traverse from the oldest to the newest, you must reverse the
    order yourself before traversing.
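
    Reversing a detached chain is an ordinary singly linked list reversal.
    A minimal sketch, with a helper name chosen for this example; it must
    only be applied to entries already removed from the list, since it
    uses no atomics.

```c
#include <stddef.h>

/* Assumed minimal layout for this sketch. */
struct llist_node { struct llist_node *next; };

/* Reverse a chain that has already been detached (for example by
 * llist_del_all), turning newest-first order into oldest-first order.
 * Returns the new first node. */
static struct llist_node *reverse_order(struct llist_node *head)
{
    struct llist_node *new_head = NULL;

    while (head) {
        struct llist_node *tmp = head;

        head = head->next;
        tmp->next = new_head;
        new_head = tmp;
    }
    return new_head;
}
```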

    The basic atomic operation of this list is cmpxchg on long. On
    architectures that don't have an NMI-safe cmpxchg implementation, the
    list can NOT be used in an NMI handler. So code that uses the list in an
    NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.

    Signed-off-by: Huang Ying
    Reviewed-by: Andi Kleen
    Reviewed-by: Mathieu Desnoyers
    Cc: Andrew Morton
    Signed-off-by: Len Brown

    Huang Ying