01 Nov, 2016

2 commits

  • Similar to the list_add() debug consolidation, this commit consolidates
    the debug checking performed during CONFIG_DEBUG_LIST into a new
    __list_del_entry_valid() function, and stops list updates when corruption
    is found.

    Refactored from same hardening in PaX and Grsecurity.

    Signed-off-by: Kees Cook
    Acked-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Acked-by: Rik van Riel

    Kees Cook
     
  • Right now, __list_add() code is repeated either in list.h or in
    list_debug.c, but the only differences between the two versions
    are the debug checks. This commit therefore extracts these debug
    checks into a separate __list_add_valid() function and consolidates
    __list_add(). Additionally this new __list_add_valid() function will stop
    list manipulations if a corruption is detected, instead of allowing for
    further corruption that may lead to even worse conditions.

    This is a slight refactoring of the same hardening done in PaX and Grsecurity.

    Signed-off-by: Kees Cook
    Acked-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney
    Acked-by: Rik van Riel

    Kees Cook
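
    The consolidation described in the two entries above can be sketched in
    userspace C as follows. This is only an illustration under assumptions:
    list_add_valid() and list_add_between() are hypothetical stand-ins for
    __list_add_valid() and __list_add(), and the bool return values are an
    addition made here so the refusal can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Hypothetical stand-in for __list_add_valid(): report and refuse on corruption. */
static bool list_add_valid(struct list_head *entry, struct list_head *prev,
                           struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption: next->prev should be prev\n");
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption: prev->next should be next\n");
        return false;
    }
    if (entry == prev || entry == next) {
        fprintf(stderr, "list_add double add\n");
        return false;
    }
    return true;
}

/* One insertion routine; the validity check is the only debug-specific part. */
static bool list_add_between(struct list_head *entry, struct list_head *prev,
                             struct list_head *next)
{
    if (!list_add_valid(entry, prev, next))
        return false;           /* leave the list untouched */
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return true;
}

static bool list_add(struct list_head *entry, struct list_head *head)
{
    return list_add_between(entry, head, head->next);
}
```

    In a non-debug build the check would presumably compile down to "always
    valid", leaving a single copy of the insertion code.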
     

15 Sep, 2016

1 commit

  • Due to the use of READ_ONCE() in list_empty() the compiler cannot
    optimise !list_empty() ? list_first_entry() : NULL very well. By
    manually expanding list_first_entry_or_null() we can take advantage of
    the READ_ONCE() to avoid the list element changing under the test while
    the compiler can generate smaller code.

    Signed-off-by: Chris Wilson
    Cc: "Paul E. McKenney"
    Cc: Andrew Morton
    Cc: Dan Williams
    Cc: Jan Kara
    Cc: Josef Bacik
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Paul E. McKenney

    Chris Wilson
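
    A minimal userspace sketch of the manual expansion described above. It
    assumes GCC or Clang (statement expressions), and READ_ONCE_PTR() is a
    hypothetical volatile-cast approximation of the kernel's READ_ONCE(),
    not the kernel macro itself.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Assumption: a volatile access stands in for READ_ONCE() here. */
#define READ_ONCE_PTR(p) (*(struct list_head * volatile *)&(p))

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Load head->next once, then test and convert that same snapshot. */
#define list_first_entry_or_null(head, type, member) ({            \
    struct list_head *head__ = (head);                              \
    struct list_head *pos__ = READ_ONCE_PTR(head__->next);          \
    pos__ != head__ ? container_of(pos__, type, member) : NULL;     \
})

/* Example payload type, invented for this sketch. */
struct item {
    int value;
    struct list_head link;
};
```

    Because the emptiness test and the conversion both use pos__, the element
    cannot change between the two steps.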
     

07 Jul, 2016

1 commit

  • Required to figure out whether the entry is the only one in the hlist.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: Eric Dumazet
    Cc: George Spelvin
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094341.867631372@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
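
    The entry above does not name the helper it adds. As a sketch of what
    such a check might look like, the function below (called
    hlist_is_singular_node() here, which should be read as an assumption)
    tests that nothing follows the node and that the head points directly
    at it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* True when n is the only node on h: no successor, and h->first is its predecessor link. */
static bool hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h)
{
    return !n->next && n->pprev == &h->first;
}
```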
     

10 Mar, 2016

1 commit

  • Given we have uninitialized list_heads being passed to list_add() it
    will always be the case that those uninitialized values randomly trigger
    the poison value. Especially since a list_add() operation will seed the
    stack with the poison value for later stack allocations to trip over.

    For example, see these two false positive reports:

    list_add attempted on force-poisoned entry
    WARNING: at lib/list_debug.c:34
    [..]
    NIP [c00000000043c390] __list_add+0xb0/0x150
    LR [c00000000043c38c] __list_add+0xac/0x150
    Call Trace:
    __list_add+0xac/0x150 (unreliable)
    __down+0x4c/0xf8
    down+0x68/0x70
    xfs_buf_lock+0x4c/0x150 [xfs]

    list_add attempted on force-poisoned entry(0000000000000500),
    new->next == d0000000059ecdb0, new->prev == 0000000000000500
    WARNING: at lib/list_debug.c:33
    [..]
    NIP [c00000000042db78] __list_add+0xa8/0x140
    LR [c00000000042db74] __list_add+0xa4/0x140
    Call Trace:
    __list_add+0xa4/0x140 (unreliable)
    rwsem_down_read_failed+0x6c/0x1a0
    down_read+0x58/0x60
    xfs_log_commit_cil+0x7c/0x600 [xfs]

    Fixes: commit 5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup")
    Signed-off-by: Dan Williams
    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

16 Jan, 2016

1 commit

  • get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
    mapped pfn-range (devm_memremap_pages()) while the resulting struct page
    objects are in use. Unlike get_page() it may fail if the device is, or
    is in the process of being, disabled. While the initial lookup of the
    range may be an expensive list walk, the result is cached to speed up
    subsequent lookups which are likely to be in the same mapped range.

    devm_memremap_pages() now requires a reference counter to be specified
    at init time. For pmem this means moving request_queue allocation into
    pmem_alloc() so the existing queue usage counter can track "device
    pages".

    ZONE_DEVICE pages always have an elevated count and will never be on an
    lru reclaim list. That space in 'struct page' can be redirected for
    other uses, but for safety introduce a poison value that will always
    trip __list_add() to assert. This allows half of the struct list_head
    storage to be reclaimed with some assurance to back up the assumption
    that the page count never goes to zero and a list_add() is never
    attempted.

    Signed-off-by: Dan Williams
    Tested-by: Logan Gunthorpe
    Cc: Dave Hansen
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

05 Dec, 2015

1 commit

  • Code that does lockless emptiness testing of non-RCU lists is relying
    on INIT_LIST_HEAD() to write the list head's ->next pointer atomically,
    particularly when INIT_LIST_HEAD() is invoked from list_del_init().
    This commit therefore adds WRITE_ONCE() to this function's pointer stores
    that could affect the head's ->next pointer.

    Reported-by: Andrey Konovalov
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

24 Nov, 2015

2 commits

  • Most of the list-empty-check macros (list_empty(), hlist_empty(),
    hlist_bl_empty(), and hlist_nulls_empty()) use
    an unadorned load to check the list header. Given that these macros
    are sometimes invoked without the protection of a lock, this is
    not sufficient. This commit therefore adds READ_ONCE() calls to
    them. This commit does not touch llist_empty() because it already
    has the needed ACCESS_ONCE().

    Reported-by: Dmitry Vyukov
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Code that does lockless emptiness testing of non-RCU lists is relying
    on the list-addition code to write the list head's ->next pointer
    atomically. This commit therefore adds WRITE_ONCE() to list-addition
    pointer stores that could affect the head's ->next pointer.

    Reported-by: Dmitry Vyukov
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
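
    The pairing described in the two entries above (a marked store on the
    writer side, a marked load in the emptiness check) might look roughly
    like this in userspace. The READ_ONCE()/WRITE_ONCE() definitions here
    are simplified volatile-cast approximations and are an assumption of
    this sketch; they prevent the compiler from tearing or repeating the
    access but provide no ordering.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Assumption: simplified stand-ins for the kernel macros. */
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static void INIT_LIST_HEAD(struct list_head *list)
{
    WRITE_ONCE(list->next, list);   /* the store a lockless reader may observe */
    list->prev = list;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
    struct list_head *next = head->next;

    next->prev = entry;
    entry->next = next;
    entry->prev = head;
    WRITE_ONCE(head->next, entry);  /* publish with a single store */
}

/* May be called without the list lock: exactly one load of head->next. */
static bool list_empty(const struct list_head *head)
{
    return READ_ONCE(head->next) == head;
}
```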
     

07 Oct, 2015

1 commit

  • The various RCU list-deletion macros (list_del_rcu(),
    hlist_del_init_rcu(), hlist_del_rcu(), hlist_bl_del_init_rcu(),
    hlist_bl_del_rcu(), hlist_nulls_del_init_rcu(), and hlist_nulls_del_rcu())
    do plain stores into the ->next pointer of the preceding list element.
    Unfortunately, the compiler is within its rights to (for example) use
    byte-at-a-time writes to update the pointer, which would fatally confuse
    concurrent readers. This patch therefore adds the needed WRITE_ONCE()
    macros.

    KernelThreadSanitizer (KTSAN) reported the __hlist_del() issue, which
    is a problem when __hlist_del() is invoked by hlist_del_rcu().

    Reported-by: Dmitry Vyukov
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

18 Aug, 2015

1 commit

  • Some filesystems don't use the VFS inode hash and fake the fact they
    are hashed so that all the writeback code works correctly. However,
    this means the evict() path still tries to remove the inode from the
    hash, meaning that the inode_hash_lock() needs to be taken
    unnecessarily. Hence under certain workloads the inode_hash_lock can
    be contended even if the inode is never actually hashed.

    To avoid this, add hlist_fake() to test whether the inode is actually
    hashed, so that the hash lock is not taken for inodes that have never
    been hashed. Based on Dave Chinner's

    inode: add IOP_NOTHASHED to avoid inode hash lock in evict

    based on Al's suggestions. Thanks,

    Signed-off-by: Josef Bacik
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Tested-by: Dave Chinner

    Josef Bacik
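
    One way to picture the "fake hashed" test: a node that was never put on
    a hash chain but must look hashed has its pprev pointing at its own next
    field, and that self-reference is what the test looks for. The sketch
    below is a userspace illustration under that assumption, not the
    filesystem code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

static void INIT_HLIST_NODE(struct hlist_node *n)
{
    n->next = NULL;
    n->pprev = NULL;
}

static bool hlist_unhashed(const struct hlist_node *n)
{
    return !n->pprev;
}

/* Make n look hashed without linking it anywhere. */
static void hlist_add_fake(struct hlist_node *n)
{
    n->pprev = &n->next;
}

/* A fake node is recognisable by pprev pointing back into the node itself. */
static bool hlist_fake(struct hlist_node *n)
{
    return n->pprev == &n->next;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}
```

    A caller such as an evict path could then skip the hash lock whenever
    hlist_fake() returns true for the node.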
     

20 Nov, 2014

1 commit


14 Oct, 2014

1 commit


07 Aug, 2014

2 commits

  • All other add functions for lists have the new item as first argument
    and the position where it is added as second argument. This was changed
    for no good reason in this function and makes using it unnecessarily
    confusing.

    The name was changed to hlist_add_behind() to cause unconverted code to
    generate a compile error instead of using the wrong parameter order.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Ken Helias
    Cc: "Paul E. McKenney"
    Acked-by: Jeff Kirsher [intel driver bits]
    Cc: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Helias
     
  • The argument names for hlist_add_after() are poorly chosen because they
    look the same as the ones for hlist_add_before() but have to be used
    differently.

    hlist_add_after_rcu() has made a better choice.

    Signed-off-by: Ken Helias
    Cc: "Paul E. McKenney"
    Cc: Christoph Hellwig
    Cc: Hugh Dickins
    Cc: Jeff Kirsher
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Helias
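
    A small userspace sketch of the renamed helper with the argument order
    the two entries above describe: the new node first, the position second.
    The struct definitions are simplified copies made for this example.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Insert n directly after prev: (new item, position), like the other add helpers. */
static void hlist_add_behind(struct hlist_node *n, struct hlist_node *prev)
{
    n->next = prev->next;
    prev->next = n;
    n->pprev = &prev->next;
    if (n->next)
        n->next->pprev = &n->next;
}
```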
     

13 Nov, 2013

3 commits

  • We already have list_first_entry(), it makes sense to also add
    list_last_entry() for consistency. And we use both helpers in
    list_for_each_*().

    Signed-off-by: Oleg Nesterov
    Cc: Eilon Greenstein
    Cc: Greg Kroah-Hartman
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Now that we have list_{next,prev}_entry() we can change
    list_for_each_entry*() and list_safe_reset_next() to use the new helpers
    to improve the readability.

    Signed-off-by: Oleg Nesterov
    Cc: Eilon Greenstein
    Cc: Greg Kroah-Hartman
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Add two trivial helpers list_next_entry() and list_prev_entry(), they
    can have a lot of users including list.h itself. In fact the 1st one is
    already defined in events/core.c and bnx2x_sp.c, so the patch simply
    moves the definition to list.h.

    Signed-off-by: Oleg Nesterov
    Cc: Eilon Greenstein
    Cc: Greg Kroah-Hartman
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
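
    The three entries above fit together roughly as in this userspace
    sketch, which assumes GCC or Clang for __typeof__. struct item,
    list_add_tail() and sum_items() are illustrative helpers invented for
    the example.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)
#define list_last_entry(head, type, member) \
    list_entry((head)->prev, type, member)
#define list_next_entry(pos, member) \
    list_entry((pos)->member.next, __typeof__(*(pos)), member)
#define list_prev_entry(pos, member) \
    list_entry((pos)->member.prev, __typeof__(*(pos)), member)

/* The iterator expressed through the helpers above. */
#define list_for_each_entry(pos, head, member)                       \
    for (pos = list_first_entry(head, __typeof__(*pos), member);     \
         &pos->member != (head);                                     \
         pos = list_next_entry(pos, member))

struct item {
    int value;
    struct list_head link;
};

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    head->prev = entry;
}

static int sum_items(struct list_head *head)
{
    struct item *it;
    int sum = 0;

    list_for_each_entry(it, head, link)
        sum += it->value;
    return sum;
}
```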
     

17 Jul, 2013

1 commit

  • __list_for_each used to be the non prefetch() aware list walking
    primitive. When we removed the prefetch macros from the list routines,
    it became redundant. Given it does exactly the same thing as
    list_for_each now, we might as well remove it and call list_for_each
    directly.

    All users of __list_for_each have been converted to list_for_each calls
    in the current merge window.

    Signed-off-by: Dave Jones
    Signed-off-by: Linus Torvalds

    Dave Jones
     

01 Jun, 2013

1 commit


15 Mar, 2013

1 commit

  • The current version of hlist_entry_safe() fetches the pointer twice,
    once to test for NULL and the other to compute the offset back to the
    enclosing structure. This is OK for normal lock-based use because in
    that case, the pointer cannot change. However, when the pointer is
    protected by RCU (as in "rcu_dereference(p)"), then the pointer can
    change at any time. This use case can result in the following sequence
    of events:

    1. CPU 0 invokes hlist_entry_safe(), fetches the RCU-protected
    pointer and sees that it is non-NULL.

    2. CPU 1 invokes hlist_del_rcu(), deleting the entry that CPU 0
    just fetched a pointer to. Because this is the last entry
    in the list, the pointer fetched by CPU 0 is now NULL.

    3. CPU 0 refetches the pointer, obtains NULL, and then gets a
    NULL-pointer crash.

    This commit therefore applies gcc's "({ })" statement expression to
    create a temporary variable so that the specified pointer is fetched
    only once, avoiding the above sequence of events. Please note that
    it is the caller's responsibility to use rcu_dereference() as needed.
    This allows RCU-protected uses to work correctly without imposing
    any additional overhead on the non-RCU case.

    Many thanks to Eric Dumazet for spotting root cause!

    Reported-by: CAI Qian
    Reported-by: Eric Dumazet
    Signed-off-by: Paul E. McKenney
    Tested-by: Li Zefan

    Paul E. McKenney
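
    The single-fetch form described above can be sketched in userspace like
    this, assuming GCC or Clang for the "({ })" statement expression and
    __typeof__. fetch() and fetch_count are hypothetical helpers that exist
    only to count how many times the pointer expression is evaluated.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define hlist_entry(ptr, type, member) container_of(ptr, type, member)

/* Evaluate ptr once into a temporary, then test and convert the temporary. */
#define hlist_entry_safe(ptr, type, member) ({                  \
    __typeof__(ptr) ptr__ = (ptr);                              \
    ptr__ ? hlist_entry(ptr__, type, member) : NULL;            \
})

struct item {
    int value;
    struct hlist_node node;
};

/* Hypothetical instrumentation: stands in for an rcu_dereference()-style fetch. */
static int fetch_count;

static struct hlist_node *fetch(struct hlist_node *p)
{
    fetch_count++;
    return p;
}
```

    With a double-fetch macro the counter would advance twice per use, which
    is the window in which the pointer could change under RCU.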
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived
    differently from the list ones. The list ones take three parameters:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
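
    A three-parameter hlist iterator of the kind described above might be
    written as follows. This is a userspace sketch assuming GCC or Clang;
    struct item, hlist_add_head() and sum_items() are illustrative helpers
    for the example only.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define hlist_entry_safe(ptr, type, member) ({                  \
    __typeof__(ptr) ptr__ = (ptr);                              \
    ptr__ ? container_of(ptr__, type, member) : NULL;           \
})

/* No separate struct hlist_node cursor: pos itself carries the position. */
#define hlist_for_each_entry(pos, head, member)                                 \
    for (pos = hlist_entry_safe((head)->first, __typeof__(*(pos)), member);     \
         pos;                                                                   \
         pos = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

struct item {
    int value;
    struct hlist_node node;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

static int sum_items(struct hlist_head *head)
{
    struct item *it;
    int sum = 0;

    hlist_for_each_entry(it, head, node)
        sum += it->value;
    return sum;
}
```

    The call site now has the same shape as list_for_each_entry(pos, head,
    member).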
     

20 May, 2011

2 commits

  • This removes the use of software prefetching from the regular list
    iterators. We don't want it. If you do want to prefetch in some
    iterator of yours, go right ahead. Just don't expect the iterator to do
    it, since normally the downsides are bigger than the upsides.

    It also replaces <linux/prefetch.h> with <linux/const.h>, because the
    use of LIST_POISON ends up needing it. <linux/poison.h> is sadly not
    self-contained, and including prefetch.h just happened to hide that.

    Suggested by David Miller (networking has a lot of regular lists that
    are often empty or a single entry, and prefetching is not going to do
    anything but add useless instructions).

    Acked-by: Ingo Molnar
    Acked-by: David S. Miller
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • They not only increase the code footprint, they actually make things
    slower rather than faster. On internationally acclaimed benchmarks
    ("make -j16" on an already fully built kernel source tree) the hlist
    prefetching slows down the build by up to 1%.

    (Almost all of it comes from hlist_for_each_entry_rcu() as used by
    avc_has_perm_noaudit(), which is very hot due to all the pathname
    lookups to see if there is anything to do).

    The cause seems to be two-fold:

    - on at least some Intel cores, prefetch(NULL) ends up with some
    microarchitectural stall due to the TLB miss that it incurs. The
    hlist case triggers this very commonly, since the NULL pointer is the
    last entry in the list.

    - the prefetch appears to cause more D$ activity, probably because it
    prefetches hash list entries that are never actually used (because we
    ended the search early due to a hit).

    Regardless, the numbers clearly say that the implicit prefetching is
    simply a bad idea. If some _particular_ user of the hlist iterators
    wants to prefetch the next list entry, they can do so themselves
    explicitly, rather than depend on all list iterators doing so
    implicitly.

    Acked-by: Ingo Molnar
    Acked-by: David S. Miller
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
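
    For a caller that does want prefetching, an explicit version might look
    like the sketch below. It assumes GCC or Clang for __builtin_prefetch();
    find_item() and struct item are hypothetical. The prefetch is skipped for
    the terminating NULL pointer, which is the case the entry above
    identifies as harmful.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct item {
    int key;
    struct hlist_node node;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* The caller, not the iterator, decides to prefetch the next node. */
static struct item *find_item(struct hlist_head *head, int key)
{
    struct hlist_node *n;

    for (n = head->first; n; n = n->next) {
        struct item *it = container_of(n, struct item, node);

        if (n->next)
            __builtin_prefetch(n->next);    /* never prefetch NULL */
        if (it->key == key)
            return it;
    }
    return NULL;
}
```

    Whether this helps at all depends on the workload; the numbers quoted
    above suggest measuring before adding it.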
     

19 Feb, 2011

1 commit

  • When list debugging is enabled, we aim to readably show list corruption
    errors, and the basic list_add/list_del operations end up having extra
    debugging code in them to do some basic validation of the list entries.

    However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
    the debug code due to how they were written. This fixes that.

    So the _next_ time we have list_move() problems with stale list entries,
    we'll hopefully have an easier time finding them..

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

26 Oct, 2010

1 commit


07 Oct, 2010

1 commit

  • Drop inclusions of asm/system.h from linux/hardirq.h and linux/list.h as
    they're no longer required and prevent the M68K arch's IRQ flag handling macros
    from being made into inlined functions due to circular dependencies.

    Signed-off-by: David Howells
    Acked-by: Greg Ungerer
    Acked-by: Geert Uytterhoeven

    David Howells
     

07 Jul, 2010

2 commits


30 Jun, 2010

1 commit

  • list_for_each_entry_safe is not suitable to protect against concurrent
    modification of the list. 6754af6 introduced a race in sb walking.

    list_for_each_entry can use the trick of pinning the current entry in
    the list before we drop and retake the lock because it subsequently
    follows cur->next. However list_for_each_entry_safe saves n=cur->next
    for following before entering the loop body, so when the lock is
    dropped, n may be deleted.

    Signed-off-by: Nick Piggin
    Cc: Christoph Hellwig
    Cc: John Stultz
    Cc: Frank Mayhar
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    npiggin@suse.de
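
    The difference between the two iterators can be shown single-threaded.
    In this sketch concurrent_delete() is a hypothetical stand-in for
    another CPU removing an entry while the lock is dropped, and
    list_unlink() deliberately does not poison the removed node so that the
    demonstration stays well-defined. GCC or Clang is assumed for
    __typeof__.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
    container_of((head)->next, type, member)
#define list_next_entry(pos, member) \
    container_of((pos)->member.next, __typeof__(*(pos)), member)

#define list_for_each_entry(pos, head, member)                       \
    for (pos = list_first_entry(head, __typeof__(*pos), member);     \
         &pos->member != (head);                                     \
         pos = list_next_entry(pos, member))

/* Saves n = next(pos) before the body runs. */
#define list_for_each_entry_safe(pos, n, head, member)               \
    for (pos = list_first_entry(head, __typeof__(*pos), member),     \
         n = list_next_entry(pos, member);                           \
         &pos->member != (head);                                     \
         pos = n, n = list_next_entry(n, member))

struct item {
    int id;
    struct list_head link;
};

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    head->prev = entry;
}

/* Unlink without poisoning; the removed node keeps its stale pointers. */
static void list_unlink(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

static void concurrent_delete(struct item *victim)
{
    list_unlink(&victim->link);
}

/* Returns 1 if the walk visited the victim after it had been removed. */
static int walk_plain(struct list_head *head, struct item *victim)
{
    struct item *cur;
    int first = 1, visited = 0;

    list_for_each_entry(cur, head, link) {
        if (cur == victim)
            visited = 1;
        if (first) {
            concurrent_delete(victim);
            first = 0;
        }
    }
    return visited;
}

static int walk_safe(struct list_head *head, struct item *victim)
{
    struct item *cur, *n;
    int first = 1, visited = 0;

    list_for_each_entry_safe(cur, n, head, link) {
        if (cur == victim)
            visited = 1;
        if (first) {
            concurrent_delete(victim);
            first = 0;
        }
    }
    return visited;
}
```

    With the victim placed directly after the first entry, the plain walk
    re-reads cur->next and skips it, while the "safe" walk follows its saved
    pointer into the removed entry. In the kernel that entry could already
    be poisoned or freed.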
     

07 Mar, 2010

1 commit


16 Jan, 2010

1 commit

  • Bring a new list_rotate_left() helper that rotates a list to
    the left. This is useful for code that needs to round-robin
    elements whose queue priority increases from tail to head.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo

    Frederic Weisbecker
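
    A rotate-left helper along the lines described above could be sketched
    as moving the first element to the tail. The supporting functions are
    simplified userspace copies written for this example.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    head->prev = entry;
}

/* Unlink entry from wherever it is and append it to head. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    list_add_tail(entry, head);
}

/* Rotate left: the old first element becomes the last one. */
static void list_rotate_left(struct list_head *head)
{
    if (!list_empty(head))
        list_move_tail(head->next, head);
}
```

    Calling it once per scheduling round gives each element a turn at the
    head of the list.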
     

01 Sep, 2008

1 commit

  • Daniel J. Blueman reported:
    > =======================================================
    > [ INFO: possible circular locking dependency detected ]
    > 2.6.27-rc4-224c #1
    > -------------------------------------------------------
    > hald/4680 is trying to acquire lock:
    > (&n->list_lock){++..}, at: [] add_partial+0x26/0x80
    >
    > but task is already holding lock:
    > (&obj_hash[i].lock){++..}, at: []
    > debug_object_free+0x5c/0x120

    We fix it by moving the actual freeing to outside the lock (the lock
    now only protects the list).

    The pool lock is also promoted to irq-safe (suggested by Dan). It's
    necessary because free_pool is now called outside the irq disabled
    region. So we need to protect against an interrupt handler which calls
    debug_object_init().

    [tglx@linutronix.de: added hlist_move_list helper to avoid looping
    through the list twice]

    Reported-by: Daniel J Blueman
    Signed-off-by: Vegard Nossum
    Signed-off-by: Thomas Gleixner

    Vegard Nossum
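
    The helper mentioned in the bracketed note lets a caller detach a whole
    hlist in one step while holding the lock, then walk and free the
    detached list after releasing it. A userspace sketch of such a move,
    with locking omitted and simplified struct definitions assumed:

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Move every node from 'from' to 'to' in constant time; 'from' ends up empty. */
static void hlist_move_list(struct hlist_head *from, struct hlist_head *to)
{
    to->first = from->first;
    if (to->first)
        to->first->pprev = &to->first;  /* first node must point at its new head */
    from->first = NULL;
}
```

    Only the first node needs fixing up, which is why a second pass over the
    list is avoided.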
     

09 Aug, 2008

1 commit

  • Fix fatal multi-line kernel-doc error in list.h:
    function short description must be on one line.

    Error(linux-2.6.27-rc2-git3//include/linux/list.h:318): duplicate section name 'Description'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

07 Aug, 2008

2 commits


26 Jul, 2008

1 commit

  • Remove the conditional surrounding the definition of list_add() from list.h
    since, if you define CONFIG_DEBUG_LIST, the definition you will subsequently
    pick up from lib/list_debug.c will be absolutely identical, at which point you
    can remove that redundant definition from list_debug.c as well.

    Signed-off-by: Robert P. J. Day
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

19 May, 2008

1 commit

  • Move rcu-protected lists from list.h into a new header file rculist.h.

    This is done because lists are a very widely used primitive structure all
    over the kernel and it's currently impossible to include other header files
    in list.h without creating circular dependencies.

    For example, list.h implements rcu-protected lists and uses rcu_dereference()
    without including rcupdate.h. It actually compiles because the users of
    rcu_dereference() are macros. Other RCU functions could be used too, but
    probably aren't because of this.

    Therefore this patch creates rculist.h, which includes rcupdate.h without
    too many changes or troubles.

    Signed-off-by: Franck Bui-Huu
    Acked-by: Paul E. McKenney
    Acked-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Franck Bui-Huu
     

30 Apr, 2008

1 commit


29 Apr, 2008

1 commit