13 Jan, 2012

1 commit

  • Down, down in the deepest depths of GFP_NOIO page reclaim, we have
    shrink_page_list() calling __remove_mapping() calling
    __delete_from_swap_cache() or __delete_from_page_cache().

    You would not expect those to need much stack, but in fact they call
    radix_tree_delete(): which declares a 192-byte radix_tree_path array on
    its stack (to record the node,offsets it visits when descending, in case
    it needs to ascend to update them). And if any tag is still set [1],
    that calls radix_tree_tag_clear(), which declares a further such
    192-byte radix_tree_path array on the stack. (At least we have
    interrupts disabled here, so won't then be pushing registers too.)

    That was probably a good choice when most users were 32-bit (array of
    half the size), and adding fields to radix_tree_node would have bloated
    it unnecessarily. But nowadays many are 64-bit, and each
    radix_tree_node contains a struct rcu_head, which is only used when
    freeing; whereas the radix_tree_path info is only used for updating the
    tree (deleting, clearing tags or setting tags if tagged) when a lock
    must be held, of no interest when accessing the tree locklessly.

    So add a parent pointer to the radix_tree_node, in union with the
    rcu_head, and remove all uses of the radix_tree_path. There would be
    space in that union to save the offset when descending as before (we can
    argue that a lock must already be held to exclude other users), but
    recalculating it when ascending is both easy (a constant shift and a
    constant mask) and uncommon, so it seems better just to do that.
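
    The recalculation described above can be sketched in userspace C. This
    is only an illustration under stated assumptions, not the kernel code:
    toy_node, the TOY_* constants and toy_offset_in_parent() are
    hypothetical names, and the node is assumed to record its own height so
    that the shift can be derived from it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAP_SHIFT 6				/* 64 slots per node */
#define TOY_MAP_SIZE  (1UL << TOY_MAP_SHIFT)
#define TOY_MAP_MASK  (TOY_MAP_SIZE - 1)

/*
 * Hypothetical node layout: the parent pointer is only needed while the
 * node is in the tree, so it can share storage with the field that is
 * only needed when the node is being freed.
 */
struct toy_node {
	unsigned int height;			/* 1 for a leaf-level node */
	union {
		struct toy_node *parent;	/* used while in the tree */
		void *free_cookie;		/* stand-in for the rcu_head */
	} u;
	void *slots[TOY_MAP_SIZE];
};

/* Offset of node within its parent for a given index: one shift, one mask. */
static unsigned int toy_offset_in_parent(const struct toy_node *node,
					 unsigned long index)
{
	assert(node->u.parent != NULL);
	return (index >> (node->height * TOY_MAP_SHIFT)) & TOY_MAP_MASK;
}
```

    For a leaf-level node (height 1) the shift is one level's worth of bits,
    so index 200 maps to offset 3 in the parent, without any path array on
    the stack.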

    Two little optimizations: no need to decrement height when descending,
    adjusting shift is enough; and once radix_tree_tag_if_tagged() has set
    tag on a node and its ancestors, it need not ascend from that node
    again.

    perf on the radix tree test harness reports radix_tree_insert() as 2%
    slower (now having to set parent), but radix_tree_delete() 24% faster.
    Surely that's an exaggeration from rtth's artificially low map shift 3,
    but forcing it back to 6 still rates radix_tree_delete() 8% faster.

    [1] Can a pagecache tag (dirty, writeback or towrite) actually still be
    set at the time of radix_tree_delete()? Perhaps not if the filesystem is
    well-behaved. But although I've not tracked any stack overflow down to
    this cause, I have observed a curious case in which a dirty tag is set
    and left set on tmpfs: page migration's migrate_page_copy() happens to
    use __set_page_dirty_nobuffers() to set PageDirty on the newpage, and
    that sets PAGECACHE_TAG_DIRTY as a side-effect - harmless to a
    filesystem which doesn't use tags, except for this stack depth issue.

    Signed-off-by: Hugh Dickins
    Cc: Jan Kara
    Cc: Dave Chinner
    Cc: Mel Gorman
    Cc: Nai Xia
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

01 Nov, 2011

1 commit

  • radix_tree_tag_get()'s BUG (when it sees a tag after saw_unset_tag) was
    unsafe and removed in 2.6.34, but the pointless saw_unset_tag was left
    behind.

    Remove it now, and return 0 as soon as we see unset tag - we already rely
    upon the root tag to be correct, returning 0 immediately if it's not set.
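
    The early return can be illustrated with a toy two-level tree. This is
    a sketch under assumptions, not the kernel function: toy_node, toy_root
    and toy_tag_get() are hypothetical names, a single tag is modelled as
    one bit per slot, and the index is assumed to be within the tree's
    range.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SHIFT 2
#define TOY_SIZE  (1u << TOY_SHIFT)
#define TOY_MASK  (TOY_SIZE - 1)

struct toy_node {
	unsigned int tag_bits;		/* bit i: slot i or its subtree is tagged */
	void *slots[TOY_SIZE];		/* child nodes, or items at height 1 */
};

struct toy_root {
	unsigned int height;
	int root_tag;			/* summary of the whole tree */
	struct toy_node *rnode;
};

static int toy_tag_get(const struct toy_root *root, unsigned long index)
{
	const struct toy_node *node = root->rnode;
	unsigned int height = root->height;

	/* The root tag is trusted: nothing below can be tagged if it is clear. */
	if (!root->root_tag || node == NULL)
		return 0;

	while (height > 0) {
		unsigned int offset =
			(index >> ((height - 1) * TOY_SHIFT)) & TOY_MASK;

		if (!(node->tag_bits & (1u << offset)))
			return 0;	/* unset on the way down: stop here */
		if (height == 1)
			return 1;
		node = node->slots[offset];
		assert(node != NULL);	/* a tagged slot must be populated */
		height--;
	}
	return 0;
}
```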

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

04 Aug, 2011

2 commits

  • We have already acknowledged that swapoff of a tmpfs file is slower than
    it was before conversion to the generic radix_tree: a little slower
    there will be acceptable, if the hotter paths are faster.

    But it was a shock to find swapoff of a 500MB file 20 times slower on my
    laptop, taking 10 minutes; and at that rate it significantly slows down
    my testing.

    Now, most of that turned out to be overhead from PROVE_LOCKING and
    PROVE_RCU: without those it was only 4 times slower than before; and
    more realistic tests on other machines don't fare as badly.

    I've tried a number of things to improve it, including tagging the swap
    entries, then doing lookup by tag: I'd expected that to halve the time,
    but in practice it's erratic, and often counter-productive.

    The only change I've so far found to make a consistent improvement, is
    to short-circuit the way we go back and forth, gang lookup packing
    entries into the array supplied, then shmem scanning that array for the
    target entry. Scanning in place doubles the speed, so it's now only
    twice as slow as before (or three times slower when the PROVEs are on).

    So, add radix_tree_locate_item() as an expedient, once-off,
    single-caller hack to do the lookup directly in place. #ifdef it on
    CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
    applicability as save space in other configurations. And, sadly,
    #include sched.h for cond_resched().

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its
    peculiar swap vector, instead keeping a file's swap entries in the same
    radix tree as its struct page pointers: thus saving memory, and
    simplifying its code and locking.

    This patch:

    The radix_tree is used by several subsystems for different purposes. A
    major use is to store the struct page pointers of a file's pagecache for
    memory management. But what if mm wanted to store something other than
    page pointers there too?

    The low bit of a radix_tree entry is already used to denote an indirect
    pointer, for internal use, and the unlikely radix_tree_deref_retry()
    case.

    Define the next bit as denoting an exceptional entry, and supply inline
    functions radix_tree_exception() to return non-0 in either unlikely
    case, and radix_tree_exceptional_entry() to return non-0 in the second
    case.
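
    A minimal userspace sketch of the two checks follows. The toy_* names
    and TOY_* constants are hypothetical stand-ins chosen for this example,
    and packing a small integer above the two low bits is an assumption
    about how a caller might build such an entry.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_INDIRECT_PTR	1UL	/* low bit: internal indirect pointer */
#define TOY_EXCEPTIONAL_ENTRY	2UL	/* next bit: exceptional entry */
#define TOY_EXCEPTIONAL_SHIFT	2

/* Non-zero only for an exceptional entry (the second case). */
static inline int toy_exceptional_entry(const void *entry)
{
	return ((uintptr_t)entry & TOY_EXCEPTIONAL_ENTRY) != 0;
}

/* Non-zero in either unlikely case. */
static inline int toy_exception(const void *entry)
{
	return ((uintptr_t)entry &
		(TOY_INDIRECT_PTR | TOY_EXCEPTIONAL_ENTRY)) != 0;
}

/* Pack a small integer into an entry that cannot be mistaken for a pointer. */
static inline void *toy_make_exceptional(unsigned long value)
{
	return (void *)(uintptr_t)((value << TOY_EXCEPTIONAL_SHIFT) |
				   TOY_EXCEPTIONAL_ENTRY);
}

static inline unsigned long toy_exceptional_value(const void *entry)
{
	assert(toy_exceptional_entry(entry));
	return (uintptr_t)entry >> TOY_EXCEPTIONAL_SHIFT;
}
```

    A pointer to an object aligned to at least four bytes has both low bits
    clear, so it passes through both checks as an ordinary entry.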

    If a subsystem already uses radix_tree with that bit set, no problem: it
    does not affect internal workings at all, but is defined for the
    convenience of those storing well-aligned pointers in the radix_tree.

    The radix_tree_gang_lookups have an implicit assumption that the caller
    can deduce the offset of each entry returned e.g. by the page->index of
    a struct page. But that may not be feasible for some kinds of item to
    be stored there.

    Let radix_tree_gang_lookup_slot() take an optional indices argument:
    an output array in which to return those offsets. The same could be added
    to other radix_tree_gang_lookups, but for now keep it to the only one
    for which we need it.
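
    The shape of such an interface can be sketched over a flat table
    instead of a real tree. toy_gang_lookup_slot() and its parameter list
    are hypothetical and chosen for this example; only the idea of an
    optional output array of indices comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Collect up to max_items non-NULL slots at or after first_index from a
 * flat table of nslots entries. If indices is non-NULL, the index of each
 * returned slot is stored there too; callers that can deduce the index
 * from the item itself may pass NULL.
 */
static unsigned int toy_gang_lookup_slot(void **table, unsigned long nslots,
					 void ***results,
					 unsigned long *indices,
					 unsigned long first_index,
					 unsigned int max_items)
{
	unsigned int found = 0;
	unsigned long i;

	assert(results != NULL);
	for (i = first_index; i < nslots && found < max_items; i++) {
		if (table[i] == NULL)
			continue;
		results[found] = &table[i];
		if (indices)
			indices[found] = i;
		found++;
	}
	return found;
}
```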

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

26 Jan, 2011

1 commit

  • Executed command: fsstress -d /mnt -n 600 -p 850

    crash> bt
    PID: 7947 TASK: ffff880160546a70 CPU: 0 COMMAND: "fsstress"
    #0 [ffff8800dfc07d00] machine_kexec at ffffffff81030db9
    #1 [ffff8800dfc07d70] crash_kexec at ffffffff810a7952
    #2 [ffff8800dfc07e40] oops_end at ffffffff814aa7c8
    #3 [ffff8800dfc07e70] die_nmi at ffffffff814aa969
    #4 [ffff8800dfc07ea0] do_nmi_callback at ffffffff8102b07b
    #5 [ffff8800dfc07f10] do_nmi at ffffffff814aa514
    #6 [ffff8800dfc07f50] nmi at ffffffff814a9d60
    [exception RIP: __lookup_tag+100]
    RIP: ffffffff812274b4 RSP: ffff88016056b998 RFLAGS: 00000287
    RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000006
    RDX: 000000000000001d RSI: ffff88016056bb18 RDI: ffff8800c85366e0
    RBP: ffff88016056b9c8 R8: ffff88016056b9e8 R9: 0000000000000000
    R10: 000000000000000e R11: ffff8800c8536908 R12: 0000000000000010
    R13: 0000000000000040 R14: ffffffffffffffc0 R15: ffff8800c85366e0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

    #7 [ffff88016056b998] __lookup_tag at ffffffff812274b4
    #8 [ffff88016056b9d0] radix_tree_gang_lookup_tag_slot at ffffffff81227605
    #9 [ffff88016056ba20] find_get_pages_tag at ffffffff810fc110
    #10 [ffff88016056ba80] pagevec_lookup_tag at ffffffff81105e85
    #11 [ffff88016056baa0] write_cache_pages at ffffffff81104c47
    #12 [ffff88016056bbd0] generic_writepages at ffffffff81105014
    #13 [ffff88016056bbe0] do_writepages at ffffffff81105055
    #14 [ffff88016056bbf0] __filemap_fdatawrite_range at ffffffff810fb2cb
    #15 [ffff88016056bc40] filemap_write_and_wait_range at ffffffff810fb32a
    #16 [ffff88016056bc70] generic_file_direct_write at ffffffff810fb3dc
    #17 [ffff88016056bce0] __generic_file_aio_write at ffffffff810fcee5
    #18 [ffff88016056bda0] generic_file_aio_write at ffffffff810fd085
    #19 [ffff88016056bdf0] do_sync_write at ffffffff8114f9ea
    #20 [ffff88016056bf00] vfs_write at ffffffff8114fcf8
    #21 [ffff88016056bf30] sys_write at ffffffff81150691
    #22 [ffff88016056bf80] system_call_fastpath at ffffffff8100c0b2

    I think the root cause is the following:

    radix_tree_range_tag_if_tagged() always sets settag on the root
    if iftag is set on the root, even if there are no iftag tags
    in the specified range (there are, of course, some iftag tags
    outside the specified range).

    ===============================================================================
    [[[Detailed description]]]

    (1) Why does radix_tree_gang_lookup_tag_slot() never return?

    __lookup_tag() can:
    - return 0, and
    - hand back, through next_index, an index which is not bigger than
      the one passed in.

    The following "while" loop then repeats forever, because under those
    conditions "ret" is not updated and cur_index never advances.

    (So radix_tree_gang_lookup_tag_slot() never returns.)

    radix_tree_gang_lookup_tag_slot():
            while (ret < max_items) {
                    unsigned int slots_found;
                    unsigned long next_index;       /* Index of next search */

                    if (cur_index > max_index)
                            break;
                    slots_found = __lookup_tag(node, results + ret,
                                    cur_index, max_items - ret, &next_index,
                                    tag);
                    ret += slots_found;
                    // ret is not updated because slots_found == 0,
                    // so this while loop runs forever.
                    if (next_index == 0)
                            break;
                    cur_index = next_index;
            }

    (2) Why does __lookup_tag() return 0 without updating the index?

    Assume the following:
    - one of the slots in a radix_tree_node is NULL;
    - the tag which corresponds to that slot is set (PAGECACHE_TAG_TOWRITE
      or another);
    - at a certain height (!= 0), the corresponding index is 0.

    a) __lookup_tag() notices that the tag is set.

    static unsigned int
    __lookup_tag(struct radix_tree_node *slot, void ***results, unsigned long index,
            unsigned int max_items, unsigned long *next_index, unsigned int tag)
    {
            unsigned int nr_found = 0;
            unsigned int shift, height;

            height = slot->height;
            if (height == 0)
                    goto out;
            shift = (height-1) * RADIX_TREE_MAP_SHIFT;

            while (height > 0) {
                    unsigned long i = (index >> shift) & RADIX_TREE_MAP_MASK;

                    for (;;) {
                            if (tag_get(slot, tag, i))
                                    break;
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
    * the index is not updated yet.

    b) __lookup_tag() notices that the slot is NULL.

                            index &= ~((1UL << shift) - 1);
                            index += 1UL << shift;
                            if (index == 0)
                                    goto out;       /* 32-bit wraparound */
                            i++;
                            if (i == RADIX_TREE_MAP_SIZE)
                                    goto out;
                    }
                    height--;
                    if (height == 0) {      /* Bottom level: grab some items */
                            ...
                    }
                    shift -= RADIX_TREE_MAP_SHIFT;
                    slot = rcu_dereference_raw(slot->slots[i]);
                    if (slot == NULL)
                            break;
                    ^^^^^^^^^^^^^^^^^^^^^^

    c) __lookup_tag() doesn't update the index and returns 0.

            }
    out:
            *next_index = index;
            ^^^^^^^^^^^^^^^^^^^^
            return nr_found;
    }

    (3) Why is the slot NULL even if the tag is set?

    Because radix_tree_range_tag_if_tagged() always sets the root tag with
    PAGECACHE_TAG_TOWRITE if the root tag is set with PAGECACHE_TAG_DIRTY,
    even if there is no tag which can be set with PAGECACHE_TAG_TOWRITE
    in the specified range (from *first_indexp to last_index). Of course,
    some PAGECACHE_TAG_DIRTY nodes must exist outside the specified range.
    (radix_tree_range_tag_if_tagged() is called only from tag_pages_for_writeback())

    unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
                    unsigned long *first_indexp, unsigned long last_index,
                    unsigned long nr_to_tag,
                    unsigned int iftag, unsigned int settag)
    {
            unsigned int height = root->height;
            struct radix_tree_path path[height];
            struct radix_tree_path *pathp = path;
            struct radix_tree_node *slot;
            unsigned int shift;
            unsigned long tagged = 0;
            unsigned long index = *first_indexp;

            last_index = min(last_index, radix_tree_maxindex(height));
            if (index > last_index)
                    return 0;
            if (!nr_to_tag)
                    return 0;
            if (!root_tag_get(root, iftag)) {
                    *first_indexp = last_index + 1;
                    return 0;
            }
            if (height == 0) {
                    *first_indexp = last_index + 1;
                    root_tag_set(root, settag);
                    return 1;
            }
            ...
            root_tag_set(root, settag);
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
            *first_indexp = index;

            return tagged;
    }

    As a result, no radix_tree_node has PAGECACHE_TAG_TOWRITE set, but
    the root tag (in radix_tree_root) does have PAGECACHE_TAG_TOWRITE
    set.

    [figure: inside radix_tree]
    (Please view the figure in a fixed-width font)
    ===========================================
           [roottag = DIRTY]
                   |                      tag=0:NOTHING
            tag[0 0 0 1]                      1:DIRTY
               [x x x +]                      2:WRITEBACK
                      |                       3:DIRTY,WRITEBACK
                      p                       4:TOWRITE
                                              5:DIRTY,TOWRITE ...
            specified range (index: 0 to 2)

    * There is no DIRTY tag within the specified range.
      (But there is a DIRTY tag outside that range.)

               | | | | | | | | |
        after calling tag_pages_for_writeback()
               | | | | | | | | |
               v v v v v v v v v

           [roottag = DIRTY,TOWRITE]
                   |                      p is "page".
            tag[0 0 0 1]                  x is NULL.
               [x x x +]                  +- is a pointer to "page".
                      |
                      p

    * But TOWRITE tag is set on the root tag.
    ============================================

    After that, radix_tree_extend() is called via radix_tree_insert()
    when a page is added.
    This function sets PAGECACHE_TAG_TOWRITE on the new radix_tree_node,
    so that it inherits the state of the root tag.

    static int radix_tree_extend(struct radix_tree_root *root, unsigned long index)
    {
            struct radix_tree_node *node;
            unsigned int height;
            int tag;

            /* Figure out what the height should be. */
            height = root->height + 1;
            while (index > radix_tree_maxindex(height))
                    height++;

            if (root->rnode == NULL) {
                    root->height = height;
                    goto out;
            }

            do {
                    unsigned int newheight;
                    if (!(node = radix_tree_node_alloc(root)))
                            return -ENOMEM;

                    /* Increase the height. */
                    node->slots[0] = radix_tree_indirect_to_ptr(root->rnode);

                    /* Propagate the aggregated tag info into the new root */
                    for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
                            if (root_tag_get(root, tag))
                                    tag_set(node, tag, 0);
                                    ^^^^^^^^^^^^^^^^^^^^^^
                    }

    ===========================================
           [roottag = DIRTY,TOWRITE]
                   |                :
            tag[0 0 0 1]        [0 0 0 0]
               [x x x +]        [+ x x x]
                      |          |
                      p          p (new page)

               | | | | | | | | |
         after calling radix_tree_insert
               | | | | | | | | |
               v v v v v v v v v

           [roottag = DIRTY,TOWRITE]
                   |
           tag [5 0 0 0]                  * DIRTY and TOWRITE tags are
               [+ + x x]                    inherited by the new node.
                | |
      tag [0 0 0 1] [0 0 0 0]
          [x x x +] [+ x x x]
                 |   |
                 p   p
    ============================================

    After that, the index 3 page is released by remove_from_page_cache().
    This creates the situation in which PAGECACHE_TAG_TOWRITE is set
    and the slot which corresponds to the tag is NULL.
    ===========================================
           [roottag = DIRTY,TOWRITE]
                   |
           tag [5 0 0 0]
               [+ + x x]
                | |
      tag [0 0 0 1] [0 0 0 0]
          [x x x +] [+ x x x]
                 |   |
                 p   p
             (remove)

               | | | | | | | | |
        after calling remove_from_page_cache()
               | | | | | | | | |
               v v v v v v v v v

           [roottag = DIRTY,TOWRITE]
                   |
           tag [4 0 0 0]                  * Only the DIRTY tag is cleared,
               [x + x x]                    because no TOWRITE tag exists
                  |                         in the bottom node.
               [0 0 0 0]
               [+ x x x]
                |
                p
    ============================================

    To solve this problem:

    Change radix_tree_range_tag_if_tagged() so that it doesn't set the
    root tag if it didn't set any tags within the specified range.

    Like this:
    ============================================
    unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
                    unsigned long *first_indexp, unsigned long last_index,
                    unsigned long nr_to_tag,
                    unsigned int iftag, unsigned int settag)
    {
            unsigned long tagged = 0;
            ...
            if (tagged)
            ^^^^^^^^^^^
                    root_tag_set(root, settag);
            *first_indexp = index;

            return tagged;
    }

    ============================================
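
    The effect of the fix can be reproduced on a toy model in userspace.
    This is a sketch under assumptions, not the kernel code: toy_tree and
    toy_range_tag_if_tagged() are hypothetical names, and the tree is
    flattened to eight slots plus a summary "root" tag word.

```c
#include <assert.h>

#define TOY_TAG_DIRTY	(1u << 0)
#define TOY_TAG_TOWRITE	(1u << 1)

struct toy_tree {
	unsigned int root_tags;		/* summary: some slot carries the tag */
	unsigned int slot_tags[8];	/* per-slot tag bits */
};

/*
 * Set settag on every slot in [first, last] that already has iftag.
 * Returns the number of slots tagged.
 */
static unsigned long toy_range_tag_if_tagged(struct toy_tree *t,
					     unsigned long first,
					     unsigned long last,
					     unsigned int iftag,
					     unsigned int settag)
{
	unsigned long tagged = 0;
	unsigned long i;

	assert(last < 8);
	for (i = first; i <= last; i++) {
		if (t->slot_tags[i] & iftag) {
			t->slot_tags[i] |= settag;
			tagged++;
		}
	}
	if (tagged)			/* the fix: otherwise leave the root alone */
		t->root_tags |= settag;
	return tagged;
}
```

    With a DIRTY slot only at index 3, tagging the range 0 to 2 now leaves
    the root without TOWRITE, so a later extend has no stale root tag to
    copy into a new node.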

    Signed-off-by: Toshiyuki Okajima
    Acked-by: Jan Kara
    Cc: Dave Chinner
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshiyuki Okajima
     

12 Nov, 2010

1 commit

  • Salman Qazi describes the following radix-tree bug:

    In the following case, we can get a deadlock:

    0. The radix tree contains two items, one has the index 0.
    1. The reader (in this case find_get_pages) takes the rcu_read_lock.
    2. The reader acquires slot(s) for item(s) including the index 0 item.
    3. The non-zero index item is deleted, and as a consequence the other item is
    moved to the root of the tree. The place where it used to be is queued for
    deletion after the readers finish.
    3b. The zero item is deleted, removing it from the direct slot, it remains in
    the rcu-delayed indirect node.
    4. The reader looks at the index 0 slot, and finds that the page has 0 ref
    count
    5. The reader looks at it again, hoping that the item will either be freed or
    the ref count will increase. This never happens, as the slot it is looking
    at will never be updated. Also, this slot can never be reclaimed because
    the reader is holding rcu_read_lock and is in an infinite loop.

    The fix is to turn the same "indirect" pointer case that requires a slot
    lookup retry into a general "retry the lookup" bit.
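
    The idea can be shown single-threaded, without RCU. This is a sketch
    under assumptions: toy_root, toy_shrink() and toy_lookup_index0() are
    hypothetical, and marking the old slot by setting its low bit is this
    example's model of the "retry the lookup" bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RETRY_BIT 1UL

/* Non-zero if the value read from a slot means "redo the lookup". */
static int toy_deref_retry(const void *entry)
{
	return ((uintptr_t)entry & TOY_RETRY_BIT) != 0;
}

struct toy_root {
	void *direct;		/* current home of the index 0 item */
};

/* Writer: move the item up to the root and mark the slot it came from. */
static void toy_shrink(struct toy_root *root, void **old_slot)
{
	root->direct = *old_slot;
	*old_slot = (void *)((uintptr_t)*old_slot | TOY_RETRY_BIT);
}

/*
 * Reader holding a possibly stale slot: if the slot carries the retry
 * mark, look the item up again through the root instead of spinning on a
 * slot that will never be updated.
 */
static void *toy_lookup_index0(const struct toy_root *root,
			       void *const *stale_slot)
{
	void *entry = *stale_slot;

	if (toy_deref_retry(entry))
		entry = root->direct;
	assert(!toy_deref_retry(entry));
	return entry;
}
```

    After the item is later deleted from the root, the reader that still
    holds the old slot sees the mark, retries, and finds nothing, rather
    than looping on a page whose reference count stays at zero.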

    Signed-off-by: Nick Piggin
    Reported-by: Salman Qazi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

07 Oct, 2010

1 commit


23 Aug, 2010

4 commits

  • …/linux-2.6-rcu into core/rcu

    Ingo Molnar
     
  • * 'radix-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev:
    radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
    radix-tree: clear all tags in radix_tree_node_rcu_free

    Linus Torvalds
     
  • Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree:
    implement function radix_tree_range_tag_if_tagged") does not safely
    set tags on intermediate tree nodes. The code walks down the tree
    setting tags before it has fully resolved the path to the leaf under
    the assumption there will be a leaf slot with the tag set in the
    range it is searching.

    Unfortunately, this is not a valid assumption - we can abort after
    setting a tag on an intermediate node if we overrun the number of
    tags we are allowed to set in a batch, or stop scanning because we
    have passed the last scan index before we reach a leaf slot with
    the tag we are searching for set.

    As a result, we can leave the function with tags set on intermediate
    nodes which can be tripped over later by tag-based lookups. The
    result of these stale tags is that lookup may end prematurely or
    livelock because the lookup cannot make progress.

    The fix for the problem involves recording the traversal path we
    take to the leaf nodes, and only propagating the tags back up the
    tree once the tag is set in the leaf node slot. We are already
    recording the path for efficient traversal, so there is no
    additional overhead to do the intermediate node tag setting in
    this manner.
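
    The leaf-first ordering can be sketched as follows. The names
    toy_node, toy_path and toy_tag_leaf_then_ancestors() are hypothetical,
    and a single tag is modelled as one bit per slot; stopping at the first
    ancestor that is already tagged is an extra shortcut taken by this
    example.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	unsigned int settag_bits;	/* bit i: slot i or its subtree is tagged */
};

/* One recorded step of the descent: which node, and which slot in it. */
struct toy_path {
	struct toy_node *node;
	unsigned int offset;
};

/*
 * path[0] is the top node and path[depth - 1] the leaf-level node. The
 * tag is set at the leaf first and only then propagated towards the root,
 * so an aborted scan can never leave an intermediate tag without a tagged
 * leaf below it. Returns the number of nodes whose bits changed.
 */
static unsigned int toy_tag_leaf_then_ancestors(struct toy_path *path,
						unsigned int depth)
{
	unsigned int changed = 0;
	unsigned int level = depth;

	assert(depth > 0);
	while (level-- > 0) {
		unsigned int bit = 1u << path[level].offset;

		if (path[level].node->settag_bits & bit)
			break;		/* everything above is tagged already */
		path[level].node->settag_bits |= bit;
		changed++;
	}
	return changed;
}
```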

    This fixes a radix tree lookup livelock triggered by the new
    writeback sync livelock avoidance code introduced in commit
    f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback
    livelock avoidance using page tagging").

    Signed-off-by: Dave Chinner
    Acked-by: Jan Kara

    Dave Chinner
     
  • Commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement
    writeback livelock avoidance using page tagging") introduced a new
    radix tree tag, increasing the number of tags in each node from 2 to
    3. It did not, however, fix up the code in
    radix_tree_node_rcu_free() that cleans up after radix_tree_shrink()
    and hence could leave stray tags set in the new tag array.

    The result is that the livelock avoidance code added in the
    above commit would hit stale tags when doing tag based lookups,
    resulting in livelocks when trying to traverse the tree.

    Fix this problem in radix_tree_node_rcu_free() so it doesn't happen
    again in the future by using a loop to walk all the tags up to
    RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink()
    leaves behind.
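
    A loop of this kind might look like the sketch below. The toy_* names
    and TOY_* constants are hypothetical, and the example assumes that the
    stray bits are those of slot 0; the point is only that the loop bound
    follows the tag count, so adding a tag cannot be missed again.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_TAGS	3
#define TOY_MAP_SIZE	64
#define TOY_LONG_BITS	(8 * sizeof(unsigned long))
#define TOY_TAG_LONGS	((TOY_MAP_SIZE + TOY_LONG_BITS - 1) / TOY_LONG_BITS)

struct toy_node {
	unsigned long tags[TOY_MAX_TAGS][TOY_TAG_LONGS];
};

static void toy_tag_clear(struct toy_node *node, unsigned int tag,
			  unsigned int offset)
{
	assert(tag < TOY_MAX_TAGS && offset < TOY_MAP_SIZE);
	node->tags[tag][offset / TOY_LONG_BITS] &=
		~(1UL << (offset % TOY_LONG_BITS));
}

/* Clear slot 0's bit in every tag array, however many tags there are. */
static void toy_clear_stray_tags(struct toy_node *node)
{
	unsigned int tag;

	for (tag = 0; tag < TOY_MAX_TAGS; tag++)
		toy_tag_clear(node, tag, 0);
}

/* Non-zero if any tag bit at all is still set in the node. */
static int toy_node_has_tags(const struct toy_node *node)
{
	static const struct toy_node zero;

	return memcmp(node->tags, zero.tags, sizeof(zero.tags)) != 0;
}
```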

    Signed-off-by: Dave Chinner
    Acked-by: Nick Piggin
    Acked-by: Jan Kara

    Dave Chinner
     

21 Aug, 2010

1 commit


20 Aug, 2010

1 commit


10 Aug, 2010

1 commit


28 May, 2010

1 commit


10 Apr, 2010

1 commit

  • radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
    or radix_tree_tag_clear(). The problem is that the double tag_get() in
    radix_tree_tag_get():

            if (!tag_get(node, tag, offset))
                    saw_unset_tag = 1;
            if (height == 1) {
                    int ret = tag_get(node, tag, offset);

    may see the value change due to the action of set/clear. RCU is no protection
    against this as no pointers are being changed, no nodes are being replaced
    according to a COW protocol - set/clear alter the node directly.

    The documentation in linux/radix-tree.h, however, says that
    radix_tree_tag_get() is an exception to the rule that "any function modifying
    the tree or tags (...) must exclude other modifications, and exclude any
    functions reading the tree".

    The problem is that the next statement in radix_tree_tag_get() checks that the
    tag doesn't vary over time:

    BUG_ON(ret && saw_unset_tag);

    This has been seen happening in FS-Cache:

    https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html

    To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
    comments that the value of the tag may change whilst the RCU read lock is held,
    and thus that the return value of radix_tree_tag_get() may not be relied upon
    unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
    running concurrently with it.

    Reported-by: Romain DEGEZ
    Signed-off-by: David Howells
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    David Howells
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Feb, 2010

1 commit

  • Because the radix tree is used with many different locking
    designs, we cannot do any effective checking without changing
    the radix-tree APIs. It might make sense to do this later, but
    only if the RCU lockdep checking proves itself sufficiently
    valuable.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

20 Nov, 2009

2 commits

  • Don't delete pending pages from the page-store tracking tree, but rather send
    them for another write as they've presumably been updated.

    Signed-off-by: David Howells

    David Howells
     
  • __fscache_write_page() attempts to load the radix tree preallocation pool for
    the CPU it is on before calling radix_tree_insert(), as the insertion must be
    done inside a pair of spinlocks.

    Use of the preallocation pool, however, is contingent on the radix tree being
    initialised without __GFP_WAIT specified. __fscache_acquire_cookie() was
    passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT.

    The solution is to AND out __GFP_WAIT.
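
    The masking can be shown with stand-in flag values. Everything named
    toy_* or TOY_* below is hypothetical, including the numeric values; the
    sketch only models the rule stated above, that the preallocation pool
    is used when the tree's mask does not permit waiting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for illustration; the kernel's differ. */
#define TOY_GFP_WAIT	0x10u
#define TOY_GFP_IO	0x40u
#define TOY_GFP_NOFS	(TOY_GFP_WAIT | TOY_GFP_IO)

struct toy_root {
	unsigned int gfp_mask;
};

static void toy_init_tree(struct toy_root *root, unsigned int gfp_mask)
{
	assert(root != NULL);
	root->gfp_mask = gfp_mask;
}

/* The preloaded pool is only consulted when the tree may not sleep. */
static int toy_uses_preload_pool(const struct toy_root *root)
{
	return !(root->gfp_mask & TOY_GFP_WAIT);
}
```

    Initialising with TOY_GFP_NOFS & ~TOY_GFP_WAIT keeps the remaining
    bits of the mask and makes toy_uses_preload_pool() return non-zero.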

    Additionally, the banner comment to radix_tree_preload() is altered to make
    note of this prerequisite. Possibly there should be a WARN_ON() too.

    Without this fix, I have seen the following recursive deadlock caused by
    radix_tree_insert() attempting to allocate memory inside the spinlocked
    region, which resulted in FS-Cache being called back into to release memory -
    which required the spinlock already held.

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.32-rc6-cachefs #24
    ---------------------------------------------
    nfsiod/7916 is trying to acquire lock:
    (&cookie->lock){+.+.-.}, at: [] __fscache_uncache_page+0xdb/0x160 [fscache]

    but task is already holding lock:
    (&cookie->lock){+.+.-.}, at: [] __fscache_write_page+0x15c/0x3f3 [fscache]

    other info that might help us debug this:
    5 locks held by nfsiod/7916:
    #0: (nfsiod){+.+.+.}, at: [] worker_thread+0x19a/0x2e2
    #1: (&task->u.tk_work#2){+.+.+.}, at: [] worker_thread+0x19a/0x2e2
    #2: (&cookie->lock){+.+.-.}, at: [] __fscache_write_page+0x15c/0x3f3 [fscache]
    #3: (&object->lock#2){+.+.-.}, at: [] __fscache_write_page+0x197/0x3f3 [fscache]
    #4: (&cookie->stores_lock){+.+...}, at: [] __fscache_write_page+0x19f/0x3f3 [fscache]

    stack backtrace:
    Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24
    Call Trace:
    [] __lock_acquire+0x1649/0x16e3
    [] ? __lock_acquire+0x7b7/0x16e3
    [] ? dump_trace+0x248/0x257
    [] lock_acquire+0x57/0x6d
    [] ? __fscache_uncache_page+0xdb/0x160 [fscache]
    [] _spin_lock+0x2c/0x3b
    [] ? __fscache_uncache_page+0xdb/0x160 [fscache]
    [] __fscache_uncache_page+0xdb/0x160 [fscache]
    [] ? __fscache_check_page_write+0x0/0x71 [fscache]
    [] nfs_fscache_release_page+0x86/0xc4 [nfs]
    [] nfs_release_page+0x3c/0x41 [nfs]
    [] try_to_release_page+0x32/0x3b
    [] shrink_page_list+0x316/0x4ac
    [] ? mark_held_locks+0x52/0x70
    [] ? _spin_unlock_irq+0x2b/0x31
    [] shrink_inactive_list+0x392/0x67c
    [] ? mark_held_locks+0x52/0x70
    [] shrink_list+0x8d/0x8f
    [] shrink_zone+0x278/0x33c
    [] ? ktime_get_ts+0xad/0xba
    [] try_to_free_pages+0x22e/0x392
    [] ? isolate_pages_global+0x0/0x212
    [] __alloc_pages_nodemask+0x3dc/0x5cf
    [] cache_alloc_refill+0x34d/0x6c1
    [] ? radix_tree_node_alloc+0x52/0x5c
    [] kmem_cache_alloc+0xb2/0x118
    [] radix_tree_node_alloc+0x52/0x5c
    [] radix_tree_insert+0x57/0x19c
    [] __fscache_write_page+0x1e3/0x3f3 [fscache]
    [] __nfs_readpage_to_fscache+0x58/0x11e [nfs]
    [] nfs_readpage_release+0x34/0x9b [nfs]
    [] nfs_readpage_release_full+0x32/0x4b [nfs]
    [] rpc_release_calldata+0x12/0x14 [sunrpc]
    [] rpc_free_task+0x59/0x61 [sunrpc]
    [] rpc_async_release+0x10/0x12 [sunrpc]
    [] worker_thread+0x1ef/0x2e2
    [] ? worker_thread+0x19a/0x2e2
    [] ? thread_return+0x3e/0x101
    [] ? rpc_async_release+0x0/0x12 [sunrpc]
    [] ? autoremove_wake_function+0x0/0x34
    [] ? trace_hardirqs_on+0xd/0xf
    [] ? worker_thread+0x0/0x2e2
    [] kthread+0x7a/0x82
    [] child_rip+0xa/0x20
    [] ? restore_args+0x0/0x30
    [] ? add_wait_queue+0x15/0x44
    [] ? kthread+0x0/0x82
    [] ? child_rip+0x0/0x20

    Signed-off-by: David Howells

    David Howells
     

17 Jun, 2009

2 commits

  • radix_tree_lookup() and radix_tree_lookup_slot() have much the
    same code except for the return value.

    Introduce radix_tree_lookup_element() to do the real work.

    /*
    * is_slot == 1 : search for the slot.
    * is_slot == 0 : search for the node.
    */
    static void * radix_tree_lookup_element(struct radix_tree_root *root,
    unsigned long index, int is_slot);

    Signed-off-by: Huang Shijie
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
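
The refactoring described in this commit can be pictured with a small user-space sketch. It is not the kernel code: `toy_root`, `toy_lookup_element()` and the one-level slot array are hypothetical stand-ins, and only the idea of a shared worker selected by `is_slot` is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 64	/* hypothetical: a one-level "tree" with 64 slots */

struct toy_root {
	void *slots[TOY_SLOTS];
};

/*
 * Shared worker: is_slot != 0 returns the address of the slot,
 * is_slot == 0 returns the item stored in it. Empty slots and
 * out-of-range indices yield NULL in both modes.
 */
static void *toy_lookup_element(struct toy_root *root, unsigned long index,
				int is_slot)
{
	void **slot;

	if (index >= TOY_SLOTS)
		return NULL;
	slot = &root->slots[index];
	if (*slot == NULL)
		return NULL;
	return is_slot ? (void *)slot : *slot;
}

/* The two public entry points become thin wrappers around the worker. */
static void *toy_lookup(struct toy_root *root, unsigned long index)
{
	return toy_lookup_element(root, index, 0);
}

static void **toy_lookup_slot(struct toy_root *root, unsigned long index)
{
	return (void **)toy_lookup_element(root, index, 1);
}
```

The wrappers differ only in the flag they pass, which is the duplication the commit set out to remove.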
     
  • The counterpart of radix_tree_next_hole(). To be used by context readahead.

    Signed-off-by: Wu Fengguang
    Cc: Vladislav Bolkhovitin
    Cc: Jens Axboe
    Cc: Jeff Moyer
    Cc: Nick Piggin
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

08 Jan, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
    trivial: chack -> check typo fix in main Makefile
    trivial: Add a space (and a comma) to a printk in 8250 driver
    trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
    trivial: Fix misspelling of "firmware" in powerpc Makefile
    trivial: Fix misspelling of "firmware" in usb.c
    trivial: Fix misspelling of "firmware" in qla1280.c
    trivial: Fix misspelling of "firmware" in a100u2w.c
    trivial: Fix misspelling of "firmware" in megaraid.c
    trivial: Fix misspelling of "firmware" in ql4_mbx.c
    trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
    trivial: Fix misspelling of "firmware" in ipw2100.c
    trivial: Fix misspelling of "firmware" in atmel.c
    trivial: Fix misspelled firmware in Kconfig
    trivial: fix an -> a typos in documentation and comments
    trivial: fix then -> than typos in comments and documentation
    trivial: update Jesper Juhl CREDITS entry with new email
    trivial: fix singal -> signal typo
    trivial: Fix incorrect use of "loose" in event.c
    trivial: printk: fix indentation of new_text_line declaration
    trivial: rtc-stk17ta8: fix sparse warning
    ...

    Linus Torvalds
     

07 Jan, 2009

1 commit

  • radix_tree_preloads is unused outside of this file, make it static.

    Noticed by sparse:
    lib/radix-tree.c:84:1: warning: symbol 'per_cpu__radix_tree_preloads' was not declared. Should it be static?

    Signed-off-by: Harvey Harrison
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     

06 Jan, 2009

1 commit


27 Jul, 2008

2 commits

  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexers. Nobody uses this "feature", nor does anybody use
    the passed kmem cache in a non-trivial way, so pass only a pointer to the object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Introduce gang_lookup_slot() and gang_lookup_slot_tag() functions, which
    are used by lockless pagecache.

    Signed-off-by: Nick Piggin
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Hugh Dickins
    Cc: "Paul E. McKenney"
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

05 Jul, 2008

1 commit

  • Remove all clameter@sgi.com addresses from the kernel tree since they will
    become invalid on June 27th. Change my maintainer email address for the
    slab allocators to cl@linux-foundation.org (which will be the new email
    address for the future).

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Stephen Rothwell
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

13 Jun, 2008

1 commit

  • We shrink a radix tree when its root node has only one child, in the left
    most slot. The child becomes the new root node. To perform this
    operation in a manner compatible with concurrent lockless lookups, we
    atomically switch the root pointer from the parent to its child.

    However a concurrent lockless lookup may now have loaded a pointer to the
    parent (and is presently deciding what to do next). For this reason, we
    also have to keep the parent node in a valid state after shrinking the
    tree, until the next RCU grace period -- otherwise this lookup with the
    parent pointer may not do the right thing. Notably, we need to keep the
    child in the left most slot there in case that is requested by the lookup.

    This is all pretty standard RCU stuff. It is worth repeating because in
    my eagerness to obey the radix tree node constructor scheme, I had broken
    it by zeroing the radix tree node before the grace period.

    What could happen is that a lookup can load the parent pointer, then
    decide it wants to follow the left most child slot, only to find the slot
    contained NULL due to the concurrent shrinker having zeroed the parent
    node before waiting for a grace period. The lookup would return a false
    negative as a result.

    Fix it by doing that clearing in the RCU callback. I would normally want
    to rip out the constructor entirely, but radix tree nodes are one of those
    places where they make sense (only a few cachelines will be touched soon
    after allocation).

    This was never actually found in any lockless pagecache testing or by the
    test harness, but by seeing the odd problem with my scalable vmap rewrite.
    I have not tickled the test harness into reproducing it yet, but I'll
    keep working at it.

    Fortunately, it is not a problem anywhere lockless pagecache is used in
    mainline kernels (pagecache probe is not a guarantee, and brd does not
    have concurrent lookups and deletes).

    Signed-off-by: Nick Piggin
    Acked-by: Peter Zijlstra
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
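
The ordering rule in this commit (switch the root first, clear the old parent only after the grace period) can be illustrated with a single-threaded simulation. Everything here is an assumption made for illustration: `toy_node`, the `toy_deferred` list and `toy_grace_period_elapsed()` stand in for the radix tree node, the RCU callback queue and the end of a grace period; real kernel code would queue the callback with `call_rcu()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 4

struct toy_node {
	void *slots[TOY_SLOTS];
	struct toy_node *next_deferred;	/* stand-in for struct rcu_head */
};

static struct toy_node *toy_root;	/* pointer that lookups start from */
static struct toy_node *toy_deferred;	/* nodes awaiting the grace period */

/* Shrink: publish the child as the new root, leave the parent intact. */
static void toy_shrink(void)
{
	struct toy_node *parent = toy_root;

	toy_root = parent->slots[0];
	/* Do NOT zero parent here: a lockless reader may still hold it. */
	parent->next_deferred = toy_deferred;
	toy_deferred = parent;
}

/* Stand-in for the RCU callbacks that run after a grace period. */
static void toy_grace_period_elapsed(void)
{
	while (toy_deferred) {
		struct toy_node *node = toy_deferred;

		toy_deferred = node->next_deferred;
		memset(node, 0, sizeof(*node));	/* clearing is safe only now */
	}
}
```

A reader that loaded the old parent before `toy_shrink()` still finds the child in slot 0 until `toy_grace_period_elapsed()` runs, which is the guarantee the buggy version broke by zeroing too early.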
     

28 Apr, 2008

1 commit

  • Migrate flags must be set on slab creation as agreed upon when the antifrag
    logic was reviewed. Otherwise some slabs of a slabcache will end up in the
    unmovable and others in the reclaimable section depending on which flag was
    active when a new slab page was allocated.

    This likely slid in somehow when antifrag was merged. Remove it.

    The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
    SLAB_RECLAIM_ACCOUNT option is set. The set_migrateflags() never had any
    effect there.

    Radix tree allocations are not directly reclaimable but they are allocated
    with __GFP_RECLAIMABLE set on each allocation. We now set
    SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
    tree slabs are consistently placed in the reclaimable section. Radix tree
    slabs will also be accounted as such.

    There is then no user left of set_migrateflags(). So remove it.

    Signed-off-by: Christoph Lameter
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

06 Feb, 2008

1 commit

  • Most pagecache (and some other) radix tree insertions have the great
    opportunity to preallocate a few nodes with relaxed gfp flags. But the
    preallocation is squandered: when it comes time to allocate a node, we
    default to first attempting a GFP_ATOMIC allocation -- that doesn't
    normally fail, but it can eat into atomic memory reserves that we don't
    need to be using.

    Another upshot of this is that it removes the sometimes highly contended
    zone->lock from underneath tree_lock. Pagecache insertions are always
    performed with a radix tree preload, and after this change, such a
    situation will never fall back to kmem_cache_alloc within
    radix_tree_node_alloc.

    David Miller reports seeing this allocation fail on a highly threaded
    sparc64 system:

    [527319.459981] dd: page allocation failure. order:0, mode:0x20
    [527319.460403] Call Trace:
    [527319.460568] [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
    [527319.460636] [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
    [527319.460698] [000000000055309c] radix_tree_node_alloc+0x20/0x90
    [527319.460763] [0000000000553238] radix_tree_insert+0x12c/0x260
    [527319.460830] [0000000000495cd0] add_to_page_cache+0x38/0xb0
    [527319.460893] [00000000004e4794] mpage_readpages+0x6c/0x134
    [527319.460955] [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
    [527319.461028] [000000000049cc88] ondemand_readahead+0x208/0x214
    [527319.461094] [0000000000496018] do_generic_mapping_read+0xe8/0x428
    [527319.461152] [0000000000497948] generic_file_aio_read+0x108/0x170
    [527319.461217] [00000000004badac] do_sync_read+0x88/0xd0
    [527319.461292] [00000000004bb5cc] vfs_read+0x78/0x10c
    [527319.461361] [00000000004bb920] sys_read+0x34/0x60
    [527319.461424] [0000000000406294] linux_sparc_syscall32+0x3c/0x40

    The calltrace is significant: __do_page_cache_readahead allocates a number
    of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
    memory to satisfy GFP_ATOMIC allocations. However after the list of pages
    goes to mpage_readpages, there can be significant intervals (including disk
    IO) before all the pages are inserted into the radix-tree. So the reserves
    can easily be depleted at that point. The patch is confirmed to fix the
    problem.

    Signed-off-by: Nick Piggin
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
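
The allocation order this commit argues for (use the preloaded nodes first, fall back to the allocator only when none are left) might look roughly like the sketch below. It is a user-space approximation under stated assumptions: `toy_preload()` plays the role of `radix_tree_preload()`, a single global pool replaces the kernel's per-CPU pool, and `calloc()` replaces `kmem_cache_alloc()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PRELOAD_SIZE 4	/* hypothetical pool depth */

struct toy_node {
	void *slots[4];
};

struct toy_preload {
	int nr;
	struct toy_node *nodes[TOY_PRELOAD_SIZE];
};

static struct toy_preload toy_pool;	/* per-CPU in the kernel; global here */
static int toy_fallback_allocs;		/* counts allocator calls made "under lock" */

/* Called before taking the lock, where a relaxed allocation is fine. */
static int toy_preload(void)
{
	while (toy_pool.nr < TOY_PRELOAD_SIZE) {
		struct toy_node *node = calloc(1, sizeof(*node));

		if (!node)
			return -1;
		toy_pool.nodes[toy_pool.nr++] = node;
	}
	return 0;
}

/* Called under the lock: take from the pool first, allocate only if empty. */
static struct toy_node *toy_node_alloc(void)
{
	if (toy_pool.nr > 0) {
		struct toy_node *node = toy_pool.nodes[--toy_pool.nr];

		toy_pool.nodes[toy_pool.nr] = NULL;
		return node;
	}
	toy_fallback_allocs++;
	return calloc(1, sizeof(struct toy_node));
}
```

As long as the caller preloaded, `toy_node_alloc()` never reaches the allocator, so in this sketch neither the atomic reserves nor a contended allocator lock is touched while the tree lock is held.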
     

17 Oct, 2007

6 commits

  • Negative shifts are not allowed in C (the result is undefined). Same thing
    with full-width shifts.

    It works on most platforms but not on the VAX with gcc 4.0.1 (it results in an
    "operand reserved" fault).

    Shifting by more than the width of the value on the left is also not
    allowed. I think the extra '>> 1' tacked on at the end in the original
    code was an attempt to work around that. Getting rid of that is an extra
    feature of this patch.

    Here's the chapter and verse, taken from the final draft of the C99
    standard ("6.5.7 Bitwise shift operators", paragraph 3):

    "The integer promotions are performed on each of the operands. The
    type of the result is that of the promoted left operand. If the
    value of the right operand is negative or is greater than or equal
    to the width of the promoted left operand, the behavior is
    undefined."

    Thank you to Jan-Benedict Glaw, Christoph Hellwig, Maciej Rozycki, Pekka
    Enberg, Andreas Schwab, and Christoph Lameter for review. Special thanks
    to Andreas for spotting that my fix only removed half the undefined
    behaviour.

    Signed-off-by: Peter Lund
    Christoph Lameter
    Cc: Christoph Hellwig
    Cc: "Maciej W. Rozycki"
    Cc: Pekka Enberg
    Cc: Andreas Schwab
    Cc: Nick Piggin
    Cc: WU Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Lund
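
One way to compute the largest index for a given tree height without ever shifting by a negative amount or by the full width is to test the shift count first. The following is a user-space sketch in that spirit; `toy_maxindex()`, `TOY_MAP_SHIFT` and `TOY_INDEX_BITS` are illustrative names, and the map shift of 6 is an assumption.

```c
#include <assert.h>
#include <limits.h>

#define TOY_MAP_SHIFT	6	/* assumed: 64 slots per node */
#define TOY_INDEX_BITS	((int)(CHAR_BIT * sizeof(unsigned long)))

/* Largest index a tree of the given height can hold. */
static unsigned long toy_maxindex(unsigned int height)
{
	unsigned int width = height * TOY_MAP_SHIFT;
	int shift = TOY_INDEX_BITS - (int)width;

	if (shift < 0)
		return ~0UL;	/* tree spans the whole index space */
	if (shift >= TOY_INDEX_BITS)
		return 0UL;	/* height 0: only index 0 fits */
	return ~0UL >> shift;	/* here 0 <= shift < width of unsigned long */
}
```

Both guards are needed: the first covers the negative shift, the second the full-width shift that the C99 paragraph quoted above also leaves undefined.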
     
  • Slab constructors currently have a flags parameter that is never used. And
    the order of the arguments is opposite to other slab functions. The object
    pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch marks a number of allocations that are either short-lived such as
    network buffers or are reclaimable such as inode allocations. When something
    like updatedb is called, long-lived and unmovable kernel allocations tend to
    be spread throughout the address space which increases fragmentation.

    This patch groups these allocations together as much as possible by adding a
    new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
    reclaimed on demand, but not moved. i.e. they can be migrated by deleting
    them and re-reading the information from elsewhere.

    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • A while back, Nick Piggin introduced a patch to reduce the node memory
    usage for small files (commit cfd9b7df4abd3257c9e381b0e445817b26a51c0c):

    -#define RADIX_TREE_MAP_SHIFT 6
    +#define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)

    Unfortunately, he didn't take into account the fact that the
    calculation of the maximum path was based on an assumption of having
    to round up:

    #define RADIX_TREE_MAX_PATH (RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT + 2)

    So, if CONFIG_BASE_SMALL is set, you will end up with a
    RADIX_TREE_MAX_PATH that is one greater than necessary. The practical
    upshot of this is just a bit of wasted memory (one long in the
    height_to_maxindex array, an extra pre-allocated radix tree node per
    cpu, and extra stack usage in a couple of functions), but it seems
    worth getting right.

    It's also worth noting that I never build with CONFIG_BASE_SMALL.
    What I did to test this was duplicate the code in a small user-space
    program and check the results of the calculations for max path and the
    contents of the height_to_maxindex array.

    Signed-off-by: Jeff Moyer
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
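
The user-space check mentioned in this commit can be approximated as follows. This is a sketch, not the author's test program: the function names are made up, and it assumes the table needs one entry per height from 0 up to the index width divided by the map shift, rounded up.

```c
#include <assert.h>

/* Original formula: truncating division plus 2. */
static unsigned int toy_max_path_old(unsigned int index_bits,
				     unsigned int map_shift)
{
	return index_bits / map_shift + 2;
}

/* Entries actually needed: ceil(index_bits / map_shift) levels plus height 0. */
static unsigned int toy_heights_needed(unsigned int index_bits,
				       unsigned int map_shift)
{
	return (index_bits + map_shift - 1) / map_shift + 1;
}
```

With 64 index bits, a shift of 6 gives 12 for both functions, because 64 is not a multiple of 6 and the "+ 2" absorbs the rounding. A shift of 4 divides 64 exactly, so the old formula gives 18 where 17 entries suffice, which is the one-entry excess the commit describes.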
     
  • Rather than sign direct radix-tree pointers with a special bit, sign the
    indirect one that hangs off the root. This means that, given a lookup_slot
    operation, the invalid result will be differentiated from the valid
    (previously, valid results could have the bit either set or clear).

    This does not affect slot lookups which occur under lock -- they can never
    return an invalid result. This is needed in future for lockless pagecache.

    Signed-off-by: Nick Piggin
    Acked-by: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
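
Marking a pointer with a special bit, as this commit does for the indirect pointer hanging off the root, relies on the low bit of a suitably aligned pointer being zero. A minimal sketch of the technique is shown below; the helper names and the choice of bit 0 are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_INDIRECT_BIT ((uintptr_t)1)	/* assumed: bit 0 is free due to alignment */

/* Mark a pointer to an internal node as "indirect". */
static void *toy_ptr_to_indirect(void *ptr)
{
	return (void *)((uintptr_t)ptr | TOY_INDIRECT_BIT);
}

/* Strip the mark to recover the real node address. */
static void *toy_indirect_to_ptr(void *ptr)
{
	return (void *)((uintptr_t)ptr & ~TOY_INDIRECT_BIT);
}

/* Non-zero if the pointer refers to an internal node rather than an item. */
static int toy_is_indirect_ptr(const void *ptr)
{
	return (int)((uintptr_t)ptr & TOY_INDIRECT_BIT);
}
```

Because only node pointers carry the bit, a value read from a slot can be classified on its own: marked means "this is a node, not a user item", unmarked means a direct item.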
     
  • Introduce radix_tree_next_hole(root, index, max_scan) to scan radix tree for
    the first hole. It will be used in interleaved readahead.

    The implementation is dumb and obviously correct. It can help debug (and
    document) a possible smarter one in future.

    Cc: Nick Piggin
    Signed-off-by: Fengguang Wu
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
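
A "dumb and obviously correct" hole scan in the sense of this commit can be sketched as a plain linear probe. The version below runs in user space over a flat array instead of a radix tree; `toy_present()`, `toy_next_hole()` and `TOY_SIZE` are hypothetical, and only the (index, max_scan) parameters follow the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SIZE 16	/* hypothetical: indices >= TOY_SIZE are always empty */

/* Stand-in for a tree lookup: returns the item at index, or NULL. */
static void *toy_present(void *const *items, unsigned long index)
{
	return index < TOY_SIZE ? items[index] : NULL;
}

/*
 * Probe at most max_scan consecutive indices starting at index and
 * return the first empty one. If no hole is found in that range, the
 * return value is the index just past the scanned range.
 */
static unsigned long toy_next_hole(void *const *items, unsigned long index,
				   unsigned long max_scan)
{
	unsigned long i;

	for (i = 0; i < max_scan; i++) {
		if (toy_present(items, index) == NULL)
			break;
		index++;
		if (index == 0)	/* wrapped around the index space */
			break;
	}
	return index;
}
```

Each probe is an independent lookup, so the cost is linear in `max_scan`; the simplicity is what makes it usable as a reference for a smarter implementation.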
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

14 Jul, 2007

1 commit

  • XFS filestreams functionality uses radix trees and the preload
    functions. XFS can be built as a module and hence we need
    radix_tree_preload() exported. radix_tree_preload_end() is a
    static inline, so it doesn't need exporting.

    Signed-Off-By: Dave Chinner
    Signed-Off-By: Tim Shimmin

    David Chinner
     

10 May, 2007

1 commit

  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki