07 Mar, 2010

2 commits

  • Clarify and correct header comment of list_sort().

    Signed-off-by: Don Mullis
    Cc: Dave Airlie
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis
     
  • XFS and UBIFS can pass long lists to list_sort(); this alternative
    implementation scales better, reaching a ~3x performance gain once the
    list no longer fits in the L2 cache.

    Timings of a stand-alone program were taken on a Core 2 Duo (L1 = 32 KB,
    L2 = 4 MB), compiled with gcc-4.4 using flags extracted from an Ubuntu
    kernel build. Object size is 581 bytes, compared to 455 for Mark J.
    Roberts' code.

    The worst case for either implementation is a list length just over a
    power of two, and both are affected to roughly the same degree, so the
    timing results below cover a range of 2^N+1 lengths. List elements were
    16 bytes each, including malloc overhead; initial order was random.

                              time (msec)
                              Tatham-Roberts
                              |       generic-Mullis-v2
    loop_count   length       |       |       ratio
    4000000      2            206     294     1.427
    2000000      3            176     227     1.289
    1000000      5            199     172     0.864
    500000       9            235     178     0.757
    250000       17           243     182     0.748
    125000       33           261     196     0.750
    62500        65           277     209     0.754
    31250        129          292     219     0.750
    15625        257          317     235     0.741
    7812         513          340     252     0.741
    3906         1025         362     267     0.737
    1953         2049         388     283     0.729   ~ L1 size
    976          4097         556     323     0.580
    488          8193         678     361     0.532
    244          16385        773     395     0.510
    122          32769        844     418     0.495
    61           65537        917     454     0.495
    30           131073       1128    543     0.481
    15           262145       2355    869     0.369   ~ L2 size
    7            524289       5597    1714    0.306
    3            1048577      6218    2022    0.325

    Mark's code does not actually implement the usual or generic mergesort,
    but rather a variant from Simon Tatham described here:

    http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html

    Simon's algorithm performs O(log N) passes over the entire input list,
    doing merges of sublists that double in size on each pass. The generic
    algorithm instead merges pairs of equal length lists as early as possible,
    in recursive order. For either algorithm, the elements that extend the
    list beyond a power-of-two (POT) length are a special case, handled as
    nearly as possible as a "rounding-up" to a full POT.
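
    The pass-based scheme described above can be sketched roughly as follows.
    This is a minimal user-space illustration on a singly linked list of
    integers, not the kernel code: struct node and sort_by_passes() are
    hypothetical names, and the structure follows the description on Simon
    Tatham's page rather than Mark's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type, for illustration only. */
struct node {
    int key;
    struct node *next;
};

/*
 * Pass-based merge sort: each pass walks the whole list and merges
 * adjacent runs of length k; k doubles after every pass. Uses O(1)
 * extra space, but touches every element O(log N) times, one full
 * sweep per pass.
 */
struct node *sort_by_passes(struct node *list)
{
    size_t k = 1;

    if (!list)
        return NULL;

    for (;;) {
        struct node *p = list, *tail = NULL;
        size_t nmerges = 0;

        list = NULL;
        while (p) {
            struct node *q = p;
            size_t psize = 0, qsize = k;

            nmerges++;
            /* Advance q up to k nodes past p; psize is the run length. */
            while (psize < k && q) {
                psize++;
                q = q->next;
            }
            /* Merge the run at p (psize nodes) with the run at q. */
            while (psize > 0 || (qsize > 0 && q)) {
                struct node *e;

                if (psize == 0) {
                    e = q; q = q->next; qsize--;
                } else if (qsize == 0 || !q) {
                    e = p; p = p->next; psize--;
                } else if (p->key <= q->key) {
                    /* Ties take the earlier run first, keeping stability. */
                    e = p; p = p->next; psize--;
                } else {
                    e = q; q = q->next; qsize--;
                }
                if (tail)
                    tail->next = e;
                else
                    list = e;
                tail = e;
            }
            p = q;  /* the next pair of runs starts where q stopped */
        }
        assert(tail != NULL);  /* non-empty input: something was appended */
        tail->next = NULL;

        if (nmerges <= 1)
            return list;  /* a single merge covered the whole list */
        k *= 2;
    }
}
```

    Because every pass sweeps the entire list, a list larger than the cache
    is pulled through it once per pass, which is consistent with the widening
    gap in the timings above.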

    Some intuition for the locality-of-reference implications of merge order
    can be gained by watching this animation:

    http://www.sorting-algorithms.com/merge-sort

    Simon's algorithm requires only O(1) extra space rather than the generic
    algorithm's O(log N), but in my non-recursive implementation the actual
    O(log N) data is merely a vector of ~20 pointers, which I've put on the
    stack.
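
    One way to picture the "vector of ~20 pointers" is the sketch below: a
    non-recursive merge sort that keeps one pending sorted sublist per
    power-of-two size on the stack. It is a simplified user-space
    illustration under stated assumptions, not the patch itself. struct node,
    merge(), sort_with_pending() and MAX_PENDING are hypothetical names, and
    the list is singly linked with integer keys.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 20  /* assumed bound: balanced merges up to 2^20 nodes */

/* Hypothetical element type, for illustration only. */
struct node {
    int key;
    struct node *next;
};

/* Merge two sorted lists; on ties, nodes from a come first (stable). */
static struct node *merge(struct node *a, struct node *b)
{
    struct node head, *tail = &head;

    while (a && b) {
        if (a->key <= b->key) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return head.next;
}

struct node *sort_with_pending(struct node *list)
{
    /*
     * part[i] holds a sorted sublist of 2^i nodes, or NULL. The last
     * slot is never filled, so the carry loop below always terminates.
     */
    struct node *part[MAX_PENDING + 1] = { NULL };
    int lev, max_lev = 0;

    while (list) {
        struct node *cur = list;

        list = list->next;
        cur->next = NULL;

        /* Like binary increment: merge equal-sized sublists at once. */
        for (lev = 0; part[lev]; lev++) {
            cur = merge(part[lev], cur);
            part[lev] = NULL;
        }
        if (lev > max_lev) {
            if (lev >= MAX_PENDING)
                lev--;  /* very long list: keep merging into the top slot */
            max_lev = lev;
        }
        part[lev] = cur;
    }

    assert(max_lev < MAX_PENDING);

    /* Fold the leftover sublists together, smallest (newest) first. */
    for (lev = 0; lev <= max_lev; lev++)
        if (part[lev])
            list = merge(part[lev], list);

    return list;
}
```

    Equal-sized sublists are merged as soon as both exist, so most merges
    work on recently touched nodes. The final loop handles the elements
    beyond a power-of-two length.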

    Long-running list_sort() calls: If the list passed in may be long, or the
    client's cmp() callback function is slow, the client's cmp() may
    periodically invoke cond_resched() to voluntarily yield the CPU. All
    inner loops of list_sort() call back to cmp().
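
    The idea of yielding from inside the comparison callback can be
    illustrated in user space. The sketch below is only an analogue:
    sched_yield() stands in for the kernel's cond_resched() (the two are not
    equivalent), qsort() stands in for list_sort(), and cmp_int_yielding(),
    maybe_yield() and YIELD_INTERVAL are hypothetical names chosen for this
    example.

```c
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

#define YIELD_INTERVAL 1024  /* assumed: offer to yield every 1024 compares */

static unsigned long cmp_calls;  /* total comparisons performed so far */

/* User-space stand-in for cond_resched(): occasionally offer up the CPU. */
static void maybe_yield(void)
{
    if (++cmp_calls % YIELD_INTERVAL == 0)
        sched_yield();
}

/*
 * Comparison callback that periodically yields. Since the sort calls
 * back here from every inner loop, this bounds how long the sort can
 * run without offering to reschedule.
 */
int cmp_int_yielding(const void *a, const void *b)
{
    assert(a != NULL && b != NULL);

    int x = *(const int *)a;
    int y = *(const int *)b;

    maybe_yield();
    return (x > y) - (x < y);
}
```

    A caller would pass it like any other comparator, for example
    qsort(v, n, sizeof v[0], cmp_int_yielding). The sort routine itself needs
    no knowledge of scheduling.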

    Stability of the sort: distinct elements that compare equal emerge from
    the sort in the same order as with Mark's code, for simple test cases. A
    boot-time test is provided to verify this and other correctness
    requirements.
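
    A check of this kind might look like the following sketch. It assumes the
    property being verified is conventional stability (elements with equal
    keys keep their input order) and that each element carries a serial
    number recording its original position. struct elem and
    sorted_and_stable() are hypothetical names, not the boot-time test added
    by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test element: serial records the position before sorting. */
struct elem {
    int key;
    int serial;
};

/*
 * Return true if v[0..n-1] is in non-decreasing key order and elements
 * with equal keys still appear in increasing serial (input) order.
 */
bool sorted_and_stable(const struct elem *v, size_t n)
{
    assert(v != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        if (v[i - 1].key > v[i].key)
            return false;  /* out of order */
        if (v[i - 1].key == v[i].key && v[i - 1].serial > v[i].serial)
            return false;  /* equal keys were reordered */
    }
    return true;
}
```

    A test would fill an array with random keys and ascending serials, sort
    it, and then pass the result to this check.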

    A kernel that uses drm.ko appears to run normally with this change; I have
    no suitable hardware to similarly test the use by UBIFS.

    [akpm@linux-foundation.org: style tweaks, fix comment, make list_sort_test __init]
    Signed-off-by: Don Mullis
    Cc: Dave Airlie
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis
     

13 Jan, 2010

1 commit

  • There are two copies of list_sort() in the tree already, one in the DRM
    code, another in ubifs. Now XFS needs this as well. Create a generic
    list_sort() function from the ubifs version and convert existing users
    to it so we don't end up with yet another copy in the tree.

    Signed-off-by: Dave Chinner
    Acked-by: Dave Airlie
    Acked-by: Artem Bityutskiy
    Signed-off-by: Linus Torvalds

    Dave Chinner