25 Apr, 2010

3 commits

  • Add a missing EXPORT_SYMBOL.

    I must be the first person who wants to use this function :-)

    Signed-off-by: Hans Verkuil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans Verkuil
     
  • This patch fixes 2 issues with the LZO decompressor:

    - It doesn't handle the case where a block isn't compressed at all. In
    this case, calling lzo1x_decompress_safe() will fail, so we need to just
    use memcpy() instead (the upstream LZO code does something similar).

    - Since commit 54291362d2a5738e1b0495df2abcb9e6b0563a3f ("initramfs: add
    missing decompressor error check"), the decompressor return code is
    checked in init/initramfs.c. The LZO decompressor didn't return the
    expected value, causing the initramfs code to falsely believe a
    decompression error occurred.

    Signed-off-by: Albin Tonnerre
    Tested-by: bert schulze
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Albin Tonnerre
     
  • memset() is called with the wrong address and the kernel panics.

    Signed-off-by: Changli Gao
    Cc: Patrick McHardy
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Changli Gao
     

16 Apr, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86/gart: Disable GART explicitly before initialization
    dma-debug: Cleanup for copy-loop in filter_write()
    x86/amd-iommu: Remove obsolete parameter documentation
    x86/amd-iommu: use for_each_pci_dev
    Revert "x86: disable IOMMUs on kernel crash"
    x86/amd-iommu: warn when issuing command to uninitialized cmd buffer
    x86/amd-iommu: enable iommu before attaching devices
    x86/amd-iommu: Use helper function to destroy domain
    x86/amd-iommu: Report errors in acpi parsing functions upstream
    x86/amd-iommu: Pt mode fix for domain_destroy
    x86/amd-iommu: Protect IOMMU-API map/unmap path
    x86/amd-iommu: Remove double NULL check in check_device

    Linus Torvalds
     

15 Apr, 2010

1 commit

  • Commit ef0658f3de484bf9b173639cd47544584e01efa5 changed precision
    from int to s8.

    There is existing kernel code that uses a larger precision.

    An example from the audit code:
    vsnprintf(...,..., " msg='%.1024s'", (char *)data);
    which overflows precision and truncates to nothing.

    Extending precision size fixes the audit system issue.

    Other changes:

    - Change the size of struct printf_spec.type from u16 to u8 so
    sizeof(struct printf_spec) stays as small as possible.

    - Reorder the struct members so sizeof(struct printf_spec) remains 64
    bits without alignment holes.

    - Document the struct members a bit more.

    Original-patch-by: Eric Paris
    Signed-off-by: Joe Perches
    Tested-by: Justin P. Mattock
    Signed-off-by: Linus Torvalds

    Joe Perches
     

13 Apr, 2010

3 commits


10 Apr, 2010

2 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
    cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
    loop: Update mtime when writing using aops
    block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
    backing-dev: Handle class_create() failure
    Block: Fix block/elevator.c elevator_get() off-by-one error
    drbd: lc_element_by_index() never returns NULL
    cciss: unlock on error path
    cfq-iosched: Do not merge queues of BE and IDLE classes
    cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
    i2o: Remove the dangerous kobj_to_i2o_device macro
    block: remove 16 bytes of padding from struct request on 64bits
    cfq-iosched: fix a kbuild regression
    block: make CONFIG_BLK_CGROUP visible
    Remove GENHD_FL_DRIVERFS
    block: Export max number of segments and max segment size in sysfs
    block: Finalize conversion of block limits functions
    block: Fix overrun in lcm() and move it to lib
    vfs: improve writeback_inodes_wb()
    paride: fix off-by-one test
    drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
    ...

    Linus Torvalds
     
  • radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
    or radix_tree_tag_clear(). The problem is that the double tag_get() in
    radix_tree_tag_get():

        if (!tag_get(node, tag, offset))
                saw_unset_tag = 1;
        if (height == 1) {
                int ret = tag_get(node, tag, offset);

    may see the value change due to the action of set/clear. RCU is no protection
    against this as no pointers are being changed, no nodes are being replaced
    according to a COW protocol - set/clear alter the node directly.

    The documentation in linux/radix-tree.h, however, says that
    radix_tree_tag_get() is an exception to the rule that "any function modifying
    the tree or tags (...) must exclude other modifications, and exclude any
    functions reading the tree".

    The problem is that the next statement in radix_tree_tag_get() checks that the
    tag doesn't vary over time:

    BUG_ON(ret && saw_unset_tag);

    This has been seen happening in FS-Cache:

    https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html

    To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
    comments that the value of the tag may change whilst the RCU read lock is held,
    and thus that the return value of radix_tree_tag_get() may not be relied upon
    unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
    running concurrently with it.

    Reported-by: Romain DEGEZ
    Signed-off-by: David Howells
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    David Howells
     

08 Apr, 2010

1 commit

  • rwsems can be used with IRQs disabled, particularly in early boot
    before IRQs are enabled. Currently the spin_unlock_irq() usage in the
    slowpath will unconditionally enable interrupts and cause problems
    since interrupts are not yet initialized or enabled.

    This patch uses save/restore versions of IRQ spinlocks in the slowpath
    to ensure interrupts are not unintentionally enabled.

    Signed-off-by: Kevin Hilman
    Signed-off-by: Linus Torvalds

    Kevin Hilman
     

07 Apr, 2010

5 commits


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, and for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

1 commit


19 Mar, 2010

1 commit


15 Mar, 2010

4 commits


14 Mar, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Provide generic perf_sample_data initialization
    MAINTAINERS: Add Arnaldo as tools/perf/ co-maintainer
    perf trace: Don't use pager if scripting
    perf trace/scripting: Remove extraneous header read
    perf, ARM: Modify kuser rmb() call to compile for Thumb-2
    x86/stacktrace: Don't dereference bad frame pointers
    perf archive: Don't try to collect files without a build-id
    perf_events, x86: Fixup fixed counter constraints
    perf, x86: Restrict the ANY flag
    perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE
    perf, x86: add some IBS macros to perf_event.h
    perf, x86: make IBS macros available in perf_event.h
    hw-breakpoints: Remove stub unthrottle callback
    x86/hw-breakpoints: Remove the name field
    perf: Remove pointless breakpoint union
    perf lock: Drop the buffers multiplexing dependency
    perf lock: Fix and add misc documentally things
    percpu: Add __percpu sparse annotations to hw_breakpoint

    Linus Torvalds
     

13 Mar, 2010

2 commits

  • inflate_fast() can do either POST INC or PRE INC on its pointers walking
    the memory to decompress. Default is PRE INC.

    The sout pointer offset was miscalculated in one case as the calculation
    assumed sout was a char *. This breaks inflate_fast() iff configured to
    do POST INC.

    Signed-off-by: Joakim Tjernlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joakim Tjernlund
     
  • Commit 6846ee5ca68d81e6baccf0d56221d7a00c1be18b ("zlib: Fix build of
    powerpc boot wrapper") made the new optimized inflate only available on
    arch's that define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.

    This patch will again enable the optimization for all arch's by defining
    our own endian independent version of unaligned access. As an added
    bonus, arch's that define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS do a
    plain load instead.

    Signed-off-by: Joakim Tjernlund
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: David Woodhouse
    Cc: Kumar Gala
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joakim Tjernlund
     

10 Mar, 2010

1 commit


08 Mar, 2010

3 commits

  • Constify struct sysfs_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing

    Signed-off-by: Emese Revfy
    Acked-by: David Teigland
    Acked-by: Matt Domsch
    Acked-by: Maciej Sosnowski
    Acked-by: Hans J. Koch
    Acked-by: Pekka Enberg
    Acked-by: Jens Axboe
    Acked-by: Stephen Hemminger
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     
  • Constify struct kset_uevent_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing

    Signed-off-by: Emese Revfy
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     
  • This reverts commit a069c266ae5fdfbf5b4aecf2c672413aa33b2504.

    It turns out that not only was it missing a case (XFS) that needed it,
    but perhaps more importantly, people sometimes want to enable new
    modules that they hadn't had enabled before, and if such a module uses
    list_sort(), it can't easily be inserted any more.

    So rather than add a "select LIST_SORT" to the XFS case, just leave it
    compiled in. It's not all _that_ big, after all, and the inconvenience
    isn't worth it.

    Requested-by: Alexey Dobriyan
    Cc: Christoph Hellwig
    Cc: Don Mullis
    Cc: Andrew Morton
    Cc: Dave Chinner
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

07 Mar, 2010

10 commits

  • This adds separate I/O and memory specs, so we don't have to change the
    field width in a shared spec, which then lets us make all the specs const
    and static, since they never change.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Add clues about what the SMALL and SPECIAL flags do.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Reducing the size of struct printf_spec is a good thing because multiple
    instances are commonly passed on stack.

    It's possible for type to be u8 and field_width to be s8, but this is
    likely small enough for now.

    Signed-off-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
    [LogFS] Change magic number
    [LogFS] Remove h_version field
    [LogFS] Check feature flags
    [LogFS] Only write journal if dirty
    [LogFS] Fix bdev erases
    [LogFS] Silence gcc
    [LogFS] Prevent 64bit divisions in hash_index
    [LogFS] Plug memory leak on error paths
    [LogFS] Add MAINTAINERS entry
    [LogFS] add new flash file system

    Fixed up trivial conflict in lib/Kconfig, and a semantic conflict in
    fs/logfs/inode.c introduced by write_inode() being changed to use
    writeback_control by commit a9185b41a4f84971b930c519f0c63bd450c4810d
    ("pass writeback_control to ->write_inode")

    Linus Torvalds
     
  • Signed-off-by: Joakim Tjernlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joakim Tjernlund
     
  • Replace open-coded loop with for_each_set_bit().

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The function name must be followed by a space, hyphen, space, and a short
    description.

    Signed-off-by: Ben Hutchings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Hutchings
     
  • Build list_sort() only for configs that need it -- configs that don't
    need it save ~581 bytes (i386).

    Signed-off-by: Don Mullis
    Cc: Dave Airlie
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis
     
  • Clarify and correct header comment of list_sort().

    Signed-off-by: Don Mullis
    Cc: Dave Airlie
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis
     
  • XFS and UBIFS can pass long lists to list_sort(); this alternative
    implementation scales better, reaching ~3x performance gain when list
    length exceeds the L2 cache size.

    Stand-alone program timings were run on a Core 2 duo L1=32KB L2=4MB,
    gcc-4.4, with flags extracted from an Ubuntu kernel build. Object size is
    581 bytes compared to 455 for Mark J. Roberts' code.

    Worst case for either implementation is a list length just over a power of
    two, and to roughly the same degree, so here are timing results for a
    range of 2^N+1 lengths. List elements were 16 bytes each including malloc
    overhead; initial order was random.

    time (msec):

    loop_count    length  Tatham-Roberts  generic-Mullis-v2  ratio
       4000000         2             206                294  1.427
       2000000         3             176                227  1.289
       1000000         5             199                172  0.864
        500000         9             235                178  0.757
        250000        17             243                182  0.748
        125000        33             261                196  0.750
         62500        65             277                209  0.754
         31250       129             292                219  0.75
         15625       257             317                235  0.741
          7812       513             340                252  0.741
          3906      1025             362                267  0.737
          1953      2049             388                283  0.729  ~ L1 size
           976      4097             556                323  0.580
           488      8193             678                361  0.532
           244     16385             773                395  0.510
           122     32769             844                418  0.495
            61     65537             917                454  0.495
            30    131073            1128                543  0.481
            15    262145            2355                869  0.369  ~ L2 size
             7    524289            5597               1714  0.306
             3   1048577            6218               2022  0.325

    Mark's code does not actually implement the usual or generic mergesort,
    but rather a variant from Simon Tatham described here:

    http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html

    Simon's algorithm performs O(log N) passes over the entire input list,
    doing merges of sublists that double in size on each pass. The generic
    algorithm instead merges pairs of equal length lists as early as possible,
    in recursive order. For either algorithm, the elements that extend the
    list beyond power-of-two length are a special case, handled as nearly as
    possible as a "rounding-up" to a full POT.

    Some intuition for the locality of reference implications of merge order
    may be gotten by watching this animation:

    http://www.sorting-algorithms.com/merge-sort

    Simon's algorithm requires only O(1) extra space rather than the generic
    algorithm's O(log N), but in my non-recursive implementation the actual
    O(log N) data is merely a vector of ~20 pointers, which I've put on the
    stack.

    Long-running list_sort() calls: If the list passed in may be long, or the
    client's cmp() callback function is slow, the client's cmp() may
    periodically invoke cond_resched() to voluntarily yield the CPU. All
    inner loops of list_sort() call back to cmp().

    Stability of the sort: distinct elements that compare equal emerge from
    the sort in the same order as with Mark's code, for simple test cases. A
    boot-time test is provided to verify this and other correctness
    requirements.

    A kernel that uses drm.ko appears to run normally with this change; I have
    no suitable hardware to similarly test the use by UBIFS.

    [akpm@linux-foundation.org: style tweaks, fix comment, make list_sort_test __init]
    Signed-off-by: Don Mullis
    Cc: Dave Airlie
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis