Eric Lee / smarc-fsl-linux-kernel

20 Aug, 2019

1 commit

4333fb96c media: lib/sort.c: implement sort() variant taking context argument

Our list_sort() utility has always supported a context argument that
is passed through to the comparison routine. Now there's a use case
for a similar thing in sort().

This implements sort_r by simply extending the existing sort function
in the obvious way. To avoid code duplication, we want to implement
sort() in terms of sort_r(). The naive way to do that is

    static int cmp_wrapper(const void *a, const void *b, const void *ctx)
    {
            int (*real_cmp)(const void*, const void*) = ctx;
            return real_cmp(a, b);
    }

    sort(..., cmp) { sort_r(..., cmp_wrapper, cmp) }

but this would do two indirect calls for each comparison. Instead, do
as is done for the default swap functions - that only adds a cost of a
single easily predicted branch to each comparison call.
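
A minimal user-space sketch of that idea, not the kernel's code. All names here (CMP_WRAPPER, do_cmp, toy_sort, toy_sort_r) are hypothetical, and an insertion sort stands in for the real sorting loop. A reserved comparator value tells the dispatcher that the context slot carries the plain two-argument comparator, so each comparison costs one branch plus one indirect call instead of two indirect calls. To stay within portable C, the sketch passes a pointer to the function pointer rather than converting the function pointer itself to `void *`.

```c
#include <stddef.h>
#include <string.h>

typedef int (*cmp_func_t)(const void *a, const void *b);
typedef int (*cmp_r_func_t)(const void *a, const void *b, const void *priv);

/* Hypothetical sentinel: "priv points at a plain two-argument comparator". */
#define CMP_WRAPPER ((cmp_r_func_t)0)

static int do_cmp(const void *a, const void *b,
		  cmp_r_func_t cmp, const void *priv)
{
	if (cmp == CMP_WRAPPER)	/* the one extra, easily predicted branch */
		return (*(const cmp_func_t *)priv)(a, b);
	return cmp(a, b, priv);
}

/* Toy insertion sort standing in for the real sorting loop. */
static void toy_sort_r(void *base, size_t num, size_t size,
		       cmp_r_func_t cmp, const void *priv)
{
	char *p = base;
	char tmp[64];
	size_t i, j;

	if (size > sizeof(tmp))	/* this sketch only handles small elements */
		return;

	for (i = 1; i < num; i++) {
		memcpy(tmp, p + i * size, size);
		for (j = i; j > 0 && do_cmp(p + (j - 1) * size, tmp, cmp, priv) > 0; j--)
			memcpy(p + j * size, p + (j - 1) * size, size);
		memcpy(p + j * size, tmp, size);
	}
}

/* The context-free variant is a thin wrapper around the _r variant. */
static void toy_sort(void *base, size_t num, size_t size, cmp_func_t cmp)
{
	toy_sort_r(base, num, size, CMP_WRAPPER, &cmp);
}

/* Plain comparator: ascending ints. */
static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/* Context-taking comparator: priv points at +1 (ascending) or -1 (descending). */
static int cmp_int_dir(const void *a, const void *b, const void *priv)
{
	return *(const int *)priv * cmp_int(a, b);
}
```

Calling toy_sort(v, n, sizeof(int), cmp_int) sorts ascending, while toy_sort_r(v, n, sizeof(int), cmp_int_dir, &dir) with dir set to -1 sorts descending through the same loop.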

Aside from introducing support for the context argument, this also
serves as preparation for patches that will eliminate the indirect
comparison calls in common cases.

Requested-by: Boris Brezillon

Signed-off-by: Rasmus Villemoes
Signed-off-by: Boris Brezillon
Acked-by: Andrew Morton
Tested-by: Philipp Zabel
Signed-off-by: Hans Verkuil
Signed-off-by: Mauro Carvalho Chehab

Rasmus Villemoes
2019-08-20 00:14:53 +0800

02 Jun, 2019

1 commit

aa52619cc lib/sort.c: fix kernel-doc notation warnings

Fix kernel-doc notation in lib/sort.c by using correct function parameter
names.

lib/sort.c:59: warning: Excess function parameter 'size' description in 'swap_words_32'
lib/sort.c:83: warning: Excess function parameter 'size' description in 'swap_words_64'
lib/sort.c:110: warning: Excess function parameter 'size' description in 'swap_bytes'

Link: http://lkml.kernel.org/r/60e25d3d-68d1-bde2-3b39-e4baa0b14907@infradead.org
Fixes: 37d0ec34d111a ("lib/sort: make swap functions more generic")
Signed-off-by: Randy Dunlap
Cc: George Spelvin
Cc: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2019-06-02 06:51:31 +0800

15 May, 2019

3 commits

8fb583c42 lib/sort: avoid indirect calls to built-in swap

Similar to what's being done in the net code, this takes advantage of
the fact that most invocations use only a few common swap functions, and
replaces indirect calls to them with (highly predictable) conditional
branches. (The downside, of course, is that if you *do* use a custom
swap function, there are a few extra predicted branches on the code
path.)

This actually *shrinks* the x86-64 code, because it inlines the various
swap functions inside do_swap, eliding function prologues & epilogues.

x86-64 code size 767 -> 703 bytes (-64)
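
A rough user-space sketch of the bypass, under stated assumptions: the names do_swap, swap_words_32 and swap_bytes mirror the ones mentioned in this log, but the bodies and the dispatch below are illustrative only. Here the built-in swaps are recognized by comparing the function pointer against their addresses, which keeps the example portable; the dispatcher then calls them directly so the compiler is free to inline them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*swap_func_t)(void *a, void *b, size_t size);

/* Built-in swap for sizes that are a nonzero multiple of 4 bytes. */
static void swap_words_32(void *a, void *b, size_t n)
{
	char *pa = a, *pb = b;

	do {
		uint32_t t, u;

		n -= 4;
		memcpy(&t, pa + n, 4);
		memcpy(&u, pb + n, 4);
		memcpy(pa + n, &u, 4);
		memcpy(pb + n, &t, 4);
	} while (n);
}

/* Built-in fallback: swap one byte at a time (n must be nonzero). */
static void swap_bytes(void *a, void *b, size_t n)
{
	char *pa = a, *pb = b;

	do {
		char t = pa[--n];

		pa[n] = pb[n];
		pb[n] = t;
	} while (n);
}

static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
{
	if (swap_func == swap_words_32)
		swap_words_32(a, b, size);	/* direct call, can be inlined */
	else if (swap_func == swap_bytes)
		swap_bytes(a, b, size);		/* direct call, can be inlined */
	else
		swap_func(a, b, size);		/* custom swap: still indirect */
}
```

Callers that pass a custom swap function pay for the extra compares before reaching the indirect call, which is the downside noted above.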

Link: http://lkml.kernel.org/r/d10c5d4b393a1847f32f5b26f4bbaa2857140e1e.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin
Acked-by: Andrey Abramov
Acked-by: Rasmus Villemoes
Reviewed-by: Andy Shevchenko
Cc: Daniel Wagner
Cc: Dave Chinner
Cc: Don Mullis
Cc: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

George Spelvin
2019-05-15 10:52:49 +0800

22a241ccb lib/sort: use more efficient bottom-up heapsort variant

This uses fewer comparisons than the previous code (approaching half as
many for large random inputs), but produces identical results; it
actually performs the exact same series of swap operations.

Specifically, it reduces the average number of compares from
    2*n*log2(n) - 3*n + o(n)
to
    n*log2(n) + 0.37*n + o(n).

This is still 1.63*n worse than glibc qsort() which manages n*log2(n) -
1.26*n, but at least the leading coefficient is correct.

Standard heapsort, when sifting down, performs two comparisons per
level: one to find the greater child, and a second to see if the current
node should be exchanged with that child.

Bottom-up heapsort observes that it's better to postpone the second
comparison and search for the leaf where -infinity would be sent to,
then search back *up* for the current node's destination.

Since sifting down usually proceeds to the leaf level (that's where half
the nodes are), this does O(1) second comparisons rather than log2(n).
That saves a lot of (expensive since Spectre) indirect function calls.

The one time it's worse than the previous code is if there are large
numbers of duplicate keys, when the top-down algorithm is O(n) and
bottom-up is O(n log n). For distinct keys, it's provably always
better, doing 1.5*n*log2(n) + O(n) in the worst case.

(The code is not significantly more complex. This patch also merges the
heap-building and -extracting sift-down loops, resulting in a net code
size savings.)

x86-64 code size 885 -> 767 bytes (-118)

(I see the checkpatch complaint about "else if (n -= size)". The
alternative is significantly uglier.)
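
The description above can be sketched in user space as follows. This is an illustration on an int array with array indices, not the kernel implementation; bottom_up_heapsort, swap_int and cmp_int are hypothetical names. Each sift-down first walks to a leaf using one comparison per level, then climbs back up to find where the sifted element belongs, and finally rotates the elements along that path.

```c
#include <stddef.h>

typedef int (*cmp_func_t)(const void *a, const void *b);

static void swap_int(int *x, int *y)
{
	int t = *x;

	*x = *y;
	*y = t;
}

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

static void bottom_up_heapsort(int *v, size_t n, cmp_func_t cmp)
{
	size_t a = n / 2;	/* indices >= n/2 are leaves */

	if (n < 2)
		return;

	for (;;) {
		size_t b, c, d;

		if (a)			/* building the heap: sift down node --a */
			a--;
		else if (--n)		/* sorting: move the maximum to v[n] */
			swap_int(&v[0], &v[n]);
		else			/* sort complete */
			break;

		/* Descend to a leaf, following the larger child each level. */
		for (b = a; c = 2 * b + 1, (d = c + 1) < n;)
			b = cmp(&v[c], &v[d]) >= 0 ? c : d;
		if (d == n)		/* last leaf has no sibling */
			b = c;

		/* Climb back up to the position where v[a] belongs. */
		while (b != a && cmp(&v[a], &v[b]) >= 0)
			b = (b - 1) / 2;

		/* Rotate v[a] into that position, shifting the path up. */
		c = b;
		while (b != a) {
			b = (b - 1) / 2;
			swap_int(&v[b], &v[c]);
		}
	}
}
```

Wrapping cmp_int in a counting comparator is one way to compare the number of calls against a conventional top-down heapsort on the same input.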

Link: http://lkml.kernel.org/r/2de8348635a1a421a72620677898c7fd5bd4b19d.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin
Acked-by: Andrey Abramov
Acked-by: Rasmus Villemoes
Reviewed-by: Andy Shevchenko
Cc: Daniel Wagner
Cc: Dave Chinner
Cc: Don Mullis
Cc: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

George Spelvin
2019-05-15 10:52:49 +0800

37d0ec34d lib/sort: make swap functions more generic

Patch series "lib/sort & lib/list_sort: faster and smaller", v2.

Because CONFIG_RETPOLINE has made indirect calls much more expensive, I
thought I'd try to reduce the number made by the library sort functions.

The first three patches apply to lib/sort.c.

Patch #1 is a simple optimization. The built-in swap has special cases
for aligned 4- and 8-byte objects. But those are almost never used;
most calls to sort() work on larger structures, which fall back to the
byte-at-a-time loop. This generalizes them to aligned *multiples* of 4
and 8 bytes. (If nothing else, it saves an awful lot of energy by not
thrashing the store buffers as much.)

Patch #2 grabs a juicy piece of low-hanging fruit. I agree that nice
simple solid heapsort is preferable to more complex algorithms (sorry,
Andrey), but it's possible to implement heapsort with far fewer
comparisons (50% asymptotically, 25-40% reduction for realistic sizes)
than the way it's been done up to now. And with some care, the code
ends up smaller, as well. This is the "big win" patch.

Patch #3 adds the same sort of indirect call bypass that has been added
to the net code of late. The great majority of the callers use the
builtin swap functions, so replace the indirect call to swap_func with a
(highly predictable) series of if() statements. Rather surprisingly,
this decreased code size, as the swap functions were inlined and their
prologue & epilogue code eliminated.

lib/list_sort.c is a bit trickier, as merge sort is already close to
optimal, and we don't want to introduce triumphs of theory over
practicality like the Ford-Johnson merge-insertion sort.

Patch #4, without changing the algorithm, chops 32% off the code size
and removes the part[MAX_LIST_LENGTH+1] pointer array (and the
corresponding upper limit on efficiently sortable input size).

Patch #5 improves the algorithm. The previous code is already optimal
for power-of-two (or slightly smaller) size inputs, but when the input
size is just over a power of 2, there's a very unbalanced final merge.

There are, in the literature, several algorithms which solve this, but
they all depend on the "breadth-first" merge order which was replaced by
commit 835cc0c8477f with a more cache-friendly "depth-first" order.
Some hard thinking came up with a depth-first algorithm which defers
merges as little as possible while avoiding bad merges. This saves
0.2*n compares, averaged over all sizes.

The code size increase is minimal (64 bytes on x86-64, reducing the net
savings to 26%), but the comments expanded significantly to document the
clever algorithm.

TESTING NOTES: I have some ugly user-space benchmarking code which I
used for testing before moving this code into the kernel. Shout if you
want a copy.

I'm running this code right now, with CONFIG_TEST_SORT and
CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last
round of minor edits to quell checkpatch. I figure there will be at
least one round of comments and final testing.

This patch (of 5):

Rather than having special-case swap functions for 4- and 8-byte
objects, special-case aligned multiples of 4 or 8 bytes. This speeds up
most users of sort() by avoiding fallback to the byte copy loop.
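
As a hedged illustration of the selection rule, the sketch below picks a swap strategy from the base address and element size. The enum and the pick_swap helper are hypothetical names for this example; the check simply ORs the low bits of the size and the address and tests them against the required alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the three built-in swap strategies. */
enum swap_kind { SWAP_WORDS_64, SWAP_WORDS_32, SWAP_BYTES };

/* True if both the base address and the element size are multiples of align. */
static bool is_aligned(const void *base, size_t size, unsigned char align)
{
	unsigned char lsbits = (unsigned char)size;

	lsbits |= (unsigned char)(uintptr_t)base;
	return (lsbits & (align - 1)) == 0;
}

static enum swap_kind pick_swap(const void *base, size_t size)
{
	if (is_aligned(base, size, 8))
		return SWAP_WORDS_64;	/* aligned multiple of 8 bytes */
	if (is_aligned(base, size, 4))
		return SWAP_WORDS_32;	/* aligned multiple of 4 bytes */
	return SWAP_BYTES;		/* everything else: byte loop */
}
```

With this rule an 8-byte-aligned array of 40-byte structures takes the 64-bit word path, whereas swap functions specialized for exactly 4 or 8 bytes would send it to the byte loop.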

Despite what ca96ab859ab4 ("lib/sort: Add 64 bit swap function") claims,
very few users of sort() sort pointers (or pointer-sized objects); most
sort structures containing at least two words. (E.g.
drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct
acpi_fan_fps.)

The functions also got renamed to reflect the fact that they support
multiple words. In the great tradition of bikeshedding, the names were
by far the most contentious issue during review of this patch series.

x86-64 code size 872 -> 886 bytes (+14)

With feedback from Andy Shevchenko, Rasmus Villemoes and Geert
Uytterhoeven.

Link: http://lkml.kernel.org/r/f24f932df3a7fa1973c1084154f1cea596bcf341.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin
Acked-by: Andrey Abramov
Acked-by: Rasmus Villemoes
Reviewed-by: Andy Shevchenko
Cc: Rasmus Villemoes
Cc: Geert Uytterhoeven
Cc: Daniel Wagner
Cc: Don Mullis
Cc: Dave Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

George Spelvin
2019-05-15 10:52:49 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to apply to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
  lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

25 Feb, 2017

1 commit

c5adae958 lib: add CONFIG_TEST_SORT to enable self-test of sort()

Along with the addition made to Kconfig.debug, the prior existing but
permanently disabled test function has been slightly refactored.

Patch has been tested using QEMU 2.1.2 with a .config obtained through
'make defconfig' (x86_64) and manually enabling the option.

[arnd@arndb.de: move sort self-test into a separate file]
Link: http://lkml.kernel.org/r/20170112110657.3123790-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/HE1PR09MB0394B0418D504DCD27167D4FD49B0@HE1PR09MB0394.eurprd09.prod.outlook.com
Signed-off-by: Kostenzer Felix
Signed-off-by: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kostenzer Felix
2017-02-25 09:46:57 +0800

26 Jun, 2015

1 commit

ca96ab859 lib/sort: Add 64 bit swap function

In case the call site does not provide a swap function, we either use a
32 bit or a generic swap function. When swapping around pointers on 64
bit architectures, falling back to the generic swap function seems
like an unnecessary waste.

There are at least 9 users ('sort' is difficult to grep for) of sort()
and all of them use the sort function without a customized swap
function. Furthermore, they are all using pointers to swap around:

arch/x86/kernel/e820.c:sanitize_e820_map()
arch/x86/mm/extable.c:sort_extable()
drivers/acpi/fan.c:acpi_fan_get_fps()
fs/btrfs/super.c:btrfs_descending_sort_devices()
fs/xfs/libxfs/xfs_dir2_block.c:xfs_dir2_sf_to_block()
kernel/range.c:clean_sort_range()
mm/memcontrol.c:__mem_cgroup_usage_register_event()
sound/pci/hda/hda_auto_parser.c:snd_hda_parse_pin_defcfg()
sound/pci/hda/hda_auto_parser.c:sort_pins_by_sequence()

Obviously, we could improve the swap for other sizes as well
but this is overkill at this point.

A simple test sorting a 400-element array (small enough to stay within
one page) with either u32_swap() or u64_swap() shows that the theory
actually works. This test was done on an x86_64 (Intel Xeon E5-4610)
machine.

- swap_32:

NumSamples = 100; Min = 48.00; Max = 49.00
Mean = 48.320000; Variance = 0.217600; SD = 0.466476; Median 48.000000
each * represents a count of 1
48.0000 - 48.1000 [ 68]: ********************************************************************
48.1000 - 48.2000 [ 0]:
48.2000 - 48.3000 [ 0]:
48.3000 - 48.4000 [ 0]:
48.4000 - 48.5000 [ 0]:
48.5000 - 48.6000 [ 0]:
48.6000 - 48.7000 [ 0]:
48.7000 - 48.8000 [ 0]:
48.8000 - 48.9000 [ 0]:
48.9000 - 49.0000 [ 32]: ********************************

- swap_64:

NumSamples = 100; Min = 44.00; Max = 63.00
Mean = 48.250000; Variance = 18.687500; SD = 4.322904; Median 47.000000
each * represents a count of 1
44.0000 - 45.9000 [ 15]: ***************
45.9000 - 47.8000 [ 37]: *************************************
47.8000 - 49.7000 [ 39]: ***************************************
49.7000 - 51.6000 [ 0]:
51.6000 - 53.5000 [ 0]:
53.5000 - 55.4000 [ 0]:
55.4000 - 57.3000 [ 0]:
57.3000 - 59.2000 [ 1]: *
59.2000 - 61.1000 [ 3]: ***
61.1000 - 63.0000 [ 5]: *****

- swap_72:

NumSamples = 100; Min = 53.00; Max = 71.00
Mean = 55.070000; Variance = 21.565100; SD = 4.643824; Median 53.000000
each * represents a count of 1
53.0000 - 54.8000 [ 73]: *************************************************************************
54.8000 - 56.6000 [ 9]: *********
56.6000 - 58.4000 [ 9]: *********
58.4000 - 60.2000 [ 0]:
60.2000 - 62.0000 [ 0]:
62.0000 - 63.8000 [ 0]:
63.8000 - 65.6000 [ 0]:
65.6000 - 67.4000 [ 1]: *
67.4000 - 69.2000 [ 4]: ****
69.2000 - 71.0000 [ 4]: ****

- test program:

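#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/ktime.h>
#include <linux/random.h>
#include <linux/slab.h>
#include <linux/sort.h>
#include <asm/unaligned.h>

/* 400 elements, as in the description above, so each array fits in one page */
#define ARRAY_ELEMENTS 400

static int cmp_32(const void *a, const void *b)
{
	u32 l = *(u32 *)a;
	u32 r = *(u32 *)b;

	if (l < r)
		return -1;
	if (l > r)
		return 1;
	return 0;
}

static int cmp_64(const void *a, const void *b)
{
	u64 l = *(u64 *)a;
	u64 r = *(u64 *)b;

	if (l < r)
		return -1;
	if (l > r)
		return 1;
	return 0;
}

/* 9-byte elements: only the leading (unaligned) u32 is the sort key */
static int cmp_72(const void *a, const void *b)
{
	u32 l = get_unaligned((u32 *) a);
	u32 r = get_unaligned((u32 *) b);

	if (l < r)
		return -1;
	if (l > r)
		return 1;
	return 0;
}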

static void init_array32(void *array)
{
	u32 *a = array;
	int i;

	a[0] = 3821;
	for (i = 1; i < ARRAY_ELEMENTS; i++)
		a[i] = next_pseudo_random32(a[i-1]);
}

static void init_array64(void *array)
{
	u64 *a = array;
	int i;

	a[0] = 3821;
	for (i = 1; i < ARRAY_ELEMENTS; i++)
		a[i] = next_pseudo_random32(a[i-1]);
}

static void init_array72(void *array)
{
	char *p;
	u32 v;
	int i;

	v = 3821;
	for (i = 0; i < ARRAY_ELEMENTS; i++) {
		p = (char *)array + (i * 9);
		put_unaligned(v, (u32*) p);
		v = next_pseudo_random32(v);
	}
}

static void sort_test(void (*init)(void *array),
		      int (*cmp) (const void *, const void *),
		      void *array, size_t size)
{
	ktime_t start, stop;
	int i;

	for (i = 0; i < 10000; i++) {
		init(array);

		local_irq_disable();
		start = ktime_get();

		sort(array, ARRAY_ELEMENTS, size, cmp, NULL);

		stop = ktime_get();
		local_irq_enable();

		if (i > 10000 - 101)
			pr_info("%lld\n", ktime_to_us(ktime_sub(stop, start)));
	}
}

static void *create_array(size_t size)
{
	void *array;

	array = kmalloc(ARRAY_ELEMENTS * size, GFP_KERNEL);
	if (!array)
		return NULL;

	return array;
}

static int perform_test(size_t size)
{
	void *array;

	array = create_array(size);
	if (!array)
		return -ENOMEM;

	pr_info("test element size %d bytes\n", (int)size);
	switch (size) {
	case 4:
		sort_test(init_array32, cmp_32, array, size);
		break;
	case 8:
		sort_test(init_array64, cmp_64, array, size);
		break;
	case 9:
		sort_test(init_array72, cmp_72, array, size);
		break;
	}
	kfree(array);

	return 0;
}

static int __init sort_tests_init(void)
{
	int err;

	err = perform_test(sizeof(u32));
	if (err)
		return err;

	err = perform_test(sizeof(u64));
	if (err)
		return err;

	err = perform_test(sizeof(u64)+1);
	if (err)
		return err;

	return 0;
}

static void __exit sort_tests_exit(void)
{
}

module_init(sort_tests_init);
module_exit(sort_tests_exit);

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Daniel Wagner");
MODULE_DESCRIPTION("sort performance tests");

Signed-off-by: Daniel Wagner
Cc: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Wagner
2015-06-26 08:00:40 +0800

13 Feb, 2015

2 commits

2ddae683b lib/sort.c: move include inside #if 0

The sort function and its helpers don't do memory allocation, so the
slab.h include is redundant. Move it inside the #if 0 protecting the
self-test, similar to how it is done in lib/list_sort.c. This removes
over 450 lines from the generated dependency file.

Signed-off-by: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rasmus Villemoes
2015-02-13 10:54:16 +0800

42cf80965 lib/sort.c: use simpler includes

sort.c doesn't use facilities from kernel.h, but does use some types
defined in linux/types.h. Include the latter directly instead of relying
on some other header doing it. Similarly, include linux/export.h directly
instead of through module.h. This removes 80 lines from the dependency
file .sort.o.cmd.

Signed-off-by: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rasmus Villemoes
2015-02-13 10:54:15 +0800

09 Jan, 2009

1 commit

b53907c01 generic swap(): lib/sort.c: rename swap to swap_func

This is to avoid name clashes for the introduction of a global swap()
macro.
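
To see why the rename matters, here is a small user-space sketch; the macro body and the helper names byte_swap and reverse are assumptions for illustration, and `__typeof__` is a GNU C extension. Once a function-like swap() macro taking two arguments exists, a callback parameter that is also called swap could no longer be invoked as swap(a, b, size), because the preprocessor would try to expand the macro with three arguments. Naming the parameter swap_func avoids the clash.

```c
#include <stddef.h>

/* Sketch of a generic two-argument swap() macro (GNU C __typeof__). */
#define swap(a, b) \
	do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

typedef void (*swap_func_t)(void *a, void *b, size_t size);

/* A swap callback that itself uses the macro on individual bytes. */
static void byte_swap(void *a, void *b, size_t size)
{
	unsigned char *pa = a, *pb = b;

	while (size--)
		swap(pa[size], pb[size]);
}

/*
 * Reverse an array in place. The callback parameter is named swap_func:
 * calling a parameter named swap with three arguments would not compile
 * while the macro above is defined.
 */
static void reverse(void *base, size_t num, size_t size, swap_func_t swap_func)
{
	char *p = base;
	size_t i, j;

	if (num < 2)
		return;
	for (i = 0, j = num - 1; i < j; i++, j--)
		swap_func(p + i * size, p + j * size, size);
}
```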

Signed-off-by: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-01-09 00:31:14 +0800

17 Oct, 2007

1 commit

995e4286a lib/sort.c optimization

Hello, I fixed a small bug in the heap sort function in lib/sort.c and
tested the fix.

The fix avoids an unnecessary swap of contents when i is 0 (saves a few
loads and stores), which happens every time the sort function is called.
I felt the fix was worth bringing to your attention given the importance
and frequent use of the sort function.

Acked-by: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Subbaiah Venkata
2007-10-17 23:42:52 +0800

12 Feb, 2007

1 commit

72fd4a35a [PATCH] Numerous fixes to kernel-doc info in source files.

A variety of (mostly) innocuous fixes to the embedded kernel-doc content in
source files, including:

* make multi-line initial descriptions single line
* denote some function names, constants and structs as such
* change erroneous opening '/*' to '/**' in a few places
* reword some text for clarity

Signed-off-by: Robert P. J. Day
Cc: "Randy.Dunlap"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robert P. J. Day
2007-02-12 02:51:32 +0800

03 Oct, 2006

1 commit

d3717bdf8 [PATCH] low performance of lib/sort.c

It is a non-standard heap-sort implementation because the index of the
child node is wrong. The sort function still outputs the right result, but
the performance is O(n * (log(n) + 1)), about 10% ~ 20% worse than the
standard algorithm.
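
For reference, a conventional top-down heapsort with 0-based indexing looks roughly like the sketch below, written for an int array with hypothetical names; it is not the code this commit touched. The detail at issue is the child index: the children of node r are 2*r + 1 and 2*r + 2, so starting the child search at 2*r makes the root its own "child" and wastes work.

```c
#include <stddef.h>

typedef int (*cmp_func_t)(const void *a, const void *b);

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/* Sift v[r] down within the first n elements of a 0-based max-heap. */
static void sift_down(int *v, size_t r, size_t n, cmp_func_t cmp)
{
	size_t c;

	for (; (c = 2 * r + 1) < n; r = c) {	/* left child is 2*r + 1 */
		int t;

		if (c + 1 < n && cmp(&v[c], &v[c + 1]) < 0)
			c++;			/* right child is larger */
		if (cmp(&v[r], &v[c]) >= 0)
			break;			/* heap property restored */
		t = v[r];
		v[r] = v[c];
		v[c] = t;
	}
}

static void heapsort_top_down(int *v, size_t n, cmp_func_t cmp)
{
	size_t i;

	if (n < 2)
		return;

	for (i = n / 2; i-- > 0;)		/* build the heap */
		sift_down(v, i, n, cmp);

	for (i = n - 1; i > 0; i--) {		/* extract the maximum repeatedly */
		int t = v[0];

		v[0] = v[i];
		v[i] = t;
		sift_down(v, 0, i, cmp);
	}
}
```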

Signed-off-by: keios
Acked-by: Matt Mackall
Acked-by: Zou Nan hai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

keios
2006-10-03 23:03:41 +0800

31 Oct, 2005

1 commit

4e57b6817 [PATCH] fix missing includes

I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.

In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.

Signed-off-by: Tim Schmielau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tim Schmielau
2005-10-31 09:37:32 +0800

11 Sep, 2005

1 commit

ecec4cb7a [PATCH] lib/sort.c: small cleanups

This patch contains the following small cleanups:
- make two needlessly global functions static
- every file should #include the header files containing the prototypes
  of its global functions

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2005-09-11 01:06:31 +0800

06 May, 2005

1 commit

d28c2bc8d [PATCH] fix lib/sort regression test

The regression test in lib/sort.c is currently worthless because the array
that is generated for sorting will be all zeros. This patch fixes things
so that the array that is generated will contain unsorted integers (that
are not all identical) as was probably intended.
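
The failure mode is easy to reproduce in user space. In the sketch below, which uses the C library's qsort() as a stand-in for the kernel's sort(), the generator constants, TEST_LEN and all function names are assumptions for illustration. A multiplicative generator seeded with 0 produces nothing but zeros, so the sortedness check passes trivially and verifies nothing; any nonzero seed gives varied input.

```c
#include <stddef.h>
#include <stdlib.h>

#define TEST_LEN 1000	/* hypothetical test size */

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Fill the array from a simple multiplicative generator. A seed of 0 stays
 * at 0 forever, which yields an all-zero (already sorted) array.
 */
static void fill_pseudo_random(int *a, size_t len, unsigned int seed)
{
	unsigned long long r = seed;
	size_t i;

	for (i = 0; i < len; i++) {
		r = (r * 725861ULL) % 6599ULL;	/* illustrative constants */
		a[i] = (int)r;
	}
}

static int is_sorted(const int *a, size_t len)
{
	size_t i;

	for (i = 1; i < len; i++)
		if (a[i - 1] > a[i])
			return 0;
	return 1;
}

/* Returns 0 if the sorted output is in order, -1 otherwise. */
static int run_sort_test(unsigned int seed)
{
	int *a = malloc(TEST_LEN * sizeof(*a));
	int ok;

	if (!a)
		return -1;

	fill_pseudo_random(a, TEST_LEN, seed);
	qsort(a, TEST_LEN, sizeof(*a), cmp_int);
	ok = is_sorted(a, TEST_LEN);
	free(a);
	return ok ? 0 : -1;
}
```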

Signed-off-by: Daniel Dickman
Signed-off-by: Domen Puncer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Domen Puncer
2005-05-06 07:36:50 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800