23 Jan, 2019
1 commit
-
commit fbfaf851902cd9293f392f3a1735e0543016d530 upstream.
If an input number x for int_sqrt64() has the highest bit set, then
fls64(x) is 64. (1UL << 64) is an overflow and breaks the algorithm.Subtracting 1 is a better guess for the initial value of m anyway and
that's what also done in int_sqrt() implicitly [*].[*] Note how int_sqrt() uses __fls() with two underscores, which already
returns the proper raw bit number.In contrast, int_sqrt64() used fls64(), and that returns bit numbers
illogically starting at 1, because of error handling for the "no
bits set" case. Will points out that he bug probably is due to a
copy-and-paste error from the regular int_sqrt() case.Signed-off-by: Florian La Roche
Acked-by: Will Deacon
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
04 Feb, 2018
1 commit
-
There is no option to perform 64bit integer sqrt on 32bit platform.
Added stronger typed int_sqrt64 enables the 64bit calculations to
be performed on 32bit platforms. Using same algorithm as int_sqrt()
with strong typing provides enough precision also on 32bit platforms,
but it sacrifices some performance. In case values are smaller than
ULONG_MAX the standard int_sqrt is used for calculation to maximize the
performance due to more native calculations.Signed-off-by: Crt Mori
Acked-by: Joe Perches
Signed-off-by: Jonathan Cameron
18 Nov, 2017
3 commits
-
Our current int_sqrt() is not rough nor any approximation; it calculates
the exact value of: floor(sqrt()). Document this.Link: http://lkml.kernel.org/r/20171020164645.001652117@infradead.org
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Linus Torvalds
Cc: Anshul Garg
Cc: Davidlohr Bueso
Cc: David Miller
Cc: Ingo Molnar
Cc: Joe Perches
Cc: Kees Cook
Cc: Matthew Wilcox
Cc: Michael Davidson
Cc: Thomas Gleixner
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The initial value (@m) compute is:
m = 1UL << (BITS_PER_LONG - 2);
while (m > x)
m >>= 2;Which is a linear search for the highest even bit smaller or equal to @x
We can implement this using a binary search using __fls() (or better when
its hardware implemented).m = 1UL << (__fls(x) & ~1UL);
Especially for small values of @x; which are the more common arguments
when doing a CDF on idle times; the linear search is near to worst case,
while the binary search of __fls() is a constant 6 (or 5 on 32bit)
branches.cycles: branches: branch-misses:
PRE:
hot: 43.633557 +- 0.034373 45.333132 +- 0.002277 0.023529 +- 0.000681
cold: 207.438411 +- 0.125840 45.333132 +- 0.002277 6.976486 +- 0.004219SOFTWARE FLS:
hot: 29.576176 +- 0.028850 26.666730 +- 0.004511 0.019463 +- 0.000663
cold: 165.947136 +- 0.188406 26.666746 +- 0.004511 6.133897 +- 0.004386HARDWARE FLS:
hot: 24.720922 +- 0.025161 20.666784 +- 0.004509 0.020836 +- 0.000677
cold: 132.777197 +- 0.127471 20.666776 +- 0.004509 5.080285 +- 0.003874Averages computed over all values
Suggested-by: Joe Perches
Acked-by: Will Deacon
Acked-by: Linus Torvalds
Cc: Anshul Garg
Cc: Davidlohr Bueso
Cc: David Miller
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Matthew Wilcox
Cc: Michael Davidson
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The current int_sqrt() computation is sub-optimal for the case of small
@x. Which is the interesting case when we're going to do cumulative
distribution functions on idle times, which we assume to be a random
variable, where the target residency of the deepest idle state gives an
upper bound on the variable (5e6ns on recent Intel chips).In the case of small @x, the compute loop:
while (m != 0) {
b = y + m;
y >>= 1;if (x >= b) {
x -= b;
y += m;
}
m >>= 2;
}can be reduced to:
while (m > x)
m >>= 2;Because y==0, b==m and until x>=m y will remain 0.
And while this is computationally equivalent, it runs much faster
because there's less code, in particular less branches.cycles: branches: branch-misses:
OLD:
hot: 45.109444 +- 0.044117 44.333392 +- 0.002254 0.018723 +- 0.000593
cold: 187.737379 +- 0.156678 44.333407 +- 0.002254 6.272844 +- 0.004305PRE:
hot: 67.937492 +- 0.064124 66.999535 +- 0.000488 0.066720 +- 0.001113
cold: 232.004379 +- 0.332811 66.999527 +- 0.000488 6.914634 +- 0.006568POST:
hot: 43.633557 +- 0.034373 45.333132 +- 0.002277 0.023529 +- 0.000681
cold: 207.438411 +- 0.125840 45.333132 +- 0.002277 6.976486 +- 0.004219Averages computed over all values
Suggested-by: Anshul Garg
Acked-by: Linus Torvalds
Cc: Davidlohr Bueso
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Will Deacon
Cc: Joe Perches
Cc: David Miller
Cc: Matthew Wilcox
Cc: Kees Cook
Cc: Michael Davidson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
30 Apr, 2013
1 commit
-
Optimize the current version of the shift-and-subtract (hardware)
algorithm, described by John von Newmann[1] and Guy L Steele.Iterating 1,000,000 times, perf shows for the current version:
Performance counter stats for './sqrt-curr' (10 runs):
27.170996 task-clock # 0.979 CPUs utilized ( +- 3.19% )
3 context-switches # 0.103 K/sec ( +- 4.76% )
0 cpu-migrations # 0.004 K/sec ( +-100.00% )
104 page-faults # 0.004 M/sec ( +- 0.16% )
64,921,199 cycles # 2.389 GHz ( +- 0.03% )
28,967,789 stalled-cycles-frontend # 44.62% frontend cycles idle ( +- 0.18% )
stalled-cycles-backend
104,502,623 instructions # 1.61 insns per cycle
# 0.28 stalled cycles per insn ( +- 0.00% )
34,088,368 branches # 1254.587 M/sec ( +- 0.00% )
4,901 branch-misses # 0.01% of all branches ( +- 1.32% )0.027763015 seconds time elapsed ( +- 3.22% )
And for the new version:
Performance counter stats for './sqrt-new' (10 runs):
0.496869 task-clock # 0.519 CPUs utilized ( +- 2.38% )
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.403 K/sec ( +-100.00% )
104 page-faults # 0.209 M/sec ( +- 0.15% )
590,760 cycles # 1.189 GHz ( +- 2.35% )
395,053 stalled-cycles-frontend # 66.87% frontend cycles idle ( +- 3.67% )
stalled-cycles-backend
398,963 instructions # 0.68 insns per cycle
# 0.99 stalled cycles per insn ( +- 0.39% )
70,228 branches # 141.341 M/sec ( +- 0.36% )
3,364 branch-misses # 4.79% of all branches ( +- 5.45% )0.000957440 seconds time elapsed ( +- 2.42% )
Furthermore, this saves space in instruction text:
text data bss dec hex filename
111 0 0 111 6f lib/int_sqrt-baseline.o
89 0 0 89 59 lib/int_sqrt.o[1] http://en.wikipedia.org/wiki/First_Draft_of_a_Report_on_the_EDVAC
Signed-off-by: Davidlohr Bueso
Reviewed-by: Jonathan Gonzalez
Tested-by: Jonathan Gonzalez
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Mar, 2012
1 commit
-
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.Signed-off-by: Paul Gortmaker
04 Feb, 2006
1 commit
-
The implementation of int_sqrt() assumes that longs have 32 bits. On
systems that have 64 bit longs this will result in gross errors when the
argument to the function is greater than 2^32 - 1 on such systems. I doubt
whether any such use is currently made of int_sqrt() but the attached patch
fixes the problem anyway.Signed-off-by: Peter Williams
Cc: Dave Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Apr, 2005
1 commit
-
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.Let it rip!