23 Jan, 2019

1 commit

  • commit fbfaf851902cd9293f392f3a1735e0543016d530 upstream.

    If an input number x for int_sqrt64() has the highest bit set, then
    fls64(x) is 64. (1UL << 64) is an overflow and breaks the algorithm.

    Subtracting 1 is a better guess for the initial value of m anyway, and
    that is also what int_sqrt() does implicitly [*].

    [*] Note how int_sqrt() uses __fls() with two underscores, which already
    returns the proper raw bit number.

    In contrast, int_sqrt64() used fls64(), which returns bit numbers
    illogically starting at 1 because of error handling for the "no
    bits set" case. Will points out that the bug is probably due to a
    copy-and-paste error from the regular int_sqrt() case.
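
    The off-by-one can be sketched in userspace as follows. This is a
    minimal illustration and not the kernel code: fls64_sketch(),
    initial_m() and check_initial_m() are hypothetical names, and the
    GCC/Clang builtin __builtin_clzll() is assumed as a stand-in for the
    kernel's bit-search helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's fls64(): 1-based index of the highest set
 * bit, or 0 when no bit is set (assumption: GCC/Clang builtin). */
int fls64_sketch(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

/* Initial m with the fix applied: the highest even-numbered bit that is
 * <= x. The caller must pass x != 0. */
uint64_t initial_m(uint64_t x)
{
	/* Without the "- 1", an x with the top bit set would ask for a
	 * shift by 64, which is undefined for a 64-bit type. */
	return 1ULL << ((unsigned int)(fls64_sketch(x) - 1) & ~1U);
}

/* Small self-check of the corner cases discussed above. */
void check_initial_m(void)
{
	assert(fls64_sketch(UINT64_MAX) == 64);
	assert(initial_m(UINT64_MAX) == (1ULL << 62));
	assert(initial_m(1) == 1);
}
```

    With the subtraction in place, the largest shift count is 62, so the
    initial value of m is always representable.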

    Signed-off-by: Florian La Roche
    Acked-by: Will Deacon
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Florian La Roche
     

04 Feb, 2018

1 commit

  • There is no option to perform a 64-bit integer sqrt on 32-bit platforms.
    The newly added, more strongly typed int_sqrt64() enables 64-bit
    calculations on 32-bit platforms. It uses the same algorithm as
    int_sqrt() with 64-bit types, which provides enough precision on 32-bit
    platforms but sacrifices some performance. When the value does not
    exceed ULONG_MAX, the standard int_sqrt() is used instead, to maximize
    performance through more native calculations.
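
    A rough userspace sketch of that dispatch is shown below. The names
    int_sqrt_sketch(), sqrt_u64_wide() and int_sqrt64_sketch() are
    hypothetical, and the GCC/Clang builtins __builtin_clzl() and
    __builtin_clzll() are assumed in place of the kernel's bit helpers; it
    illustrates the idea rather than reproducing the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Shift-and-subtract square root on the native word size. */
unsigned long int_sqrt_sketch(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	/* Highest even-numbered bit that is <= x. */
	m = 1UL << ((sizeof(unsigned long) * CHAR_BIT - 1
		     - __builtin_clzl(x)) & ~1UL);
	while (m != 0) {
		b = y + m;
		y >>= 1;
		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}

/* Same algorithm with explicit 64-bit types. */
uint32_t sqrt_u64_wide(uint64_t x)
{
	uint64_t b, m, y = 0;

	if (x <= 1)
		return (uint32_t)x;

	m = 1ULL << ((unsigned int)(63 - __builtin_clzll(x)) & ~1U);
	while (m != 0) {
		b = y + m;
		y >>= 1;
		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return (uint32_t)y;
}

/* Use the native routine whenever the argument fits in a long. */
uint32_t int_sqrt64_sketch(uint64_t x)
{
	if (x <= ULONG_MAX)
		return (uint32_t)int_sqrt_sketch((unsigned long)x);
	return sqrt_u64_wide(x);
}

/* Both paths should agree on values that fit either type. */
void check_int_sqrt64(void)
{
	assert(int_sqrt64_sketch(100) == 10);
	assert(sqrt_u64_wide(100) == 10);
}
```

    On a platform where long is already 64 bits wide the comparison is
    always true, so the wide path only matters on 32-bit targets.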

    Signed-off-by: Crt Mori
    Acked-by: Joe Perches
    Signed-off-by: Jonathan Cameron

    Crt Mori
     

18 Nov, 2017

3 commits

  • Our current int_sqrt() is neither rough nor an approximation; it
    calculates the exact value of floor(sqrt(x)). Document this.
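
    Stated as a property of the return value y for an argument x, "exact"
    here can be read as follows (a restatement of the floor definition, not
    text from the patch):

```latex
y = \lfloor \sqrt{x} \rfloor
\quad\Longleftrightarrow\quad
y^2 \le x < (y + 1)^2
```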

    Link: http://lkml.kernel.org/r/20171020164645.001652117@infradead.org
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Linus Torvalds
    Cc: Anshul Garg
    Cc: Davidlohr Bueso
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Joe Perches
    Cc: Kees Cook
    Cc: Matthew Wilcox
    Cc: Michael Davidson
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • The initial value of @m is computed as:

    m = 1UL << (BITS_PER_LONG - 2);
    while (m > x)
            m >>= 2;

    This is a linear search for the highest even bit smaller than or equal
    to @x. We can implement it as a binary search using __fls() (or better,
    when it is implemented in hardware).

    m = 1UL << (__fls(x) & ~1UL);

    Especially for small values of @x; which are the more common arguments
    when doing a CDF on idle times; the linear search is near to worst case,
    while the binary search of __fls() is a constant 6 (or 5 on 32bit)
    branches.
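
    The equivalence of the two ways of computing the initial @m can be
    checked with a small userspace sketch. The names m_linear(), m_fls(),
    fls_sketch() and check_initial_value() are hypothetical, and the
    GCC/Clang builtin __builtin_clzl() is assumed as a stand-in for
    __fls(); both functions expect x != 0.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Linear search: walk down two bits at a time until m <= x. */
unsigned long m_linear(unsigned long x)
{
	unsigned long m = 1UL << (BITS_PER_LONG_SKETCH - 2);

	while (m > x)
		m >>= 2;
	return m;
}

/* Stand-in for __fls(): 0-based index of the highest set bit.
 * Undefined for x == 0, like the builtin it wraps. */
static unsigned long fls_sketch(unsigned long x)
{
	return BITS_PER_LONG_SKETCH - 1 - __builtin_clzl(x);
}

/* Direct computation: clear the low bit of the index to make it even. */
unsigned long m_fls(unsigned long x)
{
	return 1UL << (fls_sketch(x) & ~1UL);
}

/* Compare both methods over a range of small arguments. */
void check_initial_value(void)
{
	unsigned long x;

	for (x = 1; x <= 100000; x++)
		assert(m_linear(x) == m_fls(x));
	assert(m_linear(ULONG_MAX) == m_fls(ULONG_MAX));
}
```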

              cycles:                  branches:               branch-misses:

    PRE:

    hot:     43.633557 +- 0.034373   45.333132 +- 0.002277   0.023529 +- 0.000681
    cold:   207.438411 +- 0.125840   45.333132 +- 0.002277   6.976486 +- 0.004219

    SOFTWARE FLS:

    hot:     29.576176 +- 0.028850   26.666730 +- 0.004511   0.019463 +- 0.000663
    cold:   165.947136 +- 0.188406   26.666746 +- 0.004511   6.133897 +- 0.004386

    HARDWARE FLS:

    hot:     24.720922 +- 0.025161   20.666784 +- 0.004509   0.020836 +- 0.000677
    cold:   132.777197 +- 0.127471   20.666776 +- 0.004509   5.080285 +- 0.003874

    Averages computed over all values
    Suggested-by: Joe Perches
    Acked-by: Will Deacon
    Acked-by: Linus Torvalds
    Cc: Anshul Garg
    Cc: Davidlohr Bueso
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: Matthew Wilcox
    Cc: Michael Davidson
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • The current int_sqrt() computation is sub-optimal for the case of small
    @x, which is the interesting case when we're going to do cumulative
    distribution functions on idle times. We assume those to be a random
    variable, where the target residency of the deepest idle state gives an
    upper bound on the variable (5e6ns on recent Intel chips).

    In the case of small @x, the compute loop:

    while (m != 0) {
            b = y + m;
            y >>= 1;

            if (x >= b) {
                    x -= b;
                    y += m;
            }
            m >>= 2;
    }

    can be reduced to:

    while (m > x)
            m >>= 2;

    This works because, as long as y == 0, b == m, and y remains 0 until
    x >= m.

    And while this is computationally equivalent, it runs much faster
    because there's less code, in particular fewer branches.
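
    The claimed equivalence can be illustrated with a userspace sketch that
    runs the compute loop with and without the skip-ahead. The names
    shift_subtract(), sqrt_plain(), sqrt_skip() and check_skip_ahead() are
    hypothetical; the loop body follows the one quoted above.

```c
#include <assert.h>
#include <limits.h>

#define TOP_EVEN_BIT (1UL << (sizeof(unsigned long) * CHAR_BIT - 2))

/* The compute loop, starting from a caller-supplied m. */
static unsigned long shift_subtract(unsigned long x, unsigned long m)
{
	unsigned long b, y = 0;

	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}

/* Always start from the top even bit. */
unsigned long sqrt_plain(unsigned long x)
{
	return shift_subtract(x, TOP_EVEN_BIT);
}

/* Skip the iterations in which y would stay 0 anyway. */
unsigned long sqrt_skip(unsigned long x)
{
	unsigned long m = TOP_EVEN_BIT;

	while (m > x)
		m >>= 2;
	return shift_subtract(x, m);
}

/* Both variants should agree, in particular for small arguments. */
void check_skip_ahead(void)
{
	unsigned long x;

	for (x = 0; x <= 100000; x++)
		assert(sqrt_plain(x) == sqrt_skip(x));
}
```

    For an argument near the 5e6 bound mentioned above, the skip loop
    replaces most of the full iterations with a compare and a shift each.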

              cycles:                  branches:               branch-misses:

    OLD:

    hot:     45.109444 +- 0.044117   44.333392 +- 0.002254   0.018723 +- 0.000593
    cold:   187.737379 +- 0.156678   44.333407 +- 0.002254   6.272844 +- 0.004305

    PRE:

    hot:     67.937492 +- 0.064124   66.999535 +- 0.000488   0.066720 +- 0.001113
    cold:   232.004379 +- 0.332811   66.999527 +- 0.000488   6.914634 +- 0.006568

    POST:

    hot:     43.633557 +- 0.034373   45.333132 +- 0.002277   0.023529 +- 0.000681
    cold:   207.438411 +- 0.125840   45.333132 +- 0.002277   6.976486 +- 0.004219

    Averages computed over all values
    Suggested-by: Anshul Garg
    Acked-by: Linus Torvalds
    Cc: Davidlohr Bueso
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Will Deacon
    Cc: Joe Perches
    Cc: David Miller
    Cc: Matthew Wilcox
    Cc: Kees Cook
    Cc: Michael Davidson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be
    applied to a file was done in a spreadsheet of side-by-side results from
    the output of two independent scanners (ScanCode & Windriver) producing
    SPDX tag:value files created by Philippe Ombredanne. Philippe prepared
    the base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

30 Apr, 2013

1 commit

  • Optimize the current version of the shift-and-subtract (hardware)
    algorithm, described by John von Neumann[1] and Guy L Steele.

    Iterating 1,000,000 times, perf shows for the current version:

    Performance counter stats for './sqrt-curr' (10 runs):

         27.170996 task-clock               #    0.979 CPUs utilized            ( +-  3.19% )
                 3 context-switches         #    0.103 K/sec                    ( +-  4.76% )
                 0 cpu-migrations           #    0.004 K/sec                    ( +-100.00% )
               104 page-faults              #    0.004 M/sec                    ( +-  0.16% )
        64,921,199 cycles                   #    2.389 GHz                      ( +-  0.03% )
        28,967,789 stalled-cycles-frontend  #   44.62% frontend cycles idle     ( +-  0.18% )
                   stalled-cycles-backend
       104,502,623 instructions             #    1.61  insns per cycle
                                            #    0.28  stalled cycles per insn  ( +-  0.00% )
        34,088,368 branches                 # 1254.587 M/sec                    ( +-  0.00% )
             4,901 branch-misses            #    0.01% of all branches          ( +-  1.32% )

       0.027763015 seconds time elapsed                                         ( +-  3.22% )

    And for the new version:

    Performance counter stats for './sqrt-new' (10 runs):

          0.496869 task-clock               #    0.519 CPUs utilized            ( +-  2.38% )
                 0 context-switches         #    0.000 K/sec
                 0 cpu-migrations           #    0.403 K/sec                    ( +-100.00% )
               104 page-faults              #    0.209 M/sec                    ( +-  0.15% )
           590,760 cycles                   #    1.189 GHz                      ( +-  2.35% )
           395,053 stalled-cycles-frontend  #   66.87% frontend cycles idle     ( +-  3.67% )
                   stalled-cycles-backend
           398,963 instructions             #    0.68  insns per cycle
                                            #    0.99  stalled cycles per insn  ( +-  0.39% )
            70,228 branches                 #  141.341 M/sec                    ( +-  0.36% )
             3,364 branch-misses            #    4.79% of all branches          ( +-  5.45% )

       0.000957440 seconds time elapsed                                         ( +-  2.42% )

    Furthermore, this saves space in instruction text:

       text    data     bss     dec     hex filename
        111       0       0     111      6f lib/int_sqrt-baseline.o
         89       0       0      89      59 lib/int_sqrt.o

    [1] http://en.wikipedia.org/wiki/First_Draft_of_a_Report_on_the_EDVAC

    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Jonathan Gonzalez
    Tested-by: Jonathan Gonzalez
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

08 Mar, 2012

1 commit


04 Feb, 2006

1 commit

  • The implementation of int_sqrt() assumes that longs have 32 bits. On
    systems that have 64-bit longs this will result in gross errors when the
    argument to the function is greater than 2^32 - 1. I doubt whether any
    such use is currently made of int_sqrt(), but the attached patch fixes
    the problem anyway.
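
    The effect can be sketched in userspace by making the starting bit of
    the shift-and-subtract loop a parameter. The old code is not shown
    here, so the starting bit of 30 used to model a 32-bit assumption is
    itself an assumption, and sqrt_from_bit() and check_start_bit() are
    hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-subtract square root whose first trial bit is 1 << top_bit.
 * top_bit must be even and smaller than 64. */
uint64_t sqrt_from_bit(uint64_t x, unsigned int top_bit)
{
	uint64_t b, m = 1ULL << top_bit, y = 0;

	while (m != 0) {
		b = y + m;
		y >>= 1;
		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}

/* Compare a 32-bit-style start (bit 30) with a 64-bit start (bit 62). */
void check_start_bit(void)
{
	/* Arguments below 2^32 are fine with either starting bit. */
	assert(sqrt_from_bit(1000000, 30) == 1000);
	assert(sqrt_from_bit(1000000, 62) == 1000);

	/* Above 2^32 - 1 only the wider start gives the right answer. */
	assert(sqrt_from_bit(1ULL << 40, 62) == (1ULL << 20));
	assert(sqrt_from_bit(1ULL << 40, 30) != (1ULL << 20));
}
```

    Starting at bit 30 leaves only 16 iterations, so the result can never
    exceed 65535 regardless of how large the argument is.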

    Signed-off-by: Peter Williams
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Williams
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds