08 Jun, 2016

1 commit

  • People complained about ARCH_HWEIGHT_CFLAGS and how it throws a wrench
    into experimentation with kcov, LTO, etc.

    Add asm versions for __sw_hweight{32,64}() and do explicit saving and
    restoring of clobbered registers. This gets rid of the special calling
    convention. We get to call those functions on !X86_FEATURE_POPCNT CPUs.

    We still need to hardcode POPCNT and register operands, as some old gas
    versions we support do not know about POPCNT.

    Btw, remove redundant REX prefix from 32-bit POPCNT because alternatives
    can do padding now.
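
    The resulting dispatch looks roughly like the sketch below; the real
    code lives in arch/x86/include/asm/arch_hweight.h, and the opcode bytes
    and register constraints here are illustrative (32-bit flavor):

    /* popcnt %eax, %eax -- raw bytes so old gas without the mnemonic
     * can still assemble it */
    #define POPCNT32 ".byte 0xf3,0x0f,0xb8,0xc0"

    static __always_inline unsigned int __arch_hweight32(unsigned int w)
    {
            unsigned int res;

            /*
             * On !X86_FEATURE_POPCNT CPUs this stays a plain call to the
             * asm __sw_hweight32, which saves/restores the registers it
             * clobbers itself, so callers need no special calling
             * convention.
             */
            asm (ALTERNATIVE("call __sw_hweight32", POPCNT32,
                             X86_FEATURE_POPCNT)
                 : "=a" (res)
                 : "a" (w));

            return res;
    }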

    Suggested-by: H. Peter Anvin
    Signed-off-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1464605787-20603-1-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov

14 Sep, 2014

1 commit

  • It used to be an ad-hoc hack defined by the x86 version of
    <asm/bitops.h> that enabled a couple of library routines to know whether
    an integer multiply is faster than repeated shifts and additions.

    This just makes it use the real Kconfig system instead, and makes x86
    (which was the only architecture that did this) select the option.

    NOTE! Even for x86, this really is kind of wrong. If we cared, we would
    probably not enable this for builds optimized for Netburst (P4), where
    shifts-and-adds are generally faster than multiplies. This patch does
    *not* change that kind of logic, though; it is purely a syntactic change
    with no code changes.

    This was triggered by the fact that we have other places that really
    want to know "do I want to expand multiplications by constants by hand
    or not", particularly the hash generation code.

    Signed-off-by: Linus Torvalds

    Linus Torvalds

08 Mar, 2012

1 commit


07 Apr, 2010

2 commits

  • Add support for the hardware version of the Hamming weight function,
    popcnt, present in CPUs which advertise it under CPUID, Function
    0x0000_0001_ECX[23]. On CPUs which don't support it, we fall back to the
    default lib/hweight.c sw versions.

    A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost
    a 3x speedup on an F10h machine.
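
    For context, the same feature bit can be checked from userspace; a
    minimal sketch using GCC's <cpuid.h> (not kernel code):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned int eax, ebx, ecx, edx;

            /* CPUID Function 0x0000_0001: POPCNT support is ECX bit 23. */
            if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 23)))
                    puts("popcnt available: hardware hweight will be used");
            else
                    puts("no popcnt: falling back to lib/hweight.c");
            return 0;
    }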

    Signed-off-by: Borislav Petkov
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Borislav Petkov
  • Rename the existing runtime hweight() implementations to
    __arch_hweight(), rename the compile-time versions to __const_hweight()
    and then have hweight() pick between them.
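
    The selection boils down to a __builtin_constant_p() test; roughly (the
    real definitions live in include/asm-generic/bitops/const_hweight.h):

    /* Constant arguments fold at compile time; everything else hits the
     * runtime (possibly popcnt-backed) implementation. */
    #define hweight32(w) \
            (__builtin_constant_p(w) ? \
             __const_hweight32(w) : __arch_hweight32(w))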

    Suggested-by: H. Peter Anvin
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Acked-by: H. Peter Anvin
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Peter Zijlstra

28 Dec, 2009

1 commit

  • Optimize hweight32 by using the same technique as in hweight64.

    The proof of this technique can be found in the commit log for
    f9b4192923fa6e38331e88214b1fe5fc21583fcc ("bitops: hweight()
    speedup").

    The userspace benchmark on x86_32 showed a 20% speedup with
    bitmap_weight(), which uses hweight32 to count bits for each
    unsigned long on 32-bit architectures.

    /* Userspace benchmark built against kernel-style bitmap helpers:
     * DECLARE_BITMAP() and bitmap_weight() as in the kernel headers. */
    int main(void)
    {
    #define SZ (1024 * 1024 * 512)

            static DECLARE_BITMAP(bitmap, SZ) = {
                    [0 ... 100] = 1,
            };

            return bitmap_weight(bitmap, SZ);
    }
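
    For reference, the multiply-based technique in question, shown as a
    plain C sketch of the hweight32 variant used when a fast multiplier is
    available:

    /* SWAR population count: sum bit pairs, then nibbles, then bytes, and
     * gather the per-byte counts into the top byte with one multiply. */
    unsigned int hweight32(unsigned int w)
    {
            w -= (w >> 1) & 0x55555555;
            w  = (w & 0x33333333) + ((w >> 2) & 0x33333333);
            w  = (w + (w >> 4)) & 0x0f0f0f0f;
            return (w * 0x01010101) >> 24;
    }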

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Cc: Linus Torvalds
    LKML-Reference:
    [ only x86 sets ARCH_HAS_FAST_MULTIPLIER so we do this via the x86 tree]
    Signed-off-by: Ingo Molnar

    Akinobu Mita

20 Oct, 2007

1 commit

  • Remove asm/bitops.h includes.

    Including asm/bitops.h directly may cause compile errors. Don't include
    it; include linux/bitops.h instead. The next patch will deny including
    the asm header directly.
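
    In practice the fix is a one-line change per file:

    /* Before: may break the build on some configs. */
    #include <asm/bitops.h>

    /* After: go through the generic wrapper, which pulls in the arch bits. */
    #include <linux/bitops.h>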

    Cc: Adrian Bunk
    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby