16 Oct, 2016

1 commit

  • Pull gcc plugins update from Kees Cook:
    "This adds a new gcc plugin named "latent_entropy". It is designed to
    extract as much possible uncertainty from a running system at boot
    time as possible, hoping to capitalize on any possible variation in
    CPU operation (due to runtime data differences, hardware differences,
    SMP ordering, thermal timing variation, cache behavior, etc).

    At the very least, this plugin is a much more comprehensive example
    for how to manipulate kernel code using the gcc plugin internals"

    * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    latent_entropy: Mark functions with __latent_entropy
    gcc-plugins: Add latent_entropy plugin

    Linus Torvalds
     

15 Oct, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - EXPORT_SYMBOL for asm source by Al Viro.

    This does bring a regression, because genksyms no longer generates
    checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
    working on a patch to fix this.

    Plus, we are talking about functions like strcpy(), which rarely
    change prototypes.

    - Fixes for PPC fallout of the above by Stephen Rothwell and Nick
    Piggin

    - fixdep speedup by Alexey Dobriyan.

    - preparatory work by Nick Piggin to allow architectures to build with
    -ffunction-sections, -fdata-sections and --gc-sections

    - CONFIG_THIN_ARCHIVES support by Stephen Rothwell

    - fix for filenames with colons in the initramfs source by me.

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
    initramfs: Escape colons in depfile
    ppc: there is no clear_pages to export
    powerpc/64: whitelist unresolved modversions CRCs
    kbuild: -ffunction-sections fix for archs with conflicting sections
    kbuild: add arch specific post-link Makefile
    kbuild: allow archs to select link dead code/data elimination
    kbuild: allow architectures to use thin archives instead of ld -r
    kbuild: Regenerate genksyms lexer
    kbuild: genksyms fix for typeof handling
    fixdep: faster CONFIG_ search
    ia64: move exports to definitions
    sparc32: debride memcpy.S a bit
    [sparc] unify 32bit and 64bit string.h
    sparc: move exports to definitions
    ppc: move exports to definitions
    arm: move exports to definitions
    s390: move exports to definitions
    m68k: move exports to definitions
    alpha: move exports to actual definitions
    x86: move exports to actual definitions
    ...

    Linus Torvalds
     

11 Oct, 2016

1 commit

  • This adds a new gcc plugin named "latent_entropy". It is designed to
    extract as much possible uncertainty from a running system at boot time as
    possible, hoping to capitalize on any possible variation in CPU operation
    (due to runtime data differences, hardware differences, SMP ordering,
    thermal timing variation, cache behavior, etc).

    At the very least, this plugin is a much more comprehensive example for
    how to manipulate kernel code using the gcc plugin internals.

    The need for very-early boot entropy tends to be very architecture or
    system design specific, so this plugin is more suited for those sorts
    of special cases. The existing kernel RNG already attempts to extract
    entropy from reliable runtime variation, but this plugin takes the idea to
    a logical extreme by permuting a global variable based on any variation
    in code execution (e.g. a different value (and permutation function)
    is used to permute the global based on loop count, case statement,
    if/then/else branching, etc).

    To do this, the plugin starts by inserting a local variable in every
    marked function. The plugin then adds logic so that the value of this
    variable is modified by randomly chosen operations (add, xor and rol) and
    random values (gcc generates separate static values for each location at
    compile time and also injects the stack pointer at runtime). The resulting
    value depends on the control flow path (e.g., loops and branches taken).

    Before the function returns, the plugin mixes this local variable into
    the latent_entropy global variable. The value of this global variable
    is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
    though it does not credit any bytes of entropy to the pool; the contents
    of the global are just used to mix the pool.

    Additionally, the plugin can pre-initialize arrays with build-time
    random contents, so that two different kernel builds running on identical
    hardware will not have the same starting values.

    Signed-off-by: Emese Revfy
    [kees: expanded commit message and code comments]
    Signed-off-by: Kees Cook

    Emese Revfy
     

22 Sep, 2016

1 commit

  • Enabling -ffunction-sections modified the generic linker script to
    pull .text.* sections into regular TEXT_TEXT section, conflicting
    with some architectures. Revert that change and require archs that
    enable the option to ensure they have no conflicting section names,
    and do the appropriate merging.

    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Fixes: b67067f1176d ("kbuild: allow archs to select link dead code/data elimination")
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michal Marek

    Nicholas Piggin
     

15 Sep, 2016

1 commit


09 Sep, 2016

2 commits

  • Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
    select to build with -ffunction-sections, -fdata-sections, and link
    with --gc-sections. It requires some work (documented) to ensure all
    unreferenced entrypoints are live, and requires toolchain and build
    verification, so it is made a per-arch option for now.

    On a random powerpc64le build, this yelds a significant size saving,
    it boots and runs fine, but there is a lot I haven't tested as yet, so
    these savings may be reduced if there are bugs in the link.

    text data bss dec filename
    11169741 1180744 1923176 14273661 vmlinux
    10445269 1004127 1919707 13369103 vmlinux.dce

    ~700K text, ~170K data, 6% removed from kernel image size.

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michal Marek

    Nicholas Piggin
     
  • ld -r is an incremental link used to create built-in.o files in build
    subdirectories. It produces relocatable object files containing all
    its input files, and these are are then pulled together and relocated
    in the final link. Aside from the bloat, this constrains the final
    link relocations, which has bitten large powerpc builds with
    unresolvable relocations in the final link.

    Alan Modra has recommended the kernel use thin archives for linking.
    This is an alternative and means that the linker has more information
    available to it when it links the kernel.

    This patch enables a config option architectures can select, which
    causes all built-in.o files to be built as thin archives. built-in.o
    files in subdirectories do not get symbol table or index attached,
    which improves speed and size. The final link pass creates a
    built-in.o archive in the root output directory which includes the
    symbol table and index. The linker then uses takes this file to link.

    The --whole-archive linker option is required, because the linker now
    has visibility to every individual object file, and it will otherwise
    just completely avoid including those without external references
    (consider a file with EXPORT_SYMBOL or initcall or hardware exceptions
    as its only entry points). The traditional built works "by luck" as
    built-in.o files are large enough that they're going to get external
    references. However this optimisation is unpredictable for the kernel
    (due to above external references), ineffective at culling unused, and
    costly because the .o files have to be searched for references.
    Superior alternatives for link-time culling should be used instead.

    Build characteristics for inclink vs thinarc, on a small powerpc64le
    pseries VM with a modest .config:

    inclink thinarc
    sizes
    vmlinux 15 618 680 15 625 028
    sum of all built-in.o 56 091 808 1 054 334
    sum excluding root built-in.o 151 430

    find -name built-in.o | xargs rm ; time make vmlinux
    real 22.772s 21.143s
    user 13.280s 13.430s
    sys 4.310s 2.750s

    - Final kernel pulled in only about 6K more, which shows how
    ineffective the object file culling is.
    - Build performance looks improved due to less pagecache activity.
    On IO constrained systems it could be a bigger win.
    - Build size saving is significant.

    Side note, the toochain understands archives, so there's some tricks,
    $ ar t built-in.o # list all files you linked with
    $ size built-in.o # and their sizes
    $ objdump -d built-in.o # disassembly (unrelocated) with filenames

    Implementation by sfr, minor tweaks by npiggin.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michal Marek

    Stephen Rothwell
     

08 Sep, 2016

1 commit


24 Aug, 2016

1 commit

  • If CONFIG_VMAP_STACK=y is selected, kernel stacks are allocated with
    __vmalloc_node_range().

    Grsecurity has had a similar feature (called GRKERNSEC_KSTACKOVERFLOW=y)
    for a long time.

    Signed-off-by: Andy Lutomirski
    Acked-by: Michal Hocko
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/14c07d4fd173a5b117f51e8b939f9f4323e39899.1470907718.git.luto@kernel.org
    [ Minor edits. ]
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

09 Aug, 2016

1 commit

  • Pull usercopy protection from Kees Cook:
    "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
    copy_from_user bounds checking for most architectures on SLAB and
    SLUB"

    * tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    mm: SLUB hardened usercopy support
    mm: SLAB hardened usercopy support
    s390/uaccess: Enable hardened usercopy
    sparc/uaccess: Enable hardened usercopy
    powerpc/uaccess: Enable hardened usercopy
    ia64/uaccess: Enable hardened usercopy
    arm64/uaccess: Enable hardened usercopy
    ARM: uaccess: Enable hardened usercopy
    x86/uaccess: Enable hardened usercopy
    mm: Hardened usercopy
    mm: Implement stack frame object validation
    mm: Add is_migrate_cma_page

    Linus Torvalds
     

03 Aug, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - GCC plugin support by Emese Revfy from grsecurity, with a fixup from
    Kees Cook. The plugins are meant to be used for static analysis of
    the kernel code. Two plugins are provided already.

    - reduction of the gcc commandline by Arnd Bergmann.

    - IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada

    - bin2c fix by Michael Tautschnig

    - setlocalversion fix by Wolfram Sang

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    gcc-plugins: disable under COMPILE_TEST
    kbuild: Abort build on bad stack protector flag
    scripts: Fix size mismatch of kexec_purgatory_size
    kbuild: make samples depend on headers_install
    Kbuild: don't add obj tree in additional includes
    Kbuild: arch: look for generated headers in obtree
    Kbuild: always prefix objtree in LINUXINCLUDE
    Kbuild: avoid duplicate include path
    Kbuild: don't add ../../ to include path
    vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
    kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
    kconfig.h: use already defined macros for IS_REACHABLE() define
    export.h: use __is_defined() to check if __KSYM_* is defined
    kconfig.h: use __is_defined() to check if MODULE is defined
    kbuild: setlocalversion: print error to STDERR
    Add sancov plugin
    Add Cyclomatic complexity GCC plugin
    GCC plugin infrastructure
    Shared library support

    Linus Torvalds
     

27 Jul, 2016

2 commits

  • Since adding the gcc plugin development headers is required for the
    gcc plugin support, we should ease into this new kernel build dependency
    more slowly. For now, disable the gcc plugins under COMPILE_TEST so that
    all*config builds will skip it.

    Signed-off-by: Kees Cook
    Signed-off-by: Michal Marek

    Kees Cook
     
  • This creates per-architecture function arch_within_stack_frames() that
    should validate if a given object is contained by a kernel stack frame.
    Initial implementation is on x86.

    This is based on code from PaX.

    Signed-off-by: Kees Cook

    Kees Cook
     

25 Jun, 2016

1 commit

  • We've had the thread info allocated together with the thread stack for
    most architectures for a long time (since the thread_info was split off
    from the task struct), but that is about to change.

    But the patches that move the thread info to be off-stack (and a part of
    the task struct instead) made it clear how confused the allocator and
    freeing functions are.

    Because the common case was that we share an allocation with the thread
    stack and the thread_info, the two pointers were identical. That
    identity then meant that we would have things like

    ti = alloc_thread_info_node(tsk, node);
    ...
    tsk->stack = ti;

    which certainly _worked_ (since stack and thread_info have the same
    value), but is rather confusing: why are we assigning a thread_info to
    the stack? And if we move the thread_info away, the "confusing" code
    just gets to be entirely bogus.

    So remove all this confusion, and make it clear that we are doing the
    stack allocation by renaming and clarifying the function names to be
    about the stack. The fact that the thread_info then shares the
    allocation is an implementation detail, and not really about the
    allocation itself.

    This is a pure renaming and type fix: we pass in the same pointer, it's
    just that we clarify what the pointer means.

    The ia64 code that actually only has one single allocation (for all of
    task_struct, thread_info and kernel thread stack) now looks a bit odd,
    but since "tsk->stack" is actually not even used there, that oddity
    doesn't matter. It would be a separate thing to clean that up, I
    intentionally left the ia64 changes as a pure brute-force renaming and
    type change.

    Acked-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Jun, 2016

1 commit

  • Several modern devices, such as PC/104 cards, are expected to run on
    modern systems via an ISA bus interface. Since ISA is a legacy interface
    for most modern architectures, ISA support should remain disabled in
    general. Support for ISA-style drivers should be enabled on a per driver
    basis.

    To allow ISA-style drivers on modern systems, this patch introduces the
    ISA_BUS_API and ISA_BUS Kconfig options. The ISA bus driver will now
    build conditionally on the ISA_BUS_API Kconfig option, which defaults to
    the legacy ISA Kconfig option. The ISA_BUS Kconfig option allows the
    ISA_BUS_API Kconfig option to be selected on architectures which do not
    enable ISA (e.g. X86_64).

    The ISA_BUS Kconfig option is currently only implemented for X86
    architectures. Other architectures may have their own ISA_BUS Kconfig
    options added as required.

    Reviewed-by: Guenter Roeck
    Signed-off-by: William Breathitt Gray
    Acked-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    William Breathitt Gray
     

08 Jun, 2016

3 commits

  • The sancov gcc plugin inserts a __sanitizer_cov_trace_pc() call
    at the start of basic blocks.

    This plugin is a helper plugin for the kcov feature. It supports
    all gcc versions with plugin support (from gcc-4.5 on).
    It is based on the gcc commit "Add fuzzing coverage support" by Dmitry Vyukov
    (https://gcc.gnu.org/viewcvs/gcc?limit_changes=0&view=revision&revision=231296).

    Signed-off-by: Emese Revfy
    Acked-by: Kees Cook
    Signed-off-by: Michal Marek

    Emese Revfy
     
  • Add a very simple plugin to demonstrate the GCC plugin infrastructure. This GCC
    plugin computes the cyclomatic complexity of each function.

    The complexity M of a function's control flow graph is defined as:
    M = E - N + 2P
    where
    E = the number of edges
    N = the number of nodes
    P = the number of connected components (exit nodes).

    Signed-off-by: Emese Revfy
    Acked-by: Kees Cook
    Signed-off-by: Michal Marek

    Emese Revfy
     
  • This patch allows to build the whole kernel with GCC plugins. It was ported from
    grsecurity/PaX. The infrastructure supports building out-of-tree modules and
    building in a separate directory. Cross-compilation is supported too.
    Currently the x86, arm, arm64 and uml architectures enable plugins.

    The directory of the gcc plugins is scripts/gcc-plugins. You can use a file or a directory
    there. The plugins compile with these options:
    * -fno-rtti: gcc is compiled with this option so the plugins must use it too
    * -fno-exceptions: this is inherited from gcc too
    * -fasynchronous-unwind-tables: this is inherited from gcc too
    * -ggdb: it is useful for debugging a plugin (better backtrace on internal
    errors)
    * -Wno-narrowing: to suppress warnings from gcc headers (ipa-utils.h)
    * -Wno-unused-variable: to suppress warnings from gcc headers (gcc_version
    variable, plugin-version.h)

    The infrastructure introduces a new Makefile target called gcc-plugins. It
    supports all gcc versions from 4.5 to 6.0. The scripts/gcc-plugin.sh script
    chooses the proper host compiler (gcc-4.7 can be built by either gcc or g++).
    This script also checks the availability of the included headers in
    scripts/gcc-plugins/gcc-common.h.

    The gcc-common.h header contains frequently included headers for GCC plugins
    and it has a compatibility layer for the supported gcc versions.

    The gcc-generate-*-pass.h headers automatically generate the registration
    structures for GIMPLE, SIMPLE_IPA, IPA and RTL passes.

    Note that 'make clean' keeps the *.so files (only the distclean or mrproper
    targets clean all) because they are needed for out-of-tree modules.

    Based on work created by the PaX Team.

    Signed-off-by: Emese Revfy
    Acked-by: Kees Cook
    Signed-off-by: Michal Marek

    Emese Revfy
     

29 May, 2016

2 commits

  • Pull string hash improvements from George Spelvin:
    "This series does several related things:

    - Makes the dcache hash (fs/namei.c) useful for general kernel use.

    (Thanks to Bruce for noticing the zero-length corner case)

    - Converts the string hashes in to use the
    above.

    - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
    32-bit multiplies will do well enough.

    - Rids the world of the bad hash multipliers in hash_32.

    This finishes the job started in commit 689de1d6ca95 ("Minimal
    fix-up of bad hashing behavior of hash_64()")

    The vast majority of Linux architectures have hardware support for
    32x32-bit multiply and so derive no benefit from "simplified"
    multipliers.

    The few processors that do not (68000, h8/300 and some models of
    Microblaze) have arch-specific implementations added. Those
    patches are last in the series.

    - Overhauls the dcache hash mixing.

    The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
    CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
    Replaced with a much more careful design that's simultaneously
    faster and better. (My own invention, as there was noting suitable
    in the literature I could find. Comments welcome!)

    - Modify the hash_name() loop to skip the initial HASH_MIX(). This
    would let us salt the hash if we ever wanted to.

    - Sort out partial_name_hash().

    The hash function is declared as using a long state, even though
    it's truncated to 32 bits at the end and the extra internal state
    contributes nothing to the result. And some callers do odd things:

    - fs/hfs/string.c only allocates 32 bits of state
    - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

    - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
    rather than 0..sizeof(long)-1. This would simplify users other
    than full_name_hash"

    Special thanks to Bruce Fields for testing and finding bugs in v1. (I
    learned some humbling lessons about "obviously correct" code.)

    On the arch-specific front, the m68k assembly has been tested in a
    standalone test harness, I've been in contact with the Microblaze
    maintainers who mostly don't care, as the hardware multiplier is never
    omitted in real-world applications, and I haven't heard anything from
    the H8/300 world"

    * 'hash' of git://ftp.sciencehorizons.net/linux:
    h8300: Add
    microblaze: Add
    m68k: Add
    : Add support for architecture-specific functions
    fs/namei.c: Improve dcache hash function
    Eliminate bad hash multipliers from hash_32() and hash_64()
    Change hash_64() return value to 32 bits
    : Define hash_str() in terms of hashlen_string()
    fs/namei.c: Add hashlen_string() function
    Pull out string hash to

    Linus Torvalds
     
  • This is just the infrastructure; there are no users yet.

    This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
    the existence of .

    That file may define its own versions of various functions, and define
    HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

    Included is a self-test (in lib/test_hash.c) that verifies the basics.
    It is NOT in general required that the arch-specific functions compute
    the same thing as the generic, but if a HAVE_* symbol is defined with
    the value 1, then equality is tested.

    Signed-off-by: George Spelvin
    Cc: Geert Uytterhoeven
    Cc: Greg Ungerer
    Cc: Andreas Schwab
    Cc: Philippe De Muyter
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: Alistair Francis
    Cc: Michal Simek
    Cc: Yoshinori Sato
    Cc: uclinux-h8-devel@lists.sourceforge.jp

    George Spelvin
     

21 May, 2016

3 commits

  • The binary GCD algorithm is based on the following facts:
    1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
    2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
    3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

    Even on x86 machines with reasonable division hardware, the binary
    algorithm runs about 25% faster (80% the execution time) than the
    division-based Euclidian algorithm.

    On platforms like Alpha and ARMv6 where division is a function call to
    emulation code, it's even more significant.

    There are two variants of the code here, depending on whether a fast
    __ffs (find least significant set bit) instruction is available. This
    allows the unpredictable branches in the bit-at-a-time shifting loop to
    be eliminated.

    If fast __ffs is not available, the "even/odd" GCD variant is used.

    I use the following code to benchmark:

    #include
    #include
    #include
    #include
    #include
    #include

    #define swap(a, b) \
    do { \
    a ^= b; \
    b ^= a; \
    a ^= b; \
    } while (0)

    unsigned long gcd0(unsigned long a, unsigned long b)
    {
    unsigned long r;

    if (a < b) {
    swap(a, b);
    }

    if (b == 0)
    return a;

    while ((r = a % b) != 0) {
    a = b;
    b = r;
    }

    return b;
    }

    unsigned long gcd1(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    b >>= __builtin_ctzl(b);

    for (;;) {
    a >>= __builtin_ctzl(a);
    if (a == b)
    return a << __builtin_ctzl(r);

    if (a < b)
    swap(a, b);
    a -= b;
    }
    }

    unsigned long gcd2(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    r &= -r;

    while (!(b & r))
    b >>= 1;

    for (;;) {
    while (!(a & r))
    a >>= 1;
    if (a == b)
    return a;

    if (a < b)
    swap(a, b);
    a -= b;
    a >>= 1;
    if (a & r)
    a += b;
    a >>= 1;
    }
    }

    unsigned long gcd3(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    b >>= __builtin_ctzl(b);
    if (b == 1)
    return r & -r;

    for (;;) {
    a >>= __builtin_ctzl(a);
    if (a == 1)
    return r & -r;
    if (a == b)
    return a << __builtin_ctzl(r);

    if (a < b)
    swap(a, b);
    a -= b;
    }
    }

    unsigned long gcd4(unsigned long a, unsigned long b)
    {
    unsigned long r = a | b;

    if (!a || !b)
    return r;

    r &= -r;

    while (!(b & r))
    b >>= 1;
    if (b == r)
    return r;

    for (;;) {
    while (!(a & r))
    a >>= 1;
    if (a == r)
    return r;
    if (a == b)
    return a;

    if (a < b)
    swap(a, b);
    a -= b;
    a >>= 1;
    if (a & r)
    a += b;
    a >>= 1;
    }
    }

    static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
    gcd0, gcd1, gcd2, gcd3, gcd4,
    };

    #define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

    #if defined(__x86_64__)

    #define rdtscll(val) do { \
    unsigned long __a,__d; \
    __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
    (val) = ((unsigned long long)__a) | (((unsigned long long)__d)<= start)
    ret = end - start;
    else
    ret = ~0ULL - start + 1 + end;

    *res = gcd_res;
    return ret;
    }

    #else

    static inline struct timespec read_time(void)
    {
    struct timespec time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
    return time;
    }

    static inline unsigned long long diff_time(struct timespec start, struct timespec end)
    {
    struct timespec temp;

    if ((end.tv_nsec - start.tv_nsec) < 0) {
    temp.tv_sec = end.tv_sec - start.tv_sec - 1;
    temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
    } else {
    temp.tv_sec = end.tv_sec - start.tv_sec;
    temp.tv_nsec = end.tv_nsec - start.tv_nsec;
    }

    return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
    }

    static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
    unsigned long a, unsigned long b, unsigned long *res)
    {
    struct timespec start, end;
    unsigned long gcd_res;

    start = read_time();
    gcd_res = gcd(a, b);
    end = read_time();

    *res = gcd_res;
    return diff_time(start, end);
    }

    #endif

    static inline unsigned long get_rand()
    {
    if (sizeof(long) == 8)
    return (unsigned long)rand() << 32 | rand();
    else
    return rand();
    }

    int main(int argc, char **argv)
    {
    unsigned int seed = time(0);
    int loops = 100;
    int repeats = 1000;
    unsigned long (*res)[TEST_ENTRIES];
    unsigned long long elapsed[TEST_ENTRIES];
    int i, j, k;

    for (;;) {
    int opt = getopt(argc, argv, "n:r:s:");
    /* End condition always first */
    if (opt == -1)
    break;

    switch (opt) {
    case 'n':
    loops = atoi(optarg);
    break;
    case 'r':
    repeats = atoi(optarg);
    break;
    case 's':
    seed = strtoul(optarg, NULL, 10);
    break;
    default:
    /* You won't actually get here. */
    break;
    }
    }

    res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
    memset(elapsed, 0, sizeof(elapsed));

    srand(seed);
    for (j = 0; j < loops; j++) {
    unsigned long a = get_rand();
    /* Do we have args? */
    unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
    unsigned long long min_elapsed[TEST_ENTRIES];
    for (k = 0; k < repeats; k++) {
    for (i = 0; i < TEST_ENTRIES; i++) {
    unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
    if (k == 0 || min_elapsed[i] > tmp)
    min_elapsed[i] = tmp;
    }
    }
    for (i = 0; i < TEST_ENTRIES; i++)
    elapsed[i] += min_elapsed[i];
    }

    for (i = 0; i < TEST_ENTRIES; i++)
    printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

    k = 0;
    srand(seed);
    for (j = 0; j < loops; j++) {
    unsigned long a = get_rand();
    unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
    for (i = 1; i < TEST_ENTRIES; i++) {
    if (res[j][i] != res[j][0])
    break;
    }
    if (i < TEST_ENTRIES) {
    if (k == 0) {
    k = 1;
    fprintf(stderr, "Error:\n");
    }
    fprintf(stderr, "gcd(%lu, %lu): ", a, b);
    for (i = 0; i < TEST_ENTRIES; i++)
    fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
    }
    }

    if (k == 0)
    fprintf(stderr, "PASS\n");

    free(res);

    return 0;
    }

    Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 10174
    gcd1: elapsed 2120
    gcd2: elapsed 2902
    gcd3: elapsed 2039
    gcd4: elapsed 2812
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9309
    gcd1: elapsed 2280
    gcd2: elapsed 2822
    gcd3: elapsed 2217
    gcd4: elapsed 2710
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9589
    gcd1: elapsed 2098
    gcd2: elapsed 2815
    gcd3: elapsed 2030
    gcd4: elapsed 2718
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9914
    gcd1: elapsed 2309
    gcd2: elapsed 2779
    gcd3: elapsed 2228
    gcd4: elapsed 2709
    PASS

    [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
    Signed-off-by: Zhaoxiu Zeng
    Signed-off-by: George Spelvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhaoxiu Zeng
     
  • printk() takes some locks and could not be used a safe way in NMI
    context.

    The chance of a deadlock is real especially when printing stacks from
    all CPUs. This particular problem has been addressed on x86 by the
    commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
    CPUs").

    The patchset brings two big advantages. First, it makes the NMI
    backtraces safe on all architectures for free. Second, it makes all NMI
    messages almost safe on all architectures (the temporary buffer is
    limited. We still should keep the number of messages in NMI context at
    minimum).

    Note that there already are several messages printed in NMI context:
    WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
    handlers. These are not easy to avoid.

    This patch reuses most of the code and makes it generic. It is useful
    for all messages and architectures that support NMI.

    The alternative printk_func is set when entering and is reseted when
    leaving NMI context. It queues IRQ work to copy the messages into the
    main ring buffer in a safe context.

    __printk_nmi_flush() copies all available messages and reset the buffer.
    Then we could use a simple cmpxchg operations to get synchronized with
    writers. There is also used a spinlock to get synchronized with other
    flushers.

    We do not longer use seq_buf because it depends on external lock. It
    would be hard to make all supported operations safe for a lockless use.
    It would be confusing and error prone to make only some operations safe.

    The code is put into separate printk/nmi.c as suggested by Steven
    Rostedt. It needs a per-CPU buffer and is compiled only on
    architectures that call nmi_enter(). This is achieved by the new
    HAVE_NMI Kconfig flag.

    The are MN10300 and Xtensa architectures. We need to clean up NMI
    handling there first. Let's do it separately.

    The patch is heavily based on the draft from Peter Zijlstra, see

    https://lkml.org/lkml/2015/6/10/327

    [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
    [akpm@linux-foundation.org: min_t->min - all types are size_t here]
    Signed-off-by: Petr Mladek
    Suggested-by: Peter Zijlstra
    Suggested-by: Steven Rostedt
    Cc: Jan Kara
    Acked-by: Russell King [arm part]
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Define HAVE_EXIT_THREAD for archs which want to do something in
    exit_thread. For others, let's define exit_thread as an empty inline.

    This is a cleanup before we change the prototype of exit_thread to
    accept a task parameter.

    [akpm@linux-foundation.org: fix mips]
    Signed-off-by: Jiri Slaby
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: Aurelien Jacquiot
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf
    Cc: Chris Zankel
    Cc: David Howells
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James Hogan
    Cc: Jeff Dike
    Cc: Jesper Nilsson
    Cc: Jiri Slaby
    Cc: Jonas Bonn
    Cc: Koichi Yasutake
    Cc: Lennox Wu
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mikael Starvik
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Richard Henderson
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Steven Miao
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

29 Feb, 2016

1 commit

  • Add a CONFIG_STACK_VALIDATION option which will run "objtool check" for
    each .o file to ensure the validity of its stack metadata.

    Signed-off-by: Josh Poimboeuf
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Bernd Petrovitsch
    Cc: Borislav Petkov
    Cc: Chris J Arges
    Cc: Jiri Slaby
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Namhyung Kim
    Cc: Pedro Alves
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: live-patching@vger.kernel.org
    Link: http://lkml.kernel.org/r/92baab69a6bf9bc7043af0bfca9fb964a1d45546.1456719558.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

21 Jan, 2016

2 commits

  • Move the generic implementation to now that all
    architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
    that everyone supports them.

    [valentinrothberg@gmail.com: remove leftovers in Kconfig]
    Signed-off-by: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Aurelien Jacquiot
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Helge Deller
    Cc: James Hogan
    Cc: Jesper Nilsson
    Cc: Koichi Yasutake
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Mikael Starvik
    Cc: Steven Miao
    Cc: Vineet Gupta
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Sebastian Ott
    Signed-off-by: Valentin Rothberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This series converts all remaining architectures to use dma_map_ops and
    the generic implementation of the DMA API. This not only simplifies the
    code a lot, but also prepares for possible future changes like more
    generic non-iommu dma_ops implementations or generic per-device
    dma_map_ops.

    This patch (of 16):

    We have a couple architectures that do not want to support this code, so
    add another Kconfig symbol that disables the code similar to what we do
    for the nommu case.

    Signed-off-by: Christoph Hellwig
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Steven Miao
    Cc: Ley Foon Tan
    Cc: David Howells
    Cc: Koichi Yasutake
    Cc: Chris Metcalf
    Cc: "David S. Miller"
    Cc: Aurelien Jacquiot
    Cc: Geert Uytterhoeven
    Cc: Helge Deller
    Cc: James Hogan
    Cc: Jesper Nilsson
    Cc: Mark Salter
    Cc: Mikael Starvik
    Cc: Vineet Gupta
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Sebastian Ott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

15 Jan, 2016

1 commit

  • Address Space Layout Randomization (ASLR) provides a barrier to
    exploitation of user-space processes in the presence of security
    vulnerabilities by making it more difficult to find desired code/data
    which could help an attack. This is done by adding a random offset to
    the location of regions in the process address space, with a greater
    range of potential offset values corresponding to better protection/a
    larger search-space for brute force, but also to greater potential for
    fragmentation.

    The offset added to the mmap_base address, which provides the basis for
    the majority of the mappings for a process, is set once on process exec
    in arch_pick_mmap_layout() and is done via hard-coded per-arch values,
    which reflect, hopefully, the best compromise for all systems. The
    trade-off between increased entropy in the offset value generation and
    the corresponding increased variability in address space fragmentation
    is not absolute, however, and some platforms may tolerate higher amounts
    of entropy. This patch introduces both new Kconfig values and a sysctl
    interface which may be used to change the amount of entropy used for
    offset generation on a system.

    The direct motivation for this change was in response to the
    libstagefright vulnerabilities that affected Android, specifically to
    information provided by Google's project zero at:

    http://googleprojectzero.blogspot.com/2015/09/stagefrightened.html

    The attack presented therein, by Google's project zero, specifically
    targeted the limited randomness used to generate the offset added to the
    mmap_base address in order to craft a brute-force-based attack.
    Concretely, the attack was against the mediaserver process, which was
    limited to respawning every 5 seconds, on an arm device. The hard-coded
    8 bits used resulted in an average expected success rate of defeating
    the mmap ASLR after just over 10 minutes (128 tries at 5 seconds a
    piece). With this patch, and an accompanying increase in the entropy
    value to 16 bits, the same attack would take an average expected time of
    over 45 hours (32768 tries), which makes it both less feasible and more
    likely to be noticed.

    The introduced Kconfig and sysctl options are limited by per-arch
    minimum and maximum values, the minimum of which was chosen to match the
    current hard-coded value and the maximum of which was chosen so as to
    give the greatest flexibility without generating an invalid mmap_base
    address, generally a 3-4 bits less than the number of bits in the
    user-space accessible virtual address space.

    When decided whether or not to change the default value, a system
    developer should consider that mmap_base address could be placed
    anywhere up to 2^(value) bits away from the non-randomized location,
    which would introduce variable-sized areas above and below the mmap_base
    address such that the maximum vm_area_struct size may be reduced,
    preventing very large allocations.

    This patch (of 4):

    ASLR only uses as few as 8 bits to generate the random offset for the
    mmap base address on 32 bit architectures. This value was chosen to
    prevent a poorly chosen value from dividing the address space in such a
    way as to prevent large allocations. This may not be an issue on all
    platforms. Allow the specification of a minimum number of bits so that
    platforms desiring greater ASLR protection may determine where to place
    the trade-off.

    Signed-off-by: Daniel Cashman
    Cc: Russell King
    Acked-by: Kees Cook
    Cc: Ingo Molnar
    Cc: Jonathan Corbet
    Cc: Don Zickus
    Cc: Eric W. Biederman
    Cc: Heinrich Schuchardt
    Cc: Josh Poimboeuf
    Cc: Kirill A. Shutemov
    Cc: Naoya Horiguchi
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: Thomas Gleixner
    Cc: David Rientjes
    Cc: Mark Salyzyn
    Cc: Jeff Vander Stoep
    Cc: Nick Kralevich
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: "H. Peter Anvin"
    Cc: Hector Marco-Gisbert
    Cc: Borislav Petkov
    Cc: Ralf Baechle
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Cashman
     

11 Sep, 2015

1 commit

  • There are two kexec load syscalls, kexec_load another and kexec_file_load.
    kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
    split kexec_load syscall code to kernel/kexec.c.

    And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
    use kexec_file_load only, or vice verse.

    The original requirement is from Ted Ts'o, he want kexec kernel signature
    being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
    kexec_load syscall can bypass the checking.

    Vivek Goyal proposed to create a common kconfig option so user can compile
    in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
    KEXEC_CORE so that old config files still work.

    Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
    architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
    KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
    kexec_load syscall.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Dave Young
    Cc: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Petr Tesarik
    Cc: Theodore Ts'o
    Cc: Josh Boyer
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     

06 Sep, 2015

1 commit

  • Pull vfs updates from Al Viro:
    "In this one:

    - d_move fixes (Eric Biederman)

    - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
    handling ought to be fixed)

    - switch of sb_writers to percpu rwsem (Oleg Nesterov)

    - superblock scalability (Josef Bacik and Dave Chinner)

    - swapon(2) race fix (Hugh Dickins)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
    vfs: Test for and handle paths that are unreachable from their mnt_root
    dcache: Reduce the scope of i_lock in d_splice_alias
    dcache: Handle escaped paths in prepend_path
    mm: fix potential data race in SyS_swapon
    inode: don't softlockup when evicting inodes
    inode: rename i_wb_list to i_io_list
    sync: serialise per-superblock sync operations
    inode: convert inode_sb_list_lock to per-sb
    inode: add hlist_fake to avoid the inode hash lock in evict
    writeback: plug writeback at a high level
    change sb_writers to use percpu_rw_semaphore
    shift percpu_counter_destroy() into destroy_super_work()
    percpu-rwsem: kill CONFIG_PERCPU_RWSEM
    percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
    percpu-rwsem: introduce percpu_down_read_trylock()
    document rwsem_release() in sb_wait_write()
    fix the broken lockdep logic in __sb_start_write()
    introduce __sb_writers_{acquired,release}() helpers
    ufs_inode_get{frag,block}(): get rid of 'phys' argument
    ufs_getfrag_block(): tidy up a bit
    ...

    Linus Torvalds
     

15 Aug, 2015

1 commit


03 Aug, 2015

1 commit

  • Add a little selftest that validates all combinations.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

18 Jul, 2015

1 commit

  • Don't burden architectures without dynamic task_struct sizing
    with the overhead of dynamic sizing.

    Also optimize the x86 code a bit by caching task_struct_size.

    Acked-and-Tested-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

26 Jun, 2015

1 commit

  • clone has some of the quirkiest syscall handling in the kernel, with a
    pile of special cases, historical curiosities, and architecture-specific
    calling conventions. In particular, clone with CLONE_SETTLS accepts a
    parameter "tls" that the C entry point completely ignores and some
    assembly entry points overwrite; instead, the low-level arch-specific
    code pulls the tls parameter out of the arch-specific register captured
    as part of pt_regs on entry to the kernel. That's a massive hack, and
    it makes the arch-specific code only work when called via the specific
    existing syscall entry points; because of this hack, any new clone-like
    system call would have to accept an identical tls argument in exactly
    the same arch-specific position, rather than providing a unified system
    call entry point across architectures.

    The first patch allows architectures to handle the tls argument via
    normal C parameter passing, if they opt in by selecting
    HAVE_COPY_THREAD_TLS. The second patch makes 32-bit and 64-bit x86 opt
    into this.

    These two patches came out of the clone4 series, which isn't ready for
    this merge window, but these first two cleanup patches were entirely
    uncontroversial and have acks. I'd like to go ahead and submit these
    two so that other architectures can begin building on top of this and
    opting into HAVE_COPY_THREAD_TLS. However, I'm also happy to wait and
    send these through the next merge window (along with v3 of clone4) if
    anyone would prefer that.

    This patch (of 2):

    clone with CLONE_SETTLS accepts an argument to set the thread-local
    storage area for the new thread. sys_clone declares an int argument
    tls_val in the appropriate point in the argument list (based on the
    various CLONE_BACKWARDS variants), but doesn't actually use or pass along
    that argument. Instead, sys_clone calls do_fork, which calls
    copy_process, which calls the arch-specific copy_thread, and copy_thread
    pulls the corresponding syscall argument out of the pt_regs captured at
    kernel entry (knowing what argument of clone that architecture passes tls
    in).

    Apart from being awful and inscrutable, that also only works because only
    one code path into copy_thread can pass the CLONE_SETTLS flag, and that
    code path comes from sys_clone with its architecture-specific
    argument-passing order. This prevents introducing a new version of the
    clone system call without propagating the same architecture-specific
    position of the tls argument.

    However, there's no reason to pull the argument out of pt_regs when
    sys_clone could just pass it down via C function call arguments.

    Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into,
    and a new copy_thread_tls that accepts the tls parameter as an additional
    unsigned long (syscall-argument-sized) argument. Change sys_clone's tls
    argument to an unsigned long (which does not change the ABI), and pass
    that down to copy_thread_tls.

    Architectures that don't opt into copy_thread_tls will continue to ignore
    the C argument to sys_clone in favor of the pt_regs captured at kernel
    entry, and thus will be unable to introduce new versions of the clone
    syscall.

    Patch co-authored by Josh Triplett and Thiago Macieira.

    Signed-off-by: Josh Triplett
    Acked-by: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Thiago Macieira
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     

17 Apr, 2015

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Numerous minor fixes, cleanups etc.

    - More EEH work from Gavin to remove its dependency on device_nodes.

    - Memory hotplug implemented entirely in the kernel from Nathan
    Fontenot.

    - Removal of redundant CONFIG_PPC_OF by Kevin Hao.

    - Rewrite of VPHN parsing logic & tests from Greg Kurz.

    - A fix from Nish Aravamudan to reduce memory usage by clamping
    nodes_possible_map.

    - Support for pstore on powernv from Hari Bathini.

    - Removal of old powerpc specific byte swap routines by David Gibson.

    - Fix from Vasant Hegde to prevent the flash driver telling you it was
    flashing your firmware when it wasn't.

    - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

    - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
    Stancek.

    - Some fixes for migration from Tyrel Datwyler.

    - A new syscall to switch the cpu endian by Michael Ellerman.

    - Large series from Wei Yang to implement SRIOV, reviewed and acked by
    Bjorn.

    - A fix for the OPAL sensor driver from Cédric Le Goater.

    - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

    - Large series from Daniel Axtens to make our PCI hooks per PHB rather
    than per machine.

    - Small patch from Sam Bobroff to explicitly abort non-suspended
    transactions on syscalls, plus a test to exercise it.

    - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

    - Small patch to enable the hard lockup detector from Anton Blanchard.

    - Fix from Dave Olson for missing L2 cache information on some CPUs.

    - Some fixes from Michael Ellerman to get Cell machines booting again.

    - Freescale updates from Scott: Highlights include BMan device tree
    nodes, an MSI erratum workaround, a couple minor performance
    improvements, config updates, and misc fixes/cleanup.

    * tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
    powerpc/powermac: Fix build error seen with powermac smp builds
    powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
    powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
    powerpc/cell: Fix iommu breakage caused by controller_ops change
    powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
    powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
    powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
    powerpc/pseries: Correct memory hotplug locking
    powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
    powerpc: Add ppc64 hard lockup detector support
    oprofile: Disable oprofile NMI timer on ppc64
    powerpc/perf/hv-24x7: Add missing put_cpu_var()
    powerpc/perf/hv-24x7: Break up single_24x7_request
    powerpc/perf/hv-24x7: Define update_event_count()
    powerpc/perf/hv-24x7: Whitespace cleanup
    powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
    powerpc/perf/hv-24x7: Rename hv_24x7_event_update
    powerpc/perf/hv-24x7: Move debug prints to separate function
    powerpc/perf/hv-24x7: Drop event_24x7_request()
    powerpc/perf/hv-24x7: Use pr_devel() to log message
    ...

    Conflicts:
    tools/testing/selftests/powerpc/Makefile
    tools/testing/selftests/powerpc/tm/Makefile

    Linus Torvalds
     

15 Apr, 2015

4 commits

  • The arch_randomize_brk() function is used on several architectures,
    even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
    tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
    the architectures that support it, while still handling CONFIG_COMPAT_BRK.

    Signed-off-by: Kees Cook
    Cc: Hector Marco-Gisbert
    Cc: Russell King
    Reviewed-by: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: "David A. Long"
    Cc: Andrey Ryabinin
    Cc: Arun Chandran
    Cc: Yann Droneaud
    Cc: Min-Hua Chen
    Cc: Paul Burton
    Cc: Alex Smith
    Cc: Markos Chandras
    Cc: Vineeth Vijayan
    Cc: Jeff Bailey
    Cc: Michael Holzheu
    Cc: Ben Hutchings
    Cc: Behan Webster
    Cc: Ismael Ripoll
    Cc: Jan-Simon Mller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • When an architecture fully supports randomizing the ELF load location,
    a per-arch mmap_rnd() function is used to find a randomized mmap base.
    In preparation for randomizing the location of ET_DYN binaries
    separately from mmap, this renames and exports these functions as
    arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
    for describing this feature on architectures that support it
    (which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390
    already supports a separated ET_DYN ASLR from mmap ASLR without the
    ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).

    Signed-off-by: Kees Cook
    Cc: Hector Marco-Gisbert
    Cc: Russell King
    Reviewed-by: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: "David A. Long"
    Cc: Andrey Ryabinin
    Cc: Arun Chandran
    Cc: Yann Droneaud
    Cc: Min-Hua Chen
    Cc: Paul Burton
    Cc: Alex Smith
    Cc: Markos Chandras
    Cc: Vineeth Vijayan
    Cc: Jeff Bailey
    Cc: Michael Holzheu
    Cc: Ben Hutchings
    Cc: Behan Webster
    Cc: Ismael Ripoll
    Cc: Jan-Simon Mller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
    I/O mappings with pud/pmd are enabled on the kernel.

    ioremap_huge_init() calls arch_ioremap_pud_supported() and
    arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.

    A new kernel option "nohugeiomap" is also added, so that user can disable
    the huge I/O map capabilities when necessary.

    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • By this time all architectures which support more than two page table
    levels should be covered. This patch add default definiton of
    PGTABLE_LEVELS equal 2.

    We also add assert to detect inconsistence between CONFIG_PGTABLE_LEVELS
    and __PAGETABLE_PMD_FOLDED/__PAGETABLE_PUD_FOLDED.

    Signed-off-by: Kirill A. Shutemov
    Tested-by: Guenter Roeck
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Jeff Dike
    Cc: Kirill A. Shutemov
    Cc: Koichi Yasutake
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

11 Apr, 2015

1 commit

  • We want to enable the hard lockup detector on ppc64, but right now
    that enables the oprofile NMI timer too.

    We'd prefer not to enable the oprofile NMI timer, it adds another
    element to our PMU testing and it requires us to increase our
    exported symbols (eg cpu_khz).

    Modify the config entry for OPROFILE_NMI_TIMER to disable it on PPC64.

    Signed-off-by: Anton Blanchard
    Acked-by: Robert Richter
    Signed-off-by: Michael Ellerman

    Anton Blanchard
     

04 Sep, 2014

1 commit