25 Jun, 2016

1 commit

  • We've had the thread info allocated together with the thread stack for
    most architectures for a long time (since the thread_info was split off
    from the task struct), but that is about to change.

    But the patches that move the thread info to be off-stack (and a part of
    the task struct instead) made it clear how confused the allocator and
    freeing functions are.

    Because in the common case the thread stack and the thread_info share
    a single allocation, the two pointers were identical. That identity
    then meant that we would have things like

    ti = alloc_thread_info_node(tsk, node);
    ...
    tsk->stack = ti;

    which certainly _worked_ (since stack and thread_info have the same
    value), but is rather confusing: why are we assigning a thread_info to
    the stack? And if we move the thread_info away, the "confusing" code
    just gets to be entirely bogus.

    So remove all this confusion, and make it clear that we are doing the
    stack allocation by renaming and clarifying the function names to be
    about the stack. The fact that the thread_info then shares the
    allocation is an implementation detail, and not really about the
    allocation itself.

    This is a pure renaming and type fix: we pass in the same pointer, it's
    just that we clarify what the pointer means.
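
    As a before/after sketch (assuming the renamed allocator is
    alloc_thread_stack_node(), per the stack-oriented naming described;
    surrounding code simplified):

    /* before: allocator named after thread_info, result stored in ->stack */
    struct thread_info *ti = alloc_thread_info_node(tsk, node);
    tsk->stack = ti;

    /* after: allocator named after what is actually being allocated */
    unsigned long *stack = alloc_thread_stack_node(tsk, node);
    tsk->stack = stack;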

    The ia64 code that actually only has one single allocation (for all of
    task_struct, thread_info and kernel thread stack) now looks a bit odd,
    but since "tsk->stack" is actually not even used there, that oddity
    doesn't matter. Cleaning that up would be a separate change; I
    intentionally left the ia64 changes as a pure brute-force renaming and
    type change.

    Acked-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Jun, 2016

1 commit

  • Several modern devices, such as PC/104 cards, are expected to run on
    modern systems via an ISA bus interface. Since ISA is a legacy interface
    for most modern architectures, ISA support should remain disabled in
    general. Support for ISA-style drivers should be enabled on a
    per-driver basis.

    To allow ISA-style drivers on modern systems, this patch introduces the
    ISA_BUS_API and ISA_BUS Kconfig options. The ISA bus driver will now
    build conditionally on the ISA_BUS_API Kconfig option, which defaults to
    the legacy ISA Kconfig option. The ISA_BUS Kconfig option allows the
    ISA_BUS_API Kconfig option to be selected on architectures which do not
    enable ISA (e.g. X86_64).

    The ISA_BUS Kconfig option is currently only implemented for X86
    architectures. Other architectures may have their own ISA_BUS Kconfig
    options added as required.

    Reviewed-by: Guenter Roeck
    Signed-off-by: William Breathitt Gray
    Acked-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    William Breathitt Gray
     

29 May, 2016

2 commits

  • Pull string hash improvements from George Spelvin:
    "This series does several related things:

    - Makes the dcache hash (fs/namei.c) useful for general kernel use.

    (Thanks to Bruce for noticing the zero-length corner case)

    - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
    above.

    - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
    32-bit multiplies will do well enough.

    - Rids the world of the bad hash multipliers in hash_32.

    This finishes the job started in commit 689de1d6ca95 ("Minimal
    fix-up of bad hashing behavior of hash_64()")

    The vast majority of Linux architectures have hardware support for
    32x32-bit multiply and so derive no benefit from "simplified"
    multipliers.

    The few processors that do not (68000, h8/300 and some models of
    Microblaze) have arch-specific implementations added. Those
    patches are last in the series.

    - Overhauls the dcache hash mixing.

    The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
    CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
    Replaced with a much more careful design that's simultaneously
    faster and better. (My own invention, as there was nothing suitable
    in the literature I could find. Comments welcome!)

    - Modify the hash_name() loop to skip the initial HASH_MIX(). This
    would let us salt the hash if we ever wanted to.

    - Sort out partial_name_hash().

    The hash function is declared as using a long state, even though
    it's truncated to 32 bits at the end and the extra internal state
    contributes nothing to the result. And some callers do odd things:

    - fs/hfs/string.c only allocates 32 bits of state
    - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

    - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
    rather than 0..sizeof(long)-1. This would simplify users other
    than full_name_hash.

    Special thanks to Bruce Fields for testing and finding bugs in v1. (I
    learned some humbling lessons about "obviously correct" code.)

    On the arch-specific front, the m68k assembly has been tested in a
    standalone test harness, I've been in contact with the Microblaze
    maintainers who mostly don't care, as the hardware multiplier is never
    omitted in real-world applications, and I haven't heard anything from
    the H8/300 world"
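
    For reference on the multiplier change above, a minimal sketch of the
    post-series hash_32(), using the full-width golden-ratio constants
    this series puts into <linux/hash.h> (GOLDEN_RATIO_64 being
    0x61C8864680B583EBull; userspace types used here for illustration):

    #include <stdint.h>

    #define GOLDEN_RATIO_32 0x61C88647u

    /* multiply by the golden ratio and keep the top "bits" bits */
    static inline uint32_t hash_32(uint32_t val, unsigned int bits)
    {
        return (val * GOLDEN_RATIO_32) >> (32 - bits);
    }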

    * 'hash' of git://ftp.sciencehorizons.net/linux:
    h8300: Add <asm/hash.h>
    microblaze: Add <asm/hash.h>
    m68k: Add <asm/hash.h>
    <linux/hash.h>: Add support for architecture-specific functions
    fs/namei.c: Improve dcache hash function
    Eliminate bad hash multipliers from hash_32() and hash_64()
    Change hash_64() return value to 32 bits
    <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
    fs/namei.c: Add hashlen_string() function
    Pull out string hash to <linux/stringhash.h>

    Linus Torvalds
     
  • This is just the infrastructure; there are no users yet.

    This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
    the existence of <asm/hash.h>.

    That file may define its own versions of various functions, and define
    HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

    Included is a self-test (in lib/test_hash.c) that verifies the basics.
    It is NOT in general required that the arch-specific functions compute
    the same thing as the generic, but if a HAVE_* symbol is defined with
    the value 1, then equality is tested.
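
    A hedged sketch of what an <asm/hash.h> override looks like under this
    scheme (illustrative, not any real architecture's file; kernel u32
    type assumed):

    #define HAVE_ARCH__HASH_32 1    /* value 1: must equal the generic result */

    static inline u32 __hash_32(u32 val)
    {
        /* an arch would use shift-and-add tricks here; because the
         * HAVE_* symbol is defined to 1, the self-test verifies this
         * matches the generic multiply */
        return val * 0x61C88647;
    }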

    Signed-off-by: George Spelvin
    Cc: Geert Uytterhoeven
    Cc: Greg Ungerer
    Cc: Andreas Schwab
    Cc: Philippe De Muyter
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: Alistair Francis
    Cc: Michal Simek
    Cc: Yoshinori Sato
    Cc: uclinux-h8-devel@lists.sourceforge.jp

    George Spelvin
     

21 May, 2016

3 commits

  • The binary GCD algorithm is based on the following facts:
    1. If both a and b are even, then gcd(a,b) = 2 * gcd(a/2, b/2)
    2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
    3. If both a and b are odd, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

    Even on x86 machines with reasonable division hardware, the binary
    algorithm runs about 25% faster (80% of the execution time) than the
    division-based Euclidean algorithm.

    On platforms like Alpha and ARMv6 where division is a function call to
    emulation code, it's even more significant.

    There are two variants of the code here, depending on whether a fast
    __ffs (find least significant set bit) instruction is available. This
    allows the unpredictable branches in the bit-at-a-time shifting loop to
    be eliminated.

    If fast __ffs is not available, the "even/odd" GCD variant is used.

    I use the following code to benchmark:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <getopt.h>

    #define swap(a, b) \
    do { \
        a ^= b; \
        b ^= a; \
        a ^= b; \
    } while (0)

    unsigned long gcd0(unsigned long a, unsigned long b)
    {
        unsigned long r;

        if (a < b) {
            swap(a, b);
        }

        if (b == 0)
            return a;

        while ((r = a % b) != 0) {
            a = b;
            b = r;
        }

        return b;
    }

    unsigned long gcd1(unsigned long a, unsigned long b)
    {
        unsigned long r = a | b;

        if (!a || !b)
            return r;

        b >>= __builtin_ctzl(b);

        for (;;) {
            a >>= __builtin_ctzl(a);
            if (a == b)
                return a << __builtin_ctzl(r);

            if (a < b)
                swap(a, b);
            a -= b;
        }
    }

    unsigned long gcd2(unsigned long a, unsigned long b)
    {
        unsigned long r = a | b;

        if (!a || !b)
            return r;

        r &= -r;

        while (!(b & r))
            b >>= 1;

        for (;;) {
            while (!(a & r))
                a >>= 1;
            if (a == b)
                return a;

            if (a < b)
                swap(a, b);
            a -= b;
            a >>= 1;
            if (a & r)
                a += b;
            a >>= 1;
        }
    }

    unsigned long gcd3(unsigned long a, unsigned long b)
    {
        unsigned long r = a | b;

        if (!a || !b)
            return r;

        b >>= __builtin_ctzl(b);
        if (b == 1)
            return r & -r;

        for (;;) {
            a >>= __builtin_ctzl(a);
            if (a == 1)
                return r & -r;
            if (a == b)
                return a << __builtin_ctzl(r);

            if (a < b)
                swap(a, b);
            a -= b;
        }
    }

    unsigned long gcd4(unsigned long a, unsigned long b)
    {
        unsigned long r = a | b;

        if (!a || !b)
            return r;

        r &= -r;

        while (!(b & r))
            b >>= 1;
        if (b == r)
            return r;

        for (;;) {
            while (!(a & r))
                a >>= 1;
            if (a == r)
                return r;
            if (a == b)
                return a;

            if (a < b)
                swap(a, b);
            a -= b;
            a >>= 1;
            if (a & r)
                a += b;
            a >>= 1;
        }
    }

    static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
        gcd0, gcd1, gcd2, gcd3, gcd4,
    };

    #define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

    #if defined(__x86_64__)

    #define rdtscll(val) do { \
        unsigned long __a, __d; \
        __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
        (val) = ((unsigned long long)__a) | (((unsigned long long)__d) << 32); \
    } while (0)

    static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
                                                 unsigned long a, unsigned long b, unsigned long *res)
    {
        unsigned long long start, end;
        unsigned long long ret;
        unsigned long gcd_res;

        rdtscll(start);
        gcd_res = gcd(a, b);
        rdtscll(end);

        if (end >= start)
            ret = end - start;
        else
            ret = ~0ULL - start + 1 + end;

        *res = gcd_res;
        return ret;
    }

    #else

    static inline struct timespec read_time(void)
    {
        struct timespec time;
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
        return time;
    }

    static inline unsigned long long diff_time(struct timespec start, struct timespec end)
    {
        struct timespec temp;

        if ((end.tv_nsec - start.tv_nsec) < 0) {
            temp.tv_sec = end.tv_sec - start.tv_sec - 1;
            temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
        } else {
            temp.tv_sec = end.tv_sec - start.tv_sec;
            temp.tv_nsec = end.tv_nsec - start.tv_nsec;
        }

        return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
    }

    static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
                                                 unsigned long a, unsigned long b, unsigned long *res)
    {
        struct timespec start, end;
        unsigned long gcd_res;

        start = read_time();
        gcd_res = gcd(a, b);
        end = read_time();

        *res = gcd_res;
        return diff_time(start, end);
    }

    #endif

    static inline unsigned long get_rand()
    {
        if (sizeof(long) == 8)
            return (unsigned long)rand() << 32 | rand();
        else
            return rand();
    }

    int main(int argc, char **argv)
    {
        unsigned int seed = time(0);
        int loops = 100;
        int repeats = 1000;
        unsigned long (*res)[TEST_ENTRIES];
        unsigned long long elapsed[TEST_ENTRIES];
        int i, j, k;

        for (;;) {
            int opt = getopt(argc, argv, "n:r:s:");
            /* End condition always first */
            if (opt == -1)
                break;

            switch (opt) {
            case 'n':
                loops = atoi(optarg);
                break;
            case 'r':
                repeats = atoi(optarg);
                break;
            case 's':
                seed = strtoul(optarg, NULL, 10);
                break;
            default:
                /* You won't actually get here. */
                break;
            }
        }

        res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
        memset(elapsed, 0, sizeof(elapsed));

        srand(seed);
        for (j = 0; j < loops; j++) {
            unsigned long a = get_rand();
            /* Do we have args? */
            unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
            unsigned long long min_elapsed[TEST_ENTRIES];
            for (k = 0; k < repeats; k++) {
                for (i = 0; i < TEST_ENTRIES; i++) {
                    unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
                    if (k == 0 || min_elapsed[i] > tmp)
                        min_elapsed[i] = tmp;
                }
            }
            for (i = 0; i < TEST_ENTRIES; i++)
                elapsed[i] += min_elapsed[i];
        }

        for (i = 0; i < TEST_ENTRIES; i++)
            printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

        k = 0;
        srand(seed);
        for (j = 0; j < loops; j++) {
            unsigned long a = get_rand();
            unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
            for (i = 1; i < TEST_ENTRIES; i++) {
                if (res[j][i] != res[j][0])
                    break;
            }
            if (i < TEST_ENTRIES) {
                if (k == 0) {
                    k = 1;
                    fprintf(stderr, "Error:\n");
                }
                fprintf(stderr, "gcd(%lu, %lu): ", a, b);
                for (i = 0; i < TEST_ENTRIES; i++)
                    fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
            }
        }

        if (k == 0)
            fprintf(stderr, "PASS\n");

        free(res);

        return 0;
    }

    Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64", this got:

    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 10174
    gcd1: elapsed 2120
    gcd2: elapsed 2902
    gcd3: elapsed 2039
    gcd4: elapsed 2812
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9309
    gcd1: elapsed 2280
    gcd2: elapsed 2822
    gcd3: elapsed 2217
    gcd4: elapsed 2710
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9589
    gcd1: elapsed 2098
    gcd2: elapsed 2815
    gcd3: elapsed 2030
    gcd4: elapsed 2718
    PASS
    zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
    gcd0: elapsed 9914
    gcd1: elapsed 2309
    gcd2: elapsed 2779
    gcd3: elapsed 2228
    gcd4: elapsed 2709
    PASS

    [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
    Signed-off-by: Zhaoxiu Zeng
    Signed-off-by: George Spelvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhaoxiu Zeng
     
  • printk() takes some locks and cannot be used safely in NMI
    context.

    The chance of a deadlock is real especially when printing stacks from
    all CPUs. This particular problem has been addressed on x86 by the
    commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
    CPUs").

    The patchset brings two big advantages. First, it makes the NMI
    backtraces safe on all architectures for free. Second, it makes all NMI
    messages almost safe on all architectures (the temporary buffer is
    limited, so the number of messages in NMI context should still be kept
    to a minimum).

    Note that there already are several messages printed in NMI context:
    WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
    handlers. These are not easy to avoid.

    This patch reuses most of the code and makes it generic. It is useful
    for all messages and architectures that support NMI.

    The alternative printk_func is set when entering NMI context and reset
    when leaving it. It queues IRQ work to copy the messages into the main
    ring buffer in a safe context.
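
    A minimal sketch of that switch (function and variable names from this
    patch; the bodies are simplified):

    void printk_nmi_enter(void)
    {
        /* redirect printk() to the per-CPU NMI buffer */
        this_cpu_write(printk_func, vprintk_nmi);
    }

    void printk_nmi_exit(void)
    {
        /* back to the normal path */
        this_cpu_write(printk_func, vprintk_default);
    }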

    __printk_nmi_flush() copies all available messages and resets the
    buffer. A simple cmpxchg operation is then enough to stay synchronized
    with writers. A spinlock is also used to stay synchronized with other
    flushers.

    We no longer use seq_buf because it depends on an external lock. It
    would be hard to make all supported operations safe for a lockless use.
    It would be confusing and error prone to make only some operations safe.

    The code is put into separate printk/nmi.c as suggested by Steven
    Rostedt. It needs a per-CPU buffer and is compiled only on
    architectures that call nmi_enter(). This is achieved by the new
    HAVE_NMI Kconfig flag.

    The exceptions are the MN10300 and Xtensa architectures. We need to
    clean up NMI handling there first. Let's do it separately.

    The patch is heavily based on the draft from Peter Zijlstra, see

    https://lkml.org/lkml/2015/6/10/327

    [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
    [akpm@linux-foundation.org: min_t->min - all types are size_t here]
    Signed-off-by: Petr Mladek
    Suggested-by: Peter Zijlstra
    Suggested-by: Steven Rostedt
    Cc: Jan Kara
    Acked-by: Russell King [arm part]
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • Define HAVE_EXIT_THREAD for archs which want to do something in
    exit_thread. For others, let's define exit_thread as an empty inline.

    This is a cleanup before we change the prototype of exit_thread to
    accept a task parameter.
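
    A hedged sketch of the header-side shape this describes (simplified):

    #ifdef CONFIG_HAVE_EXIT_THREAD
    extern void exit_thread(void);
    #else
    static inline void exit_thread(void)
    {
    }
    #endif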

    [akpm@linux-foundation.org: fix mips]
    Signed-off-by: Jiri Slaby
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: Aurelien Jacquiot
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf
    Cc: Chris Zankel
    Cc: David Howells
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: James Hogan
    Cc: Jeff Dike
    Cc: Jesper Nilsson
    Cc: Jiri Slaby
    Cc: Jonas Bonn
    Cc: Koichi Yasutake
    Cc: Lennox Wu
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Mikael Starvik
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Richard Henderson
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Steven Miao
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

29 Feb, 2016

1 commit

  • Add a CONFIG_STACK_VALIDATION option which will run "objtool check" for
    each .o file to ensure the validity of its stack metadata.

    Signed-off-by: Josh Poimboeuf
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Bernd Petrovitsch
    Cc: Borislav Petkov
    Cc: Chris J Arges
    Cc: Jiri Slaby
    Cc: Linus Torvalds
    Cc: Michal Marek
    Cc: Namhyung Kim
    Cc: Pedro Alves
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: live-patching@vger.kernel.org
    Link: http://lkml.kernel.org/r/92baab69a6bf9bc7043af0bfca9fb964a1d45546.1456719558.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

21 Jan, 2016

2 commits

  • Move the generic implementation to <linux/dma-mapping.h> now that all
    architectures support it, and remove the HAVE_DMA_ATTR Kconfig symbol
    now that everyone supports it.

    [valentinrothberg@gmail.com: remove leftovers in Kconfig]
    Signed-off-by: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Aurelien Jacquiot
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Helge Deller
    Cc: James Hogan
    Cc: Jesper Nilsson
    Cc: Koichi Yasutake
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Mikael Starvik
    Cc: Steven Miao
    Cc: Vineet Gupta
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Sebastian Ott
    Signed-off-by: Valentin Rothberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This series converts all remaining architectures to use dma_map_ops and
    the generic implementation of the DMA API. This not only simplifies the
    code a lot, but also prepares for possible future changes like more
    generic non-iommu dma_ops implementations or generic per-device
    dma_map_ops.

    This patch (of 16):

    We have a couple architectures that do not want to support this code, so
    add another Kconfig symbol that disables the code similar to what we do
    for the nommu case.

    Signed-off-by: Christoph Hellwig
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Steven Miao
    Cc: Ley Foon Tan
    Cc: David Howells
    Cc: Koichi Yasutake
    Cc: Chris Metcalf
    Cc: "David S. Miller"
    Cc: Aurelien Jacquiot
    Cc: Geert Uytterhoeven
    Cc: Helge Deller
    Cc: James Hogan
    Cc: Jesper Nilsson
    Cc: Mark Salter
    Cc: Mikael Starvik
    Cc: Vineet Gupta
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Sebastian Ott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

15 Jan, 2016

1 commit

  • Address Space Layout Randomization (ASLR) provides a barrier to
    exploitation of user-space processes in the presence of security
    vulnerabilities by making it more difficult to find desired code/data
    which could help an attack. This is done by adding a random offset to
    the location of regions in the process address space, with a greater
    range of potential offset values corresponding to better protection/a
    larger search-space for brute force, but also to greater potential for
    fragmentation.

    The offset added to the mmap_base address, which provides the basis for
    the majority of the mappings for a process, is set once on process exec
    in arch_pick_mmap_layout() and is done via hard-coded per-arch values,
    which reflect, hopefully, the best compromise for all systems. The
    trade-off between increased entropy in the offset value generation and
    the corresponding increased variability in address space fragmentation
    is not absolute, however, and some platforms may tolerate higher amounts
    of entropy. This patch introduces both new Kconfig values and a sysctl
    interface which may be used to change the amount of entropy used for
    offset generation on a system.

    The direct motivation for this change was in response to the
    libstagefright vulnerabilities that affected Android, specifically to
    information provided by Google's project zero at:

    http://googleprojectzero.blogspot.com/2015/09/stagefrightened.html

    The attack presented therein, by Google's project zero, specifically
    targeted the limited randomness used to generate the offset added to the
    mmap_base address in order to craft a brute-force-based attack.
    Concretely, the attack was against the mediaserver process, which was
    limited to respawning every 5 seconds, on an ARM device. The hard-coded
    8 bits used resulted in an average expected success rate of defeating
    the mmap ASLR after just over 10 minutes (128 tries at 5 seconds
    apiece). With this patch, and an accompanying increase in the entropy
    value to 16 bits, the same attack would take an average expected time of
    over 45 hours (32768 tries), which makes it both less feasible and more
    likely to be noticed.

    The introduced Kconfig and sysctl options are limited by per-arch
    minimum and maximum values, the minimum of which was chosen to match the
    current hard-coded value and the maximum of which was chosen so as to
    give the greatest flexibility without generating an invalid mmap_base
    address, generally 3-4 bits less than the number of bits in the
    user-space accessible virtual address space.

    When deciding whether or not to change the default value, a system
    developer should consider that mmap_base address could be placed
    anywhere up to 2^(value) bits away from the non-randomized location,
    which would introduce variable-sized areas above and below the mmap_base
    address such that the maximum vm_area_struct size may be reduced,
    preventing very large allocations.

    This patch (of 4):

    ASLR only uses as few as 8 bits to generate the random offset for the
    mmap base address on 32 bit architectures. This value was chosen to
    prevent a poorly chosen value from dividing the address space in such a
    way as to prevent large allocations. This may not be an issue on all
    platforms. Allow the specification of a minimum number of bits so that
    platforms desiring greater ASLR protection may determine where to place
    the trade-off.

    Signed-off-by: Daniel Cashman
    Cc: Russell King
    Acked-by: Kees Cook
    Cc: Ingo Molnar
    Cc: Jonathan Corbet
    Cc: Don Zickus
    Cc: Eric W. Biederman
    Cc: Heinrich Schuchardt
    Cc: Josh Poimboeuf
    Cc: Kirill A. Shutemov
    Cc: Naoya Horiguchi
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: Thomas Gleixner
    Cc: David Rientjes
    Cc: Mark Salyzyn
    Cc: Jeff Vander Stoep
    Cc: Nick Kralevich
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: "H. Peter Anvin"
    Cc: Hector Marco-Gisbert
    Cc: Borislav Petkov
    Cc: Ralf Baechle
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Cashman
     

11 Sep, 2015

1 commit

  • There are two kexec load syscalls: kexec_load and kexec_file_load.
    kexec_file_load has been split out into kernel/kexec_file.c. In this
    patch I split the kexec_load syscall code into kernel/kexec.c.

    And add a new kconfig option KEXEC_CORE, so we can disable kexec_load
    and use kexec_file_load only, or vice versa.

    The original requirement is from Ted Ts'o: he wants the kexec kernel
    signature to be checked with CONFIG_KEXEC_VERIFY_SIG enabled, but
    kexec-tools can use the kexec_load syscall to bypass that checking.

    Vivek Goyal proposed to create a common kconfig option so users can
    compile in only one syscall for loading the kexec kernel.
    KEXEC/KEXEC_FILE select KEXEC_CORE so that old config files still work.

    Because there is generic code that needs CONFIG_KEXEC_CORE, I updated
    all the architecture Kconfigs with the new option KEXEC_CORE, and let
    KEXEC select KEXEC_CORE in the arch Kconfigs. Also updated the general
    kernel code related to the kexec_load syscall.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Dave Young
    Cc: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Petr Tesarik
    Cc: Theodore Ts'o
    Cc: Josh Boyer
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     

06 Sep, 2015

1 commit

  • Pull vfs updates from Al Viro:
    "In this one:

    - d_move fixes (Eric Biederman)

    - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
    handling ought to be fixed)

    - switch of sb_writers to percpu rwsem (Oleg Nesterov)

    - superblock scalability (Josef Bacik and Dave Chinner)

    - swapon(2) race fix (Hugh Dickins)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
    vfs: Test for and handle paths that are unreachable from their mnt_root
    dcache: Reduce the scope of i_lock in d_splice_alias
    dcache: Handle escaped paths in prepend_path
    mm: fix potential data race in SyS_swapon
    inode: don't softlockup when evicting inodes
    inode: rename i_wb_list to i_io_list
    sync: serialise per-superblock sync operations
    inode: convert inode_sb_list_lock to per-sb
    inode: add hlist_fake to avoid the inode hash lock in evict
    writeback: plug writeback at a high level
    change sb_writers to use percpu_rw_semaphore
    shift percpu_counter_destroy() into destroy_super_work()
    percpu-rwsem: kill CONFIG_PERCPU_RWSEM
    percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
    percpu-rwsem: introduce percpu_down_read_trylock()
    document rwsem_release() in sb_wait_write()
    fix the broken lockdep logic in __sb_start_write()
    introduce __sb_writers_{acquired,release}() helpers
    ufs_inode_get{frag,block}(): get rid of 'phys' argument
    ufs_getfrag_block(): tidy up a bit
    ...

    Linus Torvalds
     

03 Aug, 2015

1 commit

  • Add a little selftest that validates all combinations.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

18 Jul, 2015

1 commit

  • Don't burden architectures without dynamic task_struct sizing
    with the overhead of dynamic sizing.

    Also optimize the x86 code a bit by caching task_struct_size.

    Acked-and-Tested-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

26 Jun, 2015

1 commit

  • clone has some of the quirkiest syscall handling in the kernel, with a
    pile of special cases, historical curiosities, and architecture-specific
    calling conventions. In particular, clone with CLONE_SETTLS accepts a
    parameter "tls" that the C entry point completely ignores and some
    assembly entry points overwrite; instead, the low-level arch-specific
    code pulls the tls parameter out of the arch-specific register captured
    as part of pt_regs on entry to the kernel. That's a massive hack, and
    it makes the arch-specific code only work when called via the specific
    existing syscall entry points; because of this hack, any new clone-like
    system call would have to accept an identical tls argument in exactly
    the same arch-specific position, rather than providing a unified system
    call entry point across architectures.

    The first patch allows architectures to handle the tls argument via
    normal C parameter passing, if they opt in by selecting
    HAVE_COPY_THREAD_TLS. The second patch makes 32-bit and 64-bit x86 opt
    into this.

    These two patches came out of the clone4 series, which isn't ready for
    this merge window, but these first two cleanup patches were entirely
    uncontroversial and have acks. I'd like to go ahead and submit these
    two so that other architectures can begin building on top of this and
    opting into HAVE_COPY_THREAD_TLS. However, I'm also happy to wait and
    send these through the next merge window (along with v3 of clone4) if
    anyone would prefer that.

    This patch (of 2):

    clone with CLONE_SETTLS accepts an argument to set the thread-local
    storage area for the new thread. sys_clone declares an int argument
    tls_val in the appropriate point in the argument list (based on the
    various CLONE_BACKWARDS variants), but doesn't actually use or pass along
    that argument. Instead, sys_clone calls do_fork, which calls
    copy_process, which calls the arch-specific copy_thread, and copy_thread
    pulls the corresponding syscall argument out of the pt_regs captured at
    kernel entry (knowing what argument of clone that architecture passes tls
    in).

    Apart from being awful and inscrutable, that also only works because only
    one code path into copy_thread can pass the CLONE_SETTLS flag, and that
    code path comes from sys_clone with its architecture-specific
    argument-passing order. This prevents introducing a new version of the
    clone system call without propagating the same architecture-specific
    position of the tls argument.

    However, there's no reason to pull the argument out of pt_regs when
    sys_clone could just pass it down via C function call arguments.

    Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into,
    and a new copy_thread_tls that accepts the tls parameter as an additional
    unsigned long (syscall-argument-sized) argument. Change sys_clone's tls
    argument to an unsigned long (which does not change the ABI), and pass
    that down to copy_thread_tls.

    Architectures that don't opt into copy_thread_tls will continue to ignore
    the C argument to sys_clone in favor of the pt_regs captured at kernel
    entry, and thus will be unable to introduce new versions of the clone
    syscall.
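
    A sketch of the resulting declarations (per the description above;
    exact types and placement simplified):

    #ifdef CONFIG_HAVE_COPY_THREAD_TLS
    /* opted-in architectures receive tls as an ordinary C argument */
    extern int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
                               unsigned long arg, struct task_struct *p,
                               unsigned long tls);
    #else
    /* legacy path: tls is ignored here and fished out of pt_regs */
    extern int copy_thread(unsigned long clone_flags, unsigned long sp,
                           unsigned long arg, struct task_struct *p);
    #define copy_thread_tls(clone_flags, sp, arg, p, tls) \
            copy_thread(clone_flags, sp, arg, p)
    #endif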

    Patch co-authored by Josh Triplett and Thiago Macieira.

    Signed-off-by: Josh Triplett
    Acked-by: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Thiago Macieira
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     

17 Apr, 2015

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Numerous minor fixes, cleanups etc.

    - More EEH work from Gavin to remove its dependency on device_nodes.

    - Memory hotplug implemented entirely in the kernel from Nathan
    Fontenot.

    - Removal of redundant CONFIG_PPC_OF by Kevin Hao.

    - Rewrite of VPHN parsing logic & tests from Greg Kurz.

    - A fix from Nish Aravamudan to reduce memory usage by clamping
    nodes_possible_map.

    - Support for pstore on powernv from Hari Bathini.

    - Removal of old powerpc specific byte swap routines by David Gibson.

    - Fix from Vasant Hegde to prevent the flash driver telling you it was
    flashing your firmware when it wasn't.

    - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

    - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
    Stancek.

    - Some fixes for migration from Tyrel Datwyler.

    - A new syscall to switch the cpu endian by Michael Ellerman.

    - Large series from Wei Yang to implement SRIOV, reviewed and acked by
    Bjorn.

    - A fix for the OPAL sensor driver from Cédric Le Goater.

    - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

    - Large series from Daniel Axtens to make our PCI hooks per PHB rather
    than per machine.

    - Small patch from Sam Bobroff to explicitly abort non-suspended
    transactions on syscalls, plus a test to exercise it.

    - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

    - Small patch to enable the hard lockup detector from Anton Blanchard.

    - Fix from Dave Olson for missing L2 cache information on some CPUs.

    - Some fixes from Michael Ellerman to get Cell machines booting again.

    - Freescale updates from Scott: Highlights include BMan device tree
    nodes, an MSI erratum workaround, a couple minor performance
    improvements, config updates, and misc fixes/cleanup.

    * tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
    powerpc/powermac: Fix build error seen with powermac smp builds
    powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
    powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
    powerpc/cell: Fix iommu breakage caused by controller_ops change
    powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
    powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
    powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
    powerpc/pseries: Correct memory hotplug locking
    powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
    powerpc: Add ppc64 hard lockup detector support
    oprofile: Disable oprofile NMI timer on ppc64
    powerpc/perf/hv-24x7: Add missing put_cpu_var()
    powerpc/perf/hv-24x7: Break up single_24x7_request
    powerpc/perf/hv-24x7: Define update_event_count()
    powerpc/perf/hv-24x7: Whitespace cleanup
    powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
    powerpc/perf/hv-24x7: Rename hv_24x7_event_update
    powerpc/perf/hv-24x7: Move debug prints to separate function
    powerpc/perf/hv-24x7: Drop event_24x7_request()
    powerpc/perf/hv-24x7: Use pr_devel() to log message
    ...

    Conflicts:
    tools/testing/selftests/powerpc/Makefile
    tools/testing/selftests/powerpc/tm/Makefile

    Linus Torvalds
     

15 Apr, 2015

4 commits

  • The arch_randomize_brk() function is used on several architectures,
    even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
    tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
    the architectures that support it, while still handling CONFIG_COMPAT_BRK.

    Signed-off-by: Kees Cook
    Cc: Hector Marco-Gisbert
    Cc: Russell King
    Reviewed-by: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: "David A. Long"
    Cc: Andrey Ryabinin
    Cc: Arun Chandran
    Cc: Yann Droneaud
    Cc: Min-Hua Chen
    Cc: Paul Burton
    Cc: Alex Smith
    Cc: Markos Chandras
    Cc: Vineeth Vijayan
    Cc: Jeff Bailey
    Cc: Michael Holzheu
    Cc: Ben Hutchings
    Cc: Behan Webster
    Cc: Ismael Ripoll
    Cc: Jan-Simon Möller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • When an architecture fully supports randomizing the ELF load location,
    a per-arch mmap_rnd() function is used to find a randomized mmap base.
    In preparation for randomizing the location of ET_DYN binaries
    separately from mmap, this renames and exports these functions as
    arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
    for describing this feature on architectures that support it
    (which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390
    already supports a separated ET_DYN ASLR from mmap ASLR without the
    ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).
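
    A sketch of the renamed hook and a typical caller (the caller is
    simplified from an arch_pick_mmap_layout() implementation):

    unsigned long arch_mmap_rnd(void);    /* per-arch entropy source */

    /* when picking the mmap base on exec: */
    unsigned long random_factor = 0;

    if (current->flags & PF_RANDOMIZE)
        random_factor = arch_mmap_rnd();
    mm->mmap_base = mmap_base(random_factor);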

    Signed-off-by: Kees Cook
    Cc: Hector Marco-Gisbert
    Cc: Russell King
    Reviewed-by: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: "David A. Long"
    Cc: Andrey Ryabinin
    Cc: Arun Chandran
    Cc: Yann Droneaud
    Cc: Min-Hua Chen
    Cc: Paul Burton
    Cc: Alex Smith
    Cc: Markos Chandras
    Cc: Vineeth Vijayan
    Cc: Jeff Bailey
    Cc: Michael Holzheu
    Cc: Ben Hutchings
    Cc: Behan Webster
    Cc: Ismael Ripoll
    Cc: Jan-Simon Möller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
    I/O mappings with pud/pmd are enabled in the kernel.

    ioremap_huge_init() calls arch_ioremap_pud_supported() and
    arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.

    A new kernel option "nohugeiomap" is also added, so that users can
    disable the huge I/O map capabilities when necessary.
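
    A hedged sketch of the wiring described (simplified; variable names
    illustrative):

    static int __read_mostly ioremap_huge_disabled;   /* set by "nohugeiomap" */
    static int __read_mostly ioremap_pud_capable;
    static int __read_mostly ioremap_pmd_capable;

    void __init ioremap_huge_init(void)
    {
        if (!ioremap_huge_disabled) {
            ioremap_pud_capable = arch_ioremap_pud_supported();
            ioremap_pmd_capable = arch_ioremap_pmd_supported();
        }
    }

    static inline int ioremap_pud_enabled(void) { return ioremap_pud_capable; }
    static inline int ioremap_pmd_enabled(void) { return ioremap_pmd_capable; }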

    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • By this time all architectures which support more than two page table
    levels should be covered. This patch adds a default definition of
    PGTABLE_LEVELS equal to 2.

    We also add an assert to detect inconsistency between
    CONFIG_PGTABLE_LEVELS and __PAGETABLE_PMD_FOLDED/__PAGETABLE_PUD_FOLDED.
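
    A hedged sketch of the kind of consistency check meant here (not the
    exact assert from the patch): a folded PMD implies at most two levels,
    a folded PUD at most three:

    #if defined(__PAGETABLE_PMD_FOLDED) && CONFIG_PGTABLE_LEVELS > 2
    #error "CONFIG_PGTABLE_LEVELS inconsistent with __PAGETABLE_PMD_FOLDED"
    #endif
    #if defined(__PAGETABLE_PUD_FOLDED) && CONFIG_PGTABLE_LEVELS > 3
    #error "CONFIG_PGTABLE_LEVELS inconsistent with __PAGETABLE_PUD_FOLDED"
    #endif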

    Signed-off-by: Kirill A. Shutemov
    Tested-by: Guenter Roeck
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Jeff Dike
    Cc: Kirill A. Shutemov
    Cc: Koichi Yasutake
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

11 Apr, 2015

1 commit

  • We want to enable the hard lockup detector on ppc64, but right now
    that enables the oprofile NMI timer too.

    We'd prefer not to enable the oprofile NMI timer, it adds another
    element to our PMU testing and it requires us to increase our
    exported symbols (eg cpu_khz).

    Modify the config entry for OPROFILE_NMI_TIMER to disable it on PPC64.

    Signed-off-by: Anton Blanchard
    Acked-by: Robert Richter
    Signed-off-by: Michael Ellerman

    Anton Blanchard
     

19 Jul, 2014

1 commit

  • This adds the new "seccomp" syscall with both an "operation" and "flags"
    parameter for future expansion. The third argument is a pointer value,
    used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
    be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).

    In addition to the TSYNC flag later in this patch series, there is a
    non-zero chance that this syscall could be used for configuring a fixed
    argument area for seccomp-tracer-aware processes to pass syscall arguments
    in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
    for this syscall. Additionally, this syscall uses operation, flags,
    and user pointer for arguments because strictly passing arguments via
    a user pointer would mean seccomp itself would be unable to trivially
    filter the seccomp syscall itself.
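
    A hedged userspace usage sketch (constants from the uapi headers; a
    raw syscall since libc has no wrapper yet; error handling omitted):

    #include <linux/seccomp.h>
    #include <linux/filter.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int install_filter(struct sock_fprog *prog)
    {
        /* equivalent to prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog) */
        return syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER, 0, prog);
    }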

    Signed-off-by: Kees Cook
    Reviewed-by: Oleg Nesterov
    Reviewed-by: Andy Lutomirski

    Kees Cook
     

20 Dec, 2013

2 commits

  • This changes the stack protector config option into a choice of
    "None", "Regular", and "Strong":

    CONFIG_CC_STACKPROTECTOR_NONE
    CONFIG_CC_STACKPROTECTOR_REGULAR
    CONFIG_CC_STACKPROTECTOR_STRONG

    "Regular" means the old CONFIG_CC_STACKPROTECTOR=y option.

    "Strong" is a new mode introduced by this patch. With "Strong" the
    kernel is built with -fstack-protector-strong (available in
    gcc 4.9 and later). This option increases the coverage of the stack
    protector without the heavy performance hit of -fstack-protector-all.

    For reference, the stack protector options available in gcc are:

    -fstack-protector-all:
    Adds the stack-canary saving prefix and stack-canary checking
    suffix to _all_ function entry and exit. Results in substantial
    use of stack space for saving the canary for deep stack users
    (e.g. historically xfs), and measurable (though shockingly still
    low) performance hit due to all the saving/checking. Really not
    suitable for sane systems, and was entirely removed as an option
    from the kernel many years ago.

    -fstack-protector:
    Adds the canary save/check to functions that define an 8
    (--param=ssp-buffer-size=N, N=8 by default) or more byte local
    char array. Traditionally, stack overflows happened with
    string-based manipulations, so this was a way to find those
    functions. Very few total functions actually get the canary; no
    measurable performance or size overhead.

    -fstack-protector-strong
    Adds the canary for a wider set of functions, since it's not
    just those with strings that have ultimately been vulnerable to
    stack-busting. With this superset, more functions end up with a
    canary, but it still remains small compared to all functions
    with only a small change in performance. Based on the original
    design document, a function gets the canary when it contains any
    of:

    - local variable's address used as part of the right hand side
    of an assignment or function argument
    - local variable is an array (or union containing an array),
    regardless of array type or length
    - uses register local variables

    https://docs.google.com/a/google.com/document/d/1xXBH6rRZue4f296vGt9YQcuLVQHeE516stHwt8M9xyU
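
    A hedged illustration of the coverage difference (example functions,
    not from the patch): with -fstack-protector only buf_user() gets a
    canary, because of its char array; -strong also covers addr_escapes(),
    whose local's address is passed as an argument:

    #include <stdio.h>
    #include <unistd.h>

    void buf_user(void)
    {
        char buf[16];                   /* 8+ byte char array => canary */
        if (read(0, buf, sizeof(buf)) < 0)
            perror("read");
    }

    void addr_escapes(int *out)
    {
        int x = 0;
        if (scanf("%d", &x) == 1)       /* &x as function argument */
            *out = x;
    }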

    Find below a comparison of "size" and "objdump" output when built with
    gcc-4.9 in three configurations:

    - defconfig
    11430641 kernel text size
    36110 function bodies

    - defconfig + CONFIG_CC_STACKPROTECTOR_REGULAR
    11468490 kernel text size (+0.33%)
    1015 of 36110 functions are stack-protected (2.81%)

    - defconfig + CONFIG_CC_STACKPROTECTOR_STRONG via this patch
    11692790 kernel text size (+2.24%)
    7401 of 36110 functions are stack-protected (20.5%)

    With -strong, ARM's compressed boot code now triggers stack
    protection, so a static guard was added. Since this is only used
    during decompression and was never used before, the exposure
    here is very small. Once it switches to the full kernel, the
    stack guard is back to normal.

    Chrome OS has been using -fstack-protector-strong for its kernel
    builds for the last 8 months with no problems.

    Signed-off-by: Kees Cook
    Cc: Arjan van de Ven
    Cc: Michal Marek
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: Paul Mundt
    Cc: James Hogan
    Cc: Stephen Rothwell
    Cc: Shawn Guo
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-arch@vger.kernel.org
    Link: http://lkml.kernel.org/r/1387481759-14535-3-git-send-email-keescook@chromium.org
    [ Improved the changelog and descriptions some more. ]
    Signed-off-by: Ingo Molnar

    Kees Cook
     
  • Instead of duplicating the CC_STACKPROTECTOR Kconfig and
    Makefile logic in each architecture, switch to using
    HAVE_CC_STACKPROTECTOR and keep everything in one place. This
    retains the x86-specific bug verification scripts.

    Signed-off-by: Kees Cook
    Cc: Arjan van de Ven
    Cc: Michal Marek
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: Paul Mundt
    Cc: James Hogan
    Cc: Stephen Rothwell
    Cc: Shawn Guo
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-mips@linux-mips.org
    Cc: linux-arch@vger.kernel.org
    Link: http://lkml.kernel.org/r/1387481759-14535-2-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
     

12 Nov, 2013

1 commit

  • Pull timer changes from Ingo Molnar:
    "Main changes in this cycle were:

    - Updated full dynticks support.

    - Event stream support for architected (ARM) timers.

    - ARM clocksource driver updates.

    - Move arm64 to using the generic sched_clock framework & resulting
    cleanup in the generic sched_clock code.

    - Misc fixes and cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
    x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
    clocksource: sun4i: remove IRQF_DISABLED
    clocksource: sun4i: Report the minimum tick that we can program
    clocksource: sun4i: Select CLKSRC_MMIO
    clocksource: Provide timekeeping for efm32 SoCs
    clocksource: em_sti: convert to clk_prepare/unprepare
    time: Fix signedness bug in sysfs_get_uname() and its callers
    timekeeping: Fix some trivial typos in comments
    alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
    clocksource: arch_timer: Do not register arch_sys_counter twice
    timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
    sched_clock: Remove sched_clock_func() hook
    arch_timer: Move to generic sched_clock framework
    clocksource: tcb_clksrc: Remove IRQF_DISABLED
    clocksource: tcb_clksrc: Improve driver robustness
    clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare
    clocksource: arm_arch_timer: Use clocksource for suspend timekeeping
    clocksource: dw_apb_timer_of: Mark a few more functions as __init
    clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally
    arm: zynq: Enable arm_global_timer
    ...

    Linus Torvalds
     

01 Oct, 2013

1 commit

  • If irq_exit() is called on the arch's specified irq stack,
    it should be safe to run softirqs inline under that same
    irq stack as it is near empty by the time we call irq_exit().

    For example if we use the same stack for both hard and soft irqs here,
    the worst case scenario is:
    hardirq -> softirq -> hardirq. But then the softirq supersedes the
    first hardirq as the stack user since irq_exit() is called in
    a mostly empty stack. So the stack merge in this case looks acceptable.

    Stack overruns still have a chance to happen if hardirqs have more
    opportunities to nest, but then it's another problem to solve.

    So let's adapt the irq exit's softirq stack on top of a new Kconfig symbol
    that can be defined when irq_exit() runs on the irq stack. That way
    we can spare some stack switch on irq processing and all the cache
    issues that come along.

    Acked-by: Linus Torvalds
    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: James Hogan
    Cc: James E.J. Bottomley
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David S. Miller
    Cc: Andrew Morton

    Frederic Weisbecker
     

30 Sep, 2013

1 commit

  • With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. In order
    to use that feature, arch code should be audited to ensure there are no
    races in concurrent read/write of cputime_t. For example,
    reading/writing 64-bit cputime_t on some 32-bit arches may require
    multiple accesses for low and high value parts, so proper locking
    is needed to protect against concurrent accesses.

    Therefore, add CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN which arches can
    enable after they've been audited for potential races.

    This option is automatically enabled on 64-bit platforms.
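
    A hedged illustration of the torn-read hazard being audited for
    (userspace-style sketch, not kernel code):

    #include <stdint.h>

    struct acct { uint32_t lo, hi; };   /* a 64-bit count, split on 32-bit */

    uint64_t torn_read(volatile struct acct *a)
    {
        /* if a writer carries from lo into hi between these two loads,
         * the reader sees a value that never existed; a seqlock or a
         * native 64-bit load is needed */
        uint32_t lo = a->lo;
        uint32_t hi = a->hi;
        return ((uint64_t)hi << 32) | lo;
    }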

    Feature requested by Frederic Weisbecker.

    Signed-off-by: Kevin Hilman
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Paul E. McKenney
    Cc: Arm Linux
    Signed-off-by: Frederic Weisbecker

    Kevin Hilman
     

28 Sep, 2013

1 commit

  • Linus suggested to replace

    #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
    #define arch_mutex_cpu_relax() cpu_relax()
    #endif

    with just a simple

    #ifndef arch_mutex_cpu_relax
    # define arch_mutex_cpu_relax() cpu_relax()
    #endif

    to get rid of CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX. Architectures can then
    simply define arch_mutex_cpu_relax if they want an architecture-specific
    function, instead of also having to add a select statement in their
    Kconfig.
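
    An architecture that wants its own variant now simply defines the
    macro in its own headers, along the lines of (hedged example):

    #define arch_mutex_cpu_relax() barrier()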

    Suggested-by: Linus Torvalds
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     

14 Aug, 2013

1 commit

  • Fix inadvertent breakage in the clone syscall ABI for Microblaze that
    was introduced in commit f3268edbe6fe ("microblaze: switch to generic
    fork/vfork/clone").

    The Microblaze syscall ABI for clone takes the parent tid address in the
    4th argument; the third argument slot is used for the stack size. The
    incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
    slot.

    This commit restores the original ABI so that existing userspace libc
    code will work correctly.

    All kernel versions from v3.8-rc1 were affected.
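
    For reference, a sketch of the restored argument order (the kernel's
    CLONE_BACKWARDS3 layout; types simplified):

    long sys_clone(unsigned long clone_flags, unsigned long newsp,
                   int stack_size,              /* 3rd slot: stack size */
                   int __user *parent_tidptr,   /* 4th slot: parent tid */
                   int __user *child_tidptr,
                   unsigned long tls);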

    Signed-off-by: Michal Simek
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Simek
     

04 Jul, 2013

1 commit

  • The soft-dirty is a bit on a PTE which helps to track which pages a task
    writes to. In order to do this tracking one should

    1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
    2. Wait some time.
    3. Read soft-dirty bits (the 55th bit in /proc/PID/pagemap2 entries)

    To do this tracking, the writable bit is cleared from PTEs when the
    soft-dirty bit is cleared. Thus, after this, when the task tries to
    modify a page at some virtual address, a #PF occurs and the kernel sets
    the soft-dirty bit on the respective PTE.

    Note, that although all the task's address space is marked as r/o after
    the soft-dirty bits clear, the #PF-s that occur after that are processed
    fast. This is so, since the pages are still mapped to physical memory,
    and thus all the kernel does is find this fact out and put the
    writable, dirty and soft-dirty bits back on the PTE.
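
    A hedged sketch of step 3 above, testing the soft-dirty bit in an
    entry read from /proc/PID/pagemap2:

    #include <stdint.h>

    static int soft_dirty(uint64_t pagemap_entry)
    {
        return (pagemap_entry >> 55) & 1;   /* bit 55: soft-dirty */
    }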

    Another thing to note, is that when mremap moves PTEs they are marked
    with soft-dirty as well, since from the user perspective mremap modifies
    the virtual memory at mremap's new address.

    Signed-off-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

06 May, 2013

1 commit

  • Pull module updates from Rusty Russell:
    "We get rid of the general module prefix confusion with a binary config
    option, fix a remove/insert race which Never Happens, and (my
    favorite) handle the case when we have too many modules for a single
    commandline. Seriously, the kernel is full, please go away!"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    modpost: fix unwanted VMLINUX_SYMBOL_STR expansion
    X.509: Support parse long form of length octets in Authority Key Identifier
    module: don't unlink the module until we've removed all exposure.
    kernel: kallsyms: memory override issue, need check destination buffer length
    MODSIGN: do not send garbage to stderr when enabling modules signature
    modpost: handle huge numbers of modules.
    modpost: add -T option to read module names from file/stdin.
    modpost: minor cleanup.
    genksyms: pass symbol-prefix instead of arch
    module: fix symbol versioning with symbol prefixes
    CONFIG_SYMBOL_PREFIX: cleanup.

    Linus Torvalds
     

05 May, 2013

1 commit

  • commit d1669912 (idle: Implement generic idle function) added a new
    generic idle along with support for hlt/nohlt command line options to
    override default idle loop behavior. However, the command-line
    processing is never compiled.

    The command-line handling is wrapped by CONFIG_GENERIC_IDLE_POLL_SETUP
    and arches that use this feature select it in their Kconfigs.
    However, no Kconfig definition was created for this option, so it is
    never enabled, and therefore command-line override of the idle-loop
    behavior is broken after migrating to the generic idle loop.

    To fix, add a Kconfig definition for GENERIC_IDLE_POLL_SETUP.

    Tested on ARM (OMAP4/Panda) which enables the command-line overrides
    by default.

    Signed-off-by: Kevin Hilman
    Reviewed-by: Srivatsa S. Bhat
    Cc: Linus Torvalds
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Cc: Magnus Damm
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linaro-kernel@lists.linaro.org
    Link: http://lkml.kernel.org/r/1366849153-25564-1-git-send-email-khilman@linaro.org
    Signed-off-by: Thomas Gleixner

    Kevin Hilman
     

01 May, 2013

1 commit

  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     

15 Mar, 2013

1 commit

  • We have CONFIG_SYMBOL_PREFIX, which three archs define to the string
    "_". But Al Viro broke this in "consolidate cond_syscall and
    SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to
    do so.

    Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to
    prefix something with it. So various places define helpers which are
    defined to nothing if CONFIG_SYMBOL_PREFIX isn't set:

    1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX.
    2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym)
    3) include/linux/export.h defines MODULE_SYMBOL_PREFIX.
    4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7)
    5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym)
    6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX
    7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if
    CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version
    for pasting.

    (arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too).

    Let's solve this properly:
    1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
    2) Make linux/export.h usable from asm.
    3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR().
    4) Make everyone use them.
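
    A sketch of the resulting helpers (close to what export.h ends up
    with; simplified):

    #ifdef CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX
    #define __VMLINUX_SYMBOL(x)     _##x
    #define __VMLINUX_SYMBOL_STR(x) "_" #x
    #else
    #define __VMLINUX_SYMBOL(x)     x
    #define __VMLINUX_SYMBOL_STR(x) #x
    #endif

    /* indirect, so macro arguments are expanded before pasting */
    #define VMLINUX_SYMBOL(x)     __VMLINUX_SYMBOL(x)
    #define VMLINUX_SYMBOL_STR(x) __VMLINUX_SYMBOL_STR(x)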

    Signed-off-by: Rusty Russell
    Reviewed-by: James Hogan
    Tested-by: James Hogan (metag)

    Rusty Russell