12 Apr, 2018

1 commit

  • commit 39290b389ea upstream.

    The current "rodata=off" parameter disables read-only kernel mappings
    under CONFIG_DEBUG_RODATA:
    commit d2aa1acad22f ("mm/init: Add 'rodata=off' boot cmdline parameter
    to disable read-only kernel mappings")

    This patch is a logical extension to module mappings ie. read-only mappings
    at module loading can be disabled even if CONFIG_DEBUG_SET_MODULE_RONX
    (mainly for debug use). Please note, however, that it only affects RO/RW
    permissions, keeping NX set.

    This is the first step to make CONFIG_DEBUG_SET_MODULE_RONX mandatory
    (always-on) in the future as CONFIG_DEBUG_RODATA on x86 and arm64.

    Suggested-by: and Acked-by: Mark Rutland
    Signed-off-by: AKASHI Takahiro
    Reviewed-by: Kees Cook
    Acked-by: Rusty Russell
    Link: http://lkml.kernel.org/r/20161114061505.15238-1-takahiro.akashi@linaro.org
    Signed-off-by: Jessica Yu
    Signed-off-by: Alex Shi

    Conflicts:
    keeping kaiser.h in init/main.c

    AKASHI Takahiro
     

31 Jan, 2018

1 commit

  • [ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]

    The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

    A quote from goolge project zero blog:
    "At this point, it would normally be necessary to locate gadgets in
    the host kernel code that can be used to actually leak data by reading
    from an attacker-controlled location, shifting and masking the result
    appropriately and then using the result of that as offset to an
    attacker-controlled address for a load. But piecing gadgets together
    and figuring out which ones work in a speculation context seems annoying.
    So instead, we decided to use the eBPF interpreter, which is built into
    the host kernel - while there is no legitimate way to invoke it from inside
    a VM, the presence of the code in the host kernel's text section is sufficient
    to make it usable for the attack, just like with ordinary ROP gadgets."

    To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
    option that removes interpreter from the kernel in favor of JIT-only mode.
    So far eBPF JIT is supported by:
    x64, arm64, arm32, sparc64, s390, powerpc64, mips64

    The start of JITed program is randomized and code page is marked as read-only.
    In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

    v2->v3:
    - move __bpf_prog_ret0 under ifdef (Daniel)

    v1->v2:
    - fix init order, test_bpf and cBPF (Daniel's feedback)
    - fix offloaded bpf (Jakub's feedback)
    - add 'return 0' dummy in case something can invoke prog->bpf_func
    - retarget bpf tree. For bpf-next the patch would need one extra hunk.
    It will be sent when the trees are merged back to net-next

    Considered doing:
    int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
    but it seems better to land the patch as-is and in bpf-next remove
    bpf_jit_enable global variable from all JITs, consolidate in one place
    and remove this jit_init() function.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     

05 Jan, 2018

2 commits

  • Kaiser only needs to map one page of the stack; and
    kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
    It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
    and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
    kaiser_add_mapping() and kaiser_remove_mapping(). And use
    linux/kaiser.h in init/main.c to avoid the #ifdefs there.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Greg Kroah-Hartman

    Hugh Dickins
     
  • This patch introduces our implementation of KAISER (Kernel Address Isolation to
    have Side-channels Efficiently Removed), a kernel isolation technique to close
    hardware side channels on kernel address information.

    More information about the patch can be found on:

    https://github.com/IAIK/KAISER

    From: Richard Fellner
    From: Daniel Gruss
    Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
    Date: Thu, 4 May 2017 14:26:50 +0200
    Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
    Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

    To:
    To:
    Cc:
    Cc:
    Cc: Michael Schwarz
    Cc: Richard Fellner
    Cc: Ingo Molnar
    Cc:
    Cc:

    After several recent works [1,2,3] KASLR on x86_64 was basically
    considered dead by many researchers. We have been working on an
    efficient but effective fix for this problem and found that not mapping
    the kernel space when running in user mode is the solution to this
    problem [4] (the corresponding paper [5] will be presented at ESSoS17).

    With this RFC patch we allow anybody to configure their kernel with the
    flag CONFIG_KAISER to add our defense mechanism.

    If there are any questions we would love to answer them.
    We also appreciate any comments!

    Cheers,
    Daniel (+ the KAISER team from Graz University of Technology)

    [1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
    [2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
    [3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
    [4] https://github.com/IAIK/KAISER
    [5] https://gruss.cc/files/kaiser.pdf

    [patch based also on
    https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

    Signed-off-by: Richard Fellner
    Signed-off-by: Moritz Lipp
    Signed-off-by: Daniel Gruss
    Signed-off-by: Michael Schwarz
    Acked-by: Jiri Kosina
    Signed-off-by: Hugh Dickins
    Signed-off-by: Greg Kroah-Hartman

    Richard Fellner
     

21 Oct, 2017

1 commit

  • [ Upstream commit 08865514805d2de8e7002fa8149c5de3e391f412 ]

    Commit 4a9d4b024a31 ("switch fput to task_work_add") implements a
    schedule_work() for completing fput(), but did not guarantee calling
    __fput() after unpacking initramfs. Because of this, there is a
    possibility that during boot a driver can see ETXTBSY when it tries to
    load a binary from initramfs as fput() is still pending on that binary.

    This patch makes sure that fput() is completed after unpacking initramfs
    and removes the call to flush_delayed_fput() in kernel_init() which
    happens very late after unpacking initramfs.

    Link: http://lkml.kernel.org/r/20170201140540.22051-1-lokeshvutla@ti.com
    Signed-off-by: Lokesh Vutla
    Reported-by: Murali Karicheri
    Cc: Al Viro
    Cc: Tero Kristo
    Cc: Sekhar Nori
    Cc: Nishanth Menon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Lokesh Vutla
     

12 Apr, 2017

1 commit

  • commit f5b98461cb8167ba362ad9f74c41d126b7becea7 upstream.

    Now that our crng uses chacha20, we can rely on its speedy
    characteristics for replacing MD5, while simultaneously achieving a
    higher security guarantee. Before the idea was to use these functions if
    you wanted random integers that aren't stupidly insecure but aren't
    necessarily secure either, a vague gray zone, that hopefully was "good
    enough" for its users. With chacha20, we can strengthen this claim,
    since either we're using an rdrand-like instruction, or we're using the
    same crng as /dev/urandom. And it's faster than what was before.

    We could have chosen to replace this with a SipHash-derived function,
    which might be slightly faster, but at the cost of having yet another
    RNG construction in the kernel. By moving to chacha20, we have a single
    RNG to analyze and verify, and we also already get good performance
    improvements on all platforms.

    Implementation-wise, rather than use a generic buffer for both
    get_random_int/long and memcpy based on the size needs, we use a
    specific buffer for 32-bit reads and for 64-bit reads. This way, we're
    guaranteed to always have aligned accesses on all platforms. While
    slightly more verbose in C, the assembly this generates is a lot
    simpler than otherwise.

    Finally, on 32-bit platforms where longs and ints are the same size,
    we simply alias get_random_int to get_random_long.

    Signed-off-by: Jason A. Donenfeld
    Suggested-by: Theodore Ts'o
    Cc: Theodore Ts'o
    Cc: Hannes Frederic Sowa
    Cc: Andy Lutomirski
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Jason A. Donenfeld
     

30 Nov, 2016

1 commit

  • This enables CONFIG_MODVERSIONS again, but allows for missing symbol CRC
    information in order to work around the issue that newer binutils
    versions seem to occasionally drop the CRC on the floor. binutils 2.26
    seems to work fine, while binutils 2.27 seems to break MODVERSIONS of
    symbols that have been defined in assembler files.

    [ We've had random missing CRC's before - it may be an old problem that
    just is now reliably triggered with the weak asm symbols and a new
    version of binutils ]

    Some day I really do want to remove MODVERSIONS entirely. Sadly, today
    does not appear to be that day: Debian people apparently do want the
    option to enable MODVERSIONS to make it easier to have external modules
    across kernel versions, and this seems to be a fairly minimal fix for
    the annoying problem.

    Cc: Ben Hutchings
    Acked-by: Michal Marek
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

26 Nov, 2016

1 commit

  • CONFIG_MODVERSIONS has been broken for pretty much the whole 4.9 series,
    and quite frankly, nobody has cared very deeply. We absolutely know how
    to fix it, and it's not _complicated_, but it's not exactly pretty
    either.

    This oneliner fixes it without the ugliness, and allows for further
    future cleanups.

    "We've secretly replaced their regular MODVERSIONS with nothing at
    all, let's see if they notice"

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

25 Nov, 2016

1 commit


16 Oct, 2016

1 commit

  • Pull gcc plugins update from Kees Cook:
    "This adds a new gcc plugin named "latent_entropy". It is designed to
    extract as much possible uncertainty from a running system at boot
    time as possible, hoping to capitalize on any possible variation in
    CPU operation (due to runtime data differences, hardware differences,
    SMP ordering, thermal timing variation, cache behavior, etc).

    At the very least, this plugin is a much more comprehensive example
    for how to manipulate kernel code using the gcc plugin internals"

    * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    latent_entropy: Mark functions with __latent_entropy
    gcc-plugins: Add latent_entropy plugin

    Linus Torvalds
     

15 Oct, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - EXPORT_SYMBOL for asm source by Al Viro.

    This does bring a regression, because genksyms no longer generates
    checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
    working on a patch to fix this.

    Plus, we are talking about functions like strcpy(), which rarely
    change prototypes.

    - Fixes for PPC fallout of the above by Stephen Rothwell and Nick
    Piggin

    - fixdep speedup by Alexey Dobriyan.

    - preparatory work by Nick Piggin to allow architectures to build with
    -ffunction-sections, -fdata-sections and --gc-sections

    - CONFIG_THIN_ARCHIVES support by Stephen Rothwell

    - fix for filenames with colons in the initramfs source by me.

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
    initramfs: Escape colons in depfile
    ppc: there is no clear_pages to export
    powerpc/64: whitelist unresolved modversions CRCs
    kbuild: -ffunction-sections fix for archs with conflicting sections
    kbuild: add arch specific post-link Makefile
    kbuild: allow archs to select link dead code/data elimination
    kbuild: allow architectures to use thin archives instead of ld -r
    kbuild: Regenerate genksyms lexer
    kbuild: genksyms fix for typeof handling
    fixdep: faster CONFIG_ search
    ia64: move exports to definitions
    sparc32: debride memcpy.S a bit
    [sparc] unify 32bit and 64bit string.h
    sparc: move exports to definitions
    ppc: move exports to definitions
    arm: move exports to definitions
    s390: move exports to definitions
    m68k: move exports to definitions
    alpha: move exports to actual definitions
    x86: move exports to actual definitions
    ...

    Linus Torvalds
     

12 Oct, 2016

1 commit

  • Relay avoids calling wake_up_interruptible() for doing the wakeup of
    readers/consumers, waiting for the generation of new data, from the
    context of a process which produced the data. This is apparently done to
    prevent the possibility of a deadlock in case Scheduler itself is is
    generating data for the relay, after acquiring rq->lock.

    The following patch used a timer (to be scheduled at next jiffy), for
    delegating the wakeup to another context.
    commit 7c9cb38302e78d24e37f7d8a2ea7eed4ae5f2fa7
    Author: Tom Zanussi
    Date: Wed May 9 02:34:01 2007 -0700

    relay: use plain timer instead of delayed work

    relay doesn't need to use schedule_delayed_work() for waking readers
    when a simple timer will do.

    Scheduling a plain timer, at next jiffies boundary, to do the wakeup
    causes a significant wakeup latency for the Userspace client, which makes
    relay less suitable for the high-frequency low-payload use cases where the
    data gets generated at a very high rate, like multiple sub buffers getting
    filled within a milli second. Moreover the timer is re-scheduled on every
    newly produced sub buffer so the timer keeps getting pushed out if sub
    buffers are filled in a very quick succession (less than a jiffy gap
    between filling of 2 sub buffers). As a result relay runs out of sub
    buffers to store the new data.

    By using irq_work it is ensured that wakeup of userspace client, blocked
    in the poll call, is done at earliest (through self IPI or next timer
    tick) enabling it to always consume the data in time. Also this makes
    relay consistent with printk & ring buffers (trace), as they too use
    irq_work for deferred wake up of readers.

    [arnd@arndb.de: select CONFIG_IRQ_WORK]
    Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Akash Goel
    Cc: Tom Zanussi
    Cc: Chris Wilson
    Cc: Tvrtko Ursulin
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

11 Oct, 2016

1 commit

  • This adds a new gcc plugin named "latent_entropy". It is designed to
    extract as much possible uncertainty from a running system at boot time as
    possible, hoping to capitalize on any possible variation in CPU operation
    (due to runtime data differences, hardware differences, SMP ordering,
    thermal timing variation, cache behavior, etc).

    At the very least, this plugin is a much more comprehensive example for
    how to manipulate kernel code using the gcc plugin internals.

    The need for very-early boot entropy tends to be very architecture or
    system design specific, so this plugin is more suited for those sorts
    of special cases. The existing kernel RNG already attempts to extract
    entropy from reliable runtime variation, but this plugin takes the idea to
    a logical extreme by permuting a global variable based on any variation
    in code execution (e.g. a different value (and permutation function)
    is used to permute the global based on loop count, case statement,
    if/then/else branching, etc).

    To do this, the plugin starts by inserting a local variable in every
    marked function. The plugin then adds logic so that the value of this
    variable is modified by randomly chosen operations (add, xor and rol) and
    random values (gcc generates separate static values for each location at
    compile time and also injects the stack pointer at runtime). The resulting
    value depends on the control flow path (e.g., loops and branches taken).

    Before the function returns, the plugin mixes this local variable into
    the latent_entropy global variable. The value of this global variable
    is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
    though it does not credit any bytes of entropy to the pool; the contents
    of the global are just used to mix the pool.

    Additionally, the plugin can pre-initialize arrays with build-time
    random contents, so that two different kernel builds running on identical
    hardware will not have the same starting values.

    Signed-off-by: Emese Revfy
    [kees: expanded commit message and code comments]
    Signed-off-by: Kees Cook

    Emese Revfy
     

08 Oct, 2016

1 commit

  • Pull parisc updates from Helge Deller:
    "Changes include:

    - Fix boot of 32bit SMP kernel (initial kernel mapping was too small)

    - Added hardened usercopy checks

    - Drop bootmem and switch to memblock and NO_BOOTMEM implementation

    - Drop the BROKEN_RODATA config option (and thus remove the relevant
    code from the generic headers and files because parisc was the last
    architecture which used this config option)

    - Improve segfault reporting by printing human readable error strings

    - Various smaller changes, e.g. dwarf debug support for assembly
    code, update comments regarding copy_user_page_asm, switch to
    kmalloc_array()"

    * 'parisc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Increase KERNEL_INITIAL_SIZE for 32-bit SMP kernels
    parisc: Drop bootmem and switch to memblock
    parisc: Add hardened usercopy feature
    parisc: Add cfi_startproc and cfi_endproc to assembly code
    parisc: Move hpmc stack into page aligned bss section
    parisc: Fix self-detected CPU stall warnings on Mako machines
    parisc: Report trap type as human readable string
    parisc: Update comment regarding implementation of copy_user_page_asm
    parisc: Use kmalloc_array() in add_system_map_addresses()
    parisc: Check return value of smp_boot_one_cpu()
    parisc: Drop BROKEN_RODATA config option

    Linus Torvalds
     

21 Sep, 2016

1 commit


16 Sep, 2016

1 commit

  • There are a few places in the kernel that access stack memory
    belonging to a different task. Before we can start freeing task
    stacks before the task_struct is freed, we need a way for those code
    paths to pin the stack.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jann Horn
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/17a434f50ad3d77000104f21666575e10a9c1fbd.1474003868.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

15 Sep, 2016

1 commit

  • If an arch opts in by setting CONFIG_THREAD_INFO_IN_TASK_STRUCT,
    then thread_info is defined as a single 'u32 flags' and is the first
    entry of task_struct. thread_info::task is removed (it serves no
    purpose if thread_info is embedded in task_struct), and
    thread_info::cpu gets its own slot in task_struct.

    This is heavily based on a patch written by Linus.

    Originally-from: Linus Torvalds
    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jann Horn
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/a0898196f0476195ca02713691a5037a14f2aac5.1473801993.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

09 Sep, 2016

1 commit

  • Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
    select to build with -ffunction-sections, -fdata-sections, and link
    with --gc-sections. It requires some work (documented) to ensure all
    unreferenced entrypoints are live, and requires toolchain and build
    verification, so it is made a per-arch option for now.

    On a random powerpc64le build, this yelds a significant size saving,
    it boots and runs fine, but there is a lot I haven't tested as yet, so
    these savings may be reduced if there are bugs in the link.

    text data bss dec filename
    11169741 1180744 1923176 14273661 vmlinux
    10445269 1004127 1919707 13369103 vmlinux.dce

    ~700K text, ~170K data, 6% removed from kernel image size.

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michal Marek

    Nicholas Piggin
     

09 Aug, 2016

1 commit

  • Pull usercopy protection from Kees Cook:
    "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
    copy_from_user bounds checking for most architectures on SLAB and
    SLUB"

    * tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    mm: SLUB hardened usercopy support
    mm: SLAB hardened usercopy support
    s390/uaccess: Enable hardened usercopy
    sparc/uaccess: Enable hardened usercopy
    powerpc/uaccess: Enable hardened usercopy
    ia64/uaccess: Enable hardened usercopy
    arm64/uaccess: Enable hardened usercopy
    ARM: uaccess: Enable hardened usercopy
    x86/uaccess: Enable hardened usercopy
    mm: Hardened usercopy
    mm: Implement stack frame object validation
    mm: Add is_migrate_cma_page

    Linus Torvalds
     

03 Aug, 2016

6 commits

  • It doesn't trim just symbols that are totally unused in-tree - it trims
    the symbols unused by any in-tree modules actually built. If you've
    done a 'make localmodconfig' and only build a hundred or so modules,
    it's pretty likely that your out-of-tree module will come up lacking
    something...

    Hopefully this will save the next guy from a Homer Simpson "D'oh!"
    moment.

    Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.edu
    Signed-off-by: Valdis Kletnieks
    Cc: Michal Marek
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Valdis Kletnieks
     
  • Doing patches with allmodconfig kernel compiled and committing stuff
    into local tree have unfortunate consequence: kernel version changes (as
    it should) leading to recompiling and relinking of several files even if
    they weren't touched (or interesting at all). This and "git-whatever"
    figuring out current version slow down compilation for no good reason.

    But lets face it, "allmodconfig" kernels don't care about kernel
    version, they are simply compile check guinea pigs.

    Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into
    allmodconfig .config.

    Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.by
    Signed-off-by: Alexey Dobriyan
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • sprint_symbol_no_offset() returns the string "function_name
    [module_name]" where [module_name] is not printed for built in kernel
    functions. This means that the blacklisting code will fail when
    comparing module function names with the extended string.

    This patch adds the functionality to block a module's module_init()
    function by finding the space in the string and truncating the
    comparison to that length.

    Link: http://lkml.kernel.org/r/1466124387-20446-1-git-send-email-prarit@redhat.com
    Signed-off-by: Prarit Bhargava
    Cc: Thomas Gleixner
    Cc: Yang Shi
    Cc: Prarit Bhargava
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Rasmus Villemoes
    Cc: Kees Cook
    Cc: Yaowei Bai
    Cc: Andrey Ryabinin
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
     
  • There was only one use of __initdata_refok and __exit_refok

    __init_refok was used 46 times against 82 for __ref.

    Those definitions are obsolete since commit 312b1485fb50 ("Introduce new
    section reference annotations tags: __ref, __refdata, __refconst")

    This patch removes the following compatibility definitions and replaces
    them treewide.

    /* compatibility defines */
    #define __init_refok __ref
    #define __initdata_refok __refdata
    #define __exit_refok __ref

    I can also provide separate patches if necessary.
    (One patch per tree and check in 1 month or 2 to remove old definitions)

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Cc: Ingo Molnar
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • UML is a bit special since it does not have iomem nor dma. That means a
    lot of drivers will not build if they miss a dependency on HAS_IOMEM.
    s390 used to have the same issues but since it gained PCI support UML is
    the only stranger.

    We are tired of patching dozens of new drivers after every merge window
    just to un-break allmod/yesconfig UML builds. One could argue that a
    decent driver has to know on what it depends and therefore a missing
    HAS_IOMEM dependency is a clear driver bug. But the dependency not
    obvious and not everyone does UML builds with COMPILE_TEST enabled when
    developing a device driver.

    A possible solution to make these builds succeed on UML would be
    providing stub functions for ioremap() and friends which fail upon
    runtime. Another one is simply disabling COMPILE_TEST for UML. Since
    it is the least hassle and does not force use to fake iomem support
    let's do the latter.

    Link: http://lkml.kernel.org/r/1466152995-28367-1-git-send-email-richard@nod.at
    Signed-off-by: Richard Weinberger
    Acked-by: Arnd Bergmann
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Weinberger
     
  • cgroup's document path is changed to "cgroup-v1". update it.

    Link: http://lkml.kernel.org/r/1470148443-6509-1-git-send-email-iamyooon@gmail.com
    Signed-off-by: seokhoon.yoon
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    seokhoon.yoon
     

29 Jul, 2016

1 commit

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    fat: fix error message for bogus number of directory entries
    fat: fix typo s/supeblock/superblock/
    ASoC: max9877: Remove unused function declaration
    dw2102: don't output spurious blank lines to the kernel log
    init: fix Kconfig text
    ARM: io: fix comment grammar
    ocfs: fix ocfs2_xattr_user_get() argument name
    scsi/qla2xxx: Remove erroneous unused macro qla82xx_get_temp_val1()

    Linus Torvalds
     

27 Jul, 2016

3 commits

  • Implements freelist randomization for the SLUB allocator. It was
    previous implemented for the SLAB allocator. Both use the same
    configuration option (CONFIG_SLAB_FREELIST_RANDOM).

    The list is randomized during initialization of a new set of pages. The
    order on different freelist sizes is pre-computed at boot for
    performance. Each kmem_cache has its own randomized freelist.

    This security feature reduces the predictability of the kernel SLUB
    allocator against heap overflows rendering attacks much less stable.

    For example these attacks exploit the predictability of the heap:
    - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
    - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

    Performance results:

    slab_test impact is between 3% to 4% on average for 100000 attempts
    without smp. It is a very focused testing, kernbench show the overall
    impact on the system is way lower.

    Before:

    Single thread testing
    =====================
    1. Kmalloc: Repeatedly allocate then free test
    100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
    100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
    100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
    100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
    100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
    100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
    100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
    100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
    100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
    100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
    2. Kmalloc: alloc/free test
    100000 times kmalloc(8)/kfree -> 70 cycles
    100000 times kmalloc(16)/kfree -> 70 cycles
    100000 times kmalloc(32)/kfree -> 70 cycles
    100000 times kmalloc(64)/kfree -> 70 cycles
    100000 times kmalloc(128)/kfree -> 70 cycles
    100000 times kmalloc(256)/kfree -> 69 cycles
    100000 times kmalloc(512)/kfree -> 70 cycles
    100000 times kmalloc(1024)/kfree -> 73 cycles
    100000 times kmalloc(2048)/kfree -> 72 cycles
    100000 times kmalloc(4096)/kfree -> 71 cycles

    After:

    Single thread testing
    =====================
    1. Kmalloc: Repeatedly allocate then free test
    100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
    100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
    100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
    100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
    100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
    100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
    100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
    100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
    100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
    100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
    2. Kmalloc: alloc/free test
    100000 times kmalloc(8)/kfree -> 66 cycles
    100000 times kmalloc(16)/kfree -> 66 cycles
    100000 times kmalloc(32)/kfree -> 66 cycles
    100000 times kmalloc(64)/kfree -> 66 cycles
    100000 times kmalloc(128)/kfree -> 65 cycles
    100000 times kmalloc(256)/kfree -> 67 cycles
    100000 times kmalloc(512)/kfree -> 67 cycles
    100000 times kmalloc(1024)/kfree -> 64 cycles
    100000 times kmalloc(2048)/kfree -> 67 cycles
    100000 times kmalloc(4096)/kfree -> 67 cycles

    Kernbench, before:

    Average Optimal load -j 12 Run (std deviation):
    Elapsed Time 101.873 (1.16069)
    User Time 1045.22 (1.60447)
    System Time 88.969 (0.559195)
    Percent CPU 1112.9 (13.8279)
    Context Switches 189140 (2282.15)
    Sleeps 99008.6 (768.091)

    After:

    Average Optimal load -j 12 Run (std deviation):
    Elapsed Time 102.47 (0.562732)
    User Time 1045.3 (1.34263)
    System Time 88.311 (0.342554)
    Percent CPU 1105.8 (6.49444)
    Context Switches 189081 (2355.78)
    Sleeps 99231.5 (800.358)

    Link: http://lkml.kernel.org/r/1464295031-26375-3-git-send-email-thgarnie@google.com
    Signed-off-by: Thomas Garnier
    Reviewed-by: Kees Cook
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Garnier
     
  • Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
    SLUB allocator to catch any copies that may span objects. Includes a
    redzone handling fix discovered by Michael Ellerman.

    Based on code from PaX and grsecurity.

    Signed-off-by: Kees Cook
    Tested-by: Michael Ellerman
    Reviwed-by: Laura Abbott

    Kees Cook
     
  • Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
    SLAB allocator to catch any copies that may span objects.

    Based on code from PaX and grsecurity.

    Signed-off-by: Kees Cook
    Tested-by: Valdis Kletnieks

    Kees Cook
     

26 Jul, 2016

2 commits

  • Pull NOHZ updates from Ingo Molnar:

    - fix system/idle cputime leaked on cputime accounting (all nohz
    configs) (Rik van Riel)

    - remove the messy, ad-hoc irqtime account on nohz-full and make it
    compatible with CONFIG_IRQ_TIME_ACCOUNTING=y instead (Rik van Riel)

    - cleanups (Frederic Weisbecker)

    - remove unecessary irq disablement in the irqtime code (Rik van Riel)

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/cputime: Drop local_irq_save/restore from irqtime_account_irq()
    sched/cputime: Reorganize vtime native irqtime accounting headers
    sched/cputime: Clean up the old vtime gen irqtime accounting completely
    sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code
    sched/cputime: Count actually elapsed irq & softirq time

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - documentation updates

    - miscellaneous fixes

    - minor reorganization of code

    - torture-test updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
    rcu: Correctly handle sparse possible cpus
    rcu: sysctl: Panic on RCU Stall
    rcu: Fix a typo in a comment
    rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
    rcu: Disable TASKS_RCU for usermode Linux
    rcu: No ordering for rcu_assign_pointer() of NULL
    rcutorture: Fix error return code in rcu_perf_init()
    torture: Inflict default jitter
    rcuperf: Don't treat gp_exp mis-setting as a WARN
    rcutorture: Drop "-soundhw pcspkr" from x86 boot arguments
    rcutorture: Don't specify the cpu type of QEMU on PPC
    rcutorture: Make -soundhw a x86 specific option
    rcutorture: Use vmlinux as the fallback kernel image
    rcutorture/doc: Create initrd using dracut
    torture: Stop onoff task if there is only one cpu
    torture: Add starvation events to error summary
    torture: Break online and offline functions out of torture_onoff()
    torture: Forgive lengthy trace dumps and preemption
    torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code
    torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE
    ...

    Linus Torvalds
     

14 Jul, 2016

1 commit

  • The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not
    appear to currently work right.

    On CPUs without nohz_full=, only tick based irq time sampling is
    done, which breaks down when dealing with a nohz_idle CPU.

    On firewalls and similar systems, no ticks may happen on a CPU for a
    while, and the irq time spent may never get accounted properly. This
    can cause issues with capacity planning and power saving, which use
    the CPU statistics as inputs in decision making.

    Remove the VTIME_GEN vtime irq time code, and replace it with the
    IRQ_TIME_ACCOUNTING code, when selected as a config option by the user.

    Signed-off-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     

07 Jul, 2016

1 commit

  • The "expert" menu was broken (split) such that all entries in it after
    KALLSYMS were displayed in the "General setup" area instead of in the
    "Expert users" area. Fix this by adding one kconfig dependency.

    Yes, the Expert users menu is fragile. Problems like this have happened
    several times in the past. I will attempt to isolate the Expert users
    menu if there is interest in that.

    Fixes: 4d5d5664c900 ("x86: kallsyms: disable absolute percpu symbols on !SMP")
    Signed-off-by: Randy Dunlap
    Cc: Ard Biesheuvel
    Cc: stable@vger.kernel.org # 4.6
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

30 Jun, 2016

1 commit


25 Jun, 2016

3 commits

  • Merge misc fixes from Andrew Morton:
    "Two weeks worth of fixes here"

    * emailed patches from Andrew Morton : (41 commits)
    init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
    autofs: don't get stuck in a loop if vfs_write() returns an error
    mm/page_owner: avoid null pointer dereference
    tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
    fs/nilfs2: fix potential underflow in call to crc32_le
    oom, suspend: fix oom_reaper vs. oom_killer_disable race
    ocfs2: disable BUG assertions in reading blocks
    mm, compaction: abort free scanner if split fails
    mm: prevent KASAN false positives in kmemleak
    mm/hugetlb: clear compound_mapcount when freeing gigantic pages
    mm/swap.c: flush lru pvecs on compound page arrival
    memcg: css_alloc should return an ERR_PTR value on error
    memcg: mem_cgroup_migrate() may be called with irq disabled
    hugetlb: fix nr_pmds accounting with shared page tables
    Revert "mm: disable fault around on emulated access bit architecture"
    Revert "mm: make faultaround produce old ptes"
    mailmap: add Boris Brezillon's email
    mailmap: add Antoine Tenart's email
    mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
    mm: mempool: kasan: don't poot mempool objects in quarantine
    ...

    Linus Torvalds
     
  • When I replaced kasprintf("%pf") with a direct call to
    sprint_symbol_no_offset I must have broken the initcall blacklisting
    feature on the arches where dereference_function_descriptor() is
    non-trivial.

    Fixes: c8cdd2be213f (init/main.c: simplify initcall_blacklisted())
    Link: http://lkml.kernel.org/r/1466027283-4065-1-git-send-email-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes
    Cc: Yang Shi
    Cc: Prarit Bhargava
    Cc: Petr Mladek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • We've had the thread info allocated together with the thread stack for
    most architectures for a long time (since the thread_info was split off
    from the task struct), but that is about to change.

    But the patches that move the thread info to be off-stack (and a part of
    the task struct instead) made it clear how confused the allocator and
    freeing functions are.

    Because the common case was that we share an allocation with the thread
    stack and the thread_info, the two pointers were identical. That
    identity then meant that we would have things like

    ti = alloc_thread_info_node(tsk, node);
    ...
    tsk->stack = ti;

    which certainly _worked_ (since stack and thread_info have the same
    value), but is rather confusing: why are we assigning a thread_info to
    the stack? And if we move the thread_info away, the "confusing" code
    just gets to be entirely bogus.

    So remove all this confusion, and make it clear that we are doing the
    stack allocation by renaming and clarifying the function names to be
    about the stack. The fact that the thread_info then shares the
    allocation is an implementation detail, and not really about the
    allocation itself.

    This is a pure renaming and type fix: we pass in the same pointer, it's
    just that we clarify what the pointer means.

    The ia64 code that actually only has one single allocation (for all of
    task_struct, thread_info and kernel thread stack) now looks a bit odd,
    but since "tsk->stack" is actually not even used there, that oddity
    doesn't matter. It would be a separate thing to clean that up, I
    intentionally left the ia64 changes as a pure brute-force renaming and
    type change.

    Acked-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Jun, 2016

1 commit


16 Jun, 2016

1 commit

  • Usermode Linux currently does not implement arch_irqs_disabled_flags(),
    which results in a build failure in TASKS_RCU. Therefore, this commit
    disables the TASKS_RCU Kconfig option in usermode Linux builds. The
    usermode Linux maintainers expect to merge arch_irqs_disabled_flags()
    into 4.8, at which point this commit may be reverted.

    Signed-off-by: Paul E. McKenney
    Cc: Jeff Dike
    Acked-by: Richard Weinberger

    Paul E. McKenney
     

28 May, 2016

1 commit

  • page_ext_init() checks suitable pages with pfn_to_nid(), but
    pfn_to_nid() depends on memmap which will not be setup fully until
    page_alloc_init_late() is done. Use early_pfn_to_nid() instead of
    pfn_to_nid() so that page extension could be still used early even
    though CONFIG_ DEFERRED_STRUCT_PAGE_INIT is enabled and catch early page
    allocation call sites.

    Suggested by Joonsoo Kim [1], this fix basically undoes the change
    introduced by commit b8f1a75d61d840 ("mm: call page_ext_init() after all
    struct pages are initialized") and fixes the same problem with a better
    approach.

    [1] http://lkml.kernel.org/r/CAAmzW4OUmyPwQjvd7QUfc6W1Aic__TyAuH80MLRZNMxKy0-wPQ@mail.gmail.com

    Link: http://lkml.kernel.org/r/1464198689-23458-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi