12 Oct, 2016

1 commit

  • Relay avoids calling wake_up_interruptible() for doing the wakeup of
    readers/consumers, waiting for the generation of new data, from the
    context of a process which produced the data. This is apparently done to
    prevent the possibility of a deadlock in case Scheduler itself is is
    generating data for the relay, after acquiring rq->lock.

    The following patch used a timer (to be scheduled at next jiffy), for
    delegating the wakeup to another context.
    commit 7c9cb38302e78d24e37f7d8a2ea7eed4ae5f2fa7
    Author: Tom Zanussi
    Date: Wed May 9 02:34:01 2007 -0700

    relay: use plain timer instead of delayed work

    relay doesn't need to use schedule_delayed_work() for waking readers
    when a simple timer will do.

    Scheduling a plain timer, at next jiffies boundary, to do the wakeup
    causes a significant wakeup latency for the Userspace client, which makes
    relay less suitable for the high-frequency low-payload use cases where the
    data gets generated at a very high rate, like multiple sub buffers getting
    filled within a milli second. Moreover the timer is re-scheduled on every
    newly produced sub buffer so the timer keeps getting pushed out if sub
    buffers are filled in a very quick succession (less than a jiffy gap
    between filling of 2 sub buffers). As a result relay runs out of sub
    buffers to store the new data.

    By using irq_work it is ensured that wakeup of userspace client, blocked
    in the poll call, is done at earliest (through self IPI or next timer
    tick) enabling it to always consume the data in time. Also this makes
    relay consistent with printk & ring buffers (trace), as they too use
    irq_work for deferred wake up of readers.

    [arnd@arndb.de: select CONFIG_IRQ_WORK]
    Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Akash Goel
    Cc: Tom Zanussi
    Cc: Chris Wilson
    Cc: Tvrtko Ursulin
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

08 Oct, 2016

1 commit

  • Pull parisc updates from Helge Deller:
    "Changes include:

    - Fix boot of 32bit SMP kernel (initial kernel mapping was too small)

    - Added hardened usercopy checks

    - Drop bootmem and switch to memblock and NO_BOOTMEM implementation

    - Drop the BROKEN_RODATA config option (and thus remove the relevant
    code from the generic headers and files because parisc was the last
    architecture which used this config option)

    - Improve segfault reporting by printing human readable error strings

    - Various smaller changes, e.g. dwarf debug support for assembly
    code, update comments regarding copy_user_page_asm, switch to
    kmalloc_array()"

    * 'parisc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Increase KERNEL_INITIAL_SIZE for 32-bit SMP kernels
    parisc: Drop bootmem and switch to memblock
    parisc: Add hardened usercopy feature
    parisc: Add cfi_startproc and cfi_endproc to assembly code
    parisc: Move hpmc stack into page aligned bss section
    parisc: Fix self-detected CPU stall warnings on Mako machines
    parisc: Report trap type as human readable string
    parisc: Update comment regarding implementation of copy_user_page_asm
    parisc: Use kmalloc_array() in add_system_map_addresses()
    parisc: Check return value of smp_boot_one_cpu()
    parisc: Drop BROKEN_RODATA config option

    Linus Torvalds
     

21 Sep, 2016

1 commit


16 Sep, 2016

1 commit

  • There are a few places in the kernel that access stack memory
    belonging to a different task. Before we can start freeing task
    stacks before the task_struct is freed, we need a way for those code
    paths to pin the stack.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jann Horn
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/17a434f50ad3d77000104f21666575e10a9c1fbd.1474003868.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

15 Sep, 2016

1 commit

  • If an arch opts in by setting CONFIG_THREAD_INFO_IN_TASK_STRUCT,
    then thread_info is defined as a single 'u32 flags' and is the first
    entry of task_struct. thread_info::task is removed (it serves no
    purpose if thread_info is embedded in task_struct), and
    thread_info::cpu gets its own slot in task_struct.

    This is heavily based on a patch written by Linus.

    Originally-from: Linus Torvalds
    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jann Horn
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/a0898196f0476195ca02713691a5037a14f2aac5.1473801993.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

09 Aug, 2016

1 commit

  • Pull usercopy protection from Kees Cook:
    "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
    copy_from_user bounds checking for most architectures on SLAB and
    SLUB"

    * tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    mm: SLUB hardened usercopy support
    mm: SLAB hardened usercopy support
    s390/uaccess: Enable hardened usercopy
    sparc/uaccess: Enable hardened usercopy
    powerpc/uaccess: Enable hardened usercopy
    ia64/uaccess: Enable hardened usercopy
    arm64/uaccess: Enable hardened usercopy
    ARM: uaccess: Enable hardened usercopy
    x86/uaccess: Enable hardened usercopy
    mm: Hardened usercopy
    mm: Implement stack frame object validation
    mm: Add is_migrate_cma_page

    Linus Torvalds
     

03 Aug, 2016

6 commits

  • It doesn't trim just symbols that are totally unused in-tree - it trims
    the symbols unused by any in-tree modules actually built. If you've
    done a 'make localmodconfig' and only build a hundred or so modules,
    it's pretty likely that your out-of-tree module will come up lacking
    something...

    Hopefully this will save the next guy from a Homer Simpson "D'oh!"
    moment.

    Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.edu
    Signed-off-by: Valdis Kletnieks
    Cc: Michal Marek
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Valdis Kletnieks
     
  • Doing patches with allmodconfig kernel compiled and committing stuff
    into local tree have unfortunate consequence: kernel version changes (as
    it should) leading to recompiling and relinking of several files even if
    they weren't touched (or interesting at all). This and "git-whatever"
    figuring out current version slow down compilation for no good reason.

    But lets face it, "allmodconfig" kernels don't care about kernel
    version, they are simply compile check guinea pigs.

    Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into
    allmodconfig .config.

    Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.by
    Signed-off-by: Alexey Dobriyan
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • sprint_symbol_no_offset() returns the string "function_name
    [module_name]" where [module_name] is not printed for built in kernel
    functions. This means that the blacklisting code will fail when
    comparing module function names with the extended string.

    This patch adds the functionality to block a module's module_init()
    function by finding the space in the string and truncating the
    comparison to that length.

    Link: http://lkml.kernel.org/r/1466124387-20446-1-git-send-email-prarit@redhat.com
    Signed-off-by: Prarit Bhargava
    Cc: Thomas Gleixner
    Cc: Yang Shi
    Cc: Prarit Bhargava
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Rasmus Villemoes
    Cc: Kees Cook
    Cc: Yaowei Bai
    Cc: Andrey Ryabinin
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
     
  • There was only one use of __initdata_refok and __exit_refok

    __init_refok was used 46 times against 82 for __ref.

    Those definitions are obsolete since commit 312b1485fb50 ("Introduce new
    section reference annotations tags: __ref, __refdata, __refconst")

    This patch removes the following compatibility definitions and replaces
    them treewide.

    /* compatibility defines */
    #define __init_refok __ref
    #define __initdata_refok __refdata
    #define __exit_refok __ref

    I can also provide separate patches if necessary.
    (One patch per tree and check in 1 month or 2 to remove old definitions)

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Cc: Ingo Molnar
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • UML is a bit special since it does not have iomem nor dma. That means a
    lot of drivers will not build if they miss a dependency on HAS_IOMEM.
    s390 used to have the same issues but since it gained PCI support UML is
    the only stranger.

    We are tired of patching dozens of new drivers after every merge window
    just to un-break allmod/yesconfig UML builds. One could argue that a
    decent driver has to know on what it depends and therefore a missing
    HAS_IOMEM dependency is a clear driver bug. But the dependency not
    obvious and not everyone does UML builds with COMPILE_TEST enabled when
    developing a device driver.

    A possible solution to make these builds succeed on UML would be
    providing stub functions for ioremap() and friends which fail upon
    runtime. Another one is simply disabling COMPILE_TEST for UML. Since
    it is the least hassle and does not force use to fake iomem support
    let's do the latter.

    Link: http://lkml.kernel.org/r/1466152995-28367-1-git-send-email-richard@nod.at
    Signed-off-by: Richard Weinberger
    Acked-by: Arnd Bergmann
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Weinberger
     
  • cgroup's document path is changed to "cgroup-v1". update it.

    Link: http://lkml.kernel.org/r/1470148443-6509-1-git-send-email-iamyooon@gmail.com
    Signed-off-by: seokhoon.yoon
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    seokhoon.yoon
     

29 Jul, 2016

1 commit

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    fat: fix error message for bogus number of directory entries
    fat: fix typo s/supeblock/superblock/
    ASoC: max9877: Remove unused function declaration
    dw2102: don't output spurious blank lines to the kernel log
    init: fix Kconfig text
    ARM: io: fix comment grammar
    ocfs: fix ocfs2_xattr_user_get() argument name
    scsi/qla2xxx: Remove erroneous unused macro qla82xx_get_temp_val1()

    Linus Torvalds
     

27 Jul, 2016

3 commits

  • Implements freelist randomization for the SLUB allocator. It was
    previous implemented for the SLAB allocator. Both use the same
    configuration option (CONFIG_SLAB_FREELIST_RANDOM).

    The list is randomized during initialization of a new set of pages. The
    order on different freelist sizes is pre-computed at boot for
    performance. Each kmem_cache has its own randomized freelist.

    This security feature reduces the predictability of the kernel SLUB
    allocator against heap overflows rendering attacks much less stable.

    For example these attacks exploit the predictability of the heap:
    - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
    - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

    Performance results:

    slab_test impact is between 3% to 4% on average for 100000 attempts
    without smp. It is a very focused testing, kernbench show the overall
    impact on the system is way lower.

    Before:

    Single thread testing
    =====================
    1. Kmalloc: Repeatedly allocate then free test
    100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
    100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
    100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
    100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
    100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
    100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
    100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
    100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
    100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
    100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
    2. Kmalloc: alloc/free test
    100000 times kmalloc(8)/kfree -> 70 cycles
    100000 times kmalloc(16)/kfree -> 70 cycles
    100000 times kmalloc(32)/kfree -> 70 cycles
    100000 times kmalloc(64)/kfree -> 70 cycles
    100000 times kmalloc(128)/kfree -> 70 cycles
    100000 times kmalloc(256)/kfree -> 69 cycles
    100000 times kmalloc(512)/kfree -> 70 cycles
    100000 times kmalloc(1024)/kfree -> 73 cycles
    100000 times kmalloc(2048)/kfree -> 72 cycles
    100000 times kmalloc(4096)/kfree -> 71 cycles

    After:

    Single thread testing
    =====================
    1. Kmalloc: Repeatedly allocate then free test
    100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
    100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
    100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
    100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
    100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
    100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
    100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
    100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
    100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
    100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
    2. Kmalloc: alloc/free test
    100000 times kmalloc(8)/kfree -> 66 cycles
    100000 times kmalloc(16)/kfree -> 66 cycles
    100000 times kmalloc(32)/kfree -> 66 cycles
    100000 times kmalloc(64)/kfree -> 66 cycles
    100000 times kmalloc(128)/kfree -> 65 cycles
    100000 times kmalloc(256)/kfree -> 67 cycles
    100000 times kmalloc(512)/kfree -> 67 cycles
    100000 times kmalloc(1024)/kfree -> 64 cycles
    100000 times kmalloc(2048)/kfree -> 67 cycles
    100000 times kmalloc(4096)/kfree -> 67 cycles

    Kernbench, before:

    Average Optimal load -j 12 Run (std deviation):
    Elapsed Time 101.873 (1.16069)
    User Time 1045.22 (1.60447)
    System Time 88.969 (0.559195)
    Percent CPU 1112.9 (13.8279)
    Context Switches 189140 (2282.15)
    Sleeps 99008.6 (768.091)

    After:

    Average Optimal load -j 12 Run (std deviation):
    Elapsed Time 102.47 (0.562732)
    User Time 1045.3 (1.34263)
    System Time 88.311 (0.342554)
    Percent CPU 1105.8 (6.49444)
    Context Switches 189081 (2355.78)
    Sleeps 99231.5 (800.358)

    Link: http://lkml.kernel.org/r/1464295031-26375-3-git-send-email-thgarnie@google.com
    Signed-off-by: Thomas Garnier
    Reviewed-by: Kees Cook
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Garnier
     
  • Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
    SLUB allocator to catch any copies that may span objects. Includes a
    redzone handling fix discovered by Michael Ellerman.

    Based on code from PaX and grsecurity.

    Signed-off-by: Kees Cook
    Tested-by: Michael Ellerman
    Reviwed-by: Laura Abbott

    Kees Cook
     
  • Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
    SLAB allocator to catch any copies that may span objects.

    Based on code from PaX and grsecurity.

    Signed-off-by: Kees Cook
    Tested-by: Valdis Kletnieks

    Kees Cook
     

26 Jul, 2016

2 commits

  • Pull NOHZ updates from Ingo Molnar:

    - fix system/idle cputime leaked on cputime accounting (all nohz
    configs) (Rik van Riel)

    - remove the messy, ad-hoc irqtime account on nohz-full and make it
    compatible with CONFIG_IRQ_TIME_ACCOUNTING=y instead (Rik van Riel)

    - cleanups (Frederic Weisbecker)

    - remove unecessary irq disablement in the irqtime code (Rik van Riel)

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/cputime: Drop local_irq_save/restore from irqtime_account_irq()
    sched/cputime: Reorganize vtime native irqtime accounting headers
    sched/cputime: Clean up the old vtime gen irqtime accounting completely
    sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code
    sched/cputime: Count actually elapsed irq & softirq time

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - documentation updates

    - miscellaneous fixes

    - minor reorganization of code

    - torture-test updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
    rcu: Correctly handle sparse possible cpus
    rcu: sysctl: Panic on RCU Stall
    rcu: Fix a typo in a comment
    rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
    rcu: Disable TASKS_RCU for usermode Linux
    rcu: No ordering for rcu_assign_pointer() of NULL
    rcutorture: Fix error return code in rcu_perf_init()
    torture: Inflict default jitter
    rcuperf: Don't treat gp_exp mis-setting as a WARN
    rcutorture: Drop "-soundhw pcspkr" from x86 boot arguments
    rcutorture: Don't specify the cpu type of QEMU on PPC
    rcutorture: Make -soundhw a x86 specific option
    rcutorture: Use vmlinux as the fallback kernel image
    rcutorture/doc: Create initrd using dracut
    torture: Stop onoff task if there is only one cpu
    torture: Add starvation events to error summary
    torture: Break online and offline functions out of torture_onoff()
    torture: Forgive lengthy trace dumps and preemption
    torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code
    torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE
    ...

    Linus Torvalds
     

14 Jul, 2016

1 commit

  • The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not
    appear to currently work right.

    On CPUs without nohz_full=, only tick based irq time sampling is
    done, which breaks down when dealing with a nohz_idle CPU.

    On firewalls and similar systems, no ticks may happen on a CPU for a
    while, and the irq time spent may never get accounted properly. This
    can cause issues with capacity planning and power saving, which use
    the CPU statistics as inputs in decision making.

    Remove the VTIME_GEN vtime irq time code, and replace it with the
    IRQ_TIME_ACCOUNTING code, when selected as a config option by the user.

    Signed-off-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krcmar
    Cc: Thomas Gleixner
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     

07 Jul, 2016

1 commit

  • The "expert" menu was broken (split) such that all entries in it after
    KALLSYMS were displayed in the "General setup" area instead of in the
    "Expert users" area. Fix this by adding one kconfig dependency.

    Yes, the Expert users menu is fragile. Problems like this have happened
    several times in the past. I will attempt to isolate the Expert users
    menu if there is interest in that.

    Fixes: 4d5d5664c900 ("x86: kallsyms: disable absolute percpu symbols on !SMP")
    Signed-off-by: Randy Dunlap
    Cc: Ard Biesheuvel
    Cc: stable@vger.kernel.org # 4.6
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

30 Jun, 2016

1 commit


25 Jun, 2016

3 commits

  • Merge misc fixes from Andrew Morton:
    "Two weeks worth of fixes here"

    * emailed patches from Andrew Morton : (41 commits)
    init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
    autofs: don't get stuck in a loop if vfs_write() returns an error
    mm/page_owner: avoid null pointer dereference
    tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
    fs/nilfs2: fix potential underflow in call to crc32_le
    oom, suspend: fix oom_reaper vs. oom_killer_disable race
    ocfs2: disable BUG assertions in reading blocks
    mm, compaction: abort free scanner if split fails
    mm: prevent KASAN false positives in kmemleak
    mm/hugetlb: clear compound_mapcount when freeing gigantic pages
    mm/swap.c: flush lru pvecs on compound page arrival
    memcg: css_alloc should return an ERR_PTR value on error
    memcg: mem_cgroup_migrate() may be called with irq disabled
    hugetlb: fix nr_pmds accounting with shared page tables
    Revert "mm: disable fault around on emulated access bit architecture"
    Revert "mm: make faultaround produce old ptes"
    mailmap: add Boris Brezillon's email
    mailmap: add Antoine Tenart's email
    mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
    mm: mempool: kasan: don't poot mempool objects in quarantine
    ...

    Linus Torvalds
     
  • When I replaced kasprintf("%pf") with a direct call to
    sprint_symbol_no_offset I must have broken the initcall blacklisting
    feature on the arches where dereference_function_descriptor() is
    non-trivial.

    Fixes: c8cdd2be213f (init/main.c: simplify initcall_blacklisted())
    Link: http://lkml.kernel.org/r/1466027283-4065-1-git-send-email-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes
    Cc: Yang Shi
    Cc: Prarit Bhargava
    Cc: Petr Mladek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • We've had the thread info allocated together with the thread stack for
    most architectures for a long time (since the thread_info was split off
    from the task struct), but that is about to change.

    But the patches that move the thread info to be off-stack (and a part of
    the task struct instead) made it clear how confused the allocator and
    freeing functions are.

    Because the common case was that we share an allocation with the thread
    stack and the thread_info, the two pointers were identical. That
    identity then meant that we would have things like

    ti = alloc_thread_info_node(tsk, node);
    ...
    tsk->stack = ti;

    which certainly _worked_ (since stack and thread_info have the same
    value), but is rather confusing: why are we assigning a thread_info to
    the stack? And if we move the thread_info away, the "confusing" code
    just gets to be entirely bogus.

    So remove all this confusion, and make it clear that we are doing the
    stack allocation by renaming and clarifying the function names to be
    about the stack. The fact that the thread_info then shares the
    allocation is an implementation detail, and not really about the
    allocation itself.

    This is a pure renaming and type fix: we pass in the same pointer, it's
    just that we clarify what the pointer means.

    The ia64 code that actually only has one single allocation (for all of
    task_struct, thread_info and kernel thread stack) now looks a bit odd,
    but since "tsk->stack" is actually not even used there, that oddity
    doesn't matter. It would be a separate thing to clean that up, I
    intentionally left the ia64 changes as a pure brute-force renaming and
    type change.

    Acked-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Jun, 2016

1 commit


16 Jun, 2016

1 commit

  • Usermode Linux currently does not implement arch_irqs_disabled_flags(),
    which results in a build failure in TASKS_RCU. Therefore, this commit
    disables the TASKS_RCU Kconfig option in usermode Linux builds. The
    usermode Linux maintainers expect to merge arch_irqs_disabled_flags()
    into 4.8, at which point this commit may be reverted.

    Signed-off-by: Paul E. McKenney
    Cc: Jeff Dike
    Acked-by: Richard Weinberger

    Paul E. McKenney
     

28 May, 2016

1 commit

  • page_ext_init() checks suitable pages with pfn_to_nid(), but
    pfn_to_nid() depends on memmap which will not be setup fully until
    page_alloc_init_late() is done. Use early_pfn_to_nid() instead of
    pfn_to_nid() so that page extension could be still used early even
    though CONFIG_ DEFERRED_STRUCT_PAGE_INIT is enabled and catch early page
    allocation call sites.

    Suggested by Joonsoo Kim [1], this fix basically undoes the change
    introduced by commit b8f1a75d61d840 ("mm: call page_ext_init() after all
    struct pages are initialized") and fixes the same problem with a better
    approach.

    [1] http://lkml.kernel.org/r/CAAmzW4OUmyPwQjvd7QUfc6W1Aic__TyAuH80MLRZNMxKy0-wPQ@mail.gmail.com

    Link: http://lkml.kernel.org/r/1464198689-23458-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

27 May, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and
    unexports symbols which are not used in the current config [Nicolas
    Pitre]

    - several kbuild rule cleanups [Masahiro Yamada]

    - warning option adjustments for gcov etc [Arnd Bergmann]

    - a few more small fixes

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits)
    kbuild: move -Wunused-const-variable to W=1 warning level
    kbuild: fix if_change and friends to consider argument order
    kbuild: fix adjust_autoksyms.sh for modules that need only one symbol
    kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line
    gcov: disable -Wmaybe-uninitialized warning
    gcov: disable tree-loop-im to reduce stack usage
    gcov: disable for COMPILE_TEST
    Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
    Kbuild: change CC_OPTIMIZE_FOR_SIZE definition
    kbuild: forbid kernel directory to contain spaces and colons
    kbuild: adjust ksym_dep_filter for some cmd_* renames
    kbuild: Fix dependencies for final vmlinux link
    kbuild: better abstract vmlinux sequential prerequisites
    kbuild: fix call to adjust_autoksyms.sh when output directory specified
    kbuild: Get rid of KBUILD_STR
    kbuild: rename cmd_as_s_S to cmd_cpp_s_S
    kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
    kbuild: drop redundant "PHONY += FORCE"
    kbuild: delete unnecessary "@:"
    kbuild: mark help target as PHONY
    ...

    Linus Torvalds
     

21 May, 2016

4 commits

  • Using kasprintf to get the function name makes us look up the name
    twice, along with all the vsnprintf overhead of parsing the format
    string etc. It also means there is an allocation failure case to deal
    with. Since symbol_string in vsprintf.c would anyway allocate an array
    of size KSYM_SYMBOL_LEN on the stack, that might as well be done up
    here.

    Moreover, since this is a debug feature and the blacklisted_initcalls
    list is usually empty, we might as well test that and thus avoid looking
    up the symbol name even once in the common case.

    Signed-off-by: Rasmus Villemoes
    Acked-by: Rusty Russell
    Acked-by: Prarit Bhargava
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • Testing has shown that the backtrace sometimes does not fit into the 4kB
    temporary buffer that is used in NMI context. The warnings are gone
    when I double the temporary buffer size.

    This patch doubles the buffer size and makes it configurable.

    Note that this problem existed even in the x86-specific implementation
    that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI
    stack trace on all CPUs"). Nobody noticed it because it did not print
    any warnings.

    Signed-off-by: Petr Mladek
    Cc: Jan Kara
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Russell King
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • printk() takes some locks and could not be used a safe way in NMI
    context.

    The chance of a deadlock is real especially when printing stacks from
    all CPUs. This particular problem has been addressed on x86 by the
    commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
    CPUs").

    The patchset brings two big advantages. First, it makes the NMI
    backtraces safe on all architectures for free. Second, it makes all NMI
    messages almost safe on all architectures (the temporary buffer is
    limited. We still should keep the number of messages in NMI context at
    minimum).

    Note that there already are several messages printed in NMI context:
    WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
    handlers. These are not easy to avoid.

    This patch reuses most of the code and makes it generic. It is useful
    for all messages and architectures that support NMI.

    The alternative printk_func is set when entering and is reseted when
    leaving NMI context. It queues IRQ work to copy the messages into the
    main ring buffer in a safe context.

    __printk_nmi_flush() copies all available messages and reset the buffer.
    Then we could use a simple cmpxchg operations to get synchronized with
    writers. There is also used a spinlock to get synchronized with other
    flushers.

    We do not longer use seq_buf because it depends on external lock. It
    would be hard to make all supported operations safe for a lockless use.
    It would be confusing and error prone to make only some operations safe.

    The code is put into separate printk/nmi.c as suggested by Steven
    Rostedt. It needs a per-CPU buffer and is compiled only on
    architectures that call nmi_enter(). This is achieved by the new
    HAVE_NMI Kconfig flag.

    The are MN10300 and Xtensa architectures. We need to clean up NMI
    handling there first. Let's do it separately.

    The patch is heavily based on the draft from Peter Zijlstra, see

    https://lkml.org/lkml/2015/6/10/327

    [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
    [akpm@linux-foundation.org: min_t->min - all types are size_t here]
    Signed-off-by: Petr Mladek
    Suggested-by: Peter Zijlstra
    Suggested-by: Steven Rostedt
    Cc: Jan Kara
    Acked-by: Russell King [arm part]
    Cc: Daniel Thompson
    Cc: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: David Miller
    Cc: Daniel Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     
  • When DEFERRED_STRUCT_PAGE_INIT is enabled, just a subset of memmap at
    boot are initialized, then the rest are initialized in parallel by
    starting one-off "pgdatinitX" kernel thread for each node X.

    If page_ext_init is called before it, some pages will not have valid
    extension, this may lead the below kernel oops when booting up kernel:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] free_pcppages_bulk+0x2d2/0x8d0
    PGD 0
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in:
    CPU: 11 PID: 106 Comm: pgdatinit1 Not tainted 4.6.0-rc5-next-20160427 #26
    Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
    task: ffff88017c080040 ti: ffff88017c084000 task.ti: ffff88017c084000
    RIP: 0010:[] [] free_pcppages_bulk+0x2d2/0x8d0
    RSP: 0000:ffff88017c087c48 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
    RDX: 0000000000000980 RSI: 0000000000000080 RDI: 0000000000660401
    RBP: ffff88017c087cd0 R08: 0000000000000401 R09: 0000000000000009
    R10: ffff88017c080040 R11: 000000000000000a R12: 0000000000000400
    R13: ffffea0019810000 R14: ffffea0019810040 R15: ffff88066cfe6080
    FS: 0000000000000000(0000) GS:ffff88066cd40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 0000000002406000 CR4: 00000000000006e0
    Call Trace:
    free_hot_cold_page+0x192/0x1d0
    __free_pages+0x5c/0x90
    __free_pages_boot_core+0x11a/0x14e
    deferred_free_range+0x50/0x62
    deferred_init_memmap+0x220/0x3c3
    kthread+0xf8/0x110
    ret_from_fork+0x22/0x40
    Code: 49 89 d4 48 c1 e0 06 49 01 c5 e9 de fe ff ff 4c 89 f7 44 89 4d b8 4c 89 45 c0 44 89 5d c8 48 89 4d d0 e8 62 c7 07 00 48 8b 4d d0 8b 00 44 8b 5d c8 4c 8b 45 c0 44 8b 4d b8 a8 02 0f 84 05 ff
    RIP [] free_pcppages_bulk+0x2d2/0x8d0
    RSP
    CR2: 0000000000000000

    Move page_ext_init() after page_alloc_init_late() to make sure page extension
    is setup for all pages.

    Link: http://lkml.kernel.org/r/1463696006-31360-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

20 May, 2016

1 commit

  • Provides an optional config (CONFIG_SLAB_FREELIST_RANDOM) to randomize
    the SLAB freelist. The list is randomized during initialization of a
    new set of pages. The order on different freelist sizes is pre-computed
    at boot for performance. Each kmem_cache has its own randomized
    freelist. Before pre-computed lists are available freelists are
    generated dynamically. This security feature reduces the predictability
    of the kernel SLAB allocator against heap overflows rendering attacks
    much less stable.

    For example this attack against SLUB (also applicable against SLAB)
    would be affected:

    https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/

    Also, since v4.6 the freelist was moved at the end of the SLAB. It
    means a controllable heap is opened to new attacks not yet publicly
    discussed. A kernel heap overflow can be transformed to multiple
    use-after-free. This feature makes this type of attack harder too.

    To generate entropy, we use get_random_bytes_arch because 0 bits of
    entropy is available in the boot stage. In the worse case this function
    will fallback to the get_random_bytes sub API. We also generate a shift
    random number to shift pre-computed freelist for each new set of pages.

    The config option name is not specific to the SLAB as this approach will
    be extended to other allocators like SLUB.

    Performance results highlighted no major changes:

    Hackbench (running 90 10 times):

    Before average: 0.0698
    After average: 0.0663 (-5.01%)

    slab_test 1 run on boot. Difference only seen on the 2048 size test
    being the worse case scenario covered by freelist randomization. New
    slab pages are constantly being created on the 10000 allocations.
    Variance should be mainly due to getting new pages every few
    allocations.

    Before:

    Single thread testing
    =====================
    1. Kmalloc: Repeatedly allocate then free test
    10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
    10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
    10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
    10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
    10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
    10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
    10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
    10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
    10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
    10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
    10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
    10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
    2. Kmalloc: alloc/free test
    10000 times kmalloc(8)/kfree -> 121 cycles
    10000 times kmalloc(16)/kfree -> 121 cycles
    10000 times kmalloc(32)/kfree -> 121 cycles
    10000 times kmalloc(64)/kfree -> 121 cycles
    10000 times kmalloc(128)/kfree -> 121 cycles
    10000 times kmalloc(256)/kfree -> 119 cycles
    10000 times kmalloc(512)/kfree -> 119 cycles
    10000 times kmalloc(1024)/kfree -> 119 cycles
    10000 times kmalloc(2048)/kfree -> 119 cycles
    10000 times kmalloc(4096)/kfree -> 121 cycles
    10000 times kmalloc(8192)/kfree -> 119 cycles
    10000 times kmalloc(16384)/kfree -> 119 cycles

    After:

    Single thread testing
    =====================
    1. Kmalloc: Repeatedly allocate then free test
    10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
    10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
    10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
    10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
    10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
    10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
    10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
    10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
    10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
    10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
    10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
    10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
    2. Kmalloc: alloc/free test
    10000 times kmalloc(8)/kfree -> 121 cycles
    10000 times kmalloc(16)/kfree -> 121 cycles
    10000 times kmalloc(32)/kfree -> 123 cycles
    10000 times kmalloc(64)/kfree -> 142 cycles
    10000 times kmalloc(128)/kfree -> 121 cycles
    10000 times kmalloc(256)/kfree -> 119 cycles
    10000 times kmalloc(512)/kfree -> 119 cycles
    10000 times kmalloc(1024)/kfree -> 119 cycles
    10000 times kmalloc(2048)/kfree -> 119 cycles
    10000 times kmalloc(4096)/kfree -> 119 cycles
    10000 times kmalloc(8192)/kfree -> 119 cycles
    10000 times kmalloc(16384)/kfree -> 119 cycles

    [akpm@linux-foundation.org: propagate gfp_t into cache_random_seq_create()]
    Signed-off-by: Thomas Garnier
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Kees Cook
    Cc: Greg Thelen
    Cc: Laura Abbott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Garnier
     

10 May, 2016

1 commit

  • CC_OPTIMIZE_FOR_SIZE disables the often useful -Wmaybe-unused warning,
    because that causes a ridiculous amount of false positives when combined
    with -Os.

    This means a lot of warnings don't show up in testing by the developers
    that should see them with an 'allmodconfig' kernel that has
    CC_OPTIMIZE_FOR_SIZE enabled, but only later in randconfig builds
    that don't.

    This changes the Kconfig logic around CC_OPTIMIZE_FOR_SIZE to make
    it a 'choice' statement defaulting to CC_OPTIMIZE_FOR_PERFORMANCE
    that gets added for this purpose. The allmodconfig and allyesconfig
    kernels now default to -O2 with the maybe-unused warning enabled.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Michal Marek

    Arnd Bergmann
     

02 Apr, 2016

1 commit

  • Newer Fedora and OpenSUSE didn't boot with my standard configuration.
    It took me some time to figure out why, in fact I had to write a script
    to try different config options systematically.

    The problem is that something (systemd) in dracut depends on
    CONFIG_FHANDLE, which adds open by file handle syscalls.

    While it is set in defconfigs it is very easy to miss when updating
    older configs because it is not default y.

    Make it default y and also depend on EXPERT, as dracut use is likely
    widespread.

    Signed-off-by: Andi Kleen
    Cc: Richard Weinberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

30 Mar, 2016

1 commit


19 Mar, 2016

1 commit

  • Pull cgroup updates from Tejun Heo:
    "cgroup changes for v4.6-rc1. No userland visible behavior changes in
    this pull request. I'll send out a separate pull request for the
    addition of cgroup namespace support.

    - The biggest change is the revamping of cgroup core task migration
    and controller handling logic. There are quite a few places where
    controllers and tasks are manipulated. Previously, many of those
    places implemented custom operations for each specific use case
    assuming specific starting conditions. While this worked, it makes
    the code fragile and difficult to follow.

    The bulk of this pull request restructures these operations so that
    most related operations are performed through common helpers which
    implement recursive (subtrees are always processed consistently)
    and idempotent (they make cgroup hierarchy converge to the target
    state rather than performing operations assuming specific starting
    conditions). This makes the code a lot easier to understand,
    verify and extend.

    - Implicit controller support is added. This is primarily for using
    perf_event on the v2 hierarchy so that perf can match cgroup v2
    path without requiring the user to do anything special. The kernel
    portion of perf_event changes is acked but userland changes are
    still pending review.

    - cgroup_no_v1= boot parameter added to ease testing cgroup v2 in
    certain environments.

    - There is a regression introduced during v4.4 devel cycle where
    attempts to migrate zombie tasks can mess up internal object
    management. This was fixed earlier this week and included in this
    pull request w/ stable cc'd.

    - Misc non-critical fixes and improvements"

    * 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (44 commits)
    cgroup: avoid false positive gcc-6 warning
    cgroup: ignore css_sets associated with dead cgroups during migration
    Documentation: cgroup v2: Trivial heading correction.
    cgroup: implement cgroup_subsys->implicit_on_dfl
    cgroup: use css_set->mg_dst_cgrp for the migration target cgroup
    cgroup: make cgroup[_taskset]_migrate() take cgroup_root instead of cgroup
    cgroup: move migration destination verification out of cgroup_migrate_prepare_dst()
    cgroup: fix incorrect destination cgroup in cgroup_update_dfl_csses()
    cgroup: Trivial correction to reflect controller.
    cgroup: remove stale item in cgroup-v1 document INDEX file.
    cgroup: update css iteration in cgroup_update_dfl_csses()
    cgroup: allocate 2x cgrp_cset_links when setting up a new root
    cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask
    cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends
    cgroup: use cgroup_apply_enable_control() in cgroup creation path
    cgroup: combine cgroup_mutex locking and offline css draining
    cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write()
    cgroup: introduce cgroup_{save|propagate|restore}_control()
    cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive
    cgroup: factor out cgroup_apply_control_enable() from cgroup_subtree_control_write()
    ...

    Linus Torvalds
     

18 Mar, 2016

1 commit

  • Pull security layer updates from James Morris:
    "There are a bunch of fixes to the TPM, IMA, and Keys code, with minor
    fixes scattered across the subsystem.

    IMA now requires signed policy, and that policy is also now measured
    and appraised"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (67 commits)
    X.509: Make algo identifiers text instead of enum
    akcipher: Move the RSA DER encoding check to the crypto layer
    crypto: Add hash param to pkcs1pad
    sign-file: fix build with CMS support disabled
    MAINTAINERS: update tpmdd urls
    MODSIGN: linux/string.h should be #included to get memcpy()
    certs: Fix misaligned data in extra certificate list
    X.509: Handle midnight alternative notation in GeneralizedTime
    X.509: Support leap seconds
    Handle ISO 8601 leap seconds and encodings of midnight in mktime64()
    X.509: Fix leap year handling again
    PKCS#7: fix unitialized boolean 'want'
    firmware: change kernel read fail to dev_dbg()
    KEYS: Use the symbol value for list size, updated by scripts/insert-sys-cert
    KEYS: Reserve an extra certificate symbol for inserting without recompiling
    modsign: hide openssl output in silent builds
    tpm_tis: fix build warning with tpm_tis_resume
    ima: require signed IMA policy
    ima: measure and appraise the IMA policy itself
    ima: load policy using path
    ...

    Linus Torvalds
     

17 Mar, 2016

1 commit

  • Merge first patch-bomb from Andrew Morton:

    - some misc things

    - ofs2 updates

    - about half of MM

    - checkpatch updates

    - autofs4 update

    * emailed patches from Andrew Morton : (120 commits)
    autofs4: fix string.h include in auto_dev-ioctl.h
    autofs4: use pr_xxx() macros directly for logging
    autofs4: change log print macros to not insert newline
    autofs4: make autofs log prints consistent
    autofs4: fix some white space errors
    autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked()
    autofs4: fix coding style line length in autofs4_wait()
    autofs4: fix coding style problem in autofs4_get_set_timeout()
    autofs4: coding style fixes
    autofs: show pipe inode in mount options
    kallsyms: add support for relative offsets in kallsyms address table
    kallsyms: don't overload absolute symbol type for percpu symbols
    x86: kallsyms: disable absolute percpu symbols on !SMP
    checkpatch: fix another left brace warning
    checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses
    checkpatch: warn on bare unsigned or signed declarations without int
    checkpatch: exclude asm volatile from complex macro check
    mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate()
    mm: migrate: consolidate mem_cgroup_migrate() calls
    mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
    ...

    Linus Torvalds
     

16 Mar, 2016

1 commit

  • Similar to how relative extables are implemented, it is possible to emit
    the kallsyms table in such a way that it contains offsets relative to
    some anchor point in the kernel image rather than absolute addresses.

    On 64-bit architectures, it cuts the size of the kallsyms address table
    in half, since offsets between kernel symbols can typically be expressed
    in 32 bits. This saves several hundreds of kilobytes of permanent
    .rodata on average. In addition, the kallsyms address table is no
    longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
    effect, so the relocation work done after decompression now doesn't have
    to do relocation updates for all these values. This saves up to 24
    bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
    which easily adds up to a couple of megabytes of uncompressed __init
    data on ppc64 or arm64. Even if these relocation entries typically
    compress well, the combined size reduction of 2.8 MB uncompressed for a
    ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
    KB space saving in the compressed image.

    Since it is useful for some architectures (like x86) to retain the
    ability to emit absolute values as well, this patch also adds support
    for capturing both absolute and relative values when
    KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
    addresses as positive 32-bit values, and addresses relative to the
    lowest encountered relative symbol as negative values, which are
    subtracted from the runtime address of this base symbol to produce the
    actual address.

    Support for the above is enabled by default for all architectures except
    IA-64 and Tile-GX, whose symbols are too far apart to capture in this
    manner.

    Signed-off-by: Ard Biesheuvel
    Tested-by: Guenter Roeck
    Reviewed-by: Kees Cook
    Tested-by: Kees Cook
    Cc: Heiko Carstens
    Cc: Michael Ellerman
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Benjamin Herrenschmidt
    Cc: Michal Marek
    Cc: Rusty Russell
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ard Biesheuvel