13 Sep, 2013

2 commits

  • Unlike global OOM handling, memory cgroup code will invoke the OOM killer
    in any OOM situation because it has no way of telling faults occuring in
    kernel context - which could be handled more gracefully - from
    user-triggered faults.

    Pass a flag that identifies faults originating in user space from the
    architecture-specific fault handlers to generic code so that memcg OOM
    handling can be improved.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Michal Hocko
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: azurIt
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Kernel faults are expected to handle OOM conditions gracefully (gup,
    uaccess etc.), so they should never invoke the OOM killer. Reserve this
    for faults triggered in user context when it is the only option.

    Most architectures already do this, fix up the remaining few.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Michal Hocko
    Acked-by: KOSAKI Motohiro
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: azurIt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

07 Sep, 2013

2 commits

  • These handlers are not optional and need in our case
    dummy implementions to avoid NULL pointer bugs within
    the irq core code.

    Reported-and-tested-by: Toralf Foester
    Signed-off-by: Richard Weinberger

    Richard Weinberger
     
  • UML needs it's own probe_kernel_read() to handle kernel
    mode faults correctly.
    The implementation uses mincore() on the host side to detect
    whether a page is owned by the UML kernel process.

    This fixes also a possible crash when sysrq-t is used.
    Starting with 3.10 sysrq-t calls probe_kernel_read() to
    read details from the kernel workers. As kernel worker are
    completely async pointers may turn NULL while reading them.

    Cc:
    Cc:
    Cc: # 3.10.x
    Signed-off-by: Richard Weinberger

    Richard Weinberger
     

19 Jul, 2013

3 commits


04 Jul, 2013

5 commits

  • Prepare for removing num_physpages and simplify mem_init().

    Signed-off-by: Jiang Liu
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Normalize global variables exported by vmlinux.lds to conform usage
    guidelines from include/asm-generic/sections.h.

    1) Use _text to mark the start of the kernel image including the head
    text, and _stext to mark the start of the .text section.
    2) Export mandatory global variables __bss_stop.
    3) Adjust __init_begin and __init_end to avoid acrossing .text and
    .data sections.

    Signed-off-by: Jiang Liu
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Concentrate code to modify totalram_pages into the mm core, so the arch
    memory initialized code doesn't need to take care of it. With these
    changes applied, only following functions from mm core modify global
    variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
    free_all_bootmem_node(), adjust_managed_page_count().

    With this patch applied, it will be much more easier for us to keep
    totalram_pages and zone->managed_pages in consistence.

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Address more review comments from last round of code review.
    1) Enhance free_reserved_area() to support poisoning freed memory with
    pattern '0'. This could be used to get rid of poison_init_mem()
    on ARM64.
    2) A previous patch has disabled memory poison for initmem on s390
    by mistake, so restore to the original behavior.
    3) Remove redundant PAGE_ALIGN() when calling free_reserved_area().

    Signed-off-by: Jiang Liu
    Cc: Geert Uytterhoeven
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Change signature of free_reserved_area() according to Russell King's
    suggestion to fix following build warnings:

    arch/arm/mm/init.c: In function 'mem_init':
    arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
    free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
    ^
    In file included from include/linux/mman.h:4:0,
    from arch/arm/mm/init.c:15:
    include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
    extern unsigned long free_reserved_area(unsigned long start, unsigned long end,

    mm/page_alloc.c: In function 'free_reserved_area':
    >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
    In file included from arch/mips/include/asm/page.h:49:0,
    from include/linux/mmzone.h:20,
    from include/linux/gfp.h:4,
    from include/linux/mm.h:8,
    from mm/page_alloc.c:18:
    arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
    mm/page_alloc.c: In function 'free_area_init_nodes':
    mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]

    Also address some minor code review comments.

    Signed-off-by: Jiang Liu
    Reported-by: Arnd Bergmann
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

19 Jun, 2013

1 commit

  • Most of the stuff from kernel/sched.c was moved to kernel/sched/core.c long time
    back and the comments/Documentation never got updated.

    I figured it out when I was going through sched-domains.txt and so thought of
    fixing it globally.

    I haven't crossed check if the stuff that is referenced in sched/core.c by all
    these files is still present and hasn't changed as that wasn't the motive behind
    this patch.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/cdff76a265326ab8d71922a1db5be599f20aad45.1370329560.git.viresh.kumar@linaro.org
    Signed-off-by: Ingo Molnar

    Viresh Kumar
     

01 May, 2013

1 commit

  • Both dump_stack() and show_stack() are currently implemented by each
    architecture. show_stack(NULL, NULL) dumps the backtrace for the
    current task as does dump_stack(). On some archs, dump_stack() prints
    extra information - pid, utsname and so on - in addition to the
    backtrace while the two are identical on other archs.

    The usages in arch-independent code of the two functions indicate
    show_stack(NULL, NULL) should print out bare backtrace while
    dump_stack() is used for debugging purposes when something went wrong,
    so it does make sense to print additional information on the task which
    triggered dump_stack().

    There's no reason to require archs to implement two separate but mostly
    identical functions. It leads to unnecessary subtle information.

    This patch expands the dummy fallback dump_stack() implementation in
    lib/dump_stack.c such that it prints out debug information (taken from
    x86) and invokes show_stack(NULL, NULL) and drops arch-specific
    dump_stack() implementations in all archs except blackfin. Blackfin's
    dump_stack() does something wonky that I don't understand.

    Debug information can be printed separately by calling
    dump_stack_print_info() so that arch-specific dump_stack()
    implementation can still emit the same debug information. This is used
    in blackfin.

    This patch brings the following behavior changes.

    * On some archs, an extra level in backtrace for show_stack() could be
    printed. This is because the top frame was determined in
    dump_stack() on those archs while generic dump_stack() can't do that
    reliably. It can be compensated by inlining dump_stack() but not
    sure whether that'd be necessary.

    * Most archs didn't use to print debug info on dump_stack(). They do
    now.

    An example WARN dump follows.

    WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
    Hardware name: empty
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
    0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
    ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
    0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
    Call Trace:
    [] dump_stack+0x19/0x1b
    [] warn_slowpath_common+0x7f/0xc0
    [] warn_slowpath_null+0x1a/0x20
    [] init_workqueues+0x35/0x505
    ...

    v2: CPU number added to the generic debug info as requested by s390
    folks and dropped the s390 specific dump_stack(). This loses %ksp
    from the debug message which the maintainers think isn't important
    enough to keep the s390-specific dump_stack() implementation.

    dump_stack_print_info() is moved to kernel/printk.c from
    lib/dump_stack.c. Because linkage is per objecct file,
    dump_stack_print_info() living in the same lib file as generic
    dump_stack() means that archs which implement custom dump_stack()
    - at this point, only blackfin - can't use dump_stack_print_info()
    as that will bring in the generic version of dump_stack() too. v1
    The v1 patch broke build on blackfin due to this issue. The build
    breakage was reported by Fengguang Wu.

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Acked-by: Vineet Gupta
    Acked-by: Jesper Nilsson
    Acked-by: Vineet Gupta
    Acked-by: Martin Schwidefsky [s390 bits]
    Cc: Heiko Carstens
    Cc: Mike Frysinger
    Cc: Fengguang Wu
    Cc: Bjorn Helgaas
    Cc: Sam Ravnborg
    Acked-by: Richard Kuo [hexagon bits]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

30 Apr, 2013

4 commits

  • Pull SMP/hotplug changes from Ingo Molnar:
    "This is a pretty large, multi-arch series unifying and generalizing
    the various disjunct pieces of idle routines that architectures have
    historically copied from each other and have grown in random, wildly
    inconsistent and sometimes buggy directions:

    101 files changed, 455 insertions(+), 1328 deletions(-)

    this went through a number of review and test iterations before it was
    committed, it was tested on various architectures, was exposed to
    linux-next for quite some time - nevertheless it might cause problems
    on architectures that don't read the mailing lists and don't regularly
    test linux-next.

    This cat herding excercise was motivated by the -rt kernel, and was
    brought to you by Thomas "the Whip" Gleixner."

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    idle: Remove GENERIC_IDLE_LOOP config switch
    um: Use generic idle loop
    ia64: Make sure interrupts enabled when we "safe_halt()"
    sparc: Use generic idle loop
    idle: Remove unused ARCH_HAS_DEFAULT_IDLE
    bfin: Fix typo in arch_cpu_idle()
    xtensa: Use generic idle loop
    x86: Use generic idle loop
    unicore: Use generic idle loop
    tile: Use generic idle loop
    tile: Enter idle with preemption disabled
    sh: Use generic idle loop
    score: Use generic idle loop
    s390: Use generic idle loop
    powerpc: Use generic idle loop
    parisc: Use generic idle loop
    openrisc: Use generic idle loop
    mn10300: Use generic idle loop
    mips: Use generic idle loop
    microblaze: Use generic idle loop
    ...

    Linus Torvalds
     
  • The early console implementations are the same all over the place. Move
    the print function to kernel/printk and get rid of the copies.

    [akpm@linux-foundation.org: arch/mips/kernel/early_printk.c needs kernel.h for va_list]
    [paul.gortmaker@windriver.com: sh4: make the bios early console support depend on EARLY_PRINTK]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Paul Gortmaker
    Cc: Russell King
    Acked-by: Mike Frysinger
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Richard Weinberger
    Reviewed-by: Ingo Molnar
    Tested-by: Paul Gortmaker
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Use helper function free_highmem_page() to free highmem pages into
    the buddy system.

    Signed-off-by: Jiang Liu
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Use common help functions to free reserved pages.

    Signed-off-by: Jiang Liu
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

17 Apr, 2013

1 commit


04 Feb, 2013

1 commit


20 Dec, 2012

1 commit


29 Nov, 2012

5 commits


13 Oct, 2012

2 commits

  • Pull third pile of kernel_execve() patches from Al Viro:
    "The last bits of infrastructure for kernel_thread() et.al., with
    alpha/arm/x86 use of those. Plus sanitizing the asm glue and
    do_notify_resume() on alpha, fixing the "disabled irq while running
    task_work stuff" breakage there.

    At that point the rest of kernel_thread/kernel_execve/sys_execve work
    can be done independently for different architectures. The only
    pending bits that do depend on having all architectures converted are
    restrictred to fs/* and kernel/* - that'll obviously have to wait for
    the next cycle.

    I thought we'd have to wait for all of them done before we start
    eliminating the longjump-style insanity in kernel_execve(), but it
    turned out there's a very simple way to do that without flagday-style
    changes."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    alpha: switch to saner kernel_execve() semantics
    arm: switch to saner kernel_execve() semantics
    x86, um: convert to saner kernel_execve() semantics
    infrastructure for saner ret_from_kernel_thread semantics
    make sure that kernel_thread() callbacks call do_exit() themselves
    make sure that we always have a return path from kernel_execve()
    ppc: eeh_event should just use kthread_run()
    don't bother with kernel_thread/kernel_execve for launching linuxrc
    alpha: get rid of switch_stack argument of do_work_pending()
    alpha: don't bother passing switch_stack separately from regs
    alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c
    alpha: simplify TIF_NEED_RESCHED handling

    Linus Torvalds
     
  • Signed-off-by: Al Viro

    Al Viro
     

10 Oct, 2012

4 commits

  • Pull generic execve() changes from Al Viro:
    "This introduces the generic kernel_thread() and kernel_execve()
    functions, and switches x86, arm, alpha, um and s390 over to them."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits)
    s390: convert to generic kernel_execve()
    s390: switch to generic kernel_thread()
    s390: fold kernel_thread_helper() into ret_from_fork()
    s390: fold execve_tail() into start_thread(), convert to generic sys_execve()
    um: switch to generic kernel_thread()
    x86, um/x86: switch to generic sys_execve and kernel_execve
    x86: split ret_from_fork
    alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
    alpha: switch to generic kernel_thread()
    alpha: switch to generic sys_execve()
    arm: get rid of execve wrapper, switch to generic execve() implementation
    arm: optimized current_pt_regs()
    arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
    arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]
    generic sys_execve()
    generic kernel_execve()
    new helper: current_pt_regs()
    preparation for generic kernel_thread()
    um: kill thread->forking
    um: let signal_delivered() do SIGTRAP on singlestepping into handler
    ...

    Linus Torvalds
     
  • Pull UML changes from Richard Weinberger:
    "UML receives this time only cleanups.

    The most outstanding change is the 'include "foo.h"' do 'include
    ' conversion done by Al Viro.

    It touches many files, that's why the diffstat is rather big."

    * 'for-linus-37rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    typo in UserModeLinux-HOWTO
    hppfs: fix the return value of get_inode()
    hostfs: drop vmtruncate
    um: get rid of pointless include "..." where include will do
    um: move sysrq.h out of include/shared
    um/x86: merge 32 and 64 bit variants of ptrace.h
    um/x86: merge 32 and 64bit variants of checksum.h

    Linus Torvalds
     
  • Signed-off-by: Al Viro
    Signed-off-by: Richard Weinberger

    Al Viro
     
  • never used by userland-side objects

    Signed-off-by: Al Viro
    Signed-off-by: Richard Weinberger

    Al Viro
     

09 Oct, 2012

1 commit

  • .fault now can retry. The retry can break state machine of .fault. In
    filemap_fault, if page is miss, ra->mmap_miss is increased. In the second
    try, since the page is in page cache now, ra->mmap_miss is decreased. And
    these are done in one fault, so we can't detect random mmap file access.

    Add a new flag to indicate .fault is tried once. In the second try, skip
    ra->mmap_miss decreasing. The filemap_fault state machine is ok with it.

    I only tested x86, didn't test other archs, but looks the change for other
    archs is obvious, but who knows :)

    Signed-off-by: Shaohua Li
    Cc: Rik van Riel
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

01 Oct, 2012

2 commits


28 Sep, 2012

4 commits


20 Sep, 2012

1 commit

  • we only use that to tell copy_thread() done by syscall from that
    done by kernel_thread(). However, it's easier to do simply by
    checking PF_KTHREAD in thread flags.

    Merge sys_clone() guts for 32bit and 64bit, while we are at it...

    Signed-off-by: Al Viro

    Al Viro