13 Sep, 2013

3 commits

  • After the last architecture switched to generic hard irqs the config
    options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
    for !CONFIG_GENERIC_HARDIRQS can be removed.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Unlike global OOM handling, memory cgroup code will invoke the OOM killer
    in any OOM situation because it has no way of telling faults occurring in
    kernel context - which could be handled more gracefully - from
    user-triggered faults.

    Pass a flag that identifies faults originating in user space from the
    architecture-specific fault handlers to generic code so that memcg OOM
    handling can be improved.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Michal Hocko
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: azurIt
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Kernel faults are expected to handle OOM conditions gracefully (gup,
    uaccess etc.), so they should never invoke the OOM killer. Reserve this
    for faults triggered in user context when it is the only option.

    Most architectures already do this, fix up the remaining few.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Michal Hocko
    Acked-by: KOSAKI Motohiro
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: azurIt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
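
The rule described by the two OOM commits above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `sketch_` names and the flag value are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bit; the value is an assumption, not the kernel's. */
#define SKETCH_FAULT_FLAG_USER 0x1u

enum sketch_oom_action {
	SKETCH_OOM_KILL,	/* invoke the OOM killer */
	SKETCH_OOM_FAIL,	/* fail the fault, let the kernel caller recover */
};

/* Arch-side part: derive the fault flags from the mode the CPU was in. */
static unsigned int sketch_fault_flags(bool from_user_mode)
{
	return from_user_mode ? SKETCH_FAULT_FLAG_USER : 0;
}

/* Generic part: decide what to do once a fault runs into an OOM condition. */
static enum sketch_oom_action sketch_handle_oom(unsigned int flags)
{
	/* This sketch only knows a single flag bit. */
	assert((flags & ~SKETCH_FAULT_FLAG_USER) == 0);

	if (flags & SKETCH_FAULT_FLAG_USER)
		return SKETCH_OOM_KILL;
	return SKETCH_OOM_FAIL;
}
```

A fault taken in kernel context (gup, uaccess) yields `SKETCH_OOM_FAIL`, so the caller sees an error instead of triggering a kill.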
     

07 Sep, 2013

6 commits

  • These handlers are not optional; in our case they need
    dummy implementations to avoid NULL pointer bugs within
    the irq core code.

    Reported-and-tested-by: Toralf Foester
    Signed-off-by: Richard Weinberger

    Richard Weinberger
     
  • If UML is not run by a shell it can happen that UML
    will kill unrelated processes upon a fatal exit because
    it issues a kill(0, ...).
    To prevent such oddities we create a new session in main().

    Reported-and-tested-by: Richard W.M. Jones
    Signed-off-by: Richard Weinberger

    Richard Weinberger
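
A minimal sketch, assuming a POSIX host, of why a new session helps: after setsid() the caller leads its own process group, so kill(0, ...) can only reach processes it started itself. The helper name `isolate_process_group` is hypothetical and this is not the actual UML code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Make the caller the leader of a new session and process group.
 * Returns 0 on success, or if the caller already leads its group
 * (the typical case when started from an interactive shell), -1 on error.
 */
static int isolate_process_group(void)
{
	if (getpgrp() == getpid())
		return 0;	/* setsid() would fail with EPERM here */

	if (setsid() == (pid_t)-1)
		return -1;

	/* A session leader is always the leader of its process group. */
	assert(getpgrp() == getpid());
	return 0;
}
```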
     
  • Richard reported that some UML processes survive if the UML
    main process receives a SIGTERM.
    This issue was caused by a wrongly placed signal(SIGTERM, SIG_DFL)
    in init_new_thread_signals().
    It accidentally disabled the UML exit handler for some processes.
    The correct solution is to disable the fatal handler for all
    UML helper threads/processes, so that last_ditch_exit() does
    not get called multiple times and all processes can exit due
    to SIGTERM.

    Reported-and-tested-by: Richard W.M. Jones
    Signed-off-by: Richard Weinberger

    Richard Weinberger
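
The split described above (one process runs the exit path, helpers fall back to the default action) might look roughly like this with sigaction(). The `sketch_` functions are hypothetical stand-ins, not the functions touched by the patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Stand-in for the main process's fatal handler (the one-time exit path). */
static void sketch_fatal_handler(int sig)
{
	(void)sig;
}

/* Set the SIGTERM disposition of the calling process. Returns 0 or -1. */
static int sketch_set_sigterm(void (*handler)(int))
{
	struct sigaction sa;

	assert(handler != SIG_ERR);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGTERM, &sa, NULL);
}

/* Main process: SIGTERM runs the exit path. */
static int sketch_install_fatal_handler(void)
{
	return sketch_set_sigterm(sketch_fatal_handler);
}

/* Helper process: SIGTERM simply terminates it, no exit path involved. */
static int sketch_disable_fatal_handler(void)
{
	return sketch_set_sigterm(SIG_DFL);
}
```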
     
  • Just a clean-up patch to remove the open coded
    variants and to ensure that all requests are submitted the
    same way.

    Signed-off-by: Richard Weinberger

    Richard Weinberger
     
  • UML's block device driver does not support write barriers.
    To support them, this patch adds REQ_FLUSH support.
    Every time the block layer sends a REQ_FLUSH we now fsync()
    our backing file to guarantee data consistency.

    Reported-and-tested-by: Richard W.M. Jones
    Signed-off-by: Richard Weinberger

    Richard Weinberger
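
As a rough user-space model of the idea, a flush request maps to fsync() on the backing file while writes go through as usual. The request type and function names below are hypothetical; the real driver operates on block layer requests.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request kinds for this sketch. */
enum sketch_req_op { SKETCH_REQ_WRITE, SKETCH_REQ_FLUSH };

struct sketch_req {
	enum sketch_req_op op;
	const void *buf;	/* payload for writes, unused for flushes */
	size_t len;
	off_t offset;
};

/*
 * Service one request against the backing file descriptor fd.
 * Returns 0 on success or a negative errno value.
 */
static int sketch_handle_req(int fd, const struct sketch_req *req)
{
	assert(req != NULL);

	switch (req->op) {
	case SKETCH_REQ_FLUSH:
		/* Push everything written so far to stable storage. */
		return fsync(fd) == 0 ? 0 : -errno;
	case SKETCH_REQ_WRITE:
		/* Simplification: short writes and errors both map to -EIO. */
		if (pwrite(fd, req->buf, req->len, req->offset) != (ssize_t)req->len)
			return -EIO;
		return 0;
	}
	return -EINVAL;
}
```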
     
  • UML needs its own probe_kernel_read() to handle kernel
    mode faults correctly.
    The implementation uses mincore() on the host side to detect
    whether a page is owned by the UML kernel process.

    This also fixes a possible crash when sysrq-t is used.
    Starting with 3.10 sysrq-t calls probe_kernel_read() to
    read details from the kernel workers. As kernel workers are
    completely async, pointers may turn NULL while reading them.

    Cc: # 3.10.x
    Signed-off-by: Richard Weinberger

    Richard Weinberger
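
The mincore() check can be illustrated as follows. This is a sketch assuming a Linux host, where mincore() fails with ENOMEM if the queried range contains unmapped memory; `page_is_mapped` is a hypothetical helper, not the UML implementation.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* glibc: expose mincore() */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Report whether the page containing addr belongs to the calling
 * process's address space.
 * Returns 1 if mapped, 0 if not mapped, -1 on any other error.
 */
static int page_is_mapped(const void *addr)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char vec;	/* one status byte per page; one page queried */
	uintptr_t start;

	assert(page_size > 0);
	/* mincore() requires a page-aligned start address. */
	start = (uintptr_t)addr & ~((uintptr_t)page_size - 1);

	if (mincore((void *)start, (size_t)page_size, &vec) == 0)
		return 1;
	return errno == ENOMEM ? 0 : -1;
}
```

A read helper built on this would probe each page before touching it and return an error instead of faulting when the answer is not 1.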
     

16 Aug, 2013

1 commit

  • Ben Tebulin reported:

    "Since v3.7.2 on two independent machines a very specific Git
    repository fails in 9/10 cases on git-fsck due to an SHA1/memory
    failures. This only occurs on a very specific repository and can be
    reproduced stably on two independent laptops. Git mailing list ran
    out of ideas and for me this looks like some very exotic kernel issue"

    and bisected the failure to the backport of commit 53a59fc67f97 ("mm:
    limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").

    That commit itself is not actually buggy, but what it does is to make it
    much more likely to hit the partial TLB invalidation case, since it
    introduces a new case in tlb_next_batch() that previously only ever
    happened when running out of memory.

    The real bug is that the TLB gather virtual memory range setup is subtly
    buggered. It was introduced in commit 597e1c3580b7 ("mm/mmu_gather:
    enable tlb flush range in generic mmu_gather"), and the range handling
    was already fixed at least once in commit e6c495a96ce0 ("mm: fix the TLB
    range flushed when __tlb_remove_page() runs out of slots"), but that fix
    was not complete.

    The problem with the TLB gather virtual address range is that it isn't
    set up by the initial tlb_gather_mmu() initialization (which didn't get
    the TLB range information), but it is set up ad-hoc later by the
    functions that actually flush the TLB. And so any such case that forgot
    to update the TLB range entries would potentially miss TLB invalidates.

    Rather than try to figure out exactly which particular ad-hoc range
    setup was missing (I personally suspect it's the hugetlb case in
    zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
    did), this patch just gets rid of the problem at the source: make the
    TLB range information available to tlb_gather_mmu(), and initialize it
    when initializing all the other tlb gather fields.

    This makes the patch larger, but conceptually much simpler. And the end
    result is much more understandable; even if you want to play games with
    partial ranges when invalidating the TLB contents in chunks, now the
    range information is always there, and anybody who doesn't want to
    bother with it won't introduce subtle bugs.

    Ben verified that this fixes his problem.

    Reported-bisected-and-tested-by: Ben Tebulin
    Build-testing-by: Stephen Rothwell
    Build-testing-by: Richard Weinberger
    Reviewed-by: Michal Hocko
    Acked-by: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
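
The shape of the fix, reduced to a sketch: the range is recorded once when the gather is initialized, so no later code path can forget to set it. The structure and function names are hypothetical and far simpler than the kernel's mmu_gather.

```c
#include <assert.h>
#include <stdbool.h>

/* Heavily reduced stand-in for the TLB gather state. */
struct sketch_gather {
	unsigned long start;	/* first address covered by this gather */
	unsigned long end;	/* one past the last address covered */
	bool need_flush;
};

/* Record the range up front, together with every other field. */
static void sketch_gather_init(struct sketch_gather *tlb,
			       unsigned long start, unsigned long end)
{
	assert(start <= end);
	tlb->start = start;
	tlb->end = end;
	tlb->need_flush = false;
}

/* Unmap paths only mark the gather dirty; the range is already set. */
static void sketch_gather_remove_page(struct sketch_gather *tlb)
{
	tlb->need_flush = true;
}

/* Returns true if a flush of [tlb->start, tlb->end) was due. */
static bool sketch_gather_flush(struct sketch_gather *tlb)
{
	bool flushed = tlb->need_flush;

	/* A real implementation would invalidate the TLB range here. */
	tlb->need_flush = false;
	return flushed;
}
```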
     

19 Jul, 2013

5 commits


04 Jul, 2013

5 commits

  • Prepare for removing num_physpages and simplify mem_init().

    Signed-off-by: Jiang Liu
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Normalize global variables exported by vmlinux.lds to conform to the
    usage guidelines from include/asm-generic/sections.h.

    1) Use _text to mark the start of the kernel image including the head
    text, and _stext to mark the start of the .text section.
    2) Export the mandatory global variable __bss_stop.
    3) Adjust __init_begin and __init_end to avoid crossing the .text and
    .data sections.

    Signed-off-by: Jiang Liu
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Concentrate code to modify totalram_pages into the mm core, so the arch
    memory initialization code doesn't need to take care of it. With these
    changes applied, only the following functions from the mm core modify the
    global variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
    free_all_bootmem_node(), adjust_managed_page_count().

    With this patch applied, it will be much easier for us to keep
    totalram_pages and zone->managed_pages consistent.

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Address more review comments from last round of code review.
    1) Enhance free_reserved_area() to support poisoning freed memory with
    pattern '0'. This could be used to get rid of poison_init_mem()
    on ARM64.
    2) A previous patch has disabled memory poison for initmem on s390
    by mistake, so restore the original behavior.
    3) Remove redundant PAGE_ALIGN() when calling free_reserved_area().

    Signed-off-by: Jiang Liu
    Cc: Geert Uytterhoeven
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Change signature of free_reserved_area() according to Russell King's
    suggestion to fix the following build warnings:

    arch/arm/mm/init.c: In function 'mem_init':
    arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
    free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
    ^
    In file included from include/linux/mman.h:4:0,
    from arch/arm/mm/init.c:15:
    include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
    extern unsigned long free_reserved_area(unsigned long start, unsigned long end,

    mm/page_alloc.c: In function 'free_reserved_area':
    >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
    In file included from arch/mips/include/asm/page.h:49:0,
    from include/linux/mmzone.h:20,
    from include/linux/gfp.h:4,
    from include/linux/mm.h:8,
    from mm/page_alloc.c:18:
    arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
    mm/page_alloc.c: In function 'free_area_init_nodes':
    mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]

    Also address some minor code review comments.

    Signed-off-by: Jiang Liu
    Reported-by: Arnd Bergmann
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

03 Jul, 2013

1 commit

  • Pull scheduler updates from Ingo Molnar:
    "The main changes:

    - load-calculation cleanups and improvements, by Alex Shi
    - various nohz related tidying up of statistics, by Frederic
    Weisbecker
    - factor out /proc functions to kernel/sched/proc.c, by Paul
    Gortmaker
    - simplify the RT policy scheduler, by Kirill Tkhai
    - various fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
    sched/debug: Remove CONFIG_FAIR_GROUP_SCHED mask
    sched/debug: Fix formatting of /proc//sched
    sched: Fix typo in struct sched_avg member description
    sched/fair: Fix typo describing flags in enqueue_entity
    sched/debug: Add load-tracking statistics to task
    sched: Change get_rq_runnable_load() to static and inline
    sched/tg: Remove tg.load_weight
    sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t
    sched/tg: Use 'unsigned long' for load variable in task group
    sched: Change cfs_rq load avg to unsigned long
    sched: Consider runnable load average in move_tasks()
    sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task
    sched: Update cpu load after task_tick
    sched: Fix sleep time double accounting in enqueue entity
    sched: Set an initial value of runnable avg for new forked task
    sched: Move a few runnable tg variables into CONFIG_SMP
    Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"
    sched: Don't mix use of typedef ctl_table and struct ctl_table
    sched: Remove WARN_ON(!sd) from init_sched_groups_power()
    sched: Fix memory leakage in build_sched_groups()
    ...

    Linus Torvalds
     

01 Jul, 2013

1 commit

  • Merge in a recent upstream commit:

    c2853c8df57f include/linux/math64.h: add div64_ul()

    because:

    72a4cf20cb71 sched: Change cfs_rq load avg to unsigned long

    relies on it.

    [ We don't rebase sched/core for this, because the handful of
    followup commits after the broken commit are not behavioral
    changes so are unlikely to be needed during bisection. ]

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

29 Jun, 2013

1 commit


19 Jun, 2013

2 commits


08 May, 2013

1 commit


07 May, 2013

1 commit


06 May, 2013

1 commit

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
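
The overhead figure quoted in the pull request can be checked with simple arithmetic. If HZ=1000 costs up to 1.0% of a CPU, each tick costs up to 10 microseconds (10 ms of every second spread over 1000 ticks); that per-tick cost is derived here from the quoted upper bound and is not a measured value. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Fraction of CPU time spent handling the periodic tick:
 * (ticks per second) * (seconds per tick).
 */
static double tick_overhead_fraction(unsigned int hz, double tick_cost_us)
{
	assert(tick_cost_us >= 0.0);
	return hz * tick_cost_us / 1e6;
}
```

With a 10 microsecond tick, 1000 Hz gives a fraction of about 0.01 (1%), while the 1 Hz residual tick of CONFIG_NO_HZ_FULL gives about 0.00001 (0.001%).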
     

02 May, 2013

2 commits

  • The full dynticks tree needs the latest RCU and sched
    upstream updates in order to fix some dependencies.

    Merge a common upstream merge point that has these
    updates.

    Conflicts:
    include/linux/perf_event.h
    kernel/rcutree.h
    kernel/rcutree_plugin.h

    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     
  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

01 May, 2013

2 commits

  • show_regs() is inherently arch-dependent but it does make sense to print
    generic debug information and some archs already do albeit in slightly
    different forms. This patch introduces a generic function to print debug
    information from show_regs() so that different archs print out the same
    information and it's much easier to modify what's printed.

    show_regs_print_info() prints out the same debug info as dump_stack()
    does plus task and thread_info pointers.

    * Archs which didn't print debug info now do.

    alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
    metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
    um, xtensa

    * Already prints debug info. Replaced with show_regs_print_info().
    The printed information is superset of what used to be there.

    arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86

    * s390 is special in that it used to print arch-specific information
    along with generic debug info. Heiko and Martin think that the
    arch-specific extra isn't worth keeping an s390-specific implementation.
    Converted to use the generic version.

    Note that now all archs print the debug info before actual register
    dumps.

    An example BUG() dump follows.

    kernel BUG at /work/os/work/kernel/workqueue.c:4841!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
    Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
    task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
    RIP: 0010:[] [] init_workqueues+0x4/0x6
    RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246
    RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
    RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
    RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
    0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
    ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
    Call Trace:
    [] do_one_initcall+0x122/0x170
    [] kernel_init_freeable+0x9b/0x1c8
    [] ? rest_init+0x140/0x140
    [] kernel_init+0xe/0xf0
    [] ret_from_fork+0x7c/0xb0
    [] ? rest_init+0x140/0x140
    ...

    v2: Typo fix in x86-32.

    v3: CPU number dropped from show_regs_print_info() as
    dump_stack_print_info() has been updated to print it. s390
    specific implementation dropped as requested by s390 maintainers.

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Acked-by: Jesper Nilsson
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Bjorn Helgaas
    Cc: Fengguang Wu
    Cc: Mike Frysinger
    Cc: Vineet Gupta
    Cc: Sam Ravnborg
    Acked-by: Chris Metcalf [tile bits]
    Acked-by: Richard Kuo [hexagon bits]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Both dump_stack() and show_stack() are currently implemented by each
    architecture. show_stack(NULL, NULL) dumps the backtrace for the
    current task as does dump_stack(). On some archs, dump_stack() prints
    extra information - pid, utsname and so on - in addition to the
    backtrace while the two are identical on other archs.

    The usages in arch-independent code of the two functions indicate
    show_stack(NULL, NULL) should print out bare backtrace while
    dump_stack() is used for debugging purposes when something went wrong,
    so it does make sense to print additional information on the task which
    triggered dump_stack().

    There's no reason to require archs to implement two separate but mostly
    identical functions. It leads to unnecessary subtle information.

    This patch expands the dummy fallback dump_stack() implementation in
    lib/dump_stack.c such that it prints out debug information (taken from
    x86) and invokes show_stack(NULL, NULL) and drops arch-specific
    dump_stack() implementations in all archs except blackfin. Blackfin's
    dump_stack() does something wonky that I don't understand.

    Debug information can be printed separately by calling
    dump_stack_print_info() so that arch-specific dump_stack()
    implementation can still emit the same debug information. This is used
    in blackfin.

    This patch brings the following behavior changes.

    * On some archs, an extra level in backtrace for show_stack() could be
    printed. This is because the top frame was determined in
    dump_stack() on those archs while generic dump_stack() can't do that
    reliably. It can be compensated by inlining dump_stack() but not
    sure whether that'd be necessary.

    * Most archs didn't use to print debug info on dump_stack(). They do
    now.

    An example WARN dump follows.

    WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
    Hardware name: empty
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
    0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
    ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
    0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
    Call Trace:
    [] dump_stack+0x19/0x1b
    [] warn_slowpath_common+0x7f/0xc0
    [] warn_slowpath_null+0x1a/0x20
    [] init_workqueues+0x35/0x505
    ...

    v2: CPU number added to the generic debug info as requested by s390
    folks and dropped the s390 specific dump_stack(). This loses %ksp
    from the debug message which the maintainers think isn't important
    enough to keep the s390-specific dump_stack() implementation.

    dump_stack_print_info() is moved to kernel/printk.c from
    lib/dump_stack.c. Because linkage is per object file,
    dump_stack_print_info() living in the same lib file as generic
    dump_stack() means that archs which implement custom dump_stack()
    - at this point, only blackfin - can't use dump_stack_print_info()
    as that will bring in the generic version of dump_stack() too. The
    v1 patch broke the build on blackfin due to this issue. The build
    breakage was reported by Fengguang Wu.

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Acked-by: Vineet Gupta
    Acked-by: Jesper Nilsson
    Acked-by: Vineet Gupta
    Acked-by: Martin Schwidefsky [s390 bits]
    Cc: Heiko Carstens
    Cc: Mike Frysinger
    Cc: Fengguang Wu
    Cc: Bjorn Helgaas
    Cc: Sam Ravnborg
    Acked-by: Richard Kuo [hexagon bits]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
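
The resulting layering can be modelled in user space as below: the generic dump_stack() prints task-level debug info and then asks the architecture for a bare backtrace. All `sketch_` names and the placeholder output strings are hypothetical; output goes to a buffer so the ordering can be inspected.

```c
#include <assert.h>
#include <string.h>

/* Captures output so the sketch can be checked without a console. */
static char sketch_log[128];

static void sketch_emit(const char *line)
{
	assert(strlen(sketch_log) + strlen(line) < sizeof(sketch_log));
	strcat(sketch_log, line);
}

/* Generic part: task-level debug info (contents are placeholders). */
static void sketch_dump_stack_print_info(void)
{
	sketch_emit("CPU: 0 PID: 1 Comm: example\n");
}

/* Per-architecture part: the bare backtrace and nothing else. */
static void sketch_show_stack(void)
{
	sketch_emit("Call Trace:\n");
}

/* Generic dump_stack(): debug info first, then the arch backtrace. */
static void sketch_dump_stack(void)
{
	sketch_dump_stack_print_info();
	sketch_show_stack();
}
```

An architecture with its own dump_stack() could still call the info helper directly, which is the arrangement the message describes for blackfin.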
     

30 Apr, 2013

4 commits

  • Pull SMP/hotplug changes from Ingo Molnar:
    "This is a pretty large, multi-arch series unifying and generalizing
    the various disjunct pieces of idle routines that architectures have
    historically copied from each other and have grown in random, wildly
    inconsistent and sometimes buggy directions:

    101 files changed, 455 insertions(+), 1328 deletions(-)

    this went through a number of review and test iterations before it was
    committed, it was tested on various architectures, was exposed to
    linux-next for quite some time - nevertheless it might cause problems
    on architectures that don't read the mailing lists and don't regularly
    test linux-next.

    This cat herding exercise was motivated by the -rt kernel, and was
    brought to you by Thomas "the Whip" Gleixner."

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    idle: Remove GENERIC_IDLE_LOOP config switch
    um: Use generic idle loop
    ia64: Make sure interrupts enabled when we "safe_halt()"
    sparc: Use generic idle loop
    idle: Remove unused ARCH_HAS_DEFAULT_IDLE
    bfin: Fix typo in arch_cpu_idle()
    xtensa: Use generic idle loop
    x86: Use generic idle loop
    unicore: Use generic idle loop
    tile: Use generic idle loop
    tile: Enter idle with preemption disabled
    sh: Use generic idle loop
    score: Use generic idle loop
    s390: Use generic idle loop
    powerpc: Use generic idle loop
    parisc: Use generic idle loop
    openrisc: Use generic idle loop
    mn10300: Use generic idle loop
    mips: Use generic idle loop
    microblaze: Use generic idle loop
    ...

    Linus Torvalds
     
  • The early console implementations are the same all over the place. Move
    the print function to kernel/printk and get rid of the copies.

    [akpm@linux-foundation.org: arch/mips/kernel/early_printk.c needs kernel.h for va_list]
    [paul.gortmaker@windriver.com: sh4: make the bios early console support depend on EARLY_PRINTK]
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Paul Gortmaker
    Cc: Russell King
    Acked-by: Mike Frysinger
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Richard Weinberger
    Reviewed-by: Ingo Molnar
    Tested-by: Paul Gortmaker
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Use helper function free_highmem_page() to free highmem pages into
    the buddy system.

    Signed-off-by: Jiang Liu
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Use common helper functions to free reserved pages.

    Signed-off-by: Jiang Liu
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

17 Apr, 2013

2 commits


10 Apr, 2013

1 commit


03 Apr, 2013

1 commit

  • We are planning to convert the dynticks Kconfig options layout
    into a choice menu. The user must be able to easily pick
    any of the following implementations: constant periodic tick,
    idle dynticks, full dynticks.

    As this implies a mutual exclusion, the two dynticks implementations
    need to converge on the selection of a common Kconfig option in order
    to ease the sharing of a common infrastructure.

    It would thus seem pretty natural to reuse CONFIG_NO_HZ to
    that end. It already implements all the idle dynticks code
    and the full dynticks depends on all that code for now.
    So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
    CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.

    On the other hand we want to stay backward compatible: if
    CONFIG_NO_HZ is set in an older config file, we want to
    enable CONFIG_NO_HZ_IDLE by default.

    But we can't afford both at the same time or we run into
    a circular dependency:

    1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
    CONFIG_NO_HZ
    2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

    We might be able to support that from Kconfig/Kbuild but it
    may not be wise to introduce such a confusing behaviour.

    So to solve this, create a new CONFIG_NO_HZ_COMMON option
    which gathers the common code between idle and full dynticks
    (that common code for now is simply the idle dynticks code)
    and select it from their referring Kconfig.

    Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
    to it for backward compatibility.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker