27 Oct, 2009

1 commit


09 Oct, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futex: fix requeue_pi key imbalance
    futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
    rcu: Place root rcu_node structure in separate lockdep class
    rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
    rcu: Move rcu_barrier() to rcutree
    futex: Move exit_pi_state() call to release_mm()
    futex: Nullify robust lists after cleanup
    futex: Fix locking imbalance
    panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
    rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
    rcu: Clean up code based on review feedback from Josh Triplett, part 4
    rcu: Clean up code based on review feedback from Josh Triplett, part 3
    rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
    rcu: Clean up code to address Ingo's checkpatch feedback
    rcu: Clean up code based on review feedback from Josh Triplett, part 2
    rcu: Clean up code based on review feedback from Josh Triplett

    Linus Torvalds
     

06 Oct, 2009

1 commit

  • Some architectures such as Sparc, ARM and MIPS (basically
    everything with flush_dcache_page()) need to deal with dcache
    aliases by carefully placing pages in both kernel and user maps.

    These architectures typically have to use vmalloc_user() for this.

    However, on other architectures, vmalloc() is not needed and has
    the downsides of being more restricted and slower than regular
    allocations.

    Signed-off-by: Peter Zijlstra
    Acked-by: David Miller
    Cc: Andrew Morton
    Cc: Jens Axboe
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

24 Sep, 2009

6 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
    cpumask: Move deprecated functions to end of header.
    cpumask: remove unused deprecated functions, avoid accusations of insanity
    cpumask: use new-style cpumask ops in mm/quicklist.
    cpumask: use mm_cpumask() wrapper: x86
    cpumask: use mm_cpumask() wrapper: um
    cpumask: use mm_cpumask() wrapper: mips
    cpumask: use mm_cpumask() wrapper: mn10300
    cpumask: use mm_cpumask() wrapper: m32r
    cpumask: use mm_cpumask() wrapper: arm
    cpumask: Use accessors for cpu_*_mask: um
    cpumask: Use accessors for cpu_*_mask: powerpc
    cpumask: Use accessors for cpu_*_mask: mips
    cpumask: Use accessors for cpu_*_mask: m32r
    cpumask: remove arch_send_call_function_ipi
    cpumask: arch_send_call_function_ipi_mask: s390
    cpumask: arch_send_call_function_ipi_mask: powerpc
    cpumask: arch_send_call_function_ipi_mask: mips
    cpumask: arch_send_call_function_ipi_mask: m32r
    cpumask: arch_send_call_function_ipi_mask: alpha
    cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
    ...

    Linus Torvalds
     
  • * remove asm/atomic.h inclusion from linux/utsname.h --
    not needed after kref conversion
    * remove linux/utsname.h inclusion from files which do not need it

    NOTE: it looks like fs/binfmt_elf.c does not need utsname.h; however,
    due to some personality stuff it _is_ needed, so cowardly leave the
    ELF-related headers and files alone.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • It's only defined for NR_CPUS > BITS_PER_LONG; cpu_all_mask is always
    defined (and const).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
    Use macros for .data.page_aligned section.
    Use macros for .bss.page_aligned section.
    Use new __init_task_data macro in arch init_task.c files.
    kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
    arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
    kbuild: add static to prototypes
    kbuild: fail build if recordmcount.pl fails
    kbuild: set -fconserve-stack option for gcc 4.5
    kbuild: echo the record_mcount command
    gconfig: disable "typeahead find" search in treeviews
    kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
    checkincludes.pl: add option to remove duplicates in place
    markup_oops: use modinfo to avoid confusion with underscored module names
    checkincludes.pl: provide usage helper
    checkincludes.pl: close file as soon as we're done with it
    ctags: usability fix
    kernel hacking: move STRIP_ASM_SYMS from General
    gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
    kbuild: Check if linker supports the -X option
    kbuild: introduce ld-option
    ...

    Fix trivial conflict in scripts/basic/fixdep.c

    Linus Torvalds
     
    These issues were identified during an old-fashioned face-to-face
    code review extending over many hours.

    o Add comments for tricky parts of code, and correct comments
    that have passed their sell-by date.

    o Get rid of the vestiges of rcu_init_sched(), which is no
    longer needed now that PREEMPT_RCU is gone.

    o Move the #include of rcutree_plugin.h to the end of
    rcutree.c, which means that, rather than having a random
    collection of forward declarations, the new set of forward
    declarations documents the set of plugins. The new home for
    this #include also allows __rcu_init_preempt() to move into
    rcutree_plugin.h.

    o Fix rcu_preempt_check_callbacks() to be static.

    Suggested-by: Josh Triplett
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Peter Zijlstra

    Paul E. McKenney
     
  • * 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6:
    SFI: remove unneeded includes
    sfi: Remove unused code
    SFI: Hook PCI MMCONFIG
    x86: add arch-specific SFI support
    SFI: add capability to parse ACPI tables
    SFI: add platform-independent core support
    SFI: create linux/sfi.h
    SFI: Simple Firmware Interface - MAINTAINERS, Kconfig

    Linus Torvalds
     

22 Sep, 2009

3 commits

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge amount
    of) non-RAM pages. The amount of memory that is actually usable as
    storage should be used as the basis instead.

    Some of the calculations (i.e. those not intending to use high memory)
    should likely even use (totalram_pages - totalhigh_pages).

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • …ux/kernel/git/tip/linux-2.6-tip

    * 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Tidy up after the big rename
    perf: Do the big rename: Performance Counters -> Performance Events
    perf_counter: Rename 'event' to event_id/hw_event
    perf_counter: Rename list_entry -> group_entry, counter_list -> group_list

    Manually resolved some fairly trivial conflicts with the tracing tree in
    include/trace/ftrace.h and kernel/trace/trace_syscalls.c.

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: Fix whitespace inconsistencies
    rcu: Fix thinko, actually initialize full tree
    rcu: Apply results of code inspection of kernel/rcutree_plugin.h
    rcu: Add WARN_ON_ONCE() consistency checks covering state transitions
    rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU
    rcu: Simplify rcu_read_unlock_special() quiescent-state accounting
    rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods
    rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down
    rcutorture: Occasionally delay readers enough to make RCU force_quiescent_state
    rcu: Initialize multi-level RCU grace periods holding locks
    rcu: Need to update rnp->gpnum if preemptable RCU is to be reliable

    Linus Torvalds
     

21 Sep, 2009

2 commits

  • - provide compatibility Kconfig entry for existing PERF_COUNTERS .config's

    - provide courtesy copy of old perf_counter.h, for user-space projects

    - small indentation fixups

    - fix up MAINTAINERS

    - fix small x86 printout fallout

    - fix up small PowerPC comment fallout (use 'counter' as in register)

    Reviewed-by: Arjan van de Ven
    Acked-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out of its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in all, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names (in an ABI-compatible fashion).

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks go to Stephane Eranian, who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility are not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
      -e 's/PERF_EVENT_/PERF_RECORD_/g' \
      -e 's/PERF_COUNTER/PERF_EVENT/g' \
      -e 's/perf_counter/perf_event/g' \
      -e 's/nb_counters/nb_events/g' \
      -e 's/swcounter/swevent/g' \
      -e 's/tpcounter_event/tp_event/g' \
      $FILES

    for N in $(find . -name perf_counter.[ch]); do
      M=$(echo $N | sed 's/perf_counter/perf_event/g')
      mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
      -e 's/COUNTER_MASK/REG_MASK/g' \
      -e 's/COUNTER/EVENT/g' \
      -e 's/\<event\>/event_id/g' \
      -e 's/counter/event/g' \
      -e 's/Counter/Event/g' \
      $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

20 Sep, 2009

1 commit


19 Sep, 2009

2 commits


18 Sep, 2009

1 commit

  • To quote Valdis:

    This leaves somebody who has a laptop wondering which
    choice is best for a system with only one or two cores that
    has CONFIG_PREEMPT defined. One choice says it scales down
    nicely, the other explicitly has a 'depends on PREEMPT'
    attached to it...

    So add "scales down nicely" to TREE_PREEMPT_RCU to match that of
    TREE_RCU.

    Suggested-by: Valdis Kletnieks
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

16 Sep, 2009

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
    Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
    debugfs: Modify default debugfs directory for debugging pktcdvd.
    debugfs: Modified default dir of debugfs for debugging UHCI.
    debugfs: Change debugfs directory of IWMC3200
    debugfs: Change debuhgfs directory of trace-events-sample.h
    debugfs: Fix mount directory of debugfs by default in events.txt
    hpilo: add poll f_op
    hpilo: add interrupt handler
    hpilo: staging for interrupt handling
    driver core: platform_device_add_data(): use kmemdup()
    Driver core: Add support for compatibility classes
    uio: add generic driver for PCI 2.3 devices
    driver-core: move dma-coherent.c from kernel to driver/base
    mem_class: fix bug
    mem_class: use minor as index instead of searching the array
    driver model: constify attribute groups
    UIO: remove 'default n' from Kconfig
    Driver core: Add accessor for device platform data
    Driver core: move dev_get/set_drvdata to drivers/base/dd.c
    Driver core: add new device to bus's list before probing

    Linus Torvalds
     
  • Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
    very early at kernel initialization, before any driver-core device
    is registered. Every device with a major/minor will provide a
    device node in devtmpfs.

    Devtmpfs can be modified by userspace at any time, and in any way
    needed - just like today's udev-mounted tmpfs. Unmodified udev
    versions will run just fine on top of it, and will recognize an
    already existing kernel-created device node and use it. The default
    node permissions are root:root 0600. Proper permissions and
    user/group ownership, meaningful symlinks, and all other policy
    still need to be applied by userspace.

    If a node is created by devtmpfs, devtmpfs will remove the device node
    when the device goes away. If the device node was created by
    userspace, or the devtmpfs-created node was replaced by userspace, it
    will no longer be removed by devtmpfs.

    If auto-mounting is requested, devtmpfs makes init=/bin/sh work
    without any further userspace support. /dev will be fully populated
    and dynamic, and will always reflect the current device state of the
    kernel. With the commonly used dynamic device numbers, it solves the
    problem where static device nodes may point to the wrong devices.

    It is intended to make the initial bootup logic simpler and more
    robust by decoupling the creation of the initial environment needed
    to reliably run userspace processes from the complex userspace
    bootstrap logic that provides a working /dev.

    Signed-off-by: Kay Sievers
    Signed-off-by: Jan Blunck
    Tested-By: Harald Hoyer
    Tested-By: Scott James Remnant
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
    powerpc64: convert to dynamic percpu allocator
    sparc64: use embedding percpu first chunk allocator
    percpu: kill lpage first chunk allocator
    x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
    percpu: update embedding first chunk allocator to handle sparse units
    percpu: use group information to allocate vmap areas sparsely
    vmalloc: implement pcpu_get_vm_areas()
    vmalloc: separate out insert_vmalloc_vm()
    percpu: add chunk->base_addr
    percpu: add pcpu_unit_offsets[]
    percpu: introduce pcpu_alloc_info and pcpu_group_info
    percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
    percpu: add @align to pcpu_fc_alloc_fn_t
    percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
    percpu: drop @static_size from first chunk allocators
    percpu: generalize first chunk allocator selection
    percpu: build first chunk allocators selectively
    percpu: rename 4k first chunk allocator to page
    percpu: improve boot messages
    percpu: fix pcpu_reclaim() locking
    ...

    Fix trivial conflict as by Tejun Heo in kernel/sched.c

    Linus Torvalds
     

12 Sep, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
    sched: Fix sched::sched_stat_wait tracepoint field
    sched: Disable NEW_FAIR_SLEEPERS for now
    sched: Keep kthreads at default priority
    sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
    sched: Turn off child_runs_first
    sched: Ensure that a child can't gain time over it's parent after fork()
    sched: enable SD_WAKE_IDLE
    sched: Deal with low-load in wake_affine()
    sched: Remove short cut from select_task_rq_fair()
    sched: Turn on SD_BALANCE_NEWIDLE
    sched: Clean up topology.h
    sched: Fix dynamic power-balancing crash
    sched: Remove reciprocal for cpu_power
    sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
    sched: Try to deal with low capacity
    sched: Scale down cpu_power due to RT tasks
    sched: Implement dynamic cpu_power
    sched: Add smt_gain
    sched: Update the cpu_power sum during load-balance
    sched: Add SD_PREFER_SIBLING
    ...

    Linus Torvalds
     

04 Sep, 2009

2 commits


29 Aug, 2009

1 commit


27 Aug, 2009

1 commit

    Some architectures initialize clocks and timers in late_time_init, and
    x86 wants to do the same to avoid FIXMAP hackery for calibrating the
    TSC. That would result in undefined sched_clock readouts and wrecked
    printk timestamps again. We probably have those already on archs
    which do all their time/clock setup in late_time_init.

    There is no harm in moving that after late_time_init, except that a
    few more boot timestamps are stale. The scheduler is not active at
    that point, so no real wreckage is expected.

    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Cc: linux-arch@vger.kernel.org

    Thomas Gleixner
     

26 Aug, 2009

1 commit


23 Aug, 2009

2 commits

  • Now that CONFIG_TREE_PREEMPT_RCU is in place, there is no
    further need for CONFIG_PREEMPT_RCU. Remove it, along with
    whatever subtle bugs it may (or may not) contain.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Create a kernel/rcutree_plugin.h file that contains definitions
    for preemptable RCU (or, under the #else branch of the #ifdef,
    empty definitions for the classic non-preemptable semantics).
    These definitions fit into plugins defined in kernel/rcutree.c
    for this purpose.

    This variant of preemptable RCU uses a new algorithm whose
    read-side expense is roughly that of classic hierarchical RCU
    under CONFIG_PREEMPT. This new algorithm's update-side expense
    is similar to that of classic hierarchical RCU, and, in absence
    of read-side preemption or blocking, is exactly that of classic
    hierarchical RCU. Perhaps more important, this new algorithm
    has a much simpler implementation, saving well over 1,000 lines
    of code compared to mainline's implementation of preemptable
    RCU, which will hopefully be retired in favor of this new
    algorithm.

    The simplifications are obtained by maintaining per-task
    nesting state for running tasks, and using a simple
    lock-protected algorithm to handle accounting when tasks block
    within RCU read-side critical sections, making use of lessons
    learned while creating numerous user-level RCU implementations
    over the past 18 months.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

21 Aug, 2009

1 commit

  • One of my testboxes triggered this nasty stack overflow crash
    during SCSI probing:

    [ 5.874004] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [ 5.875004] device: 'sda': device_add
    [ 5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
    [ 5.878004] IP: [] print_context_stack+0x81/0x110
    [ 5.878004] *pde = 00000000
    [ 5.878004] Thread overran stack, or stack corrupted
    [ 5.878004] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [ 5.878004] last sysfs file:
    [ 5.878004]
    [ 5.878004] Pid: 1, comm: swapper Not tainted (2.6.31-rc6-tip-01272-g9919e28-dirty #5685)
    [ 5.878004] EIP: 0060:[] EFLAGS: 00010083 CPU: 0
    [ 5.878004] EIP is at print_context_stack+0x81/0x110
    [ 5.878004] EAX: cf8a3000 EBX: cf8a3fe4 ECX: 00000049 EDX: 00000000
    [ 5.878004] ESI: b1cfce84 EDI: 00000000 EBP: cf8a3018 ESP: cf8a2ff4
    [ 5.878004] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    [ 5.878004] Process swapper (pid: 1, ti=cf8a2000 task=cf8a8000 task.ti=cf8a3000)
    [ 5.878004] Stack:
    [ 5.878004] b1004867 fffff000 cf8a3ffc
    [ 5.878004] Call Trace:
    [ 5.878004] [] ? kernel_thread_helper+0x7/0x10
    [ 5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
    [ 5.878004] IP: [] print_context_stack+0x81/0x110
    [ 5.878004] *pde = 00000000
    [ 5.878004] Thread overran stack, or stack corrupted
    [ 5.878004] Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC

    The oops did not reveal any more details about the real stack, and
    the system got into an infinite loop of recursive page faults.

    So I booted with CONFIG_STACK_TRACER=y and the 'stacktrace' boot
    parameter. The box did not crash (timings/conditions probably
    changed just enough not to trigger the catastrophic crash), but the
    /debug/tracing/stack_trace file was rather revealing:

    Depth Size Location (72 entries)
    ----- ---- --------
    0) 3704 52 __change_page_attr+0xb8/0x290
    1) 3652 24 __change_page_attr_set_clr+0x43/0x90
    2) 3628 60 kernel_map_pages+0x108/0x120
    3) 3568 40 prep_new_page+0x7d/0x130
    4) 3528 84 get_page_from_freelist+0x106/0x420
    5) 3444 116 __alloc_pages_nodemask+0xd7/0x550
    6) 3328 36 allocate_slab+0xb1/0x100
    7) 3292 36 new_slab+0x1c/0x160
    8) 3256 36 __slab_alloc+0x133/0x2b0
    9) 3220 4 kmem_cache_alloc+0x1bb/0x1d0
    10) 3216 108 create_object+0x28/0x250
    11) 3108 40 kmemleak_alloc+0x81/0xc0
    12) 3068 24 kmem_cache_alloc+0x162/0x1d0
    13) 3044 52 scsi_pool_alloc_command+0x29/0x70
    14) 2992 20 scsi_host_alloc_command+0x22/0x70
    15) 2972 24 __scsi_get_command+0x1b/0x90
    16) 2948 28 scsi_get_command+0x35/0x90
    17) 2920 24 scsi_setup_blk_pc_cmnd+0xd4/0x100
    18) 2896 128 sd_prep_fn+0x332/0xa70
    19) 2768 36 blk_peek_request+0xe7/0x1d0
    20) 2732 56 scsi_request_fn+0x54/0x520
    21) 2676 12 __generic_unplug_device+0x2b/0x40
    22) 2664 24 blk_execute_rq_nowait+0x59/0x80
    23) 2640 172 blk_execute_rq+0x6b/0xb0
    24) 2468 32 scsi_execute+0xe0/0x140
    25) 2436 64 scsi_execute_req+0x152/0x160
    26) 2372 60 scsi_vpd_inquiry+0x6c/0x90
    27) 2312 44 scsi_get_vpd_page+0x112/0x160
    28) 2268 52 sd_revalidate_disk+0x1df/0x320
    29) 2216 92 rescan_partitions+0x98/0x330
    30) 2124 52 __blkdev_get+0x309/0x350
    31) 2072 8 blkdev_get+0xf/0x20
    32) 2064 44 register_disk+0xff/0x120
    33) 2020 36 add_disk+0x6e/0xb0
    34) 1984 44 sd_probe_async+0xfb/0x1d0
    35) 1940 44 __async_schedule+0xf4/0x1b0
    36) 1896 8 async_schedule+0x12/0x20
    37) 1888 60 sd_probe+0x305/0x360
    38) 1828 44 really_probe+0x63/0x170
    39) 1784 36 driver_probe_device+0x5d/0x60
    40) 1748 16 __device_attach+0x49/0x50
    41) 1732 32 bus_for_each_drv+0x5b/0x80
    42) 1700 24 device_attach+0x6b/0x70
    43) 1676 16 bus_attach_device+0x47/0x60
    44) 1660 76 device_add+0x33d/0x400
    45) 1584 52 scsi_sysfs_add_sdev+0x6a/0x2c0
    46) 1532 108 scsi_add_lun+0x44b/0x460
    47) 1424 116 scsi_probe_and_add_lun+0x182/0x4e0
    48) 1308 36 __scsi_add_device+0xd9/0xe0
    49) 1272 44 ata_scsi_scan_host+0x10b/0x190
    50) 1228 24 async_port_probe+0x96/0xd0
    51) 1204 44 __async_schedule+0xf4/0x1b0
    52) 1160 8 async_schedule+0x12/0x20
    53) 1152 48 ata_host_register+0x171/0x1d0
    54) 1104 60 ata_pci_sff_activate_host+0xf3/0x230
    55) 1044 44 ata_pci_sff_init_one+0xea/0x100
    56) 1000 48 amd_init_one+0xb2/0x190
    57) 952 8 local_pci_probe+0x13/0x20
    58) 944 32 pci_device_probe+0x68/0x90
    59) 912 44 really_probe+0x63/0x170
    60) 868 36 driver_probe_device+0x5d/0x60
    61) 832 20 __driver_attach+0x89/0xa0
    62) 812 32 bus_for_each_dev+0x5b/0x80
    63) 780 12 driver_attach+0x1e/0x20
    64) 768 72 bus_add_driver+0x14b/0x2d0
    65) 696 36 driver_register+0x6e/0x150
    66) 660 20 __pci_register_driver+0x53/0xc0
    67) 640 8 amd_init+0x14/0x16
    68) 632 572 do_one_initcall+0x2b/0x1d0
    69) 60 12 do_basic_setup+0x56/0x6a
    70) 48 20 kernel_init+0x84/0xce
    71) 28 28 kernel_thread_helper+0x7/0x10

    There are a lot of fat functions on that stack trace, but
    the largest of all is do_one_initcall(). This is due to
    the boot trace entry variables being on the stack.

    Fixing this is relatively easy: initcalls are fundamentally
    serialized, so we can move the local variables to file scope.

    Note that this large stack footprint was present for a
    couple of months already - what pushed my system over
    the edge was the addition of kmemleak to the call-chain:

    6) 3328 36 allocate_slab+0xb1/0x100
    7) 3292 36 new_slab+0x1c/0x160
    8) 3256 36 __slab_alloc+0x133/0x2b0
    9) 3220 4 kmem_cache_alloc+0x1bb/0x1d0
    10) 3216 108 create_object+0x28/0x250
    11) 3108 40 kmemleak_alloc+0x81/0xc0
    12) 3068 24 kmem_cache_alloc+0x162/0x1d0
    13) 3044 52 scsi_pool_alloc_command+0x29/0x70

    This pushes the total to ~3800 bytes; only a tiny bit more was
    needed to corrupt the thread_info stored on the kernel stack.

    The fix reduces the stack footprint from 572 bytes
    to 28 bytes.

    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Catalin Marinas
    Cc: Jens Axboe
    Cc: Linus Torvalds
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

16 Aug, 2009

1 commit


14 Aug, 2009

2 commits

  • Conflicts:
    arch/sparc/kernel/smp_64.c
    arch/x86/kernel/cpu/perf_counter.c
    arch/x86/kernel/setup_percpu.c
    drivers/cpufreq/cpufreq_ondemand.c
    mm/percpu.c

    Conflicts in the core and arch percpu code mostly come from commit
    ed78e1e078dd44249f88b1dd8c76dafb39567161, which replaced many
    num_possible_cpus() uses with nr_cpu_ids. As the for-next branch has
    moved all the first chunk allocators into mm/percpu.c, the changes
    are moved from arch code to mm/percpu.c.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • nr_cpu_ids is dependent only on cpu_possible_map and
    setup_per_cpu_areas() already depends on cpu_possible_map and will use
    nr_cpu_ids. Initialize nr_cpu_ids before setting up percpu areas.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

05 Aug, 2009

1 commit


02 Aug, 2009

1 commit


13 Jul, 2009

1 commit

  • Fix a missed rename in EVENT_PROFILE support so that it gets
    built and allows tracepoint tracing from the 'perf' tool.

    Also fix a typo in the (never before built and enabled) portion of
    perf_counter.c, and update that code to the attr.config changes.

    Signed-off-by: Chris Wilson
    Cc: Ben Gamari
    Cc: Jason Baron
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Chris Wilson
     

04 Jul, 2009

1 commit

    Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
    changes. As alpha in the percpu tree uses the 'weak' attribute instead
    of inline assembly, there's no need for the __used attribute.

    Conflicts:
    arch/alpha/include/asm/percpu.h
    arch/mn10300/kernel/vmlinux.lds.S
    include/linux/percpu-defs.h

    Tejun Heo
     

24 Jun, 2009

3 commits

    Remove Classic RCU, given that the combination of Tree RCU and
    the proposed Bloatwatch RCU does everything that Classic RCU can
    with fewer bugs.

    Tree RCU has been the default in x86 builds for almost six months,
    and seems to be quite reliable, so there does not seem to be
    much justification for keeping the Classic RCU code and config
    complexity around anymore.

    Signed-off-by: Paul E. McKenney
    Cc: akpm@linux-foundation.org
    Cc: niv@us.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: dipankar@in.ibm.com
    Cc: dhowells@redhat.com
    Cc: lethal@linux-sh.org
    Cc: kernel@wantstofly.org
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • This patch makes most !CONFIG_HAVE_SETUP_PER_CPU_AREA archs use
    dynamic percpu allocator. The first chunk is allocated using
    embedding helper and 8k is reserved for modules. This ensures that
    the new allocator behaves almost identically to the original allocator
    as far as static percpu variables are concerned, so it shouldn't
    introduce much breakage.

    s390 and alpha use a custom SHIFT_PERCPU_PTR() to work around the
    addressing range limit their addressing model imposes. Unfortunately,
    this breaks if the address is specified using a variable, so for now,
    the two archs aren't converted.

    The following architectures are affected by this change.

    * sh
    * arm
    * cris
    * mips
    * sparc(32)
    * blackfin
    * avr32
    * parisc (broken, under investigation)
    * m32r
    * powerpc(32)

    As this change makes the dynamic allocator the default one,
    CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is replaced with its inverse,
    CONFIG_HAVE_LEGACY_PER_CPU_AREA, which is added to the archs that
    are yet to be converted. These archs implement their own
    setup_per_cpu_areas() and the conversion is not trivial.

    * powerpc(64)
    * sparc(64)
    * ia64
    * alpha
    * s390

    Boot and batch alloc/free tests on x86_32 with debug code (x86_32
    doesn't use default first chunk initialization). Compile tested on
    sparc(32), powerpc(32), arm and alpha.

    Kyle McMartin reported that this change breaks parisc. The problem is
    still under investigation and he is okay with pushing this patch
    forward and fixing parisc later.

    [ Impact: use dynamic allocator for most archs w/o custom percpu setup ]

    Signed-off-by: Tejun Heo
    Acked-by: Rusty Russell
    Acked-by: David S. Miller
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Martin Schwidefsky
    Reviewed-by: Christoph Lameter
    Cc: Paul Mundt
    Cc: Russell King
    Cc: Mikael Starvik
    Cc: Ralf Baechle
    Cc: Bryan Wu
    Cc: Kyle McMartin
    Cc: Matthew Wilcox
    Cc: Grant Grundler
    Cc: Hirokazu Takata
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Heiko Carstens
    Cc: Ingo Molnar

    Tejun Heo
     
  • …bugzilla-13121', 'bugzilla-13396', 'bugzilla-13533', 'bugzilla-13612', 'c3_lock', 'hid-cleanups', 'misc-2.6.31', 'pdc-leak-fix', 'pnpacpi', 'power_nocheck', 'thinkpad_acpi', 'video' and 'wmi' into release

    Len Brown