03 Jul, 2010

1 commit


30 Jun, 2010

1 commit

  • Apparently "pid-1" confuses people...

    Requested-by: Randy Dunlap
    Signed-off-by: Peter Zijlstra
    Cc: torvalds@linux-foundation.org
    Cc: randy.dunlap@oracle.com
    Cc: Ilya Loginov
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

29 Jun, 2010

1 commit

  • Ilya reported that on a very slow machine he could reliably
    reproduce a race between forking init and kthreadd. We fork init
    first so that it obtains pid 1; however, since the scheduler is
    already fully running at this point, it can preempt us and run
    the init thread before we spawn kthreadd and set kthreadd_task.

    The init thread can then attempt to spawn kthreads while kthreadd
    is not yet present, which results in an oops.
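    A minimal userspace sketch of one way to close this kind of race,
    assuming a completion-style handshake: init is still created first,
    but it blocks until the spawner has published the kthreadd pointer.
    POSIX threads stand in for kernel threads, and kthreadd_done,
    kthreadd_object, init_thread() and boot() are illustrative names,
    not the kernel's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for kernel state (not kernel code). */
static int kthreadd_object;          /* what kthreadd_task will point at */
static void *kthreadd_task = NULL;   /* published by the spawner */
static bool kthreadd_done = false;   /* completion-style flag */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static void *seen_by_init = NULL;

static void *init_thread(void *arg)
{
    (void)arg;
    /* Like wait_for_completion(): block until kthreadd is published. */
    pthread_mutex_lock(&lock);
    while (!kthreadd_done)
        pthread_cond_wait(&done_cond, &lock);
    /* From here on it would be safe to ask kthreadd to spawn threads. */
    assert(kthreadd_task != NULL);
    seen_by_init = kthreadd_task;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Creates "init" first, then publishes "kthreadd"; returns what init saw. */
void *boot(void)
{
    pthread_t init;

    if (pthread_create(&init, NULL, init_thread, NULL) != 0)
        return NULL;

    /* Even if init is scheduled immediately, it waits for this step. */
    pthread_mutex_lock(&lock);
    kthreadd_task = &kthreadd_object;
    kthreadd_done = true;            /* like complete() */
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&lock);

    pthread_join(init, NULL);
    return seen_by_init;
}
```

    Whatever order the scheduler picks, init can only observe a
    non-NULL kthreadd pointer.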

    Reported-by: Ilya Loginov
    Signed-off-by: Peter Zijlstra
    Acked-by: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Jun, 2010

1 commit

  • The patch is against the latest Linus master branch and is expected
    to be a safe bug fix.

    For each ACPI-defined CPU whose status is active but which exceeds
    the maxcpus= count, you get:
    ACPI: HARDWARE addr space,NOT supported yet

    As these "not booted" CPUs do not run an idle routine, and
    echo X >/proc/acpi/processor/*/throttling did not work, I couldn't
    find a way to actually access CPUs that were never onlined/booted.
    Still, this should be fixed: the /proc/acpi/processor/X directories
    of cores exceeding maxcpus should not show up.

    I wonder whether this could be cleaned up some day by truncating the
    possible CPU mask and nr_cpu_ids to setup_max_cpus early (and then
    no longer exporting setup_max_cpus). But this requires touching a
    lot of other places...
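    A hedged sketch of the check being asked for: skip the per-CPU
    /proc entry when the CPU's logical id is at or beyond the maxcpus=
    limit, since such CPUs are never booted. The helper name
    processor_should_get_proc_dir() and the special case that always
    keeps CPU 0 (the boot CPU, which runs even with maxcpus=0) are
    assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should processor `cpu` get a
 * /proc/acpi/processor/X directory under a maxcpus= limit?
 * CPU 0 is treated as the boot CPU and always kept (assumption).
 */
bool processor_should_get_proc_dir(unsigned int cpu, unsigned int max_cpus)
{
    if (cpu == 0)
        return true;
    return cpu < max_cpus;
}
```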

    Signed-off-by: Thomas Renninger
    CC: travis@sgi.com
    CC: linux-acpi@vger.kernel.org
    CC: lenb@kernel.org
    Signed-off-by: Len Brown

    Thomas Renninger
     

25 May, 2010

1 commit

  • For each newly populated zone of a hot-added node, we need to update
    its pagesets with a dynamically allocated per_cpu_pageset struct for
    all possible CPUs:

    1) Detach zone->pageset from the shared boot_pageset
    at the end of __build_all_zonelists().

    2) Use a mutex to protect zone->pageset while it is still
    shared in onlined_pages().

    Otherwise, multiple zones of different nodes would share the same
    bootstrapping boot_pageset for the same CPU, which eventually causes
    the kernel panic below:

    ------------[ cut here ]------------
    kernel BUG at mm/page_alloc.c:1239!
    invalid opcode: 0000 [#1] SMP
    ...
    Call Trace:
    [] __alloc_pages_nodemask+0x131/0x7b0
    [] alloc_pages_current+0x87/0xd0
    [] __page_cache_alloc+0x67/0x70
    [] __do_page_cache_readahead+0x120/0x260
    [] ra_submit+0x21/0x30
    [] ondemand_readahead+0x166/0x2c0
    [] page_cache_async_readahead+0x80/0xa0
    [] generic_file_aio_read+0x364/0x670
    [] nfs_file_read+0xca/0x130
    [] do_sync_read+0xfa/0x140
    [] vfs_read+0xb5/0x1a0
    [] sys_read+0x51/0x80
    [] system_call_fastpath+0x16/0x1b
    RIP [] get_page_from_freelist+0x883/0x900
    RSP
    ---[ end trace 4bda28328b9990db ]

    [akpm@linux-foundation.org: merge fix]
    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Reviewed-by: Andi Kleen
    Reviewed-by: Christoph Lameter
    Cc: Mel Gorman
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haicheng Li
     

21 May, 2010

2 commits

  • The kernel debugger can operate well before mm_init(), but the x86
    hardware breakpoint code which uses the perf api requires that the
    kernel allocators are initialized.

    This means the kernel debug core needs to provide an optional
    arch-specific callback that allows these initialization functions
    to run after the kernel has been further initialized.

    The kdb shell already had a similar restriction, with an early
    initialization and a late initialization. kdb_init() was moved into
    the debug core's version of the late init, which is called
    dbg_late_init().
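    A minimal sketch of an optional arch hook, assuming it is modelled
    as a nullable function pointer that the late init calls before
    starting kdb. Only the names dbg_late_init() and kdb_init() come
    from the commit; the hook pointer, the stub bodies and the call log
    are hypothetical, and the real dbg_late_init() does more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Records the order in which init steps run (illustration only). */
static char call_log[64];

static void log_call(const char *name)
{
    strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
    strncat(call_log, ";", sizeof(call_log) - strlen(call_log) - 1);
}

/* Hypothetical optional arch callback; NULL if the arch has none. */
static void (*arch_late_init_hook)(void) = NULL;

/* Stand-in for e.g. setting up HW breakpoints via the perf API. */
static void fake_arch_late_init(void) { log_call("arch_late"); }
static void kdb_init_stub(void)       { log_call("kdb_init"); }

/* Runs after the allocators are up, so allocator users are safe. */
void dbg_late_init(void)
{
    if (arch_late_init_hook)
        arch_late_init_hook();
    kdb_init_stub();
}

/* Example registration by an arch that needs the late hook. */
void register_arch_late_init(void)
{
    arch_late_init_hook = fake_arch_late_init;
}
```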

    CC: kgdb-bugreport@lists.sourceforge.net
    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • This patch contains the hooks and instrumentation in the kernel that
    live outside the kernel/debug directory, which the kdb core calls
    to run commands such as lsmod, dmesg, bt, etc.

    CC: linux-arch@vger.kernel.org
    Signed-off-by: Jason Wessel
    Signed-off-by: Martin Hicks

    Jason Wessel
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming they are available. As
    this conversion needs to touch a large number of source files, the
    following script is used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered
    (alphabetical, Christmas tree, rev-Xmas-tree), or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints
    an error message indicating which .h file needs to be added to the
    file.
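    The header choice in the first bullet can be sketched roughly as
    below. This is a C approximation, not the logic of slabh-sweep.py:
    it assumes a few identifiers (kmalloc, kzalloc, kfree, kmem_cache_
    for slab; GFP_, alloc_pages, __get_free_page for gfp) mark usage and
    uses plain substring search, which would also match comments. Since
    slab.h pulls in gfp.h, slab users need only slab.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if any of the NULL-terminated markers occurs in src. */
static int uses_any(const char *src, const char *const markers[])
{
    for (size_t i = 0; markers[i] != NULL; i++)
        if (strstr(src, markers[i]) != NULL)
            return 1;
    return 0;
}

/*
 * Pick the single header a .c file appears to need, or NULL if
 * neither facility is used. Marker lists are assumptions.
 */
const char *needed_header(const char *src)
{
    static const char *const slab_markers[] = {
        "kmalloc", "kzalloc", "kfree", "kmem_cache_", NULL
    };
    static const char *const gfp_markers[] = {
        "GFP_", "alloc_pages", "__get_free_page", NULL
    };

    if (uses_any(src, slab_markers))
        return "linux/slab.h";   /* also covers gfp */
    if (uses_any(src, gfp_markers))
        return "linux/gfp.h";
    return NULL;
}
```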

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some files didn't need the
    inclusion, some needed it added manually, and for others adding it
    to the implementation .h or the embedding .c file was more
    appropriate. This step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from the tests in
    step 7, I'm fairly confident about the coverage of this conversion
    patch. If there is a breakage, it's likely to be something in one of
    the arch headers, which should be easily discoverable on most builds
    of the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

1 commit

  • cpuset_mem_spread_node() could return an offline node, causing an oops.

    This patch fixes it by initializing task->mems_allowed to
    node_states[N_HIGH_MEMORY], and updating task->mems_allowed when doing
    memory hotplug.
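    A hedged sketch of why the initial mask matters: a round-robin
    "spread" rotor can only return nodes that are set in the task's
    allowed mask, so seeding that mask from the nodes that actually
    have memory (node_states[N_HIGH_MEMORY] in the patch) means an
    offline node is never chosen. The bitmask representation, MAX_NODES
    and spread_node() are simplifications, not the cpuset code.

```c
#include <assert.h>

#define MAX_NODES 8   /* illustrative */

/*
 * Return the next node after *rotor that is set in `allowed`,
 * wrapping around, and advance the rotor. *rotor must be in
 * [0, MAX_NODES). Returns -1 (rotor unchanged) if `allowed` is empty.
 */
int spread_node(unsigned int allowed, int *rotor)
{
    for (int i = 1; i <= MAX_NODES; i++) {
        int node = (*rotor + i) % MAX_NODES;

        if (allowed & (1u << node)) {
            *rotor = node;
            return node;
        }
    }
    return -1;
}
```

    With only nodes 0 and 2 online, the rotor alternates between them
    and never lands on an offline node.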

    Signed-off-by: Miao Xie
    Acked-by: David Rientjes
    Reported-by: Nick Piggin
    Tested-by: Nick Piggin
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     

07 Mar, 2010

3 commits

  • The only in-tree external users of the symbol setup_max_cpus are in
    arch/x86/. The files ./kernel/alternative.c, ./kernel/visws_quirks.c,
    and ./mm/kmemcheck/kmemcheck.c are all guarded by CONFIG_SMP being
    defined. In this case the symbol is an unsigned int, declared extern
    in include/linux/smp.h.

    When CONFIG_SMP is not defined the symbol setup_max_cpus is
    a constant value that is only used in init/main.c. Make the symbol
    static for this case.
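    A sketch of the resulting declaration pattern, assuming the UP value
    is NR_CPUS (illustrative here) and that the SMP variable stays
    writable so a maxcpus= boot option can lower it. boot_cpu_limit() is
    a hypothetical reader in the same file.

```c
#include <assert.h>

#ifndef NR_CPUS
#define NR_CPUS 1      /* illustrative value for a UP build */
#endif

#ifdef CONFIG_SMP
/* SMP: arch code reads this through an extern declaration in a
 * shared header, so it must be a global, writable symbol. */
unsigned int setup_max_cpus = NR_CPUS;
#else
/* UP: only this file uses it, so give it internal linkage and make
 * it const; the compiler can then fold it into its users. */
static const unsigned int setup_max_cpus = NR_CPUS;
#endif

/* Hypothetical reader in the same file. */
unsigned int boot_cpu_limit(void)
{
    return setup_max_cpus;
}
```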

    Signed-off-by: H Hartley Sweeten
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • - add a new Documentation/init.txt file describing the various ways
    loading the init binary can fail after kernel bootup

    - extend the init/main.c init failure message to point to
    Documentation/init.txt

    Signed-off-by: Andreas Mohr
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Mohr
     
  • There are quite a few GFP_KERNEL memory allocations made during
    suspend/hibernation and resume that may cause the system to hang, because
    the I/O operations they depend on cannot be completed due to the
    underlying devices being suspended.

    Avoid this problem by clearing the __GFP_IO and __GFP_FS bits in
    gfp_allowed_mask before suspend/hibernation and restoring the original
    values of these bits in gfp_allowed_mask during the subsequent resume.
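    A minimal sketch of the masking idea: save the mask, clear the I/O
    and FS bits before suspend, and restore the saved value on resume,
    with every allocation's flags filtered through the mask. The flag
    values are illustrative (not copied from gfp.h), and
    restrict_gfp_for_suspend(), restore_gfp_after_resume() and
    effective_gfp() are hypothetical names.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative flag values for this sketch, not copied from gfp.h. */
#define SK_GFP_WAIT   0x10u
#define SK_GFP_IO     0x40u
#define SK_GFP_FS     0x80u
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

static gfp_t gfp_allowed_mask = ~0u;   /* everything allowed at runtime */
static gfp_t saved_gfp_mask;
static int gfp_restricted;

/* Before suspend/hibernation: allocations may no longer start I/O. */
void restrict_gfp_for_suspend(void)
{
    saved_gfp_mask = gfp_allowed_mask;
    gfp_allowed_mask &= ~(SK_GFP_IO | SK_GFP_FS);
    gfp_restricted = 1;
}

/* During resume: put the original bits back. */
void restore_gfp_after_resume(void)
{
    assert(gfp_restricted);
    gfp_allowed_mask = saved_gfp_mask;
    gfp_restricted = 0;
}

/* The allocator applies the mask to every request's flags. */
gfp_t effective_gfp(gfp_t requested)
{
    return requested & gfp_allowed_mask;
}
```

    While restricted, a GFP_KERNEL-style request degrades to one that
    may still sleep but will not wait on I/O or filesystem activity.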

    [akpm@linux-foundation.org: fix CONFIG_PM=n linkage]
    Signed-off-by: Rafael J. Wysocki
    Reported-by: Maxim Levitsky
    Cc: Sebastian Ott
    Cc: Benjamin Herrenschmidt
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

05 Mar, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    init: Open /dev/console from rootfs
    mqueue: fix typo "failues" -> "failures"
    mqueue: only set error codes if they are really necessary
    mqueue: simplify do_open() error handling
    mqueue: apply mathematics distributivity on mq_bytes calculation
    mqueue: remove unneeded info->messages initialization
    mqueue: fix mq_open() file descriptor leak on user-space processes
    fix race in d_splice_alias()
    set S_DEAD on unlink() and non-directory rename() victims
    vfs: add NOFOLLOW flag to umount(2)
    get rid of ->mnt_parent in tomoyo/realpath
    hppfs can use existing proc_mnt, no need for do_kern_mount() in there
    Mirror MS_KERNMOUNT in ->mnt_flags
    get rid of useless vfsmount_lock use in put_mnt_ns()
    Take vfsmount_lock to fs/internal.h
    get rid of insanity with namespace roots in tomoyo
    take check for new events in namespace (guts of mounts_poll()) to namespace.c
    Don't mess with generic_permission() under ->d_lock in hpfs
    sanitize const/signedness for udf
    nilfs: sanitize const/signedness in dealing with ->d_name.name
    ...

    Fix up fairly trivial (famous last words...) conflicts in
    drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c

    Linus Torvalds
     

04 Mar, 2010

2 commits

  • To avoid potential problems with an empty /dev, open /dev/console
    from rootfs instead of waiting until our root filesystem is mounted
    and opening it there. This effectively guarantees that there will
    be a device node, and that it won't be on a filesystem that we will
    ever unmount, so there are no issues with leaving /dev/console
    open and pinning the filesystem.

    This is actually more effective than automatically mounting
    devtmpfs on /dev because it removes the occasionally
    problematic assumption that /dev/console exists from the boot
    code.

    With this patch I was able to throw busybox on my /boot partition
    (which has no /dev directory) and boot into userspace without
    problems.

    The only possible negative consequence I can think of is that
    someone out there deliberately did not use a character device
    with major 5, minor 1 for /dev/console. Does anyone know of a
    situation in which that could make sense?

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Al Viro

    Eric W. Biederman
     
  • * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
    x86: Fix out of order of gsi
    x86: apic: Fix mismerge, add arch_probe_nr_irqs() again
    x86, irq: Keep chip_data in create_irq_nr and destroy_irq
    xen: Remove unnecessary arch specific xen irq functions.
    smp: Use nr_cpus= to set nr_cpu_ids early
    x86, irq: Remove arch_probe_nr_irqs
    sparseirq: Use radix_tree instead of ptrs array
    sparseirq: Change irq_desc_ptrs to static
    init: Move radix_tree_init() early
    irq: Remove unnecessary bootmem code
    x86: Add iMac9,1 to pci_reboot_dmi_table
    x86: Convert i8259_lock to raw_spinlock
    x86: Convert nmi_lock to raw_spinlock
    x86: Convert ioapic_lock and vector_lock to raw_spinlock
    x86: Avoid race condition in pci_enable_msix()
    x86: Fix SCI on IOAPIC != 0
    x86, ia32_aout: do not kill argument mapping
    x86, irq: Move __setup_vector_irq() before the first irq enable in cpu online path
    x86, irq: Update the vector domain for legacy irqs handled by io-apic
    x86, irq: Don't block IRQ0_VECTOR..IRQ15_VECTOR's on all cpu's
    ...

    Linus Torvalds
     

25 Feb, 2010

1 commit

  • Update the rcu_dereference() usages to take advantage of the new
    lockdep-based checking.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    [ -v2: fix allmodconfig missing symbol export build failure on x86 ]
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

18 Feb, 2010

2 commits


07 Feb, 2010

1 commit

  • ima wants to create an inode information struct (iint) when inodes are
    allocated. This means that at least the part of ima which does this
    allocation (the allocation is filled with information later) must run
    before any inodes are created. To accomplish this, we split the ima
    initialization routine, placing the kmem cache allocator inside a
    security_initcall() function. Since this makes use of radix trees, we
    also need to make sure that the radix tree code is initialized before
    security_initcall().

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Al Viro

    Eric Paris
     

17 Dec, 2009

1 commit


16 Dec, 2009

1 commit

  • The symbol 'call' is a static symbol used for initcall_debug. The same
    symbol name is used locally by a couple of functions, which produces
    the following sparse warning:

    warning: symbol 'call' shadows an earlier one

    Fix this noise by renaming the local symbols.

    Signed-off-by: H Hartley Sweeten
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     

03 Dec, 2009

1 commit


09 Oct, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futex: fix requeue_pi key imbalance
    futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
    rcu: Place root rcu_node structure in separate lockdep class
    rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
    rcu: Move rcu_barrier() to rcutree
    futex: Move exit_pi_state() call to release_mm()
    futex: Nullify robust lists after cleanup
    futex: Fix locking imbalance
    panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
    rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
    rcu: Clean up code based on review feedback from Josh Triplett, part 4
    rcu: Clean up code based on review feedback from Josh Triplett, part 3
    rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
    rcu: Clean up code to address Ingo's checkpatch feedback
    rcu: Clean up code based on review feedback from Josh Triplett, part 2
    rcu: Clean up code based on review feedback from Josh Triplett

    Linus Torvalds
     

24 Sep, 2009

5 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
    cpumask: Move deprecated functions to end of header.
    cpumask: remove unused deprecated functions, avoid accusations of insanity
    cpumask: use new-style cpumask ops in mm/quicklist.
    cpumask: use mm_cpumask() wrapper: x86
    cpumask: use mm_cpumask() wrapper: um
    cpumask: use mm_cpumask() wrapper: mips
    cpumask: use mm_cpumask() wrapper: mn10300
    cpumask: use mm_cpumask() wrapper: m32r
    cpumask: use mm_cpumask() wrapper: arm
    cpumask: Use accessors for cpu_*_mask: um
    cpumask: Use accessors for cpu_*_mask: powerpc
    cpumask: Use accessors for cpu_*_mask: mips
    cpumask: Use accessors for cpu_*_mask: m32r
    cpumask: remove arch_send_call_function_ipi
    cpumask: arch_send_call_function_ipi_mask: s390
    cpumask: arch_send_call_function_ipi_mask: powerpc
    cpumask: arch_send_call_function_ipi_mask: mips
    cpumask: arch_send_call_function_ipi_mask: m32r
    cpumask: arch_send_call_function_ipi_mask: alpha
    cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
    ...

    Linus Torvalds
     
  • * remove asm/atomic.h inclusion from linux/utsname.h --
    not needed after kref conversion
    * remove linux/utsname.h inclusion from files which do not need it

    NOTE: it looks like fs/binfmt_elf.c does not need utsname.h; however,
    due to some personality stuff it _is_ needed, so cowardly leave the
    ELF-related headers and files alone.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • It's only defined for NR_CPUS > BITS_PER_LONG; cpu_all_mask is always
    defined (and const).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • These issues were identified during an old-fashioned face-to-face
    code review extending over many hours.

    o Add comments for tricky parts of code, and correct comments
    that have passed their sell-by date.

    o Get rid of the vestiges of rcu_init_sched(), which is no
    longer needed now that PREEMPT_RCU is gone.

    o Move the #include of rcutree_plugin.h to the end of
    rcutree.c, which means that, rather than having a random
    collection of forward declarations, the new set of forward
    declarations documents the set of plugins. The new home for
    this #include also allows __rcu_init_preempt() to move into
    rcutree_plugin.h.

    o Fix rcu_preempt_check_callbacks() to be static.

    Suggested-by: Josh Triplett
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Peter Zijlstra

    Paul E. McKenney
     
  • * 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6:
    SFI: remove unneeded includes
    sfi: Remove unused code
    SFI: Hook PCI MMCONFIG
    x86: add arch-specific SFI support
    SFI: add capability to parse ACPI tables
    SFI: add platform-independent core support
    SFI: create linux/sfi.h
    SFI: Simple Firmware Interface - MAINTAINERS, Kconfig

    Linus Torvalds
     

22 Sep, 2009

1 commit

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge
    amount of) non-RAM pages. The amount of memory actually usable as
    storage should be used as the basis instead.

    Some of the calculations (i.e. those not intended to use high memory)
    should probably even use (totalram_pages - totalhigh_pages).
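    A small worked example with made-up numbers: if the physical
    address space spans 4 GiB of 4 KiB page frames (1048576 frames) but
    only 1 GiB is RAM (262144 pages), a table sized from the frame count
    comes out four times too large. The sizing rule below (one bucket
    per 64 pages, rounded down to a power of two) and hash_buckets_for()
    are hypothetical.

```c
#include <assert.h>

/* Round down to a power of two (returns 0 for 0). */
static unsigned long rounddown_pow2(unsigned long x)
{
    unsigned long p = 1;

    if (x == 0)
        return 0;
    while (p <= x / 2)
        p *= 2;
    return p;
}

/*
 * Hypothetical sizing rule: one hash bucket per 64 pages. Callers
 * should pass a totalram_pages-like count, not the number of
 * physical page frames (which may include non-RAM holes).
 */
unsigned long hash_buckets_for(unsigned long pages)
{
    return rounddown_pow2(pages / 64);
}
```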

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

19 Sep, 2009

1 commit


16 Sep, 2009

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
    Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
    debugfs: Modify default debugfs directory for debugging pktcdvd.
    debugfs: Modified default dir of debugfs for debugging UHCI.
    debugfs: Change debugfs directory of IWMC3200
    debugfs: Change debuhgfs directory of trace-events-sample.h
    debugfs: Fix mount directory of debugfs by default in events.txt
    hpilo: add poll f_op
    hpilo: add interrupt handler
    hpilo: staging for interrupt handling
    driver core: platform_device_add_data(): use kmemdup()
    Driver core: Add support for compatibility classes
    uio: add generic driver for PCI 2.3 devices
    driver-core: move dma-coherent.c from kernel to driver/base
    mem_class: fix bug
    mem_class: use minor as index instead of searching the array
    driver model: constify attribute groups
    UIO: remove 'default n' from Kconfig
    Driver core: Add accessor for device platform data
    Driver core: move dev_get/set_drvdata to drivers/base/dd.c
    Driver core: add new device to bus's list before probing

    Linus Torvalds
     
  • Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
    very early at kernel initialization, before any driver-core device
    is registered. Every device with a major/minor will provide a
    device node in devtmpfs.

    Devtmpfs can be changed and altered by userspace at any time,
    and in any way needed - just like today's udev-mounted tmpfs.
    Unmodified udev versions will run just fine on top of it, and will
    recognize an already existing kernel-created device node and use it.
    The default node permissions are root:root 0600. Proper permissions
    and user/group ownership, meaningful symlinks, all other policy still
    needs to be applied by userspace.

    If a node is created by devtmpfs, devtmpfs will remove the device node
    when the device goes away. If the device node was created by
    userspace, or the devtmpfs-created node was replaced by userspace, it
    will no longer be removed by devtmpfs.
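    The ownership rule above can be sketched as a per-node flag:
    devtmpfs deletes only nodes it still owns. struct dev_node and the
    three functions are a hypothetical model that ignores locking and
    paths; they illustrate the rule, not the devtmpfs implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one /dev entry. */
struct dev_node {
    bool exists;
    bool kernel_owned;   /* created by devtmpfs and not replaced since */
};

/* devtmpfs creates a node when a device with a major/minor appears. */
void devtmpfs_create(struct dev_node *n)
{
    if (!n->exists) {
        n->exists = true;
        n->kernel_owned = true;
    }
}

/* Userspace (e.g. udev) creates or replaces the node: ownership moves. */
void userspace_replace(struct dev_node *n)
{
    n->exists = true;
    n->kernel_owned = false;
}

/* On device removal, devtmpfs deletes only nodes it still owns. */
void devtmpfs_device_gone(struct dev_node *n)
{
    if (n->exists && n->kernel_owned) {
        n->exists = false;
        n->kernel_owned = false;
    }
}
```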

    If devtmpfs is configured to be auto-mounted, init=/bin/sh works
    without any further userspace support. /dev will be fully populated
    and dynamic, and will always reflect the current device state of the
    kernel. With the commonly used dynamic device numbers, it solves the
    problem where static device nodes may point to the wrong devices.

    It is intended to make the initial bootup logic simpler and more
    robust, by decoupling the creation of the initial environment needed
    to reliably run userspace processes from the complex userspace
    bootstrap logic that provides a working /dev.

    Signed-off-by: Kay Sievers
    Signed-off-by: Jan Blunck
    Tested-By: Harald Hoyer
    Tested-By: Scott James Remnant
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
    powerpc64: convert to dynamic percpu allocator
    sparc64: use embedding percpu first chunk allocator
    percpu: kill lpage first chunk allocator
    x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
    percpu: update embedding first chunk allocator to handle sparse units
    percpu: use group information to allocate vmap areas sparsely
    vmalloc: implement pcpu_get_vm_areas()
    vmalloc: separate out insert_vmalloc_vm()
    percpu: add chunk->base_addr
    percpu: add pcpu_unit_offsets[]
    percpu: introduce pcpu_alloc_info and pcpu_group_info
    percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
    percpu: add @align to pcpu_fc_alloc_fn_t
    percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
    percpu: drop @static_size from first chunk allocators
    percpu: generalize first chunk allocator selection
    percpu: build first chunk allocators selectively
    percpu: rename 4k first chunk allocator to page
    percpu: improve boot messages
    percpu: fix pcpu_reclaim() locking
    ...

    Fix trivial conflict in kernel/sched.c as suggested by Tejun Heo

    Linus Torvalds
     

12 Sep, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
    sched: Fix sched::sched_stat_wait tracepoint field
    sched: Disable NEW_FAIR_SLEEPERS for now
    sched: Keep kthreads at default priority
    sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
    sched: Turn off child_runs_first
    sched: Ensure that a child can't gain time over it's parent after fork()
    sched: enable SD_WAKE_IDLE
    sched: Deal with low-load in wake_affine()
    sched: Remove short cut from select_task_rq_fair()
    sched: Turn on SD_BALANCE_NEWIDLE
    sched: Clean up topology.h
    sched: Fix dynamic power-balancing crash
    sched: Remove reciprocal for cpu_power
    sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
    sched: Try to deal with low capacity
    sched: Scale down cpu_power due to RT tasks
    sched: Implement dynamic cpu_power
    sched: Add smt_gain
    sched: Update the cpu_power sum during load-balance
    sched: Add SD_PREFER_SIBLING
    ...

    Linus Torvalds
     

04 Sep, 2009

1 commit

  • Ingo was getting warnings from rcu_scheduler_starting()
    indicating that context switches had occurred before RCU ended
    its special early-boot handling of grace periods.

    This is a dangerous condition, as it indicates that RCU might
    have prematurely ended grace periods. This exploratory fix
    moves rcu_scheduler_starting() earlier in boot.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

29 Aug, 2009

1 commit


27 Aug, 2009

1 commit

  • Some architectures initialize clocks and timers in late_time_init, and
    x86 wants to do the same to avoid FIXMAP hackery for calibrating the
    TSC. That would result in an undefined sched_clock readout and wrecked
    printk timestamps again. We probably already have those on archs that
    do all their time/clock setup in late_time_init.

    There is no harm in moving that after late_time_init, except that a
    few more boot timestamps are stale. The scheduler is not active at
    that point, so no real wreckage is expected.

    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Cc: linux-arch@vger.kernel.org

    Thomas Gleixner
     

26 Aug, 2009

1 commit


21 Aug, 2009

1 commit

  • One of my testboxes triggered this nasty stack overflow crash
    during SCSI probing:

    [ 5.874004] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [ 5.875004] device: 'sda': device_add
    [ 5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
    [ 5.878004] IP: [] print_context_stack+0x81/0x110
    [ 5.878004] *pde = 00000000
    [ 5.878004] Thread overran stack, or stack corrupted
    [ 5.878004] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [ 5.878004] last sysfs file:
    [ 5.878004]
    [ 5.878004] Pid: 1, comm: swapper Not tainted (2.6.31-rc6-tip-01272-g9919e28-dirty #5685)
    [ 5.878004] EIP: 0060:[] EFLAGS: 00010083 CPU: 0
    [ 5.878004] EIP is at print_context_stack+0x81/0x110
    [ 5.878004] EAX: cf8a3000 EBX: cf8a3fe4 ECX: 00000049 EDX: 00000000
    [ 5.878004] ESI: b1cfce84 EDI: 00000000 EBP: cf8a3018 ESP: cf8a2ff4
    [ 5.878004] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    [ 5.878004] Process swapper (pid: 1, ti=cf8a2000 task=cf8a8000 task.ti=cf8a3000)
    [ 5.878004] Stack:
    [ 5.878004] b1004867 fffff000 cf8a3ffc
    [ 5.878004] Call Trace:
    [ 5.878004] [] ? kernel_thread_helper+0x7/0x10
    [ 5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
    [ 5.878004] IP: [] print_context_stack+0x81/0x110
    [ 5.878004] *pde = 00000000
    [ 5.878004] Thread overran stack, or stack corrupted
    [ 5.878004] Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC

    The oops did not reveal any more details about the real stack,
    and the system got into an infinite loop of recursive page
    faults.

    So I booted with CONFIG_STACK_TRACER=y and the 'stacktrace' boot
    parameter. The box did not crash (timings/conditions probably
    changed just enough to avoid the catastrophic crash), but the
    /debug/tracing/stack_trace file was rather revealing:

    Depth Size Location (72 entries)
    ----- ---- --------
    0) 3704 52 __change_page_attr+0xb8/0x290
    1) 3652 24 __change_page_attr_set_clr+0x43/0x90
    2) 3628 60 kernel_map_pages+0x108/0x120
    3) 3568 40 prep_new_page+0x7d/0x130
    4) 3528 84 get_page_from_freelist+0x106/0x420
    5) 3444 116 __alloc_pages_nodemask+0xd7/0x550
    6) 3328 36 allocate_slab+0xb1/0x100
    7) 3292 36 new_slab+0x1c/0x160
    8) 3256 36 __slab_alloc+0x133/0x2b0
    9) 3220 4 kmem_cache_alloc+0x1bb/0x1d0
    10) 3216 108 create_object+0x28/0x250
    11) 3108 40 kmemleak_alloc+0x81/0xc0
    12) 3068 24 kmem_cache_alloc+0x162/0x1d0
    13) 3044 52 scsi_pool_alloc_command+0x29/0x70
    14) 2992 20 scsi_host_alloc_command+0x22/0x70
    15) 2972 24 __scsi_get_command+0x1b/0x90
    16) 2948 28 scsi_get_command+0x35/0x90
    17) 2920 24 scsi_setup_blk_pc_cmnd+0xd4/0x100
    18) 2896 128 sd_prep_fn+0x332/0xa70
    19) 2768 36 blk_peek_request+0xe7/0x1d0
    20) 2732 56 scsi_request_fn+0x54/0x520
    21) 2676 12 __generic_unplug_device+0x2b/0x40
    22) 2664 24 blk_execute_rq_nowait+0x59/0x80
    23) 2640 172 blk_execute_rq+0x6b/0xb0
    24) 2468 32 scsi_execute+0xe0/0x140
    25) 2436 64 scsi_execute_req+0x152/0x160
    26) 2372 60 scsi_vpd_inquiry+0x6c/0x90
    27) 2312 44 scsi_get_vpd_page+0x112/0x160
    28) 2268 52 sd_revalidate_disk+0x1df/0x320
    29) 2216 92 rescan_partitions+0x98/0x330
    30) 2124 52 __blkdev_get+0x309/0x350
    31) 2072 8 blkdev_get+0xf/0x20
    32) 2064 44 register_disk+0xff/0x120
    33) 2020 36 add_disk+0x6e/0xb0
    34) 1984 44 sd_probe_async+0xfb/0x1d0
    35) 1940 44 __async_schedule+0xf4/0x1b0
    36) 1896 8 async_schedule+0x12/0x20
    37) 1888 60 sd_probe+0x305/0x360
    38) 1828 44 really_probe+0x63/0x170
    39) 1784 36 driver_probe_device+0x5d/0x60
    40) 1748 16 __device_attach+0x49/0x50
    41) 1732 32 bus_for_each_drv+0x5b/0x80
    42) 1700 24 device_attach+0x6b/0x70
    43) 1676 16 bus_attach_device+0x47/0x60
    44) 1660 76 device_add+0x33d/0x400
    45) 1584 52 scsi_sysfs_add_sdev+0x6a/0x2c0
    46) 1532 108 scsi_add_lun+0x44b/0x460
    47) 1424 116 scsi_probe_and_add_lun+0x182/0x4e0
    48) 1308 36 __scsi_add_device+0xd9/0xe0
    49) 1272 44 ata_scsi_scan_host+0x10b/0x190
    50) 1228 24 async_port_probe+0x96/0xd0
    51) 1204 44 __async_schedule+0xf4/0x1b0
    52) 1160 8 async_schedule+0x12/0x20
    53) 1152 48 ata_host_register+0x171/0x1d0
    54) 1104 60 ata_pci_sff_activate_host+0xf3/0x230
    55) 1044 44 ata_pci_sff_init_one+0xea/0x100
    56) 1000 48 amd_init_one+0xb2/0x190
    57) 952 8 local_pci_probe+0x13/0x20
    58) 944 32 pci_device_probe+0x68/0x90
    59) 912 44 really_probe+0x63/0x170
    60) 868 36 driver_probe_device+0x5d/0x60
    61) 832 20 __driver_attach+0x89/0xa0
    62) 812 32 bus_for_each_dev+0x5b/0x80
    63) 780 12 driver_attach+0x1e/0x20
    64) 768 72 bus_add_driver+0x14b/0x2d0
    65) 696 36 driver_register+0x6e/0x150
    66) 660 20 __pci_register_driver+0x53/0xc0
    67) 640 8 amd_init+0x14/0x16
    68) 632 572 do_one_initcall+0x2b/0x1d0
    69) 60 12 do_basic_setup+0x56/0x6a
    70) 48 20 kernel_init+0x84/0xce
    71) 28 28 kernel_thread_helper+0x7/0x10

    There are a lot of fat functions on that stack trace, but
    the largest of all is do_one_initcall(). This is due to
    the boot trace entry variables being on the stack.

    Fixing this is relatively easy: initcalls are fundamentally
    serialized, so we can move the local variables to file scope.
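    A hedged before/after sketch of that change: because only one
    initcall runs at a time, per-call bookkeeping can live in a
    file-scope static instead of the caller's stack frame. The 512-byte
    buffer only stands in for the boot trace entries; its size,
    do_one_initcall_sketch(), last_initcall_msg() and sample_initcall()
    are illustrative, not the kernel's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*initcall_t)(void);

/*
 * Before (sketch): large locals in every call's stack frame, e.g.
 *     char msgbuf[512];   // lives in do_one_initcall()'s frame
 *
 * After: initcalls are serialized, so one file-scope buffer is never
 * used by two calls at once and costs no stack.
 */
static char msgbuf[512];

int do_one_initcall_sketch(initcall_t fn)
{
    int ret = fn();

    /* Per-call bookkeeping now writes to static storage. */
    snprintf(msgbuf, sizeof(msgbuf), "initcall returned %d", ret);
    return ret;
}

const char *last_initcall_msg(void)
{
    return msgbuf;
}

/* Example initcall used to exercise the sketch. */
int sample_initcall(void)
{
    return 0;
}
```

    The trade-off is that the buffer is no longer reentrant, which is
    acceptable only because initcalls never run concurrently.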

    Note that this large stack footprint was present for a
    couple of months already - what pushed my system over
    the edge was the addition of kmemleak to the call-chain:

    6) 3328 36 allocate_slab+0xb1/0x100
    7) 3292 36 new_slab+0x1c/0x160
    8) 3256 36 __slab_alloc+0x133/0x2b0
    9) 3220 4 kmem_cache_alloc+0x1bb/0x1d0
    10) 3216 108 create_object+0x28/0x250
    11) 3108 40 kmemleak_alloc+0x81/0xc0
    12) 3068 24 kmem_cache_alloc+0x162/0x1d0
    13) 3044 52 scsi_pool_alloc_command+0x29/0x70

    This pushes the total to ~3800 bytes; only a tiny bit
    more was needed to corrupt the on-kernel-stack thread_info.

    The fix reduces the stack footprint from 572 bytes
    to 28 bytes.

    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Catalin Marinas
    Cc: Jens Axboe
    Cc: Linus Torvalds
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

14 Aug, 2009

1 commit

  • Conflicts:
    arch/sparc/kernel/smp_64.c
    arch/x86/kernel/cpu/perf_counter.c
    arch/x86/kernel/setup_percpu.c
    drivers/cpufreq/cpufreq_ondemand.c
    mm/percpu.c

    Conflicts in core and arch percpu code mostly come from commit
    ed78e1e078dd44249f88b1dd8c76dafb39567161, which replaced many uses of
    num_possible_cpus() with nr_cpu_ids. As the for-next branch has moved
    all the first chunk allocators into mm/percpu.c, the changes are moved
    from arch code to mm/percpu.c.

    Signed-off-by: Tejun Heo

    Tejun Heo