26 Aug, 2008

2 commits

  • This fixes a regression that was indirectly caused by commit
    1184dc2ffe2c8fb9afb766d870850f2c3165ef25 ("x86: modify Kconfig to allow
    up to 4096 cpus").

    Allowing 4k CPUs is not practical at this time, because we still have a
    number of places that have several 'cpumask_t's on the stack, and a
    4k-bit cpumask is 512 bytes of stack-space for each such variable. This
    literally caused functions like 'smp_call_function_mask' to have a 2.5kB
    stack frame, and several functions to have 2kB stackframes.

    With an 8kB stack total, smashing the stack was simply much too likely.
    At least bugzilla entry

    http://bugzilla.kernel.org/show_bug.cgi?id=11342

    was due to this.

    The earlier commit to not inline load_module() into sys_init_module()
    fixed the particular symptoms of this that Alan Brunelle saw in that
    bugzilla entry, but the huge stack waste by cpumask_t's was the more
    direct cause.

    Some day we'll have allocation helpers that allocate large CPU masks
    dynamically, but in the meantime we simply cannot allow cpumasks this
    large.

    Cc: Alan D. Brunelle
    Cc: Mike Travis
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Linus Torvalds
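
The stack arithmetic in the message above can be checked with a small userspace sketch. The helper names `cpumask_bytes` and `cpumask_stack_bytes` are hypothetical and are not kernel functions. The count of five masks per frame is an assumption chosen only to reproduce the 2.5kB figure.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for a bitmap holding one bit per CPU,
 * rounded up to a whole number of unsigned longs. */
size_t cpumask_bytes(size_t nr_cpus)
{
    size_t bits_per_long = 8 * sizeof(unsigned long);
    size_t longs = (nr_cpus + bits_per_long - 1) / bits_per_long;

    return longs * sizeof(unsigned long);
}

/* Hypothetical helper: stack space consumed by `count` on-stack masks. */
size_t cpumask_stack_bytes(size_t nr_cpus, size_t count)
{
    return count * cpumask_bytes(nr_cpus);
}
```

With 4096 CPUs each mask takes 512 bytes, so a handful of them in one frame already uses a large share of an 8kB stack.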
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: add X86_FEATURE_XMM4_2 definitions
    x86: fix cpufreq + sched_clock() regression
    x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
    x86: do not enable TSC notifier if we don't need it
    x86 MCE: Fix CPU hotplug problem with multiple multicore AMD CPUs
    x86: fix: make PCI ECS for AMD CPUs hotplug capable
    x86: fix: do not run code in amd_bus.c on non-AMD CPUs

    Linus Torvalds
     

25 Aug, 2008

5 commits

  • The shadow code assigns a pte directly in one place, which is nonatomic on
    i386 and can cause random memory references. Fix by using an atomic setter.

    Signed-off-by: Avi Kivity

    Avi Kivity
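
On a 32-bit CPU a plain 64-bit assignment may be split into two 32-bit stores, so a concurrent reader could observe a half-written entry. The following is a minimal userspace sketch of an atomic setter using C11 atomics. The names `spte_t`, `set_spte_atomic` and `read_spte` are hypothetical; the kernel uses its own primitives rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit page table entry type. */
typedef _Atomic uint64_t spte_t;

/* Publish the new entry in one indivisible store, so a reader
 * never sees the old high half paired with the new low half. */
void set_spte_atomic(spte_t *sptep, uint64_t new_val)
{
    atomic_store(sptep, new_val);
}

/* Matching indivisible load. */
uint64_t read_spte(spte_t *sptep)
{
    return atomic_load(sptep);
}
```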
     
  • I noticed that my sched_clock() was slow on a number of machines, so I
    started looking at cpufreq.

    The below seems to fix the problem for me.

    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Ingo Molnar
     
  • David Witbrodt tracked down (and bisected) a hpet bootup hang on his
    system to the following problem: a BIOS bug made the hpet device
    visible as a generic PCI device. If e820 reserved entries happen to
    be registered first in the resource tree [which v2.6.26 started doing],
    then the PCI code will reallocate that device's BAR to some other
    address - breaking timer IRQs and hanging the system.

    ( Normally hpet devices are hidden by the BIOS from the OS's PCI
    discovery via chipset magic. Sometimes the hpet is not a PCI device
    at all. )

    Solve this fundamental fragility by making non-PCI platform drivers
    insert resources into the resource tree even if it overlaps the e820
    reserved entry, to keep the resource manager from updating the BAR.

    Also do these checks for the ioapic and mmconfig addresses, and emit
    a warning if this happens.

    Bisected-by: David Witbrodt
    Signed-off-by: Yinghai Lu
    Tested-by: David Witbrodt
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • Impact: crash on non-TSC-equipped CPUs

    Don't enable the TSC notifier if we *either*:

    1. don't have a TSC, or
    2. have a CPU with constant TSC.

    In either of those cases, the notifier is either damaging (1) or useless (2).

    From: Linus Torvalds
    Signed-off-by: H. Peter Anvin

    Linus Torvalds
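
The condition in this message (no notifier on CPUs without a TSC, and none on CPUs with a constant TSC) can be written as a small predicate. This is only a sketch: `tsc_notifier_needed` and its two boolean parameters are hypothetical stand-ins for the kernel's CPU feature checks.

```c
#include <stdbool.h>

/* Hypothetical predicate: should the TSC frequency notifier be registered? */
bool tsc_notifier_needed(bool has_tsc, bool constant_tsc)
{
    if (!has_tsc)
        return false;   /* case 1: no TSC, the notifier would be damaging */
    if (constant_tsc)
        return false;   /* case 2: rate never changes, the notifier is useless */
    return true;        /* TSC present and its rate follows the CPU frequency */
}
```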
     

24 Aug, 2008

1 commit


23 Aug, 2008

3 commits

  • During CPU hot-remove the sysfs directory created by
    threshold_create_bank(), defined in
    arch/x86/kernel/cpu/mcheck/mce_amd_64.c, has to be removed before
    its parent directory, created by mce_create_device(), defined in
    arch/x86/kernel/cpu/mcheck/mce_64.c. Moreover, when the CPU in
    question is hotplugged again, obviously the latter has to be created
    before the former. At present, the right ordering is not enforced,
    because all of these operations are carried out by CPU hotplug
    notifiers which are not appropriately ordered with respect to each
    other. This leads to serious problems on systems with two or more
    multicore AMD CPUs, among other things during suspend and hibernation.

    Fix the problem by placing threshold bank CPU hotplug callbacks in
    mce_cpu_callback(), so that they are invoked at the right places,
    if defined. Additionally, use kobject_del() to remove the sysfs
    directory associated with the kobject created by
    kobject_create_and_add() in threshold_create_bank(), to prevent the
    kernel from crashing during CPU hotplug operations on systems with
    two or more multicore AMD CPUs.

    This patch fixes bug #11337.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Andi Kleen
    Tested-by: Mark Langsdorf
    Signed-off-by: Ingo Molnar

    Rafael J. Wysocki
     
  • Until now, PCI ECS setup was performed at boot time only and for cpus
    that are enabled then. This patch fixes this and adds cpu hotplug.

    Test sequence (check if ECS bit is set when bringing cpu online again):

    # ( perl -e 'sysseek(STDIN, 0xC001001F, 0)'; hexdump -n 8 -e '2/4 "%08x " "\n"' ) < /dev/cpu/1/msr
    00000008 00404010
    # ( perl -e 'sysseek(STDOUT, 0xC001001F, 0); print pack "l*", 8, 0x00400010' ) > /dev/cpu/1/msr
    # ( perl -e 'sysseek(STDIN, 0xC001001F, 0)'; hexdump -n 8 -e '2/4 "%08x " "\n"' ) < /dev/cpu/1/msr
    00000008 00400010
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    # echo 1 > /sys/devices/system/cpu/cpu1/online
    # ( perl -e 'sysseek(STDIN, 0xC001001F, 0)'; hexdump -n 8 -e '2/4 "%08x " "\n"' ) < /dev/cpu/1/msr
    00000008 00404010

    Reported-by: Yinghai Lu
    Signed-off-by: Robert Richter
    Signed-off-by: Ingo Molnar

    Robert Richter
     
  • Jan Beulich wrote:

    > Even worse - this would even try to access the MSR on non-AMD CPUs
    > (currently probably prevented just by the fact that only AMD ones use
    > family values of 0x10 or higher).

    This patch adds a cpu vendor check to the postcore_initcalls.

    Reported-by: Jan Beulich
    Signed-off-by: Robert Richter
    Signed-off-by: Ingo Molnar

    Robert Richter
     

22 Aug, 2008

12 commits

  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: work around MTRR mask setting, v2
    x86: fix section mismatch warning - uv_cpu_init
    x86: fix VMI for early params
    x86: fix two modpost warnings in mm/init_64.c
    x86: fix 1:1 mapping init on 64-bit (memory hotplug case)
    x86: work around MTRR mask setting
    x86: PAT Update validate_pat_support for intel CPUs
    devmem, x86: PAT Change /dev/mem mmap with O_SYNC to use UC_MINUS
    x86: PAT proper tracking of set_memory_uc and friends
    x86: fix BUG: unable to handle kernel paging request (numaq_tsc_disable)
    x86: export pv_lock_ops non-GPL
    x86, mmiotrace: silence section mismatch warning - leave_uniprocessor
    x86: use WARN() in arch/x86/kernel
    x86: use WARN() in arch/x86/mm/ioremap.c
    werror: fix pci calgary
    x86: fix oprofile + hibernation badness
    x86, SGI UV: hardcode the TLB flush interrupt system vector
    x86: fix Xorg startup/shutdown slowdown with PAT
    x86: fix "kernel won't boot on a Cyrix MediaGXm (Geode)"
    x86 iommu: remove unneeded parenthesis

    Linus Torvalds
     
  • improve the debug printout:

    - make it actually display something
    - print it only once

    would be nice to have a WARN_ONCE() facility, to feed such things to
    kerneloops.org.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
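
The "print it only once" behaviour can be sketched in userspace with a flag that latches on first use. `report_once` is a hypothetical helper written for illustration; it is not the WARN_ONCE() facility the message wishes for.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: print `msg` the first time it is called with a
 * given flag, and stay silent afterwards. Returns true if it printed. */
bool report_once(bool *already_reported, const char *msg)
{
    if (*already_reported)
        return false;

    *already_reported = true;
    fprintf(stderr, "warning: %s\n", msg);
    return true;
}
```

A caller would keep one static flag per call site, so each distinct message is reported at most once.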
     
  • WARNING: vmlinux.o(.cpuinit.text+0x3cc4): Section mismatch in reference from the function uv_cpu_init() to the function .init.text:uv_system_init()
    The function __cpuinit uv_cpu_init() references
    a function __init uv_system_init().
    If uv_system_init is only used by uv_cpu_init then
    annotate uv_system_init with a matching annotation.

    uv_system_init was meant to be called only once, so do it from a codepath
    (native_smp_prepare_cpus) which is called once, right before activation
    of other cpus (smp_init).

    Note: old code relied on uv_node_to_blade being initialized to 0,
    but it's not initialized from anywhere.

    Signed-off-by: Marcin Slusarz
    Acked-by: Jack Steiner
    Signed-off-by: Ingo Molnar

    Marcin Slusarz
     
  • While fixing a different bug I moved the call to vmi_init before
    early params could be parsed.

    This broke the vmi specific commandline parameters.
    Fix that by moving vmi initialization after the kernel has had a chance to
    parse early parameters.

    Signed-off-by: Alok N Kataria
    Signed-off-by: Ingo Molnar

    Alok Kataria
     
  • early_io{re,un}map() are __init and hence can't be called from __meminit
    functions.

    Signed-off-by: Jan Beulich
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • While I don't have a hotplug capable system at hand, I think two issues need
    fixing:

    - pud_phys (in kernel_physical_mapping_init()) would remain uninitialized in
    the after_bootmem case

    - the locking done just around phys_pmd_{init,update}() would leave out pgd
    updates, and it was needlessly covering code portions that do allocations
    (perhaps using a more friendly gfp value in alloc_low_page() would then be
    possible)

    Signed-off-by: Jan Beulich
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • Joshua Hoblitt reported that only 3 GB of his 16 GB of RAM is
    usable. Booting with mtrr_show showed us the BIOS-initialized
    MTRR settings - which are all wrong.

    So the root cause is that the BIOS has not set the mask correctly:

    > [ 0.429971] MSR00000200: 00000000d0000000
    > [ 0.433305] MSR00000201: 0000000ff0000800
    > should be ==> [ 0.433305] MSR00000201: 0000003ff0000800
    >
    > [ 0.436638] MSR00000202: 00000000e0000000
    > [ 0.439971] MSR00000203: 0000000fe0000800
    > should be ==> [ 0.439971] MSR00000203: 0000003fe0000800
    >
    > [ 0.443304] MSR00000204: 0000000000000006
    > [ 0.446637] MSR00000205: 0000000c00000800
    > should be ==> [ 0.446637] MSR00000205: 0000003c00000800
    >
    > [ 0.449970] MSR00000206: 0000000400000006
    > [ 0.453303] MSR00000207: 0000000fe0000800
    > should be ==> [ 0.453303] MSR00000207: 0000003fe0000800
    >
    > [ 0.456636] MSR00000208: 0000000420000006
    > [ 0.459970] MSR00000209: 0000000ff0000800
    > should be ==> [ 0.459970] MSR00000209: 0000003ff0000800

    So detect this borkage and add the prefix 111.

    Signed-off-by: Yinghai Lu
    Cc:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
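
The correction shown in the quoted MSR dump amounts to setting every address-mask bit above the highest bit the BIOS did set. The following is a sketch of that idea, not the kernel's code. `fix_mtrr_mask` is a hypothetical name, and the 38-bit physical address width used in the checks is an assumption inferred from the "should be" values in the dump.

```c
#include <stdint.h>

/* Hypothetical helper: extend an MTRR mask so that all address bits above
 * its highest set bit are ones. Assumes 12 < phys_bits < 64. */
uint64_t fix_mtrr_mask(uint64_t mask, unsigned int phys_bits)
{
    /* The address-mask field occupies bits 12 .. phys_bits-1. */
    uint64_t field = ((UINT64_C(1) << phys_bits) - 1) & ~UINT64_C(0xfff);
    uint64_t m = mask & field;
    unsigned int hi = 0;

    if (m == 0)
        return mask;    /* no address bits set, nothing to extend */

    /* Find the index of the highest set bit inside the field. */
    while (m >> (hi + 1))
        hi++;

    /* Turn on every field bit above that index; low bits stay untouched. */
    return mask | (field & ~((UINT64_C(1) << (hi + 1)) - 1));
}
```

A mask that is already correct is returned unchanged, so the helper can be applied unconditionally.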
     
  • * master.kernel.org:/home/rmk/linux-2.6-arm:
    [ARM] 5212/1: pxa: fix build error when CPU_PXA310 is not defined
    [ARM] 5208/1: fsg-setup.c fixes
    [ARM] fix impd1.c build warning
    [ARM] e400 config use MFP
    [ARM] e740 config use MFP
    [ARM] Fix eseries IRQ limit
    [ARM] clocklib: Update users of aliases to new API
    [ARM] clocklib: Allow dynamic alias creation
    [ARM] eseries: whitespace fixes and cleanup

    Linus Torvalds
     
  • Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • ext4 does not work on s390 because ext2_find_next_bit is broken. Fortunately
    this function is only used by ext4. The function uses ffs, which does not work
    analogously to ffz. The result of ffs has an offset of 1 which is not taken into
    account. To fix this, use the low level __ffs_word function directly instead
    of the ill-defined ffs.

    In addition the patch improves find_next_zero_bit and ext2_find_next_zero_bit
    by passing the bit offset into __ffz_word instead of adding it after the
    function call returned.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Martin Schwidefsky

    Eric Sandeen
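
The off-by-one described above comes from ffs() numbering bits from 1 while bit-search helpers usually want a 0-based index. A userspace illustration using the POSIX `ffs` from `<strings.h>` follows. `first_set_bit_zero_based` is a hypothetical name and does not reproduce the s390 `__ffs_word` code.

```c
#include <strings.h>

/* ffs() returns a 1-based position: ffs(1) == 1, and ffs(0) == 0.
 * Hypothetical helper: convert that to a 0-based bit index,
 * returning -1 when no bit is set. */
int first_set_bit_zero_based(unsigned int word)
{
    return ffs((int)word) - 1;
}
```

Using the raw ffs() result as a bit index would point one position past the bit that is actually set.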
     
  • Remove the now unneeded s390_idle.lock spinlock initialization after
    Josef Sipek did it the right way in arch/s390/kernel/process.c.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Ever since commit 43ca5c3a1cefdaa09231d64485b8f676118bf1e0 ([S390] Convert
    monitor calls to function calls.), the kernel refused to IPL with spinlock
    debugging enabled.

    BUG: spinlock bad magic on CPU#0, swapper/0
    lock: 00000000003a4668, .magic: 00000000, .owner: /-1, .owner_cpu: 0
    CPU: 0 Not tainted 2.6.25 #1
    Process swapper (pid: 0, task: 000000000034f958, ksp: 0000000000377d60)
    0000000000377ab8 0000000000352628 0000000000377d60 0000000000377d60
    0000000000016af4 00000000fffff7b5 0000000000377d60 0000000000000000
    0000000000000000 0000000000377a18 0000000000000009 0000000000377a18
    0000000000377a78 000000000023c920 0000000000016af4 0000000000377a18
    0000000000000005 0000000000000000 0000000000377b58 0000000000377ab8
    Call Trace:
    ([] show_trace+0xdc/0x108)
    [] show_stack+0xc2/0xfc
    [] dump_stack+0xb2/0xc0
    []

    Signed-off-by: Josef 'Jeff' Sipek
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Josef 'Jeff' Sipek
     

21 Aug, 2008

10 commits

  • Pentium III and Core Solo/Duo CPUs have an erratum
    " Page with PAT set to WC while associated MTRR is UC may consolidate to UC "
    which can result in WC setting in PAT to be ineffective. We will disable
    PAT on such CPUs, so that we can continue to use MTRR WC setting.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Ingo Molnar

    venkatesh.pallipadi@intel.com
     
  • All kernel mappings like ioremap(), etc. use UC_MINUS as the type. /dev/mem
    mappings with /dev/mem opened with O_SYNC, however, were using UC,
    resulting in a conflict and the /dev/mem mmap failing. This seems to
    affect some apps (one being flashrom) which use O_SYNC and which were
    working before.

    Switch /dev/mem with O_SYNC also to UC_MINUS.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Ingo Molnar

    venkatesh.pallipadi@intel.com
     
  • Big thinko in pat memtype tracking code. reserve_memtype should be called
    with physical address and not virtual address.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Suresh Siddha
    Signed-off-by: Ingo Molnar

    venkatesh.pallipadi@intel.com
     
  • This section mismatch:

    >> Seems to be a section mismatch; init_intel() is __cpuinit while
    >> numaq_tsc_disable() is __init. Seems to be introduced in:
    >>
    >> commit 64898a8bad8c94ad7a4bd5cc86b66edfbb081f4a
    >> Author: Yinghai Lu
    >> Date: Sat Jul 19 18:01:16 2008 -0700
    >>
    >> x86: extend and use x86_quirks to clean up NUMAQ code
    >
    > Oops, I am wrong about numaq_tsc_disable() being __init. Still, I
    > believe that Yinghai might be able to say what's really wrong :-)

    Would lead to this crash:

    BUG: unable to handle kernel paging request at c08a45f0
    IP: [] numaq_tsc_disable+0x0/0x40

    Fixed by the patch below.

    Signed-off-by: Vegard Nossum
    Signed-off-by: Ingo Molnar

    Vegard Nossum
     
  • None of the spinlock API is exported GPL, so there's no reason for
    pv_lock_ops to be.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Ingo Molnar
    Cc: drago01

    Jeremy Fitzhardinge
     
  • WARNING: vmlinux.o(.text+0x180af): Section mismatch in reference from the function leave_uniprocessor() to the function .cpuinit.text:cpu_up()
    The function leave_uniprocessor() references
    the function __cpuinit cpu_up().
    This is often because leave_uniprocessor lacks a __cpuinit
    annotation or the annotation of cpu_up is wrong.

    leave_uniprocessor calls cpu_up only when CONFIG_HOTPLUG_CPU is set,
    so it can be safely annotated as __ref

    Signed-off-by: Marcin Slusarz
    Cc: Pekka Paalanen
    Signed-off-by: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Pekka Paalanen

    Marcin Slusarz
     
  • Use WARN() instead of a printk+WARN_ON() pair; this way the message
    becomes part of the warning section for better reporting/collection.
    This also allowed folding some if()s into the WARN().

    Signed-off-by: Arjan van de Ven
    Cc: akpm@linux-foundation.org
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
  • Use WARN() instead of a printk+WARN_ON() pair; this way the message
    becomes part of the warning section for better reporting/collection.

    Signed-off-by: Arjan van de Ven
    Cc: akpm@linux-foundation.org
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
  • Fix an integer comparison always false warning in the PCI Calgary 64 driver.

    A u8 is being compared to something that's 512 by default, resulting in the
    following warning:

    arch/x86/kernel/pci-calgary_64.c:1285: warning: comparison is always false due to limited range of data type

    This was introduced by patch b34e90b8f0f30151349134f87b5dc6ef75a5218c.

    Signed-off-by: David Howells
    Signed-off-by: Ingo Molnar

    David Howells
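
The warning arises because a u8 can only hold 0..255, so it can never reach a limit of 512. A small sketch of the effect: `BUS_LIMIT`, `truncate_to_u8` and `bus_in_range` are hypothetical names, and the value 512 is taken from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUS_LIMIT 512u  /* assumption: the "512 by default" value from the message */

/* Storing a wide value in a u8 keeps only the low 8 bits. */
uint8_t truncate_to_u8(unsigned int value)
{
    return (uint8_t)value;
}

/* With a wide enough type the range check is meaningful again. */
bool bus_in_range(unsigned int bus)
{
    return bus < BUS_LIMIT;
}
```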
     
  • Fix
    arch/arm/mach-pxa/pxa300.c:94: error: 'CKEN_MMC3' undeclared here (not in a function)
    when building for PXA300.

    Signed-off-by: Mike Rapoport
    Acked-by: Eric Miao
    Signed-off-by: Russell King

    Mike Rapoport
     

20 Aug, 2008

7 commits

  • * 'sh/for-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Provide a FLAT_PLAT_INIT() definition.
    binfmt_flat: Stub in a FLAT_PLAT_INIT().
    video: export sh_mobile_lcdc panel size
    sh: select memchunk size using kernel cmdline
    sh: export sh7723 VEU as VEU2H
    input: migor_ts compile and detection fix
    sh: remove MSTPCR defines from Migo-R header file
    sh: Update sh7763rdp defconfig
    sh: Add support sh7760fb to sh7763rdp board
    sh: Add support sh_eth to sh7763rdp board
    sh: Disable 64kB hugetlbpage size when using 64kB PAGE_SIZE.
    sh: Don't export __{s,u}divsi3_i4i from SH-2 libgcc.
    fix SH7705_CACHE_32KB compilation
    sh: mach-x3proto: Fix up smc91x platform data.

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
    powerpc: Fix vio_bus_probe oops on probe error
    powerpc/ibmebus: Restore "name" sysfs attribute on ibmebus devices
    powerpc: Fix /dev/oldmem interface for kdump
    powerpc/spufs: Remove invalid semicolon after if statement
    powerpc/spufs: reference context while dropping state mutex in scheduler
    powerpc/spufs: fix npc setting for NOSCHED contexts

    Linus Torvalds
     
  • Vegard Nossum reported oprofile + hibernation problems:

    > Now some warnings:
    >
    > ------------[ cut here ]------------
    > WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/smp.c:328 smp_call_function_mask+0x194/0x1a0()

    The usual problem: the suspend function, when interrupts are
    already disabled, calls smp_call_function, which is not allowed with
    interrupts off. But at this point all the other CPUs should already be
    down anyway, so it should be enough to just drop that.

    This patch should fix that problem at least by fixing cpu hotplug &
    suspend support.

    [ mingo@elte.hu: fixed 5 coding style errors. ]

    Signed-off-by: Andi Kleen
    Tested-by: Vegard Nossum
    Signed-off-by: Ingo Molnar

    Andi Kleen
     
  • The UV TLB shootdown mechanism needs a system interrupt vector.

    Its vector had been hardcoded as 200, but needs to be moved to the reserved
    system vector range so that it does not collide with some device vector.

    This is still temporary until dynamic system IRQ allocation is provided.
    But it will be needed when real UV hardware becomes available and runs 2.6.27.

    Signed-off-by: Cliff Wickman
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     
  • Rene Herman reported significant Xorg startup/shutdown slowdown due
    to PAT. It turns out that the memtype list has thousands of entries.

    Add cached_entry to list add routine, in order to speed up the
    lookup for sequential reserve_memtype calls.

    Reported-by: Rene Herman
    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Ingo Molnar

    Venki Pallipadi
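
The speed-up described above relies on sequential reservations landing next to the previous one in a sorted list. Below is a minimal userspace sketch of a sorted singly linked list with a cached insertion point. All names (`struct range`, `struct range_list`, `range_list_add`) are hypothetical, and the `steps` counter exists only to make the saving visible. A real implementation would also need to invalidate the cache when an entry is removed.

```c
#include <stddef.h>

struct range {
    unsigned long start, end;
    struct range *next;
};

struct range_list {
    struct range *head;     /* list kept sorted by start address */
    struct range *cached;   /* most recently inserted entry, or NULL */
    unsigned long steps;    /* entries walked during inserts (illustration) */
};

/* Insert `n` in sorted position, starting the search at the cached
 * entry when the new range sorts at or after it. */
void range_list_add(struct range_list *l, struct range *n)
{
    struct range **link = &l->head;

    if (l->cached && l->cached->start <= n->start)
        link = &l->cached->next;

    while (*link && (*link)->start <= n->start) {
        link = &(*link)->next;
        l->steps++;
    }

    n->next = *link;
    *link = n;
    l->cached = n;
}
```

For strictly ascending inserts the loop body never runs, so each insert costs constant time instead of a walk over the whole list.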
     
  • Cyrix MediaGXm/Cx5530 Unicorn Revision 1.19.3B has stopped
    booting starting at v2.6.22.

    The reason is this commit:

    > commit f25f64ed5bd3c2932493681bdfdb483ea707da0a
    > Author: Juergen Beisert
    > Date: Sun Jul 22 11:12:38 2007 +0200
    >
    > x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.

    This commit activated a macro which was dormant before due to (buggy)
    macro side-effects.

    I've looked through various datasheets and found that the GXm and GXLV
    Geode processors don't have an incrementor.

    Remove the incrementor setup entirely. As the incrementor value
    differs according to clock speed and we would hope that the BIOS
    configures it correctly, it is probably the right solution.

    Cc:
    Signed-off-by: Ingo Molnar

    Samuel Sieb
     
  • When CMO is enabled and the kernel is booted on a non-CMO system and the VIO
    device's probe function fails, an oops can result since
    vio_cmo_bus_remove is called when it should not be. This fixes it by
    avoiding the vio_cmo_bus_remove call on platforms that don't implement
    CMO.

    cpu 0x0: Vector: 300 (Data Access) at [c00000000e13b3d0]
    pc: c000000000020d34: .vio_cmo_bus_remove+0xc0/0x1f4
    lr: c000000000020ca4: .vio_cmo_bus_remove+0x30/0x1f4
    sp: c00000000e13b650
    msr: 8000000000009032
    dar: 0
    dsisr: 40000000
    current = 0xc00000000e0566c0
    paca = 0xc0000000006f9b80
    pid = 2428, comm = modprobe
    enter ? for help
    [c00000000e13b6e0] c000000000021d94 .vio_bus_probe+0x2f8/0x33c
    [c00000000e13b7a0] c00000000029fc88 .driver_probe_device+0x13c/0x200
    [c00000000e13b830] c00000000029fdac .__driver_attach+0x60/0xa4
    [c00000000e13b8c0] c00000000029f050 .bus_for_each_dev+0x80/0xd8
    [c00000000e13b980] c00000000029f9ec .driver_attach+0x28/0x40
    [c00000000e13ba00] c00000000029f630 .bus_add_driver+0xd4/0x284
    [c00000000e13baa0] c0000000002a01bc .driver_register+0xc4/0x198
    [c00000000e13bb50] c00000000002168c .vio_register_driver+0x40/0x5c
    [c00000000e13bbe0] d0000000003b3f1c .ibmvfc_module_init+0x70/0x109c [ibmvfc]
    [c00000000e13bc70] c0000000000acf08 .sys_init_module+0x184c/0x1a10
    [c00000000e13be30] c000000000008748 syscall_exit+0x0/0x40

    Signed-off-by: Brian King
    Signed-off-by: Paul Mackerras

    Brian King