29 May, 2009

2 commits

  • Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302

    On x86 and x86-64, it is possible that page tables are shared between
    shared mappings backed by hugetlbfs. As part of this,
    page_table_shareable() checks a pair of vma->vm_flags and they must match
    if they are to be shared. All VMA flags are taken into account, including
    VM_LOCKED.

    The problem is that VM_LOCKED is cleared on fork(). When a process with a
    shared memory segment forks() to exec() a helper, there will be shared
    VMAs with different flags. The impact is that the shared segment is
    sometimes considered shareable and other times not, depending on what
    process is checking.

    What happens is that the segment page tables are being shared but the
    count is inaccurate depending on the ordering of events. As the page
    tables are freed with put_page(), bad pmd's are found when some of the
    children exit. The hugepage counters also get corrupted and the Total and
    Free count will no longer match even when all the hugepage-backed regions
    are freed. This requires a reboot of the machine to "fix".

    This patch addresses the problem by comparing all flags except VM_LOCKED
    when deciding if pagetables should be shared or not for hugetlbfs-backed
    mappings.

    Signed-off-by: Mel Gorman
    Acked-by: Hugh Dickins
    Cc: Ingo Molnar
    Cc:
    Cc: Lee Schermerhorn
    Cc: KOSAKI Motohiro
    Cc:
    Cc: Eric B Munson
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
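
    The comparison described in this commit can be sketched in user-space C as
    follows. This is only an illustration, not the kernel code: the helper name
    flags_match_for_sharing() is hypothetical, and the VM_* values are copied
    here as assumptions so the example compiles on its own.

```c
#include <stdbool.h>

/* Illustrative flag bits; the kernel headers are the authority on real values. */
#define VM_READ    0x00000001UL
#define VM_WRITE   0x00000002UL
#define VM_SHARED  0x00000008UL
#define VM_LOCKED  0x00002000UL

/* Hypothetical helper: two vm_flags words match for page table sharing
 * if they are equal once VM_LOCKED is masked out of both. */
static bool flags_match_for_sharing(unsigned long a, unsigned long b)
{
    return (a & ~VM_LOCKED) == (b & ~VM_LOCKED);
}
```

    With this rule, a parent holding VM_LOCKED and a forked child that lost it
    reach the same answer, while any other flag difference still prevents
    sharing.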
     
  • The flat loader uses an architecture's flat_stack_align() to align the
    stack but assumes word-alignment is enough for the data sections.

    However, on the Xtensa S6000 we have registers up to 128bit width
    which can be used from userspace and therefore need userspace stack and
    data-section alignment of at least this size.

    This patch drops flat_stack_align() and uses the same alignment that
    is required for slab caches, ARCH_SLAB_MINALIGN, or wordsize if it's
    not defined by the architecture.

    It also fixes m32r which was obviously kaput, aligning an
    uninitialized stack entry instead of the stack pointer.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Oskar Schirmer
    Cc: David Howells
    Cc: Russell King
    Cc: Bryan Wu
    Cc: Geert Uytterhoeven
    Acked-by: Paul Mundt
    Cc: Greg Ungerer
    Signed-off-by: Johannes Weiner
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oskar Schirmer
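
    A rough sketch of the alignment rule this commit describes, assuming
    power-of-two alignments. SKETCH_MINALIGN, align_stack_down() and align_up()
    are hypothetical names chosen for illustration; they are not the
    identifiers used by the flat loader.

```c
#include <stdint.h>

/* Use the architecture's slab alignment if it defines one, else the word size. */
#ifdef ARCH_SLAB_MINALIGN
#define SKETCH_MINALIGN ARCH_SLAB_MINALIGN
#else
#define SKETCH_MINALIGN sizeof(unsigned long)
#endif

/* Round a downward-growing stack pointer down to a power-of-two alignment. */
static uintptr_t align_stack_down(uintptr_t sp, uintptr_t align)
{
    return sp & ~(align - 1);
}

/* Round a data-section start address up to a power-of-two alignment. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

    Note that the value being aligned is the stack pointer itself, which is the
    distinction the m32r fix mentioned above turns on.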
     

28 May, 2009

1 commit


27 May, 2009

16 commits

  • The implementation we just revived has issues, such as using a
    Kconfig-defined virtual address area in kernel space that nothing
    actually carves out (and thus will overlap whatever is there),
    or having some dependencies on being self contained in a single
    PTE page which adds unnecessary constraints on the kernel virtual
    address space.

    This fixes it by using more classic PTE accessors and automatically
    locating the area for consistent memory, carving an appropriate hole
    in the kernel virtual address space, leaving only the size of that
    area as a Kconfig option. It also brings some dma-mask related fixes
    from the ARM implementation which was almost identical initially but
    grew its own fixes.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • Make FIXADDR_TOP a compile time constant and clean up a
    couple of definitions relative to the layout of the kernel
    address space on ppc32. We also print out that layout at
    boot time for debugging purposes.

    This is a pre-requisite for properly fixing non-coherent
    DMA allocations.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • (pre-requisite to make the next patches more palatable)

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • Fix some more fallout of the string changes:

    CC arch/blackfin/lib/strncmp.o
    In file included from include/linux/bitmap.h:9,
    from include/linux/nodemask.h:90,
    from include/linux/mmzone.h:17,
    from include/linux/gfp.h:5,
    from include/linux/kmod.h:23,
    from include/linux/module.h:14,
    from arch/blackfin/lib/strncmp.c:14:
    include/linux/string.h: In function ‘strstarts’:
    include/linux/string.h:132: error: implicit declaration of function ‘strncmp’
    make[1]: *** [arch/blackfin/lib/strncmp.o] Error 1

    Signed-off-by: Mike Frysinger
    CC: Rusty Russell

    Mike Frysinger
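
    The error in the log above comes from strstarts() calling strncmp() at a
    point where no declaration of strncmp() is visible. A user-space sketch of
    such a helper, with the declaration supplied by <string.h>, might look like
    this; it illustrates the dependency and is not the kernel header or the
    actual fix.

```c
#include <stdbool.h>
#include <string.h>   /* declares strncmp() and strlen() before they are used */

/* Return true if str begins with prefix. */
static bool strstarts(const char *str, const char *prefix)
{
    return strncmp(str, prefix, strlen(prefix)) == 0;
}
```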
     
  • We don't create a include/asm/mach/ symlink anymore, so we don't need the
    .gitignore for it.

    Signed-off-by: Mike Frysinger

    Mike Frysinger
     
  • Signed-off-by: Mike Frysinger

    Mike Frysinger
     
  • Signed-off-by: Mike Frysinger

    Mike Frysinger
     
  • This reverts commit 33f00dcedb0e22cdb156a23632814fc580fcfcf8.

    While it was a good idea to try to use the mm/vmalloc.c allocator instead
    of our own (in fact, ours is itself a dup on an old variant of the vmalloc
    one), unfortunately, the approach is terminally busted since
    dma_alloc_coherent() can be called at interrupt time or in atomic contexts
    and there's little chance we'll make the code in mm/vmalloc.c cope with that :-(

    Until we can get the generic code to forbid that idiocy and fix all
    drivers abusing it, we pretty much have no choice but revert to
    our custom virtual space allocator.

    There's also a problem with SMP safety since freeing such mapping
    would require an IPI which cannot be done at interrupt time.

    However, right now, I don't think we support any platform that is
    both SMP and has non-coherent DMA (don't laugh, I know such things
    do exist !) so we can sort that out later.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: avoid back to back on_each_cpu in cpa_flush_array
    x86, relocs: ignore R_386_NONE in kernel relocation entries

    Linus Torvalds
     
  • Cleanup cpa_flush_array() to avoid back to back on_each_cpu() calls.

    [ Impact: optimizes fix 0af48f42df15b97080b450d24219dd95db7b929a ]

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: H. Peter Anvin

    Pallipadi, Venkatesh
     
  • * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
    [CPUFREQ] powernow-k8: determine exact CPU frequency for HW Pstates
    [CPUFREQ] powernow-k8 cleanup msg if BIOS does not export ACPI _PSS cpufreq data
    [CPUFREQ] fix timer teardown in ondemand governor
    [CPUFREQ] fix timer teardown in conservative governor
    [CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call
    [CPUFREQ] powernow-k7 build fix when ACPI=n
    [CPUFREQ] add atom family to p4-clockmod

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc/mm: Fix broken MMU PID stealing on !SMP

    Linus Torvalds
     
  • Slightly modified by trenn@suse.de -> only do this on fam 10h and fam 11h.

    Currently powernow-k8 determines CPU frequency from ACPI PSS objects, but
    according to AMD family 11h BKDG this frequency is just a rounded value:

    "CoreFreq (MHz) = The CPU COF specified by MSRC001_00[6B:64][CpuFid]
    rounded to the nearest 100 Mhz."

    As a consequence, powernow-k8 reports the wrong CPU frequency on some systems,
    e.g. on Turion X2 Ultra:

    powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82
    processors (2 cpu cores) (version 2.20.00)
    powernow-k8: 0 : pstate 0 (2200 MHz)
    powernow-k8: 1 : pstate 1 (1100 MHz)
    powernow-k8: 2 : pstate 2 (600 MHz)

    But this is wrong as frequency for Pstate2 is 550 MHz. x86info reports it
    correctly:

    #x86info -a |grep Pstate
    ...
    Pstate-0: fid=e, did=0, vid=24 (2200MHz)
    Pstate-1: fid=e, did=1, vid=30 (1100MHz)
    Pstate-2: fid=e, did=2, vid=3c (550MHz) (current)

    Solution is to determine the frequency directly from Pstate MSRs instead
    of using rounded values from ACPI table.

    Signed-off-by: Andreas Herrmann
    Signed-off-by: Thomas Renninger
    Signed-off-by: Dave Jones

    Andreas Herrmann
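
    As a hedged illustration of computing the frequency from the Pstate fields
    rather than from the rounded ACPI value: the sketch below assumes the core
    frequency is 100 MHz * (CpuFid + offset) / 2^CpuDid, with an offset of 0x08
    on family 11h and 0x10 on family 10h. These offsets are recalled from the
    AMD BKDGs and should be checked there; only the family 11h case can be
    cross-checked against the x86info output quoted above. The function name
    core_cof_mhz() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: core frequency in MHz from the Pstate FID/DID fields.
 * Returns 0 for families this sketch does not cover. */
static uint32_t core_cof_mhz(unsigned int family, uint32_t fid, uint32_t did)
{
    uint32_t offset;

    if (family == 0x10)
        offset = 0x10;   /* assumption, per family 10h BKDG */
    else if (family == 0x11)
        offset = 0x08;   /* assumption, per family 11h BKDG */
    else
        return 0;

    return (100 * (fid + offset)) >> did;
}
```

    For the Turion example, fid=0xe with did=0, 1 and 2 gives 2200, 1100 and
    550 MHz, matching x86info rather than the 600 MHz derived from ACPI.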
     
  • - Make the message shorter and easier to grep for
    - Use printk_once instead of WARN_ONCE (functionality of these was mixed)

    Signed-off-by: Thomas Renninger
    Cc: Langsdorf, Mark
    Signed-off-by: Dave Jones

    Thomas Renninger
     
  • arch/x86/kernel/cpu/cpufreq/powernow-k7.c:172: warning: 'invalidate_entry' defined but not used

    Reported-by: Toralf Förster
    Signed-off-by: Dave Jones

    Dave Jones
     
  • Some atom procs don't do freq scaling (such as the atom 330 on my own
    littlefalls2 board). By adding the atom family here, we at least get
    the benefit of passive cooling in a thermal emergency. Not sure how
    to see that it's actually helping any, but the driver does bind and
    claim it's functioning on my atom 330.

    Signed-off-by: Jarod Wilson
    Signed-off-by: Dave Jones

    Jarod Wilson
     

26 May, 2009

6 commits

  • For relocatable 32bit kernels, boot/compressed/relocs.c processes
    relocation entries in the kernel image and appends it to the kernel
    image such that boot/compressed/head_32.S can relocate the kernel.
    The kernel image is one statically linked object and only uses two
    relocation types - R_386_PC32 and R_386_32. Of the two, only the latter
    needs massaging during kernel relocation and is thus handled by relocs.
    R_386_PC32 is ignored and all other relocation types are considered
    errors.

    When the target of a relocation resides in a discarded section,
    binutils doesn't throw away the relocation record but nullifies it by
    changing it to R_386_NONE, which unfortunately makes relocs fail.

    The problem was triggered by yet out-of-tree x86 stack unwind patches
    but given the binutils behavior, ignoring R_386_NONE is the right
    thing to do.

    The problem has been tracked down to binutils behavior by Jan Beulich.

    [ Impact: fix build with certain binutils by ignoring R_386_NONE ]

    Signed-off-by: Tejun Heo
    Cc: Jan Beulich
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Tejun Heo
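
    The policy described in this commit can be sketched as a small classifier.
    The relocation type numbers are the standard i386 ELF values; the enum and
    the function name classify_reloc() are hypothetical and do not appear in
    relocs.c.

```c
/* i386 ELF relocation type numbers. */
#define R_386_NONE 0
#define R_386_32   1
#define R_386_PC32 2

/* Hypothetical outcome of inspecting one relocation entry. */
enum reloc_action { RELOC_SKIP, RELOC_RECORD, RELOC_ERROR };

static enum reloc_action classify_reloc(unsigned int type)
{
    switch (type) {
    case R_386_NONE:   /* nullified by binutils for discarded sections */
    case R_386_PC32:   /* PC-relative, unaffected by the load address */
        return RELOC_SKIP;
    case R_386_32:     /* absolute address, must be adjusted at boot */
        return RELOC_RECORD;
    default:           /* anything else is unexpected in the kernel image */
        return RELOC_ERROR;
    }
}
```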
     
  • The recent rework of the MMU PID handling for non-hash CPUs has a
    subtle bug in the !SMP "optimized" variant of the PID stealing
    function. It clears the PID in the mm context before it calls
    local_flush_tlb_mm(). However, the latter will not flush anything
    if the PID in the context is clear...

    Signed-off-by: Hideo Saito
    Signed-off-by: Benjamin Herrenschmidt

    Hideo Saito
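
    The ordering problem can be shown with a toy model. Everything below is a
    stand-in written for illustration: struct sketch_mm, SKETCH_NO_PID and both
    functions are hypothetical and only mimic the behaviour the commit message
    describes, namely that the flush is skipped when the context has no PID.

```c
#define SKETCH_NO_PID 0u   /* hypothetical "no PID assigned" marker */

struct sketch_mm {
    unsigned int pid;      /* hardware PID currently held by this context */
};

/* Stand-in for local_flush_tlb_mm(): does nothing when the context has no
 * PID. Returns 1 if a flush was issued, 0 if it was skipped. */
static int sketch_flush_tlb_mm(const struct sketch_mm *mm)
{
    return mm->pid != SKETCH_NO_PID;
}

/* Steal the victim's PID: flush while the PID is still set, then clear it. */
static unsigned int sketch_steal_pid(struct sketch_mm *victim, int *flushed)
{
    unsigned int pid = victim->pid;

    *flushed = sketch_flush_tlb_mm(victim);  /* must precede the clear */
    victim->pid = SKETCH_NO_PID;
    return pid;
}
```

    Swapping the two statements in sketch_steal_pid() reproduces the bug: the
    flush would see SKETCH_NO_PID and be skipped.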
     
  • * 'kvm-updates/2.6.30' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: Fix PDPTR reloading on CR4 writes
    KVM: Make paravirt tlb flush also reload the PAE PDPTRs

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Remove remap percpu allocator for the time being
    x86: cpa_flush_array wbinvd should be done on all CPUs
    x86: bugfix wbinvd() model check instead of family check
    x86: introduce noxsave boot parameter
    x86, setup: revert ACPI 3 E820 extended attributes support
    x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot

    Linus Torvalds
     
  • The processor is documented to reload the PDPTRs while in PAE mode if any
    of the CR4 bits PSE, PGE, or PAE change. Linux relies on this
    behaviour when zapping the low mappings of PAE kernels during boot.

    The code already handled changes to CR4.PAE; augment it to also notice changes
    to PSE and PGE.

    This triggered while booting an F11 PAE kernel; the futex initialization code
    runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem
    ended up uninitialized, killing PI futexes and pulseaudio which uses them.

    Cc: stable@kernel.org
    Signed-off-by: Avi Kivity

    Avi Kivity
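
    A minimal sketch of the check described above, assuming the standard CR4
    bit positions (PSE is bit 4, PAE is bit 5, PGE is bit 7). The function name
    and the paging_enabled parameter are hypothetical, and long mode is
    deliberately left out of the picture.

```c
#include <stdbool.h>

#define X86_CR4_PSE 0x00000010UL
#define X86_CR4_PAE 0x00000020UL
#define X86_CR4_PGE 0x00000080UL

/* Hypothetical helper: does this CR4 write require reloading the PDPTRs?
 * True when paging is on, the new value has PAE set, and any of the
 * PSE, PGE or PAE bits differ between the old and new values. */
static bool cr4_write_reloads_pdptrs(unsigned long old_cr4,
                                     unsigned long new_cr4,
                                     bool paging_enabled)
{
    const unsigned long pdptr_bits = X86_CR4_PSE | X86_CR4_PGE | X86_CR4_PAE;

    return paging_enabled &&
           (new_cr4 & X86_CR4_PAE) != 0 &&
           ((old_cr4 ^ new_cr4) & pdptr_bits) != 0;
}
```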
     
  • The paravirt tlb flush may be used not only to flush TLBs, but also
    to reload the four page-directory-pointer-table entries, as it is used
    as a replacement for reloading CR3. Change the code to do the entire
    CR3 reloading dance instead of simply flushing the TLB.

    Cc: stable@kernel.org
    Signed-off-by: Avi Kivity

    Avi Kivity
     

25 May, 2009

1 commit

  • The remap percpu allocator has a subtle bug when combined with page
    attribute changing. The remap percpu allocator aliases PMD pages for the
    first chunk, and as pageattr doesn't know about the alias it ends up
    updating page attributes of the original mapping, thus leaving the
    aliases in an inconsistent state which might lead to subtle data
    corruption. Please read the following thread for more information:

    http://thread.gmane.org/gmane.linux.kernel/835783

    The following is the proposed fix which teaches pageattr about percpu
    aliases.

    http://thread.gmane.org/gmane.linux.kernel/837157

    However, the above changes are deemed too pervasive for upstream
    inclusion for 2.6.30 release, so this patch essentially disables
    the remap allocator for the time being.

    Signed-off-by: Tejun Heo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

23 May, 2009

4 commits

  • cpa_flush_array seems to prefer wbinvd() over clflush at 4M threshold.
    clflush needs to be done on only one CPU as per instruction definition.
    wbinvd() however, should be done on all CPUs.

    [ Impact: fix missing flush which could cause data corruption ]

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Suresh Siddha
    Signed-off-by: H. Peter Anvin

    venkatesh.pallipadi@intel.com
     
  • wbinvd is supported on all CPUs 486 or later. But,
    pageattr.c is checking x86_model >= 4 before wbinvd(), which looks like
    an oversight bug. It was first introduced at one place by changeset
    d7c8f21a8cad0228c7c5ce2bb6dbd95d1ee49d13 and got copied over to second
    place in the same file later.

    [ Impact: fix missing cache flush on early-model CPUs, potential data corruption ]

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: H. Peter Anvin

    venkatesh.pallipadi@intel.com
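
    The difference between the two checks can be seen in a small sketch. The
    struct and function names are hypothetical; only the field names x86
    (family) and x86_model follow the commit text.

```c
#include <stdbool.h>

/* Hypothetical, cut-down CPU description for illustration. */
struct sketch_cpuinfo {
    unsigned char x86;        /* CPU family */
    unsigned char x86_model;  /* model within the family */
};

/* Intended check: wbinvd is available on family 4 (486) and later. */
static bool wbinvd_ok_by_family(const struct sketch_cpuinfo *c)
{
    return c->x86 >= 4;
}

/* The mistaken check: tests the model number instead of the family. */
static bool wbinvd_ok_by_model(const struct sketch_cpuinfo *c)
{
    return c->x86_model >= 4;
}
```

    A family 6 CPU with a model number below 4 passes the first check but
    fails the second, which is the case where the cache flush was skipped.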
     
  • Introduce "noxsave" boot parameter which will disable the cpu's xsave/xrstor
    capabilities. Useful for debugging and working around xsave related issues.

    [ Impact: make it possible to debug problems in the field ]

    Signed-off-by: Suresh Siddha
    Signed-off-by: H. Peter Anvin

    Suresh Siddha
     
  • Remove ACPI 3 E820 extended memory attributes support. At least one
    vendor actively set all the flags to zero, but left ECX on return at
    24. This bug may be present in other BIOSes.

    The breakage functionally means the ACPI 3 flags are probably
    completely useless, and that no OS any time soon is going to rely on
    their existence. Therefore, drop support completely. We may want to
    revisit this question in the future, if we find ourselves actually
    needing the flags.

    This reverts all or part of the following checkins:

    cd670599b7b00d9263f6f11a05c0edeb9cbedaf3
    c549e71d073a6e9a4847497344db28a784061455

    However, retain the part from the latter commit that copies e820 into
    a temporary buffer; that is an unrelated BIOS workaround. Put in a
    comment to explain that part.

    See https://bugzilla.redhat.com/show_bug.cgi?id=499396 for some
    additional information.

    [ Impact: detect all memory on affected machines ]

    Reported-by: Thomas J. Baker
    Signed-off-by: H. Peter Anvin
    Acked-by: Len Brown
    Cc: Chuck Ebbert
    Cc: Kyle McMartin
    Cc: Matt Domsch

    H. Peter Anvin
     

22 May, 2009

8 commits


21 May, 2009

2 commits

  • * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
    MIPS: 64-bit: Fix system lockup.
    MIPS: IP28: Change to build with -mr10k-cache-barrier=store
    MIPS: IP22: Fix hang in power button interrupt handler
    MIPS: IP32: Fix hang on shutdown in power button interrupt handler.

    Linus Torvalds
     
  • * master.kernel.org:/home/rmk/linux-2.6-arm: (25 commits)
    [ARM] 5519/1: amba probe: pass "struct amba_id *" instead of void *
    [ARM] 5517/1: integrator: don't put clock lookups in __initdata
    [ARM] 5518/1: versatile: don't put clock lookups in __initdata
    [ARM] mach-l7200: fix spelling of SYS_CLOCK_OFF
    [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2
    [ARM] realview: fix broadcast tick support
    [ARM] realview: remove useless smp_cross_call_done()
    [ARM] smp: fix cpumask usage in ARM SMP code
    [ARM] 5513/1: Eurotech VIPER SBC: fix compilation error
    [ARM] 5509/1: ep93xx: clkdev enable UARTS
    ARM: OMAP2/3: Change omapfb to use clkdev for dispc and rfbi, v2
    ARM: OMAP3: Fix HW SAVEANDRESTORE shift define
    ARM: OMAP3: Fix number of GPIO lines for 34xx
    [ARM] S3C: Do not set clk->owner field if unset
    [ARM] S3C2410: mach-bast.c registering i2c data too early
    [ARM] S3C24XX: Fix unused code warning in arch/arm/plat-s3c24xx/dma.c
    [ARM] S3C64XX: fix GPIO debug
    [ARM] S3C64XX: GPIO include cleanup
    [ARM] nwfpe: fix 'floatx80_is_nan' sparse warning
    [ARM] nwfpe: Add decleration for ExtendedCPDO
    ...

    Linus Torvalds