05 Dec, 2011

1 commit

  • To make this work, we teach the page fault handler how to send
    signals on failed uaccess. This only works for user addresses
    (kernel addresses will never hit the page fault handler in the
    first place), so we need to generate signals for those
    separately.

    This gets the tricky case right: if the user buffer spans
    multiple pages and only the second page is invalid, we set
    cr2 and si_addr correctly. UML relies on this behavior to
    "fault in" pages as needed.

    We steal a bit from thread_info.uaccess_err to enable this.
    Before this change, uaccess_err was a 32-bit boolean value.

    This fixes issues with UML when vsyscall=emulate.
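
    The bit-stealing can be sketched in plain C. This is an illustrative
    layout only (hypothetical struct and field names, not the actual
    x86 thread_info definition): the old 32-bit boolean becomes a
    one-bit field, freeing a second bit for the new behavior.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch only: pack the old 32-bit boolean uaccess_err
 * together with the new "send a signal on failed uaccess" bit as
 * one-bit fields, instead of spending a whole word on a boolean.
 * Names follow the commit's idea, not the exact kernel layout.
 */
struct thread_info_sketch {
        unsigned int uaccess_err        : 1;    /* was a 32-bit boolean */
        unsigned int sig_on_uaccess_err : 1;    /* the stolen bit */
};

/* Fixup path: raise a signal only when the task opted in. */
static bool should_signal(const struct thread_info_sketch *ti)
{
        return ti->uaccess_err && ti->sig_on_uaccess_err;
}
```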

    Reported-by: Adrian Bunk
    Signed-off-by: Andy Lutomirski
    Cc: richard -rw- weinberger
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

28 Oct, 2011

1 commit


29 Sep, 2011

1 commit

  • Erratum 93 applies to AMD K8 CPUs only, and its workaround
    (forcing the upper 32 bits of %rip to all be set under certain
    conditions) is actually getting in the way of analyzing page
    faults occurring during EFI physical mode runtime calls (in
    particular the page table walk shown is completely unrelated to
    the actual fault). This is because typically EFI runtime code
    lives in the space between 2G and 4G, which - modulo the above
    manipulation - is likely to overlap with the kernel or modules
    area.

    While the other errata workarounds could likewise be limited to
    just the affected CPUs, none of them appears to be destructive,
    and they are generally called only outside of performance-critical
    paths, so they are left untouched.

    Signed-off-by: Jan Beulich
    Link: http://lkml.kernel.org/r/4E835FE30200007800058464@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

16 Aug, 2011

2 commits

  • arch/x86/mm/fault.c now depends on having the symbol VSYSCALL_START
    defined, which is best handled by including <asm/fixmap.h> (it
    isn't unreasonable that we may want other fixed addresses in this
    file in the future, and it is cleaner than including
    <asm/vsyscall.h> directly).

    This addresses an x86-64 allnoconfig build failure. On other
    configurations it was masked by an indirect path:

    -> -> ->

    ... however, the first such include is conditional on CONFIG_X86_LOCAL_APIC.

    Originally-by: Randy Dunlap
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/CA%2B55aFxsOMc9=p02r8-QhJ=h=Mqwckk4_Pnx9LQt5%2BfqMp_exQ@mail.gmail.com
    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • arch/x86/mm/fault.c needs to include asm/vsyscall.h to fix a
    build error:

    arch/x86/mm/fault.c: In function '__bad_area_nosemaphore':
    arch/x86/mm/fault.c:728: error: 'VSYSCALL_START' undeclared (first use in this function)

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

13 Aug, 2011

1 commit

  • * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip:
    x86-64: Rework vsyscall emulation and add vsyscall= parameter
    x86-64: Wire up getcpu syscall
    x86: Remove unnecessary compile flag tweaks for vsyscall code
    x86-64: Add vsyscall:emulate_vsyscall trace event
    x86-64: Add user_64bit_mode paravirt op
    x86-64, xen: Enable the vvar mapping
    x86-64: Work around gold bug 13023
    x86-64: Move the "user" vsyscall segment out of the data segment.
    x86-64: Pad vDSO to a page boundary

    Linus Torvalds
     

11 Aug, 2011

1 commit

  • There are three choices:

    vsyscall=native: Vsyscalls are native code that issues the
    corresponding syscalls.

    vsyscall=emulate (default): Vsyscalls are emulated by instruction
    fault traps, tested in the bad_area path. The actual contents of
    the vsyscall page are the same as in the vsyscall=native case,
    except that the page is marked NX. This way programs that make
    assumptions about what the code in the page does will not be
    confused when they read that code.

    vsyscall=none: Trying to execute a vsyscall will segfault.
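
    The three-way switch can be sketched as a small parser. The enum
    and function here are illustrative stand-ins, not the kernel's
    actual code:

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of the three-way vsyscall= mode switch described above.
 * The enum and the parser are illustrative, not the kernel's code.
 */
enum vsyscall_mode_sketch { VSYSCALL_NATIVE, VSYSCALL_EMULATE, VSYSCALL_NONE };

static enum vsyscall_mode_sketch parse_vsyscall_sketch(const char *arg)
{
        if (strcmp(arg, "native") == 0)
                return VSYSCALL_NATIVE;
        if (strcmp(arg, "none") == 0)
                return VSYSCALL_NONE;
        return VSYSCALL_EMULATE;        /* the default */
}
```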

    Signed-off-by: Andy Lutomirski
    Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu
    Signed-off-by: H. Peter Anvin

    Andy Lutomirski
     

05 Aug, 2011

1 commit

  • Three places in the kernel assume that the only long mode CPL 3
    selector is __USER_CS. This is not true on Xen -- Xen's sysretq
    changes cs to the magic value 0xe033.

    Two of the places are corner cases, but as of "x86-64: Improve
    vsyscall emulation CS and RIP handling"
    (c9712944b2a12373cb6ff8059afcfb7e826a6c54), vsyscalls will segfault
    if called with Xen's extra CS selector. This causes a panic when
    older init builds die.

    It seems impossible to make Xen use __USER_CS reliably without
    taking a performance hit on every system call, so this fixes the
    tests instead with a new paravirt op. It's a little ugly because
    ptrace.h can't include paravirt.h.
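
    The idea behind the new paravirt op can be sketched as follows.
    The selector values are illustrative (0x33 is the usual __USER_CS
    on x86-64, 0xe033 the extra selector Xen's sysretq installs), and
    the pv_info struct here is a stand-in, not the kernel's definition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of a user_64bit_mode()-style test with a paravirt op. */
#define USER_CS_SKETCH 0x33u

struct pv_info_sketch {
        uint16_t extra_user_64bit_cs;   /* == __USER_CS on native */
};

/*
 * Native kernels leave the extra selector equal to __USER_CS, so the
 * second comparison is a no-op; Xen sets it to its magic 0xe033.
 */
static bool user_64bit_mode_sketch(uint16_t cs, const struct pv_info_sketch *pv)
{
        return cs == USER_CS_SKETCH || cs == pv->extra_user_64bit_cs;
}
```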

    Signed-off-by: Andy Lutomirski
    Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu
    Reported-by: Konrad Rzeszutek Wilk
    Signed-off-by: H. Peter Anvin

    Andy Lutomirski
     

01 Jul, 2011

1 commit

  • The nmi parameter indicated if we could do wakeups from the current
    context, if not, we would set some state and self-IPI and let the
    resulting interrupt do the wakeup.

    For the various event classes:

    - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
    - tracepoint: nmi=0; since tracepoint could be from NMI context.
    - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

    As one can see, there is very little nmi=1 usage, and the down-side of
    not using it is that on some platforms some software events can have a
    jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

    The up-side however is that we can remove the nmi parameter and save a
    bunch of conditionals in fast paths.
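
    The resulting always-deferred scheme can be sketched like this.
    The pending flag and run function stand in for the kernel's
    irq_work machinery; names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch: instead of a per-call nmi flag choosing between a direct
 * wakeup and a deferred (self-IPI/irq_work) one, always defer. The
 * wakeup is then safe from any context, including NMI.
 */
static bool wakeup_pending;
static int  wakeups_done;

static void perf_event_wakeup_sketch(void)
{
        wakeup_pending = true;          /* always safe, even from NMI */
}

static void irq_work_run_sketch(void)
{
        if (wakeup_pending) {
                wakeup_pending = false; /* interrupt tail does the wakeup */
                wakeups_done++;
        }
}
```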

    Signed-off-by: Peter Zijlstra
    Cc: Michael Cree
    Cc: Will Deacon
    Cc: Deng-Cheng Zhu
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Frederic Weisbecker
    Cc: Jason Wessel
    Cc: Don Zickus
    Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

26 May, 2011

1 commit

  • Ingo suggested that the SIGKILL check should be moved into the
    slowpath function. This reduces the page fault fastpath impact
    of this recent commit:

    37b23e0525d3: x86,mm: make pagefault killable

    Suggested-by: Ingo Molnar
    Signed-off-by: KOSAKI Motohiro
    Cc: kamezawa.hiroyu@jp.fujitsu.com
    Cc: minchan.kim@gmail.com
    Cc: willy@linux.intel.com
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/4DDE0B5C.9050907@jp.fujitsu.com
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     

25 May, 2011

1 commit

  • When an oom killing occurs, almost all processes get stuck at one
    of the following two points:

    1) __alloc_pages_nodemask
    2) __lock_page_or_retry

    1) is not very problematic, because TIF_MEMDIE leads to an
    allocation failure and the task gets out of the page allocator.

    2) is more problematic. In an OOM situation, zones typically have
    no page cache at all, and memory starvation might lead to greatly
    reduced IO performance. When a fork bomb occurs, TIF_MEMDIE tasks
    don't die quickly, meaning that the fork bomb may create new
    processes faster than the oom-killer can kill them. The system may
    then become livelocked.

    This patch makes the pagefault interruptible by SIGKILL.
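
    The change can be sketched as follows: when the wait for a page is
    interrupted and a fatal signal is pending, the fault path bails out
    instead of looping. The names are illustrative stand-ins for
    fatal_signal_pending() and the page-wait path:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of making the page-wait in the fault path killable. */
enum fault_sketch { FAULT_DONE, FAULT_RETRY, FAULT_KILLED };

static enum fault_sketch wait_on_page_sketch(bool page_ready,
                                             bool sigkill_pending)
{
        if (page_ready)
                return FAULT_DONE;
        if (sigkill_pending)            /* new: bail out, task is dying */
                return FAULT_KILLED;
        return FAULT_RETRY;             /* old behaviour: keep waiting */
}
```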

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Matthew Wilcox
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

21 May, 2011

1 commit

  • Commit e66eed651fd1 ("list: remove prefetching from regular list
    iterators") removed the include of prefetch.h from list.h, which
    uncovered several cases that had apparently relied on that rather
    obscure header file dependency.

    So this fixes things up a bit, using

    grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
    grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

    to guide us in finding files that either need linux/prefetch.h
    inclusion, or have it despite not needing it.

    There are more of them around (mostly network drivers), but this gets
    many core ones.

    Reported-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Mar, 2011

2 commits

  • It's forbidden to take the page_table_lock with irqs disabled:
    if there's contention, the IPIs (for tlb flushes) sent with the
    page_table_lock held will never run, leading to a deadlock.

    Nobody takes the pgd_lock from irq context so the _irqsave can be
    removed.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Tested-by: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Andrea Arcangeli
     
  • mm_fault_error() should not invoke the oom-killer if the page
    fault occurred in kernel space, e.g. in
    copy_from_user()/copy_to_user().

    This would happen if we find ourselves in OOM on a
    copy_to_user(), or a copy_from_user() which faults.

    Without this patch, the kernel hangs in copy_from_user(), because
    the OOM killer sends SIGKILL to the current process, but the
    process can't handle a signal while in a syscall; the kernel then
    returns to copy_from_user(), re-executes the faulting instruction,
    and provokes the page fault again.

    With this patch the kernel returns -EFAULT from copy_from_user().

    The code which checks that the page fault occurred in kernel space
    has been copied from do_sigbus().

    This situation is handled the same way on powerpc, xtensa,
    tile, ...
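
    The decision can be sketched like this. PF_USER_SKETCH mirrors the
    hardware's "fault came from user mode" error-code bit; the names
    are illustrative, not the kernel's:

```c
#include <assert.h>
#include <errno.h>

/*
 * Sketch of the fix: a fault raised from kernel mode (e.g. inside
 * copy_from_user()) must take the exception-fixup path and end up as
 * -EFAULT, rather than invoking the OOM killer or raising a signal.
 */
#define PF_USER_SKETCH 0x4

static int mm_fault_error_sketch(unsigned long error_code)
{
        if (!(error_code & PF_USER_SKETCH))
                return -EFAULT;         /* kernel fault: fixup path */
        return 0;                       /* user fault: signal / OOM */
}
```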

    Signed-off-by: Andrey Vagin
    Signed-off-by: Andrew Morton
    Cc: "H. Peter Anvin"
    Cc: Linus Torvalds
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Andrey Vagin
     

27 Oct, 2010

2 commits

  • access_error() already takes error_code as an argument, so there is
    no need for an additional write flag.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Nick Piggin
    Acked-by: Wu Fengguang
    Cc: Ying Han
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change reduces mmap_sem hold times that are caused by waiting for
    disk transfers when accessing file mapped VMAs.

    It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call
    site wants mmap_sem to be released if blocking on a pending disk transfer.
    In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and
    do_page_fault() will then re-acquire mmap_sem and retry the page fault.

    It is expected that the retry will hit the same page which will now be
    cached, and thus it will complete with a low mmap_sem hold time.

    Tests:

    - microbenchmark: thread A mmaps a large file and does random read accesses
    to the mmaped area - achieves about 55 iterations/s. Thread B does
    mmap/munmap in a loop at a separate location - achieves 55 iterations/s
    before, 15000 iterations/s after.

    - We are seeing related effects in some applications in house, which show
    significant performance regressions when running without this change.

    [akpm@linux-foundation.org: fix warning & crash]
    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Acked-by: Linus Torvalds
    Cc: Nick Piggin
    Reviewed-by: Wu Fengguang
    Cc: Ying Han
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: "H. Peter Anvin"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

22 Oct, 2010

2 commits

  • Conflicts:
    mm/memory-failure.c

    Andi Kleen
     
  • * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86-32, percpu: Correct the ordering of the percpu readmostly section
    x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G
    x86: Spread tlb flush vector between nodes
    percpu: Introduce a read-mostly percpu API
    x86, mm: Fix incorrect data type in vmalloc_sync_all()
    x86, mm: Hold mm->page_table_lock while doing vmalloc_sync
    x86, mm: Fix bogus whitespace in sync_global_pgds()
    x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation
    x86, mm: Add RESERVE_BRK_ARRAY() helper
    mm, x86: Saving vmcore with non-lazy freeing of vmas
    x86, kdump: Change copy_oldmem_page() to use cached addressing
    x86, mm: fix uninitialized addr in kernel_physical_mapping_init()
    x86, kmemcheck: Remove double test
    x86, mm: Make spurious_fault check explicitly check the PRESENT bit
    x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes
    x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions
    x86, mm: Avoid unnecessary TLB flush

    Linus Torvalds
     

21 Oct, 2010

1 commit


20 Oct, 2010

1 commit

  • Take mm->page_table_lock while syncing the vmalloc region. This prevents
    a race with the Xen pagetable pin/unpin code, which expects that the
    page_table_lock is already held. If this race occurs, then Xen can see
    an inconsistent page type (a page can either be read/write or a pagetable
    page, and pin/unpin converts it between them), which will cause either
    the pin or the set_p[gm]d to fail; either will crash the kernel.

    vmalloc_sync_all() should be called rarely, so this extra use of
    page_table_lock should not interfere with its normal users.

    The mm pointer is stashed in the pgd page's index field, as that won't
    be otherwise used for pgds.

    Reported-by: Ian Campbell
    Originally-by: Jan Beulich
    LKML-Reference:
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: H. Peter Anvin

    Jeremy Fitzhardinge
     

15 Oct, 2010

1 commit

  • On x86, faults exit by executing the iret instruction, which
    re-enables NMIs if we faulted in NMI context. So if a fault
    happens in an NMI, another NMI can nest after the fault exits.

    But we don't yet support nested NMIs because we have only one NMI
    stack. To protect against that, check that vmalloc and kmemcheck
    faults don't happen in this context. Most of the other kernel
    faults in NMIs can be more easily spotted by finding explicit
    copy_{from,to}_user() calls on review.
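
    The guard can be sketched like this; the flag and counter are
    stand-ins for the kernel's in_nmi() test and WARN_ON_ONCE():

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch: the vmalloc/kmemcheck fault paths must refuse to run in NMI
 * context, since the fault's iret would re-enable NMIs while only one
 * NMI stack exists.
 */
static int nmi_fault_warnings;

static bool vmalloc_fault_allowed_sketch(bool in_nmi)
{
        if (in_nmi) {           /* would be WARN_ON_ONCE(in_nmi()) */
                nmi_fault_warnings++;
                return false;
        }
        return true;
}
```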

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra

    Frederic Weisbecker
     

08 Oct, 2010

1 commit

  • An earlier patch fixed the hwpoison fault handling to encode the
    huge page size in the fault code of the page fault handler.

    This is needed to report this information in SIGBUS to user space.

    This is a straightforward patch to pass this information through
    to the signal handling in the x86-specific fault.c.
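
    The encoding can be sketched as packing a page-size shift into the
    upper bits of the fault-code word and unpacking it when filling in
    the SIGBUS info. The bit position and names are illustrative, not
    the kernel's actual VM_FAULT_* layout:

```c
#include <assert.h>

/* Sketch of carrying a huge page size shift inside the fault code. */
#define HINDEX_SHIFT 16

static unsigned int fault_set_shift(unsigned int fault,
                                    unsigned int page_shift)
{
        return fault | (page_shift << HINDEX_SHIFT);
}

/* Unpack at signal-delivery time, e.g. for si_addr_lsb. */
static unsigned int fault_get_shift(unsigned int fault)
{
        return fault >> HINDEX_SHIFT;
}
```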

    Cc: x86@kernel.org
    Cc: Naoya Horiguchi
    Cc: fengguang.wu@intel.com
    Signed-off-by: Andi Kleen

    Andi Kleen
     

27 Aug, 2010

2 commits


14 Aug, 2010

1 commit

  • It's wrong for several reasons, but the most direct one is that the
    fault may be for the stack accesses to set up a previous SIGBUS. When
    we have a kernel exception, the kernel exception handler does all the
    fixups, not some user-level signal handler.

    Even apart from the nested SIGBUS issue, it's also wrong to give out
    kernel fault addresses in the signal handler info block, or to send a
    SIGBUS when a system call already returns EFAULT.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 Dec, 2009

1 commit

  • * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Limit number of per cpu TSC sync messages
    x86: dumpstack, 64-bit: Disable preemption when walking the IRQ/exception stacks
    x86: dumpstack: Clean up the x86_stack_ids[][] initalization and other details
    x86, cpu: mv display_cacheinfo -> cpu_detect_cache_sizes
    x86: Suppress stack overrun message for init_task
    x86: Fix cpu_devs[] initialization in early_cpu_init()
    x86: Remove CPU cache size output for non-Intel too
    x86: Minimise printk spew from per-vendor init code
    x86: Remove the CPU cache size printk's
    cpumask: Avoid cpumask_t in arch/x86/kernel/apic/nmi.c
    x86: Make sure we also print a Code: line for show_regs()

    Linus Torvalds
     

23 Nov, 2009

1 commit


17 Oct, 2009

1 commit


24 Sep, 2009

2 commits

  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
    HWPOISON: Enable error_remove_page on btrfs
    HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
    HWPOISON: Add madvise() based injector for hardware poisoned pages v4
    HWPOISON: Enable error_remove_page for NFS
    HWPOISON: Enable .remove_error_page for migration aware file systems
    HWPOISON: The high level memory error handler in the VM v7
    HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
    HWPOISON: shmem: call set_page_dirty() with locked page
    HWPOISON: Define a new error_remove_page address space op for async truncation
    HWPOISON: Add invalidate_inode_page
    HWPOISON: Refactor truncate to allow direct truncating of page v2
    HWPOISON: check and isolate corrupted free pages v2
    HWPOISON: Handle hardware poisoned pages in try_to_unmap
    HWPOISON: Use bitmask/action code for try_to_unmap behaviour
    HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
    HWPOISON: Add poison check to page fault handling
    HWPOISON: Add basic support for poisoned pages in fault handler v3
    HWPOISON: Add new SIGBUS error codes for hardware poison signals
    HWPOISON: Add support for poison swap entries v2
    HWPOISON: Export some rmap vma locking to outside world
    ...

    Linus Torvalds
     
  • Conflicts:
    kernel/trace/Makefile
    kernel/trace/trace.h
    kernel/trace/trace_event_types.h
    kernel/trace/trace_export.c

    Merge reason: sync with latest significant tracing core changes.

    Frederic Weisbecker
     

21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in one, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks goes to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility is not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

16 Sep, 2009

1 commit


14 Sep, 2009

1 commit


30 Aug, 2009

1 commit

  • Add __kprobes to the functions which handle in-kernel fixable page
    faults. Since kprobes can cause those in-kernel page faults by accessing
    kprobe data structures, probing those fault functions will cause
    fault-int3-loop (do_page_fault has already been marked as __kprobes).

    Signed-off-by: Masami Hiramatsu
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Masami Hiramatsu
     

11 Jul, 2009

1 commit

  • Since commit 5fd29d6c ("printk: clean up handling of log-levels
    and newlines"), the kernel logs segfaults like:

    <6>gnome-power-man[24509]: segfault at 20 ip 00007f9d4950465a sp 00007fffbb50fc70 error 4 in libgobject-2.0.so.0.2103.0[7f9d494f7000+45000]

    with the extra "<6>" being KERN_INFO. This happens because the
    printk in show_signal_msg() started with KERN_CONT and then
    used "%s" to pass in the real level; KERN_CONT is no longer
    an empty string, and printk only pays attention to the level at
    the very beginning of the format string.

    Therefore, remove the KERN_CONT from this printk, since it is
    now actively causing problems (and never really made any
    sense).

    Signed-off-by: Roland Dreier
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Roland Dreier
     

09 Jul, 2009

1 commit

  • Commit 5fd29d6ccbc98884569d6f3105aeca70858b3e0f ("printk: clean up
    handling of log-levels and newlines") changed printk semantics.
    printk lines with multiple KERN_<level> prefixes are no longer
    emitted as before the patch; <level> is now included in the
    output on each additional use.

    Remove all uses of multiple KERN_<level>s in formats.

    Signed-off-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Joe Perches
     

29 Jun, 2009

1 commit

  • Use pgtable access helpers for the 32-bit version of
    dump_pagetable() and get rid of the __typeof__() operators. This
    requires making pmd_pfn() available for the 2-level pgtable.

    Also, remove some casts from the 64-bit version of
    dump_pagetable().
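
    A pmd_pfn() helper of the kind the commit needs can be sketched
    like this: the PFN is the physical address in the entry, shifted
    down by PAGE_SHIFT, with the low flag bits masked off. Types and
    masks are simplified stand-ins for the kernel's:

```c
#include <assert.h>

/* Sketch of pmd_pfn() for a simple 2-level pgtable layout. */
#define PAGE_SHIFT_SK   12
#define PTE_PFN_MASK_SK (~0xfffUL)      /* drop the low flag bits */

typedef struct { unsigned long pmd; } pmd_sk_t;

static unsigned long pmd_pfn_sketch(pmd_sk_t pmd)
{
        return (pmd.pmd & PTE_PFN_MASK_SK) >> PAGE_SHIFT_SK;
}
```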

    Signed-off-by: Akinobu Mita
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Akinobu Mita
     

22 Jun, 2009

1 commit

  • This allows the callers to now pass down the full set of FAULT_FLAG_xyz
    flags to handle_mm_fault(). All callers have been (mechanically)
    converted to the new calling convention, there's almost certainly room
    for architectures to clean up their code and then add FAULT_FLAG_RETRY
    when that support is added.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

21 Jun, 2009

1 commit

  • * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits)
    x86, mce: fix error path in mce_create_device()
    x86: use zalloc_cpumask_var for mce_dev_initialized
    x86: fix duplicated sysfs attribute
    x86: de-assembler-ize asm/desc.h
    i386: fix/simplify espfix stack switching, move it into assembly
    i386: fix return to 16-bit stack from NMI handler
    x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
    x86: Remove duplicated #include's
    x86: msr.h linux/types.h is only required for __KERNEL__
    x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
    x86, mce: mce_intel.c needs <asm/apic.h>
    x86: apic/io_apic.c: dmar_msi_type should be static
    x86, io_apic.c: Work around compiler warning
    x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
    x86: mce: Handle banks == 0 case in K7 quirk
    x86, boot: use .code16gcc instead of .code16
    x86: correct the conversion of EFI memory types
    x86: cap iomem_resource to addressable physical memory
    x86, mce: rename _64.c files which are no longer 64-bit-specific
    x86, mce: mce.h cleanup
    ...

    Manually fix up trivial conflict in arch/x86/mm/fault.c

    Linus Torvalds
     

16 Jun, 2009

1 commit

  • Prefetch instructions can generate spurious faults on certain
    models of older CPUs. The faults themselves cannot be stopped
    and they can occur pretty much anywhere - so the way we solve
    them is that we detect certain patterns and ignore the fault.

    There is one small stretch of code where we must not take faults,
    though: the #PF handler execution leading up to the reading of
    CR2 (the faulting address). If we take a fault there, we destroy
    the CR2 value (replacing it with that of the prefetching
    instruction) and possibly mishandle user-space or kernel-space
    pagefaults.

    It turns out that in current upstream we do exactly that:

    prefetchw(&mm->mmap_sem);

    /* Get the faulting address: */
    address = read_cr2();

    This is not good.

    So reverse the order: first read cr2, then prefetch the lock
    address. Reading cr2 is plenty fast (2 cycles), so delaying the
    prefetch by this amount shouldn't be a big issue
    performance-wise.
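
    The reordering can be modeled in plain C. Here read_cr2 and
    prefetchw are stand-ins (a variable and a function that clobbers
    it, modeling a spurious prefetch fault overwriting CR2); the fix
    is simply to capture the address before issuing the prefetch:

```c
#include <assert.h>

/* Sketch of the fixed ordering: read CR2 first, then prefetch. */
static unsigned long fake_cr2;

static unsigned long read_cr2_sketch(void)
{
        return fake_cr2;
}

static void prefetchw_sketch(void *addr)
{
        (void)addr;
        fake_cr2 = 0;   /* model a spurious fault clobbering CR2 */
}

static unsigned long fault_entry_sketch(void)
{
        unsigned long address;
        int lock;

        address = read_cr2_sketch();    /* get the faulting address first */
        prefetchw_sketch(&lock);        /* then prefetch the lock */
        return address;
}
```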

    [ And this might explain a mystery fault.c warning that sometimes
    occurs on one an old AMD/Semptron based test-system i have -
    which does have such prefetch problems. ]

    Cc: Mathieu Desnoyers
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Nick Piggin
    Cc: Pekka Enberg
    Cc: Vegard Nossum
    Cc: Jeremy Fitzhardinge
    Cc: Hugh Dickins
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar