13 May, 2007

1 commit

  • SLUB cannot run on i386 at this point because i386 uses the page->private and
    page->index fields of slab pages for the pgd cache.

    Make SLUB run on i386 by replacing the pgd slab cache with a quicklist.
    Limit the changes as much as possible: leave the improvised linked list in
    place, and so on. This has been working here for a couple of weeks now.

    Acked-by: William Lee Irwin III
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
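
    To illustrate the mechanism, here is a minimal userspace sketch of the
    quicklist idea, assuming nothing beyond libc: freed page-sized buffers are
    kept on a simple free list, threaded through their own first word, so no
    struct page fields are consumed. The names and types are local stand-ins,
    not the kernel's actual quicklist API.

        #include <stdlib.h>
        #include <string.h>

        #define PAGE_SIZE 4096

        struct quicklist { void *head; };    /* free pages, linked through word 0 */

        static void *quicklist_alloc(struct quicklist *ql)
        {
            void *page = ql->head;
            if (page) {
                ql->head = *(void **)page;   /* pop a recycled page */
                memset(page, 0, PAGE_SIZE);
            } else {
                page = calloc(1, PAGE_SIZE); /* fall back to a fresh page */
            }
            return page;
        }

        static void quicklist_free(struct quicklist *ql, void *page)
        {
            *(void **)page = ql->head;       /* the page itself holds the link */
            ql->head = page;
        }

    A pgd_alloc built on this would simply call quicklist_alloc on a per-CPU
    list, leaving page->private and page->index free for SLUB.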
     

09 May, 2007

2 commits

  • Most architectures defined three macros, MK_IOSPACE_PFN(), GET_IOSPACE()
    and GET_PFN() in pgtable.h. However, the only callers of any of these
    macros are in Sparc specific code, either in arch/sparc, arch/sparc64 or
    drivers/sbus.

    This patch removes the redundant macros from all architectures except
    sparc and sparc64.

    Signed-off-by: David Gibson
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
     
  • This patch moves the die notifier handling to common code. Previously,
    various architectures had exactly the same code for it. Note that the new
    code is compiled unconditionally; this should be understood as an appeal to
    the other architecture maintainers to implement support for it as well
    (i.e., by sprinkling a notify_die or two in the proper places).

    arm had a notify_die that did something totally different; I renamed it to
    arm_notify_die as part of the patch and made it static to the file in which
    it is declared and used. avr32 used to pass slightly less information through
    this interface, and I brought it into line with the other architectures.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
    [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
    Signed-off-by: Christoph Hellwig
    Cc:
    Cc: Russell King
    Signed-off-by: Bryan Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
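
    As a rough sketch of the pattern being consolidated (the real kernel uses
    an atomic notifier chain; the _model suffix marks these as illustrative
    stand-ins), notify_die walks a chain of registered callbacks:

        enum die_val_model { DIE_OOPS_MODEL = 1 };

        struct notifier_block_model {
            int (*notifier_call)(struct notifier_block_model *, unsigned long, void *);
            struct notifier_block_model *next;
        };

        static struct notifier_block_model *die_chain_model;

        static void register_die_notifier_model(struct notifier_block_model *nb)
        {
            nb->next = die_chain_model;      /* push onto the chain */
            die_chain_model = nb;
        }

        static int notify_die_model(enum die_val_model val, void *args)
        {
            struct notifier_block_model *nb;
            for (nb = die_chain_model; nb; nb = nb->next)
                nb->notifier_call(nb, val, args);   /* fan the event out */
            return 0;
        }

    An architecture then only needs a notify_die call at its fault and trap
    sites instead of carrying its own copy of this machinery.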
     

08 May, 2007

2 commits

  • If you actually clear the bit, you need to:

    + pte_update_defer(vma->vm_mm, addr, ptep);

    The reason is, when updating PTEs, the hypervisor must be notified. Using
    atomic operations to do this is fine for all hypervisors I am aware of.
    However, for hypervisors which shadow page tables, if these PTE
    modifications are not trapped, you need a post-modification call to fulfill
    the update of the shadow page table.

    Acked-by: Zachary Amsden
    Cc: Hugh Dickins
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
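
    The pattern the commit asks for, as a hedged userspace model (the stubs
    below stand in for the kernel primitives; the bit position is illustrative):

        typedef unsigned long pteval_t;

        #define _PAGE_BIT_RW_MODEL 1   /* illustrative bit position */

        static void pte_update_defer_stub(void *mm, unsigned long addr, pteval_t *ptep)
        {
            /* paravirt hook: tell a shadow-pagetable hypervisor that the
             * pte at addr was modified without being trapped */
        }

        static void ptep_clear_bit_model(void *mm, unsigned long addr, pteval_t *ptep)
        {
            __sync_fetch_and_and(ptep, ~(1UL << _PAGE_BIT_RW_MODEL)); /* atomic clear */
            pte_update_defer_stub(mm, addr, ptep);  /* mandatory follow-up call */
        }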
     
  • Add ptep_test_and_clear_{dirty,young} to i386, and advertise that the
    architecture provides them. There is at least one place where they need to
    be called without the page table lock: clearing the accessed bit on writes
    to /proc/pid/clear_refs.

    ptep_clear_flush_{dirty,young} are updated to use the new functions. The
    overall net effect to current users of ptep_clear_flush_{dirty,young} is
    that we introduce an additional branch.

    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Signed-off-by: David Rientjes
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
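
    A minimal sketch of what such an accessor looks like, assuming x86's
    accessed bit at position 5 and using a gcc builtin to model the atomic
    read-modify-write (the kernel version also issues a pte_update
    notification):

        #include <stdbool.h>

        typedef unsigned long pteval_t;
        #define _PAGE_ACCESSED_MODEL (1UL << 5)

        static bool ptep_test_and_clear_young_model(pteval_t *ptep)
        {
            /* atomically clear the accessed bit and report its old value;
             * safe without the page table lock */
            pteval_t old = __sync_fetch_and_and(ptep, ~_PAGE_ACCESSED_MODEL);
            return old & _PAGE_ACCESSED_MODEL;
        }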
     

03 May, 2007

9 commits

  • Add a comment and condense code to make use of the
    native_local_ptep_get_and_clear function. Also, it turns out the 2-level and
    3-level paging definitions were identical, so move the common definition into
    pgtable.h.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Zachary Amsden
     
  • When exiting from an address space, no special hypervisor notification of page
    table updates needs to occur; direct page table hypervisors, such as Xen,
    switch to another address space first (init_mm) and unprotect the page tables
    to avoid the cost of trapping to the hypervisor for each pte_clear. Shadow
    mode hypervisors, such as VMI and lhype, don't need to do the extra work of
    calling through paravirt-ops, and can just directly clear the page table
    entries without notifying the hypervisor, since all the page tables are about
    to be freed.

    So introduce native_pte_clear functions which bypass any paravirt-ops
    notification. This results in a significant performance win for VMI and
    removes some indirect calls from zap_pte_range.

    Note the 3-level paging already had a native_pte_clear function, thus
    demanding argument conformance and extra args for the 2-level definition.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Zachary Amsden
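
    A sketch of the resulting structure, with PAE-style two-word entries and
    simplified names (the _model suffix marks stand-ins); the point is the
    dispatch on the teardown case, not the exact kernel signatures:

        typedef struct { unsigned long low, high; } pte_pae_model;

        static void native_pte_clear_model(pte_pae_model *ptep)
        {
            ptep->low = 0;    /* clear the word carrying the present bit first */
            ptep->high = 0;
        }

        static void pte_clear_full_model(pte_pae_model *ptep, int full)
        {
            native_pte_clear_model(ptep);
            if (!full) {
                /* live mapping: a paravirt pte_update(...) notification
                 * would go here */
            }
            /* full teardown: skip the hypervisor call entirely */
        }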
     
  • In shadow mode hypervisors, ptep_get_and_clear achieves the desired
    purpose of keeping the shadows in sync by issuing a native_get_and_clear,
    followed by a call to pte_update, which indicates the PTE has been
    modified.

    Direct mode hypervisors (Xen) have no need for this anyway, and will trap
    the update using writable pagetables.

    This means no hypervisor makes use of ptep_get_and_clear; there is no
    reason to have it in the paravirt-ops structure. Change confusing
    terminology about raw vs. native functions into consistent use of
    native_pte_xxx for operations which do not invoke paravirt-ops.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen

    Jeremy Fitzhardinge
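
    The composition described above, as a small userspace model (the xchg is
    modeled with a gcc builtin; pte_update_stub stands in for the paravirt
    hook):

        typedef unsigned long pteval_t;

        static pteval_t native_ptep_get_and_clear_model(pteval_t *ptep)
        {
            return __sync_lock_test_and_set(ptep, 0);  /* atomic exchange with 0 */
        }

        static void pte_update_stub(void *mm, unsigned long addr)
        {
            /* shadow-mode hypervisors resynchronize their shadow here;
             * native and direct-mode builds leave this empty */
        }

        static pteval_t ptep_get_and_clear_model(void *mm, unsigned long addr,
                                                 pteval_t *ptep)
        {
            pteval_t pte = native_ptep_get_and_clear_model(ptep);
            pte_update_stub(mm, addr);
            return pte;
        }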
     
  • Xen and VMI both have special requirements when mapping a highmem pte
    page into the kernel address space. These can be dealt with by adding
    a new kmap_atomic_pte() function for mapping highptes, and hooking it
    into the paravirt_ops infrastructure.

    Xen specifically wants to map the pte page RO, so this patch exposes a
    helper function, kmap_atomic_prot, which maps the page with the
    specified page protections.

    This also adds a kmap_flush_unused() function to clear out the cached
    kmap mappings. Xen needs this to clear out any potential stray RW
    mappings of pages which will become part of a pagetable.

    [ Zach - vmi.c will need some attention after this patch. It wasn't
    immediately obvious to me what needs to be done. ]

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Zachary Amsden

    Jeremy Fitzhardinge
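
    The shape of the new hook, sketched with simplified types (the protection
    values and struct are illustrative, not the kernel's):

        #define PROT_KERNEL_MODEL     0x3   /* present | writable */
        #define PROT_KERNEL_RO_MODEL  0x1   /* present, read-only */

        struct page_model { void *vaddr; };

        static void *kmap_atomic_prot_model(struct page_model *page, unsigned prot)
        {
            /* the real function installs a temporary kernel mapping with
             * the given protection; the model just hands back the address */
            (void)prot;
            return page->vaddr;
        }

        static void *kmap_atomic_pte_model(struct page_model *page, int want_ro)
        {
            /* Xen maps pte pages read-only so the hypervisor can keep
             * validating them; others use the normal kernel protection */
            return kmap_atomic_prot_model(page,
                    want_ro ? PROT_KERNEL_RO_MODEL : PROT_KERNEL_MODEL);
        }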
     
  • Back out the map_pt_hook to clear the way for kmap_atomic_pte.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Cc: Zachary Amsden

    Jeremy Fitzhardinge
     
  • Normally when running in PAE mode, the 4th PMD maps the kernel address space,
    which can be shared among all processes (since they all need the same kernel
    mappings).

    Xen, however, does not allow guests to have the kernel pmd shared between page
    tables, so parameterize pgtable.c to allow both modes of operation.

    There are several side-effects of this. One is that vmalloc will update the
    kernel address space mappings, and those updates need to be propagated into
    all processes if the kernel mappings are not intrinsically shared. In the
    non-PAE case, this is done by maintaining a pgd_list of all processes; this
    list is used when all process pagetables must be updated. pgd_list is
    threaded via otherwise unused entries in the page structure for the pgd, which
    means that the pgd must be page-sized for this to work.

    Normally the PAE pgd is only four 8-byte entries (32 bytes) large, but Xen
    requires the PAE pgd to be page-aligned anyway, so this patch forces the pgd
    to be page-aligned and page-sized when the kernel pmd is unshared, to
    accommodate both these requirements.

    Also, since there may be several distinct kernel pmds (if the user/kernel
    split is below 3G), there's no point in allocating them from a slab cache;
    they're just allocated with get_free_page and initialized appropriately. (Of
    course they could be cached if there is just a single kernel pmd, which is
    the default with a 3G user/kernel split, but it doesn't seem worthwhile to
    add yet another case into this code.)

    [ Many thanks to wli for review comments. ]

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: William Lee Irwin III
    Signed-off-by: Andi Kleen
    Cc: Zachary Amsden
    Cc: Christoph Lameter
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton

    Jeremy Fitzhardinge
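
    A hedged sketch of the allocation policy this introduces (userspace
    stand-ins throughout; the real code also installs the pmd into the pgd and
    copies the kernel mappings, elided here):

        #include <stdlib.h>
        #include <string.h>

        #define PAGE_SIZE 4096
        static int shared_kernel_pmd_model = 1;   /* 0 under a Xen-like hypervisor */

        static void *pgd_alloc_model(void)
        {
            /* an unshared kernel pmd forces a page-sized, page-aligned pgd */
            void *pgd = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
            memset(pgd, 0, PAGE_SIZE);
            if (!shared_kernel_pmd_model) {
                /* per-pgd kernel pmd: plain page allocation, no slab cache */
                void *kpmd = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
                memset(kpmd, 0, PAGE_SIZE);
                /* ... install kpmd in the pgd and copy kernel mappings ... */
            }
            /* shared case: the pgd's kernel slot points at the one shared pmd */
            return pgd;
        }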
     
  • This patch introduces paravirt_ops hooks to control how the kernel's
    initial pagetable is set up.

    In the case of a native boot, the very early bootstrap code creates a
    simple non-PAE pagetable to map the kernel and physical memory. When
    the VM subsystem is initialized, it creates a proper pagetable which
    respects the PAE mode, large pages, etc.

    When booting under a hypervisor, there are many possibilities for what
    paging environment the hypervisor establishes for the guest kernel, so
    the construction of the kernel's pagetable depends on the hypervisor.

    In the case of Xen, the hypervisor boots the kernel with a fully
    constructed pagetable, which is already using PAE if necessary. Also,
    Xen requires particular care when constructing pagetables to make sure
    all pagetables are always mapped read-only.

    In order to make this easier, the kernel's initial pagetable construction
    has been changed to only allocate and initialize a pagetable page if
    there's no page already present in the pagetable. This allows the Xen
    paravirt backend to make a copy of the hypervisor-provided pagetable,
    allowing the kernel to establish any more mappings it needs while
    keeping the existing ones.

    A slightly subtle point which is worth highlighting here is that Xen
    requires all kernel mappings to share the same pte_t pages between all
    pagetables, so that updating a kernel page's mapping in one pagetable
    is reflected in all other pagetables. This makes it possible to
    allocate a page and attach it to a pagetable without having to
    explicitly enumerate that page's mapping in all pagetables.

    And:

    +From: "Eric W. Biederman"

    If we don't set the leaf page table entries, it is quite possible that we
    will inherit an incorrect page table entry from the initial boot page table
    setup in head.S. So we need to redo the effort here, so that we pick up
    PSE, PGE and the like.

    Hypervisors like Xen require that their page tables be read-only, which is
    slightly incompatible with our low identity mappings; however, I discussed
    this with Jeremy, and he has modified the Xen early set_pte function to
    avoid problems in this area.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Acked-by: William Irwin
    Cc: Ingo Molnar

    Jeremy Fitzhardinge
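
    The allocate-only-if-absent rule, as a compilable model (a single present
    bit, with calloc standing in for the boot allocator):

        #include <stdlib.h>

        #define PRESENT_MODEL 1UL

        /* only allocate a pagetable page when the pmd slot is empty, so a
         * hypervisor-provided table is reused rather than clobbered */
        static unsigned long *one_page_table_init_model(unsigned long *pmd)
        {
            if (!(*pmd & PRESENT_MODEL)) {
                unsigned long *pt = calloc(512, sizeof(unsigned long));
                *pmd = (unsigned long)pt | PRESENT_MODEL;
                return pt;
            }
            return (unsigned long *)(*pmd & ~PRESENT_MODEL);
        }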
     
  • Add a set of accessors to pack, unpack and modify page table entries
    (at all levels). This allows a paravirt implementation to control the
    contents of pgd/pmd/pte entries. For example, Xen uses this to
    convert the (pseudo-)physical address into a machine address when
    populating a pagetable entry, and to convert back to a pseudo-physical
    (pphys) address when an entry is read.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andi Kleen
    Acked-by: Ingo Molnar

    Jeremy Fitzhardinge
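
    A sketch of the pack/unpack idea with function-pointer hooks (identity by
    default, as on native hardware; a Xen-like backend would install
    translating hooks). All names here are illustrative:

        typedef struct { unsigned long pte; } pte_model;

        static unsigned long identity_model(unsigned long v) { return v; }

        /* a paravirt backend may repoint these at translation routines */
        static unsigned long (*phys_to_machine_model)(unsigned long) = identity_model;
        static unsigned long (*machine_to_phys_model)(unsigned long) = identity_model;

        static pte_model mk_pte_model(unsigned long val)
        {
            /* pack: pseudo-physical -> machine at entry-creation time */
            return (pte_model){ phys_to_machine_model(val) };
        }

        static unsigned long pte_val_model(pte_model pte)
        {
            /* unpack: machine -> pseudo-physical when the entry is read */
            return machine_to_phys_model(pte.pte);
        }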
     
  • Fix various broken corner cases in i386 and x86-64 change_page_attr.

    AK: split off from tighten kernel image access rights

    Signed-off-by: Jan Beulich
    Signed-off-by: Andi Kleen

    Jan Beulich
     

05 Mar, 2007

1 commit

  • Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
    page tables. This information is required so the physical address of PTE
    updates can be determined; otherwise, the mm layer would have to carry the
    physical address all the way to each PTE modification callsite, which is even
    more hideous than the macros required to provide the proper hooks.

    So let's not mess up arch-neutral code to achieve this, but keep the horror in
    an #ifdef HIGHPTE in include/asm-i386/pgtable.h. I had to use macros here
    because some types are not yet defined in all the include paths for this
    header.

    This patch is absolutely required for HIGHPTE kernels to operate properly with
    VMI.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     

08 Dec, 2006

2 commits

  • * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
    [PATCH] x86-64: Export smp_call_function_single
    [PATCH] i386: Clean up smp_tune_scheduling()
    [PATCH] unwinder: move .eh_frame to RODATA
    [PATCH] unwinder: fully support linker generated .eh_frame_hdr section
    [PATCH] x86-64: don't use set_irq_regs()
    [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
    [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
    [PATCH] i386: replace kmalloc+memset with kzalloc
    [PATCH] x86-64: remove remaining pc98 code
    [PATCH] x86-64: remove unused variable
    [PATCH] x86-64: Fix constraints in atomic_add_return()
    [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
    [PATCH] x86-64: Correct documentation for bzImage protocol v2.05
    [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
    [PATCH] x86-64: Fix numaq build error
    [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
    [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
    [PATCH] x86-64: Clarify error message in GART code
    [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
    [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
    ...

    Fixed conflict in include/linux/uaccess.h manually

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h" | xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 Dec, 2006

3 commits

  • The function ptep_get_and_clear uses an atomic instruction sequence to get and
    clear an active pte. Rather than add such an atomic operator to all virtual
    machine implementations in paravirt-ops, it is easier to support the raw
    atomic sequence and use either a trapping writable pagetable approach, or a
    post-update notification. For the post update notification, we require the
    pte_update function to be called after the access. Combine the 2-level and
    3-level paging operators into one common function which does the post-update
    notification, and rename the actual atomic sequences to raw_ptep_xxx
    operators.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Signed-off-by: Andrew Morton

    Zachary Amsden
     
  • Make parameter names match function argument names for the yet to be defined
    pte_update_defer accessor.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Signed-off-by: Andrew Morton

    Zachary Amsden
     
  • Add the three bare TLB accessor functions to paravirt-ops. Most amusingly,
    flush_tlb is redefined on SMP, so I can't call the paravirt op flush_tlb.
    Instead, I chose to indicate the actual flush type, kernel (global) vs. user
    (non-global). Global in this sense means using the global bit in the page
    table entry, which makes TLB entries persistent across CR3 reloads, not
    global as in the SMP sense of invoking remote shootdowns, so the term is
    confusingly overloaded.

    AK: folded in fix from Zach for PAE compilation

    Signed-off-by: Zachary Amsden
    Signed-off-by: Chris Wright
    Signed-off-by: Andi Kleen
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton

    Rusty Russell
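
    The naming distinction, sketched as a paravirt-ops-style structure (the
    comments note what native implementations do; the bodies here are stubs):

        struct tlb_ops_model {
            void (*flush_tlb_user)(void);    /* drop non-global entries only */
            void (*flush_tlb_kernel)(void);  /* drop global entries as well */
        };

        static void native_flush_tlb_user_model(void)
        {
            /* natively: reload CR3, which leaves global-bit entries intact */
        }

        static void native_flush_tlb_kernel_model(void)
        {
            /* natively: toggle CR4.PGE (or equivalent) so that even
             * global-bit entries are discarded */
        }

        static struct tlb_ops_model tlb_ops_model = {
            .flush_tlb_user   = native_flush_tlb_user_model,
            .flush_tlb_kernel = native_flush_tlb_kernel_model,
        };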
     

01 Oct, 2006

4 commits

  • Add a pte_update_hook which notifies about pte changes that have been made
    without using the set_pte / clear_pte interfaces. This allows shadow mode
    hypervisors which do not trap on page table access to maintain synchronized
    shadows.

    It also turns out that there was one pte update in PAE mode that wasn't using
    any accessor interface at all for setting NX protection. Considering it is PAE
    specific, and the accessor is i386 specific, I didn't want to add a generic
    encapsulation of this behavior yet.

    Signed-off-by: Zachary Amsden
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • The ptep_establish macro is only used on user-level PTEs, for present->present
    mapping changes. Since these always happen under protection of the pagetable
    lock, the strong synchronization of a 64-bit cmpxchg is not needed; in fact,
    not even a lock prefix needs to be used. We can instead simply clear the
    P-bit, followed by a normal set. The write ordering is still important to
    avoid the possibility of the TLB snooping a partially written PTE and getting
    a bad mapping installed.

    Signed-off-by: Zachary Amsden
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
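
    The write sequence being described, modeled in userspace on a two-word
    PAE-style entry (__sync_synchronize stands in for smp_wmb):

        typedef struct { unsigned long low, high; } pte_pae_model;

        /* caller holds the page table lock, so no cmpxchg8b or lock prefix */
        static void ptep_establish_model(pte_pae_model *ptep, pte_pae_model pte)
        {
            ptep->low = 0;            /* clear P first: TLB can't load the entry */
            __sync_synchronize();
            ptep->high = pte.high;    /* high word while entry is not present */
            __sync_synchronize();
            ptep->low = pte.low;      /* P bit becomes visible last */
        }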
     
  • Create a new PTE function which combines clearing a kernel PTE with the
    subsequent flush. This allows the two to be easily combined into a single
    hypercall or paravirt-op. More subtly, reverse the order of the flush for
    kmap_atomic. Instead of flushing on establishing a mapping, flush on clearing
    a mapping. This eliminates the possibility of leaving stale kmap entries
    which may still have valid TLB mappings. This is required for direct mode
    hypervisors, which need to reprotect all mappings of a given page when
    changing the page type from a normal page to a protected page (such as a page
    table or descriptor table page). But it also provides some nicer semantics
    for real hardware, by providing extra debug-proofing against using stale
    mappings, as well as ensuring that no stale mappings exist when changing the
    cacheability attributes of a page, which could lead to cache conflicts when
    two different types of mappings exist for the same page.

    Signed-off-by: Zachary Amsden
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
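
    A minimal model of the combined operation (flush_one_model stands in for
    a single-page TLB invalidation such as invlpg):

        typedef unsigned long pteval_t;

        static void flush_one_model(unsigned long vaddr)
        {
            /* natively a single-page TLB invalidation; under a direct-mode
             * hypervisor this pairs with the clear in one hypercall */
        }

        static void kpte_clear_flush_model(pteval_t *kpte, unsigned long vaddr)
        {
            *kpte = 0;               /* clear the kernel pte ... */
            flush_one_model(vaddr);  /* ... and flush immediately, so no stale
                                        kmap mapping can survive */
        }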
     
  • Remove ptep_test_and_clear_{dirty|young} from i386, and instead use the
    dominating functions, ptep_clear_flush_{dirty|young}. This allows the TLB
    page flush to be contained in the same macro, and allows for an eager
    optimization - if reading the PTE initially returned dirty/accessed, we can
    assume that no subsequent update to the PTE cleared the accessed/dirty bit,
    as the only way A/D bits can change without holding the
    page table lock is if a remote processor clears them.
    extra branch which came from the generic version of the code, as we know that
    no other CPU could have cleared the A/D bit, so the flush will always be
    needed.

    We still export these two defines, even though we do not actually define
    the macros in the i386 code:

    #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
    #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY

    The reason for this is that the only use of these functions is within the
    generic clear_flush functions, and we want a strong guarantee that there
    are no other users of these functions, so we want to prevent the generic
    code from defining them for us.

    Signed-off-by: Zachary Amsden
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
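
    The eager variant, as a sketch (the bit position is illustrative, and the
    flush is left as a comment since it is architecture-specific):

        #include <stdbool.h>

        typedef unsigned long pteval_t;
        #define _PAGE_DIRTY_MODEL (1UL << 6)

        static bool ptep_clear_flush_dirty_model(pteval_t *ptep)
        {
            bool dirty = *ptep & _PAGE_DIRTY_MODEL;  /* one read decides all */
            if (dirty) {
                __sync_fetch_and_and(ptep, ~_PAGE_DIRTY_MODEL);
                /* flush_tlb_page(...) here, unconditionally: no second
                 * test of the pte as in the generic version */
            }
            return dirty;
        }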
     

27 Sep, 2006

1 commit

  • * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
    [PATCH] Don't set calgary iommu as default y
    [PATCH] i386/x86-64: New Intel feature flags
    [PATCH] x86: Add a cumulative thermal throttle event counter.
    [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
    [PATCH] x86: Refactor thermal throttle processing
    [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
    [PATCH] Fix unwinder warning in traps.c
    [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
    [PATCH] x86: Move direct PCI scanning functions out of line
    [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
    [PATCH] Don't leak NT bit into next task
    [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
    [PATCH] Fix some broken white space in ia32_signal.c
    [PATCH] Initialize argument registers for 32bit signal handlers.
    [PATCH] Remove all traces of signal number conversion
    [PATCH] Don't synchronize time reading on single core AMD systems
    [PATCH] Remove outdated comment in x86-64 mmconfig code
    [PATCH] Use string instructions for Core2 copy/clear
    [PATCH] x86: - restore i8259A eoi status on resume
    [PATCH] i386: Split multi-line printk in oops output.
    ...

    Linus Torvalds
     

26 Sep, 2006

4 commits

  • Move ptep_set_access_flags to be closer to the other ptep accessors, and make
    the indentation standard.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • Move the __HAVE_ARCH_PTEP defines to accompany the function definitions.
    Anything else is just a complete nightmare to track through the 2/3-level
    paging code, and this caused duplicate definitions to be needed (pte_same),
    which could have easily been taken care of with the asm-generic pgtable
    functions.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • One of the changes necessary for shared page tables is to standardize the
    pxx_page macros. pte_page and pmd_page have always returned the struct
    page associated with their entry, while pte_page_kernel and pmd_page_kernel
    have returned the kernel virtual address. pud_page and pgd_page, on the
    other hand, return the kernel virtual address.

    Shared page tables needs pud_page and pgd_page to return the actual page
    structures. There are very few actual users of these functions, so it is
    simple to standardize their usage.

    Since this is basic cleanup, I am submitting these changes as a standalone
    patch. Per Hugh Dickins' comments about it, I am also changing the
    pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.

    Signed-off-by: Dave McCracken
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave McCracken
     
  • This patch replaces the open-coded early commandline parsing
    throughout the i386 boot code with the generic mechanism (already used
    by ppc, powerpc, ia64 and s390). The code was inconsistent about
    whether it deleted the option from the cmdline or not, meaning some of
    these options would get passed through the environment into init.

    This transformation is mainly mechanical, but there are some notable
    parts:

    1) Grammar: s/linux never set's it up/linux never sets it up/

    2) Remove hacked-in earlyprintk= option scanning. When someone
    actually implements CONFIG_EARLY_PRINTK, then they can use
    early_param().
    [AK: actually it is implemented, but I'm adding the early_param for it in the
    next x86-64 patch]

    3) Move declaration of generic_apic_probe() from setup.c into asm/apic.h

    4) Various parameters now moved into their appropriate files (thanks Andi).

    5) All parse functions which examine arg need to check for NULL,
    except one where it has subtle humor value.

    AK: readded acpi_sci handling which was completely dropped
    AK: moved some more variables into acpi/boot.c

    Cc: len.brown@intel.com

    Signed-off-by: Rusty Russell
    Signed-off-by: Andi Kleen

    Rusty Russell
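
    The registration pattern being adopted, modeled with a plain table (the
    real early_param macro places entries in a dedicated linker section; the
    "noexec" handler is a made-up example):

        #include <string.h>

        struct early_param_model {
            const char *name;
            int (*setup)(char *arg);
        };

        static int noexec_on_model;

        static int parse_noexec_model(char *arg)
        {
            if (!arg)             /* point 5 above: handlers must tolerate NULL */
                return -1;
            noexec_on_model = (strcmp(arg, "on") == 0);
            return 0;
        }

        /* stands in for: early_param("noexec", parse_noexec_model); */
        static struct early_param_model early_params_model[] = {
            { "noexec", parse_noexec_model },
        };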
     

28 Apr, 2006

1 commit

  • Proposed fix for ptep_get_and_clear_full PAE bug. Pte_clear had the same bug,
    so use the same fix for both. Turns out pmd_clear had it as well, but pgds
    are not affected.

    The problem is rather intricate. Page table entries in PAE mode are 64-bits
    wide, but the only atomic 8-byte write operation available in 32-bit mode is
    cmpxchg8b, which is expensive (at least on P4), and thus avoided. But it can
    happen that the processor may prefetch entries into the TLB in the middle of an
    operation which clears a page table entry. So one must always clear the P-bit
    in the low word of the page table entry first when clearing it.

    Since the sequence *ptep = __pte(0) leaves the order of the write dependent on
    the compiler, it must be coded explicitly as a clear of the low word followed
    by a clear of the high word. Further, there must be a write memory barrier
    here to enforce proper ordering by the compiler (and, in the future, by the
    processor as well).

    On > 4GB memory machines, the implementation of pte_clear for PAE was clearly
    deficient, as it could leave virtual mappings of physical memory above 4GB
    aliased to memory below 4GB in the TLB. The implementation of
    ptep_get_and_clear_full has a similar bug, although not nearly as likely to
    occur, since the mappings being cleared are in the process of being destroyed,
    and should never be dereferenced again.

    But, as luck would have it, it is possible to trigger bugs even without ever
    dereferencing these bogus TLB mappings, even if the clear is followed fairly
    soon after with a TLB flush or invalidation. The problem is that memory above
    4GB may now be aliased into the first 4GB of memory, and in fact, may hit a
    region of memory with non-memory semantics. These regions include AGP and PCI
    space. As such, these memory regions are not cached by the processor. This
    introduces the bug.

    The processor can speculate memory operations, including memory writes, as long
    as they are committed with the proper ordering. Speculating a memory write to
    a linear address that has a bogus TLB mapping is possible. Normally, the
    speculation is harmless. But for cached memory, it does leave the falsely
    speculated cacheline unmodified, but in a dirty state. This cache line will be
    eventually written back. If this cacheline happens to intersect a region of
    memory that is not protected by the cache coherency protocol, it can corrupt
    data in I/O memory, which is generally a very bad thing to do, and can cause
    total system failure or just plain undefined behavior.

    These bugs are extremely unlikely, but the severity is of such magnitude, and
    the fix so simple that I think fixing them immediately is justified. Also,
    they are nearly impossible to debug.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Linus Torvalds

    Zachary Amsden
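
    The fixed ordering, modeled on a two-word entry (the builtin stands in
    for the write memory barrier the commit calls for):

        typedef struct { unsigned long low, high; } pte_pae_model;

        static void pte_clear_model(pte_pae_model *ptep)
        {
            ptep->low = 0;            /* P bit goes away first: a concurrent TLB
                                         walk can no longer load this entry */
            __sync_synchronize();     /* smp_wmb() stand-in: order the stores */
            ptep->high = 0;           /* now safe to clear the high half */
        }

    Written as *ptep = __pte(0), the compiler would be free to clear the high
    word first, briefly exposing an entry with the old low bits (P still set)
    and a zeroed high half, which is exactly the below-4GB aliasing hazard
    described above.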
     

22 Mar, 2006

1 commit

  • 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn't support hugetlb
    mprotect.

    From: David Gibson

    Remove a test from the mprotect() path which checks that the mprotect()ed
    range on a hugepage VMA is hugepage aligned (yes, really, the sense of
    is_aligned_hugepage_range() is the opposite of what you'd guess :-/).

    In fact, we don't need this test. If the given addresses match the
    beginning/end of a hugepage VMA they must already be suitably aligned. If
    they don't, then mprotect_fixup() will attempt to split the VMA. The very
    first test in split_vma() will check for a badly aligned address on a
    hugepage VMA and return -EINVAL if necessary.

    From: "Chen, Kenneth W"

    On i386 and x86-64, the pte flag _PAGE_PSE collides with _PAGE_PROTNONE. The
    identity of a hugetlb pte is lost when changing page protection via mprotect,
    and a page fault that occurs later will trigger a bug check in
    huge_pte_alloc().

    The fix is to always make the new pte a hugetlb pte, and also to clean up
    legacy code where _PAGE_PRESENT is forced on, left over from the pre-faulting
    days.

    Signed-off-by: Zhang Yanmin
    Cc: David Gibson
    Cc: "David S. Miller"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: William Lee Irwin III
    Signed-off-by: Ken Chen
    Signed-off-by: Nishanth Aravamudan
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
     

07 Nov, 2005

1 commit

  • Fix more include file problems that surfaced since I submitted the previous
    fix-missing-includes.patch. This should now allow not to include sched.h
    from module.h, which is done by a followup patch.

    Signed-off-by: Tim Schmielau
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

31 Oct, 2005

2 commits

  • This patch removes page_pte_prot and page_pte macros from all
    architectures. Some architectures define both, some only page_pte (broken)
    and others none. These macros are not used anywhere.

    page_pte_prot(page, prot) is identical to mk_pte(page, prot) and
    page_pte(page) is identical to page_pte_prot(page, __pgprot(0)).

    * The following architectures define both page_pte_prot and page_pte

    arm, arm26, ia64, sh64, sparc, sparc64

    * The following architectures define only page_pte (broken)

    frv, i386, m32r, mips, sh, x86-64

    * All other architectures define neither

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Join together some common functions (pmd_page{,_kernel}) over 2level and
    3level pages.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
     

30 Oct, 2005

1 commit

  • Convert those common loops using page_table_lock on the outside and
    pte_offset_map within to use just pte_offset_map_lock within instead.

    These all hold mmap_sem (some exclusively, some not), so at no level can a
    page table be whipped away from beneath them. But whereas pte_alloc loops
    tested with the "atomic" pmd_present, these loops are testing with pmd_none,
    which on i386 PAE tests both lower and upper halves.

    That's now unsafe, so add a cast into pmd_none to test only the vital lower
    half: we lose a little sensitivity to a corrupt middle directory, but not
    enough to worry about. It appears that i386 and UML were the only
    architectures vulnerable in this way, with pgd and pud presenting no problem.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
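
    The converted loop shape, modeled with a pthread mutex in place of the
    pte lock (the index math and names are illustrative):

        #include <pthread.h>

        #define PAGE_SIZE_MODEL 4096UL
        typedef unsigned long pte_model;

        static pthread_mutex_t pte_lock_model = PTHREAD_MUTEX_INITIALIZER;

        static void walk_range_model(pte_model *table,
                                     unsigned long addr, unsigned long end)
        {
            /* pte_offset_map_lock: take the lock and the mapping together */
            pthread_mutex_lock(&pte_lock_model);
            pte_model *pte = &table[(addr / PAGE_SIZE_MODEL) & 511];
            do {
                /* ... examine or update *pte ... */
            } while (pte++, addr += PAGE_SIZE_MODEL, addr != end);
            pthread_mutex_unlock(&pte_lock_model);  /* pte_unmap_unlock */
        }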
     

05 Sep, 2005

2 commits

  • Add a clone operation for pgd updates.

    This helps complete the encapsulation of updates to page tables (or pages
    about to become page tables) into accessor functions rather than using
    memcpy() to duplicate them. This is both generally good for consistency
    and also necessary for running in a hypervisor which requires explicit
    updates to page table entries.

    The new function is:

    clone_pgd_range(pgd_t *dst, pgd_t *src, int count);

    dst - pointer to a pgd range anywhere on a pgd page
    src - ""
    count - the number of pgds to copy

    dst and src can be on the same page, but the range must not overlap
    and must not cross a page boundary.

    Note that I omitted using this call to copy pgd entries into the
    software suspend page root, since this is not technically a live paging
    structure; rather, it is used on resume from suspend. CC'ing Pavel in case
    he has any feedback on this.

    Thanks to Chris Wright for noticing that this could be more optimal in
    PAE compiles by eliminating the memset.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
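
    A sketch matching the signature quoted above; on bare hardware the body
    really is just a memcpy, and routing it through one named accessor is what
    lets a hypervisor port substitute explicit, validated updates:

        #include <string.h>

        typedef struct { unsigned long pgd; } pgd_model;

        static void clone_pgd_range_model(pgd_model *dst, const pgd_model *src,
                                          int count)
        {
            /* caller guarantees the ranges don't overlap or cross a page */
            memcpy(dst, src, count * sizeof(pgd_model));
        }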
     
  • Add a new accessor for PTEs, which passes the full hint from the mmu_gather
    struct; this allows architectures with hardware pagetables to optimize away
    atomic PTE operations when destroying an address space. Removing the
    locked operation should allow better pipelining of memory access in this
    loop. I measured an average savings of 30-35 cycles per zap_pte_range on
    the first 500 destructions on Pentium-M, but I believe the optimization
    would win more on older processors which still assert the bus lock on xchg
    for an exclusive cacheline.

    Update: I made some new measurements, and this saves exactly 26 cycles over
    ptep_get_and_clear on Pentium M. On P4, with a PAE kernel, this saves 180
    cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
    full address space destruction.

    pte_clear_full is not yet used, but is provided for future optimizations
    (in particular, when running inside of a hypervisor that queues page table
    updates, the full hint allows us to avoid queueing unnecessary page table
    updates for an address space in the process of being destroyed).

    This is not a huge win, but it does help a bit, and sets the stage for
    further hypervisor optimization of the mm layer on all architectures.

    Signed-off-by: Zachary Amsden
    Cc: Christoph Lameter
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
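
    The full-hint dispatch, modeled with a gcc builtin for the atomic
    exchange (all names are stand-ins):

        typedef unsigned long pteval_t;

        static pteval_t ptep_get_and_clear_full_model(pteval_t *ptep, int full)
        {
            if (full) {
                /* whole mm being torn down: nobody else can touch the pte,
                 * so skip the bus-locked exchange */
                pteval_t pte = *ptep;
                *ptep = 0;
                return pte;
            }
            return __sync_lock_test_and_set(ptep, 0);  /* atomic xchg stand-in */
        }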