15 May, 2008

1 commit

  • There is a defect in mprotect which lets the user change the page
    cache-type (PAT) bits, bypassing the kernel reserve_memtype and
    free_memtype wrappers. Fix the problem by not letting mprotect change
    the PAT bits.
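
    A minimal userspace sketch of the idea; the bit values and the
    pgprot_modify() helper are illustrative here (the real masks live in
    the x86 arch headers):

        #include <stdio.h>

        /* Illustrative x86 PTE cache-attribute bits. */
        #define _PAGE_PWT 0x008UL
        #define _PAGE_PCD 0x010UL
        #define _PAGE_PAT 0x080UL
        #define _PAGE_CACHE_MASK (_PAGE_PWT | _PAGE_PCD | _PAGE_PAT)

        /* Keep the caching attributes of the old protection; take the
         * access bits from the newly requested protection. */
        static unsigned long pgprot_modify(unsigned long oldprot,
                                           unsigned long newprot)
        {
                return (oldprot & _PAGE_CACHE_MASK) |
                       (newprot & ~_PAGE_CACHE_MASK);
        }

        int main(void)
        {
                unsigned long old = 0x1UL | _PAGE_PCD; /* present, uncached */
                unsigned long req = 0x1UL | 0x2UL;     /* present + writable */

                printf("%#lx\n", pgprot_modify(old, req)); /* PCD preserved */
                return 0;
        }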

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Suresh Siddha
    Signed-off-by: Ingo Molnar
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venki Pallipadi
     

23 Oct, 2007

1 commit

  • Fix mprotect bug in recent commit 3ed75eb8f1cd89565966599c4f77d2edb086d5b0
    (setup vma->vm_page_prot by vm_get_page_prot()): the vma_wants_writenotify
    case was setting the same protection as the non-writenotify case.

    Nothing is wrong with the use of protection_map[] in mmap_region(),
    but use vm_get_page_prot() there too, in the same ~VM_SHARED way.
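
    A sketch of the fix as described, modeled on mprotect_fixup() (with the
    same pattern applied in mmap_region()); exact code in the tree may
    differ:

        vma->vm_page_prot = vm_get_page_prot(newflags);
        if (vma_wants_writenotify(vma))
                /* Index as if private, so shared writeable pages start
                 * out write-protected and the first write traps. */
                vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED);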

    Signed-off-by: Hugh Dickins
    Cc: Coly Li
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

20 Oct, 2007

1 commit

  • This patch uses vm_get_page_prot() to set up vma->vm_page_prot.

    Though inside vm_get_page_prot() the protection flags are ANDed with
    (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED), this does not hurt correct code.
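
    As described, the helper reduces to a table lookup over those four
    flag bits; a sketch of its shape:

        pgprot_t vm_get_page_prot(unsigned long vm_flags)
        {
                return protection_map[vm_flags &
                        (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];
        }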

    Signed-off-by: Coly Li
    Cc: Hugh Dickins
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Coly Li
     

17 Oct, 2007

1 commit

  • The current ia64 kernel flushes the icache by lazy_mmu_prot_update()
    *after* set_pte(). This is too late. This patch removes
    lazy_mmu_prot_update and adds a modified set_pte() that flushes when
    necessary.

    This patch flushes the icache of a page when:
       the new pte has the exec bit
    && the new pte has the present bit
    && the new pte is a user page
    && (the old *ptep is not present
        || the new pte's pfn differs from the old *ptep's pfn)
    && the new pte's page does not have the Pg_arch_1 bit.
    Pg_arch_1 is set when a page is cache consistent.
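
    The condition list above, spelled as a single predicate; the helper
    name icache_flush_needed() is made up here, and the pte_*() accessors
    approximate the arch macros:

        /* Illustrative predicate, not the in-tree set_pte() itself. */
        static int icache_flush_needed(pte_t old, pte_t new,
                                       struct page *page)
        {
                return pte_exec(new) && pte_present(new) && pte_user(new) &&
                       (!pte_present(old) || pte_pfn(old) != pte_pfn(new)) &&
                       !test_bit(PG_arch_1, &page->flags);
        }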

    I think these condition checks are much easier to understand than
    asking "Where should sync_icache_dcache() be inserted?".

    pte_user() for ia64 was removed by http://lkml.org/lkml/2007/6/12/67 as
    a clean-up, so I have added it back.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Luck, Tony"
    Cc: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

20 Jul, 2007

1 commit

  • Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
    the old mm into the new mm.

    We create the new mm before the binfmt code runs, and place the new stack at
    the very top of the address space. Once the binfmt code runs and figures out
    where the stack should be, we move it downwards.

    It is a bit peculiar in that we have one task with two mm's, one of which is
    inactive.

    [a.p.zijlstra@chello.nl: limit stack size]
    Signed-off-by: Ollie Wild
    Signed-off-by: Peter Zijlstra
    Cc:
    Cc: Hugh Dickins
    [bunk@stusta.de: unexport bprm_mm_init]
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ollie Wild
     

01 Oct, 2006

1 commit

  • Implement lazy MMU update hooks which are SMP safe for both direct and shadow
    page tables. The idea is that PTE updates and page invalidations while in
    lazy mode can be batched into a single hypercall. We use this in VMI for
    shadow page table synchronization, and it is a win. It also can be used by
    PPC and for direct page tables on Xen.

    For SMP, the enter / leave must happen under protection of the page table
    locks for page tables which are being modified. This is because otherwise,
    you end up with stale state in the batched hypercall, which other CPUs can
    race ahead of. Doing this under the protection of the locks guarantees the
    synchronization is correct, and also means that spurious faults which are
    generated during this window by remote CPUs are properly handled, as the page
    fault handler must re-check the PTE under protection of the same lock.
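
    The usage pattern this implies, modeled on loops like
    mm/mprotect.c:change_pte_range(); details vary by tree:

        pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
        arch_enter_lazy_mmu_mode();
        do {
                /* batchable pte updates, e.g. set_pte_at() */
        } while (pte++, addr += PAGE_SIZE, addr != end);
        arch_leave_lazy_mmu_mode();     /* flush the batched operations */
        pte_unmap_unlock(pte - 1, ptl);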

    Signed-off-by: Zachary Amsden
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     

26 Sep, 2006

2 commits

  • mprotect() resets the page protections, which could result in extra
    write faults for pages that are already dirty and whose dirty state we
    track using write faults.
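
    A sketch of the resulting optimization, modeled on change_pte_range();
    dirty_accountable is the flag saying write faults track dirty state:

        ptent = pte_modify(ptent, newprot);
        /* Avoid taking write faults for pages we know to be dirty. */
        if (dirty_accountable && pte_dirty(ptent))
                ptent = pte_mkwrite(ptent);
        set_pte_at(mm, addr, pte, ptent);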

    Signed-off-by: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Tracking of dirty pages in shared writeable mmap()s.

    The idea is simple: write protect clean shared writeable pages, catch the
    write-fault, make writeable and set dirty. On page write-back clean all the
    PTE dirty bits and write protect them once again.

    The implementation is a tad harder, mainly because the default
    backing_dev_info capabilities were too loosely maintained. Hence it is not
    enough to test the backing_dev_info for cap_account_dirty.

    The current heuristic is as follows; a VMA is eligible when:
    - it is shared writeable:
      (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
    - it is not a 'special' mapping:
      (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
    - the backing_dev_info is cap_account_dirty:
      mapping_cap_account_dirty(vma->vm_file->f_mapping)
    - f_op->mmap() didn't change the default page protection
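
    The heuristic spelled as a predicate; this is a sketch of what became
    vma_wants_writenotify(), with the last condition elided:

        static int wants_writenotify(struct vm_area_struct *vma)
        {
                unsigned long flags = vma->vm_flags;

                if ((flags & (VM_WRITE | VM_SHARED)) !=
                    (VM_WRITE | VM_SHARED))
                        return 0;       /* not shared writeable */
                if (flags & (VM_PFNMAP | VM_INSERTPAGE))
                        return 0;       /* 'special' mapping */
                if (!vma->vm_file || !mapping_cap_account_dirty(
                                        vma->vm_file->f_mapping))
                        return 0;       /* dirty pages not accounted */
                /* plus: f_op->mmap() kept the default page protection */
                return 1;
        }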

    Pages from remap_pfn_range() are explicitly excluded because their COW
    semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
    because they don't have a backing store anyway.

    mprotect() is taught about the new behaviour as well. However, it
    overrides the last condition.

    Cleaning the pages on write-back is done with page_mkclean(), a new
    rmap call. It can be called on any page, but is currently only
    implemented for mapped pages; if the page is found to be of a VMA that
    accounts dirty pages, it will also write-protect the PTE.

    Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty()
    from under ->private_lock. This seems to be safe, since ->private_lock
    is used to serialize access to the buffers, not the page itself. This
    is needed because clear_page_dirty() will call into page_mkclean() and
    would thereby violate the locking order.

    [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
    Signed-off-by: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

23 Jun, 2006

3 commits

  • Add a new VMA operation to notify a filesystem or other driver about the
    MMU generating a fault because userspace attempted to write to a page
    mapped through a read-only PTE.

    This facility permits the filesystem or driver to:

    (*) Implement storage allocation/reservation on attempted write, and so to
    deal with problems such as ENOSPC more gracefully (perhaps by generating
    SIGBUS).

    (*) Delay making the page writable until the contents have been written to a
    backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS.
    It permits the filesystem to have some guarantee about the state of the
    cache.

    (*) Account for and limit the number of dirty pages. This is one piece
    of the puzzle needed to make shared writable mapping work safely in FUSE.

    Needed by cachefs (or is it cachefiles? Or fscache?).

    At least four other groups have stated an interest in it or a desire to use
    the functionality it provides: FUSE, OCFS2, NTFS and JFFS2. Also, things like
    EXT3 really ought to use it to deal with the case of shared-writable mmap
    encountering ENOSPC before we permit the page to be dirtied.
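
    A hedged sketch of a filesystem wiring up the new hook; the myfs_*
    names and the function body are illustrative (at this point in history
    the hook takes the struct page directly):

        static int myfs_page_mkwrite(struct vm_area_struct *vma,
                                     struct page *page)
        {
                /* e.g. reserve backing store for the page here; a
                 * negative return makes the write fault fail, so the
                 * process sees SIGBUS instead of silent data loss. */
                return 0;
        }

        static struct vm_operations_struct myfs_vm_ops = {
                .nopage       = filemap_nopage,
                .page_mkwrite = myfs_page_mkwrite,
        };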

    From: Peter Zijlstra

    get_user_pages(.write=1, .force=1) can generate COW hits on read-only
    shared mappings; this patch traps those as page_mkwrite candidates and
    fails to handle them the old way.

    Signed-off-by: David Howells
    Cc: Miklos Szeredi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: Anton Altaparmakov
    Cc: David Woodhouse
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Implement read/write migration ptes

    We take the upper two swap types for the two kinds of migration ptes
    and define a series of macros in swapops.h.

    The VM is modified to handle the migration entries. Migration entries
    can only be encountered when the page they point to is locked. This
    limits the number of places one has to fix. We also check in
    copy_pte_range and in mprotect_pte_range() for migration ptes.

    We check for migration ptes in do_swap_page and call a function that
    will then wait on the page lock. This allows us to effectively stop
    all accesses to the page.

    Migration entries are created by try_to_unmap if called for migration and
    removed by local functions in migrate.c
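
    A sketch of the encoding described, modeled on the 2.6.18-era
    swap.h/swapops.h (which is why type 30 shows up below):

        #define MAX_SWAPFILES_SHIFT     5
        /* Use the last two entries for page migration swap entries. */
        #define MAX_SWAPFILES           ((1 << MAX_SWAPFILES_SHIFT) - 2)
        #define SWP_MIGRATION_READ      MAX_SWAPFILES           /* 30 */
        #define SWP_MIGRATION_WRITE     (MAX_SWAPFILES + 1)     /* 31 */

        static inline swp_entry_t make_migration_entry(struct page *page,
                                                       int write)
        {
                BUG_ON(!PageLocked(page));
                return swp_entry(write ? SWP_MIGRATION_WRITE :
                                         SWP_MIGRATION_READ,
                                 page_to_pfn(page));
        }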

    From: Hugh Dickins

    Several times while testing swapless page migration (I've no NUMA, just
    hacking it up to migrate recklessly while running load), I've hit the
    BUG_ON(!PageLocked(p)) in migration_entry_to_page.

    This comes from an orphaned migration entry, unrelated to the current
    correctly locked migration, but hit by remove_anon_migration_ptes as it
    checks an address in each vma of the anon_vma list.

    Such an orphan may be left behind if an earlier migration raced with fork:
    copy_one_pte can duplicate a migration entry from parent to child, after
    remove_anon_migration_ptes has checked the child vma, but before it has
    removed it from the parent vma. (If the process were later to fault on this
    orphaned entry, it would hit the same BUG from migration_entry_wait.)

    This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
    not. There's no such problem with file pages, because vma_prio_tree_add
    adds child vma after parent vma, and the page table locking at each end is
    enough to serialize. Follow that example with anon_vma: add new vmas to the
    tail instead of the head.

    (There's no corresponding problem when inserting migration entries,
    because a missed pte will leave the page count and mapcount high, which is
    allowed for. And there's no corresponding problem when migrating via swap,
    because a leftover swap entry will be correctly faulted. But the swapless
    method has no refcounting of its entries.)

    From: Ingo Molnar

    pte_unmap_unlock() takes the pte pointer as an argument.

    From: Hugh Dickins

    Several times while testing swapless page migration, gcc has tried to exec
    a pointer instead of a string: smells like COW mappings are not being
    properly write-protected on fork.

    The protection in copy_one_pte looks very convincing, until at last you
    realize that the second arg to make_migration_entry is a boolean "write",
    and SWP_MIGRATION_READ is 30.

    Anyway, it's better done like in change_pte_range, using
    is_write_migration_entry and make_migration_entry_read.
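
    The corrected pattern, modeled on copy_one_pte(): test the entry's own
    type instead of passing a boolean into make_migration_entry():

        if (is_write_migration_entry(entry) && is_cow_mapping(vm_flags)) {
                /* COW mappings require pages in both parent
                 * and child to be set read-only. */
                make_migration_entry_read(&entry);
                set_pte_at(src_mm, addr, src_pte, swp_entry_to_pte(entry));
        }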

    From: Hugh Dickins

    Remove unnecessary obfuscation from sys_swapon's range check on swap type,
    which blew up causing memory corruption once swapless migration made
    MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.

    Signed-off-by: Hugh Dickins
    Acked-by: Martin Schwidefsky
    Signed-off-by: Hugh Dickins
    Signed-off-by: Christoph Lameter
    Signed-off-by: Ingo Molnar
    From: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • With likely/unlikely profiling on my not-so-busy typical development
    system there are 5k misses vs 2k hits. So I guess we should remove the
    unlikely.

    Signed-off-by: Hua Zhong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hua Zhong
     

22 Mar, 2006

1 commit

  • 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn't support hugetlb
    mprotect.

    From: David Gibson

    Remove a test from the mprotect() path which checks that the mprotect()ed
    range on a hugepage VMA is hugepage aligned (yes, really, the sense of
    is_aligned_hugepage_range() is the opposite of what you'd guess :-/).

    In fact, we don't need this test. If the given addresses match the
    beginning/end of a hugepage VMA they must already be suitably aligned. If
    they don't, then mprotect_fixup() will attempt to split the VMA. The very
    first test in split_vma() will check for a badly aligned address on a
    hugepage VMA and return -EINVAL if necessary.

    From: "Chen, Kenneth W"

    On i386 and x86-64, the pte flag _PAGE_PSE collides with _PAGE_PROTNONE.
    The identity of a hugetlb pte is lost when changing page protection via
    mprotect, and a page fault that occurs later will trigger a bug check
    in huge_pte_alloc().

    The fix is to always make the new pte a hugetlb pte, and also to clean
    up legacy code where _PAGE_PRESENT was forced on in the pre-faulting
    days.
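
    The core of the fix as described, modeled on
    hugetlb_change_protection(): rebuild the pte with pte_mkhuge() so
    _PAGE_PSE survives the protection change:

        pte = huge_ptep_get_and_clear(mm, address, ptep);
        pte = pte_mkhuge(pte_modify(pte, newprot));
        set_huge_pte_at(mm, address, ptep, pte);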

    Signed-off-by: Zhang Yanmin
    Cc: David Gibson
    Cc: "David S. Miller"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: William Lee Irwin III
    Signed-off-by: Ken Chen
    Signed-off-by: Nishanth Aravamudan
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
     

23 Nov, 2005

1 commit

  • The PageReserved removal in 2.6.15-rc1 issued a "deprecated" message when you
    tried to mmap or mprotect MAP_PRIVATE PROT_WRITE a VM_RESERVED, and failed
    with -EACCES: because do_wp_page lacks the refinement to COW pages in those
    areas, nor do we expect to find anonymous pages in them; and it seemed just
    bloat to add code for handling such a peculiar case. But immediately it
    caused vbetool and ddcprobe (using lrmi) to fail.

    So revert the "deprecated" messages, letting mmap and mprotect succeed. But
    leave do_wp_page's BUG_ON(vma->vm_flags & VM_RESERVED) in place until we've
    added the code to do it right: so this particular patch is only good if the
    app doesn't really need to write to that private area.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

30 Oct, 2005

3 commits

  • Convert those common loops using page_table_lock on the outside and
    pte_offset_map within to use just pte_offset_map_lock within instead.

    These all hold mmap_sem (some exclusively, some not), so at no level can a
    page table be whipped away from beneath them. But whereas pte_alloc loops
    tested with the "atomic" pmd_present, these loops are testing with pmd_none,
    which on i386 PAE tests both lower and upper halves.

    That's now unsafe, so add a cast into pmd_none to test only the vital lower
    half: we lose a little sensitivity to a corrupt middle directory, but not
    enough to worry about. It appears that i386 and UML were the only
    architectures vulnerable in this way, and that pgd and pud are no
    problem.
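
    The converted loop shape (as in change_pte_range()), plus an
    illustrative form of the i386 PAE pmd_none detail:

        pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
        do {
                /* examine or update *pte */
        } while (pte++, addr += PAGE_SIZE, addr != end);
        pte_unmap_unlock(pte - 1, ptl);

        /* i386 PAE: the cast tests only the vital lower half. */
        #define pmd_none(x)     (!(unsigned long)pmd_val(x))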

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove PageReserved() calls from core code by tightening VM_RESERVED
    handling in mm/ to cover PageReserved functionality.

    PageReserved special casing is removed from get_page and put_page.

    All setting and clearing of PageReserved is retained, and it is now flagged
    in the page_alloc checks to help ensure we don't introduce any refcount
    based freeing of Reserved pages.

    MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
    deprecated. We never completely handled it correctly anyway, and it can
    be reintroduced in future if required (Hugh has a proof of concept).

    Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
    arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
    be trivially removed.

    Last real user of PageReserved is swsusp, which uses PageReserved to
    determine whether a struct page points to valid memory or not. This still
    needs to be addressed (a generic page_is_ram() should work).

    A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
    thus mapcounted and counted towards shared rss). These writes to the struct
    page could cause excessive cacheline bouncing on big systems. There are a
    number of ways this could be addressed if it is an issue.

    Signed-off-by: Nick Piggin

    Refcount bug fix for filemap_xip.c

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The original vm_stat_account has fallen into disuse, with only one user, and
    only one user of vm_stat_unaccount. It's easier to keep track if we convert
    them all to __vm_stat_account, then free it from its __shackles.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

22 Sep, 2005

1 commit

  • Hugh made me note this line for permission checking in mprotect():

    if ((newflags & ~(newflags >> 4)) & 0xf) {

    after figuring out what that's about, I decided it's nasty enough. Btw,
    Hugh himself didn't like the 0xf.

    We can safely change it to VM_READ|VM_WRITE|VM_EXEC because we never change
    VM_SHARED, so no need to check that.
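
    A self-contained demonstration of the trick; the flag values below are
    illustrative but follow the historical layout, where each VM_MAY* bit
    sits exactly four bits above its VM_* twin:

        #include <stdio.h>

        #define VM_READ     0x01
        #define VM_WRITE    0x02
        #define VM_EXEC     0x04
        #define VM_SHARED   0x08
        #define VM_MAYREAD  0x10
        #define VM_MAYWRITE 0x20
        #define VM_MAYEXEC  0x40
        #define VM_MAYSHARE 0x80

        int main(void)
        {
                /* Write requested, but only VM_MAYREAD granted. */
                unsigned long newflags = VM_READ | VM_WRITE | VM_MAYREAD;

                /* Shifting right by 4 aligns each VM_MAY* bit with its
                 * VM_* twin; a surviving low bit means a permission was
                 * requested without the matching VM_MAY* bit. */
                if ((newflags & ~(newflags >> 4)) & 0xf)
                        printf("rejected: 0xf form\n");

                /* The readable spelling proposed by the patch: */
                if ((newflags & ~(newflags >> 4)) &
                    (VM_READ | VM_WRITE | VM_EXEC))
                        printf("rejected: named-flags form\n");
                return 0;
        }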

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds