14 Aug, 2013

1 commit

  • Andy reported that if a file page gets reclaimed we lose the soft-dirty bit
    if it was set, so save the _PAGE_BIT_SOFT_DIRTY bit when the page address
    gets encoded into the pte entry. Thus when a #pf happens on such a
    non-present pte we can restore it.
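
    A minimal user-space sketch of the idea, assuming a hypothetical 64-bit
    layout (this is not the real x86 pte encoding): the non-present file pte
    keeps one spare bit for soft-dirty next to the encoded page offset, so the
    flag survives until the fault handler decodes the entry. All names here
    (FILE_PTE_*, pgoff_to_file_pte, ...) are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, NOT the kernel's: bit 0 marks a file pte,
 * bit 1 carries soft-dirty, the remaining bits hold the page offset. */
#define FILE_PTE_MARK        (UINT64_C(1) << 0)
#define FILE_PTE_SOFT_DIRTY  (UINT64_C(1) << 1)
#define FILE_PTE_SHIFT       2

/* Encode a page offset into a non-present entry, keeping soft-dirty. */
static uint64_t pgoff_to_file_pte(uint64_t pgoff, bool soft_dirty)
{
    uint64_t pte = (pgoff << FILE_PTE_SHIFT) | FILE_PTE_MARK;

    if (soft_dirty)
        pte |= FILE_PTE_SOFT_DIRTY;
    return pte;
}

/* Decode the page offset again, e.g. from a fault handler. */
static uint64_t file_pte_to_pgoff(uint64_t pte)
{
    return pte >> FILE_PTE_SHIFT;
}

/* Recover the saved soft-dirty state from the non-present entry. */
static bool file_pte_soft_dirty(uint64_t pte)
{
    return (pte & FILE_PTE_SOFT_DIRTY) != 0;
}
```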

    Reported-by: Andy Lutomirski
    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Cc: Minchan Kim
    Cc: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     

29 Mar, 2013

1 commit

  • This reverts commit 186930500985 ("mm: introduce VM_POPULATE flag to
    better deal with racy userspace programs").

    VM_POPULATE only has any effect when userspace plays racy games with
    vmas by trying to unmap and remap memory regions that mmap or mlock are
    operating on.

    Also, the only effect of VM_POPULATE when userspace plays such games is
    that it avoids populating new memory regions that get remapped into the
    address range that was being operated on by the original mmap or mlock
    calls.

    Let's remove VM_POPULATE as there isn't any strong argument to mandate a
    new vm_flag.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

15 Mar, 2013

1 commit

  • The vm_flags introduced in 6d7825b10dbe ("mm/fremap.c: fix oops on error
    path") is supposed to avoid a compiler warning about uninitialized
    vm_flags without changing the generated code.

    However I am concerned that this is going to be very brittle, and fail
    with some compiler versions. The failure could be either of:

    - compiler could actually load vma->vm_flags before checking for the
    !vma condition, thus reintroducing the oops

    - compiler could optimize out the !vma check, since the pointer just got
    dereferenced shortly before (so the compiler knows it can't be NULL!)

    I propose reversing this part of the change and initializing vm_flags to 0
    just to avoid the bogus uninitialized use warning.
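
    A hedged sketch of the shape being proposed, using a stand-in struct and
    function (vma_sketch and lookup_flags are hypothetical, not the fremap.c
    code): the pointer is tested before any dereference, and vm_flags starts
    at 0 purely so the compiler cannot complain about a possibly
    uninitialized read on the shared exit path.

```c
#include <stddef.h>

/* Stand-in for the real vma; only the field the sketch needs. */
struct vma_sketch {
    unsigned long vm_flags;
};

/* Returns 0 and stores the flags on success, -1 if vma is NULL. */
static int lookup_flags(const struct vma_sketch *vma, unsigned long *flags_out)
{
    unsigned long vm_flags = 0;   /* only here to silence the warning */
    int err = -1;

    if (!vma)                     /* check first, never dereference NULL */
        goto out;

    vm_flags = vma->vm_flags;     /* safe: vma is known to be non-NULL */
    err = 0;
out:
    if (!err)
        *flags_out = vm_flags;
    return err;
}
```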

    Signed-off-by: Michel Lespinasse
    Cc: Tommi Rantala
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

14 Mar, 2013

1 commit

  • If find_vma() fails, sys_remap_file_pages() will dereference `vma', which
    contains NULL. Fix it by checking the pointer.

    (We could alternatively check for err==0, but this seems more direct)

    (The vm_flags change is to squish a bogus used-uninitialised warning
    without adding extra code).

    Reported-by: Tommi Rantala
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

24 Feb, 2013

4 commits

  • The mm_populate() code populates user mappings without constantly
    holding the mmap_sem. This makes it susceptible to racy userspace
    programs: the user mappings may change while mm_populate() is running,
    and in this case mm_populate() may end up populating the new mapping
    instead of the old one.

    In order to reduce the possibility of userspace getting surprised by
    this behavior, this change introduces the VM_POPULATE vma flag which
    gets set on vmas we want mm_populate() to work on. This way
    mm_populate() may still end up populating the new mapping after such a
    race, but only if the new mapping is also one that the user has
    requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • After the MAP_POPULATE handling has been moved to mmap_region() call
    sites, the only remaining use of the flags argument is to pass the
    MAP_NORESERVE flag. This can be just as easily handled by
    do_mmap_pgoff(), so do that and remove the mmap_region() flags
    parameter.

    [akpm@linux-foundation.org: remove double parens]
    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Signed-off-by: Michel Lespinasse
    Reviewed-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • We have many vma manipulation functions that are fast in the typical
    case, but can optionally be instructed to populate an unbounded number
    of ptes within the region they work on:

    - mmap with MAP_POPULATE or MAP_LOCKED flags;
    - remap_file_pages() with MAP_NONBLOCK not set or when working on a
    VM_LOCKED vma;
    - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in
    effect;
    - brk() when mlock(MCL_FUTURE) is in effect.

    Current code handles these pte operations locally, while the
    surrounding code has to hold the mmap_sem write side since it's
    manipulating vmas. This means we're doing an unbounded amount of pte
    population work with mmap_sem held, and this causes problems as Andy
    Lutomirski reported (we've hit this at Google as well, though it's not
    entirely clear why people keep trying to use mlock(MCL_FUTURE) in the
    first place).

    I propose introducing a new mm_populate() function to do this pte
    population work after the mmap_sem has been released. mm_populate()
    does need to acquire the mmap_sem read side, but critically, it doesn't
    need to hold it continuously for the entire duration of the operation -
    it can drop it whenever things take too long (such as when hitting disk
    for a file read) and re-acquire it later on.

    The following patches are included:

    - Patch 1 fixes some issues I noticed while working on the existing code.
    If needed, it could potentially go in before the rest of the patches.

    - Patch 2 introduces the new mm_populate() function and changes
    mmap_region() call sites to use it after they drop mmap_sem. This is
    inspired by Andy Lutomirski's proposal and is built as an extension
    of the work I had previously done for mlock() and mlockall() around
    v2.6.38-rc1. I had tried doing something similar at the time but had
    given up as there were so many do_mmap() call sites; the recent cleanups
    by Linus and Viro are a tremendous help here.

    - Patches 3-5 convert some of the less-obvious places doing unbounded
    pte populates to the new mm_populate() mechanism.

    - Patches 6-7 are code cleanups that are made possible by the
    mm_populate() work. In particular, they remove more code than the
    entire patch series added, which should be a good thing :)

    - Patch 8 is optional to this entire series. It only helps to deal more
    nicely with racy userspace programs that might modify their mappings
    while we're trying to populate them. It adds a new VM_POPULATE flag
    on the mappings we do want to populate, so that if userspace replaces
    them with mappings it doesn't want populated, mm_populate() won't
    populate those replacement mappings.

    This patch:

    Assorted small fixes. The first two are quite small:

    - Move check for vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR)
    within existing if (!(vma->vm_flags & VM_NONLINEAR)) block.
    Purely cosmetic.

    - In the VM_LOCKED case, when dropping PG_Mlocked for the over-mapped
    range, make sure we own the mmap_sem write lock around the
    munlock_vma_pages_range call as this manipulates the vma's vm_flags.

    The last fix requires a longer explanation. remap_file_pages() can do its
    work either through VM_NONLINEAR manipulation or by creating extra vmas.
    These two cases were inconsistent with each other (and ultimately, both
    wrong) as to exactly when they faulted in the newly mapped file pages:

    - In the VM_NONLINEAR case, new file pages would be populated if
    the MAP_NONBLOCK flag wasn't passed. If MAP_NONBLOCK was passed,
    new file pages wouldn't be populated even if the vma is already
    marked as VM_LOCKED.

    - In the linear (emulated) case, the work is passed to the mmap_region()
    function which would populate the pages if the vma is marked as
    VM_LOCKED, and would not otherwise - regardless of the value of the
    MAP_NONBLOCK flag, because MAP_POPULATE wasn't being passed to
    mmap_region().

    The desired behavior is that we want the pages to be populated and locked
    if the vma is marked as VM_LOCKED, or to be populated if the MAP_NONBLOCK
    flag is not passed to remap_file_pages().
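
    The desired behavior can be written down as two small predicates. This is
    only a sketch: the SK_ flag names and values are illustrative stand-ins
    chosen for the example, and should_populate()/should_lock() are
    hypothetical helpers, not functions from the kernel.

```c
#include <stdbool.h>

/* Illustrative stand-ins for VM_LOCKED and MAP_NONBLOCK. */
#define SK_VM_LOCKED     0x00002000UL
#define SK_MAP_NONBLOCK  0x00010000UL

/* Populate if the vma is locked, or if the caller did not ask for
 * non-blocking behavior. */
static bool should_populate(unsigned long vm_flags, unsigned long map_flags)
{
    return (vm_flags & SK_VM_LOCKED) || !(map_flags & SK_MAP_NONBLOCK);
}

/* Populated pages are additionally locked only for VM_LOCKED vmas. */
static bool should_lock(unsigned long vm_flags)
{
    return (vm_flags & SK_VM_LOCKED) != 0;
}
```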

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

20 Oct, 2012

1 commit

  • In commit 0b173bc4daa8 ("mm: kill vma flag VM_CAN_NONLINEAR") we
    replaced the VM_CAN_NONLINEAR test with checking whether the mapping has
    a '->remap_pages()' vm operation, but there is no guarantee that
    it even has a vm_ops pointer at all.

    Add the appropriate test for NULL vm_ops.
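
    A small sketch of the two-step check, with stand-in types
    (vm_ops_sketch, vma_sketch2 and can_remap are hypothetical names): the
    ops table pointer is tested before the method pointer inside it is read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the ops table and the vma; only what the sketch needs. */
struct vm_ops_sketch {
    int (*remap_pages)(void);
};

struct vma_sketch2 {
    const struct vm_ops_sketch *vm_ops;   /* may legitimately be NULL */
};

/* Dummy method so an ops table can be populated in examples. */
static int dummy_remap_pages(void)
{
    return 0;
}

/* True only if there is an ops table AND it provides remap_pages. */
static bool can_remap(const struct vma_sketch2 *vma)
{
    return vma->vm_ops != NULL && vma->vm_ops->remap_pages != NULL;
}
```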

    Reported-by: Sasha Levin
    Cc: Konstantin Khlebnikov
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

09 Oct, 2012

2 commits

  • Implement an interval tree as a replacement for the VMA prio_tree. The
    algorithms are similar to lib/interval_tree.c; however that code can't be
    directly reused as the interval endpoints are not explicitly stored in the
    VMA. So instead, the common algorithm is moved into a template and the
    details (node type, how to get interval endpoints from the node, etc) are
    filled in using the C preprocessor.

    Once the interval tree functions are available, using them as a
    replacement to the VMA prio tree is a relatively simple, mechanical job.
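
    To illustrate the template idea in miniature (this is not the kernel's
    interval tree template, just a sketch of the technique): a macro takes
    the node type and the endpoint accessors and stamps out a typed function,
    so a node that stores its endpoints explicitly and a vma-like node that
    derives them can share one algorithm. All names below are hypothetical.

```c
#include <stdbool.h>

/* "Template": generate a closed-interval overlap test for any node type,
 * given macros that extract the first and last index from a node. */
#define DEFINE_OVERLAP_FN(NAME, TYPE, START, LAST)                     \
static bool NAME(const TYPE *node, unsigned long start,               \
                 unsigned long last)                                   \
{                                                                      \
    return START(node) <= last && start <= LAST(node);                 \
}

/* Instance 1: endpoints stored explicitly in the node. */
struct range_node {
    unsigned long start, last;
};
#define RANGE_START(n) ((n)->start)
#define RANGE_LAST(n)  ((n)->last)
DEFINE_OVERLAP_FN(range_overlaps, struct range_node, RANGE_START, RANGE_LAST)

/* Instance 2: a vma-like node whose last index must be computed. */
struct vma_like {
    unsigned long pgoff, npages;
};
#define VMA_START(v) ((v)->pgoff)
#define VMA_LAST(v)  ((v)->pgoff + (v)->npages - 1)
DEFINE_OVERLAP_FN(vma_overlaps, struct vma_like, VMA_START, VMA_LAST)
```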

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Move actual pte filling for non-linear file mappings into the new special
    vma operation: ->remap_pages().

    Filesystems must implement this method to get non-linear mapping support;
    if a filesystem uses filemap_fault() then generic_file_remap_pages() can be
    used.

    Now device drivers can implement this method and obtain nonlinear vma support.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Chris Metcalf #arch/tile
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Tetsuo Handa
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

27 Sep, 2012

1 commit


31 Oct, 2011

1 commit


27 May, 2011

1 commit

  • The type of vma->vm_flags is 'unsigned long', neither 'int' nor
    'unsigned int'. This patch fixes such misuse.

    Signed-off-by: KOSAKI Motohiro
    [ Changed to use a typedef - we'll extend it to cover more cases
    later, since there has been discussion about making it a 64-bit
    type.. - Linus ]
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

25 May, 2011

1 commit

  • Straightforward conversion of i_mmap_lock to a mutex.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

26 Sep, 2010

1 commit

  • Thomas Pollet noticed that the remap_file_pages() system call in
    fremap.c has a potential overflow in the first part of the if statement
    below, which could cause it to process bogus input parameters.
    Specifically, the sum of the pgoff and size parameters could wrap, thereby
    preventing the system call from failing when it should.
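
    A minimal sketch of how such a wrap can be detected with unsigned
    arithmetic (range_wraps is a hypothetical helper, and the sketch assumes
    the size has already been converted to a page count):

```c
#include <stdbool.h>

/* Unsigned addition wraps modulo 2^N, so the sum is smaller than the
 * first operand exactly when the true result did not fit. */
static bool range_wraps(unsigned long pgoff, unsigned long npages)
{
    return pgoff + npages < pgoff;
}
```

    A caller would reject the request when range_wraps() is true, before
    comparing the end of the range against any limit.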

    Reported-by: Thomas Pollet
    Signed-off-by: Larry Woodman
    Signed-off-by: Linus Torvalds

    Larry Woodman
     

25 Sep, 2010

1 commit

  • Thomas Pollet points out that the 'end' variable is broken. It was
    computed based on start/size before they were page-aligned, and as such
    doesn't actually match any of the other actions we take. The overflow
    test on end was also redundant, since we had already tested it with the
    properly aligned version.

    So just get rid of it entirely. The one remaining use for that broken
    variable can just use 'start+size' like all the other cases already did.
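
    A small numeric sketch of why the stale value is wrong, assuming 4096-byte
    pages and assuming start and size are each rounded down to a page
    boundary (the SK_ macros and both helpers are hypothetical names):

```c
/* Assumed page size for the sketch. */
#define SK_PAGE_SIZE 4096UL
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1))

/* 'end' as computed from the raw, unaligned arguments. */
static unsigned long stale_end(unsigned long start, unsigned long size)
{
    return start + size;
}

/* start + size after both have been page-aligned, which is what the
 * rest of the code actually operates on. */
static unsigned long aligned_end(unsigned long start, unsigned long size)
{
    return (start & SK_PAGE_MASK) + (size & SK_PAGE_MASK);
}
```

    With start = 0x1234 and size = 0x2345 the two helpers return 0x3579 and
    0x3000 respectively, so a check made against the former does not describe
    the range that is really remapped.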

    Reported-by: Thomas Pollet
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

07 Mar, 2010

1 commit

  • Presently, the per-mm statistics counters are defined by macros in sched.h.

    This patch changes them to be
    - defined in mm.h as inline functions
    - backed by an array instead of macro-generated names.

    This patch reduces the size of a future patch that modifies the
    implementation of the per-mm counters.
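
    A sketch of the array-plus-inline-functions shape described above. The
    struct, the enum members and the *_sketch helpers are hypothetical
    stand-ins, not the definitions that went into mm.h.

```c
/* Counter indices; the last member gives the array size. */
enum {
    SK_MM_FILEPAGES,
    SK_MM_ANONPAGES,
    SK_NR_MM_COUNTERS
};

/* Stand-in for the part of mm_struct that holds the counters. */
struct mm_sketch {
    long counters[SK_NR_MM_COUNTERS];
};

static inline long get_mm_counter_sketch(const struct mm_sketch *mm, int member)
{
    return mm->counters[member];
}

static inline void add_mm_counter_sketch(struct mm_sketch *mm, int member,
                                         long value)
{
    mm->counters[member] += value;
}

static inline void inc_mm_counter_sketch(struct mm_sketch *mm, int member)
{
    add_mm_counter_sketch(mm, member, 1);
}

static inline void dec_mm_counter_sketch(struct mm_sketch *mm, int member)
{
    add_mm_counter_sketch(mm, member, -1);
}
```

    Because every access goes through one of these helpers, changing how the
    counters are stored later only touches the helper bodies.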

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

11 Feb, 2009

1 commit

  • When overcommit is disabled, the core VM accounts for pages used by anonymous
    shared, private mappings and special mappings. It keeps track of VMAs that
    should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
    with VM_NORESERVE.

    Overcommit for hugetlbfs is much riskier than overcommit for base pages
    due to contiguity requirements. It avoids overcommitting on both shared and
    private mappings using reservation counters that are checked and updated
    during mmap(). This ensures (within limits) that hugepages exist in the
    future when faults occur; otherwise it is too easy for applications to be
    SIGKILLed.

    As hugetlbfs makes its own reservations of a different unit to the base page
    size, VM_ACCOUNT should never be set. Even if the units were correct, we would
    double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may
    be set because an application can request no reserves be made for hugetlbfs
    at the risk of getting killed later.

    With commit fc8744adc870a8d4366908221508bb113d8b72ee, VM_NORESERVE and
    VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This
    breaks the accounting for both the core VM and hugetlbfs, and can trigger an
    OOM storm when hugepage pools are too small, or lockups and corrupted
    counters otherwise. This patch brings hugetlbfs more in line with how the
    core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set.

    Signed-off-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

14 Jan, 2009

1 commit


07 Jan, 2009

1 commit

  • Remove page_remove_rmap()'s vma arg, which was only for the Eeek message.
    And remove the BUG_ON(page_mapcount(page) == 0) from CONFIG_DEBUG_VM's
    page_dup_rmap(): we're trying to be more resilient about that than BUGs.

    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

20 Oct, 2008

1 commit

  • Originally by Nick Piggin

    Remove mlocked pages from the LRU using "unevictable infrastructure"
    during mmap(), munmap(), mremap() and truncate(). Try to move back to
    normal LRU lists on munmap() when last mlocked mapping removed. Remove
    PageMlocked() status when page truncated from file.

    [akpm@linux-foundation.org: cleanup]
    [kamezawa.hiroyu@jp.fujitsu.com: fix double unlock_page()]
    [kosaki.motohiro@jp.fujitsu.com: split LRU: munlock rework]
    [lee.schermerhorn@hp.com: mlock: fix __mlock_vma_pages_range comment block]
    [akpm@linux-foundation.org: remove bogus kerneldoc token]
    Signed-off-by: Nick Piggin
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Rik van Riel
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

29 Jul, 2008

1 commit

  • With KVM/GRU/XPMEM there isn't just the primary CPU MMU pointing to pages.
    There are secondary MMUs (with secondary sptes and secondary tlbs) too.
    sptes in the kvm case are shadow pagetables, but when I say spte in
    mmu-notifier context, I mean "secondary pte". In GRU case there's no
    actual secondary pte and there's only a secondary tlb because the GRU
    secondary MMU has no knowledge about sptes and every secondary tlb miss
    event in the MMU always generates a page fault that has to be resolved by
    the CPU (this is not the case for KVM, where a secondary tlb miss will
    walk sptes in hardware and refill the secondary tlb transparently
    to software if the corresponding spte is present). The same way
    zap_page_range has to invalidate the pte before freeing the page, the spte
    (and secondary tlb) must also be invalidated before any page is freed and
    reused.

    Currently we take a page_count pin on every page mapped by sptes, but that
    means the pages can't be swapped whenever they're mapped by any spte
    because they're part of the guest working set. Furthermore, a spte unmap
    event can immediately lead to a page being freed when the pin is released
    (thus requiring the same complex and relatively slow tlb_gather smp safe
    logic we have in zap_page_range, which can be avoided completely if the
    spte unmap event doesn't require an unpin of the page previously mapped in
    the secondary MMU).

    The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and know
    when the VM is swapping or freeing or doing anything on the primary MMU so
    that the secondary MMU code can drop sptes before the pages are freed,
    avoiding all page pinning and allowing 100% reliable swapping of guest
    physical address space. Furthermore, it spares the code that tears down
    the mappings of the secondary MMU from implementing tlb_gather-like logic
    as in zap_page_range, which would require many IPIs to flush other cpu
    tlbs for each fixed number of sptes unmapped.

    To give an example: if what happens on the primary MMU is a protection
    downgrade (from writeable to wrprotect), the secondary MMU mappings will be
    invalidated, and the next secondary-mmu-page-fault will call
    get_user_pages. If it calls get_user_pages with write=1, that triggers a
    do_wp_page and re-establishes an updated spte or secondary-tlb-mapping on
    the copied page. If it calls get_user_pages with write=0 (a guest read),
    it will set up a readonly spte or readonly tlb mapping. This is just an
    example.

    This allows mapping any page pointed to by any pte (and in turn visible in
    the primary CPU MMU) into a secondary MMU (be it a pure tlb like GRU, or a
    full MMU with both sptes and secondary-tlb like the shadow-pagetable layer
    with kvm), or a remote DMA in software like XPMEM (hence the need to
    schedule in XPMEM code to send the invalidate to the remote node, while
    there is no need to schedule in kvm/gru as it's an immediate event like
    invalidating a primary-mmu pte).

    At least for KVM without this patch it's impossible to swap guests
    reliably. And having this feature and removing the page pin allows
    several other optimizations that simplify life considerably.

    Dependencies:

    1) mm_take_all_locks() to register the mmu notifier when the whole VM
    isn't doing anything with "mm". This allows mmu notifier users to keep
    track of whether the VM is in the middle of the invalidate_range_begin/end
    critical section with an atomic counter increased in range_begin and
    decreased in range_end. No secondary MMU page fault is allowed to map
    any spte or secondary tlb reference, while the VM is in the middle of
    range_begin/end as any page returned by get_user_pages in that critical
    section could later immediately be freed without any further
    ->invalidate_page notification (invalidate_range_begin/end works on
    ranges and ->invalidate_page isn't called immediately before freeing
    the page). To stop all page freeing and pagetable overwrites the
    mmap_sem must be taken in write mode and all other anon_vma/i_mmap
    locks must be taken too.

    2) It'd be a waste to add branches in the VM if nobody could possibly
    run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only be enabled if
    CONFIG_KVM=m/y. In the current kernel kvm won't yet take advantage of
    mmu notifiers, but this already allows compiling a KVM external module
    against a kernel with mmu notifiers enabled and from the next pull from
    kvm.git we'll start using them. And GRU/XPMEM will also be able to
    continue the development by enabling KVM=m in their config, until they
    submit all GRU/XPMEM GPLv2 code to the mainline kernel. Then they can
    also enable MMU_NOTIFIERS in the same way KVM does it (even if KVM=n).
    This guarantees nobody selects MMU_NOTIFIER=y if KVM and GRU and XPMEM
    are all =n.
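
    The counter described in dependency 1) can be sketched in user space with
    C11 atomics. This is only an illustration of the counting scheme: the
    struct and function names are hypothetical, and it omits the locking and
    retry logic a real secondary-MMU fault path would also need.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-user state tracking nested invalidate sections. */
struct notifier_sketch {
    atomic_int invalidate_count;
};

static void notifier_sketch_init(struct notifier_sketch *n)
{
    atomic_init(&n->invalidate_count, 0);
}

/* Called when the primary MMU starts invalidating a range. */
static void sketch_range_begin(struct notifier_sketch *n)
{
    atomic_fetch_add(&n->invalidate_count, 1);
}

/* Called when the primary MMU has finished with the range. */
static void sketch_range_end(struct notifier_sketch *n)
{
    atomic_fetch_sub(&n->invalidate_count, 1);
}

/* A secondary fault may install a mapping only outside all sections. */
static bool sketch_may_map(struct notifier_sketch *n)
{
    return atomic_load(&n->invalidate_count) == 0;
}
```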

    The mmu_notifier_register call can fail because mm_take_all_locks may be
    interrupted by a signal and return -EINTR. Because mmu_notifier_register
    is used when a driver starts up, a failure can be gracefully handled. Here
    is an example of the change applied to kvm to register the mmu notifiers.
    Usually when a driver starts up other allocations are required anyway and
    -ENOMEM failure paths exist already.

    struct kvm *kvm_arch_create_vm(void)
    {
            struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
    +       int err;

            if (!kvm)
                    return ERR_PTR(-ENOMEM);

            INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);

    +       kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops;
    +       err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm);
    +       if (err) {
    +               kfree(kvm);
    +               return ERR_PTR(err);
    +       }
    +
            return kvm;
    }

    mmu_notifier_unregister returns void and it's reliable.

    The patch also adds a few needed but missing includes that would prevent
    the kernel from compiling after these changes on non-x86 archs (x86 didn't
    need them by luck).

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix mm/filemap_xip.c build]
    [akpm@linux-foundation.org: fix mm/mmu_notifier.c build]
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Nick Piggin
    Signed-off-by: Christoph Lameter
    Cc: Jack Steiner
    Cc: Robin Holt
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Kanoj Sarcar
    Cc: Roland Dreier
    Cc: Steve Wise
    Cc: Avi Kivity
    Cc: Hugh Dickins
    Cc: Rusty Russell
    Cc: Anthony Liguori
    Cc: Chris Wright
    Cc: Marcelo Tosatti
    Cc: Eric Dumazet
    Cc: "Paul E. McKenney"
    Cc: Izik Eidus
    Cc: Anthony Liguori
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

20 Mar, 2008

1 commit

  • Fix various kernel-doc notation in mm/:

    filemap.c: add function short description; convert 2 to kernel-doc
    fremap.c: change parameter 'prot' to @prot
    pagewalk.c: change "-" in function parameters to ":"
    slab.c: fix short description of kmem_ptr_validate()
    swap.c: fix description & parameters of put_pages_list()
    swap_state.c: fix function parameters
    vmalloc.c: change "@returns" to "Returns:" since that is not a parameter

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

06 Feb, 2008

1 commit


17 Oct, 2007

2 commits

  • Fix kernel-doc for sys_remap_file_pages() and add info to the 'prot' NOTE.
    Rename __prot parameter to prot.

    Signed-off-by: Randy Dunlap
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • mm.h doesn't use directly anything from mutex.h and backing-dev.h, so
    remove them and add them back to files which need them.

    Cross-compile tested on many configs and archs.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

09 Oct, 2007

1 commit


20 Jul, 2007

3 commits

  • page_mkclean() doesn't re-protect ptes for non-linear mappings, so a later
    re-dirty through such a mapping will not generate a fault, PG_dirty will
    not reflect the dirty state and the dirty count will be skewed. This
    implies that msync() is also currently broken for nonlinear mappings.

    The easiest solution is to emulate remap_file_pages on non-linear mappings
    with simple mmap() for non ram-backed filesystems. Applications continue
    to work (albeit slower), as long as the number of remappings remains below
    the maximum vma count.

    However all currently known real uses of non-linear mappings are for ram
    backed filesystems, which this patch doesn't affect.

    Signed-off-by: Miklos Szeredi
    Acked-by: Peter Zijlstra
    Cc: William Lee Irwin III
    Cc: Nick Piggin
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Change ->fault prototype. We now return an int, which contains
    VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
    FAULT_RET_ code tells the VM whether a page was found, whether it has been
    locked, and potentially other things. This is not quite the way he wanted
    it yet, but that's changed in the next patch (which requires changes to
    arch code).
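
    The two-byte layout described above can be sketched as follows. The mask,
    shift and helper names are hypothetical, and the sketch uses unsigned int
    rather than int to keep the shifts well defined.

```c
/* Low byte: VM_FAULT_xxx-style code. Next byte: FAULT_RET_xxx-style flags. */
#define SK_FAULT_BYTE_MASK  0xffu
#define SK_FAULT_RET_SHIFT  8

/* Combine a fault code and return flags into one return value. */
static unsigned int sk_pack_fault(unsigned int fault_code,
                                  unsigned int ret_flags)
{
    return (fault_code & SK_FAULT_BYTE_MASK) |
           ((ret_flags & SK_FAULT_BYTE_MASK) << SK_FAULT_RET_SHIFT);
}

/* Extract the fault code from the low byte. */
static unsigned int sk_fault_code(unsigned int ret)
{
    return ret & SK_FAULT_BYTE_MASK;
}

/* Extract the return flags from the second byte. */
static unsigned int sk_fault_ret_flags(unsigned int ret)
{
    return (ret >> SK_FAULT_RET_SHIFT) & SK_FAULT_BYTE_MASK;
}
```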

    This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
    that a page is locked which requires filemap_nopage to go away (because we
    can no longer remain backward compatible without that flag), but we were
    going to do that anyway.

    struct fault_data is renamed to struct vm_fault as Linus asked. address
    is now a void __user * that we should firmly encourage drivers not to use
    without really good reason.

    The page is now returned via a page pointer in the vm_fault struct.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
    the virtual address -> file offset differently from linear mappings.

    ->populate is a layering violation because the filesystem/pagecache code
    should not need to know anything about the virtual memory mapping. The hitch here
    is that the ->nopage handler didn't pass down enough information (ie. pgoff).
    But it is more logical to pass pgoff rather than have the ->nopage function
    calculate it itself anyway (because that's a similar layering violation).

    Having the populate handler install the pte itself is likewise a nasty thing
    to be doing.

    This patch introduces a new fault handler that replaces ->nopage and
    ->populate and (later) ->nopfn. Most of the old mechanism is still in place
    so there is a lot of duplication and nice cleanups that can be removed if
    everyone switches over.

    The rationale for doing this in the first place is that nonlinear mappings are
    subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
    to duplicate the synchronisation logic rather than just consolidate the two.

    After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
    pagecache. Seems like a fringe functionality anyway.

    NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no
    users have hit mainline yet.

    [akpm@linux-foundation.org: cleanup]
    [randy.dunlap@oracle.com: doc. fixes for readahead]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Randy Dunlap
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

23 Dec, 2006

1 commit


08 Dec, 2006

1 commit


01 Oct, 2006

1 commit

  • Change pte_clear_full to a more appropriately named pte_clear_not_present,
    allowing optimizations when not-present mapping changes need not be reflected
    in the hardware TLB for protected page table modes. There is also another
    case that can use it in the fremap code.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     

26 Sep, 2006

1 commit


23 Jun, 2006

1 commit

  • There are two calls to update_mmu_cache in fremap.c, both defective.
    The one in install_page needs to be accompanied by lazy_mmu_prot_update
    (some other cleanup time, move that into ia64 update_mmu_cache itself); and
    the one in install_file_pte should be removed since the pte is not present.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

30 Nov, 2005

1 commit


29 Nov, 2005

1 commit

  • This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
    explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
    VM area to contain an arbitrary range of page table entries that the VM
    never touches, and never considers to be normal pages.

    Any user of "remap_pfn_range()" automatically gets this new
    functionality, and doesn't even have to mark the pages reserved or
    indeed mark them any other way. It just works. As a side effect, doing
    mmap() on /dev/mem works for arbitrary ranges.

    Sparc update from David in the next commit.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Nov, 2005

2 commits

  • There's one peculiar use of VM_RESERVED which the previous patch left behind:
    because VM_NONLINEAR's try_to_unmap_cluster uses vm_private_data as a swapout
    cursor, but should never meet VM_RESERVED vmas, it was a way of extending
    VM_NONLINEAR to VM_RESERVED vmas using vm_private_data for some other purpose.
    But that's an empty set - they don't have the populate function required. So
    just throw away those VM_RESERVED tests.

    But one more interesting test in rmap.c has to go too: try_to_unmap_one will
    want to swap out an anonymous page from a VM_RESERVED or VM_UNPAGED area.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
    drivers set VM_RESERVED on areas which are then populated by nopage. The
    PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in
    zap_pte_range, without changing those drivers not to set it: so their pages
    just leak away.

    Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
    to flag the special areas where the ptes may have no struct page, or if they
    have then it's not to be touched. Replace most instances of VM_RESERVED in
    core mm by VM_UNPAGED. Force it on in remap_pfn_range, and the sparc and
    sparc64 io_remap_pfn_range.

    Revert addition of VM_RESERVED to powerpc vdso, it's not needed there. Is it
    needed anywhere? It still governs the mm->reserved_vm statistic, and special
    vmas not to be merged, and areas not to be core dumped; but could probably be
    eliminated later (the drivers are probably specifying it because in 2.4 it
    kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
    don't get on).

    Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
    purpose whatsoever, and should be removed from drivers when we clean up.

    Signed-off-by: Hugh Dickins
    Acked-by: William Irwin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins