01 Oct, 2006

1 commit

  • Change pte_clear_full to a more appropriately named pte_clear_not_present,
    allowing optimizations when not-present mapping changes need not be reflected
    in the hardware TLB for protected page table modes. There is also another
    case that can use it in the fremap code.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     

26 Sep, 2006

1 commit


23 Jun, 2006

1 commit

  • There are two calls to update_mmu_cache in fremap.c, both defective.
    The one in install_page needs to be accompanied by lazy_mmu_prot_update
    (some other cleanup time, move that into ia64 update_mmu_cache itself); and
    the one in install_file_pte should be removed since the pte is not present.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

30 Nov, 2005

1 commit


29 Nov, 2005

1 commit

  • This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
    explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
    VM area to contain an arbitrary range of page table entries that the VM
    never touches, and never considers to be normal pages.

    Any user of "remap_pfn_range()" automatically gets this new
    functionality, and doesn't even have to mark the pages reserved or
    indeed mark them any other way. It just works. As a side effect, doing
    mmap() on /dev/mem works for arbitrary ranges.

    Sparc update from David in the next commit.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Nov, 2005

2 commits

  • There's one peculiar use of VM_RESERVED which the previous patch left behind:
    because VM_NONLINEAR's try_to_unmap_cluster uses vm_private_data as a swapout
    cursor, but should never meet VM_RESERVED vmas, it was a way of extending
    VM_NONLINEAR to VM_RESERVED vmas using vm_private_data for some other purpose.
    But that's an empty set - they don't have the populate function required. So
    just throw away those VM_RESERVED tests.

    But one more interesting in rmap.c has to go too: try_to_unmap_one will want
    to swap out an anonymous page from VM_RESERVED or VM_UNPAGED area.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
    drivers set VM_RESERVED on areas which are then populated by nopage. The
    PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in
    zap_pte_range, without changing those drivers not to set it: so their pages
    just leak away.

    Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
    to flag the special areas where the ptes may have no struct page, or if they
    have then it's not to be touched. Replace most instances of VM_RESERVED in
    core mm by VM_UNPAGED. Force it on in remap_pfn_range, and the sparc and
    sparc64 io_remap_pfn_range.

    Revert addition of VM_RESERVED to powerpc vdso, it's not needed there. Is it
    needed anywhere? It still governs the mm->reserved_vm statistic, and special
    vmas not to be merged, and areas not to be core dumped; but could probably be
    eliminated later (the drivers are probably specifying it because in 2.4 it
    kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
    don't get on).

    Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
    purpose whatsoever, and should be removed from drivers when we clean up.

    Signed-off-by: Hugh Dickins
    Acked-by: William Irwin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

30 Oct, 2005

5 commits

  • Second step in pushing down the page_table_lock. Remove the temporary
    bridging hack from __pud_alloc, __pmd_alloc, __pte_alloc: expect callers not
    to hold page_table_lock, whether it's on init_mm or a user mm; take
    page_table_lock internally to check if a racing task already allocated.

    Convert their callers from common code. But avoid coming back to change them
    again later: instead of moving the spin_lock(&mm->page_table_lock) down,
    switch over to new macros pte_alloc_map_lock and pte_unmap_unlock, which
    encapsulate the mapping+locking and unlocking+unmapping together, and in the
    end may use alternatives to the mm page_table_lock itself.

    These callers all hold mmap_sem (some exclusively, some not), so at no level
    can a page table be whipped away from beneath them; and pte_alloc uses the
    "atomic" pmd_present to test whether it needs to allocate. It appears that on
    all arches we can safely descend without page_table_lock.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • update_mem_hiwater has attracted various criticisms, in particular from those
    concerned with mm scalability. Originally it was called whenever rss or
    total_vm got raised. Then many of those callsites were replaced by a timer
    tick call from account_system_time. Now Frank van Maarseveen reports that to
    be found inadequate. How about this? Works for Frank.

    Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
    update_hiwater_rss and update_hiwater_vm. Don't attempt to keep
    mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
    by 1): those are hot paths. Do the opposite, update only when about to lower
    rss (usually by many), or just before final accounting in do_exit. Handle
    mm->hiwater_vm in the same way, though it's much less of an issue. Demand
    that whoever collects these hiwater statistics do the work of taking the
    maximum with rss or total_vm.

    And there has been no collector of these hiwater statistics in the tree. The
    new convention needs an example, so match Frank's usage by adding a VmPeak
    line above VmSize to /proc//status, and also a VmHWM line above VmRSS
    (High-Water-Mark or High-Water-Memory).

    There was a particular anomaly during mremap move, that hiwater_vm might be
    captured too high. A fleeting such anomaly remains, but it's quickly
    corrected now, whereas before it would stick.

    What locking? None: if the app is racy then these statistics will be racy,
    it's not worth any overhead to make them exact. But whenever it suits,
    hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
    page_table_lock (for now) or with preemption disabled (later on): without
    going to any trouble, minimize the time between reading current values and
    updating, to minimize those occasions when a racing thread bumps a count up
    and back down in between.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • There used to be just one call to zap_pte, but it shouldn't be inline now
    there are two. Check for the common case pte_none before calling, and move
    its rss accounting up into install_page or install_file_pte - which helps the
    next patch.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove PageReserved() calls from core code by tightening VM_RESERVED
    handling in mm/ to cover PageReserved functionality.

    PageReserved special casing is removed from get_page and put_page.

    All setting and clearing of PageReserved is retained, and it is now flagged
    in the page_alloc checks to help ensure we don't introduce any refcount
    based freeing of Reserved pages.

    MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
    deprecated. We never completely handled it correctly anyway, and is be
    reintroduced in future if required (Hugh has a proof of concept).

    Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
    arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
    be trivially removed.

    Last real user of PageReserved is swsusp, which uses PageReserved to
    determine whether a struct page points to valid memory or not. This still
    needs to be addressed (a generic page_is_ram() should work).

    A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
    thus mapcounted and count towards shared rss). These writes to the struct
    page could cause excessive cacheline bouncing on big systems. There are a
    number of ways this could be addressed if it is an issue.

    Signed-off-by: Nick Piggin

    Refcount bug fix for filemap_xip.c

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • I was lazy when we added anon_rss, and chose to change as few places as
    possible. So currently each anonymous page has to be counted twice, in rss
    and in anon_rss. Which won't be so good if those are atomic counts in some
    configurations.

    Change that around: keep file_rss and anon_rss separately, and add them
    together (with get_mm_rss macro) when the total is needed - reading two
    atomics is much cheaper than updating two atomics. And update anon_rss
    upfront, typically in memory.c, not tucked away in page_add_anon_rmap.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

12 Oct, 2005

1 commit

  • Refuse to install a page into a mapping if the mapping count is already
    ridiculously large.

    You probably cannot trigger this on 32-bit architectures, but on a
    64-bit setup we should protect against it.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds