11 Apr, 2006

1 commit

  • This patch is an enhancement of the OVERCOMMIT_GUESS algorithm in
    __vm_enough_memory() in mm/nommu.c.

    When the OVERCOMMIT_GUESS algorithm calculates the number of free pages,
    it now subtracts the number of reserved pages from the result of
    nr_free_pages().
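
    For illustration, a minimal sketch of the adjusted estimate (hedged:
    `pages' is the request being checked, and the reserved-page counter is
    assumed to be totalreserve_pages, as in the mainline counterpart of this
    change - this is not the literal diff):

        unsigned long free;

        free = nr_free_pages();          /* free pages in the system       */
        free -= totalreserve_pages;      /* reserved pages never allocated */
        if (free > pages)
                return 0;                /* guess: enough memory           */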

    Signed-off-by: Hideo Aoki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hideo AOKI
     

22 Mar, 2006

1 commit

  • Now that compound page handling is properly fixed in the VM, move nommu
    over to using compound pages rather than rolling its own refcounting.

    nommu vm page refcounting is broken anyway, but there is no need to have
    divergent code in the core VM now, nor when it gets fixed.
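
    For illustration, a hedged sketch of the idea: allocate with __GFP_COMP
    so the generic VM keeps a single refcount on the compound head, instead
    of nommu tracking subpages by hand (the order used here is hypothetical):

        unsigned int order = 2;   /* hypothetical allocation order */
        struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP, order);

        if (page)
                get_page(page);   /* one refcount covers all 1 << order pages */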

    Signed-off-by: Nick Piggin
    Cc: David Howells

    (Needs testing, please).
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

07 Jan, 2006

1 commit

  • The attached patch makes the SYSV IPC shared memory facilities use the new
    ramfs facilities on a no-MMU kernel.

    The following changes are made:

    (1) There are now shmem_mmap() and shmem_get_unmapped_area() functions to
    allow the IPC SHM facilities to commune with the tiny-shmem and shmem
    code.

    (2) ramfs files now need resizing using do_truncate() rather than by modifying
    the inode size directly (see shmem_file_setup()). This causes ramfs to
    attempt to bind a block of pages of sufficient size to the inode.

    (3) CONFIG_SYSVIPC is no longer contingent on CONFIG_MMU.
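
    For illustration, a minimal user-space program exercising the SYSV SHM
    calls that this patch makes available on no-MMU kernels (a demo sketch,
    not part of the patch):

        #include <stdio.h>
        #include <string.h>
        #include <sys/ipc.h>
        #include <sys/shm.h>

        int main(void)
        {
                int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
                char *p;

                if (id < 0) {
                        perror("shmget");
                        return 1;
                }
                p = shmat(id, NULL, 0);          /* map the segment */
                if (p == (char *)-1) {
                        perror("shmat");
                        return 1;
                }
                strcpy(p, "hello from SYSV SHM");
                puts(p);
                shmdt(p);                        /* unmap ...       */
                shmctl(id, IPC_RMID, NULL);      /* ... and destroy */
                return 0;
        }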

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

29 Nov, 2005

1 commit

  • This replaces the (in my opinion horrible) VM_UNPAGED logic with very
    explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
    VM area to contain an arbitrary range of page table entries that the VM
    never touches, and never considers to be normal pages.

    Any user of "remap_pfn_range()" automatically gets this new
    functionality, and doesn't even have to mark the pages reserved or
    indeed mark them any other way. It just works. As a side effect, doing
    mmap() on /dev/mem works for arbitrary ranges.
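
    A hedged sketch of a typical beneficiary: a driver ->mmap() handler
    built on remap_pfn_range() (mydev_mmap and the device address are
    hypothetical, not from this commit):

        static int mydev_mmap(struct file *file, struct vm_area_struct *vma)
        {
                const unsigned long phys = 0xfebf0000;   /* hypothetical */
                unsigned long size = vma->vm_end - vma->vm_start;

                /* remap_pfn_range() now marks the VMA VM_PFNMAP itself */
                return remap_pfn_range(vma, vma->vm_start, phys >> PAGE_SHIFT,
                                       size, vma->vm_page_prot);
        }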

    Sparc update from David in the next commit.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 Oct, 2005

3 commits

  • Final step in pushing down the common core's page_table_lock. follow_page
    no longer wants the caller to hold page_table_lock; it uses
    pte_offset_map_lock itself, and so no page_table_lock is taken in
    get_user_pages itself.

    But get_user_pages (and get_futex_key) do then need follow_page to pin the
    page for them: take Daniel's suggestion of bitflags to follow_page.

    Need one for WRITE, another for TOUCH (it was the accessed flag before:
    vanished along with check_user_page_readable, but surely get_numa_maps is
    wrong to mark every page it finds as accessed), another for GET.

    And another, ANON to dispose of untouched_anonymous_page: it seems silly for
    that to descend a second time, let follow_page observe if there was no page
    table and return ZERO_PAGE if so. Fix minor bug in that: check VM_LOCKED -
    make_pages_present ought to make readonly anonymous present.

    Give get_numa_maps a cond_resched while we're there.
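
    A sketch of the bitflags described above, following the kernel's FOLL_*
    naming convention (the exact values are incidental):

        #define FOLL_WRITE      0x01    /* check pte is writable */
        #define FOLL_TOUCH      0x02    /* mark page accessed */
        #define FOLL_GET        0x04    /* do get_page on page */
        #define FOLL_ANON       0x08    /* give ZERO_PAGE if no pgtable */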

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • update_mem_hiwater has attracted various criticisms, in particular from those
    concerned with mm scalability. Originally it was called whenever rss or
    total_vm got raised. Then many of those callsites were replaced by a timer
    tick call from account_system_time. Now Frank van Maarseveen reports
    that to be inadequate too. How about this? It works for Frank.

    Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
    update_hiwater_rss and update_hiwater_vm. Don't attempt to keep
    mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
    by 1): those are hot paths. Do the opposite, update only when about to lower
    rss (usually by many), or just before final accounting in do_exit. Handle
    mm->hiwater_vm in the same way, though it's much less of an issue. Demand
    that whoever collects these hiwater statistics do the work of taking the
    maximum with rss or total_vm.

    And there has been no collector of these hiwater statistics in the tree. The
    new convention needs an example, so match Frank's usage by adding a VmPeak
    line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
    (High-Water-Mark or High-Water-Memory).

    There was a particular anomaly during mremap move, that hiwater_vm might be
    captured too high. A fleeting such anomaly remains, but it's quickly
    corrected now, whereas before it would stick.

    What locking? None: if the app is racy then these statistics will be racy,
    it's not worth any overhead to make them exact. But whenever it suits,
    hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
    page_table_lock (for now) or with preemption disabled (later on): without
    going to any trouble, minimize the time between reading current values and
    updating, to minimize those occasions when a racing thread bumps a count up
    and back down in between.
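
    A hedged sketch of the two macros as described (collectors still take
    the maximum with current rss/total_vm themselves):

        #define update_hiwater_rss(mm) do {                     \
                if ((mm)->hiwater_rss < get_mm_rss(mm))         \
                        (mm)->hiwater_rss = get_mm_rss(mm);     \
        } while (0)

        #define update_hiwater_vm(mm) do {                      \
                if ((mm)->hiwater_vm < (mm)->total_vm)          \
                        (mm)->hiwater_vm = (mm)->total_vm;      \
        } while (0)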

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • I was lazy when we added anon_rss, and chose to change as few places as
    possible. So currently each anonymous page has to be counted twice, in rss
    and in anon_rss. Which won't be so good if those are atomic counts in some
    configurations.

    Change that around: keep file_rss and anon_rss separately, and add them
    together (with get_mm_rss macro) when the total is needed - reading two
    atomics is much cheaper than updating two atomics. And update anon_rss
    upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
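
    A sketch of the combining macro described above - reading two counters
    on demand rather than updating two on every page:

        #define get_mm_rss(mm) \
                (get_mm_counter(mm, file_rss) + get_mm_counter(mm, anon_rss))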

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change the
    generated code (from gcc's point of view we replaced unsigned int with
    a typedef) and documents what's going on far better.
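
    For illustration, the effect on one allocator prototype (kmalloc chosen
    as the example; a sketch, not the full sweep of the patch):

        /* before */
        void *kmalloc(size_t size, unsigned int __nocast flags);
        /* after */
        void *kmalloc(size_t size, gfp_t flags);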

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

12 Sep, 2005

1 commit

  • Move the call to get_mm_counter() in update_mem_hiwater() inside the
    check for tsk->mm being null; otherwise a null pointer can be
    dereferenced here. This patch was submitted by Javier Herrero.
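
    A hedged sketch of the reordered check (illustrative shape, not the
    exact nommu.c code):

        void update_mem_hiwater(struct task_struct *tsk)
        {
                if (tsk->mm) {
                        /* counters are only read with a valid mm */
                        unsigned long rss = get_mm_counter(tsk->mm, rss);

                        if (tsk->mm->hiwater_rss < rss)
                                tsk->mm->hiwater_rss = rss;
                        if (tsk->mm->hiwater_vm < tsk->mm->total_vm)
                                tsk->mm->hiwater_vm = tsk->mm->total_vm;
                }
        }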

    Modify the end check for munmap regions to allow for the legacy
    behavior of 0 being valid. Pretty much all current uClinux system
    libc mallocs pass in 0 as the end point. A hard check will fail on
    these, so change the check so that a non-zero value must be valid,
    otherwise it fails. A passed-in value of 0 will always succeed (as
    it used to).

    Also export a few more mm system functions - to be consistent
    with the VM code exports.

    Signed-off-by: Greg Ungerer
    Signed-off-by: Linus Torvalds

    Greg Ungerer
     

05 Aug, 2005

1 commit

  • We have found what seems to be a small bug in __vm_enough_memory() when
    sysctl_overcommit_memory is set to OVERCOMMIT_NEVER.

    When this bug occurs, the system fails to boot, with /sbin/init whining
    about fork() returning ENOMEM.

    We hunted down the problem to this:

    The deferred update mechanism used in vm_acct_memory(), on an SMP
    system, allows the vm_committed_space counter to have a negative value.

    This should not be a problem since this counter is known to be inaccurate.

    But in __vm_enough_memory() this counter is compared to the `allowed'
    variable, which is an unsigned long. This comparison is broken since it
    will consider the negative values of vm_committed_space to be huge positive
    values, resulting in a memory allocation failure.
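
    The comparison pitfall in miniature (a self-contained illustration, not
    kernel code):

        #include <stdio.h>

        int main(void)
        {
                long committed = -1;       /* vm_committed_space gone negative */
                unsigned long allowed = 1000;

                /* committed is converted to unsigned long: -1 becomes a
                   huge positive value, so the test wrongly triggers */
                if (committed >= allowed)
                        puts("allocation refused (bogus)");
                return 0;
        }

    The fix is essentially to do the comparison in signed arithmetic, e.g.
    by casting `allowed' to long.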

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Simon Derr
     

22 Jun, 2005

1 commit

  • Ingo recently introduced a great speedup for allocating new mmaps using the
    free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
    causes huge performance increases in thread creation.

    The downside of this patch is that it does lead to fragmentation in the
    mmap-ed areas (visible via /proc/self/maps), such that some applications
    that work fine under 2.4 kernels quickly run out of memory on any 2.6
    kernel.

    The problem is twofold:

    1) the free_area_cache is used to continue a search for memory where
    the last search ended. Before the change, new areas were always
    searched for starting from the base address.

    So now new small areas clutter holes of all sizes throughout the
    whole mmap-able region, whereas before small allocations tended to
    fill holes near the base, leaving holes far from the base large and
    available for larger requests.

    2) the free_area_cache is also set to the location of the last
    munmap-ed area, so in scenarios where we allocate e.g. five regions
    of 1K each, then free regions 4, 2 and 3 in this order, the next
    request for 1K will be placed at the position of the old region 3,
    whereas before it was appended to the still active region 1, i.e. at
    the location of the old region 2. Before we had one free region of
    2K; now we get two free regions of 1K -> fragmentation.

    The patch addresses these issues by introducing yet another cache
    descriptor, cached_hole_size, which contains the largest known hole size
    below the current free_area_cache. If a new request comes in, its size is
    compared
    against the cached_hole_size and if the request can be filled with a hole
    below free_area_cache the search is started from the base instead.
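
    A hedged sketch of that decision in the arch_get_unmapped_area() search
    (field names as described above; details may differ from the patch):

        if (len <= mm->cached_hole_size) {
                /* a big-enough hole is known to exist below the cache,
                   so restart the linear search from the base */
                mm->cached_hole_size = 0;
                mm->free_area_cache = TASK_UNMAPPED_BASE;
        }
        start_addr = mm->free_area_cache;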

    The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
    (earlier posted) leakme.c test program terminates after 50000+ iterations
    with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
    (as expected) with thread creation, Ingo's test_str02 with 20000 threads
    requires 0.7s system time.

    Taking out Ingo's patch (un-patch available on request) by basically
    deleting all mentions of free_area_cache from the kernel and always
    starting the search for new memory at the respective bases, we observe:
    leakme terminates successfully with 11 distinct, hardly fragmented areas
    in /proc/self/maps, but thread creation is grindingly slow: 30+s(!)
    system time for Ingo's test_str02 with 20000 threads.

    Now - drumroll ;-) the appended patch works fine with leakme: it ends with
    only 7 distinct areas in /proc/self/maps and also thread creation seems
    sufficiently fast with 0.71s for 20000 threads.

    Signed-off-by: Wolfgang Wander
    Credit-to: "Richard Purdie"
    Signed-off-by: Ken Chen
    Acked-by: Ingo Molnar (partly)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wolfgang Wander
     

17 May, 2005

1 commit

  • Linus changed the second argument of __vmalloc from int to unsigned int
    breaking the compilation for CONFIG_MMU=n configurations (since he only
    changed vmalloc.c but not nommu.c).
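
    For illustration, the nommu prototype before and after the fix (sketched
    from the description above):

        /* before: out of sync with vmalloc.c */
        void *__vmalloc(unsigned long size, int gfp_mask, pgprot_t prot);
        /* after */
        void *__vmalloc(unsigned long size, unsigned int gfp_mask, pgprot_t prot);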

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds