10 Sep, 2010

1 commit

  • …low and kswapd is awake

    Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
    cheaper than scanning a number of lists. To avoid synchronization
    overhead, counter deltas are maintained on a per-cpu basis and drained
    both periodically and when the delta is above a threshold. On large CPU
    systems, the difference between the estimated and real value of
    NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than the
    number of pages actually free in the buddy allocator, the VM can allocate
    pages below the min watermark, at worst reducing the real number of free
    pages to zero. Even if the OOM killer kills a victim to free memory, no
    memory may actually be freed if the exit path itself requires a new page,
    resulting in livelock.

    This patch introduces a zone_page_state_snapshot() function (courtesy of
    Christoph) that takes a slightly more accurate view of an arbitrary vmstat
    counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid
    the watermark being accidentally broken (a sketch of the idea follows this
    entry). The estimate is not perfect and
    may result in cache line bounces but is expected to be lighter than the
    IPI calls necessary to continually drain the per-cpu counters while kswapd
    is awake.

    Signed-off-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Christoph Lameter
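    A minimal sketch of the idea behind zone_page_state_snapshot(): start from
    the global vmstat counter and fold in the per-cpu deltas that have not been
    drained yet, so the estimate tracks the real free page count more closely.
    The field names (vm_stat, pageset, vm_stat_diff) follow the struct zone
    layout of this era and are shown for illustration only, not as the exact
    kernel implementation.

        static unsigned long zone_page_state_snapshot(struct zone *zone,
                                                      enum zone_stat_item item)
        {
                /* Global counter, possibly stale by the undrained per-cpu deltas */
                long x = atomic_long_read(&zone->vm_stat[item]);
                int cpu;

                /* Fold in each online CPU's pending delta for a closer estimate */
                for_each_online_cpu(cpu)
                        x += per_cpu_ptr(zone->pageset, cpu)->vm_stat_diff[item];

                if (x < 0)
                        x = 0;
                return x;
        }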
     

18 May, 2009

1 commit

  • pfn_valid() is meant to be able to tell if a given PFN has valid memmap
    associated with it or not. In FLATMEM, it is expected that holes always
    have a valid memmap as long as there are valid PFNs on either side of
    the hole.
    In SPARSEMEM, it is assumed that a valid section has a memmap for the
    entire section.

    However, ARM (and possibly other embedded architectures in the future)
    frees the memmap backing such holes to save memory, on the assumption that
    the memmap will never be used. The page_zone linkages are then broken even
    though pfn_valid() returns true. Walkers of the full memmap must then do an
    additional check to ensure the memmap they are looking at is sane, by
    verifying that the zone and PFN linkages are still valid. This is
    expensive, but walkers of the full memmap are extremely rare.

    This was caught before for FLATMEM and hacked around but it hits again for
    SPARSEMEM because the page_zone linkages can look ok where the PFN linkages
    are totally screwed. This looks like a hatchet job but the reality is that
    any clean solution would end up consuming all the memory saved by punching
    these unexpected holes in the memmap. For example, we tried marking the
    memmap within the section invalid but the section size exceeds the size of
    the hole in most cases so pfn_valid() starts returning false where valid
    memmap exists. Shrinking the size of the section would increase memory
    consumption, offsetting the gains.

    This patch identifies when an architecture is punching unexpected holes
    in the memmap that the memory model cannot automatically detect and sets
    ARCH_HAS_HOLES_MEMORYMODEL. At the moment, this is restricted to EP93xx,
    the sub-architecture on which the problem has been reported, but the list
    may expand later. When set, walkers of the full memmap must call
    memmap_valid_within() for each PFN, passing in what they expect the page
    and zone to be for that PFN (a sketch of the check follows this entry). If
    the linkages are found to be broken, the memmap is assumed to be invalid
    for that PFN.

    Signed-off-by: Mel Gorman
    Signed-off-by: Russell King

    Mel Gorman
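    A sketch of the check described above: the memmap entry for a PFN is
    trusted only if the struct page still maps back to that PFN and to the
    expected zone. This mirrors the behaviour the commit describes for
    memmap_valid_within(); the exact signature and placement in the kernel may
    differ.

        /* Return 1 if the memmap entry for pfn still has sane linkages */
        static int memmap_valid_within(unsigned long pfn, struct page *page,
                                       struct zone *zone)
        {
                if (page_to_pfn(page) != pfn)   /* PFN linkage broken */
                        return 0;

                if (page_zone(page) != zone)    /* zone linkage broken */
                        return 0;

                return 1;
        }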
     

14 Sep, 2008

1 commit

  • The iterator for_each_zone_zonelist() uses a struct zoneref *z cursor when
    scanning zonelists to keep track of where in the zonelist it is. The
    zoneref that is returned corresponds to the next zone that is to be
    scanned, not the current one. It was intended to be treated as an opaque
    list.

    When the page allocator is scanning a zonelist, it marks elements in the
    zonelist corresponding to zones that are temporarily full. As the
    zonelist is being updated, it uses the cursor here:

        if (NUMA_BUILD)
                zlc_mark_zone_full(zonelist, z);

    This is intended to prevent rescanning in the near future but the zoneref
    cursor does not correspond to the zone that has been found to be full.
    This is an easy misunderstanding to make, so this patch corrects the
    problem by changing the zoneref cursor to be the current zone being
    scanned instead of the next one (a sketch of the loop follows this entry).

    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: KAMEZAWA Hiroyuki
    Cc: [2.6.26.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
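    A simplified sketch of the allocation loop described above: with the fix,
    the zoneref cursor z refers to the zone currently being scanned, so marking
    it full affects the right zonelist entry. The watermark check and the
    allocation call are abbreviated stand-ins for the page allocator logic of
    this era rather than exact code.

        for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
                                        nodemask) {
                if (!zone_watermark_ok(zone, order, mark,
                                       classzone_idx, alloc_flags)) {
                        /* z now points at *this* zone, so the right entry is marked */
                        if (NUMA_BUILD)
                                zlc_mark_zone_full(zonelist, z);
                        continue;
                }

                page = buffered_rmqueue(preferred_zone, zone, order, gfp_mask);
                if (page)
                        break;
        }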
     

28 Apr, 2008

1 commit

  • The MPOL_BIND policy creates a zonelist that is used for allocations
    controlled by that mempolicy. As the per-node zonelist is already being
    filtered based on a zone id, this patch adds a version of __alloc_pages()
    that takes a nodemask for further filtering (sketched after this entry).
    This eliminates the need for MPOL_BIND to create a custom zonelist.

    A positive benefit of this is that allocations using MPOL_BIND now use the
    local node's distance-ordered zonelist instead of a custom node-id-ordered
    zonelist. I.e., pages will be allocated from the closest allowed node with
    available memory.

    [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments]
    [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask]
    [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework]
    Signed-off-by: Mel Gorman
    Acked-by: Christoph Lameter
    Signed-off-by: Lee Schermerhorn
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
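    A sketch of the nodemask-filtered allocation described above, under stated
    assumptions: __alloc_pages_nodemask() stands for the nodemask-taking
    allocator entry point, while the wrapper alloc_page_bind(), the
    node_zonelist() lookup and the pol->v.nodes field are illustrative of the
    mempolicy code of this era rather than exact.

        /* Illustrative only: allocate a page obeying an MPOL_BIND nodemask */
        static struct page *alloc_page_bind(gfp_t gfp, unsigned int order,
                                            struct mempolicy *pol)
        {
                /* The local node's distance-ordered zonelist ... */
                struct zonelist *zl = node_zonelist(numa_node_id(), gfp);

                /* ... filtered at allocation time by the policy's allowed nodes */
                return __alloc_pages_nodemask(gfp, order, zl, &pol->v.nodes);
        }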
     

08 Dec, 2006

1 commit


11 Jul, 2006

1 commit


01 Jul, 2006

1 commit


28 Mar, 2006

1 commit

    The helper functions for for_each_online_pgdat()/for_each_zone() look too
    big to be inlined, and the speed of these helpers themselves is not very
    important (the inner loops tend to do more work than they do).

    This patch makes the helper functions out-of-line (see the sketch after
    this entry).

              inline      out-of-line
    .text     005c0680    005bf6a0

    005c0680 - 005bf6a0 = 0xFE0, i.e. about 4 Kbytes of .text saved.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
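    A sketch of what moving the iterator helpers out of line looks like:
    instead of static inline definitions in a header, the functions become
    ordinary definitions in a C file so every call site shares one copy of the
    code. The bodies below follow the usual pgdat/zone walking logic and are
    illustrative rather than the exact kernel code.

        /* Advance to the next online node's pgdat, or NULL at the end */
        struct pglist_data *next_online_pgdat(struct pglist_data *pgdat)
        {
                int nid = next_online_node(pgdat->node_id);

                if (nid == MAX_NUMNODES)
                        return NULL;
                return NODE_DATA(nid);
        }

        /* Walk the zones of the current pgdat, then move on to the next pgdat */
        struct zone *next_zone(struct zone *zone)
        {
                pg_data_t *pgdat = zone->zone_pgdat;

                if (zone < pgdat->node_zones + MAX_NR_ZONES - 1) {
                        zone++;
                } else {
                        pgdat = next_online_pgdat(pgdat);
                        zone = pgdat ? pgdat->node_zones : NULL;
                }
                return zone;
        }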