20 Jul, 2007

3 commits

  • page-writeback accounting is presently performed in the page-flags
    macros. This is inconsistent, a bit ugly, and makes it awkward to
    implement per-backing_dev under-writeback page accounting.

    So move this accounting down to the callsite(s).
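
    A minimal sketch of the shape of this change (the macro and callsite
    names are assumptions, following the writeback code of the era):

        /* Before: the page-flags macro did the bookkeeping itself. */
        #define TestSetPageWriteback(page)                               \
                ({                                                       \
                        int ret = test_and_set_bit(PG_writeback,         \
                                                   &(page)->flags);      \
                        if (!ret)                                        \
                                inc_zone_page_state(page, NR_WRITEBACK); \
                        ret;                                             \
                })

        /* After: the macro only touches the flag bit... */
        #define TestSetPageWriteback(page)                               \
                test_and_set_bit(PG_writeback, &(page)->flags)

        /* ...and the callsite does the accounting, where per-backing_dev
         * counters can later be added next to the zone counter. */
        int test_set_page_writeback(struct page *page)
        {
                int ret = TestSetPageWriteback(page);

                if (!ret)
                        inc_zone_page_state(page, NR_WRITEBACK);
                return ret;
        }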

    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Share the same page flag bit for PG_readahead and PG_reclaim.

    One is used only on file reads, the other only for emergency writes;
    one is used mostly for fresh/young pages, the other for old pages.

    Combinations of possible interactions are:

    a) clear PG_reclaim => implicit clear of PG_readahead
       it will turn an asynchronous readahead into a synchronous one;
       this actually does _good_ for readahead: the pages are about to be
       reclaimed, which is readahead thrashing, and in that case
       synchronous readahead makes more sense

    b) clear PG_readahead => implicit clear of PG_reclaim
       one (and only one) page will not be reclaimed in time;
       it can be avoided by checking PageWriteback(page) in readahead first

    c) set PG_reclaim => implicit set of PG_readahead
       will confuse readahead and make it restart the size ramp-up
       process; it's a trivial problem, and can mostly be avoided by
       checking PageWriteback(page) first in readahead

    d) set PG_readahead => implicit set of PG_reclaim
       PG_readahead will never be set on already cached pages, and
       PG_reclaim is always cleared when a page is dirtied, so this is
       not a problem

    In summary:
    a)   we get better behavior
    b,d) the possible interactions can be avoided
    c)   a racy condition exists that might affect readahead, but the
         chance is _really_ low, and the hurt to readahead is trivial.

    Compound pages also use PG_reclaim, but for now they do not interact with
    reclaim/readahead code.
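
    A sketch of the shared bit and of the PageWriteback() check mentioned
    in b) and c) (illustrative, in the style of page-flags.h):

        #define PG_readahead    PG_reclaim      /* one bit, two meanings */

        /* In the readahead path: a page under writeback carries the bit
         * as PG_reclaim, not PG_readahead, so look at PageWriteback()
         * before trusting the mark. */
        if (!PageWriteback(page) && TestClearPageReadahead(page)) {
                /* ... kick off the next asynchronous readahead ... */
        }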

    Signed-off-by: Fengguang Wu
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     
  • Introduce a new page flag: PG_readahead.

    It acts as a look-ahead mark, which tells the page reader: Hey, it's
    time to invoke the read-ahead logic. For the sake of I/O pipelining,
    don't wait until the reader runs out of cached pages!
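
    A sketch of how a reader consumes the mark (the readahead entry point
    shown is an assumption):

        /* On reaching the marked page, start pipelining the next batch
         * of I/O instead of waiting for the cached pages to run out. */
        if (PageReadahead(page)) {
                ClearPageReadahead(page);
                page_cache_readahead(mapping, ra, filp, page->index, 1);
        }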

    Signed-off-by: Fengguang Wu
    Cc: Steven Pratt
    Cc: Ram Pai
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

08 May, 2007

3 commits

  • Remove the two page flags that were previously used by swsusp and are no
    longer needed.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • The patch adds PageTail(page) and PageHead(page) to check whether a
    page is the head or the tail of a compound page. This is done by
    masking the two bits describing the state of a compound page and then
    comparing the result: one comparison and a branch instead of two bit
    checks and two branches.
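
    A sketch of the masking trick (bit names as in page-flags.h; PG_tail
    is the PG_reclaim alias introduced in the commit below):

        #define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))

        /* Both bits set: a tail page. */
        #define PageTail(page)  (((page)->flags & PG_head_tail_mask) == \
                                 PG_head_tail_mask)

        /* Only PG_compound set: the head page. */
        #define PageHead(page)  (((page)->flags & PG_head_tail_mask) == \
                                 (1L << PG_compound))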

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If we add a new flag so that we can distinguish between the first page
    and the tail pages, then we can avoid using page->private in the first
    page: page->private == page for the first page, so there is no real
    information in there.

    Freeing up page->private makes the use of compound pages more
    transparent. They become more usable, like real pages. Right now we
    have to be careful, for example, if we go beyond PAGE_SIZE allocations
    in the slab on i386, because we can then no longer use the private
    field. This is one of the issues that keep us from supporting
    debugging for page-size slabs in SLAB.

    Having page->private available for SLUB would allow more meta information in
    the page struct. I can probably avoid the 16 bit ints that I have in there
    right now.

    Also if page->private is available then a compound page may be equipped with
    buffer heads. This may free up the way for filesystems to support larger
    blocks than page size.

    We add PageTail as an alias of PageReclaim. Compound pages cannot currently
    be reclaimed. Because of the alias one needs to check PageCompound first.
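
    A sketch of how the distinction gets used (the head-lookup code is an
    assumption; tail pages keep the head pointer in page->private, while
    the head page's own page->private becomes free):

        /* PG_tail aliases PG_reclaim, so only look at it once the page
         * is known to be compound. */
        if (unlikely(PageCompound(page)) && PageTail(page))
                page = (struct page *)page->private;
        /* 'page' is now the head page; its page->private is available
         * for other uses, e.g. buffer heads. */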

    The RFC for this approach was discussed at
    http://marc.info/?t=117574302800001&r=1&w=2

    [nacc@us.ibm.com: fix hugetlbfs]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

27 Apr, 2007

1 commit

  • The page_test_and_clear_dirty primitive really consists of two
    operations, page_test_dirty and page_clear_dirty. The combination of
    the two is not atomic, so it makes more sense to have two separate
    operations.

    Besides improving the readability of the s390 version of
    SetPageUptodate, this lets it avoid the page_test_dirty operation, an
    insert-storage-key-extended (iske) instruction, which is expensive.
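
    A sketch of the split (s390 primitives; the callers shown are
    assumptions based on the description above):

        /* Before: test and clear fused, though not atomic anyway. */
        if (page_test_and_clear_dirty(page))
                set_page_dirty(page);

        /* After: separate operations, so a caller that already knows
         * the state can skip the expensive iske-based test: */
        if (page_test_dirty(page)) {
                page_clear_dirty(page);
                set_page_dirty(page);
        }

        /* ...and s390's SetPageUptodate can clear the storage key's
         * dirty bit unconditionally, with no page_test_dirty at all. */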

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

02 Mar, 2007

1 commit

  • Rename PG_checked to PG_owner_priv_1 to reflect its availability as a
    private flag for use by the owner/allocator of the page. In the case of
    pagecache pages (which might be considered to be owned by the mm),
    filesystems may use the flag.
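
    A sketch of the rename (bit number illustrative, following the file's
    existing aliasing style):

        #define PG_owner_priv_1  8      /* Owner use. If pagecache, fs may use */

        /* The old name survives as an alias for existing users: */
        #define PG_checked       PG_owner_priv_1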

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

22 Dec, 2006

1 commit

  • The old clear_page_dirty() and test_clear_page_dirty() functions were
    horribly easy to mis-use because of their tempting naming, and they
    also did way more than any users of them generally wanted them to do.

    A dirty page can become clean under two circumstances:

    (a) when we write it out. We have "clear_page_dirty_for_io()" for
        this, and that function remains unchanged.

        In the "for IO" case it is not sufficient to just clear the dirty
        bit: you also have to mark the page as being under writeback etc.

    (b) when we actually remove a page due to it becoming inaccessible to
        users, notably because it was truncate()'d away or the file (or
        metadata) no longer exists, and we thus want to cancel any
        outstanding dirty state.

    For the (b) case, we now introduce "cancel_dirty_page()", which only
    touches the page state itself, and verifies that the page is not mapped
    (since cancelling writes on a mapped page would be actively wrong as it
    is still accessible to users).
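
    A sketch of the new helper's shape (the dirty accounting shown is an
    assumption; the point is that it only touches page state and insists
    the page is unmapped):

        void cancel_dirty_page(struct page *page)
        {
                /* Cancelling writes on a mapped page would be actively
                 * wrong: the data is still accessible to users. */
                WARN_ON(page_mapped(page));

                if (TestClearPageDirty(page)) {
                        struct address_space *mapping = page->mapping;

                        if (mapping && mapping_cap_account_dirty(mapping))
                                dec_zone_page_state(page, NR_FILE_DIRTY);
                }
        }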

    Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
    ReiserFS, XFS all use the old confusing functions, and will be fixed
    separately in subsequent commits (with some of them just removing the
    offending logic, and others using clear_page_dirty_for_io()).

    This was confirmed by Martin Michlmayr to fix the apt database
    corruption on ARM.

    Cc: Martin Michlmayr
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Cc: Arjan van de Ven
    Cc: Andrei Popa
    Cc: Andrew Morton
    Cc: Dave Kleikamp
    Cc: Gordon Farquharson
    Cc: Martin Schwidefsky
    Cc: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 Sep, 2006

1 commit

  • Convert s390 page handling macros to functions. In particular this
    fixes a problem with s390's SetPageUptodate macro, which evaluates its
    input parameter twice and can therefore cause subtle bugs.
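
    The hazard, sketched with a hypothetical macro body (not the actual
    s390 code):

        /* As a macro, 'page' is expanded twice, so a call such as
         * SetPageUptodate(pages[i++]) increments i twice: */
        #define SetPageUptodate(page)                                   \
                do {                                                    \
                        if (!test_and_set_bit(PG_uptodate,              \
                                              &(page)->flags))          \
                                page_test_and_clear_dirty(page);        \
                } while (0)

        /* As a function, the argument is evaluated exactly once: */
        static inline void SetPageUptodate(struct page *page)
        {
                if (!test_and_set_bit(PG_uptodate, &page->flags))
                        page_test_and_clear_dirty(page);
        }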

    [akpm@osdl.org: build fix]
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

01 Jul, 2006

2 commits

  • Conversion of nr_writeback to a per-zone counter.

    This removes the last page_state counter from arch/i386/mm/pgtable.c,
    so we drop the use of page_state there.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • NOTE: ZVC are *not* the lightweight event counters. ZVCs are reliable whereas
    event counters do not need to be.

    Zone based VM statistics are necessary to be able to determine what the state
    of memory in one zone is. In a NUMA system this can be helpful for local
    reclaim and other memory optimizations that may be able to shift VM load in
    order to get more balanced memory use.

    It is also useful to know how the computing load affects the memory
    allocations on various zones. This patchset allows the retrieval of that data
    from userspace.

    The patchset introduces a framework for counters that is a cross between the
    existing page_stats --which are simply global counters split per cpu-- and the
    approach of deferred incremental updates implemented for nr_pagecache.

    Small per cpu 8 bit counters are added to struct zone. If a counter
    exceeds a certain threshold, it is accumulated into an array of
    atomic_long in the zone and into a global array that sums up all zone
    values. The small 8 bit counters sit next to the per cpu page
    pointers, so they will be hot in the cpu cache when pages are
    allocated and freed.
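
    A sketch of the update path described above (names follow the
    patchset's style, but the details are illustrative):

        #define STAT_THRESHOLD  32      /* illustrative */

        void __mod_zone_page_state(struct zone *zone,
                                   enum zone_stat_item item, int delta)
        {
                struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
                s8 *p = pcp->vm_stat_diff + item;   /* small and cache-hot */
                long x = delta + *p;

                if (unlikely(x > STAT_THRESHOLD || x < -STAT_THRESHOLD)) {
                        /* Fold the local delta into the zone array and
                         * the global array, then reset the local diff. */
                        atomic_long_add(x, &zone->vm_stat[item]);
                        atomic_long_add(x, &vm_stat[item]);
                        x = 0;
                }
                *p = x;
        }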

    Access to VM counter information for a zone and for the whole machine
    is then possible by simply indexing an array (thanks to Nick Piggin
    for pointing out that approach). Access to the total number of pages
    of various types no longer requires summing up all per cpu counters.

    Benefits of this patchset right now:

    - Ability for UP and SMP configuration to determine how memory
    is balanced between the DMA, NORMAL and HIGHMEM zones.

    - loops over all processors are avoided in writeback and
    reclaim paths. We can avoid caching the writeback information
    because the needed information is directly accessible.

    - Special handling for nr_pagecache removed.

    - zone_reclaim_interval vanishes since VM stats can now determine
    when local reclaim is worthwhile.

    - Fast inline per node page state determination.

    - Accurate counters in /sys/devices/system/node/node*/meminfo. The
    current counters simply record which processor allocated a page
    somewhere and guesstimate from that, so they were not useful for
    showing the actual distribution of page use on a specific zone.

    - The swap_prefetch patch requires per node statistics in order to
    figure out when processors of a node can prefetch. This patch provides
    some of the needed numbers.

    - Detailed VM counters available in more /proc and /sys status files.

    References to earlier discussions:
    V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
    V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2

    Performance tests with AIM7 did not show any regressions. Seems to be a tad
    faster even. Tested on ia64/NUMA. Builds fine on i386, SMP / UP. Includes
    fixes for s390/arm/uml arch code.

    This patch:

    Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.

    Create vmstat.c/vmstat.h by separating the counter code and the proc
    functions.

    Move the vm_stat_text array before zoneinfo_show.

    [akpm@osdl.org: s390 build fix]
    [akpm@osdl.org: HOTPLUG_CPU build fix]
    Signed-off-by: Christoph Lameter
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

23 Jun, 2006

1 commit

  • As Nick points out, only ia64 uses PG_uncached. So we can push it up into the
    higher bits of the lower half of page->flags and make room for another flag on
    32-bit machines.

    Cc: "Luck, Tony"
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Nick Piggin
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

11 Apr, 2006

2 commits

  • Add some documentation regarding the utilisation of the flags field in
    struct page. This field is overloaded for per page bits and to hold node,
    zone and SPARSEMEM information. Make it clear which areas are used for
    what and how many bits are in each area.
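
    The documented layout, sketched (field widths depend on the
    configuration):

        /*
         * page->flags layout:
         *
         *   classic:         | NODE | ZONE | ... free ... | FLAGS |
         *   with SPARSEMEM:  | SECTION | NODE | ZONE | ... | FLAGS |
         *
         * The per-page flag bits occupy the low end; node, zone and
         * (if present) section information is packed into the high bits.
         */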

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Rohit found an obscure bug causing buddy list corruption.

    page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
    to determine whether or not a free page's buddy is itself free and in the
    buddy lists.

    Each of the conjuncts may be true at different times due to unrelated
    conditions, so the non-atomic page_is_buddy test may find each conjunct to
    be true even if they were not both true at the same time (ie. the page was
    not on the buddy lists).
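
    The racy test, sketched (illustrative of the problem rather than of
    the fix):

        static inline int page_is_buddy(struct page *page, int order)
        {
                /* Each conjunct can be true at different moments for
                 * unrelated reasons; nothing makes the pair atomic. */
                return PagePrivate(page) &&
                       page_order(page) == order &&
                       page_count(page) == 0;
        }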

    Signed-off-by: Martin Bligh
    Signed-off-by: Rohit Seth
    Signed-off-by: Nick Piggin
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

07 Jan, 2006

3 commits

  • Comment the new locking rules for page_state statistics.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Optimise page_state manipulations by introducing interrupt unsafe accessors
    to page_state fields. Callers must provide their own locking (either
    disable interrupts or not update from interrupt context).

    Switch over the hot callsites that can easily be moved under interrupts off
    sections.
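
    A sketch of the unsafe variant next to the safe one (illustrative
    macro bodies, not the exact page_state implementation):

        /* Interrupt safe: disables interrupts around the update. */
        #define inc_page_state(member)                          \
                do {                                            \
                        unsigned long flags;                    \
                        local_irq_save(flags);                  \
                        __get_cpu_var(page_states).member++;    \
                        local_irq_restore(flags);               \
                } while (0)

        /* Unsafe: the caller guarantees interrupts are off, or that the
         * counter is never updated from interrupt context. */
        #define __inc_page_state(member)                        \
                do { __get_cpu_var(page_states).member++; } while (0)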

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Add dma32 to zone statistics. Also attempt to arrange struct page_state a
    bit better (visually).

    Signed-off-by: Nick Piggin
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

23 Nov, 2005

1 commit

  • It looks like snd_xxx is not the only nopage to be using PageReserved as a way
    of holding a high-order page together: which no longer works, but is masked by
    our failure to free from VM_RESERVED areas. We cannot fix that bug without
    first substituting another way to hold the high-order page together, while
    farming out the 0-order pages from within it.

    That's just what PageCompound is designed for, but it's been kept under
    CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space
    (out-of-line put_page), doesn't slow down what most needs to be fast
    (already using hugetlb), and unifies the way we handle high-order pages.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

05 Sep, 2005

2 commits

  • Add page_state info to the per-node meminfo file in sysfs. This is mostly
    just for informational purposes.

    The lack of this information was brought up recently during a discussion
    regarding pagecache clearing, and I put this patch together to test out one
    of the suggestions.

    It seems like interesting info to have, so I'm submitting the patch.

    Signed-off-by: Martin Hicks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Hicks
     
  • This bitop does not need to be atomic because it is performed when there will
    be no references to the page (ie. the page is being freed).
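
    Sketched with the page-flags double-underscore convention for
    non-atomic variants (the specific flag is illustrative):

        /* Atomic: a locked bit operation, safe against concurrent
         * updates of other bits in page->flags. */
        #define ClearPageLRU(page)    clear_bit(PG_lru, &(page)->flags)

        /* Non-atomic: fine when the page is being freed, because nobody
         * else can hold a reference or touch the flags. */
        #define __ClearPageLRU(page)  __clear_bit(PG_lru, &(page)->flags)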

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

01 May, 2005

1 commit

  • This is a patch for counting the number of pages used for bounce
    buffers. The count is shown in /proc/vmstat.

    Currently, the number of bounce pages is not counted anywhere, so when
    there are many bounce pages it looks as though pages have leaked, and
    it's difficult for a user to gauge how many bounce pages are in use.
    So it's meaningful to show the number of bounce pages.
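
    A sketch of the accounting (counter name as described; the exact hook
    points in the bounce code are assumptions):

        /* When a page is allocated for a bounce buffer: */
        inc_page_state(nr_bounce);

        /* ...and when the bounce page is freed again: */
        dec_page_state(nr_bounce);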

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds