05 Aug, 2008

1 commit

  • Converting page lock to new locking bitops requires a change of page flag
    operation naming, so we might as well convert it to something nicer
    (!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

    This also facilitates lockdep annotation of the page lock.
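
    A minimal sketch of the renamed helpers, assuming they are built on the
    new locking bitops (test_and_set_bit_lock/clear_bit_unlock):

        static inline int trylock_page(struct page *page)
        {
                return !test_and_set_bit_lock(PG_locked, &page->flags);
        }

        static inline void set_page_locked(struct page *page)
        {
                set_bit(PG_locked, &page->flags);
        }

        static inline void clear_page_locked(struct page *page)
        {
                clear_bit_unlock(PG_locked, &page->flags);
        }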

    Signed-off-by: Nick Piggin
    Acked-by: KOSAKI Motohiro
    Acked-by: Peter Zijlstra
    Acked-by: Andrew Morton
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

25 May, 2008

1 commit

  • The atomic_t type is 32-bit, but a 64-bit system can have more than 2^32
    pages of virtual address space available. Without this fix we overflow on
    ludicrously large mappings.
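
    A sketch of the kind of change involved, using a committed-VM page
    counter as an assumed example (the real patch may touch a different
    variable):

        /*
         * Before (assumed shape): a 32-bit atomic counter that wraps
         * once a 64-bit task accounts more than 2^31 pages:
         *
         *      static atomic_t vm_committed_space;
         *      atomic_add(pages, &vm_committed_space);
         *
         * After: a native-word atomic cannot wrap on page counts.
         */
        static atomic_long_t vm_committed_space = ATOMIC_LONG_INIT(0);

        void vm_acct_memory(long pages)
        {
                atomic_long_add(pages, &vm_committed_space);
        }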

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     

28 Apr, 2008

1 commit

  • Clean up messy conditional calling of test_clear_page_writeback() from both
    rotate_reclaimable_page() and end_page_writeback().

    The only user of rotate_reclaimable_page() is end_page_writeback(), so
    this is OK.
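
    A sketch of the untangled flow, with the PG_reclaim rotation handled in
    one place (close to the shape the cleanup describes):

        void end_page_writeback(struct page *page)
        {
                /* writeback done: rotate a PG_reclaim page to the tail
                 * of the inactive list so it is reclaimed soon */
                if (TestClearPageReclaim(page))
                        rotate_reclaimable_page(page);

                if (!test_clear_page_writeback(page))
                        BUG();

                smp_mb__after_clear_bit();
                wake_up_page(page, PG_writeback);
        }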

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

20 Mar, 2008

1 commit

  • Fix various kernel-doc notation in mm/:

    filemap.c: add function short description; convert 2 to kernel-doc
    fremap.c: change parameter 'prot' to @prot
    pagewalk.c: change "-" in function parameters to ":"
    slab.c: fix short description of kmem_ptr_validate()
    swap.c: fix description & parameters of put_pages_list()
    swap_state.c: fix function parameters
    vmalloc.c: change "@returns" to "Returns:" since that is not a parameter
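
    For reference, the kernel-doc conventions these fixes enforce look like
    this (the function below is a made-up example):

        /**
         * example_find_page - find a page in an address_space
         * @mapping: the address_space to search
         * @offset: the page index
         *
         * Looks up the page cache entry at @mapping and @offset.
         *
         * Returns: the page found, or NULL if nothing is at @offset.
         */
        struct page *example_find_page(struct address_space *mapping,
                                       pgoff_t offset);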

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

05 Mar, 2008

1 commit

  • Each caller of mem_cgroup_move_lists has to use page_get_page_cgroup:
    it's more convenient if it acts upon the page itself, not the page_cgroup;
    and in a later patch this becomes important to handle within memcontrol.c.
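
    A sketch of the interface change (exact signatures are an assumption):

        /* before: every caller does page_get_page_cgroup() itself */
        void mem_cgroup_move_lists(struct page_cgroup *pc, bool active);

        /* after: act on the page; the lookup moves into memcontrol.c */
        void mem_cgroup_move_lists(struct page *page, bool active);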

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

08 Feb, 2008

1 commit

  • Add the page_cgroup to the per-cgroup LRU. The reclaim algorithm has
    been modified to make isolate_lru_pages() a pluggable component. The
    scan_control data structure now accepts the cgroup on behalf of which
    reclaims are carried out. try_to_free_pages() has been extended to become
    cgroup aware.
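
    A sketch of the scan_control extension described above (field and
    parameter names are assumptions):

        struct scan_control {
                /* ... existing fields ... */

                /* Which cgroup we reclaim on behalf of; NULL for global */
                struct mem_cgroup *mem_cgroup;

                /* Pluggable isolation: global LRU or per-cgroup LRU */
                unsigned long (*isolate_pages)(unsigned long nr,
                                struct list_head *dst, unsigned long *scanned,
                                int order, int mode, struct zone *zone,
                                struct mem_cgroup *mem, int active);
        };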

    [akpm@linux-foundation.org: fix warning]
    [Lee.Schermerhorn@hp.com: initialize all scan_control's isolate_pages member]
    [bunk@kernel.org: make do_try_to_free_pages() static]
    [hugh@veritas.com: memcgroup: fix try_to_free order]
    [kamezawa.hiroyu@jp.fujitsu.com: this unlock_page_cgroup() is unnecessary]
    Signed-off-by: Pavel Emelianov
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Hugh Dickins
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     

17 Oct, 2007

3 commits

  • Provide BDI constructor/destructor hooks.
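
    Presumably the hooks have this shape (a sketch; treat the names as
    assumptions):

        int bdi_init(struct backing_dev_info *bdi);
        void bdi_destroy(struct backing_dev_info *bdi);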

    [akpm@linux-foundation.org: compile fix]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • While running some memory-intensive load, system response deteriorated
    just after swap-out started.

    The cause of this problem is that when a PG_reclaim page is moved to the
    tail of the inactive LRU list in rotate_reclaimable_page(), the lru_lock
    spin lock is acquired for every page writeback. This deteriorates system
    performance and lengthens interrupt hold-off time once swap-out starts.

    The following patch solves this problem. I use a pagevec when rotating
    reclaimable pages, to mitigate LRU spin lock contention and reduce
    interrupt hold-off time.

    I ran a test that allocates and touches pages in multiple processes while
    pinging the test machine in flood mode, to measure response under memory-
    intensive load.

    The test result is:

    -2.6.23-rc5
    --- testmachine ping statistics ---
    3000 packets transmitted, 3000 received, 0% packet loss, time 53222ms
    rtt min/avg/max/mdev = 0.074/0.652/172.228/7.176 ms, pipe 11, ipg/ewma 17.746/0.092 ms

    -2.6.23-rc5-patched
    --- testmachine ping statistics ---
    3000 packets transmitted, 3000 received, 0% packet loss, time 51924ms
    rtt min/avg/max/mdev = 0.072/0.108/3.884/0.114 ms, pipe 2, ipg/ewma 17.314/0.091 ms

    Max round-trip-time was improved.

    The test machine has 4 CPUs (3.16GHz, Hyper-Threading enabled), 8GB of
    memory, and 8GB of swap.

    I ran the ping test again to observe the performance deterioration caused
    by taking a page reference.

    -2.6.23-rc6-with-modifiedpatch
    --- testmachine ping statistics ---
    3000 packets transmitted, 3000 received, 0% packet loss, time 53386ms
    rtt min/avg/max/mdev = 0.074/0.110/4.716/0.147 ms, pipe 2, ipg/ewma 17.801/0.129 ms

    The result for my original patch is as follows.

    -2.6.23-rc5-with-originalpatch
    --- testmachine ping statistics ---
    3000 packets transmitted, 3000 received, 0% packet loss, time 51924ms
    rtt min/avg/max/mdev = 0.072/0.108/3.884/0.114 ms, pipe 2, ipg/ewma 17.314/0.091 ms

    The influence on response time was small.
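
    A sketch of the batching, assuming a per-CPU pagevec that is drained to
    the tail of the inactive list under a single lru_lock acquisition once it
    fills (pagevec_move_tail is the assumed drain helper):

        static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);

        void rotate_reclaimable_page(struct page *page)
        {
                struct pagevec *pvec;
                unsigned long flags;

                page_cache_get(page);           /* the ref discussed above */
                local_irq_save(flags);
                pvec = &__get_cpu_var(lru_rotate_pvecs);
                if (!pagevec_add(pvec, page))
                        pagevec_move_tail(pvec);    /* one lru_lock per batch */
                local_irq_restore(flags);
        }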

    [akpm@linux-foundation.org: fix uninitialised var warning]
    [hugh@veritas.com: fix locking]
    [randy.dunlap@oracle.com: fix function declaration]
    [hugh@veritas.com: fix BUG at include/linux/mm.h:220!]
    [hugh@veritas.com: kill redundancy in rotate_reclaimable_page]
    [hugh@veritas.com: move_tail_pages into lru_add_drain]
    Signed-off-by: Hisashi Hifumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
     
  • This patch cleans up duplicate includes in mm/.

    Signed-off-by: Jesper Juhl
    Acked-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     

10 May, 2007

1 commit

  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).
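
    A sketch of how a CPU-hotplug-aware subsystem consumes the new events,
    treating the _FROZEN variants like the normal ones as described (the
    callback body is illustrative):

        static int example_cpu_callback(struct notifier_block *nb,
                                        unsigned long action, void *hcpu)
        {
                switch (action) {
                case CPU_UP_PREPARE:
                case CPU_UP_PREPARE_FROZEN:     /* suspend/resume variant */
                        /* same handling as the normal event, for now */
                        break;
                case CPU_DEAD:
                case CPU_DEAD_FROZEN:
                        break;
                }
                return NOTIFY_OK;
        }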

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

08 May, 2007

1 commit

  • If we add a new flag so that we can distinguish between the first page
    and the tail pages, then we can avoid using page->private in the first
    page. page->private == page for the first page, so there is no real
    information in there.

    Freeing up page->private makes the use of compound pages more
    transparent. They become more usable, like real pages. Right now we have
    to be careful, for example, if we go beyond PAGE_SIZE allocations in the
    slab on i386, because we can then no longer use the private field. This
    is one of the issues that keeps us from supporting debugging for
    page-size slabs in SLAB.

    Having page->private available for SLUB would allow more meta information in
    the page struct. I can probably avoid the 16 bit ints that I have in there
    right now.

    Also if page->private is available then a compound page may be equipped with
    buffer heads. This may free up the way for filesystems to support larger
    blocks than page size.

    We add PageTail as an alias of PageReclaim. Compound pages cannot currently
    be reclaimed. Because of the alias one needs to check PageCompound first.

    The RFC for this approach was discussed at
    http://marc.info/?t=117574302800001&r=1&w=2
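
    A sketch matching the description: PageTail is PG_reclaim under another
    name, so a caller must establish PageCompound() before trusting it:

        #define PG_tail PG_reclaim  /* alias: compound pages are never reclaimed */

        static inline int PageTail(struct page *page)
        {
                return test_bit(PG_tail, &page->flags);
        }

        /* usage: PG_reclaim alone may mean an ordinary reclaim page */
        int is_tail = PageCompound(page) && PageTail(page);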

    [nacc@us.ibm.com: fix hugetlbfs]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

08 Dec, 2006

2 commits

  • There was a lot of #ifdef noise in the kernel due to hotcpu_notifier(fn,
    prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, thus
    generating compiler warnings about unused symbols and forcing people to
    add #ifdefs.

    The compiler can skip truly unused functions just fine, as the identical
    before/after kernel sizes show:

    text data bss dec hex filename
    1624412 728710 3674856 6027978 5bfaca vmlinux.before
    1624412 728710 3674856 6027978 5bfaca vmlinux.after
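
    A sketch of the fix for the !CONFIG_HOTPLUG_CPU case: reference 'fn' so
    the compiler considers it used, then let dead-code elimination drop both
    the reference and the function itself:

        #define hotcpu_notifier(fn, pri)        do { (void)(fn); } while (0)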

    [akpm@osdl.org: topology.c fix]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Currently we use the lru head link of the second page of a compound page
    to hold its destructor. This was OK when it was purely an internal
    implementation detail. However, hugetlbfs overrides this destructor,
    violating the layering. Abstract this out as explicit calls, and
    introduce a type for the callback function, allowing it to be type
    checked. For each callback we pre-declare the function, causing a type
    error on definition rather than on use elsewhere.
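
    A sketch of the typedef and accessors (close to the shape described;
    page[1].lru.next is where the destructor already lives):

        typedef void compound_page_dtor(struct page *);

        /* pre-declaration forces a type error at definition time */
        compound_page_dtor free_huge_page;

        static inline void set_compound_page_dtor(struct page *page,
                                                  compound_page_dtor *dtor)
        {
                page[1].lru.next = (void *)dtor;
        }

        static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
        {
                return (compound_page_dtor *)page[1].lru.next;
        }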

    [akpm@osdl.org: cleanups]
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     

26 Sep, 2006

2 commits

  • This patch makes the following needlessly global functions static:
    - slab.c: kmem_find_general_cachep()
    - swap.c: __page_cache_release()
    - vmalloc.c: __vmalloc_node()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Introduce a VM_BUG_ON, which is turned on with CONFIG_DEBUG_VM. Use this
    in the lightweight, inline refcounting functions; in the PageLRU and
    PageActive checks in vmscan, because they're pretty well confined to
    vmscan; and in the page allocate/free fastpaths, which can be the hottest
    parts of the kernel for kbuilds.

    Unlike BUG_ON, VM_BUG_ON must not be used to execute statements with
    side-effects, and should not be used outside core mm code.
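
    A sketch of the definition; note that the !CONFIG_DEBUG_VM stub does not
    evaluate the condition at all, which is why statements with side-effects
    are forbidden:

        #ifdef CONFIG_DEBUG_VM
        #define VM_BUG_ON(cond)         BUG_ON(cond)
        #else
        #define VM_BUG_ON(cond)         do { } while (0)
        #endif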

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

01 Jul, 2006

1 commit

  • The remaining counters in page_state after the zoned VM counter patches
    have been applied are all just for show in /proc/vmstat. They have no
    essential function for the VM.

    We use a simple increment of per-cpu variables. In order to avoid the
    most severe races we disable preempt. Preempt does not prevent the race
    between an increment and an interrupt handler incrementing the same
    statistics counter. However, that race is exceedingly rare; we may only
    lose one increment or so, and there is no requirement (at least not in
    the kernel) that the vm event counters have to be accurate.

    In the non preempt case this results in a simple increment for each
    counter. For many architectures this will be reduced by the compiler to a
    single instruction. This single instruction is atomic for i386 and x86_64.
    And therefore even the rare race condition in an interrupt is avoided for
    both architectures in most cases.

    The patchset also adds an off switch for embedded systems that allows
    building Linux kernels without these counters.

    The implementation of these counters is through inline code that
    hopefully results in only a single increment instruction being emitted
    (i386, x86_64), or in the increment being hidden through instruction
    concurrency (EPIC architectures such as ia64 can get that done).

    Benefits:
    - VM event counter operations usually reduce to a single inline instruction
    on i386 and x86_64.
    - No interrupt disable, only preempt disable for the preempt case.
    Preempt disable can also be avoided by moving the counter into a spinlock.
    - Handling is similar to zoned VM counters.
    - Simple and easily extendable.
    - Can be omitted to reduce memory use for embedded use.
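
    A sketch of the counting scheme described above (identifiers are
    assumptions):

        struct vm_event_state {
                unsigned long event[NR_VM_EVENT_ITEMS];
        };
        DECLARE_PER_CPU(struct vm_event_state, vm_event_states);

        static inline void count_vm_event(enum vm_event_item item)
        {
                /* preempt off so we don't migrate between read and write */
                get_cpu_var(vm_event_states).event[item]++;
                put_cpu_var(vm_event_states);
        }

        static inline void __count_vm_event(enum vm_event_item item)
        {
                /* caller already holds off preemption (e.g. irqs disabled) */
                __get_cpu_var(vm_event_states).event[item]++;
        }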

    References:

    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
    local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
    V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

27 Jun, 2006

1 commit

  • This patch converts the combination of list_del(A) and list_add(A, B) to
    list_move(A, B).
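
    The conversion pattern, as a minimal sketch (move_entry is a hypothetical
    wrapper):

        #include <linux/list.h>

        static void move_entry(struct list_head *entry, struct list_head *head)
        {
                /* before:
                 *      list_del(entry);
                 *      list_add(entry, head);
                 */
                list_move(entry, head);     /* after: one call, same effect */
        }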

    Cc: Greg Kroah-Hartman
    Cc: Ram Pai
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

22 Mar, 2006

4 commits

  • In the page release paths, we can be sure that nobody will mess with our
    page->flags because the refcount has dropped to 0. So no need for atomic
    operations here.
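
    A sketch of the idea (the helper name is hypothetical): with the refcount
    at zero, nobody else can reach the page, so the non-atomic __Clear
    variants of the page flag operations are safe:

        static void example_free_lru_page(struct page *page)
        {
                BUG_ON(page_count(page) != 0);
                __ClearPageLRU(page);       /* no concurrent flag users left */
                __ClearPageActive(page);
                free_hot_page(page);
        }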

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • PG_active is protected by zone->lru_lock, it does not need TestSet/TestClear
    operations.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • PG_lru is protected by zone->lru_lock. It does not need TestSet/TestClear
    operations.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • If vmscan finds a zero refcount page on the lru list, never ClearPageLRU
    it. This means the release code need not hold ->lru_lock to stabilise
    PageLRU, so that lock may be skipped entirely when releasing !PageLRU pages
    (because we know PageLRU won't have been temporarily cleared by vmscan,
    which was previously guaranteed by holding the lock to synchronise against
    vmscan).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

17 Mar, 2006

1 commit

  • We can call try_to_release_page() with PagePrivate off and a valid
    page->mapping. This may cause all sorts of trouble for the filesystem
    *_releasepage() handlers. XFS bombs out in that case.

    Lock the page before checking for page private.
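
    The rule in sketch form (the wrapper is hypothetical):

        static void example_release_buffers(struct page *page)
        {
                /* Lock first: otherwise PagePrivate may be cleared while
                 * page->mapping is still set, and ->releasepage() sees an
                 * inconsistent page. */
                lock_page(page);
                if (PagePrivate(page))
                        try_to_release_page(page, GFP_KERNEL);
                unlock_page(page);
        }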

    Signed-off-by: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 Mar, 2006

1 commit

  • Implement percpu_counter_sum(). This is a more accurate but slower version of
    percpu_counter_read_positive().

    We need this for Alex's speedup-ext3_statfs patch and for the nr_file
    accounting fix. Otherwise these things would be too inaccurate on large CPU
    counts.
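
    A sketch of percpu_counter_sum(), close to the shape described: fold each
    CPU's local delta into the central count under the counter's lock,
    clamping like the _read_positive variant:

        s64 percpu_counter_sum(struct percpu_counter *fbc)
        {
                s64 ret;
                int cpu;

                spin_lock(&fbc->lock);
                ret = fbc->count;
                for_each_possible_cpu(cpu) {
                        s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
                        ret += *pcount;
                }
                spin_unlock(&fbc->lock);
                return ret < 0 ? 0 : ret;
        }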

    Cc: Ravikiran G Thirumalai
    Cc: Alex Tomas
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

15 Feb, 2006

1 commit

  • If a compound page has its own put_page_testzero destructor (the only current
    example is free_huge_page), that is noted in page[1].mapping of the compound
    page. But that's rather a poor place to keep it: functions which call
    set_page_dirty_lock after get_user_pages (e.g. Infiniband's
    __ib_umem_release) ought to be checking first, otherwise set_page_dirty is
    liable to crash on what's not the address of a struct address_space.

    And now I'm about to make that worse: it turns out that every compound page
    needs a destructor, so we can no longer rely on hugetlb pages going their own
    special way, to avoid further problems of page->mapping reuse. For example,
    not many people know that: on 50% of i386 -Os builds, the first tail page of a
    compound page purports to be PageAnon (when its destructor has an odd
    address), which surprises page_add_file_rmap.

    Keep the compound page destructor in page[1].lru.next instead. And to free up
    the common pairing of mapping and index, also move compound page order from
    index to lru.prev. Slab reuses page->lru too: but if we ever need slab to use
    compound pages, it can easily stack its use above this.

    (akpm: decoded version of the above: the tail pages of a compound page now
    have ->mapping==NULL, so there's no need for the set_page_dirty[_lock]()
    caller to check that they're not compound pages before doing the dirty).
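
    A sketch of the new placement (accessor names are illustrative):

        /* page[1].mapping stays NULL, so set_page_dirty[_lock]() on a tail
         * page no longer trips over a bogus address_space pointer. */
        static inline void set_compound_dtor(struct page *page,
                                             void (*dtor)(struct page *))
        {
                page[1].lru.next = (void *)dtor;        /* destructor */
        }

        static inline void set_compound_order(struct page *page,
                                              unsigned long order)
        {
                page[1].lru.prev = (void *)order;       /* order, off ->index */
        }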

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

08 Feb, 2006

1 commit

  • Compound pages on SMP systems can now often be freed from pagetables via
    the release_pages path. This uses put_page_testzero which does not handle
    compound pages at all. Releasing constituent pages from process mappings
    decrements their count to a large negative number and leaks the reference
    at the head page - net result is a memory leak.

    The problem was hidden because the debug check in put_page_testzero itself
    actually did take compound pages into consideration.

    Fix the bug and the debug check.
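
    A sketch of the fixed release path, diverting compound pages before the
    plain refcount drop (close to the era's put_page()):

        void put_page(struct page *page)
        {
                if (unlikely(PageCompound(page)))
                        put_compound_page(page);    /* frees via head page */
                else if (put_page_testzero(page))
                        __page_cache_release(page);
        }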

    Signed-off-by: Nick Piggin
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

19 Jan, 2006

1 commit

  • Migration code currently does not take a reference to target page
    properly, so between unlocking the pte and trying to take a new
    reference to the page with isolate_lru_page, anything could happen to
    it.

    Fix this by holding the pte lock until we get a chance to elevate the
    refcount.

    Other small cleanups while we're here.

    Signed-off-by: Nick Piggin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

23 Nov, 2005

1 commit

  • It looks like snd_xxx is not the only nopage to be using PageReserved as a way
    of holding a high-order page together: which no longer works, but is masked by
    our failure to free from VM_RESERVED areas. We cannot fix that bug without
    first substituting another way to hold the high-order page together, while
    farming out the 0-order pages from within it.

    That's just what PageCompound is designed for, but it's been kept under
    CONFIG_HUGETLB_PAGE. Remove the #ifdefs: this saves some space
    (out-of-line put_page), doesn't slow down what most needs to be fast
    (already using hugetlb), and unifies the way we handle high-order pages.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
