01 Jul, 2006

1 commit

  • The remaining counters in page_state after the zoned VM counter patches
    have been applied are all just for show in /proc/vmstat. They have no
    essential function for the VM.

    We use a simple increment of per-cpu variables. In order to avoid the most
    severe races we disable preemption. Disabling preemption does not prevent
    the race between an increment and an interrupt handler incrementing the
    same statistics counter. However, that race is exceedingly rare; we may
    lose an increment or so, and there is no requirement (at least not in the
    kernel) that the vm event counters be accurate.

    In the non-preempt case this results in a simple increment for each
    counter. For many architectures the compiler will reduce this to a single
    instruction. That single instruction is atomic on i386 and x86_64, so even
    the rare race with an interrupt is avoided on both architectures in most
    cases.

    The patchset also adds an off switch for embedded systems that allows
    building Linux kernels without these counters.

    The counters are implemented as inline code that hopefully results in only
    a single increment instruction being emitted (i386, x86_64), or in the
    increment being hidden through instruction-level parallelism (EPIC
    architectures such as ia64 can get that done).
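
    As a rough illustration, the pattern looks like the following. This is a
    minimal sketch; the identifiers are assumptions based on the description
    above, not necessarily the patch's exact names:

        enum vm_event_item { PGFAULT, PGMAJFAULT, NR_VM_EVENT_ITEMS };

        struct vm_event_state {
                unsigned long event[NR_VM_EVENT_ITEMS];
        };

        DEFINE_PER_CPU(struct vm_event_state, vm_event_states);

        static inline void count_vm_event(enum vm_event_item item)
        {
                /* preemption off: stay on one CPU for the increment */
                get_cpu_var(vm_event_states).event[item]++;
                put_cpu_var(vm_event_states);
                /* an interrupt could still race the ++, but a lost
                 * tick in these statistics is acceptable */
        }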

    Benefits:
    - VM event counter operations usually reduce to a single inline instruction
    on i386 and x86_64.
    - No interrupt disable, only preempt disable for the preempt case.
    Preempt disable can also be avoided by moving the counter into a spinlock.
    - Handling is similar to zoned VM counters.
    - Simple and easily extendable.
    - Can be omitted to reduce memory use on embedded systems.

    References:

    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
    local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
    V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

27 Jun, 2006

1 commit

  • This patch converts the combination of list_del(A) and list_add(A, B) to
    list_move(A, B).
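
    For illustration, the conversion has this shape (a hypothetical call
    site; the actual sites are whatever the patch touches):

        /* before: two steps */
        list_del(&page->lru);
        list_add(&page->lru, &zone->active_list);

        /* after: one call with the same effect */
        list_move(&page->lru, &zone->active_list);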

    Cc: Greg Kroah-Hartman
    Cc: Ram Pai
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

22 Mar, 2006

4 commits

  • In the page release paths, we can be sure that nobody will mess with our
    page->flags because the refcount has dropped to 0. So no need for atomic
    operations here.
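
    A minimal sketch of the idea, which also covers the PG_active and PG_lru
    commits below: once nothing else can reach page->flags, the non-atomic
    double-underscore bit operations suffice (illustrative code, not the
    patch itself):

        /* refcount == 0: no other reference exists, page->flags is ours */
        if (PageLRU(page))
                __ClearPageLRU(page);   /* plain __clear_bit, no lock prefix */
        if (PageActive(page))
                __ClearPageActive(page);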

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • PG_active is protected by zone->lru_lock; it does not need
    TestSet/TestClear operations.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • PG_lru is protected by zone->lru_lock. It does not need TestSet/TestClear
    operations.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • If vmscan finds a zero refcount page on the lru list, never ClearPageLRU
    it. This means the release code need not hold ->lru_lock to stabilise
    PageLRU, so that lock may be skipped entirely when releasing !PageLRU pages
    (because we know PageLRU won't have been temporarily cleared by vmscan,
    which was previously guaranteed by holding the lock to synchronise against
    vmscan).
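
    In effect the release path can now look roughly like this (a sketch
    under the stated invariant, not the literal code):

        if (PageLRU(page)) {
                unsigned long flags;

                /* PageLRU cannot have been cleared while we saw it set:
                 * vmscan never clears it on a zero-ref page.  Only now
                 * do we need the lock, to take the page off the list. */
                spin_lock_irqsave(&zone->lru_lock, flags);
                __ClearPageLRU(page);
                del_page_from_lru(zone, page);
                spin_unlock_irqrestore(&zone->lru_lock, flags);
        }
        /* !PageLRU pages never take zone->lru_lock at all */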

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

17 Mar, 2006

1 commit

  • We can call try_to_release_page() with PagePrivate off and a valid
    page->mapping. This may cause all sorts of trouble for the filesystem
    *_releasepage() handlers. XFS bombs out in that case.

    Lock the page before checking for page private.
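
    Schematically, the fix orders things like this (a sketch of the idea;
    the gfp flag and surrounding code are illustrative):

        lock_page(page);
        if (PagePrivate(page))          /* now stable under the page lock */
                try_to_release_page(page, GFP_KERNEL);
        unlock_page(page);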

    Signed-off-by: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 Mar, 2006

1 commit

  • Implement percpu_counter_sum(). This is a more accurate but slower version of
    percpu_counter_read_positive().

    We need this for Alex's speedup-ext3_statfs patch and for the nr_file
    accounting fix. Otherwise these things would be too inaccurate on large CPU
    counts.
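
    The shape of the new helper is roughly as follows (a sketch consistent
    with the description; details may differ from the committed code):

        s64 percpu_counter_sum(struct percpu_counter *fbc)
        {
                s64 ret;
                int cpu;

                spin_lock(&fbc->lock);
                ret = fbc->count;
                for_each_possible_cpu(cpu)      /* fold in each CPU's delta */
                        ret += *per_cpu_ptr(fbc->counters, cpu);
                spin_unlock(&fbc->lock);
                return ret < 0 ? 0 : ret;       /* keep the "positive" contract */
        }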

    Cc: Ravikiran G Thirumalai
    Cc: Alex Tomas
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

15 Feb, 2006

1 commit

  • If a compound page has its own put_page_testzero destructor (the only current
    example is free_huge_page), that is noted in page[1].mapping of the compound
    page. But that's rather a poor place to keep it: functions which call
    set_page_dirty_lock after get_user_pages (e.g. Infiniband's
    __ib_umem_release) ought to be checking first, otherwise set_page_dirty is
    liable to crash on what's not the address of a struct address_space.

    And now I'm about to make that worse: it turns out that every compound page
    needs a destructor, so we can no longer rely on hugetlb pages going their own
    special way, to avoid further problems of page->mapping reuse. For example,
    not many people know that: on 50% of i386 -Os builds, the first tail page of a
    compound page purports to be PageAnon (when its destructor has an odd
    address), which surprises page_add_file_rmap.

    Keep the compound page destructor in page[1].lru.next instead. And to free up
    the common pairing of mapping and index, also move compound page order from
    index to lru.prev. Slab reuses page->lru too: but if we ever need slab to use
    compound pages, it can easily stack its use above this.

    (akpm: decoded version of the above: the tail pages of a compound page now
    have ->mapping==NULL, so there's no need for the set_page_dirty[_lock]()
    caller to check that they're not compound pages before doing the dirty.)
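
    In code terms the new placement is roughly (the surrounding helpers are
    illustrative; the patch may open-code these assignments):

        /* page[1] is the first tail page of the compound page */
        page[1].lru.next = (void *)dtor;        /* destructor, was page[1].mapping */
        page[1].lru.prev = (void *)order;       /* order, was page[1].index */
        /* ...so tail pages keep ->mapping == NULL and set_page_dirty()
         * callers can no longer trip over a bogus address_space */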

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

08 Feb, 2006

1 commit

  • Compound pages on SMP systems can now often be freed from pagetables via
    the release_pages path. This path uses put_page_testzero, which does not
    handle compound pages at all. Releasing constituent pages from process
    mappings decrements their count to a large negative number and leaks the
    reference at the head page - the net result is a memory leak.

    The problem was hidden because the debug check in put_page_testzero itself
    actually did take compound pages into consideration.

    Fix the bug and the debug check.
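
    The corrected release path, roughly sketched (simplified; the committed
    fix also handles the compound destructor):

        void put_page(struct page *page)
        {
                if (unlikely(PageCompound(page)))
                        /* constituent pages hold no refcount of their own;
                         * redirect the put to the head page via ->private */
                        page = (struct page *)page_private(page);
                if (put_page_testzero(page))
                        __page_cache_release(page);
        }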

    Signed-off-by: Nick Piggin
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

19 Jan, 2006

1 commit

  • Migration code currently does not take a reference to the target page
    properly, so between unlocking the pte and trying to take a new
    reference to the page with isolate_lru_page, anything could happen to
    it.

    Fix this by holding the pte lock until we get a chance to elevate the
    refcount.
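
    Schematically (a sketch of the ordering, not the literal diff):

        spin_lock(ptl);                 /* pte lock pins the pte -> page link */
        page = pte_page(*pte);
        get_page(page);                 /* refcount elevated before unlock */
        spin_unlock(ptl);
        /* the page can no longer be freed out from under
         * isolate_lru_page() */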

    Other small cleanups while we're here.

    Signed-off-by: Nick Piggin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

23 Nov, 2005

1 commit

  • It looks like snd_xxx is not the only nopage to be using PageReserved as a way
    of holding a high-order page together: which no longer works, but is masked by
    our failure to free from VM_RESERVED areas. We cannot fix that bug without
    first substituting another way to hold the high-order page together, while
    farming out the 0-order pages from within it.

    That's just what PageCompound is designed for, but it's been kept under
    CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space (out-of-line
    put_page), doesn't slow down what most needs to be fast (already using
    hugetlb), and unifies the way we handle high-order pages.
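
    For reference, PageCompound ties a high-order allocation together along
    these lines (a simplified sketch of prep_compound_page-era code, offered
    as an assumption about the mechanism rather than the patch's diff):

        page[1].index = order;          /* head page records the order */
        for (i = 0; i < (1 << order); i++) {
                SetPageCompound(page + i);
                set_page_private(page + i, (unsigned long)page); /* -> head */
        }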

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

30 Oct, 2005

2 commits

  • Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
    a many-threaded application which concurrently initializes different parts of
    a large anonymous area.

    This patch corrects that, by using a separate spinlock per page table page, to
    guard the page table entries in that page, instead of using the mm's single
    page_table_lock. (But even then, page_table_lock is still used to guard page
    table allocation, and anon_vma allocation.)

    In this implementation, the spinlock is tucked inside the struct page of the
    page table page: with a BUILD_BUG_ON in case it overflows - which it would in
    the case of 32-bit PA-RISC with spinlock debugging enabled.

    Splitting the lock is not quite for free: another cacheline access. Ideally,
    I suppose we would use split ptlock only for multi-threaded processes on
    multi-cpu machines; but deciding that dynamically would have its own costs.
    So for now enable it by config, at some number of cpus - since the Kconfig
    language doesn't support inequalities, let preprocessor compare that with
    NR_CPUS. But I don't think it's worth being user-configurable: for good
    testing of both split and unsplit configs, split now at 4 cpus, and perhaps
    change that to 8 later.
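
    The lock selection then ends up looking roughly like this (simplified
    from the description above; the real macro details may differ):

        #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
        /* one spinlock per page-table page, kept in its struct page */
        #define pte_lockptr(mm, pmd)    (&pmd_page(*(pmd))->ptl)
        #else
        /* small systems keep the single per-mm lock */
        #define pte_lockptr(mm, pmd)    (&(mm)->page_table_lock)
        #endif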

    There is a benefit even for singly threaded processes: kswapd can be attacking
    one part of the mm while another part is busy faulting.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove PageReserved() calls from core code by tightening VM_RESERVED
    handling in mm/ to cover PageReserved functionality.

    PageReserved special casing is removed from get_page and put_page.

    All setting and clearing of PageReserved is retained, and it is now flagged
    in the page_alloc checks to help ensure we don't introduce any refcount
    based freeing of Reserved pages.

    MAP_PRIVATE, PROT_WRITE mappings of VM_RESERVED regions are tentatively
    being deprecated. We never handled them completely correctly anyway, and
    they can be reintroduced in the future if required (Hugh has a proof of
    concept).

    Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
    arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
    be trivially removed.

    Last real user of PageReserved is swsusp, which uses PageReserved to
    determine whether a struct page points to valid memory or not. This still
    needs to be addressed (a generic page_is_ram() should work).

    A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
    thus mapcounted and counted towards shared rss). These writes to the struct
    page could cause excessive cacheline bouncing on big systems. There are a
    number of ways this could be addressed if it is an issue.

    Signed-off-by: Nick Piggin

    Refcount bug fix for filemap_xip.c

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds