29 May, 2009

4 commits

  • Fix build warning, "mem_cgroup_is_obsolete defined but not used" when
    CONFIG_DEBUG_VM is not set. Also avoid checking for !mem again and again.

    Signed-off-by: Nikanth Karthikesan
    Acked-by: Pekka Enberg
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikanth Karthikesan
     
  • Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302

    hugetlbfs reserves huge pages at mmap() time, without faulting them in,
    to ensure that future faults succeed. The reservation behaviour differs
    depending on whether the mapping was mapped MAP_SHARED or MAP_PRIVATE.
    For MAP_SHARED mappings, hugepages are reserved when mmap() is first
    called and are tracked based on information associated with the inode.
    Other processes mapping MAP_SHARED use the same reservation. MAP_PRIVATE
    mappings track their reservations based on the VMA created as part of
    the mmap() operation, and each process mapping MAP_PRIVATE must make its
    own reservation.

    hugetlbfs currently checks if a VMA is MAP_SHARED with the VM_SHARED flag
    and not VM_MAYSHARE. For file-backed mappings, such as hugetlbfs,
    VM_SHARED is set only if the mapping is MAP_SHARED and the file was opened
    read-write. If a shared memory mapping was mapped shared-read-write to
    populate data and mapped shared-read-only by other processes, then
    hugetlbfs would account for the mapping as if it were MAP_PRIVATE. This
    causes processes to fail to map the file MAP_SHARED even though the
    mapping should succeed because the reservation is there.

    This patch alters mm/hugetlb.c and replaces VM_SHARED with VM_MAYSHARE
    when the intent of the code was to check whether the VMA was mapped
    MAP_SHARED or MAP_PRIVATE.

    Signed-off-by: Mel Gorman
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Lee Schermerhorn
    Cc: KOSAKI Motohiro
    Cc: Eric B Munson
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
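
    The scenario above can be reproduced from user space with a small sketch
    along the lines of the program below (illustrative only: the hugetlbfs
    mount point, file name and 2MB huge page size are assumptions). The
    second, read-only MAP_SHARED mapping is the case for which only
    VM_MAYSHARE, not VM_SHARED, is set, and which the old checks therefore
    mis-accounted.

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define HUGEPAGE_SZ (2UL * 1024 * 1024)     /* assumes 2MB huge pages */

        int main(void)
        {
                const char *path = "/mnt/hugetlbfs/repro";  /* assumed mount */
                size_t len = HUGEPAGE_SZ;

                /* Writer: shared read-write mapping, makes the reservation. */
                int fd_rw = open(path, O_CREAT | O_RDWR, 0644);
                if (fd_rw < 0) { perror("open rw"); return 1; }
                void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd_rw, 0);
                if (rw == MAP_FAILED) { perror("mmap rw"); return 1; }

                /* Reader: the file is opened read-only and mapped MAP_SHARED.
                   Such a VMA has VM_MAYSHARE but not VM_SHARED set. */
                int fd_ro = open(path, O_RDONLY);
                if (fd_ro < 0) { perror("open ro"); return 1; }
                void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd_ro, 0);
                if (ro == MAP_FAILED) { perror("mmap ro"); return 1; }

                printf("rw mapping %p, ro mapping %p\n", rw, ro);
                munmap(ro, len);
                munmap(rw, len);
                close(fd_ro);
                close(fd_rw);
                return 0;
        }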
     
  • mapping->tree_lock can be acquired from interrupt context. The
    following deadlock can then occur.

    Assume "A" as a page.

    CPU0:
    lock_page_cgroup(A)
    interrupted
    -> take mapping->tree_lock.
    CPU1:
    take mapping->tree_lock
    -> lock_page_cgroup(A)

    This patch tries to fix the above deadlock by moving memcg's hook out of
    mapping->tree_lock; charge/uncharge of pagecache/swapcache is protected
    by the page lock, not by tree_lock.

    After this patch, lock_page_cgroup() is not called under mapping->tree_lock.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
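
    The AB-BA inversion can be illustrated in user space with two pthread
    mutexes standing in for lock_page_cgroup(A) and mapping->tree_lock
    (purely an analogy, not kernel code; trylock is used so the demo reports
    the inversion instead of actually hanging). Build with -pthread.

        #include <pthread.h>
        #include <stdio.h>
        #include <unistd.h>

        /* Stand-ins for lock_page_cgroup(A) and mapping->tree_lock. */
        static pthread_mutex_t page_cgroup_lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

        static void *cpu0(void *arg)    /* lock_page_cgroup, then tree_lock */
        {
                (void)arg;
                pthread_mutex_lock(&page_cgroup_lock);
                usleep(100 * 1000);     /* window for "CPU1" to take tree_lock */
                if (pthread_mutex_trylock(&tree_lock))
                        printf("cpu0: tree_lock held elsewhere -> AB-BA deadlock\n");
                else
                        pthread_mutex_unlock(&tree_lock);
                pthread_mutex_unlock(&page_cgroup_lock);
                return NULL;
        }

        static void *cpu1(void *arg)    /* tree_lock, then lock_page_cgroup */
        {
                (void)arg;
                pthread_mutex_lock(&tree_lock);
                usleep(100 * 1000);
                if (pthread_mutex_trylock(&page_cgroup_lock))
                        printf("cpu1: page_cgroup lock held elsewhere -> AB-BA deadlock\n");
                else
                        pthread_mutex_unlock(&page_cgroup_lock);
                pthread_mutex_unlock(&tree_lock);
                return NULL;
        }

        int main(void)
        {
                pthread_t t0, t1;

                pthread_create(&t0, NULL, cpu0, NULL);
                pthread_create(&t1, NULL, cpu1, NULL);
                pthread_join(t0, NULL);
                pthread_join(t1, NULL);
                return 0;
        }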
     
  • When /proc/sys/vm/oom_dump_tasks is enabled, it is possible to get a NULL
    pointer for tasks that have detached mm's since task_lock() is not held
    during the tasklist scan. Add the task_lock().

    Acked-by: Nick Piggin
    Acked-by: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

22 May, 2009

1 commit

  • My old address will shut down in a few days' time: remove it from the tree,
    and add a tmpfs (shmem filesystem) maintainer entry with the new address.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

21 May, 2009

1 commit

  • * master.kernel.org:/home/rmk/linux-2.6-arm: (25 commits)
    [ARM] 5519/1: amba probe: pass "struct amba_id *" instead of void *
    [ARM] 5517/1: integrator: don't put clock lookups in __initdata
    [ARM] 5518/1: versatile: don't put clock lookups in __initdata
    [ARM] mach-l7200: fix spelling of SYS_CLOCK_OFF
    [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2
    [ARM] realview: fix broadcast tick support
    [ARM] realview: remove useless smp_cross_call_done()
    [ARM] smp: fix cpumask usage in ARM SMP code
    [ARM] 5513/1: Eurotech VIPER SBC: fix compilation error
    [ARM] 5509/1: ep93xx: clkdev enable UARTS
    ARM: OMAP2/3: Change omapfb to use clkdev for dispc and rfbi, v2
    ARM: OMAP3: Fix HW SAVEANDRESTORE shift define
    ARM: OMAP3: Fix number of GPIO lines for 34xx
    [ARM] S3C: Do not set clk->owner field if unset
    [ARM] S3C2410: mach-bast.c registering i2c data too early
    [ARM] S3C24XX: Fix unused code warning in arch/arm/plat-s3c24xx/dma.c
    [ARM] S3C64XX: fix GPIO debug
    [ARM] S3C64XX: GPIO include cleanup
    [ARM] nwfpe: fix 'floatx80_is_nan' sparse warning
    [ARM] nwfpe: Add decleration for ExtendedCPDO
    ...

    Linus Torvalds
     

18 May, 2009

3 commits

  • pfn_valid() is meant to be able to tell if a given PFN has valid memmap
    associated with it or not. In FLATMEM, it is expected that holes always
    have valid memmap as long as there are valid PFNs on either side of the
    hole.
    In SPARSEMEM, it is assumed that a valid section has a memmap for the
    entire section.

    However, ARM and maybe other embedded architectures in the future free
    memmap backing holes to save memory on the assumption the memmap is never
    used. The page_zone linkages are then broken even though pfn_valid()
    returns true. A walker of the full memmap must then do this additional
    check to ensure the memmap they are looking at is sane by making sure the
    zone and PFN linkages are still valid. This is expensive, but walkers of
    the full memmap are extremely rare.

    This was caught before for FLATMEM and hacked around but it hits again for
    SPARSEMEM because the page_zone linkages can look ok where the PFN linkages
    are totally screwed. This looks like a hatchet job but the reality is that
    any clean solution would end up consuming all the memory saved by punching
    these unexpected holes in the memmap. For example, we tried marking the
    memmap within the section invalid but the section size exceeds the size of
    the hole in most cases so pfn_valid() starts returning false where valid
    memmap exists. Shrinking the size of the section would increase memory
    consumption offsetting the gains.

    This patch identifies when an architecture is punching unexpected holes
    in the memmap that the memory model cannot automatically detect and sets
    ARCH_HAS_HOLES_MEMORYMODEL. At the moment, this is restricted to EP93xx,
    which is the sub-architecture this has been reported on, but the list may
    expand later. When set, walkers of the full memmap must call
    memmap_valid_within() for each PFN, passing in what they expect the page
    and zone to be for that PFN. If the linkages turn out to be broken, the
    memmap is assumed to be invalid for that PFN.

    Signed-off-by: Mel Gorman
    Signed-off-by: Russell King

    Mel Gorman
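
    A hedged kernel-context sketch (not a standalone program; the loop
    bounds are illustrative) of how a full-memmap walker is expected to use
    the new check:

        /* struct zone *zone is the zone being walked. */
        unsigned long pfn, end = zone->zone_start_pfn + zone->spanned_pages;

        for (pfn = zone->zone_start_pfn; pfn < end; pfn++) {
                struct page *page;

                if (!pfn_valid(pfn))
                        continue;
                page = pfn_to_page(pfn);

                /* Catch holes the memory model cannot see: the page must
                   still link back to this PFN and this zone. */
                if (!memmap_valid_within(pfn, page, zone))
                        continue;

                /* ... examine the page ... */
        }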
     
  • The wb_kupdate() function has a bug in linux-2.6.30-rc5. This bug causes
    generic_sync_sb_inodes() to start writing inodes back much earlier than
    expected because oldest_jif is miscalculated in wb_kupdate().

    This bug was introduced in 704503d836042d4a4c7685b7036e7de0418fbc0f
    ('mm: fix proc_dointvec_userhz_jiffies "breakage"').

    Signed-off-by: Toshiyuki Okajima
    Cc: Alexey Dobriyan
    Cc: Peter Zijlstra
    Cc: Nick Piggin
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshiyuki Okajima
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    mm: SLOB fix reclaim_state
    mm: SLUB fix reclaim_state
    slub: add Documentation/ABI/testing/sysfs-kernel-slab
    slub: enforce MAX_ORDER

    Linus Torvalds
     

15 May, 2009

2 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    Revert "mm: add /proc controls for pdflush threads"
    viocd: needs to depend on BLOCK
    block: fix the bio_vec array index out-of-bounds test

    Linus Torvalds
     
  • This reverts commit fafd688e4c0c34da0f3de909881117d374e4c7af.

    Work is progressing to switch away from pdflush as the process backing
    for flushing out dirty data. So it seems pointless to add more knobs
    to control pdflush threads. The original author of the patch did not
    have any specific use cases for adding the knobs, so we can easily
    revert this before 2.6.30 to avoid having to maintain this API
    forever.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

13 May, 2009

1 commit


08 May, 2009

1 commit


07 May, 2009

5 commits

  • NOMMU mmap() has an option controlled by a sysctl variable that determines
    whether the allocations made by do_mmap_private() should have the excess
    space trimmed off and returned to the allocator. Make the initial setting
    of this variable a Kconfig configuration option.

    The reason there can be excess space is that the allocator only allocates
    in power-of-2 size chunks, but mmap() requests can be made in sizes that
    aren't a power of 2.

    There are two alternatives:

    (1) Keep the excess as dead space. The dead space then remains unused for the
    lifetime of the mapping. Mappings of shared objects such as libc, ld.so
    or busybox's text segment may retain their dead space forever.

    (2) Return the excess to the allocator. This means that the dead space is
    limited to less than a page per mapping, but it means that for a transient
    process, there's more chance of fragmentation as the excess space may be
    reused fairly quickly.

    During the boot process, a lot of transient processes are created, and
    this can cause a lot of fragmentation as the pagecache and various slabs
    grow greatly during this time.

    By turning off the trimming of excess space during boot and disabling
    batching of frees, Coldfire can manage to boot.

    A better way of doing things might be to have /sbin/init turn this option
    off. By that point libc, ld.so and init - which are all long-duration
    processes - have all been loaded and trimmed.

    Reported-by: Lanttor Guo
    Signed-off-by: David Howells
    Tested-by: Lanttor Guo
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
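
    The excess being talked about is just the gap between the requested
    length and the next power of 2; a quick user-space illustration (the
    100kB figure is an arbitrary example):

        #include <stdio.h>

        /* Round len up to the next power of 2 (assumes 0 < len < 2^31). */
        static unsigned long next_pow2(unsigned long len)
        {
                unsigned long p = 1;

                while (p < len)
                        p <<= 1;
                return p;
        }

        int main(void)
        {
                unsigned long len = 100 * 1024;   /* a 100kB private mapping */
                unsigned long alloc = next_pow2(len);

                printf("request %lukB -> allocation %lukB, excess %lukB\n",
                       len >> 10, alloc >> 10, (alloc - len) >> 10);
                return 0;
        }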
     
  • Clamp zone_batchsize() to 0 under NOMMU conditions to stop
    free_hot_cold_page() from queueing and batching frees.

    The problem is that under NOMMU conditions it is really important to be
    able to allocate large contiguous chunks of memory, but when munmap() or
    exit_mmap() releases big stretches of memory, return of these to the buddy
    allocator can be deferred, and when it does finally happen, it can be in
    small chunks.

    Whilst the fragmentation this incurs isn't so much of a problem under MMU
    conditions as userspace VM is glued together from individual pages with
    the aid of the MMU, it is a real problem if there isn't an MMU.

    By clamping the page freeing queue size to 0, pages are returned to the
    allocator immediately, and the buddy detector is more likely to be able to
    glue them together into large chunks immediately, and fragmentation is
    less likely to occur.

    By disabling batching of frees, and by turning off the trimming of excess
    space during boot, Coldfire can manage to boot.

    Reported-by: Lanttor Guo
    Signed-off-by: David Howells
    Tested-by: Lanttor Guo
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Use rounddown_pow_of_two(N) in zone_batchsize() rather than (1 <<
    (fls(N)-1)), as they are equivalent, and with the former it is easier to
    see what is going on.

    Signed-off-by: David Howells
    Tested-by: Lanttor Guo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
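
    A quick user-space check of the claimed equivalence; fls_local() and
    highest_pow2_le() are local stand-ins for the kernel helpers, not the
    kernel code itself:

        #include <stdio.h>

        /* 1-based index of the most significant set bit, like fls(). */
        static unsigned int fls_local(unsigned int x)
        {
                unsigned int r = 0;

                while (x) {
                        r++;
                        x >>= 1;
                }
                return r;
        }

        /* Independent reference: largest power of 2 that is <= n. */
        static unsigned int highest_pow2_le(unsigned int n)
        {
                unsigned int p = 1;

                while (p <= n / 2)
                        p *= 2;
                return p;
        }

        int main(void)
        {
                unsigned int n;

                for (n = 1; n <= 1024; n++)
                        if ((1U << (fls_local(n) - 1)) != highest_pow2_le(n))
                                printf("mismatch at %u\n", n);

                printf("both give %u for n = 1000\n", highest_pow2_le(1000));
                return 0;
        }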
     
  • If alloc_vmap_area() fails the allocated struct vmap_area has to be freed.

    Signed-off-by: Ralph Wuerthner
    Reviewed-by: Christoph Lameter
    Reviewed-by: Minchan Kim
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralph Wuerthner
     
  • When /proc/sys/vm/oom_kill_allocating_task is set for large systems that
    want to avoid the lengthy tasklist scan, it's possible to livelock if
    current is ineligible for oom kill. This normally happens when it is set
    to OOM_DISABLE, but is also possible if any threads are sharing the same
    ->mm with a different tgid.

    So change __out_of_memory() to fall back to the full task-list scan if it
    was unable to kill `current'.

    Cc: Nick Piggin
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

06 May, 2009

4 commits


03 May, 2009

6 commits

  • Local variable `scan' can overflow on zones which are larger than

    (2G * 4k) / 100 = 80GB.

    Making it 64-bit on 64-bit platforms will fix that up.

    Cc: KOSAKI Motohiro
    Cc: Wu Fengguang
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
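
    A back-of-the-envelope check of the quoted figure, assuming 4kB pages
    and a 32-bit quantity that saturates at 2G when multiplied by a percent
    value of up to 100:

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                uint64_t page_size = 4096;
                uint64_t limit = 1ULL << 31;    /* "2G" */
                uint64_t max_zone_bytes = limit / 100 * page_size;

                /* Prints roughly 81 GB, i.e. the ~80GB quoted above. */
                printf("zone size at which `scan' overflows: ~%llu GB\n",
                       (unsigned long long)(max_zone_bytes >> 30));
                return 0;
        }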
     
  • The Committed_AS field can underflow in certain situations:

    > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
    > 1 Committed_AS: 18446744073709323392 kB
    > 11 Committed_AS: 18446744073709455488 kB
    > 6 Committed_AS: 35136 kB
    > 5 Committed_AS: 18446744073709454400 kB
    > 7 Committed_AS: 35904 kB
    > 3 Committed_AS: 18446744073709453248 kB
    > 2 Committed_AS: 34752 kB
    > 9 Committed_AS: 18446744073709453248 kB
    > 8 Committed_AS: 34752 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 7 Committed_AS: 18446744073709454080 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 5 Committed_AS: 18446744073709454080 kB
    > 6 Committed_AS: 18446744073709320960 kB

    This happens because NR_CPUS can be greater than 1000 and
    meminfo_proc_show() does not check for underflow.

    But a calculation proportional to NR_CPUS isn't a good one. In general,
    the likelihood of lock contention is proportional to the number of
    online CPUs, not to the theoretical maximum (NR_CPUS).

    The current kernel has generic percpu-counter infrastructure; using it
    is the right way. It simplifies the code, and
    percpu_counter_read_positive() does not have the underflow issue.

    Reported-by: Dave Hansen
    Signed-off-by: KOSAKI Motohiro
    Cc: Eric B Munson
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: [All kernel versions]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
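
    A hedged user-space analogue of the symptom and of what
    percpu_counter_read_positive() guards against (the numbers are made up;
    this is not the kernel code): a transiently negative estimate printed as
    an unsigned kB value produces the huge figures above, while clamping the
    read at zero does not.

        #include <stdio.h>

        /* percpu_counter_read_positive()-style clamp. */
        static long read_positive(long sum)
        {
                return sum < 0 ? 0 : sum;
        }

        int main(void)
        {
                long committed = -57;   /* transiently negative, in 4kB pages */

                /* pages -> kB; on a 64-bit box this prints 18446744073709... */
                printf("raw:     Committed_AS: %lu kB\n",
                       (unsigned long)committed << 2);
                printf("clamped: Committed_AS: %lu kB\n",
                       (unsigned long)read_positive(committed) << 2);
                return 0;
        }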
     
  • Current mem_cgroup_shrink_usage() has two problems.

    1. It doesn't call mem_cgroup_out_of_memory and doesn't update
    last_oom_jiffies, so pagefault_out_of_memory invokes global OOM.

    2. Considering hierarchy, shrinking has to be done from the
    mem_over_limit, not from the memcg which the page would be charged to.

    mem_cgroup_try_charge_swapin() does all of these things properly, so we
    use it and call cancel_charge_swapin() when it succeeds.

    The name of "shrink_usage" is not appropriate for this behavior, so we
    change it too.

    Signed-off-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Dhaval Giani
    Cc: Daisuke Nishimura
    Cc: YAMAMOTO Takashi
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • Change page_mkwrite to allow implementations to return with the page
    locked, and also change its callers (in page fault paths) to hold the
    lock until the page is marked dirty. This allows the filesystem to have
    full control of page dirtying events coming from the VM.

    Rather than simply hold the page locked over the page_mkwrite call, we
    call page_mkwrite with the page unlocked and allow callers to return with
    it locked, so filesystems can avoid LOR conditions with page lock.

    The problem with the current scheme is this: a filesystem that wants to
    associate some metadata with a page as long as the page is dirty, will
    perform this manipulation in its ->page_mkwrite. It currently then must
    return with the page unlocked and may not hold any other locks (according
    to existing page_mkwrite convention).

    In this window, the VM could write out the page, clearing page-dirty. The
    filesystem has no good way to detect that a dirty pte is about to be
    attached, so it will happily write out the page, at which point, the
    filesystem may manipulate the metadata to reflect that the page is no
    longer dirty.

    It is not always possible to perform the required metadata manipulation in
    ->set_page_dirty, because that function cannot block or fail. The
    filesystem may need to allocate some data structure, for example.

    And the VM cannot mark the pte dirty before page_mkwrite, because
    page_mkwrite is allowed to fail, so we must not allow any window where the
    page could be written to if page_mkwrite does fail.

    This solution of holding the page locked over the 3 critical operations
    (page_mkwrite, setting the pte dirty, and finally setting the page dirty)
    closes out races nicely, preventing page cleaning for writeout being
    initiated in that window. This provides the filesystem with a strong
    synchronisation against the VM here.

    - Sage needs this race closed for ceph filesystem.
    - Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
    - I need it for fsblock.
    - I suspect other filesystems may need it too (eg. btrfs).
    - I have converted buffer.c to the new locking. Even simple block allocation
    under dirty pages might be susceptible to i_size changing under partial page
    at the end of file (we also have a buffer.c-side problem here, but it cannot
    be fixed properly without this patch).
    - Other filesystems (eg. NFS, maybe btrfs) will need to change their
    page_mkwrite functions themselves.

    [ This also moves page_mkwrite another step closer to fault, which should
    eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
    filesystem calldown and page lock/unlock cycle in __do_fault. ]

    [akpm@linux-foundation.org: fix derefs of NULL ->mapping]
    Cc: Sage Weil
    Cc: Trond Myklebust
    Signed-off-by: Nick Piggin
    Cc: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
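
    A hedged kernel-context sketch of what a filesystem ->page_mkwrite can
    look like under the new convention (the filesystem name and the metadata
    step are illustrative, not taken from this patch): the callback locks
    the page itself, may block, and returns VM_FAULT_LOCKED so the VM keeps
    the page locked until the pte and the page have been marked dirty.

        static int examplefs_page_mkwrite(struct vm_area_struct *vma,
                                          struct vm_fault *vmf)
        {
                struct page *page = vmf->page;

                lock_page(page);
                if (page->mapping != vma->vm_file->f_mapping) {
                        /* Raced with truncate: nothing to dirty here. */
                        unlock_page(page);
                        return VM_FAULT_NOPAGE;
                }

                /* Allocate/attach per-page dirty metadata; blocking is OK. */

                return VM_FAULT_LOCKED;  /* page stays locked for the caller */
        }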
     
  • This is a bugfix for commit 3c776e64660028236313f0e54f3a9945764422df
    ("memcg: charge swapcache to proper memcg").

    The "used" bit of a swapcache page is stable under the page lock, but,
    considering move_account, pc->mem_cgroup is not.

    We need lock_page_cgroup() anyway.

    Signed-off-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • By the time the memory cgroup code is notified about a swapin we
    already hold a reference on the fault page.

    If the cgroup callback fails, make sure to unlock AND release the page
    reference which was taken by lookup_swap_cache(), or we leak the reference.

    Signed-off-by: Johannes Weiner
    Cc: Balbir Singh
    Reviewed-by: Minchan Kim
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

23 Apr, 2009

1 commit

  • slub_max_order may not be equal to or greater than MAX_ORDER.

    Additionally, if a single object cannot be placed in a slab of
    slub_max_order, it still must allocate slabs below MAX_ORDER.

    Acked-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     

22 Apr, 2009

1 commit

  • Commit a6dc60f8975ad96d162915e07703a4439c80dcf0 ("vmscan: rename
    sc.may_swap to may_unmap") removed the may_swap flag, but memcg had used
    it as a flag for "do we need to use swap?", as the name indicates.

    And in the current implementation, memcg cannot reclaim mapped file
    caches when mem+swap hits the limit.

    Re-introduce the may_swap flag and handle it in get_scan_ratio(). This
    patch doesn't influence any scan_control users other than memcg.

    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Daisuke Nishimura
    Acked-by: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

19 Apr, 2009

1 commit

  • Commit d979677c4c0 ("mm: shrink_all_memory(): use sc.nr_reclaimed")
    broke the memory shrinking used by hibernation, because it did not update
    shrink_all_zones() in accordance with the other changes it made.

    Fix this by making shrink_all_zones() update sc->nr_reclaimed instead of
    overwriting its value.

    This fixes http://bugzilla.kernel.org/show_bug.cgi?id=13058

    Reported-and-tested-by: Alan Jenkins
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

17 Apr, 2009

1 commit

  • Tetsuo Handa reports seeing the WARN_ON(current->mm == NULL) in
    security_vm_enough_memory(), when do_execve() is touching the
    target mm's stack, to set up its args and environment.

    Yes, a UMH_NO_WAIT or UMH_WAIT_PROC call_usermodehelper() spawns
    an mm-less kernel thread to do the exec. And in any case, that
    vm_enough_memory check when growing stack ought to be done on the
    target mm, not on the execer's mm (though apart from the warning,
    it only makes a slight tweak to OVERCOMMIT_NEVER behaviour).

    Reported-by: Tetsuo Handa
    Signed-off-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

16 Apr, 2009

1 commit


14 Apr, 2009

6 commits

  • SHMEM_MAX_BYTES was derived from the maximum size of its triple-indirect
    swap vector, forgetting to take the MAX_LFS_FILESIZE limit into account.
    Never mind 256kB pages, even 8kB pages on 32-bit kernels allowed files to
    grow slightly bigger than that supposed maximum.

    Fix this by using the min of both (at build time not run time). And it
    happens that this calculation is good as far as 8MB pages on 32-bit or
    16MB pages on 64-bit: though SHMSWP_MAX_INDEX gets truncated before that,
    it's truncated to such large numbers that we don't need to care.

    [akpm@linux-foundation.org: it needs pagemap.h]
    [akpm@linux-foundation.org: fix sparc64 min() warnings]
    Signed-off-by: Hugh Dickins
    Cc: Yuri Tikhonov
    Cc: Paul Mackerras
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Fix a division by zero which we have in shmem_truncate_range() and
    shmem_unuse_inode() when using big PAGE_SIZE values (e.g. 256kB on
    ppc44x).

    With 256kB PAGE_SIZE, the ENTRIES_PER_PAGEPAGE constant becomes too large
    (0x1.0000.0000) on a 32-bit kernel, so this patch just changes its type
    from 'unsigned long' to 'unsigned long long'.

    Hugh: reverted its unsigned long longs in shmem_truncate_range() and
    shmem_getpage(): the pagecache index cannot be more than an unsigned long,
    so the divisions by zero occurred in unreached code. It's a pity we need
    any ULL arithmetic here, but I found no pretty way to avoid it.

    Signed-off-by: Yuri Tikhonov
    Signed-off-by: Hugh Dickins
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yuri Tikhonov
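
    The overflow is easy to check in user space (assuming, as the text says,
    a 32-bit kernel where the swap entries are 4-byte unsigned longs and
    PAGE_SIZE is 256kB):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                uint32_t page_size = 256 * 1024;
                uint32_t entries_per_page = page_size / 4;  /* 4-byte entries */
                uint32_t per_pagepage_32 = entries_per_page * entries_per_page;
                uint64_t per_pagepage_64 =
                        (uint64_t)entries_per_page * entries_per_page;

                printf("ENTRIES_PER_PAGEPAGE in 32 bits: %u (wrapped to zero)\n",
                       per_pagepage_32);
                printf("ENTRIES_PER_PAGEPAGE in 64 bits: %llu\n",
                       (unsigned long long)per_pagepage_64);
                return 0;
        }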
     
  • mm/memcontrol.c:318: warning: `mem_cgroup_is_obsolete' defined but not used

    [akpm@linux-foundation.org: simplify as suggested by Balbir]
    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • While better than get_user_pages(), the usage of get_user_pages_fast()
    (gupf), especially its return values and the fact that it can potentially
    only partially pin the range, warranted some documentation.

    Signed-off-by: Andy Grover
    Cc: Ingo Molnar
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Grover
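
    A hedged kernel-context fragment (not a standalone program, and not
    taken from the documentation added here) of the calling pattern the note
    is about: the function can pin fewer pages than requested, so the caller
    checks the count and drops any partial pin.

        /* start, nr_pages and pages[] are set up by the surrounding code. */
        int nr = get_user_pages_fast(start, nr_pages, 1 /* write */, pages);

        if (nr < 0)
                return nr;                      /* nothing was pinned */
        if (nr < nr_pages) {
                while (nr--)
                        put_page(pages[nr]);    /* drop the partial pin */
                return -EFAULT;                 /* or fall back / retry */
        }
        /* all nr_pages pages are pinned; put_page() each one when done */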
     
  • Point the UNEVICTABLE_LRU config option at the documentation describing
    the option.

    Signed-off-by: David Howells
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Fix filemap.c kernel-doc warnings:

    Warning(mm/filemap.c:575): No description found for parameter 'page'
    Warning(mm/filemap.c:575): No description found for parameter 'waiter'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

07 Apr, 2009

1 commit

  • Add /proc entries to give the admin the ability to control the minimum and
    maximum number of pdflush threads. This allows finer control of pdflush
    on both large and small machines.

    The rationale is simply that one size does not fit all. Admins on large and/or
    small systems may want to tune the min/max pdflush thread count to best
    suit their needs. Right now the min/max is hardcoded to 2/8. While
    probably a fair estimate for smaller machines, large machines with large
    numbers of CPUs and large numbers of filesystems/block devices may benefit
    from larger numbers of threads working on different block devices.

    Even if the background flushing algorithm is radically changed, it is
    still likely that multiple threads will be involved and admins would still
    desire finer control on the min/max other than to have to recompile the
    kernel.

    The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
    '/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.

    The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
    the current value of nr_pdflush_threads_max. This minimum is required
    since additional thread creation is performed in a pdflush thread itself.

    The minimum value for nr_pdflush_threads_max is the current value of
    nr_pdflush_threads_min and the maximum value can be 1000.

    Documentation/sysctl/vm.txt is also updated.

    [akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
    Signed-off-by: Peter W Morreale
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter W Morreale
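
    A hedged example of driving the new knobs from a small C helper rather
    than from a shell (requires root and a kernel carrying this patch; note
    that the 15 May entry above records the knobs being reverted):

        #include <stdio.h>

        static int write_sysctl(const char *path, const char *val)
        {
                FILE *f = fopen(path, "w");

                if (!f) {
                        perror(path);
                        return -1;
                }
                fprintf(f, "%s\n", val);
                return fclose(f);
        }

        int main(void)
        {
                /* Raise the floor to 4 threads first (must stay <= the
                   current max), then lift the ceiling to 16. */
                if (write_sysctl("/proc/sys/vm/nr_pdflush_threads_min", "4"))
                        return 1;
                if (write_sysctl("/proc/sys/vm/nr_pdflush_threads_max", "16"))
                        return 1;
                return 0;
        }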