20 Nov, 2008

7 commits

  • Fix the old comment on the scan ratio calculations.

    Signed-off-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • In the past, GFP_NOFS (but of course not GFP_NOIO) was allowed to reclaim
    by writing to swap. That got partially broken in 2.6.23, when may_enter_fs
    initialization was moved up before the allocation of swap, so its
    PageSwapCache test was failing the first time around.

    Fix it by setting may_enter_fs when add_to_swap() succeeds with
    __GFP_IO. In fact, check __GFP_IO before calling add_to_swap():
    allocating swap we're not ready to use just increases disk seeking.

    Signed-off-by: Hugh Dickins
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
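As a rough illustration of the ordering described above (this is a user-space toy model, not the kernel's actual shrink_page_list() code; every toy_* name and the flag values are invented for the example), the swap allocation is gated on __GFP_IO, and may_enter_fs is only raised once add_to_swap() has succeeded:

```c
#include <assert.h>
#include <stdbool.h>

#define GFP_IO (1 << 0)
#define GFP_FS (1 << 1)

/* Toy model of the ordering fix: only allocate swap when __GFP_IO is
 * set, and only set may_enter_fs after add_to_swap() has succeeded,
 * so the PageSwapCache-style test sees the right state first time. */
struct toy_page { bool swap_cache; };

static bool toy_add_to_swap(struct toy_page *p) {
    p->swap_cache = true;   /* pretend swap allocation always succeeds */
    return true;
}

/* Returns true if reclaim may enter the filesystem for this page. */
static bool toy_reclaim(struct toy_page *p, int gfp_mask) {
    bool may_enter_fs = (gfp_mask & GFP_FS) != 0;
    if (!p->swap_cache) {
        if (!(gfp_mask & GFP_IO))
            return false;          /* don't allocate swap we can't use */
        if (toy_add_to_swap(p))
            may_enter_fs = true;   /* swap writes only need __GFP_IO */
    }
    return may_enter_fs;
}
```

With GFP_NOFS|__GFP_IO the page still gets swap and can be written out; with GFP_NOIO no swap is allocated at all, avoiding the pointless disk seeking the commit mentions.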
     
  • Page migration's writeout() gets understandably confused by the nasty
    AOP_WRITEPAGE_ACTIVATE case: as with normal success, a writepage() error
    has unlocked the page, so writeout() then needs to relock it.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Currently, vmalloc restarts the search for a free area when it can't find
    one. The reason is that there are areas which are lazily freed, and could
    possibly be freed now. However, the current implementation starts
    searching the tree from the last failing address, which is pretty much by
    definition at the end of the address space. So, we fail.

    The proposal of this patch is to restart the search from the beginning of
    the requested vstart address. This fixes the regression in running KVM
    virtual machines for me, described in http://lkml.org/lkml/2008/10/28/349,
    caused by commit db64fe02258f1507e13fe5212a989922323685ce.

    Signed-off-by: Glauber Costa
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
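A minimal user-space sketch of the retry logic described above (purely illustrative: the real allocator works on an rbtree of vmap areas, and every toy_* name here is invented). The point is only that the second pass restarts from vstart after purging, rather than from the address where the first pass failed:

```c
#include <assert.h>
#include <stdbool.h>

#define SPACE 16

/* Toy address space: 1 = busy, 2 = lazily freed (reclaimable), 0 = free. */
static int toy_find(const char *map, int from, int size) {
    for (int i = from; i + size <= SPACE; i++) {
        int j;
        for (j = 0; j < size && map[i + j] == 0; j++)
            ;
        if (j == size)
            return i;   /* found a hole of the requested size */
    }
    return -1;          /* no hole found */
}

static void toy_purge(char *map) {
    for (int i = 0; i < SPACE; i++)
        if (map[i] == 2)
            map[i] = 0;   /* lazily freed areas become usable */
}

/* The fix: after purging, restart the search from vstart, not from the
 * address where the first pass failed (which is near the end of the
 * space, so a restart from there would fail again). */
static int toy_alloc(char *map, int vstart, int size) {
    int addr = toy_find(map, vstart, size);
    if (addr < 0) {
        toy_purge(map);
        addr = toy_find(map, vstart, size);   /* restart at vstart */
    }
    return addr;
}
```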
     
  • An initial vmalloc failure should start off a synchronous flush of lazy
    areas, in case someone else is already in the middle of flushing them,
    which could cause us to return an allocation failure even if there is
    plenty of KVA free.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fix an off-by-one bug in the KVA allocator that can leave gaps in the
    address space.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • After adding a node into the machine, top cpuset's mems isn't updated.

    By reviewing the code, we found that the update function

    cpuset_track_online_nodes()

    was invoked after node_states[N_ONLINE] changed. That is wrong, because
    N_ONLINE only means the node has a pgdat; when a node has (or gains)
    memory, we use N_HIGH_MEMORY. So we should invoke the update function
    after node_states[N_HIGH_MEMORY] changes, just as its introducing commit
    says.

    This patch fixes it, and uses a memory hotplug notifier instead of
    calling cpuset_track_online_nodes() directly.

    Signed-off-by: Miao Xie
    Acked-by: Yasunori Goto
    Cc: David Rientjes
    Cc: Paul Menage
    Signed-off-by: Linus Torvalds

    Miao Xie
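The notifier-based shape of the fix can be sketched in user space (a toy model under invented names; the real kernel uses register_memory_notifier() and node_states[], none of which appear here literally): the cpuset mask is refreshed from a callback that fires when memory comes online, not when a pgdat merely appears.

```c
#include <assert.h>

enum toy_event { TOY_MEM_ONLINE, TOY_MEM_OFFLINE };

/* Toy model: the cpuset update runs from a memory-hotplug notifier
 * callback, so it tracks nodes that actually have memory. */
static unsigned int toy_nodes_with_memory;   /* like node_states[N_HIGH_MEMORY] */
static unsigned int toy_top_cpuset_mems;     /* top cpuset's mems mask */

static void toy_cpuset_track_online_nodes(void) {
    toy_top_cpuset_mems = toy_nodes_with_memory;
}

typedef void (*toy_notifier_fn)(enum toy_event, int node);

static toy_notifier_fn toy_chain[4];
static int toy_chain_len;

static void toy_register_memory_notifier(toy_notifier_fn fn) {
    toy_chain[toy_chain_len++] = fn;
}

/* Fired when a node's memory goes online/offline; walks the chain. */
static void toy_memory_notify(enum toy_event ev, int node) {
    if (ev == TOY_MEM_ONLINE)
        toy_nodes_with_memory |= 1u << node;
    else
        toy_nodes_with_memory &= ~(1u << node);
    for (int i = 0; i < toy_chain_len; i++)
        toy_chain[i](ev, node);
}

static void toy_cpuset_callback(enum toy_event ev, int node) {
    (void)ev; (void)node;
    toy_cpuset_track_online_nodes();   /* refresh on every memory event */
}
```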
     

17 Nov, 2008

1 commit

  • Fix an uninitialized return value when compiling on parisc (with CONFIG_UNEVICTABLE_LRU=y):
    mm/mlock.c: In function `__mlock_vma_pages_range':
    mm/mlock.c:165: warning: `ret' might be used uninitialized in this function

    Signed-off-by: Helge Deller
    [ It isn't ever really used uninitialized, since no caller should ever
    call this function with an empty range. But the compiler is correct
    that from a local analysis standpoint that is impossible to see, and
    fixing the warning is appropriate. ]
    Signed-off-by: Linus Torvalds

    Helge Deller
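The shape of the warning and its fix is easy to see in miniature (a toy stand-in, not the real __mlock_vma_pages_range()): if the loop body never runs because the range is empty, `ret` would otherwise be returned uninitialized.

```c
#include <assert.h>

/* Toy version of the warning: with an empty range the loop body never
 * executes, so without the explicit initialization 'ret' would be
 * returned uninitialized.  Initializing it to 0 silences the compiler
 * and is harmless for all real callers. */
static int toy_mlock_pages_range(unsigned long start, unsigned long end) {
    int ret = 0;   /* the fix: explicit initialization */
    for (unsigned long addr = start; addr < end; addr += 4096)
        ret = 1;   /* stand-in for the per-page work */
    return ret;
}
```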
     

16 Nov, 2008

1 commit

  • Hugh Dickins reported that show_page_path() is buggy and unsafe because
    it:

    - lacks a dput() to match d_find_alias()
    - doesn't handle vma->vm_mm->owner == NULL
    - lacks lock_page()

    It was only for debugging, so rather than trying to fix it, just remove
    it now.

    Reported-by: Hugh Dickins
    Signed-off-by: Hugh Dickins
    Signed-off-by: KOSAKI Motohiro
    CC: Lee Schermerhorn
    CC: Rik van Riel
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

13 Nov, 2008

5 commits

  • The start pfn calculation in page_cgroup's memory hotplug notifier chain
    is wrong.

    Tested-by: Badari Pulavarty
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • lockdep emits the warning below at boot time on one of my test machines:
    schedule_on_each_cpu() shouldn't be called while the task holds
    mmap_sem.

    Actually, lru_add_drain_all() exists to prevent unevictable pages from
    staying on a reclaimable LRU list. But the current unevictable code can
    rescue unevictable pages even if they stay on a reclaimable list, so
    removing the call is better.

    In addition, this patch adds lru_add_drain_all() to sys_mlock() and
    sys_mlockall(). It isn't strictly required, but it reduces failures to
    move pages to the unevictable list. Such failures can be rescued by
    vmscan later, but reducing them is still better.

    Note: if the rescuing described above happens, the Mlocked and
    Unevictable fields in /proc/meminfo will mismatch, but this doesn't
    cause any real trouble.

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.28-rc2-mm1 #2
    -------------------------------------------------------
    lvm/1103 is trying to acquire lock:
    (&cpu_hotplug.lock){--..}, at: [] get_online_cpus+0x29/0x50

    but task is already holding lock:
    (&mm->mmap_sem){----}, at: [] sys_mlockall+0x4e/0xb0

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #3 (&mm->mmap_sem){----}:
    [] check_noncircular+0x82/0x110
    [] might_fault+0x4a/0xa0
    [] validate_chain+0xb11/0x1070
    [] might_fault+0x4a/0xa0
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0 (*) grab mmap_sem
    [] might_fault+0x4a/0xa0
    [] might_fault+0x7b/0xa0
    [] might_fault+0x4a/0xa0
    [] copy_to_user+0x30/0x60
    [] filldir+0x7c/0xd0
    [] sysfs_readdir+0x11a/0x1f0 (*) grab sysfs_mutex
    [] filldir+0x0/0xd0
    [] filldir+0x0/0xd0
    [] vfs_readdir+0x86/0xa0 (*) grab i_mutex
    [] sys_getdents+0x6b/0xc0
    [] syscall_call+0x7/0xb
    [] 0xffffffff

    -> #2 (sysfs_mutex){--..}:
    [] check_noncircular+0x82/0x110
    [] sysfs_addrm_start+0x2c/0xc0
    [] validate_chain+0xb11/0x1070
    [] sysfs_addrm_start+0x2c/0xc0
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0 (*) grab sysfs_mutex
    [] sysfs_addrm_start+0x2c/0xc0
    [] mutex_lock_nested+0xa5/0x2f0
    [] sysfs_addrm_start+0x2c/0xc0
    [] sysfs_addrm_start+0x2c/0xc0
    [] sysfs_addrm_start+0x2c/0xc0
    [] create_dir+0x3f/0x90
    [] sysfs_create_dir+0x29/0x50
    [] _spin_unlock+0x25/0x40
    [] kobject_add_internal+0xcd/0x1a0
    [] kobject_set_name_vargs+0x3a/0x50
    [] kobject_init_and_add+0x2d/0x40
    [] sysfs_slab_add+0xd2/0x180
    [] sysfs_add_func+0x0/0x70
    [] sysfs_add_func+0x5c/0x70 (*) grab slub_lock
    [] run_workqueue+0x172/0x200
    [] run_workqueue+0x10f/0x200
    [] worker_thread+0x0/0xf0
    [] worker_thread+0x9c/0xf0
    [] autoremove_wake_function+0x0/0x50
    [] worker_thread+0x0/0xf0
    [] kthread+0x42/0x70
    [] kthread+0x0/0x70
    [] kernel_thread_helper+0x7/0x1c
    [] 0xffffffff

    -> #1 (slub_lock){----}:
    [] check_noncircular+0xd/0x110
    [] slab_cpuup_callback+0x11f/0x1d0
    [] validate_chain+0xb11/0x1070
    [] slab_cpuup_callback+0x11f/0x1d0
    [] mark_lock+0x35d/0xd00
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0
    [] slab_cpuup_callback+0x11f/0x1d0
    [] down_read+0x43/0x80
    [] slab_cpuup_callback+0x11f/0x1d0 (*) grab slub_lock
    [] slab_cpuup_callback+0x11f/0x1d0
    [] notifier_call_chain+0x3c/0x70
    [] _cpu_up+0x84/0x110
    [] cpu_up+0x4b/0x70 (*) grab cpu_hotplug.lock
    [] kernel_init+0x0/0x170
    [] kernel_init+0xb5/0x170
    [] kernel_init+0x0/0x170
    [] kernel_thread_helper+0x7/0x1c
    [] 0xffffffff

    -> #0 (&cpu_hotplug.lock){--..}:
    [] validate_chain+0x5af/0x1070
    [] dev_status+0x0/0x50
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0
    [] get_online_cpus+0x29/0x50
    [] mutex_lock_nested+0xa5/0x2f0
    [] get_online_cpus+0x29/0x50
    [] get_online_cpus+0x29/0x50
    [] lru_add_drain_per_cpu+0x0/0x10
    [] get_online_cpus+0x29/0x50 (*) grab cpu_hotplug.lock
    [] schedule_on_each_cpu+0x32/0xe0
    [] __mlock_vma_pages_range+0x85/0x2c0
    [] __lock_acquire+0x285/0xa10
    [] vma_merge+0xa9/0x1d0
    [] mlock_fixup+0x180/0x200
    [] do_mlockall+0x78/0x90 (*) grab mmap_sem
    [] sys_mlockall+0x81/0xb0
    [] syscall_call+0x7/0xb
    [] 0xffffffff

    other info that might help us debug this:

    1 lock held by lvm/1103:
    #0: (&mm->mmap_sem){----}, at: [] sys_mlockall+0x4e/0xb0

    stack backtrace:
    Pid: 1103, comm: lvm Not tainted 2.6.28-rc2-mm1 #2
    Call Trace:
    [] print_circular_bug_tail+0x7c/0xd0
    [] validate_chain+0x5af/0x1070
    [] dev_status+0x0/0x50
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0
    [] get_online_cpus+0x29/0x50
    [] mutex_lock_nested+0xa5/0x2f0
    [] get_online_cpus+0x29/0x50
    [] get_online_cpus+0x29/0x50
    [] lru_add_drain_per_cpu+0x0/0x10
    [] get_online_cpus+0x29/0x50
    [] schedule_on_each_cpu+0x32/0xe0
    [] __mlock_vma_pages_range+0x85/0x2c0
    [] __lock_acquire+0x285/0xa10
    [] vma_merge+0xa9/0x1d0
    [] mlock_fixup+0x180/0x200
    [] do_mlockall+0x78/0x90
    [] sys_mlockall+0x81/0xb0
    [] syscall_call+0x7/0xb

    Signed-off-by: KOSAKI Motohiro
    Tested-by: Kamalesh Babulal
    Cc: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: Heiko Carstens
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • If all allowable memory is unreclaimable, it is possible to loop forever
    in the page allocator for ~__GFP_NORETRY allocations (i.e. allocations
    without __GFP_NORETRY).

    During this time, it is also possible for a task's cpuset to expand its
    set of allowable nodes so that it now includes free memory. The cached
    copy of this set, current->mems_allowed, is stale, however, since there
    has not been a subsequent call to cpuset_update_task_memory_state().

    The cached copy of the set of allowable nodes is now updated in the page
    allocator's slow path so the additional memory is available to
    get_page_from_freelist().

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: David Rientjes
    Cc: Paul Menage
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
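The refresh-and-retry shape of the fix can be modelled in a few lines (a toy sketch with invented names and bitmask "nodemasks"; the real code calls cpuset_update_task_memory_state() in the slow path of __alloc_pages_internal()):

```c
#include <assert.h>

/* Toy model: the allocator consults a cached copy of the task's
 * allowed-node mask; when the fast path fails, the slow path refreshes
 * the cached copy from the (possibly expanded) cpuset and retries. */
static unsigned int toy_cpuset_mems = 0x1;   /* authoritative cpuset mask */
static unsigned int toy_cached_mems = 0x1;   /* like current->mems_allowed */
static unsigned int toy_free_nodes  = 0x2;   /* nodes with free pages */

static int toy_get_page_from_freelist(void) {
    return (toy_cached_mems & toy_free_nodes) != 0;
}

static int toy_alloc_pages(void) {
    if (toy_get_page_from_freelist())
        return 1;                        /* fast path succeeded */
    toy_cached_mems = toy_cpuset_mems;   /* the fix: refresh stale copy */
    return toy_get_page_from_freelist(); /* retry with fresh mask */
}
```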
     
  • Oops. Part of the hugetlb private reservation code was not fully
    converted to use hstates.

    When a huge page must be unmapped from VMAs due to a failed COW,
    HPAGE_SIZE is used in the call to unmap_hugepage_range() regardless of
    the page size being used. This works if the VMA is using the default
    huge page size. Otherwise we might unmap too much, too little, or
    trigger a BUG_ON. Rare but serious -- fix it.

    Signed-off-by: Adam Litke
    Cc: Jon Tollefson
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
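The essence of the conversion can be shown in miniature (toy structures with invented names, not the real hstate API): the length to unmap must come from the VMA's per-hstate page size rather than the compile-time constant.

```c
#include <assert.h>

struct toy_hstate { unsigned long page_size; };
struct toy_vma { unsigned long vm_start; struct toy_hstate *hstate; };

#define TOY_HPAGE_SIZE (2UL << 20)   /* compile-time default: 2MB */

/* Toy model of the bug: on a failed COW, the range to unmap must be
 * derived from the VMA's hstate (as huge_page_size(h) does), not from
 * the compile-time HPAGE_SIZE constant; otherwise VMAs using a
 * non-default huge page size unmap too much or too little. */
static unsigned long toy_unmap_end(struct toy_vma *vma, unsigned long addr) {
    return addr + vma->hstate->page_size;   /* fixed: per-hstate size */
}
```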
     
  • The STACK_GROWSUP case of stack expansion was missing a test for 'prev',
    which got removed by commit cb8f488c33539f096580e202f5438a809195008f
    ("mmap.c: deinline a few functions") by mistake.

    I found my original email in my "sent" folder. The patch in that mail
    does NOT remove !prev; that change had been added by someone else.

    OK, I think we are not much interested in who did it; let's fix it
    for good.

    [ "It looks like this was caused by me fixing rejects. That was the
    fancy include-lots-of-context-so-it-wont-apply patch." - akpm ]

    Reported-and-bisected-by: Helge Deller
    Signed-off-by: Denys Vlasenko
    Cc: Andrew Morton
    Cc: Jiri Kosina
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
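The missing test has the classic NULL-guard shape, sketched here with toy structures (invented names; the real code lives in expand_stack() in mm/mmap.c): without the `prev &&` part, the NULL case dereferences a null pointer.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vma {
    unsigned long vm_start, vm_end;
    unsigned long vm_flags;
};

#define TOY_VM_GROWSUP 0x1

/* Toy model of the restored test: before expanding a grows-up stack,
 * check that the *previous* VMA exists, grows up, and abuts the fault
 * address.  Dropping the 'prev &&' guard is the bug being fixed. */
static int toy_expand_stack(struct toy_vma *prev, unsigned long addr) {
    if (prev && (prev->vm_flags & TOY_VM_GROWSUP) && prev->vm_end == addr)
        return 1;   /* expand the previous VMA upwards */
    return 0;       /* fall back to other expansion handling */
}
```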
     

07 Nov, 2008

10 commits

  • Xen can end up calling vm_unmap_aliases() before vmalloc_init() has
    been called. In this case it's safe to make it a simple no-op.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Linux Memory Management List
    Cc: Nick Piggin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
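The no-op guard is a one-flag pattern, modelled here in user space (invented toy_* names; the kernel's actual flag is a static bool set at the end of vmalloc_init()):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the guard: until init has run there is nothing to
 * flush, so an early call is a safe no-op instead of touching
 * uninitialized data structures. */
static bool toy_vmap_initialized;
static int toy_flush_count;

static void toy_vmalloc_init(void) { toy_vmap_initialized = true; }

static void toy_vm_unmap_aliases(void) {
    if (!toy_vmap_initialized)
        return;          /* called too early (e.g. from Xen): no-op */
    toy_flush_count++;   /* stand-in for the real flush work */
}
```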
     
  • * master.kernel.org:/home/rmk/linux-2.6-arm:
    [ARM] xsc3: fix xsc3_l2_inv_range
    [ARM] mm: fix page table initialization
    [ARM] fix naming of MODULE_START / MODULE_END
    ARM: OMAP: Fix define for twl4030 irqs
    ARM: OMAP: Fix get_irqnr_and_base to clear spurious interrupt bits
    ARM: OMAP: Fix debugfs_create_*'s error checking method for arm/plat-omap
    ARM: OMAP: Fix compiler warnings in gpmc.c
    [ARM] fix VFP+softfloat binaries

    Linus Torvalds
     
  • My last bugfix here (adding zone->lock) introduced a new problem: using
    page_zone(pfn_to_page(pfn)) to get the zone after the for() loop is wrong.
    pfn will then be >= end_pfn, which may be in a different zone or not
    present at all. This may lead to an addressing exception in page_zone()
    or spin_lock_irqsave().

    Now I use __first_valid_page() again after the loop to find a valid page
    for page_zone().

    Signed-off-by: Gerald Schaefer
    Acked-by: Nathan Fontenot
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
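The pitfall generalizes to any loop over a pfn range (sketched below as a toy scan over an array with holes; invented names, not the real memory-hotplug code): after the loop the index equals end and may point outside the range, so the zone must come from the first valid element found by scanning, not from the loop variable.

```c
#include <assert.h>

/* Toy model: -1 marks a hole (no page present).  Deriving the "zone"
 * from the element at the loop's final index could read past the
 * range; instead, rescan for the first valid element, like the real
 * __first_valid_page() helper. */
static int toy_zones[8] = { 0, 0, -1, 1, 1, 1, -1, -1 };

static int toy_first_valid_zone(int start, int end) {
    for (int pfn = start; pfn < end; pfn++)
        if (toy_zones[pfn] != -1)
            return toy_zones[pfn];   /* first present page's zone */
    return -1;                       /* no valid page in the range */
}
```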
     
  • Parameter @mem has been removed since v2.6.26; now delete its comment.

    Signed-off-by: Qinghuang Feng
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qinghuang Feng
     
  • It's insufficient to simply compare node ids when warning about offnode
    page_structs since it's possible to still have local affinity.

    Acked-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Move the migrate_prep outside the mmap_sem for the following system
    calls:

    1. sys_move_pages()
    2. sys_migrate_pages()
    3. sys_mbind()

    It really does not matter when we flush the lru. The system is free to
    add pages onto the lru even during migration which will make the page
    migration either skip the page (mbind, migrate_pages) or return a busy
    state (move_pages).

    Fixes this lockdep warning (and potential deadlock):

    Some VM place has
    mmap_sem -> kevent_wq via lru_add_drain_all()

    net/core/dev.c::dev_ioctl() has
    rtnl_lock -> mmap_sem (*) the ioctl has copy_from_user() and it can do page fault.

    linkwatch_event has
    kevent_wq -> rtnl_lock

    Signed-off-by: Christoph Lameter
    Cc: KOSAKI Motohiro
    Reported-by: Heiko Carstens
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • When /proc/sys/vm/oom_dump_tasks is enabled, it's only necessary to dump
    task state information for thread group leaders. The kernel log gets
    quickly overwhelmed on machines with a massive number of threads by
    dumping non-thread group leaders.

    Reviewed-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
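The filter has a simple shape (modelled here with a toy task struct; the real check compares against the task's group_leader pointer during the tasklist walk): non-leader threads are skipped so each thread group is reported once.

```c
#include <assert.h>

struct toy_task {
    int pid;
    struct toy_task *group_leader;
};

/* Toy model of the oom_dump_tasks change: count how many tasks get
 * dumped when only thread group leaders are reported. */
static int toy_dump_tasks(struct toy_task *tasks, int n) {
    int dumped = 0;
    for (int i = 0; i < n; i++) {
        if (&tasks[i] != tasks[i].group_leader)
            continue;   /* the fix: skip non-leader threads */
        dumped++;
    }
    return dumped;
}
```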
     
  • As we can determine exactly when a gigantic page is in use we can optimise
    the common regular page cases by pulling out gigantic page initialisation
    into its own function. As gigantic pages are never released to buddy we
    do not need a destructor. This effectively reverts the previous change to
    the main buddy allocator. It also adds a paranoid check to ensure we
    never release gigantic pages from hugetlbfs to the main buddy.

    Signed-off-by: Andy Whitcroft
    Cc: Jon Tollefson
    Cc: Mel Gorman
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: [2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • When working with hugepages, hugetlbfs assumes that those hugepages are
    smaller than MAX_ORDER. Specifically, it assumes that the mem_map is
    contiguous and uses that to optimise access to the elements of the mem_map
    that represent the hugepage. Gigantic pages (such as 16GB pages on
    powerpc) are by definition of greater order than MAX_ORDER (larger than
    MAX_ORDER_NR_PAGES in size). This means that we can no longer rely on
    the buddy allocator's guarantees for the contiguity of the mem_map, which
    only ensure that the mem_map is contiguous within maximally aligned
    areas of MAX_ORDER_NR_PAGES pages.

    This patch adds new mem_map accessors and iterator helpers which handle
    any discontiguity at MAX_ORDER_NR_PAGES boundaries. It then uses these to
    implement gigantic page versions of copy_huge_page and clear_huge_page,
    and to allow follow_hugetlb_page to handle gigantic pages.

    Signed-off-by: Andy Whitcroft
    Cc: Jon Tollefson
    Cc: Mel Gorman
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: [2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
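A boundary-safe accessor of this kind can be sketched with two toy "mem_map chunks" (invented names and a tiny block size; the real helper is mem_map_offset(), and real blocks are MAX_ORDER_NR_PAGES pages): within an aligned block plain pointer arithmetic is fine, but an offset that crosses a block boundary must go back through pfn_to_page().

```c
#include <assert.h>

#define TOY_MAX_ORDER_NR_PAGES 4

/* Toy model: the mem_map is only guaranteed contiguous within aligned
 * blocks of MAX_ORDER_NR_PAGES, here represented by two separate
 * arrays whose elements just hold their own pfn. */
static int toy_chunk0[TOY_MAX_ORDER_NR_PAGES] = { 0, 1, 2, 3 };
static int toy_chunk1[TOY_MAX_ORDER_NR_PAGES] = { 4, 5, 6, 7 };

static int *toy_pfn_to_page(int pfn) {
    int *chunk = (pfn < TOY_MAX_ORDER_NR_PAGES) ? toy_chunk0 : toy_chunk1;
    return &chunk[pfn % TOY_MAX_ORDER_NR_PAGES];
}

/* Like mem_map_offset(): page + n, but safe across block boundaries. */
static int *toy_mem_map_offset(int pfn, int n) {
    if ((pfn + n) / TOY_MAX_ORDER_NR_PAGES == pfn / TOY_MAX_ORDER_NR_PAGES)
        return toy_pfn_to_page(pfn) + n;   /* same block: pointer math */
    return toy_pfn_to_page(pfn + n);       /* crossed a boundary */
}
```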
     
  • As of 73bdf0a60e607f4b8ecc5aec597105976565a84f, the kernel needs
    to know where modules are located in the virtual address space.
    On ARM, we located this region between MODULE_START and MODULE_END.
    Unfortunately, everyone else calls it MODULES_VADDR and MODULES_END.
    Update ARM to use the same naming, so is_vmalloc_or_module_addr()
    can work properly. Also update the comment on mm/vmalloc.c to
    reflect that ARM also places modules in a separate region from the
    vmalloc space.

    Signed-off-by: Russell King

    Russell King
     

31 Oct, 2008

3 commits

  • Junjiro R. Okajima reported a problem where knfsd crashes if you are
    using it to export shmemfs objects and run strict overcommit. In this
    situation the current->mm based modifier to the overcommit goes through a
    NULL pointer.

    We could simply check for NULL and skip the modifier but we've caught
    other real bugs in the past from mm being NULL here - cases where we did
    need a valid mm set up (eg the exec bug about a year ago).

    To preserve the checks and get the logic we want, shuffle the checking
    around and add a new helper to the vm_ security wrappers.

    Also fix a current->mm reference in nommu that should use the passed mm.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix build]
    Reported-by: Junjiro R. Okajima
    Acked-by: James Morris
    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
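The resulting shape of the overcommit calculation can be sketched in a few lines (a toy model with an invented modifier and names; the real logic lives in __vm_enough_memory()): the per-mm modifier is kept, but only applied when a real mm is present, since kernel threads such as knfsd have mm == NULL.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mm { long total_vm; };

/* Toy model of the shuffle: keep the per-process modifier to the
 * overcommit limit, but apply it only when the caller actually has an
 * mm; a NULL mm (kernel thread) simply skips the modifier instead of
 * oopsing through a NULL pointer. */
static long toy_overcommit_allowed(long base, struct toy_mm *mm) {
    long allowed = base;
    if (mm)
        allowed -= mm->total_vm / 32;   /* invented per-process modifier */
    return allowed;
}
```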
     
  • Delete excess kernel-doc notation in mm/ subdirectory.
    Actually this is a kernel-doc notation fix.

    Warning(/var/linsrc/linux-2.6.27-git10//mm/vmalloc.c:902): Excess function parameter or struct member 'returns' description in 'vm_map_ram'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Nothing uses prepare_write or commit_write. Remove them from the tree
    completely.

    [akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
    Signed-off-by: Nick Piggin
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

24 Oct, 2008

1 commit

  • * 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: (35 commits)
    proc: remove fs/proc/proc_misc.c
    proc: move /proc/vmcore creation to fs/proc/vmcore.c
    proc: move pagecount stuff to fs/proc/page.c
    proc: move all /proc/kcore stuff to fs/proc/kcore.c
    proc: move /proc/schedstat boilerplate to kernel/sched_stats.h
    proc: move /proc/modules boilerplate to kernel/module.c
    proc: move /proc/diskstats boilerplate to block/genhd.c
    proc: move /proc/zoneinfo boilerplate to mm/vmstat.c
    proc: move /proc/vmstat boilerplate to mm/vmstat.c
    proc: move /proc/pagetypeinfo boilerplate to mm/vmstat.c
    proc: move /proc/buddyinfo boilerplate to mm/vmstat.c
    proc: move /proc/vmallocinfo to mm/vmalloc.c
    proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c
    proc: move /proc/slab_allocators boilerplate to mm/slab.c
    proc: move /proc/interrupts boilerplate code to fs/proc/interrupts.c
    proc: move /proc/stat to fs/proc/stat.c
    proc: move rest of /proc/partitions code to block/genhd.c
    proc: move /proc/cpuinfo code to fs/proc/cpuinfo.c
    proc: move /proc/devices code to fs/proc/devices.c
    proc: move rest of /proc/locks to fs/locks.c
    ...

    Linus Torvalds
     

23 Oct, 2008

10 commits


21 Oct, 2008

2 commits

  • Removed a duplicated #include in mm/vmalloc.c and a duplicated
    #include of "internal.h" in mm/memory.c.

    Signed-off-by: Huang Weiyi
    Signed-off-by: Linus Torvalds

    Huang Weiyi
     
  • We're trying to keep the !CONFIG_SHMEM tiny-shmem.c (using ramfs without
    swap) in sync with the CONFIG_SHMEM shmem.c (and mpm is preparing patches
    to combine them). I was glad to see EXPORT_SYMBOL_GPL(shmem_file_setup)
    go into shmem.c, but why not support DRM-GEM when !CONFIG_SHMEM too?
    Caution says to still depend on MMU, though, since !CONFIG_MMU is...
    different.

    Signed-off-by: Hugh Dickins
    Acked-by: Matt Mackall
    Acked-by: Dave Airlie
    Signed-off-by: Linus Torvalds

    Hugh Dickins