07 Nov, 2011

3 commits

  • In balance_dirty_pages() task_ratelimit may not be initialized
    (its initialization is skipped by the goto pause), and is then used when
    calling the tracing hook.

    Fix it by moving the task_ratelimit assignment before the goto pause.

    Reported-by: Witold Baryluk
    Signed-off-by: Wu Fengguang

    Wu Fengguang
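
    A minimal C illustration of the bug class being fixed, with hypothetical
    names (this is not the actual balance_dirty_pages() code): any value
    consumed after the pause label must be assigned before a goto that can
    jump there.

        static void throttle(unsigned long nr_dirty, unsigned long freerun)
        {
                unsigned long task_ratelimit = 0;       /* assigned before any goto ... */

                if (nr_dirty <= freerun)
                        goto pause;                     /* ... that skips the block below */

                task_ratelimit = compute_ratelimit(nr_dirty);  /* hypothetical helper */

        pause:
                trace_ratelimit(task_ratelimit);        /* hypothetical tracepoint: now
                                                           always sees a defined value */
        }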
     
  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of <linux/module.h>
    net: inet_timewait_sock doesnt need <linux/module.h>
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     
  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Add a 'reason' to wb_writeback_work
    writeback: send work item to queue_io, move_expired_inodes
    writeback: trace event balance_dirty_pages
    writeback: trace event bdi_dirty_ratelimit
    writeback: fix ppc compile warnings on do_div(long long, unsigned long)
    writeback: per-bdi background threshold
    writeback: dirty position control - bdi reserve area
    writeback: control dirty pause time
    writeback: limit max dirty pause time
    writeback: IO-less balance_dirty_pages()
    writeback: per task dirty rate limit
    writeback: stabilize bdi->dirty_ratelimit
    writeback: dirty rate control
    writeback: add bg_threshold parameter to __bdi_update_bandwidth()
    writeback: dirty position control
    writeback: account per-bdi accumulated dirtied pages

    Linus Torvalds
     

05 Nov, 2011

1 commit

  • * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
    block: don't call blk_drain_queue() if elevator is not up
    blk-throttle: use queue_is_locked() instead of lockdep_is_held()
    blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
    blk-throttle: Free up policy node associated with deleted rule
    block: warn if tag is greater than real_max_depth.
    block: make gendisk hold a reference to its queue
    blk-flush: move the queue kick into
    blk-flush: fix invalid BUG_ON in blk_insert_flush
    block: Remove the control of complete cpu from bio.
    block: fix a typo in the blk-cgroup.h file
    block: initialize the bounce pool if high memory may be added later
    block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
    block: drop @tsk from attempt_plug_merge() and explain sync rules
    block: make get_request[_wait]() fail if queue is dead
    block: reorganize throtl_get_tg() and blk_throtl_bio()
    block: reorganize queue draining
    block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
    block: pass around REQ_* flags instead of broken down booleans during request alloc/free
    block: move blk_throtl prototypes to block/blk.h
    block: fix genhd refcounting in blkio_policy_parse_and_set()
    ...

    Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
    and making the request functions be of type "void" instead of "int" in
    - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
    - drivers/staging/zram/zram_drv.c

    Linus Torvalds
     

03 Nov, 2011

10 commits

  • Says Andrew:

    "60 patches. That's good enough for -rc1 I guess. I have quite a lot
    of detritus to be rechecked, work through maintainers, etc.

    - most of the remains of MM
    - rtc
    - various misc
    - cgroups
    - memcg
    - cpusets
    - procfs
    - ipc
    - rapidio
    - sysctl
    - pps
    - w1
    - drivers/misc
    - aio"

    * akpm: (60 commits)
    memcg: replace ss->id_lock with a rwlock
    aio: allocate kiocbs in batches
    drivers/misc/vmw_balloon.c: fix typo in code comment
    drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
    w1: disable irqs in critical section
    drivers/w1/w1_int.c: multiple masters used same init_name
    drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
    drivers/power/ds2780_battery.c: add a nolock function to w1 interface
    drivers/power/ds2780_battery.c: create central point for calling w1 interface
    w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
    pps gpio client: add missing dependency
    pps: new client driver using GPIO
    pps: default echo function
    include/linux/dma-mapping.h: add dma_zalloc_coherent()
    sysctl: make CONFIG_SYSCTL_SYSCALL default to n
    sysctl: add support for poll()
    RapidIO: documentation update
    drivers/net/rionet.c: fix ethernet address macros for LE platforms
    RapidIO: fix potential null deref in rio_setup_device()
    RapidIO: add mport driver for Tsi721 bridge
    ...

    Linus Torvalds
     
  • warning: symbol 'swap_cgroup_ctrl' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Cc: Paul Menage
    Cc: Li Zefan
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Various code in memcontrol.c calls this_cpu_read() on two different percpu
    variables to combine them in a calculation, or does an open-coded
    read-modify-write on a single percpu variable.

    Disable preemption throughout these operations so that the writes go to
    the correct places.

    [hannes@cmpxchg.org: added this_cpu to __this_cpu conversion]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Steven Rostedt
    Cc: Greg Thelen
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
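
    A minimal sketch of the pattern described above, using hypothetical
    per-cpu counters rather than the real memcontrol.c statistics: with
    preemption disabled, both updates are guaranteed to land in the same
    CPU's counters.

        #include <linux/percpu.h>
        #include <linux/preempt.h>

        static DEFINE_PER_CPU(unsigned long, pcpu_stat_count);    /* hypothetical */
        static DEFINE_PER_CPU(unsigned long, pcpu_event_count);   /* hypothetical */

        static void account(unsigned long nr_pages)
        {
                preempt_disable();                       /* pin the task to this CPU ... */
                __this_cpu_add(pcpu_stat_count, nr_pages);
                __this_cpu_inc(pcpu_event_count);        /* ... so both updates agree */
                preempt_enable();
        }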
     
  • There is a potential race between a thread charging a page and another
    thread putting it back to the LRU list:

    charge:                             putback:
    SetPageCgroupUsed                   SetPageLRU
    PageLRU && add to memcg LRU         PageCgroupUsed && add to memcg LRU

    The order of setting one flag and checking the other is crucial, otherwise
    the charge may observe !PageLRU while the putback observes !PageCgroupUsed
    and the page is not linked to the memcg LRU at all.

    Global memory pressure may fix this by trying to isolate and putback the
    page for reclaim, where that putback would link it to the memcg LRU again.
    Without that, the memory cgroup is undeletable due to a charge whose
    physical page can not be found and moved out.

    Signed-off-by: Johannes Weiner
    Cc: Ying Han
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Reclaim decides to skip scanning an active list when the corresponding
    inactive list is above a certain relative size, so that the assumed
    working set is left alone while there are still enough reclaim candidates
    around.

    The memcg implementation of comparing those lists instead reports whether
    the whole memcg is low on the requested type of inactive pages,
    considering all nodes and zones.

    This can lead to an oversized active list not being scanned because of the
    state of the other lists in the memcg, as well as an active list being
    scanned while its corresponding inactive list has enough pages.

    Not only is this wrong, it's also a scalability hazard, because the global
    memory state over all nodes and zones has to be gathered for each memcg
    and zone scanned.

    Make these calculations purely based on the size of the two LRU lists
    that are actually affected by the outcome of the decision.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Reviewed-by: Minchan Kim
    Reviewed-by: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
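
    A hypothetical sketch of the decision described above (names and the
    ratio handling are assumptions, not the patch's code): the check is based
    purely on the two LRU lists affected by the outcome.

        /* scan the active list only when it has grown large relative to the
         * inactive list of the same memcg and zone */
        static bool inactive_list_is_low(unsigned long nr_inactive,
                                         unsigned long nr_active,
                                         unsigned long inactive_ratio)
        {
                return nr_active > nr_inactive * inactive_ratio;
        }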
     
  • If somebody is touching data too early, it might be easier to diagnose a
    problem when dereferencing NULL at mem->info.nodeinfo[node] than trying to
    understand why mem_cgroup_per_zone is [un|partly]initialized.

    Signed-off-by: Igor Mammedov
    Acked-by: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Igor Mammedov
     
  • Before calling schedule_timeout(), the task state should be changed.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
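
    The idiom in question, as a minimal sketch (the call site and timeout are
    hypothetical): without changing the task state first, schedule_timeout()
    does not actually sleep for the timeout because the task is still
    TASK_RUNNING.

        set_current_state(TASK_INTERRUPTIBLE);    /* must precede schedule_timeout() */
        schedule_timeout(HZ / 10);                /* now really sleeps */

        /* or, equivalently, the helper that combines both steps: */
        schedule_timeout_interruptible(HZ / 10);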
     
  • The memcg code sometimes uses "struct mem_cgroup *mem" and sometimes uses
    "struct mem_cgroup *memcg". Rename all mem variables to memcg in source
    file.

    Signed-off-by: Raghavendra K T
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raghavendra K T
     
  • When the cgroup base was allocated with kmalloc, it was necessary to
    annotate the variable with kmemleak_not_leak(). But because it has
    recently been changed to be allocated with alloc_page() (which skips
    kmemleak checks), that annotation now causes a warning on boot up.

    I was triggering this output:

    allocated 8388608 bytes of page_cgroup
    please try 'cgroup_disable=memory' option if you don't want memory cgroups
    kmemleak: Trying to color unknown object at 0xf5840000 as Grey
    Pid: 0, comm: swapper Not tainted 3.0.0-test #12
    Call Trace:
    [] ? printk+0x1d/0x1f^M
    [] paint_ptr+0x4f/0x78
    [] kmemleak_not_leak+0x58/0x7d
    [] ? __rcu_read_unlock+0x9/0x7d
    [] kmemleak_init+0x19d/0x1e9
    [] start_kernel+0x346/0x3ec
    [] ? loglevel+0x18/0x18
    [] i386_start_kernel+0xaa/0xb0

    After a bit of debugging I tracked the object 0xf5840000 (and others) down
    to the cgroup code. The change from allocating base with kmalloc to
    alloc_page() has the base not calling kmemleak_alloc() which adds the
    pointer to the object_tree_root, but kmemleak_not_leak() adds it to the
    crt_early_log[] table. On kmemleak_init(), the entry is found in the
    early_log[] but not the object_tree_root, and this error message is
    displayed.

    If alloc_page() fails then it defaults back to vmalloc() which still uses
    the kmemleak_alloc() which makes us still need the kmemleak_not_leak()
    call. The solution is to call the kmemleak_alloc() directly if the
    alloc_page() succeeds.

    Reviewed-by: Michal Hocko
    Signed-off-by: Steven Rostedt
    Acked-by: Catalin Marinas
    Signed-off-by: Jonathan Nieder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
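
    A rough sketch of the described fix; the allocation helpers, nid and
    table_size are assumptions standing in for the page_cgroup setup code.
    The point is that kmemleak_alloc() is called only when the alloc_pages
    path succeeds, since the vmalloc fallback is already tracked by kmemleak.

        void *base = alloc_pages_exact_nid(nid, table_size,
                                           GFP_KERNEL | __GFP_NOWARN);
        if (base)
                kmemleak_alloc(base, table_size, 0, GFP_KERNEL);  /* register it */
        else
                base = vmalloc_node(table_size, nid);             /* already tracked */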
     
  • Michel, while working on the working set estimation code, noticed that
    calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
    wasn't safe, if the pfn ended up being a tail page of a transparent
    hugepage under splitting by __split_huge_page_refcount().

    He then found the problem could also theoretically materialize with
    page_cache_get_speculative() during the speculative radix tree lookups
    that use get_page_unless_zero() in SMP, if the radix tree page is freed
    and reallocated and get_user_pages is called on it before
    page_cache_get_speculative has a chance to call get_page_unless_zero().

    So the best way to fix the problem is to keep page_tail->_count zero at
    all times. This will guarantee that get_page_unless_zero() can never
    succeed on any tail page. page_tail->_mapcount is guaranteed zero and
    is unused for all tail pages of a compound page, so we can simply
    account the tail page references there and transfer them to
    tail_page->_count in __split_huge_page_refcount() (in addition to the
    head_page->_mapcount).

    While debugging this s/_count/_mapcount/ change I also noticed get_page is
    called by direct-io.c on pages returned by get_user_pages. That wasn't
    entirely safe because the two atomic_inc in get_page weren't atomic. Other
    get_user_pages users, like the secondary-MMU page faults that establish
    the shadow pagetables, never call a superfluous get_page after
    get_user_pages returns. It's safer to make get_page universally safe
    for tail pages and to use get_page_foll() within follow_page (inside
    get_user_pages()). get_page_foll() is safe to do the refcounting for tail
    pages without taking any locks because it is run within PT lock protected
    critical sections (PT lock for pte and page_table_lock for
    pmd_trans_huge).

    The standard get_page() as invoked by direct-io instead will now take
    the compound_lock, but still only for tail pages. The direct-io paths
    are usually I/O bound and the compound_lock is per THP so very
    fine-grained, so there's no risk of scalability issues with it. A simple
    direct-io benchmark with all lockdep prove-locking and spinlock
    debugging infrastructure enabled shows identical performance and no
    overhead. So it's worth it. Ideally direct-io should stop calling
    get_page() on pages returned by get_user_pages(). The spinlock in
    get_page() is already optimized away for no-THP builds but doing
    get_page() on tail pages returned by GUP is generally a rare operation
    and usually only run in I/O paths.

    This new refcounting on page_tail->_mapcount, in addition to avoiding new
    RCU critical sections, will also allow the working set estimation code to
    work without any further complexity associated with the tail page
    refcounting with THP.

    Signed-off-by: Andrea Arcangeli
    Reported-by: Michel Lespinasse
    Reviewed-by: Michel Lespinasse
    Reviewed-by: Minchan Kim
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Benjamin Herrenschmidt
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
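
    For reference, a sketch of the primitive whose guarantee the patch
    preserves, roughly as it looked in kernels of that era (treated here as
    an approximation): it can only take a reference when the count is already
    non-zero, so keeping tail pages at _count == 0 makes speculative lookups
    fail safely on them.

        static inline int get_page_unless_zero(struct page *page)
        {
                return atomic_inc_not_zero(&page->_count);
        }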
     

02 Nov, 2011

1 commit


01 Nov, 2011

25 commits

  • Avoid false sharing of the vm_stat array.

    This was found to adversely affect tmpfs I/O performance.

    Tests run on a 640 cpu UV system.

    With 120 threads doing parallel writes, each to different tmpfs mounts:
    No patch: ~300 MB/sec
    With vm_stat alignment: ~430 MB/sec

    Signed-off-by: Dimitri Sivanich
    Acked-by: Christoph Lameter
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dimitri Sivanich
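
    A hedged sketch of the kind of change described (the exact declaration is
    an assumption): cacheline-aligning the global vm_stat array so its hot
    counters do not share a cache line with unrelated data.

        atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;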
     
  • A process spent 30 minutes exiting, just munlocking the pages of a large
    anonymous area that had been alternately mprotected into page-sized vmas:
    for every single page there's an anon_vma walk through all the other
    little vmas to find the right one.

    A general fix to that would be a lot more complicated (use prio_tree on
    anon_vma?), but there's one very simple thing we can do to speed up the
    common case: if a page to be munlocked is mapped only once, then it is our
    vma that it is mapped into, and there's no need whatever to walk through
    all the others.

    Okay, there is a very remote race in munlock_vma_pages_range(): if, between
    its follow_page() and lock_page(), another process were to munlock the
    same page, then page reclaim removed it from our vma, then another process
    mlocked it again, we would find it with page_mapcount 1, yet it would still
    be mlocked in another process. But never mind, that's much less likely than
    the down_read_trylock() failure which munlocking already tolerates (in
    try_to_unmap_one()): in due course page reclaim will discover and move the
    page to unevictable instead.

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Hugh Dickins
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • There are three cases of update_mmu_cache() in the file, and the case in
    function collapse_huge_page() has a typo, namely the last parameter used,
    which is corrected based on the other two cases.

    Because of how update_mmu_cache() is defined on x86, currently the only
    arch that implements THP, the change has no practical effect here, but one
    or two minutes of effort could be saved for those archs that are likely to
    support THP in the future.

    Signed-off-by: Hillf Danton
    Cc: Johannes Weiner
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • The THP copy-on-write handler falls back to regular-sized pages for a huge
    page replacement upon allocation failure or if THP has been individually
    disabled in the target VMA. The loop responsible for copying page-sized
    chunks accidentally uses multiples of PAGE_SHIFT instead of PAGE_SIZE as
    the virtual address arg for copy_user_highpage().

    Signed-off-by: Hillf Danton
    Acked-by: Johannes Weiner
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
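
    A hypothetical sketch of the corrected loop (the loop bounds and array
    names are illustrative, not the patch itself): the virtual-address
    argument must advance in multiples of PAGE_SIZE, not PAGE_SHIFT.

        for (i = 0; i < HPAGE_PMD_NR; i++)
                copy_user_highpage(pages[i], page + i,
                                   haddr + i * PAGE_SIZE, vma);  /* was i * PAGE_SHIFT */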
     
  • MCL_FUTURE does not move pages between lru lists, and draining the LRU per
    cpu pagevecs is a nasty activity. Avoid doing it unnecessarily.

    Signed-off-by: Christoph Lameter
    Cc: David Rientjes
    Reviewed-by: Minchan Kim
    Acked-by: KOSAKI Motohiro
    Cc: Mel Gorman
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If compaction can proceed, shrink_zones() stops doing any work but its
    callers still call shrink_slab() which raises the priority and potentially
    sleeps. This is unnecessary and wasteful so this patch aborts direct
    reclaim/compaction entirely if compaction can proceed.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Acked-by: Johannes Weiner
    Cc: Josh Boyer
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • When suffering from memory fragmentation due to unfreeable pages, THP page
    faults will repeatedly try to compact memory. Due to the unfreeable
    pages, compaction fails.

    Needless to say, at that point page reclaim also fails to create free
    contiguous 2MB areas. However, that doesn't stop the current code from
    trying, over and over again, and freeing a minimum of 4MB (2UL <<
    sc->order pages) at every single invocation.

    This resulted in my 12GB system having 2-3GB free memory, a corresponding
    amount of used swap and very sluggish response times.

    This can be avoided by having the direct reclaim code not reclaim from
    zones that already have plenty of free memory available for compaction.

    If compaction still fails due to unmovable memory, doing additional
    reclaim will only hurt the system, not help.

    [jweiner@redhat.com: change comment to explain the order check]
    Signed-off-by: Rik van Riel
    Acked-by: Johannes Weiner
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Reviewed-by: Minchan Kim
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • When a race between putback_lru_page() and shmem_lock with lock=0 happens,
    the program execution order is as follows, but the clear_bit on processor #1
    could be reordered to right before processor #1's spin_unlock. Then the page
    would be stranded on the unevictable list.

    processor #0 (putback_lru_page):       processor #1 (shmem_lock, lock=0):

    spin_lock
    SetPageLRU
    spin_unlock
                                           clear_bit(AS_UNEVICTABLE)
                                           spin_lock
                                           if PageLRU()
                                               if !test_bit(AS_UNEVICTABLE)
                                                   move evictable list
    smp_mb
    if !test_bit(AS_UNEVICTABLE)
        move evictable list
                                           spin_unlock

    But pagevec_lookup() in scan_mapping_unevictable_pages() has
    rcu_read_[un]lock(), which happens to prevent that reordering before
    reaching the test_bit(AS_UNEVICTABLE) on processor #1, so the problem never
    actually happens. But it's an unexpected side effect and we should solve
    this problem properly.

    This patch adds a barrier after mapping_clear_unevictable.

    I didn't hit this problem myself; I just found it during review.

    Signed-off-by: Minchan Kim
    Acked-by: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
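
    A rough sketch of the fix as described, with the surrounding shmem_lock()
    lock == 0 path treated as an assumption: the barrier orders the flag
    clearing against the rescue scan that follows it.

        mapping_clear_unevictable(mapping);
        smp_mb();                                 /* the added barrier */
        scan_mapping_unevictable_pages(mapping);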
     
  • Quiet the sparse noise:

    warning: symbol 'khugepaged_scan' was not declared. Should it be static?
    warning: context imbalance in 'khugepaged_scan_mm_slot' - unexpected unlock

    Signed-off-by: H Hartley Sweeten
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Quiet the sparse noise:

    warning: symbol 'default_policy' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Cc: KOSAKI Motohiro
    Cc: Stephen Wilson
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Quiet the following sparse noise:

    warning: symbol 'swap_token_memcg' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Cc: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Quiet the following sparse noise in this file:

    warning: symbol 'memblock_overlaps_region' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Cc: Yinghai Lu
    Cc: "H. Peter Anvin"
    Cc: Benjamin Herrenschmidt
    Cc: Tomi Valkeinen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • At one point, anonymous pages were supposed to go on the unevictable list
    when no swap space was configured, and the idea was to manually rescue
    those pages after adding swap and making them evictable again. But
    nowadays, swap-backed pages on the anon LRU list are not scanned without
    available swap space anyway, so there is no point in moving them to a
    separate list anymore.

    The manual rescue could also be used in case pages were stranded on the
    unevictable list due to race conditions. But the code has been around for
    a while now and newly discovered bugs should be properly reported and
    dealt with instead of relying on such a manual fixup.

    In addition to the lack of a use case, the sysfs interface to rescue pages
    from a specific NUMA node has been broken since its introduction, so it's
    unlikely that anybody ever relied on that.

    This patch removes the functionality behind the sysctl and the
    node-interface and emits a one-time warning when somebody tries to access
    either of them.

    Signed-off-by: Johannes Weiner
    Reported-by: Kautuk Consul
    Reviewed-by: Minchan Kim
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • write_scan_unevictable_node() checks the value req returned by
    strict_strtoul() and returns 1 if req is 0.

    However, when strict_strtoul() returns 0, it means successful conversion
    of buf to an unsigned long.

    Due to this, the function was not proceeding to scan the zones for
    unevictable pages even when we write a valid value to the
    scan_unevictable_pages sys file.

    Change this check slightly to check for an invalid value in buf as well as
    for a 0 value stored in res after successful conversion via strict_strtoul.
    In both cases, we do not perform the scanning of this node's zones.

    Signed-off-by: Kautuk Consul
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Johannes Weiner
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kautuk Consul
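
    A hedged sketch of the corrected check (variable names follow the
    description above; the surrounding sysfs handler is omitted):
    strict_strtoul() returns 0 on success, so a parse failure and a parsed
    value of 0 both mean there is nothing to scan.

        unsigned long res;
        int err = strict_strtoul(buf, 10, &res);

        if (err || res == 0)
                return 1;        /* invalid input, or "0": skip the scan */

        /* otherwise go on to scan this node's zones for unevictable pages */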
     
  • Signed-off-by: Li Haifeng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Haifeng
     
  • There's no compact_zone_order() user outside file scope, so make it static.

    Signed-off-by: Kyungmin Park
    Acked-by: David Rientjes
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyungmin Park
     
  • Commit fb46e73520940b ("HWPOISON: Convert pr_debugs to pr_info") authored
    by Andi Kleen converted a number of pr_debug()s to pr_info()s.

    About the same time additional code with pr_debug()s was added by two
    other commits 8c6c2ecb4466 ("HWPOSION, hugetlb: recover from free hugepage
    error when !MF_COUNT_INCREASED") and d950b95882f3d ("HWPOISON, hugetlb:
    soft offlining for hugepage"). And these pr_debug()s failed to get
    converted to pr_info()s.

    This patch converts them as well. And does some minor related whitespace
    cleanup.

    Signed-off-by: Dean Nelson
    Reviewed-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
     
  • The ret variable is really not needed in mm_take_all_locks().

    Signed-off-by: Kautuk Consul
    Reviewed-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kautuk Consul
     
  • fiddle wording

    Cc: Jan Kara
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • try_to_unmap_one() is called by try_to_unmap_ksm(), too.

    Signed-off-by: Wanlong Gao
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanlong Gao
     
  • Some vmalloc failure paths do not report OOM conditions.

    Add warn_alloc_failed, which also does a dump_stack, to those failure
    paths.

    This allows more site-specific vmalloc failure logging printks to
    be removed.

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
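
    A hedged sketch of the reporting pattern on a vmalloc failure path; the
    surrounding check, message text and variables are illustrative, not the
    patch's wording.

        if (!area) {
                warn_alloc_failed(gfp_mask, 0,
                                  "vmalloc: allocation failure: %lu bytes\n", size);
                return NULL;
        }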
     
  • There are two places where kswapd reads the pgdat request: one is the
    return from a successful balance, the other is when it is woken up from
    sleeping. The new_order and new_classzone_idx represent the balance input
    order and classzone_idx.

    But currently new_order and new_classzone_idx are not assigned after
    kswapd_try_to_sleep(), and that will cause a bug in the following scenario.

    1: after a successful balance, kswapd goes to sleep, and new_order = 0;
    new_classzone_idx = __MAX_NR_ZONES - 1;

    2: kswapd is woken up with order = 3 and classzone_idx = ZONE_NORMAL

    3: while balance_pgdat() is running, a new balance wakeup happens with
    order = 5, and classzone_idx = ZONE_NORMAL

    4: the first wakeup (order = 3) finishes successfully and returns order = 3,
    but new_order is still 0, so this balancing will be treated as a failed
    balance, and the second, tighter balancing will be missed.

    So, to avoid the above problem, the new_order and new_classzone_idx need
    to be assigned for later successful comparison.

    Signed-off-by: Alex Shi
    Acked-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Tested-by: Pádraig Brady
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex,Shi
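
    A rough sketch of the described assignment after kswapd_try_to_sleep();
    the pgdat field names are taken from kernels of that era and treated here
    as assumptions rather than the patch's exact code.

        kswapd_try_to_sleep(pgdat, order, classzone_idx);

        /* refresh both the working copies and the new_* values used by the
         * later "was the balance good enough?" comparison */
        order = new_order = pgdat->kswapd_max_order;
        classzone_idx = new_classzone_idx = pgdat->classzone_idx;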
     
  • warning: function 'memblock_memory_can_coalesce'
    with external linkage has definition.

    Signed-off-by: Jonghwan Choi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonghwan Choi
     
  • In commit 215ddd66 ("mm: vmscan: only read new_classzone_idx from pgdat
    when reclaiming successfully"), Mel Gorman said it is better for kswapd to
    sleep after an unsuccessful balancing if a tighter reclaim request became
    pending during the balancing. But in the following scenario, kswapd does
    something that does not match our expectation. This patch fixes the issue.

    1, read pgdat request A (classzone_idx, order = 3)
    2, balance_pgdat()
    3, during the balancing, a new pgdat request B (classzone_idx, order = 5) is placed
    4, balance_pgdat() returns, but the balance is considered failed since the
    returned order = 0
    5, the pgdat of request A is assigned to balance_pgdat(), and balancing is
    done again, while the expected behavior is for kswapd to try to sleep.

    Signed-off-by: Alex Shi
    Reviewed-by: Tim Chen
    Acked-by: Mel Gorman
    Tested-by: Pádraig Brady
    Cc: Rik van Riel
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex,Shi
     
  • This adds support for highmem page poisoning and verification to the
    generic (no architecture support required) debug-pagealloc feature.

    [akpm@linux-foundation.org: remove unneeded preempt_disable/enable]
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita