16 Dec, 2009

40 commits

  • Two IOC3 and IOC4 drivers have broken error paths on registration. Fix
    them.

    Signed-off-by: Jean Delvare
    Cc: Pat Gefre
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare
     
  • Several IOC3 and IOC4 drivers misuse the __devinit and __devexit section
    markers. Use __init and __exit instead as appropriate, then add __devinit
    and __devexit where they really belong for PCI drivers.

    Also make ioc4_serial_init static.

    Signed-off-by: Jean Delvare
    Cc: Pat Gefre
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare
     
  • journal_info in task_struct is used in journaling file system only. So
    introduce CONFIG_FS_JOURNAL_INFO and make it conditional.

    Signed-off-by: Hiroshi Shimamoto
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: KONISHI Ryusuke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hiroshi Shimamoto
     
  • It's easy to lose useful DEBUG_BUGVERBOSE by switching EMBEDDED left and right.

    Signed-off-by: Alexey Dobriyan
    Cc: Sam Ravnborg
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Add a printk_ratelimited statement expression macro that uses a per-call
    ratelimit_state so that output messages from multiple subsystems are not
    suppressed by a global __ratelimit state.
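
    A rough userspace sketch of the per-call-site idea follows. The names
    ratelimit_sketch and print_ratelimited_sketch and the burst of 3 are
    assumptions made for illustration, and the time-based interval of the
    real ratelimit code is left out. This is not the kernel macro itself.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified state: a burst budget and a use counter only. */
struct ratelimit_sketch {
    int burst;
    int printed;
};

int emitted_total; /* counts messages that actually got through */

static bool ratelimit_ok(struct ratelimit_sketch *rs)
{
    if (rs->printed >= rs->burst)
        return false;
    rs->printed++;
    return true;
}

/* Each expansion owns a static state, so call sites do not share a budget. */
#define print_ratelimited_sketch(...)                          \
    do {                                                       \
        static struct ratelimit_sketch rs_ = { 3, 0 };         \
        if (ratelimit_ok(&rs_)) {                              \
            printf(__VA_ARGS__);                               \
            emitted_total++;                                   \
        }                                                      \
    } while (0)

void subsystem_a_warn(void) { print_ratelimited_sketch("a: link flapping\n"); }
void subsystem_b_warn(void) { print_ratelimited_sketch("b: queue full\n"); }
```

    A noisy subsystem_a_warn() exhausts only its own budget, so a later
    subsystem_b_warn() still gets its message out.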

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: s/_rl/_ratelimited/g]
    Signed-off-by: Joe Perches
    Cc: Naohiro Ooiwa
    Cc: Ingo Molnar
    Cc: Hiroshi Shimamoto
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Commit 8c8709334cec803368a432a33e0f2e116d48fe07 has removed the
    pmu_device_init call from misc_init, but unlike other similar commits,
    has not removed its declaration.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thadeu Lima de Souza Cascardo
     
  • do_each_thread/while_each_thread wrap a block of code that is in this format:

    for (...)
    do
    ...
    while

    If curly braces do not surround the inner loop the following warning is
    generated by sparse:

    warning: do-while statement is not a compound statement

    Fix the warning by adding the braces.
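
    The shape of the problem can be reproduced in userspace with a pair of
    macros built the same way. The names do_each_cell and while_each_cell
    and the 2x3 grid are hypothetical; this is only a sketch of the pattern,
    not the kernel macros.

```c
#define ROWS 2
#define COLS 3

/* Hypothetical analogues of do_each_thread()/while_each_thread():
 * the first macro ends in a bare "do", the second supplies the "while". */
#define do_each_cell(r, c) \
    for ((r) = 0, (c) = 0; (r) < ROWS; (r)++, (c) = 0) do
#define while_each_cell(r, c) \
    while (++(c) < COLS)

int sum_grid(int grid[ROWS][COLS])
{
    int r, c, sum = 0;

    /* The braces make the body of the hidden do-while a compound statement. */
    do_each_cell(r, c) {
        sum += grid[r][c];
    } while_each_cell(r, c);

    return sum;
}
```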

    Signed-off-by: H Hartley Sweeten
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • According to feature-removal-schedule.txt, it is time to remove
    print_fn_descriptor_symbol().

    And a quick grep shows that it no longer has any callers.

    Signed-off-by: WANG Cong
    Cc: Bjorn Helgaas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • rwsem_is_locked() tests ->activity without locks, so we should always keep
    ->activity consistent. However, the code in __rwsem_do_wake() breaks this
    rule: it updates ->activity only after _all_ readers have been woken up.
    This may give some reader a wrong ->activity value and thus cause
    rwsem_is_locked() to behave incorrectly.

    Quote from Andrew:

    "
    - we have one or more processes sleeping in down_read(), waiting for access.

    - we wake one or more processes up without altering ->activity

    - they start to run and they do rwsem_is_locked(). This incorrectly
    returns "false", because the waker process is still crunching away in
    __rwsem_do_wake().

    - the waker now alters ->activity, but it was too late.
    "

    So we need to take a spinlock to protect this. And rwsem_is_locked() should
    not block, so we use spin_trylock_irqsave().
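
    A minimal userspace sketch of the non-blocking check, with a C11
    atomic_flag standing in for the kernel spinlock. The struct and function
    names are hypothetical, and reporting "locked" when the trylock fails is
    an assumption of this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the semaphore: a count plus its guard lock. */
struct rwsem_sketch {
    int activity;
    atomic_flag wait_lock;
};

bool sketch_trylock(atomic_flag *lock)
{
    /* test_and_set returns the previous value: false means we got the lock. */
    return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

void sketch_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Never blocks: if the guard lock is busy, conservatively report "locked". */
bool rwsem_is_locked_sketch(struct rwsem_sketch *sem)
{
    bool ret = true;

    if (sketch_trylock(&sem->wait_lock)) {
        ret = sem->activity != 0;
        sketch_unlock(&sem->wait_lock);
    }
    return ret;
}
```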

    [akpm@linux-foundation.org: simplify code]
    Reported-by: Brian Behlendorf
    Cc: Ben Woodard
    Cc: David Howells
    Signed-off-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • These functions need not be exported, since no drivers should use them.

    __init_rwsem() is an exception, because init_rwsem(), which is a macro,
    is used.

    Signed-off-by: WANG Cong
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • Don't initialize __print_once. Invert the test to reduce initialized
    data.

    defconfig before: $size vmlinux
    text data bss dec hex filename
    6976022 679572 1359668 9015262 898fde vmlinux

    defconfig after: $size vmlinux
    text data bss dec hex filename
    6976006 679508 1359700 9015214 898fae vmlinux
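
    The inverted test can be sketched in userspace as below. The macro name
    print_once_sketch and the counter are hypothetical. A static flag without
    an initializer starts out as zero, and such objects typically land in bss
    rather than in initialized data.

```c
#include <stdbool.h>
#include <stdio.h>

int once_emitted; /* counts how often the message was really printed */

/* The flag starts out false (zero), so it needs no explicit initializer. */
#define print_once_sketch(...)              \
    do {                                    \
        static bool printed_;               \
        if (!printed_) {                    \
            printed_ = true;                \
            printf(__VA_ARGS__);            \
            once_emitted++;                 \
        }                                   \
    } while (0)

void report_setup(void)
{
    print_once_sketch("setup complete\n");
}
```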

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • The symbol 'call' is a static symbol used for initcall_debug. This same
    symbol name is used locally by a couple functions and produces the
    following sparse warnings:

    warning: symbol 'call' shadows an earlier one

    Fix this noise by renaming the local symbols.

    Signed-off-by: H Hartley Sweeten
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Signed-off-by: Daniel Mack
    Cc: "H Hartley Sweeten"
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Mack
     
  • Use smp_processor_id() instead of get_cpu() and put_cpu() in
    generic_smp_call_function_interrupt(). There is no need to disable
    preemption, because generic_smp_call_function_interrupt() must be called
    with interrupts disabled.

    Signed-off-by: Xiao Guangrong
    Acked-by: Ingo Molnar
    Cc: Jens Axboe
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiao Guangrong
     
  • This driver supports the non-volatile digital potentiometers via I2C:
    AD5258, AD5259, AD5251, AD5252, AD5253, AD5254, and AD5255

    It provides a sysfs interface to each device for reading/writing which
    is documented in Documentation/misc-devices/ad525x_dpot.txt.

    Signed-off-by: Michael Hennerich
    Signed-off-by: Chris Verges
    Signed-off-by: Mike Frysinger
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Hennerich
     
  • If CONFIG_DYNAMIC_DEBUG is enabled and a source file has:

    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    #include

    dynamic_debug.h will duplicate KBUILD_MODNAME
    in the output string.

    Remove the use of KBUILD_MODNAME from the
    output format string generated by dynamic_debug.h

    If CONFIG_DYNAMIC_DEBUG is not enabled, no compile-time
    check is done to printk/dev_printk arguments.

    Add it.
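
    One common way to keep compile-time checking in a disabled debug macro is
    to leave the call in a branch that can never run. The sketch below shows
    that technique in userspace; the macro and helper names are hypothetical
    and this is not claimed to be the exact form used in the patch.

```c
#include <stdio.h>

int format_calls; /* incremented only if an argument is really evaluated */

int next_value(void)
{
    return ++format_calls;
}

/* Hypothetical disabled-debug macro: the dead branch is never executed,
 * but the compiler still checks the format string against the arguments. */
#define debug_disabled_sketch(...)          \
    do {                                    \
        if (0)                              \
            printf(__VA_ARGS__);            \
    } while (0)

int run_disabled_debug(void)
{
    debug_disabled_sketch("value=%d\n", next_value());
    return format_calls;
}
```

    A mismatched argument, such as passing a string for %d, would still be
    reported by a compiler with format checking enabled, while nothing is
    printed or evaluated at run time.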

    Signed-off-by: Joe Perches
    Cc: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Commit 70867453092297be9afb2249e712a1f960ec0a09 ("printk_once(): use bool
    for boolean flag") changed printk_once() to use bool instead of int for
    its guard variable. Do the same change to WARN_ONCE() and WARN_ON_ONCE(),
    for the same reasons.

    This resulted in a reduction of 1462 bytes on a x86-64 defconfig:

    text data bss dec hex filename
    8101271 1207116 992764 10301151 9d2edf vmlinux.before
    8100553 1207148 991988 10299689 9d2929 vmlinux.after

    Signed-off-by: Cesar Eduardo Barros
    Cc: Roland Dreier
    Cc: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cesar Eduardo Barros
     
  • Convert code away from ->read_proc/->write_proc interfaces. Switch to
    proc_create()/proc_create_data() which make addition of proc entries
    reliable wrt NULL ->proc_fops, NULL ->data and so on.

    The problem with ->read_proc et al. is described in commit
    786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in
    /proc entries").

    Signed-off-by: Alexey Dobriyan
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • gcc is not convinced that the floppy.c ioctl has sufficient bound checks:

    In function `copy_from_user',
    inlined from `fd_copyin' at drivers/block/floppy.c:3080,
    inlined from `fd_ioctl' at drivers/block/floppy.c:3503:
    arch/x86/include/asm/uaccess_32.h:211:
    warning: call to `copy_from_user_overflow' declared with attribute
    warning: copy_from_user buffer size is not provably correct

    And frankly, as a human I have a hard time proving the same more or less
    (the size comes from the ioctl argument. humpf. maybe. the code isn't
    very nice)

    This patch adds an explicit check to make 100% sure it's safe, better than
    finding out later that there indeed was a gap.
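
    The general idea of an explicit bound check before a copy can be sketched
    as follows. copy_in_checked is a hypothetical userspace helper built on
    memcpy(); it is not the floppy driver code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: refuse any request larger than the destination. */
int copy_in_checked(void *dst, size_t dst_size, const void *src, size_t size)
{
    if (size > dst_size)
        return -EINVAL;
    memcpy(dst, src, size);
    return 0;
}
```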

    [akpm@linux-foundation.org: add WARN_ON()]
    Signed-off-by: Arjan van de Ven
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • It does not seem possible that ldev can be NULL, so drop the unnecessary
    test. If ldev can somehow be NULL, then the initialization of last_idx
    should be moved below the test.

    A simplified version of the semantic match that detects this problem is as
    follows (http://coccinelle.lip6.fr/):

    //
    @match exists@
    expression x, E;
    identifier fld;
    @@

    * x->fld
    ... when != \(x = E\|&x\)
    * x == NULL
    //
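
    For illustration, the second option mentioned above (moving the
    initialization below the test) looks roughly like this in plain C. The
    struct ldev_sketch and its single field are hypothetical.

```c
#include <stddef.h>

struct ldev_sketch {
    int last_idx;
};

/* Flagged shape (not compiled here):
 *     int last_idx = ldev->last_idx;   // dereference first
 *     if (ldev == NULL) return ...;    // test afterwards, too late
 */
int last_idx_or(const struct ldev_sketch *ldev, int fallback)
{
    int last_idx;

    if (ldev == NULL)
        return fallback;
    last_idx = ldev->last_idx; /* safe: the pointer was tested first */
    return last_idx;
}
```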

    Signed-off-by: Julia Lawall
    Acked-by: Arjan van de Ven
    Cc: Ingo Molnar
    Cc: Venkatesh Pallipadi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Julia Lawall
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Convert code away from ->read_proc/->write_proc interfaces. Switch to
    proc_create()/proc_create_data() which make addition of proc entries
    reliable wrt NULL ->proc_fops, NULL ->data and so on.

    The problem with ->read_proc et al. is described in commit
    786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in
    /proc entries").

    Signed-off-by: Alexey Dobriyan
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Setting a thread's comm to be something unique is a very useful ability
    and is helpful for debugging complicated threaded applications. However
    currently the only way to set a thread name is for the thread to name
    itself via the PR_SET_NAME prctl.

    However, there may be situations where it would be advantageous for a
    thread dispatcher to name the threads it is managing, rather than
    having the threads describe themselves. This sort of behavior is
    available on other systems via the pthread_setname_np() interface.

    This patch exports a task's comm via proc/pid/comm and
    proc/pid/task/tid/comm interfaces, and allows thread siblings to write to
    these values.
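
    From userspace the new files can be used with ordinary file I/O. The
    helpers below are a hypothetical sketch; the path is passed in by the
    caller, for example /proc/self/comm on a kernel that has this patch.

```c
#include <stdio.h>
#include <string.h>

/* Write a new name to a comm file. Returns 0 on success, -1 on failure. */
int set_comm_sketch(const char *path, const char *name)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fputs(name, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* The write reaches the kernel at the latest when the stream is closed. */
    return fclose(f) == 0 ? 0 : -1;
}

/* Read the name back, dropping the trailing newline. Returns 0 on success. */
int get_comm_sketch(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```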

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: John Stultz
    Cc: Andi Kleen
    Cc: Arjan van de Ven
    Cc: Mike Fulton
    Cc: Sean Foley
    Cc: Darren Hart
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    john stultz
     
  • On no-MMU systems, sizes reported in /proc/n/statm have units of bytes.
    Per Documentation/filesystems/proc.txt, these values should be in pages.
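
    A consumer that expects pages would convert the fields along these lines.
    parse_statm_sketch is a hypothetical helper that reads only the first two
    fields (total size and resident size) and takes the page size as a
    parameter.

```c
#include <stdio.h>

/* Parse "size resident ..." given in pages and convert both to bytes. */
int parse_statm_sketch(const char *line, unsigned long page_size,
                       unsigned long *size_bytes,
                       unsigned long *resident_bytes)
{
    unsigned long size_pages, resident_pages;

    if (sscanf(line, "%lu %lu", &size_pages, &resident_pages) != 2)
        return -1;
    *size_bytes = size_pages * page_size;
    *resident_bytes = resident_pages * page_size;
    return 0;
}
```

    If the kernel reported bytes instead of pages, such a consumer would
    overstate every size by a factor of the page size.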

    Signed-off-by: Steven J. Magnani
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven J. Magnani
     
  • The NOMMU code currently clears all anonymous mmapped memory. While this
    is what we want in the default case, all memory allocation from userspace
    under NOMMU has to go through this interface, including malloc() which is
    allowed to return uninitialized memory. This can easily be a significant
    performance penalty. So for constrained embedded systems where security is
    irrelevant, allow people to avoid clearing memory unnecessarily.

    This also alters the ELF-FDPIC binfmt such that it obtains uninitialised
    memory for the brk and stack region.

    Signed-off-by: Jie Zhang
    Signed-off-by: Robin Getz
    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Acked-by: Paul Mundt
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jie Zhang
     
  • This patch enables extraction of the pfn of a hugepage from
    /proc/pid/pagemap in an architecture independent manner.
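
    For readers of the file, each pagemap entry is a 64-bit value. The sketch
    below decodes one entry following the layout in the kernel's pagemap
    documentation (bit 63 is the present flag, bits 0-54 hold the pfn when
    the page is present). The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PM_PRESENT_SHIFT_SKETCH 63
#define PM_PFN_MASK_SKETCH ((UINT64_C(1) << 55) - 1)

/* True if the entry describes a page that is present in RAM. */
bool pagemap_present(uint64_t entry)
{
    return (entry >> PM_PRESENT_SHIFT_SKETCH) & 1;
}

/* Page frame number of a present page, or 0 if the page is not present. */
uint64_t pagemap_pfn(uint64_t entry)
{
    return pagemap_present(entry) ? (entry & PM_PFN_MASK_SKETCH) : 0;
}
```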

    Details
    -------
    My test program (leak_pagemap) works as follows:
    - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
    - read()/write() something on it,
    - call page-types with option -p,
    - munmap() and unlink() the file on hugetlbfs

    Without my patches
    ------------------
    $ ./leak_pagemap
    flags page-count MB symbolic-flags long-symbolic-flags
    0x0000000000000000 1 0 __________________________________
    0x0000000000000804 1 0 __R________M______________________ referenced,mmap
    0x000000000000086c 81 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap
    0x0000000000005808 5 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked
    0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked
    0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
    total 101 0

    The output of page-types does not show any hugepages.

    With my patches
    ---------------
    $ ./leak_pagemap
    flags page-count MB symbolic-flags long-symbolic-flags
    0x0000000000000000 1 0 __________________________________
    0x0000000000030000 51100 199 ________________TG________________ compound_tail,huge
    0x0000000000028018 100 0 ___UD__________H_G________________ uptodate,dirty,compound_head,huge
    0x0000000000000804 1 0 __R________M______________________ referenced,mmap
    0x000000000000080c 1 0 __RU_______M______________________ referenced,uptodate,mmap
    0x000000000000086c 80 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap
    0x0000000000005808 4 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked
    0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked
    0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
    total 51300 200

    The output of page-types shows 51200 pages contributing to hugepages,
    containing 100 head pages and 51100 tail pages as expected.
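
    The expected counts can be checked with a little arithmetic. The sketch
    assumes 2 MB hugepages (200MB == 100 hugepages) and 4 KB base pages; the
    base page size is an assumption that is consistent with the 51200 total.

```c
/* Number of base pages that make up one hugepage. */
unsigned long base_pages_per_hugepage(unsigned long huge_bytes,
                                      unsigned long base_bytes)
{
    return huge_bytes / base_bytes;
}

/* Every hugepage has one head page; all remaining base pages are tails. */
unsigned long tail_page_count(unsigned long nr_hugepages,
                              unsigned long pages_per_hugepage)
{
    return nr_hugepages * (pages_per_hugepage - 1);
}
```

    With these assumptions one hugepage spans 512 base pages, so 100
    hugepages give 100 head pages plus 100 * 511 = 51100 tail pages.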

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Andy Whitcroft
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Most callers of pmd_none_or_clear_bad() check whether the target page is
    in a hugepage or not, but walk_page_range() does not check it. So if we
    read /proc/pid/pagemap for a hugepage on an x86 machine, the hugepage
    memory is leaked as shown below. This patch fixes it.

    Details
    =======
    My test program (leak_pagemap) works as follows:
    - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
    - read()/write() something on it,
    - call page-types with option -p (walk around the page tables),
    - munmap() and unlink() the file on hugetlbfs

    Without my patches
    ------------------
    $ cat /proc/meminfo |grep "HugePage"
    HugePages_Total: 1000
    HugePages_Free: 1000
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    $ ./leak_pagemap
    [snip output]
    $ cat /proc/meminfo |grep "HugePage"
    HugePages_Total: 1000
    HugePages_Free: 900
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    $ ls /hugetlbfs/
    $

    100 hugepages are accounted as used while there is no file on hugetlbfs.

    With my patches
    ---------------
    $ cat /proc/meminfo |grep "HugePage"
    HugePages_Total: 1000
    HugePages_Free: 1000
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    $ ./leak_pagemap
    [snip output]
    $ cat /proc/meminfo |grep "HugePage"
    HugePages_Total: 1000
    HugePages_Free: 1000
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    $ ls /hugetlbfs
    $

    No memory leaks.

    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Andy Whitcroft
    Cc: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Most callers of pmd_none_or_clear_bad() check whether the target page is
    in a hugepage or not, but mincore() and walk_page_range() do not check it.
    So if we use mincore() on a hugepage on an x86 machine, the hugepage memory
    is leaked as shown below. This patch fixes it by extending the mincore()
    system call to support hugepages.
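
    For reference, a mincore() caller has roughly the following shape. This
    sketch is Linux-specific and uses an ordinary anonymous mapping instead
    of a hugetlbfs file, so it does not reproduce the leak; the function name
    is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map npages anonymous pages, touch them, and count how many mincore()
 * reports as resident. Returns -1 on any error. */
long count_resident_sketch(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *vec;
    long resident = 0;
    size_t len, i;
    char *p;

    if (page <= 0 || npages == 0)
        return -1;
    len = npages * (size_t)page;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    vec = malloc(npages);
    if (!vec) {
        munmap(p, len);
        return -1;
    }
    memset(p, 1, len); /* touch every page so it is faulted in */
    if (mincore(p, len, vec) == 0) {
        for (i = 0; i < npages; i++)
            resident += vec[i] & 1; /* bit 0: page is resident */
    } else {
        resident = -1;
    }
    free(vec);
    munmap(p, len);
    return resident;
}
```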

    Details
    =======
    My test program (leak_mincore) works as follows:
    - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
    - read()/write() something on it,
    - call mincore() for first ten pages and printf() the values of *vec
    - munmap() and unlink() the file on hugetlbfs

    Without my patch
    ----------------
    $ cat /proc/meminfo| grep "HugePage"
    HugePages_Total: 1000
    HugePages_Free: 1000
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    $ ./leak_mincore
    vec[0] 0
    vec[1] 0
    vec[2] 0
    vec[3] 0
    vec[4] 0
    vec[5] 0
    vec[6] 0
    vec[7] 0
    vec[8] 0
    vec[9] 0
    $ cat /proc/meminfo |grep "HugePage"
    HugePages_Total: 1000
    HugePages_Free: 999
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    $ ls /hugetlbfs/
    $

    Return values in *vec from mincore() are set to 0, while the hugepage
    should be in memory, and 1 hugepage is still accounted as used while
    there is no file on hugetlbfs.

    With my patch
    -------------
    $ cat /proc/meminfo| grep "HugePage"
    HugePages_Total: 1000
    HugePages_Free: 1000
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    $ ./leak_mincore
    vec[0] 1
    vec[1] 1
    vec[2] 1
    vec[3] 1
    vec[4] 1
    vec[5] 1
    vec[6] 1
    vec[7] 1
    vec[8] 1
    vec[9] 1
    $ cat /proc/meminfo |grep "HugePage"
    HugePages_Total: 1000
    HugePages_Free: 1000
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    $ ls /hugetlbfs/
    $

    Return value in *vec set to 1 and no memory leaks.

    [akpm@linux-foundation.org: cleanup]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Andy Whitcroft
    Cc: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • If a user asks for a hugepage pool resize but specifies a large number,
    the machine can begin thrashing. In response, they might hit ctrl-c but
    signals are ignored and the pool resize continues until it fails an
    allocation. This can take a considerable amount of time so this patch
    aborts a pool resize if a signal is pending.
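
    The same pattern in a userspace loop looks roughly like this. It is only
    an analogy: a flag set by a SIGINT handler stands in for the kernel's
    pending-signal check, and all names are hypothetical.

```c
#include <signal.h>

static volatile sig_atomic_t stop_requested;

static void on_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;
}

int install_sigint_sketch(void)
{
    return signal(SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}

/* Grow a pretend pool one unit at a time, giving up as soon as a signal
 * has been seen. Returns the size that was reached. */
long grow_pool_sketch(long target)
{
    long size = 0;

    while (size < target) {
        if (stop_requested)
            break;
        size++; /* stands in for one hugepage allocation */
    }
    return size;
}
```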

    Suggested by Dave Hansen.

    Signed-off-by: Mel Gorman
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Clean up stale comments on munlock_vma_page().

    Signed-off-by: Lee Schermerhorn
    Acked-by: Hugh Dickins
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • unevictable_migrate_page() in mm/internal.h is a relic of the since
    removed UNEVICTABLE_LRU Kconfig option. This patch removes the function
    and open codes the test in migrate_page_copy().

    Signed-off-by: Lee Schermerhorn
    Reviewed-by: Christoph Lameter
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • When the owner of a mapping fails COW because a child process is holding a
    reference, the children VMAs are walked and the page is unmapped. The
    i_mmap_lock is taken for the unmapping of the page but not the walking of
    the prio_tree. In theory, that tree could be changing if the lock is not
    held. This patch takes the i_mmap_lock properly for the duration of the
    prio_tree walk.

    [hugh.dickins@tiscali.co.uk: Spotted the problem in the first place]
    Signed-off-by: Mel Gorman
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Jan Engelhardt reported we have this problem:

    setting max_map_count to a value large enough results in programs dying at
    first try. This is on 2.6.31.6:

    15:59 borg:/proc/sys/vm # echo $[1<max_map_count
    15:59 borg:/proc/sys/vm # cat max_map_count
    1073741824
    15:59 borg:/proc/sys/vm # echo $[1<max_map_count
    15:59 borg:/proc/sys/vm # cat max_map_count
    Killed

    This is because 'max_map_count' can be made negative, which is
    meaningless. Make it accept only non-negative values.
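
    Why a negative limit is fatal can be seen from a small model. Both
    functions are hypothetical; the comparison is an assumption about how the
    limit is consulted, and rejecting values above INT_MAX goes slightly
    beyond what the message describes.

```c
#include <limits.h>
#include <stdbool.h>

/* Store a new limit only if it fits in a non-negative int. */
bool set_limit_sketch(int *limit, long long requested)
{
    if (requested < 0 || requested > INT_MAX)
        return false;
    *limit = (int)requested;
    return true;
}

/* A mapping is allowed while the count does not exceed the limit. */
bool mapping_allowed_sketch(int map_count, int limit)
{
    return map_count <= limit;
}
```

    With a negative limit even a single mapping is over the limit, so no
    program can map anything.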

    Reported-by: Jan Engelhardt
    Signed-off-by: WANG Cong
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: James Morris
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • The check code for CONFIG_SWAP is redundant, because there is a
    non-CONFIG_SWAP version for PageSwapCache() which just returns 0.

    Signed-off-by: Huang Shijie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • Modify the generic mmap() code to keep the cache attribute in
    vma->vm_page_prot regardless if writenotify is enabled or not. Without
    this patch the cache configuration selected by f_op->mmap() is overwritten
    if writenotify is enabled, making it impossible to keep the vma uncached.

    Needed by drivers such as drivers/video/sh_mobile_lcdcfb.c which uses
    deferred io together with uncached memory.

    Signed-off-by: Magnus Damm
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: Paul Mundt
    Cc: Jaya Kumar
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Magnus Damm
     
  • Simplify the code for shrink_inactive_list().

    Signed-off-by: Huang Shijie
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • In AIM7 runs, recent kernels start swapping out anonymous pages well
    before they should. This is due to shrink_list falling through to
    shrink_inactive_list if !inactive_anon_is_low(zone, sc), when all we
    really wanted to do is pre-age some anonymous pages to give them extra
    time to be referenced while on the inactive list.

    The obvious fix is to make sure that shrink_list does not fall through to
    scanning/reclaiming inactive pages when we called it to scan one of the
    active lists.

    This change should be safe because the loop in shrink_zone ensures that we
    will still shrink the anon and file inactive lists whenever we should.
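
    The change in control flow can be modelled in a few lines. This is a toy
    model of the anon lists only, written from the description above; the
    enum and function names are hypothetical and none of it is kernel code.

```c
#include <stdbool.h>

enum lru_sketch { INACTIVE_ANON, ACTIVE_ANON };
enum action_sketch { DID_NOTHING, AGED_ACTIVE, RECLAIMED_INACTIVE };

/* Old shape: a call for the active list falls through to inactive
 * reclaim whenever the inactive list is not low. */
enum action_sketch shrink_list_old(enum lru_sketch lru, bool inactive_is_low)
{
    if (lru == ACTIVE_ANON && inactive_is_low)
        return AGED_ACTIVE;
    return RECLAIMED_INACTIVE;
}

/* New shape: a call for the active list never reclaims inactive pages. */
enum action_sketch shrink_list_new(enum lru_sketch lru, bool inactive_is_low)
{
    if (lru == ACTIVE_ANON)
        return inactive_is_low ? AGED_ACTIVE : DID_NOTHING;
    return RECLAIMED_INACTIVE;
}
```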

    [kosaki.motohiro@jp.fujitsu.com: inactive_file_is_low() should be inactive_anon_is_low()]
    Reported-by: Larry Woodman
    Signed-off-by: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Tomasz Chmielewski
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Signed-off-by: Jan Beulich
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • Nodemasks should not be allocated on the stack for large systems (when it
    is larger than 256 bytes) since there is a threat of overflow.

    This patch causes the unregister_mem_sect_under_nodes() nodemask to be
    allocated on the stack for smaller systems and be allocated by slab for
    larger systems.

    GFP_KERNEL is used since remove_memory_block() can block.
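
    The stack-or-heap choice can be made at compile time with a pair of
    macros, as in the userspace sketch below. All names and the 128-bit mask
    size are hypothetical, and malloc() stands in for the slab allocator.

```c
#include <stdlib.h>
#include <string.h>

#define MASK_BITS_SKETCH 128 /* hypothetical system size */
#define MASK_BYTES_SKETCH (MASK_BITS_SKETCH / 8)

struct mask_sketch {
    unsigned char bits[MASK_BYTES_SKETCH];
};

#if MASK_BYTES_SKETCH > 256
/* Large mask: take it from the heap and release it afterwards. */
#define MASK_ALLOC_SKETCH(name) struct mask_sketch *name = malloc(sizeof(*name))
#define MASK_FREE_SKETCH(name)  free(name)
#else
/* Small mask: plain stack storage, nothing to release. */
#define MASK_ALLOC_SKETCH(name) \
    struct mask_sketch name##_storage, *name = &name##_storage
#define MASK_FREE_SKETCH(name)  ((void)0)
#endif

/* Count distinct node ids; each id must be below MASK_BITS_SKETCH. */
int count_marked_sketch(const int *nodes, int n)
{
    int i, count = 0;
    MASK_ALLOC_SKETCH(mask);

    if (!mask)
        return -1;
    memset(mask, 0, sizeof(*mask));
    for (i = 0; i < n; i++) {
        unsigned char bit = (unsigned char)(1u << (nodes[i] % 8));
        unsigned char *byte = &mask->bits[nodes[i] / 8];

        if (!(*byte & bit)) {
            *byte |= bit;
            count++;
        }
    }
    MASK_FREE_SKETCH(mask);
    return count;
}
```

    The caller is written once and works with either variant, because it
    only ever handles a pointer to the mask.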

    Cc: Gary Hade
    Cc: Badari Pulavarty
    Cc: Alex Chiang
    Signed-off-by: David Rientjes
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • SWAP_MLOCK means "We marked the page as PG_MLOCK, please move it to
    unevictable-lru". So the following code is easily confused:

    if (vma->vm_flags & VM_LOCKED) {
            ret = SWAP_MLOCK;
            goto out_unmap;
    }

    Plus, if the VMA doesn't have VM_LOCKED, we don't need to check
    whether mlock_vma_page() needs to be called.

    Also, add some commentary to try_to_unmap_one().

    Acked-by: Hugh Dickins
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro