14 Nov, 2010

1 commit

  • There are two places that do not release the slub_lock.

    Respective bugs were introduced by sysfs changes ab4d5ed5 (slub: Enable
    sysfs support for !CONFIG_SLUB_DEBUG) and 2bce6485 (slub: Allow removal
    of slab caches during boot).
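
    A minimal sketch of the fix pattern (illustrative, not the exact
    patch): any error path taken while slub_lock is held must release it
    before returning.

    down_write(&slub_lock);
    /* ... work that can fail while the lock is held ... */
    if (err) {
            up_write(&slub_lock);   /* this release was missing */
            return err;
    }
    up_write(&slub_lock);
    return 0;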

    Acked-by: Christoph Lameter
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Pekka Enberg

    Pavel Emelyanov
     

12 Nov, 2010

4 commits

  • Salman Qazi describes the following radix-tree bug:

    In the following case, we can get a deadlock:

    0. The radix tree contains two items, one has the index 0.
    1. The reader (in this case find_get_pages) takes the rcu_read_lock.
    2. The reader acquires slot(s) for item(s) including the index 0 item.
    3. The non-zero index item is deleted, and as a consequence the other item is
    moved to the root of the tree. The place where it used to be is queued for
    deletion after the readers finish.
    3b. The zero item is deleted, removing it from the direct slot, it remains in
    the rcu-delayed indirect node.
    4. The reader looks at the index 0 slot, and finds that the page has 0 ref
    count
    5. The reader looks at it again, hoping that the item will either be freed or
    the ref count will increase. This never happens, as the slot it is looking
    at will never be updated. Also, this slot can never be reclaimed because
    the reader is holding rcu_read_lock and is in an infinite loop.

    The fix is to generalise the existing "indirect pointer" case, which
    already forces a slot lookup retry, into a general "retry the lookup"
    bit.
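
    A sketch of the resulting reader pattern, using the
    radix_tree_deref_retry() helper this fix introduces (simplified from
    the gang-lookup loops):

    page = radix_tree_deref_slot(slot);
    if (unlikely(!page))
            continue;
    if (radix_tree_deref_retry(page))
            /* the slot moved or was freed under RCU; redo the whole
             * lookup instead of spinning on a stale slot */
            goto restart;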

    Signed-off-by: Nick Piggin
    Reported-by: Salman Qazi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • nr_dirty and nr_congested are increased only when the page is dirty. So
    if all pages are clean, both of them will be zero. In this case, we
    should not mark the zone congested.
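
    A sketch of the check (simplified; ZONE_CONGESTED is the zone flag
    of this era):

    /* Mark the zone congested only if dirty pages were actually
     * seen and all of them also hit congestion. */
    if (nr_dirty && nr_dirty == nr_congested)
            zone_set_flag(zone, ZONE_CONGESTED);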

    Signed-off-by: Shaohua Li
    Reviewed-by: Johannes Weiner
    Reviewed-by: Minchan Kim
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • 70 hours into some stress tests of a 2.6.32-based enterprise kernel, we
    ran into a NULL dereference in here:

    int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc,
                                    unsigned long from)
    {
    ---->       struct inode *inode = page->mapping->host;

    It looks like page->mapping was the culprit. (xmon trace is below).
    After closer examination, I realized that do_generic_file_read() does a
    find_get_page(), and eventually locks the page before calling
    block_is_partially_uptodate(). However, it doesn't revalidate the
    page->mapping after the page is locked. So, there's a small window
    between the find_get_page() and ->is_partially_uptodate() where the page
    could get truncated and page->mapping cleared.

    We _have_ a reference, so it can't get reclaimed, but it certainly
    can be truncated.

    I think the correct thing is to check page->mapping after the
    trylock_page(), and jump out if it got truncated. This patch has been
    running in the test environment for a month or so now, and we have not
    seen this bug pop up again.
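
    A sketch of the fix in do_generic_file_read() (simplified; label
    names approximate):

    if (!trylock_page(page))
            goto page_not_up_to_date;
    /* Did it get truncated before we got the lock? */
    if (!page->mapping)
            goto page_not_up_to_date_locked;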

    xmon info:

    1f:mon> e
    cpu 0x1f: Vector: 300 (Data Access) at [c0000002ae36f770]
    pc: c0000000001e7a6c: .block_is_partially_uptodate+0xc/0x100
    lr: c000000000142944: .generic_file_aio_read+0x1e4/0x770
    sp: c0000002ae36f9f0
    msr: 8000000000009032
    dar: 0
    dsisr: 40000000
    current = 0xc000000378f99e30
    paca = 0xc000000000f66300
    pid = 21946, comm = bash
    1f:mon> r
    R00 = 0025c0500000006d R16 = 0000000000000000
    R01 = c0000002ae36f9f0 R17 = c000000362cd3af0
    R02 = c000000000e8cd80 R18 = ffffffffffffffff
    R03 = c0000000031d0f88 R19 = 0000000000000001
    R04 = c0000002ae36fa68 R20 = c0000003bb97b8a0
    R05 = 0000000000000000 R21 = c0000002ae36fa68
    R06 = 0000000000000000 R22 = 0000000000000000
    R07 = 0000000000000001 R23 = c0000002ae36fbb0
    R08 = 0000000000000002 R24 = 0000000000000000
    R09 = 0000000000000000 R25 = c000000362cd3a80
    R10 = 0000000000000000 R26 = 0000000000000002
    R11 = c0000000001e7b60 R27 = 0000000000000000
    R12 = 0000000042000484 R28 = 0000000000000001
    R13 = c000000000f66300 R29 = c0000003bb97b9b8
    R14 = 0000000000000001 R30 = c000000000e28a08
    R15 = 000000000000ffff R31 = c0000000031d0f88
    pc = c0000000001e7a6c .block_is_partially_uptodate+0xc/0x100
    lr = c000000000142944 .generic_file_aio_read+0x1e4/0x770
    msr = 8000000000009032 cr = 22000488
    ctr = c0000000001e7a60 xer = 0000000020000000 trap = 300
    dar = 0000000000000000 dsisr = 40000000
    1f:mon> t
    [link register ] c000000000142944 .generic_file_aio_read+0x1e4/0x770
    [c0000002ae36f9f0] c000000000142a14 .generic_file_aio_read+0x2b4/0x770 (unreliable)
    [c0000002ae36fb40] c0000000001b03e4 .do_sync_read+0xd4/0x160
    [c0000002ae36fce0] c0000000001b153c .vfs_read+0xec/0x1f0
    [c0000002ae36fd80] c0000000001b1768 .SyS_read+0x58/0xb0
    [c0000002ae36fe30] c00000000000852c syscall_exit+0x0/0x40
    --- Exception: c00 (System Call) at 00000080a840bc54
    SP (fffca15df30) is in userspace
    1f:mon> di c0000000001e7a6c
    c0000000001e7a6c e9290000 ld r9,0(r9)
    c0000000001e7a70 418200c0 beq c0000000001e7b30 # .block_is_partially_uptodate+0xd0/0x100
    c0000000001e7a74 e9440008 ld r10,8(r4)
    c0000000001e7a78 78a80020 clrldi r8,r5,32
    c0000000001e7a7c 3c000001 lis r0,1
    c0000000001e7a80 812900a8 lwz r9,168(r9)
    c0000000001e7a84 39600001 li r11,1
    c0000000001e7a88 7c080050 subf r0,r8,r0
    c0000000001e7a8c 7f805040 cmplw cr7,r0,r10
    c0000000001e7a90 7d6b4830 slw r11,r11,r9
    c0000000001e7a94 796b0020 clrldi r11,r11,32
    c0000000001e7a98 419d00a8 bgt cr7,c0000000001e7b40 # .block_is_partially_uptodate+0xe0/0x100
    c0000000001e7a9c 7fa55840 cmpld cr7,r5,r11
    c0000000001e7aa0 7d004214 add r8,r0,r8
    c0000000001e7aa4 79080020 clrldi r8,r8,32
    c0000000001e7aa8 419c0078 blt cr7,c0000000001e7b20 # .block_is_partially_uptodate+0xc0/0x100

    Signed-off-by: Dave Hansen
    Reviewed-by: Minchan Kim
    Reviewed-by: Johannes Weiner
    Acked-by: Rik van Riel
    Cc:
    Cc:
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Minchan Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • The original code had a null dereference if alloc_percpu() failed. This
    was introduced in commit 711d3d2c9bc3 ("memcg: cpu hotplug aware percpu
    count updates")

    Signed-off-by: Dan Carpenter
    Reviewed-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     

10 Nov, 2010

1 commit

  • As pointed out by Linus, commit dab5855 ("perf_counter: Add mmap event hooks to
    mprotect()") is fundamentally wrong as mprotect_fixup() can free 'vma' due to
    merging. Fix the problem by moving perf_event_mmap() hook to
    mprotect_fixup().

    Note: there's another successful return path from mprotect_fixup() if
    the old flags equal the new flags. We don't, however, need to call
    perf_event_mmap() there because 'perf' already knows the VMA is
    executable.
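
    A sketch of the resulting flow inside mprotect_fixup() (simplified):

    if (newflags == oldflags) {
            /* nothing changed; perf already knows this VMA */
            *pprev = vma;
            return 0;
    }
    /* ... perform the fixup; the (possibly merged) vma is still
     * valid at this point ... */
    perf_event_mmap(vma);
    return 0;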

    Reported-by: Dave Jones
    Analyzed-by: Linus Torvalds
    Cc: Ingo Molnar
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     

04 Nov, 2010

1 commit

  • Fix regression introduced by commit 79da826aee6 ("writeback: report
    dirty thresholds in /proc/vmstat").

    The incorrect pointer arithmetic can result in problems like this:

    BUG: unable to handle kernel paging request at 07c06d16
    IP: [] strnlen+0x6/0x20
    Call Trace:
    [] ? string+0x39/0xe0
    [] ? __wake_up_common+0x4b/0x80
    [] ? vsnprintf+0x1ec/0x380
    [] ? seq_printf+0x2e/0x60
    [] ? vmstat_show+0x26/0x30
    [] ? seq_read+0xa6/0x380
    [] ? seq_read+0x0/0x380
    [] ? proc_reg_read+0x5f/0x90
    [] ? vfs_read+0xa1/0x140
    [] ? proc_reg_read+0x0/0x90
    [] ? sys_read+0x41/0x70
    [] ? sysenter_do_call+0x12/0x26

    Reported-by: Tetsuo Handa
    Cc: Michael Rubin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

30 Oct, 2010

1 commit

  • Normal syscall audit doesn't catch the 5th argument of a syscall. It
    also doesn't catch the contents of userland structures pointed to by
    a syscall argument, so for both the old and new mmap(2) ABIs it
    doesn't record the descriptor we are mapping. For the old ABI it
    also misses the flags.

    Signed-off-by: Al Viro

    Al Viro
     

29 Oct, 2010

2 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • When a node contains only HighMem memory, slab_node(MPOL_BIND)
    dereferences a NULL pointer.

    [ This code seems to go back all the way to commit 19770b32609b: "mm:
    filter based on a nodemask as well as a gfp_mask". Which was back in
    April 2008, and it got merged into 2.6.26. - Linus ]

    Signed-off-by: Eric Dumazet
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

28 Oct, 2010

10 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300: (44 commits)
    MN10300: Save frame pointer in thread_info struct rather than global var
    MN10300: Change "Matsushita" to "Panasonic".
    MN10300: Create a defconfig for the ASB2364 board
    MN10300: Update the ASB2303 defconfig
    MN10300: ASB2364: Add support for SMSC911X and SMC911X
    MN10300: ASB2364: Handle the IRQ multiplexer in the FPGA
    MN10300: Generic time support
    MN10300: Specify an ELF HWCAP flag for MN10300 Atomic Operations Unit support
    MN10300: Map userspace atomic op regs as a vmalloc page
    MN10300: Add Panasonic AM34 subarch and implement SMP
    MN10300: Delete idle_timestamp from irq_cpustat_t
    MN10300: Make various interrupt priority settings configurable
    MN10300: Optimise do_csum()
    MN10300: Implement atomic ops using atomic ops unit
    MN10300: Make the FPU operate in non-lazy mode under SMP
    MN10300: SMP TLB flushing
    MN10300: Use the [ID]PTEL2 registers rather than [ID]PTEL for TLB control
    MN10300: Make the use of PIDR to mark TLB entries controllable
    MN10300: Rename __flush_tlb*() to local_flush_tlb*()
    MN10300: AM34 erratum requires MMUCTR read and write on exception entry
    ...

    Linus Torvalds
     
  • Replace iterated page_cache_release() with release_pages(), which is
    faster and shorter.

    Needs release_pages() to be exported to modules.
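
    The shape of the change (sketch; release_pages() takes the array,
    the count, and a "cold" hint):

    /* before */
    for (i = 0; i < nr_pages; i++)
            page_cache_release(pages[i]);

    /* after */
    release_pages(pages, nr_pages, 0);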

    Suggested-by: Andrew Morton
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This patch extracts the core logic from mem_cgroup_update_file_mapped() as
    mem_cgroup_update_file_stat() and adds a wrapper.

    As a planned future update, the memory cgroup has to count dirty pages
    to implement dirty_ratio/limit. Moreover, the number of dirty pages is
    required to kick the flusher thread to start writeback. (For now,
    there is no kick.)

    This patch is preparation for that and makes the implementation of
    other statistics clearer. Just a cleanup.
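
    A sketch of the split (simplified from the patch):

    static void mem_cgroup_update_file_stat(struct page *page, int idx,
                                            int val)
    {
            /* core logic: find pc->mem_cgroup for this page and
             * adjust the chosen per-cpu statistic by val */
    }

    void mem_cgroup_update_file_mapped(struct page *page, int val)
    {
            mem_cgroup_update_file_stat(page,
                            MEM_CGROUP_STAT_FILE_MAPPED, val);
    }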

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Reviewed-by: Greg Thelen
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • An event counter, MEM_CGROUP_ON_MOVE, is used as a quick check of
    whether a file stat update can be done asynchronously or not. It is
    currently a percpu counter, updated with for_each_possible_cpu.

    This patch replaces for_each_possible_cpu with for_each_online_cpu
    and adds the necessary synchronization logic for CPU hotplug.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Currently, memcg's per-cpu counter uses for_each_possible_cpu() to
    read the value. It is better to use for_each_online_cpu() together
    with a CPU hotplug handler.

    This patch only handles the statistics counters; MEM_CGROUP_ON_MOVE
    will be handled in another patch.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In memory cgroup management, we sometimes have to walk through a
    subhierarchy of cgroups to gather information, lock something, etc.

    Currently, the mem_cgroup_walk_tree() function is provided for this.
    It calls a given callback function per cgroup found. The bad part is
    that it has to take a fixed-style function and a "void *" argument,
    which adds a lot of type casting to memcontrol.c.

    To make the code clean, this patch replaces walk_tree() with

    for_each_mem_cgroup_tree(iter, root)

    an iterator-style call. The good point is that the iterator doesn't
    have to assume what kind of function is called under it. The bad
    point is that it may leak a reference count if a caller mistakenly
    uses "break" to exit the loop.

    I think the benefit is larger. The modified code seems
    straightforward and easy to read because there are no mysterious
    callbacks and pointer casts.
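
    The shape of the conversion (sketch; the callback name is made up,
    'iter' is a struct mem_cgroup pointer, and the macro handles the
    css reference counting):

    /* before: callback plus an untyped cookie */
    mem_cgroup_walk_tree(root, (void *)&total, walk_cb);

    /* after: an ordinary loop body, no casts */
    for_each_mem_cgroup_tree(iter, root)
            total += mem_cgroup_read_stat(iter, idx);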

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • When accounting file events per memory cgroup, we need to find the
    memory cgroup via page_cgroup->mem_cgroup. Now, we use
    lock_page_cgroup() to guarantee that pc->mem_cgroup is not overwritten
    while we make use of it.

    But, considering the context in which page_cgroup for file pages is
    accessed, we can use an alternative lightweight mutual exclusion in
    most cases.

    When handling file caches, the only race we have to take care of is
    "moving" an account, IOW, overwriting page_cgroup->mem_cgroup. (See
    the comment in the patch.)

    Unlike charge/uncharge, "move" happens rarely: only on rmdir() and on
    task moving (with special settings). This patch adds a race checker
    for file-cache-status accounting vs. account moving. A new
    per-cpu-per-memcg counter MEM_CGROUP_ON_MOVE is added. The account
    move routine:
    1. increments it before the move starts,
    2. calls synchronize_rcu(),
    3. decrements it after the move ends.
    With this, the file-status-counting routine can check whether it
    needs to call lock_page_cgroup(); in most cases it doesn't.
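
    A sketch of the protocol (simplified; accessor spelling
    approximate):

    /* account-move side */
    for_each_online_cpu(cpu)
            per_cpu_ptr(mem->stat, cpu)->count[MEM_CGROUP_ON_MOVE] += 1;
    synchronize_rcu();              /* flush in-flight lockless readers */
    /* ... overwrite pc->mem_cgroup ... */
    for_each_online_cpu(cpu)
            per_cpu_ptr(mem->stat, cpu)->count[MEM_CGROUP_ON_MOVE] -= 1;

    /* file-stat side, under rcu_read_lock() */
    if (this_cpu_read(mem->stat->count[MEM_CGROUP_ON_MOVE]) > 0)
            lock_page_cgroup(pc);   /* rare, safe slow path */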

    Following is perf data of a process which mmap()s/munmap()s 32MB of
    file cache in a minute.

    Before patch:
    28.25% mmap mmap [.] main
    22.64% mmap [kernel.kallsyms] [k] page_fault
    9.96% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped
    3.67% mmap [kernel.kallsyms] [k] filemap_fault
    3.50% mmap [kernel.kallsyms] [k] unmap_vmas
    2.99% mmap [kernel.kallsyms] [k] __do_fault
    2.76% mmap [kernel.kallsyms] [k] find_get_page

    After patch:
    30.00% mmap mmap [.] main
    23.78% mmap [kernel.kallsyms] [k] page_fault
    5.52% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped
    3.81% mmap [kernel.kallsyms] [k] unmap_vmas
    3.26% mmap [kernel.kallsyms] [k] find_get_page
    3.18% mmap [kernel.kallsyms] [k] __do_fault
    3.03% mmap [kernel.kallsyms] [k] filemap_fault
    2.40% mmap [kernel.kallsyms] [k] handle_mm_fault
    2.40% mmap [kernel.kallsyms] [k] do_page_fault

    This patch reduces memcg's cost to some extent.
    (mem_cgroup_update_file_mapped is called by both map and unmap.)

    Note: it seems some more improvements are required, but I have no
    idea yet; maybe removing the set/unset of the flag is required.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently, memory cgroup accounts file-mapped pages with a counter
    and a flag. The counter works in the same way as zone_stat, but the
    FileMapped flag only exists in memcg (to help move_account).

    This flag can be updated wrongly in one case. Assume CPU0 and CPU1,
    with one thread mapping a page on CPU0 and another thread unmapping
    it on CPU1:

    CPU0                              CPU1
                                      rmv rmap (mapcount 1->0)
    add rmap (mapcount 0->1)
    lock_page_cgroup()
    memcg counter+1                   (some delay)
    set MAPPED FLAG.
    unlock_page_cgroup()
                                      lock_page_cgroup()
                                      memcg counter-1
                                      clear MAPPED flag

    In the above sequence the counter is properly updated but the FLAG is
    not. This means that representing a state by a flag that is maintained
    by a counter needs some special care.

    To handle this, when clearing the flag, this patch checks the mapcount
    directly and clears the flag only when mapcount == 0. (If mapcount >
    0, someone will bring it to zero later, and the flag will be cleared
    then.)

    The reverse case, dec-after-inc, cannot be a problem because
    page_table_lock works well for it. (IOW, to produce the above
    sequence, two processes would have to touch the same page at once
    with map/unmap.)
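
    A sketch of the fix on the update side (simplified from the patch):

    if (val > 0) {
            SetPageCgroupFileMapped(pc);
    } else if (!page_mapped(page)) {
            /* Clear only when the page is really unmapped; if a
             * racing mapper won, whoever later brings mapcount to
             * zero will clear the flag. */
            ClearPageCgroupFileMapped(pc);
    }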

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • It appears i386 uses kmap_atomic infrastructure regardless of
    CONFIG_HIGHMEM which results in a compile error when highmem is disabled.

    Cure this by providing the needed few bits for both CONFIG_HIGHMEM and
    CONFIG_X86_32.

    Signed-off-by: Peter Zijlstra
    Reported-by: Chris Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Save the current exception frame pointer in the thread_info struct rather than
    in a global variable as the latter makes SMP tricky, especially when preemption
    is also enabled.

    This also replaces __frame with current_frame() and rearranges header file
    inclusions to make it all compile.

    Signed-off-by: David Howells
    Acked-by: Akira Takeuchi

    David Howells
     

27 Oct, 2010

18 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    split invalidate_inodes()
    fs: skip I_FREEING inodes in writeback_sb_inodes
    fs: fold invalidate_list into invalidate_inodes
    fs: do not drop inode_lock in dispose_list
    fs: inode split IO and LRU lists
    fs: switch bdev inode bdi's correctly
    fs: fix buffer invalidation in invalidate_list
    fsnotify: use dget_parent
    smbfs: use dget_parent
    exportfs: use dget_parent
    fs: use RCU read side protection in d_validate
    fs: clean up dentry lru modification
    fs: split __shrink_dcache_sb
    fs: improve DCACHE_REFERENCED usage
    fs: use percpu counter for nr_dentry and nr_dentry_unused
    fs: simplify __d_free
    fs: take dcache_lock inside __d_path
    fs: do not assign default i_ino in new_inode
    fs: introduce a per-cpu last_ino allocator
    new helper: ihold()
    ...

    Linus Torvalds
     
  • PF_FLUSHER is only ever set, never tested; remove it.

    Signed-off-by: Peter Zijlstra
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
    After all, that's what they are intended for.

    Signed-off-by: Jan Beulich
    Cc: Miklos Szeredi
    Cc: "Eric W. Biederman"
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • Use the new {max,min}3 macros to save some cycles and bytes on the
    stack. This patch substitutes trivially nested min()/max() calls with
    their three-argument counterparts.
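
    For example (assuming the three operands share a type, since the
    macros are type-checked):

    /* before */
    len = min(len, min(space, chunk));

    /* after */
    len = min3(len, space, chunk);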

    Signed-off-by: Hagen Paul Pfeifer
    Cc: Joe Perches
    Cc: Ingo Molnar
    Cc: Hartley Sweeten
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Herbert Xu
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hagen Paul Pfeifer
     
  • A simple cleanup that reduces the list_empty(&source) checks.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • If not_managed is true, all pages will be put back to the LRU, so
    break out of the loop earlier to skip isolating the remaining pages.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • __test_page_isolated_in_pageblock() returns 1 if all pages in the
    range are isolated, so fix the comment. Variable `pfn' will be
    initialised in the following loop, so remove the redundant
    initialisation.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • page_order() is called by memory hotplug's user interface to check
    whether a section is removable (is_mem_section_removable()).

    It calls page_order() without holding zone->lock. So, even if the
    caller does

    if (PageBuddy(page))
            ret = page_order(page) ...

    the caller may hit BUG_ON().

    For fixing this, there are 2 choices:
    1. add zone->lock.
    2. remove BUG_ON().

    is_mem_section_removable() is used for "advice" and doesn't need to
    be 100% accurate. It can be called from a user program, and we don't
    want to hold this important lock for long at a user's request. So,
    this patch removes the BUG_ON().

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Acked-by: Michal Hocko
    Acked-by: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Add a missing spin_lock() of the page_table_lock before an error
    return in hugetlb_cow(). Callers of hugetlb_cow() expect it to be
    held upon return.

    Signed-off-by: Dean Nelson
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
     
  • The vma returned by find_vma does not necessarily include the target
    address. If this happens, the code tries to follow a page outside of
    any vma and returns ENOENT instead of EFAULT.
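
    The canonical check, which is what the fix adds (sketch):

    vma = find_vma(mm, addr);
    if (!vma || addr < vma->vm_start)
            return -EFAULT;         /* addr lies outside every vma */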

    Signed-off-by: Gleb Natapov
    Acked-by: Christoph Lameter
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gleb Natapov
     
  • System management wants to subscribe to changes in swap configuration.
    Make /proc/swaps pollable like /proc/mounts.

    [akpm@linux-foundation.org: document proc_poll_event]
    Signed-off-by: Kay Sievers
    Acked-by: Greg KH
    Cc: Jonathan Corbet
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kay Sievers
     
  • Add vzalloc() and vzalloc_node() to encapsulate the
    vmalloc-then-memset-zero operation.

    Use __GFP_ZERO to zero fill the allocated memory.
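
    In essence (a sketch; the real helpers go through the node-aware
    allocation paths):

    void *vzalloc(unsigned long size)
    {
            return __vmalloc(size, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
    }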

    Signed-off-by: Dave Young
    Cc: Christoph Lameter
    Acked-by: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
  • Reported-by: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This removes the following warning from sparse:

    mm/vmstat.c:466:5: warning: symbol 'fragmentation_index' was not declared. Should it be static?

    [akpm@linux-foundation.org: move the include to top-of-file]
    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • s_start() and s_stop() grab/release vmlist_lock but were missing proper
    annotations. Add them.
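
    The annotated functions look roughly like this (simplified):

    static void *s_start(struct seq_file *m, loff_t *pos)
            __acquires(&vmlist_lock)
    {
            loff_t n = *pos;
            struct vm_struct *v;

            read_lock(&vmlist_lock);
            for (v = vmlist; n > 0 && v; n--)
                    v = v->next;
            return v;
    }

    static void s_stop(struct seq_file *m, void *p)
            __releases(&vmlist_lock)
    {
            read_unlock(&vmlist_lock);
    }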

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Rename the redundant 'tmp' to fix the following sparse warnings:

    mm/vmalloc.c:296:34: warning: symbol 'tmp' shadows an earlier one
    mm/vmalloc.c:293:24: originally declared here

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Make anon_vma_chain_free() static. It is called only in rmap.c and the
    corresponding alloc function is already static.

    Signed-off-by: Namhyung Kim
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • page_check_address() conditionally grabs *@ptlp when it returns
    non-NULL. Renaming and wrapping it with __cond_lock() (sketched after
    the warnings below) removes the following warnings from sparse:

    mm/rmap.c:472:9: warning: context imbalance in 'page_mapped_in_vma' - unexpected unlock
    mm/rmap.c:524:9: warning: context imbalance in 'page_referenced_one' - unexpected unlock
    mm/rmap.c:706:9: warning: context imbalance in 'page_mkclean_one' - unexpected unlock
    mm/rmap.c:1066:9: warning: context imbalance in 'try_to_unmap_one' - unexpected unlock
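
    The wrapper pattern (essentially what the patch does): the worker
    that really takes the lock gains a __ prefix, and the old name
    becomes an inline wrapper whose __cond_lock() tells sparse the lock
    is held exactly when the return value is non-NULL.

    static inline pte_t *page_check_address(struct page *page,
                    struct mm_struct *mm, unsigned long address,
                    spinlock_t **ptlp, int sync)
    {
            pte_t *ptep;

            __cond_lock(*ptlp,
                    ptep = __page_check_address(page, mm, address,
                                                ptlp, sync));
            return ptep;
    }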

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim