02 Apr, 2008

1 commit

  • Fix a small typo in the recently merged patch that avoids the unused-symbol
    warning for count_partial(). The discussion thread confirming the fix is at
    http://marc.info/?t=120696854400001&r=1&w=2

    The typo is in the check, introduced by 53625b4204753b904addd40ca96d9ba802e6977d,
    that decides whether the count_partial() function is needed.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

31 Mar, 2008

1 commit


28 Mar, 2008

1 commit


27 Mar, 2008

4 commits

  • Running the counters testcase from libhugetlbfs results in the following
    soft lockup on 2.6.25-rc5 and 2.6.25-rc5-mm1:

    BUG: soft lockup - CPU#3 stuck for 61s! [counters:10531]
    NIP: c0000000000d1f3c LR: c0000000000d1f2c CTR: c0000000001b5088
    REGS: c000005db12cb360 TRAP: 0901 Not tainted (2.6.25-rc5-autokern1)
    MSR: 8000000000009032 CR: 48008448 XER: 20000000
    TASK = c000005dbf3d6000[10531] 'counters' THREAD: c000005db12c8000 CPU: 3
    GPR00: 0000000000000004 c000005db12cb5e0 c000000000879228 0000000000000004
    GPR04: 0000000000000010 0000000000000000 0000000000200200 0000000000100100
    GPR08: c0000000008aba10 000000000000ffff 0000000000000004 0000000000000000
    GPR12: 0000000028000442 c000000000770080
    NIP [c0000000000d1f3c] .return_unused_surplus_pages+0x84/0x18c
    LR [c0000000000d1f2c] .return_unused_surplus_pages+0x74/0x18c
    Call Trace:
    [c000005db12cb5e0] [c000005db12cb670] 0xc000005db12cb670 (unreliable)
    [c000005db12cb670] [c0000000000d24c4] .hugetlb_acct_memory+0x2e0/0x354
    [c000005db12cb740] [c0000000001b5048] .truncate_hugepages+0x1d4/0x214
    [c000005db12cb890] [c0000000001b50a4] .hugetlbfs_delete_inode+0x1c/0x3c
    [c000005db12cb920] [c000000000103fd8] .generic_delete_inode+0xf8/0x1c0
    [c000005db12cb9b0] [c0000000001b5100] .hugetlbfs_drop_inode+0x3c/0x24c
    [c000005db12cba50] [c00000000010287c] .iput+0xdc/0xf8
    [c000005db12cbad0] [c0000000000fee54] .dentry_iput+0x12c/0x194
    [c000005db12cbb60] [c0000000000ff050] .d_kill+0x6c/0xa4
    [c000005db12cbbf0] [c0000000000ffb74] .dput+0x18c/0x1b0
    [c000005db12cbc70] [c0000000000e9e98] .__fput+0x1a4/0x1e8
    [c000005db12cbd10] [c0000000000e61ec] .filp_close+0xb8/0xe0
    [c000005db12cbda0] [c0000000000e62d0] .sys_close+0xbc/0x134
    [c000005db12cbe30] [c00000000000872c] syscall_exit+0x0/0x40
    Instruction dump:
    ebbe8038 38800010 e8bf0002 3bbd0008 7fa3eb78 38a50001 7ca507b4 4818df25
    60000000 38800010 38a00000 7c601b78 2f800010 409d0008 38000010

    This was tracked down to a potential livelock in
    return_unused_surplus_pages(). In the case where we have surplus
    pages on some node, but no free pages on the same node, we may never
    break out of the loop. To avoid this livelock, terminate the search if
    we iterate a number of times equal to the number of online nodes without
    freeing a page.
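
    A minimal sketch of the bounded search described above (hypothetical code;
    the names follow mm/hugetlb.c of that era, but this is illustrative and
    not the exact diff):

        /* stop once every online node has been visited without freeing
         * a page, so a node with surplus pages but no free pages cannot
         * keep us spinning forever; nid and nr_pages come from the
         * surrounding function */
        unsigned long remaining_iterations = num_online_nodes();

        while (remaining_iterations-- && nr_pages) {
                nid = next_node(nid, node_online_map);
                if (nid == MAX_NUMNODES)
                        nid = first_node(node_online_map);

                if (!surplus_huge_pages_node[nid])
                        continue;

                if (free_one_surplus_page(nid)) {   /* illustrative helper */
                        nr_pages--;
                        /* made progress: allow another full pass */
                        remaining_iterations = num_online_nodes();
                }
        }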

    Thanks to Andy Whitcroft and Adam Litke for helping with debugging and
    the patch.

    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     
  • Currently we show the surplus hugetlb pool state in /proc/meminfo, but
    not in the per-node meminfo files, even though we track the information
    on a per-node basis. Printing it there can help track down dynamic pool
    bugs including the one in the follow-on patch.
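
    A hedged sketch of the per-node report; hugetlb_report_node_meminfo() and
    the per-node counters exist in mm/hugetlb.c of that era, but the exact
    format of the added line is illustrative:

        int hugetlb_report_node_meminfo(int nid, char *buf)
        {
                return sprintf(buf,
                        "Node %d HugePages_Total: %5u\n"
                        "Node %d HugePages_Free:  %5u\n"
                        "Node %d HugePages_Surp:  %5u\n",  /* the new line */
                        nid, nr_huge_pages_node[nid],
                        nid, free_huge_pages_node[nid],
                        nid, surplus_huge_pages_node[nid]);
        }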

    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     
  • Commit 556a169dab38b5100df6f4a45b655dddd3db94c1 ("slab: fix bootstrap on
    memoryless node") introduced bootstrap-time cache_cache list3s for all nodes
    but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This
    patch fixes list_add() corruption in mm/slab.c seen on the ES7000.
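
    A hedged illustration of the "[somevalue + node]" indexing; the array, the
    CACHE_CACHE base constant and init_list() come from mm/slab.c, but the
    surrounding loop is simplified and should not be read as the exact diff:

        /* bootstrap list3s live in one flat array, grouped per cache:
         *   [CACHE_CACHE + node], [SIZE_AC + node], [SIZE_L3 + node] */
        for_each_online_node(node) {
                /* wrong: every node ends up on slot 0 of the group      */
                /* init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node); */

                /* right: offset the per-cache base by the node number   */
                init_list(&cache_cache,
                          &initkmem_list3[CACHE_CACHE + node], node);
        }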

    Cc: Mel Gorman
    Cc: Olaf Hering
    Cc: Christoph Lameter
    Signed-off-by: Dan Yeisley
    Signed-off-by: Pekka Enberg
    Signed-off-by: Christoph Lameter

    Daniel Yeisley
     
  • Avoid warnings about unused functions if neither SLUB_DEBUG nor CONFIG_SLABINFO
    is defined. This patch will be reverted when slab defrag is merged, since slab
    defrag requires count_partial() to determine the fragmentation status of
    slab caches.
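
    A hedged sketch of the resulting guard in mm/slub.c (the function body is
    reproduced from memory of that era's code, and the condition later needed
    the typo fix noted in the 02 Apr entry above):

        #if defined(CONFIG_SLUB_DEBUG) || defined(CONFIG_SLABINFO)
        /* only built when the sysfs/slabinfo code actually uses it */
        static unsigned long count_partial(struct kmem_cache_node *n)
        {
                unsigned long flags;
                unsigned long x = 0;
                struct page *page;

                spin_lock_irqsave(&n->list_lock, flags);
                list_for_each_entry(page, &n->partial, lru)
                        x += page->inuse;
                spin_unlock_irqrestore(&n->list_lock, flags);
                return x;
        }
        #endif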

    Signed-off-by: Christoph Lameter

    Christoph Lameter
     

25 Mar, 2008

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    [PATCH] get stack footprint of pathname resolution back to relative sanity
    [PATCH] double iput() on failure exit in hugetlb
    [PATCH] double dput() on failure exit in tiny-shmem
    [PATCH] fix up new filp allocators
    [PATCH] check for null vfsmount in dentry_open()
    [PATCH] reiserfs: eliminate private use of struct file in xattr
    [PATCH] sanitize hppfs
    hppfs pass vfsmount to dentry_open()
    [PATCH] restore export of do_kern_mount()

    Linus Torvalds
     
  • Revert commit f1a9ee758de7de1e040de849fdef46e6802ea117:

    Author: Rik van Riel
    Date: Thu Feb 7 00:14:08 2008 -0800

    kswapd should only wait on IO if there is IO

    The current kswapd (and try_to_free_pages) code has an oddity where the
    code will wait on IO, even if there is no IO in flight. This problem is
    notable especially when the system scans through many unfreeable pages,
    causing unnecessary stalls in the VM.

    Additionally, tasks without __GFP_FS or __GFP_IO in the direct reclaim path
    will sleep if a significant number of pages are encountered that should be
    written out. This gives kswapd a chance to write out those pages, while
    the direct reclaim task sleeps.

    Signed-off-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Reverted because of the large latencies and interactivity problems reported
    by Carlos here: http://lkml.org/lkml/2008/3/22/211

    Cc: Rik van Riel
    Cc: "Carlos R. Mafra"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • With NUMA enabled, a caller may hold a range of memory that lives on one
    node but try to free it on another node, which can cause some pages to be
    freed incorrectly.

    For example, we allocate 128G of boot RAM early for gart/swiotlb and free
    that range later so gart/swiotlb can get some range afterwards.

    With this patch, we don't need to care which node holds the range; we can
    just loop over all online nodes and call free_bootmem_node for each of
    them, as sketched below.

    The patch makes free_bootmem_core() more robust by trimming sidx and eidx
    to the RAM range that the node actually has, so free_bootmem_core handles
    the out-of-range case. By walking bdata_list we could make sure the range
    gets freed in any case, so in the future we wouldn't need to loop over
    online nodes at all and could use free_bootmem directly.
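
    A minimal sketch of the caller-side loop described above (illustrative;
    addr and size stand in for the gart/swiotlb range):

        /* free a range without knowing which node's bootmem holds it */
        int nid;

        for_each_online_node(nid)
                free_bootmem_node(NODE_DATA(nid), addr, size);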

    Signed-off-by: Yinghai Lu
    Cc: Andi Kleen
    Cc: Yasunori Goto
    Cc: KAMEZAWA Hiroyuki
    Acked-by: Ingo Molnar
    Tested-by: Ingo Molnar
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

20 Mar, 2008

7 commits

  • Fix kernel-doc notation in mm/readahead.c.

    Change ":" to ";" so that it doesn't get treated as a doc section heading.
    Move the comment block ending "*/" to a line by itself so that the text on
    that last line is not lost (dropped).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • The check t->tgid == t->pid is not the blessed way to check whether a task
    is a group leader.

    This is not only about code beauty, but about the pid namespace fixes -
    both the tgid and the pid fields on the task_struct are (slowly :( )
    becoming deprecated.

    Besides, the thread_group_leader() macro makes only one dereference :)
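
    For illustration, the preferred form; thread_group_leader() is a real
    helper in <linux/sched.h>, while the surrounding snippet is hypothetical:

        /* open-coded check the patch removes */
        if (p->tgid == p->pid)
                handle_group_leader(p);         /* illustrative */

        /* blessed form: one dereference, no reliance on ->tgid/->pid */
        if (thread_group_leader(p))
                handle_group_leader(p);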

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Correct kernel-doc function names and parameters in rmap.c.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Add kernel-doc comments to highmem.c.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Fix kernel-doc notation in oom_kill.c.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Convert tiny-shmem.c function comments to kernel-doc. Add parameters and
    convert/fix other kernel-doc in shmem.c.

    Signed-off-by: Randy Dunlap
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Fix various kernel-doc notation in mm/:

    filemap.c: add function short description; convert 2 to kernel-doc
    fremap.c: change parameter 'prot' to @prot
    pagewalk.c: change "-" in function parameters to ":"
    slab.c: fix short description of kmem_ptr_validate()
    swap.c: fix description & parameters of put_pages_list()
    swap_state.c: fix function parameters
    vmalloc.c: change "@returns" to "Returns:" since that is not a parameter

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

19 Mar, 2008

1 commit


18 Mar, 2008

1 commit

  • The fallback path needs to enable interrupts, as is done for the other
    page allocator calls. This was not necessary with the alternate fast path,
    since we handled irq enable/disable in the slow path. The regular fastpath
    handles irq enable/disable around calls to the slow path, so we need to
    restore the proper irq state before calling the page allocator from the
    slowpath.
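
    A hedged sketch of the pattern, as it appears around SLUB's call into the
    slow path (illustrative, not the exact diff for the fallback path):

        if (gfpflags & __GFP_WAIT)
                local_irq_enable();           /* page allocator may sleep  */

        page = new_slab(s, gfpflags, node);   /* ends up in alloc_pages()  */

        if (gfpflags & __GFP_WAIT)
                local_irq_disable();          /* fastpath expects irqs off */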

    Signed-off-by: Christoph Lameter

    Christoph Lameter
     

11 Mar, 2008

3 commits

  • iov_iter_advance() skips over zero-length iovecs; however, it does not
    properly terminate at the end of the iovec array. Fix this by checking
    against i->count before we skip a zero-length iov.
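
    A hedged sketch of the guard; the real fix lives in the iovec-advancing
    loop in mm/filemap.c, and the names here are illustrative:

        /* advance over the iovec array; skip zero-length segments, but
         * stop once nothing is left to copy so we never run past the
         * end of the array */
        while (bytes || (i->count && !iov->iov_len)) {
                size_t copy = min(bytes, iov->iov_len - base);

                i->count -= copy;
                bytes -= copy;
                base += copy;
                if (iov->iov_len == base) {     /* segment consumed */
                        iov++;
                        base = 0;
                }
        }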

    The bug was reproduced with a test program that continually creates random
    iovecs and passes them to writev. The fix was verified with the same
    program, which also checked that the file contained the correct data after
    each writev.

    Signed-off-by: Nick Piggin
    Tested-by: "Kevin Coffman"
    Cc: "Alexey Dobriyan"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Free pages in the hugetlb pool are free and as such have a reference count of
    zero. Regular allocations into the pool from the buddy are "freed" into the
    pool, which results in their page_count dropping to zero. However, surplus
    pages can be directly utilized by the caller without first being freed to the
    pool. Therefore, a call to put_page_testzero() is in order so that such a
    page will be handed to the caller with a correct count.
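
    A hedged sketch of the idea in the surplus allocation path (the gfp mask
    and placement are illustrative):

        struct page *page = alloc_pages_node(nid, gfp_mask,
                                             HUGETLB_PAGE_ORDER);
        if (page) {
                /*
                 * The page comes from the buddy allocator with a count
                 * of 1, but hugetlb pool pages are expected to start at
                 * count 0.  Drop the buddy reference; nobody else can
                 * hold one yet.
                 */
                put_page_testzero(page);
                VM_BUG_ON(page_count(page));
        }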

    This has not affected end users because the bad page count is reset before the
    page is handed off. However, under CONFIG_DEBUG_VM this triggers a BUG when
    the page count is validated.

    Thanks go to Mel for first spotting this issue and providing an initial fix.

    Signed-off-by: Adam Litke
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: William Lee Irwin III
    Cc: Andy Whitcroft
    Cc: Mel Gorman
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • Address 3 known bugs in the current memory policy reference counting method.
    I have a series of patches to rework the reference counting to reduce overhead
    in the allocation path. However, that series will require testing in -mm once
    I repost it.

    1) alloc_page_vma() does not release the extra reference taken for
    vma/shared mempolicy when the mode == MPOL_INTERLEAVE. This can result in
    leaking mempolicy structures. This is probably occurring, but not being
    noticed.

    Fix: add the conditional release of the reference.

    2) huge_zonelist() unconditionally releases a reference on the mempolicy
    when mode == MPOL_INTERLEAVE. This can result in decrementing the reference
    count for system default policy [should have no ill effect] or premature
    freeing of task policy. If this occurred, the next allocation using task
    mempolicy would use the freed structure and probably BUG out.

    Fix: add the necessary check to the release.

    3) The current reference counting method assumes that vma 'get_policy()'
    methods automatically add an extra reference to a non-NULL returned mempolicy.
    This is true for shmem_get_policy() used by tmpfs mappings, including
    regular page shm segments. However, SHM_HUGETLB shm's, backed by
    hugetlbfs, just use the vma policy without the extra reference. This
    results in freeing of the vma policy on the first allocation, with reuse of
    the freed mempolicy structure on subsequent allocations.

    Fix: Rather than add another condition to the conditional reference
    release, which occurs in the allocation path, just add a reference when
    returning the vma policy in shm_get_policy() to match the assumptions.
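
    A hedged sketch of fix 3 in shm_get_policy() (ipc/shm.c); the surrounding
    code is reconstructed from memory and should be read as illustrative:

        static struct mempolicy *shm_get_policy(struct vm_area_struct *vma,
                                                unsigned long addr)
        {
                struct shm_file_data *sfd = shm_file_data(vma->vm_file);
                struct mempolicy *pol = NULL;

                if (sfd->vm_ops->get_policy)
                        pol = sfd->vm_ops->get_policy(vma, addr);
                else if (vma->vm_policy) {
                        pol = vma->vm_policy;
                        mpol_get(pol);  /* callers assume a reference */
                }
                return pol;
        }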

    Signed-off-by: Lee Schermerhorn
    Cc: Greg KH
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

10 Mar, 2008

1 commit


07 Mar, 2008

5 commits

  • NUMA slab allocator cpu migration bugfix

    The NUMA slab allocator (specifically, cache_alloc_refill)
    is not refreshing its local copies of what cpu and what
    numa node it is on, when it drops and reacquires the irq
    block that it inherited from its caller. As a result
    those values become invalid if an attempt to migrate the
    process to another numa node occurred while the irq block
    had been dropped.

    The solution is to make cache_alloc_refill reload these
    variables whenever it drops and reacquires the irq block.
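
    A hedged sketch of the reload (check_irq_off(), numa_node_id() and
    cpu_cache_get() are real mm/slab.c helpers; the placement is illustrative):

        /* inside cache_alloc_refill(), at the top of the retry path */
        retry:
                check_irq_off();
                node = numa_node_id();       /* re-read: we may have migrated  */
                ac = cpu_cache_get(cachep);  /* while irqs were briefly enabled */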

    The error is very difficult to hit. When it does occur,
    one gets the following oops + stack traceback bits in
    check_spinlock_acquired:

    kernel BUG at mm/slab.c:2417
    cache_alloc_refill+0xe6
    kmem_cache_alloc+0xd0
    ...

    This patch was developed against 2.6.23, ported to and
    compiled-tested only against 2.6.25-rc4.

    Signed-off-by: Joe Korty
    Signed-off-by: Christoph Lameter

    Joe Korty
     
  • SLUB should pack even small objects nicely into cachelines if that is what
    has been asked for. Use the same algorithm as SLAB for this.

    The effect of this patch for a system with a cacheline size of 64
    bytes is that the 24 byte sized slab caches will now put exactly
    2 objects into a cacheline instead of 3 with some overlap into
    the next cacheline. This reduces the object density in a 4k slab
    from 170 to 128 objects (same as SLAB).
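
    A sketch of the SLAB-style heuristic being adopted; the halving loop
    mirrors SLAB's calculate_alignment(), while the wrapper itself is
    illustrative:

        /* e.g. size = 24, cache_line_size() = 64:
         *   24 <= 32 -> align = 32;  24 <= 16 fails -> stop.
         * Objects get padded to 32 bytes: exactly 2 per 64-byte line,
         * and 4096 / 32 = 128 objects per 4k slab.                     */
        static unsigned long hwcache_align(unsigned long size)
        {
                unsigned long align = cache_line_size();

                while (size <= align / 2)
                        align /= 2;
                return align;
        }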

    Signed-off-by: Nick Piggin
    Signed-off-by: Christoph Lameter

    Nick Piggin
     
  • Make them all use angle brackets and the directory name.
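
    For example (the specific header is illustrative):

        #include <linux/slab.h>    /* preferred: angle brackets + directory */
                                   /* rather than:  #include "slab.h"       */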

    Acked-by: Pekka Enberg
    Signed-off-by: Joe Perches
    Signed-off-by: Christoph Lameter

    Joe Perches
     
  • The NUMA fallback logic should be passing local_flags to kmem_get_pages() and not simply the
    flags passed in.

    Reviewed-by: Pekka Enberg
    Signed-off-by: Christoph Lameter

    Christoph Lameter
     
  • The remote frees are in the freelist of the page and not in the
    percpu freelist.

    Reviewed-by: Pekka Enberg
    Signed-off-by: Christoph Lameter

    Christoph Lameter
     

05 Mar, 2008

12 commits

  • Adam Litke noticed that currently we grow the hugepage pool independently of
    any cpuset the running process may be in, but when shrinking the pool, the cpuset
    is checked. This leads to inconsistency when shrinking the pool in a
    restricted cpuset -- an administrator may have been able to grow the pool on a
    node restricted by a containing cpuset, but they cannot shrink it there.

    There are two options: either prevent growing of the pool outside of the
    cpuset or allow shrinking outside of the cpuset. From previous discussions
    on linux-mm, /proc/sys/vm/nr_hugepages is an administrative interface that
    should not be restricted by cpusets. So allow shrinking the pool by removing
    pages from nodes outside of current's cpuset.

    Signed-off-by: Nishanth Aravamudan
    Acked-by: Adam Litke
    Cc: William Irwin
    Cc: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: Paul Jackson
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     
  • A hugetlb reservation may be inadequately backed in the event of racing
    allocations and frees when utilizing surplus huge pages. Consider the
    following series of events in processes A and B:

    A) Allocates some surplus pages to satisfy a reservation
    B) Frees some huge pages
    A) A notices the extra free pages and drops hugetlb_lock to free some of
    its surplus pages back to the buddy allocator.
    B) Allocates some huge pages
    A) Reacquires hugetlb_lock and returns from gather_surplus_huge_pages()

    Avoid this by committing the reservation after pages have been allocated but
    before dropping the lock to free excess pages. For parity, release the
    reservation in return_unused_surplus_pages().
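
    A hedged sketch of the ordering (hugetlb_lock and resv_huge_pages are from
    mm/hugetlb.c of that era; delta names the reservation size and the body is
    condensed):

        spin_lock(&hugetlb_lock);
        /* ... decide how many of the freshly allocated surplus pages
         *     are actually needed ... */

        resv_huge_pages += delta;     /* commit the reservation while  */
                                      /* still holding the lock        */

        /* only now drop the lock to hand excess pages back to the
         * buddy allocator; a racing free/alloc can no longer steal
         * the pages backing this reservation */
        spin_unlock(&hugetlb_lock);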

    This patch also corrects the cpuset_mems_nr() error path in
    hugetlb_acct_memory(). If the cpuset check fails, uncommit the
    reservation, but also be sure to return any surplus huge pages that may
    have been allocated to back the failed reservation.

    Thanks to Andy Whitcroft for discovering this.

    Signed-off-by: Adam Litke
    Cc: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: William Lee Irwin III
    Cc: Andy Whitcroft
    Cc: Mel Gorman
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list
    called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list.
    I couldn't see what racing tasks on other cpus were doing, but surmise that
    another must have been in mem_cgroup_charge_common on the same page, between
    its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc
    which I'd almost changed to a kmalloc).

    Normally such a race cannot happen, the ref_cnt prevents it, the final
    uncharge cannot race with the initial charge. But force_empty buggers the
    ref_cnt, that's what it's all about; and thereafter forced pages are
    vulnerable to races such as this (just think of a shared page also mapped into
    an mm of another mem_cgroup than that just emptied). And remain vulnerable
    until they're freed indefinitely later.

    This patch just fixes the oops by moving the unlock_page_cgroups down below
    adding to and removing from the list (only possible given the previous patch);
    and while we're at it, we might as well make it an invariant that
    page->page_cgroup is always set while pc is on lru.

    But this behaviour of force_empty seems highly unsatisfactory to me: why have
    a ref_cnt if we always have to cope with it being violated (as in the earlier
    page migration patch). We may prefer force_empty to move pages to an orphan
    mem_cgroup (could be the root, but better not), from which other cgroups could
    recover them; we might need to reverse the locking again; but no time now for
    such concerns.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • As for force_empty, though this may not be the main topic here,
    mem_cgroup_force_empty_list() can be implemented more simply. It is possible to
    make the function just call mem_cgroup_uncharge_page() instead of releasing
    page_cgroups by itself. The tip is to call get_page() before invoking
    mem_cgroup_uncharge_page(), so the page won't be released during this
    function.
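
    A hedged sketch of the simplified loop body (illustrative; mz->lru_lock is
    the per-zone lru lock referred to below):

        /* pin the page so it cannot be freed while we uncharge it */
        get_page(page);
        spin_unlock_irqrestore(&mz->lru_lock, flags);

        mem_cgroup_uncharge_page(page);   /* drops the page_cgroup */

        put_page(page);                   /* release our pin       */
        spin_lock_irqsave(&mz->lru_lock, flags);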

    Kamezawa-san points out that by the time mem_cgroup_uncharge_page() uncharges,
    the page might have been reassigned to an lru of a different mem_cgroup, and
    now be emptied from that; but Hugh claims that's okay, the end state is the
    same as when it hasn't gone to another list.

    And once force_empty stops taking lock_page_cgroup within mz->lru_lock,
    mem_cgroup_move_lists() can be simplified to take mz->lru_lock directly while
    holding page_cgroup lock (but still has to use try_lock_page_cgroup).

    Signed-off-by: Hirokazu Takahashi
    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hirokazu Takahashi
     
  • Ever since the VM_BUG_ON(page_get_page_cgroup(page)) (now Bad page state) went
    into page freeing, I've hit it from time to time in testing on some machines,
    sometimes only after many days. Recently found a machine which could usually
    produce it within a few hours, which got me there at last.

    The culprit is mem_cgroup_move_lists, whose locking is inadequate; and the
    arrangement of structures was such that you got page_cgroups from the lru list
    neatly put on to SLUB's freelist. Kamezawa-san identified the same hole
    independently.

    The main problem was that it was missing the lock_page_cgroup it needs to
    safely page_get_page_cgroup; but it's tricky to go beyond that too, and I
    couldn't do it with SLAB_DESTROY_BY_RCU as I'd expected. See the code for
    comments on the constraints.

    This patch immediately gets replaced by a simpler one from Hirokazu-san; but
    is it just foolish pride that tells me to put this one on record, in case we
    need to come back to it later?

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • mem_cgroup_uncharge_page does css_put on the mem_cgroup before uncharging from
    it, and before removing page_cgroup from one of its lru lists: isn't there a
    danger that struct mem_cgroup memory could be freed and reused before
    completing that, so corrupting something? Never seen it, and for all I know
    there may be other constraints which make it impossible; but let's be
    defensive and reverse the ordering there.
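
    A hedged sketch of the reversed ordering (function names as used elsewhere
    in this log; the snippet is illustrative):

        /* old order dropped the css reference first, so the mem_cgroup
         * could in principle be freed and reused while still in use:
         *
         *      css_put(&mem->css);
         *      res_counter_uncharge(&mem->res, PAGE_SIZE);
         *      __mem_cgroup_remove_list(pc);
         *
         * defensive order: finish using the mem_cgroup, then drop it */
        res_counter_uncharge(&mem->res, PAGE_SIZE);
        __mem_cgroup_remove_list(pc);
        css_put(&mem->css);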

    mem_cgroup_force_empty_list is safe because there's an extra css_get around
    all its works; but even so, change its ordering the same way round, to help
    get in the habit of doing it like this.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove clear_page_cgroup: it's an unhelpful helper, see for example how
    mem_cgroup_uncharge_page had to unlock_page_cgroup just in order to call it
    (serious races from that? I'm not sure).

    Once that's gone, you can see it's pointless for page_cgroup's ref_cnt to be
    atomic: it's always manipulated under lock_page_cgroup, except where
    force_empty unilaterally reset it to 0 (and how does uncharge's
    atomic_dec_and_test protect against that?).

    Simplify this page_cgroup locking: if you've got the lock and the pc is
    attached, then the ref_cnt must be positive: VM_BUG_ONs to check that, and to
    check that pc->page matches page (we're on the way to finding why sometimes it
    doesn't, but this patch doesn't fix that).

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • More cleanup to memcontrol.c, this time changing some of the code generated.
    Let the compiler decide what to inline (except for page_cgroup_locked which is
    only used when CONFIG_DEBUG_VM): the __always_inline on lock_page_cgroup etc.
    was quite a waste since bit_spin_lock etc. are inlines in a header file; made
    mem_cgroup_force_empty and mem_cgroup_write_strategy static.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Sorry, before getting down to more important changes, I'd like to do some
    cleanup in memcontrol.c. This patch doesn't change the code generated, but
    cleans up whitespace, moves up a double declaration, removes an unused enum,
    removes void returns, removes misleading comments, that kind of thing.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Nothing uses mem_cgroup_uncharge apart from mem_cgroup_uncharge_page (a
    trivial wrapper around it) and mem_cgroup_end_migration (which does the same
    as mem_cgroup_uncharge_page). And it often ends up having to lock just to let
    its caller unlock. Remove it (but leave the silly locking until a later
    patch).

    Moved mem_cgroup_cache_charge next to mem_cgroup_charge in memcontrol.h.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • My memcgroup patch to fix hang with shmem/tmpfs added NULL page handling to
    mem_cgroup_charge_common. It seemed convenient at the time, but hard to
    justify now: there's a perfectly appropriate swappage to charge and uncharge
    instead, this is not on any hot path through shmem_getpage, and no performance
    hit was observed from the slight extra overhead.

    So revert that NULL page handling from mem_cgroup_charge_common; and make it
    clearer by bringing page_cgroup_assign_new_page_cgroup into its body - that
    was a helper I found more of a hindrance to understanding.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Replace free_hot_cold_page's VM_BUG_ON(page_get_page_cgroup(page)) by a "Bad
    page state" and clear: most users don't have CONFIG_DEBUG_VM on, and if it
    were set here, it'd likely cause corruption when the page is reused.

    Don't use page_assign_page_cgroup to clear it: that should be private to
    memcontrol.c, and always called with the lock taken; and memmap_init_zone
    doesn't need it either - like page->mapping and other pointers throughout the
    kernel, Linux assumes pointers in zeroed structures are NULL pointers.

    Instead use page_reset_bad_cgroup, added to memcontrol.h for this only.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins