11 Dec, 2014

40 commits

  • This patch replaces calls to get_unused_fd() with an equivalent call to
    get_unused_fd_flags(0) to preserve the current behavior for existing code.

    In a further patch, get_unused_fd() will be removed so that new code
    starts using get_unused_fd_flags(), in the hope that O_CLOEXEC can be
    used, either by default or chosen by userspace. (A sketch of the
    replacement pattern follows this entry.)

    Signed-off-by: Yann Droneaud
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yann Droneaud
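
    A minimal sketch of the replacement pattern, assuming a typical
    descriptor-allocating caller; the helper name and the cloexec parameter
    are purely illustrative, while get_unused_fd_flags() and fd_install()
    are the real kernel APIs:

        #include <linux/file.h>
        #include <linux/fs.h>

        /* Illustrative only: allocate a descriptor and publish the file. */
        static int example_install_file(struct file *file, bool cloexec)
        {
                /* was: int fd = get_unused_fd(); */
                int fd = get_unused_fd_flags(cloexec ? O_CLOEXEC : 0);

                if (fd < 0)
                        return fd;
                fd_install(fd, file);   /* hand the descriptor to userspace */
                return fd;
        }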
     
  • This patch replaces calls to get_unused_fd() with an equivalent call to
    get_unused_fd_flags(0) to preserve the current behavior for existing code.

    In a further patch, get_unused_fd() will be removed so that new code
    starts using get_unused_fd_flags(), in the hope that O_CLOEXEC can be
    used, either by default or chosen by userspace.

    Signed-off-by: Yann Droneaud
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yann Droneaud
     
  • This patch replaces calls to get_unused_fd() with an equivalent call to
    get_unused_fd_flags(0) to preserve the current behavior for existing code.

    In a further patch, get_unused_fd() will be removed so that new code
    starts using get_unused_fd_flags(), in the hope that O_CLOEXEC can be
    used, either by default or chosen by userspace.

    Signed-off-by: Yann Droneaud
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yann Droneaud
     
  • Now that forget_original_parent() uses ->ptrace_entry for EXIT_DEAD tasks,
    we can simply pass "dead_children" list to exit_ptrace() and remove
    another release_task() loop. Plus this way we do not need to drop and
    reacquire tasklist_lock.

    Also shift the list_empty(ptraced) check: if we want this optimization,
    it makes sense to eliminate the function call altogether.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman" ,
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 1. Now that reparent_leader() doesn't abuse ->sibling we can shift
    list_move_tail() from reparent_leader() to forget_original_parent()
    and turn it into a single list_splice_tail_init(). This also makes
    BUG_ON(!list_empty()) and list_for_each_entry_safe() unnecessary.

    2. This also allows us to shift the same_thread_group() check; it looks
    a bit clearer in the caller.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman" ,
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 1. Cosmetic, but "if (t->parent == father)" looks a bit confusing.
    We need to change t->parent if and only if t is not traced.

    2. If we actually want this BUG_ON() to ensure that parent/ptrace
    match each other, then we should take the ptrace_reparented()
    case into account too.

    3. Change this code to use for_each_thread() instead of deprecated
    while_each_thread().

    [dan.carpenter@oracle.com: silence a bogus static checker warning]
    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman" ,
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • reparent_leader() reuses ->sibling as a list node to add an EXIT_DEAD task
    into the dead_children list we are going to release. This obviously removes
    the dead task from its real_parent->children list, and this is even good:
    the parent can do nothing with the EXIT_DEAD reparented zombie; it only
    makes do_wait() slower.

    But this also means that the task cannot be reparented again, so if its
    new parent dies too, nobody will update ->parent/real_parent; they can
    point to freed memory even before the release_task() we are going to call.
    This breaks the code which relies on pid_alive() to access
    ->real_parent/parent.

    Fortunately this is mostly theoretical: it can only happen if init or a
    PR_SET_CHILD_SUBREAPER process ignores SIGCHLD and the new parent's
    sub-thread exits right after we drop tasklist_lock.

    Change this code to use ->ptrace_entry instead; we know that the child is
    not traced, so nobody else can ever use this member (see the sketch
    below). This also allows us to unify this logic with exit_ptrace(); see
    the next changes.

    Note: we really need to change release_task() to nullify real_parent/
    parent/group_leader pointers, but we need to change the current users
    first somehow. And it would be better to reap this zombie immediately,
    but the release_task_locked() we would need is complicated by
    proc_flush_task().

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman" ,
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
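
    A minimal sketch of the idea, not the literal diff (the dead_children
    list name is taken from the description above):

        /* before: reused ->sibling, unlinking p from real_parent->children */
        list_move_tail(&p->sibling, dead_children);

        /* after: ->ptrace_entry is guaranteed unused for an untraced child,
         * and p stays on its parent's ->children list until release_task() */
        list_add(&p->ptrace_entry, dead_children);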
     
  • rcu_read_lock() cannot protect p->real_parent if release_task(p) was
    already called; change sched_show_task() to check pid_alive() like other
    users do (a minimal sketch follows this entry).

    Note: we need some helpers to clean up code like this. And it seems
    that the usage of cpu_curr(cpu) in dump_cpu_task() is not safe either.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman" ,
    Cc: Sterling Alexander
    Acked-by: Peter Zijlstra (Intel)
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
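
    A minimal sketch of the safer pattern; the ppid variable and the printk
    format are illustrative, while pid_alive(), task_pid_nr() and the RCU
    primitives are real:

        int ppid = 0;

        rcu_read_lock();
        if (pid_alive(p))       /* release_task(p) has not been called yet */
                ppid = task_pid_nr(rcu_dereference(p->real_parent));
        rcu_read_unlock();

        printk(KERN_INFO "task: %d  parent: %d\n", task_pid_nr(p), ppid);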
     
  • p->ptrace != 0 means that release_task(p) was not called, so pid_alive()
    buys nothing and we can remove this check. Other callers already use it
    directly without additional checks.

    Note: with or without this patch ptrace_parent() can return the pointer to
    the freed task, this will be explained/fixed later.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman" ,
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • task_state() does seq_printf() under rcu_read_lock(), but this is only
    needed for task_tgid_nr_ns() and task_numa_group_id(). We can calculate
    tgid/ngid first and then drop the rcu lock (illustrated below).

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman" ,
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
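
    A minimal sketch of the reordering, with illustrative variable names and
    seq handle; only the two lookups actually need the RCU read lock:

        pid_t tgid, ngid;

        rcu_read_lock();
        tgid = task_tgid_nr_ns(p, ns);      /* needs RCU */
        ngid = task_numa_group_id(p);       /* needs RCU */
        rcu_read_unlock();

        seq_printf(m, "Tgid:\t%d\nNgid:\t%d\n", tgid, ngid);  /* no RCU held */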
     
  • 1. The usage of fdt looks very ugly; it can't be NULL if ->files is
    not NULL. We can use "unsigned int max_fds" instead (sketched below).

    2. This also allows us to move seq_printf(max_fds) outside of task_lock()
    and join it with the previous seq_printf(). See also the next patch.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman" ,
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
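
    A minimal sketch of the simplification; the surrounding seq_printf() is
    illustrative, while files_fdtable() and the fdtable's max_fds field are
    real:

        unsigned int max_fds = 0;

        task_lock(p);
        if (p->files)
                max_fds = files_fdtable(p->files)->max_fds;
        task_unlock(p);

        seq_printf(m, "FDSize:\t%u\n", max_fds);    /* no task_lock held here */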
     
  • task_state() reads cred->group_info under task_lock() because long ago it
    was task_struct->group_info and was actually protected by
    task->alloc_lock. Today this task_unlock() after rcu_read_unlock() just
    adds confusion; move task_unlock() up.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman" ,
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Better to use existing macros than to rewrite them.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Dichtel
     
  • proc_register() error paths are leaking inodes and directory refcounts.

    Signed-off-by: Debabrata Banerjee
    Cc: Alexander Viro
    Acked-by: Nicolas Dichtel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Debabrata Banerjee
     
  • When a lot of netdevices are created, one of the bottlenecks is the
    creation of proc entries. This series aims to accelerate this part.

    The current implementation for the directories in /proc is using a single
    linked list. This is slow when handling directories with large numbers of
    entries (eg netdevice-related entries when lots of tunnels are opened).

    This patch replaces this linked list with a red-black tree (a lookup
    sketch follows this entry).

    Here are some numbers:

    dummy30000.batch contains 30 000 times 'link add type dummy'.

    Before the patch:
    $ time ip -b dummy30000.batch
    real 2m31.950s
    user 0m0.440s
    sys 2m21.440s
    $ time rmmod dummy
    real 1m35.764s
    user 0m0.000s
    sys 1m24.088s

    After the patch:
    $ time ip -b dummy30000.batch
    real 2m0.874s
    user 0m0.448s
    sys 1m49.720s
    $ time rmmod dummy
    real 1m13.988s
    user 0m0.000s
    sys 1m1.008s

    The idea of improving this part was suggested by Thierry Herbelot.

    [akpm@linux-foundation.org: initialise proc_root.subdir at compile time]
    Signed-off-by: Nicolas Dichtel
    Acked-by: David S. Miller
    Cc: Thierry Herbelot
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Dichtel
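
    A rough sketch of a name-keyed red-black tree lookup in the spirit of
    this change; the subdir/subdir_node field names and the helper name are
    assumptions, not necessarily what the patch uses:

        static struct proc_dir_entry *pde_subdir_find(struct proc_dir_entry *dir,
                                                      const char *name)
        {
                struct rb_node *node = dir->subdir.rb_node;  /* rb_root, was a list */

                while (node) {
                        struct proc_dir_entry *de =
                                rb_entry(node, struct proc_dir_entry, subdir_node);
                        int diff = strcmp(name, de->name);

                        if (diff < 0)
                                node = node->rb_left;
                        else if (diff > 0)
                                node = node->rb_right;
                        else
                                return de;   /* O(log n) instead of a linear scan */
                }
                return NULL;
        }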
     
  • Now that the external page_cgroup data structure and its lookup are
    gone, let the generic bad_page() check page->mem_cgroup for sanity (see
    the sketch below).

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Acked-by: David S. Miller
    Cc: KAMEZAWA Hiroyuki
    Cc: "Kirill A. Shutemov"
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
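
    A minimal sketch of what such a sanity check can look like in the
    generic page-freeing checks; the exact message and placement are
    assumptions:

        #ifdef CONFIG_MEMCG
                if (unlikely(page->mem_cgroup))
                        bad_reason = "page still charged to cgroup";
        #endif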
     
  • Now that the external page_cgroup data structure and its lookup are gone,
    the only code remaining in there is swap slot accounting.

    Rename it and move the conditional compilation into mm/Makefile.

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Acked-by: David S. Miller
    Acked-by: KAMEZAWA Hiroyuki
    Cc: "Kirill A. Shutemov"
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Memory cgroups used to have 5 per-page pointers. To allow users to
    disable that amount of overhead during runtime, those pointers were
    allocated in a separate array, with a translation layer between them and
    struct page.

    There is now only one page pointer remaining: the memcg pointer, that
    indicates which cgroup the page is associated with when charged. The
    complexity of runtime allocation and the runtime translation overhead is
    no longer justified to save that *potential* 0.19% of memory. With
    CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
    after the page->private member and doesn't even increase struct page,
    and then this patch actually saves space. Remaining users that care can
    still compile their kernels without CONFIG_MEMCG.

    text data bss dec hex filename
    8828345 1725264 983040 11536649 b00909 vmlinux.old
    8827425 1725264 966656 11519345 afc571 vmlinux.new

    [mhocko@suse.cz: update Documentation/cgroups/memory.txt]
    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Acked-by: David S. Miller
    Acked-by: KAMEZAWA Hiroyuki
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Acked-by: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • There is no cgroup-specific page lock anymore.

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The largest index of a swap device is MAX_SWAPFILES-1, so the swap type
    should be less than MAX_SWAPFILES (see the range check sketched below).

    Signed-off-by: Haifeng Li
    Acked-by: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Haifeng
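
    A minimal illustration of the valid range, not the literal diff;
    swp_type() and MAX_SWAPFILES are real, the surrounding context is
    assumed:

        unsigned int type = swp_type(entry);

        if (type >= MAX_SWAPFILES)      /* valid indices are 0 .. MAX_SWAPFILES-1 */
                return NULL;            /* not a valid swap device */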
     
  • Signed-off-by: Wei Yuan
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yuan
     
  • First, after flushing the TLB, there is no need to scan the ptes from the
    start again. Second, before bailing out of the loop, the address is
    advanced one step.

    Signed-off-by: Hillf Danton
    Reviewed-by: Michal Hocko
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • Since commit d7365e783edb ("mm: memcontrol: fix missed end-writeback
    page accounting") mem_cgroup_end_page_stat consumes locked and flags
    variables directly rather than via pointers which might trigger C
    undefined behavior as those variables are initialized only in the slow
    path of mem_cgroup_begin_page_stat.

    Although mem_cgroup_end_page_stat handles the parameters correctly and
    touches them only when they hold a sensible value, it is the caller which
    loads a potentially uninitialized value, which then might allow the
    compiler to do crazy things.

    I haven't seen any warning from gcc, and it seems that the current
    version (4.9) doesn't exploit this type of undefined behavior, but Sasha
    has reported the following:

    UBSan: Undefined behaviour in mm/rmap.c:1084:2
    load of value 255 is not a valid value for type '_Bool'
    CPU: 4 PID: 8304 Comm: rngd Not tainted 3.18.0-rc2-next-20141029-sasha-00039-g77ed13d-dirty #1427
    Call Trace:
    dump_stack (lib/dump_stack.c:52)
    ubsan_epilogue (lib/ubsan.c:159)
    __ubsan_handle_load_invalid_value (lib/ubsan.c:482)
    page_remove_rmap (mm/rmap.c:1084 mm/rmap.c:1096)
    unmap_page_range (./arch/x86/include/asm/atomic.h:27 include/linux/mm.h:463 mm/memory.c:1146 mm/memory.c:1258 mm/memory.c:1279 mm/memory.c:1303)
    unmap_single_vma (mm/memory.c:1348)
    unmap_vmas (mm/memory.c:1377 (discriminator 3))
    exit_mmap (mm/mmap.c:2837)
    mmput (kernel/fork.c:659)
    do_exit (./arch/x86/include/asm/thread_info.h:168 kernel/exit.c:462 kernel/exit.c:747)
    do_group_exit (include/linux/sched.h:775 kernel/exit.c:873)
    SyS_exit_group (kernel/exit.c:901)
    tracesys_phase2 (arch/x86/kernel/entry_64.S:529)

    Fix this by using pointer parameters for both locked and flags, making
    the code more robust against future compiler changes even though the
    current code is implemented correctly (a sketch of the calling
    convention follows this entry).

    Signed-off-by: Michal Hocko
    Reported-by: Sasha Levin
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
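
    A minimal caller-side sketch of the calling convention; variable names
    and the statistics update in the middle are illustrative:

        bool locked;            /* written only on the slow path of begin_page_stat */
        unsigned long flags;
        struct mem_cgroup *memcg;

        memcg = mem_cgroup_begin_page_stat(page, &locked, &flags);
        /* ... update the page statistics ... */
        /*
         * Passing &locked/&flags lets the callee decide whether to read
         * them, instead of the caller loading possibly uninitialized
         * values here.
         */
        mem_cgroup_end_page_stat(memcg, &locked, &flags);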
     
  • Like the small zero page, the huge zero page should not be accounted in
    the smaps report as a normal page.

    For small pages we rely on vm_normal_page() to filter out the zero page,
    but vm_normal_page() is not designed to handle pmds. We only get here due
    to the hackish cast of pmd to pte in smaps_pte_range() -- the pte and pmd
    formats are not necessarily compatible on each and every architecture.

    Let's add a separate codepath to handle pmds; follow_trans_huge_pmd()
    will detect the huge zero page for us.

    We would need a pmd_dirty() helper to do this properly (a sketch follows
    this entry). The patch adds it to THP-enabled architectures which don't
    yet have one.

    [akpm@linux-foundation.org: use do_div to fix 32-bit build]
    Signed-off-by: "Kirill A. Shutemov"
    Reported-by: Fengguang Wu
    Tested-by: Fengwei Yin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
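
    A minimal sketch of what such a helper can look like on x86, assuming
    the usual pmd flag accessors; other architectures would define it
    analogously:

        /* report whether the huge-page PMD has the dirty bit set */
        static inline int pmd_dirty(pmd_t pmd)
        {
                return pmd_flags(pmd) & _PAGE_DIRTY;
        }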
     
  • None of the mem_cgroup_same_or_subtree() callers actually require it to
    take the RCU lock, either because they hold it themselves or they have css
    references. Remove it.

    To make the API change clear, rename the leftover helper to
    mem_cgroup_is_descendant() to match cgroup_is_descendant().

    Signed-off-by: Johannes Weiner
    Reviewed-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The NULL in mm_match_cgroup() comes from a possibly exiting mm->owner. It
    makes a lot more sense to check where it's looked up, rather than check
    for it in __mem_cgroup_same_or_subtree() where it's unexpected.

    No other callsite passes NULL to __mem_cgroup_same_or_subtree().

    Signed-off-by: Johannes Weiner
    Reviewed-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • That function acts like a typecast - unless NULL is passed in, no NULL can
    come out. task_in_mem_cgroup() callers don't pass NULL tasks.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • While moving charges from one memcg to another, page stat updates must
    acquire the old memcg's move_lock to prevent double accounting. That
    situation is denoted by an increased memcg->move_accounting. However, the
    charge moving code declares this state much too early, even before
    summing up the RSS and pre-allocating destination charges.

    Shorten this slowpath mode by increasing memcg->move_accounting only right
    before walking the task's address space with the intention of actually
    moving the pages.

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Reviewed-by: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Zero pages can be used only in anonymous mappings, which never have
    writable vma->vm_page_prot: see protection_map in mm/mmap.c and __PX1X
    definitions.

    Let's drop redundant pmd_wrprotect() in set_huge_zero_page().

    Signed-off-by: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Let's use generic slab_start/next/stop for showing memcg caches info. In
    contrast to the current implementation, this will work even if all memcg
    caches' info doesn't fit into a seq buffer (a page), plus it simply looks
    neater.

    Actually, the main reason I do this isn't mere cleanup. I'm going to zap
    the memcg_slab_caches list, because I find it useless provided we have the
    slab_caches list, and this patch is a step in this direction.

    It should be noted that before this patch an attempt to read
    memory.kmem.slabinfo of a cgroup that doesn't have kmem limit set resulted
    in -EIO, while after this patch it will silently show nothing except the
    header, but I don't think it will frustrate anyone.

    Signed-off-by: Vladimir Davydov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • mem_cgroup_reclaimable() checks whether a cgroup has reclaimable pages on
    *any* NUMA node. However, the only place where it's called is
    mem_cgroup_soft_reclaim(), which tries to reclaim memory from a *specific*
    zone. So the way it is used is incorrect - it will return true even if
    the cgroup doesn't have pages on the zone we're scanning.

    I think we can get rid of this check completely, because
    mem_cgroup_shrink_node_zone(), which is called by
    mem_cgroup_soft_reclaim() if mem_cgroup_reclaimable() returns true, is
    equivalent to shrink_lruvec(), which exits almost immediately if the
    lruvec passed to it is empty. So there's no need to optimize anything
    here. Besides, we don't have such a check in the general scan path
    (shrink_zone) either.

    Signed-off-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • hstate_sizelog() would shift left an int rather than a long, triggering
    undefined behaviour and passing an incorrect value when the requested
    page size was more than 4GB, thus breaking >4GB pages (see the sketch
    below).

    Signed-off-by: Sasha Levin
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
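
    A minimal illustration of the overflow, not the literal diff: shifting a
    plain int past bit 31 is undefined, while an unsigned long shift keeps
    >4GB sizes intact on 64-bit kernels:

        int page_size_log = 34;                      /* request a 16GB page */
        unsigned long bad  = 1 << page_size_log;     /* int shift: undefined */
        unsigned long good = 1UL << page_size_log;   /* 0x400000000 as intended */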
     
  • Having these functions and their documentation split out somewhere else
    makes it harder, not easier, to follow what's going on.

    Inline them directly where charge moving is prepared and finished, and put
    an explanation right next to it.

    Signed-off-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • mem_cgroup_end_move() checks if the passed memcg is NULL, along with a
    lengthy comment to explain why this seemingly non-sensical situation is
    even possible.

    Check in cancel_attach() itself whether can_attach() set up the move
    context or not; it's a lot more obvious from there. Then remove the
    check and the comment in mem_cgroup_end_move().

    Signed-off-by: Johannes Weiner
    Acked-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The wrappers around taking and dropping the memcg->move_lock spinlock add
    nothing of value. Inline the spinlock calls into the callsites.

    Signed-off-by: Johannes Weiner
    Acked-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • pc->mem_cgroup had to be left intact after uncharge for the final LRU
    removal, and !PCG_USED indicated whether the page was uncharged. But
    since commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") pages
    are uncharged after the final LRU removal. Uncharge can simply clear
    the pointer and the PCG_USED/PageCgroupUsed sites can test that instead
    (sketched below).

    Because this is the last page_cgroup flag, this patch reduces the memcg
    per-page overhead to a single pointer.

    [akpm@linux-foundation.org: remove unneeded initialization of `memcg', per Michal]
    Signed-off-by: Johannes Weiner
    Cc: Hugh Dickins
    Acked-by: Michal Hocko
    Reviewed-by: Vladimir Davydov
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
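
    A minimal sketch of the resulting idiom; the surrounding context is
    illustrative, lookup_page_cgroup() and pc->mem_cgroup are real:

        struct page_cgroup *pc = lookup_page_cgroup(page);

        if (!pc->mem_cgroup)            /* formerly: if (!PageCgroupUsed(pc)) */
                return;

        /* ... account the uncharge ... */

        pc->mem_cgroup = NULL;          /* clearing the pointer now means "uncharged" */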
     
  • PCG_MEM is a remnant from an earlier version of 0a31bc97c80c ("mm:
    memcontrol: rewrite uncharge API"), used to tell whether migration cleared
    a charge while leaving pc->mem_cgroup valid and PCG_USED set. But in the
    final version, mem_cgroup_migrate() directly uncharges the source page,
    rendering this distinction unnecessary. Remove it.

    Signed-off-by: Johannes Weiner
    Cc: Hugh Dickins
    Acked-by: Michal Hocko
    Reviewed-by: Vladimir Davydov
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Now that mem_cgroup_swapout() fully uncharges the page, every page that is
    still in use when reaching mem_cgroup_uncharge() is known to carry both
    the memory and the memory+swap charge. Simplify the uncharge path and
    remove the PCG_MEMSW page flag accordingly.

    Signed-off-by: Johannes Weiner
    Cc: Hugh Dickins
    Acked-by: Michal Hocko
    Reviewed-by: Vladimir Davydov
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This series gets rid of the remaining page_cgroup flags, thus cutting the
    memcg per-page overhead down to one pointer.

    This patch (of 4):

    mem_cgroup_swapout() is called with exclusive access to the page at the
    end of the page's lifetime. Instead of clearing the PCG_MEMSW flag and
    deferring the uncharge, just do it right away. This allows follow-up
    patches to simplify the uncharge code.

    Signed-off-by: Johannes Weiner
    Cc: Hugh Dickins
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Don't call lookup_page_cgroup() when memcg is disabled.

    Cc: Johannes Weiner
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko