24 Feb, 2008

1 commit

  • Fix build failure on sparc:

    In file included from include/linux/mm.h:39,
    from include/linux/memcontrol.h:24,
    from include/linux/swap.h:8,
    from include/linux/suspend.h:7,
    from init/do_mounts.c:6:
    include/asm/pgtable.h:344: warning: parameter names (without
    types) in function declaration
    include/asm/pgtable.h:345: warning: parameter names (without
    types) in function declaration
    include/asm/pgtable.h:346: error: expected '=', ',', ';', 'asm' or
    '__attribute__' before '___f___swp_entry'

    viro sayeth:

    I've run allmodconfig builds on a bunch of targets, FWIW (essentially the
    same patch). Note that these includes are a recent addition caused by an
    added inline function that has since become a define. So while I agree
    with your comments in general, in _this_ case it's pretty safe.

    The commit that had done it is 3062fc67dad01b1d2a15d58c709eff946389eca4
    ("memcontrol: move mm_cgroup to header file") and the switch to #define
    is in commit 60c12b1202a60eabb1c61317e5d2678fcea9893f ("memcontrol: add
    vm_match_cgroup()") (BTW, that probably warranted mentioning in the
    changelog of the latter).

    Cc: Adrian Bunk
    Cc: Robert Reif
    Signed-off-by: David Rientjes
    Cc: "David S. Miller"
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

10 Feb, 2008

1 commit

  • mm_cgroup() is exclusively used to test whether an mm's mem_cgroup pointer
    is pointing to a specific cgroup. Instead of returning the pointer, we can
    just do the test itself in a new macro:

    vm_match_cgroup(mm, cgroup)

    returns non-zero if the mm's mem_cgroup points to cgroup. Otherwise it
    returns zero.
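The macro described above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: the two struct definitions are stripped-down stand-ins, and count_matching() is a hypothetical caller added only to show the macro in use.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumptions for illustration). */
struct mem_cgroup { int id; };
struct mm_struct  { struct mem_cgroup *mem_cgroup; };

/* Non-zero when the mm's mem_cgroup pointer is exactly cgroup, zero otherwise. */
#define vm_match_cgroup(mm, cgroup) ((mm)->mem_cgroup == (cgroup))

/* Hypothetical caller: count how many mms in an array belong to cgroup. */
static size_t count_matching(const struct mm_struct *mms, size_t n,
                             const struct mem_cgroup *cgroup)
{
    size_t hits = 0;

    assert(mms != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (vm_match_cgroup(&mms[i], cgroup))
            hits++;
    return hits;
}
```

Because callers only ever compared the pointer, doing the comparison inside the macro means the raw mem_cgroup pointer no longer has to be handed out.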

    Signed-off-by: David Rientjes
    Cc: Balbir Singh
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

08 Feb, 2008

16 commits

  • Based on the discussion at http://lkml.org/lkml/2007/12/20/383, it was felt
    that control_type might not be a good thing to implement right away. We
    can add this flexibility at a later point when required.

    Signed-off-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • … pages to be scanned per cgroup

    Define a function for calculating the number of pages to scan on each zone/LRU.

    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Herbert Poetzl <herbert@13thfloor.at>
    Cc: Kirill Korotaev <dev@sw.ru>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Cc: Paul Menage <menage@google.com>
    Cc: Pavel Emelianov <xemul@openvz.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    KAMEZAWA Hiroyuki
     
  • Functions to remember reclaim priority per cgroup (as zone->prev_priority)

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: more build fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: Peter Zijlstra
    Cc: Vaidyanathan Srinivasan
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • …ve imbalance per cgroup

    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Herbert Poetzl <herbert@13thfloor.at>
    Cc: Kirill Korotaev <dev@sw.ru>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Cc: Paul Menage <menage@google.com>
    Cc: Pavel Emelianov <xemul@openvz.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    KAMEZAWA Hiroyuki
     
  • Define a function for calculating mapped_ratio in a memory cgroup.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: Peter Zijlstra
    Cc: Vaidyanathan Srinivasan
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • While using the memory control cgroup, page migration under it works as follows.
    ==
    1. Uncharge all refs at try_to_unmap().
    2. Charge refs again at remove_migration_ptes().
    ==
    This is simple but has the following problems.
    ==
    The page is uncharged and charged back again only if it is *mapped*.
    - This means that the cgroup before migration can be different from the
    one after migration.
    - If the page is not mapped but charged as page cache, the charge is just
    ignored (because the page is not mapped, it will not be uncharged before
    migration). This is a memory leak.
    ==
    This patch tries to keep the memory cgroup across page migration by taking
    one extra refcnt during it. 3 functions are added.

    mem_cgroup_prepare_migration() --- increase refcnt of page->page_cgroup
    mem_cgroup_end_migration() --- decrease refcnt of page->page_cgroup
    mem_cgroup_page_migration() --- copy page->page_cgroup from old page to
    new page.

    During migration
    - old page is under PG_locked.
    - new page is under PG_locked, too.
    - neither the old page nor the new page is on the LRU.

    These 3 facts guarantee that page_cgroup() migration has no race.

    Tested and worked well in x86_64/fake-NUMA box.
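The three helpers named above can be modelled roughly as follows. This is a single-threaded sketch under stated assumptions: the structs are reduced to the one field that matters, the signatures and bodies are simplified guesses rather than the patch itself, and clearing the old page's pointer after the copy is an assumption made so the hand-over is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins; the real structures carry far more state. */
struct page_cgroup { int ref_cnt; };
struct page { struct page_cgroup *page_cgroup; };

/* Take one extra reference so unmapping cannot drop the charge mid-migration. */
static void mem_cgroup_prepare_migration(struct page *page)
{
    if (page->page_cgroup)
        page->page_cgroup->ref_cnt++;
}

/* Drop the extra reference taken in mem_cgroup_prepare_migration(). */
static void mem_cgroup_end_migration(struct page *page)
{
    if (page->page_cgroup) {
        assert(page->page_cgroup->ref_cnt > 0);
        page->page_cgroup->ref_cnt--;
    }
}

/* Copy the page_cgroup from the old page to the new page.
 * Assumption: the old page gives up its pointer afterwards. */
static void mem_cgroup_page_migration(struct page *oldpage, struct page *newpage)
{
    newpage->page_cgroup = oldpage->page_cgroup;
    oldpage->page_cgroup = NULL;
}
```

In this model a successful migration calls prepare on the old page, moves the page_cgroup, then calls end on the new page, so the charge stays with the same cgroup throughout.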

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Creates a helper function to return non-zero if a task is a member of a
    memory controller:

    int task_in_mem_cgroup(const struct task_struct *task,
    const struct mem_cgroup *mem);

    When the OOM killer is constrained by the memory controller, the exclusion
    of tasks that are not members of that controller was previously misplaced
    in the badness scoring function. Such tasks should be excluded during the
    tasklist scan in select_bad_process() instead.
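A rough user-space model of the change: membership is tested while scanning the task list, so the scoring step never sees tasks from other cgroups. The struct layouts, the precomputed badness field, and the array-based scan are assumptions for illustration; in particular, the direct memcg pointer on the task is a simplification of however the kernel actually resolves a task's cgroup.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions, not the kernel layouts). */
struct mem_cgroup { int id; };
struct task_struct {
    int pid;
    long badness;                    /* precomputed score, for the sketch only */
    const struct mem_cgroup *memcg;  /* hypothetical direct pointer */
};

/* Non-zero if task is a member of the memory controller mem. */
int task_in_mem_cgroup(const struct task_struct *task,
                       const struct mem_cgroup *mem)
{
    return task->memcg == mem;
}

/* Reduced select_bad_process(): filter by membership during the scan,
 * then keep the highest score. mem == NULL means an unconstrained OOM. */
static const struct task_struct *
select_bad_process(const struct task_struct *tasks, size_t n,
                   const struct mem_cgroup *mem)
{
    const struct task_struct *chosen = NULL;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (mem && !task_in_mem_cgroup(&tasks[i], mem))
            continue;  /* excluded here, not inside the scoring function */
        if (!chosen || tasks[i].badness > chosen->badness)
            chosen = &tasks[i];
    }
    return chosen;
}
```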

    [akpm@linux-foundation.org: build fix]
    Cc: Christoph Lameter
    Cc: Balbir Singh
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Inline functions must precede their use, so mm_cgroup() should be defined
    in linux/memcontrol.h.

    include/linux/memcontrol.h:48: warning: 'mm_cgroup' declared inline after
    being called
    include/linux/memcontrol.h:48: warning: previous declaration of
    'mm_cgroup' was here

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: nuther build fix]
    Cc: Balbir Singh
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Nick Piggin pointed out that swap cache and page cache addition routines
    could be called from non-GFP_KERNEL contexts. This patch makes the
    charging routine aware of the gfp context. Charging might fail if the
    cgroup is over its limit, in which case a suitable error is returned.

    This patch was tested on a PowerPC box. I am still looking at being able
    to test the path through which allocations happen in non-GFP_KERNEL
    contexts.
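The idea of a gfp-aware charge can be sketched like this. Everything here is hypothetical: SKETCH_GFP_WAIT stands in for "the caller may block and reclaim", the counters and sketch_reclaim() are a toy model, and returning -ENOMEM is an assumed choice for the "suitable error".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_GFP_WAIT 0x1u  /* hypothetical flag: caller may sleep and reclaim */

/* Toy counters, in pages (assumption for illustration). */
struct mem_cgroup { unsigned long usage, limit, reclaimable; };

/* Toy reclaim: frees up to nr pages and returns how many were freed. */
static unsigned long sketch_reclaim(struct mem_cgroup *mem, unsigned long nr)
{
    unsigned long freed = nr;

    if (freed > mem->reclaimable)
        freed = mem->reclaimable;
    if (freed > mem->usage)
        freed = mem->usage;
    mem->reclaimable -= freed;
    mem->usage -= freed;
    return freed;
}

/* Charge nr_pages; only attempt reclaim when the gfp context allows it. */
static int sketch_charge(struct mem_cgroup *mem, unsigned long nr_pages,
                         unsigned int gfp_mask)
{
    assert(mem != NULL);
    while (mem->usage + nr_pages > mem->limit) {
        if (!(gfp_mask & SKETCH_GFP_WAIT))
            return -ENOMEM;  /* cannot reclaim in this context: fail now */
        if (sketch_reclaim(mem, nr_pages) == 0)
            return -ENOMEM;  /* nothing left to reclaim */
    }
    mem->usage += nr_pages;
    return 0;
}
```

The point of passing the mask down is that a caller that must not block gets an immediate error instead of being pulled into reclaim.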

    [kamezawa.hiroyu@jp.fujitsu.com: problem with ZONE_MOVABLE]
    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Make page_referenced() cgroup aware. Without this patch, page_referenced()
    can cause a page to be skipped while reclaiming pages. This patch ensures
    that other cgroups do not hold pages in a particular cgroup hostage. It
    is required to ensure that shared pages are freed from a cgroup when they
    are not actively referenced from the cgroup that brought them in.

    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Choose whether cached pages are accounted or not. By default both mapped
    and cached pages are accounted for. A new set of tunables is added.

    echo -n 1 > mem_control_type

    switches the accounting to account for only mapped pages

    echo -n 3 > mem_control_type

    switches the behaviour back

    [bunk@kernel.org: mm/memcontrol.c: cleanups]
    [akpm@linux-foundation.org: fix sparc32 build]
    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Out of memory handling for cgroups over their limit. A task from the
    cgroup over limit is chosen using the existing OOM logic and killed.

    TODO:
    1. As discussed in the OLS BOF session, consider implementing a user
    space policy for OOM handling.

    [akpm@linux-foundation.org: fix build due to oom-killer changes]
    Signed-off-by: Pavel Emelianov
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • Add the page_cgroup to the per cgroup LRU. The reclaim algorithm has
    been modified to make isolate_lru_pages() a pluggable component. The
    scan_control data structure now accepts the cgroup on behalf of which
    reclaims are carried out. try_to_free_pages() has been extended to become
    cgroup aware.
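One way to picture the pluggable isolation step is a function pointer carried in scan_control, chosen by whoever starts the reclaim. The sketch below is an assumption-laden toy: the LRU lists are reduced to page counters, and the callback signature, isolate_pages_global(), and shrink_list() are hypothetical shapes rather than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct mem_cgroup { unsigned long lru_pages; };  /* toy per-cgroup LRU size */
struct scan_control;

/* Hypothetical callback shape: isolate up to nr pages, return the count taken. */
typedef unsigned long (*isolate_fn)(unsigned long nr,
                                    const struct scan_control *sc);

struct scan_control {
    struct mem_cgroup *mem_cgroup;  /* NULL for global reclaim */
    isolate_fn isolate_pages;       /* the pluggable isolation step */
};

static unsigned long global_lru_pages = 100;  /* toy global LRU size */

/* Global reclaim: take pages from the global LRU. */
static unsigned long isolate_pages_global(unsigned long nr,
                                          const struct scan_control *sc)
{
    (void)sc;
    if (nr > global_lru_pages)
        nr = global_lru_pages;
    global_lru_pages -= nr;
    return nr;
}

/* Cgroup reclaim: take pages only from that cgroup's LRU. */
static unsigned long mem_cgroup_isolate_pages(unsigned long nr,
                                              const struct scan_control *sc)
{
    assert(sc->mem_cgroup != NULL);
    if (nr > sc->mem_cgroup->lru_pages)
        nr = sc->mem_cgroup->lru_pages;
    sc->mem_cgroup->lru_pages -= nr;
    return nr;
}

/* The shared reclaim path no longer needs to know which LRU it is draining. */
static unsigned long shrink_list(unsigned long nr, const struct scan_control *sc)
{
    assert(sc->isolate_pages != NULL);
    return sc->isolate_pages(nr, sc);
}
```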

    [akpm@linux-foundation.org: fix warning]
    [Lee.Schermerhorn@hp.com: initialize all scan_control's isolate_pages member]
    [bunk@kernel.org: make do_try_to_free_pages() static]
    [hugh@veritas.com: memcgroup: fix try_to_free order]
    [kamezawa.hiroyu@jp.fujitsu.com: this unlock_page_cgroup() is unnecessary]
    Signed-off-by: Pavel Emelianov
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Hugh Dickins
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Add the accounting hooks. The accounting is carried out for RSS and Page
    Cache (unmapped) pages. There is now a common limit and accounting for both.
    RSS is accounted at page_add_*_rmap() and page_remove_rmap() time. Page
    cache is accounted at add_to_page_cache() and
    __delete_from_page_cache(). Swap cache is also accounted for.

    Each page's page_cgroup is protected with the last bit of the
    page_cgroup pointer; this makes handling of race conditions involving
    simultaneous mappings of a page easier. A reference count is kept in the
    page_cgroup to deal with cases where a page might be unmapped from the RSS
    of all tasks, but still lives in the page cache.
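Using the last bit of a pointer as a lock relies on the pointed-to object being aligned, so bit 0 of a valid address is always zero. The following single-threaded sketch shows only the bit manipulation; the helper names are illustrative, and a real implementation would need an atomic bit spinlock instead of the plain stores used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_CGROUP_LOCK ((uintptr_t)1)  /* bit 0 of the stored pointer */

struct page_cgroup { int ref_cnt; };
struct page { uintptr_t page_cgroup; };  /* pointer and lock bit share one word */

static int page_cgroup_is_locked(const struct page *page)
{
    return (page->page_cgroup & PAGE_CGROUP_LOCK) != 0;
}

/* Non-atomic stand-in for taking the lock (illustration only). */
static void lock_page_cgroup(struct page *page)
{
    assert(!page_cgroup_is_locked(page));
    page->page_cgroup |= PAGE_CGROUP_LOCK;
}

static void unlock_page_cgroup(struct page *page)
{
    page->page_cgroup &= ~PAGE_CGROUP_LOCK;
}

/* Store a new pointer while preserving the current state of the lock bit. */
static void page_assign_page_cgroup(struct page *page, struct page_cgroup *pc)
{
    uintptr_t locked = page->page_cgroup & PAGE_CGROUP_LOCK;

    assert(((uintptr_t)pc & PAGE_CGROUP_LOCK) == 0);  /* alignment frees bit 0 */
    page->page_cgroup = (uintptr_t)pc | locked;
}

/* Mask the lock bit off to recover the real pointer. */
static struct page_cgroup *page_get_page_cgroup(const struct page *page)
{
    return (struct page_cgroup *)(page->page_cgroup & ~PAGE_CGROUP_LOCK);
}
```

Keeping the lock in the same word as the pointer means no extra field is needed per page, at the cost of masking on every read.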

    Credit goes to Vaidyanathan Srinivasan for helping with the reference
    counting work on the page cgroup. Almost all of the page cache accounting
    code had help from Vaidyanathan Srinivasan.

    [hugh@veritas.com: fix swapoff breakage]
    [akpm@linux-foundation.org: fix locking]
    Signed-off-by: Vaidyanathan Srinivasan
    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc:
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Basic setup routines: the mm_struct has a pointer to the cgroup that
    it belongs to, and the page has a page_cgroup associated with it.

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • Set up the memory cgroup and add basic hooks and controls to integrate
    and work with the cgroup.

    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh