13 Jan, 2012

5 commits

  • No need for two CONFIG_MEMORY_HOTPLUG blocks.

    Signed-off-by: Bob Liu
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • There are multiple places which need to get the swap_cgroup address, so
    add a helper function:

    static struct swap_cgroup *swap_cgroup_getsc(swp_entry_t ent,
    struct swap_cgroup_ctrl **ctrl);

    to simplify the code.
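
    A minimal sketch of what such a helper could look like, assuming the flat
    map of swap_cgroup pages described in the entries below (the body is an
    illustration, not the exact patch):

        static struct swap_cgroup *swap_cgroup_getsc(swp_entry_t ent,
                                                     struct swap_cgroup_ctrl **ctrl)
        {
                struct swap_cgroup_ctrl *c = &swap_cgroup_ctrl[swp_type(ent)];
                pgoff_t offset = swp_offset(ent);
                struct page *mappage;

                if (ctrl)
                        *ctrl = c;

                /* each page of the per-device map holds SC_PER_PAGE records */
                mappage = c->map[offset / SC_PER_PAGE];
                return (struct swap_cgroup *)page_address(mappage)
                        + offset % SC_PER_PAGE;
        }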

    Signed-off-by: Bob Liu
    Acked-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • lookup_page_cgroup() is usually used only against pages that are used in
    userspace.

    The exception is the CONFIG_DEBUG_VM-only memcg check from the page
    allocator: it can run on pages without page_cgroup descriptors allocated
    when the pages are fed into the page allocator for the first time during
    boot or memory hotplug.

    Include the array check only when CONFIG_DEBUG_VM is set and save the
    unnecessary check in production kernels.
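
    A sketch of the idea for the SPARSEMEM variant (the details are assumed
    from the description above, not quoted from the patch):

        struct page_cgroup *lookup_page_cgroup(struct page *page)
        {
                unsigned long pfn = page_to_pfn(page);
                struct mem_section *section = __pfn_to_section(pfn);
        #ifdef CONFIG_DEBUG_VM
                /*
                 * The sanity checks the page allocator does on freshly freed
                 * pages can run before the page_cgroup array for this section
                 * is allocated (first free during boot or memory hotplug).
                 */
                if (!section->page_cgroup)
                        return NULL;
        #endif
                return section->page_cgroup + pfn;
        }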

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • To find the page corresponding to a certain page_cgroup, pc->flags
    encoded the node or section ID identifying the base array against
    which the pc pointer could be compared.

    Now that the per-memory cgroup LRU lists link page descriptors directly,
    there is no longer any code that knows the struct page_cgroup of a PFN
    but not the struct page.

    [hughd@google.com: remove unused node/section info from pc->flags fix]
    Signed-off-by: Johannes Weiner
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko
    Reviewed-by: Kirill A. Shutemov
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Ying Han
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Christoph Hellwig
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Now that all code that operated on global per-zone LRU lists is
    converted to operate on per-memory cgroup LRU lists instead, there is no
    reason to keep the double-LRU scheme around any longer.

    The pc->lru member is removed and page->lru is linked directly to the
    per-memory cgroup LRU lists, which removes two pointers from a
    descriptor that exists for every page frame in the system.
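
    The remaining descriptor is tiny; a sketch of the layout after this change
    (assumed, for illustration only):

        struct page_cgroup {
                unsigned long flags;
                struct mem_cgroup *mem_cgroup;
                /* struct list_head lru; -- removed: page->lru is now linked
                 * directly into the per-memory cgroup LRU lists */
        };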

    Signed-off-by: Johannes Weiner
    Signed-off-by: Hugh Dickins
    Signed-off-by: Ying Han
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko
    Reviewed-by: Kirill A. Shutemov
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

03 Nov, 2011

2 commits

  • warning: symbol 'swap_cgroup_ctrl' was not declared. Should it be static?
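
    The fix for such a sparse warning is to give the file-local definition
    internal linkage; a one-line sketch (the array's exact declaration is
    assumed):

        static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];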

    Signed-off-by: H Hartley Sweeten
    Cc: Paul Menage
    Cc: Li Zefan
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • When the cgroup base was allocated with kmalloc, it was necessary to
    annotate the variable with kmemleak_not_leak(). But it has recently
    been changed to be allocated with alloc_page() (which skips kmemleak
    checks), and that causes a warning on boot up.

    I was triggering this output:

    allocated 8388608 bytes of page_cgroup
    please try 'cgroup_disable=memory' option if you don't want memory cgroups
    kmemleak: Trying to color unknown object at 0xf5840000 as Grey
    Pid: 0, comm: swapper Not tainted 3.0.0-test #12
    Call Trace:
    [] ? printk+0x1d/0x1f
    [] paint_ptr+0x4f/0x78
    [] kmemleak_not_leak+0x58/0x7d
    [] ? __rcu_read_unlock+0x9/0x7d
    [] kmemleak_init+0x19d/0x1e9
    [] start_kernel+0x346/0x3ec
    [] ? loglevel+0x18/0x18
    [] i386_start_kernel+0xaa/0xb0

    After a bit of debugging I tracked the object 0xf5840000 (and others) down
    to the cgroup code. The change from allocating base with kmalloc to
    alloc_page() has the base not calling kmemleak_alloc() which adds the
    pointer to the object_tree_root, but kmemleak_not_leak() adds it to the
    crt_early_log[] table. On kmemleak_init(), the entry is found in the
    early_log[] but not the object_tree_root, and this error message is
    displayed.

    If alloc_page() fails, it falls back to vmalloc(), which still goes
    through kmemleak_alloc(), so the kmemleak_not_leak() call is still needed
    for that path. The solution is to call kmemleak_alloc() directly when
    alloc_page() succeeds.
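
    A sketch of that solution in the allocation helper (the shape follows the
    surrounding entries; the exact code is assumed):

        static void *__meminit alloc_page_cgroup(size_t size, int nid)
        {
                void *addr;

                addr = alloc_pages_exact_nid(nid, size, GFP_KERNEL | __GFP_NOWARN);
                if (addr) {
                        /* page-allocator memory is invisible to kmemleak, so
                         * register it explicitly instead of relying on
                         * kmemleak_not_leak() */
                        kmemleak_alloc(addr, size, 1, GFP_KERNEL);
                        return addr;
                }

                /* the vmalloc() fallback already goes through kmemleak_alloc() */
                if (node_state(nid, N_HIGH_MEMORY))
                        addr = vmalloc_node(size, nid);
                else
                        addr = vmalloc(size);

                return addr;
        }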

    Reviewed-by: Michal Hocko
    Signed-off-by: Steven Rostedt
    Acked-by: Catalin Marinas
    Signed-off-by: Jonathan Nieder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

15 Sep, 2011

1 commit


26 Jul, 2011

2 commits


16 Jun, 2011

1 commit

  • Commit 21a3c9646873 ("memcg: allocate memory cgroup structures in local
    nodes") makes page_cgroup allocation NUMA aware, but that caused a
    problem: https://bugzilla.kernel.org/show_bug.cgi?id=36192.

    The problem was getting a NID from an invalid struct page that was not
    initialized because it was out of node, i.e. outside [node_start_pfn,
    node_end_pfn).

    Now, with sparsemem, page_cgroup_init scans pfn from 0 to max_pfn. But
    this may scan a pfn which is not on any node and can access memmap which
    is not initialized.

    This makes page_cgroup_init() for SPARSEMEM node aware and removes the
    code that gets the nid from page->flags. (Then we always use a valid NID.)
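
    A sketch of the resulting per-node scan (the loop structure is assumed
    from the description; the helper signature is illustrative):

        void __init page_cgroup_init(void)
        {
                unsigned long pfn;
                int nid;

                if (mem_cgroup_disabled())
                        return;

                for_each_node_state(nid, N_HIGH_MEMORY) {
                        unsigned long start = node_start_pfn(nid);
                        unsigned long end = node_end_pfn(nid);

                        /* only pfns inside [node_start_pfn, node_end_pfn) are
                         * scanned, so an uninitialized memmap is never read */
                        for (pfn = start; pfn < end;
                             pfn = ALIGN(pfn + 1, PAGES_PER_SECTION)) {
                                if (!pfn_present(pfn))
                                        continue;
                                init_section_page_cgroup(pfn, nid);
                        }
                }
        }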

    [akpm@linux-foundation.org: try to fix up comments]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

27 May, 2011

3 commits

  • Move the page-freeing code out of swap_cgroup_mutex in the hope that it
    reduces some of the theoretical contention between concurrent swapons
    and/or swapoffs.

    This is just a cleanup, no functional changes.
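
    A sketch of the reordering in swap_cgroup_swapoff(): detach the map under
    the mutex, free it afterwards (the exact shape is assumed):

        void swap_cgroup_swapoff(int type)
        {
                struct page **map;
                unsigned long i, length;
                struct swap_cgroup_ctrl *ctrl;

                mutex_lock(&swap_cgroup_mutex);
                ctrl = &swap_cgroup_ctrl[type];
                map = ctrl->map;
                length = ctrl->length;
                ctrl->map = NULL;
                ctrl->length = 0;
                mutex_unlock(&swap_cgroup_mutex);

                /* the page freeing now happens outside the mutex */
                if (map) {
                        for (i = 0; i < length; i++) {
                                if (map[i])
                                        __free_page(map[i]);
                        }
                        vfree(map);
                }
        }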

    Signed-off-by: Namhyung Kim
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • It allocated one more page than necessary if @max_pages was a multiple of
    SC_PER_PAGE.
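
    In other words, the number of map pages is a round-up division; a sketch
    of the corrected computation (assumed):

        /* before: length = max_pages / SC_PER_PAGE + 1;  -- one page too many
         * whenever max_pages is an exact multiple of SC_PER_PAGE */
        length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);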

    Signed-off-by: Namhyung Kim
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Commit ca371c0d7e23 ("memcg: fix page_cgroup fatal error in FLATMEM")
    removes the call to alloc_bootmem() from the function so that it can be
    marked __meminit, reducing memory usage when MEMORY_HOTPLUG=n.

    Also, as the new helper function alloc_page_cgroup() is called only from
    that function, it should be marked __meminit too.

    Signed-off-by: Namhyung Kim
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Michal Hocko
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     

12 May, 2011

1 commit

  • Commit dde79e005a769 ("page_cgroup: reduce allocation overhead for
    page_cgroup array for CONFIG_SPARSEMEM") introduced a regression: the
    memory cgroup data structures all end up on node 0 because the first
    attempt at allocating them does not pass in a node hint. Since the
    initialization runs on CPU #0, it would all end up on node 0. This is a
    problem on large memory systems, where node 0 would lose a lot of
    memory.

    Change the alloc_pages_exact() to alloc_pages_exact_nid(). This will
    still fall back to other nodes if not enough memory is available.

    [ RED-PEN: right now it would fall back first before trying
    vmalloc_node. Probably not the best strategy ... But I left it like
    that for now. ]
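
    The change itself is essentially a one-liner in the allocation path
    (sketch; the surrounding function is assumed):

        /* before: no node hint, so everything landed on the boot CPU's node
         *      addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
         * after: */
        addr = alloc_pages_exact_nid(nid, size, GFP_KERNEL | __GFP_NOWARN);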

    Signed-off-by: Andi Kleen
    Reported-by: Doug Nelson
    Cc: David Rientjes
    Reviewed-by: Michal Hocko
    Cc: Dave Hansen
    Acked-by: Balbir Singh
    Acked-by: Johannes Weiner
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

31 Mar, 2011

1 commit


24 Mar, 2011

3 commits

  • KAMEZAWA Hiroyuki noted that free_pages_cgroup doesn't have to check for
    PageReserved because we never store the array on reserved pages (neither
    alloc_pages_exact nor vmalloc use those pages).

    So we can replace the check by a BUG_ON.
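
    A sketch of what the freeing path then asserts (context assumed):

        BUG_ON(PageReserved(page)); /* the array never sits on reserved pages */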

    Signed-off-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Currently we are allocating a single page_cgroup array per memory section
    (stored in mem_section->base) when CONFIG_SPARSEMEM is selected. This is
    a correct but memory-inefficient solution because the allocated memory
    (unless we fall back to vmalloc) is not kmalloc friendly:

    - 32b - 16384 entries (20B per entry) fit into 327680B, so the 524288B
      slab cache is used
    - 32b with PAE - 131072 entries with 2621440B fit into 4194304B
    - 64b - 32768 entries (40B per entry) fit into the 2097152B cache

    This is ~37% wasted space per memory section, and it adds up over the
    whole of memory. On an x86_64 machine it is something like 6MB per 1GB of
    RAM.

    We can reduce the internal fragmentation by using alloc_pages_exact(),
    which allocates PAGE_SIZE-aligned blocks, so the per-section array only
    takes the space it actually needs.
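
    A sketch of the allocation with alloc_pages_exact(), keeping vmalloc() as
    the fallback (details assumed):

        table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;

        base = alloc_pages_exact(table_size, GFP_KERNEL | __GFP_NOWARN);
        if (!base)
                base = vmalloc(table_size);
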
    Cc: Dave Hansen
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • In struct page_cgroup, we have a full word for flags but only a few are
    reserved. Use the remaining upper bits to encode, depending on
    configuration, the node or the section, to enable page_cgroup-to-page
    lookups without a direct pointer.
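
    An illustrative sketch only -- the constants and helper names below are
    assumptions, not the kernel's; the idea is to keep the array (node or
    section) id in the otherwise unused upper bits of pc->flags:

        #define ARRAYID_BITS    8       /* assumed width for the array id */
        #define ARRAYID_SHIFT   (BITS_PER_LONG - ARRAYID_BITS)
        #define ARRAYID_MASK    ((1UL << ARRAYID_BITS) - 1)

        static inline void page_cgroup_set_array_id(struct page_cgroup *pc,
                                                    unsigned long id)
        {
                pc->flags &= ~(ARRAYID_MASK << ARRAYID_SHIFT);  /* clear old id */
                pc->flags |= (id & ARRAYID_MASK) << ARRAYID_SHIFT;
        }

        static inline unsigned long page_cgroup_array_id(struct page_cgroup *pc)
        {
                return (pc->flags >> ARRAYID_SHIFT) & ARRAYID_MASK;
        }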

    This saves a full word for every page in a system with memory cgroups
    enabled.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

23 Mar, 2011

1 commit

  • While looking at some other notifier callbacks I noticed this code could
    use a simple cleanup.

    The callers of notifier_from_errno() no longer need the if (ret)/else
    conditional; that same conditional is now done inside
    notifier_from_errno().
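
    So the hotplug callback can return unconditionally; a sketch of the
    pattern (assumed):

        /* before:
         *      if (ret)
         *              ret = notifier_from_errno(ret);
         *      else
         *              ret = NOTIFY_OK;
         *      return ret;
         * after: */
        return notifier_from_errno(ret);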

    Signed-off-by: Prarit Bhargava
    Cc: Paul Menage
    Cc: Li Zefan
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
     

19 Jul, 2010

1 commit

  • The pointer to the page_cgroup table allocated in
    init_section_page_cgroup() is stored in section->page_cgroup as (base -
    pfn). Since this value does not point to the beginning or inside the
    allocated memory block, kmemleak reports a false positive.

    This was reported in bugzilla.kernel.org as #16297.

    Signed-off-by: Catalin Marinas
    Reported-by: Adrien Dessemond
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Pekka Enberg
    Cc: Andrew Morton

    Catalin Marinas
     

18 Mar, 2010

1 commit

  • swap_cgroup uses 2-byte data and uses cmpxchg in a new operation. 2-byte
    cmpxchg/xchg is not available on some architectures. This patch replaces
    cmpxchg/xchg with operations under a lock.
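
    A sketch of the lock-based update, reusing the lookup helper sketched in
    the 2012 entry above and assuming swap_cgroup_ctrl carries a spinlock
    (illustrative, not the exact patch):

        unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
        {
                struct swap_cgroup_ctrl *ctrl;
                struct swap_cgroup *sc;
                unsigned short old;
                unsigned long flags;

                sc = swap_cgroup_getsc(ent, &ctrl);

                /* a plain spinlock instead of a 16-bit cmpxchg/xchg, which
                 * not every architecture provides */
                spin_lock_irqsave(&ctrl->lock, flags);
                old = sc->id;
                sc->id = id;
                spin_unlock_irqrestore(&ctrl->lock, flags);

                return old;
        }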

    Signed-off-by: KAMEZAWA Hiroyuki
    Reported-by: Sachin Sant
    Acked-by: Balbir Singh
    Acked-by: Daisuke Nishimura
    Cc: Li Zefan
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

13 Mar, 2010

1 commit

  • This patch is another core part of this move-charge-at-task-migration
    feature. It enables moving charges of anonymous swaps.

    To move the charge of swap, we need to exchange swap_cgroup's record.

    In current implementation, swap_cgroup's record is protected by:

    - page lock: if the entry is on swap cache.
    - swap_lock: if the entry is not on swap cache.

    This works well in usual swap-in/out activity.

    But this behavior makes the swap-charge-moving feature check many
    conditions to exchange swap_cgroup's record safely.

    So I changed the modification of swap_cgroup's record (swap_cgroup_record())
    to use xchg, and defined a new function to cmpxchg swap_cgroup's record.

    This patch also enables moving the charge of non-pte_present but not yet
    uncharged swap caches, which can exist on the swap-out path, by getting
    the target pages via find_get_page() as do_mincore() does.

    [kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
    [akpm@linux-foundation.org: fix typos]
    Signed-off-by: Daisuke Nishimura
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Daisuke Nishimura
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     

22 Sep, 2009

1 commit

  • To initialize a hot-added node, some pages are allocated. At that time the
    node has no memory, which makes the allocation always fail. In such a
    case, let's allocate pages from other nodes.
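
    A sketch of the fallback (assumed shape):

        /* a hot-added node may have no memory yet, so only ask for node-local
         * memory when the node actually has some */
        if (node_state(nid, N_HIGH_MEMORY))
                base = kmalloc_node(table_size, GFP_KERNEL | __GFP_NOWARN, nid);
        else
                base = kmalloc(table_size, GFP_KERNEL | __GFP_NOWARN);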

    Signed-off-by: Shaohua Li
    Signed-off-by: Yakui Zhao
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

19 Jun, 2009

3 commits

  • We don't need to check do_swap_account in functions that will never get
    called when do_swap_account == 0.

    Signed-off-by: Li Zefan
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Add file RSS tracking per memory cgroup

    We currently don't track file RSS; the RSS we report is actually anon RSS.
    All the file-mapped pages come in through the page cache and get
    accounted there. This patch adds support for accounting file RSS pages.
    It should

    1. Help improve the metrics reported by the memory resource controller
    2. Will form the basis for a future shared memory accounting heuristic
    that has been proposed by Kamezawa.

    Unfortunately, we cannot rename the existing "rss" keyword used in
    memory.stat to "anon_rss". We do, however, add "mapped_file" data and hope
    to educate the end user through documentation.

    [hugh.dickins@tiscali.co.uk: fix mem_cgroup_update_mapped_file_stat oops]
    Signed-off-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Dhaval Giani
    Cc: Daisuke Nishimura
    Cc: YAMAMOTO Takashi
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Fix some cgroup messages to read better.
    Update MAINTAINERS to include mm/*cgroup* files.

    Signed-off-by: Randy Dunlap
    Cc: Paul Menage
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

12 Jun, 2009

2 commits

  • Now, SLAB is configured at a very early stage and can be used in init
    routines.

    But replacing alloc_bootmem() in FLAT/DISCONTIGMEM's page_cgroup()
    initialization breaks the allocation now. (It works well in the SPARSEMEM
    case... SPARSEMEM supports MEMORY_HOTPLUG and the size of page_cgroup is
    reasonable (< 1 << MAX_ORDER).)

    This patch revives FLATMEM+memory cgroup by using alloc_bootmem.

    In the future, we should either stop supporting FLATMEM (if it has no
    users) or rewrite the flatmem code completely, but that would add more
    messy code and overhead.

    Reported-by: Li Zefan
    Tested-by: Li Zefan
    Tested-by: Ingo Molnar
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Pekka Enberg

    KAMEZAWA Hiroyuki
     
  • The bootmem allocator is no longer available for page_cgroup_init() because we
    set up the kernel slab allocator much earlier now.

    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Signed-off-by: Yinghai Lu
    Signed-off-by: Pekka Enberg

    Yinghai Lu
     

03 Apr, 2009

2 commits

  • It's been pointed out that swap_cgroup's message at swapon() is nonsense,
    because:

    * It can be calculated very easily if all necessary information is
    written in Kconfig.

    * It's not necessary to annoy people at every swapon().

    From another point of view, memory usage per swp_entry is now reduced to
    2 bytes from 8 bytes (64-bit), and I think it's reasonably small.

    Reported-by: Hugh Dickins
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Try to use the CSS ID for records in swap_cgroup. With this, on a 64-bit
    machine, the size of swap_cgroup goes down to 2 bytes from 8 bytes.

    This means, when 2GB of swap is equipped, (assume the page size is 4096bytes)

    From size of swap_cgroup = 2G/4k * 8 = 4Mbytes.
    To size of swap_cgroup = 2G/4k * 2 = 1Mbytes.

    Reduction is large. Of course, there are trade-offs. This CSS ID will
    add overhead to swap-in/swap-out/swap-free.

    But in general,
    - swap is a resource which the user tends to avoid using.
    - If swap is never used, swap_cgroup area is not used.
    - Reading traditional manuals, size of swap should be proportional to
    size of memory. Memory size of machine is increasing now.

    I think reducing size of swap_cgroup makes sense.

    Note:
    - ID->CSS lookup routine has no locks, it's under RCU-Read-Side.
    - memcg can be obsolete at rmdir() but not freed while refcnt from
    swap_cgroup is available.
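
    The record then shrinks to a single short; a sketch of the resulting
    structure (assumed):

        struct swap_cgroup {
                unsigned short id;      /* css id of the owner, 0 = unused */
        };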

    Changelog v4->v5:
    - reworked on to memcg-charge-swapcache-to-proper-memcg.patch
    Changelog ->v4:
    - fixed not configured case.
    - deleted unnecessary comments.
    - fixed NULL pointer bug.
    - fixed message in dmesg.

    [nishimura@mxp.nes.nec.co.jp: css_tryget can be called twice in !PageCgroupUsed case]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Paul Menage
    Cc: Hugh Dickins
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

12 Feb, 2009

1 commit

  • page_cgroup's page allocation at init/memory hotplug uses kmalloc() and
    vmalloc(). If kmalloc() fails, vmalloc() is used.

    This is because vmalloc() is a very limited resource on 32-bit systems.
    We want to use kmalloc() first.

    But in this kind of call, __GFP_NOWARN should be specified.
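
    A sketch of the intended pattern (assumed):

        /* try kmalloc() first but stay quiet on failure; vmalloc() is the
         * fallback on 32-bit systems with scarce vmalloc space */
        base = kmalloc(table_size, GFP_KERNEL | __GFP_NOWARN);
        if (!base)
                base = vmalloc(table_size);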

    Reported-by: Heiko Carstens
    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

09 Jan, 2009

4 commits

  • We check whether mem_cgroup is disabled or not by checking
    mem_cgroup_subsys.disabled. I think it has more references than expected
    now.

    replacing
    if (mem_cgroup_subsys.disabled)
    with
    if (mem_cgroup_disabled())

    gives us a cleaner look, I think.
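
    A sketch of such a wrapper (assumed; the real helper may differ):

        static inline bool mem_cgroup_disabled(void)
        {
                return mem_cgroup_subsys.disabled;
        }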

    [kamezawa.hiroyu@jp.fujitsu.com: fix typo]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hirokazu Takahashi
     
  • A big patch for changing memcg's LRU semantics.

    Now,
    - page_cgroup is linked to mem_cgroup's own LRU (per zone).

    - The LRU of page_cgroup is not synchronized with the global LRU.

    - page and page_cgroup are one-to-one and statically allocated.

    - To find which LRU a page_cgroup is on, you have to check pc->mem_cgroup,
      as in: lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);

    - SwapCache is handled.

    And when we handle the LRU list of page_cgroup, we do the following:

        pc = lookup_page_cgroup(page);
        lock_page_cgroup(pc); .....................(1)
        mz = page_cgroup_zoneinfo(pc);
        spin_lock(&mz->lru_lock);
        .....add to LRU
        spin_unlock(&mz->lru_lock);
        unlock_page_cgroup(pc);

    But (1) is a spinlock and we have to be afraid of deadlock with
    zone->lru_lock. So trylock() is used at (1) now. Without (1), we can't
    trust that "mz" is correct.

    This is a trial to remove this dirty nesting of locks.
    This patch changes mz->lru_lock to be zone->lru_lock.
    Then, the above sequence can be written as

        spin_lock(&zone->lru_lock);   # in vmscan.c or swap.c via global LRU
        mem_cgroup_add/remove/etc_lru() {
                pc = lookup_page_cgroup(page);
                mz = page_cgroup_zoneinfo(pc);
                if (PageCgroupUsed(pc)) {
                        ....add to LRU
                }
        }
        spin_unlock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU

    This is much simpler.
    (*) We're safe even if we don't take lock_page_cgroup(pc). Because..
    1. When pc->mem_cgroup can be modified.
    - at charge.
    - at account_move().
    2. at charge
    the PCG_USED bit is not set before pc->mem_cgroup is fixed.
    3. at account_move()
    the page is isolated and not on LRU.

    Pros.
    - easy for maintenance.
    - memcg can make use of laziness of pagevec.
    - we don't have to duplicate the LRU/Active/Unevictable bits in page_cgroup.
    - LRU status of memcg will be synchronized with global LRU's one.
    - # of locks are reduced.
    - account_move() is simplified very much.
    Cons.
    - may increase cost of LRU rotation.
    (no impact if memcg is not configured.)

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • For accounting swap, we need a record per swap entry, at least.

    This patch adds following function.
    - swap_cgroup_swapon() .... called from swapon
    - swap_cgroup_swapoff() ... called at the end of swapoff.

    - swap_cgroup_record() .... record information of swap entry.
    - swap_cgroup_lookup() .... lookup information of swap entry.

    This patch just implements "how to record information"; there is no actual
    method for limiting the usage of swap. These routines use a flat table for
    record and lookup. A "wise" lookup system like a radix tree requires memory
    allocation for new records, but swap-out is usually called under memory
    shortage (or when memcg hits its limit). So I used static allocation.
    (Maybe dynamic allocation is not very hard, but it adds additional memory
    allocation to the memory-shortage path.)

    Note1: In this patch, we use a pointer to record the information, which
    means 8 bytes per swap entry. I think we can reduce this once we create an
    "id of cgroup" in the range of 0-65535 or 0-255.

    Reported-by: Daisuke Nishimura
    Reviewed-by: Daisuke Nishimura
    Tested-by: Daisuke Nishimura
    Reported-by: Hugh Dickins
    Reported-by: Balbir Singh
    Reported-by: Andrew Morton
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Pavel Emelianov
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In init_section_page_cgroup() the section a given pfn belongs to is
    calculated at the top of the function and, despite the fact that the
    pfn/section correspondence does not change, it is recalculated further
    down the same function. By computing this just once and reusing that
    value we save some bytes in the object file and do not waste CPU cycles.

    Signed-off-by: Fernando Luis Vazquez Cao
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fernando Luis Vazquez Cao
     

07 Jan, 2009

1 commit


11 Dec, 2008

1 commit


02 Dec, 2008

1 commit

  • Fixes for memcg/memory hotplug.

    While memory hotplug allocates/frees memmap, page_cgroup doesn't free
    page_cgroup at OFFLINE when page_cgroup was allocated via bootmem.
    (Because freeing bootmem requires special care.)

    Then, if page_cgroup is allocated by bootmem and memmap is freed/allocated
    by memory hotplug, page_cgroup->page == page is no longer true.

    But current MEM_ONLINE handler doesn't check it and update
    page_cgroup->page if it's not necessary to allocate page_cgroup. (This
    was not found because memmap is not freed if SPARSEMEM_VMEMMAP is y.)

    And I noticed that MEM_ONLINE can be called against "part of section".
    So, freeing page_cgroup at CANCEL_ONLINE will cause trouble. (freeing
    used page_cgroup) Don't rollback at CANCEL.

    One more thing: the current memory hotplug notifier chain is stopped by
    slub because it sets NOTIFY_STOP_MASK in its return value, so
    page_cgroup's callback never gets called (it has lower priority than slub
    now).

    I think this slub behavior is not intentional (a BUG), and this fixes it.

    Another approach to consider for page_cgroup allocation:
    - free page_cgroup at OFFLINE even if it's from bootmem
      and remove the special handler. But that requires more changes.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12041

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Tested-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

01 Dec, 2008

1 commit