03 Apr, 2020

3 commits

  • For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
    in the resv_map entries, in file_region->reservation_counter.

    After a call to region_chg, we charge the appropriate hugetlb_cgroup, and
    if successful, we pass on the hugetlb_cgroup info to a follow-up
    region_add call. When a file_region entry is added to the resv_map via
    region_add, we put the pointer to that cgroup in
    file_region->reservation_counter. If charging doesn't succeed, we report
    the error to the caller, so that the kernel fails the reservation.

    On region_del, which is when the hugetlb memory is unreserved, we also
    uncharge file_region->reservation_counter (see the sketch after this
    entry).

    [akpm@linux-foundation.org: forward declare struct file_region]
    Signed-off-by: Mina Almasry
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Cc: David Rientjes
    Cc: Greg Thelen
    Cc: Mike Kravetz
    Cc: Sandipan Das
    Cc: Shakeel Butt
    Cc: Shuah Khan
    Link: http://lkml.kernel.org/r/20200211213128.73302-5-almasrymina@google.com
    Signed-off-by: Linus Torvalds

    Mina Almasry
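    A minimal C sketch of the shared-mapping flow described in the entry
    above. The struct layout and every name carrying a _stub suffix are
    illustrative stand-ins, not the kernel's actual definitions.

      #include <stdlib.h>

      struct hugetlb_cgroup_stub;             /* stand-in for the real cgroup */

      /* One reserved range in a shared mapping's resv_map. */
      struct file_region_stub {
              struct file_region_stub *next;
              long from, to;                                    /* range, in huge pages */
              struct hugetlb_cgroup_stub *reservation_counter;  /* who to uncharge */
      };

      /* region_add-style step: record the range and remember the cgroup that
       * the preceding region_chg-style step charged. */
      static struct file_region_stub *
      add_region_stub(struct file_region_stub *head, long from, long to,
                      struct hugetlb_cgroup_stub *charged)
      {
              struct file_region_stub *rg = malloc(sizeof(*rg));

              if (!rg)
                      return head;
              rg->from = from;
              rg->to = to;
              rg->reservation_counter = charged;
              rg->next = head;
              return rg;
      }

      /* region_del-style step: uncharge each entry as the range is unreserved. */
      static void del_regions_stub(struct file_region_stub *head,
                                   void (*uncharge)(struct hugetlb_cgroup_stub *, long))
      {
              while (head) {
                      struct file_region_stub *next = head->next;

                      uncharge(head->reservation_counter, head->to - head->from);
                      free(head);
                      head = next;
              }
      }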
     
  • Normally the pointer to the cgroup to uncharge hangs off the struct page,
    and gets queried when it's time to free the page. With hugetlb_cgroup
    reservations, this is not possible, because a page may be reserved by one
    task and actually faulted in by another task.

    The best place to put the hugetlb_cgroup pointer to uncharge for
    reservations is in the resv_map. But, because the resv_map has different
    semantics for private and shared mappings, the code path to
    charge/uncharge shared and private mappings is different. This patch
    implements charging and uncharging for private mappings.

    For private mappings, the counter to uncharge is in
    resv_map->reservation_counter. On initializing the resv_map this is set
    to NULL. On reservation of a region in a private mapping, the task's
    hugetlb_cgroup is charged and the hugetlb_cgroup is placed in
    resv_map->reservation_counter.

    On hugetlb_vm_op_close, we uncharge resv_map->reservation_counter (see
    the sketch after this entry).

    [akpm@linux-foundation.org: forward declare struct resv_map]
    Signed-off-by: Mina Almasry
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Kravetz
    Acked-by: David Rientjes
    Cc: Greg Thelen
    Cc: Sandipan Das
    Cc: Shakeel Butt
    Cc: Shuah Khan
    Link: http://lkml.kernel.org/r/20200211213128.73302-3-almasrymina@google.com
    Signed-off-by: Linus Torvalds

    Mina Almasry
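    A matching sketch for the private-mapping case above: one counter hangs
    off the resv_map itself rather than off individual file_region entries.
    All _stub names are illustrative, not the kernel's definitions.

      struct hugetlb_cgroup_stub;             /* stand-in for the real cgroup */

      struct resv_map_stub {
              long reserved_pages;
              struct hugetlb_cgroup_stub *reservation_counter;  /* NULL until charged */
      };

      /* Reserve-time step: charge the task's cgroup and remember it; on
       * failure the caller fails the reservation. */
      static int reserve_private_stub(struct resv_map_stub *resv, long npages,
                                      struct hugetlb_cgroup_stub *task_cg,
                                      int (*try_charge)(struct hugetlb_cgroup_stub *, long))
      {
              if (try_charge(task_cg, npages))
                      return -1;
              resv->reservation_counter = task_cg;
              resv->reserved_pages = npages;
              return 0;
      }

      /* hugetlb_vm_op_close-style step: uncharge whatever was recorded. */
      static void vm_op_close_stub(struct resv_map_stub *resv,
                                   void (*uncharge)(struct hugetlb_cgroup_stub *, long))
      {
              if (resv->reservation_counter)
                      uncharge(resv->reservation_counter, resv->reserved_pages);
      }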
     
  • Augments hugetlb_cgroup_charge_cgroup to be able to charge either the
    hugetlb usage counter or the hugetlb reservation counter.

    Adds a new interface to uncharge a hugetlb_cgroup counter via
    hugetlb_cgroup_uncharge_counter.

    Integrates the counter with hugetlb_cgroup via hugetlb_cgroup_init,
    hugetlb_cgroup_have_usage, and hugetlb_cgroup_css_offline (see the sketch
    after this entry).

    Signed-off-by: Mina Almasry
    Signed-off-by: Andrew Morton
    Acked-by: Mike Kravetz
    Acked-by: David Rientjes
    Cc: Greg Thelen
    Cc: Sandipan Das
    Cc: Shakeel Butt
    Cc: Shuah Khan
    Link: http://lkml.kernel.org/r/20200211213128.73302-2-almasrymina@google.com
    Signed-off-by: Linus Torvalds

    Mina Almasry
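    A sketch of the dual-counter idea from the entry above: the cgroup holds
    both a usage counter and a reservation counter per huge page size, and
    the charge/uncharge helpers take a flag selecting which one to touch.
    The counter type, array size and all names are simplified stand-ins.

      #include <stdbool.h>

      #define MAX_HSTATE_STUB 2

      struct counter_stub {
              long usage, limit;
      };

      struct hugetlb_cgroup_stub {
              struct counter_stub hugepage[MAX_HSTATE_STUB];       /* faulted-in usage */
              struct counter_stub rsvd_hugepage[MAX_HSTATE_STUB];  /* reservations */
      };

      static struct counter_stub *pick_counter_stub(struct hugetlb_cgroup_stub *h_cg,
                                                    int idx, bool rsvd)
      {
              return rsvd ? &h_cg->rsvd_hugepage[idx] : &h_cg->hugepage[idx];
      }

      static int charge_stub(struct hugetlb_cgroup_stub *h_cg, int idx,
                             long nr_pages, bool rsvd)
      {
              struct counter_stub *c = pick_counter_stub(h_cg, idx, rsvd);

              if (c->usage + nr_pages > c->limit)
                      return -1;              /* over limit: fail the charge */
              c->usage += nr_pages;
              return 0;
      }

      /* Counterpart of the new hugetlb_cgroup_uncharge_counter interface. */
      static void uncharge_counter_stub(struct counter_stub *c, long nr_pages)
      {
              c->usage -= nr_pages;
      }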
     

21 May, 2016

1 commit

  • The HUGETLBFS_SB macro is clear enough, so using it in a single statement
    is clearer than spreading it over three lines.

    Remove redundant return statements from functions that return void, which
    at least saves lines.

    Signed-off-by: Chen Gang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     

07 Nov, 2015

1 commit

  • Hugh has pointed out that a compound_head() call can be unsafe in some
    contexts. Here's one example:

    CPU0                                        CPU1

    isolate_migratepages_block()
      page_count()
        compound_head()
          !!PageTail() == true
                                                put_page()
                                                  tail->first_page = NULL
          head = tail->first_page
                                                alloc_pages(__GFP_COMP)
                                                  prep_compound_page()
                                                    tail->first_page = head
                                                    __SetPageTail(p);
          !!PageTail() == true

    The race is purely theoretical; I don't think it's possible to trigger it
    in practice. But who knows.

    We can fix the race by changing how we encode PageTail() and
    compound_head() within struct page, so that both can be updated in one
    shot.

    The patch introduces page->compound_head in the third double word block,
    in front of compound_dtor and compound_order. Bit 0 encodes PageTail(),
    and if it is set the remaining bits are a pointer to the head page (a
    sketch of this encoding follows this entry).

    The patch moves page->pmd_huge_pte out of that word, just in case an
    architecture defines pgtable_t as something that can have bit 0 set.

    hugetlb_cgroup uses page->lru.next in the second tail page to store a
    pointer to struct hugetlb_cgroup. The patch switches it to use
    page->private in the second tail page instead. That space is free since
    ->first_page is removed from the union.

    The patch also opens up the possibility of removing the
    HUGETLB_CGROUP_MIN_ORDER limitation, since there is now space in the
    first tail page to store the struct hugetlb_cgroup pointer. But that's
    out of scope for this patch.

    That means page->compound_head shares storage space with:

    - page->lru.next;
    - page->next;
    - page->rcu_head.next;

    That's too long a list to be absolutely sure, but it looks like nobody
    uses bit 0 of the word.

    page->rcu_head.next is guaranteed[1] to have bit 0 clear as long as we
    use call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But a
    future call_rcu_lazy() is not allowed, as it makes use of that bit and we
    could get a false positive PageTail().

    [1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.com

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Reviewed-by: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Vlastimil Babka
    Acked-by: Paul E. McKenney
    Cc: Aneesh Kumar K.V
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
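    A minimal sketch of the bit-0 encoding described in the entry above,
    using a stand-in struct rather than the kernel's struct page. The point
    is that PageTail() and the head pointer live in a single word, so both
    can be read or published with one store.

      struct page_stub {
              unsigned long compound_head;    /* bit 0 set => tail; rest = head pointer */
      };

      static int page_tail_stub(const struct page_stub *page)
      {
              return page->compound_head & 1;
      }

      static struct page_stub *compound_head_stub(struct page_stub *page)
      {
              unsigned long head = page->compound_head;

              /* One read answers both "is this a tail?" and "where is the head?". */
              if (head & 1)
                      return (struct page_stub *)(head - 1);
              return page;
      }

      static void set_compound_head_stub(struct page_stub *tail, struct page_stub *head)
      {
              /* Pointer and tail bit are published in one shot, closing the race. */
              tail->compound_head = (unsigned long)head + 1;
      }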
     

18 Sep, 2015

1 commit

  • Replace cgroup_subsys->disabled tests in controllers with
    cgroup_subsys_enabled(). cgroup_subsys_enabled() requires a literal
    subsys name as its parameter and thus can't be used by the cgroup core,
    which iterates through controllers. For the cgroup core, introduce and
    use cgroup_ssid_enabled(), which uses the slower static_key_enabled()
    test and can be indexed by subsys ID.

    This leaves cgroup_subsys->disabled unused. Remove it.

    Signed-off-by: Tejun Heo
    Acked-by: Zefan Li
    Cc: Johannes Weiner
    Cc: Michal Hocko

    Tejun Heo
     

11 Dec, 2014

1 commit


08 Feb, 2014

1 commit

  • cgroup_subsys is a bit messier than it needs to be.

    * The name of a subsys can be different from its internal identifier
    defined in cgroup_subsys.h. Most subsystems use the matching name
    but three - cpu, memory and perf_event - use different ones.

    * cgroup_subsys_id enums are postfixed with _subsys_id and each
    cgroup_subsys is postfixed with _subsys. cgroup.h is widely
    included throughout various subsystems; it doesn't and shouldn't
    have a claim on such generic names, which don't have any qualifier
    indicating that they belong to cgroup.

    * cgroup_subsys->subsys_id should always equal the matching
    cgroup_subsys_id enum; however, we require each controller to
    initialize it and then BUG if they don't match, which is a bit
    silly.

    This patch cleans up cgroup_subsys names and initialization by doing
    the following.

    * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
    cgroup_subsys with _cgrp_subsys.

    * With the above, renaming subsys identifiers to match the userland
    visible names doesn't cause any naming conflicts. All non-matching
    identifiers are renamed to match the official names.

    cpu_cgroup -> cpu
    mem_cgroup -> memory
    perf -> perf_event

    * controllers no longer need to initialize ->subsys_id and ->name.
    They're generated in cgroup core and set automatically during boot.

    * Redundant cgroup_subsys declarations removed.

    * While updating BUG_ON()s in cgroup_init_early(), convert them to
    WARN()s. BUGging that early during boot is stupid - the kernel
    can't print anything, even through the serial console, and the trap
    handler doesn't even link the stack frame properly for back-tracing.

    This patch doesn't introduce any behavior changes.

    v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
    classid handling into core").

    Signed-off-by: Tejun Heo
    Acked-by: Neil Horman
    Acked-by: "David S. Miller"
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra
    Acked-by: Aristeu Rozanski
    Acked-by: Ingo Molnar
    Acked-by: Li Zefan
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Serge E. Hallyn
    Cc: Vivek Goyal
    Cc: Thomas Graf

    Tejun Heo
     

24 Jan, 2014

1 commit

  • Most of the VM_BUG_ON assertions are performed on a page. Usually, when
    one of these assertions fails, we'll get a BUG_ON with a call stack and
    the registers.

    Based on recent requests to add a small piece of code that dumps the
    page at various VM_BUG_ON sites, I've noticed that the page dump is
    quite useful to people debugging issues in mm.

    This patch adds VM_BUG_ON_PAGE(cond, page) which, beyond doing what
    VM_BUG_ON() does, also dumps the page before executing the actual BUG_ON
    (see the sketch after this entry).

    [akpm@linux-foundation.org: fix up includes]
    Signed-off-by: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
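    Roughly what the entry above describes, as a self-contained sketch; the
    kernel's actual macro differs in details such as dump_page()'s signature
    and CONFIG_DEBUG_VM gating. The _STUB/_stub names are illustrative.

      #include <stdio.h>
      #include <stdlib.h>

      struct page_stub { unsigned long flags; };      /* stand-in for struct page */

      static void dump_page_stub(struct page_stub *page, const char *reason)
      {
              fprintf(stderr, "page %p flags %#lx: %s\n",
                      (void *)page, page->flags, reason);
      }

      /* Dump the offending page, then die as VM_BUG_ON() would. */
      #define VM_BUG_ON_PAGE_STUB(cond, page)                                   \
              do {                                                              \
                      if (cond) {                                               \
                              dump_page_stub(page, "VM_BUG_ON_PAGE(" #cond ")");\
                              abort();        /* stand-in for BUG() */          \
                      }                                                         \
              } while (0)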
     

19 Dec, 2012

1 commit

  • Build the kernel with CONFIG_HUGETLBFS=y, CONFIG_HUGETLB_PAGE=y and
    CONFIG_CGROUP_HUGETLB=y, then specify the hugepagesz=xx boot option; the
    system will fail to boot.

    This failure is caused by the following code path:

    setup_hugepagesz
      hugetlb_add_hstate
        hugetlb_cgroup_file_init
          cgroup_add_cftypes
            kzalloc

    Signed-off-by: Jiang Liu
    Reviewed-by: Aneesh Kumar K.V
    Acked-by: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianguo Wu
     

01 Aug, 2012

5 commits

  • With HugeTLB pages, the hugetlb cgroup is uncharged in the compound page
    destructor. Since we are holding a hugepage reference, we can be sure
    that the old page won't get uncharged till the last put_page().

    Signed-off-by: Aneesh Kumar K.V
    Cc: David Rientjes
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hillf Danton
    Cc: Michal Hocko
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Add the control files for the hugetlb controller.

    [akpm@linux-foundation.org: s/CONFIG_CGROUP_HUGETLB_RES_CTLR/CONFIG_MEMCG_HUGETLB/g]
    [akpm@linux-foundation.org: s/CONFIG_MEMCG_HUGETLB/CONFIG_CGROUP_HUGETLB/]
    Signed-off-by: Aneesh Kumar K.V
    Cc: David Rientjes
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hillf Danton
    Reviewed-by: Michal Hocko
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Add the charge and uncharge routines for hugetlb cgroup. We do cgroup
    charging in page alloc and uncharge in the compound page destructor.
    Assigning a page's hugetlb cgroup is protected by hugetlb_lock (see the
    sketch after this entry).

    [liwp@linux.vnet.ibm.com: add huge_page_order check to avoid incorrect uncharge]
    Signed-off-by: Aneesh Kumar K.V
    Cc: David Rientjes
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hillf Danton
    Cc: Michal Hocko
    Cc: KOSAKI Motohiro
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
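    A hedged sketch of the life cycle described above, with illustrative
    names and a pthread mutex standing in for hugetlb_lock: charge at
    allocation, record the owning cgroup on the page under the lock, and
    uncharge in the compound-page destructor.

      #include <pthread.h>
      #include <stddef.h>

      struct hugetlb_cgroup_stub;

      struct hugepage_stub {
              struct hugetlb_cgroup_stub *cgroup;     /* set under the lock */
      };

      static pthread_mutex_t hugetlb_lock_stub = PTHREAD_MUTEX_INITIALIZER;

      /* Allocation path: the cgroup was charged; record it on the page. */
      static void commit_charge_stub(struct hugepage_stub *page,
                                     struct hugetlb_cgroup_stub *h_cg)
      {
              pthread_mutex_lock(&hugetlb_lock_stub);
              page->cgroup = h_cg;
              pthread_mutex_unlock(&hugetlb_lock_stub);
      }

      /* Compound-page destructor path: uncharge whoever was recorded. */
      static void destroy_compound_stub(struct hugepage_stub *page,
                                        void (*uncharge)(struct hugetlb_cgroup_stub *))
      {
              pthread_mutex_lock(&hugetlb_lock_stub);
              if (page->cgroup)
                      uncharge(page->cgroup);
              page->cgroup = NULL;
              pthread_mutex_unlock(&hugetlb_lock_stub);
      }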
     
  • Add the hugetlb cgroup pointer to the 3rd page's lru.next. This limits
    the use of hugetlb cgroup to hugepages with 3 or more normal pages. I
    guess that is an acceptable limitation (see the sketch after this entry).

    Signed-off-by: Aneesh Kumar K.V
    Cc: David Rientjes
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hillf Danton
    Reviewed-by: Michal Hocko
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
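    An illustrative sketch of where that pointer is stashed: the third
    constituent page of the compound hugepage carries the hugetlb_cgroup
    pointer, so hugepages made up of fewer than 3 base pages cannot carry
    one. Stand-in types and names throughout.

      #include <stddef.h>

      struct hugetlb_cgroup_stub;

      struct page_stub {
              void *lru_next;                 /* stand-in for page->lru.next */
      };

      static struct hugetlb_cgroup_stub *
      hugepage_cgroup_stub(struct page_stub *head, unsigned long nr_pages)
      {
              if (nr_pages < 3)               /* no third page to look in */
                      return NULL;
              return (struct hugetlb_cgroup_stub *)head[2].lru_next;
      }

      static void set_hugepage_cgroup_stub(struct page_stub *head,
                                           unsigned long nr_pages,
                                           struct hugetlb_cgroup_stub *h_cg)
      {
              if (nr_pages >= 3)
                      head[2].lru_next = h_cg;
      }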
     
  • Implement a new controller that allows us to control HugeTLB allocations.
    The extension allows limiting HugeTLB usage per control group and
    enforces the controller limit at page fault time. Since HugeTLB doesn't
    support page reclaim, enforcing the limit at page fault time implies that
    the application will get a SIGBUS signal if it tries to access HugeTLB
    pages beyond its limit (see the sketch after this entry). This requires
    the application to know beforehand how many HugeTLB pages it would
    require for its use.

    The charge/uncharge calls will be added to the HugeTLB code in a later
    patch. Support for cgroup removal will be added in later patches.

    [akpm@linux-foundation.org: s/CONFIG_CGROUP_HUGETLB_RES_CTLR/CONFIG_MEMCG_HUGETLB/g]
    [akpm@linux-foundation.org: s/CONFIG_MEMCG_HUGETLB/CONFIG_CGROUP_HUGETLB/g]
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Aneesh Kumar K.V
    Cc: David Rientjes
    Cc: Hillf Danton
    Reviewed-by: Michal Hocko
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
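    A hedged sketch of the enforcement model described above, with
    illustrative names rather than the kernel's real fault path: because
    HugeTLB pages cannot be reclaimed, a failed charge at fault time cannot
    be retried after reclaim, so the fault is answered with SIGBUS.

      #include <stdbool.h>

      enum fault_result_stub { FAULT_OK_STUB, FAULT_SIGBUS_STUB };

      struct limit_stub {
              long usage, limit;
      };

      static bool try_charge_stub(struct limit_stub *c, long nr_pages)
      {
              if (c->usage + nr_pages > c->limit)
                      return false;           /* over the control group limit */
              c->usage += nr_pages;
              return true;
      }

      static enum fault_result_stub hugetlb_fault_stub(struct limit_stub *group_limit)
      {
              /* No reclaim to fall back on: a failed charge becomes SIGBUS. */
              if (!try_charge_stub(group_limit, 1))
                      return FAULT_SIGBUS_STUB;
              /* ... allocate and map the huge page ... */
              return FAULT_OK_STUB;
      }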