19 Nov, 2021

1 commit

  • [ Upstream commit 81c49d39aea8a10e6d05d3aa1cb65ceb721e19b0 ]

    In account_guest_time() in kernel/sched/cputime.c, guest time is
    attributed to both CPUTIME_NICE and CPUTIME_USER in addition to
    CPUTIME_GUEST_NICE and CPUTIME_GUEST respectively. Therefore, adding
    both to calculate usage results in double counting any guest time at
    the root cgroup.

    Fixes: 936f2a70f207 ("cgroup: add cpu.stat file to root cgroup")
    Signed-off-by: Dan Schatzberg
    Signed-off-by: Tejun Heo
    Signed-off-by: Sasha Levin

    Dan Schatzberg
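To make the double counting concrete, here is a minimal user-space C sketch. It assumes a hypothetical `enum bucket` whose members loosely mirror the kernel's CPUTIME_* indices, and a hypothetical `account_guest()` that models what the commit message says account_guest_time() does. Neither is kernel code, and the real root cpu.stat computation sums more buckets than shown here.

```c
#include <stdint.h>

/* Hypothetical subset of per-CPU time buckets; names only echo the
 * kernel's CPUTIME_* indices. */
enum bucket { B_USER, B_NICE, B_SYSTEM, B_GUEST, B_GUEST_NICE, NR_BUCKETS };

/* Model of the behavior described for account_guest_time(): guest time
 * is recorded in the guest bucket AND in the matching user/nice bucket. */
static void account_guest(uint64_t stat[NR_BUCKETS], uint64_t t, int niced)
{
	if (niced) {
		stat[B_NICE] += t;
		stat[B_GUEST_NICE] += t;
	} else {
		stat[B_USER] += t;
		stat[B_GUEST] += t;
	}
}

/* Buggy usage: guest buckets are added on top of user/nice, so every
 * unit of guest time is counted twice. */
static uint64_t usage_buggy(const uint64_t stat[NR_BUCKETS])
{
	return stat[B_USER] + stat[B_NICE] + stat[B_GUEST] +
	       stat[B_GUEST_NICE] + stat[B_SYSTEM];
}

/* Fixed usage: guest time is already contained in user/nice. */
static uint64_t usage_fixed(const uint64_t stat[NR_BUCKETS])
{
	return stat[B_USER] + stat[B_NICE] + stat[B_SYSTEM];
}
```

With 10 units of plain guest time, 3 units of niced guest time and 5 units of system time, the buggy sum reports 31 while the fixed sum reports 18.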
     

28 Jul, 2021

1 commit

  • 0fa294fb1985 ("cgroup: Replace cgroup_rstat_mutex with a spinlock") added
    cgroup_rstat_flush_irqsafe() allowing flushing to happen from the irq
    context. However, rstat paths use u64_stats_sync to synchronize access to
    64bit stat counters on 32bit machines. u64_stats_sync is implemented using
    seq_lock and trying to read from an irq context can lead to A-A deadlock if
    the irq happens to interrupt the stat update.

    Fix it by using the irqsafe variants - u64_stats_update_begin_irqsave() and
    u64_stats_update_end_irqrestore() - in the update paths. Note that none of
    this matters on 64bit machines. All these are just for 32bit SMP setups.

    Note that while the interface was introduced way back, its first and
    currently only use was recently added by 2d146aa3aa84 ("mm: memcontrol:
    switch to rstat"). Stable tagging targets this commit.

    Signed-off-by: Tejun Heo
    Reported-by: Rik van Riel
    Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
    Cc: stable@vger.kernel.org # v5.13+

    Tejun Heo
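A toy, single-threaded model of the A-A deadlock described in the 28 Jul 2021 commit. `struct toy_u64_stat` and the `toy_*` helpers are hypothetical stand-ins for u64_stats_sync: there are no memory barriers and no real interrupts, and the retry cap in `toy_read()` exists only so the model can report "would spin forever" instead of hanging.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit counter stored as two 32-bit halves, as
 * on 32-bit machines, guarded by a sequence number. */
struct toy_u64_stat {
	unsigned int seq;   /* odd while an update is in progress */
	uint32_t lo, hi;
};

static void toy_update_begin(struct toy_u64_stat *s) { s->seq++; }
static void toy_update_end(struct toy_u64_stat *s)   { s->seq++; }

static void toy_add(struct toy_u64_stat *s, uint64_t delta)
{
	uint64_t v = ((uint64_t)s->hi << 32 | s->lo) + delta;

	/* Two separate stores: a reader in between could see a torn value. */
	s->lo = (uint32_t)v;
	s->hi = (uint32_t)(v >> 32);
}

/* Reader: retry while a write is in progress or the sequence changed.
 * If this reader has interrupted the writer on the same CPU, the writer
 * can never finish, so an unbounded loop would never exit; the cap
 * turns that into a false return. */
static bool toy_read(const struct toy_u64_stat *s, uint64_t *out, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		unsigned int start = s->seq;
		uint64_t v;

		if (start & 1)
			continue;   /* writer active: the real reader spins here */
		v = (uint64_t)s->hi << 32 | s->lo;
		if (s->seq == start) {
			*out = v;
			return true;
		}
	}
	return false;
}
```

The fix prevents a same-CPU reader from ever landing between begin and end: u64_stats_update_begin_irqsave() disables interrupts for the duration of the update, so an irq-context flush cannot interrupt it. As the commit notes, none of this matters on 64-bit machines.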
     

04 Jun, 2021

1 commit

  • Fix the function names in the cgroup.c and rstat.c kernel-doc comments
    to remove these warnings found by clang_w1.

    kernel/cgroup/cgroup.c:2401: warning: expecting prototype for
    cgroup_taskset_migrate(). Prototype was for cgroup_migrate_execute()
    instead.
    kernel/cgroup/rstat.c:233: warning: expecting prototype for
    cgroup_rstat_flush_begin(). Prototype was for cgroup_rstat_flush_hold()
    instead.

    Reported-by: Abaci Robot
    Fixes: 'commit e595cd706982 ("cgroup: track migration context in cgroup_mgctx")'
    Signed-off-by: Yang Li
    Signed-off-by: Tejun Heo

    Yang Li
     

25 May, 2021

1 commit

  • Fix some spelling mistakes in comments:
    hierarhcy ==> hierarchy
    automtically ==> automatically
    overriden ==> overridden
    In absense of .. or ==> In absence of .. and
    assocaited ==> associated
    taget ==> target
    initate ==> initiate
    succeded ==> succeeded
    curremt ==> current
    udpated ==> updated

    Signed-off-by: Zhen Lei
    Signed-off-by: Tejun Heo

    Zhen Lei
     

01 May, 2021

2 commits

  • Current users of the rstat code can source root-level statistics from
    the native counters of their respective subsystem, allowing them to
    forgo aggregation at the root level. This optimization is currently
    implemented inside the generic rstat code, which doesn't track the root
    cgroup and doesn't invoke the subsystem flush callbacks on it.

    However, the memory controller cannot do this optimization, because
    cgroup1 breaks out memory specifically for the local level, including at
    the root level. In preparation for the memory controller switching to
    rstat, move the optimization from rstat core to the controllers.

    Afterwards, rstat will always track the root cgroup for changes and
    invoke the subsystem callbacks on it; and it's up to the subsystem to
    special-case and skip aggregation of the root cgroup if it can source
    this information through other, cheaper means.

    This is the case for the io controller and the cgroup base stats. In
    their respective flush callbacks, check whether the parent is the root
    cgroup, and if so, skip the unnecessary upward propagation.

    The extra cost of tracking the root cgroup is negligible: on stat
    changes, we actually remove a branch that checks for the root. The
    queueing for a flush touches only per-cpu data, and only the first stat
    change since a flush requires a (per-cpu) lock.

    Link: https://lkml.kernel.org/r/20210209163304.77088-6-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Cc: Michal Hocko
    Cc: Michal Koutný
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Rstat currently only supports the default hierarchy in cgroup2. In
    order to replace memcg's private stats infrastructure - used in both
    cgroup1 and cgroup2 - with rstat, the latter needs to support cgroup1.

    The initialization and destruction callbacks for regular cgroups are
    already in place. Remove the cgroup_on_dfl() guards to handle cgroup1.

    The initialization of the root cgroup is currently hardcoded to only
    handle cgrp_dfl_root.cgrp. Move those callbacks to cgroup_setup_root()
    and cgroup_destroy_root() to handle the default root as well as the
    various cgroup1 roots we may set up during mounting.

    The linking of css to cgroups happens in code shared between cgroup1 and
    cgroup2 as well. Simply remove the cgroup_on_dfl() guard.

    Linkage of the root css to the root cgroup is a bit trickier: by
    default, the root css of a subsystem controller belongs to the default
    hierarchy (i.e. the cgroup2 root). When a controller is mounted in its
    cgroup1 version, the root css is stolen and moved to the cgroup1 root;
    on unmount, the css moves back to the default hierarchy. Annotate
    rebind_subsystems() to move the root css linkage along between roots.

    Link: https://lkml.kernel.org/r/20210209163304.77088-5-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reviewed-by: Roman Gushchin
    Reviewed-by: Shakeel Butt
    Acked-by: Tejun Heo
    Reviewed-by: Michal Koutný
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
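The first 01 May 2021 commit moves the root-cgroup shortcut out of rstat core and into each controller's flush callback. A minimal sketch of that shape follows, assuming a hypothetical `struct toy_cgroup` with a single counter and a `pending` delta; the real callbacks (io and base stats) work per-CPU on richer structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a cgroup with one recursive counter. */
struct toy_cgroup {
	struct toy_cgroup *parent;   /* NULL for the root */
	uint64_t count;              /* subtree total for this cgroup */
	uint64_t pending;            /* not yet propagated to the parent */
};

/* Model of a controller flush callback after the change: rstat now calls
 * it for every cgroup, root included, and the controller itself decides
 * to skip aggregation into the root. */
static void toy_flush_one(struct toy_cgroup *cg)
{
	struct toy_cgroup *parent = cg->parent;

	/* cg is the root, or its parent is the root: in this model root
	 * totals come from cheaper global counters, so don't propagate. */
	if (!parent || !parent->parent) {
		cg->pending = 0;
		return;
	}

	parent->count += cg->pending;
	parent->pending += cg->pending;   /* parent forwards it on its flush */
	cg->pending = 0;
}
```

Flushing children before parents, a grandchild's delta reaches its parent, but nothing is pushed into the root, whose numbers are assumed to be sourced elsewhere.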
     

29 Jun, 2020

1 commit


28 May, 2020

1 commit

  • Currently, the root cgroup does not have a cpu.stat file. Add one which
    is consistent with /proc/stat to capture global cpu statistics that
    might not fall under cgroup accounting.

    We haven't done this in the past because the data are already presented
    in /proc/stat and we didn't want to add overhead from collecting root
    cgroup stats when cgroups are configured, but no cgroups have been
    created.

    By keeping the data consistent with /proc/stat, I think we avoid the
    first problem, while improving the usability of cgroups stats.
    We avoid the second problem by computing the contents of cpu.stat from
    existing data collected for /proc/stat anyway.

    Signed-off-by: Boris Burkov
    Suggested-by: Tejun Heo
    Signed-off-by: Tejun Heo

    Boris Burkov
     

10 Apr, 2020

1 commit

  • This reverts commit 9a9e97b2f1f2 ("cgroup: Add memory barriers to plug
    cgroup_rstat_updated() race window").

    The commit was added in anticipation of memcg rstat conversion which needed
    synchronous accounting for the event counters (e.g. oom kill count). However,
    the conversion didn't get merged due to percpu memory overhead concern which
    couldn't be addressed at the time.

    Unfortunately, the patch's addition of smp_mb() to cgroup_rstat_updated()
    meant that every scheduling event now had to go through an additional full
    barrier and Mel Gorman noticed it as a 1% regression in the netperf UDP_STREAM test.

    There's no need to have this barrier in tree now, and even if we need
    synchronous accounting in the future, the right thing to do is to split
    it out into a separate function so that hot paths which don't care about
    synchronous behavior don't have to pay the overhead of the full barrier.
    Let's revert.

    Signed-off-by: Tejun Heo
    Reported-by: Mel Gorman
    Link: http://lkml.kernel.org/r/20200409154413.GK3818@techsingularity.net
    Cc: v4.18+

    Tejun Heo
     

15 Jan, 2020

1 commit


07 Nov, 2019

1 commit

  • cgroup->bstat_pending is used to determine the base stat delta to
    propagate to the parent. While correct, this is different from how
    percpu delta is determined for no good reason and the inconsistency
    makes the code more difficult to understand.

    This patch makes parent propagation delta calculation use the same
    method as percpu to global propagation.

    * cgroup_base_stat_accumulate() is renamed to cgroup_base_stat_add()
    and cgroup_base_stat_sub() is added.

    * percpu propagation calculation is updated to use the above helpers.

    * cgroup->bstat_pending is replaced with cgroup->last_bstat and
    updated to use the same calculation as percpu propagation.

    Signed-off-by: Tejun Heo

    Tejun Heo
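The delta scheme the 07 Nov 2019 commit standardizes on can be sketched in a few lines. `struct toy_base_stat` is a simplified, hypothetical stand-in for struct cgroup_base_stat (one runtime counter instead of full cputime), and `toy_propagate()` only illustrates the "delta = current - last; last = current; parent += delta" pattern, not the kernel's per-CPU flush.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct cgroup_base_stat. */
struct toy_base_stat { uint64_t sum_exec_runtime; };

static void toy_base_stat_add(struct toy_base_stat *dst,
			      const struct toy_base_stat *src)
{
	dst->sum_exec_runtime += src->sum_exec_runtime;
}

static void toy_base_stat_sub(struct toy_base_stat *dst,
			      const struct toy_base_stat *src)
{
	dst->sum_exec_runtime -= src->sum_exec_runtime;
}

struct toy_cgroup {
	struct toy_cgroup *parent;
	struct toy_base_stat bstat;       /* cumulative total */
	struct toy_base_stat last_bstat;  /* bstat as of the last propagation */
};

/* Propagate only what changed since the previous propagation. */
static void toy_propagate(struct toy_cgroup *cg)
{
	struct toy_base_stat delta = cg->bstat;

	toy_base_stat_sub(&delta, &cg->last_bstat);
	cg->last_bstat = cg->bstat;
	if (cg->parent)
		toy_base_stat_add(&cg->parent->bstat, &delta);
}
```

Because `last_bstat` remembers what was already sent upward, propagating again without new activity adds nothing to the parent, and there is no separate pending counter to keep in sync.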
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

16 Feb, 2019

1 commit

  • cgroup_rstat_cpu_pop_updated() is used to traverse the updated cgroups
    on flush. While it was only visiting updated ones in the subtree, it
    was visiting @root unconditionally. We can easily check whether @root
    is updated or not by looking at its ->updated_next just as with the
    cgroups in the subtree.

    * Remove the unnecessary cgroup_parent() test. The system root cgroup
    is never updated and thus its ->updated_next is always NULL. No
    need to test whether cgroup_parent() exists in addition to
    ->updated_next.

    * Terminate traversal if ->updated_next is NULL. This can only happen
    for subtree @root and there's no reason to visit it if it's not
    marked updated.

    This reduces cpu consumption when reading a lot of rstat backed files.
    In a micro benchmark reading stat from ~1600 cgroups, the sys time was
    lowered by >40%.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

27 Apr, 2018

9 commits

  • cgroup_rstat_updated() ensures that the cgroup's rstat is linked to
    the parent. If there's no parent, it never gets linked and the
    function ends up grabbing and releasing the cgroup_rstat_lock each
    time for no reason which can be expensive.

    This hasn't been a problem till now because nobody was calling the
    function for the root cgroup but rstat is gonna be exposed to
    controllers and use cases, so let's get ready. Make
    cgroup_rstat_updated() a no-op for the root cgroup.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • cgroup_rstat_updated() has a small race window where an update
    signal can race with a flush and be lost until the next update.
    This wasn't a problem for the existing usages, but we plan to use
    rstat to track counters which need to be accurate.

    This patch plugs the race window by synchronizing
    cgroup_rstat_updated() and flush path with memory barriers around
    cgroup_rstat_cpu->updated_next pointer.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • This patch adds cgroup_subsys->css_rstat_flush(). If a subsystem has
    this callback, its csses are linked on cgrp->css_rstat_list and rstat
    will call the function whenever the associated cgroup is flushed.
    Flush is also performed when such csses are released so that residual
    counts aren't lost.

    Combined with the rstat API previous patches factored out, this allows
    controllers to plug into rstat to manage their statistics in a
    scalable way.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Currently, rstat flush path is protected with a mutex which is fine as
    all the existing users are from interface file show path. However,
    rstat is being generalized for use by controllers and flushing from
    atomic contexts will be necessary.

    This patch replaces cgroup_rstat_mutex with a spinlock and adds an
    irq-safe flush function - cgroup_rstat_flush_irqsafe(). Explicit
    yield handling is added to the flush path so that other flush
    functions can yield to other threads and flushers.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • cgroup_rstat is being generalized so that controllers can use it too.
    This patch factors out and exposes the following interface functions.

    * cgroup_rstat_updated(): Renamed from cgroup_rstat_cpu_updated() for
    consistency.

    * cgroup_rstat_flush_hold/release(): Factored out from base stat
    implementation.

    * cgroup_rstat_flush(): Exposed verbatim.

    While at it, drop the assert on cgroup_rstat_mutex in
    cgroup_base_stat_flush(), as it crosses layers, and make a minor
    comment update.

    v2: Added EXPORT_SYMBOL_GPL(cgroup_rstat_updated) to fix a build bug.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Currently, rstat.c has rstat and base stat implementations intermixed.
    Collect base stat implementation at the end of the file. Also,
    reorder the prototypes.

    This patch doesn't make any functional changes.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Base resource stat accounts universal (not specific to any
    controller) resource consumption on top of rstat. Currently, its
    implementation is intermixed with rstat implementation making the code
    confusing to follow.

    This patch clarifies the distinction by doing the following.

    * Encapsulate base resource stat counters, currently only cputime, in
    struct cgroup_base_stat.

    * Move prev_cputime into struct cgroup and initialize it with cgroup.

    * Rename the related functions so that they start with cgroup_base_stat.

    * Prefix the related variables and field names with b.

    This patch doesn't make any functional changes.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • stat is too generic a name and ends up causing subtle confusions.
    It'll be made generic so that controllers can plug into it, which will
    make the problem worse. Let's rename it to something more specific -
    cgroup_rstat for cgroup recursive stat.

    This patch does the following renames. No other changes.

    * cpu_stat -> rstat_cpu
    * stat -> rstat
    * ?cstat -> ?rstatc

    Note that the renames are selective. The unrenamed are the ones which
    implement basic resource statistics on top of rstat. This will be
    further cleaned up in the following patches.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • stat is too generic a name and ends up causing subtle confusions.
    It'll be made generic so that controllers can plug into it, which will
    make the problem worse. Let's rename it to something more specific -
    cgroup_rstat for cgroup recursive stat.

    First, rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c. No
    content changes.

    Signed-off-by: Tejun Heo

    Tejun Heo
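The updated-list linking that the most recent 27 Apr 2018 commit short-circuits for the root can be modeled in user space. This is a hypothetical, single-CPU sketch: `struct toy_cgroup`, `toy_rstat_updated()` and the `toy_lock_acquisitions` counter are illustrative, the lock is represented only by that counter, and the memory barriers discussed in the second commit are omitted. The sketch assumes, modeled on the rstat code, that each `updated_children` list is terminated by its owner rather than NULL, so a NULL `updated_next` means "not linked".

```c
#include <stddef.h>

/* Hypothetical single-CPU model of the per-cpu "updated" tree. */
struct toy_cgroup {
	struct toy_cgroup *parent;            /* NULL for the root */
	struct toy_cgroup *updated_children;  /* list head, ends at owner */
	struct toy_cgroup *updated_next;      /* NULL: not on parent's list */
};

static unsigned long toy_lock_acquisitions;   /* counts slow-path entries */

static void toy_cgroup_init(struct toy_cgroup *cg, struct toy_cgroup *parent)
{
	cg->parent = parent;
	cg->updated_children = cg;   /* empty list points back at its owner */
	cg->updated_next = NULL;
}

static void toy_rstat_updated(struct toy_cgroup *cg)
{
	/* The root is never linked anywhere: return before "locking". */
	if (!cg->parent)
		return;

	/* Fast path: already linked, so all ancestors are linked too. */
	if (cg->updated_next)
		return;

	toy_lock_acquisitions++;   /* stands in for taking the lock */

	/* Link cg and each not-yet-linked ancestor into its parent's list. */
	for (struct toy_cgroup *parent = cg->parent; parent;
	     cg = parent, parent = cg->parent) {
		if (cg->updated_next)
			break;
		cg->updated_next = parent->updated_children;
		parent->updated_children = cg;
	}
}
```

Without the early `if (!cg->parent) return;`, every call on the root would fall through to the slow path, since the root never gets an `updated_next` of its own; that is the cost the commit removes.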