20 Apr, 2019

2 commits

  • Cgroup v1 implements the freezer controller, which provides an ability
    to stop the workload in a cgroup and temporarily free up some
    resources (cpu, io, network bandwidth and, potentially, memory)
    for some other tasks. Cgroup v2 lacks this functionality.

    This patch implements freezer for cgroup v2.

    Cgroup v2 freezer tries to put tasks into a state similar to jobctl
    stop. This means that tasks can be killed, ptraced (using
    PTRACE_SEIZE*), and interrupted. It is possible to attach to
    a frozen task, get some information (e.g. read registers) and detach.
    It's also possible to migrate a frozen task to another cgroup.

    This distinguishes the cgroup v2 freezer from the cgroup v1 freezer,
    which mostly tried to imitate the system-wide freezer. While
    uninterruptible sleep is fine when all tasks are going to be frozen
    (the hibernation case), it is not an acceptable state for a subset of
    the system.

    The cgroup v2 freezer does not support freezing kthreads.
    If a non-root cgroup contains a kthread, the cgroup can still be frozen,
    but the kthread will remain running, the cgroup will be shown
    as non-frozen, and the notification will not be delivered.

    * PTRACE_ATTACH does not work because non-fatal signal delivery
    is blocked in the frozen state.

    There are also some interface differences between the cgroup v1 and
    cgroup v2 freezers, which are required to conform to the cgroup v2
    interface design principles:
    1) There is no separate controller, which has to be turned on:
    the functionality is always available and is represented by
    cgroup.freeze and cgroup.events cgroup control files.
    2) The desired state is defined by the cgroup.freeze control file.
    Any hierarchical configuration is allowed.
    3) The interface is asynchronous. The actual state is available
    using cgroup.events control file ("frozen" field). There are no
    dedicated transitional states.
    4) Any change to the cgroup hierarchy is allowed
    (creating new cgroups, removing old cgroups, moving tasks between
    cgroups) regardless of whether some cgroups are frozen.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Tejun Heo
    No-objection-from-me-by: Oleg Nesterov
    Cc: kernel-team@fb.com

    Roman Gushchin
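
The asynchronous interface described in this commit (desired state written to cgroup.freeze, actual state read from the "frozen" field of cgroup.events) can be exercised from user space roughly as follows. This is a minimal sketch, not code from the patch: the helper names `read_events` and `set_frozen`, the timeout values, and the use of polling rather than file-change notifications are all assumptions made for illustration.

```python
import os
import time


def read_events(cgroup_dir):
    """Parse cgroup.events ("key value" per line) into a dict of ints."""
    events = {}
    with open(os.path.join(cgroup_dir, "cgroup.events")) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            if key:
                events[key] = int(value)
    return events


def set_frozen(cgroup_dir, freeze, timeout=5.0, poll_interval=0.05):
    """Write the desired state to cgroup.freeze, then wait for it to apply.

    Returns True if the "frozen" field of cgroup.events reached the
    desired value before the timeout, False otherwise. Polling is an
    assumption of this sketch, chosen for simplicity.
    """
    desired = 1 if freeze else 0
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write(str(desired))
    deadline = time.monotonic() + timeout
    while True:
        if read_events(cgroup_dir).get("frozen") == desired:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A call such as `set_frozen("/sys/fs/cgroup/mygroup", True)` (the path is hypothetical) would request a freeze and report whether the cgroup was observed as frozen. Because the write only sets the desired state, a False result may simply mean the cgroup contains a kthread, as noted in the commit message.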
     
  • freezer.c will contain an implementation of the cgroup v2 freezer,
    so let's rename the v1 freezer to avoid naming conflicts.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Tejun Heo
    Cc: kernel-team@fb.com

    Roman Gushchin
     

27 Apr, 2018

1 commit

  • stat is too generic a name and ends up causing subtle confusions.
    It'll be made generic so that controllers can plug into it, which will
    make the problem worse. Let's rename it to something more specific -
    cgroup_rstat for cgroup recursive stat.

    First, rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c. No
    content changes.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

16 Nov, 2017

1 commit

  • Pull cgroup updates from Tejun Heo:
    "Cgroup2 cpu controller support is finally merged.

    - Basic cpu statistics support to allow monitoring by default without
    the CPU controller enabled.

    - cgroup2 cpu controller support.

    - /sys/kernel/cgroup files to help deal with new / optional
    features"

    * 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: export list of cgroups v2 features using sysfs
    cgroup: export list of delegatable control files using sysfs
    cgroup: mark @cgrp __maybe_unused in cpu_stat_show()
    MAINTAINERS: relocate cpuset.c
    cgroup, sched: Move basic cpu stats from cgroup.stat to cpu.stat
    sched: Implement interface for cgroup unified hierarchy
    sched: Misc preps for cgroup unified hierarchy interface
    sched/cputime: Add dummy cputime_adjust() implementation for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
    cgroup: statically initialize init_css_set->dfl_cgrp
    cgroup: Implement cgroup2 basic CPU usage accounting
    cpuacct: Introduce cgroup_account_cputime[_field]()
    sched/cputime: Expose cputime_adjust()

    Linus Torvalds
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

25 Sep, 2017

1 commit

  • In cgroup1, while cpuacct isn't actually controlling any resources, it
    is a separate controller due to a combination of two factors -
    1. enabling the cpu controller has significant side effects, and 2. we
    have to pick one of the hierarchies to account CPU usages on. The
    cpuacct controller is effectively used to designate a hierarchy to
    track CPU usages on.

    cgroup2's unified hierarchy removes the second reason and we can
    account basic CPU usages by default. While we can use cpuacct for
    this purpose, both its interface and implementation leave a lot to be
    desired - it collects and exposes two sources of truth which don't
    agree with each other and some of the exposed statistics don't make
    much sense. Also, it propagates all the way up the hierarchy on each
    accounting event which is unnecessary.

    This patch adds basic resource accounting mechanism to cgroup2's
    unified hierarchy and accounts CPU usages using it.

    * All accountings are done per-cpu and don't propagate immediately.
    It just bumps the per-cgroup per-cpu counters and links to the
    parent's updated list if not already on it.

    * On a read, the per-cpu counters are collected into the global ones
    and then propagated upwards. Only the per-cpu counters which have
    changed since the last read are propagated.

    * CPU usage stats are collected and shown in "cgroup.stat" with "cpu."
    prefix. Total usage is collected from scheduling events. User/sys
    breakdown is sourced from tick sampling and adjusted to the usage
    using cputime_adjust().

    This keeps the accounting side hot path O(1) and per-cpu and the read
    side O(nr_updated_since_last_read).

    v2: Minor changes and documentation updates as suggested by Waiman and
    Roman.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Li Zefan
    Cc: Johannes Weiner
    Cc: Waiman Long
    Cc: Roman Gushchin

    Tejun Heo
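
The accounting scheme described in this commit (per-cpu counters bumped on the hot path, an "updated" list linking changed cgroups to their parents, and propagation deferred to read time) can be modelled in a few lines. This is an illustrative sketch only, not the kernel implementation: the class `Cgroup`, its field names, and the `pending` carry used when a non-root cgroup is read are assumptions of the sketch, and the link step here walks up until it meets an ancestor that is already linked.

```python
class Cgroup:
    def __init__(self, name, parent=None, nr_cpus=2):
        self.name = name
        self.parent = parent
        self.nr_cpus = nr_cpus
        self.percpu = [0] * nr_cpus  # per-cpu deltas not yet collected
        # updated_children[cpu]: children with uncollected deltas on that cpu
        self.updated_children = [set() for _ in range(nr_cpus)]
        self.pending = 0  # deltas handed up by a child that was read directly
        self.total = 0    # global counter, includes descendants

    def account(self, cpu, delta):
        """Hot path: bump the per-cpu counter and link into updated lists."""
        self.percpu[cpu] += delta
        node = self
        while node.parent is not None:
            updated = node.parent.updated_children[cpu]
            if node in updated:
                break  # already linked, so the ancestors are linked too
            updated.add(node)
            node = node.parent

    def _collect(self, cpu):
        """Fold this subtree's deltas for one cpu into the global counters."""
        children = self.updated_children[cpu]
        self.updated_children[cpu] = set()
        delta = self.percpu[cpu] + self.pending
        self.percpu[cpu] = 0
        self.pending = 0
        for child in children:  # only cgroups that changed are visited
            delta += child._collect(cpu)
        self.total += delta
        return delta

    def read(self):
        """Read path: collect updated counters, then hand the sum upwards."""
        carried = sum(self._collect(cpu) for cpu in range(self.nr_cpus))
        if self.parent is not None:
            self.parent.pending += carried
        return self.total
```

In this model, `account()` touches only one per-cpu slot plus the links that are not yet in place, while `read()` visits only the cgroups that appear on an updated list, which mirrors the cost split stated in the commit message.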
     

15 Jun, 2017

1 commit

  • The debug cgroup currently resides within cgroup-v1.c and is enabled
    only for v1 cgroup. To enable the debug cgroup also for v2, it makes
    sense to put the code into its own file as it will no longer be v1
    specific. There is no change to the debug cgroup specific code.

    Signed-off-by: Waiman Long
    Signed-off-by: Tejun Heo

    Waiman Long
     

11 Jan, 2017

1 commit

  • Added rdma cgroup controller that does accounting and limit
    enforcement on rdma/IB resources.

    Added rdma cgroup header file which defines its APIs to perform
    charging/uncharging functionality. It also defines APIs for the RDMA/IB
    stack for device registration. Devices which are registered will
    participate in controller functions of accounting and limit
    enforcement. It defines the rdmacg_device structure to bind the IB
    stack and the RDMA cgroup controller.

    RDMA resources are tracked using resource pools. A resource pool is a
    per device, per cgroup entity which allows setting up accounting limits
    on a per device basis.

    Currently resources are defined by the RDMA cgroup.

    A resource pool is created/destroyed dynamically whenever
    charging/uncharging occurs respectively and whenever user
    configuration is done. It's a tradeoff of memory vs a little more code
    space that creates a resource pool object whenever necessary, instead
    of creating them during cgroup creation and device registration time.

    Signed-off-by: Parav Pandit
    Signed-off-by: Tejun Heo

    Parav Pandit
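
The dynamic resource pool lifecycle described in this commit (one pool per device per cgroup, created on first charge or configuration and destroyed once nothing keeps it alive) can be sketched as below. This is a simplified model, not the kernel code: the class and method names are hypothetical, the resource name used in the usage note is illustrative, and hierarchical charging across parent cgroups is deliberately left out.

```python
class ResourcePool:
    """Per-device, per-cgroup accounting state (illustrative)."""

    def __init__(self):
        self.usage = {}   # resource name -> current charge count
        self.limits = {}  # resource name -> configured maximum

    def idle(self):
        # A pool is only worth keeping while it holds usage or limits.
        return not self.limits and not any(self.usage.values())


class RdmaCgroup:
    def __init__(self, name):
        self.name = name
        self.pools = {}  # device name -> ResourcePool, created lazily

    def _pool(self, device):
        return self.pools.setdefault(device, ResourcePool())

    def _drop_if_idle(self, device):
        pool = self.pools.get(device)
        if pool is not None and pool.idle():
            del self.pools[device]

    def set_limit(self, device, resource, limit):
        """User configuration; limit=None removes the limit."""
        pool = self._pool(device)
        if limit is None:
            pool.limits.pop(resource, None)
        else:
            pool.limits[resource] = limit
        self._drop_if_idle(device)

    def try_charge(self, device, resource):
        """Charge one unit; returns False if the limit would be exceeded."""
        pool = self._pool(device)
        used = pool.usage.get(resource, 0)
        limit = pool.limits.get(resource)
        if limit is not None and used >= limit:
            self._drop_if_idle(device)
            return False
        pool.usage[resource] = used + 1
        return True

    def uncharge(self, device, resource):
        """Release one unit and free the pool if nothing else needs it."""
        pool = self.pools[device]
        if pool.usage.get(resource, 0) <= 0:
            raise ValueError("uncharge without a matching charge")
        pool.usage[resource] -= 1
        self._drop_if_idle(device)
```

For example, `RdmaCgroup("g").try_charge("dev0", "hca_handle")` creates the pool for `dev0` on demand, and the matching `uncharge` removes it again, so a cgroup that never touches a device carries no per-device state for it.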
     

28 Dec, 2016

3 commits