20 Apr, 2019
2 commits
-
Cgroup v1 implements the freezer controller, which provides an ability
to stop the workload in a cgroup and temporarily free up some
resources (cpu, io, network bandwidth and, potentially, memory)
for some other tasks. Cgroup v2 lacks this functionality.This patch implements freezer for cgroup v2.
Cgroup v2 freezer tries to put tasks into a state similar to jobctl
stop. This means that tasks can be killed, ptraced (using
PTRACE_SEIZE*), and interrupted. It is possible to attach to
a frozen task, get some information (e.g. read registers) and detach.
It's also possible to migrate a frozen tasks to another cgroup.This differs cgroup v2 freezer from cgroup v1 freezer, which mostly
tried to imitate the system-wide freezer. However uninterruptible
sleep is fine when all tasks are going to be frozen (hibernation case),
it's not the acceptable state for some subset of the system.Cgroup v2 freezer is not supporting freezing kthreads.
If a non-root cgroup contains kthread, the cgroup still can be frozen,
but the kthread will remain running, the cgroup will be shown
as non-frozen, and the notification will not be delivered.* PTRACE_ATTACH is not working because non-fatal signal delivery
is blocked in frozen state.There are some interface differences between cgroup v1 and cgroup v2
freezer too, which are required to conform the cgroup v2 interface
design principles:
1) There is no separate controller, which has to be turned on:
the functionality is always available and is represented by
cgroup.freeze and cgroup.events cgroup control files.
2) The desired state is defined by the cgroup.freeze control file.
Any hierarchical configuration is allowed.
3) The interface is asynchronous. The actual state is available
using cgroup.events control file ("frozen" field). There are no
dedicated transitional states.
4) It's allowed to make any changes with the cgroup hierarchy
(create new cgroups, remove old cgroups, move tasks between cgroups)
no matter if some cgroups are frozen.Signed-off-by: Roman Gushchin
Signed-off-by: Tejun Heo
No-objection-from-me-by: Oleg Nesterov
Cc: kernel-team@fb.com -
Freezer.c will contain an implementation of cgroup v2 freezer,
so let's rename the v1 freezer to avoid naming conflicts.Signed-off-by: Roman Gushchin
Signed-off-by: Tejun Heo
Cc: kernel-team@fb.com
27 Apr, 2018
1 commit
-
stat is too generic a name and ends up causing subtle confusions.
It'll be made generic so that controllers can plug into it, which will
make the problem worse. Let's rename it to something more specific -
cgroup_rstat for cgroup recursive stat.First, rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c. No
content changes.Signed-off-by: Tejun Heo
16 Nov, 2017
1 commit
-
Pull cgroup updates from Tejun Heo:
"Cgroup2 cpu controller support is finally merged.- Basic cpu statistics support to allow monitoring by default without
the CPU controller enabled.- cgroup2 cpu controller support.
- /sys/kernel/cgroup files to help dealing with new / optional
features"* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: export list of cgroups v2 features using sysfs
cgroup: export list of delegatable control files using sysfs
cgroup: mark @cgrp __maybe_unused in cpu_stat_show()
MAINTAINERS: relocate cpuset.c
cgroup, sched: Move basic cpu stats from cgroup.stat to cpu.stat
sched: Implement interface for cgroup unified hierarchy
sched: Misc preps for cgroup unified hierarchy interface
sched/cputime: Add dummy cputime_adjust() implementation for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
cgroup: statically initialize init_css_set->dfl_cgrp
cgroup: Implement cgroup2 basic CPU usage accounting
cpuacct: Introduce cgroup_account_cputime[_field]()
sched/cputime: Expose cputime_adjust()
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
25 Sep, 2017
1 commit
-
In cgroup1, while cpuacct isn't actually controlling any resources, it
is a separate controller due to combination of two factors -
1. enabling cpu controller has significant side effects, and 2. we
have to pick one of the hierarchies to account CPU usages on. cpuacct
controller is effectively used to designate a hierarchy to track CPU
usages on.cgroup2's unified hierarchy removes the second reason and we can
account basic CPU usages by default. While we can use cpuacct for
this purpose, both its interface and implementation leave a lot to be
desired - it collects and exposes two sources of truth which don't
agree with each other and some of the exposed statistics don't make
much sense. Also, it propagates all the way up the hierarchy on each
accounting event which is unnecessary.This patch adds basic resource accounting mechanism to cgroup2's
unified hierarchy and accounts CPU usages using it.* All accountings are done per-cpu and don't propagate immediately.
It just bumps the per-cgroup per-cpu counters and links to the
parent's updated list if not already on it.* On a read, the per-cpu counters are collected into the global ones
and then propagated upwards. Only the per-cpu counters which have
changed since the last read are propagated.* CPU usage stats are collected and shown in "cgroup.stat" with "cpu."
prefix. Total usage is collected from scheduling events. User/sys
breakdown is sourced from tick sampling and adjusted to the usage
using cputime_adjust().This keeps the accounting side hot path O(1) and per-cpu and the read
side O(nr_updated_since_last_read).v2: Minor changes and documentation updates as suggested by Waiman and
Roman.Signed-off-by: Tejun Heo
Acked-by: Peter Zijlstra
Cc: Ingo Molnar
Cc: Li Zefan
Cc: Johannes Weiner
Cc: Waiman Long
Cc: Roman Gushchin
15 Jun, 2017
1 commit
-
The debug cgroup currently resides within cgroup-v1.c and is enabled
only for v1 cgroup. To enable the debug cgroup also for v2, it makes
sense to put the code into its own file as it will no longer be v1
specific. There is no change to the debug cgroup specific code.Signed-off-by: Waiman Long
Signed-off-by: Tejun Heo
11 Jan, 2017
1 commit
-
Added rdma cgroup controller that does accounting, limit enforcement
on rdma/IB resources.Added rdma cgroup header file which defines its APIs to perform
charging/uncharging functionality. It also defined APIs for RDMA/IB
stack for device registration. Devices which are registered will
participate in controller functions of accounting and limit
enforcements. It define rdmacg_device structure to bind IB stack
and RDMA cgroup controller.RDMA resources are tracked using resource pool. Resource pool is per
device, per cgroup entity which allows setting up accounting limits
on per device basis.Currently resources are defined by the RDMA cgroup.
Resource pool is created/destroyed dynamically whenever
charging/uncharging occurs respectively and whenever user
configuration is done. Its a tradeoff of memory vs little more code
space that creates resource pool object whenever necessary, instead of
creating them during cgroup creation and device registration time.Signed-off-by: Parav Pandit
Signed-off-by: Tejun Heo
28 Dec, 2016
3 commits
-
get/put_css_set() get exposed in cgroup-internal.h in the process.
Signed-off-by: Tejun Heo
Acked-by: Acked-by: Zefan Li -
cgroup.c is getting too unwieldy. Let's move out cgroup v1 specific
code along with the debug controller into kernel/cgroup/cgroup-v1.c.v2: cgroup_mutex and css_set_lock made available in cgroup-internal.h
regardless of CONFIG_PROVE_RCU.Signed-off-by: Tejun Heo
Acked-by: Acked-by: Zefan Li -
They're growing to be too many and planned to get split further. Move
them under their own directory.kernel/cgroup.c -> kernel/cgroup/cgroup.c
kernel/cgroup_freezer.c -> kernel/cgroup/freezer.c
kernel/cgroup_pids.c -> kernel/cgroup/pids.c
kernel/cpuset.c -> kernel/cgroup/cpuset.cSigned-off-by: Tejun Heo
Acked-by: Acked-by: Zefan Li