01 Oct, 2020

2 commits

  • Do not report failure on zero-sized writes; handle them as a no-op.

    There are issues, for example, with writev() when the iovec contains a
    zero-length buffer as its first element. In the example below, writev()
    is expected to successfully perform the write to the specified writable
    cgroup file (one expecting an integer value) and to return 2. Currently
    it returns -1 and skips the write:

    int writetest(int fd) {
            const char *buf1 = "";
            const char *buf2 = "1\n";
            struct iovec iov[2] = {
                    { .iov_base = (void*)buf1, .iov_len = 0 },
                    { .iov_base = (void*)buf2, .iov_len = 2 }
            };
            return writev(fd, iov, 2);
    }

    This patch fixes the issue by checking whether there is anything to
    write, and handling a zero-sized write as a no-op by returning 0.
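
    For reference, a self-contained, runnable userspace completion of the
    snippet above; the cgroup path is only an example and must point to an
    existing, writable cgroup file that accepts an integer value:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/uio.h>

    static int writetest(int fd)
    {
            const char *buf1 = "";
            const char *buf2 = "1\n";
            struct iovec iov[2] = {
                    { .iov_base = (void*)buf1, .iov_len = 0 },
                    { .iov_base = (void*)buf2, .iov_len = 2 }
            };
            return writev(fd, iov, 2);
    }

    int main(void)
    {
            /* Example path only; adjust to a writable cgroup file. */
            int fd = open("/sys/fs/cgroup/test/cgroup.freeze", O_WRONLY);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            /* With the fix this prints 2; before it, -1. */
            printf("writev returned %d\n", writetest(fd));
            close(fd);
            return 0;
    }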

    Signed-off-by: Jouni Roivas
    Signed-off-by: Tejun Heo

    Jouni Roivas
     
  • This step is already done in rebind_subsystems().

    It is not necessary to do it again.

    Signed-off-by: Wei Yang
    Signed-off-by: Tejun Heo

    Wei Yang
     

08 Jul, 2020

1 commit

  • When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
    copied, so the cgroup refcnt must be taken too. And, unlike the
    sk_alloc() path, sock_update_netprioidx() is not called here.
    Therefore, it is safe and necessary to grab the cgroup refcnt
    even when cgroup_sk_alloc is disabled.

    sk_clone_lock() is in BH context anyway, so the in_interrupt() check
    would terminate this function if it were called there. And for sk_alloc()
    skcd->val is always zero, so it's safe to factor out the code to make it
    more readable.

    The global variable 'cgroup_sk_alloc_disabled' is used to determine
    whether to take these reference counts. It is impossible to make
    the reference counting correct unless we save this bit of information
    in skcd->val. So, add a new bit there to record whether the socket
    has already taken the reference counts. This obviously relies on
    kmalloc() aligning cgroup pointers to at least 4 bytes;
    ARCH_KMALLOC_MINALIGN is certainly larger than that.
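
    A minimal, illustrative userspace sketch of that trick (hypothetical
    names, not the kernel's actual skcd->val handling): because the stored
    pointer is at least 4-byte aligned, its low bit is free to carry a flag:

    #include <stdbool.h>
    #include <stdint.h>

    #define SK_CGRP_REFCNT_TAKEN 0x1UL

    /* Recover the aligned pointer, masking off the flag bit. */
    static inline void *skcd_ptr(uintptr_t val)
    {
            return (void *)(val & ~SK_CGRP_REFCNT_TAKEN);
    }

    /* Did this value record that the refcount was already taken? */
    static inline bool skcd_refcnt_taken(uintptr_t val)
    {
            return val & SK_CGRP_REFCNT_TAKEN;
    }

    /* Pack a pointer and the "refcount taken" flag into one word. */
    static inline uintptr_t skcd_pack(void *cgrp, bool refcnt_taken)
    {
            return (uintptr_t)cgrp | (refcnt_taken ? SK_CGRP_REFCNT_TAKEN : 0);
    }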

    This bug seems to have been present since the beginning; commit
    d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
    tried to fix it but not completely. It seems it was not easy to trigger
    until the recent commit 090e28b229af
    ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

    Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
    Reported-by: Cameron Berkenpas
    Reported-by: Peter Geis
    Reported-by: Lu Fengqi
    Reported-by: Daniël Sonck
    Reported-by: Zhang Qiang
    Tested-by: Cameron Berkenpas
    Tested-by: Peter Geis
    Tested-by: Thomas Lamprecht
    Cc: Daniel Borkmann
    Cc: Zefan Li
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

28 May, 2020

1 commit

  • Currently, the root cgroup does not have a cpu.stat file. Add one which
    is consistent with /proc/stat to capture global cpu statistics that
    might not fall under cgroup accounting.

    We haven't done this in the past because the data are already presented
    in /proc/stat and we didn't want to add overhead from collecting root
    cgroup stats when cgroups are configured, but no cgroups have been
    created.

    By keeping the data consistent with /proc/stat, I think we avoid the
    first problem, while improving the usability of cgroups stats.
    We avoid the second problem by computing the contents of cpu.stat from
    existing data collected for /proc/stat anyway.
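
    The new file can be read like any other cgroup interface file; the path
    below assumes cgroup2 is mounted at the conventional /sys/fs/cgroup, and
    the exact set of fields depends on the kernel and enabled controllers:

    #include <stdio.h>

    int main(void)
    {
            char line[256];
            FILE *f = fopen("/sys/fs/cgroup/cpu.stat", "r");

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            while (fgets(line, sizeof(line), f))
                    fputs(line, stdout);
            fclose(f);
            return 0;
    }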

    Signed-off-by: Boris Burkov
    Suggested-by: Tejun Heo
    Signed-off-by: Tejun Heo

    Boris Burkov
     

29 Apr, 2020

1 commit

  • Make bpf_link update support more generic by making it into another
    bpf_link_ops method. This allows the generic syscall handling code to be
    agnostic to various conditionally compiled features (e.g.
    CONFIG_CGROUP_BPF) and lets link type-specific code remain static within
    its respective code base. Refactor the existing bpf_cgroup_link code to
    take advantage of this.
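
    An illustrative, simplified sketch of the ops-based dispatch this
    describes (hypothetical names, not the kernel's actual structs): the
    generic handler only calls through a per-link-type callback and carries
    no cgroup-specific knowledge:

    #include <errno.h>

    struct my_link;

    struct my_link_ops {
            int (*update_prog)(struct my_link *link, int new_prog_fd,
                               int old_prog_fd);
    };

    struct my_link {
            const struct my_link_ops *ops;
    };

    /* Generic syscall-side handler: agnostic to the concrete link type. */
    static int generic_link_update(struct my_link *link, int new_fd, int old_fd)
    {
            if (!link->ops->update_prog)
                    return -EINVAL;
            return link->ops->update_prog(link, new_fd, old_fd);
    }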

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200429001614.1544-2-andriin@fb.com

    Andrii Nakryiko
     

04 Apr, 2020

1 commit

  • Pull cgroup updates from Tejun Heo:

    - Christian extended clone3 so that processes can be spawned into
    cgroups directly.

    This is not only neat in terms of semantics but also avoids grabbing
    the global cgroup_threadgroup_rwsem for migration.

    - Daniel added !root xattr support to cgroupfs.

    Userland already uses xattrs on cgroupfs for bookkeeping. This will
    allow delegated cgroups to support such usages.

    - Prateek tried to make cpuset hotplug handling synchronous but that
    led to possible deadlock scenarios. Reverted.

    - Other minor changes including release_agent_path handling cleanup.

    * 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    docs: cgroup-v1: Document the cpuset_v2_mode mount option
    Revert "cpuset: Make cpuset hotplug synchronous"
    cgroupfs: Support user xattrs
    kernfs: Add option to enable user xattrs
    kernfs: Add removed_size out param for simple_xattr_set
    kernfs: kvmalloc xattr value instead of kmalloc
    cgroup: Restructure release_agent_path handling
    selftests/cgroup: add tests for cloning into cgroups
    clone3: allow spawning processes into cgroups
    cgroup: add cgroup_may_write() helper
    cgroup: refactor fork helpers
    cgroup: add cgroup_get_from_file() helper
    cgroup: unify attach permission checking
    cpuset: Make cpuset hotplug synchronous
    cgroup.c: Use built-in RCU list checking
    kselftest/cgroup: add cgroup destruction test
    cgroup: Clean up css_set task traversal

    Linus Torvalds
     

03 Apr, 2020

1 commit

  • Right now, the effective protection of any given cgroup is capped by its
    own explicit memory.low setting, regardless of what the parent says. The
    reasons for this are mostly historical and ease of implementation: to make
    delegation of memory.low safe, effective protection is the min() of all
    memory.low up the tree.

    Unfortunately, this limitation makes it impossible to protect an entire
    subtree from another without forcing the user to make explicit protection
    allocations all the way to the leaf cgroups - something that is highly
    undesirable in real life scenarios.

    Consider memory in a data center host. At the cgroup top level, we have a
    distinction between system management software and the actual workload the
    system is executing. Both branches are further subdivided into individual
    services, job components etc.

    We want to protect the workload as a whole from the system management
    software, but that doesn't mean we want to protect and prioritize
    individual workload wrt each other. Their memory demand can vary over
    time, and we'd want the VM to simply cache the hottest data within the
    workload subtree. Yet, the current memory.low limitations force us to
    allocate a fixed amount of protection to each workload component in order
    to get protection from system management software in general. This
    results in very inefficient resource distribution.

    Another concern with mandating downward allocation is that, as the
    complexity of the cgroup tree grows, it gets harder for the lower levels
    to be informed about decisions made at the host-level. Consider a
    container inside a namespace that in turn creates its own nested tree of
    cgroups to run multiple workloads. It'd be extremely difficult to
    configure memory.low parameters in those leaf cgroups that on one hand
    balance pressure among siblings as the container desires, while also
    reflecting the host-level protection from e.g. rpm upgrades, that lie
    beyond one or more delegation and namespacing points in the tree.

    It's highly unusual from a cgroup interface POV that nested levels have to
    be aware of and reflect decisions made at higher levels for them to be
    effective.

    To enable such use cases and scale configurability for complex trees, this
    patch implements a resource inheritance model for memory that is similar
    to how the CPU and the IO controller implement work-conserving resource
    allocations: a share of a resource allocated to a subtree always applies to
    the entire subtree recursively, while allowing, but not mandating,
    children to further specify distribution rules.

    That means that if protection is explicitly allocated among siblings,
    those configured shares are being followed during page reclaim just like
    they are now. However, if the memory.low set at a higher level is not
    fully claimed by the children in that subtree, the "floating" remainder is
    applied to each cgroup in the tree in proportion to its size. Since
    reclaim pressure is applied in proportion to size as well, each child in
    that tree gets the same boost, and the effect is neutral among siblings -
    with respect to each other, they behave as if no memory control was
    enabled at all, and the VM simply balances the memory demands optimally
    within the subtree. But collectively those cgroups enjoy a boost over the
    cgroups in neighboring trees.

    E.g. a leaf cgroup with a memory.low setting of 0 no longer means that
    it's not getting a share of the hierarchically assigned resource, just
    that it doesn't claim a fixed amount of it to protect from its siblings.

    This allows us to recursively protect one subtree (workload) from another
    (system management), while letting subgroups compete freely among each
    other - without having to assign fixed shares to each leaf, and without
    nested groups having to echo higher-level settings.

    The floating protection composes naturally with fixed protection.
    Consider the following example tree:

        A            A: low = 2G
       / \           A1: low = 1G
      A1  A2         A2: low = 0G

    As outside pressure is applied to this tree, A1 will enjoy a fixed
    protection from A2 of 1G, but the remaining, unclaimed 1G from A is split
    evenly among A1 and A2, coming out to 1.5G and 0.5G.
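
    A tiny, illustrative calculation of that distribution (simplified; the
    real reclaim-time computation in mm/memcontrol.c is more involved, and
    the function name here is hypothetical):

    #include <stdio.h>

    /* effective low = own low + unclaimed parent low, split by size share */
    static double effective_low(double own_low, double parent_low,
                                double siblings_low_sum, double own_size,
                                double all_siblings_size)
    {
            double floating = parent_low - siblings_low_sum;

            if (floating < 0)
                    floating = 0;
            return own_low + floating * own_size / all_siblings_size;
    }

    int main(void)
    {
            /* A: low = 2G, A1: low = 1G, A2: low = 0G, equal sizes */
            printf("A1: %.1fG\n", effective_low(1.0, 2.0, 1.0, 1.0, 2.0));
            printf("A2: %.1fG\n", effective_low(0.0, 2.0, 1.0, 1.0, 2.0));
            return 0;
    }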

    There is a slight risk of regressing theoretical setups where the
    top-level cgroups don't know about the true budgeting and set bogusly high
    "bypass" values that are meaningfully allocated down the tree. Such
    setups would rely on unclaimed protection to be discarded, and
    distributing it would change the intended behavior. Be safe and hide the
    new behavior behind a mount option, 'memory_recursiveprot'.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Acked-by: Tejun Heo
    Acked-by: Roman Gushchin
    Acked-by: Chris Down
    Cc: Michal Hocko
    Cc: Michal Koutný
    Link: http://lkml.kernel.org/r/20200227195606.46212-4-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

31 Mar, 2020

2 commits

  • Add a new operation (LINK_UPDATE), which allows replacing the active
    bpf_prog under a given bpf_link. Currently this is only supported for
    bpf_cgroup_link, but it will be extended to other kinds of bpf_links in
    follow-up patches.

    For bpf_cgroup_link, implemented functionality matches existing semantics for
    direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
    unconditionally set new bpf_prog regardless of which bpf_prog is currently
    active under given bpf_link, or, optionally, can specify expected active
    bpf_prog. If active bpf_prog doesn't match expected one, no changes are
    performed, old bpf_link stays intact and attached, operation returns
    a failure.

    The cgroup_bpf_replace() operation resolves the race between
    auto-detachment and bpf_prog update in the same fashion as is done for
    bpf_link detachment, except that in this case the update has no way of
    succeeding because the target cgroup is marked as dying, so an error is
    returned.
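
    A hedged sketch of driving the new command through the raw bpf() syscall
    (assumes uapi headers that already define BPF_LINK_UPDATE, i.e. v5.7+;
    error handling kept minimal):

    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/bpf.h>

    /* Replace the program under link_fd. Pass old_prog_fd < 0 to replace
     * unconditionally, or a valid fd to require that exact program to be
     * the currently active one (BPF_F_REPLACE). */
    static int cgroup_link_update(int link_fd, int new_prog_fd, int old_prog_fd)
    {
            union bpf_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.link_update.link_fd = link_fd;
            attr.link_update.new_prog_fd = new_prog_fd;
            if (old_prog_fd >= 0) {
                    attr.link_update.flags = BPF_F_REPLACE;
                    attr.link_update.old_prog_fd = old_prog_fd;
            }

            return syscall(__NR_bpf, BPF_LINK_UPDATE, &attr, sizeof(attr));
    }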

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200330030001.2312810-3-andriin@fb.com

    Andrii Nakryiko
     
  • Implement new sub-command to attach cgroup BPF programs and return FD-based
    bpf_link back on success. bpf_link, once attached to cgroup, cannot be
    replaced, except by owner having its FD. Cgroup bpf_link supports only
    BPF_F_ALLOW_MULTI semantics. Both link-based and prog-based BPF_F_ALLOW_MULTI
    attachments can be freely intermixed.

    To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
    BPF program can be executed, implement auto-detachment of link. When
    cgroup_bpf_release() is called, all attached bpf_links are forced to release
    cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
    well as still owning underlying bpf_prog. This is because user-space might
    still have FDs open and active, so bpf_link as a user-referenced object can't
    be freed yet. Once last active FD is closed, bpf_link will be freed and
    underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
    touched, because cgroup is released already.

    The inherent race between bpf_cgroup_link release (from closing last FD) and
    cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
    the only additional check required is when bpf_cgroup_link attempts to detach
    itself from cgroup. At that time we need to check whether there is still
    cgroup associated with that link. And if not, exit with success, because
    bpf_cgroup_link was already successfully detached.
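
    A hedged sketch of creating such a link via the raw bpf() syscall (again
    assuming v5.7+ uapi headers; the attach type is just an example):

    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/bpf.h>

    /* Attach prog_fd to cgroup_fd as an FD-based bpf_link. */
    static int cgroup_link_create(int prog_fd, int cgroup_fd)
    {
            union bpf_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.link_create.prog_fd = prog_fd;
            attr.link_create.target_fd = cgroup_fd;
            attr.link_create.attach_type = BPF_CGROUP_INET_EGRESS;

            return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
    }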

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Roman Gushchin
    Link: https://lore.kernel.org/bpf/20200330030001.2312810-2-andriin@fb.com

    Andrii Nakryiko
     

17 Mar, 2020

1 commit

  • This patch turns on xattr support for cgroupfs. This is useful for
    letting non-root owners of delegated subtrees attach metadata to
    cgroups.

    One use case is for subtree owners to tell a userspace out of memory
    killer to bias away from killing specific subtrees.

    Tests:

    [/sys/fs/cgroup]# for i in $(seq 0 130); \
    do setfattr workload.slice -n user.name$i -v wow; done
    setfattr: workload.slice: No space left on device
    setfattr: workload.slice: No space left on device
    setfattr: workload.slice: No space left on device

    [/sys/fs/cgroup]# for i in $(seq 0 130); \
    do setfattr workload.slice --remove user.name$i; done
    setfattr: workload.slice: No such attribute
    setfattr: workload.slice: No such attribute
    setfattr: workload.slice: No such attribute

    [/sys/fs/cgroup]# for i in $(seq 0 130); \
    do setfattr workload.slice -n user.name$i -v wow; done
    setfattr: workload.slice: No space left on device
    setfattr: workload.slice: No space left on device
    setfattr: workload.slice: No space left on device

    `seq 0 130` is inclusive, and 131 - 128 = 3, which is the number of
    errors we expect to see.

    [/data]# cat testxattr.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/xattr.h>

    int main() {
            char name[256];
            char *buf = malloc(64 << 10);
            if (!buf) {
                    perror("malloc");
                    return 1;
            }

            for (int i = 0; i < 4; ++i) {
                    snprintf(name, 256, "user.bigone%d", i);
                    if (setxattr("/sys/fs/cgroup/system.slice", name, buf,
                                 64 << 10, 0)) {
                            printf("setxattr failed on iteration=%d\n", i);
                            return 1;
                    }
            }

            return 0;
    }

    [/data]# ./a.out
    setxattr failed on iteration=2

    [/data]# ./a.out
    setxattr failed on iteration=0

    [/sys/fs/cgroup]# setfattr -x user.bigone0 system.slice/
    [/sys/fs/cgroup]# setfattr -x user.bigone1 system.slice/

    [/data]# ./a.out
    setxattr failed on iteration=2

    Signed-off-by: Daniel Xu
    Acked-by: Chris Down
    Reviewed-by: Greg Kroah-Hartman
    Signed-off-by: Tejun Heo

    Daniel Xu
     

13 Mar, 2020

2 commits

  • Pull networking fixes from David Miller:
    "It looks like a decent sized set of fixes, but a lot of these are one
    liner off-by-one and similar type changes:

    1) Fix netlink header pointer used to calculate the bad attribute offset
    reported to user. From Pablo Neira Ayuso.

    2) Don't double clear PHY interrupts when ->did_interrupt is set,
    from Heiner Kallweit.

    3) Add missing validation of various (devlink, nl802154, fib, etc.)
    attributes, from Jakub Kicinski.

    4) Missing *pos increments in various netfilter seq_next ops, from
    Vasily Averin.

    5) Missing break in of_mdiobus_register() loop, from Dajun Jin.

    6) Don't double bump tx_dropped in veth driver, from Jiang Lidong.

    7) Work around FMAN erratum A050385, from Madalin Bucur.

    8) Make sure ARP header is pulled early enough in bonding driver,
    from Eric Dumazet.

    9) Do a cond_resched() during multicast processing of ipvlan and
    macvlan, from Mahesh Bandewar.

    10) Don't attach cgroups to unrelated sockets when in interrupt
    context, from Shakeel Butt.

    11) Fix tpacket ring state management when encountering unknown GSO
    types. From Willem de Bruijn.

    12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend()
    only in the suspend context. From Heiner Kallweit"

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
    net: systemport: fix index check to avoid an array out of bounds access
    tc-testing: add ETS scheduler to tdc build configuration
    net: phy: fix MDIO bus PM PHY resuming
    net: hns3: clear port base VLAN when unload PF
    net: hns3: fix RMW issue for VLAN filter switch
    net: hns3: fix VF VLAN table entries inconsistent issue
    net: hns3: fix "tc qdisc del" failed issue
    taprio: Fix sending packets without dequeueing them
    net: mvmdio: avoid error message for optional IRQ
    net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register
    net: memcg: fix lockdep splat in inet_csk_accept()
    s390/qeth: implement smarter resizing of the RX buffer pool
    s390/qeth: refactor buffer pool code
    s390/qeth: use page pointers to manage RX buffer pool
    seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number
    net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
    net/packet: tpacket_rcv: do not increment ring index on drop
    sxgbe: Fix off by one in samsung driver strncpy size arg
    net: caif: Add lockdep expression to RCU traversal primitive
    MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer
    ...

    Linus Torvalds
     

11 Mar, 2020

1 commit

  • We are testing network memory accounting in our setup and noticed
    inconsistent network memory usage and often unrelated cgroups network
    usage correlates with testing workload. On further inspection, it
    seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
    irq context, especially for cgroup v1.

    mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
    and kind of assume that this can only happen from sk_clone_lock() and
    that the source sock object already has an associated cgroup. However,
    in cgroup v1, where network memory accounting is opt-in, the source sock
    can be unassociated with any cgroup and the new cloned sock can get
    associated with an unrelated, interrupted cgroup.

    Cgroup v2 can also suffer if the source sock object was created by a
    process in the root cgroup or if sk_alloc() is called in irq context.
    The fix is to simply do nothing in interrupt context.

    WARNING: Please note that about half of the TCP sockets are allocated
    from the IRQ context, so memory used by such sockets will not be
    accounted by the memcg.

    The stack trace of mem_cgroup_sk_alloc() from IRQ-context:

    CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1
    Hardware name: ...
    Call Trace:

    dump_stack+0x57/0x75
    mem_cgroup_sk_alloc+0xe9/0xf0
    sk_clone_lock+0x2a7/0x420
    inet_csk_clone_lock+0x1b/0x110
    tcp_create_openreq_child+0x23/0x3b0
    tcp_v6_syn_recv_sock+0x88/0x730
    tcp_check_req+0x429/0x560
    tcp_v6_rcv+0x72d/0xa40
    ip6_protocol_deliver_rcu+0xc9/0x400
    ip6_input+0x44/0xd0
    ? ip6_protocol_deliver_rcu+0x400/0x400
    ip6_rcv_finish+0x71/0x80
    ipv6_rcv+0x5b/0xe0
    ? ip6_sublist_rcv+0x2e0/0x2e0
    process_backlog+0x108/0x1e0
    net_rx_action+0x26b/0x460
    __do_softirq+0x104/0x2a6
    do_softirq_own_stack+0x2a/0x40

    do_softirq.part.19+0x40/0x50
    __local_bh_enable_ip+0x51/0x60
    ip6_finish_output2+0x23d/0x520
    ? ip6table_mangle_hook+0x55/0x160
    __ip6_finish_output+0xa1/0x100
    ip6_finish_output+0x30/0xd0
    ip6_output+0x73/0x120
    ? __ip6_finish_output+0x100/0x100
    ip6_xmit+0x2e3/0x600
    ? ipv6_anycast_cleanup+0x50/0x50
    ? inet6_csk_route_socket+0x136/0x1e0
    ? skb_free_head+0x1e/0x30
    inet6_csk_xmit+0x95/0xf0
    __tcp_transmit_skb+0x5b4/0xb20
    __tcp_send_ack.part.60+0xa3/0x110
    tcp_send_ack+0x1d/0x20
    tcp_rcv_state_process+0xe64/0xe80
    ? tcp_v6_connect+0x5d1/0x5f0
    tcp_v6_do_rcv+0x1b1/0x3f0
    ? tcp_v6_do_rcv+0x1b1/0x3f0
    __release_sock+0x7f/0xd0
    release_sock+0x30/0xa0
    __inet_stream_connect+0x1c3/0x3b0
    ? prepare_to_wait+0xb0/0xb0
    inet_stream_connect+0x3b/0x60
    __sys_connect+0x101/0x120
    ? __sys_getsockopt+0x11b/0x140
    __x64_sys_connect+0x1a/0x20
    do_syscall_64+0x51/0x200
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: 2d7580738345 ("mm: memcontrol: consolidate cgroup socket tracking")
    Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
    Signed-off-by: Shakeel Butt
    Reviewed-by: Roman Gushchin
    Signed-off-by: David S. Miller

    Shakeel Butt
     

05 Mar, 2020

1 commit

  • Similar to the commit d7495343228f ("cgroup: fix incorrect
    WARN_ON_ONCE() in cgroup_setup_root()"), cgroup_id(root_cgrp) is not
    equal to 1 on 32bit ino archs, which triggers all sorts of issues with
    psi_show() on s390x. For example,

    BUG: KASAN: slab-out-of-bounds in collect_percpu_times+0x2d0/
    Read of size 4 at addr 000000001e0ce000 by task read_all/3667
    collect_percpu_times+0x2d0/0x798
    psi_show+0x7c/0x2a8
    seq_read+0x2ac/0x830
    vfs_read+0x92/0x150
    ksys_read+0xe2/0x188
    system_call+0xd8/0x2b4

    Fix it by using cgroup_ino().

    Fixes: 743210386c03 ("cgroup: use cgrp->kn->id as the cgroup ID")
    Signed-off-by: Qian Cai
    Acked-by: Johannes Weiner
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org # v5.5

    Qian Cai
     

13 Feb, 2020

9 commits

  • This adds support for creating a process in a different cgroup than its
    parent. Callers can limit and account processes and threads right from
    the moment they are spawned:
    - A service manager can directly spawn new services into dedicated
    cgroups.
    - A process can be directly created in a frozen cgroup and will be
    frozen as well.
    - The initial accounting jitter experienced by process supervisors and
    daemons is eliminated with this.
    - Threaded applications or even thread implementations can choose to
    create a specific cgroup layout where each thread is spawned
    directly into a dedicated cgroup.

    This feature is limited to the unified hierarchy. Callers need to pass
    a directory file descriptor for the target cgroup. The caller can
    choose to pass an O_PATH file descriptor. All usual migration
    restrictions apply, i.e. there can be no processes in inner nodes. In
    general, creating a process directly in a target cgroup adheres to all
    migration restrictions.

    One of the biggest advantages of this feature is that CLONE_INTO_CGROUP
    does not need to grab the write side of the global
    cgroup_threadgroup_rwsem. This global lock makes moving tasks/threads
    around super expensive. With clone3() this lock is avoided.
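
    A hedged usage sketch (requires a v5.7+ kernel and matching kernel/libc
    headers that provide SYS_clone3 and CLONE_INTO_CGROUP; the cgroup path
    is only an example):

    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <linux/sched.h>        /* struct clone_args, CLONE_INTO_CGROUP */
    #include <linux/types.h>

    int main(void)
    {
            int cgroup_fd = open("/sys/fs/cgroup/mygroup",
                                 O_DIRECTORY | O_CLOEXEC);
            if (cgroup_fd < 0) {
                    perror("open");
                    return 1;
            }

            struct clone_args args = {
                    .flags       = CLONE_INTO_CGROUP,
                    .exit_signal = SIGCHLD,
                    .cgroup      = (__u64)cgroup_fd,
            };

            pid_t pid = syscall(SYS_clone3, &args, sizeof(args));
            if (pid < 0) {
                    perror("clone3");
                    return 1;
            }
            if (pid == 0) {
                    /* Child: already a member of /sys/fs/cgroup/mygroup. */
                    _exit(0);
            }
            waitpid(pid, NULL, 0);
            return 0;
    }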

    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: Peter Zijlstra
    Cc: cgroups@vger.kernel.org
    Signed-off-by: Christian Brauner
    Signed-off-by: Tejun Heo

    Christian Brauner
     
  • Add a cgroup_may_write() helper which we can use in the
    CLONE_INTO_CGROUP patch series to verify that we can write to the
    destination cgroup.

    Cc: Tejun Heo
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: cgroups@vger.kernel.org
    Signed-off-by: Christian Brauner
    Signed-off-by: Tejun Heo

    Christian Brauner
     
  • This refactors the fork helpers so they can be easily modified in the
    next patches. The patch just moves the cgroup threadgroup rwsem grab and
    release into the helpers. They don't need to be directly exposed in fork.c.

    Cc: Tejun Heo
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: cgroups@vger.kernel.org
    Acked-by: Michal Koutný
    Signed-off-by: Christian Brauner
    Signed-off-by: Tejun Heo

    Christian Brauner
     
  • Add a helper cgroup_get_from_file(). The helper will be used in
    subsequent patches to retrieve a cgroup while holding a reference to the
    struct file it was taken from.

    Cc: Tejun Heo
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: cgroups@vger.kernel.org
    Acked-by: Michal Koutný
    Signed-off-by: Christian Brauner
    Signed-off-by: Tejun Heo

    Christian Brauner
     
  • The core codepaths to check whether a process can be attached to a
    cgroup are the same for threads and thread-group leaders. Only a small
    piece of code verifying that source and destination cgroup are in the
    same domain differentiates the thread permission checking from
    thread-group leader permission checking.
    Since cgroup_migrate_vet_dst() only matters for cgroup2 - it is a noop
    on cgroup1 - we can move it out of cgroup_attach_task().
    All checks can now be consolidated into a new helper
    cgroup_attach_permissions() callable from both cgroup_procs_write() and
    cgroup_threads_write().

    Cc: Tejun Heo
    Cc: Li Zefan
    Cc: Johannes Weiner
    Cc: cgroups@vger.kernel.org
    Acked-by: Michal Koutný
    Signed-off-by: Christian Brauner
    Signed-off-by: Tejun Heo

    Christian Brauner
     
  • list_for_each_entry_rcu has built-in RCU and lock checking.
    Pass cond argument to list_for_each_entry_rcu() to silence
    false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
    by default.

    Even though the function css_next_child() already checks if
    cgroup_mutex or rcu_read_lock() is held using
    cgroup_assert_mutex_or_rcu_locked(), there is a need to pass
    cond to list_for_each_entry_rcu() to avoid false positive
    lockdep warning.

    Signed-off-by: Madhuparna Bhowmik
    Acked-by: Michal Koutný
    Signed-off-by: Tejun Heo

    Madhuparna Bhowmik
     
  • css_task_iter stores a pointer to the head of each iterable list; this
    dates back to commit 0f0a2b4fa621 ("cgroup: reorganize css_task_iter")
    when we did not store cur_cset. Let us utilize the list heads directly
    in cur_cset and streamline css_task_iter_advance_css_set() a bit. There
    is no intentional functional change.

    Signed-off-by: Michal Koutný
    Signed-off-by: Tejun Heo

    Michal Koutný
     
  • PF_EXITING is set earlier than the actual removal from the css_set when
    a task is exiting. This can confuse cgroup.procs readers, who see no
    PF_EXITING tasks; however, rmdir checks against css_set membership, so
    it can transiently fail with EBUSY.

    Fix this by listing tasks that weren't unlinked from css_set active
    lists.
    It may happen that other users of the task iterator (without
    CSS_TASK_ITER_PROCS) spot a PF_EXITING task before cgroup_exit(). This
    is equal to the state before commit c03cd7738a83 ("cgroup: Include dying
    leaders with live threads in PROCS iterations") but it may be reviewed
    later.

    Reported-by: Suren Baghdasaryan
    Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations")
    Signed-off-by: Michal Koutný
    Signed-off-by: Tejun Heo

    Michal Koutný
     
  • If a seq_file .next function does not change the position index, a read
    after some lseek can generate unexpected output:

    1) dd with bs=1 skips every second element
    $ dd if=/sys/fs/cgroup/cgroup.procs bs=8 count=1
    2
    3
    4
    5
    1+0 records in
    1+0 records out
    8 bytes copied, 0,000267297 s, 29,9 kB/s
    [test@localhost ~]$ dd if=/sys/fs/cgroup/cgroup.procs bs=1 count=8
    2
    4 <<< NB! 3 was skipped
    6 <<< ... and 5 too
    8 <<< ... and 7
    8+0 records in
    8+0 records out
    8 bytes copied, 5,2123e-05 s, 153 kB/s

    This happens because __cgroup_procs_start() makes an extra
    cgroup_procs_next() call.

    2) A read after an lseek beyond the end of file generates the whole
    last line.
    3) A read after an lseek into the middle of the last line generates the
    expected rest of the last line and then, unexpectedly, the whole line
    once again.

    Additionally, the patch removes an extra position index change in
    __cgroup_procs_start().

    Cc: stable@vger.kernel.org
    https://bugzilla.kernel.org/show_bug.cgi?id=206283
    Signed-off-by: Vasily Averin
    Signed-off-by: Tejun Heo

    Vasily Averin
     

11 Feb, 2020

1 commit

  • Pull cgroup fix from Tejun Heo:
    "I made a mistake while removing cgroup task list lazy init
    optimization making the root cgroup.procs show entries for the
    init_tasks. The zero entries doesn't cause critical failures but does
    make systemd print out warning messages during boot.

    Fix it by omitting init_tasks as they should be"

    * 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: init_tasks shouldn't be linked to the root cgroup

    Linus Torvalds
     

09 Feb, 2020

1 commit

  • Pull vfs file system parameter updates from Al Viro:
    "Saner fs_parser.c guts and data structures. The system-wide registry
    of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
    the horror switch() in fs_parse() that would have to grow another case
    every time something got added to that system-wide registry.

    New syntax types can be added by filesystems easily now, and their
    namespace is that of functions - not of system-wide enum members. IOW,
    they can be shared or kept private and if some turn out to be widely
    useful, we can make them common library helpers, etc., without having
    to do anything whatsoever to fs_parse() itself.

    And we already get that kind of requests - the thing that finally
    pushed me into doing that was "oh, and let's add one for timeouts -
    things like 15s or 2h". If some filesystem really wants that, let them
    do it. Without somebody having to play gatekeeper for the variants
    blessed by direct support in fs_parse(), TYVM.

    Quite a bit of boilerplate is gone. And IMO the data structures make a
    lot more sense now. -200LoC, while we are at it"

    * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
    tmpfs: switch to use of invalfc()
    cgroup1: switch to use of errorfc() et.al.
    procfs: switch to use of invalfc()
    hugetlbfs: switch to use of invalfc()
    cramfs: switch to use of errofc() et.al.
    gfs2: switch to use of errorfc() et.al.
    fuse: switch to use errorfc() et.al.
    ceph: use errorfc() and friends instead of spelling the prefix out
    prefix-handling analogues of errorf() and friends
    turn fs_param_is_... into functions
    fs_parse: handle optional arguments sanely
    fs_parse: fold fs_parameter_desc/fs_parameter_spec
    fs_parser: remove fs_parameter_description name field
    add prefix to fs_context->log
    ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
    new primitive: __fs_parse()
    switch rbd and libceph to p_log-based primitives
    struct p_log, variants of warnf() et.al. taking that one instead
    teach logfc() to handle prefices, give it saner calling conventions
    get rid of cg_invalf()
    ...

    Linus Torvalds
     

31 Jan, 2020

1 commit

  • 5153faac18d2 ("cgroup: remove cgroup_enable_task_cg_lists()
    optimization") removed lazy initialization of css_sets so that new
    tasks are always linked to their css_set. In the process, it incorrectly
    ended up adding the init_tasks to the root css_set. They show up as PID
    0's in the root's cgroup.procs, triggering warnings in systemd and
    generally confusing people.

    Fix it by skipping css_set linking for the init_tasks.

    Signed-off-by: Tejun Heo
    Reported-by: https://github.com/joanbm
    Link: https://github.com/systemd/systemd/issues/14682
    Fixes: 5153faac18d2 ("cgroup: remove cgroup_enable_task_cg_lists() optimization")
    Cc: stable@vger.kernel.org # v5.5+

    Tejun Heo
     

29 Jan, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Add WireGuard

    2) Add HE and TWT support to ath11k driver, from John Crispin.

    3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

    4) Add variable window congestion control to TIPC, from Jon Maloy.

    5) Add BCM84881 PHY driver, from Russell King.

    6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

    7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

    8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

    9) Add BPF dynamic program extensions, from Alexei Starovoitov.

    10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

    11) Add support for macsec HW offloading, from Antoine Tenart.

    12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

    13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
    net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
    udp: segment looped gso packets correctly
    netem: change mailing list
    qed: FW 8.42.2.0 debug features
    qed: rt init valid initialization changed
    qed: Debug feature: ilt and mdump
    qed: FW 8.42.2.0 Add fw overlay feature
    qed: FW 8.42.2.0 HSI changes
    qed: FW 8.42.2.0 iscsi/fcoe changes
    qed: Add abstraction for different hsi values per chip
    qed: FW 8.42.2.0 Additional ll2 type
    qed: Use dmae to write to widebus registers in fw_funcs
    qed: FW 8.42.2.0 Parser offsets modified
    qed: FW 8.42.2.0 Queue Manager changes
    qed: FW 8.42.2.0 Expose new registers and change windows
    qed: FW 8.42.2.0 Internal ram offsets modifications
    MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
    Documentation: net: octeontx2: Add RVU HW and drivers overview
    octeontx2-pf: ethtool RSS config support
    octeontx2-pf: Add basic ethtool support
    ...

    Linus Torvalds
     

16 Jan, 2020

1 commit

  • The test_cgcore_no_internal_process_constraint_on_threads selftest when
    running with subsystem controlling noise triggers two warnings:

    > [ 597.443115] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3131 cgroup_apply_control_enable+0xe0/0x3f0
    > [ 597.443413] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3177 cgroup_apply_control_disable+0xa6/0x160

    Both stem from a call to cgroup_type_write. The first warning was also
    triggered by syzkaller.

    When we're switching cgroup to threaded mode shortly after a subsystem
    was disabled on it, we can see the respective subsystem css dying there.

    The warning in cgroup_apply_control_enable is harmless in this case
    since we're not adding new subsys anyway.
    The warning in cgroup_apply_control_disable indicates an attempt to kill
    css of recently disabled subsystem repeatedly.

    The commit prevents these situations by making cgroup_type_write wait
    for all dying csses to go away before re-applying subtree controls.
    While at it, the locations of the WARN_ON_ONCE calls are moved so that
    the warning is triggered only when we are about to misuse the dying css.

    Reported-by: syzbot+5493b2a54d31d6aea629@syzkaller.appspotmail.com
    Reported-by: Christian Brauner
    Signed-off-by: Michal Koutný
    Signed-off-by: Tejun Heo

    Michal Koutný
     

20 Dec, 2019

1 commit

  • The common use-case in production is to have multiple cgroup-bpf
    programs per attach type that cover multiple use-cases. Such programs
    are attached with BPF_F_ALLOW_MULTI and can be maintained by different
    people.

    The order of programs usually matters; for example, imagine two egress
    programs: the first one drops packets and the second one counts packets.
    If they're swapped, the result of the counting program will be different.

    It brings operational challenges with updating cgroup-bpf program(s)
    attached with BPF_F_ALLOW_MULTI since there is no way to replace a
    program:

    * One way to update is to detach all programs first and then attach the
    new version(s) again in the right order. This introduces an
    interruption in the work a program is doing and may not be acceptable
    (e.g. if it's egress firewall);

    * Another way is to attach the new version of a program first and only
    then detach the old version. This introduces a time interval when two
    versions of the same program are working, which may not be acceptable if
    a program is not idempotent. It also imposes an additional burden on
    program developers to make sure that two versions of their program can
    co-exist.

    Solve the problem by introducing a "replace" mode in the BPF_PROG_ATTACH
    command for cgroup-bpf programs being attached with the BPF_F_ALLOW_MULTI
    flag. This mode is enabled by the newly introduced BPF_F_REPLACE attach
    flag and the bpf_attr.replace_bpf_fd attribute, which passes the fd of
    the old program to replace.

    That way user can replace any program among those attached with
    BPF_F_ALLOW_MULTI flag without the problems described above.

    Details of the new API:

    * If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid
    descriptor of BPF program, BPF_PROG_ATTACH will return corresponding
    error (EINVAL or EBADF).

    * If replace_bpf_fd has valid descriptor of BPF program but such a
    program is not attached to specified cgroup, BPF_PROG_ATTACH will
    return ENOENT.

    BPF_F_REPLACE is introduced to make the user intent clear, since
    replace_bpf_fd alone can't be used for this (its default value, 0, is a
    valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
    future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).
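
    A hedged usage sketch via the raw bpf() syscall (assumes uapi headers
    that define BPF_F_REPLACE, i.e. v5.6+; the attach type is just an
    example):

    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/bpf.h>

    /* Atomically replace old_prog_fd with new_prog_fd on cgroup_fd. */
    static int cgroup_prog_replace(int cgroup_fd, int new_prog_fd,
                                   int old_prog_fd)
    {
            union bpf_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.target_fd = cgroup_fd;
            attr.attach_bpf_fd = new_prog_fd;
            attr.attach_type = BPF_CGROUP_INET_EGRESS;
            attr.attach_flags = BPF_F_ALLOW_MULTI | BPF_F_REPLACE;
            attr.replace_bpf_fd = old_prog_fd;

            return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
    }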

    Signed-off-by: Andrey Ignatov
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/30cd850044a0057bdfcaaf154b7d2f39850ba813.1576741281.git.rdna@fb.com

    Andrey Ignatov
     

26 Nov, 2019

1 commit

  • Pull cgroup updates from Tejun Heo:
    "There are several notable changes here:

    - Single thread migrating itself has been optimized so that it
    doesn't need threadgroup rwsem anymore.

    - Freezer optimization to avoid unnecessary frozen state changes.

    - cgroup ID unification so that cgroup fs ino is the only unique ID
    used for the cgroup and can be used to directly look up live
    cgroups through filehandle interface on 64bit ino archs. On 32bit
    archs, cgroup fs ino is still the only ID in use but it is only
    unique when combined with gen.

    - selftest and other changes"

    * 'for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits)
    writeback: fix -Wformat compilation warnings
    docs: cgroup: mm: Fix spelling of "list"
    cgroup: fix incorrect WARN_ON_ONCE() in cgroup_setup_root()
    cgroup: use cgrp->kn->id as the cgroup ID
    kernfs: use 64bit inos if ino_t is 64bit
    kernfs: implement custom exportfs ops and fid type
    kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id()
    kernfs: convert kernfs_node->id from union kernfs_node_id to u64
    kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes
    kernfs: use dumber locking for kernfs_find_and_get_node_by_ino()
    netprio: use css ID instead of cgroup ID
    writeback: use ino_t for inodes in tracepoints
    kernfs: fix ino wrap-around detection
    kselftests: cgroup: Avoid the reuse of fd after it is deallocated
    cgroup: freezer: don't change task and cgroups status unnecessarily
    cgroup: use cgroup->last_bstat instead of cgroup->bstat_pending for consistency
    cgroup: remove cgroup_enable_task_cg_lists() optimization
    cgroup: pids: use atomic64_t for pids->limit
    selftests: cgroup: Run test_core under interfering stress
    selftests: cgroup: Add task migration tests
    ...

    Linus Torvalds
     

15 Nov, 2019

1 commit

  • 743210386c03 ("cgroup: use cgrp->kn->id as the cgroup ID") added a WARN
    which triggers if cgroup_id(root_cgrp) is not 1. This is fine on 64bit
    ino archs, but on 32bit archs the cgroup ID is ((gen << 32) | ino) and
    gen starts at 1, so the root id is 0x1_0000_0001 instead of 1, always
    triggering the WARN.

    What we want to make sure of is that the ino part is 1. Fix it.

    Reported-by: Naresh Kamboju
    Fixes: 743210386c03 ("cgroup: use cgrp->kn->id as the cgroup ID")
    Signed-off-by: Tejun Heo

    Tejun Heo
     

13 Nov, 2019

3 commits

  • cgroup ID is currently allocated using a dedicated per-hierarchy idr
    and used internally and exposed through tracepoints and bpf. This is
    confusing because there are tracepoints and other interfaces which use
    the cgroupfs ino as IDs.

    The preceding changes exposed kn->id as a 64bit ino on supported archs,
    or as ino+gen (low 32 bits as ino, high 32 bits as gen) otherwise.
    There's no reason for cgroup to use different IDs. The kernfs IDs are
    unique, and userland can easily discover them and map them back to paths
    using standard file operations.

    This patch replaces cgroup IDs with kernfs IDs.

    * cgroup_id() is added and all cgroup ID users are converted to use it.

    * kernfs_node creation is moved to earlier during cgroup init so that
    cgroup_id() is available during init.

    * While at it, s/cgroup/cgrp/ in psi helpers for consistency.

    * Fallback ID value is changed to 1 to be consistent with root cgroup
    ID.

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Cc: Namhyung Kim

    Tejun Heo
     
  • kernfs_find_and_get_node_by_ino() looks up the kernfs_node matching the
    specified ino. On top of that, kernfs_get_node_by_id() and
    kernfs_fh_get_inode() implement full ID matching by testing the rest
    of the ID.

    On the surface, confusingly, the two are slightly different in that the
    latter uses 0 gen as a wildcard while the former doesn't - does that mean
    the latter can't uniquely identify inodes w/ 0 gen? In practice, this is
    a distinction without a difference because the generation number starts
    at 1. There are no actual IDs with 0 gen, so it can always safely be used
    as a wildcard.

    Let's simplify the code by renaming kernfs_find_and_get_node_by_ino()
    to kernfs_find_and_get_node_by_id(), moving all lookup logics into it,
    and removing now unnecessary kernfs_get_node_by_id().

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs_node->id is currently a union kernfs_node_id which represents
    either a 32bit (ino, gen) pair or u64 value. I can't see much value
    in the usage of the union - all that's needed is a 64bit ID which the
    current code is already limited to. Using a union makes the code
    unnecessarily complicated and prevents using 64bit ino without adding
    practical benefits.

    This patch drops union kernfs_node_id and makes kernfs_node->id a u64.
    ino is stored in the lower 32 bits and gen in the upper 32. Accessors -
    kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the ino
    and gen. This makes ID handling less cumbersome and will allow using
    64bit inos on supported archs.

    This patch doesn't make any functional changes.
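
    A tiny illustration of the packing described above (accessor names
    hypothetical, simplified from the kernel helpers):

    #include <stdint.h>
    #include <stdio.h>

    static inline uint32_t id_ino(uint64_t id) { return (uint32_t)id; }
    static inline uint32_t id_gen(uint64_t id) { return (uint32_t)(id >> 32); }

    int main(void)
    {
            uint64_t id = ((uint64_t)1 << 32) | 42;   /* gen = 1, ino = 42 */

            printf("ino=%u gen=%u\n", id_ino(id), id_gen(id));
            return 0;
    }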

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman
    Cc: Namhyung Kim
    Cc: Jens Axboe
    Cc: Alexei Starovoitov

    Tejun Heo